dect
/
linux-2.6
Archived
13
0
Fork 0
This repository has been archived on 2022-02-17. You can view files and clone it, but cannot push or open issues or pull requests.
linux-2.6/include/linux/slab.h

449 lines
14 KiB
C
Raw Normal View History

/*
* Written by Mark Hemment, 1996 (markhe@nextd.demon.co.uk).
*
* (C) SGI 2006, Christoph Lameter
* Cleaned up and restructured to ease the addition of alternative
* implementations of SLAB allocators.
*/
#ifndef _LINUX_SLAB_H
#define _LINUX_SLAB_H
#include <linux/gfp.h>
#include <linux/types.h>
#include <linux/workqueue.h>
/*
* Flags to pass to kmem_cache_create().
* The ones marked DEBUG are only valid if CONFIG_SLAB_DEBUG is set.
*/
#define SLAB_DEBUG_FREE 0x00000100UL /* DEBUG: Perform (expensive) checks on free */
#define SLAB_RED_ZONE 0x00000400UL /* DEBUG: Red zone objs in a cache */
#define SLAB_POISON 0x00000800UL /* DEBUG: Poison objects */
#define SLAB_HWCACHE_ALIGN 0x00002000UL /* Align objs on cache lines */
#define SLAB_CACHE_DMA 0x00004000UL /* Use GFP_DMA memory */
#define SLAB_STORE_USER 0x00010000UL /* DEBUG: Store the last owner for bug hunting */
#define SLAB_PANIC 0x00040000UL /* Panic if kmem_cache_create() fails */
/*
* SLAB_DESTROY_BY_RCU - **WARNING** READ THIS!
*
* This delays freeing the SLAB page by a grace period, it does _NOT_
* delay object freeing. This means that if you do kmem_cache_free()
* that memory location is free to be reused at any time. Thus it may
* be possible to see another object there in the same RCU grace period.
*
* This feature only ensures the memory location backing the object
* stays valid, the trick to using this is relying on an independent
* object validation pass. Something like:
*
* rcu_read_lock()
* again:
* obj = lockless_lookup(key);
* if (obj) {
* if (!try_get_ref(obj)) // might fail for free objects
* goto again;
*
* if (obj->key != key) { // not the object we expected
* put_ref(obj);
* goto again;
* }
* }
* rcu_read_unlock();
*
* See also the comment on struct slab_rcu in mm/slab.c.
*/
#define SLAB_DESTROY_BY_RCU 0x00080000UL /* Defer freeing slabs to RCU */
[PATCH] cpuset memory spread slab cache implementation Provide the slab cache infrastructure to support cpuset memory spreading. See the previous patches, cpuset_mem_spread, for an explanation of cpuset memory spreading. This patch provides a slab cache SLAB_MEM_SPREAD flag. If set in the kmem_cache_create() call defining a slab cache, then any task marked with the process state flag PF_MEMSPREAD will spread memory page allocations for that cache over all the allowed nodes, instead of preferring the local (faulting) node. On systems not configured with CONFIG_NUMA, this results in no change to the page allocation code path for slab caches. On systems with cpusets configured in the kernel, but the "memory_spread" cpuset option not enabled for the current tasks cpuset, this adds a call to a cpuset routine and failed bit test of the processor state flag PF_SPREAD_SLAB. For tasks so marked, a second inline test is done for the slab cache flag SLAB_MEM_SPREAD, and if that is set and if the allocation is not in_interrupt(), this adds a call to to a cpuset routine that computes which of the tasks mems_allowed nodes should be preferred for this allocation. ==> This patch adds another hook into the performance critical code path to allocating objects from the slab cache, in the ____cache_alloc() chunk, below. The next patch optimizes this hook, reducing the impact of the combined mempolicy plus memory spreading hooks on this critical code path to a single check against the tasks task_struct flags word. This patch provides the generic slab flags and logic needed to apply memory spreading to a particular slab. A subsequent patch will mark a few specific slab caches for this placement policy. Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-24 11:16:07 +00:00
#define SLAB_MEM_SPREAD 0x00100000UL /* Spread some memory over cpuset */
SLUB core This is a new slab allocator which was motivated by the complexity of the existing code in mm/slab.c. It attempts to address a variety of concerns with the existing implementation. A. Management of object queues A particular concern was the complex management of the numerous object queues in SLAB. SLUB has no such queues. Instead we dedicate a slab for each allocating CPU and use objects from a slab directly instead of queueing them up. B. Storage overhead of object queues SLAB Object queues exist per node, per CPU. The alien cache queue even has a queue array that contain a queue for each processor on each node. For very large systems the number of queues and the number of objects that may be caught in those queues grows exponentially. On our systems with 1k nodes / processors we have several gigabytes just tied up for storing references to objects for those queues This does not include the objects that could be on those queues. One fears that the whole memory of the machine could one day be consumed by those queues. C. SLAB meta data overhead SLAB has overhead at the beginning of each slab. This means that data cannot be naturally aligned at the beginning of a slab block. SLUB keeps all meta data in the corresponding page_struct. Objects can be naturally aligned in the slab. F.e. a 128 byte object will be aligned at 128 byte boundaries and can fit tightly into a 4k page with no bytes left over. SLAB cannot do this. D. SLAB has a complex cache reaper SLUB does not need a cache reaper for UP systems. On SMP systems the per CPU slab may be pushed back into partial list but that operation is simple and does not require an iteration over a list of objects. SLAB expires per CPU, shared and alien object queues during cache reaping which may cause strange hold offs. E. SLAB has complex NUMA policy layer support SLUB pushes NUMA policy handling into the page allocator. This means that allocation is coarser (SLUB does interleave on a page level) but that situation was also present before 2.6.13. SLABs application of policies to individual slab objects allocated in SLAB is certainly a performance concern due to the frequent references to memory policies which may lead a sequence of objects to come from one node after another. SLUB will get a slab full of objects from one node and then will switch to the next. F. Reduction of the size of partial slab lists SLAB has per node partial lists. This means that over time a large number of partial slabs may accumulate on those lists. These can only be reused if allocator occur on specific nodes. SLUB has a global pool of partial slabs and will consume slabs from that pool to decrease fragmentation. G. Tunables SLAB has sophisticated tuning abilities for each slab cache. One can manipulate the queue sizes in detail. However, filling the queues still requires the uses of the spin lock to check out slabs. SLUB has a global parameter (min_slab_order) for tuning. Increasing the minimum slab order can decrease the locking overhead. The bigger the slab order the less motions of pages between per CPU and partial lists occur and the better SLUB will be scaling. G. Slab merging We often have slab caches with similar parameters. SLUB detects those on boot up and merges them into the corresponding general caches. This leads to more effective memory use. About 50% of all caches can be eliminated through slab merging. This will also decrease slab fragmentation because partial allocated slabs can be filled up again. Slab merging can be switched off by specifying slub_nomerge on boot up. Note that merging can expose heretofore unknown bugs in the kernel because corrupted objects may now be placed differently and corrupt differing neighboring objects. Enable sanity checks to find those. H. Diagnostics The current slab diagnostics are difficult to use and require a recompilation of the kernel. SLUB contains debugging code that is always available (but is kept out of the hot code paths). SLUB diagnostics can be enabled via the "slab_debug" option. Parameters can be specified to select a single or a group of slab caches for diagnostics. This means that the system is running with the usual performance and it is much more likely that race conditions can be reproduced. I. Resiliency If basic sanity checks are on then SLUB is capable of detecting common error conditions and recover as best as possible to allow the system to continue. J. Tracing Tracing can be enabled via the slab_debug=T,<slabcache> option during boot. SLUB will then protocol all actions on that slabcache and dump the object contents on free. K. On demand DMA cache creation. Generally DMA caches are not needed. If a kmalloc is used with __GFP_DMA then just create this single slabcache that is needed. For systems that have no ZONE_DMA requirement the support is completely eliminated. L. Performance increase Some benchmarks have shown speed improvements on kernbench in the range of 5-10%. The locking overhead of slub is based on the underlying base allocation size. If we can reliably allocate larger order pages then it is possible to increase slub performance much further. The anti-fragmentation patches may enable further performance increases. Tested on: i386 UP + SMP, x86_64 UP + SMP + NUMA emulation, IA64 NUMA + Simulator SLUB Boot options slub_nomerge Disable merging of slabs slub_min_order=x Require a minimum order for slab caches. This increases the managed chunk size and therefore reduces meta data and locking overhead. slub_min_objects=x Mininum objects per slab. Default is 8. slub_max_order=x Avoid generating slabs larger than order specified. slub_debug Enable all diagnostics for all caches slub_debug=<options> Enable selective options for all caches slub_debug=<o>,<cache> Enable selective options for a certain set of caches Available Debug options F Double Free checking, sanity and resiliency R Red zoning P Object / padding poisoning U Track last free / alloc T Trace all allocs / frees (only use for individual slabs). To use SLUB: Apply this patch and then select SLUB as the default slab allocator. [hugh@veritas.com: fix an oops-causing locking error] [akpm@linux-foundation.org: various stupid cleanups and small fixes] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-06 21:49:36 +00:00
#define SLAB_TRACE 0x00200000UL /* Trace allocations and frees */
/* Flag to prevent checks on free */
#ifdef CONFIG_DEBUG_OBJECTS
# define SLAB_DEBUG_OBJECTS 0x00400000UL
#else
# define SLAB_DEBUG_OBJECTS 0x00000000UL
#endif
#define SLAB_NOLEAKTRACE 0x00800000UL /* Avoid kmemleak tracing */
2008-05-31 13:56:17 +00:00
/* Don't track use of uninitialized memory */
#ifdef CONFIG_KMEMCHECK
# define SLAB_NOTRACK 0x01000000UL
#else
# define SLAB_NOTRACK 0x00000000UL
#endif
#ifdef CONFIG_FAILSLAB
# define SLAB_FAILSLAB 0x02000000UL /* Fault injection mark */
#else
# define SLAB_FAILSLAB 0x00000000UL
#endif
2008-05-31 13:56:17 +00:00
/* The following flags affect the page allocator grouping pages by mobility */
#define SLAB_RECLAIM_ACCOUNT 0x00020000UL /* Objects are reclaimable */
#define SLAB_TEMPORARY SLAB_RECLAIM_ACCOUNT /* Objects are short-lived */
/*
* ZERO_SIZE_PTR will be returned for zero sized kmalloc requests.
*
* Dereferencing ZERO_SIZE_PTR will lead to a distinct access fault.
*
* ZERO_SIZE_PTR can be passed to kfree though in the same way that NULL can.
* Both make kfree a no-op.
*/
#define ZERO_SIZE_PTR ((void *)16)
#define ZERO_OR_NULL_PTR(x) ((unsigned long)(x) <= \
(unsigned long)ZERO_SIZE_PTR)
/*
* Common fields provided in kmem_cache by all slab allocators
* This struct is either used directly by the allocator (SLOB)
* or the allocator must include definitions for all fields
* provided in kmem_cache_common in their definition of kmem_cache.
*
* Once we can do anonymous structs (C11 standard) we could put a
* anonymous struct definition in these allocators so that the
* separate allocations in the kmem_cache structure of SLAB and
* SLUB is no longer needed.
*/
#ifdef CONFIG_SLOB
struct kmem_cache {
unsigned int object_size;/* The original size of the object */
unsigned int size; /* The aligned/padded/added on size */
unsigned int align; /* Alignment as calculated */
unsigned long flags; /* Active flags on the slab */
const char *name; /* Slab name for sysfs */
int refcount; /* Use counter */
void (*ctor)(void *); /* Called on object slot creation */
struct list_head list; /* List of all slab caches on the system */
};
#endif
struct mem_cgroup;
/*
* struct kmem_cache related prototypes
*/
void __init kmem_cache_init(void);
SLUB core This is a new slab allocator which was motivated by the complexity of the existing code in mm/slab.c. It attempts to address a variety of concerns with the existing implementation. A. Management of object queues A particular concern was the complex management of the numerous object queues in SLAB. SLUB has no such queues. Instead we dedicate a slab for each allocating CPU and use objects from a slab directly instead of queueing them up. B. Storage overhead of object queues SLAB Object queues exist per node, per CPU. The alien cache queue even has a queue array that contain a queue for each processor on each node. For very large systems the number of queues and the number of objects that may be caught in those queues grows exponentially. On our systems with 1k nodes / processors we have several gigabytes just tied up for storing references to objects for those queues This does not include the objects that could be on those queues. One fears that the whole memory of the machine could one day be consumed by those queues. C. SLAB meta data overhead SLAB has overhead at the beginning of each slab. This means that data cannot be naturally aligned at the beginning of a slab block. SLUB keeps all meta data in the corresponding page_struct. Objects can be naturally aligned in the slab. F.e. a 128 byte object will be aligned at 128 byte boundaries and can fit tightly into a 4k page with no bytes left over. SLAB cannot do this. D. SLAB has a complex cache reaper SLUB does not need a cache reaper for UP systems. On SMP systems the per CPU slab may be pushed back into partial list but that operation is simple and does not require an iteration over a list of objects. SLAB expires per CPU, shared and alien object queues during cache reaping which may cause strange hold offs. E. SLAB has complex NUMA policy layer support SLUB pushes NUMA policy handling into the page allocator. This means that allocation is coarser (SLUB does interleave on a page level) but that situation was also present before 2.6.13. SLABs application of policies to individual slab objects allocated in SLAB is certainly a performance concern due to the frequent references to memory policies which may lead a sequence of objects to come from one node after another. SLUB will get a slab full of objects from one node and then will switch to the next. F. Reduction of the size of partial slab lists SLAB has per node partial lists. This means that over time a large number of partial slabs may accumulate on those lists. These can only be reused if allocator occur on specific nodes. SLUB has a global pool of partial slabs and will consume slabs from that pool to decrease fragmentation. G. Tunables SLAB has sophisticated tuning abilities for each slab cache. One can manipulate the queue sizes in detail. However, filling the queues still requires the uses of the spin lock to check out slabs. SLUB has a global parameter (min_slab_order) for tuning. Increasing the minimum slab order can decrease the locking overhead. The bigger the slab order the less motions of pages between per CPU and partial lists occur and the better SLUB will be scaling. G. Slab merging We often have slab caches with similar parameters. SLUB detects those on boot up and merges them into the corresponding general caches. This leads to more effective memory use. About 50% of all caches can be eliminated through slab merging. This will also decrease slab fragmentation because partial allocated slabs can be filled up again. Slab merging can be switched off by specifying slub_nomerge on boot up. Note that merging can expose heretofore unknown bugs in the kernel because corrupted objects may now be placed differently and corrupt differing neighboring objects. Enable sanity checks to find those. H. Diagnostics The current slab diagnostics are difficult to use and require a recompilation of the kernel. SLUB contains debugging code that is always available (but is kept out of the hot code paths). SLUB diagnostics can be enabled via the "slab_debug" option. Parameters can be specified to select a single or a group of slab caches for diagnostics. This means that the system is running with the usual performance and it is much more likely that race conditions can be reproduced. I. Resiliency If basic sanity checks are on then SLUB is capable of detecting common error conditions and recover as best as possible to allow the system to continue. J. Tracing Tracing can be enabled via the slab_debug=T,<slabcache> option during boot. SLUB will then protocol all actions on that slabcache and dump the object contents on free. K. On demand DMA cache creation. Generally DMA caches are not needed. If a kmalloc is used with __GFP_DMA then just create this single slabcache that is needed. For systems that have no ZONE_DMA requirement the support is completely eliminated. L. Performance increase Some benchmarks have shown speed improvements on kernbench in the range of 5-10%. The locking overhead of slub is based on the underlying base allocation size. If we can reliably allocate larger order pages then it is possible to increase slub performance much further. The anti-fragmentation patches may enable further performance increases. Tested on: i386 UP + SMP, x86_64 UP + SMP + NUMA emulation, IA64 NUMA + Simulator SLUB Boot options slub_nomerge Disable merging of slabs slub_min_order=x Require a minimum order for slab caches. This increases the managed chunk size and therefore reduces meta data and locking overhead. slub_min_objects=x Mininum objects per slab. Default is 8. slub_max_order=x Avoid generating slabs larger than order specified. slub_debug Enable all diagnostics for all caches slub_debug=<options> Enable selective options for all caches slub_debug=<o>,<cache> Enable selective options for a certain set of caches Available Debug options F Double Free checking, sanity and resiliency R Red zoning P Object / padding poisoning U Track last free / alloc T Trace all allocs / frees (only use for individual slabs). To use SLUB: Apply this patch and then select SLUB as the default slab allocator. [hugh@veritas.com: fix an oops-causing locking error] [akpm@linux-foundation.org: various stupid cleanups and small fixes] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-06 21:49:36 +00:00
int slab_is_available(void);
struct kmem_cache *kmem_cache_create(const char *, size_t, size_t,
unsigned long,
void (*)(void *));
struct kmem_cache *
kmem_cache_create_memcg(struct mem_cgroup *, const char *, size_t, size_t,
slab: propagate tunable values SLAB allows us to tune a particular cache behavior with tunables. When creating a new memcg cache copy, we'd like to preserve any tunables the parent cache already had. This could be done by an explicit call to do_tune_cpucache() after the cache is created. But this is not very convenient now that the caches are created from common code, since this function is SLAB-specific. Another method of doing that is taking advantage of the fact that do_tune_cpucache() is always called from enable_cpucache(), which is called at cache initialization. We can just preset the values, and then things work as expected. It can also happen that a root cache has its tunables updated during normal system operation. In this case, we will propagate the change to all caches that are already active. This change will require us to move the assignment of root_cache in memcg_params a bit earlier. We need this to be already set - which memcg_kmem_register_cache will do - when we reach __kmem_cache_create() Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18 22:23:03 +00:00
unsigned long, void (*)(void *), struct kmem_cache *);
void kmem_cache_destroy(struct kmem_cache *);
int kmem_cache_shrink(struct kmem_cache *);
void kmem_cache_free(struct kmem_cache *, void *);
/*
* Please use this macro to create slab caches. Simply specify the
* name of the structure and maybe some flags that are listed above.
*
* The alignment of the struct determines object alignment. If you
* f.e. add ____cacheline_aligned_in_smp to the struct declaration
* then the objects will be properly aligned in SMP configurations.
*/
#define KMEM_CACHE(__struct, __flags) kmem_cache_create(#__struct,\
sizeof(struct __struct), __alignof__(struct __struct),\
(__flags), NULL)
/*
* The largest kmalloc size supported by the slab allocators is
* 32 megabyte (2^25) or the maximum allocatable page order if that is
* less than 32 MB.
*
* WARNING: Its not easy to increase this value since the allocators have
* to do various tricks to work around compiler limitations in order to
* ensure proper constant folding.
*/
#define KMALLOC_SHIFT_HIGH ((MAX_ORDER + PAGE_SHIFT - 1) <= 25 ? \
(MAX_ORDER + PAGE_SHIFT - 1) : 25)
#define KMALLOC_MAX_SIZE (1UL << KMALLOC_SHIFT_HIGH)
#define KMALLOC_MAX_ORDER (KMALLOC_SHIFT_HIGH - PAGE_SHIFT)
/*
* Some archs want to perform DMA into kmalloc caches and need a guaranteed
* alignment larger than the alignment of a 64-bit integer.
* Setting ARCH_KMALLOC_MINALIGN in arch headers allows that.
*/
#ifdef ARCH_DMA_MINALIGN
#define ARCH_KMALLOC_MINALIGN ARCH_DMA_MINALIGN
#else
#define ARCH_KMALLOC_MINALIGN __alignof__(unsigned long long)
#endif
/*
* Setting ARCH_SLAB_MINALIGN in arch headers allows a different alignment.
* Intended for arches that get misalignment faults even for 64 bit integer
* aligned buffers.
*/
#ifndef ARCH_SLAB_MINALIGN
#define ARCH_SLAB_MINALIGN __alignof__(unsigned long long)
#endif
/*
* This is the main placeholder for memcg-related information in kmem caches.
* struct kmem_cache will hold a pointer to it, so the memory cost while
* disabled is 1 pointer. The runtime cost while enabled, gets bigger than it
* would otherwise be if that would be bundled in kmem_cache: we'll need an
* extra pointer chase. But the trade off clearly lays in favor of not
* penalizing non-users.
*
* Both the root cache and the child caches will have it. For the root cache,
* this will hold a dynamically allocated array large enough to hold
* information about the currently limited memcgs in the system.
*
* Child caches will hold extra metadata needed for its operation. Fields are:
*
* @memcg: pointer to the memcg this cache belongs to
* @list: list_head for the list of all caches in this memcg
* @root_cache: pointer to the global, root cache, this cache was derived from
* @dead: set to true after the memcg dies; the cache may still be around.
* @nr_pages: number of pages that belongs to this cache.
* @destroy: worker to be called whenever we are ready, or believe we may be
* ready, to destroy this cache.
*/
struct memcg_cache_params {
bool is_root_cache;
union {
struct kmem_cache *memcg_caches[0];
struct {
struct mem_cgroup *memcg;
struct list_head list;
struct kmem_cache *root_cache;
bool dead;
atomic_t nr_pages;
struct work_struct destroy;
};
};
};
int memcg_update_all_caches(int num_memcgs);
struct seq_file;
int cache_show(struct kmem_cache *s, struct seq_file *m);
void print_slabinfo_header(struct seq_file *m);
/*
* Common kmalloc functions provided by all allocators
*/
void * __must_check __krealloc(const void *, size_t, gfp_t);
void * __must_check krealloc(const void *, size_t, gfp_t);
void kfree(const void *);
void kzfree(const void *);
size_t ksize(const void *);
/*
* Allocator specific definitions. These are mainly used to establish optimized
* ways to convert kmalloc() calls to kmem_cache_alloc() invocations by
* selecting the appropriate general cache at compile time.
*
* Allocators must define at least:
*
* kmem_cache_alloc()
* __kmalloc()
* kmalloc()
*
* Those wishing to support NUMA must also define:
*
* kmem_cache_alloc_node()
* kmalloc_node()
*
* See each allocator definition file for additional comments and
* implementation notes.
*/
#ifdef CONFIG_SLUB
#include <linux/slub_def.h>
#elif defined(CONFIG_SLOB)
#include <linux/slob_def.h>
#else
#include <linux/slab_def.h>
#endif
/**
* kmalloc_array - allocate memory for an array.
* @n: number of elements.
* @size: element size.
* @flags: the type of memory to allocate.
*
* The @flags argument may be one of:
*
* %GFP_USER - Allocate memory on behalf of user. May sleep.
*
* %GFP_KERNEL - Allocate normal kernel ram. May sleep.
*
slob: initial NUMA support This adds preliminary NUMA support to SLOB, primarily aimed at systems with small nodes (tested all the way down to a 128kB SRAM block), whether asymmetric or otherwise. We follow the same conventions as SLAB/SLUB, preferring current node placement for new pages, or with explicit placement, if a node has been specified. Presently on UP NUMA this has the side-effect of preferring node#0 allocations (since numa_node_id() == 0, though this could be reworked if we could hand off a pfn to determine node placement), so single-CPU NUMA systems will want to place smaller nodes further out in terms of node id. Once a page has been bound to a node (via explicit node id typing), we only do block allocations from partial free pages that have a matching node id in the page flags. The current implementation does have some scalability problems, in that all partial free pages are tracked in the global freelist (with contention due to the single spinlock). However, these are things that are being reworked for SMP scalability first, while things like per-node freelists can easily be built on top of this sort of functionality once it's been added. More background can be found in: http://marc.info/?l=linux-mm&m=118117916022379&w=2 http://marc.info/?l=linux-mm&m=118170446306199&w=2 http://marc.info/?l=linux-mm&m=118187859420048&w=2 and subsequent threads. Acked-by: Christoph Lameter <clameter@sgi.com> Acked-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org> Acked-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16 06:38:22 +00:00
* %GFP_ATOMIC - Allocation will not sleep. May use emergency pools.
* For example, use this inside interrupt handlers.
*
* %GFP_HIGHUSER - Allocate pages from high memory.
*
* %GFP_NOIO - Do not do any I/O at all while trying to get memory.
*
* %GFP_NOFS - Do not make any fs calls while trying to get memory.
*
slob: initial NUMA support This adds preliminary NUMA support to SLOB, primarily aimed at systems with small nodes (tested all the way down to a 128kB SRAM block), whether asymmetric or otherwise. We follow the same conventions as SLAB/SLUB, preferring current node placement for new pages, or with explicit placement, if a node has been specified. Presently on UP NUMA this has the side-effect of preferring node#0 allocations (since numa_node_id() == 0, though this could be reworked if we could hand off a pfn to determine node placement), so single-CPU NUMA systems will want to place smaller nodes further out in terms of node id. Once a page has been bound to a node (via explicit node id typing), we only do block allocations from partial free pages that have a matching node id in the page flags. The current implementation does have some scalability problems, in that all partial free pages are tracked in the global freelist (with contention due to the single spinlock). However, these are things that are being reworked for SMP scalability first, while things like per-node freelists can easily be built on top of this sort of functionality once it's been added. More background can be found in: http://marc.info/?l=linux-mm&m=118117916022379&w=2 http://marc.info/?l=linux-mm&m=118170446306199&w=2 http://marc.info/?l=linux-mm&m=118187859420048&w=2 and subsequent threads. Acked-by: Christoph Lameter <clameter@sgi.com> Acked-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org> Acked-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16 06:38:22 +00:00
* %GFP_NOWAIT - Allocation will not sleep.
*
* %GFP_THISNODE - Allocate node-local memory only.
*
* %GFP_DMA - Allocation suitable for DMA.
* Should only be used for kmalloc() caches. Otherwise, use a
* slab created with SLAB_DMA.
*
* Also it is possible to set different flags by OR'ing
* in one or more of the following additional @flags:
*
* %__GFP_COLD - Request cache-cold pages instead of
* trying to return cache-warm pages.
*
* %__GFP_HIGH - This allocation has high priority and may use emergency pools.
*
* %__GFP_NOFAIL - Indicate that this allocation is in no way allowed to fail
* (think twice before using).
*
* %__GFP_NORETRY - If memory is not immediately available,
* then give up at once.
*
* %__GFP_NOWARN - If allocation fails, don't issue any warnings.
*
* %__GFP_REPEAT - If allocation fails initially, try once more before failing.
slob: initial NUMA support This adds preliminary NUMA support to SLOB, primarily aimed at systems with small nodes (tested all the way down to a 128kB SRAM block), whether asymmetric or otherwise. We follow the same conventions as SLAB/SLUB, preferring current node placement for new pages, or with explicit placement, if a node has been specified. Presently on UP NUMA this has the side-effect of preferring node#0 allocations (since numa_node_id() == 0, though this could be reworked if we could hand off a pfn to determine node placement), so single-CPU NUMA systems will want to place smaller nodes further out in terms of node id. Once a page has been bound to a node (via explicit node id typing), we only do block allocations from partial free pages that have a matching node id in the page flags. The current implementation does have some scalability problems, in that all partial free pages are tracked in the global freelist (with contention due to the single spinlock). However, these are things that are being reworked for SMP scalability first, while things like per-node freelists can easily be built on top of this sort of functionality once it's been added. More background can be found in: http://marc.info/?l=linux-mm&m=118117916022379&w=2 http://marc.info/?l=linux-mm&m=118170446306199&w=2 http://marc.info/?l=linux-mm&m=118187859420048&w=2 and subsequent threads. Acked-by: Christoph Lameter <clameter@sgi.com> Acked-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org> Acked-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16 06:38:22 +00:00
*
* There are other flags available as well, but these are not intended
* for general use, and so are not documented here. For a full list of
* potential flags, always refer to linux/gfp.h.
*/
static inline void *kmalloc_array(size_t n, size_t size, gfp_t flags)
{
if (size != 0 && n > SIZE_MAX / size)
slob: initial NUMA support This adds preliminary NUMA support to SLOB, primarily aimed at systems with small nodes (tested all the way down to a 128kB SRAM block), whether asymmetric or otherwise. We follow the same conventions as SLAB/SLUB, preferring current node placement for new pages, or with explicit placement, if a node has been specified. Presently on UP NUMA this has the side-effect of preferring node#0 allocations (since numa_node_id() == 0, though this could be reworked if we could hand off a pfn to determine node placement), so single-CPU NUMA systems will want to place smaller nodes further out in terms of node id. Once a page has been bound to a node (via explicit node id typing), we only do block allocations from partial free pages that have a matching node id in the page flags. The current implementation does have some scalability problems, in that all partial free pages are tracked in the global freelist (with contention due to the single spinlock). However, these are things that are being reworked for SMP scalability first, while things like per-node freelists can easily be built on top of this sort of functionality once it's been added. More background can be found in: http://marc.info/?l=linux-mm&m=118117916022379&w=2 http://marc.info/?l=linux-mm&m=118170446306199&w=2 http://marc.info/?l=linux-mm&m=118187859420048&w=2 and subsequent threads. Acked-by: Christoph Lameter <clameter@sgi.com> Acked-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org> Acked-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16 06:38:22 +00:00
return NULL;
return __kmalloc(n * size, flags);
}
/**
* kcalloc - allocate memory for an array. The memory is set to zero.
* @n: number of elements.
* @size: element size.
* @flags: the type of memory to allocate (see kmalloc).
*/
static inline void *kcalloc(size_t n, size_t size, gfp_t flags)
{
return kmalloc_array(n, size, flags | __GFP_ZERO);
}
slob: initial NUMA support This adds preliminary NUMA support to SLOB, primarily aimed at systems with small nodes (tested all the way down to a 128kB SRAM block), whether asymmetric or otherwise. We follow the same conventions as SLAB/SLUB, preferring current node placement for new pages, or with explicit placement, if a node has been specified. Presently on UP NUMA this has the side-effect of preferring node#0 allocations (since numa_node_id() == 0, though this could be reworked if we could hand off a pfn to determine node placement), so single-CPU NUMA systems will want to place smaller nodes further out in terms of node id. Once a page has been bound to a node (via explicit node id typing), we only do block allocations from partial free pages that have a matching node id in the page flags. The current implementation does have some scalability problems, in that all partial free pages are tracked in the global freelist (with contention due to the single spinlock). However, these are things that are being reworked for SMP scalability first, while things like per-node freelists can easily be built on top of this sort of functionality once it's been added. More background can be found in: http://marc.info/?l=linux-mm&m=118117916022379&w=2 http://marc.info/?l=linux-mm&m=118170446306199&w=2 http://marc.info/?l=linux-mm&m=118187859420048&w=2 and subsequent threads. Acked-by: Christoph Lameter <clameter@sgi.com> Acked-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org> Acked-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16 06:38:22 +00:00
#if !defined(CONFIG_NUMA) && !defined(CONFIG_SLOB)
/**
* kmalloc_node - allocate memory from a specific node
* @size: how many bytes of memory are required.
* @flags: the type of memory to allocate (see kcalloc).
* @node: node to allocate from.
*
* kmalloc() for non-local nodes, used to allocate from a specific node
* if available. Equivalent to kmalloc() in the non-NUMA single-node
* case.
*/
static inline void *kmalloc_node(size_t size, gfp_t flags, int node)
{
return kmalloc(size, flags);
}
static inline void *__kmalloc_node(size_t size, gfp_t flags, int node)
{
return __kmalloc(size, flags);
}
slob: initial NUMA support This adds preliminary NUMA support to SLOB, primarily aimed at systems with small nodes (tested all the way down to a 128kB SRAM block), whether asymmetric or otherwise. We follow the same conventions as SLAB/SLUB, preferring current node placement for new pages, or with explicit placement, if a node has been specified. Presently on UP NUMA this has the side-effect of preferring node#0 allocations (since numa_node_id() == 0, though this could be reworked if we could hand off a pfn to determine node placement), so single-CPU NUMA systems will want to place smaller nodes further out in terms of node id. Once a page has been bound to a node (via explicit node id typing), we only do block allocations from partial free pages that have a matching node id in the page flags. The current implementation does have some scalability problems, in that all partial free pages are tracked in the global freelist (with contention due to the single spinlock). However, these are things that are being reworked for SMP scalability first, while things like per-node freelists can easily be built on top of this sort of functionality once it's been added. More background can be found in: http://marc.info/?l=linux-mm&m=118117916022379&w=2 http://marc.info/?l=linux-mm&m=118170446306199&w=2 http://marc.info/?l=linux-mm&m=118187859420048&w=2 and subsequent threads. Acked-by: Christoph Lameter <clameter@sgi.com> Acked-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org> Acked-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16 06:38:22 +00:00
void *kmem_cache_alloc(struct kmem_cache *, gfp_t);
static inline void *kmem_cache_alloc_node(struct kmem_cache *cachep,
gfp_t flags, int node)
{
return kmem_cache_alloc(cachep, flags);
}
#endif /* !CONFIG_NUMA && !CONFIG_SLOB */
/*
* kmalloc_track_caller is a special version of kmalloc that records the
* calling function of the routine calling it for slab leak tracking instead
* of just the calling function (confusing, eh?).
* It's useful when the call to kmalloc comes from a widely-used standard
* allocator where we care about the real place the memory allocation
* request comes from.
*/
#if defined(CONFIG_DEBUG_SLAB) || defined(CONFIG_SLUB) || \
(defined(CONFIG_SLAB) && defined(CONFIG_TRACING)) || \
(defined(CONFIG_SLOB) && defined(CONFIG_TRACING))
extern void *__kmalloc_track_caller(size_t, gfp_t, unsigned long);
#define kmalloc_track_caller(size, flags) \
__kmalloc_track_caller(size, flags, _RET_IP_)
#else
#define kmalloc_track_caller(size, flags) \
__kmalloc(size, flags)
#endif /* DEBUG_SLAB */
[PATCH] add kmalloc_node, inline cleanup The patch makes the following function calls available to allocate memory on a specific node without changing the basic operation of the slab allocator: kmem_cache_alloc_node(kmem_cache_t *cachep, unsigned int flags, int node); kmalloc_node(size_t size, unsigned int flags, int node); in a similar way to the existing node-blind functions: kmem_cache_alloc(kmem_cache_t *cachep, unsigned int flags); kmalloc(size, flags); kmem_cache_alloc_node was changed to pass flags and the node information through the existing layers of the slab allocator (which lead to some minor rearrangements). The functions at the lowest layer (kmem_getpages, cache_grow) are already node aware. Also __alloc_percpu can call kmalloc_node now. Performance measurements (using the pageset localization patch) yields: w/o patches: Tasks jobs/min jti jobs/min/task real cpu 1 484.27 100 484.2736 12.02 1.97 Wed Mar 30 20:50:43 2005 100 25170.83 91 251.7083 23.12 150.10 Wed Mar 30 20:51:06 2005 200 34601.66 84 173.0083 33.64 294.14 Wed Mar 30 20:51:40 2005 300 37154.47 86 123.8482 46.99 436.56 Wed Mar 30 20:52:28 2005 400 39839.82 80 99.5995 58.43 580.46 Wed Mar 30 20:53:27 2005 500 40036.32 79 80.0726 72.68 728.60 Wed Mar 30 20:54:40 2005 600 44074.21 79 73.4570 79.23 872.10 Wed Mar 30 20:55:59 2005 700 44016.60 78 62.8809 92.56 1015.84 Wed Mar 30 20:57:32 2005 800 40411.05 80 50.5138 115.22 1161.13 Wed Mar 30 20:59:28 2005 900 42298.56 79 46.9984 123.83 1303.42 Wed Mar 30 21:01:33 2005 1000 40955.05 80 40.9551 142.11 1441.92 Wed Mar 30 21:03:55 2005 with pageset localization and slab API patches: Tasks jobs/min jti jobs/min/task real cpu 1 484.19 100 484.1930 12.02 1.98 Wed Mar 30 21:10:18 2005 100 27428.25 92 274.2825 21.22 149.79 Wed Mar 30 21:10:40 2005 200 37228.94 86 186.1447 31.27 293.49 Wed Mar 30 21:11:12 2005 300 41725.42 85 139.0847 41.84 434.10 Wed Mar 30 21:11:54 2005 400 43032.22 82 107.5805 54.10 582.06 Wed Mar 30 21:12:48 2005 500 42211.23 83 84.4225 68.94 722.61 Wed Mar 30 21:13:58 2005 600 40084.49 82 66.8075 87.12 873.11 Wed Mar 30 21:15:25 2005 700 44169.30 79 63.0990 92.24 1008.77 Wed Mar 30 21:16:58 2005 800 43097.94 79 53.8724 108.03 1155.88 Wed Mar 30 21:18:47 2005 900 41846.75 79 46.4964 125.17 1303.38 Wed Mar 30 21:20:52 2005 1000 40247.85 79 40.2478 144.60 1442.21 Wed Mar 30 21:23:17 2005 Signed-off-by: Christoph Lameter <christoph@lameter.com> Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01 15:58:38 +00:00
#ifdef CONFIG_NUMA
/*
* kmalloc_node_track_caller is a special version of kmalloc_node that
* records the calling function of the routine calling it for slab leak
* tracking instead of just the calling function (confusing, eh?).
* It's useful when the call to kmalloc_node comes from a widely-used
* standard allocator where we care about the real place the memory
* allocation request comes from.
*/
#if defined(CONFIG_DEBUG_SLAB) || defined(CONFIG_SLUB) || \
(defined(CONFIG_SLAB) && defined(CONFIG_TRACING)) || \
(defined(CONFIG_SLOB) && defined(CONFIG_TRACING))
extern void *__kmalloc_node_track_caller(size_t, gfp_t, int, unsigned long);
#define kmalloc_node_track_caller(size, flags, node) \
__kmalloc_node_track_caller(size, flags, node, \
_RET_IP_)
#else
#define kmalloc_node_track_caller(size, flags, node) \
__kmalloc_node(size, flags, node)
#endif
#else /* CONFIG_NUMA */
#define kmalloc_node_track_caller(size, flags, node) \
kmalloc_track_caller(size, flags)
[PATCH] add kmalloc_node, inline cleanup The patch makes the following function calls available to allocate memory on a specific node without changing the basic operation of the slab allocator: kmem_cache_alloc_node(kmem_cache_t *cachep, unsigned int flags, int node); kmalloc_node(size_t size, unsigned int flags, int node); in a similar way to the existing node-blind functions: kmem_cache_alloc(kmem_cache_t *cachep, unsigned int flags); kmalloc(size, flags); kmem_cache_alloc_node was changed to pass flags and the node information through the existing layers of the slab allocator (which lead to some minor rearrangements). The functions at the lowest layer (kmem_getpages, cache_grow) are already node aware. Also __alloc_percpu can call kmalloc_node now. Performance measurements (using the pageset localization patch) yields: w/o patches: Tasks jobs/min jti jobs/min/task real cpu 1 484.27 100 484.2736 12.02 1.97 Wed Mar 30 20:50:43 2005 100 25170.83 91 251.7083 23.12 150.10 Wed Mar 30 20:51:06 2005 200 34601.66 84 173.0083 33.64 294.14 Wed Mar 30 20:51:40 2005 300 37154.47 86 123.8482 46.99 436.56 Wed Mar 30 20:52:28 2005 400 39839.82 80 99.5995 58.43 580.46 Wed Mar 30 20:53:27 2005 500 40036.32 79 80.0726 72.68 728.60 Wed Mar 30 20:54:40 2005 600 44074.21 79 73.4570 79.23 872.10 Wed Mar 30 20:55:59 2005 700 44016.60 78 62.8809 92.56 1015.84 Wed Mar 30 20:57:32 2005 800 40411.05 80 50.5138 115.22 1161.13 Wed Mar 30 20:59:28 2005 900 42298.56 79 46.9984 123.83 1303.42 Wed Mar 30 21:01:33 2005 1000 40955.05 80 40.9551 142.11 1441.92 Wed Mar 30 21:03:55 2005 with pageset localization and slab API patches: Tasks jobs/min jti jobs/min/task real cpu 1 484.19 100 484.1930 12.02 1.98 Wed Mar 30 21:10:18 2005 100 27428.25 92 274.2825 21.22 149.79 Wed Mar 30 21:10:40 2005 200 37228.94 86 186.1447 31.27 293.49 Wed Mar 30 21:11:12 2005 300 41725.42 85 139.0847 41.84 434.10 Wed Mar 30 21:11:54 2005 400 43032.22 82 107.5805 54.10 582.06 Wed Mar 30 21:12:48 2005 500 42211.23 83 84.4225 68.94 722.61 Wed Mar 30 21:13:58 2005 600 40084.49 82 66.8075 87.12 873.11 Wed Mar 30 21:15:25 2005 700 44169.30 79 63.0990 92.24 1008.77 Wed Mar 30 21:16:58 2005 800 43097.94 79 53.8724 108.03 1155.88 Wed Mar 30 21:18:47 2005 900 41846.75 79 46.4964 125.17 1303.38 Wed Mar 30 21:20:52 2005 1000 40247.85 79 40.2478 144.60 1442.21 Wed Mar 30 21:23:17 2005 Signed-off-by: Christoph Lameter <christoph@lameter.com> Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01 15:58:38 +00:00
#endif /* CONFIG_NUMA */
[PATCH] slob: introduce the SLOB allocator configurable replacement for slab allocator This adds a CONFIG_SLAB option under CONFIG_EMBEDDED. When CONFIG_SLAB is disabled, the kernel falls back to using the 'SLOB' allocator. SLOB is a traditional K&R/UNIX allocator with a SLAB emulation layer, similar to the original Linux kmalloc allocator that SLAB replaced. It's signicantly smaller code and is more memory efficient. But like all similar allocators, it scales poorly and suffers from fragmentation more than SLAB, so it's only appropriate for small systems. It's been tested extensively in the Linux-tiny tree. I've also stress-tested it with make -j 8 compiles on a 3G SMP+PREEMPT box (not recommended). Here's a comparison for otherwise identical builds, showing SLOB saving nearly half a megabyte of RAM: $ size vmlinux* text data bss dec hex filename 3336372 529360 190812 4056544 3de5e0 vmlinux-slab 3323208 527948 190684 4041840 3dac70 vmlinux-slob $ size mm/{slab,slob}.o text data bss dec hex filename 13221 752 48 14021 36c5 mm/slab.o 1896 52 8 1956 7a4 mm/slob.o /proc/meminfo: SLAB SLOB delta MemTotal: 27964 kB 27980 kB +16 kB MemFree: 24596 kB 25092 kB +496 kB Buffers: 36 kB 36 kB 0 kB Cached: 1188 kB 1188 kB 0 kB SwapCached: 0 kB 0 kB 0 kB Active: 608 kB 600 kB -8 kB Inactive: 808 kB 812 kB +4 kB HighTotal: 0 kB 0 kB 0 kB HighFree: 0 kB 0 kB 0 kB LowTotal: 27964 kB 27980 kB +16 kB LowFree: 24596 kB 25092 kB +496 kB SwapTotal: 0 kB 0 kB 0 kB SwapFree: 0 kB 0 kB 0 kB Dirty: 4 kB 12 kB +8 kB Writeback: 0 kB 0 kB 0 kB Mapped: 560 kB 556 kB -4 kB Slab: 1756 kB 0 kB -1756 kB CommitLimit: 13980 kB 13988 kB +8 kB Committed_AS: 4208 kB 4208 kB 0 kB PageTables: 28 kB 28 kB 0 kB VmallocTotal: 1007312 kB 1007312 kB 0 kB VmallocUsed: 48 kB 48 kB 0 kB VmallocChunk: 1007264 kB 1007264 kB 0 kB (this work has been sponsored in part by CELF) From: Ingo Molnar <mingo@elte.hu> Fix 32-bitness bugs in mm/slob.c. Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 09:01:45 +00:00
/*
* Shortcuts
*/
static inline void *kmem_cache_zalloc(struct kmem_cache *k, gfp_t flags)
{
return kmem_cache_alloc(k, flags | __GFP_ZERO);
}
/**
* kzalloc - allocate memory. The memory is set to zero.
* @size: how many bytes of memory are required.
* @flags: the type of memory to allocate (see kmalloc).
*/
static inline void *kzalloc(size_t size, gfp_t flags)
{
return kmalloc(size, flags | __GFP_ZERO);
}
/**
* kzalloc_node - allocate zeroed memory from a particular memory node.
* @size: how many bytes of memory are required.
* @flags: the type of memory to allocate (see kmalloc).
* @node: memory node from which to allocate
*/
static inline void *kzalloc_node(size_t size, gfp_t flags, int node)
{
return kmalloc_node(size, flags | __GFP_ZERO, node);
}
/*
* Determine the size of a slab object
*/
static inline unsigned int kmem_cache_size(struct kmem_cache *s)
{
return s->object_size;
}
void __init kmem_cache_init_late(void);
#endif /* _LINUX_SLAB_H */