dect
/
linux-2.6
Archived
13
0
Fork 0
This repository has been archived on 2022-02-17. You can view files and clone it, but cannot push or open issues or pull requests.
linux-2.6/mm
KAMEZAWA Hiroyuki f75ca96203 memcg: avoid css_get()
Now, memory cgroup increments css(cgroup subsys state)'s reference count
per a charged page.  And the reference count is kept until the page is
uncharged.  But this has 2 bad effect.

 1. Because css_get/put calls atomic_inc()/dec, heavy call of them
    on large smp will not scale well.
 2. Because css's refcnt cannot be in a state as "ready-to-release",
    cgroup's notify_on_release handler can't work with memcg.
 3. css's refcnt is atomic_t, it means smaller than 32bit. Maybe too small.

This has been a problem since the 1st merge of memcg.

This is a trial to remove css's refcnt per a page. Even if we remove
refcnt, pre_destroy() does enough synchronization as
  - check res->usage == 0.
  - check no pages on LRU.

This patch removes css's refcnt per page.  Even after this patch, at the
1st look, it seems css_get() is still called in try_charge().

But the logic is.

  - If a memcg of mm->owner is cached one, consume_stock() will work.
    At success, return immediately.
  - If consume_stock returns false, css_get() is called and go to
    slow path which may be blocked. At the end of slow path,
    css_put() is called and restart from the start if necessary.

So, in the fast path, we don't call css_get() and can avoid access to
shared counter. This patch can make the most possible case fast.

Here is a result of multi-threaded page fault benchmark.

[Before]
    25.32%  multi-fault-all  [kernel.kallsyms]      [k] clear_page_c
     9.30%  multi-fault-all  [kernel.kallsyms]      [k] _raw_spin_lock_irqsave
     8.02%  multi-fault-all  [kernel.kallsyms]      [k] try_get_mem_cgroup_from_mm <=====(*)
     7.83%  multi-fault-all  [kernel.kallsyms]      [k] down_read_trylock
     5.38%  multi-fault-all  [kernel.kallsyms]      [k] __css_put
     5.29%  multi-fault-all  [kernel.kallsyms]      [k] __alloc_pages_nodemask
     4.92%  multi-fault-all  [kernel.kallsyms]      [k] _raw_spin_lock_irq
     4.24%  multi-fault-all  [kernel.kallsyms]      [k] up_read
     3.53%  multi-fault-all  [kernel.kallsyms]      [k] css_put
     2.11%  multi-fault-all  [kernel.kallsyms]      [k] handle_mm_fault
     1.76%  multi-fault-all  [kernel.kallsyms]      [k] __rmqueue
     1.64%  multi-fault-all  [kernel.kallsyms]      [k] __mem_cgroup_commit_charge

[After]
    28.41%  multi-fault-all  [kernel.kallsyms]      [k] clear_page_c
    10.08%  multi-fault-all  [kernel.kallsyms]      [k] _raw_spin_lock_irq
     9.58%  multi-fault-all  [kernel.kallsyms]      [k] down_read_trylock
     9.38%  multi-fault-all  [kernel.kallsyms]      [k] _raw_spin_lock_irqsave
     5.86%  multi-fault-all  [kernel.kallsyms]      [k] __alloc_pages_nodemask
     5.65%  multi-fault-all  [kernel.kallsyms]      [k] up_read
     2.82%  multi-fault-all  [kernel.kallsyms]      [k] handle_mm_fault
     2.64%  multi-fault-all  [kernel.kallsyms]      [k] mem_cgroup_add_lru_list
     2.48%  multi-fault-all  [kernel.kallsyms]      [k] __mem_cgroup_commit_charge

Then, 8.02% of try_get_mem_cgroup_from_mm() disappears because this patch
removes css_tryget() in it. (But yes, this is an extreme case.)

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11 08:59:19 -07:00
..
Kconfig lmb: rename to memblock 2010-07-14 17:14:00 +10:00
Kconfig.debug trivial: improve help text for mm debug config options 2009-09-21 15:14:57 +02:00
Makefile lmb: rename to memblock 2010-07-14 17:14:00 +10:00
backing-dev.c writeback: fix bad _bh spinlock nesting 2010-08-07 18:53:57 +02:00
bootmem.c x86,nobootmem: make alloc_bootmem_node fall back to other node when 32bit numa is used 2010-07-20 16:25:40 -07:00
bounce.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
compaction.c mm: compaction: add a tunable that decides when memory should be compacted and when it should be reclaimed 2010-05-25 08:06:59 -07:00
debug-pagealloc.c generic debug pagealloc 2009-04-01 08:59:13 -07:00
dmapool.c dmapools: protect page_list walk in show_pools() 2009-06-30 18:56:00 -07:00
fadvise.c readahead: introduce FMODE_RANDOM for POSIX_FADV_RANDOM 2010-03-06 11:26:25 -08:00
failslab.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
filemap.c gcc-4.6: mm: fix unused but set warnings 2010-08-09 20:44:58 -07:00
filemap_xip.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
fremap.c mm: clean up mm_counter 2010-03-06 11:26:23 -08:00
highmem.c mm,kdb,kgdb: Add a debug reference for the kdb kmap usage 2010-08-05 09:22:24 -05:00
hugetlb.c hugetlb: call mmu notifiers on hugepage cow 2010-08-09 20:44:54 -07:00
hwpoison-inject.c HWPOISON: Don't do early filtering if filter is disabled 2009-12-16 12:20:01 +01:00
init-mm.c mm: provide init_mm mm_context initializer 2010-08-09 20:44:54 -07:00
internal.h HWPOISON: add an interface to switch off/on all the page filters 2009-12-16 12:19:59 +01:00
kmemcheck.c kmemcheck: Fix build errors due to missing slab.h 2010-03-30 22:02:32 +09:00
kmemleak-test.c percpu: clean up percpu variable definitions 2009-06-24 15:13:48 +09:00
kmemleak.c kmemleak: Fix typo in the comment 2010-08-08 21:57:23 +01:00
ksm.c ksm: cleanup for mm_slots_hash 2010-08-09 20:45:03 -07:00
maccess.c maccess,probe_kernel: Allow arch specific override probe_kernel_(read|write) 2010-01-07 11:58:36 -06:00
madvise.c HWPOISON: Add a madvise() injector for soft page offlining 2009-12-16 12:20:00 +01:00
memblock.c memblock: Fix memblock_is_region_reserved() to return a boolean 2010-08-09 11:21:38 +10:00
memcontrol.c memcg: avoid css_get() 2010-08-11 08:59:19 -07:00
memory-failure.c KVM: Fix a race condition for usage of is_hwpoison_address() 2010-08-01 10:47:11 +03:00
memory.c mmu-notifiers: remove mmu notifier calls in apply_to_page_range() 2010-08-09 20:45:03 -07:00
memory_hotplug.c mem-hotplug: fix potential race while building zonelist for new populated zone 2010-05-25 08:07:02 -07:00
mempolicy.c mempolicy: reduce stack size of migrate_pages() 2010-08-09 20:44:58 -07:00
mempool.c mm: remove broken 'kzalloc' mempool 2009-09-22 07:17:35 -07:00
migrate.c mm: extend KSM refcounts to the anon_vma root 2010-08-09 20:44:55 -07:00
mincore.c mincore: do nested page table walks 2010-05-25 08:06:58 -07:00
mlock.c x86, perf, bts, mm: Delete the never used BTS-ptrace code 2010-03-26 11:33:55 +01:00
mm_init.c
mmap.c mmap: remove unnecessary lock from __vma_link 2010-08-09 20:44:58 -07:00
mmu_context.c exit: fix oops in sync_mm_rss 2010-03-24 16:31:21 -07:00
mmu_notifier.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
mmzone.c [ARM] Double check memmap is actually valid with a memmap has unexpected holes V2 2009-05-18 11:22:24 +01:00
mprotect.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
mremap.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
msync.c sanitize vfs_fsync calling conventions 2010-05-21 18:31:21 -04:00
nommu.c nommu: allow private mappings of read-only devices 2010-05-26 08:19:23 -07:00
oom_kill.c memcg: use find_lock_task_mm() in memory cgroups oom 2010-08-11 08:59:19 -07:00
page-writeback.c Merge branch 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block 2010-08-10 15:22:42 -07:00
page_alloc.c vmscan: kill prev_priority completely 2010-08-09 20:45:00 -07:00
page_cgroup.c kmemleak: Annotate false positive in init_section_page_cgroup() 2010-07-19 11:54:14 +01:00
page_io.c block: unify flags for struct bio and struct request 2010-08-07 18:20:39 +02:00
page_isolation.c
pagewalk.c pagemap: fix pfn calculation for hugepage 2010-04-07 08:38:04 -07:00
percpu-km.c percpu: implement kernel memory based chunk allocation 2010-05-01 08:30:50 +02:00
percpu-vm.c percpu: move vmalloc based chunk management into percpu-vm.c 2010-05-01 08:30:50 +02:00
percpu.c percpu: allow limited allocation before slab is online 2010-06-27 18:50:00 +02:00
percpu_up.c percpu: don't implicitly include slab.h from percpu.h 2010-03-30 22:02:32 +09:00
prio_tree.c
quicklist.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
readahead.c readahead.c: fix comment 2010-05-25 08:07:00 -07:00
rmap.c rmap: add exclusive page to private anon_vma on swapin 2010-08-09 20:45:02 -07:00
shmem.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2010-08-10 11:26:52 -07:00
slab.c gcc-4.6: mm: fix unused but set warnings 2010-08-09 20:44:58 -07:00
slob.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6 2010-08-06 11:44:08 -07:00
slub.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6 2010-08-06 11:44:08 -07:00
sparse-vmemmap.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
sparse.c sparsemem: on no vmemmap path put mem_map on node high too 2010-05-25 08:06:56 -07:00
swap.c mm: export lru_cache_add_*() to modules 2010-05-25 15:06:06 +02:00
swap_state.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
swapfile.c hibernation: freeze swap at hibernation 2010-08-09 20:45:04 -07:00
thrash.c mm: pass mm to grab_swap_token 2009-06-23 12:50:05 -07:00
truncate.c check ATTR_SIZE contraints in inode_change_ok 2010-08-09 16:47:39 -04:00
util.c mm: use memdup_user 2010-08-09 20:44:54 -07:00
vmalloc.c mm/vmalloc.c: check kmalloc() return value 2010-08-09 20:45:03 -07:00
vmscan.c vmscan: raise the bar to PAGEOUT_IO_SYNC stalls 2010-08-09 20:45:03 -07:00
vmstat.c vmscan: kill prev_priority completely 2010-08-09 20:45:00 -07:00