[055/262] memcg: flush stats only if updated

Message ID

20211105203731.uHUWGR8SE%akpm@linux-foundation.org (mailing list archive)

State

New

Headers

DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 734E461242
Date: Fri, 05 Nov 2021 13:37:31 -0700
From: Andrew Morton <akpm@linux-foundation.org>
To: akpm@linux-foundation.org, hannes@cmpxchg.org, linux-mm@kvack.org,
 mhocko@kernel.org, mkoutny@suse.com, mm-commits@vger.kernel.org,
 shakeelb@google.com, torvalds@linux-foundation.org
Subject: [patch 055/262] memcg: flush stats only if updated
Message-ID: <20211105203731.uHUWGR8SE%akpm@linux-foundation.org>
In-Reply-To: <20211105133408.cccbb98b71a77d5e8430aba1@linux-foundation.org>
User-Agent: s-nail v14.8.16
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Commit Message

Andrew Morton Nov. 5, 2021, 8:37 p.m. UTC

From: Shakeel Butt <shakeelb@google.com>
Subject: memcg: flush stats only if updated

At the moment, the kernel flushes the memcg stats on every refault and
also on every reclaim iteration.  Although rstat maintains a per-cpu
update tree, on a flush the kernel still has to walk the rstat update tree
of every cpu to check whether there is anything to flush.  This patch adds
tracking on the stats update side so that the flush side can skip the
flush when there has been no update.

The stats update codepath is very performance sensitive for many
workloads and benchmarks.  So we cannot follow what commit aa48e47e3906
("memcg: infrastructure to flush memcg stats") did, which was to trigger
an async flush through queue_work(); that caused many performance
regression reports and was reverted by commit 1f828223b799 ("memcg: flush
lruvec stats in the refault").

In this patch we keep the stats update codepath minimal and let the
stats reader side flush the stats only when the updates exceed a specific
threshold.  For now the threshold is (nr_cpus * MEMCG_CHARGE_BATCH).
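
The update-side counting and reader-side threshold described above can be
modelled in userspace.  What follows is a minimal sketch, not the kernel
code: NR_CPUS, CHARGE_BATCH and every model_* name are hypothetical
stand-ins chosen for illustration, the per-cpu counter is a plain array
indexed by a caller-supplied cpu number, and the batch size of 32 is an
assumption.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 4         /* assumed cpu count for this model */
#define CHARGE_BATCH 32   /* stand-in for the kernel batch constant; value assumed */

static unsigned int stats_updates[NR_CPUS];  /* models the per-cpu update counter */
static atomic_int stats_flush_threshold;     /* models the shared atomic counter */
static int flushes_done;                     /* number of simulated flushes */

/*
 * Update side: one cheap per-cpu increment.  The shared atomic is touched
 * only once per CHARGE_BATCH updates made on the same cpu.
 */
static void model_rstat_updated(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (!(++stats_updates[cpu] % CHARGE_BATCH))
		atomic_fetch_add(&stats_flush_threshold, 1);
}

/*
 * Reader side: flush only when more than NR_CPUS batches have accumulated.
 * Returns true if a flush was performed.
 */
static bool model_flush_stats(void)
{
	if (atomic_load(&stats_flush_threshold) <= NR_CPUS)
		return false;

	flushes_done++;                           /* the expensive walk would go here */
	atomic_store(&stats_flush_threshold, 0);  /* start counting batches afresh */
	return true;
}
```

In this model, 159 updates on one cpu leave the shared counter at 4, so
model_flush_stats() returns false.  The 160th update raises it to 5, which
exceeds NR_CPUS, and the next call flushes and resets the counter.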

To evaluate the impact of this patch, an 8 GiB tmpfs file was created on a
system with swap-on-zram and pushed to swap through the memory.force_empty
interface.  Reading the whole file triggers the memcg stat flush in the
refault code path.  With this patch, we observed a 63% reduction in the
read time of the 8 GiB file.

Link: https://lkml.kernel.org/r/20211001190040.48086-1-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Reviewed-by: "Michal Koutný" <mkoutny@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memcontrol.c |   78 ++++++++++++++++++++++++++++++++--------------
 1 file changed, 55 insertions(+), 23 deletions(-)

--- a/mm/memcontrol.c~memcg-flush-stats-only-if-updated
+++ a/mm/memcontrol.c
@@ -103,11 +103,6 @@  static bool do_memsw_account(void)
 	return !cgroup_subsys_on_dfl(memory_cgrp_subsys) && !cgroup_memory_noswap;
 }
 
-/* memcg and lruvec stats flushing */
-static void flush_memcg_stats_dwork(struct work_struct *w);
-static DECLARE_DEFERRABLE_WORK(stats_flush_dwork, flush_memcg_stats_dwork);
-static DEFINE_SPINLOCK(stats_flush_lock);
-
 #define THRESHOLDS_EVENTS_TARGET 128
 #define SOFTLIMIT_EVENTS_TARGET 1024
 
@@ -635,6 +630,56 @@  mem_cgroup_largest_soft_limit_node(struc
 	return mz;
 }
 
+/*
+ * memcg and lruvec stats flushing
+ *
+ * Many codepaths leading to stats update or read are performance sensitive and
+ * adding stats flushing in such codepaths is not desirable. So, to optimize the
+ * flushing the kernel does:
+ *
+ * 1) Periodically and asynchronously flush the stats every 2 seconds to not let
+ *    rstat update tree grow unbounded.
+ *
+ * 2) Flush the stats synchronously on reader side only when there are more than
+ *    (MEMCG_CHARGE_BATCH * nr_cpus) update events. This optimization lets
+ *    the stats be out of sync by at most (MEMCG_CHARGE_BATCH * nr_cpus), but
+ *    only for 2 seconds due to (1).
+ */
+static void flush_memcg_stats_dwork(struct work_struct *w);
+static DECLARE_DEFERRABLE_WORK(stats_flush_dwork, flush_memcg_stats_dwork);
+static DEFINE_SPINLOCK(stats_flush_lock);
+static DEFINE_PER_CPU(unsigned int, stats_updates);
+static atomic_t stats_flush_threshold = ATOMIC_INIT(0);
+
+static inline void memcg_rstat_updated(struct mem_cgroup *memcg)
+{
+	cgroup_rstat_updated(memcg->css.cgroup, smp_processor_id());
+	if (!(__this_cpu_inc_return(stats_updates) % MEMCG_CHARGE_BATCH))
+		atomic_inc(&stats_flush_threshold);
+}
+
+static void __mem_cgroup_flush_stats(void)
+{
+	if (!spin_trylock(&stats_flush_lock))
+		return;
+
+	cgroup_rstat_flush_irqsafe(root_mem_cgroup->css.cgroup);
+	atomic_set(&stats_flush_threshold, 0);
+	spin_unlock(&stats_flush_lock);
+}
+
+void mem_cgroup_flush_stats(void)
+{
+	if (atomic_read(&stats_flush_threshold) > num_online_cpus())
+		__mem_cgroup_flush_stats();
+}
+
+static void flush_memcg_stats_dwork(struct work_struct *w)
+{
+	mem_cgroup_flush_stats();
+	queue_delayed_work(system_unbound_wq, &stats_flush_dwork, 2UL*HZ);
+}
+
 /**
  * __mod_memcg_state - update cgroup memory statistics
  * @memcg: the memory cgroup
@@ -647,7 +692,7 @@  void __mod_memcg_state(struct mem_cgroup
 		return;
 
 	__this_cpu_add(memcg->vmstats_percpu->state[idx], val);
-	cgroup_rstat_updated(memcg->css.cgroup, smp_processor_id());
+	memcg_rstat_updated(memcg);
 }
 
 /* idx can be of type enum memcg_stat_item or node_stat_item. */
@@ -675,10 +720,12 @@  void __mod_memcg_lruvec_state(struct lru
 	memcg = pn->memcg;
 
 	/* Update memcg */
-	__mod_memcg_state(memcg, idx, val);
+	__this_cpu_add(memcg->vmstats_percpu->state[idx], val);
 
 	/* Update lruvec */
 	__this_cpu_add(pn->lruvec_stats_percpu->state[idx], val);
+
+	memcg_rstat_updated(memcg);
 }
 
 /**
@@ -780,7 +827,7 @@  void __count_memcg_events(struct mem_cgr
 		return;
 
 	__this_cpu_add(memcg->vmstats_percpu->events[idx], count);
-	cgroup_rstat_updated(memcg->css.cgroup, smp_processor_id());
+	memcg_rstat_updated(memcg);
 }
 
 static unsigned long memcg_events(struct mem_cgroup *memcg, int event)
@@ -5341,21 +5388,6 @@  static void mem_cgroup_css_reset(struct
 	memcg_wb_domain_size_changed(memcg);
 }
 
-void mem_cgroup_flush_stats(void)
-{
-	if (!spin_trylock(&stats_flush_lock))
-		return;
-
-	cgroup_rstat_flush_irqsafe(root_mem_cgroup->css.cgroup);
-	spin_unlock(&stats_flush_lock);
-}
-
-static void flush_memcg_stats_dwork(struct work_struct *w)
-{
-	mem_cgroup_flush_stats();
-	queue_delayed_work(system_unbound_wq, &stats_flush_dwork, 2UL*HZ);
-}
-
 static void mem_cgroup_css_rstat_flush(struct cgroup_subsys_state *css, int cpu)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
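
The flush path added by this patch takes stats_flush_lock with
spin_trylock(), so a caller that finds the lock held returns without
waiting, and the lock holder resets stats_flush_threshold after the walk.
A minimal single-threaded userspace sketch of that pattern follows.  It
uses a C11 atomic_flag in place of the kernel spinlock, and all model_*
names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag model_flush_lock = ATOMIC_FLAG_INIT; /* stands in for stats_flush_lock */
static atomic_int model_threshold;                      /* stands in for stats_flush_threshold */
static int model_walks;                                 /* number of simulated tree walks */

/* Try to flush; return false at once if another flusher holds the lock. */
static bool model_try_flush(void)
{
	if (atomic_flag_test_and_set(&model_flush_lock))
		return false;                  /* contended: skip, do not wait */

	model_walks++;                         /* the expensive rstat walk would go here */
	atomic_store(&model_threshold, 0);     /* pending updates are now accounted for */
	atomic_flag_clear(&model_flush_lock);
	return true;
}

/* Exercises the contended and the uncontended case on a single thread. */
static void model_demo(void)
{
	atomic_store(&model_threshold, 7);

	/* Pretend another flusher is mid-walk by holding the lock. */
	bool was_held = atomic_flag_test_and_set(&model_flush_lock);
	assert(!was_held);
	assert(!model_try_flush());            /* caller skips instead of spinning */
	assert(atomic_load(&model_threshold) == 7);
	atomic_flag_clear(&model_flush_lock);

	assert(model_try_flush());             /* lock free: walk runs, counter resets */
	assert(atomic_load(&model_threshold) == 0);
}
```

Skipping on contention means a reader may proceed with slightly stale
stats while another flush is in progress, rather than queueing behind it.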
