Message ID | 20220603070423.10025-1-zhengqi.arch@bytedance.com (mailing list archive) |
---|---|
State | New |
Series | mm: memcontrol: separate {pgscan,pgsteal}_{kswapd,direct} items in memory.stat of cgroup v2 |
On Fri, Jun 03, 2022 at 03:04:23PM +0800, Qi Zheng wrote:
> There are already statistics of {pgscan,pgsteal}_kswapd and
> {pgscan,pgsteal}_direct of memcg event here, but now the sum
> of the two is displayed in memory.stat of cgroup v2.
>
> In order to obtain more accurate information during monitoring
> and debugging, and to align with the display in /proc/vmstat,
> it is better to display {pgscan,pgsteal}_kswapd and
> {pgscan,pgsteal}_direct separately.
>
> Moreover, after this modification, all memcg events can be
> printed with a combination of vm_event_name() and memcg_events().
> This allows us to create an array to traverse and print, which
> reduces redundant seq_buf_printf() code.
>
> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>

Sounds good to me. We initially didn't do it because /proc/vmstat
has the breakdown to understand global reclaim behavior, and cgroup
reclaim doesn't have a kswapd. But it's nice to stay consistent, it's
helpful to understand if certain cgroups have a higher share of direct
global reclaim (GFP_TRANSHUGE* for example), and we very much do want
kswapd per cgroup down the line (we've had it in production for ages).

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

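To make the "share of direct reclaim" idea concrete, here is a minimal user-space sketch. It assumes the two counters have already been sampled from the split pgscan_kswapd and pgscan_direct fields. The struct and function names are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for two counters sampled from memory.stat. */
struct reclaim_counts {
	unsigned long long kswapd;	/* e.g. pgscan_kswapd */
	unsigned long long direct;	/* e.g. pgscan_direct */
};

/*
 * Fraction of scanned pages attributed to direct reclaim, in [0, 1].
 * Returns 0.0 when no pages were scanned at all.
 */
static double direct_share(const struct reclaim_counts *c)
{
	double total;

	assert(c != NULL);
	/* Sum in floating point so that large counters cannot wrap. */
	total = (double)c->kswapd + (double)c->direct;
	if (total == 0.0)
		return 0.0;
	return (double)c->direct / total;
}
```

With kswapd = 300 and direct = 100, the share is 100 / 400 = 0.25. The old combined pgscan field could not provide this ratio.
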
On Fri, Jun 03, 2022 at 03:04:23PM +0800, Qi Zheng wrote:
> There are already statistics of {pgscan,pgsteal}_kswapd and
> {pgscan,pgsteal}_direct of memcg event here, but now the sum
> of the two is displayed in memory.stat of cgroup v2.
>
> In order to obtain more accurate information during monitoring
> and debugging, and to align with the display in /proc/vmstat,
> it is better to display {pgscan,pgsteal}_kswapd and
> {pgscan,pgsteal}_direct separately.
>
> Moreover, after this modification, all memcg events can be
> printed with a combination of vm_event_name() and memcg_events().
> This allows us to create an array to traverse and print, which
> reduces redundant seq_buf_printf() code.
>
> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>

Acked-by: Roman Gushchin <roman.gushchin@linux.dev>

Thanks!

On Fri, Jun 3, 2022 at 12:06 AM Qi Zheng <zhengqi.arch@bytedance.com> wrote:
>
[...]
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 176298f2f4de..0b9ca7e7df34 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1442,11 +1442,17 @@ PAGE_SIZE multiple when read back.
>  	  pgrefill (npn)
>  		Amount of scanned pages (in an active LRU list)
>
> -	  pgscan (npn)
> -		Amount of scanned pages (in an inactive LRU list)
> +	  pgscan_kswapd (npn)
> +		Amount of scanned pages by kswapd (in an inactive LRU list)
>
> -	  pgsteal (npn)
> -		Amount of reclaimed pages
> +	  pgscan_direct (npn)
> +		Amount of scanned pages directly (in an inactive LRU list)
> +
> +	  pgsteal_kswapd (npn)
> +		Amount of reclaimed pages by kswapd
> +
> +	  pgsteal_direct (npn)
> +		Amount of reclaimed pages directly

No objection to adding new fields but removing 'pgsteal' and 'pgscan'
from the user visible API might break some applications.

On 2022/6/4 8:47 AM, Shakeel Butt wrote:
> On Fri, Jun 3, 2022 at 12:06 AM Qi Zheng <zhengqi.arch@bytedance.com> wrote:
>>
> [...]
>>
>> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
>> index 176298f2f4de..0b9ca7e7df34 100644
>> --- a/Documentation/admin-guide/cgroup-v2.rst
>> +++ b/Documentation/admin-guide/cgroup-v2.rst
>> @@ -1442,11 +1442,17 @@ PAGE_SIZE multiple when read back.
>>  	  pgrefill (npn)
>>  		Amount of scanned pages (in an active LRU list)
>>
>> -	  pgscan (npn)
>> -		Amount of scanned pages (in an inactive LRU list)
>> +	  pgscan_kswapd (npn)
>> +		Amount of scanned pages by kswapd (in an inactive LRU list)
>>
>> -	  pgsteal (npn)
>> -		Amount of reclaimed pages
>> +	  pgscan_direct (npn)
>> +		Amount of scanned pages directly (in an inactive LRU list)
>> +
>> +	  pgsteal_kswapd (npn)
>> +		Amount of reclaimed pages by kswapd
>> +
>> +	  pgsteal_direct (npn)
>> +		Amount of reclaimed pages directly
>
> No objection to adding new fields but removing 'pgsteal' and 'pgscan'
> from the user visible API might break some applications.

Oh, got it. So do we need to keep the pgscan and pgsteal fields?
If so, I can add them back in v2.

Thanks,
Qi

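To illustrate the compatibility concern, the following is a rough sketch of how a user-space consumer might tolerate both layouts. It assumes the input is text in the memory.stat "name value" format, one pair per line. The helper names stat_lookup() and total_pgscan() are hypothetical. The sketch prefers the legacy aggregate when it is present and otherwise sums the split fields.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Find "key value" in memory.stat-style text (one pair per line).
 * Returns 1 and stores the value on success, 0 if the key is absent.
 */
static int stat_lookup(const char *text, const char *key,
		       unsigned long long *val)
{
	size_t klen = strlen(key);
	const char *p = text;

	assert(text != NULL && val != NULL);
	while (*p) {
		/* p is at a line start; require the full key plus a space. */
		if (strncmp(p, key, klen) == 0 && p[klen] == ' ')
			return sscanf(p + klen + 1, "%llu", val) == 1;
		p = strchr(p, '\n');
		if (!p)
			break;
		p++;
	}
	return 0;
}

/*
 * Total scanned pages: use the legacy "pgscan" line if it exists,
 * otherwise fall back to pgscan_kswapd + pgscan_direct.
 * Returns 0 if neither form is found.
 */
static int total_pgscan(const char *text, unsigned long long *total)
{
	unsigned long long kswapd, direct;

	if (stat_lookup(text, "pgscan", total))
		return 1;
	if (stat_lookup(text, "pgscan_kswapd", &kswapd) &&
	    stat_lookup(text, "pgscan_direct", &direct)) {
		*total = kswapd + direct;
		return 1;
	}
	return 0;
}
```

A parser that only looks for the exact key "pgscan" would find nothing once the field is removed, which is the kind of breakage described above.
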
On Fri, Jun 3, 2022 at 3:06 PM Qi Zheng <zhengqi.arch@bytedance.com> wrote:
>
> There are already statistics of {pgscan,pgsteal}_kswapd and
> {pgscan,pgsteal}_direct of memcg event here, but now the sum
> of the two is displayed in memory.stat of cgroup v2.
>
> In order to obtain more accurate information during monitoring
> and debugging, and to align with the display in /proc/vmstat,
> it is better to display {pgscan,pgsteal}_kswapd and
> {pgscan,pgsteal}_direct separately.
>
> Moreover, after this modification, all memcg events can be
> printed with a combination of vm_event_name() and memcg_events().
> This allows us to create an array to traverse and print, which
> reduces redundant seq_buf_printf() code.
>
> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>

With Shakeel's changes.

Acked-by: Muchun Song <songmuchun@bytedance.com>

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 176298f2f4de..0b9ca7e7df34 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1442,11 +1442,17 @@ PAGE_SIZE multiple when read back.
 	  pgrefill (npn)
 		Amount of scanned pages (in an active LRU list)
 
-	  pgscan (npn)
-		Amount of scanned pages (in an inactive LRU list)
+	  pgscan_kswapd (npn)
+		Amount of scanned pages by kswapd (in an inactive LRU list)
 
-	  pgsteal (npn)
-		Amount of reclaimed pages
+	  pgscan_direct (npn)
+		Amount of scanned pages directly (in an inactive LRU list)
+
+	  pgsteal_kswapd (npn)
+		Amount of reclaimed pages by kswapd
+
+	  pgsteal_direct (npn)
+		Amount of reclaimed pages directly
 
 	  pgactivate (npn)
 		Amount of pages moved to the active LRU list
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 0d3fe0a0c75a..4093062c5c9b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1460,6 +1460,28 @@ static inline unsigned long memcg_page_state_output(struct mem_cgroup *memcg,
 	return memcg_page_state(memcg, item) * memcg_page_state_unit(item);
 }
 
+static const unsigned int memcg_vm_event_stat[] = {
+	PGFAULT,
+	PGMAJFAULT,
+	PGREFILL,
+	PGSCAN_KSWAPD,
+	PGSCAN_DIRECT,
+	PGSTEAL_KSWAPD,
+	PGSTEAL_DIRECT,
+	PGACTIVATE,
+	PGDEACTIVATE,
+	PGLAZYFREE,
+	PGLAZYFREED,
+#if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_ZSWAP)
+	ZSWPIN,
+	ZSWPOUT,
+#endif
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	THP_FAULT_ALLOC,
+	THP_COLLAPSE_ALLOC,
+#endif
+};
+
 static char *memory_stat_format(struct mem_cgroup *memcg)
 {
 	struct seq_buf s;
@@ -1495,41 +1517,10 @@ static char *memory_stat_format(struct mem_cgroup *memcg)
 	}
 
 	/* Accumulated memory events */
-
-	seq_buf_printf(&s, "%s %lu\n", vm_event_name(PGFAULT),
-		       memcg_events(memcg, PGFAULT));
-	seq_buf_printf(&s, "%s %lu\n", vm_event_name(PGMAJFAULT),
-		       memcg_events(memcg, PGMAJFAULT));
-	seq_buf_printf(&s, "%s %lu\n", vm_event_name(PGREFILL),
-		       memcg_events(memcg, PGREFILL));
-	seq_buf_printf(&s, "pgscan %lu\n",
-		       memcg_events(memcg, PGSCAN_KSWAPD) +
-		       memcg_events(memcg, PGSCAN_DIRECT));
-	seq_buf_printf(&s, "pgsteal %lu\n",
-		       memcg_events(memcg, PGSTEAL_KSWAPD) +
-		       memcg_events(memcg, PGSTEAL_DIRECT));
-	seq_buf_printf(&s, "%s %lu\n", vm_event_name(PGACTIVATE),
-		       memcg_events(memcg, PGACTIVATE));
-	seq_buf_printf(&s, "%s %lu\n", vm_event_name(PGDEACTIVATE),
-		       memcg_events(memcg, PGDEACTIVATE));
-	seq_buf_printf(&s, "%s %lu\n", vm_event_name(PGLAZYFREE),
-		       memcg_events(memcg, PGLAZYFREE));
-	seq_buf_printf(&s, "%s %lu\n", vm_event_name(PGLAZYFREED),
-		       memcg_events(memcg, PGLAZYFREED));
-
-#if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_ZSWAP)
-	seq_buf_printf(&s, "%s %lu\n", vm_event_name(ZSWPIN),
-		       memcg_events(memcg, ZSWPIN));
-	seq_buf_printf(&s, "%s %lu\n", vm_event_name(ZSWPOUT),
-		       memcg_events(memcg, ZSWPOUT));
-#endif
-
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	seq_buf_printf(&s, "%s %lu\n", vm_event_name(THP_FAULT_ALLOC),
-		       memcg_events(memcg, THP_FAULT_ALLOC));
-	seq_buf_printf(&s, "%s %lu\n", vm_event_name(THP_COLLAPSE_ALLOC),
-		       memcg_events(memcg, THP_COLLAPSE_ALLOC));
-#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+	for (i = 0; i < ARRAY_SIZE(memcg_vm_event_stat); i++)
+		seq_buf_printf(&s, "%s %lu\n",
+			       vm_event_name(memcg_vm_event_stat[i]),
+			       memcg_events(memcg, memcg_vm_event_stat[i]));
 
 	/* The above should easily fit into one page */
 	WARN_ON_ONCE(seq_buf_has_overflowed(&s));

There are already statistics of {pgscan,pgsteal}_kswapd and
{pgscan,pgsteal}_direct of memcg event here, but now the sum
of the two is displayed in memory.stat of cgroup v2.

In order to obtain more accurate information during monitoring
and debugging, and to align with the display in /proc/vmstat,
it is better to display {pgscan,pgsteal}_kswapd and
{pgscan,pgsteal}_direct separately.

Moreover, after this modification, all memcg events can be
printed with a combination of vm_event_name() and memcg_events().
This allows us to create an array to traverse and print, which
reduces redundant seq_buf_printf() code.

Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
 Documentation/admin-guide/cgroup-v2.rst | 14 ++++--
 mm/memcontrol.c                         | 61 +++++++++++--------------
 2 files changed, 36 insertions(+), 39 deletions(-)
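
The array-traversal idea from the last paragraph can be tried outside the kernel with a small user-space sketch. Everything prefixed with demo_ or DEMO_ is a hypothetical stand-in (for vm_event_item, vm_event_name(), memcg_events() and ARRAY_SIZE()), and snprintf() takes the place of seq_buf_printf(). It illustrates the pattern only and is not the kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for a few vm_event_item values. */
enum demo_event {
	DEMO_PGSCAN_KSWAPD,
	DEMO_PGSCAN_DIRECT,
	DEMO_PGSTEAL_KSWAPD,
	DEMO_PGSTEAL_DIRECT,
	DEMO_NR_EVENTS,
};

/* Name table, playing the role of vm_event_name(). */
static const char *const demo_event_names[DEMO_NR_EVENTS] = {
	[DEMO_PGSCAN_KSWAPD]  = "pgscan_kswapd",
	[DEMO_PGSCAN_DIRECT]  = "pgscan_direct",
	[DEMO_PGSTEAL_KSWAPD] = "pgsteal_kswapd",
	[DEMO_PGSTEAL_DIRECT] = "pgsteal_direct",
};

/* Events to print, in output order (the role of memcg_vm_event_stat[]). */
static const unsigned int demo_event_stat[] = {
	DEMO_PGSCAN_KSWAPD,
	DEMO_PGSCAN_DIRECT,
	DEMO_PGSTEAL_KSWAPD,
	DEMO_PGSTEAL_DIRECT,
};

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/*
 * Emit one "name value" line per listed event into buf.
 * Returns the number of bytes written, or -1 if buf is too small.
 */
static int demo_format_events(char *buf, size_t len,
			      const unsigned long *counts)
{
	size_t off = 0;
	size_t i;

	assert(buf != NULL && counts != NULL);
	for (i = 0; i < DEMO_ARRAY_SIZE(demo_event_stat); i++) {
		unsigned int ev = demo_event_stat[i];
		int n = snprintf(buf + off, len - off, "%s %lu\n",
				 demo_event_names[ev], counts[ev]);

		/* snprintf() reports truncation via n >= remaining space. */
		if (n < 0 || (size_t)n >= len - off)
			return -1;
		off += (size_t)n;
	}
	return (int)off;
}
```

Adding a counter then only requires a new array entry, provided its printed name is exactly what the name lookup returns. The old summed "pgscan" and "pgsteal" lines did not meet that condition.
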