Message ID | 20220604082209.55174-1-zhengqi.arch@bytedance.com (mailing list archive) |
---|---|
State | New |
Series | [v2] mm: memcontrol: add {pgscan,pgsteal}_{kswapd,direct} items in memory.stat of cgroup v2 |
On Sat, Jun 04, 2022 at 04:22:09PM +0800, Qi Zheng wrote:
> There are already statistics of {pgscan,pgsteal}_kswapd and
> {pgscan,pgsteal}_direct of memcg event here, but now only the
> sum of the two is displayed in memory.stat of cgroup v2.
>
> In order to obtain more accurate information during monitoring
> and debugging, and to align with the display in /proc/vmstat,
> it better to display {pgscan,pgsteal}_kswapd and
> {pgscan,pgsteal}_direct separately.
>
> Also, for forward compatibility, we still display pgscan and
> pgsteal items so that it won't break existing applications.
>
> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
> Acked-by: Muchun Song <songmuchun@bytedance.com>

Acked-by: Shakeel Butt <shakeelb@google.com>

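The commit message quoted above says that pgscan and pgsteal remain in memory.stat as sums, next to the new per-source counters. A monitoring tool could check that relationship roughly as sketched below. This is a userspace illustration only: the helper names (`memstat_lookup`, `memstat_check_sum`) and the sample values are made up for the example and are not part of the patch. On a live system the counters may advance between the kernel's individual reads, so a strict equality check might occasionally fail there.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up one "name value" line in memory.stat-formatted text.
 * Returns 1 and stores the value in *out if the key is found, 0 otherwise.
 * (Hypothetical helper, not a kernel or libc API.)
 */
static int memstat_lookup(const char *text, const char *key, unsigned long *out)
{
	const char *line = text;
	size_t klen;

	assert(text && key && out);
	klen = strlen(key);

	while (*line) {
		const char *eol = strchr(line, '\n');

		/* Match the whole key: "pgscan" must not match "pgscan_kswapd". */
		if (strncmp(line, key, klen) == 0 && line[klen] == ' ' &&
		    sscanf(line + klen, "%lu", out) == 1)
			return 1;
		if (!eol)
			break;
		line = eol + 1;
	}
	return 0;
}

/*
 * Check that an aggregate counter equals the sum of its kswapd and
 * direct parts. Returns 1 if it does, 0 if it does not, and -1 if any
 * of the three keys is missing (for example on a kernel without the patch).
 */
static int memstat_check_sum(const char *text, const char *total,
			     const char *kswapd, const char *direct)
{
	unsigned long t, k, d;

	if (!memstat_lookup(text, total, &t) ||
	    !memstat_lookup(text, kswapd, &k) ||
	    !memstat_lookup(text, direct, &d))
		return -1;
	return t == k + d;
}

/* Made-up sample values, not output from a real system. */
static const char sample_stat[] =
	"pgscan 150\n"
	"pgsteal 120\n"
	"pgscan_kswapd 100\n"
	"pgscan_direct 50\n"
	"pgsteal_kswapd 90\n"
	"pgsteal_direct 30\n";
```

Calling `memstat_check_sum(sample_stat, "pgscan", "pgscan_kswapd", "pgscan_direct")` on the sample text reports the counters as consistent. A real tool would read the text from the cgroup's memory.stat file instead of a string constant.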
On Sat 04-06-22 16:22:09, Qi Zheng wrote:
> There are already statistics of {pgscan,pgsteal}_kswapd and
> {pgscan,pgsteal}_direct of memcg event here, but now only the
> sum of the two is displayed in memory.stat of cgroup v2.
>
> In order to obtain more accurate information during monitoring
> and debugging, and to align with the display in /proc/vmstat,
> it better to display {pgscan,pgsteal}_kswapd and
> {pgscan,pgsteal}_direct separately.
>
> Also, for forward compatibility, we still display pgscan and
> pgsteal items so that it won't break existing applications.

I do not remember why we have chosen to report cumulative stats rather
than the direct and kswapd parts. Looking back when Roman has introduced
those (http://lkml.kernel.org/r/1494530183-30808-1-git-send-email-guro@fb.com)
I do not see any discussion around that. So it was likely just not
a priority.

I have just one question. Say we even decide to have a per memcg kswapd
in some form, would we report that into the same counter?

> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
> Acked-by: Muchun Song <songmuchun@bytedance.com>

In any case
Acked-by: Michal Hocko <mhocko@suse.com>

One nit below
[...]
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 0d3fe0a0c75a..fd78c4d6bbc7 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1460,6 +1460,28 @@ static inline unsigned long memcg_page_state_output(struct mem_cgroup *memcg,
>  	return memcg_page_state(memcg, item) * memcg_page_state_unit(item);
>  }
>

I would just add the following for clarity
/* Subset of vm_event_item to report for memcg event stats */
> +static const unsigned int memcg_vm_event_stat[] = {
> +	PGSCAN_KSWAPD,
> +	PGSCAN_DIRECT,
> +	PGSTEAL_KSWAPD,
> +	PGSTEAL_DIRECT,
> +	PGFAULT,
> +	PGMAJFAULT,
> +	PGREFILL,
> +	PGACTIVATE,
> +	PGDEACTIVATE,
> +	PGLAZYFREE,
> +	PGLAZYFREED,
> +#if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_ZSWAP)
> +	ZSWPIN,
> +	ZSWPOUT,
> +#endif
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +	THP_FAULT_ALLOC,
> +	THP_COLLAPSE_ALLOC,
> +#endif
> +};

On 2022/6/6 8:03 PM, Michal Hocko wrote:
> On Sat 04-06-22 16:22:09, Qi Zheng wrote:
>> There are already statistics of {pgscan,pgsteal}_kswapd and
>> {pgscan,pgsteal}_direct of memcg event here, but now only the
>> sum of the two is displayed in memory.stat of cgroup v2.
>>
>> In order to obtain more accurate information during monitoring
>> and debugging, and to align with the display in /proc/vmstat,
>> it better to display {pgscan,pgsteal}_kswapd and
>> {pgscan,pgsteal}_direct separately.
>>
>> Also, for forward compatibility, we still display pgscan and
>> pgsteal items so that it won't break existing applications.
>
> I do not remember why we have chosen to report cumulative stats rather
> than the direct and kswapd parts. Looking back when Roman has introduced
> those (http://lkml.kernel.org/r/1494530183-30808-1-git-send-email-guro@fb.com)
> I do not see any discussion around that. So it was likely just not
> a priority.
>
> I have just one question. Say we even decide to have a per memcg kswapd
> in some form, would we report that into the same counter?

IMO, I would like it to be reported into the same counter.

>
>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
>> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
>> Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
>> Acked-by: Muchun Song <songmuchun@bytedance.com>
>
> In any case
> Acked-by: Michal Hocko <mhocko@suse.com>

Thanks.

>
> One nit below
> [...]
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 0d3fe0a0c75a..fd78c4d6bbc7 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -1460,6 +1460,28 @@ static inline unsigned long memcg_page_state_output(struct mem_cgroup *memcg,
>>  	return memcg_page_state(memcg, item) * memcg_page_state_unit(item);
>>  }
>>
>
> I would just add the following for clarity

OK, will do.

>
> /* Subset of vm_event_item to report for memcg event stats */
>> +static const unsigned int memcg_vm_event_stat[] = {
>> +	PGSCAN_KSWAPD,
>> +	PGSCAN_DIRECT,
>> +	PGSTEAL_KSWAPD,
>> +	PGSTEAL_DIRECT,
>> +	PGFAULT,
>> +	PGMAJFAULT,
>> +	PGREFILL,
>> +	PGACTIVATE,
>> +	PGDEACTIVATE,
>> +	PGLAZYFREE,
>> +	PGLAZYFREED,
>> +#if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_ZSWAP)
>> +	ZSWPIN,
>> +	ZSWPOUT,
>> +#endif
>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> +	THP_FAULT_ALLOC,
>> +	THP_COLLAPSE_ALLOC,
>> +#endif
>> +};
>

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 176298f2f4de..b2b55e7360d8 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1445,9 +1445,21 @@ PAGE_SIZE multiple when read back.
 	  pgscan (npn)
 		Amount of scanned pages (in an inactive LRU list)
 
+	  pgscan_kswapd (npn)
+		Amount of scanned pages by kswapd (in an inactive LRU list)
+
+	  pgscan_direct (npn)
+		Amount of scanned pages directly (in an inactive LRU list)
+
 	  pgsteal (npn)
 		Amount of reclaimed pages
 
+	  pgsteal_kswapd (npn)
+		Amount of reclaimed pages by kswapd
+
+	  pgsteal_direct (npn)
+		Amount of reclaimed pages directly
+
 	  pgactivate (npn)
 		Amount of pages moved to the active LRU list
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 0d3fe0a0c75a..fd78c4d6bbc7 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1460,6 +1460,28 @@ static inline unsigned long memcg_page_state_output(struct mem_cgroup *memcg,
 	return memcg_page_state(memcg, item) * memcg_page_state_unit(item);
 }
 
+static const unsigned int memcg_vm_event_stat[] = {
+	PGSCAN_KSWAPD,
+	PGSCAN_DIRECT,
+	PGSTEAL_KSWAPD,
+	PGSTEAL_DIRECT,
+	PGFAULT,
+	PGMAJFAULT,
+	PGREFILL,
+	PGACTIVATE,
+	PGDEACTIVATE,
+	PGLAZYFREE,
+	PGLAZYFREED,
+#if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_ZSWAP)
+	ZSWPIN,
+	ZSWPOUT,
+#endif
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	THP_FAULT_ALLOC,
+	THP_COLLAPSE_ALLOC,
+#endif
+};
+
 static char *memory_stat_format(struct mem_cgroup *memcg)
 {
 	struct seq_buf s;
@@ -1495,41 +1517,17 @@ static char *memory_stat_format(struct mem_cgroup *memcg)
 	}
 
 	/* Accumulated memory events */
-
-	seq_buf_printf(&s, "%s %lu\n", vm_event_name(PGFAULT),
-		       memcg_events(memcg, PGFAULT));
-	seq_buf_printf(&s, "%s %lu\n", vm_event_name(PGMAJFAULT),
-		       memcg_events(memcg, PGMAJFAULT));
-	seq_buf_printf(&s, "%s %lu\n", vm_event_name(PGREFILL),
-		       memcg_events(memcg, PGREFILL));
 	seq_buf_printf(&s, "pgscan %lu\n",
 		       memcg_events(memcg, PGSCAN_KSWAPD) +
 		       memcg_events(memcg, PGSCAN_DIRECT));
 	seq_buf_printf(&s, "pgsteal %lu\n",
 		       memcg_events(memcg, PGSTEAL_KSWAPD) +
 		       memcg_events(memcg, PGSTEAL_DIRECT));
-	seq_buf_printf(&s, "%s %lu\n", vm_event_name(PGACTIVATE),
-		       memcg_events(memcg, PGACTIVATE));
-	seq_buf_printf(&s, "%s %lu\n", vm_event_name(PGDEACTIVATE),
-		       memcg_events(memcg, PGDEACTIVATE));
-	seq_buf_printf(&s, "%s %lu\n", vm_event_name(PGLAZYFREE),
-		       memcg_events(memcg, PGLAZYFREE));
-	seq_buf_printf(&s, "%s %lu\n", vm_event_name(PGLAZYFREED),
-		       memcg_events(memcg, PGLAZYFREED));
-
-#if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_ZSWAP)
-	seq_buf_printf(&s, "%s %lu\n", vm_event_name(ZSWPIN),
-		       memcg_events(memcg, ZSWPIN));
-	seq_buf_printf(&s, "%s %lu\n", vm_event_name(ZSWPOUT),
-		       memcg_events(memcg, ZSWPOUT));
-#endif
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	seq_buf_printf(&s, "%s %lu\n", vm_event_name(THP_FAULT_ALLOC),
-		       memcg_events(memcg, THP_FAULT_ALLOC));
-	seq_buf_printf(&s, "%s %lu\n", vm_event_name(THP_COLLAPSE_ALLOC),
-		       memcg_events(memcg, THP_COLLAPSE_ALLOC));
-#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+	for (i = 0; i < ARRAY_SIZE(memcg_vm_event_stat); i++)
+		seq_buf_printf(&s, "%s %lu\n",
+			       vm_event_name(memcg_vm_event_stat[i]),
+			       memcg_events(memcg, memcg_vm_event_stat[i]));
 
 	/* The above should easily fit into one page */
 	WARN_ON_ONCE(seq_buf_has_overflowed(&s));
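
The second memcontrol.c hunk replaces a run of hand-written seq_buf_printf() calls with a loop over a table of event IDs, while the two aggregate lines stay open-coded. The same pattern can be sketched in standalone userspace C as follows. Everything prefixed with `demo_`/`DEMO_` is a hypothetical stand-in invented for this illustration (the kernel's enum vm_event_item, vm_event_name(), memcg_events() and seq_buf are not used), and snprintf() stands in for seq_buf_printf().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for the kernel's enum vm_event_item. */
enum demo_event {
	DEMO_PGFAULT,
	DEMO_PGSCAN_KSWAPD,
	DEMO_PGSCAN_DIRECT,
	DEMO_PGREFILL,
	DEMO_NR_EVENTS
};

/* Hypothetical stand-in for vm_event_name(): one name per event ID. */
static const char *const demo_event_names[DEMO_NR_EVENTS] = {
	[DEMO_PGFAULT]       = "pgfault",
	[DEMO_PGSCAN_KSWAPD] = "pgscan_kswapd",
	[DEMO_PGSCAN_DIRECT] = "pgscan_direct",
	[DEMO_PGREFILL]      = "pgrefill",
};

/* Subset of events to report, in output order. */
static const unsigned int demo_report[] = {
	DEMO_PGSCAN_KSWAPD,
	DEMO_PGSCAN_DIRECT,
	DEMO_PGFAULT,
};

/*
 * Format the aggregate line followed by one line per table entry.
 * Returns the number of bytes written (excluding the NUL), or -1 if
 * the buffer is too small.
 */
static int demo_format(const unsigned long *counts, char *buf, size_t size)
{
	size_t off, i;
	int n;

	assert(counts && buf && size > 0);

	/* The aggregate stays open-coded, as "pgscan" does in the patch. */
	n = snprintf(buf, size, "pgscan %lu\n",
		     counts[DEMO_PGSCAN_KSWAPD] + counts[DEMO_PGSCAN_DIRECT]);
	if (n < 0 || (size_t)n >= size)
		return -1;
	off = (size_t)n;

	/* Everything else is driven by the table. */
	for (i = 0; i < ARRAY_SIZE(demo_report); i++) {
		n = snprintf(buf + off, size - off, "%s %lu\n",
			     demo_event_names[demo_report[i]],
			     counts[demo_report[i]]);
		if (n < 0 || (size_t)n >= size - off)
			return -1;
		off += (size_t)n;
	}
	return (int)off;
}
```

With this layout, reporting another event means adding one entry to the table rather than another formatting call, and events absent from the table (DEMO_PGREFILL here) are simply not printed.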