Message ID | 20210106042239.2860107-1-guro@fb.com (mailing list archive) |
---|---|
State | New, archived |
Series | mm: memcg/slab: optimize objcg stock draining |
On Tue, Jan 5, 2021 at 8:22 PM Roman Gushchin <guro@fb.com> wrote:
>
> Imran Khan reported a regression in hackbench results caused by the
> commit f2fe7b09a52b ("mm: memcg/slab: charge individual slab objects
> instead of pages"). The regression is noticeable in the case of
> a consequent allocation of several relatively large slab objects,
> e.g. skb's. As soon as the amount of stocked bytes exceeds PAGE_SIZE,
> drain_obj_stock() and __memcg_kmem_uncharge() are called, and it leads
> to a number of atomic operations in page_counter_uncharge().
>
> The corresponding call graph is below (provided by Imran Khan):
> |__alloc_skb
> | |
> | |__kmalloc_reserve.isra.61
> | | |
> | | |__kmalloc_node_track_caller
> | | | |
> | | | |slab_pre_alloc_hook.constprop.88
> | | | obj_cgroup_charge
> | | | | |
> | | | | |__memcg_kmem_charge
> | | | | | |
> | | | | | |page_counter_try_charge
> | | | | |
> | | | | |refill_obj_stock
> | | | | | |
> | | | | | |drain_obj_stock.isra.68
> | | | | | | |
> | | | | | | |__memcg_kmem_uncharge
> | | | | | | | |
> | | | | | | | |page_counter_uncharge
> | | | | | | | | |
> | | | | | | | | |page_counter_cancel
> | | | |
> | | | |
> | | | |__slab_alloc
> | | | | |
> | | | | |___slab_alloc
> | | | | |
> | | | |slab_post_alloc_hook
>
> Instead of directly uncharging the accounted kernel memory, it's
> possible to refill the generic page-sized per-cpu stock instead.
> It's a much faster operation, especially on a default hierarchy.
> As a bonus, __memcg_kmem_uncharge_page() will also get faster,
> so the freeing of page-sized kernel allocations (e.g. large kmallocs)
> will become faster.
>
> A similar change has been done earlier for the socket memory by
> the commit 475d0487a2ad ("mm: memcontrol: use per-cpu stocks for
> socket memory uncharging").
>
> Signed-off-by: Roman Gushchin <guro@fb.com>
> Reported-by: Imran Khan <imran.f.khan@oracle.com>

I remember seeing this somewhere
https://lore.kernel.org/linux-mm/20190423154405.259178-1-shakeelb@google.com/

Reviewed-by: Shakeel Butt <shakeelb@google.com>
On 6/1/21 3:22 pm, Roman Gushchin wrote:
> Imran Khan reported a regression in hackbench results caused by the
> commit f2fe7b09a52b ("mm: memcg/slab: charge individual slab objects
> instead of pages"). The regression is noticeable in the case of
> a consequent allocation of several relatively large slab objects,
> e.g. skb's. As soon as the amount of stocked bytes exceeds PAGE_SIZE,
> drain_obj_stock() and __memcg_kmem_uncharge() are called, and it leads
> to a number of atomic operations in page_counter_uncharge().
>
> The corresponding call graph is below (provided by Imran Khan):
> |__alloc_skb
> | |
> | |__kmalloc_reserve.isra.61
> | | |
> | | |__kmalloc_node_track_caller
> | | | |
> | | | |slab_pre_alloc_hook.constprop.88
> | | | obj_cgroup_charge
> | | | | |
> | | | | |__memcg_kmem_charge
> | | | | | |
> | | | | | |page_counter_try_charge
> | | | | |
> | | | | |refill_obj_stock
> | | | | | |
> | | | | | |drain_obj_stock.isra.68
> | | | | | | |
> | | | | | | |__memcg_kmem_uncharge
> | | | | | | | |
> | | | | | | | |page_counter_uncharge
> | | | | | | | | |
> | | | | | | | | |page_counter_cancel
> | | | |
> | | | |
> | | | |__slab_alloc
> | | | | |
> | | | | |___slab_alloc
> | | | | |
> | | | |slab_post_alloc_hook
>
> Instead of directly uncharging the accounted kernel memory, it's
> possible to refill the generic page-sized per-cpu stock instead.
> It's a much faster operation, especially on a default hierarchy.
> As a bonus, __memcg_kmem_uncharge_page() will also get faster,
> so the freeing of page-sized kernel allocations (e.g. large kmallocs)
> will become faster.
>
> A similar change has been done earlier for the socket memory by
> the commit 475d0487a2ad ("mm: memcontrol: use per-cpu stocks for
> socket memory uncharging").
>
> Signed-off-by: Roman Gushchin <guro@fb.com>
> Reported-by: Imran Khan <imran.f.khan@oracle.com>

Tested-by: Imran Khan <imran.f.khan@oracle.com>

> ---
>  mm/memcontrol.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 0d74b80fa4de..8148c1df3aff 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3122,9 +3122,7 @@ void __memcg_kmem_uncharge(struct mem_cgroup *memcg, unsigned int nr_pages)
>  	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
>  		page_counter_uncharge(&memcg->kmem, nr_pages);
>
> -	page_counter_uncharge(&memcg->memory, nr_pages);
> -	if (do_memsw_account())
> -		page_counter_uncharge(&memcg->memsw, nr_pages);
> +	refill_stock(memcg, nr_pages);
>  }
>
>  /**
>
On Tue, Jan 05, 2021 at 10:05:20PM -0800, Shakeel Butt wrote:
> On Tue, Jan 5, 2021 at 8:22 PM Roman Gushchin <guro@fb.com> wrote:
> >
> > Imran Khan reported a regression in hackbench results caused by the
> > commit f2fe7b09a52b ("mm: memcg/slab: charge individual slab objects
> > instead of pages"). The regression is noticeable in the case of
> > a consequent allocation of several relatively large slab objects,
> > e.g. skb's. As soon as the amount of stocked bytes exceeds PAGE_SIZE,
> > drain_obj_stock() and __memcg_kmem_uncharge() are called, and it leads
> > to a number of atomic operations in page_counter_uncharge().
> >
> > The corresponding call graph is below (provided by Imran Khan):
> > |__alloc_skb
> > | |
> > | |__kmalloc_reserve.isra.61
> > | | |
> > | | |__kmalloc_node_track_caller
> > | | | |
> > | | | |slab_pre_alloc_hook.constprop.88
> > | | | obj_cgroup_charge
> > | | | | |
> > | | | | |__memcg_kmem_charge
> > | | | | | |
> > | | | | | |page_counter_try_charge
> > | | | | |
> > | | | | |refill_obj_stock
> > | | | | | |
> > | | | | | |drain_obj_stock.isra.68
> > | | | | | | |
> > | | | | | | |__memcg_kmem_uncharge
> > | | | | | | | |
> > | | | | | | | |page_counter_uncharge
> > | | | | | | | | |
> > | | | | | | | | |page_counter_cancel
> > | | | |
> > | | | |
> > | | | |__slab_alloc
> > | | | | |
> > | | | | |___slab_alloc
> > | | | | |
> > | | | |slab_post_alloc_hook
> >
> > Instead of directly uncharging the accounted kernel memory, it's
> > possible to refill the generic page-sized per-cpu stock instead.
> > It's a much faster operation, especially on a default hierarchy.
> > As a bonus, __memcg_kmem_uncharge_page() will also get faster,
> > so the freeing of page-sized kernel allocations (e.g. large kmallocs)
> > will become faster.
> >
> > A similar change has been done earlier for the socket memory by
> > the commit 475d0487a2ad ("mm: memcontrol: use per-cpu stocks for
> > socket memory uncharging").
> >
> > Signed-off-by: Roman Gushchin <guro@fb.com>
> > Reported-by: Imran Khan <imran.f.khan@oracle.com>
>
> I remember seeing this somewhere
> https://lore.kernel.org/linux-mm/20190423154405.259178-1-shakeelb@google.com/

Yes, we've discussed it a couple of times, as I remember.
Looks like now we finally have a good reasoning/benchmark, thanks to Imran.

>
> Reviewed-by: Shakeel Butt <shakeelb@google.com>

Thank you for the review!
On Tue, 5 Jan 2021 20:22:39 -0800 Roman Gushchin <guro@fb.com> wrote:

> Imran Khan reported a regression in hackbench results caused by the
> commit f2fe7b09a52b ("mm: memcg/slab: charge individual slab objects
> instead of pages").

How large was the regression?

> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3122,9 +3122,7 @@ void __memcg_kmem_uncharge(struct mem_cgroup *memcg, unsigned int nr_pages)
>  	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
>  		page_counter_uncharge(&memcg->kmem, nr_pages);
>
> -	page_counter_uncharge(&memcg->memory, nr_pages);
> -	if (do_memsw_account())
> -		page_counter_uncharge(&memcg->memsw, nr_pages);
> +	refill_stock(memcg, nr_pages);
>  }

IOW, which kernel version(s) should we be patching?
On Wed, Jan 06, 2021 at 11:50:44AM -0800, Andrew Morton wrote:
> On Tue, 5 Jan 2021 20:22:39 -0800 Roman Gushchin <guro@fb.com> wrote:
>
> > Imran Khan reported a regression in hackbench results caused by the
> > commit f2fe7b09a52b ("mm: memcg/slab: charge individual slab objects
> > instead of pages").
>
> How large was the regression?

~16% according to Imran's data.

>
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -3122,9 +3122,7 @@ void __memcg_kmem_uncharge(struct mem_cgroup *memcg, unsigned int nr_pages)
> >  	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
> >  		page_counter_uncharge(&memcg->kmem, nr_pages);
> >
> > -	page_counter_uncharge(&memcg->memory, nr_pages);
> > -	if (do_memsw_account())
> > -		page_counter_uncharge(&memcg->memsw, nr_pages);
> > +	refill_stock(memcg, nr_pages);
> > }
>
> IOW, which kernel version(s) should we be patching?

5.9+

Thanks!
On Tue, Jan 05, 2021 at 08:22:39PM -0800, Roman Gushchin <guro@fb.com> wrote:
> Imran Khan reported a regression in hackbench results caused by the
> commit f2fe7b09a52b ("mm: memcg/slab: charge individual slab objects
> instead of pages"). The regression is noticeable in the case of
> a consequent allocation of several relatively large slab objects,
> e.g. skb's. As soon as the amount of stocked bytes exceeds PAGE_SIZE,
> drain_obj_stock() and __memcg_kmem_uncharge() are called, and it leads
> to a number of atomic operations in page_counter_uncharge().
>
> The corresponding call graph is below (provided by Imran Khan):
> |__alloc_skb
> | |
> | |__kmalloc_reserve.isra.61
> | | |
> | | |__kmalloc_node_track_caller
> | | | |
> | | | |slab_pre_alloc_hook.constprop.88
> | | | obj_cgroup_charge
> | | | | |
> | | | | |__memcg_kmem_charge
> | | | | | |
> | | | | | |page_counter_try_charge
> | | | | |
> | | | | |refill_obj_stock
> | | | | | |
> | | | | | |drain_obj_stock.isra.68  <--- draining old memcg
> | | | | | | |
> | | | | | | |__memcg_kmem_uncharge
> | | | | | | | |
> | | | | | | | |page_counter_uncharge
> | | | | | | | | |
> | | | | | | | | |page_counter_cancel
> [...]
> -	page_counter_uncharge(&memcg->memory, nr_pages);
> -	if (do_memsw_account())
> -		page_counter_uncharge(&memcg->memsw, nr_pages);
> +	refill_stock(memcg, nr_pages);
I noticed that refill_stock(memcg,...) switches the local stock to
memcg. In this particular call chain, the uncharged memcg is the old
memcg of stock->cached_objcg. The refill_stock() then may switch
stock->cached to the old memcg too.

If the patch leads to better performance, then the switch probably
doesn't happen at this moment (and I guess stock->cached_objcg and
stock->cached can be independent to some extent, so the old memcg in
one needn't be the old in the latter).

In conclusion

Reviewed-by: Michal Koutný <mkoutny@suse.com>

Michal
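For context on the two caches Michal mentions, here is a rough sketch of the per-CPU stock state, modelled on the 5.9-era mm/memcontrol.c layout. It is a simplified illustration (the work/flags draining machinery is omitted), not the verbatim kernel definition.

/*
 * Simplified sketch of the per-CPU charge cache (not the verbatim
 * kernel definition).  The page-sized stock and the byte-sized object
 * stock live side by side and may cache different cgroups.
 */
struct mem_cgroup;
struct obj_cgroup;

struct memcg_stock_pcp {
	struct mem_cgroup *cached;	/* memcg backing the page-sized stock */
	unsigned int nr_pages;		/* pre-charged pages held for 'cached' */

	struct obj_cgroup *cached_objcg;/* objcg backing the byte-sized stock */
	unsigned int nr_bytes;		/* pre-charged bytes for 'cached_objcg' */
};

/*
 * Michal's observation restated: when drain_obj_stock() uncharges the
 * *old* objcg's memcg, the patched path calls refill_stock() with that
 * old memcg, which may flip 'cached' (flushing its pages) even though
 * the allocation that triggered the drain belongs to a different memcg.
 */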
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 0d74b80fa4de..8148c1df3aff 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3122,9 +3122,7 @@ void __memcg_kmem_uncharge(struct mem_cgroup *memcg, unsigned int nr_pages)
 	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
 		page_counter_uncharge(&memcg->kmem, nr_pages);
 
-	page_counter_uncharge(&memcg->memory, nr_pages);
-	if (do_memsw_account())
-		page_counter_uncharge(&memcg->memsw, nr_pages);
+	refill_stock(memcg, nr_pages);
 }
 
 /**
Imran Khan reported a regression in hackbench results caused by the
commit f2fe7b09a52b ("mm: memcg/slab: charge individual slab objects
instead of pages"). The regression is noticeable in the case of
a consequent allocation of several relatively large slab objects,
e.g. skb's. As soon as the amount of stocked bytes exceeds PAGE_SIZE,
drain_obj_stock() and __memcg_kmem_uncharge() are called, and it leads
to a number of atomic operations in page_counter_uncharge().

The corresponding call graph is below (provided by Imran Khan):
|__alloc_skb
| |
| |__kmalloc_reserve.isra.61
| | |
| | |__kmalloc_node_track_caller
| | | |
| | | |slab_pre_alloc_hook.constprop.88
| | | obj_cgroup_charge
| | | | |
| | | | |__memcg_kmem_charge
| | | | | |
| | | | | |page_counter_try_charge
| | | | |
| | | | |refill_obj_stock
| | | | | |
| | | | | |drain_obj_stock.isra.68
| | | | | | |
| | | | | | |__memcg_kmem_uncharge
| | | | | | | |
| | | | | | | |page_counter_uncharge
| | | | | | | | |
| | | | | | | | |page_counter_cancel
| | | |
| | | |
| | | |__slab_alloc
| | | | |
| | | | |___slab_alloc
| | | | |
| | | |slab_post_alloc_hook

Instead of directly uncharging the accounted kernel memory, it's
possible to refill the generic page-sized per-cpu stock instead.
It's a much faster operation, especially on a default hierarchy.
As a bonus, __memcg_kmem_uncharge_page() will also get faster,
so the freeing of page-sized kernel allocations (e.g. large kmallocs)
will become faster.

A similar change has been done earlier for the socket memory by
the commit 475d0487a2ad ("mm: memcontrol: use per-cpu stocks for
socket memory uncharging").

Signed-off-by: Roman Gushchin <guro@fb.com>
Reported-by: Imran Khan <imran.f.khan@oracle.com>
---
 mm/memcontrol.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)
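To make the idea in the commit message concrete, below is a small, self-contained userspace C sketch of the per-cpu stock pattern: uncharged pages accumulate in a CPU-local cache, and the shared atomic counter is touched only when the cache switches cgroups or exceeds a cap. All names, the STOCK_CAP value, and the structure layouts are illustrative assumptions, not the kernel's actual implementation.

#include <stdatomic.h>
#include <stdio.h>

/* Stand-ins for the kernel structures; heavily simplified. */
struct page_counter { _Atomic long usage; };
struct mem_cgroup   { struct page_counter memory; };

/* Models one CPU's stock of pre-charged pages. */
struct memcg_stock {
	struct mem_cgroup *cached;
	unsigned long nr_pages;
};

enum { STOCK_CAP = 64 };	/* illustrative cap; the kernel uses its own batch size */

/* Flush the locally cached pages back to the shared atomic counter. */
static void drain_stock(struct memcg_stock *stock)
{
	if (stock->cached && stock->nr_pages)
		atomic_fetch_sub(&stock->cached->memory.usage, (long)stock->nr_pages);
	stock->nr_pages = 0;
}

/* "Uncharge" by refilling the local stock: usually no atomic op at all. */
static void refill_stock(struct memcg_stock *stock, struct mem_cgroup *memcg,
			 unsigned long nr_pages)
{
	if (stock->cached != memcg) {	/* switching cgroups flushes the old one */
		drain_stock(stock);
		stock->cached = memcg;
	}
	stock->nr_pages += nr_pages;
	if (stock->nr_pages > STOCK_CAP)	/* keep the cached surplus bounded */
		drain_stock(stock);
}

int main(void)
{
	struct mem_cgroup cg = { .memory = { .usage = 1024 } };
	struct memcg_stock stock = { 0 };

	/* 16 page-sized uncharges touch the atomic counter only once, at drain. */
	for (int i = 0; i < 16; i++)
		refill_stock(&stock, &cg, 1);
	drain_stock(&stock);

	printf("usage after draining: %ld\n", atomic_load(&cg.memory.usage));
	return 0;
}

Under this scheme the sixteen single-page uncharges in main() collapse into one atomic update at drain time, which is the effect the patch exploits for __memcg_kmem_uncharge().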