Message ID | 20240521131556.142176-1-wangkefeng.wang@huawei.com
---|---
State | New
Series | mm: memcontrol: remove page_memcg()
On Tue 21-05-24 21:15:56, Kefeng Wang wrote:
> page_memcg() is only called by mod_memcg_page_state(), so squash it
> into its caller and remove page_memcg().
>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>

Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!

> ---
>  include/linux/memcontrol.h | 14 ++------------
>  mm/memcontrol.c            |  2 +-
>  2 files changed, 3 insertions(+), 13 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 030d34e9d117..8abc70cc7219 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -443,11 +443,6 @@ static inline struct mem_cgroup *folio_memcg(struct folio *folio)
>  	return __folio_memcg(folio);
>  }
>
> -static inline struct mem_cgroup *page_memcg(struct page *page)
> -{
> -	return folio_memcg(page_folio(page));
> -}
> -
>  /**
>   * folio_memcg_rcu - Locklessly get the memory cgroup associated with a folio.
>   * @folio: Pointer to the folio.
> @@ -1014,7 +1009,7 @@ static inline void mod_memcg_page_state(struct page *page,
>  		return;
>
>  	rcu_read_lock();
> -	memcg = page_memcg(page);
> +	memcg = folio_memcg(page_folio(page));
>  	if (memcg)
>  		mod_memcg_state(memcg, idx, val);
>  	rcu_read_unlock();
> @@ -1133,11 +1128,6 @@ static inline struct mem_cgroup *folio_memcg(struct folio *folio)
>  	return NULL;
>  }
>
> -static inline struct mem_cgroup *page_memcg(struct page *page)
> -{
> -	return NULL;
> -}
> -
>  static inline struct mem_cgroup *folio_memcg_rcu(struct folio *folio)
>  {
>  	WARN_ON_ONCE(!rcu_read_lock_held());
> @@ -1636,7 +1626,7 @@ static inline void unlock_page_lruvec_irqrestore(struct lruvec *lruvec,
>  	spin_unlock_irqrestore(&lruvec->lru_lock, flags);
>  }
>
> -/* Test requires a stable page->memcg binding, see page_memcg() */
> +/* Test requires a stable page->memcg binding, see folio_memcg() */
>  static inline bool folio_matches_lruvec(struct folio *folio,
>  					struct lruvec *lruvec)
>  {
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 54070687aad2..72833f6f0944 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3811,7 +3811,7 @@ void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
>  #endif /* CONFIG_MEMCG_KMEM */
>
>  /*
> - * Because page_memcg(head) is not set on tails, set it now.
> + * Because folio_memcg(head) is not set on tails, set it now.
>   */
>  void split_page_memcg(struct page *head, int old_order, int new_order)
>  {
> --
> 2.41.0
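The conversion is mechanical for any remaining (e.g. out-of-tree) callers
of the removed helper; a minimal before/after sketch, illustrative only,
with the usual stable-binding rules for folio_memcg() still applying:

	/* Before (helper removed by this patch): */
	struct mem_cgroup *memcg = page_memcg(page);

	/* After: go through the folio explicitly. */
	struct mem_cgroup *memcg = folio_memcg(page_folio(page));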
On Tue, May 21, 2024 at 09:15:56PM +0800, Kefeng Wang wrote:
> -/* Test requires a stable page->memcg binding, see page_memcg() */
> +/* Test requires a stable page->memcg binding, see folio_memcg() */

folio->memcg, not page->memcg
On Tue, May 21, 2024 at 09:15:56PM +0800, Kefeng Wang wrote:
> page_memcg() is only called by mod_memcg_page_state(), so squash it
> into its caller and remove page_memcg().

This isn't wrong, except that the entire usage of memcg is wrong in the
only two callers of mod_memcg_page_state():

$ git grep mod_memcg_page_state
include/linux/memcontrol.h:static inline void mod_memcg_page_state(struct page *page,
include/linux/memcontrol.h:static inline void mod_memcg_page_state(struct page *page,
mm/vmalloc.c:		mod_memcg_page_state(page, MEMCG_VMALLOC, -1);
mm/vmalloc.c:			mod_memcg_page_state(area->pages[i], MEMCG_VMALLOC, 1);

The memcg should not be attached to the individual pages that make up a
vmalloc allocation.  Rather, it should be managed by the vmalloc
allocation itself.  I don't have the knowledge to poke around inside
vmalloc right now, but maybe somebody else could take that on.
On Tue 21-05-24 15:44:21, Matthew Wilcox wrote:
> On Tue, May 21, 2024 at 09:15:56PM +0800, Kefeng Wang wrote:
> > page_memcg() is only called by mod_memcg_page_state(), so squash it
> > into its caller and remove page_memcg().
>
> This isn't wrong, except that the entire usage of memcg is wrong in the
> only two callers of mod_memcg_page_state():
>
> $ git grep mod_memcg_page_state
> include/linux/memcontrol.h:static inline void mod_memcg_page_state(struct page *page,
> include/linux/memcontrol.h:static inline void mod_memcg_page_state(struct page *page,
> mm/vmalloc.c:		mod_memcg_page_state(page, MEMCG_VMALLOC, -1);
> mm/vmalloc.c:			mod_memcg_page_state(area->pages[i], MEMCG_VMALLOC, 1);
>
> The memcg should not be attached to the individual pages that make up a
> vmalloc allocation.  Rather, it should be managed by the vmalloc
> allocation itself.  I don't have the knowledge to poke around inside
> vmalloc right now, but maybe somebody else could take that on.

This would make sense as a follow up.
On Tue, May 21, 2024 at 03:44:21PM +0100, Matthew Wilcox wrote: > On Tue, May 21, 2024 at 09:15:56PM +0800, Kefeng Wang wrote: > > The page_memcg() only called by mod_memcg_page_state(), so squash it to > > cleanup page_memcg(). > > This isn't wrong, except that the entire usage of memcg is wrong in the > only two callers of mod_memcg_page_state(): > > $ git grep mod_memcg_page_state > include/linux/memcontrol.h:static inline void mod_memcg_page_state(struct page *page, > include/linux/memcontrol.h:static inline void mod_memcg_page_state(struct page *page, > mm/vmalloc.c: mod_memcg_page_state(page, MEMCG_VMALLOC, -1); > mm/vmalloc.c: mod_memcg_page_state(area->pages[i], MEMCG_VMALLOC, 1); > > The memcg should not be attached to the individual pages that make up a > vmalloc allocation. Rather, it should be managed by the vmalloc > allocation itself. I don't have the knowledge to poke around inside > vmalloc right now, but maybe somebody else could take that on. Are you concerned about accessing just memcg or any field of the sub-page? There are drivers accessing fields of pages allocated through vmalloc. Some details at 3b8000ae185c ("mm/vmalloc: huge vmalloc backing pages should be split rather than compound").
On 2024/5/22 3:29, Shakeel Butt wrote:
> On Tue, May 21, 2024 at 03:44:21PM +0100, Matthew Wilcox wrote:
>> On Tue, May 21, 2024 at 09:15:56PM +0800, Kefeng Wang wrote:
>>> page_memcg() is only called by mod_memcg_page_state(), so squash it
>>> into its caller and remove page_memcg().
>>
>> This isn't wrong, except that the entire usage of memcg is wrong in the
>> only two callers of mod_memcg_page_state():
>>
>> $ git grep mod_memcg_page_state
>> include/linux/memcontrol.h:static inline void mod_memcg_page_state(struct page *page,
>> include/linux/memcontrol.h:static inline void mod_memcg_page_state(struct page *page,
>> mm/vmalloc.c:		mod_memcg_page_state(page, MEMCG_VMALLOC, -1);
>> mm/vmalloc.c:			mod_memcg_page_state(area->pages[i], MEMCG_VMALLOC, 1);
>>
>> The memcg should not be attached to the individual pages that make up a
>> vmalloc allocation.  Rather, it should be managed by the vmalloc
>> allocation itself.  I don't have the knowledge to poke around inside
>> vmalloc right now, but maybe somebody else could take that on.
>
> Are you concerned about accessing just memcg or any field of the
> sub-page? There are drivers accessing fields of pages allocated through
> vmalloc. Some details at 3b8000ae185c ("mm/vmalloc: huge vmalloc backing
> pages should be split rather than compound").

Maybe Matthew wants something like the diff below: move the memcg
MEMCG_VMALLOC stat update from per-page to per-vmalloc-allocation. It
should also speed up the accounting after the conversion.

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index e4a631ec430b..89f115623124 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -55,6 +55,9 @@ struct vm_struct {
 	unsigned long		size;
 	unsigned long		flags;
 	struct page		**pages;
+#ifdef CONFIG_MEMCG_KMEM
+	struct obj_cgroup	*objcg;
+#endif
 #ifdef CONFIG_HAVE_ARCH_HUGE_VMALLOC
 	unsigned int		page_order;
 #endif
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 5d3aa2dc88a8..3e28c382f604 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3001,6 +3001,49 @@ static inline void set_vm_area_page_order(struct vm_struct *vm, unsigned int ord
 #endif
 }
 
+#ifdef CONFIG_MEMCG_KMEM
+static void vmalloc_memcg_alloc_hook(struct vm_struct *area, gfp_t gfp,
+				     int nr_pages)
+{
+	struct obj_cgroup *objcg;
+
+	if (!memcg_kmem_online() || !(gfp & __GFP_ACCOUNT))
+		return;
+
+	objcg = get_obj_cgroup_from_current();
+	if (!objcg)
+		return;
+
+	area->objcg = objcg;
+
+	rcu_read_lock();
+	mod_memcg_state(obj_cgroup_memcg(objcg), MEMCG_VMALLOC, nr_pages);
+	rcu_read_unlock();
+}
+
+static void vmalloc_memcg_free_hook(struct vm_struct *area)
+{
+	struct obj_cgroup *objcg = area->objcg;
+
+	if (!objcg)
+		return;
+
+	rcu_read_lock();
+	mod_memcg_state(obj_cgroup_memcg(objcg), MEMCG_VMALLOC, -area->nr_pages);
+	rcu_read_unlock();
+
+	obj_cgroup_put(objcg);
+}
+#else
+static void vmalloc_memcg_alloc_hook(struct vm_struct *area, gfp_t gfp,
+				     int nr_pages)
+{
+}
+static void vmalloc_memcg_free_hook(struct vm_struct *area)
+{
+}
+#endif
+
 /**
  * vm_area_add_early - add vmap area early during boot
  * @vm: vm_struct to add
@@ -3338,7 +3381,6 @@ void vfree(const void *addr)
 		struct page *page = vm->pages[i];
 
 		BUG_ON(!page);
-		mod_memcg_page_state(page, MEMCG_VMALLOC, -1);
 		/*
 		 * High-order allocs for huge vmallocs are split, so
 		 * can be freed as an array of order-0 allocations
@@ -3347,6 +3389,7 @@ void vfree(const void *addr)
 		cond_resched();
 	}
 	atomic_long_sub(vm->nr_pages, &nr_vmalloc_pages);
+	vmalloc_memcg_free_hook(vm);
 	kvfree(vm->pages);
 	kfree(vm);
 }
@@ -3643,12 +3686,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 			node, page_order, nr_small_pages, area->pages);
 
 	atomic_long_add(area->nr_pages, &nr_vmalloc_pages);
-	if (gfp_mask & __GFP_ACCOUNT) {
-		int i;
-
-		for (i = 0; i < area->nr_pages; i++)
-			mod_memcg_page_state(area->pages[i], MEMCG_VMALLOC, 1);
-	}
+	vmalloc_memcg_alloc_hook(area, gfp_mask, area->nr_pages);
 
 	/*
 	 * If not enough pages were obtained to accomplish an
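To illustrate what the proposed hooks change for callers, a hypothetical
usage sketch (not part of the patch; assumes 4 KiB pages, with
GFP_KERNEL_ACCOUNT and __vmalloc() as in mainline):

#include <linux/vmalloc.h>
#include <linux/sizes.h>

/*
 * Hypothetical caller: with the hooks above, this accounted 1 MiB
 * allocation updates MEMCG_VMALLOC once by 256 pages at allocation
 * time instead of issuing 256 per-page stat updates, and vfree()
 * undoes the charge with a single update as well.
 */
static int example_vmalloc_user(void)
{
	void *buf = __vmalloc(SZ_1M, GFP_KERNEL_ACCOUNT);

	if (!buf)
		return -ENOMEM;

	/* ... use buf ... */

	vfree(buf);
	return 0;
}

A side effect of storing the obj_cgroup in struct vm_struct is that the
cgroup reference stays pinned for the allocation's lifetime, so the
uncharge in vfree() hits the right counter even if the allocating task
has since moved between cgroups.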
On 2024/5/21 22:21, Matthew Wilcox wrote:
> On Tue, May 21, 2024 at 09:15:56PM +0800, Kefeng Wang wrote:
>> -/* Test requires a stable page->memcg binding, see page_memcg() */
>> +/* Test requires a stable page->memcg binding, see folio_memcg() */
>
> folio->memcg, not page->memcg

OK, will fix.
On Tue, May 21, 2024 at 12:29:39PM -0700, Shakeel Butt wrote:
> On Tue, May 21, 2024 at 03:44:21PM +0100, Matthew Wilcox wrote:
> > The memcg should not be attached to the individual pages that make up a
> > vmalloc allocation.  Rather, it should be managed by the vmalloc
> > allocation itself.  I don't have the knowledge to poke around inside
> > vmalloc right now, but maybe somebody else could take that on.
>
> Are you concerned about accessing just memcg or any field of the
> sub-page? There are drivers accessing fields of pages allocated through
> vmalloc. Some details at 3b8000ae185c ("mm/vmalloc: huge vmalloc backing
> pages should be split rather than compound").

Thanks for the pointer, and fb_deferred_io_fault() is already on my
hitlist for abusing struct page.

My primary concern is that we should track the entire allocation as a
single object rather than tracking each page individually.  That means
assigning the vmalloc allocation to a memcg rather than assigning each
page to a memcg.  It's a lot less overhead to increment the counter once
per allocation rather than once per page in the allocation!

But secondarily, yes, pages allocated by vmalloc probably don't need
any per-page state, other than tracking the vmalloc allocation they're
assigned to.  We'll see how that theory turns out.
On Thu, May 23, 2024 at 02:31:05PM +0100, Matthew Wilcox wrote:
> On Tue, May 21, 2024 at 12:29:39PM -0700, Shakeel Butt wrote:
> > On Tue, May 21, 2024 at 03:44:21PM +0100, Matthew Wilcox wrote:
> > > The memcg should not be attached to the individual pages that make up a
> > > vmalloc allocation.  Rather, it should be managed by the vmalloc
> > > allocation itself.  I don't have the knowledge to poke around inside
> > > vmalloc right now, but maybe somebody else could take that on.
> >
> > Are you concerned about accessing just memcg or any field of the
> > sub-page? There are drivers accessing fields of pages allocated through
> > vmalloc. Some details at 3b8000ae185c ("mm/vmalloc: huge vmalloc backing
> > pages should be split rather than compound").
>
> Thanks for the pointer, and fb_deferred_io_fault() is already on my
> hitlist for abusing struct page.
>
> My primary concern is that we should track the entire allocation as a
> single object rather than tracking each page individually.  That means
> assigning the vmalloc allocation to a memcg rather than assigning each
> page to a memcg.  It's a lot less overhead to increment the counter once
> per allocation rather than once per page in the allocation!
>
> But secondarily, yes, pages allocated by vmalloc probably don't need
> any per-page state, other than tracking the vmalloc allocation they're
> assigned to.  We'll see how that theory turns out.

I think the tricky part would be vmalloc having pages spanning multiple
nodes, which is not an issue for the MEMCG_VMALLOC stat, but the
vmap-based kernel stack (CONFIG_VMAP_STACK) metric NR_KERNEL_STACK_KB
does care about that information.
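For context, the per-node accounting in question looks roughly like the
following, paraphrased from kernel/fork.c's account_kernel_stack() for
CONFIG_VMAP_STACK (details vary by kernel version). Each backing page is
charged to the node it was allocated from, which is exactly the
information a single per-allocation counter would lose:

static void account_kernel_stack(struct task_struct *tsk, int account)
{
	struct vm_struct *vm = task_stack_vm_area(tsk);
	int i;

	/*
	 * A vmalloc'ed stack can have backing pages on different NUMA
	 * nodes, so each page is charged to its own node's
	 * NR_KERNEL_STACK_KB counter.
	 */
	for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++)
		mod_lruvec_page_state(vm->pages[i], NR_KERNEL_STACK_KB,
				      account * (PAGE_SIZE / 1024));
}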
On Thu, May 23, 2024 at 08:41:25AM -0700, Shakeel Butt wrote:
> On Thu, May 23, 2024 at 02:31:05PM +0100, Matthew Wilcox wrote:
> > On Tue, May 21, 2024 at 12:29:39PM -0700, Shakeel Butt wrote:
> > > On Tue, May 21, 2024 at 03:44:21PM +0100, Matthew Wilcox wrote:
> > > > The memcg should not be attached to the individual pages that make up a
> > > > vmalloc allocation.  Rather, it should be managed by the vmalloc
> > > > allocation itself.  I don't have the knowledge to poke around inside
> > > > vmalloc right now, but maybe somebody else could take that on.
> > >
> > > Are you concerned about accessing just memcg or any field of the
> > > sub-page? There are drivers accessing fields of pages allocated through
> > > vmalloc. Some details at 3b8000ae185c ("mm/vmalloc: huge vmalloc backing
> > > pages should be split rather than compound").
> >
> > Thanks for the pointer, and fb_deferred_io_fault() is already on my
> > hitlist for abusing struct page.
> >
> > My primary concern is that we should track the entire allocation as a
> > single object rather than tracking each page individually.  That means
> > assigning the vmalloc allocation to a memcg rather than assigning each
> > page to a memcg.  It's a lot less overhead to increment the counter once
> > per allocation rather than once per page in the allocation!
> >
> > But secondarily, yes, pages allocated by vmalloc probably don't need
> > any per-page state, other than tracking the vmalloc allocation they're
> > assigned to.  We'll see how that theory turns out.
>
> I think the tricky part would be vmalloc having pages spanning multiple
> nodes, which is not an issue for the MEMCG_VMALLOC stat, but the
> vmap-based kernel stack (CONFIG_VMAP_STACK) metric NR_KERNEL_STACK_KB
> does care about that information.

Yes, we'll have to handle mod_lruvec_page_state() differently since that
stat is tracked per node.  Or we could stop tracking that stat per node.
Is it useful to track it per node?  Why is it useful to track kernel
stacks per node, but not track vmalloc allocations per node?
On Thu, May 23, 2024 at 05:34:27PM GMT, Matthew Wilcox wrote:
> On Thu, May 23, 2024 at 08:41:25AM -0700, Shakeel Butt wrote:
> > On Thu, May 23, 2024 at 02:31:05PM +0100, Matthew Wilcox wrote:
> > > On Tue, May 21, 2024 at 12:29:39PM -0700, Shakeel Butt wrote:
> > > > On Tue, May 21, 2024 at 03:44:21PM +0100, Matthew Wilcox wrote:
> > > > > The memcg should not be attached to the individual pages that make up a
> > > > > vmalloc allocation.  Rather, it should be managed by the vmalloc
> > > > > allocation itself.  I don't have the knowledge to poke around inside
> > > > > vmalloc right now, but maybe somebody else could take that on.
> > > >
> > > > Are you concerned about accessing just memcg or any field of the
> > > > sub-page? There are drivers accessing fields of pages allocated through
> > > > vmalloc. Some details at 3b8000ae185c ("mm/vmalloc: huge vmalloc backing
> > > > pages should be split rather than compound").
> > >
> > > Thanks for the pointer, and fb_deferred_io_fault() is already on my
> > > hitlist for abusing struct page.
> > >
> > > My primary concern is that we should track the entire allocation as a
> > > single object rather than tracking each page individually.  That means
> > > assigning the vmalloc allocation to a memcg rather than assigning each
> > > page to a memcg.  It's a lot less overhead to increment the counter once
> > > per allocation rather than once per page in the allocation!
> > >
> > > But secondarily, yes, pages allocated by vmalloc probably don't need
> > > any per-page state, other than tracking the vmalloc allocation they're
> > > assigned to.  We'll see how that theory turns out.
> >
> > I think the tricky part would be vmalloc having pages spanning multiple
> > nodes, which is not an issue for the MEMCG_VMALLOC stat, but the
> > vmap-based kernel stack (CONFIG_VMAP_STACK) metric NR_KERNEL_STACK_KB
> > does care about that information.
>
> Yes, we'll have to handle mod_lruvec_page_state() differently since that
> stat is tracked per node.  Or we could stop tracking that stat per node.
> Is it useful to track it per node?  Why is it useful to track kernel
> stacks per node, but not track vmalloc allocations per node?

This is a good question, and other than the fact that there are
user-visible APIs exposing it (per-numa meminfo & memory.numa_stat), I
don't have a good answer.
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 030d34e9d117..8abc70cc7219 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -443,11 +443,6 @@ static inline struct mem_cgroup *folio_memcg(struct folio *folio)
 	return __folio_memcg(folio);
 }
 
-static inline struct mem_cgroup *page_memcg(struct page *page)
-{
-	return folio_memcg(page_folio(page));
-}
-
 /**
  * folio_memcg_rcu - Locklessly get the memory cgroup associated with a folio.
  * @folio: Pointer to the folio.
@@ -1014,7 +1009,7 @@ static inline void mod_memcg_page_state(struct page *page,
 		return;
 
 	rcu_read_lock();
-	memcg = page_memcg(page);
+	memcg = folio_memcg(page_folio(page));
 	if (memcg)
 		mod_memcg_state(memcg, idx, val);
 	rcu_read_unlock();
@@ -1133,11 +1128,6 @@ static inline struct mem_cgroup *folio_memcg(struct folio *folio)
 	return NULL;
 }
 
-static inline struct mem_cgroup *page_memcg(struct page *page)
-{
-	return NULL;
-}
-
 static inline struct mem_cgroup *folio_memcg_rcu(struct folio *folio)
 {
 	WARN_ON_ONCE(!rcu_read_lock_held());
@@ -1636,7 +1626,7 @@ static inline void unlock_page_lruvec_irqrestore(struct lruvec *lruvec,
 	spin_unlock_irqrestore(&lruvec->lru_lock, flags);
 }
 
-/* Test requires a stable page->memcg binding, see page_memcg() */
+/* Test requires a stable page->memcg binding, see folio_memcg() */
 static inline bool folio_matches_lruvec(struct folio *folio,
 					struct lruvec *lruvec)
 {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 54070687aad2..72833f6f0944 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3811,7 +3811,7 @@ void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
 #endif /* CONFIG_MEMCG_KMEM */
 
 /*
- * Because page_memcg(head) is not set on tails, set it now.
+ * Because folio_memcg(head) is not set on tails, set it now.
  */
 void split_page_memcg(struct page *head, int old_order, int new_order)
 {
page_memcg() is only called by mod_memcg_page_state(), so squash it into
its caller and remove page_memcg().

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 include/linux/memcontrol.h | 14 ++------------
 mm/memcontrol.c            |  2 +-
 2 files changed, 3 insertions(+), 13 deletions(-)