Message ID: 20240321163705.3067592-21-surenb@google.com (mailing list archive)
State: New
Series: Memory allocation profiling
On Thu, Mar 21, 2024 at 09:36:42AM -0700, Suren Baghdasaryan wrote:
>  static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
>  				   unsigned int nr) {}
>  static inline void pgalloc_tag_sub(struct page *page, unsigned int nr) {}
>  static inline void pgalloc_tag_split(struct page *page, unsigned int nr) {}
> +static inline struct alloc_tag *pgalloc_tag_get(struct page *page) { return NULL; }
> +static inline void pgalloc_tag_sub_pages(struct alloc_tag *tag, unsigned int nr) {}
>
>  #endif /* CONFIG_MEM_ALLOC_PROFILING */
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index fd1cc5b80a56..00e0ae4cbf2d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4700,12 +4700,15 @@ void __free_pages(struct page *page, unsigned int order)
>  {
>  	/* get PageHead before we drop reference */
>  	int head = PageHead(page);
> +	struct alloc_tag *tag = pgalloc_tag_get(page);
>
>  	if (put_page_testzero(page))
>  		free_the_page(page, order);
> -	else if (!head)
> +	else if (!head) {
> +		pgalloc_tag_sub_pages(tag, (1 << order) - 1);
>  		while (order-- > 0)
>  			free_the_page(page + (1 << order), order);
> +	}

Why do you need these new functions instead of just:

+	else if (!head) {
+		pgalloc_tag_sub(page, (1 << order) - 1);
 		while (order-- > 0)
 			free_the_page(page + (1 << order), order);
+	}
On Thu, Mar 21, 2024 at 04:48:53PM +0000, Matthew Wilcox wrote:
> On Thu, Mar 21, 2024 at 09:36:42AM -0700, Suren Baghdasaryan wrote:
> > +++ b/mm/page_alloc.c
> > @@ -4700,12 +4700,15 @@ void __free_pages(struct page *page, unsigned int order)
> >  {
> >  	/* get PageHead before we drop reference */
> >  	int head = PageHead(page);
> > +	struct alloc_tag *tag = pgalloc_tag_get(page);
> >
> >  	if (put_page_testzero(page))
> >  		free_the_page(page, order);
> > -	else if (!head)
> > +	else if (!head) {
> > +		pgalloc_tag_sub_pages(tag, (1 << order) - 1);
> >  		while (order-- > 0)
> >  			free_the_page(page + (1 << order), order);
> > +	}
>
> Why do you need these new functions instead of just:
>
> +	else if (!head) {
> +		pgalloc_tag_sub(page, (1 << order) - 1);
>  		while (order-- > 0)
>  			free_the_page(page + (1 << order), order);
> +	}

Actually, I'm not sure this is safe (I don't fully understand codetags,
so it may be safe). What can happen is that the put_page() can come in
before the pgalloc_tag_sub(), and then that page can be allocated again.
Will that cause confusion?
On Thu, Mar 21, 2024 at 10:04 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Thu, Mar 21, 2024 at 04:48:53PM +0000, Matthew Wilcox wrote:
> > On Thu, Mar 21, 2024 at 09:36:42AM -0700, Suren Baghdasaryan wrote:
> > > +++ b/mm/page_alloc.c
> > > @@ -4700,12 +4700,15 @@ void __free_pages(struct page *page, unsigned int order)
> > >  {
> > >  	/* get PageHead before we drop reference */
> > >  	int head = PageHead(page);
> > > +	struct alloc_tag *tag = pgalloc_tag_get(page);
> > >
> > >  	if (put_page_testzero(page))
> > >  		free_the_page(page, order);
> > > -	else if (!head)
> > > +	else if (!head) {
> > > +		pgalloc_tag_sub_pages(tag, (1 << order) - 1);
> > >  		while (order-- > 0)
> > >  			free_the_page(page + (1 << order), order);
> > > +	}
> >
> > Why do you need these new functions instead of just:
> >
> > +	else if (!head) {
> > +		pgalloc_tag_sub(page, (1 << order) - 1);
> >  		while (order-- > 0)
> >  			free_the_page(page + (1 << order), order);
> > +	}
>
> Actually, I'm not sure this is safe (I don't fully understand codetags,
> so it may be safe). What can happen is that the put_page() can come in
> before the pgalloc_tag_sub(), and then that page can be allocated again.
> Will that cause confusion?

So, there are two reasons I unfortunately can't reuse pgalloc_tag_sub():

1. We need to subtract the `bytes` counter from the codetag but not the
`calls` counter, otherwise the final accounting will be incorrect. This
is because we effectively allocated multiple pages with one call but are
freeing them with separate calls here. pgalloc_tag_sub_pages() subtracts
bytes but keeps the calls counter the same. I mentioned this here:
https://lore.kernel.org/all/CAJuCfpEgh1OiYNE_uKG-BqW2x97sOL9+AaTX4Jct3=WHzAv+kg@mail.gmail.com/

2. The codetag object itself is stable; it's created at build time. The
exception is module unload, when the codetag section gets freed, but
during unloading we check that none of the module's codetags are still
referenced and refuse to free the section if any are (which should not
normally happen). That said, the reference to the codetag (in this case
from the page_ext) might change from under us, so we have to make sure
it's valid. We ensure that here by getting the codetag itself with
pgalloc_tag_get() *before* calling put_page_testzero(), which ensures
its stability.
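To make reason #1 concrete, here is a small userspace sketch of the
bookkeeping for an order-2 allocation whose head page stays pinned by a
speculative reference. The tag_counters struct is a hypothetical stand-in
for the tag's per-cpu counters, not the kernel's actual data structure:

#include <stdio.h>

#define PAGE_SIZE 4096UL

/* Hypothetical stand-in for a tag's per-cpu counters. */
struct tag_counters {
	long bytes;
	long calls;
};

int main(void)
{
	struct tag_counters tag = { 0, 0 };
	unsigned int order = 2;	/* one allocation of 1 << 2 = 4 pages */

	/* alloc_pages() side: one call, four pages accounted */
	tag.bytes += PAGE_SIZE << order;
	tag.calls += 1;

	/*
	 * __free_pages() with a speculative reference pinning the head:
	 * pgalloc_tag_sub_pages() subtracts the (1 << order) - 1 tail
	 * pages from bytes but leaves calls untouched.
	 */
	tag.bytes -= PAGE_SIZE * ((1UL << order) - 1);

	/* The last put_page() later frees the head as an order-0 page,
	 * subtracting one page of bytes and the single call. */
	tag.bytes -= PAGE_SIZE;
	tag.calls -= 1;

	/* Both counters balance: prints bytes=0 calls=0 */
	printf("bytes=%ld calls=%ld\n", tag.bytes, tag.calls);
	return 0;
}

Had pgalloc_tag_sub() been used for the tail pages, calls would have been
decremented twice for a single allocation and ended up at -1.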
On Thu, Mar 21, 2024 at 10:19 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Thu, Mar 21, 2024 at 10:04 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Thu, Mar 21, 2024 at 04:48:53PM +0000, Matthew Wilcox wrote:
> > > On Thu, Mar 21, 2024 at 09:36:42AM -0700, Suren Baghdasaryan wrote:
> > > > +++ b/mm/page_alloc.c
> > > > @@ -4700,12 +4700,15 @@ void __free_pages(struct page *page, unsigned int order)
> > > >  {
> > > >  	/* get PageHead before we drop reference */
> > > >  	int head = PageHead(page);
> > > > +	struct alloc_tag *tag = pgalloc_tag_get(page);
> > > >
> > > >  	if (put_page_testzero(page))
> > > >  		free_the_page(page, order);
> > > > -	else if (!head)
> > > > +	else if (!head) {
> > > > +		pgalloc_tag_sub_pages(tag, (1 << order) - 1);
> > > >  		while (order-- > 0)
> > > >  			free_the_page(page + (1 << order), order);
> > > > +	}
> > >
> > > Why do you need these new functions instead of just:
> > >
> > > +	else if (!head) {
> > > +		pgalloc_tag_sub(page, (1 << order) - 1);
> > >  		while (order-- > 0)
> > >  			free_the_page(page + (1 << order), order);
> > > +	}
> >
> > Actually, I'm not sure this is safe (I don't fully understand codetags,
> > so it may be safe). What can happen is that the put_page() can come in
> > before the pgalloc_tag_sub(), and then that page can be allocated again.
> > Will that cause confusion?

I indirectly answered your question in reason #2 but, to be clear, we
obtain the codetag before we do put_page() here, therefore it's valid.
If another page is allocated and points to the same codetag, it will
operate on the same codetag's per-cpu counters, and that should not be
a problem.

> So, there are two reasons I unfortunately can't reuse pgalloc_tag_sub():
>
> 1. We need to subtract the `bytes` counter from the codetag but not the
> `calls` counter, otherwise the final accounting will be incorrect. This
> is because we effectively allocated multiple pages with one call but are
> freeing them with separate calls here. pgalloc_tag_sub_pages() subtracts
> bytes but keeps the calls counter the same. I mentioned this here:
> https://lore.kernel.org/all/CAJuCfpEgh1OiYNE_uKG-BqW2x97sOL9+AaTX4Jct3=WHzAv+kg@mail.gmail.com/
>
> 2. The codetag object itself is stable; it's created at build time. The
> exception is module unload, when the codetag section gets freed, but
> during unloading we check that none of the module's codetags are still
> referenced and refuse to free the section if any are (which should not
> normally happen). That said, the reference to the codetag (in this case
> from the page_ext) might change from under us, so we have to make sure
> it's valid. We ensure that here by getting the codetag itself with
> pgalloc_tag_get() *before* calling put_page_testzero(), which ensures
> its stability.
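Restating the ordering argument in code form, below is the patched
__free_pages() with the reasoning from this thread folded in as comments.
The comments are annotations on the discussion, not part of the
submitted patch:

void __free_pages(struct page *page, unsigned int order)
{
	/* get PageHead before we drop reference */
	int head = PageHead(page);
	/*
	 * Resolve the tag while our reference still pins the page: once
	 * put_page_testzero() fails, a racing put_page() may free and
	 * reallocate the page, repointing page_ext's codetag reference.
	 * The alloc_tag object itself stays valid: it lives in the
	 * codetag section, and module unload is refused while any of
	 * the module's tags are still referenced.
	 */
	struct alloc_tag *tag = pgalloc_tag_get(page);

	if (put_page_testzero(page))
		free_the_page(page, order);
	else if (!head) {
		/*
		 * A speculative reference pins the head page; account
		 * for the (1 << order) - 1 tail pages freed below.
		 * Bytes only: the allocation was a single call.
		 */
		pgalloc_tag_sub_pages(tag, (1 << order) - 1);
		while (order-- > 0)
			free_the_page(page + (1 << order), order);
	}
}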
diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
index 093edf98c3d7..50d212330bbb 100644
--- a/include/linux/pgalloc_tag.h
+++ b/include/linux/pgalloc_tag.h
@@ -96,12 +96,36 @@ static inline void pgalloc_tag_split(struct page *page, unsigned int nr)
 	page_ext_put(page_ext);
 }
 
+static inline struct alloc_tag *pgalloc_tag_get(struct page *page)
+{
+	struct alloc_tag *tag = NULL;
+
+	if (mem_alloc_profiling_enabled()) {
+		union codetag_ref *ref = get_page_tag_ref(page);
+
+		alloc_tag_sub_check(ref);
+		if (ref && ref->ct)
+			tag = ct_to_alloc_tag(ref->ct);
+		put_page_tag_ref(ref);
+	}
+
+	return tag;
+}
+
+static inline void pgalloc_tag_sub_pages(struct alloc_tag *tag, unsigned int nr)
+{
+	if (mem_alloc_profiling_enabled() && tag)
+		this_cpu_sub(tag->counters->bytes, PAGE_SIZE * nr);
+}
+
 #else /* CONFIG_MEM_ALLOC_PROFILING */
 
 static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
 				   unsigned int nr) {}
 static inline void pgalloc_tag_sub(struct page *page, unsigned int nr) {}
 static inline void pgalloc_tag_split(struct page *page, unsigned int nr) {}
+static inline struct alloc_tag *pgalloc_tag_get(struct page *page) { return NULL; }
+static inline void pgalloc_tag_sub_pages(struct alloc_tag *tag, unsigned int nr) {}
 
 #endif /* CONFIG_MEM_ALLOC_PROFILING */
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fd1cc5b80a56..00e0ae4cbf2d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4700,12 +4700,15 @@ void __free_pages(struct page *page, unsigned int order)
 {
 	/* get PageHead before we drop reference */
 	int head = PageHead(page);
+	struct alloc_tag *tag = pgalloc_tag_get(page);
 
 	if (put_page_testzero(page))
 		free_the_page(page, order);
-	else if (!head)
+	else if (!head) {
+		pgalloc_tag_sub_pages(tag, (1 << order) - 1);
 		while (order-- > 0)
 			free_the_page(page + (1 << order), order);
+	}
 }
 EXPORT_SYMBOL(__free_pages);
When a non-compound multi-order page is freed, it is possible that a
speculative reference keeps the page pinned. In this case we free all
pages except for the first page, which will be freed later by the last
put_page(). However, the page passed to put_page() is indistinguishable
from an order-0 page, so it cannot do the accounting, just as it cannot
free the subsequent pages. Do the accounting here, where we free the
pages.

Reported-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 include/linux/pgalloc_tag.h | 24 ++++++++++++++++++++++++
 mm/page_alloc.c             |  5 ++++-
 2 files changed, 28 insertions(+), 1 deletion(-)
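As a sanity check on the (1 << order) - 1 count, here is a tiny
userspace trace of the tail-freeing loop for order 2. It is illustrative
only: free_the_page() is replaced by a printf and the page pointer
offsets are merely printed:

#include <stdio.h>

int main(void)
{
	unsigned int order = 2;	/* non-compound 4-page allocation */
	unsigned int freed = 0;

	/* Same loop shape as __free_pages(); the head page (offset 0)
	 * is skipped because the speculative reference still pins it. */
	while (order-- > 0) {
		printf("free_the_page(page + %u, order %u)\n",
		       1U << order, order);
		freed += 1U << order;
	}

	/* Prints 3: pages 2-3 go as an order-1 chunk, then page 1. */
	printf("tail pages freed: %u\n", freed);
	return 0;
}

The loop releases exactly three of the four pages, matching the
(1 << 2) - 1 pages that pgalloc_tag_sub_pages() subtracts from the tag's
bytes counter; the pinned head page is accounted when the last put_page()
finally frees it as an order-0 page.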