diff mbox series

[v2,2/2] mm,page_owner: Fix accounting of pages when migrating

Message ID 20240319183212.17156-3-osalvador@suse.de (mailing list archive)
State New
Headers show
Series page_owner: Refcount fixups | expand

Commit Message

Oscar Salvador March 19, 2024, 6:32 p.m. UTC
Upon migration, new allocated pages are being given the handle of the old
pages. This is problematic because it means that for the stack which
allocated the old page, we will be substracting the old page + the new one
when that page is freed, creating an accounting imbalance.

Fix this by adding a new migrate_handle in the page_owner struct, and
record the handle that allocated the new page in __folio_copy_owner().
Upon freeing, we check whether we have a migrate_handle, and if we do,
we use migrate_handle for dec_stack_record_count(), which will
subtract those pages from its right handle.

Fixes: 217b2119b9e2 ("mm,page_owner: implement the tracking of the stacks count")
Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 mm/page_owner.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

Comments

Matthew Wilcox March 19, 2024, 6:48 p.m. UTC | #1
On Tue, Mar 19, 2024 at 07:32:12PM +0100, Oscar Salvador wrote:
> Upon migration, new allocated pages are being given the handle of the old
> pages. This is problematic because it means that for the stack which
> allocated the old page, we will be substracting the old page + the new one
> when that page is freed, creating an accounting imbalance.
> 
> Fix this by adding a new migrate_handle in the page_owner struct, and
> record the handle that allocated the new page in __folio_copy_owner().
> Upon freeing, we check whether we have a migrate_handle, and if we do,
> we use migrate_handle for dec_stack_record_count(), which will
> subtract those pages from its right handle.

Is this the right way to fix this problem?  I would have thought we'd
be better off accounting this as migration freeing the old page and
allocating the new page.  If I understand correctly, this is the code
which says "This page was last allocated by X and freed by Y", and I
would think that being last freed (or allocated) by the migration code
would be a very nice hint about where a problem might stem from.
Oscar Salvador March 20, 2024, 5 a.m. UTC | #2
On Tue, Mar 19, 2024 at 06:48:31PM +0000, Matthew Wilcox wrote:
> Is this the right way to fix this problem?  I would have thought we'd
> be better off accounting this as migration freeing the old page and
> allocating the new page.  If I understand correctly, this is the code
> which says "This page was last allocated by X and freed by Y", and I
> would think that being last freed (or allocated) by the migration code
> would be a very nice hint about where a problem might stem from.

I hear you, and I had the same thought when I stumbled upon this.
I did not know that the handle was being changed, otherwise it would
have saved me quite a lot of debugging time.

Checking the history of this, I can see this decision was made in
2016 in:

 commit d435edca928805074dae005ab9a42d9fa60fc702
 Author: Vlastimil Babka <vbabka@suse.cz>
 Date:   Tue Mar 15 14:56:15 2016 -0700
 
     mm, page_owner: copy page owner info during migration


And let me quote:

   "The page_owner mechanism stores gfp_flags of an allocation and stack
    trace that lead to it.  During page migration, the original information
    is practically replaced by the allocation of free page as the migration
    target.  Arguably this is less useful and might lead to all the
    page_owner info for migratable pages gradually converge towards
    compaction or numa balancing migrations.  It has also lead to
    inaccuracies such as one fixed by commit e2cfc91120fa ("mm/page_owner:
    set correct gfp_mask on page_owner")."

A following patch stored the migration reason in last_migrate_reason,
and the patch also add a bit of information if last_migrate_reason was
other than 0:

 +       if (page_ext->last_migrate_reason != -1) {
 +               ret += snprintf(kbuf + ret, count - ret,
 +                       "Page has been migrated, last migrate reason: %s\n",
 +                       migrate_reason_names[page_ext->last_migrate_reason]);
 +               if (ret >= count)
 +                       goto err;
 +       }

Now, thinking about this some more, it kind of makes sense, because one
of the things page_owner is used for, in my experience, is for memory
leaks.
We print the output, try to correlate allocation/free operations per
stack so one can easily spot a stack that just keeps allocating memory
and never frees it (it might be legit, and not a leak, but it gives a
hint).

Now imagine there are 10k pages pointing to stack A.
If those pages were to be migrated e.g: kcompactd jumps in, we will lose
the original stack and replace it with something like:

migrate_pages()
...
...
kcompatd

After that, stack A does not have those 10k pages pointing to it
anymore, although it stills "own" them, just that got replaced by
new ones due to migration.

This kind of defeats the purpose of page_owner.
And after all, we do record some migration information in those new
pages, which will give us a hint when looking at the output.

So, all in all, I am leaning towards "this is fine".
Vlastimil Babka March 21, 2024, 10:50 a.m. UTC | #3
On 3/20/24 06:00, Oscar Salvador wrote:
> On Tue, Mar 19, 2024 at 06:48:31PM +0000, Matthew Wilcox wrote:
>> Is this the right way to fix this problem?  I would have thought we'd
>> be better off accounting this as migration freeing the old page and
>> allocating the new page.  If I understand correctly, this is the code
>> which says "This page was last allocated by X and freed by Y", and I
>> would think that being last freed (or allocated) by the migration code
>> would be a very nice hint about where a problem might stem from.
> 
> I hear you, and I had the same thought when I stumbled upon this.
> I did not know that the handle was being changed, otherwise it would
> have saved me quite a lot of debugging time.
> 
> Checking the history of this, I can see this decision was made in
> 2016 in:
> 
>  commit d435edca928805074dae005ab9a42d9fa60fc702
>  Author: Vlastimil Babka <vbabka@suse.cz>
>  Date:   Tue Mar 15 14:56:15 2016 -0700
>  
>      mm, page_owner: copy page owner info during migration
> 

Yeah I think we could keep that logic.

But we could also simply subtract the refcount of the old handle (the
"allocated for migration") in __folio_copy_owner() no? Then we wouldn't need
the extra migrate_handle.

Also we might have more issues here. Most page owner code takes care to set
everything for all pages within a folio, but __folio_copy_owner() and
__set_page_owner_migrate_reason() don't.
Oscar Salvador March 21, 2024, 11:07 a.m. UTC | #4
On Thu, Mar 21, 2024 at 11:50:36AM +0100, Vlastimil Babka wrote:
> Yeah I think we could keep that logic.

I am all for keeping it.

> But we could also simply subtract the refcount of the old handle (the
> "allocated for migration") in __folio_copy_owner() no? Then we wouldn't need
> the extra migrate_handle.

Since new_page will have the old handle pointing to the old stack after
the call, we
could uncharge the old_page to the migrate_stack, which new_page->_handle holds
before it gets changed.

So we basically swap it.

It could work, but I kinda have a bittersweet feeling here.
I am trying to work towards to reduce the number of lookups in the
hlist, but for the approach described above I would need to lookup
the stack for new_page->handle in order to substract the page.

OTHO, I understand that adding migrate_handle kinda wasted memory.
16MB for 16GB of memory.

> Also we might have more issues here. Most page owner code takes care to set
> everything for all pages within a folio, but __folio_copy_owner() and
> __set_page_owner_migrate_reason() don't.

I did not check deeply but do not we split the folio upon migration
in case it is large?
Which means we should reach split_page_owner() before the copy takes
place. Do I get it right?
Vlastimil Babka March 21, 2024, 11:20 a.m. UTC | #5
On 3/21/24 12:07, Oscar Salvador wrote:
> On Thu, Mar 21, 2024 at 11:50:36AM +0100, Vlastimil Babka wrote:
>> Yeah I think we could keep that logic.
> 
> I am all for keeping it.
> 
>> But we could also simply subtract the refcount of the old handle (the
>> "allocated for migration") in __folio_copy_owner() no? Then we wouldn't need
>> the extra migrate_handle.
> 
> Since new_page will have the old handle pointing to the old stack after
> the call, we
> could uncharge the old_page to the migrate_stack, which new_page->_handle holds
> before it gets changed.
> 
> So we basically swap it.
> 
> It could work, but I kinda have a bittersweet feeling here.
> I am trying to work towards to reduce the number of lookups in the
> hlist, but for the approach described above I would need to lookup
> the stack for new_page->handle in order to substract the page.

Understood, but migration is kinda heavy and non-fast-path operation already
so the extra lookup wouldn't be in a critical fast path.

> OTHO, I understand that adding migrate_handle kinda wasted memory.
> 16MB for 16GB of memory.
> 
>> Also we might have more issues here. Most page owner code takes care to set
>> everything for all pages within a folio, but __folio_copy_owner() and
>> __set_page_owner_migrate_reason() don't.
> 
> I did not check deeply but do not we split the folio upon migration
> in case it is large?
> Which means we should reach split_page_owner() before the copy takes
> place. Do I get it right?

When I mean is we have __set_page_owner_handle() setting up everything for
tail pages and then we have __folio_copy_owner updating only the head page,
so this will create kinda a mixed up information. Which might not be an
issue as __folio_copy_owner() should mean it's a proper folio thus compound
page thus nobody ever will check those mismatched tail pages... Maybe we
could adjust  __set_page_owner_handle() to skip tail pages for compound
pages as well and unify this, and tail pages would be only setup for those
weird high-order non-compound pages so that the odd case in __free_pages()
works?

Or maybe page_owner is considered debugging functionality enough so it might
be worth having the redundant data in tail pages in case something goes
wrong. But either way now it seems we're not doing it consistently.
Oscar Salvador March 21, 2024, 11:54 a.m. UTC | #6
On Thu, Mar 21, 2024 at 12:20:24PM +0100, Vlastimil Babka wrote:
> Understood, but migration is kinda heavy and non-fast-path operation already
> so the extra lookup wouldn't be in a critical fast path.

Ok, you convinced me, let us save that memory.

> When I mean is we have __set_page_owner_handle() setting up everything for
> tail pages and then we have __folio_copy_owner updating only the head page,
> so this will create kinda a mixed up information. Which might not be an
> issue as __folio_copy_owner() should mean it's a proper folio thus compound
> page thus nobody ever will check those mismatched tail pages... Maybe we
> could adjust  __set_page_owner_handle() to skip tail pages for compound
> pages as well and unify this, and tail pages would be only setup for those
> weird high-order non-compound pages so that the odd case in __free_pages()
> works?
> 
> Or maybe page_owner is considered debugging functionality enough so it might
> be worth having the redundant data in tail pages in case something goes
> wrong. But either way now it seems we're not doing it consistently.

So we basically have two options here, 1) is to also update tail pages
in __folio_copy_owner, and 2) is to follow __folio_copy_owner example
and skip tail pages in __set_page_owner.

The thing is, going with 2) might mean, as you pointed out, that if
something goes wrong we lose the information for those pages as
page_owner does not have a record for them.

OTOH, would that information be really useful? Sure we could see the stack, but
AFAICS not even the migrate_reason would be recorded in those tail
pages, which means that if they are migrated and freed, we would only
see that they were freed. (the free stack would point to the migrate
code though, so that is a hint).

Not really sure which path to take here, skipping tail pages would be
less of a complexity but at a risk of loosing information.

Anyway, let me first tackle the issue at hand here, and then I will
think more about that.

Thanks!
diff mbox series

Patch

diff --git a/mm/page_owner.c b/mm/page_owner.c
index 2613805cb665..1a7d0d1dc640 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -27,6 +27,7 @@  struct page_owner {
 	gfp_t gfp_mask;
 	depot_stack_handle_t handle;
 	depot_stack_handle_t free_handle;
+	depot_stack_handle_t migrate_handle;
 	u64 ts_nsec;
 	u64 free_ts_nsec;
 	char comm[TASK_COMM_LEN];
@@ -240,7 +241,15 @@  void __reset_page_owner(struct page *page, unsigned short order)
 		return;
 
 	page_owner = get_page_owner(page_ext);
-	alloc_handle = page_owner->handle;
+	/*
+	 * If this page was allocated for migration purposes, its handle doesn't
+	 * reference the stack it was allocated from, so make sure to use the
+	 * migrate_handle in order to subtract it from the right stack.
+	 */
+	if (!page_owner->migrate_handle)
+		alloc_handle = page_owner->handle;
+	else
+		alloc_handle = page_owner->migrate_handle;
 
 	handle = save_stack(GFP_NOWAIT | __GFP_NOWARN);
 	for (i = 0; i < (1 << order); i++) {
@@ -277,6 +286,7 @@  static inline void __set_page_owner_handle(struct page_ext *page_ext,
 		page_owner->handle = handle;
 		page_owner->order = order;
 		page_owner->gfp_mask = gfp_mask;
+		page_owner->migrate_handle = 0;
 		page_owner->last_migrate_reason = -1;
 		page_owner->pid = current->pid;
 		page_owner->tgid = current->tgid;
@@ -358,6 +368,7 @@  void __folio_copy_owner(struct folio *newfolio, struct folio *old)
 	new_page_owner->gfp_mask = old_page_owner->gfp_mask;
 	new_page_owner->last_migrate_reason =
 		old_page_owner->last_migrate_reason;
+	new_page_owner->migrate_handle = new_page_owner->handle;
 	new_page_owner->handle = old_page_owner->handle;
 	new_page_owner->pid = old_page_owner->pid;
 	new_page_owner->tgid = old_page_owner->tgid;