
[v1] mm/migrate_device: don't add folio to be freed to LRU in migrate_device_finalize()

Message ID 20250210161317.717936-1-david@redhat.com
State New
Series [v1] mm/migrate_device: don't add folio to be freed to LRU in migrate_device_finalize()

Commit Message

David Hildenbrand Feb. 10, 2025, 4:13 p.m. UTC
If migration succeeded, we called
folio_migrate_flags()->mem_cgroup_migrate() to migrate the memcg from
the old to the new folio. This will set memcg_data of the old folio to
0.

Similarly, if migration failed, memcg_data of the dst folio is left
unset.
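
To make the two states above concrete, here is a minimal user-space sketch. It is a toy model, not kernel code: `struct toy_folio`, `toy_mem_cgroup_migrate()` and `toy_migrate()` are hypothetical names, and the only behaviour modelled is the one described above (a successful migration moves memcg_data from src to dst and zeroes src; a failed one never assigns dst's memcg_data).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct folio: only the field discussed above. */
struct toy_folio {
	unsigned long memcg_data; /* 0 means "no memcg attached" */
};

/*
 * Hypothetical model of folio_migrate_flags()->mem_cgroup_migrate() on
 * success: the memcg moves from old to new and old is left with 0.
 */
static void toy_mem_cgroup_migrate(struct toy_folio *old, struct toy_folio *new)
{
	assert(new->memcg_data == 0); /* the new folio starts out uncharged */
	new->memcg_data = old->memcg_data;
	old->memcg_data = 0;
}

/*
 * Only a successful migration reaches the memcg transfer. On failure
 * nothing assigns dst->memcg_data, so it keeps its initial value of 0.
 */
static void toy_migrate(struct toy_folio *src, struct toy_folio *dst,
			bool succeeded)
{
	if (succeeded)
		toy_mem_cgroup_migrate(src, dst);
}
```

Starting from a charged src and an uncharged dst, toy_migrate(src, dst, true) leaves src at 0, while toy_migrate(src, dst, false) leaves dst at 0: either way, one of the folios finalize later releases carries no memcg.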

If we call folio_putback_lru() on such folios (memcg_data == 0), we will
add the folio to be freed to the LRU, making memcg code unhappy. Running
the hmm selftests:

  # ./hmm-tests
  ...
  #  RUN           hmm.hmm_device_private.migrate ...
  [  102.078007][T14893] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x7ff27d200 pfn:0x13cc00
  [  102.079974][T14893] anon flags: 0x17ff00000020018(uptodate|dirty|swapbacked|node=0|zone=2|lastcpupid=0x7ff)
  [  102.082037][T14893] raw: 017ff00000020018 dead000000000100 dead000000000122 ffff8881353896c9
  [  102.083687][T14893] raw: 00000007ff27d200 0000000000000000 00000001ffffffff 0000000000000000
  [  102.085331][T14893] page dumped because: VM_WARN_ON_ONCE_FOLIO(!memcg && !mem_cgroup_disabled())
  [  102.087230][T14893] ------------[ cut here ]------------
  [  102.088279][T14893] WARNING: CPU: 0 PID: 14893 at ./include/linux/memcontrol.h:726 folio_lruvec_lock_irqsave+0x10e/0x170
  [  102.090478][T14893] Modules linked in:
  [  102.091244][T14893] CPU: 0 UID: 0 PID: 14893 Comm: hmm-tests Not tainted 6.13.0-09623-g6c216bc522fd #151
  [  102.093089][T14893] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
  [  102.094848][T14893] RIP: 0010:folio_lruvec_lock_irqsave+0x10e/0x170
  [  102.096104][T14893] Code: ...
  [  102.099908][T14893] RSP: 0018:ffffc900236c37b0 EFLAGS: 00010293
  [  102.101152][T14893] RAX: 0000000000000000 RBX: ffffea0004f30000 RCX: ffffffff8183f426
  [  102.102684][T14893] RDX: ffff8881063cb880 RSI: ffffffff81b8117f RDI: ffff8881063cb880
  [  102.104227][T14893] RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000000
  [  102.105757][T14893] R10: 0000000000000001 R11: 0000000000000002 R12: ffffc900236c37d8
  [  102.107296][T14893] R13: ffff888277a2bcb0 R14: 000000000000001f R15: 0000000000000000
  [  102.108830][T14893] FS:  00007ff27dbdd740(0000) GS:ffff888277a00000(0000) knlGS:0000000000000000
  [  102.110643][T14893] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [  102.111924][T14893] CR2: 00007ff27d400000 CR3: 000000010866e000 CR4: 0000000000750ef0
  [  102.113478][T14893] PKRU: 55555554
  [  102.114172][T14893] Call Trace:
  [  102.114805][T14893]  <TASK>
  [  102.115397][T14893]  ? folio_lruvec_lock_irqsave+0x10e/0x170
  [  102.116547][T14893]  ? __warn.cold+0x110/0x210
  [  102.117461][T14893]  ? folio_lruvec_lock_irqsave+0x10e/0x170
  [  102.118667][T14893]  ? report_bug+0x1b9/0x320
  [  102.119571][T14893]  ? handle_bug+0x54/0x90
  [  102.120494][T14893]  ? exc_invalid_op+0x17/0x50
  [  102.121433][T14893]  ? asm_exc_invalid_op+0x1a/0x20
  [  102.122435][T14893]  ? __wake_up_klogd.part.0+0x76/0xd0
  [  102.123506][T14893]  ? dump_page+0x4f/0x60
  [  102.124352][T14893]  ? folio_lruvec_lock_irqsave+0x10e/0x170
  [  102.125500][T14893]  folio_batch_move_lru+0xd4/0x200
  [  102.126577][T14893]  ? __pfx_lru_add+0x10/0x10
  [  102.127505][T14893]  __folio_batch_add_and_move+0x391/0x720
  [  102.128633][T14893]  ? __pfx_lru_add+0x10/0x10
  [  102.129550][T14893]  folio_putback_lru+0x16/0x80
  [  102.130564][T14893]  migrate_device_finalize+0x9b/0x530
  [  102.131640][T14893]  dmirror_migrate_to_device.constprop.0+0x7c5/0xad0
  [  102.133047][T14893]  dmirror_fops_unlocked_ioctl+0x89b/0xc80

Likely, nothing else goes wrong: putting the last folio reference will
remove the folio from the LRU again. So besides memcg complaining,
adding the folio to be freed to the LRU is just an unnecessary step.
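
The redundancy can be illustrated with a toy model, assuming that folio_putback_lru() boils down to an LRU add followed by folio_put(), as the call trace above suggests. All names below (`toy_lru_add()`, `toy_folio_put()`, `toy_putback_lru()`, the warning counter) are hypothetical; the check in `toy_lru_add()` only mimics the VM_WARN_ON_ONCE_FOLIO() from the trace and, unlike it, counts every hit.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct folio with just enough state for this example. */
struct toy_folio {
	int refcount;
	bool on_lru;
	bool freed;
	unsigned long memcg_data; /* 0 means "no memcg attached" */
};

/* Counts how often the toy memcg check fired. */
static int toy_memcg_warnings;

/*
 * Hypothetical LRU add: finding the lruvec needs a memcg, so an uncharged
 * folio trips the check that stands in for
 * VM_WARN_ON_ONCE_FOLIO(!memcg && !mem_cgroup_disabled()).
 */
static void toy_lru_add(struct toy_folio *f)
{
	if (f->memcg_data == 0)
		toy_memcg_warnings++;
	f->on_lru = true;
}

/* Dropping the last reference frees the folio and takes it off the LRU. */
static void toy_folio_put(struct toy_folio *f)
{
	assert(f->refcount > 0);
	if (--f->refcount == 0) {
		f->on_lru = false;
		f->freed = true;
	}
}

/* Old finalize path for a non-device folio: LRU add, then put. */
static void toy_putback_lru(struct toy_folio *f)
{
	toy_lru_add(f);
	toy_folio_put(f);
}
```

In this model an uncharged folio with a single reference ends up freed and off the LRU whether it goes through toy_putback_lru() or a plain toy_folio_put(); the only observable effect of the LRU detour is the warning.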

The new flow resembles what we have in migrate_folio_move(): add the
dst to the lru, remove migration ptes, unlock and unref dst.
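
A toy sketch of that per-entry order follows. It assumes a simplified refcount model in which restoring the mapping (the stand-in for remove_migration_ptes()) takes one reference on dst, and that in the failure case the unused new folio has already been released, so the caller passes dst == src. `toy_finalize_one()` and the other helpers are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct folio with just enough state for this example. */
struct toy_folio {
	int refcount;
	bool locked;
	bool on_lru;
	bool zone_device;
	unsigned long memcg_data; /* 0 means "no memcg attached" */
};

static int toy_memcg_warnings; /* hits of the toy memcg check */

static void toy_lru_add(struct toy_folio *f)
{
	if (f->memcg_data == 0)
		toy_memcg_warnings++;
	f->on_lru = true;
}

/* Dropping the last reference takes the folio off the LRU. */
static void toy_folio_put(struct toy_folio *f)
{
	assert(f->refcount > 0);
	if (--f->refcount == 0)
		f->on_lru = false;
}

/* Stand-in for remove_migration_ptes(): the restored mapping pins dst. */
static void toy_remove_migration_ptes(struct toy_folio *src,
				      struct toy_folio *dst)
{
	(void)src;
	dst->refcount++;
}

/*
 * Per-entry order after the patch: only dst, the folio that stays mapped,
 * is added to the LRU (unless it is device memory); src and dst are then
 * released with a plain put. On failure the caller passes dst == src.
 */
static void toy_finalize_one(struct toy_folio *src, struct toy_folio *dst)
{
	if (!dst->zone_device)
		toy_lru_add(dst);
	toy_remove_migration_ptes(src, dst);
	src->locked = false;
	toy_folio_put(src);

	if (dst != src) {
		dst->locked = false;
		toy_folio_put(dst);
	}
}
```

In this model the uncharged src from a successful migration is simply freed without ever touching the LRU, and only the charged, still-mapped folio is added to it.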

Fixes: 8763cb45ab96 ("mm/migrate: new memory migration helper for use with device memory")
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Alistair Popple <apopple@nvidia.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/migrate_device.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)


base-commit: e5b2a356dc8a88708d97bd47cca3b8f7ed7af6cb

Comments

Alistair Popple Feb. 11, 2025, 5:23 a.m. UTC | #1
On Mon, Feb 10, 2025 at 05:13:17PM +0100, David Hildenbrand wrote:
> If migration succeeded, we called
> folio_migrate_flags()->mem_cgroup_migrate() to migrate the memcg from
> the old to the new folio. This will set memcg_data of the old folio to
> 0.
> 
> Similarly, if migration failed, memcg_data of the dst folio is left
> unset.
> 
> If we call folio_putback_lru() on such folios (memcg_data == 0), we will
> add the folio to be freed to the LRU, making memcg code unhappy. Running
> the hmm selftests:
> 
>   # ./hmm-tests
>   ...
>   #  RUN           hmm.hmm_device_private.migrate ...
>   [...]
> 
> Likely, nothing else goes wrong: putting the last folio reference will
> remove the folio from the LRU again. So besides memcg complaining,
> adding the folio to be freed to the LRU is just an unnecessary step.

Agreed - I had always wondered why we did that instead of just dropping the
reference but figured it was something to do with the LRU batching and never
looked too closely.

> The new flow resembles what we have in migrate_folio_move(): add the
> dst to the lru, remove migration ptes, unlock and unref dst.
> 
> Fixes: 8763cb45ab96 ("mm/migrate: new memory migration helper for use with device memory")

If this was introduced by the above I was trying to figure out why I hadn't
seen it, because whilst I don't religiously run hmm-tests and similar users
with CONFIG_DEBUG_VM I do run them often enough that I'd expect to have seen
the above. It turns out that prior to 85ce2c517ade ("memcontrol: only transfer
the memcg data for migration") you can't hit this, probably because pages were
double charged during migration so old->memcg_data remained set. So perhaps the
fixes tag should point at that, but maybe it was always wrong, I'm not familiar
enough with memcg to comment.

Anyway the fix looks reasonable and works for me so you can add:

Reviewed-by: Alistair Popple <apopple@nvidia.com>
Tested-by: Alistair Popple <apopple@nvidia.com>

> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Alistair Popple <apopple@nvidia.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  mm/migrate_device.c | 13 ++++---------
>  1 file changed, 4 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
> index 9cf26592ac934..5bd888223cc8b 100644
> --- a/mm/migrate_device.c
> +++ b/mm/migrate_device.c
> @@ -840,20 +840,15 @@ void migrate_device_finalize(unsigned long *src_pfns,
>  			dst = src;
>  		}
>  
> +		if (!folio_is_zone_device(dst))
> +			folio_add_lru(dst);
>  		remove_migration_ptes(src, dst, 0);
>  		folio_unlock(src);
> -
> -		if (folio_is_zone_device(src))
> -			folio_put(src);
> -		else
> -			folio_putback_lru(src);
> +		folio_put(src);
>  
>  		if (dst != src) {
>  			folio_unlock(dst);
> -			if (folio_is_zone_device(dst))
> -				folio_put(dst);
> -			else
> -				folio_putback_lru(dst);
> +			folio_put(dst);
>  		}
>  	}
>  }
> 
> base-commit: e5b2a356dc8a88708d97bd47cca3b8f7ed7af6cb
> -- 
> 2.48.1
>
David Hildenbrand Feb. 11, 2025, 9:05 a.m. UTC | #2
On 11.02.25 06:23, Alistair Popple wrote:
> On Mon, Feb 10, 2025 at 05:13:17PM +0100, David Hildenbrand wrote:
>> If migration succeeded, we called
>> folio_migrate_flags()->mem_cgroup_migrate() to migrate the memcg from
>> the old to the new folio. This will set memcg_data of the old folio to
>> 0.
>>
>> Similarly, if migration failed, memcg_data of the dst folio is left
>> unset.
>>
>> If we call folio_putback_lru() on such folios (memcg_data == 0), we will
>> add the folio to be freed to the LRU, making memcg code unhappy. Running
>> the hmm selftests:
>>
>>    # ./hmm-tests
>>    ...
>>    #  RUN           hmm.hmm_device_private.migrate ...
>>    [...]
>>
>> Likely, nothing else goes wrong: putting the last folio reference will
>> remove the folio from the LRU again. So besides memcg complaining,
>> adding the folio to be freed to the LRU is just an unnecessary step.
> 
> Agreed - I had always wondered why we did that instead of just dropping the
> reference but figured it was something to do with the LRU batching and never
> looked too closely.
> 
>> The new flow resembles what we have in migrate_folio_move(): add the
>> dst to the lru, remove migration ptes, unlock and unref dst.
>>
>> Fixes: 8763cb45ab96 ("mm/migrate: new memory migration helper for use with device memory")
> 
> If this was introduced by the above I was trying to figure out why I hadn't
> seen it, because whilst I don't religiously run hmm-tests and similar users
> with CONFIG_DEBUG_VM I do run them often enough that I'd expect to have seen
> the above. It turns out that prior to 85ce2c517ade ("memcontrol: only transfer
> the memcg data for migration") you can't hit this, probably because pages were
> double charged during migration so old->memcg_data remained set. So perhaps the
> fixes tag should point at that, but maybe it was always wrong, I'm not familiar
> enough with memcg to comment.

That would likely explain why we haven't seen it on the "migration
succeeded" case when dropping src.

However, not so sure on the "migration failed" case, when we would drop 
dst. I would assume that the new folio (dst) would not be charged until 
we reached mem_cgroup_migrate() -- IOW, migration succeeded?
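
The reasoning in the two paragraphs above can be captured in a small model that contrasts the two memcg behaviours. Both policies are assumptions: TOY_DOUBLE_CHARGE encodes Alistair's guess about the behaviour before 85ce2c517ade (new folio charged, old one keeps its memcg_data), and TOY_TRANSFER the behaviour described in the commit message. In both, the memcg is only touched once migration has succeeded, which is the assumption behind the question above. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct folio: only the field discussed above. */
struct toy_folio {
	unsigned long memcg_data; /* 0 means "no memcg attached" */
};

enum toy_memcg_policy {
	TOY_DOUBLE_CHARGE, /* assumed older behaviour: old keeps its memcg */
	TOY_TRANSFER,      /* behaviour per the commit message: old is cleared */
};

/* Hypothetical model; the memcg is only touched when migration succeeds. */
static void toy_migrate(struct toy_folio *src, struct toy_folio *dst,
			bool succeeded, enum toy_memcg_policy policy)
{
	assert(dst->memcg_data == 0); /* the new folio starts out uncharged */
	if (!succeeded)
		return; /* dst is never charged on this path */
	dst->memcg_data = src->memcg_data;
	if (policy == TOY_TRANSFER)
		src->memcg_data = 0;
}

/*
 * Would the folio that finalize releases without keeping it (src on
 * success, dst on failure) have memcg_data == 0?
 */
static bool toy_leftover_uncharged(bool succeeded, enum toy_memcg_policy policy)
{
	struct toy_folio src = { .memcg_data = 0x1000 }, dst = { 0 };

	toy_migrate(&src, &dst, succeeded, policy);
	return (succeeded ? src.memcg_data : dst.memcg_data) == 0;
}
```

Under these assumptions, double charging would only hide the success case; on the failure path the leftover dst is uncharged with either policy.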

Thanks for the review!
Alistair Popple Feb. 11, 2025, 10:33 p.m. UTC | #3
On Tue, Feb 11, 2025 at 10:05:01AM +0100, David Hildenbrand wrote:
> On 11.02.25 06:23, Alistair Popple wrote:
> > On Mon, Feb 10, 2025 at 05:13:17PM +0100, David Hildenbrand wrote:
> > > If migration succeeded, we called
> > > folio_migrate_flags()->mem_cgroup_migrate() to migrate the memcg from
> > > the old to the new folio. This will set memcg_data of the old folio to
> > > 0.
> > > 
> > > Similarly, if migration failed, memcg_data of the dst folio is left
> > > unset.
> > > 
> > > If we call folio_putback_lru() on such folios (memcg_data == 0), we will
> > > add the folio to be freed to the LRU, making memcg code unhappy. Running
> > > the hmm selftests:
> > > 
> > >    # ./hmm-tests
> > >    ...
> > >    #  RUN           hmm.hmm_device_private.migrate ...
> > >    [...]
> > > 
> > > Likely, nothing else goes wrong: putting the last folio reference will
> > > remove the folio from the LRU again. So besides memcg complaining,
> > > adding the folio to be freed to the LRU is just an unnecessary step.
> > 
> > Agreed - I had always wondered why we did that instead of just dropping the
> > reference but figured it was something to do with the LRU batching and never
> > looked too closely.
> > 
> > > The new flow resembles what we have in migrate_folio_move(): add the
> > > dst to the lru, remove migration ptes, unlock and unref dst.
> > > 
> > > Fixes: 8763cb45ab96 ("mm/migrate: new memory migration helper for use with device memory")
> > 
> > If this was introduced by the above I was trying to figure out why I hadn't
> > seen it, because whilst I don't religiously run hmm-tests and similar users
> > with CONFIG_DEBUG_VM I do run them often enough that I'd expect to have seen
> > the above. It turns out that prior to 85ce2c517ade ("memcontrol: only transfer
> > the memcg data for migration") you can't hit this, probably because pages were
> > double charged during migration so old->memcg_data remained set. So perhaps the
> > fixes tag should point at that, but maybe it was always wrong, I'm not familiar
> > enough with memcg to comment.
> 
> That would likely explain why we haven't seen it on the "migration succeeded"
> case when dropping src.
> 
> However, not so sure on the "migration failed" case, when we would drop dst.
> I would assume that the new folio (dst) would not be charged until we
> reached mem_cgroup_migrate() -- IOW, migration succeeded?

Hmm, good point. I don't think we actually have any good tests for the
migration-failed case, and mostly migration does succeed, so I can believe I
haven't hit that path on a development kernel. We don't have any test case that
forces migration failure; probably I should add one.

> Thanks for the review!
> 
> -- 
> Cheers,
> 
> David / dhildenb
>

Patch

diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 9cf26592ac934..5bd888223cc8b 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -840,20 +840,15 @@ void migrate_device_finalize(unsigned long *src_pfns,
 			dst = src;
 		}
 
+		if (!folio_is_zone_device(dst))
+			folio_add_lru(dst);
 		remove_migration_ptes(src, dst, 0);
 		folio_unlock(src);
-
-		if (folio_is_zone_device(src))
-			folio_put(src);
-		else
-			folio_putback_lru(src);
+		folio_put(src);
 
 		if (dst != src) {
 			folio_unlock(dst);
-			if (folio_is_zone_device(dst))
-				folio_put(dst);
-			else
-				folio_putback_lru(dst);
+			folio_put(dst);
 		}
 	}
 }