diff mbox series

[v1] mm/pagewalk: fix usage of pmd_leaf()/pud_leaf() without present check

Message ID 20241015111236.1290921-1-david@redhat.com (mailing list archive)
State New
Headers show
Series [v1] mm/pagewalk: fix usage of pmd_leaf()/pud_leaf() without present check | expand

Commit Message

David Hildenbrand Oct. 15, 2024, 11:12 a.m. UTC
pmd_leaf()/pud_leaf() only implies a pmd_present()/pud_present() check on
some architectures. We really should check for
pmd_present()/pud_present() first.

This should explain the report we got on ppc64 (which has
CONFIG_PGTABLE_HAS_HUGE_LEAVES set in the config) that triggered:
	VM_WARN_ON_ONCE(pmd_leaf(pmdp_get_lockless(pmdp)));

Likely we had a PMD migration entry for which pmd_leaf() did not
trigger. We raced with restoring the PMD migration entry, and suddenly
saw a pmd_leaf(). In this case, pte_offset_map_lock() saved us from more
trouble, because it rechecks the PMD value, but we would not have processed
the migration entry -- which is not too bad because the only user of
FW_MIGRATION is KSM for unsharing, and KSM only applies to small folios.

Further, we shouldn't re-read the PMD/PUD value for our warning, the
primary purpose of the VM_WARN_ON_ONCE() is to find spurious use of
pmd_leaf()/pud_leaf() without CONFIG_PGTABLE_HAS_HUGE_LEAVES.

As a side note, we are currently not implementing FW_MIGRATION support
for PUD migration entries, which likely should exist due to hugetlb. Add
a TODO so this won't fall through the cracks if more FW_MIGRATION users
get added.

Fixes: aa39ca6940f1 ("mm/pagewalk: introduce folio_walk_start() + folio_walk_end()")
Reported-by: syzbot+7d917f67c05066cec295@syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/670d3248.050a0220.3e960.0064.GAE@google.com
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/pagewalk.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

Comments

David Hildenbrand Oct. 15, 2024, 1:13 p.m. UTC | #1
On 15.10.24 13:12, David Hildenbrand wrote:
> pmd_leaf()/pud_leaf() only implies a pmd_present()/pud_present() check on
> some architectures. We really should check for
> pmd_present()/pud_present() first.
> 
> This should explain the report we got on ppc64 (which has
> CONFIG_PGTABLE_HAS_HUGE_LEAVES set in the config) that triggered:
> 	VM_WARN_ON_ONCE(pmd_leaf(pmdp_get_lockless(pmdp)));
> 
> Likely we had a PMD migration entry for which pmd_leaf() did not
> trigger. We raced with restoring the PMD migration entry, and suddenly
> saw a pmd_leaf(). In this case, pte_offset_map_lock() saved us from more
> trouble, because it rechecks the PMD value, but we would not have processed
> the migration entry -- which is not too bad because the only user of
> FW_MIGRATION is KSM for unsharing, and KSM only applies to small folios.
> 
> Further, we shouldn't re-read the PMD/PUD value for our warning, the
> primary purpose of the VM_WARN_ON_ONCE() is to find spurious use of
> pmd_leaf()/pud_leaf() without CONFIG_PGTABLE_HAS_HUGE_LEAVES.
> 
> As a side note, we are currently not implementing FW_MIGRATION support
> for PUD migration entries, which likely should exist due to hugetlb. Add
> a TODO so this won't fall through the cracks if more FW_MIGRATION users
> get added.
> 
> Fixes: aa39ca6940f1 ("mm/pagewalk: introduce folio_walk_start() + folio_walk_end()")
> Reported-by: syzbot+7d917f67c05066cec295@syzkaller.appspotmail.com
> Closes: https://lkml.kernel.org/r/670d3248.050a0220.3e960.0064.GAE@google.com
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Jann Horn <jannh@google.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---

Was able to write a quick reproducer and verify that the issue no longer triggers with this fix.

https://gitlab.com/davidhildenbrand/scratchspace/-/blob/main/reproducers/move-pages-pmd-leaf.c

Without this fix after a couple of seconds in a VM with 2 NUMA nodes:

[   54.333753] ------------[ cut here ]------------
[   54.334901] WARNING: CPU: 20 PID: 1704 at mm/pagewalk.c:815 folio_walk_start+0x48f/0x6e0
[   54.336455] Modules linked in: ...
[   54.345009] CPU: 20 UID: 0 PID: 1704 Comm: move-pages-pmd- Not tainted 6.12.0-rc2+ #81
[   54.346529] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
[   54.348191] RIP: 0010:folio_walk_start+0x48f/0x6e0
[   54.349134] Code: b5 ad 48 8d 35 00 00 00 00 e8 6d 59 d7 ff e8 08 74 da ff e9 9c fe ff ff 4c 8b 7c 24 08 4c 89 ff e8 26 2b be 00 e9 8a fe ff ff <0f> 0b e9 ec fe ff ff f7 c2 ff 0f 00 00 0f 85 81 fe ff ff 48 8b 02
[   54.352660] RSP: 0018:ffffb7e4c430bc78 EFLAGS: 00010282
[   54.353679] RAX: 80000002a3e008e7 RBX: ffff9946039aa580 RCX: ffff994380000000
[   54.355056] RDX: ffff994606aec000 RSI: 00007f004b000000 RDI: 0000000000000000
[   54.356440] RBP: 00007f004b000000 R08: 0000000000000591 R09: 0000000000000001
[   54.357820] R10: 0000000000000200 R11: 0000000000000001 R12: ffffb7e4c430bd10
[   54.359198] R13: ffff994606aec2c0 R14: 0000000000000002 R15: ffff994604a89b00
[   54.360564] FS:  00007f004ae006c0(0000) GS:ffff9947f7400000(0000) knlGS:0000000000000000
[   54.362111] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   54.363242] CR2: 00007f004adffe58 CR3: 0000000281e12005 CR4: 0000000000770ef0
[   54.364615] PKRU: 55555554
[   54.365153] Call Trace:
[   54.365646]  <TASK>
[   54.366073]  ? __warn.cold+0xb7/0x14d
[   54.366796]  ? folio_walk_start+0x48f/0x6e0
[   54.367628]  ? report_bug+0xff/0x140
[   54.368324]  ? handle_bug+0x58/0x90
[   54.369019]  ? exc_invalid_op+0x17/0x70
[   54.369771]  ? asm_exc_invalid_op+0x1a/0x20
[   54.370606]  ? folio_walk_start+0x48f/0x6e0
[   54.371415]  ? folio_walk_start+0x9e/0x6e0
[   54.372227]  do_pages_move+0x1c5/0x680
[   54.372972]  kernel_move_pages+0x1a1/0x2b0
[   54.373804]  __x64_sys_move_pages+0x25/0x30
Kirill A. Shutemov Oct. 15, 2024, 2:32 p.m. UTC | #2
On Tue, Oct 15, 2024 at 01:12:36PM +0200, David Hildenbrand wrote:
> pmd_leaf()/pud_leaf() only implies a pmd_present()/pud_present() check on
> some architectures.

Should we clarify what behaviour we actually want from arch code?


> We really should check for
> pmd_present()/pud_present() first.
> 
> This should explain the report we got on ppc64 (which has
> CONFIG_PGTABLE_HAS_HUGE_LEAVES set in the config) that triggered:
> 	VM_WARN_ON_ONCE(pmd_leaf(pmdp_get_lockless(pmdp)));
> 
> Likely we had a PMD migration entry for which pmd_leaf() did not
> trigger. We raced with restoring the PMD migration entry, and suddenly
> saw a pmd_leaf(). In this case, pte_offset_map_lock() saved us from more
> trouble, because it rechecks the PMD value, but we would not have processed
> the migration entry -- which is not too bad because the only user of
> FW_MIGRATION is KSM for unsharing, and KSM only applies to small folios.
> 
> Further, we shouldn't re-read the PMD/PUD value for our warning, the
> primary purpose of the VM_WARN_ON_ONCE() is to find spurious use of
> pmd_leaf()/pud_leaf() without CONFIG_PGTABLE_HAS_HUGE_LEAVES.
> 
> As a side note, we are currently not implementing FW_MIGRATION support
> for PUD migration entries, which likely should exist due to hugetlb. Add
> a TODO so this won't fall through the cracks if more FW_MIGRATION users
> get added.
> 
> Fixes: aa39ca6940f1 ("mm/pagewalk: introduce folio_walk_start() + folio_walk_end()")
> Reported-by: syzbot+7d917f67c05066cec295@syzkaller.appspotmail.com
> Closes: https://lkml.kernel.org/r/670d3248.050a0220.3e960.0064.GAE@google.com
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Jann Horn <jannh@google.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
David Hildenbrand Oct. 15, 2024, 2:40 p.m. UTC | #3
On 15.10.24 16:32, Kirill A. Shutemov wrote:
> On Tue, Oct 15, 2024 at 01:12:36PM +0200, David Hildenbrand wrote:
>> pmd_leaf()/pud_leaf() only implies a pmd_present()/pud_present() check on
>> some architectures.
> 
> Should we clarify what behaviour we actually want from arch code?

We probably should document somewhere that things like pmd_special(), 
pmd_leaf() ... should only be used when we know that the PMD is present.

I wonder if we should even add ways to detect mis-use

Jann also raised that recently in a private message, that it is rather 
unclear (well, and repeatedly leads to issues) when pmd_leaf() is valid 
to be called.

Thanks!
Jann Horn Oct. 15, 2024, 2:43 p.m. UTC | #4
On Tue, Oct 15, 2024 at 4:40 PM David Hildenbrand <david@redhat.com> wrote:
> On 15.10.24 16:32, Kirill A. Shutemov wrote:
> > On Tue, Oct 15, 2024 at 01:12:36PM +0200, David Hildenbrand wrote:
> >> pmd_leaf()/pud_leaf() only implies a pmd_present()/pud_present() check on
> >> some architectures.
> >
> > Should we clarify what behaviour we actually want from arch code?
>
> We probably should document somewhere that things like pmd_special(),
> pmd_leaf() ... should only be used when we know that the PMD is present.
>
> I wonder if we should even add ways to detect mis-use
>
> Jann also raised that recently in a private message, that it is rather
> unclear (well, and repeatedly leads to issues) when pmd_leaf() is valid
> to be called.

I think one place where that should probably be addressed is in
Documentation/mm/arch_pgtable_helpers.rst - that is supposed to be an
overview of these helper functions and what they mean, but the only
thing it currently says about pmd_leaf() is "Tests a leaf mapped PMD",
which doesn't really tell you much more than the function name. It
would be nice if that table contained information about the conditions
under which these helpers may be used.
Zi Yan Oct. 15, 2024, 3:45 p.m. UTC | #5
On 15 Oct 2024, at 10:43, Jann Horn wrote:

> On Tue, Oct 15, 2024 at 4:40 PM David Hildenbrand <david@redhat.com> wrote:
>> On 15.10.24 16:32, Kirill A. Shutemov wrote:
>>> On Tue, Oct 15, 2024 at 01:12:36PM +0200, David Hildenbrand wrote:
>>>> pmd_leaf()/pud_leaf() only implies a pmd_present()/pud_present() check on
>>>> some architectures.
>>>
>>> Should we clarify what behaviour we actually want from arch code?
>>
>> We probably should document somewhere that things like pmd_special(),
>> pmd_leaf() ... should only be used when we know that the PMD is present.
>>
>> I wonder if we should even add ways to detect mis-use
>>
>> Jann also raised that recently in a private message, that it is rather
>> unclear (well, and repeatedly leads to issues) when pmd_leaf() is valid
>> to be called.
>
> I think one place where that should probably be addressed is in
> Documentation/mm/arch_pgtable_helpers.rst - that is supposed to be an
> overview of these helper functions and what they mean, but the only
> thing it currently says about pmd_leaf() is "Tests a leaf mapped PMD",
> which doesn't really tell you much more than the function name. It
> would be nice if that table contained information about the conditions
> under which these helpers may be used.

I find the related documentation in include/linux/pgtable.h [1], but it is
not an easy find.


[1] https://elixir.bootlin.com/linux/v6.11.3/source/include/linux/pgtable.h#L1865

Best Regards,
Yan, Zi
Qi Zheng Oct. 16, 2024, 10:58 a.m. UTC | #6
On 2024/10/15 21:13, David Hildenbrand wrote:
> On 15.10.24 13:12, David Hildenbrand wrote:
>> pmd_leaf()/pud_leaf() only implies a pmd_present()/pud_present() check on
>> some architectures. We really should check for
>> pmd_present()/pud_present() first.
>>
>> This should explain the report we got on ppc64 (which has
>> CONFIG_PGTABLE_HAS_HUGE_LEAVES set in the config) that triggered:
>>     VM_WARN_ON_ONCE(pmd_leaf(pmdp_get_lockless(pmdp)));
>>
>> Likely we had a PMD migration entry for which pmd_leaf() did not
>> trigger. We raced with restoring the PMD migration entry, and suddenly
>> saw a pmd_leaf(). In this case, pte_offset_map_lock() saved us from more
>> trouble, because it rechecks the PMD value, but we would not have 
>> processed
>> the migration entry -- which is not too bad because the only user of
>> FW_MIGRATION is KSM for unsharing, and KSM only applies to small folios.
>>
>> Further, we shouldn't re-read the PMD/PUD value for our warning, the
>> primary purpose of the VM_WARN_ON_ONCE() is to find spurious use of
>> pmd_leaf()/pud_leaf() without CONFIG_PGTABLE_HAS_HUGE_LEAVES.
>>
>> As a side note, we are currently not implementing FW_MIGRATION support
>> for PUD migration entries, which likely should exist due to hugetlb. Add
>> a TODO so this won't fall through the cracks if more FW_MIGRATION users
>> get added.
>>
>> Fixes: aa39ca6940f1 ("mm/pagewalk: introduce folio_walk_start() + 
>> folio_walk_end()")
>> Reported-by: syzbot+7d917f67c05066cec295@syzkaller.appspotmail.com
>> Closes: 
>> https://lkml.kernel.org/r/670d3248.050a0220.3e960.0064.GAE@google.com
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Jann Horn <jannh@google.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
> 
> Was able to write a quick reproducer and verify that the issue no longer 
> triggers with this fix.
> 
> https://gitlab.com/davidhildenbrand/scratchspace/-/blob/main/reproducers/move-pages-pmd-leaf.c
> 
> Without this fix after a couple of seconds in a VM with 2 NUMA nodes:
> 
> [   54.333753] ------------[ cut here ]------------
> [   54.334901] WARNING: CPU: 20 PID: 1704 at mm/pagewalk.c:815 
> folio_walk_start+0x48f/0x6e0
> [   54.336455] Modules linked in: ...
> [   54.345009] CPU: 20 UID: 0 PID: 1704 Comm: move-pages-pmd- Not 
> tainted 6.12.0-rc2+ #81
> [   54.346529] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
> 1.16.3-2.fc40 04/01/2014
> [   54.348191] RIP: 0010:folio_walk_start+0x48f/0x6e0
> [   54.349134] Code: b5 ad 48 8d 35 00 00 00 00 e8 6d 59 d7 ff e8 08 74 
> da ff e9 9c fe ff ff 4c 8b 7c 24 08 4c 89 ff e8 26 2b be 00 e9 8a fe ff 
> ff <0f> 0b e9 ec fe ff ff f7 c2 ff 0f 00 00 0f 85 81 fe ff ff 48 8b 02
> [   54.352660] RSP: 0018:ffffb7e4c430bc78 EFLAGS: 00010282
> [   54.353679] RAX: 80000002a3e008e7 RBX: ffff9946039aa580 RCX: 
> ffff994380000000
> [   54.355056] RDX: ffff994606aec000 RSI: 00007f004b000000 RDI: 
> 0000000000000000
> [   54.356440] RBP: 00007f004b000000 R08: 0000000000000591 R09: 
> 0000000000000001
> [   54.357820] R10: 0000000000000200 R11: 0000000000000001 R12: 
> ffffb7e4c430bd10
> [   54.359198] R13: ffff994606aec2c0 R14: 0000000000000002 R15: 
> ffff994604a89b00
> [   54.360564] FS:  00007f004ae006c0(0000) GS:ffff9947f7400000(0000) 
> knlGS:0000000000000000
> [   54.362111] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   54.363242] CR2: 00007f004adffe58 CR3: 0000000281e12005 CR4: 
> 0000000000770ef0
> [   54.364615] PKRU: 55555554
> [   54.365153] Call Trace:
> [   54.365646]  <TASK>
> [   54.366073]  ? __warn.cold+0xb7/0x14d
> [   54.366796]  ? folio_walk_start+0x48f/0x6e0
> [   54.367628]  ? report_bug+0xff/0x140
> [   54.368324]  ? handle_bug+0x58/0x90
> [   54.369019]  ? exc_invalid_op+0x17/0x70
> [   54.369771]  ? asm_exc_invalid_op+0x1a/0x20
> [   54.370606]  ? folio_walk_start+0x48f/0x6e0
> [   54.371415]  ? folio_walk_start+0x9e/0x6e0
> [   54.372227]  do_pages_move+0x1c5/0x680
> [   54.372972]  kernel_move_pages+0x1a1/0x2b0
> [   54.373804]  __x64_sys_move_pages+0x25/0x30

It would be better to add this call stack to the commit message, which
can help people find this fix patch when they encounter same problem. ;)

Otherwise, LGTM.

Acked-by: Qi Zheng <zhengqi.arch@bytedance.com>

Thanks!

> 
> 
> 
>
David Hildenbrand Oct. 16, 2024, 11:05 a.m. UTC | #7
On 16.10.24 12:58, Qi Zheng wrote:
> 
> 
> On 2024/10/15 21:13, David Hildenbrand wrote:
>> On 15.10.24 13:12, David Hildenbrand wrote:
>>> pmd_leaf()/pud_leaf() only implies a pmd_present()/pud_present() check on
>>> some architectures. We really should check for
>>> pmd_present()/pud_present() first.
>>>
>>> This should explain the report we got on ppc64 (which has
>>> CONFIG_PGTABLE_HAS_HUGE_LEAVES set in the config) that triggered:
>>>      VM_WARN_ON_ONCE(pmd_leaf(pmdp_get_lockless(pmdp)));
>>>
>>> Likely we had a PMD migration entry for which pmd_leaf() did not
>>> trigger. We raced with restoring the PMD migration entry, and suddenly
>>> saw a pmd_leaf(). In this case, pte_offset_map_lock() saved us from more
>>> trouble, because it rechecks the PMD value, but we would not have
>>> processed
>>> the migration entry -- which is not too bad because the only user of
>>> FW_MIGRATION is KSM for unsharing, and KSM only applies to small folios.
>>>
>>> Further, we shouldn't re-read the PMD/PUD value for our warning, the
>>> primary purpose of the VM_WARN_ON_ONCE() is to find spurious use of
>>> pmd_leaf()/pud_leaf() without CONFIG_PGTABLE_HAS_HUGE_LEAVES.
>>>
>>> As a side note, we are currently not implementing FW_MIGRATION support
>>> for PUD migration entries, which likely should exist due to hugetlb. Add
>>> a TODO so this won't fall through the cracks if more FW_MIGRATION users
>>> get added.
>>>
>>> Fixes: aa39ca6940f1 ("mm/pagewalk: introduce folio_walk_start() +
>>> folio_walk_end()")
>>> Reported-by: syzbot+7d917f67c05066cec295@syzkaller.appspotmail.com
>>> Closes:
>>> https://lkml.kernel.org/r/670d3248.050a0220.3e960.0064.GAE@google.com
>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>> Cc: Jann Horn <jannh@google.com>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>> ---
>>
>> Was able to write a quick reproducer and verify that the issue no longer
>> triggers with this fix.
>>
>> https://gitlab.com/davidhildenbrand/scratchspace/-/blob/main/reproducers/move-pages-pmd-leaf.c
>>
>> Without this fix after a couple of seconds in a VM with 2 NUMA nodes:
>>
>> [   54.333753] ------------[ cut here ]------------
>> [   54.334901] WARNING: CPU: 20 PID: 1704 at mm/pagewalk.c:815
>> folio_walk_start+0x48f/0x6e0
>> [   54.336455] Modules linked in: ...
>> [   54.345009] CPU: 20 UID: 0 PID: 1704 Comm: move-pages-pmd- Not
>> tainted 6.12.0-rc2+ #81
>> [   54.346529] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
>> 1.16.3-2.fc40 04/01/2014
>> [   54.348191] RIP: 0010:folio_walk_start+0x48f/0x6e0
>> [   54.349134] Code: b5 ad 48 8d 35 00 00 00 00 e8 6d 59 d7 ff e8 08 74
>> da ff e9 9c fe ff ff 4c 8b 7c 24 08 4c 89 ff e8 26 2b be 00 e9 8a fe ff
>> ff <0f> 0b e9 ec fe ff ff f7 c2 ff 0f 00 00 0f 85 81 fe ff ff 48 8b 02
>> [   54.352660] RSP: 0018:ffffb7e4c430bc78 EFLAGS: 00010282
>> [   54.353679] RAX: 80000002a3e008e7 RBX: ffff9946039aa580 RCX:
>> ffff994380000000
>> [   54.355056] RDX: ffff994606aec000 RSI: 00007f004b000000 RDI:
>> 0000000000000000
>> [   54.356440] RBP: 00007f004b000000 R08: 0000000000000591 R09:
>> 0000000000000001
>> [   54.357820] R10: 0000000000000200 R11: 0000000000000001 R12:
>> ffffb7e4c430bd10
>> [   54.359198] R13: ffff994606aec2c0 R14: 0000000000000002 R15:
>> ffff994604a89b00
>> [   54.360564] FS:  00007f004ae006c0(0000) GS:ffff9947f7400000(0000)
>> knlGS:0000000000000000
>> [   54.362111] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [   54.363242] CR2: 00007f004adffe58 CR3: 0000000281e12005 CR4:
>> 0000000000770ef0
>> [   54.364615] PKRU: 55555554
>> [   54.365153] Call Trace:
>> [   54.365646]  <TASK>
>> [   54.366073]  ? __warn.cold+0xb7/0x14d
>> [   54.366796]  ? folio_walk_start+0x48f/0x6e0
>> [   54.367628]  ? report_bug+0xff/0x140
>> [   54.368324]  ? handle_bug+0x58/0x90
>> [   54.369019]  ? exc_invalid_op+0x17/0x70
>> [   54.369771]  ? asm_exc_invalid_op+0x1a/0x20
>> [   54.370606]  ? folio_walk_start+0x48f/0x6e0
>> [   54.371415]  ? folio_walk_start+0x9e/0x6e0
>> [   54.372227]  do_pages_move+0x1c5/0x680
>> [   54.372972]  kernel_move_pages+0x1a1/0x2b0
>> [   54.373804]  __x64_sys_move_pages+0x25/0x30
> 
> It would be better to add this call stack to the commit message, which
> can help people find this fix patch when they encounter same problem. ;)

The commit is not part of a released kernel, though, and a lore search 
would return the result until it's included.

Before it's included, the commit message won't really be helpful :)

But sure, @Andrew, can we include that in the commit?

> 
> Otherwise, LGTM.
> 
> Acked-by: Qi Zheng <zhengqi.arch@bytedance.com>
> 

Thanks!
Andrew Morton Oct. 16, 2024, 10:44 p.m. UTC | #8
On Wed, 16 Oct 2024 13:05:54 +0200 David Hildenbrand <david@redhat.com> wrote:

> >> [   54.372227]  do_pages_move+0x1c5/0x680
> >> [   54.372972]  kernel_move_pages+0x1a1/0x2b0
> >> [   54.373804]  __x64_sys_move_pages+0x25/0x30
> > 
> > It would be better to add this call stack to the commit message, which
> > can help people find this fix patch when they encounter same problem. ;)
> 
> The commit is not part of a released kernel, though, and a lore search 
> would return the result until it's included.
> 
> Before it's included, the commit message won't really be helpful :)
> 
> But sure, @Andrew, can we include that in the commit?

Done, thanks.
diff mbox series

Patch

diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index 461ea3bbd8d9..5f9f01532e67 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -744,7 +744,8 @@  struct folio *folio_walk_start(struct folio_walk *fw,
 	pud = pudp_get(pudp);
 	if (pud_none(pud))
 		goto not_found;
-	if (IS_ENABLED(CONFIG_PGTABLE_HAS_HUGE_LEAVES) && pud_leaf(pud)) {
+	if (IS_ENABLED(CONFIG_PGTABLE_HAS_HUGE_LEAVES) &&
+	    (!pud_present(pud) || pud_leaf(pud))) {
 		ptl = pud_lock(vma->vm_mm, pudp);
 		pud = pudp_get(pudp);
 
@@ -753,6 +754,10 @@  struct folio *folio_walk_start(struct folio_walk *fw,
 		fw->pudp = pudp;
 		fw->pud = pud;
 
+		/*
+		 * TODO: FW_MIGRATION support for PUD migration entries
+		 * once there are relevant users.
+		 */
 		if (!pud_present(pud) || pud_devmap(pud) || pud_special(pud)) {
 			spin_unlock(ptl);
 			goto not_found;
@@ -769,12 +774,13 @@  struct folio *folio_walk_start(struct folio_walk *fw,
 	}
 
 pmd_table:
-	VM_WARN_ON_ONCE(pud_leaf(*pudp));
+	VM_WARN_ON_ONCE(!pud_present(pud) || pud_leaf(pud));
 	pmdp = pmd_offset(pudp, addr);
 	pmd = pmdp_get_lockless(pmdp);
 	if (pmd_none(pmd))
 		goto not_found;
-	if (IS_ENABLED(CONFIG_PGTABLE_HAS_HUGE_LEAVES) && pmd_leaf(pmd)) {
+	if (IS_ENABLED(CONFIG_PGTABLE_HAS_HUGE_LEAVES) &&
+	    (!pmd_present(pmd) || pmd_leaf(pmd))) {
 		ptl = pmd_lock(vma->vm_mm, pmdp);
 		pmd = pmdp_get(pmdp);
 
@@ -786,7 +792,7 @@  struct folio *folio_walk_start(struct folio_walk *fw,
 		if (pmd_none(pmd)) {
 			spin_unlock(ptl);
 			goto not_found;
-		} else if (!pmd_leaf(pmd)) {
+		} else if (pmd_present(pmd) && !pmd_leaf(pmd)) {
 			spin_unlock(ptl);
 			goto pte_table;
 		} else if (pmd_present(pmd)) {
@@ -812,7 +818,7 @@  struct folio *folio_walk_start(struct folio_walk *fw,
 	}
 
 pte_table:
-	VM_WARN_ON_ONCE(pmd_leaf(pmdp_get_lockless(pmdp)));
+	VM_WARN_ON_ONCE(!pmd_present(pmd) || pmd_leaf(pmd));
 	ptep = pte_offset_map_lock(vma->vm_mm, pmdp, addr, &ptl);
 	if (!ptep)
 		goto not_found;