Message ID | 20250418085802.2973519-1-gavinguo@igalia.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v2] mm/huge_memory: fix dereferencing invalid pmd migration entry | expand |
On 4/18/25 16:58, Gavin Guo wrote: > When migrating a THP, concurrent access to the PMD migration entry > during a deferred split scan can lead to a invalid address access, as > illustrated below. To prevent this page fault, it is necessary to check > the PMD migration entry and return early. In this context, there is no > need to use pmd_to_swp_entry and pfn_swap_entry_to_page to verify the > equality of the target folio. Since the PMD migration entry is locked, > it cannot be served as the target. > > Mailing list discussion and explanation from Hugh Dickins: > "An anon_vma lookup points to a location which may contain the folio of > interest, but might instead contain another folio: and weeding out those > other folios is precisely what the "folio != pmd_folio((*pmd)" check > (and the "risk of replacing the wrong folio" comment a few lines above > it) is for." > > BUG: unable to handle page fault for address: ffffea60001db008 > CPU: 0 UID: 0 PID: 2199114 Comm: tee Not tainted 6.14.0+ #4 NONE > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 > RIP: 0010:split_huge_pmd_locked+0x3b5/0x2b60 > Call Trace: > <TASK> > try_to_migrate_one+0x28c/0x3730 > rmap_walk_anon+0x4f6/0x770 > unmap_folio+0x196/0x1f0 > split_huge_page_to_list_to_order+0x9f6/0x1560 > deferred_split_scan+0xac5/0x12a0 > shrinker_debugfs_scan_write+0x376/0x470 > full_proxy_write+0x15c/0x220 > vfs_write+0x2fc/0xcb0 > ksys_write+0x146/0x250 > do_syscall_64+0x6a/0x120 > entry_SYSCALL_64_after_hwframe+0x76/0x7e > > The bug is found by syzkaller on an internal kernel, then confirmed on > upstream. > > Fixes: 84c3fc4e9c56 ("mm: thp: check pmd migration entry in common path") > Cc: stable@vger.kernel.org > Signed-off-by: Gavin Guo <gavinguo@igalia.com> > Acked-by: David Hildenbrand <david@redhat.com> > Acked-by: Hugh Dickins <hughd@google.com> > Acked-by: Zi Yan <ziy@nvidia.com> > Link: https://lore.kernel.org/all/20250414072737.1698513-1-gavinguo@igalia.com/ > --- > V1 -> V2: Add explanation from Hugh and correct the wording from page > fault to invalid address access. > > mm/huge_memory.c | 18 ++++++++++++++---- > 1 file changed, 14 insertions(+), 4 deletions(-) > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > index 2a47682d1ab7..0cb9547dcff2 100644 > --- a/mm/huge_memory.c > +++ b/mm/huge_memory.c > @@ -3075,6 +3075,8 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, > void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address, > pmd_t *pmd, bool freeze, struct folio *folio) > { > + bool pmd_migration = is_pmd_migration_entry(*pmd); > + > VM_WARN_ON_ONCE(folio && !folio_test_pmd_mappable(folio)); > VM_WARN_ON_ONCE(!IS_ALIGNED(address, HPAGE_PMD_SIZE)); > VM_WARN_ON_ONCE(folio && !folio_test_locked(folio)); > @@ -3085,10 +3087,18 @@ void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address, > * require a folio to check the PMD against. Otherwise, there > * is a risk of replacing the wrong folio. > */ > - if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) || > - is_pmd_migration_entry(*pmd)) { > - if (folio && folio != pmd_folio(*pmd)) > - return; > + if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) || pmd_migration) { > + if (folio) { > + /* > + * Do not apply pmd_folio() to a migration entry; and > + * folio lock guarantees that it must be of the wrong > + * folio anyway. > + */ > + if (pmd_migration) > + return; > + if (folio != pmd_folio(*pmd)) > + return; > + } > __split_huge_pmd_locked(vma, pmd, address, freeze); > } > } > > base-commit: a24588245776dafc227243a01bfbeb8a59bafba9 Hi Zi, I've carefully reviewed the mailing list and observed that the indentation is not a strong concern from the reviews. And the cleanup suggestion from David will override the modification in this patch. I have decided to keep the original version (the unindented one). Let me know if you have any feedback with the v2 patch. Thank you!
On 18.04.25 10:58, Gavin Guo wrote: > When migrating a THP, concurrent access to the PMD migration entry > during a deferred split scan can lead to a invalid address access, as > illustrated below. To prevent this page fault, it is necessary to check > the PMD migration entry and return early. In this context, there is no > need to use pmd_to_swp_entry and pfn_swap_entry_to_page to verify the > equality of the target folio. Since the PMD migration entry is locked, > it cannot be served as the target. > > Mailing list discussion and explanation from Hugh Dickins: > "An anon_vma lookup points to a location which may contain the folio of > interest, but might instead contain another folio: and weeding out those > other folios is precisely what the "folio != pmd_folio((*pmd)" check > (and the "risk of replacing the wrong folio" comment a few lines above > it) is for." > > BUG: unable to handle page fault for address: ffffea60001db008 > CPU: 0 UID: 0 PID: 2199114 Comm: tee Not tainted 6.14.0+ #4 NONE > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 > RIP: 0010:split_huge_pmd_locked+0x3b5/0x2b60 > Call Trace: > <TASK> > try_to_migrate_one+0x28c/0x3730 > rmap_walk_anon+0x4f6/0x770 > unmap_folio+0x196/0x1f0 > split_huge_page_to_list_to_order+0x9f6/0x1560 > deferred_split_scan+0xac5/0x12a0 > shrinker_debugfs_scan_write+0x376/0x470 > full_proxy_write+0x15c/0x220 > vfs_write+0x2fc/0xcb0 > ksys_write+0x146/0x250 > do_syscall_64+0x6a/0x120 > entry_SYSCALL_64_after_hwframe+0x76/0x7e > > The bug is found by syzkaller on an internal kernel, then confirmed on > upstream. > > Fixes: 84c3fc4e9c56 ("mm: thp: check pmd migration entry in common path") > Cc: stable@vger.kernel.org > Signed-off-by: Gavin Guo <gavinguo@igalia.com> > Acked-by: David Hildenbrand <david@redhat.com> > Acked-by: Hugh Dickins <hughd@google.com> > Acked-by: Zi Yan <ziy@nvidia.com> > Link: https://lore.kernel.org/all/20250414072737.1698513-1-gavinguo@igalia.com/ > --- > V1 -> V2: Add explanation from Hugh and correct the wording from page > fault to invalid address access. > > mm/huge_memory.c | 18 ++++++++++++++---- > 1 file changed, 14 insertions(+), 4 deletions(-) > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > index 2a47682d1ab7..0cb9547dcff2 100644 > --- a/mm/huge_memory.c > +++ b/mm/huge_memory.c > @@ -3075,6 +3075,8 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, > void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address, > pmd_t *pmd, bool freeze, struct folio *folio) > { > + bool pmd_migration = is_pmd_migration_entry(*pmd); > + > VM_WARN_ON_ONCE(folio && !folio_test_pmd_mappable(folio)); > VM_WARN_ON_ONCE(!IS_ALIGNED(address, HPAGE_PMD_SIZE)); > VM_WARN_ON_ONCE(folio && !folio_test_locked(folio)); > @@ -3085,10 +3087,18 @@ void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address, > * require a folio to check the PMD against. Otherwise, there > * is a risk of replacing the wrong folio. > */ > - if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) || > - is_pmd_migration_entry(*pmd)) { > - if (folio && folio != pmd_folio(*pmd)) > - return; > + if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) || pmd_migration) { > + if (folio) { > + /* > + * Do not apply pmd_folio() to a migration entry; and > + * folio lock guarantees that it must be of the wrong > + * folio anyway. > + */ > + if (pmd_migration) > + return; > + if (folio != pmd_folio(*pmd)) > + return; Nit: just re-reading, I would have simply done if (pmd_migration || folio != pmd_folio(*pmd) return; Anyway, this will hopefully get cleaned up soon either way, so I don't particularly mind. :)
On 18 Apr 2025, at 5:03, Gavin Guo wrote: > On 4/18/25 16:58, Gavin Guo wrote: >> When migrating a THP, concurrent access to the PMD migration entry >> during a deferred split scan can lead to a invalid address access, as >> illustrated below. To prevent this page fault, it is necessary to check >> the PMD migration entry and return early. In this context, there is no >> need to use pmd_to_swp_entry and pfn_swap_entry_to_page to verify the >> equality of the target folio. Since the PMD migration entry is locked, >> it cannot be served as the target. >> >> Mailing list discussion and explanation from Hugh Dickins: >> "An anon_vma lookup points to a location which may contain the folio of >> interest, but might instead contain another folio: and weeding out those >> other folios is precisely what the "folio != pmd_folio((*pmd)" check >> (and the "risk of replacing the wrong folio" comment a few lines above >> it) is for." >> >> BUG: unable to handle page fault for address: ffffea60001db008 >> CPU: 0 UID: 0 PID: 2199114 Comm: tee Not tainted 6.14.0+ #4 NONE >> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 >> RIP: 0010:split_huge_pmd_locked+0x3b5/0x2b60 >> Call Trace: >> <TASK> >> try_to_migrate_one+0x28c/0x3730 >> rmap_walk_anon+0x4f6/0x770 >> unmap_folio+0x196/0x1f0 >> split_huge_page_to_list_to_order+0x9f6/0x1560 >> deferred_split_scan+0xac5/0x12a0 >> shrinker_debugfs_scan_write+0x376/0x470 >> full_proxy_write+0x15c/0x220 >> vfs_write+0x2fc/0xcb0 >> ksys_write+0x146/0x250 >> do_syscall_64+0x6a/0x120 >> entry_SYSCALL_64_after_hwframe+0x76/0x7e >> >> The bug is found by syzkaller on an internal kernel, then confirmed on >> upstream. >> >> Fixes: 84c3fc4e9c56 ("mm: thp: check pmd migration entry in common path") >> Cc: stable@vger.kernel.org >> Signed-off-by: Gavin Guo <gavinguo@igalia.com> >> Acked-by: David Hildenbrand <david@redhat.com> >> Acked-by: Hugh Dickins <hughd@google.com> >> Acked-by: Zi Yan <ziy@nvidia.com> >> Link: https://lore.kernel.org/all/20250414072737.1698513-1-gavinguo@igalia.com/ >> --- >> V1 -> V2: Add explanation from Hugh and correct the wording from page >> fault to invalid address access. >> >> mm/huge_memory.c | 18 ++++++++++++++---- >> 1 file changed, 14 insertions(+), 4 deletions(-) >> >> diff --git a/mm/huge_memory.c b/mm/huge_memory.c >> index 2a47682d1ab7..0cb9547dcff2 100644 >> --- a/mm/huge_memory.c >> +++ b/mm/huge_memory.c >> @@ -3075,6 +3075,8 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, >> void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address, >> pmd_t *pmd, bool freeze, struct folio *folio) >> { >> + bool pmd_migration = is_pmd_migration_entry(*pmd); >> + >> VM_WARN_ON_ONCE(folio && !folio_test_pmd_mappable(folio)); >> VM_WARN_ON_ONCE(!IS_ALIGNED(address, HPAGE_PMD_SIZE)); >> VM_WARN_ON_ONCE(folio && !folio_test_locked(folio)); >> @@ -3085,10 +3087,18 @@ void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address, >> * require a folio to check the PMD against. Otherwise, there >> * is a risk of replacing the wrong folio. >> */ >> - if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) || >> - is_pmd_migration_entry(*pmd)) { >> - if (folio && folio != pmd_folio(*pmd)) >> - return; >> + if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) || pmd_migration) { >> + if (folio) { >> + /* >> + * Do not apply pmd_folio() to a migration entry; and >> + * folio lock guarantees that it must be of the wrong >> + * folio anyway. >> + */ >> + if (pmd_migration) >> + return; >> + if (folio != pmd_folio(*pmd)) >> + return; >> + } >> __split_huge_pmd_locked(vma, pmd, address, freeze); >> } >> } >> >> base-commit: a24588245776dafc227243a01bfbeb8a59bafba9 > > Hi Zi, I've carefully reviewed the mailing list and observed that the indentation is not a strong concern from the reviews. And the cleanup suggestion from David will override the modification in this patch. I have decided to keep the original version (the unindented one). Let me know if you have any feedback with the v2 patch. Thank you! No problem. Thank you for the fix. Best Regards, Yan, Zi
On 18 Apr 2025, at 4:58, Gavin Guo wrote: > When migrating a THP, concurrent access to the PMD migration entry > during a deferred split scan can lead to a invalid address access, as s/a/an/ > illustrated below. To prevent this page fault, it is necessary to check s/this page fault/this invalid access/ The rest looks good to me. Best Regards, Yan, Zi
Hi Gavin, On 4/18/25 8:42 PM, David Hildenbrand wrote: > On 18.04.25 10:58, Gavin Guo wrote: >> When migrating a THP, concurrent access to the PMD migration entry >> during a deferred split scan can lead to a invalid address access, as >> illustrated below. To prevent this page fault, it is necessary to check >> the PMD migration entry and return early. In this context, there is no >> need to use pmd_to_swp_entry and pfn_swap_entry_to_page to verify the >> equality of the target folio. Since the PMD migration entry is locked, >> it cannot be served as the target. >> >> Mailing list discussion and explanation from Hugh Dickins: >> "An anon_vma lookup points to a location which may contain the folio of >> interest, but might instead contain another folio: and weeding out those >> other folios is precisely what the "folio != pmd_folio((*pmd)" check >> (and the "risk of replacing the wrong folio" comment a few lines above >> it) is for." >> >> BUG: unable to handle page fault for address: ffffea60001db008 >> CPU: 0 UID: 0 PID: 2199114 Comm: tee Not tainted 6.14.0+ #4 NONE >> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 >> RIP: 0010:split_huge_pmd_locked+0x3b5/0x2b60 >> Call Trace: >> <TASK> >> try_to_migrate_one+0x28c/0x3730 >> rmap_walk_anon+0x4f6/0x770 >> unmap_folio+0x196/0x1f0 >> split_huge_page_to_list_to_order+0x9f6/0x1560 >> deferred_split_scan+0xac5/0x12a0 >> shrinker_debugfs_scan_write+0x376/0x470 >> full_proxy_write+0x15c/0x220 >> vfs_write+0x2fc/0xcb0 >> ksys_write+0x146/0x250 >> do_syscall_64+0x6a/0x120 >> entry_SYSCALL_64_after_hwframe+0x76/0x7e >> >> The bug is found by syzkaller on an internal kernel, then confirmed on >> upstream. >> >> Fixes: 84c3fc4e9c56 ("mm: thp: check pmd migration entry in common path") >> Cc: stable@vger.kernel.org >> Signed-off-by: Gavin Guo <gavinguo@igalia.com> >> Acked-by: David Hildenbrand <david@redhat.com> >> Acked-by: Hugh Dickins <hughd@google.com> >> Acked-by: Zi Yan <ziy@nvidia.com> >> Link: https://lore.kernel.org/all/20250414072737.1698513-1-gavinguo@igalia.com/ >> --- >> V1 -> V2: Add explanation from Hugh and correct the wording from page >> fault to invalid address access. >> >> mm/huge_memory.c | 18 ++++++++++++++---- >> 1 file changed, 14 insertions(+), 4 deletions(-) >> Reviewed-by: Gavin Shan <gshan@redhat.com> >> diff --git a/mm/huge_memory.c b/mm/huge_memory.c >> index 2a47682d1ab7..0cb9547dcff2 100644 >> --- a/mm/huge_memory.c >> +++ b/mm/huge_memory.c >> @@ -3075,6 +3075,8 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, >> void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address, >> pmd_t *pmd, bool freeze, struct folio *folio) >> { >> + bool pmd_migration = is_pmd_migration_entry(*pmd); >> + >> VM_WARN_ON_ONCE(folio && !folio_test_pmd_mappable(folio)); >> VM_WARN_ON_ONCE(!IS_ALIGNED(address, HPAGE_PMD_SIZE)); >> VM_WARN_ON_ONCE(folio && !folio_test_locked(folio)); >> @@ -3085,10 +3087,18 @@ void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address, >> * require a folio to check the PMD against. Otherwise, there >> * is a risk of replacing the wrong folio. >> */ >> - if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) || >> - is_pmd_migration_entry(*pmd)) { >> - if (folio && folio != pmd_folio(*pmd)) >> - return; >> + if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) || pmd_migration) { >> + if (folio) { >> + /* >> + * Do not apply pmd_folio() to a migration entry; and >> + * folio lock guarantees that it must be of the wrong >> + * folio anyway. >> + */ >> + if (pmd_migration) >> + return; >> + if (folio != pmd_folio(*pmd)) >> + return; > > Nit: just re-reading, I would have simply done > > if (pmd_migration || folio != pmd_folio(*pmd) > return; > > Anyway, this will hopefully get cleaned up soon either way, so I don't particularly mind. :) > If v3 is needed to fix Zi's comments (commit log improvement), it can be improved slightly based on David's suggestion, to avoid another nested if statement. Otherwise, it's fine since it needs to be cleaned up soon. /* * Do not apply pmd_folio() to a migration entry, and folio lock * guarantees that it must be of the wrong folio anyway. */ if (folio && (pmd_migration || folio != pmd_filio(*pmd)) return; Thanks, Gavin
diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 2a47682d1ab7..0cb9547dcff2 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -3075,6 +3075,8 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address, pmd_t *pmd, bool freeze, struct folio *folio) { + bool pmd_migration = is_pmd_migration_entry(*pmd); + VM_WARN_ON_ONCE(folio && !folio_test_pmd_mappable(folio)); VM_WARN_ON_ONCE(!IS_ALIGNED(address, HPAGE_PMD_SIZE)); VM_WARN_ON_ONCE(folio && !folio_test_locked(folio)); @@ -3085,10 +3087,18 @@ void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address, * require a folio to check the PMD against. Otherwise, there * is a risk of replacing the wrong folio. */ - if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) || - is_pmd_migration_entry(*pmd)) { - if (folio && folio != pmd_folio(*pmd)) - return; + if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) || pmd_migration) { + if (folio) { + /* + * Do not apply pmd_folio() to a migration entry; and + * folio lock guarantees that it must be of the wrong + * folio anyway. + */ + if (pmd_migration) + return; + if (folio != pmd_folio(*pmd)) + return; + } __split_huge_pmd_locked(vma, pmd, address, freeze); } }