Message ID | 20240521073446.23185-1-osalvador@suse.de (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | mm/hugetlb: Move vmf_anon_prepare upfront in hugetlb_wp | expand |
On 21.05.24 09:34, Oscar Salvador wrote: > hugetlb_wp calls vmf_anon_prepare() after having allocated a page, which > means that we might need to call restore_reserve_on_error() upon error. > vmf_anon_prepare() releases the vma lock before returning, but > restore_reserve_on_error() expects the vma lock to be held by the caller. > > Fix it by calling vmf_anon_prepare() before allocating the page. > > Signed-off-by: Oscar Salvador <osalvador@suse.de> > Fixes: 9acad7ba3e25 ("hugetlb: use vmf_anon_prepare() instead of anon_vma_prepare()") > --- > I did not hit this bug, I just spotted this because I was looking at hugetlb_wp > for some other reason. And I did not want to get creative to see if I could > trigger this so I could get a backtrace. > My assumption is that we could trigger this if 1) this was a shared mapping, > so no anon_vma and 2) we call in GUP code with FOLL_WRITE, which would cause > the FLAG_UNSHARE to be passed, so we will end up in hugetlb_wp(). FOLL_WRITE should never result in FLAG_UNSHARE. > > mm/hugetlb.c | 17 +++++++++-------- > 1 file changed, 9 insertions(+), 8 deletions(-) > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 6be78e7d4f6e..eb0d8a45505e 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -6005,6 +6005,15 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio, > * be acquired again before returning to the caller, as expected. > */ > spin_unlock(vmf->ptl); > + > + /* > + * When the original hugepage is shared one, it does not have > + * anon_vma prepared. > + */ > + ret = vmf_anon_prepare(vmf); > + if (unlikely(ret)) > + goto out_release_old; > + > new_folio = alloc_hugetlb_folio(vma, vmf->address, outside_reserve); > > if (IS_ERR(new_folio)) { > @@ -6058,14 +6067,6 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio, > goto out_release_old; > } > > - /* > - * When the original hugepage is shared one, it does not have > - * anon_vma prepared. > - */ > - ret = vmf_anon_prepare(vmf); > - if (unlikely(ret)) > - goto out_release_all; > - > if (copy_user_large_folio(new_folio, old_folio, vmf->real_address, vma)) { > ret = VM_FAULT_HWPOISON_LARGE | VM_FAULT_SET_HINDEX(hstate_index(h)); > goto out_release_all; The joy of hugetlb reservation code. LGTM
On Tue, May 21, 2024 at 11:56:54AM +0200, David Hildenbrand wrote: > On 21.05.24 09:34, Oscar Salvador wrote: > > hugetlb_wp calls vmf_anon_prepare() after having allocated a page, which > > means that we might need to call restore_reserve_on_error() upon error. > > vmf_anon_prepare() releases the vma lock before returning, but > > restore_reserve_on_error() expects the vma lock to be held by the caller. > > > > Fix it by calling vmf_anon_prepare() before allocating the page. > > > > Signed-off-by: Oscar Salvador <osalvador@suse.de> > > Fixes: 9acad7ba3e25 ("hugetlb: use vmf_anon_prepare() instead of anon_vma_prepare()") > > --- > > I did not hit this bug, I just spotted this because I was looking at hugetlb_wp > > for some other reason. And I did not want to get creative to see if I could > > trigger this so I could get a backtrace. > > My assumption is that we could trigger this if 1) this was a shared mapping, > > so no anon_vma and 2) we call in GUP code with FOLL_WRITE, which would cause > > the FLAG_UNSHARE to be passed, so we will end up in hugetlb_wp(). > > FOLL_WRITE should never result in FLAG_UNSHARE. You are right. It was quite early when I looked at this and I managed to confuse myself when reading hugetlb_follow_page_mask(). > The joy of hugetlb reservation code. > > LGTM thanks David!
On Tue, May 21, 2024 at 09:34:46AM +0200, Oscar Salvador wrote: > I did not hit this bug, I just spotted this because I was looking at hugetlb_wp > for some other reason. And I did not want to get creative to see if I could > trigger this so I could get a backtrace. > My assumption is that we could trigger this if 1) this was a shared mapping, > so no anon_vma and 2) we call in GUP code with FOLL_WRITE, which would cause > the FLAG_UNSHARE to be passed, so we will end up in hugetlb_wp(). So I checked this again and I have to confess I am bit confused. hugetlb_wp() can be called from either hugetlb_fault() or hugetlb_no_page(). hugetlb_fault()->hugetlb_wp() upon FAULT_FLAG_{WRITE,UNSHARE} hugetlb_no_page->hugetlb_wp()-> upon FAULT_FLAG_WRITE && !VM_SHARED hugetlb_no_page()->vmf_anon_prepare() upon !VM_SHARED, which means that VM_SHARED mappings do not have vma->anon_vma, while others do. hugetlb_wp() will call set_huge_ptep_writable() right away and return if it sees that the mapping is shared. So the only other we have to end up in hugetlb_wp() is via FAULT_FLAG_UNSHARE. For that to happen gup_must_unshare() must return true, which means the following assumptions must hold. - For Anonymous pages: 1) !PageAnonExclusive - For Filebacked pages: 2) We do not have a vma 3) It is a COW mapping 1) If gup_must_unshare() returns true for Anonymous pages because the page is not exclusive and must be unshared, hugetlb_wp() will already see the vma->anon_prepare being initialized because of the previous hugetlb_no_page()->vmf_anon_prepare. 2) I do not quite understand this case. 3) !VMSHARED mappings already had its anon_vma initialized in hugetlb_no_page()->vmf_anon_prepare. Probably I am missing some bits here, but I cannot see how we would even need vmf_anon_prepare() in hugetlb_wp() in the first place.
Am 27.05.24 um 10:53 schrieb Oscar Salvador: > On Tue, May 21, 2024 at 09:34:46AM +0200, Oscar Salvador wrote: >> I did not hit this bug, I just spotted this because I was looking at hugetlb_wp >> for some other reason. And I did not want to get creative to see if I could >> trigger this so I could get a backtrace. >> My assumption is that we could trigger this if 1) this was a shared mapping, >> so no anon_vma and 2) we call in GUP code with FOLL_WRITE, which would cause >> the FLAG_UNSHARE to be passed, so we will end up in hugetlb_wp(). > > So I checked this again and I have to confess I am bit confused. > > hugetlb_wp() can be called from either hugetlb_fault() or hugetlb_no_page(). > > hugetlb_fault()->hugetlb_wp() upon FAULT_FLAG_{WRITE,UNSHARE} > hugetlb_no_page->hugetlb_wp()-> upon FAULT_FLAG_WRITE && !VM_SHARED > > hugetlb_no_page()->vmf_anon_prepare() upon !VM_SHARED, which means that VM_SHARED > mappings do not have vma->anon_vma, while others do. > > hugetlb_wp() will call set_huge_ptep_writable() right away and return if it sees > that the mapping is shared. > So the only other we have to end up in hugetlb_wp() is via FAULT_FLAG_UNSHARE. > For that to happen gup_must_unshare() must return true, which means the following > assumptions must hold. > > - For Anonymous pages: > 1) !PageAnonExclusive > - For Filebacked pages: > 2) We do not have a vma > 3) It is a COW mapping > > 1) If gup_must_unshare() returns true for Anonymous pages because the page is not > exclusive and must be unshared, hugetlb_wp() will already see the > vma->anon_prepare being initialized because of the previous > hugetlb_no_page()->vmf_anon_prepare. > > 2) I do not quite understand this case. gup_must_unshare() without a VMA is only used for GUP-fast. Before triggering a page fault we always fallback to GUP-slow first, where we have a VMA. IMHO we better not make assumptions that hugetlb_wp() will always already have an anon VMA.
On Mon, May 27, 2024 at 03:17:01PM +0200, David Hildenbrand wrote: > gup_must_unshare() without a VMA is only used for GUP-fast. Before > triggering a page fault we always fallback to GUP-slow first, where we have > a VMA. I see, thanks for explaining! > IMHO we better not make assumptions that hugetlb_wp() will always already > have an anon VMA. Fine by me, I just wanted to have the picture more clear, because I could not see how we could end up in such situation, but better be safe and than sorry. @Andrew: This fix is to avoid an issue like [1], but for hugetlb_wp(). [1] https://lore.kernel.org/linux-mm/000000000000daf1e10615e64dcb@google.com/T/
On Tue, May 21, 2024 at 09:34:46AM +0200, Oscar Salvador wrote: > hugetlb_wp calls vmf_anon_prepare() after having allocated a page, which > means that we might need to call restore_reserve_on_error() upon error. > vmf_anon_prepare() releases the vma lock before returning, but > restore_reserve_on_error() expects the vma lock to be held by the caller. > > Fix it by calling vmf_anon_prepare() before allocating the page. > > Signed-off-by: Oscar Salvador <osalvador@suse.de> > Fixes: 9acad7ba3e25 ("hugetlb: use vmf_anon_prepare() instead of anon_vma_prepare()") Let me just revamp this and run away
diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 6be78e7d4f6e..eb0d8a45505e 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -6005,6 +6005,15 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio, * be acquired again before returning to the caller, as expected. */ spin_unlock(vmf->ptl); + + /* + * When the original hugepage is shared one, it does not have + * anon_vma prepared. + */ + ret = vmf_anon_prepare(vmf); + if (unlikely(ret)) + goto out_release_old; + new_folio = alloc_hugetlb_folio(vma, vmf->address, outside_reserve); if (IS_ERR(new_folio)) { @@ -6058,14 +6067,6 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio, goto out_release_old; } - /* - * When the original hugepage is shared one, it does not have - * anon_vma prepared. - */ - ret = vmf_anon_prepare(vmf); - if (unlikely(ret)) - goto out_release_all; - if (copy_user_large_folio(new_folio, old_folio, vmf->real_address, vma)) { ret = VM_FAULT_HWPOISON_LARGE | VM_FAULT_SET_HINDEX(hstate_index(h)); goto out_release_all;
hugetlb_wp calls vmf_anon_prepare() after having allocated a page, which means that we might need to call restore_reserve_on_error() upon error. vmf_anon_prepare() releases the vma lock before returning, but restore_reserve_on_error() expects the vma lock to be held by the caller. Fix it by calling vmf_anon_prepare() before allocating the page. Signed-off-by: Oscar Salvador <osalvador@suse.de> Fixes: 9acad7ba3e25 ("hugetlb: use vmf_anon_prepare() instead of anon_vma_prepare()") --- I did not hit this bug, I just spotted this because I was looking at hugetlb_wp for some other reason. And I did not want to get creative to see if I could trigger this so I could get a backtrace. My assumption is that we could trigger this if 1) this was a shared mapping, so no anon_vma and 2) we call in GUP code with FOLL_WRITE, which would cause the FLAG_UNSHARE to be passed, so we will end up in hugetlb_wp(). mm/hugetlb.c | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-)