Message ID: 20220329160440.193848-17-david@redhat.com
State:      New
Series:     mm: COW fixes part 2: reliable GUP pins of anonymous pages
On 3/29/22 18:04, David Hildenbrand wrote:
> Let's verify when (un)pinning anonymous pages that we always deal with
> exclusive anonymous pages, which guarantees that we'll have a reliable
> PIN, meaning that we cannot end up with the GUP pin being inconsistent
> with the pages mapped into the page tables due to a COW triggered
> by a write fault.
> 
> When pinning pages, after conditionally triggering GUP unsharing of
> possibly shared anonymous pages, we should always only see exclusive
> anonymous pages. Note that anonymous pages that are mapped writable
> must be marked exclusive, otherwise we'd have a BUG.
> 
> When pinning during ordinary GUP, simply add a check after our
> conditional GUP-triggered unsharing checks. As we know exactly how the
> page is mapped, we know exactly in which page we have to check for
> PageAnonExclusive().
> 
> When pinning via GUP-fast we have to be careful, because we can race
> with fork(): verify only after the seqcount check has confirmed that we
> didn't race with a concurrent fork(), and thus didn't end up pinning a
> possibly shared anonymous page.
> 
> Similarly, when unpinning, verify that the pages are still marked as
> exclusive: otherwise something turned the pages possibly shared, which
> can result in random memory corruptions, which we really want to catch.
> 
> With only the pinned pages at hand and not the actual page table entries
> we have to be a bit careful: hugetlb pages are always mapped via a
> single logical page table entry referencing the head page, and
> PG_anon_exclusive of the head page applies. Anon THPs are a bit more
> complicated, because we might have obtained the page reference either
> via a PMD or a PTE -- depending on the mapping type, either
> PageAnonExclusive of the head page (PMD-mapped THP) or of the tail page
> (PTE-mapped THP) applies: as we don't know which, and to make our life
> easier, check that either is set.
> 
> Take care not to verify when unpinning during GUP-fast after we detected
> a concurrent fork(): we might stumble over an anonymous page that is now
> shared.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

Nits:

> @@ -510,6 +563,10 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
>  		page = ERR_PTR(-EMLINK);
>  		goto out;
>  	}
> +
> +	VM_BUG_ON((flags & FOLL_PIN) && PageAnon(page) &&
> +		  !PageAnonExclusive(page));

Do we rather want VM_BUG_ON_PAGE? Also for the same tests in mm/huge*.c below.

> +
>  	/* try_grab_page() does nothing unless FOLL_GET or FOLL_PIN is set. */
>  	if (unlikely(!try_grab_page(page, flags))) {
>  		page = ERR_PTR(-ENOMEM);
> @@ -2744,8 +2801,10 @@ static unsigned long lockless_pages_from_mm(unsigned long start,
>  	 */
>  	if (gup_flags & FOLL_PIN) {
>  		if (read_seqcount_retry(&current->mm->write_protect_seq, seq)) {
> -			unpin_user_pages(pages, nr_pinned);
> +			unpin_user_pages_lockless(pages, nr_pinned);
>  			return 0;
> +		} else {
> +			sanity_check_pinned_pages(pages, nr_pinned);
>  		}
>  	}
>  	return nr_pinned;
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 2dc820e8c873..b32774f289d6 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1392,6 +1392,9 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
>  	if (!pmd_write(*pmd) && gup_must_unshare(flags, page))
>  		return ERR_PTR(-EMLINK);
>  
> +	VM_BUG_ON((flags & FOLL_PIN) && PageAnon(page) &&
> +		  !PageAnonExclusive(page));
> +
>  	if (!try_grab_page(page, flags))
>  		return ERR_PTR(-ENOMEM);
>  
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 21f2ec446117..48740e6c3476 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -6097,6 +6097,9 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
>  		pfn_offset = (vaddr & ~huge_page_mask(h)) >> PAGE_SHIFT;
>  		page = pte_page(huge_ptep_get(pte));
>  
> +		VM_BUG_ON((flags & FOLL_PIN) && PageAnon(page) &&
> +			  !PageAnonExclusive(page));
> +
>  		/*
>  		 * If subpage information not requested, update counters
>  		 * and skip the same_page loop below.
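For context on the nit: the practical difference between the two assertion
flavors is that VM_BUG_ON_PAGE() dumps the offending page's state via
dump_page() before BUG()ing, which makes such assertion failures far easier
to triage from a report. Under CONFIG_DEBUG_VM it is defined in
include/linux/mmdebug.h roughly as follows (a sketch; the exact definition
varies across kernel versions):

        #define VM_BUG_ON_PAGE(cond, page)                                      \
                do {                                                            \
                        if (unlikely(cond)) {                                   \
                                /* Dump page flags, refcount, mapping, ... */   \
                                dump_page(page, "VM_BUG_ON_PAGE(" __stringify(cond)")"); \
                                BUG();                                          \
                        }                                                       \
                } while (0)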
On 19.04.22 19:40, Vlastimil Babka wrote:
> On 3/29/22 18:04, David Hildenbrand wrote:
>> Let's verify when (un)pinning anonymous pages that we always deal with
>> exclusive anonymous pages, which guarantees that we'll have a reliable
>> PIN, meaning that we cannot end up with the GUP pin being inconsistent
>> with the pages mapped into the page tables due to a COW triggered
>> by a write fault.
>> 
>> When pinning pages, after conditionally triggering GUP unsharing of
>> possibly shared anonymous pages, we should always only see exclusive
>> anonymous pages. Note that anonymous pages that are mapped writable
>> must be marked exclusive, otherwise we'd have a BUG.
>> 
>> When pinning during ordinary GUP, simply add a check after our
>> conditional GUP-triggered unsharing checks. As we know exactly how the
>> page is mapped, we know exactly in which page we have to check for
>> PageAnonExclusive().
>> 
>> When pinning via GUP-fast we have to be careful, because we can race
>> with fork(): verify only after the seqcount check has confirmed that we
>> didn't race with a concurrent fork(), and thus didn't end up pinning a
>> possibly shared anonymous page.
>> 
>> Similarly, when unpinning, verify that the pages are still marked as
>> exclusive: otherwise something turned the pages possibly shared, which
>> can result in random memory corruptions, which we really want to catch.
>> 
>> With only the pinned pages at hand and not the actual page table entries
>> we have to be a bit careful: hugetlb pages are always mapped via a
>> single logical page table entry referencing the head page, and
>> PG_anon_exclusive of the head page applies. Anon THPs are a bit more
>> complicated, because we might have obtained the page reference either
>> via a PMD or a PTE -- depending on the mapping type, either
>> PageAnonExclusive of the head page (PMD-mapped THP) or of the tail page
>> (PTE-mapped THP) applies: as we don't know which, and to make our life
>> easier, check that either is set.
>> 
>> Take care not to verify when unpinning during GUP-fast after we detected
>> a concurrent fork(): we might stumble over an anonymous page that is now
>> shared.
>> 
>> Signed-off-by: David Hildenbrand <david@redhat.com>
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> 
> Nits:
> 
>> @@ -510,6 +563,10 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
>>  		page = ERR_PTR(-EMLINK);
>>  		goto out;
>>  	}
>> +
>> +	VM_BUG_ON((flags & FOLL_PIN) && PageAnon(page) &&
>> +		  !PageAnonExclusive(page));
> 
> Do we rather want VM_BUG_ON_PAGE? Also for the same tests in mm/huge*.c below.

Makes sense, thanks:

diff --git a/mm/gup.c b/mm/gup.c
index 5c17d4816441..46ffd8c51c6e 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -564,8 +564,8 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 		goto out;
 	}
 
-	VM_BUG_ON((flags & FOLL_PIN) && PageAnon(page) &&
-		  !PageAnonExclusive(page));
+	VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
+		       !PageAnonExclusive(page), page);
 
 	/* try_grab_page() does nothing unless FOLL_GET or FOLL_PIN is set. */
 	if (unlikely(!try_grab_page(page, flags))) {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 390f22334ee9..a2f44d8d3d47 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1392,8 +1392,8 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
 	if (!pmd_write(*pmd) && gup_must_unshare(flags, page))
 		return ERR_PTR(-EMLINK);
 
-	VM_BUG_ON((flags & FOLL_PIN) && PageAnon(page) &&
-		  !PageAnonExclusive(page));
+	VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
+		       !PageAnonExclusive(page), page);
 
 	if (!try_grab_page(page, flags))
 		return ERR_PTR(-ENOMEM);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8a635b5b5270..0ba2b1930b21 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6100,8 +6100,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		pfn_offset = (vaddr & ~huge_page_mask(h)) >> PAGE_SHIFT;
 		page = pte_page(huge_ptep_get(pte));
 
-		VM_BUG_ON((flags & FOLL_PIN) && PageAnon(page) &&
-			  !PageAnonExclusive(page));
+		VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
+			       !PageAnonExclusive(page), page);
 
 		/*
 		 * If subpage information not requested, update counters
On 4/21/22 11:15, David Hildenbrand wrote:
> On 19.04.22 19:40, Vlastimil Babka wrote:
>> On 3/29/22 18:04, David Hildenbrand wrote:
>>> Let's verify when (un)pinning anonymous pages that we always deal with
>>> exclusive anonymous pages, which guarantees that we'll have a reliable
>>> PIN, meaning that we cannot end up with the GUP pin being inconsistent
>>> with the pages mapped into the page tables due to a COW triggered
>>> by a write fault.
>>> 
>>> When pinning pages, after conditionally triggering GUP unsharing of
>>> possibly shared anonymous pages, we should always only see exclusive
>>> anonymous pages. Note that anonymous pages that are mapped writable
>>> must be marked exclusive, otherwise we'd have a BUG.
>>> 
>>> When pinning during ordinary GUP, simply add a check after our
>>> conditional GUP-triggered unsharing checks. As we know exactly how the
>>> page is mapped, we know exactly in which page we have to check for
>>> PageAnonExclusive().
>>> 
>>> When pinning via GUP-fast we have to be careful, because we can race
>>> with fork(): verify only after the seqcount check has confirmed that we
>>> didn't race with a concurrent fork(), and thus didn't end up pinning a
>>> possibly shared anonymous page.
>>> 
>>> Similarly, when unpinning, verify that the pages are still marked as
>>> exclusive: otherwise something turned the pages possibly shared, which
>>> can result in random memory corruptions, which we really want to catch.
>>> 
>>> With only the pinned pages at hand and not the actual page table entries
>>> we have to be a bit careful: hugetlb pages are always mapped via a
>>> single logical page table entry referencing the head page, and
>>> PG_anon_exclusive of the head page applies. Anon THPs are a bit more
>>> complicated, because we might have obtained the page reference either
>>> via a PMD or a PTE -- depending on the mapping type, either
>>> PageAnonExclusive of the head page (PMD-mapped THP) or of the tail page
>>> (PTE-mapped THP) applies: as we don't know which, and to make our life
>>> easier, check that either is set.
>>> 
>>> Take care not to verify when unpinning during GUP-fast after we detected
>>> a concurrent fork(): we might stumble over an anonymous page that is now
>>> shared.
>>> 
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> 
>> Acked-by: Vlastimil Babka <vbabka@suse.cz>
>> 
>> Nits:
>> 
>>> @@ -510,6 +563,10 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
>>>  		page = ERR_PTR(-EMLINK);
>>>  		goto out;
>>>  	}
>>> +
>>> +	VM_BUG_ON((flags & FOLL_PIN) && PageAnon(page) &&
>>> +		  !PageAnonExclusive(page));
>> 
>> Do we rather want VM_BUG_ON_PAGE? Also for the same tests in mm/huge*.c below.
> 
> Makes sense, thanks:

LGTM

> diff --git a/mm/gup.c b/mm/gup.c
> index 5c17d4816441..46ffd8c51c6e 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -564,8 +564,8 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
>  		goto out;
>  	}
>  
> -	VM_BUG_ON((flags & FOLL_PIN) && PageAnon(page) &&
> -		  !PageAnonExclusive(page));
> +	VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
> +		       !PageAnonExclusive(page), page);
>  
>  	/* try_grab_page() does nothing unless FOLL_GET or FOLL_PIN is set. */
>  	if (unlikely(!try_grab_page(page, flags))) {
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 390f22334ee9..a2f44d8d3d47 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1392,8 +1392,8 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
>  	if (!pmd_write(*pmd) && gup_must_unshare(flags, page))
>  		return ERR_PTR(-EMLINK);
>  
> -	VM_BUG_ON((flags & FOLL_PIN) && PageAnon(page) &&
> -		  !PageAnonExclusive(page));
> +	VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
> +		       !PageAnonExclusive(page), page);
>  
>  	if (!try_grab_page(page, flags))
>  		return ERR_PTR(-ENOMEM);
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 8a635b5b5270..0ba2b1930b21 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -6100,8 +6100,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
>  		pfn_offset = (vaddr & ~huge_page_mask(h)) >> PAGE_SHIFT;
>  		page = pte_page(huge_ptep_get(pte));
>  
> -		VM_BUG_ON((flags & FOLL_PIN) && PageAnon(page) &&
> -			  !PageAnonExclusive(page));
> +		VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
> +			       !PageAnonExclusive(page), page);
>  
>  		/*
>  		 * If subpage information not requested, update counters
diff --git a/mm/gup.c b/mm/gup.c
index 6060823f9be8..2b4cd5fa7f51 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -29,6 +29,39 @@ struct follow_page_context {
 	unsigned int page_mask;
 };
 
+static inline void sanity_check_pinned_pages(struct page **pages,
+					     unsigned long npages)
+{
+	if (!IS_ENABLED(CONFIG_DEBUG_VM))
+		return;
+
+	/*
+	 * We only pin anonymous pages if they are exclusive. Once pinned, we
+	 * can no longer turn them possibly shared and PageAnonExclusive() will
+	 * stick around until the page is freed.
+	 *
+	 * We'd like to verify that our pinned anonymous pages are still mapped
+	 * exclusively. The issue with anon THP is that we don't know how
+	 * they are/were mapped when pinning them. However, for anon
+	 * THP we can assume that either the given page (PTE-mapped THP) or
+	 * the head page (PMD-mapped THP) should be PageAnonExclusive(). If
+	 * neither is the case, there is certainly something wrong.
+	 */
+	for (; npages; npages--, pages++) {
+		struct page *page = *pages;
+		struct folio *folio = page_folio(page);
+
+		if (!folio_test_anon(folio))
+			continue;
+		if (!folio_test_large(folio) || folio_test_hugetlb(folio))
+			VM_BUG_ON_PAGE(!PageAnonExclusive(&folio->page), page);
+		else
+			/* Either a PTE-mapped or a PMD-mapped THP. */
+			VM_BUG_ON_PAGE(!PageAnonExclusive(&folio->page) &&
+				       !PageAnonExclusive(page), page);
+	}
+}
+
 /*
  * Return the folio with ref appropriately incremented,
  * or NULL if that failed.
@@ -204,6 +237,7 @@ bool __must_check try_grab_page(struct page *page, unsigned int flags)
  */
 void unpin_user_page(struct page *page)
 {
+	sanity_check_pinned_pages(&page, 1);
 	gup_put_folio(page_folio(page), 1, FOLL_PIN);
 }
 EXPORT_SYMBOL(unpin_user_page);
@@ -272,6 +306,7 @@ void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
 		return;
 	}
 
+	sanity_check_pinned_pages(pages, npages);
 	for (i = 0; i < npages; i += nr) {
 		folio = gup_folio_next(pages, npages, i, &nr);
 		/*
@@ -344,6 +379,23 @@ void unpin_user_page_range_dirty_lock(struct page *page, unsigned long npages,
 }
 EXPORT_SYMBOL(unpin_user_page_range_dirty_lock);
 
+static void unpin_user_pages_lockless(struct page **pages, unsigned long npages)
+{
+	unsigned long i;
+	struct folio *folio;
+	unsigned int nr;
+
+	/*
+	 * Don't perform any sanity checks because we might have raced with
+	 * fork() and some anonymous pages might now actually be shared --
+	 * which is why we're unpinning after all.
+	 */
+	for (i = 0; i < npages; i += nr) {
+		folio = gup_folio_next(pages, npages, i, &nr);
+		gup_put_folio(folio, nr, FOLL_PIN);
+	}
+}
+
 /**
  * unpin_user_pages() - release an array of gup-pinned pages.
  * @pages:  array of pages to be marked dirty and released.
@@ -367,6 +419,7 @@ void unpin_user_pages(struct page **pages, unsigned long npages)
 	if (WARN_ON(IS_ERR_VALUE(npages)))
 		return;
 
+	sanity_check_pinned_pages(pages, npages);
 	for (i = 0; i < npages; i += nr) {
 		folio = gup_folio_next(pages, npages, i, &nr);
 		gup_put_folio(folio, nr, FOLL_PIN);
@@ -510,6 +563,10 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 		page = ERR_PTR(-EMLINK);
 		goto out;
 	}
+
+	VM_BUG_ON((flags & FOLL_PIN) && PageAnon(page) &&
+		  !PageAnonExclusive(page));
+
 	/* try_grab_page() does nothing unless FOLL_GET or FOLL_PIN is set. */
 	if (unlikely(!try_grab_page(page, flags))) {
 		page = ERR_PTR(-ENOMEM);
@@ -2744,8 +2801,10 @@ static unsigned long lockless_pages_from_mm(unsigned long start,
 	 */
 	if (gup_flags & FOLL_PIN) {
 		if (read_seqcount_retry(&current->mm->write_protect_seq, seq)) {
-			unpin_user_pages(pages, nr_pinned);
+			unpin_user_pages_lockless(pages, nr_pinned);
 			return 0;
+		} else {
+			sanity_check_pinned_pages(pages, nr_pinned);
 		}
 	}
 	return nr_pinned;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2dc820e8c873..b32774f289d6 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1392,6 +1392,9 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
 	if (!pmd_write(*pmd) && gup_must_unshare(flags, page))
 		return ERR_PTR(-EMLINK);
 
+	VM_BUG_ON((flags & FOLL_PIN) && PageAnon(page) &&
+		  !PageAnonExclusive(page));
+
 	if (!try_grab_page(page, flags))
 		return ERR_PTR(-ENOMEM);
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 21f2ec446117..48740e6c3476 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6097,6 +6097,9 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		pfn_offset = (vaddr & ~huge_page_mask(h)) >> PAGE_SHIFT;
 		page = pte_page(huge_ptep_get(pte));
 
+		VM_BUG_ON((flags & FOLL_PIN) && PageAnon(page) &&
+			  !PageAnonExclusive(page));
+
 		/*
 		 * If subpage information not requested, update counters
 		 * and skip the same_page loop below.
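The GUP-fast race handling in the lockless_pages_from_mm() hunk above amounts
to the following ordering; this is a simplified sketch with the actual
lockless page-table walk and error handling elided, not a literal copy of the
function:

        static unsigned long lockless_pages_from_mm(unsigned long start,
                                                    unsigned long end,
                                                    unsigned int gup_flags,
                                                    struct page **pages)
        {
                int nr_pinned = 0;
                unsigned int seq;

                if (gup_flags & FOLL_PIN) {
                        /* Sample the seqcount before walking the page tables. */
                        seq = raw_read_seqcount(&current->mm->write_protect_seq);
                        if (seq & 1)    /* fork() is in progress right now */
                                return 0;
                }

                /* ... lockless walk that pins pages and updates nr_pinned ... */

                if (gup_flags & FOLL_PIN) {
                        if (read_seqcount_retry(&current->mm->write_protect_seq, seq)) {
                                /*
                                 * Raced with fork(): the pinned anonymous pages
                                 * may now be shared, so unpin without the
                                 * exclusivity sanity checks.
                                 */
                                unpin_user_pages_lockless(pages, nr_pinned);
                                return 0;
                        }
                        /* No race: every pinned anon page must be exclusive. */
                        sanity_check_pinned_pages(pages, nr_pinned);
                }
                return nr_pinned;
        }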
Let's verify when (un)pinning anonymous pages that we always deal with
exclusive anonymous pages, which guarantees that we'll have a reliable
PIN, meaning that we cannot end up with the GUP pin being inconsistent
with the pages mapped into the page tables due to a COW triggered
by a write fault.

When pinning pages, after conditionally triggering GUP unsharing of
possibly shared anonymous pages, we should always only see exclusive
anonymous pages. Note that anonymous pages that are mapped writable
must be marked exclusive, otherwise we'd have a BUG.

When pinning during ordinary GUP, simply add a check after our
conditional GUP-triggered unsharing checks. As we know exactly how the
page is mapped, we know exactly in which page we have to check for
PageAnonExclusive().

When pinning via GUP-fast we have to be careful, because we can race
with fork(): verify only after the seqcount check has confirmed that we
didn't race with a concurrent fork(), and thus didn't end up pinning a
possibly shared anonymous page.

Similarly, when unpinning, verify that the pages are still marked as
exclusive: otherwise something turned the pages possibly shared, which
can result in random memory corruptions, which we really want to catch.

With only the pinned pages at hand and not the actual page table entries
we have to be a bit careful: hugetlb pages are always mapped via a
single logical page table entry referencing the head page, and
PG_anon_exclusive of the head page applies. Anon THPs are a bit more
complicated, because we might have obtained the page reference either
via a PMD or a PTE -- depending on the mapping type, either
PageAnonExclusive of the head page (PMD-mapped THP) or of the tail page
(PTE-mapped THP) applies: as we don't know which, and to make our life
easier, check that either is set.

Take care not to verify when unpinning during GUP-fast after we detected
a concurrent fork(): we might stumble over an anonymous page that is now
shared.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/gup.c         | 61 +++++++++++++++++++++++++++++++++++++++++++++++-
 mm/huge_memory.c |  3 +++
 mm/hugetlb.c     |  3 +++
 3 files changed, 66 insertions(+), 1 deletion(-)
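To see where the new checks fire from a caller's perspective, consider a
minimal pin/unpin sequence in the style of a driver. The GUP API calls are
the real ones; the helper itself and its context are hypothetical:

        /* Hypothetical example, not part of the patch. */
        static int pin_and_use_one_page(unsigned long user_addr)
        {
                struct page *page;
                int nr;

                nr = pin_user_pages_fast(user_addr, 1,
                                         FOLL_WRITE | FOLL_LONGTERM, &page);
                if (nr != 1)
                        return nr < 0 ? nr : -EFAULT;

                /* ... hand the page to hardware, perform DMA, etc. ... */

                /*
                 * Mark the page dirty and unpin it. With CONFIG_DEBUG_VM, the
                 * new sanity_check_pinned_pages() runs on this unpin path and
                 * BUGs if a pinned anonymous page is no longer exclusive.
                 */
                unpin_user_pages_dirty_lock(&page, 1, true);
                return 0;
        }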