Message ID | 20231220224504.646757-5-david@redhat.com (mailing list archive) |
---|---|
State | New |
Series | mm/rmap: interface overhaul |
On Wed, Dec 20, 2023 at 11:44:28PM +0100, David Hildenbrand wrote:
> hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
> For example, hugetlb currently only supports entire mappings, and treats
> any mapping as mapped using a single "logical PTE". Let's move it out
> of the way so we can overhaul our "ordinary" rmap.
> implementation/interface.
>
> So let's introduce and use hugetlb_try_dup_anon_rmap() to make all
> hugetlb handling use dedicated hugetlb_* rmap functions.
>
> Add sanity checks that we end up with the right folios in the right
> functions.
>
> Note that is_device_private_page() does not apply to hugetlb.
>
> Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>

> +static inline bool folio_needs_cow_for_dma(struct vm_area_struct *vma,
> +		struct folio *folio)

I particularly like it that you introduced this.

> +static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
> +		struct vm_area_struct *vma)
> +{
> +	VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
> +	VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
> +
> +	if (PageAnonExclusive(&folio->page)) {

I wonder if we need a folio_test_hugetlb_anon_exclusive() to make this
a little more ergonomic?

> +		if (unlikely(folio_needs_cow_for_dma(vma, folio)))
> +			return -EBUSY;
> +		ClearPageAnonExclusive(&folio->page);

... and set/clear variants.

> On Dec 21, 2023, at 06:44, David Hildenbrand <david@redhat.com> wrote:
>
> hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
> For example, hugetlb currently only supports entire mappings, and treats
> any mapping as mapped using a single "logical PTE". Let's move it out
> of the way so we can overhaul our "ordinary" rmap.
> implementation/interface.
>
> So let's introduce and use hugetlb_try_dup_anon_rmap() to make all
> hugetlb handling use dedicated hugetlb_* rmap functions.
>
> Add sanity checks that we end up with the right folios in the right
> functions.
>
> Note that is_device_private_page() does not apply to hugetlb.
>
> Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Muchun Song <songmuchun@bytedance.com>

Thanks.

On 21.12.23 05:40, Matthew Wilcox wrote:
> On Wed, Dec 20, 2023 at 11:44:28PM +0100, David Hildenbrand wrote:
>> hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
>> For example, hugetlb currently only supports entire mappings, and treats
>> any mapping as mapped using a single "logical PTE". Let's move it out
>> of the way so we can overhaul our "ordinary" rmap.
>> implementation/interface.
>>
>> So let's introduce and use hugetlb_try_dup_anon_rmap() to make all
>> hugetlb handling use dedicated hugetlb_* rmap functions.
>>
>> Add sanity checks that we end up with the right folios in the right
>> functions.
>>
>> Note that is_device_private_page() does not apply to hugetlb.
>>
>> Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
>> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>
> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>

Thanks!

>> +static inline bool folio_needs_cow_for_dma(struct vm_area_struct *vma,
>> +		struct folio *folio)
>
> I particularly like it that you introduced this.

And a later patch even removes page_needs_cow_for_dma() :)

A note that we have one remaining user of page_maybe_dma_pinned().
Instead of converting that code to folios, we should probably just
remove that pte_is_pinned() handling completely: it's inconsistent
(only checks PTEs) and cannot handle concurrent GUP-fast. It's a
leftover from the COW issues we had before PageAnonExclusive.

[I've had a patch lying around to do that for a long time, but never
sent it]

>
>> +static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
>> +		struct vm_area_struct *vma)
>> +{
>> +	VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
>> +	VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
>> +
>> +	if (PageAnonExclusive(&folio->page)) {
>
> I wonder if we need a folio_test_hugetlb_anon_exclusive() to make this
> a little more ergonomic?
>
>> +		if (unlikely(folio_needs_cow_for_dma(vma, folio)))
>> +			return -EBUSY;
>> +		ClearPageAnonExclusive(&folio->page);
>
> ... and set/clear variants.
>

I thought about that as well, and even going a step further: instead of
having PageAnonExclusive checks outside rmap code, have something like
the following instead:

hugetlb_test_anon_rmap_exclusive()
folio_test_anon_rmap_exclusive_[pte|pmd]()

I added that to my TODO list, because it results again in a bigger
patchset (especially also in GUP).

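To make the helper idea discussed above concrete, here is a minimal userspace sketch of how folio-level test/set/clear wrappers could read inside the dup path. It is not kernel code and not part of this series: `struct page` and `struct folio` below are simplified stand-ins, the `folio_*_hugetlb_anon_exclusive()` wrappers are hypothetical (only the test variant's name was floated in the review), `needs_cow_for_dma` collapses the real `folio_needs_cow_for_dma()` check into a single flag, the mapcount starts at 0 for readability, and `assert()` stands in for `VM_WARN_ON_FOLIO()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified stand-ins, NOT the kernel's struct page / struct folio. */
struct page {
	bool anon_exclusive;		/* models the PageAnonExclusive flag */
};

struct folio {
	struct page page;
	bool hugetlb;			/* models folio_test_hugetlb() */
	bool anon;			/* models folio_test_anon() */
	bool needs_cow_for_dma;		/* models folio_needs_cow_for_dma() */
	atomic_int entire_mapcount;	/* starts at 0 in this model */
};

/* Hypothetical wrappers; names are assumptions based on the review. */
static inline bool folio_test_hugetlb_anon_exclusive(const struct folio *folio)
{
	return folio->page.anon_exclusive;
}

static inline void folio_set_hugetlb_anon_exclusive(struct folio *folio)
{
	folio->page.anon_exclusive = true;
}

static inline void folio_clear_hugetlb_anon_exclusive(struct folio *folio)
{
	folio->page.anon_exclusive = false;
}

/*
 * Model of the dup path: an exclusive folio that may be pinned for DMA
 * cannot be shared (-EBUSY); otherwise exclusivity is dropped and the
 * mapping is counted.
 */
static int model_hugetlb_try_dup_anon_rmap(struct folio *folio)
{
	/* The patch uses VM_WARN_ON_FOLIO(), which warns rather than aborts. */
	assert(folio->hugetlb);
	assert(folio->anon);

	if (folio_test_hugetlb_anon_exclusive(folio)) {
		if (folio->needs_cow_for_dma)
			return -EBUSY;
		folio_clear_hugetlb_anon_exclusive(folio);
	}
	atomic_fetch_add(&folio->entire_mapcount, 1);
	return 0;
}
```

In this model the caller no longer touches `&folio->page` directly, which is the ergonomic gain the review was asking about; whether the real helpers would live in rmap code or page-flags code is left open here.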
diff --git a/include/linux/mm.h b/include/linux/mm.h
index b72bf25a45cfd..ae547b62f3252 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1964,15 +1964,21 @@ static inline bool page_maybe_dma_pinned(struct page *page)
  *
  * The caller has to hold the PT lock and the vma->vm_mm->->write_protect_seq.
  */
-static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma,
-					  struct page *page)
+static inline bool folio_needs_cow_for_dma(struct vm_area_struct *vma,
+		struct folio *folio)
 {
 	VM_BUG_ON(!(raw_read_seqcount(&vma->vm_mm->write_protect_seq) & 1));
 
 	if (!test_bit(MMF_HAS_PINNED, &vma->vm_mm->flags))
 		return false;
 
-	return page_maybe_dma_pinned(page);
+	return folio_maybe_dma_pinned(folio);
+}
+
+static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma,
+		struct page *page)
+{
+	return folio_needs_cow_for_dma(vma, page_folio(page));
 }
 
 /**
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 56900a16f41a6..5f26752de945c 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -211,6 +211,22 @@ void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
 void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
 		unsigned long address);
 
+/* See page_try_dup_anon_rmap() */
+static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
+		struct vm_area_struct *vma)
+{
+	VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
+	VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
+
+	if (PageAnonExclusive(&folio->page)) {
+		if (unlikely(folio_needs_cow_for_dma(vma, folio)))
+			return -EBUSY;
+		ClearPageAnonExclusive(&folio->page);
+	}
+	atomic_inc(&folio->_entire_mapcount);
+	return 0;
+}
+
 static inline void hugetlb_add_file_rmap(struct folio *folio)
 {
 	VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
@@ -228,6 +244,8 @@ static inline void hugetlb_remove_rmap(struct folio *folio)
 
 static inline void __page_dup_rmap(struct page *page, bool compound)
 {
+	VM_WARN_ON(folio_test_hugetlb(page_folio(page)));
+
 	if (compound) {
 		struct folio *folio = (struct folio *)page;
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 57e8981879314..378e460a6ab41 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5409,8 +5409,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			 */
 			if (!folio_test_anon(pte_folio)) {
 				hugetlb_add_file_rmap(pte_folio);
-			} else if (page_try_dup_anon_rmap(&pte_folio->page,
-							  true, src_vma)) {
+			} else if (hugetlb_try_dup_anon_rmap(pte_folio, src_vma)) {
 				pte_t src_pte_old = entry;
 				struct folio *new_folio;
 