[v2,04/40] mm/rmap: introduce and use hugetlb_try_dup_anon_rmap()

Message ID 20231220224504.646757-5-david@redhat.com (mailing list archive)
State New
Series mm/rmap: interface overhaul

Commit Message

David Hildenbrand Dec. 20, 2023, 10:44 p.m. UTC
hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
For example, hugetlb currently only supports entire mappings, and treats
any mapping as mapped using a single "logical PTE". Let's move it out
of the way so we can overhaul our "ordinary" rmap
implementation/interface.

So let's introduce and use hugetlb_try_dup_anon_rmap() to make all
hugetlb handling use dedicated hugetlb_* rmap functions.

Add sanity checks that we end up with the right folios in the right
functions.

Note that is_device_private_page() does not apply to hugetlb.

Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/mm.h   | 12 +++++++++---
 include/linux/rmap.h | 18 ++++++++++++++++++
 mm/hugetlb.c         |  3 +--
 3 files changed, 28 insertions(+), 5 deletions(-)
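
The control flow of the new helper can be modelled in plain userspace C. This is only a sketch under stated assumptions: struct model_folio, its fields, and model_hugetlb_try_dup_anon_rmap() are hypothetical stand-ins, not kernel API. The maybe_dma_pinned flag collapses the whole folio_needs_cow_for_dma() check into one bit, and a plain int replaces the atomic _entire_mapcount.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the folio state the helper looks at. */
struct model_folio {
	bool is_hugetlb;	/* models folio_test_hugetlb() */
	bool is_anon;		/* models folio_test_anon() */
	bool anon_exclusive;	/* models PageAnonExclusive(&folio->page) */
	bool maybe_dma_pinned;	/* models folio_needs_cow_for_dma() */
	int entire_mapcount;	/* models _entire_mapcount (plain int here) */
};

/* Returns 0 if the mapping was duplicated, -EBUSY if it was refused. */
static int model_hugetlb_try_dup_anon_rmap(struct model_folio *folio)
{
	/* The patch uses VM_WARN_ON_FOLIO(); assert() is a stricter stand-in. */
	assert(folio->is_hugetlb);
	assert(folio->is_anon);

	if (folio->anon_exclusive) {
		/* A possibly pinned exclusive folio must not become shared. */
		if (folio->maybe_dma_pinned)
			return -EBUSY;
		folio->anon_exclusive = false;
	}
	folio->entire_mapcount++;
	return 0;
}
```

Note that the pin check only runs for exclusive folios: a folio that is already shared is duplicated unconditionally. In the mm/hugetlb.c hunk, a non-zero return is what sends copy_hugetlb_page_range() into the branch that declares new_folio.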

Comments

Matthew Wilcox (Oracle) Dec. 21, 2023, 4:40 a.m. UTC | #1
On Wed, Dec 20, 2023 at 11:44:28PM +0100, David Hildenbrand wrote:
> hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
> For example, hugetlb currently only supports entire mappings, and treats
> any mapping as mapped using a single "logical PTE". Let's move it out
> of the way so we can overhaul our "ordinary" rmap.
> implementation/interface.
> 
> So let's introduce and use hugetlb_try_dup_anon_rmap() to make all
> hugetlb handling use dedicated hugetlb_* rmap functions.
> 
> Add sanity checks that we end up with the right folios in the right
> functions.
> 
> Note that is_device_private_page() does not apply to hugetlb.
> 
> Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>

> +static inline bool folio_needs_cow_for_dma(struct vm_area_struct *vma,
> +					  struct folio *folio)

I particularly like it that you introduced this.

> +static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
> +		struct vm_area_struct *vma)
> +{
> +	VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
> +	VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
> +
> +	if (PageAnonExclusive(&folio->page)) {

I wonder if we need a folio_test_hugetlb_anon_exclusive() to make this
a little more ergonomic?

> +		if (unlikely(folio_needs_cow_for_dma(vma, folio)))
> +			return -EBUSY;
> +		ClearPageAnonExclusive(&folio->page);

... and set/clear variants.

Muchun Song Dec. 21, 2023, 5:47 a.m. UTC | #2
> On Dec 21, 2023, at 06:44, David Hildenbrand <david@redhat.com> wrote:
> 
> hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
> For example, hugetlb currently only supports entire mappings, and treats
> any mapping as mapped using a single "logical PTE". Let's move it out
> of the way so we can overhaul our "ordinary" rmap.
> implementation/interface.
> 
> So let's introduce and use hugetlb_try_dup_anon_rmap() to make all
> hugetlb handling use dedicated hugetlb_* rmap functions.
> 
> Add sanity checks that we end up with the right folios in the right
> functions.
> 
> Note that is_device_private_page() does not apply to hugetlb.
> 
> Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Muchun Song <songmuchun@bytedance.com>

Thanks.

David Hildenbrand Dec. 21, 2023, 9:29 a.m. UTC | #3
On 21.12.23 05:40, Matthew Wilcox wrote:
> On Wed, Dec 20, 2023 at 11:44:28PM +0100, David Hildenbrand wrote:
>> hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
>> For example, hugetlb currently only supports entire mappings, and treats
>> any mapping as mapped using a single "logical PTE". Let's move it out
>> of the way so we can overhaul our "ordinary" rmap.
>> implementation/interface.
>>
>> So let's introduce and use hugetlb_try_dup_anon_rmap() to make all
>> hugetlb handling use dedicated hugetlb_* rmap functions.
>>
>> Add sanity checks that we end up with the right folios in the right
>> functions.
>>
>> Note that is_device_private_page() does not apply to hugetlb.
>>
>> Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
>> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
> 
> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> 

Thanks!

>> +static inline bool folio_needs_cow_for_dma(struct vm_area_struct *vma,
>> +					  struct folio *folio)
> 
> I particularly like it that you introduced this.

And a later patch even removes page_needs_cow_for_dma() :)
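
The layering in the mm.h hunk (a folio-based check plus a thin page-based wrapper that forwards to it) can be sketched in userspace as follows. Everything prefixed model_ is hypothetical. In particular, the pinned state is a plain flag here, whereas the kernel derives it from folio_maybe_dma_pinned(), and the page-to-folio lookup is a simple pointer rather than page_folio().

```c
#include <assert.h>
#include <stdbool.h>

struct model_mm {
	bool has_pinned;		/* models the MMF_HAS_PINNED flag */
	unsigned int write_protect_seq;	/* odd while the write side is held */
};

struct model_folio {
	bool maybe_dma_pinned;		/* models folio_maybe_dma_pinned() */
};

struct model_page {
	struct model_folio *folio;	/* models page_folio(page) */
};

static bool model_folio_needs_cow_for_dma(const struct model_mm *mm,
					  const struct model_folio *folio)
{
	/* Mirrors the VM_BUG_ON(): the sequence count must be odd. */
	assert(mm->write_protect_seq & 1);

	/* Cheap per-mm filter first: nothing was ever pinned in this mm. */
	if (!mm->has_pinned)
		return false;

	return folio->maybe_dma_pinned;
}

/* Compatibility wrapper for callers that still hold a page. */
static bool model_page_needs_cow_for_dma(const struct model_mm *mm,
					 const struct model_page *page)
{
	return model_folio_needs_cow_for_dma(mm, page->folio);
}
```

Keeping the page variant as a one-line wrapper means both entry points share a single implementation until the remaining page-based callers are converted.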


Note that we have one remaining user of page_maybe_dma_pinned().
Instead of converting that code to folios, we should probably just
remove that pte_is_pinned() handling completely: it's inconsistent (it
only checks PTEs) and cannot handle concurrent GUP-fast. It's a leftover
from the COW issues we had before PageAnonExclusive. [I've had a patch
lying around to do that for a long time, but never sent it.]

> 
>> +static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
>> +		struct vm_area_struct *vma)
>> +{
>> +	VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
>> +	VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
>> +
>> +	if (PageAnonExclusive(&folio->page)) {
> 
> I wonder if we need a folio_test_hugetlb_anon_exclusive() to make this
> a little more ergonomic?
> 
>> +		if (unlikely(folio_needs_cow_for_dma(vma, folio)))
>> +			return -EBUSY;
>> +		ClearPageAnonExclusive(&folio->page);
> 
> ... and set/clear variants.
> 

I thought about that as well, and about even going a step further:
instead of having PageAnonExclusive checks outside rmap code, have
something like the following:

hugetlb_test_anon_rmap_exclusive()
folio_test_anon_rmap_exclusive_[pte|pmd]()

I added that to my TODO list, because it results again in a bigger 
patchset (especially also in GUP).
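
A rough userspace sketch of what such wrappers might look like, purely for illustration. The model_ types, the flag bit, and the set/clear helper names are assumptions; only hugetlb_test_anon_rmap_exclusive() is a name proposed in this thread, and none of this exists in the kernel as written.

```c
#include <stdbool.h>

#define MODEL_PG_ANON_EXCLUSIVE	(1UL << 0)	/* hypothetical flag bit */

struct model_page {
	unsigned long flags;
};

struct model_folio {
	struct model_page page;		/* head page, as in &folio->page */
};

/* Callers pass the folio; the head-page detail stays inside the helper. */
static inline bool
model_hugetlb_test_anon_rmap_exclusive(const struct model_folio *folio)
{
	return (folio->page.flags & MODEL_PG_ANON_EXCLUSIVE) != 0;
}

static inline void
model_hugetlb_set_anon_rmap_exclusive(struct model_folio *folio)
{
	folio->page.flags |= MODEL_PG_ANON_EXCLUSIVE;
}

static inline void
model_hugetlb_clear_anon_rmap_exclusive(struct model_folio *folio)
{
	folio->page.flags &= ~MODEL_PG_ANON_EXCLUSIVE;
}
```

With helpers of this shape, the open-coded PageAnonExclusive(&folio->page) and ClearPageAnonExclusive(&folio->page) calls in hugetlb_try_dup_anon_rmap() would shrink to folio-only calls.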

Patch

diff --git a/include/linux/mm.h b/include/linux/mm.h
index b72bf25a45cfd..ae547b62f3252 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1964,15 +1964,21 @@  static inline bool page_maybe_dma_pinned(struct page *page)
  *
  * The caller has to hold the PT lock and the vma->vm_mm->->write_protect_seq.
  */
-static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma,
-					  struct page *page)
+static inline bool folio_needs_cow_for_dma(struct vm_area_struct *vma,
+					  struct folio *folio)
 {
 	VM_BUG_ON(!(raw_read_seqcount(&vma->vm_mm->write_protect_seq) & 1));
 
 	if (!test_bit(MMF_HAS_PINNED, &vma->vm_mm->flags))
 		return false;
 
-	return page_maybe_dma_pinned(page);
+	return folio_maybe_dma_pinned(folio);
+}
+
+static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma,
+					  struct page *page)
+{
+	return folio_needs_cow_for_dma(vma, page_folio(page));
 }
 
 /**
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 56900a16f41a6..5f26752de945c 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -211,6 +211,22 @@  void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
 void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
 		unsigned long address);
 
+/* See page_try_dup_anon_rmap() */
+static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
+		struct vm_area_struct *vma)
+{
+	VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
+	VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
+
+	if (PageAnonExclusive(&folio->page)) {
+		if (unlikely(folio_needs_cow_for_dma(vma, folio)))
+			return -EBUSY;
+		ClearPageAnonExclusive(&folio->page);
+	}
+	atomic_inc(&folio->_entire_mapcount);
+	return 0;
+}
+
 static inline void hugetlb_add_file_rmap(struct folio *folio)
 {
 	VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
@@ -228,6 +244,8 @@  static inline void hugetlb_remove_rmap(struct folio *folio)
 
 static inline void __page_dup_rmap(struct page *page, bool compound)
 {
+	VM_WARN_ON(folio_test_hugetlb(page_folio(page)));
+
 	if (compound) {
 		struct folio *folio = (struct folio *)page;
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 57e8981879314..378e460a6ab41 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5409,8 +5409,7 @@  int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			 */
 			if (!folio_test_anon(pte_folio)) {
 				hugetlb_add_file_rmap(pte_folio);
-			} else if (page_try_dup_anon_rmap(&pte_folio->page,
-							  true, src_vma)) {
+			} else if (hugetlb_try_dup_anon_rmap(pte_folio, src_vma)) {
 				pte_t src_pte_old = entry;
 				struct folio *new_folio;