
[linux-next,v3] mm: huge_memory: fix misused mapping_large_folio_support() for anon folios

Message ID 202406071740485174hcFl7jRxncsHDtI-Pz-o@zte.com.cn (mailing list archive)
State New
Series [linux-next,v3] mm: huge_memory: fix misused mapping_large_folio_support() for anon folios

Commit Message

xu.xin16@zte.com.cn June 7, 2024, 9:40 a.m. UTC
From: Ran Xiaokai <ran.xiaokai@zte.com.cn>

While running a large folio split test, the WARNING
"[ 5059.122759][  T166] Cannot split file folio to non-0 order"
was triggered. However, the test cases only operate on anonymous
folios, whereas mapping_large_folio_support() is only meaningful for
page cache folios.

In split_huge_page_to_list_to_order(), the folio passed to
mapping_large_folio_support() may be an anonymous folio. Because the
folio_test_anon() check is missing, splitting an anonymous THP to a
non-zero order fails. The same applies to shmem_mapping(), so add a
folio_test_anon() check covering both. The shmem_mapping() call in
__split_huge_page() is not affected: for anonymous folios the end
parameter is set to -1, so (head[i].index >= end) is always false and
shmem_mapping() is never called there.

Also add a VM_WARN_ONCE() in mapping_large_folio_support() for
anonymous mappings, so that this kind of misuse is easier to detect.

Note that THP folios may exist in the page cache even when the file
system does not support large folios: when CONFIG_TRANSPARENT_HUGEPAGE
is enabled, khugepaged will try to collapse read-only file-backed pages
into THPs. But such a mapping still does not properly support
multi-order large folios.

Verified with /sys/kernel/debug/split_huge_pages: with this patch,
large anonymous THPs are split successfully and the warning no longer
triggers.
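
A minimal userspace sketch of this kind of verification (not the exact
test used here; the "<pid>,0x<vaddr_start>,0x<vaddr_end>,<new_order>"
input format of the split_huge_pages debugfs file and the 2MB PMD-sized
THP are assumptions):

/*
 * Minimal sketch, not the original test: map an anonymous region,
 * try to get it backed by a PMD-sized THP, then ask debugfs to split
 * it to a non-zero order.
 */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	size_t len = 2UL << 20;		/* one PMD-sized THP on x86_64 */
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	FILE *f;

	if (buf == MAP_FAILED)
		return 1;
	madvise(buf, len, MADV_HUGEPAGE);	/* request THP backing */
	memset(buf, 1, len);			/* fault the range in */

	f = fopen("/sys/kernel/debug/split_huge_pages", "w");
	if (!f)
		return 1;
	/* split the anon THP covering [buf, buf + len) to order-2 */
	fprintf(f, "%d,0x%lx,0x%lx,2", (int)getpid(),
		(unsigned long)buf, (unsigned long)buf + len);
	fclose(f);

	/*
	 * Without the fix this triggers "Cannot split file folio to
	 * non-0 order" and the split fails; with the fix it succeeds.
	 */
	return 0;
}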

Changes since v2:
 - Fix two coding-style issues pointed out by David
 - Update the comment for the order-1 case as suggested by Barry

Reviewed-by: Barry Song <baohua@kernel.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
---
 include/linux/pagemap.h |  4 ++++
 mm/huge_memory.c        | 28 +++++++++++++++++-----------
 2 files changed, 21 insertions(+), 11 deletions(-)

Comments

Andrew Morton June 7, 2024, 9:21 p.m. UTC | #1
On Fri, 7 Jun 2024 17:40:48 +0800 (CST) <xu.xin16@zte.com.cn> wrote:

> From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
> 
> While running a large folio split test, the WARNING
> "[ 5059.122759][  T166] Cannot split file folio to non-0 order"
> was triggered. However, the test cases only operate on anonymous
> folios, whereas mapping_large_folio_support() is only meaningful for
> page cache folios.
> 
> In split_huge_page_to_list_to_order(), the folio passed to
> mapping_large_folio_support() may be an anonymous folio. Because the
> folio_test_anon() check is missing, splitting an anonymous THP to a
> non-zero order fails. The same applies to shmem_mapping(), so add a
> folio_test_anon() check covering both. The shmem_mapping() call in
> __split_huge_page() is not affected: for anonymous folios the end
> parameter is set to -1, so (head[i].index >= end) is always false and
> shmem_mapping() is never called there.
> 
> Also add a VM_WARN_ONCE() in mapping_large_folio_support() for
> anonymous mappings, so that this kind of misuse is easier to detect.
> 
> Note that THP folios may exist in the page cache even when the file
> system does not support large folios: when CONFIG_TRANSPARENT_HUGEPAGE
> is enabled, khugepaged will try to collapse read-only file-backed pages
> into THPs. But such a mapping still does not properly support
> multi-order large folios.
> 
> Verified with /sys/kernel/debug/split_huge_pages: with this patch,
> large anonymous THPs are split successfully and the warning no longer
> triggers.
> 

Can we please identify a Fixes: target for this? Is it c010d47f107f
("mm: thp: split huge page to any lower order pages")?

It would be good to add a selftest which would have caught this.

Ran Xiaokai June 12, 2024, 4:43 a.m. UTC | #2
> > From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
> > 
> > While running a large folio split test, the WARNING
> > "[ 5059.122759][  T166] Cannot split file folio to non-0 order"
> > was triggered. However, the test cases only operate on anonymous
> > folios, whereas mapping_large_folio_support() is only meaningful for
> > page cache folios.
> > 
> > In split_huge_page_to_list_to_order(), the folio passed to
> > mapping_large_folio_support() may be an anonymous folio. Because the
> > folio_test_anon() check is missing, splitting an anonymous THP to a
> > non-zero order fails. The same applies to shmem_mapping(), so add a
> > folio_test_anon() check covering both. The shmem_mapping() call in
> > __split_huge_page() is not affected: for anonymous folios the end
> > parameter is set to -1, so (head[i].index >= end) is always false and
> > shmem_mapping() is never called there.
> > 
> > Also add a VM_WARN_ONCE() in mapping_large_folio_support() for
> > anonymous mappings, so that this kind of misuse is easier to detect.
> > 
> > Note that THP folios may exist in the page cache even when the file
> > system does not support large folios: when CONFIG_TRANSPARENT_HUGEPAGE
> > is enabled, khugepaged will try to collapse read-only file-backed pages
> > into THPs. But such a mapping still does not properly support
> > multi-order large folios.
> > 
> > Verified with /sys/kernel/debug/split_huge_pages: with this patch,
> > large anonymous THPs are split successfully and the warning no longer
> > triggers.
> > 
> 
> Can we please identify a Fixes: target for this? Is it c010d47f107f
> ("mm: thp: split huge page to any lower order pages")?

Yes, this fixes commit c010d47f107f ("mm: thp: split huge page to any lower order pages").

> It would be good to add a selftest which would have caught this.

I have updated the code in selftests/mm/split_huge_page_test.c.
For now, only order-0 is covered for the anonymous THP split case;
I am adding test cases for more mTHP-suitable orders and will send
them in a separate patch once they are done.
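
A rough sketch of the kind of check such a test could use (hypothetical,
not the actual or forthcoming selftest code; the only assumption is the
thp_split_page counter exposed in /proc/vmstat):

/*
 * Hypothetical helper, not taken from split_huge_page_test.c: read a
 * counter such as "thp_split_page" from /proc/vmstat so a test can
 * assert that the requested split actually happened.
 */
#include <stdio.h>
#include <string.h>

static long read_vmstat(const char *name)
{
	char line[256];
	size_t n = strlen(name);
	long val = -1;
	FILE *f = fopen("/proc/vmstat", "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		if (!strncmp(line, name, n) && line[n] == ' ') {
			sscanf(line + n, "%ld", &val);
			break;
		}
	}
	fclose(f);
	return val;
}

/*
 * Usage sketch:
 *	long before = read_vmstat("thp_split_page");
 *	... write "<pid>,<start>,<end>,<order>" to split_huge_pages ...
 *	if (read_vmstat("thp_split_page") <= before)
 *		report the case as failed;
 */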

Patch

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index ee633712bba0..59f1df0cde5a 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -381,6 +381,10 @@  static inline void mapping_set_large_folios(struct address_space *mapping)
  */
 static inline bool mapping_large_folio_support(struct address_space *mapping)
 {
+	/* AS_LARGE_FOLIO_SUPPORT is only reasonable for pagecache folios */
+	VM_WARN_ONCE((unsigned long)mapping & PAGE_MAPPING_ANON,
+			"Anonymous mapping always supports large folio");
+
 	return IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
 		test_bit(AS_LARGE_FOLIO_SUPPORT, &mapping->flags);
 }
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 317de2afd371..155d6a9f73be 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3009,30 +3009,36 @@  int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
 	if (new_order >= folio_order(folio))
 		return -EINVAL;

-	/* Cannot split anonymous THP to order-1 */
-	if (new_order == 1 && folio_test_anon(folio)) {
-		VM_WARN_ONCE(1, "Cannot split to order-1 folio");
-		return -EINVAL;
-	}
-
-	if (new_order) {
-		/* Only swapping a whole PMD-mapped folio is supported */
-		if (folio_test_swapcache(folio))
+	if (folio_test_anon(folio)) {
+		/* order-1 is not supported for anonymous THP. */
+		if (new_order == 1) {
+			VM_WARN_ONCE(1, "Cannot split to order-1 folio");
 			return -EINVAL;
+		}
+	} else if (new_order) {
 		/* Split shmem folio to non-zero order not supported */
 		if (shmem_mapping(folio->mapping)) {
 			VM_WARN_ONCE(1,
 				"Cannot split shmem folio to non-0 order");
 			return -EINVAL;
 		}
-		/* No split if the file system does not support large folio */
-		if (!mapping_large_folio_support(folio->mapping)) {
+		/*
+		 * No split if the file system does not support large folio.
+		 * Note that we might still have THPs in such mappings due to
+		 * CONFIG_READ_ONLY_THP_FOR_FS. But in that case, the mapping
+		 * does not actually support large folios properly.
+		 */
+		if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
+		    !mapping_large_folio_support(folio->mapping)) {
 			VM_WARN_ONCE(1,
 				"Cannot split file folio to non-0 order");
 			return -EINVAL;
 		}
 	}

+	/* Only swapping a whole PMD-mapped folio is supported */
+	if (folio_test_swapcache(folio) && new_order)
+		return -EINVAL;

 	is_hzp = is_huge_zero_folio(folio);
 	if (is_hzp) {