[1/5] hugetlb: Make folio_test_hugetlb safer to call

Message ID 20240301214712.2853147-2-willy@infradead.org
State New
Series Remove some races around folio_test_hugetlb

Commit Message

Matthew Wilcox March 1, 2024, 9:47 p.m. UTC
At least two places (memory failure and page migration) need to call
folio_test_hugetlb() without a reference on the folio.  This can currently
result in false positives (returning true when the folio doesn't belong
to hugetlb) and more commonly in VM_BUG_ON() when a folio is split.

The new way to distinguish a hugetlb folio is to see if (1) the page
is compound (or the folio is large) and (2) page[1].mapping is set to
the address of hugetlb_lock.  If the folio is (or has been) large then
page[1] is guaranteed to exist.  If the folio is split between the two
tests, page[1].mapping will be set to something which definitely isn't
the address of hugetlb_lock.

Because we shift around the layout of struct folio a bit, we now use
page[1].private, which means we need to adjust __split_huge_page_tail()
a little.  We also need to annoy the vmcore_info people again.  Sorry.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/mm_types.h   |  4 +++-
 include/linux/page-flags.h | 25 ++++---------------------
 kernel/vmcore_info.c       |  3 ++-
 mm/huge_memory.c           | 10 ++--------
 mm/hugetlb.c               | 24 ++++++++++++++++++++++++
 5 files changed, 35 insertions(+), 31 deletions(-)
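
A hedged restatement of the new test (the real definition is in the
mm/hugetlb.c hunk below), with the two guarantees from the commit
message spelled out as comments:

bool folio_test_hugetlb(const struct folio *folio)
{
	/*
	 * (1) Only large folios are tested further.  A folio that is
	 *     (or has been) large is guaranteed to have a page[1], so
	 *     reading large_id is safe even with no reference held.
	 */
	if (!folio_test_large(folio))
		return false;

	/*
	 * (2) large_id aliases page[1].mapping.  If the folio is split
	 *     between the two tests, page[1].mapping is rewritten to the
	 *     tail page's own mapping, which is never the address of
	 *     hugetlb_lock, so the test cannot return a stale "true".
	 */
	return folio->large_id == &hugetlb_lock;
}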

Comments

Oscar Salvador March 5, 2024, 6:43 a.m. UTC | #1
On Fri, Mar 01, 2024 at 09:47:06PM +0000, Matthew Wilcox (Oracle) wrote:
> At least two places (memory failure and page migration) need to call
> folio_test_hugetlb() without a reference on the folio.  This can currently
> result in false positives (returning true when the folio doesn't belong
> to hugetlb) and more commonly in VM_BUG_ON() when a folio is split.
> 
> The new way to distinguish a hugetlb folio is to see if (1) the page
> is compound (or the folio is large) and (2) page[1].mapping is set to
> the address of hugetlb_lock.  If the folio is (or has been) large then
> page[1] is guaranteed to exist.  If the folio is split between the two
> tests, page[1].mapping will be set to something which definitely isn't
> the address of hugetlb_lock.
> 
> Because we shift around the layout of struct folio a bit, we now use
> page[1].private, which means we need to adjust __split_huge_page_tail()
> a little.  We also need to annoy the vmcore_info people again.  Sorry.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Thanks for working on this, Willy!

I thought about whether we want a Fixes tag here, but since we only
trigger the VM_BUG_ON under CONFIG_DEBUG_VM, and the false
positives/negatives can be rechecked under the lock, I guess it does
not matter:


Reviewed-by: Oscar Salvador <osalvador@suse.de>
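
That recheck pattern, sketched as a hypothetical caller (illustrative
only, not code from the series; memory failure and page migration have
their own variants):

static bool maybe_handle_hugetlb(struct folio *folio)
{
	if (!folio_test_hugetlb(folio))		/* lockless, may race */
		return false;

	spin_lock(&hugetlb_lock);
	if (!folio_test_hugetlb(folio)) {	/* lost the race */
		spin_unlock(&hugetlb_lock);
		return false;
	}
	/* ... hugetlb-specific handling; folio identity now stable ... */
	spin_unlock(&hugetlb_lock);
	return true;
}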
David Hildenbrand March 5, 2024, 8:39 a.m. UTC | #2
On 01.03.24 22:47, Matthew Wilcox (Oracle) wrote:
> At least two places (memory failure and page migration) need to call
> folio_test_hugetlb() without a reference on the folio.  This can currently
> result in false positives (returning true when the folio doesn't belong
> to hugetlb) and more commonly in VM_BUG_ON() when a folio is split.
> 
> The new way to distinguish a hugetlb folio is to see if (1) the page
> is compound (or the folio is large) and (2) page[1].mapping is set to
> the address of hugetlb_lock.  If the folio is (or has been) large then
> page[1] is guaranteed to exist.  If the folio is split between the two
> tests, page[1].mapping will be set to something which definitely isn't
> the address of hugetlb_lock.
> 
> Because we shift around the layout of struct folio a bit, we now use
> page[1].private, which means we need to adjust __split_huge_page_tail()
> a little.  We also need to annoy the vmcore_info people again.  Sorry.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   include/linux/mm_types.h   |  4 +++-
>   include/linux/page-flags.h | 25 ++++---------------------
>   kernel/vmcore_info.c       |  3 ++-
>   mm/huge_memory.c           | 10 ++--------
>   mm/hugetlb.c               | 24 ++++++++++++++++++++++++
>   5 files changed, 35 insertions(+), 31 deletions(-)
> 
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index a7223ba3ea1e..fd80bf8b5d8a 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -289,6 +289,7 @@ typedef struct {
>    * @virtual: Virtual address in the kernel direct map.
>    * @_last_cpupid: IDs of last CPU and last process that accessed the folio.
>    * @_entire_mapcount: Do not use directly, call folio_entire_mapcount().
> + * @large_id: May identify the type of a large folio.
>    * @_nr_pages_mapped: Do not use directly, call folio_mapcount().
>    * @_pincount: Do not use directly, call folio_maybe_dma_pinned().
>    * @_folio_nr_pages: Do not use directly, call folio_nr_pages().
> @@ -348,9 +349,9 @@ struct folio {
>   		struct {
>   			unsigned long _flags_1;
>   			unsigned long _head_1;
> -			unsigned long _folio_avail;
>   	/* public: */
>   			atomic_t _entire_mapcount;
> +			void *large_id;
>   			atomic_t _nr_pages_mapped;
>   			atomic_t _pincount;

Hm, I wanted that for the total mapcount. :(

Great that hugetlb makes things once again worse for everybody ... gah.
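
(For anyone puzzling over where large_id lands: a toy user-space model
of the FOLIO_MATCH assertion in the patch.  Field types are simplified
and a 64-bit layout is assumed; the real macro also adds
sizeof(struct page), because struct folio overlays page[0], whereas the
toy structs below model page[1] directly.)

#include <assert.h>
#include <stddef.h>

struct toy_page {
	unsigned long flags;		/* offset 0  */
	void *lru_next;			/* offset 8  */
	void *lru_prev;			/* offset 16 */
	void *mapping;			/* offset 24 */
	unsigned long index;		/* offset 32 */
	unsigned long private;		/* offset 40 */
};

/* The part of struct folio that overlays page[1] after this patch. */
struct toy_folio_page1 {
	unsigned long _flags_1;		/* offset 0  */
	unsigned long _head_1;		/* offset 8  */
	int _entire_mapcount;		/* offset 16 */
	void *large_id;			/* offset 24, after 4 bytes padding */
	int _nr_pages_mapped;		/* offset 32 */
	int _pincount;			/* offset 36 */
};

/* The equivalent of FOLIO_MATCH(mapping, large_id): large_id must sit
 * exactly where page[1].mapping sits, so storing &hugetlb_lock in one
 * is visible through the other. */
static_assert(offsetof(struct toy_folio_page1, large_id) ==
	      offsetof(struct toy_page, mapping),
	      "large_id must alias page[1].mapping");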

Patch

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index a7223ba3ea1e..fd80bf8b5d8a 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -289,6 +289,7 @@  typedef struct {
  * @virtual: Virtual address in the kernel direct map.
  * @_last_cpupid: IDs of last CPU and last process that accessed the folio.
  * @_entire_mapcount: Do not use directly, call folio_entire_mapcount().
+ * @large_id: May identify the type of a large folio.
  * @_nr_pages_mapped: Do not use directly, call folio_mapcount().
  * @_pincount: Do not use directly, call folio_maybe_dma_pinned().
  * @_folio_nr_pages: Do not use directly, call folio_nr_pages().
@@ -348,9 +349,9 @@  struct folio {
 		struct {
 			unsigned long _flags_1;
 			unsigned long _head_1;
-			unsigned long _folio_avail;
 	/* public: */
 			atomic_t _entire_mapcount;
+			void *large_id;
 			atomic_t _nr_pages_mapped;
 			atomic_t _pincount;
 #ifdef CONFIG_64BIT
@@ -407,6 +408,7 @@  FOLIO_MATCH(_last_cpupid, _last_cpupid);
 			offsetof(struct page, pg) + sizeof(struct page))
 FOLIO_MATCH(flags, _flags_1);
 FOLIO_MATCH(compound_head, _head_1);
+FOLIO_MATCH(mapping, large_id);
 #undef FOLIO_MATCH
 #define FOLIO_MATCH(pg, fl)						\
 	static_assert(offsetof(struct folio, fl) ==			\
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 652d77805e99..75bead4a5f09 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -188,10 +188,9 @@  enum pageflags {
 	 * PF_ANY.
 	 */
 
+	PG_large_rmappable = PG_active, /* anon or file-backed */
 	/* At least one page in this folio has the hwpoison flag set */
-	PG_has_hwpoisoned = PG_error,
-	PG_hugetlb = PG_active,
-	PG_large_rmappable = PG_workingset, /* anon or file-backed */
+	PG_has_hwpoisoned = PG_workingset,
 };
 
 #define PAGEFLAGS_MASK		((1UL << NR_PAGEFLAGS) - 1)
@@ -857,23 +856,7 @@  TESTPAGEFLAG_FALSE(LargeRmappable, large_rmappable)
 
 #ifdef CONFIG_HUGETLB_PAGE
 int PageHuge(const struct page *page);
-SETPAGEFLAG(HugeTLB, hugetlb, PF_SECOND)
-CLEARPAGEFLAG(HugeTLB, hugetlb, PF_SECOND)
-
-/**
- * folio_test_hugetlb - Determine if the folio belongs to hugetlbfs
- * @folio: The folio to test.
- *
- * Context: Any context.  Caller should have a reference on the folio to
- * prevent it from being turned into a tail page.
- * Return: True for hugetlbfs folios, false for anon folios or folios
- * belonging to other filesystems.
- */
-static inline bool folio_test_hugetlb(const struct folio *folio)
-{
-	return folio_test_large(folio) &&
-		test_bit(PG_hugetlb, const_folio_flags(folio, 1));
-}
+bool folio_test_hugetlb(const struct folio *folio);
 #else
 TESTPAGEFLAG_FALSE(Huge, hugetlb)
 #endif
@@ -1118,7 +1101,7 @@  static __always_inline void __ClearPageAnonExclusive(struct page *page)
  */
 #define PAGE_FLAGS_SECOND						\
 	(0xffUL /* order */		| 1UL << PG_has_hwpoisoned |	\
-	 1UL << PG_hugetlb		| 1UL << PG_large_rmappable)
+	 1UL << PG_large_rmappable)
 
 #define PAGE_FLAGS_PRIVATE				\
 	(1UL << PG_private | 1UL << PG_private_2)
diff --git a/kernel/vmcore_info.c b/kernel/vmcore_info.c
index f95516cd45bb..453cd44a9c9c 100644
--- a/kernel/vmcore_info.c
+++ b/kernel/vmcore_info.c
@@ -14,6 +14,7 @@ 
 #include <linux/cpuhotplug.h>
 #include <linux/memblock.h>
 #include <linux/kmemleak.h>
+#include <linux/hugetlb.h>
 
 #include <asm/page.h>
 #include <asm/sections.h>
@@ -206,7 +207,7 @@  static int __init crash_save_vmcoreinfo_init(void)
 #define PAGE_BUDDY_MAPCOUNT_VALUE	(~PG_buddy)
 	VMCOREINFO_NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE);
 #ifdef CONFIG_HUGETLB_PAGE
-	VMCOREINFO_NUMBER(PG_hugetlb);
+	VMCOREINFO_NUMBER(&hugetlb_lock);
 #define PAGE_OFFLINE_MAPCOUNT_VALUE	(~PG_offline)
 	VMCOREINFO_NUMBER(PAGE_OFFLINE_MAPCOUNT_VALUE);
 #endif
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a81a09236c16..5731f28cba5f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2840,16 +2840,10 @@  static void __split_huge_page_tail(struct folio *folio, int tail,
 	page_tail->mapping = head->mapping;
 	page_tail->index = head->index + tail;
 
-	/*
-	 * page->private should not be set in tail pages. Fix up and warn once
-	 * if private is unexpectedly set.
-	 */
-	if (unlikely(page_tail->private)) {
-		VM_WARN_ON_ONCE_PAGE(true, page_tail);
-		page_tail->private = 0;
-	}
 	if (folio_test_swapcache(folio))
 		new_folio->swap.val = folio->swap.val + tail;
+	else
+		new_folio->private = NULL;
 
 	/* Page flags must be visible before we make the page non-compound. */
 	smp_wmb();
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index bb17e5c22759..963c25963b5e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -100,6 +100,30 @@  static void hugetlb_unshare_pmds(struct vm_area_struct *vma,
 		unsigned long start, unsigned long end);
 static struct resv_map *vma_resv_map(struct vm_area_struct *vma);
 
+static void folio_set_hugetlb(struct folio *folio)
+{
+	folio->large_id = &hugetlb_lock;
+}
+
+static void folio_clear_hugetlb(struct folio *folio)
+{
+	folio->large_id = NULL;
+}
+
+/**
+ * folio_test_hugetlb - Determine if the folio belongs to hugetlbfs.
+ * @folio: The folio to test.
+ *
+ * Context: Any context.
+ * Return: True for hugetlbfs folios, false for anon folios or folios
+ * belonging to other filesystems.
+ */
+bool folio_test_hugetlb(const struct folio *folio)
+{
+	return folio_test_large(folio) && folio->large_id == &hugetlb_lock;
+}
+EXPORT_SYMBOL_GPL(folio_test_hugetlb);
+
 static inline bool subpool_is_free(struct hugepage_subpool *spool)
 {
 	if (spool->count)