diff mbox series

mm: Introduce free_folio_and_swap_cache() to replace free_page_and_swap_cache()

Message ID 20250410180254.164118-1-nifan.cxl@gmail.com (mailing list archive)
State New
Headers show
Series mm: Introduce free_folio_and_swap_cache() to replace free_page_and_swap_cache() | expand

Commit Message

Fan Ni April 10, 2025, 6 p.m. UTC
From: Fan Ni <fan.ni@samsung.com>

The function free_page_and_swap_cache() takes a struct page pointer as
input parameter, but it will immediately convert it to folio and all
operations following within use folio instead of page.  It makes more
sense to pass in folio directly.

Introduce free_folio_and_swap_cache(), which takes folio as input to
replace free_page_and_swap_cache().  And apply it to all occurrences
where free_page_and_swap_cache() was used.

Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 arch/s390/include/asm/tlb.h |  4 ++--
 include/linux/swap.h        | 10 +++++++---
 mm/huge_memory.c            |  2 +-
 mm/khugepaged.c             |  2 +-
 mm/swap_state.c             |  8 +++-----
 5 files changed, 14 insertions(+), 12 deletions(-)

Comments

Matthew Wilcox April 10, 2025, 6:11 p.m. UTC | #1
On Thu, Apr 10, 2025 at 11:00:31AM -0700, nifan.cxl@gmail.com wrote:
> @@ -522,8 +522,12 @@ static inline void put_swap_device(struct swap_info_struct *si)
>  	do { (val)->freeswap = (val)->totalswap = 0; } while (0)
>  /* only sparc can not include linux/pagemap.h in this file
>   * so leave put_page and release_pages undeclared... */
> -#define free_page_and_swap_cache(page) \
> -	put_page(page)
> +#define free_folio_and_swap_cache(folio)	\
> +	do {					\
> +		if (!folio_test_slab(folio))	\
> +			folio_put(folio);	\
> +	} while (0)

We don't need to test for slab.  Unlike put_page(), we know that slab
cannot be passed this way.
Zi Yan April 10, 2025, 6:16 p.m. UTC | #2
On 10 Apr 2025, at 14:00, nifan.cxl@gmail.com wrote:

> From: Fan Ni <fan.ni@samsung.com>
>
> The function free_page_and_swap_cache() takes a struct page pointer as
> input parameter, but it will immediately convert it to folio and all
> operations following within use folio instead of page.  It makes more
> sense to pass in folio directly.
>
> Introduce free_folio_and_swap_cache(), which takes folio as input to
> replace free_page_and_swap_cache().  And apply it to all occurrences
> where free_page_and_swap_cache() was used.
>
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
> ---
>  arch/s390/include/asm/tlb.h |  4 ++--
>  include/linux/swap.h        | 10 +++++++---
>  mm/huge_memory.c            |  2 +-
>  mm/khugepaged.c             |  2 +-
>  mm/swap_state.c             |  8 +++-----
>  5 files changed, 14 insertions(+), 12 deletions(-)
>
> diff --git a/arch/s390/include/asm/tlb.h b/arch/s390/include/asm/tlb.h
> index f20601995bb0..e5103e8e697d 100644
> --- a/arch/s390/include/asm/tlb.h
> +++ b/arch/s390/include/asm/tlb.h
> @@ -40,7 +40,7 @@ static inline bool __tlb_remove_folio_pages(struct mmu_gather *tlb,
>  /*
>   * Release the page cache reference for a pte removed by
>   * tlb_ptep_clear_flush. In both flush modes the tlb for a page cache page
> - * has already been freed, so just do free_page_and_swap_cache.
> + * has already been freed, so just do free_folio_and_swap_cache.
>   *
>   * s390 doesn't delay rmap removal.
>   */
> @@ -49,7 +49,7 @@ static inline bool __tlb_remove_page_size(struct mmu_gather *tlb,
>  {
>  	VM_WARN_ON_ONCE(delay_rmap);
>
> -	free_page_and_swap_cache(page);
> +	free_folio_and_swap_cache(page_folio(page));
>  	return false;
>  }

__tlb_remove_page_size() is ruining the fun of the conversion. But it will be
converted to use folio eventually.

>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index db46b25a65ae..9fc8856eeed9 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -450,7 +450,7 @@ static inline unsigned long total_swapcache_pages(void)
>  }
>
>  void free_swap_cache(struct folio *folio);
> -void free_page_and_swap_cache(struct page *);
> +void free_folio_and_swap_cache(struct folio *folio);
>  void free_pages_and_swap_cache(struct encoded_page **, int);
>  /* linux/mm/swapfile.c */
>  extern atomic_long_t nr_swap_pages;
> @@ -522,8 +522,12 @@ static inline void put_swap_device(struct swap_info_struct *si)
>  	do { (val)->freeswap = (val)->totalswap = 0; } while (0)
>  /* only sparc can not include linux/pagemap.h in this file
>   * so leave put_page and release_pages undeclared... */
> -#define free_page_and_swap_cache(page) \
> -	put_page(page)
> +#define free_folio_and_swap_cache(folio)	\
> +	do {					\
> +		if (!folio_test_slab(folio))	\
> +			folio_put(folio);	\
> +	} while (0)
> +

Like Matthew pointed out in another email, the test is not needed.

Otherwise, it looks good to me. Thanks.

Reviewed-by: Zi Yan <ziy@nvidia.com>

Best Regards,
Yan, Zi
Matthew Wilcox April 10, 2025, 6:25 p.m. UTC | #3
On Thu, Apr 10, 2025 at 02:16:09PM -0400, Zi Yan wrote:
> > @@ -49,7 +49,7 @@ static inline bool __tlb_remove_page_size(struct mmu_gather *tlb,
> >  {
> >  	VM_WARN_ON_ONCE(delay_rmap);
> >
> > -	free_page_and_swap_cache(page);
> > +	free_folio_and_swap_cache(page_folio(page));
> >  	return false;
> >  }
> 
> __tlb_remove_page_size() is ruining the fun of the conversion. But it will be
> converted to use folio eventually.

Well, hm, I'm not sure.  I haven't looked into this in detail.
We have a __tlb_remove_folio_pages() which removes N pages but they must
all be within the same folio:

        VM_WARN_ON_ONCE(page_folio(page) != page_folio(page + nr_pages - 1));

but would we be better off just passing in the folio which contains the
page and always flush all pages in the folio?  It'd certainly simplify
the "encoded pages" stuff since we'd no longer need to pass (page,
length) tuples.  But then, what happens if the folio is split between
being added to the batch and the flush actually happening?
David Hildenbrand April 10, 2025, 6:36 p.m. UTC | #4
On 10.04.25 20:25, Matthew Wilcox wrote:
> On Thu, Apr 10, 2025 at 02:16:09PM -0400, Zi Yan wrote:
>>> @@ -49,7 +49,7 @@ static inline bool __tlb_remove_page_size(struct mmu_gather *tlb,
>>>   {
>>>   	VM_WARN_ON_ONCE(delay_rmap);
>>>
>>> -	free_page_and_swap_cache(page);
>>> +	free_folio_and_swap_cache(page_folio(page));
>>>   	return false;
>>>   }
>>
>> __tlb_remove_page_size() is ruining the fun of the conversion. But it will be
>> converted to use folio eventually.
> 
> Well, hm, I'm not sure.  I haven't looked into this in detail.
> We have a __tlb_remove_folio_pages() which removes N pages but they must
> all be within the same folio:
> 
>          VM_WARN_ON_ONCE(page_folio(page) != page_folio(page + nr_pages - 1));
> 
> but would we be better off just passing in the folio which contains the
> page and always flush all pages in the folio? 

The delay_rmap needs the precise pages, so we cannot easily switch to 
folio + nr_refs.

Once the per-page mapcounts are gone for good, we might no longer need 
page+nr_pages but folio+nr_refs would work.
Matthew Wilcox April 10, 2025, 6:51 p.m. UTC | #5
On Thu, Apr 10, 2025 at 08:36:34PM +0200, David Hildenbrand wrote:
> > but would we be better off just passing in the folio which contains the
> > page and always flush all pages in the folio?
> 
> The delay_rmap needs the precise pages, so we cannot easily switch to folio
> + nr_refs.
> 
> Once the per-page mapcounts are gone for good, we might no longer need
> page+nr_pages but folio+nr_refs would work.

Ah, I see.  And we'll always need to support 'nr_pages' because we might
have COWed a page in the middle of a large folio and so there's no rule
we can possibly invent that allows us to infer how many pages of the
folio are mapped.  We'd have to go and actually walk the page table
in the rmap code, and that sounds like a terrible idea.
Zi Yan April 10, 2025, 6:55 p.m. UTC | #6
On 10 Apr 2025, at 14:25, Matthew Wilcox wrote:

> On Thu, Apr 10, 2025 at 02:16:09PM -0400, Zi Yan wrote:
>>> @@ -49,7 +49,7 @@ static inline bool __tlb_remove_page_size(struct mmu_gather *tlb,
>>>  {
>>>  	VM_WARN_ON_ONCE(delay_rmap);
>>>
>>> -	free_page_and_swap_cache(page);
>>> +	free_folio_and_swap_cache(page_folio(page));
>>>  	return false;
>>>  }
>>
>> __tlb_remove_page_size() is ruining the fun of the conversion. But it will be
>> converted to use folio eventually.
>
> Well, hm, I'm not sure.  I haven't looked into this in detail.
> We have a __tlb_remove_folio_pages() which removes N pages but they must
> all be within the same folio:
>
>         VM_WARN_ON_ONCE(page_folio(page) != page_folio(page + nr_pages - 1));
>
> but would we be better off just passing in the folio which contains the
> page and always flush all pages in the folio?  It'd certainly simplify
> the "encoded pages" stuff since we'd no longer need to pass (page,
> length) tuples.  But then, what happens if the folio is split between
> being added to the batch and the flush actually happening?

Apparently I did not read enough context before made the comment.

__tlb_remove_page_size() is used to check if tlb flush is need by
tlb_remove_page_size(), which is used for zap PMDs and PUDs,
whereas __tlb_remove_folio_pages() is used to check tlb flush needs for
zap PTEs, including single page folio and multiple pages in a folio.
On x86, __tlb_remove_page_size() and __tlb_remove_folio_pages()
use the same backend __tlb_remove_folio_pages_size(), but on s390
they are different.

Like you said, if a folio is split between it is added and flushed,
a flush-folio-as-a-whole function would miss part of the original
folio. Unless a pin is added to avoid that, but that sounds stupid.
Probably we will have to live with this per-page flush thing.

Best Regards,
Yan, Zi
Davidlohr Bueso April 10, 2025, 8:31 p.m. UTC | #7
On Thu, 10 Apr 2025, nifan.cxl@gmail.com wrote:

>From: Fan Ni <fan.ni@samsung.com>
>
>The function free_page_and_swap_cache() takes a struct page pointer as
>input parameter, but it will immediately convert it to folio and all
>operations following within use folio instead of page.  It makes more
>sense to pass in folio directly.
>
>Introduce free_folio_and_swap_cache(), which takes folio as input to
>replace free_page_and_swap_cache().  And apply it to all occurrences
>where free_page_and_swap_cache() was used.
>
>Signed-off-by: Fan Ni <fan.ni@samsung.com>

With the already pointed out issues, this looks good.

Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Vishal Moola (Oracle) April 11, 2025, 7:15 p.m. UTC | #8
On Thu, Apr 10, 2025 at 11:00:31AM -0700, nifan.cxl@gmail.com wrote:
> From: Fan Ni <fan.ni@samsung.com>
> 
> The function free_page_and_swap_cache() takes a struct page pointer as
> input parameter, but it will immediately convert it to folio and all
> operations following within use folio instead of page.  It makes more
> sense to pass in folio directly.
> 
> Introduce free_folio_and_swap_cache(), which takes folio as input to
> replace free_page_and_swap_cache().  And apply it to all occurrences
> where free_page_and_swap_cache() was used.
> 
> Signed-off-by: Fan Ni <fan.ni@samsung.com>

Aside from the unnecessary folio_test_slab() others have already
mentioned, LGTM.

Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
diff mbox series

Patch

diff --git a/arch/s390/include/asm/tlb.h b/arch/s390/include/asm/tlb.h
index f20601995bb0..e5103e8e697d 100644
--- a/arch/s390/include/asm/tlb.h
+++ b/arch/s390/include/asm/tlb.h
@@ -40,7 +40,7 @@  static inline bool __tlb_remove_folio_pages(struct mmu_gather *tlb,
 /*
  * Release the page cache reference for a pte removed by
  * tlb_ptep_clear_flush. In both flush modes the tlb for a page cache page
- * has already been freed, so just do free_page_and_swap_cache.
+ * has already been freed, so just do free_folio_and_swap_cache.
  *
  * s390 doesn't delay rmap removal.
  */
@@ -49,7 +49,7 @@  static inline bool __tlb_remove_page_size(struct mmu_gather *tlb,
 {
 	VM_WARN_ON_ONCE(delay_rmap);
 
-	free_page_and_swap_cache(page);
+	free_folio_and_swap_cache(page_folio(page));
 	return false;
 }
 
diff --git a/include/linux/swap.h b/include/linux/swap.h
index db46b25a65ae..9fc8856eeed9 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -450,7 +450,7 @@  static inline unsigned long total_swapcache_pages(void)
 }
 
 void free_swap_cache(struct folio *folio);
-void free_page_and_swap_cache(struct page *);
+void free_folio_and_swap_cache(struct folio *folio);
 void free_pages_and_swap_cache(struct encoded_page **, int);
 /* linux/mm/swapfile.c */
 extern atomic_long_t nr_swap_pages;
@@ -522,8 +522,12 @@  static inline void put_swap_device(struct swap_info_struct *si)
 	do { (val)->freeswap = (val)->totalswap = 0; } while (0)
 /* only sparc can not include linux/pagemap.h in this file
  * so leave put_page and release_pages undeclared... */
-#define free_page_and_swap_cache(page) \
-	put_page(page)
+#define free_folio_and_swap_cache(folio)	\
+	do {					\
+		if (!folio_test_slab(folio))	\
+			folio_put(folio);	\
+	} while (0)
+
 #define free_pages_and_swap_cache(pages, nr) \
 	release_pages((pages), (nr));
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 28c87e0e036f..65a5ddf60ec7 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3640,7 +3640,7 @@  static int __split_unmapped_folio(struct folio *folio, int new_order,
 		 * requires taking the lru_lock so we do the put_page
 		 * of the tail pages after the split is complete.
 		 */
-		free_page_and_swap_cache(&new_folio->page);
+		free_folio_and_swap_cache(new_folio);
 	}
 	return ret;
 }
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b8838ba8207a..5cf204ab6af0 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -746,7 +746,7 @@  static void __collapse_huge_page_copy_succeeded(pte_t *pte,
 			ptep_clear(vma->vm_mm, address, _pte);
 			folio_remove_rmap_pte(src, src_page, vma);
 			spin_unlock(ptl);
-			free_page_and_swap_cache(src_page);
+			free_folio_and_swap_cache(src);
 		}
 	}
 
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 68fd981b514f..ac4e0994931c 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -232,13 +232,11 @@  void free_swap_cache(struct folio *folio)
 }
 
 /*
- * Perform a free_page(), also freeing any swap cache associated with
- * this page if it is the last user of the page.
+ * Freeing a folio and also freeing any swap cache associated with
+ * this folio if it is the last user.
  */
-void free_page_and_swap_cache(struct page *page)
+void free_folio_and_swap_cache(struct folio *folio)
 {
-	struct folio *folio = page_folio(page);
-
 	free_swap_cache(folio);
 	if (!is_huge_zero_folio(folio))
 		folio_put(folio);