Message ID | 20250410180254.164118-1-nifan.cxl@gmail.com (mailing list archive) |
---|---|
State | New |
Series | mm: Introduce free_folio_and_swap_cache() to replace free_page_and_swap_cache() |
On Thu, Apr 10, 2025 at 11:00:31AM -0700, nifan.cxl@gmail.com wrote:
> @@ -522,8 +522,12 @@ static inline void put_swap_device(struct swap_info_struct *si)
>  	do { (val)->freeswap = (val)->totalswap = 0; } while (0)
>  /* only sparc can not include linux/pagemap.h in this file
>   * so leave put_page and release_pages undeclared... */
> -#define free_page_and_swap_cache(page) \
> -	put_page(page)
> +#define free_folio_and_swap_cache(folio) \
> +	do { \
> +		if (!folio_test_slab(folio)) \
> +			folio_put(folio); \
> +	} while (0)

We don't need to test for slab. Unlike put_page(), we know that slab
cannot be passed this way.
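For illustration, here is one way the stub could look once the slab test is dropped. This is a userspace sketch, not kernel code: `struct folio` and `folio_put()` below are hypothetical stand-ins that only track a reference count, and the exact form the author chooses for the next revision may differ.

```c
#include <assert.h>

/* Hypothetical userspace stand-in for struct folio: only a refcount. */
struct folio {
	int refcount;
};

/*
 * Mock of folio_put(): drop one reference. The real kernel helper also
 * frees the folio when the count reaches zero; that part is not modelled.
 */
static inline void folio_put(struct folio *folio)
{
	assert(folio->refcount > 0);
	folio->refcount--;
}

/* The stub as the review suggests it: no folio_test_slab() check. */
#define free_folio_and_swap_cache(folio) \
	folio_put(folio)
```

With the test gone, the stub collapses back to a single call, mirroring the shape of the put_page() stub it replaces.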
On 10 Apr 2025, at 14:00, nifan.cxl@gmail.com wrote:

> From: Fan Ni <fan.ni@samsung.com>
>
> The function free_page_and_swap_cache() takes a struct page pointer as
> input parameter, but it will immediately convert it to folio and all
> operations following within use folio instead of page. It makes more
> sense to pass in folio directly.
>
> Introduce free_folio_and_swap_cache(), which takes folio as input to
> replace free_page_and_swap_cache(). And apply it to all occurrences
> where free_page_and_swap_cache() was used.
>
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
> ---
>  arch/s390/include/asm/tlb.h |  4 ++--
>  include/linux/swap.h        | 10 +++++++---
>  mm/huge_memory.c            |  2 +-
>  mm/khugepaged.c             |  2 +-
>  mm/swap_state.c             |  8 +++-----
>  5 files changed, 14 insertions(+), 12 deletions(-)
>
> diff --git a/arch/s390/include/asm/tlb.h b/arch/s390/include/asm/tlb.h
> index f20601995bb0..e5103e8e697d 100644
> --- a/arch/s390/include/asm/tlb.h
> +++ b/arch/s390/include/asm/tlb.h
> @@ -40,7 +40,7 @@ static inline bool __tlb_remove_folio_pages(struct mmu_gather *tlb,
>  /*
>   * Release the page cache reference for a pte removed by
>   * tlb_ptep_clear_flush. In both flush modes the tlb for a page cache page
> - * has already been freed, so just do free_page_and_swap_cache.
> + * has already been freed, so just do free_folio_and_swap_cache.
>   *
>   * s390 doesn't delay rmap removal.
>   */
> @@ -49,7 +49,7 @@ static inline bool __tlb_remove_page_size(struct mmu_gather *tlb,
>  {
>  	VM_WARN_ON_ONCE(delay_rmap);
>
> -	free_page_and_swap_cache(page);
> +	free_folio_and_swap_cache(page_folio(page));
>  	return false;
>  }

__tlb_remove_page_size() is ruining the fun of the conversion. But it will be
converted to use folio eventually.

>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index db46b25a65ae..9fc8856eeed9 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -450,7 +450,7 @@ static inline unsigned long total_swapcache_pages(void)
>  }
>
>  void free_swap_cache(struct folio *folio);
> -void free_page_and_swap_cache(struct page *);
> +void free_folio_and_swap_cache(struct folio *folio);
>  void free_pages_and_swap_cache(struct encoded_page **, int);
>  /* linux/mm/swapfile.c */
>  extern atomic_long_t nr_swap_pages;
> @@ -522,8 +522,12 @@ static inline void put_swap_device(struct swap_info_struct *si)
>  	do { (val)->freeswap = (val)->totalswap = 0; } while (0)
>  /* only sparc can not include linux/pagemap.h in this file
>   * so leave put_page and release_pages undeclared... */
> -#define free_page_and_swap_cache(page) \
> -	put_page(page)
> +#define free_folio_and_swap_cache(folio) \
> +	do { \
> +		if (!folio_test_slab(folio)) \
> +			folio_put(folio); \
> +	} while (0)
> +

Like Matthew pointed out in another email, the test is not needed.

Otherwise, it looks good to me. Thanks.

Reviewed-by: Zi Yan <ziy@nvidia.com>

Best Regards,
Yan, Zi
On Thu, Apr 10, 2025 at 02:16:09PM -0400, Zi Yan wrote:
> > @@ -49,7 +49,7 @@ static inline bool __tlb_remove_page_size(struct mmu_gather *tlb,
> >  {
> >  	VM_WARN_ON_ONCE(delay_rmap);
> >
> > -	free_page_and_swap_cache(page);
> > +	free_folio_and_swap_cache(page_folio(page));
> >  	return false;
> >  }
>
> __tlb_remove_page_size() is ruining the fun of the conversion. But it will be
> converted to use folio eventually.

Well, hm, I'm not sure. I haven't looked into this in detail.
We have a __tlb_remove_folio_pages() which removes N pages but they must
all be within the same folio:

	VM_WARN_ON_ONCE(page_folio(page) != page_folio(page + nr_pages - 1));

but would we be better off just passing in the folio which contains the
page and always flush all pages in the folio? It'd certainly simplify
the "encoded pages" stuff since we'd no longer need to pass (page,
length) tuples. But then, what happens if the folio is split between
being added to the batch and the flush actually happening?
On 10.04.25 20:25, Matthew Wilcox wrote:
> On Thu, Apr 10, 2025 at 02:16:09PM -0400, Zi Yan wrote:
>>> @@ -49,7 +49,7 @@ static inline bool __tlb_remove_page_size(struct mmu_gather *tlb,
>>>  {
>>>  	VM_WARN_ON_ONCE(delay_rmap);
>>>
>>> -	free_page_and_swap_cache(page);
>>> +	free_folio_and_swap_cache(page_folio(page));
>>>  	return false;
>>>  }
>>
>> __tlb_remove_page_size() is ruining the fun of the conversion. But it will be
>> converted to use folio eventually.
>
> Well, hm, I'm not sure. I haven't looked into this in detail.
> We have a __tlb_remove_folio_pages() which removes N pages but they must
> all be within the same folio:
>
> 	VM_WARN_ON_ONCE(page_folio(page) != page_folio(page + nr_pages - 1));
>
> but would we be better off just passing in the folio which contains the
> page and always flush all pages in the folio?

The delay_rmap needs the precise pages, so we cannot easily switch to
folio + nr_refs.

Once the per-page mapcounts are gone for good, we might no longer need
page+nr_pages but folio+nr_refs would work.
On Thu, Apr 10, 2025 at 08:36:34PM +0200, David Hildenbrand wrote:
> > but would we be better off just passing in the folio which contains the
> > page and always flush all pages in the folio?
>
> The delay_rmap needs the precise pages, so we cannot easily switch to folio
> + nr_refs.
>
> Once the per-page mapcounts are gone for good, we might no longer need
> page+nr_pages but folio+nr_refs would work.

Ah, I see. And we'll always need to support 'nr_pages' because we might
have COWed a page in the middle of a large folio and so there's no rule
we can possibly invent that allows us to infer how many pages of the
folio are mapped. We'd have to go and actually walk the page table in
the rmap code, and that sounds like a terrible idea.
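The COW point above can be made concrete with a toy model. The sketch below is an assumption-laden illustration, not kernel code: the per-page `mapped[]` array, `struct page_run` and `collect_mapped_runs()` are all hypothetical (the kernel keeps this state in page tables, not in an array). It only shows that once a page in the middle of a large folio is no longer mapped, the still-mapped pages form several (first page, nr_pages) runs that cannot be derived from the folio alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FOLIO_NR_PAGES 8	/* hypothetical order-3 folio */

/* Hypothetical (first page index, length) tuple, like a batch entry. */
struct page_run {
	size_t first;
	size_t nr_pages;
};

/*
 * Collect the maximal runs of pages that are still mapped.
 * Returns the number of runs written to runs[].
 */
static size_t collect_mapped_runs(const bool mapped[FOLIO_NR_PAGES],
				  struct page_run runs[FOLIO_NR_PAGES])
{
	size_t nr_runs = 0;
	size_t i = 0;

	while (i < FOLIO_NR_PAGES) {
		size_t start;

		if (!mapped[i]) {
			i++;
			continue;
		}
		start = i;
		while (i < FOLIO_NR_PAGES && mapped[i])
			i++;
		runs[nr_runs].first = start;
		runs[nr_runs].nr_pages = i - start;
		nr_runs++;
	}
	return nr_runs;
}
```

If page 3 of the eight-page folio has been COWed away, the model yields two runs, pages 0 to 2 and pages 4 to 7, which is why a bare folio pointer would not be enough to describe what is mapped.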
On 10 Apr 2025, at 14:25, Matthew Wilcox wrote:

> On Thu, Apr 10, 2025 at 02:16:09PM -0400, Zi Yan wrote:
>>> @@ -49,7 +49,7 @@ static inline bool __tlb_remove_page_size(struct mmu_gather *tlb,
>>>  {
>>>  	VM_WARN_ON_ONCE(delay_rmap);
>>>
>>> -	free_page_and_swap_cache(page);
>>> +	free_folio_and_swap_cache(page_folio(page));
>>>  	return false;
>>>  }
>>
>> __tlb_remove_page_size() is ruining the fun of the conversion. But it will be
>> converted to use folio eventually.
>
> Well, hm, I'm not sure. I haven't looked into this in detail.
> We have a __tlb_remove_folio_pages() which removes N pages but they must
> all be within the same folio:
>
> 	VM_WARN_ON_ONCE(page_folio(page) != page_folio(page + nr_pages - 1));
>
> but would we be better off just passing in the folio which contains the
> page and always flush all pages in the folio? It'd certainly simplify
> the "encoded pages" stuff since we'd no longer need to pass (page,
> length) tuples. But then, what happens if the folio is split between
> being added to the batch and the flush actually happening?

Apparently I did not read enough context before I made the comment.

__tlb_remove_page_size() is used by tlb_remove_page_size(), which zaps
PMDs and PUDs, to check whether a TLB flush is needed, whereas
__tlb_remove_folio_pages() is used to check whether a TLB flush is
needed when zapping PTEs, including single-page folios and multiple
pages in a folio. On x86, __tlb_remove_page_size() and
__tlb_remove_folio_pages() use the same backend
__tlb_remove_folio_pages_size(), but on s390 they are different.

Like you said, if a folio is split between being added to the batch and
being flushed, a flush-folio-as-a-whole function would miss part of the
original folio. Unless a pin is added to avoid that, but that sounds
stupid. Probably we will have to live with this per-page flush thing.

Best Regards,
Yan, Zi
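The split hazard discussed above can be sketched in a few lines of userspace C. Everything here is a hypothetical model rather than the mmu_gather implementation: `struct folio`, the two batch entry types and `split_folio_model()` are invented for illustration, and a "split" is modelled simply as the folio shrinking to its first few pages.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a folio: a starting pfn and a page count. */
struct folio {
	size_t first_pfn;
	size_t nr_pages;
};

/* Batch entry that records the range itself, like (page, nr_pages). */
struct batch_entry_pages {
	size_t first_pfn;
	size_t nr_pages;
};

/* Batch entry that records only the folio and looks at it at flush time. */
struct batch_entry_folio {
	const struct folio *folio;
};

/* Model a split: the original folio keeps only its first new_nr_pages. */
static void split_folio_model(struct folio *folio, size_t new_nr_pages)
{
	assert(new_nr_pages > 0 && new_nr_pages <= folio->nr_pages);
	folio->nr_pages = new_nr_pages;
}

/* Pages covered when the flush finally runs, for each entry type. */
static size_t pages_covered_by_range(const struct batch_entry_pages *e)
{
	return e->nr_pages;
}

static size_t pages_covered_by_folio(const struct batch_entry_folio *e)
{
	return e->folio->nr_pages;
}
```

In this model, an eight-page folio that is split down to four pages after being batched is still fully covered by the range entry, while the folio entry covers only the four pages that remain in the folio at flush time.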
On Thu, 10 Apr 2025, nifan.cxl@gmail.com wrote:

>From: Fan Ni <fan.ni@samsung.com>
>
>The function free_page_and_swap_cache() takes a struct page pointer as
>input parameter, but it will immediately convert it to folio and all
>operations following within use folio instead of page. It makes more
>sense to pass in folio directly.
>
>Introduce free_folio_and_swap_cache(), which takes folio as input to
>replace free_page_and_swap_cache(). And apply it to all occurrences
>where free_page_and_swap_cache() was used.
>
>Signed-off-by: Fan Ni <fan.ni@samsung.com>

With the already pointed out issues, this looks good.

Acked-by: Davidlohr Bueso <dave@stgolabs.net>
On Thu, Apr 10, 2025 at 11:00:31AM -0700, nifan.cxl@gmail.com wrote:
> From: Fan Ni <fan.ni@samsung.com>
>
> The function free_page_and_swap_cache() takes a struct page pointer as
> input parameter, but it will immediately convert it to folio and all
> operations following within use folio instead of page. It makes more
> sense to pass in folio directly.
>
> Introduce free_folio_and_swap_cache(), which takes folio as input to
> replace free_page_and_swap_cache(). And apply it to all occurrences
> where free_page_and_swap_cache() was used.
>
> Signed-off-by: Fan Ni <fan.ni@samsung.com>

Aside from the unnecessary folio_test_slab() others have already
mentioned, LGTM.

Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
diff --git a/arch/s390/include/asm/tlb.h b/arch/s390/include/asm/tlb.h
index f20601995bb0..e5103e8e697d 100644
--- a/arch/s390/include/asm/tlb.h
+++ b/arch/s390/include/asm/tlb.h
@@ -40,7 +40,7 @@ static inline bool __tlb_remove_folio_pages(struct mmu_gather *tlb,
 /*
  * Release the page cache reference for a pte removed by
  * tlb_ptep_clear_flush. In both flush modes the tlb for a page cache page
- * has already been freed, so just do free_page_and_swap_cache.
+ * has already been freed, so just do free_folio_and_swap_cache.
  *
  * s390 doesn't delay rmap removal.
  */
@@ -49,7 +49,7 @@ static inline bool __tlb_remove_page_size(struct mmu_gather *tlb,
 {
 	VM_WARN_ON_ONCE(delay_rmap);

-	free_page_and_swap_cache(page);
+	free_folio_and_swap_cache(page_folio(page));
 	return false;
 }

diff --git a/include/linux/swap.h b/include/linux/swap.h
index db46b25a65ae..9fc8856eeed9 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -450,7 +450,7 @@ static inline unsigned long total_swapcache_pages(void)
 }

 void free_swap_cache(struct folio *folio);
-void free_page_and_swap_cache(struct page *);
+void free_folio_and_swap_cache(struct folio *folio);
 void free_pages_and_swap_cache(struct encoded_page **, int);
 /* linux/mm/swapfile.c */
 extern atomic_long_t nr_swap_pages;
@@ -522,8 +522,12 @@ static inline void put_swap_device(struct swap_info_struct *si)
 	do { (val)->freeswap = (val)->totalswap = 0; } while (0)
 /* only sparc can not include linux/pagemap.h in this file
  * so leave put_page and release_pages undeclared... */
-#define free_page_and_swap_cache(page) \
-	put_page(page)
+#define free_folio_and_swap_cache(folio) \
+	do { \
+		if (!folio_test_slab(folio)) \
+			folio_put(folio); \
+	} while (0)
+
 #define free_pages_and_swap_cache(pages, nr) \
 	release_pages((pages), (nr));

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 28c87e0e036f..65a5ddf60ec7 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3640,7 +3640,7 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
 			 * requires taking the lru_lock so we do the put_page
 			 * of the tail pages after the split is complete.
 			 */
-			free_page_and_swap_cache(&new_folio->page);
+			free_folio_and_swap_cache(new_folio);
 		}
 	return ret;
 }
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b8838ba8207a..5cf204ab6af0 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -746,7 +746,7 @@ static void __collapse_huge_page_copy_succeeded(pte_t *pte,
 			ptep_clear(vma->vm_mm, address, _pte);
 			folio_remove_rmap_pte(src, src_page, vma);
 			spin_unlock(ptl);
-			free_page_and_swap_cache(src_page);
+			free_folio_and_swap_cache(src);
 		}
 	}

diff --git a/mm/swap_state.c b/mm/swap_state.c
index 68fd981b514f..ac4e0994931c 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -232,13 +232,11 @@ void free_swap_cache(struct folio *folio)
 }

 /*
- * Perform a free_page(), also freeing any swap cache associated with
- * this page if it is the last user of the page.
+ * Freeing a folio and also freeing any swap cache associated with
+ * this folio if it is the last user.
  */
-void free_page_and_swap_cache(struct page *page)
+void free_folio_and_swap_cache(struct folio *folio)
 {
-	struct folio *folio = page_folio(page);
-
 	free_swap_cache(folio);
 	if (!is_huge_zero_folio(folio))
 		folio_put(folio);
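To make the control flow of the converted function easier to follow, here is a small userspace model of it. The body of `free_folio_and_swap_cache()` is copied from the mm/swap_state.c hunk; everything around it is a hypothetical mock: `struct folio` carries only a refcount, a flag and a call counter, and `free_swap_cache()`, `is_huge_zero_folio()` and `folio_put()` are simplified stand-ins for the kernel helpers of the same name.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct folio. */
struct folio {
	int refcount;
	bool is_huge_zero;	/* models the shared huge zero folio */
	int swap_cache_calls;	/* counts mock free_swap_cache() calls */
};

/* Mock: only records that the swap cache release was attempted. */
static void free_swap_cache(struct folio *folio)
{
	folio->swap_cache_calls++;
}

/* Mock: the kernel compares against the global huge zero folio instead. */
static bool is_huge_zero_folio(const struct folio *folio)
{
	return folio->is_huge_zero;
}

/* Mock: drop one reference; actual freeing at zero is not modelled. */
static void folio_put(struct folio *folio)
{
	assert(folio->refcount > 0);
	folio->refcount--;
}

/* Same body as the new function in the mm/swap_state.c hunk. */
static void free_folio_and_swap_cache(struct folio *folio)
{
	free_swap_cache(folio);
	if (!is_huge_zero_folio(folio))
		folio_put(folio);
}
```

In this model an ordinary folio loses one reference, while the huge zero folio keeps its count; both go through the swap cache release first.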