Message ID | 20240304081348.197341-4-21cnbao@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | mm: support large folios swap-in | expand |
On 04/03/2024 08:13, Barry Song wrote: > From: Chuanhua Han <hanchuanhua@oppo.com> > > should_try_to_free_swap() works with an assumption that swap-in is always done > at normal page granularity, aka, folio_nr_pages = 1. To support large folio > swap-in, this patch removes the assumption. > > Signed-off-by: Chuanhua Han <hanchuanhua@oppo.com> > Co-developed-by: Barry Song <v-songbaohua@oppo.com> > Signed-off-by: Barry Song <v-songbaohua@oppo.com> > Acked-by: Chris Li <chrisl@kernel.org> > --- > mm/memory.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/mm/memory.c b/mm/memory.c > index abd4f33d62c9..e0d34d705e07 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -3837,7 +3837,7 @@ static inline bool should_try_to_free_swap(struct folio *folio, > * reference only in case it's likely that we'll be the exlusive user. > */ > return (fault_flags & FAULT_FLAG_WRITE) && !folio_test_ksm(folio) && > - folio_ref_count(folio) == 2; > + folio_ref_count(folio) == (1 + folio_nr_pages(folio)); I don't think this is correct; one reference has just been added to the folio in do_swap_page(), either by getting from swapcache (swap_cache_get_folio()) or by allocating. If it came from the swapcache, it could be a large folio, because we swapped out a large folio and never removed it from swapcache. But in that case, others may have partially mapped it, so the refcount could legitimately equal the number of pages while still not being exclusively mapped. I'm guessing this logic is trying to estimate when we are likely exclusive so that we remove from swapcache (release ref) and can then reuse rather than CoW the folio? The main CoW path currently CoWs page-by-page even for large folios, and with Barry's recent patch, even the last page gets copied. So not sure what this change is really trying to achieve? > } > > static vm_fault_t pte_marker_clear(struct vm_fault *vmf)
hi, Ryan Roberts 在 2024/3/12 20:34, Ryan Roberts 写道: > On 04/03/2024 08:13, Barry Song wrote: >> From: Chuanhua Han <hanchuanhua@oppo.com> >> >> should_try_to_free_swap() works with an assumption that swap-in is always done >> at normal page granularity, aka, folio_nr_pages = 1. To support large folio >> swap-in, this patch removes the assumption. >> >> Signed-off-by: Chuanhua Han <hanchuanhua@oppo.com> >> Co-developed-by: Barry Song <v-songbaohua@oppo.com> >> Signed-off-by: Barry Song <v-songbaohua@oppo.com> >> Acked-by: Chris Li <chrisl@kernel.org> >> --- >> mm/memory.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/mm/memory.c b/mm/memory.c >> index abd4f33d62c9..e0d34d705e07 100644 >> --- a/mm/memory.c >> +++ b/mm/memory.c >> @@ -3837,7 +3837,7 @@ static inline bool should_try_to_free_swap(struct folio *folio, >> * reference only in case it's likely that we'll be the exlusive user. >> */ >> return (fault_flags & FAULT_FLAG_WRITE) && !folio_test_ksm(folio) && >> - folio_ref_count(folio) == 2; >> + folio_ref_count(folio) == (1 + folio_nr_pages(folio)); > I don't think this is correct; one reference has just been added to the folio in > do_swap_page(), either by getting from swapcache (swap_cache_get_folio()) or by > allocating. If it came from the swapcache, it could be a large folio, because we > swapped out a large folio and never removed it from swapcache. But in that case, > others may have partially mapped it, so the refcount could legitimately equal > the number of pages while still not being exclusively mapped. > > I'm guessing this logic is trying to estimate when we are likely exclusive so > that we remove from swapcache (release ref) and can then reuse rather than CoW > the folio? The main CoW path currently CoWs page-by-page even for large folios, > and with Barry's recent patch, even the last page gets copied. So not sure what > this change is really trying to achieve? > First, if it is a large folio in the swap cache, then its refcont is at least folio_nr_pages(folio) : For example, in add_to_swap_cache path: int add_to_swap_cache(struct folio *folio, swp_entry_t entry, gfp_t gfp, void **shadowp) { struct address_space *address_space = swap_address_space(entry); pgoff_t idx = swp_offset(entry); XA_STATE_ORDER(xas, &address_space->i_pages, idx, folio_order(folio)); unsigned long i, nr = folio_nr_pages(folio); <--- void *old; ... folio_ref_add(folio, nr); <--- folio_set_swapcache(folio); ... } * Then in the do_swap_page path: * if (should_try_to_free_swap(folio, vma, vmf->flags)) folio_free_swap(folio); * * It also indicates that only folio in the swap cache will call folio_free_swap * to delete it from the swap cache, So I feel like this patch is necessary!?
On 13/03/2024 02:21, Chuanhua Han wrote: > hi, Ryan Roberts > > 在 2024/3/12 20:34, Ryan Roberts 写道: >> On 04/03/2024 08:13, Barry Song wrote: >>> From: Chuanhua Han <hanchuanhua@oppo.com> >>> >>> should_try_to_free_swap() works with an assumption that swap-in is always done >>> at normal page granularity, aka, folio_nr_pages = 1. To support large folio >>> swap-in, this patch removes the assumption. >>> >>> Signed-off-by: Chuanhua Han <hanchuanhua@oppo.com> >>> Co-developed-by: Barry Song <v-songbaohua@oppo.com> >>> Signed-off-by: Barry Song <v-songbaohua@oppo.com> >>> Acked-by: Chris Li <chrisl@kernel.org> >>> --- >>> mm/memory.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/mm/memory.c b/mm/memory.c >>> index abd4f33d62c9..e0d34d705e07 100644 >>> --- a/mm/memory.c >>> +++ b/mm/memory.c >>> @@ -3837,7 +3837,7 @@ static inline bool should_try_to_free_swap(struct folio *folio, >>> * reference only in case it's likely that we'll be the exlusive user. >>> */ >>> return (fault_flags & FAULT_FLAG_WRITE) && !folio_test_ksm(folio) && >>> - folio_ref_count(folio) == 2; >>> + folio_ref_count(folio) == (1 + folio_nr_pages(folio)); >> I don't think this is correct; one reference has just been added to the folio in >> do_swap_page(), either by getting from swapcache (swap_cache_get_folio()) or by >> allocating. If it came from the swapcache, it could be a large folio, because we >> swapped out a large folio and never removed it from swapcache. But in that case, >> others may have partially mapped it, so the refcount could legitimately equal >> the number of pages while still not being exclusively mapped. >> >> I'm guessing this logic is trying to estimate when we are likely exclusive so >> that we remove from swapcache (release ref) and can then reuse rather than CoW >> the folio? The main CoW path currently CoWs page-by-page even for large folios, >> and with Barry's recent patch, even the last page gets copied. So not sure what >> this change is really trying to achieve? >> > First, if it is a large folio in the swap cache, then its refcont is at > least folio_nr_pages(folio) : Ahh! Sorry, I had it backwards - was thinking there would be 1 ref for the swap cache, and you were assuming 1 ref per page taken by do_swap_page(). I understand now. On this basis: Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> > > > For example, in add_to_swap_cache path: > > int add_to_swap_cache(struct folio *folio, swp_entry_t entry, > gfp_t gfp, void **shadowp) > { > struct address_space *address_space = swap_address_space(entry); > pgoff_t idx = swp_offset(entry); > XA_STATE_ORDER(xas, &address_space->i_pages, idx, > folio_order(folio)); > unsigned long i, nr = folio_nr_pages(folio); <--- > void *old; > ... > folio_ref_add(folio, nr); <--- > folio_set_swapcache(folio); > ... > } > > > * > > Then in the do_swap_page path: > > * if (should_try_to_free_swap(folio, vma, vmf->flags)) > folio_free_swap(folio); > * > > * It also indicates that only folio in the swap cache will call > folio_free_swap > * to delete it from the swap cache, So I feel like this patch is > necessary!?
Hi Ryan, Ryan Roberts <ryan.roberts@arm.com> 于2024年3月13日周三 17:10写道: > > On 13/03/2024 02:21, Chuanhua Han wrote: > > hi, Ryan Roberts > > > > 在 2024/3/12 20:34, Ryan Roberts 写道: > >> On 04/03/2024 08:13, Barry Song wrote: > >>> From: Chuanhua Han <hanchuanhua@oppo.com> > >>> > >>> should_try_to_free_swap() works with an assumption that swap-in is always done > >>> at normal page granularity, aka, folio_nr_pages = 1. To support large folio > >>> swap-in, this patch removes the assumption. > >>> > >>> Signed-off-by: Chuanhua Han <hanchuanhua@oppo.com> > >>> Co-developed-by: Barry Song <v-songbaohua@oppo.com> > >>> Signed-off-by: Barry Song <v-songbaohua@oppo.com> > >>> Acked-by: Chris Li <chrisl@kernel.org> > >>> --- > >>> mm/memory.c | 2 +- > >>> 1 file changed, 1 insertion(+), 1 deletion(-) > >>> > >>> diff --git a/mm/memory.c b/mm/memory.c > >>> index abd4f33d62c9..e0d34d705e07 100644 > >>> --- a/mm/memory.c > >>> +++ b/mm/memory.c > >>> @@ -3837,7 +3837,7 @@ static inline bool should_try_to_free_swap(struct folio *folio, > >>> * reference only in case it's likely that we'll be the exlusive user. > >>> */ > >>> return (fault_flags & FAULT_FLAG_WRITE) && !folio_test_ksm(folio) && > >>> - folio_ref_count(folio) == 2; > >>> + folio_ref_count(folio) == (1 + folio_nr_pages(folio)); > >> I don't think this is correct; one reference has just been added to the folio in > >> do_swap_page(), either by getting from swapcache (swap_cache_get_folio()) or by > >> allocating. If it came from the swapcache, it could be a large folio, because we > >> swapped out a large folio and never removed it from swapcache. But in that case, > >> others may have partially mapped it, so the refcount could legitimately equal > >> the number of pages while still not being exclusively mapped. > >> > >> I'm guessing this logic is trying to estimate when we are likely exclusive so > >> that we remove from swapcache (release ref) and can then reuse rather than CoW > >> the folio? The main CoW path currently CoWs page-by-page even for large folios, > >> and with Barry's recent patch, even the last page gets copied. So not sure what > >> this change is really trying to achieve? > >> > > First, if it is a large folio in the swap cache, then its refcont is at > > least folio_nr_pages(folio) : > > Ahh! Sorry, I had it backwards - was thinking there would be 1 ref for the swap > cache, and you were assuming 1 ref per page taken by do_swap_page(). I > understand now. On this basis: > > Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Thank you for your review! > > > > > > > For example, in add_to_swap_cache path: > > > > int add_to_swap_cache(struct folio *folio, swp_entry_t entry, > > gfp_t gfp, void **shadowp) > > { > > struct address_space *address_space = swap_address_space(entry); > > pgoff_t idx = swp_offset(entry); > > XA_STATE_ORDER(xas, &address_space->i_pages, idx, > > folio_order(folio)); > > unsigned long i, nr = folio_nr_pages(folio); <--- > > void *old; > > ... > > folio_ref_add(folio, nr); <--- > > folio_set_swapcache(folio); > > ... > > } > > > > > > * > > > > Then in the do_swap_page path: > > > > * if (should_try_to_free_swap(folio, vma, vmf->flags)) > > folio_free_swap(folio); > > * > > > > * It also indicates that only folio in the swap cache will call > > folio_free_swap > > * to delete it from the swap cache, So I feel like this patch is > > necessary!?
diff --git a/mm/memory.c b/mm/memory.c index abd4f33d62c9..e0d34d705e07 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3837,7 +3837,7 @@ static inline bool should_try_to_free_swap(struct folio *folio, * reference only in case it's likely that we'll be the exlusive user. */ return (fault_flags & FAULT_FLAG_WRITE) && !folio_test_ksm(folio) && - folio_ref_count(folio) == 2; + folio_ref_count(folio) == (1 + folio_nr_pages(folio)); } static vm_fault_t pte_marker_clear(struct vm_fault *vmf)