Message ID | 20230810142942.3169679-1-ryan.roberts@arm.com |
---|---|
Series | variable-order, large folios for anonymous memory |
On 10/08/2023 15:29, Ryan Roberts wrote:
> Hi All,
>
> This is v5 of a series to implement variable order, large folios for anonymous
> memory (currently called "LARGE_ANON_FOLIO", previously called "FLEXIBLE_THP").
> The objective of this is to improve performance by allocating larger chunks of
> memory during anonymous page faults:
>
> 1) Since SW (the kernel) is dealing with larger chunks of memory than base
> pages, there are efficiency savings to be had; fewer page faults, batched PTE
> and RMAP manipulation, reduced lru list, etc. In short, we reduce kernel
> overhead. This should benefit all architectures.
> 2) Since we are now mapping physically contiguous chunks of memory, we can take
> advantage of HW TLB compression techniques. A reduction in TLB pressure
> speeds up kernel and user space. arm64 systems have 2 mechanisms to coalesce
> TLB entries; "the contiguous bit" (architectural) and HPA (uarch).
>
> This patch set deals with the SW side of things (1). (2) is being tackled in a
> separate series. The new behaviour is hidden behind a new Kconfig switch,
> LARGE_ANON_FOLIO, which is disabled by default, although the eventual aim is to
> enable it by default.
>
> My hope is that we are pretty much there with the changes at this point;
> hopefully this is sufficient to get an initial version merged so that we can
> scale up characterization efforts, although they should not be merged until the
> prerequisites are complete. These are in progress and tracked at [5].
>
> This series is based on mm-unstable (ad3232df3e41).
>
> I'm going to be out on holiday from the end of today, returning on 29th
> August. So responses will likely be patchy, as I'm terrified of posting
> to list from my phone!
>
>
> Testing
> -------
>
> This version adds patches to mm selftests so that the cow tests explicitly test
> large anon folios, in the same way that thp is tested. When enabled you should
> see something similar at the start of the test suite:
>
> # [INFO] detected large anon folio size: 32 KiB
>
> Then the following results are expected. The fails and skips are due to existing
> issues in mm-unstable:
>
> # Totals: pass:207 fail:16 xfail:0 xpass:0 skip:85 error:0

Oops, the above are the results when running with SWAP disabled. This is what you would normally see when SWAP is enabled:

# Totals: pass:291 fail:16 xfail:0 xpass:0 skip:1 error:0

>
> Existing mm selftests reveal 1 regression in khugepaged tests when
> LARGE_ANON_FOLIO is enabled:
>
> Run test: collapse_max_ptes_none (khugepaged:anon)
> Maybe collapse with max_ptes_none exceeded.... Fail
> Unexpected huge page
>
> I believe this is because khugepaged currently skips non-order-0 pages when
> looking for collapse opportunities and should get fixed with the help of
> DavidH's work to create a mechanism to precisely determine shared vs exclusive
> pages.
>
>
> Changes since v4 [4]
> --------------------
>
> - Removed "arm64: mm: Override arch_wants_pte_order()" patch; arm64
>   now uses the default order-3 size. I have moved this patch over to
>   the contpte series.
> - Added "mm: Allow deferred splitting of arbitrary large anon folios" back
>   into series. I originally removed this at v2 to add to a separate series,
>   but that series has transformed significantly and it no longer fits, so
>   bringing it back here.
> - Reintroduced dependency on set_ptes(); originally dropped this at v2, but
>   set_ptes() is in mm-unstable now.
> - Updated policy for when to allocate LAF; only fall back to order-0 if
>   MADV_NOHUGEPAGE is present or if THP is disabled via prctl; no longer rely on
>   sysfs's never/madvise/always knob.
> - Fall back to order-0 whenever uffd is armed for the vma, not just when
>   uffd-wp is set on the pte.
> - alloc_anon_folio() now returns `struct folio *`, where errors are encoded
>   with ERR_PTR().
>
> The last 3 changes were proposed by Yu Zhao - thanks!
>
>
> Changes since v3 [3]
> --------------------
>
> - Renamed feature from FLEXIBLE_THP to LARGE_ANON_FOLIO.
> - Removed `flexthp_unhinted_max` boot parameter. Discussion concluded that a
>   sysctl is preferable but we will wait until a real workload needs it.
> - Fixed uninitialized `addr` on read fault path in do_anonymous_page().
> - Added mm selftests for large anon folios in cow test suite.
>
>
> Changes since v2 [2]
> --------------------
>
> - Dropped commit "Allow deferred splitting of arbitrary large anon folios"
>   - Huang, Ying suggested the "batch zap" work (which I dropped from this
>     series after v1) is a prerequisite for merging FLEXIBLE_THP, so I've
>     moved the deferred split patch to a separate series along with the batch
>     zap changes. I plan to submit this series early next week.
> - Changed folio order fallback policy
>   - We no longer iterate from preferred to 0 looking for acceptable policy
>   - Instead we iterate through preferred, PAGE_ALLOC_COSTLY_ORDER and 0 only
> - Removed vma parameter from arch_wants_pte_order()
> - Added command line parameter `flexthp_unhinted_max`
>   - clamps preferred order when vma hasn't explicitly opted-in to THP
> - Never allocate large folio for MADV_NOHUGEPAGE vma (or when THP is disabled
>   for process or system).
> - Simplified implementation and integration with do_anonymous_page()
> - Removed dependency on set_ptes()
>
>
> Changes since v1 [1]
> --------------------
>
> - removed changes to arch-dependent vma_alloc_zeroed_movable_folio()
>   - replaced with arch-independent alloc_anon_folio()
>   - follows THP allocation approach
>   - no longer retry with intermediate orders if allocation fails
>   - fallback directly to order-0
> - remove folio_add_new_anon_rmap_range() patch
>   - instead add its new functionality to folio_add_new_anon_rmap()
> - remove batch-zap pte mappings optimization patch
>   - remove enabler folio_remove_rmap_range() patch too
>   - These offer real perf improvement so will submit separately
> - simplify Kconfig
>   - single FLEXIBLE_THP option, which is independent of arch
>   - depends on TRANSPARENT_HUGEPAGE
>   - when enabled default to max anon folio size of 64K unless arch
>     explicitly overrides
> - simplify changes to do_anonymous_page():
>   - no more retry loop
>
>
> [1] https://lore.kernel.org/linux-mm/20230626171430.3167004-1-ryan.roberts@arm.com/
> [2] https://lore.kernel.org/linux-mm/20230703135330.1865927-1-ryan.roberts@arm.com/
> [3] https://lore.kernel.org/linux-mm/20230714160407.4142030-1-ryan.roberts@arm.com/
> [4] https://lore.kernel.org/linux-mm/20230726095146.2826796-1-ryan.roberts@arm.com/
> [5] https://lore.kernel.org/linux-mm/f8d47176-03a8-99bf-a813-b5942830fd73@arm.com/
>
>
> Thanks,
> Ryan
>
> Ryan Roberts (5):
>   mm: Allow deferred splitting of arbitrary large anon folios
>   mm: Non-pmd-mappable, large folios for folio_add_new_anon_rmap()
>   mm: LARGE_ANON_FOLIO for improved performance
>   selftests/mm/cow: Generalize do_run_with_thp() helper
>   selftests/mm/cow: Add large anon folio tests
>
>  include/linux/pgtable.h          |  13 ++
>  mm/Kconfig                       |  10 ++
>  mm/memory.c                      | 144 +++++++++++++++++--
>  mm/rmap.c                        |  31 +++--
>  tools/testing/selftests/mm/cow.c | 229 ++++++++++++++++++++++---------
>  5 files changed, 347 insertions(+), 80 deletions(-)
>
> --
> 2.25.1
>
> On Aug 10, 2023, at 23:29, Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> Hi All,
> [...]
> This series is based on mm-unstable (ad3232df3e41).
> [...]

I know Ryan is away currently, but as I can’t find the base commit mentioned in the cover letter to be based off of can anybody point me to it so I can use b4 for applying the series and test?

Thanks,
Itaru.
On 8/16/2023 4:11 PM, Itaru Kitayama wrote:
>
>
>> On Aug 10, 2023, at 23:29, Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> Hi All,
>> [...]
>> This series is based on mm-unstable (ad3232df3e41).
>> [...]
>
> I know Ryan is away currently, but as I can’t find the base commit mentioned in the cover letter to be based off of can anybody point me to it so I can use b4 for applying the series and test?
>
Ryan mentioned: This series is based on mm-unstable (ad3232df3e41).

I believe you can apply the patchset to latest mm-unstable.

Regards
Yin, Fengwei

> Thanks,
> Itaru.
> On Aug 16, 2023, at 18:25, Yin, Fengwei <fengwei.yin@intel.com> wrote:
>
>> On 8/16/2023 4:11 PM, Itaru Kitayama wrote:
>>
>>
>>>> On Aug 10, 2023, at 23:29, Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>
>>> Hi All,
>>> [...]
>>> This series is based on mm-unstable (ad3232df3e41).
>>> [...]
>>
>> I know Ryan is away currently, but as I can’t find the base commit mentioned in the cover letter to be based off of can anybody point me to it so I can use b4 for applying the series and test?
>>
> Ryan mentioned: This series is based on mm-unstable (ad3232df3e41).

Couldn’t find the commit in the mm-unstable branch I checked out today. I’m trying to use Andrew’s mm tree for the first time in a decade so I’m doing something wrong though.

>
> I believe you can apply the patchset to latest mm-unstable.

Okay. Will try that.

Thanks,
Itaru.

>
>
> Regards
> Yin, Fengwei
>
>> Thanks,
>> Itaru.