Message ID | 20240409192301.907377-3-david@redhat.com (mailing list archive) |
---|---|
State | Handled Elsewhere |
Headers | show |
Series | mm: mapcount for large folios + page_mapcount() cleanups | expand |
On 4/10/2024 3:22 AM, David Hildenbrand wrote: > As we grow the code, the compiler might make stupid decisions and > unnecessarily degrade fork() performance. Let's make sure to always inline > functions that operate on a single PTE so the compiler will always > optimize out the loop and avoid a function call. > > This is a preparation for maintining a total mapcount for large folios. > > Signed-off-by: David Hildenbrand<david@redhat.com> The patch looks good to me. Just curious: Is this change driven by code reviewing or performance data profiling? Thanks. Regards Yin, Fengwei
On 19.04.24 04:25, Yin, Fengwei wrote: > > > On 4/10/2024 3:22 AM, David Hildenbrand wrote: >> As we grow the code, the compiler might make stupid decisions and >> unnecessarily degrade fork() performance. Let's make sure to always inline >> functions that operate on a single PTE so the compiler will always >> optimize out the loop and avoid a function call. >> >> This is a preparation for maintining a total mapcount for large folios. >> >> Signed-off-by: David Hildenbrand<david@redhat.com> > The patch looks good to me. Just curious: Is this change driven by code > reviewing or performance data profiling? Thanks. It was identified while observing an performance degradation with small folios in the fork() microbenchmark discussed in the cover letter (mentioned here as "unnecessarily degrade fork() performance"). The added atomic_add() was sufficient for the compiler not inline and optimize-out nr_pages, inserting a function call to a function where nr_pages is not optimized out.
On 4/10/2024 3:22 AM, David Hildenbrand wrote: > As we grow the code, the compiler might make stupid decisions and > unnecessarily degrade fork() performance. Let's make sure to always inline > functions that operate on a single PTE so the compiler will always > optimize out the loop and avoid a function call. > > This is a preparation for maintining a total mapcount for large folios. > > Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 9bf9324214fc..9549d78928bb 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -347,8 +347,12 @@ static inline void folio_dup_file_rmap_ptes(struct folio *folio, { __folio_dup_file_rmap(folio, page, nr_pages, RMAP_LEVEL_PTE); } -#define folio_dup_file_rmap_pte(folio, page) \ - folio_dup_file_rmap_ptes(folio, page, 1) + +static __always_inline void folio_dup_file_rmap_pte(struct folio *folio, + struct page *page) +{ + __folio_dup_file_rmap(folio, page, 1, RMAP_LEVEL_PTE); +} /** * folio_dup_file_rmap_pmd - duplicate a PMD mapping of a page range of a folio @@ -448,8 +452,13 @@ static inline int folio_try_dup_anon_rmap_ptes(struct folio *folio, return __folio_try_dup_anon_rmap(folio, page, nr_pages, src_vma, RMAP_LEVEL_PTE); } -#define folio_try_dup_anon_rmap_pte(folio, page, vma) \ - folio_try_dup_anon_rmap_ptes(folio, page, 1, vma) + +static __always_inline int folio_try_dup_anon_rmap_pte(struct folio *folio, + struct page *page, struct vm_area_struct *src_vma) +{ + return __folio_try_dup_anon_rmap(folio, page, 1, src_vma, + RMAP_LEVEL_PTE); +} /** * folio_try_dup_anon_rmap_pmd - try duplicating a PMD mapping of a page range
As we grow the code, the compiler might make stupid decisions and unnecessarily degrade fork() performance. Let's make sure to always inline functions that operate on a single PTE so the compiler will always optimize out the loop and avoid a function call. This is a preparation for maintining a total mapcount for large folios. Signed-off-by: David Hildenbrand <david@redhat.com> --- include/linux/rmap.h | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-)