diff mbox series

[v1,02/18] mm/rmap: always inline anon/file rmap duplication of a single PTE

Message ID 20240409192301.907377-3-david@redhat.com (mailing list archive)
State Handled Elsewhere
Headers show
Series mm: mapcount for large folios + page_mapcount() cleanups | expand

Commit Message

David Hildenbrand April 9, 2024, 7:22 p.m. UTC
As we grow the code, the compiler might make stupid decisions and
unnecessarily degrade fork() performance. Let's make sure to always inline
functions that operate on a single PTE so the compiler will always
optimize out the loop and avoid a function call.

This is a preparation for maintining a total mapcount for large folios.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/rmap.h | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

Comments

Yin Fengwei April 19, 2024, 2:25 a.m. UTC | #1
On 4/10/2024 3:22 AM, David Hildenbrand wrote:
> As we grow the code, the compiler might make stupid decisions and
> unnecessarily degrade fork() performance. Let's make sure to always inline
> functions that operate on a single PTE so the compiler will always
> optimize out the loop and avoid a function call.
> 
> This is a preparation for maintining a total mapcount for large folios.
> 
> Signed-off-by: David Hildenbrand<david@redhat.com>
The patch looks good to me. Just curious: Is this change driven by code
reviewing or performance data profiling? Thanks.


Regards
Yin, Fengwei
David Hildenbrand April 19, 2024, 9:14 a.m. UTC | #2
On 19.04.24 04:25, Yin, Fengwei wrote:
> 
> 
> On 4/10/2024 3:22 AM, David Hildenbrand wrote:
>> As we grow the code, the compiler might make stupid decisions and
>> unnecessarily degrade fork() performance. Let's make sure to always inline
>> functions that operate on a single PTE so the compiler will always
>> optimize out the loop and avoid a function call.
>>
>> This is a preparation for maintining a total mapcount for large folios.
>>
>> Signed-off-by: David Hildenbrand<david@redhat.com>
> The patch looks good to me. Just curious: Is this change driven by code
> reviewing or performance data profiling? Thanks.

It was identified while observing an performance degradation with small 
folios in the fork() microbenchmark discussed in the cover letter 
(mentioned here as "unnecessarily degrade fork() performance").

The added atomic_add() was sufficient for the compiler not inline and 
optimize-out nr_pages, inserting a function call to a function where 
nr_pages is not optimized out.
Yin Fengwei April 19, 2024, 2:01 p.m. UTC | #3
On 4/10/2024 3:22 AM, David Hildenbrand wrote:
> As we grow the code, the compiler might make stupid decisions and
> unnecessarily degrade fork() performance. Let's make sure to always inline
> functions that operate on a single PTE so the compiler will always
> optimize out the loop and avoid a function call.
> 
> This is a preparation for maintining a total mapcount for large folios.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
diff mbox series

Patch

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 9bf9324214fc..9549d78928bb 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -347,8 +347,12 @@  static inline void folio_dup_file_rmap_ptes(struct folio *folio,
 {
 	__folio_dup_file_rmap(folio, page, nr_pages, RMAP_LEVEL_PTE);
 }
-#define folio_dup_file_rmap_pte(folio, page) \
-	folio_dup_file_rmap_ptes(folio, page, 1)
+
+static __always_inline void folio_dup_file_rmap_pte(struct folio *folio,
+		struct page *page)
+{
+	__folio_dup_file_rmap(folio, page, 1, RMAP_LEVEL_PTE);
+}
 
 /**
  * folio_dup_file_rmap_pmd - duplicate a PMD mapping of a page range of a folio
@@ -448,8 +452,13 @@  static inline int folio_try_dup_anon_rmap_ptes(struct folio *folio,
 	return __folio_try_dup_anon_rmap(folio, page, nr_pages, src_vma,
 					 RMAP_LEVEL_PTE);
 }
-#define folio_try_dup_anon_rmap_pte(folio, page, vma) \
-	folio_try_dup_anon_rmap_ptes(folio, page, 1, vma)
+
+static __always_inline int folio_try_dup_anon_rmap_pte(struct folio *folio,
+		struct page *page, struct vm_area_struct *src_vma)
+{
+	return __folio_try_dup_anon_rmap(folio, page, 1, src_vma,
+					 RMAP_LEVEL_PTE);
+}
 
 /**
  * folio_try_dup_anon_rmap_pmd - try duplicating a PMD mapping of a page range