diff mbox series

mm: sparse: shift operation instead of division operation for root index

Message ID 20230810103829.10007-1-guohui@uniontech.com (mailing list archive)
State New
Headers show
Series mm: sparse: shift operation instead of division operation for root index | expand

Commit Message

Guo Hui Aug. 10, 2023, 10:38 a.m. UTC
In the function __nr_to_section,
Use shift operation instead of division operation
in order to improve the performance of memory management.
There are no functional changes.

Some performance data is as follows:
Machine configuration: Hygon 128 cores, 256M memory

Stream single core:
           with patch       without patch       promote
Copy       23376.7731       23907.1532          -1.27%
Scale      12580.2913       11679.7852          +7.71%
Add        11922.9562       11461.8669          +4.02%
Triad      12549.2735       11491.9798          +9.20%

Signed-off-by: Guo Hui <guohui@uniontech.com>
---
 include/linux/mmzone.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Matthew Wilcox (Oracle) Aug. 10, 2023, 1:14 p.m. UTC | #1
On Thu, Aug 10, 2023 at 06:38:29PM +0800, Guo Hui wrote:
> In the function __nr_to_section,
> Use shift operation instead of division operation
> in order to improve the performance of memory management.
> There are no functional changes.
> 
> Some performance data is as follows:
> Machine configuration: Hygon 128 cores, 256M memory
> 
> Stream single core:
>            with patch       without patch       promote
> Copy       23376.7731       23907.1532          -1.27%
> Scale      12580.2913       11679.7852          +7.71%
> Add        11922.9562       11461.8669          +4.02%
> Triad      12549.2735       11491.9798          +9.20%

How stable are these numbers?  Because this patch makes no sense to me.

#define SECTION_NR_TO_ROOT(sec) ((sec) / SECTIONS_PER_ROOT)

with:

#ifdef CONFIG_SPARSEMEM_EXTREME
#define SECTIONS_PER_ROOT       (PAGE_SIZE / sizeof (struct mem_section))
#else
#define SECTIONS_PER_ROOT       1
#endif

sizeof(struct mem_section) is a constant power-of-two.  So if this
result is real, then GCC isn't able to turn a
divide-by-a-constant-power-of-two into a shift.  That seems _really_
unlikely to me.  And if that is what's going on, then that needs to be
fixed!  Can you examine some before-and-after assembly dumps to see if
that is what's going on?

> Signed-off-by: Guo Hui <guohui@uniontech.com>
> ---
>  include/linux/mmzone.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 5e50b78d58ea..8dde6fb56109 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1818,7 +1818,8 @@ struct mem_section {
>  #define SECTIONS_PER_ROOT	1
>  #endif
>  
> -#define SECTION_NR_TO_ROOT(sec)	((sec) / SECTIONS_PER_ROOT)
> +#define SECTION_ROOT_SHIFT        (__builtin_popcount(SECTIONS_PER_ROOT - 1))
> +#define SECTION_NR_TO_ROOT(sec)	((sec) >> SECTION_ROOT_SHIFT)
>  #define NR_SECTION_ROOTS	DIV_ROUND_UP(NR_MEM_SECTIONS, SECTIONS_PER_ROOT)
>  #define SECTION_ROOT_MASK	(SECTIONS_PER_ROOT - 1)
>  
> -- 
> 2.20.1
> 
>
Guo Hui Aug. 12, 2023, 8:07 a.m. UTC | #2
On 8/10/23 9:14 PM, Matthew Wilcox wrote:
> On Thu, Aug 10, 2023 at 06:38:29PM +0800, Guo Hui wrote:
>> In the function __nr_to_section,
>> Use shift operation instead of division operation
>> in order to improve the performance of memory management.
>> There are no functional changes.
>>
>> Some performance data is as follows:
>> Machine configuration: Hygon 128 cores, 256M memory
>>
>> Stream single core:
>>             with patch       without patch       promote
>> Copy       23376.7731       23907.1532          -1.27%
>> Scale      12580.2913       11679.7852          +7.71%
>> Add        11922.9562       11461.8669          +4.02%
>> Triad      12549.2735       11491.9798          +9.20%
> How stable are these numbers?  Because this patch makes no sense to me.

Thank you for your reply.

The increase is not stable, between 4% and 7%.


>
> #define SECTION_NR_TO_ROOT(sec) ((sec) / SECTIONS_PER_ROOT)
>
> with:
>
> #ifdef CONFIG_SPARSEMEM_EXTREME
> #define SECTIONS_PER_ROOT       (PAGE_SIZE / sizeof (struct mem_section))
> #else
> #define SECTIONS_PER_ROOT       1
> #endif
>
> sizeof(struct mem_section) is a constant power-of-two.  So if this
> result is real, then GCC isn't able to turn a
> divide-by-a-constant-power-of-two into a shift.  That seems _really_
> unlikely to me.  And if that is what's going on, then that needs to be
> fixed!  Can you examine some before-and-after assembly dumps to see if
> that is what's going on?


Thank you for your guide.


I have done an assembly code analysis on the use of __nr_to_section in 
the function online_mem_sections,
as follows:


ffffffff81383580 <online_mem_sections>:
{
... ...
         return pfn >> PFN_SECTION_SHIFT;
ffffffff8138359d:       48 89 f8                mov    %rdi,%rax
         unsigned long root = SECTION_NR_TO_ROOT(nr);
ffffffff813835a0:       48 89 f9                mov    %rdi,%rcx
         return pfn >> PFN_SECTION_SHIFT;
ffffffff813835a3:       48 c1 e8 0f             shr    $0xf,%rax
         unsigned long root = SECTION_NR_TO_ROOT(nr);
ffffffff813835a7:       48 c1 e9 16             shr 
$0x16,%rcx               -----------------------------  A
ffffffff813835ab:       e9 38 ea d5 01          jmpq ffffffff830e1fe8 
<_einittext+0x2a78>


In code line A, the compiler can automatically convert the division into 
a shift operation.
My patch has the same effect, so it is unnecessary to use my patch, so 
the improvement of
Stream above may come from other reasons, and I will continue to go 
deeper analyze
this improvement.

My intention is to use the function __builtin_popcount to calculate the 
number
of 1s in (SECTIONS_PER_ROOT - 1) at compile time, which is the offset 
number
of section_nr.


>
>> Signed-off-by: Guo Hui <guohui@uniontech.com>
>> ---
>>   include/linux/mmzone.h | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index 5e50b78d58ea..8dde6fb56109 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -1818,7 +1818,8 @@ struct mem_section {
>>   #define SECTIONS_PER_ROOT	1
>>   #endif
>>   
>> -#define SECTION_NR_TO_ROOT(sec)	((sec) / SECTIONS_PER_ROOT)
>> +#define SECTION_ROOT_SHIFT        (__builtin_popcount(SECTIONS_PER_ROOT - 1))
>> +#define SECTION_NR_TO_ROOT(sec)	((sec) >> SECTION_ROOT_SHIFT)
>>   #define NR_SECTION_ROOTS	DIV_ROUND_UP(NR_MEM_SECTIONS, SECTIONS_PER_ROOT)
>>   #define SECTION_ROOT_MASK	(SECTIONS_PER_ROOT - 1)
>>   
>> -- 
>> 2.20.1
>>
>>
diff mbox series

Patch

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 5e50b78d58ea..8dde6fb56109 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1818,7 +1818,8 @@  struct mem_section {
 #define SECTIONS_PER_ROOT	1
 #endif
 
-#define SECTION_NR_TO_ROOT(sec)	((sec) / SECTIONS_PER_ROOT)
+#define SECTION_ROOT_SHIFT        (__builtin_popcount(SECTIONS_PER_ROOT - 1))
+#define SECTION_NR_TO_ROOT(sec)	((sec) >> SECTION_ROOT_SHIFT)
 #define NR_SECTION_ROOTS	DIV_ROUND_UP(NR_MEM_SECTIONS, SECTIONS_PER_ROOT)
 #define SECTION_ROOT_MASK	(SECTIONS_PER_ROOT - 1)