| Message ID | 20230810103829.10007-1-guohui@uniontech.com (mailing list archive) |
|---|---|
| State | New |
| Series | mm: sparse: shift operation instead of division operation for root index |
On Thu, Aug 10, 2023 at 06:38:29PM +0800, Guo Hui wrote:
> In the function __nr_to_section,
> Use shift operation instead of division operation
> in order to improve the performance of memory management.
> There are no functional changes.
>
> Some performance data is as follows:
> Machine configuration: Hygon 128 cores, 256M memory
>
> Stream single core:
>            with patch    without patch    promote
> Copy       23376.7731    23907.1532       -1.27%
> Scale      12580.2913    11679.7852       +7.71%
> Add        11922.9562    11461.8669       +4.02%
> Triad      12549.2735    11491.9798       +9.20%

How stable are these numbers?  Because this patch makes no sense to me.

#define SECTION_NR_TO_ROOT(sec)	((sec) / SECTIONS_PER_ROOT)

with:

#ifdef CONFIG_SPARSEMEM_EXTREME
#define SECTIONS_PER_ROOT	(PAGE_SIZE / sizeof (struct mem_section))
#else
#define SECTIONS_PER_ROOT	1
#endif

sizeof(struct mem_section) is a constant power-of-two.  So if this
result is real, then GCC isn't able to turn a
divide-by-a-constant-power-of-two into a shift.  That seems _really_
unlikely to me.  And if that is what's going on, then that needs to be
fixed!  Can you examine some before-and-after assembly dumps to see if
that is what's going on?

> Signed-off-by: Guo Hui <guohui@uniontech.com>
> ---
>  include/linux/mmzone.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 5e50b78d58ea..8dde6fb56109 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1818,7 +1818,8 @@ struct mem_section {
>  #define SECTIONS_PER_ROOT	1
>  #endif
>
> -#define SECTION_NR_TO_ROOT(sec)	((sec) / SECTIONS_PER_ROOT)
> +#define SECTION_ROOT_SHIFT	(__builtin_popcount(SECTIONS_PER_ROOT - 1))
> +#define SECTION_NR_TO_ROOT(sec)	((sec) >> SECTION_ROOT_SHIFT)
>  #define NR_SECTION_ROOTS	DIV_ROUND_UP(NR_MEM_SECTIONS, SECTIONS_PER_ROOT)
>  #define SECTION_ROOT_MASK	(SECTIONS_PER_ROOT - 1)
>
> --
> 2.20.1
On 8/10/23 9:14 PM, Matthew Wilcox wrote:
> On Thu, Aug 10, 2023 at 06:38:29PM +0800, Guo Hui wrote:
>> In the function __nr_to_section,
>> Use shift operation instead of division operation
>> in order to improve the performance of memory management.
>> There are no functional changes.
>>
>> Some performance data is as follows:
>> Machine configuration: Hygon 128 cores, 256M memory
>>
>> Stream single core:
>>            with patch    without patch    promote
>> Copy       23376.7731    23907.1532       -1.27%
>> Scale      12580.2913    11679.7852       +7.71%
>> Add        11922.9562    11461.8669       +4.02%
>> Triad      12549.2735    11491.9798       +9.20%
>
> How stable are these numbers?  Because this patch makes no sense to me.

Thank you for your reply.
The improvement is not stable; it varies between 4% and 7%.

> #define SECTION_NR_TO_ROOT(sec)	((sec) / SECTIONS_PER_ROOT)
>
> with:
>
> #ifdef CONFIG_SPARSEMEM_EXTREME
> #define SECTIONS_PER_ROOT	(PAGE_SIZE / sizeof (struct mem_section))
> #else
> #define SECTIONS_PER_ROOT	1
> #endif
>
> sizeof(struct mem_section) is a constant power-of-two.  So if this
> result is real, then GCC isn't able to turn a
> divide-by-a-constant-power-of-two into a shift.  That seems _really_
> unlikely to me.  And if that is what's going on, then that needs to be
> fixed!  Can you examine some before-and-after assembly dumps to see if
> that is what's going on?

Thank you for your guidance.
I analyzed the assembly generated for the use of __nr_to_section in the
function online_mem_sections, as follows:

ffffffff81383580 <online_mem_sections>:
{
	...
	return pfn >> PFN_SECTION_SHIFT;
ffffffff8138359d:	48 89 f8	mov    %rdi,%rax
	unsigned long root = SECTION_NR_TO_ROOT(nr);
ffffffff813835a0:	48 89 f9	mov    %rdi,%rcx
	return pfn >> PFN_SECTION_SHIFT;
ffffffff813835a3:	48 c1 e8 0f	shr    $0xf,%rax
	unsigned long root = SECTION_NR_TO_ROOT(nr);
ffffffff813835a7:	48 c1 e9 16	shr    $0x16,%rcx	<----- A
ffffffff813835ab:	e9 38 ea d5 01	jmpq   ffffffff830e1fe8 <_einittext+0x2a78>

At line A, the compiler has already converted the division into a shift
operation, so my patch has the same effect and is unnecessary. The Stream
improvement above must therefore come from something else, and I will
continue to analyze it.

My intention was to use __builtin_popcount to calculate the number of 1s
in (SECTIONS_PER_ROOT - 1) at compile time, which is the shift amount for
section_nr.

>> Signed-off-by: Guo Hui <guohui@uniontech.com>
>> ---
>>  include/linux/mmzone.h | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index 5e50b78d58ea..8dde6fb56109 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -1818,7 +1818,8 @@ struct mem_section {
>>  #define SECTIONS_PER_ROOT	1
>>  #endif
>>
>> -#define SECTION_NR_TO_ROOT(sec)	((sec) / SECTIONS_PER_ROOT)
>> +#define SECTION_ROOT_SHIFT	(__builtin_popcount(SECTIONS_PER_ROOT - 1))
>> +#define SECTION_NR_TO_ROOT(sec)	((sec) >> SECTION_ROOT_SHIFT)
>>  #define NR_SECTION_ROOTS	DIV_ROUND_UP(NR_MEM_SECTIONS, SECTIONS_PER_ROOT)
>>  #define SECTION_ROOT_MASK	(SECTIONS_PER_ROOT - 1)
>>
>> --
>> 2.20.1
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 5e50b78d58ea..8dde6fb56109 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1818,7 +1818,8 @@ struct mem_section {
 #define SECTIONS_PER_ROOT	1
 #endif

-#define SECTION_NR_TO_ROOT(sec)	((sec) / SECTIONS_PER_ROOT)
+#define SECTION_ROOT_SHIFT	(__builtin_popcount(SECTIONS_PER_ROOT - 1))
+#define SECTION_NR_TO_ROOT(sec)	((sec) >> SECTION_ROOT_SHIFT)
 #define NR_SECTION_ROOTS	DIV_ROUND_UP(NR_MEM_SECTIONS, SECTIONS_PER_ROOT)
 #define SECTION_ROOT_MASK	(SECTIONS_PER_ROOT - 1)
In the function __nr_to_section,
Use shift operation instead of division operation
in order to improve the performance of memory management.
There are no functional changes.

Some performance data is as follows:
Machine configuration: Hygon 128 cores, 256M memory

Stream single core:
           with patch    without patch    promote
Copy       23376.7731    23907.1532       -1.27%
Scale      12580.2913    11679.7852       +7.71%
Add        11922.9562    11461.8669       +4.02%
Triad      12549.2735    11491.9798       +9.20%

Signed-off-by: Guo Hui <guohui@uniontech.com>
---
 include/linux/mmzone.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)