Message ID | 20231202135202.4071-1-jszhang@kernel.org (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [v2] riscv: select ARCH_HAS_FAST_MULTIPLIER | expand |
On Sat, Dec 02, 2023 at 09:52:02PM +0800, Jisheng Zhang wrote: > Currently, riscv linux requires at least IMA, so all platforms have a > multiplier. And I assume the 'mul' efficiency is comparable or better > than a sequence of five or so register-dependent arithmetic > instructions. Select ARCH_HAS_FAST_MULTIPLIER to get slightly nicer > codegen. Refer to commit f9b4192923fa ("[PATCH] bitops: hweight() > speedup") for more details. > > In a simple benchmark test calling hweight64() in a loop, it got: > about 14% performance improvement on JH7110, tested on Milkv Mars. > > about 23% performance improvement on TH1520 and SG2042, tested on > Sipeed LPI4A and SG2042 platform. > > a slight performance drop on CV1800B, tested on milkv duo. Among all > riscv platforms in my hands, this is the only one which sees a slight > performance drop. It means the 'mul' isn't quick enough. However, the > situation exists on x86 too, for example, P4 doesn't have fast > integer multiplies as said in the above commit, x86 also selects > ARCH_HAS_FAST_MULTIPLIER. So let's select ARCH_HAS_FAST_MULTIPLIER > which can benefit almost riscv platforms. > > Samuel also provided some performance numbers: > On Unmatched: 20% speedup for __sw_hweight32 and 30% speedup for > __sw_hweight64. > On D1: 8% speedup for __sw_hweight32 and 8% slowdown for > __sw_hweight64. > > Signed-off-by: Jisheng Zhang <jszhang@kernel.org> > Reviewed-by: Samuel Holland <samuel.holland@sifive.com> > Tested-by: Samuel Holland <samuel.holland@sifive.com> Hi @Palmer, I saw this simple patch is missed in your for-next tree, could you please pick it up? Thanks in advance > --- > > since v1: > - fix typo in commit msg > - add some performance numbers provided by Samuel > - collect Reviewed-by and Tested-by tag > > arch/riscv/Kconfig | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig > index 95a2a06acc6a..e4834fa76417 100644 > --- a/arch/riscv/Kconfig > +++ b/arch/riscv/Kconfig > @@ -23,6 +23,7 @@ config RISCV > select ARCH_HAS_DEBUG_VIRTUAL if MMU > select ARCH_HAS_DEBUG_VM_PGTABLE > select ARCH_HAS_DEBUG_WX > + select ARCH_HAS_FAST_MULTIPLIER > select ARCH_HAS_FORTIFY_SOURCE > select ARCH_HAS_GCOV_PROFILE_ALL > select ARCH_HAS_GIGANTIC_PAGE > -- > 2.42.0 > > > _______________________________________________ > linux-riscv mailing list > linux-riscv@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-riscv
Hi Jisheng, On 02/12/2023 14:52, Jisheng Zhang wrote: > Currently, riscv linux requires at least IMA, so all platforms have a > multiplier. And I assume the 'mul' efficiency is comparable or better > than a sequence of five or so register-dependent arithmetic > instructions. Select ARCH_HAS_FAST_MULTIPLIER to get slightly nicer > codegen. Refer to commit f9b4192923fa ("[PATCH] bitops: hweight() > speedup") for more details. > > In a simple benchmark test calling hweight64() in a loop, it got: > about 14% performance improvement on JH7110, tested on Milkv Mars. > > about 23% performance improvement on TH1520 and SG2042, tested on > Sipeed LPI4A and SG2042 platform. > > a slight performance drop on CV1800B, tested on milkv duo. Among all > riscv platforms in my hands, this is the only one which sees a slight > performance drop. It means the 'mul' isn't quick enough. However, the > situation exists on x86 too, for example, P4 doesn't have fast > integer multiplies as said in the above commit, x86 also selects > ARCH_HAS_FAST_MULTIPLIER. So let's select ARCH_HAS_FAST_MULTIPLIER > which can benefit almost riscv platforms. > > Samuel also provided some performance numbers: > On Unmatched: 20% speedup for __sw_hweight32 and 30% speedup for > __sw_hweight64. > On D1: 8% speedup for __sw_hweight32 and 8% slowdown for > __sw_hweight64. > > Signed-off-by: Jisheng Zhang <jszhang@kernel.org> > Reviewed-by: Samuel Holland <samuel.holland@sifive.com> > Tested-by: Samuel Holland <samuel.holland@sifive.com> > --- > > since v1: > - fix typo in commit msg > - add some performance numbers provided by Samuel > - collect Reviewed-by and Tested-by tag > > arch/riscv/Kconfig | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig > index 95a2a06acc6a..e4834fa76417 100644 > --- a/arch/riscv/Kconfig > +++ b/arch/riscv/Kconfig > @@ -23,6 +23,7 @@ config RISCV > select ARCH_HAS_DEBUG_VIRTUAL if MMU > select ARCH_HAS_DEBUG_VM_PGTABLE > select ARCH_HAS_DEBUG_WX > + select ARCH_HAS_FAST_MULTIPLIER > select ARCH_HAS_FORTIFY_SOURCE > select ARCH_HAS_GCOV_PROFILE_ALL > select ARCH_HAS_GIGANTIC_PAGE You can add: Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Thanks, Alex
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index 95a2a06acc6a..e4834fa76417 100644 --- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -23,6 +23,7 @@ config RISCV select ARCH_HAS_DEBUG_VIRTUAL if MMU select ARCH_HAS_DEBUG_VM_PGTABLE select ARCH_HAS_DEBUG_WX + select ARCH_HAS_FAST_MULTIPLIER select ARCH_HAS_FORTIFY_SOURCE select ARCH_HAS_GCOV_PROFILE_ALL select ARCH_HAS_GIGANTIC_PAGE