Message ID | 20230915-optimize_checksum-v6-1-14a6cf61c618@rivosinc.com (mailing list archive) |
---|---|
State | Superseded |
Series | riscv: Add fine-tuned checksum functions |
Context | Check | Description |
---|---|---|
conchuod/cover_letter | success | Series has a cover letter |
conchuod/tree_selection | success | Guessed tree name to be for-next at HEAD 0bb80ecc33a8 |
conchuod/fixes_present | success | Fixes tag not required for -next series |
conchuod/maintainers_pattern | success | MAINTAINERS pattern errors before the patch: 5 and now 5 |
conchuod/verify_signedoff | success | Signed-off-by tag matches author and committer |
conchuod/kdoc | success | Errors and warnings before: 0 this patch: 0 |
conchuod/build_rv64_clang_allmodconfig | fail | Failed to build the tree with this patch. |
conchuod/module_param | success | Was 0 now: 0 |
conchuod/build_rv64_gcc_allmodconfig | fail | Failed to build the tree with this patch. |
conchuod/build_rv32_defconfig | fail | Build failed |
conchuod/dtb_warn_rv64 | success | Errors and warnings before: 25 this patch: 25 |
conchuod/header_inline | success | No static functions without inline keyword in header files |
conchuod/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 10 lines checked |
conchuod/build_rv64_nommu_k210_defconfig | fail | Build failed |
conchuod/verify_fixes | success | No Fixes tag |
conchuod/build_rv64_nommu_virt_defconfig | fail | Build failed |

On Fri, Sep 15, 2023 at 10:01:17AM -0700, Charlie Jenkins wrote:
> This csum_fold implementation introduced into arch/arc by Vineet Gupta
> is better than the default implementation on at least arc, x86, and
> riscv. Using GCC trunk and compiling non-inlined version, this
> implementation has 41.6667%, 25% fewer instructions on riscv64, x86-64
> respectively with -O3 optimization. Most implementations override this
> default in asm, but this should be more performant than all of those
> other implementations except for arm which has barrel shifting and
> sparc32 which has a carry flag.
>
> Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> Reviewed-by: David Laight <david.laight@aculab.com>
> ---
>  include/asm-generic/checksum.h | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/include/asm-generic/checksum.h b/include/asm-generic/checksum.h
> index 43e18db89c14..37f5ec70ac93 100644
> --- a/include/asm-generic/checksum.h
> +++ b/include/asm-generic/checksum.h
> @@ -31,9 +31,7 @@ extern __sum16 ip_fast_csum(const void *iph, unsigned int ihl);
>  static inline __sum16 csum_fold(__wsum csum)
>  {
>  	u32 sum = (__force u32)csum;
> -	sum = (sum & 0xffff) + (sum >> 16);
> -	sum = (sum & 0xffff) + (sum >> 16);
> -	return (__force __sum16)~sum;
> +	return (__force __sum16)((~sum - ror32(sum, 16)) >> 16);

This breaks the build on RISC-V in a way that is repaired by later patches
in the series, so you likely did not notice:

./include/asm-generic/checksum.h:34:35: error: call to undeclared function 'ror32'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
../include/linux/bitops.h:134:21: error: static declaration of 'ror32' follows non-static declaration

Cheers,
Conor.

diff --git a/include/asm-generic/checksum.h b/include/asm-generic/checksum.h
index 43e18db89c14..37f5ec70ac93 100644
--- a/include/asm-generic/checksum.h
+++ b/include/asm-generic/checksum.h
@@ -31,9 +31,7 @@ extern __sum16 ip_fast_csum(const void *iph, unsigned int ihl);
 static inline __sum16 csum_fold(__wsum csum)
 {
 	u32 sum = (__force u32)csum;
-	sum = (sum & 0xffff) + (sum >> 16);
-	sum = (sum & 0xffff) + (sum >> 16);
-	return (__force __sum16)~sum;
+	return (__force __sum16)((~sum - ror32(sum, 16)) >> 16);
 }
 
 #endif
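
For readers who want to convince themselves that the one-line fold computes the same value as the two-step fold it replaces, the following is a minimal userspace sketch, not kernel code. It assumes plain `uint32_t`/`uint16_t` in place of the kernel's `__wsum`/`__sum16`, and it carries its own stand-in for `ror32()`, which in the kernel is provided by `include/linux/bitops.h` (the header named in the build error above). The names `csum_fold_old`, `csum_fold_new`, `count_fold_mismatches`, and `check_edge_cases` are hypothetical helpers introduced only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's ror32(): rotate a 32-bit word right. */
static inline uint32_t ror32(uint32_t word, unsigned int shift)
{
	return (word >> (shift & 31)) | (word << ((-shift) & 31));
}

/* Fold removed by the patch: add the 16-bit halves twice, then complement. */
static inline uint16_t csum_fold_old(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/*
 * Fold added by the patch. Since ~sum - x == ~(sum + x) in 32-bit
 * arithmetic, the upper half of the result is the complement of
 * hi + lo plus the carry out of the lower half (the end-around carry).
 */
static inline uint16_t csum_fold_new(uint32_t sum)
{
	return (uint16_t)((~sum - ror32(sum, 16)) >> 16);
}

/*
 * Count the sampled inputs on which the two folds disagree.
 * stride must be nonzero; stride == 1 checks every 32-bit value.
 */
static unsigned long count_fold_mismatches(uint32_t stride)
{
	unsigned long mismatches = 0;
	uint64_t v;

	for (v = 0; v <= UINT32_MAX; v += stride)
		if (csum_fold_old((uint32_t)v) != csum_fold_new((uint32_t)v))
			mismatches++;
	return mismatches;
}

/* A few hand-checked inputs, including the carry-propagation case. */
static void check_edge_cases(void)
{
	assert(csum_fold_new(0x00000000u) == 0xffff);
	assert(csum_fold_new(0xffffffffu) == 0x0000);
	assert(csum_fold_new(0x0001ffffu) == 0xfffe);
}
```

Calling `check_edge_cases()` and `count_fold_mismatches(1)` from a small `main()` sweeps the whole 32-bit input space; a larger stride gives a quicker spot check. This only speaks to arithmetic equivalence. It says nothing about the build failure reported above, which concerns where `ror32()` is declared relative to its first use in the header.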