Message ID | 20231212-optimize_checksum-v12-1-419a4ba6d666@rivosinc.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | riscv: Add fine-tuned checksum functions | expand |
On Wed, Dec 13, 2023, at 02:18, Charlie Jenkins wrote: > This csum_fold implementation introduced into arch/arc by Vineet Gupta > is better than the default implementation on at least arc, x86, and > riscv. Using GCC trunk and compiling non-inlined version, this > implementation has 41.6667%, 25% fewer instructions on riscv64, x86-64 > respectively with -O3 optimization. Most implmentations override this > default in asm, but this should be more performant than all of those > other implementations except for arm which has barrel shifting and > sparc32 which has a carry flag. > > Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Acked-by: Arnd Bergmann <arnd@arndb.de>
diff --git a/include/asm-generic/checksum.h b/include/asm-generic/checksum.h index 43e18db89c14..ad928cce268b 100644 --- a/include/asm-generic/checksum.h +++ b/include/asm-generic/checksum.h @@ -2,6 +2,8 @@ #ifndef __ASM_GENERIC_CHECKSUM_H #define __ASM_GENERIC_CHECKSUM_H +#include <linux/bitops.h> + /* * computes the checksum of a memory block at buff, length len, * and adds in "sum" (32-bit) @@ -31,9 +33,7 @@ extern __sum16 ip_fast_csum(const void *iph, unsigned int ihl); static inline __sum16 csum_fold(__wsum csum) { u32 sum = (__force u32)csum; - sum = (sum & 0xffff) + (sum >> 16); - sum = (sum & 0xffff) + (sum >> 16); - return (__force __sum16)~sum; + return (__force __sum16)((~sum - ror32(sum, 16)) >> 16); } #endif