Message ID | e2864e9c5d794c79aa7ee7de4abbfc6d@AcuMS.aculab.com
---|---
State | Not Applicable
Delegated to: | Netdev Maintainers
Series | [v2] x86/lib: Remove the special case for odd-aligned buffers in csum-partial_64.c
Context | Check | Description
---|---|---
netdev/tree_selection | success | Not a local patch
From: David Laight
> Sent: 06 January 2022 14:46
>
> There is no need to special case the very unusual odd-aligned buffers.
> They are no worse than 4n+2 aligned buffers.
>
> Signed-off-by: David Laight <david.laight@aculab.com>
> Acked-by: Eric Dumazet
> ---

Ping...

This (and my two other patches for the same file) are improvements to
Eric's rewrite of this code that is going into 5.17.
It would be nice to get these in as well.
They are likely to be measurable (if minor) performance improvements
for common cases.

	David

>
> resend - v1 seems to have got lost :-)
>
> v2: Also delete from32to16()
>     Add acked-by from Eric (he sent one at some point)
>     Fix possible whitespace error in the last hunk.
>
> The penalty for any misaligned access seems to be minimal.
> On an i7-7700 misaligned buffers add 2 or 3 clocks (in 115) to a
> 512 byte checksum.
> That is less than 1 clock for each cache line!
> That is just measuring the main loop with an lfence prior to rdpmc to
> read PERF_COUNT_HW_CPU_CYCLES.
>
>  arch/x86/lib/csum-partial_64.c | 28 ++--------------------------
>  1 file changed, 2 insertions(+), 26 deletions(-)
>
> diff --git a/arch/x86/lib/csum-partial_64.c b/arch/x86/lib/csum-partial_64.c
> index 1f8a8f895173..061b1ed74d6a 100644
> --- a/arch/x86/lib/csum-partial_64.c
> +++ b/arch/x86/lib/csum-partial_64.c
> @@ -11,16 +11,6 @@
>  #include <asm/checksum.h>
>  #include <asm/word-at-a-time.h>
>
> -static inline unsigned short from32to16(unsigned a)
> -{
> -	unsigned short b = a >> 16;
> -	asm("addw %w2,%w0\n\t"
> -	    "adcw $0,%w0\n"
> -	    : "=r" (b)
> -	    : "0" (b), "r" (a));
> -	return b;
> -}
> -
>  /*
>   * Do a checksum on an arbitrary memory area.
>   * Returns a 32bit checksum.
> @@ -30,22 +20,12 @@ static inline unsigned short from32to16(unsigned a)
>   *
>   * Still, with CHECKSUM_COMPLETE this is called to compute
>   * checksums on IPv6 headers (40 bytes) and other small parts.
> - * it's best to have buff aligned on a 64-bit boundary
> + * The penalty for misaligned buff is negligible.
>   */
>  __wsum csum_partial(const void *buff, int len, __wsum sum)
>  {
>  	u64 temp64 = (__force u64)sum;
> -	unsigned odd, result;
> -
> -	odd = 1 & (unsigned long) buff;
> -	if (unlikely(odd)) {
> -		if (unlikely(len == 0))
> -			return sum;
> -		temp64 = ror32((__force u32)sum, 8);
> -		temp64 += (*(unsigned char *)buff << 8);
> -		len--;
> -		buff++;
> -	}
> +	unsigned result;
>
>  	while (unlikely(len >= 64)) {
>  		asm("addq 0*8(%[src]),%[res]\n\t"
> @@ -130,10 +110,6 @@ __wsum csum_partial(const void *buff, int len, __wsum sum)
>  #endif
>  	}
>  	result = add32_with_carry(temp64 >> 32, temp64 & 0xffffffff);
> -	if (unlikely(odd)) {
> -		result = from32to16(result);
> -		result = ((result >> 8) & 0xff) | ((result & 0xff) << 8);
> -	}
>  	return (__force __wsum)result;
>  }
>  EXPORT_SYMBOL(csum_partial);
> --
> 2.17.1

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
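For readers who want to reproduce the measurement described above, here is a minimal sketch of the lfence + rdpmc timing technique. It is not from the patch: it assumes a perf counter has already been programmed for PERF_COUNT_HW_CPU_CYCLES and that user-space rdpmc access is enabled; the helper name `read_cycles` and the counter index `idx` are illustrative.

```c
#include <stdint.h>

/*
 * Illustrative only: read a pre-programmed cycle counter.  The lfence
 * keeps earlier instructions from retiring into the measurement;
 * rdpmc returns the counter selected by ECX in EDX:EAX.  Assumes
 * user-space rdpmc is enabled and 'idx' is the hardware counter index
 * for PERF_COUNT_HW_CPU_CYCLES (e.g. perf_event_mmap_page::index - 1).
 */
static inline uint64_t read_cycles(uint32_t idx)
{
	uint32_t lo, hi;

	asm volatile("lfence\n\trdpmc"
		     : "=a" (lo), "=d" (hi)
		     : "c" (idx));
	return lo | ((uint64_t)hi << 32);
}
```

Timing a 512 byte checksum of an odd-aligned buffer is then a matter of reading the counter before and after the csum_partial() call and subtracting.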
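The final fold in the patched function, `result = add32_with_carry(temp64 >> 32, temp64 & 0xffffffff);`, reduces the 64-bit accumulator to 32 bits with an end-around carry (the kernel helper uses "addl; adcl $0" inline asm). A portable C sketch of the same operation; the name `fold_to_32` is hypothetical:

```c
/*
 * Portable sketch of the add32_with_carry() fold: ones-complement
 * addition of the two 32-bit halves, wrapping any carry out of
 * bit 31 back into bit 0 (the "end-around carry").
 */
static inline unsigned int fold_to_32(unsigned long long temp64)
{
	unsigned int hi = temp64 >> 32;
	unsigned int lo = (unsigned int)temp64;
	unsigned int res = lo + hi;

	res += (res < hi);	/* carry out => add 1 back in */
	return res;
}
```

The end-around carry is also what made the deleted odd-alignment code so cheap: the ones-complement sum of byte-swapped data is the byte swap of the sum (RFC 1071), so the old code only needed a rotate on entry and a swab on exit, and the patch shows even that is unnecessary.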