From patchwork Thu Jan 6 14:45:41 2022
X-Patchwork-Submitter: David Laight <david.laight@aculab.com>
X-Patchwork-Id: 12705476
X-Patchwork-Delegate: kuba@kernel.org
From: David Laight <david.laight@aculab.com>
To: 'Eric Dumazet', Peter Zijlstra
CC: 'tglx@linutronix.de', 'mingo@redhat.com', 'Borislav Petkov',
 'dave.hansen@linux.intel.com', 'X86 ML', 'hpa@zytor.com',
 'alexanderduyck@fb.com', 'open list', 'netdev', 'Noah Goldstein'
Subject: [PATCH v2] x86/lib: Remove the special case for odd-aligned buffers
 in csum-partial_64.c
Date: Thu, 6 Jan 2022 14:45:41 +0000
X-Mailing-List: netdev@vger.kernel.org

There is no need to special case the very unusual odd-aligned buffers.
They are no worse than 4n+2 aligned buffers.

Signed-off-by: David Laight <david.laight@aculab.com>
Acked-by: Eric Dumazet
---
resend - v1 seems to have got lost :-)

v2: Also delete from32to16()
    Add acked-by from Eric (he sent one at some point)
    Fix possible whitespace error in the last hunk

The penalty for any misaligned access seems to be minimal.
On an i7-7700 misaligned buffers add 2 or 3 clocks (in 115) to a
512 byte checksum. That is less than 1 clock for each cache line!
That is just measuring the main loop, with an lfence prior to rdpmc
to read PERF_COUNT_HW_CPU_CYCLES.
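For reference, the numbers above were taken with a measurement helper along
the lines of the sketch below. This is not part of the patch: the function
name is made up, and the counter index is an assumption - in practice it has
to be the hw-cycles PMC index obtained via perf_event_open() and the mmapped
perf_event_mmap_page, with user-space rdpmc enabled.

	#include <stdint.h>

	/* Sketch only: serialise with lfence, then read PMC 'counter' (EDX:EAX). */
	static inline uint64_t read_cycles(uint32_t counter)
	{
		uint32_t lo, hi;

		asm volatile("lfence\n\trdpmc"
			     : "=a" (lo), "=d" (hi)
			     : "c" (counter)
			     : "memory");
		return ((uint64_t)hi << 32) | lo;
	}

Take one reading before and one after the code being timed and subtract.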
 arch/x86/lib/csum-partial_64.c | 28 ++--------------------------
 1 file changed, 2 insertions(+), 26 deletions(-)

diff --git a/arch/x86/lib/csum-partial_64.c b/arch/x86/lib/csum-partial_64.c
index 1f8a8f895173..061b1ed74d6a 100644
--- a/arch/x86/lib/csum-partial_64.c
+++ b/arch/x86/lib/csum-partial_64.c
@@ -11,16 +11,6 @@
 #include <asm/checksum.h>
 #include <asm/word-at-a-time.h>
 
-static inline unsigned short from32to16(unsigned a)
-{
-	unsigned short b = a >> 16;
-	asm("addw %w2,%w0\n\t"
-	    "adcw $0,%w0\n"
-	    : "=r" (b)
-	    : "0" (b), "r" (a));
-	return b;
-}
-
 /*
  * Do a checksum on an arbitrary memory area.
  * Returns a 32bit checksum.
@@ -30,22 +20,12 @@ static inline unsigned short from32to16(unsigned a)
  *
  * Still, with CHECKSUM_COMPLETE this is called to compute
  * checksums on IPv6 headers (40 bytes) and other small parts.
- * it's best to have buff aligned on a 64-bit boundary
+ * The penalty for misaligned buff is negligible.
  */
 __wsum csum_partial(const void *buff, int len, __wsum sum)
 {
 	u64 temp64 = (__force u64)sum;
-	unsigned odd, result;
-
-	odd = 1 & (unsigned long) buff;
-	if (unlikely(odd)) {
-		if (unlikely(len == 0))
-			return sum;
-		temp64 = ror32((__force u32)sum, 8);
-		temp64 += (*(unsigned char *)buff << 8);
-		len--;
-		buff++;
-	}
+	unsigned result;
 
 	while (unlikely(len >= 64)) {
 		asm("addq 0*8(%[src]),%[res]\n\t"
@@ -130,10 +110,6 @@ __wsum csum_partial(const void *buff, int len, __wsum sum)
 #endif
 	}
 	result = add32_with_carry(temp64 >> 32, temp64 & 0xffffffff);
-	if (unlikely(odd)) {
-		result = from32to16(result);
-		result = ((result >> 8) & 0xff) | ((result & 0xff) << 8);
-	}
 	return (__force __wsum)result;
 }
 EXPORT_SYMBOL(csum_partial);
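Not part of the patch, but for anyone wondering why dropping the odd-address
fixup is safe for correctness as well as speed: the 16-bit ones-complement
sum can be accumulated word-at-a-time straight from a misaligned pointer and
still matches the byte-pair definition, so no final byte swap of the result
is needed. A minimal user-space sketch (all names made up, little-endian
host assumed):

	#include <stdint.h>
	#include <stdio.h>
	#include <string.h>

	/* Reference: RFC 1071 byte-pair sum, little-endian word order. */
	static uint16_t csum_bytes(const uint8_t *p, int len)
	{
		uint32_t sum = 0;
		int i;

		for (i = 0; i + 1 < len; i += 2)
			sum += p[i] | (p[i + 1] << 8);
		if (len & 1)
			sum += p[len - 1];
		while (sum >> 16)
			sum = (sum & 0xffff) + (sum >> 16);
		return sum;
	}

	/* 64-bit word sum with end-around carry, like the adcq loop above. */
	static uint16_t csum_words(const uint8_t *p, int len)	/* len % 8 == 0 */
	{
		uint64_t sum = 0, w;
		int i;

		for (i = 0; i < len; i += 8) {
			memcpy(&w, p + i, 8);	/* misaligned load is fine */
			sum += w;
			if (sum < w)		/* emulate the carry flag */
				sum++;
		}
		sum = (sum & 0xffffffff) + (sum >> 32);
		sum = (sum & 0xffffffff) + (sum >> 32);
		sum = (sum & 0xffff) + (sum >> 16);
		sum = (sum & 0xffff) + (sum >> 16);
		return sum;
	}

	int main(void)
	{
		static uint8_t buf[65];
		int i;

		for (i = 1; i <= 64; i++)
			buf[i] = i * 7 + 3;

		/* buf + 1 is odd-aligned; both methods print the same value. */
		printf("%#x %#x\n", csum_bytes(buf + 1, 64), csum_words(buf + 1, 64));
		return 0;
	}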