From patchwork Thu Jan 6 14:45:41 2022
X-Patchwork-Submitter: David Laight <david.laight@aculab.com>
X-Patchwork-Id: 12705476
X-Patchwork-Delegate: kuba@kernel.org
From: David Laight <david.laight@aculab.com>
To: 'Eric Dumazet', Peter Zijlstra
CC: 'tglx@linutronix.de', 'mingo@redhat.com', 'Borislav Petkov',
 'dave.hansen@linux.intel.com', 'X86 ML', 'hpa@zytor.com',
 'alexanderduyck@fb.com', 'open list', 'netdev', 'Noah Goldstein'
Subject: [PATCH v2] x86/lib: Remove the special case for odd-aligned buffers
 in csum-partial_64.c
Date: Thu, 6 Jan 2022 14:45:41 +0000
X-Mailing-List: netdev@vger.kernel.org

There is no need to special case the very unusual odd-aligned buffers.
They are no worse than 4n+2 aligned buffers.

Signed-off-by: David Laight <david.laight@aculab.com>
Acked-by: Eric Dumazet
---
resend - v1 seems to have got lost :-)

v2: Also delete from32to16()
    Add acked-by from Eric (he sent one at some point)
    Fix possible whitespace error in the last hunk

The penalty for any misaligned access seems to be minimal.
On an i7-7700 misaligned buffers add 2 or 3 clocks (in 115) to a
512 byte checksum. That is less than 1 clock for each cache line!
That is just measuring the main loop, with an lfence prior to rdpmc
to read PERF_COUNT_HW_CPU_CYCLES.
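For reference, the numbers above were taken with a measurement helper along
the lines of the sketch below. This is not part of the patch: the function
name is made up, and the counter index is an assumption - in practice it has
to be the hw-cycles PMC index obtained via perf_event_open() and the mmapped
perf_event_mmap_page, with user-space rdpmc enabled.

	#include <stdint.h>

	/* Sketch only: serialise with lfence, then read PMC 'counter' (EDX:EAX). */
	static inline uint64_t read_cycles(uint32_t counter)
	{
		uint32_t lo, hi;

		asm volatile("lfence\n\trdpmc"
			     : "=a" (lo), "=d" (hi)
			     : "c" (counter)
			     : "memory");
		return ((uint64_t)hi << 32) | lo;
	}

Take one reading before and one after the code being timed and subtract.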
 arch/x86/lib/csum-partial_64.c | 28 ++--------------------------
 1 file changed, 2 insertions(+), 26 deletions(-)

diff --git a/arch/x86/lib/csum-partial_64.c b/arch/x86/lib/csum-partial_64.c
index 1f8a8f895173..061b1ed74d6a 100644
--- a/arch/x86/lib/csum-partial_64.c
+++ b/arch/x86/lib/csum-partial_64.c
@@ -11,16 +11,6 @@
 #include <asm/checksum.h>
 #include <asm/word-at-a-time.h>
 
-static inline unsigned short from32to16(unsigned a)
-{
-	unsigned short b = a >> 16;
-	asm("addw %w2,%w0\n\t"
-	    "adcw $0,%w0\n"
-	    : "=r" (b)
-	    : "0" (b), "r" (a));
-	return b;
-}
-
 /*
  * Do a checksum on an arbitrary memory area.
  * Returns a 32bit checksum.
@@ -30,22 +20,12 @@ static inline unsigned short from32to16(unsigned a)
  *
  * Still, with CHECKSUM_COMPLETE this is called to compute
  * checksums on IPv6 headers (40 bytes) and other small parts.
- * it's best to have buff aligned on a 64-bit boundary
+ * The penalty for misaligned buff is negligible.
  */
 __wsum csum_partial(const void *buff, int len, __wsum sum)
 {
 	u64 temp64 = (__force u64)sum;
-	unsigned odd, result;
-
-	odd = 1 & (unsigned long) buff;
-	if (unlikely(odd)) {
-		if (unlikely(len == 0))
-			return sum;
-		temp64 = ror32((__force u32)sum, 8);
-		temp64 += (*(unsigned char *)buff << 8);
-		len--;
-		buff++;
-	}
+	unsigned result;
 
 	while (unlikely(len >= 64)) {
 		asm("addq 0*8(%[src]),%[res]\n\t"
@@ -130,10 +110,6 @@ __wsum csum_partial(const void *buff, int len, __wsum sum)
 #endif
 	}
 	result = add32_with_carry(temp64 >> 32, temp64 & 0xffffffff);
-	if (unlikely(odd)) {
-		result = from32to16(result);
-		result = ((result >> 8) & 0xff) | ((result & 0xff) << 8);
-	}
 	return (__force __wsum)result;
 }
 EXPORT_SYMBOL(csum_partial);
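Not part of the patch, but for anyone wondering why dropping the odd-address
fixup is safe for correctness as well as speed: the 16-bit ones-complement
sum can be accumulated word-at-a-time straight from a misaligned pointer and
still matches the byte-pair definition, so no final byte swap of the result
is needed. A minimal user-space sketch (all names made up, little-endian
host assumed):

	#include <stdint.h>
	#include <stdio.h>
	#include <string.h>

	/* Reference: RFC 1071 byte-pair sum, little-endian word order. */
	static uint16_t csum_bytes(const uint8_t *p, int len)
	{
		uint32_t sum = 0;
		int i;

		for (i = 0; i + 1 < len; i += 2)
			sum += p[i] | (p[i + 1] << 8);
		if (len & 1)
			sum += p[len - 1];
		while (sum >> 16)
			sum = (sum & 0xffff) + (sum >> 16);
		return sum;
	}

	/* 64-bit word sum with end-around carry, like the adcq loop above. */
	static uint16_t csum_words(const uint8_t *p, int len)	/* len % 8 == 0 */
	{
		uint64_t sum = 0, w;
		int i;

		for (i = 0; i < len; i += 8) {
			memcpy(&w, p + i, 8);	/* misaligned load is fine */
			sum += w;
			if (sum < w)		/* emulate the carry flag */
				sum++;
		}
		sum = (sum & 0xffffffff) + (sum >> 32);
		sum = (sum & 0xffffffff) + (sum >> 32);
		sum = (sum & 0xffff) + (sum >> 16);
		sum = (sum & 0xffff) + (sum >> 16);
		return sum;
	}

	int main(void)
	{
		static uint8_t buf[65];
		int i;

		for (i = 1; i <= 64; i++)
			buf[i] = i * 7 + 3;

		/* buf + 1 is odd-aligned; both methods print the same value. */
		printf("%#x %#x\n", csum_bytes(buf + 1, 64), csum_words(buf + 1, 64));
		return 0;
	}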