From patchwork Tue Sep 22 00:26:08 2020
From: Douglas Anderson
To: Catalin Marinas, Will Deacon
Cc: linux-kernel@vger.kernel.org, Jackie Liu, Douglas Anderson, linux-arm-kernel@lists.infradead.org, Ard Biesheuvel
Subject: [PATCH] arm64: crypto: Add an option to assume NEON XOR is the fastest
Date: Mon, 21 Sep 2020 17:26:08 -0700
Message-Id: <20200921172603.1.Id9450c1d3deef17718bd5368580a3c44895209ee@changeid>

On every boot we see messages like this:

[    0.025360] calling  calibrate_xor_blocks+0x0/0x134 @ 1
[    0.025363] xor: measuring software checksum speed
[    0.035351]    8regs     :  3952.000 MB/sec
[    0.045384]    32regs    :  4860.000 MB/sec
[    0.055418]    arm64_neon:  5900.000 MB/sec
[    0.055423] xor: using function: arm64_neon (5900.000 MB/sec)
[    0.055433] initcall calibrate_xor_blocks+0x0/0x134 returned 0 after 29296 usecs

As you can see, we spend 30 ms on every boot re-confirming that, yet
again, the arm64_neon implementation is the fastest way to do XOR.
...and the above is on a system with HZ=1000. Because the benchmark
runs for a fixed number of timer ticks, a slower HZ makes it take
proportionally longer: with HZ=100 we spend 300 ms on every boot
re-confirming a fact that will be the same for every bootup.

Trying to super-optimize the xor operation makes a lot of sense if
you're using software RAID, but the above is probably not worth it for
most Linux users because:

1. Quite a few arm64 kernels are built for embedded systems where
   software RAID isn't common. That means we're spending lots of time
   on every boot trying to optimize something we don't use.
2. Presumably, if we have NEON, it's faster than the alternatives; and
   even if it's not, it's not expected to be much slower.
3. Quite a lot of arm64 systems are big.LITTLE. This makes the
   existing test somewhat misguided: it assumes that results measured
   on the boot CPU apply to the other CPUs in the system, which is not
   necessarily the case.

Let's add a new config option that allows us to just use the NEON
functions (if present) without benchmarking.

NOTE: one small side effect is that on an arm64 system _without_ NEON
we'll no longer benchmark the xor_block_8regs_p and xor_block_32regs_p
versions of the function. That's presumably OK since those are already
covered by the existing !KERNEL_MODE_NEON path.
ALSO NOTE: presumably the way to do better than this would be some
sort of per-CPU-core lookup table, jumping to a per-CPU-core-specific
XOR function each time xor is called. Without evidence that this would
really help someone, though, it doesn't seem worth the complexity.

Signed-off-by: Douglas Anderson
---
 arch/arm64/Kconfig           | 15 +++++++++++++++
 arch/arm64/include/asm/xor.h |  5 +++++
 2 files changed, 20 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 64ae5e4eb814..fc18df45a5f8 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -306,6 +306,21 @@ config SMP
 config KERNEL_MODE_NEON
 	def_bool y
 
+menuconfig FORCE_NEON_XOR_IF_AVAILABLE
+	bool "Assume neon is fastest for xor if the CPU supports it"
+	default y
+	depends on KERNEL_MODE_NEON
+	help
+	  Normally the kernel will run through several different XOR
+	  algorithms at boot, timing them on the boot processor to see
+	  which is fastest. This can take quite some time. On many
+	  machines it's expected that, if NEON is available, it's going
+	  to provide the fastest implementation. If you set this option
+	  we'll skip testing this every boot and just assume NEON is the
+	  fastest if present. Setting this option will speed up your
+	  boot but you might end up with a less-optimal xor
+	  implementation.
+
 config FIX_EARLYCON_MEM
 	def_bool y
 
diff --git a/arch/arm64/include/asm/xor.h b/arch/arm64/include/asm/xor.h
index 947f6a4f1aa0..1acb290866ab 100644
--- a/arch/arm64/include/asm/xor.h
+++ b/arch/arm64/include/asm/xor.h
@@ -57,6 +57,10 @@ static struct xor_block_template xor_block_arm64 = {
 	.do_4	= xor_neon_4,
 	.do_5	= xor_neon_5
 };
 
+#ifdef CONFIG_FORCE_NEON_XOR_IF_AVAILABLE
+#define XOR_SELECT_TEMPLATE(FASTEST) \
+	(cpu_has_neon() ? &xor_block_arm64 : FASTEST)
+#else /* ! CONFIG_FORCE_NEON_XOR_IF_AVAILABLE */
 #undef XOR_TRY_TEMPLATES
 #define XOR_TRY_TEMPLATES \
 	do { \
@@ -66,5 +70,6 @@ static struct xor_block_template xor_block_arm64 = {
 			xor_speed(&xor_block_arm64);\
 		} \
 	} while (0)
+#endif /* ! CONFIG_FORCE_NEON_XOR_IF_AVAILABLE */
 
 #endif /* ! CONFIG_KERNEL_MODE_NEON */