From patchwork Mon Jul 31 20:43:55 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Arnd Bergmann X-Patchwork-Id: 9873121 X-Patchwork-Delegate: herbert@gondor.apana.org.au Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 04A396037D for ; Mon, 31 Jul 2017 20:44:29 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E4967285DB for ; Mon, 31 Jul 2017 20:44:28 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D5B5A285E4; Mon, 31 Jul 2017 20:44:28 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 036AE285DB for ; Mon, 31 Jul 2017 20:44:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751126AbdGaUo0 (ORCPT ); Mon, 31 Jul 2017 16:44:26 -0400 Received: from mout.kundenserver.de ([212.227.17.24]:60822 "EHLO mout.kundenserver.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751059AbdGaUoZ (ORCPT ); Mon, 31 Jul 2017 16:44:25 -0400 Received: from wuerfel.lan ([78.43.238.10]) by mrelayeu.kundenserver.de (mreue103 [212.227.15.145]) with ESMTPA (Nemesis) id 0LoaZ0-1e4BIv3iri-00gZcO; Mon, 31 Jul 2017 22:44:11 +0200 From: Arnd Bergmann To: Herbert Xu , "David S. Miller" Cc: Arnd Bergmann , linux-crypto@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH] crypto: serpent: improve __serpent_setkey with UBSAN Date: Mon, 31 Jul 2017 22:43:55 +0200 Message-Id: <20170731204407.1396336-1-arnd@arndb.de> X-Mailer: git-send-email 2.9.0 X-Provags-ID: V03:K0:DLbu7Ovj+O3hCVzxG7ifT2IflwiRCZYCUQHEWfBuu8hs1HBW+oU hufbKwA8VYaanMAkpDIqV5Y/vwbgBBH+2fs96X6kb7H7VOpOSqC4XasnM8SUviVL/R4ezXS Uw49Asp56SbEAc5VlyWHeZ/qFmnObBduu+szt0Xlyh/HF2AL2scgTSsStDi1zjTycBlKnEY mOC1rzFHiCCgQTW6S+/CA== X-UI-Out-Filterresults: notjunk:1; V01:K0:ydYBPgqwizg=:OwYzf/b2pfOVnP0o/xhTmN naMm8Zx6K3OXmwspcA3NSgKnKQBq/JBZ/VRDYx4nsi4rBm7fvPdlTz7BtoVwfAEQvPBxiOjiB Vqjfob4uB0vzIfboGwfsz1Fp7u/AfL8hslztEaXCLji0v/28ft/wxboTU7fUe5S8NqIegfhW9 LU1BNAmlYUa6Ch6/IYpTajnLSGbPb03bFMVlPCKNzpwEG3rpLD5EnilOrWsxqVdbvgVD28MNx C6FSnKbTB9ToQKe1r/daDLsgYCQkOElyLnDcy7gavT9Hre+h5V1znu7BSo3Zi4FBAF6leIZWD 7MKeCBnuG3vF6irI5iK1ue3/CIrbpKa0LI3VLXQi5kmX+TWfLU/teSbdMdsZXTkG4B5Vn5FgZ IEHunk67IWCQOFfnOxv+WAMy02YrA5HZf+pvzj+rpXuHofoiSSoj4a/Yc+zzrJnkMHHfIDlDY DE3kFS6Jovrt2hNOHVt/nLplO+ItxCFgaPQwfwmwroAA1v0++/kEyl1Kh1js0oZPbNcHwVpeh UDk/+1dMXbHk/OOBVoIHDy3XlGP9SLyfFPQkZjiLYcL8rDaPSHiwyP9WUkZNT3RCa5VB8ADXI WxLK5KUuYaG1EyJedqPga6doQr/SkSjYSyAWCdCYoNub+uEqH0C+RFG8vOK1wm/3clyE6Tg3j cOJ/tNO2xYOBb6ikVML5Bs/VW1eDnrHtcpc+8RsBoRO5RShBh+85gqqEaAM3Rehpvi2AziVTu Ul6YfKelijEaPhNYPbAPClH6bBBqLhcR1HV02Q== Sender: linux-crypto-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When UBSAN is enabled, we get a very large stack frame for __serpent_setkey, when the register allocator ends up using more registers than it has, and has to spill temporary values to the stack. The code was originally optimized for in-order x86-32 CPU implementations using older compilers, but it now runs into a highly suboptimal case on all CPU architectures, as seen by this warning: crypto/serpent_generic.c: In function '__serpent_setkey': crypto/serpent_generic.c:436:1: error: the frame size of 2720 bytes is larger than 2048 bytes [-Werror=frame-larger-than=] Disabling -fsanitize=alignment would avoid that warning, presumably the option turns off a optimization step that is required for getting the register allocation right, but there is no easy way to do that on gcc-7 (gcc-8 introduces a function attribute for this). I tried to figure out a way to modify the source code instead, and noticed that the two stages of the setkey() function (keyiter and sbox) each are fine by themselves, but not when combined into one function. Splitting out the entire sbox into a separate function also happens to work fine with all compilers I tried (arm, arm64 and x86). The setkey function uses a strange way to handle offsets into the key array, using both negative and positive index values, as well as adjusting the array pointer back and forth. I have checked that this actually makes no difference to modern compilers, but I left that untouched to make the patch easier to review and to keep the code closer to the reference implementation. Link: https://patchwork.kernel.org/patch/9189575/ Signed-off-by: Arnd Bergmann --- crypto/serpent_generic.c | 77 ++++++++++++++++++++++++++---------------------- 1 file changed, 41 insertions(+), 36 deletions(-) diff --git a/crypto/serpent_generic.c b/crypto/serpent_generic.c index 94970a794975..7c3382facc82 100644 --- a/crypto/serpent_generic.c +++ b/crypto/serpent_generic.c @@ -229,6 +229,46 @@ x4 ^= x2; \ }) +static void __serpent_setkey_sbox(u32 r0, u32 r1, u32 r2, u32 r3, u32 r4, u32 *k) +{ + k += 100; + S3(r3, r4, r0, r1, r2); store_and_load_keys(r1, r2, r4, r3, 28, 24); + S4(r1, r2, r4, r3, r0); store_and_load_keys(r2, r4, r3, r0, 24, 20); + S5(r2, r4, r3, r0, r1); store_and_load_keys(r1, r2, r4, r0, 20, 16); + S6(r1, r2, r4, r0, r3); store_and_load_keys(r4, r3, r2, r0, 16, 12); + S7(r4, r3, r2, r0, r1); store_and_load_keys(r1, r2, r0, r4, 12, 8); + S0(r1, r2, r0, r4, r3); store_and_load_keys(r0, r2, r4, r1, 8, 4); + S1(r0, r2, r4, r1, r3); store_and_load_keys(r3, r4, r1, r0, 4, 0); + S2(r3, r4, r1, r0, r2); store_and_load_keys(r2, r4, r3, r0, 0, -4); + S3(r2, r4, r3, r0, r1); store_and_load_keys(r0, r1, r4, r2, -4, -8); + S4(r0, r1, r4, r2, r3); store_and_load_keys(r1, r4, r2, r3, -8, -12); + S5(r1, r4, r2, r3, r0); store_and_load_keys(r0, r1, r4, r3, -12, -16); + S6(r0, r1, r4, r3, r2); store_and_load_keys(r4, r2, r1, r3, -16, -20); + S7(r4, r2, r1, r3, r0); store_and_load_keys(r0, r1, r3, r4, -20, -24); + S0(r0, r1, r3, r4, r2); store_and_load_keys(r3, r1, r4, r0, -24, -28); + k -= 50; + S1(r3, r1, r4, r0, r2); store_and_load_keys(r2, r4, r0, r3, 22, 18); + S2(r2, r4, r0, r3, r1); store_and_load_keys(r1, r4, r2, r3, 18, 14); + S3(r1, r4, r2, r3, r0); store_and_load_keys(r3, r0, r4, r1, 14, 10); + S4(r3, r0, r4, r1, r2); store_and_load_keys(r0, r4, r1, r2, 10, 6); + S5(r0, r4, r1, r2, r3); store_and_load_keys(r3, r0, r4, r2, 6, 2); + S6(r3, r0, r4, r2, r1); store_and_load_keys(r4, r1, r0, r2, 2, -2); + S7(r4, r1, r0, r2, r3); store_and_load_keys(r3, r0, r2, r4, -2, -6); + S0(r3, r0, r2, r4, r1); store_and_load_keys(r2, r0, r4, r3, -6, -10); + S1(r2, r0, r4, r3, r1); store_and_load_keys(r1, r4, r3, r2, -10, -14); + S2(r1, r4, r3, r2, r0); store_and_load_keys(r0, r4, r1, r2, -14, -18); + S3(r0, r4, r1, r2, r3); store_and_load_keys(r2, r3, r4, r0, -18, -22); + k -= 50; + S4(r2, r3, r4, r0, r1); store_and_load_keys(r3, r4, r0, r1, 28, 24); + S5(r3, r4, r0, r1, r2); store_and_load_keys(r2, r3, r4, r1, 24, 20); + S6(r2, r3, r4, r1, r0); store_and_load_keys(r4, r0, r3, r1, 20, 16); + S7(r4, r0, r3, r1, r2); store_and_load_keys(r2, r3, r1, r4, 16, 12); + S0(r2, r3, r1, r4, r0); store_and_load_keys(r1, r3, r4, r2, 12, 8); + S1(r1, r3, r4, r2, r0); store_and_load_keys(r0, r4, r2, r1, 8, 4); + S2(r0, r4, r2, r1, r3); store_and_load_keys(r3, r4, r0, r1, 4, 0); + S3(r3, r4, r0, r1, r2); storekeys(r1, r2, r4, r3, 0); +} + int __serpent_setkey(struct serpent_ctx *ctx, const u8 *key, unsigned int keylen) { @@ -395,42 +435,7 @@ int __serpent_setkey(struct serpent_ctx *ctx, const u8 *key, keyiter(k[23], r1, r0, r3, 131, 31); /* Apply S-boxes */ - - S3(r3, r4, r0, r1, r2); store_and_load_keys(r1, r2, r4, r3, 28, 24); - S4(r1, r2, r4, r3, r0); store_and_load_keys(r2, r4, r3, r0, 24, 20); - S5(r2, r4, r3, r0, r1); store_and_load_keys(r1, r2, r4, r0, 20, 16); - S6(r1, r2, r4, r0, r3); store_and_load_keys(r4, r3, r2, r0, 16, 12); - S7(r4, r3, r2, r0, r1); store_and_load_keys(r1, r2, r0, r4, 12, 8); - S0(r1, r2, r0, r4, r3); store_and_load_keys(r0, r2, r4, r1, 8, 4); - S1(r0, r2, r4, r1, r3); store_and_load_keys(r3, r4, r1, r0, 4, 0); - S2(r3, r4, r1, r0, r2); store_and_load_keys(r2, r4, r3, r0, 0, -4); - S3(r2, r4, r3, r0, r1); store_and_load_keys(r0, r1, r4, r2, -4, -8); - S4(r0, r1, r4, r2, r3); store_and_load_keys(r1, r4, r2, r3, -8, -12); - S5(r1, r4, r2, r3, r0); store_and_load_keys(r0, r1, r4, r3, -12, -16); - S6(r0, r1, r4, r3, r2); store_and_load_keys(r4, r2, r1, r3, -16, -20); - S7(r4, r2, r1, r3, r0); store_and_load_keys(r0, r1, r3, r4, -20, -24); - S0(r0, r1, r3, r4, r2); store_and_load_keys(r3, r1, r4, r0, -24, -28); - k -= 50; - S1(r3, r1, r4, r0, r2); store_and_load_keys(r2, r4, r0, r3, 22, 18); - S2(r2, r4, r0, r3, r1); store_and_load_keys(r1, r4, r2, r3, 18, 14); - S3(r1, r4, r2, r3, r0); store_and_load_keys(r3, r0, r4, r1, 14, 10); - S4(r3, r0, r4, r1, r2); store_and_load_keys(r0, r4, r1, r2, 10, 6); - S5(r0, r4, r1, r2, r3); store_and_load_keys(r3, r0, r4, r2, 6, 2); - S6(r3, r0, r4, r2, r1); store_and_load_keys(r4, r1, r0, r2, 2, -2); - S7(r4, r1, r0, r2, r3); store_and_load_keys(r3, r0, r2, r4, -2, -6); - S0(r3, r0, r2, r4, r1); store_and_load_keys(r2, r0, r4, r3, -6, -10); - S1(r2, r0, r4, r3, r1); store_and_load_keys(r1, r4, r3, r2, -10, -14); - S2(r1, r4, r3, r2, r0); store_and_load_keys(r0, r4, r1, r2, -14, -18); - S3(r0, r4, r1, r2, r3); store_and_load_keys(r2, r3, r4, r0, -18, -22); - k -= 50; - S4(r2, r3, r4, r0, r1); store_and_load_keys(r3, r4, r0, r1, 28, 24); - S5(r3, r4, r0, r1, r2); store_and_load_keys(r2, r3, r4, r1, 24, 20); - S6(r2, r3, r4, r1, r0); store_and_load_keys(r4, r0, r3, r1, 20, 16); - S7(r4, r0, r3, r1, r2); store_and_load_keys(r2, r3, r1, r4, 16, 12); - S0(r2, r3, r1, r4, r0); store_and_load_keys(r1, r3, r4, r2, 12, 8); - S1(r1, r3, r4, r2, r0); store_and_load_keys(r0, r4, r2, r1, 8, 4); - S2(r0, r4, r2, r1, r3); store_and_load_keys(r3, r4, r0, r1, 4, 0); - S3(r3, r4, r0, r1, r2); storekeys(r1, r2, r4, r3, 0); + __serpent_setkey_sbox(r0, r1, r2, r3, r4, ctx->expkey); return 0; }