From patchwork Tue Dec 19 00:40:26 2017
X-Patchwork-Submitter: Eric Biggers
X-Patchwork-Id: 10122085
X-Patchwork-Delegate: herbert@gondor.apana.org.au
From: Eric Biggers
To: linux-crypto@vger.kernel.org, Herbert Xu
Miller" , Josh Poimboeuf , Jussi Kivilinna , x86@kernel.org, linux-kernel@vger.kernel.org, syzkaller-bugs@googlegroups.com, Eric Biggers Subject: [PATCH] crypto: x86/twofish-3way - Fix %rbp usage Date: Mon, 18 Dec 2017 16:40:26 -0800 Message-Id: <20171219004026.170565-1-ebiggers3@gmail.com> X-Mailer: git-send-email 2.15.1.504.g5279b80103-goog In-Reply-To: <001a113f2cd26f3532055f0f4a79@google.com> References: <001a113f2cd26f3532055f0f4a79@google.com> Sender: linux-crypto-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Eric Biggers Using %rbp as a temporary register breaks frame pointer convention and breaks stack traces when unwinding from an interrupt in the crypto code. In twofish-3way, we can't simply replace %rbp with another register because there are none available. Instead, we use the stack to hold the values that %rbp, %r11, and %r12 were holding previously. Each of these values represents the half of the output from the previous Feistel round that is being passed on unchanged to the following round. They are only used once per round, when they are exchanged with %rax, %rbx, and %rcx. As a result, we free up 3 registers (one per block) and can reassign them so that %rbp is not used, and additionally %r14 and %r15 are not used so they do not need to be saved/restored. There may be a small overhead caused by replacing 'xchg REG, REG' with the needed sequence 'mov MEM, REG; mov REG, MEM; mov REG, REG' once per round. But, counterintuitively, when I tested "ctr-twofish-3way" on a Haswell processor, the new version was actually about 2% faster. (Perhaps 'xchg' is not as well optimized as plain moves.) Reported-by: syzbot Signed-off-by: Eric Biggers Reviewed-by: Josh Poimboeuf --- arch/x86/crypto/twofish-x86_64-asm_64-3way.S | 112 ++++++++++++++------------- 1 file changed, 60 insertions(+), 52 deletions(-) diff --git a/arch/x86/crypto/twofish-x86_64-asm_64-3way.S b/arch/x86/crypto/twofish-x86_64-asm_64-3way.S index 1c3b7ceb36d2..e7273a606a07 100644 --- a/arch/x86/crypto/twofish-x86_64-asm_64-3way.S +++ b/arch/x86/crypto/twofish-x86_64-asm_64-3way.S @@ -55,29 +55,31 @@ #define RAB1bl %bl #define RAB2bl %cl +#define CD0 0x0(%rsp) +#define CD1 0x8(%rsp) +#define CD2 0x10(%rsp) + +# used only before/after all rounds #define RCD0 %r8 #define RCD1 %r9 #define RCD2 %r10 -#define RCD0d %r8d -#define RCD1d %r9d -#define RCD2d %r10d - -#define RX0 %rbp -#define RX1 %r11 -#define RX2 %r12 +# used only during rounds +#define RX0 %r8 +#define RX1 %r9 +#define RX2 %r10 -#define RX0d %ebp -#define RX1d %r11d -#define RX2d %r12d +#define RX0d %r8d +#define RX1d %r9d +#define RX2d %r10d -#define RY0 %r13 -#define RY1 %r14 -#define RY2 %r15 +#define RY0 %r11 +#define RY1 %r12 +#define RY2 %r13 -#define RY0d %r13d -#define RY1d %r14d -#define RY2d %r15d +#define RY0d %r11d +#define RY1d %r12d +#define RY2d %r13d #define RT0 %rdx #define RT1 %rsi @@ -85,6 +87,8 @@ #define RT0d %edx #define RT1d %esi +#define RT1bl %sil + #define do16bit_ror(rot, op1, op2, T0, T1, tmp1, tmp2, ab, dst) \ movzbl ab ## bl, tmp2 ## d; \ movzbl ab ## bh, tmp1 ## d; \ @@ -92,6 +96,11 @@ op1##l T0(CTX, tmp2, 4), dst ## d; \ op2##l T1(CTX, tmp1, 4), dst ## d; +#define swap_ab_with_cd(ab, cd, tmp) \ + movq cd, tmp; \ + movq ab, cd; \ + movq tmp, ab; + /* * Combined G1 & G2 function. Reordered with help of rotates to have moves * at begining. 
@@ -110,15 +119,15 @@
 	/* G1,2 && G2,2 */ \
 	do16bit_ror(32, xor, xor, Tx2, Tx3, RT0, RT1, ab ## 0, x ## 0); \
 	do16bit_ror(16, xor, xor, Ty3, Ty0, RT0, RT1, ab ## 0, y ## 0); \
-	xchgq cd ## 0, ab ## 0; \
+	swap_ab_with_cd(ab ## 0, cd ## 0, RT0); \
 	\
 	do16bit_ror(32, xor, xor, Tx2, Tx3, RT0, RT1, ab ## 1, x ## 1); \
 	do16bit_ror(16, xor, xor, Ty3, Ty0, RT0, RT1, ab ## 1, y ## 1); \
-	xchgq cd ## 1, ab ## 1; \
+	swap_ab_with_cd(ab ## 1, cd ## 1, RT0); \
 	\
 	do16bit_ror(32, xor, xor, Tx2, Tx3, RT0, RT1, ab ## 2, x ## 2); \
 	do16bit_ror(16, xor, xor, Ty3, Ty0, RT0, RT1, ab ## 2, y ## 2); \
-	xchgq cd ## 2, ab ## 2;
+	swap_ab_with_cd(ab ## 2, cd ## 2, RT0);
 
 #define enc_round_end(ab, x, y, n) \
 	addl y ## d, x ## d; \
@@ -168,6 +177,16 @@
 	decrypt_round3(ba, dc, (n*2)+1); \
 	decrypt_round3(ba, dc, (n*2));
 
+#define push_cd() \
+	pushq RCD2; \
+	pushq RCD1; \
+	pushq RCD0;
+
+#define pop_cd() \
+	popq RCD0; \
+	popq RCD1; \
+	popq RCD2;
+
 #define inpack3(in, n, xy, m) \
 	movq 4*(n)(in), xy ## 0; \
 	xorq w+4*m(CTX), xy ## 0; \
@@ -223,11 +242,8 @@ ENTRY(__twofish_enc_blk_3way)
 	 * %rdx: src, RIO
 	 * %rcx: bool, if true: xor output
 	 */
-	pushq %r15;
-	pushq %r14;
 	pushq %r13;
 	pushq %r12;
-	pushq %rbp;
 	pushq %rbx;
 
 	pushq %rcx; /* bool xor */
@@ -235,40 +251,36 @@ ENTRY(__twofish_enc_blk_3way)
 
 	inpack_enc3();
 
-	encrypt_cycle3(RAB, RCD, 0);
-	encrypt_cycle3(RAB, RCD, 1);
-	encrypt_cycle3(RAB, RCD, 2);
-	encrypt_cycle3(RAB, RCD, 3);
-	encrypt_cycle3(RAB, RCD, 4);
-	encrypt_cycle3(RAB, RCD, 5);
-	encrypt_cycle3(RAB, RCD, 6);
-	encrypt_cycle3(RAB, RCD, 7);
+	push_cd();
+	encrypt_cycle3(RAB, CD, 0);
+	encrypt_cycle3(RAB, CD, 1);
+	encrypt_cycle3(RAB, CD, 2);
+	encrypt_cycle3(RAB, CD, 3);
+	encrypt_cycle3(RAB, CD, 4);
+	encrypt_cycle3(RAB, CD, 5);
+	encrypt_cycle3(RAB, CD, 6);
+	encrypt_cycle3(RAB, CD, 7);
+	pop_cd();
 
 	popq RIO; /* dst */
-	popq %rbp; /* bool xor */
+	popq RT1; /* bool xor */
 
-	testb %bpl, %bpl;
+	testb RT1bl, RT1bl;
 	jnz .L__enc_xor3;
 
 	outunpack_enc3(mov);
 
 	popq %rbx;
-	popq %rbp;
 	popq %r12;
 	popq %r13;
-	popq %r14;
-	popq %r15;
 	ret;
 
 .L__enc_xor3:
 	outunpack_enc3(xor);
 
 	popq %rbx;
-	popq %rbp;
 	popq %r12;
 	popq %r13;
-	popq %r14;
-	popq %r15;
 	ret;
 ENDPROC(__twofish_enc_blk_3way)
 
@@ -278,35 +290,31 @@ ENTRY(twofish_dec_blk_3way)
 	 * %rsi: dst
 	 * %rdx: src, RIO
 	 */
-	pushq %r15;
-	pushq %r14;
 	pushq %r13;
 	pushq %r12;
-	pushq %rbp;
 	pushq %rbx;
 
 	pushq %rsi; /* dst */
 
 	inpack_dec3();
 
-	decrypt_cycle3(RAB, RCD, 7);
-	decrypt_cycle3(RAB, RCD, 6);
-	decrypt_cycle3(RAB, RCD, 5);
-	decrypt_cycle3(RAB, RCD, 4);
-	decrypt_cycle3(RAB, RCD, 3);
-	decrypt_cycle3(RAB, RCD, 2);
-	decrypt_cycle3(RAB, RCD, 1);
-	decrypt_cycle3(RAB, RCD, 0);
+	push_cd();
+	decrypt_cycle3(RAB, CD, 7);
+	decrypt_cycle3(RAB, CD, 6);
+	decrypt_cycle3(RAB, CD, 5);
+	decrypt_cycle3(RAB, CD, 4);
+	decrypt_cycle3(RAB, CD, 3);
+	decrypt_cycle3(RAB, CD, 2);
+	decrypt_cycle3(RAB, CD, 1);
+	decrypt_cycle3(RAB, CD, 0);
+	pop_cd();
 
 	popq RIO; /* dst */
 
 	outunpack_dec3();
 
 	popq %rbx;
-	popq %rbp;
 	popq %r12;
 	popq %r13;
-	popq %r14;
-	popq %r15;
 	ret;
 ENDPROC(twofish_dec_blk_3way)
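A note on the swap construction, for readers following the commit message:
the old code exchanged the register holding an A/B pair with the register
holding the corresponding C/D pair in one instruction, e.g. for block 0
(cd0 is RCD0 = %r8):

	xchgq %r8, %rax		# old: swap RCD0 with the A/B register

With the C/D halves now spilled to stack slots, the same exchange goes
through a scratch register instead. As a rough sketch of what
swap_ab_with_cd(RAB0, CD0, RT0) expands to, assuming RAB0 is %rax (the
commit message implies this, though the quoted context above does not
show the RAB0 define):

	movq 0x0(%rsp), %rdx	# tmp = cd: load the spilled C/D half (CD0)
	movq %rax, 0x0(%rsp)	# cd = ab: store the A/B half into the slot
	movq %rdx, %rax		# ab = tmp: complete the swap

This is the 'mov MEM, REG; mov REG, MEM; mov REG, REG' sequence the
commit message mentions. Because %r8-%r10 no longer need to hold C/D
during the rounds, they can be reused as RX0-RX2, which is what lets the
code stop using %rbp, %r14, and %r15.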