From patchwork Mon Jun 18 21:56:57 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Eric Biggers
X-Patchwork-Id: 10472735
Date: Mon, 18 Jun 2018 14:56:57 -0700
From: Eric Biggers
To: Ard Biesheuvel
Cc: Stefan Agner, "open list:HARDWARE RANDOM NUMBER GENERATOR CORE",
 Herbert Xu, linux-fscrypt@vger.kernel.org, linux-arm-kernel,
 Jeffrey Walton, Paul Crowley, Patrik Torstensson, Greg Kaiser,
 Paul Lawrence, Michael Halcrow, Alex Cope, Greg Kroah-Hartman,
 linux-crypto-owner@vger.kernel.org
Subject: Re: [PATCH v3 3/5] crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS
Message-ID: <20180618215657.GB8022@google.com>
References: <20180214184223.254359-1-ebiggers@google.com>
 <20180214184223.254359-4-ebiggers@google.com>
 <8396d433caf1155f9ca422c6bad3200b@agner.ch>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.10+28 (db52f11e) (2018-06-13)
Sender: linux-fscrypt-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-fscrypt@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

On Sun, Jun 17, 2018 at 01:10:41PM +0200, Ard Biesheuvel wrote:
> >>>>> +
> >>>>> +       // One-time XTS preparation
> >>>>> +
> >>>>> +       /*
> >>>>> +        * Allocate stack space to store 128 bytes worth of tweaks.  For
> >>>>> +        * performance, this space is aligned to a 16-byte boundary so that we
> >>>>> +        * can use the load/store instructions that declare 16-byte alignment.
> >>>>> +        */
> >>>>> +       sub             sp, #128
> >>>>> +       bic             sp, #0xf
> >>>>
> >>>>
> >>>> This fails here when building with CONFIG_THUMB2_KERNEL=y
> >>>>
> >>>>   AS      arch/arm/crypto/speck-neon-core.o
> >>>>
> >>>> arch/arm/crypto/speck-neon-core.S: Assembler messages:
> >>>>
> >>>> arch/arm/crypto/speck-neon-core.S:419: Error: r13 not allowed here --
> >>>> `bic sp,#0xf'
> >>>> arch/arm/crypto/speck-neon-core.S:423: Error: r13 not allowed here --
> >>>> `bic sp,#0xf'
> >>>> arch/arm/crypto/speck-neon-core.S:427: Error: r13 not allowed here --
> >>>> `bic sp,#0xf'
> >>>> arch/arm/crypto/speck-neon-core.S:431: Error: r13 not allowed here --
> >>>> `bic sp,#0xf'
> >>>>
> >>>> In a quick hack this change seems to address it:
> >>>>
> >>>>
> >>>> -       sub             sp, #128
> >>>> -       bic             sp, #0xf
> >>>> +       mov             r6, sp
> >>>> +       sub             r6, #128
> >>>> +       bic             r6, #0xf
> >>>> +       mov             sp, r6
> >>>>
> >>>> But there is probably a better solution to address this.
> >>>>
> >>>
> >>> Given that there is no NEON on M class cores, I recommend we put something like
> >>>
> >>> THUMB(bx pc)
> >>> THUMB(nop.w)
> >>> THUMB(.arm)
> >>>
> >>> at the beginning and be done with it.
> >>
> >> I mean nop.n or just nop, of course, and we may need a '.align 2' at
> >> the beginning as well.
> >
> > Wouldn't it be preferable to have it assemble it in Thumb2 too? It seems
> > that bic sp,#0xf is the only issue...
> >
>
> Well, in general, yes. In the case of NEON code, not really, since the
> resulting code will not be smaller anyway, because the Thumb2 NEON
> opcodes are all 4 bytes. Also, Thumb2-only cores don't have NEON
> units, so all cores that this code can run on will be able to run in
> ARM mode.
>
> So from a maintainability pov, having code that only assembles in one
> way is better than having code that must compile both to ARM and to
> Thumb2 opcodes.
>
> Just my 2 cents, anyway.

I don't have too much of a preference, though Stefan's suggested 4 instructions
can be reduced to 3, which also matches what aes-neonbs-core.S does:

	sub		r12, sp, #128
	bic		r12, #0xf
	mov		sp, r12

Ard, is the following what you're suggesting instead?

---

diff --git a/arch/arm/crypto/speck-neon-core.S b/arch/arm/crypto/speck-neon-core.S
index 3c1e203e53b9..c989ce3dc057 100644
--- a/arch/arm/crypto/speck-neon-core.S
+++ b/arch/arm/crypto/speck-neon-core.S
@@ -8,6 +8,7 @@
  */
 
 #include
+#include
 
 	.text
 	.fpu		neon
@@ -233,6 +234,12 @@
  * nonzero multiple of 128.
  */
 .macro _speck_xts_crypt	n, decrypting
+
+	.align		2
+	THUMB(bx	pc)
+	THUMB(nop)
+	THUMB(.arm)
+
 	push		{r4-r7}
 	mov		r7, sp
 
@@ -413,6 +420,8 @@
 	mov		sp, r7
 	pop		{r4-r7}
 	bx		lr
+
+	THUMB(.thumb)
 .endm
 
 ENTRY(speck128_xts_encrypt_neon)