From patchwork Sun Feb 5 03:05:27 2017
X-Patchwork-Submitter: Eric Biggers
X-Patchwork-Id: 9555839
X-Patchwork-Delegate: herbert@gondor.apana.org.au
Date: Sat, 4 Feb 2017 19:05:27 -0800
From: Eric Biggers
To: "Jason A. Donenfeld"
Cc: Ard Biesheuvel, Linux Crypto Mailing List, Herbert Xu
Subject: Re: [RFC PATCH] crypto: algapi - make crypto_xor() and crypto_inc() alignment agnostic
Message-ID: <20170205030527.GA21055@zzz>
References: <1485785489-5116-1-git-send-email-ard.biesheuvel@linaro.org> <20170202064716.GB582@zzz>
User-Agent: Mutt/1.7.2 (2016-11-26)
X-Mailing-List: linux-crypto@vger.kernel.org

On Sun, Feb 05, 2017 at 12:10:53AM +0100, Jason A. Donenfeld wrote:
> Another thing that might be helpful is that you can let gcc decide on
> the alignment, and then optimize appropriately. Check out what we do
> with siphash:
>
> https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/tree/include/linux/siphash.h#n76
>
> static inline u64 siphash(const void *data, size_t len, const
> siphash_key_t *key)
> {
> #ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> 	if (!IS_ALIGNED((unsigned long)data, SIPHASH_ALIGNMENT))
> 		return __siphash_unaligned(data, len, key);
> #endif
> 	return ___siphash_aligned(data, len, key);
> }
>
> With this trick, we fall through to the fast alignment-assuming code,
> if gcc can prove that the address is inlined. This is often the case
> when passing structs, or when passing buffers that have
> __aligned(BLOCKSIZE). It proves to be a very useful optimization on
> some platforms.

Yes, this is a good idea.  Though it seems that usually at least one of the
two pointers passed to crypto_xor() will have an alignment unknown to the
compiler, the length is sometimes a compile-time constant, and inlining can
help a lot in that case.  For example, if someone does
crypto_xor(foo, bar, 16) on x86_64 or ARM64, we'd really like it to turn into
just a few instructions like this:

	mov    (%rsi),%rax
	xor    %rax,(%rdi)
	mov    0x8(%rsi),%rax
	xor    %rax,0x8(%rdi)

So how about inlining crypto_xor() when CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
is set or the pointers are long-aligned, and otherwise calling an out-of-line
function __crypto_xor_unaligned() that handles all the cases with weird
alignment?  Something like the following patch:

(Note: exactly how __crypto_xor_unaligned() is implemented is still debatable;
it could be more similar to Ard's proposal, or it could use the unaligned
access helpers.)

diff --git a/crypto/algapi.c b/crypto/algapi.c
index df939b54b09f..a0591db3f13a 100644
--- a/crypto/algapi.c
+++ b/crypto/algapi.c
@@ -972,23 +972,69 @@ void crypto_inc(u8 *a, unsigned int size)
 }
 EXPORT_SYMBOL_GPL(crypto_inc);
 
-static inline void crypto_xor_byte(u8 *a, const u8 *b, unsigned int size)
+#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
+void __crypto_xor_unaligned(u8 *dst, const u8 *src, unsigned int len)
 {
-	for (; size; size--)
-		*a++ ^= *b++;
-}
+	unsigned long delta = (unsigned long)dst ^ (unsigned long)src;
 
-void crypto_xor(u8 *dst, const u8 *src, unsigned int size)
-{
-	u32 *a = (u32 *)dst;
-	u32 *b = (u32 *)src;
+	/* Handle relative misalignment */
+	if (delta % sizeof(unsigned long)) {
+
+		/* 1-byte relative misalignment? */
+		if (delta & 1) {
+			while (len--)
+				*dst++ ^= *src++;
+			return;
+		}
 
-	for (; size >= 4; size -= 4)
-		*a++ ^= *b++;
+		/* 2-byte relative misalignment? */
+		if ((delta & 2) || sizeof(unsigned long) == 4) {
+			if ((unsigned long)dst % __alignof__(u16) && len) {
+				*dst++ ^= *src++;
+				len--;
+			}
+			while (len >= 2) {
+				*(u16 *)dst ^= *(u16 *)src;
+				dst += 2, src += 2, len -= 2;
+			}
+			if (len)
+				*dst ^= *src;
+			return;
+		}
+
+		/* 4-byte relative misalignment? */
+		while ((unsigned long)dst % __alignof__(u32) && len) {
+			*dst++ ^= *src++;
+			len--;
+		}
+		while (len >= 4) {
+			*(u32 *)dst ^= *(u32 *)src;
+			dst += 4, src += 4, len -= 4;
+		}
+		while (len--)
+			*dst++ ^= *src++;
+		return;
+	}
+
+	/* No relative misalignment; use word accesses */
+
+	while ((unsigned long)dst % __alignof__(unsigned long) && len) {
+		*dst++ ^= *src++;
+		len--;
+	}
+
+	while (len >= sizeof(unsigned long)) {
+		*(unsigned long *)dst ^= *(unsigned long *)src;
+		dst += sizeof(unsigned long);
+		src += sizeof(unsigned long);
+		len -= sizeof(unsigned long);
+	}
 
-	crypto_xor_byte((u8 *)a, (u8 *)b, size);
+	while (len--)
+		*dst++ ^= *src++;
 }
-EXPORT_SYMBOL_GPL(crypto_xor);
+EXPORT_SYMBOL_GPL(__crypto_xor_unaligned);
+#endif /* !CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS */
 
 unsigned int crypto_alg_extsize(struct crypto_alg *alg)
 {
diff --git a/include/crypto/algapi.h b/include/crypto/algapi.h
index 404e9558e879..718145c5eaca 100644
--- a/include/crypto/algapi.h
+++ b/include/crypto/algapi.h
@@ -191,9 +191,29 @@ static inline unsigned int crypto_queue_len(struct crypto_queue *queue)
 	return queue->qlen;
 }
 
-/* These functions require the input/output to be aligned as u32. */
 void crypto_inc(u8 *a, unsigned int size);
-void crypto_xor(u8 *dst, const u8 *src, unsigned int size);
+
+void __crypto_xor_unaligned(u8 *dst, const u8 *src, unsigned int len);
+
+static inline void crypto_xor(u8 *dst, const u8 *src, unsigned int len)
+{
+	if (IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) ||
+	    (((unsigned long)dst | (unsigned long)src) %
+	     __alignof__(unsigned long) == 0))
+	{
+		while (len >= sizeof(unsigned long)) {
+			*(unsigned long *)dst ^= *(unsigned long *)src;
+			dst += sizeof(unsigned long);
+			src += sizeof(unsigned long);
+			len -= sizeof(unsigned long);
+		}
+
+		while (len--)
+			*dst++ ^= *src++;
+		return;
+	}
+	return __crypto_xor_unaligned(dst, src, len);
+}
 
 int blkcipher_walk_done(struct blkcipher_desc *desc,
 			struct blkcipher_walk *walk, int err);
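
For anyone who wants to look at the generated code for the constant-length
case outside the kernel tree, below is a rough userspace approximation of the
inline crypto_xor() proposed above.  The names xor_inline() and
xor_unaligned() are placeholders for this sketch only, and the
IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) test is replaced by a plain
runtime alignment check, so treat it as a sketch rather than the actual
kernel code:

/*
 * Rough userspace approximation of the proposed inline crypto_xor(), for
 * experimenting with compiler output.  xor_inline() and xor_unaligned() are
 * placeholder names; the kernel-specific IS_ENABLED() test is replaced by a
 * plain alignment check.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint8_t u8;

/* Stand-in for the out-of-line __crypto_xor_unaligned() slow path. */
static void xor_unaligned(u8 *dst, const u8 *src, unsigned int len)
{
	while (len--)
		*dst++ ^= *src++;
}

static inline void xor_inline(u8 *dst, const u8 *src, unsigned int len)
{
	/* Fast path: both pointers long-aligned, XOR a word at a time. */
	if ((((unsigned long)dst | (unsigned long)src) %
	     __alignof__(unsigned long)) == 0) {
		while (len >= sizeof(unsigned long)) {
			*(unsigned long *)dst ^= *(unsigned long *)src;
			dst += sizeof(unsigned long);
			src += sizeof(unsigned long);
			len -= sizeof(unsigned long);
		}
		while (len--)
			*dst++ ^= *src++;
		return;
	}
	xor_unaligned(dst, src, len);
}

int main(void)
{
	/* Constant length and alignment known to the compiler, as with a
	 * 16-byte IV; the word loop above can be fully unrolled here. */
	unsigned long a[16 / sizeof(unsigned long)];
	unsigned long b[16 / sizeof(unsigned long)];

	memset(a, 0xaa, sizeof(a));
	memset(b, 0xff, sizeof(b));
	xor_inline((u8 *)a, (const u8 *)b, 16);
	printf("first byte: %02x\n", ((u8 *)a)[0]);	/* expect 55 */
	return 0;
}

Compiling this with gcc -O2 and inspecting the assembly for main() should
show the 16-byte XOR collapsed into a couple of word-sized loads, XORs, and
stores, much like the x86_64 example above.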