mbox series

[v2,0/9] crypto: x86 - stop using the SIMD helper

Message ID 20250402002420.89233-1-ebiggers@kernel.org (mailing list archive)
Headers show
Series crypto: x86 - stop using the SIMD helper | expand

Message

Eric Biggers April 2, 2025, 12:24 a.m. UTC
Patches 2-9 are almost identical to
https://lore.kernel.org/r/20250220051325.340691-3-ebiggers@kernel.org/
but now split into multiple patches.  Patch 1 is just a resend of
https://lore.kernel.org/r/20250320220648.121990-1-ebiggers@kernel.org/
which is needed for the series to apply cleanly but is otherwise
unrelated.  Description of patches 2-9 follows:

Stop wrapping skcipher and aead algorithms with the crypto SIMD helper
(crypto/simd.c).  The only purpose of doing so was to work around x86
not always supporting kernel-mode FPU in softirqs.  Specifically, if a
hardirq interrupted a task context kernel-mode FPU section and then a
softirqs were run at the end of that hardirq, those softirqs could not
use kernel-mode FPU.  This has now been fixed.  In combination with the
fact that the skcipher and aead APIs only support task and softirq
contexts, these can now just use kernel-mode FPU unconditionally on x86.

This simplifies the code and improves performance.

En/decryption gets at least somewhat faster for everyone, since the
crypto API functions such as crypto_skcipher_encrypt() now go directly
to the underlying algorithm rather than taking a detour through
crypto/simd.c which involved an extra indirect call.  For example, on a
Ryzen 9 9950X desktop processor, AES-256-XTS is now 23% faster for
512-byte messages and 7% faster for 4096-byte messages (when accessed
through crypto_skcipher_encrypt() or crypto_skcipher_decrypt()).

There's also a much larger performance improvement for crypto API users
that only support synchronous algorithms.  These users will now actually
use the x86 SIMD (e.g. AES-NI or VAES) optimized en/decryption modes,
which they couldn't before because they were marked as asynchronous.

Eric Biggers (9):
  crypto: x86/aes - drop the avx10_256 AES-XTS and AES-CTR code
  crypto: x86/aegis - stop using the SIMD helper
  crypto: x86/aes - stop using the SIMD helper
  crypto: x86/aria - stop using the SIMD helper
  crypto: x86/camellia - stop using the SIMD helper
  crypto: x86/cast - stop using the SIMD helper
  crypto: x86/serpent - stop using the SIMD helper
  crypto: x86/sm4 - stop using the SIMD helper
  crypto: x86/twofish - stop using the SIMD helper

 arch/x86/crypto/Kconfig                    |  14 --
 arch/x86/crypto/aegis128-aesni-glue.c      |  13 +-
 arch/x86/crypto/aes-ctr-avx-x86_64.S       |  47 ++----
 arch/x86/crypto/aes-xts-avx-x86_64.S       | 118 ++++++--------
 arch/x86/crypto/aesni-intel_glue.c         | 174 ++++++++-------------
 arch/x86/crypto/aria_aesni_avx2_glue.c     |  22 +--
 arch/x86/crypto/aria_aesni_avx_glue.c      |  20 +--
 arch/x86/crypto/aria_gfni_avx512_glue.c    |  22 +--
 arch/x86/crypto/camellia_aesni_avx2_glue.c |  21 +--
 arch/x86/crypto/camellia_aesni_avx_glue.c  |  21 +--
 arch/x86/crypto/cast5_avx_glue.c           |  21 +--
 arch/x86/crypto/cast6_avx_glue.c           |  20 +--
 arch/x86/crypto/serpent_avx2_glue.c        |  21 +--
 arch/x86/crypto/serpent_avx_glue.c         |  21 +--
 arch/x86/crypto/serpent_sse2_glue.c        |  21 +--
 arch/x86/crypto/sm4_aesni_avx2_glue.c      |  31 ++--
 arch/x86/crypto/sm4_aesni_avx_glue.c       |  31 ++--
 arch/x86/crypto/twofish_avx_glue.c         |  21 +--
 18 files changed, 227 insertions(+), 432 deletions(-)


base-commit: 91e5bfe317d8f8471fbaa3e70cf66cae1314a516

Comments

Herbert Xu April 2, 2025, 3:14 a.m. UTC | #1
Eric Biggers <ebiggers@kernel.org> wrote:
>
> Stop wrapping skcipher and aead algorithms with the crypto simd helper
> (crypto/simd.c).  The only purpose of doing so was to work around x86
> not always supporting kernel-mode FPU in softirqs.  Specifically, if a
> hardirq interrupted a task context kernel-mode FPU section and then a
> softirqs were run at the end of that hardirq, those softirqs could not
> use kernel-mode FPU.  This has now been fixed.  In combination with the
> fact that the skcipher and aead APIs only support task and softirq
> contexts, these can now just use kernel-mode FPU unconditionally on x86.

Nice work!

So which platform still needs the simd wrapper? I believe arm/arm64
have both been fixed but we haven't finished removing the legacy
simd code yet? Ard, would you be able to spare some cycles and
finish the removal of simd on arm?

Darn, it looks like powerpc has just started using the simd wrapper
so we need to fix it first before we can completely eliminate simd.

Michael/Danny, any chance you guys could implement something similar
to what's been done on arm/x86 and make simd usable in softirqs?

Thanks,
Ard Biesheuvel April 2, 2025, 6:31 a.m. UTC | #2
On Wed, 2 Apr 2025 at 06:14, Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> Eric Biggers <ebiggers@kernel.org> wrote:
> >
> > Stop wrapping skcipher and aead algorithms with the crypto simd helper
> > (crypto/simd.c).  The only purpose of doing so was to work around x86
> > not always supporting kernel-mode FPU in softirqs.  Specifically, if a
> > hardirq interrupted a task context kernel-mode FPU section and then a
> > softirqs were run at the end of that hardirq, those softirqs could not
> > use kernel-mode FPU.  This has now been fixed.  In combination with the
> > fact that the skcipher and aead APIs only support task and softirq
> > contexts, these can now just use kernel-mode FPU unconditionally on x86.
>
> Nice work!
>

Yeah good riddance.

> So which platform still needs the simd wrapper? I believe arm/arm64
> have both been fixed but we haven't finished removing the legacy
> simd code yet? Ard, would you be able to spare some cycles and
> finish the removal of simd on arm?
>

Removal of what, exactly?
Ard Biesheuvel April 2, 2025, 6:34 a.m. UTC | #3
On Wed, 2 Apr 2025 at 09:31, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Wed, 2 Apr 2025 at 06:14, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> >
> > Eric Biggers <ebiggers@kernel.org> wrote:
> > >
> > > Stop wrapping skcipher and aead algorithms with the crypto simd helper
> > > (crypto/simd.c).  The only purpose of doing so was to work around x86
> > > not always supporting kernel-mode FPU in softirqs.  Specifically, if a
> > > hardirq interrupted a task context kernel-mode FPU section and then a
> > > softirqs were run at the end of that hardirq, those softirqs could not
> > > use kernel-mode FPU.  This has now been fixed.  In combination with the
> > > fact that the skcipher and aead APIs only support task and softirq
> > > contexts, these can now just use kernel-mode FPU unconditionally on x86.
> >
> > Nice work!
> >
>
> Yeah good riddance.
>
> > So which platform still needs the simd wrapper? I believe arm/arm64
> > have both been fixed but we haven't finished removing the legacy
> > simd code yet? Ard, would you be able to spare some cycles and
> > finish the removal of simd on arm?
> >
>
> Removal of what, exactly?

Ah, never mind - I see some calls on 32-bit ARM to
simd_skcipher_create_compat(), which have become redundant now that
SIMD is guaranteed to be available in softirq context.
Herbert Xu April 2, 2025, 8:22 a.m. UTC | #4
On Wed, Apr 02, 2025 at 09:34:30AM +0300, Ard Biesheuvel wrote:
>
> Ah, never mind - I see some calls on 32-bit ARM to
> simd_skcipher_create_compat(), which have become redundant now that
> SIMD is guaranteed to be available in softirq context.

Thanks!

We could also remove all the calls to crypto_simd_usable in the
Crypto API hashing code, e.g., arch/arm64/crypto/sha1-ce-glue.c.

For the lib/crypto code I think we should make it a rule to
not allow any hardirq usage just like the Crypto API.  Does
anyone know of any uses of lib/crypto in a hardirq?

I thought /dev/random might do that but it looks like Jason has
fixed it so that crypto code is no longer used in hardirqs:

commit e3e33fc2ea7fcefd0d761db9d6219f83b4248f5c
Author: Jason A. Donenfeld <Jason@zx2c4.com>
Date:   Fri May 6 18:30:51 2022 +0200

    random: do not use input pool from hard IRQs

Cheers,
Eric Biggers April 2, 2025, 5:19 p.m. UTC | #5
On Wed, Apr 02, 2025 at 04:22:21PM +0800, Herbert Xu wrote:
> On Wed, Apr 02, 2025 at 09:34:30AM +0300, Ard Biesheuvel wrote:
> >
> > Ah, never mind - I see some calls on 32-bit ARM to
> > simd_skcipher_create_compat(), which have become redundant now that
> > SIMD is guaranteed to be available in softirq context.
> 
> Thanks!
> 
> We could also remove all the calls to crypto_simd_usable in the
> Crypto API hashing code, e.g., arch/arm64/crypto/sha1-ce-glue.c.
> 
> For the lib/crypto code I think we should make it a rule to
> not allow any hardirq usage just like the Crypto API.  Does
> anyone know of any uses of lib/crypto in a hardirq?

This seems premature.  crypto_shash is documented to be usable in any context.
See the "Context:" comments in include/crypto/hash.h.  Similarly, developers
expect lib/ functions to be available in any context unless otherwise
documented.

For skcipher and aead, there are more reasons why it makes sense to limit the
contexts:

- skcipher_walk_first() already explicitly errors out if in_hardirq(), which
  already prevents them from working in hardirq context in most cases
- Even if it was allowed, the skcipher and aead APIs are already difficult to
  use correctly in a hardirq
- Because of how the crypto API is designed, it's not straightforward to fall
  back to generic skcipher and aead code in no-SIMD contexts

I could see the limitation being brought into crypto_shash too, though the
crypto_shash documentation will need to be updated.  The crypto API also really
needs to be explicitly checking all its requirements.  (It's probably finally
time to add a kconfig option like CONFIG_DEBUG_CRYPTO to lib/Kconfig.debug, and
put the extra assertions under there.  Then they could be added without
impacting performance for normal users.)

IMO, doing it for lib/ too would be going too far though.  The lib/ functions
should be easy to use and not have random requirements on the calling context.
And since they're just functions, it's easy for them to fall back to the generic
functions when needed.  Also note that for very short inputs it can actually be
faster to use no-SIMD code, as that avoids the overhead of a kernel-mode SIMD
section.  So the fallback sometimes exists anyway for that.

- Eric
Herbert Xu April 3, 2025, 1:25 a.m. UTC | #6
On Wed, Apr 02, 2025 at 10:19:30AM -0700, Eric Biggers wrote:
>
> This seems premature.  crypto_shash is documented to be usable in any context.
> See the "Context:" comments in include/crypto/hash.h.  Similarly, developers
> expect lib/ functions to be available in any context unless otherwise
> documented.

Doing slow computations in a hard IRQ is a bad idea.  The whole
point of a hard IRQ handler is to set a flag and defer everything
to a different context.

Please show me one good reason why we should allow crypto in
a hard IRQ.
 
> IMO, doing it for lib/ too would be going too far though.  The lib/ functions
> should be easy to use and not have random requirements on the calling context.
> And since they're just functions, it's easy for them to fall back to the generic
> functions when needed.  Also note that for very short inputs it can actually be
> faster to use no-SIMD code, as that avoids the overhead of a kernel-mode SIMD
> section.  So the fallback sometimes exists anyway for that.

We already disallow SIMD in hard IRQs anyway (may_use_simd is
always false in that context).  The only thing you could use
is the generic implementation.

So making this change in lib/crypto does not take any functionality
away.  You could still invoke the generic lib/crypto code directly.

It does mean that we take away a completely useless check for
people who are actually doing crypto because crypto work should
never be done in a hard IRQ.

Cheers,
Eric Biggers April 3, 2025, 2:14 a.m. UTC | #7
On Thu, Apr 03, 2025 at 09:25:37AM +0800, Herbert Xu wrote:
> On Wed, Apr 02, 2025 at 10:19:30AM -0700, Eric Biggers wrote:
> >
> > This seems premature.  crypto_shash is documented to be usable in any context.
> > See the "Context:" comments in include/crypto/hash.h.  Similarly, developers
> > expect lib/ functions to be available in any context unless otherwise
> > documented.
> 
> Doing slow computations in a hard IRQ is a bad idea.  The whole
> point of a hard IRQ handler is to set a flag and defer everything
> to a different context.
> 
> Please show me one good reason why we should allow crypto in
> a hard IRQ.
>  
> > IMO, doing it for lib/ too would be going too far though.  The lib/ functions
> > should be easy to use and not have random requirements on the calling context.
> > And since they're just functions, it's easy for them to fall back to the generic
> > functions when needed.  Also note that for very short inputs it can actually be
> > faster to use no-SIMD code, as that avoids the overhead of a kernel-mode SIMD
> > section.  So the fallback sometimes exists anyway for that.
> 
> We already disallow SIMD in hard IRQs anyway (may_use_simd is
> always false in that context).  The only thing you could use
> is the generic implementation.
> 
> So making this change in lib/crypto does not take any functionality
> away.  You could still invoke the generic lib/crypto code directly.
> 
> It does mean that we take away a completely useless check for
> people who are actually doing crypto because crypto work should
> never be done in a hard IRQ.

It's not the 90s anymore.  Crypto is fast now, and used ubiquitously.

And "crypto" doesn't necessarily mean a large operation.  It can be hashing just
a few bytes of data, for example.

Also as you know, the crypto API includes some non-cryptographic algorithms too.

BTW, x86 does allow SIMD in hardirq context in some cases.

Certainly agreed that crypto in hardirqs is something to be avoided in general,
though.

So maybe your proposal is okay, if it's done properly.

The thing I actually have more of a problem with is that you tend to start
making random API changes without any of the necessary prerequisites like
updating documentation, or adding debug assertions to catch violations of new
requirements.  You've already started removing the fallbacks from shash (commit
3846c01d42526bc31), but neither of those things have been done.  So we're
currently in a weird state where the shash API is explicitly documented to work
in all contexts, but you've broken that.

- Eric
Herbert Xu April 3, 2025, 2:56 a.m. UTC | #8
On Thu, Apr 03, 2025 at 02:14:53AM +0000, Eric Biggers wrote:
>
> It's not the 90s anymore.  Crypto is fast now, and used ubiquitously.

I have to say that you've done a great job in improving crypto
performance on x86 and I'm very pleased with being able to
encrypt 256 bytes in just over 100 CPU cycles and doing a
whole page takes less than 1000 cycles.

But this is only possible with SIMD instructions which we do not
support in hard IRQ context.

Cheers,
Eric Biggers April 3, 2025, 3:20 a.m. UTC | #9
On Thu, Apr 03, 2025 at 10:56:35AM +0800, Herbert Xu wrote:
> On Thu, Apr 03, 2025 at 02:14:53AM +0000, Eric Biggers wrote:
> >
> > It's not the 90s anymore.  Crypto is fast now, and used ubiquitously.
> 
> I have to say that you've done a great job in improving crypto
> performance on x86 and I'm very pleased with being able to
> encrypt 256 bytes in just over 100 CPU cycles and doing a
> whole page takes less than 1000 cycles.
> 
> But this is only possible with SIMD instructions which we do not
> support in hard IRQ context.
> 

What?  Take a look at siphash_1u32(), for example.  That is crypto, and it is
fast.  It doesn't use, or need to use, SIMD instructions.

Also, riscv has scalar AES instructions.  (They aren't used by the kernel yet,
but they could be.  The CRC code already uses scalar carryless multiplication.)

Obviously, it's also very common to really need the SIMD unit.  That's the way
it is.  But those are not all cases.

Also, as I said already, x86 does support SIMD instructions in hardirq context
in some cases.  Whether anyone actually uses that, I don't know, but it is
explicitly supported.  Check out irq_fpu_usable().

- Eric
Herbert Xu April 3, 2025, 3:42 a.m. UTC | #10
On Wed, Apr 02, 2025 at 08:20:08PM -0700, Eric Biggers wrote:
>
> Also, riscv has scalar AES instructions.  (They aren't used by the kernel yet,
> but they could be.  The CRC code already uses scalar carryless multiplication.)

It still doesn't mean that it's a good idea to use AES in a
hard IRQ handler, especially if the code is meant to be portable.

> Also, as I said already, x86 does support SIMD instructions in hardirq context
> in some cases.  Whether anyone actually uses that, I don't know, but it is
> explicitly supported.  Check out irq_fpu_usable().

This is more of an accident than some deliberate strategy of
supporting FPU usage in hard IRQs.  This test was initially
added for aesni:

commit 54b6a1bd5364aca95cd6ffae00f2b64c6511122c
Author: Ying Huang <huang.ying.caritas@gmail.com>
Date:   Sun Jan 18 16:28:34 2009 +1100

    crypto: aes-ni - Add support to Intel AES-NI instructions for x86_64 platform

It was then improved by:

Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Mon Feb 13 13:56:14 2012 -0800

    i387: make irq_fpu_usable() tests more robust
    
    Some code - especially the crypto layer - wants to use the x86
    FP/MMX/AVX register set in what may be interrupt (typically softirq)
    context.

At no point was there any intention of using this in a hardirq
context.

Until such a time when you have a valid application for using
lib/crypto code in a hardirq context, I don't think we should
be supporting that at the expense of real users who are in
process/softirq context only.

Cheers,
Eric Biggers April 3, 2025, 3:59 a.m. UTC | #11
On Thu, Apr 03, 2025 at 11:42:34AM +0800, Herbert Xu wrote:
> On Wed, Apr 02, 2025 at 08:20:08PM -0700, Eric Biggers wrote:
> >
> > Also, riscv has scalar AES instructions.  (They aren't used by the kernel yet,
> > but they could be.  The CRC code already uses scalar carryless multiplication.)
> 
> It still doesn't mean that it's a good idea to use AES in a
> hard IRQ handler, especially if the code is meant to be portable.
> 
> > Also, as I said already, x86 does support SIMD instructions in hardirq context
> > in some cases.  Whether anyone actually uses that, I don't know, but it is
> > explicitly supported.  Check out irq_fpu_usable().
> 
> This is more of an accident than some deliberate strategy of
> supporting FPU usage in hard IRQs.  This test was initially
> added for aesni:
> 
> commit 54b6a1bd5364aca95cd6ffae00f2b64c6511122c
> Author: Ying Huang <huang.ying.caritas@gmail.com>
> Date:   Sun Jan 18 16:28:34 2009 +1100
> 
>     crypto: aes-ni - Add support to Intel AES-NI instructions for x86_64 platform
> 
> It was then improved by:
> 
> Author: Linus Torvalds <torvalds@linux-foundation.org>
> Date:   Mon Feb 13 13:56:14 2012 -0800
> 
>     i387: make irq_fpu_usable() tests more robust
>     
>     Some code - especially the crypto layer - wants to use the x86
>     FP/MMX/AVX register set in what may be interrupt (typically softirq)
>     context.
> 
> At no point was there any intention of using this in a hardirq
> context.
> 
> Until such a time when you have a valid application for using
> lib/crypto code in a hardirq context, I don't think we should
> be supporting that at the expense of real users who are in
> process/softirq context only.

Whatever.  We agree that "crypto in hardirq" is not a good idea in general.  I'm
just pointing out that there are certain cases, like SipHash used in a hash
table, where it easily could happen and would be fine.  And all the shash and
crypto library functions currently work in any context, unlike e.g. skcipher and
aead which do not.  You seem to be trying to claim that it was never supported,
but that is incorrect.  Making it unsupported would be a change that needs to be
properly documented (the functions would no longer be simply "Any context")
*and* have proper debug assertions added to enforce it and prevent usage errors.
But in a lot of cases there is also no reason to even add that restriction.  I'm
not sure why you're so eager to make the library functions harder to use.

- Eric
Ard Biesheuvel April 3, 2025, 7:03 a.m. UTC | #12
On Thu, 3 Apr 2025 at 06:59, Eric Biggers <ebiggers@kernel.org> wrote:
>
> On Thu, Apr 03, 2025 at 11:42:34AM +0800, Herbert Xu wrote:
> > On Wed, Apr 02, 2025 at 08:20:08PM -0700, Eric Biggers wrote:
> > >
> > > Also, riscv has scalar AES instructions.  (They aren't used by the kernel yet,
> > > but they could be.  The CRC code already uses scalar carryless multiplication.)
> >
> > It still doesn't mean that it's a good idea to use AES in a
> > hard IRQ handler, especially if the code is meant to be portable.
> >
> > > Also, as I said already, x86 does support SIMD instructions in hardirq context
> > > in some cases.  Whether anyone actually uses that, I don't know, but it is
> > > explicitly supported.  Check out irq_fpu_usable().
> >
> > This is more of an accident than some deliberate strategy of
> > supporting FPU usage in hard IRQs.  This test was initially
> > added for aesni:
> >
> > commit 54b6a1bd5364aca95cd6ffae00f2b64c6511122c
> > Author: Ying Huang <huang.ying.caritas@gmail.com>
> > Date:   Sun Jan 18 16:28:34 2009 +1100
> >
> >     crypto: aes-ni - Add support to Intel AES-NI instructions for x86_64 platform
> >
> > It was then improved by:
> >
> > Author: Linus Torvalds <torvalds@linux-foundation.org>
> > Date:   Mon Feb 13 13:56:14 2012 -0800
> >
> >     i387: make irq_fpu_usable() tests more robust
> >
> >     Some code - especially the crypto layer - wants to use the x86
> >     FP/MMX/AVX register set in what may be interrupt (typically softirq)
> >     context.
> >
> > At no point was there any intention of using this in a hardirq
> > context.
> >
> > Until such a time when you have a valid application for using
> > lib/crypto code in a hardirq context, I don't think we should
> > be supporting that at the expense of real users who are in
> > process/softirq context only.
>
> Whatever.  We agree that "crypto in hardirq" is not a good idea in general.  I'm
> just pointing out that there are certain cases, like SipHash used in a hash
> table, where it easily could happen and would be fine.  And all the shash and
> crypto library functions currently work in any context, unlike e.g. skcipher and
> aead which do not.  You seem to be trying to claim that it was never supported,
> but that is incorrect.  Making it unsupported would be a change that needs to be
> properly documented (the functions would no longer be simply "Any context")
> *and* have proper debug assertions added to enforce it and prevent usage errors.
> But in a lot of cases there is also no reason to even add that restriction.  I'm
> not sure why you're so eager to make the library functions harder to use.
>

Agree with Eric.

There may be cases where some error condition (machine check etc) is
hit while running in hard IRQ context or with IRQs disabled, and the
code that produces the diagnostic, writes to pstore, generates the QR
code for  etc etc may actually be where the library calls to crc32 etc
originate from. So pedantically disallowing that rather than falling
back to a non-SIMD code path make things worse, because now, the
original diagnostic may get lost while the only information left to
debug the issue is an OOPS complaining about a library call in hard
IRQ context.

So while I agree that knowingly invoking library interfaces with IRQs
disabled should be avoided, that is just a variation on the general
adage that IRQs should only be disabled when absolutely necessary. But
that necessity may derive from a condition that exists one or several
layers up.