mbox series

[0/3] crypto: crct10dif assembly cleanup and optimizations

Message ID 20190128085045.6504-1-ebiggers@kernel.org (mailing list archive)
Headers show
Series crypto: crct10dif assembly cleanup and optimizations | expand

Message

Eric Biggers Jan. 28, 2019, 8:50 a.m. UTC
The x86, arm, and arm64 asm implementations of crct10dif are very
difficult to understand partly because many of the comments, labels, and
macros are named incorrectly: the lengths mentioned are usually off by a
factor of two from the actual code.  Many other things are unnecessarily
convoluted as well, e.g. there are many more fold constants than
actually needed and some aren't fully reduced.

This series therefore cleans up all these implementations to be much
more maintainable.  I also made some small optimizations where I saw
opportunities, resulting in slightly better performance.

This is based on top of the pending patches from Ard Biesheuvel.

Eric Biggers (3):
  crypto: x86/crct10dif-pcl - cleanup and optimizations
  crypto: arm/crct10dif-ce - cleanup and optimizations
  crypto: arm64/crct10dif-ce - cleanup and optimizations

 arch/arm/crypto/crct10dif-ce-core.S     | 546 +++++++--------
 arch/arm/crypto/crct10dif-ce-glue.c     |   2 +-
 arch/arm64/crypto/crct10dif-ce-core.S   | 496 +++++++-------
 arch/arm64/crypto/crct10dif-ce-glue.c   |   4 +-
 arch/x86/crypto/crct10dif-pcl-asm_64.S  | 848 +++++++++---------------
 arch/x86/crypto/crct10dif-pclmul_glue.c |   3 +-
 6 files changed, 794 insertions(+), 1105 deletions(-)

Comments

Ard Biesheuvel Jan. 28, 2019, 9:47 a.m. UTC | #1
On Mon, 28 Jan 2019 at 09:57, Eric Biggers <ebiggers@kernel.org> wrote:
>
> The x86, arm, and arm64 asm implementations of crct10dif are very
> difficult to understand partly because many of the comments, labels, and
> macros are named incorrectly: the lengths mentioned are usually off by a
> factor of two from the actual code.  Many other things are unnecessarily
> convoluted as well, e.g. there are many more fold constants than
> actually needed and some aren't fully reduced.
>
> This series therefore cleans up all these implementations to be much
> more maintainable.  I also made some small optimizations where I saw
> opportunities, resulting in slightly better performance.
>

Wonderful, thanks for doing this Eric.

When I ported the code from x86, I noticed that the comments were out
of sync with the code, but I wasn't aware of the other issues at all.

Assuming the new code passes your extended tests,

Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

for the series, but with one minor nit for the ARM version: could we
please move the literal pool to .rodata as we did for the arm64
version?



> This is based on top of the pending patches from Ard Biesheuvel.
>
> Eric Biggers (3):
>   crypto: x86/crct10dif-pcl - cleanup and optimizations
>   crypto: arm/crct10dif-ce - cleanup and optimizations
>   crypto: arm64/crct10dif-ce - cleanup and optimizations
>
>  arch/arm/crypto/crct10dif-ce-core.S     | 546 +++++++--------
>  arch/arm/crypto/crct10dif-ce-glue.c     |   2 +-
>  arch/arm64/crypto/crct10dif-ce-core.S   | 496 +++++++-------
>  arch/arm64/crypto/crct10dif-ce-glue.c   |   4 +-
>  arch/x86/crypto/crct10dif-pcl-asm_64.S  | 848 +++++++++---------------
>  arch/x86/crypto/crct10dif-pclmul_glue.c |   3 +-
>  6 files changed, 794 insertions(+), 1105 deletions(-)
>
> --
> 2.20.1
>