[v4,0/2] xor: enable auto-vectorization in Clang

Message ID 20220205152346.237392-1-ardb@kernel.org (mailing list archive)

Message

Ard Biesheuvel Feb. 5, 2022, 3:23 p.m. UTC
Update the xor_blocks() prototypes so that the compiler understands that
the inputs always refer to distinct regions of memory. This is implied
by the existing implementations, as they use different granularities for
the load/xor/store loops.
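
As a minimal sketch of the idea (not the code from this series), a
two-source XOR routine with a no-alias promise could look like the block
below. The function name xor_2_sketch and its exact signature are
hypothetical, and the sketch uses the standard C99 'restrict' keyword;
the kernel prototypes may spell the qualifier differently.

```c
#include <stddef.h>

/*
 * Hypothetical two-source XOR helper, for illustration only.
 * 'restrict' promises the compiler that p1 and p2 never overlap, so it
 * is free to load several words from each buffer before storing any
 * result, which is what a SIMD loop needs to do.
 */
void xor_2_sketch(size_t bytes,
                  unsigned long *restrict p1,
                  const unsigned long *restrict p2)
{
    size_t words = bytes / sizeof(unsigned long);

    /* XOR the second buffer into the first, one word at a time. */
    for (size_t i = 0; i < words; i++)
        p1[i] ^= p2[i];
}
```

Without the qualifier, the compiler has to assume that a store to p1[i]
might change a later p2[j], and so it either gives up on vectorizing the
loop or emits a runtime overlap check first.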

With that, we can fix the ARM/Clang version, which refuses to SIMD
vectorize otherwise, and throws a spurious warning related to the GCC
version being incompatible.

Changes since v3:
- revert broken PPC argument rename - doing it fully results in too
  much pointless churn, and the 'inner' altivec routines are not
  strictly part of the xor_blocks API anyway

Changes since v2:
- fix arm64 build after rebase
- name PPC arguments consistently
- add Nick's acks and link tags

Changes since v1:
- fix PPC build
- add Nathan's Tested-by

Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>

Ard Biesheuvel (2):
  lib/xor: make xor prototypes more friendly to compiler vectorization
  crypto: arm/xor - make vectorized C code Clang-friendly

 arch/alpha/include/asm/xor.h           | 53 ++++++++----
 arch/arm/include/asm/xor.h             | 42 ++++++----
 arch/arm/lib/xor-neon.c                | 12 +--
 arch/arm64/include/asm/xor.h           | 21 +++--
 arch/arm64/lib/xor-neon.c              | 46 +++++++----
 arch/ia64/include/asm/xor.h            | 21 +++--
 arch/powerpc/include/asm/xor_altivec.h | 25 +++---
 arch/powerpc/lib/xor_vmx.c             | 28 ++++---
 arch/powerpc/lib/xor_vmx.h             | 27 ++++---
 arch/powerpc/lib/xor_vmx_glue.c        | 32 ++++----
 arch/s390/lib/xor.c                    | 21 +++--
 arch/sparc/include/asm/xor_32.h        | 21 +++--
 arch/sparc/include/asm/xor_64.h        | 42 ++++++----
 arch/x86/include/asm/xor.h             | 42 ++++++----
 arch/x86/include/asm/xor_32.h          | 42 ++++++----
 arch/x86/include/asm/xor_avx.h         | 21 +++--
 include/asm-generic/xor.h              | 84 +++++++++++++-------
 include/linux/raid/xor.h               | 21 +++--
 18 files changed, 384 insertions(+), 217 deletions(-)

Comments

Herbert Xu Feb. 11, 2022, 9:37 a.m. UTC | #1
All applied.  Thanks.