[RFC,V2,4/5] crypto: aesni - AES CTR x86_64 "by16" AVX512 optimization

Introduce a "by16" implementation of AES CTR mode using AVX512
optimizations. "by16" means that 16 independent 128-bit blocks can be
encrypted simultaneously, compared with 8 blocks in the current "by8"
implementation.
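In CTR mode each 128-bit counter block is encrypted with AES and the result is XORed with the data. The keystream for block i therefore depends only on the starting counter plus i, and that independence is what lets 16 blocks be processed at once. The sketch below assumes a 128-bit big-endian counter, as used by the kernel's generic ctr template, and omits AES itself. It only prepares one batch of 16 counter blocks, and the helper names (ctr_inc_be128, ctr_make_batch) are hypothetical, not taken from the patch.

```c
#include <stdint.h>
#include <string.h>

#define AES_BLOCK_SIZE 16
#define CTR_BATCH      16	/* "by16": counter blocks prepared per batch */

/* Increment a 128-bit big-endian counter by one, propagating the carry. */
static void ctr_inc_be128(uint8_t ctr[AES_BLOCK_SIZE])
{
	for (int i = AES_BLOCK_SIZE - 1; i >= 0; i--) {
		if (++ctr[i] != 0)
			break;	/* no carry out of this byte, stop */
	}
}

/*
 * Copy CTR_BATCH consecutive counter values, starting at 'ctr', into 'out'
 * and leave 'ctr' pointing just past them. Each block depends only on the
 * starting counter, so the whole batch can be encrypted in parallel.
 */
static void ctr_make_batch(uint8_t ctr[AES_BLOCK_SIZE],
			   uint8_t out[CTR_BATCH][AES_BLOCK_SIZE])
{
	for (int i = 0; i < CTR_BATCH; i++) {
		memcpy(out[i], ctr, AES_BLOCK_SIZE);
		ctr_inc_be128(ctr);
	}
}
```

A full implementation would then encrypt the batch and XOR it with 16 × 16 = 256 bytes of input per iteration.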

The glue code in the AESNI module overrides the existing "by8" CTR mode
encryption/decryption routines with the "by16" ones when the following
criteria are met:
At compile time:
1. CONFIG_CRYPTO_AVX512 is enabled.
2. The toolchain (assembler) supports VAES instructions.
At runtime:
1. The VAES and AVX512VL features are supported by the platform
   (currently only Icelake).
2. aesni_intel.use_avx512 is set at boot time, whether the driver is
   built in or compiled as a loadable module.
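The selection rules above reduce to a single conjunction. The following is a hypothetical model, not the patch's glue code: the enum and function names are invented, and the compile-time conditions, which in the kernel would be preprocessor checks, are passed in as booleans so the logic can be exercised directly. In the kernel the CPU checks would presumably use boot_cpu_has(X86_FEATURE_VAES) and boot_cpu_has(X86_FEATURE_AVX512VL).

```c
#include <stdbool.h>

/* Hypothetical identifiers for the two CTR code paths. */
enum ctr_impl { CTR_IMPL_BY8, CTR_IMPL_BY16 };

/*
 * Hypothetical model of the selection criteria.
 * Compile-time inputs:
 *   kconfig_avx512 - CONFIG_CRYPTO_AVX512 is enabled
 *   as_has_vaes    - the assembler supports VAES instructions
 * Runtime inputs:
 *   cpu_vaes, cpu_avx512vl - CPU feature flags
 *   use_avx512             - value of aesni_intel.use_avx512
 */
static enum ctr_impl select_ctr_impl(bool kconfig_avx512, bool as_has_vaes,
				     bool cpu_vaes, bool cpu_avx512vl,
				     bool use_avx512)
{
	/* Without compile-time support the by16 code is not built at all. */
	if (!kconfig_avx512 || !as_has_vaes)
		return CTR_IMPL_BY8;

	/* Every runtime condition must hold to override the by8 routines. */
	if (cpu_vaes && cpu_avx512vl && use_avx512)
		return CTR_IMPL_BY16;

	return CTR_IMPL_BY8;
}
```

Keeping the opt-in parameter as the last check means a capable CPU still stays on "by8" unless the user asks for AVX512.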

The functions aes_ctr_enc_128_avx512_by16(), aes_ctr_enc_192_avx512_by16()
and aes_ctr_enc_256_avx512_by16() are adapted from the Intel Optimized
IPSEC Cryptographic library.
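The three entry points correspond to the three AES key sizes, which use 10, 12 and 14 rounds respectively (FIPS-197), so a caller has to pick one per request from the key length. The table and lookup below are a hypothetical illustration of that mapping. The struct and helper names are invented, and the real prototypes of the assembler routines are not shown in this message, so only their names are recorded.

```c
#include <stddef.h>

/* Hypothetical descriptor: which by16 routine handles a given key size. */
struct ctr_by16_variant {
	unsigned int key_len;	/* AES key length in bytes */
	unsigned int rounds;	/* AES rounds for that key length */
	const char *routine;	/* routine named in this patch */
};

static const struct ctr_by16_variant ctr_by16_variants[] = {
	{ 16, 10, "aes_ctr_enc_128_avx512_by16" },
	{ 24, 12, "aes_ctr_enc_192_avx512_by16" },
	{ 32, 14, "aes_ctr_enc_256_avx512_by16" },
};

/* Return the variant for key_len, or NULL for an invalid AES key length. */
static const struct ctr_by16_variant *ctr_by16_lookup(unsigned int key_len)
{
	size_t n = sizeof(ctr_by16_variants) / sizeof(ctr_by16_variants[0]);

	for (size_t i = 0; i < n; i++)
		if (ctr_by16_variants[i].key_len == key_len)
			return &ctr_by16_variants[i];
	return NULL;
}
```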

On an Icelake desktop, with turbo disabled and all CPUs running at maximum
frequency, the "by16" CTR mode optimization performs better than "by8" for
most data and key sizes, as measured by tcrypt.

The average performance change of the "by16" version relative to the
"by8" version is as follows.
For all key sizes (128/192/256 bits):
        data sizes < 128 bytes/block: negligible difference (~3% loss)
        data sizes > 128 bytes/block: an average improvement of 48% for
                                      both encryption and decryption
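These averages can be cross-checked against the percentage column of the single tcrypt run tabulated in this message. The sketch copies those enc/dec values, grouped into the 16/64-byte rows and the 256-byte-and-larger rows, and takes a plain arithmetic mean per column. The array and function names are illustrative only.

```c
#include <stddef.h>

/* enc/dec percentage gain (+) or loss (-), 16- and 64-byte rows */
static const double small_rows[][2] = {
	{ -7.7, -2.5 }, { -5.6, 7.1 },		/* 128-bit key */
	{ -6.7, -11.5 }, { 3.2, -2.0 },		/* 192-bit key */
	{ -7.9, -12.3 }, { 3.8, 7.5 },		/* 256-bit key */
};

/* enc/dec percentage gain, 256-, 1024-, 1472- and 8192-byte rows */
static const double large_rows[][2] = {
	{ 36.3, 35.9 }, { 50.1, 50.4 }, { 51.9, 53.6 }, { 56.2, 56.4 },	/* 128 */
	{ 35.9, 35.5 }, { 49.6, 50.7 }, { 52.7, 52.3 }, { 43.9, 43.4 },	/* 192 */
	{ 37.9, 29.7 }, { 50.7, 51.1 }, { 51.9, 52.4 }, { 54.7, 54.7 },	/* 256 */
};

/* Arithmetic mean of column col (0 = encryption, 1 = decryption). */
static double mean_col(const double rows[][2], size_t n, int col)
{
	double sum = 0.0;

	for (size_t i = 0; i < n; i++)
		sum += rows[i][col];
	return sum / (double)n;
}
```

For this run the means come to about 47.7% (encryption) and 47.2% (decryption) for the larger sizes, and about -3.5% and -2.3% for the smaller ones, close to the ~48% and ~3% figures quoted above.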

A typical run of tcrypt with AES CTR mode encryption/decryption of the
"by8" and "by16" optimizations on an Icelake desktop shows the following
results:
--------------------------------------------------------------
|  key   | bytes | cycles/op (lower is better)| percentage   |
| length |  per  |  encryption  |  decryption |  loss/gain   |
| (bits) | block |-------------------------------------------|
|        |       | by8  | by16  | by8  | by16 |  enc | dec   |
|------------------------------------------------------------|
|  128   |  16   | 156  | 168   | 164  | 168  | -7.7 |  -2.5 |
|  128   |  64   | 180  | 190   | 157  | 146  | -5.6 |   7.1 |
|  128   |  256  | 248  | 158   | 251  | 161  | 36.3 |  35.9 |
|  128   |  1024 | 633  | 316   | 642  | 319  | 50.1 |  50.4 |
|  128   |  1472 | 853  | 411   | 877  | 407  | 51.9 |  53.6 |
|  128   |  8192 | 4463 | 1959  | 4447 | 1940 | 56.2 |  56.4 |
|  192   |  16   | 136  | 145   | 149  | 166  | -6.7 | -11.5 |
|  192   |  64   | 159  | 154   | 157  | 160  |  3.2 |  -2   |
|  192   |  256  | 268  | 172   | 274  | 177  | 35.9 |  35.5 |
|  192   |  1024 | 710  | 358   | 720  | 355  | 49.6 |  50.7 |
|  192   |  1472 | 989  | 468   | 983  | 469  | 52.7 |  52.3 |
|  192   |  8192 | 6326 | 3551  | 6301 | 3567 | 43.9 |  43.4 |
|  256   |  16   | 153  | 165   | 139  | 156  | -7.9 | -12.3 |
|  256   |  64   | 158  | 152   | 174  | 161  |  3.8 |   7.5 |
|  256   |  256  | 283  | 176   | 287  | 202  | 37.9 |  29.7 |
|  256   |  1024 | 797  | 393   | 807  | 395  | 50.7 |  51.1 |
|  256   |  1472 | 1108 | 534   | 1107 | 527  | 51.9 |  52.4 |
|  256   |  8192 | 5763 | 2616  | 5773 | 2617 | 54.7 |  54.7 |
--------------------------------------------------------------
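The percentage column appears to be the relative reduction in cycles per operation, with positive values meaning "by16" is faster. For example, the 128-bit, 1024-byte encryption row gives (633 - 316) / 633 ≈ 50.1%. A one-line helper (hypothetical name) expresses the formula:

```c
/*
 * Percentage change from by8 to by16 in cycles/op: positive means by16
 * needs fewer cycles (a gain), negative means it needs more (a loss).
 */
static double pct_gain(double by8_cycles, double by16_cycles)
{
	return (by8_cycles - by16_cycles) / by8_cycles * 100.0;
}
```

For instance, pct_gain(156, 168) evaluates to about -7.7, matching the 128-bit, 16-byte encryption row.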

This work was inspired by the AES CTR mode optimization published
in the Intel Optimized IPSEC Cryptographic library:
https://github.com/intel/intel-ipsec-mb/blob/master/lib/avx512/cntr_vaes_avx512.asm

Co-developed-by: Tomasz Kantecki <tomasz.kantecki@intel.com>
Signed-off-by: Tomasz Kantecki <tomasz.kantecki@intel.com>
Signed-off-by: Megha Dey <megha.dey@intel.com>
---
 arch/x86/crypto/Makefile                    |   1 +
 arch/x86/crypto/aes_avx512_common.S         | 341 ++++++++++
 arch/x86/crypto/aes_ctrby16_avx512-x86_64.S | 955 ++++++++++++++++++++++++++++
 arch/x86/crypto/aesni-intel_glue.c          |  42 +-
 arch/x86/include/asm/disabled-features.h    |   8 +-
 crypto/Kconfig                              |  12 +
 6 files changed, 1357 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/crypto/aes_avx512_common.S
 create mode 100644 arch/x86/crypto/aes_ctrby16_avx512-x86_64.S

Message-Id: <1611386920-28579-5-git-send-email-megha.dey@intel.com>
State: RFC
Delegated to: Herbert Xu
From: Megha Dey <megha.dey@intel.com>
To: linux-crypto@vger.kernel.org, herbert@gondor.apana.org.au, davem@davemloft.net
Cc: ravi.v.shankar@intel.com, tim.c.chen@intel.com, andi.kleen@intel.com, dave.hansen@intel.com, megha.dey@intel.com, greg.b.tucker@intel.com, robert.a.kasten@intel.com, rajendrakumar.chinnaiyan@intel.com, tomasz.kantecki@intel.com, ryan.d.saffores@intel.com, ilya.albrekht@intel.com, kyung.min.park@intel.com, tony.luck@intel.com, ira.weiny@intel.com, ebiggers@kernel.org, ardb@kernel.org
Date: Fri, 22 Jan 2021 23:28:39 -0800
In-Reply-To: <1611386920-28579-1-git-send-email-megha.dey@intel.com>
Series: Introduce AVX512 optimized crypto algorithms
  [RFC,V2,0/5] Introduce AVX512 optimized crypto algorithms
  [RFC,V2,1/5] crypto: aesni - fix coding style for if/else block
  [RFC,V2,2/5] x86: Probe assembler capabilities for VAES and VPLCMULQDQ support
  [RFC,V2,3/5] crypto: crct10dif - Accelerated CRC T10 DIF with vectorized instruction
  [RFC,V2,4/5] crypto: aesni - AES CTR x86_64 "by16" AVX512 optimization
  [RFC,V2,5/5] crypto: aesni - AVX512 version of AESNI-GCM using VPCLMULQDQ
