[RFC,V2,5/5] crypto: aesni - AVX512 version of AESNI-GCM using VPCLMULQDQ

Introduce the AVX512 implementation that optimizes the AESNI-GCM encode
and decode routines using VPCLMULQDQ.

The glue code in AESNI module overrides the existing AVX2 GCM mode
encryption/decryption routines with the AX512 AES GCM mode ones when the
following criteria are met:
At compile time:
1. CONFIG_CRYPTO_AVX512 is enabled
2. toolchain(assembler) supports VPCLMULQDQ instructions
At runtime:
1. VPCLMULQDQ and AVX512VL features are supported on a platform
   (currently only Icelake)
2. If compiled as built-in or loadable module, aesni_intel.use_avx512
   is set at boot time

The functions aesni_gcm_init_avx_512, aesni_gcm_enc_update_avx_512,
aesni_gcm_dec_update_avx_512 and aesni_gcm_finalize_avx_512 are adapted
from the Intel Optimized IPSEC Cryptographic library.

On a Icelake desktop, with turbo disabled and all CPUs running at
maximum frequency, the AVX512 GCM mode optimization shows better
performance across data & key sizes as measured by tcrypt.

The average performance improvement of the AVX512 version over the AVX2
version is as follows:
For all key sizes(128/192/256 bits),
        data sizes < 128 bytes/block, negligible improvement (~7.5%)
        data sizes > 128 bytes/block, there is an average improvement of
        40% for both encryption and decryption.

A typical run of tcrypt with AES GCM mode encryption/decryption of the
AVX2 and AVX512 optimization on a Icelake desktop shows the following
results:

  ----------------------------------------------------------------------
  |   key  | bytes | cycles/op (lower is better)   | Percentage gain/  |
  | length |   per |   encryption  |  decryption   |      loss         |
  | (bits) | block |-------------------------------|-------------------|
  |        |       | avx2 | avx512 | avx2 | avx512 | Encrypt | Decrypt |
  |---------------------------------------------------------------------
  |  128   | 16    | 689  |  701   | 689  |  707   |  -1.7   |  -2.61  |
  |  128   | 64    | 731  |  660   | 771  |  649   |   9.7   |  15.82  |
  |  128   | 256   | 911  |  750   | 900  |  721   |  17.67  |  19.88  |
  |  128   | 512   | 1181 |  814   | 1161 |  782   |  31.07  |  32.64  |
  |  128   | 1024  | 1676 |  1052  | 1685 |  1030  |  37.23  |  38.87  |
  |  128   | 2048  | 2475 |  1447  | 2456 |  1419  |  41.53  |  42.22  |
  |  128   | 4096  | 3806 |  2154  | 3820 |  2119  |  43.41  |  44.53  |
  |  128   | 8192  | 9169 |  3806  | 6997 |  3718  |  58.49  |  46.86  |
  |  192   | 16    | 754  |  683   | 737  |  672   |   9.42  |   8.82  |
  |  192   | 64    | 735  |  686   | 715  |  640   |   6.66  |  10.49  |
  |  192   | 256   | 949  |  738   | 2435 |  729   |  22.23  |  70     |
  |  192   | 512   | 1235 |  854   | 1200 |  833   |  30.85  |  30.58  |
  |  192   | 1024  | 1777 |  1084  | 1763 |  1051  |  38.99  |  40.39  |
  |  192   | 2048  | 2574 |  1497  | 2592 |  1459  |  41.84  |  43.71  |
  |  192   | 4096  | 4086 |  2317  | 4091 |  2244  |  43.29  |  45.14  |
  |  192   | 8192  | 7481 |  4054  | 7505 |  3953  |  45.81  |  47.32  |
  |  256   | 16    | 755  |  682   | 720  |  683   |   9.68  |   5.14  |
  |  256   | 64    | 744  |  677   | 719  |  658   |   9     |   8.48  |
  |  256   | 256   | 962  |  758   | 948  |  749   |  21.21  |  21     |
  |  256   | 512   | 1297 |  862   | 1276 |  836   |  33.54  |  34.48  |
  |  256   | 1024  | 1831 |  1114  | 1819 |  1095  |  39.16  |  39.8   |
  |  256   | 2048  | 2767 |  1566  | 2715 |  1524  |  43.4   |  43.87  |
  |  256   | 4096  | 4378 |  2382  | 4368 |  2354  |  45.6   |  46.11  |
  |  256   | 8192  | 8075 |  4262  | 8080 |  4186  |  47.22  |  48.19  |
  ----------------------------------------------------------------------

This work was inspired by the AES GCM mode optimization published in
Intel Optimized IPSEC Cryptographic library.
https://github.com/intel/intel-ipsec-mb/blob/master/lib/avx512/gcm_vaes_avx512.asm

Co-developed-by: Tomasz Kantecki <tomasz.kantecki@intel.com>
Signed-off-by: Tomasz Kantecki <tomasz.kantecki@intel.com>
Signed-off-by: Megha Dey <megha.dey@intel.com>
---
 arch/x86/crypto/Makefile                    |    1 +
 arch/x86/crypto/aesni-intel_avx512-x86_64.S | 3078 +++++++++++++++++++++++++++
 arch/x86/crypto/aesni-intel_glue.c          |   96 +-
 crypto/Kconfig                              |   12 +
 4 files changed, 3179 insertions(+), 8 deletions(-)
 create mode 100644 arch/x86/crypto/aesni-intel_avx512-x86_64.S

Message ID	1611386920-28579-6-git-send-email-megha.dey@intel.com (mailing list archive)
State	RFC
Delegated to:	Herbert Xu
Headers	show Return-Path: <linux-crypto-owner@kernel.org> IronPort-SDR: Sd5WMAkD+knBmoTjbA0e+dheRh/dW9HiK62FDDgZXu80dXKDtqO1gjxMx3rwDeLsLz16KycpBv SYBetz5LWOsw== IronPort-SDR: nx3G3H9/2oF7yCP0pdzBzKi8rnMWQnJLNgC0RjRlE5Ri9czXuu8WwllWdNu0S9JkCrrnHeUABp KqPEFSxKPemw== From: Megha Dey <megha.dey@intel.com> To: linux-crypto@vger.kernel.org, herbert@gondor.apana.org.au, davem@davemloft.net Cc: ravi.v.shankar@intel.com, tim.c.chen@intel.com, andi.kleen@intel.com, dave.hansen@intel.com, megha.dey@intel.com, greg.b.tucker@intel.com, robert.a.kasten@intel.com, rajendrakumar.chinnaiyan@intel.com, tomasz.kantecki@intel.com, ryan.d.saffores@intel.com, ilya.albrekht@intel.com, kyung.min.park@intel.com, tony.luck@intel.com, ira.weiny@intel.com, ebiggers@kernel.org, ardb@kernel.org Subject: [RFC V2 5/5] crypto: aesni - AVX512 version of AESNI-GCM using VPCLMULQDQ Date: Fri, 22 Jan 2021 23:28:40 -0800 Message-Id: <1611386920-28579-6-git-send-email-megha.dey@intel.com> In-Reply-To: <1611386920-28579-1-git-send-email-megha.dey@intel.com> References: <1611386920-28579-1-git-send-email-megha.dey@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	Introduce AVX512 optimized crypto algorithms \| expand [RFC,V2,0/5] Introduce AVX512 optimized crypto algorithms [RFC,V2,1/5] crypto: aesni - fix coding style for if/else block [RFC,V2,2/5] x86: Probe assembler capabilities for VAES and VPLCMULQDQ support [RFC,V2,3/5] crypto: crct10dif - Accelerated CRC T10 DIF with vectorized instruction [RFC,V2,4/5] crypto: aesni - AES CTR x86_64 "by16" AVX512 optimization [RFC,V2,5/5] crypto: aesni - AVX512 version of AESNI-GCM using VPCLMULQDQ

[RFC,V2,5/5] crypto: aesni - AVX512 version of AESNI-GCM using VPCLMULQDQ

Commit Message

Patch