[v2,00/11] CRC64 library rework and x86 CRC optimization

Message ID 20250130035130.180676-1-ebiggers@kernel.org (mailing list archive)

Message

Eric Biggers Jan. 30, 2025, 3:51 a.m. UTC
This patchset applies to commit 72deda0abee6e7 and is also available at:

    git fetch https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git crc-x86-v2

This is the next major set of CRC library improvements, targeting 6.15.

Patches 1-5 rework the CRC64 library along the lines of what I did for
CRC32 and CRC-T10DIF in 6.14.  They add direct support for
architecture-specific optimizations, fix the naming of the NVME CRC64
variant, and eliminate a pointless use of the crypto API.

Patches 6-10 replace the existing x86 PCLMULQDQ optimized CRC code with
new code that is shared among the different CRC variants and also adds
VPCLMULQDQ support, greatly improving performance on recent CPUs.
Patch 11 wires up the same optimization to crc64_be() and crc64_nvme()
(a.k.a. the old "crc64_rocksoft") which previously were unoptimized,
improving the performance of those CRC functions by as much as 100x.
crc64_be is used by bcachefs, and crc64_nvme is used by blk-integrity.
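For readers unfamiliar with the two CRC64 variants named above, the following is a minimal bit-at-a-time reference sketch, not the code from this series. The names crc64_be_ref and crc64_nvme_ref are hypothetical. The sketch assumes the commonly catalogued parameters: the ECMA-182 polynomial, processed MSB-first with no inversion, for crc64_be, and the bit-reflected NVMe polynomial, with the CRC inverted on entry and exit, for crc64_nvme. Slow references of this kind are what optimized implementations are typically checked against.

```python
MASK64 = (1 << 64) - 1

# Assumed parameters (commonly catalogued values, not taken from this series):
CRC64_ECMA_POLY = 0x42F0E1EBA9EA3693            # MSB-first generator polynomial
CRC64_NVME_POLY_REFLECTED = 0x9A6C9329AC4BC9B5  # bit-reflected NVMe polynomial


def crc64_be_ref(crc: int, data: bytes) -> int:
    """MSB-first CRC64, one bit at a time, with no pre- or post-inversion."""
    for byte in data:
        crc ^= byte << 56                  # feed the byte into the top 8 bits
        for _ in range(8):
            if crc & (1 << 63):            # top bit set: shift, then reduce
                crc = ((crc << 1) ^ CRC64_ECMA_POLY) & MASK64
            else:
                crc = (crc << 1) & MASK64
    return crc


def crc64_nvme_ref(crc: int, data: bytes) -> int:
    """LSB-first (reflected) CRC64; the CRC is inverted on entry and on exit."""
    crc ^= MASK64
    for byte in data:
        crc ^= byte                        # feed the byte into the low 8 bits
        for _ in range(8):
            if crc & 1:                    # low bit set: shift, then reduce
                crc = (crc >> 1) ^ CRC64_NVME_POLY_REFLECTED
            else:
                crc >>= 1
    return crc ^ MASK64


if __name__ == "__main__":
    msg = b"123456789"
    print(f"crc64_be_ref:   {crc64_be_ref(0, msg):#018x}")
    print(f"crc64_nvme_ref: {crc64_nvme_ref(0, msg):#018x}")
```

Both functions take the previous CRC as their first argument, so a buffer can be processed in pieces: passing the result for the first piece as the starting CRC for the second gives the same value as a single call over the whole buffer.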

Eric Biggers (11):
  lib/crc64-rocksoft: stop wrapping the crypto API
  crypto: crc64-rocksoft - remove from crypto API
  lib/crc64: rename CRC64-Rocksoft to CRC64-NVME
  lib/crc_kunit.c: add test and benchmark for CRC64-NVME
  lib/crc64: add support for arch-optimized implementations
  x86: move ZMM exclusion list into CPU feature flag
  scripts/gen-crc-consts: add gen-crc-consts.py
  x86/crc: add "template" for [V]PCLMULQDQ based CRC functions
  x86/crc32: implement crc32_le using new template
  x86/crc-t10dif: implement crc_t10dif using new template
  x86/crc64: implement crc64_be and crc64_nvme using new template

 MAINTAINERS                         |   1 +
 arch/x86/Kconfig                    |   3 +-
 arch/x86/crypto/aesni-intel_glue.c  |  22 +-
 arch/x86/include/asm/cpufeatures.h  |   1 +
 arch/x86/kernel/cpu/intel.c         |  22 ++
 arch/x86/lib/Makefile               |   5 +-
 arch/x86/lib/crc-pclmul-consts.h    | 195 ++++++++++
 arch/x86/lib/crc-pclmul-template.S  | 578 ++++++++++++++++++++++++++++
 arch/x86/lib/crc-pclmul-template.h  |  81 ++++
 arch/x86/lib/crc-t10dif-glue.c      |  23 +-
 arch/x86/lib/crc16-msb-pclmul.S     |   6 +
 arch/x86/lib/crc32-glue.c           |  37 +-
 arch/x86/lib/crc32-pclmul.S         | 219 +----------
 arch/x86/lib/crc64-glue.c           |  50 +++
 arch/x86/lib/crc64-pclmul.S         |   7 +
 arch/x86/lib/crct10dif-pcl-asm_64.S | 332 ----------------
 block/Kconfig                       |   2 +-
 block/t10-pi.c                      |   2 +-
 crypto/Kconfig                      |  11 -
 crypto/Makefile                     |   1 -
 crypto/crc64_rocksoft_generic.c     |  89 -----
 crypto/testmgr.c                    |   7 -
 crypto/testmgr.h                    |  12 -
 include/linux/crc64.h               |  38 +-
 lib/Kconfig                         |  16 +-
 lib/Makefile                        |   1 -
 lib/crc64-rocksoft.c                | 126 ------
 lib/crc64.c                         |  49 +--
 lib/crc_kunit.c                     |  30 +-
 lib/gen_crc64table.c                |  10 +-
 scripts/gen-crc-consts.py           | 214 ++++++++++
 31 files changed, 1270 insertions(+), 920 deletions(-)
 create mode 100644 arch/x86/lib/crc-pclmul-consts.h
 create mode 100644 arch/x86/lib/crc-pclmul-template.S
 create mode 100644 arch/x86/lib/crc-pclmul-template.h
 create mode 100644 arch/x86/lib/crc16-msb-pclmul.S
 create mode 100644 arch/x86/lib/crc64-glue.c
 create mode 100644 arch/x86/lib/crc64-pclmul.S
 delete mode 100644 arch/x86/lib/crct10dif-pcl-asm_64.S
 delete mode 100644 crypto/crc64_rocksoft_generic.c
 delete mode 100644 lib/crc64-rocksoft.c
 create mode 100755 scripts/gen-crc-consts.py


base-commit: 72deda0abee6e705ae71a93f69f55e33be5bca5c

Comments

Keith Busch Jan. 30, 2025, 3:09 p.m. UTC | #1
On Wed, Jan 29, 2025 at 07:51:19PM -0800, Eric Biggers wrote:
> Patch 11 wires up the same optimization to crc64_be() and crc64_nvme()
> (a.k.a. the old "crc64_rocksoft") which previously were unoptimized,

Yes, I mistakenly thought the name of the table in the spec was the
name of the CRC, but it only referred to the format in which the
parameters were displayed. It really should have been crc64_nvme, so
thank you for clearing up the naming confusion.

> improving the performance of those CRC functions by as much as 100x.

Awesome!

Looks good:

Acked-by: Keith Busch <kbusch@kernel.org>

Martin K. Petersen Jan. 30, 2025, 3:20 p.m. UTC | #2
Eric,

> Patches 1-5 rework the CRC64 library along the lines of what I did for
> CRC32 and CRC-T10DIF in 6.14. They add direct support for
> architecture-specific optimizations, fix the naming of the NVME CRC64
> variant, and eliminate a pointless use of the crypto API.
>
> Patches 6-10 replace the existing x86 PCLMULQDQ optimized CRC code
> with new code that is shared among the different CRC variants and also
> adds VPCLMULQDQ support, greatly improving performance on recent CPUs.
> Patch 11 wires up the same optimization to crc64_be() and crc64_nvme()
> (a.k.a. the old "crc64_rocksoft") which previously were unoptimized,
> improving the performance of those CRC functions by as much as 100x.
> crc64_be is used by bcachefs, and crc64_nvme is used by blk-integrity.

Very nice!

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>