[v2,0/5] NVMe PCI endpoint function driver

Message ID 20241011121951.90019-1-dlemoal@kernel.org (mailing list archive)

Damien Le Moal Oct. 11, 2024, 12:19 p.m. UTC
This patch series adds an NVMe PCI endpoint driver that implements a
PCIe NVMe controller for a local NVMe fabrics host controller.
This series is based on the improved PCI endpoint API patches "Improve
PCI memory mapping API" (see [1]).

The first 3 patches of this series change the NVMe target and fabrics
code to facilitate reusing the NVMe code and an NVMe host controller
from other (non-NVMe) drivers.

Patch 4 is the main patch, which introduces the NVMe PCI endpoint
driver. The commit message of this patch provides an overview of the
driver design and operation.

Finally, patch 5 adds documentation files that document the NVMe PCI
endpoint function driver internals, provide a user guide explaining how
to set up an NVMe PCI endpoint device, and describe the NVMe endpoint
function driver binding attributes.

This driver has been extensively tested using a Radxa Rock5B board
(rk3588 Arm SoC). Some tests have also been done using a Pine Rockpro64
board (however, this board does not support PCI DMA, leading to very
poor performance).

Using the Radxa Rock5B board and setting up a 4 queue-pair controller
with a null-blk block device loop target, performance was measured
using fio as follows:

 +----------------------------------+------------------------+
 | Workload                         | IOPS (BW)              |
 +----------------------------------+------------------------+
 | Rand read, 4KB, QD=1, 1 job      | 7382 IOPS              |
 | Rand read, 4KB, QD=32, 1 job     | 45.5k IOPS             |
 | Rand read, 4KB, QD=32, 4 jobs    | 49.7k IOPS             |
 | Rand read, 128KB, QD=32, 1 job   | 10.0k IOPS (1.31 GB/s) |
 | Rand read, 128KB, QD=32, 4 jobs  | 10.2k IOPS (1.33 GB/s) |
 | Seq read, 128KB, QD=32, 1 job    | 1.28 GB/s              |
 | Seq read, 512KB, QD=32, 1 job    | 1.28 GB/s              |
 | Rand write, 128KB, QD=32, 1 job  | 8713 IOPS (1.14 GB/s)  |
 | Rand write, 128KB, QD=32, 4 jobs | 8103 IOPS (1.06 GB/s)  |
 | Seq write, 128KB, QD=32, 1 job   | 8557 IOPS (1.12 GB/s)  |
 | Seq write, 512KB, QD=32, 1 job   | 2069 IOPS (1.08 GB/s)  |
 +----------------------------------+------------------------+
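
As a cross-check of the bandwidth column, the reported figures follow
from BW = IOPS x block size. The short sketch below redoes that
arithmetic. It assumes (this is not stated above) that block sizes are
binary (1 KB = 1024 bytes) and that GB/s is decimal (1 GB = 10^9
bytes); the helper name bandwidth_gbps() is purely illustrative. The
"10.0k IOPS" row is taken as 10000 IOPS, so that value is approximate.

```python
# Cross-check of the table's bandwidth column: BW = IOPS * block size.
# Assumptions (not stated in the cover letter): 1 KB = 1024 bytes for
# block sizes and 1 GB = 10**9 bytes for the reported bandwidth.

def bandwidth_gbps(iops: float, block_kb: int) -> float:
    """Return the bandwidth in GB/s for an IOPS value and a block size in KB."""
    return iops * block_kb * 1024 / 1e9

# (workload, IOPS, block size in KB, bandwidth reported in the table)
ROWS = [
    ("Rand read, 128KB, QD=32, 1 job", 10000, 128, 1.31),  # 10.0k, rounded
    ("Rand write, 128KB, QD=32, 1 job", 8713, 128, 1.14),
    ("Rand write, 128KB, QD=32, 4 jobs", 8103, 128, 1.06),
    ("Seq write, 128KB, QD=32, 1 job", 8557, 128, 1.12),
    ("Seq write, 512KB, QD=32, 1 job", 2069, 512, 1.08),
]

for name, iops, block_kb, reported in ROWS:
    computed = bandwidth_gbps(iops, block_kb)
    print(f"{name}: {computed:.2f} GB/s (table: {reported:.2f} GB/s)")
```

Under these assumptions, the computed values agree with the table to
two decimal places for the rows listed.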

These results use the NVMe endpoint driver's default MDTS of 128 KB.
Setting up the NVMe endpoint device with a larger MDTS of 512 KB
improves the maximum throughput to up to 2.4 GB/s (e.g. for the 512 KB
random read and sequential read workloads). The maximum IOPS achieved
with this larger MDTS does not change significantly.

This driver is not intended for production use but rather to be a
playground for learning NVMe and NVMe over fabrics and exploring/testing
new NVMe features while providing reasonably good performance.

[1] https://lore.kernel.org/linux-pci/20241007040319.157412-1-dlemoal@kernel.org/T/#t

Changes from v1:
 - Added review tag to patch 1
 - Modified patch 4 to:
   - Add Rick's copyright notice
   - Improve admin command handling (set_features command) to handle the
     number of queues feature (among others) to enable Windows hosts
   - Improve SQ and CQ work item handling

Damien Le Moal (5):
  nvmet: rename and move nvmet_get_log_page_len()
  nvmef: export nvmef_create_ctrl()
  nvmef: Introduce the NVME_OPT_HIDDEN_NS option
  PCI: endpoint: Add NVMe endpoint function driver
  PCI: endpoint: Document the NVMe endpoint function driver

 .../endpoint/function/binding/pci-nvme.rst    |   34 +
 Documentation/PCI/endpoint/index.rst          |    3 +
 .../PCI/endpoint/pci-nvme-function.rst        |  151 +
 Documentation/PCI/endpoint/pci-nvme-howto.rst |  189 ++
 MAINTAINERS                                   |    9 +
 drivers/nvme/host/core.c                      |   17 +-
 drivers/nvme/host/fabrics.c                   |   11 +-
 drivers/nvme/host/fabrics.h                   |    5 +
 drivers/nvme/target/admin-cmd.c               |   20 +-
 drivers/nvme/target/discovery.c               |    4 +-
 drivers/nvme/target/nvmet.h                   |    3 -
 drivers/pci/endpoint/functions/Kconfig        |    9 +
 drivers/pci/endpoint/functions/Makefile       |    1 +
 drivers/pci/endpoint/functions/pci-epf-nvme.c | 2591 +++++++++++++++++
 include/linux/nvme.h                          |   19 +
 15 files changed, 3036 insertions(+), 30 deletions(-)
 create mode 100644 Documentation/PCI/endpoint/function/binding/pci-nvme.rst
 create mode 100644 Documentation/PCI/endpoint/pci-nvme-function.rst
 create mode 100644 Documentation/PCI/endpoint/pci-nvme-howto.rst
 create mode 100644 drivers/pci/endpoint/functions/pci-epf-nvme.c