[v2,4/5] PCI: endpoint: Add NVMe endpoint function driver

From: Damien Le Moal <damien.lemoal@opensource.wdc.com>

Add a Linux PCI Endpoint function driver that implements a PCIe NVMe
controller for a local NVMe fabrics host controller. The NVMe endpoint
function driver relies as much as possible on the NVMe fabrics driver
for executing NVMe commands received from the host, to minimize NVMe
command parsing. However, some admin commands must be modified to
satisfy the constraints of the PCI transport specification (e.g. queue
management commands support and the optional SGL support).

The NVMe endpoint function driver is set up as follows:
1) Upon binding of the endpoint function driver to the endpoint
   controller (pci_epf_nvme_bind()), the function driver sets up BAR 0
   for the NVMe PCI controller with enough doorbell space to accommodate
   up to PCI_EPF_NVME_MAX_NR_QUEUES (16) queue pairs. The DMA channels
   that will be used to exchange data with the host over the PCI link
   are also initialized.
2) The endpoint function driver then creates the NVMe host fabrics
   controller using nvmef_create_ctrl() (called from
   pci_epf_nvme_create_ctrl()), which connects the host controller to
   its target (e.g. a loop target with a file or block device or a TCP
   remote target).
3) Once the PCI link status is detected to be up, the endpoint
   controller initializes IRQ management and BAR 0 content to advertise
   its capabilities. The capabilities of the fabrics controller are
   mostly used unmodified (pci_epf_nvme_init_ctrl_regs()). With that,
   the endpoint controller starts a delayed work to poll the BAR 0
   registers to detect changes to the CC register.
4) When the PCI host enables the controller, pci_epf_nvme_enable_ctrl()
   is called to create the admin submission and completion queues and
   start the fabrics controller with nvme_start_ctrl(). The endpoint
   controller then starts a delayed work to poll the admin submission
   queue doorbell to detect commands from the PCI host.
5) Admin commands received from the PCI host are retrieved from the
   admin queue by mapping the queue memory to PCI memory space, copying
   the command locally using a struct pci_epf_nvme_cmd, and processing
   the command using pci_epf_nvme_process_admin_cmd().
6) I/O commands are similarly handled: each I/O submission queue uses a
   delayed work to poll the queue doorbell and upon detection of a
   command being issued by the host, the I/O command is copied locally
   and processed using pci_epf_nvme_process_io_cmd().
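The doorbell space and doorbell polling mentioned in the steps above can
be sketched as follows. This is a minimal user-space illustration, not
the driver code: the doorbell offsets follow the NVMe base specification
(doorbells start at BAR 0 offset 0x1000, spaced by 4 << CAP.DSTRD), while
all ex_* names are hypothetical, and whether the 16 queue pairs include
the admin queue pair is an assumption left open here.

```c
#include <stdint.h>

/* Doorbell registers start at offset 0x1000 of the NVMe register space. */
#define EX_NVME_REG_DBS		0x1000u

/* Byte stride between two consecutive doorbells for a CAP.DSTRD value. */
static inline uint32_t ex_db_stride(uint32_t dstrd)
{
	return 4u << dstrd;
}

/* BAR 0 offset of the tail doorbell of submission queue qid. */
static inline uint32_t ex_sq_tail_db(uint16_t qid, uint32_t dstrd)
{
	return EX_NVME_REG_DBS + (2u * qid) * ex_db_stride(dstrd);
}

/* BAR 0 offset of the head doorbell of completion queue qid. */
static inline uint32_t ex_cq_head_db(uint16_t qid, uint32_t dstrd)
{
	return EX_NVME_REG_DBS + (2u * qid + 1u) * ex_db_stride(dstrd);
}

/* Minimum register space needed to expose nr_qpairs queue pairs. */
static inline uint32_t ex_min_reg_size(uint32_t nr_qpairs, uint32_t dstrd)
{
	return EX_NVME_REG_DBS + 2u * nr_qpairs * ex_db_stride(dstrd);
}

/*
 * Number of commands the host queued since the last poll: the distance
 * from the locally tracked head to the tail value the host wrote to the
 * doorbell, accounting for wrap-around of the circular queue.
 */
static inline uint32_t ex_sq_pending(uint32_t head, uint32_t tail,
				     uint32_t depth)
{
	return (tail + depth - head) % depth;
}
```

With CAP.DSTRD set to 0, ex_min_reg_size(16, 0) gives 0x1080 bytes, and a
polling work would treat a non-zero ex_sq_pending() result as new
commands to fetch from the submission queue.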

I/O and admin commands are processed as follows:
1) A minimal parsing of the command is done to determine the command
   buffer size and data transfer direction. The command processing then
   continues using a command work scheduled using a per queue-pair
   high-priority workqueue (pci_epf_nvme_exec_cmd_work()).
2) The command execution work calls pci_epf_nvme_exec_cmd() which will
   retrieve and parse the command PRPs to determine the PCI address
   location of the command buffer segments, and retrieve the command
   data if the command is a write command. The command is then executed
   using the host fabrics controller by calling
   __nvme_submit_sync_cmd(). Once done, pci_epf_nvme_complete_cmd() is
   called to complete the command, after having transferred the command
   data back to the PCI host in the case of a read command.
3) pci_epf_nvme_complete_cmd() queues the command in a completion list
   for the completion queue of the command and schedules the queue
   completion work which will batch CQ entry transfers to the PCI host
   with the completion queue memory mapped to the host PCI address of
   the completion queue.
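Posting a completion queue entry, as in step 3 above, can be sketched as
follows. The 16-byte entry layout and the phase tag rule (bit 0 of the
status word, inverted on each pass through the queue) come from the NVMe
base specification. The ex_* names are hypothetical, a plain array stands
in for the mapped host completion queue memory, and the queue-full check
against the host head doorbell is deliberately left out.

```c
#include <stdint.h>

/* NVMe completion queue entry (16 bytes). */
struct ex_cqe {
	uint32_t result;	/* DW0: command specific result */
	uint32_t rsvd;		/* DW1 */
	uint16_t sq_head;	/* current head of the submission queue */
	uint16_t sq_id;		/* submission queue of the command */
	uint16_t command_id;	/* identifier of the completed command */
	uint16_t status;	/* bit 0: phase tag, bits 15:1: status */
};

struct ex_cq {
	struct ex_cqe *entries;	/* stands in for the mapped host CQ memory */
	uint16_t depth;
	uint16_t tail;
	uint8_t phase;		/* phase tag of the current pass */
};

static void ex_cq_init(struct ex_cq *cq, struct ex_cqe *mem, uint16_t depth)
{
	cq->entries = mem;
	cq->depth = depth;
	cq->tail = 0;
	cq->phase = 1;		/* the first pass uses phase tag 1 */
}

/* Write one entry at the tail and advance, flipping the phase on wrap. */
static void ex_cq_post(struct ex_cq *cq, uint16_t sq_id, uint16_t sq_head,
		       uint16_t command_id, uint16_t status)
{
	struct ex_cqe *cqe = &cq->entries[cq->tail];

	cqe->result = 0;
	cqe->rsvd = 0;
	cqe->sq_head = sq_head;
	cqe->sq_id = sq_id;
	cqe->command_id = command_id;
	cqe->status = (uint16_t)((status << 1) | cq->phase);

	if (++cq->tail == cq->depth) {
		cq->tail = 0;
		cq->phase ^= 1;
	}
}
```

The host detects new entries by comparing the phase tag of the entry at
its head with the phase it expects, which is why batching several entries
before raising an interrupt remains safe.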

With this processing, most of the command parsing and handling is left
to the NVMe fabrics code. The only NVMe-specific parsing implemented in
the endpoint driver is the command PRP parsing. Of note is that the
current code does not support SGL (this capability is thus not
advertised).
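To illustrate what PRP parsing involves, here is a minimal sketch that
turns the PRP1/PRP2 pair of a command into a list of PCI address
segments, following the PRP rules of the NVMe base specification. It is
not the driver implementation: the ex_* names are hypothetical, the PRP
list is assumed to have already been copied from host memory into
prp_list, and chained PRP lists (a list page whose last entry points to
another list page) are not handled.

```c
#include <stddef.h>
#include <stdint.h>

struct ex_seg {
	uint64_t pci_addr;	/* host PCI address of the segment */
	size_t len;		/* segment length in bytes */
};

/*
 * Fill segs with the buffer segments of a command transferring xfer_len
 * bytes. mps is the memory page size (a power of two). Returns the
 * number of segments, or -1 if the PRPs are invalid or do not fit.
 */
static int ex_parse_prps(uint64_t prp1, uint64_t prp2,
			 const uint64_t *prp_list, size_t nr_list,
			 size_t xfer_len, size_t mps,
			 struct ex_seg *segs, size_t max_segs)
{
	size_t len, i, n = 0;

	if (!xfer_len || !max_segs)
		return -1;

	/* PRP1 may start inside a page: it covers up to the page end. */
	len = mps - (size_t)(prp1 & (mps - 1));
	if (len > xfer_len)
		len = xfer_len;
	segs[n].pci_addr = prp1;
	segs[n].len = len;
	n++;
	xfer_len -= len;
	if (!xfer_len)
		return (int)n;

	if (xfer_len <= mps) {
		/* PRP2 is a data pointer and must be page aligned. */
		if ((prp2 & (mps - 1)) || n == max_segs)
			return -1;
		segs[n].pci_addr = prp2;
		segs[n].len = xfer_len;
		n++;
		return (int)n;
	}

	/* PRP2 points to a PRP list: one page-aligned entry per page. */
	for (i = 0; xfer_len; i++) {
		if (i == nr_list || n == max_segs ||
		    (prp_list[i] & (mps - 1)))
			return -1;
		len = xfer_len < mps ? xfer_len : mps;
		segs[n].pci_addr = prp_list[i];
		segs[n].len = len;
		n++;
		xfer_len -= len;
	}

	return (int)n;
}
```

For example, a 4096-byte transfer with PRP1 at 0x1200 and 4 KB pages
yields two segments: 0xe00 bytes at 0x1200 and the remaining 0x200 bytes
at the address given by PRP2.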

For data transfers, the endpoint driver relies by default on the DMA RX
and TX channels of the hardware endpoint PCI controller. If no DMA
channels are available, the NVMe endpoint function driver falls back to
using MMIO, which degrades performance significantly but keeps the
function working.
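The selection between the two transfer paths can be pictured with the
following sketch. It is a user-space stand-in only: the ex_* names are
hypothetical, the DMA hook is a placeholder for a real DMA channel
submission, and memcpy() stands in for the MMIO copy helpers the kernel
would use on mapped PCI memory.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical DMA hook: returns 0 on success, negative on error. */
typedef int (*ex_dma_fn)(void *dst, const void *src, size_t len);

struct ex_xfer_stats {
	unsigned int dma;	/* transfers done with DMA */
	unsigned int mmio;	/* transfers done with the MMIO fallback */
};

/* Placeholder DMA implementation used only to exercise the sketch. */
static int ex_fake_dma(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
	return 0;
}

/*
 * Use DMA when a channel is available and DMA is enabled, else fall
 * back to a CPU copy. A DMA error is reported, not retried with MMIO.
 */
static int ex_transfer(ex_dma_fn dma, bool dma_enable, void *dst,
		       const void *src, size_t len,
		       struct ex_xfer_stats *stats)
{
	if (dma && dma_enable) {
		int ret = dma(dst, src, len);

		if (!ret)
			stats->dma++;
		return ret;
	}

	memcpy(dst, src, len);
	stats->mmio++;
	return 0;
}
```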

The BAR register polling work also monitors for controller-disable
events (e.g. the PCI host reboots or shuts down). Such events trigger
calls to pci_epf_nvme_disable_ctrl(), which drains, cleans up and
destroys the local queue pairs.
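Detecting enable and disable events from a polled CC value boils down to
comparing the previous and current register contents. The sketch below
assumes only the CC bit layout of the NVMe base specification (CC.EN is
bit 0, CC.SHN is bits 15:14). The ex_* names are hypothetical, and
reporting a shutdown notification as a separate event is a choice of this
sketch, not a statement about how the driver handles it.

```c
#include <stdint.h>

#define EX_CC_EN	(1u << 0)	/* CC.EN: controller enable */
#define EX_CC_SHN_MASK	(3u << 14)	/* CC.SHN: shutdown notification */

enum ex_ctrl_event {
	EX_EV_NONE,
	EX_EV_ENABLE,		/* CC.EN went from 0 to 1 */
	EX_EV_DISABLE,		/* CC.EN went from 1 to 0 */
	EX_EV_SHUTDOWN,		/* CC.SHN went from 0 to non-zero */
};

/* Classify the change between two successive reads of the CC register. */
static enum ex_ctrl_event ex_cc_event(uint32_t old_cc, uint32_t new_cc)
{
	if (!(old_cc & EX_CC_EN) && (new_cc & EX_CC_EN))
		return EX_EV_ENABLE;
	if ((old_cc & EX_CC_EN) && !(new_cc & EX_CC_EN))
		return EX_EV_DISABLE;
	if (!(old_cc & EX_CC_SHN_MASK) && (new_cc & EX_CC_SHN_MASK))
		return EX_EV_SHUTDOWN;
	return EX_EV_NONE;
}
```

A polling work would keep the last CC value it saw, call a helper like
this on each run, and then enable or tear down the controller according
to the returned event.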

The configuration and enablement of this NVMe endpoint function driver
can be fully controlled using configfs, once an NVMe fabrics target is
also set up. The available configfs parameters are:
 - ctrl_opts: Fabrics controller connection arguments, as formatted for
   the nvme cli "connect" command.
 - dma_enable: Enable (default) or disable DMA data transfers.
 - mdts_kb: Change the maximum data transfer size (default: 128 KB).
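As background for mdts_kb: the MDTS field of the Identify Controller data
is not a byte count but a power of two in units of the minimum memory
page size (CAP.MPSMIN), so a size in KB has to be converted. The helper
below is a hypothetical sketch of that conversion, not the driver code,
and assumes the page size is passed in bytes.

```c
#include <stdint.h>

/*
 * Convert a maximum data transfer size in KB to an MDTS value n such
 * that the limit is (2^n * mps_bytes). Returns -1 if the size is not a
 * power-of-two multiple of the page size. A result of 0 is rejected
 * because MDTS == 0 means "no limit" in the NVMe specification.
 */
static int ex_mdts_from_kb(uint32_t mdts_kb, uint32_t mps_bytes)
{
	uint64_t bytes = (uint64_t)mdts_kb * 1024u;
	uint64_t pages;
	int mdts = 0;

	if (!mps_bytes || bytes % mps_bytes)
		return -1;

	pages = bytes / mps_bytes;
	if (pages < 2 || (pages & (pages - 1)))
		return -1;

	while (pages > 1) {
		pages >>= 1;
		mdts++;
	}

	return mdts;
}
```

With the default of 128 KB and a 4 KB minimum page size, this gives
128 KB / 4 KB = 32 pages, that is MDTS = 5.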

Early versions of this driver code were based on an RFC submission by
Alan Mikhak <alan.mikhak@sifive.com> (https://lwn.net/Articles/804369/).
The code however has since been completely rewritten.

Co-developed-by: Rick Wertenbroek <rick.wertenbroek@gmail.com>
Signed-off-by: Rick Wertenbroek <rick.wertenbroek@gmail.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
---
 MAINTAINERS                                   |    7 +
 drivers/pci/endpoint/functions/Kconfig        |    9 +
 drivers/pci/endpoint/functions/Makefile       |    1 +
 drivers/pci/endpoint/functions/pci-epf-nvme.c | 2591 +++++++++++++++++
 4 files changed, 2608 insertions(+)
 create mode 100644 drivers/pci/endpoint/functions/pci-epf-nvme.c

Message ID	20241011121951.90019-5-dlemoal@kernel.org (mailing list archive)
State	New
Delegated to:	Manivannan Sadhasivam
Headers	From: Damien Le Moal <dlemoal@kernel.org>
	To: linux-nvme@lists.infradead.org, Keith Busch <kbusch@kernel.org>, Christoph Hellwig <hch@lst.de>, Sagi Grimberg <sagi@grimberg.me>, Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>, Krzysztof Wilczyński <kw@linux.com>, Kishon Vijay Abraham I <kishon@kernel.org>, Bjorn Helgaas <bhelgaas@google.com>, Lorenzo Pieralisi <lpieralisi@kernel.org>, linux-pci@vger.kernel.org
	Cc: Rick Wertenbroek <rick.wertenbroek@gmail.com>, Niklas Cassel <cassel@kernel.org>
	Subject: [PATCH v2 4/5] PCI: endpoint: Add NVMe endpoint function driver
	Date: Fri, 11 Oct 2024 21:19:50 +0900
	Message-ID: <20241011121951.90019-5-dlemoal@kernel.org>
	In-Reply-To: <20241011121951.90019-1-dlemoal@kernel.org>
	References: <20241011121951.90019-1-dlemoal@kernel.org>
	X-Mailing-List: linux-pci@vger.kernel.org
Series	NVMe PCI endpoint function driver
	[v2,0/5] NVMe PCI endpoint function driver
	[v2,1/5] nvmet: rename and move nvmet_get_log_page_len()
	[v2,2/5] nvmef: export nvmef_create_ctrl()
	[v2,3/5] nvmef: Introduce the NVME_OPT_HIDDEN_NS option
	[v2,4/5] PCI: endpoint: Add NVMe endpoint function driver
	[v2,5/5] PCI: endpoint: Document the NVMe endpoint function driver