[RFC,v1,18/18] nvme-pci: use new dma API

From: Chaitanya Kulkarni <kch@nvidia.com>

Introduce a new structure, iod_dma_map, to hold the DMA mapping for each
I/O. This includes the iova state and the mapped addresses returned by
dma_link_range() or dma_map_page_attrs(). Replace the existing sg_table
in nvme_iod with struct dma_map. The size of struct nvme_iod changes as
follows :-

struct nvme_iod with struct sg_table :- 184
struct nvme_iod with struct dma_map  :- 176

In nvme_map_data(), allocate dma_map from mempool and iova using
dma_alloc_iova(). Obtain the memory type from the first bvec of the
first bio of the request and use that to decide whether we want to use
iova or not. In the newly added function nvme_rq_dma_map(), perform DMA
mapping for the bvec pages using nvme_dma_link_page(). Additionally,
if NVMe SGL is provided, build SGL entry inline while creating this
mapping to avoid extra traversal.
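
The single-pass idea above (map a segment and emit its SGL entry in the
same loop iteration) can be modelled in plain userspace C. This is only a
sketch under stated assumptions, not the driver code: struct seg,
model_map() and map_and_build_sgl() are hypothetical names, the "mapping"
is simulated arithmetic rather than a call into the DMA API, and
endianness conversion is omitted. The descriptor layout follows the
16-byte NVMe SGL data block descriptor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* NVMe SGL data block descriptor: 8-byte address, 4-byte length,
 * 3 reserved bytes, 1 identifier byte (type in the upper nibble). */
struct sgl_desc {
	uint64_t addr;
	uint32_t length;
	uint8_t  rsvd[3];
	uint8_t  type;
};

#define SGL_FMT_DATA_DESC 0x00

/* Hypothetical stand-in for one bvec. */
struct seg {
	uint64_t phys;	/* simulated physical address */
	uint32_t len;	/* length in bytes */
};

/* Hypothetical model of the two mapping paths: with an IOVA range the
 * segment lands at iova_base + offset, otherwise it is mapped 1:1. */
static uint64_t model_map(const struct seg *s, int use_iova,
			  uint64_t iova_base, uint64_t offset)
{
	return use_iova ? iova_base + offset : s->phys;
}

/* Map each segment and fill its descriptor in the same iteration, so
 * no second traversal is needed. Returns the descriptor count. */
static size_t map_and_build_sgl(const struct seg *segs, size_t nsegs,
				int use_iova, uint64_t iova_base,
				struct sgl_desc *out, size_t out_cap)
{
	uint64_t offset = 0;
	size_t i;

	assert(nsegs <= out_cap);
	for (i = 0; i < nsegs; i++) {
		out[i].addr = model_map(&segs[i], use_iova, iova_base, offset);
		out[i].length = segs[i].len;
		out[i].rsvd[0] = out[i].rsvd[1] = out[i].rsvd[2] = 0;
		out[i].type = SGL_FMT_DATA_DESC << 4;
		offset += segs[i].len;
	}
	return nsegs;
}
```

In this model the IOVA path packs segments back to back in one range,
while the non-IOVA path keeps each segment's own address.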

Call nvme_rq_dma_map() from nvme_pci_setup_prps() and
nvme_pci_setup_sgls(). For the NVMe SGL case, nvme_rq_dma_map() handles
building the SGL inline. To build PRPs, use iod->dma_map->dma_link_address
in nvme_pci_setup_prps() and increment the counter appropriately to
retrieve the next set of DMA addresses.
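
As a rough illustration of walking a flat array of mapped addresses to
produce PRP entries, here is a userspace sketch. It is not the code in
this patch: fill_prps() and its parameters are hypothetical, the
controller page size is assumed to be 4 KiB, and it assumes the usual PRP
constraint that only the first mapping may start at a non-zero page
offset and only the last may end off a page boundary. It also ignores
the PRP1/PRP2/PRP-list split and just emits the entries in order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CTRL_PAGE_SIZE 4096u	/* assumed controller page size */

/* Emit one PRP entry per controller page covered by each mapping.
 * The index i plays the role of the counter that selects the next
 * mapped address once the current one is consumed. */
static size_t fill_prps(const uint64_t *dma_addr, const uint32_t *dma_len,
			size_t nr, uint64_t *prps, size_t cap)
{
	size_t n = 0, i;

	for (i = 0; i < nr; i++) {
		uint64_t addr = dma_addr[i];
		uint64_t end = addr + dma_len[i];

		while (addr < end) {
			assert(n < cap);
			prps[n++] = addr;
			/* Advance to the next controller page boundary. */
			addr = (addr & ~(uint64_t)(CTRL_PAGE_SIZE - 1)) +
			       CTRL_PAGE_SIZE;
		}
	}
	return n;
}
```

Only the first emitted entry can carry a page offset; every later entry
is page aligned because the walk always steps to the next boundary.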

This demonstrates how the new DMA API can fit into the NVMe driver and
replace the old DMA APIs.

As this is an RFC, I expect more robust error handling, optimizations,
and in-depth testing for the final version once we agree on DMA API
architecture.

The following is a performance comparison for the existing DMA API case
with sg_table and with dma_map. Once we have agreement on the new DMA
API design, I intend to get similar profiling numbers for the new DMA API.

sgl (sg_table + old DMA API) vs no_sgl (iod_dma_map + new DMA API) :-

block size                               IOPS (k) Average of 3

4K
--------------------------------------------------------------
sg-list-fio-perf.bs-4k-1.fio:             68.6
sg-list-fio-perf.bs-4k-2.fio:             68       68.36
sg-list-fio-perf.bs-4k-3.fio:             68.5

no-sg-list-fio-perf.bs-4k-1.fio:          68.7
no-sg-list-fio-perf.bs-4k-2.fio:          68.5     68.43
no-sg-list-fio-perf.bs-4k-3.fio:          68.1

% Change default vs new DMA API =       +0.0975%

8K
--------------------------------------------------------------
sg-list-fio-perf.bs-8k-1.fio:             67
sg-list-fio-perf.bs-8k-2.fio:             67.1     67.03
sg-list-fio-perf.bs-8k-3.fio:             67

no-sg-list-fio-perf.bs-8k-1.fio:          66.7
no-sg-list-fio-perf.bs-8k-2.fio:          66.7     66.7
no-sg-list-fio-perf.bs-8k-3.fio:          66.7

% Change default vs new DMA API =       -0.4993%

16K
--------------------------------------------------------------
sg-list-fio-perf.bs-16k-1.fio:            63.8
sg-list-fio-perf.bs-16k-2.fio:            63.4     63.5
sg-list-fio-perf.bs-16k-3.fio:            63.3

no-sg-list-fio-perf.bs-16k-1.fio:         63.5
no-sg-list-fio-perf.bs-16k-2.fio:         63.4     63.33
no-sg-list-fio-perf.bs-16k-3.fio:         63.1

% Change default vs new DMA API =       -0.2632%

32K
--------------------------------------------------------------
sg-list-fio-perf.bs-32k-1.fio:            59.3
sg-list-fio-perf.bs-32k-2.fio:            59.3     59.36
sg-list-fio-perf.bs-32k-3.fio:            59.5

no-sg-list-fio-perf.bs-32k-1.fio:         59.5
no-sg-list-fio-perf.bs-32k-2.fio:         59.6     59.43
no-sg-list-fio-perf.bs-32k-3.fio:         59.2

% Change default vs new DMA API =       +0.1122%

64K
--------------------------------------------------------------
sg-list-fio-perf.bs-64k-1.fio:            53.7
sg-list-fio-perf.bs-64k-2.fio:            53.4     53.56
sg-list-fio-perf.bs-64k-3.fio:            53.6

no-sg-list-fio-perf.bs-64k-1.fio:         53.5
no-sg-list-fio-perf.bs-64k-2.fio:         53.8     53.63
no-sg-list-fio-perf.bs-64k-3.fio:         53.6

% Change default vs new DMA API =        +0.1246%

128K
--------------------------------------------------------------
sg-list-fio-perf/bs-128k-1.fio:           48
sg-list-fio-perf/bs-128k-2.fio:           46.4     47.13
sg-list-fio-perf/bs-128k-3.fio:           47

no-sg-list-fio-perf/bs-128k-1.fio:        46.6
no-sg-list-fio-perf/bs-128k-2.fio:        47        46.9
no-sg-list-fio-perf/bs-128k-3.fio:        47.1

% Change default vs new DMA API =       -0.495%

256K
--------------------------------------------------------------
sg-list-fio-perf/bs-256k-1.fio:           37
sg-list-fio-perf/bs-256k-2.fio:           41        39.93
sg-list-fio-perf/bs-256k-3.fio:           41.8

no-sg-list-fio-perf/bs-256k-1.fio:        37.5
no-sg-list-fio-perf/bs-256k-2.fio:        41.4      40.5
no-sg-list-fio-perf/bs-256k-3.fio:        42.6

% Change default vs new DMA API =       +1.42%

512K
--------------------------------------------------------------
sg-list-fio-perf/bs-512k-1.fio:           28.5
sg-list-fio-perf/bs-512k-2.fio:           28.2      28.4
sg-list-fio-perf/bs-512k-3.fio:           28.5

no-sg-list-fio-perf/bs-512k-1.fio:        28.7
no-sg-list-fio-perf/bs-512k-2.fio:        28.6      28.7
no-sg-list-fio-perf/bs-512k-3.fio:        28.8

% Change default vs new DMA API =       +1.06%
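
For reference, the averages and percentage changes above can be
reproduced with a few lines of C. This is a sketch of the arithmetic
only; mean() and pct_change() are hypothetical helper names, and the
sg_table average is assumed to be the baseline.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples (n must be non-zero). */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Relative change of candidate against baseline, in percent.
 * Positive means the candidate delivered more IOPS. */
static double pct_change(double baseline, double candidate)
{
	assert(baseline != 0.0);
	return (candidate - baseline) / baseline * 100.0;
}
```

For the 512K rows, mean() gives 28.4 and 28.7, and pct_change(28.4, 28.7)
is roughly 1.06, in line with the figure reported for that block size.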

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/nvme/host/pci.c | 283 ++++++++++++++++++++++++++++++----------
 1 file changed, 213 insertions(+), 70 deletions(-)

Message ID	47eb0510b0a6aa52d9f5665d75fa7093dd6af53f.1719909395.git.leon@kernel.org
State	Superseded
Delegated to:	Bjorn Helgaas
Headers	From: Leon Romanovsky <leon@kernel.org>
	To: Jens Axboe <axboe@kernel.dk>, Jason Gunthorpe <jgg@ziepe.ca>, Robin Murphy <robin.murphy@arm.com>, Joerg Roedel <joro@8bytes.org>, Will Deacon <will@kernel.org>, Keith Busch <kbusch@kernel.org>, Christoph Hellwig <hch@lst.de>, "Zeng, Oak" <oak.zeng@intel.com>, Chaitanya Kulkarni <kch@nvidia.com>
	Cc: Sagi Grimberg <sagi@grimberg.me>, Bjorn Helgaas <bhelgaas@google.com>, Logan Gunthorpe <logang@deltatee.com>, Yishai Hadas <yishaih@nvidia.com>, Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>, Kevin Tian <kevin.tian@intel.com>, Alex Williamson <alex.williamson@redhat.com>, Marek Szyprowski <m.szyprowski@samsung.com>, Jérôme Glisse <jglisse@redhat.com>, Andrew Morton <akpm@linux-foundation.org>, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org
	Subject: [RFC PATCH v1 18/18] nvme-pci: use new dma API
	Date: Tue, 2 Jul 2024 12:09:48 +0300
	Message-ID: <47eb0510b0a6aa52d9f5665d75fa7093dd6af53f.1719909395.git.leon@kernel.org>
	In-Reply-To: <cover.1719909395.git.leon@kernel.org>
	References: <cover.1719909395.git.leon@kernel.org>
	X-Mailer: git-send-email 2.45.2
	X-Mailing-List: linux-pci@vger.kernel.org
Series	Provide a new two step DMA API mapping API
	[RFC,v1,00/18] Provide a new two step DMA API mapping API
	[RFC,v1,01/18] dma-mapping: query DMA memory type
	[RFC,v1,02/18] dma-mapping: provide an interface to allocate IOVA
	[RFC,v1,03/18] dma-mapping: check if IOVA can be used
	[RFC,v1,04/18] dma-mapping: implement link range API
	[RFC,v1,05/18] mm/hmm: let users to tag specific PFN with DMA mapped bit
	[RFC,v1,06/18] dma-mapping: provide callbacks to link/unlink HMM PFNs to specific IOVA
	[RFC,v1,07/18] iommu/dma: Provide an interface to allow preallocate IOVA
	[RFC,v1,08/18] iommu/dma: Implement link/unlink ranges callbacks
	[RFC,v1,09/18] RDMA/umem: Preallocate and cache IOVA for UMEM ODP
	[RFC,v1,10/18] RDMA/umem: Store ODP access mask information in PFN
	[RFC,v1,11/18] RDMA/core: Separate DMA mapping to caching IOVA and page linkage
	[RFC,v1,12/18] RDMA/umem: Prevent UMEM ODP creation with SWIOTLB
	[RFC,v1,13/18] vfio/mlx5: Explicitly use number of pages instead of allocated length
	[RFC,v1,14/18] vfio/mlx5: Rewrite create mkey flow to allow better code reuse
	[RFC,v1,15/18] vfio/mlx5: Explicitly store page list
	[RFC,v1,16/18] vfio/mlx5: Convert vfio to use DMA link API
	[RFC,v1,17/18] block: export helper to get segment max size
	[RFC,v1,18/18] nvme-pci: use new dma API
