Message ID | 20180925162231.4354-11-logang@deltatee.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Series | Copy Offload in NVMe Fabrics with P2P PCI Memory |
On Tue, Sep 25, 2018 at 10:22:28AM -0600, Logan Gunthorpe wrote:
> For P2P requests, we must use the pci_p2pmem_map_sg() function
> instead of the dma_map_sg functions.

Sorry if this was already discussed. Is there a reason the following
pattern is not pushed to the generic dma_map_sg_attrs?

	if (is_pci_p2pdma_page(sg_page(sg)))
		pci_p2pdma_map_sg(dev, sg, nents, dma_dir);

Beyond that, series looks good.
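A minimal sketch of the generic dispatch being suggested here, for context only: the example_map_sg_attrs() wrapper and its parameter names are invented for illustration, while is_pci_p2pdma_page(), pci_p2pdma_map_sg() and dma_map_sg_attrs() are the helpers referenced in the thread, with their 4.19-era signatures assumed.

/*
 * Illustrative sketch only -- not part of the posted series. The wrapper
 * name is invented; the helpers it calls are the ones discussed above,
 * assuming their 4.19-era signatures.
 */
#include <linux/dma-mapping.h>
#include <linux/mm.h>
#include <linux/pci-p2pdma.h>
#include <linux/scatterlist.h>

static int example_map_sg_attrs(struct device *dev, struct scatterlist *sg,
				int nents, enum dma_data_direction dir,
				unsigned long attrs)
{
	/* Route P2P pages to the p2pdma mapper, everything else as usual */
	if (is_pci_p2pdma_page(sg_page(sg)))
		return pci_p2pdma_map_sg(dev, sg, nents, dir);

	return dma_map_sg_attrs(dev, sg, nents, dir, attrs);
}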
Hey,

On 2018-09-25 11:11 a.m., Keith Busch wrote:
> Sorry if this was already discussed. Is there a reason the following
> pattern is not pushed to the generic dma_map_sg_attrs?
>
>	if (is_pci_p2pdma_page(sg_page(sg)))
>		pci_p2pdma_map_sg(dev, sg, nents, dma_dir);
>
> Beyond that, series looks good.

Yes, this has been discussed. It comes down to a few reasons:

1) Intrusiveness on other systems: ie. not needing to pay the cost for
every single dma_map_sg call

2) Consistency: we can add the check to dma_map_sg, but adding similar
functionality to dma_map_page, etc is difficult seeing as it's hard for
the unmap operation to detect whether a dma_addr_t was P2P memory to
begin with.

3) Safety for developers trying to use P2P memory: Right now developers
must be careful with P2P pages and ensure they aren't mapped using other
means (ie dma_map_page). Having them check the drivers that are handling
the pages to ensure the appropriate map function is always used and that
P2P pages aren't being mixed with regular pages is better than
developers relying on magic in dma_map_sg() and getting things wrong.

That being said, I think in the future everyone would like to move in
that direction, but it means we will have to solve some difficult
problems with the existing infrastructure.

Logan
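The trade-off in points 2) and 3) is easiest to see side by side. Below is a condensed, illustrative sketch of the per-driver pattern the series settles on, mirroring the nvme_map_data()/nvme_unmap_data() hunks in the patch further down; the example_* names are invented.

/*
 * Illustrative sketch only; the example_* names are invented. This mirrors
 * the nvme_map_data()/nvme_unmap_data() changes in the patch below.
 */
static int example_map(struct device *dev, struct scatterlist *sg,
		       int nents, enum dma_data_direction dir)
{
	/* The driver, not the DMA core, picks the mapping function... */
	if (is_pci_p2pdma_page(sg_page(sg)))
		return pci_p2pdma_map_sg(dev, sg, nents, dir);

	return dma_map_sg(dev, sg, nents, dir);
}

static void example_unmap(struct device *dev, struct scatterlist *sg,
			  int nents, enum dma_data_direction dir)
{
	/*
	 * ...and must remember that choice at unmap time: a bare dma_addr_t
	 * carries no hint that it came from P2P memory (point 2 above), so
	 * the driver re-checks the pages it still holds.
	 */
	if (!is_pci_p2pdma_page(sg_page(sg)))
		dma_unmap_sg(dev, sg, nents, dir);
}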
On Tue, Sep 25, 2018 at 11:41:44AM -0600, Logan Gunthorpe wrote:
> Hey,
>
> On 2018-09-25 11:11 a.m., Keith Busch wrote:
> > Sorry if this was already discussed. Is there a reason the following
> > pattern is not pushed to the generic dma_map_sg_attrs?
> >
> >	if (is_pci_p2pdma_page(sg_page(sg)))
> >		pci_p2pdma_map_sg(dev, sg, nents, dma_dir);
> >
> > Beyond that, series looks good.
>
> Yes, this has been discussed. It comes down to a few reasons:
>
> 1) Intrusiveness on other systems: ie. not needing to pay the cost for
> every single dma_map_sg call
>
> 2) Consistency: we can add the check to dma_map_sg, but adding similar
> functionality to dma_map_page, etc is difficult seeing as it's hard for
> the unmap operation to detect whether a dma_addr_t was P2P memory to
> begin with.
>
> 3) Safety for developers trying to use P2P memory: Right now developers
> must be careful with P2P pages and ensure they aren't mapped using other
> means (ie dma_map_page). Having them check the drivers that are handling
> the pages to ensure the appropriate map function is always used and that
> P2P pages aren't being mixed with regular pages is better than
> developers relying on magic in dma_map_sg() and getting things wrong.
>
> That being said, I think in the future everyone would like to move in
> that direction, but it means we will have to solve some difficult
> problems with the existing infrastructure.

Gotchya, thanks for jogging my memory.
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index dd8ec1dd9219..6033ce2fd3e9 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3051,7 +3051,11 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	ns->queue = blk_mq_init_queue(ctrl->tagset);
 	if (IS_ERR(ns->queue))
 		goto out_free_ns;
+
 	blk_queue_flag_set(QUEUE_FLAG_NONROT, ns->queue);
+	if (ctrl->ops->flags & NVME_F_PCI_P2PDMA)
+		blk_queue_flag_set(QUEUE_FLAG_PCI_P2PDMA, ns->queue);
+
 	ns->queue->queuedata = ns;
 	ns->ctrl = ctrl;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index bb4a2003c097..4030743c90aa 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -343,6 +343,7 @@ struct nvme_ctrl_ops {
 	unsigned int flags;
 #define NVME_F_FABRICS			(1 << 0)
 #define NVME_F_METADATA_SUPPORTED	(1 << 1)
+#define NVME_F_PCI_P2PDMA		(1 << 2)
 	int (*reg_read32)(struct nvme_ctrl *ctrl, u32 off, u32 *val);
 	int (*reg_write32)(struct nvme_ctrl *ctrl, u32 off, u32 val);
 	int (*reg_read64)(struct nvme_ctrl *ctrl, u32 off, u64 *val);
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index f434706a04e8..0d6c41bc2b35 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -745,8 +745,13 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 		goto out;
 
 	ret = BLK_STS_RESOURCE;
-	nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents, dma_dir,
-			DMA_ATTR_NO_WARN);
+
+	if (is_pci_p2pdma_page(sg_page(iod->sg)))
+		nr_mapped = pci_p2pdma_map_sg(dev->dev, iod->sg, iod->nents,
+					      dma_dir);
+	else
+		nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
+					     dma_dir, DMA_ATTR_NO_WARN);
 	if (!nr_mapped)
 		goto out;
 
@@ -788,7 +793,10 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
 			DMA_TO_DEVICE : DMA_FROM_DEVICE;
 
 	if (iod->nents) {
-		dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
+		/* P2PDMA requests do not need to be unmapped */
+		if (!is_pci_p2pdma_page(sg_page(iod->sg)))
+			dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
+
 		if (blk_integrity_rq(req))
 			dma_unmap_sg(dev->dev, &iod->meta_sg, 1, dma_dir);
 	}
@@ -2400,7 +2408,8 @@ static int nvme_pci_get_address(struct nvme_ctrl *ctrl, char *buf, int size)
 static const struct nvme_ctrl_ops nvme_pci_ctrl_ops = {
 	.name			= "pcie",
 	.module			= THIS_MODULE,
-	.flags			= NVME_F_METADATA_SUPPORTED,
+	.flags			= NVME_F_METADATA_SUPPORTED |
+				  NVME_F_PCI_P2PDMA,
 	.reg_read32		= nvme_pci_reg_read32,
 	.reg_write32		= nvme_pci_reg_write32,
 	.reg_read64		= nvme_pci_reg_read64,
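As a consumer-side illustration of the new flag (not part of this patch): an upper layer is expected to gate its use of P2P pages on QUEUE_FLAG_PCI_P2PDMA before handing them to the driver. A minimal sketch follows, assuming the blk_queue_pci_p2pdma() helper and pci_alloc_p2pmem() introduced by earlier patches in this series; example_alloc_p2p_buffer() is an invented name.

/*
 * Illustrative sketch only; example_alloc_p2p_buffer() is invented.
 * blk_queue_pci_p2pdma() and pci_alloc_p2pmem() are assumed from earlier
 * patches in this series.
 */
#include <linux/blkdev.h>
#include <linux/pci-p2pdma.h>

static void *example_alloc_p2p_buffer(struct request_queue *q,
				       struct pci_dev *p2p_dev, size_t len)
{
	/*
	 * Only queues whose driver advertised QUEUE_FLAG_PCI_P2PDMA
	 * (nvme-pci does so via NVME_F_PCI_P2PDMA above) may be fed
	 * P2P pages.
	 */
	if (!blk_queue_pci_p2pdma(q))
		return NULL;

	/* Allocate device memory from the publishing PCI device's pool */
	return pci_alloc_p2pmem(p2p_dev, len);
}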