Message ID | 20201104095052.1222754-3-hch@lst.de (mailing list archive)
---|---
State | Not Applicable, archived
Series | [1/5] RDMA/core: remove ib_dma_{alloc,free}_coherent
On Wed, Nov 04, 2020 at 10:50:49AM +0100, Christoph Hellwig wrote:

> +int ib_dma_virt_map_sg(struct ib_device *dev, struct scatterlist *sg, int nents)
> +{
> +	struct scatterlist *s;
> +	int i;
> +
> +	for_each_sg(sg, s, nents, i) {
> +		sg_dma_address(s) = (uintptr_t)sg_virt(s);
> +		sg_dma_len(s) = s->length;

Hmm.. There is nothing ensuring the page is mapped here for this
sg_virt(). Before maybe some of the kconfig stuff prevented highmem
systems indirectly, I wonder if we should add something more direct to
exclude highmem for these drivers?

Sigh. I think the proper fix is to replace addr/length with a
scatterlist pointer in the struct ib_sge, then have SW drivers
directly use the page pointer properly. Then just delete this stuff;
all the drivers need is a no-op dma_ops.

Looks a lot harder though, so we should probably go ahead with this.

Jason
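For context, sg_virt() only yields a usable pointer when the page backing the
entry has a kernel mapping. Its definition in include/linux/scatterlist.h is
roughly the following, which shows why an unmapped highmem page would break
the helper quoted above (page_address() returns NULL for such pages):

	/* As defined in the scatterlist helpers (paraphrased for reference). */
	static inline void *sg_virt(struct scatterlist *sg)
	{
		return page_address(sg_page(sg)) + sg->offset;
	}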
On Wed, Nov 04, 2020 at 09:42:41AM -0400, Jason Gunthorpe wrote:
> On Wed, Nov 04, 2020 at 10:50:49AM +0100, Christoph Hellwig wrote:
>
> > +int ib_dma_virt_map_sg(struct ib_device *dev, struct scatterlist *sg, int nents)
> > +{
> > +	struct scatterlist *s;
> > +	int i;
> > +
> > +	for_each_sg(sg, s, nents, i) {
> > +		sg_dma_address(s) = (uintptr_t)sg_virt(s);
> > +		sg_dma_len(s) = s->length;
>
> Hmm.. There is nothing ensuring the page is mapped here for this
> sg_virt(). Before maybe some of the kconfig stuff prevented highmem
> systems indirectly, I wonder if we should add something more direct to
> exclude highmem for these drivers?

I had actually noticed this earlier as well and then completely forgot
about it..

rdmavt depends on X86_64, so it can't be used with highmem, but for
rxe and siw there weren't any such dependencies so I think we were just
lucky.  Let me send a fix to add explicit dependencies and then respin
this series on top of that..

> Sigh. I think the proper fix is to replace addr/length with a
> scatterlist pointer in the struct ib_sge, then have SW drivers
> directly use the page pointer properly.

The proper fix is to move the DMA mapping into the RDMA core, yes.
And as you said it will be hard.  But I don't think scatterlists
are the right interface.  IMHO we can keep reusing the existing
struct ib_sge:

struct ib_sge {
	u64	addr;
	u32	length;
	u32	lkey;
};

with the difference that if lkey is not a MR, addr is the physical
address of the memory, not a dma_addr_t or virtual address.
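A rough sketch of what that convention could mean on the software-driver side
(rxe, siw); the helper below is a made-up name and not part of this series, it
only illustrates the "addr is a physical address when no real MR is used" idea
in terms of the local_dma_lkey that the rest of the thread converges on:

	#include <linux/io.h>
	#include <rdma/ib_verbs.h>

	/*
	 * Hypothetical helper: if the lkey is the PD's local DMA lkey, addr
	 * carries a CPU physical address and the software driver can turn it
	 * back into a kernel virtual address itself; anything else still goes
	 * through the driver's normal MR lookup path.
	 */
	static void *sw_sge_to_vaddr(struct ib_pd *pd, const struct ib_sge *sge)
	{
		if (sge->lkey == pd->local_dma_lkey)
			return phys_to_virt(sge->addr);
		return NULL;	/* caller falls back to MR lookup */
	}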
-----"Christoph Hellwig" <hch@lst.de> wrote: ----- >To: "Jason Gunthorpe" <jgg@ziepe.ca> >From: "Christoph Hellwig" <hch@lst.de> >Date: 11/04/2020 03:02PM >Cc: "Christoph Hellwig" <hch@lst.de>, "Bjorn Helgaas" ><bhelgaas@google.com>, "Logan Gunthorpe" <logang@deltatee.com>, >linux-rdma@vger.kernel.org, linux-pci@vger.kernel.org, >iommu@lists.linux-foundation.org >Subject: [EXTERNAL] Re: [PATCH 2/5] RDMA/core: remove use of >dma_virt_ops > >On Wed, Nov 04, 2020 at 09:42:41AM -0400, Jason Gunthorpe wrote: >> On Wed, Nov 04, 2020 at 10:50:49AM +0100, Christoph Hellwig wrote: >> >> > +int ib_dma_virt_map_sg(struct ib_device *dev, struct scatterlist >*sg, int nents) >> > +{ >> > + struct scatterlist *s; >> > + int i; >> > + >> > + for_each_sg(sg, s, nents, i) { >> > + sg_dma_address(s) = (uintptr_t)sg_virt(s); >> > + sg_dma_len(s) = s->length; >> >> Hmm.. There is nothing ensuring the page is mapped here for this >> sg_virt(). Before maybe some of the kconfig stuff prevented highmem >> systems indirectly, I wonder if we should add something more direct >to >> exclude highmem for these drivers? > >I had actually noticed this earlier as well and then completely >forgot >about it.. > >rdmavt depends on X86_64, so it can't be used with highmem, but for >rxe and siw there weren't any such dependencies so I think we were >just >lucky. Let me send a fix to add explicit depencies and then respin >this >series on top of that.. > >> Sigh. I think the proper fix is to replace addr/length with a >> scatterlist pointer in the struct ib_sge, then have SW drivers >> directly use the page pointer properly. > >The proper fix is to move the DMA mapping into the RDMA core, yes. >And as you said it will be hard. But I don't think scatterlists >are the right interface. IMHO we can keep re-use the existing >struct ib_sge: > >struct ib_ge { > u64 addr; > u32 length; > u32 lkey; >}; > >with the difference that if lkey is not a MR, addr is the physical >address of the memory, not a dma_addr_t or virtual address. > lkey of zero to pass a physical buffer, only allowed for kernel applications? Very nice idea I think. btw. It would even get the vain blessing of the old IETF RDMA verbs draft ;) https://tools.ietf.org/html/draft-hilland-rddp-verbs-00#page-90 (section '7.2.1 STag of zero' - read lkey for STag)
On Wed, Nov 04, 2020 at 03:01:08PM +0100, Christoph Hellwig wrote:
> > Sigh. I think the proper fix is to replace addr/length with a
> > scatterlist pointer in the struct ib_sge, then have SW drivers
> > directly use the page pointer properly.
>
> The proper fix is to move the DMA mapping into the RDMA core, yes.
> And as you said it will be hard.  But I don't think scatterlists
> are the right interface.  IMHO we can keep reusing the existing
> struct ib_sge:
>
> struct ib_sge {
> 	u64	addr;
> 	u32	length;
> 	u32	lkey;
> };

Gah, right, this is all about local_dma_lkey..

> with the difference that if lkey is not a MR, addr is the physical
> address of the memory, not a dma_addr_t or virtual address.

It could work, I think a reasonable ULP API would be to have some

 rdma_fill_ib_sge_from_sgl()
 rdma_map_sge_single()
 etc etc

ie instead of wrappering the DMA API as-is we have a new API that
directly builds the ib_sge. It always fills the local_dma_lkey from
the pd, so it knows it is doing DMA from local kernel memory.

Logically SW devices then have a local_dma_lkey MR that has an IOVA of
the CPU physical address space, not the DMA address space as HW
devices have. The ib_sge builders can know this detail and fill in
addr from either a cpu physical or a dma map.

The SW device has to translate the addr/length in CPU space to
something else. It actually makes reasonable sense architecturally.

This is actually much less horrible than I thought.. Convert all ULPs
to one of these new APIs, searching for local_dma_lkey will find all
places. This will replace a whole lot of calls to ib DMA API wrapper
functions. Searching for local_dma_lkey will find all users.

Drivers already work with sge.addr == CPU address, so no driver change

Then to kill the dma_ops wrappers the remaining users should all be
connected to map_mr_sg. In this case we want a no-op dma map and fix
the three map_mr_sg's to use the page side of the sgl, not the DMA
side.

Not as horrible as I imagined at first, actually..

Jason
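To make the shape of such a builder concrete, here is a hypothetical sketch;
rdma_map_sge_single() does not exist in the tree, and the body below only
illustrates the idea (HW device: addr from a DMA mapping; SW device with no
dma_device: addr is the CPU physical address; lkey always from the PD):

	#include <linux/dma-mapping.h>
	#include <rdma/ib_verbs.h>

	/* Hypothetical sketch of one of the ib_sge builders discussed above. */
	static int rdma_map_sge_single(struct ib_pd *pd, struct ib_sge *sge,
				       void *buf, u32 length,
				       enum dma_data_direction dir)
	{
		struct ib_device *dev = pd->device;

		if (dev->dma_device) {
			dma_addr_t dma = dma_map_single(dev->dma_device, buf,
							length, dir);

			if (dma_mapping_error(dev->dma_device, dma))
				return -ENOMEM;
			sge->addr = dma;
		} else {
			/* SW device: the "IOVA" is the CPU physical address */
			sge->addr = virt_to_phys(buf);
		}
		sge->length = length;
		sge->lkey = pd->local_dma_lkey;
		return 0;
	}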
On Wed, Nov 04, 2020 at 11:52:55AM -0400, Jason Gunthorpe wrote:
> It could work, I think a reasonable ULP API would be to have some
>
>  rdma_fill_ib_sge_from_sgl()
>  rdma_map_sge_single()
>  etc etc
>
> ie instead of wrappering the DMA API as-is we have a new API that
> directly builds the ib_sge. It always fills the local_dma_lkey from
> the pd, so it knows it is doing DMA from local kernel memory.

Yeah.

> Logically SW devices then have a local_dma_lkey MR that has an IOVA of
> the CPU physical address space, not the DMA address space as HW
> devices have. The ib_sge builders can know this detail and fill in
> addr from either a cpu physical or a dma map.

I don't think the builders are the right place to do it - it really
should be in the low-level drivers for a bunch of reasons:

 1) this avoids doing the dma_map when no DMA is performed, e.g. for
    mlx5 when send data is in the extended WQE
 2) to deal with the fact that dma mapping reduces the number of SGEs.
    When the system uses a modern IOMMU we'll always end up with a
    single IOVA range no matter how many pages were mapped originally.
    This means any MR process can actually be consolidated to use
    a single SGE with the local lkey.

Note that 2 implies a somewhat more complicated API, where the ULP
attempts to create a MR, but the core/driver will tell it that it
didn't need a MR at all.
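One possible shape for the more complicated API implied by point 2; every name
below is invented purely for illustration and nothing like it exists in the
kernel. The ULP asks for an MR over an S/G list, but if the (IOMMU-merged)
mapping fits in a single SGE the core reports that no MR was consumed and the
ULP just posts the returned SGE with the PD's local_dma_lkey:

	#include <rdma/ib_verbs.h>

	/* Illustrative only: "you might not need an MR" return values. */
	enum rdma_map_result {
		RDMA_MAPPED_SGE,	/* mapping collapsed to one SGE, no MR used */
		RDMA_MAPPED_MR,		/* a real MR was registered */
	};

	enum rdma_map_result rdma_map_sgl_or_mr(struct ib_pd *pd,
						struct scatterlist *sgl, int nents,
						enum dma_data_direction dir,
						struct ib_sge *sge,
						struct ib_mr **mr);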
On Wed, Nov 04, 2020 at 05:31:35PM +0100, Christoph Hellwig wrote:
> On Wed, Nov 04, 2020 at 11:52:55AM -0400, Jason Gunthorpe wrote:
> > It could work, I think a reasonable ULP API would be to have some
> >
> >  rdma_fill_ib_sge_from_sgl()
> >  rdma_map_sge_single()
> >  etc etc
> >
> > ie instead of wrappering the DMA API as-is we have a new API that
> > directly builds the ib_sge. It always fills the local_dma_lkey from
> > the pd, so it knows it is doing DMA from local kernel memory.
>
> Yeah.
>
> > Logically SW devices then have a local_dma_lkey MR that has an IOVA of
> > the CPU physical address space, not the DMA address space as HW
> > devices have. The ib_sge builders can know this detail and fill in
> > addr from either a cpu physical or a dma map.
>
> I don't think the builders are the right place to do it - it really
> should be in the low-level drivers for a bunch of reasons:

At this point we have little choice, the ULP is responsible for
map/unmap because the ULP owns the CQ and (batch) unmap is triggered
by some CQE. Reworking all drivers to somehow keep track of unmaps as
CQEs trigger feels so hard at this point as to be impossible. It is
why the rdma_rw_ctx basically exists.

So we have to keep the current arrangement: when the ib_sge is built
the dma map must be conditionally done.

> 1) this avoids doing the dma_map when no DMA is performed, e.g. for
>    mlx5 when send data is in the extended WQE

At least in the kernel, the ULP has to be involved today in
IB_SEND_INLINE. Userspace does an auto-copy, but that is because it
doesn't have dma map/unmap.

Without unmap tracking as above the caller must supply a specially
formed ib_sge when using IB_SEND_INLINE that specifies the CPU vaddr
so memcpy works.

But this all looks like dead code, no ULP sets IB_SEND_INLINE??

> 2) to deal with the fact that dma mapping reduces the number of SGEs.
>    When the system uses a modern IOMMU we'll always end up with a
>    single IOVA range no matter how many pages were mapped originally.
>    This means any MR process can actually be consolidated to use
>    a single SGE with the local lkey.

Any place like rdma_rw_ctx_init() that decides dynamically between SGE
and MR becomes a mess. It would be manageable if rdma_rw_ctx_init()
was the only place doing this..

I haven't looked lately, but I wonder if it is feasible that all the
MR users would use this API?

Jason
On Wed, Nov 04, 2020 at 03:09:04PM +0000, Bernard Metzler wrote:
> lkey of zero to pass a physical buffer, only allowed for
> kernel applications? Very nice idea I think.

It already exists, it is called the local_dma_lkey, just set
IB_DEVICE_LOCAL_DMA_LKEY and provide the value you want to use
in device->local_dma_lkey

> btw.
> It would even get the vain blessing of the old IETF RDMA
> verbs draft ;)
>
> https://tools.ietf.org/html/draft-hilland-rddp-verbs-00#page-90

IBTA standardized this, they just didn't require HW to use 0 as the
lkey.

Jason
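For readers following along, this is how a kernel ULP already posts an
unregistered kernel buffer using the local DMA lkey Jason refers to; a minimal
sketch, with error handling and the completion-time unmap omitted:

	#include <rdma/ib_verbs.h>

	/* Sketch: post a kernel buffer using the PD's local DMA lkey (no MR). */
	static int post_buf_with_local_lkey(struct ib_qp *qp, struct ib_pd *pd,
					    void *buf, u32 len)
	{
		struct ib_device *dev = qp->device;
		struct ib_sge sge = {
			.addr   = ib_dma_map_single(dev, buf, len, DMA_TO_DEVICE),
			.length = len,
			.lkey   = pd->local_dma_lkey,
		};
		struct ib_send_wr wr = {
			.sg_list = &sge,
			.num_sge = 1,
			.opcode  = IB_WR_SEND,
		};

		if (ib_dma_mapping_error(dev, sge.addr))
			return -ENOMEM;
		/* unmap belongs in the completion handler, omitted here */
		return ib_post_send(qp, &wr, NULL);
	}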
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index a3b1fc84cdcab9..528de41487521b 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -1177,25 +1177,6 @@ static int assign_name(struct ib_device *device, const char *name)
 	return ret;
 }
 
-static void setup_dma_device(struct ib_device *device,
-			     struct device *dma_device)
-{
-	/*
-	 * If the caller does not provide a DMA capable device then the IB
-	 * device will be used. In this case the caller should fully setup the
-	 * ibdev for DMA. This usually means using dma_virt_ops.
-	 */
-#ifdef CONFIG_DMA_VIRT_OPS
-	if (!dma_device) {
-		device->dev.dma_ops = &dma_virt_ops;
-		dma_device = &device->dev;
-	}
-#endif
-	WARN_ON(!dma_device);
-	device->dma_device = dma_device;
-	WARN_ON(!device->dma_device->dma_parms);
-}
-
 /*
  * setup_device() allocates memory and sets up data that requires calling the
  * device ops, this is the only reason these actions are not done during
@@ -1341,7 +1322,14 @@ int ib_register_device(struct ib_device *device, const char *name,
 	if (ret)
 		return ret;
 
-	setup_dma_device(device, dma_device);
+	/*
+	 * If the caller does not provide a DMA capable device then the IB core
+	 * will set up ib_sge and scatterlist structures that stash the kernel
+	 * virtual address into the address field.
+	 */
+	device->dma_device = dma_device;
+	WARN_ON(dma_device && !dma_device->dma_parms);
+
 	ret = setup_device(device);
 	if (ret)
 		return ret;
@@ -2675,6 +2663,19 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
 }
 EXPORT_SYMBOL(ib_set_device_ops);
 
+int ib_dma_virt_map_sg(struct ib_device *dev, struct scatterlist *sg, int nents)
+{
+	struct scatterlist *s;
+	int i;
+
+	for_each_sg(sg, s, nents, i) {
+		sg_dma_address(s) = (uintptr_t)sg_virt(s);
+		sg_dma_len(s) = s->length;
+	}
+	return nents;
+}
+EXPORT_SYMBOL(ib_dma_virt_map_sg);
+
 static const struct rdma_nl_cbs ibnl_ls_cb_table[RDMA_NL_LS_NUM_OPS] = {
 	[RDMA_NL_LS_OP_RESOLVE] = {
 		.doit = ib_nl_handle_resolve_resp,
diff --git a/drivers/infiniband/core/rw.c b/drivers/infiniband/core/rw.c
index 13f43ab7220b05..ae5a629ecc6772 100644
--- a/drivers/infiniband/core/rw.c
+++ b/drivers/infiniband/core/rw.c
@@ -276,7 +276,7 @@ static int rdma_rw_init_single_wr(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
 static void rdma_rw_unmap_sg(struct ib_device *dev, struct scatterlist *sg,
 			     u32 sg_cnt, enum dma_data_direction dir)
 {
-	if (is_pci_p2pdma_page(sg_page(sg)))
+	if (dev->dma_device && is_pci_p2pdma_page(sg_page(sg)))
 		pci_p2pdma_unmap_sg(dev->dma_device, sg, sg_cnt, dir);
 	else
 		ib_dma_unmap_sg(dev, sg, sg_cnt, dir);
diff --git a/drivers/infiniband/sw/rdmavt/Kconfig b/drivers/infiniband/sw/rdmavt/Kconfig
index 9ef5f5ce1ff6b0..2de31692885cf0 100644
--- a/drivers/infiniband/sw/rdmavt/Kconfig
+++ b/drivers/infiniband/sw/rdmavt/Kconfig
@@ -1,8 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0-only
 config INFINIBAND_RDMAVT
 	tristate "RDMA verbs transport library"
-	depends on X86_64 && ARCH_DMA_ADDR_T_64BIT
+	depends on X86_64
 	depends on PCI
-	select DMA_VIRT_OPS
 	help
 	  This is a common software verbs provider for RDMA networks.
diff --git a/drivers/infiniband/sw/rdmavt/mr.c b/drivers/infiniband/sw/rdmavt/mr.c
index 8490fdb9c91e50..90fc234f489acd 100644
--- a/drivers/infiniband/sw/rdmavt/mr.c
+++ b/drivers/infiniband/sw/rdmavt/mr.c
@@ -324,8 +324,6 @@ static void __rvt_free_mr(struct rvt_mr *mr)
  * @acc: access flags
  *
  * Return: the memory region on success, otherwise returns an errno.
- * Note that all DMA addresses should be created via the functions in
- * struct dma_virt_ops.
  */
 struct ib_mr *rvt_get_dma_mr(struct ib_pd *pd, int acc)
 {
@@ -766,7 +764,7 @@ int rvt_lkey_ok(struct rvt_lkey_table *rkt, struct rvt_pd *pd,
 	/*
 	 * We use LKEY == zero for kernel virtual addresses
-	 * (see rvt_get_dma_mr() and dma_virt_ops).
+	 * (see rvt_get_dma_mr()).
 	 */
 	if (sge->lkey == 0) {
 		struct rvt_dev_info *dev = ib_to_rvt(pd->ibpd.device);
@@ -877,7 +875,7 @@ int rvt_rkey_ok(struct rvt_qp *qp, struct rvt_sge *sge,
 	/*
 	 * We use RKEY == zero for kernel virtual addresses
-	 * (see rvt_get_dma_mr() and dma_virt_ops).
+	 * (see rvt_get_dma_mr()).
 	 */
 	rcu_read_lock();
 	if (rkey == 0) {
diff --git a/drivers/infiniband/sw/rdmavt/vt.c b/drivers/infiniband/sw/rdmavt/vt.c
index 670a9623b46e11..d1bbe66610cfe4 100644
--- a/drivers/infiniband/sw/rdmavt/vt.c
+++ b/drivers/infiniband/sw/rdmavt/vt.c
@@ -524,7 +524,6 @@ static noinline int check_support(struct rvt_dev_info *rdi, int verb)
 int rvt_register_device(struct rvt_dev_info *rdi)
 {
 	int ret = 0, i;
-	u64 dma_mask;
 
 	if (!rdi)
 		return -EINVAL;
@@ -579,13 +578,6 @@ int rvt_register_device(struct rvt_dev_info *rdi)
 	/* Completion queues */
 	spin_lock_init(&rdi->n_cqs_lock);
 
-	/* DMA Operations */
-	rdi->ibdev.dev.dma_parms = rdi->ibdev.dev.parent->dma_parms;
-	dma_mask = IS_ENABLED(CONFIG_64BIT) ? DMA_BIT_MASK(64) : DMA_BIT_MASK(32);
-	ret = dma_coerce_mask_and_coherent(&rdi->ibdev.dev, dma_mask);
-	if (ret)
-		goto bail_wss;
-
 	/* Protection Domain */
 	spin_lock_init(&rdi->n_pds_lock);
 	rdi->n_pds_allocated = 0;
diff --git a/drivers/infiniband/sw/rxe/Kconfig b/drivers/infiniband/sw/rxe/Kconfig
index a0c6c7dfc1814f..f12bb088a44453 100644
--- a/drivers/infiniband/sw/rxe/Kconfig
+++ b/drivers/infiniband/sw/rxe/Kconfig
@@ -2,10 +2,8 @@
 config RDMA_RXE
 	tristate "Software RDMA over Ethernet (RoCE) driver"
 	depends on INET && PCI && INFINIBAND
-	depends on !64BIT || ARCH_DMA_ADDR_T_64BIT
 	select NET_UDP_TUNNEL
 	select CRYPTO_CRC32
-	select DMA_VIRT_OPS
 	help
 	  This driver implements the InfiniBand RDMA transport over
 	  the Linux network stack. It enables a system with a
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index f9c832e82552f9..9c66f76545b3c2 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -1118,7 +1118,6 @@ int rxe_register_device(struct rxe_dev *rxe, const char *ibdev_name)
 	int err;
 	struct ib_device *dev = &rxe->ib_dev;
 	struct crypto_shash *tfm;
-	u64 dma_mask;
 
 	strlcpy(dev->node_desc, "rxe", sizeof(dev->node_desc));
 
@@ -1129,12 +1128,6 @@ int rxe_register_device(struct rxe_dev *rxe, const char *ibdev_name)
 	dev->local_dma_lkey = 0;
 	addrconf_addr_eui48((unsigned char *)&dev->node_guid,
 			    rxe->ndev->dev_addr);
-	dev->dev.dma_parms = &rxe->dma_parms;
-	dma_set_max_seg_size(&dev->dev, UINT_MAX);
-	dma_mask = IS_ENABLED(CONFIG_64BIT) ? DMA_BIT_MASK(64) : DMA_BIT_MASK(32);
-	err = dma_coerce_mask_and_coherent(&dev->dev, dma_mask);
-	if (err)
-		return err;
 
 	dev->uverbs_cmd_mask = BIT_ULL(IB_USER_VERBS_CMD_GET_CONTEXT)
 			     | BIT_ULL(IB_USER_VERBS_CMD_CREATE_COMP_CHANNEL)
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index 3414b341b7091f..4bf5d85a1ab3ce 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -352,7 +352,6 @@ struct rxe_port {
 struct rxe_dev {
 	struct ib_device	ib_dev;
 	struct ib_device_attr	attr;
-	struct device_dma_parameters dma_parms;
 	int			max_ucontext;
 	int			max_inline_data;
 	struct mutex	usdev_lock;
diff --git a/drivers/infiniband/sw/siw/Kconfig b/drivers/infiniband/sw/siw/Kconfig
index b622fc62f2cd6d..8582ac6050c743 100644
--- a/drivers/infiniband/sw/siw/Kconfig
+++ b/drivers/infiniband/sw/siw/Kconfig
@@ -1,7 +1,6 @@
 config RDMA_SIW
 	tristate "Software RDMA over TCP/IP (iWARP) driver"
 	depends on INET && INFINIBAND && LIBCRC32C
-	select DMA_VIRT_OPS
 	help
 	  This driver implements the iWARP RDMA transport over
 	  the Linux TCP/IP network stack. It enables a system with a
diff --git a/drivers/infiniband/sw/siw/siw.h b/drivers/infiniband/sw/siw/siw.h
index e9753831ac3f33..adda7899621962 100644
--- a/drivers/infiniband/sw/siw/siw.h
+++ b/drivers/infiniband/sw/siw/siw.h
@@ -69,7 +69,6 @@ struct siw_pd {
 
 struct siw_device {
 	struct ib_device base_dev;
-	struct device_dma_parameters dma_parms;
 	struct net_device *netdev;
 	struct siw_dev_cap attrs;
diff --git a/drivers/infiniband/sw/siw/siw_main.c b/drivers/infiniband/sw/siw/siw_main.c
index 181e06c1c43d7e..c62a7a0d423c0e 100644
--- a/drivers/infiniband/sw/siw/siw_main.c
+++ b/drivers/infiniband/sw/siw/siw_main.c
@@ -306,7 +306,6 @@ static struct siw_device *siw_device_create(struct net_device *netdev)
 	struct siw_device *sdev = NULL;
 	struct ib_device *base_dev;
 	struct device *parent = netdev->dev.parent;
-	u64 dma_mask;
 	int rv;
 
 	if (!parent) {
@@ -383,12 +382,6 @@ static struct siw_device *siw_device_create(struct net_device *netdev)
 	 */
 	base_dev->phys_port_cnt = 1;
 	base_dev->dev.parent = parent;
-	base_dev->dev.dma_parms = &sdev->dma_parms;
-	dma_set_max_seg_size(&base_dev->dev, UINT_MAX);
-	dma_mask = IS_ENABLED(CONFIG_64BIT) ? DMA_BIT_MASK(64) : DMA_BIT_MASK(32);
-	if (dma_coerce_mask_and_coherent(&base_dev->dev, dma_mask))
-		goto error;
-
 	base_dev->num_comp_vectors = num_possible_cpus();
 
 	xa_init_flags(&sdev->qp_xa, XA_FLAGS_ALLOC1);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 5f8fd7976034e0..661e4a22b3cb81 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -3950,6 +3950,8 @@ static inline int ib_req_ncomp_notif(struct ib_cq *cq, int wc_cnt)
  */
 static inline int ib_dma_mapping_error(struct ib_device *dev, u64 dma_addr)
 {
+	if (!dev->dma_device)
+		return 0;
 	return dma_mapping_error(dev->dma_device, dma_addr);
 }
 
@@ -3964,6 +3966,8 @@ static inline u64 ib_dma_map_single(struct ib_device *dev,
 				    void *cpu_addr, size_t size,
 				    enum dma_data_direction direction)
 {
+	if (!dev->dma_device)
+		return (uintptr_t)cpu_addr;
 	return dma_map_single(dev->dma_device, cpu_addr, size, direction);
 }
 
@@ -3978,6 +3982,8 @@ static inline void ib_dma_unmap_single(struct ib_device *dev,
 				       u64 addr, size_t size,
 				       enum dma_data_direction direction)
 {
+	if (!dev->dma_device)
+		return;
 	dma_unmap_single(dev->dma_device, addr, size, direction);
 }
 
@@ -3995,6 +4001,8 @@ static inline u64 ib_dma_map_page(struct ib_device *dev,
 				  size_t size,
 				  enum dma_data_direction direction)
 {
+	if (!dev->dma_device)
+		return (uintptr_t)(page_address(page) + offset);
 	return dma_map_page(dev->dma_device, page, offset, size, direction);
 }
 
@@ -4009,9 +4017,33 @@ static inline void ib_dma_unmap_page(struct ib_device *dev,
 				     u64 addr, size_t size,
 				     enum dma_data_direction direction)
 {
+	if (!dev->dma_device)
+		return;
 	dma_unmap_page(dev->dma_device, addr, size, direction);
 }
 
+int ib_dma_virt_map_sg(struct ib_device *dev, struct scatterlist *sg, int nents);
+static inline int ib_dma_map_sg_attrs(struct ib_device *dev,
+				      struct scatterlist *sg, int nents,
+				      enum dma_data_direction direction,
+				      unsigned long dma_attrs)
+{
+	if (!dev->dma_device)
+		return ib_dma_virt_map_sg(dev, sg, nents);
+	return dma_map_sg_attrs(dev->dma_device, sg, nents, direction,
+				dma_attrs);
+}
+
+static inline void ib_dma_unmap_sg_attrs(struct ib_device *dev,
+					 struct scatterlist *sg, int nents,
+					 enum dma_data_direction direction,
+					 unsigned long dma_attrs)
+{
+	if (!dev->dma_device)
+		return;
+	dma_unmap_sg_attrs(dev->dma_device, sg, nents, direction, dma_attrs);
+}
+
 /**
  * ib_dma_map_sg - Map a scatter/gather list to DMA addresses
  * @dev: The device for which the DMA addresses are to be created
@@ -4023,7 +4055,7 @@ static inline int ib_dma_map_sg(struct ib_device *dev,
 				struct scatterlist *sg, int nents,
 				enum dma_data_direction direction)
 {
-	return dma_map_sg(dev->dma_device, sg, nents, direction);
+	return ib_dma_map_sg_attrs(dev, sg, nents, direction, 0);
 }
 
 /**
@@ -4037,24 +4069,7 @@ static inline void ib_dma_unmap_sg(struct ib_device *dev,
 				   struct scatterlist *sg, int nents,
 				   enum dma_data_direction direction)
 {
-	dma_unmap_sg(dev->dma_device, sg, nents, direction);
-}
-
-static inline int ib_dma_map_sg_attrs(struct ib_device *dev,
-				      struct scatterlist *sg, int nents,
-				      enum dma_data_direction direction,
-				      unsigned long dma_attrs)
-{
-	return dma_map_sg_attrs(dev->dma_device, sg, nents, direction,
-				dma_attrs);
-}
-
-static inline void ib_dma_unmap_sg_attrs(struct ib_device *dev,
-					 struct scatterlist *sg, int nents,
-					 enum dma_data_direction direction,
-					 unsigned long dma_attrs)
-{
-	dma_unmap_sg_attrs(dev->dma_device, sg, nents, direction, dma_attrs);
+	ib_dma_unmap_sg_attrs(dev, sg, nents, direction, 0);
 }
 
 /**
@@ -4065,6 +4080,8 @@ static inline void ib_dma_unmap_sg_attrs(struct ib_device *dev,
  */
 static inline unsigned int ib_dma_max_seg_size(struct ib_device *dev)
 {
+	if (!dev->dma_device)
+		return UINT_MAX;
 	return dma_get_max_seg_size(dev->dma_device);
 }
 
@@ -4080,6 +4097,8 @@ static inline void ib_dma_sync_single_for_cpu(struct ib_device *dev,
 					      size_t size,
 					      enum dma_data_direction dir)
 {
+	if (!dev->dma_device)
+		return;
 	dma_sync_single_for_cpu(dev->dma_device, addr, size, dir);
 }
 
@@ -4095,6 +4114,8 @@ static inline void ib_dma_sync_single_for_device(struct ib_device *dev,
 						 size_t size,
 						 enum dma_data_direction dir)
 {
+	if (!dev->dma_device)
+		return;
 	dma_sync_single_for_device(dev->dma_device, addr, size, dir);
 }
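The net effect of the new !dev->dma_device branches for a software RDMA device
(one registered with a NULL dma_device) can be summarized roughly as follows;
this is an illustration of the behavior, not literal kernel code:

	/* ibdev here is a rxe/siw/rdmavt device, so ibdev->dma_device == NULL */
	u64 addr = ib_dma_map_single(ibdev, buf, len, DMA_TO_DEVICE);
	/* addr == (uintptr_t)buf: no dma_map_ops and no dma_virt_ops involved */

	int mapped = ib_dma_map_sg(ibdev, sgl, nents, DMA_BIDIRECTIONAL);
	/* via ib_dma_virt_map_sg(): sg_dma_address(s) == (uintptr_t)sg_virt(s),
	 * sg_dma_len(s) == s->length, and mapped == nents */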
Use the ib_dma_* helpers to skip the DMA translation instead.  This
removes the last user of dma_virt_ops and keeps the weird layering
violation inside the RDMA core instead of burdening the DMA mapping
subsystems with it.

This also means the software RDMA drivers now don't have to mess with
DMA parameters that are not relevant to them at all, and that in the
future we can use PCI P2P transfers even for software RDMA, as there
is no fake first layer of DMA mapping that the P2P DMA support would
have to bypass.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/infiniband/core/device.c      | 41 ++++++++++---------
 drivers/infiniband/core/rw.c          |  2 +-
 drivers/infiniband/sw/rdmavt/Kconfig  |  3 +-
 drivers/infiniband/sw/rdmavt/mr.c     |  6 +--
 drivers/infiniband/sw/rdmavt/vt.c     |  8 ----
 drivers/infiniband/sw/rxe/Kconfig     |  2 -
 drivers/infiniband/sw/rxe/rxe_verbs.c |  7 ----
 drivers/infiniband/sw/rxe/rxe_verbs.h |  1 -
 drivers/infiniband/sw/siw/Kconfig     |  1 -
 drivers/infiniband/sw/siw/siw.h       |  1 -
 drivers/infiniband/sw/siw/siw_main.c  |  7 ----
 include/rdma/ib_verbs.h               | 59 ++++++++++++++++++---------
 12 files changed, 65 insertions(+), 73 deletions(-)