Message ID | 20210408170123.8788-10-logang@deltatee.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Add new DMA mapping operation for P2PDMA | expand |
On Thu, Apr 08, 2021 at 11:01:16AM -0600, Logan Gunthorpe wrote: > Add PCI P2PDMA support for dma_direct_map_sg() so that it can map > PCI P2PDMA pages directly without a hack in the callers. This allows > for heterogeneous SGLs that contain both P2PDMA and regular pages. > > SGL segments that contain PCI bus addresses are marked with > sg_mark_pci_p2pdma() and are ignored when unmapped. > > Signed-off-by: Logan Gunthorpe <logang@deltatee.com> > kernel/dma/direct.c | 25 ++++++++++++++++++++++--- > 1 file changed, 22 insertions(+), 3 deletions(-) > > diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c > index 002268262c9a..108dfb4ecbd5 100644 > +++ b/kernel/dma/direct.c > @@ -13,6 +13,7 @@ > #include <linux/vmalloc.h> > #include <linux/set_memory.h> > #include <linux/slab.h> > +#include <linux/pci-p2pdma.h> > #include "direct.h" > > /* > @@ -387,19 +388,37 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl, > struct scatterlist *sg; > int i; > > - for_each_sg(sgl, sg, nents, i) > + for_each_sg(sgl, sg, nents, i) { > + if (sg_is_pci_p2pdma(sg)) { > + sg_unmark_pci_p2pdma(sg); This doesn't seem nice, the DMA layer should only alter the DMA portion of the SG, not the other portions. Is it necessary? Jason
On Tue, Apr 27, 2021 at 04:33:51PM -0300, Jason Gunthorpe wrote: > On Thu, Apr 08, 2021 at 11:01:16AM -0600, Logan Gunthorpe wrote: > > Add PCI P2PDMA support for dma_direct_map_sg() so that it can map > > PCI P2PDMA pages directly without a hack in the callers. This allows > > for heterogeneous SGLs that contain both P2PDMA and regular pages. > > > > SGL segments that contain PCI bus addresses are marked with > > sg_mark_pci_p2pdma() and are ignored when unmapped. > > > > Signed-off-by: Logan Gunthorpe <logang@deltatee.com> > > kernel/dma/direct.c | 25 ++++++++++++++++++++++--- > > 1 file changed, 22 insertions(+), 3 deletions(-) > > > > diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c > > index 002268262c9a..108dfb4ecbd5 100644 > > +++ b/kernel/dma/direct.c > > @@ -13,6 +13,7 @@ > > #include <linux/vmalloc.h> > > #include <linux/set_memory.h> > > #include <linux/slab.h> > > +#include <linux/pci-p2pdma.h> > > #include "direct.h" > > > > /* > > @@ -387,19 +388,37 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl, > > struct scatterlist *sg; > > int i; > > > > - for_each_sg(sgl, sg, nents, i) > > + for_each_sg(sgl, sg, nents, i) { > > + if (sg_is_pci_p2pdma(sg)) { > > + sg_unmark_pci_p2pdma(sg); > > This doesn't seem nice, the DMA layer should only alter the DMA > portion of the SG, not the other portions. Is it necessary? Oh, I got it completely wrong what this is for. This should be named sg_dma_mark_pci_p2p() and similar for other functions to make it clear it is part of the DMA side of the SG interface (eg it is like sg_dma_address, sg_dma_len, etc) Jason
On 2021-04-27 1:40 p.m., Jason Gunthorpe wrote: > On Tue, Apr 27, 2021 at 04:33:51PM -0300, Jason Gunthorpe wrote: >> On Thu, Apr 08, 2021 at 11:01:16AM -0600, Logan Gunthorpe wrote: >>> Add PCI P2PDMA support for dma_direct_map_sg() so that it can map >>> PCI P2PDMA pages directly without a hack in the callers. This allows >>> for heterogeneous SGLs that contain both P2PDMA and regular pages. >>> >>> SGL segments that contain PCI bus addresses are marked with >>> sg_mark_pci_p2pdma() and are ignored when unmapped. >>> >>> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> >>> kernel/dma/direct.c | 25 ++++++++++++++++++++++--- >>> 1 file changed, 22 insertions(+), 3 deletions(-) >>> >>> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c >>> index 002268262c9a..108dfb4ecbd5 100644 >>> +++ b/kernel/dma/direct.c >>> @@ -13,6 +13,7 @@ >>> #include <linux/vmalloc.h> >>> #include <linux/set_memory.h> >>> #include <linux/slab.h> >>> +#include <linux/pci-p2pdma.h> >>> #include "direct.h" >>> >>> /* >>> @@ -387,19 +388,37 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl, >>> struct scatterlist *sg; >>> int i; >>> >>> - for_each_sg(sgl, sg, nents, i) >>> + for_each_sg(sgl, sg, nents, i) { >>> + if (sg_is_pci_p2pdma(sg)) { >>> + sg_unmark_pci_p2pdma(sg); >> >> This doesn't seem nice, the DMA layer should only alter the DMA >> portion of the SG, not the other portions. Is it necessary? > > Oh, I got it completely wrong what this is for. > > This should be named sg_dma_mark_pci_p2p() and similar for other > functions to make it clear it is part of the DMA side of the SG > interface (eg it is like sg_dma_address, sg_dma_len, etc) Fair point. Yes, I'll rename this for the next version. Logan
On 4/8/21 10:01 AM, Logan Gunthorpe wrote: > Add PCI P2PDMA support for dma_direct_map_sg() so that it can map > PCI P2PDMA pages directly without a hack in the callers. This allows > for heterogeneous SGLs that contain both P2PDMA and regular pages. > > SGL segments that contain PCI bus addresses are marked with > sg_mark_pci_p2pdma() and are ignored when unmapped. > > Signed-off-by: Logan Gunthorpe <logang@deltatee.com> > --- > kernel/dma/direct.c | 25 ++++++++++++++++++++++--- > 1 file changed, 22 insertions(+), 3 deletions(-) > > diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c > index 002268262c9a..108dfb4ecbd5 100644 > --- a/kernel/dma/direct.c > +++ b/kernel/dma/direct.c > @@ -13,6 +13,7 @@ > #include <linux/vmalloc.h> > #include <linux/set_memory.h> > #include <linux/slab.h> > +#include <linux/pci-p2pdma.h> > #include "direct.h" > > /* > @@ -387,19 +388,37 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl, This routine now deserves a little bit of commenting, now that it is doing less obvious things. How about something like this: /* * Unmaps pages, except for PCI_P2PDMA pages, which were never mapped in the * first place. Instead of unmapping PCI_P2PDMA entries, simply remove the * SG_PCI_P2PDMA mark */ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl, int nents, enum dma_data_direction dir, unsigned long attrs) { > struct scatterlist *sg; > int i; > > - for_each_sg(sgl, sg, nents, i) > + for_each_sg(sgl, sg, nents, i) { > + if (sg_is_pci_p2pdma(sg)) { > + sg_unmark_pci_p2pdma(sg); > + continue; > + } > + > dma_direct_unmap_page(dev, sg->dma_address, sg_dma_len(sg), dir, > attrs); > + } The same thing can be achieved with fewer lines and a bit more clarity. Can we please do it like this instead: for_each_sg(sgl, sg, nents, i) { if (sg_is_pci_p2pdma(sg)) sg_unmark_pci_p2pdma(sg); else dma_direct_unmap_page(dev, sg->dma_address, sg_dma_len(sg), dir, attrs); } > } > #endif > Also here, a block comment for the function would be nice. How about approximately this: /* * Maps each SG segment. Returns the number of entries mapped, or 0 upon * failure. If any entry could not be mapped, then no entries are mapped. */ I'll stop complaining about the pre-existing return code conventions, since by now you know what I was thinking of saying. :) > int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents, > enum dma_data_direction dir, unsigned long attrs) > { > - int i; > + struct pci_p2pdma_map_state p2pdma_state = {}; Is it worth putting this stuff on the stack--is there a noticeable performance improvement from caching the state? Because if it's invisible, then simplicity is better. I suspect you're right, and that it *is* worth it, but it's good to know for real. > struct scatterlist *sg; > + int i, ret = 0; > > for_each_sg(sgl, sg, nents, i) { > + if (is_pci_p2pdma_page(sg_page(sg))) { > + ret = pci_p2pdma_map_segment(&p2pdma_state, dev, sg, > + attrs); > + if (ret < 0) { > + goto out_unmap; > + } else if (ret) { > + ret = 0; > + continue; Is this a bug? If neither of those "if" branches fires (ret == 0), then the code (probably unintentionally) falls through and continues on to attempt to call dma_direct_map_page()--despite being a PCI_P2PDMA page! See below for suggestions: > + } > + } > + > sg->dma_address = dma_direct_map_page(dev, sg_page(sg), > sg->offset, sg->length, dir, attrs); > if (sg->dma_address == DMA_MAPPING_ERROR) This is another case in which "continue" is misleading and not as good as "else". Because unless I'm wrong above, you really only want to take one path *or* the other. Also, the "else if (ret)" can be simplified to just setting ret = 0 unconditionally. Given all that, here's a suggested alternative, which is both shorter and clearer, IMHO: for_each_sg(sgl, sg, nents, i) { if (is_pci_p2pdma_page(sg_page(sg))) { ret = pci_p2pdma_map_segment(&p2pdma_state, dev, sg, attrs); if (ret < 0) goto out_unmap; else ret = 0; } else { sg->dma_address = dma_direct_map_page(dev, sg_page(sg), sg->offset, sg->length, dir, attrs); if (sg->dma_address == DMA_MAPPING_ERROR) goto out_unmap; sg_dma_len(sg) = sg->length; } } thanks,
On 5/2/21 4:28 PM, John Hubbard wrote: > On 4/8/21 10:01 AM, Logan Gunthorpe wrote: ... >> @@ -387,19 +388,37 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl, > > This routine now deserves a little bit of commenting, now that it is > doing less obvious things. How about something like this: > > /* > * Unmaps pages, except for PCI_P2PDMA pages, which were never mapped in the > * first place. Instead of unmapping PCI_P2PDMA entries, simply remove the > * SG_PCI_P2PDMA mark > */ I got that kind of wrong. They *were* mapped, but need to be left mostly alone...maybe you can word it better. Here's my second draft: /* * Unmaps pages, except for PCI_P2PDMA pages, which should not be unmapped at * this point. Instead of unmapping PCI_P2PDMA entries, simply remove the * SG_PCI_P2PDMA mark. */ ...am I getting close? :) thanks,
On 2021-05-02 5:28 p.m., John Hubbard wrote: >> @@ -387,19 +388,37 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl, > > This routine now deserves a little bit of commenting, now that it is > doing less obvious things. How about something like this: > > /* > * Unmaps pages, except for PCI_P2PDMA pages, which were never mapped in the > * first place. Instead of unmapping PCI_P2PDMA entries, simply remove the > * SG_PCI_P2PDMA mark > */ > void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl, > int nents, enum dma_data_direction dir, unsigned long attrs) > { > Ok. >> struct scatterlist *sg; >> int i; >> >> - for_each_sg(sgl, sg, nents, i) >> + for_each_sg(sgl, sg, nents, i) { >> + if (sg_is_pci_p2pdma(sg)) { >> + sg_unmark_pci_p2pdma(sg); >> + continue; >> + } >> + >> dma_direct_unmap_page(dev, sg->dma_address, sg_dma_len(sg), dir, >> attrs); >> + } > > The same thing can be achieved with fewer lines and a bit more clarity. > Can we please do it like this instead: > > for_each_sg(sgl, sg, nents, i) { > if (sg_is_pci_p2pdma(sg)) > sg_unmark_pci_p2pdma(sg); > else > dma_direct_unmap_page(dev, sg->dma_address, > sg_dma_len(sg), dir, attrs); > } > > That's debatable (the way I did it emphasizes the common case). But I'll consider changing it. > > Also here, a block comment for the function would be nice. How about > approximately this: > > /* > * Maps each SG segment. Returns the number of entries mapped, or 0 upon > * failure. If any entry could not be mapped, then no entries are mapped. > */ > > I'll stop complaining about the pre-existing return code conventions, > since by now you know what I was thinking of saying. :) Not really part of this patchset... Seems like if you think there should be a comment like that here, you should send a patch. But this patch starts returning a negative value here. >> int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents, >> enum dma_data_direction dir, unsigned long attrs) >> { >> - int i; >> + struct pci_p2pdma_map_state p2pdma_state = {}; > > Is it worth putting this stuff on the stack--is there a noticeable > performance improvement from caching the state? Because if it's > invisible, then simplicity is better. I suspect you're right, and that > it *is* worth it, but it's good to know for real. > >> struct scatterlist *sg; >> + int i, ret = 0; >> >> for_each_sg(sgl, sg, nents, i) { >> + if (is_pci_p2pdma_page(sg_page(sg))) { >> + ret = pci_p2pdma_map_segment(&p2pdma_state, dev, sg, >> + attrs); >> + if (ret < 0) { >> + goto out_unmap; >> + } else if (ret) { >> + ret = 0; >> + continue; > > Is this a bug? If neither of those "if" branches fires (ret == 0), then > the code (probably unintentionally) falls through and continues on to > attempt to call dma_direct_map_page()--despite being a PCI_P2PDMA page! No, it's not a bug. Per the documentation of pci_p2pdma_map_segment(), if it returns zero the segment should be mapped normally. P2PDMA pages must be mapped with physical addresses (or IOVA addresses) if the TLPS for the transaction will go through the host bridge. > See below for suggestions: > >> + } >> + } >> + >> sg->dma_address = dma_direct_map_page(dev, sg_page(sg), >> sg->offset, sg->length, dir, attrs); >> if (sg->dma_address == DMA_MAPPING_ERROR) > > This is another case in which "continue" is misleading and not as good > as "else". Because unless I'm wrong above, you really only want to take > one path *or* the other. No, per above, it's not one path or the other. If it's a P2PDMA page it may still need to be mapped normally. > Also, the "else if (ret)" can be simplified to just setting ret = 0 > unconditionally. I don't follow. If ret is set, we need to unset it before the end of the loop. > Given all that, here's a suggested alternative, which is both shorter > and clearer, IMHO: > > for_each_sg(sgl, sg, nents, i) { > if (is_pci_p2pdma_page(sg_page(sg))) { > ret = pci_p2pdma_map_segment(&p2pdma_state, dev, sg, > attrs); > if (ret < 0) > goto out_unmap; > else > ret = 0; > } else { > sg->dma_address = dma_direct_map_page(dev, sg_page(sg), > sg->offset, sg->length, dir, attrs); > if (sg->dma_address == DMA_MAPPING_ERROR) > goto out_unmap; > sg_dma_len(sg) = sg->length; > } > } No, per the comments above, this does not accomplish the same thing and is not correct. I'll try to add a comment to the code to make it more clearer. But the kernel doc on pci_p2pdma_map_segment() does mention what must be done for different return values explicitly. Logan
Oops missed a comment: On 2021-05-02 5:28 p.m., John Hubbard wrote: >> int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents, >> enum dma_data_direction dir, unsigned long attrs) >> { >> - int i; >> + struct pci_p2pdma_map_state p2pdma_state = {}; > > Is it worth putting this stuff on the stack--is there a noticeable > performance improvement from caching the state? Because if it's > invisible, then simplicity is better. I suspect you're right, and that > it *is* worth it, but it's good to know for real. I haven't measured it (it would be hard to measure), but I think it's fairly clear here. Without the state, xa_load() would need to be called on *every* page in an SGL that maps only P2PDMA memory from one device. With the state, it only needs to be called once. xa_load() is cheap, but it is not that cheap. There's essentially the same optimization in get_user_pages for ZONE_DEVICE pages. So, if it is necessary there, it should be necessary here. Logan
On 2021-05-02 5:32 p.m., John Hubbard wrote: > On 5/2/21 4:28 PM, John Hubbard wrote: >> On 4/8/21 10:01 AM, Logan Gunthorpe wrote: > ... >>> @@ -387,19 +388,37 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl, >> >> This routine now deserves a little bit of commenting, now that it is >> doing less obvious things. How about something like this: >> >> /* >> * Unmaps pages, except for PCI_P2PDMA pages, which were never mapped in the >> * first place. Instead of unmapping PCI_P2PDMA entries, simply remove the >> * SG_PCI_P2PDMA mark >> */ > > I got that kind of wrong. They *were* mapped, but need to be left mostly > alone...maybe you can word it better. Here's my second draft: > > /* > * Unmaps pages, except for PCI_P2PDMA pages, which should not be unmapped at > * this point. Instead of unmapping PCI_P2PDMA entries, simply remove the > * SG_PCI_P2PDMA mark. > */ > > ...am I getting close? :) I don't think your original comment was wrong per se. But I guess it depends on your definition of "mapped". In dma-direct the physical address is added to the SGL and, on some arches, the address has to be synced on unmap. With P2PDMA, the PCI bus address is sometimes added to the SGL and no sync is necessary at the end. Logan
On 5/3/21 10:04 AM, Logan Gunthorpe wrote: > Oops missed a comment: > > On 2021-05-02 5:28 p.m., John Hubbard wrote: >>> int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents, >>> enum dma_data_direction dir, unsigned long attrs) >>> { >>> - int i; >>> + struct pci_p2pdma_map_state p2pdma_state = {}; >> >> Is it worth putting this stuff on the stack--is there a noticeable >> performance improvement from caching the state? Because if it's >> invisible, then simplicity is better. I suspect you're right, and that >> it *is* worth it, but it's good to know for real. > > I haven't measured it (it would be hard to measure), but I think it's > fairly clear here. Without the state, xa_load() would need to be called > on *every* page in an SGL that maps only P2PDMA memory from one device. > With the state, it only needs to be called once. xa_load() is cheap, but > it is not that cheap. OK, thanks for spelling it out for me. :) > > There's essentially the same optimization in get_user_pages for > ZONE_DEVICE pages. So, if it is necessary there, it should be necessary > here. > Right, that's a pretty solid example. thanks,
On 5/3/21 9:55 AM, Logan Gunthorpe wrote: ... >> The same thing can be achieved with fewer lines and a bit more clarity. >> Can we please do it like this instead: >> >> for_each_sg(sgl, sg, nents, i) { >> if (sg_is_pci_p2pdma(sg)) >> sg_unmark_pci_p2pdma(sg); >> else >> dma_direct_unmap_page(dev, sg->dma_address, >> sg_dma_len(sg), dir, attrs); >> } >> >> > > That's debatable (the way I did it emphasizes the common case). But I'll > consider changing it. > Thanks for considering it. >> >> Also here, a block comment for the function would be nice. How about >> approximately this: >> >> /* >> * Maps each SG segment. Returns the number of entries mapped, or 0 upon >> * failure. If any entry could not be mapped, then no entries are mapped. >> */ >> >> I'll stop complaining about the pre-existing return code conventions, >> since by now you know what I was thinking of saying. :) > > Not really part of this patchset... Seems like if you think there should > be a comment like that here, you should send a patch. But this patch > starts returning a negative value here. OK, that's fine. Like you say, that comment is rather beyond this patchset. >>> for_each_sg(sgl, sg, nents, i) { >>> + if (is_pci_p2pdma_page(sg_page(sg))) { >>> + ret = pci_p2pdma_map_segment(&p2pdma_state, dev, sg, >>> + attrs); >>> + if (ret < 0) { >>> + goto out_unmap; >>> + } else if (ret) { >>> + ret = 0; >>> + continue; >> >> Is this a bug? If neither of those "if" branches fires (ret == 0), then >> the code (probably unintentionally) falls through and continues on to >> attempt to call dma_direct_map_page()--despite being a PCI_P2PDMA page! > > No, it's not a bug. Per the documentation of pci_p2pdma_map_segment(), > if it returns zero the segment should be mapped normally. P2PDMA pages > must be mapped with physical addresses (or IOVA addresses) if the TLPS > for the transaction will go through the host bridge. Could we maybe put a little comment there, to that effect? It would be nice to call out that point, especially since it is common to miss one case (negative, 0, positive) when handling return values. Sort of like we used to put "// fallthrough" in the case statements. Not a big deal of course. > >> See below for suggestions: >> >>> + } >>> + } >>> + >>> sg->dma_address = dma_direct_map_page(dev, sg_page(sg), >>> sg->offset, sg->length, dir, attrs); >>> if (sg->dma_address == DMA_MAPPING_ERROR) >> >> This is another case in which "continue" is misleading and not as good >> as "else". Because unless I'm wrong above, you really only want to take >> one path *or* the other. > > No, per above, it's not one path or the other. If it's a P2PDMA page it > may still need to be mapped normally. > Right. That follows. thanks,
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index 002268262c9a..108dfb4ecbd5 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -13,6 +13,7 @@ #include <linux/vmalloc.h> #include <linux/set_memory.h> #include <linux/slab.h> +#include <linux/pci-p2pdma.h> #include "direct.h" /* @@ -387,19 +388,37 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl, struct scatterlist *sg; int i; - for_each_sg(sgl, sg, nents, i) + for_each_sg(sgl, sg, nents, i) { + if (sg_is_pci_p2pdma(sg)) { + sg_unmark_pci_p2pdma(sg); + continue; + } + dma_direct_unmap_page(dev, sg->dma_address, sg_dma_len(sg), dir, attrs); + } } #endif int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents, enum dma_data_direction dir, unsigned long attrs) { - int i; + struct pci_p2pdma_map_state p2pdma_state = {}; struct scatterlist *sg; + int i, ret = 0; for_each_sg(sgl, sg, nents, i) { + if (is_pci_p2pdma_page(sg_page(sg))) { + ret = pci_p2pdma_map_segment(&p2pdma_state, dev, sg, + attrs); + if (ret < 0) { + goto out_unmap; + } else if (ret) { + ret = 0; + continue; + } + } + sg->dma_address = dma_direct_map_page(dev, sg_page(sg), sg->offset, sg->length, dir, attrs); if (sg->dma_address == DMA_MAPPING_ERROR) @@ -411,7 +430,7 @@ int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents, out_unmap: dma_direct_unmap_sg(dev, sgl, i, dir, attrs | DMA_ATTR_SKIP_CPU_SYNC); - return 0; + return ret; } dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t paddr,
Add PCI P2PDMA support for dma_direct_map_sg() so that it can map PCI P2PDMA pages directly without a hack in the callers. This allows for heterogeneous SGLs that contain both P2PDMA and regular pages. SGL segments that contain PCI bus addresses are marked with sg_mark_pci_p2pdma() and are ignored when unmapped. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> --- kernel/dma/direct.c | 25 ++++++++++++++++++++++--- 1 file changed, 22 insertions(+), 3 deletions(-)