Message ID | 20160310160522.GB1111@bigcity.dyn.berto.se (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Geert Uytterhoeven |
Headers | show |
On Thu, Mar 10, 2016 at 05:05:23PM +0100, Niklas S??derlund wrote: > Hi Christoph, > > On 2016-03-07 23:38:47 -0800, Christoph Hellwig wrote: > > Please add some documentation on where/how this should be used. It's > > not a very obvious interface. > > Good idea, I have added the following to Documentation/DMA-API.txt and > folded it in to this patch. Do you feel it's adequate and do you know > anywhere else I should add documentation? > > diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt > index 45ef3f2..248556a 100644 > --- a/Documentation/DMA-API.txt > +++ b/Documentation/DMA-API.txt > @@ -277,14 +277,29 @@ and <size> parameters are provided to do partial page mapping, it is > recommended that you never use these unless you really know what the > cache width is. > > +dma_addr_t > +dma_map_resource(struct device *dev, phys_addr_t phys_addr, size_t size, > + enum dma_data_direction dir, struct dma_attrs *attrs) > + > +Maps a MMIO region so it can be accessed by the device and returns the > +DMA address of the memory. API should only be used to map device MMIO, > +mapping of RAM is not permitted. > + > +void > +dma_unmap_resource(struct device *dev, dma_addr_t addr, size_t size, > + enum dma_data_direction dir, struct dma_attrs *attrs) Using function prototypes in documentation may not be a great idea as you are explaining the role of this function and not arguments. > + > +Unmaps the IOMMU region previously mapped. All the parameters passed in > +must be identical to those passed in (and returned) by the mapping API. > + > int > dma_mapping_error(struct device *dev, dma_addr_t dma_addr) > > -In some circumstances dma_map_single() and dma_map_page() will fail to create > -a mapping. A driver can check for these errors by testing the returned > -DMA address with dma_mapping_error(). A non-zero return value means the mapping > -could not be created and the driver should take appropriate action (e.g. > -reduce current DMA mapping usage or delay and try again later). > +In some circumstances dma_map_single(), dma_map_page() and dma_map_resource() > +will fail to create a mapping. A driver can check for these errors by testing > +the returned DMA address with dma_mapping_error(). A non-zero return value > +means the mapping could not be created and the driver should take appropriate > +action (e.g. reduce current DMA mapping usage or delay and try again later). > > -- > Regards, > Niklas Söderlund
On Thu, Mar 10, 2016 at 8:05 AM, Niklas S??derlund <niklas.soderlund@ragnatech.se> wrote: > Hi Christoph, > > On 2016-03-07 23:38:47 -0800, Christoph Hellwig wrote: >> Please add some documentation on where/how this should be used. It's >> not a very obvious interface. > > Good idea, I have added the following to Documentation/DMA-API.txt and > folded it in to this patch. Do you feel it's adequate and do you know > anywhere else I should add documentation? > > diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt > index 45ef3f2..248556a 100644 > --- a/Documentation/DMA-API.txt > +++ b/Documentation/DMA-API.txt > @@ -277,14 +277,29 @@ and <size> parameters are provided to do partial page mapping, it is > recommended that you never use these unless you really know what the > cache width is. > > +dma_addr_t > +dma_map_resource(struct device *dev, phys_addr_t phys_addr, size_t size, > + enum dma_data_direction dir, struct dma_attrs *attrs) > + > +Maps a MMIO region so it can be accessed by the device and returns the > +DMA address of the memory. API should only be used to map device MMIO, > +mapping of RAM is not permitted. > + I think it is confusing to use the dma_ prefix for this peer-to-peer mmio functionality. dma_addr_t is a device's view of host memory. Something like bus_addr_t bus_map_resource(). Doesn't this routine also need the source device in addition to the target device? The resource address is from the perspective of the host cpu, it may be a different address space in the view of two devices relative to each other.
On Thu, Mar 10, 2016 at 10:47:10PM -0800, Dan Williams wrote: > I think it is confusing to use the dma_ prefix for this peer-to-peer > mmio functionality. dma_addr_t is a device's view of host memory. > Something like bus_addr_t bus_map_resource(). Doesn't this routine > also need the source device in addition to the target device? The > resource address is from the perspective of the host cpu, it may be a > different address space in the view of two devices relative to each > other. Is it supposed to be per-mmio? It's in dma-mapping ops, and has dma in the name, so I suspected it's for some form of peer dma. But given that our dma APIs reuqire a struct page backing I have no idea how this even supposed to work, and this little documentation blurb still doesn't clear that up. So for now I'd like to NAK this patch until the use case can be explained clearly, and actually works.
Hi all, Thanks for your comments. On 2016-03-11 03:15:22 -0800, Christoph Hellwig wrote: > On Thu, Mar 10, 2016 at 10:47:10PM -0800, Dan Williams wrote: > > I think it is confusing to use the dma_ prefix for this peer-to-peer > > mmio functionality. dma_addr_t is a device's view of host memory. > > Something like bus_addr_t bus_map_resource(). Doesn't this routine > > also need the source device in addition to the target device? The > > resource address is from the perspective of the host cpu, it may be a > > different address space in the view of two devices relative to each > > other. > > Is it supposed to be per-mmio? It's in dma-mapping ops, and has dma > in the name, so I suspected it's for some form of peer dma. But given > that our dma APIs reuqire a struct page backing I have no idea how this > even supposed to work, and this little documentation blurb still doesn't > clear that up. > > So for now I'd like to NAK this patch until the use case can be > explained clearly, and actually works. I can explain the use case and maybe we can figure out if this approach is the correct one to solve it. The problem is that I have devices behind an IOMMU which I would like to use with DMA. Vinod recently moved forward with his and Linus Walleij patch '[PATCH] dmaengine: use phys_addr_t for slave configuration' which clarifies that the DMA slave address provided by a client is the physical address. This puts the task of mapping the DMA slave address from a phys_addr_t to a dma_addr_t on the DMA engine. Without an IOMMU this is easy since the phys_addr_t and dma_addr_t are the same and no special care is needed. However if you have a IOMMU you need to map the DMA slave phys_addr_t to a dma_addr_t using something like this. Is it not very similar to dma_map_single() where one maps processor virtual memory (instead if MMIO) so that it can be used with DMA slaves?
Hi Dan, On 11/03/16 06:47, Dan Williams wrote: > On Thu, Mar 10, 2016 at 8:05 AM, Niklas S??derlund > <niklas.soderlund@ragnatech.se> wrote: >> Hi Christoph, >> >> On 2016-03-07 23:38:47 -0800, Christoph Hellwig wrote: >>> Please add some documentation on where/how this should be used. It's >>> not a very obvious interface. >> >> Good idea, I have added the following to Documentation/DMA-API.txt and >> folded it in to this patch. Do you feel it's adequate and do you know >> anywhere else I should add documentation? >> >> diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt >> index 45ef3f2..248556a 100644 >> --- a/Documentation/DMA-API.txt >> +++ b/Documentation/DMA-API.txt >> @@ -277,14 +277,29 @@ and <size> parameters are provided to do partial page mapping, it is >> recommended that you never use these unless you really know what the >> cache width is. >> >> +dma_addr_t >> +dma_map_resource(struct device *dev, phys_addr_t phys_addr, size_t size, >> + enum dma_data_direction dir, struct dma_attrs *attrs) >> + >> +Maps a MMIO region so it can be accessed by the device and returns the >> +DMA address of the memory. API should only be used to map device MMIO, >> +mapping of RAM is not permitted. >> + > > I think it is confusing to use the dma_ prefix for this peer-to-peer > mmio functionality. dma_addr_t is a device's view of host memory. > Something like bus_addr_t bus_map_resource(). Doesn't this routine > also need the source device in addition to the target device? The > resource address is from the perspective of the host cpu, it may be a > different address space in the view of two devices relative to each > other. Hmm, the trouble with that is that when the DMA master is behind an IOMMU, the address space as seen by the device is dynamic and whatever we decide it to be, so there is no distinction between a "DMA" address and a "bus" address. In practice the dmaengine API has clearly worked for however long with slave MMIO addresses being a dma_addr_t, and it doesn't look like anyone objected to the change to phys_addr_t in -next either. If nothing is using bus_addr_t anyway, what's the right thing to do? Looking up through higher abstraction layers, we have the likes of struct snd_dmaengine_dai_dma_data also expecting the slave address to be a dma_addr_t, leading to things like the direct casting in bcm2835_i2s_probe() for the non-IOMMU dma != phys != bus case that could also be cleaned up with this proposed interface. Robin.
On Fri, Mar 11, 2016 at 5:46 AM, Robin Murphy <robin.murphy@arm.com> wrote: > Hi Dan, > > > On 11/03/16 06:47, Dan Williams wrote: >> >> On Thu, Mar 10, 2016 at 8:05 AM, Niklas S??derlund >> <niklas.soderlund@ragnatech.se> wrote: >>> >>> Hi Christoph, >>> >>> On 2016-03-07 23:38:47 -0800, Christoph Hellwig wrote: >>>> >>>> Please add some documentation on where/how this should be used. It's >>>> not a very obvious interface. >>> >>> >>> Good idea, I have added the following to Documentation/DMA-API.txt and >>> folded it in to this patch. Do you feel it's adequate and do you know >>> anywhere else I should add documentation? >>> >>> diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt >>> index 45ef3f2..248556a 100644 >>> --- a/Documentation/DMA-API.txt >>> +++ b/Documentation/DMA-API.txt >>> @@ -277,14 +277,29 @@ and <size> parameters are provided to do partial >>> page mapping, it is >>> recommended that you never use these unless you really know what the >>> cache width is. >>> >>> +dma_addr_t >>> +dma_map_resource(struct device *dev, phys_addr_t phys_addr, size_t size, >>> + enum dma_data_direction dir, struct dma_attrs *attrs) >>> + >>> +Maps a MMIO region so it can be accessed by the device and returns the >>> +DMA address of the memory. API should only be used to map device MMIO, >>> +mapping of RAM is not permitted. >>> + >> >> >> I think it is confusing to use the dma_ prefix for this peer-to-peer >> mmio functionality. dma_addr_t is a device's view of host memory. >> Something like bus_addr_t bus_map_resource(). Doesn't this routine >> also need the source device in addition to the target device? The >> resource address is from the perspective of the host cpu, it may be a >> different address space in the view of two devices relative to each >> other. > > > Hmm, the trouble with that is that when the DMA master is behind an IOMMU, > the address space as seen by the device is dynamic and whatever we decide it > to be, so there is no distinction between a "DMA" address and a "bus" > address. > > In practice the dmaengine API has clearly worked for however long with slave > MMIO addresses being a dma_addr_t, and it doesn't look like anyone objected > to the change to phys_addr_t in -next either. If nothing is using bus_addr_t > anyway, what's the right thing to do? Looking up through higher abstraction > layers, we have the likes of struct snd_dmaengine_dai_dma_data also > expecting the slave address to be a dma_addr_t, leading to things like the > direct casting in bcm2835_i2s_probe() for the non-IOMMU dma != phys != bus > case that could also be cleaned up with this proposed interface. > So the "bus_addr_t" reaction was prompted by the recent activity of RDMA developers looking to re-use the devm_memremap_pages() api. That enabling is looking at how to setup peer-to-peer PCI-E cycles for an RDMA device to deliver data to another local device without taking a round trip through host memory. I understand the history of the dmaengine-slave implementation, but it seems we're getting to point where we need a less overloaded identifier than "dma" for the case of devices talking to each other.
On Fri, Mar 11, 2016 at 01:58:46PM +0100, Niklas S?derlund wrote: > Without an IOMMU this is easy since the phys_addr_t and dma_addr_t are > the same and no special care is needed. However if you have a IOMMU you > need to map the DMA slave phys_addr_t to a dma_addr_t using something > like this. Is it not very similar to dma_map_single() where one maps > processor virtual memory (instead if MMIO) so that it can be used with > DMA slaves? It's similar, but I don't think this actually works as a general case as there are quite a few places that expect to be able to have a struct page for a physical address. We'd at least need a very careful audit for that case.
Hello, On Tuesday 15 March 2016 01:22:54 Christoph Hellwig wrote: > On Fri, Mar 11, 2016 at 01:58:46PM +0100, Niklas S?derlund wrote: > > Without an IOMMU this is easy since the phys_addr_t and dma_addr_t are > > the same and no special care is needed. However if you have a IOMMU you > > need to map the DMA slave phys_addr_t to a dma_addr_t using something > > like this. Is it not very similar to dma_map_single() where one maps > > processor virtual memory (instead if MMIO) so that it can be used with > > DMA slaves? > > It's similar, but I don't think this actually works as a general case > as there are quite a few places that expect to be able to have a > struct page for a physical address. We'd at least need a very careful > audit for that case. The good news is that, given that no code uses this new API at the moment, there isn't much to audit. The patch series implements the resource mapping for arch/arm only, and makes use of it in the rcar-dmac driver only. Would you like anything audited else than the arch/arm dma mapping implementation, the rcar-dmac driver and the code that then deals with the dma addresses (I'm thinking about the IOMMU subsystem and the ipmmu-vmsa driver in particular) ?
On Thu, Mar 17, 2016 at 01:33:51PM +0200, Laurent Pinchart wrote: > The good news is that, given that no code uses this new API at the moment, > there isn't much to audit. The patch series implements the resource mapping > for arch/arm only, and makes use of it in the rcar-dmac driver only. Would you > like anything audited else than the arch/arm dma mapping implementation, the > rcar-dmac driver and the code that then deals with the dma addresses (I'm > thinking about the IOMMU subsystem and the ipmmu-vmsa driver in particular) ? Yes, it would be good to do an audit of all the ARM dma_ops as well as generic code like drivers/base/dma-*.c, lib/dma-debug.c and include/linux/dma-*.h
Hi Christoph, On 2016-03-21 08:26:01 -0700, Christoph Hellwig wrote: > On Thu, Mar 17, 2016 at 01:33:51PM +0200, Laurent Pinchart wrote: > > The good news is that, given that no code uses this new API at the moment, > > there isn't much to audit. The patch series implements the resource mapping > > for arch/arm only, and makes use of it in the rcar-dmac driver only. Would you > > like anything audited else than the arch/arm dma mapping implementation, the > > rcar-dmac driver and the code that then deals with the dma addresses (I'm > > thinking about the IOMMU subsystem and the ipmmu-vmsa driver in particular) ? > > Yes, it would be good to do an audit of all the ARM dma_ops as well > as generic code like drivers/base/dma-*.c, lib/dma-debug.c and > include/linux/dma-*.h I have now done an audit to the best of my abilities, thanks to Laurent for pointing me in the right direction. And from what I can tell we are good. * drivers/dma/sh/rcar-dmac.c Once the phys_addr_t is mapped to a dma_addr_t using dma_map_resource() it is only used to check that the transfere do not cross 4GB boundaries and then only directly written to HW registers. * drivers/iommu/iommu.c - iommu_map() Check that it's align to min page size or return -EINVAL then calls domain->ops->map() * drivers/iommu/ipmmu-vmsa.c - ipmmu_map() No logic only calls domain->ops->map() * drivers/iommu/io-pgtable-arm.c - arm_lpae_map() No logic only calls __arm_lpae_map() - __arm_lpae_map() No logic only calls arm_lpae_init_pte() - arm_lpae_init_pte() Used to get a pte: pte |= pfn_to_iopte(paddr >> data->pg_shift, data); * drivers/iommu/io-pgtable-arm-v7s.c - arm_v7s_map() No logic only calls __arm_v7s_map() - __arm_v7s_map() No logic only calls arm_v7s_init_pte() - arm_v7s_init_pte Used to get a pte: pte |= paddr & ARM_V7S_LVL_MASK(lvl); * ARM dma-mapping - dma_unmap_* Only valid unmap is dma_unmap_resource() all others are an invalid use case. - dma_sync_single_* Invalid use case, memmory that is mapped is device memmory - dma_common_mmap() and dma_mmap_attrs() Invalid use case - dma_common_get_sgtable() and dma_get_sgtable_attrs() Invalid use case, only for dma_alloc_* allocated memory, - dma_mapping_error() OK
Hi Christoph, Have you had time to look at the audit? Is there anything else I can do make progress on this? On 2016-04-13 15:29:16 +0200, Niklas Söderlund wrote: > Hi Christoph, > > On 2016-03-21 08:26:01 -0700, Christoph Hellwig wrote: > > On Thu, Mar 17, 2016 at 01:33:51PM +0200, Laurent Pinchart wrote: > > > The good news is that, given that no code uses this new API at the moment, > > > there isn't much to audit. The patch series implements the resource mapping > > > for arch/arm only, and makes use of it in the rcar-dmac driver only. Would you > > > like anything audited else than the arch/arm dma mapping implementation, the > > > rcar-dmac driver and the code that then deals with the dma addresses (I'm > > > thinking about the IOMMU subsystem and the ipmmu-vmsa driver in particular) ? > > > > Yes, it would be good to do an audit of all the ARM dma_ops as well > > as generic code like drivers/base/dma-*.c, lib/dma-debug.c and > > include/linux/dma-*.h > > I have now done an audit to the best of my abilities, thanks to Laurent > for pointing me in the right direction. And from what I can tell we are > good. > > * drivers/dma/sh/rcar-dmac.c > Once the phys_addr_t is mapped to a dma_addr_t using > dma_map_resource() it is only used to check that the transfere do not > cross 4GB boundaries and then only directly written to HW registers. > > * drivers/iommu/iommu.c > - iommu_map() > Check that it's align to min page size or return -EINVAL then calls > domain->ops->map() > > * drivers/iommu/ipmmu-vmsa.c > - ipmmu_map() > No logic only calls domain->ops->map() > > * drivers/iommu/io-pgtable-arm.c > - arm_lpae_map() > No logic only calls __arm_lpae_map() > - __arm_lpae_map() > No logic only calls arm_lpae_init_pte() > - arm_lpae_init_pte() > Used to get a pte: > pte |= pfn_to_iopte(paddr >> data->pg_shift, data); > > * drivers/iommu/io-pgtable-arm-v7s.c > - arm_v7s_map() > No logic only calls __arm_v7s_map() > - __arm_v7s_map() > No logic only calls arm_v7s_init_pte() > - arm_v7s_init_pte > Used to get a pte: > pte |= paddr & ARM_V7S_LVL_MASK(lvl); > > * ARM dma-mapping > - dma_unmap_* > Only valid unmap is dma_unmap_resource() all others are an invalid > use case. > - dma_sync_single_* > Invalid use case, memmory that is mapped is device memmory > - dma_common_mmap() and dma_mmap_attrs() > Invalid use case > - dma_common_get_sgtable() and dma_get_sgtable_attrs() > Invalid use case, only for dma_alloc_* allocated memory, > - dma_mapping_error() > OK
On Wed, Apr 13, 2016 at 03:29:17PM +0200, Niklas S?derlund wrote: > > Yes, it would be good to do an audit of all the ARM dma_ops as well > > as generic code like drivers/base/dma-*.c, lib/dma-debug.c and > > include/linux/dma-*.h What about things like the phys_addr() helper in lib/dma-debug.c? That was in fact what prompted my question for an audit, and it seems to not even feature on your list. What patterns / symbols did you look for?
Hi Christoph, On 2016-04-21 06:49:42 -0700, Christoph Hellwig wrote: > On Wed, Apr 13, 2016 at 03:29:17PM +0200, Niklas S?derlund wrote: > > > Yes, it would be good to do an audit of all the ARM dma_ops as well > > > as generic code like drivers/base/dma-*.c, lib/dma-debug.c and > > > include/linux/dma-*.h > > What about things like the phys_addr() helper in lib/dma-debug.c? That > was in fact what prompted my question for an audit, and it seems > to not even feature on your list. What patterns / symbols did you look > for? I have followed the call path from the usage in drivers/dma/sh/rcar-dmac.c and made sure the dma_addr_t is not used in a bad way. I also looked at the dma-mapping API and ruled out that the only function that make sens to use with a dma_addr_t from dma_map_resource() are dma_unmap_resource() and dma_mapping_error(). With that I can't see how a dma_addr_t would end up in lib/dma-debug.c. But I might be missing something? In the big picture do you feel the approach I have is the correct way to solve my problem? Provided we can work out this issues ofc?
On Mon, Apr 25, 2016 at 04:26:19PM +0200, Niklas S?derlund wrote: > I have followed the call path from the usage in > drivers/dma/sh/rcar-dmac.c and made sure the dma_addr_t is not used in a > bad way. The dma-debug routines are called from the generic code in include/linux/dma-mapping.h, and from my reading of the other patches in your series you are calling it for these as well.
Hi Christoph, On 2016-04-25 12:10:04 -0700, Christoph Hellwig wrote: > On Mon, Apr 25, 2016 at 04:26:19PM +0200, Niklas S?derlund wrote: > > I have followed the call path from the usage in > > drivers/dma/sh/rcar-dmac.c and made sure the dma_addr_t is not used in a > > bad way. > > The dma-debug routines are called from the generic code in > include/linux/dma-mapping.h, and from my reading of the other patches > in your series you are calling it for these as well. You are correct I have not consider that dma_mapping_error() call in to lib/dma-debug.c. I will see if I can make the dma_mapping_error() safe to use with a dma_addr_t obtained from dma_map_resource() and post a new series. Thanks for pointing this out!
diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt index 45ef3f2..248556a 100644 --- a/Documentation/DMA-API.txt +++ b/Documentation/DMA-API.txt @@ -277,14 +277,29 @@ and <size> parameters are provided to do partial page mapping, it is recommended that you never use these unless you really know what the cache width is. +dma_addr_t +dma_map_resource(struct device *dev, phys_addr_t phys_addr, size_t size, + enum dma_data_direction dir, struct dma_attrs *attrs) + +Maps a MMIO region so it can be accessed by the device and returns the +DMA address of the memory. API should only be used to map device MMIO, +mapping of RAM is not permitted. + +void +dma_unmap_resource(struct device *dev, dma_addr_t addr, size_t size, + enum dma_data_direction dir, struct dma_attrs *attrs) + +Unmaps the IOMMU region previously mapped. All the parameters passed in +must be identical to those passed in (and returned) by the mapping API. + int dma_mapping_error(struct device *dev, dma_addr_t dma_addr) -In some circumstances dma_map_single() and dma_map_page() will fail to create -a mapping. A driver can check for these errors by testing the returned -DMA address with dma_mapping_error(). A non-zero return value means the mapping -could not be created and the driver should take appropriate action (e.g. -reduce current DMA mapping usage or delay and try again later). +In some circumstances dma_map_single(), dma_map_page() and dma_map_resource() +will fail to create a mapping. A driver can check for these errors by testing +the returned DMA address with dma_mapping_error(). A non-zero return value +means the mapping could not be created and the driver should take appropriate +action (e.g. reduce current DMA mapping usage or delay and try again later). -- Regards, Niklas Söderlund