Message ID | alpine.DEB.2.02.1411111644490.26318@kaball.uk.xensource.com (mailing list archive)
---|---
State | Not Applicable
On 17/11/14 14:11, Stefano Stabellini wrote:
> Hi all,
> I am writing this email to ask for your advice.
>
> On architectures where dma addresses are different from physical
> addresses, it can be difficult to retrieve the physical address of a
> page from its dma address.
>
> Specifically this is the case for Xen on arm and arm64 but I think that
> other architectures might have the same issue.
>
> Knowing the physical address is necessary to be able to issue any
> required cache maintenance operations when unmap_page,
> sync_single_for_cpu and sync_single_for_device are called.
>
> Adding a struct page* parameter to unmap_page, sync_single_for_cpu and
> sync_single_for_device would make Linux dma handling on Xen on arm and
> arm64 much easier and quicker.

Using an opaque handle instead of struct page * would be more beneficial
for the Intel IOMMU driver. e.g.,

  typedef dma_addr_t dma_handle_t;

  dma_handle_t dma_map_single(struct device *dev, void *va, size_t size,
                              enum dma_data_direction dir);
  void dma_unmap_single(struct device *dev, dma_handle_t handle,
                        size_t size, enum dma_data_direction dir);

etc. Drivers would then use:

  dma_addr_t dma_addr(dma_handle_t handle);

to obtain the bus address from the handle.

> I think that other drivers have similar problems, such as the Intel
> IOMMU driver having to call find_iova and walking down an rbtree to get
> the physical address in its implementation of unmap_page.
>
> Callers have the struct page* in their hands already from the previous
> map_page call so it shouldn't be an issue for them. A problem does
> exist however: there are about 280 callers of dma_unmap_page and
> pci_unmap_page. We have even more callers of the dma_sync_single_for_*
> functions.

You will also need to fix dma_unmap_single() and pci_unmap_single()
(another 1000+ callers).
You may need to consider a parallel set of map/unmap API calls that
return/accept a handle, and then converting drivers one-by-one as
required, instead of trying to convert every single driver at once.

David

--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
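The opaque-handle idea suggested above can be modeled in plain userspace C. This is a hypothetical sketch, not the real kernel API: the thread only proposes `typedef dma_addr_t dma_handle_t;`, but modeling the handle as a struct shows the key benefit — the handle returned by map can carry enough state that unmap needs no reverse lookup (no find_iova rbtree walk, no iommu_iova_to_phys).

```c
#include <stdint.h>
#include <stddef.h>

typedef uint64_t dma_addr_t;

/* Hypothetical handle: carries both the bus address the device uses
 * and the CPU-side pointer that unmap/sync would otherwise have to
 * recover via an iova-to-phys translation. */
typedef struct {
    dma_addr_t bus_addr;
    void      *cpu_va;
} dma_handle_t;

/* Map: a real IOMMU mapper would allocate an IOVA and program page
 * tables; here we just fabricate a bus address for illustration. */
static dma_handle_t dma_map_single_model(void *va, size_t size)
{
    (void)size;
    dma_handle_t h = {
        .bus_addr = (uintptr_t)va + 0x80000000ull, /* fake IOVA */
        .cpu_va   = va,
    };
    return h;
}

/* Accessor drivers would use to hand the bus address to the device. */
static dma_addr_t dma_addr(dma_handle_t h)
{
    return h.bus_addr;
}

/* Unmap: the CPU address comes straight out of the handle --
 * no rbtree walk or hardware page-table lookup required. */
static void *dma_unmap_single_model(dma_handle_t h, size_t size)
{
    (void)size;
    return h.cpu_va;
}
```

The design trade-off is the one David raises: such a handle type changes the signature of every map/unmap call site, which is why he suggests introducing it as a parallel API rather than a flag-day conversion.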
On Mon, 17 Nov 2014, Stefano Stabellini wrote:
> Hi all,
> I am writing this email to ask for your advice.
>
> On architectures where dma addresses are different from physical
> addresses, it can be difficult to retrieve the physical address of a
> page from its dma address.
>
> Specifically this is the case for Xen on arm and arm64 but I think that
> other architectures might have the same issue.
>
> Knowing the physical address is necessary to be able to issue any
> required cache maintenance operations when unmap_page,
> sync_single_for_cpu and sync_single_for_device are called.
>
> Adding a struct page* parameter to unmap_page, sync_single_for_cpu and
> sync_single_for_device would make Linux dma handling on Xen on arm and
> arm64 much easier and quicker.
>
> I think that other drivers have similar problems, such as the Intel
> IOMMU driver having to call find_iova and walking down an rbtree to get
> the physical address in its implementation of unmap_page.
>
> Callers have the struct page* in their hands already from the previous
> map_page call so it shouldn't be an issue for them. A problem does
> exist however: there are about 280 callers of dma_unmap_page and
> pci_unmap_page. We have even more callers of the dma_sync_single_for_*
> functions.
>
>
>
> Is such a change even conceivable? How would one go about it?
>
> I think that Xen would not be the only one to gain from it, but I would
> like to have a confirmation from others: given the magnitude of the
> changes involved I would actually prefer to avoid them unless multiple
> drivers/archs/subsystems could really benefit from them.

Given the lack of interest from the community, I am going to drop this
idea.
> Cheers,
>
> Stefano
>
>
> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> index d5d3881..158a765 100644
> --- a/include/linux/dma-mapping.h
> +++ b/include/linux/dma-mapping.h
> @@ -31,8 +31,9 @@ struct dma_map_ops {
>  			   unsigned long offset, size_t size,
>  			   enum dma_data_direction dir,
>  			   struct dma_attrs *attrs);
> -	void (*unmap_page)(struct device *dev, dma_addr_t dma_handle,
> -			   size_t size, enum dma_data_direction dir,
> +	void (*unmap_page)(struct device *dev, struct page *page,
> +			   dma_addr_t dma_handle, size_t size,
> +			   enum dma_data_direction dir,
>  			   struct dma_attrs *attrs);
>  	int (*map_sg)(struct device *dev, struct scatterlist *sg,
>  		      int nents, enum dma_data_direction dir,
> @@ -41,10 +42,10 @@ struct dma_map_ops {
>  			struct scatterlist *sg, int nents,
>  			enum dma_data_direction dir,
>  			struct dma_attrs *attrs);
> -	void (*sync_single_for_cpu)(struct device *dev,
> +	void (*sync_single_for_cpu)(struct device *dev, struct page *page,
>  				    dma_addr_t dma_handle, size_t size,
>  				    enum dma_data_direction dir);
> -	void (*sync_single_for_device)(struct device *dev,
> +	void (*sync_single_for_device)(struct device *dev, struct page *page,
>  				       dma_addr_t dma_handle, size_t size,
>  				       enum dma_data_direction dir);
>  	void (*sync_sg_for_cpu)(struct device *dev,
On Fri, Nov 21 2014 at 03:48:33 AM, Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> On Mon, 17 Nov 2014, Stefano Stabellini wrote:
>> Hi all,
>> I am writing this email to ask for your advice.
>>
>> On architectures where dma addresses are different from physical
>> addresses, it can be difficult to retrieve the physical address of a
>> page from its dma address.
>>
>> Specifically this is the case for Xen on arm and arm64 but I think that
>> other architectures might have the same issue.
>>
>> Knowing the physical address is necessary to be able to issue any
>> required cache maintenance operations when unmap_page,
>> sync_single_for_cpu and sync_single_for_device are called.
>>
>> Adding a struct page* parameter to unmap_page, sync_single_for_cpu and
>> sync_single_for_device would make Linux dma handling on Xen on arm and
>> arm64 much easier and quicker.
>>
>> I think that other drivers have similar problems, such as the Intel
>> IOMMU driver having to call find_iova and walking down an rbtree to get
>> the physical address in its implementation of unmap_page.
>>
>> Callers have the struct page* in their hands already from the previous
>> map_page call so it shouldn't be an issue for them. A problem does
>> exist however: there are about 280 callers of dma_unmap_page and
>> pci_unmap_page. We have even more callers of the dma_sync_single_for_*
>> functions.
>>
>>
>>
>> Is such a change even conceivable? How would one go about it?
>>
>> I think that Xen would not be the only one to gain from it, but I would
>> like to have a confirmation from others: given the magnitude of the
>> changes involved I would actually prefer to avoid them unless multiple
>> drivers/archs/subsystems could really benefit from them.
>
> Given the lack of interest from the community, I am going to drop this
> idea.

Actually it sounds like the right API design to me. As a bonus it should
help performance a bit as well.
For example, the current implementations of
dma_sync_single_for_{cpu,device} and dma_unmap_page on ARM while using
the IOMMU mapper (arm_iommu_sync_single_for_{cpu,device},
arm_iommu_unmap_page) all call iommu_iova_to_phys, which generally
results in a page table walk or a hardware register write/poll/read.

The problem, as you mentioned, is that there are a ton of callers of the
existing APIs. I think David Vrabel had a good suggestion for dealing
with this:

On Mon, Nov 17 2014 at 06:43:46 AM, David Vrabel <david.vrabel@citrix.com> wrote:
> You may need to consider a parallel set of map/unmap API calls that
> return/accept a handle, and then converting drivers one-by-one as
> required, instead of trying to convert every single driver at once.

However, I'm not sure whether the costs of having a parallel set of APIs
outweigh the benefits of a cleaner API and a slight performance boost...
But I hope the idea isn't completely abandoned without some profiling or
other evidence of its benefits (e.g. patches showing how drivers could
be simplified with the new APIs).

-Mitch
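The reverse lookup described above can be sketched as a userspace C model (hypothetical types and names, not the real intel-iommu or arm-smmu code): today's unmap path has to search an IOVA-to-phys structure (an rbtree in the Intel driver, hardware page tables on ARM), while the proposed API would get the physical address for free from the struct page * the caller already holds.

```c
#include <stdint.h>
#include <stddef.h>

typedef uint64_t dma_addr_t;
typedef uint64_t phys_addr_t;

/* Toy stand-in for struct page: records its physical address. */
struct page { phys_addr_t phys; };

/* Toy stand-in for the driver's IOVA tracking structure: a flat
 * table we must search, playing the role of the rbtree / page-table
 * walk mentioned in the thread. */
struct iova_entry { dma_addr_t iova; phys_addr_t phys; };

static const struct iova_entry iova_table[] = {
    { 0x1000, 0xa0000 },
    { 0x2000, 0xb0000 },
    { 0x3000, 0xc0000 },
};

/* Current unmap path: reverse-lookup phys from the dma address. */
static phys_addr_t unmap_via_lookup(dma_addr_t iova)
{
    for (size_t i = 0; i < sizeof(iova_table)/sizeof(iova_table[0]); i++)
        if (iova_table[i].iova == iova)
            return iova_table[i].phys;
    return 0; /* not mapped */
}

/* Proposed unmap path: the caller passes the page back, so the
 * physical address is one dereference away -- this is the work
 * iommu_iova_to_phys would no longer need to do. */
static phys_addr_t unmap_via_page(const struct page *page)
{
    return page->phys;
}
```

In the real drivers the lookup is logarithmic or a multi-level table walk rather than linear, but the asymmetry is the same: one path searches, the other just reads.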
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index d5d3881..158a765 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -31,8 +31,9 @@ struct dma_map_ops {
 			   unsigned long offset, size_t size,
 			   enum dma_data_direction dir,
 			   struct dma_attrs *attrs);
-	void (*unmap_page)(struct device *dev, dma_addr_t dma_handle,
-			   size_t size, enum dma_data_direction dir,
+	void (*unmap_page)(struct device *dev, struct page *page,
+			   dma_addr_t dma_handle, size_t size,
+			   enum dma_data_direction dir,
 			   struct dma_attrs *attrs);
 	int (*map_sg)(struct device *dev, struct scatterlist *sg,
 		      int nents, enum dma_data_direction dir,
@@ -41,10 +42,10 @@ struct dma_map_ops {
 			struct scatterlist *sg, int nents,
 			enum dma_data_direction dir,
 			struct dma_attrs *attrs);
-	void (*sync_single_for_cpu)(struct device *dev,
+	void (*sync_single_for_cpu)(struct device *dev, struct page *page,
 				    dma_addr_t dma_handle, size_t size,
 				    enum dma_data_direction dir);
-	void (*sync_single_for_device)(struct device *dev,
+	void (*sync_single_for_device)(struct device *dev, struct page *page,
 				       dma_addr_t dma_handle, size_t size,
 				       enum dma_data_direction dir);
 	void (*sync_sg_for_cpu)(struct device *dev,
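The ops-table change in the diff means the core wrappers must thread the page through to the per-arch implementation. Below is a minimal userspace model of that indirection, with mocked types and an invented implementation name (xen_style_unmap_page); it only demonstrates the calling convention the patch proposes, not any real Xen code.

```c
#include <stdint.h>
#include <stddef.h>

struct device;      /* opaque, as in the kernel */
struct dma_attrs;   /* opaque */
struct page { int dummy; };
typedef uint64_t dma_addr_t;
enum dma_data_direction { DMA_BIDIRECTIONAL, DMA_TO_DEVICE, DMA_FROM_DEVICE };

/* The proposed ops signature: unmap_page now receives the page. */
struct dma_map_ops {
    void (*unmap_page)(struct device *dev, struct page *page,
                       dma_addr_t dma_handle, size_t size,
                       enum dma_data_direction dir,
                       struct dma_attrs *attrs);
};

static struct page *last_unmapped;

/* Hypothetical implementation: with the page in hand, cache
 * maintenance could use its physical address directly instead of
 * translating the dma address back to phys. */
static void xen_style_unmap_page(struct device *dev, struct page *page,
                                 dma_addr_t handle, size_t size,
                                 enum dma_data_direction dir,
                                 struct dma_attrs *attrs)
{
    (void)dev; (void)handle; (void)size; (void)dir; (void)attrs;
    last_unmapped = page;
}

static struct dma_map_ops example_ops = { .unmap_page = xen_style_unmap_page };

/* Wrapper mirroring dma_unmap_page with the proposed extra argument --
 * this is the signature change the ~280 callers would have to absorb. */
static void dma_unmap_page_model(struct device *dev, struct page *page,
                                 dma_addr_t addr, size_t size,
                                 enum dma_data_direction dir)
{
    example_ops.unmap_page(dev, page, addr, size, dir, NULL);
}
```

The caller-side burden is visible in the wrapper: every dma_unmap_page call site gains one argument, which is exactly why the thread weighs this against a parallel handle-based API.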