Message ID: 1308556213-24970-2-git-send-email-m.szyprowski@samsung.com (mailing list archive)
State: New, archived
Hello,

On Monday, June 20, 2011 10:36 AM Michal Nazarewicz wrote:

> On Mon, 20 Jun 2011 09:50:06 +0200, Marek Szyprowski
> <m.szyprowski@samsung.com> wrote:
> > +static inline void dma_sync_single_for_cpu(struct device *dev,
>
> I wouldn't really put inline here or in the function below.
>
> > +	dma_addr_t handle, size_t size, enum dma_data_direction dir)
> > +{
> > +	BUG_ON(!valid_dma_direction(dir));
> > +
> > +	debug_dma_sync_single_for_cpu(dev, handle, size, dir);
> > +
> > +	if (!dmabounce_sync_for_cpu(dev, handle, size, dir))
> > +		return;
> > +
> > +	__dma_single_dev_to_cpu(dma_to_virt(dev, handle), size, dir);
>
> I know it is just copy'n'paste but how about:

This patch is only about moving code between files, and I wanted to show
exactly what is being changed and how. There is a final cleanup in a
separate patch anyway. All these patches are meant to start a discussion
about how the DMA mapping code can be redesigned for further extensions
with generic IOMMU support.

> 	if (dmabounce_sync_for_cpu(dev, handle, size, dir))
> 		__dma_single_dev_to_cpu(dma_to_virt(dev, handle), size, dir);

The above lines will be removed by the next patches in this series, so I
really see no point in changing them.

(snipped)

Best regards
On Mon, Jun 20, 2011 at 09:50:06AM +0200, Marek Szyprowski wrote:
> This patch removes the need for offset parameter in dma bounce
> functions. This is required to let dma-mapping framework on ARM
> architecture use common, generic dma-mapping helpers.

I really don't like this. Really as in hate. Why? I've said in the past
that the whole idea of getting rid of the sub-range functions is idiotic.

If you have to deal with cache coherence, what you _really_ want is an
API which tells you the size of the original buffer and the section of
that buffer which you want to handle - because the edges of the buffer
need special handling.

Let's say that you have a buffer which is 256 bytes long, misaligned to
half a cache line. Let's first look at the sequence for the whole buffer:

1. You map it for DMA from the device. This means you write back the
   first and last cache lines to preserve any data shared in the
   overlapping cache lines. The remainder you can just invalidate.

2. You want to access the buffer, so you use the sync_for_cpu function.
   If your CPU doesn't do any speculative prefetching, then you don't
   need to do anything. If it does, you have to invalidate the buffer,
   but you must preserve the overlapping cache lines, which again must
   be written back.

3. You transfer ownership back to the device using sync_for_device.
   As you may have caused cache lines to be read in, again you need to
   invalidate, and the overlapping cache lines must be written back.

Now, if you ask for a sub-section of the buffer to be sync'd, you can
actually eliminate those writebacks, which are potentially troublesome
and which could corrupt neighbouring data.

If you get rid of the sub-buffer functions and start using the whole
buffer functions for that purpose, you no longer know whether the
partial cache lines are part of the buffer or not, so you have to write
those back every time too.

So far, we haven't had any reports of corruption of this type (maybe
folk using the sync functions are rare on ARM - thankfully), but getting
rid of the range sync functions means that solving this becomes a lot
more difficult, because we've lost the information needed to make the
decision.

So I've always believed - and continue to do so - that those who want
to get rid of the range sync functions are misguided and are storing up
problems for the future.
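To make the edge handling concrete, here is a minimal sketch of step 1
above (mapping a misaligned buffer, at least one cache line long, for DMA
from the device). It assumes a 64-byte cache line; cache_writeback_line()
and cache_invalidate_range() are hypothetical stand-ins for the
architecture's real cache maintenance primitives, not actual kernel
functions:

#include <stddef.h>

#define CACHE_LINE	64UL	/* assumed line size for this sketch */

/* Hypothetical cache maintenance primitives, stand-ins for the real ones. */
extern void cache_writeback_line(unsigned long line);
extern void cache_invalidate_range(unsigned long start, unsigned long end);

/* Sketch of step 1: map a possibly misaligned buffer for DMA from the device. */
static void sketch_map_from_device(unsigned long start, size_t size)
{
	unsigned long end = start + size;

	/*
	 * The partially owned first and last cache lines may also hold
	 * neighbouring data, so write them back instead of discarding them.
	 */
	if (start & (CACHE_LINE - 1))
		cache_writeback_line(start & ~(CACHE_LINE - 1));
	if (end & (CACHE_LINE - 1))
		cache_writeback_line(end & ~(CACHE_LINE - 1));

	/* Lines owned entirely by the buffer can simply be invalidated. */
	cache_invalidate_range((start + CACHE_LINE - 1) & ~(CACHE_LINE - 1),
			       end & ~(CACHE_LINE - 1));
}

The sub-range sync variants let an implementation skip the two preserving
writebacks whenever the edges being sync'd fall strictly inside the
original buffer - the information Russell argues is lost if only
whole-buffer calls remain.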
Hello,

On Sunday, July 03, 2011 5:28 PM Russell King wrote:

> On Mon, Jun 20, 2011 at 09:50:06AM +0200, Marek Szyprowski wrote:
> > This patch removes the need for offset parameter in dma bounce
> > functions. This is required to let dma-mapping framework on ARM
> > architecture use common, generic dma-mapping helpers.
>
> I really don't like this. Really as in hate. Why? I've said in the past
> that the whole idea of getting rid of the sub-range functions is idiotic.
>
> If you have to deal with cache coherence, what you _really_ want is an
> API which tells you the size of the original buffer and the section of
> that buffer which you want to handle - because the edges of the buffer
> need special handling.
>
> Let's say that you have a buffer which is 256 bytes long, misaligned to
> half a cache line. Let's first look at the sequence for the whole buffer:
>
> 1. You map it for DMA from the device. This means you write back the
>    first and last cache lines to preserve any data shared in the
>    overlapping cache lines. The remainder you can just invalidate.
>
> 2. You want to access the buffer, so you use the sync_for_cpu function.
>    If your CPU doesn't do any speculative prefetching, then you don't
>    need to do anything. If it does, you have to invalidate the buffer,
>    but you must preserve the overlapping cache lines, which again must
>    be written back.
>
> 3. You transfer ownership back to the device using sync_for_device.
>    As you may have caused cache lines to be read in, again you need to
>    invalidate, and the overlapping cache lines must be written back.
>
> Now, if you ask for a sub-section of the buffer to be sync'd, you can
> actually eliminate those writebacks, which are potentially troublesome
> and which could corrupt neighbouring data.
>
> If you get rid of the sub-buffer functions and start using the whole
> buffer functions for that purpose, you no longer know whether the
> partial cache lines are part of the buffer or not, so you have to write
> those back every time too.
>
> So far, we haven't had any reports of corruption of this type (maybe
> folk using the sync functions are rare on ARM - thankfully), but getting
> rid of the range sync functions means that solving this becomes a lot
> more difficult, because we've lost the information needed to make the
> decision.

Well, right now I haven't heard of anyone who wants to remove
dma_sync_single_range_for_{cpu,device}. All of this is about the internal
implementation and dma_map_ops, which uses the simplified calls and is
not exposed to drivers or any public API.

I also see no reason why we lose the information. All drivers are still
required to call dma_map_{single,page} to acquire a DMA address first, so
the DMA mapping subsystem knows exactly that the range from the returned
dma_addr to dma_addr+size has been used for DMA operations. All
dma_sync_single_* operations take a dma_addr as one of their arguments,
so there is no problem checking which mapped DMA range a particular sync
operation falls into.

In my patch I have shown that it is perfectly possible to use the common
dma_map_ops structure on ARM and unify the dma-mapping implementation a
bit with other architectures. IMHO this is the right way. There is a need
for custom dma-mapping implementations (mainly to take advantage of the
IOMMU controllers available on newer SoCs), and I would really like to
avoid another set of ifdefs or sequences of "if (iommu_supported())" all
over the dma-mapping code. Even now all this code is hard to understand
on first read (due to the coherent/non-coherent sub-architectures and the
dmabounce code mixed in).

> So I've always believed - and continue to do so - that those who want
> to get rid of the range sync functions are misguided and are storing up
> problems for the future.

I never said that I want to remove these operations from the driver API.

Best regards
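For reference, the driver-visible pattern under discussion looks roughly
like the sketch below; the device, buffer and sizes are made up for
illustration, and the snippet is not taken from the patch series. The
point is that the dma_addr_t handed to each sync call lies inside a
region previously established by dma_map_single(), which is what allows
the implementation to locate the mapping and derive the offset itself:

#include <linux/device.h>
#include <linux/dma-mapping.h>
#include <linux/errno.h>

#define EXAMPLE_BUF_SIZE	4096	/* made-up buffer size for this example */

/* Hypothetical driver snippet; buf is assumed to be kmalloc()ed, DMA-able memory. */
static int example_sync_header(struct device *dev, void *buf, size_t hdr_len)
{
	dma_addr_t handle;

	handle = dma_map_single(dev, buf, EXAMPLE_BUF_SIZE, DMA_FROM_DEVICE);
	if (dma_mapping_error(dev, handle))
		return -ENOMEM;

	/* ... the device DMAs a header into the start of the buffer ... */

	/*
	 * Sync only the header back to the CPU.  'handle' falls inside the
	 * mapping created above, so the implementation can work out which
	 * mapped range (and hence which offset) this sync refers to.
	 */
	dma_sync_single_for_cpu(dev, handle, hdr_len, DMA_FROM_DEVICE);

	/* A sub-range deeper inside the buffer works the same way. */
	dma_sync_single_range_for_cpu(dev, handle, hdr_len,
				      EXAMPLE_BUF_SIZE - hdr_len,
				      DMA_FROM_DEVICE);

	dma_unmap_single(dev, handle, EXAMPLE_BUF_SIZE, DMA_FROM_DEVICE);
	return 0;
}

After the patch below, dma_sync_single_range_for_cpu() simply calls
dma_sync_single_for_cpu(dev, handle + offset, size, dir), so both sync
calls above end up on the same internal path.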
diff --git a/arch/arm/common/dmabounce.c b/arch/arm/common/dmabounce.c
index e568163..f7b330f 100644
--- a/arch/arm/common/dmabounce.c
+++ b/arch/arm/common/dmabounce.c
@@ -171,7 +171,8 @@ find_safe_buffer(struct dmabounce_device_info *device_info, dma_addr_t safe_dma_
 	read_lock_irqsave(&device_info->lock, flags);
 
 	list_for_each_entry(b, &device_info->safe_buffers, node)
-		if (b->safe_dma_addr == safe_dma_addr) {
+		if (b->safe_dma_addr <= safe_dma_addr &&
+		    b->safe_dma_addr + b->size > safe_dma_addr) {
 			rb = b;
 			break;
 		}
@@ -391,9 +392,10 @@ void __dma_unmap_page(struct device *dev, dma_addr_t dma_addr, size_t size,
 EXPORT_SYMBOL(__dma_unmap_page);
 
 int dmabounce_sync_for_cpu(struct device *dev, dma_addr_t addr,
-		unsigned long off, size_t sz, enum dma_data_direction dir)
+		size_t sz, enum dma_data_direction dir)
 {
 	struct safe_buffer *buf;
+	unsigned long off;
 
 	dev_dbg(dev, "%s(dma=%#x,off=%#lx,sz=%zx,dir=%x)\n",
 		__func__, addr, off, sz, dir);
@@ -402,6 +404,8 @@ int dmabounce_sync_for_cpu(struct device *dev, dma_addr_t addr,
 	if (!buf)
 		return 1;
 
+	off = addr - buf->safe_dma_addr;
+
 	BUG_ON(buf->direction != dir);
 
 	dev_dbg(dev, "%s: unsafe buffer %p (dma=%#x) mapped to %p (dma=%#x)\n",
@@ -420,9 +424,10 @@ int dmabounce_sync_for_cpu(struct device *dev, dma_addr_t addr,
 EXPORT_SYMBOL(dmabounce_sync_for_cpu);
 
 int dmabounce_sync_for_device(struct device *dev, dma_addr_t addr,
-		unsigned long off, size_t sz, enum dma_data_direction dir)
+		size_t sz, enum dma_data_direction dir)
 {
 	struct safe_buffer *buf;
+	unsigned long off;
 
 	dev_dbg(dev, "%s(dma=%#x,off=%#lx,sz=%zx,dir=%x)\n",
 		__func__, addr, off, sz, dir);
@@ -431,6 +436,8 @@ int dmabounce_sync_for_device(struct device *dev, dma_addr_t addr,
 	if (!buf)
 		return 1;
 
+	off = addr - buf->safe_dma_addr;
+
 	BUG_ON(buf->direction != dir);
 
 	dev_dbg(dev, "%s: unsafe buffer %p (dma=%#x) mapped to %p (dma=%#x)\n",
diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index 4fff837..ca920aa 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -310,10 +310,8 @@ extern void __dma_unmap_page(struct device *, dma_addr_t, size_t,
 /*
  * Private functions
  */
-int dmabounce_sync_for_cpu(struct device *, dma_addr_t, unsigned long,
-		size_t, enum dma_data_direction);
-int dmabounce_sync_for_device(struct device *, dma_addr_t, unsigned long,
-		size_t, enum dma_data_direction);
+int dmabounce_sync_for_cpu(struct device *, dma_addr_t, size_t, enum dma_data_direction);
+int dmabounce_sync_for_device(struct device *, dma_addr_t, size_t, enum dma_data_direction);
 #else
 static inline int dmabounce_sync_for_cpu(struct device *d, dma_addr_t addr,
 	unsigned long offset, size_t size, enum dma_data_direction dir)
@@ -454,6 +452,33 @@ static inline void dma_unmap_page(struct device *dev, dma_addr_t handle,
 	__dma_unmap_page(dev, handle, size, dir);
 }
 
+
+static inline void dma_sync_single_for_cpu(struct device *dev,
+		dma_addr_t handle, size_t size, enum dma_data_direction dir)
+{
+	BUG_ON(!valid_dma_direction(dir));
+
+	debug_dma_sync_single_for_cpu(dev, handle, size, dir);
+
+	if (!dmabounce_sync_for_cpu(dev, handle, size, dir))
+		return;
+
+	__dma_single_dev_to_cpu(dma_to_virt(dev, handle), size, dir);
+}
+
+static inline void dma_sync_single_for_device(struct device *dev,
+		dma_addr_t handle, size_t size, enum dma_data_direction dir)
+{
+	BUG_ON(!valid_dma_direction(dir));
+
+	debug_dma_sync_single_for_device(dev, handle, size, dir);
+
+	if (!dmabounce_sync_for_device(dev, handle, size, dir))
+		return;
+
+	__dma_single_cpu_to_dev(dma_to_virt(dev, handle), size, dir);
+}
+
 /**
  * dma_sync_single_range_for_cpu
  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
@@ -476,40 +501,14 @@ static inline void dma_sync_single_range_for_cpu(struct device *dev,
 		dma_addr_t handle, unsigned long offset, size_t size,
 		enum dma_data_direction dir)
 {
-	BUG_ON(!valid_dma_direction(dir));
-
-	debug_dma_sync_single_for_cpu(dev, handle + offset, size, dir);
-
-	if (!dmabounce_sync_for_cpu(dev, handle, offset, size, dir))
-		return;
-
-	__dma_single_dev_to_cpu(dma_to_virt(dev, handle) + offset, size, dir);
+	dma_sync_single_for_cpu(dev, handle + offset, size, dir);
 }
 
 static inline void dma_sync_single_range_for_device(struct device *dev,
 		dma_addr_t handle, unsigned long offset, size_t size,
 		enum dma_data_direction dir)
 {
-	BUG_ON(!valid_dma_direction(dir));
-
-	debug_dma_sync_single_for_device(dev, handle + offset, size, dir);
-
-	if (!dmabounce_sync_for_device(dev, handle, offset, size, dir))
-		return;
-
-	__dma_single_cpu_to_dev(dma_to_virt(dev, handle) + offset, size, dir);
-}
-
-static inline void dma_sync_single_for_cpu(struct device *dev,
-		dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
-	dma_sync_single_range_for_cpu(dev, handle, 0, size, dir);
-}
-
-static inline void dma_sync_single_for_device(struct device *dev,
-		dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
-	dma_sync_single_range_for_device(dev, handle, 0, size, dir);
+	dma_sync_single_for_device(dev, handle + offset, size, dir);
 }
 
 /*
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 82a093c..c11f234 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -619,7 +619,7 @@ void dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
 	int i;
 
 	for_each_sg(sg, s, nents, i) {
-		if (!dmabounce_sync_for_cpu(dev, sg_dma_address(s), 0,
+		if (!dmabounce_sync_for_cpu(dev, sg_dma_address(s),
 					    sg_dma_len(s), dir))
 			continue;
 
@@ -645,7 +645,7 @@ void dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
 	int i;
 
 	for_each_sg(sg, s, nents, i) {
-		if (!dmabounce_sync_for_device(dev, sg_dma_address(s), 0,
+		if (!dmabounce_sync_for_device(dev, sg_dma_address(s),
 					    sg_dma_len(s), dir))
 			continue;
 
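In short, the functional change in dmabounce.c is that the safe-buffer
lookup becomes a containment check and the offset is recomputed from the
handle. A standalone restatement of those two rules (types simplified for
illustration; this mirrors the hunks above rather than adding anything
new):

#include <stdbool.h>
#include <stddef.h>

typedef unsigned long long dma_addr_t;	/* simplified for this sketch */

/*
 * New lookup rule from find_safe_buffer(): any handle inside
 * [safe_dma_addr, safe_dma_addr + size) matches, not just the start
 * address; the end of the buffer is exclusive.
 */
static bool safe_buffer_matches(dma_addr_t safe_dma_addr, size_t size,
				dma_addr_t addr)
{
	return safe_dma_addr <= addr && addr < safe_dma_addr + size;
}

/* Offset recovered from the handle, as dmabounce_sync_for_{cpu,device}() now do. */
static unsigned long handle_offset(dma_addr_t safe_dma_addr, dma_addr_t addr)
{
	return addr - safe_dma_addr;
}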