Message ID | 1314826214-22428-3-git-send-email-msalter@redhat.com (mailing list archive) |
---|---|
State | New, archived |
On 31 August 2011 22:30, Mark Salter <msalter@redhat.com> wrote:
> For ARM kernels using CONFIG_ARM_DMA_MEM_BUFFERABLE, this patch adds an ARM
> specific dma_coherent_write_sync() to override the default version. This
> routine forces out any data sitting in a write buffer between the CPU and
> memory.
>
> Signed-off-by: Mark Salter <msalter@redhat.com>
> ---
>  arch/arm/include/asm/dma-mapping.h |   10 ++++++++++
>  1 files changed, 10 insertions(+), 0 deletions(-)
>
> diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
> index 7a21d0b..e99562b 100644
> --- a/arch/arm/include/asm/dma-mapping.h
> +++ b/arch/arm/include/asm/dma-mapping.h
> @@ -206,6 +206,16 @@ int dma_mmap_writecombine(struct device *, struct vm_area_struct *,
>                           void *, dma_addr_t, size_t);
>
>
> +#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
> +#define ARCH_HAS_DMA_COHERENT_WRITE_SYNC
> +
> +static inline void dma_coherent_write_sync(void)
> +{
> +       dsb();
> +       outer_sync();
> +}

That's what mb() and wmb() do already, at least on ARM. Why do we need
another API?

IIRC from past discussions on linux-arch around barriers, the mb()
should be sufficient in the case of DMA coherent buffers. That's why
macros like writel() on ARM have the mb() added by default (for cases
where you start the DMA transfer by writing to a device register).

On Tue, 2011-09-06 at 15:32 +0100, Catalin Marinas wrote:
> That's what mb() and wmb() do already, at least on ARM. Why do we need
> another API? IIRC from past discussions on linux-arch around barriers,
> the mb() should be sufficient in the case of DMA coherent buffers.
> That's why macros like writel() on ARM have the mb() added by default
> (for cases where you start the DMA transfer by writing to a device
> register).

For USB EHCI, the driver does not necessarily write to a register after
writing to DMA coherent memory. In some cases, the controller polls for
information written by the driver.

--Mark
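
The polled case described above can be sketched in user-space C. This is only a rough model under stated assumptions: `struct demo_desc`, `DEMO_ACTIVE`, `demo_submit()` and `demo_sync_calls` are hypothetical names invented for illustration, the descriptor is not the real EHCI layout, and the body of `dma_coherent_write_sync()` is a C11 fence standing in for the ARM `dsb()` + `outer_sync()` pair from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor; not the real EHCI qTD layout. */
struct demo_desc {
	uint32_t buf_addr;	/* payload the controller will read */
	uint32_t token;		/* carries the "active" flag the controller polls */
};

#define DEMO_ACTIVE (1u << 7)	/* illustrative flag bit */

/* Counts calls so the sketch can be checked in user space. */
static unsigned int demo_sync_calls;

/*
 * User-space stand-in for the proposed dma_coherent_write_sync().
 * On ARM the patch uses dsb() + outer_sync(); a C11 fence is only an
 * approximation used here so the example compiles anywhere.
 */
static inline void dma_coherent_write_sync(void)
{
	atomic_thread_fence(memory_order_seq_cst);
	demo_sync_calls++;
}

/* Hand a descriptor to a controller that polls memory for it. */
static void demo_submit(volatile struct demo_desc *d, uint32_t addr)
{
	assert(d != NULL);

	d->buf_addr = addr;
	/* Ordering: the payload must be visible before the flag. */
	atomic_thread_fence(memory_order_release);
	d->token |= DEMO_ACTIVE;

	/*
	 * Latency: no register write follows, so nothing else forces the
	 * store out of the write buffer. This call is a hint about
	 * timeliness, not an ordering requirement.
	 */
	dma_coherent_write_sync();
}
```

The two fences play different roles here: the first is needed for correctness, while the final call only affects how soon a polling controller might observe the flag.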

On 6 September 2011 15:37, Mark Salter <msalter@redhat.com> wrote:
> On Tue, 2011-09-06 at 15:32 +0100, Catalin Marinas wrote:
>> That's what mb() and wmb() do already, at least on ARM. Why do we need
>> another API? IIRC from past discussions on linux-arch around barriers,
>> the mb() should be sufficient in the case of DMA coherent buffers.
>> That's why macros like writel() on ARM have the mb() added by default
>> (for cases where you start the DMA transfer by writing to a device
>> register).
>
> For USB EHCI, the driver does not necessarily write to a register after
> writing to DMA coherent memory. In some cases, the controller polls for
> information written by the driver.

So as I understand, you would like to force the eviction from the
write buffer rather than waiting for it to be drained. On ARM, the
write buffer is eventually flushed, so there is no strict timing
guarantee. It could take longer if the processor immediately starts
polling some memory location for example, but in this case a simple
barrier would do.

On Tue, 2011-09-06 at 15:48 +0100, Catalin Marinas wrote:
> On 6 September 2011 15:37, Mark Salter <msalter@redhat.com> wrote:
> > On Tue, 2011-09-06 at 15:32 +0100, Catalin Marinas wrote:
> >> That's what mb() and wmb() do already, at least on ARM. Why do we need
> >> another API? IIRC from past discussions on linux-arch around barriers,
> >> the mb() should be sufficient in the case of DMA coherent buffers.
> >> That's why macros like writel() on ARM have the mb() added by default
> >> (for cases where you start the DMA transfer by writing to a device
> >> register).
> >
> > For USB EHCI, the driver does not necessarily write to a register after
> > writing to DMA coherent memory. In some cases, the controller polls for
> > information written by the driver.
>
> So as I understand, you would like to force the eviction from the
> write buffer rather than waiting for it to be drained. On ARM, the
> write buffer is eventually flushed, so there is no strict timing
> guarantee. It could take longer if the processor immediately starts
> polling some memory location for example, but in this case a simple
> barrier would do.

Yes, a memory barrier would have the same effect on ARM, but the purpose
of a barrier is to guarantee ordering. What the patch does is add an
interface to force a write buffer flush for performance, not ordering.
If a memory barrier is used, it could have a negative impact on other
arches.

In any case, the current thinking is that the original problem with the
USB performance seen on cortex A9 multicore is probably something more
than just write buffer delays. Once the original problem is better
understood, we can take another look at this patch if it is still
needed.

--Mark

diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index 7a21d0b..e99562b 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -206,6 +206,16 @@ int dma_mmap_writecombine(struct device *, struct vm_area_struct *,
 		void *, dma_addr_t, size_t);
 
 
+#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
+#define ARCH_HAS_DMA_COHERENT_WRITE_SYNC
+
+static inline void dma_coherent_write_sync(void)
+{
+	dsb();
+	outer_sync();
+}
+#endif
+
 #ifdef CONFIG_DMABOUNCE
 /*
  * For SA-1111, IXP425, and ADI systems the dma-mapping functions are "magic"

For ARM kernels using CONFIG_ARM_DMA_MEM_BUFFERABLE, this patch adds an ARM
specific dma_coherent_write_sync() to override the default version. This
routine forces out any data sitting in a write buffer between the CPU and
memory.

Signed-off-by: Mark Salter <msalter@redhat.com>
---
 arch/arm/include/asm/dma-mapping.h |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)
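
The "override the default version" mechanism might look roughly like the following user-space sketch. The default version is not part of this patch, so the empty fallback below is an assumption about what a generic header could contain, not a copy of it. `dsb()` and `outer_sync()` are replaced by counting stubs, and the `demo_*` names are hypothetical.

```c
#include <assert.h>

/* Pretend the kernel config option from the patch is enabled. */
#define CONFIG_ARM_DMA_MEM_BUFFERABLE 1

/* Counting stubs standing in for the ARM primitives used by the patch. */
static unsigned int demo_dsb_calls, demo_outer_sync_calls;
static inline void dsb(void)        { demo_dsb_calls++; }
static inline void outer_sync(void) { demo_outer_sync_calls++; }

/* --- architecture header, as in the patch --- */
#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
#define ARCH_HAS_DMA_COHERENT_WRITE_SYNC

static inline void dma_coherent_write_sync(void)
{
	dsb();
	outer_sync();
}
#endif

/* --- generic header: ASSUMED fallback, not shown in this patch --- */
#ifndef ARCH_HAS_DMA_COHERENT_WRITE_SYNC
static inline void dma_coherent_write_sync(void)
{
	/* assumed no-op when the architecture has nothing to flush */
}
#endif

/* Small self-check: with the option enabled, the ARM version is used. */
static void demo_selftest(void)
{
	dma_coherent_write_sync();
	assert(demo_dsb_calls == 1 && demo_outer_sync_calls == 1);
}
```

With the option undefined, the same call site would compile to the assumed no-op, which is how a driver could call the hook unconditionally without penalizing other architectures.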