Message ID: 20220818165105.99746-1-s.miroshnichenko@yadro.com (mailing list archive)
State: New, archived
Series: riscv: dma-mapping: Use conventional cache operations for dma_sync*()
On 18 Aug 2022, at 17:51, Sergei Miroshnichenko <s.miroshnichenko@yadro.com> wrote:
>
> Support of arch_sync_dma_for_{device,cpu}() for RISC-V was added recently.
>
> Currently, for_cpu(FROM_DEVICE) uses the flush (writeback plus invalidate)
> cache operation, which can overwrite the data (received from a device via
> DMA) with dirty cache lines. Replace it with the inval to avoid data
> corruptions.
>
> The same goes for for_cpu(BIDIRECTIONAL), it was also flushing the cache,
> but it is called after the device had owned the buffer. So in order to not
> loose the received data, stick to the inval here as well.
>
> As of for_device(FROM_DEVICE), according to the inspirational discussion
> about matching the DMA API and cache operations [1], the inval or a no-op
> should be here, and this would resonate with most other arches. But in a
> later discussion [2] it was decided that the clean (which is used now) is
> indeed preferable and safer, though slower. The arm64 has been recently
> changed in the same way [3].

Flush is always a valid way to implement invalidate, and if it does not
work then you have bugs elsewhere, since the CPU is free to write out the
dirty cache lines at any point before an invalidate. So whilst it’s more
efficient to use invalidate and hint to the core that it does not need to
do any writing, it is no more correct, and this patch description is
misguided. You might even wish to use the fact that it is currently a
flush as a convenient sanitiser for whatever is currently violating the
non-coherent DMA rules, and fix that before it’s switched to invalidate
and papers over the issue (at least in whatever implementation you’re
running on).
Jess

> [1] dma_sync_*_for_cpu and direction=TO_DEVICE
> Link: https://lkml.org/lkml/2018/5/18/979
>
> [2] Cache maintenance for non-coherent DMA in arch_sync_dma_for_device()
> Link: https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/
>
> [3] commit c50f11c6196f ("arm64: mm: Don't invalidate FROM_DEVICE buffers at start of DMA transfer")
>
> Fixes: 1631ba1259d6 ("riscv: Add support for non-coherent devices using zicbom extension")
> Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
> ---
> arch/riscv/mm/dma-noncoherent.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
> index cd2225304c82..2a8f6124021f 100644
> --- a/arch/riscv/mm/dma-noncoherent.c
> +++ b/arch/riscv/mm/dma-noncoherent.c
> @@ -45,7 +45,7 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> 		break;
> 	case DMA_FROM_DEVICE:
> 	case DMA_BIDIRECTIONAL:
> -		ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
> +		ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
> 		break;
> 	default:
> 		break;
> --
> 2.37.2
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
Hi Jess,

On Thu, 2022-08-18 at 19:12 +0100, Jessica Clarke wrote:
> On 18 Aug 2022, at 17:51, Sergei Miroshnichenko
> <s.miroshnichenko@yadro.com> wrote:
> >
> > [patch description snipped]
>
> Flush is always a valid way to implement invalidate, and if it does
> not work then you have bugs elsewhere, since the CPU is free to write
> out the dirty cache lines at any point before an invalidate. So whilst
> it’s more efficient to use invalidate and hint to the core that it
> does not need to do any writing, it is no more correct, and this patch
> description is misguided. You might even wish to use the fact that it
> is currently a flush as a convenient sanitiser for whatever is
> currently violating the non-coherent DMA rules, and fix that before
> it’s switched to invalidate and papers over the issue (at least in
> whatever implementation you’re running on).
You are right, thank you, the description *is* misleading: cache lines
can't get dirty before for_cpu(), as they are cleaned during
for_device(), and are not written to by the CPU in between.

I gave it another thought: the reason to invalidate in for_cpu() is that
the CPU can speculatively prefetch the buffer after map()/for_device()
but before the DMA is finished.

Even if we replaced the clean in for_device(FROM_DEVICE) with a flush, a
prefetch could still happen afterwards, so an invalidation in for_cpu()
is still required.

Serge

> Jess
>
> > [references and patch snipped]
On 18 Aug 2022, at 21:52, Sergei Miroshnichenko <s.miroshnichenko@yadro.com> wrote:
>
> Hi Jess,
>
> On Thu, 2022-08-18 at 19:12 +0100, Jessica Clarke wrote:
> > [earlier discussion snipped]
>
> You are right, thank you, the description *is* misleading: cache lines
> can't get dirty before for_cpu(), as they are cleaned during
> for_device(), and are not written to by the CPU in between.
>
> I gave it another thought: the reason to invalidate in for_cpu() is
> that the CPU can speculatively prefetch the buffer after
> map()/for_device() but before the DMA is finished.
>
> Even if replace the clean in for_device(FROM_DEVICE) with the flush, a
> prefetch still may happen, requiring an invalidation.

In that case the line is still clean, not dirty, so a flush will behave
the same as an invalidate. So no, this is still wrong, flush is always
correct, even if invalidate is potentially faster.

Jess
diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
index cd2225304c82..2a8f6124021f 100644
--- a/arch/riscv/mm/dma-noncoherent.c
+++ b/arch/riscv/mm/dma-noncoherent.c
@@ -45,7 +45,7 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
 		break;
 	case DMA_FROM_DEVICE:
 	case DMA_BIDIRECTIONAL:
-		ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
+		ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
 		break;
 	default:
 		break;
Support of arch_sync_dma_for_{device,cpu}() for RISC-V was added recently.

Currently, for_cpu(FROM_DEVICE) uses the flush (writeback plus invalidate)
cache operation, which can overwrite the data (received from a device via
DMA) with dirty cache lines. Replace it with inval to avoid data
corruption.

The same goes for for_cpu(BIDIRECTIONAL): it was also flushing the cache,
but it is called after the device has owned the buffer, so in order not to
lose the received data, stick to inval here as well.

As for for_device(FROM_DEVICE), according to the inspirational discussion
about matching the DMA API and cache operations [1], inval or a no-op
should be used here, and this would resonate with most other arches. But
in a later discussion [2] it was decided that the clean (which is used
now) is indeed preferable and safer, though slower. arm64 has recently
been changed in the same way [3].

[1] dma_sync_*_for_cpu and direction=TO_DEVICE
Link: https://lkml.org/lkml/2018/5/18/979

[2] Cache maintenance for non-coherent DMA in arch_sync_dma_for_device()
Link: https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/

[3] commit c50f11c6196f ("arm64: mm: Don't invalidate FROM_DEVICE buffers at start of DMA transfer")

Fixes: 1631ba1259d6 ("riscv: Add support for non-coherent devices using zicbom extension")
Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 arch/riscv/mm/dma-noncoherent.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)