diff mbox series

riscv: dma-mapping: Use conventional cache operations for dma_sync*()

Message ID 20220818165105.99746-1-s.miroshnichenko@yadro.com (mailing list archive)
State New, archived
Headers show
Series riscv: dma-mapping: Use conventional cache operations for dma_sync*() | expand

Commit Message

Sergei Miroshnichenko Aug. 18, 2022, 4:51 p.m. UTC
Support for arch_sync_dma_for_{device,cpu}() was recently added to RISC-V.

Currently, for_cpu(FROM_DEVICE) uses the flush (writeback plus invalidate)
cache operation, which can overwrite the data (received from a device via
DMA) with dirty cache lines. Replace it with inval to avoid data
corruption.

The same goes for for_cpu(BIDIRECTIONAL): it was also flushing the cache,
but it is called after the device has owned the buffer. So, in order not
to lose the received data, stick to inval here as well.

As for for_device(FROM_DEVICE): according to the inspirational discussion
about matching the DMA API to cache operations [1], inval or a no-op
should be used here, which would match most other arches. But in a later
discussion [2] it was decided that the clean used now is indeed
preferable and safer, though slower. arm64 was recently changed in the
same way [3].

[1] dma_sync_*_for_cpu and direction=TO_DEVICE
Link: https://lkml.org/lkml/2018/5/18/979

[2] Cache maintenance for non-coherent DMA in arch_sync_dma_for_device()
Link: https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/

[3] commit c50f11c6196f ("arm64: mm: Don't invalidate FROM_DEVICE buffers at start of DMA transfer")

Fixes: 1631ba1259d6 ("riscv: Add support for non-coherent devices using zicbom extension")
Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 arch/riscv/mm/dma-noncoherent.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
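
For context, the two handlers in arch/riscv/mm/dma-noncoherent.c look
roughly like this before the patch (an abridged sketch of the code added
by the Fixes commit, with comments added here; treat details as
approximate rather than verbatim):

void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
			      enum dma_data_direction dir)
{
	void *vaddr = phys_to_virt(paddr);

	switch (dir) {
	case DMA_TO_DEVICE:
	case DMA_FROM_DEVICE:
		/* clean: write dirty lines back before the device touches RAM */
		ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
		break;
	case DMA_BIDIRECTIONAL:
		ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
		break;
	default:
		break;
	}
}

void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
			   enum dma_data_direction dir)
{
	void *vaddr = phys_to_virt(paddr);

	switch (dir) {
	case DMA_TO_DEVICE:
		break;
	case DMA_FROM_DEVICE:
	case DMA_BIDIRECTIONAL:
		/* flush: the writeback+invalidate this patch turns into inval */
		ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
		break;
	default:
		break;
	}
}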

Comments

Jessica Clarke Aug. 18, 2022, 6:12 p.m. UTC | #1
On 18 Aug 2022, at 17:51, Sergei Miroshnichenko <s.miroshnichenko@yadro.com> wrote:
> 
> Support for arch_sync_dma_for_{device,cpu}() was recently added to RISC-V.
> 
> Currently, for_cpu(FROM_DEVICE) uses the flush (writeback plus invalidate)
> cache operation, which can overwrite the data (received from a device via
> DMA) with dirty cache lines. Replace it with inval to avoid data
> corruption.
> 
> The same goes for for_cpu(BIDIRECTIONAL): it was also flushing the cache,
> but it is called after the device has owned the buffer. So, in order not
> to lose the received data, stick to inval here as well.
> 
> As for for_device(FROM_DEVICE): according to the inspirational discussion
> about matching the DMA API to cache operations [1], inval or a no-op
> should be used here, which would match most other arches. But in a later
> discussion [2] it was decided that the clean used now is indeed
> preferable and safer, though slower. arm64 was recently changed in the
> same way [3].

Flush is always a valid way to implement invalidate, and if it does not
work then you have bugs elsewhere, since the CPU is free to write out
the dirty cache lines at any point before an invalidate. So whilst it’s
more efficient to use invalidate and hint to the core that it does not
need to do any writing, it is no more correct, and this patch
description is misguided. You might even wish to use the fact that it
is currently a flush as a convenient sanitiser for whatever is
currently violating the non-coherent DMA rules, and fix that before
it’s switched to invalidate and papers over the issue (at least on
whatever implementation you’re running on).

Jess

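A minimal sketch of the streaming-DMA pattern whose ownership rules are
at issue here (a hypothetical driver fragment; the function and variable
names are made up, the dma_*() calls are the real API):

static int receive_buffer(struct device *dev, void *buf, size_t len)
{
	dma_addr_t dma;

	/* Hand the buffer to the device; arch_sync_dma_for_device()
	 * runs here and cleans (writes back) the buffer's cache lines. */
	dma = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);
	if (dma_mapping_error(dev, dma))
		return -ENOMEM;

	/*
	 * The device owns buf now; the CPU must not store to it. Any
	 * store here violates the non-coherent DMA rules, and a flush
	 * in for_cpu() would expose that bug by writing the dirty line
	 * over the DMA'd data -- the "sanitiser" effect Jess describes.
	 */

	/* ... wait for the DMA-complete interrupt ... */

	/* Reclaim the buffer; arch_sync_dma_for_cpu() runs here so the
	 * CPU reads what the device actually wrote to RAM. */
	dma_unmap_single(dev, dma, len, DMA_FROM_DEVICE);

	return 0;
}
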
Sergei Miroshnichenko Aug. 18, 2022, 8:52 p.m. UTC | #2
Hi Jess,

On Thu, 2022-08-18 at 19:12 +0100, Jessica Clarke wrote:
> On 18 Aug 2022, at 17:51, Sergei Miroshnichenko
> <s.miroshnichenko@yadro.com> wrote:
> > [...]
> 
> Flush is always a valid way to implement invalidate, and if it does not
> work then you have bugs elsewhere, since the CPU is free to write out
> the dirty cache lines at any point before an invalidate. So whilst it’s
> more efficient to use invalidate and hint to the core that it does not
> need to do any writing, it is no more correct, and this patch
> description is misguided. You might even wish to use the fact that it
> is currently a flush as a convenient sanitiser for whatever is
> currently violating the non-coherent DMA rules, and fix that before
> it’s switched to invalidate and papers over the issue (at least on
> whatever implementation you’re running on).

You are right, thank you, the description *is* misleading: cache lines
can't get dirty before for_cpu(), as they are cleaned during
for_device(), and are not written to by the CPU in between.

I gave it another thought: the reason to invalidate in for_cpu() is
that the CPU can speculatively prefetch the buffer after
map()/for_device() but before the DMA is finished.

Even if the clean in for_device(FROM_DEVICE) were replaced with a flush,
a prefetch may still happen, requiring an invalidation.

Serge

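Spelling out the prefetch sequence from this exchange (an illustrative
timeline assembled from the two replies, not code from the patch):

/*
 * 1. dma_map_single(dev, buf, len, DMA_FROM_DEVICE)
 *      -> for_device(): clean; buf's lines are written back and clean.
 * 2. DMA is in flight. The CPU never stores to buf, but a speculative
 *    prefetch may pull a stale copy of buf back into the cache. That
 *    line is clean, merely out of date.
 * 3. dma_unmap_single(dev, dma, len, DMA_FROM_DEVICE)
 *      -> for_cpu(): the stale clean copy must be discarded, so at
 *         least an invalidate is required here; a flush also works,
 *         since writing back a clean line is a no-op.
 */
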
Jessica Clarke Aug. 18, 2022, 9:35 p.m. UTC | #3
On 18 Aug 2022, at 21:52, Sergei Miroshnichenko <s.miroshnichenko@yadro.com> wrote:
> 
> Hi Jess,
> 
> On Thu, 2022-08-18 at 19:12 +0100, Jessica Clarke wrote:
>> On 18 Aug 2022, at 17:51, Sergei Miroshnichenko
>> <s.miroshnichenko@yadro.com> wrote:
>> [...]
> 
> You are right, thank you, the description *is* misleading: cache lines
> can't get dirty before for_cpu(), as they are cleaned during
> for_device(), and are not written to by the CPU in between.
> 
> I gave it another thought: the reason to invalidate in for_cpu() is
> that the CPU can speculatively prefetch the buffer after
> map()/for_device() but before the DMA is finished.
> 
> Even if the clean in for_device(FROM_DEVICE) were replaced with a flush,
> a prefetch may still happen, requiring an invalidation.

In that case the line is still clean, not dirty, so a flush will behave
the same as an invalidate. So no, this is still wrong, flush is always
correct, even if invalidate is potentially faster.

Jess
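
For reference, the three cache-block operations ALT_CMO_OP() can emit
under the Zicbom extension (a schematic summary for this thread, not the
macro's actual definition):

	ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size); /* write dirty lines back, keep them cached */
	ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size); /* discard cached lines without writeback   */
	ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size); /* writeback then discard (clean + inval)   */

On a line that is present but clean, the writeback half of flush has
nothing to do, which is why flush and inval behave identically in the
speculative-prefetch case above.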
diff mbox series

Patch

diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
index cd2225304c82..2a8f6124021f 100644
--- a/arch/riscv/mm/dma-noncoherent.c
+++ b/arch/riscv/mm/dma-noncoherent.c
@@ -45,7 +45,7 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
 		break;
 	case DMA_FROM_DEVICE:
 	case DMA_BIDIRECTIONAL:
-		ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
+		ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
 		break;
 	default:
 		break;