dmaengine: dw: Fix data corruption in large device to memory transfers

Message ID: 1466496091-7934-1-git-send-email-jarkko.nikula@linux.intel.com (mailing list archive)
State: Changes Requested

Commit Message

Jarkko Nikula June 21, 2016, 8:01 a.m. UTC
When transferring more data than the maximum block size supported by the
HW multiplied by the source width, the transfer is split into smaller
chunks. Currently the code calculates the memory width, and thus the
alignment, before splitting for both memory-to-device and device-to-memory
transfers.

For memory-to-device transfers this works fine, since the alignment is
preserved through the splitting and the split blocks remain memory width
aligned. However, in device-to-memory transfers the alignment breaks when
the maximum block size multiplied by the register width does not have the
same alignment as the buffer. For instance, this happens when transferring
4100 bytes (32-bit aligned) from an 8-bit register on a DW DMA controller
whose maximum block size is 4095 elements. An attempt to do such a
transfer caused data corruption.

Fix this by calculating and setting the destination memory width after
splitting, using the alignment and length of each split block.
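
The effect can be reproduced with a minimal userspace sketch of the width
selection; __builtin_ctzl() stands in for the kernel's __ffs(), and the
buffer address, data_width value and chunk loop below are assumptions made
for the illustration, not code taken from the driver.

#include <stdio.h>

/* Stand-in for the kernel's __ffs(): index of the lowest set bit. */
static unsigned int lowest_bit(unsigned long v)
{
	return __builtin_ctzl(v);
}

int main(void)
{
	unsigned long data_width = 4;		/* assumed: up to 32-bit memory accesses */
	unsigned long mem = 0x10000000;		/* assumed 32-bit aligned destination buffer */
	unsigned long len = 4100;		/* total transfer from an 8-bit register */
	unsigned long block_size = 4095, reg_width = 0;

	/* Old code: one width for the whole transfer, computed before splitting. */
	printf("pre-split width: %u bytes\n",
	       1u << lowest_bit(data_width | mem | len));

	/* New code: recompute the width for each split chunk. */
	while (len) {
		unsigned long dlen = ((len >> reg_width) > block_size) ?
				     (block_size << reg_width) : len;

		printf("chunk at %#lx, %lu bytes -> width %u bytes\n",
		       mem, dlen, 1u << lowest_bit(data_width | mem | dlen));
		mem += dlen;
		len -= dlen;
	}
	return 0;
}

With the pre-split calculation both chunks would be programmed with a 32-bit
destination width, even though the first chunk's 4095-byte length and the
second chunk's start address are only byte aligned; recomputing the width
per chunk yields byte-wide accesses and correct data.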

Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
---
I'm not sure whether this is stable material or not. I lean towards not
stable, so I didn't Cc it. I noticed the issue by tweaking spidev to
allow buffers bigger than 4 KiB. There was RX corruption when doing 8-bit
transfers with even-sized buffers of >= 4098 bytes on HW whose maximum
block size is 4095.
---
 drivers/dma/dw/core.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Comments

Andy Shevchenko June 21, 2016, 10:07 a.m. UTC | #1
On Tue, 2016-06-21 at 11:01 +0300, Jarkko Nikula wrote:
> When transferring more data than the maximum block size supported by the
> HW multiplied by the source width, the transfer is split into smaller
> chunks. Currently the code calculates the memory width, and thus the
> alignment, before splitting for both memory-to-device and device-to-memory
> transfers.
> 
> For memory-to-device transfers this works fine, since the alignment is
> preserved through the splitting and the split blocks remain memory width
> aligned. However, in device-to-memory transfers the alignment breaks when
> the maximum block size multiplied by the register width does not have the
> same alignment as the buffer. For instance, this happens when transferring
> 4100 bytes (32-bit aligned) from an 8-bit register on a DW DMA controller
> whose maximum block size is 4095 elements. An attempt to do such a
> transfer caused data corruption.
> 
> Fix this by calculating and setting the destination memory width after
> splitting, using the alignment and length of each split block.

I think the problem is deeper than that.

In the best case the caller can provide an already split buffer based on
the max_segment_size parameter from struct device_dma_parameters
(dev->dma_parms). We have to fill it in properly in the DMA controller
driver, i.e. dw_dmac.

The problem here is that all the infrastructure around it relies on the
parameters of the DMA device as a whole, when we should rely on the
parameters of an _individual channel_.

Speaking of the API, functions should take the struct dma_chan of the
exact channel of the DMA capable device instead of a struct device.
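
To make that concrete, below is a minimal sketch of the current,
device-wide interface; dma_set_max_seg_size(), dma_get_max_seg_size() and
the chan->device->dev chain are existing kernel interfaces, while the
example_* helpers and the 4095-byte value are only assumptions for
illustration.

#include <linux/device.h>
#include <linux/dma-mapping.h>
#include <linux/dmaengine.h>

/* Controller probe: one static limit advertised for the whole DMA device. */
static void example_advertise_limit(struct device *dev,
				    struct device_dma_parameters *parms)
{
	dev->dma_parms = parms;
	/* Assumed worst case: 4095-element block size with 8-bit elements. */
	dma_set_max_seg_size(dev, 4095);
}

/*
 * Client: the answer is the same for every channel of the device, even
 * though the real limit depends on the block size of that particular
 * channel and on the register width of the peripheral behind it.
 */
static unsigned int example_client_limit(struct dma_chan *chan)
{
	return dma_get_max_seg_size(chan->device->dev);
}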

> 
> Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
> ---
> I'm not sure whether this is stable material or not. I lean towards not
> stable, so I didn't Cc it. I noticed the issue by tweaking spidev to
> allow buffers bigger than 4 KiB. There was RX corruption when doing 8-bit
> transfers with even-sized buffers of >= 4098 bytes on HW whose maximum
> block size is 4095.
> ---
>  drivers/dma/dw/core.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/dma/dw/core.c b/drivers/dma/dw/core.c
> index edf053f73a49..878e2bf58233 100644
> --- a/drivers/dma/dw/core.c
> +++ b/drivers/dma/dw/core.c
> @@ -831,8 +831,6 @@ slave_sg_todev_fill_desc:
>  			mem = sg_dma_address(sg);
>  			len = sg_dma_len(sg);
>  
> -			mem_width = __ffs(data_width | mem | len);
> -
>  slave_sg_fromdev_fill_desc:
>  			desc = dwc_desc_get(dwc);
>  			if (!desc)
> @@ -840,15 +838,17 @@ slave_sg_fromdev_fill_desc:
>  
>  			lli_write(desc, sar, reg);
>  			lli_write(desc, dar, mem);
> -			lli_write(desc, ctllo, ctllo | DWC_CTLL_DST_WIDTH(mem_width));
>  			if ((len >> reg_width) > dwc->block_size) {
>  				dlen = dwc->block_size << reg_width;
> -				mem += dlen;
>  				len -= dlen;
>  			} else {
>  				dlen = len;
>  				len = 0;
>  			}
> +			mem_width = __ffs(data_width | mem | dlen);
> +			mem += dlen;
> +			lli_write(desc, ctllo,
> +				  ctllo | DWC_CTLL_DST_WIDTH(mem_width));
>  			lli_write(desc, ctlhi, dlen >> reg_width);
>  			desc->len = dlen;
>
Jarkko Nikula June 21, 2016, 10:47 a.m. UTC | #2
On 06/21/2016 01:07 PM, Andy Shevchenko wrote:
>> Fix this by calculating and setting the destination memory width after
>> splitting, using the alignment and length of each split block.
>
> I think the problem is deeper than that.
>
> In the best case the caller can provide an already split buffer based on
> the max_segment_size parameter from struct device_dma_parameters
> (dev->dma_parms). We have to fill it in properly in the DMA controller
> driver, i.e. dw_dmac.
>
The problem with max_segment_size is that it is static, while here the
optimal buffer split depends on runtime parameters - for instance the
register width in device-to-memory transfers - which may not be known
when dma_get_max_seg_size() is called.
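
As a concrete illustration with the block size from this thread: the byte
limit the driver actually enforces per descriptor is
block_size << reg_width, so it is only known once the peripheral's register
width is, and a static max_segment_size can hold just one of the resulting
values. A sketch, with example reg_width values:

/* Byte limit actually enforced by dw_dmac's splitting for one descriptor. */
static unsigned int chunk_limit_bytes(unsigned int block_size,
				      unsigned int reg_width)
{
	return block_size << reg_width;	/* elements to bytes */
}

/*
 * With block_size = 4095 (the HW in this thread):
 *   8-bit register  (reg_width = 0) -> 4095 bytes
 *   16-bit register (reg_width = 1) -> 8190 bytes
 *   32-bit register (reg_width = 2) -> 16380 bytes
 */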

> The problem here is that all the infrastructure around it relies on the
> parameters of the DMA device as a whole, when we should rely on the
> parameters of an _individual channel_.
>
Fortunately dw_dmac's internal buffer splitting uses the per-channel block
size, at least on HW that has the autocfg bit. But I agree that
dma_get_max_seg_size(dev) appears to be of limited use for DMA engine users
if channels have different parameters or depend on other runtime
parameters.

Patch

diff --git a/drivers/dma/dw/core.c b/drivers/dma/dw/core.c
index edf053f73a49..878e2bf58233 100644
--- a/drivers/dma/dw/core.c
+++ b/drivers/dma/dw/core.c
@@ -831,8 +831,6 @@  slave_sg_todev_fill_desc:
 			mem = sg_dma_address(sg);
 			len = sg_dma_len(sg);
 
-			mem_width = __ffs(data_width | mem | len);
-
 slave_sg_fromdev_fill_desc:
 			desc = dwc_desc_get(dwc);
 			if (!desc)
@@ -840,15 +838,17 @@  slave_sg_fromdev_fill_desc:
 
 			lli_write(desc, sar, reg);
 			lli_write(desc, dar, mem);
-			lli_write(desc, ctllo, ctllo | DWC_CTLL_DST_WIDTH(mem_width));
 			if ((len >> reg_width) > dwc->block_size) {
 				dlen = dwc->block_size << reg_width;
-				mem += dlen;
 				len -= dlen;
 			} else {
 				dlen = len;
 				len = 0;
 			}
+			mem_width = __ffs(data_width | mem | dlen);
+			mem += dlen;
+			lli_write(desc, ctllo,
+				  ctllo | DWC_CTLL_DST_WIDTH(mem_width));
 			lli_write(desc, ctlhi, dlen >> reg_width);
 			desc->len = dlen;