diff mbox series

[2/4] dmaengine: dw: Add memory bus width verification

Message ID 20240416162908.24180-3-fancer.lancer@gmail.com (mailing list archive)
State Superseded
Headers show
Series dmaengine: dw: Fix src/dst addr width misconfig | expand

Commit Message

Serge Semin April 16, 2024, 4:28 p.m. UTC
Currently in case of the DEV_TO_MEM or MEM_TO_DEV DMA transfers the memory
data width (single transfer width) is determined based on the buffer
length, buffer base address or DMA master-channel max address width
capability. It isn't enough in case of the channel disabling prior the
block transfer is finished. Here is what DW AHB DMA IP-core databook says
regarding the port suspension (DMA-transfer pause) implementation in the
controller:

"When CTLx.SRC_TR_WIDTH < CTLx.DST_TR_WIDTH and the CFGx.CH_SUSP bit is
high, the CFGx.FIFO_EMPTY is asserted once the contents of the FIFO do not
permit a single word of CTLx.DST_TR_WIDTH to be formed. However, there may
still be data in the channel FIFO, but not enough to form a single
transfer of CTLx.DST_TR_WIDTH. In this scenario, once the channel is
disabled, the remaining data in the channel FIFO is not transferred to the
destination peripheral."

So in case if the port gets to be suspended and then disabled it's
possible to have the data silently discarded even though the controller
reported that FIFO is empty and the CTLx.BLOCK_TS indicated the dropped
data already received from the source device. This looks as if the data
somehow got lost on a way from the peripheral device to memory and causes
problems for instance in the DW APB UART driver, which pauses and disables
the DMA-transfer as soon as the recv data timeout happens. Here is the way
it looks:

 Memory <------- DMA FIFO <------ UART FIFO <---------------- UART
  DST_TR_WIDTH -+--------|       |         |
                |        |       |         |                No more data
   Current lvl -+--------|       |---------+- DMA-burst lvl
                |        |       |---------+- Leftover data
                |        |       |---------+- SRC_TR_WIDTH
               -+--------+-------+---------+

In the example above: no more data is getting received over the UART port
and BLOCK_TS is not even close to be fully received; some data is left in
the UART FIFO, but not enough to perform a bursted DMA-xfer to the DMA
FIFO; some data is left in the DMA FIFO, but not enough to be passed
further to the system memory in a single transfer. In this situation the
8250 UART driver catches the recv timeout interrupt, pauses the
DMA-transfer and terminates it completely, after which the IRQ handler
manually fetches the leftover data from the UART FIFO into the
recv-buffer. But since the DMA-channel has been disabled with the data
left in the DMA FIFO, that data will be just discarded and the recv-buffer
will have a gap of the "current lvl" size in the recv-buffer at the tail
of the lately received data portion. So the data will be lost just due to
the misconfigured DMA transfer.

Note this is only relevant for the case of the transfer suspension and
_disabling_. No problem will happen if the transfer will be re-enabled
afterwards or the block transfer is fully completed. In the later case the
"FIFO flush mode" will be executed at the transfer final stage in order to
push out the data left in the DMA FIFO.

In order to fix the denoted problem the DW AHB DMA-engine driver needs to
make sure that the _bursted_ source transfer width is greater or equal to
the single destination transfer (note the HW databook describes more
strict constraint than actually required). Since the peripheral-device
side is prescribed by the client driver logic, the memory-side can be only
used for that. The solution can be easily implemented for the DEV_TO_MEM
transfers just by adjusting the memory-channel address width. Sadly it's
not that easy for the MEM_TO_DEV transfers since the mem-to-dma burst size
is normally dynamically determined by the controller. So the only thing
that can be done is to make sure that memory-side address width can be
greater than the peripheral device address width.

Fixes: a09820043c9e ("dw_dmac: autoconfigure data_width or get it via platform data")
Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
---
 drivers/dma/dw/core.c | 41 ++++++++++++++++++++++++++++++++++-------
 1 file changed, 34 insertions(+), 7 deletions(-)

Comments

Andy Shevchenko April 16, 2024, 6:47 p.m. UTC | #1
On Tue, Apr 16, 2024 at 07:28:56PM +0300, Serge Semin wrote:
> Currently in case of the DEV_TO_MEM or MEM_TO_DEV DMA transfers the memory
> data width (single transfer width) is determined based on the buffer
> length, buffer base address or DMA master-channel max address width
> capability. It isn't enough in case of the channel disabling prior the
> block transfer is finished. Here is what DW AHB DMA IP-core databook says
> regarding the port suspension (DMA-transfer pause) implementation in the
> controller:
> 
> "When CTLx.SRC_TR_WIDTH < CTLx.DST_TR_WIDTH and the CFGx.CH_SUSP bit is
> high, the CFGx.FIFO_EMPTY is asserted once the contents of the FIFO do not
> permit a single word of CTLx.DST_TR_WIDTH to be formed. However, there may
> still be data in the channel FIFO, but not enough to form a single
> transfer of CTLx.DST_TR_WIDTH. In this scenario, once the channel is
> disabled, the remaining data in the channel FIFO is not transferred to the
> destination peripheral."
> 
> So in case if the port gets to be suspended and then disabled it's
> possible to have the data silently discarded even though the controller
> reported that FIFO is empty and the CTLx.BLOCK_TS indicated the dropped
> data already received from the source device. This looks as if the data
> somehow got lost on a way from the peripheral device to memory and causes
> problems for instance in the DW APB UART driver, which pauses and disables
> the DMA-transfer as soon as the recv data timeout happens. Here is the way
> it looks:
> 
>  Memory <------- DMA FIFO <------ UART FIFO <---------------- UART
>   DST_TR_WIDTH -+--------|       |         |
>                 |        |       |         |                No more data
>    Current lvl -+--------|       |---------+- DMA-burst lvl
>                 |        |       |---------+- Leftover data
>                 |        |       |---------+- SRC_TR_WIDTH
>                -+--------+-------+---------+
> 
> In the example above: no more data is getting received over the UART port
> and BLOCK_TS is not even close to be fully received; some data is left in
> the UART FIFO, but not enough to perform a bursted DMA-xfer to the DMA
> FIFO; some data is left in the DMA FIFO, but not enough to be passed
> further to the system memory in a single transfer. In this situation the
> 8250 UART driver catches the recv timeout interrupt, pauses the
> DMA-transfer and terminates it completely, after which the IRQ handler
> manually fetches the leftover data from the UART FIFO into the
> recv-buffer. But since the DMA-channel has been disabled with the data
> left in the DMA FIFO, that data will be just discarded and the recv-buffer
> will have a gap of the "current lvl" size in the recv-buffer at the tail
> of the lately received data portion. So the data will be lost just due to
> the misconfigured DMA transfer.
> 
> Note this is only relevant for the case of the transfer suspension and
> _disabling_. No problem will happen if the transfer will be re-enabled
> afterwards or the block transfer is fully completed. In the later case the
> "FIFO flush mode" will be executed at the transfer final stage in order to
> push out the data left in the DMA FIFO.
> 
> In order to fix the denoted problem the DW AHB DMA-engine driver needs to
> make sure that the _bursted_ source transfer width is greater or equal to
> the single destination transfer (note the HW databook describes more
> strict constraint than actually required). Since the peripheral-device
> side is prescribed by the client driver logic, the memory-side can be only
> used for that. The solution can be easily implemented for the DEV_TO_MEM
> transfers just by adjusting the memory-channel address width. Sadly it's
> not that easy for the MEM_TO_DEV transfers since the mem-to-dma burst size
> is normally dynamically determined by the controller. So the only thing
> that can be done is to make sure that memory-side address width can be
> greater than the peripheral device address width.

...

> +static int dwc_verify_m_buswidth(struct dma_chan *chan)
> +{
> +	struct dw_dma_chan *dwc = to_dw_dma_chan(chan);
> +	struct dw_dma *dw = to_dw_dma(chan->device);
> +	u32 reg_width, reg_burst, mem_width;
> +
> +	mem_width = dw->pdata->data_width[dwc->dws.m_master];
> +
> +	/* Make sure src and dst word widths are coherent */
> +	if (dwc->dma_sconfig.direction == DMA_MEM_TO_DEV) {
> +		reg_width = dwc->dma_sconfig.dst_addr_width;
> +		if (mem_width < reg_width)
> +			return -EINVAL;
> +
> +		dwc->dma_sconfig.src_addr_width = mem_width;
> +	} else if (dwc->dma_sconfig.direction == DMA_DEV_TO_MEM) {
> +		reg_width = dwc->dma_sconfig.src_addr_width;
> +		reg_burst = rounddown_pow_of_two(dwc->dma_sconfig.src_maxburst);
> +
> +		dwc->dma_sconfig.dst_addr_width = min(mem_width, reg_width * reg_burst);

I understand the desire to go this way, but wouldn't be better to have
a symmetrical check and return an error?

> +	}
> +
> +	return 0;
> +}

IIRC MEM side of the DMA channel will ignore those in HW, so basically you are
(re-)using them purely for the __ffs() corrections.

...

>  	dwc->dma_sconfig.src_maxburst =
> -		clamp(dwc->dma_sconfig.src_maxburst, 0U, dwc->max_burst);
> +		clamp(dwc->dma_sconfig.src_maxburst, 1U, dwc->max_burst);
>  	dwc->dma_sconfig.dst_maxburst =
> -		clamp(dwc->dma_sconfig.dst_maxburst, 0U, dwc->max_burst);
> +		clamp(dwc->dma_sconfig.dst_maxburst, 1U, dwc->max_burst);
Serge Semin April 17, 2024, 5:13 p.m. UTC | #2
On Tue, Apr 16, 2024 at 09:47:02PM +0300, Andy Shevchenko wrote:
> On Tue, Apr 16, 2024 at 07:28:56PM +0300, Serge Semin wrote:
> > Currently in case of the DEV_TO_MEM or MEM_TO_DEV DMA transfers the memory
> > data width (single transfer width) is determined based on the buffer
> > length, buffer base address or DMA master-channel max address width
> > capability. It isn't enough in case of the channel disabling prior the
> > block transfer is finished. Here is what DW AHB DMA IP-core databook says
> > regarding the port suspension (DMA-transfer pause) implementation in the
> > controller:
> > 
> > "When CTLx.SRC_TR_WIDTH < CTLx.DST_TR_WIDTH and the CFGx.CH_SUSP bit is
> > high, the CFGx.FIFO_EMPTY is asserted once the contents of the FIFO do not
> > permit a single word of CTLx.DST_TR_WIDTH to be formed. However, there may
> > still be data in the channel FIFO, but not enough to form a single
> > transfer of CTLx.DST_TR_WIDTH. In this scenario, once the channel is
> > disabled, the remaining data in the channel FIFO is not transferred to the
> > destination peripheral."
> > 
> > So in case if the port gets to be suspended and then disabled it's
> > possible to have the data silently discarded even though the controller
> > reported that FIFO is empty and the CTLx.BLOCK_TS indicated the dropped
> > data already received from the source device. This looks as if the data
> > somehow got lost on a way from the peripheral device to memory and causes
> > problems for instance in the DW APB UART driver, which pauses and disables
> > the DMA-transfer as soon as the recv data timeout happens. Here is the way
> > it looks:
> > 
> >  Memory <------- DMA FIFO <------ UART FIFO <---------------- UART
> >   DST_TR_WIDTH -+--------|       |         |
> >                 |        |       |         |                No more data
> >    Current lvl -+--------|       |---------+- DMA-burst lvl
> >                 |        |       |---------+- Leftover data
> >                 |        |       |---------+- SRC_TR_WIDTH
> >                -+--------+-------+---------+
> > 
> > In the example above: no more data is getting received over the UART port
> > and BLOCK_TS is not even close to be fully received; some data is left in
> > the UART FIFO, but not enough to perform a bursted DMA-xfer to the DMA
> > FIFO; some data is left in the DMA FIFO, but not enough to be passed
> > further to the system memory in a single transfer. In this situation the
> > 8250 UART driver catches the recv timeout interrupt, pauses the
> > DMA-transfer and terminates it completely, after which the IRQ handler
> > manually fetches the leftover data from the UART FIFO into the
> > recv-buffer. But since the DMA-channel has been disabled with the data
> > left in the DMA FIFO, that data will be just discarded and the recv-buffer
> > will have a gap of the "current lvl" size in the recv-buffer at the tail
> > of the lately received data portion. So the data will be lost just due to
> > the misconfigured DMA transfer.
> > 
> > Note this is only relevant for the case of the transfer suspension and
> > _disabling_. No problem will happen if the transfer will be re-enabled
> > afterwards or the block transfer is fully completed. In the later case the
> > "FIFO flush mode" will be executed at the transfer final stage in order to
> > push out the data left in the DMA FIFO.
> > 
> > In order to fix the denoted problem the DW AHB DMA-engine driver needs to
> > make sure that the _bursted_ source transfer width is greater or equal to
> > the single destination transfer (note the HW databook describes more
> > strict constraint than actually required). Since the peripheral-device
> > side is prescribed by the client driver logic, the memory-side can be only
> > used for that. The solution can be easily implemented for the DEV_TO_MEM
> > transfers just by adjusting the memory-channel address width. Sadly it's
> > not that easy for the MEM_TO_DEV transfers since the mem-to-dma burst size
> > is normally dynamically determined by the controller. So the only thing
> > that can be done is to make sure that memory-side address width can be
> > greater than the peripheral device address width.
> 
> ...
> 
> > +static int dwc_verify_m_buswidth(struct dma_chan *chan)
> > +{
> > +	struct dw_dma_chan *dwc = to_dw_dma_chan(chan);
> > +	struct dw_dma *dw = to_dw_dma(chan->device);
> > +	u32 reg_width, reg_burst, mem_width;
> > +
> > +	mem_width = dw->pdata->data_width[dwc->dws.m_master];
> > +
> > +	/* Make sure src and dst word widths are coherent */
> > +	if (dwc->dma_sconfig.direction == DMA_MEM_TO_DEV) {
> > +		reg_width = dwc->dma_sconfig.dst_addr_width;
> > +		if (mem_width < reg_width)
> > +			return -EINVAL;
> > +
> > +		dwc->dma_sconfig.src_addr_width = mem_width;
> > +	} else if (dwc->dma_sconfig.direction == DMA_DEV_TO_MEM) {
> > +		reg_width = dwc->dma_sconfig.src_addr_width;
> > +		reg_burst = rounddown_pow_of_two(dwc->dma_sconfig.src_maxburst);
> > +
> > +		dwc->dma_sconfig.dst_addr_width = min(mem_width, reg_width * reg_burst);
> 

> I understand the desire to go this way, but wouldn't be better to have
> a symmetrical check and return an error?

Sadly the situation isn't symmetrical.

The main idea of the solution proposed in this patch is to make sure
that the DMA transactions would fill in the DMA FIFO in a way so in
case of the suspension all the data would be delivered to the
destination with nothing left in the DMA FIFO and the CFGx.FIFO_EMPTY
flag would mean the real FIFO emptiness. It can be reached only if
(CTLx.SRC_TR_WIDTH * CTLx.SRC_MSIZE) >= CTLx.DST_TR_WIDTH
(calculated in the real values of course). But CTLx.SRC_MSIZE is only
relevant for the flow-control/non-memory peripherals. Thus the
conditions under which the problem can be avoided are:

DMA_MEM_TO_DEV: CTLx.SRC_TR_WIDTH >= CTLx.DST_TR_WIDTH
DMA_DEV_TO_MEM: CTLx.SRC_TR_WIDTH * CTLx.SRC_MSIZE >= CTLx.DST_TR_WIDTH

In both cases the non-memory peripheral side parameters (DEV-side)
can't be changed because they are selected by the client drivers based
on their specific logic (Device FIFO depth, watermarks, CSR widths,
etc). But we can vary the memory-side transfer width as long as it's
within the permitted limits.

In case of the DMA_MEM_TO_DEV transfers we can change the
CTLx.SRC_TR_WIDTH because it represents the memory side transfer
width. But if the maximum memory transfer width is smaller than the
specified destination register width, there is nothing we can do. Thus
returning the EINVAL error. Note this is mainly a hypothetical
situation since normally the max width of the memory master xfers is
greater than the peripheral master xfer max width (in my case it's 128
and 32 bits respectively).

In case of the DMA_DEV_TO_MEM transfers we can change the CTLx.DST_TR_WIDTH
parameter because it's the memory side. Thus if the maximum
memory transfer width is smaller than the bursted source transfer,
then we can stick to the maximum memory transfer width. But if it's
greater than the bursted source transfer, we can freely reduce it
so to support the safe suspension+disable DMA-usage pattern.

> 
> > +	}
> > +
> > +	return 0;
> > +}
> 

> IIRC MEM side of the DMA channel will ignore those in HW, so basically you are
> (re-)using them purely for the __ffs() corrections.

No. DMAC ignores the _burst length_ parameters CTLx.SRC_MSIZE and
CTLx.DEST_MSIZE for the memory side (also see my comment above):

"The CTLx.SRC_MSIZE and CTLx.DEST_MSIZE are properties valid only for
peripherals with a handshaking interface; they cannot be used for
defining the burst length for memory peripherals.

When the peripherals are memory, the DW_ahb_dmac is always the flow
controller and uses DMA transfers to move blocks; thus the
CTLx.SRC_MSIZE and CTLx.DEST_MSIZE values are not used for memory
peripherals. The SRC_MSIZE/DEST_MSIZE limitations are used to
accommodate devices that have limited resources, such as a FIFO.
Memory does not normally have limitations similar to the FIFOs."

In my case the problem is in the CTLx.SRC_TR_WIDTH and
CTLx.DST_TR_WIDTH values misconfiguration. Here is the crucial comment
in the HW-manual about that (cited in the commit messages):

"When CTLx.SRC_TR_WIDTH < CTLx.DST_TR_WIDTH and the CFGx.CH_SUSP bit is
high, the CFGx.FIFO_EMPTY is asserted once the contents of the FIFO do not
permit a single word of CTLx.DST_TR_WIDTH to be formed. However, there may
still be data in the channel FIFO, but not enough to form a single
transfer of CTLx.DST_TR_WIDTH. In this scenario, once the channel is
disabled, the remaining data in the channel FIFO is not transferred to the
destination peripheral."

See Chapter 7.7 "Disabling a Channel Prior to Transfer Completion" of
the DW DMAC HW manual for more details.

-Serge(y)

> 
> ...
> 
> >  	dwc->dma_sconfig.src_maxburst =
> > -		clamp(dwc->dma_sconfig.src_maxburst, 0U, dwc->max_burst);
> > +		clamp(dwc->dma_sconfig.src_maxburst, 1U, dwc->max_burst);
> >  	dwc->dma_sconfig.dst_maxburst =
> > -		clamp(dwc->dma_sconfig.dst_maxburst, 0U, dwc->max_burst);
> > +		clamp(dwc->dma_sconfig.dst_maxburst, 1U, dwc->max_burst);
> 
> -- 
> With Best Regards,
> Andy Shevchenko
> 
>
Andy Shevchenko April 17, 2024, 5:28 p.m. UTC | #3
On Wed, Apr 17, 2024 at 08:13:59PM +0300, Serge Semin wrote:
> On Tue, Apr 16, 2024 at 09:47:02PM +0300, Andy Shevchenko wrote:
> > On Tue, Apr 16, 2024 at 07:28:56PM +0300, Serge Semin wrote:
> > > Currently in case of the DEV_TO_MEM or MEM_TO_DEV DMA transfers the memory
> > > data width (single transfer width) is determined based on the buffer
> > > length, buffer base address or DMA master-channel max address width
> > > capability. It isn't enough in case of the channel disabling prior the
> > > block transfer is finished. Here is what DW AHB DMA IP-core databook says
> > > regarding the port suspension (DMA-transfer pause) implementation in the
> > > controller:
> > > 
> > > "When CTLx.SRC_TR_WIDTH < CTLx.DST_TR_WIDTH and the CFGx.CH_SUSP bit is
> > > high, the CFGx.FIFO_EMPTY is asserted once the contents of the FIFO do not
> > > permit a single word of CTLx.DST_TR_WIDTH to be formed. However, there may
> > > still be data in the channel FIFO, but not enough to form a single
> > > transfer of CTLx.DST_TR_WIDTH. In this scenario, once the channel is
> > > disabled, the remaining data in the channel FIFO is not transferred to the
> > > destination peripheral."
> > > 
> > > So in case if the port gets to be suspended and then disabled it's
> > > possible to have the data silently discarded even though the controller
> > > reported that FIFO is empty and the CTLx.BLOCK_TS indicated the dropped
> > > data already received from the source device. This looks as if the data
> > > somehow got lost on a way from the peripheral device to memory and causes
> > > problems for instance in the DW APB UART driver, which pauses and disables
> > > the DMA-transfer as soon as the recv data timeout happens. Here is the way
> > > it looks:
> > > 
> > >  Memory <------- DMA FIFO <------ UART FIFO <---------------- UART
> > >   DST_TR_WIDTH -+--------|       |         |
> > >                 |        |       |         |                No more data
> > >    Current lvl -+--------|       |---------+- DMA-burst lvl
> > >                 |        |       |---------+- Leftover data
> > >                 |        |       |---------+- SRC_TR_WIDTH
> > >                -+--------+-------+---------+
> > > 
> > > In the example above: no more data is getting received over the UART port
> > > and BLOCK_TS is not even close to be fully received; some data is left in
> > > the UART FIFO, but not enough to perform a bursted DMA-xfer to the DMA
> > > FIFO; some data is left in the DMA FIFO, but not enough to be passed
> > > further to the system memory in a single transfer. In this situation the
> > > 8250 UART driver catches the recv timeout interrupt, pauses the
> > > DMA-transfer and terminates it completely, after which the IRQ handler
> > > manually fetches the leftover data from the UART FIFO into the
> > > recv-buffer. But since the DMA-channel has been disabled with the data
> > > left in the DMA FIFO, that data will be just discarded and the recv-buffer
> > > will have a gap of the "current lvl" size in the recv-buffer at the tail
> > > of the lately received data portion. So the data will be lost just due to
> > > the misconfigured DMA transfer.
> > > 
> > > Note this is only relevant for the case of the transfer suspension and
> > > _disabling_. No problem will happen if the transfer will be re-enabled
> > > afterwards or the block transfer is fully completed. In the later case the
> > > "FIFO flush mode" will be executed at the transfer final stage in order to
> > > push out the data left in the DMA FIFO.
> > > 
> > > In order to fix the denoted problem the DW AHB DMA-engine driver needs to
> > > make sure that the _bursted_ source transfer width is greater or equal to
> > > the single destination transfer (note the HW databook describes more
> > > strict constraint than actually required). Since the peripheral-device
> > > side is prescribed by the client driver logic, the memory-side can be only
> > > used for that. The solution can be easily implemented for the DEV_TO_MEM
> > > transfers just by adjusting the memory-channel address width. Sadly it's
> > > not that easy for the MEM_TO_DEV transfers since the mem-to-dma burst size
> > > is normally dynamically determined by the controller. So the only thing
> > > that can be done is to make sure that memory-side address width can be
> > > greater than the peripheral device address width.

...

> > > +static int dwc_verify_m_buswidth(struct dma_chan *chan)
> > > +{
> > > +	struct dw_dma_chan *dwc = to_dw_dma_chan(chan);
> > > +	struct dw_dma *dw = to_dw_dma(chan->device);
> > > +	u32 reg_width, reg_burst, mem_width;
> > > +
> > > +	mem_width = dw->pdata->data_width[dwc->dws.m_master];
> > > +
> > > +	/* Make sure src and dst word widths are coherent */
> > > +	if (dwc->dma_sconfig.direction == DMA_MEM_TO_DEV) {
> > > +		reg_width = dwc->dma_sconfig.dst_addr_width;
> > > +		if (mem_width < reg_width)
> > > +			return -EINVAL;
> > > +
> > > +		dwc->dma_sconfig.src_addr_width = mem_width;
> > > +	} else if (dwc->dma_sconfig.direction == DMA_DEV_TO_MEM) {
> > > +		reg_width = dwc->dma_sconfig.src_addr_width;
> > > +		reg_burst = rounddown_pow_of_two(dwc->dma_sconfig.src_maxburst);
> > > +
> > > +		dwc->dma_sconfig.dst_addr_width = min(mem_width, reg_width * reg_burst);
> > 
> 
> > I understand the desire to go this way, but wouldn't be better to have
> > a symmetrical check and return an error?
> 
> Sadly the situation isn't symmetrical.
> 
> The main idea of the solution proposed in this patch is to make sure
> that the DMA transactions would fill in the DMA FIFO in a way so in
> case of the suspension all the data would be delivered to the
> destination with nothing left in the DMA FIFO and the CFGx.FIFO_EMPTY
> flag would mean the real FIFO emptiness. It can be reached only if
> (CTLx.SRC_TR_WIDTH * CTLx.SRC_MSIZE) >= CTLx.DST_TR_WIDTH
> (calculated in the real values of course). But CTLx.SRC_MSIZE is only
> relevant for the flow-control/non-memory peripherals. Thus the

Oh, if it involves flow control, shouldn't you check for that as well?
We have a (PPC? IIRC) hardware with this IP that can have peripheral
as flow control.

> conditions under which the problem can be avoided are:
> 
> DMA_MEM_TO_DEV: CTLx.SRC_TR_WIDTH >= CTLx.DST_TR_WIDTH
> DMA_DEV_TO_MEM: CTLx.SRC_TR_WIDTH * CTLx.SRC_MSIZE >= CTLx.DST_TR_WIDTH
> 
> In both cases the non-memory peripheral side parameters (DEV-side)
> can't be changed because they are selected by the client drivers based
> on their specific logic (Device FIFO depth, watermarks, CSR widths,
> etc). But we can vary the memory-side transfer width as long as it's
> within the permitted limits.
> 
> In case of the DMA_MEM_TO_DEV transfers we can change the
> CTLx.SRC_TR_WIDTH because it represents the memory side transfer
> width. But if the maximum memory transfer width is smaller than the
> specified destination register width, there is nothing we can do. Thus
> returning the EINVAL error. Note this is mainly a hypothetical
> situation since normally the max width of the memory master xfers is
> greater than the peripheral master xfer max width (in my case it's 128
> and 32 bits respectively).
> 
> In case of the DMA_DEV_TO_MEM transfers we can change the CTLx.DST_TR_WIDTH
> parameter because it's the memory side. Thus if the maximum
> memory transfer width is smaller than the bursted source transfer,
> then we can stick to the maximum memory transfer width. But if it's
> greater than the bursted source transfer, we can freely reduce it
> so to support the safe suspension+disable DMA-usage pattern.
> 
> > 
> > > +	}
> > > +
> > > +	return 0;
> > > +}
> > 
> 
> > IIRC MEM side of the DMA channel will ignore those in HW, so basically you are
> > (re-)using them purely for the __ffs() corrections.
> 
> No. DMAC ignores the _burst length_ parameters CTLx.SRC_MSIZE and
> CTLx.DEST_MSIZE for the memory side (also see my comment above):
> 
> "The CTLx.SRC_MSIZE and CTLx.DEST_MSIZE are properties valid only for
> peripherals with a handshaking interface; they cannot be used for
> defining the burst length for memory peripherals.
> 
> When the peripherals are memory, the DW_ahb_dmac is always the flow
> controller and uses DMA transfers to move blocks; thus the
> CTLx.SRC_MSIZE and CTLx.DEST_MSIZE values are not used for memory
> peripherals. The SRC_MSIZE/DEST_MSIZE limitations are used to
> accommodate devices that have limited resources, such as a FIFO.
> Memory does not normally have limitations similar to the FIFOs."
> 
> In my case the problem is in the CTLx.SRC_TR_WIDTH and
> CTLx.DST_TR_WIDTH values misconfiguration. Here is the crucial comment
> in the HW-manual about that (cited in the commit messages):
> 
> "When CTLx.SRC_TR_WIDTH < CTLx.DST_TR_WIDTH and the CFGx.CH_SUSP bit is
> high, the CFGx.FIFO_EMPTY is asserted once the contents of the FIFO do not
> permit a single word of CTLx.DST_TR_WIDTH to be formed. However, there may
> still be data in the channel FIFO, but not enough to form a single
> transfer of CTLx.DST_TR_WIDTH. In this scenario, once the channel is
> disabled, the remaining data in the channel FIFO is not transferred to the
> destination peripheral."
> 
> See Chapter 7.7 "Disabling a Channel Prior to Transfer Completion" of
> the DW DMAC HW manual for more details.

Got it. Maybe a little summary in the code to explain all this magic?

...

> > >  	dwc->dma_sconfig.src_maxburst =
> > > -		clamp(dwc->dma_sconfig.src_maxburst, 0U, dwc->max_burst);
> > > +		clamp(dwc->dma_sconfig.src_maxburst, 1U, dwc->max_burst);
> > >  	dwc->dma_sconfig.dst_maxburst =
> > > -		clamp(dwc->dma_sconfig.dst_maxburst, 0U, dwc->max_burst);
> > > +		clamp(dwc->dma_sconfig.dst_maxburst, 1U, dwc->max_burst);
Serge Semin April 17, 2024, 6:52 p.m. UTC | #4
On Wed, Apr 17, 2024 at 08:28:06PM +0300, Andy Shevchenko wrote:
> On Wed, Apr 17, 2024 at 08:13:59PM +0300, Serge Semin wrote:
> > On Tue, Apr 16, 2024 at 09:47:02PM +0300, Andy Shevchenko wrote:
> > > On Tue, Apr 16, 2024 at 07:28:56PM +0300, Serge Semin wrote:
> > > > Currently in case of the DEV_TO_MEM or MEM_TO_DEV DMA transfers the memory
> > > > data width (single transfer width) is determined based on the buffer
> > > > length, buffer base address or DMA master-channel max address width
> > > > capability. It isn't enough in case of the channel disabling prior the
> > > > block transfer is finished. Here is what DW AHB DMA IP-core databook says
> > > > regarding the port suspension (DMA-transfer pause) implementation in the
> > > > controller:
> > > > 
> > > > "When CTLx.SRC_TR_WIDTH < CTLx.DST_TR_WIDTH and the CFGx.CH_SUSP bit is
> > > > high, the CFGx.FIFO_EMPTY is asserted once the contents of the FIFO do not
> > > > permit a single word of CTLx.DST_TR_WIDTH to be formed. However, there may
> > > > still be data in the channel FIFO, but not enough to form a single
> > > > transfer of CTLx.DST_TR_WIDTH. In this scenario, once the channel is
> > > > disabled, the remaining data in the channel FIFO is not transferred to the
> > > > destination peripheral."
> > > > 
> > > > So in case if the port gets to be suspended and then disabled it's
> > > > possible to have the data silently discarded even though the controller
> > > > reported that FIFO is empty and the CTLx.BLOCK_TS indicated the dropped
> > > > data already received from the source device. This looks as if the data
> > > > somehow got lost on a way from the peripheral device to memory and causes
> > > > problems for instance in the DW APB UART driver, which pauses and disables
> > > > the DMA-transfer as soon as the recv data timeout happens. Here is the way
> > > > it looks:
> > > > 
> > > >  Memory <------- DMA FIFO <------ UART FIFO <---------------- UART
> > > >   DST_TR_WIDTH -+--------|       |         |
> > > >                 |        |       |         |                No more data
> > > >    Current lvl -+--------|       |---------+- DMA-burst lvl
> > > >                 |        |       |---------+- Leftover data
> > > >                 |        |       |---------+- SRC_TR_WIDTH
> > > >                -+--------+-------+---------+
> > > > 
> > > > In the example above: no more data is getting received over the UART port
> > > > and BLOCK_TS is not even close to be fully received; some data is left in
> > > > the UART FIFO, but not enough to perform a bursted DMA-xfer to the DMA
> > > > FIFO; some data is left in the DMA FIFO, but not enough to be passed
> > > > further to the system memory in a single transfer. In this situation the
> > > > 8250 UART driver catches the recv timeout interrupt, pauses the
> > > > DMA-transfer and terminates it completely, after which the IRQ handler
> > > > manually fetches the leftover data from the UART FIFO into the
> > > > recv-buffer. But since the DMA-channel has been disabled with the data
> > > > left in the DMA FIFO, that data will be just discarded and the recv-buffer
> > > > will have a gap of the "current lvl" size in the recv-buffer at the tail
> > > > of the lately received data portion. So the data will be lost just due to
> > > > the misconfigured DMA transfer.
> > > > 
> > > > Note this is only relevant for the case of the transfer suspension and
> > > > _disabling_. No problem will happen if the transfer will be re-enabled
> > > > afterwards or the block transfer is fully completed. In the later case the
> > > > "FIFO flush mode" will be executed at the transfer final stage in order to
> > > > push out the data left in the DMA FIFO.
> > > > 
> > > > In order to fix the denoted problem the DW AHB DMA-engine driver needs to
> > > > make sure that the _bursted_ source transfer width is greater or equal to
> > > > the single destination transfer (note the HW databook describes more
> > > > strict constraint than actually required). Since the peripheral-device
> > > > side is prescribed by the client driver logic, the memory-side can be only
> > > > used for that. The solution can be easily implemented for the DEV_TO_MEM
> > > > transfers just by adjusting the memory-channel address width. Sadly it's
> > > > not that easy for the MEM_TO_DEV transfers since the mem-to-dma burst size
> > > > is normally dynamically determined by the controller. So the only thing
> > > > that can be done is to make sure that memory-side address width can be
> > > > greater than the peripheral device address width.
> 
> ...
> 
> > > > +static int dwc_verify_m_buswidth(struct dma_chan *chan)
> > > > +{
> > > > +	struct dw_dma_chan *dwc = to_dw_dma_chan(chan);
> > > > +	struct dw_dma *dw = to_dw_dma(chan->device);
> > > > +	u32 reg_width, reg_burst, mem_width;
> > > > +
> > > > +	mem_width = dw->pdata->data_width[dwc->dws.m_master];
> > > > +
> > > > +	/* Make sure src and dst word widths are coherent */
> > > > +	if (dwc->dma_sconfig.direction == DMA_MEM_TO_DEV) {
> > > > +		reg_width = dwc->dma_sconfig.dst_addr_width;
> > > > +		if (mem_width < reg_width)
> > > > +			return -EINVAL;
> > > > +
> > > > +		dwc->dma_sconfig.src_addr_width = mem_width;
> > > > +	} else if (dwc->dma_sconfig.direction == DMA_DEV_TO_MEM) {
> > > > +		reg_width = dwc->dma_sconfig.src_addr_width;
> > > > +		reg_burst = rounddown_pow_of_two(dwc->dma_sconfig.src_maxburst);
> > > > +
> > > > +		dwc->dma_sconfig.dst_addr_width = min(mem_width, reg_width * reg_burst);
> > > 
> > 
> > > I understand the desire to go this way, but wouldn't be better to have
> > > a symmetrical check and return an error?
> > 
> > Sadly the situation isn't symmetrical.
> > 
> > The main idea of the solution proposed in this patch is to make sure
> > that the DMA transactions would fill in the DMA FIFO in a way so in
> > case of the suspension all the data would be delivered to the
> > destination with nothing left in the DMA FIFO and the CFGx.FIFO_EMPTY
> > flag would mean the real FIFO emptiness. It can be reached only if
> > (CTLx.SRC_TR_WIDTH * CTLx.SRC_MSIZE) >= CTLx.DST_TR_WIDTH
> > (calculated in the real values of course). But CTLx.SRC_MSIZE is only
> > relevant for the flow-control/non-memory peripherals. Thus the
> 

> Oh, if it involves flow control, shouldn't you check for that as well?
> We have a (PPC? IIRC) hardware with this IP that can have peripheral
> as flow control.

Oops. Sorry, I was wrong in saying that the peripheral is a
flow-controller. All my cases imply DMAC as flow-controller. The
correct sentence would be "But CTLx.SRC_MSIZE is only relevant for the
non-memory peripherals."

> 
> > conditions under which the problem can be avoided are:
> > 
> > DMA_MEM_TO_DEV: CTLx.SRC_TR_WIDTH >= CTLx.DST_TR_WIDTH
> > DMA_DEV_TO_MEM: CTLx.SRC_TR_WIDTH * CTLx.SRC_MSIZE >= CTLx.DST_TR_WIDTH
> > 
> > In both cases the non-memory peripheral side parameters (DEV-side)
> > can't be changed because they are selected by the client drivers based
> > on their specific logic (Device FIFO depth, watermarks, CSR widths,
> > etc). But we can vary the memory-side transfer width as long as it's
> > within the permitted limits.
> > 
> > In case of the DMA_MEM_TO_DEV transfers we can change the
> > CTLx.SRC_TR_WIDTH because it represents the memory side transfer
> > width. But if the maximum memory transfer width is smaller than the
> > specified destination register width, there is nothing we can do. Thus
> > returning the EINVAL error. Note this is mainly a hypothetical
> > situation since normally the max width of the memory master xfers is
> > greater than the peripheral master xfer max width (in my case it's 128
> > and 32 bits respectively).
> > 
> > In case of the DMA_DEV_TO_MEM transfers we can change the CTLx.DST_TR_WIDTH
> > parameter because it's the memory side. Thus if the maximum
> > memory transfer width is smaller than the bursted source transfer,
> > then we can stick to the maximum memory transfer width. But if it's
> > greater than the bursted source transfer, we can freely reduce it
> > so to support the safe suspension+disable DMA-usage pattern.
> > 
> > > 
> > > > +	}
> > > > +
> > > > +	return 0;
> > > > +}
> > > 
> > 
> > > IIRC MEM side of the DMA channel will ignore those in HW, so basically you are
> > > (re-)using them purely for the __ffs() corrections.
> > 
> > No. DMAC ignores the _burst length_ parameters CTLx.SRC_MSIZE and
> > CTLx.DEST_MSIZE for the memory side (also see my comment above):
> > 
> > "The CTLx.SRC_MSIZE and CTLx.DEST_MSIZE are properties valid only for
> > peripherals with a handshaking interface; they cannot be used for
> > defining the burst length for memory peripherals.
> > 
> > When the peripherals are memory, the DW_ahb_dmac is always the flow
> > controller and uses DMA transfers to move blocks; thus the
> > CTLx.SRC_MSIZE and CTLx.DEST_MSIZE values are not used for memory
> > peripherals. The SRC_MSIZE/DEST_MSIZE limitations are used to
> > accommodate devices that have limited resources, such as a FIFO.
> > Memory does not normally have limitations similar to the FIFOs."
> > 
> > In my case the problem is in the CTLx.SRC_TR_WIDTH and
> > CTLx.DST_TR_WIDTH values misconfiguration. Here is the crucial comment
> > in the HW-manual about that (cited in the commit messages):
> > 
> > "When CTLx.SRC_TR_WIDTH < CTLx.DST_TR_WIDTH and the CFGx.CH_SUSP bit is
> > high, the CFGx.FIFO_EMPTY is asserted once the contents of the FIFO do not
> > permit a single word of CTLx.DST_TR_WIDTH to be formed. However, there may
> > still be data in the channel FIFO, but not enough to form a single
> > transfer of CTLx.DST_TR_WIDTH. In this scenario, once the channel is
> > disabled, the remaining data in the channel FIFO is not transferred to the
> > destination peripheral."
> > 
> > See Chapter 7.7 "Disabling a Channel Prior to Transfer Completion" of
> > the DW DMAC HW manual for more details.
> 

> Got it. Maybe a little summary in the code to explain all this magic?

Will it be enough to add something like this:
/*
 * It's possible to have a data portion locked in the DMA FIFO in case
 * of the channel suspension. Subsequent channel disabling will cause
 * that data silent loss. In order to prevent that maintain the src
 * and dst transfer widths coherency by means of the relation:
 * (CTLx.SRC_TR_WIDTH * CTLx.SRC_MSIZE >= CTLx.DST_TR_WIDTH)
 */
?

-Serge(y)

> 
> ...
> 
> > > >  	dwc->dma_sconfig.src_maxburst =
> > > > -		clamp(dwc->dma_sconfig.src_maxburst, 0U, dwc->max_burst);
> > > > +		clamp(dwc->dma_sconfig.src_maxburst, 1U, dwc->max_burst);
> > > >  	dwc->dma_sconfig.dst_maxburst =
> > > > -		clamp(dwc->dma_sconfig.dst_maxburst, 0U, dwc->max_burst);
> > > > +		clamp(dwc->dma_sconfig.dst_maxburst, 1U, dwc->max_burst);
> 
> -- 
> With Best Regards,
> Andy Shevchenko
> 
>
Andy Shevchenko April 18, 2024, 9:37 a.m. UTC | #5
On Wed, Apr 17, 2024 at 09:52:42PM +0300, Serge Semin wrote:
> On Wed, Apr 17, 2024 at 08:28:06PM +0300, Andy Shevchenko wrote:
> > On Wed, Apr 17, 2024 at 08:13:59PM +0300, Serge Semin wrote:

...

> > Got it. Maybe a little summary in the code to explain all this magic?
> 
> Will it be enough to add something like this:
> /*
>  * It's possible to have a data portion locked in the DMA FIFO in case
>  * of the channel suspension. Subsequent channel disabling will cause
>  * that data silent loss. In order to prevent that maintain the src
>  * and dst transfer widths coherency by means of the relation:
>  * (CTLx.SRC_TR_WIDTH * CTLx.SRC_MSIZE >= CTLx.DST_TR_WIDTH)
>  */

Yes, and you may add something like
"Look for the details in the commit message that brings this change."
at the end of it.
Serge Semin April 18, 2024, 3:52 p.m. UTC | #6
On Thu, Apr 18, 2024 at 12:37:09PM +0300, Andy Shevchenko wrote:
> On Wed, Apr 17, 2024 at 09:52:42PM +0300, Serge Semin wrote:
> > On Wed, Apr 17, 2024 at 08:28:06PM +0300, Andy Shevchenko wrote:
> > > On Wed, Apr 17, 2024 at 08:13:59PM +0300, Serge Semin wrote:
> 
> ...
> 
> > > Got it. Maybe a little summary in the code to explain all this magic?
> > 
> > Will it be enough to add something like this:
> > /*
> >  * It's possible to have a data portion locked in the DMA FIFO in case
> >  * of the channel suspension. Subsequent channel disabling will cause
> >  * that data silent loss. In order to prevent that maintain the src
> >  * and dst transfer widths coherency by means of the relation:
> >  * (CTLx.SRC_TR_WIDTH * CTLx.SRC_MSIZE >= CTLx.DST_TR_WIDTH)
> >  */
> 
> Yes, and you may add something like
> "Look for the details in the commit message that brings this change."
> at the end of it.

Agreed. Thanks.

-Serge(y)

> 
> -- 
> With Best Regards,
> Andy Shevchenko
> 
>
diff mbox series

Patch

diff --git a/drivers/dma/dw/core.c b/drivers/dma/dw/core.c
index c297ca9d5706..61e026310dd8 100644
--- a/drivers/dma/dw/core.c
+++ b/drivers/dma/dw/core.c
@@ -622,12 +622,10 @@  dwc_prep_slave_sg(struct dma_chan *chan, struct scatterlist *sgl,
 	struct dw_desc		*prev;
 	struct dw_desc		*first;
 	u32			ctllo, ctlhi;
-	u8			m_master = dwc->dws.m_master;
-	u8			lms = DWC_LLP_LMS(m_master);
+	u8			lms = DWC_LLP_LMS(dwc->dws.m_master);
 	dma_addr_t		reg;
 	unsigned int		reg_width;
 	unsigned int		mem_width;
-	unsigned int		data_width = dw->pdata->data_width[m_master];
 	unsigned int		i;
 	struct scatterlist	*sg;
 	size_t			total_len = 0;
@@ -661,7 +659,7 @@  dwc_prep_slave_sg(struct dma_chan *chan, struct scatterlist *sgl,
 			mem = sg_dma_address(sg);
 			len = sg_dma_len(sg);
 
-			mem_width = __ffs(data_width | mem | len);
+			mem_width = __ffs(sconfig->src_addr_width | mem | len);
 
 slave_sg_todev_fill_desc:
 			desc = dwc_desc_get(dwc);
@@ -721,7 +719,7 @@  dwc_prep_slave_sg(struct dma_chan *chan, struct scatterlist *sgl,
 			lli_write(desc, sar, reg);
 			lli_write(desc, dar, mem);
 			lli_write(desc, ctlhi, ctlhi);
-			mem_width = __ffs(data_width | mem);
+			mem_width = __ffs(sconfig->dst_addr_width | mem);
 			lli_write(desc, ctllo, ctllo | DWC_CTLL_DST_WIDTH(mem_width));
 			desc->len = dlen;
 
@@ -813,6 +811,31 @@  static int dwc_verify_p_buswidth(struct dma_chan *chan)
 	return 0;
 }
 
+static int dwc_verify_m_buswidth(struct dma_chan *chan)
+{
+	struct dw_dma_chan *dwc = to_dw_dma_chan(chan);
+	struct dw_dma *dw = to_dw_dma(chan->device);
+	u32 reg_width, reg_burst, mem_width;
+
+	mem_width = dw->pdata->data_width[dwc->dws.m_master];
+
+	/* Make sure src and dst word widths are coherent */
+	if (dwc->dma_sconfig.direction == DMA_MEM_TO_DEV) {
+		reg_width = dwc->dma_sconfig.dst_addr_width;
+		if (mem_width < reg_width)
+			return -EINVAL;
+
+		dwc->dma_sconfig.src_addr_width = mem_width;
+	} else if (dwc->dma_sconfig.direction == DMA_DEV_TO_MEM) {
+		reg_width = dwc->dma_sconfig.src_addr_width;
+		reg_burst = rounddown_pow_of_two(dwc->dma_sconfig.src_maxburst);
+
+		dwc->dma_sconfig.dst_addr_width = min(mem_width, reg_width * reg_burst);
+	}
+
+	return 0;
+}
+
 static int dwc_config(struct dma_chan *chan, struct dma_slave_config *sconfig)
 {
 	struct dw_dma_chan *dwc = to_dw_dma_chan(chan);
@@ -822,14 +845,18 @@  static int dwc_config(struct dma_chan *chan, struct dma_slave_config *sconfig)
 	memcpy(&dwc->dma_sconfig, sconfig, sizeof(*sconfig));
 
 	dwc->dma_sconfig.src_maxburst =
-		clamp(dwc->dma_sconfig.src_maxburst, 0U, dwc->max_burst);
+		clamp(dwc->dma_sconfig.src_maxburst, 1U, dwc->max_burst);
 	dwc->dma_sconfig.dst_maxburst =
-		clamp(dwc->dma_sconfig.dst_maxburst, 0U, dwc->max_burst);
+		clamp(dwc->dma_sconfig.dst_maxburst, 1U, dwc->max_burst);
 
 	err = dwc_verify_p_buswidth(chan);
 	if (err)
 		return err;
 
+	err = dwc_verify_m_buswidth(chan);
+	if (err)
+		return err;
+
 	dw->encode_maxburst(dwc, &dwc->dma_sconfig.src_maxburst);
 	dw->encode_maxburst(dwc, &dwc->dma_sconfig.dst_maxburst);