diff mbox

[linux-next,v2,1/1] dmaengine: at_hdmac: fix residue computation

Message ID b9630cef4224b93939138b3a8c1950ea44cbcf59.1434626239.git.cyrille.pitchen@atmel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Cyrille Pitchen June 18, 2015, 11:25 a.m. UTC
As claimed by the programmer datasheet and confirmed by the IP designer,
the Block Transfer Size (BTSIZE) bitfield of the Channel x Control A
Register (CTRLAx) always refers to a number of Source Width (SRC_WIDTH)
transfers.

Both the SRC_WIDTH and BTSIZE bitfields can be extacted from the CTRLAx
register to compute the DMA residue. So the 'tx_width' field is useless
and can be removed from the struct at_desc.

Before this patch, atc_prep_slave_sg() was not consistent: BTSIZE was
correctly initialized according to the SRC_WIDTH but 'tx_width' was always
set to reg_width, which was incorrect for MEM_TO_DEV transfers. It led to
bad DMA residue when 'tx_width' != SRC_WIDTH.

Also the 'tx_width' field was mostly set only in the first and last
descriptors. Depending on the kind of DMA transfer, this field remained
uninitialized for intermediate descriptors. The accurate DMA residue was
computed only when the currently processed descriptor was the first or the
last of the chain. This algorithm was a little bit odd. An accurate DMA
residue can always be computed using the SRC_WIDTH and BTSIZE bitfields
in the CTRLAx register.

Finally, the test to check whether the currently processed descriptor is
the last of the chain was wrong: for cyclic transfer, last_desc->lli.dscr
is NOT equal to zero, since set_desc_eol() is never called, but logically
equal to first_desc->txd.phys. This bug has a side effect on the
drivers/tty/serial/atmel_serial.c driver, which uses cyclic DMA transfer
to receive data. Since the DMA residue was wrong each time the DMA
transfer reaches the second (and last) period of the transfer, no more
data were received by the USART driver till the cyclic DMA transfer loops
back to the first period.

Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
---
 drivers/dma/at_hdmac.c      | 132 +++++++++++++++++++++++++++++---------------
 drivers/dma/at_hdmac_regs.h |   3 +-
 2 files changed, 88 insertions(+), 47 deletions(-)

Comments

Jiri Prchal June 18, 2015, 12:09 p.m. UTC | #1
On 18.6.2015 13:25, Cyrille Pitchen wrote:
>
> Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>

Tested-by: Ji?í Prchal <jiri.prchal@aksignal.cz>

Manually applied to 4.1-rc7 and it WORKS!
Great.
Jiri
Torsten Fleischer June 19, 2015, 8:19 a.m. UTC | #2
On Thursday 18 June 2015 at 13:25:41 Cyrille Pitchen wrote:
> 
> Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
> ---

Looks good to me.

Acked-by: Torsten Fleischer <torfl6749@gmail.com>

Regards,
Torsten
Nicolas Ferre June 19, 2015, 1:21 p.m. UTC | #3
Le 18/06/2015 14:09, Ji?í Prchal a écrit :
> 
> 
> On 18.6.2015 13:25, Cyrille Pitchen wrote:
>>
>> Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
> 
> Tested-by: Ji?í Prchal <jiri.prchal@aksignal.cz>
> 
> Manually applied to 4.1-rc7 and it WORKS!

Good piece of new indeed.

Can you add your "Tested-by" tag?

Bye,
Jiri Prchal June 19, 2015, 2:36 p.m. UTC | #4
On 19.6.2015 15:21, Nicolas Ferre wrote:
> Le 18/06/2015 14:09, Ji?í Prchal a écrit :
>>
>>
>> On 18.6.2015 13:25, Cyrille Pitchen wrote:
>>>
>>> Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
>>
>> Tested-by: Ji?í Prchal <jiri.prchal@aksignal.cz>

It's already here.

>>
>> Manually applied to 4.1-rc7 and it WORKS!
>
> Good piece of new indeed.
>
> Can you add your "Tested-by" tag?
>
> Bye,
>
Nicolas Ferre June 29, 2015, 1:34 p.m. UTC | #5
Le 19/06/2015 16:36, Ji?í Prchal a écrit :
> 
> 
> On 19.6.2015 15:21, Nicolas Ferre wrote:
>> Le 18/06/2015 14:09, Ji?í Prchal a écrit :
>>>
>>>
>>> On 18.6.2015 13:25, Cyrille Pitchen wrote:
>>>>
>>>> Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
>>>
>>> Tested-by: Ji?í Prchal <jiri.prchal@aksignal.cz>
> 
> It's already here.
> 
>>>
>>> Manually applied to 4.1-rc7 and it WORKS!
>>
>> Good piece of new indeed.
>>
>> Can you add your "Tested-by" tag?
>>
>> Bye,
>>

Hehe: didn't see it and even didn't put my own tag: all wrong ;-)

Sorry for the noise and thanks a lot!

Bye,
Nicolas Ferre June 29, 2015, 1:38 p.m. UTC | #6
Le 18/06/2015 13:25, Cyrille Pitchen a écrit :
> As claimed by the programmer datasheet and confirmed by the IP designer,
> the Block Transfer Size (BTSIZE) bitfield of the Channel x Control A
> Register (CTRLAx) always refers to a number of Source Width (SRC_WIDTH)
> transfers.
> 
> Both the SRC_WIDTH and BTSIZE bitfields can be extacted from the CTRLAx
> register to compute the DMA residue. So the 'tx_width' field is useless
> and can be removed from the struct at_desc.
> 
> Before this patch, atc_prep_slave_sg() was not consistent: BTSIZE was
> correctly initialized according to the SRC_WIDTH but 'tx_width' was always
> set to reg_width, which was incorrect for MEM_TO_DEV transfers. It led to
> bad DMA residue when 'tx_width' != SRC_WIDTH.
> 
> Also the 'tx_width' field was mostly set only in the first and last
> descriptors. Depending on the kind of DMA transfer, this field remained
> uninitialized for intermediate descriptors. The accurate DMA residue was
> computed only when the currently processed descriptor was the first or the
> last of the chain. This algorithm was a little bit odd. An accurate DMA
> residue can always be computed using the SRC_WIDTH and BTSIZE bitfields
> in the CTRLAx register.
> 
> Finally, the test to check whether the currently processed descriptor is
> the last of the chain was wrong: for cyclic transfer, last_desc->lli.dscr
> is NOT equal to zero, since set_desc_eol() is never called, but logically
> equal to first_desc->txd.phys. This bug has a side effect on the
> drivers/tty/serial/atmel_serial.c driver, which uses cyclic DMA transfer
> to receive data. Since the DMA residue was wrong each time the DMA
> transfer reaches the second (and last) period of the transfer, no more
> data were received by the USART driver till the cyclic DMA transfer loops
> back to the first period.
> 
> Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>

Yes:
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>

It turns out that some information that I gave to Torsten for his
previous patch were wrong. Thank you Cyrille for having dig this issue
and proposed a fix.

Bye,

> ---
>  drivers/dma/at_hdmac.c      | 132 +++++++++++++++++++++++++++++---------------
>  drivers/dma/at_hdmac_regs.h |   3 +-
>  2 files changed, 88 insertions(+), 47 deletions(-)
> 
> diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
> index 59892126..d3629b7 100644
> --- a/drivers/dma/at_hdmac.c
> +++ b/drivers/dma/at_hdmac.c
> @@ -48,6 +48,8 @@
>  	BIT(DMA_SLAVE_BUSWIDTH_2_BYTES) |\
>  	BIT(DMA_SLAVE_BUSWIDTH_4_BYTES))
>  
> +#define ATC_MAX_DSCR_TRIALS	10
> +
>  /*
>   * Initial number of descriptors to allocate for each channel. This could
>   * be increased during dma usage.
> @@ -285,28 +287,19 @@ static struct at_desc *atc_get_desc_by_cookie(struct at_dma_chan *atchan,
>   *
>   * @current_len: the number of bytes left before reading CTRLA
>   * @ctrla: the value of CTRLA
> - * @desc: the descriptor containing the transfer width
>   */
> -static inline int atc_calc_bytes_left(int current_len, u32 ctrla,
> -					struct at_desc *desc)
> +static inline int atc_calc_bytes_left(int current_len, u32 ctrla)
>  {
> -	return current_len - ((ctrla & ATC_BTSIZE_MAX) << desc->tx_width);
> -}
> +	u32 btsize = (ctrla & ATC_BTSIZE_MAX);
> +	u32 src_width = ATC_REG_TO_SRC_WIDTH(ctrla);
>  
> -/**
> - * atc_calc_bytes_left_from_reg - calculates the number of bytes left according
> - * to the current value of CTRLA.
> - *
> - * @current_len: the number of bytes left before reading CTRLA
> - * @atchan: the channel to read CTRLA for
> - * @desc: the descriptor containing the transfer width
> - */
> -static inline int atc_calc_bytes_left_from_reg(int current_len,
> -			struct at_dma_chan *atchan, struct at_desc *desc)
> -{
> -	u32 ctrla = channel_readl(atchan, CTRLA);
> -
> -	return atc_calc_bytes_left(current_len, ctrla, desc);
> +	/*
> +	 * According to the datasheet, when reading the Control A Register
> +	 * (ctrla), the Buffer Transfer Size (btsize) bitfield refers to the
> +	 * number of transfers completed on the Source Interface.
> +	 * So btsize is always a number of source width transfers.
> +	 */
> +	return current_len - (btsize << src_width);
>  }
>  
>  /**
> @@ -320,7 +313,7 @@ static int atc_get_bytes_left(struct dma_chan *chan, dma_cookie_t cookie)
>  	struct at_desc *desc_first = atc_first_active(atchan);
>  	struct at_desc *desc;
>  	int ret;
> -	u32 ctrla, dscr;
> +	u32 ctrla, dscr, trials;
>  
>  	/*
>  	 * If the cookie doesn't match to the currently running transfer then
> @@ -346,15 +339,82 @@ static int atc_get_bytes_left(struct dma_chan *chan, dma_cookie_t cookie)
>  		 * the channel's DSCR register and compare it against the value
>  		 * of the hardware linked list structure of each child
>  		 * descriptor.
> +		 *
> +		 * The CTRLA register provides us with the amount of data
> +		 * already read from the source for the current child
> +		 * descriptor. So we can compute a more accurate residue by also
> +		 * removing the number of bytes corresponding to this amount of
> +		 * data.
> +		 *
> +		 * However, the DSCR and CTRLA registers cannot be read both
> +		 * atomically. Hence a race condition may occur: the first read
> +		 * register may refer to one child descriptor whereas the second
> +		 * read may refer to a later child descriptor in the list
> +		 * because of the DMA transfer progression inbetween the two
> +		 * reads.
> +		 *
> +		 * One solution could have been to pause the DMA transfer, read
> +		 * the DSCR and CTRLA then resume the DMA transfer. Nonetheless,
> +		 * this approach presents some drawbacks:
> +		 * - If the DMA transfer is paused, RX overruns or TX underruns
> +		 *   are more likey to occur depending on the system latency.
> +		 *   Taking the USART driver as an example, it uses a cyclic DMA
> +		 *   transfer to read data from the Receive Holding Register
> +		 *   (RHR) to avoid RX overruns since the RHR is not protected
> +		 *   by any FIFO on most Atmel SoCs. So pausing the DMA transfer
> +		 *   to compute the residue would break the USART driver design.
> +		 * - The atc_pause() function masks interrupts but we'd rather
> +		 *   avoid to do so for system latency purpose.
> +		 *
> +		 * Then we'd rather use another solution: the DSCR is read a
> +		 * first time, the CTRLA is read in turn, next the DSCR is read
> +		 * a second time. If the two consecutive read values of the DSCR
> +		 * are the same then we assume both refers to the very same
> +		 * child descriptor as well as the CTRLA value read inbetween
> +		 * does. For cyclic tranfers, the assumption is that a full loop
> +		 * is "not so fast".
> +		 * If the two DSCR values are different, we read again the CTRLA
> +		 * then the DSCR till two consecutive read values from DSCR are
> +		 * equal or till the maxium trials is reach.
> +		 * This algorithm is very unlikely not to find a stable value for
> +		 * DSCR.
>  		 */
>  
> -		ctrla = channel_readl(atchan, CTRLA);
> -		rmb(); /* ensure CTRLA is read before DSCR */
>  		dscr = channel_readl(atchan, DSCR);
> +		rmb(); /* ensure DSCR is read before CTRLA */
> +		ctrla = channel_readl(atchan, CTRLA);
> +		for (trials = 0; trials < ATC_MAX_DSCR_TRIALS; ++trials) {
> +			u32 new_dscr;
> +
> +			rmb(); /* ensure DSCR is read after CTRLA */
> +			new_dscr = channel_readl(atchan, DSCR);
> +
> +			/*
> +			 * If the DSCR register value has not changed inside the
> +			 * DMA controller since the previous read, we assume
> +			 * that both the dscr and ctrla values refers to the
> +			 * very same descriptor.
> +			 */
> +			if (likely(new_dscr == dscr))
> +				break;
> +
> +			/*
> +			 * DSCR has changed inside the DMA controller, so the
> +			 * previouly read value of CTRLA may refer to an already
> +			 * processed descriptor hence could be outdated.
> +			 * We need to update ctrla to match the current
> +			 * descriptor.
> +			 */
> +			dscr = new_dscr;
> +			rmb(); /* ensure DSCR is read before CTRLA */
> +			ctrla = channel_readl(atchan, CTRLA);
> +		}
> +		if (unlikely(trials >= ATC_MAX_DSCR_TRIALS))
> +			return -ETIMEDOUT;
>  
>  		/* for the first descriptor we can be more accurate */
>  		if (desc_first->lli.dscr == dscr)
> -			return atc_calc_bytes_left(ret, ctrla, desc_first);
> +			return atc_calc_bytes_left(ret, ctrla);
>  
>  		ret -= desc_first->len;
>  		list_for_each_entry(desc, &desc_first->tx_list, desc_node) {
> @@ -365,16 +425,14 @@ static int atc_get_bytes_left(struct dma_chan *chan, dma_cookie_t cookie)
>  		}
>  
>  		/*
> -		 * For the last descriptor in the chain we can calculate
> +		 * For the current descriptor in the chain we can calculate
>  		 * the remaining bytes using the channel's register.
> -		 * Note that the transfer width of the first and last
> -		 * descriptor may differ.
>  		 */
> -		if (!desc->lli.dscr)
> -			ret = atc_calc_bytes_left_from_reg(ret, atchan, desc);
> +		ret = atc_calc_bytes_left(ret, ctrla);
>  	} else {
>  		/* single transfer */
> -		ret = atc_calc_bytes_left_from_reg(ret, atchan, desc_first);
> +		ctrla = channel_readl(atchan, CTRLA);
> +		ret = atc_calc_bytes_left(ret, ctrla);
>  	}
>  
>  	return ret;
> @@ -726,7 +784,6 @@ atc_prep_dma_interleaved(struct dma_chan *chan,
>  
>  	desc->txd.cookie = -EBUSY;
>  	desc->total_len = desc->len = len;
> -	desc->tx_width = dwidth;
>  
>  	/* set end-of-link to the last link descriptor of list*/
>  	set_desc_eol(desc);
> @@ -804,10 +861,6 @@ atc_prep_dma_memcpy(struct dma_chan *chan, dma_addr_t dest, dma_addr_t src,
>  	first->txd.cookie = -EBUSY;
>  	first->total_len = len;
>  
> -	/* set transfer width for the calculation of the residue */
> -	first->tx_width = src_width;
> -	prev->tx_width = src_width;
> -
>  	/* set end-of-link to the last link descriptor of list*/
>  	set_desc_eol(desc);
>  
> @@ -956,10 +1009,6 @@ atc_prep_slave_sg(struct dma_chan *chan, struct scatterlist *sgl,
>  	first->txd.cookie = -EBUSY;
>  	first->total_len = total_len;
>  
> -	/* set transfer width for the calculation of the residue */
> -	first->tx_width = reg_width;
> -	prev->tx_width = reg_width;
> -
>  	/* first link descriptor of list is responsible of flags */
>  	first->txd.flags = flags; /* client is in control of this ack */
>  
> @@ -1077,12 +1126,6 @@ atc_prep_dma_sg(struct dma_chan *chan,
>  		desc->txd.cookie = 0;
>  		desc->len = len;
>  
> -		/*
> -		 * Although we only need the transfer width for the first and
> -		 * the last descriptor, its easier to set it to all descriptors.
> -		 */
> -		desc->tx_width = src_width;
> -
>  		atc_desc_chain(&first, &prev, desc);
>  
>  		/* update the lengths and addresses for the next loop cycle */
> @@ -1256,7 +1299,6 @@ atc_prep_dma_cyclic(struct dma_chan *chan, dma_addr_t buf_addr, size_t buf_len,
>  	/* First descriptor of the chain embedds additional information */
>  	first->txd.cookie = -EBUSY;
>  	first->total_len = buf_len;
> -	first->tx_width = reg_width;
>  
>  	return &first->txd;
>  
> diff --git a/drivers/dma/at_hdmac_regs.h b/drivers/dma/at_hdmac_regs.h
> index bc8d5eb..7f5a082 100644
> --- a/drivers/dma/at_hdmac_regs.h
> +++ b/drivers/dma/at_hdmac_regs.h
> @@ -112,6 +112,7 @@
>  #define		ATC_SRC_WIDTH_BYTE	(0x0 << 24)
>  #define		ATC_SRC_WIDTH_HALFWORD	(0x1 << 24)
>  #define		ATC_SRC_WIDTH_WORD	(0x2 << 24)
> +#define		ATC_REG_TO_SRC_WIDTH(r)	(((r) >> 24) & 0x3)
>  #define	ATC_DST_WIDTH_MASK	(0x3 << 28)	/* Destination Single Transfer Size */
>  #define		ATC_DST_WIDTH(x)	((x) << 28)
>  #define		ATC_DST_WIDTH_BYTE	(0x0 << 28)
> @@ -182,7 +183,6 @@ struct at_lli {
>   * @txd: support for the async_tx api
>   * @desc_node: node on the channed descriptors list
>   * @len: descriptor byte count
> - * @tx_width: transfer width
>   * @total_len: total transaction byte count
>   */
>  struct at_desc {
> @@ -194,7 +194,6 @@ struct at_desc {
>  	struct dma_async_tx_descriptor	txd;
>  	struct list_head		desc_node;
>  	size_t				len;
> -	u32				tx_width;
>  	size_t				total_len;
>  
>  	/* Interleaved data */
>
Vinod Koul July 10, 2015, 8:05 a.m. UTC | #7
On Thu, Jun 18, 2015 at 01:25:41PM +0200, Cyrille Pitchen wrote:
> As claimed by the programmer datasheet and confirmed by the IP designer,
> the Block Transfer Size (BTSIZE) bitfield of the Channel x Control A
> Register (CTRLAx) always refers to a number of Source Width (SRC_WIDTH)
> transfers.
> 
> Both the SRC_WIDTH and BTSIZE bitfields can be extacted from the CTRLAx
> register to compute the DMA residue. So the 'tx_width' field is useless
> and can be removed from the struct at_desc.
> 
> Before this patch, atc_prep_slave_sg() was not consistent: BTSIZE was
> correctly initialized according to the SRC_WIDTH but 'tx_width' was always
> set to reg_width, which was incorrect for MEM_TO_DEV transfers. It led to
> bad DMA residue when 'tx_width' != SRC_WIDTH.
> 
> Also the 'tx_width' field was mostly set only in the first and last
> descriptors. Depending on the kind of DMA transfer, this field remained
> uninitialized for intermediate descriptors. The accurate DMA residue was
> computed only when the currently processed descriptor was the first or the
> last of the chain. This algorithm was a little bit odd. An accurate DMA
> residue can always be computed using the SRC_WIDTH and BTSIZE bitfields
> in the CTRLAx register.
> 
> Finally, the test to check whether the currently processed descriptor is
> the last of the chain was wrong: for cyclic transfer, last_desc->lli.dscr
> is NOT equal to zero, since set_desc_eol() is never called, but logically
> equal to first_desc->txd.phys. This bug has a side effect on the
> drivers/tty/serial/atmel_serial.c driver, which uses cyclic DMA transfer
> to receive data. Since the DMA residue was wrong each time the DMA
> transfer reaches the second (and last) period of the transfer, no more
> data were received by the USART driver till the cyclic DMA transfer loops
> back to the first period.
> 
Applied, thanks
diff mbox

Patch

diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
index 59892126..d3629b7 100644
--- a/drivers/dma/at_hdmac.c
+++ b/drivers/dma/at_hdmac.c
@@ -48,6 +48,8 @@ 
 	BIT(DMA_SLAVE_BUSWIDTH_2_BYTES) |\
 	BIT(DMA_SLAVE_BUSWIDTH_4_BYTES))
 
+#define ATC_MAX_DSCR_TRIALS	10
+
 /*
  * Initial number of descriptors to allocate for each channel. This could
  * be increased during dma usage.
@@ -285,28 +287,19 @@  static struct at_desc *atc_get_desc_by_cookie(struct at_dma_chan *atchan,
  *
  * @current_len: the number of bytes left before reading CTRLA
  * @ctrla: the value of CTRLA
- * @desc: the descriptor containing the transfer width
  */
-static inline int atc_calc_bytes_left(int current_len, u32 ctrla,
-					struct at_desc *desc)
+static inline int atc_calc_bytes_left(int current_len, u32 ctrla)
 {
-	return current_len - ((ctrla & ATC_BTSIZE_MAX) << desc->tx_width);
-}
+	u32 btsize = (ctrla & ATC_BTSIZE_MAX);
+	u32 src_width = ATC_REG_TO_SRC_WIDTH(ctrla);
 
-/**
- * atc_calc_bytes_left_from_reg - calculates the number of bytes left according
- * to the current value of CTRLA.
- *
- * @current_len: the number of bytes left before reading CTRLA
- * @atchan: the channel to read CTRLA for
- * @desc: the descriptor containing the transfer width
- */
-static inline int atc_calc_bytes_left_from_reg(int current_len,
-			struct at_dma_chan *atchan, struct at_desc *desc)
-{
-	u32 ctrla = channel_readl(atchan, CTRLA);
-
-	return atc_calc_bytes_left(current_len, ctrla, desc);
+	/*
+	 * According to the datasheet, when reading the Control A Register
+	 * (ctrla), the Buffer Transfer Size (btsize) bitfield refers to the
+	 * number of transfers completed on the Source Interface.
+	 * So btsize is always a number of source width transfers.
+	 */
+	return current_len - (btsize << src_width);
 }
 
 /**
@@ -320,7 +313,7 @@  static int atc_get_bytes_left(struct dma_chan *chan, dma_cookie_t cookie)
 	struct at_desc *desc_first = atc_first_active(atchan);
 	struct at_desc *desc;
 	int ret;
-	u32 ctrla, dscr;
+	u32 ctrla, dscr, trials;
 
 	/*
 	 * If the cookie doesn't match to the currently running transfer then
@@ -346,15 +339,82 @@  static int atc_get_bytes_left(struct dma_chan *chan, dma_cookie_t cookie)
 		 * the channel's DSCR register and compare it against the value
 		 * of the hardware linked list structure of each child
 		 * descriptor.
+		 *
+		 * The CTRLA register provides us with the amount of data
+		 * already read from the source for the current child
+		 * descriptor. So we can compute a more accurate residue by also
+		 * removing the number of bytes corresponding to this amount of
+		 * data.
+		 *
+		 * However, the DSCR and CTRLA registers cannot be read both
+		 * atomically. Hence a race condition may occur: the first read
+		 * register may refer to one child descriptor whereas the second
+		 * read may refer to a later child descriptor in the list
+		 * because of the DMA transfer progression inbetween the two
+		 * reads.
+		 *
+		 * One solution could have been to pause the DMA transfer, read
+		 * the DSCR and CTRLA then resume the DMA transfer. Nonetheless,
+		 * this approach presents some drawbacks:
+		 * - If the DMA transfer is paused, RX overruns or TX underruns
+		 *   are more likey to occur depending on the system latency.
+		 *   Taking the USART driver as an example, it uses a cyclic DMA
+		 *   transfer to read data from the Receive Holding Register
+		 *   (RHR) to avoid RX overruns since the RHR is not protected
+		 *   by any FIFO on most Atmel SoCs. So pausing the DMA transfer
+		 *   to compute the residue would break the USART driver design.
+		 * - The atc_pause() function masks interrupts but we'd rather
+		 *   avoid to do so for system latency purpose.
+		 *
+		 * Then we'd rather use another solution: the DSCR is read a
+		 * first time, the CTRLA is read in turn, next the DSCR is read
+		 * a second time. If the two consecutive read values of the DSCR
+		 * are the same then we assume both refers to the very same
+		 * child descriptor as well as the CTRLA value read inbetween
+		 * does. For cyclic tranfers, the assumption is that a full loop
+		 * is "not so fast".
+		 * If the two DSCR values are different, we read again the CTRLA
+		 * then the DSCR till two consecutive read values from DSCR are
+		 * equal or till the maxium trials is reach.
+		 * This algorithm is very unlikely not to find a stable value for
+		 * DSCR.
 		 */
 
-		ctrla = channel_readl(atchan, CTRLA);
-		rmb(); /* ensure CTRLA is read before DSCR */
 		dscr = channel_readl(atchan, DSCR);
+		rmb(); /* ensure DSCR is read before CTRLA */
+		ctrla = channel_readl(atchan, CTRLA);
+		for (trials = 0; trials < ATC_MAX_DSCR_TRIALS; ++trials) {
+			u32 new_dscr;
+
+			rmb(); /* ensure DSCR is read after CTRLA */
+			new_dscr = channel_readl(atchan, DSCR);
+
+			/*
+			 * If the DSCR register value has not changed inside the
+			 * DMA controller since the previous read, we assume
+			 * that both the dscr and ctrla values refers to the
+			 * very same descriptor.
+			 */
+			if (likely(new_dscr == dscr))
+				break;
+
+			/*
+			 * DSCR has changed inside the DMA controller, so the
+			 * previouly read value of CTRLA may refer to an already
+			 * processed descriptor hence could be outdated.
+			 * We need to update ctrla to match the current
+			 * descriptor.
+			 */
+			dscr = new_dscr;
+			rmb(); /* ensure DSCR is read before CTRLA */
+			ctrla = channel_readl(atchan, CTRLA);
+		}
+		if (unlikely(trials >= ATC_MAX_DSCR_TRIALS))
+			return -ETIMEDOUT;
 
 		/* for the first descriptor we can be more accurate */
 		if (desc_first->lli.dscr == dscr)
-			return atc_calc_bytes_left(ret, ctrla, desc_first);
+			return atc_calc_bytes_left(ret, ctrla);
 
 		ret -= desc_first->len;
 		list_for_each_entry(desc, &desc_first->tx_list, desc_node) {
@@ -365,16 +425,14 @@  static int atc_get_bytes_left(struct dma_chan *chan, dma_cookie_t cookie)
 		}
 
 		/*
-		 * For the last descriptor in the chain we can calculate
+		 * For the current descriptor in the chain we can calculate
 		 * the remaining bytes using the channel's register.
-		 * Note that the transfer width of the first and last
-		 * descriptor may differ.
 		 */
-		if (!desc->lli.dscr)
-			ret = atc_calc_bytes_left_from_reg(ret, atchan, desc);
+		ret = atc_calc_bytes_left(ret, ctrla);
 	} else {
 		/* single transfer */
-		ret = atc_calc_bytes_left_from_reg(ret, atchan, desc_first);
+		ctrla = channel_readl(atchan, CTRLA);
+		ret = atc_calc_bytes_left(ret, ctrla);
 	}
 
 	return ret;
@@ -726,7 +784,6 @@  atc_prep_dma_interleaved(struct dma_chan *chan,
 
 	desc->txd.cookie = -EBUSY;
 	desc->total_len = desc->len = len;
-	desc->tx_width = dwidth;
 
 	/* set end-of-link to the last link descriptor of list*/
 	set_desc_eol(desc);
@@ -804,10 +861,6 @@  atc_prep_dma_memcpy(struct dma_chan *chan, dma_addr_t dest, dma_addr_t src,
 	first->txd.cookie = -EBUSY;
 	first->total_len = len;
 
-	/* set transfer width for the calculation of the residue */
-	first->tx_width = src_width;
-	prev->tx_width = src_width;
-
 	/* set end-of-link to the last link descriptor of list*/
 	set_desc_eol(desc);
 
@@ -956,10 +1009,6 @@  atc_prep_slave_sg(struct dma_chan *chan, struct scatterlist *sgl,
 	first->txd.cookie = -EBUSY;
 	first->total_len = total_len;
 
-	/* set transfer width for the calculation of the residue */
-	first->tx_width = reg_width;
-	prev->tx_width = reg_width;
-
 	/* first link descriptor of list is responsible of flags */
 	first->txd.flags = flags; /* client is in control of this ack */
 
@@ -1077,12 +1126,6 @@  atc_prep_dma_sg(struct dma_chan *chan,
 		desc->txd.cookie = 0;
 		desc->len = len;
 
-		/*
-		 * Although we only need the transfer width for the first and
-		 * the last descriptor, its easier to set it to all descriptors.
-		 */
-		desc->tx_width = src_width;
-
 		atc_desc_chain(&first, &prev, desc);
 
 		/* update the lengths and addresses for the next loop cycle */
@@ -1256,7 +1299,6 @@  atc_prep_dma_cyclic(struct dma_chan *chan, dma_addr_t buf_addr, size_t buf_len,
 	/* First descriptor of the chain embedds additional information */
 	first->txd.cookie = -EBUSY;
 	first->total_len = buf_len;
-	first->tx_width = reg_width;
 
 	return &first->txd;
 
diff --git a/drivers/dma/at_hdmac_regs.h b/drivers/dma/at_hdmac_regs.h
index bc8d5eb..7f5a082 100644
--- a/drivers/dma/at_hdmac_regs.h
+++ b/drivers/dma/at_hdmac_regs.h
@@ -112,6 +112,7 @@ 
 #define		ATC_SRC_WIDTH_BYTE	(0x0 << 24)
 #define		ATC_SRC_WIDTH_HALFWORD	(0x1 << 24)
 #define		ATC_SRC_WIDTH_WORD	(0x2 << 24)
+#define		ATC_REG_TO_SRC_WIDTH(r)	(((r) >> 24) & 0x3)
 #define	ATC_DST_WIDTH_MASK	(0x3 << 28)	/* Destination Single Transfer Size */
 #define		ATC_DST_WIDTH(x)	((x) << 28)
 #define		ATC_DST_WIDTH_BYTE	(0x0 << 28)
@@ -182,7 +183,6 @@  struct at_lli {
  * @txd: support for the async_tx api
  * @desc_node: node on the channed descriptors list
  * @len: descriptor byte count
- * @tx_width: transfer width
  * @total_len: total transaction byte count
  */
 struct at_desc {
@@ -194,7 +194,6 @@  struct at_desc {
 	struct dma_async_tx_descriptor	txd;
 	struct list_head		desc_node;
 	size_t				len;
-	u32				tx_width;
 	size_t				total_len;
 
 	/* Interleaved data */