Message ID | 1438936917-7254-1-git-send-email-bigeasy@linutronix.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 08/07/2015 11:41 AM, Sebastian Andrzej Siewior wrote: > This DMA driver is used by 8250-omap on DRA7-evm. There is one > requirement that is to pause a transfer. This is currently used on the RX > side. It is possible that the UART HW aborted the RX (UART's RX-timeout) > but the DMA controller starts the transfer shortly after. > Before we can manually purge the FIFO we need to pause the transfer, > check how many bytes it already received and terminate the transfer > without it making any progress. > > From testing on the TX side it seems that it is possible that we invoke > pause once the transfer has completed which is indicated by the missing > CCR_ENABLE bit but before the interrupt has been noticed. In that case the > interrupt will come even after disabling it. > > The AM572x manual says that we have to wait for the CCR_RD_ACTIVE & > CCR_WR_ACTIVE bits to be gone before programming it again here is the > drain loop. Also it looks like without the drain the TX-transfer makes > sometimes progress. > > One note: The pause + resume combo is broken because after resume the > the complete transfer will be programmed again. That means the already > transferred bytes (until the pause event) will be sent again. This is > currently not important for my UART user because it does only pause + > terminate. with a short testing audio did not broke (the only user of pause/resume) Some comments embedded. > Cc: <stable@vger.kernel.org> Why stable? This is not fixing any bugs since the PAUSE was not allowed for non cyclic transfers. > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> > --- > drivers/dma/omap-dma.c | 54 ++++++++++++++++++++++++++++++++++++++------------ > 1 file changed, 41 insertions(+), 13 deletions(-) > > diff --git a/drivers/dma/omap-dma.c b/drivers/dma/omap-dma.c > index 249445c8a4c6..6b8497203caf 100644 > --- a/drivers/dma/omap-dma.c > +++ b/drivers/dma/omap-dma.c > @@ -299,7 +299,7 @@ static void omap_dma_start(struct omap_chan *c, struct omap_desc *d) > omap_dma_chan_write(c, CCR, d->ccr | CCR_ENABLE); > } > > -static void omap_dma_stop(struct omap_chan *c) > +static int omap_dma_stop(struct omap_chan *c) > { > struct omap_dmadev *od = to_omap_dma_dev(c->vc.chan.device); > uint32_t val; > @@ -342,8 +342,26 @@ static void omap_dma_stop(struct omap_chan *c) > > omap_dma_glbl_write(od, OCP_SYSCONFIG, sysconfig); > } else { > + int i = 0; > + > + if (!(val & CCR_ENABLE)) > + return -EINVAL; > + > val &= ~CCR_ENABLE; > omap_dma_chan_write(c, CCR, val); > + do { > + val = omap_dma_chan_read(c, CCR); > + if (!(val & (CCR_RD_ACTIVE | CCR_WR_ACTIVE))) > + break; > + if (i > 100) if (++i > 100) break; to avoid infinite loop? > + break; > + udelay(5); > + } while (1); > + > + if (val & (CCR_RD_ACTIVE | CCR_WR_ACTIVE)) if (i > 100) ? > + dev_err(c->vc.chan.device->dev, > + "DMA drain did not complete on lch %d\n", > + c->dma_ch); > } > > mb(); > @@ -358,6 +376,7 @@ static void omap_dma_stop(struct omap_chan *c) > > omap_dma_chan_write(c, CLNK_CTRL, val); > } > + return 0; > } > > static void omap_dma_start_sg(struct omap_chan *c, struct omap_desc *d, > @@ -728,6 +747,8 @@ static enum dma_status omap_dma_tx_status(struct dma_chan *chan, > } else { > txstate->residue = 0; > } > + if (ret == DMA_IN_PROGRESS && c->paused) > + ret = DMA_PAUSED; > spin_unlock_irqrestore(&c->vc.lock, flags); > > return ret; > @@ -1053,28 +1074,33 @@ static int omap_dma_terminate_all(struct dma_chan *chan) > static int omap_dma_pause(struct dma_chan *chan) > { > struct omap_chan *c = to_omap_dma_chan(chan); > + struct omap_dmadev *od = to_omap_dma_dev(chan->device); > + unsigned long flags; > + int ret = -EINVAL; > > - /* Pause/Resume only allowed with cyclic mode */ > - if (!c->cyclic) > - return -EINVAL; > + spin_lock_irqsave(&od->irq_lock, flags); > > - if (!c->paused) { > - omap_dma_stop(c); > - c->paused = true; > + if (!c->paused && c->desc) { > + ret = omap_dma_stop(c); > + if (!ret) > + c->paused = true; > } > > - return 0; > + spin_unlock_irqrestore(&od->irq_lock, flags); > + > + return ret; > } > > static int omap_dma_resume(struct dma_chan *chan) > { > struct omap_chan *c = to_omap_dma_chan(chan); > + struct omap_dmadev *od = to_omap_dma_dev(chan->device); > + unsigned long flags; > + int ret = -EINVAL; > > - /* Pause/Resume only allowed with cyclic mode */ > - if (!c->cyclic) > - return -EINVAL; > + spin_lock_irqsave(&od->irq_lock, flags); > > - if (c->paused) { > + if (c->paused && c->desc) { > mb(); > > /* Restore channel link register */ > @@ -1082,9 +1108,11 @@ static int omap_dma_resume(struct dma_chan *chan) > > omap_dma_start(c, c->desc); > c->paused = false; > + ret = 0; > } > + spin_unlock_irqrestore(&od->irq_lock, flags); > > - return 0; > + return ret; > } > > static int omap_dma_chan_init(struct omap_dmadev *od) >
On 08/07/2015 11:44 AM, Peter Ujfalusi wrote: > On 08/07/2015 11:41 AM, Sebastian Andrzej Siewior wrote: >> This DMA driver is used by 8250-omap on DRA7-evm. There is one >> requirement that is to pause a transfer. This is currently used on the RX >> side. It is possible that the UART HW aborted the RX (UART's RX-timeout) >> but the DMA controller starts the transfer shortly after. >> Before we can manually purge the FIFO we need to pause the transfer, >> check how many bytes it already received and terminate the transfer >> without it making any progress. >> >> From testing on the TX side it seems that it is possible that we invoke >> pause once the transfer has completed which is indicated by the missing >> CCR_ENABLE bit but before the interrupt has been noticed. In that case the >> interrupt will come even after disabling it. >> >> The AM572x manual says that we have to wait for the CCR_RD_ACTIVE & >> CCR_WR_ACTIVE bits to be gone before programming it again here is the >> drain loop. Also it looks like without the drain the TX-transfer makes >> sometimes progress. >> >> One note: The pause + resume combo is broken because after resume the >> the complete transfer will be programmed again. That means the already >> transferred bytes (until the pause event) will be sent again. This is >> currently not important for my UART user because it does only pause + >> terminate. > > with a short testing audio did not broke (the only user of pause/resume) > Some comments embedded. > >> Cc: <stable@vger.kernel.org> > > Why stable? This is not fixing any bugs since the PAUSE was not allowed for > non cyclic transfers. Hmmm. The DRA7x was using pause before for UART. I just did not see it coming that it was not allowed here. John made a similar change to the edma driver and I assumed it went stable but now I see that it was just cherry-picked into the ti tree. If you are not comfortable it being stable material I can drop it. >> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> >> --- >> drivers/dma/omap-dma.c | 54 ++++++++++++++++++++++++++++++++++++++------------ >> 1 file changed, 41 insertions(+), 13 deletions(-) >> >> diff --git a/drivers/dma/omap-dma.c b/drivers/dma/omap-dma.c >> index 249445c8a4c6..6b8497203caf 100644 >> --- a/drivers/dma/omap-dma.c >> +++ b/drivers/dma/omap-dma.c >> @@ -299,7 +299,7 @@ static void omap_dma_start(struct omap_chan *c, struct omap_desc *d) >> omap_dma_chan_write(c, CCR, d->ccr | CCR_ENABLE); >> } >> >> -static void omap_dma_stop(struct omap_chan *c) >> +static int omap_dma_stop(struct omap_chan *c) >> { >> struct omap_dmadev *od = to_omap_dma_dev(c->vc.chan.device); >> uint32_t val; >> @@ -342,8 +342,26 @@ static void omap_dma_stop(struct omap_chan *c) >> >> omap_dma_glbl_write(od, OCP_SYSCONFIG, sysconfig); >> } else { >> + int i = 0; >> + >> + if (!(val & CCR_ENABLE)) >> + return -EINVAL; >> + >> val &= ~CCR_ENABLE; >> omap_dma_chan_write(c, CCR, val); >> + do { >> + val = omap_dma_chan_read(c, CCR); >> + if (!(val & (CCR_RD_ACTIVE | CCR_WR_ACTIVE))) >> + break; >> + if (i > 100) > > if (++i > 100) > break; > to avoid infinite loop? Ah. So I forgot to increment the counter. A few lines above there is the same loop as a workaround for something. This is the same loop. I could merge the loop + warning if you prefer. to have those things in one place. I could also just increment i. Merging the two loops might be better. >> + break; >> + udelay(5); >> + } while (1); >> + >> + if (val & (CCR_RD_ACTIVE | CCR_WR_ACTIVE)) > > if (i > 100) ? While that would work, too I think it is more explicit to the reader if you check for the condition that is important to you. >> + dev_err(c->vc.chan.device->dev, >> + "DMA drain did not complete on lch %d\n", >> + c->dma_ch); >> } >> >> mb(); Sebastian -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Aug 07, 2015 at 10:41:57AM +0200, Sebastian Andrzej Siewior wrote: > This DMA driver is used by 8250-omap on DRA7-evm. There is one > requirement that is to pause a transfer. This is currently used on the RX > side. It is possible that the UART HW aborted the RX (UART's RX-timeout) > but the DMA controller starts the transfer shortly after. > Before we can manually purge the FIFO we need to pause the transfer, > check how many bytes it already received and terminate the transfer > without it making any progress. > > >From testing on the TX side it seems that it is possible that we invoke > pause once the transfer has completed which is indicated by the missing > CCR_ENABLE bit but before the interrupt has been noticed. In that case the > interrupt will come even after disabling it. How do you cope with the OMAP DMA hardware clearing its FIFO when you pause it? The problem is that on mem-to-device transfers, the DMA hardware can prefetch some data from memory into its FIFO before the device wants the data. If you then disable the channel, the hardware clears the FIFO. It is unspecified whether the hardware updates the source address in this case, or by how much. So it's pretty hard to undo the prefetching in software. The net result is: data loss. This is why I explicitly did not implement pause/resume for non-cyclic channels.
On 08/07/2015 01:36 PM, Sebastian Andrzej Siewior wrote: > On 08/07/2015 11:44 AM, Peter Ujfalusi wrote: >> On 08/07/2015 11:41 AM, Sebastian Andrzej Siewior wrote: >>> This DMA driver is used by 8250-omap on DRA7-evm. There is one >>> requirement that is to pause a transfer. This is currently used on the RX >>> side. It is possible that the UART HW aborted the RX (UART's RX-timeout) >>> but the DMA controller starts the transfer shortly after. >>> Before we can manually purge the FIFO we need to pause the transfer, >>> check how many bytes it already received and terminate the transfer >>> without it making any progress. >>> >>> From testing on the TX side it seems that it is possible that we invoke >>> pause once the transfer has completed which is indicated by the missing >>> CCR_ENABLE bit but before the interrupt has been noticed. In that case the >>> interrupt will come even after disabling it. >>> >>> The AM572x manual says that we have to wait for the CCR_RD_ACTIVE & >>> CCR_WR_ACTIVE bits to be gone before programming it again here is the >>> drain loop. Also it looks like without the drain the TX-transfer makes >>> sometimes progress. >>> >>> One note: The pause + resume combo is broken because after resume the >>> the complete transfer will be programmed again. That means the already >>> transferred bytes (until the pause event) will be sent again. This is >>> currently not important for my UART user because it does only pause + >>> terminate. >> >> with a short testing audio did not broke (the only user of pause/resume) >> Some comments embedded. >> >>> Cc: <stable@vger.kernel.org> >> >> Why stable? This is not fixing any bugs since the PAUSE was not allowed for >> non cyclic transfers. > > Hmmm. The DRA7x was using pause before for UART. I just did not see it > coming that it was not allowed here. John made a similar change to the > edma driver and I assumed it went stable but now I see that it was just > cherry-picked into the ti tree. > If you are not comfortable it being stable material I can drop it. This change is needed for the UART DMA support if I'm not mistaken and this mode is not really supported by older kernels, so having this to implement something which is not going to be used in the stable kernels feels somehow wrong. >>> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> >>> --- >>> drivers/dma/omap-dma.c | 54 ++++++++++++++++++++++++++++++++++++++------------ >>> 1 file changed, 41 insertions(+), 13 deletions(-) >>> >>> diff --git a/drivers/dma/omap-dma.c b/drivers/dma/omap-dma.c >>> index 249445c8a4c6..6b8497203caf 100644 >>> --- a/drivers/dma/omap-dma.c >>> +++ b/drivers/dma/omap-dma.c >>> @@ -299,7 +299,7 @@ static void omap_dma_start(struct omap_chan *c, struct omap_desc *d) >>> omap_dma_chan_write(c, CCR, d->ccr | CCR_ENABLE); >>> } >>> >>> -static void omap_dma_stop(struct omap_chan *c) >>> +static int omap_dma_stop(struct omap_chan *c) >>> { >>> struct omap_dmadev *od = to_omap_dma_dev(c->vc.chan.device); >>> uint32_t val; >>> @@ -342,8 +342,26 @@ static void omap_dma_stop(struct omap_chan *c) >>> >>> omap_dma_glbl_write(od, OCP_SYSCONFIG, sysconfig); >>> } else { >>> + int i = 0; >>> + >>> + if (!(val & CCR_ENABLE)) >>> + return -EINVAL; >>> + >>> val &= ~CCR_ENABLE; >>> omap_dma_chan_write(c, CCR, val); >>> + do { >>> + val = omap_dma_chan_read(c, CCR); >>> + if (!(val & (CCR_RD_ACTIVE | CCR_WR_ACTIVE))) >>> + break; >>> + if (i > 100) >> >> if (++i > 100) >> break; >> to avoid infinite loop? > > Ah. So I forgot to increment the counter. A few lines above there is > the same loop as a workaround for something. This is the same loop. I > could merge the loop + warning if you prefer. to have those things in > one place. I could also just increment i. Merging the two loops might > be better. The other loop is for handling the ERRATA i541 and the two loops can not be merged since the errata handling also require to change in SYSCONFIG register. > >>> + break; >>> + udelay(5); >>> + } while (1); >>> + >>> + if (val & (CCR_RD_ACTIVE | CCR_WR_ACTIVE)) >> >> if (i > 100) ? > > While that would work, too I think it is more explicit to the reader if > you check for the condition that is important to you. Yeah, I see that the errata handling is doing the same, fine by me. >>> + dev_err(c->vc.chan.device->dev, >>> + "DMA drain did not complete on lch %d\n", >>> + c->dma_ch); >>> } >>> >>> mb(); > > Sebastian >
On 08/07/2015 12:55 PM, Russell King - ARM Linux wrote: > On Fri, Aug 07, 2015 at 10:41:57AM +0200, Sebastian Andrzej Siewior wrote: >> This DMA driver is used by 8250-omap on DRA7-evm. There is one >> requirement that is to pause a transfer. This is currently used on the RX >> side. It is possible that the UART HW aborted the RX (UART's RX-timeout) >> but the DMA controller starts the transfer shortly after. >> Before we can manually purge the FIFO we need to pause the transfer, >> check how many bytes it already received and terminate the transfer >> without it making any progress. >> >> >From testing on the TX side it seems that it is possible that we invoke >> pause once the transfer has completed which is indicated by the missing >> CCR_ENABLE bit but before the interrupt has been noticed. In that case the >> interrupt will come even after disabling it. > > How do you cope with the OMAP DMA hardware clearing its FIFO when you > pause it? I don't > The problem is that on mem-to-device transfers, the DMA hardware can > prefetch some data from memory into its FIFO before the device wants > the data. If you then disable the channel, the hardware clears the > FIFO. > > It is unspecified whether the hardware updates the source address in > this case, or by how much. So it's pretty hard to undo the prefetching > in software. > > The net result is: data loss. > > This is why I explicitly did not implement pause/resume for non-cyclic > channels. Right now the 820-omap (8250-dma in general, too but they don't use this driver) pause only the RX transfer in an error condition. This means it is only device-to-mem transfer. I only mentioned the TX transfer here since this was easier to test. When I first tested the 8250_omap + DMA on EDMA it seemed that once the UART hardware triggered an time-out interrupt the DMA transfer _never_ started. Based on this I could just cancel the transfer since the RX-DMA and assume zero bytes were transfer. Therefore I ignored the lack of pause support in EDMA. Things were fine until someone used 3mbaud instead 115200. Its been observed that at this speed the UART triggers the time-out interrupt and the DMA transfer just started. Without the support for pause we lost some bytes here and once pause support was added the problem was gone. Now I've been chasing another bug on Dra7 which uses this driver and the lack of pause support is a candidate for my current problem. So at this point, things can't get worse if we pause a transfer and the hardware reported the wrong number of received bytes. The situation can improve if we get the correct number :) The 8250_omap does not pause the TX/RX transfer for the fun of it. As I said, on the TX side we avoid it and on the RX side the transfer is only paused if we receive an error interrupt (= no way out). Sebastian -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 08/07/2015 01:44 PM, Peter Ujfalusi wrote: >>>> Cc: <stable@vger.kernel.org> >>> >>> Why stable? This is not fixing any bugs since the PAUSE was not allowed for >>> non cyclic transfers. >> >> Hmmm. The DRA7x was using pause before for UART. I just did not see it >> coming that it was not allowed here. John made a similar change to the >> edma driver and I assumed it went stable but now I see that it was just >> cherry-picked into the ti tree. >> If you are not comfortable it being stable material I can drop it. > > This change is needed for the UART DMA support if I'm not mistaken and this > mode is not really supported by older kernels, so having this to implement > something which is not going to be used in the stable kernels feels somehow wrong. We have the DT pieces since v3.19-rc1. And if I remember correctly I tested this on am335x-evm and dra7-evm by I the time I posted the patches. I agree that dra7 support was not the best back then but I am almost sure that I had vanilla running for testing. But I don't insist on the stable tag. Consider it dropped. >>>> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> >>>> --- >>>> drivers/dma/omap-dma.c | 54 ++++++++++++++++++++++++++++++++++++++------------ >>>> 1 file changed, 41 insertions(+), 13 deletions(-) >>>> >>>> diff --git a/drivers/dma/omap-dma.c b/drivers/dma/omap-dma.c >>>> index 249445c8a4c6..6b8497203caf 100644 >>>> --- a/drivers/dma/omap-dma.c >>>> +++ b/drivers/dma/omap-dma.c >>>> @@ -299,7 +299,7 @@ static void omap_dma_start(struct omap_chan *c, struct omap_desc *d) >>>> omap_dma_chan_write(c, CCR, d->ccr | CCR_ENABLE); >>>> } >>>> >>>> -static void omap_dma_stop(struct omap_chan *c) >>>> +static int omap_dma_stop(struct omap_chan *c) >>>> { >>>> struct omap_dmadev *od = to_omap_dma_dev(c->vc.chan.device); >>>> uint32_t val; >>>> @@ -342,8 +342,26 @@ static void omap_dma_stop(struct omap_chan *c) >>>> >>>> omap_dma_glbl_write(od, OCP_SYSCONFIG, sysconfig); >>>> } else { >>>> + int i = 0; >>>> + >>>> + if (!(val & CCR_ENABLE)) >>>> + return -EINVAL; >>>> + >>>> val &= ~CCR_ENABLE; >>>> omap_dma_chan_write(c, CCR, val); >>>> + do { >>>> + val = omap_dma_chan_read(c, CCR); >>>> + if (!(val & (CCR_RD_ACTIVE | CCR_WR_ACTIVE))) >>>> + break; >>>> + if (i > 100) >>> >>> if (++i > 100) >>> break; >>> to avoid infinite loop? >> >> Ah. So I forgot to increment the counter. A few lines above there is >> the same loop as a workaround for something. This is the same loop. I >> could merge the loop + warning if you prefer. to have those things in >> one place. I could also just increment i. Merging the two loops might >> be better. > > The other loop is for handling the ERRATA i541 and the two loops can not be > merged since the errata handling also require to change in SYSCONFIG register. yes, but I had in mind is to put the loop into one function so we gain: +static void omap_dma_drain_chan(struct omap_chan *c) +{ + int i; + uint32_t val; + + /* Wait for sDMA FIFO to drain */ + for (i = 0; ; i++) { + val = omap_dma_chan_read(c, CCR); + if (!(val & (CCR_RD_ACTIVE | CCR_WR_ACTIVE))) + break; + + if (i > 100) + break; + + udelay(5); + } + + if (val & (CCR_RD_ACTIVE | CCR_WR_ACTIVE)) + dev_err(c->vc.chan.device->dev, + "DMA drain did not complete on lch %d\n", + c->dma_ch); +} which is invoked by both parts of the if case (handling the errata not not) instead of having the same loop twice. >>>> + break; >>>> + udelay(5); >>>> + } while (1); >>>> + >>>> + if (val & (CCR_RD_ACTIVE | CCR_WR_ACTIVE)) >>> >>> if (i > 100) ? >> >> While that would work, too I think it is more explicit to the reader if >> you check for the condition that is important to you. > > Yeah, I see that the errata handling is doing the same, fine by me. good. Sebastian -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Aug 07, 2015 at 02:35:45PM +0200, Sebastian Andrzej Siewior wrote: > On 08/07/2015 12:55 PM, Russell King - ARM Linux wrote: > > On Fri, Aug 07, 2015 at 10:41:57AM +0200, Sebastian Andrzej Siewior wrote: > >> This DMA driver is used by 8250-omap on DRA7-evm. There is one > >> requirement that is to pause a transfer. This is currently used on the RX > >> side. It is possible that the UART HW aborted the RX (UART's RX-timeout) > >> but the DMA controller starts the transfer shortly after. > >> Before we can manually purge the FIFO we need to pause the transfer, > >> check how many bytes it already received and terminate the transfer > >> without it making any progress. > >> > >> >From testing on the TX side it seems that it is possible that we invoke > >> pause once the transfer has completed which is indicated by the missing > >> CCR_ENABLE bit but before the interrupt has been noticed. In that case the > >> interrupt will come even after disabling it. > > > > How do you cope with the OMAP DMA hardware clearing its FIFO when you > > pause it? > > I don't ... and so you introduce a silent data loss bug into the driver. That's not very clever. > Right now the 820-omap (8250-dma in general, too but they don't use > this driver) pause only the RX transfer in an error condition. This > means it is only device-to-mem transfer. I only mentioned the TX > transfer here since this was easier to test. That may be how 8250 works, but 8250 is not everything. You can't ignore this problem. You have to deal with it - either by not allowing a channel that would loose data to be paused, or by recovering from that condition. You're not doing either in your patch. Therefore, I have no other option but to NAK your change. Sorry. Please fix this.
On Fri, Aug 07, 2015 at 12:36:14PM +0200, Sebastian Andrzej Siewior wrote: > On 08/07/2015 11:44 AM, Peter Ujfalusi wrote: > > with a short testing audio did not broke (the only user of pause/resume) > > Some comments embedded. > > > >> Cc: <stable@vger.kernel.org> > > > > Why stable? This is not fixing any bugs since the PAUSE was not allowed for > > non cyclic transfers. > > Hmmm. The DRA7x was using pause before for UART. I just did not see it > coming that it was not allowed here. John made a similar change to the > edma driver and I assumed it went stable but now I see that it was just > cherry-picked into the ti tree. This is *NOT* stable material. Pause of these channels is something that omap-dma has *never* supported. Therefore, it is *not* a regression. What you are doing is *adding* a feature to the omap-dma driver. That is not stable material in any sense. Stable is for bug fixes to existing code, not feature enhancements. If something else has been converted to pause channels and that is causing a problem, then _that_ conversion is where the bug lies, not the lack of support in the omap-dma. If it's a result of using some new driver with omap-dma, then the problem is with whatever introduced that new combination - it's not that omap-dma is buggy. Don't fix bugs in -stable by adding features. That's _no_ way to fix bugs. NAK on this feature patch having any kind of stable tag.
On 08/07/2015 03:17 PM, Russell King - ARM Linux wrote: > On Fri, Aug 07, 2015 at 02:35:45PM +0200, Sebastian Andrzej Siewior wrote: >> On 08/07/2015 12:55 PM, Russell King - ARM Linux wrote: >>> On Fri, Aug 07, 2015 at 10:41:57AM +0200, Sebastian Andrzej Siewior wrote: >>>> This DMA driver is used by 8250-omap on DRA7-evm. There is one >>>> requirement that is to pause a transfer. This is currently used on the RX >>>> side. It is possible that the UART HW aborted the RX (UART's RX-timeout) >>>> but the DMA controller starts the transfer shortly after. >>>> Before we can manually purge the FIFO we need to pause the transfer, >>>> check how many bytes it already received and terminate the transfer >>>> without it making any progress. >>>> >>>> >From testing on the TX side it seems that it is possible that we invoke >>>> pause once the transfer has completed which is indicated by the missing >>>> CCR_ENABLE bit but before the interrupt has been noticed. In that case the >>>> interrupt will come even after disabling it. >>> >>> How do you cope with the OMAP DMA hardware clearing its FIFO when you >>> pause it? >> >> I don't > > ... and so you introduce a silent data loss bug into the driver. That's > not very clever. > >> Right now the 820-omap (8250-dma in general, too but they don't use >> this driver) pause only the RX transfer in an error condition. This >> means it is only device-to-mem transfer. I only mentioned the TX >> transfer here since this was easier to test. > > That may be how 8250 works, but 8250 is not everything. You can't ignore > this problem. You have to deal with it - either by not allowing a channel > that would loose data to be paused, or by recovering from that condition. > You're not doing either in your patch. > > Therefore, I have no other option but to NAK your change. Sorry. > > Please fix this. Would it be okay if I only allow pause for RX-transfers? For TX-transfers, I would need to update the start-address so the transfers begins where it stopped. However based on your concern I can't really assume that the position reported by the HW is the correct one. Sebastian -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Aug 07, 2015 at 03:22:56PM +0200, Sebastian Andrzej Siewior wrote: > On 08/07/2015 03:17 PM, Russell King - ARM Linux wrote: > > On Fri, Aug 07, 2015 at 02:35:45PM +0200, Sebastian Andrzej Siewior wrote: > >> On 08/07/2015 12:55 PM, Russell King - ARM Linux wrote: > >>> On Fri, Aug 07, 2015 at 10:41:57AM +0200, Sebastian Andrzej Siewior wrote: > >>>> This DMA driver is used by 8250-omap on DRA7-evm. There is one > >>>> requirement that is to pause a transfer. This is currently used on the RX > >>>> side. It is possible that the UART HW aborted the RX (UART's RX-timeout) > >>>> but the DMA controller starts the transfer shortly after. > >>>> Before we can manually purge the FIFO we need to pause the transfer, > >>>> check how many bytes it already received and terminate the transfer > >>>> without it making any progress. > >>>> > >>>> >From testing on the TX side it seems that it is possible that we invoke > >>>> pause once the transfer has completed which is indicated by the missing > >>>> CCR_ENABLE bit but before the interrupt has been noticed. In that case the > >>>> interrupt will come even after disabling it. > >>> > >>> How do you cope with the OMAP DMA hardware clearing its FIFO when you > >>> pause it? > >> > >> I don't > > > > ... and so you introduce a silent data loss bug into the driver. That's > > not very clever. > > > >> Right now the 820-omap (8250-dma in general, too but they don't use > >> this driver) pause only the RX transfer in an error condition. This > >> means it is only device-to-mem transfer. I only mentioned the TX > >> transfer here since this was easier to test. > > > > That may be how 8250 works, but 8250 is not everything. You can't ignore > > this problem. You have to deal with it - either by not allowing a channel > > that would loose data to be paused, or by recovering from that condition. > > You're not doing either in your patch. > > > > Therefore, I have no other option but to NAK your change. Sorry. > > > > Please fix this. > > Would it be okay if I only allow pause for RX-transfers? Yes, that satisfies one of my suggestions above. > For TX-transfers, I would need to update the start-address so the > transfers begins where it stopped. However based on your concern I > can't really assume that the position reported by the HW is the correct > one. Exactly - I don't believe that existing OMAP DMA hardware can ever support pausing a mem-to-device transfer without data loss as you have no idea how many bytes have been prefetched by the DMA, and therefore you have no idea how many bytes to unwind the hardware position. It gets worse than that when you have to cross into the previous descriptor. It's really not nice. So, disallowing pause for mem-to-device is entirely reasonable given the data loss implications.
On 08/07/2015 03:22 PM, Russell King - ARM Linux wrote: > On Fri, Aug 07, 2015 at 12:36:14PM +0200, Sebastian Andrzej Siewior wrote: >> On 08/07/2015 11:44 AM, Peter Ujfalusi wrote: >>> with a short testing audio did not broke (the only user of pause/resume) >>> Some comments embedded. >>> >>>> Cc: <stable@vger.kernel.org> >>> >>> Why stable? This is not fixing any bugs since the PAUSE was not allowed for >>> non cyclic transfers. >> >> Hmmm. The DRA7x was using pause before for UART. I just did not see it >> coming that it was not allowed here. John made a similar change to the >> edma driver and I assumed it went stable but now I see that it was just >> cherry-picked into the ti tree. > > This is *NOT* stable material. > > Pause of these channels is something that omap-dma has *never* supported. > Therefore, it is *not* a regression. What you are doing is *adding* a > feature to the omap-dma driver. That is not stable material in any sense. > Stable is for bug fixes to existing code, not feature enhancements. I didn't consider this as a feature. > If something else has been converted to pause channels and that is causing > a problem, then _that_ conversion is where the bug lies, not the lack of > support in the omap-dma. So we had the 8250-DMA doing pause and all its current users implement it. We have a DMA driver tree which is not used and it not implementing pause (not implementing pause at all). Later we get a combo of 8250-DMA + DMA driver that is broken because the lack of pause and this is noticed a few kernel releases later. The only way of fixing the bug is by implementing the pause feature. Now you are saying that even if I implement this missing feature in a newer kernel I am not allowed to mark it stable despite the fact that it fixes an existing problem in older kernels because it is not a regression. > If it's a result of using some new driver with omap-dma, then the problem > is with whatever introduced that new combination - it's not that omap-dma > is buggy. > > Don't fix bugs in -stable by adding features. That's _no_ way to fix bugs. > > NAK on this feature patch having any kind of stable tag. I already accepted this. Sebastian -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Aug 07, 2015 at 03:42:06PM +0200, Sebastian Andrzej Siewior wrote: > On 08/07/2015 03:22 PM, Russell King - ARM Linux wrote: > > On Fri, Aug 07, 2015 at 12:36:14PM +0200, Sebastian Andrzej Siewior wrote: > >> On 08/07/2015 11:44 AM, Peter Ujfalusi wrote: > >>> with a short testing audio did not broke (the only user of pause/resume) > >>> Some comments embedded. > >>> > >>>> Cc: <stable@vger.kernel.org> > >>> > >>> Why stable? This is not fixing any bugs since the PAUSE was not allowed for > >>> non cyclic transfers. > >> > >> Hmmm. The DRA7x was using pause before for UART. I just did not see it > >> coming that it was not allowed here. John made a similar change to the > >> edma driver and I assumed it went stable but now I see that it was just > >> cherry-picked into the ti tree. > > > > This is *NOT* stable material. > > > > Pause of these channels is something that omap-dma has *never* supported. > > Therefore, it is *not* a regression. What you are doing is *adding* a > > feature to the omap-dma driver. That is not stable material in any sense. > > Stable is for bug fixes to existing code, not feature enhancements. > > I didn't consider this as a feature. As it is something that the driver has _not_ supported, you are clearly adding a feature to an existing driver. It's not a bug fix. > > If something else has been converted to pause channels and that is causing > > a problem, then _that_ conversion is where the bug lies, not the lack of > > support in the omap-dma. > > So we had the 8250-DMA doing pause and all its current users implement > it. We have a DMA driver tree which is not used and it not implementing > pause (not implementing pause at all). Later we get a combo of 8250-DMA > + DMA driver that is broken because the lack of pause and this is > noticed a few kernel releases later. Right, so the patch which caused the regression is the one which arranged for the 8250-dma + omap-dma combination to work together, not the missing pause support in omap-dma. It doesn't matter that it's several releases old, it's that change caused the regression you have today. That change is incorrect today, and it was just as incorrect at the time that it was merged. > The only way of fixing the bug is by implementing the pause feature. That's not the only way of fixing the bug. As the binding of drivers is controlled by DT, you can disable the binding of these two drivers there and 8250 will go back to using non-DMA mode - which is the situation which existed prior to the change which coupled the two drivers together. That's an acceptable change for -stable trees, because it's reverting the change which caused the regression, taking us back to a situation we _know_ worked previously. Then, in mainline during the next merge window, we can introduce the pause feature to omap-dma, and re-enable the 8250 driver to use it. _Once_ that's proven stable, we can then take a view whether those changes should _then_ be backported to stable kernels with greater confidence that backporting the feature addition won't itself cause a new regression.
On 08/07/2015 09:25 AM, Russell King - ARM Linux wrote: > On Fri, Aug 07, 2015 at 03:22:56PM +0200, Sebastian Andrzej Siewior wrote: >> For TX-transfers, I would need to update the start-address so the >> transfers begins where it stopped. However based on your concern I >> can't really assume that the position reported by the HW is the correct >> one. > > Exactly - I don't believe that existing OMAP DMA hardware can ever support > pausing a mem-to-device transfer without data loss as you have no idea > how many bytes have been prefetched by the DMA, and therefore you have > no idea how many bytes to unwind the hardware position. It gets worse > than that when you have to cross into the previous descriptor. It's > really not nice. > > So, disallowing pause for mem-to-device is entirely reasonable given the > data loss implications. Thanks for adding your hard-won knowledge to this discussion, Russell. This saves us a bunch of wasted effort trying to fix x_char with DMA (and TCSANOW termios changes and throttling). Regards, Peter Hurley -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ + Greg KH ] On 08/07/2015 09:57 AM, Russell King - ARM Linux wrote: > On Fri, Aug 07, 2015 at 03:42:06PM +0200, Sebastian Andrzej Siewior wrote: >> On 08/07/2015 03:22 PM, Russell King - ARM Linux wrote: >>> On Fri, Aug 07, 2015 at 12:36:14PM +0200, Sebastian Andrzej Siewior wrote: >>>> On 08/07/2015 11:44 AM, Peter Ujfalusi wrote: >>>>> with a short testing audio did not broke (the only user of pause/resume) >>>>> Some comments embedded. >>>>> >>>>>> Cc: <stable@vger.kernel.org> >>>>> >>>>> Why stable? This is not fixing any bugs since the PAUSE was not allowed for >>>>> non cyclic transfers. >>>> >>>> Hmmm. The DRA7x was using pause before for UART. I just did not see it >>>> coming that it was not allowed here. John made a similar change to the >>>> edma driver and I assumed it went stable but now I see that it was just >>>> cherry-picked into the ti tree. >>> >>> This is *NOT* stable material. >>> >>> Pause of these channels is something that omap-dma has *never* supported. >>> Therefore, it is *not* a regression. What you are doing is *adding* a >>> feature to the omap-dma driver. That is not stable material in any sense. >>> Stable is for bug fixes to existing code, not feature enhancements. >> >> I didn't consider this as a feature. > > As it is something that the driver has _not_ supported, you are clearly > adding a feature to an existing driver. It's not a bug fix. > >>> If something else has been converted to pause channels and that is causing >>> a problem, then _that_ conversion is where the bug lies, not the lack of >>> support in the omap-dma. FWIW, the actual bug is the api that silently does nothing. >> So we had the 8250-DMA doing pause and all its current users implement >> it. We have a DMA driver tree which is not used and it not implementing >> pause (not implementing pause at all). Later we get a combo of 8250-DMA >> + DMA driver that is broken because the lack of pause and this is >> noticed a few kernel releases later. > > Right, so the patch which caused the regression is the one which arranged > for the 8250-dma + omap-dma combination to work together, not the missing > pause support in omap-dma. That would be the original submission patch set for an entire driver, the 8250_omap driver. > It doesn't matter that it's several releases old, it's that change caused > the regression you have today. That change is incorrect today, and it was > just as incorrect at the time that it was merged. > >> The only way of fixing the bug is by implementing the pause feature. > > That's not the only way of fixing the bug. > > As the binding of drivers is controlled by DT, you can disable the binding > of these two drivers No. 8250 dma is not a stand-alone driver. Even if it were, how would you go back and fix DTs in the wild? The "binding" is built-in with a CONFIG_ switch. > there and 8250 will go back to using non-DMA mode - > which is the situation which existed prior to the change which coupled the > two drivers together. That's an acceptable change for -stable trees, > because it's reverting the change which caused the regression, taking us > back to a situation we _know_ worked previously. What you're suggesting here is only possible if the entire 8250_omap driver is removed, so that's a non-starter. I suggest to wait on any solution until the correct fix is mainlined and backported, as you note below. Regards, Peter Hurley > Then, in mainline during the next merge window, we can introduce the pause > feature to omap-dma, and re-enable the 8250 driver to use it. _Once_ that's > proven stable, we can then take a view whether those changes should _then_ > be backported to stable kernels with greater confidence that backporting > the feature addition won't itself cause a new regression. -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Aug 07, 2015 at 11:08:48AM -0400, Peter Hurley wrote: > [ + Greg KH ] > > On 08/07/2015 09:57 AM, Russell King - ARM Linux wrote: > > As it is something that the driver has _not_ supported, you are clearly > > adding a feature to an existing driver. It's not a bug fix. > > > >>> If something else has been converted to pause channels and that is causing > >>> a problem, then _that_ conversion is where the bug lies, not the lack of > >>> support in the omap-dma. > > FWIW, the actual bug is the api that silently does nothing. Incorrect. static int omap_dma_pause(struct dma_chan *chan) { struct omap_chan *c = to_omap_dma_chan(chan); /* Pause/Resume only allowed with cyclic mode */ if (!c->cyclic) return -EINVAL; Asking for the channel to be paused will return an error code indicating that the request failed. That will be propagated back through to the return code of dmaengine_pause(). If we look at what 8250-dma.c is doing: if (dma->rx_running) { dmaengine_pause(dma->rxchan); It's 8250-dma.c which is silently _ignoring_ the return code, failing to check that the operation it requested worked. Maybe this should be WARN_ON(dmaengine_pause(dma->rxchan)) or at least it should print a message? > > Right, so the patch which caused the regression is the one which arranged > > for the 8250-dma + omap-dma combination to work together, not the missing > > pause support in omap-dma. > > That would be the original submission patch set for an entire driver, > the 8250_omap driver. Well, that's where the bug lies, and I don't agree with your assessment that it would mean reverting the whole thing. The binding between the two drivers is controlled via DT. DT tells it which DMA controller and which DMA input to use. So, as DMA is currently broken on this, the solution is to break that link so that 8250-omap goes back to using PIO only. > > As the binding of drivers is controlled by DT, you can disable the binding > > of these two drivers > > No. 8250 dma is not a stand-alone driver. Even if it were, how would you go > back and fix DTs in the wild? Well, we have reached an impass then. I'm not going to accept a feature addition to a driver as a stable patch without it being adequately tested over _several_ kernel revisions to ensure that we don't end up cocking up some driver which uses the DMA interfaces correctly. It's too big a risk. So, I guess that means that older kernels will just have to remain broken - all because the basic testing of the original code was never undertaken to ensure that basic stuff like reception of characters worked properly. Sorry, I have little sympathy here.
On 08/07/2015 05:29 PM, Russell King - ARM Linux wrote: > On Fri, Aug 07, 2015 at 11:08:48AM -0400, Peter Hurley wrote: >> [ + Greg KH ] >> >> On 08/07/2015 09:57 AM, Russell King - ARM Linux wrote: >>> As it is something that the driver has _not_ supported, you are clearly >>> adding a feature to an existing driver. It's not a bug fix. >>> >>>>> If something else has been converted to pause channels and that is causing >>>>> a problem, then _that_ conversion is where the bug lies, not the lack of >>>>> support in the omap-dma. >> >> FWIW, the actual bug is the api that silently does nothing. > > Incorrect. > > static int omap_dma_pause(struct dma_chan *chan) > { > struct omap_chan *c = to_omap_dma_chan(chan); > > /* Pause/Resume only allowed with cyclic mode */ > if (!c->cyclic) > return -EINVAL; > > Asking for the channel to be paused will return an error code indicating > that the request failed. That will be propagated back through to the > return code of dmaengine_pause(). > > If we look at what 8250-dma.c is doing: > > if (dma->rx_running) { > dmaengine_pause(dma->rxchan); > > It's 8250-dma.c which is silently _ignoring_ the return code, failing > to check that the operation it requested worked. Maybe this should be > WARN_ON(dmaengine_pause(dma->rxchan)) or at least it should print a > message? I think this is what Peter meant by saying "silently does nothing". > So, I guess that means that older kernels will just have to remain broken - > all because the basic testing of the original code was never undertaken > to ensure that basic stuff like reception of characters worked properly. Hehe. I wouldn't describe testing at 3mbaud as basic. This reads as I didn't do any kind of testing at all prior submitting the driver. This was not the case. Sebastian -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 08/07/2015 11:29 AM, Russell King - ARM Linux wrote: > On Fri, Aug 07, 2015 at 11:08:48AM -0400, Peter Hurley wrote: >> [ + Greg KH ] >> >> On 08/07/2015 09:57 AM, Russell King - ARM Linux wrote: >>> As it is something that the driver has _not_ supported, you are clearly >>> adding a feature to an existing driver. It's not a bug fix. >>> >>>>> If something else has been converted to pause channels and that is causing >>>>> a problem, then _that_ conversion is where the bug lies, not the lack of >>>>> support in the omap-dma. >> >> FWIW, the actual bug is the api that silently does nothing. > > Incorrect. > > static int omap_dma_pause(struct dma_chan *chan) > { > struct omap_chan *c = to_omap_dma_chan(chan); > > /* Pause/Resume only allowed with cyclic mode */ > if (!c->cyclic) > return -EINVAL; > > Asking for the channel to be paused will return an error code indicating > that the request failed. That will be propagated back through to the > return code of dmaengine_pause(). > > If we look at what 8250-dma.c is doing: > > if (dma->rx_running) { > dmaengine_pause(dma->rxchan); > > It's 8250-dma.c which is silently _ignoring_ the return code, failing > to check that the operation it requested worked. Maybe this should be > WARN_ON(dmaengine_pause(dma->rxchan)) or at least it should print a > message? Thanks for the suggestion; I'll hold on to that and push it after we add the 8250 omap dma pause in mainline. >>> Right, so the patch which caused the regression is the one which arranged >>> for the 8250-dma + omap-dma combination to work together, not the missing >>> pause support in omap-dma. >> >> That would be the original submission patch set for an entire driver, >> the 8250_omap driver. > > Well, that's where the bug lies, and I don't agree with your assessment > that it would mean reverting the whole thing. > > The binding between the two drivers is controlled via DT. DT tells it > which DMA controller and which DMA input to use. So, as DMA is currently > broken on this, the solution is to break that link so that 8250-omap goes > back to using PIO only. > >>> As the binding of drivers is controlled by DT, you can disable the binding >>> of these two drivers >> >> No. 8250 dma is not a stand-alone driver. Even if it were, how would you go >> back and fix DTs in the wild? > > Well, we have reached an impass then. > > I'm not going to accept a feature addition to a driver as a stable patch > without it being adequately tested over _several_ kernel revisions to > ensure that we don't end up cocking up some driver which uses the DMA > interfaces correctly. It's too big a risk. > > So, I guess that means that older kernels will just have to remain broken - > all because the basic testing of the original code was never undertaken > to ensure that basic stuff like reception of characters worked properly. Well, instead of me saying something snide about the lack of upstream serial driver unit tests, I'll say I've been working on cleaning up and organizing my own tty/serial subsystem and driver units tests which I hope to upstream in the fall. Those include i/o validators that ran this driver for days at time without error at max line rate. Unfortunately, that hardware does not exhibit the same problem as the DRA7 noted in the changelog. > Sorry, I have little sympathy here. -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 08/07/2015 06:07 PM, Peter Hurley wrote: >> If we look at what 8250-dma.c is doing: >> >> if (dma->rx_running) { >> dmaengine_pause(dma->rxchan); >> >> It's 8250-dma.c which is silently _ignoring_ the return code, failing >> to check that the operation it requested worked. Maybe this should be >> WARN_ON(dmaengine_pause(dma->rxchan)) or at least it should print a >> message? > > Thanks for the suggestion; I'll hold on to that and push it after we add > the 8250 omap dma pause in mainline. I have a patch ready with WARN_ON_ONCE() for 8250-omap and 8250-dma. This warning would trigger on am335x/edma until v4.2-rc1 and omap-dma based version is open. I could post it if you want me to. Besides that those two, there are four other drivers ignoring the return code dmaengine_pause(). Sebastian -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Aug 07, 2015 at 12:07:11PM -0400, Peter Hurley wrote: > On 08/07/2015 11:29 AM, Russell King - ARM Linux wrote: > > On Fri, Aug 07, 2015 at 11:08:48AM -0400, Peter Hurley wrote: > >> [ + Greg KH ] > >> > >> On 08/07/2015 09:57 AM, Russell King - ARM Linux wrote: > >>> As it is something that the driver has _not_ supported, you are clearly > >>> adding a feature to an existing driver. It's not a bug fix. > >>> > >>>>> If something else has been converted to pause channels and that is causing > >>>>> a problem, then _that_ conversion is where the bug lies, not the lack of > >>>>> support in the omap-dma. > >> > >> FWIW, the actual bug is the api that silently does nothing. > > > > Incorrect. > > > > static int omap_dma_pause(struct dma_chan *chan) > > { > > struct omap_chan *c = to_omap_dma_chan(chan); > > > > /* Pause/Resume only allowed with cyclic mode */ > > if (!c->cyclic) > > return -EINVAL; > > > > Asking for the channel to be paused will return an error code indicating > > that the request failed. That will be propagated back through to the > > return code of dmaengine_pause(). > > > > If we look at what 8250-dma.c is doing: > > > > if (dma->rx_running) { > > dmaengine_pause(dma->rxchan); > > > > It's 8250-dma.c which is silently _ignoring_ the return code, failing > > to check that the operation it requested worked. Maybe this should be > > WARN_ON(dmaengine_pause(dma->rxchan)) or at least it should print a > > message? > > Thanks for the suggestion; I'll hold on to that and push it after we add > the 8250 omap dma pause in mainline. Why wait? You're hiding a data loss bug which is clearly the result of code you allegedly maintain. > Well, instead of me saying something snide about the lack of upstream serial > driver unit tests, I'll say I've been working on cleaning up and organizing > my own tty/serial subsystem and driver units tests which I hope to upstream > in the fall. > > Those include i/o validators that ran this driver for days at time without > error at max line rate. Unfortunately, that hardware does not exhibit the > same problem as the DRA7 noted in the changelog. What you have is a race condition in the code you a responsible for maintaining, caused by poorly implemented code. Fix it, rather than whinging about drivers outside of your subsystem having never implemented _optional_ things that you choose to merge broken code which relied upon it _without_ checking that the operation succeeded. It is _entirely_ your code which is wrong here. I will wait for that to be fixed before acking the omap-dma change since you obviously need something to test with.
On Fri, Aug 07, 2015 at 06:20:44PM +0200, Sebastian Andrzej Siewior wrote: > On 08/07/2015 06:07 PM, Peter Hurley wrote: > >> If we look at what 8250-dma.c is doing: > >> > >> if (dma->rx_running) { > >> dmaengine_pause(dma->rxchan); > >> > >> It's 8250-dma.c which is silently _ignoring_ the return code, failing > >> to check that the operation it requested worked. Maybe this should be > >> WARN_ON(dmaengine_pause(dma->rxchan)) or at least it should print a > >> message? > > > > Thanks for the suggestion; I'll hold on to that and push it after we add > > the 8250 omap dma pause in mainline. > > I have a patch ready with WARN_ON_ONCE() for 8250-omap and 8250-dma. > This warning would trigger on am335x/edma until v4.2-rc1 and omap-dma > based version is open. I could post it if you want me to. > Besides that those two, there are four other drivers ignoring the > return code dmaengine_pause(). It'll probably be a good idea if those are fixed too, and also have a __must_check annotation on the declaration of dmaengine_pause() to prevent future mishaps like this.
On Fri, Aug 07, 2015 at 05:44:03PM +0200, Sebastian Andrzej Siewior wrote: > On 08/07/2015 05:29 PM, Russell King - ARM Linux wrote: > > On Fri, Aug 07, 2015 at 11:08:48AM -0400, Peter Hurley wrote: > >> [ + Greg KH ] > >> > >> On 08/07/2015 09:57 AM, Russell King - ARM Linux wrote: > >>> As it is something that the driver has _not_ supported, you are clearly > >>> adding a feature to an existing driver. It's not a bug fix. > >>> > >>>>> If something else has been converted to pause channels and that is causing > >>>>> a problem, then _that_ conversion is where the bug lies, not the lack of > >>>>> support in the omap-dma. > >> > >> FWIW, the actual bug is the api that silently does nothing. > > > > Incorrect. > > > > static int omap_dma_pause(struct dma_chan *chan) > > { > > struct omap_chan *c = to_omap_dma_chan(chan); > > > > /* Pause/Resume only allowed with cyclic mode */ > > if (!c->cyclic) > > return -EINVAL; > > > > Asking for the channel to be paused will return an error code indicating > > that the request failed. That will be propagated back through to the > > return code of dmaengine_pause(). > > > > If we look at what 8250-dma.c is doing: > > > > if (dma->rx_running) { > > dmaengine_pause(dma->rxchan); > > > > It's 8250-dma.c which is silently _ignoring_ the return code, failing > > to check that the operation it requested worked. Maybe this should be > > WARN_ON(dmaengine_pause(dma->rxchan)) or at least it should print a > > message? > > I think this is what Peter meant by saying "silently does nothing". Maybe Peter should phrase his replies better. "the actual bug is the api that silently does nothing." to me means he is complaining that dmaengine_pause() had no effect. _That_ is what I'm objecting to, and claiming that Peter's comment is wrong. He's now blaming me for snide remarks. I could call his one above an incorrect snide remark against the quality of DMA engine code. He's totally wrong, and been proven wrong by my analysis above, plain and simple. He should now accept that he's wrong and move along with sorting out this mess _without_ requiring optional features to be implemented in other subsystems to fix bugs in code he's supposed to be maintaining.
On 08/07/2015 12:39 PM, Russell King - ARM Linux wrote: > On Fri, Aug 07, 2015 at 05:44:03PM +0200, Sebastian Andrzej Siewior wrote: >> On 08/07/2015 05:29 PM, Russell King - ARM Linux wrote: >>> On Fri, Aug 07, 2015 at 11:08:48AM -0400, Peter Hurley wrote: >>>> [ + Greg KH ] >>>> >>>> On 08/07/2015 09:57 AM, Russell King - ARM Linux wrote: >>>>> As it is something that the driver has _not_ supported, you are clearly >>>>> adding a feature to an existing driver. It's not a bug fix. >>>>> >>>>>>> If something else has been converted to pause channels and that is causing >>>>>>> a problem, then _that_ conversion is where the bug lies, not the lack of >>>>>>> support in the omap-dma. >>>> >>>> FWIW, the actual bug is the api that silently does nothing. >>> >>> Incorrect. >>> >>> static int omap_dma_pause(struct dma_chan *chan) >>> { >>> struct omap_chan *c = to_omap_dma_chan(chan); >>> >>> /* Pause/Resume only allowed with cyclic mode */ >>> if (!c->cyclic) >>> return -EINVAL; >>> >>> Asking for the channel to be paused will return an error code indicating >>> that the request failed. That will be propagated back through to the >>> return code of dmaengine_pause(). >>> >>> If we look at what 8250-dma.c is doing: >>> >>> if (dma->rx_running) { >>> dmaengine_pause(dma->rxchan); >>> >>> It's 8250-dma.c which is silently _ignoring_ the return code, failing >>> to check that the operation it requested worked. Maybe this should be >>> WARN_ON(dmaengine_pause(dma->rxchan)) or at least it should print a >>> message? >> >> I think this is what Peter meant by saying "silently does nothing". > > Maybe Peter should phrase his replies better. "the actual bug is the > api that silently does nothing." to me means he is complaining that > dmaengine_pause() had no effect. _That_ is what I'm objecting to, > and claiming that Peter's comment is wrong. Yes, I missed that the api included a return code which the 8250 dma code should be checking. > He's now blaming me for snide remarks. I could call his one above an > incorrect snide remark against the quality of DMA engine code. You'd be reading a lot into my statement. > He's > totally wrong, and been proven wrong by my analysis above, plain and > simple. > > He should now accept that he's wrong Done. > and move along with sorting out > this mess _without_ requiring optional features to be implemented in > other subsystems to fix bugs in code he's supposed to be maintaining. This is simply a bug that flew under the radar, much like every other bug that gets fixed daily in mainline. The omap-serial driver which doesn't use dma is still the preferred stable driver for omap, for the moment. One of the main features of the 8250_omap integration was the addition of dma support. Without it, 8250_omap is ttyO in ttyS clothing. Regards, Peter Hurley -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Aug 07, 2015 at 01:23:20PM -0400, Peter Hurley wrote: > The omap-serial driver which doesn't use dma is still the preferred > stable driver for omap, for the moment. > > One of the main features of the 8250_omap integration was the addition > of dma support. Without it, 8250_omap is ttyO in ttyS clothing. Correct. omap-serial used to have broken DMA support, and I had a series of patches fixing it. Unfortunately, despite TI asking me to work on this, TI went ahead and removed the DMA support behind my back, which totally screwed my patch series after I merged some of its pre-requists. It was then decided that DMA support was no longer an important feature, so I dropped the patch set. So, I'm well aware of the issues here, and the problems of using the OMAP sDMA with the UARTs.
On Fri, Aug 07, 2015 at 10:41:57AM +0200, Sebastian Andrzej Siewior wrote: > This DMA driver is used by 8250-omap on DRA7-evm. There is one > requirement that is to pause a transfer. This is currently used on the RX > side. It is possible that the UART HW aborted the RX (UART's RX-timeout) > but the DMA controller starts the transfer shortly after. > Before we can manually purge the FIFO we need to pause the transfer, > check how many bytes it already received and terminate the transfer > without it making any progress. > > >From testing on the TX side it seems that it is possible that we invoke > pause once the transfer has completed which is indicated by the missing > CCR_ENABLE bit but before the interrupt has been noticed. In that case the > interrupt will come even after disabling it. > > The AM572x manual says that we have to wait for the CCR_RD_ACTIVE & > CCR_WR_ACTIVE bits to be gone before programming it again here is the > drain loop. Also it looks like without the drain the TX-transfer makes > sometimes progress. > > One note: The pause + resume combo is broken because after resume the > the complete transfer will be programmed again. That means the already > transferred bytes (until the pause event) will be sent again. This is > currently not important for my UART user because it does only pause + > terminate. > > Cc: <stable@vger.kernel.org> You don't get a "add support" patch into the stable tree unless it's a trivial device id or quirk table addition, please go re-read Documentation/stable_kernel_rules.txt thanks, greg k-h -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ + Heikki ] On 08/07/2015 12:33 PM, Russell King - ARM Linux wrote: > On Fri, Aug 07, 2015 at 12:07:11PM -0400, Peter Hurley wrote: >> On 08/07/2015 11:29 AM, Russell King - ARM Linux wrote: >>> On Fri, Aug 07, 2015 at 11:08:48AM -0400, Peter Hurley wrote: >>>> [ + Greg KH ] >>>> >>>> On 08/07/2015 09:57 AM, Russell King - ARM Linux wrote: >>>>> As it is something that the driver has _not_ supported, you are clearly >>>>> adding a feature to an existing driver. It's not a bug fix. >>>>> >>>>>>> If something else has been converted to pause channels and that is causing >>>>>>> a problem, then _that_ conversion is where the bug lies, not the lack of >>>>>>> support in the omap-dma. >>>> >>>> FWIW, the actual bug is the api that silently does nothing. >>> >>> Incorrect. >>> >>> static int omap_dma_pause(struct dma_chan *chan) >>> { >>> struct omap_chan *c = to_omap_dma_chan(chan); >>> >>> /* Pause/Resume only allowed with cyclic mode */ >>> if (!c->cyclic) >>> return -EINVAL; >>> >>> Asking for the channel to be paused will return an error code indicating >>> that the request failed. That will be propagated back through to the >>> return code of dmaengine_pause(). >>> >>> If we look at what 8250-dma.c is doing: >>> >>> if (dma->rx_running) { >>> dmaengine_pause(dma->rxchan); >>> >>> It's 8250-dma.c which is silently _ignoring_ the return code, failing >>> to check that the operation it requested worked. Maybe this should be >>> WARN_ON(dmaengine_pause(dma->rxchan)) or at least it should print a >>> message? >> >> Thanks for the suggestion; I'll hold on to that and push it after we add >> the 8250 omap dma pause in mainline. > > Why wait? You're hiding a data loss bug which is clearly the result of > code you allegedly maintain. Because it will generate tons of unnecessary reports when the patch that WARNs inevitably ends up in mainline a version earlier than the patch that fixes it. >> Well, instead of me saying something snide about the lack of upstream serial >> driver unit tests, I'll say I've been working on cleaning up and organizing >> my own tty/serial subsystem and driver units tests which I hope to upstream >> in the fall. >> >> Those include i/o validators that ran this driver for days at time without >> error at max line rate. Unfortunately, that hardware does not exhibit the >> same problem as the DRA7 noted in the changelog. > > What you have is a race condition in the code you a responsible for > maintaining, caused by poorly implemented code. Fix it, rather than > whinging about drivers outside of your subsystem having never implemented > _optional_ things that you choose to merge broken code which relied upon > it _without_ checking that the operation succeeded. > > It is _entirely_ your code which is wrong here. > > I will wait for that to be fixed before acking the omap-dma change since > you obviously need something to test with. I'm not sure to what you're referring here. A WARNing fixes nothing. If you mean some patch, as yet unwritten, that handles the dma cases when dmaengine_pause() is unimplemented without data loss, ok, but please confirm that's what you mean. However, at some point one must look at the api and wonder if the separation of concern has been drawn in the right place. Regards, Peter Hurley -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Aug 07, 2015 at 02:21:59PM -0400, Peter Hurley wrote: > [ + Heikki ] > > On 08/07/2015 12:33 PM, Russell King - ARM Linux wrote: > > What you have is a race condition in the code you a responsible for > > maintaining, caused by poorly implemented code. Fix it, rather than > > whinging about drivers outside of your subsystem having never implemented > > _optional_ things that you choose to merge broken code which relied upon > > it _without_ checking that the operation succeeded. > > > > It is _entirely_ your code which is wrong here. > > > > I will wait for that to be fixed before acking the omap-dma change since > > you obviously need something to test with. > > I'm not sure to what you're referring here. > > A WARNing fixes nothing. The warning can wait. > If you mean some patch, as yet unwritten, that handles the dma cases when > dmaengine_pause() is unimplemented without data loss, ok, but please confirm > that's what you mean. But the regression needs fixing. > However, at some point one must look at the api and wonder if the separation > of concern has been drawn in the right place. It _is_ in the right place. dmaengine_pause() always has been permitted to fail. It's the responsibility of the user of this API to _check_ the return code to find out whether it had the desired effect. Not checking the return code is a bug in the caller's code. If that wasn't the case, dmaengine_pause() would have a void return type. It doesn't. It has an 'int' to allow failure, or to allow non- implementation for cases where the underlying hardware can't pause the channel without causing data loss. What would you think is better: an API which silently loses data, or one which refuses to stop the transfer and reports an error code back to the caller. You seem to be arguing for the former, and as such, there's no way I can take you seriously. In any case, Greg has now commented on the patch adding the feature, basically refusing it for stable tree inclusion. So the matter is settled: omap-dma isn't going to get the pause feature added in stable trees any time soon. So a different solution now needs to be found, which is what I've been saying all along...
On 08/07/2015 02:32 PM, Russell King - ARM Linux wrote: > On Fri, Aug 07, 2015 at 02:21:59PM -0400, Peter Hurley wrote: >> [ + Heikki ] >> >> On 08/07/2015 12:33 PM, Russell King - ARM Linux wrote: >>> What you have is a race condition in the code you a responsible for >>> maintaining, caused by poorly implemented code. Fix it, rather than >>> whinging about drivers outside of your subsystem having never implemented >>> _optional_ things that you choose to merge broken code which relied upon >>> it _without_ checking that the operation succeeded. >>> >>> It is _entirely_ your code which is wrong here. >>> >>> I will wait for that to be fixed before acking the omap-dma change since >>> you obviously need something to test with. >> >> I'm not sure to what you're referring here. >> >> A WARNing fixes nothing. > > The warning can wait. > >> If you mean some patch, as yet unwritten, that handles the dma cases when >> dmaengine_pause() is unimplemented without data loss, ok, but please confirm >> that's what you mean. > > But the regression needs fixing. I too would prefer the bug to be fixed. But calling it a regression is incorrect. There is no previous SHA in which this problem didn't exist, except before either 8250_dma or 8250_omap was added. From the outset, both the 8250 dma code and the 8250_omap driver (mistakenly) relied on dmaengine_pause. >> However, at some point one must look at the api and wonder if the separation >> of concern has been drawn in the right place. > > It _is_ in the right place. dmaengine_pause() always has been permitted > to fail. It's the responsibility of the user of this API to _check_ the > return code to find out whether it had the desired effect. Not checking > the return code is a bug in the caller's code. > > If that wasn't the case, dmaengine_pause() would have a void return type. > It doesn't. It has an 'int' to allow failure A resource error is significantly different than ENOSYS or EINVAL. > or to allow non- > implementation for cases where the underlying hardware can't pause the > channel without causing data loss. That's your assertion; I've seen no documentation to back that up (other than the de facto commit). And quite frankly, that's absurd. 1. No other driver implements _only some_ use-cases of dmaengine_pause(). 2. The number of users expecting dmaengine_pause to be implemented for non-cyclic dma transfers _dwarfs_ cyclic users. 3. There's a dedicated query interface, dma_get_slave_caps(), for which omap-dma returns /true/ -- not /maybe/ -- to indicate dmaengine_pause() is implemented. As a consumer of the api, I'd much rather opt-out at device initialization time knowing that a required feature is unimplemented, than discover it at i/o time when it's too late. > What would you think is better: an API which silently loses data, or > one which refuses to stop the transfer and reports an error code back > to the caller. An api which provides a means of determining if necessary functionality is implemented _during setup_. That way the consumer of the api can determine if the feature is supportable. For example, dma_get_slave_caps() could differentiate * pause for cyclic support * pause for non-cyclic support * pause and resume support * pause and terminate support .... > You seem to be arguing for the former, and as such, there's no way I > can take you seriously. Leaping to conclusions. > In any case, Greg has now commented on the patch adding the feature, > basically refusing it for stable tree inclusion. So the matter is > settled: omap-dma isn't going to get the pause feature added in stable > trees any time soon. So a different solution now needs to be found, > which is what I've been saying all along... While Sebastian's initial patch is a good first-cut at addressing 8250_omap's use of omap-dma, none of the patches address the general design problem I have outlined above; namely, that simply returning an error at use time for an unimplemented slave transaction is fundamentally flawed. Regards, Peter Hurley -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Aug 07, 2015 at 09:41:19PM -0400, Peter Hurley wrote: > That's your assertion; I've seen no documentation to back that up > (other than the de facto commit). So, you can't be bothered to read the thread where I quoted bits from the manual, including paraphrasing emails I had discussing this issue with TI. What you're effectively saying that I'm a liar. Fine, there's no point in engaging in further discussion. This will be my last reply to you.
diff --git a/drivers/dma/omap-dma.c b/drivers/dma/omap-dma.c index 249445c8a4c6..6b8497203caf 100644 --- a/drivers/dma/omap-dma.c +++ b/drivers/dma/omap-dma.c @@ -299,7 +299,7 @@ static void omap_dma_start(struct omap_chan *c, struct omap_desc *d) omap_dma_chan_write(c, CCR, d->ccr | CCR_ENABLE); } -static void omap_dma_stop(struct omap_chan *c) +static int omap_dma_stop(struct omap_chan *c) { struct omap_dmadev *od = to_omap_dma_dev(c->vc.chan.device); uint32_t val; @@ -342,8 +342,26 @@ static void omap_dma_stop(struct omap_chan *c) omap_dma_glbl_write(od, OCP_SYSCONFIG, sysconfig); } else { + int i = 0; + + if (!(val & CCR_ENABLE)) + return -EINVAL; + val &= ~CCR_ENABLE; omap_dma_chan_write(c, CCR, val); + do { + val = omap_dma_chan_read(c, CCR); + if (!(val & (CCR_RD_ACTIVE | CCR_WR_ACTIVE))) + break; + if (i > 100) + break; + udelay(5); + } while (1); + + if (val & (CCR_RD_ACTIVE | CCR_WR_ACTIVE)) + dev_err(c->vc.chan.device->dev, + "DMA drain did not complete on lch %d\n", + c->dma_ch); } mb(); @@ -358,6 +376,7 @@ static void omap_dma_stop(struct omap_chan *c) omap_dma_chan_write(c, CLNK_CTRL, val); } + return 0; } static void omap_dma_start_sg(struct omap_chan *c, struct omap_desc *d, @@ -728,6 +747,8 @@ static enum dma_status omap_dma_tx_status(struct dma_chan *chan, } else { txstate->residue = 0; } + if (ret == DMA_IN_PROGRESS && c->paused) + ret = DMA_PAUSED; spin_unlock_irqrestore(&c->vc.lock, flags); return ret; @@ -1053,28 +1074,33 @@ static int omap_dma_terminate_all(struct dma_chan *chan) static int omap_dma_pause(struct dma_chan *chan) { struct omap_chan *c = to_omap_dma_chan(chan); + struct omap_dmadev *od = to_omap_dma_dev(chan->device); + unsigned long flags; + int ret = -EINVAL; - /* Pause/Resume only allowed with cyclic mode */ - if (!c->cyclic) - return -EINVAL; + spin_lock_irqsave(&od->irq_lock, flags); - if (!c->paused) { - omap_dma_stop(c); - c->paused = true; + if (!c->paused && c->desc) { + ret = omap_dma_stop(c); + if (!ret) + c->paused = true; } - return 0; + spin_unlock_irqrestore(&od->irq_lock, flags); + + return ret; } static int omap_dma_resume(struct dma_chan *chan) { struct omap_chan *c = to_omap_dma_chan(chan); + struct omap_dmadev *od = to_omap_dma_dev(chan->device); + unsigned long flags; + int ret = -EINVAL; - /* Pause/Resume only allowed with cyclic mode */ - if (!c->cyclic) - return -EINVAL; + spin_lock_irqsave(&od->irq_lock, flags); - if (c->paused) { + if (c->paused && c->desc) { mb(); /* Restore channel link register */ @@ -1082,9 +1108,11 @@ static int omap_dma_resume(struct dma_chan *chan) omap_dma_start(c, c->desc); c->paused = false; + ret = 0; } + spin_unlock_irqrestore(&od->irq_lock, flags); - return 0; + return ret; } static int omap_dma_chan_init(struct omap_dmadev *od)
This DMA driver is used by 8250-omap on DRA7-evm. There is one requirement that is to pause a transfer. This is currently used on the RX side. It is possible that the UART HW aborted the RX (UART's RX-timeout) but the DMA controller starts the transfer shortly after. Before we can manually purge the FIFO we need to pause the transfer, check how many bytes it already received and terminate the transfer without it making any progress. From testing on the TX side it seems that it is possible that we invoke pause once the transfer has completed which is indicated by the missing CCR_ENABLE bit but before the interrupt has been noticed. In that case the interrupt will come even after disabling it. The AM572x manual says that we have to wait for the CCR_RD_ACTIVE & CCR_WR_ACTIVE bits to be gone before programming it again here is the drain loop. Also it looks like without the drain the TX-transfer makes sometimes progress. One note: The pause + resume combo is broken because after resume the the complete transfer will be programmed again. That means the already transferred bytes (until the pause event) will be sent again. This is currently not important for my UART user because it does only pause + terminate. Cc: <stable@vger.kernel.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> --- drivers/dma/omap-dma.c | 54 ++++++++++++++++++++++++++++++++++++++------------ 1 file changed, 41 insertions(+), 13 deletions(-)