Message ID | 1406660344-25307-2-git-send-email-bigeasy@linutronix.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Tue, Jul 29, 2014 at 08:58:58PM +0200, Sebastian Andrzej Siewior wrote: > The rx path of the 8250_dma user in the RX-timeout case: > - it starts the RX transfer > - if the rx-timeout interrupt occures, it dmaengine_pause() the transfer > - step two is dmaengine_terminate_all() on this channel. Okay after this whole channel needs to be reset, which means all the descriptors are discared. > - based on dmaengine_tx_status() it learns the number of transfered > bytes. > - the rx interrupt occures, why, channel is terminated, so existing txn needs to be aborted too > it does not start the RX-transfer because > according to dmaengine_tx_status() the status of the current transfer > is still DMA_IN_PROGRESS because the earlier dmaengine_terminate_all() > did not reset this. there is no current transfer anymore > - on rx-timeout it invokes dmaengine_pause() again. This time, it > segfaults because the channel has no descriptor yet. > > To make the upper case work better, this patch adds dma_cookie_complete() > to complete the cookie. Also it adds is an additional check for echan->edesc > in case the channel has no descriptor assigned. I think we are fixing the behvaiour rather than cause. terminate_all(() needs to do a proper cleanup of the channel And this looks a series, without cover letter sent to all. Which makes it a bit hard to see what is getting done
On 07/31/2014 02:17 PM, Vinod Koul wrote: > On Tue, Jul 29, 2014 at 08:58:58PM +0200, Sebastian Andrzej Siewior wrote: >> The rx path of the 8250_dma user in the RX-timeout case: >> - it starts the RX transfer >> - if the rx-timeout interrupt occures, it dmaengine_pause() the transfer >> - step two is dmaengine_terminate_all() on this channel. > Okay after this whole channel needs to be reset, which means all the > descriptors are discared. >> - based on dmaengine_tx_status() it learns the number of transfered >> bytes. >> - the rx interrupt occures, > why, channel is terminated, so existing txn needs to be aborted too serial8250_rx_dma() is invoked on 8250' RX interrupt to move all the data from the FIFO to memory. This is when the dma transfer is programmed (on all platforms but omap because we need to program it before the data is in the FIFO but this detail shouldn't matter). On Omap (atleast) it happens that if there are not enough characters in the FIFO over a given time the UART triggers a RX-timeout and no bytes are moved by the DMA engine. This is when UART_IIR_RX_TIMEOUT is hit. At that point it invokes the completion hanlder (__dma_rx_do_complete()) to learn how much bytes were transfered and to cancel the transfer (so the remaining bytes can be fetched via PIO). >> it does not start the RX-transfer because >> according to dmaengine_tx_status() the status of the current transfer >> is still DMA_IN_PROGRESS because the earlier dmaengine_terminate_all() >> did not reset this. > there is no current transfer anymore That is correct but since we never completed that cookie, dmaengine_tx_status() returns DMA_IN_PROGRESS and the 8250_dma does no book keeping of its own. >> - on rx-timeout it invokes dmaengine_pause() again. This time, it >> segfaults because the channel has no descriptor yet. >> >> To make the upper case work better, this patch adds dma_cookie_complete() >> to complete the cookie. Also it adds is an additional check for echan->edesc >> in case the channel has no descriptor assigned. > I think we are fixing the behvaiour rather than cause. terminate_all(() > needs to do a proper cleanup of the channel The channel is clean an can be reused. Since the cookie has never been completed / incremented, serial8250_rx_dma() never enqueues another transfer. Are you suggesting that the 8250_dma driver should have its own book keeping whether it started a transfer or not? Either way, it looks wrong that dmaengine_tx_status() reports DMA_IN_PROGRESS even that the transfer got aborted and the channel is properly cleaned up. > And this looks a series, without cover letter sent to all. Which makes it a > bit hard to see what is getting done What is getting done, is getting 8250_dma used on omap after the 8250 based driver is used for uart. The UART is mostly the same on am335x (where edma is used) as on DRA7 (where omap-dma is used) that is why I made the two patches for the two dma engines. The cover letter is at [1]. I tried to cover the problem and all informations in the patch description so you have all the required informations. [0] drivers/tty/serial/8250/8250_dma.c [1] https://lkml.org/lkml/2014/7/29/513 Sebastian
* Vinod Koul | 2014-07-31 17:47:02 [+0530]: >On Tue, Jul 29, 2014 at 08:58:58PM +0200, Sebastian Andrzej Siewior wrote: >> The rx path of the 8250_dma user in the RX-timeout case: >> - it starts the RX transfer >> - if the rx-timeout interrupt occures, it dmaengine_pause() the transfer >> - step two is dmaengine_terminate_all() on this channel. >Okay after this whole channel needs to be reset, which means all the >descriptors are discared. >> To make the upper case work better, this patch adds dma_cookie_complete() >> to complete the cookie. Also it adds is an additional check for echan->edesc >> in case the channel has no descriptor assigned. >I think we are fixing the behvaiour rather than cause. terminate_all(() >needs to do a proper cleanup of the channel In case you are not ignoring me but $reason here is an example that does not work (with both drivers); desc = dmaengine_prep_slave_single(rxchan, …); rx_cookie = dmaengine_submit(desc); dma_async_issue_pending(rxchan); ssleep(2); /* Now assume that the transfer did not start */ st = dmaengine_tx_status(rxchan, rx_cookie, NULL); /* st is now DMA_IN_PROGRESS as expected */ dmaengine_terminate_all(rxchan); st = dmaengine_tx_status(rxchan, rx_cookie, NULL); /* st is still DMA_IN_PROGRESS but _I_ expect DMA_COMPLETE because * it has been terminated / canceled */ Both dma driver clean up all / terminate all descriptors as required but _none_ of them completes the cookie. As a result dma_cookie_status() still thinks that the transfer is in progress. Sebastian
On Fri, Aug 08, 2014 at 06:29:50PM +0200, Sebastian Andrzej Siewior wrote: > * Vinod Koul | 2014-07-31 17:47:02 [+0530]: > > >On Tue, Jul 29, 2014 at 08:58:58PM +0200, Sebastian Andrzej Siewior wrote: > >> The rx path of the 8250_dma user in the RX-timeout case: > >> - it starts the RX transfer > >> - if the rx-timeout interrupt occures, it dmaengine_pause() the transfer > >> - step two is dmaengine_terminate_all() on this channel. > >Okay after this whole channel needs to be reset, which means all the > >descriptors are discared. > > >> To make the upper case work better, this patch adds dma_cookie_complete() > >> to complete the cookie. Also it adds is an additional check for echan->edesc > >> in case the channel has no descriptor assigned. > >I think we are fixing the behvaiour rather than cause. terminate_all(() > >needs to do a proper cleanup of the channel > > In case you are not ignoring me but $reason here is an example that does > not work (with both drivers); Sorry This seems to have slipped thru, wasn't intentional! > > desc = dmaengine_prep_slave_single(rxchan, …); > rx_cookie = dmaengine_submit(desc); > dma_async_issue_pending(rxchan); > > ssleep(2); > /* Now assume that the transfer did not start */ > st = dmaengine_tx_status(rxchan, rx_cookie, NULL); > /* st is now DMA_IN_PROGRESS as expected */ > > dmaengine_terminate_all(rxchan); > st = dmaengine_tx_status(rxchan, rx_cookie, NULL); and here is the culprit. You have terminated the channel. This means that dmaengine driver is free to clean up all the allocated descriptors on the channels and do whatever it decides to do with them. You have already terminated the channel so what is the point in querying the status of the cookie, which you shouldn't use anyway after invoking terminate_all() as its result is not correct. > /* st is still DMA_IN_PROGRESS but _I_ expect DMA_COMPLETE because > * it has been terminated / canceled > */ > > Both dma driver clean up all / terminate all descriptors as required but > _none_ of them completes the cookie. As a result dma_cookie_status() > still thinks that the transfer is in progress. Btw how does it matter in the driver here if the transaction completed or not after terminate_all() was invoked. What is the purpose of querying status. Thanks
On 08/19/2014 05:12 PM, Vinod Koul wrote: >> >> desc = dmaengine_prep_slave_single(rxchan, …); >> rx_cookie = dmaengine_submit(desc); >> dma_async_issue_pending(rxchan); >> >> ssleep(2); >> /* Now assume that the transfer did not start */ >> st = dmaengine_tx_status(rxchan, rx_cookie, NULL); >> /* st is now DMA_IN_PROGRESS as expected */ >> >> dmaengine_terminate_all(rxchan); >> st = dmaengine_tx_status(rxchan, rx_cookie, NULL); > and here is the culprit. You have terminated the channel. This means that > dmaengine driver is free to clean up all the allocated descriptors on the > channels and do whatever it decides to do with them. descriptors, yes. > You have already terminated the channel so what is the point in querying the > status of the cookie, which you shouldn't use anyway after invoking > terminate_all() as its result is not correct. The point is to check (later, after terminate_all()) if there is an outstanding DMA transfer or not _and_ how much data was really transfered. Looking at edma it does not really support the latter if the transfer is already completed. On the plus side the HW does not support partly transfers :) But where is it written that the life time of the cookie is limited? Looking at the "cooking check" code there is no such thing. It is simply compare of completed vs passed number but okay, this is an implementation detail. From [0] it says under "4. Submit the transaction": | This returns a cookie can be used to check the progress of DMA engine | activity via other DMA engine calls not covered in this document. no life time limit mentioned here. Which brings to the question: Why is it okay to use the cookie after the transaction "terminated" normally but not if it has been canceled? And from [0] the API explanation "4. … dma_async_is_tx_complete()": |Note: | Not all DMA engine drivers can return reliable information for | a running DMA channel. It is recommended that DMA engine users | pause or stop (via dmaengine_terminate_all) the channel before | using this API. So the documentation says to use the cookie with dma_async_is_tx_complete() and before doing so it is recommended that the transfer should be paused or stopped. _Exactly_ what is done here. >> /* st is still DMA_IN_PROGRESS but _I_ expect DMA_COMPLETE because >> * it has been terminated / canceled >> */ >> >> Both dma driver clean up all / terminate all descriptors as required but >> _none_ of them completes the cookie. As a result dma_cookie_status() >> still thinks that the transfer is in progress. > > Btw how does it matter in the driver here if the transaction completed or > not after terminate_all() was invoked. What is the purpose of querying > status. In the RX interrupt (of the 8250 unit), the code checks if the transfer has been already started or not via querying the status. So if it returns DMA_COMPLETE then a new transfer will be started. If it returns DMA_IN_PROGRESS then the code returns doing nothing because the DMA engine should start moving data anytime now so the RX interrupt goes away. That means: If the transfer is canceled then it won't be started again. Btw: the current (probably only) dma driver that is used by 8250-dma is dw/core.c. That one does cookie complete on terminate: dwc_control(DMA_TERMINATE_ALL) -> dwc_descriptor_complete() -> dma_cookie_complete(). That is why it works for them… [0] Documentation/dmaengine.txt > > Thanks > Sebastian
On Thu, Aug 21, 2014 at 03:09:12PM +0200, Sebastian Andrzej Siewior wrote: > On 08/19/2014 05:12 PM, Vinod Koul wrote: > >> > >> desc = dmaengine_prep_slave_single(rxchan, …); > >> rx_cookie = dmaengine_submit(desc); > >> dma_async_issue_pending(rxchan); > >> > >> ssleep(2); > >> /* Now assume that the transfer did not start */ > >> st = dmaengine_tx_status(rxchan, rx_cookie, NULL); > >> /* st is now DMA_IN_PROGRESS as expected */ > >> > >> dmaengine_terminate_all(rxchan); > >> st = dmaengine_tx_status(rxchan, rx_cookie, NULL); > > and here is the culprit. You have terminated the channel. This means that > > dmaengine driver is free to clean up all the allocated descriptors on the > > channels and do whatever it decides to do with them. > > descriptors, yes. and by that logic when you query the driver would have freed up! > > You have already terminated the channel so what is the point in querying the > > status of the cookie, which you shouldn't use anyway after invoking > > terminate_all() as its result is not correct. > > The point is to check (later, after terminate_all()) if there is an > outstanding DMA transfer or not _and_ how much data was really > transfered. Looking at edma it does not really support the latter if > the transfer is already completed. On the plus side the HW does not > support partly transfers :) well that can be achieved properly and differently! Why don't we pause the channel, get the residue, status and then terminate. > But where is it written that the life time of the cookie is limited? > Looking at the "cooking check" code there is no such thing. It is > simply compare of completed vs passed number but okay, this is an > implementation detail. > From [0] it says under "4. Submit the transaction": > > | This returns a cookie can be used to check the progress of DMA engine > | activity via other DMA engine calls not covered in this document. > > no life time limit mentioned here. Which brings to the question: Why is > it okay to use the cookie after the transaction "terminated" normally > but not if it has been canceled? Due to the special nature of terminate. The point here is that you don't terminate a transaction but channel operation > And from [0] the API explanation "4. … dma_async_is_tx_complete()": > > |Note: > | Not all DMA engine drivers can return reliable information for > | a running DMA channel. It is recommended that DMA engine users > | pause or stop (via dmaengine_terminate_all) the channel before > | using this API. > > So the documentation says to use the cookie with > dma_async_is_tx_complete() and before doing so it is recommended that > the transfer should be paused or stopped. _Exactly_ what is done here. > > >> /* st is still DMA_IN_PROGRESS but _I_ expect DMA_COMPLETE because > >> * it has been terminated / canceled > >> */ > >> > >> Both dma driver clean up all / terminate all descriptors as required but > >> _none_ of them completes the cookie. As a result dma_cookie_status() > >> still thinks that the transfer is in progress. > > > > Btw how does it matter in the driver here if the transaction completed or > > not after terminate_all() was invoked. What is the purpose of querying > > status. > > In the RX interrupt (of the 8250 unit), the code checks if the transfer > has been already started or not via querying the status. So if it > returns DMA_COMPLETE then a new transfer will be started. If it returns > DMA_IN_PROGRESS then the code returns doing nothing because the DMA > engine should start moving data anytime now so the RX interrupt goes > away. > > That means: If the transfer is canceled then it won't be started again. > > Btw: the current (probably only) dma driver that is used by 8250-dma is > dw/core.c. That one does cookie complete on terminate: > dwc_control(DMA_TERMINATE_ALL) -> dwc_descriptor_complete() -> > dma_cookie_complete(). Yes but would above flow work for you :)
On 08/28/2014 09:06 AM, Vinod Koul wrote: > On Thu, Aug 21, 2014 at 03:09:12PM +0200, Sebastian Andrzej Siewior wrote: >> On 08/19/2014 05:12 PM, Vinod Koul wrote: >>>> >>>> desc = dmaengine_prep_slave_single(rxchan, …); >>>> rx_cookie = dmaengine_submit(desc); >>>> dma_async_issue_pending(rxchan); >>>> >>>> ssleep(2); >>>> /* Now assume that the transfer did not start */ >>>> st = dmaengine_tx_status(rxchan, rx_cookie, NULL); >>>> /* st is now DMA_IN_PROGRESS as expected */ >>>> >>>> dmaengine_terminate_all(rxchan); >>>> st = dmaengine_tx_status(rxchan, rx_cookie, NULL); >>> and here is the culprit. You have terminated the channel. This means that >>> dmaengine driver is free to clean up all the allocated descriptors on the >>> channels and do whatever it decides to do with them. >> >> descriptors, yes. > and by that logic when you query the driver would have freed up! I don't understand. What "would have freed up"? >>> You have already terminated the channel so what is the point in querying the >>> status of the cookie, which you shouldn't use anyway after invoking >>> terminate_all() as its result is not correct. >> >> The point is to check (later, after terminate_all()) if there is an >> outstanding DMA transfer or not _and_ how much data was really >> transfered. Looking at edma it does not really support the latter if >> the transfer is already completed. On the plus side the HW does not >> support partly transfers :) > well that can be achieved properly and differently! I am all yours. > Why don't we pause the channel, get the residue, status and then > terminate. This is done that way. For the protocol: pause is not supported by driver (edma and omap-dma) unless the channel is used in "continuous" mode. But this does not matter here. The root problems remains: You get DMA_IN_PROGRESS returned as long as the cookie has not been completed _even_ after the invocation of terminate_all(). So that part that knows it terminates the thing for the first time does everything correct. The other part that only enqueues a transfer if there is not another one pending does never do anything. >> But where is it written that the life time of the cookie is limited? >> Looking at the "cooking check" code there is no such thing. It is >> simply compare of completed vs passed number but okay, this is an >> implementation detail. >> From [0] it says under "4. Submit the transaction": >> >> | This returns a cookie can be used to check the progress of DMA engine >> | activity via other DMA engine calls not covered in this document. >> >> no life time limit mentioned here. Which brings to the question: Why is >> it okay to use the cookie after the transaction "terminated" normally >> but not if it has been canceled? > Due to the special nature of terminate. The point here is that you don't > terminate a transaction but channel operation So the channel is terminated and the transaction is still in progress. Is this what you are telling me? And why should a transaction remain in progress after the channel has been terminated? I mean once you terminate a bridge crossing point there are no cars going over that bridge. It is closed. Their trip has been terminated as well. The question that remains is which of those cars passed the bridge before it was terminated. I understand that you want first to pause that channel looking at the bytes and then terminate. Totally understand. Otherwise the number of bytes transfered (reside) can no longer be queried because the driver cleaned up. Fine. But it should be reported that this transaction is no longer in movement. It won't happen. Never. And this is what bugs me. >> And from [0] the API explanation "4. … dma_async_is_tx_complete()": > > >> >> |Note: >> | Not all DMA engine drivers can return reliable information for >> | a running DMA channel. It is recommended that DMA engine users >> | pause or stop (via dmaengine_terminate_all) the channel before >> | using this API. >> >> So the documentation says to use the cookie with >> dma_async_is_tx_complete() and before doing so it is recommended that >> the transfer should be paused or stopped. _Exactly_ what is done here. >> >>>> /* st is still DMA_IN_PROGRESS but _I_ expect DMA_COMPLETE because >>>> * it has been terminated / canceled >>>> */ >>>> >>>> Both dma driver clean up all / terminate all descriptors as required but >>>> _none_ of them completes the cookie. As a result dma_cookie_status() >>>> still thinks that the transfer is in progress. >>> >>> Btw how does it matter in the driver here if the transaction completed or >>> not after terminate_all() was invoked. What is the purpose of querying >>> status. >> >> In the RX interrupt (of the 8250 unit), the code checks if the transfer >> has been already started or not via querying the status. So if it >> returns DMA_COMPLETE then a new transfer will be started. If it returns >> DMA_IN_PROGRESS then the code returns doing nothing because the DMA >> engine should start moving data anytime now so the RX interrupt goes >> away. >> >> That means: If the transfer is canceled then it won't be started again. >> >> Btw: the current (probably only) dma driver that is used by 8250-dma is >> dw/core.c. That one does cookie complete on terminate: >> dwc_control(DMA_TERMINATE_ALL) -> dwc_descriptor_complete() -> >> dma_cookie_complete(). > Yes but would above flow work for you :) No, it wont. I ask you to accept the two patches since now the behavior of dma_async_is_tx_complete() is different depending on whether a transaction completed on its own or got terminated. And reporting in_progress does not reflect the reality. I can even add a patch on top updating the documentation where it state that you must not trust the dma_tx_state struct once the request is completed (or aborted). Or we can trade :) I send you a patch which removes the "cookie complete" part from the dwc-dma driver with a CC stable tag. And you explain it to the intel folks that merged that part of the 8250 dma code that rely on the fact (that a aborted transaction reports complete) that what they do is wrong. but I think it will end up with this: I will change the 8250-dma code to keep track whether or not it enqueued a DMA transfer and rely on this information instead of dma_async_is_tx_complete(). I will update the documentation stating that dma_async_is_tx_complete() may report bogus information on completed and terminated transaction. Sebastian
diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c index d08c4de..ff05dd4 100644 --- a/drivers/dma/edma.c +++ b/drivers/dma/edma.c @@ -256,6 +256,7 @@ static int edma_terminate_all(struct edma_chan *echan) * echan->edesc is NULL and exit.) */ if (echan->edesc) { + dma_cookie_complete(&echan->edesc->vdesc.tx); echan->edesc = NULL; edma_stop(echan->ch_num); } @@ -282,7 +283,7 @@ static int edma_slave_config(struct edma_chan *echan, static int edma_dma_pause(struct edma_chan *echan) { /* Pause/Resume only allowed with cyclic mode */ - if (!echan->edesc->cyclic) + if (!echan->edesc || !echan->edesc->cyclic) return -EINVAL; edma_pause(echan->ch_num);
The rx path of the 8250_dma user in the RX-timeout case: - it starts the RX transfer - if the rx-timeout interrupt occures, it dmaengine_pause() the transfer - step two is dmaengine_terminate_all() on this channel. - based on dmaengine_tx_status() it learns the number of transfered bytes. - the rx interrupt occures, it does not start the RX-transfer because according to dmaengine_tx_status() the status of the current transfer is still DMA_IN_PROGRESS because the earlier dmaengine_terminate_all() did not reset this. - on rx-timeout it invokes dmaengine_pause() again. This time, it segfaults because the channel has no descriptor yet. To make the upper case work better, this patch adds dma_cookie_complete() to complete the cookie. Also it adds is an additional check for echan->edesc in case the channel has no descriptor assigned. Cc: Joel Fernandes <joelf@ti.com> Cc: Vinod Koul <vinod.koul@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: dmaengine@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> --- drivers/dma/edma.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)