diff mbox

[1/7] dmaengine: edma: fix two faults which happen with the 8250_dma user

Message ID 1406660344-25307-2-git-send-email-bigeasy@linutronix.de (mailing list archive)
State New, archived
Headers show

Commit Message

Sebastian Andrzej Siewior July 29, 2014, 6:58 p.m. UTC
The rx path of the 8250_dma user in the RX-timeout case:
- it starts the RX transfer
- if the rx-timeout interrupt occures, it dmaengine_pause() the transfer
- step two is dmaengine_terminate_all() on this channel.
- based on dmaengine_tx_status() it learns the number of transfered
  bytes.
- the rx interrupt occures, it does not start the RX-transfer because
  according to dmaengine_tx_status() the status of the current transfer
  is still DMA_IN_PROGRESS because the earlier dmaengine_terminate_all()
  did not reset this.
- on rx-timeout it invokes dmaengine_pause() again. This time, it
  segfaults because the channel has no descriptor yet.

To make the upper case work better, this patch adds dma_cookie_complete()
to complete the cookie. Also it adds is an additional check for echan->edesc
in case the channel has no descriptor assigned.

Cc: Joel Fernandes <joelf@ti.com>
Cc: Vinod Koul <vinod.koul@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: dmaengine@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/dma/edma.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Vinod Koul July 31, 2014, 12:17 p.m. UTC | #1
On Tue, Jul 29, 2014 at 08:58:58PM +0200, Sebastian Andrzej Siewior wrote:
> The rx path of the 8250_dma user in the RX-timeout case:
> - it starts the RX transfer
> - if the rx-timeout interrupt occures, it dmaengine_pause() the transfer
> - step two is dmaengine_terminate_all() on this channel.
Okay after this whole channel needs to be reset, which means all the
descriptors are discared.
> - based on dmaengine_tx_status() it learns the number of transfered
>   bytes.
> - the rx interrupt occures,
why, channel is terminated, so existing txn needs to be aborted too

>   it does not start the RX-transfer because
>   according to dmaengine_tx_status() the status of the current transfer
>   is still DMA_IN_PROGRESS because the earlier dmaengine_terminate_all()
>   did not reset this.
there is no current transfer anymore

> - on rx-timeout it invokes dmaengine_pause() again. This time, it
>   segfaults because the channel has no descriptor yet.
> 
> To make the upper case work better, this patch adds dma_cookie_complete()
> to complete the cookie. Also it adds is an additional check for echan->edesc
> in case the channel has no descriptor assigned.
I think we are fixing the behvaiour rather than cause. terminate_all(()
needs to do a proper cleanup of the channel

And this looks a series, without cover letter sent to all. Which makes it a
bit hard to see what is getting done
Sebastian Andrzej Siewior July 31, 2014, 1:06 p.m. UTC | #2
On 07/31/2014 02:17 PM, Vinod Koul wrote:
> On Tue, Jul 29, 2014 at 08:58:58PM +0200, Sebastian Andrzej Siewior wrote:
>> The rx path of the 8250_dma user in the RX-timeout case:
>> - it starts the RX transfer
>> - if the rx-timeout interrupt occures, it dmaengine_pause() the transfer
>> - step two is dmaengine_terminate_all() on this channel.
> Okay after this whole channel needs to be reset, which means all the
> descriptors are discared.
>> - based on dmaengine_tx_status() it learns the number of transfered
>>   bytes.
>> - the rx interrupt occures,
> why, channel is terminated, so existing txn needs to be aborted too

serial8250_rx_dma() is invoked on 8250' RX interrupt to move all the
data from the FIFO to memory. This is when the dma transfer is
programmed (on all platforms but omap because we need to program it
before the data is in the FIFO but this detail shouldn't matter).

On Omap (atleast) it happens that if there are not enough characters in
the FIFO over a given time the UART triggers a RX-timeout and no bytes
are moved by the DMA engine. This is when UART_IIR_RX_TIMEOUT is hit.
At that point it invokes the completion hanlder
(__dma_rx_do_complete()) to learn how much bytes were transfered and to
cancel the transfer (so the remaining bytes can be fetched via PIO).

>>   it does not start the RX-transfer because
>>   according to dmaengine_tx_status() the status of the current transfer
>>   is still DMA_IN_PROGRESS because the earlier dmaengine_terminate_all()
>>   did not reset this.
> there is no current transfer anymore

That is correct but since we never completed that cookie,
dmaengine_tx_status() returns DMA_IN_PROGRESS and the 8250_dma does no
book keeping of its own.

>> - on rx-timeout it invokes dmaengine_pause() again. This time, it
>>   segfaults because the channel has no descriptor yet.
>>
>> To make the upper case work better, this patch adds dma_cookie_complete()
>> to complete the cookie. Also it adds is an additional check for echan->edesc
>> in case the channel has no descriptor assigned.
> I think we are fixing the behvaiour rather than cause. terminate_all(()
> needs to do a proper cleanup of the channel

The channel is clean an can be reused. Since the cookie has never been
completed / incremented, serial8250_rx_dma() never enqueues another
transfer.

Are you suggesting that the 8250_dma driver should have its own book
keeping whether it started a transfer or not?

Either way, it looks wrong that dmaengine_tx_status() reports
DMA_IN_PROGRESS even that the transfer got aborted and the channel is
properly cleaned up.

> And this looks a series, without cover letter sent to all. Which makes it a
> bit hard to see what is getting done

What is getting done, is getting 8250_dma used on omap after the 8250
based driver is used for uart. The UART is mostly the same on am335x
(where edma is used) as on DRA7 (where omap-dma is used) that is why I
made the two patches for the two dma engines. The cover letter is at
[1]. I tried to cover the problem and all informations in the patch
description so you have all the required informations.

[0] drivers/tty/serial/8250/8250_dma.c
[1] https://lkml.org/lkml/2014/7/29/513

Sebastian
Sebastian Andrzej Siewior Aug. 8, 2014, 4:29 p.m. UTC | #3
* Vinod Koul | 2014-07-31 17:47:02 [+0530]:

>On Tue, Jul 29, 2014 at 08:58:58PM +0200, Sebastian Andrzej Siewior wrote:
>> The rx path of the 8250_dma user in the RX-timeout case:
>> - it starts the RX transfer
>> - if the rx-timeout interrupt occures, it dmaengine_pause() the transfer
>> - step two is dmaengine_terminate_all() on this channel.
>Okay after this whole channel needs to be reset, which means all the
>descriptors are discared.

>> To make the upper case work better, this patch adds dma_cookie_complete()
>> to complete the cookie. Also it adds is an additional check for echan->edesc
>> in case the channel has no descriptor assigned.
>I think we are fixing the behvaiour rather than cause. terminate_all(()
>needs to do a proper cleanup of the channel

In case you are not ignoring me but $reason here is an example that does
not work (with both drivers);

    desc = dmaengine_prep_slave_single(rxchan, …);
    rx_cookie = dmaengine_submit(desc);
    dma_async_issue_pending(rxchan);

    ssleep(2);
    /* Now assume that the transfer did not start */
    st = dmaengine_tx_status(rxchan, rx_cookie, NULL);
    /* st is now DMA_IN_PROGRESS as expected */

    dmaengine_terminate_all(rxchan);
    st = dmaengine_tx_status(rxchan, rx_cookie, NULL);
    /* st is still DMA_IN_PROGRESS but _I_ expect DMA_COMPLETE because
     * it has been terminated / canceled
     */

Both dma driver clean up all / terminate all descriptors as required but
_none_ of them completes the cookie. As a result dma_cookie_status()
still thinks that the transfer is in progress.

Sebastian
Vinod Koul Aug. 19, 2014, 3:12 p.m. UTC | #4
On Fri, Aug 08, 2014 at 06:29:50PM +0200, Sebastian Andrzej Siewior wrote:
> * Vinod Koul | 2014-07-31 17:47:02 [+0530]:
> 
> >On Tue, Jul 29, 2014 at 08:58:58PM +0200, Sebastian Andrzej Siewior wrote:
> >> The rx path of the 8250_dma user in the RX-timeout case:
> >> - it starts the RX transfer
> >> - if the rx-timeout interrupt occures, it dmaengine_pause() the transfer
> >> - step two is dmaengine_terminate_all() on this channel.
> >Okay after this whole channel needs to be reset, which means all the
> >descriptors are discared.
> 
> >> To make the upper case work better, this patch adds dma_cookie_complete()
> >> to complete the cookie. Also it adds is an additional check for echan->edesc
> >> in case the channel has no descriptor assigned.
> >I think we are fixing the behvaiour rather than cause. terminate_all(()
> >needs to do a proper cleanup of the channel
> 
> In case you are not ignoring me but $reason here is an example that does
> not work (with both drivers);
Sorry This seems to have slipped thru, wasn't intentional!
> 
>     desc = dmaengine_prep_slave_single(rxchan, …);
>     rx_cookie = dmaengine_submit(desc);
>     dma_async_issue_pending(rxchan);
> 
>     ssleep(2);
>     /* Now assume that the transfer did not start */
>     st = dmaengine_tx_status(rxchan, rx_cookie, NULL);
>     /* st is now DMA_IN_PROGRESS as expected */
> 
>     dmaengine_terminate_all(rxchan);
>     st = dmaengine_tx_status(rxchan, rx_cookie, NULL);
and here is the culprit. You have terminated the channel. This means that
dmaengine driver is free to clean up all the allocated descriptors on the
channels and do whatever it decides to do with them.

You have already terminated the channel so what is the point in querying the
status of the cookie, which you shouldn't use anyway after invoking
terminate_all() as its result is not correct.

>     /* st is still DMA_IN_PROGRESS but _I_ expect DMA_COMPLETE because
>      * it has been terminated / canceled
>      */
> 
> Both dma driver clean up all / terminate all descriptors as required but
> _none_ of them completes the cookie. As a result dma_cookie_status()
> still thinks that the transfer is in progress.

Btw how does it matter in the driver here if the transaction completed or
not after terminate_all() was invoked. What is the purpose of querying
status.

Thanks
Sebastian Andrzej Siewior Aug. 21, 2014, 1:09 p.m. UTC | #5
On 08/19/2014 05:12 PM, Vinod Koul wrote:
>>
>>     desc = dmaengine_prep_slave_single(rxchan, …);
>>     rx_cookie = dmaengine_submit(desc);
>>     dma_async_issue_pending(rxchan);
>>
>>     ssleep(2);
>>     /* Now assume that the transfer did not start */
>>     st = dmaengine_tx_status(rxchan, rx_cookie, NULL);
>>     /* st is now DMA_IN_PROGRESS as expected */
>>
>>     dmaengine_terminate_all(rxchan);
>>     st = dmaengine_tx_status(rxchan, rx_cookie, NULL);
> and here is the culprit. You have terminated the channel. This means that
> dmaengine driver is free to clean up all the allocated descriptors on the
> channels and do whatever it decides to do with them.

descriptors, yes.

> You have already terminated the channel so what is the point in querying the
> status of the cookie, which you shouldn't use anyway after invoking
> terminate_all() as its result is not correct.

The point is to check (later, after terminate_all()) if there is an
outstanding DMA transfer or not _and_ how much data was really
transfered. Looking at edma it does not really support the latter if
the transfer is already completed. On the plus side the HW does not
support partly transfers :)

But where is it written that the life time of the cookie is limited?
Looking at the "cooking check" code there is no such thing. It is
simply compare of completed vs passed number but okay, this is an
implementation detail.
From [0] it says under "4. Submit the transaction":

| This returns a cookie can be used to check the progress of DMA engine
| activity via other DMA engine calls not covered in this document.

no life time limit mentioned here. Which brings to the question: Why is
it okay to use the cookie after the transaction "terminated" normally
but not if it has been canceled?

And from [0] the API explanation  "4. … dma_async_is_tx_complete()":

|Note:
|    Not all DMA engine drivers can return reliable information for
|    a running DMA channel.  It is recommended that DMA engine users
|    pause or stop (via dmaengine_terminate_all) the channel before
|    using this API.

So the documentation says to use the cookie with
dma_async_is_tx_complete() and before doing so it is recommended that
the transfer should be paused or stopped. _Exactly_ what is done here.

>>     /* st is still DMA_IN_PROGRESS but _I_ expect DMA_COMPLETE because
>>      * it has been terminated / canceled
>>      */
>>
>> Both dma driver clean up all / terminate all descriptors as required but
>> _none_ of them completes the cookie. As a result dma_cookie_status()
>> still thinks that the transfer is in progress.
> 
> Btw how does it matter in the driver here if the transaction completed or
> not after terminate_all() was invoked. What is the purpose of querying
> status.

In the RX interrupt (of the 8250 unit), the code checks if the transfer
has been already started or not via querying the status. So if it
returns DMA_COMPLETE then a new transfer will be started. If it returns
DMA_IN_PROGRESS then the code returns doing nothing because the DMA
engine should start moving data anytime now so the RX interrupt goes
away.

That means: If the transfer is canceled then it won't be started again.

Btw: the current (probably only) dma driver that is used by 8250-dma is
dw/core.c. That one does cookie complete on terminate:
 dwc_control(DMA_TERMINATE_ALL) -> dwc_descriptor_complete() ->
 dma_cookie_complete().

That is why it works for them…

[0] Documentation/dmaengine.txt

> 
> Thanks
> 

Sebastian
Vinod Koul Aug. 28, 2014, 7:06 a.m. UTC | #6
On Thu, Aug 21, 2014 at 03:09:12PM +0200, Sebastian Andrzej Siewior wrote:
> On 08/19/2014 05:12 PM, Vinod Koul wrote:
> >>
> >>     desc = dmaengine_prep_slave_single(rxchan, …);
> >>     rx_cookie = dmaengine_submit(desc);
> >>     dma_async_issue_pending(rxchan);
> >>
> >>     ssleep(2);
> >>     /* Now assume that the transfer did not start */
> >>     st = dmaengine_tx_status(rxchan, rx_cookie, NULL);
> >>     /* st is now DMA_IN_PROGRESS as expected */
> >>
> >>     dmaengine_terminate_all(rxchan);
> >>     st = dmaengine_tx_status(rxchan, rx_cookie, NULL);
> > and here is the culprit. You have terminated the channel. This means that
> > dmaengine driver is free to clean up all the allocated descriptors on the
> > channels and do whatever it decides to do with them.
> 
> descriptors, yes.
and by that logic when you query the driver would have freed up!

> > You have already terminated the channel so what is the point in querying the
> > status of the cookie, which you shouldn't use anyway after invoking
> > terminate_all() as its result is not correct.
> 
> The point is to check (later, after terminate_all()) if there is an
> outstanding DMA transfer or not _and_ how much data was really
> transfered. Looking at edma it does not really support the latter if
> the transfer is already completed. On the plus side the HW does not
> support partly transfers :)
well that can be achieved properly and differently!
Why don't we pause the channel, get the residue, status and then
terminate.

> But where is it written that the life time of the cookie is limited?
> Looking at the "cooking check" code there is no such thing. It is
> simply compare of completed vs passed number but okay, this is an
> implementation detail.
> From [0] it says under "4. Submit the transaction":
> 
> | This returns a cookie can be used to check the progress of DMA engine
> | activity via other DMA engine calls not covered in this document.
> 
> no life time limit mentioned here. Which brings to the question: Why is
> it okay to use the cookie after the transaction "terminated" normally
> but not if it has been canceled?
Due to the special nature of terminate. The point here is that you don't
terminate a transaction but channel operation

> And from [0] the API explanation  "4. … dma_async_is_tx_complete()":


> 
> |Note:
> |    Not all DMA engine drivers can return reliable information for
> |    a running DMA channel.  It is recommended that DMA engine users
> |    pause or stop (via dmaengine_terminate_all) the channel before
> |    using this API.
> 
> So the documentation says to use the cookie with
> dma_async_is_tx_complete() and before doing so it is recommended that
> the transfer should be paused or stopped. _Exactly_ what is done here.
> 
> >>     /* st is still DMA_IN_PROGRESS but _I_ expect DMA_COMPLETE because
> >>      * it has been terminated / canceled
> >>      */
> >>
> >> Both dma driver clean up all / terminate all descriptors as required but
> >> _none_ of them completes the cookie. As a result dma_cookie_status()
> >> still thinks that the transfer is in progress.
> > 
> > Btw how does it matter in the driver here if the transaction completed or
> > not after terminate_all() was invoked. What is the purpose of querying
> > status.
> 
> In the RX interrupt (of the 8250 unit), the code checks if the transfer
> has been already started or not via querying the status. So if it
> returns DMA_COMPLETE then a new transfer will be started. If it returns
> DMA_IN_PROGRESS then the code returns doing nothing because the DMA
> engine should start moving data anytime now so the RX interrupt goes
> away.
> 
> That means: If the transfer is canceled then it won't be started again.
> 
> Btw: the current (probably only) dma driver that is used by 8250-dma is
> dw/core.c. That one does cookie complete on terminate:
>  dwc_control(DMA_TERMINATE_ALL) -> dwc_descriptor_complete() ->
>  dma_cookie_complete().
Yes but would above flow work for you :)
Sebastian Andrzej Siewior Aug. 28, 2014, 9:06 a.m. UTC | #7
On 08/28/2014 09:06 AM, Vinod Koul wrote:
> On Thu, Aug 21, 2014 at 03:09:12PM +0200, Sebastian Andrzej Siewior wrote:
>> On 08/19/2014 05:12 PM, Vinod Koul wrote:
>>>>
>>>>     desc = dmaengine_prep_slave_single(rxchan, …);
>>>>     rx_cookie = dmaengine_submit(desc);
>>>>     dma_async_issue_pending(rxchan);
>>>>
>>>>     ssleep(2);
>>>>     /* Now assume that the transfer did not start */
>>>>     st = dmaengine_tx_status(rxchan, rx_cookie, NULL);
>>>>     /* st is now DMA_IN_PROGRESS as expected */
>>>>
>>>>     dmaengine_terminate_all(rxchan);
>>>>     st = dmaengine_tx_status(rxchan, rx_cookie, NULL);
>>> and here is the culprit. You have terminated the channel. This means that
>>> dmaengine driver is free to clean up all the allocated descriptors on the
>>> channels and do whatever it decides to do with them.
>>
>> descriptors, yes.
> and by that logic when you query the driver would have freed up!

I don't understand. What "would have freed up"?

>>> You have already terminated the channel so what is the point in querying the
>>> status of the cookie, which you shouldn't use anyway after invoking
>>> terminate_all() as its result is not correct.
>>
>> The point is to check (later, after terminate_all()) if there is an
>> outstanding DMA transfer or not _and_ how much data was really
>> transfered. Looking at edma it does not really support the latter if
>> the transfer is already completed. On the plus side the HW does not
>> support partly transfers :)
> well that can be achieved properly and differently!

I am all yours.

> Why don't we pause the channel, get the residue, status and then
> terminate.

This is done that way. For the protocol: pause is not supported by
driver (edma and omap-dma) unless the channel is used in "continuous"
mode. But this does not matter here.
The root problems remains: You get DMA_IN_PROGRESS returned as long as
the cookie has not been completed _even_ after the invocation of
terminate_all(). So that part that knows it terminates the thing for
the first time does everything correct.
The other part that only enqueues a transfer if there is not another
one pending does never do anything.

>> But where is it written that the life time of the cookie is limited?
>> Looking at the "cooking check" code there is no such thing. It is
>> simply compare of completed vs passed number but okay, this is an
>> implementation detail.
>> From [0] it says under "4. Submit the transaction":
>>
>> | This returns a cookie can be used to check the progress of DMA engine
>> | activity via other DMA engine calls not covered in this document.
>>
>> no life time limit mentioned here. Which brings to the question: Why is
>> it okay to use the cookie after the transaction "terminated" normally
>> but not if it has been canceled?
> Due to the special nature of terminate. The point here is that you don't
> terminate a transaction but channel operation

So the channel is terminated and the transaction is still in progress.
Is this what you are telling me? And why should a transaction remain in
progress after the channel has been terminated? I mean once you
terminate a bridge crossing point there are no cars going over that
bridge. It is closed. Their trip has been terminated as well.

The question that remains is which of those cars passed the bridge
before it was terminated. I understand that you want first to pause
that channel looking at the bytes and then terminate. Totally
understand. Otherwise the number of bytes transfered (reside) can no
longer be queried because the driver cleaned up. Fine.

But it should be reported that this transaction is no longer in
movement. It won't happen. Never. And this is what bugs me.

>> And from [0] the API explanation  "4. … dma_async_is_tx_complete()":
> 
> 
>>
>> |Note:
>> |    Not all DMA engine drivers can return reliable information for
>> |    a running DMA channel.  It is recommended that DMA engine users
>> |    pause or stop (via dmaengine_terminate_all) the channel before
>> |    using this API.
>>
>> So the documentation says to use the cookie with
>> dma_async_is_tx_complete() and before doing so it is recommended that
>> the transfer should be paused or stopped. _Exactly_ what is done here.
>>
>>>>     /* st is still DMA_IN_PROGRESS but _I_ expect DMA_COMPLETE because
>>>>      * it has been terminated / canceled
>>>>      */
>>>>
>>>> Both dma driver clean up all / terminate all descriptors as required but
>>>> _none_ of them completes the cookie. As a result dma_cookie_status()
>>>> still thinks that the transfer is in progress.
>>>
>>> Btw how does it matter in the driver here if the transaction completed or
>>> not after terminate_all() was invoked. What is the purpose of querying
>>> status.
>>
>> In the RX interrupt (of the 8250 unit), the code checks if the transfer
>> has been already started or not via querying the status. So if it
>> returns DMA_COMPLETE then a new transfer will be started. If it returns
>> DMA_IN_PROGRESS then the code returns doing nothing because the DMA
>> engine should start moving data anytime now so the RX interrupt goes
>> away.
>>
>> That means: If the transfer is canceled then it won't be started again.
>>
>> Btw: the current (probably only) dma driver that is used by 8250-dma is
>> dw/core.c. That one does cookie complete on terminate:
>>  dwc_control(DMA_TERMINATE_ALL) -> dwc_descriptor_complete() ->
>>  dma_cookie_complete().
> Yes but would above flow work for you :)

No, it wont. I ask you to accept the two patches since now the behavior
of dma_async_is_tx_complete() is different depending on whether a
transaction completed on its own or got terminated. And reporting
in_progress does not reflect the reality. I can even add a patch on top
updating the documentation where it state that you must not trust the
dma_tx_state struct once the request is completed (or aborted).

Or we can trade :) I send you a patch which removes the "cookie
complete" part from the dwc-dma driver with a CC stable tag.
And you explain it to the intel folks that merged that part of the 8250
dma code that rely on the fact (that a aborted transaction reports
complete) that what they do is wrong.

but I think it will end up with this: I will change the 8250-dma code to
keep track whether or not it enqueued a DMA transfer and rely on this
information instead of dma_async_is_tx_complete(). I will update the
documentation stating that dma_async_is_tx_complete() may report bogus
information on completed and terminated transaction.

Sebastian
diff mbox

Patch

diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c
index d08c4de..ff05dd4 100644
--- a/drivers/dma/edma.c
+++ b/drivers/dma/edma.c
@@ -256,6 +256,7 @@  static int edma_terminate_all(struct edma_chan *echan)
 	 * echan->edesc is NULL and exit.)
 	 */
 	if (echan->edesc) {
+		dma_cookie_complete(&echan->edesc->vdesc.tx);
 		echan->edesc = NULL;
 		edma_stop(echan->ch_num);
 	}
@@ -282,7 +283,7 @@  static int edma_slave_config(struct edma_chan *echan,
 static int edma_dma_pause(struct edma_chan *echan)
 {
 	/* Pause/Resume only allowed with cyclic mode */
-	if (!echan->edesc->cyclic)
+	if (!echan->edesc || !echan->edesc->cyclic)
 		return -EINVAL;
 
 	edma_pause(echan->ch_num);