Message ID: Pine.LNX.4.64.1310051925170.8097@axis700.grange (mailing list archive)
State: New, archived
On Sat, Oct 05, 2013 at 07:36:20PM +0200, Guennadi Liakhovetski wrote:
> This patch extends dmaengine documentation to provide more details
> on descriptor prepare stage, transaction completion requirements
> and DMA error processing.
>
> Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
> ---
>
> These extensions reflect my understanding of some aspects of the dmaengine
> API. If it is wrong, I'd be happy if someone could correct me. If or where
> I'm right, I think, those aspects might want an update. Once we understand
> exactly the situation we can think about improvements to the API.
>
>  Documentation/dmaengine.txt |   58 ++++++++++++++++++++++++++++++++++++------
>  1 files changed, 49 insertions(+), 9 deletions(-)
>
> diff --git a/Documentation/dmaengine.txt b/Documentation/dmaengine.txt
> index 879b6e3..21bb72d 100644
> --- a/Documentation/dmaengine.txt
> +++ b/Documentation/dmaengine.txt
> @@ -133,8 +133,17 @@ The slave DMA usage consists of following steps:
>     locks before calling the callback function which may cause a
>     deadlock.
>
> -   Note that callbacks will always be invoked from the DMA
> -   engines tasklet, never from interrupt context.
> +   Note that callbacks will always be invoked from the DMA engine's
> +   interrupt processing bottom half, never from interrupt context.
> +
> +   Note 2:
> +   A DMA transaction descriptor, returned to the user by one of "prep"
> +   methods is considered as belogning to the user until it is submitted
> +   to the dmaengine driver for transfer. However, it is recommended, that
> +   the dmaengine driver keeps references to prepared descriptors too,
> +   because if dmaengine_terminate_all() is called at that time, the driver
> +   will have to recycle all allocated descriptors for the respective
> +   channel.

No. That's quite dangerous. Think about what can happen.
	Thread 1			Thread 2
	Driver calls prepare
					dmaengine_terminate_all()
					dmaengine driver frees prepared descriptor
	driver submits descriptor

You now have a descriptor which has been freed submitted to the DMA engine
queue. This will cause chaos.

> 4. Submit the transaction
>
> @@ -150,15 +159,27 @@ The slave DMA usage consists of following steps:
>     dmaengine_submit() will not start the DMA operation, it merely adds
>     it to the pending queue. For this, see step 5, dma_async_issue_pending.
>
> -5. Issue pending DMA requests and wait for callback notification
> +5. Issue pending DMA requests and (optionally) request callback notification
>
> -   The transactions in the pending queue can be activated by calling the
> -   issue_pending API. If channel is idle then the first transaction in
> -   queue is started and subsequent ones queued up.
> +   Dmaengine drivers accumulate submitted transactions on a pending queue.
> +   After this call all such pending transactions are activated. Transactions,
> +   submitted after this call will be queued again in a deactivated state until
> +   this function is called again. If at the time of this call the channel is
> +   idle then the first transaction in queue is started and subsequent ones are
> +   queued up.
>
> -   On completion of each DMA operation, the next in queue is started and
> -   a tasklet triggered. The tasklet will then call the client driver
> -   completion callback routine for notification, if set.
> +   On completion of each DMA operation, the next active transaction in queue is
> +   started and the ISR bottom half, e.g. a tasklet or a kernel thread is
> +   triggered.

Or a kernel thread? I don't think that's right. It's always been
specified that the callback will happen from tasklet context.

> +   The dmaengine driver also has to check the DMA_CTRL_ACK flag by calling
> +   async_tx_test_ack() on the transaction. If the function returns true, i.e.
> +   if the transaction either has been flagged from the very beginning, or
> +   the user has flagged it later, then the transaction descriptor can be
> +   recycled immediately by the dmaengine driver.

"has to check" I think is wrong. This is optional for slave only drivers,
because most slave stuff doesn't use the ACK stuff - that's more for the
async_tx APIs.

> If the function returns
> +   false, the descriptor cannot be recycled yet and the dmaengine driver has
> +   to keep polling the descriptor until it is acknowledged by the user.
>
> Interface:
>     void dma_async_issue_pending(struct dma_chan *chan);
>
> @@ -171,6 +192,14 @@ Further APIs:
>     discard data in the DMA FIFO which hasn't been fully transferred.
>     No callback functions will be called for any incomplete transfers.
>
> +   Note:
> +   Transactions, aborted by this call are considered as failed. However,
> +   if the user requests their status, using dma_async_is_complete() /
> +   dma_async_is_complete(), as described below, one of DMA_IN_PROGRESS and
> +   DMA_SUCCESS will be returned. So, drivers are advised not to use those
> +   calls on transactions, submitted before a call to
> +   dmaengine_terminate_all().

The last sentence doesn't make sense.
Hi Russell, Thanks for explanations. On Sat, 5 Oct 2013, Russell King - ARM Linux wrote: > On Sat, Oct 05, 2013 at 07:36:20PM +0200, Guennadi Liakhovetski wrote: > > This patch extends dmaengine documentation to provide more details > > on descriptor prepare stage, transaction completion requirements > > and DMA error processing. > > > > Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> > > --- > > > > These extensions reflect my understanding of some aspects of the dmaengine > > API. If it is wrong, I'd be happy if someone could correct me. If or where > > I'm right, I think, those aspects might want an update. Once we understand > > exactly the situation we can think about improvements to the API. > > > > Documentation/dmaengine.txt | 58 ++++++++++++++++++++++++++++++++++++------ > > 1 files changed, 49 insertions(+), 9 deletions(-) > > > > diff --git a/Documentation/dmaengine.txt b/Documentation/dmaengine.txt > > index 879b6e3..21bb72d 100644 > > --- a/Documentation/dmaengine.txt > > +++ b/Documentation/dmaengine.txt > > @@ -133,8 +133,17 @@ The slave DMA usage consists of following steps: > > locks before calling the callback function which may cause a > > deadlock. > > > > - Note that callbacks will always be invoked from the DMA > > - engines tasklet, never from interrupt context. > > + Note that callbacks will always be invoked from the DMA engine's > > + interrupt processing bottom half, never from interrupt context. > > + > > + Note 2: > > + A DMA transaction descriptor, returned to the user by one of "prep" > > + methods is considered as belogning to the user until it is submitted > > + to the dmaengine driver for transfer. However, it is recommended, that > > + the dmaengine driver keeps references to prepared descriptors too, > > + because if dmaengine_terminate_all() is called at that time, the driver > > + will have to recycle all allocated descriptors for the respective > > + channel. > > No. That's quite dangerous. 
think about what can happen.
>
> 	Thread 1			Thread 2
> 	Driver calls prepare
> 					dmaengine_terminate_all()
> 					dmaengine driver frees prepared descriptor
> 	driver submits descriptor
>
> You now have a descriptor which has been freed submitted to the DMA engine
> queue. This will cause chaos.

Yes, I understand this. So, it is intentional, that after a *_prep_* a
descriptor belongs to the user and - if the user fails - it will simply be
leaked. A terminate-all shouldn't recycle them and dmaengine driver
unbinding is impossible at that time, as long as the user hasn't released
the channel. Ok, I can rework the above just to explain this.

> > 4. Submit the transaction
> >
> > @@ -150,15 +159,27 @@ The slave DMA usage consists of following steps:
> >     dmaengine_submit() will not start the DMA operation, it merely adds
> >     it to the pending queue. For this, see step 5, dma_async_issue_pending.
> >
> > -5. Issue pending DMA requests and wait for callback notification
> > +5. Issue pending DMA requests and (optionally) request callback notification
> >
> > -   The transactions in the pending queue can be activated by calling the
> > -   issue_pending API. If channel is idle then the first transaction in
> > -   queue is started and subsequent ones queued up.
> > +   Dmaengine drivers accumulate submitted transactions on a pending queue.
> > +   After this call all such pending transactions are activated. Transactions,
> > +   submitted after this call will be queued again in a deactivated state until
> > +   this function is called again. If at the time of this call the channel is
> > +   idle then the first transaction in queue is started and subsequent ones are
> > +   queued up.
> >
> > -   On completion of each DMA operation, the next in queue is started and
> > -   a tasklet triggered. The tasklet will then call the client driver
> > -   completion callback routine for notification, if set.
> > + On completion of each DMA operation, the next active transaction in queue is > > + started and the ISR bottom half, e.g. a tasklet or a kernel thread is > > + triggered. > > Or a kernel thread? I don't think that's right. It's always been > specified that the callback will happen from tasklet context. Do you see any problems using, say, a threaded interrupt handler, apart from possible performance issues? That seems to be pretty convenient. Otherwise we should really mandate somewhere, that bottom half processing should take place in a tasklet? > > + The dmaengine driver also has to check the DMA_CTRL_ACK flag by calling > > + async_tx_test_ack() on the transaction. If the function returns true, i.e. > > + if the transaction either has been flagged from the very beginning, or > > + the user has flagged it later, then the transaction descriptor can be > > + recycled immediately by the dmaengine driver. > > "has to check" I think is wrong. This is optional for slave only drivers, > because most slave stuff doesn't use the ACK stuff - that's more for the > async_tx APIs. "most," but some do? E.g. the mx3_camera driver. Ok, that one is very tightly bound to the respective dmaengine driver. But if there are other slave drivers, using that, that can run on different platforms and use various dmaengine drivers? > > If the function returns > > + false, the descriptor cannot be recycled yet and the dmaengine driver has > > + to keep polling the descriptor until it is acknowledged by the user. > > > > Interface: > > void dma_async_issue_pending(struct dma_chan *chan); > > @@ -171,6 +192,14 @@ Further APIs: > > discard data in the DMA FIFO which hasn't been fully transferred. > > No callback functions will be called for any incomplete transfers. > > > > + Note: > > + Transactions, aborted by this call are considered as failed. 
However,
> > +   if the user requests their status, using dma_async_is_complete() /
> > +   dma_async_is_complete(), as described below, one of DMA_IN_PROGRESS and
> > +   DMA_SUCCESS will be returned. So, drivers are advised not to use those
> > +   calls on transactions, submitted before a call to
> > +   dmaengine_terminate_all().
>
> The last sentence doesn't make sense.

Right, sorry, there is a typo in the second line (to remind: this note is
for the dmaengine_terminate_all() function):

+   Note:
+   Transactions, aborted by this call are considered as failed. However,
+   if the user requests their status, using dma_async_is_tx_complete() /
+   dma_async_is_complete(), as described below, one of DMA_IN_PROGRESS and
+   DMA_SUCCESS will be returned. So, drivers are advised not to use those
+   calls on transactions, submitted before a call to
+   dmaengine_terminate_all().

What I was trying to say, is that if you submit transactions, then
terminate them, then try to retrieve their status using
dma_async_is_tx_complete() / dma_async_is_complete(), this won't produce
the expected DMA_ERROR result, instead, one of DMA_IN_PROGRESS and
DMA_SUCCESS will be returned, unless the driver implements some
sophisticated .device_tx_status() method.

As explained further in the patch, this is similar and even more damaging
in case of DMA errors. The dmaengine driver has to silently drop queued and
active transactions and users can only use a timeout to verify, whether
transactions succeeded.

Thanks
Guennadi
---
Guennadi Liakhovetski, Ph.D.
Freelance Open-Source Software Developer
http://www.open-technology.de/
On Sat, Oct 05, 2013 at 11:00:45PM +0200, Guennadi Liakhovetski wrote: > On Sat, 5 Oct 2013, Russell King - ARM Linux wrote: > > > On Sat, Oct 05, 2013 at 07:36:20PM +0200, Guennadi Liakhovetski wrote: > > > + A DMA transaction descriptor, returned to the user by one of "prep" > > > + methods is considered as belogning to the user until it is submitted > > > + to the dmaengine driver for transfer. However, it is recommended, that > > > + the dmaengine driver keeps references to prepared descriptors too, > > > + because if dmaengine_terminate_all() is called at that time, the driver > > > + will have to recycle all allocated descriptors for the respective > > > + channel. > > > > No. That's quite dangerous. think about what can happen. > > > > Thread 1 Thread 2 > > Driver calls prepare > > dmaengine_terminate_all() > > dmaengine driver frees prepared descriptor > > driver submits descriptor > > > > You now have a descriptor which has been freed submitted to the DMA engine > > queue. This will cause chaos. > > Yes, I understand this. So, it is intentional, that after a *_prep_* a > descriptor belongs to the user and - if the user fails - it will simply be > leaked. A terminate-all shouldn't recycle them and dmaengine driver > unbinding is impossible at that time, as long as the user hasn't released > the channel. Ok, I can rework the above just to explain this. "the user fails" should be very difficult. One of the requirements here is that any "user" submits the prepared descriptor as soon as possible after preparation. Some DMA engine implementations hold a spinlock between preparation and submission so this is absolutely necessary. As Dan Williams explained to me, the reason for the separate submission step is there to allow the callback information to be set. 
So literally any "user" of this should do this:

	desc = prepare();
	if (!desc)
		goto failed_to_prepare;

	desc->callback = function;
	desc->callback_param = param;
	dmaengine_submit(desc);

There should be very little possibility of the user failing between those
two calls.

> > > -   On completion of each DMA operation, the next in queue is started and
> > > -   a tasklet triggered. The tasklet will then call the client driver
                                                             ^^^^^^^^^^^^^^^^^^^^
> > > -   completion callback routine for notification, if set.
> > > +   On completion of each DMA operation, the next active transaction in queue is
> > > +   started and the ISR bottom half, e.g. a tasklet or a kernel thread is
> > > +   triggered.
> >
> > Or a kernel thread? I don't think that's right. It's always been
> > specified that the callback will happen from tasklet context.
>
> Do you see any problems using, say, a threaded interrupt handler, apart
> from possible performance issues? That seems to be pretty convenient.
> Otherwise we should really mandate somewhere, that bottom half processing
> should take place in a tasklet?

The documentation has always stated that callbacks will be made from
tasklet context. The problem with allowing different contexts from
different drivers is that spinlocking becomes problematical. Remember
that we have _bh() variants which lock against tasklets but allow IRQs.

> > > +   The dmaengine driver also has to check the DMA_CTRL_ACK flag by calling
> > > +   async_tx_test_ack() on the transaction. If the function returns true, i.e.
> > > +   if the transaction either has been flagged from the very beginning, or
> > > +   the user has flagged it later, then the transaction descriptor can be
> > > +   recycled immediately by the dmaengine driver.
> >
> > "has to check" I think is wrong. This is optional for slave only drivers,
> > because most slave stuff doesn't use the ACK stuff - that's more for the
> > async_tx APIs.
>
> "most," but some do? E.g. the mx3_camera driver.
Ok, that one is very > tightly bound to the respective dmaengine driver. But if there are other > slave drivers, using that, that can run on different platforms and use > various dmaengine drivers? I'm not aware of any which actually make use of the ack stuff. Only those which implement the async_tx must implement this because it gets used for chaining and dependency stuff. However, there was a plan to remove this and replace it with proper refcounting. > > > If the function returns > > > + false, the descriptor cannot be recycled yet and the dmaengine driver has > > > + to keep polling the descriptor until it is acknowledged by the user. > > > > > > Interface: > > > void dma_async_issue_pending(struct dma_chan *chan); > > > @@ -171,6 +192,14 @@ Further APIs: > > > discard data in the DMA FIFO which hasn't been fully transferred. > > > No callback functions will be called for any incomplete transfers. > > > > > > + Note: > > > + Transactions, aborted by this call are considered as failed. However, > > > + if the user requests their status, using dma_async_is_complete() / > > > + dma_async_is_complete(), as described below, one of DMA_IN_PROGRESS and > > > + DMA_SUCCESS will be returned. So, drivers are advised not to use those > > > + calls on transactions, submitted before a call to > > > + dmaengine_terminate_all(). > > > > The last sentence doesn't make sense. > > Right, sorry, there is a typo in the second line (to remind: this note is > for the dmaengine_terminate_all() function): > > + Note: > + Transactions, aborted by this call are considered as failed. However, > + if the user requests their status, using dma_async_is_tx_complete() / > + dma_async_is_complete(), as described below, one of DMA_IN_PROGRESS and > + DMA_SUCCESS will be returned. So, drivers are advised not to use those > + calls on transactions, submitted before a call to > + dmaengine_terminate_all(). 
>
> What I was trying to say, is that if you submit transactions, then
> terminate them, then try to retrieve their status using
> dma_async_is_tx_complete() / dma_async_is_complete(), this won't produce
> the expected DMA_ERROR result, instead, one of DMA_IN_PROGRESS and
> DMA_SUCCESS will be returned, unless the driver implements some
> sophisticated .device_tx_status() method.

Remember that dmaengine_terminate_all() is targeted to a specific
channel - it terminates all submitted transactions on that channel
and stops any in-process transaction. It doesn't affect any other
channel on the controller.

In the case of slave DMA users, you have to "own" the channel. This
means it's up to the user to sort out what happens should _they_ call
dmaengine_terminate_all() on it.

For non-slave DMA users, I don't believe that terminating transactions
is permitted - they're expected to complete.
Pls keep Dan cced! On Sun, Oct 06, 2013 at 12:31:37AM +0100, Russell King - ARM Linux wrote: > On Sat, Oct 05, 2013 at 11:00:45PM +0200, Guennadi Liakhovetski wrote: > > On Sat, 5 Oct 2013, Russell King - ARM Linux wrote: > > > > > On Sat, Oct 05, 2013 at 07:36:20PM +0200, Guennadi Liakhovetski wrote: > > > > + A DMA transaction descriptor, returned to the user by one of "prep" > > > > + methods is considered as belogning to the user until it is submitted > > > > + to the dmaengine driver for transfer. However, it is recommended, that > > > > + the dmaengine driver keeps references to prepared descriptors too, > > > > + because if dmaengine_terminate_all() is called at that time, the driver > > > > + will have to recycle all allocated descriptors for the respective > > > > + channel. > > > > > > No. That's quite dangerous. think about what can happen. > > > > > > Thread 1 Thread 2 > > > Driver calls prepare > > > dmaengine_terminate_all() > > > dmaengine driver frees prepared descriptor > > > driver submits descriptor > > > > > > You now have a descriptor which has been freed submitted to the DMA engine > > > queue. This will cause chaos. > > > > Yes, I understand this. So, it is intentional, that after a *_prep_* a > > descriptor belongs to the user and - if the user fails - it will simply be > > leaked. A terminate-all shouldn't recycle them and dmaengine driver > > unbinding is impossible at that time, as long as the user hasn't released > > the channel. Ok, I can rework the above just to explain this. > > "the user fails" should be very difficult. One of the requirements here > is that any "user" submits the prepared descriptor as soon as possible > after preparation. Some DMA engine implementations hold a spinlock > between preparation and submission so this is absolutely necessary. > > As Dan Williams explained to me, the reason for the separate submission > step is there to allow the callback information to be set. 
So literally > any "user" of this should do this: > > desc = prepare(); > if (!desc) > goto failed_to_prepare; > > desc->callback = function; > desc->callback_param = param; > dmaengine_submit(desc); > > There should be very little possiblity of the user failing between those > two calls. > > > > > - On completion of each DMA operation, the next in queue is started and > > > > - a tasklet triggered. The tasklet will then call the client driver > ^^^^^^^^^^^^^^^^^^^^ > > > > - completion callback routine for notification, if set. > > > > + On completion of each DMA operation, the next active transaction in queue is > > > > + started and the ISR bottom half, e.g. a tasklet or a kernel thread is > > > > + triggered. > > > > > > Or a kernel thread? I don't think that's right. It's always been > > > specified that the callback will happen from tasklet context. > > > > Do you see any problems using, say, a threaded interrupt handler, apart > > from possible performance issues? That seems to be pretty convenient. > > Otherwise we should really mandate somewhere, that bottom half processing > > should take place in a tasklet? > > The documentation has always stated that callbacks will be made from > tasklet context. The problem with allowing different contexts from > different drivers is taht spinlocking becomes problematical. Remember > that we have _bh() variants which lock against tasklets but allow IRQs. > > > > > + The dmaengine driver also has to check the DMA_CTRL_ACK flag by calling > > > > + async_tx_test_ack() on the transaction. If the function returns true, i.e. > > > > + if the transaction either has been flagged from the very beginning, or > > > > + the user has flagged it later, then the transaction descriptor can be > > > > + recycled immediately by the dmaengine driver. > > > > > > "has to check" I think is wrong. This is optional for slave only drivers, > > > because most slave stuff doesn't use the ACK stuff - that's more for the > > > async_tx APIs. 
> > > > "most," but some do? E.g. the mx3_camera driver. Ok, that one is very > > tightly bound to the respective dmaengine driver. But if there are other > > slave drivers, using that, that can run on different platforms and use > > various dmaengine drivers? > > I'm not aware of any which actually make use of the ack stuff. Only those > which implement the async_tx must implement this because it gets used for > chaining and dependency stuff. However, there was a plan to remove this > and replace it with proper refcounting. > > > > > If the function returns > > > > + false, the descriptor cannot be recycled yet and the dmaengine driver has > > > > + to keep polling the descriptor until it is acknowledged by the user. > > > > > > > > Interface: > > > > void dma_async_issue_pending(struct dma_chan *chan); > > > > @@ -171,6 +192,14 @@ Further APIs: > > > > discard data in the DMA FIFO which hasn't been fully transferred. > > > > No callback functions will be called for any incomplete transfers. > > > > > > > > + Note: > > > > + Transactions, aborted by this call are considered as failed. However, > > > > + if the user requests their status, using dma_async_is_complete() / > > > > + dma_async_is_complete(), as described below, one of DMA_IN_PROGRESS and > > > > + DMA_SUCCESS will be returned. So, drivers are advised not to use those > > > > + calls on transactions, submitted before a call to > > > > + dmaengine_terminate_all(). > > > > > > The last sentence doesn't make sense. > > > > Right, sorry, there is a typo in the second line (to remind: this note is > > for the dmaengine_terminate_all() function): > > > > + Note: > > + Transactions, aborted by this call are considered as failed. However, > > + if the user requests their status, using dma_async_is_tx_complete() / > > + dma_async_is_complete(), as described below, one of DMA_IN_PROGRESS and > > + DMA_SUCCESS will be returned. 
So, drivers are advised not to use those
> > +   calls on transactions, submitted before a call to
> > +   dmaengine_terminate_all().

If you submit a bunch of descriptors and something in the middle fails, I
see no reason why the status of the first descriptor, which succeeded,
should be returned as an error, even if the user aborted the channel. The
status query happens for a cookie which represents a descriptor. It
doesn't return channel status.

> > What I was trying to say, is that if you submit transactions, then
> > terminate them, then try to retrieve their status using
> > dma_async_is_tx_complete() / dma_async_is_complete(), this won't produce
> > the expected DMA_ERROR result, instead, one of DMA_IN_PROGRESS and
> > DMA_SUCCESS will be returned, unless the driver implements some
> > sophisticated .device_tx_status() method.

Well, the moment you aborted the channel all the previously submitted
descriptors can be freed by the driver, so you can't do anything with
these. In the client driver care should be taken to drop all the
references of descriptors and then terminate the channel.

> Remember that dmaengine_terminate_all() is targetted to a specific
> channel - it terminates all submitted transactions on that channel
> and stops any in-process transaction. It doesn't affect any other
> channel on the controller.
>
> In the case of slave DMA users, you have to "own" the channel. This
> means it's up to the user to sort out what happens should _they_ call
> dmaengine_terminate_all() on it.
>
> For non-slave DMA users, I don't believe that terminating transactions
> is permitted - they're expected to complete.
Hi Russell On Sun, 6 Oct 2013, Russell King - ARM Linux wrote: > On Sat, Oct 05, 2013 at 11:00:45PM +0200, Guennadi Liakhovetski wrote: > > On Sat, 5 Oct 2013, Russell King - ARM Linux wrote: > > > > > On Sat, Oct 05, 2013 at 07:36:20PM +0200, Guennadi Liakhovetski wrote: > > > > + A DMA transaction descriptor, returned to the user by one of "prep" > > > > + methods is considered as belogning to the user until it is submitted > > > > + to the dmaengine driver for transfer. However, it is recommended, that > > > > + the dmaengine driver keeps references to prepared descriptors too, > > > > + because if dmaengine_terminate_all() is called at that time, the driver > > > > + will have to recycle all allocated descriptors for the respective > > > > + channel. > > > > > > No. That's quite dangerous. think about what can happen. > > > > > > Thread 1 Thread 2 > > > Driver calls prepare > > > dmaengine_terminate_all() > > > dmaengine driver frees prepared descriptor > > > driver submits descriptor > > > > > > You now have a descriptor which has been freed submitted to the DMA engine > > > queue. This will cause chaos. > > > > Yes, I understand this. So, it is intentional, that after a *_prep_* a > > descriptor belongs to the user and - if the user fails - it will simply be > > leaked. A terminate-all shouldn't recycle them and dmaengine driver > > unbinding is impossible at that time, as long as the user hasn't released > > the channel. Ok, I can rework the above just to explain this. > > "the user fails" should be very difficult. One of the requirements here > is that any "user" submits the prepared descriptor as soon as possible > after preparation. Some DMA engine implementations hold a spinlock > between preparation and submission so this is absolutely necessary. Ouch... > As Dan Williams explained to me, the reason for the separate submission > step is there to allow the callback information to be set. 
So literally > any "user" of this should do this: > > desc = prepare(); > if (!desc) > goto failed_to_prepare; > > desc->callback = function; > desc->callback_param = param; > dmaengine_submit(desc); > > There should be very little possiblity of the user failing between those > two calls. I see. > > > > - On completion of each DMA operation, the next in queue is started and > > > > - a tasklet triggered. The tasklet will then call the client driver > ^^^^^^^^^^^^^^^^^^^^ > > > > - completion callback routine for notification, if set. > > > > + On completion of each DMA operation, the next active transaction in queue is > > > > + started and the ISR bottom half, e.g. a tasklet or a kernel thread is > > > > + triggered. > > > > > > Or a kernel thread? I don't think that's right. It's always been > > > specified that the callback will happen from tasklet context. > > > > Do you see any problems using, say, a threaded interrupt handler, apart > > from possible performance issues? That seems to be pretty convenient. > > Otherwise we should really mandate somewhere, that bottom half processing > > should take place in a tasklet? > > The documentation has always stated that callbacks will be made from > tasklet context. The problem with allowing different contexts from > different drivers is taht spinlocking becomes problematical. Remember > that we have _bh() variants which lock against tasklets but allow IRQs. Spinlocks are local to dmaengine drivers, and I currently see quite a few of them doing spin_lock_irq(save)() instead of _bh(). Some also take that lock in their ISR, like the pl330 does. The disadvantage of using a threaded IRQ, that I see so far, is that you then can only wake up the thread from the ISR, not from other contexts, but even then it should be possible, at least for some designs. > > > > + The dmaengine driver also has to check the DMA_CTRL_ACK flag by calling > > > > + async_tx_test_ack() on the transaction. If the function returns true, i.e. 
> > > > + if the transaction either has been flagged from the very beginning, or > > > > + the user has flagged it later, then the transaction descriptor can be > > > > + recycled immediately by the dmaengine driver. > > > > > > "has to check" I think is wrong. This is optional for slave only drivers, > > > because most slave stuff doesn't use the ACK stuff - that's more for the > > > async_tx APIs. > > > > "most," but some do? E.g. the mx3_camera driver. Ok, that one is very > > tightly bound to the respective dmaengine driver. But if there are other > > slave drivers, using that, that can run on different platforms and use > > various dmaengine drivers? > > I'm not aware of any which actually make use of the ack stuff. Only those > which implement the async_tx must implement this because it gets used for > chaining and dependency stuff. However, there was a plan to remove this > and replace it with proper refcounting. Ok. > > > > If the function returns > > > > + false, the descriptor cannot be recycled yet and the dmaengine driver has > > > > + to keep polling the descriptor until it is acknowledged by the user. > > > > > > > > Interface: > > > > void dma_async_issue_pending(struct dma_chan *chan); > > > > @@ -171,6 +192,14 @@ Further APIs: > > > > discard data in the DMA FIFO which hasn't been fully transferred. > > > > No callback functions will be called for any incomplete transfers. > > > > > > > > + Note: > > > > + Transactions, aborted by this call are considered as failed. However, > > > > + if the user requests their status, using dma_async_is_complete() / > > > > + dma_async_is_complete(), as described below, one of DMA_IN_PROGRESS and > > > > + DMA_SUCCESS will be returned. So, drivers are advised not to use those > > > > + calls on transactions, submitted before a call to > > > > + dmaengine_terminate_all(). > > > > > > The last sentence doesn't make sense. 
> > > > Right, sorry, there is a typo in the second line (to remind: this note is > > for the dmaengine_terminate_all() function): > > > > + Note: > > + Transactions, aborted by this call are considered as failed. However, > > + if the user requests their status, using dma_async_is_tx_complete() / > > + dma_async_is_complete(), as described below, one of DMA_IN_PROGRESS and > > + DMA_SUCCESS will be returned. So, drivers are advised not to use those > > + calls on transactions, submitted before a call to > > + dmaengine_terminate_all(). > > > > What I was trying to say, is that if you submit transactions, then > > terminate them, then try to retrieve their status using > > dma_async_is_tx_complete() / dma_async_is_complete(), this won't produce > > the expected DMA_ERROR result, instead, one of DMA_IN_PROGRESS and > > DMA_SUCCESS will be returned, unless the driver implements some > > sophisticated .device_tx_status() method. > > Remember that dmaengine_terminate_all() is targetted to a specific > channel - it terminates all submitted transactions on that channel > and stops any in-process transaction. It doesn't affect any other > channel on the controller. > > In the case of slave DMA users, you have to "own" the channel. This > means it's up to the user to sort out what happens should _they_ call > dmaengine_terminate_all() on it. > > For non-slave DMA users, I don't believe that terminating transactions > is permitted - they're expected to complete. Ok, but the DMA error case remains, right? I.e. in case the dmaengine driver had to drop transactions due to a hardware problem, the client driver has no way to retrieve their status? Thanks Guennadi --- Guennadi Liakhovetski, Ph.D. Freelance Open-Source Software Developer http://www.open-technology.de/
On Mon, Oct 07, 2013 at 12:17:28PM +0100, Russell King - ARM Linux wrote: > On Mon, Oct 07, 2013 at 12:45:33PM +0200, Guennadi Liakhovetski wrote: > > No, not something in the middle. I was thinking about > > > > (1) cookie 1-3 are submitted > > (2) cookie 1 succeeds > > (3) a DMA error occurs, cookies 2-3 are discarded Discarded using terminate_all, right? > > (4) cookie 4 is submitted and succeeds > > (5) the user requests status of cookie 2 and gets success back, AFAICS No, you can't! You can only request the status of 4. > > This is a side effect of using a sequential cookie based system to indicate > whether a descriptor has succeeded or not. In the above case, since the user calls terminate_all, we can add a rule that terminate resets the cookie counter. Then, after terminate_all, the cookies are sequentially valid again! Anyway, as pointed out above, the user shouldn't check for 2; they should have dropped all references up to 3 before calling terminate. > What may be better is to change the wording here: not DMA_SUCCESS but > DMA_COMPLETED. That doesn't imply that it has been successful, merely > that the DMA engine has finished with the transaction. Agreed that it's an indication not of success but of DMA completion. I have seen cases where the slave peripheral got stuck while draining the last FIFO, but since the DMA had finished transferring into the FIFO it reported completion. Dan, do you agree? > How a transaction gets to notify that it's been in error is a problem - > there is no mechanism to do that (part of that is because DMA engine slave > support grew out of the async_tx stuff which doesn't really have the > notion of failure.) > > We can't fetch the descriptor after it has been completed, because it will > have been freed or recycled depending on the implementation.
> > If we want to have some way to report errors against descriptors, I think > we should augment the descriptor with another callback which gets called > (again from tasklet context) so that drivers which care about this can be > notified of errors. In case the DMA engine gets an error interrupt we can do so, but the same can be achieved with the current callback by adding a status bit. In more practical scenarios the DMA is simply stuck: the FIFO is not moving, the peripheral mapping is wrong, or there is some bug. In these cases the dma driver cannot detect that it is stuck. So the recommendation would be for the client to set a timeout and, once it expires, assume the transfers have not completed
On Mon, Oct 07, 2013 at 12:45:33PM +0200, Guennadi Liakhovetski wrote: > On Sun, 6 Oct 2013, Vinod Koul wrote: > > > > Do you see any problems using, say, a threaded interrupt handler, apart > > > > from possible performance issues? That seems to be pretty convenient. > > > > Otherwise we should really mandate somewhere, that bottom half processing > > > > should take place in a tasklet? > > > > > > The documentation has always stated that callbacks will be made from > > > tasklet context. The problem with allowing different contexts from > > > different drivers is taht spinlocking becomes problematical. Remember > > > that we have _bh() variants which lock against tasklets but allow IRQs. > > Vinod, Dan, what do you think about the bottom half interrupt processing? > Do we want to make the use of a tasklet compulsory or can we also allow > other contexts? I don't see any advantage of using a threaded handler compared to a tasklet. As Russell pointed out, it is going to make locking and common handling very complicated if callbacks can be invoked from different contexts. What would the benefit of this be? -- ~Vinod
On Sun, 6 Oct 2013, Vinod Koul wrote: > Pls keep Dan cced! Sure, sorry. > On Sun, Oct 06, 2013 at 12:31:37AM +0100, Russell King - ARM Linux wrote: > > On Sat, Oct 05, 2013 at 11:00:45PM +0200, Guennadi Liakhovetski wrote: > > > On Sat, 5 Oct 2013, Russell King - ARM Linux wrote: > > > > > > > On Sat, Oct 05, 2013 at 07:36:20PM +0200, Guennadi Liakhovetski wrote: > > > > > + A DMA transaction descriptor, returned to the user by one of "prep" > > > > > + methods is considered as belogning to the user until it is submitted > > > > > + to the dmaengine driver for transfer. However, it is recommended, that > > > > > + the dmaengine driver keeps references to prepared descriptors too, > > > > > + because if dmaengine_terminate_all() is called at that time, the driver > > > > > + will have to recycle all allocated descriptors for the respective > > > > > + channel. > > > > > > > > No. That's quite dangerous. think about what can happen. > > > > > > > > Thread 1 Thread 2 > > > > Driver calls prepare > > > > dmaengine_terminate_all() > > > > dmaengine driver frees prepared descriptor > > > > driver submits descriptor > > > > > > > > You now have a descriptor which has been freed submitted to the DMA engine > > > > queue. This will cause chaos. > > > > > > Yes, I understand this. So, it is intentional, that after a *_prep_* a > > > descriptor belongs to the user and - if the user fails - it will simply be > > > leaked. A terminate-all shouldn't recycle them and dmaengine driver > > > unbinding is impossible at that time, as long as the user hasn't released > > > the channel. Ok, I can rework the above just to explain this. > > > > "the user fails" should be very difficult. One of the requirements here > > is that any "user" submits the prepared descriptor as soon as possible > > after preparation. Some DMA engine implementations hold a spinlock > > between preparation and submission so this is absolutely necessary. 
> > > > As Dan Williams explained to me, the reason for the separate submission > > step is there to allow the callback information to be set. So literally > > any "user" of this should do this: > > > > desc = prepare(); > > if (!desc) > > goto failed_to_prepare; > > > > desc->callback = function; > > desc->callback_param = param; > > dmaengine_submit(desc); > > > > There should be very little possiblity of the user failing between those > > two calls. > > > > > > > - On completion of each DMA operation, the next in queue is started and > > > > > - a tasklet triggered. The tasklet will then call the client driver > > ^^^^^^^^^^^^^^^^^^^^ > > > > > - completion callback routine for notification, if set. > > > > > + On completion of each DMA operation, the next active transaction in queue is > > > > > + started and the ISR bottom half, e.g. a tasklet or a kernel thread is > > > > > + triggered. > > > > > > > > Or a kernel thread? I don't think that's right. It's always been > > > > specified that the callback will happen from tasklet context. > > > > > > Do you see any problems using, say, a threaded interrupt handler, apart > > > from possible performance issues? That seems to be pretty convenient. > > > Otherwise we should really mandate somewhere, that bottom half processing > > > should take place in a tasklet? > > > > The documentation has always stated that callbacks will be made from > > tasklet context. The problem with allowing different contexts from > > different drivers is taht spinlocking becomes problematical. Remember > > that we have _bh() variants which lock against tasklets but allow IRQs. Vinod, Dan, what do you think about the bottom half interrupt processing? Do we want to make the use of a tasklet compulsory or can we also allow other contexts? > > > > > + The dmaengine driver also has to check the DMA_CTRL_ACK flag by calling > > > > > + async_tx_test_ack() on the transaction. If the function returns true, i.e. 
> > > > > + if the transaction either has been flagged from the very beginning, or > > > > > + the user has flagged it later, then the transaction descriptor can be > > > > > + recycled immediately by the dmaengine driver. > > > > > > > > "has to check" I think is wrong. This is optional for slave only drivers, > > > > because most slave stuff doesn't use the ACK stuff - that's more for the > > > > async_tx APIs. > > > > > > "most," but some do? E.g. the mx3_camera driver. Ok, that one is very > > > tightly bound to the respective dmaengine driver. But if there are other > > > slave drivers, using that, that can run on different platforms and use > > > various dmaengine drivers? > > > > I'm not aware of any which actually make use of the ack stuff. Only those > > which implement the async_tx must implement this because it gets used for > > chaining and dependency stuff. However, there was a plan to remove this > > and replace it with proper refcounting. > > > > > > > If the function returns > > > > > + false, the descriptor cannot be recycled yet and the dmaengine driver has > > > > > + to keep polling the descriptor until it is acknowledged by the user. > > > > > > > > > > Interface: > > > > > void dma_async_issue_pending(struct dma_chan *chan); > > > > > @@ -171,6 +192,14 @@ Further APIs: > > > > > discard data in the DMA FIFO which hasn't been fully transferred. > > > > > No callback functions will be called for any incomplete transfers. > > > > > > > > > > + Note: > > > > > + Transactions, aborted by this call are considered as failed. However, > > > > > + if the user requests their status, using dma_async_is_complete() / > > > > > + dma_async_is_complete(), as described below, one of DMA_IN_PROGRESS and > > > > > + DMA_SUCCESS will be returned. So, drivers are advised not to use those > > > > > + calls on transactions, submitted before a call to > > > > > + dmaengine_terminate_all(). > > > > > > > > The last sentence doesn't make sense. 
> > > > > > Right, sorry, there is a typo in the second line (to remind: this note is > > > for the dmaengine_terminate_all() function): > > > > > > + Note: > > > + Transactions, aborted by this call are considered as failed. However, > > > + if the user requests their status, using dma_async_is_tx_complete() / > > > + dma_async_is_complete(), as described below, one of DMA_IN_PROGRESS and > > > + DMA_SUCCESS will be returned. So, drivers are advised not to use those > > > + calls on transactions, submitted before a call to > > > + dmaengine_terminate_all(). > If you submit a bunch of descriptors then I see no reason when something in > middle failed, the status for first descriptor which succedded is returned as an > error even if user aborted the channel. The status query happens for a cookie > which represents a descriptor. It doesnt return channel status. No, not something in the middle. I was thinking about (1) cookie 1-3 are submitted (2) cookie 1 succeeds (3) a DMA error occurs, cookies 2-3 are discarded (4) cookie 4 is submitted and succeeds (5) the user requests status of cookie 2 and gets success back, AFAICS > > > What I was trying to say, is that if you submit transactions, then > > > terminate them, then try to retrieve their status using > > > dma_async_is_tx_complete() / dma_async_is_complete(), this won't produce > > > the expected DMA_ERROR result, instead, one of DMA_IN_PROGRESS and > > > DMA_SUCCESS will be returned, unless the driver implements some > > > sophisticated .device_tx_status() method. > Well the moment you aborted the channel all the previously submitted descriptors > can be freed by the driver. So you cant do anything with these. In the client > driver care should be taken to drop all the references of descriptors and then > terminate the channel. > > > Remember that dmaengine_terminate_all() is targetted to a specific > > channel - it terminates all submitted transactions on that channel > > and stops any in-process transaction. 
It doesn't affect any other > > channel on the controller. > > > > In the case of slave DMA users, you have to "own" the channel. This > > means it's up to the user to sort out what happens should _they_ call > > dmaengine_terminate_all() on it. > > > > For non-slave DMA users, I don't believe that terminating transactions > > is permitted - they're expected to complete. > > -- > ~Vinod Thanks Guennadi --- Guennadi Liakhovetski, Ph.D. Freelance Open-Source Software Developer http://www.open-technology.de/
On Mon, Oct 07, 2013 at 12:45:33PM +0200, Guennadi Liakhovetski wrote: > No, not something in the middle. I was thinking about > > (1) cookie 1-3 are submitted > (2) cookie 1 succeeds > (3) a DMA error occurs, cookies 2-3 are discarded > (4) cookie 4 is submitted and succeeds > (5) the user requests status of cookie 2 and gets success back, AFAICS This is a side effect of using a sequential cookie based system to indicate whether a descriptor has succeeded or not. What may be better is to change the wording here: not DMA_SUCCESS but DMA_COMPLETED. That doesn't imply that it has been successful, merely that the DMA engine has finished with the transaction. How a transaction gets to notify that it's been in error is a problem - there is no mechanism to do that (part of that is because DMA engine slave support grew out of the async_tx stuff which doesn't really have the notion of failure.) We can't fetch the descriptor after it has been completed, because it will have been freed or recycled depending on the implementation. If we want to have some way to report errors against descriptors, I think we should augment the descriptor with another callback which gets called (again from tasklet context) so that drivers which care about this can be notified of errors.
On Mon, 7 Oct 2013, Vinod Koul wrote: > On Mon, Oct 07, 2013 at 12:17:28PM +0100, Russell King - ARM Linux wrote: > > On Mon, Oct 07, 2013 at 12:45:33PM +0200, Guennadi Liakhovetski wrote: > > > No, not something in the middle. I was thinking about > > > > > > (1) cookie 1-3 are submitted > > > (2) cookie 1 succeeds > > > (3) a DMA error occurs, cookies 2-3 are discarded > discarded using terminate_all right? No, by the dmaengine driver as a part of the error processing. > > > (4) cookie 4 is submitted and succeeds > > > (5) the user requests status of cookie 2 and gets success back, AFAICS > No you cant!, you can only request for 4.. > > > > This is a side effect of using a sequential cookie based system to indicate > > whether a descriptor has succeeded or not. > In the above case, since user calls terminate_all we can put a rule that > terminate should reset the cookie counter. > So after the terminate_all the cookies are sequentially valid! Aren't all cookies "valid" at any time? I mean, even after such a reset, say, currently being at cookie #10, if the user asks for a status of cookie #1000, it will be returned as DMA_SUCCESS, won't it? > Anyways as pointed above user shouldnt check for 2, he should have removed all > refs till 3 before calling terminate. > > > What may be better is to change the wording here: not DMA_SUCCESS but > > DMA_COMPLETED. That doesn't imply that it has been successful, merely > > that the DMA engine has finished with the transaction. > Agreed that its not indication of success but of DMA completetion. I have seen > cases where slave perhiphral got stuck while sending last FIFO but since DMA > finished transferiing to FIFO it says complete. > > Dan do you agree? > > > How a transaction gets to notify that it's been in error is a problem - > > there is no mechanism to do that (part of that is because DMA engine slave > > support grew out of the async_tx stuff which doesn't really have the > > notion of failure.) 
> > > > We can't fetch the descriptor after it has been completed, because it will > > have been freed or recycled depending on the implementation. > > > > If we want to have some way to report errors against descriptors, I think > > we should augment the descriptor with another callback which gets called > > (again from tasklet context) so that drivers which care about this can be > > notified of errors. > In case DMA engine gets an error interrupt we can do so, but same can be done > with current callback by adding status bit. In more pratical scenarios its that > DMA is stuck due to FIFO not moving, or wrong periphral mapping or some bug. In > these cases dma driver cannot find it is stuck. So recommendation would be for > client to put a timeout and after timeout assume the transfers are not completed Yes, a timeout seems to be the only way so far for clients to identify a failure. Thanks Guennadi --- Guennadi Liakhovetski, Ph.D. Freelance Open-Source Software Developer http://www.open-technology.de/
On Mon, 7 Oct 2013, Vinod Koul wrote: > On Mon, Oct 07, 2013 at 12:45:33PM +0200, Guennadi Liakhovetski wrote: > > On Sun, 6 Oct 2013, Vinod Koul wrote: > > > > > Do you see any problems using, say, a threaded interrupt handler, apart > > > > > from possible performance issues? That seems to be pretty convenient. > > > > > Otherwise we should really mandate somewhere, that bottom half processing > > > > > should take place in a tasklet? > > > > > > > > The documentation has always stated that callbacks will be made from > > > > tasklet context. The problem with allowing different contexts from > > > > different drivers is taht spinlocking becomes problematical. Remember > > > > that we have _bh() variants which lock against tasklets but allow IRQs. > > > > Vinod, Dan, what do you think about the bottom half interrupt processing? > > Do we want to make the use of a tasklet compulsory or can we also allow > > other contexts? > I dont see any advantage of using threaded handler as compared to tasklet, As > Russell pointed out its going to make locking and common handling very > complicated to be invoked from different contexts. What will be benefit from > this? I think using a (devm_)request_threaded_irq() and returning IRQ_WAKE_THREAD in the ISR is prettier and easier than maintaining an additional tasklet. BTW, shdma-base.c already does that. Thanks Guennadi --- Guennadi Liakhovetski, Ph.D. Freelance Open-Source Software Developer http://www.open-technology.de/
On Mon, Oct 07, 2013 at 02:15:22PM +0200, Guennadi Liakhovetski wrote: > On Mon, 7 Oct 2013, Vinod Koul wrote: > > > On Mon, Oct 07, 2013 at 12:17:28PM +0100, Russell King - ARM Linux wrote: > > > On Mon, Oct 07, 2013 at 12:45:33PM +0200, Guennadi Liakhovetski wrote: > > > > No, not something in the middle. I was thinking about > > > > > > > > (1) cookie 1-3 are submitted > > > > (2) cookie 1 succeeds > > > > (3) a DMA error occurs, cookies 2-3 are discarded > > discarded using terminate_all right? > > No, by the dmaengine driver as a part of the error processing. And how will that be done...? -- ~Vinod
On Mon, Oct 07, 2013 at 05:28:37PM +0200, Guennadi Liakhovetski wrote: > > > > > > No, not something in the middle. I was thinking about > > > > > > > > > > > > (1) cookie 1-3 are submitted > > > > > > (2) cookie 1 succeeds > > > > > > (3) a DMA error occurs, cookies 2-3 are discarded > > > > discarded using terminate_all right? > > > > > > No, by the dmaengine driver as a part of the error processing. > > And how will that be done...? > > Sorry, I meant - DMA descriptors with cookies #2 and #3 will be cancelled > and recycled by the dmaengine driver. That's what you have to do, when > processing DMA error IRQ. Well, how do you do that? -- ~Vinod
On Mon, 7 Oct 2013, Vinod Koul wrote: > On Mon, Oct 07, 2013 at 02:15:22PM +0200, Guennadi Liakhovetski wrote: > > On Mon, 7 Oct 2013, Vinod Koul wrote: > > > > > On Mon, Oct 07, 2013 at 12:17:28PM +0100, Russell King - ARM Linux wrote: > > > > On Mon, Oct 07, 2013 at 12:45:33PM +0200, Guennadi Liakhovetski wrote: > > > > > No, not something in the middle. I was thinking about > > > > > > > > > > (1) cookie 1-3 are submitted > > > > > (2) cookie 1 succeeds > > > > > (3) a DMA error occurs, cookies 2-3 are discarded > > > discarded using terminate_all right? > > > > No, by the dmaengine driver as a part of the error processing. > And how will that be done...? Sorry, I meant - DMA descriptors with cookies #2 and #3 will be cancelled and recycled by the dmaengine driver. That's what you have to do, when processing DMA error IRQ. Thanks Guennadi --- Guennadi Liakhovetski, Ph.D. Freelance Open-Source Software Developer http://www.open-technology.de/
On Mon, 7 Oct 2013, Vinod Koul wrote: > On Mon, Oct 07, 2013 at 05:28:37PM +0200, Guennadi Liakhovetski wrote: > > > > > > > No, not something in the middle. I was thinking about > > > > > > > > > > > > > > (1) cookie 1-3 are submitted > > > > > > > (2) cookie 1 succeeds > > > > > > > (3) a DMA error occurs, cookies 2-3 are discarded > > > > > discarded using terminate_all right? > > > > > > > > No, by the dmaengine driver as a part of the error processing. > > > And how will that be done...? > > > > Sorry, I meant - DMA descriptors with cookies #2 and #3 will be cancelled > > and recycled by the dmaengine driver. That's what you have to do, when > > processing DMA error IRQ. > Well how do you that? Mmmh, maybe I'm missing something, but isn't it a part of the common DMA processing? You get an error IRQ; on some DMAC types this means that you have to reset the hardware, so you perform whatever actions are needed to reset the controller, remove any descriptors from the pending queue, reinsert them into the free queue, and let any clients run into their timeouts. I don't think it would be a good idea to do anything smarter, like trying to restart the current transfer or dropping it and continuing with the queue, because we don't know in what state the client hardware is, so we can only let the client driver try to recover. Thanks Guennadi --- Guennadi Liakhovetski, Ph.D. Freelance Open-Source Software Developer http://www.open-technology.de/
On Mon, Oct 07, 2013 at 05:28:37PM +0200, Guennadi Liakhovetski wrote: > > > > > > No, not something in the middle. I was thinking about > > > > > > > > > > > > (1) cookie 1-3 are submitted > > > > > > (2) cookie 1 succeeds > > > > > > (3) a DMA error occurs, cookies 2-3 are discarded > > > > discarded using terminate_all right? > > > > > > No, by the dmaengine driver as a part of the error processing. > > And how will that be done...? > > Sorry, I meant - DMA descriptors with cookies #2 and #3 will be cancelled > and recycled by the dmaengine driver. That's what you have to do, when > processing DMA error IRQ. Well, the term "cancel" means someone went ahead and requested abort/termination of a transaction... As we discussed in another mail on this thread, the most common occurrence will be a timeout on the client side, and the client can't cancel selectively. As for the dma driver detecting an error, which is quite rare in slave usage: if people think it's common for them, we can indicate this through a status in the callback. Then the client will know and doesn't need to query. Cookies have a large space, so I don't think we will get confused by a recycled one -- ~Vinod
On Mon, Oct 07, 2013 at 05:45:22PM +0200, Guennadi Liakhovetski wrote: > On Mon, 7 Oct 2013, Vinod Koul wrote: > > > On Mon, Oct 07, 2013 at 05:28:37PM +0200, Guennadi Liakhovetski wrote: > > > > > > > > No, not something in the middle. I was thinking about > > > > > > > > > > > > > > > > (1) cookie 1-3 are submitted > > > > > > > > (2) cookie 1 succeeds > > > > > > > > (3) a DMA error occurs, cookies 2-3 are discarded > > > > > > discarded using terminate_all right? > > > > > > > > > > No, by the dmaengine driver as a part of the error processing. > > > > And how will that be done...? > > > > > > Sorry, I meant - DMA descriptors with cookies #2 and #3 will be cancelled > > > and recycled by the dmaengine driver. That's what you have to do, when > > > processing DMA error IRQ. > > Well how do you that? > > Mmmh, maybe I'm missing something, but isn't it a part of the common DMA > processing? You get an error IRQ; on some DMAC types this means, that you > have to reset the hardware, so, you perform whatever actions you have to > do to reset the controller; you remove any descriptors from the pending > queue; reinsert them into the free queue and let any clients run on a > timeout. I don't think it would be a good idea to do anything more smart > like trying to restart the current transfer or drop it and continue with > the queue, because we don't know in what state the client hardware is, so, > we can only let the client driver try to recover. No, that would be a very wrong thing to do behind the client's back. Suppose you got a transaction which returned an error irq, and the irq was generated when half of the requested transfer was already done. Redoing the entire transaction won't be right! So I think you need to let the client know the error status. But again, is this usage fairly common? -- ~Vinod
On Mon, 7 Oct 2013, Vinod Koul wrote: > On Mon, Oct 07, 2013 at 05:28:37PM +0200, Guennadi Liakhovetski wrote: > > > > > > > No, not something in the middle. I was thinking about > > > > > > > > > > > > > > (1) cookie 1-3 are submitted > > > > > > > (2) cookie 1 succeeds > > > > > > > (3) a DMA error occurs, cookies 2-3 are discarded > > > > > discarded using terminate_all right? > > > > > > > > No, by the dmaengine driver as a part of the error processing. > > > And how will that be done...? > > > > Sorry, I meant - DMA descriptors with cookies #2 and #3 will be cancelled > > and recycled by the dmaengine driver. That's what you have to do, when > > processing DMA error IRQ. > well the term cancel means someone went ahead and requested abort/terminate of a > transaction... > > As we discussed in other mail on this thread the most common occurrence will be > timeout on client side and client cant cancel selectively. > > For dma driver detection error, which is quite rare in slave usages, if people > think its common for them we cna indicate this thru status in callback. Then > client will know and doesnt need to query. > > Recycling cookie has a large space so i dont think we will get confused > with a recycled one If you reset cookies, as you proposed, IMHO, this can very well lead to a confusion: suppose cookie #1000 caused an error. We drop queued transfers with cookies #1001-1005 and reset to #1. When we're at #100 the user asks for a status of cookie #1003, which we have aborted. However, the following check in dma_async_is_complete(): if (last_complete <= last_used) { if ((cookie <= last_complete) || (cookie > last_used)) return DMA_SUCCESS; will confirm success to the user. Thanks Guennadi --- Guennadi Liakhovetski, Ph.D. Freelance Open-Source Software Developer http://www.open-technology.de/
On Mon, 7 Oct 2013, Vinod Koul wrote: > On Mon, Oct 07, 2013 at 05:45:22PM +0200, Guennadi Liakhovetski wrote: > > On Mon, 7 Oct 2013, Vinod Koul wrote: > > > > > On Mon, Oct 07, 2013 at 05:28:37PM +0200, Guennadi Liakhovetski wrote: > > > > > > > > > No, not something in the middle. I was thinking about > > > > > > > > > > > > > > > > > > (1) cookie 1-3 are submitted > > > > > > > > > (2) cookie 1 succeeds > > > > > > > > > (3) a DMA error occurs, cookies 2-3 are discarded > > > > > > > discarded using terminate_all right? > > > > > > > > > > > > No, by the dmaengine driver as a part of the error processing. > > > > > And how will that be done...? > > > > > > > > Sorry, I meant - DMA descriptors with cookies #2 and #3 will be cancelled > > > > and recycled by the dmaengine driver. That's what you have to do, when > > > > processing DMA error IRQ. > > > Well how do you that? > > > > Mmmh, maybe I'm missing something, but isn't it a part of the common DMA > > processing? You get an error IRQ; on some DMAC types this means, that you > > have to reset the hardware, so, you perform whatever actions you have to > > do to reset the controller; you remove any descriptors from the pending > > queue; reinsert them into the free queue and let any clients run on a > > timeout. I don't think it would be a good idea to do anything more smart > > like trying to restart the current transfer or drop it and continue with > > the queue, because we don't know in what state the client hardware is, so, > > we can only let the client driver try to recover. > No that would be very wrong thing to do behind clients back. Suppose you got a > trasaction which returned error irq and it was generated one half of the > requested transfer was done. Redoing the entrie transaction wont be right! > > So I think you need to let client know the error status. > > But again, is this usage fiarly common? Hm, I think, the question is different: is this possible and realistic? 
If there's just one DMAC and one platform, I think there should be a way to support it. There are controllers that actually have separate error IRQ outputs and special status bits for them. Actually, see this commit: commit 47a4dc26eeb89a3746f9b1e2092602b40469640a Author: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Date: Thu Feb 11 16:50:05 2010 +0000 dmaengine: shdma: fix DMA error handling. Which means there are indeed real-life situations in which such error IRQs trigger. Thanks Guennadi --- Guennadi Liakhovetski, Ph.D. Freelance Open-Source Software Developer http://www.open-technology.de/
On Mon, Oct 07, 2013 at 10:55:56PM +0200, Guennadi Liakhovetski wrote: > > > > > > > > > > No, not something in the middle. I was thinking about > > > > > > > > > > > > > > > > > > > > (1) cookie 1-3 are submitted > > > > > > > > > > (2) cookie 1 succeeds > > > > > > > > > > (3) a DMA error occurs, cookies 2-3 are discarded > > > > > > > > discarded using terminate_all right? > > > > > > > > > > > > > > No, by the dmaengine driver as a part of the error processing. > > > > > > And how will that be done...? > > > > > > > > > > Sorry, I meant - DMA descriptors with cookies #2 and #3 will be cancelled > > > > > and recycled by the dmaengine driver. That's what you have to do, when > > > > > processing DMA error IRQ. > > > > Well how do you that? > > > > > > Mmmh, maybe I'm missing something, but isn't it a part of the common DMA > > > processing? You get an error IRQ; on some DMAC types this means, that you > > > have to reset the hardware, so, you perform whatever actions you have to > > > do to reset the controller; you remove any descriptors from the pending > > > queue; reinsert them into the free queue and let any clients run on a > > > timeout. I don't think it would be a good idea to do anything more smart > > > like trying to restart the current transfer or drop it and continue with > > > the queue, because we don't know in what state the client hardware is, so, > > > we can only let the client driver try to recover. > > No that would be very wrong thing to do behind clients back. Suppose you got a > > trasaction which returned error irq and it was generated one half of the > > requested transfer was done. Redoing the entrie transaction wont be right! > > > > So I think you need to let client know the error status. > > > > But again, is this usage fiarly common? > > Hm, I think, the question is different: is this possible and realistic? If > there's just one DMAC and one platform, I think, there should be a way to > support it? 
There are controllers, that actually have separate error IRQ > outputs and special status bits for them. Actually, see commit Fair enough, if this is something you commonly need to deal with then let's add a new callback for error reporting. I think this would be simpler, and then the client will know about the failure. On failure reporting, the client should query the status to know what is pending for this transaction, then recover accordingly. Will send a patch for this, and for moving to DMA_COMPLETE as status. -- ~Vinod
On Mon, Oct 07, 2013 at 10:43:51PM +0200, Guennadi Liakhovetski wrote: > On Mon, 7 Oct 2013, Vinod Koul wrote: > > > On Mon, Oct 07, 2013 at 05:28:37PM +0200, Guennadi Liakhovetski wrote: > > > > > > > > No, not something in the middle. I was thinking about > > > > > > > > > > > > > > > > (1) cookie 1-3 are submitted > > > > > > > > (2) cookie 1 succeeds > > > > > > > > (3) a DMA error occurs, cookies 2-3 are discarded > > > > > > discarded using terminate_all right? > > > > > > > > > > No, by the dmaengine driver as a part of the error processing. > > > > And how will that be done...? > > > > > > Sorry, I meant - DMA descriptors with cookies #2 and #3 will be cancelled > > > and recycled by the dmaengine driver. That's what you have to do, when > > > processing DMA error IRQ. > > well the term cancel means someone went ahead and requested abort/terminate of a > > transaction... > > > > As we discussed in other mail on this thread the most common occurrence will be > > timeout on client side and client cant cancel selectively. > > > > For dma driver detection error, which is quite rare in slave usages, if people > > think its common for them we cna indicate this thru status in callback. Then > > client will know and doesnt need to query. > > > > Recycling cookie has a large space so i dont think we will get confused > > with a recycled one > > If you reset cookies, as you proposed, IMHO, this can very well lead to a > confusion: suppose cookie #1000 caused an error. We drop queued transfers > with cookies #1001-1005 and reset to #1. When we're at #100 the user asks > for a status of cookie #1003, which we have aborted. However, the > following check in dma_async_is_complete(): Well, you are missing an important point: who is doing the abort? It is the client driver, the user. And as I said previously, before calling terminate_all he will forget about the cookies allocated previously, so after that he can't invoke any of the old cookies.
If he is, then that is wrong and needs to be fixed :) So while the user is dealing with new transactions (cookies), he doesn't know anything about old ones and can't query as in the above example, 1003! > > if (last_complete <= last_used) { > if ((cookie <= last_complete) || (cookie > last_used)) > return DMA_SUCCESS; > > will confirm success to the user. Not in the above example! If you are at #100, a query for #1003 will return DMA_IN_PROGRESS. -- ~Vinod
On Tue, 8 Oct 2013, Vinod Koul wrote: > On Mon, Oct 07, 2013 at 10:43:51PM +0200, Guennadi Liakhovetski wrote: > > On Mon, 7 Oct 2013, Vinod Koul wrote: > > > > > On Mon, Oct 07, 2013 at 05:28:37PM +0200, Guennadi Liakhovetski wrote: > > > > > > > > > No, not something in the middle. I was thinking about > > > > > > > > > > > > > > > > > > (1) cookie 1-3 are submitted > > > > > > > > > (2) cookie 1 succeeds > > > > > > > > > (3) a DMA error occurs, cookies 2-3 are discarded > > > > > > > discarded using terminate_all right? > > > > > > > > > > > > No, by the dmaengine driver as a part of the error processing. > > > > > And how will that be done...? > > > > > > > > Sorry, I meant - DMA descriptors with cookies #2 and #3 will be cancelled > > > > and recycled by the dmaengine driver. That's what you have to do, when > > > > processing DMA error IRQ. > > > well the term cancel means someone went ahead and requested abort/terminate of a > > > transaction... > > > > > > As we discussed in other mail on this thread the most common occurrence will be > > > timeout on client side and client cant cancel selectively. > > > > > > For dma driver detection error, which is quite rare in slave usages, if people > > > think its common for them we cna indicate this thru status in callback. Then > > > client will know and doesnt need to query. > > > > > > Recycling cookie has a large space so i dont think we will get confused > > > with a recycled one > > > > If you reset cookies, as you proposed, IMHO, this can very well lead to a > > confusion: suppose cookie #1000 caused an error. We drop queued transfers > > with cookies #1001-1005 and reset to #1. When we're at #100 the user asks > > for a status of cookie #1003, which we have aborted. However, the > > following check in dma_async_is_complete(): > Well you are missing an important point. Who is doing abort. It is the client > driver, the user. No. "cookie #1000 caused an error" - a DMA error. The dmaengine driver detects it. 
The user knows nothing about it, so, they cannot call terminate_all(). The "we" above refers to the dmaengine driver. So, that's exactly the case, discussed in the other mail too. And your proposed error / status reporting callback will fix this, so, let's do that! Thanks Guennadi > And as i said previously before calling terminate_all he will forget about the > cookies allocated previosuly, So after that he cant invoke any of the old > cookie. If he is, then that is wrong and needs to be fixed :) > > So while user is dealing with new transactions (cookies), he doesnt know > anything about old ones and cant query as in above example 1003! > > > > if (last_complete <= last_used) { > > if ((cookie <= last_complete) || (cookie > last_used)) > > return DMA_SUCCESS; > > > > will confirm success to the user. > Not in above example!, if you are at #100, query for #1003 will return > DMA_IN_PROGRESS. > > -- > ~Vinod > --- Guennadi Liakhovetski, Ph.D. Freelance Open-Source Software Developer http://www.open-technology.de/
Hi Vinod, On Tue, 8 Oct 2013, Vinod Koul wrote: > On Mon, Oct 07, 2013 at 10:55:56PM +0200, Guennadi Liakhovetski wrote: > > > > > > > > > > > No, not something in the middle. I was thinking about > > > > > > > > > > > > > > > > > > > > > > (1) cookie 1-3 are submitted > > > > > > > > > > > (2) cookie 1 succeeds > > > > > > > > > > > (3) a DMA error occurs, cookies 2-3 are discarded > > > > > > > > > discarded using terminate_all right? > > > > > > > > > > > > > > > > No, by the dmaengine driver as a part of the error processing. > > > > > > > And how will that be done...? > > > > > > > > > > > > Sorry, I meant - DMA descriptors with cookies #2 and #3 will be cancelled > > > > > > and recycled by the dmaengine driver. That's what you have to do, when > > > > > > processing DMA error IRQ. > > > > > Well how do you that? > > > > > > > > Mmmh, maybe I'm missing something, but isn't it a part of the common DMA > > > > processing? You get an error IRQ; on some DMAC types this means, that you > > > > have to reset the hardware, so, you perform whatever actions you have to > > > > do to reset the controller; you remove any descriptors from the pending > > > > queue; reinsert them into the free queue and let any clients run on a > > > > timeout. I don't think it would be a good idea to do anything more smart > > > > like trying to restart the current transfer or drop it and continue with > > > > the queue, because we don't know in what state the client hardware is, so, > > > > we can only let the client driver try to recover. > > > No that would be very wrong thing to do behind clients back. Suppose you got a > > > trasaction which returned error irq and it was generated one half of the > > > requested transfer was done. Redoing the entrie transaction wont be right! > > > > > > So I think you need to let client know the error status. > > > > > > But again, is this usage fiarly common? > > > > Hm, I think, the question is different: is this possible and realistic? 
If > > there's just one DMAC and one platform, I think, there should be a way to > > support it? There are controllers, that actually have separate error IRQ > > outputs and special status bits for them. Actually, see commit > Fair enough, if this is something you commonly need to deal with then lets add a > new callback for error reporting. I think this would be simpler and then client > will know about the failure > > On failure reporting the client should query the status to know what is pending > for this transaction then recover accordingly Yes, this should help, thanks! > Will send patch for this and moving to DMA_COMPLETE as status As for statuses: AFAICS in a non-cyclic case a dmaengine driver can distinguish between the following states: 1. queued: after .tx_submit(), before .device_issue_pending() 2. active: after .device_issue_pending(), before actually sending to the hardware 3. in progress: sent to the hardware, waiting for completion 4. waiting for ack: completed successfully, waiting for async_tx_ack() 5. unknown: the cookie cannot be found in any of the driver's lists Makes sense? Thanks Guennadi --- Guennadi Liakhovetski, Ph.D. Freelance Open-Source Software Developer http://www.open-technology.de/
On Mon, Oct 7, 2013 at 12:40 AM, Guennadi Liakhovetski <g.liakhovetski@gmx.de> wrote: > Hi Russell > > On Sun, 6 Oct 2013, Russell King - ARM Linux wrote: > >> On Sat, Oct 05, 2013 at 11:00:45PM +0200, Guennadi Liakhovetski wrote: >> > On Sat, 5 Oct 2013, Russell King - ARM Linux wrote: >> > >> > > On Sat, Oct 05, 2013 at 07:36:20PM +0200, Guennadi Liakhovetski wrote: >> > > > + A DMA transaction descriptor, returned to the user by one of "prep" >> > > > + methods is considered as belogning to the user until it is submitted >> > > > + to the dmaengine driver for transfer. However, it is recommended, that >> > > > + the dmaengine driver keeps references to prepared descriptors too, >> > > > + because if dmaengine_terminate_all() is called at that time, the driver >> > > > + will have to recycle all allocated descriptors for the respective >> > > > + channel. >> > > >> > > No. That's quite dangerous. think about what can happen. >> > > >> > > Thread 1 Thread 2 >> > > Driver calls prepare >> > > dmaengine_terminate_all() >> > > dmaengine driver frees prepared descriptor >> > > driver submits descriptor >> > > >> > > You now have a descriptor which has been freed submitted to the DMA engine >> > > queue. This will cause chaos. >> > >> > Yes, I understand this. So, it is intentional, that after a *_prep_* a >> > descriptor belongs to the user and - if the user fails - it will simply be >> > leaked. A terminate-all shouldn't recycle them and dmaengine driver >> > unbinding is impossible at that time, as long as the user hasn't released >> > the channel. Ok, I can rework the above just to explain this. >> >> "the user fails" should be very difficult. One of the requirements here >> is that any "user" submits the prepared descriptor as soon as possible >> after preparation. Some DMA engine implementations hold a spinlock >> between preparation and submission so this is absolutely necessary. > > Ouch... 
The pain here is that some engines have a pre-allocated descriptor ring, and the prep methods write directly to that ring. Without holding a lock a parallel submission could advance the ring before the last one was submitted. This was done because the original implementation of the prep routines started the pattern of writing directly to the ring. I think that is the original sin of dmaengine as every other driver subsystem in the universe submits a generic request object to a driver who then copies the values into the hardware control structure. For dma-slave that overhead is almost certainly worth paying. > >> As Dan Williams explained to me, the reason for the separate submission >> step is there to allow the callback information to be set. So literally >> any "user" of this should do this: >> >> desc = prepare(); >> if (!desc) >> goto failed_to_prepare; >> >> desc->callback = function; >> desc->callback_param = param; >> dmaengine_submit(desc); >> >> There should be very little possiblity of the user failing between those >> two calls. > > I see. > >> > > > - On completion of each DMA operation, the next in queue is started and >> > > > - a tasklet triggered. The tasklet will then call the client driver >> ^^^^^^^^^^^^^^^^^^^^ >> > > > - completion callback routine for notification, if set. >> > > > + On completion of each DMA operation, the next active transaction in queue is >> > > > + started and the ISR bottom half, e.g. a tasklet or a kernel thread is >> > > > + triggered. >> > > >> > > Or a kernel thread? I don't think that's right. It's always been >> > > specified that the callback will happen from tasklet context. >> > >> > Do you see any problems using, say, a threaded interrupt handler, apart >> > from possible performance issues? That seems to be pretty convenient. >> > Otherwise we should really mandate somewhere, that bottom half processing >> > should take place in a tasklet? 
>> >> The documentation has always stated that callbacks will be made from >> tasklet context. The problem with allowing different contexts from >> different drivers is taht spinlocking becomes problematical. Remember >> that we have _bh() variants which lock against tasklets but allow IRQs. > > Spinlocks are local to dmaengine drivers, and I currently see quite a few > of them doing spin_lock_irq(save)() instead of _bh(). Some also take that > lock in their ISR, like the pl330 does. Yes, but in that example the driver still arranges for the callback to be in tasklet context (pl330_tasklet). Threaded irqs should be fine. The callbacks just expect no stricter than bh context and cannot assume process context. > > The disadvantage of using a threaded IRQ, that I see so far, is that you > then can only wake up the thread from the ISR, not from other contexts, > but even then it should be possible, at least for some designs. > >> > > > + The dmaengine driver also has to check the DMA_CTRL_ACK flag by calling >> > > > + async_tx_test_ack() on the transaction. If the function returns true, i.e. >> > > > + if the transaction either has been flagged from the very beginning, or >> > > > + the user has flagged it later, then the transaction descriptor can be >> > > > + recycled immediately by the dmaengine driver. >> > > >> > > "has to check" I think is wrong. This is optional for slave only drivers, >> > > because most slave stuff doesn't use the ACK stuff - that's more for the >> > > async_tx APIs. >> > >> > "most," but some do? E.g. the mx3_camera driver. Ok, that one is very >> > tightly bound to the respective dmaengine driver. But if there are other >> > slave drivers, using that, that can run on different platforms and use >> > various dmaengine drivers? >> >> I'm not aware of any which actually make use of the ack stuff. Only those >> which implement the async_tx must implement this because it gets used for >> chaining and dependency stuff. 
However, there was a plan to remove this >> and replace it with proper refcounting. > > Ok. Which would dovetail with a generic dma request object, but not a pre-requisite to removal of the ack stuff. > >> > > > If the function returns >> > > > + false, the descriptor cannot be recycled yet and the dmaengine driver has >> > > > + to keep polling the descriptor until it is acknowledged by the user. >> > > > >> > > > Interface: >> > > > void dma_async_issue_pending(struct dma_chan *chan); >> > > > @@ -171,6 +192,14 @@ Further APIs: >> > > > discard data in the DMA FIFO which hasn't been fully transferred. >> > > > No callback functions will be called for any incomplete transfers. >> > > > >> > > > + Note: >> > > > + Transactions, aborted by this call are considered as failed. However, >> > > > + if the user requests their status, using dma_async_is_complete() / >> > > > + dma_async_is_complete(), as described below, one of DMA_IN_PROGRESS and >> > > > + DMA_SUCCESS will be returned. So, drivers are advised not to use those >> > > > + calls on transactions, submitted before a call to >> > > > + dmaengine_terminate_all(). >> > > >> > > The last sentence doesn't make sense. >> > >> > Right, sorry, there is a typo in the second line (to remind: this note is >> > for the dmaengine_terminate_all() function): >> > >> > + Note: >> > + Transactions, aborted by this call are considered as failed. However, >> > + if the user requests their status, using dma_async_is_tx_complete() / >> > + dma_async_is_complete(), as described below, one of DMA_IN_PROGRESS and >> > + DMA_SUCCESS will be returned. So, drivers are advised not to use those >> > + calls on transactions, submitted before a call to >> > + dmaengine_terminate_all(). 
>> > >> > What I was trying to say, is that if you submit transactions, then >> > terminate them, then try to retrieve their status using >> > dma_async_is_tx_complete() / dma_async_is_complete(), this won't produce >> > the expected DMA_ERROR result, instead, one of DMA_IN_PROGRESS and >> > DMA_SUCCESS will be returned, unless the driver implements some >> > sophisticated .device_tx_status() method. >> >> Remember that dmaengine_terminate_all() is targetted to a specific >> channel - it terminates all submitted transactions on that channel >> and stops any in-process transaction. It doesn't affect any other >> channel on the controller. >> >> In the case of slave DMA users, you have to "own" the channel. This >> means it's up to the user to sort out what happens should _they_ call >> dmaengine_terminate_all() on it. >> >> For non-slave DMA users, I don't believe that terminating transactions >> is permitted - they're expected to complete. > > Ok, but the DMA error case remains, right? I.e. in case the dmaengine > driver had to drop transactions due to a hardware problem, the client > driver has no way to retrieve their status? In the current model Russell is right, it would need to be a specific error callback. -- Dan
On Mon, Oct 7, 2013 at 3:39 AM, Vinod Koul <vinod.koul@intel.com> wrote: > On Mon, Oct 07, 2013 at 12:17:28PM +0100, Russell King - ARM Linux wrote: >> On Mon, Oct 07, 2013 at 12:45:33PM +0200, Guennadi Liakhovetski wrote: >> > No, not something in the middle. I was thinking about >> > >> > (1) cookie 1-3 are submitted >> > (2) cookie 1 succeeds >> > (3) a DMA error occurs, cookies 2-3 are discarded > discarded using terminate_all right? > >> > (4) cookie 4 is submitted and succeeds >> > (5) the user requests status of cookie 2 and gets success back, AFAICS > No you cant!, you can only request for 4.. >> >> This is a side effect of using a sequential cookie based system to indicate >> whether a descriptor has succeeded or not. > In the above case, since user calls terminate_all we can put a rule that > terminate should reset the cookie counter. > So after the terminate_all the cookies are sequentially valid! I think you and Guennadi address this later in the thread, but I think cookie values should always increase. I see nothing but trouble if cookie values are allowed to go backwards. > Anyways as pointed above user shouldnt check for 2, he should have removed all > refs till 3 before calling terminate. > >> What may be better is to change the wording here: not DMA_SUCCESS but >> DMA_COMPLETED. That doesn't imply that it has been successful, merely >> that the DMA engine has finished with the transaction. > Agreed that its not indication of success but of DMA completetion. I have seen > cases where slave perhiphral got stuck while sending last FIFO but since DMA > finished transferiing to FIFO it says complete. > > Dan do you agree? Yes, it's an indication of completion, not necessarily success. -- Dan
On Tue, Oct 08, 2013 at 06:34:42PM -0700, Dan Williams wrote: > On Mon, Oct 7, 2013 at 3:39 AM, Vinod Koul <vinod.koul@intel.com> wrote: > > On Mon, Oct 07, 2013 at 12:17:28PM +0100, Russell King - ARM Linux wrote: > >> On Mon, Oct 07, 2013 at 12:45:33PM +0200, Guennadi Liakhovetski wrote: > >> > No, not something in the middle. I was thinking about > >> > > >> > (1) cookie 1-3 are submitted > >> > (2) cookie 1 succeeds > >> > (3) a DMA error occurs, cookies 2-3 are discarded > > discarded using terminate_all right? > > > >> > (4) cookie 4 is submitted and succeeds > >> > (5) the user requests status of cookie 2 and gets success back, AFAICS > > No you cant!, you can only request for 4.. > >> > >> This is a side effect of using a sequential cookie based system to indicate > >> whether a descriptor has succeeded or not. > > In the above case, since user calls terminate_all we can put a rule that > > terminate should reset the cookie counter. > > So after the terminate_all the cookies are sequentially valid! > > I think you and Guennadi address this later in the thread, but I think > cookie values should always increase. I see nothing but trouble if > cookie values are allowed to go backwards. Ok, my idea was to do this in terminate_all ONLY which is complete reset for a channel. Anything done before that should not be valid and should not e referenced! But am okay to drop the idea... > > Anyways as pointed above user shouldnt check for 2, he should have removed all > > refs till 3 before calling terminate. > > > >> What may be better is to change the wording here: not DMA_SUCCESS but > >> DMA_COMPLETED. That doesn't imply that it has been successful, merely > >> that the DMA engine has finished with the transaction. > > Agreed that its not indication of success but of DMA completetion. I have seen > > cases where slave perhiphral got stuck while sending last FIFO but since DMA > > finished transferiing to FIFO it says complete. > > > > Dan do you agree? 
> > Yes, it's an indication of completion, not necessarily success. Sure, will update this.
On 10/08/2013 07:34 PM, Dan Williams wrote: > On Mon, Oct 7, 2013 at 3:39 AM, Vinod Koul <vinod.koul@intel.com> wrote: >> On Mon, Oct 07, 2013 at 12:17:28PM +0100, Russell King - ARM Linux wrote: ... >>> What may be better is to change the wording here: not DMA_SUCCESS but >>> DMA_COMPLETED. That doesn't imply that it has been successful, merely >>> that the DMA engine has finished with the transaction. >> >> Agreed that its not indication of success but of DMA completetion. I have seen >> cases where slave perhiphral got stuck while sending last FIFO but since DMA >> finished transferiing to FIFO it says complete. In that case, the DMA *has* completed. DMA is the transfer into the FIFO, not the handling of the FIFO content by the peripheral. >> Dan do you agree? > > Yes, it's an indication of completion, not necessarily success. Surely by definition, a DMA can't *complete* without being successful. If the DMA failed, then it didn't complete, but rather must have been aborted or error'd out, without completing the whole transfer.
On Wed, Oct 16, 2013 at 01:33:44PM -0600, Stephen Warren wrote: > On 10/08/2013 07:34 PM, Dan Williams wrote: > > On Mon, Oct 7, 2013 at 3:39 AM, Vinod Koul <vinod.koul@intel.com> wrote: > >> On Mon, Oct 07, 2013 at 12:17:28PM +0100, Russell King - ARM Linux wrote: > ... > >>> What may be better is to change the wording here: not DMA_SUCCESS but > >>> DMA_COMPLETED. That doesn't imply that it has been successful, merely > >>> that the DMA engine has finished with the transaction. > >> > >> Agreed that its not indication of success but of DMA completetion. I have seen > >> cases where slave perhiphral got stuck while sending last FIFO but since DMA > >> finished transferiing to FIFO it says complete. > > In that case, the DMA *has* completed. DMA is the transfer into the > FIFO, not the handling of the FIFO content by the peripheral. > > >> Dan do you agree? > > > > Yes, it's an indication of completion, not necessarily success. > > Surely by definition, a DMA can't *complete* without being successful. > If the DMA failed, then it didn't complete, but rather must have been > aborted or error'd out, without completing the whole transfer. DMA means transferring of data, and that is what DMA completion would mean. Once the data has left the DMA FIFO, we don't know if it filled up the memory properly or got stuck in a peripheral FIFO; the dmaengine would have no knowledge or control of it. So it can't claim the transfer was a success from a data point of view, hence the rename now! ~Vinod
On 10/16/2013 11:16 PM, Vinod Koul wrote: > On Wed, Oct 16, 2013 at 01:33:44PM -0600, Stephen Warren wrote: >> On 10/08/2013 07:34 PM, Dan Williams wrote: >>> On Mon, Oct 7, 2013 at 3:39 AM, Vinod Koul <vinod.koul@intel.com> wrote: >>>> On Mon, Oct 07, 2013 at 12:17:28PM +0100, Russell King - ARM Linux wrote: >> ... >>>>> What may be better is to change the wording here: not DMA_SUCCESS but >>>>> DMA_COMPLETED. That doesn't imply that it has been successful, merely >>>>> that the DMA engine has finished with the transaction. >>>> >>>> Agreed that its not indication of success but of DMA completetion. I have seen >>>> cases where slave perhiphral got stuck while sending last FIFO but since DMA >>>> finished transferiing to FIFO it says complete. >> >> In that case, the DMA *has* completed. DMA is the transfer into the >> FIFO, not the handling of the FIFO content by the peripheral. >> >>>> Dan do you agree? >>> >>> Yes, it's an indication of completion, not necessarily success. >> >> Surely by definition, a DMA can't *complete* without being successful. >> If the DMA failed, then it didn't complete, but rather must have been >> aborted or error'd out, without completing the whole transfer. > > DMA means transferring of data, and that is what DMA compeletion would mean. > Once the data has left the DMA FIFO, we don't know if it filled up the memory > propery or get stuck in periphral FIFO, the dmaengine would have no knowledge or > control of it. So it can't claim the transfer was success from a data point of > view, hence the rename now! Well, I obviously don't agreee, but I guess I won't bother pursuing this.
On Thu, Oct 17, 2013 at 7:55 AM, Stephen Warren <swarren@wwwdotorg.org> wrote: > On 10/16/2013 11:16 PM, Vinod Koul wrote: >> On Wed, Oct 16, 2013 at 01:33:44PM -0600, Stephen Warren wrote: >>> On 10/08/2013 07:34 PM, Dan Williams wrote: >>>> On Mon, Oct 7, 2013 at 3:39 AM, Vinod Koul <vinod.koul@intel.com> wrote: >>>>> On Mon, Oct 07, 2013 at 12:17:28PM +0100, Russell King - ARM Linux wrote: >>> ... >>>>>> What may be better is to change the wording here: not DMA_SUCCESS but >>>>>> DMA_COMPLETED. That doesn't imply that it has been successful, merely >>>>>> that the DMA engine has finished with the transaction. >>>>> >>>>> Agreed that its not indication of success but of DMA completetion. I have seen >>>>> cases where slave perhiphral got stuck while sending last FIFO but since DMA >>>>> finished transferiing to FIFO it says complete. >>> >>> In that case, the DMA *has* completed. DMA is the transfer into the >>> FIFO, not the handling of the FIFO content by the peripheral. >>> >>>>> Dan do you agree? >>>> >>>> Yes, it's an indication of completion, not necessarily success. >>> >>> Surely by definition, a DMA can't *complete* without being successful. >>> If the DMA failed, then it didn't complete, but rather must have been >>> aborted or error'd out, without completing the whole transfer. >> >> DMA means transferring of data, and that is what DMA compeletion would mean. >> Once the data has left the DMA FIFO, we don't know if it filled up the memory >> propery or get stuck in periphral FIFO, the dmaengine would have no knowledge or >> control of it. So it can't claim the transfer was success from a data point of >> view, hence the rename now! > > Well, I obviously don't agreee, but I guess I won't bother pursuing this. > What name would you prefer, DMA_THAT_COOKIE_IS_NOT_PART_OF_MY_ACTIVE_SET_SO_SOMETHING_MUST_HAVE_HAPPENED :-)? But I think you are pointing out that we don't have any rules around what to expect when a channel encounters an error. 
Does the chain stop advancing and cookie stays valid until the client can clean up, or does the engine mark an error and advance its last_completed_cookie past the error? -- Dan
diff --git a/Documentation/dmaengine.txt b/Documentation/dmaengine.txt index 879b6e3..21bb72d 100644 --- a/Documentation/dmaengine.txt +++ b/Documentation/dmaengine.txt @@ -133,8 +133,17 @@ The slave DMA usage consists of following steps: locks before calling the callback function which may cause a deadlock. - Note that callbacks will always be invoked from the DMA - engines tasklet, never from interrupt context. + Note that callbacks will always be invoked from the DMA engine's + interrupt processing bottom half, never from interrupt context. + + Note 2: + A DMA transaction descriptor, returned to the user by one of "prep" + methods is considered as belogning to the user until it is submitted + to the dmaengine driver for transfer. However, it is recommended, that + the dmaengine driver keeps references to prepared descriptors too, + because if dmaengine_terminate_all() is called at that time, the driver + will have to recycle all allocated descriptors for the respective + channel. 4. Submit the transaction @@ -150,15 +159,27 @@ The slave DMA usage consists of following steps: dmaengine_submit() will not start the DMA operation, it merely adds it to the pending queue. For this, see step 5, dma_async_issue_pending. -5. Issue pending DMA requests and wait for callback notification +5. Issue pending DMA requests and (optionally) request callback notification - The transactions in the pending queue can be activated by calling the - issue_pending API. If channel is idle then the first transaction in - queue is started and subsequent ones queued up. + Dmaengine drivers accumulate submitted transactions on a pending queue. + After this call all such pending transactions are activated. Transactions, + submitted after this call will be queued again in a deactivated state until + this function is called again. If at the time of this call the channel is + idle then the first transaction in queue is started and subsequent ones are + queued up. 
- On completion of each DMA operation, the next in queue is started and - a tasklet triggered. The tasklet will then call the client driver - completion callback routine for notification, if set. + On completion of each DMA operation, the next active transaction in queue is + started and the ISR bottom half, e.g. a tasklet or a kernel thread is + triggered. The tasklet will then call the client driver completion callback + routine for notification, if set. + + The dmaengine driver also has to check the DMA_CTRL_ACK flag by calling + async_tx_test_ack() on the transaction. If the function returns true, i.e. + if the transaction either has been flagged from the very beginning, or + the user has flagged it later, then the transaction descriptor can be + recycled immediately by the dmaengine driver. If the function returns + false, the descriptor cannot be recycled yet and the dmaengine driver has + to keep polling the descriptor until it is acknowledged by the user. Interface: void dma_async_issue_pending(struct dma_chan *chan); @@ -171,6 +192,14 @@ Further APIs: discard data in the DMA FIFO which hasn't been fully transferred. No callback functions will be called for any incomplete transfers. + Note: + Transactions, aborted by this call are considered as failed. However, + if the user requests their status, using dma_async_is_complete() / + dma_async_is_complete(), as described below, one of DMA_IN_PROGRESS and + DMA_SUCCESS will be returned. So, drivers are advised not to use those + calls on transactions, submitted before a call to + dmaengine_terminate_all(). + 2. int dmaengine_pause(struct dma_chan *chan) This pauses activity on the DMA channel without data loss. @@ -196,3 +225,14 @@ Further APIs: a running DMA channel. It is recommended that DMA engine users pause or stop (via dmaengine_terminate_all) the channel before using this API. + +DMA error handling + +1. If a DMA error occurs, usually the DMA driver has to abort the transaction in + progress. 
Since transactions, queued after the aborted one can depend on it, + they usually have to be dropped too. There is currently no way to notify + users about such a problem, so, users should normally start all DMA + transactions with a timeout handler. If a timeout occurs, the user shall + assume, that the DMA transaction failed. However, due to the same reason, as + described in a note to the dmaengine_terminate_all() description above, + enquiring status of timed out transactions is unreliable.
This patch extends dmaengine documentation to provide more details on descriptor prepare stage, transaction completion requirements and DMA error processing. Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> --- These extensions reflect my understanding of some aspects of the dmaengine API. If it is wrong, I'd be happy if someone could correct me. If or where I'm right, I think, those aspects might want an update. Once we understand exactly the situation we can think about improvements to the API. Documentation/dmaengine.txt | 58 ++++++++++++++++++++++++++++++++++++------ 1 files changed, 49 insertions(+), 9 deletions(-)