Message ID | 20191023153138.23442-1-tony@atomide.com (mailing list archive) |
---|---|
State | Mainlined |
Commit | bacdcb6675e170bb2e8d3824da220e10274f42a7 |
Headers | show |
Series | dmaengine: cppi41: Fix cppi41_dma_prep_slave_sg() when idle | expand |
On 23-10-19, 08:31, Tony Lindgren wrote: > Yegor Yefremov <yegorslists@googlemail.com> reported that musb and ftdi > uart can fail for the first open of the uart unless connected using > a hub. > > This is because the first dma call done by musb_ep_program() must wait > if cppi41 is PM runtime suspended. Otherwise musb_ep_program() continues > with other non-dma packets before the DMA transfer is started causing at > least ftdi uarts to fail to receive data. > > Let's fix the issue by waking up cppi41 with PM runtime calls added to > cppi41_dma_prep_slave_sg() and return NULL if still idled. This way we > have musb_ep_program() continue with PIO until cppi41 is awake. Applied and tagged stable, thanks
Hi Tony, On 10/23/19 6:31 PM, Tony Lindgren wrote: > Yegor Yefremov <yegorslists@googlemail.com> reported that musb and ftdi > uart can fail for the first open of the uart unless connected using > a hub. > > This is because the first dma call done by musb_ep_program() must wait > if cppi41 is PM runtime suspended. Otherwise musb_ep_program() continues > with other non-dma packets before the DMA transfer is started causing at > least ftdi uarts to fail to receive data. > > Let's fix the issue by waking up cppi41 with PM runtime calls added to > cppi41_dma_prep_slave_sg() and return NULL if still idled. This way we > have musb_ep_program() continue with PIO until cppi41 is awake. > > Fixes: fdea2d09b997 ("dmaengine: cppi41: Add basic PM runtime support") > Cc: Bin Liu <b-liu@ti.com> > Cc: giulio.benetti@benettiengineering.com > Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> > Cc: Sebastian Reichel <sre@kernel.org> > Cc: Skvortsov <andrej.skvortzov@gmail.com> > Reported-by: Yegor Yefremov <yegorslists@googlemail.com> > Signed-off-by: Tony Lindgren <tony@atomide.com> > --- > > Please consider adding Cc stable v4.9+ tag when committing > > --- > drivers/dma/ti/cppi41.c | 21 ++++++++++++++++++++- > 1 file changed, 20 insertions(+), 1 deletion(-) > > diff --git a/drivers/dma/ti/cppi41.c b/drivers/dma/ti/cppi41.c > --- a/drivers/dma/ti/cppi41.c > +++ b/drivers/dma/ti/cppi41.c > @@ -586,9 +586,22 @@ static struct dma_async_tx_descriptor *cppi41_dma_prep_slave_sg( > enum dma_transfer_direction dir, unsigned long tx_flags, void *context) > { > struct cppi41_channel *c = to_cpp41_chan(chan); > + struct dma_async_tx_descriptor *txd = NULL; > + struct cppi41_dd *cdd = c->cdd; > struct cppi41_desc *d; > struct scatterlist *sg; > unsigned int i; > + int error; > + > + error = pm_runtime_get(cdd->ddev.dev); If pm_runtime_get() pm_runtime_mark_last_busy()+pm_runtime_put_autosuspend() around a code which updates a descriptor in _memory_ helps then this best described as works by luck ;) I have a feeling that if you put enough delay between prepare_sg and issue_pending in the usb driver then it will keep failing, no? fwiw, in the cppi41_dma_issue_pending() the driver does: error = pm_runtime_get(cdd->ddev.dev); ... if (!cdd->is_suspended) cppi41_run_queue(cdd); ... pm_runtime_mark_last_busy(cdd->ddev.dev); pm_runtime_put_autosuspend(cdd->ddev.dev); Without waiting for the transfer to complete? If issue_pending is not starting the transfer right away then the whole pm handling is broken in there. imho. runtime_get in prep_slave_sg and runtime_put when the transfer is finished? > + if (error < 0) { > + pm_runtime_put_noidle(cdd->ddev.dev); > + > + return NULL; > + } > + > + if (cdd->is_suspended) > + goto err_out_not_ready; > > d = c->desc; > for_each_sg(sgl, sg, sg_len, i) { > @@ -611,7 +624,13 @@ static struct dma_async_tx_descriptor *cppi41_dma_prep_slave_sg( > d++; > } > > - return &c->txd; > + txd = &c->txd; > + > +err_out_not_ready: > + pm_runtime_mark_last_busy(cdd->ddev.dev); > + pm_runtime_put_autosuspend(cdd->ddev.dev); > + > + return txd; > } > > static void cppi41_compute_td_desc(struct cppi41_desc *d) > - Peter Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
* Peter Ujfalusi <peter.ujfalusi@ti.com> [191023 17:04]: > On 10/23/19 6:31 PM, Tony Lindgren wrote: > > diff --git a/drivers/dma/ti/cppi41.c b/drivers/dma/ti/cppi41.c > > --- a/drivers/dma/ti/cppi41.c > > +++ b/drivers/dma/ti/cppi41.c > > @@ -586,9 +586,22 @@ static struct dma_async_tx_descriptor *cppi41_dma_prep_slave_sg( > > enum dma_transfer_direction dir, unsigned long tx_flags, void *context) > > { > > struct cppi41_channel *c = to_cpp41_chan(chan); > > + struct dma_async_tx_descriptor *txd = NULL; > > + struct cppi41_dd *cdd = c->cdd; > > struct cppi41_desc *d; > > struct scatterlist *sg; > > unsigned int i; > > + int error; > > + > > + error = pm_runtime_get(cdd->ddev.dev); > > If pm_runtime_get() > pm_runtime_mark_last_busy()+pm_runtime_put_autosuspend() around a code > which updates a descriptor in _memory_ helps then this best described as > works by luck ;) It also checks the cpp41 state for cdd->is_suspended though. AFAIK we do not currently have any other place to tell the driver a DMA request is about to start at some point soon. > I have a feeling that if you put enough delay between prepare_sg and > issue_pending in the usb driver then it will keep failing, no? Nope, it will just queue it and run the queue when awake. > fwiw, in the cppi41_dma_issue_pending() the driver does: > > error = pm_runtime_get(cdd->ddev.dev); > ... > if (!cdd->is_suspended) > cppi41_run_queue(cdd); > ... > pm_runtime_mark_last_busy(cdd->ddev.dev); > pm_runtime_put_autosuspend(cdd->ddev.dev); > > Without waiting for the transfer to complete? The queue gets run when cpp41 is awake, runtime PM reference is not released until completed. > If issue_pending is not starting the transfer right away then the whole > pm handling is broken in there. imho. AFAIK there is no other way to do this without tagging devices with pm_runtime_irq_safe(), which is nasty as it takes a permanent use count on the parent device. But yeah, some dmaengine API that can sleep to tell a request is about to come would simplify things. I don't think we have anything like that available right now? Regards, Tony
Hello! On 10/23/2019 06:31 PM, Tony Lindgren wrote: > Yegor Yefremov <yegorslists@googlemail.com> reported that musb and ftdi > uart can fail for the first open of the uart unless connected using > a hub. > > This is because the first dma call done by musb_ep_program() must wait > if cppi41 is PM runtime suspended. Otherwise musb_ep_program() continues > with other non-dma packets before the DMA transfer is started causing at > least ftdi uarts to fail to receive data. > > Let's fix the issue by waking up cppi41 with PM runtime calls added to > cppi41_dma_prep_slave_sg() and return NULL if still idled. This way we > have musb_ep_program() continue with PIO until cppi41 is awake. > > Fixes: fdea2d09b997 ("dmaengine: cppi41: Add basic PM runtime support") > Cc: Bin Liu <b-liu@ti.com> > Cc: giulio.benetti@benettiengineering.com > Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> > Cc: Sebastian Reichel <sre@kernel.org> > Cc: Skvortsov <andrej.skvortzov@gmail.com> > Reported-by: Yegor Yefremov <yegorslists@googlemail.com> > Signed-off-by: Tony Lindgren <tony@atomide.com> > --- > > Please consider adding Cc stable v4.9+ tag when committing > > --- > drivers/dma/ti/cppi41.c | 21 ++++++++++++++++++++- > 1 file changed, 20 insertions(+), 1 deletion(-) > > diff --git a/drivers/dma/ti/cppi41.c b/drivers/dma/ti/cppi41.c > --- a/drivers/dma/ti/cppi41.c > +++ b/drivers/dma/ti/cppi41.c > @@ -586,9 +586,22 @@ static struct dma_async_tx_descriptor *cppi41_dma_prep_slave_sg( > enum dma_transfer_direction dir, unsigned long tx_flags, void *context) > { > struct cppi41_channel *c = to_cpp41_chan(chan); > + struct dma_async_tx_descriptor *txd = NULL; > + struct cppi41_dd *cdd = c->cdd; > struct cppi41_desc *d; > struct scatterlist *sg; > unsigned int i; > + int error; > + > + error = pm_runtime_get(cdd->ddev.dev); > + if (error < 0) { I'd call that variable 'status', comparison (error < 0) just doesn't look right. If it was *if* (error), it would have been more correct. > + pm_runtime_put_noidle(cdd->ddev.dev); > + > + return NULL; > + } > + > + if (cdd->is_suspended) > + goto err_out_not_ready; > > d = c->desc; > for_each_sg(sgl, sg, sg_len, i) { [...] MBR, Sergei
On Wed, Oct 23, 2019 at 9:55 PM Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> wrote: > On 10/23/2019 06:31 PM, Tony Lindgren wrote: > > + int error; > > + > > + error = pm_runtime_get(cdd->ddev.dev); > > + if (error < 0) { > > I'd call that variable 'status', comparison (error < 0) just doesn't look right. > If it was *if* (error), it would have been more correct. It's error when it's negative. That's how PM runtime is designed. > > + pm_runtime_put_noidle(cdd->ddev.dev); > > + > > + return NULL; > > + }
* Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> [191023 18:56]: > On 10/23/2019 06:31 PM, Tony Lindgren wrote: > > --- a/drivers/dma/ti/cppi41.c > > +++ b/drivers/dma/ti/cppi41.c > > @@ -586,9 +586,22 @@ static struct dma_async_tx_descriptor *cppi41_dma_prep_slave_sg( > > enum dma_transfer_direction dir, unsigned long tx_flags, void *context) > > { > > struct cppi41_channel *c = to_cpp41_chan(chan); > > + struct dma_async_tx_descriptor *txd = NULL; > > + struct cppi41_dd *cdd = c->cdd; > > struct cppi41_desc *d; > > struct scatterlist *sg; > > unsigned int i; > > + int error; > > + > > + error = pm_runtime_get(cdd->ddev.dev); > > + if (error < 0) { > > I'd call that variable 'status', comparison (error < 0) just doesn't look right. > If it was *if* (error), it would have been more correct. Good suggestion, something to remember for next time. In general error is widely used this way though: $ git grep -A1 "error = " | grep "error < 0" | wc -l 425 Regards, Tony
On 10/23/19 8:16 PM, Tony Lindgren wrote: > * Peter Ujfalusi <peter.ujfalusi@ti.com> [191023 17:04]: >> On 10/23/19 6:31 PM, Tony Lindgren wrote: >>> diff --git a/drivers/dma/ti/cppi41.c b/drivers/dma/ti/cppi41.c >>> --- a/drivers/dma/ti/cppi41.c >>> +++ b/drivers/dma/ti/cppi41.c >>> @@ -586,9 +586,22 @@ static struct dma_async_tx_descriptor *cppi41_dma_prep_slave_sg( >>> enum dma_transfer_direction dir, unsigned long tx_flags, void *context) >>> { >>> struct cppi41_channel *c = to_cpp41_chan(chan); >>> + struct dma_async_tx_descriptor *txd = NULL; >>> + struct cppi41_dd *cdd = c->cdd; >>> struct cppi41_desc *d; >>> struct scatterlist *sg; >>> unsigned int i; >>> + int error; >>> + >>> + error = pm_runtime_get(cdd->ddev.dev); >> >> If pm_runtime_get() >> pm_runtime_mark_last_busy()+pm_runtime_put_autosuspend() around a code >> which updates a descriptor in _memory_ helps then this best described as >> works by luck ;) > > It also checks the cpp41 state for cdd->is_suspended > though. Which is cleared/set in the suspend/resume callbacks and they are called from a work (the driver uses async runtime_get). > AFAIK we do not currently have any other place > to tell the driver a DMA request is about to start at > some point soon. True, but still. >> I have a feeling that if you put enough delay between prepare_sg and >> issue_pending in the usb driver then it will keep failing, no? > > Nope, it will just queue it and run the queue when awake. the autosuspend_delay is set 100 ms, so if you put a udelay(101) between prep_sg and issue_pending in the usb driver this trickery will be for nothing, right? If the usb driver is preempted for longer than 100ms between the two calls, same issue. Not sure, but if for some reason the transfer would take longer than 100ms than pm_runtime will bring down the dma, no? >> fwiw, in the cppi41_dma_issue_pending() the driver does: >> >> error = pm_runtime_get(cdd->ddev.dev); >> ... >> if (!cdd->is_suspended) >> cppi41_run_queue(cdd); >> ... >> pm_runtime_mark_last_busy(cdd->ddev.dev); >> pm_runtime_put_autosuspend(cdd->ddev.dev); >> >> Without waiting for the transfer to complete? > > The queue gets run when cpp41 is awake, runtime PM > reference is not released until completed. > >> If issue_pending is not starting the transfer right away then the whole >> pm handling is broken in there. imho. > > AFAIK there is no other way to do this without tagging > devices with pm_runtime_irq_safe(), which is nasty as > it takes a permanent use count on the parent device. > > But yeah, some dmaengine API that can sleep to tell > a request is about to come would simplify things. any of the prep callbacks kind of indicates that a client is preparing a transfer so in a perfect world it is going to want to execute it.. > I don't think we have anything like that available > right now? Well, it would have the same issues. If the time between dmaengine_be_warned_i_m_going_to_call_issue_pending_soon and issue_pending is more than the autosuspend_delay then it is not going to help. On the other hand: if the usb driver assumes that the dma transfer is already finished when issue_pending returned and carry on with subsequent request, that is also a problematic assumption. One can only consider a transfer to be done if the completion callback is called or you have polled for the completion and it tells you the same. This is problematic if you are in atomic context as the DMA completion interrupt might not come while you are there. imho, this fix is working by lucky constellation of the stars ;) Or we can assume that there will never be more than 100ms delay between prepare_sg and issue_pending... - Peter Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
* Peter Ujfalusi <peter.ujfalusi@ti.com> [191023 19:02]: > On 10/23/19 8:16 PM, Tony Lindgren wrote: > > * Peter Ujfalusi <peter.ujfalusi@ti.com> [191023 17:04]: > >> On 10/23/19 6:31 PM, Tony Lindgren wrote: > >>> diff --git a/drivers/dma/ti/cppi41.c b/drivers/dma/ti/cppi41.c > >>> --- a/drivers/dma/ti/cppi41.c > >>> +++ b/drivers/dma/ti/cppi41.c > >>> @@ -586,9 +586,22 @@ static struct dma_async_tx_descriptor *cppi41_dma_prep_slave_sg( > >>> enum dma_transfer_direction dir, unsigned long tx_flags, void *context) > >>> { > >>> struct cppi41_channel *c = to_cpp41_chan(chan); > >>> + struct dma_async_tx_descriptor *txd = NULL; > >>> + struct cppi41_dd *cdd = c->cdd; > >>> struct cppi41_desc *d; > >>> struct scatterlist *sg; > >>> unsigned int i; > >>> + int error; > >>> + > >>> + error = pm_runtime_get(cdd->ddev.dev); > >> > >> If pm_runtime_get() > >> pm_runtime_mark_last_busy()+pm_runtime_put_autosuspend() around a code > >> which updates a descriptor in _memory_ helps then this best described as > >> works by luck ;) > > > > It also checks the cpp41 state for cdd->is_suspended > > though. > > Which is cleared/set in the suspend/resume callbacks and they are called > from a work (the driver uses async runtime_get). Right, only the cppi41 driver itself knows when it's idled or not, we cannot rely on pm_runtime functions for that. > >> I have a feeling that if you put enough delay between prepare_sg and > >> issue_pending in the usb driver then it will keep failing, no? > > > > Nope, it will just queue it and run the queue when awake. > > the autosuspend_delay is set 100 ms, so if you put a udelay(101) between > prep_sg and issue_pending in the usb driver this trickery will be for > nothing, right? > If the usb driver is preempted for longer than 100ms between the two > calls, same issue. > Not sure, but if for some reason the transfer would take longer than > 100ms than pm_runtime will bring down the dma, no? No, the dma will happen just fine no matter what the delay is. Part of the problem here is the musb driver. It friggin continues before checking the completion of a dma transfer! Fixing that currently is not trivial. > > But yeah, some dmaengine API that can sleep to tell > > a request is about to come would simplify things. > > any of the prep callbacks kind of indicates that a client is preparing a > transfer so in a perfect world it is going to want to execute it.. > > > I don't think we have anything like that available > > right now? > > Well, it would have the same issues. If the time between > dmaengine_be_warned_i_m_going_to_call_issue_pending_soon and > issue_pending is more than the autosuspend_delay then it is not going to > help. We'd have to allow dma consumer driver call pm_runtime_get_sync() on the dma device. Something similar maybe to what we have for phy_pm_runtime_get_sync(). Or just get the device handle for dma so the consumer can call pm_runtime_get_sync() on it. > On the other hand: if the usb driver assumes that the dma transfer is > already finished when issue_pending returned and carry on with > subsequent request, that is also a problematic assumption. One can only > consider a transfer to be done if the completion callback is called or > you have polled for the completion and it tells you the same. > This is problematic if you are in atomic context as the DMA completion > interrupt might not come while you are there. Yeah the musb driver has a history of 14 years of issues. I guess the logic there has been, it's usb, it can disconnect at any time.. > imho, this fix is working by lucky constellation of the stars ;) > Or we can assume that there will never be more than 100ms delay between > prepare_sg and issue_pending... Not true. The cpp41 dma is doing the right thing and is not affected by the PM runtime autosuspend delay. Otherwise USB hard drives would not work, they can take seconds to spin up :) The missing part here seems to be the fact that musb continues before the dma completion is done. Regards, Tony
Hello! On 10/23/2019 09:58 PM, Andy Shevchenko wrote: >>> + int error; >>> + >>> + error = pm_runtime_get(cdd->ddev.dev); >>> + if (error < 0) { >> >> I'd call that variable 'status', comparison (error < 0) just doesn't look right. >> If it was *if* (error), it would have been more correct. > > It's error when it's negative. That's how PM runtime is designed. Most of the other code too. However, the RPM code tends to return positive values if a call is OK, so you have to go greater lengths and check a result for < 0. Calling the variable 'error' doesn't seem right in this context... >>> + pm_runtime_put_noidle(cdd->ddev.dev); >>> + >>> + return NULL; >>> + } MBR, Sergei
Tony, On 10/23/19 10:18 PM, Tony Lindgren wrote: >> the autosuspend_delay is set 100 ms, so if you put a udelay(101) between >> prep_sg and issue_pending in the usb driver this trickery will be for >> nothing, right? >> If the usb driver is preempted for longer than 100ms between the two >> calls, same issue. >> Not sure, but if for some reason the transfer would take longer than >> 100ms than pm_runtime will bring down the dma, no? > > No, the dma will happen just fine no matter what the delay is. > > Part of the problem here is the musb driver. It friggin > continues before checking the completion of a dma transfer! nice. Well I assume it is from an era when pm_runtime did not even existed and assumes that things are always available and react without delay. > Fixing that currently is not trivial. I had the pleasure to lurk in for the tusb6010_omap and I fully agree. >>> But yeah, some dmaengine API that can sleep to tell >>> a request is about to come would simplify things. >> >> any of the prep callbacks kind of indicates that a client is preparing a >> transfer so in a perfect world it is going to want to execute it.. >> >>> I don't think we have anything like that available >>> right now? >> >> Well, it would have the same issues. If the time between >> dmaengine_be_warned_i_m_going_to_call_issue_pending_soon and >> issue_pending is more than the autosuspend_delay then it is not going to >> help. > > We'd have to allow dma consumer driver call pm_runtime_get_sync() > on the dma device. Something similar maybe to what we have > for phy_pm_runtime_get_sync(). Or just get the device handle for > dma so the consumer can call pm_runtime_get_sync() on it. How much a pm_runtime_get_sync(dmadev) is different when it is issued by the client driver compared to when the dma driver issues it for it's own device? >> On the other hand: if the usb driver assumes that the dma transfer is >> already finished when issue_pending returned and carry on with >> subsequent request, that is also a problematic assumption. One can only >> consider a transfer to be done if the completion callback is called or >> you have polled for the completion and it tells you the same. >> This is problematic if you are in atomic context as the DMA completion >> interrupt might not come while you are there. > > Yeah the musb driver has a history of 14 years of issues. > I guess the logic there has been, it's usb, it can disconnect > at any time.. > >> imho, this fix is working by lucky constellation of the stars ;) >> Or we can assume that there will never be more than 100ms delay between >> prepare_sg and issue_pending... > > Not true. The cpp41 dma is doing the right thing and is not > affected by the PM runtime autosuspend delay. Yes, what the cppi41 does is (was) correct. If the rpm's work is not executed between the runtime_get and when the is_suspended flag is checked in issue_pending, then the resume callback will start the transfer. That's kind of fine(ish). If cppi41 would be used for audio then it would not be as the DMA must start when we tell it to start. > Otherwise USB > hard drives would not work, they can take seconds to spin up :) Hrm, true. > The missing part here seems to be the fact that musb continues > before the dma completion is done. But I still fail to see the difference between the events before this patch and with the case when there is a 100ms delay between prep_sg and issue_pending. Before this patch: prep_sg() issue_pending() <- runtime_get() / put_autosuspend() _not_ starting transfer runtime_resume() <- starts the transfer With this patch and than 100ms delay between prep_sg and issue_pending: prep_sg() <- runtime_get() / put_autosuspend() runtime_resume() <- not starting transfer issue_pending() <- runtime_get() / put_autosuspend() starts the transfer With this patch, but more than 100ms delay in between: prep_sg() <- runtime_get() / put_autosuspend() runtime_resume() <- not starting transfer > 100ms delay runtime_suspend() issue_pending() <- runtime_get() / put_autosuspend() _not_ starting transfer runtime_resume() <- starts the transfer pm_runtime_get_sync() in issue_pending would be the solution to avoid delayed execution, but the usb driver should not assume that DMA is completed as soon as issue_pending returned. - Peter Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
* Peter Ujfalusi <peter.ujfalusi@ti.com> [191023 19:55]: > On 10/23/19 10:18 PM, Tony Lindgren wrote: > > We'd have to allow dma consumer driver call pm_runtime_get_sync() > > on the dma device. Something similar maybe to what we have > > for phy_pm_runtime_get_sync(). Or just get the device handle for > > dma so the consumer can call pm_runtime_get_sync() on it. > > How much a pm_runtime_get_sync(dmadev) is different when it is issued by > the client driver compared to when the dma driver issues it for it's own > device? Well the consumer device could call pm_runtime_get_sync(dmadev) when the USB cable is connected for example, and then call pm_runtime_pu(dmadev) when let's say the USB cable is disconnected. Without using pm_runtime_irq_safe() we currently don't have a clear path for doing this where the pm_runtime_get_sync(dmadev) may sleep. > But I still fail to see the difference between the events before this > patch and with the case when there is a 100ms delay between prep_sg and > issue_pending. > > Before this patch: > > prep_sg() > issue_pending() <- runtime_get() / put_autosuspend() > _not_ starting transfer > runtime_resume() <- starts the transfer > > With this patch and than 100ms delay between prep_sg and issue_pending: > > prep_sg() <- runtime_get() / put_autosuspend() > runtime_resume() <- not starting transfer > issue_pending() <- runtime_get() / put_autosuspend() > starts the transfer > > With this patch, but more than 100ms delay in between: > > prep_sg() <- runtime_get() / put_autosuspend() > runtime_resume() <- not starting transfer > > 100ms delay > runtime_suspend() > issue_pending() <- runtime_get() / put_autosuspend() > _not_ starting transfer > runtime_resume() <- starts the transfer > > pm_runtime_get_sync() in issue_pending would be the solution to avoid > delayed execution, but the usb driver should not assume that DMA is > completed as soon as issue_pending returned. Oh I see. Yes the consumer driver would need to check for the completed dma transfer in all cases. The delay issues should not currently happen in the musb_ep_program() problem case as it gets called from IRQ context. And no, adding pm_runtime_get_sync() to issue_pending is not a solution. There may be clocks and regulators that need to be powered up, and we don't want to use pm_runtime_irq_safe() because of the permanent use count on the parent. Regards, Tony
On 23/10/2019 23:18, Tony Lindgren wrote: > * Peter Ujfalusi <peter.ujfalusi@ti.com> [191023 19:55]: >> On 10/23/19 10:18 PM, Tony Lindgren wrote: >>> We'd have to allow dma consumer driver call pm_runtime_get_sync() >>> on the dma device. Something similar maybe to what we have >>> for phy_pm_runtime_get_sync(). Or just get the device handle for >>> dma so the consumer can call pm_runtime_get_sync() on it. >> >> How much a pm_runtime_get_sync(dmadev) is different when it is issued by >> the client driver compared to when the dma driver issues it for it's own >> device? > > Well the consumer device could call pm_runtime_get_sync(dmadev) > when the USB cable is connected for example, and then call > pm_runtime_pu(dmadev) when let's say the USB cable is disconnected. > > Without using pm_runtime_irq_safe() we currently don't have a > clear path for doing this where the pm_runtime_get_sync(dmadev) > may sleep. > >> But I still fail to see the difference between the events before this >> patch and with the case when there is a 100ms delay between prep_sg and >> issue_pending. >> >> Before this patch: >> >> prep_sg() >> issue_pending() <- runtime_get() / put_autosuspend() >> _not_ starting transfer >> runtime_resume() <- starts the transfer >> >> With this patch and than 100ms delay between prep_sg and issue_pending: >> >> prep_sg() <- runtime_get() / put_autosuspend() >> runtime_resume() <- not starting transfer >> issue_pending() <- runtime_get() / put_autosuspend() >> starts the transfer >> >> With this patch, but more than 100ms delay in between: >> >> prep_sg() <- runtime_get() / put_autosuspend() >> runtime_resume() <- not starting transfer >>> 100ms delay >> runtime_suspend() >> issue_pending() <- runtime_get() / put_autosuspend() >> _not_ starting transfer >> runtime_resume() <- starts the transfer >> >> pm_runtime_get_sync() in issue_pending would be the solution to avoid >> delayed execution, but the usb driver should not assume that DMA is >> completed as soon as issue_pending returned. > > Oh I see. Yes the consumer driver would need to check for > the completed dma transfer in all cases. The delay issues > should not currently happen in the musb_ep_program() problem > case as it gets called from IRQ context. > > And no, adding pm_runtime_get_sync() to issue_pending is not > a solution. There may be clocks and regulators that need to > be powered up, and we don't want to use pm_runtime_irq_safe() > because of the permanent use count on the parent. 5 cents. I think the right thing might be to get rid of pm_runtime_xxx() in cppi41_dma_issue_pending(). So overall approach will be: - new job -> cppi41_dma_prep_slave_sg() -> pm_runtime_get() - issue_pending: fill backlog if suspended or run_queue if active (pm_runtime_active()) - job done: dmaengine_desc_get_callback_invoke() -> dmaengine_desc_get_callback_invoke(); pm_runtime_mark_last_busy(cdd->ddev.dev); pm_runtime_put_autosuspend(cdd->ddev.dev); in all places. It even might allow to get rid of cdd->lock.
* Grygorii Strashko <grygorii.strashko@ti.com> [191023 20:56]: > On 23/10/2019 23:18, Tony Lindgren wrote: > > And no, adding pm_runtime_get_sync() to issue_pending is not > > a solution. There may be clocks and regulators that need to > > be powered up, and we don't want to use pm_runtime_irq_safe() > > because of the permanent use count on the parent. > > 5 cents. > > I think the right thing might be to get rid of pm_runtime_xxx() > in cppi41_dma_issue_pending(). So overall approach will be: > > - new job -> cppi41_dma_prep_slave_sg() -> pm_runtime_get() > - issue_pending: fill backlog if suspended or run_queue if active (pm_runtime_active()) > - job done: dmaengine_desc_get_callback_invoke() -> > > dmaengine_desc_get_callback_invoke(); > pm_runtime_mark_last_busy(cdd->ddev.dev); > pm_runtime_put_autosuspend(cdd->ddev.dev); > in all places. > > It even might allow to get rid of cdd->lock. Well I don't think cppi41_dma_prep_slave_sg() is necessarily paired with anything currently. This can potentially leading to pm_runtime_get() called multiple times? So I think we'd also need cppi41_dma_cleanup_slave_sg() or similar, and require they get called in a paired manner. It might be better to add seprate PM runtime specific functions that dma consumers can optionally call. Regards, Tony
On 24/10/2019 00:27, Tony Lindgren wrote: > * Grygorii Strashko <grygorii.strashko@ti.com> [191023 20:56]: >> On 23/10/2019 23:18, Tony Lindgren wrote: >>> And no, adding pm_runtime_get_sync() to issue_pending is not >>> a solution. There may be clocks and regulators that need to >>> be powered up, and we don't want to use pm_runtime_irq_safe() >>> because of the permanent use count on the parent. >> >> 5 cents. >> >> I think the right thing might be to get rid of pm_runtime_xxx() >> in cppi41_dma_issue_pending(). So overall approach will be: >> >> - new job -> cppi41_dma_prep_slave_sg() -> pm_runtime_get() >> - issue_pending: fill backlog if suspended or run_queue if active (pm_runtime_active()) >> - job done: dmaengine_desc_get_callback_invoke() -> >> >> dmaengine_desc_get_callback_invoke(); >> pm_runtime_mark_last_busy(cdd->ddev.dev); >> pm_runtime_put_autosuspend(cdd->ddev.dev); >> in all places. >> >> It even might allow to get rid of cdd->lock. > > Well I don't think cppi41_dma_prep_slave_sg() is necessarily > paired with anything currently. It should - dma cmpletion callbacks have to be called somewhere. This can potentially leading > to pm_runtime_get() called multiple times? That's the idea - increase pm_counter as many times as jobs submitted. > > So I think we'd also need cppi41_dma_cleanup_slave_sg() > or similar, and require they get called in a paired manner. > > It might be better to add seprate PM runtime specific > functions that dma consumers can optionally call. not sure here. It seems just over designed very old stuff ;)
* Grygorii Strashko <grygorii.strashko@ti.com> [191023 21:43]: > > > On 24/10/2019 00:27, Tony Lindgren wrote: > > * Grygorii Strashko <grygorii.strashko@ti.com> [191023 20:56]: > > > On 23/10/2019 23:18, Tony Lindgren wrote: > > > > And no, adding pm_runtime_get_sync() to issue_pending is not > > > > a solution. There may be clocks and regulators that need to > > > > be powered up, and we don't want to use pm_runtime_irq_safe() > > > > because of the permanent use count on the parent. > > > > > > 5 cents. > > > > > > I think the right thing might be to get rid of pm_runtime_xxx() > > > in cppi41_dma_issue_pending(). So overall approach will be: > > > > > > - new job -> cppi41_dma_prep_slave_sg() -> pm_runtime_get() > > > - issue_pending: fill backlog if suspended or run_queue if active (pm_runtime_active()) > > > - job done: dmaengine_desc_get_callback_invoke() -> > > > > > > dmaengine_desc_get_callback_invoke(); > > > pm_runtime_mark_last_busy(cdd->ddev.dev); > > > pm_runtime_put_autosuspend(cdd->ddev.dev); > > > in all places. > > > > > > It even might allow to get rid of cdd->lock. > > > > Well I don't think cppi41_dma_prep_slave_sg() is necessarily > > paired with anything currently. > > It should - dma cmpletion callbacks have to be called somewhere. Well what I meant is there's no guarantee that we have cppi41_dma_issue_pending() followed by cppi41_dma_prep_slave_sg() currently :) > This can potentially leading > > to pm_runtime_get() called multiple times? > > That's the idea - increase pm_counter as many times as jobs submitted. Right, but that needs to be done in a paired manner so the API is clear to everyone and does not lead into unpaired PM runtime calls. Regards, Tony
On 23/10/2019 23.18, Tony Lindgren wrote: > * Peter Ujfalusi <peter.ujfalusi@ti.com> [191023 19:55]: >> On 10/23/19 10:18 PM, Tony Lindgren wrote: >>> We'd have to allow dma consumer driver call pm_runtime_get_sync() >>> on the dma device. Something similar maybe to what we have >>> for phy_pm_runtime_get_sync(). Or just get the device handle for >>> dma so the consumer can call pm_runtime_get_sync() on it. >> >> How much a pm_runtime_get_sync(dmadev) is different when it is issued by >> the client driver compared to when the dma driver issues it for it's own >> device? > > Well the consumer device could call pm_runtime_get_sync(dmadev) > when the USB cable is connected for example, and then call > pm_runtime_pu(dmadev) when let's say the USB cable is disconnected. And the USB cable connect/disconnect is handled in interrupt -> you need to call pm_runtime_get_sync(dmadev) from interrupt context and need to mark the dmadev to pm_runtime_irq_safe() > Without using pm_runtime_irq_safe() we currently don't have a > clear path for doing this where the pm_runtime_get_sync(dmadev) > may sleep. > >> But I still fail to see the difference between the events before this >> patch and with the case when there is a 100ms delay between prep_sg and >> issue_pending. >> >> Before this patch: >> >> prep_sg() >> issue_pending() <- runtime_get() / put_autosuspend() >> _not_ starting transfer >> runtime_resume() <- starts the transfer >> >> With this patch and than 100ms delay between prep_sg and issue_pending: >> >> prep_sg() <- runtime_get() / put_autosuspend() >> runtime_resume() <- not starting transfer >> issue_pending() <- runtime_get() / put_autosuspend() >> starts the transfer >> >> With this patch, but more than 100ms delay in between: >> >> prep_sg() <- runtime_get() / put_autosuspend() >> runtime_resume() <- not starting transfer >>> 100ms delay >> runtime_suspend() >> issue_pending() <- runtime_get() / put_autosuspend() >> _not_ starting transfer >> runtime_resume() <- starts the transfer >> >> pm_runtime_get_sync() in issue_pending would be the solution to avoid >> delayed execution, but the usb driver should not assume that DMA is >> completed as soon as issue_pending returned. > > Oh I see. Yes the consumer driver would need to check for > the completed dma transfer in all cases. The delay issues > should not currently happen in the musb_ep_program() problem > case as it gets called from IRQ context. the cppi41 driver solely relies on irq to check is the transfer is completed (based on the cookie status). So yeah, musb have no other choice than trust that the transfer is done in a timely manner. > And no, adding pm_runtime_get_sync() to issue_pending is not > a solution. There may be clocks and regulators that need to > be powered up, and we don't want to use pm_runtime_irq_safe() > because of the permanent use count on the parent. I think the only way to handle this is to keep the DMA enabled as long as the USB cable is connected. Either to introduce dma_pm_runtime_get_sync(struct dma_chan *c) and dma_pm_runtime_put(struct dma_chan *c) or some better name. It's use would be optional, but for USB you would call them for cable connect the get_sync (from a work) and put it on disconnect. The driver internally would not need to be changed, I think this patch could be removed as well. - Péter Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
On Thu, Oct 24, 2019 at 11:51 AM Peter Ujfalusi <peter.ujfalusi@ti.com> wrote: > On 23/10/2019 23.18, Tony Lindgren wrote: > And the USB cable connect/disconnect is handled in interrupt -> you need > to call pm_runtime_get_sync(dmadev) from interrupt context and need to > mark the dmadev to pm_runtime_irq_safe() Side note: and please don't spread the pm_runtime_irq_safe() hack more... I think Tony is quite aware of this.
* Andy Shevchenko <andy.shevchenko@gmail.com> [191024 11:23]: > On Thu, Oct 24, 2019 at 11:51 AM Peter Ujfalusi <peter.ujfalusi@ti.com> wrote: > > On 23/10/2019 23.18, Tony Lindgren wrote: > > > And the USB cable connect/disconnect is handled in interrupt -> you need > > to call pm_runtime_get_sync(dmadev) from interrupt context and need to > > mark the dmadev to pm_runtime_irq_safe() > > Side note: and please don't spread the pm_runtime_irq_safe() hack more... > I think Tony is quite aware of this. I agree, let's not add any more pm_runtime_irq_safe() at all. And let's get rid of all the existing users of it. Using pm_runtime_irq_safe() causes a problem for implementing genpd because of the permanent use count on the parent device. And then there is the serial driver issue with console use. In the musb case, we should be able to enable/disable cppi41 dma from musb_pm_runtime_check_session() without a need for pm_runtime_irq_safe(). We probably still want to keep the $subject patch for the state check and return NULL if cppi41 is not active. But should be able to remove the PM runtime calls in the $subject patch. Regards, Tony
diff --git a/drivers/dma/ti/cppi41.c b/drivers/dma/ti/cppi41.c --- a/drivers/dma/ti/cppi41.c +++ b/drivers/dma/ti/cppi41.c @@ -586,9 +586,22 @@ static struct dma_async_tx_descriptor *cppi41_dma_prep_slave_sg( enum dma_transfer_direction dir, unsigned long tx_flags, void *context) { struct cppi41_channel *c = to_cpp41_chan(chan); + struct dma_async_tx_descriptor *txd = NULL; + struct cppi41_dd *cdd = c->cdd; struct cppi41_desc *d; struct scatterlist *sg; unsigned int i; + int error; + + error = pm_runtime_get(cdd->ddev.dev); + if (error < 0) { + pm_runtime_put_noidle(cdd->ddev.dev); + + return NULL; + } + + if (cdd->is_suspended) + goto err_out_not_ready; d = c->desc; for_each_sg(sgl, sg, sg_len, i) { @@ -611,7 +624,13 @@ static struct dma_async_tx_descriptor *cppi41_dma_prep_slave_sg( d++; } - return &c->txd; + txd = &c->txd; + +err_out_not_ready: + pm_runtime_mark_last_busy(cdd->ddev.dev); + pm_runtime_put_autosuspend(cdd->ddev.dev); + + return txd; } static void cppi41_compute_td_desc(struct cppi41_desc *d)
Yegor Yefremov <yegorslists@googlemail.com> reported that musb and ftdi uart can fail for the first open of the uart unless connected using a hub. This is because the first dma call done by musb_ep_program() must wait if cppi41 is PM runtime suspended. Otherwise musb_ep_program() continues with other non-dma packets before the DMA transfer is started causing at least ftdi uarts to fail to receive data. Let's fix the issue by waking up cppi41 with PM runtime calls added to cppi41_dma_prep_slave_sg() and return NULL if still idled. This way we have musb_ep_program() continue with PIO until cppi41 is awake. Fixes: fdea2d09b997 ("dmaengine: cppi41: Add basic PM runtime support") Cc: Bin Liu <b-liu@ti.com> Cc: giulio.benetti@benettiengineering.com Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Sebastian Reichel <sre@kernel.org> Cc: Skvortsov <andrej.skvortzov@gmail.com> Reported-by: Yegor Yefremov <yegorslists@googlemail.com> Signed-off-by: Tony Lindgren <tony@atomide.com> --- Please consider adding Cc stable v4.9+ tag when committing --- drivers/dma/ti/cppi41.c | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-)