diff mbox series

dmaengine: cppi41: Fix cppi41_dma_prep_slave_sg() when idle

Message ID 20191023153138.23442-1-tony@atomide.com (mailing list archive)
State Mainlined
Commit bacdcb6675e170bb2e8d3824da220e10274f42a7
Headers show
Series dmaengine: cppi41: Fix cppi41_dma_prep_slave_sg() when idle | expand

Commit Message

Tony Lindgren Oct. 23, 2019, 3:31 p.m. UTC
Yegor Yefremov <yegorslists@googlemail.com> reported that musb and ftdi
uart can fail for the first open of the uart unless connected using
a hub.

This is because the first dma call done by musb_ep_program() must wait
if cppi41 is PM runtime suspended. Otherwise musb_ep_program() continues
with other non-dma packets before the DMA transfer is started causing at
least ftdi uarts to fail to receive data.

Let's fix the issue by waking up cppi41 with PM runtime calls added to
cppi41_dma_prep_slave_sg() and return NULL if still idled. This way we
have musb_ep_program() continue with PIO until cppi41 is awake.

Fixes: fdea2d09b997 ("dmaengine: cppi41: Add basic PM runtime support")
Cc: Bin Liu <b-liu@ti.com>
Cc: giulio.benetti@benettiengineering.com
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Sebastian Reichel <sre@kernel.org>
Cc: Skvortsov <andrej.skvortzov@gmail.com>
Reported-by: Yegor Yefremov <yegorslists@googlemail.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
---

Please consider adding Cc stable v4.9+ tag when committing

---
 drivers/dma/ti/cppi41.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

Comments

Vinod Koul Oct. 23, 2019, 3:52 p.m. UTC | #1
On 23-10-19, 08:31, Tony Lindgren wrote:
> Yegor Yefremov <yegorslists@googlemail.com> reported that musb and ftdi
> uart can fail for the first open of the uart unless connected using
> a hub.
> 
> This is because the first dma call done by musb_ep_program() must wait
> if cppi41 is PM runtime suspended. Otherwise musb_ep_program() continues
> with other non-dma packets before the DMA transfer is started causing at
> least ftdi uarts to fail to receive data.
> 
> Let's fix the issue by waking up cppi41 with PM runtime calls added to
> cppi41_dma_prep_slave_sg() and return NULL if still idled. This way we
> have musb_ep_program() continue with PIO until cppi41 is awake.

Applied and tagged stable, thanks
Peter Ujfalusi Oct. 23, 2019, 5:04 p.m. UTC | #2
Hi Tony,

On 10/23/19 6:31 PM, Tony Lindgren wrote:
> Yegor Yefremov <yegorslists@googlemail.com> reported that musb and ftdi
> uart can fail for the first open of the uart unless connected using
> a hub.
> 
> This is because the first dma call done by musb_ep_program() must wait
> if cppi41 is PM runtime suspended. Otherwise musb_ep_program() continues
> with other non-dma packets before the DMA transfer is started causing at
> least ftdi uarts to fail to receive data.
> 
> Let's fix the issue by waking up cppi41 with PM runtime calls added to
> cppi41_dma_prep_slave_sg() and return NULL if still idled. This way we
> have musb_ep_program() continue with PIO until cppi41 is awake.
> 
> Fixes: fdea2d09b997 ("dmaengine: cppi41: Add basic PM runtime support")
> Cc: Bin Liu <b-liu@ti.com>
> Cc: giulio.benetti@benettiengineering.com
> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Cc: Sebastian Reichel <sre@kernel.org>
> Cc: Skvortsov <andrej.skvortzov@gmail.com>
> Reported-by: Yegor Yefremov <yegorslists@googlemail.com>
> Signed-off-by: Tony Lindgren <tony@atomide.com>
> ---
> 
> Please consider adding Cc stable v4.9+ tag when committing
> 
> ---
>  drivers/dma/ti/cppi41.c | 21 ++++++++++++++++++++-
>  1 file changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/dma/ti/cppi41.c b/drivers/dma/ti/cppi41.c
> --- a/drivers/dma/ti/cppi41.c
> +++ b/drivers/dma/ti/cppi41.c
> @@ -586,9 +586,22 @@ static struct dma_async_tx_descriptor *cppi41_dma_prep_slave_sg(
>  	enum dma_transfer_direction dir, unsigned long tx_flags, void *context)
>  {
>  	struct cppi41_channel *c = to_cpp41_chan(chan);
> +	struct dma_async_tx_descriptor *txd = NULL;
> +	struct cppi41_dd *cdd = c->cdd;
>  	struct cppi41_desc *d;
>  	struct scatterlist *sg;
>  	unsigned int i;
> +	int error;
> +
> +	error = pm_runtime_get(cdd->ddev.dev);

If pm_runtime_get()
pm_runtime_mark_last_busy()+pm_runtime_put_autosuspend() around a code
which updates a descriptor in _memory_ helps then this best described as
works by luck ;)

I have a feeling that if you put enough delay between prepare_sg and
issue_pending in the usb driver then it will keep failing, no?

fwiw, in the cppi41_dma_issue_pending() the driver does:

	error = pm_runtime_get(cdd->ddev.dev);
...
	if (!cdd->is_suspended)
		cppi41_run_queue(cdd);
...
	pm_runtime_mark_last_busy(cdd->ddev.dev);
	pm_runtime_put_autosuspend(cdd->ddev.dev);

Without waiting for the transfer to complete?

If issue_pending is not starting the transfer right away then the whole
pm handling is broken in there. imho.

runtime_get in prep_slave_sg and runtime_put when the transfer is finished?

> +	if (error < 0) {
> +		pm_runtime_put_noidle(cdd->ddev.dev);
> +
> +		return NULL;
> +	}
> +
> +	if (cdd->is_suspended)
> +		goto err_out_not_ready;
>  
>  	d = c->desc;
>  	for_each_sg(sgl, sg, sg_len, i) {
> @@ -611,7 +624,13 @@ static struct dma_async_tx_descriptor *cppi41_dma_prep_slave_sg(
>  		d++;
>  	}
>  
> -	return &c->txd;
> +	txd = &c->txd;
> +
> +err_out_not_ready:
> +	pm_runtime_mark_last_busy(cdd->ddev.dev);
> +	pm_runtime_put_autosuspend(cdd->ddev.dev);
> +
> +	return txd;
>  }
>  
>  static void cppi41_compute_td_desc(struct cppi41_desc *d)
> 

- Peter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
Tony Lindgren Oct. 23, 2019, 5:16 p.m. UTC | #3
* Peter Ujfalusi <peter.ujfalusi@ti.com> [191023 17:04]:
> On 10/23/19 6:31 PM, Tony Lindgren wrote:
> > diff --git a/drivers/dma/ti/cppi41.c b/drivers/dma/ti/cppi41.c
> > --- a/drivers/dma/ti/cppi41.c
> > +++ b/drivers/dma/ti/cppi41.c
> > @@ -586,9 +586,22 @@ static struct dma_async_tx_descriptor *cppi41_dma_prep_slave_sg(
> >  	enum dma_transfer_direction dir, unsigned long tx_flags, void *context)
> >  {
> >  	struct cppi41_channel *c = to_cpp41_chan(chan);
> > +	struct dma_async_tx_descriptor *txd = NULL;
> > +	struct cppi41_dd *cdd = c->cdd;
> >  	struct cppi41_desc *d;
> >  	struct scatterlist *sg;
> >  	unsigned int i;
> > +	int error;
> > +
> > +	error = pm_runtime_get(cdd->ddev.dev);
> 
> If pm_runtime_get()
> pm_runtime_mark_last_busy()+pm_runtime_put_autosuspend() around a code
> which updates a descriptor in _memory_ helps then this best described as
> works by luck ;)

It also checks the cpp41 state for cdd->is_suspended
though. AFAIK we do not currently have any other place
to tell the driver a DMA request is about to start at
some point soon.

> I have a feeling that if you put enough delay between prepare_sg and
> issue_pending in the usb driver then it will keep failing, no?

Nope, it will just queue it and run the queue when awake.

> fwiw, in the cppi41_dma_issue_pending() the driver does:
> 
> 	error = pm_runtime_get(cdd->ddev.dev);
> ...
> 	if (!cdd->is_suspended)
> 		cppi41_run_queue(cdd);
> ...
> 	pm_runtime_mark_last_busy(cdd->ddev.dev);
> 	pm_runtime_put_autosuspend(cdd->ddev.dev);
> 
> Without waiting for the transfer to complete?

The queue gets run when cpp41 is awake, runtime PM
reference is not released until completed.

> If issue_pending is not starting the transfer right away then the whole
> pm handling is broken in there. imho.

AFAIK there is no other way to do this without tagging
devices with pm_runtime_irq_safe(), which is nasty as
it takes a permanent use count on the parent device.

But yeah, some dmaengine API that can sleep to tell
a request is about to come would simplify things.

I don't think we have anything like that available
right now?

Regards,

Tony
Sergei Shtylyov Oct. 23, 2019, 6:55 p.m. UTC | #4
Hello!

On 10/23/2019 06:31 PM, Tony Lindgren wrote:

> Yegor Yefremov <yegorslists@googlemail.com> reported that musb and ftdi
> uart can fail for the first open of the uart unless connected using
> a hub.
> 
> This is because the first dma call done by musb_ep_program() must wait
> if cppi41 is PM runtime suspended. Otherwise musb_ep_program() continues
> with other non-dma packets before the DMA transfer is started causing at
> least ftdi uarts to fail to receive data.
> 
> Let's fix the issue by waking up cppi41 with PM runtime calls added to
> cppi41_dma_prep_slave_sg() and return NULL if still idled. This way we
> have musb_ep_program() continue with PIO until cppi41 is awake.
> 
> Fixes: fdea2d09b997 ("dmaengine: cppi41: Add basic PM runtime support")
> Cc: Bin Liu <b-liu@ti.com>
> Cc: giulio.benetti@benettiengineering.com
> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Cc: Sebastian Reichel <sre@kernel.org>
> Cc: Skvortsov <andrej.skvortzov@gmail.com>
> Reported-by: Yegor Yefremov <yegorslists@googlemail.com>
> Signed-off-by: Tony Lindgren <tony@atomide.com>
> ---
> 
> Please consider adding Cc stable v4.9+ tag when committing
> 
> ---
>  drivers/dma/ti/cppi41.c | 21 ++++++++++++++++++++-
>  1 file changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/dma/ti/cppi41.c b/drivers/dma/ti/cppi41.c
> --- a/drivers/dma/ti/cppi41.c
> +++ b/drivers/dma/ti/cppi41.c
> @@ -586,9 +586,22 @@ static struct dma_async_tx_descriptor *cppi41_dma_prep_slave_sg(
>  	enum dma_transfer_direction dir, unsigned long tx_flags, void *context)
>  {
>  	struct cppi41_channel *c = to_cpp41_chan(chan);
> +	struct dma_async_tx_descriptor *txd = NULL;
> +	struct cppi41_dd *cdd = c->cdd;
>  	struct cppi41_desc *d;
>  	struct scatterlist *sg;
>  	unsigned int i;
> +	int error;
> +
> +	error = pm_runtime_get(cdd->ddev.dev);
> +	if (error < 0) {

   I'd call that variable 'status', comparison (error < 0) just doesn't look right.
If it was *if* (error), it would have been more correct. 

> +		pm_runtime_put_noidle(cdd->ddev.dev);
> +
> +		return NULL;
> +	}
> +
> +	if (cdd->is_suspended)
> +		goto err_out_not_ready;
>  
>  	d = c->desc;
>  	for_each_sg(sgl, sg, sg_len, i) {
[...]

MBR, Sergei
Andy Shevchenko Oct. 23, 2019, 6:58 p.m. UTC | #5
On Wed, Oct 23, 2019 at 9:55 PM Sergei Shtylyov
<sergei.shtylyov@cogentembedded.com> wrote:
> On 10/23/2019 06:31 PM, Tony Lindgren wrote:

> > +     int error;
> > +
> > +     error = pm_runtime_get(cdd->ddev.dev);
> > +     if (error < 0) {
>
>    I'd call that variable 'status', comparison (error < 0) just doesn't look right.
> If it was *if* (error), it would have been more correct.

It's error when it's negative. That's how PM runtime is designed.

> > +             pm_runtime_put_noidle(cdd->ddev.dev);
> > +
> > +             return NULL;
> > +     }
Tony Lindgren Oct. 23, 2019, 7 p.m. UTC | #6
* Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> [191023 18:56]:
> On 10/23/2019 06:31 PM, Tony Lindgren wrote:
> > --- a/drivers/dma/ti/cppi41.c
> > +++ b/drivers/dma/ti/cppi41.c
> > @@ -586,9 +586,22 @@ static struct dma_async_tx_descriptor *cppi41_dma_prep_slave_sg(
> >  	enum dma_transfer_direction dir, unsigned long tx_flags, void *context)
> >  {
> >  	struct cppi41_channel *c = to_cpp41_chan(chan);
> > +	struct dma_async_tx_descriptor *txd = NULL;
> > +	struct cppi41_dd *cdd = c->cdd;
> >  	struct cppi41_desc *d;
> >  	struct scatterlist *sg;
> >  	unsigned int i;
> > +	int error;
> > +
> > +	error = pm_runtime_get(cdd->ddev.dev);
> > +	if (error < 0) {
> 
>    I'd call that variable 'status', comparison (error < 0) just doesn't look right.
> If it was *if* (error), it would have been more correct. 

Good suggestion, something to remember for next time.
In general error is widely used this way though:

$ git grep -A1 "error = " | grep "error < 0" | wc -l
425

Regards,

Tony
Peter Ujfalusi Oct. 23, 2019, 7:02 p.m. UTC | #7
On 10/23/19 8:16 PM, Tony Lindgren wrote:
> * Peter Ujfalusi <peter.ujfalusi@ti.com> [191023 17:04]:
>> On 10/23/19 6:31 PM, Tony Lindgren wrote:
>>> diff --git a/drivers/dma/ti/cppi41.c b/drivers/dma/ti/cppi41.c
>>> --- a/drivers/dma/ti/cppi41.c
>>> +++ b/drivers/dma/ti/cppi41.c
>>> @@ -586,9 +586,22 @@ static struct dma_async_tx_descriptor *cppi41_dma_prep_slave_sg(
>>>  	enum dma_transfer_direction dir, unsigned long tx_flags, void *context)
>>>  {
>>>  	struct cppi41_channel *c = to_cpp41_chan(chan);
>>> +	struct dma_async_tx_descriptor *txd = NULL;
>>> +	struct cppi41_dd *cdd = c->cdd;
>>>  	struct cppi41_desc *d;
>>>  	struct scatterlist *sg;
>>>  	unsigned int i;
>>> +	int error;
>>> +
>>> +	error = pm_runtime_get(cdd->ddev.dev);
>>
>> If pm_runtime_get()
>> pm_runtime_mark_last_busy()+pm_runtime_put_autosuspend() around a code
>> which updates a descriptor in _memory_ helps then this best described as
>> works by luck ;)
> 
> It also checks the cpp41 state for cdd->is_suspended
> though.

Which is cleared/set in the suspend/resume callbacks and they are called
from a work (the driver uses async runtime_get).

> AFAIK we do not currently have any other place
> to tell the driver a DMA request is about to start at
> some point soon.

True, but still.

>> I have a feeling that if you put enough delay between prepare_sg and
>> issue_pending in the usb driver then it will keep failing, no?
> 
> Nope, it will just queue it and run the queue when awake.

the autosuspend_delay is set 100 ms, so if you put a udelay(101) between
prep_sg and issue_pending in the usb driver this trickery will be for
nothing, right?
If the usb driver is preempted for longer than 100ms between the two
calls, same issue.
Not sure, but if for some reason the transfer would take longer than
100ms than pm_runtime will bring down the dma, no?

>> fwiw, in the cppi41_dma_issue_pending() the driver does:
>>
>> 	error = pm_runtime_get(cdd->ddev.dev);
>> ...
>> 	if (!cdd->is_suspended)
>> 		cppi41_run_queue(cdd);
>> ...
>> 	pm_runtime_mark_last_busy(cdd->ddev.dev);
>> 	pm_runtime_put_autosuspend(cdd->ddev.dev);
>>
>> Without waiting for the transfer to complete?
> 
> The queue gets run when cpp41 is awake, runtime PM
> reference is not released until completed.
> 
>> If issue_pending is not starting the transfer right away then the whole
>> pm handling is broken in there. imho.
> 
> AFAIK there is no other way to do this without tagging
> devices with pm_runtime_irq_safe(), which is nasty as
> it takes a permanent use count on the parent device.
> 
> But yeah, some dmaengine API that can sleep to tell
> a request is about to come would simplify things.

any of the prep callbacks kind of indicates that a client is preparing a
transfer so in a perfect world it is going to want to execute it..

> I don't think we have anything like that available
> right now?

Well, it would have the same issues. If the time between
dmaengine_be_warned_i_m_going_to_call_issue_pending_soon and
issue_pending is more than the autosuspend_delay then it is not going to
help.

On the other hand: if the usb driver assumes that the dma transfer is
already finished when issue_pending returned and carry on with
subsequent request, that is also a problematic assumption. One can only
consider a transfer to be done if the completion callback is called or
you have polled for the completion and it tells you the same.
This is problematic if you are in atomic context as the DMA completion
interrupt might not come while you are there.

imho, this fix is working by lucky constellation of the stars ;)
Or we can assume that there will never be more than 100ms delay between
prepare_sg and issue_pending...

- Peter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
Tony Lindgren Oct. 23, 2019, 7:18 p.m. UTC | #8
* Peter Ujfalusi <peter.ujfalusi@ti.com> [191023 19:02]:
> On 10/23/19 8:16 PM, Tony Lindgren wrote:
> > * Peter Ujfalusi <peter.ujfalusi@ti.com> [191023 17:04]:
> >> On 10/23/19 6:31 PM, Tony Lindgren wrote:
> >>> diff --git a/drivers/dma/ti/cppi41.c b/drivers/dma/ti/cppi41.c
> >>> --- a/drivers/dma/ti/cppi41.c
> >>> +++ b/drivers/dma/ti/cppi41.c
> >>> @@ -586,9 +586,22 @@ static struct dma_async_tx_descriptor *cppi41_dma_prep_slave_sg(
> >>>  	enum dma_transfer_direction dir, unsigned long tx_flags, void *context)
> >>>  {
> >>>  	struct cppi41_channel *c = to_cpp41_chan(chan);
> >>> +	struct dma_async_tx_descriptor *txd = NULL;
> >>> +	struct cppi41_dd *cdd = c->cdd;
> >>>  	struct cppi41_desc *d;
> >>>  	struct scatterlist *sg;
> >>>  	unsigned int i;
> >>> +	int error;
> >>> +
> >>> +	error = pm_runtime_get(cdd->ddev.dev);
> >>
> >> If pm_runtime_get()
> >> pm_runtime_mark_last_busy()+pm_runtime_put_autosuspend() around a code
> >> which updates a descriptor in _memory_ helps then this best described as
> >> works by luck ;)
> > 
> > It also checks the cpp41 state for cdd->is_suspended
> > though.
> 
> Which is cleared/set in the suspend/resume callbacks and they are called
> from a work (the driver uses async runtime_get).

Right, only the cppi41 driver itself knows when it's idled
or not, we cannot rely on pm_runtime functions for that.

> >> I have a feeling that if you put enough delay between prepare_sg and
> >> issue_pending in the usb driver then it will keep failing, no?
> > 
> > Nope, it will just queue it and run the queue when awake.
> 
> the autosuspend_delay is set 100 ms, so if you put a udelay(101) between
> prep_sg and issue_pending in the usb driver this trickery will be for
> nothing, right?
> If the usb driver is preempted for longer than 100ms between the two
> calls, same issue.
> Not sure, but if for some reason the transfer would take longer than
> 100ms than pm_runtime will bring down the dma, no?

No, the dma will happen just fine no matter what the delay is.

Part of the problem here is the musb driver. It friggin
continues before checking the completion of a dma transfer!
Fixing that currently is not trivial.

> > But yeah, some dmaengine API that can sleep to tell
> > a request is about to come would simplify things.
> 
> any of the prep callbacks kind of indicates that a client is preparing a
> transfer so in a perfect world it is going to want to execute it..
> 
> > I don't think we have anything like that available
> > right now?
> 
> Well, it would have the same issues. If the time between
> dmaengine_be_warned_i_m_going_to_call_issue_pending_soon and
> issue_pending is more than the autosuspend_delay then it is not going to
> help.

We'd have to allow dma consumer driver call pm_runtime_get_sync()
on the dma device. Something similar maybe to what we have
for phy_pm_runtime_get_sync(). Or just get the device handle for
dma so the consumer can call pm_runtime_get_sync() on it.

> On the other hand: if the usb driver assumes that the dma transfer is
> already finished when issue_pending returned and carry on with
> subsequent request, that is also a problematic assumption. One can only
> consider a transfer to be done if the completion callback is called or
> you have polled for the completion and it tells you the same.
> This is problematic if you are in atomic context as the DMA completion
> interrupt might not come while you are there.

Yeah the musb driver has a history of 14 years of issues.
I guess the logic there has been, it's usb, it can disconnect
at any time..

> imho, this fix is working by lucky constellation of the stars ;)
> Or we can assume that there will never be more than 100ms delay between
> prepare_sg and issue_pending...

Not true. The cpp41 dma is doing the right thing and is not
affected by the PM runtime autosuspend delay. Otherwise USB
hard drives would not work, they can take seconds to spin up :)

The missing part here seems to be the fact that musb continues
before the dma completion is done.

Regards,

Tony
Sergei Shtylyov Oct. 23, 2019, 7:19 p.m. UTC | #9
Hello!

On 10/23/2019 09:58 PM, Andy Shevchenko wrote:

>>> +     int error;
>>> +
>>> +     error = pm_runtime_get(cdd->ddev.dev);
>>> +     if (error < 0) {
>>
>>    I'd call that variable 'status', comparison (error < 0) just doesn't look right.
>> If it was *if* (error), it would have been more correct.
> 
> It's error when it's negative. That's how PM runtime is designed.

   Most of the other code too. However, the RPM code tends to return 
positive values if a call is OK, so you have to go greater lengths and
check a result for < 0. Calling the variable 'error' doesn't seem right
in this context...

>>> +             pm_runtime_put_noidle(cdd->ddev.dev);
>>> +
>>> +             return NULL;
>>> +     }

MBR, Sergei
Peter Ujfalusi Oct. 23, 2019, 7:55 p.m. UTC | #10
Tony,

On 10/23/19 10:18 PM, Tony Lindgren wrote:
>> the autosuspend_delay is set 100 ms, so if you put a udelay(101) between
>> prep_sg and issue_pending in the usb driver this trickery will be for
>> nothing, right?
>> If the usb driver is preempted for longer than 100ms between the two
>> calls, same issue.
>> Not sure, but if for some reason the transfer would take longer than
>> 100ms than pm_runtime will bring down the dma, no?
> 
> No, the dma will happen just fine no matter what the delay is.
> 
> Part of the problem here is the musb driver. It friggin
> continues before checking the completion of a dma transfer!

nice. Well I assume it is from an era when pm_runtime did not even
existed and assumes that things are always available and react without
delay.

> Fixing that currently is not trivial.

I had the pleasure to lurk in for the tusb6010_omap and I fully agree.

>>> But yeah, some dmaengine API that can sleep to tell
>>> a request is about to come would simplify things.
>>
>> any of the prep callbacks kind of indicates that a client is preparing a
>> transfer so in a perfect world it is going to want to execute it..
>>
>>> I don't think we have anything like that available
>>> right now?
>>
>> Well, it would have the same issues. If the time between
>> dmaengine_be_warned_i_m_going_to_call_issue_pending_soon and
>> issue_pending is more than the autosuspend_delay then it is not going to
>> help.
> 
> We'd have to allow dma consumer driver call pm_runtime_get_sync()
> on the dma device. Something similar maybe to what we have
> for phy_pm_runtime_get_sync(). Or just get the device handle for
> dma so the consumer can call pm_runtime_get_sync() on it.

How much a pm_runtime_get_sync(dmadev) is different when it is issued by
the client driver compared to when the dma driver issues it for it's own
device?

>> On the other hand: if the usb driver assumes that the dma transfer is
>> already finished when issue_pending returned and carry on with
>> subsequent request, that is also a problematic assumption. One can only
>> consider a transfer to be done if the completion callback is called or
>> you have polled for the completion and it tells you the same.
>> This is problematic if you are in atomic context as the DMA completion
>> interrupt might not come while you are there.
> 
> Yeah the musb driver has a history of 14 years of issues.
> I guess the logic there has been, it's usb, it can disconnect
> at any time..
> 
>> imho, this fix is working by lucky constellation of the stars ;)
>> Or we can assume that there will never be more than 100ms delay between
>> prepare_sg and issue_pending...
> 
> Not true. The cpp41 dma is doing the right thing and is not
> affected by the PM runtime autosuspend delay.

Yes, what the cppi41 does is (was) correct. If the rpm's work is not
executed between the runtime_get and when the is_suspended flag is
checked in issue_pending, then the resume callback will start the
transfer. That's kind of fine(ish). If cppi41 would be used for audio
then it would not be as the DMA must start when we tell it to start.

> Otherwise USB
> hard drives would not work, they can take seconds to spin up :)

Hrm, true.

> The missing part here seems to be the fact that musb continues
> before the dma completion is done.

But I still fail to see the difference between the events before this
patch and with the case when there is a 100ms delay between prep_sg and
issue_pending.

Before this patch:

prep_sg()
issue_pending() <- runtime_get() /  put_autosuspend()
		   _not_ starting transfer
runtime_resume() <- starts the transfer

With this patch and than 100ms delay between prep_sg and issue_pending:

prep_sg() <- runtime_get() /  put_autosuspend()
runtime_resume() <- not starting transfer
issue_pending() <- runtime_get() /  put_autosuspend()
		   starts the transfer

With this patch, but more than 100ms delay in between:

prep_sg() <- runtime_get() /  put_autosuspend()
runtime_resume() <- not starting transfer
> 100ms delay
runtime_suspend()
issue_pending() <- runtime_get() /  put_autosuspend()
		   _not_ starting transfer
runtime_resume() <- starts the transfer

pm_runtime_get_sync() in issue_pending would be the solution to avoid
delayed execution, but the usb driver should not assume that DMA is
completed as soon as issue_pending returned.

- Peter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
Tony Lindgren Oct. 23, 2019, 8:18 p.m. UTC | #11
* Peter Ujfalusi <peter.ujfalusi@ti.com> [191023 19:55]:
> On 10/23/19 10:18 PM, Tony Lindgren wrote:
> > We'd have to allow dma consumer driver call pm_runtime_get_sync()
> > on the dma device. Something similar maybe to what we have
> > for phy_pm_runtime_get_sync(). Or just get the device handle for
> > dma so the consumer can call pm_runtime_get_sync() on it.
> 
> How much a pm_runtime_get_sync(dmadev) is different when it is issued by
> the client driver compared to when the dma driver issues it for it's own
> device?

Well the consumer device could call pm_runtime_get_sync(dmadev)
when the USB cable is connected for example, and then call
pm_runtime_pu(dmadev) when let's say the USB cable is disconnected.

Without using pm_runtime_irq_safe() we currently don't have a
clear path for doing this where the pm_runtime_get_sync(dmadev)
may sleep.

> But I still fail to see the difference between the events before this
> patch and with the case when there is a 100ms delay between prep_sg and
> issue_pending.
> 
> Before this patch:
> 
> prep_sg()
> issue_pending() <- runtime_get() /  put_autosuspend()
> 		   _not_ starting transfer
> runtime_resume() <- starts the transfer
> 
> With this patch and than 100ms delay between prep_sg and issue_pending:
> 
> prep_sg() <- runtime_get() /  put_autosuspend()
> runtime_resume() <- not starting transfer
> issue_pending() <- runtime_get() /  put_autosuspend()
> 		   starts the transfer
> 
> With this patch, but more than 100ms delay in between:
> 
> prep_sg() <- runtime_get() /  put_autosuspend()
> runtime_resume() <- not starting transfer
> > 100ms delay
> runtime_suspend()
> issue_pending() <- runtime_get() /  put_autosuspend()
> 		   _not_ starting transfer
> runtime_resume() <- starts the transfer
> 
> pm_runtime_get_sync() in issue_pending would be the solution to avoid
> delayed execution, but the usb driver should not assume that DMA is
> completed as soon as issue_pending returned.

Oh I see. Yes the consumer driver would need to check for
the completed dma transfer in all cases. The delay issues
should not currently happen in the musb_ep_program() problem
case as it gets called from IRQ context.

And no, adding pm_runtime_get_sync() to issue_pending is not
a solution. There may be clocks and regulators that need to
be powered up, and we don't want to use pm_runtime_irq_safe()
because of the permanent use count on the parent.

Regards,

Tony
Grygorii Strashko Oct. 23, 2019, 8:55 p.m. UTC | #12
On 23/10/2019 23:18, Tony Lindgren wrote:
> * Peter Ujfalusi <peter.ujfalusi@ti.com> [191023 19:55]:
>> On 10/23/19 10:18 PM, Tony Lindgren wrote:
>>> We'd have to allow dma consumer driver call pm_runtime_get_sync()
>>> on the dma device. Something similar maybe to what we have
>>> for phy_pm_runtime_get_sync(). Or just get the device handle for
>>> dma so the consumer can call pm_runtime_get_sync() on it.
>>
>> How much a pm_runtime_get_sync(dmadev) is different when it is issued by
>> the client driver compared to when the dma driver issues it for it's own
>> device?
> 
> Well the consumer device could call pm_runtime_get_sync(dmadev)
> when the USB cable is connected for example, and then call
> pm_runtime_pu(dmadev) when let's say the USB cable is disconnected.
> 
> Without using pm_runtime_irq_safe() we currently don't have a
> clear path for doing this where the pm_runtime_get_sync(dmadev)
> may sleep.
> 
>> But I still fail to see the difference between the events before this
>> patch and with the case when there is a 100ms delay between prep_sg and
>> issue_pending.
>>
>> Before this patch:
>>
>> prep_sg()
>> issue_pending() <- runtime_get() /  put_autosuspend()
>> 		   _not_ starting transfer
>> runtime_resume() <- starts the transfer
>>
>> With this patch and than 100ms delay between prep_sg and issue_pending:
>>
>> prep_sg() <- runtime_get() /  put_autosuspend()
>> runtime_resume() <- not starting transfer
>> issue_pending() <- runtime_get() /  put_autosuspend()
>> 		   starts the transfer
>>
>> With this patch, but more than 100ms delay in between:
>>
>> prep_sg() <- runtime_get() /  put_autosuspend()
>> runtime_resume() <- not starting transfer
>>> 100ms delay
>> runtime_suspend()
>> issue_pending() <- runtime_get() /  put_autosuspend()
>> 		   _not_ starting transfer
>> runtime_resume() <- starts the transfer
>>
>> pm_runtime_get_sync() in issue_pending would be the solution to avoid
>> delayed execution, but the usb driver should not assume that DMA is
>> completed as soon as issue_pending returned.
> 
> Oh I see. Yes the consumer driver would need to check for
> the completed dma transfer in all cases. The delay issues
> should not currently happen in the musb_ep_program() problem
> case as it gets called from IRQ context.
> 
> And no, adding pm_runtime_get_sync() to issue_pending is not
> a solution. There may be clocks and regulators that need to
> be powered up, and we don't want to use pm_runtime_irq_safe()
> because of the permanent use count on the parent.

5 cents.

I think the right thing might be to get rid of pm_runtime_xxx()
in cppi41_dma_issue_pending(). So overall approach will be:

- new job -> cppi41_dma_prep_slave_sg() -> pm_runtime_get()
- issue_pending: fill backlog if suspended or run_queue if active (pm_runtime_active())
- job done: dmaengine_desc_get_callback_invoke() ->

	dmaengine_desc_get_callback_invoke();
	pm_runtime_mark_last_busy(cdd->ddev.dev);
	pm_runtime_put_autosuspend(cdd->ddev.dev);
   in all places.

It even might allow to get rid of cdd->lock.
Tony Lindgren Oct. 23, 2019, 9:27 p.m. UTC | #13
* Grygorii Strashko <grygorii.strashko@ti.com> [191023 20:56]:
> On 23/10/2019 23:18, Tony Lindgren wrote:
> > And no, adding pm_runtime_get_sync() to issue_pending is not
> > a solution. There may be clocks and regulators that need to
> > be powered up, and we don't want to use pm_runtime_irq_safe()
> > because of the permanent use count on the parent.
> 
> 5 cents.
> 
> I think the right thing might be to get rid of pm_runtime_xxx()
> in cppi41_dma_issue_pending(). So overall approach will be:
> 
> - new job -> cppi41_dma_prep_slave_sg() -> pm_runtime_get()
> - issue_pending: fill backlog if suspended or run_queue if active (pm_runtime_active())
> - job done: dmaengine_desc_get_callback_invoke() ->
> 
> 	dmaengine_desc_get_callback_invoke();
> 	pm_runtime_mark_last_busy(cdd->ddev.dev);
> 	pm_runtime_put_autosuspend(cdd->ddev.dev);
>   in all places.
> 
> It even might allow to get rid of cdd->lock.

Well I don't think cppi41_dma_prep_slave_sg() is necessarily
paired with anything currently. This can potentially leading
to pm_runtime_get() called multiple times?

So I think we'd also need cppi41_dma_cleanup_slave_sg()
or similar, and require they get called in a paired manner.

It might be better to add seprate PM runtime specific
functions that dma consumers can optionally call.

Regards,

Tony
Grygorii Strashko Oct. 23, 2019, 9:43 p.m. UTC | #14
On 24/10/2019 00:27, Tony Lindgren wrote:
> * Grygorii Strashko <grygorii.strashko@ti.com> [191023 20:56]:
>> On 23/10/2019 23:18, Tony Lindgren wrote:
>>> And no, adding pm_runtime_get_sync() to issue_pending is not
>>> a solution. There may be clocks and regulators that need to
>>> be powered up, and we don't want to use pm_runtime_irq_safe()
>>> because of the permanent use count on the parent.
>>
>> 5 cents.
>>
>> I think the right thing might be to get rid of pm_runtime_xxx()
>> in cppi41_dma_issue_pending(). So overall approach will be:
>>
>> - new job -> cppi41_dma_prep_slave_sg() -> pm_runtime_get()
>> - issue_pending: fill backlog if suspended or run_queue if active (pm_runtime_active())
>> - job done: dmaengine_desc_get_callback_invoke() ->
>>
>> 	dmaengine_desc_get_callback_invoke();
>> 	pm_runtime_mark_last_busy(cdd->ddev.dev);
>> 	pm_runtime_put_autosuspend(cdd->ddev.dev);
>>    in all places.
>>
>> It even might allow to get rid of cdd->lock.
> 
> Well I don't think cppi41_dma_prep_slave_sg() is necessarily
> paired with anything currently.

It should - dma cmpletion callbacks have to be called somewhere.


  This can potentially leading
> to pm_runtime_get() called multiple times?

That's the idea - increase pm_counter as many times as jobs submitted.

> 
> So I think we'd also need cppi41_dma_cleanup_slave_sg()
> or similar, and require they get called in a paired manner.
> 
> It might be better to add seprate PM runtime specific
> functions that dma consumers can optionally call.

not sure here. It seems just over designed very old stuff ;)
Tony Lindgren Oct. 23, 2019, 9:49 p.m. UTC | #15
* Grygorii Strashko <grygorii.strashko@ti.com> [191023 21:43]:
> 
> 
> On 24/10/2019 00:27, Tony Lindgren wrote:
> > * Grygorii Strashko <grygorii.strashko@ti.com> [191023 20:56]:
> > > On 23/10/2019 23:18, Tony Lindgren wrote:
> > > > And no, adding pm_runtime_get_sync() to issue_pending is not
> > > > a solution. There may be clocks and regulators that need to
> > > > be powered up, and we don't want to use pm_runtime_irq_safe()
> > > > because of the permanent use count on the parent.
> > > 
> > > 5 cents.
> > > 
> > > I think the right thing might be to get rid of pm_runtime_xxx()
> > > in cppi41_dma_issue_pending(). So overall approach will be:
> > > 
> > > - new job -> cppi41_dma_prep_slave_sg() -> pm_runtime_get()
> > > - issue_pending: fill backlog if suspended or run_queue if active (pm_runtime_active())
> > > - job done: dmaengine_desc_get_callback_invoke() ->
> > > 
> > > 	dmaengine_desc_get_callback_invoke();
> > > 	pm_runtime_mark_last_busy(cdd->ddev.dev);
> > > 	pm_runtime_put_autosuspend(cdd->ddev.dev);
> > >    in all places.
> > > 
> > > It even might allow to get rid of cdd->lock.
> > 
> > Well I don't think cppi41_dma_prep_slave_sg() is necessarily
> > paired with anything currently.
> 
> It should - dma cmpletion callbacks have to be called somewhere.

Well what I meant is there's no guarantee that we have
cppi41_dma_issue_pending() followed by cppi41_dma_prep_slave_sg()
currently :)

>  This can potentially leading
> > to pm_runtime_get() called multiple times?
> 
> That's the idea - increase pm_counter as many times as jobs submitted.

Right, but that needs to be done in a paired manner so the
API is clear to everyone and does not lead into unpaired
PM runtime calls.

Regards,

Tony
Peter Ujfalusi Oct. 24, 2019, 8:52 a.m. UTC | #16
On 23/10/2019 23.18, Tony Lindgren wrote:
> * Peter Ujfalusi <peter.ujfalusi@ti.com> [191023 19:55]:
>> On 10/23/19 10:18 PM, Tony Lindgren wrote:
>>> We'd have to allow dma consumer driver call pm_runtime_get_sync()
>>> on the dma device. Something similar maybe to what we have
>>> for phy_pm_runtime_get_sync(). Or just get the device handle for
>>> dma so the consumer can call pm_runtime_get_sync() on it.
>>
>> How much a pm_runtime_get_sync(dmadev) is different when it is issued by
>> the client driver compared to when the dma driver issues it for it's own
>> device?
> 
> Well the consumer device could call pm_runtime_get_sync(dmadev)
> when the USB cable is connected for example, and then call
> pm_runtime_pu(dmadev) when let's say the USB cable is disconnected.

And the USB cable connect/disconnect is handled in interrupt -> you need
to call pm_runtime_get_sync(dmadev) from interrupt context and need to
mark the dmadev to pm_runtime_irq_safe()

> Without using pm_runtime_irq_safe() we currently don't have a
> clear path for doing this where the pm_runtime_get_sync(dmadev)
> may sleep.
> 
>> But I still fail to see the difference between the events before this
>> patch and with the case when there is a 100ms delay between prep_sg and
>> issue_pending.
>>
>> Before this patch:
>>
>> prep_sg()
>> issue_pending() <- runtime_get() /  put_autosuspend()
>> 		   _not_ starting transfer
>> runtime_resume() <- starts the transfer
>>
>> With this patch and than 100ms delay between prep_sg and issue_pending:
>>
>> prep_sg() <- runtime_get() /  put_autosuspend()
>> runtime_resume() <- not starting transfer
>> issue_pending() <- runtime_get() /  put_autosuspend()
>> 		   starts the transfer
>>
>> With this patch, but more than 100ms delay in between:
>>
>> prep_sg() <- runtime_get() /  put_autosuspend()
>> runtime_resume() <- not starting transfer
>>> 100ms delay
>> runtime_suspend()
>> issue_pending() <- runtime_get() /  put_autosuspend()
>> 		   _not_ starting transfer
>> runtime_resume() <- starts the transfer
>>
>> pm_runtime_get_sync() in issue_pending would be the solution to avoid
>> delayed execution, but the usb driver should not assume that DMA is
>> completed as soon as issue_pending returned.
> 
> Oh I see. Yes the consumer driver would need to check for
> the completed dma transfer in all cases. The delay issues
> should not currently happen in the musb_ep_program() problem
> case as it gets called from IRQ context.

the cppi41 driver solely relies on irq to check is the transfer is
completed (based on the cookie status).
So yeah, musb have no other choice than trust that the transfer is done
in a timely manner.

> And no, adding pm_runtime_get_sync() to issue_pending is not
> a solution. There may be clocks and regulators that need to
> be powered up, and we don't want to use pm_runtime_irq_safe()
> because of the permanent use count on the parent.

I think the only way to handle this is to keep the DMA enabled as long
as the USB cable is connected.
Either to introduce dma_pm_runtime_get_sync(struct dma_chan *c) and
dma_pm_runtime_put(struct dma_chan *c) or some better name.

It's use would be optional, but for USB you would call them for cable
connect the get_sync (from a work) and put it on disconnect.

The driver internally would not need to be changed, I think this patch
could be removed as well.

- Péter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
Andy Shevchenko Oct. 24, 2019, 11:22 a.m. UTC | #17
On Thu, Oct 24, 2019 at 11:51 AM Peter Ujfalusi <peter.ujfalusi@ti.com> wrote:
> On 23/10/2019 23.18, Tony Lindgren wrote:

> And the USB cable connect/disconnect is handled in interrupt -> you need
> to call pm_runtime_get_sync(dmadev) from interrupt context and need to
> mark the dmadev to pm_runtime_irq_safe()

Side note: and please don't spread the pm_runtime_irq_safe() hack more...
I think Tony is quite aware of this.
Tony Lindgren Oct. 24, 2019, 2 p.m. UTC | #18
* Andy Shevchenko <andy.shevchenko@gmail.com> [191024 11:23]:
> On Thu, Oct 24, 2019 at 11:51 AM Peter Ujfalusi <peter.ujfalusi@ti.com> wrote:
> > On 23/10/2019 23.18, Tony Lindgren wrote:
> 
> > And the USB cable connect/disconnect is handled in interrupt -> you need
> > to call pm_runtime_get_sync(dmadev) from interrupt context and need to
> > mark the dmadev to pm_runtime_irq_safe()
> 
> Side note: and please don't spread the pm_runtime_irq_safe() hack more...
> I think Tony is quite aware of this.

I agree, let's not add any more pm_runtime_irq_safe() at all. And let's get
rid of all the existing users of it.

Using pm_runtime_irq_safe() causes a problem for implementing genpd because
of the permanent use count on the parent device. And then there is the
serial driver issue with console use.

In the musb case, we should be able to enable/disable cppi41 dma from
musb_pm_runtime_check_session() without a need for pm_runtime_irq_safe().

We probably still want to keep the $subject patch for the state check and
return NULL if cppi41 is not active. But should be able to remove the
PM runtime calls in the $subject patch.

Regards,

Tony
diff mbox series

Patch

diff --git a/drivers/dma/ti/cppi41.c b/drivers/dma/ti/cppi41.c
--- a/drivers/dma/ti/cppi41.c
+++ b/drivers/dma/ti/cppi41.c
@@ -586,9 +586,22 @@  static struct dma_async_tx_descriptor *cppi41_dma_prep_slave_sg(
 	enum dma_transfer_direction dir, unsigned long tx_flags, void *context)
 {
 	struct cppi41_channel *c = to_cpp41_chan(chan);
+	struct dma_async_tx_descriptor *txd = NULL;
+	struct cppi41_dd *cdd = c->cdd;
 	struct cppi41_desc *d;
 	struct scatterlist *sg;
 	unsigned int i;
+	int error;
+
+	error = pm_runtime_get(cdd->ddev.dev);
+	if (error < 0) {
+		pm_runtime_put_noidle(cdd->ddev.dev);
+
+		return NULL;
+	}
+
+	if (cdd->is_suspended)
+		goto err_out_not_ready;
 
 	d = c->desc;
 	for_each_sg(sgl, sg, sg_len, i) {
@@ -611,7 +624,13 @@  static struct dma_async_tx_descriptor *cppi41_dma_prep_slave_sg(
 		d++;
 	}
 
-	return &c->txd;
+	txd = &c->txd;
+
+err_out_not_ready:
+	pm_runtime_mark_last_busy(cdd->ddev.dev);
+	pm_runtime_put_autosuspend(cdd->ddev.dev);
+
+	return txd;
 }
 
 static void cppi41_compute_td_desc(struct cppi41_desc *d)