diff mbox

[RESEND,1/2] spi: rockchip: Convert to late and early system PM callbacks

Message ID 20171013104138.3216-2-jeffy.chen@rock-chips.com (mailing list archive)
State New, archived
Headers show

Commit Message

Jeffy Chen Oct. 13, 2017, 10:41 a.m. UTC
Currently we are suspending the spi master in it's ->suspend callback,
which is racy as some other drivers may still want to transmit messages
on the bus(e.g. spi based pwm backlight).

Convert to late and early system PM callbacks to avoid the race.

Signed-off-by: Jeffy Chen <jeffy.chen@rock-chips.com>
---

 drivers/spi/spi-rockchip.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Doug Anderson Oct. 13, 2017, 3:32 p.m. UTC | #1
Hi,

On Fri, Oct 13, 2017 at 3:41 AM, Jeffy Chen <jeffy.chen@rock-chips.com> wrote:
> Currently we are suspending the spi master in it's ->suspend callback,
> which is racy as some other drivers may still want to transmit messages
> on the bus(e.g. spi based pwm backlight).
>
> Convert to late and early system PM callbacks to avoid the race.
>
> Signed-off-by: Jeffy Chen <jeffy.chen@rock-chips.com>
> ---
>
>  drivers/spi/spi-rockchip.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

It shouldn't hurt to do this, but I'm curious if you did any digging
about why this happens?  As I understood it suspend order is supposed
to be opposite of probe order.  Thus anything that was able to get a
reference to the cros-ec PWM at its probe time should get suspended
before cros-ec suspends.


-Doug
Brian Norris Oct. 13, 2017, 3:51 p.m. UTC | #2
Hi,

On Fri, Oct 13, 2017 at 08:32:12AM -0700, Doug Anderson wrote:
> Hi,
> 
> On Fri, Oct 13, 2017 at 3:41 AM, Jeffy Chen <jeffy.chen@rock-chips.com> wrote:
> > Currently we are suspending the spi master in it's ->suspend callback,
> > which is racy as some other drivers may still want to transmit messages
> > on the bus(e.g. spi based pwm backlight).
> >
> > Convert to late and early system PM callbacks to avoid the race.
> >
> > Signed-off-by: Jeffy Chen <jeffy.chen@rock-chips.com>
> > ---
> >
> >  drivers/spi/spi-rockchip.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> It shouldn't hurt to do this, but I'm curious if you did any digging
> about why this happens?  As I understood it suspend order is supposed
> to be opposite of probe order.  Thus anything that was able to get a
> reference to the cros-ec PWM at its probe time should get suspended
> before cros-ec suspends.

Yes, this does seem odd to me too. This looks like an arms race hack
that should be avoided unless we know a legit root cause. Also,
"probe order implies suspend order" doesn't quite work for async suspend
anyway, so we'd probably want to express the dependency properly
anyway.

Any chance this is related? Seems like that might break the parent/child
relationship for master/slave:

commit d7e2ee257038baeb03baef602500368a51ee9eef
Author: Linus Walleij <linus.walleij@linaro.org>
Date:   Mon Apr 11 13:51:03 2016 +0200

    spi: let SPI masters ignore their children for PM

Brian
Mark Brown Oct. 13, 2017, 4:42 p.m. UTC | #3
On Fri, Oct 13, 2017 at 08:51:21AM -0700, Brian Norris wrote:

> Yes, this does seem odd to me too. This looks like an arms race hack
> that should be avoided unless we know a legit root cause. Also,
> "probe order implies suspend order" doesn't quite work for async suspend
> anyway, so we'd probably want to express the dependency properly
> anyway.

Yeah, it's the same stuff as we get with initcall ordering.  This sort
of thing does happen with things like PMICs which tend to have hardware
that the system wants to manipulate in the IRQs off part of suspend.
Ideally the dependency annotation stuff would figure things out though
I'm not sure what the status of that is.

> Any chance this is related? Seems like that might break the parent/child
> relationship for master/slave:

> commit d7e2ee257038baeb03baef602500368a51ee9eef
> Author: Linus Walleij <linus.walleij@linaro.org>
> Date:   Mon Apr 11 13:51:03 2016 +0200

>     spi: let SPI masters ignore their children for PM

That's for runtime PM, I'd not expect it to affect system suspend.
Jeffy Chen Oct. 13, 2017, 6:19 p.m. UTC | #4
Hi guys,

it looks like the suspend sequence depends on the dt node sequence, and 
we are putting display-subsystem dt node above spi dt node, so it would 
be earlier in the device list, then got suspended later than spi device.

the pwm backlight and cros_ec_spi pwm are very interesting, not only 
about suspend dependency... if we unbind cros_ec_spi pwm, the pwm 
backlight would still hold a reference to it, and crash the kernel later.

On 10/14/2017 12:42 AM, Mark Brown wrote:
> On Fri, Oct 13, 2017 at 08:51:21AM -0700, Brian Norris wrote:
>
>> Yes, this does seem odd to me too. This looks like an arms race hack
>> that should be avoided unless we know a legit root cause. Also,
>> "probe order implies suspend order" doesn't quite work for async suspend
>> anyway, so we'd probably want to express the dependency properly
>> anyway.
>
> Yeah, it's the same stuff as we get with initcall ordering.  This sort
> of thing does happen with things like PMICs which tend to have hardware
> that the system wants to manipulate in the IRQs off part of suspend.
> Ideally the dependency annotation stuff would figure things out though
> I'm not sure what the status of that is.
>
>> Any chance this is related? Seems like that might break the parent/child
>> relationship for master/slave:
>
>> commit d7e2ee257038baeb03baef602500368a51ee9eef
>> Author: Linus Walleij <linus.walleij@linaro.org>
>> Date:   Mon Apr 11 13:51:03 2016 +0200
>
>>      spi: let SPI masters ignore their children for PM
>
> That's for runtime PM, I'd not expect it to affect system suspend.
>
Jeffy Chen Oct. 13, 2017, 6:25 p.m. UTC | #5
On 10/14/2017 02:19 AM, jeffy wrote:
>
>
> it looks like the suspend sequence depends on the dt node sequence, and
> we are putting display-subsystem dt node above spi dt node, so it would
> be earlier in the device list, then got suspended later than spi device.
>
> the pwm backlight and cros_ec_spi pwm are very interesting, not only
> about suspend dependency... if we unbind cros_ec_spi pwm, the pwm
> backlight would still hold a reference to it, and crash the kernel later.

or maybe we should move device_pm_add() from device_add() to driver_bound()?
Dmitry Torokhov Oct. 13, 2017, 6:30 p.m. UTC | #6
On Sat, Oct 14, 2017 at 02:19:28AM +0800, jeffy wrote:
> Hi guys,
> 
> it looks like the suspend sequence depends on the dt node sequence, and we
> are putting display-subsystem dt node above spi dt node, so it would be
> earlier in the device list, then got suspended later than spi device.

Would it not get a deferral when trying to get resource reference, which
would cause it bumped down to the end of dpm list?

> 
> the pwm backlight and cros_ec_spi pwm are very interesting, not only about
> suspend dependency... if we unbind cros_ec_spi pwm, the pwm backlight would
> still hold a reference to it, and crash the kernel later.

That would be a bug in PWM/cors_ec and it should keep the PWM object
until last reference drops and simply error out on all requests.

> 
> On 10/14/2017 12:42 AM, Mark Brown wrote:
> > On Fri, Oct 13, 2017 at 08:51:21AM -0700, Brian Norris wrote:
> > 
> > > Yes, this does seem odd to me too. This looks like an arms race hack
> > > that should be avoided unless we know a legit root cause. Also,
> > > "probe order implies suspend order" doesn't quite work for async suspend
> > > anyway, so we'd probably want to express the dependency properly
> > > anyway.
> > 
> > Yeah, it's the same stuff as we get with initcall ordering.  This sort
> > of thing does happen with things like PMICs which tend to have hardware
> > that the system wants to manipulate in the IRQs off part of suspend.
> > Ideally the dependency annotation stuff would figure things out though
> > I'm not sure what the status of that is.

I'd say non-existent for resources such as regulators, pwms, clocks,
etc. I do not think many places call device_link_add()... I think adding
this to devm_* APIs might be easiest to get the ball going as they
naturally have consumer device and can easily figure out the supplier
side.

> > 
> > > Any chance this is related? Seems like that might break the parent/child
> > > relationship for master/slave:
> > 
> > > commit d7e2ee257038baeb03baef602500368a51ee9eef
> > > Author: Linus Walleij <linus.walleij@linaro.org>
> > > Date:   Mon Apr 11 13:51:03 2016 +0200
> > 
> > >      spi: let SPI masters ignore their children for PM
> > 
> > That's for runtime PM, I'd not expect it to affect system suspend.
> > 
> 
> 

Thanks.
Dmitry Torokhov Oct. 13, 2017, 6:30 p.m. UTC | #7
On Sat, Oct 14, 2017 at 02:25:48AM +0800, jeffy wrote:
> On 10/14/2017 02:19 AM, jeffy wrote:
> > 
> > 
> > it looks like the suspend sequence depends on the dt node sequence, and
> > we are putting display-subsystem dt node above spi dt node, so it would
> > be earlier in the device list, then got suspended later than spi device.
> > 
> > the pwm backlight and cros_ec_spi pwm are very interesting, not only
> > about suspend dependency... if we unbind cros_ec_spi pwm, the pwm
> > backlight would still hold a reference to it, and crash the kernel later.
> 
> or maybe we should move device_pm_add() from device_add() to driver_bound()?

You do not necessarily need to have a driver to power the dveice on and
off.
Mark Brown Oct. 13, 2017, 6:37 p.m. UTC | #8
On Fri, Oct 13, 2017 at 11:30:06AM -0700, Dmitry Torokhov wrote:
> On Sat, Oct 14, 2017 at 02:19:28AM +0800, jeffy wrote:

> > > Yeah, it's the same stuff as we get with initcall ordering.  This sort
> > > of thing does happen with things like PMICs which tend to have hardware
> > > that the system wants to manipulate in the IRQs off part of suspend.
> > > Ideally the dependency annotation stuff would figure things out though
> > > I'm not sure what the status of that is.

> I'd say non-existent for resources such as regulators, pwms, clocks,
> etc. I do not think many places call device_link_add()... I think adding
> this to devm_* APIs might be easiest to get the ball going as they
> naturally have consumer device and can easily figure out the supplier
> side.

Hrm, are things in a state where we're supposed to be doing that?  I've
not seen anyone even trying which is a bit odd, I'd thought there was
some core work still ongoing (and it wasn't clear to me if we were going
to try to do things like use DT/ACPI cross references to figure this
stuff out).
Jeffy Chen Oct. 13, 2017, 6:44 p.m. UTC | #9
Hi Dmitry,

On 10/14/2017 02:30 AM, Dmitry Torokhov wrote:
> On Sat, Oct 14, 2017 at 02:19:28AM +0800, jeffy wrote:
>> Hi guys,
>>
>> it looks like the suspend sequence depends on the dt node sequence, and we
>> are putting display-subsystem dt node above spi dt node, so it would be
>> earlier in the device list, then got suspended later than spi device.
>
> Would it not get a deferral when trying to get resource reference, which
> would cause it bumped down to the end of dpm list?
hmm, right, check again, the rockchip drm would not depend on spi, but 
the edp driver does.

so the drm driver(display-subsystem) would probed before spi, but try to 
control the backlight in the suspend/resume...

so i was wrong in the commit message, will fix it in next version.
>
>>
>> the pwm backlight and cros_ec_spi pwm are very interesting, not only about
>> suspend dependency... if we unbind cros_ec_spi pwm, the pwm backlight would
>> still hold a reference to it, and crash the kernel later.
>
> That would be a bug in PWM/cors_ec and it should keep the PWM object
> until last reference drops and simply error out on all requests.
right, and maybe try to refresh the pwm reference when we bind it again
>
>>
>> On 10/14/2017 12:42 AM, Mark Brown wrote:
>>> On Fri, Oct 13, 2017 at 08:51:21AM -0700, Brian Norris wrote:
>>>
>>>> Yes, this does seem odd to me too. This looks like an arms race hack
>>>> that should be avoided unless we know a legit root cause. Also,
>>>> "probe order implies suspend order" doesn't quite work for async suspend
>>>> anyway, so we'd probably want to express the dependency properly
>>>> anyway.
>>>
>>> Yeah, it's the same stuff as we get with initcall ordering.  This sort
>>> of thing does happen with things like PMICs which tend to have hardware
>>> that the system wants to manipulate in the IRQs off part of suspend.
>>> Ideally the dependency annotation stuff would figure things out though
>>> I'm not sure what the status of that is.
>
> I'd say non-existent for resources such as regulators, pwms, clocks,
> etc. I do not think many places call device_link_add()... I think adding
> this to devm_* APIs might be easiest to get the ball going as they
> naturally have consumer device and can easily figure out the supplier
> side.
>
>>>
>>>> Any chance this is related? Seems like that might break the parent/child
>>>> relationship for master/slave:
>>>
>>>> commit d7e2ee257038baeb03baef602500368a51ee9eef
>>>> Author: Linus Walleij <linus.walleij@linaro.org>
>>>> Date:   Mon Apr 11 13:51:03 2016 +0200
>>>
>>>>       spi: let SPI masters ignore their children for PM
>>>
>>> That's for runtime PM, I'd not expect it to affect system suspend.
>>>
>>
>>
>
> Thanks.
>
Jeffy Chen Oct. 13, 2017, 7:01 p.m. UTC | #10
Hi guys,

so what happens here is:
1/ we put display-subsystem dt node before spi node, which cause 
rockchip drm driver probed before spi(also before edp driver/vop driver...)

2/ rockchip drm driver bound after spi/edp/vop... drivers probed

3/ in rockchip drm driver's resume callback, it would try to enable edp 
panel backlight(through spi), but spi master is still suspended, then we 
got these errors:

1970-01-01T08:02:59.607315+08:00 ERR kernel: [  178.754005] cros-ec-spi 
spi2.0: spi
  transfer failed: -108
1970-01-01T08:02:59.607320+08:00 ERR kernel: [  178.760102] cros-ec-spi 
spi2.0: cs-
deassert spi transfer failed: -108
1970-01-01T08:02:59.607325+08:00 ERR kernel: [  178.767380] cros-ec-spi 
spi2.0: Com
mand xfer error (err:-108)
1970-01-01T08:02:59.607331+08:00 ERR kernel: [  178.773963] cros-ec-spi 
spi2.0: spi
  transfer failed: -108
1970-01-01T08:02:59.607336+08:00 ERR kernel: [  178.780066] cros-ec-spi 
spi2.0: cs-
deassert spi transfer failed: -108
1970-01-01T08:02:59.607341+08:00 ERR kernel: [  178.787359] cros-ec-spi 
spi2.0: Com
mand xfer error (err:-108)

so other than move spi master suspend to late suspend, maybe we could 
defer rockchip drm driver probe after it's component drivers somehow?



On 10/14/2017 02:44 AM, jeffy wrote:
> Hi Dmitry,
>
> On 10/14/2017 02:30 AM, Dmitry Torokhov wrote:
>> On Sat, Oct 14, 2017 at 02:19:28AM +0800, jeffy wrote:
>>> Hi guys,
>>>
>>> it looks like the suspend sequence depends on the dt node sequence,
>>> and we
>>> are putting display-subsystem dt node above spi dt node, so it would be
>>> earlier in the device list, then got suspended later than spi device.
>>
>> Would it not get a deferral when trying to get resource reference, which
>> would cause it bumped down to the end of dpm list?
> hmm, right, check again, the rockchip drm would not depend on spi, but
> the edp driver does.
>
> so the drm driver(display-subsystem) would probed before spi, but try to
> control the backlight in the suspend/resume...
>
> so i was wrong in the commit message, will fix it in next version.
>>
>>>
>>> the pwm backlight and cros_ec_spi pwm are very interesting, not only
>>> about
>>> suspend dependency... if we unbind cros_ec_spi pwm, the pwm backlight
>>> would
>>> still hold a reference to it, and crash the kernel later.
>>
>> That would be a bug in PWM/cors_ec and it should keep the PWM object
>> until last reference drops and simply error out on all requests.
> right, and maybe try to refresh the pwm reference when we bind it again
>>
>>>
>>> On 10/14/2017 12:42 AM, Mark Brown wrote:
>>>> On Fri, Oct 13, 2017 at 08:51:21AM -0700, Brian Norris wrote:
>>>>
>>>>> Yes, this does seem odd to me too. This looks like an arms race hack
>>>>> that should be avoided unless we know a legit root cause. Also,
>>>>> "probe order implies suspend order" doesn't quite work for async
>>>>> suspend
>>>>> anyway, so we'd probably want to express the dependency properly
>>>>> anyway.
>>>>
>>>> Yeah, it's the same stuff as we get with initcall ordering.  This sort
>>>> of thing does happen with things like PMICs which tend to have hardware
>>>> that the system wants to manipulate in the IRQs off part of suspend.
>>>> Ideally the dependency annotation stuff would figure things out though
>>>> I'm not sure what the status of that is.
>>
>> I'd say non-existent for resources such as regulators, pwms, clocks,
>> etc. I do not think many places call device_link_add()... I think adding
>> this to devm_* APIs might be easiest to get the ball going as they
>> naturally have consumer device and can easily figure out the supplier
>> side.
>>
>>>>
>>>>> Any chance this is related? Seems like that might break the
>>>>> parent/child
>>>>> relationship for master/slave:
>>>>
>>>>> commit d7e2ee257038baeb03baef602500368a51ee9eef
>>>>> Author: Linus Walleij <linus.walleij@linaro.org>
>>>>> Date:   Mon Apr 11 13:51:03 2016 +0200
>>>>
>>>>>       spi: let SPI masters ignore their children for PM
>>>>
>>>> That's for runtime PM, I'd not expect it to affect system suspend.
>>>>
>>>
>>>
>>
>> Thanks.
>>
>
Dmitry Torokhov Oct. 13, 2017, 11:45 p.m. UTC | #11
On Fri, Oct 13, 2017 at 07:37:50PM +0100, Mark Brown wrote:
> On Fri, Oct 13, 2017 at 11:30:06AM -0700, Dmitry Torokhov wrote:
> > On Sat, Oct 14, 2017 at 02:19:28AM +0800, jeffy wrote:
> 
> > > > Yeah, it's the same stuff as we get with initcall ordering.  This sort
> > > > of thing does happen with things like PMICs which tend to have hardware
> > > > that the system wants to manipulate in the IRQs off part of suspend.
> > > > Ideally the dependency annotation stuff would figure things out though
> > > > I'm not sure what the status of that is.
> 
> > I'd say non-existent for resources such as regulators, pwms, clocks,
> > etc. I do not think many places call device_link_add()... I think adding
> > this to devm_* APIs might be easiest to get the ball going as they
> > naturally have consumer device and can easily figure out the supplier
> > side.
> 
> Hrm, are things in a state where we're supposed to be doing that?  I've
> not seen anyone even trying which is a bit odd, I'd thought there was
> some core work still ongoing (and it wasn't clear to me if we were going
> to try to do things like use DT/ACPI cross references to figure this
> stuff out).

Not sure, adding Rafael...
Rafael J. Wysocki Oct. 14, 2017, 12:15 a.m. UTC | #12
On Saturday, October 14, 2017 1:45:11 AM CEST Dmitry Torokhov wrote:
> On Fri, Oct 13, 2017 at 07:37:50PM +0100, Mark Brown wrote:
> > On Fri, Oct 13, 2017 at 11:30:06AM -0700, Dmitry Torokhov wrote:
> > > On Sat, Oct 14, 2017 at 02:19:28AM +0800, jeffy wrote:
> > 
> > > > > Yeah, it's the same stuff as we get with initcall ordering.  This sort
> > > > > of thing does happen with things like PMICs which tend to have hardware
> > > > > that the system wants to manipulate in the IRQs off part of suspend.
> > > > > Ideally the dependency annotation stuff would figure things out though
> > > > > I'm not sure what the status of that is.
> > 
> > > I'd say non-existent for resources such as regulators, pwms, clocks,
> > > etc. I do not think many places call device_link_add()... I think adding
> > > this to devm_* APIs might be easiest to get the ball going as they
> > > naturally have consumer device and can easily figure out the supplier
> > > side.
> > 
> > Hrm, are things in a state where we're supposed to be doing that?  I've
> > not seen anyone even trying which is a bit odd, I'd thought there was
> > some core work still ongoing (and it wasn't clear to me if we were going
> > to try to do things like use DT/ACPI cross references to figure this
> > stuff out).
> 
> Not sure, adding Rafael...

The infrastructure in the core is there, the part related to getting that
information from firmware is (mostly) missing.

Still, moving system suspend/resume callbacks to the late/early phases may
be a good idea anyway.

Thanks,
Rafael
Mark Brown Oct. 14, 2017, 8:59 a.m. UTC | #13
On Sat, Oct 14, 2017 at 02:15:34AM +0200, Rafael J. Wysocki wrote:

> The infrastructure in the core is there, the part related to getting that
> information from firmware is (mostly) missing.

Ah, yes - now I remember.  We want this for probe ordering so we don't
really want to be doing it in the subsystems when they request resources
as that's too late (though I guess duplication there won't hurt).  Fun.
diff mbox

Patch

diff --git a/drivers/spi/spi-rockchip.c b/drivers/spi/spi-rockchip.c
index fdcf3076681b..ae539c735ea6 100644
--- a/drivers/spi/spi-rockchip.c
+++ b/drivers/spi/spi-rockchip.c
@@ -914,7 +914,7 @@  static int rockchip_spi_runtime_resume(struct device *dev)
 #endif /* CONFIG_PM */
 
 static const struct dev_pm_ops rockchip_spi_pm = {
-	SET_SYSTEM_SLEEP_PM_OPS(rockchip_spi_suspend, rockchip_spi_resume)
+	SET_LATE_SYSTEM_SLEEP_PM_OPS(rockchip_spi_suspend, rockchip_spi_resume)
 	SET_RUNTIME_PM_OPS(rockchip_spi_runtime_suspend,
 			   rockchip_spi_runtime_resume, NULL)
 };