Message ID | 20171013104138.3216-2-jeffy.chen@rock-chips.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hi, On Fri, Oct 13, 2017 at 3:41 AM, Jeffy Chen <jeffy.chen@rock-chips.com> wrote: > Currently we are suspending the spi master in it's ->suspend callback, > which is racy as some other drivers may still want to transmit messages > on the bus(e.g. spi based pwm backlight). > > Convert to late and early system PM callbacks to avoid the race. > > Signed-off-by: Jeffy Chen <jeffy.chen@rock-chips.com> > --- > > drivers/spi/spi-rockchip.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) It shouldn't hurt to do this, but I'm curious if you did any digging about why this happens? As I understood it suspend order is supposed to be opposite of probe order. Thus anything that was able to get a reference to the cros-ec PWM at its probe time should get suspended before cros-ec suspends. -Doug
Hi, On Fri, Oct 13, 2017 at 08:32:12AM -0700, Doug Anderson wrote: > Hi, > > On Fri, Oct 13, 2017 at 3:41 AM, Jeffy Chen <jeffy.chen@rock-chips.com> wrote: > > Currently we are suspending the spi master in it's ->suspend callback, > > which is racy as some other drivers may still want to transmit messages > > on the bus(e.g. spi based pwm backlight). > > > > Convert to late and early system PM callbacks to avoid the race. > > > > Signed-off-by: Jeffy Chen <jeffy.chen@rock-chips.com> > > --- > > > > drivers/spi/spi-rockchip.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > It shouldn't hurt to do this, but I'm curious if you did any digging > about why this happens? As I understood it suspend order is supposed > to be opposite of probe order. Thus anything that was able to get a > reference to the cros-ec PWM at its probe time should get suspended > before cros-ec suspends. Yes, this does seem odd to me too. This looks like an arms race hack that should be avoided unless we know a legit root cause. Also, "probe order implies suspend order" doesn't quite work for async suspend anyway, so we'd probably want to express the dependency properly anyway. Any chance this is related? Seems like that might break the parent/child relationship for master/slave: commit d7e2ee257038baeb03baef602500368a51ee9eef Author: Linus Walleij <linus.walleij@linaro.org> Date: Mon Apr 11 13:51:03 2016 +0200 spi: let SPI masters ignore their children for PM Brian
On Fri, Oct 13, 2017 at 08:51:21AM -0700, Brian Norris wrote: > Yes, this does seem odd to me too. This looks like an arms race hack > that should be avoided unless we know a legit root cause. Also, > "probe order implies suspend order" doesn't quite work for async suspend > anyway, so we'd probably want to express the dependency properly > anyway. Yeah, it's the same stuff as we get with initcall ordering. This sort of thing does happen with things like PMICs which tend to have hardware that the system wants to manipulate in the IRQs off part of suspend. Ideally the dependency annotation stuff would figure things out though I'm not sure what the status of that is. > Any chance this is related? Seems like that might break the parent/child > relationship for master/slave: > commit d7e2ee257038baeb03baef602500368a51ee9eef > Author: Linus Walleij <linus.walleij@linaro.org> > Date: Mon Apr 11 13:51:03 2016 +0200 > spi: let SPI masters ignore their children for PM That's for runtime PM, I'd not expect it to affect system suspend.
Hi guys, it looks like the suspend sequence depends on the dt node sequence, and we are putting display-subsystem dt node above spi dt node, so it would be earlier in the device list, then got suspended later than spi device. the pwm backlight and cros_ec_spi pwm are very interesting, not only about suspend dependency... if we unbind cros_ec_spi pwm, the pwm backlight would still hold a reference to it, and crash the kernel later. On 10/14/2017 12:42 AM, Mark Brown wrote: > On Fri, Oct 13, 2017 at 08:51:21AM -0700, Brian Norris wrote: > >> Yes, this does seem odd to me too. This looks like an arms race hack >> that should be avoided unless we know a legit root cause. Also, >> "probe order implies suspend order" doesn't quite work for async suspend >> anyway, so we'd probably want to express the dependency properly >> anyway. > > Yeah, it's the same stuff as we get with initcall ordering. This sort > of thing does happen with things like PMICs which tend to have hardware > that the system wants to manipulate in the IRQs off part of suspend. > Ideally the dependency annotation stuff would figure things out though > I'm not sure what the status of that is. > >> Any chance this is related? Seems like that might break the parent/child >> relationship for master/slave: > >> commit d7e2ee257038baeb03baef602500368a51ee9eef >> Author: Linus Walleij <linus.walleij@linaro.org> >> Date: Mon Apr 11 13:51:03 2016 +0200 > >> spi: let SPI masters ignore their children for PM > > That's for runtime PM, I'd not expect it to affect system suspend. >
On 10/14/2017 02:19 AM, jeffy wrote: > > > it looks like the suspend sequence depends on the dt node sequence, and > we are putting display-subsystem dt node above spi dt node, so it would > be earlier in the device list, then got suspended later than spi device. > > the pwm backlight and cros_ec_spi pwm are very interesting, not only > about suspend dependency... if we unbind cros_ec_spi pwm, the pwm > backlight would still hold a reference to it, and crash the kernel later. or maybe we should move device_pm_add() from device_add() to driver_bound()?
On Sat, Oct 14, 2017 at 02:19:28AM +0800, jeffy wrote: > Hi guys, > > it looks like the suspend sequence depends on the dt node sequence, and we > are putting display-subsystem dt node above spi dt node, so it would be > earlier in the device list, then got suspended later than spi device. Would it not get a deferral when trying to get resource reference, which would cause it bumped down to the end of dpm list? > > the pwm backlight and cros_ec_spi pwm are very interesting, not only about > suspend dependency... if we unbind cros_ec_spi pwm, the pwm backlight would > still hold a reference to it, and crash the kernel later. That would be a bug in PWM/cors_ec and it should keep the PWM object until last reference drops and simply error out on all requests. > > On 10/14/2017 12:42 AM, Mark Brown wrote: > > On Fri, Oct 13, 2017 at 08:51:21AM -0700, Brian Norris wrote: > > > > > Yes, this does seem odd to me too. This looks like an arms race hack > > > that should be avoided unless we know a legit root cause. Also, > > > "probe order implies suspend order" doesn't quite work for async suspend > > > anyway, so we'd probably want to express the dependency properly > > > anyway. > > > > Yeah, it's the same stuff as we get with initcall ordering. This sort > > of thing does happen with things like PMICs which tend to have hardware > > that the system wants to manipulate in the IRQs off part of suspend. > > Ideally the dependency annotation stuff would figure things out though > > I'm not sure what the status of that is. I'd say non-existent for resources such as regulators, pwms, clocks, etc. I do not think many places call device_link_add()... I think adding this to devm_* APIs might be easiest to get the ball going as they naturally have consumer device and can easily figure out the supplier side. > > > > > Any chance this is related? Seems like that might break the parent/child > > > relationship for master/slave: > > > > > commit d7e2ee257038baeb03baef602500368a51ee9eef > > > Author: Linus Walleij <linus.walleij@linaro.org> > > > Date: Mon Apr 11 13:51:03 2016 +0200 > > > > > spi: let SPI masters ignore their children for PM > > > > That's for runtime PM, I'd not expect it to affect system suspend. > > > > Thanks.
On Sat, Oct 14, 2017 at 02:25:48AM +0800, jeffy wrote: > On 10/14/2017 02:19 AM, jeffy wrote: > > > > > > it looks like the suspend sequence depends on the dt node sequence, and > > we are putting display-subsystem dt node above spi dt node, so it would > > be earlier in the device list, then got suspended later than spi device. > > > > the pwm backlight and cros_ec_spi pwm are very interesting, not only > > about suspend dependency... if we unbind cros_ec_spi pwm, the pwm > > backlight would still hold a reference to it, and crash the kernel later. > > or maybe we should move device_pm_add() from device_add() to driver_bound()? You do not necessarily need to have a driver to power the dveice on and off.
On Fri, Oct 13, 2017 at 11:30:06AM -0700, Dmitry Torokhov wrote: > On Sat, Oct 14, 2017 at 02:19:28AM +0800, jeffy wrote: > > > Yeah, it's the same stuff as we get with initcall ordering. This sort > > > of thing does happen with things like PMICs which tend to have hardware > > > that the system wants to manipulate in the IRQs off part of suspend. > > > Ideally the dependency annotation stuff would figure things out though > > > I'm not sure what the status of that is. > I'd say non-existent for resources such as regulators, pwms, clocks, > etc. I do not think many places call device_link_add()... I think adding > this to devm_* APIs might be easiest to get the ball going as they > naturally have consumer device and can easily figure out the supplier > side. Hrm, are things in a state where we're supposed to be doing that? I've not seen anyone even trying which is a bit odd, I'd thought there was some core work still ongoing (and it wasn't clear to me if we were going to try to do things like use DT/ACPI cross references to figure this stuff out).
Hi Dmitry, On 10/14/2017 02:30 AM, Dmitry Torokhov wrote: > On Sat, Oct 14, 2017 at 02:19:28AM +0800, jeffy wrote: >> Hi guys, >> >> it looks like the suspend sequence depends on the dt node sequence, and we >> are putting display-subsystem dt node above spi dt node, so it would be >> earlier in the device list, then got suspended later than spi device. > > Would it not get a deferral when trying to get resource reference, which > would cause it bumped down to the end of dpm list? hmm, right, check again, the rockchip drm would not depend on spi, but the edp driver does. so the drm driver(display-subsystem) would probed before spi, but try to control the backlight in the suspend/resume... so i was wrong in the commit message, will fix it in next version. > >> >> the pwm backlight and cros_ec_spi pwm are very interesting, not only about >> suspend dependency... if we unbind cros_ec_spi pwm, the pwm backlight would >> still hold a reference to it, and crash the kernel later. > > That would be a bug in PWM/cors_ec and it should keep the PWM object > until last reference drops and simply error out on all requests. right, and maybe try to refresh the pwm reference when we bind it again > >> >> On 10/14/2017 12:42 AM, Mark Brown wrote: >>> On Fri, Oct 13, 2017 at 08:51:21AM -0700, Brian Norris wrote: >>> >>>> Yes, this does seem odd to me too. This looks like an arms race hack >>>> that should be avoided unless we know a legit root cause. Also, >>>> "probe order implies suspend order" doesn't quite work for async suspend >>>> anyway, so we'd probably want to express the dependency properly >>>> anyway. >>> >>> Yeah, it's the same stuff as we get with initcall ordering. This sort >>> of thing does happen with things like PMICs which tend to have hardware >>> that the system wants to manipulate in the IRQs off part of suspend. >>> Ideally the dependency annotation stuff would figure things out though >>> I'm not sure what the status of that is. > > I'd say non-existent for resources such as regulators, pwms, clocks, > etc. I do not think many places call device_link_add()... I think adding > this to devm_* APIs might be easiest to get the ball going as they > naturally have consumer device and can easily figure out the supplier > side. > >>> >>>> Any chance this is related? Seems like that might break the parent/child >>>> relationship for master/slave: >>> >>>> commit d7e2ee257038baeb03baef602500368a51ee9eef >>>> Author: Linus Walleij <linus.walleij@linaro.org> >>>> Date: Mon Apr 11 13:51:03 2016 +0200 >>> >>>> spi: let SPI masters ignore their children for PM >>> >>> That's for runtime PM, I'd not expect it to affect system suspend. >>> >> >> > > Thanks. >
Hi guys, so what happens here is: 1/ we put display-subsystem dt node before spi node, which cause rockchip drm driver probed before spi(also before edp driver/vop driver...) 2/ rockchip drm driver bound after spi/edp/vop... drivers probed 3/ in rockchip drm driver's resume callback, it would try to enable edp panel backlight(through spi), but spi master is still suspended, then we got these errors: 1970-01-01T08:02:59.607315+08:00 ERR kernel: [ 178.754005] cros-ec-spi spi2.0: spi transfer failed: -108 1970-01-01T08:02:59.607320+08:00 ERR kernel: [ 178.760102] cros-ec-spi spi2.0: cs- deassert spi transfer failed: -108 1970-01-01T08:02:59.607325+08:00 ERR kernel: [ 178.767380] cros-ec-spi spi2.0: Com mand xfer error (err:-108) 1970-01-01T08:02:59.607331+08:00 ERR kernel: [ 178.773963] cros-ec-spi spi2.0: spi transfer failed: -108 1970-01-01T08:02:59.607336+08:00 ERR kernel: [ 178.780066] cros-ec-spi spi2.0: cs- deassert spi transfer failed: -108 1970-01-01T08:02:59.607341+08:00 ERR kernel: [ 178.787359] cros-ec-spi spi2.0: Com mand xfer error (err:-108) so other than move spi master suspend to late suspend, maybe we could defer rockchip drm driver probe after it's component drivers somehow? On 10/14/2017 02:44 AM, jeffy wrote: > Hi Dmitry, > > On 10/14/2017 02:30 AM, Dmitry Torokhov wrote: >> On Sat, Oct 14, 2017 at 02:19:28AM +0800, jeffy wrote: >>> Hi guys, >>> >>> it looks like the suspend sequence depends on the dt node sequence, >>> and we >>> are putting display-subsystem dt node above spi dt node, so it would be >>> earlier in the device list, then got suspended later than spi device. >> >> Would it not get a deferral when trying to get resource reference, which >> would cause it bumped down to the end of dpm list? > hmm, right, check again, the rockchip drm would not depend on spi, but > the edp driver does. > > so the drm driver(display-subsystem) would probed before spi, but try to > control the backlight in the suspend/resume... > > so i was wrong in the commit message, will fix it in next version. >> >>> >>> the pwm backlight and cros_ec_spi pwm are very interesting, not only >>> about >>> suspend dependency... if we unbind cros_ec_spi pwm, the pwm backlight >>> would >>> still hold a reference to it, and crash the kernel later. >> >> That would be a bug in PWM/cors_ec and it should keep the PWM object >> until last reference drops and simply error out on all requests. > right, and maybe try to refresh the pwm reference when we bind it again >> >>> >>> On 10/14/2017 12:42 AM, Mark Brown wrote: >>>> On Fri, Oct 13, 2017 at 08:51:21AM -0700, Brian Norris wrote: >>>> >>>>> Yes, this does seem odd to me too. This looks like an arms race hack >>>>> that should be avoided unless we know a legit root cause. Also, >>>>> "probe order implies suspend order" doesn't quite work for async >>>>> suspend >>>>> anyway, so we'd probably want to express the dependency properly >>>>> anyway. >>>> >>>> Yeah, it's the same stuff as we get with initcall ordering. This sort >>>> of thing does happen with things like PMICs which tend to have hardware >>>> that the system wants to manipulate in the IRQs off part of suspend. >>>> Ideally the dependency annotation stuff would figure things out though >>>> I'm not sure what the status of that is. >> >> I'd say non-existent for resources such as regulators, pwms, clocks, >> etc. I do not think many places call device_link_add()... I think adding >> this to devm_* APIs might be easiest to get the ball going as they >> naturally have consumer device and can easily figure out the supplier >> side. >> >>>> >>>>> Any chance this is related? Seems like that might break the >>>>> parent/child >>>>> relationship for master/slave: >>>> >>>>> commit d7e2ee257038baeb03baef602500368a51ee9eef >>>>> Author: Linus Walleij <linus.walleij@linaro.org> >>>>> Date: Mon Apr 11 13:51:03 2016 +0200 >>>> >>>>> spi: let SPI masters ignore their children for PM >>>> >>>> That's for runtime PM, I'd not expect it to affect system suspend. >>>> >>> >>> >> >> Thanks. >> >
On Fri, Oct 13, 2017 at 07:37:50PM +0100, Mark Brown wrote: > On Fri, Oct 13, 2017 at 11:30:06AM -0700, Dmitry Torokhov wrote: > > On Sat, Oct 14, 2017 at 02:19:28AM +0800, jeffy wrote: > > > > > Yeah, it's the same stuff as we get with initcall ordering. This sort > > > > of thing does happen with things like PMICs which tend to have hardware > > > > that the system wants to manipulate in the IRQs off part of suspend. > > > > Ideally the dependency annotation stuff would figure things out though > > > > I'm not sure what the status of that is. > > > I'd say non-existent for resources such as regulators, pwms, clocks, > > etc. I do not think many places call device_link_add()... I think adding > > this to devm_* APIs might be easiest to get the ball going as they > > naturally have consumer device and can easily figure out the supplier > > side. > > Hrm, are things in a state where we're supposed to be doing that? I've > not seen anyone even trying which is a bit odd, I'd thought there was > some core work still ongoing (and it wasn't clear to me if we were going > to try to do things like use DT/ACPI cross references to figure this > stuff out). Not sure, adding Rafael...
On Saturday, October 14, 2017 1:45:11 AM CEST Dmitry Torokhov wrote: > On Fri, Oct 13, 2017 at 07:37:50PM +0100, Mark Brown wrote: > > On Fri, Oct 13, 2017 at 11:30:06AM -0700, Dmitry Torokhov wrote: > > > On Sat, Oct 14, 2017 at 02:19:28AM +0800, jeffy wrote: > > > > > > > Yeah, it's the same stuff as we get with initcall ordering. This sort > > > > > of thing does happen with things like PMICs which tend to have hardware > > > > > that the system wants to manipulate in the IRQs off part of suspend. > > > > > Ideally the dependency annotation stuff would figure things out though > > > > > I'm not sure what the status of that is. > > > > > I'd say non-existent for resources such as regulators, pwms, clocks, > > > etc. I do not think many places call device_link_add()... I think adding > > > this to devm_* APIs might be easiest to get the ball going as they > > > naturally have consumer device and can easily figure out the supplier > > > side. > > > > Hrm, are things in a state where we're supposed to be doing that? I've > > not seen anyone even trying which is a bit odd, I'd thought there was > > some core work still ongoing (and it wasn't clear to me if we were going > > to try to do things like use DT/ACPI cross references to figure this > > stuff out). > > Not sure, adding Rafael... The infrastructure in the core is there, the part related to getting that information from firmware is (mostly) missing. Still, moving system suspend/resume callbacks to the late/early phases may be a good idea anyway. Thanks, Rafael
On Sat, Oct 14, 2017 at 02:15:34AM +0200, Rafael J. Wysocki wrote: > The infrastructure in the core is there, the part related to getting that > information from firmware is (mostly) missing. Ah, yes - now I remember. We want this for probe ordering so we don't really want to be doing it in the subsystems when they request resources as that's too late (though I guess duplication there won't hurt). Fun.
diff --git a/drivers/spi/spi-rockchip.c b/drivers/spi/spi-rockchip.c index fdcf3076681b..ae539c735ea6 100644 --- a/drivers/spi/spi-rockchip.c +++ b/drivers/spi/spi-rockchip.c @@ -914,7 +914,7 @@ static int rockchip_spi_runtime_resume(struct device *dev) #endif /* CONFIG_PM */ static const struct dev_pm_ops rockchip_spi_pm = { - SET_SYSTEM_SLEEP_PM_OPS(rockchip_spi_suspend, rockchip_spi_resume) + SET_LATE_SYSTEM_SLEEP_PM_OPS(rockchip_spi_suspend, rockchip_spi_resume) SET_RUNTIME_PM_OPS(rockchip_spi_runtime_suspend, rockchip_spi_runtime_resume, NULL) };
Currently we are suspending the spi master in it's ->suspend callback, which is racy as some other drivers may still want to transmit messages on the bus(e.g. spi based pwm backlight). Convert to late and early system PM callbacks to avoid the race. Signed-off-by: Jeffy Chen <jeffy.chen@rock-chips.com> --- drivers/spi/spi-rockchip.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)