Message ID | 20210428092500.23521-1-tony@atomide.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [PATCHv2] drm/omap: Fix issue with clocks left on after resume | expand |
Hi Tony, Thank you for the patch. On Wed, Apr 28, 2021 at 12:25:00PM +0300, Tony Lindgren wrote: > On resume, dispc pm_runtime_force_resume() is not enabling the hardware > as we pass the pm_runtime_need_not_resume() test as the device is suspended > with no child devices. > > As the resume continues, omap_atomic_comit_tail() calls dispc_runtime_get() > that calls rpm_resume() enabling the hardware, and increasing child_count > for it's parent device. > > But at this point device_complete() has not yet been called for dispc. So > when omap_atomic_comit_tail() calls dispc_runtime_put(), it won't idle > the hardware as rpm_suspend() returns -EBUSY, and the clocks are left on > after resume. The parent child count is not decremented as the -EBUSY > cannot be easily handled until later on after device_complete(). > > This can be easily seen for example after suspending Beagleboard-X15 with > no displays connected, and by reading the CM_DSS_DSS_CLKCTRL register at > 0x4a009120 after resume. After a suspend and resume cycle, it shows a > value of 0x00040102 instead of 0x00070000 like it should. > > Let's fix the issue by calling dispc_runtime_suspend() and > dispc_runtime_resume() directly from dispc_suspend() and dispc_resume(). > This leaves out the PM runtime related issues for system suspend. > > We could handle the issue by adding more calls to dispc_runtime_get() > and dispc_runtime_put() from omap_drm_suspend() and omap_drm_resume() > as suggested by Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>. > But that would just add more inter-component calls and more dependencies > to PM runtime for system suspend and does not make things easier in the > long. Based on my experience on the camera and display side with devices that are made of multiple components, suspend and resume are best handled in a controlled way by the top-level driver. Otherwise you end up having different components suspending and resuming in random orders, and that's a recipe for failure. Can we get the omapdrm suspend/resume to run first/last, and stop/restart the whole device from there ? > See also earlier commit 88d26136a256 ("PM: Prevent runtime suspend during > system resume") and commit ca8199f13498 ("drm/msm/dpu: ensure device > suspend happens during PM sleep") for more information. > > Fixes: ecfdedd7da5d ("drm/omap: force runtime PM suspend on system suspend") > Signed-off-by: Tony Lindgren <tony@atomide.com> > --- > > Changes since v1: > - Updated the description for a typo noticed by Tomi > - Added more info about what all goes wrong > > --- > drivers/gpu/drm/omapdrm/dss/dispc.c | 27 ++++++++++++++++++++++++++- > 1 file changed, 26 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/omapdrm/dss/dispc.c b/drivers/gpu/drm/omapdrm/dss/dispc.c > --- a/drivers/gpu/drm/omapdrm/dss/dispc.c > +++ b/drivers/gpu/drm/omapdrm/dss/dispc.c > @@ -182,6 +182,7 @@ struct dispc_device { > const struct dispc_features *feat; > > bool is_enabled; > + bool needs_resume; > > struct regmap *syscon_pol; > u32 syscon_pol_offset; > @@ -4887,10 +4888,34 @@ static int dispc_runtime_resume(struct device *dev) > return 0; > } > > +static int dispc_suspend(struct device *dev) > +{ > + struct dispc_device *dispc = dev_get_drvdata(dev); > + > + if (!dispc->is_enabled) > + return 0; > + > + dispc->needs_resume = true; > + > + return dispc_runtime_suspend(dev); > +} > + > +static int dispc_resume(struct device *dev) > +{ > + struct dispc_device *dispc = dev_get_drvdata(dev); > + > + if (!dispc->needs_resume) > + return 0; > + > + dispc->needs_resume = false; > + > + return dispc_runtime_resume(dev); > +} > + > static const struct dev_pm_ops dispc_pm_ops = { > .runtime_suspend = dispc_runtime_suspend, > .runtime_resume = dispc_runtime_resume, > - SET_LATE_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, pm_runtime_force_resume) > + SET_LATE_SYSTEM_SLEEP_PM_OPS(dispc_suspend, dispc_resume) > }; > > struct platform_driver omap_dispchw_driver = {
Hi, * Laurent Pinchart <laurent.pinchart@ideasonboard.com> [210428 14:10]: > Based on my experience on the camera and display side with devices that > are made of multiple components, suspend and resume are best handled in > a controlled way by the top-level driver. Otherwise you end up having > different components suspending and resuming in random orders, and > that's a recipe for failure. Manually suspending and resuming the components should be doable based on the registered components. However, I'm not sure it buys much in this case since we do have Linux driver module take care of things for us for most part :) The dss hardware is really a private interconnect with a control module, and a collection of various mostly independent display output device modules. We also have the interconnect target module to deal with for each module, and have the interconnect hierachy mapped in the devicetree. So we already have Linux driver module take care of the device hierarchy. Because the child components are mostly independent, it should not matter in which order they suspend and resume related to each other. I think the remaining issue is how dispc should provide services to the other components. If dispc needs to be enabled to provide services to the other modules, maybe there's some better Linux generic framework dispc could implement? That is other than PM runtime calls for routing the signals to the output modules? Then PM runtime can be handled private to the dispc module. Decoupling the system suspend and resume from PM runtime calls for all the other dss components should still also be done IMO. But that can be done as a separate clean-up patches after we have fixed the $subject issue. > Can we get the omapdrm suspend/resume to run first/last, and > stop/restart the whole device from there ? This is already the case since commit ecfdedd7da5d ("drm/omap: force runtime PM suspend on system suspend"). We have omap_drv use SIMPLE_DEV_PM_OPS, and the components use SET_LATE_SYSTEM_SLEEP_PM_OPS. So omap_drv suspends first and resumes last. The order should not matter for other components. Well that is as long as we can deal with dispc providing resources. I think we really should also change omap_drv use prepare/complete ops, and have the components use standard SIMPLE_DEV_PM_OPS. That still won't help with PM runtime related issues for system suspend and resume though, but leaves out the need for late pm ops. Regards, Tony
On 29/04/2021 07:46, Tony Lindgren wrote: > Hi, > > * Laurent Pinchart <laurent.pinchart@ideasonboard.com> [210428 14:10]: >> Based on my experience on the camera and display side with devices that >> are made of multiple components, suspend and resume are best handled in >> a controlled way by the top-level driver. Otherwise you end up having >> different components suspending and resuming in random orders, and >> that's a recipe for failure. > > Manually suspending and resuming the components should be doable based > on the registered components. However, I'm not sure it buys much in > this case since we do have Linux driver module take care of things for > us for most part :) > > The dss hardware is really a private interconnect with a control module, > and a collection of various mostly independent display output device > modules. > > We also have the interconnect target module to deal with for each > module, and have the interconnect hierachy mapped in the devicetree. > So we already have Linux driver module take care of the device > hierarchy. > > Because the child components are mostly independent, it should not > matter in which order they suspend and resume related to each other. > > I think the remaining issue is how dispc should provide services to > the other components. > > If dispc needs to be enabled to provide services to the other modules, > maybe there's some better Linux generic framework dispc could implement? > That is other than PM runtime calls for routing the signals to the > output modules? Then PM runtime can be handled private to the dispc > module. What would be the difference? The dispc service would just call runtime get and put, like it does now, wouldn't it? > Decoupling the system suspend and resume from PM runtime calls for > all the other dss components should still also be done IMO. But that > can be done as a separate clean-up patches after we have fixed the > $subject issue. I don't think I still really understand why all this is needed. I mean, obviously things don't work correctly at the moment, so maybe this patch can be applied to fix the system suspend. But it just feels like a big hack (the current pm_runtime_force_suspend/resume work-around feels like a big hack too). Why doesn't the suspend just work? Afaics, the runtime PM code in omapdrm is fine: the dependencies work nicely, things get runtime suspended and resumes correctly. And at system suspend, omapdrm will disable the whole display pipeline (including bridges, panels) in a controlled manner, which results in the appropriate runtime PM calls. I think this should just work. But it doesn't, because... runtime PM and system suspend don't quite work well together? Or something. So is it something that omapdrm is doing in a wrong way, or is the PM framework just messed up, and other drivers need to dance around the problems too? >> Can we get the omapdrm suspend/resume to run first/last, and >> stop/restart the whole device from there ? > > This is already the case since commit ecfdedd7da5d ("drm/omap: force > runtime PM suspend on system suspend"). We have omap_drv use > SIMPLE_DEV_PM_OPS, and the components use SET_LATE_SYSTEM_SLEEP_PM_OPS. > So omap_drv suspends first and resumes last. The order should not > matter for other components. Well that is as long as we can deal > with dispc providing resources. > > I think we really should also change omap_drv use prepare/complete ops, > and have the components use standard SIMPLE_DEV_PM_OPS. That still > won't help with PM runtime related issues for system suspend and > resume though, but leaves out the need for late pm ops. Why do we need to do the above? What would omapdrm do in prepare & complete? Why would we use SIMPLE_DEV_PM_OPS for the dss subcomponents? Slightly off topic, but I just noticed that we're using runtime_put_sync for some reason. Found 0eaf9f52e94f756147dbfe1faf1f77a02378dbf9. I've been fighting with system suspend for a long time =). I wonder if using non-sync version would remove the EBUSY problem... Tomi
* Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> [210503 08:04]: > On 29/04/2021 07:46, Tony Lindgren wrote: > > I think the remaining issue is how dispc should provide services to > > the other components. > > > > If dispc needs to be enabled to provide services to the other modules, > > maybe there's some better Linux generic framework dispc could implement? > > That is other than PM runtime calls for routing the signals to the > > output modules? Then PM runtime can be handled private to the dispc > > module. > > What would be the difference? The dispc service would just call runtime get > and put, like it does now, wouldn't it? I was thinking that we could have dispc always enabled when services from dispc have been requested. I'm not sure if we need to toggle dispc state, having it set to SYSC_IDLE_SMART mode might be enough. And I think the clocks are already on for dispc from the top level dss module if any of the dss components are active. I could be wrong though as I don't know enough about the dss hardware :) > > Decoupling the system suspend and resume from PM runtime calls for > > all the other dss components should still also be done IMO. But that > > can be done as a separate clean-up patches after we have fixed the > > $subject issue. > > I don't think I still really understand why all this is needed. I mean, > obviously things don't work correctly at the moment, so maybe this patch can > be applied to fix the system suspend. But it just feels like a big hack (the > current pm_runtime_force_suspend/resume work-around feels like a big hack > too). Well omapdrm is not handling the -EBUSY error during system resume. > Why doesn't the suspend just work? Afaics, the runtime PM code in omapdrm is > fine: the dependencies work nicely, things get runtime suspended and resumes > correctly. And at system suspend, omapdrm will disable the whole display > pipeline (including bridges, panels) in a controlled manner, which results > in the appropriate runtime PM calls. I think this should just work. But it > doesn't, because... runtime PM and system suspend don't quite work well > together? Or something. Right, PM runtime and system suspend should not be mixed together. > So is it something that omapdrm is doing in a wrong way, or is the PM > framework just messed up, and other drivers need to dance around the > problems too? I think we have omapdrm close to doing the right things. But trying to use PM runtime for system suspend is like using a Rube Goldberg machine to turn off the lights at night when you want to sleep :p > > I think we really should also change omap_drv use prepare/complete ops, > > and have the components use standard SIMPLE_DEV_PM_OPS. That still > > won't help with PM runtime related issues for system suspend and > > resume though, but leaves out the need for late pm ops. > > Why do we need to do the above? What would omapdrm do in prepare & complete? > Why would we use SIMPLE_DEV_PM_OPS for the dss subcomponents? That would leave out the need to use late_pm_ops at all for the driver, we should suspend a bit earlier, and resume a bit later. > Slightly off topic, but I just noticed that we're using runtime_put_sync for > some reason. Found 0eaf9f52e94f756147dbfe1faf1f77a02378dbf9. I've been > fighting with system suspend for a long time =). > > I wonder if using non-sync version would remove the EBUSY problem... Worth trying, but it will only help if the -EBUSY error from pm_runtime_put() is handled somewhere for a retry.. Regards, Tony
Hi, * Tony Lindgren <tony@atomide.com> [210503 10:45]: > * Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> [210503 08:04]: > > On 29/04/2021 07:46, Tony Lindgren wrote: > > > Decoupling the system suspend and resume from PM runtime calls for > > > all the other dss components should still also be done IMO. But that > > > can be done as a separate clean-up patches after we have fixed the > > > $subject issue. > > > > I don't think I still really understand why all this is needed. I mean, > > obviously things don't work correctly at the moment, so maybe this patch can > > be applied to fix the system suspend. But it just feels like a big hack (the > > current pm_runtime_force_suspend/resume work-around feels like a big hack > > too). > > Well omapdrm is not handling the -EBUSY error during system resume. Or rather something on the resume path is not handling and cannot handle -EBUSY. And sounds like the reason is.. > > Slightly off topic, but I just noticed that we're using runtime_put_sync for > > some reason. Found 0eaf9f52e94f756147dbfe1faf1f77a02378dbf9. I've been > > fighting with system suspend for a long time =). > > > > I wonder if using non-sync version would remove the EBUSY problem... > > Worth trying, but it will only help if the -EBUSY error from > pm_runtime_put() is handled somewhere for a retry.. ..the use of pm_runtime_put_sync() like you suggested. I did a quick test with the minimal change below and that works :) Seems like that's probably the best minimal fix for the -rc cycle. Regards, Tony 8< ---------------- diff --git a/drivers/gpu/drm/omapdrm/dss/dispc.c b/drivers/gpu/drm/omapdrm/dss/dispc.c --- a/drivers/gpu/drm/omapdrm/dss/dispc.c +++ b/drivers/gpu/drm/omapdrm/dss/dispc.c @@ -664,7 +664,7 @@ void dispc_runtime_put(struct dispc_device *dispc) DSSDBG("dispc_runtime_put\n"); - r = pm_runtime_put_sync(&dispc->pdev->dev); + r = pm_runtime_put(&dispc->pdev->dev); WARN_ON(r < 0 && r != -ENOSYS); }
* Tony Lindgren <tony@atomide.com> [210503 11:16]: > ..the use of pm_runtime_put_sync() like you suggested. I did a quick > test with the minimal change below and that works :) Seems like that's > probably the best minimal fix for the -rc cycle. Sorry I was mistaken, the patch below won't help for the omapdrm PM runtime state on suspend. I had patch "bus: ti-sysc: Fix am335x resume hang for usb otg module" applied, and that changes ti-sysc to get rid of the PM runtime calls during system suspend. The side effect is ti-sysc now ignores the module PM runtime state on suspend. This is pretty much what _od_suspend_noirq() was also doing. I think we still fix the dispc related issue too, otherwise the parent child_count will just keep increasing on each suspend. I check that again though. Regards, Tony > 8< ---------------- > diff --git a/drivers/gpu/drm/omapdrm/dss/dispc.c b/drivers/gpu/drm/omapdrm/dss/dispc.c > --- a/drivers/gpu/drm/omapdrm/dss/dispc.c > +++ b/drivers/gpu/drm/omapdrm/dss/dispc.c > @@ -664,7 +664,7 @@ void dispc_runtime_put(struct dispc_device *dispc) > > DSSDBG("dispc_runtime_put\n"); > > - r = pm_runtime_put_sync(&dispc->pdev->dev); > + r = pm_runtime_put(&dispc->pdev->dev); > WARN_ON(r < 0 && r != -ENOSYS); > } >
* Tony Lindgren <tony@atomide.com> [210503 12:18]: > I think we still fix the dispc related issue too, otherwise the parent > child_count will just keep increasing on each suspend. I check that > again though. After tons more debugging, I found the root cause for the parent child_count increasing and posted a patch for it at [0] below. This means the $subject patch here can be done later on as clean-up to leave out lots of unnecessary PM runtime calls and simplify the system suspend :) Regards, Tony [0] https://lore.kernel.org/linux-pm/20210505110915.6861-1-tony@atomide.com/T/#u
On 05/05/2021 14:12, Tony Lindgren wrote: > * Tony Lindgren <tony@atomide.com> [210503 12:18]: >> I think we still fix the dispc related issue too, otherwise the parent >> child_count will just keep increasing on each suspend. I check that >> again though. > > After tons more debugging, I found the root cause for the parent child_count > increasing and posted a patch for it at [0] below. > > This means the $subject patch here can be done later on as clean-up to > leave out lots of unnecessary PM runtime calls and simplify the system > suspend :) Great work, thanks! It's always nice when someone else does the tons of debugging ;). I tested the patch you sent, works fine for me. While testing I noticed another problem, which happens also without your fix: go to suspend with HDMI connected, remove the cable, resume the board -> boom. I didn't debug that yet. Tomi
diff --git a/drivers/gpu/drm/omapdrm/dss/dispc.c b/drivers/gpu/drm/omapdrm/dss/dispc.c --- a/drivers/gpu/drm/omapdrm/dss/dispc.c +++ b/drivers/gpu/drm/omapdrm/dss/dispc.c @@ -182,6 +182,7 @@ struct dispc_device { const struct dispc_features *feat; bool is_enabled; + bool needs_resume; struct regmap *syscon_pol; u32 syscon_pol_offset; @@ -4887,10 +4888,34 @@ static int dispc_runtime_resume(struct device *dev) return 0; } +static int dispc_suspend(struct device *dev) +{ + struct dispc_device *dispc = dev_get_drvdata(dev); + + if (!dispc->is_enabled) + return 0; + + dispc->needs_resume = true; + + return dispc_runtime_suspend(dev); +} + +static int dispc_resume(struct device *dev) +{ + struct dispc_device *dispc = dev_get_drvdata(dev); + + if (!dispc->needs_resume) + return 0; + + dispc->needs_resume = false; + + return dispc_runtime_resume(dev); +} + static const struct dev_pm_ops dispc_pm_ops = { .runtime_suspend = dispc_runtime_suspend, .runtime_resume = dispc_runtime_resume, - SET_LATE_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, pm_runtime_force_resume) + SET_LATE_SYSTEM_SLEEP_PM_OPS(dispc_suspend, dispc_resume) }; struct platform_driver omap_dispchw_driver = {
On resume, dispc pm_runtime_force_resume() is not enabling the hardware as we pass the pm_runtime_need_not_resume() test as the device is suspended with no child devices. As the resume continues, omap_atomic_comit_tail() calls dispc_runtime_get() that calls rpm_resume() enabling the hardware, and increasing child_count for it's parent device. But at this point device_complete() has not yet been called for dispc. So when omap_atomic_comit_tail() calls dispc_runtime_put(), it won't idle the hardware as rpm_suspend() returns -EBUSY, and the clocks are left on after resume. The parent child count is not decremented as the -EBUSY cannot be easily handled until later on after device_complete(). This can be easily seen for example after suspending Beagleboard-X15 with no displays connected, and by reading the CM_DSS_DSS_CLKCTRL register at 0x4a009120 after resume. After a suspend and resume cycle, it shows a value of 0x00040102 instead of 0x00070000 like it should. Let's fix the issue by calling dispc_runtime_suspend() and dispc_runtime_resume() directly from dispc_suspend() and dispc_resume(). This leaves out the PM runtime related issues for system suspend. We could handle the issue by adding more calls to dispc_runtime_get() and dispc_runtime_put() from omap_drm_suspend() and omap_drm_resume() as suggested by Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>. But that would just add more inter-component calls and more dependencies to PM runtime for system suspend and does not make things easier in the long. See also earlier commit 88d26136a256 ("PM: Prevent runtime suspend during system resume") and commit ca8199f13498 ("drm/msm/dpu: ensure device suspend happens during PM sleep") for more information. Fixes: ecfdedd7da5d ("drm/omap: force runtime PM suspend on system suspend") Signed-off-by: Tony Lindgren <tony@atomide.com> --- Changes since v1: - Updated the description for a typo noticed by Tomi - Added more info about what all goes wrong --- drivers/gpu/drm/omapdrm/dss/dispc.c | 27 ++++++++++++++++++++++++++- 1 file changed, 26 insertions(+), 1 deletion(-)