Message ID | 20180919121342.18288-1-m.szyprowski@samsung.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
Series | [RFC/PATCH] clk: samsung: exynos5433: Fix NOIRQ suspend/resume support | expand |
On 19 September 2018 at 14:13, Marek Szyprowski <m.szyprowski@samsung.com> wrote: > My recent commit af68ec14f112 'clk: samsung: Use NOIRQ stage for Exynos5433 > clocks suspend/resume' moved Exynos5433 CMU suspend/resume callbacks > to NOIRQ stage. It turned out that in some cases this is not enough to > ensure that clocks provided by the given CMU will be available for other > drivers during their NOIRQ system suspend/resume phase. Device core calls > pm_runtime_disable and pm_runtime_enable unconditionally on every device > during LATE system suspend/resume phase. This means that during > noirq_resume, CMU device cannot be runtime resumed, because its > power.disable_depth is higher than zero (runtime PM for it will be enabled > later in device_resume_early function). > > To let other devices properly use clocks and keep runtime PM enabled also > during NOIRQ system sleep phase, add calls to pm_runtime_enable/disable > to balance the actions done by device core. The CMU device will be > suspended in its noirq_suspend callback and finally resumed in noirq_resume. > > Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> > --- > Hi! > > The unconditional calls to pm_runtime_enable and pm_runtime_enable > in device_suspend_late and device_resume_early core functions were a bit > surprising for me, because they make NOIRQ phase a special case from > the perspective of runtime PM handling. Using LATE system sleep phase > to call pm_runtime_enable/disable looks like a workaround for me, but right > now I have no other idea how to ensure proper behavior of the Exynos5433 > CMU driver. > > The most strange thing is that I didn't observe this initially after > switching to NOIRQ phase. The CMU device (FSYS_CMU) which is not under > any power domain behaved correctly and was properly available for DWMMC > device, which needs to enable clocks in its noirq_resume callback. > > However, in case of AUD_CMU and its clients (for example serial_3 device, > which is a part of audio-subsytem), the pm_runtime_get_sync() called from > clk_prepare() failed, because AUD_CMU device had still disabled runtime PM > support. The only difference between FSYS_CMU and AUD_CMU, is the fact > that AUD_CMU is under AUD power domain, while FSYS_CMU device has no power > domain assigned (for completely other reasons FSYS power domains is not yet > instantiated in exynos5433.dtsi). > > Ulf/Rafael: could You comment a bit on this issue? How it should be solved > properly? In general, this sounds like the classical problem of suspending/resuming devices in a correct order. The long term solution is to use device links to address this problem. However, as a simple fix, I would try to add a ->prepare() callback and call pm_runtime_get_sync() from it and then in a new corresponding ->complete() callback, call pm_runtime_put(). This means that the device will stay runtime resumed until the suspend_noirq phase. It would also means the pm_runtime_force_resume() will resume the device at resume_noirq phase. Would that work? > > Best regards > Marek Szyprowski > Samsung R&D Institute Poland > --- > drivers/clk/samsung/clk-exynos5433.c | 20 ++++++++++++++++++++ > 1 file changed, 20 insertions(+) > > diff --git a/drivers/clk/samsung/clk-exynos5433.c b/drivers/clk/samsung/clk-exynos5433.c > index 4a0f8ff87ca0..7e6c0c9397a9 100644 > --- a/drivers/clk/samsung/clk-exynos5433.c > +++ b/drivers/clk/samsung/clk-exynos5433.c > @@ -5623,6 +5623,24 @@ static int __maybe_unused exynos5433_cmu_resume(struct device *dev) > return 0; > } > > +/* > + * late_suspend and early_resume callbacks are required to balance > + * pm_runtime_disable and pm_runtime_enable calls in device_suspend_late > + * and device_resume_early core functions to keep runtime enabled for > + * CMU device during noirq sleep phase. > + */ > +static int __maybe_unused exynos5433_late_suspend(struct device *dev) > +{ > + pm_runtime_enable(dev); > + return 0; > +} > + > +static int __maybe_unused exynos5433_early_resume(struct device *dev) > +{ > + pm_runtime_disable(dev); > + return 0; > +} > + > static int __init exynos5433_cmu_probe(struct platform_device *pdev) > { > const struct samsung_cmu_info *info; > @@ -5760,6 +5778,8 @@ static const struct of_device_id exynos5433_cmu_of_match[] = { > static const struct dev_pm_ops exynos5433_cmu_pm_ops = { > SET_RUNTIME_PM_OPS(exynos5433_cmu_suspend, exynos5433_cmu_resume, > NULL) > + SET_LATE_SYSTEM_SLEEP_PM_OPS(exynos5433_late_suspend, > + exynos5433_early_resume) > SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, > pm_runtime_force_resume) > }; > -- > 2.17.1 > Kind regards Uffe
On Wed, Sep 19, 2018 at 2:14 PM Marek Szyprowski <m.szyprowski@samsung.com> wrote: > > My recent commit af68ec14f112 'clk: samsung: Use NOIRQ stage for Exynos5433 > clocks suspend/resume' moved Exynos5433 CMU suspend/resume callbacks > to NOIRQ stage. It turned out that in some cases this is not enough to > ensure that clocks provided by the given CMU will be available for other > drivers during their NOIRQ system suspend/resume phase. Device core calls > pm_runtime_disable and pm_runtime_enable unconditionally on every device > during LATE system suspend/resume phase. This means that during > noirq_resume, CMU device cannot be runtime resumed, because its > power.disable_depth is higher than zero (runtime PM for it will be enabled > later in device_resume_early function). > > To let other devices properly use clocks and keep runtime PM enabled also > during NOIRQ system sleep phase, No. > add calls to pm_runtime_enable/disable to balance the actions done by device core. There's nothing to balance, PM-runtime doesn't work during system-wide PM transitions as a rule. The only thing you can do is to runtime-resume a device before "suspend late", but you can't runtime-suspend anything anyway. > The CMU device will be suspended in its noirq_suspend callback and finally resumed in noirq_resume. > > Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> > --- > Hi! > > The unconditional calls to pm_runtime_enable and pm_runtime_enable > in device_suspend_late and device_resume_early core functions were a bit > surprising for me, I wonder why. It has been like this for a long time and it is quite well documented AFAICS. > because they make NOIRQ phase a special case from > the perspective of runtime PM handling. Well, system-wide PM transitions are not "runtime". They use different sets of callbacks for a reason. > Using LATE system sleep phase > to call pm_runtime_enable/disable looks like a workaround for me, No, it just plain doesn't work. > but right now I have no other idea how to ensure proper behavior of the Exynos5433 > CMU driver. > > The most strange thing is that I didn't observe this initially after > switching to NOIRQ phase. The CMU device (FSYS_CMU) which is not under > any power domain behaved correctly and was properly available for DWMMC > device, which needs to enable clocks in its noirq_resume callback. > > However, in case of AUD_CMU and its clients (for example serial_3 device, > which is a part of audio-subsytem), the pm_runtime_get_sync() called from > clk_prepare() failed, because AUD_CMU device had still disabled runtime PM > support. The only difference between FSYS_CMU and AUD_CMU, is the fact > that AUD_CMU is under AUD power domain, while FSYS_CMU device has no power > domain assigned (for completely other reasons FSYS power domains is not yet > instantiated in exynos5433.dtsi). > > Ulf/Rafael: could You comment a bit on this issue? How it should be solved > properly? Well, as a rule, don't use PM-runtime from your system-wide PM callbacks. The only thing you can do is to resume a device from runtime suspend in ->prepare or ->suspend in order to change its configuration and suspend it later properly in the next callbacks. With that in mind, ensure the proper ordering of system-wide suspend of the devices as Ulf said. Thanks, Rafael
Hi Ulf, On 2018-09-19 23:18, Ulf Hansson wrote: > On 19 September 2018 at 14:13, Marek Szyprowski > <m.szyprowski@samsung.com> wrote: >> My recent commit af68ec14f112 'clk: samsung: Use NOIRQ stage for Exynos5433 >> clocks suspend/resume' moved Exynos5433 CMU suspend/resume callbacks >> to NOIRQ stage. It turned out that in some cases this is not enough to >> ensure that clocks provided by the given CMU will be available for other >> drivers during their NOIRQ system suspend/resume phase. Device core calls >> pm_runtime_disable and pm_runtime_enable unconditionally on every device >> during LATE system suspend/resume phase. This means that during >> noirq_resume, CMU device cannot be runtime resumed, because its >> power.disable_depth is higher than zero (runtime PM for it will be enabled >> later in device_resume_early function). >> >> To let other devices properly use clocks and keep runtime PM enabled also >> during NOIRQ system sleep phase, add calls to pm_runtime_enable/disable >> to balance the actions done by device core. The CMU device will be >> suspended in its noirq_suspend callback and finally resumed in noirq_resume. > >> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> >> --- >> Hi! >> >> The unconditional calls to pm_runtime_enable and pm_runtime_enable >> in device_suspend_late and device_resume_early core functions were a bit >> surprising for me, because they make NOIRQ phase a special case from >> the perspective of runtime PM handling. Using LATE system sleep phase >> to call pm_runtime_enable/disable looks like a workaround for me, but right >> now I have no other idea how to ensure proper behavior of the Exynos5433 >> CMU driver. >> >> The most strange thing is that I didn't observe this initially after >> switching to NOIRQ phase. The CMU device (FSYS_CMU) which is not under >> any power domain behaved correctly and was properly available for DWMMC >> device, which needs to enable clocks in its noirq_resume callback. >> >> However, in case of AUD_CMU and its clients (for example serial_3 device, >> which is a part of audio-subsytem), the pm_runtime_get_sync() called from >> clk_prepare() failed, because AUD_CMU device had still disabled runtime PM >> support. The only difference between FSYS_CMU and AUD_CMU, is the fact >> that AUD_CMU is under AUD power domain, while FSYS_CMU device has no power >> domain assigned (for completely other reasons FSYS power domains is not yet >> instantiated in exynos5433.dtsi). >> >> Ulf/Rafael: could You comment a bit on this issue? How it should be solved >> properly? > In general, this sounds like the classical problem of > suspending/resuming devices in a correct order. The long term solution > is to use device links to address this problem. Device links won't help in this case. This is not a problem of correct order of suspending/resuming devices. exynos5433 cmu device is already suspended after its client's and resumed before them. The problem here is using runtime PM in exynos5433 CMU driver during NOIRQ suspend/resume stage. > However, as a simple fix, I would try to add a ->prepare() callback > and call pm_runtime_get_sync() from it and then in a new corresponding > ->complete() callback, call pm_runtime_put(). > > This means that the device will stay runtime resumed until the > suspend_noirq phase. It would also means the pm_runtime_force_resume() > will resume the device at resume_noirq phase. Would that work? Yes, this will work, but it will unnecessarily wake all instances of exynos5433 cmu and their power domains during the system suspend/resume cycle, what I would like to avoid. Are there any reasons for disabling runtime PM unconditionally for every device in the system during LATE system sleep stage? > ... Best regards
On Thu, Sep 20, 2018 at 3:42 PM Marek Szyprowski <m.szyprowski@samsung.com> wrote: > > Hi Ulf, > > On 2018-09-19 23:18, Ulf Hansson wrote: > > On 19 September 2018 at 14:13, Marek Szyprowski > > <m.szyprowski@samsung.com> wrote: > >> My recent commit af68ec14f112 'clk: samsung: Use NOIRQ stage for Exynos5433 > >> clocks suspend/resume' moved Exynos5433 CMU suspend/resume callbacks > >> to NOIRQ stage. It turned out that in some cases this is not enough to > >> ensure that clocks provided by the given CMU will be available for other > >> drivers during their NOIRQ system suspend/resume phase. Device core calls > >> pm_runtime_disable and pm_runtime_enable unconditionally on every device > >> during LATE system suspend/resume phase. This means that during > >> noirq_resume, CMU device cannot be runtime resumed, because its > >> power.disable_depth is higher than zero (runtime PM for it will be enabled > >> later in device_resume_early function). > >> > >> To let other devices properly use clocks and keep runtime PM enabled also > >> during NOIRQ system sleep phase, add calls to pm_runtime_enable/disable > >> to balance the actions done by device core. The CMU device will be > >> suspended in its noirq_suspend callback and finally resumed in noirq_resume. > > > >> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> > >> --- > >> Hi! > >> > >> The unconditional calls to pm_runtime_enable and pm_runtime_enable > >> in device_suspend_late and device_resume_early core functions were a bit > >> surprising for me, because they make NOIRQ phase a special case from > >> the perspective of runtime PM handling. Using LATE system sleep phase > >> to call pm_runtime_enable/disable looks like a workaround for me, but right > >> now I have no other idea how to ensure proper behavior of the Exynos5433 > >> CMU driver. > >> > >> The most strange thing is that I didn't observe this initially after > >> switching to NOIRQ phase. The CMU device (FSYS_CMU) which is not under > >> any power domain behaved correctly and was properly available for DWMMC > >> device, which needs to enable clocks in its noirq_resume callback. > >> > >> However, in case of AUD_CMU and its clients (for example serial_3 device, > >> which is a part of audio-subsytem), the pm_runtime_get_sync() called from > >> clk_prepare() failed, because AUD_CMU device had still disabled runtime PM > >> support. The only difference between FSYS_CMU and AUD_CMU, is the fact > >> that AUD_CMU is under AUD power domain, while FSYS_CMU device has no power > >> domain assigned (for completely other reasons FSYS power domains is not yet > >> instantiated in exynos5433.dtsi). > >> > >> Ulf/Rafael: could You comment a bit on this issue? How it should be solved > >> properly? > > In general, this sounds like the classical problem of > > suspending/resuming devices in a correct order. The long term solution > > is to use device links to address this problem. > > Device links won't help in this case. This is not a problem of correct > order of suspending/resuming devices. exynos5433 cmu device is already > suspended after its client's and resumed before them. The problem here > is using runtime PM in exynos5433 CMU driver during NOIRQ suspend/resume > stage. Simply put, if a driver uses PM-runtime in its noirq system-wide callbacks, this is a bug in the driver. > > However, as a simple fix, I would try to add a ->prepare() callback > > and call pm_runtime_get_sync() from it and then in a new corresponding > > ->complete() callback, call pm_runtime_put(). > > > > This means that the device will stay runtime resumed until the > > suspend_noirq phase. It would also means the pm_runtime_force_resume() > > will resume the device at resume_noirq phase. Would that work? > > Yes, this will work, but it will unnecessarily wake all instances of > exynos5433 cmu and their power domains during the system suspend/resume > cycle, what I would like to avoid. > > Are there any reasons for disabling runtime PM unconditionally for every > device in the system during LATE system sleep stage? Yes, there are. It generally is not safe to do that, because the state of the device after/during late suspend generally may be different than expected by its PM-runtime callbacks and that doesn't apply just to the device itself, but also to its parent and possible suppliers and all of their ancestors and suppliers. The resume of a device generally triggers a resume of all devices it depends on and that may not be expected by their drivers and the bus types/PM domains they depend on and/or work with. Your particular case appears to be simple enough, but there are more complicated ones. Also, it generally is not a good idea to rely on PM-runtime for system-wide PM, because PM-runtime may be effectively disabled by user space via sysfs for any device. IOW, pm_runtime_suspend() or pm_runtime_put_sync() or equivalent doesn't guarantee that the device will actually be suspended and system-wide PM is expected to work regardless. So, please abandon the runtime-resume-during-noirq-phase-of-system-wide-suspend idea and find a different way to organize your code. Thanks, Rafael
[...] >>> >>> The unconditional calls to pm_runtime_enable and pm_runtime_enable >>> in device_suspend_late and device_resume_early core functions were a bit >>> surprising for me, because they make NOIRQ phase a special case from >>> the perspective of runtime PM handling. Using LATE system sleep phase >>> to call pm_runtime_enable/disable looks like a workaround for me, but right >>> now I have no other idea how to ensure proper behavior of the Exynos5433 >>> CMU driver. >>> >>> The most strange thing is that I didn't observe this initially after >>> switching to NOIRQ phase. The CMU device (FSYS_CMU) which is not under >>> any power domain behaved correctly and was properly available for DWMMC >>> device, which needs to enable clocks in its noirq_resume callback. >>> >>> However, in case of AUD_CMU and its clients (for example serial_3 device, >>> which is a part of audio-subsytem), the pm_runtime_get_sync() called from >>> clk_prepare() failed, because AUD_CMU device had still disabled runtime PM >>> support. The only difference between FSYS_CMU and AUD_CMU, is the fact >>> that AUD_CMU is under AUD power domain, while FSYS_CMU device has no power >>> domain assigned (for completely other reasons FSYS power domains is not yet >>> instantiated in exynos5433.dtsi). >>> >>> Ulf/Rafael: could You comment a bit on this issue? How it should be solved >>> properly? >> In general, this sounds like the classical problem of >> suspending/resuming devices in a correct order. The long term solution >> is to use device links to address this problem. > > Device links won't help in this case. This is not a problem of correct > order of suspending/resuming devices. exynos5433 cmu device is already > suspended after its client's and resumed before them. The problem here > is using runtime PM in exynos5433 CMU driver during NOIRQ suspend/resume > stage. > I see. However, I am guessing that one of the reason to why the client's are also using the noirq phase is because of getting a the correct order. The point is, the different level of PM callbacks (prepare, suspend, suspend_late, suspend_noirq etc) aren't really the right solution to get a correct order, even if that is what works for most cases. In principle, if all devices dependencies were hooked up correctly with device links, we would in principle only need one level of suspend/resume callbacks. >> However, as a simple fix, I would try to add a ->prepare() callback >> and call pm_runtime_get_sync() from it and then in a new corresponding >> ->complete() callback, call pm_runtime_put(). >> >> This means that the device will stay runtime resumed until the >> suspend_noirq phase. It would also means the pm_runtime_force_resume() >> will resume the device at resume_noirq phase. Would that work? > > Yes, this will work, but it will unnecessarily wake all instances of > exynos5433 cmu and their power domains during the system suspend/resume > cycle, what I would like to avoid. I guess what you are saying is that it's not at every suspend sequence that the client's needs to runtime resume the exynos5433 cmu device - and in those cases when not needed, runtime resuming it becomes a waste in regards to both time and energy. Right? Anyway, I fully agree that this isn't very nice, but unfortunate that's the best I can think of as a simple solution. [...] Kind regards Uffe
On Thursday, September 20, 2018 7:29:46 PM CEST Ulf Hansson wrote: > [...] > > >>> > >>> The unconditional calls to pm_runtime_enable and pm_runtime_enable > >>> in device_suspend_late and device_resume_early core functions were a bit > >>> surprising for me, because they make NOIRQ phase a special case from > >>> the perspective of runtime PM handling. Using LATE system sleep phase > >>> to call pm_runtime_enable/disable looks like a workaround for me, but right > >>> now I have no other idea how to ensure proper behavior of the Exynos5433 > >>> CMU driver. > >>> > >>> The most strange thing is that I didn't observe this initially after > >>> switching to NOIRQ phase. The CMU device (FSYS_CMU) which is not under > >>> any power domain behaved correctly and was properly available for DWMMC > >>> device, which needs to enable clocks in its noirq_resume callback. > >>> > >>> However, in case of AUD_CMU and its clients (for example serial_3 device, > >>> which is a part of audio-subsytem), the pm_runtime_get_sync() called from > >>> clk_prepare() failed, because AUD_CMU device had still disabled runtime PM > >>> support. The only difference between FSYS_CMU and AUD_CMU, is the fact > >>> that AUD_CMU is under AUD power domain, while FSYS_CMU device has no power > >>> domain assigned (for completely other reasons FSYS power domains is not yet > >>> instantiated in exynos5433.dtsi). > >>> > >>> Ulf/Rafael: could You comment a bit on this issue? How it should be solved > >>> properly? > >> In general, this sounds like the classical problem of > >> suspending/resuming devices in a correct order. The long term solution > >> is to use device links to address this problem. > > > > Device links won't help in this case. This is not a problem of correct > > order of suspending/resuming devices. exynos5433 cmu device is already > > suspended after its client's and resumed before them. The problem here > > is using runtime PM in exynos5433 CMU driver during NOIRQ suspend/resume > > stage. > > > > I see. > > However, I am guessing that one of the reason to why the client's are > also using the noirq phase is because of getting a the correct order. > The point is, the different level of PM callbacks (prepare, suspend, > suspend_late, suspend_noirq etc) aren't really the right solution to > get a correct order, even if that is what works for most cases. > > In principle, if all devices dependencies were hooked up correctly > with device links, we would in principle only need one level of > suspend/resume callbacks. Well, theoretically. :-) Some of them serve a specific purpose in some cases, like in the PCI bus type devices are put into D-states by the bus type layer in the noirq suspend phase. > >> However, as a simple fix, I would try to add a ->prepare() callback > >> and call pm_runtime_get_sync() from it and then in a new corresponding > >> ->complete() callback, call pm_runtime_put(). > >> > >> This means that the device will stay runtime resumed until the > >> suspend_noirq phase. It would also means the pm_runtime_force_resume() > >> will resume the device at resume_noirq phase. Would that work? > > > > Yes, this will work, but it will unnecessarily wake all instances of > > exynos5433 cmu and their power domains during the system suspend/resume > > cycle, what I would like to avoid. > > I guess what you are saying is that it's not at every suspend sequence > that the client's needs to runtime resume the exynos5433 cmu device - > and in those cases when not needed, runtime resuming it becomes a > waste in regards to both time and energy. Right? > > Anyway, I fully agree that this isn't very nice, but unfortunate > that's the best I can think of as a simple solution. Resuming in ->prepare() is a bit wasteful, because they are run sequentially, so the times of all of the resumes add up. It is slightly more efficient to do it in ->suspend() in general. However, in this particular case, is there any reason why the clients cannot resume their providers in ->suspend()? They also might drop references to them on the way out, in ->resume() to balance the usage counters. Thanks, Rafael
[...] >> >> However, as a simple fix, I would try to add a ->prepare() callback >> >> and call pm_runtime_get_sync() from it and then in a new corresponding >> >> ->complete() callback, call pm_runtime_put(). >> >> >> >> This means that the device will stay runtime resumed until the >> >> suspend_noirq phase. It would also means the pm_runtime_force_resume() >> >> will resume the device at resume_noirq phase. Would that work? >> > >> > Yes, this will work, but it will unnecessarily wake all instances of >> > exynos5433 cmu and their power domains during the system suspend/resume >> > cycle, what I would like to avoid. >> >> I guess what you are saying is that it's not at every suspend sequence >> that the client's needs to runtime resume the exynos5433 cmu device - >> and in those cases when not needed, runtime resuming it becomes a >> waste in regards to both time and energy. Right? >> >> Anyway, I fully agree that this isn't very nice, but unfortunate >> that's the best I can think of as a simple solution. > > Resuming in ->prepare() is a bit wasteful, because they are run sequentially, > so the times of all of the resumes add up. It is slightly more efficient to > do it in ->suspend() in general. Good point! > > However, in this particular case, is there any reason why the clients cannot > resume their providers in ->suspend()? They also might drop references to > them on the way out, in ->resume() to balance the usage counters. I think the problem is that the clients don't know their providers. If they did, I rather think using device links should be the preferred solution. To make a client aware of its providers, for the general case, we would need to invent some DT bindings to describe the dependencies. Kind regards Uffe
Hi Rafael and Ulf, I understand your position on using runtime PM during system suspend/resume, but I would like add a few more words on why I need it. On 2018-09-20 19:29, Ulf Hansson wrote: > [...] >>>> The unconditional calls to pm_runtime_enable and pm_runtime_enable >>>> in device_suspend_late and device_resume_early core functions were a bit >>>> surprising for me, because they make NOIRQ phase a special case from >>>> the perspective of runtime PM handling. Using LATE system sleep phase >>>> to call pm_runtime_enable/disable looks like a workaround for me, but right >>>> now I have no other idea how to ensure proper behavior of the Exynos5433 >>>> CMU driver. >>>> >>>> The most strange thing is that I didn't observe this initially after >>>> switching to NOIRQ phase. The CMU device (FSYS_CMU) which is not under >>>> any power domain behaved correctly and was properly available for DWMMC >>>> device, which needs to enable clocks in its noirq_resume callback. >>>> >>>> However, in case of AUD_CMU and its clients (for example serial_3 device, >>>> which is a part of audio-subsytem), the pm_runtime_get_sync() called from >>>> clk_prepare() failed, because AUD_CMU device had still disabled runtime PM >>>> support. The only difference between FSYS_CMU and AUD_CMU, is the fact >>>> that AUD_CMU is under AUD power domain, while FSYS_CMU device has no power >>>> domain assigned (for completely other reasons FSYS power domains is not yet >>>> instantiated in exynos5433.dtsi). >>>> >>>> Ulf/Rafael: could You comment a bit on this issue? How it should be solved >>>> properly? >>> In general, this sounds like the classical problem of >>> suspending/resuming devices in a correct order. The long term solution >>> is to use device links to address this problem. >> Device links won't help in this case. This is not a problem of correct >> order of suspending/resuming devices. exynos5433 cmu device is already >> suspended after its client's and resumed before them. The problem here >> is using runtime PM in exynos5433 CMU driver during NOIRQ suspend/resume >> stage. >> > I see. > > However, I am guessing that one of the reason to why the client's are > also using the noirq phase is because of getting a the correct order. > The point is, the different level of PM callbacks (prepare, suspend, > suspend_late, suspend_noirq etc) aren't really the right solution to > get a correct order, even if that is what works for most cases. Nope, in both cases clients (dw-mmc-exynos and samsung serial/uart driver) use NOIRQ resume stage, because they want to play with interrupts (avoid interrupt storm) on resume. The only way to avoid NOIRQ phase in them is to manually mask their irqs on suspend and unmask after fixing interrupts related registers, but using NOIRQ was preferred solution. The main problem is that both clients needs to enable clocks to access their registers in NOIRQ resume, thus their clock provider has to be able to runtime resume as well. clk_prepare() will call pm_runtime_get_sync() on clock provider device, but first someone has to call pm_runtime_enable() on it, otherwise pm_runtimge_get_sync() fails. > In principle, if all devices dependencies were hooked up correctly > with device links, we would in principle only need one level of > suspend/resume callbacks. > >>> However, as a simple fix, I would try to add a ->prepare() callback >>> and call pm_runtime_get_sync() from it and then in a new corresponding >>> ->complete() callback, call pm_runtime_put(). >>> >>> This means that the device will stay runtime resumed until the >>> suspend_noirq phase. It would also means the pm_runtime_force_resume() >>> will resume the device at resume_noirq phase. Would that work? >> Yes, this will work, but it will unnecessarily wake all instances of >> exynos5433 cmu and their power domains during the system suspend/resume >> cycle, what I would like to avoid. > I guess what you are saying is that it's not at every suspend sequence > that the client's needs to runtime resume the exynos5433 cmu device - > and in those cases when not needed, runtime resuming it becomes a > waste in regards to both time and energy. Right? Exactly. The other problem is that clock provider has to ensure that calling pm_runtime_get_sync() on its device during NOIRQ phase will not fail to let clients to properly prepare+enable their clocks. > Anyway, I fully agree that this isn't very nice, but unfortunate > that's the best I can think of as a simple solution. > [...] > Best regards
On 1 October 2018 at 14:34, Marek Szyprowski <m.szyprowski@samsung.com> wrote: > Hi Rafael and Ulf, > > I understand your position on using runtime PM during system > suspend/resume, but > I would like add a few more words on why I need it. > > On 2018-09-20 19:29, Ulf Hansson wrote: >> [...] >>>>> The unconditional calls to pm_runtime_enable and pm_runtime_enable >>>>> in device_suspend_late and device_resume_early core functions were a bit >>>>> surprising for me, because they make NOIRQ phase a special case from >>>>> the perspective of runtime PM handling. Using LATE system sleep phase >>>>> to call pm_runtime_enable/disable looks like a workaround for me, but right >>>>> now I have no other idea how to ensure proper behavior of the Exynos5433 >>>>> CMU driver. >>>>> >>>>> The most strange thing is that I didn't observe this initially after >>>>> switching to NOIRQ phase. The CMU device (FSYS_CMU) which is not under >>>>> any power domain behaved correctly and was properly available for DWMMC >>>>> device, which needs to enable clocks in its noirq_resume callback. >>>>> >>>>> However, in case of AUD_CMU and its clients (for example serial_3 device, >>>>> which is a part of audio-subsytem), the pm_runtime_get_sync() called from >>>>> clk_prepare() failed, because AUD_CMU device had still disabled runtime PM >>>>> support. The only difference between FSYS_CMU and AUD_CMU, is the fact >>>>> that AUD_CMU is under AUD power domain, while FSYS_CMU device has no power >>>>> domain assigned (for completely other reasons FSYS power domains is not yet >>>>> instantiated in exynos5433.dtsi). >>>>> >>>>> Ulf/Rafael: could You comment a bit on this issue? How it should be solved >>>>> properly? >>>> In general, this sounds like the classical problem of >>>> suspending/resuming devices in a correct order. The long term solution >>>> is to use device links to address this problem. >>> Device links won't help in this case. This is not a problem of correct >>> order of suspending/resuming devices. exynos5433 cmu device is already >>> suspended after its client's and resumed before them. The problem here >>> is using runtime PM in exynos5433 CMU driver during NOIRQ suspend/resume >>> stage. >>> >> I see. >> >> However, I am guessing that one of the reason to why the client's are >> also using the noirq phase is because of getting a the correct order. >> The point is, the different level of PM callbacks (prepare, suspend, >> suspend_late, suspend_noirq etc) aren't really the right solution to >> get a correct order, even if that is what works for most cases. > > Nope, in both cases clients (dw-mmc-exynos and samsung serial/uart driver) > use NOIRQ resume stage, because they want to play with interrupts (avoid > interrupt storm) on resume. The only way to avoid NOIRQ phase in them is > to manually mask their irqs on suspend and unmask after fixing interrupts > related registers, but using NOIRQ was preferred solution. I see. > > The main problem is that both clients needs to enable clocks to access > their registers in NOIRQ resume, thus their clock provider has to be able > to runtime resume as well. clk_prepare() will call pm_runtime_get_sync() > on clock provider device, but first someone has to call pm_runtime_enable() > on it, otherwise pm_runtimge_get_sync() fails. In principle the sequence you are describing doesn't sound like a very unique case, so yes, I fully agree that it's problematic. So far, the solution to this problem has been to resume devices unconditionally, instead of using runtime PM for example. In this case, resuming would need to be done even as early as in the noirq phase. Allowing runtime PM to stay enabled, instead of disabling it from PM core in the device suspend late phase, could be an option. However, in such case in needs to be done on case by case basis, like an opt-in solution, otherwise it may cause problems. Rafael, what do you think, is this doable? > >> In principle, if all devices dependencies were hooked up correctly >> with device links, we would in principle only need one level of >> suspend/resume callbacks. >> >>>> However, as a simple fix, I would try to add a ->prepare() callback >>>> and call pm_runtime_get_sync() from it and then in a new corresponding >>>> ->complete() callback, call pm_runtime_put(). >>>> >>>> This means that the device will stay runtime resumed until the >>>> suspend_noirq phase. It would also means the pm_runtime_force_resume() >>>> will resume the device at resume_noirq phase. Would that work? >>> Yes, this will work, but it will unnecessarily wake all instances of >>> exynos5433 cmu and their power domains during the system suspend/resume >>> cycle, what I would like to avoid. >> I guess what you are saying is that it's not at every suspend sequence >> that the client's needs to runtime resume the exynos5433 cmu device - >> and in those cases when not needed, runtime resuming it becomes a >> waste in regards to both time and energy. Right? > > Exactly. The other problem is that clock provider has to ensure that > calling pm_runtime_get_sync() on its device during NOIRQ phase will > not fail to let clients to properly prepare+enable their clocks. Okay, thanks for confirming. > >> Anyway, I fully agree that this isn't very nice, but unfortunate >> that's the best I can think of as a simple solution. > >> [...] >> Kind regards Uffe
diff --git a/drivers/clk/samsung/clk-exynos5433.c b/drivers/clk/samsung/clk-exynos5433.c index 4a0f8ff87ca0..7e6c0c9397a9 100644 --- a/drivers/clk/samsung/clk-exynos5433.c +++ b/drivers/clk/samsung/clk-exynos5433.c @@ -5623,6 +5623,24 @@ static int __maybe_unused exynos5433_cmu_resume(struct device *dev) return 0; } +/* + * late_suspend and early_resume callbacks are required to balance + * pm_runtime_disable and pm_runtime_enable calls in device_suspend_late + * and device_resume_early core functions to keep runtime enabled for + * CMU device during noirq sleep phase. + */ +static int __maybe_unused exynos5433_late_suspend(struct device *dev) +{ + pm_runtime_enable(dev); + return 0; +} + +static int __maybe_unused exynos5433_early_resume(struct device *dev) +{ + pm_runtime_disable(dev); + return 0; +} + static int __init exynos5433_cmu_probe(struct platform_device *pdev) { const struct samsung_cmu_info *info; @@ -5760,6 +5778,8 @@ static const struct of_device_id exynos5433_cmu_of_match[] = { static const struct dev_pm_ops exynos5433_cmu_pm_ops = { SET_RUNTIME_PM_OPS(exynos5433_cmu_suspend, exynos5433_cmu_resume, NULL) + SET_LATE_SYSTEM_SLEEP_PM_OPS(exynos5433_late_suspend, + exynos5433_early_resume) SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, pm_runtime_force_resume) };
My recent commit af68ec14f112 'clk: samsung: Use NOIRQ stage for Exynos5433 clocks suspend/resume' moved Exynos5433 CMU suspend/resume callbacks to NOIRQ stage. It turned out that in some cases this is not enough to ensure that clocks provided by the given CMU will be available for other drivers during their NOIRQ system suspend/resume phase. Device core calls pm_runtime_disable and pm_runtime_enable unconditionally on every device during LATE system suspend/resume phase. This means that during noirq_resume, CMU device cannot be runtime resumed, because its power.disable_depth is higher than zero (runtime PM for it will be enabled later in device_resume_early function). To let other devices properly use clocks and keep runtime PM enabled also during NOIRQ system sleep phase, add calls to pm_runtime_enable/disable to balance the actions done by device core. The CMU device will be suspended in its noirq_suspend callback and finally resumed in noirq_resume. Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> --- Hi! The unconditional calls to pm_runtime_enable and pm_runtime_enable in device_suspend_late and device_resume_early core functions were a bit surprising for me, because they make NOIRQ phase a special case from the perspective of runtime PM handling. Using LATE system sleep phase to call pm_runtime_enable/disable looks like a workaround for me, but right now I have no other idea how to ensure proper behavior of the Exynos5433 CMU driver. The most strange thing is that I didn't observe this initially after switching to NOIRQ phase. The CMU device (FSYS_CMU) which is not under any power domain behaved correctly and was properly available for DWMMC device, which needs to enable clocks in its noirq_resume callback. However, in case of AUD_CMU and its clients (for example serial_3 device, which is a part of audio-subsytem), the pm_runtime_get_sync() called from clk_prepare() failed, because AUD_CMU device had still disabled runtime PM support. The only difference between FSYS_CMU and AUD_CMU, is the fact that AUD_CMU is under AUD power domain, while FSYS_CMU device has no power domain assigned (for completely other reasons FSYS power domains is not yet instantiated in exynos5433.dtsi). Ulf/Rafael: could You comment a bit on this issue? How it should be solved properly? Best regards Marek Szyprowski Samsung R&D Institute Poland --- drivers/clk/samsung/clk-exynos5433.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+)