diff mbox series

[RFC/PATCH] clk: samsung: exynos5433: Fix NOIRQ suspend/resume support

Message ID 20180919121342.18288-1-m.szyprowski@samsung.com (mailing list archive)
State Not Applicable, archived
Headers show
Series [RFC/PATCH] clk: samsung: exynos5433: Fix NOIRQ suspend/resume support | expand

Commit Message

Marek Szyprowski Sept. 19, 2018, 12:13 p.m. UTC
My recent commit af68ec14f112 'clk: samsung: Use NOIRQ stage for Exynos5433
clocks suspend/resume' moved Exynos5433 CMU suspend/resume callbacks
to NOIRQ stage. It turned out that in some cases this is not enough to
ensure that clocks provided by the given CMU will be available for other
drivers during their NOIRQ system suspend/resume phase. Device core calls
pm_runtime_disable and pm_runtime_enable unconditionally on every device
during LATE system suspend/resume phase. This means that during
noirq_resume, CMU device cannot be runtime resumed, because its
power.disable_depth is higher than zero (runtime PM for it will be enabled
later in device_resume_early function).

To let other devices properly use clocks and keep runtime PM enabled also
during NOIRQ system sleep phase, add calls to pm_runtime_enable/disable
to balance the actions done by device core. The CMU device will be
suspended in its noirq_suspend callback and finally resumed in noirq_resume.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
Hi!

The unconditional calls to pm_runtime_enable and pm_runtime_enable
in device_suspend_late and device_resume_early core functions were a bit
surprising for me, because they make NOIRQ phase a special case from
the perspective of runtime PM handling. Using LATE system sleep phase
to call pm_runtime_enable/disable looks like a workaround for me, but right
now I have no other idea how to ensure proper behavior of the Exynos5433
CMU driver.

The most strange thing is that I didn't observe this initially after
switching to NOIRQ phase. The CMU device (FSYS_CMU) which is not under
any power domain behaved correctly and was properly available for DWMMC
device, which needs to enable clocks in its noirq_resume callback.

However, in case of AUD_CMU and its clients (for example serial_3 device,
which is a part of audio-subsytem), the pm_runtime_get_sync() called from
clk_prepare() failed, because AUD_CMU device had still disabled runtime PM
support. The only difference between FSYS_CMU and AUD_CMU, is the fact
that AUD_CMU is under AUD power domain, while FSYS_CMU device has no power
domain assigned (for completely other reasons FSYS power domains is not yet
instantiated in exynos5433.dtsi).

Ulf/Rafael: could You comment a bit on this issue? How it should be solved
properly?

Best regards
Marek Szyprowski
Samsung R&D Institute Poland
---
 drivers/clk/samsung/clk-exynos5433.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

Comments

Ulf Hansson Sept. 19, 2018, 9:18 p.m. UTC | #1
On 19 September 2018 at 14:13, Marek Szyprowski
<m.szyprowski@samsung.com> wrote:
> My recent commit af68ec14f112 'clk: samsung: Use NOIRQ stage for Exynos5433
> clocks suspend/resume' moved Exynos5433 CMU suspend/resume callbacks
> to NOIRQ stage. It turned out that in some cases this is not enough to
> ensure that clocks provided by the given CMU will be available for other
> drivers during their NOIRQ system suspend/resume phase. Device core calls
> pm_runtime_disable and pm_runtime_enable unconditionally on every device
> during LATE system suspend/resume phase. This means that during
> noirq_resume, CMU device cannot be runtime resumed, because its
> power.disable_depth is higher than zero (runtime PM for it will be enabled
> later in device_resume_early function).
>
> To let other devices properly use clocks and keep runtime PM enabled also
> during NOIRQ system sleep phase, add calls to pm_runtime_enable/disable
> to balance the actions done by device core. The CMU device will be
> suspended in its noirq_suspend callback and finally resumed in noirq_resume.


>
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> ---
> Hi!
>
> The unconditional calls to pm_runtime_enable and pm_runtime_enable
> in device_suspend_late and device_resume_early core functions were a bit
> surprising for me, because they make NOIRQ phase a special case from
> the perspective of runtime PM handling. Using LATE system sleep phase
> to call pm_runtime_enable/disable looks like a workaround for me, but right
> now I have no other idea how to ensure proper behavior of the Exynos5433
> CMU driver.
>
> The most strange thing is that I didn't observe this initially after
> switching to NOIRQ phase. The CMU device (FSYS_CMU) which is not under
> any power domain behaved correctly and was properly available for DWMMC
> device, which needs to enable clocks in its noirq_resume callback.
>
> However, in case of AUD_CMU and its clients (for example serial_3 device,
> which is a part of audio-subsytem), the pm_runtime_get_sync() called from
> clk_prepare() failed, because AUD_CMU device had still disabled runtime PM
> support. The only difference between FSYS_CMU and AUD_CMU, is the fact
> that AUD_CMU is under AUD power domain, while FSYS_CMU device has no power
> domain assigned (for completely other reasons FSYS power domains is not yet
> instantiated in exynos5433.dtsi).
>
> Ulf/Rafael: could You comment a bit on this issue? How it should be solved
> properly?

In general, this sounds like the classical problem of
suspending/resuming devices in a correct order. The long term solution
is to use device links to address this problem.

However, as a simple fix, I would try to add a ->prepare() callback
and call pm_runtime_get_sync() from it and then in a new corresponding
->complete() callback, call pm_runtime_put().

This means that the device will stay runtime resumed until the
suspend_noirq phase. It would also means the pm_runtime_force_resume()
will resume the device at resume_noirq phase. Would that work?

>
> Best regards
> Marek Szyprowski
> Samsung R&D Institute Poland
> ---
>  drivers/clk/samsung/clk-exynos5433.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
>
> diff --git a/drivers/clk/samsung/clk-exynos5433.c b/drivers/clk/samsung/clk-exynos5433.c
> index 4a0f8ff87ca0..7e6c0c9397a9 100644
> --- a/drivers/clk/samsung/clk-exynos5433.c
> +++ b/drivers/clk/samsung/clk-exynos5433.c
> @@ -5623,6 +5623,24 @@ static int __maybe_unused exynos5433_cmu_resume(struct device *dev)
>         return 0;
>  }
>
> +/*
> + * late_suspend and early_resume callbacks are required to balance
> + * pm_runtime_disable and pm_runtime_enable calls in device_suspend_late
> + * and device_resume_early core functions to keep runtime enabled for
> + * CMU device during noirq sleep phase.
> + */
> +static int __maybe_unused exynos5433_late_suspend(struct device *dev)
> +{
> +       pm_runtime_enable(dev);
> +       return 0;
> +}
> +
> +static int __maybe_unused exynos5433_early_resume(struct device *dev)
> +{
> +       pm_runtime_disable(dev);
> +       return 0;
> +}
> +
>  static int __init exynos5433_cmu_probe(struct platform_device *pdev)
>  {
>         const struct samsung_cmu_info *info;
> @@ -5760,6 +5778,8 @@ static const struct of_device_id exynos5433_cmu_of_match[] = {
>  static const struct dev_pm_ops exynos5433_cmu_pm_ops = {
>         SET_RUNTIME_PM_OPS(exynos5433_cmu_suspend, exynos5433_cmu_resume,
>                            NULL)
> +       SET_LATE_SYSTEM_SLEEP_PM_OPS(exynos5433_late_suspend,
> +                                    exynos5433_early_resume)
>         SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,
>                                      pm_runtime_force_resume)
>  };
> --
> 2.17.1
>

Kind regards
Uffe
Rafael J. Wysocki Sept. 19, 2018, 10:27 p.m. UTC | #2
On Wed, Sep 19, 2018 at 2:14 PM Marek Szyprowski
<m.szyprowski@samsung.com> wrote:
>
> My recent commit af68ec14f112 'clk: samsung: Use NOIRQ stage for Exynos5433
> clocks suspend/resume' moved Exynos5433 CMU suspend/resume callbacks
> to NOIRQ stage. It turned out that in some cases this is not enough to
> ensure that clocks provided by the given CMU will be available for other
> drivers during their NOIRQ system suspend/resume phase. Device core calls
> pm_runtime_disable and pm_runtime_enable unconditionally on every device
> during LATE system suspend/resume phase. This means that during
> noirq_resume, CMU device cannot be runtime resumed, because its
> power.disable_depth is higher than zero (runtime PM for it will be enabled
> later in device_resume_early function).
>
> To let other devices properly use clocks and keep runtime PM enabled also
> during NOIRQ system sleep phase,

No.

> add calls to pm_runtime_enable/disable to balance the actions done by device core.

There's nothing to balance, PM-runtime doesn't work during system-wide
PM transitions as a rule.

The only thing you can do is to runtime-resume a device before
"suspend late", but you can't runtime-suspend anything anyway.

> The CMU device will be suspended in its noirq_suspend callback and finally resumed in noirq_resume.
>
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> ---
> Hi!
>
> The unconditional calls to pm_runtime_enable and pm_runtime_enable
> in device_suspend_late and device_resume_early core functions were a bit
> surprising for me,

I wonder why.  It has been like this for a long time and it is quite
well documented AFAICS.

> because they make NOIRQ phase a special case from
> the perspective of runtime PM handling.

Well, system-wide PM transitions are not "runtime".  They use
different sets of callbacks for a reason.

> Using LATE system sleep phase
> to call pm_runtime_enable/disable looks like a workaround for me,

No, it just plain doesn't work.

> but right now I have no other idea how to ensure proper behavior of the Exynos5433
> CMU driver.
>
> The most strange thing is that I didn't observe this initially after
> switching to NOIRQ phase. The CMU device (FSYS_CMU) which is not under
> any power domain behaved correctly and was properly available for DWMMC
> device, which needs to enable clocks in its noirq_resume callback.
>
> However, in case of AUD_CMU and its clients (for example serial_3 device,
> which is a part of audio-subsytem), the pm_runtime_get_sync() called from
> clk_prepare() failed, because AUD_CMU device had still disabled runtime PM
> support. The only difference between FSYS_CMU and AUD_CMU, is the fact
> that AUD_CMU is under AUD power domain, while FSYS_CMU device has no power
> domain assigned (for completely other reasons FSYS power domains is not yet
> instantiated in exynos5433.dtsi).
>
> Ulf/Rafael: could You comment a bit on this issue? How it should be solved
> properly?

Well, as a rule, don't use PM-runtime from your system-wide PM
callbacks.  The only thing you can do is to resume a device from
runtime suspend in ->prepare or ->suspend in order to change its
configuration and suspend it later properly in the next callbacks.

With that in mind, ensure the proper ordering of system-wide suspend
of the devices as Ulf said.

Thanks,
Rafael
Marek Szyprowski Sept. 20, 2018, 1:42 p.m. UTC | #3
Hi Ulf,

On 2018-09-19 23:18, Ulf Hansson wrote:
> On 19 September 2018 at 14:13, Marek Szyprowski
> <m.szyprowski@samsung.com> wrote:
>> My recent commit af68ec14f112 'clk: samsung: Use NOIRQ stage for Exynos5433
>> clocks suspend/resume' moved Exynos5433 CMU suspend/resume callbacks
>> to NOIRQ stage. It turned out that in some cases this is not enough to
>> ensure that clocks provided by the given CMU will be available for other
>> drivers during their NOIRQ system suspend/resume phase. Device core calls
>> pm_runtime_disable and pm_runtime_enable unconditionally on every device
>> during LATE system suspend/resume phase. This means that during
>> noirq_resume, CMU device cannot be runtime resumed, because its
>> power.disable_depth is higher than zero (runtime PM for it will be enabled
>> later in device_resume_early function).
>>
>> To let other devices properly use clocks and keep runtime PM enabled also
>> during NOIRQ system sleep phase, add calls to pm_runtime_enable/disable
>> to balance the actions done by device core. The CMU device will be
>> suspended in its noirq_suspend callback and finally resumed in noirq_resume.
>
>> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
>> ---
>> Hi!
>>
>> The unconditional calls to pm_runtime_enable and pm_runtime_enable
>> in device_suspend_late and device_resume_early core functions were a bit
>> surprising for me, because they make NOIRQ phase a special case from
>> the perspective of runtime PM handling. Using LATE system sleep phase
>> to call pm_runtime_enable/disable looks like a workaround for me, but right
>> now I have no other idea how to ensure proper behavior of the Exynos5433
>> CMU driver.
>>
>> The most strange thing is that I didn't observe this initially after
>> switching to NOIRQ phase. The CMU device (FSYS_CMU) which is not under
>> any power domain behaved correctly and was properly available for DWMMC
>> device, which needs to enable clocks in its noirq_resume callback.
>>
>> However, in case of AUD_CMU and its clients (for example serial_3 device,
>> which is a part of audio-subsytem), the pm_runtime_get_sync() called from
>> clk_prepare() failed, because AUD_CMU device had still disabled runtime PM
>> support. The only difference between FSYS_CMU and AUD_CMU, is the fact
>> that AUD_CMU is under AUD power domain, while FSYS_CMU device has no power
>> domain assigned (for completely other reasons FSYS power domains is not yet
>> instantiated in exynos5433.dtsi).
>>
>> Ulf/Rafael: could You comment a bit on this issue? How it should be solved
>> properly?
> In general, this sounds like the classical problem of
> suspending/resuming devices in a correct order. The long term solution
> is to use device links to address this problem.

Device links won't help in this case. This is not a problem of correct
order of suspending/resuming devices. exynos5433 cmu device is already
suspended after its client's and resumed before them. The problem here
is using runtime PM in exynos5433 CMU driver during NOIRQ suspend/resume
stage.

> However, as a simple fix, I would try to add a ->prepare() callback
> and call pm_runtime_get_sync() from it and then in a new corresponding
> ->complete() callback, call pm_runtime_put().
>
> This means that the device will stay runtime resumed until the
> suspend_noirq phase. It would also means the pm_runtime_force_resume()
> will resume the device at resume_noirq phase. Would that work?

Yes, this will work, but it will unnecessarily wake all instances of
exynos5433 cmu and their power domains during the system suspend/resume
cycle, what I would like to avoid.

Are there any reasons for disabling runtime PM unconditionally for every
device in the system during LATE system sleep stage?

 > ...

Best regards
Rafael J. Wysocki Sept. 20, 2018, 2:24 p.m. UTC | #4
On Thu, Sep 20, 2018 at 3:42 PM Marek Szyprowski
<m.szyprowski@samsung.com> wrote:
>
> Hi Ulf,
>
> On 2018-09-19 23:18, Ulf Hansson wrote:
> > On 19 September 2018 at 14:13, Marek Szyprowski
> > <m.szyprowski@samsung.com> wrote:
> >> My recent commit af68ec14f112 'clk: samsung: Use NOIRQ stage for Exynos5433
> >> clocks suspend/resume' moved Exynos5433 CMU suspend/resume callbacks
> >> to NOIRQ stage. It turned out that in some cases this is not enough to
> >> ensure that clocks provided by the given CMU will be available for other
> >> drivers during their NOIRQ system suspend/resume phase. Device core calls
> >> pm_runtime_disable and pm_runtime_enable unconditionally on every device
> >> during LATE system suspend/resume phase. This means that during
> >> noirq_resume, CMU device cannot be runtime resumed, because its
> >> power.disable_depth is higher than zero (runtime PM for it will be enabled
> >> later in device_resume_early function).
> >>
> >> To let other devices properly use clocks and keep runtime PM enabled also
> >> during NOIRQ system sleep phase, add calls to pm_runtime_enable/disable
> >> to balance the actions done by device core. The CMU device will be
> >> suspended in its noirq_suspend callback and finally resumed in noirq_resume.
> >
> >> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> >> ---
> >> Hi!
> >>
> >> The unconditional calls to pm_runtime_enable and pm_runtime_enable
> >> in device_suspend_late and device_resume_early core functions were a bit
> >> surprising for me, because they make NOIRQ phase a special case from
> >> the perspective of runtime PM handling. Using LATE system sleep phase
> >> to call pm_runtime_enable/disable looks like a workaround for me, but right
> >> now I have no other idea how to ensure proper behavior of the Exynos5433
> >> CMU driver.
> >>
> >> The most strange thing is that I didn't observe this initially after
> >> switching to NOIRQ phase. The CMU device (FSYS_CMU) which is not under
> >> any power domain behaved correctly and was properly available for DWMMC
> >> device, which needs to enable clocks in its noirq_resume callback.
> >>
> >> However, in case of AUD_CMU and its clients (for example serial_3 device,
> >> which is a part of audio-subsytem), the pm_runtime_get_sync() called from
> >> clk_prepare() failed, because AUD_CMU device had still disabled runtime PM
> >> support. The only difference between FSYS_CMU and AUD_CMU, is the fact
> >> that AUD_CMU is under AUD power domain, while FSYS_CMU device has no power
> >> domain assigned (for completely other reasons FSYS power domains is not yet
> >> instantiated in exynos5433.dtsi).
> >>
> >> Ulf/Rafael: could You comment a bit on this issue? How it should be solved
> >> properly?
> > In general, this sounds like the classical problem of
> > suspending/resuming devices in a correct order. The long term solution
> > is to use device links to address this problem.
>
> Device links won't help in this case. This is not a problem of correct
> order of suspending/resuming devices. exynos5433 cmu device is already
> suspended after its client's and resumed before them. The problem here
> is using runtime PM in exynos5433 CMU driver during NOIRQ suspend/resume
> stage.

Simply put, if a driver uses PM-runtime in its noirq system-wide
callbacks, this is a bug in the driver.

> > However, as a simple fix, I would try to add a ->prepare() callback
> > and call pm_runtime_get_sync() from it and then in a new corresponding
> > ->complete() callback, call pm_runtime_put().
> >
> > This means that the device will stay runtime resumed until the
> > suspend_noirq phase. It would also means the pm_runtime_force_resume()
> > will resume the device at resume_noirq phase. Would that work?
>
> Yes, this will work, but it will unnecessarily wake all instances of
> exynos5433 cmu and their power domains during the system suspend/resume
> cycle, what I would like to avoid.
>
> Are there any reasons for disabling runtime PM unconditionally for every
> device in the system during LATE system sleep stage?

Yes, there are.

It generally is not safe to do that, because the state of the device
after/during late suspend generally may be different than expected by
its PM-runtime callbacks and that doesn't apply just to the device
itself, but also to its parent and possible suppliers and all of their
ancestors and suppliers.  The resume of a device generally triggers a
resume of all devices it depends on and that may not be expected by
their drivers and the bus types/PM domains they depend on and/or work
with.

Your particular case appears to be simple enough, but there are more
complicated ones.

Also, it generally is not a good idea to rely on PM-runtime for
system-wide PM, because PM-runtime may be effectively disabled by user
space via sysfs for any device.  IOW, pm_runtime_suspend() or
pm_runtime_put_sync() or equivalent doesn't guarantee that the device
will actually be suspended and system-wide PM is expected to work
regardless.

So, please abandon the
runtime-resume-during-noirq-phase-of-system-wide-suspend idea and find
a different way to organize your code.

Thanks,
Rafael
Ulf Hansson Sept. 20, 2018, 5:29 p.m. UTC | #5
[...]

>>>
>>> The unconditional calls to pm_runtime_enable and pm_runtime_enable
>>> in device_suspend_late and device_resume_early core functions were a bit
>>> surprising for me, because they make NOIRQ phase a special case from
>>> the perspective of runtime PM handling. Using LATE system sleep phase
>>> to call pm_runtime_enable/disable looks like a workaround for me, but right
>>> now I have no other idea how to ensure proper behavior of the Exynos5433
>>> CMU driver.
>>>
>>> The most strange thing is that I didn't observe this initially after
>>> switching to NOIRQ phase. The CMU device (FSYS_CMU) which is not under
>>> any power domain behaved correctly and was properly available for DWMMC
>>> device, which needs to enable clocks in its noirq_resume callback.
>>>
>>> However, in case of AUD_CMU and its clients (for example serial_3 device,
>>> which is a part of audio-subsytem), the pm_runtime_get_sync() called from
>>> clk_prepare() failed, because AUD_CMU device had still disabled runtime PM
>>> support. The only difference between FSYS_CMU and AUD_CMU, is the fact
>>> that AUD_CMU is under AUD power domain, while FSYS_CMU device has no power
>>> domain assigned (for completely other reasons FSYS power domains is not yet
>>> instantiated in exynos5433.dtsi).
>>>
>>> Ulf/Rafael: could You comment a bit on this issue? How it should be solved
>>> properly?
>> In general, this sounds like the classical problem of
>> suspending/resuming devices in a correct order. The long term solution
>> is to use device links to address this problem.
>
> Device links won't help in this case. This is not a problem of correct
> order of suspending/resuming devices. exynos5433 cmu device is already
> suspended after its client's and resumed before them. The problem here
> is using runtime PM in exynos5433 CMU driver during NOIRQ suspend/resume
> stage.
>

I see.

However, I am guessing that one of the reason to why the client's are
also using the noirq phase is because of getting a the correct order.
The point is, the different level of PM callbacks (prepare, suspend,
suspend_late, suspend_noirq etc) aren't really the right solution to
get a correct order, even if that is what works for most cases.

In principle, if all devices dependencies were hooked up correctly
with device links, we would in principle only need one level of
suspend/resume callbacks.

>> However, as a simple fix, I would try to add a ->prepare() callback
>> and call pm_runtime_get_sync() from it and then in a new corresponding
>> ->complete() callback, call pm_runtime_put().
>>
>> This means that the device will stay runtime resumed until the
>> suspend_noirq phase. It would also means the pm_runtime_force_resume()
>> will resume the device at resume_noirq phase. Would that work?
>
> Yes, this will work, but it will unnecessarily wake all instances of
> exynos5433 cmu and their power domains during the system suspend/resume
> cycle, what I would like to avoid.

I guess what you are saying is that it's not at every suspend sequence
that the client's needs to runtime resume the exynos5433 cmu device -
and in those cases when not  needed, runtime resuming it becomes a
waste in regards to both time and energy. Right?

Anyway, I fully agree that this isn't very nice, but unfortunate
that's the best I can think of as a simple solution.

[...]

Kind regards
Uffe
Rafael J. Wysocki Sept. 21, 2018, 12:10 p.m. UTC | #6
On Thursday, September 20, 2018 7:29:46 PM CEST Ulf Hansson wrote:
> [...]
> 
> >>>
> >>> The unconditional calls to pm_runtime_enable and pm_runtime_enable
> >>> in device_suspend_late and device_resume_early core functions were a bit
> >>> surprising for me, because they make NOIRQ phase a special case from
> >>> the perspective of runtime PM handling. Using LATE system sleep phase
> >>> to call pm_runtime_enable/disable looks like a workaround for me, but right
> >>> now I have no other idea how to ensure proper behavior of the Exynos5433
> >>> CMU driver.
> >>>
> >>> The most strange thing is that I didn't observe this initially after
> >>> switching to NOIRQ phase. The CMU device (FSYS_CMU) which is not under
> >>> any power domain behaved correctly and was properly available for DWMMC
> >>> device, which needs to enable clocks in its noirq_resume callback.
> >>>
> >>> However, in case of AUD_CMU and its clients (for example serial_3 device,
> >>> which is a part of audio-subsytem), the pm_runtime_get_sync() called from
> >>> clk_prepare() failed, because AUD_CMU device had still disabled runtime PM
> >>> support. The only difference between FSYS_CMU and AUD_CMU, is the fact
> >>> that AUD_CMU is under AUD power domain, while FSYS_CMU device has no power
> >>> domain assigned (for completely other reasons FSYS power domains is not yet
> >>> instantiated in exynos5433.dtsi).
> >>>
> >>> Ulf/Rafael: could You comment a bit on this issue? How it should be solved
> >>> properly?
> >> In general, this sounds like the classical problem of
> >> suspending/resuming devices in a correct order. The long term solution
> >> is to use device links to address this problem.
> >
> > Device links won't help in this case. This is not a problem of correct
> > order of suspending/resuming devices. exynos5433 cmu device is already
> > suspended after its client's and resumed before them. The problem here
> > is using runtime PM in exynos5433 CMU driver during NOIRQ suspend/resume
> > stage.
> >
> 
> I see.
> 
> However, I am guessing that one of the reason to why the client's are
> also using the noirq phase is because of getting a the correct order.
> The point is, the different level of PM callbacks (prepare, suspend,
> suspend_late, suspend_noirq etc) aren't really the right solution to
> get a correct order, even if that is what works for most cases.
> 
> In principle, if all devices dependencies were hooked up correctly
> with device links, we would in principle only need one level of
> suspend/resume callbacks.

Well, theoretically. :-)

Some of them serve a specific purpose in some cases, like in the PCI bus
type devices are put into D-states by the bus type layer in the noirq
suspend phase.

> >> However, as a simple fix, I would try to add a ->prepare() callback
> >> and call pm_runtime_get_sync() from it and then in a new corresponding
> >> ->complete() callback, call pm_runtime_put().
> >>
> >> This means that the device will stay runtime resumed until the
> >> suspend_noirq phase. It would also means the pm_runtime_force_resume()
> >> will resume the device at resume_noirq phase. Would that work?
> >
> > Yes, this will work, but it will unnecessarily wake all instances of
> > exynos5433 cmu and their power domains during the system suspend/resume
> > cycle, what I would like to avoid.
> 
> I guess what you are saying is that it's not at every suspend sequence
> that the client's needs to runtime resume the exynos5433 cmu device -
> and in those cases when not  needed, runtime resuming it becomes a
> waste in regards to both time and energy. Right?
> 
> Anyway, I fully agree that this isn't very nice, but unfortunate
> that's the best I can think of as a simple solution.

Resuming in ->prepare() is a bit wasteful, because they are run sequentially,
so the times of all of the resumes add up.  It is slightly more efficient to
do it in ->suspend() in general.

However, in this particular case, is there any reason why the clients cannot
resume their providers in ->suspend()?  They also might drop references to
them on the way out, in ->resume() to balance the usage counters.

Thanks,
Rafael
Ulf Hansson Sept. 25, 2018, 2:18 p.m. UTC | #7
[...]

>> >> However, as a simple fix, I would try to add a ->prepare() callback
>> >> and call pm_runtime_get_sync() from it and then in a new corresponding
>> >> ->complete() callback, call pm_runtime_put().
>> >>
>> >> This means that the device will stay runtime resumed until the
>> >> suspend_noirq phase. It would also means the pm_runtime_force_resume()
>> >> will resume the device at resume_noirq phase. Would that work?
>> >
>> > Yes, this will work, but it will unnecessarily wake all instances of
>> > exynos5433 cmu and their power domains during the system suspend/resume
>> > cycle, what I would like to avoid.
>>
>> I guess what you are saying is that it's not at every suspend sequence
>> that the client's needs to runtime resume the exynos5433 cmu device -
>> and in those cases when not  needed, runtime resuming it becomes a
>> waste in regards to both time and energy. Right?
>>
>> Anyway, I fully agree that this isn't very nice, but unfortunate
>> that's the best I can think of as a simple solution.
>
> Resuming in ->prepare() is a bit wasteful, because they are run sequentially,
> so the times of all of the resumes add up.  It is slightly more efficient to
> do it in ->suspend() in general.

Good point!

>
> However, in this particular case, is there any reason why the clients cannot
> resume their providers in ->suspend()?  They also might drop references to
> them on the way out, in ->resume() to balance the usage counters.

I think the problem is that the clients don't know their providers. If
they did, I rather think using device links should be the preferred
solution.

To make a client aware of its providers, for the general case, we
would need to invent some DT bindings to describe the dependencies.

Kind regards
Uffe
Marek Szyprowski Oct. 1, 2018, 12:34 p.m. UTC | #8
Hi Rafael and Ulf,

I understand your position on using runtime PM during system
suspend/resume, but
I would like add a few more words on why I need it.

On 2018-09-20 19:29, Ulf Hansson wrote:
> [...]
>>>> The unconditional calls to pm_runtime_enable and pm_runtime_enable
>>>> in device_suspend_late and device_resume_early core functions were a bit
>>>> surprising for me, because they make NOIRQ phase a special case from
>>>> the perspective of runtime PM handling. Using LATE system sleep phase
>>>> to call pm_runtime_enable/disable looks like a workaround for me, but right
>>>> now I have no other idea how to ensure proper behavior of the Exynos5433
>>>> CMU driver.
>>>>
>>>> The most strange thing is that I didn't observe this initially after
>>>> switching to NOIRQ phase. The CMU device (FSYS_CMU) which is not under
>>>> any power domain behaved correctly and was properly available for DWMMC
>>>> device, which needs to enable clocks in its noirq_resume callback.
>>>>
>>>> However, in case of AUD_CMU and its clients (for example serial_3 device,
>>>> which is a part of audio-subsytem), the pm_runtime_get_sync() called from
>>>> clk_prepare() failed, because AUD_CMU device had still disabled runtime PM
>>>> support. The only difference between FSYS_CMU and AUD_CMU, is the fact
>>>> that AUD_CMU is under AUD power domain, while FSYS_CMU device has no power
>>>> domain assigned (for completely other reasons FSYS power domains is not yet
>>>> instantiated in exynos5433.dtsi).
>>>>
>>>> Ulf/Rafael: could You comment a bit on this issue? How it should be solved
>>>> properly?
>>> In general, this sounds like the classical problem of
>>> suspending/resuming devices in a correct order. The long term solution
>>> is to use device links to address this problem.
>> Device links won't help in this case. This is not a problem of correct
>> order of suspending/resuming devices. exynos5433 cmu device is already
>> suspended after its client's and resumed before them. The problem here
>> is using runtime PM in exynos5433 CMU driver during NOIRQ suspend/resume
>> stage.
>>
> I see.
>
> However, I am guessing that one of the reason to why the client's are
> also using the noirq phase is because of getting a the correct order.
> The point is, the different level of PM callbacks (prepare, suspend,
> suspend_late, suspend_noirq etc) aren't really the right solution to
> get a correct order, even if that is what works for most cases.

Nope, in both cases clients (dw-mmc-exynos and samsung serial/uart driver)
use NOIRQ resume stage, because they want to play with interrupts (avoid
interrupt storm) on resume. The only way to avoid NOIRQ phase in them is
to manually mask their irqs on suspend and unmask after fixing interrupts
related registers, but using NOIRQ was preferred solution.

The main problem is that both clients needs to enable clocks to access
their registers in NOIRQ resume, thus their clock provider has to be able
to runtime resume as well. clk_prepare() will call pm_runtime_get_sync()
on clock provider device, but first someone has to call pm_runtime_enable()
on it, otherwise pm_runtimge_get_sync() fails.

> In principle, if all devices dependencies were hooked up correctly
> with device links, we would in principle only need one level of
> suspend/resume callbacks.
>
>>> However, as a simple fix, I would try to add a ->prepare() callback
>>> and call pm_runtime_get_sync() from it and then in a new corresponding
>>> ->complete() callback, call pm_runtime_put().
>>>
>>> This means that the device will stay runtime resumed until the
>>> suspend_noirq phase. It would also means the pm_runtime_force_resume()
>>> will resume the device at resume_noirq phase. Would that work?
>> Yes, this will work, but it will unnecessarily wake all instances of
>> exynos5433 cmu and their power domains during the system suspend/resume
>> cycle, what I would like to avoid.
> I guess what you are saying is that it's not at every suspend sequence
> that the client's needs to runtime resume the exynos5433 cmu device -
> and in those cases when not  needed, runtime resuming it becomes a
> waste in regards to both time and energy. Right?

Exactly. The other problem is that clock provider has to ensure that
calling pm_runtime_get_sync() on its device during NOIRQ phase will
not fail to let clients to properly prepare+enable their clocks.

> Anyway, I fully agree that this isn't very nice, but unfortunate
> that's the best I can think of as a simple solution.

> [...]
>

Best regards
Ulf Hansson Oct. 1, 2018, 1:30 p.m. UTC | #9
On 1 October 2018 at 14:34, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
> Hi Rafael and Ulf,
>
> I understand your position on using runtime PM during system
> suspend/resume, but
> I would like add a few more words on why I need it.
>
> On 2018-09-20 19:29, Ulf Hansson wrote:
>> [...]
>>>>> The unconditional calls to pm_runtime_enable and pm_runtime_enable
>>>>> in device_suspend_late and device_resume_early core functions were a bit
>>>>> surprising for me, because they make NOIRQ phase a special case from
>>>>> the perspective of runtime PM handling. Using LATE system sleep phase
>>>>> to call pm_runtime_enable/disable looks like a workaround for me, but right
>>>>> now I have no other idea how to ensure proper behavior of the Exynos5433
>>>>> CMU driver.
>>>>>
>>>>> The most strange thing is that I didn't observe this initially after
>>>>> switching to NOIRQ phase. The CMU device (FSYS_CMU) which is not under
>>>>> any power domain behaved correctly and was properly available for DWMMC
>>>>> device, which needs to enable clocks in its noirq_resume callback.
>>>>>
>>>>> However, in case of AUD_CMU and its clients (for example serial_3 device,
>>>>> which is a part of audio-subsytem), the pm_runtime_get_sync() called from
>>>>> clk_prepare() failed, because AUD_CMU device had still disabled runtime PM
>>>>> support. The only difference between FSYS_CMU and AUD_CMU, is the fact
>>>>> that AUD_CMU is under AUD power domain, while FSYS_CMU device has no power
>>>>> domain assigned (for completely other reasons FSYS power domains is not yet
>>>>> instantiated in exynos5433.dtsi).
>>>>>
>>>>> Ulf/Rafael: could You comment a bit on this issue? How it should be solved
>>>>> properly?
>>>> In general, this sounds like the classical problem of
>>>> suspending/resuming devices in a correct order. The long term solution
>>>> is to use device links to address this problem.
>>> Device links won't help in this case. This is not a problem of correct
>>> order of suspending/resuming devices. exynos5433 cmu device is already
>>> suspended after its client's and resumed before them. The problem here
>>> is using runtime PM in exynos5433 CMU driver during NOIRQ suspend/resume
>>> stage.
>>>
>> I see.
>>
>> However, I am guessing that one of the reason to why the client's are
>> also using the noirq phase is because of getting a the correct order.
>> The point is, the different level of PM callbacks (prepare, suspend,
>> suspend_late, suspend_noirq etc) aren't really the right solution to
>> get a correct order, even if that is what works for most cases.
>
> Nope, in both cases clients (dw-mmc-exynos and samsung serial/uart driver)
> use NOIRQ resume stage, because they want to play with interrupts (avoid
> interrupt storm) on resume. The only way to avoid NOIRQ phase in them is
> to manually mask their irqs on suspend and unmask after fixing interrupts
> related registers, but using NOIRQ was preferred solution.

I see.

>
> The main problem is that both clients needs to enable clocks to access
> their registers in NOIRQ resume, thus their clock provider has to be able
> to runtime resume as well. clk_prepare() will call pm_runtime_get_sync()
> on clock provider device, but first someone has to call pm_runtime_enable()
> on it, otherwise pm_runtimge_get_sync() fails.

In principle the sequence you are describing doesn't sound like a very
unique case, so yes, I fully agree that it's problematic.

So far, the solution to this problem has been to resume devices
unconditionally, instead of using runtime PM for example. In this
case, resuming would need to be done even as early as in the noirq
phase.

Allowing runtime PM to stay enabled, instead of disabling it from PM
core in the device suspend late phase, could be an option. However, in
such case in needs to be done on case by case basis, like an opt-in
solution, otherwise it may cause problems. Rafael, what do you think,
is this doable?

>
>> In principle, if all devices dependencies were hooked up correctly
>> with device links, we would in principle only need one level of
>> suspend/resume callbacks.
>>
>>>> However, as a simple fix, I would try to add a ->prepare() callback
>>>> and call pm_runtime_get_sync() from it and then in a new corresponding
>>>> ->complete() callback, call pm_runtime_put().
>>>>
>>>> This means that the device will stay runtime resumed until the
>>>> suspend_noirq phase. It would also means the pm_runtime_force_resume()
>>>> will resume the device at resume_noirq phase. Would that work?
>>> Yes, this will work, but it will unnecessarily wake all instances of
>>> exynos5433 cmu and their power domains during the system suspend/resume
>>> cycle, what I would like to avoid.
>> I guess what you are saying is that it's not at every suspend sequence
>> that the client's needs to runtime resume the exynos5433 cmu device -
>> and in those cases when not  needed, runtime resuming it becomes a
>> waste in regards to both time and energy. Right?
>
> Exactly. The other problem is that clock provider has to ensure that
> calling pm_runtime_get_sync() on its device during NOIRQ phase will
> not fail to let clients to properly prepare+enable their clocks.

Okay, thanks for confirming.

>
>> Anyway, I fully agree that this isn't very nice, but unfortunate
>> that's the best I can think of as a simple solution.
>
>> [...]
>>

Kind regards
Uffe
diff mbox series

Patch

diff --git a/drivers/clk/samsung/clk-exynos5433.c b/drivers/clk/samsung/clk-exynos5433.c
index 4a0f8ff87ca0..7e6c0c9397a9 100644
--- a/drivers/clk/samsung/clk-exynos5433.c
+++ b/drivers/clk/samsung/clk-exynos5433.c
@@ -5623,6 +5623,24 @@  static int __maybe_unused exynos5433_cmu_resume(struct device *dev)
 	return 0;
 }
 
+/*
+ * late_suspend and early_resume callbacks are required to balance
+ * pm_runtime_disable and pm_runtime_enable calls in device_suspend_late
+ * and device_resume_early core functions to keep runtime enabled for
+ * CMU device during noirq sleep phase.
+ */
+static int __maybe_unused exynos5433_late_suspend(struct device *dev)
+{
+	pm_runtime_enable(dev);
+	return 0;
+}
+
+static int __maybe_unused exynos5433_early_resume(struct device *dev)
+{
+	pm_runtime_disable(dev);
+	return 0;
+}
+
 static int __init exynos5433_cmu_probe(struct platform_device *pdev)
 {
 	const struct samsung_cmu_info *info;
@@ -5760,6 +5778,8 @@  static const struct of_device_id exynos5433_cmu_of_match[] = {
 static const struct dev_pm_ops exynos5433_cmu_pm_ops = {
 	SET_RUNTIME_PM_OPS(exynos5433_cmu_suspend, exynos5433_cmu_resume,
 			   NULL)
+	SET_LATE_SYSTEM_SLEEP_PM_OPS(exynos5433_late_suspend,
+				     exynos5433_early_resume)
 	SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,
 				     pm_runtime_force_resume)
 };