Message ID | 2932219.eEdIVn6WWf@aspire.rjw.lan (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
On 8 November 2017 at 14:25, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > Define and document a new driver flag, DPM_FLAG_LEAVE_SUSPENDED, to > instruct the PM core and middle-layer (bus type, PM domain, etc.) > code that it is desirable to leave the device in runtime suspend > after system-wide transitions to the working state (for example, > the device may be slow to resume and it may be better to avoid > resuming it right away). > > Generally, the middle-layer code involved in the handling of the > device is expected to indicate to the PM core whether or not the > device may be left in suspend with the help of the device's > power.may_skip_resume status bit. That has to happen in the "noirq" > phase of the preceding system suspend (or analogous) transition. > The middle layer is then responsible for handling the device as > appropriate in its "noirq" resume callback which is executed > regardless of whether or not the device may be left suspended, but > the other resume callbacks (except for ->complete) will be skipped > automatically by the core if the device really can be left in > suspend. I don't understand the reason to why you need to skip invoking resume callbacks to achieve this behavior, could you elaborate on that? Couldn't the PM domain or the middle-layer instead decide what to do? To me it sounds a bit prone to errors by skipping callbacks from the PM core, and I wonder if the general driver author will be able to understand how to use this flag properly. That said, as the series don't include any changes for drivers making use of the flag, could please fold in such change as it would provide a more complete picture? > > The additional power.must_resume status bit introduced for the > implementation of this mechanisn is used internally by the PM core > to track the requirement to resume the device (which may depend on > its children etc). Yeah, clearly the PM core needs to be involved, because of the need of dealing with parent/child relations, however as kind of indicate above, couldn't the PM core just set some flag/status bits, which instructs the middle-layer and PM domain on what to do? That sounds like an easier approach. > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > --- > Documentation/driver-api/pm/devices.rst | 24 ++++++++++- > drivers/base/power/main.c | 65 +++++++++++++++++++++++++++++--- > include/linux/pm.h | 14 +++++- > 3 files changed, 93 insertions(+), 10 deletions(-) > > Index: linux-pm/include/linux/pm.h > =================================================================== > --- linux-pm.orig/include/linux/pm.h > +++ linux-pm/include/linux/pm.h > @@ -559,6 +559,7 @@ struct pm_subsys_data { > * NEVER_SKIP: Do not skip system suspend/resume callbacks for the device. > * SMART_PREPARE: Check the return value of the driver's ->prepare callback. > * SMART_SUSPEND: No need to resume the device from runtime suspend. > + * LEAVE_SUSPENDED: Avoid resuming the device during system resume if possible. > * > * Setting SMART_PREPARE instructs bus types and PM domains which may want > * system suspend/resume callbacks to be skipped for the device to return 0 from > @@ -572,10 +573,14 @@ struct pm_subsys_data { > * necessary from the driver's perspective. It also may cause them to skip > * invocations of the ->suspend_late and ->suspend_noirq callbacks provided by > * the driver if they decide to leave the device in runtime suspend. > + * > + * Setting LEAVE_SUSPENDED informs the PM core and middle-layer code that the > + * driver prefers the device to be left in runtime suspend after system resume. > */ > -#define DPM_FLAG_NEVER_SKIP BIT(0) > -#define DPM_FLAG_SMART_PREPARE BIT(1) > -#define DPM_FLAG_SMART_SUSPEND BIT(2) > +#define DPM_FLAG_NEVER_SKIP BIT(0) > +#define DPM_FLAG_SMART_PREPARE BIT(1) > +#define DPM_FLAG_SMART_SUSPEND BIT(2) > +#define DPM_FLAG_LEAVE_SUSPENDED BIT(3) > > struct dev_pm_info { > pm_message_t power_state; > @@ -597,6 +602,8 @@ struct dev_pm_info { > bool wakeup_path:1; > bool syscore:1; > bool no_pm_callbacks:1; /* Owned by the PM core */ > + unsigned int must_resume:1; /* Owned by the PM core */ > + unsigned int may_skip_resume:1; /* Set by subsystems */ > #else > unsigned int should_wakeup:1; > #endif > @@ -765,6 +772,7 @@ extern int pm_generic_poweroff_late(stru > extern int pm_generic_poweroff(struct device *dev); > extern void pm_generic_complete(struct device *dev); > > +extern bool dev_pm_may_skip_resume(struct device *dev); > extern bool dev_pm_smart_suspend_and_suspended(struct device *dev); > > #else /* !CONFIG_PM_SLEEP */ > Index: linux-pm/drivers/base/power/main.c > =================================================================== > --- linux-pm.orig/drivers/base/power/main.c > +++ linux-pm/drivers/base/power/main.c > @@ -528,6 +528,18 @@ static void dpm_watchdog_clear(struct dp > /*------------------------- Resume routines -------------------------*/ > > /** > + * dev_pm_may_skip_resume - System-wide device resume optimization check. > + * @dev: Target device. > + * > + * Checks whether or not the device may be left in suspend after a system-wide > + * transition to the working state. > + */ > +bool dev_pm_may_skip_resume(struct device *dev) > +{ > + return !dev->power.must_resume && pm_transition.event != PM_EVENT_RESTORE; > +} > + > +/** > * device_resume_noirq - Execute a "noirq resume" callback for given device. > * @dev: Device to handle. > * @state: PM transition of the system being carried out. > @@ -575,6 +587,12 @@ static int device_resume_noirq(struct de > error = dpm_run_callback(callback, dev, state, info); > dev->power.is_noirq_suspended = false; > > + if (dev_pm_may_skip_resume(dev)) { > + pm_runtime_set_suspended(dev); According to the doc, the DPM_FLAG_LEAVE_SUSPENDED intends to leave the device in runtime suspend state during system resume. However, here you are actually trying to change its runtime PM state to that. Moreover, you should check the return value from pm_runtime_set_suspended(). Then I wonder, what should you do when it fails here? Perhaps a better idea is to do this in the noirq suspend phase, because it allows you to bail out in case pm_runtime_set_suspended() fails. Another option is to leave this to the middle-layer and PM domain, that would make it more flexible and probably also easier for them to deal with the error path. > + dev->power.is_late_suspended = false; > + dev->power.is_suspended = false;pm_runtime_set_suspended( > + } > + > Out: > complete_all(&dev->power.completion); > TRACE_RESUME(error); > @@ -1076,6 +1094,22 @@ static pm_message_t resume_event(pm_mess > return PMSG_ON; > } > > +static void dpm_superior_set_must_resume(struct device *dev) > +{ > + struct device_link *link; > + int idx; > + > + if (dev->parent) > + dev->parent->power.must_resume = true; > + > + idx = device_links_read_lock(); > + > + list_for_each_entry_rcu(link, &dev->links.suppliers, c_node) > + link->supplier->power.must_resume = true; > + > + device_links_read_unlock(idx); > +} > + > /** > * __device_suspend_noirq - Execute a "noirq suspend" callback for given device. > * @dev: Device to handle. > @@ -1127,10 +1161,27 @@ static int __device_suspend_noirq(struct > } > > error = dpm_run_callback(callback, dev, state, info); > - if (!error) > - dev->power.is_noirq_suspended = true; > - else > + if (error) { > async_error = error; > + goto Complete; > + } > + > + dev->power.is_noirq_suspended = true; > + > + if (dev_pm_test_driver_flags(dev, DPM_FLAG_LEAVE_SUSPENDED)) { > + /* > + * The only safe strategy here is to require that if the device > + * may not be left in suspend, resume callbacks must be invoked > + * for it. > + */ > + dev->power.must_resume = dev->power.must_resume || > + !dev->power.may_skip_resume; > + } else { > + dev->power.must_resume = true; > + } > + > + if (dev->power.must_resume) > + dpm_superior_set_must_resume(dev); > > Complete: > complete_all(&dev->power.completion); > @@ -1487,6 +1538,9 @@ static int __device_suspend(struct devic > dev->power.direct_complete = false; > } > > + dev->power.may_skip_resume = false; > + dev->power.must_resume = false; > + > dpm_watchdog_set(&wd, dev); > device_lock(dev); > > @@ -1652,8 +1706,9 @@ static int device_prepare(struct device > if (dev->power.syscore) > return 0; > > - WARN_ON(dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) && > - !pm_runtime_enabled(dev)); > + WARN_ON(!pm_runtime_enabled(dev) && > + dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND | > + DPM_FLAG_LEAVE_SUSPENDED)); > > /* > * If a device's parent goes into runtime suspend at the wrong time, > Index: linux-pm/Documentation/driver-api/pm/devices.rst > =================================================================== > --- linux-pm.orig/Documentation/driver-api/pm/devices.rst > +++ linux-pm/Documentation/driver-api/pm/devices.rst > @@ -788,6 +788,26 @@ must reflect the "active" status for run > > During system-wide resume from a sleep state it's easiest to put devices into > the full-power state, as explained in :file:`Documentation/power/runtime_pm.txt`. > -Refer to that document for more information regarding this particular issue as > +[Refer to that document for more information regarding this particular issue as > well as for information on the device runtime power management framework in > -general. > +general.] > + > +However, it may be desirable to leave some devices in runtime suspend after > +system transitions to the working state and device drivers can use the > +``DPM_FLAG_LEAVE_SUSPENDED`` flag to indicate to the PM core (and middle-layer > +code) that this is the case. Whether or not the devices will actually be left > +in suspend may depend on their state before the given system suspend-resume > +cycle and on the type of the system transition under way. In particular, > +devices are not left suspended if that transition is a restore from hibernation, > +as device states are not guaranteed to be reflected by the information stored in > +the hibernation image in that case. > + > +The middle-layer code involved in the handling of the device has to indicate to > +the PM core if the device may be left in suspend with the help of its > +:c:member:`power.may_skip_resume` status bit. That has to happen in the "noirq" > +phase of the preceding system-wide suspend (or analogous) transition. The > +middle layer is then responsible for handling the device as appropriate in its > +"noirq" resume callback, which is executed regardless of whether or not the > +device may be left suspended, but the other resume callbacks (except for > +``->complete``) will be skipped automatically by the PM core if the device > +really can be left in suspend. > Kind regards Uffe
On Fri, Nov 10, 2017 at 10:09 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote: > On 8 November 2017 at 14:25, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: >> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> >> >> Define and document a new driver flag, DPM_FLAG_LEAVE_SUSPENDED, to >> instruct the PM core and middle-layer (bus type, PM domain, etc.) >> code that it is desirable to leave the device in runtime suspend >> after system-wide transitions to the working state (for example, >> the device may be slow to resume and it may be better to avoid >> resuming it right away). >> >> Generally, the middle-layer code involved in the handling of the >> device is expected to indicate to the PM core whether or not the >> device may be left in suspend with the help of the device's >> power.may_skip_resume status bit. That has to happen in the "noirq" >> phase of the preceding system suspend (or analogous) transition. >> The middle layer is then responsible for handling the device as >> appropriate in its "noirq" resume callback which is executed >> regardless of whether or not the device may be left suspended, but >> the other resume callbacks (except for ->complete) will be skipped >> automatically by the core if the device really can be left in >> suspend. > > I don't understand the reason to why you need to skip invoking resume > callbacks to achieve this behavior, could you elaborate on that? The reason why it is done this way is because that takes less code and is easier (or at least less error-prone, because it avoids repeating patterns in middle layers). Note that the callbacks only may be skipped by the core if the middle layer has set power.skip_resume for the device (or if the core is handling it in patch [5/6], but that's one more step ahead still). > Couldn't the PM domain or the middle-layer instead decide what to do? They still can, the whole thing is a total opt-in. But to be constructive, do you have any specific examples in mind? > To me it sounds a bit prone to errors by skipping callbacks from the > PM core, and I wonder if the general driver author will be able to > understand how to use this flag properly. This has nothing to do with general driver authors and I'm not sure what you mean here and where you are going with this. > That said, as the series don't include any changes for drivers making > use of the flag, could please fold in such change as it would provide > a more complete picture? I've already done so, see https://patchwork.kernel.org/patch/10007349/ IMHO it's not really useful to drag this stuff (which doesn't change BTW) along with every iteration of the core patches. >> >> The additional power.must_resume status bit introduced for the >> implementation of this mechanisn is used internally by the PM core >> to track the requirement to resume the device (which may depend on >> its children etc). > > Yeah, clearly the PM core needs to be involved, because of the need of > dealing with parent/child relations, however as kind of indicate > above, couldn't the PM core just set some flag/status bits, which > instructs the middle-layer and PM domain on what to do? That sounds > like an easier approach. No, it is not easier. And it is backwards. >> >> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> >> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> >> --- >> Documentation/driver-api/pm/devices.rst | 24 ++++++++++- >> drivers/base/power/main.c | 65 +++++++++++++++++++++++++++++--- >> include/linux/pm.h | 14 +++++- >> 3 files changed, 93 insertions(+), 10 deletions(-) >> >> Index: linux-pm/include/linux/pm.h >> =================================================================== >> --- linux-pm.orig/include/linux/pm.h >> +++ linux-pm/include/linux/pm.h >> @@ -559,6 +559,7 @@ struct pm_subsys_data { >> * NEVER_SKIP: Do not skip system suspend/resume callbacks for the device. >> * SMART_PREPARE: Check the return value of the driver's ->prepare callback. >> * SMART_SUSPEND: No need to resume the device from runtime suspend. >> + * LEAVE_SUSPENDED: Avoid resuming the device during system resume if possible. >> * >> * Setting SMART_PREPARE instructs bus types and PM domains which may want >> * system suspend/resume callbacks to be skipped for the device to return 0 from >> @@ -572,10 +573,14 @@ struct pm_subsys_data { >> * necessary from the driver's perspective. It also may cause them to skip >> * invocations of the ->suspend_late and ->suspend_noirq callbacks provided by >> * the driver if they decide to leave the device in runtime suspend. >> + * >> + * Setting LEAVE_SUSPENDED informs the PM core and middle-layer code that the >> + * driver prefers the device to be left in runtime suspend after system resume. >> */ >> -#define DPM_FLAG_NEVER_SKIP BIT(0) >> -#define DPM_FLAG_SMART_PREPARE BIT(1) >> -#define DPM_FLAG_SMART_SUSPEND BIT(2) >> +#define DPM_FLAG_NEVER_SKIP BIT(0) >> +#define DPM_FLAG_SMART_PREPARE BIT(1) >> +#define DPM_FLAG_SMART_SUSPEND BIT(2) >> +#define DPM_FLAG_LEAVE_SUSPENDED BIT(3) >> >> struct dev_pm_info { >> pm_message_t power_state; >> @@ -597,6 +602,8 @@ struct dev_pm_info { >> bool wakeup_path:1; >> bool syscore:1; >> bool no_pm_callbacks:1; /* Owned by the PM core */ >> + unsigned int must_resume:1; /* Owned by the PM core */ >> + unsigned int may_skip_resume:1; /* Set by subsystems */ >> #else >> unsigned int should_wakeup:1; >> #endif >> @@ -765,6 +772,7 @@ extern int pm_generic_poweroff_late(stru >> extern int pm_generic_poweroff(struct device *dev); >> extern void pm_generic_complete(struct device *dev); >> >> +extern bool dev_pm_may_skip_resume(struct device *dev); >> extern bool dev_pm_smart_suspend_and_suspended(struct device *dev); >> >> #else /* !CONFIG_PM_SLEEP */ >> Index: linux-pm/drivers/base/power/main.c >> =================================================================== >> --- linux-pm.orig/drivers/base/power/main.c >> +++ linux-pm/drivers/base/power/main.c >> @@ -528,6 +528,18 @@ static void dpm_watchdog_clear(struct dp >> /*------------------------- Resume routines -------------------------*/ >> >> /** >> + * dev_pm_may_skip_resume - System-wide device resume optimization check. >> + * @dev: Target device. >> + * >> + * Checks whether or not the device may be left in suspend after a system-wide >> + * transition to the working state. >> + */ >> +bool dev_pm_may_skip_resume(struct device *dev) >> +{ >> + return !dev->power.must_resume && pm_transition.event != PM_EVENT_RESTORE; >> +} >> + >> +/** >> * device_resume_noirq - Execute a "noirq resume" callback for given device. >> * @dev: Device to handle. >> * @state: PM transition of the system being carried out. >> @@ -575,6 +587,12 @@ static int device_resume_noirq(struct de >> error = dpm_run_callback(callback, dev, state, info); >> dev->power.is_noirq_suspended = false; >> >> + if (dev_pm_may_skip_resume(dev)) { >> + pm_runtime_set_suspended(dev); > > According to the doc, the DPM_FLAG_LEAVE_SUSPENDED intends to leave > the device in runtime suspend state during system resume. > However, here you are actually trying to change its runtime PM state to that. So the doc needs to be fixed. :-) But I'm guessing that this just is a misunderstanding and you mean the phrase "it may be desirable to leave some devices in runtime suspend after [...]". Yes, it is talking about "runtime suspend", but actually "runtime suspend" is the only kind of "suspend" you can leave a device in after a system transition to the working state. It never says that the device must have been suspended before the preceding system transition into a sleep state started. > Moreover, you should check the return value from > pm_runtime_set_suspended(). This is in "noirq", so failures of that are meaningless here. > Then I wonder, what should you do when it fails here? > > Perhaps a better idea is to do this in the noirq suspend phase, > because it allows you to bail out in case pm_runtime_set_suspended() > fails. This doesn't make sense, sorry. > Another option is to leave this to the middle-layer and PM domain, > that would make it more flexible and probably also easier for them to > deal with the error path. So the middle layer doesn't have to set power.skip_resume. Just don't set it if you don't like the default handling, but yes, you will affect others this way. Thanks, Rafael
On Sat, Nov 11, 2017 at 12:45 AM, Rafael J. Wysocki <rafael@kernel.org> wrote: > On Fri, Nov 10, 2017 at 10:09 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote: >> On 8 November 2017 at 14:25, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: >>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> >>> >>> Define and document a new driver flag, DPM_FLAG_LEAVE_SUSPENDED, to >>> instruct the PM core and middle-layer (bus type, PM domain, etc.) >>> code that it is desirable to leave the device in runtime suspend >>> after system-wide transitions to the working state (for example, >>> the device may be slow to resume and it may be better to avoid >>> resuming it right away). >>> >>> Generally, the middle-layer code involved in the handling of the >>> device is expected to indicate to the PM core whether or not the >>> device may be left in suspend with the help of the device's >>> power.may_skip_resume status bit. That has to happen in the "noirq" >>> phase of the preceding system suspend (or analogous) transition. >>> The middle layer is then responsible for handling the device as >>> appropriate in its "noirq" resume callback which is executed >>> regardless of whether or not the device may be left suspended, but >>> the other resume callbacks (except for ->complete) will be skipped >>> automatically by the core if the device really can be left in >>> suspend. >> >> I don't understand the reason to why you need to skip invoking resume >> callbacks to achieve this behavior, could you elaborate on that? > > The reason why it is done this way is because that takes less code and > is easier (or at least less error-prone, because it avoids repeating > patterns in middle layers). Actually, it also is a matter of correctness, at least to some extent. Namely, if the parent or any supplier of the device has power.must_resume clear in dpm_noirq_resume_devices(), then the device should not be touched during the whole system resume transition (because the access may very well go through the suspended parent or supplier) and the most straightforward way to make that happen is to avoid running the code that may touch the device. [Arguably, if middle layers were made responsible for handling that, they would need to do pretty much the same thing and so there is no reason for not doing it in the core.] Allowing the "noirq" callback from middle layers to run in that case is a stretch already, but since genpd needs that, well, tough nuggets. All of that said, if there is a middle layer wanting to set power.skip_resume and needing to do something different for the resume callbacks, then this piece can be moved from the core to the middle layers at any time later. So far there's none, though. At least not in this patch series. Thanks, Rafael
On Sat, Nov 11, 2017 at 12:45 AM, Rafael J. Wysocki <rafael@kernel.org> wrote: > On Fri, Nov 10, 2017 at 10:09 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote: >> On 8 November 2017 at 14:25, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: [cut] >> Moreover, you should check the return value from >> pm_runtime_set_suspended(). > > This is in "noirq", so failures of that are meaningless here. They *should* be meaningless, but __pm_runtime_set_status() is sort of buggy and checks child_count regardless of whether or not runtime PM is enabled for the children (but when changing the status to "active" it actually checks if runtime PM is enabled for the parent before returning -EBUSY, so it is not event consistent internally). Oh well. >> Then I wonder, what should you do when it fails here? >> >> Perhaps a better idea is to do this in the noirq suspend phase, >> because it allows you to bail out in case pm_runtime_set_suspended() >> fails. > > This doesn't make sense, sorry. Not for the above reason, but that would allow the bug in __pm_runtime_set_status() to be sort of worked around by setting the status to "suspended" for children before doing that for their parents. Moreover, stuff with nonzero usage_counts cannot be left in suspend regardless. Thanks, Rafael
On 11 November 2017 at 00:45, Rafael J. Wysocki <rafael@kernel.org> wrote: > On Fri, Nov 10, 2017 at 10:09 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote: >> On 8 November 2017 at 14:25, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: >>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> >>> >>> Define and document a new driver flag, DPM_FLAG_LEAVE_SUSPENDED, to >>> instruct the PM core and middle-layer (bus type, PM domain, etc.) >>> code that it is desirable to leave the device in runtime suspend >>> after system-wide transitions to the working state (for example, >>> the device may be slow to resume and it may be better to avoid >>> resuming it right away). >>> >>> Generally, the middle-layer code involved in the handling of the >>> device is expected to indicate to the PM core whether or not the >>> device may be left in suspend with the help of the device's >>> power.may_skip_resume status bit. That has to happen in the "noirq" >>> phase of the preceding system suspend (or analogous) transition. >>> The middle layer is then responsible for handling the device as >>> appropriate in its "noirq" resume callback which is executed >>> regardless of whether or not the device may be left suspended, but >>> the other resume callbacks (except for ->complete) will be skipped >>> automatically by the core if the device really can be left in >>> suspend. >> >> I don't understand the reason to why you need to skip invoking resume >> callbacks to achieve this behavior, could you elaborate on that? > > The reason why it is done this way is because that takes less code and > is easier (or at least less error-prone, because it avoids repeating > patterns in middle layers). > > Note that the callbacks only may be skipped by the core if the middle > layer has set power.skip_resume for the device (or if the core is > handling it in patch [5/6], but that's one more step ahead still). > >> Couldn't the PM domain or the middle-layer instead decide what to do? > > They still can, the whole thing is a total opt-in. > > But to be constructive, do you have any specific examples in mind? See more below. > >> To me it sounds a bit prone to errors by skipping callbacks from the >> PM core, and I wonder if the general driver author will be able to >> understand how to use this flag properly. > > This has nothing to do with general driver authors and I'm not sure > what you mean here and where you are going with this. Let me elaborate. My general goal is that I want to make it easier (or as easy as possible) for the general driver author to deploy runtime PM and system-wide PM support - in an optimized manner. Therefore, I am pondering over the solution you picked in this series, trying to understand how it fits into those aspects. Particular I am a bit worried from a complexity point of view, about the part with skipping callbacks from the PM core. We have observed some difficulties with the direct_complete path (i2c dw driver), which is based on a similar approach as this one. Additionally, in this case, to trigger skipping of callbacks to happen, first, drivers needs to inform the middle-layer, second, the middle layer acts on that information and then informs the PM core, then in the third step, the PM core can decide what to do. It doesn't sound straight-forward. I guess I need to be convinced that this new approach is going to be better than the the direct_complete path, so it somehow can replace it along the road. Otherwise, we may end up just having yet another way of skipping callbacks in the PM core and I don't like that. Of course, I also realize this hole thing is opt-in, so nothing will break and we are all good. :-) > >> That said, as the series don't include any changes for drivers making >> use of the flag, could please fold in such change as it would provide >> a more complete picture? > > I've already done so, see https://patchwork.kernel.org/patch/10007349/ > > IMHO it's not really useful to drag this stuff (which doesn't change > BTW) along with every iteration of the core patches. Well, to me it's useful because it shows how these flags can/will be used. Anyway, I thought you scraped that patch and was working on a new version. I will have a look then. [...] >>> * device_resume_noirq - Execute a "noirq resume" callback for given device. >>> * @dev: Device to handle. >>> * @state: PM transition of the system being carried out. >>> @@ -575,6 +587,12 @@ static int device_resume_noirq(struct de >>> error = dpm_run_callback(callback, dev, state, info); >>> dev->power.is_noirq_suspended = false; >>> >>> + if (dev_pm_may_skip_resume(dev)) { >>> + pm_runtime_set_suspended(dev); >> >> According to the doc, the DPM_FLAG_LEAVE_SUSPENDED intends to leave >> the device in runtime suspend state during system resume. >> However, here you are actually trying to change its runtime PM state to that. > > So the doc needs to be fixed. :-) Yep. > > But I'm guessing that this just is a misunderstanding and you mean the > phrase "it may be desirable to leave some devices in runtime suspend > after [...]". Yes, it is talking about "runtime suspend", but > actually "runtime suspend" is the only kind of "suspend" you can leave > a device in after a system transition to the working state. It never > says that the device must have been suspended before the preceding > system transition into a sleep state started. My point is, it's isn't obvious why you need to make sure the device's runtime PM status is set to "RPM_SUSPENDED" when leaving the resume noirq phase. You did explain that somewhat above, thanks! Perhaps you could fold in some of that information into the doc as well? > >> Moreover, you should check the return value from >> pm_runtime_set_suspended(). > > This is in "noirq", so failures of that are meaningless here. > >> Then I wonder, what should you do when it fails here? >> >> Perhaps a better idea is to do this in the noirq suspend phase, >> because it allows you to bail out in case pm_runtime_set_suspended() >> fails. > > This doesn't make sense, sorry. What do you mean by "failures of that are meaningless here."? I was suggesting, instead of calling pm_runtime_set_suspended() in the noirq *resume* phase, why can't you do that in the noirq *suspend* phase? In the noirq *suspend* phase it's not too late to deal with errors!? Or is it? [...] Kind regards Uffe
On Tuesday, November 14, 2017 5:07:59 PM CET Ulf Hansson wrote: > On 11 November 2017 at 00:45, Rafael J. Wysocki <rafael@kernel.org> wrote: > > On Fri, Nov 10, 2017 at 10:09 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote: > >> On 8 November 2017 at 14:25, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > >>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > >>> > >>> Define and document a new driver flag, DPM_FLAG_LEAVE_SUSPENDED, to > >>> instruct the PM core and middle-layer (bus type, PM domain, etc.) > >>> code that it is desirable to leave the device in runtime suspend > >>> after system-wide transitions to the working state (for example, > >>> the device may be slow to resume and it may be better to avoid > >>> resuming it right away). > >>> > >>> Generally, the middle-layer code involved in the handling of the > >>> device is expected to indicate to the PM core whether or not the > >>> device may be left in suspend with the help of the device's > >>> power.may_skip_resume status bit. That has to happen in the "noirq" > >>> phase of the preceding system suspend (or analogous) transition. > >>> The middle layer is then responsible for handling the device as > >>> appropriate in its "noirq" resume callback which is executed > >>> regardless of whether or not the device may be left suspended, but > >>> the other resume callbacks (except for ->complete) will be skipped > >>> automatically by the core if the device really can be left in > >>> suspend. > >> > >> I don't understand the reason to why you need to skip invoking resume > >> callbacks to achieve this behavior, could you elaborate on that? > > > > The reason why it is done this way is because that takes less code and > > is easier (or at least less error-prone, because it avoids repeating > > patterns in middle layers). > > > > Note that the callbacks only may be skipped by the core if the middle > > layer has set power.skip_resume for the device (or if the core is > > handling it in patch [5/6], but that's one more step ahead still). > > > >> Couldn't the PM domain or the middle-layer instead decide what to do? > > > > They still can, the whole thing is a total opt-in. > > > > But to be constructive, do you have any specific examples in mind? > > See more below. > > > > >> To me it sounds a bit prone to errors by skipping callbacks from the > >> PM core, and I wonder if the general driver author will be able to > >> understand how to use this flag properly. > > > > This has nothing to do with general driver authors and I'm not sure > > what you mean here and where you are going with this. > > Let me elaborate. > > My general goal is that I want to make it easier (or as easy as > possible) for the general driver author to deploy runtime PM and > system-wide PM support - in an optimized manner. Therefore, I am > pondering over the solution you picked in this series, trying to > understand how it fits into those aspects. > > Particular I am a bit worried from a complexity point of view, about > the part with skipping callbacks from the PM core. We have observed > some difficulties with the direct_complete path (i2c dw driver), which > is based on a similar approach as this one. These are resume callbacks, not suspend callbacks. Also not all of them are skipped. That is quite a bit different from skipping *all* callbacks. Moreover, at the point the core decides to skip the callbacks, the device *has* *to* be left suspended and there simply is no point in running them no matter what. That part of code can be trivially moved to middle layers, but then each of them will have to do exactly the same thing. I don't see any reason to do that and I'm not finding one in your comments. Sorry. > Additionally, in this case, to trigger skipping of callbacks to > happen, first, drivers needs to inform the middle-layer, second, the > middle layer acts on that information and then informs the PM core, > then in the third step, the PM core can decide what to do. It doesn't > sound straight-forward. It really doesn't work like that. First, the driver sets the LEAVE_SUSPENDED flag for the core to consume. The middle layers don't have to look at it at all. Second, each middle layer sets power.may_skip_resume for devices whose state after system suspend should match the runtime suspend state. The middle layer must know that this is the case to set that bit. [The core effectively does that part for devices handled by it directly in patch [5/6].] The core then takes the LEAVE_SUSPENDED flags, power.may_skip_resume bits, status of the children and consumers into account in order to produce the power.must_resume bits and those are used (later) to decide whether or not to resume the devices. That decision is made by the core and so the core acts on it and the middle layers must follow. > I guess I need to be convinced that this new approach is going to be > better than the the direct_complete path, so it somehow can replace it > along the road. Otherwise, we may end up just having yet another way > of skipping callbacks in the PM core and I don't like that. Well, this works the other way around this time, I'm afraid. At this point you need to convince me that the approach has real issues. :-) > Of course, I also realize this hole thing is opt-in, so nothing will > break and we are all good. :-) > > > > >> That said, as the series don't include any changes for drivers making > >> use of the flag, could please fold in such change as it would provide > >> a more complete picture? > > > > I've already done so, see https://patchwork.kernel.org/patch/10007349/ > > > > IMHO it's not really useful to drag this stuff (which doesn't change > > BTW) along with every iteration of the core patches. > > Well, to me it's useful because it shows how these flags can/will be used. > > Anyway, I thought you scraped that patch and was working on a new > version. I will have a look then. > > [...] > > >>> * device_resume_noirq - Execute a "noirq resume" callback for given device. > >>> * @dev: Device to handle. > >>> * @state: PM transition of the system being carried out. > >>> @@ -575,6 +587,12 @@ static int device_resume_noirq(struct de > >>> error = dpm_run_callback(callback, dev, state, info); > >>> dev->power.is_noirq_suspended = false; > >>> > >>> + if (dev_pm_may_skip_resume(dev)) { > >>> + pm_runtime_set_suspended(dev); > >> > >> According to the doc, the DPM_FLAG_LEAVE_SUSPENDED intends to leave > >> the device in runtime suspend state during system resume. > >> However, here you are actually trying to change its runtime PM state to that. > > > > So the doc needs to be fixed. :-) > > Yep. > > > > > But I'm guessing that this just is a misunderstanding and you mean the > > phrase "it may be desirable to leave some devices in runtime suspend > > after [...]". Yes, it is talking about "runtime suspend", but > > actually "runtime suspend" is the only kind of "suspend" you can leave > > a device in after a system transition to the working state. It never > > says that the device must have been suspended before the preceding > > system transition into a sleep state started. > > My point is, it's isn't obvious why you need to make sure the device's > runtime PM status is set to "RPM_SUSPENDED" when leaving the resume > noirq phase. You did explain that somewhat above, thanks! > > Perhaps you could fold in some of that information into the doc as well? The doc doesn't describe the design of the code, though. I guess I'll just add a comment at the point where the status changes. > > > >> Moreover, you should check the return value from > >> pm_runtime_set_suspended(). > > > > This is in "noirq", so failures of that are meaningless here. > > > >> Then I wonder, what should you do when it fails here? > >> > >> Perhaps a better idea is to do this in the noirq suspend phase, > >> because it allows you to bail out in case pm_runtime_set_suspended() > >> fails. > > > > This doesn't make sense, sorry. > > What do you mean by "failures of that are meaningless here."? If all devices have runtime PM disabled, pm_runtime_set_suspended() should just do what it is asked for unless called with an invalid argument or similar. > I was suggesting, instead of calling pm_runtime_set_suspended() in the > noirq *resume* phase, why can't you do that in the noirq *suspend* > phase? > > In the noirq *suspend* phase it's not too late to deal with errors!? Or is it? At that point it has not been decided whether or not the devices will stay suspended yet. The status cannot be changed before making that decision, which only happens in the noirq resume phase. Thanks, Rafael
[...] >> >> My general goal is that I want to make it easier (or as easy as >> possible) for the general driver author to deploy runtime PM and >> system-wide PM support - in an optimized manner. Therefore, I am >> pondering over the solution you picked in this series, trying to >> understand how it fits into those aspects. >> >> Particular I am a bit worried from a complexity point of view, about >> the part with skipping callbacks from the PM core. We have observed >> some difficulties with the direct_complete path (i2c dw driver), which >> is based on a similar approach as this one. > > These are resume callbacks, not suspend callbacks. Also not all of them > are skipped. That is quite a bit different from skipping *all* callbacks. > > Moreover, at the point the core decides to skip the callbacks, the device > *has* *to* be left suspended and there simply is no point in running them > no matter what. > > That part of code can be trivially moved to middle layers, but then each > of them will have to do exactly the same thing. I don't see any reason to > do that and I'm not finding one in your comments. Sorry. I think my concerns boils done to that I am wondering how useful it will be, in general, to enable the core to skip invoking resume callbacks. Although, if you are targeting some specific devices/drivers (ACPI, PCI, etc), and not care that much about flexibility, then I am fine with it. The approach seems to work. Let me elaborate on that comment a bit. 1) Skipping resume callbacks is not going to work for a device that may be attached to the generic PM domain. Well, in principle one could try to re-work genpd to cope with this behavior, I guess, but that would also mean genpd becomes limited to always use the noirq callbacks to power on/off the PM domain. That isn't an acceptable limitation. 2) Because of 1) This leads to those cross SoC drivers, dealing with devices which sometimes may have a genpd attached and sometimes an ACPI PM domain attached. I guess those drivers would need to have a different set of system-wide PM callbacks, depending on the PM domain the device is attached to, as to achieve a similar optimized behavior during system resume. Or some other cleverness to deal with system-wide PM. Perhaps we can ignore both 1) and 2), because the number of cross SoC drivers having these issues should be rather limited!? 3) There are certainly lots of drivers that can cope with its device remaining in runtime suspend, during system resume. Although, some of these drivers may have some additional operations to carry out during resume, which may not require to resume (activate) its device. For example the driver may need to resume a queue, re-configure an out-of-band wakeup (GPIO IRQ), re-configure pinctrls, etc. These drivers can't use the method behind LEAVE_SUSPENDED, because they need their resume callbacks to be invoked. [...] > > Well, this works the other way around this time, I'm afraid. At this point > you need to convince me that the approach has real issues. :-) I think I have pointed out some issues above. Feel free to ignore them, depending on what your target is. [...] >> >> Perhaps a better idea is to do this in the noirq suspend phase, >> >> because it allows you to bail out in case pm_runtime_set_suspended() >> >> fails. >> > >> > This doesn't make sense, sorry. >> >> What do you mean by "failures of that are meaningless here."? > > If all devices have runtime PM disabled, pm_runtime_set_suspended() should > just do what it is asked for unless called with an invalid argument or > similar. Yes. > >> I was suggesting, instead of calling pm_runtime_set_suspended() in the >> noirq *resume* phase, why can't you do that in the noirq *suspend* >> phase? >> >> In the noirq *suspend* phase it's not too late to deal with errors!? Or is it? > > At that point it has not been decided whether or not the devices will stay > suspended yet. The status cannot be changed before making that decision, > which only happens in the noirq resume phase. Okay, then it's fine as is (because of your other patch to the runtime PM core, which changes the behavior for pm_runtime_set_suspended()). BTW, I have some additional minor comments to some other parts of the code, but I will start over with a new thread proving you with those comments. Kind regards Uffe
Index: linux-pm/include/linux/pm.h =================================================================== --- linux-pm.orig/include/linux/pm.h +++ linux-pm/include/linux/pm.h @@ -559,6 +559,7 @@ struct pm_subsys_data { * NEVER_SKIP: Do not skip system suspend/resume callbacks for the device. * SMART_PREPARE: Check the return value of the driver's ->prepare callback. * SMART_SUSPEND: No need to resume the device from runtime suspend. + * LEAVE_SUSPENDED: Avoid resuming the device during system resume if possible. * * Setting SMART_PREPARE instructs bus types and PM domains which may want * system suspend/resume callbacks to be skipped for the device to return 0 from @@ -572,10 +573,14 @@ struct pm_subsys_data { * necessary from the driver's perspective. It also may cause them to skip * invocations of the ->suspend_late and ->suspend_noirq callbacks provided by * the driver if they decide to leave the device in runtime suspend. + * + * Setting LEAVE_SUSPENDED informs the PM core and middle-layer code that the + * driver prefers the device to be left in runtime suspend after system resume. */ -#define DPM_FLAG_NEVER_SKIP BIT(0) -#define DPM_FLAG_SMART_PREPARE BIT(1) -#define DPM_FLAG_SMART_SUSPEND BIT(2) +#define DPM_FLAG_NEVER_SKIP BIT(0) +#define DPM_FLAG_SMART_PREPARE BIT(1) +#define DPM_FLAG_SMART_SUSPEND BIT(2) +#define DPM_FLAG_LEAVE_SUSPENDED BIT(3) struct dev_pm_info { pm_message_t power_state; @@ -597,6 +602,8 @@ struct dev_pm_info { bool wakeup_path:1; bool syscore:1; bool no_pm_callbacks:1; /* Owned by the PM core */ + unsigned int must_resume:1; /* Owned by the PM core */ + unsigned int may_skip_resume:1; /* Set by subsystems */ #else unsigned int should_wakeup:1; #endif @@ -765,6 +772,7 @@ extern int pm_generic_poweroff_late(stru extern int pm_generic_poweroff(struct device *dev); extern void pm_generic_complete(struct device *dev); +extern bool dev_pm_may_skip_resume(struct device *dev); extern bool dev_pm_smart_suspend_and_suspended(struct device *dev); #else /* !CONFIG_PM_SLEEP */ Index: linux-pm/drivers/base/power/main.c =================================================================== --- linux-pm.orig/drivers/base/power/main.c +++ linux-pm/drivers/base/power/main.c @@ -528,6 +528,18 @@ static void dpm_watchdog_clear(struct dp /*------------------------- Resume routines -------------------------*/ /** + * dev_pm_may_skip_resume - System-wide device resume optimization check. + * @dev: Target device. + * + * Checks whether or not the device may be left in suspend after a system-wide + * transition to the working state. + */ +bool dev_pm_may_skip_resume(struct device *dev) +{ + return !dev->power.must_resume && pm_transition.event != PM_EVENT_RESTORE; +} + +/** * device_resume_noirq - Execute a "noirq resume" callback for given device. * @dev: Device to handle. * @state: PM transition of the system being carried out. @@ -575,6 +587,12 @@ static int device_resume_noirq(struct de error = dpm_run_callback(callback, dev, state, info); dev->power.is_noirq_suspended = false; + if (dev_pm_may_skip_resume(dev)) { + pm_runtime_set_suspended(dev); + dev->power.is_late_suspended = false; + dev->power.is_suspended = false; + } + Out: complete_all(&dev->power.completion); TRACE_RESUME(error); @@ -1076,6 +1094,22 @@ static pm_message_t resume_event(pm_mess return PMSG_ON; } +static void dpm_superior_set_must_resume(struct device *dev) +{ + struct device_link *link; + int idx; + + if (dev->parent) + dev->parent->power.must_resume = true; + + idx = device_links_read_lock(); + + list_for_each_entry_rcu(link, &dev->links.suppliers, c_node) + link->supplier->power.must_resume = true; + + device_links_read_unlock(idx); +} + /** * __device_suspend_noirq - Execute a "noirq suspend" callback for given device. * @dev: Device to handle. @@ -1127,10 +1161,27 @@ static int __device_suspend_noirq(struct } error = dpm_run_callback(callback, dev, state, info); - if (!error) - dev->power.is_noirq_suspended = true; - else + if (error) { async_error = error; + goto Complete; + } + + dev->power.is_noirq_suspended = true; + + if (dev_pm_test_driver_flags(dev, DPM_FLAG_LEAVE_SUSPENDED)) { + /* + * The only safe strategy here is to require that if the device + * may not be left in suspend, resume callbacks must be invoked + * for it. + */ + dev->power.must_resume = dev->power.must_resume || + !dev->power.may_skip_resume; + } else { + dev->power.must_resume = true; + } + + if (dev->power.must_resume) + dpm_superior_set_must_resume(dev); Complete: complete_all(&dev->power.completion); @@ -1487,6 +1538,9 @@ static int __device_suspend(struct devic dev->power.direct_complete = false; } + dev->power.may_skip_resume = false; + dev->power.must_resume = false; + dpm_watchdog_set(&wd, dev); device_lock(dev); @@ -1652,8 +1706,9 @@ static int device_prepare(struct device if (dev->power.syscore) return 0; - WARN_ON(dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) && - !pm_runtime_enabled(dev)); + WARN_ON(!pm_runtime_enabled(dev) && + dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND | + DPM_FLAG_LEAVE_SUSPENDED)); /* * If a device's parent goes into runtime suspend at the wrong time, Index: linux-pm/Documentation/driver-api/pm/devices.rst =================================================================== --- linux-pm.orig/Documentation/driver-api/pm/devices.rst +++ linux-pm/Documentation/driver-api/pm/devices.rst @@ -788,6 +788,26 @@ must reflect the "active" status for run During system-wide resume from a sleep state it's easiest to put devices into the full-power state, as explained in :file:`Documentation/power/runtime_pm.txt`. -Refer to that document for more information regarding this particular issue as +[Refer to that document for more information regarding this particular issue as well as for information on the device runtime power management framework in -general. +general.] + +However, it may be desirable to leave some devices in runtime suspend after +system transitions to the working state and device drivers can use the +``DPM_FLAG_LEAVE_SUSPENDED`` flag to indicate to the PM core (and middle-layer +code) that this is the case. Whether or not the devices will actually be left +in suspend may depend on their state before the given system suspend-resume +cycle and on the type of the system transition under way. In particular, +devices are not left suspended if that transition is a restore from hibernation, +as device states are not guaranteed to be reflected by the information stored in +the hibernation image in that case. + +The middle-layer code involved in the handling of the device has to indicate to +the PM core if the device may be left in suspend with the help of its +:c:member:`power.may_skip_resume` status bit. That has to happen in the "noirq" +phase of the preceding system-wide suspend (or analogous) transition. The +middle layer is then responsible for handling the device as appropriate in its +"noirq" resume callback, which is executed regardless of whether or not the +device may be left suspended, but the other resume callbacks (except for +``->complete``) will be skipped automatically by the PM core if the device +really can be left in suspend.