Message ID | 1849600.vI30ccvhCD@aspire.rjw.lan (mailing list archive) |
---|---|
State | Mainlined |
Delegated to: | Rafael Wysocki |
Headers | show |
On 20 September 2017 at 02:26, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > It quite often is necessary to resume devices from runtime suspend > during system suspend for various reasons (for example, if their > wakeup settings need to be changed), but that requires middle-layer > or subsystem code to follow additional rules which currently are not > clearly documented. > > Namely, if a driver calls pm_runtime_resume() for the device from > its ->suspend (or equivalent) system sleep callback, that may not > work if the middle layer above it has updated the state of the > device from its ->prepare or ->suspend callbacks already in an > incompatible way. For this reason, all middle layers must follow > the rule that, until the ->suspend callback provided by the device's > driver is invoked, the only way in which the device's state can be > updated is by calling pm_runtime_resume() for it, if necessary. > Fortunately enough, all middle layers in the code base today follow > this rule, but it is not explicitly stated anywhere, so do that. > > Note that calling pm_runtime_resume() from the ->suspend callback > of a driver will cause the ->runtime_resume callback provided by the > middle layer to be invoked, but the rule above guarantees that this > callback will nest properly with the middle layer's ->suspend > callback and it will play well with the ->prepare one invoked before. > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Still, some comments below.. > --- > > This is a follow-up for the system suspend callbacks discussion during > the Power Management and Energy-Awareness session at the LPC last week. > > In particular, I have been thinking about using pm_runtime_resume() from > driver ->suspend callbacks and it actually appears to be quite defendable to > me as long as it is guaranteed that middle layers will not mess up with the > device state before the driver's ->suspend callback is invoked. If that > is the case, and I *think* that it currently is the case for all of the > middle layers in the tree unless I overlook something (USB anyone?), the > middle layer callbacks involved (->prepare, ->suspend and ->runtime_resume) > should actually nest properly and there should not be problems with that. > So, my proposal here would be to simply go ahead and document the rules > regarding this case without modifying the code. > > At the same time, I see at least two general problems with calling > pm_runtime_force_suspend() from the ->suspend callbacks of device drivers > (unless the middle layers involved are trivial). > > First, note that the middle layer callbacks involved in that case are > ->prepare, ->suspend, ->runtime_suspend (called indirectly from within the > former) and then *also* ->suspend_late and ->suspend_noirq, because the PM > core will call the last two from the middle layer as it has no information > that pm_runtime_force_suspend() was called for the device in the "suspend" > phase. Of course, in general, what the middle layer ->suspend_late and > ->suspend_noirq do is not guaranteed to play well with what its > ->runtime_suspend does even if ->suspend itself is OK (but for ->runtime_resume > all of that actually works, because the state it leaves the device in should > be compatible with the system suspend callbacks invoked in the later phases). > > Second, leaving devices in runtime suspend in the "suspend" phase of system > suspend is fishy even when their runtime PM is disabled, because that doesn't > guarantee anything regarding their children or possible consumers. Runtime > PM may still be enabled for those devices at that time and runtime resume may > be triggered for them later, in which case it all quickly falls apart. This is true, although to me this is a about a different problem and has very little to do with pm_runtime_force_suspend(). More precisely, whether runtime PM becomes disabled in the suspend phase or suspend_late phase, really doesn't matter. Because in the end this is about suspending/resuming devices in the correct order. > > IOW, there are reasons why the PM core bumps up the runtime PM usage counters > of all devices during system suspend and they also apply to runtime suspend > callbacks being invoked directly with runtime PM disabled for the given device. > Frankly, it generally is only safe to leave a device in runtime suspend during > system suspend if it can be guarateed that the system suspend callbacks in the > subsequent suspend phases will not be invoked for it at all. I understand this is perfectly true for some of the non-trivial middle layers, however just to be clear, this statement don't have to serve as a general rule for all cases, right? Moreover, bumping the runtime PM usage count (pm_runtime_get_noresume()) in the device_prepare/suspend() phase, was originally added to prevent the runtime PM core from runtime suspending a device, in cases when runtime PM has been enabled for it. Preventing the ->runtime_suspend() callback from being invoked when runtime PM is disabled, just doesn't make any sense to me. > > Thanks, > Rafael > > --- > Documentation/driver-api/pm/devices.rst | 25 ++++++++++++++++++++++++- > 1 file changed, 24 insertions(+), 1 deletion(-) > > Index: linux-pm/Documentation/driver-api/pm/devices.rst > =================================================================== > --- linux-pm.orig/Documentation/driver-api/pm/devices.rst > +++ linux-pm/Documentation/driver-api/pm/devices.rst > @@ -328,7 +328,10 @@ the phases are: ``prepare``, ``suspend`` > After the ``->prepare`` callback method returns, no new children may be > registered below the device. The method may also prepare the device or > driver in some way for the upcoming system power transition, but it > - should not put the device into a low-power state. > + should not put the device into a low-power state. Moreover, if the > + device supports runtime power management, the ``->prepare`` callback > + method must not update its state in case it is necessary to resume it > + from runtime suspend later on. > > For devices supporting runtime power management, the return value of the > prepare callback can be used to indicate to the PM core that it may > @@ -356,6 +359,16 @@ the phases are: ``prepare``, ``suspend`` > the appropriate low-power state, depending on the bus type the device is > on, and they may enable wakeup events. > > + However, for devices supporting runtime power management, the > + ``->suspend`` methods provided by subsystems (bus types and PM domains > + in particular) must follow an additional rule regarding what can be done > + to the devices before their drivers' ``->suspend`` methods are called. > + Namely, they can only resume the devices from runtime suspend by > + calling :c:func:`pm_runtime_resume` for them, if that is necessary, and > + they must not update the state of the devices in any other way at that > + time (in case the drivers need to resume the devices from runtime > + suspend in their ``->suspend`` methods). > + > 3. For a number of devices it is convenient to split suspend into the > "quiesce device" and "save device state" phases, in which cases > ``suspend_late`` is meant to do the latter. It is always executed after > @@ -729,6 +742,16 @@ state temporarily, for example so that i > disabled. This all depends on the hardware and the design of the subsystem and > device driver in question. > > +If it is necessary to resume a device from runtime suspend during a system-wide > +transition into a sleep state, that can be done by calling > +:c:func:`pm_runtime_resume` for it from the ``->suspend`` callback (or its > +couterpart for transitions related to hibernation) of either the device's driver > +or a subsystem responsible for it (for example, a bus type or a PM domain). > +That is guaranteed to work by the requirement that subsystems must not change > +the state of devices (possibly except for resuming them from runtime suspend) > +from their ``->prepare`` and ``->suspend`` callbacks (or equivalent) *before* > +invoking device drivers' ``->suspend`` callbacks (or equivalent). > + > During system-wide resume from a sleep state it's easiest to put devices into > the full-power state, as explained in :file:`Documentation/power/runtime_pm.txt`. > Refer to that document for more information regarding this particular issue as > Kind regards Uffe
On Wed, Sep 20, 2017 at 2:28 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote: > On 20 September 2017 at 02:26, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: >> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> >> >> It quite often is necessary to resume devices from runtime suspend >> during system suspend for various reasons (for example, if their >> wakeup settings need to be changed), but that requires middle-layer >> or subsystem code to follow additional rules which currently are not >> clearly documented. >> >> Namely, if a driver calls pm_runtime_resume() for the device from >> its ->suspend (or equivalent) system sleep callback, that may not >> work if the middle layer above it has updated the state of the >> device from its ->prepare or ->suspend callbacks already in an >> incompatible way. For this reason, all middle layers must follow >> the rule that, until the ->suspend callback provided by the device's >> driver is invoked, the only way in which the device's state can be >> updated is by calling pm_runtime_resume() for it, if necessary. >> Fortunately enough, all middle layers in the code base today follow >> this rule, but it is not explicitly stated anywhere, so do that. >> >> Note that calling pm_runtime_resume() from the ->suspend callback >> of a driver will cause the ->runtime_resume callback provided by the >> middle layer to be invoked, but the rule above guarantees that this >> callback will nest properly with the middle layer's ->suspend >> callback and it will play well with the ->prepare one invoked before. >> >> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Thanks! > Still, some comments below.. > >> --- >> >> This is a follow-up for the system suspend callbacks discussion during >> the Power Management and Energy-Awareness session at the LPC last week. >> >> In particular, I have been thinking about using pm_runtime_resume() from >> driver ->suspend callbacks and it actually appears to be quite defendable to >> me as long as it is guaranteed that middle layers will not mess up with the >> device state before the driver's ->suspend callback is invoked. If that >> is the case, and I *think* that it currently is the case for all of the >> middle layers in the tree unless I overlook something (USB anyone?), the >> middle layer callbacks involved (->prepare, ->suspend and ->runtime_resume) >> should actually nest properly and there should not be problems with that. >> So, my proposal here would be to simply go ahead and document the rules >> regarding this case without modifying the code. >> >> At the same time, I see at least two general problems with calling >> pm_runtime_force_suspend() from the ->suspend callbacks of device drivers >> (unless the middle layers involved are trivial). >> >> First, note that the middle layer callbacks involved in that case are >> ->prepare, ->suspend, ->runtime_suspend (called indirectly from within the >> former) and then *also* ->suspend_late and ->suspend_noirq, because the PM >> core will call the last two from the middle layer as it has no information >> that pm_runtime_force_suspend() was called for the device in the "suspend" >> phase. Of course, in general, what the middle layer ->suspend_late and >> ->suspend_noirq do is not guaranteed to play well with what its >> ->runtime_suspend does even if ->suspend itself is OK (but for ->runtime_resume >> all of that actually works, because the state it leaves the device in should >> be compatible with the system suspend callbacks invoked in the later phases). >> >> Second, leaving devices in runtime suspend in the "suspend" phase of system >> suspend is fishy even when their runtime PM is disabled, because that doesn't >> guarantee anything regarding their children or possible consumers. Runtime >> PM may still be enabled for those devices at that time and runtime resume may >> be triggered for them later, in which case it all quickly falls apart. > > This is true, although to me this is a about a different problem and > has very little to do with pm_runtime_force_suspend(). > > More precisely, whether runtime PM becomes disabled in the suspend > phase or suspend_late phase, really doesn't matter. Because in the end > this is about suspending/resuming devices in the correct order. Yes, it is, but this is not my point (I didn't make it clear enough I guess). At the time you make the decision to disable runtime PM for a parent (say) and leave it in runtime suspend, all of its children are suspended just fine (otherwise the parent wouldn't have been suspended too). However, you *also* need to make sure that there will be no attempts to resume any of them *after* that point, which practically means that either runtime PM has to have been disabled already for all of them at the time it is disabled for the parent, or there has to be another guarantee in place. That's why the core tries to enforce the "runtime PM disabled for the entire hierarchy below" guarantee for the devices with direct_complete set, but that may just be overkill in many cases. I guess it may be better to use WARN_ON() to catch the cases in which things may really go wrong. >> IOW, there are reasons why the PM core bumps up the runtime PM usage counters >> of all devices during system suspend and they also apply to runtime suspend >> callbacks being invoked directly with runtime PM disabled for the given device. >> Frankly, it generally is only safe to leave a device in runtime suspend during >> system suspend if it can be guarateed that the system suspend callbacks in the >> subsequent suspend phases will not be invoked for it at all. > > I understand this is perfectly true for some of the non-trivial middle > layers, however just to be clear, this statement don't have to serve > as a general rule for all cases, right? Well, a really general version of it is something like "it is only safe to leave a device in runtime suspend during system suspend if it can be guaranteed that the system suspend callbacks in the subsequent suspend phases will not change its state" and the most effective way to make that guarantee is to prevent them from being invoked at all. :-) > Moreover, bumping the runtime PM usage count > (pm_runtime_get_noresume()) in the device_prepare/suspend() phase, was > originally added to prevent the runtime PM core from runtime > suspending a device, in cases when runtime PM has been enabled for it. > Preventing the ->runtime_suspend() callback from being invoked when > runtime PM is disabled, just doesn't make any sense to me. The problem is that the functionality of ->runtime_suspend() in principle overlaps with the functionality of ->suspend(), ->suspend_late() and ->suspend_noirq() combined, but it need not be entirely the same. Therefore if you invoke ->runtime_suspend() anywhere between the beginning of ->suspend() and the end of ->suspend_noirq(), the remaining code in the system sleep callbacks needs to know about that in order to avoid, for example, attempting to power down the device for the second time in a row, which very well may kill the system in some extreme cases. Of course, if those callbacks are trivial, this problem goes away, but they need not be trivial and if you are a platform driver (or an i2c/spi driver too for that matter), you aren't guaranteed that they will always be trivial. That is quite a bit of an issue to me. Thanks, Rafael
On Wed, 20 Sep 2017, Rafael J. Wysocki wrote: > >> Second, leaving devices in runtime suspend in the "suspend" phase of system > >> suspend is fishy even when their runtime PM is disabled, because that doesn't > >> guarantee anything regarding their children or possible consumers. Runtime > >> PM may still be enabled for those devices at that time and runtime resume may > >> be triggered for them later, in which case it all quickly falls apart. > > > > This is true, although to me this is a about a different problem and > > has very little to do with pm_runtime_force_suspend(). > > > > More precisely, whether runtime PM becomes disabled in the suspend > > phase or suspend_late phase, really doesn't matter. Because in the end > > this is about suspending/resuming devices in the correct order. > > Yes, it is, but this is not my point (I didn't make it clear enough I guess). > > At the time you make the decision to disable runtime PM for a parent > (say) and leave it in runtime suspend, all of its children are > suspended just fine (otherwise the parent wouldn't have been suspended > too). However, you *also* need to make sure that there will be no > attempts to resume any of them *after* that point, which practically > means that either runtime PM has to have been disabled already for all > of them at the time it is disabled for the parent, or there has to be > another guarantee in place. > > That's why the core tries to enforce the "runtime PM disabled for the > entire hierarchy below" guarantee for the devices with direct_complete > set, but that may just be overkill in many cases. I guess it may be > better to use WARN_ON() to catch the cases in which things may really > go wrong. That's a good idea. > >> IOW, there are reasons why the PM core bumps up the runtime PM usage counters > >> of all devices during system suspend and they also apply to runtime suspend > >> callbacks being invoked directly with runtime PM disabled for the given device. > >> Frankly, it generally is only safe to leave a device in runtime suspend during > >> system suspend if it can be guarateed that the system suspend callbacks in the > >> subsequent suspend phases will not be invoked for it at all. > > > > I understand this is perfectly true for some of the non-trivial middle > > layers, however just to be clear, this statement don't have to serve > > as a general rule for all cases, right? > > Well, a really general version of it is something like "it is only > safe to leave a device in runtime suspend during system suspend if it > can be guaranteed that the system suspend callbacks in the subsequent > suspend phases will not change its state" and the most effective way > to make that guarantee is to prevent them from being invoked at all. > :-) Of course, this can be overkill. It's probably common for there to be little physical difference between a device's "suspended" state and its "runtime-suspended" state. In such cases, the middle layer and the subsequent callbacks merely have to recognize that the device is already at low power (by checking an internal flag, for instance) and then do nothing when they see that it is. > > Moreover, bumping the runtime PM usage count > > (pm_runtime_get_noresume()) in the device_prepare/suspend() phase, was > > originally added to prevent the runtime PM core from runtime > > suspending a device, in cases when runtime PM has been enabled for it. > > Preventing the ->runtime_suspend() callback from being invoked when > > runtime PM is disabled, just doesn't make any sense to me. > > The problem is that the functionality of ->runtime_suspend() in > principle overlaps with the functionality of ->suspend(), > ->suspend_late() and ->suspend_noirq() combined, but it need not be > entirely the same. Therefore if you invoke ->runtime_suspend() > anywhere between the beginning of ->suspend() and the end of > ->suspend_noirq(), the remaining code in the system sleep callbacks > needs to know about that in order to avoid, for example, attempting to > power down the device for the second time in a row, which very well > may kill the system in some extreme cases. Another possibility is the arrival of an ill-timed runtime resume after some of the suspend callbacks have run (because of a wakeup request, for example). In short, system PM and runtime PM do overlap considerably, and this means that it would be easy for either one to interfere with the other unless the developers are very careful. > Of course, if those callbacks are trivial, this problem goes away, but > they need not be trivial and if you are a platform driver (or an > i2c/spi driver too for that matter), you aren't guaranteed that they > will always be trivial. That is quite a bit of an issue to me. Indeed. It may not be possible to come up with a firm set of detailed rules that apply everywhere. But we should make sure that people are aware of the issues. Alan Stern
On Wed, Sep 20, 2017 at 04:01:32PM +0200, Rafael J. Wysocki wrote: > On Wed, Sep 20, 2017 at 2:28 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote: > > On 20 September 2017 at 02:26, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > >> > >> Second, leaving devices in runtime suspend in the "suspend" phase of system > >> suspend is fishy even when their runtime PM is disabled, because that doesn't > >> guarantee anything regarding their children or possible consumers. Runtime > >> PM may still be enabled for those devices at that time and runtime resume may > >> be triggered for them later, in which case it all quickly falls apart. > > > > This is true, although to me this is a about a different problem and > > has very little to do with pm_runtime_force_suspend(). > > > > More precisely, whether runtime PM becomes disabled in the suspend > > phase or suspend_late phase, really doesn't matter. Because in the end > > this is about suspending/resuming devices in the correct order. > > Yes, it is, but this is not my point (I didn't make it clear enough I guess). > > At the time you make the decision to disable runtime PM for a parent > (say) and leave it in runtime suspend, all of its children are > suspended just fine (otherwise the parent wouldn't have been suspended > too). However, you *also* need to make sure that there will be no > attempts to resume any of them *after* that point, which practically > means that either runtime PM has to have been disabled already for all > of them at the time it is disabled for the parent, or there has to be > another guarantee in place. > > That's why the core tries to enforce the "runtime PM disabled for the > entire hierarchy below" guarantee for the devices with direct_complete > set, but that may just be overkill in many cases. I guess it may be > better to use WARN_ON() to catch the cases in which things may really > go wrong. I read this half a dozen times and I'm still confused. Moreover, Documentation/driver-api/pm/devices.rst says: Runtime Power Management model: Devices may also be put into low-power states while the system is running, independently of other power management activity in principle. However, devices are not generally independent of each other (for example, a parent device cannot be suspended unless all of its child devices have been suspended). ... However, isn't this a fundamental difference of runtime suspend vs. system suspend that parent devices *can* be runtime suspended before their children? E.g. an audio codec could keep running while the i2c bus used to program its registers can be runtime suspended. If this is correct I think it would be useful to spell it out explicitly in the documentation. During system suspend, pm core will suspend children first, and if the child's ->suspend hook uses the i2c bus to access registers, it will implicitly runtime resume the i2c bus (e.g. due to pm_runtime_get_sync() in i2c_dw_xfer()). Later pm core will ->suspend the i2c bus. I have a hunch the root of the problem is that ->prepare walks the tree in top-down order, and its return value is used to decide about direct-complete. Why does it do that? Shouldn't pm core check the direct_complete flag during ->suspend if the device is in runtime suspend, to decide whether to skip runtime resume + ->suspend for *this* device? Johannes
On Wed, Sep 20, 2017 at 6:27 PM, Johannes Stezenbach <js@sig21.net> wrote: > On Wed, Sep 20, 2017 at 04:01:32PM +0200, Rafael J. Wysocki wrote: >> On Wed, Sep 20, 2017 at 2:28 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote: >> > On 20 September 2017 at 02:26, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: >> >> >> >> Second, leaving devices in runtime suspend in the "suspend" phase of system >> >> suspend is fishy even when their runtime PM is disabled, because that doesn't >> >> guarantee anything regarding their children or possible consumers. Runtime >> >> PM may still be enabled for those devices at that time and runtime resume may >> >> be triggered for them later, in which case it all quickly falls apart. >> > >> > This is true, although to me this is a about a different problem and >> > has very little to do with pm_runtime_force_suspend(). >> > >> > More precisely, whether runtime PM becomes disabled in the suspend >> > phase or suspend_late phase, really doesn't matter. Because in the end >> > this is about suspending/resuming devices in the correct order. >> >> Yes, it is, but this is not my point (I didn't make it clear enough I guess). >> >> At the time you make the decision to disable runtime PM for a parent >> (say) and leave it in runtime suspend, all of its children are >> suspended just fine (otherwise the parent wouldn't have been suspended >> too). However, you *also* need to make sure that there will be no >> attempts to resume any of them *after* that point, which practically >> means that either runtime PM has to have been disabled already for all >> of them at the time it is disabled for the parent, or there has to be >> another guarantee in place. >> >> That's why the core tries to enforce the "runtime PM disabled for the >> entire hierarchy below" guarantee for the devices with direct_complete >> set, but that may just be overkill in many cases. I guess it may be >> better to use WARN_ON() to catch the cases in which things may really >> go wrong. > > I read this half a dozen times and I'm still confused. > Moreover, Documentation/driver-api/pm/devices.rst says: > > Runtime Power Management model: > > Devices may also be put into low-power states while the system is > running, independently of other power management activity in principle. > However, devices are not generally independent of each other (for > example, a parent device cannot be suspended unless all of its child > devices have been suspended). ... > > However, isn't this a fundamental difference of runtime suspend > vs. system suspend that parent devices *can* be runtime suspended > before their children? No, it isn't. > E.g. an audio codec could keep running > while the i2c bus used to program its registers can be runtime suspended. > If this is correct I think it would be useful to spell it out explicitly > in the documentation. That's because the i2c bus uses the ignore_children flag that allows it to override the general rules. :-) > During system suspend, pm core will suspend children first, > and if the child's ->suspend hook uses the i2c bus to access registers, > it will implicitly runtime resume the i2c bus (e.g. due to pm_runtime_get_sync() > in i2c_dw_xfer()). Later pm core will ->suspend the i2c bus. Yup. > I have a hunch the root of the problem is that ->prepare walks the tree > in top-down order, and its return value is used to decide about > direct-complete. Why does it do that? Shouldn't pm core check > the direct_complete flag during ->suspend if the device > is in runtime suspend, to decide whether to skip runtime resume + ->suspend > for *this* device? direct_complete has nothing to do with this. First off, the PM core does check the direct_complete flag in __device_suspend() and does more-or-less what you are saying. However, that flag is initialized in device_prepare() with the help of the ->suspend() return value, because whether or not it makes sense to set that flag depends on some conditions that may change between consecutive system suspend-resume cycles in general and need to be checked in advance before setting it. HTH Rafael
On Wed, Sep 20, 2017 at 6:15 PM, Alan Stern <stern@rowland.harvard.edu> wrote: > On Wed, 20 Sep 2017, Rafael J. Wysocki wrote: > >> >> Second, leaving devices in runtime suspend in the "suspend" phase of system >> >> suspend is fishy even when their runtime PM is disabled, because that doesn't >> >> guarantee anything regarding their children or possible consumers. Runtime >> >> PM may still be enabled for those devices at that time and runtime resume may >> >> be triggered for them later, in which case it all quickly falls apart. >> > >> > This is true, although to me this is a about a different problem and >> > has very little to do with pm_runtime_force_suspend(). >> > >> > More precisely, whether runtime PM becomes disabled in the suspend >> > phase or suspend_late phase, really doesn't matter. Because in the end >> > this is about suspending/resuming devices in the correct order. >> >> Yes, it is, but this is not my point (I didn't make it clear enough I guess). >> >> At the time you make the decision to disable runtime PM for a parent >> (say) and leave it in runtime suspend, all of its children are >> suspended just fine (otherwise the parent wouldn't have been suspended >> too). However, you *also* need to make sure that there will be no >> attempts to resume any of them *after* that point, which practically >> means that either runtime PM has to have been disabled already for all >> of them at the time it is disabled for the parent, or there has to be >> another guarantee in place. >> >> That's why the core tries to enforce the "runtime PM disabled for the >> entire hierarchy below" guarantee for the devices with direct_complete >> set, but that may just be overkill in many cases. I guess it may be >> better to use WARN_ON() to catch the cases in which things may really >> go wrong. > > That's a good idea. > >> >> IOW, there are reasons why the PM core bumps up the runtime PM usage counters >> >> of all devices during system suspend and they also apply to runtime suspend >> >> callbacks being invoked directly with runtime PM disabled for the given device. >> >> Frankly, it generally is only safe to leave a device in runtime suspend during >> >> system suspend if it can be guarateed that the system suspend callbacks in the >> >> subsequent suspend phases will not be invoked for it at all. >> > >> > I understand this is perfectly true for some of the non-trivial middle >> > layers, however just to be clear, this statement don't have to serve >> > as a general rule for all cases, right? >> >> Well, a really general version of it is something like "it is only >> safe to leave a device in runtime suspend during system suspend if it >> can be guaranteed that the system suspend callbacks in the subsequent >> suspend phases will not change its state" and the most effective way >> to make that guarantee is to prevent them from being invoked at all. >> :-) > > Of course, this can be overkill. It's probably common for there to be > little physical difference between a device's "suspended" state and > its "runtime-suspended" state. In such cases, the middle layer and > the subsequent callbacks merely have to recognize that the device is > already at low power (by checking an internal flag, for instance) and > then do nothing when they see that it is. Right. The idea, though, was to make it work even if they didn't do that. :-) >> > Moreover, bumping the runtime PM usage count >> > (pm_runtime_get_noresume()) in the device_prepare/suspend() phase, was >> > originally added to prevent the runtime PM core from runtime >> > suspending a device, in cases when runtime PM has been enabled for it. >> > Preventing the ->runtime_suspend() callback from being invoked when >> > runtime PM is disabled, just doesn't make any sense to me. >> >> The problem is that the functionality of ->runtime_suspend() in >> principle overlaps with the functionality of ->suspend(), >> ->suspend_late() and ->suspend_noirq() combined, but it need not be >> entirely the same. Therefore if you invoke ->runtime_suspend() >> anywhere between the beginning of ->suspend() and the end of >> ->suspend_noirq(), the remaining code in the system sleep callbacks >> needs to know about that in order to avoid, for example, attempting to >> power down the device for the second time in a row, which very well >> may kill the system in some extreme cases. > > Another possibility is the arrival of an ill-timed runtime resume after > some of the suspend callbacks have run (because of a wakeup request, > for example). In short, system PM and runtime PM do overlap > considerably, and this means that it would be easy for either one to > interfere with the other unless the developers are very careful. Right. >> Of course, if those callbacks are trivial, this problem goes away, but >> they need not be trivial and if you are a platform driver (or an >> i2c/spi driver too for that matter), you aren't guaranteed that they >> will always be trivial. That is quite a bit of an issue to me. > > Indeed. It may not be possible to come up with a firm set of detailed > rules that apply everywhere. But we should make sure that people are > aware of the issues. Agreed. Thanks, Rafael
On Thu, Sep 21, 2017 at 02:39:30AM +0200, Rafael J. Wysocki wrote: > On Wed, Sep 20, 2017 at 6:27 PM, Johannes Stezenbach <js@sig21.net> wrote: > > > > E.g. an audio codec could keep running > > while the i2c bus used to program its registers can be runtime suspended. > > If this is correct I think it would be useful to spell it out explicitly > > in the documentation. > > That's because the i2c bus uses the ignore_children flag that allows > it to override the general rules. :-) Ah! I was looking at Documentation/driver-api/pm only (which is changed by your patch), but this is documented in Documentation/power (and obviously I hadn't checked the code, shame on me). > direct_complete has nothing to do with this. Oh? Reading again, do I get this right: 1. simple method: always call pm_runtime_resume() in ->suspend(), then suspend the driver again 2. optimization: if pm_runtime_suspended(), the driver's ->suspend() can possibly do nothing if conditions permit, otherwise it calls pm_runtime_resume() and then suspends 3. optimization: tell pm core to skip ->suspend() via return value from ->prepare() which sets direct_complete ...and your patch only deals with 1 and 2. Sorry to hijack your thread for side discussion, it was inadvertant due to my lack of understanding. > First off, the PM core does check the direct_complete flag in > __device_suspend() and does more-or-less what you are saying. > > However, that flag is initialized in device_prepare() with the help of > the ->suspend() return value, because whether or not it makes sense to you mean ->prepare(), right? > set that flag depends on some conditions that may change between > consecutive system suspend-resume cycles in general and need to be > checked in advance before setting it. > > HTH It does, however the question remains *why* it needs to check it in ->prepare() and not right before calling ->suspend(). Using ->prepare() for the purpose seems wrong since it traverses the hierarchy in the "wrong" order. Only right before calling ->suspend() the driver knows if its current state allows it to skip any further actions for suspend, because suspending children or other users may cause pm_runtime_resume() for it. (In the back of my head I have the scenario of bug #196861, some completely different driver uses i2c via ACPI OpRegion during its suspend.) Thanks, Johannes
On Thursday, September 21, 2017 11:27:13 AM CEST Johannes Stezenbach wrote: > On Thu, Sep 21, 2017 at 02:39:30AM +0200, Rafael J. Wysocki wrote: > > On Wed, Sep 20, 2017 at 6:27 PM, Johannes Stezenbach <js@sig21.net> wrote: > > > > > > E.g. an audio codec could keep running > > > while the i2c bus used to program its registers can be runtime suspended. > > > If this is correct I think it would be useful to spell it out explicitly > > > in the documentation. > > > > That's because the i2c bus uses the ignore_children flag that allows > > it to override the general rules. :-) > > Ah! I was looking at Documentation/driver-api/pm only (which is > changed by your patch), but this is documented in Documentation/power > (and obviously I hadn't checked the code, shame on me). > > > direct_complete has nothing to do with this. > > Oh? Reading again, do I get this right: > > 1. simple method: always call pm_runtime_resume() in ->suspend(), > then suspend the driver again Right. > 2. optimization: if pm_runtime_suspended(), the driver's ->suspend() > can possibly do nothing if conditions permit, otherwise it calls > pm_runtime_resume() and then suspends Well, that may work in some cases. :-) > 3. optimization: tell pm core to skip ->suspend() via return value > from ->prepare() which sets direct_complete Yup. > ...and your patch only deals with 1 and 2. Yes, basically. > Sorry to hijack your thread for side discussion, it was > inadvertant due to my lack of understanding. > > > > First off, the PM core does check the direct_complete flag in > > __device_suspend() and does more-or-less what you are saying. > > > > However, that flag is initialized in device_prepare() with the help of > > the ->suspend() return value, because whether or not it makes sense to > > you mean ->prepare(), right? Right (sorry). > > set that flag depends on some conditions that may change between > > consecutive system suspend-resume cycles in general and need to be > > checked in advance before setting it. > > > > HTH > > It does, however the question remains *why* it needs to check > it in ->prepare() and not right before calling ->suspend(). Becuase the core needs input from middle layers in some cases before it decides to call ->suspend(). > Using ->prepare() for the purpose seems wrong since it traverses > the hierarchy in the "wrong" order. That doesn't matter. What matters is whether or not the device's state is "compatible" with system sleep at the ->prepare() time (which is checked by some middle layers in ->prepare()). BTW, did you notice the pm_runtime_status_suspended() checks in __device_suspend()? They are in there in case the device has been resumed from runtime suspend after ->prepare(). > Only right before calling ->suspend() the driver knows if its current > state allows it to skip any further actions for suspend, because > suspending children or other users may cause pm_runtime_resume() > for it. (In the back of my head I have the scenario of > bug #196861, some completely different driver uses > i2c via ACPI OpRegion during its suspend.) Yes, that's a special case (again, because of the way i2c handles runtime PM overall). Thanks, Rafael
On Thu, 21 Sep 2017, Johannes Stezenbach wrote: > On Thu, Sep 21, 2017 at 02:39:30AM +0200, Rafael J. Wysocki wrote: > > On Wed, Sep 20, 2017 at 6:27 PM, Johannes Stezenbach <js@sig21.net> wrote: > > > > > > E.g. an audio codec could keep running > > > while the i2c bus used to program its registers can be runtime suspended. > > > If this is correct I think it would be useful to spell it out explicitly > > > in the documentation. > > > > That's because the i2c bus uses the ignore_children flag that allows > > it to override the general rules. :-) > > Ah! I was looking at Documentation/driver-api/pm only (which is > changed by your patch), but this is documented in Documentation/power > (and obviously I hadn't checked the code, shame on me). > > > direct_complete has nothing to do with this. > > Oh? Reading again, do I get this right: > > 1. simple method: always call pm_runtime_resume() in ->suspend(), > then suspend the driver again > 2. optimization: if pm_runtime_suspended(), the driver's ->suspend() > can possibly do nothing if conditions permit, otherwise it calls > pm_runtime_resume() and then suspends > 3. optimization: tell pm core to skip ->suspend() via return value > from ->prepare() which sets direct_complete > > ...and your patch only deals with 1 and 2. > > Sorry to hijack your thread for side discussion, it was > inadvertant due to my lack of understanding. > > > > First off, the PM core does check the direct_complete flag in > > __device_suspend() and does more-or-less what you are saying. > > > > However, that flag is initialized in device_prepare() with the help of > > the ->suspend() return value, because whether or not it makes sense to > > you mean ->prepare(), right? > > > set that flag depends on some conditions that may change between > > consecutive system suspend-resume cycles in general and need to be > > checked in advance before setting it. > > > > HTH > > It does, however the question remains *why* it needs to check > it in ->prepare() and not right before calling ->suspend(). > Using ->prepare() for the purpose seems wrong since it traverses > the hierarchy in the "wrong" order. No, it is the _right_ order. If a device's ->prepare() says that direct_complete is okay, but one of its descendants disallows direct_complete, we then want to clear the direct_complete flag in the original device structure. We couldn't do this if we checked the descendant's driver first. > Only right before > calling ->suspend() the driver knows if its current state > allows it to skip any further actions for suspend, because > suspending children or other users may cause pm_runtime_resume() > for it. If the device gets runtime-resumed before ->suspend() would be called then the direct_complete setting doesn't matter. The PM core follows the direct_complete path only if the device is already in runtime suspend when the ->suspend() callback would normally be invoked. And if the device does get runtime-resumed like this, it can't be runtime-suspended again. The PM core makes sure of that. Alan Stern > (In the back of my head I have the scenario of > bug #196861, some completely different driver uses > i2c via ACPI OpRegion during its suspend.) > > > Thanks, > Johannes
On Thursday, September 21, 2017 4:36:30 PM CEST Alan Stern wrote: > On Thu, 21 Sep 2017, Johannes Stezenbach wrote: > > > On Thu, Sep 21, 2017 at 02:39:30AM +0200, Rafael J. Wysocki wrote: > > > On Wed, Sep 20, 2017 at 6:27 PM, Johannes Stezenbach <js@sig21.net> wrote: > > > > > > > > E.g. an audio codec could keep running > > > > while the i2c bus used to program its registers can be runtime suspended. > > > > If this is correct I think it would be useful to spell it out explicitly > > > > in the documentation. > > > > > > That's because the i2c bus uses the ignore_children flag that allows > > > it to override the general rules. :-) > > > > Ah! I was looking at Documentation/driver-api/pm only (which is > > changed by your patch), but this is documented in Documentation/power > > (and obviously I hadn't checked the code, shame on me). > > > > > direct_complete has nothing to do with this. > > > > Oh? Reading again, do I get this right: > > > > 1. simple method: always call pm_runtime_resume() in ->suspend(), > > then suspend the driver again > > 2. optimization: if pm_runtime_suspended(), the driver's ->suspend() > > can possibly do nothing if conditions permit, otherwise it calls > > pm_runtime_resume() and then suspends > > 3. optimization: tell pm core to skip ->suspend() via return value > > from ->prepare() which sets direct_complete > > > > ...and your patch only deals with 1 and 2. > > > > Sorry to hijack your thread for side discussion, it was > > inadvertant due to my lack of understanding. > > > > > > > First off, the PM core does check the direct_complete flag in > > > __device_suspend() and does more-or-less what you are saying. > > > > > > However, that flag is initialized in device_prepare() with the help of > > > the ->suspend() return value, because whether or not it makes sense to > > > > you mean ->prepare(), right? > > > > > set that flag depends on some conditions that may change between > > > consecutive system suspend-resume cycles in general and need to be > > > checked in advance before setting it. > > > > > > HTH > > > > It does, however the question remains *why* it needs to check > > it in ->prepare() and not right before calling ->suspend(). > > Using ->prepare() for the purpose seems wrong since it traverses > > the hierarchy in the "wrong" order. > > No, it is the _right_ order. If a device's ->prepare() says that > direct_complete is okay, but one of its descendants disallows > direct_complete, we then want to clear the direct_complete flag in the > original device structure. We couldn't do this if we checked the > descendant's driver first. But we really clear it for parents (and suppliers) in __device_suspend(), which is still OK, because that is first called for the children (and consumers). So the ordering of ->prepare() doesn't really matter here IMO. Thanks, Rafael
[...] >>> Second, leaving devices in runtime suspend in the "suspend" phase of system >>> suspend is fishy even when their runtime PM is disabled, because that doesn't >>> guarantee anything regarding their children or possible consumers. Runtime >>> PM may still be enabled for those devices at that time and runtime resume may >>> be triggered for them later, in which case it all quickly falls apart. >> >> This is true, although to me this is a about a different problem and >> has very little to do with pm_runtime_force_suspend(). >> >> More precisely, whether runtime PM becomes disabled in the suspend >> phase or suspend_late phase, really doesn't matter. Because in the end >> this is about suspending/resuming devices in the correct order. > > Yes, it is, but this is not my point (I didn't make it clear enough I guess). > > At the time you make the decision to disable runtime PM for a parent > (say) and leave it in runtime suspend, all of its children are > suspended just fine (otherwise the parent wouldn't have been suspended > too). However, you *also* need to make sure that there will be no > attempts to resume any of them *after* that point, which practically > means that either runtime PM has to have been disabled already for all > of them at the time it is disabled for the parent, or there has to be > another guarantee in place. > > That's why the core tries to enforce the "runtime PM disabled for the > entire hierarchy below" guarantee for the devices with direct_complete > set, but that may just be overkill in many cases. I guess it may be > better to use WARN_ON() to catch the cases in which things may really > go wrong. Yes, using WARN_ON() is some clever way, seems like a great idea. My point is, disabling runtime PM in a hierarchical manner, isn't a solution to the suspend/resume ordering problem. > >>> IOW, there are reasons why the PM core bumps up the runtime PM usage counters >>> of all devices during system suspend and they also apply to runtime suspend >>> callbacks being invoked directly with runtime PM disabled for the given device. >>> Frankly, it generally is only safe to leave a device in runtime suspend during >>> system suspend if it can be guarateed that the system suspend callbacks in the >>> subsequent suspend phases will not be invoked for it at all. >> >> I understand this is perfectly true for some of the non-trivial middle >> layers, however just to be clear, this statement don't have to serve >> as a general rule for all cases, right? > > Well, a really general version of it is something like "it is only > safe to leave a device in runtime suspend during system suspend if it > can be guaranteed that the system suspend callbacks in the subsequent > suspend phases will not change its state" and the most effective way > to make that guarantee is to prevent them from being invoked at all. > :-) > >> Moreover, bumping the runtime PM usage count >> (pm_runtime_get_noresume()) in the device_prepare/suspend() phase, was >> originally added to prevent the runtime PM core from runtime >> suspending a device, in cases when runtime PM has been enabled for it. >> Preventing the ->runtime_suspend() callback from being invoked when >> runtime PM is disabled, just doesn't make any sense to me. > > The problem is that the functionality of ->runtime_suspend() in > principle overlaps with the functionality of ->suspend(), > ->suspend_late() and ->suspend_noirq() combined, but it need not be > entirely the same. Therefore if you invoke ->runtime_suspend() > anywhere between the beginning of ->suspend() and the end of > ->suspend_noirq(), the remaining code in the system sleep callbacks > needs to know about that in order to avoid, for example, attempting to > power down the device for the second time in a row, which very well > may kill the system in some extreme cases. Agree, this may be a bit tricky to deal with for the more complex middle layers. However, this is also very good reason to why it's useful keep the runtime PM status up to date, always reflecting the HW status of the device - even during system suspend phases. > > Of course, if those callbacks are trivial, this problem goes away, but > they need not be trivial and if you are a platform driver (or an > i2c/spi driver too for that matter), you aren't guaranteed that they > will always be trivial. That is quite a bit of an issue to me. I understand your concern, but I don't share it. :-) To me, I would rather keep things as simple as possible, then deal with the complexity once we have a valid case for it. Kind regards Uffe
On Friday, September 22, 2017 9:22:47 AM CEST Ulf Hansson wrote: > [...] > > >>> Second, leaving devices in runtime suspend in the "suspend" phase of system > >>> suspend is fishy even when their runtime PM is disabled, because that doesn't > >>> guarantee anything regarding their children or possible consumers. Runtime > >>> PM may still be enabled for those devices at that time and runtime resume may > >>> be triggered for them later, in which case it all quickly falls apart. > >> > >> This is true, although to me this is a about a different problem and > >> has very little to do with pm_runtime_force_suspend(). > >> > >> More precisely, whether runtime PM becomes disabled in the suspend > >> phase or suspend_late phase, really doesn't matter. Because in the end > >> this is about suspending/resuming devices in the correct order. > > > > Yes, it is, but this is not my point (I didn't make it clear enough I guess). > > > > At the time you make the decision to disable runtime PM for a parent > > (say) and leave it in runtime suspend, all of its children are > > suspended just fine (otherwise the parent wouldn't have been suspended > > too). However, you *also* need to make sure that there will be no > > attempts to resume any of them *after* that point, which practically > > means that either runtime PM has to have been disabled already for all > > of them at the time it is disabled for the parent, or there has to be > > another guarantee in place. > > > > That's why the core tries to enforce the "runtime PM disabled for the > > entire hierarchy below" guarantee for the devices with direct_complete > > set, but that may just be overkill in many cases. I guess it may be > > better to use WARN_ON() to catch the cases in which things may really > > go wrong. > > Yes, using WARN_ON() is some clever way, seems like a great idea. > > My point is, disabling runtime PM in a hierarchical manner, isn't a > solution to the suspend/resume ordering problem. > > > > >>> IOW, there are reasons why the PM core bumps up the runtime PM usage counters > >>> of all devices during system suspend and they also apply to runtime suspend > >>> callbacks being invoked directly with runtime PM disabled for the given device. > >>> Frankly, it generally is only safe to leave a device in runtime suspend during > >>> system suspend if it can be guarateed that the system suspend callbacks in the > >>> subsequent suspend phases will not be invoked for it at all. > >> > >> I understand this is perfectly true for some of the non-trivial middle > >> layers, however just to be clear, this statement don't have to serve > >> as a general rule for all cases, right? > > > > Well, a really general version of it is something like "it is only > > safe to leave a device in runtime suspend during system suspend if it > > can be guaranteed that the system suspend callbacks in the subsequent > > suspend phases will not change its state" and the most effective way > > to make that guarantee is to prevent them from being invoked at all. > > :-) > > > >> Moreover, bumping the runtime PM usage count > >> (pm_runtime_get_noresume()) in the device_prepare/suspend() phase, was > >> originally added to prevent the runtime PM core from runtime > >> suspending a device, in cases when runtime PM has been enabled for it. > >> Preventing the ->runtime_suspend() callback from being invoked when > >> runtime PM is disabled, just doesn't make any sense to me. > > > > The problem is that the functionality of ->runtime_suspend() in > > principle overlaps with the functionality of ->suspend(), > > ->suspend_late() and ->suspend_noirq() combined, but it need not be > > entirely the same. Therefore if you invoke ->runtime_suspend() > > anywhere between the beginning of ->suspend() and the end of > > ->suspend_noirq(), the remaining code in the system sleep callbacks > > needs to know about that in order to avoid, for example, attempting to > > power down the device for the second time in a row, which very well > > may kill the system in some extreme cases. > > Agree, this may be a bit tricky to deal with for the more complex middle layers. > > However, this is also very good reason to why it's useful keep the > runtime PM status up to date, always reflecting the HW status of the > device - even during system suspend phases. > > > > > Of course, if those callbacks are trivial, this problem goes away, but > > they need not be trivial and if you are a platform driver (or an > > i2c/spi driver too for that matter), you aren't guaranteed that they > > will always be trivial. That is quite a bit of an issue to me. > > I understand your concern, but I don't share it. :-) > > To me, I would rather keep things as simple as possible, then deal > with the complexity once we have a valid case for it. But we have valid cases, actually, which is why we are talking about that in the first place. The point is how to address them and *none* of the choices at hand are particuarly simple. Some of them are simply plain unacceptable to me, though, and I have tried very hard to explain why this is the case, but at this point I'm just totally out of patience, so please accept the fact. Thanks, Rafael
Index: linux-pm/Documentation/driver-api/pm/devices.rst =================================================================== --- linux-pm.orig/Documentation/driver-api/pm/devices.rst +++ linux-pm/Documentation/driver-api/pm/devices.rst @@ -328,7 +328,10 @@ the phases are: ``prepare``, ``suspend`` After the ``->prepare`` callback method returns, no new children may be registered below the device. The method may also prepare the device or driver in some way for the upcoming system power transition, but it - should not put the device into a low-power state. + should not put the device into a low-power state. Moreover, if the + device supports runtime power management, the ``->prepare`` callback + method must not update its state in case it is necessary to resume it + from runtime suspend later on. For devices supporting runtime power management, the return value of the prepare callback can be used to indicate to the PM core that it may @@ -356,6 +359,16 @@ the phases are: ``prepare``, ``suspend`` the appropriate low-power state, depending on the bus type the device is on, and they may enable wakeup events. + However, for devices supporting runtime power management, the + ``->suspend`` methods provided by subsystems (bus types and PM domains + in particular) must follow an additional rule regarding what can be done + to the devices before their drivers' ``->suspend`` methods are called. + Namely, they can only resume the devices from runtime suspend by + calling :c:func:`pm_runtime_resume` for them, if that is necessary, and + they must not update the state of the devices in any other way at that + time (in case the drivers need to resume the devices from runtime + suspend in their ``->suspend`` methods). + 3. For a number of devices it is convenient to split suspend into the "quiesce device" and "save device state" phases, in which cases ``suspend_late`` is meant to do the latter. It is always executed after @@ -729,6 +742,16 @@ state temporarily, for example so that i disabled. This all depends on the hardware and the design of the subsystem and device driver in question. +If it is necessary to resume a device from runtime suspend during a system-wide +transition into a sleep state, that can be done by calling +:c:func:`pm_runtime_resume` for it from the ``->suspend`` callback (or its +couterpart for transitions related to hibernation) of either the device's driver +or a subsystem responsible for it (for example, a bus type or a PM domain). +That is guaranteed to work by the requirement that subsystems must not change +the state of devices (possibly except for resuming them from runtime suspend) +from their ``->prepare`` and ``->suspend`` callbacks (or equivalent) *before* +invoking device drivers' ``->suspend`` callbacks (or equivalent). + During system-wide resume from a sleep state it's easiest to put devices into the full-power state, as explained in :file:`Documentation/power/runtime_pm.txt`. Refer to that document for more information regarding this particular issue as