diff mbox

[RFC] PM / Runtime: Allow to inactivate devices during system suspend

Message ID 1384360178-333-1-git-send-email-ulf.hansson@linaro.org (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show

Commit Message

Ulf Hansson Nov. 13, 2013, 4:29 p.m. UTC
The PM core was preventing devices from going inactive during system
suspend. Remove this constraint and moreover try to inactivate devices
by invoking pm_runtime_idle() before proceeding with device's
suspend_late callbacks.

A great amount of drivers that uses runtime PM will benefit from this.

Drivers that implements some of the system ->suspend* callbacks for the
only reason of making sure the runtime PM resourses gets inactive, can
now solely instead rely on the PM core to handle this.

In the case were drivers needs there runtime resourses to be activated
to be able to handle their system suspend tasks, those can easily just
restore the runtime PM usage count, pm_runtime_put_sync(), once done
with it's suspend operations, which then will inactivate the device.

Cc: Kevin Hilman <khilman@linaro.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---

There are a quite long history for how the PM core behaves around this
topic. Changes has been iterated, back for forward, you might even
think that it should have been settled at this point. So, forgive me
for bringing this up, again.

This patch shall obviously not be merged as is, it will cause
regressions for several drivers and subsystems. The intent is instead
to try to follow up on previous discussions to see if we might end up
closer to the sequence, which my RFC PATCH suggests, or if I am just
bothering you completely unnecessary.

---
 drivers/base/power/main.c |   11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)

Comments

Alan Stern Nov. 13, 2013, 7:16 p.m. UTC | #1
On Wed, 13 Nov 2013, Ulf Hansson wrote:

> The PM core was preventing devices from going inactive during system
> suspend. Remove this constraint and moreover try to inactivate devices
> by invoking pm_runtime_idle() before proceeding with device's
> suspend_late callbacks.
> 
> A great amount of drivers that uses runtime PM will benefit from this.

It's not clear how drivers will benefit.

> Drivers that implements some of the system ->suspend* callbacks for the
> only reason of making sure the runtime PM resourses gets inactive, can
> now solely instead rely on the PM core to handle this.

Drivers don't have to implement anything special if they want to use
their runtime routines for system suspend.  All they have to do is set
the .suspend and .resume method pointers to point to their
runtime_suspend and runtime_resume routines.

> In the case were drivers needs there runtime resourses to be activated
> to be able to handle their system suspend tasks, those can easily just
> restore the runtime PM usage count, pm_runtime_put_sync(), once done
> with it's suspend operations, which then will inactivate the device.

The reason for doing things the way they are is to avoid races between
system PM callbacks and runtime PM callbacks.  We don't want to have a
runtime_resume routine powering up a device at the same time as a
suspend routine is powering it down.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki Nov. 13, 2013, 10:13 p.m. UTC | #2
On Wednesday, November 13, 2013 05:29:38 PM Ulf Hansson wrote:
> The PM core was preventing devices from going inactive during system
> suspend.

That's not even correct.  It prevents the runtime PM framework from being
used to suspend devices during system suspend and that's on purpose.  The
reason why is that we don't want asynchronous runtime suspends to happen
during system suspend and race with the suspend process.

> Remove this constraint and moreover try to inactivate devices
> by invoking pm_runtime_idle() before proceeding with device's
> suspend_late callbacks.
> 
> A great amount of drivers that uses runtime PM will benefit from this.

If the idea is to be able to do pm_runtime_suspend() from the system suspend
code path, this is not the way to go and it kind of indicates the lack of
understanding why things have been arranged this way.

If you want a driver to do the same thing for runtime PM and system
suspend/resume, point its .suspend_late() and .runtime_suspend() to the
same routine and analogously for .runtime_resume() and .resume_early()
(that may not work for PCI, though, because of the _noirq stuff, but that's
the PCI bus type issue).

Of course, the assumption behind .runtime_suspend() and .runtime_resume() is
that they will be run when the device is not in use, while system suspend
may actually happen when the device *is* in use and .suspend() should take
care of that if necessary.  So you may need .suspend() anyway.

One important property of system suspend is that it can happen at any time
and that's why the .suspend() and .resume() callbacks are needed (and this
also means that they can't be the same as .runtime_suspend() and
.runtime_resume() in general).

Thanks!
Rafael J. Wysocki Nov. 13, 2013, 10:21 p.m. UTC | #3
On Wednesday, November 13, 2013 02:16:03 PM Alan Stern wrote:
> On Wed, 13 Nov 2013, Ulf Hansson wrote:
> 
> > The PM core was preventing devices from going inactive during system
> > suspend. Remove this constraint and moreover try to inactivate devices
> > by invoking pm_runtime_idle() before proceeding with device's
> > suspend_late callbacks.
> > 
> > A great amount of drivers that uses runtime PM will benefit from this.
> 
> It's not clear how drivers will benefit.
> 
> > Drivers that implements some of the system ->suspend* callbacks for the
> > only reason of making sure the runtime PM resourses gets inactive, can
> > now solely instead rely on the PM core to handle this.
> 
> Drivers don't have to implement anything special if they want to use
> their runtime routines for system suspend.  All they have to do is set
> the .suspend and .resume method pointers to point to their
> runtime_suspend and runtime_resume routines.

Well, not really.  This doesn't work in general, because the assumptions for
running .suspend() (at any time, with user space frozen) are quite different
from the assumptions for running .runtime_suspend() (when the device is not
in use).  So generally .suspend() and .runtime_suspend() should be different.

That said .runtime_suspend() generally can be the same as .suspend_late()
(and that was the main reason why the latter was introduced in the first
place) and analogously for .runtime_resume() and .reasume_early().

Although there are simple cases in which .suspend() and .runtime_suspend() may
be the same, I'd still recommend to point .suspend_late and .runtime_suspend
to the same routine and leaving .suspend as NULL in those cases too (and
analogously for the resume stuff).

Thanks,
Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ulf Hansson Nov. 14, 2013, 1:21 p.m. UTC | #4
>
> Drivers don't have to implement anything special if they want to use
> their runtime routines for system suspend.  All they have to do is set
> the .suspend and .resume method pointers to point to their
> runtime_suspend and runtime_resume routines.
>
>> In the case were drivers needs there runtime resourses to be activated
>> to be able to handle their system suspend tasks, those can easily just
>> restore the runtime PM usage count, pm_runtime_put_sync(), once done
>> with it's suspend operations, which then will inactivate the device.
>
> The reason for doing things the way they are is to avoid races between
> system PM callbacks and runtime PM callbacks.  We don't want to have a
> runtime_resume routine powering up a device at the same time as a
> suspend routine is powering it down.

I believe I understand why the PM core prevent this. Some subsystems
and drivers depends on this behaviour, but I wonder if we should
consider this as common cases or more like special cases?

For those drivers that potentially could rely on runtime PM (and it's
callbacks) to inactivate the device during system suspend, and thus
don't have issues with races you describe above, the current behaviour
in the PM core adds unnecessary complexity for these drivers to handle
system suspend.

Drivers/buses/power_domains that handles runtime PM enabled devices,
will, as you also have pointed out earlier, need to implement the
.suspend callback (suspend_late or suspend_noirq might work as well)
and the corresponding .resume callback. This just for the sake of
making sure all their runtime enabled devices gets fully inactivated
during system suspend.

I wonder if we somehow should move the responsibility of preventing
"runtime suspend" as a part of the system suspend, down to the driver
to decide instead of the PM core?

Kind regards
Ulf Hansson
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ulf Hansson Nov. 14, 2013, 2:32 p.m. UTC | #5
>
> That's not even correct.  It prevents the runtime PM framework from being
> used to suspend devices during system suspend and that's on purpose.  The
> reason why is that we don't want asynchronous runtime suspends to happen
> during system suspend and race with the suspend process.
>
>> Remove this constraint and moreover try to inactivate devices
>> by invoking pm_runtime_idle() before proceeding with device's
>> suspend_late callbacks.
>>
>> A great amount of drivers that uses runtime PM will benefit from this.
>
> If the idea is to be able to do pm_runtime_suspend() from the system suspend
> code path, this is not the way to go and it kind of indicates the lack of
> understanding why things have been arranged this way.

I guess there are several reasons, but the one that think stands the
best is the race condition you are referring to above. I were more or
less looking at the below commits to try to get a summary of the
history, I guess there are more. :-)

cf579dfb82550e34de7ccf3ef090d8b834ccd3a9
and
1e2ef05bb8cf851a694d38e9170c89e7ff052741

>
> If you want a driver to do the same thing for runtime PM and system
> suspend/resume, point its .suspend_late() and .runtime_suspend() to the
> same routine and analogously for .runtime_resume() and .resume_early()
> (that may not work for PCI, though, because of the _noirq stuff, but that's
> the PCI bus type issue).

When your .suspend_late callback gets called, your device may be
either active or inactive from runtime PM point of view. Just pointing
the .suspend_late callback to the .runtime_suspend callback, simply
wont work.

That would mean we might end up in trying to inactivate an already
inactivated device, which I doubt is a good thing.:-) Some additional
code will be needed to track the "runtime state" and to act
accordingly. This is what drivers already tries to handle as of today,
though so far mostly from their .suspend callbacks I believe.

>
> Of course, the assumption behind .runtime_suspend() and .runtime_resume() is
> that they will be run when the device is not in use, while system suspend
> may actually happen when the device *is* in use and .suspend() should take
> care of that if necessary.  So you may need .suspend() anyway.
>
> One important property of system suspend is that it can happen at any time
> and that's why the .suspend() and .resume() callbacks are needed (and this
> also means that they can't be the same as .runtime_suspend() and
> .runtime_resume() in general).

I see your point and in most cases you are right. But, I am also sure
that quite many drivers should be able to cope with only runtime PM
callbacks. I have collected a bunch of examples that I can share, if
you are interested?

Kind regards
Ulf Hansson

>
> Thanks!
>
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern Nov. 14, 2013, 3:55 p.m. UTC | #6
On Thu, 14 Nov 2013, Ulf Hansson wrote:

> > If you want a driver to do the same thing for runtime PM and system
> > suspend/resume, point its .suspend_late() and .runtime_suspend() to the
> > same routine and analogously for .runtime_resume() and .resume_early()
> > (that may not work for PCI, though, because of the _noirq stuff, but that's
> > the PCI bus type issue).
> 
> When your .suspend_late callback gets called, your device may be
> either active or inactive from runtime PM point of view. Just pointing
> the .suspend_late callback to the .runtime_suspend callback, simply
> wont work.
> 
> That would mean we might end up in trying to inactivate an already
> inactivated device, which I doubt is a good thing.:-) Some additional
> code will be needed to track the "runtime state" and to act
> accordingly. This is what drivers already tries to handle as of today,
> though so far mostly from their .suspend callbacks I believe.

You could write a "generic" suspend_late routine that would invoke the 
driver's runtime_suspend routine if the device's runtime PM status was 
RPM_ACTIVE.

> > Of course, the assumption behind .runtime_suspend() and .runtime_resume() is
> > that they will be run when the device is not in use, while system suspend
> > may actually happen when the device *is* in use and .suspend() should take
> > care of that if necessary.  So you may need .suspend() anyway.
> >
> > One important property of system suspend is that it can happen at any time
> > and that's why the .suspend() and .resume() callbacks are needed (and this
> > also means that they can't be the same as .runtime_suspend() and
> > .runtime_resume() in general).
> 
> I see your point and in most cases you are right. But, I am also sure
> that quite many drivers should be able to cope with only runtime PM
> callbacks. I have collected a bunch of examples that I can share, if
> you are interested?

All those drivers should be able to utilize this "generic" suspend_late 
routine.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern Nov. 14, 2013, 3:59 p.m. UTC | #7
On Thu, 14 Nov 2013, Ulf Hansson wrote:

> > The reason for doing things the way they are is to avoid races between
> > system PM callbacks and runtime PM callbacks.  We don't want to have a
> > runtime_resume routine powering up a device at the same time as a
> > suspend routine is powering it down.
> 
> I believe I understand why the PM core prevent this. Some subsystems
> and drivers depends on this behaviour, but I wonder if we should
> consider this as common cases or more like special cases?
> 
> For those drivers that potentially could rely on runtime PM (and it's
> callbacks) to inactivate the device during system suspend, and thus
> don't have issues with races you describe above, the current behaviour
> in the PM core adds unnecessary complexity for these drivers to handle
> system suspend.

What unnecessary complexity?

Bear in mind that drivers _cannot_ rely on runtime PM to inactivate a 
device during system suspend.  The user can always prevent a device 
from going into runtime suspend by writing "on" to the 
/sys/.../power/control file.

> Drivers/buses/power_domains that handles runtime PM enabled devices,
> will, as you also have pointed out earlier, need to implement the
> .suspend callback (suspend_late or suspend_noirq might work as well)
> and the corresponding .resume callback. This just for the sake of
> making sure all their runtime enabled devices gets fully inactivated
> during system suspend.
> 
> I wonder if we somehow should move the responsibility of preventing
> "runtime suspend" as a part of the system suspend, down to the driver
> to decide instead of the PM core?

No, I don't think so.  It would mean that every driver's suspend
routine would have to do a pm_runtime_get of some sort.  It's better to
just leave the single runtime_get call in the PM core.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ulf Hansson Nov. 14, 2013, 4:40 p.m. UTC | #8
On 14 November 2013 16:59, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Thu, 14 Nov 2013, Ulf Hansson wrote:
>
>> > The reason for doing things the way they are is to avoid races between
>> > system PM callbacks and runtime PM callbacks.  We don't want to have a
>> > runtime_resume routine powering up a device at the same time as a
>> > suspend routine is powering it down.
>>
>> I believe I understand why the PM core prevent this. Some subsystems
>> and drivers depends on this behaviour, but I wonder if we should
>> consider this as common cases or more like special cases?
>>
>> For those drivers that potentially could rely on runtime PM (and it's
>> callbacks) to inactivate the device during system suspend, and thus
>> don't have issues with races you describe above, the current behaviour
>> in the PM core adds unnecessary complexity for these drivers to handle
>> system suspend.
>
> What unnecessary complexity?
>
> Bear in mind that drivers _cannot_ rely on runtime PM to inactivate a
> device during system suspend.  The user can always prevent a device
> from going into runtime suspend by writing "on" to the
> /sys/.../power/control file.

Good that you brought this up. From my point of view I think the sysfs
for runtime PM could be debated whether it should exist at all, at
least in it's current form.

Anyway, if userspace decides to prevent runtime_suspend, I guess it
will have take the consequences for it as well. :-)

>
>> Drivers/buses/power_domains that handles runtime PM enabled devices,
>> will, as you also have pointed out earlier, need to implement the
>> .suspend callback (suspend_late or suspend_noirq might work as well)
>> and the corresponding .resume callback. This just for the sake of
>> making sure all their runtime enabled devices gets fully inactivated
>> during system suspend.
>>
>> I wonder if we somehow should move the responsibility of preventing
>> "runtime suspend" as a part of the system suspend, down to the driver
>> to decide instead of the PM core?
>
> No, I don't think so.  It would mean that every driver's suspend
> routine would have to do a pm_runtime_get of some sort.  It's better to
> just leave the single runtime_get call in the PM core.

In the end I suppose what case that is most common should win. :-) At
the same time, we should go slowly forward and it is probably not
realistic to convert each and every driver in one go.

So as a way forward, I am thinking of a similar approach as you
suggested with the "generic" suspend_late. But instead add a new
runtime PM API, which intent is to let drivers to specify for PM core,
if it should care to prevent the runtime suspend from happen during
system system - or not. Could that work?

Kind regards
Ulf Hansson

>
> Alan Stern
>
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern Nov. 14, 2013, 5:57 p.m. UTC | #9
On Thu, 14 Nov 2013, Ulf Hansson wrote:

> > Bear in mind that drivers _cannot_ rely on runtime PM to inactivate a
> > device during system suspend.  The user can always prevent a device
> > from going into runtime suspend by writing "on" to the
> > /sys/.../power/control file.
> 
> Good that you brought this up. From my point of view I think the sysfs
> for runtime PM could be debated whether it should exist at all, at
> least in it's current form.

What other form would you suggest?

> Anyway, if userspace decides to prevent runtime_suspend, I guess it
> will have take the consequences for it as well. :-)

Right now, those consequences don't include also preventing the device
from going to low power during system system.  What would you do if you
had a buggy device, where you knew it couldn't handle runtime suspend?  
If you still wanted to put the whole system to sleep, you'd be stuck.

> So as a way forward, I am thinking of a similar approach as you
> suggested with the "generic" suspend_late. But instead add a new
> runtime PM API, which intent is to let drivers to specify for PM core,
> if it should care to prevent the runtime suspend from happen during
> system system - or not. Could that work?

I don't see what good it would do.  Instead of adding a new flag, why 
not just let drivers point their .suspend_late methods to the new 
generic function?  That would accomplish almost the same thing without 
changing the API.

(Actually, it would be superior.  The generic suspend_late approach 
will work even if the user writes "on" to /sys/.../power/control, 
whereas your approach won't work.)

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ulf Hansson Nov. 15, 2013, 8:44 a.m. UTC | #10
On 14 November 2013 18:57, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Thu, 14 Nov 2013, Ulf Hansson wrote:
>
>> > Bear in mind that drivers _cannot_ rely on runtime PM to inactivate a
>> > device during system suspend.  The user can always prevent a device
>> > from going into runtime suspend by writing "on" to the
>> > /sys/.../power/control file.
>>
>> Good that you brought this up. From my point of view I think the sysfs
>> for runtime PM could be debated whether it should exist at all, at
>> least in it's current form.
>
> What other form would you suggest?

I think it is kind of strange to give provision to userspace to
control a "request based" power management feature in the kernel. Me
personally can't think of any good use case, but I comes from the
embedded/ARM world so I might not have the full picture.

What would make sense to me, is to move the runtime PM sysfs to
debugfs, because it is a good feature to make use of during
development and debugging.

>
>> Anyway, if userspace decides to prevent runtime_suspend, I guess it
>> will have take the consequences for it as well. :-)
>
> Right now, those consequences don't include also preventing the device
> from going to low power during system system.  What would you do if you
> had a buggy device, where you knew it couldn't handle runtime suspend?
> If you still wanted to put the whole system to sleep, you'd be stuck.

Hmm, this seems like a use case for runtime PM sysfs then. :-)

It sounds like vague arguments. If the driver has bugs we need to fix
them, right? Should we really let userspace workaround bugs in the
kernel?

>
>> So as a way forward, I am thinking of a similar approach as you
>> suggested with the "generic" suspend_late. But instead add a new
>> runtime PM API, which intent is to let drivers to specify for PM core,
>> if it should care to prevent the runtime suspend from happen during
>> system system - or not. Could that work?
>
> I don't see what good it would do.  Instead of adding a new flag, why
> not just let drivers point their .suspend_late methods to the new
> generic function?  That would accomplish almost the same thing without
> changing the API.

Two minor functions will be added to the API, no big deal I think.
More importantly these will actually make life easier for some drivers
since they don't need to set up any additional PM callbacks.

Using the generic suspend_late approach would for sure work, but it
will spread to buses and power domains as well, which is not the case
when adding when adding a new API.

>
> (Actually, it would be superior.  The generic suspend_late approach
> will work even if the user writes "on" to /sys/.../power/control,
> whereas your approach won't work.)

We have different view here. :-)

Whether I like it or not, the sysfs interface exist and then I think
the kernel must act accordingly. We shouldn’t override it. I mean if
userspace has decided to keep a device active, there is probably a
reason.


Thanks a lot Alan for being patient with me and continuing the discussion!

Kind regards
Ulf Hansson

>
> Alan Stern
>
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjørn Mork Nov. 15, 2013, 9:29 a.m. UTC | #11
Ulf Hansson <ulf.hansson@linaro.org> writes:

> I think it is kind of strange to give provision to userspace to
> control a "request based" power management feature in the kernel. Me
> personally can't think of any good use case, but I comes from the
> embedded/ARM world so I might not have the full picture.

Because resuming a device inevitably adds latency, and that only
userspace can decide the latency vs power policy for any given device,
system and user?

There are also other reasons, mostly related to device bugs.  But the
above is the most important in my view.  Without that control knob I
don't think I would have enabled runtime PM in any network driver while
the netif is running.  The additional latency is noticable.

Anyway, this is a pointless discussion.  The power/control ABI is in use
so many places that it cannot be changed.  I'd suggest googling "break
userspace" before even suggesting anything like that again.


Bjørn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ulf Hansson Nov. 15, 2013, 10:34 a.m. UTC | #12
On 15 November 2013 10:29, Bjørn Mork <bjorn@mork.no> wrote:
> Ulf Hansson <ulf.hansson@linaro.org> writes:
>
>> I think it is kind of strange to give provision to userspace to
>> control a "request based" power management feature in the kernel. Me
>> personally can't think of any good use case, but I comes from the
>> embedded/ARM world so I might not have the full picture.
>
> Because resuming a device inevitably adds latency, and that only
> userspace can decide the latency vs power policy for any given device,
> system and user?

That is a valid point. Though we have pm_qos, should not that be the
input for these kind of requirement for a driver?

In other words, if the latency requirement can't be met for a device
while re-activating it, it should not enter runtime suspend state.

>
> There are also other reasons, mostly related to device bugs.  But the
> above is the most important in my view.  Without that control knob I
> don't think I would have enabled runtime PM in any network driver while
> the netif is running.  The additional latency is noticable.
>
> Anyway, this is a pointless discussion.  The power/control ABI is in use
> so many places that it cannot be changed.  I'd suggest googling "break
> userspace" before even suggesting anything like that again.
>

I agree, and has never been a suggestion from my side either, more
like a statement that I don't like the sysfs for runtime PM. :-)

Kind regards
Ulf Hansson

>
> Bjørn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ulf Hansson Nov. 18, 2013, 1:27 p.m. UTC | #13
> In any case, although Bjørn made a good point about latency, the main
> reason for having the sysfs interface was to cope with buggy devices.
> Linux started using runtime PM before Windows did.  At that time, many
> devices did not support runtime suspend very well -- they didn't have
> to since Windows didn't use it.  Plenty of devices simply couldn't
> handle it.  Too many for a blacklist.
>
> So we pushed the decision about whether to enable runtime PM out to
> userspace.  There was no other choice; too many people were having too
> many problems when runtime PM was automatically on.
>
>> What would make sense to me, is to move the runtime PM sysfs to
>> debugfs, because it is a good feature to make use of during
>> development and debugging.
>
> But since it is primarily needed for device control by the user, it
> belongs in sysfs.
>

Got it! Thanks for giving me the history to better understand why the
sysfs exists.

<snip>

>> >> So as a way forward, I am thinking of a similar approach as you
>> >> suggested with the "generic" suspend_late. But instead add a new
>> >> runtime PM API, which intent is to let drivers to specify for PM core,
>> >> if it should care to prevent the runtime suspend from happen during
>> >> system system - or not. Could that work?
>> >
>> > I don't see what good it would do.  Instead of adding a new flag, why
>> > not just let drivers point their .suspend_late methods to the new
>> > generic function?  That would accomplish almost the same thing without
>> > changing the API.
>>
>> Two minor functions will be added to the API, no big deal I think.
>> More importantly these will actually make life easier for some drivers
>> since they don't need to set up any additional PM callbacks.
>>
>> Using the generic suspend_late approach would for sure work, but it
>> will spread to buses and power domains as well, which is not the case
>> when adding when adding a new API.
>
> I don't understand what you mean by "spread to".  That makes it sound
> like an infectious disease rather than a subroutine.  :-)
>
> Also, I don't see how adding one line for an extra callback pointer is
> any harder than adding one line to call pm_runtime_no_prevent_suspend.
>

I favour the pm_runtime_no_prevent_suspend API approach, since it
would mean a change in behaviour of the PM core. I also think that in
some cases it could make runtime PM better understandable, for those
who are deploying it.

Another advantage of "pm_runtime_no_prevent_suspend" is that driver's
can still easily prevent runtime suspend, during system suspend, by an
extra call to "pm_runtime_get*". Typically this could be used when the
device is configured for wakeup. In this case the generic suspend_late
approach would not work, since the driver would need to implement it's
own .suspend_late callback to handle this.

Kind regards
Ulf Hansson

>> > (Actually, it would be superior.  The generic suspend_late approach
>> > will work even if the user writes "on" to /sys/.../power/control,
>> > whereas your approach won't work.)
>>
>> We have different view here. :-)
>>
>> Whether I like it or not, the sysfs interface exist and then I think
>> the kernel must act accordingly. We shouldn’t override it. I mean if
>> userspace has decided to keep a device active, there is probably a
>> reason.
>>
>>
>> Thanks a lot Alan for being patient with me and continuing the discussion!
>
> My pleasure.
>
> Alan Stern
>
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern Nov. 18, 2013, 3:17 p.m. UTC | #14
On Mon, 18 Nov 2013, Ulf Hansson wrote:

> I favour the pm_runtime_no_prevent_suspend API approach, since it
> would mean a change in behaviour of the PM core.

That's exactly why I don't favor it!  :-)

>  I also think that in
> some cases it could make runtime PM better understandable, for those
> who are deploying it.

I don't see how it would make runtime PM either more or less
understandable, since the proposed change affects system PM, not
runtime PM.  On the other hand, it would make the Runtime PM 
documentation more complicated.

(Also, I think the name "pm_runtime_no_prevent_suspend" is a very bad 
choice.  It should be something more like 
"allow_runtime_suspend_during_system_sleep".)

> Another advantage of "pm_runtime_no_prevent_suspend" is that driver's
> can still easily prevent runtime suspend, during system suspend, by an
> extra call to "pm_runtime_get*". Typically this could be used when the
> device is configured for wakeup.

How is this an advantage?  Drivers can accomplish the same thing right 
now by not doing anything at all.

>  In this case the generic suspend_late
> approach would not work, since the driver would need to implement it's
> own .suspend_late callback to handle this.

Actually, drivers would simply need to _avoid_ setting the 
.suspend_late callback.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ulf Hansson Nov. 19, 2013, 12:21 p.m. UTC | #15
On 18 November 2013 16:17, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Mon, 18 Nov 2013, Ulf Hansson wrote:
>
>> I favour the pm_runtime_no_prevent_suspend API approach, since it
>> would mean a change in behaviour of the PM core.
>
> That's exactly why I don't favor it!  :-)

I will try my best to convince you then. :-)

>
>>  I also think that in
>> some cases it could make runtime PM better understandable, for those
>> who are deploying it.
>
> I don't see how it would make runtime PM either more or less
> understandable, since the proposed change affects system PM, not
> runtime PM.  On the other hand, it would make the Runtime PM
> documentation more complicated.

At the moment, system PM is already affecting behaviour of runtime PM
since it is preventing runtime suspend during system suspend.

To me and and those driver authors, that don't see the direct need for
why the PM core behaves like this (because we are not working with PCI
drivers for example), I would guess this is probably the piece in the
documentation we have more difficult to understand and adapt to.

I can point at some examples, were I believe drivers that use runtime
PM, tries to adapt to the PM core behaviour, but I fear they for too
many times have got it wrong. Here are some various examples I have
found, which made me think like this.

1.
Not many are using the .suspend_late callback at all, which should be
the most proper solution to use.

2.
Drivers are using the synchronise pm_runtime_put_sync in their
.suspend callbacks.

3.
Drivers do "pm_runtime_get_sync" from their .suspend callbacks, but
don't put the device into inactive state from anywhere as a part of
the system suspend.

4.
Drivers only handles system suspend tasks from it's .suspend callbacks
and don't implement .suspend_late, thus the device will not be fully
inactivated.

5.
Drivers relies on pm_runtime_suspended or worse
pm_runtime_status_suspend, to check if they shall put their device
into inactive state from their .suspend callbacks or leave it as is.
While they put the device into inactive state, the runtime state is
not properly updated to reflect this. Most important, at this point
runtime PM has not yet been disabled from the PM core.

>
> (Also, I think the name "pm_runtime_no_prevent_suspend" is a very bad
> choice.  It should be something more like
> "allow_runtime_suspend_during_system_sleep".)

I am not so happy with the naming myself, we should work out a better one then.

>
>> Another advantage of "pm_runtime_no_prevent_suspend" is that driver's
>> can still easily prevent runtime suspend, during system suspend, by an
>> extra call to "pm_runtime_get*". Typically this could be used when the
>> device is configured for wakeup.
>
> How is this an advantage?  Drivers can accomplish the same thing right
> now by not doing anything at all.
>
>>  In this case the generic suspend_late
>> approach would not work, since the driver would need to implement it's
>> own .suspend_late callback to handle this.
>
> Actually, drivers would simply need to _avoid_ setting the
> .suspend_late callback.

I should have stated, that I means those scenarios were the wakeup
configuration is decided in runtime and depending on the current
situation. For example depending on what kind of peripheral that is
connected. In this scenario, I think the advantage still stands.

Kind regards
Ulf Hansson

>
> Alan Stern
>
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki Nov. 19, 2013, 1:21 p.m. UTC | #16
On Tuesday, November 19, 2013 01:21:57 PM Ulf Hansson wrote:
> On 18 November 2013 16:17, Alan Stern <stern@rowland.harvard.edu> wrote:
> > On Mon, 18 Nov 2013, Ulf Hansson wrote:
> >
> >> I favour the pm_runtime_no_prevent_suspend API approach, since it
> >> would mean a change in behaviour of the PM core.
> >
> > That's exactly why I don't favor it!  :-)
> 
> I will try my best to convince you then. :-)
> 
> >
> >>  I also think that in
> >> some cases it could make runtime PM better understandable, for those
> >> who are deploying it.
> >
> > I don't see how it would make runtime PM either more or less
> > understandable, since the proposed change affects system PM, not
> > runtime PM.  On the other hand, it would make the Runtime PM
> > documentation more complicated.
> 
> At the moment, system PM is already affecting behaviour of runtime PM
> since it is preventing runtime suspend during system suspend.
> 
> To me and and those driver authors, that don't see the direct need for
> why the PM core behaves like this (because we are not working with PCI
> drivers for example), I would guess this is probably the piece in the
> documentation we have more difficult to understand and adapt to.
> 
> I can point at some examples, were I believe drivers that use runtime
> PM, tries to adapt to the PM core behaviour, but I fear they for too
> many times have got it wrong. Here are some various examples I have
> found, which made me think like this.
> 
> 1.
> Not many are using the .suspend_late callback at all, which should be
> the most proper solution to use.

Yes.

> 2.
> Drivers are using the synchronise pm_runtime_put_sync in their
> .suspend callbacks.
> 
> 3.
> Drivers do "pm_runtime_get_sync" from their .suspend callbacks, but
> don't put the device into inactive state from anywhere as a part of
> the system suspend.
> 
> 4.
> Drivers only handles system suspend tasks from it's .suspend callbacks
> and don't implement .suspend_late, thus the device will not be fully
> inactivated.
> 
> 5.
> Drivers relies on pm_runtime_suspended or worse
> pm_runtime_status_suspend, to check if they shall put their device
> into inactive state from their .suspend callbacks or leave it as is.
> While they put the device into inactive state, the runtime state is
> not properly updated to reflect this. Most important, at this point
> runtime PM has not yet been disabled from the PM core.

All 2 and 5 are bugs in the drviers that should be fixed.  Whether or
not 4 is a bug depends on what the bus type does.

There's one more problem here, which is the expectation that drivers should
be responsible for the entire handling of suspend/resume of their devices.
This is misguided, because it forces the knowledge about particular platforms
into the drivers.  Ideally, the suspend and resume of a device should be
divided into a device-specific part handled by the driver and a bus-specific
or platform-specific part handled by the bus type or a PM domain, respectively.
The driver should only be concerned about the device-specific part.

That applies to both system suspend/resume and runtime PM, though.

Thanks!
Alan Stern Nov. 19, 2013, 3:35 p.m. UTC | #17
On Tue, 19 Nov 2013, Ulf Hansson wrote:

> At the moment, system PM is already affecting behaviour of runtime PM
> since it is preventing runtime suspend during system suspend.

Sure.  And that behavior is documented.  In any case, it's a bug for 
drivers to depend on runtime suspend for carrying out a system suspend.

> To me and and those driver authors, that don't see the direct need for
> why the PM core behaves like this (because we are not working with PCI
> drivers for example), I would guess this is probably the piece in the
> documentation we have more difficult to understand and adapt to.

Perhaps you would like to improve the documentation, to explain this
point more clearly?

> I can point at some examples, were I believe drivers that use runtime
> PM, tries to adapt to the PM core behaviour, but I fear they for too
> many times have got it wrong. Here are some various examples I have
> found, which made me think like this.
> 
> 1.
> Not many are using the .suspend_late callback at all, which should be
> the most proper solution to use.

That's understandable, since suspend_late is a relatively recent
addition to the runtime PM API.  Naturally, drivers that implemented
runtime PM before suspend_late was added don't use it.

> 2.
> Drivers are using the synchronise pm_runtime_put_sync in their
> .suspend callbacks.

Which drivers do this?  Although this behavior isn't necessarily a bug, 
it probably doesn't do what the driver's author expected.

> 3.
> Drivers do "pm_runtime_get_sync" from their .suspend callbacks, but
> don't put the device into inactive state from anywhere as a part of
> the system suspend.

Again, which drivers?  Perhaps they expect the platform code to put the 
device into a low-power state.

> 4.
> Drivers only handles system suspend tasks from it's .suspend callbacks
> and don't implement .suspend_late, thus the device will not be fully
> inactivated.

It's okay not to implement .suspend_late.  If .suspend takes care of
everything, .suspend_late won't have work to do.

> 5.
> Drivers relies on pm_runtime_suspended or worse
> pm_runtime_status_suspend, to check if they shall put their device
> into inactive state from their .suspend callbacks or leave it as is.

There's nothing wrong with that (provided there's no risk of a runtime
resume occurring concurrently).  It makes sense to have a check like
that, so that you don't try to put the device into a low-power state
when it already is in low power.

> While they put the device into inactive state, the runtime state is
> not properly updated to reflect this. Most important, at this point
> runtime PM has not yet been disabled from the PM core.

The runtime PM status is generally not reliable after a device's
.suspend has run.  The .resume routine in the bus, class, or driver is
responsible for making sure that the runtime PM status is adjusted to 
the proper value.  This is the intended behavior, and it is documented.

> I should have stated, that I means those scenarios were the wakeup
> configuration is decided in runtime and depending on the current
> situation. For example depending on what kind of peripheral that is
> connected. In this scenario, I think the advantage still stands.

In fact, this example shows that using runtime-PM calls to carry out
system suspend can be a bad idea.  Here's the reason: Runtime suspend
is expected _always_ to enable wakeup (if the hardware supports it).  
But during system suspend, wakeup is supposed to be enabled _only_ if
device_may_wakeup() returns true.  Therefore, using a runtime_suspend
callback to handle system suspend is likely to end up with an incorrect
wakeup setting.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kevin Hilman Nov. 19, 2013, 4:45 p.m. UTC | #18
Alan Stern <stern@rowland.harvard.edu> writes:

> On Tue, 19 Nov 2013, Ulf Hansson wrote:
>
>> At the moment, system PM is already affecting behaviour of runtime PM
>> since it is preventing runtime suspend during system suspend.
>
> Sure.  And that behavior is documented.  In any case, it's a bug for 
> drivers to depend on runtime suspend for carrying out a system suspend.

As Rafael mentioned, there is bus/pm_domain code that comes into play
here, so I'm not sure it's always a bug.

IMO, it's not a bug for the driver to depend on runtime PM if the
bus/pm_domain is handling the details.

On OMAP, we handle all the SoC on-chip devices with a pm_domain since
the low-level PM operations that need to happen are bus-level things not
device-level things.  Therefore, drivers for these devices can rely
entirely on runtime PM, even for system suspend.  The late/early
callbacks in the pm_domain can see if the device is runtime suspended
already or not and behave accordingly.

So, this all *can* work by handling it at the bus/pm_domain level, but
as Ulf has mentioned (and I agree) it seems like a clunky workaround
because the PM core is preventing it from happening as one might expect.

Kevin
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern Nov. 19, 2013, 6:03 p.m. UTC | #19
On Tue, 19 Nov 2013, Kevin Hilman wrote:

> Alan Stern <stern@rowland.harvard.edu> writes:
> 
> > On Tue, 19 Nov 2013, Ulf Hansson wrote:
> >
> >> At the moment, system PM is already affecting behaviour of runtime PM
> >> since it is preventing runtime suspend during system suspend.
> >
> > Sure.  And that behavior is documented.  In any case, it's a bug for 
> > drivers to depend on runtime suspend for carrying out a system suspend.
> 
> As Rafael mentioned, there is bus/pm_domain code that comes into play
> here, so I'm not sure it's always a bug.
> 
> IMO, it's not a bug for the driver to depend on runtime PM if the
> bus/pm_domain is handling the details.
> 
> On OMAP, we handle all the SoC on-chip devices with a pm_domain since
> the low-level PM operations that need to happen are bus-level things not
> device-level things.  Therefore, drivers for these devices can rely
> entirely on runtime PM, even for system suspend.  The late/early
> callbacks in the pm_domain can see if the device is runtime suspended
> already or not and behave accordingly.
> 
> So, this all *can* work by handling it at the bus/pm_domain level, but
> as Ulf has mentioned (and I agree) it seems like a clunky workaround
> because the PM core is preventing it from happening as one might expect.

The problem is that userspace can disable runtime PM at any time by 
writing "on" to /sys/.../power/control.  Once that's done, you can't
depend on runtime PM to put the device into a low-power state during 
system suspend.

Now, if the bus-level code always takes care of putting the device into
low power during system suspend, then the driver doesn't have to worry
about it at all.  That's perfectly fine -- but I would hardly describe
the situation by saying the driver relies on runtime PM for system
suspend.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kevin Hilman Nov. 19, 2013, 6:43 p.m. UTC | #20
Alan Stern <stern@rowland.harvard.edu> writes:

> On Tue, 19 Nov 2013, Kevin Hilman wrote:
>
>> Alan Stern <stern@rowland.harvard.edu> writes:
>> 
>> > On Tue, 19 Nov 2013, Ulf Hansson wrote:
>> >
>> >> At the moment, system PM is already affecting behaviour of runtime PM
>> >> since it is preventing runtime suspend during system suspend.
>> >
>> > Sure.  And that behavior is documented.  In any case, it's a bug for 
>> > drivers to depend on runtime suspend for carrying out a system suspend.
>> 
>> As Rafael mentioned, there is bus/pm_domain code that comes into play
>> here, so I'm not sure it's always a bug.
>> 
>> IMO, it's not a bug for the driver to depend on runtime PM if the
>> bus/pm_domain is handling the details.
>> 
>> On OMAP, we handle all the SoC on-chip devices with a pm_domain since
>> the low-level PM operations that need to happen are bus-level things not
>> device-level things.  Therefore, drivers for these devices can rely
>> entirely on runtime PM, even for system suspend.  The late/early
>> callbacks in the pm_domain can see if the device is runtime suspended
>> already or not and behave accordingly.
>> 
>> So, this all *can* work by handling it at the bus/pm_domain level, but
>> as Ulf has mentioned (and I agree) it seems like a clunky workaround
>> because the PM core is preventing it from happening as one might expect.
>
> The problem is that userspace can disable runtime PM at any time by 
> writing "on" to /sys/.../power/control.  Once that's done, you can't
> depend on runtime PM to put the device into a low-power state during 
> system suspend.

We handle that too.

The way we handle it, that the pm_domain code "knows" that the device
*should* be runtime suspended by the time the pm_domain suspend_noirq
callback is called.  So it checks pm_runtime_status_suspended() and if
the device is not runtime suspended, the pm_domain will "forcibly"
runtime suspend the device (which just means, calling the driver's
->runtime_suspend(), doing the bus-level magic, and setting the runtime
PM state accordingly.)

This is arguably the same approach as the using the late callbacks that
you've already described, but it's just done in a hidden way at the bus
level.  Also, this solution predates the addtion of the late callbacks
and I believe may have even been initially proposed by you (or maybe
Rafael, I forgot.)

Kevin
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ulf Hansson Nov. 20, 2013, 10:52 a.m. UTC | #21
On 19 November 2013 16:35, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Tue, 19 Nov 2013, Ulf Hansson wrote:
>
>> At the moment, system PM is already affecting behaviour of runtime PM
>> since it is preventing runtime suspend during system suspend.
>
> Sure.  And that behavior is documented.  In any case, it's a bug for
> drivers to depend on runtime suspend for carrying out a system suspend.

Why do you say that?

A significant amount of drivers should be able to cope with only the
runtime PM callbacks, if only the PM core have respected the drivers
in the way my RFC proposes.

In the end, I want to achieve simpler use of runtime PM for drivers.
As you will see below, a vast amount of drivers has got it wrong and
to me that indicates we have not fully succeeded in keeping it as
simple as I think we could have done.

>
>> To me and and those driver authors, that don't see the direct need for
>> why the PM core behaves like this (because we are not working with PCI
>> drivers for example), I would guess this is probably the piece in the
>> documentation we have more difficult to understand and adapt to.
>
> Perhaps you would like to improve the documentation, to explain this
> point more clearly?

Sure, but even if documentation would be improved (actually I think it
is quite good already), I think the behaviour of the PM core is what
causes the confusion.

It is obvious to you guys, working with this for a long time and with
these subsystem that clearly benefit from having the PM core to
prevent runtime suspend during system suspend. Not so sure this is the
case for others.

>
>> I can point at some examples, were I believe drivers that use runtime
>> PM, tries to adapt to the PM core behaviour, but I fear they for too
>> many times have got it wrong. Here are some various examples I have
>> found, which made me think like this.
>>
>> 1.
>> Not many are using the .suspend_late callback at all, which should be
>> the most proper solution to use.
>
> That's understandable, since suspend_late is a relatively recent
> addition to the runtime PM API.  Naturally, drivers that implemented
> runtime PM before suspend_late was added don't use it.

Well, it was included in 3.4, soon 1.5 years ago, but still only a
handful occurrences. That ought to tell us something. :-)

>
>> 2.
>> Drivers are using the synchronise pm_runtime_put_sync in their
>> .suspend callbacks.
>
> Which drivers do this?  Although this behavior isn't necessarily a bug,
> it probably doesn't do what the driver's author expected.

I think it indicates two things. One, the documentation has likely not
been thoroughly read by authors. Second, authors are not expecting the
current behaviour of PM core.

This is just a subset of drivers I found:
drivers/pwm/pwm-tiecap.c
drivers/pwm/pwm-tiehrpwm.c
arch/arm/mach-omap2/gpmc.c
drivers/char/hw_random/omap-rng.c
drivers/crypto/omap-aes.c
drivers/crypto/omap-sham.c
drivers/input/keyboard/sh_keysc.c
drivers/media/platform/davinci/vpif.c
drivers/media/platform/davinci/vpss.c
drivers/mtd/devices/elm.c

>
>> 3.
>> Drivers do "pm_runtime_get_sync" from their .suspend callbacks, but
>> don't put the device into inactive state from anywhere as a part of
>> the system suspend.
>
> Again, which drivers?  Perhaps they expect the platform code to put the
> device into a low-power state.

Sure, that requires some more deeper analyse, but anyway I found these:
drivers/mmc/host/mmci.c
drivers/mmc/host/sdhci-pxav3.c
drivers/power/bq24190_charger.c

>
>> 4.
>> Drivers only handles system suspend tasks from it's .suspend callbacks
>> and don't implement .suspend_late, thus the device will not be fully
>> inactivated.
>
> It's okay not to implement .suspend_late.  If .suspend takes care of
> everything, .suspend_late won't have work to do.

I have tried to iron what you say above, not sure I got it completely
right though:

drivers/mfd/arizona-core.c
drivers/mmc/host/sdhci-s3c.c
drivers/spi/spi-tegra114.c
drivers/spi/spi-tegra20-sflash.c
drivers/tty/serial/8250/8250_dw.c

>
>> 5.
>> Drivers relies on pm_runtime_suspended or worse
>> pm_runtime_status_suspend, to check if they shall put their device
>> into inactive state from their .suspend callbacks or leave it as is.
>
> There's nothing wrong with that (provided there's no risk of a runtime
> resume occurring concurrently).  It makes sense to have a check like
> that, so that you don't try to put the device into a low-power state
> when it already is in low power.

I am not saying these are bugs, but it kind of indicates that the
author is not fully aware of how to use runtime PM in system suspend.
The proper thing to do, should have been to use pm_runtime_disable,
like the PM core in suspend late, before inactivating the device.

drivers/ata/libata-core.c
drivers/gpu/drm/exynos/exynos_drm_ipp.c
drivers/gpu/drm/exynos/exynos_drm_fimc.c
drivers/gpu/drm/exynos/exynos_drm_gsc.c
drivers/gpu/drm/exynos/exynos_drm_rotator.c
drivers/gpu/drm/exynos/exynos_hdmi.c:
drivers/dma/sirf-dma.c

>
>> While they put the device into inactive state, the runtime state is
>> not properly updated to reflect this. Most important, at this point
>> runtime PM has not yet been disabled from the PM core.
>
> The runtime PM status is generally not reliable after a device's
> .suspend has run.  The .resume routine in the bus, class, or driver is
> responsible for making sure that the runtime PM status is adjusted to
> the proper value.  This is the intended behavior, and it is documented.

Still runtime PM should be disabled (pm_runtime_disable) to handle
this properly, which is not the case in my examples above.

>
>> I should have stated, that I means those scenarios were the wakeup
>> configuration is decided in runtime and depending on the current
>> situation. For example depending on what kind of peripheral that is
>> connected. In this scenario, I think the advantage still stands.
>
> In fact, this example shows that using runtime-PM calls to carry out
> system suspend can be a bad idea.  Here's the reason: Runtime suspend
> is expected _always_ to enable wakeup (if the hardware supports it).

Not sure what you mean here, why should runtime suspend always enable
wakeup? Is that not depending on the current situation?

> But during system suspend, wakeup is supposed to be enabled _only_ if
> device_may_wakeup() returns true.  Therefore, using a runtime_suspend
> callback to handle system suspend is likely to end up with an incorrect
> wakeup setting.

I don't see how these are in conflict? The idea was that drivers that
requires it's device to be runtime resumed to be able to have wakeup
cap set, should just prevent runtime suspend (pm_runtime_get). I might
have explained this in some badly written English, sorry if I confused
you. :-)

Kind regards
Ulf Hansson

>
> Alan Stern
>
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Pavel Machek Nov. 20, 2013, 11:23 a.m. UTC | #22
On Wed 2013-11-20 11:52:05, Ulf Hansson wrote:
> On 19 November 2013 16:35, Alan Stern <stern@rowland.harvard.edu> wrote:
> > On Tue, 19 Nov 2013, Ulf Hansson wrote:
> >
> >> At the moment, system PM is already affecting behaviour of runtime PM
> >> since it is preventing runtime suspend during system suspend.
> >
> > Sure.  And that behavior is documented.  In any case, it's a bug for
> > drivers to depend on runtime suspend for carrying out a system suspend.
> 
> Why do you say that?

Because that's the way it is?

> A significant amount of drivers should be able to cope with only the
> runtime PM callbacks, if only the PM core have respected the drivers
> in the way my RFC proposes.

So what? It is not as additional callback is huge burden -- code is
the same -- and we do want option of disabling runtime PM.

Don't make system suspend dependend on runtime PM.

								Pavel
Rafael J. Wysocki Nov. 20, 2013, 11:53 a.m. UTC | #23
On Wednesday, November 20, 2013 12:23:03 PM Pavel Machek wrote:
> On Wed 2013-11-20 11:52:05, Ulf Hansson wrote:
> > On 19 November 2013 16:35, Alan Stern <stern@rowland.harvard.edu> wrote:
> > > On Tue, 19 Nov 2013, Ulf Hansson wrote:
> > >
> > >> At the moment, system PM is already affecting behaviour of runtime PM
> > >> since it is preventing runtime suspend during system suspend.
> > >
> > > Sure.  And that behavior is documented.  In any case, it's a bug for
> > > drivers to depend on runtime suspend for carrying out a system suspend.
> > 
> > Why do you say that?
> 
> Because that's the way it is?
> 
> > A significant amount of drivers should be able to cope with only the
> > runtime PM callbacks, if only the PM core have respected the drivers
> > in the way my RFC proposes.
> 
> So what? It is not as additional callback is huge burden -- code is
> the same -- and we do want option of disabling runtime PM.
> 
> Don't make system suspend dependend on runtime PM.

Amen.

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ulf Hansson Nov. 20, 2013, 12:43 p.m. UTC | #24
On 20 November 2013 12:23, Pavel Machek <pavel@ucw.cz> wrote:
> On Wed 2013-11-20 11:52:05, Ulf Hansson wrote:
>> On 19 November 2013 16:35, Alan Stern <stern@rowland.harvard.edu> wrote:
>> > On Tue, 19 Nov 2013, Ulf Hansson wrote:
>> >
>> >> At the moment, system PM is already affecting behaviour of runtime PM
>> >> since it is preventing runtime suspend during system suspend.
>> >
>> > Sure.  And that behavior is documented.  In any case, it's a bug for
>> > drivers to depend on runtime suspend for carrying out a system suspend.
>>
>> Why do you say that?
>
> Because that's the way it is?
>
>> A significant amount of drivers should be able to cope with only the
>> runtime PM callbacks, if only the PM core have respected the drivers
>> in the way my RFC proposes.
>
> So what? It is not as additional callback is huge burden -- code is
> the same -- and we do want option of disabling runtime PM.
>
> Don't make system suspend dependend on runtime PM.

I get your point and for sure I respect it, even if don't agree. :-)

Do note that, the intent with my RFC is also to simplify for drivers
in general using runtime PM. No matter how you put it, the
consequences of preventing runtime suspend during system suspend are
causing confusions.

Kind regards
Ulf Hansson

>
>                                                                 Pavel
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern Nov. 20, 2013, 4:55 p.m. UTC | #25
On Wed, 20 Nov 2013, Ulf Hansson wrote:

> On 19 November 2013 16:35, Alan Stern <stern@rowland.harvard.edu> wrote:
> > On Tue, 19 Nov 2013, Ulf Hansson wrote:
> >
> >> At the moment, system PM is already affecting behaviour of runtime PM
> >> since it is preventing runtime suspend during system suspend.
> >
> > Sure.  And that behavior is documented.  In any case, it's a bug for
> > drivers to depend on runtime suspend for carrying out a system suspend.
> 
> Why do you say that?

Because, as I have repeated several times during this discussion,
the user can turn off runtime PM at any time by writing "on" to
/sys/.../power/control.  In addition, as Pavel pointed out, runtime PM
might not even be configured.

> A significant amount of drivers should be able to cope with only the
> runtime PM callbacks, if only the PM core have respected the drivers
> in the way my RFC proposes.

Or in the way I proposed.

> In the end, I want to achieve simpler use of runtime PM for drivers.
> As you will see below, a vast amount of drivers has got it wrong and
> to me that indicates we have not fully succeeded in keeping it as
> simple as I think we could have done.

You could simplify runtime PM by not configuring system sleep.  Then
none of these issues would matter.  :-)

But if you want to allow a system to have both runtime PM and system 
PM, some complexity is unavoidable.

> >> To me and and those driver authors, that don't see the direct need for
> >> why the PM core behaves like this (because we are not working with PCI
> >> drivers for example), I would guess this is probably the piece in the
> >> documentation we have more difficult to understand and adapt to.
> >
> > Perhaps you would like to improve the documentation, to explain this
> > point more clearly?
> 
> Sure, but even if documentation would be improved (actually I think it
> is quite good already), I think the behaviour of the PM core is what
> causes the confusion.
> 
> It is obvious to you guys, working with this for a long time and with
> these subsystem that clearly benefit from having the PM core to
> prevent runtime suspend during system suspend. Not so sure this is the
> case for others.

Then this is a perfect point for you to explain more clearly in the 
documentation.  If people are confused by the PM core's behavior, the 
right thing to do is explain the behavior -- not change it.

(In fact, the interaction between runtime suspend and system suspend is 
sufficiently complicated that no matter what the PM core does, it will 
end up confusing some people.)

> >> 1.
> >> Not many are using the .suspend_late callback at all, which should be
> >> the most proper solution to use.
> >
> > That's understandable, since suspend_late is a relatively recent
> > addition to the runtime PM API.  Naturally, drivers that implemented
> > runtime PM before suspend_late was added don't use it.
> 
> Well, it was included in 3.4, soon 1.5 years ago, but still only a
> handful occurrences. That ought to tell us something. :-)

I don't think so.  Go back and find some other features that were added 
to the kernel 1.5 years ago, and see how many places _they_ are used.

> >> 2.
> >> Drivers are using the synchronise pm_runtime_put_sync in their
> >> .suspend callbacks.
> >
> > Which drivers do this?  Although this behavior isn't necessarily a bug,
> > it probably doesn't do what the driver's author expected.
> 
> I think it indicates two things. One, the documentation has likely not
> been thoroughly read by authors. Second, authors are not expecting the
> current behaviour of PM core.
> 
> This is just a subset of drivers I found:
> drivers/pwm/pwm-tiecap.c
> drivers/pwm/pwm-tiehrpwm.c
> arch/arm/mach-omap2/gpmc.c
> drivers/char/hw_random/omap-rng.c
> drivers/crypto/omap-aes.c
> drivers/crypto/omap-sham.c
> drivers/input/keyboard/sh_keysc.c
> drivers/media/platform/davinci/vpif.c
> drivers/media/platform/davinci/vpss.c
> drivers/mtd/devices/elm.c

Okay, I'll take a look at these drivers.

> >> 3.
> >> Drivers do "pm_runtime_get_sync" from their .suspend callbacks, but
> >> don't put the device into inactive state from anywhere as a part of
> >> the system suspend.
> >
> > Again, which drivers?  Perhaps they expect the platform code to put the
> > device into a low-power state.
> 
> Sure, that requires some more deeper analyse, but anyway I found these:
> drivers/mmc/host/mmci.c
> drivers/mmc/host/sdhci-pxav3.c
> drivers/power/bq24190_charger.c

And these.

> >> 4.
> >> Drivers only handles system suspend tasks from it's .suspend callbacks
> >> and don't implement .suspend_late, thus the device will not be fully
> >> inactivated.
> >
> > It's okay not to implement .suspend_late.  If .suspend takes care of
> > everything, .suspend_late won't have work to do.
> 
> I have tried to iron what you say above, not sure I got it completely
> right though:

Sorry, I omitted a word.  I meant that if .suspend takes care of
everything (prevents new I/O requests from being serviced, finishes all
ongoing I/O requests, and puts the device into a low-power state), then
.suspend_late won't need to do any work at all.

> drivers/mfd/arizona-core.c
> drivers/mmc/host/sdhci-s3c.c
> drivers/spi/spi-tegra114.c
> drivers/spi/spi-tegra20-sflash.c
> drivers/tty/serial/8250/8250_dw.c

You must be joking.  There are hundreds of drivers that don't implement
.suspend_late.  Why list just these ones?  (In fact, almost no drivers
_do_ implement it.)

> >> 5.
> >> Drivers relies on pm_runtime_suspended or worse
> >> pm_runtime_status_suspend, to check if they shall put their device
> >> into inactive state from their .suspend callbacks or leave it as is.
> >
> > There's nothing wrong with that (provided there's no risk of a runtime
> > resume occurring concurrently).  It makes sense to have a check like
> > that, so that you don't try to put the device into a low-power state
> > when it already is in low power.
> 
> I am not saying these are bugs, but it kind of indicates that the
> author is not fully aware of how to use runtime PM in system suspend.
> The proper thing to do, should have been to use pm_runtime_disable,
> like the PM core in suspend late, before inactivating the device.
> 
> drivers/ata/libata-core.c
> drivers/gpu/drm/exynos/exynos_drm_ipp.c
> drivers/gpu/drm/exynos/exynos_drm_fimc.c
> drivers/gpu/drm/exynos/exynos_drm_gsc.c
> drivers/gpu/drm/exynos/exynos_drm_rotator.c
> drivers/gpu/drm/exynos/exynos_hdmi.c:
> drivers/dma/sirf-dma.c

Drivers don't necessarily have to call pm_runtime_disable.  They can
have other ways of insuring that no runtime resumes occur at the wrong
time.  For example, it is guaranteed that all user processes will be 
frozen during a system suspend, so no runtime-resume requests can come 
from userspace.

> >> While they put the device into inactive state, the runtime state is
> >> not properly updated to reflect this. Most important, at this point
> >> runtime PM has not yet been disabled from the PM core.
> >
> > The runtime PM status is generally not reliable after a device's
> > .suspend has run.  The .resume routine in the bus, class, or driver is
> > responsible for making sure that the runtime PM status is adjusted to
> > the proper value.  This is the intended behavior, and it is documented.
> 
> Still runtime PM should be disabled (pm_runtime_disable) to handle
> this properly, which is not the case in my examples above.

Not necessarily; see above.

Also, the documentation does explain that drivers should call 
pm_runtime_disable from their .resume callback, as part of adjusting 
the runtime PM status to match the device's actual state.

> >> I should have stated, that I means those scenarios were the wakeup
> >> configuration is decided in runtime and depending on the current
> >> situation. For example depending on what kind of peripheral that is
> >> connected. In this scenario, I think the advantage still stands.
> >
> > In fact, this example shows that using runtime-PM calls to carry out
> > system suspend can be a bad idea.  Here's the reason: Runtime suspend
> > is expected _always_ to enable wakeup (if the hardware supports it).
> 
> Not sure what you mean here, why should runtime suspend always enable
> wakeup? Is that not depending on the current situation?

No, it's not dependent on the situation.  Runtime suspend is always 
supposed to enable wakeup, if the hardware supports it.  This is 
documented as well.

> > But during system suspend, wakeup is supposed to be enabled _only_ if
> > device_may_wakeup() returns true.  Therefore, using a runtime_suspend
> > callback to handle system suspend is likely to end up with an incorrect
> > wakeup setting.
> 
> I don't see how these are in conflict? The idea was that drivers that
> requires it's device to be runtime resumed to be able to have wakeup
> cap set, should just prevent runtime suspend (pm_runtime_get). I might
> have explained this in some badly written English, sorry if I confused
> you. :-)

Here's an example.  Suppose a device is in runtime suspend, with wakeup 
enabled.  Now the whole computer goes into system suspend, and the 
device is _not_ supposed to be enabled for wakeup during system 
suspend.  The driver _should_ power-up the device, disable wakeup, and 
then power the device down again.  (Or something like that.)

But if the driver uses its runtime_suspend method for system suspend, 
then it will merely see that the device is already in low power, so it 
will return without doing anything.  As a result, the device will have 
wakeup enabled even though it shouldn't.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern Nov. 20, 2013, 4:58 p.m. UTC | #26
On Wed, 20 Nov 2013, Ulf Hansson wrote:

> Do note that, the intent with my RFC is also to simplify for drivers
> in general using runtime PM. No matter how you put it, the
> consequences of preventing runtime suspend during system suspend are
> causing confusions.

Then fix the confusion.  Explain peoples' mistakes to them.

Plenty of other parts of the kernel are difficult or confusing to use.  
Sometimes there simply is no way to avoid it.  We deal with the 
situation by teaching people how to use these facilities properly.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Pavel Machek Nov. 20, 2013, 7:55 p.m. UTC | #27
On Wed 2013-11-20 11:58:31, Alan Stern wrote:
> On Wed, 20 Nov 2013, Ulf Hansson wrote:
> 
> > Do note that, the intent with my RFC is also to simplify for drivers
> > in general using runtime PM. No matter how you put it, the
> > consequences of preventing runtime suspend during system suspend are
> > causing confusions.
> 
> Then fix the confusion.  Explain peoples' mistakes to them.

Exactly. Way to solve confusion is not "make core behave as part of
drivers apparently think it should".

And yes, you have a list of broken drivers... thanks for that.
									Pavel
Ulf Hansson Nov. 21, 2013, 9 a.m. UTC | #28
On 19 November 2013 19:03, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Tue, 19 Nov 2013, Kevin Hilman wrote:
>
>> Alan Stern <stern@rowland.harvard.edu> writes:
>>
>> > On Tue, 19 Nov 2013, Ulf Hansson wrote:
>> >
>> >> At the moment, system PM is already affecting behaviour of runtime PM
>> >> since it is preventing runtime suspend during system suspend.
>> >
>> > Sure.  And that behavior is documented.  In any case, it's a bug for
>> > drivers to depend on runtime suspend for carrying out a system suspend.
>>
>> As Rafael mentioned, there is bus/pm_domain code that comes into play
>> here, so I'm not sure it's always a bug.
>>
>> IMO, it's not a bug for the driver to depend on runtime PM if the
>> bus/pm_domain is handling the details.
>>
>> On OMAP, we handle all the SoC on-chip devices with a pm_domain since
>> the low-level PM operations that need to happen are bus-level things not
>> device-level things.  Therefore, drivers for these devices can rely
>> entirely on runtime PM, even for system suspend.  The late/early
>> callbacks in the pm_domain can see if the device is runtime suspended
>> already or not and behave accordingly.
>>
>> So, this all *can* work by handling it at the bus/pm_domain level, but
>> as Ulf has mentioned (and I agree) it seems like a clunky workaround
>> because the PM core is preventing it from happening as one might expect.
>
> The problem is that userspace can disable runtime PM at any time by
> writing "on" to /sys/.../power/control.  Once that's done, you can't
> depend on runtime PM to put the device into a low-power state during
> system suspend.
>
> Now, if the bus-level code always takes care of putting the device into
> low power during system suspend, then the driver doesn't have to worry
> about it at all.  That's perfectly fine -- but I would hardly describe
> the situation by saying the driver relies on runtime PM for system
> suspend.

But they are, since that is exactly what these driver depends on
during system suspend. And since the power domain invokes the runtime
callbacks, normally CONFIG_PM_RUNTIME is needed for the callbacks to
be set up from in the bus/driver.

I suppose that the OMAP SoC maintainers were accepting that this
dependency were allowed to exist, since that was the only solution to
their problem at that point.

Today, several years later, we are still indirectly pushing those SoC
maintainers that faces similar issues, to accept this dependency
between system suspend and runtime suspend. I see a conflict here. :-)

Kind regards
Ulf Hansson

>
> Alan Stern
>
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ulf Hansson Nov. 21, 2013, 9:11 a.m. UTC | #29
On 14 November 2013 16:55, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Thu, 14 Nov 2013, Ulf Hansson wrote:
>
>> > If you want a driver to do the same thing for runtime PM and system
>> > suspend/resume, point its .suspend_late() and .runtime_suspend() to the
>> > same routine and analogously for .runtime_resume() and .resume_early()
>> > (that may not work for PCI, though, because of the _noirq stuff, but that's
>> > the PCI bus type issue).
>>
>> When your .suspend_late callback gets called, your device may be
>> either active or inactive from runtime PM point of view. Just pointing
>> the .suspend_late callback to the .runtime_suspend callback, simply
>> wont work.
>>
>> That would mean we might end up in trying to inactivate an already
>> inactivated device, which I doubt is a good thing.:-) Some additional
>> code will be needed to track the "runtime state" and to act
>> accordingly. This is what drivers already tries to handle as of today,
>> though so far mostly from their .suspend callbacks I believe.
>
> You could write a "generic" suspend_late routine that would invoke the
> driver's runtime_suspend routine if the device's runtime PM status was
> RPM_ACTIVE.
>

I started drafting a new RFC according to your suggestion. Soon I
realized that using such a callback for .suspend_late will make
drivers/buses system suspend to depend on runtime suspend, (which my
RFC also has problem with). So I guess this is not a recommended set
up from the PM maintainers. Wandering in circles... :-)

Kind regards
Ulf Hansson
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern Nov. 21, 2013, 3:54 p.m. UTC | #30
On Thu, 21 Nov 2013, Ulf Hansson wrote:

> >> As Rafael mentioned, there is bus/pm_domain code that comes into play
> >> here, so I'm not sure it's always a bug.
> >>
> >> IMO, it's not a bug for the driver to depend on runtime PM if the
> >> bus/pm_domain is handling the details.
> >>
> >> On OMAP, we handle all the SoC on-chip devices with a pm_domain since
> >> the low-level PM operations that need to happen are bus-level things not
> >> device-level things.  Therefore, drivers for these devices can rely
> >> entirely on runtime PM, even for system suspend.  The late/early
> >> callbacks in the pm_domain can see if the device is runtime suspended
> >> already or not and behave accordingly.
> >>
> >> So, this all *can* work by handling it at the bus/pm_domain level, but
> >> as Ulf has mentioned (and I agree) it seems like a clunky workaround
> >> because the PM core is preventing it from happening as one might expect.
> >
> > The problem is that userspace can disable runtime PM at any time by
> > writing "on" to /sys/.../power/control.  Once that's done, you can't
> > depend on runtime PM to put the device into a low-power state during
> > system suspend.
> >
> > Now, if the bus-level code always takes care of putting the device into
> > low power during system suspend, then the driver doesn't have to worry
> > about it at all.  That's perfectly fine -- but I would hardly describe
> > the situation by saying the driver relies on runtime PM for system
> > suspend.
> 
> But they are, since that is exactly what these driver depends on
> during system suspend.

Not at all.  These drivers rely on the bus-level code's suspend 
routine to handle system suspend.  They don't rely on runtime PM.

>  And since the power domain invokes the runtime
> callbacks, normally CONFIG_PM_RUNTIME is needed for the callbacks to
> be set up from in the bus/driver.

No, it isn't.  You can always write callbacks that get compiled even
when CONFIG_PM_RUNTIME isn't enabled.  It's very easy -- all you have
to do is change

	#ifdef CONFIG_PM_RUNTIME

to

	#ifdef CONFIG_PM

> I suppose that the OMAP SoC maintainers were accepting that this
> dependency were allowed to exist, since that was the only solution to
> their problem at that point.
> 
> Today, several years later, we are still indirectly pushing those SoC
> maintainers that faces similar issues, to accept this dependency
> between system suspend and runtime suspend. I see a conflict here. :-)

It's only a misunderstanding.  There is no dependency.


> I started drafting a new RFC according to your suggestion. Soon I
> realized that using such a callback for .suspend_late will make
> drivers/buses system suspend to depend on runtime suspend, (which my
> RFC also has problem with). So I guess this is not a recommended set
> up from the PM maintainers. Wandering in circles... :-)

Same misunderstanding as before.  The drivers merely need to be written 
so that the callbacks are compiled whenever CONFIG_PM is enabled rather 
than CONFIG_PM_RUNTIME.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ulf Hansson Nov. 22, 2013, 8:19 a.m. UTC | #31
On 21 November 2013 16:54, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Thu, 21 Nov 2013, Ulf Hansson wrote:
>
>> >> As Rafael mentioned, there is bus/pm_domain code that comes into play
>> >> here, so I'm not sure it's always a bug.
>> >>
>> >> IMO, it's not a bug for the driver to depend on runtime PM if the
>> >> bus/pm_domain is handling the details.
>> >>
>> >> On OMAP, we handle all the SoC on-chip devices with a pm_domain since
>> >> the low-level PM operations that need to happen are bus-level things not
>> >> device-level things.  Therefore, drivers for these devices can rely
>> >> entirely on runtime PM, even for system suspend.  The late/early
>> >> callbacks in the pm_domain can see if the device is runtime suspended
>> >> already or not and behave accordingly.
>> >>
>> >> So, this all *can* work by handling it at the bus/pm_domain level, but
>> >> as Ulf has mentioned (and I agree) it seems like a clunky workaround
>> >> because the PM core is preventing it from happening as one might expect.
>> >
>> > The problem is that userspace can disable runtime PM at any time by
>> > writing "on" to /sys/.../power/control.  Once that's done, you can't
>> > depend on runtime PM to put the device into a low-power state during
>> > system suspend.
>> >
>> > Now, if the bus-level code always takes care of putting the device into
>> > low power during system suspend, then the driver doesn't have to worry
>> > about it at all.  That's perfectly fine -- but I would hardly describe
>> > the situation by saying the driver relies on runtime PM for system
>> > suspend.
>>
>> But they are, since that is exactly what these driver depends on
>> during system suspend.
>
> Not at all.  These drivers rely on the bus-level code's suspend
> routine to handle system suspend.  They don't rely on runtime PM.
>
>>  And since the power domain invokes the runtime
>> callbacks, normally CONFIG_PM_RUNTIME is needed for the callbacks to
>> be set up from in the bus/driver.
>
> No, it isn't.  You can always write callbacks that get compiled even
> when CONFIG_PM_RUNTIME isn't enabled.  It's very easy -- all you have
> to do is change
>
>         #ifdef CONFIG_PM_RUNTIME
>
> to
>
>         #ifdef CONFIG_PM

You nailed it here! Since you consider this to be allowed from
dependency point of view, all my problems described would be
"solvable". :-)

Regarding the SET_RUNTIME_PM_OPS macro, which today relies on
CONFIG_PM_RUNTIME should then be moved to CONFIG_PM instead. As
prerequisites, drivers using this macro, obviously need to implement
the runtime PM callbacks within CONFIG_PM as well. If that makes sense
to you as well, I will start working on converting them that needs.

>
>> I suppose that the OMAP SoC maintainers were accepting that this
>> dependency were allowed to exist, since that was the only solution to
>> their problem at that point.
>>
>> Today, several years later, we are still indirectly pushing those SoC
>> maintainers that faces similar issues, to accept this dependency
>> between system suspend and runtime suspend. I see a conflict here. :-)
>
> It's only a misunderstanding.  There is no dependency.
>
>
>> I started drafting a new RFC according to your suggestion. Soon I
>> realized that using such a callback for .suspend_late will make
>> drivers/buses system suspend to depend on runtime suspend, (which my
>> RFC also has problem with). So I guess this is not a recommended set
>> up from the PM maintainers. Wandering in circles... :-)
>
> Same misunderstanding as before.  The drivers merely need to be written
> so that the callbacks are compiled whenever CONFIG_PM is enabled rather
> than CONFIG_PM_RUNTIME.

I will submit a patch for a "generic suspend late" asap.

I noticed the pm_generic_suspend_late already existed in similar form
as the other generic functions, so I suggest we leave that as is. We
name this new one as, "pm_generic_suspend_late_runtime", to indicate
some relationship with runtime PM.

If you have other suggestions, please advise me.

Kind regards
Ulf Hansson

>
> Alan Stern
>
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki Nov. 22, 2013, 12:06 p.m. UTC | #32
On Friday, November 22, 2013 09:19:32 AM Ulf Hansson wrote:
> On 21 November 2013 16:54, Alan Stern <stern@rowland.harvard.edu> wrote:
> > On Thu, 21 Nov 2013, Ulf Hansson wrote:
> >
> >> >> As Rafael mentioned, there is bus/pm_domain code that comes into play
> >> >> here, so I'm not sure it's always a bug.
> >> >>
> >> >> IMO, it's not a bug for the driver to depend on runtime PM if the
> >> >> bus/pm_domain is handling the details.
> >> >>
> >> >> On OMAP, we handle all the SoC on-chip devices with a pm_domain since
> >> >> the low-level PM operations that need to happen are bus-level things not
> >> >> device-level things.  Therefore, drivers for these devices can rely
> >> >> entirely on runtime PM, even for system suspend.  The late/early
> >> >> callbacks in the pm_domain can see if the device is runtime suspended
> >> >> already or not and behave accordingly.
> >> >>
> >> >> So, this all *can* work by handling it at the bus/pm_domain level, but
> >> >> as Ulf has mentioned (and I agree) it seems like a clunky workaround
> >> >> because the PM core is preventing it from happening as one might expect.
> >> >
> >> > The problem is that userspace can disable runtime PM at any time by
> >> > writing "on" to /sys/.../power/control.  Once that's done, you can't
> >> > depend on runtime PM to put the device into a low-power state during
> >> > system suspend.
> >> >
> >> > Now, if the bus-level code always takes care of putting the device into
> >> > low power during system suspend, then the driver doesn't have to worry
> >> > about it at all.  That's perfectly fine -- but I would hardly describe
> >> > the situation by saying the driver relies on runtime PM for system
> >> > suspend.
> >>
> >> But they are, since that is exactly what these driver depends on
> >> during system suspend.
> >
> > Not at all.  These drivers rely on the bus-level code's suspend
> > routine to handle system suspend.  They don't rely on runtime PM.
> >
> >>  And since the power domain invokes the runtime
> >> callbacks, normally CONFIG_PM_RUNTIME is needed for the callbacks to
> >> be set up from in the bus/driver.
> >
> > No, it isn't.  You can always write callbacks that get compiled even
> > when CONFIG_PM_RUNTIME isn't enabled.  It's very easy -- all you have
> > to do is change
> >
> >         #ifdef CONFIG_PM_RUNTIME
> >
> > to
> >
> >         #ifdef CONFIG_PM
> 
> You nailed it here! Since you consider this to be allowed from
> dependency point of view, all my problems described would be
> "solvable". :-)
> 
> Regarding the SET_RUNTIME_PM_OPS macro, which today relies on
> CONFIG_PM_RUNTIME should then be moved to CONFIG_PM instead. As
> prerequisites, drivers using this macro, obviously need to implement
> the runtime PM callbacks within CONFIG_PM as well. If that makes sense
> to you as well, I will start working on converting them that needs.

SET_RUNTIME_PM_OPS is intended to be used inside of a struct dev_pm_ops
definition along the lines of UNIVERSAL_DEV_PM_OPS (although the latter
was a mistake, sadly).

So actually what we need is a sane replacement for UNIVERSAL_DEV_PM_OPS
in my opinion.

Thanks!
diff mbox

Patch

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index ee039af..670b1d5 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -832,7 +832,7 @@  static void device_complete(struct device *dev, pm_message_t state)
 
 	device_unlock(dev);
 
-	pm_runtime_put(dev);
+	pm_request_idle(dev);
 }
 
 /**
@@ -1004,6 +1004,7 @@  static int device_suspend_late(struct device *dev, pm_message_t state)
 	pm_callback_t callback = NULL;
 	char *info = NULL;
 
+	pm_runtime_idle(dev);
 	__pm_runtime_disable(dev, false);
 
 	if (dev->power.syscore)
@@ -1312,14 +1313,6 @@  static int device_prepare(struct device *dev, pm_message_t state)
 	if (dev->power.syscore)
 		return 0;
 
-	/*
-	 * If a device's parent goes into runtime suspend at the wrong time,
-	 * it won't be possible to resume the device.  To prevent this we
-	 * block runtime suspend here, during the prepare phase, and allow
-	 * it again during the complete phase.
-	 */
-	pm_runtime_get_noresume(dev);
-
 	device_lock(dev);
 
 	dev->power.wakeup_path = device_may_wakeup(dev);