diff mbox series

[1/2] PM / runtime: Allow drivers to override runtime PM behaviour on sleep

Message ID 20191128160314.2381249-2-thierry.reding@gmail.com (mailing list archive)
State Rejected, archived
Headers show
Series PM / runtime: Allow drivers to override runtime PM behaviour on sleep | expand

Commit Message

Thierry Reding Nov. 28, 2019, 4:03 p.m. UTC
From: Thierry Reding <treding@nvidia.com>

Currently the driver PM core will automatically acquire a runtime PM
reference for devices before system sleep is entered. This is needed
to avoid potential issues related to devices' parents getting put to
runtime suspend at the wrong time and causing problems with their
children.

In some cases drivers are carefully written to avoid such issues and
the default behaviour can be changed to allow runtime PM to operate
regularly during system sleep.

Add helpers to flag devices as being safe to be runtime suspended at
system sleep time.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/base/power/main.c    |  6 ++++--
 drivers/base/power/runtime.c | 16 ++++++++++++++++
 include/linux/pm.h           |  1 +
 include/linux/pm_runtime.h   |  2 ++
 4 files changed, 23 insertions(+), 2 deletions(-)

Comments

Rafael J. Wysocki Nov. 28, 2019, 4:14 p.m. UTC | #1
On Thu, Nov 28, 2019 at 5:03 PM Thierry Reding <thierry.reding@gmail.com> wrote:
>
> From: Thierry Reding <treding@nvidia.com>
>
> Currently the driver PM core will automatically acquire a runtime PM
> reference for devices before system sleep is entered. This is needed
> to avoid potential issues related to devices' parents getting put to
> runtime suspend at the wrong time and causing problems with their
> children.

Not only for that.

> In some cases drivers are carefully written to avoid such issues and
> the default behaviour can be changed to allow runtime PM to operate
> regularly during system sleep.

But this change breaks quite a few assumptions in the core too, so no,
it can't be made.

Thanks!
Thierry Reding Nov. 28, 2019, 4:50 p.m. UTC | #2
On Thu, Nov 28, 2019 at 05:14:51PM +0100, Rafael J. Wysocki wrote:
> On Thu, Nov 28, 2019 at 5:03 PM Thierry Reding <thierry.reding@gmail.com> wrote:
> >
> > From: Thierry Reding <treding@nvidia.com>
> >
> > Currently the driver PM core will automatically acquire a runtime PM
> > reference for devices before system sleep is entered. This is needed
> > to avoid potential issues related to devices' parents getting put to
> > runtime suspend at the wrong time and causing problems with their
> > children.
> 
> Not only for that.
> 
> > In some cases drivers are carefully written to avoid such issues and
> > the default behaviour can be changed to allow runtime PM to operate
> > regularly during system sleep.
> 
> But this change breaks quite a few assumptions in the core too, so no,
> it can't be made.

Anything in particular that I can look at? I'm not seeing any issues
when I test this, which could of course mean that I'm just getting
lucky.

One thing that irritated me is that I think this used to work. I do
recall testing suspend/resume a few years ago and devices would get
properly runtime suspended/resumed. I did some digging but couldn't
find anything that would have had an impact on this.

Given that this is completely opt-in feature, why are you categorically
NAK'ing this?

Is there some other alternative that I can look into?

Thierry
Rafael J. Wysocki Nov. 28, 2019, 10:03 p.m. UTC | #3
On Thursday, November 28, 2019 5:50:26 PM CET Thierry Reding wrote:
> 
> --0F1p//8PRICkK4MW
> Content-Type: text/plain; charset=us-ascii
> Content-Disposition: inline
> Content-Transfer-Encoding: quoted-printable
> 
> On Thu, Nov 28, 2019 at 05:14:51PM +0100, Rafael J. Wysocki wrote:
> > On Thu, Nov 28, 2019 at 5:03 PM Thierry Reding <thierry.reding@gmail.com>=
>  wrote:
> > >
> > > From: Thierry Reding <treding@nvidia.com>
> > >
> > > Currently the driver PM core will automatically acquire a runtime PM
> > > reference for devices before system sleep is entered. This is needed
> > > to avoid potential issues related to devices' parents getting put to
> > > runtime suspend at the wrong time and causing problems with their
> > > children.
> >=20
> > Not only for that.
> >=20
> > > In some cases drivers are carefully written to avoid such issues and
> > > the default behaviour can be changed to allow runtime PM to operate
> > > regularly during system sleep.
> >=20
> > But this change breaks quite a few assumptions in the core too, so no,
> > it can't be made.
> 
> Anything in particular that I can look at? I'm not seeing any issues
> when I test this, which could of course mean that I'm just getting
> lucky.

There are races and such that you may never hit during casual testing.

> One thing that irritated me is that I think this used to work. I do
> recall testing suspend/resume a few years ago and devices would get
> properly runtime suspended/resumed.

Not true at all.

The PM core has always taken PM-runtime references on all devices pretty much
since when PM-runtime was introduced.

> I did some digging but couldn't
> find anything that would have had an impact on this.
> 
> Given that this is completely opt-in feature, why are you categorically
> NAK'ing this?

The general problem is that if any device has been touched by system-wide
suspend code, it should not be subject to PM-runtime any more until the
subsequent system-wide resume is able to undo whatever the suspend did.

Moreover, if a device is runtime-suspended, the system-wide suspend code
may mishandle it, in general.  That's why PM-runtime suspend is not allowed
during system-wide transitions at all.  And it has always been like that.

For a specific platform you may be able to overcome these limitations if
you are careful enough, but certainly they are there in general and surely
you cannot prevent people from using your opt-in just because they think
that they know what they are doing.

> Is there some other alternative that I can look into?

First of all, ensure that the dpm_list ordering is what it should be on the
system/platform in question.  That can be done with the help of device links.

In addition, make sure that the devices needed to suspend other devices are
suspended in the noirq phase of system-wide suspend and resumed in the
noirq phase of system-wide resume.  Or at least all of the other devices
need to be suspended before them and resumed after them.

These two things should allow you to cover the vast majority of cases if
not all of them without messing up with the rules.

Thanks!
Rafael J. Wysocki Nov. 28, 2019, 10:20 p.m. UTC | #4
On Thursday, November 28, 2019 11:03:57 PM CET Rafael J. Wysocki wrote:
> On Thursday, November 28, 2019 5:50:26 PM CET Thierry Reding wrote:
> > 
> > --0F1p//8PRICkK4MW
> > Content-Type: text/plain; charset=us-ascii
> > Content-Disposition: inline
> > Content-Transfer-Encoding: quoted-printable
> > 
> > On Thu, Nov 28, 2019 at 05:14:51PM +0100, Rafael J. Wysocki wrote:
> > > On Thu, Nov 28, 2019 at 5:03 PM Thierry Reding <thierry.reding@gmail.com>=
> >  wrote:
> > > >
> > > > From: Thierry Reding <treding@nvidia.com>
> > > >
> > > > Currently the driver PM core will automatically acquire a runtime PM
> > > > reference for devices before system sleep is entered. This is needed
> > > > to avoid potential issues related to devices' parents getting put to
> > > > runtime suspend at the wrong time and causing problems with their
> > > > children.
> > >=20
> > > Not only for that.
> > >=20
> > > > In some cases drivers are carefully written to avoid such issues and
> > > > the default behaviour can be changed to allow runtime PM to operate
> > > > regularly during system sleep.
> > >=20
> > > But this change breaks quite a few assumptions in the core too, so no,
> > > it can't be made.
> > 
> > Anything in particular that I can look at? I'm not seeing any issues
> > when I test this, which could of course mean that I'm just getting
> > lucky.
> 
> There are races and such that you may never hit during casual testing.
> 
> > One thing that irritated me is that I think this used to work. I do
> > recall testing suspend/resume a few years ago and devices would get
> > properly runtime suspended/resumed.
> 
> Not true at all.
> 
> The PM core has always taken PM-runtime references on all devices pretty much
> since when PM-runtime was introduced.
> 
> > I did some digging but couldn't
> > find anything that would have had an impact on this.
> > 
> > Given that this is completely opt-in feature, why are you categorically
> > NAK'ing this?
> 
> The general problem is that if any device has been touched by system-wide
> suspend code, it should not be subject to PM-runtime any more until the
> subsequent system-wide resume is able to undo whatever the suspend did.
> 
> Moreover, if a device is runtime-suspended, the system-wide suspend code
> may mishandle it, in general.  That's why PM-runtime suspend is not allowed
> during system-wide transitions at all.  And it has always been like that.
> 
> For a specific platform you may be able to overcome these limitations if
> you are careful enough, but certainly they are there in general and surely
> you cannot prevent people from using your opt-in just because they think
> that they know what they are doing.

BTW, what if user space prevents PM-runtime from suspending devices by writing
"on" to their "control" files?

System-wide suspend is (of course) still expected to work in that case, so how
exactly would you overcome that?
Thierry Reding Nov. 29, 2019, 9:33 a.m. UTC | #5
On Thu, Nov 28, 2019 at 11:03:57PM +0100, Rafael J. Wysocki wrote:
> On Thursday, November 28, 2019 5:50:26 PM CET Thierry Reding wrote:
> > 
> > --0F1p//8PRICkK4MW
> > Content-Type: text/plain; charset=us-ascii
> > Content-Disposition: inline
> > Content-Transfer-Encoding: quoted-printable
> > 
> > On Thu, Nov 28, 2019 at 05:14:51PM +0100, Rafael J. Wysocki wrote:
> > > On Thu, Nov 28, 2019 at 5:03 PM Thierry Reding <thierry.reding@gmail.com>=
> >  wrote:
> > > >
> > > > From: Thierry Reding <treding@nvidia.com>
> > > >
> > > > Currently the driver PM core will automatically acquire a runtime PM
> > > > reference for devices before system sleep is entered. This is needed
> > > > to avoid potential issues related to devices' parents getting put to
> > > > runtime suspend at the wrong time and causing problems with their
> > > > children.
> > >=20
> > > Not only for that.
> > >=20
> > > > In some cases drivers are carefully written to avoid such issues and
> > > > the default behaviour can be changed to allow runtime PM to operate
> > > > regularly during system sleep.
> > >=20
> > > But this change breaks quite a few assumptions in the core too, so no,
> > > it can't be made.
> > 
> > Anything in particular that I can look at? I'm not seeing any issues
> > when I test this, which could of course mean that I'm just getting
> > lucky.
> 
> There are races and such that you may never hit during casual testing.
> 
> > One thing that irritated me is that I think this used to work. I do
> > recall testing suspend/resume a few years ago and devices would get
> > properly runtime suspended/resumed.
> 
> Not true at all.
> 
> The PM core has always taken PM-runtime references on all devices pretty much
> since when PM-runtime was introduced.

You're right. I was finally able to find a toolchain that I could build
an old version of the kernel with. I tested system suspend/resume on the
v4.8 release, which is the first one that had the runtime PM changes as
well as the subsystem suspend/resume support wired up, and I can't see
the runtime PM callbacks invoked during system suspend/resume.

So I must be misremembering, or I'm confusing it with some other tests I
was running at the time.

> > I did some digging but couldn't
> > find anything that would have had an impact on this.
> > 
> > Given that this is completely opt-in feature, why are you categorically
> > NAK'ing this?
> 
> The general problem is that if any device has been touched by system-wide
> suspend code, it should not be subject to PM-runtime any more until the
> subsequent system-wide resume is able to undo whatever the suspend did.
> 
> Moreover, if a device is runtime-suspended, the system-wide suspend code
> may mishandle it, in general.  That's why PM-runtime suspend is not allowed
> during system-wide transitions at all.  And it has always been like that.

For this particular use-case the above should all be irrelevant. None of
the drivers involved here do anything special at system suspend, because
runtime suspend already puts the devices into the lowest possible power
state. Basically when these devices are put into runtime suspend, they
are completely turned off. The only exception is for things like HDMI
where the +5V pin remains powered, so that hotplug detection will work.

The runtime PM state of the devices involved is managed by the subsystem
system suspend/resume helpers in DRM/KMS. Basically those helpers turn
off all the devices in the composite device, which ultimately results in
their last runtime PM reference being released. So for system suspend
and resume, these devices aren't touched, other than maybe for the PM
core's internal book-keeping.

> For a specific platform you may be able to overcome these limitations if
> you are careful enough, but certainly they are there in general and surely
> you cannot prevent people from using your opt-in just because they think
> that they know what they are doing.

That's true. But the same thing is true for pretty much all other APIs.
People obviously have to make sure they know what they're doing, just
like they have to with any other API.

I suppose the documentation for this new function is currently lacking a
bit. Perhaps adding a big warning to this and listing the common
pitfalls would help people make the right call about whether or not they
can use this.

> > Is there some other alternative that I can look into?
> 
> First of all, ensure that the dpm_list ordering is what it should be on the
> system/platform in question.  That can be done with the help of device links.

I don't think we have device links for everything, but the deferred
probe code should take care of ordering the dpm_list correctly because
we do handle deferred probe properly in all cases.

Also, the dpm_list ordering isn't very critical in this case. If the
devices are allowed to runtime suspend during system sleep, the
subsystem sleep helper will put them into runtime suspend at the correct
time. This is propagated all the way through the display pipeline and
that order is ensured by the subsystem helpers.

> In addition, make sure that the devices needed to suspend other devices are
> suspended in the noirq phase of system-wide suspend and resumed in the
> noirq phase of system-wide resume.  Or at least all of the other devices
> need to be suspended before them and resumed after them.

We're fine on this front as well. We have run into such issues in the
past, but I don't think there are any such issue left at the moment. I
do have one pending fix for I2C suspend/resume which fixes an issue
where some pinmuxing changes needed to get the HDMI DDC channel to work
were not getting applied during resume.

That I2C issue is related to this, I think. What I'm seeing is that when
the system goes to sleep, the pinmux looses its programming at a
hardware level, but the I2C driver doesn't know about it because it does
not get runtime suspended. At runtime suspend it would switch the pinmux
state to "idle" which would then match the system suspend state. Upon
runtime resume it sets the "default" pinmux state, which will then
restore the register programming.

In the current case where runtime suspend/resume is prohibited during
system sleep, upon resume the I2C driver will assume that the pinmux
state is still "default" and it won't reapply the state (it's actually
the pinmux subsystem that makes this decision) and causes HDMI DDC
transactions to time out.

One simple fix for that is to use pm_runtime_force_suspend() and
pm_runtime_force_resume() as system suspend/resume callbacks to make
sure the I2C controller is runtime suspended/resumed during system
sleep.

Note that forcing runtime suspend/resume this way is suboptimal in the
DRM/KMS case because the suspend/resume happens disconnected from the
subsystem suspend/resume callbacks, which is not desired as that breaks
some of the assumptions in those callbacks.

> These two things should allow you to cover the vast majority of cases if
> not all of them without messing up with the rules.

One alternative that I had thought about was to just ditch the runtime
PM callbacks for this. However, there's one corner case where this may
break. On early Tegra generations, the two display controllers are
"coupled" in that the second one doesn't work if the first one is
disabled. We describe that using a device link from the second to the
first controller. This causes the first controller to be automatically
be runtime resumed when the second controller is used. This only works
via runtime PM, so if I don't use runtime PM I'd have to add special
handling for that case.

Actually, there's another problem as well. Most of these devices use
generic PM domains to power on/off the SoC partitions that they're in.
If I side-step runtime PM, then I'd have to somehow find a way to
explicitly control the PM domains.

Another alternative would be to have a kind of hybrid approach where I
leave runtime PM calls in the drivers, but disconnect the runtime PM
callback implementations from that. That would at least fix the issue
with the generic PM domains.

However, it would not fix the problem with coupled display controllers
because empty runtime PM callbacks wouldn't actually power up the first
display controller when it is needed by the second controller. I would
have to add infrastructure that basically duplicates some of runtime PM
to fix that.

So the bottom line is that runtime PM is still the best solution for
this problem. It works really nice and is very consistent.

Do you think adding better documentation to this new flag and the
accessors would help remove your concerns about this?

Thierry
Thierry Reding Nov. 29, 2019, 10:08 a.m. UTC | #6
On Thu, Nov 28, 2019 at 11:20:01PM +0100, Rafael J. Wysocki wrote:
> On Thursday, November 28, 2019 11:03:57 PM CET Rafael J. Wysocki wrote:
> > On Thursday, November 28, 2019 5:50:26 PM CET Thierry Reding wrote:
> > > 
> > > --0F1p//8PRICkK4MW
> > > Content-Type: text/plain; charset=us-ascii
> > > Content-Disposition: inline
> > > Content-Transfer-Encoding: quoted-printable
> > > 
> > > On Thu, Nov 28, 2019 at 05:14:51PM +0100, Rafael J. Wysocki wrote:
> > > > On Thu, Nov 28, 2019 at 5:03 PM Thierry Reding <thierry.reding@gmail.com>=
> > >  wrote:
> > > > >
> > > > > From: Thierry Reding <treding@nvidia.com>
> > > > >
> > > > > Currently the driver PM core will automatically acquire a runtime PM
> > > > > reference for devices before system sleep is entered. This is needed
> > > > > to avoid potential issues related to devices' parents getting put to
> > > > > runtime suspend at the wrong time and causing problems with their
> > > > > children.
> > > >=20
> > > > Not only for that.
> > > >=20
> > > > > In some cases drivers are carefully written to avoid such issues and
> > > > > the default behaviour can be changed to allow runtime PM to operate
> > > > > regularly during system sleep.
> > > >=20
> > > > But this change breaks quite a few assumptions in the core too, so no,
> > > > it can't be made.
> > > 
> > > Anything in particular that I can look at? I'm not seeing any issues
> > > when I test this, which could of course mean that I'm just getting
> > > lucky.
> > 
> > There are races and such that you may never hit during casual testing.
> > 
> > > One thing that irritated me is that I think this used to work. I do
> > > recall testing suspend/resume a few years ago and devices would get
> > > properly runtime suspended/resumed.
> > 
> > Not true at all.
> > 
> > The PM core has always taken PM-runtime references on all devices pretty much
> > since when PM-runtime was introduced.
> > 
> > > I did some digging but couldn't
> > > find anything that would have had an impact on this.
> > > 
> > > Given that this is completely opt-in feature, why are you categorically
> > > NAK'ing this?
> > 
> > The general problem is that if any device has been touched by system-wide
> > suspend code, it should not be subject to PM-runtime any more until the
> > subsequent system-wide resume is able to undo whatever the suspend did.
> > 
> > Moreover, if a device is runtime-suspended, the system-wide suspend code
> > may mishandle it, in general.  That's why PM-runtime suspend is not allowed
> > during system-wide transitions at all.  And it has always been like that.
> > 
> > For a specific platform you may be able to overcome these limitations if
> > you are careful enough, but certainly they are there in general and surely
> > you cannot prevent people from using your opt-in just because they think
> > that they know what they are doing.
> 
> BTW, what if user space prevents PM-runtime from suspending devices by writing
> "on" to their "control" files?
> 
> System-wide suspend is (of course) still expected to work in that case, so how
> exactly would you overcome that?

I suppose one way to overcome that would be to make it an error to write
"on" to the "control" files for these devices.

Currently doing this is likely going to break display support on Tegra,
so this would be a good idea in this case anyway.

Again, I could avoid all of these issues by avoiding runtime PM in this
driver, but I would end up reimplementing some of the same concepts. I'd
rather use something that's supported by the PM core and that might be
useful to other drivers than reinvent the wheel.

Thierry
Rafael J. Wysocki Nov. 29, 2019, 10:09 a.m. UTC | #7
On Fri, Nov 29, 2019 at 10:34 AM Thierry Reding
<thierry.reding@gmail.com> wrote:
>
> On Thu, Nov 28, 2019 at 11:03:57PM +0100, Rafael J. Wysocki wrote:
> > On Thursday, November 28, 2019 5:50:26 PM CET Thierry Reding wrote:
> > >
> > > --0F1p//8PRICkK4MW
> > > Content-Type: text/plain; charset=us-ascii
> > > Content-Disposition: inline
> > > Content-Transfer-Encoding: quoted-printable
> > >
> > > On Thu, Nov 28, 2019 at 05:14:51PM +0100, Rafael J. Wysocki wrote:
> > > > On Thu, Nov 28, 2019 at 5:03 PM Thierry Reding <thierry.reding@gmail.com>=
> > >  wrote:
> > > > >
> > > > > From: Thierry Reding <treding@nvidia.com>
> > > > >
> > > > > Currently the driver PM core will automatically acquire a runtime PM
> > > > > reference for devices before system sleep is entered. This is needed
> > > > > to avoid potential issues related to devices' parents getting put to
> > > > > runtime suspend at the wrong time and causing problems with their
> > > > > children.
> > > >=20
> > > > Not only for that.
> > > >=20
> > > > > In some cases drivers are carefully written to avoid such issues and
> > > > > the default behaviour can be changed to allow runtime PM to operate
> > > > > regularly during system sleep.
> > > >=20
> > > > But this change breaks quite a few assumptions in the core too, so no,
> > > > it can't be made.
> > >
> > > Anything in particular that I can look at? I'm not seeing any issues
> > > when I test this, which could of course mean that I'm just getting
> > > lucky.
> >
> > There are races and such that you may never hit during casual testing.
> >
> > > One thing that irritated me is that I think this used to work. I do
> > > recall testing suspend/resume a few years ago and devices would get
> > > properly runtime suspended/resumed.
> >
> > Not true at all.
> >
> > The PM core has always taken PM-runtime references on all devices pretty much
> > since when PM-runtime was introduced.
>
> You're right. I was finally able to find a toolchain that I could build
> an old version of the kernel with. I tested system suspend/resume on the
> v4.8 release, which is the first one that had the runtime PM changes as
> well as the subsystem suspend/resume support wired up, and I can't see
> the runtime PM callbacks invoked during system suspend/resume.
>
> So I must be misremembering, or I'm confusing it with some other tests I
> was running at the time.
>
> > > I did some digging but couldn't
> > > find anything that would have had an impact on this.
> > >
> > > Given that this is completely opt-in feature, why are you categorically
> > > NAK'ing this?
> >
> > The general problem is that if any device has been touched by system-wide
> > suspend code, it should not be subject to PM-runtime any more until the
> > subsequent system-wide resume is able to undo whatever the suspend did.
> >
> > Moreover, if a device is runtime-suspended, the system-wide suspend code
> > may mishandle it, in general.  That's why PM-runtime suspend is not allowed
> > during system-wide transitions at all.  And it has always been like that.
>
> For this particular use-case the above should all be irrelevant. None of
> the drivers involved here do anything special at system suspend, because
> runtime suspend already puts the devices into the lowest possible power
> state. Basically when these devices are put into runtime suspend, they
> are completely turned off. The only exception is for things like HDMI
> where the +5V pin remains powered, so that hotplug detection will work.
>
> The runtime PM state of the devices involved is managed by the subsystem
> system suspend/resume helpers in DRM/KMS. Basically those helpers turn
> off all the devices in the composite device, which ultimately results in
> their last runtime PM reference being released. So for system suspend
> and resume, these devices aren't touched, other than maybe for the PM
> core's internal book-keeping.

OK, so you actually want system-wide PM to work like PM-runtime on the
platform in question, but there are substantial differences.

First, PM-runtime suspend can be effectively disabled by user space
and system-wide suspend is always expected to work.

Second, if system wakeup devices are involved, their handling during
system-wide suspend depends on the return value of device_may_wakeup()
which depends on what user space does, whereas PM-runtime assumes
device wakeup to be always enabled.

> > For a specific platform you may be able to overcome these limitations if
> > you are careful enough, but certainly they are there in general and surely
> > you cannot prevent people from using your opt-in just because they think
> > that they know what they are doing.
>
> That's true. But the same thing is true for pretty much all other APIs.
> People obviously have to make sure they know what they're doing, just
> like they have to with any other API.
>
> I suppose the documentation for this new function is currently lacking a
> bit. Perhaps adding a big warning to this and listing the common
> pitfalls would help people make the right call about whether or not they
> can use this.

And then *somebody* would have to chase a ton of subtle issues
resulting from that.  No, thanks, but no thanks.

> > > Is there some other alternative that I can look into?
> >
> > First of all, ensure that the dpm_list ordering is what it should be on the
> > system/platform in question.  That can be done with the help of device links.
>
> I don't think we have device links for everything, but the deferred
> probe code should take care of ordering the dpm_list correctly because
> we do handle deferred probe properly in all cases.
>
> Also, the dpm_list ordering isn't very critical in this case. If the
> devices are allowed to runtime suspend during system sleep, the
> subsystem sleep helper will put them into runtime suspend at the correct
> time. This is propagated all the way through the display pipeline and
> that order is ensured by the subsystem helpers.

You are still not saying what happens if user space doesn't allow
PM-runtime to suspend the devices (by writing "on" to their "control"
files).

> > In addition, make sure that the devices needed to suspend other devices are
> > suspended in the noirq phase of system-wide suspend and resumed in the
> > noirq phase of system-wide resume.  Or at least all of the other devices
> > need to be suspended before them and resumed after them.
>
> We're fine on this front as well. We have run into such issues in the
> past, but I don't think there are any such issue left at the moment. I
> do have one pending fix for I2C suspend/resume which fixes an issue
> where some pinmuxing changes needed to get the HDMI DDC channel to work
> were not getting applied during resume.
>
> That I2C issue is related to this, I think. What I'm seeing is that when
> the system goes to sleep, the pinmux looses its programming at a
> hardware level, but the I2C driver doesn't know about it because it does
> not get runtime suspended.

Well, no, that's not the reason.  The real reason is that the handling
of that device during system-wide suspend does not follow the rules
followed by PM-runtime for it.

Switching system-wide PM over to PM-runtime to address that is not
going to work, because PM-runtime is not mandatory and system-wide PM
is.

> At runtime suspend it would switch the pinmux
> state to "idle" which would then match the system suspend state. Upon
> runtime resume it sets the "default" pinmux state, which will then
> restore the register programming.

So this logic needs to be implemented in the system-wide suspend flow as well.

> In the current case where runtime suspend/resume is prohibited during

Runtime suspend is, runtime resume isn't until the "late" suspend phase.

> system sleep, upon resume the I2C driver will assume that the pinmux
> state is still "default" and it won't reapply the state (it's actually
> the pinmux subsystem that makes this decision) and causes HDMI DDC
> transactions to time out.

So this is a bug in the system-wide suspend/resume flow that needs to
be addressed, but not by switching it over to PM-runtime.

> One simple fix for that is to use pm_runtime_force_suspend() and
> pm_runtime_force_resume() as system suspend/resume callbacks to make
> sure the I2C controller is runtime suspended/resumed during system
> sleep.
>
> Note that forcing runtime suspend/resume this way is suboptimal in the
> DRM/KMS case because the suspend/resume happens disconnected from the
> subsystem suspend/resume callbacks, which is not desired as that breaks
> some of the assumptions in those callbacks.

So there needs to be another way.

Have you looked at DPM_FLAG_SMART_SUSPEND?

> > These two things should allow you to cover the vast majority of cases if
> > not all of them without messing up with the rules.
>
> One alternative that I had thought about was to just ditch the runtime
> PM callbacks for this. However, there's one corner case where this may
> break. On early Tegra generations, the two display controllers are
> "coupled" in that the second one doesn't work if the first one is
> disabled. We describe that using a device link from the second to the
> first controller. This causes the first controller to be automatically
> be runtime resumed when the second controller is used. This only works
> via runtime PM, so if I don't use runtime PM I'd have to add special
> handling for that case.

Runtime resume during system-wide suspend and resume is basically fine
unless you try to do it in the "late" suspend phase or later, but that
limitation is kind of artificial.  [I was talking about that at the
LPC this year.]  It basically cannot be carried out in the part of
system-wide suspend after the core regards the device and its parent
etc as "suspended", but the definition of that may be adjusted IMO.

And using PM-runtime resume during system-wide resume may be fine too,
basically (as long as the ordering of that is not lead to any kind of
loop dependencies).

On the other hand, there is *zero* need for runtime suspend during
system-wide transitions and it is known problematic.

> Actually, there's another problem as well. Most of these devices use
> generic PM domains to power on/off the SoC partitions that they're in.
> If I side-step runtime PM, then I'd have to somehow find a way to
> explicitly control the PM domains.

That's a problem with genpd, I'd say.

> Another alternative would be to have a kind of hybrid approach where I
> leave runtime PM calls in the drivers, but disconnect the runtime PM
> callback implementations from that. That would at least fix the issue
> with the generic PM domains.
>
> However, it would not fix the problem with coupled display controllers
> because empty runtime PM callbacks wouldn't actually power up the first
> display controller when it is needed by the second controller. I would
> have to add infrastructure that basically duplicates some of runtime PM
> to fix that.
>
> So the bottom line is that runtime PM is still the best solution for
> this problem. It works really nice and is very consistent.
>
> Do you think adding better documentation to this new flag and the
> accessors would help remove your concerns about this?

No, it wouldn't.

Also your arguments are mostly about PM-runtime resume, which is a
different story.
Rafael J. Wysocki Nov. 29, 2019, 10:22 a.m. UTC | #8
On Fri, Nov 29, 2019 at 11:08 AM Thierry Reding
<thierry.reding@gmail.com> wrote:
>
> On Thu, Nov 28, 2019 at 11:20:01PM +0100, Rafael J. Wysocki wrote:
> > On Thursday, November 28, 2019 11:03:57 PM CET Rafael J. Wysocki wrote:
> > > On Thursday, November 28, 2019 5:50:26 PM CET Thierry Reding wrote:
> > > >
> > > > --0F1p//8PRICkK4MW
> > > > Content-Type: text/plain; charset=us-ascii
> > > > Content-Disposition: inline
> > > > Content-Transfer-Encoding: quoted-printable
> > > >
> > > > On Thu, Nov 28, 2019 at 05:14:51PM +0100, Rafael J. Wysocki wrote:
> > > > > On Thu, Nov 28, 2019 at 5:03 PM Thierry Reding <thierry.reding@gmail.com>=
> > > >  wrote:
> > > > > >
> > > > > > From: Thierry Reding <treding@nvidia.com>
> > > > > >
> > > > > > Currently the driver PM core will automatically acquire a runtime PM
> > > > > > reference for devices before system sleep is entered. This is needed
> > > > > > to avoid potential issues related to devices' parents getting put to
> > > > > > runtime suspend at the wrong time and causing problems with their
> > > > > > children.
> > > > >=20
> > > > > Not only for that.
> > > > >=20
> > > > > > In some cases drivers are carefully written to avoid such issues and
> > > > > > the default behaviour can be changed to allow runtime PM to operate
> > > > > > regularly during system sleep.
> > > > >=20
> > > > > But this change breaks quite a few assumptions in the core too, so no,
> > > > > it can't be made.
> > > >
> > > > Anything in particular that I can look at? I'm not seeing any issues
> > > > when I test this, which could of course mean that I'm just getting
> > > > lucky.
> > >
> > > There are races and such that you may never hit during casual testing.
> > >
> > > > One thing that irritated me is that I think this used to work. I do
> > > > recall testing suspend/resume a few years ago and devices would get
> > > > properly runtime suspended/resumed.
> > >
> > > Not true at all.
> > >
> > > The PM core has always taken PM-runtime references on all devices pretty much
> > > since when PM-runtime was introduced.
> > >
> > > > I did some digging but couldn't
> > > > find anything that would have had an impact on this.
> > > >
> > > > Given that this is completely opt-in feature, why are you categorically
> > > > NAK'ing this?
> > >
> > > The general problem is that if any device has been touched by system-wide
> > > suspend code, it should not be subject to PM-runtime any more until the
> > > subsequent system-wide resume is able to undo whatever the suspend did.
> > >
> > > Moreover, if a device is runtime-suspended, the system-wide suspend code
> > > may mishandle it, in general.  That's why PM-runtime suspend is not allowed
> > > during system-wide transitions at all.  And it has always been like that.
> > >
> > > For a specific platform you may be able to overcome these limitations if
> > > you are careful enough, but certainly they are there in general and surely
> > > you cannot prevent people from using your opt-in just because they think
> > > that they know what they are doing.
> >
> > BTW, what if user space prevents PM-runtime from suspending devices by writing
> > "on" to their "control" files?
> >
> > System-wide suspend is (of course) still expected to work in that case, so how
> > exactly would you overcome that?
>
> I suppose one way to overcome that would be to make it an error to write
> "on" to the "control" files for these devices.

Seeing suggestions like this in messages from seasoned kernel
developers is seriously disappointing. :-/

> Currently doing this is likely going to break display support on Tegra,
> so this would be a good idea in this case anyway.

PM-runtime has always allowed user space to prevent devices from being
suspended and it seems that this has not been taken into account by
Tegra display support developers at all.

> Again, I could avoid all of these issues by avoiding runtime PM in this driver,

I don't quite see the connection here.

Preventing a device from suspending should never be a functional
problem.  It may be an energy-efficiency problem, but that's something
for user space to consider before writing "on" to a device's control
file.

> but I would end up reimplementing some of the same concepts. I'd
> rather use something that's supported by the PM core and that might be
> useful to other drivers than reinvent the wheel.

Which doesn't have to be by using PM-runtime suspend for the handling
of system-wide suspend, at least in my view.
Thierry Reding Nov. 29, 2019, 11:44 a.m. UTC | #9
On Fri, Nov 29, 2019 at 11:09:26AM +0100, Rafael J. Wysocki wrote:
> On Fri, Nov 29, 2019 at 10:34 AM Thierry Reding
> <thierry.reding@gmail.com> wrote:
> >
> > On Thu, Nov 28, 2019 at 11:03:57PM +0100, Rafael J. Wysocki wrote:
> > > On Thursday, November 28, 2019 5:50:26 PM CET Thierry Reding wrote:
> > > >
> > > > --0F1p//8PRICkK4MW
> > > > Content-Type: text/plain; charset=us-ascii
> > > > Content-Disposition: inline
> > > > Content-Transfer-Encoding: quoted-printable
> > > >
> > > > On Thu, Nov 28, 2019 at 05:14:51PM +0100, Rafael J. Wysocki wrote:
> > > > > On Thu, Nov 28, 2019 at 5:03 PM Thierry Reding <thierry.reding@gmail.com>=
> > > >  wrote:
> > > > > >
> > > > > > From: Thierry Reding <treding@nvidia.com>
> > > > > >
> > > > > > Currently the driver PM core will automatically acquire a runtime PM
> > > > > > reference for devices before system sleep is entered. This is needed
> > > > > > to avoid potential issues related to devices' parents getting put to
> > > > > > runtime suspend at the wrong time and causing problems with their
> > > > > > children.
> > > > >=20
> > > > > Not only for that.
> > > > >=20
> > > > > > In some cases drivers are carefully written to avoid such issues and
> > > > > > the default behaviour can be changed to allow runtime PM to operate
> > > > > > regularly during system sleep.
> > > > >=20
> > > > > But this change breaks quite a few assumptions in the core too, so no,
> > > > > it can't be made.
> > > >
> > > > Anything in particular that I can look at? I'm not seeing any issues
> > > > when I test this, which could of course mean that I'm just getting
> > > > lucky.
> > >
> > > There are races and such that you may never hit during casual testing.
> > >
> > > > One thing that irritated me is that I think this used to work. I do
> > > > recall testing suspend/resume a few years ago and devices would get
> > > > properly runtime suspended/resumed.
> > >
> > > Not true at all.
> > >
> > > The PM core has always taken PM-runtime references on all devices pretty much
> > > since when PM-runtime was introduced.
> >
> > You're right. I was finally able to find a toolchain that I could build
> > an old version of the kernel with. I tested system suspend/resume on the
> > v4.8 release, which is the first one that had the runtime PM changes as
> > well as the subsystem suspend/resume support wired up, and I can't see
> > the runtime PM callbacks invoked during system suspend/resume.
> >
> > So I must be misremembering, or I'm confusing it with some other tests I
> > was running at the time.
> >
> > > > I did some digging but couldn't
> > > > find anything that would have had an impact on this.
> > > >
> > > > Given that this is completely opt-in feature, why are you categorically
> > > > NAK'ing this?
> > >
> > > The general problem is that if any device has been touched by system-wide
> > > suspend code, it should not be subject to PM-runtime any more until the
> > > subsequent system-wide resume is able to undo whatever the suspend did.
> > >
> > > Moreover, if a device is runtime-suspended, the system-wide suspend code
> > > may mishandle it, in general.  That's why PM-runtime suspend is not allowed
> > > during system-wide transitions at all.  And it has always been like that.
> >
> > For this particular use-case the above should all be irrelevant. None of
> > the drivers involved here do anything special at system suspend, because
> > runtime suspend already puts the devices into the lowest possible power
> > state. Basically when these devices are put into runtime suspend, they
> > are completely turned off. The only exception is for things like HDMI
> > where the +5V pin remains powered, so that hotplug detection will work.
> >
> > The runtime PM state of the devices involved is managed by the subsystem
> > system suspend/resume helpers in DRM/KMS. Basically those helpers turn
> > off all the devices in the composite device, which ultimately results in
> > their last runtime PM reference being released. So for system suspend
> > and resume, these devices aren't touched, other than maybe for the PM
> > core's internal book-keeping.
> 
> OK, so you actually want system-wide PM to work like PM-runtime on the
> platform in question, but there are substantial differences.

That's not exactly what I'm trying to do here. If this was all I wanted
to do I could simply use UNIVERSAL_DEV_PM_OPS.

What I want to do is basically allow the system-wide PM of the subsystem
to control the runtime PM of the devices involved.

> First, PM-runtime suspend can be effectively disabled by user space
> and system-wide suspend is always expected to work.
> 
> Second, if system wakeup devices are involved, their handling during
> system-wide suspend depends on the return value of device_may_wakeup()
> which depends on what user space does, whereas PM-runtime assumes
> device wakeup to be always enabled.
> 
> > > For a specific platform you may be able to overcome these limitations if
> > > you are careful enough, but certainly they are there in general and surely
> > > you cannot prevent people from using your opt-in just because they think
> > > that they know what they are doing.
> >
> > That's true. But the same thing is true for pretty much all other APIs.
> > People obviously have to make sure they know what they're doing, just
> > like they have to with any other API.
> >
> > I suppose the documentation for this new function is currently lacking a
> > bit. Perhaps adding a big warning to this and listing the common
> > pitfalls would help people make the right call about whether or not they
> > can use this.
> 
> And then *somebody* would have to chase a ton of subtle issues
> resulting from that.  No, thanks, but no thanks.

If the kerneldoc makes it clear that they're only supposed to use this
when they exactly know that it's safe to do, I don't think anybody is
going to put the blame on you to fix their bugs. If using this breaks,
it's clearly wrong to use it.

> > > > Is there some other alternative that I can look into?
> > >
> > > First of all, ensure that the dpm_list ordering is what it should be on the
> > > system/platform in question.  That can be done with the help of device links.
> >
> > I don't think we have device links for everything, but the deferred
> > probe code should take care of ordering the dpm_list correctly because
> > we do handle deferred probe properly in all cases.
> >
> > Also, the dpm_list ordering isn't very critical in this case. If the
> > devices are allowed to runtime suspend during system sleep, the
> > subsystem sleep helper will put them into runtime suspend at the correct
> > time. This is propagated all the way through the display pipeline and
> > that order is ensured by the subsystem helpers.
> 
> You are still not saying what happens if user space doesn't allow
> PM-runtime to suspend the devices (by writing "on" to their "control"
> files).

I was suggesting that we prohibit that, which you clearly didn't like.
You didn't give any reasons for why you think this is a bad idea, but
the alternative would be to implement some driver-specific equivalent of
that.

At that point, does it really matter whether the user is prevented from
prohibiting suspend via runtime PM or some non-standard mechanism with a
different name but equivalent functionality?

The fact is that in order to properly use the device we need to be able
to suspend it. We need to do this to switch video modes anyway. There's
simply no way to make the display work reliably without it going into
suspend and resuming. Whether we call this runtime suspend/resume or
something driver-specific is really just an implementation detail. The
consequences for userspace are exactly the same.

So if you think that allowing userspace to prohibit runtime suspend is
imperative always, then I don't have much choice but to do it without
runtime PM.

> > > In addition, make sure that the devices needed to suspend other devices are
> > > suspended in the noirq phase of system-wide suspend and resumed in the
> > > noirq phase of system-wide resume.  Or at least all of the other devices
> > > need to be suspended before them and resumed after them.
> >
> > We're fine on this front as well. We have run into such issues in the
> > past, but I don't think there are any such issue left at the moment. I
> > do have one pending fix for I2C suspend/resume which fixes an issue
> > where some pinmuxing changes needed to get the HDMI DDC channel to work
> > were not getting applied during resume.
> >
> > That I2C issue is related to this, I think. What I'm seeing is that when
> > the system goes to sleep, the pinmux looses its programming at a
> > hardware level, but the I2C driver doesn't know about it because it does
> > not get runtime suspended.
> 
> Well, no, that's not the reason.  The real reason is that the handling
> of that device during system-wide suspend does not follow the rules
> followed by PM-runtime for it.
> 
> Switching system-wide PM over to PM-runtime to address that is not
> going to work, because PM-runtime is not mandatory and system-wide PM
> is.
> 
> > At runtime suspend it would switch the pinmux
> > state to "idle" which would then match the system suspend state. Upon
> > runtime resume it sets the "default" pinmux state, which will then
> > restore the register programming.
> 
> So this logic needs to be implemented in the system-wide suspend flow as well.

I suppose one other alternative would be to use universal PM ops for
this case. In this case we actually do want the same behaviour at system
sleep than we do for runtime PM.

> > In the current case where runtime suspend/resume is prohibited during
> 
> Runtime suspend is, runtime resume isn't until the "late" suspend phase.
> 
> > system sleep, upon resume the I2C driver will assume that the pinmux
> > state is still "default" and it won't reapply the state (it's actually
> > the pinmux subsystem that makes this decision) and causes HDMI DDC
> > transactions to time out.
> 
> So this is a bug in the system-wide suspend/resume flow that needs to
> be addressed, but not by switching it over to PM-runtime.
> 
> > One simple fix for that is to use pm_runtime_force_suspend() and
> > pm_runtime_force_resume() as system suspend/resume callbacks to make
> > sure the I2C controller is runtime suspended/resumed during system
> > sleep.
> >
> > Note that forcing runtime suspend/resume this way is suboptimal in the
> > DRM/KMS case because the suspend/resume happens disconnected from the
> > subsystem suspend/resume callbacks, which is not desired as that breaks
> > some of the assumptions in those callbacks.
> 
> So there needs to be another way.
> 
> Have you looked at DPM_FLAG_SMART_SUSPEND?

I'll look at that. It seems like it could do the trick for the I2C
problem I'm seeing.

Generally, though, what I keep noticing here is that for many devices
there is some commonality between runtime PM and system sleep. Actually
for some devices they are exactly the same, which I guess is one of the
reasons why I had hoped we could somehow simplify things by having
runtime PM on one hand and then if system sleep doesn't need anything
other than what runtime PM already does, we could just do runtime PM all
the time. That way we could avoid all the duplication. I guess that's
mostly what universal PM ops are about. I'm not exactly sure how that
would work during system resume, though. Would the PM core not invoke
the same callback twice, once for system resume and then again (after
allowing runtime PM again) for resume runtime?

> > > These two things should allow you to cover the vast majority of cases if
> > > not all of them without messing up with the rules.
> >
> > One alternative that I had thought about was to just ditch the runtime
> > PM callbacks for this. However, there's one corner case where this may
> > break. On early Tegra generations, the two display controllers are
> > "coupled" in that the second one doesn't work if the first one is
> > disabled. We describe that using a device link from the second to the
> > first controller. This causes the first controller to be automatically
> > be runtime resumed when the second controller is used. This only works
> > via runtime PM, so if I don't use runtime PM I'd have to add special
> > handling for that case.
> 
> Runtime resume during system-wide suspend and resume is basically fine
> unless you try to do it in the "late" suspend phase or later, but that
> limitation is kind of artificial.  [I was talking about that at the
> LPC this year.]  It basically cannot be carried out in the part of
> system-wide suspend after the core regards the device and its parent
> etc as "suspended", but the definition of that may be adjusted IMO.
> 
> And using PM-runtime resume during system-wide resume may be fine too,
> basically (as long as the ordering of that is not lead to any kind of
> loop dependencies).
> 
> On the other hand, there is *zero* need for runtime suspend during
> system-wide transitions and it is known problematic.

I don't quite understand this. I don't see a need to runtime resume
during suspend, because you're actually trying to suspend devices. Most
of the time at least. I get that in some cases you may need to resume
devices in order to help put other devices (that depend on them) into
suspend, but most of the time the goal is to set devices into some low
power state so that when the system is asleep you consume less power
than in the active state. This is more or less the same thing that you
want with runtime PM as well, isn't it?

So why do you say that there's no need for runtime suspend during system
suspend? I always figured that runtime suspend was sort of a soft system
suspend in that system suspend may be more aggressive, and mostly a
superset of runtime suspend. So in pseudocode it would be roughly:

	runtime_suspend(device)
	{
		set_low_power_mode(device);
	}

	system_suspend(device)
	{
		save_context(device);
		runtime_suspend(device);
		power_off(device);
	}

And for resume you could basically just call these in reverse order:

	runtime_resume(device)
	{
		set_normal_mode(device);
	}

	system_resume(device)
	{
		power_on(device);
		runtime_resume(device);
		restore_context(device);
	}

I understand that this may not be true for all devices. However, in some
cases we may even want to go further and do at runtime_suspend() what
we do in system_suspend() because the impact may be low enough and the
power savings worth it.

> > Actually, there's another problem as well. Most of these devices use
> > generic PM domains to power on/off the SoC partitions that they're in.
> > If I side-step runtime PM, then I'd have to somehow find a way to
> > explicitly control the PM domains.
> 
> That's a problem with genpd, I'd say.

Fair enough. So far we've tried to implement things such that they work
within the existing infrastructure, but if runtime PM turns out not to
be what we actually need, maybe we need to just move to something
different.

> > Another alternative would be to have a kind of hybrid approach where I
> > leave runtime PM calls in the drivers, but disconnect the runtime PM
> > callback implementations from that. That would at least fix the issue
> > with the generic PM domains.
> >
> > However, it would not fix the problem with coupled display controllers
> > because empty runtime PM callbacks wouldn't actually power up the first
> > display controller when it is needed by the second controller. I would
> > have to add infrastructure that basically duplicates some of runtime PM
> > to fix that.
> >
> > So the bottom line is that runtime PM is still the best solution for
> > this problem. It works really nice and is very consistent.
> >
> > Do you think adding better documentation to this new flag and the
> > accessors would help remove your concerns about this?
> 
> No, it wouldn't.
> 
> Also your arguments are mostly about PM-runtime resume, which is a
> different story.

What makes you say that? I'm equally concerned about runtime suspend
because runtime resume alone is not good enough to resume a device that
was never suspended. Runtime suspend will typically assert a reset for
these devices and runtime resume will then deassert the reset. This is
necessary to get the devices into a proper working state.

Thierry
Thierry Reding Nov. 29, 2019, 12:07 p.m. UTC | #10
On Fri, Nov 29, 2019 at 11:22:08AM +0100, Rafael J. Wysocki wrote:
> On Fri, Nov 29, 2019 at 11:08 AM Thierry Reding
> <thierry.reding@gmail.com> wrote:
> >
> > On Thu, Nov 28, 2019 at 11:20:01PM +0100, Rafael J. Wysocki wrote:
> > > On Thursday, November 28, 2019 11:03:57 PM CET Rafael J. Wysocki wrote:
> > > > On Thursday, November 28, 2019 5:50:26 PM CET Thierry Reding wrote:
> > > > >
> > > > > --0F1p//8PRICkK4MW
> > > > > Content-Type: text/plain; charset=us-ascii
> > > > > Content-Disposition: inline
> > > > > Content-Transfer-Encoding: quoted-printable
> > > > >
> > > > > On Thu, Nov 28, 2019 at 05:14:51PM +0100, Rafael J. Wysocki wrote:
> > > > > > On Thu, Nov 28, 2019 at 5:03 PM Thierry Reding <thierry.reding@gmail.com>=
> > > > >  wrote:
> > > > > > >
> > > > > > > From: Thierry Reding <treding@nvidia.com>
> > > > > > >
> > > > > > > Currently the driver PM core will automatically acquire a runtime PM
> > > > > > > reference for devices before system sleep is entered. This is needed
> > > > > > > to avoid potential issues related to devices' parents getting put to
> > > > > > > runtime suspend at the wrong time and causing problems with their
> > > > > > > children.
> > > > > >=20
> > > > > > Not only for that.
> > > > > >=20
> > > > > > > In some cases drivers are carefully written to avoid such issues and
> > > > > > > the default behaviour can be changed to allow runtime PM to operate
> > > > > > > regularly during system sleep.
> > > > > >=20
> > > > > > But this change breaks quite a few assumptions in the core too, so no,
> > > > > > it can't be made.
> > > > >
> > > > > Anything in particular that I can look at? I'm not seeing any issues
> > > > > when I test this, which could of course mean that I'm just getting
> > > > > lucky.
> > > >
> > > > There are races and such that you may never hit during casual testing.
> > > >
> > > > > One thing that irritated me is that I think this used to work. I do
> > > > > recall testing suspend/resume a few years ago and devices would get
> > > > > properly runtime suspended/resumed.
> > > >
> > > > Not true at all.
> > > >
> > > > The PM core has always taken PM-runtime references on all devices pretty much
> > > > since when PM-runtime was introduced.
> > > >
> > > > > I did some digging but couldn't
> > > > > find anything that would have had an impact on this.
> > > > >
> > > > > Given that this is completely opt-in feature, why are you categorically
> > > > > NAK'ing this?
> > > >
> > > > The general problem is that if any device has been touched by system-wide
> > > > suspend code, it should not be subject to PM-runtime any more until the
> > > > subsequent system-wide resume is able to undo whatever the suspend did.
> > > >
> > > > Moreover, if a device is runtime-suspended, the system-wide suspend code
> > > > may mishandle it, in general.  That's why PM-runtime suspend is not allowed
> > > > during system-wide transitions at all.  And it has always been like that.
> > > >
> > > > For a specific platform you may be able to overcome these limitations if
> > > > you are careful enough, but certainly they are there in general and surely
> > > > you cannot prevent people from using your opt-in just because they think
> > > > that they know what they are doing.
> > >
> > > BTW, what if user space prevents PM-runtime from suspending devices by writing
> > > "on" to their "control" files?
> > >
> > > System-wide suspend is (of course) still expected to work in that case, so how
> > > exactly would you overcome that?
> >
> > I suppose one way to overcome that would be to make it an error to write
> > "on" to the "control" files for these devices.
> 
> Seeing suggestions like this in messages from seasoned kernel
> developers is seriously disappointing. :-/
> 
> > Currently doing this is likely going to break display support on Tegra,
> > so this would be a good idea in this case anyway.
> 
> PM-runtime has always allowed user space to prevent devices from being
> suspended and it seems that this has not been taken into account by
> Tegra display support developers at all.
> 
> > Again, I could avoid all of these issues by avoiding runtime PM in this driver,
> 
> I don't quite see the connection here.
> 
> Preventing a device from suspending should never be a functional
> problem.  It may be an energy-efficiency problem, but that's something
> for user space to consider before writing "on" to a device's control
> file.

That's really a question of how you define suspension. In the case of
display drivers we have the somewhat unfortunate situation that in most
SoCs the display "device" is actually represented by a collection of
different devices. On Tegra specifically, for example, you have a couple
of display controllers, then some "encoders" that take pixel streams
from the display controllers and encode them into some wire format like
LVDS, HDMI, DSI or DP.

Prohibiting suspension of any of the individual devices causes problems
because it effectively makes the whole composite display device not
suspendable. Doing so in turn usually means that you can't change the
display configuration anymore because devices need to be powered up and
down in order to change the configuration.

I consider powering up and down the devices a form of suspension. Hence
it seemed natural to implement using runtime PM.

It sounds to me like userspace preventing runtime PM is problematic in
most scenarios that involve composite devices because it makes all of
the interactions between the devices a bit complicated.

> > but I would end up reimplementing some of the same concepts. I'd
> > rather use something that's supported by the PM core and that might be
> > useful to other drivers than reinvent the wheel.
> 
> Which doesn't have to be by using PM-runtime suspend for the handling
> of system-wide suspend, at least in my view.

Well, runtime PM is very convenient for this, though. It would allow the
same code paths to be used in all cases.

Thierry
Daniel Vetter Nov. 29, 2019, 8:27 p.m. UTC | #11
On Fri, Nov 29, 2019 at 01:07:19PM +0100, Thierry Reding wrote:
> On Fri, Nov 29, 2019 at 11:22:08AM +0100, Rafael J. Wysocki wrote:
> > On Fri, Nov 29, 2019 at 11:08 AM Thierry Reding
> > <thierry.reding@gmail.com> wrote:
> > >
> > > On Thu, Nov 28, 2019 at 11:20:01PM +0100, Rafael J. Wysocki wrote:
> > > > On Thursday, November 28, 2019 11:03:57 PM CET Rafael J. Wysocki wrote:
> > > > > On Thursday, November 28, 2019 5:50:26 PM CET Thierry Reding wrote:
> > > > > >
> > > > > > --0F1p//8PRICkK4MW
> > > > > > Content-Type: text/plain; charset=us-ascii
> > > > > > Content-Disposition: inline
> > > > > > Content-Transfer-Encoding: quoted-printable
> > > > > >
> > > > > > On Thu, Nov 28, 2019 at 05:14:51PM +0100, Rafael J. Wysocki wrote:
> > > > > > > On Thu, Nov 28, 2019 at 5:03 PM Thierry Reding <thierry.reding@gmail.com>=
> > > > > >  wrote:
> > > > > > > >
> > > > > > > > From: Thierry Reding <treding@nvidia.com>
> > > > > > > >
> > > > > > > > Currently the driver PM core will automatically acquire a runtime PM
> > > > > > > > reference for devices before system sleep is entered. This is needed
> > > > > > > > to avoid potential issues related to devices' parents getting put to
> > > > > > > > runtime suspend at the wrong time and causing problems with their
> > > > > > > > children.
> > > > > > >=20
> > > > > > > Not only for that.
> > > > > > >=20
> > > > > > > > In some cases drivers are carefully written to avoid such issues and
> > > > > > > > the default behaviour can be changed to allow runtime PM to operate
> > > > > > > > regularly during system sleep.
> > > > > > >=20
> > > > > > > But this change breaks quite a few assumptions in the core too, so no,
> > > > > > > it can't be made.
> > > > > >
> > > > > > Anything in particular that I can look at? I'm not seeing any issues
> > > > > > when I test this, which could of course mean that I'm just getting
> > > > > > lucky.
> > > > >
> > > > > There are races and such that you may never hit during casual testing.
> > > > >
> > > > > > One thing that irritated me is that I think this used to work. I do
> > > > > > recall testing suspend/resume a few years ago and devices would get
> > > > > > properly runtime suspended/resumed.
> > > > >
> > > > > Not true at all.
> > > > >
> > > > > The PM core has always taken PM-runtime references on all devices pretty much
> > > > > since when PM-runtime was introduced.
> > > > >
> > > > > > I did some digging but couldn't
> > > > > > find anything that would have had an impact on this.
> > > > > >
> > > > > > Given that this is completely opt-in feature, why are you categorically
> > > > > > NAK'ing this?
> > > > >
> > > > > The general problem is that if any device has been touched by system-wide
> > > > > suspend code, it should not be subject to PM-runtime any more until the
> > > > > subsequent system-wide resume is able to undo whatever the suspend did.
> > > > >
> > > > > Moreover, if a device is runtime-suspended, the system-wide suspend code
> > > > > may mishandle it, in general.  That's why PM-runtime suspend is not allowed
> > > > > during system-wide transitions at all.  And it has always been like that.
> > > > >
> > > > > For a specific platform you may be able to overcome these limitations if
> > > > > you are careful enough, but certainly they are there in general and surely
> > > > > you cannot prevent people from using your opt-in just because they think
> > > > > that they know what they are doing.
> > > >
> > > > BTW, what if user space prevents PM-runtime from suspending devices by writing
> > > > "on" to their "control" files?
> > > >
> > > > System-wide suspend is (of course) still expected to work in that case, so how
> > > > exactly would you overcome that?
> > >
> > > I suppose one way to overcome that would be to make it an error to write
> > > "on" to the "control" files for these devices.
> > 
> > Seeing suggestions like this in messages from seasoned kernel
> > developers is seriously disappointing. :-/
> > 
> > > Currently doing this is likely going to break display support on Tegra,
> > > so this would be a good idea in this case anyway.
> > 
> > PM-runtime has always allowed user space to prevent devices from being
> > suspended and it seems that this has not been taken into account by
> > Tegra display support developers at all.
> > 
> > > Again, I could avoid all of these issues by avoiding runtime PM in this driver,
> > 
> > I don't quite see the connection here.
> > 
> > Preventing a device from suspending should never be a functional
> > problem.  It may be an energy-efficiency problem, but that's something
> > for user space to consider before writing "on" to a device's control
> > file.
> 
> That's really a question of how you define suspension. In the case of
> display drivers we have the somewhat unfortunate situation that in most
> SoCs the display "device" is actually represented by a collection of
> different devices. On Tegra specifically, for example, you have a couple
> of display controllers, then some "encoders" that take pixel streams
> from the display controllers and encode them into some wire format like
> LVDS, HDMI, DSI or DP.
> 
> Prohibiting suspension of any of the individual devices causes problems
> because it effectively makes the whole composite display device not
> suspendable. Doing so in turn usually means that you can't change the
> display configuration anymore because devices need to be powered up and
> down in order to change the configuration.
> 
> I consider powering up and down the devices a form of suspension. Hence
> it seemed natural to implement using runtime PM.
> 
> It sounds to me like userspace preventing runtime PM is problematic in
> most scenarios that involve composite devices because it makes all of
> the interactions between the devices a bit complicated.

Yeah with the DT model of how a SoC works, all these tiny little devices
are essentially implementation details that userspace really shouldn't
ever care about, much less change anything with them.

If userspace doesn't want the gpu to auto-suspend, then there's the
overall gpu device that it can set that on, and I guess doing that should
not break a decently written driver. For all the others insisting that
userspace can be stupid essentially means we get to hand roll large chunks
of runtime pm in drivers, which feels rather pointless. We have a lot of
that home-grown runtime pm for subcomponents in i915, and I very much
understand why the DT folks wanted to standardize all that with lots of
little explicit devices.

Maybe the mistake was simply allowing these to be visible to userspace.
-Daniel

> > > but I would end up reimplementing some of the same concepts. I'd
> > > rather use something that's supported by the PM core and that might be
> > > useful to other drivers than reinvent the wheel.
> > 
> > Which doesn't have to be by using PM-runtime suspend for the handling
> > of system-wide suspend, at least in my view.
> 
> Well, runtime PM is very convenient for this, though. It would allow the
> same code paths to be used in all cases.
> 
> Thierry



> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Rafael J. Wysocki Dec. 4, 2019, 12:02 a.m. UTC | #12
On Fri, Nov 29, 2019 at 1:07 PM Thierry Reding <thierry.reding@gmail.com> wrote:
>
> On Fri, Nov 29, 2019 at 11:22:08AM +0100, Rafael J. Wysocki wrote:
> >

[cut]

Sorry for the delay.

First off, let me note that I have seen your most recent patches and
thanks for taking the feedback into account, much appreciated!

Nevertheless, I feel that I need to address the below, because it is
really important.

> > Preventing a device from suspending should never be a functional
> > problem.  It may be an energy-efficiency problem, but that's something
> > for user space to consider before writing "on" to a device's control
> > file.
>
> That's really a question of how you define suspension.

In general, yes.

However, if you talk about PM-runtime, there are definitions of
"suspended" and "active" in there already.  Namely, in the PM-runtime
context, "suspended" means "may not be accessible to software" whereas
"active" means "software can access it".

> In the case of
> display drivers we have the somewhat unfortunate situation that in most
> SoCs the display "device" is actually represented by a collection of
> different devices. On Tegra specifically, for example, you have a couple
> of display controllers, then some "encoders" that take pixel streams
> from the display controllers and encode them into some wire format like
> LVDS, HDMI, DSI or DP.
>
> Prohibiting suspension of any of the individual devices causes problems
> because it effectively makes the whole composite display device not
> suspendable.

For PM-runtime, that shouldn't be a problem at all.

PM-runtime is all about (possibly) saving energy by powering down
devices that are not in use.  In particular, It is not about powering
down any devices on demand for any reason other than idleness.
Therefore in PM-runtime a situation in which a given device cannot be
suspended at all is regarded as normal, even though that may not be
desirable for energy-efficiency reasons.  It just means that the
device is in use by somebody all the time.  Moreover, PM-runtime is
designed to make it possible to resume devices at any time (as long as
the hardware works as expected), as soon as they are needed, modulo
some possible delays.  Actually, that's the purpose of a significant
part of the PM-runtime framework.

Accordingly, device drivers may refuse to suspend devices, but
refusing to resume a device is not expected by PM-runtime.

If writing "on" to the "control" file of a device does not cause it to
be resumed (if suspended) and to stay in the "active" meta-state until
"auto" is written to that file, you cannot really claim that
PM-runtime is working correctly on your system.

> Doing so in turn usually means that you can't change the
> display configuration anymore because devices need to be powered up and
> down in order to change the configuration.
>
> I consider powering up and down the devices a form of suspension. Hence
> it seemed natural to implement using runtime PM.

Unfortunately, that's not the case.

The purpose of PM-runtime is to allow idle devices to be put into
power states in which it may not be safe to access them and to make
them go back into the "accessible and responsive" state whenever
software wants/needs to access them in a coordinated fashion.  IOW, it
kind of is a counterpart of CPU idle time management.

> It sounds to me like userspace preventing runtime PM is problematic in
> most scenarios that involve composite devices because it makes all of
> the interactions between the devices a bit complicated.

Even so, that's how it works.

User space can expect to be able to block runtime suspend of devices
at any level of device hierarchy, at least for diagnostics if nothing
else, end the kernel is responsible for ensuring that.

> > > but I would end up reimplementing some of the same concepts. I'd
> > > rather use something that's supported by the PM core and that might be
> > > useful to other drivers than reinvent the wheel.
> >
> > Which doesn't have to be by using PM-runtime suspend for the handling
> > of system-wide suspend, at least in my view.
>
> Well, runtime PM is very convenient for this, though. It would allow the
> same code paths to be used in all cases.

The same low-level power-up and power-down code can be used in all
cases, but PM-runtime is not low-level enough.  It is also
opportunistic, so if you need to power down a device for reasons other
than "natural" idleness, PM-runtime is not the right tool for that
task.

Of course, PM-runtime callbacks can invoke the low-level power-up and
power-down code, but as you said there are reasons for powering down
devices not just because they happen to be idle.  System-wide suspend
is one of them.
diff mbox series

Patch

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index 134a8af51511..f8dbf00c703b 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -1113,7 +1113,8 @@  static void device_complete(struct device *dev, pm_message_t state)
 
 	device_unlock(dev);
 
-	pm_runtime_put(dev);
+	if (!dev->power.always_runtime)
+		pm_runtime_put(dev);
 }
 
 /**
@@ -1896,7 +1897,8 @@  static int device_prepare(struct device *dev, pm_message_t state)
 	 * block runtime suspend here, during the prepare phase, and allow
 	 * it again during the complete phase.
 	 */
-	pm_runtime_get_noresume(dev);
+	if (!dev->power.always_runtime)
+		pm_runtime_get_noresume(dev);
 
 	device_lock(dev);
 
diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 48616f358854..699803984426 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -1440,6 +1440,22 @@  void pm_runtime_allow(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(pm_runtime_allow);
 
+void pm_runtime_always_allow(struct device *dev)
+{
+	spin_lock_irq(&dev->power.lock);
+	dev->power.always_runtime = 1;
+	spin_unlock_irq(&dev->power.lock);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_always_allow);
+
+void pm_runtime_always_forbid(struct device *dev)
+{
+	spin_lock_irq(&dev->power.lock);
+	dev->power.always_runtime = 0;
+	spin_unlock_irq(&dev->power.lock);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_always_forbid);
+
 /**
  * pm_runtime_no_callbacks - Ignore runtime PM callbacks for a device.
  * @dev: Device to handle.
diff --git a/include/linux/pm.h b/include/linux/pm.h
index e057d1fa2469..6133cf496878 100644
--- a/include/linux/pm.h
+++ b/include/linux/pm.h
@@ -615,6 +615,7 @@  struct dev_pm_info {
 	unsigned int		use_autosuspend:1;
 	unsigned int		timer_autosuspends:1;
 	unsigned int		memalloc_noio:1;
+	unsigned int		always_runtime:1;
 	unsigned int		links_count;
 	enum rpm_request	request;
 	enum rpm_status		runtime_status;
diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h
index 22af69d237a6..28204baf01cb 100644
--- a/include/linux/pm_runtime.h
+++ b/include/linux/pm_runtime.h
@@ -46,6 +46,8 @@  extern void pm_runtime_enable(struct device *dev);
 extern void __pm_runtime_disable(struct device *dev, bool check_resume);
 extern void pm_runtime_allow(struct device *dev);
 extern void pm_runtime_forbid(struct device *dev);
+extern void pm_runtime_always_allow(struct device *dev);
+extern void pm_runtime_always_forbid(struct device *dev);
 extern void pm_runtime_no_callbacks(struct device *dev);
 extern void pm_runtime_irq_safe(struct device *dev);
 extern void __pm_runtime_use_autosuspend(struct device *dev, bool use);