Message ID | 20191128160314.2381249-2-thierry.reding@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | PM / runtime: Allow drivers to override runtime PM behaviour on sleep |
On Thu, Nov 28, 2019 at 5:03 PM Thierry Reding <thierry.reding@gmail.com> wrote:
>
> From: Thierry Reding <treding@nvidia.com>
>
> Currently the driver PM core will automatically acquire a runtime PM
> reference for devices before system sleep is entered. This is needed
> to avoid potential issues related to devices' parents getting put to
> runtime suspend at the wrong time and causing problems with their
> children.

Not only for that.

> In some cases drivers are carefully written to avoid such issues and
> the default behaviour can be changed to allow runtime PM to operate
> regularly during system sleep.

But this change breaks quite a few assumptions in the core too, so no,
it can't be made.

Thanks!
On Thu, Nov 28, 2019 at 05:14:51PM +0100, Rafael J. Wysocki wrote:
> On Thu, Nov 28, 2019 at 5:03 PM Thierry Reding <thierry.reding@gmail.com> wrote:
> >
> > From: Thierry Reding <treding@nvidia.com>
> >
> > Currently the driver PM core will automatically acquire a runtime PM
> > reference for devices before system sleep is entered. This is needed
> > to avoid potential issues related to devices' parents getting put to
> > runtime suspend at the wrong time and causing problems with their
> > children.
>
> Not only for that.
>
> > In some cases drivers are carefully written to avoid such issues and
> > the default behaviour can be changed to allow runtime PM to operate
> > regularly during system sleep.
>
> But this change breaks quite a few assumptions in the core too, so no,
> it can't be made.

Anything in particular that I can look at? I'm not seeing any issues
when I test this, which could of course mean that I'm just getting
lucky.

One thing that irritated me is that I think this used to work. I do
recall testing suspend/resume a few years ago and devices would get
properly runtime suspended/resumed. I did some digging but couldn't
find anything that would have had an impact on this.

Given that this is a completely opt-in feature, why are you categorically
NAK'ing this?

Is there some other alternative that I can look into?

Thierry
On Thursday, November 28, 2019 5:50:26 PM CET Thierry Reding wrote:
> On Thu, Nov 28, 2019 at 05:14:51PM +0100, Rafael J. Wysocki wrote:
> > On Thu, Nov 28, 2019 at 5:03 PM Thierry Reding <thierry.reding@gmail.com> wrote:
> > >
> > > From: Thierry Reding <treding@nvidia.com>
> > >
> > > Currently the driver PM core will automatically acquire a runtime PM
> > > reference for devices before system sleep is entered. This is needed
> > > to avoid potential issues related to devices' parents getting put to
> > > runtime suspend at the wrong time and causing problems with their
> > > children.
> >
> > Not only for that.
> >
> > > In some cases drivers are carefully written to avoid such issues and
> > > the default behaviour can be changed to allow runtime PM to operate
> > > regularly during system sleep.
> >
> > But this change breaks quite a few assumptions in the core too, so no,
> > it can't be made.
>
> Anything in particular that I can look at? I'm not seeing any issues
> when I test this, which could of course mean that I'm just getting
> lucky.

There are races and such that you may never hit during casual testing.

> One thing that irritated me is that I think this used to work. I do
> recall testing suspend/resume a few years ago and devices would get
> properly runtime suspended/resumed.

Not true at all.

The PM core has always taken PM-runtime references on all devices pretty much
since when PM-runtime was introduced.

> I did some digging but couldn't
> find anything that would have had an impact on this.
>
> Given that this is a completely opt-in feature, why are you categorically
> NAK'ing this?

The general problem is that if any device has been touched by system-wide
suspend code, it should not be subject to PM-runtime any more until the
subsequent system-wide resume is able to undo whatever the suspend did.

Moreover, if a device is runtime-suspended, the system-wide suspend code
may mishandle it, in general. That's why PM-runtime suspend is not allowed
during system-wide transitions at all. And it has always been like that.

For a specific platform you may be able to overcome these limitations if
you are careful enough, but certainly they are there in general and surely
you cannot prevent people from using your opt-in just because they think
that they know what they are doing.

> Is there some other alternative that I can look into?

First of all, ensure that the dpm_list ordering is what it should be on the
system/platform in question. That can be done with the help of device links.

In addition, make sure that the devices needed to suspend other devices are
suspended in the noirq phase of system-wide suspend and resumed in the
noirq phase of system-wide resume. Or at least all of the other devices
need to be suspended before them and resumed after them.

These two things should allow you to cover the vast majority of cases if
not all of them without messing up with the rules.

Thanks!
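Editorial illustration: the rule stated in the message above (the PM core holds a PM-runtime reference on each device for the duration of a system-wide transition, so runtime suspend cannot happen in between) can be sketched as a toy userspace model. This is not kernel code. The names `toy_dev`, `toy_get`, `toy_put`, `toy_prepare` and `toy_complete` are hypothetical, and the model assumes only a plain usage counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not the kernel's data structures. */
struct toy_dev {
    int usage_count;        /* outstanding runtime PM references */
    bool runtime_suspended; /* current runtime PM status */
};

/* Take a reference and make sure the device is runtime resumed. */
static void toy_get(struct toy_dev *d)
{
    d->usage_count++;
    d->runtime_suspended = false;
}

/* Drop a reference; the device may runtime suspend only at zero. */
static void toy_put(struct toy_dev *d)
{
    assert(d->usage_count > 0);
    if (--d->usage_count == 0)
        d->runtime_suspended = true;
}

/* System-wide suspend, first step: pin the device without resuming it. */
static void toy_prepare(struct toy_dev *d)
{
    d->usage_count++;
}

/* System-wide resume, last step: drop the reference taken in toy_prepare(). */
static void toy_complete(struct toy_dev *d)
{
    toy_put(d);
}
```

In this model, a driver that drops its own last reference between `toy_prepare()` and `toy_complete()` leaves the counter at one, so the device stays runtime active until the transition is over.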
On Thursday, November 28, 2019 11:03:57 PM CET Rafael J. Wysocki wrote:
> On Thursday, November 28, 2019 5:50:26 PM CET Thierry Reding wrote:
> > On Thu, Nov 28, 2019 at 05:14:51PM +0100, Rafael J. Wysocki wrote:
> > > On Thu, Nov 28, 2019 at 5:03 PM Thierry Reding <thierry.reding@gmail.com> wrote:
> > > >
> > > > From: Thierry Reding <treding@nvidia.com>
> > > >
> > > > Currently the driver PM core will automatically acquire a runtime PM
> > > > reference for devices before system sleep is entered. This is needed
> > > > to avoid potential issues related to devices' parents getting put to
> > > > runtime suspend at the wrong time and causing problems with their
> > > > children.
> > >
> > > Not only for that.
> > >
> > > > In some cases drivers are carefully written to avoid such issues and
> > > > the default behaviour can be changed to allow runtime PM to operate
> > > > regularly during system sleep.
> > >
> > > But this change breaks quite a few assumptions in the core too, so no,
> > > it can't be made.
> >
> > Anything in particular that I can look at? I'm not seeing any issues
> > when I test this, which could of course mean that I'm just getting
> > lucky.
>
> There are races and such that you may never hit during casual testing.
>
> > One thing that irritated me is that I think this used to work. I do
> > recall testing suspend/resume a few years ago and devices would get
> > properly runtime suspended/resumed.
>
> Not true at all.
>
> The PM core has always taken PM-runtime references on all devices pretty much
> since when PM-runtime was introduced.
>
> > I did some digging but couldn't
> > find anything that would have had an impact on this.
> >
> > Given that this is a completely opt-in feature, why are you categorically
> > NAK'ing this?
>
> The general problem is that if any device has been touched by system-wide
> suspend code, it should not be subject to PM-runtime any more until the
> subsequent system-wide resume is able to undo whatever the suspend did.
>
> Moreover, if a device is runtime-suspended, the system-wide suspend code
> may mishandle it, in general. That's why PM-runtime suspend is not allowed
> during system-wide transitions at all. And it has always been like that.
>
> For a specific platform you may be able to overcome these limitations if
> you are careful enough, but certainly they are there in general and surely
> you cannot prevent people from using your opt-in just because they think
> that they know what they are doing.

BTW, what if user space prevents PM-runtime from suspending devices by writing
"on" to their "control" files?

System-wide suspend is (of course) still expected to work in that case, so how
exactly would you overcome that?
On Thu, Nov 28, 2019 at 11:03:57PM +0100, Rafael J. Wysocki wrote:
> On Thursday, November 28, 2019 5:50:26 PM CET Thierry Reding wrote:
> > On Thu, Nov 28, 2019 at 05:14:51PM +0100, Rafael J. Wysocki wrote:
> > > On Thu, Nov 28, 2019 at 5:03 PM Thierry Reding <thierry.reding@gmail.com> wrote:
> > > >
> > > > From: Thierry Reding <treding@nvidia.com>
> > > >
> > > > Currently the driver PM core will automatically acquire a runtime PM
> > > > reference for devices before system sleep is entered. This is needed
> > > > to avoid potential issues related to devices' parents getting put to
> > > > runtime suspend at the wrong time and causing problems with their
> > > > children.
> > >
> > > Not only for that.
> > >
> > > > In some cases drivers are carefully written to avoid such issues and
> > > > the default behaviour can be changed to allow runtime PM to operate
> > > > regularly during system sleep.
> > >
> > > But this change breaks quite a few assumptions in the core too, so no,
> > > it can't be made.
> >
> > Anything in particular that I can look at? I'm not seeing any issues
> > when I test this, which could of course mean that I'm just getting
> > lucky.
>
> There are races and such that you may never hit during casual testing.
>
> > One thing that irritated me is that I think this used to work. I do
> > recall testing suspend/resume a few years ago and devices would get
> > properly runtime suspended/resumed.
>
> Not true at all.
>
> The PM core has always taken PM-runtime references on all devices pretty much
> since when PM-runtime was introduced.

You're right. I was finally able to find a toolchain that I could build
an old version of the kernel with. I tested system suspend/resume on the
v4.8 release, which is the first one that had the runtime PM changes as
well as the subsystem suspend/resume support wired up, and I can't see
the runtime PM callbacks invoked during system suspend/resume.

So I must be misremembering, or I'm confusing it with some other tests I
was running at the time.

> > I did some digging but couldn't
> > find anything that would have had an impact on this.
> >
> > Given that this is a completely opt-in feature, why are you categorically
> > NAK'ing this?
>
> The general problem is that if any device has been touched by system-wide
> suspend code, it should not be subject to PM-runtime any more until the
> subsequent system-wide resume is able to undo whatever the suspend did.
>
> Moreover, if a device is runtime-suspended, the system-wide suspend code
> may mishandle it, in general. That's why PM-runtime suspend is not allowed
> during system-wide transitions at all. And it has always been like that.

For this particular use-case the above should all be irrelevant. None of
the drivers involved here do anything special at system suspend, because
runtime suspend already puts the devices into the lowest possible power
state. Basically when these devices are put into runtime suspend, they
are completely turned off. The only exception is for things like HDMI
where the +5V pin remains powered, so that hotplug detection will work.

The runtime PM state of the devices involved is managed by the subsystem
system suspend/resume helpers in DRM/KMS. Basically those helpers turn
off all the devices in the composite device, which ultimately results in
their last runtime PM reference being released. So for system suspend
and resume, these devices aren't touched, other than maybe for the PM
core's internal book-keeping.

> For a specific platform you may be able to overcome these limitations if
> you are careful enough, but certainly they are there in general and surely
> you cannot prevent people from using your opt-in just because they think
> that they know what they are doing.

That's true. But the same thing is true for pretty much all other APIs.
People obviously have to make sure they know what they're doing, just
like they have to with any other API.

I suppose the documentation for this new function is currently lacking a
bit. Perhaps adding a big warning to this and listing the common
pitfalls would help people make the right call about whether or not they
can use this.

> > Is there some other alternative that I can look into?
>
> First of all, ensure that the dpm_list ordering is what it should be on the
> system/platform in question. That can be done with the help of device links.

I don't think we have device links for everything, but the deferred
probe code should take care of ordering the dpm_list correctly because
we do handle deferred probe properly in all cases.

Also, the dpm_list ordering isn't very critical in this case. If the
devices are allowed to runtime suspend during system sleep, the
subsystem sleep helper will put them into runtime suspend at the correct
time. This is propagated all the way through the display pipeline and
that order is ensured by the subsystem helpers.

> In addition, make sure that the devices needed to suspend other devices are
> suspended in the noirq phase of system-wide suspend and resumed in the
> noirq phase of system-wide resume. Or at least all of the other devices
> need to be suspended before them and resumed after them.

We're fine on this front as well. We have run into such issues in the
past, but I don't think there are any such issues left at the moment. I
do have one pending fix for I2C suspend/resume which fixes an issue
where some pinmuxing changes needed to get the HDMI DDC channel to work
were not getting applied during resume.

That I2C issue is related to this, I think. What I'm seeing is that when
the system goes to sleep, the pinmux loses its programming at a
hardware level, but the I2C driver doesn't know about it because it does
not get runtime suspended. At runtime suspend it would switch the pinmux
state to "idle" which would then match the system suspend state. Upon
runtime resume it sets the "default" pinmux state, which will then
restore the register programming.

In the current case where runtime suspend/resume is prohibited during
system sleep, upon resume the I2C driver will assume that the pinmux
state is still "default" and it won't reapply the state (it's actually
the pinmux subsystem that makes this decision) and causes HDMI DDC
transactions to time out.

One simple fix for that is to use pm_runtime_force_suspend() and
pm_runtime_force_resume() as system suspend/resume callbacks to make
sure the I2C controller is runtime suspended/resumed during system
sleep.

Note that forcing runtime suspend/resume this way is suboptimal in the
DRM/KMS case because the suspend/resume happens disconnected from the
subsystem suspend/resume callbacks, which is not desired as that breaks
some of the assumptions in those callbacks.

> These two things should allow you to cover the vast majority of cases if
> not all of them without messing up with the rules.

One alternative that I had thought about was to just ditch the runtime
PM callbacks for this. However, there's one corner case where this may
break. On early Tegra generations, the two display controllers are
"coupled" in that the second one doesn't work if the first one is
disabled. We describe that using a device link from the second to the
first controller. This causes the first controller to be automatically
runtime resumed when the second controller is used. This only works
via runtime PM, so if I don't use runtime PM I'd have to add special
handling for that case.

Actually, there's another problem as well. Most of these devices use
generic PM domains to power on/off the SoC partitions that they're in.
If I side-step runtime PM, then I'd have to somehow find a way to
explicitly control the PM domains.

Another alternative would be to have a kind of hybrid approach where I
leave runtime PM calls in the drivers, but disconnect the runtime PM
callback implementations from that. That would at least fix the issue
with the generic PM domains.

However, it would not fix the problem with coupled display controllers
because empty runtime PM callbacks wouldn't actually power up the first
display controller when it is needed by the second controller. I would
have to add infrastructure that basically duplicates some of runtime PM
to fix that.

So the bottom line is that runtime PM is still the best solution for
this problem. It works really nicely and is very consistent.

Do you think adding better documentation to this new flag and the
accessors would help remove your concerns about this?

Thierry
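Editorial illustration: the I2C pinmux problem described in this message comes down to a cached software state that no longer matches the hardware after system sleep. The toy userspace model below sketches that failure mode. It is not the pinctrl subsystem's API: `toy_pinmux`, `toy_select` and `toy_power_loss` are hypothetical names, and the only assumption taken from the message is that selecting the already-selected state is skipped.

```c
#include <assert.h>
#include <string.h>

/* Toy model only: hypothetical names, not the pinctrl subsystem's API. */
struct toy_pinmux {
    const char *cached; /* state the software believes is selected */
    const char *hw;     /* state actually programmed in hardware */
};

/* Select a state; skip the register write if it is already cached. */
static int toy_select(struct toy_pinmux *p, const char *state)
{
    if (strcmp(p->cached, state) == 0)
        return 0; /* nothing to do, hardware is not touched */
    p->cached = state;
    p->hw = state;
    return 1; /* hardware was reprogrammed */
}

/* System sleep wipes the hardware programming behind the software's back. */
static void toy_power_loss(struct toy_pinmux *p)
{
    p->hw = "reset";
}
```

If the state is still cached as "default" when power is lost, re-selecting "default" on resume does nothing and the hardware stays unprogrammed. If the state was switched to "idle" first, the later switch back to "default" reprograms the hardware, which is the effect the message attributes to the runtime suspend/resume path.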
On Thu, Nov 28, 2019 at 11:20:01PM +0100, Rafael J. Wysocki wrote:
> On Thursday, November 28, 2019 11:03:57 PM CET Rafael J. Wysocki wrote:
> > On Thursday, November 28, 2019 5:50:26 PM CET Thierry Reding wrote:
> > > On Thu, Nov 28, 2019 at 05:14:51PM +0100, Rafael J. Wysocki wrote:
> > > > On Thu, Nov 28, 2019 at 5:03 PM Thierry Reding <thierry.reding@gmail.com> wrote:
> > > > >
> > > > > From: Thierry Reding <treding@nvidia.com>
> > > > >
> > > > > Currently the driver PM core will automatically acquire a runtime PM
> > > > > reference for devices before system sleep is entered. This is needed
> > > > > to avoid potential issues related to devices' parents getting put to
> > > > > runtime suspend at the wrong time and causing problems with their
> > > > > children.
> > > >
> > > > Not only for that.
> > > >
> > > > > In some cases drivers are carefully written to avoid such issues and
> > > > > the default behaviour can be changed to allow runtime PM to operate
> > > > > regularly during system sleep.
> > > >
> > > > But this change breaks quite a few assumptions in the core too, so no,
> > > > it can't be made.
> > >
> > > Anything in particular that I can look at? I'm not seeing any issues
> > > when I test this, which could of course mean that I'm just getting
> > > lucky.
> >
> > There are races and such that you may never hit during casual testing.
> >
> > > One thing that irritated me is that I think this used to work. I do
> > > recall testing suspend/resume a few years ago and devices would get
> > > properly runtime suspended/resumed.
> >
> > Not true at all.
> >
> > The PM core has always taken PM-runtime references on all devices pretty much
> > since when PM-runtime was introduced.
> >
> > > I did some digging but couldn't
> > > find anything that would have had an impact on this.
> > >
> > > Given that this is a completely opt-in feature, why are you categorically
> > > NAK'ing this?
> >
> > The general problem is that if any device has been touched by system-wide
> > suspend code, it should not be subject to PM-runtime any more until the
> > subsequent system-wide resume is able to undo whatever the suspend did.
> >
> > Moreover, if a device is runtime-suspended, the system-wide suspend code
> > may mishandle it, in general. That's why PM-runtime suspend is not allowed
> > during system-wide transitions at all. And it has always been like that.
> >
> > For a specific platform you may be able to overcome these limitations if
> > you are careful enough, but certainly they are there in general and surely
> > you cannot prevent people from using your opt-in just because they think
> > that they know what they are doing.
>
> BTW, what if user space prevents PM-runtime from suspending devices by writing
> "on" to their "control" files?
>
> System-wide suspend is (of course) still expected to work in that case, so how
> exactly would you overcome that?

I suppose one way to overcome that would be to make it an error to write
"on" to the "control" files for these devices.

Currently doing this is likely going to break display support on Tegra,
so this would be a good idea in this case anyway.

Again, I could avoid all of these issues by avoiding runtime PM in this
driver, but I would end up reimplementing some of the same concepts. I'd
rather use something that's supported by the PM core and that might be
useful to other drivers than reinvent the wheel.

Thierry
On Fri, Nov 29, 2019 at 10:34 AM Thierry Reding <thierry.reding@gmail.com> wrote:
>
> On Thu, Nov 28, 2019 at 11:03:57PM +0100, Rafael J. Wysocki wrote:
> > On Thursday, November 28, 2019 5:50:26 PM CET Thierry Reding wrote:
> > > On Thu, Nov 28, 2019 at 05:14:51PM +0100, Rafael J. Wysocki wrote:
> > > > On Thu, Nov 28, 2019 at 5:03 PM Thierry Reding <thierry.reding@gmail.com> wrote:
> > > > >
> > > > > From: Thierry Reding <treding@nvidia.com>
> > > > >
> > > > > Currently the driver PM core will automatically acquire a runtime PM
> > > > > reference for devices before system sleep is entered. This is needed
> > > > > to avoid potential issues related to devices' parents getting put to
> > > > > runtime suspend at the wrong time and causing problems with their
> > > > > children.
> > > >
> > > > Not only for that.
> > > >
> > > > > In some cases drivers are carefully written to avoid such issues and
> > > > > the default behaviour can be changed to allow runtime PM to operate
> > > > > regularly during system sleep.
> > > >
> > > > But this change breaks quite a few assumptions in the core too, so no,
> > > > it can't be made.
> > >
> > > Anything in particular that I can look at? I'm not seeing any issues
> > > when I test this, which could of course mean that I'm just getting
> > > lucky.
> >
> > There are races and such that you may never hit during casual testing.
> >
> > > One thing that irritated me is that I think this used to work. I do
> > > recall testing suspend/resume a few years ago and devices would get
> > > properly runtime suspended/resumed.
> >
> > Not true at all.
> >
> > The PM core has always taken PM-runtime references on all devices pretty much
> > since when PM-runtime was introduced.
>
> You're right. I was finally able to find a toolchain that I could build
> an old version of the kernel with. I tested system suspend/resume on the
> v4.8 release, which is the first one that had the runtime PM changes as
> well as the subsystem suspend/resume support wired up, and I can't see
> the runtime PM callbacks invoked during system suspend/resume.
>
> So I must be misremembering, or I'm confusing it with some other tests I
> was running at the time.
>
> > > I did some digging but couldn't
> > > find anything that would have had an impact on this.
> > >
> > > Given that this is a completely opt-in feature, why are you categorically
> > > NAK'ing this?
> >
> > The general problem is that if any device has been touched by system-wide
> > suspend code, it should not be subject to PM-runtime any more until the
> > subsequent system-wide resume is able to undo whatever the suspend did.
> >
> > Moreover, if a device is runtime-suspended, the system-wide suspend code
> > may mishandle it, in general. That's why PM-runtime suspend is not allowed
> > during system-wide transitions at all. And it has always been like that.
>
> For this particular use-case the above should all be irrelevant. None of
> the drivers involved here do anything special at system suspend, because
> runtime suspend already puts the devices into the lowest possible power
> state. Basically when these devices are put into runtime suspend, they
> are completely turned off. The only exception is for things like HDMI
> where the +5V pin remains powered, so that hotplug detection will work.
>
> The runtime PM state of the devices involved is managed by the subsystem
> system suspend/resume helpers in DRM/KMS. Basically those helpers turn
> off all the devices in the composite device, which ultimately results in
> their last runtime PM reference being released. So for system suspend
> and resume, these devices aren't touched, other than maybe for the PM
> core's internal book-keeping.

OK, so you actually want system-wide PM to work like PM-runtime on the
platform in question, but there are substantial differences.

First, PM-runtime suspend can be effectively disabled by user space and
system-wide suspend is always expected to work.

Second, if system wakeup devices are involved, their handling during
system-wide suspend depends on the return value of device_may_wakeup()
which depends on what user space does, whereas PM-runtime assumes device
wakeup to be always enabled.

> > For a specific platform you may be able to overcome these limitations if
> > you are careful enough, but certainly they are there in general and surely
> > you cannot prevent people from using your opt-in just because they think
> > that they know what they are doing.
>
> That's true. But the same thing is true for pretty much all other APIs.
> People obviously have to make sure they know what they're doing, just
> like they have to with any other API.
>
> I suppose the documentation for this new function is currently lacking a
> bit. Perhaps adding a big warning to this and listing the common
> pitfalls would help people make the right call about whether or not they
> can use this.

And then *somebody* would have to chase a ton of subtle issues resulting
from that. No, thanks, but no thanks.

> > > Is there some other alternative that I can look into?
> >
> > First of all, ensure that the dpm_list ordering is what it should be on the
> > system/platform in question. That can be done with the help of device links.
>
> I don't think we have device links for everything, but the deferred
> probe code should take care of ordering the dpm_list correctly because
> we do handle deferred probe properly in all cases.
>
> Also, the dpm_list ordering isn't very critical in this case. If the
> devices are allowed to runtime suspend during system sleep, the
> subsystem sleep helper will put them into runtime suspend at the correct
> time. This is propagated all the way through the display pipeline and
> that order is ensured by the subsystem helpers.

You are still not saying what happens if user space doesn't allow
PM-runtime to suspend the devices (by writing "on" to their "control"
files).

> > In addition, make sure that the devices needed to suspend other devices are
> > suspended in the noirq phase of system-wide suspend and resumed in the
> > noirq phase of system-wide resume. Or at least all of the other devices
> > need to be suspended before them and resumed after them.
>
> We're fine on this front as well. We have run into such issues in the
> past, but I don't think there are any such issues left at the moment. I
> do have one pending fix for I2C suspend/resume which fixes an issue
> where some pinmuxing changes needed to get the HDMI DDC channel to work
> were not getting applied during resume.
>
> That I2C issue is related to this, I think. What I'm seeing is that when
> the system goes to sleep, the pinmux loses its programming at a
> hardware level, but the I2C driver doesn't know about it because it does
> not get runtime suspended.

Well, no, that's not the reason.

The real reason is that the handling of that device during system-wide
suspend does not follow the rules followed by PM-runtime for it.

Switching system-wide PM over to PM-runtime to address that is not going
to work, because PM-runtime is not mandatory and system-wide PM is.

> At runtime suspend it would switch the pinmux
> state to "idle" which would then match the system suspend state. Upon
> runtime resume it sets the "default" pinmux state, which will then
> restore the register programming.

So this logic needs to be implemented in the system-wide suspend flow as
well.

> In the current case where runtime suspend/resume is prohibited during

Runtime suspend is, runtime resume isn't until the "late" suspend phase.

> system sleep, upon resume the I2C driver will assume that the pinmux
> state is still "default" and it won't reapply the state (it's actually
> the pinmux subsystem that makes this decision) and causes HDMI DDC
> transactions to time out.

So this is a bug in the system-wide suspend/resume flow that needs to be
addressed, but not by switching it over to PM-runtime.

> One simple fix for that is to use pm_runtime_force_suspend() and
> pm_runtime_force_resume() as system suspend/resume callbacks to make
> sure the I2C controller is runtime suspended/resumed during system
> sleep.
>
> Note that forcing runtime suspend/resume this way is suboptimal in the
> DRM/KMS case because the suspend/resume happens disconnected from the
> subsystem suspend/resume callbacks, which is not desired as that breaks
> some of the assumptions in those callbacks.

So there needs to be another way. Have you looked at
DPM_FLAG_SMART_SUSPEND?

> > These two things should allow you to cover the vast majority of cases if
> > not all of them without messing up with the rules.
>
> One alternative that I had thought about was to just ditch the runtime
> PM callbacks for this. However, there's one corner case where this may
> break. On early Tegra generations, the two display controllers are
> "coupled" in that the second one doesn't work if the first one is
> disabled. We describe that using a device link from the second to the
> first controller. This causes the first controller to be automatically
> runtime resumed when the second controller is used. This only works
> via runtime PM, so if I don't use runtime PM I'd have to add special
> handling for that case.

Runtime resume during system-wide suspend and resume is basically fine
unless you try to do it in the "late" suspend phase or later, but that
limitation is kind of artificial. [I was talking about that at the LPC
this year.]

It basically cannot be carried out in the part of system-wide suspend
after the core regards the device and its parent etc as "suspended", but
the definition of that may be adjusted IMO.

And using PM-runtime resume during system-wide resume may be fine too,
basically (as long as the ordering of that does not lead to any kind of
loop dependencies).

On the other hand, there is *zero* need for runtime suspend during
system-wide transitions and it is known to be problematic.

> Actually, there's another problem as well. Most of these devices use
> generic PM domains to power on/off the SoC partitions that they're in.
> If I side-step runtime PM, then I'd have to somehow find a way to
> explicitly control the PM domains.

That's a problem with genpd, I'd say.

> Another alternative would be to have a kind of hybrid approach where I
> leave runtime PM calls in the drivers, but disconnect the runtime PM
> callback implementations from that. That would at least fix the issue
> with the generic PM domains.
>
> However, it would not fix the problem with coupled display controllers
> because empty runtime PM callbacks wouldn't actually power up the first
> display controller when it is needed by the second controller. I would
> have to add infrastructure that basically duplicates some of runtime PM
> to fix that.
>
> So the bottom line is that runtime PM is still the best solution for
> this problem. It works really nicely and is very consistent.
>
> Do you think adding better documentation to this new flag and the
> accessors would help remove your concerns about this?

No, it wouldn't.

Also your arguments are mostly about PM-runtime resume, which is a
different story.
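Editorial illustration: the first difference named in this message (user space can effectively disable PM-runtime suspend, while system-wide suspend must always work) can be sketched with a toy userspace model. This is not kernel code. `toy_ctl_dev`, `toy_runtime_suspend` and `toy_system_suspend` are hypothetical names, and the `user_forbids` field stands in for "on" having been written to the device's "control" file.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not the kernel's data structures. */
struct toy_ctl_dev {
    int usage_count;   /* outstanding runtime PM references */
    bool user_forbids; /* user space wrote "on" to the control file */
    bool powered;      /* whether the device is currently powered */
};

/* Runtime PM path: may power down only if allowed and unused. */
static bool toy_runtime_suspend(struct toy_ctl_dev *d)
{
    if (d->user_forbids || d->usage_count > 0)
        return false; /* optional path: refusing is legitimate */
    d->powered = false;
    return true;
}

/* System-wide path: mandatory, so it must not depend on the check above. */
static void toy_system_suspend(struct toy_ctl_dev *d)
{
    d->powered = false;
}
```

In this model, a system sleep flow that relied only on `toy_runtime_suspend()` would leave the device powered whenever `user_forbids` is set, which is why a separate, unconditional system-wide path is needed.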
On Fri, Nov 29, 2019 at 11:08 AM Thierry Reding <thierry.reding@gmail.com> wrote: > > On Thu, Nov 28, 2019 at 11:20:01PM +0100, Rafael J. Wysocki wrote: > > On Thursday, November 28, 2019 11:03:57 PM CET Rafael J. Wysocki wrote: > > > On Thursday, November 28, 2019 5:50:26 PM CET Thierry Reding wrote: > > > > > > > > --0F1p//8PRICkK4MW > > > > Content-Type: text/plain; charset=us-ascii > > > > Content-Disposition: inline > > > > Content-Transfer-Encoding: quoted-printable > > > > > > > > On Thu, Nov 28, 2019 at 05:14:51PM +0100, Rafael J. Wysocki wrote: > > > > > On Thu, Nov 28, 2019 at 5:03 PM Thierry Reding <thierry.reding@gmail.com>= > > > > wrote: > > > > > > > > > > > > From: Thierry Reding <treding@nvidia.com> > > > > > > > > > > > > Currently the driver PM core will automatically acquire a runtime PM > > > > > > reference for devices before system sleep is entered. This is needed > > > > > > to avoid potential issues related to devices' parents getting put to > > > > > > runtime suspend at the wrong time and causing problems with their > > > > > > children. > > > > >=20 > > > > > Not only for that. > > > > >=20 > > > > > > In some cases drivers are carefully written to avoid such issues and > > > > > > the default behaviour can be changed to allow runtime PM to operate > > > > > > regularly during system sleep. > > > > >=20 > > > > > But this change breaks quite a few assumptions in the core too, so no, > > > > > it can't be made. > > > > > > > > Anything in particular that I can look at? I'm not seeing any issues > > > > when I test this, which could of course mean that I'm just getting > > > > lucky. > > > > > > There are races and such that you may never hit during casual testing. > > > > > > > One thing that irritated me is that I think this used to work. I do > > > > recall testing suspend/resume a few years ago and devices would get > > > > properly runtime suspended/resumed. > > > > > > Not true at all. 
> > > > > > The PM core has always taken PM-runtime references on all devices pretty much > > > since when PM-runtime was introduced. > > > > > > > I did some digging but couldn't > > > > find anything that would have had an impact on this. > > > > > > > > Given that this is completely opt-in feature, why are you categorically > > > > NAK'ing this? > > > > > > The general problem is that if any device has been touched by system-wide > > > suspend code, it should not be subject to PM-runtime any more until the > > > subsequent system-wide resume is able to undo whatever the suspend did. > > > > > > Moreover, if a device is runtime-suspended, the system-wide suspend code > > > may mishandle it, in general. That's why PM-runtime suspend is not allowed > > > during system-wide transitions at all. And it has always been like that. > > > > > > For a specific platform you may be able to overcome these limitations if > > > you are careful enough, but certainly they are there in general and surely > > > you cannot prevent people from using your opt-in just because they think > > > that they know what they are doing. > > > > BTW, what if user space prevents PM-runtime from suspending devices by writing > > "on" to their "control" files? > > > > System-wide suspend is (of course) still expected to work in that case, so how > > exactly would you overcome that? > > I suppose one way to overcome that would be to make it an error to write > "on" to the "control" files for these devices. Seeing suggestions like this in messages from seasoned kernel developers is seriously disappointing. :-/ > Currently doing this is likely going to break display support on Tegra, > so this would be a good idea in this case anyway. PM-runtime has always allowed user space to prevent devices from being suspended and it seems that this has not been taken into account by Tegra display support developers at all. 
> Again, I could avoid all of these issues by avoiding runtime PM in this driver, I don't quite see the connection here. Preventing a device from suspending should never be a functional problem. It may be an energy-efficiency problem, but that's something for user space to consider before writing "on" to a device's control file. > but I would end up reimplementing some of the same concepts. I'd > rather use something that's supported by the PM core and that might be > useful to other drivers than reinvent the wheel. Which doesn't have to be by using PM-runtime suspend for the handling of system-wide suspend, at least in my view.
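The interaction described above can be illustrated with a toy model. The sketch below is not kernel code: `toy_dev` and every `toy_*` function are hypothetical names introduced only for illustration. The single assumption is that runtime suspend is refused while a usage count is nonzero, which is how both the PM core's reference during system sleep and a user-space write of "on" to the "control" file are modelled here.

```c
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_dev {
	int usage_count;	/* references that keep the device active */
	bool suspended;		/* true while runtime-suspended */
};

/* Runtime suspend succeeds only when nobody holds a reference. */
bool toy_try_runtime_suspend(struct toy_dev *d)
{
	if (d->usage_count > 0)
		return false;
	d->suspended = true;
	return true;
}

/* Models user space writing "on": resume the device and pin it active. */
void toy_control_on(struct toy_dev *d)
{
	d->usage_count++;
	d->suspended = false;
}

/* Models user space writing "auto": drop the pin taken by "on". */
void toy_control_auto(struct toy_dev *d)
{
	d->usage_count--;
}

/* Models the PM core taking a reference before system sleep. */
void toy_system_prepare(struct toy_dev *d)
{
	d->usage_count++;
}

/* Models the PM core dropping that reference after system resume. */
void toy_system_complete(struct toy_dev *d)
{
	d->usage_count--;
}
```

In this model, a system-wide suspend path that relied on toy_try_runtime_suspend() would do nothing for a device pinned by "on", which is the case the system-wide callbacks would have to handle on their own.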
On Fri, Nov 29, 2019 at 11:09:26AM +0100, Rafael J. Wysocki wrote: > On Fri, Nov 29, 2019 at 10:34 AM Thierry Reding > <thierry.reding@gmail.com> wrote: > > > > On Thu, Nov 28, 2019 at 11:03:57PM +0100, Rafael J. Wysocki wrote: > > > On Thursday, November 28, 2019 5:50:26 PM CET Thierry Reding wrote: > > > > > > > > On Thu, Nov 28, 2019 at 05:14:51PM +0100, Rafael J. Wysocki wrote: > > > > > On Thu, Nov 28, 2019 at 5:03 PM Thierry Reding <thierry.reding@gmail.com> wrote: > > > > > > > > > > > > From: Thierry Reding <treding@nvidia.com> > > > > > > > > > > > > Currently the driver PM core will automatically acquire a runtime PM > > > > > > reference for devices before system sleep is entered. This is needed > > > > > > to avoid potential issues related to devices' parents getting put to > > > > > > runtime suspend at the wrong time and causing problems with their > > > > > > children. > > > > > > > > > > Not only for that. > > > > > > > > > > > In some cases drivers are carefully written to avoid such issues and > > > > > > the default behaviour can be changed to allow runtime PM to operate > > > > > > regularly during system sleep. > > > > > > > > > > But this change breaks quite a few assumptions in the core too, so no, > > > > > it can't be made. > > > > > > > > Anything in particular that I can look at? I'm not seeing any issues > > > > when I test this, which could of course mean that I'm just getting > > > > lucky. > > > > > > There are races and such that you may never hit during casual testing. > > > > > > > One thing that irritated me is that I think this used to work. I do > > > > recall testing suspend/resume a few years ago and devices would get > > > > properly runtime suspended/resumed. > > > > > > Not true at all. 
> > > > > > The PM core has always taken PM-runtime references on all devices pretty much > > > since when PM-runtime was introduced. > > > > You're right. I was finally able to find a toolchain that I could build > > an old version of the kernel with. I tested system suspend/resume on the > > v4.8 release, which is the first one that had the runtime PM changes as > > well as the subsystem suspend/resume support wired up, and I can't see > > the runtime PM callbacks invoked during system suspend/resume. > > > > So I must be misremembering, or I'm confusing it with some other tests I > > was running at the time. > > > > > > I did some digging but couldn't > > > > find anything that would have had an impact on this. > > > > > > > > Given that this is completely opt-in feature, why are you categorically > > > > NAK'ing this? > > > > > > The general problem is that if any device has been touched by system-wide > > > suspend code, it should not be subject to PM-runtime any more until the > > > subsequent system-wide resume is able to undo whatever the suspend did. > > > > > > Moreover, if a device is runtime-suspended, the system-wide suspend code > > > may mishandle it, in general. That's why PM-runtime suspend is not allowed > > > during system-wide transitions at all. And it has always been like that. > > > > For this particular use-case the above should all be irrelevant. None of > > the drivers involved here do anything special at system suspend, because > > runtime suspend already puts the devices into the lowest possible power > > state. Basically when these devices are put into runtime suspend, they > > are completely turned off. The only exception is for things like HDMI > > where the +5V pin remains powered, so that hotplug detection will work. > > > > The runtime PM state of the devices involved is managed by the subsystem > > system suspend/resume helpers in DRM/KMS. 
Basically those helpers turn > > off all the devices in the composite device, which ultimately results in > > their last runtime PM reference being released. So for system suspend > > and resume, these devices aren't touched, other than maybe for the PM > > core's internal book-keeping. > > OK, so you actually want system-wide PM to work like PM-runtime on the > platform in question, but there are substantial differences. That's not exactly what I'm trying to do here. If this was all I wanted to do I could simply use UNIVERSAL_DEV_PM_OPS. What I want to do is basically allow the system-wide PM of the subsystem to control the runtime PM of the devices involved. > First, PM-runtime suspend can be effectively disabled by user space > and system-wide suspend is always expected to work. > > Second, if system wakeup devices are involved, their handling during > system-wide suspend depends on the return value of device_may_wakeup() > which depends on what user space does, whereas PM-runtime assumes > device wakeup to be always enabled. > > > > For a specific platform you may be able to overcome these limitations if > > > you are careful enough, but certainly they are there in general and surely > > > you cannot prevent people from using your opt-in just because they think > > > that they know what they are doing. > > > > That's true. But the same thing is true for pretty much all other APIs. > > People obviously have to make sure they know what they're doing, just > > like they have to with any other API. > > > > I suppose the documentation for this new function is currently lacking a > > bit. Perhaps adding a big warning to this and listing the common > > pitfalls would help people make the right call about whether or not they > > can use this. > > And then *somebody* would have to chase a ton of subtle issues > resulting from that. No, thanks, but no thanks. 
If the kerneldoc makes it clear that they're only supposed to use this when they exactly know that it's safe to do, I don't think anybody is going to put the blame on you to fix their bugs. If using this breaks, it's clearly wrong to use it. > > > > Is there some other alternative that I can look into? > > > > > > First of all, ensure that the dpm_list ordering is what it should be on the > > > system/platform in question. That can be done with the help of device links. > > > > I don't think we have device links for everything, but the deferred > > probe code should take care of ordering the dpm_list correctly because > > we do handle deferred probe properly in all cases. > > > > Also, the dpm_list ordering isn't very critical in this case. If the > > devices are allowed to runtime suspend during system sleep, the > > subsystem sleep helper will put them into runtime suspend at the correct > > time. This is propagated all the way through the display pipeline and > > that order is ensured by the subsystem helpers. > > You are still not saying what happens if user space doesn't allow > PM-runtime to suspend the devices (by writing "on" to their "control" > files). I was suggesting that we prohibit that, which you clearly didn't like. You didn't give any reasons for why you think this is a bad idea, but the alternative would be to implement some driver-specific equivalent of that. At that point, does it really matter whether the user is prevented from prohibiting suspend via runtime PM or some non-standard mechanism with a different name but equivalent functionality? The fact is that in order to properly use the device we need to be able to suspend it. We need to do this to switch video modes anyway. There's simply no way to make the display work reliably without it going into suspend and resuming. Whether we call this runtime suspend/resume or something driver-specific is really just an implementation detail. The consequences for userspace are exactly the same. 
So if you think that allowing userspace to prohibit runtime suspend is imperative always, then I don't have much choice but to do it without runtime PM. > > > In addition, make sure that the devices needed to suspend other devices are > > > suspended in the noirq phase of system-wide suspend and resumed in the > > > noirq phase of system-wide resume. Or at least all of the other devices > > > need to be suspended before them and resumed after them. > > > > We're fine on this front as well. We have run into such issues in the > > past, but I don't think there are any such issue left at the moment. I > > do have one pending fix for I2C suspend/resume which fixes an issue > > where some pinmuxing changes needed to get the HDMI DDC channel to work > > were not getting applied during resume. > > > > That I2C issue is related to this, I think. What I'm seeing is that when > > the system goes to sleep, the pinmux loses its programming at a > > hardware level, but the I2C driver doesn't know about it because it does > > not get runtime suspended. > > Well, no, that's not the reason. The real reason is that the handling > of that device during system-wide suspend does not follow the rules > followed by PM-runtime for it. > > Switching system-wide PM over to PM-runtime to address that is not > going to work, because PM-runtime is not mandatory and system-wide PM > is. > > > At runtime suspend it would switch the pinmux > > state to "idle" which would then match the system suspend state. Upon > > runtime resume it sets the "default" pinmux state, which will then > > restore the register programming. > > So this logic needs to be implemented in the system-wide suspend flow as well. I suppose one other alternative would be to use universal PM ops for this case. In this case we actually do want the same behaviour at system sleep as we do for runtime PM. 
> > In the current case where runtime suspend/resume is prohibited during > > Runtime suspend is, runtime resume isn't until the "late" suspend phase. > > > system sleep, upon resume the I2C driver will assume that the pinmux > > state is still "default" and it won't reapply the state (it's actually > > the pinmux subsystem that makes this decision) and causes HDMI DDC > > transactions to time out. > > So this is a bug in the system-wide suspend/resume flow that needs to > be addressed, but not by switching it over to PM-runtime. > > > One simple fix for that is to use pm_runtime_force_suspend() and > > pm_runtime_force_resume() as system suspend/resume callbacks to make > > sure the I2C controller is runtime suspended/resumed during system > > sleep. > > > > Note that forcing runtime suspend/resume this way is suboptimal in the > > DRM/KMS case because the suspend/resume happens disconnected from the > > subsystem suspend/resume callbacks, which is not desired as that breaks > > some of the assumptions in those callbacks. > > So there needs to be another way. > > Have you looked at DPM_FLAG_SMART_SUSPEND? I'll look at that. It seems like it could do the trick for the I2C problem I'm seeing. Generally, though, what I keep noticing here is that for many devices there is some commonality between runtime PM and system sleep. Actually for some devices they are exactly the same, which I guess is one of the reasons why I had hoped we could somehow simplify things by having runtime PM on one hand and then if system sleep doesn't need anything other than what runtime PM already does, we could just do runtime PM all the time. That way we could avoid all the duplication. I guess that's mostly what universal PM ops are about. I'm not exactly sure how that would work during system resume, though. Would the PM core not invoke the same callback twice, once for system resume and then again (after allowing runtime PM again) for resume runtime? 
> > > These two things should allow you to cover the vast majority of cases if > > > not all of them without messing up with the rules. > > > > One alternative that I had thought about was to just ditch the runtime > > PM callbacks for this. However, there's one corner case where this may > > break. On early Tegra generations, the two display controllers are > > "coupled" in that the second one doesn't work if the first one is > > disabled. We describe that using a device link from the second to the > > first controller. This causes the first controller to be automatically > > be runtime resumed when the second controller is used. This only works > > via runtime PM, so if I don't use runtime PM I'd have to add special > > handling for that case. > > Runtime resume during system-wide suspend and resume is basically fine > unless you try to do it in the "late" suspend phase or later, but that > limitation is kind of artificial. [I was talking about that at the > LPC this year.] It basically cannot be carried out in the part of > system-wide suspend after the core regards the device and its parent > etc as "suspended", but the definition of that may be adjusted IMO. > > And using PM-runtime resume during system-wide resume may be fine too, > basically (as long as the ordering of that is not lead to any kind of > loop dependencies). > > On the other hand, there is *zero* need for runtime suspend during > system-wide transitions and it is known problematic. I don't quite understand this. I don't see a need to runtime resume during suspend, because you're actually trying to suspend devices. Most of the time at least. I get that in some cases you may need to resume devices in order to help put other devices (that depend on them) into suspend, but most of the time the goal is to set devices into some low power state so that when the system is asleep you consume less power than in the active state. 
This is more or less the same thing that you want with runtime PM as well, isn't it? So why do you say that there's no need for runtime suspend during system suspend? I always figured that runtime suspend was sort of a soft system suspend in that system suspend may be more aggressive, and mostly a superset of runtime suspend. So in pseudocode it would be roughly:

	runtime_suspend(device)
	{
		set_low_power_mode(device);
	}

	system_suspend(device)
	{
		save_context(device);
		runtime_suspend(device);
		power_off(device);
	}

And for resume you could basically just call these in reverse order:

	runtime_resume(device)
	{
		set_normal_mode(device);
	}

	system_resume(device)
	{
		power_on(device);
		runtime_resume(device);
		restore_context(device);
	}

I understand that this may not be true for all devices. However, in some cases we may even want to go further and do at runtime_suspend() what we do in system_suspend() because the impact may be low enough and the power savings worth it. > > Actually, there's another problem as well. Most of these devices use > > generic PM domains to power on/off the SoC partitions that they're in. > > If I side-step runtime PM, then I'd have to somehow find a way to > > explicitly control the PM domains. > > That's a problem with genpd, I'd say. Fair enough. So far we've tried to implement things such that they work within the existing infrastructure, but if runtime PM turns out not to be what we actually need, maybe we need to just move to something different. > > Another alternative would be to have a kind of hybrid approach where I > > leave runtime PM calls in the drivers, but disconnect the runtime PM > > callback implementations from that. That would at least fix the issue > > with the generic PM domains. > > > > However, it would not fix the problem with coupled display controllers > > because empty runtime PM callbacks wouldn't actually power up the first > > display controller when it is needed by the second controller. 
I would > > have to add infrastructure that basically duplicates some of runtime PM > > to fix that. > > > > So the bottom line is that runtime PM is still the best solution for > > this problem. It works really nice and is very consistent. > > > > Do you think adding better documentation to this new flag and the > > accessors would help remove your concerns about this? > > No, it wouldn't. > > Also your arguments are mostly about PM-runtime resume, which is a > different story. What makes you say that? I'm equally concerned about runtime suspend because runtime resume alone is not good enough to resume a device that was never suspended. Runtime suspend will typically assert a reset for these devices and runtime resume will then deassert the reset. This is necessary to get the devices into a proper working state. Thierry
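The layering from the pseudocode earlier in this message can be written out as a small, compilable toy. This is only a sketch of that mental model (which is disputed elsewhere in the thread), not of how the PM core behaves: the `toy_*` names, the three power states and the single `live_reg` field standing in for hardware context are all assumptions made for illustration.

```c
#include <stdbool.h>

/* Hypothetical toy model of the layering described above; not kernel code. */
enum toy_power { TOY_ACTIVE, TOY_LOW_POWER, TOY_OFF };

struct toy_device {
	enum toy_power power;
	int live_reg;		/* stands in for hardware register state */
	int saved_reg;		/* software copy made before power-off */
	bool context_saved;
};

void toy_runtime_suspend(struct toy_device *dev)
{
	dev->power = TOY_LOW_POWER;		/* set_low_power_mode() */
}

void toy_runtime_resume(struct toy_device *dev)
{
	dev->power = TOY_ACTIVE;		/* set_normal_mode() */
}

void toy_system_suspend(struct toy_device *dev)
{
	dev->saved_reg = dev->live_reg;		/* save_context() */
	dev->context_saved = true;
	toy_runtime_suspend(dev);
	dev->power = TOY_OFF;			/* power_off() */
	dev->live_reg = 0;			/* registers are lost without power */
}

void toy_system_resume(struct toy_device *dev)
{
	dev->power = TOY_LOW_POWER;		/* power_on(): powered, not yet active */
	toy_runtime_resume(dev);
	if (dev->context_saved) {		/* restore_context() */
		dev->live_reg = dev->saved_reg;
		dev->context_saved = false;
	}
}
```

The toy makes the "superset" idea explicit: the system-wide pair wraps the runtime pair and adds context save/restore plus a full power cut. It says nothing about the races and user-space policy that the replies raise as objections.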
On Fri, Nov 29, 2019 at 11:22:08AM +0100, Rafael J. Wysocki wrote: > On Fri, Nov 29, 2019 at 11:08 AM Thierry Reding > <thierry.reding@gmail.com> wrote: > > > > On Thu, Nov 28, 2019 at 11:20:01PM +0100, Rafael J. Wysocki wrote: > > > On Thursday, November 28, 2019 11:03:57 PM CET Rafael J. Wysocki wrote: > > > > On Thursday, November 28, 2019 5:50:26 PM CET Thierry Reding wrote: > > > > > > > > > > On Thu, Nov 28, 2019 at 05:14:51PM +0100, Rafael J. Wysocki wrote: > > > > > > On Thu, Nov 28, 2019 at 5:03 PM Thierry Reding <thierry.reding@gmail.com> wrote: > > > > > > > > > > > > > > From: Thierry Reding <treding@nvidia.com> > > > > > > > > > > > > > > Currently the driver PM core will automatically acquire a runtime PM > > > > > > > reference for devices before system sleep is entered. This is needed > > > > > > > to avoid potential issues related to devices' parents getting put to > > > > > > > runtime suspend at the wrong time and causing problems with their > > > > > > > children. > > > > > > > > > > > > Not only for that. > > > > > > > > > > > > > In some cases drivers are carefully written to avoid such issues and > > > > > > > the default behaviour can be changed to allow runtime PM to operate > > > > > > > regularly during system sleep. > > > > > > > > > > > > But this change breaks quite a few assumptions in the core too, so no, > > > > > > it can't be made. > > > > > > > > > > Anything in particular that I can look at? I'm not seeing any issues > > > > > when I test this, which could of course mean that I'm just getting > > > > > lucky. > > > > > > > > There are races and such that you may never hit during casual testing. > > > > > > > > > One thing that irritated me is that I think this used to work. 
I do > > > > > recall testing suspend/resume a few years ago and devices would get > > > > > properly runtime suspended/resumed. > > > > > > > > Not true at all. > > > > > > > > The PM core has always taken PM-runtime references on all devices pretty much > > > > since when PM-runtime was introduced. > > > > > > > > > I did some digging but couldn't > > > > > find anything that would have had an impact on this. > > > > > > > > > > Given that this is completely opt-in feature, why are you categorically > > > > > NAK'ing this? > > > > > > > > The general problem is that if any device has been touched by system-wide > > > > suspend code, it should not be subject to PM-runtime any more until the > > > > subsequent system-wide resume is able to undo whatever the suspend did. > > > > > > > > Moreover, if a device is runtime-suspended, the system-wide suspend code > > > > may mishandle it, in general. That's why PM-runtime suspend is not allowed > > > > during system-wide transitions at all. And it has always been like that. > > > > > > > > For a specific platform you may be able to overcome these limitations if > > > > you are careful enough, but certainly they are there in general and surely > > > > you cannot prevent people from using your opt-in just because they think > > > > that they know what they are doing. > > > > > > BTW, what if user space prevents PM-runtime from suspending devices by writing > > > "on" to their "control" files? > > > > > > System-wide suspend is (of course) still expected to work in that case, so how > > > exactly would you overcome that? > > > > I suppose one way to overcome that would be to make it an error to write > > "on" to the "control" files for these devices. > > Seeing suggestions like this in messages from seasoned kernel > developers is seriously disappointing. :-/ > > > Currently doing this is likely going to break display support on Tegra, > > so this would be a good idea in this case anyway. 
> > PM-runtime has always allowed user space to prevent devices from being > suspended and it seems that this has not been taken into account by > Tegra display support developers at all. > > > Again, I could avoid all of these issues by avoiding runtime PM in this driver, > > I don't quite see the connection here. > > Preventing a device from suspending should never be a functional > problem. It may be an energy-efficiency problem, but that's something > for user space to consider before writing "on" to a device's control > file. That's really a question of how you define suspension. In the case of display drivers we have the somewhat unfortunate situation that in most SoCs the display "device" is actually represented by a collection of different devices. On Tegra specifically, for example, you have a couple of display controllers, then some "encoders" that take pixel streams from the display controllers and encode them into some wire format like LVDS, HDMI, DSI or DP. Prohibiting suspension of any of the individual devices causes problems because it effectively makes the whole composite display device not suspendable. Doing so in turn usually means that you can't change the display configuration anymore because devices need to be powered up and down in order to change the configuration. I consider powering up and down the devices a form of suspension. Hence it seemed natural to implement using runtime PM. It sounds to me like userspace preventing runtime PM is problematic in most scenarios that involve composite devices because it makes all of the interactions between the devices a bit complicated. > > but I would end up reimplementing some of the same concepts. I'd > > rather use something that's supported by the PM core and that might be > > useful to other drivers than reinvent the wheel. > > Which doesn't have to be by using PM-runtime suspend for the handling > of system-wide suspend, at least in my view. Well, runtime PM is very convenient for this, though. 
It would allow the same code paths to be used in all cases. Thierry
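The composite-device concern raised in this message can be made concrete with a small toy. The sketch assumes, as the message does, that applying a new display configuration requires power-cycling every component of the pipeline. `toy_component`, `forbid_suspend` and `toy_composite_reconfigure()` are hypothetical names, not DRM or PM core interfaces.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; not DRM or PM core code. */
struct toy_component {
	bool forbid_suspend;	/* models user space writing "on" to "control" */
	bool powered;
};

/*
 * Power-cycle every component to apply a new configuration.
 * Returns false, leaving all components untouched, if any one of them
 * is not allowed to be powered down.
 */
bool toy_composite_reconfigure(struct toy_component *c, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (c[i].forbid_suspend)
			return false;

	for (i = 0; i < n; i++)
		c[i].powered = false;	/* power down the whole pipeline */

	for (i = 0; i < n; i++)
		c[i].powered = true;	/* power up with the new configuration */

	return true;
}
```

With three components, pinning just one of them is enough to make the reconfiguration fail in this model. That is the functional dependency on suspend that the replies argue PM-runtime is not designed to provide.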
On Fri, Nov 29, 2019 at 01:07:19PM +0100, Thierry Reding wrote: > On Fri, Nov 29, 2019 at 11:22:08AM +0100, Rafael J. Wysocki wrote: > > On Fri, Nov 29, 2019 at 11:08 AM Thierry Reding > > <thierry.reding@gmail.com> wrote: > > > > > > On Thu, Nov 28, 2019 at 11:20:01PM +0100, Rafael J. Wysocki wrote: > > > > On Thursday, November 28, 2019 11:03:57 PM CET Rafael J. Wysocki wrote: > > > > > On Thursday, November 28, 2019 5:50:26 PM CET Thierry Reding wrote: > > > > > > > > > > > > On Thu, Nov 28, 2019 at 05:14:51PM +0100, Rafael J. Wysocki wrote: > > > > > > > On Thu, Nov 28, 2019 at 5:03 PM Thierry Reding <thierry.reding@gmail.com> wrote: > > > > > > > > > > > > > > > > From: Thierry Reding <treding@nvidia.com> > > > > > > > > > > > > > > > > Currently the driver PM core will automatically acquire a runtime PM > > > > > > > > reference for devices before system sleep is entered. This is needed > > > > > > > > to avoid potential issues related to devices' parents getting put to > > > > > > > > runtime suspend at the wrong time and causing problems with their > > > > > > > > children. > > > > > > > > > > > > > > Not only for that. > > > > > > > > > > > > > > > In some cases drivers are carefully written to avoid such issues and > > > > > > > > the default behaviour can be changed to allow runtime PM to operate > > > > > > > > regularly during system sleep. > > > > > > > > > > > > > > But this change breaks quite a few assumptions in the core too, so no, > > > > > > > it can't be made. > > > > > > > > > > > > Anything in particular that I can look at? I'm not seeing any issues > > > > > > when I test this, which could of course mean that I'm just getting > > > > > > lucky. 
> > > > > > > > > > There are races and such that you may never hit during casual testing. > > > > > > > > > > > One thing that irritated me is that I think this used to work. I do > > > > > > recall testing suspend/resume a few years ago and devices would get > > > > > > properly runtime suspended/resumed. > > > > > > > > > > Not true at all. > > > > > > > > > > The PM core has always taken PM-runtime references on all devices pretty much > > > > > since when PM-runtime was introduced. > > > > > > > > > > > I did some digging but couldn't > > > > > > find anything that would have had an impact on this. > > > > > > > > > > > > Given that this is completely opt-in feature, why are you categorically > > > > > > NAK'ing this? > > > > > > > > > > The general problem is that if any device has been touched by system-wide > > > > > suspend code, it should not be subject to PM-runtime any more until the > > > > > subsequent system-wide resume is able to undo whatever the suspend did. > > > > > > > > > > Moreover, if a device is runtime-suspended, the system-wide suspend code > > > > > may mishandle it, in general. That's why PM-runtime suspend is not allowed > > > > > during system-wide transitions at all. And it has always been like that. > > > > > > > > > > For a specific platform you may be able to overcome these limitations if > > > > > you are careful enough, but certainly they are there in general and surely > > > > > you cannot prevent people from using your opt-in just because they think > > > > > that they know what they are doing. > > > > > > > > BTW, what if user space prevents PM-runtime from suspending devices by writing > > > > "on" to their "control" files? > > > > > > > > System-wide suspend is (of course) still expected to work in that case, so how > > > > exactly would you overcome that? > > > > > > I suppose one way to overcome that would be to make it an error to write > > > "on" to the "control" files for these devices. 
> > > > Seeing suggestions like this in messages from seasoned kernel > > developers is seriously disappointing. :-/ > > > > > Currently doing this is likely going to break display support on Tegra, > > > so this would be a good idea in this case anyway. > > > > PM-runtime has always allowed user space to prevent devices from being > > suspended and it seems that this has not been taken into account by > > Tegra display support developers at all. > > > > > Again, I could avoid all of these issues by avoiding runtime PM in this driver, > > > > I don't quite see the connection here. > > > > Preventing a device from suspending should never be a functional > > problem. It may be an energy-efficiency problem, but that's something > > for user space to consider before writing "on" to a device's control > > file. > > That's really a question of how you define suspension. In the case of > display drivers we have the somewhat unfortunate situation that in most > SoCs the display "device" is actually represented by a collection of > different devices. On Tegra specifically, for example, you have a couple > of display controllers, then some "encoders" that take pixel streams > from the display controllers and encode them into some wire format like > LVDS, HDMI, DSI or DP. > > Prohibiting suspension of any of the individual devices causes problems > because it effectively makes the whole composite display device not > suspendable. Doing so in turn usually means that you can't change the > display configuration anymore because devices need to be powered up and > down in order to change the configuration. > > I consider powering up and down the devices a form of suspension. Hence > it seemed natural to implement using runtime PM. > > It sounds to me like userspace preventing runtime PM is problematic in > most scenarios that involve composite devices because it makes all of > the interactions between the devices a bit complicated. 
Yeah with the DT model of how a SoC works, all these tiny little devices are essentially implementation details that userspace really shouldn't ever care about, much less change anything with them. If userspace doesn't want the gpu to auto-suspend, then there's the overall gpu device that it can set that on, and I guess doing that should not break a decently written driver. For all the others insisting that userspace can be stupid essentially means we get to hand roll large chunks of runtime pm in drivers, which feels rather pointless. We have a lot of that home-grown runtime pm for subcomponents in i915, and I very much understand why the DT folks wanted to standardize all that with lots of little explicit devices. Maybe the mistake was simply allowing these to be visible to userspace. -Daniel > > > but I would end up reimplementing some of the same concepts. I'd > > > rather use something that's supported by the PM core and that might be > > > useful to other drivers than reinvent the wheel. > > > > Which doesn't have to be by using PM-runtime suspend for the handling > > of system-wide suspend, at least in my view. > > Well, runtime PM is very convenient for this, though. It would allow the > same code paths to be used in all cases. > > Thierry > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel
On Fri, Nov 29, 2019 at 1:07 PM Thierry Reding <thierry.reding@gmail.com> wrote: > > On Fri, Nov 29, 2019 at 11:22:08AM +0100, Rafael J. Wysocki wrote: > > [cut] Sorry for the delay. First off, let me note that I have seen your most recent patches and thanks for taking the feedback into account, much appreciated! Nevertheless, I feel that I need to address the below, because it is really important. > > Preventing a device from suspending should never be a functional > > problem. It may be an energy-efficiency problem, but that's something > > for user space to consider before writing "on" to a device's control > > file. > > That's really a question of how you define suspension. In general, yes. However, if you talk about PM-runtime, there are definitions of "suspended" and "active" in there already. Namely, in the PM-runtime context, "suspended" means "may not be accessible to software" whereas "active" means "software can access it". > In the case of > display drivers we have the somewhat unfortunate situation that in most > SoCs the display "device" is actually represented by a collection of > different devices. On Tegra specifically, for example, you have a couple > of display controllers, then some "encoders" that take pixel streams > from the display controllers and encode them into some wire format like > LVDS, HDMI, DSI or DP. > > Prohibiting suspension of any of the individual devices causes problems > because it effectively makes the whole composite display device not > suspendable. For PM-runtime, that shouldn't be a problem at all. PM-runtime is all about (possibly) saving energy by powering down devices that are not in use. In particular, it is not about powering down any devices on demand for any reason other than idleness. Therefore in PM-runtime a situation in which a given device cannot be suspended at all is regarded as normal, even though that may not be desirable for energy-efficiency reasons. 
It just means that the device is in use by somebody all the time. Moreover, PM-runtime is designed to make it possible to resume devices at any time (as long as the hardware works as expected), as soon as they are needed, modulo some possible delays. Actually, that's the purpose of a significant part of the PM-runtime framework. Accordingly, device drivers may refuse to suspend devices, but refusing to resume a device is not expected by PM-runtime. If writing "on" to the "control" file of a device does not cause it to be resumed (if suspended) and to stay in the "active" meta-state until "auto" is written to that file, you cannot really claim that PM-runtime is working correctly on your system. > Doing so in turn usually means that you can't change the > display configuration anymore because devices need to be powered up and > down in order to change the configuration. > > I consider powering up and down the devices a form of suspension. Hence > it seemed natural to implement using runtime PM. Unfortunately, that's not the case. The purpose of PM-runtime is to allow idle devices to be put into power states in which it may not be safe to access them and to make them go back into the "accessible and responsive" state whenever software wants/needs to access them in a coordinated fashion. IOW, it kind of is a counterpart of CPU idle time management. > It sounds to me like userspace preventing runtime PM is problematic in > most scenarios that involve composite devices because it makes all of > the interactions between the devices a bit complicated. Even so, that's how it works. User space can expect to be able to block runtime suspend of devices at any level of device hierarchy, at least for diagnostics if nothing else, and the kernel is responsible for ensuring that. > > > but I would end up reimplementing some of the same concepts. I'd > > > rather use something that's supported by the PM core and that might be > > > useful to other drivers than reinvent the wheel. 
> > > > Which doesn't have to be by using PM-runtime suspend for the handling > > of system-wide suspend, at least in my view. > > Well, runtime PM is very convenient for this, though. It would allow the > same code paths to be used in all cases. The same low-level power-up and power-down code can be used in all cases, but PM-runtime is not low-level enough. It is also opportunistic, so if you need to power down a device for reasons other than "natural" idleness, PM-runtime is not the right tool for that task. Of course, PM-runtime callbacks can invoke the low-level power-up and power-down code, but as you said there are reasons for powering down devices not just because they happen to be idle. System-wide suspend is one of them.
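The split described in the reply above, one set of low-level power-up and power-down helpers called from both the opportunistic runtime PM path and the mandatory system sleep path, can be sketched roughly as follows. This is only a userspace model under stated assumptions: `struct shared_dev` and every `shared_*` function are hypothetical names invented for illustration, and none of them are kernel APIs or code from the Tegra driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of a device; not a kernel structure. */
struct shared_dev {
	bool powered;        /* modelled hardware power state */
	bool resume_powered; /* state to restore after system sleep */
	int power_downs;     /* how many times power was actually cut */
};

/* Low-level helpers: the only code that changes the modelled power state. */
void shared_hw_power_down(struct shared_dev *dev)
{
	assert(dev);
	if (dev->powered) {
		dev->powered = false;
		dev->power_downs++;
	}
}

void shared_hw_power_up(struct shared_dev *dev)
{
	assert(dev);
	dev->powered = true;
}

/* Runtime PM path: opportunistic, meant to run only once the device is idle. */
int shared_runtime_suspend(struct shared_dev *dev)
{
	shared_hw_power_down(dev);
	return 0;
}

int shared_runtime_resume(struct shared_dev *dev)
{
	shared_hw_power_up(dev);
	return 0;
}

/* System sleep path: powers down regardless of idleness and remembers
 * the previous state so that resume can restore it. */
int shared_system_suspend(struct shared_dev *dev)
{
	dev->resume_powered = dev->powered;
	shared_hw_power_down(dev);
	return 0;
}

int shared_system_resume(struct shared_dev *dev)
{
	if (dev->resume_powered)
		shared_hw_power_up(dev);
	return 0;
}
```

In this model the system sleep path never depends on the runtime PM state machine: a device that was already runtime-suspended stays powered down across system suspend and resume, and a device that was active is powered down and restored without involving the runtime PM callbacks.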
diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index 134a8af51511..f8dbf00c703b 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -1113,7 +1113,8 @@ static void device_complete(struct device *dev, pm_message_t state)
 
 	device_unlock(dev);
 
-	pm_runtime_put(dev);
+	if (!dev->power.always_runtime)
+		pm_runtime_put(dev);
 }
 
 /**
@@ -1896,7 +1897,8 @@ static int device_prepare(struct device *dev, pm_message_t state)
 	 * block runtime suspend here, during the prepare phase, and allow
 	 * it again during the complete phase.
 	 */
-	pm_runtime_get_noresume(dev);
+	if (!dev->power.always_runtime)
+		pm_runtime_get_noresume(dev);
 
 	device_lock(dev);
 
diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 48616f358854..699803984426 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -1440,6 +1440,22 @@ void pm_runtime_allow(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(pm_runtime_allow);
 
+void pm_runtime_always_allow(struct device *dev)
+{
+	spin_lock_irq(&dev->power.lock);
+	dev->power.always_runtime = 1;
+	spin_unlock_irq(&dev->power.lock);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_always_allow);
+
+void pm_runtime_always_forbid(struct device *dev)
+{
+	spin_lock_irq(&dev->power.lock);
+	dev->power.always_runtime = 0;
+	spin_unlock_irq(&dev->power.lock);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_always_forbid);
+
 /**
  * pm_runtime_no_callbacks - Ignore runtime PM callbacks for a device.
  * @dev: Device to handle.
diff --git a/include/linux/pm.h b/include/linux/pm.h
index e057d1fa2469..6133cf496878 100644
--- a/include/linux/pm.h
+++ b/include/linux/pm.h
@@ -615,6 +615,7 @@ struct dev_pm_info {
 	unsigned int use_autosuspend:1;
 	unsigned int timer_autosuspends:1;
 	unsigned int memalloc_noio:1;
+	unsigned int always_runtime:1;
 	unsigned int links_count;
 	enum rpm_request request;
 	enum rpm_status runtime_status;
diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h
index 22af69d237a6..28204baf01cb 100644
--- a/include/linux/pm_runtime.h
+++ b/include/linux/pm_runtime.h
@@ -46,6 +46,8 @@ extern void pm_runtime_enable(struct device *dev);
 extern void __pm_runtime_disable(struct device *dev, bool check_resume);
 extern void pm_runtime_allow(struct device *dev);
 extern void pm_runtime_forbid(struct device *dev);
+extern void pm_runtime_always_allow(struct device *dev);
+extern void pm_runtime_always_forbid(struct device *dev);
 extern void pm_runtime_no_callbacks(struct device *dev);
 extern void pm_runtime_irq_safe(struct device *dev);
 extern void __pm_runtime_use_autosuspend(struct device *dev, bool use);
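
For readers who want to see the effect of the proposed `always_runtime` flag in isolation, the following is a minimal userspace model of the prepare/complete reference counting that the patch makes conditional. It is a sketch under simplifying assumptions, not kernel code: `struct sleep_dev` and the `sleep_model_*` functions are hypothetical stand-ins, and the real `pm_runtime_put()` may also queue an idle notification, which is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant fields of struct dev_pm_info. */
struct sleep_dev {
	int usage_count;     /* models the runtime PM usage counter */
	bool always_runtime; /* models the flag proposed by the patch */
	bool suspended;      /* models the runtime PM status */
};

/* Models pm_runtime_get_noresume(): take a reference, do not resume. */
void sleep_model_get_noresume(struct sleep_dev *dev)
{
	dev->usage_count++;
}

/* Models the reference drop in pm_runtime_put() (idle handling omitted). */
void sleep_model_put(struct sleep_dev *dev)
{
	assert(dev->usage_count > 0);
	dev->usage_count--;
}

/* Runtime suspend is refused while any reference is held. */
bool sleep_model_try_runtime_suspend(struct sleep_dev *dev)
{
	if (dev->usage_count > 0)
		return false;
	dev->suspended = true;
	return true;
}

/* Models the patched device_prepare(): block runtime suspend unless opted out. */
void sleep_model_prepare(struct sleep_dev *dev)
{
	if (!dev->always_runtime)
		sleep_model_get_noresume(dev);
}

/* Models the patched device_complete(): drop the reference taken in prepare. */
void sleep_model_complete(struct sleep_dev *dev)
{
	if (!dev->always_runtime)
		sleep_model_put(dev);
}
```

With the flag clear, runtime suspend is blocked between the prepare and complete phases, which is the existing behaviour. With the flag set, no reference is taken, so runtime suspend can still happen during system sleep, which is the behaviour change objected to in the replies above.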