Message ID | 1591167.zlNa3zLfmM@aspire.rjw.lan (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
On Wednesday, October 18, 2017 2:39:24 AM CEST Rafael J. Wysocki wrote: > On Tuesday, October 17, 2017 9:41:16 PM CEST Ulf Hansson wrote: > > [cut] > > > > > > >> deploying this and from a middle layer point of view, all the trivial > > >> cases supports this. > > > > > > These functions are wrong, however, because they attempt to reuse the > > > whole callback *path* instead of just reusing driver callbacks. The > > > *only* reason why it all "works" is because there are no middle layer > > > callbacks involved in that now. > > > > > > If you changed them to reuse driver callbacks only today, nothing would break > > > AFAICS. > > > > Yes, it would. > > > > First, for example, the amba bus is responsible for the amba bus > > clock, but relies on drivers to gate/ungate it during system sleep. In > > case the amba drivers don't use the pm_runtime_force_suspend|resume(), > > it will explicitly have to start manage the clock during system sleep > > themselves. Leading to open coding. > > Well, I suspected that something like this would surface. ;-) > > Are there any major reasons why the appended patch (obviously untested) won't > work, then? OK, there is a reason, which is the optimizations bundled into pm_runtime_force_*, because (a) the device may be left in runtime suspend by them (in which case amba_pm_suspend_early() in my patch should not run) and (b) pm_runtime_force_resume() may decide to leave it suspended (in which case amba_pm_suspend_late() in my patch should not run). [BTW, the "leave the device suspended" optimization in pm_runtime_force_* is potentially problematic too, because it requires the children to do the right thing, which effectively means that their drivers need to use pm_runtime_force_* too, but what if they don't want to reuse their runtime PM callbacks for system-wide PM?] Honestly, I don't like the way this is designed. IMO, it would be better to do the optimizations and all in the bus type middle-layer code instead of expecting drivers to use pm_runtime_force_* as their system-wide PM callbacks (and that expectation should at least be documented, which I'm not sure is the case now). But whatever. It all should work the way it does now without pm_runtime_force_* if (a) the bus type's PM callbacks are changed like in the last patch and the drivers (b) point their system suspend callbacks to the runtime PM callback routines and (c) set DPM_FLAG_SMART_SUSPEND and DPM_FLAG_LEAVE_SUSPENDED for the devices (if they need to do the PM in ->suspend and ->resume, they may set DPM_FLAG_AVOID_RPM too). And if you see a reason why that won't work, please let me know. Thanks, Rafael
On 18 October 2017 at 02:39, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > On Tuesday, October 17, 2017 9:41:16 PM CEST Ulf Hansson wrote: > > [cut] > >> > >> >> deploying this and from a middle layer point of view, all the trivial >> >> cases supports this. >> > >> > These functions are wrong, however, because they attempt to reuse the >> > whole callback *path* instead of just reusing driver callbacks. The >> > *only* reason why it all "works" is because there are no middle layer >> > callbacks involved in that now. >> > >> > If you changed them to reuse driver callbacks only today, nothing would break >> > AFAICS. >> >> Yes, it would. >> >> First, for example, the amba bus is responsible for the amba bus >> clock, but relies on drivers to gate/ungate it during system sleep. In >> case the amba drivers don't use the pm_runtime_force_suspend|resume(), >> it will explicitly have to start manage the clock during system sleep >> themselves. Leading to open coding. > > Well, I suspected that something like this would surface. ;-) > > Are there any major reasons why the appended patch (obviously untested) won't > work, then? Let me comment on the code, instead of here... ...just realized your second reply, so let me reply to that instead regarding the patch. > >> Second, it will introduce a regression in behavior for all users of >> pm_runtime_force_suspend|resume(), especially during system resume as >> the driver may then end up resuming the device even in case it isn't >> needed. > > How so? > > I'm talking about a change like in the appended patch, where > pm_runtime_force_* simply invoke driver callbacks directly. What is > skipped there is middle-layer stuff which is empty anyway in all cases > except for AMBA (if that's all what is lurking below the surface), so > I don't quite see how the failure will happen. I am afraid changing pm_runtime_force* to only call driver callbacks may become fragile. Let me elaborate. The reason why pm_runtime_force_* needs to respects the hierarchy of the RPM callbacks, is because otherwise it can't safely update the runtime PM status of the device. And updating the runtime PM status of the device is required to manage the optimized behavior during system resume (avoiding to unnecessary resume devices). Besides the AMBA case, I also realized that we are dealing with PM clocks in the genpd case. For this, genpd relies on the that runtime PM status of the device properly reflects the state of the HW, during system-wide PM. In other words, if the driver would change the runtime PM status of the device, without respecting the hierarchy of the runtime PM callbacks, it would lead to that genpd starts taking wrong decisions while managing the PM clocks during system-wide PM. So in case you intend to change pm_runtime_force_* this needs to be addressed too. > >> I believe I have explained why, also several times by now - >> and that's also how far you could take the i2c designware driver at >> this point. >> >> That said, I assume the second part may be addressed in this series, >> if these drivers convert to use the "driver PM flags", right? >> >> However, what about the first case? Is some open coding needed or your >> think the amba driver can instruct the amba bus via the "driver PM >> flags"? > > With the appended patch applied things should work for AMBA like for > any other bus type implementing PM, so I don't see why not. > >> > >> >> Like the spi bus, i2c bus, amba bus, platform >> >> bus, genpd, etc. There are no changes needed to continue to support >> >> this option, if you see what I mean. >> > >> > For the time being, nothing changes in that respect, but eventually I'd >> > prefer the pm_runtime_force_* things to go away, frankly. >> >> Okay, thanks for that clear statement! >> >> > >> >> So, when you say that re-using runtime PM callbacks for system-wide PM >> >> isn't going to happen, can you please elaborate what you mean? >> > >> > I didn't mean "reusing runtime PM callbacks for system-wide PM" overall, but >> > reusing *middle-layer* runtime PM callbacks for system-wide PM. That is the >> > bogus part. >> >> I think we have discussed this several times, but the arguments you >> have put forward, explaining *why* haven't yet convinced me. > > Well, sorry about that. I would like to be able to explain my point to you so > that you understand my perspective, but if that's not working, that's not a > sufficient reason for me to give up. > > I'm just refusing to maintain code that I don't agree with in the long run. > >> In principle what you have been saying is that it's a "layering >> violation" to use pm_runtime_force_suspend|resume() from driver's >> system sleep callbacks, but on the other hand you think using >> pm_runtime_get* and friends is okay!? > > Not unconditionally, which would be fair to mention. > > Only if it is called in ->prepare or as the first thing in a ->suspend > callback. Later than that is broken too in principle. > >> That makes little sense to me, because it's the same "layering >> violation" that is done for both cases. > > The "layering violation" is all about things possibly occurring in a > wrong order. For example, say a middle-layer ->runtime_suspend is > called via pm_runtime_force_suspend() which in turn is called from > middle-layer ->suspend_late as a driver callback. If the ->runtime_suspend > does anything significat to the device, then executing the remaining part of > ->suspend_late will almost cetainly break things, more or less. > > That is not a concern with a middle-layer ->runtime_resume running > *before* a middle-layer ->suspend (or any subsequent callbacks) does > anything significant to the device. > > Is there anything in the above which is not clear enough? > >> Moreover, you have been explaining that re-using runtime PM callbacks >> for PCI doesn't work. Then my question is, why should a limitation of >> the PCI subsystem put constraints on the behavior for all other >> subsystems/middle-layers? > > Because they aren't just PCI subsystem limitations only. The need to handle > wakeup setup differently for runtime PM and system sleep is not PCI-specific. > The need to handle suspend and hibernation differently isn't too. > > Those things may be more obvious in PCI, but they are generic rather than > special. Absolutely agree about the different wake-up settings. However, these issues can be addressed also when using pm_runtime_force_*, at least in general, but then not for PCI. Regarding hibernation, honestly that's not really my area of expertise. Although, I assume the middle-layer and driver can treat that as a separate case, so if it's not suitable to use pm_runtime_force* for that case, then they shouldn't do it. > > Also, quite so often other middle layers interact with PCI directly or > indirectly (eg. a platform device may be a child or a consumer of a PCI > device) and some optimizations need to take that into account (eg. parents > generally need to be accessible when their childres are resumed and so on). A device's parent becomes informed when changing the runtime PM status of the device via pm_runtime_force_suspend|resume(), as those calls pm_runtime_set_suspended|active(). In case that isn't that sufficient, what else is needed? Perhaps you can point me to an example so I can understand better? For a PCI consumer device those will of course have to play by the rules of PCI. > > Moreover, the majority of the "other subsystems/middle-layers" you've talked > about so far don't provide any PM callbacks to be invoked by pm_runtime_force_*, > so question is how representative they really are. That's the point. We know pm_runtime_force_* works nicely for the trivial middle-layer cases. For the more complex cases, we need something additional/different. > >> > >> > Quoting again: >> > >> > "If you are a middle layer, your role is basically to do PM for a certain >> > group of devices. Thus you cannot really do the same in ->suspend or >> > ->suspend_early and in ->runtime_suspend (because the former generally need to >> > take device_may_wakeup() into account and the latter doesn't) and you shouldn't >> > really do the same in ->suspend and ->freeze (becuase the latter shouldn't >> > change the device's power state) and so on." >> > >> > I have said for multiple times that re-using *driver* callbacks actually makes >> > sense and the series is for doing that easier in general among other things. >> > >> >> I assume you mean that the PM core won't be involved to support this, >> >> but is that it? >> >> >> >> Do you also mean that *all* users of pm_runtime_force_suspend|resume() >> >> must convert to this new thing, using "driver PM flags", so in the end >> >> you want to remove pm_runtime_force_suspend|resume()? >> >> - Then if so, you must of course consider all cases for how >> >> pm_runtime_force_suspend|resume() are being deployed currently, else >> >> existing users can't convert to the "driver PM flags" thing. Have you >> >> done that in this series? >> > >> > Let me turn this around. >> > >> > The majority of cases in which pm_runtime_force_* are used *should* be >> > addressable using the flags introduced here. Some case in which >> > pm_runtime_force_* cannot be used should be addressable by these flags >> > as well. >> >> That's sounds really great! >> >> > >> > There may be some cases in which pm_runtime_force_* are used that may >> > require something more, but I'm not going to worry about that right now. >> >> This approach concerns me, because if we in the end realizes that >> pm_runtime_force_suspend|resume() will be too hard to get rid of, then >> this series just add yet another generic way of trying to optimize the >> system sleep path for runtime PM enabled devices. > > Which also works for PCI and the ACPI PM domain and that's sort of valuable > anyway, isn't it? Indeed it is! I am definitely open to improve the situation for ACPI and PCI. Seems like I may have given the wrong impression about that. > > For the record, I don't think it will be too hard to get rid of > pm_runtime_force_suspend|resume(), although that may take quite some time. > >> So then we would end up having to support the "direct_complete" path, >> the "driver PM flags" and cases where >> pm_runtime_force_suspend|resume() is used. No, that just isn't good >> enough to me. That will just lead to similar scenarios as we had in >> the i2c designware driver. > > Frankly, this sounds like staging for indefinite blocking of changes in > this area on non-technical grounds. I hope that it isn't the case ... > >> If we decide to go with these new "driver PM flags", I want to make >> sure, as long as possible, that we can remove both the >> "direct_complete" path support from the PM core as well as removing >> the pm_runtime_force_suspend|resume() helpers. > > We'll see. > >> > >> > I'll take care of that when I'll be removing pm_runtime_force_*, which I'm >> > not doing here. >> >> Of course I am fine with that we postpone doing the actual converting >> of drivers etc from this series, although as stated above, let's sure >> we *can* do it by using the "driver PM flags". > > There clearly are use cases that benefit from this series and I don't see > any alternatives covering them, including both direct-complete and the > pm_runtime_force* approach, so I'm not buying this "let's make sure > it can cover all possible use cases that exist" argumentation. Alright, let me re-phrase my take on this. Because you stated that you plan to remove pm_runtime_force_* eventually, then I think you need to put up some valid reasons of why (I consider that done), but more importantly, you need to offer an alternative solution that can replace it. Else such that statement can easily become wrong interpreted. My point is, the "driver PM flags" do *not* offers a full alternative solution, it may do in the future or it may not. So, to conclude from my side, I don't have any major objections to going forward with the "driver PM flags", especially with the goal of improving the situation for PCI and ACPI. Down the road, we can then *try* to make it replace pm_runtime_force_* and the "direct_complete path". Hopefully that makes it more clear. [...] Kind regards Uffe
[...] >> Are there any major reasons why the appended patch (obviously untested) won't >> work, then? > > OK, there is a reason, which is the optimizations bundled into > pm_runtime_force_*, because (a) the device may be left in runtime suspend > by them (in which case amba_pm_suspend_early() in my patch should not run) > and (b) pm_runtime_force_resume() may decide to leave it suspended (in which > case amba_pm_suspend_late() in my patch should not run). Exactly. > > [BTW, the "leave the device suspended" optimization in pm_runtime_force_* > is potentially problematic too, because it requires the children to do > the right thing, which effectively means that their drivers need to use > pm_runtime_force_* too, but what if they don't want to reuse their > runtime PM callbacks for system-wide PM?] Deployment of pm_runtime_force_suspend() should generally be done for children devices first. If some reason that isn't the case, it's expected that the call to pm_runtime_set_suspended() invoked from pm_runtime_force_suspend(), for the parent, should fail and thus abort system suspend. > > Honestly, I don't like the way this is designed. IMO, it would be better > to do the optimizations and all in the bus type middle-layer code instead > of expecting drivers to use pm_runtime_force_* as their system-wide PM > callbacks (and that expectation should at least be documented, which I'm > not sure is the case now). But whatever. > > It all should work the way it does now without pm_runtime_force_* if (a) the > bus type's PM callbacks are changed like in the last patch and the drivers > (b) point their system suspend callbacks to the runtime PM callback routines > and (c) set DPM_FLAG_SMART_SUSPEND and DPM_FLAG_LEAVE_SUSPENDED for the > devices (if they need to do the PM in ->suspend and ->resume, they may set > DPM_FLAG_AVOID_RPM too). > > And if you see a reason why that won't work, please let me know. I will have look and try out the series by using my local "runtime PM test driver". I get back to you with an update on this. Kind regards Uffe
On Wednesday, October 18, 2017 1:57:52 PM CEST Ulf Hansson wrote: > On 18 October 2017 at 02:39, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > > On Tuesday, October 17, 2017 9:41:16 PM CEST Ulf Hansson wrote: > > > > [cut] > > > >> > > >> >> deploying this and from a middle layer point of view, all the trivial > >> >> cases supports this. > >> > > >> > These functions are wrong, however, because they attempt to reuse the > >> > whole callback *path* instead of just reusing driver callbacks. The > >> > *only* reason why it all "works" is because there are no middle layer > >> > callbacks involved in that now. > >> > > >> > If you changed them to reuse driver callbacks only today, nothing would break > >> > AFAICS. > >> > >> Yes, it would. > >> > >> First, for example, the amba bus is responsible for the amba bus > >> clock, but relies on drivers to gate/ungate it during system sleep. In > >> case the amba drivers don't use the pm_runtime_force_suspend|resume(), > >> it will explicitly have to start manage the clock during system sleep > >> themselves. Leading to open coding. > > > > Well, I suspected that something like this would surface. ;-) > > > > Are there any major reasons why the appended patch (obviously untested) won't > > work, then? > > Let me comment on the code, instead of here... > > ...just realized your second reply, so let me reply to that instead > regarding the patch. > > > > >> Second, it will introduce a regression in behavior for all users of > >> pm_runtime_force_suspend|resume(), especially during system resume as > >> the driver may then end up resuming the device even in case it isn't > >> needed. > > > > How so? > > > > I'm talking about a change like in the appended patch, where > > pm_runtime_force_* simply invoke driver callbacks directly. What is > > skipped there is middle-layer stuff which is empty anyway in all cases > > except for AMBA (if that's all what is lurking below the surface), so > > I don't quite see how the failure will happen. > > I am afraid changing pm_runtime_force* to only call driver callbacks > may become fragile. Let me elaborate. > > The reason why pm_runtime_force_* needs to respects the hierarchy of > the RPM callbacks, is because otherwise it can't safely update the > runtime PM status of the device. I'm not sure I follow this requirement. Why is that so? > And updating the runtime PM status of > the device is required to manage the optimized behavior during system > resume (avoiding to unnecessary resume devices). Well, OK. The runtime PM status of the device after system resume should better reflect its physical state. [The physical state of the device may not be under the control of the kernel in some cases, like in S3 resume on some systems that reset devices in the firmware and so on, but let's set that aside.] However, for the runtime PM status of the device may still reflect its state if, say, a ->resume_early of the middle layer is called during resume along with a driver's ->runtime_resume. That still can produce the right state of the device and all depends on the middle layer. On the other hand, as I said before, using a middle-layer ->runtime_suspend during a system sleep transition may be outright incorrect, say if device wakeup settings need to be adjusted by the middle layer (which is the case for some of them). Of course, if the middle layer expects the driver to point its system-wide PM callbacks to pm_runtime_force_*, then that's how it goes, but the drivers working with this particular middle layer generally won't work with other middle layers and may interact incorrectly with parents and/or children using the other middle layers. I guess the problem boils down to having a common set of expectations on the driver side and on the middle layer side allowing different combinations of these to work together. > Besides the AMBA case, I also realized that we are dealing with PM > clocks in the genpd case. For this, genpd relies on the that runtime > PM status of the device properly reflects the state of the HW, during > system-wide PM. > > In other words, if the driver would change the runtime PM status of > the device, without respecting the hierarchy of the runtime PM > callbacks, it would lead to that genpd starts taking wrong decisions > while managing the PM clocks during system-wide PM. So in case you > intend to change pm_runtime_force_* this needs to be addressed too. I've just looked at the genpd code and quite frankly I'm not sure how this works, but I'll figure this out. :-) > > > >> I believe I have explained why, also several times by now - > >> and that's also how far you could take the i2c designware driver at > >> this point. > >> > >> That said, I assume the second part may be addressed in this series, > >> if these drivers convert to use the "driver PM flags", right? > >> > >> However, what about the first case? Is some open coding needed or your > >> think the amba driver can instruct the amba bus via the "driver PM > >> flags"? > > > > With the appended patch applied things should work for AMBA like for > > any other bus type implementing PM, so I don't see why not. > > > >> > > >> >> Like the spi bus, i2c bus, amba bus, platform > >> >> bus, genpd, etc. There are no changes needed to continue to support > >> >> this option, if you see what I mean. > >> > > >> > For the time being, nothing changes in that respect, but eventually I'd > >> > prefer the pm_runtime_force_* things to go away, frankly. > >> > >> Okay, thanks for that clear statement! > >> > >> > > >> >> So, when you say that re-using runtime PM callbacks for system-wide PM > >> >> isn't going to happen, can you please elaborate what you mean? > >> > > >> > I didn't mean "reusing runtime PM callbacks for system-wide PM" overall, but > >> > reusing *middle-layer* runtime PM callbacks for system-wide PM. That is the > >> > bogus part. > >> > >> I think we have discussed this several times, but the arguments you > >> have put forward, explaining *why* haven't yet convinced me. > > > > Well, sorry about that. I would like to be able to explain my point to you so > > that you understand my perspective, but if that's not working, that's not a > > sufficient reason for me to give up. > > > > I'm just refusing to maintain code that I don't agree with in the long run. > > > >> In principle what you have been saying is that it's a "layering > >> violation" to use pm_runtime_force_suspend|resume() from driver's > >> system sleep callbacks, but on the other hand you think using > >> pm_runtime_get* and friends is okay!? > > > > Not unconditionally, which would be fair to mention. > > > > Only if it is called in ->prepare or as the first thing in a ->suspend > > callback. Later than that is broken too in principle. > > > >> That makes little sense to me, because it's the same "layering > >> violation" that is done for both cases. > > > > The "layering violation" is all about things possibly occurring in a > > wrong order. For example, say a middle-layer ->runtime_suspend is > > called via pm_runtime_force_suspend() which in turn is called from > > middle-layer ->suspend_late as a driver callback. If the ->runtime_suspend > > does anything significat to the device, then executing the remaining part of > > ->suspend_late will almost cetainly break things, more or less. > > > > That is not a concern with a middle-layer ->runtime_resume running > > *before* a middle-layer ->suspend (or any subsequent callbacks) does > > anything significant to the device. > > > > Is there anything in the above which is not clear enough? > > > >> Moreover, you have been explaining that re-using runtime PM callbacks > >> for PCI doesn't work. Then my question is, why should a limitation of > >> the PCI subsystem put constraints on the behavior for all other > >> subsystems/middle-layers? > > > > Because they aren't just PCI subsystem limitations only. The need to handle > > wakeup setup differently for runtime PM and system sleep is not PCI-specific. > > The need to handle suspend and hibernation differently isn't too. > > > > Those things may be more obvious in PCI, but they are generic rather than > > special. > > Absolutely agree about the different wake-up settings. However, these > issues can be addressed also when using pm_runtime_force_*, at least > in general, but then not for PCI. Well, not for the ACPI PM domain too. In general, not if the wakeup settings are adjusted by the middle layer. > Regarding hibernation, honestly that's not really my area of > expertise. Although, I assume the middle-layer and driver can treat > that as a separate case, so if it's not suitable to use > pm_runtime_force* for that case, then they shouldn't do it. Well, agreed. In some simple cases, though, driver callbacks can be reused for hibernation too, so it would be good to have a common way to do that too, IMO. > > > > Also, quite so often other middle layers interact with PCI directly or > > indirectly (eg. a platform device may be a child or a consumer of a PCI > > device) and some optimizations need to take that into account (eg. parents > > generally need to be accessible when their childres are resumed and so on). > > A device's parent becomes informed when changing the runtime PM status > of the device via pm_runtime_force_suspend|resume(), as those calls > pm_runtime_set_suspended|active(). This requires the parent driver or middle layer to look at the reference counter and understand it the same way as pm_runtime_force_*. > In case that isn't that sufficient, what else is needed? Perhaps you can > point me to an example so I can understand better? Say you want to leave the parent suspended after system resume, but the child drivers use pm_runtime_force_suspend|resume(). The parent would then need to use pm_runtime_force_suspend|resume() too, no? > For a PCI consumer device those will of course have to play by the rules of PCI. > > > > > Moreover, the majority of the "other subsystems/middle-layers" you've talked > > about so far don't provide any PM callbacks to be invoked by pm_runtime_force_*, > > so question is how representative they really are. > > That's the point. We know pm_runtime_force_* works nicely for the > trivial middle-layer cases. In which cases the middle-layer callbacks don't exist, so it's just like reusing driver callbacks directly. :-) > For the more complex cases, we need something additional/different. Something different. But overall, as I said, this is about common expectations. Today, some middle layers expect drivers to point their callback pointers to the same routine in order to resue it (PCI, ACPI bus type), some of them expect pm_runtime_force_suspend|resume() to be used (AMBA, maybe genpd), and some of them have no expectations at all. There needs to be a common ground in that area for drivers to be able to work with different middle layers. > > > >> > > >> > Quoting again: > >> > > >> > "If you are a middle layer, your role is basically to do PM for a certain > >> > group of devices. Thus you cannot really do the same in ->suspend or > >> > ->suspend_early and in ->runtime_suspend (because the former generally need to > >> > take device_may_wakeup() into account and the latter doesn't) and you shouldn't > >> > really do the same in ->suspend and ->freeze (becuase the latter shouldn't > >> > change the device's power state) and so on." > >> > > >> > I have said for multiple times that re-using *driver* callbacks actually makes > >> > sense and the series is for doing that easier in general among other things. > >> > > >> >> I assume you mean that the PM core won't be involved to support this, > >> >> but is that it? > >> >> > >> >> Do you also mean that *all* users of pm_runtime_force_suspend|resume() > >> >> must convert to this new thing, using "driver PM flags", so in the end > >> >> you want to remove pm_runtime_force_suspend|resume()? > >> >> - Then if so, you must of course consider all cases for how > >> >> pm_runtime_force_suspend|resume() are being deployed currently, else > >> >> existing users can't convert to the "driver PM flags" thing. Have you > >> >> done that in this series? > >> > > >> > Let me turn this around. > >> > > >> > The majority of cases in which pm_runtime_force_* are used *should* be > >> > addressable using the flags introduced here. Some case in which > >> > pm_runtime_force_* cannot be used should be addressable by these flags > >> > as well. > >> > >> That's sounds really great! > >> > >> > > >> > There may be some cases in which pm_runtime_force_* are used that may > >> > require something more, but I'm not going to worry about that right now. > >> > >> This approach concerns me, because if we in the end realizes that > >> pm_runtime_force_suspend|resume() will be too hard to get rid of, then > >> this series just add yet another generic way of trying to optimize the > >> system sleep path for runtime PM enabled devices. > > > > Which also works for PCI and the ACPI PM domain and that's sort of valuable > > anyway, isn't it? > > Indeed it is! I am definitely open to improve the situation for ACPI and PCI. > > Seems like I may have given the wrong impression about that. > > > > > For the record, I don't think it will be too hard to get rid of > > pm_runtime_force_suspend|resume(), although that may take quite some time. > > > >> So then we would end up having to support the "direct_complete" path, > >> the "driver PM flags" and cases where > >> pm_runtime_force_suspend|resume() is used. No, that just isn't good > >> enough to me. That will just lead to similar scenarios as we had in > >> the i2c designware driver. > > > > Frankly, this sounds like staging for indefinite blocking of changes in > > this area on non-technical grounds. I hope that it isn't the case ... > > > >> If we decide to go with these new "driver PM flags", I want to make > >> sure, as long as possible, that we can remove both the > >> "direct_complete" path support from the PM core as well as removing > >> the pm_runtime_force_suspend|resume() helpers. > > > > We'll see. > > > >> > > >> > I'll take care of that when I'll be removing pm_runtime_force_*, which I'm > >> > not doing here. > >> > >> Of course I am fine with that we postpone doing the actual converting > >> of drivers etc from this series, although as stated above, let's sure > >> we *can* do it by using the "driver PM flags". > > > > There clearly are use cases that benefit from this series and I don't see > > any alternatives covering them, including both direct-complete and the > > pm_runtime_force* approach, so I'm not buying this "let's make sure > > it can cover all possible use cases that exist" argumentation. > > Alright, let me re-phrase my take on this. > > Because you stated that you plan to remove pm_runtime_force_* > eventually, then I think you need to put up some valid reasons of why > (I consider that done), but more importantly, you need to offer an > alternative solution that can replace it. Else such that statement can > easily become wrong interpreted. My point is, the "driver PM flags" do > *not* offers a full alternative solution, it may do in the future or > it may not. > > So, to conclude from my side, I don't have any major objections to > going forward with the "driver PM flags", especially with the goal of > improving the situation for PCI and ACPI. Down the road, we can then > *try* to make it replace pm_runtime_force_* and the "direct_complete > path". > > Hopefully that makes it more clear. Yes, it does, thank you! Rafael
[...] >> >> The reason why pm_runtime_force_* needs to respects the hierarchy of >> the RPM callbacks, is because otherwise it can't safely update the >> runtime PM status of the device. > > I'm not sure I follow this requirement. Why is that so? If the PM domain controls some resources for the device in its RPM callbacks and the driver controls some other resources in its RPM callbacks - then these resources needs to be managed together. This follows the behavior of when a regular call to pm_runtime_get|put(), triggers the RPM callbacks to be invoked. > >> And updating the runtime PM status of >> the device is required to manage the optimized behavior during system >> resume (avoiding to unnecessary resume devices). > > Well, OK. The runtime PM status of the device after system resume should > better reflect its physical state. > > [The physical state of the device may not be under the control of the > kernel in some cases, like in S3 resume on some systems that reset > devices in the firmware and so on, but let's set that aside.] > > However, for the runtime PM status of the device may still reflect its state > if, say, a ->resume_early of the middle layer is called during resume along > with a driver's ->runtime_resume. That still can produce the right state > of the device and all depends on the middle layer. > > On the other hand, as I said before, using a middle-layer ->runtime_suspend > during a system sleep transition may be outright incorrect, say if device > wakeup settings need to be adjusted by the middle layer (which is the > case for some of them). > > Of course, if the middle layer expects the driver to point its > system-wide PM callbacks to pm_runtime_force_*, then that's how it goes, > but the drivers working with this particular middle layer generally > won't work with other middle layers and may interact incorrectly > with parents and/or children using the other middle layers. > > I guess the problem boils down to having a common set of expectations > on the driver side and on the middle layer side allowing different > combinations of these to work together. Yes! > >> Besides the AMBA case, I also realized that we are dealing with PM >> clocks in the genpd case. For this, genpd relies on the that runtime >> PM status of the device properly reflects the state of the HW, during >> system-wide PM. >> >> In other words, if the driver would change the runtime PM status of >> the device, without respecting the hierarchy of the runtime PM >> callbacks, it would lead to that genpd starts taking wrong decisions >> while managing the PM clocks during system-wide PM. So in case you >> intend to change pm_runtime_force_* this needs to be addressed too. > > I've just looked at the genpd code and quite frankly I'm not sure how this > works, but I'll figure this out. :-) You may think of it as genpd's RPM callback controls some device clocks, while the driver control some other device resources (pinctrl for example) from its RPM callback. These resources needs to managed together, similar to as I described above. [...] >> Absolutely agree about the different wake-up settings. However, these >> issues can be addressed also when using pm_runtime_force_*, at least >> in general, but then not for PCI. > > Well, not for the ACPI PM domain too. > > In general, not if the wakeup settings are adjusted by the middle layer. Correct! To use pm_runtime_force* for these cases, one would need some additional information exchange between the driver and the middle-layer. > >> Regarding hibernation, honestly that's not really my area of >> expertise. Although, I assume the middle-layer and driver can treat >> that as a separate case, so if it's not suitable to use >> pm_runtime_force* for that case, then they shouldn't do it. > > Well, agreed. > > In some simple cases, though, driver callbacks can be reused for hibernation > too, so it would be good to have a common way to do that too, IMO. Okay, that makes sense! > >> > >> > Also, quite so often other middle layers interact with PCI directly or >> > indirectly (eg. a platform device may be a child or a consumer of a PCI >> > device) and some optimizations need to take that into account (eg. parents >> > generally need to be accessible when their childres are resumed and so on). >> >> A device's parent becomes informed when changing the runtime PM status >> of the device via pm_runtime_force_suspend|resume(), as those calls >> pm_runtime_set_suspended|active(). > > This requires the parent driver or middle layer to look at the reference > counter and understand it the same way as pm_runtime_force_*. > >> In case that isn't that sufficient, what else is needed? Perhaps you can >> point me to an example so I can understand better? > > Say you want to leave the parent suspended after system resume, but the > child drivers use pm_runtime_force_suspend|resume(). The parent would then > need to use pm_runtime_force_suspend|resume() too, no? Actually no. Currently the other options of "deferring resume" (not using pm_runtime_force_*), is either using the "direct_complete" path or similar to the approach you took for the i2c designware driver. Both cases should play nicely in combination of a child being managed by pm_runtime_force_*. That's because only when the parent device is kept runtime suspended during system suspend, resuming can be deferred. That means, if the resume of the parent is deferred, so will the also the resume of the child. > >> For a PCI consumer device those will of course have to play by the rules of PCI. >> >> > >> > Moreover, the majority of the "other subsystems/middle-layers" you've talked >> > about so far don't provide any PM callbacks to be invoked by pm_runtime_force_*, >> > so question is how representative they really are. >> >> That's the point. We know pm_runtime_force_* works nicely for the >> trivial middle-layer cases. > > In which cases the middle-layer callbacks don't exist, so it's just like > reusing driver callbacks directly. :-) > >> For the more complex cases, we need something additional/different. > > Something different. > > But overall, as I said, this is about common expectations. > > Today, some middle layers expect drivers to point their callback pointers > to the same routine in order to resue it (PCI, ACPI bus type), some of them > expect pm_runtime_force_suspend|resume() to be used (AMBA, maybe genpd), > and some of them have no expectations at all. > > There needs to be a common ground in that area for drivers to be able to > work with different middle layers. Yes, reaching that point would be great, we should definitively aim for that! [...] Kind regards Uffe
On 10/18/2017 09:11 AM, Ulf Hansson wrote: > [...] > >>> >>> The reason why pm_runtime_force_* needs to respects the hierarchy of >>> the RPM callbacks, is because otherwise it can't safely update the >>> runtime PM status of the device. >> >> I'm not sure I follow this requirement. Why is that so? > > If the PM domain controls some resources for the device in its RPM > callbacks and the driver controls some other resources in its RPM > callbacks - then these resources needs to be managed together. > > This follows the behavior of when a regular call to > pm_runtime_get|put(), triggers the RPM callbacks to be invoked. > >> >>> And updating the runtime PM status of >>> the device is required to manage the optimized behavior during system >>> resume (avoiding to unnecessary resume devices). >> >> Well, OK. The runtime PM status of the device after system resume should >> better reflect its physical state. >> >> [The physical state of the device may not be under the control of the >> kernel in some cases, like in S3 resume on some systems that reset >> devices in the firmware and so on, but let's set that aside.] >> >> However, for the runtime PM status of the device may still reflect its state >> if, say, a ->resume_early of the middle layer is called during resume along >> with a driver's ->runtime_resume. That still can produce the right state >> of the device and all depends on the middle layer. >> >> On the other hand, as I said before, using a middle-layer ->runtime_suspend >> during a system sleep transition may be outright incorrect, say if device >> wakeup settings need to be adjusted by the middle layer (which is the >> case for some of them). >> >> Of course, if the middle layer expects the driver to point its >> system-wide PM callbacks to pm_runtime_force_*, then that's how it goes, >> but the drivers working with this particular middle layer generally >> won't work with other middle layers and may interact incorrectly >> with parents and/or children using the other middle layers. >> >> I guess the problem boils down to having a common set of expectations >> on the driver side and on the middle layer side allowing different >> combinations of these to work together. > > Yes! > >> >>> Besides the AMBA case, I also realized that we are dealing with PM >>> clocks in the genpd case. For this, genpd relies on the that runtime >>> PM status of the device properly reflects the state of the HW, during >>> system-wide PM. >>> >>> In other words, if the driver would change the runtime PM status of >>> the device, without respecting the hierarchy of the runtime PM >>> callbacks, it would lead to that genpd starts taking wrong decisions >>> while managing the PM clocks during system-wide PM. So in case you >>> intend to change pm_runtime_force_* this needs to be addressed too. >> >> I've just looked at the genpd code and quite frankly I'm not sure how this >> works, but I'll figure this out. :-) > > You may think of it as genpd's RPM callback controls some device > clocks, while the driver control some other device resources (pinctrl > for example) from its RPM callback. > > These resources needs to managed together, similar to as I described above. > > [...] > >>> Absolutely agree about the different wake-up settings. However, these >>> issues can be addressed also when using pm_runtime_force_*, at least >>> in general, but then not for PCI. >> >> Well, not for the ACPI PM domain too. >> >> In general, not if the wakeup settings are adjusted by the middle layer. > > Correct! > > To use pm_runtime_force* for these cases, one would need some > additional information exchange between the driver and the > middle-layer. > >> >>> Regarding hibernation, honestly that's not really my area of >>> expertise. Although, I assume the middle-layer and driver can treat >>> that as a separate case, so if it's not suitable to use >>> pm_runtime_force* for that case, then they shouldn't do it. >> >> Well, agreed. >> >> In some simple cases, though, driver callbacks can be reused for hibernation >> too, so it would be good to have a common way to do that too, IMO. > > Okay, that makes sense! > >> >>>> >>>> Also, quite so often other middle layers interact with PCI directly or >>>> indirectly (eg. a platform device may be a child or a consumer of a PCI >>>> device) and some optimizations need to take that into account (eg. parents >>>> generally need to be accessible when their childres are resumed and so on). >>> >>> A device's parent becomes informed when changing the runtime PM status >>> of the device via pm_runtime_force_suspend|resume(), as those calls >>> pm_runtime_set_suspended|active(). >> >> This requires the parent driver or middle layer to look at the reference >> counter and understand it the same way as pm_runtime_force_*. >> >>> In case that isn't that sufficient, what else is needed? Perhaps you can >>> point me to an example so I can understand better? >> >> Say you want to leave the parent suspended after system resume, but the >> child drivers use pm_runtime_force_suspend|resume(). The parent would then >> need to use pm_runtime_force_suspend|resume() too, no? > > Actually no. > > Currently the other options of "deferring resume" (not using > pm_runtime_force_*), is either using the "direct_complete" path or > similar to the approach you took for the i2c designware driver. > > Both cases should play nicely in combination of a child being managed > by pm_runtime_force_*. That's because only when the parent device is > kept runtime suspended during system suspend, resuming can be > deferred. > > That means, if the resume of the parent is deferred, so will the also > the resume of the child. > >> >>> For a PCI consumer device those will of course have to play by the rules of PCI. >>> >>>> >>>> Moreover, the majority of the "other subsystems/middle-layers" you've talked >>>> about so far don't provide any PM callbacks to be invoked by pm_runtime_force_*, >>>> so question is how representative they really are. >>> >>> That's the point. We know pm_runtime_force_* works nicely for the >>> trivial middle-layer cases. >> >> In which cases the middle-layer callbacks don't exist, so it's just like >> reusing driver callbacks directly. :-) I'd like to ask you clarify one point here and provide some info which I hope can be useful - what's exactly means "trivial middle-layer cases"? Is it when systems use "drivers/base/power/clock_ops.c - Generic clock manipulation PM callbacks" as dev_pm_domain (arm davinci/keystone), or OMAP device framework struct dev_pm_domain omap_device_pm_domain (arm/mach-omap2/omap_device.c) or static const struct dev_pm_ops tegra_aconnect_pm_ops? if yes all above have PM runtime callbacks.
On Wednesday, October 18, 2017 9:45:11 PM CEST Grygorii Strashko wrote: > > On 10/18/2017 09:11 AM, Ulf Hansson wrote: [...] > >>> That's the point. We know pm_runtime_force_* works nicely for the > >>> trivial middle-layer cases. > >> > >> In which cases the middle-layer callbacks don't exist, so it's just like > >> reusing driver callbacks directly. :-) > > I'd like to ask you clarify one point here and provide some info which I hope can be useful - > what's exactly means "trivial middle-layer cases"? > > Is it when systems use "drivers/base/power/clock_ops.c - Generic clock > manipulation PM callbacks" as dev_pm_domain (arm davinci/keystone), or OMAP > device framework struct dev_pm_domain omap_device_pm_domain > (arm/mach-omap2/omap_device.c) or static const struct dev_pm_ops > tegra_aconnect_pm_ops? > > if yes all above have PM runtime callbacks. Trivial ones don't actually do anything meaningful in their PM callbacks. Things like the platform bus type, spi bus type, i2c bus type and similar. If the middle-layer callbacks manipulate devices in a significant way, then they aren't trivial. Thanks, Rafael
On Wednesday, October 18, 2017 2:34:10 PM CEST Ulf Hansson wrote: > [...] > > >> Are there any major reasons why the appended patch (obviously untested) won't > >> work, then? > > > > OK, there is a reason, which is the optimizations bundled into > > pm_runtime_force_*, because (a) the device may be left in runtime suspend > > by them (in which case amba_pm_suspend_early() in my patch should not run) > > and (b) pm_runtime_force_resume() may decide to leave it suspended (in which > > case amba_pm_suspend_late() in my patch should not run). > > Exactly. > > > > > [BTW, the "leave the device suspended" optimization in pm_runtime_force_* > > is potentially problematic too, because it requires the children to do > > the right thing, which effectively means that their drivers need to use > > pm_runtime_force_* too, but what if they don't want to reuse their > > runtime PM callbacks for system-wide PM?] > > Deployment of pm_runtime_force_suspend() should generally be done for > children devices first. > > If some reason that isn't the case, it's expected that the call to > pm_runtime_set_suspended() invoked from pm_runtime_force_suspend(), > for the parent, should fail and thus abort system suspend. Well, generally what about drivers that need to do something significantly different for system suspend and runtime PM? The whole picture seems to be falling apart if one of these is involved. > > > > Honestly, I don't like the way this is designed. IMO, it would be better > > to do the optimizations and all in the bus type middle-layer code instead > > of expecting drivers to use pm_runtime_force_* as their system-wide PM > > callbacks (and that expectation should at least be documented, which I'm > > not sure is the case now). But whatever. > > > > It all should work the way it does now without pm_runtime_force_* if (a) the > > bus type's PM callbacks are changed like in the last patch and the drivers > > (b) point their system suspend callbacks to the runtime PM callback routines > > and (c) set DPM_FLAG_SMART_SUSPEND and DPM_FLAG_LEAVE_SUSPENDED for the > > devices (if they need to do the PM in ->suspend and ->resume, they may set > > DPM_FLAG_AVOID_RPM too). > > > > And if you see a reason why that won't work, please let me know. > > I will have look and try out the series by using my local "runtime PM > test driver". > > I get back to you with an update on this. OK, thanks!
On Wednesday, October 18, 2017 4:11:33 PM CEST Ulf Hansson wrote: > [...] > > >> > >> The reason why pm_runtime_force_* needs to respects the hierarchy of > >> the RPM callbacks, is because otherwise it can't safely update the > >> runtime PM status of the device. > > > > I'm not sure I follow this requirement. Why is that so? > > If the PM domain controls some resources for the device in its RPM > callbacks and the driver controls some other resources in its RPM > callbacks - then these resources needs to be managed together. Right, but that doesn't automatically make it necessary to use runtime PM callbacks in the middle layer. Its system-wide PM callbacks may be suitable for that just fine. That is, at least in some cases, you can combine ->runtime_suspend from a driver and ->suspend_late from a middle layer with no problems, for example. That's why some middle layers allow drivers to point ->suspend_late and ->runtime_suspend to the same routine if they want to reuse that code. > This follows the behavior of when a regular call to > pm_runtime_get|put(), triggers the RPM callbacks to be invoked. But (a) it doesn't have to follow it and (b) in some cases it should not follow it. > > > >> And updating the runtime PM status of > >> the device is required to manage the optimized behavior during system > >> resume (avoiding to unnecessary resume devices). > > > > Well, OK. The runtime PM status of the device after system resume should > > better reflect its physical state. > > > > [The physical state of the device may not be under the control of the > > kernel in some cases, like in S3 resume on some systems that reset > > devices in the firmware and so on, but let's set that aside.] > > > > However, for the runtime PM status of the device may still reflect its state > > if, say, a ->resume_early of the middle layer is called during resume along > > with a driver's ->runtime_resume. That still can produce the right state > > of the device and all depends on the middle layer. > > > > On the other hand, as I said before, using a middle-layer ->runtime_suspend > > during a system sleep transition may be outright incorrect, say if device > > wakeup settings need to be adjusted by the middle layer (which is the > > case for some of them). > > > > Of course, if the middle layer expects the driver to point its > > system-wide PM callbacks to pm_runtime_force_*, then that's how it goes, > > but the drivers working with this particular middle layer generally > > won't work with other middle layers and may interact incorrectly > > with parents and/or children using the other middle layers. > > > > I guess the problem boils down to having a common set of expectations > > on the driver side and on the middle layer side allowing different > > combinations of these to work together. > > Yes! > > > > >> Besides the AMBA case, I also realized that we are dealing with PM > >> clocks in the genpd case. For this, genpd relies on the that runtime > >> PM status of the device properly reflects the state of the HW, during > >> system-wide PM. > >> > >> In other words, if the driver would change the runtime PM status of > >> the device, without respecting the hierarchy of the runtime PM > >> callbacks, it would lead to that genpd starts taking wrong decisions > >> while managing the PM clocks during system-wide PM. So in case you > >> intend to change pm_runtime_force_* this needs to be addressed too. > > > > I've just looked at the genpd code and quite frankly I'm not sure how this > > works, but I'll figure this out. :-) > > You may think of it as genpd's RPM callback controls some device > clocks, while the driver control some other device resources (pinctrl > for example) from its RPM callback. > > These resources needs to managed together, similar to as I described above. Which, again, doesn't mean that runtime PM callbacks from the middle layer have to be used for that. > [...] > > >> Absolutely agree about the different wake-up settings. However, these > >> issues can be addressed also when using pm_runtime_force_*, at least > >> in general, but then not for PCI. > > > > Well, not for the ACPI PM domain too. > > > > In general, not if the wakeup settings are adjusted by the middle layer. > > Correct! > > To use pm_runtime_force* for these cases, one would need some > additional information exchange between the driver and the > middle-layer. Which pretty much defeats the purpose of the wrappers, doesn't it? > > > >> Regarding hibernation, honestly that's not really my area of > >> expertise. Although, I assume the middle-layer and driver can treat > >> that as a separate case, so if it's not suitable to use > >> pm_runtime_force* for that case, then they shouldn't do it. > > > > Well, agreed. > > > > In some simple cases, though, driver callbacks can be reused for hibernation > > too, so it would be good to have a common way to do that too, IMO. > > Okay, that makes sense! > > > > >> > > >> > Also, quite so often other middle layers interact with PCI directly or > >> > indirectly (eg. a platform device may be a child or a consumer of a PCI > >> > device) and some optimizations need to take that into account (eg. parents > >> > generally need to be accessible when their childres are resumed and so on). > >> > >> A device's parent becomes informed when changing the runtime PM status > >> of the device via pm_runtime_force_suspend|resume(), as those calls > >> pm_runtime_set_suspended|active(). > > > > This requires the parent driver or middle layer to look at the reference > > counter and understand it the same way as pm_runtime_force_*. > > > >> In case that isn't that sufficient, what else is needed? Perhaps you can > >> point me to an example so I can understand better? > > > > Say you want to leave the parent suspended after system resume, but the > > child drivers use pm_runtime_force_suspend|resume(). The parent would then > > need to use pm_runtime_force_suspend|resume() too, no? > > Actually no. > > Currently the other options of "deferring resume" (not using > pm_runtime_force_*), is either using the "direct_complete" path or > similar to the approach you took for the i2c designware driver. > > Both cases should play nicely in combination of a child being managed > by pm_runtime_force_*. That's because only when the parent device is > kept runtime suspended during system suspend, resuming can be > deferred. And because the parent remains in runtime suspend late enough in the system suspend path, its children also are guaranteed to be suspended. But then all of them need to be left in runtime suspend during system resume too, which is somewhat restrictive, because some drivers may want their devices to be resumed then. [BTW, our current documentation recommends resuming devices during system resume, actually, and gives a list of reasons why. :-)] > That means, if the resume of the parent is deferred, so will the also > the resume of the child. > > > > >> For a PCI consumer device those will of course have to play by the rules of PCI. > >> > >> > > >> > Moreover, the majority of the "other subsystems/middle-layers" you've talked > >> > about so far don't provide any PM callbacks to be invoked by pm_runtime_force_*, > >> > so question is how representative they really are. > >> > >> That's the point. We know pm_runtime_force_* works nicely for the > >> trivial middle-layer cases. > > > > In which cases the middle-layer callbacks don't exist, so it's just like > > reusing driver callbacks directly. :-) > > > >> For the more complex cases, we need something additional/different. > > > > Something different. > > > > But overall, as I said, this is about common expectations. > > > > Today, some middle layers expect drivers to point their callback pointers > > to the same routine in order to resue it (PCI, ACPI bus type), some of them > > expect pm_runtime_force_suspend|resume() to be used (AMBA, maybe genpd), > > and some of them have no expectations at all. > > > > There needs to be a common ground in that area for drivers to be able to > > work with different middle layers. > > Yes, reaching that point would be great, we should definitively aim for that! Indeed. Thanks, Rafael
On 18 October 2017 at 23:48, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > On Wednesday, October 18, 2017 9:45:11 PM CEST Grygorii Strashko wrote: >> >> On 10/18/2017 09:11 AM, Ulf Hansson wrote: > > [...] > >> >>> That's the point. We know pm_runtime_force_* works nicely for the >> >>> trivial middle-layer cases. >> >> >> >> In which cases the middle-layer callbacks don't exist, so it's just like >> >> reusing driver callbacks directly. :-) >> >> I'd like to ask you clarify one point here and provide some info which I hope can be useful - >> what's exactly means "trivial middle-layer cases"? >> >> Is it when systems use "drivers/base/power/clock_ops.c - Generic clock >> manipulation PM callbacks" as dev_pm_domain (arm davinci/keystone), or OMAP >> device framework struct dev_pm_domain omap_device_pm_domain >> (arm/mach-omap2/omap_device.c) or static const struct dev_pm_ops >> tegra_aconnect_pm_ops? >> >> if yes all above have PM runtime callbacks. > > Trivial ones don't actually do anything meaningful in their PM callbacks. > > Things like the platform bus type, spi bus type, i2c bus type and similar. > > If the middle-layer callbacks manipulate devices in a significant way, then > they aren't trivial. I fully agree with Rafael's description above, but let me also clarify one more thing. We have also been discussing PM domains as being trivial and non-trivial. In some statements I even think the PM domain has been a part the middle-layer terminology, which may have been a bit confusing. In this regards as we consider genpd being a trivial PM domain, those examples your bring up above is too me also examples of trivial PM domains. Especially because they don't deal with wakeups, as that is taken care of by the drivers, right!? Kind regards Uffe
On 19 October 2017 at 00:12, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > On Wednesday, October 18, 2017 4:11:33 PM CEST Ulf Hansson wrote: >> [...] >> >> >> >> >> The reason why pm_runtime_force_* needs to respects the hierarchy of >> >> the RPM callbacks, is because otherwise it can't safely update the >> >> runtime PM status of the device. >> > >> > I'm not sure I follow this requirement. Why is that so? >> >> If the PM domain controls some resources for the device in its RPM >> callbacks and the driver controls some other resources in its RPM >> callbacks - then these resources needs to be managed together. > > Right, but that doesn't automatically make it necessary to use runtime PM > callbacks in the middle layer. Its system-wide PM callbacks may be > suitable for that just fine. > > That is, at least in some cases, you can combine ->runtime_suspend from a > driver and ->suspend_late from a middle layer with no problems, for example. > > That's why some middle layers allow drivers to point ->suspend_late and > ->runtime_suspend to the same routine if they want to reuse that code. > >> This follows the behavior of when a regular call to >> pm_runtime_get|put(), triggers the RPM callbacks to be invoked. > > But (a) it doesn't have to follow it and (b) in some cases it should not > follow it. Of course you don't explicitly *have to* respect the hierarchy of the RPM callbacks in pm_runtime_force_*. However, changing that would require some additional information exchange between the driver and the middle-layer/PM domain, as to instruct the middle-layer/PM domain of what to do during system-wide PM. Especially in cases when the driver deals with wakeup, as in those cases the instructions may change dynamically. [...] >> > In general, not if the wakeup settings are adjusted by the middle layer. >> >> Correct! >> >> To use pm_runtime_force* for these cases, one would need some >> additional information exchange between the driver and the >> middle-layer. > > Which pretty much defeats the purpose of the wrappers, doesn't it? Well, no matter if the wrappers are used or not, we need some kind of information exchange between the driver and the middle-layers/PM domains. Anyway, me personally think it's too early to conclude that using the wrappers may not be useful going forward. At this point, they clearly helps trivial cases to remain being trivial. > >> > >> >> Regarding hibernation, honestly that's not really my area of >> >> expertise. Although, I assume the middle-layer and driver can treat >> >> that as a separate case, so if it's not suitable to use >> >> pm_runtime_force* for that case, then they shouldn't do it. >> > >> > Well, agreed. >> > >> > In some simple cases, though, driver callbacks can be reused for hibernation >> > too, so it would be good to have a common way to do that too, IMO. >> >> Okay, that makes sense! >> >> > >> >> > >> >> > Also, quite so often other middle layers interact with PCI directly or >> >> > indirectly (eg. a platform device may be a child or a consumer of a PCI >> >> > device) and some optimizations need to take that into account (eg. parents >> >> > generally need to be accessible when their childres are resumed and so on). >> >> >> >> A device's parent becomes informed when changing the runtime PM status >> >> of the device via pm_runtime_force_suspend|resume(), as those calls >> >> pm_runtime_set_suspended|active(). >> > >> > This requires the parent driver or middle layer to look at the reference >> > counter and understand it the same way as pm_runtime_force_*. >> > >> >> In case that isn't that sufficient, what else is needed? Perhaps you can >> >> point me to an example so I can understand better? >> > >> > Say you want to leave the parent suspended after system resume, but the >> > child drivers use pm_runtime_force_suspend|resume(). The parent would then >> > need to use pm_runtime_force_suspend|resume() too, no? >> >> Actually no. >> >> Currently the other options of "deferring resume" (not using >> pm_runtime_force_*), is either using the "direct_complete" path or >> similar to the approach you took for the i2c designware driver. >> >> Both cases should play nicely in combination of a child being managed >> by pm_runtime_force_*. That's because only when the parent device is >> kept runtime suspended during system suspend, resuming can be >> deferred. > > And because the parent remains in runtime suspend late enough in the > system suspend path, its children also are guaranteed to be suspended. Yes. > > But then all of them need to be left in runtime suspend during system > resume too, which is somewhat restrictive, because some drivers may > want their devices to be resumed then. Actually, this scenario is also addressed when using the pm_runtime_force_*. The driver for the child would only need to bump the runtime PM usage count (pm_runtime_get_noresume()) before calling pm_runtime_force_suspend() at system suspend. That then also propagates to the parent, leading to that both the parent and the child will be resumed when pm_runtime_force_resume() is called for them. Of course, if the driver of the parent isn't using pm_runtime_force_, we would have to assume that it's always being resumed at system resume. As at matter of fact, doesn't this scenario actually indicates that we do need to involve the runtime PM core (updating RPM status according to the HW state even during system-wide PM) to really get this right. It's not enough to only use "driver PM flags"!? Seems like we need to create a list of all requirements, pitfalls, good things vs bad things etc. :-) > > [BTW, our current documentation recommends resuming devices during > system resume, actually, and gives a list of reasons why. :-)] Yes, but that too easy and to me not good enough. :-) [...] Kind regards Uffe
On 10/19/2017 03:33 AM, Ulf Hansson wrote: > On 18 October 2017 at 23:48, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: >> On Wednesday, October 18, 2017 9:45:11 PM CEST Grygorii Strashko wrote: >>> >>> On 10/18/2017 09:11 AM, Ulf Hansson wrote: >> >> [...] >> >>>>>> That's the point. We know pm_runtime_force_* works nicely for the >>>>>> trivial middle-layer cases. >>>>> >>>>> In which cases the middle-layer callbacks don't exist, so it's just like >>>>> reusing driver callbacks directly. :-) >>> >>> I'd like to ask you clarify one point here and provide some info which I hope can be useful - >>> what's exactly means "trivial middle-layer cases"? >>> >>> Is it when systems use "drivers/base/power/clock_ops.c - Generic clock >>> manipulation PM callbacks" as dev_pm_domain (arm davinci/keystone), or OMAP >>> device framework struct dev_pm_domain omap_device_pm_domain >>> (arm/mach-omap2/omap_device.c) or static const struct dev_pm_ops >>> tegra_aconnect_pm_ops? >>> >>> if yes all above have PM runtime callbacks. >> >> Trivial ones don't actually do anything meaningful in their PM callbacks. >> >> Things like the platform bus type, spi bus type, i2c bus type and similar. >> >> If the middle-layer callbacks manipulate devices in a significant way, then >> they aren't trivial. > > I fully agree with Rafael's description above, but let me also clarify > one more thing. > > We have also been discussing PM domains as being trivial and > non-trivial. In some statements I even think the PM domain has been a > part the middle-layer terminology, which may have been a bit > confusing. > > In this regards as we consider genpd being a trivial PM domain, those > examples your bring up above is too me also examples of trivial PM > domains. Especially because they don't deal with wakeups, as that is > taken care of by the drivers, right!? Not directly, for example, omap device framework has noirq callback implemented which forcibly disable all devices which are not PM runtime suspended. while doing this it calls drivers PM .runtime_suspend() which may return non 0 value and in this case device will be left enabled (powered) at suspend for wake up purposes (see _od_suspend_noirq()).
[...] >>> > Say you want to leave the parent suspended after system resume, but the >>> > child drivers use pm_runtime_force_suspend|resume(). The parent would then >>> > need to use pm_runtime_force_suspend|resume() too, no? >>> >>> Actually no. >>> >>> Currently the other options of "deferring resume" (not using >>> pm_runtime_force_*), is either using the "direct_complete" path or >>> similar to the approach you took for the i2c designware driver. >>> >>> Both cases should play nicely in combination of a child being managed >>> by pm_runtime_force_*. That's because only when the parent device is >>> kept runtime suspended during system suspend, resuming can be >>> deferred. >> >> And because the parent remains in runtime suspend late enough in the >> system suspend path, its children also are guaranteed to be suspended. > > Yes. > >> >> But then all of them need to be left in runtime suspend during system >> resume too, which is somewhat restrictive, because some drivers may >> want their devices to be resumed then. > > Actually, this scenario is also addressed when using the pm_runtime_force_*. > > The driver for the child would only need to bump the runtime PM usage > count (pm_runtime_get_noresume()) before calling > pm_runtime_force_suspend() at system suspend. That then also > propagates to the parent, leading to that both the parent and the > child will be resumed when pm_runtime_force_resume() is called for > them. I need to correct myself here. The above currently only works if the child is runtime resumed while pm_runtime_force_suspend() is called. The logic in pm_runtime_force_* needs to be improved to take care of such scenarios. However I think that should be rather easy to fix, if we want that. [...] Kind regards Uffe
On 19 October 2017 at 19:21, Grygorii Strashko <grygorii.strashko@ti.com> wrote: > > > On 10/19/2017 03:33 AM, Ulf Hansson wrote: >> On 18 October 2017 at 23:48, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: >>> On Wednesday, October 18, 2017 9:45:11 PM CEST Grygorii Strashko wrote: >>>> >>>> On 10/18/2017 09:11 AM, Ulf Hansson wrote: >>> >>> [...] >>> >>>>>>> That's the point. We know pm_runtime_force_* works nicely for the >>>>>>> trivial middle-layer cases. >>>>>> >>>>>> In which cases the middle-layer callbacks don't exist, so it's just like >>>>>> reusing driver callbacks directly. :-) >>>> >>>> I'd like to ask you clarify one point here and provide some info which I hope can be useful - >>>> what's exactly means "trivial middle-layer cases"? >>>> >>>> Is it when systems use "drivers/base/power/clock_ops.c - Generic clock >>>> manipulation PM callbacks" as dev_pm_domain (arm davinci/keystone), or OMAP >>>> device framework struct dev_pm_domain omap_device_pm_domain >>>> (arm/mach-omap2/omap_device.c) or static const struct dev_pm_ops >>>> tegra_aconnect_pm_ops? >>>> >>>> if yes all above have PM runtime callbacks. >>> >>> Trivial ones don't actually do anything meaningful in their PM callbacks. >>> >>> Things like the platform bus type, spi bus type, i2c bus type and similar. >>> >>> If the middle-layer callbacks manipulate devices in a significant way, then >>> they aren't trivial. >> >> I fully agree with Rafael's description above, but let me also clarify >> one more thing. >> >> We have also been discussing PM domains as being trivial and >> non-trivial. In some statements I even think the PM domain has been a >> part the middle-layer terminology, which may have been a bit >> confusing. >> >> In this regards as we consider genpd being a trivial PM domain, those >> examples your bring up above is too me also examples of trivial PM >> domains. Especially because they don't deal with wakeups, as that is >> taken care of by the drivers, right!? > > Not directly, for example, omap device framework has noirq callback implemented > which forcibly disable all devices which are not PM runtime suspended. > while doing this it calls drivers PM .runtime_suspend() which may return > non 0 value and in this case device will be left enabled (powered) at suspend for > wake up purposes (see _od_suspend_noirq()). > Yeah, I had that feeling that omap has some trickyness going on. :-) I sure that can be fixed in the omap PM domain, although
On 19 October 2017 at 20:04, Ulf Hansson <ulf.hansson@linaro.org> wrote: > On 19 October 2017 at 19:21, Grygorii Strashko <grygorii.strashko@ti.com> wrote: >> >> >> On 10/19/2017 03:33 AM, Ulf Hansson wrote: >>> On 18 October 2017 at 23:48, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: >>>> On Wednesday, October 18, 2017 9:45:11 PM CEST Grygorii Strashko wrote: >>>>> >>>>> On 10/18/2017 09:11 AM, Ulf Hansson wrote: >>>> >>>> [...] >>>> >>>>>>>> That's the point. We know pm_runtime_force_* works nicely for the >>>>>>>> trivial middle-layer cases. >>>>>>> >>>>>>> In which cases the middle-layer callbacks don't exist, so it's just like >>>>>>> reusing driver callbacks directly. :-) >>>>> >>>>> I'd like to ask you clarify one point here and provide some info which I hope can be useful - >>>>> what's exactly means "trivial middle-layer cases"? >>>>> >>>>> Is it when systems use "drivers/base/power/clock_ops.c - Generic clock >>>>> manipulation PM callbacks" as dev_pm_domain (arm davinci/keystone), or OMAP >>>>> device framework struct dev_pm_domain omap_device_pm_domain >>>>> (arm/mach-omap2/omap_device.c) or static const struct dev_pm_ops >>>>> tegra_aconnect_pm_ops? >>>>> >>>>> if yes all above have PM runtime callbacks. >>>> >>>> Trivial ones don't actually do anything meaningful in their PM callbacks. >>>> >>>> Things like the platform bus type, spi bus type, i2c bus type and similar. >>>> >>>> If the middle-layer callbacks manipulate devices in a significant way, then >>>> they aren't trivial. >>> >>> I fully agree with Rafael's description above, but let me also clarify >>> one more thing. >>> >>> We have also been discussing PM domains as being trivial and >>> non-trivial. In some statements I even think the PM domain has been a >>> part the middle-layer terminology, which may have been a bit >>> confusing. >>> >>> In this regards as we consider genpd being a trivial PM domain, those >>> examples your bring up above is too me also examples of trivial PM >>> domains. Especially because they don't deal with wakeups, as that is >>> taken care of by the drivers, right!? >> >> Not directly, for example, omap device framework has noirq callback implemented >> which forcibly disable all devices which are not PM runtime suspended. >> while doing this it calls drivers PM .runtime_suspend() which may return >> non 0 value and in this case device will be left enabled (powered) at suspend for >> wake up purposes (see _od_suspend_noirq()). >> > > Yeah, I had that feeling that omap has some trickyness going on. :-) > > I sure that can be fixed in the omap PM domain, although ...slipped with my fingers.. here is the rest of the reply... ..of course that require us to use another way for drivers to signal to the omap PM domain that it needs to stay powered as to deal with wakeup. I can have a look at that more closely, to see if it makes sense to change. Kind regards Uffe
On 10/19/2017 01:11 PM, Ulf Hansson wrote: > On 19 October 2017 at 20:04, Ulf Hansson <ulf.hansson@linaro.org> wrote: >> On 19 October 2017 at 19:21, Grygorii Strashko <grygorii.strashko@ti.com> wrote: >>> >>> >>> On 10/19/2017 03:33 AM, Ulf Hansson wrote: >>>> On 18 October 2017 at 23:48, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: >>>>> On Wednesday, October 18, 2017 9:45:11 PM CEST Grygorii Strashko wrote: >>>>>> >>>>>> On 10/18/2017 09:11 AM, Ulf Hansson wrote: >>>>> >>>>> [...] >>>>> >>>>>>>>> That's the point. We know pm_runtime_force_* works nicely for the >>>>>>>>> trivial middle-layer cases. >>>>>>>> >>>>>>>> In which cases the middle-layer callbacks don't exist, so it's just like >>>>>>>> reusing driver callbacks directly. :-) >>>>>> >>>>>> I'd like to ask you clarify one point here and provide some info which I hope can be useful - >>>>>> what's exactly means "trivial middle-layer cases"? >>>>>> >>>>>> Is it when systems use "drivers/base/power/clock_ops.c - Generic clock >>>>>> manipulation PM callbacks" as dev_pm_domain (arm davinci/keystone), or OMAP >>>>>> device framework struct dev_pm_domain omap_device_pm_domain >>>>>> (arm/mach-omap2/omap_device.c) or static const struct dev_pm_ops >>>>>> tegra_aconnect_pm_ops? >>>>>> >>>>>> if yes all above have PM runtime callbacks. >>>>> >>>>> Trivial ones don't actually do anything meaningful in their PM callbacks. >>>>> >>>>> Things like the platform bus type, spi bus type, i2c bus type and similar. >>>>> >>>>> If the middle-layer callbacks manipulate devices in a significant way, then >>>>> they aren't trivial. >>>> >>>> I fully agree with Rafael's description above, but let me also clarify >>>> one more thing. >>>> >>>> We have also been discussing PM domains as being trivial and >>>> non-trivial. In some statements I even think the PM domain has been a >>>> part the middle-layer terminology, which may have been a bit >>>> confusing. >>>> >>>> In this regards as we consider genpd being a trivial PM domain, those >>>> examples your bring up above is too me also examples of trivial PM >>>> domains. Especially because they don't deal with wakeups, as that is >>>> taken care of by the drivers, right!? >>> >>> Not directly, for example, omap device framework has noirq callback implemented >>> which forcibly disable all devices which are not PM runtime suspended. >>> while doing this it calls drivers PM .runtime_suspend() which may return >>> non 0 value and in this case device will be left enabled (powered) at suspend for >>> wake up purposes (see _od_suspend_noirq()). >>> >> >> Yeah, I had that feeling that omap has some trickyness going on. :-) >> >> I sure that can be fixed in the omap PM domain, although > > ...slipped with my fingers.. here is the rest of the reply... > > ..of course that require us to use another way for drivers to signal > to the omap PM domain that it needs to stay powered as to deal with > wakeup. > > I can have a look at that more closely, to see if it makes sense to change. > Also, additional note here. some IPs are reused between OMAP/Davinci/Keystone, OMAP PM domain have some code running at noirq time to dial with devices left in PM runtime enabled state (OMAP PM runtime centric), while Davinci/Keystone haven't (clock_ops.c), so pm_runtime_force_* API is actually possibility now to make the same driver work on all these platforms.
On Thursday, October 19, 2017 2:21:07 PM CEST Ulf Hansson wrote: > On 19 October 2017 at 00:12, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > > On Wednesday, October 18, 2017 4:11:33 PM CEST Ulf Hansson wrote: > >> [...] > >> > >> >> > >> >> The reason why pm_runtime_force_* needs to respects the hierarchy of > >> >> the RPM callbacks, is because otherwise it can't safely update the > >> >> runtime PM status of the device. > >> > > >> > I'm not sure I follow this requirement. Why is that so? > >> > >> If the PM domain controls some resources for the device in its RPM > >> callbacks and the driver controls some other resources in its RPM > >> callbacks - then these resources needs to be managed together. > > > > Right, but that doesn't automatically make it necessary to use runtime PM > > callbacks in the middle layer. Its system-wide PM callbacks may be > > suitable for that just fine. > > > > That is, at least in some cases, you can combine ->runtime_suspend from a > > driver and ->suspend_late from a middle layer with no problems, for example. > > > > That's why some middle layers allow drivers to point ->suspend_late and > > ->runtime_suspend to the same routine if they want to reuse that code. > > > >> This follows the behavior of when a regular call to > >> pm_runtime_get|put(), triggers the RPM callbacks to be invoked. > > > > But (a) it doesn't have to follow it and (b) in some cases it should not > > follow it. > > Of course you don't explicitly *have to* respect the hierarchy of the > RPM callbacks in pm_runtime_force_*. > > However, changing that would require some additional information > exchange between the driver and the middle-layer/PM domain, as to > instruct the middle-layer/PM domain of what to do during system-wide > PM. Especially in cases when the driver deals with wakeup, as in those > cases the instructions may change dynamically. Well, if wakeup matters, drivers can't simply point their PM callbacks to pm_runtime_force_* anyway. If the driver itself deals with wakeups, it clearly needs different callback routines for system-wide PM and for runtime PM, so it can't reuse its runtime PM callbacks at all then. If the middle layer deals with wakeups, different callbacks are needed at that level and so pm_runtime_force_* are unsuitable too. Really, invoking runtime PM callbacks from the middle layer in pm_runtime_force_* is a not a idea at all and there's no general requirement for it whatever. > [...] > > >> > In general, not if the wakeup settings are adjusted by the middle layer. > >> > >> Correct! > >> > >> To use pm_runtime_force* for these cases, one would need some > >> additional information exchange between the driver and the > >> middle-layer. > > > > Which pretty much defeats the purpose of the wrappers, doesn't it? > > Well, no matter if the wrappers are used or not, we need some kind of > information exchange between the driver and the middle-layers/PM > domains. Right. But if that information is exchanged, then why use wrappers *in* *addition* to that? > Anyway, me personally think it's too early to conclude that using the > wrappers may not be useful going forward. At this point, they clearly > helps trivial cases to remain being trivial. I'm not sure about that really. So far I've seen more complexity resulting from using them than being avoided by using them, but I guess the beauty is in the eye of the beholder. :-) > > > >> > > >> >> Regarding hibernation, honestly that's not really my area of > >> >> expertise. Although, I assume the middle-layer and driver can treat > >> >> that as a separate case, so if it's not suitable to use > >> >> pm_runtime_force* for that case, then they shouldn't do it. > >> > > >> > Well, agreed. > >> > > >> > In some simple cases, though, driver callbacks can be reused for hibernation > >> > too, so it would be good to have a common way to do that too, IMO. > >> > >> Okay, that makes sense! > >> > >> > > >> >> > > >> >> > Also, quite so often other middle layers interact with PCI directly or > >> >> > indirectly (eg. a platform device may be a child or a consumer of a PCI > >> >> > device) and some optimizations need to take that into account (eg. parents > >> >> > generally need to be accessible when their childres are resumed and so on). > >> >> > >> >> A device's parent becomes informed when changing the runtime PM status > >> >> of the device via pm_runtime_force_suspend|resume(), as those calls > >> >> pm_runtime_set_suspended|active(). > >> > > >> > This requires the parent driver or middle layer to look at the reference > >> > counter and understand it the same way as pm_runtime_force_*. > >> > > >> >> In case that isn't that sufficient, what else is needed? Perhaps you can > >> >> point me to an example so I can understand better? > >> > > >> > Say you want to leave the parent suspended after system resume, but the > >> > child drivers use pm_runtime_force_suspend|resume(). The parent would then > >> > need to use pm_runtime_force_suspend|resume() too, no? > >> > >> Actually no. > >> > >> Currently the other options of "deferring resume" (not using > >> pm_runtime_force_*), is either using the "direct_complete" path or > >> similar to the approach you took for the i2c designware driver. > >> > >> Both cases should play nicely in combination of a child being managed > >> by pm_runtime_force_*. That's because only when the parent device is > >> kept runtime suspended during system suspend, resuming can be > >> deferred. > > > > And because the parent remains in runtime suspend late enough in the > > system suspend path, its children also are guaranteed to be suspended. > > Yes. > > > > > But then all of them need to be left in runtime suspend during system > > resume too, which is somewhat restrictive, because some drivers may > > want their devices to be resumed then. > > Actually, this scenario is also addressed when using the pm_runtime_force_*. > > The driver for the child would only need to bump the runtime PM usage > count (pm_runtime_get_noresume()) before calling > pm_runtime_force_suspend() at system suspend. That then also > propagates to the parent, leading to that both the parent and the > child will be resumed when pm_runtime_force_resume() is called for > them. > > Of course, if the driver of the parent isn't using pm_runtime_force_, > we would have to assume that it's always being resumed at system > resume. There may be other ways to avoid that, though. BTW, I don't quite like using the RPM usage counter this way either, if that hasn't been clear so far. > As at matter of fact, doesn't this scenario actually indicates that we > do need to involve the runtime PM core (updating RPM status according > to the HW state even during system-wide PM) to really get this right. > It's not enough to only use "driver PM flags"!? I'm not sure what you are talking about. For all devices with enabled runtime PM any state produced by system suspend/resume has to be labeled either as RPM_SUSPENDED or as RPM_ACTIVE. That has always been the case and hasn't involved any magic. However, while runtime PM is disabled, the state of the device doesn't need to be reflected by its RPM status and there's no need to track it then. Moreover, in some cases it cannot be tracked even, because of the firmare involvement (and we cannot track the firmware). Besides, please really look at what happens in the patches I posted and then we can talk. > Seems like we need to create a list of all requirements, pitfalls, > good things vs bad things etc. :-) We surely need to know what general cases need to be addressed. > > > > [BTW, our current documentation recommends resuming devices during > > system resume, actually, and gives a list of reasons why. :-)] > > Yes, but that too easy and to me not good enough. :-) But the list of reasons why is kind of valid still. There may be better reasons for not doing that, but it really is a tradeoff and drivers should be able to decide which way they want to go. IOW, the "leave the device in runtime suspend throughout system suspend" optimization doesn't have to be bundled with the "leave the device in suspend throughout and after system resume" one. Thanks, Rafael
On 20 October 2017 at 03:19, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > On Thursday, October 19, 2017 2:21:07 PM CEST Ulf Hansson wrote: >> On 19 October 2017 at 00:12, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: >> > On Wednesday, October 18, 2017 4:11:33 PM CEST Ulf Hansson wrote: >> >> [...] >> >> >> >> >> >> >> >> The reason why pm_runtime_force_* needs to respects the hierarchy of >> >> >> the RPM callbacks, is because otherwise it can't safely update the >> >> >> runtime PM status of the device. >> >> > >> >> > I'm not sure I follow this requirement. Why is that so? >> >> >> >> If the PM domain controls some resources for the device in its RPM >> >> callbacks and the driver controls some other resources in its RPM >> >> callbacks - then these resources needs to be managed together. >> > >> > Right, but that doesn't automatically make it necessary to use runtime PM >> > callbacks in the middle layer. Its system-wide PM callbacks may be >> > suitable for that just fine. >> > >> > That is, at least in some cases, you can combine ->runtime_suspend from a >> > driver and ->suspend_late from a middle layer with no problems, for example. >> > >> > That's why some middle layers allow drivers to point ->suspend_late and >> > ->runtime_suspend to the same routine if they want to reuse that code. >> > >> >> This follows the behavior of when a regular call to >> >> pm_runtime_get|put(), triggers the RPM callbacks to be invoked. >> > >> > But (a) it doesn't have to follow it and (b) in some cases it should not >> > follow it. >> >> Of course you don't explicitly *have to* respect the hierarchy of the >> RPM callbacks in pm_runtime_force_*. >> >> However, changing that would require some additional information >> exchange between the driver and the middle-layer/PM domain, as to >> instruct the middle-layer/PM domain of what to do during system-wide >> PM. Especially in cases when the driver deals with wakeup, as in those >> cases the instructions may change dynamically. > > Well, if wakeup matters, drivers can't simply point their PM callbacks > to pm_runtime_force_* anyway. > > If the driver itself deals with wakeups, it clearly needs different callback > routines for system-wide PM and for runtime PM, so it can't reuse its runtime > PM callbacks at all then. It can still re-use its runtime PM callbacks, simply by calling pm_runtime_force_ from its system sleep callbacks. Drivers already do that today, not only to deal with wakeups, but generally when they need to deal with some additional operations. > > If the middle layer deals with wakeups, different callbacks are needed at > that level and so pm_runtime_force_* are unsuitable too. > > Really, invoking runtime PM callbacks from the middle layer in > pm_runtime_force_* is a not a idea at all and there's no general requirement > for it whatever. > >> [...] >> >> >> > In general, not if the wakeup settings are adjusted by the middle layer. >> >> >> >> Correct! >> >> >> >> To use pm_runtime_force* for these cases, one would need some >> >> additional information exchange between the driver and the >> >> middle-layer. >> > >> > Which pretty much defeats the purpose of the wrappers, doesn't it? >> >> Well, no matter if the wrappers are used or not, we need some kind of >> information exchange between the driver and the middle-layers/PM >> domains. > > Right. > > But if that information is exchanged, then why use wrappers *in* *addition* > to that? If we can find a different method that both avoids both open coding and offers the optimize system-wide PM path at resume, I am open to that. > >> Anyway, me personally think it's too early to conclude that using the >> wrappers may not be useful going forward. At this point, they clearly >> helps trivial cases to remain being trivial. > > I'm not sure about that really. So far I've seen more complexity resulting > from using them than being avoided by using them, but I guess the beauty is > in the eye of the beholder. :-) Hehe, yeah you may be right. :-) > >> > >> >> > >> >> >> Regarding hibernation, honestly that's not really my area of >> >> >> expertise. Although, I assume the middle-layer and driver can treat >> >> >> that as a separate case, so if it's not suitable to use >> >> >> pm_runtime_force* for that case, then they shouldn't do it. >> >> > >> >> > Well, agreed. >> >> > >> >> > In some simple cases, though, driver callbacks can be reused for hibernation >> >> > too, so it would be good to have a common way to do that too, IMO. >> >> >> >> Okay, that makes sense! >> >> >> >> > >> >> >> > >> >> >> > Also, quite so often other middle layers interact with PCI directly or >> >> >> > indirectly (eg. a platform device may be a child or a consumer of a PCI >> >> >> > device) and some optimizations need to take that into account (eg. parents >> >> >> > generally need to be accessible when their childres are resumed and so on). >> >> >> >> >> >> A device's parent becomes informed when changing the runtime PM status >> >> >> of the device via pm_runtime_force_suspend|resume(), as those calls >> >> >> pm_runtime_set_suspended|active(). >> >> > >> >> > This requires the parent driver or middle layer to look at the reference >> >> > counter and understand it the same way as pm_runtime_force_*. >> >> > >> >> >> In case that isn't that sufficient, what else is needed? Perhaps you can >> >> >> point me to an example so I can understand better? >> >> > >> >> > Say you want to leave the parent suspended after system resume, but the >> >> > child drivers use pm_runtime_force_suspend|resume(). The parent would then >> >> > need to use pm_runtime_force_suspend|resume() too, no? >> >> >> >> Actually no. >> >> >> >> Currently the other options of "deferring resume" (not using >> >> pm_runtime_force_*), is either using the "direct_complete" path or >> >> similar to the approach you took for the i2c designware driver. >> >> >> >> Both cases should play nicely in combination of a child being managed >> >> by pm_runtime_force_*. That's because only when the parent device is >> >> kept runtime suspended during system suspend, resuming can be >> >> deferred. >> > >> > And because the parent remains in runtime suspend late enough in the >> > system suspend path, its children also are guaranteed to be suspended. >> >> Yes. >> >> > >> > But then all of them need to be left in runtime suspend during system >> > resume too, which is somewhat restrictive, because some drivers may >> > want their devices to be resumed then. >> >> Actually, this scenario is also addressed when using the pm_runtime_force_*. >> >> The driver for the child would only need to bump the runtime PM usage >> count (pm_runtime_get_noresume()) before calling >> pm_runtime_force_suspend() at system suspend. That then also >> propagates to the parent, leading to that both the parent and the >> child will be resumed when pm_runtime_force_resume() is called for >> them. >> >> Of course, if the driver of the parent isn't using pm_runtime_force_, >> we would have to assume that it's always being resumed at system >> resume. > > There may be other ways to avoid that, though. > > BTW, I don't quite like using the RPM usage counter this way either, if > that hasn't been clear so far. > >> As at matter of fact, doesn't this scenario actually indicates that we >> do need to involve the runtime PM core (updating RPM status according >> to the HW state even during system-wide PM) to really get this right. >> It's not enough to only use "driver PM flags"!? > > I'm not sure what you are talking about. > > For all devices with enabled runtime PM any state produced by system > suspend/resume has to be labeled either as RPM_SUSPENDED or as RPM_ACTIVE. > That has always been the case and hasn't involved any magic. > > However, while runtime PM is disabled, the state of the device doesn't > need to be reflected by its RPM status and there's no need to track it then. > Moreover, in some cases it cannot be tracked even, because of the firmare > involvement (and we cannot track the firmware). > > Besides, please really look at what happens in the patches I posted and > then we can talk. Yes, I will have look. > >> Seems like we need to create a list of all requirements, pitfalls, >> good things vs bad things etc. :-) > > We surely need to know what general cases need to be addressed. > >> > >> > [BTW, our current documentation recommends resuming devices during >> > system resume, actually, and gives a list of reasons why. :-)] >> >> Yes, but that too easy and to me not good enough. :-) > > But the list of reasons why is kind of valid still. There may be better > reasons for not doing that, but it really is a tradeoff and drivers > should be able to decide which way they want to go. Agree. > > IOW, the "leave the device in runtime suspend throughout system > suspend" optimization doesn't have to be bundled with the "leave the > device in suspend throughout and after system resume" one. Agree. Kind regards Uffe
[...] >>>>> In this regards as we consider genpd being a trivial PM domain, those >>>>> examples your bring up above is too me also examples of trivial PM >>>>> domains. Especially because they don't deal with wakeups, as that is >>>>> taken care of by the drivers, right!? >>>> >>>> Not directly, for example, omap device framework has noirq callback implemented >>>> which forcibly disable all devices which are not PM runtime suspended. >>>> while doing this it calls drivers PM .runtime_suspend() which may return >>>> non 0 value and in this case device will be left enabled (powered) at suspend for >>>> wake up purposes (see _od_suspend_noirq()). >>>> >>> >>> Yeah, I had that feeling that omap has some trickyness going on. :-) >>> >>> I sure that can be fixed in the omap PM domain, although >> >> ...slipped with my fingers.. here is the rest of the reply... >> >> ..of course that require us to use another way for drivers to signal >> to the omap PM domain that it needs to stay powered as to deal with >> wakeup. >> >> I can have a look at that more closely, to see if it makes sense to change. >> > > Also, additional note here. some IPs are reused between OMAP/Davinci/Keystone, > OMAP PM domain have some code running at noirq time to dial with devices left > in PM runtime enabled state (OMAP PM runtime centric), while Davinci/Keystone haven't (clock_ops.c), > so pm_runtime_force_* API is actually possibility now to make the same driver work > on all these platforms. That sounds great! Also, in the end it would be nice to also convert the OMAP PM domain to genpd. I think most of the needed infrastructure is already there to do that. Kind regards Uffe
Index: linux-pm/drivers/amba/bus.c =================================================================== --- linux-pm.orig/drivers/amba/bus.c +++ linux-pm/drivers/amba/bus.c @@ -132,52 +132,77 @@ static struct attribute *amba_dev_attrs[ ATTRIBUTE_GROUPS(amba_dev); #ifdef CONFIG_PM +static void amba_pm_suspend(struct device *dev) +{ + struct amba_device *pcdev = to_amba_device(dev); + + if (!dev->driver) + return; + + if (pm_runtime_is_irq_safe(dev)) + clk_disable(pcdev->pclk); + else + clk_disable_unprepare(pcdev->pclk); +} + +static int amba_pm_resume(struct device *dev) +{ + struct amba_device *pcdev = to_amba_device(dev); + + if (!dev->driver) + return 0; + + /* Failure is probably fatal to the system, but... */ + if (pm_runtime_is_irq_safe(dev)) + return clk_enable(pcdev->pclk); + + return clk_prepare_enable(pcdev->pclk); +} + /* * Hooks to provide runtime PM of the pclk (bus clock). It is safe to * enable/disable the bus clock at runtime PM suspend/resume as this * does not result in loss of context. */ +static int amba_pm_suspend_early(struct device *dev) +{ + int ret = pm_generic_suspend_early(dev); + + if (ret) + return ret; + + amba_pm_suspend(dev); + return 0; +} + +static int amba_pm_resume_late(struct device *dev) +{ + int ret = amba_pm_resume(dev); + + return ret ? ret : pm_generic_resume_late(dev); +} + static int amba_pm_runtime_suspend(struct device *dev) { - struct amba_device *pcdev = to_amba_device(dev); int ret = pm_generic_runtime_suspend(dev); - if (ret == 0 && dev->driver) { - if (pm_runtime_is_irq_safe(dev)) - clk_disable(pcdev->pclk); - else - clk_disable_unprepare(pcdev->pclk); - } + if (ret) + return ret; - return ret; + amba_pm_suspend(dev); + return 0; } static int amba_pm_runtime_resume(struct device *dev) { - struct amba_device *pcdev = to_amba_device(dev); - int ret; - - if (dev->driver) { - if (pm_runtime_is_irq_safe(dev)) - ret = clk_enable(pcdev->pclk); - else - ret = clk_prepare_enable(pcdev->pclk); - /* Failure is probably fatal to the system, but... */ - if (ret) - return ret; - } + int ret = amba_pm_resume(dev); - return pm_generic_runtime_resume(dev); + return ret ? ret : pm_generic_runtime_resume(dev); } #endif /* CONFIG_PM */ static const struct dev_pm_ops amba_pm = { - .suspend = pm_generic_suspend, - .resume = pm_generic_resume, - .freeze = pm_generic_freeze, - .thaw = pm_generic_thaw, - .poweroff = pm_generic_poweroff, - .restore = pm_generic_restore, + SET_LATE_SYSTEM_SLEEP_PM_OPS(amba_pm_suspend_late, amba_pm_resume_early) SET_RUNTIME_PM_OPS( amba_pm_runtime_suspend, amba_pm_runtime_resume, Index: linux-pm/drivers/base/power/runtime.c =================================================================== --- linux-pm.orig/drivers/base/power/runtime.c +++ linux-pm/drivers/base/power/runtime.c @@ -1636,14 +1636,15 @@ void pm_runtime_drop_link(struct device */ int pm_runtime_force_suspend(struct device *dev) { - int (*callback)(struct device *); + int (*callback)(struct device *) = NULL; int ret = 0; pm_runtime_disable(dev); if (pm_runtime_status_suspended(dev)) return 0; - callback = RPM_GET_CALLBACK(dev, runtime_suspend); + if (dev->driver && dev->driver->pm) + callback = dev->driver->pm->runtime_suspend; if (!callback) { ret = -ENOSYS; @@ -1690,10 +1691,11 @@ EXPORT_SYMBOL_GPL(pm_runtime_force_suspe */ int pm_runtime_force_resume(struct device *dev) { - int (*callback)(struct device *); + int (*callback)(struct device *) = NULL; int ret = 0; - callback = RPM_GET_CALLBACK(dev, runtime_resume); + if (dev->driver && dev->driver->pm) + callback = dev->driver->pm->runtime_resume; if (!callback) { ret = -ENOSYS;