Message ID | 20180518173008.73291-2-tony@atomide.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 7e48e23a1f4a50f93ac1073f1326e0a73829b631 |
Headers | show |
On Fri, May 18, 2018 at 10:30:07AM -0700, Tony Lindgren wrote: > If pm_runtime_get_sync() fails we should call pm_runtime_put_noidle(). > This is probably not a critical fix as we should only hit this when > things are broken elsewhere. This feels like a bug in the runtime PM APIs to be honest - I'd really not expect that if a function call like a get failed there'd be any cleanup to do. I'd expect a very high proportion of users to have the same problem due to this.
+Cc: Rafael On Mon, May 21, 2018 at 6:46 PM, Mark Brown <broonie@kernel.org> wrote: > On Fri, May 18, 2018 at 10:30:07AM -0700, Tony Lindgren wrote: >> If pm_runtime_get_sync() fails we should call pm_runtime_put_noidle(). >> This is probably not a critical fix as we should only hit this when >> things are broken elsewhere. > > This feels like a bug in the runtime PM APIs to be honest - I'd really > not expect that if a function call like a get failed there'd be any > cleanup to do. I'd expect a very high proportion of users to have the > same problem due to this. I don't remember the full and correct explanation, but there is a rationale behind such behaviour (I suppose it's related to sync/async agnosticism of RPM ops)
Hi, On 5/21/2018 2:23 PM, Andy Shevchenko wrote: > +Cc: Rafael > > On Mon, May 21, 2018 at 6:46 PM, Mark Brown <broonie@kernel.org> wrote: >> On Fri, May 18, 2018 at 10:30:07AM -0700, Tony Lindgren wrote: >>> If pm_runtime_get_sync() fails we should call pm_runtime_put_noidle(). >>> This is probably not a critical fix as we should only hit this when >>> things are broken elsewhere. >> >> This feels like a bug in the runtime PM APIs to be honest - I'd really >> not expect that if a function call like a get failed there'd be any >> cleanup to do. I'd expect a very high proportion of users to have the >> same problem due to this. > > I don't remember the full and correct explanation, but there is a > rationale behind such behaviour (I suppose it's related to sync/async > agnosticism of RPM ops) > We've seen this behavior before and have had similar code to protect against these failures before. The pm_runtime_get_sync() fails in certain conditions (for e.g when called during certain portions of the system suspend/resume). AFAIK: Since the runtime APIs internally increment the refcount then call rpm_resume(), in case rpm_resume() fails (for e.g in case RPM is disabled) then the driver has to either put the refcount and not go through with the transaction OR manually move the RPM state and call the RPM callback APIs internally. Best Regards Girish
On Mon, May 21, 2018 at 04:19:07PM -0600, Mahadevan, Girish wrote: > On 5/21/2018 2:23 PM, Andy Shevchenko wrote: > > On Mon, May 21, 2018 at 6:46 PM, Mark Brown <broonie@kernel.org> wrote: > >> On Fri, May 18, 2018 at 10:30:07AM -0700, Tony Lindgren wrote: > >>> If pm_runtime_get_sync() fails we should call pm_runtime_put_noidle(). > >>> This is probably not a critical fix as we should only hit this when > >>> things are broken elsewhere. > >> This feels like a bug in the runtime PM APIs to be honest - I'd really > >> not expect that if a function call like a get failed there'd be any > >> cleanup to do. I'd expect a very high proportion of users to have the > >> same problem due to this. > > I don't remember the full and correct explanation, but there is a > > rationale behind such behaviour (I suppose it's related to sync/async > > agnosticism of RPM ops) > We've seen this behavior before and have had similar code to protect > against these failures before. > The pm_runtime_get_sync() fails in certain conditions (for e.g when > called during certain portions of the system suspend/resume). > AFAIK: > Since the runtime APIs internally increment the refcount then call > rpm_resume(), in case rpm_resume() fails (for e.g in case RPM is > disabled) then the driver has to either put the refcount and not go > through with the transaction OR manually move the RPM state and call the > RPM callback APIs internally. That's still really confusing and surprising, it seems like if the driver wants to manually call the callbacks there should be a specific API that it can use to get that behaviour rather than requiring every user to know that this might happen and manually handle it.
On 5/22/2018 11:32 AM, Mark Brown wrote: > On Mon, May 21, 2018 at 04:19:07PM -0600, Mahadevan, Girish wrote: >> On 5/21/2018 2:23 PM, Andy Shevchenko wrote: >>> On Mon, May 21, 2018 at 6:46 PM, Mark Brown <broonie@kernel.org> wrote: >>>> On Fri, May 18, 2018 at 10:30:07AM -0700, Tony Lindgren wrote: >>>>> If pm_runtime_get_sync() fails we should call pm_runtime_put_noidle(). >>>>> This is probably not a critical fix as we should only hit this when >>>>> things are broken elsewhere. >>>> This feels like a bug in the runtime PM APIs to be honest - I'd really >>>> not expect that if a function call like a get failed there'd be any >>>> cleanup to do. I'd expect a very high proportion of users to have the >>>> same problem due to this. >>> I don't remember the full and correct explanation, but there is a >>> rationale behind such behaviour (I suppose it's related to sync/async >>> agnosticism of RPM ops) >> We've seen this behavior before and have had similar code to protect >> against these failures before. >> The pm_runtime_get_sync() fails in certain conditions (for e.g when >> called during certain portions of the system suspend/resume). >> AFAIK: >> Since the runtime APIs internally increment the refcount then call >> rpm_resume(), in case rpm_resume() fails (for e.g in case RPM is >> disabled) then the driver has to either put the refcount and not go >> through with the transaction OR manually move the RPM state and call the >> RPM callback APIs internally. > That's still really confusing and surprising, it seems like if the > driver wants to manually call the callbacks there should be a specific > API that it can use to get that behaviour rather than requiring every > user to know that this might happen and manually handle it. That's a design choice made a long time a go and sorry for it turning out to be inconvenient. However, there are pieces of code in the kernel that don't check the return value of pm_runtime_get_sync() and then call pm_runtime_put() (or equivalent) after that unconditionally. They all would need to be changed to do the latter only if the preceding pm_runtime_get_sync() was successful. -- To unsubscribe from this list: send the line "unsubscribe linux-spi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, May 22, 2018 at 08:01:29PM +0200, Rafael J. Wysocki wrote: > On 5/22/2018 11:32 AM, Mark Brown wrote: > > That's still really confusing and surprising, it seems like if the > > driver wants to manually call the callbacks there should be a specific > > API that it can use to get that behaviour rather than requiring every > > user to know that this might happen and manually handle it. > That's a design choice made a long time a go and sorry for it turning out to > be inconvenient. It's not so much that it's inconvenient as that it's a really surprising and unusual which makes it error prone - both in terms of people not being aware of the behaviour and in terms of the expected handling looking wrong. > However, there are pieces of code in the kernel that don't check the return > value of pm_runtime_get_sync() and then call pm_runtime_put() (or > equivalent) after that unconditionally. They all would need to be changed to > do the latter only if the preceding pm_runtime_get_sync() was successful. This does mean that we're encouraging code that ignores errors which doesn't seem ideal. On the other hand trying to change anything at this point might be more trouble than it's worth if there's too many users, perhaps it's better to just try to document this more.
diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c --- a/drivers/spi/spi.c +++ b/drivers/spi/spi.c @@ -1213,6 +1213,7 @@ static void __spi_pump_messages(struct spi_controller *ctlr, bool in_kthread) if (!was_busy && ctlr->auto_runtime_pm) { ret = pm_runtime_get_sync(ctlr->dev.parent); if (ret < 0) { + pm_runtime_put_noidle(ctlr->dev.parent); dev_err(&ctlr->dev, "Failed to power device: %d\n", ret); mutex_unlock(&ctlr->io_mutex);
If pm_runtime_get_sync() fails we should call pm_runtime_put_noidle(). This is probably not a critical fix as we should only hit this when things are broken elsewhere. Signed-off-by: Tony Lindgren <tony@atomide.com> --- drivers/spi/spi.c | 1 + 1 file changed, 1 insertion(+)