runtime PM usage_count during driver_probe_device()?

Message ID 87d3hu6oxo.fsf@ti.com (mailing list archive)
State Rejected, archived

Commit Message

Kevin Hilman July 1, 2011, 2:44 p.m. UTC
Kevin Hilman <khilman@ti.com> writes:

[...]

>     If the device bus type's or driver's ->probe() or ->remove()
>     callback runs pm_runtime_suspend() or pm_runtime_idle() or their
>     asynchronous counterparts, they will fail returning -EAGAIN, because
>     the device's usage counter is incremented by the core before
>     executing ->probe() and ->remove().  Still, it may be desirable to
>     suspend the device as soon as ->probe() or ->remove() has finished,
>     so the PM core uses pm_runtime_idle_sync() to invoke the
>     subsystem-level idle callback for the device at that time.

[...]

> Another curiosity is that, contrary to the above documentation, there is
> no usage_count increment before the bus/driver ->remove() (although
> there is a _get_sync/_put_sync around the sysfs_remove and notifier just
> before the bus/driver ->remove()).

OK, so the ->probe() part has been explained and makes sense, but I
would expect ->remove() to be similarly protected (as the documentation
states.)  But that is not the case.  Is that a bug?  If so, patch below
makes the code match the documentation.

Kevin

From eef73ab2feb203bacb57dc35862f2a9969b61593 Mon Sep 17 00:00:00 2001
From: Kevin Hilman <khilman@ti.com>
Date: Fri, 1 Jul 2011 07:37:47 -0700
Subject: [PATCH] driver core: prevent runtime PM races with ->remove()

Runtime PM Documentation states that the runtime PM usage count is
incremented during driver ->probe() and ->remove().  This is designed
to prevent driver runtime PM races with subsystems which may initiate
runtime PM transitions before, during, and after drivers are loaded.

Current code increments the usage_count during ->probe() but not
during ->remove().  This patch fixes the ->remove() part and makes the
code match the documentation.

Signed-off-by: Kevin Hilman <khilman@ti.com>
---
 drivers/base/dd.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)
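
The protection described in the changelog can be pictured with a small userspace model. This is only a sketch, not kernel code: every name below (toy_dev, toy_suspend, toy_call_protected, and so on) is hypothetical, and the model assumes nothing more than what the quoted documentation says, namely that a suspend request fails with -EAGAIN while the usage counter is held around a callback.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy userspace model; none of these names are kernel APIs. */
struct toy_dev {
	int usage_count;	/* models the runtime PM usage counter */
	int suspended;		/* set to 1 once a suspend request succeeds */
};

/* Models a suspend request: refused while the counter is non-zero. */
static int toy_suspend(struct toy_dev *dev)
{
	if (dev->usage_count > 0)
		return -EAGAIN;
	dev->suspended = 1;
	return 0;
}

static void toy_get(struct toy_dev *dev) { dev->usage_count++; }
static void toy_put(struct toy_dev *dev) { dev->usage_count--; }

/* Models the core holding a reference around a ->probe()/->remove() call. */
static int toy_call_protected(struct toy_dev *dev,
			      int (*cb)(struct toy_dev *))
{
	int ret;

	toy_get(dev);
	ret = cb(dev);
	toy_put(dev);
	return ret;
}

/* A callback that tries to suspend the device from inside itself. */
static int toy_cb_tries_suspend(struct toy_dev *dev)
{
	return toy_suspend(dev);
}
```

In this model, a callback run through toy_call_protected() cannot suspend the device, while the same callback run without the reference can. That is the difference between the ->probe() and ->remove() paths that the patch is trying to close.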

Comments

Alan Stern July 1, 2011, 3:25 p.m. UTC | #1
On Fri, 1 Jul 2011, Kevin Hilman wrote:

> OK, so the ->probe() part has been explained and makes sense, but I
> would expect ->remove() to be similarly protected (as the documentation
> states.)  But that is not the case.  Is that a bug?  If so, patch below
> makes the code match the documentation.

I suspect it is a bug, but it's hard to be sure.  It's so _blatantly_ 
wrong that it looks like it was done deliberately.

> Kevin
> 
> From eef73ab2feb203bacb57dc35862f2a9969b61593 Mon Sep 17 00:00:00 2001
> From: Kevin Hilman <khilman@ti.com>
> Date: Fri, 1 Jul 2011 07:37:47 -0700
> Subject: [PATCH] driver core: prevent runtime PM races with ->remove()
> 
> Runtime PM Documentation states that the runtime PM usage count is
> incremented during driver ->probe() and ->remove().  This is designed
> to prevent driver runtime PM races with subsystems which may initiate
> runtime PM transitions before, during, and after drivers are loaded.
> 
> Current code increments the usage_count during ->probe() but not
> during ->remove().  This patch fixes the ->remove() part and makes the
> code match the documentation.
> 
> Signed-off-by: Kevin Hilman <khilman@ti.com>
> ---
>  drivers/base/dd.c |    6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 6658da7..47e079d 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -329,13 +329,13 @@ static void __device_release_driver(struct device *dev)
>  			blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
>  						     BUS_NOTIFY_UNBIND_DRIVER,
>  						     dev);
> -
> -		pm_runtime_put_sync(dev);
> -
>  		if (dev->bus && dev->bus->remove)
>  			dev->bus->remove(dev);
>  		else if (drv->remove)
>  			drv->remove(dev);
> +
> +		pm_runtime_put_sync(dev);
> +
>  		devres_release_all(dev);
>  		dev->driver = NULL;
>  		klist_remove(&dev->p->knode_driver);

To be safer, the put_sync() call should be moved down here.  Or maybe 
even after the blocking_notifier_call_chain() that occurs here.

Alan Stern
Kevin Hilman July 1, 2011, 3:45 p.m. UTC | #2
Alan Stern <stern@rowland.harvard.edu> writes:

> On Fri, 1 Jul 2011, Kevin Hilman wrote:
>
>> OK, so the ->probe() part has been explained and makes sense, but I
>> would expect ->remove() to be similarly protected (as the documentation
>> states.)  But that is not the case.  Is that a bug?  If so, patch below
>> makes the code match the documentation.
>
> I suspect it is a bug, but it's hard to be sure.  It's so _blatantly_ 
> wrong that it looks like it was done deliberately.

heh

>> Kevin
>> 
>> From eef73ab2feb203bacb57dc35862f2a9969b61593 Mon Sep 17 00:00:00 2001
>> From: Kevin Hilman <khilman@ti.com>
>> Date: Fri, 1 Jul 2011 07:37:47 -0700
>> Subject: [PATCH] driver core: prevent runtime PM races with ->remove()
>> 
>> Runtime PM Documentation states that the runtime PM usage count is
>> incremented during driver ->probe() and ->remove().  This is designed
>> to prevent driver runtime PM races with subsystems which may initiate
>> runtime PM transitions before, during, and after drivers are loaded.
>> 
>> Current code increments the usage_count during ->probe() but not
>> during ->remove().  This patch fixes the ->remove() part and makes the
>> code match the documentation.
>> 
>> Signed-off-by: Kevin Hilman <khilman@ti.com>
>> ---
>>  drivers/base/dd.c |    6 +++---
>>  1 files changed, 3 insertions(+), 3 deletions(-)
>> 
>> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
>> index 6658da7..47e079d 100644
>> --- a/drivers/base/dd.c
>> +++ b/drivers/base/dd.c
>> @@ -329,13 +329,13 @@ static void __device_release_driver(struct device *dev)
>>  			blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
>>  						     BUS_NOTIFY_UNBIND_DRIVER,
>>  						     dev);
>> -
>> -		pm_runtime_put_sync(dev);
>> -
>>  		if (dev->bus && dev->bus->remove)
>>  			dev->bus->remove(dev);
>>  		else if (drv->remove)
>>  			drv->remove(dev);
>> +
>> +		pm_runtime_put_sync(dev);
>> +
>>  		devres_release_all(dev);
>>  		dev->driver = NULL;
>>  		klist_remove(&dev->p->knode_driver);
>
> To be safer, the put_sync() call should be moved down here.  Or maybe 
> even after the blocking_notifier_call_chain() that occurs here.

I was actually thinking about the other direction: moving the get_sync
after the first notifier chain.  IOW, the get_sync/put_sync only
protects the ->remove() calls, not the notifiers.

The protection around the notifiers doesn't make sense to me, at least
in the context of driver runtime PM racing with the subsystem.
Especially since these notifiers are likely how the
subsystem/bus/pm_domain code gets notified that there may be a device
to manage in the first place.

Kevin
Alan Stern July 1, 2011, 3:59 p.m. UTC | #3
On Fri, 1 Jul 2011, Kevin Hilman wrote:

> >> --- a/drivers/base/dd.c
> >> +++ b/drivers/base/dd.c
> >> @@ -329,13 +329,13 @@ static void __device_release_driver(struct device *dev)
> >>  			blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
> >>  						     BUS_NOTIFY_UNBIND_DRIVER,
> >>  						     dev);
> >> -
> >> -		pm_runtime_put_sync(dev);
> >> -
> >>  		if (dev->bus && dev->bus->remove)
> >>  			dev->bus->remove(dev);
> >>  		else if (drv->remove)
> >>  			drv->remove(dev);
> >> +
> >> +		pm_runtime_put_sync(dev);
> >> +
> >>  		devres_release_all(dev);
> >>  		dev->driver = NULL;
> >>  		klist_remove(&dev->p->knode_driver);
> >
> > To be safer, the put_sync() call should be moved down here.  Or maybe 
> > even after the blocking_notifier_call_chain() that occurs here.
> 
> I was actually thinking about the other direction: moving the get_sync
> after the first notifier chain.  IOW, the get_sync/put_sync only
> protects the ->remove() calls, not the notifiers.
> 
> The protection around the notifiers doesn't make sense to me, at least
> in the context of driver runtime PM racing with the subsystem.
> Especially since these notifiers are likely how the
> subsystem/bus/pm_domain code gets notified that there may be a device
> to manage in the first place.

The get_sync part doesn't matter so much.  Moving it past the notifier 
call would probably be okay -- unless one of the listeners on the 
notifier chain expects the device to be active.  Changing the get_sync 
to get_noresume would probably also be okay -- subject to a similar 
reservation.

The problem with the put_sync isn't the notifier.  If you leave it
where you've got it now, you'll end up invoking a callback at a time
when the driver thinks it no longer controls the device but the
driver-model core still thinks it does.  You certainly want to do the

	dev->driver = NULL;

first.

Alan Stern
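
The ordering concern raised in this reply can be illustrated with a toy userspace model. It is a sketch under stated assumptions rather than the driver core's actual behaviour: all names (model_dev, model_put_sync, model_idle, and so on) are hypothetical, and the model assumes that dropping the last reference synchronously runs an idle callback which consults dev->driver.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model; none of these names are kernel APIs. */
struct model_driver { const char *name; };

struct model_dev {
	const struct model_driver *driver; /* core's view: non-NULL = bound */
	int driver_done;	/* driver's view: set by its ->remove() */
	int usage_count;
	int stale_callbacks;	/* idle callbacks that saw the mismatch */
};

/* Models the idle callback run when the usage count drops to zero. */
static void model_idle(struct model_dev *dev)
{
	/* Core still thinks a driver is bound, but the driver has let go. */
	if (dev->driver && dev->driver_done)
		dev->stale_callbacks++;
}

static void model_put_sync(struct model_dev *dev)
{
	if (--dev->usage_count == 0)
		model_idle(dev);
}

static void model_remove(struct model_dev *dev) { dev->driver_done = 1; }

/* Ordering in the posted patch: put_sync right after ->remove(). */
static int model_release_put_first(struct model_dev *dev)
{
	dev->usage_count++;
	model_remove(dev);
	model_put_sync(dev);
	dev->driver = NULL;
	return dev->stale_callbacks;
}

/* Ordering suggested in the reply: clear dev->driver, then put_sync. */
static int model_release_clear_first(struct model_dev *dev)
{
	dev->usage_count++;
	model_remove(dev);
	dev->driver = NULL;
	model_put_sync(dev);
	return dev->stale_callbacks;
}
```

With the first ordering the idle callback runs once while the two views disagree. With the second ordering it never does, which is the reason given above for doing "dev->driver = NULL;" first.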
Kevin Hilman July 1, 2011, 4:54 p.m. UTC | #4
Alan Stern <stern@rowland.harvard.edu> writes:

> On Fri, 1 Jul 2011, Kevin Hilman wrote:
>
>> >> --- a/drivers/base/dd.c
>> >> +++ b/drivers/base/dd.c
>> >> @@ -329,13 +329,13 @@ static void __device_release_driver(struct device *dev)
>> >>  			blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
>> >>  						     BUS_NOTIFY_UNBIND_DRIVER,
>> >>  						     dev);
>> >> -
>> >> -		pm_runtime_put_sync(dev);
>> >> -
>> >>  		if (dev->bus && dev->bus->remove)
>> >>  			dev->bus->remove(dev);
>> >>  		else if (drv->remove)
>> >>  			drv->remove(dev);
>> >> +
>> >> +		pm_runtime_put_sync(dev);
>> >> +
>> >>  		devres_release_all(dev);
>> >>  		dev->driver = NULL;
>> >>  		klist_remove(&dev->p->knode_driver);
>> >
>> > To be safer, the put_sync() call should be moved down here.  Or maybe 
>> > even after the blocking_notifier_call_chain() that occurs here.
>> 
>> I was actually thinking about the other direction: moving the get_sync
>> after the first notifier chain.  IOW, the get_sync/put_sync only
>> protects the ->remove() calls, not the notifiers.
>> 
>> The protection around the notifiers doesn't make sense to me, at least
>> in the context of driver runtime PM racing with the subsystem.
>> Especially since these notifiers are likely how the
>> subsystem/bus/pm_domain code gets notified that there may be a device
>> to manage in the first place.
>
> The get_sync part doesn't matter so much.  Moving it past the notifier 
> call would probably be okay -- unless one of the listeners on the 
> notifier chain expects the device to be active.  Changing the get_sync 
> to get_noresume would probably also be okay -- subject to a similar 
> reservation.

There are enough "probably"s in the above to make me a bit uncomfortable
making this change.  Maybe you can take this patch forward?

Kevin

> The problem with the put_sync isn't the notifier.  If you leave it
> where you've got it now, you'll end up invoking a callback at a time
> when the driver thinks it no longer controls the device but the
> driver-model core still thinks it does.  You certainly want to do the
>
> 	dev->driver = NULL;
>
> first.
>
> Alan Stern
Rafael Wysocki July 1, 2011, 8:53 p.m. UTC | #5
Hi,

On Friday, July 01, 2011, Kevin Hilman wrote:
> Alan Stern <stern@rowland.harvard.edu> writes:
> 
> > On Fri, 1 Jul 2011, Kevin Hilman wrote:
> >
> >> OK, so the ->probe() part has been explained and makes sense, but I
> >> would expect ->remove() to be similarly protected (as the documentation
> >> states.)  But that is not the case.  Is that a bug?  If so, patch below
> >> makes the code match the documentation.
> >
> > I suspect it is a bug, but it's hard to be sure.  It's so _blatantly_ 
> > wrong that it looks like it was done deliberately.
> 
> heh

I seem to remember having a problem with the pm_runtime_put_sync() after
drv->remove(dev) ...

So the code in question was introduced by

commit e1866b33b1e89f077b7132daae3dfd9a594e9a1a
Author: Rafael J. Wysocki <rjw@sisk.pl>
Date:   Fri Apr 29 00:33:45 2011 +0200

    PM / Runtime: Rework runtime PM handling during driver removal

with a long changelog explaining the reason why.  Which seems to make sense. ;-)

So I'm not sure.

Thanks,
Rafael
Alan Stern July 1, 2011, 9:12 p.m. UTC | #6
On Fri, 1 Jul 2011, Rafael J. Wysocki wrote:

> Hi,
> 
> On Friday, July 01, 2011, Kevin Hilman wrote:
> > Alan Stern <stern@rowland.harvard.edu> writes:
> > 
> > > On Fri, 1 Jul 2011, Kevin Hilman wrote:
> > >
> > >> OK, so the ->probe() part has been explained and makes sense, but I
> > >> would expect ->remove() to be similarly protected (as the documentation
> > >> states.)  But that is not the case.  Is that a bug?  If so, patch below
> > >> makes the code match the documentation.
> > >
> > > I suspect it is a bug, but it's hard to be sure.  It's so _blatantly_ 
> > > wrong that it looks like it was done deliberately.
> > 
> > heh
> 
> I seem to remember having a problem with the pm_runtime_put_sync() after
> drv->remove(dev) ...
> 
> So the code in question was introduced by
> 
> commit e1866b33b1e89f077b7132daae3dfd9a594e9a1a
> Author: Rafael J. Wysocki <rjw@sisk.pl>
> Date:   Fri Apr 29 00:33:45 2011 +0200
> 
>     PM / Runtime: Rework runtime PM handling during driver removal
> 
> with a long changelog explaining the reason why.  Which seems to make sense. ;-)

Okay, that seems fair enough.  Looks like the documentation needs to be 
updated to match, though.

And we probably still want to make sure that access to the 
power/control and related attribute files is mutually exclusive with 
probe and remove.

Alan Stern
Patch

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 6658da7..47e079d 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -329,13 +329,13 @@  static void __device_release_driver(struct device *dev)
 			blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
 						     BUS_NOTIFY_UNBIND_DRIVER,
 						     dev);
-
-		pm_runtime_put_sync(dev);
-
 		if (dev->bus && dev->bus->remove)
 			dev->bus->remove(dev);
 		else if (drv->remove)
 			drv->remove(dev);
+
+		pm_runtime_put_sync(dev);
+
 		devres_release_all(dev);
 		dev->driver = NULL;
 		klist_remove(&dev->p->knode_driver);