diff mbox series

[RFC,1/7] drivers: base: Add resource managed version of delayed work init

Message ID 1230b0d2ba99ad546d72ab079e76cb1b3df32afb.1613216412.git.matti.vaittinen@fi.rohmeurope.com (mailing list archive)
State Superseded
Headers show
Series Add managed version of delayed work init | expand

Commit Message

Vaittinen, Matti Feb. 13, 2021, 11:58 a.m. UTC
A few drivers which need a delayed work-queue must cancel work at exit.
Some of those implement remove solely for this purpose. Help drivers
to avoid unnecessary remove and error-branch implementation by adding
managed verision of delayed work initialization

Signed-off-by: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com>
---
 drivers/base/devres.c  | 33 +++++++++++++++++++++++++++++++++
 include/linux/device.h |  5 +++++
 2 files changed, 38 insertions(+)

Comments

Greg Kroah-Hartman Feb. 13, 2021, 12:16 p.m. UTC | #1
On Sat, Feb 13, 2021 at 01:58:44PM +0200, Matti Vaittinen wrote:
> A few drivers which need a delayed work-queue must cancel work at exit.
> Some of those implement remove solely for this purpose. Help drivers
> to avoid unnecessary remove and error-branch implementation by adding
> managed verision of delayed work initialization
> 
> Signed-off-by: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com>

That's not a good idea.  As this would kick in when the device is
removed from the system, not when it is unbound from the driver, right?

> ---
>  drivers/base/devres.c  | 33 +++++++++++++++++++++++++++++++++
>  include/linux/device.h |  5 +++++
>  2 files changed, 38 insertions(+)
> 
> diff --git a/drivers/base/devres.c b/drivers/base/devres.c
> index fb9d5289a620..2879595bb5a4 100644
> --- a/drivers/base/devres.c
> +++ b/drivers/base/devres.c
> @@ -1231,3 +1231,36 @@ void devm_free_percpu(struct device *dev, void __percpu *pdata)
>  			       (void *)pdata));
>  }
>  EXPORT_SYMBOL_GPL(devm_free_percpu);
> +
> +static void dev_delayed_work_drop(struct device *dev, void *res)
> +{
> +	cancel_delayed_work_sync(*(struct delayed_work **)res);
> +}
> +
> +/**
> + * devm_delayed_work_autocancel - Resource-managed work allocation
> + * @dev: Device which lifetime work is bound to
> + * @pdata: work to be cancelled when device exits
> + *
> + * Initialize work which is automatically cancelled when device exits.

There is no such thing in the driver model as "when device exits".
Please use the proper terminology as I do not understand what you think
this is doing here...

> + * A few drivers need delayed work which must be cancelled before driver
> + * is unload to avoid accessing removed resources.
> + * devm_delayed_work_autocancel() can be used to omit the explicit
> + * cancelleation when driver is unload.
> + */
> +int devm_delayed_work_autocancel(struct device *dev, struct delayed_work *w,
> +				 void (*worker)(struct work_struct *work))
> +{
> +	struct delayed_work **ptr;
> +
> +	ptr = devres_alloc(dev_delayed_work_drop, sizeof(*ptr), GFP_KERNEL);
> +	if (!ptr)
> +		return -ENOMEM;
> +
> +	INIT_DELAYED_WORK(w, worker);
> +	*ptr = w;
> +	devres_add(dev, ptr);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(devm_delayed_work_autocancel);
> diff --git a/include/linux/device.h b/include/linux/device.h
> index 1779f90eeb4c..192456198de7 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -27,6 +27,7 @@
>  #include <linux/uidgid.h>
>  #include <linux/gfp.h>
>  #include <linux/overflow.h>
> +#include <linux/workqueue.h>
>  #include <linux/device/bus.h>
>  #include <linux/device/class.h>
>  #include <linux/device/driver.h>
> @@ -249,6 +250,10 @@ void __iomem *devm_of_iomap(struct device *dev,
>  			    struct device_node *node, int index,
>  			    resource_size_t *size);
>  
> +/* delayed work which is cancelled when driver exits */

Not when the "driver exits".

There is two different lifespans here (well 3).  Code and data*2.  Don't
confuse them as that will just cause lots of problems.

The move toward more and more "devm" functions is not the way to go as
they just more and more make things easier to get wrong.

APIs should be impossible to get wrong, this one is going to be almost
impossible to get right.

thanks,

greg k-h
Vaittinen, Matti Feb. 13, 2021, 12:26 p.m. UTC | #2
On Sat, 2021-02-13 at 13:16 +0100, Greg Kroah-Hartman wrote:
> On Sat, Feb 13, 2021 at 01:58:44PM +0200, Matti Vaittinen wrote:
> > A few drivers which need a delayed work-queue must cancel work at
> > exit.
> > Some of those implement remove solely for this purpose. Help
> > drivers
> > to avoid unnecessary remove and error-branch implementation by
> > adding
> > managed verision of delayed work initialization
> > 
> > Signed-off-by: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com>
> 
> That's not a good idea.  As this would kick in when the device is
> removed from the system, not when it is unbound from the driver,
> right?
> 
> > ---
> >  drivers/base/devres.c  | 33 +++++++++++++++++++++++++++++++++
> >  include/linux/device.h |  5 +++++
> >  2 files changed, 38 insertions(+)
> > 
> > diff --git a/drivers/base/devres.c b/drivers/base/devres.c
> > index fb9d5289a620..2879595bb5a4 100644
> > --- a/drivers/base/devres.c
> > +++ b/drivers/base/devres.c
> > @@ -1231,3 +1231,36 @@ void devm_free_percpu(struct device *dev,
> > void __percpu *pdata)
> >  			       (void *)pdata));
> >  }
> >  EXPORT_SYMBOL_GPL(devm_free_percpu);
> > +
> > +static void dev_delayed_work_drop(struct device *dev, void *res)
> > +{
> > +	cancel_delayed_work_sync(*(struct delayed_work **)res);
> > +}
> > +
> > +/**
> > + * devm_delayed_work_autocancel - Resource-managed work allocation
> > + * @dev: Device which lifetime work is bound to
> > + * @pdata: work to be cancelled when device exits
> > + *
> > + * Initialize work which is automatically cancelled when device
> > exits.
> 
> There is no such thing in the driver model as "when device exits".
> Please use the proper terminology as I do not understand what you
> think
> this is doing here...
> 
> > + * A few drivers need delayed work which must be cancelled before
> > driver
> > + * is unload to avoid accessing removed resources.
> > + * devm_delayed_work_autocancel() can be used to omit the explicit
> > + * cancelleation when driver is unload.
> > + */
> > +int devm_delayed_work_autocancel(struct device *dev, struct
> > delayed_work *w,
> > +				 void (*worker)(struct work_struct
> > *work))
> > +{
> > +	struct delayed_work **ptr;
> > +
> > +	ptr = devres_alloc(dev_delayed_work_drop, sizeof(*ptr),
> > GFP_KERNEL);
> > +	if (!ptr)
> > +		return -ENOMEM;
> > +
> > +	INIT_DELAYED_WORK(w, worker);
> > +	*ptr = w;
> > +	devres_add(dev, ptr);
> > +
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(devm_delayed_work_autocancel);
> > diff --git a/include/linux/device.h b/include/linux/device.h
> > index 1779f90eeb4c..192456198de7 100644
> > --- a/include/linux/device.h
> > +++ b/include/linux/device.h
> > @@ -27,6 +27,7 @@
> >  #include <linux/uidgid.h>
> >  #include <linux/gfp.h>
> >  #include <linux/overflow.h>
> > +#include <linux/workqueue.h>
> >  #include <linux/device/bus.h>
> >  #include <linux/device/class.h>
> >  #include <linux/device/driver.h>
> > @@ -249,6 +250,10 @@ void __iomem *devm_of_iomap(struct device
> > *dev,
> >  			    struct device_node *node, int index,
> >  			    resource_size_t *size);
> >  
> > +/* delayed work which is cancelled when driver exits */
> 
> Not when the "driver exits".
> 
> There is two different lifespans here (well 3).  Code and
> data*2.  Don't
> confuse them as that will just cause lots of problems.
> 
> The move toward more and more "devm" functions is not the way to go
> as
> they just more and more make things easier to get wrong.
> 
> APIs should be impossible to get wrong, this one is going to be
> almost
> impossible to get right.

Thanks for prompt reply. I guess I must've misunderstood some of this
concept. Frankly to say, I don't understand how the devm based irq
management works and this does not. Maybe I'd better study this further
then.

Best Regards
	Matti Vaittinen
Greg Kroah-Hartman Feb. 13, 2021, 12:38 p.m. UTC | #3
On Sat, Feb 13, 2021 at 12:26:59PM +0000, Vaittinen, Matti wrote:
> 
> On Sat, 2021-02-13 at 13:16 +0100, Greg Kroah-Hartman wrote:
> > On Sat, Feb 13, 2021 at 01:58:44PM +0200, Matti Vaittinen wrote:
> > > A few drivers which need a delayed work-queue must cancel work at
> > > exit.
> > > Some of those implement remove solely for this purpose. Help
> > > drivers
> > > to avoid unnecessary remove and error-branch implementation by
> > > adding
> > > managed verision of delayed work initialization
> > > 
> > > Signed-off-by: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com>
> > 
> > That's not a good idea.  As this would kick in when the device is
> > removed from the system, not when it is unbound from the driver,
> > right?
> > 
> > > ---
> > >  drivers/base/devres.c  | 33 +++++++++++++++++++++++++++++++++
> > >  include/linux/device.h |  5 +++++
> > >  2 files changed, 38 insertions(+)
> > > 
> > > diff --git a/drivers/base/devres.c b/drivers/base/devres.c
> > > index fb9d5289a620..2879595bb5a4 100644
> > > --- a/drivers/base/devres.c
> > > +++ b/drivers/base/devres.c
> > > @@ -1231,3 +1231,36 @@ void devm_free_percpu(struct device *dev,
> > > void __percpu *pdata)
> > >  			       (void *)pdata));
> > >  }
> > >  EXPORT_SYMBOL_GPL(devm_free_percpu);
> > > +
> > > +static void dev_delayed_work_drop(struct device *dev, void *res)
> > > +{
> > > +	cancel_delayed_work_sync(*(struct delayed_work **)res);
> > > +}
> > > +
> > > +/**
> > > + * devm_delayed_work_autocancel - Resource-managed work allocation
> > > + * @dev: Device which lifetime work is bound to
> > > + * @pdata: work to be cancelled when device exits
> > > + *
> > > + * Initialize work which is automatically cancelled when device
> > > exits.
> > 
> > There is no such thing in the driver model as "when device exits".
> > Please use the proper terminology as I do not understand what you
> > think
> > this is doing here...
> > 
> > > + * A few drivers need delayed work which must be cancelled before
> > > driver
> > > + * is unload to avoid accessing removed resources.
> > > + * devm_delayed_work_autocancel() can be used to omit the explicit
> > > + * cancelleation when driver is unload.
> > > + */
> > > +int devm_delayed_work_autocancel(struct device *dev, struct
> > > delayed_work *w,
> > > +				 void (*worker)(struct work_struct
> > > *work))
> > > +{
> > > +	struct delayed_work **ptr;
> > > +
> > > +	ptr = devres_alloc(dev_delayed_work_drop, sizeof(*ptr),
> > > GFP_KERNEL);
> > > +	if (!ptr)
> > > +		return -ENOMEM;
> > > +
> > > +	INIT_DELAYED_WORK(w, worker);
> > > +	*ptr = w;
> > > +	devres_add(dev, ptr);
> > > +
> > > +	return 0;
> > > +}
> > > +EXPORT_SYMBOL_GPL(devm_delayed_work_autocancel);
> > > diff --git a/include/linux/device.h b/include/linux/device.h
> > > index 1779f90eeb4c..192456198de7 100644
> > > --- a/include/linux/device.h
> > > +++ b/include/linux/device.h
> > > @@ -27,6 +27,7 @@
> > >  #include <linux/uidgid.h>
> > >  #include <linux/gfp.h>
> > >  #include <linux/overflow.h>
> > > +#include <linux/workqueue.h>
> > >  #include <linux/device/bus.h>
> > >  #include <linux/device/class.h>
> > >  #include <linux/device/driver.h>
> > > @@ -249,6 +250,10 @@ void __iomem *devm_of_iomap(struct device
> > > *dev,
> > >  			    struct device_node *node, int index,
> > >  			    resource_size_t *size);
> > >  
> > > +/* delayed work which is cancelled when driver exits */
> > 
> > Not when the "driver exits".
> > 
> > There is two different lifespans here (well 3).  Code and
> > data*2.  Don't
> > confuse them as that will just cause lots of problems.
> > 
> > The move toward more and more "devm" functions is not the way to go
> > as
> > they just more and more make things easier to get wrong.
> > 
> > APIs should be impossible to get wrong, this one is going to be
> > almost
> > impossible to get right.
> 
> Thanks for prompt reply. I guess I must've misunderstood some of this
> concept. Frankly to say, I don't understand how the devm based irq
> management works and this does not. Maybe I'd better study this further
> then.

The devm based irq management works horribly and should not be used at
all.  That is the main offender in the "an api that is impossible to use
correctly" category.

Honestly, I think it should just be removed as there is almost no real
need for it that I can determine, other than to cause bugs.

thanks,

greg k-h
Hans de Goede Feb. 13, 2021, 1:18 p.m. UTC | #4
Hi,

On 2/13/21 1:16 PM, Greg Kroah-Hartman wrote:
> On Sat, Feb 13, 2021 at 01:58:44PM +0200, Matti Vaittinen wrote:
>> A few drivers which need a delayed work-queue must cancel work at exit.
>> Some of those implement remove solely for this purpose. Help drivers
>> to avoid unnecessary remove and error-branch implementation by adding
>> managed verision of delayed work initialization
>>
>> Signed-off-by: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com>
> 
> That's not a good idea.  As this would kick in when the device is
> removed from the system, not when it is unbound from the driver, right?

Erm, no devm managed resources get released when the driver is detached:
drivers/base/dd.c: __device_release_driver() calls devres_release_all(dev);


> 
>> ---
>>  drivers/base/devres.c  | 33 +++++++++++++++++++++++++++++++++
>>  include/linux/device.h |  5 +++++
>>  2 files changed, 38 insertions(+)
>>
>> diff --git a/drivers/base/devres.c b/drivers/base/devres.c
>> index fb9d5289a620..2879595bb5a4 100644
>> --- a/drivers/base/devres.c
>> +++ b/drivers/base/devres.c
>> @@ -1231,3 +1231,36 @@ void devm_free_percpu(struct device *dev, void __percpu *pdata)
>>  			       (void *)pdata));
>>  }
>>  EXPORT_SYMBOL_GPL(devm_free_percpu);
>> +
>> +static void dev_delayed_work_drop(struct device *dev, void *res)
>> +{
>> +	cancel_delayed_work_sync(*(struct delayed_work **)res);
>> +}
>> +
>> +/**
>> + * devm_delayed_work_autocancel - Resource-managed work allocation
>> + * @dev: Device which lifetime work is bound to
>> + * @pdata: work to be cancelled when device exits
>> + *
>> + * Initialize work which is automatically cancelled when device exits.
> 
> There is no such thing in the driver model as "when device exits".
> Please use the proper terminology as I do not understand what you think
> this is doing here...

I agree that this needs better wording I always talk about driver-unbinding
because sysfs has /sys/bus/*/drivers/*/bind and /sys/bus/*/drivers/*/unbind
attributes. But I see that the relevant driver-core functions all call it
driver detaching, so lets be consistent and use that here too.

> 
>> + * A few drivers need delayed work which must be cancelled before driver
>> + * is unload to avoid accessing removed resources.
>> + * devm_delayed_work_autocancel() can be used to omit the explicit
>> + * cancelleation when driver is unload.
>> + */
>> +int devm_delayed_work_autocancel(struct device *dev, struct delayed_work *w,
>> +				 void (*worker)(struct work_struct *work))
>> +{
>> +	struct delayed_work **ptr;
>> +
>> +	ptr = devres_alloc(dev_delayed_work_drop, sizeof(*ptr), GFP_KERNEL);
>> +	if (!ptr)
>> +		return -ENOMEM;
>> +
>> +	INIT_DELAYED_WORK(w, worker);
>> +	*ptr = w;
>> +	devres_add(dev, ptr);
>> +
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(devm_delayed_work_autocancel);
>> diff --git a/include/linux/device.h b/include/linux/device.h
>> index 1779f90eeb4c..192456198de7 100644
>> --- a/include/linux/device.h
>> +++ b/include/linux/device.h
>> @@ -27,6 +27,7 @@
>>  #include <linux/uidgid.h>
>>  #include <linux/gfp.h>
>>  #include <linux/overflow.h>
>> +#include <linux/workqueue.h>
>>  #include <linux/device/bus.h>
>>  #include <linux/device/class.h>
>>  #include <linux/device/driver.h>
>> @@ -249,6 +250,10 @@ void __iomem *devm_of_iomap(struct device *dev,
>>  			    struct device_node *node, int index,
>>  			    resource_size_t *size);
>>  
>> +/* delayed work which is cancelled when driver exits */
> 
> Not when the "driver exits".

Right this should be detached not exits.

> There is two different lifespans here (well 3).  Code and data*2.  Don't
> confuse them as that will just cause lots of problems.
> 
> The move toward more and more "devm" functions is not the way to go as
> they just more and more make things easier to get wrong.
> 
> APIs should be impossible to get wrong, this one is going to be almost
> impossible to get right.

I have to disagree here devm generally makes it easier to get things right,
it is when some devm functions are missing and devm and non devm resources
are mixed that things get tricky.

Lets look for example at the drivers/extcon/extcon-intel-int3496.c code
from patch 2/7 from this set. The removed driver-remove function looks like
this:

-static int int3496_remove(struct platform_device *pdev)
-{
-	struct int3496_data *data = platform_get_drvdata(pdev);
-
-	devm_free_irq(&pdev->dev, data->usb_id_irq, data);
-	cancel_delayed_work_sync(&data->work);
-
-	return 0;
-}
-

This is a good example where the mix of devm and non devm (the workqueue)
resources makes things tricky. The IRQ must be freed first to avoid the
work potentially getting re-queued after the sync cancel.

In this case using devm for the IRQ may cause the driver author to forget
about this, leaving a race.

Bit with the new proposed devm_delayed_work_autocancel() function things
will just work.

This work gets queued by the IRQ handler, so the work must be initialized (1)
*before* devm_request_irq() gets called. Any different order would be a
bug in the probe function since then the IRQ might run before the work
is initialized.

Since devm unrolls / releases resources in reverse order, this means that
it will automatically free the IRQ (which was requested later) before
cancelling the work.

So by switching to the new devm_delayed_work_autocancel() function we avoid
a case where a driver author can cause a race on driver detach because it is
relying on devm to free the IRQ, which may cause it to requeue a just
cancelled work.

IOW introducing this function (and using it where appropriate) actually
removes a possible class of bugs.

patch 2/7 actually has a nice example of this, drivers/extcon/extcon-gpio.c
also uses a delayed work queued by an interrupt, together with devm managing
the interrupt, yet the removed driver_remove callback:

-static int gpio_extcon_remove(struct platform_device *pdev)
-{
-	struct gpio_extcon_data *data = platform_get_drvdata(pdev);
-
-	cancel_delayed_work_sync(&data->work);
-
-	return 0;
-}
-

Is missing the explicit free on the IRQ which is necessary to avoid
the race. One the one hand this illustrates your (Greg's) argument that
devm managed IRQs may be a bad idea.

OTOH it shows that if we have devm managed IRQs anyways that then also
having devm managed autocancel works is a good idea, since this RFC patch-set
not only results in some cleanup, but is actually fixing at least 1 driver
detach race condition.

Regards,

Hans



1) devm_delayed_work_autocancel() replaces INIT_DELAYED_WORK()
Greg Kroah-Hartman Feb. 13, 2021, 1:33 p.m. UTC | #5
On Sat, Feb 13, 2021 at 02:18:06PM +0100, Hans de Goede wrote:
> Hi,
> 
> On 2/13/21 1:16 PM, Greg Kroah-Hartman wrote:
> > On Sat, Feb 13, 2021 at 01:58:44PM +0200, Matti Vaittinen wrote:
> >> A few drivers which need a delayed work-queue must cancel work at exit.
> >> Some of those implement remove solely for this purpose. Help drivers
> >> to avoid unnecessary remove and error-branch implementation by adding
> >> managed verision of delayed work initialization
> >>
> >> Signed-off-by: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com>
> > 
> > That's not a good idea.  As this would kick in when the device is
> > removed from the system, not when it is unbound from the driver, right?
> 
> Erm, no devm managed resources get released when the driver is detached:
> drivers/base/dd.c: __device_release_driver() calls devres_release_all(dev);

Then why do you have to manually call devm_free_irq() in release
callbacks?  I thought that was the primary problem with those things.

I can understand devm_ calls handling resources, but callbacks and
workqueues feels like a big stretch.

> > There is two different lifespans here (well 3).  Code and data*2.  Don't
> > confuse them as that will just cause lots of problems.
> > 
> > The move toward more and more "devm" functions is not the way to go as
> > they just more and more make things easier to get wrong.
> > 
> > APIs should be impossible to get wrong, this one is going to be almost
> > impossible to get right.
> 
> I have to disagree here devm generally makes it easier to get things right,
> it is when some devm functions are missing and devm and non devm resources
> are mixed that things get tricky.
> 
> Lets look for example at the drivers/extcon/extcon-intel-int3496.c code
> from patch 2/7 from this set. The removed driver-remove function looks like
> this:
> 
> -static int int3496_remove(struct platform_device *pdev)
> -{
> -	struct int3496_data *data = platform_get_drvdata(pdev);
> -
> -	devm_free_irq(&pdev->dev, data->usb_id_irq, data);
> -	cancel_delayed_work_sync(&data->work);
> -
> -	return 0;
> -}
> -
> 
> This is a good example where the mix of devm and non devm (the workqueue)
> resources makes things tricky. The IRQ must be freed first to avoid the
> work potentially getting re-queued after the sync cancel.
> 
> In this case using devm for the IRQ may cause the driver author to forget
> about this, leaving a race.
> 
> Bit with the new proposed devm_delayed_work_autocancel() function things
> will just work.
> 
> This work gets queued by the IRQ handler, so the work must be initialized (1)
> *before* devm_request_irq() gets called. Any different order would be a
> bug in the probe function since then the IRQ might run before the work
> is initialized.

How are we now going to audit the order of these calls to ensure that
this is done correctly?  That still feels like it is ripe for bugs in a
much easier way than without these functions.

> Since devm unrolls / releases resources in reverse order, this means that
> it will automatically free the IRQ (which was requested later) before
> cancelling the work.
> 
> So by switching to the new devm_delayed_work_autocancel() function we avoid
> a case where a driver author can cause a race on driver detach because it is
> relying on devm to free the IRQ, which may cause it to requeue a just
> cancelled work.
> 
> IOW introducing this function (and using it where appropriate) actually
> removes a possible class of bugs.
> 
> patch 2/7 actually has a nice example of this, drivers/extcon/extcon-gpio.c
> also uses a delayed work queued by an interrupt, together with devm managing
> the interrupt, yet the removed driver_remove callback:
> 
> -static int gpio_extcon_remove(struct platform_device *pdev)
> -{
> -	struct gpio_extcon_data *data = platform_get_drvdata(pdev);
> -
> -	cancel_delayed_work_sync(&data->work);
> -
> -	return 0;
> -}
> -
> 
> Is missing the explicit free on the IRQ which is necessary to avoid
> the race. One the one hand this illustrates your (Greg's) argument that
> devm managed IRQs may be a bad idea.

I still think it is :)

> OTOH it shows that if we have devm managed IRQs anyways that then also
> having devm managed autocancel works is a good idea, since this RFC patch-set
> not only results in some cleanup, but is actually fixing at least 1 driver
> detach race condition.

Fixing bugs is good, but the abstraction away from resource management
that the devm_ calls cause is worrying as the "magic" behind them can be
wrong, as seen here.

thanks,

greg k-h
Hans de Goede Feb. 13, 2021, 2:38 p.m. UTC | #6
Hi,

On 2/13/21 2:33 PM, Greg Kroah-Hartman wrote:
> On Sat, Feb 13, 2021 at 02:18:06PM +0100, Hans de Goede wrote:
>> Hi,
>>
>> On 2/13/21 1:16 PM, Greg Kroah-Hartman wrote:
>>> On Sat, Feb 13, 2021 at 01:58:44PM +0200, Matti Vaittinen wrote:
>>>> A few drivers which need a delayed work-queue must cancel work at exit.
>>>> Some of those implement remove solely for this purpose. Help drivers
>>>> to avoid unnecessary remove and error-branch implementation by adding
>>>> managed verision of delayed work initialization
>>>>
>>>> Signed-off-by: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com>
>>>
>>> That's not a good idea.  As this would kick in when the device is
>>> removed from the system, not when it is unbound from the driver, right?
>>
>> Erm, no devm managed resources get released when the driver is detached:
>> drivers/base/dd.c: __device_release_driver() calls devres_release_all(dev);
> 
> Then why do you have to manually call devm_free_irq() in release
> callbacks? 

That is only necessary when there are none devm managed resources /
resource-ish things which get manually released in the remove function
(because they are not devm managed), while the IRQ handler depends on
these.

> I thought that was the primary problem with those things.

If all resources / resource-ish things used by the IRQ handler are
devm managed then there is no need to have a devm_free_irq() call
in a drivers release (which is called remove) callback.

> I can understand devm_ calls handling resources, but callbacks and
> workqueues feels like a big stretch.

Things get hairy wrt devm resource handling when some of the resources
are not managed by devm.

Having devm managed sync-cancel of work items is IMHO actually good to
have because it helps avoiding mixing of devm managed resources with
non devm managed resources.

Some drivers are actually already doing devm managed workqueue cancellation
by using the generic devm_add_action() helper which allows calling a
driver supplied cleanup function when devres_release_all() runs.

This is also useful to make sure the workqueue is cancelled at the
right time during tear-down on error-exits from probe(), which
also runs devres_release_all().

See drivers/power/supply/axp288_charger.c for an example of this.

Note that no-one is going to force people to use this new
devm_delayed_work_autocancel() functionality. IMHO it is a useful
tool to have in our toolbox.

> 
>>> There is two different lifespans here (well 3).  Code and data*2.  Don't
>>> confuse them as that will just cause lots of problems.
>>>
>>> The move toward more and more "devm" functions is not the way to go as
>>> they just more and more make things easier to get wrong.
>>>
>>> APIs should be impossible to get wrong, this one is going to be almost
>>> impossible to get right.
>>
>> I have to disagree here devm generally makes it easier to get things right,
>> it is when some devm functions are missing and devm and non devm resources
>> are mixed that things get tricky.
>>
>> Lets look for example at the drivers/extcon/extcon-intel-int3496.c code
>> from patch 2/7 from this set. The removed driver-remove function looks like
>> this:
>>
>> -static int int3496_remove(struct platform_device *pdev)
>> -{
>> -	struct int3496_data *data = platform_get_drvdata(pdev);
>> -
>> -	devm_free_irq(&pdev->dev, data->usb_id_irq, data);
>> -	cancel_delayed_work_sync(&data->work);
>> -
>> -	return 0;
>> -}
>> -
>>
>> This is a good example where the mix of devm and non devm (the workqueue)
>> resources makes things tricky. The IRQ must be freed first to avoid the
>> work potentially getting re-queued after the sync cancel.
>>
>> In this case using devm for the IRQ may cause the driver author to forget
>> about this, leaving a race.
>>
>> Bit with the new proposed devm_delayed_work_autocancel() function things
>> will just work.
>>
>> This work gets queued by the IRQ handler, so the work must be initialized (1)
>> *before* devm_request_irq() gets called. Any different order would be a
>> bug in the probe function since then the IRQ might run before the work
>> is initialized.
> 
> How are we now going to audit the order of these calls to ensure that
> this is done correctly?  That still feels like it is ripe for bugs in a
> much easier way than without these functions.

We already need to audit probe() functions to make sure that all resources
which the IRQ handlers need are allocated before the IRQ gets requested.

This is why requesting the IRQ is usually one of the last things which
driver's probe functions do.

Using devm (and only devm without mixing and matching) avoids to have to
_also_ audit the probe-error-exit teardown and driver-release paths.
Those will automatically be done in the right order since devm will
release everything in reverse order of allocation.

So this reduces the amount of auditing work we need to do.

In my experience the probe-error-exit teardown and driver-release paths are full
of bugs, bugs which are often never caught because most drivers bind once and then
stay bound until reboot / shutdown so this paths are often not exercised during
testing. So automating this teardown is a good thing to do, it removes a whole
class of bugs.

>> Since devm unrolls / releases resources in reverse order, this means that
>> it will automatically free the IRQ (which was requested later) before
>> cancelling the work.
>>
>> So by switching to the new devm_delayed_work_autocancel() function we avoid
>> a case where a driver author can cause a race on driver detach because it is
>> relying on devm to free the IRQ, which may cause it to requeue a just
>> cancelled work.
>>
>> IOW introducing this function (and using it where appropriate) actually
>> removes a possible class of bugs.
>>
>> patch 2/7 actually has a nice example of this, drivers/extcon/extcon-gpio.c
>> also uses a delayed work queued by an interrupt, together with devm managing
>> the interrupt, yet the removed driver_remove callback:
>>
>> -static int gpio_extcon_remove(struct platform_device *pdev)
>> -{
>> -	struct gpio_extcon_data *data = platform_get_drvdata(pdev);
>> -
>> -	cancel_delayed_work_sync(&data->work);
>> -
>> -	return 0;
>> -}
>> -
>>
>> Is missing the explicit free on the IRQ which is necessary to avoid
>> the race. One the one hand this illustrates your (Greg's) argument that
>> devm managed IRQs may be a bad idea.
> 
> I still think it is :)

I understand and I'm not under the illusion that I'm going to change
your mind about this :)

>> OTOH it shows that if we have devm managed IRQs anyways that then also
>> having devm managed autocancel works is a good idea, since this RFC patch-set
>> not only results in some cleanup, but is actually fixing at least 1 driver
>> detach race condition.
> 
> Fixing bugs is good, but the abstraction away from resource management
> that the devm_ calls cause is worrying as the "magic" behind them can be
> wrong, as seen here.

Here being the gpio_extcon_remove() example ?

Yes that is bad, but again that is due to mixing manual and devm managed
resource management.

Having this new devm_delayed_work_autocancel() helper will allow a
bunch of drivers to move away from mixing the 2, which is a good thing
in my book.

As I said above I believe that having devm_delayed_work_autocancel() (1)
in our toolbox will be a good thing to have. Driver authors can then choose
to use it; or they can choose to not use it if they don't like it.

I know that the reason why I did not use it in the
drivers/extcon/extcon-intel-int3496.c driver is because it was not available
if it had been available then I would definitely have used it, as it avoids the
mixing of resource-management styles which that driver is currently doing.

And I think that that is what this is ultimately about, there are 2 styles
of resource-management:

1. manual
2. devm based

And they both have their pros and cons, problems mostly arise when mixing them
and adding new devm helpers for commonly used cleanup patterns is a good thing
as it helps to get rid of mixing these 2 styles in a single driver.

Regards,

Hans
Hans de Goede Feb. 13, 2021, 2:52 p.m. UTC | #7
Hi,

On 2/13/21 3:38 PM, Hans de Goede wrote:
> Hi,
> 
> On 2/13/21 2:33 PM, Greg Kroah-Hartman wrote:

<snip>

> Having this new devm_delayed_work_autocancel() helper will allow a
> bunch of drivers to move away from mixing the 2, which is a good thing
> in my book.
> 
> As I said above I believe that having devm_delayed_work_autocancel() (1)
> in our toolbox will be a good thing to have. Driver authors can then choose
> to use it; or they can choose to not use it if they don't like it.
> 
> I know that the reason why I did not use it in the
> drivers/extcon/extcon-intel-int3496.c driver is because it was not available
> if it had been available then I would definitely have used it, as it avoids the
> mixing of resource-management styles which that driver is currently doing.
> 
> And I think that that is what this is ultimately about, there are 2 styles
> of resource-management:
> 
> 1. manual
> 2. devm based
> 
> And they both have their pros and cons, problems mostly arise when mixing them
> and adding new devm helpers for commonly used cleanup patterns is a good thing
> as it helps to get rid of mixing these 2 styles in a single driver.

I just noticed that I forgot to fill in the (1) footnote above:

1) And we probably will want one for non delayed work items to: devm_work_autocancel(),
but lets cross that bridge when we get there.

Also when reviewing: "[RFC PATCH 2/7] extconn: Clean-up few drivers by using managed work init"
I noticed that the extcon-qcom-spmi-misc.c and extcon-palmas.c follow the same standard
pattern of having an IRQ which queues a delayed work and they both miss the devm_free_irq
call before the cancel_delayed_work_sync() on driver release. So just patch 2/7 here
fixes 3 driver-release race conditions (fixing 3/4 drivers which it touches) as a
bonus to the code-cleanup which it does.

So as this clearly seems to be fixing a bunch of bugs, by simply completely removing the
buggy code driver remove callbacks, this really seems like a good idea to me.

Regards,

Hans
Hans de Goede Feb. 13, 2021, 3:03 p.m. UTC | #8
Hi,

On 2/13/21 12:58 PM, Matti Vaittinen wrote:
> A few drivers which need a delayed work-queue must cancel work at exit.
> Some of those implement remove solely for this purpose. Help drivers
> to avoid unnecessary remove and error-branch implementation by adding
> managed verision of delayed work initialization
> 
> Signed-off-by: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com>
> ---
>  drivers/base/devres.c  | 33 +++++++++++++++++++++++++++++++++
>  include/linux/device.h |  5 +++++
>  2 files changed, 38 insertions(+)
> 
> diff --git a/drivers/base/devres.c b/drivers/base/devres.c
> index fb9d5289a620..2879595bb5a4 100644
> --- a/drivers/base/devres.c
> +++ b/drivers/base/devres.c
> @@ -1231,3 +1231,36 @@ void devm_free_percpu(struct device *dev, void __percpu *pdata)
>  			       (void *)pdata));
>  }
>  EXPORT_SYMBOL_GPL(devm_free_percpu);
> +
> +static void dev_delayed_work_drop(struct device *dev, void *res)
> +{
> +	cancel_delayed_work_sync(*(struct delayed_work **)res);
> +}
> +
> +/**
> + * devm_delayed_work_autocancel - Resource-managed work allocation
> + * @dev: Device which lifetime work is bound to
> + * @pdata: work to be cancelled when device exits
> + *
> + * Initialize work which is automatically cancelled when device exits.
> + * A few drivers need delayed work which must be cancelled before driver
> + * is unload to avoid accessing removed resources.
> + * devm_delayed_work_autocancel() can be used to omit the explicit
> + * cancelleation when driver is unload.
> + */
> +int devm_delayed_work_autocancel(struct device *dev, struct delayed_work *w,
> +				 void (*worker)(struct work_struct *work))
> +{
> +	struct delayed_work **ptr;
> +
> +	ptr = devres_alloc(dev_delayed_work_drop, sizeof(*ptr), GFP_KERNEL);
> +	if (!ptr)
> +		return -ENOMEM;
> +
> +	INIT_DELAYED_WORK(w, worker);
> +	*ptr = w;
> +	devres_add(dev, ptr);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(devm_delayed_work_autocancel);

This is a bit of a micro-optimization I must admit, but I think you can
just make this a static inline using devm_add_action, which would avoid
growing the base kernel image and avoid adding yet another symbol to
the exported symbols table.

I think something like this should work:

static int devm_delayed_work_autocancel(struct device *dev, struct delayed_work *w,
					void (*worker)(struct work_struct *work)) {
	INIT_DELAYED_WORK(w, worker);
	return devm_add_action(dev, (void (*action)(void *))cancel_delayed_work_sync, w);
}

I'm not sure about the cast, that may need something like this instead:

typedef void (*devm_action_func)(void *);

static int devm_delayed_work_autocancel(struct device *dev, struct delayed_work *w,
					void (*worker)(struct work_struct *work)) {
	INIT_DELAYED_WORK(w, worker);
	return devm_add_action(dev, (devm_action_func)cancel_delayed_work_sync, w);
}

As said this is a bit of a micro-optimization. But if we want to add others too,
say one for non delayed works, then at some point in time this is going to start
to add up (a bit) wrt symbol-table size and base kernel-image size.

And if you go the static inline route, I guess you could add this in

include/linux/workqueue.h

instead and putting workqueue devm cleanup helpers there seems better
then putting random devm cleanup helpers in include/linux/device.h .

Regards,

Hans






> diff --git a/include/linux/device.h b/include/linux/device.h
> index 1779f90eeb4c..192456198de7 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -27,6 +27,7 @@
>  #include <linux/uidgid.h>
>  #include <linux/gfp.h>
>  #include <linux/overflow.h>
> +#include <linux/workqueue.h>
>  #include <linux/device/bus.h>
>  #include <linux/device/class.h>
>  #include <linux/device/driver.h>
> @@ -249,6 +250,10 @@ void __iomem *devm_of_iomap(struct device *dev,
>  			    struct device_node *node, int index,
>  			    resource_size_t *size);
>  
> +/* delayed work which is cancelled when driver exits */
> +int devm_delayed_work_autocancel(struct device *dev, struct delayed_work *w,
> +				 void (*worker)(struct work_struct *work));
> +
>  /* allows to add/remove a custom action to devres stack */
>  int devm_add_action(struct device *dev, void (*action)(void *), void *data);
>  void devm_remove_action(struct device *dev, void (*action)(void *), void *data);
>
Guenter Roeck Feb. 13, 2021, 3:27 p.m. UTC | #9
On 2/13/21 7:03 AM, Hans de Goede wrote:
[ ... ]
> 
> I think something like this should work:
> 
> static int devm_delayed_work_autocancel(struct device *dev, struct delayed_work *w,
> 					void (*worker)(struct work_struct *work)) {
> 	INIT_DELAYED_WORK(w, worker);
> 	return devm_add_action(dev, (void (*action)(void *))cancel_delayed_work_sync, w);
> }
> 
> I'm not sure about the cast, that may need something like this instead:
> 
> typedef void (*devm_action_func)(void *);
> 
> static int devm_delayed_work_autocancel(struct device *dev, struct delayed_work *w,
> 					void (*worker)(struct work_struct *work)) {
> 	INIT_DELAYED_WORK(w, worker);
> 	return devm_add_action(dev, (devm_action_func)cancel_delayed_work_sync, w);

Unfortunately, you can not type cast function pointers in C. It is against the C ABI.
I am sure it is done in a few places in the kernel anyway, but those are wrong.

This is the reason why many calls to devm_add_action() point to functions such as

static void visconti_clk_disable_unprepare(void *data)
{
        clk_disable_unprepare(data);
}

which could otherwise be handled using typecasts.

Guenter
Hans de Goede Feb. 13, 2021, 3:59 p.m. UTC | #10
Hi,

On 2/13/21 4:27 PM, Guenter Roeck wrote:
> On 2/13/21 7:03 AM, Hans de Goede wrote:
> [ ... ]
>>
>> I think something like this should work:
>>
>> static int devm_delayed_work_autocancel(struct device *dev, struct delayed_work *w,
>> 					void (*worker)(struct work_struct *work)) {
>> 	INIT_DELAYED_WORK(w, worker);
>> 	return devm_add_action(dev, (void (*action)(void *))cancel_delayed_work_sync, w);
>> }
>>
>> I'm not sure about the cast, that may need something like this instead:
>>
>> typedef void (*devm_action_func)(void *);
>>
>> static int devm_delayed_work_autocancel(struct device *dev, struct delayed_work *w,
>> 					void (*worker)(struct work_struct *work)) {
>> 	INIT_DELAYED_WORK(w, worker);
>> 	return devm_add_action(dev, (devm_action_func)cancel_delayed_work_sync, w);
> 
> Unfortunately, you can not type cast function pointers in C. It is against the C ABI.
> I am sure it is done in a few places in the kernel anyway, but those are wrong.

I see, bummer.

> This is the reason why many calls to devm_add_action() point to functions such as
> 
> static void visconti_clk_disable_unprepare(void *data)
> {
>         clk_disable_unprepare(data);
> }
> 
> which could otherwise be handled using typecasts.

Hmm, wouldn't something like this be a candidate for adding a:

devm_clk_prepare_enable() helper?

This seems better then having the driver(s) make + error check separate
clk_prepare_enable() + devm_add_action_or_reset() calls ?

I must admit I'm guilty myself of just using devm_add_action() sometimes
when a specific devm helper is missing, but this whole discussion makes
me think that it would be good to have some extra devm helpers for
common cases / driver cleanup patterns.

If we add a devm_clk_prepare_enable() helper that should probably be added
to drivers/clk/clk-devres.c and not to drivers/base/devres.c .

I also still wonder if we cannot find a better place for this new
devm_delayed_work_autocancel() helper but nothing comes to mind.

Regards,

Hans
Guenter Roeck Feb. 13, 2021, 6:17 p.m. UTC | #11
On 2/13/21 7:59 AM, Hans de Goede wrote:
> Hi,
> 
> On 2/13/21 4:27 PM, Guenter Roeck wrote:
>> On 2/13/21 7:03 AM, Hans de Goede wrote:
>> [ ... ]
>>>
>>> I think something like this should work:
>>>
>>> static int devm_delayed_work_autocancel(struct device *dev, struct delayed_work *w,
>>> 					void (*worker)(struct work_struct *work)) {
>>> 	INIT_DELAYED_WORK(w, worker);
>>> 	return devm_add_action(dev, (void (*action)(void *))cancel_delayed_work_sync, w);
>>> }
>>>
>>> I'm not sure about the cast, that may need something like this instead:
>>>
>>> typedef void (*devm_action_func)(void *);
>>>
>>> static int devm_delayed_work_autocancel(struct device *dev, struct delayed_work *w,
>>> 					void (*worker)(struct work_struct *work)) {
>>> 	INIT_DELAYED_WORK(w, worker);
>>> 	return devm_add_action(dev, (devm_action_func)cancel_delayed_work_sync, w);
>>
>> Unfortunately, you can not type cast function pointers in C. It is against the C ABI.
>> I am sure it is done in a few places in the kernel anyway, but those are wrong.
> 
> I see, bummer.
> 
>> This is the reason why many calls to devm_add_action() point to functions such as
>>
>> static void visconti_clk_disable_unprepare(void *data)
>> {
>>         clk_disable_unprepare(data);
>> }
>>
>> which could otherwise be handled using typecasts.
> 
> Hmm, wouldn't something like this be a candidate for adding a:
> 
> devm_clk_prepare_enable() helper?
> 
> This seems better then having the driver(s) make + error check separate
> clk_prepare_enable() + devm_add_action_or_reset() calls ?
> 

I don't really want to go there anymore. The maintainer rejected it several times.

Guenter
Vaittinen, Matti Feb. 15, 2021, 6:58 a.m. UTC | #12
On Sat, 2021-02-13 at 14:18 +0100, Hans de Goede wrote:
> Hi,
> 
> On 2/13/21 1:16 PM, Greg Kroah-Hartman wrote:
> > On Sat, Feb 13, 2021 at 01:58:44PM +0200, Matti Vaittinen wrote:
> > > +/**
> > > + * devm_delayed_work_autocancel - Resource-managed work
> > > allocation
> > > + * @dev: Device which lifetime work is bound to
> > > + * @pdata: work to be cancelled when device exits
> > > + *
> > > + * Initialize work which is automatically cancelled when device
> > > exits.
> > 
> > There is no such thing in the driver model as "when device exits".
> > Please use the proper terminology as I do not understand what you
> > think
> > this is doing here...
> 
> I agree that this needs better wording I always talk about driver-
> unbinding
> because sysfs has /sys/bus/*/drivers/*/bind and
> /sys/bus/*/drivers/*/unbind
> attributes. But I see that the relevant driver-core functions all
> call it
> driver detaching, so lets be consistent and use that here too.

//Snip

> > > @@ -249,6 +250,10 @@ void __iomem *devm_of_iomap(struct device
> > > *dev,
> > >  			    struct device_node *node, int index,
> > >  			    resource_size_t *size);
> > >  
> > > +/* delayed work which is cancelled when driver exits */
> > 
> > Not when the "driver exits".
> 
> Right this should be detached not exits.
> 

Thanks guys.
I am poor with the terminology so I do appreciate your help in getting
this right. I can change this for the v2.


> > There is two different lifespans here (well 3).  Code and
> > data*2.  Don't
> > confuse them as that will just cause lots of problems.
> > 
> > The move toward more and more "devm" functions is not the way to go
> > as
> > they just more and more make things easier to get wrong.
> > 
> > APIs should be impossible to get wrong, this one is going to be
> > almost
> > impossible to get right.
> 
> I have to disagree here devm generally makes it easier to get things
> right,
> it is when some devm functions are missing and devm and non devm
> resources
> are mixed that things get tricky.

Thanks for the discussion. I hope we can come to some conclusion here.
Unsurprisingly I agree with Hans here. I did after all send this patch
series :) I guess I am mostly just repeating what he said.

As Hans pointed out, when all calls are 'undone' by devm the order of
'undoing' is highly likely to be correct as the unwinding is done in
reverse order to initializations. I think it is sane to assume in most
case things are initiated in order where operations which depend on
something are done last - and when 'unwinding' things those are
'undone' first. 

My 'gut feeling' for probe / remove related errors is that the most
usual errors I've seen have been:

a) Tear-down completely forgotten
b) Tear-down forgotten at error path
c) Wrong order of initiating things (IRQ requested prior resource
initialization)
d) Wrong order of cleann-up at remove.

a) and b) class errors have been the most common ones I've seen. They
can be completely avoided when devm is used.
c) is there no matter if we use devm or not.
d) is mostly avoided when only devm is used - mixing devm and manual
operations make this more likely as Hans pointed out. As long as we
have some devm operations we should help avoid mixing devm and manual
clean-up.

Best Regards
	Matti Vaittinen
Vaittinen, Matti Feb. 15, 2021, 7:22 a.m. UTC | #13
On Sat, 2021-02-13 at 16:59 +0100, Hans de Goede wrote:
> Hi,
> 
> On 2/13/21 4:27 PM, Guenter Roeck wrote:
> > On 2/13/21 7:03 AM, Hans de Goede wrote:
> > [ ... ]
> > > I think something like this should work:
> > > 
> > > static int devm_delayed_work_autocancel(struct device *dev,
> > > struct delayed_work *w,
> > > 					void (*worker)(struct
> > > work_struct *work)) {
> > > 	INIT_DELAYED_WORK(w, worker);
> > > 	return devm_add_action(dev, (void (*action)(void
> > > *))cancel_delayed_work_sync, w);
> > > }
> > > 
> > > I'm not sure about the cast, that may need something like this
> > > instead:
> > > 
> > > typedef void (*devm_action_func)(void *);
> > > 
> > > static int devm_delayed_work_autocancel(struct device *dev,
> > > struct delayed_work *w,
> > > 					void (*worker)(struct
> > > work_struct *work)) {
> > > 	INIT_DELAYED_WORK(w, worker);
> > > 	return devm_add_action(dev,
> > > (devm_action_func)cancel_delayed_work_sync, w);
> > 
> > Unfortunately, you can not type cast function pointers in C. It is
> > against the C ABI.
> > I am sure it is done in a few places in the kernel anyway, but
> > those are wrong.
> 
> I see, bummer.

I think using devm_add_action() is still a good idea.

> 
> If we add a devm_clk_prepare_enable() helper that should probably be
> added
> to drivers/clk/clk-devres.c and not to drivers/base/devres.c .
> 
> I also still wonder if we cannot find a better place for this new
> devm_delayed_work_autocancel() helper but nothing comes to mind.

I don't like the idea of including device.h from workqueue.h - and I
think this would be necessary if we added
devm_delayed_work_autocancel() as inline in workqueue.h, right?

I also see strong objection towards the devm managed clean-ups.

How about adding some devm-helpers.c in drivers/base - where we could
collect devm-based helpers - and which could be enabled by own CONFIG -
and left out by those who dislike it?

I know I wrote that the devm_delayed_work_autocancel() does probably
not warrant own file - but if you can foresee devm_work_autocancel()
and few other generally useful helpers - then we would have a place for
those. The devm stuff should in my opinion live under drivers/.

Best Regards
	Matti Vaittinen
Hans de Goede Feb. 15, 2021, 10:37 a.m. UTC | #14
Hi,

On 2/15/21 8:22 AM, Vaittinen, Matti wrote:
> 
> On Sat, 2021-02-13 at 16:59 +0100, Hans de Goede wrote:
>> Hi,
>>
>> On 2/13/21 4:27 PM, Guenter Roeck wrote:
>>> On 2/13/21 7:03 AM, Hans de Goede wrote:
>>> [ ... ]
>>>> I think something like this should work:
>>>>
>>>> static int devm_delayed_work_autocancel(struct device *dev,
>>>> struct delayed_work *w,
>>>> 					void (*worker)(struct
>>>> work_struct *work)) {
>>>> 	INIT_DELAYED_WORK(w, worker);
>>>> 	return devm_add_action(dev, (void (*action)(void
>>>> *))cancel_delayed_work_sync, w);
>>>> }
>>>>
>>>> I'm not sure about the cast, that may need something like this
>>>> instead:
>>>>
>>>> typedef void (*devm_action_func)(void *);
>>>>
>>>> static int devm_delayed_work_autocancel(struct device *dev,
>>>> struct delayed_work *w,
>>>> 					void (*worker)(struct
>>>> work_struct *work)) {
>>>> 	INIT_DELAYED_WORK(w, worker);
>>>> 	return devm_add_action(dev,
>>>> (devm_action_func)cancel_delayed_work_sync, w);
>>>
>>> Unfortunately, you can not type cast function pointers in C. It is
>>> against the C ABI.
>>> I am sure it is done in a few places in the kernel anyway, but
>>> those are wrong.
>>
>> I see, bummer.
> 
> I think using devm_add_action() is still a good idea.

Yes, we could also just have a 1 line static inline function to do
the function-cast. Like this:

static inline void devm_delayed_work_autocancel_func(void *work)
{
	cancel_delayed_work_sync(work);
}

static inline int devm_delayed_work_autocancel(struct device *dev, struct delayed_work *w, void (*worker)(struct work_struct *work))
{
	INIT_DELAYED_WORK(w, worker);
	return devm_add_action(dev, devm_delayed_work_autocancel_func, w);
}

Both functions will then simply be compiled out in files which do not
use them.

>> If we add a devm_clk_prepare_enable() helper that should probably be
>> added
>> to drivers/clk/clk-devres.c and not to drivers/base/devres.c .
>>
>> I also still wonder if we cannot find a better place for this new
>> devm_delayed_work_autocancel() helper but nothing comes to mind.
> 
> I don't like the idea of including device.h from workqueue.h - and I
> think this would be necessary if we added
> devm_delayed_work_autocancel() as inline in workqueue.h, right?

Yes.

> I also see strong objection towards the devm managed clean-ups.

Yes it seems that there are some people who don't like this, where as
others do like them.

> How about adding some devm-helpers.c in drivers/base - where we could
> collect devm-based helpers - and which could be enabled by own CONFIG -
> and left out by those who dislike it?

I would make this something configurable through Kconfig, but if
go the static inline route, which I'm in favor of then we could just
have a:

include/linux/devm-cleanup-helpers.h

And put everything (including kdoc texts) there.

This way the functionality is 100% opt-in (by explicitly including
the header if you want the helpers) which hopefully makes this a
bit more acceptable to people who don't like this style of cleanups.

I would be even happy to act as the upstream maintainer for such a
include/linux/devm-cleanup-helpers.h file, I can maintain it as part
of the platform-drivers-x86 tree (with its own MAINTAINERS entry).

Greg, would this be an acceptable solution to you ?

Regards,

Hans
Greg Kroah-Hartman Feb. 15, 2021, 11:31 a.m. UTC | #15
On Mon, Feb 15, 2021 at 11:37:26AM +0100, Hans de Goede wrote:
> Hi,
> 
> On 2/15/21 8:22 AM, Vaittinen, Matti wrote:
> > 
> > On Sat, 2021-02-13 at 16:59 +0100, Hans de Goede wrote:
> >> Hi,
> >>
> >> On 2/13/21 4:27 PM, Guenter Roeck wrote:
> >>> On 2/13/21 7:03 AM, Hans de Goede wrote:
> >>> [ ... ]
> >>>> I think something like this should work:
> >>>>
> >>>> static int devm_delayed_work_autocancel(struct device *dev,
> >>>> struct delayed_work *w,
> >>>> 					void (*worker)(struct
> >>>> work_struct *work)) {
> >>>> 	INIT_DELAYED_WORK(w, worker);
> >>>> 	return devm_add_action(dev, (void (*action)(void
> >>>> *))cancel_delayed_work_sync, w);
> >>>> }
> >>>>
> >>>> I'm not sure about the cast, that may need something like this
> >>>> instead:
> >>>>
> >>>> typedef void (*devm_action_func)(void *);
> >>>>
> >>>> static int devm_delayed_work_autocancel(struct device *dev,
> >>>> struct delayed_work *w,
> >>>> 					void (*worker)(struct
> >>>> work_struct *work)) {
> >>>> 	INIT_DELAYED_WORK(w, worker);
> >>>> 	return devm_add_action(dev,
> >>>> (devm_action_func)cancel_delayed_work_sync, w);
> >>>
> >>> Unfortunately, you can not type cast function pointers in C. It is
> >>> against the C ABI.
> >>> I am sure it is done in a few places in the kernel anyway, but
> >>> those are wrong.
> >>
> >> I see, bummer.
> > 
> > I think using devm_add_action() is still a good idea.
> 
> Yes, we could also just have a 1 line static inline function to do
> the function-cast. Like this:
> 
> static inline void devm_delayed_work_autocancel_func(void *work)
> {
> 	cancel_delayed_work_sync(work);
> }
> 
> static inline int devm_delayed_work_autocancel(struct device *dev, struct delayed_work *w, void (*worker)(struct work_struct *work))
> {
> 	INIT_DELAYED_WORK(w, worker);
> 	return devm_add_action(dev, devm_delayed_work_autocancel_func, w);
> }
> 
> Both functions will then simply be compiled out in files which do not
> use them.
> 
> >> If we add a devm_clk_prepare_enable() helper that should probably be
> >> added
> >> to drivers/clk/clk-devres.c and not to drivers/base/devres.c .
> >>
> >> I also still wonder if we cannot find a better place for this new
> >> devm_delayed_work_autocancel() helper but nothing comes to mind.
> > 
> > I don't like the idea of including device.h from workqueue.h - and I
> > think this would be necessary if we added
> > devm_delayed_work_autocancel() as inline in workqueue.h, right?
> 
> Yes.
> 
> > I also see strong objection towards the devm managed clean-ups.
> 
> Yes it seems that there are some people who don't like this, where as
> others do like them.
> 
> > How about adding some devm-helpers.c in drivers/base - where we could
> > collect devm-based helpers - and which could be enabled by own CONFIG -
> > and left out by those who dislike it?
> 
> I would make this something configurable through Kconfig, but if
> go the static inline route, which I'm in favor of then we could just
> have a:
> 
> include/linux/devm-cleanup-helpers.h
> 
> And put everything (including kdoc texts) there.
> 
> This way the functionality is 100% opt-in (by explicitly including
> the header if you want the helpers) which hopefully makes this a
> bit more acceptable to people who don't like this style of cleanups.
> 
> I would be even happy to act as the upstream maintainer for such a
> include/linux/devm-cleanup-helpers.h file, I can maintain it as part
> of the platform-drivers-x86 tree (with its own MAINTAINERS entry).
> 
> Greg, would this be an acceptable solution to you ?

I don't know, sorry, let's revisit this after 5.12-rc1 is out, with a
patch set that I can review again, and we can go from there as I can't
do anything until then...

thanks,

greg k-h
Hans de Goede Feb. 15, 2021, 11:43 a.m. UTC | #16
Hi,

On 2/15/21 12:31 PM, gregkh@linuxfoundation.org wrote:
> On Mon, Feb 15, 2021 at 11:37:26AM +0100, Hans de Goede wrote:
>> Hi,
>>
>> On 2/15/21 8:22 AM, Vaittinen, Matti wrote:
>>>
>>> On Sat, 2021-02-13 at 16:59 +0100, Hans de Goede wrote:
>>>> Hi,
>>>>
>>>> On 2/13/21 4:27 PM, Guenter Roeck wrote:
>>>>> On 2/13/21 7:03 AM, Hans de Goede wrote:
>>>>> [ ... ]
>>>>>> I think something like this should work:
>>>>>>
>>>>>> static int devm_delayed_work_autocancel(struct device *dev,
>>>>>> struct delayed_work *w,
>>>>>> 					void (*worker)(struct
>>>>>> work_struct *work)) {
>>>>>> 	INIT_DELAYED_WORK(w, worker);
>>>>>> 	return devm_add_action(dev, (void (*action)(void
>>>>>> *))cancel_delayed_work_sync, w);
>>>>>> }
>>>>>>
>>>>>> I'm not sure about the cast, that may need something like this
>>>>>> instead:
>>>>>>
>>>>>> typedef void (*devm_action_func)(void *);
>>>>>>
>>>>>> static int devm_delayed_work_autocancel(struct device *dev,
>>>>>> struct delayed_work *w,
>>>>>> 					void (*worker)(struct
>>>>>> work_struct *work)) {
>>>>>> 	INIT_DELAYED_WORK(w, worker);
>>>>>> 	return devm_add_action(dev,
>>>>>> (devm_action_func)cancel_delayed_work_sync, w);
>>>>>
>>>>> Unfortunately, you can not type cast function pointers in C. It is
>>>>> against the C ABI.
>>>>> I am sure it is done in a few places in the kernel anyway, but
>>>>> those are wrong.
>>>>
>>>> I see, bummer.
>>>
>>> I think using devm_add_action() is still a good idea.
>>
>> Yes, we could also just have a 1 line static inline function to do
>> the function-cast. Like this:
>>
>> static inline void devm_delayed_work_autocancel_func(void *work)
>> {
>> 	cancel_delayed_work_sync(work);
>> }
>>
>> static inline int devm_delayed_work_autocancel(struct device *dev, struct delayed_work *w, void (*worker)(struct work_struct *work))
>> {
>> 	INIT_DELAYED_WORK(w, worker);
>> 	return devm_add_action(dev, devm_delayed_work_autocancel_func, w);
>> }
>>
>> Both functions will then simply be compiled out in files which do not
>> use them.
>>
>>>> If we add a devm_clk_prepare_enable() helper that should probably be
>>>> added
>>>> to drivers/clk/clk-devres.c and not to drivers/base/devres.c .
>>>>
>>>> I also still wonder if we cannot find a better place for this new
>>>> devm_delayed_work_autocancel() helper but nothing comes to mind.
>>>
>>> I don't like the idea of including device.h from workqueue.h - and I
>>> think this would be necessary if we added
>>> devm_delayed_work_autocancel() as inline in workqueue.h, right?
>>
>> Yes.
>>
>>> I also see strong objection towards the devm managed clean-ups.
>>
>> Yes it seems that there are some people who don't like this, where as
>> others do like them.
>>
>>> How about adding some devm-helpers.c in drivers/base - where we could
>>> collect devm-based helpers - and which could be enabled by own CONFIG -
>>> and left out by those who dislike it?
>>
>> I would make this something configurable through Kconfig, but if

Clarification I meant to write: "I would NOT make this something configurable through Kconfig".

>> go the static inline route, which I'm in favor of then we could just
>> have a:
>>
>> include/linux/devm-cleanup-helpers.h
>>
>> And put everything (including kdoc texts) there.
>>
>> This way the functionality is 100% opt-in (by explicitly including
>> the header if you want the helpers) which hopefully makes this a
>> bit more acceptable to people who don't like this style of cleanups.
>>
>> I would be even happy to act as the upstream maintainer for such a
>> include/linux/devm-cleanup-helpers.h file, I can maintain it as part
>> of the platform-drivers-x86 tree (with its own MAINTAINERS entry).
>>
>> Greg, would this be an acceptable solution to you ?
> 
> I don't know, sorry, let's revisit this after 5.12-rc1 is out, with a
> patch set that I can review again, and we can go from there as I can't
> do anything until then...

Ok.

Regards,

Hans
Vaittinen, Matti Feb. 15, 2021, 1:12 p.m. UTC | #17
On Mon, 2021-02-15 at 12:43 +0100, Hans de Goede wrote:
> Hi,
> 
> On 2/15/21 12:31 PM, gregkh@linuxfoundation.org wrote:
> > On Mon, Feb 15, 2021 at 11:37:26AM +0100, Hans de Goede wrote:
> > > Hi,
> > > 
> > > On 2/15/21 8:22 AM, Vaittinen, Matti wrote:
> > > > On Sat, 2021-02-13 at 16:59 +0100, Hans de Goede wrote:
> > > > > Hi,
> > > > > 
> > > > > On 2/13/21 4:27 PM, Guenter Roeck wrote:
> > > > > > On 2/13/21 7:03 AM, Hans de Goede wrote:
> > > > > > [ ... ]
> > > > > > > I think something like this should work:
> > > > > > > 
> > > > > > > static int devm_delayed_work_autocancel(struct device
> > > > > > > *dev,
> > > > > > > struct delayed_work *w,
> > > > > > > 					void (*worker)(struct
> > > > > > > work_struct *work)) {
> > > > > > > 	INIT_DELAYED_WORK(w, worker);
> > > > > > > 	return devm_add_action(dev, (void (*action)(void
> > > > > > > *))cancel_delayed_work_sync, w);
> > > > > > > }
> > > 
> > > include/linux/devm-cleanup-helpers.h
> > > 
> > > And put everything (including kdoc texts) there.
> > > 
> > > This way the functionality is 100% opt-in (by explicitly
> > > including
> > > the header if you want the helpers) which hopefully makes this a
> > > bit more acceptable to people who don't like this style of
> > > cleanups.
> > > 
> > > I would be even happy to act as the upstream maintainer for such
> > > a
> > > include/linux/devm-cleanup-helpers.h file, I can maintain it as
> > > part
> > > of the platform-drivers-x86 tree (with its own MAINTAINERS
> > > entry).
> > > 
> > > Greg, would this be an acceptable solution to you ?
> > 
> > I don't know, sorry, let's revisit this after 5.12-rc1 is out, with
> > a
> > patch set that I can review again, and we can go from there as I
> > can't
> > do anything until then...
> 
> Ok.

This is Ok for me too. I am in no hurry with this - I've already a few
things to work on.
So, I will rework this to be in a single header when v5.12-rc1 is out.

Best Regards
	Matti Vaittinen


--
Matti Vaittinen, Linux device drivers
ROHM Semiconductors, Finland
SWDC
Kiviharjunlenkki 1E
90220 OULU
FINLAND

~~~ "I don't think so," said Rene Descartes. Just then he vanished ~~~

Simon says - in Latin please.
"non cogito me" dixit Rene Descarte, deinde evanescavit

(Thanks for the translation Simon)
diff mbox series

Patch

diff --git a/drivers/base/devres.c b/drivers/base/devres.c
index fb9d5289a620..2879595bb5a4 100644
--- a/drivers/base/devres.c
+++ b/drivers/base/devres.c
@@ -1231,3 +1231,36 @@  void devm_free_percpu(struct device *dev, void __percpu *pdata)
 			       (void *)pdata));
 }
 EXPORT_SYMBOL_GPL(devm_free_percpu);
+
+static void dev_delayed_work_drop(struct device *dev, void *res)
+{
+	cancel_delayed_work_sync(*(struct delayed_work **)res);
+}
+
+/**
+ * devm_delayed_work_autocancel - Resource-managed work allocation
+ * @dev: Device which lifetime work is bound to
+ * @pdata: work to be cancelled when device exits
+ *
+ * Initialize work which is automatically cancelled when device exits.
+ * A few drivers need delayed work which must be cancelled before driver
+ * is unload to avoid accessing removed resources.
+ * devm_delayed_work_autocancel() can be used to omit the explicit
+ * cancelleation when driver is unload.
+ */
+int devm_delayed_work_autocancel(struct device *dev, struct delayed_work *w,
+				 void (*worker)(struct work_struct *work))
+{
+	struct delayed_work **ptr;
+
+	ptr = devres_alloc(dev_delayed_work_drop, sizeof(*ptr), GFP_KERNEL);
+	if (!ptr)
+		return -ENOMEM;
+
+	INIT_DELAYED_WORK(w, worker);
+	*ptr = w;
+	devres_add(dev, ptr);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(devm_delayed_work_autocancel);
diff --git a/include/linux/device.h b/include/linux/device.h
index 1779f90eeb4c..192456198de7 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -27,6 +27,7 @@ 
 #include <linux/uidgid.h>
 #include <linux/gfp.h>
 #include <linux/overflow.h>
+#include <linux/workqueue.h>
 #include <linux/device/bus.h>
 #include <linux/device/class.h>
 #include <linux/device/driver.h>
@@ -249,6 +250,10 @@  void __iomem *devm_of_iomap(struct device *dev,
 			    struct device_node *node, int index,
 			    resource_size_t *size);
 
+/* delayed work which is cancelled when driver exits */
+int devm_delayed_work_autocancel(struct device *dev, struct delayed_work *w,
+				 void (*worker)(struct work_struct *work));
+
 /* allows to add/remove a custom action to devres stack */
 int devm_add_action(struct device *dev, void (*action)(void *), void *data);
 void devm_remove_action(struct device *dev, void (*action)(void *), void *data);