
[v2] software node: balance refcount for managed sw nodes

Message ID 20210914144409.32626-1-laurentiu.tudor@nxp.com
State Superseded, archived
Series [v2] software node: balance refcount for managed sw nodes

Commit Message

Laurentiu Tudor Sept. 14, 2021, 2:44 p.m. UTC
From: Laurentiu Tudor <laurentiu.tudor@nxp.com>

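On KOBJ_REMOVE, software_node_notify() drops the refcount twice on managed
software nodes: once for the reference taken on KOBJ_ADD and once for the
reference held since the node was created. If the managed node is created
after the device has been registered, the KOBJ_ADD reference is never taken,
so the refcount underflows. Balance it by calling
software_node_notify(dev, KOBJ_ADD) from device_create_managed_software_node()
when the device is already registered.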

The error [1] was encountered after adding a .shutdown() op to our
fsl-mc-bus driver.

[1]
pc : refcount_warn_saturate+0xf8/0x150
lr : refcount_warn_saturate+0xf8/0x150
sp : ffff80001009b920
x29: ffff80001009b920 x28: ffff1a2420318000 x27: 0000000000000000
x26: ffffccac15e7a038 x25: 0000000000000008 x24: ffffccac168e0030
x23: ffff1a2428a82000 x22: 0000000000080000 x21: ffff1a24287b5000
x20: 0000000000000001 x19: ffff1a24261f4400 x18: ffffffffffffffff
x17: 6f72645f726f7272 x16: 0000000000000000 x15: ffff80009009b607
x14: 0000000000000000 x13: ffffccac16602670 x12: 0000000000000a17
x11: 000000000000035d x10: ffffccac16602670 x9 : ffffccac16602670
x8 : 00000000ffffefff x7 : ffffccac1665a670 x6 : ffffccac1665a670
x5 : 0000000000000000 x4 : 0000000000000000 x3 : 00000000ffffffff
x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff1a2420318000
Call trace:
 refcount_warn_saturate+0xf8/0x150
 kobject_put+0x10c/0x120
 software_node_notify+0xd8/0x140
 device_platform_notify+0x4c/0xb4
 device_del+0x188/0x424
 fsl_mc_device_remove+0x2c/0x4c
 __fsl_mc_device_remove+0x14/0x2c
 device_for_each_child+0x5c/0xac
 dprc_remove+0x9c/0xc0
 fsl_mc_driver_remove+0x28/0x64
 __device_release_driver+0x188/0x22c
 device_release_driver+0x30/0x50
 bus_remove_device+0x128/0x134
 device_del+0x16c/0x424
 fsl_mc_bus_remove+0x8c/0x114
 fsl_mc_bus_shutdown+0x14/0x20
 platform_shutdown+0x28/0x40
 device_shutdown+0x15c/0x330
 __do_sys_reboot+0x218/0x2a0
 __arm64_sys_reboot+0x28/0x34
 invoke_syscall+0x48/0x114
 el0_svc_common+0x40/0xdc
 do_el0_svc+0x2c/0x94
 el0_svc+0x2c/0x54
 el0t_64_sync_handler+0xa8/0x12c
 el0t_64_sync+0x198/0x19c
---[ end trace 32eb1c71c7d86821 ]---

Reported-by: Jon Nettleton <jon@solid-run.com>
Suggested-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
---
Changes since v1:
 - added Heikki's Reviewed-by: (Thanks!)

Changes since RFC:
 - use software_node_notify(KOBJ_ADD) instead of directly bumping
   refcount (Heikki)

 drivers/base/swnode.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Greg KH Sept. 14, 2021, 2:50 p.m. UTC | #1
On Tue, Sep 14, 2021 at 05:44:09PM +0300, laurentiu.tudor@nxp.com wrote:
> [...]

I am seeing that this needs to go into 5.15-final, but how about any
further back?  Stable kernels?  Does this "fix" a specific commit?

thanks,

greg k-h
Laurentiu Tudor Sept. 14, 2021, 4:16 p.m. UTC | #2
On 9/14/2021 5:50 PM, Greg KH wrote:
> On Tue, Sep 14, 2021 at 05:44:09PM +0300, laurentiu.tudor@nxp.com wrote:
>> [...]
> 
> I am seeing that this needs to go into 5.15-final, but how about any
> further back?  Stable kernels?  

I think that's a good point. I can resend and Cc: stable if everyone's
fine with that.

> Does this "fix" a specific commit?

I did not find a specific commit that introduced the breakage, so I don't
know what to say here. I'd let more experienced people comment on this.

---
Best Regards, Laurentiu
Heikki Krogerus Sept. 15, 2021, 7 a.m. UTC | #3
On Tue, Sep 14, 2021 at 07:16:04PM +0300, Laurentiu Tudor wrote:
> 
> 
> On 9/14/2021 5:50 PM, Greg KH wrote:
> > On Tue, Sep 14, 2021 at 05:44:09PM +0300, laurentiu.tudor@nxp.com wrote:
> >> [...]
> > 
> > I am seeing that this needs to go into 5.15-final, but how about any
> > further back?  Stable kernels?  
> 
> I think that's a good point. I can resend and Cc: stable if everyone's
> fine with that.
> 
> > Does this "fix" a specific commit?
> 
> I did not find a specific commit that introduced the breakage, so I don't
> know what to say here. I'd let more experienced people comment on this.

This fixes the commit that introduced the function, so:

Fixes: 151f6ff78cdf ("software node: Provide replacement for device_add_properties()")

thanks,

Patch

diff --git a/drivers/base/swnode.c b/drivers/base/swnode.c
index d1f1a8240120..bdb50a06c82a 100644
--- a/drivers/base/swnode.c
+++ b/drivers/base/swnode.c
@@ -1113,6 +1113,9 @@ int device_create_managed_software_node(struct device *dev,
 	to_swnode(fwnode)->managed = true;
 	set_secondary_fwnode(dev, fwnode);
 
+	if (device_is_registered(dev))
+		software_node_notify(dev, KOBJ_ADD);
+
 	return 0;
 }
 EXPORT_SYMBOL_GPL(device_create_managed_software_node);