diff mbox series

[net] igc: Fix LED-related deadlock on driver unbind

Message ID 20240422204503.225448-1-anthony.l.nguyen@intel.com (mailing list archive)
State Accepted
Commit c04d1b9ecce565455652ac3c6b17043cd475cf47
Delegated to: Netdev Maintainers
Headers show
Series [net] igc: Fix LED-related deadlock on driver unbind | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 926 this patch: 926
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers fail 1 blamed authors not CCed: andrew@lunn.ch; 1 maintainers not CCed: andrew@lunn.ch
netdev/build_clang success Errors and warnings before: 939 this patch: 939
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 937 this patch: 937
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 85 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 29 this patch: 29
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-04-24--15-00 (tests: 995)

Commit Message

Tony Nguyen April 22, 2024, 8:45 p.m. UTC
From: Lukas Wunner <lukas@wunner.de>

Roman reports a deadlock on unplug of a Thunderbolt docking station
containing an Intel I225 Ethernet adapter.

The root cause is that led_classdev's for LEDs on the adapter are
registered such that they're device-managed by the netdev.  That
results in recursive acquisition of the rtnl_lock() mutex on unplug:

When the driver calls unregister_netdev(), it acquires rtnl_lock(),
then frees the device-managed resources.  Upon unregistering the LEDs,
netdev_trig_deactivate() invokes unregister_netdevice_notifier(),
which tries to acquire rtnl_lock() again.

Avoid by using non-device-managed LED registration.

Stack trace for posterity:

  schedule+0x6e/0xf0
  schedule_preempt_disabled+0x15/0x20
  __mutex_lock+0x2a0/0x750
  unregister_netdevice_notifier+0x40/0x150
  netdev_trig_deactivate+0x1f/0x60 [ledtrig_netdev]
  led_trigger_set+0x102/0x330
  led_classdev_unregister+0x4b/0x110
  release_nodes+0x3d/0xb0
  devres_release_all+0x8b/0xc0
  device_del+0x34f/0x3c0
  unregister_netdevice_many_notify+0x80b/0xaf0
  unregister_netdev+0x7c/0xd0
  igc_remove+0xd8/0x1e0 [igc]
  pci_device_remove+0x3f/0xb0

Fixes: ea578703b03d ("igc: Add support for LEDs on i225/i226")
Reported-by: Roman Lozko <lozko.roma@gmail.com>
Closes: https://lore.kernel.org/r/CAEhC_B=ksywxCG_+aQqXUrGEgKq+4mqnSV8EBHOKbC3-Obj9+Q@mail.gmail.com/
Reported-by: "Marek Marczykowski-Górecki" <marmarek@invisiblethingslab.com>
Closes: https://lore.kernel.org/r/ZhRD3cOtz5i-61PB@mail-itl/
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Kurt Kanzenbach <kurt@linutronix.de> # Intel i225
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/igc/igc.h      |  2 ++
 drivers/net/ethernet/intel/igc/igc_leds.c | 38 ++++++++++++++++++-----
 drivers/net/ethernet/intel/igc/igc_main.c |  3 ++
 3 files changed, 35 insertions(+), 8 deletions(-)

Comments

Jacob Keller April 22, 2024, 11:32 p.m. UTC | #1
On 4/22/2024 1:45 PM, Tony Nguyen wrote:
> From: Lukas Wunner <lukas@wunner.de>
> 
> Roman reports a deadlock on unplug of a Thunderbolt docking station
> containing an Intel I225 Ethernet adapter.
> 
> The root cause is that led_classdev's for LEDs on the adapter are
> registered such that they're device-managed by the netdev.  That
> results in recursive acquisition of the rtnl_lock() mutex on unplug:
> 
> When the driver calls unregister_netdev(), it acquires rtnl_lock(),
> then frees the device-managed resources.  Upon unregistering the LEDs,
> netdev_trig_deactivate() invokes unregister_netdevice_notifier(),
> which tries to acquire rtnl_lock() again.
> 
> Avoid by using non-device-managed LED registration.
> 

Could we instead switch to using devm with the PCI device struct instead
of the netdev struct? That would make it still get automatically cleaned
up, but by cleaning it up only when the PCIe device goes away, which
should be after rtnl_lock() is released..

I guess I don't have an objection to this patch in principle since it
does resolve the issue, but it seems like it would be simpler to switch
which device managed the resources vs re-adding the manual handling.

Thanks,
Jake

> Stack trace for posterity:
> 
>   schedule+0x6e/0xf0
>   schedule_preempt_disabled+0x15/0x20
>   __mutex_lock+0x2a0/0x750
>   unregister_netdevice_notifier+0x40/0x150
>   netdev_trig_deactivate+0x1f/0x60 [ledtrig_netdev]
>   led_trigger_set+0x102/0x330
>   led_classdev_unregister+0x4b/0x110
>   release_nodes+0x3d/0xb0
>   devres_release_all+0x8b/0xc0
>   device_del+0x34f/0x3c0
>   unregister_netdevice_many_notify+0x80b/0xaf0
>   unregister_netdev+0x7c/0xd0
>   igc_remove+0xd8/0x1e0 [igc]
>   pci_device_remove+0x3f/0xb0
> 
> Fixes: ea578703b03d ("igc: Add support for LEDs on i225/i226")
> Reported-by: Roman Lozko <lozko.roma@gmail.com>
> Closes: https://lore.kernel.org/r/CAEhC_B=ksywxCG_+aQqXUrGEgKq+4mqnSV8EBHOKbC3-Obj9+Q@mail.gmail.com/
> Reported-by: "Marek Marczykowski-Górecki" <marmarek@invisiblethingslab.com>
> Closes: https://lore.kernel.org/r/ZhRD3cOtz5i-61PB@mail-itl/
> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: Heiner Kallweit <hkallweit1@gmail.com>
> Reviewed-by: Simon Horman <horms@kernel.org>
> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
> Tested-by: Kurt Kanzenbach <kurt@linutronix.de> # Intel i225
> Tested-by: Naama Meir <naamax.meir@linux.intel.com>
> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
> ---
>  drivers/net/ethernet/intel/igc/igc.h      |  2 ++
>  drivers/net/ethernet/intel/igc/igc_leds.c | 38 ++++++++++++++++++-----
>  drivers/net/ethernet/intel/igc/igc_main.c |  3 ++
>  3 files changed, 35 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
> index 90316dc58630..6bc56c7c181e 100644
> --- a/drivers/net/ethernet/intel/igc/igc.h
> +++ b/drivers/net/ethernet/intel/igc/igc.h
> @@ -298,6 +298,7 @@ struct igc_adapter {
>  
>  	/* LEDs */
>  	struct mutex led_mutex;
> +	struct igc_led_classdev *leds;
>  };
>  
>  void igc_up(struct igc_adapter *adapter);
> @@ -723,6 +724,7 @@ void igc_ptp_read(struct igc_adapter *adapter, struct timespec64 *ts);
>  void igc_ptp_tx_tstamp_event(struct igc_adapter *adapter);
>  
>  int igc_led_setup(struct igc_adapter *adapter);
> +void igc_led_free(struct igc_adapter *adapter);
>  
>  #define igc_rx_pg_size(_ring) (PAGE_SIZE << igc_rx_pg_order(_ring))
>  
> diff --git a/drivers/net/ethernet/intel/igc/igc_leds.c b/drivers/net/ethernet/intel/igc/igc_leds.c
> index bf240c5daf86..3929b25b6ae6 100644
> --- a/drivers/net/ethernet/intel/igc/igc_leds.c
> +++ b/drivers/net/ethernet/intel/igc/igc_leds.c
> @@ -236,8 +236,8 @@ static void igc_led_get_name(struct igc_adapter *adapter, int index, char *buf,
>  		 pci_dev_id(adapter->pdev), index);
>  }
>  
> -static void igc_setup_ldev(struct igc_led_classdev *ldev,
> -			   struct net_device *netdev, int index)
> +static int igc_setup_ldev(struct igc_led_classdev *ldev,
> +			  struct net_device *netdev, int index)
>  {
>  	struct igc_adapter *adapter = netdev_priv(netdev);
>  	struct led_classdev *led_cdev = &ldev->led;
> @@ -257,24 +257,46 @@ static void igc_setup_ldev(struct igc_led_classdev *ldev,
>  	led_cdev->hw_control_get = igc_led_hw_control_get;
>  	led_cdev->hw_control_get_device = igc_led_hw_control_get_device;
>  
> -	devm_led_classdev_register(&netdev->dev, led_cdev);
> +	return led_classdev_register(&netdev->dev, led_cdev);
>  }
>  
>  int igc_led_setup(struct igc_adapter *adapter)
>  {
>  	struct net_device *netdev = adapter->netdev;
> -	struct device *dev = &netdev->dev;
>  	struct igc_led_classdev *leds;
> -	int i;
> +	int i, err;
>  
>  	mutex_init(&adapter->led_mutex);
>  
> -	leds = devm_kcalloc(dev, IGC_NUM_LEDS, sizeof(*leds), GFP_KERNEL);
> +	leds = kcalloc(IGC_NUM_LEDS, sizeof(*leds), GFP_KERNEL);
>  	if (!leds)
>  		return -ENOMEM;
>  
> -	for (i = 0; i < IGC_NUM_LEDS; i++)
> -		igc_setup_ldev(leds + i, netdev, i);
> +	for (i = 0; i < IGC_NUM_LEDS; i++) {
> +		err = igc_setup_ldev(leds + i, netdev, i);
> +		if (err)
> +			goto err;
> +	}
> +
> +	adapter->leds = leds;
>  
>  	return 0;
> +
> +err:
> +	for (i--; i >= 0; i--)
> +		led_classdev_unregister(&((leds + i)->led));
> +
> +	kfree(leds);
> +	return err;
> +}
> +
> +void igc_led_free(struct igc_adapter *adapter)
> +{
> +	struct igc_led_classdev *leds = adapter->leds;
> +	int i;
> +
> +	for (i = 0; i < IGC_NUM_LEDS; i++)
> +		led_classdev_unregister(&((leds + i)->led));
> +
> +	kfree(leds);
>  }
> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
> index 35ad40a803cb..4d975d620a8e 100644
> --- a/drivers/net/ethernet/intel/igc/igc_main.c
> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> @@ -7021,6 +7021,9 @@ static void igc_remove(struct pci_dev *pdev)
>  	cancel_work_sync(&adapter->watchdog_task);
>  	hrtimer_cancel(&adapter->hrtimer);
>  
> +	if (IS_ENABLED(CONFIG_IGC_LEDS))
> +		igc_led_free(adapter);
> +
>  	/* Release control of h/w to f/w.  If f/w is AMT enabled, this
>  	 * would have already happened in close and is redundant.
>  	 */
Marek Marczykowski-Górecki April 22, 2024, 11:37 p.m. UTC | #2
On Mon, Apr 22, 2024 at 04:32:01PM -0700, Jacob Keller wrote:
> On 4/22/2024 1:45 PM, Tony Nguyen wrote:
> > From: Lukas Wunner <lukas@wunner.de>
> > 
> > Roman reports a deadlock on unplug of a Thunderbolt docking station
> > containing an Intel I225 Ethernet adapter.
> > 
> > The root cause is that led_classdev's for LEDs on the adapter are
> > registered such that they're device-managed by the netdev.  That
> > results in recursive acquisition of the rtnl_lock() mutex on unplug:
> > 
> > When the driver calls unregister_netdev(), it acquires rtnl_lock(),
> > then frees the device-managed resources.  Upon unregistering the LEDs,
> > netdev_trig_deactivate() invokes unregister_netdevice_notifier(),
> > which tries to acquire rtnl_lock() again.
> > 
> > Avoid by using non-device-managed LED registration.
> 
> Could we instead switch to using devm with the PCI device struct instead
> of the netdev struct? That would make it still get automatically cleaned
> up, but by cleaning it up only when the PCIe device goes away, which
> should be after rtnl_lock() is released..

Wouldn't that effectively leak memory if driver is unbound from the
device and then bound back (and possibly repeated multiple times)?
Jacob Keller April 22, 2024, 11:46 p.m. UTC | #3
On 4/22/2024 4:37 PM, Marek Marczykowski-Górecki wrote:
> On Mon, Apr 22, 2024 at 04:32:01PM -0700, Jacob Keller wrote:
>> On 4/22/2024 1:45 PM, Tony Nguyen wrote:
>>> From: Lukas Wunner <lukas@wunner.de>
>>>
>>> Roman reports a deadlock on unplug of a Thunderbolt docking station
>>> containing an Intel I225 Ethernet adapter.
>>>
>>> The root cause is that led_classdev's for LEDs on the adapter are
>>> registered such that they're device-managed by the netdev.  That
>>> results in recursive acquisition of the rtnl_lock() mutex on unplug:
>>>
>>> When the driver calls unregister_netdev(), it acquires rtnl_lock(),
>>> then frees the device-managed resources.  Upon unregistering the LEDs,
>>> netdev_trig_deactivate() invokes unregister_netdevice_notifier(),
>>> which tries to acquire rtnl_lock() again.
>>>
>>> Avoid by using non-device-managed LED registration.
>>
>> Could we instead switch to using devm with the PCI device struct instead
>> of the netdev struct? That would make it still get automatically cleaned
>> up, but by cleaning it up only when the PCIe device goes away, which
>> should be after rtnl_lock() is released..
> 
> Wouldn't that effectively leak memory if driver is unbound from the
> device and then bound back (and possibly repeated multiple times)?
>

My understanding of devm is that when you unload the driver it calls the
devm teardowns so you only leak until driver remove.

In the netdev case, you're releasing during unregister_netdev() instead
of at the end of the .remove() callback of the PCI driver.

To me, using devm from the PCI device should be equivalent to managing
it manually within the igc_remove() function.

I could be mis-understanding how devm works, or the order and flow for
how and when igc allocates these?
Lukas Wunner April 23, 2024, 7:53 a.m. UTC | #4
On Mon, Apr 22, 2024 at 04:32:01PM -0700, Jacob Keller wrote:
> On 4/22/2024 1:45 PM, Tony Nguyen wrote:
> > Roman reports a deadlock on unplug of a Thunderbolt docking station
> > containing an Intel I225 Ethernet adapter.
> > 
> > The root cause is that led_classdev's for LEDs on the adapter are
> > registered such that they're device-managed by the netdev.  That
> > results in recursive acquisition of the rtnl_lock() mutex on unplug:
> > 
> > When the driver calls unregister_netdev(), it acquires rtnl_lock(),
> > then frees the device-managed resources.  Upon unregistering the LEDs,
> > netdev_trig_deactivate() invokes unregister_netdevice_notifier(),
> > which tries to acquire rtnl_lock() again.
> > 
> > Avoid by using non-device-managed LED registration.
> > 
> 
> Could we instead switch to using devm with the PCI device struct instead
> of the netdev struct?

No, unfortunately that doesn't work:

The unregistering of the LEDs would then happen after unbind of the
pci_dev, i.e. after igc_release_hw_control() and pci_disable_device().
The LED registers aren't even accessible at that point, but the LEDs
are still exposed in sysfs.  I tried that approach but then realized
it's a mistake:

https://lore.kernel.org/all/ZhBN9p1yOyciXkzw@wunner.de/

Andrew Lunn concurred and wrote that "LEDs need to be added and
explicitly removed within the life cycle of the netdev":

https://lore.kernel.org/all/7cfb1af7-3270-447a-a2cf-16c2af02ec29@lunn.ch/

We'd have to convert the igc driver to use devm_*() for everything to
avoid this ordering issue.  I don't think that's something we can do
at this point in the cycle.  The present patch fixes a regression
introduced with v6.9-rc1.


There's another reason this approach doesn't work:

The first argument to devm_led_classdev_register() has two purposes:
(1) It's used to manage the resource (i.e. LED is unregistered on unbind),
(2) but it's also used as the parent below which the LED appears in sysfs.

If I changed the argument to the pci_dev, the LED would suddenly appear
below the pci_dev in sysfs, instead of the netdev.  So the patch would
result in an undesired change of behavior.

Of course we can discuss introducing a new devm_*() helper which accepts
separate device arguments for the two purposes above.  But that would
likewise be something we can't do at this point in the cycle.

We discussed the conundrum of the dual-purpose device argument in a
separate thread for r8169 (which suffered from the same LED deadlock):

https://lore.kernel.org/all/20240405205903.GA3458@wunner.de/

Thanks,

Lukas
Lukas Wunner April 23, 2024, 8:08 a.m. UTC | #5
On Mon, Apr 22, 2024 at 04:46:28PM -0700, Jacob Keller wrote:
> To me, using devm from the PCI device should be equivalent to managing
> it manually within the igc_remove() function.

It is not equivalent because the ordering is different:

igc_remove() is called before device-managed resources are released:

__device_release_driver()
  device_remove()             # invokes igc_remove()
  device_unbind_cleanup()
    devres_release_all()      # releases device-managed resources

If you unregister LEDs explicitly in igc_remove() before unregistering
the netdev and disabling PCI device access, everything's fine.

If you instead use devm_led_classdev_register(), the LEDs would still
be registered and available in sysfs after igc_remove() has torn down
everything, which is bad.

You'd have to use devm_*() for all initialization steps in igc_probe()
to make this work.  With devm_*() it's generally all or nothing.

(There are exceptions:  Using devm_*() just for memory allocations is
fine as those can safely be freed even if everything else is torn down.)

Thanks,

Lukas
patchwork-bot+netdevbpf@kernel.org April 25, 2024, 3:40 a.m. UTC | #6
Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 22 Apr 2024 13:45:02 -0700 you wrote:
> From: Lukas Wunner <lukas@wunner.de>
> 
> Roman reports a deadlock on unplug of a Thunderbolt docking station
> containing an Intel I225 Ethernet adapter.
> 
> The root cause is that led_classdev's for LEDs on the adapter are
> registered such that they're device-managed by the netdev.  That
> results in recursive acquisition of the rtnl_lock() mutex on unplug:
> 
> [...]

Here is the summary with links:
  - [net] igc: Fix LED-related deadlock on driver unbind
    https://git.kernel.org/netdev/net/c/c04d1b9ecce5

You are awesome, thank you!
diff mbox series

Patch

diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
index 90316dc58630..6bc56c7c181e 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -298,6 +298,7 @@  struct igc_adapter {
 
 	/* LEDs */
 	struct mutex led_mutex;
+	struct igc_led_classdev *leds;
 };
 
 void igc_up(struct igc_adapter *adapter);
@@ -723,6 +724,7 @@  void igc_ptp_read(struct igc_adapter *adapter, struct timespec64 *ts);
 void igc_ptp_tx_tstamp_event(struct igc_adapter *adapter);
 
 int igc_led_setup(struct igc_adapter *adapter);
+void igc_led_free(struct igc_adapter *adapter);
 
 #define igc_rx_pg_size(_ring) (PAGE_SIZE << igc_rx_pg_order(_ring))
 
diff --git a/drivers/net/ethernet/intel/igc/igc_leds.c b/drivers/net/ethernet/intel/igc/igc_leds.c
index bf240c5daf86..3929b25b6ae6 100644
--- a/drivers/net/ethernet/intel/igc/igc_leds.c
+++ b/drivers/net/ethernet/intel/igc/igc_leds.c
@@ -236,8 +236,8 @@  static void igc_led_get_name(struct igc_adapter *adapter, int index, char *buf,
 		 pci_dev_id(adapter->pdev), index);
 }
 
-static void igc_setup_ldev(struct igc_led_classdev *ldev,
-			   struct net_device *netdev, int index)
+static int igc_setup_ldev(struct igc_led_classdev *ldev,
+			  struct net_device *netdev, int index)
 {
 	struct igc_adapter *adapter = netdev_priv(netdev);
 	struct led_classdev *led_cdev = &ldev->led;
@@ -257,24 +257,46 @@  static void igc_setup_ldev(struct igc_led_classdev *ldev,
 	led_cdev->hw_control_get = igc_led_hw_control_get;
 	led_cdev->hw_control_get_device = igc_led_hw_control_get_device;
 
-	devm_led_classdev_register(&netdev->dev, led_cdev);
+	return led_classdev_register(&netdev->dev, led_cdev);
 }
 
 int igc_led_setup(struct igc_adapter *adapter)
 {
 	struct net_device *netdev = adapter->netdev;
-	struct device *dev = &netdev->dev;
 	struct igc_led_classdev *leds;
-	int i;
+	int i, err;
 
 	mutex_init(&adapter->led_mutex);
 
-	leds = devm_kcalloc(dev, IGC_NUM_LEDS, sizeof(*leds), GFP_KERNEL);
+	leds = kcalloc(IGC_NUM_LEDS, sizeof(*leds), GFP_KERNEL);
 	if (!leds)
 		return -ENOMEM;
 
-	for (i = 0; i < IGC_NUM_LEDS; i++)
-		igc_setup_ldev(leds + i, netdev, i);
+	for (i = 0; i < IGC_NUM_LEDS; i++) {
+		err = igc_setup_ldev(leds + i, netdev, i);
+		if (err)
+			goto err;
+	}
+
+	adapter->leds = leds;
 
 	return 0;
+
+err:
+	for (i--; i >= 0; i--)
+		led_classdev_unregister(&((leds + i)->led));
+
+	kfree(leds);
+	return err;
+}
+
+void igc_led_free(struct igc_adapter *adapter)
+{
+	struct igc_led_classdev *leds = adapter->leds;
+	int i;
+
+	for (i = 0; i < IGC_NUM_LEDS; i++)
+		led_classdev_unregister(&((leds + i)->led));
+
+	kfree(leds);
 }
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 35ad40a803cb..4d975d620a8e 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -7021,6 +7021,9 @@  static void igc_remove(struct pci_dev *pdev)
 	cancel_work_sync(&adapter->watchdog_task);
 	hrtimer_cancel(&adapter->hrtimer);
 
+	if (IS_ENABLED(CONFIG_IGC_LEDS))
+		igc_led_free(adapter);
+
 	/* Release control of h/w to f/w.  If f/w is AMT enabled, this
 	 * would have already happened in close and is redundant.
 	 */