Message ID | 20240422204503.225448-1-anthony.l.nguyen@intel.com (mailing list archive) |
---|---|
State | Accepted |
Commit | c04d1b9ecce565455652ac3c6b17043cd475cf47 |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] igc: Fix LED-related deadlock on driver unbind | expand |
On 4/22/2024 1:45 PM, Tony Nguyen wrote: > From: Lukas Wunner <lukas@wunner.de> > > Roman reports a deadlock on unplug of a Thunderbolt docking station > containing an Intel I225 Ethernet adapter. > > The root cause is that led_classdev's for LEDs on the adapter are > registered such that they're device-managed by the netdev. That > results in recursive acquisition of the rtnl_lock() mutex on unplug: > > When the driver calls unregister_netdev(), it acquires rtnl_lock(), > then frees the device-managed resources. Upon unregistering the LEDs, > netdev_trig_deactivate() invokes unregister_netdevice_notifier(), > which tries to acquire rtnl_lock() again. > > Avoid by using non-device-managed LED registration. > Could we instead switch to using devm with the PCI device struct instead of the netdev struct? That would make it still get automatically cleaned up, but by cleaning it up only when the PCIe device goes away, which should be after rtnl_lock() is released.. I guess I don't have an objection to this patch in principle since it does resolve the issue, but it seems like it would be simpler to switch which device managed the resources vs re-adding the manual handling. Thanks, Jake > Stack trace for posterity: > > schedule+0x6e/0xf0 > schedule_preempt_disabled+0x15/0x20 > __mutex_lock+0x2a0/0x750 > unregister_netdevice_notifier+0x40/0x150 > netdev_trig_deactivate+0x1f/0x60 [ledtrig_netdev] > led_trigger_set+0x102/0x330 > led_classdev_unregister+0x4b/0x110 > release_nodes+0x3d/0xb0 > devres_release_all+0x8b/0xc0 > device_del+0x34f/0x3c0 > unregister_netdevice_many_notify+0x80b/0xaf0 > unregister_netdev+0x7c/0xd0 > igc_remove+0xd8/0x1e0 [igc] > pci_device_remove+0x3f/0xb0 > > Fixes: ea578703b03d ("igc: Add support for LEDs on i225/i226") > Reported-by: Roman Lozko <lozko.roma@gmail.com> > Closes: https://lore.kernel.org/r/CAEhC_B=ksywxCG_+aQqXUrGEgKq+4mqnSV8EBHOKbC3-Obj9+Q@mail.gmail.com/ > Reported-by: "Marek Marczykowski-Górecki" <marmarek@invisiblethingslab.com> > Closes: https://lore.kernel.org/r/ZhRD3cOtz5i-61PB@mail-itl/ > Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> > Signed-off-by: Lukas Wunner <lukas@wunner.de> > Cc: Heiner Kallweit <hkallweit1@gmail.com> > Reviewed-by: Simon Horman <horms@kernel.org> > Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> > Tested-by: Kurt Kanzenbach <kurt@linutronix.de> # Intel i225 > Tested-by: Naama Meir <naamax.meir@linux.intel.com> > Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> > --- > drivers/net/ethernet/intel/igc/igc.h | 2 ++ > drivers/net/ethernet/intel/igc/igc_leds.c | 38 ++++++++++++++++++----- > drivers/net/ethernet/intel/igc/igc_main.c | 3 ++ > 3 files changed, 35 insertions(+), 8 deletions(-) > > diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h > index 90316dc58630..6bc56c7c181e 100644 > --- a/drivers/net/ethernet/intel/igc/igc.h > +++ b/drivers/net/ethernet/intel/igc/igc.h > @@ -298,6 +298,7 @@ struct igc_adapter { > > /* LEDs */ > struct mutex led_mutex; > + struct igc_led_classdev *leds; > }; > > void igc_up(struct igc_adapter *adapter); > @@ -723,6 +724,7 @@ void igc_ptp_read(struct igc_adapter *adapter, struct timespec64 *ts); > void igc_ptp_tx_tstamp_event(struct igc_adapter *adapter); > > int igc_led_setup(struct igc_adapter *adapter); > +void igc_led_free(struct igc_adapter *adapter); > > #define igc_rx_pg_size(_ring) (PAGE_SIZE << igc_rx_pg_order(_ring)) > > diff --git a/drivers/net/ethernet/intel/igc/igc_leds.c b/drivers/net/ethernet/intel/igc/igc_leds.c > index bf240c5daf86..3929b25b6ae6 100644 > --- a/drivers/net/ethernet/intel/igc/igc_leds.c > +++ b/drivers/net/ethernet/intel/igc/igc_leds.c > @@ -236,8 +236,8 @@ static void igc_led_get_name(struct igc_adapter *adapter, int index, char *buf, > pci_dev_id(adapter->pdev), index); > } > > -static void igc_setup_ldev(struct igc_led_classdev *ldev, > - struct net_device *netdev, int index) > +static int igc_setup_ldev(struct igc_led_classdev *ldev, > + struct net_device *netdev, int index) > { > struct igc_adapter *adapter = netdev_priv(netdev); > struct led_classdev *led_cdev = &ldev->led; > @@ -257,24 +257,46 @@ static void igc_setup_ldev(struct igc_led_classdev *ldev, > led_cdev->hw_control_get = igc_led_hw_control_get; > led_cdev->hw_control_get_device = igc_led_hw_control_get_device; > > - devm_led_classdev_register(&netdev->dev, led_cdev); > + return led_classdev_register(&netdev->dev, led_cdev); > } > > int igc_led_setup(struct igc_adapter *adapter) > { > struct net_device *netdev = adapter->netdev; > - struct device *dev = &netdev->dev; > struct igc_led_classdev *leds; > - int i; > + int i, err; > > mutex_init(&adapter->led_mutex); > > - leds = devm_kcalloc(dev, IGC_NUM_LEDS, sizeof(*leds), GFP_KERNEL); > + leds = kcalloc(IGC_NUM_LEDS, sizeof(*leds), GFP_KERNEL); > if (!leds) > return -ENOMEM; > > - for (i = 0; i < IGC_NUM_LEDS; i++) > - igc_setup_ldev(leds + i, netdev, i); > + for (i = 0; i < IGC_NUM_LEDS; i++) { > + err = igc_setup_ldev(leds + i, netdev, i); > + if (err) > + goto err; > + } > + > + adapter->leds = leds; > > return 0; > + > +err: > + for (i--; i >= 0; i--) > + led_classdev_unregister(&((leds + i)->led)); > + > + kfree(leds); > + return err; > +} > + > +void igc_led_free(struct igc_adapter *adapter) > +{ > + struct igc_led_classdev *leds = adapter->leds; > + int i; > + > + for (i = 0; i < IGC_NUM_LEDS; i++) > + led_classdev_unregister(&((leds + i)->led)); > + > + kfree(leds); > } > diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c > index 35ad40a803cb..4d975d620a8e 100644 > --- a/drivers/net/ethernet/intel/igc/igc_main.c > +++ b/drivers/net/ethernet/intel/igc/igc_main.c > @@ -7021,6 +7021,9 @@ static void igc_remove(struct pci_dev *pdev) > cancel_work_sync(&adapter->watchdog_task); > hrtimer_cancel(&adapter->hrtimer); > > + if (IS_ENABLED(CONFIG_IGC_LEDS)) > + igc_led_free(adapter); > + > /* Release control of h/w to f/w. If f/w is AMT enabled, this > * would have already happened in close and is redundant. > */
On Mon, Apr 22, 2024 at 04:32:01PM -0700, Jacob Keller wrote: > On 4/22/2024 1:45 PM, Tony Nguyen wrote: > > From: Lukas Wunner <lukas@wunner.de> > > > > Roman reports a deadlock on unplug of a Thunderbolt docking station > > containing an Intel I225 Ethernet adapter. > > > > The root cause is that led_classdev's for LEDs on the adapter are > > registered such that they're device-managed by the netdev. That > > results in recursive acquisition of the rtnl_lock() mutex on unplug: > > > > When the driver calls unregister_netdev(), it acquires rtnl_lock(), > > then frees the device-managed resources. Upon unregistering the LEDs, > > netdev_trig_deactivate() invokes unregister_netdevice_notifier(), > > which tries to acquire rtnl_lock() again. > > > > Avoid by using non-device-managed LED registration. > > Could we instead switch to using devm with the PCI device struct instead > of the netdev struct? That would make it still get automatically cleaned > up, but by cleaning it up only when the PCIe device goes away, which > should be after rtnl_lock() is released.. Wouldn't that effectively leak memory if driver is unbound from the device and then bound back (and possibly repeated multiple times)?
On 4/22/2024 4:37 PM, Marek Marczykowski-Górecki wrote: > On Mon, Apr 22, 2024 at 04:32:01PM -0700, Jacob Keller wrote: >> On 4/22/2024 1:45 PM, Tony Nguyen wrote: >>> From: Lukas Wunner <lukas@wunner.de> >>> >>> Roman reports a deadlock on unplug of a Thunderbolt docking station >>> containing an Intel I225 Ethernet adapter. >>> >>> The root cause is that led_classdev's for LEDs on the adapter are >>> registered such that they're device-managed by the netdev. That >>> results in recursive acquisition of the rtnl_lock() mutex on unplug: >>> >>> When the driver calls unregister_netdev(), it acquires rtnl_lock(), >>> then frees the device-managed resources. Upon unregistering the LEDs, >>> netdev_trig_deactivate() invokes unregister_netdevice_notifier(), >>> which tries to acquire rtnl_lock() again. >>> >>> Avoid by using non-device-managed LED registration. >> >> Could we instead switch to using devm with the PCI device struct instead >> of the netdev struct? That would make it still get automatically cleaned >> up, but by cleaning it up only when the PCIe device goes away, which >> should be after rtnl_lock() is released.. > > Wouldn't that effectively leak memory if driver is unbound from the > device and then bound back (and possibly repeated multiple times)? > My understanding of devm is that when you unload the driver it calls the devm teardowns so you only leak until driver remove. In the netdev case, you're releasing during unregister_netdev() instead of at the end of the .remove() callback of the PCI driver. To me, using devm from the PCI device should be equivalent to managing it manually within the igc_remove() function. I could be mis-understanding how devm works, or the order and flow for how and when igc allocates these?
On Mon, Apr 22, 2024 at 04:32:01PM -0700, Jacob Keller wrote: > On 4/22/2024 1:45 PM, Tony Nguyen wrote: > > Roman reports a deadlock on unplug of a Thunderbolt docking station > > containing an Intel I225 Ethernet adapter. > > > > The root cause is that led_classdev's for LEDs on the adapter are > > registered such that they're device-managed by the netdev. That > > results in recursive acquisition of the rtnl_lock() mutex on unplug: > > > > When the driver calls unregister_netdev(), it acquires rtnl_lock(), > > then frees the device-managed resources. Upon unregistering the LEDs, > > netdev_trig_deactivate() invokes unregister_netdevice_notifier(), > > which tries to acquire rtnl_lock() again. > > > > Avoid by using non-device-managed LED registration. > > > > Could we instead switch to using devm with the PCI device struct instead > of the netdev struct? No, unfortunately that doesn't work: The unregistering of the LEDs would then happen after unbind of the pci_dev, i.e. after igc_release_hw_control() and pci_disable_device(). The LED registers aren't even accessible at that point, but the LEDs are still exposed in sysfs. I tried that approach but then realized it's a mistake: https://lore.kernel.org/all/ZhBN9p1yOyciXkzw@wunner.de/ Andrew Lunn concurred and wrote that "LEDs need to be added and explicitly removed within the life cycle of the netdev": https://lore.kernel.org/all/7cfb1af7-3270-447a-a2cf-16c2af02ec29@lunn.ch/ We'd have to convert the igc driver to use devm_*() for everything to avoid this ordering issue. I don't think that's something we can do at this point in the cycle. The present patch fixes a regression introduced with v6.9-rc1. There's another reason this approach doesn't work: The first argument to devm_led_classdev_register() has two purposes: (1) It's used to manage the resource (i.e. LED is unregistered on unbind), (2) but it's also used as the parent below which the LED appears in sysfs. If I changed the argument to the pci_dev, the LED would suddenly appear below the pci_dev in sysfs, instead of the netdev. So the patch would result in an undesired change of behavior. Of course we can discuss introducing a new devm_*() helper which accepts separate device arguments for the two purposes above. But that would likewise be something we can't do at this point in the cycle. We discussed the conundrum of the dual-purpose device argument in a separate thread for r8169 (which suffered from the same LED deadlock): https://lore.kernel.org/all/20240405205903.GA3458@wunner.de/ Thanks, Lukas
On Mon, Apr 22, 2024 at 04:46:28PM -0700, Jacob Keller wrote: > To me, using devm from the PCI device should be equivalent to managing > it manually within the igc_remove() function. It is not equivalent because the ordering is different: igc_remove() is called before device-managed resources are released: __device_release_driver() device_remove() # invokes igc_remove() device_unbind_cleanup() devres_release_all() # releases device-managed resources If you unregister LEDs explicitly in igc_remove() before unregistering the netdev and disabling PCI device access, everything's fine. If you instead use devm_led_classdev_register(), the LEDs would still be registered and available in sysfs after igc_remove() has torn down everything, which is bad. You'd have to use devm_*() for all initialization steps in igc_probe() to make this work. With devm_*() it's generally all or nothing. (There are exceptions: Using devm_*() just for memory allocations is fine as those can safely be freed even if everything else is torn down.) Thanks, Lukas
Hello: This patch was applied to netdev/net.git (main) by Jakub Kicinski <kuba@kernel.org>: On Mon, 22 Apr 2024 13:45:02 -0700 you wrote: > From: Lukas Wunner <lukas@wunner.de> > > Roman reports a deadlock on unplug of a Thunderbolt docking station > containing an Intel I225 Ethernet adapter. > > The root cause is that led_classdev's for LEDs on the adapter are > registered such that they're device-managed by the netdev. That > results in recursive acquisition of the rtnl_lock() mutex on unplug: > > [...] Here is the summary with links: - [net] igc: Fix LED-related deadlock on driver unbind https://git.kernel.org/netdev/net/c/c04d1b9ecce5 You are awesome, thank you!
diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h index 90316dc58630..6bc56c7c181e 100644 --- a/drivers/net/ethernet/intel/igc/igc.h +++ b/drivers/net/ethernet/intel/igc/igc.h @@ -298,6 +298,7 @@ struct igc_adapter { /* LEDs */ struct mutex led_mutex; + struct igc_led_classdev *leds; }; void igc_up(struct igc_adapter *adapter); @@ -723,6 +724,7 @@ void igc_ptp_read(struct igc_adapter *adapter, struct timespec64 *ts); void igc_ptp_tx_tstamp_event(struct igc_adapter *adapter); int igc_led_setup(struct igc_adapter *adapter); +void igc_led_free(struct igc_adapter *adapter); #define igc_rx_pg_size(_ring) (PAGE_SIZE << igc_rx_pg_order(_ring)) diff --git a/drivers/net/ethernet/intel/igc/igc_leds.c b/drivers/net/ethernet/intel/igc/igc_leds.c index bf240c5daf86..3929b25b6ae6 100644 --- a/drivers/net/ethernet/intel/igc/igc_leds.c +++ b/drivers/net/ethernet/intel/igc/igc_leds.c @@ -236,8 +236,8 @@ static void igc_led_get_name(struct igc_adapter *adapter, int index, char *buf, pci_dev_id(adapter->pdev), index); } -static void igc_setup_ldev(struct igc_led_classdev *ldev, - struct net_device *netdev, int index) +static int igc_setup_ldev(struct igc_led_classdev *ldev, + struct net_device *netdev, int index) { struct igc_adapter *adapter = netdev_priv(netdev); struct led_classdev *led_cdev = &ldev->led; @@ -257,24 +257,46 @@ static void igc_setup_ldev(struct igc_led_classdev *ldev, led_cdev->hw_control_get = igc_led_hw_control_get; led_cdev->hw_control_get_device = igc_led_hw_control_get_device; - devm_led_classdev_register(&netdev->dev, led_cdev); + return led_classdev_register(&netdev->dev, led_cdev); } int igc_led_setup(struct igc_adapter *adapter) { struct net_device *netdev = adapter->netdev; - struct device *dev = &netdev->dev; struct igc_led_classdev *leds; - int i; + int i, err; mutex_init(&adapter->led_mutex); - leds = devm_kcalloc(dev, IGC_NUM_LEDS, sizeof(*leds), GFP_KERNEL); + leds = kcalloc(IGC_NUM_LEDS, sizeof(*leds), GFP_KERNEL); if (!leds) return -ENOMEM; - for (i = 0; i < IGC_NUM_LEDS; i++) - igc_setup_ldev(leds + i, netdev, i); + for (i = 0; i < IGC_NUM_LEDS; i++) { + err = igc_setup_ldev(leds + i, netdev, i); + if (err) + goto err; + } + + adapter->leds = leds; return 0; + +err: + for (i--; i >= 0; i--) + led_classdev_unregister(&((leds + i)->led)); + + kfree(leds); + return err; +} + +void igc_led_free(struct igc_adapter *adapter) +{ + struct igc_led_classdev *leds = adapter->leds; + int i; + + for (i = 0; i < IGC_NUM_LEDS; i++) + led_classdev_unregister(&((leds + i)->led)); + + kfree(leds); } diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c index 35ad40a803cb..4d975d620a8e 100644 --- a/drivers/net/ethernet/intel/igc/igc_main.c +++ b/drivers/net/ethernet/intel/igc/igc_main.c @@ -7021,6 +7021,9 @@ static void igc_remove(struct pci_dev *pdev) cancel_work_sync(&adapter->watchdog_task); hrtimer_cancel(&adapter->hrtimer); + if (IS_ENABLED(CONFIG_IGC_LEDS)) + igc_led_free(adapter); + /* Release control of h/w to f/w. If f/w is AMT enabled, this * would have already happened in close and is redundant. */