Message ID | 2f1be6b1cf2b3346929b0049f2ac7d7d79acb5c9.1713188539.git.lukas@wunner.de (mailing list archive) |
---|---|
State | Awaiting Upstream |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] igc: Fix LED-related deadlock on driver unbind | expand |
On Mon, Apr 15, 2024 at 03:48:48PM +0200, Lukas Wunner wrote: > Roman reports a deadlock on unplug of a Thunderbolt docking station > containing an Intel I225 Ethernet adapter. > > The root cause is that led_classdev's for LEDs on the adapter are > registered such that they're device-managed by the netdev. That > results in recursive acquisition of the rtnl_lock() mutex on unplug: > > When the driver calls unregister_netdev(), it acquires rtnl_lock(), > then frees the device-managed resources. Upon unregistering the LEDs, > netdev_trig_deactivate() invokes unregister_netdevice_notifier(), > which tries to acquire rtnl_lock() again. > > Avoid by using non-device-managed LED registration. > > Stack trace for posterity: > > schedule+0x6e/0xf0 > schedule_preempt_disabled+0x15/0x20 > __mutex_lock+0x2a0/0x750 > unregister_netdevice_notifier+0x40/0x150 > netdev_trig_deactivate+0x1f/0x60 [ledtrig_netdev] > led_trigger_set+0x102/0x330 > led_classdev_unregister+0x4b/0x110 > release_nodes+0x3d/0xb0 > devres_release_all+0x8b/0xc0 > device_del+0x34f/0x3c0 > unregister_netdevice_many_notify+0x80b/0xaf0 > unregister_netdev+0x7c/0xd0 > igc_remove+0xd8/0x1e0 [igc] > pci_device_remove+0x3f/0xb0 > > Fixes: ea578703b03d ("igc: Add support for LEDs on i225/i226") > Reported-by: Roman Lozko <lozko.roma@gmail.com> > Closes: https://lore.kernel.org/r/CAEhC_B=ksywxCG_+aQqXUrGEgKq+4mqnSV8EBHOKbC3-Obj9+Q@mail.gmail.com/ > Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> > Signed-off-by: Lukas Wunner <lukas@wunner.de> > Cc: Heiner Kallweit <hkallweit1@gmail.com> I am aware that Kurt has submitted what appears to be the same patch [1,2], which I'm inclined to put down to miscommunication (email based workflows are like that sometimes). FWIIW, it is my understanding is that the patch originated from Lukas[3], and thus it seems most appropriate to take his submission. As for the patch itself, I agree that it addresses the problem at hand. For the record, I have not tested it. Reviewed-by: Simon Horman <horms@kernel.org> [1] [PATCH iwl-net] igc: Fix deadlock on module removal https://lore.kernel.org/netdev/20240411-igc_led_deadlock-v1-1-0da98a3c68c5@linutronix.de/ [2] [PATCH iwl-net v2] igc: Fix deadlock on module removal https://lore.kernel.org/netdev/20240411-igc_led_deadlock-v2-1-b758c0c88b2b@linutronix.de/ [3] Re: Deadlock in pciehp on dock disconnect https://lore.kernel.org/all/ZhBN9p1yOyciXkzw@wunner.de/
On Mon Apr 15 2024, Lukas Wunner wrote: > Roman reports a deadlock on unplug of a Thunderbolt docking station > containing an Intel I225 Ethernet adapter. > > The root cause is that led_classdev's for LEDs on the adapter are > registered such that they're device-managed by the netdev. That > results in recursive acquisition of the rtnl_lock() mutex on unplug: > > When the driver calls unregister_netdev(), it acquires rtnl_lock(), > then frees the device-managed resources. Upon unregistering the LEDs, > netdev_trig_deactivate() invokes unregister_netdevice_notifier(), > which tries to acquire rtnl_lock() again. > > Avoid by using non-device-managed LED registration. > > Stack trace for posterity: > > schedule+0x6e/0xf0 > schedule_preempt_disabled+0x15/0x20 > __mutex_lock+0x2a0/0x750 > unregister_netdevice_notifier+0x40/0x150 > netdev_trig_deactivate+0x1f/0x60 [ledtrig_netdev] > led_trigger_set+0x102/0x330 > led_classdev_unregister+0x4b/0x110 > release_nodes+0x3d/0xb0 > devres_release_all+0x8b/0xc0 > device_del+0x34f/0x3c0 > unregister_netdevice_many_notify+0x80b/0xaf0 > unregister_netdev+0x7c/0xd0 > igc_remove+0xd8/0x1e0 [igc] > pci_device_remove+0x3f/0xb0 > > Fixes: ea578703b03d ("igc: Add support for LEDs on i225/i226") > Reported-by: Roman Lozko <lozko.roma@gmail.com> > Closes: https://lore.kernel.org/r/CAEhC_B=ksywxCG_+aQqXUrGEgKq+4mqnSV8EBHOKbC3-Obj9+Q@mail.gmail.com/ > Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> I think, the first SoB has to be yours, because you are the patch author. In fact, my SoB is not required at all. However, feel free to add: Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Tested-by: Kurt Kanzenbach <kurt@linutronix.de> # Intel i225 Thanks, Kurt
On Tue, Apr 16, 2024 at 04:06:49PM +0200, Kurt Kanzenbach wrote: > On Mon Apr 15 2024, Lukas Wunner wrote: > > Fixes: ea578703b03d ("igc: Add support for LEDs on i225/i226") > > Reported-by: Roman Lozko <lozko.roma@gmail.com> > > Closes: https://lore.kernel.org/r/CAEhC_B=ksywxCG_+aQqXUrGEgKq+4mqnSV8EBHOKbC3-Obj9+Q@mail.gmail.com/ > > Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> > > I think, the first SoB has to be yours, because you are the patch > author. In fact, my SoB is not required at all. My understanding is that the commit author must be identical to the last Signed-off-by, so I put mine last. I've seen Stephen Rothwell send complaints whenever he spotted commits in linux-next violating that. I carried over the variable and function renaming you did to match the driver's (or your) preferred style, hence the inclusion of your Signed-off-by. Thanks! Lukas
On 4/15/2024 16:48, Lukas Wunner wrote: > Roman reports a deadlock on unplug of a Thunderbolt docking station > containing an Intel I225 Ethernet adapter. > > The root cause is that led_classdev's for LEDs on the adapter are > registered such that they're device-managed by the netdev. That > results in recursive acquisition of the rtnl_lock() mutex on unplug: > > When the driver calls unregister_netdev(), it acquires rtnl_lock(), > then frees the device-managed resources. Upon unregistering the LEDs, > netdev_trig_deactivate() invokes unregister_netdevice_notifier(), > which tries to acquire rtnl_lock() again. > > Avoid by using non-device-managed LED registration. > > Stack trace for posterity: > > schedule+0x6e/0xf0 > schedule_preempt_disabled+0x15/0x20 > __mutex_lock+0x2a0/0x750 > unregister_netdevice_notifier+0x40/0x150 > netdev_trig_deactivate+0x1f/0x60 [ledtrig_netdev] > led_trigger_set+0x102/0x330 > led_classdev_unregister+0x4b/0x110 > release_nodes+0x3d/0xb0 > devres_release_all+0x8b/0xc0 > device_del+0x34f/0x3c0 > unregister_netdevice_many_notify+0x80b/0xaf0 > unregister_netdev+0x7c/0xd0 > igc_remove+0xd8/0x1e0 [igc] > pci_device_remove+0x3f/0xb0 > > Fixes: ea578703b03d ("igc: Add support for LEDs on i225/i226") > Reported-by: Roman Lozko <lozko.roma@gmail.com> > Closes: https://lore.kernel.org/r/CAEhC_B=ksywxCG_+aQqXUrGEgKq+4mqnSV8EBHOKbC3-Obj9+Q@mail.gmail.com/ > Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> > Signed-off-by: Lukas Wunner <lukas@wunner.de> > Cc: Heiner Kallweit <hkallweit1@gmail.com> > --- > drivers/net/ethernet/intel/igc/igc.h | 2 ++ > drivers/net/ethernet/intel/igc/igc_leds.c | 38 ++++++++++++++++++++++++------- > drivers/net/ethernet/intel/igc/igc_main.c | 3 +++ > 3 files changed, 35 insertions(+), 8 deletions(-) Tested-by: Naama Meir <naamax.meir@linux.intel.com>
diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h index 90316dc..6bc56c7 100644 --- a/drivers/net/ethernet/intel/igc/igc.h +++ b/drivers/net/ethernet/intel/igc/igc.h @@ -298,6 +298,7 @@ struct igc_adapter { /* LEDs */ struct mutex led_mutex; + struct igc_led_classdev *leds; }; void igc_up(struct igc_adapter *adapter); @@ -723,6 +724,7 @@ struct igc_nfc_rule *igc_get_nfc_rule(struct igc_adapter *adapter, void igc_ptp_tx_tstamp_event(struct igc_adapter *adapter); int igc_led_setup(struct igc_adapter *adapter); +void igc_led_free(struct igc_adapter *adapter); #define igc_rx_pg_size(_ring) (PAGE_SIZE << igc_rx_pg_order(_ring)) diff --git a/drivers/net/ethernet/intel/igc/igc_leds.c b/drivers/net/ethernet/intel/igc/igc_leds.c index bf240c5..3929b25 100644 --- a/drivers/net/ethernet/intel/igc/igc_leds.c +++ b/drivers/net/ethernet/intel/igc/igc_leds.c @@ -236,8 +236,8 @@ static void igc_led_get_name(struct igc_adapter *adapter, int index, char *buf, pci_dev_id(adapter->pdev), index); } -static void igc_setup_ldev(struct igc_led_classdev *ldev, - struct net_device *netdev, int index) +static int igc_setup_ldev(struct igc_led_classdev *ldev, + struct net_device *netdev, int index) { struct igc_adapter *adapter = netdev_priv(netdev); struct led_classdev *led_cdev = &ldev->led; @@ -257,24 +257,46 @@ static void igc_setup_ldev(struct igc_led_classdev *ldev, led_cdev->hw_control_get = igc_led_hw_control_get; led_cdev->hw_control_get_device = igc_led_hw_control_get_device; - devm_led_classdev_register(&netdev->dev, led_cdev); + return led_classdev_register(&netdev->dev, led_cdev); } int igc_led_setup(struct igc_adapter *adapter) { struct net_device *netdev = adapter->netdev; - struct device *dev = &netdev->dev; struct igc_led_classdev *leds; - int i; + int i, err; mutex_init(&adapter->led_mutex); - leds = devm_kcalloc(dev, IGC_NUM_LEDS, sizeof(*leds), GFP_KERNEL); + leds = kcalloc(IGC_NUM_LEDS, sizeof(*leds), GFP_KERNEL); if (!leds) return -ENOMEM; - for (i = 0; i < IGC_NUM_LEDS; i++) - igc_setup_ldev(leds + i, netdev, i); + for (i = 0; i < IGC_NUM_LEDS; i++) { + err = igc_setup_ldev(leds + i, netdev, i); + if (err) + goto err; + } + + adapter->leds = leds; return 0; + +err: + for (i--; i >= 0; i--) + led_classdev_unregister(&((leds + i)->led)); + + kfree(leds); + return err; +} + +void igc_led_free(struct igc_adapter *adapter) +{ + struct igc_led_classdev *leds = adapter->leds; + int i; + + for (i = 0; i < IGC_NUM_LEDS; i++) + led_classdev_unregister(&((leds + i)->led)); + + kfree(leds); } diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c index 35ad40a..4d975d6 100644 --- a/drivers/net/ethernet/intel/igc/igc_main.c +++ b/drivers/net/ethernet/intel/igc/igc_main.c @@ -7021,6 +7021,9 @@ static void igc_remove(struct pci_dev *pdev) cancel_work_sync(&adapter->watchdog_task); hrtimer_cancel(&adapter->hrtimer); + if (IS_ENABLED(CONFIG_IGC_LEDS)) + igc_led_free(adapter); + /* Release control of h/w to f/w. If f/w is AMT enabled, this * would have already happened in close and is redundant. */