Message ID | 20231206113934.8d7819857574.I2deb5804ef1739a2af307283d320ef7d82456494@changeid (mailing list archive) |
---|---|
State | Deferred |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net,v3] net: ethtool: do runtime PM outside RTNL | expand |
On Wed, 6 Dec 2023 11:39:32 +0100 Johannes Berg wrote: > As reported by Marc MERLIN, at least one driver (igc) wants or > needs to acquire the RTNL inside suspend/resume ops, which can > be called from here in ethtool if runtime PM is enabled. > > Allow this by doing runtime PM transitions without the RTNL > held. For the ioctl to have the same operations order, this > required reworking the code to separately check validity and > do the operation. For the netlink code, this now has to do > the runtime_pm_put a bit later. I was really, really hoping that this would serve as a motivation for Intel to sort out the igb/igc implementation. The flow AFAICT is ndo_open() starts the NIC, the calls pm_sus, which shuts the NIC back down immediately (!?) then it schedules a link check from a work queue, which opens it again (!?). It's a source of never ending bugs. nit: please don't repost within 24h on netdev: https://www.kernel.org/doc/html/next/process/maintainer-netdev.html#tl-dr
On Wed, 2023-12-06 at 08:44 -0800, Jakub Kicinski wrote: > On Wed, 6 Dec 2023 11:39:32 +0100 Johannes Berg wrote: > > As reported by Marc MERLIN, at least one driver (igc) wants or > > needs to acquire the RTNL inside suspend/resume ops, which can > > be called from here in ethtool if runtime PM is enabled. > > > > Allow this by doing runtime PM transitions without the RTNL > > held. For the ioctl to have the same operations order, this > > required reworking the code to separately check validity and > > do the operation. For the netlink code, this now has to do > > the runtime_pm_put a bit later. > > I was really, really hoping that this would serve as a motivation > for Intel to sort out the igb/igc implementation. The flow AFAICT > is ndo_open() starts the NIC, the calls pm_sus, which shuts the NIC > back down immediately (!?) then it schedules a link check from a work > queue, which opens it again (!?). It's a source of never ending bugs. > Well, I work there, but ... WiFi something else entirely. Marc just got lucky I spotted an issue in the logs ;-) I'll let you guys take it from here ... johannes
On Wed, Dec 06, 2023 at 05:46:07PM +0100, Johannes Berg wrote: > On Wed, 2023-12-06 at 08:44 -0800, Jakub Kicinski wrote: > > On Wed, 6 Dec 2023 11:39:32 +0100 Johannes Berg wrote: > > > As reported by Marc MERLIN, at least one driver (igc) wants or > > > needs to acquire the RTNL inside suspend/resume ops, which can > > > be called from here in ethtool if runtime PM is enabled. > > > > > > Allow this by doing runtime PM transitions without the RTNL > > > held. For the ioctl to have the same operations order, this > > > required reworking the code to separately check validity and > > > do the operation. For the netlink code, this now has to do > > > the runtime_pm_put a bit later. > > > > I was really, really hoping that this would serve as a motivation > > for Intel to sort out the igb/igc implementation. The flow AFAICT > > is ndo_open() starts the NIC, the calls pm_sus, which shuts the NIC > > back down immediately (!?) then it schedules a link check from a work > > queue, which opens it again (!?). It's a source of never ending bugs. > > Well, I work there, but ... WiFi something else entirely. Marc just got > lucky I spotted an issue in the logs ;-) and I am truly thankful for the time you took to help out. It made a huge difference for me in being able to keep a laptop that I will probably use for the next 4 years and otherwise had to return. Thank you for going above and beyond. > I'll let you guys take it from here ... As this time the first patch I got still works for me even if it's not the most ideal way to fix this. As you all figure out what's the best/better patch, please let me know if you'd like me to validate/retest. Thanks, Marc
On 12/6/23 17:46, Johannes Berg wrote: > On Wed, 2023-12-06 at 08:44 -0800, Jakub Kicinski wrote: >> On Wed, 6 Dec 2023 11:39:32 +0100 Johannes Berg wrote: >>> As reported by Marc MERLIN, at least one driver (igc) wants or >>> needs to acquire the RTNL inside suspend/resume ops, which can >>> be called from here in ethtool if runtime PM is enabled. >>> >>> Allow this by doing runtime PM transitions without the RTNL >>> held. For the ioctl to have the same operations order, this >>> required reworking the code to separately check validity and >>> do the operation. For the netlink code, this now has to do >>> the runtime_pm_put a bit later. >> >> I was really, really hoping that this would serve as a motivation >> for Intel to sort out the igb/igc implementation. The flow AFAICT >> is ndo_open() starts the NIC, the calls pm_sus, which shuts the NIC >> back down immediately (!?) then it schedules a link check from a work >> queue, which opens it again (!?). It's a source of never ending bugs. >> > > Well, I work there, but ... WiFi something else entirely. Marc just got > lucky I spotted an issue in the logs ;-) > > I'll let you guys take it from here ... > > johannes > I have let know our igc TL, architect, and anybody that could be interested via cc: IWL. And I'm happy that this could be done at relaxed pace thanks to Johannes
On Thu, 7 Dec 2023 11:16:10 +0100 Przemek Kitszel wrote: > I have let know our igc TL, architect, and anybody that could be > interested via cc: IWL. And I'm happy that this could be done at > relaxed pace thanks to Johannes I think you may be expecting us to take Johannes's patch. It's still on the table, but to make things clear - upstream we prefer to wait for the "real fix", so if we agree that fixing igb/igc is a better way (as Heiner pointed out on previous version PM functions are called by the stack under rtnl elsewhere too, just not while device is open) - we'll wait for that. Especially that I'm 80% I complained about the PM in those drivers in the past and nobody seemed to care. It's a constant source of rtnl deadlocks.
On Thu, Dec 07, 2023 at 09:40:21AM -0800, Jakub Kicinski wrote: > On Thu, 7 Dec 2023 11:16:10 +0100 Przemek Kitszel wrote: > > I have let know our igc TL, architect, and anybody that could be > > interested via cc: IWL. And I'm happy that this could be done at > > relaxed pace thanks to Johannes > > I think you may be expecting us to take Johannes's patch. > It's still on the table, but to make things clear - > upstream we prefer to wait for the "real fix", so if we agree > that fixing igb/igc is a better way (as Heiner pointed out on previous > version PM functions are called by the stack under rtnl elsewhere too, > just not while device is open) - we'll wait for that. Especially > that I'm 80% I complained about the PM in those drivers in > the past and nobody seemed to care. It's a constant source of rtnl > deadlocks. For whatever it's worth, I want to be clear that all stock kernels are 100% unusable on lenovo P17gen2 because of this deadlock and that without the temporary patch, my laptop would be usuable. It was also a risk of data loss due to repeated deadlocks and unclean shutdowns. I cannot say what the correct fix is, but I am definitely hoping you will accept some solution for the next stable kernel. Thank you Marc
On 11.12.2023 05:52, Marc MERLIN wrote: > On Thu, Dec 07, 2023 at 09:40:21AM -0800, Jakub Kicinski wrote: >> On Thu, 7 Dec 2023 11:16:10 +0100 Przemek Kitszel wrote: >>> I have let know our igc TL, architect, and anybody that could be >>> interested via cc: IWL. And I'm happy that this could be done at >>> relaxed pace thanks to Johannes >> >> I think you may be expecting us to take Johannes's patch. >> It's still on the table, but to make things clear - >> upstream we prefer to wait for the "real fix", so if we agree >> that fixing igb/igc is a better way (as Heiner pointed out on previous >> version PM functions are called by the stack under rtnl elsewhere too, >> just not while device is open) - we'll wait for that. Especially >> that I'm 80% I complained about the PM in those drivers in >> the past and nobody seemed to care. It's a constant source of rtnl >> deadlocks. > > For whatever it's worth, I want to be clear that all stock kernels > are 100% unusable on lenovo P17gen2 because of this deadlock and that > without the temporary patch, my laptop would be usuable. > It was also a risk of data loss due to repeated deadlocks and unclean > shutdowns. > Why don't you simply disable runtime pm for the affected device as a workaround? This can be done via sysfs. > I cannot say what the correct fix is, but I am definitely hoping you > will accept some solution for the next stable kernel. > > Thank you > Marc
On Fri, Dec 15, 2023 at 02:42:01PM +0100, Heiner Kallweit wrote: > Why don't you simply disable runtime pm for the affected device as a > workaround? This can be done via sysfs. 1) because I didn't know what the exact bug was and how to work around it :) 2) without power management, the battery use is not good, but yes not good is better than laptop crashing :) That said, if it's only affecting wired ethernet, I can also unload the igc module until I actually need wired ethernet, which is sometimes but not often. Thanks for your suggestion Marc
On Fri, Dec 15, 2023 at 09:46:34AM -0800, Marc MERLIN wrote: > On Fri, Dec 15, 2023 at 02:42:01PM +0100, Heiner Kallweit wrote: > > Why don't you simply disable runtime pm for the affected device as a > > workaround? This can be done via sysfs. > > 1) because I didn't know what the exact bug was and how to work around it :) Mmmh, so I need to test an ubuntu kernel (6.5.0-14) because of sound issues in mainline TOT, and I can't boot the kernel to completion without hititng this hang bug. I'm not exactly sure which part of the boot triggers it. I can't patch that kernel easily. How exactly do I disable runtime PM from the kernel command line for "that device" which I'm not even sure which one it is. If it's the eth device, I already removed the igc module to prevent it from loading, and I also removed the ethtool binary, but I'm still getting the hang. On the plus side, with 6.6.8 and the old patch which I understand is not the ideal solution, I can confirm that I've been running problem free until now, so thanks again for that interim patch. Thanks, Marc
On 24.12.2023 17:30, Marc MERLIN wrote: > On Fri, Dec 15, 2023 at 09:46:34AM -0800, Marc MERLIN wrote: >> On Fri, Dec 15, 2023 at 02:42:01PM +0100, Heiner Kallweit wrote: >>> Why don't you simply disable runtime pm for the affected device as a >>> workaround? This can be done via sysfs. >> >> 1) because I didn't know what the exact bug was and how to work around it :) > > Mmmh, so I need to test an ubuntu kernel (6.5.0-14) because of sound > issues in mainline TOT, and I can't boot the kernel to completion > without hititng this hang bug. I'm not exactly sure which part of the > boot triggers it. > > I can't patch that kernel easily. How exactly do I disable runtime PM > from the kernel command line for "that device" which I'm not even sure Change <device>/power/control from "auto" to "on". > which one it is. If it's the eth device, I already removed the igc > module to prevent it from loading, and I also removed the ethtool > binary, but I'm still getting the hang. > > On the plus side, with 6.6.8 and the old patch which I understand is not > the ideal solution, I can confirm that I've been running problem free > until now, so thanks again for that interim patch. > > Thanks, > Marc
On 25/12/2023 1:12, Heiner Kallweit wrote: > On 24.12.2023 17:30, Marc MERLIN wrote: >> On Fri, Dec 15, 2023 at 09:46:34AM -0800, Marc MERLIN wrote: >>> On Fri, Dec 15, 2023 at 02:42:01PM +0100, Heiner Kallweit wrote: >>>> Why don't you simply disable runtime pm for the affected device as a >>>> workaround? This can be done via sysfs. >>> >>> 1) because I didn't know what the exact bug was and how to work around it :) >> >> Mmmh, so I need to test an ubuntu kernel (6.5.0-14) because of sound >> issues in mainline TOT, and I can't boot the kernel to completion >> without hititng this hang bug. I'm not exactly sure which part of the >> boot triggers it. >> >> I can't patch that kernel easily. How exactly do I disable runtime PM >> from the kernel command line for "that device" which I'm not even sure > > Change <device>/power/control from "auto" to "on". Need to figure out your controller location in a file system via lspci/lspci -t and then change to "on" For example: echo on > /sys/devices/pci0000\:00/0000\:00\:1c.0/0000\:ae\:00.0/power/control We are starting to look at this problem, but I can't reproduce the problem on my machines yet. > >> which one it is. If it's the eth device, I already removed the igc >> module to prevent it from loading, and I also removed the ethtool >> binary, but I'm still getting the hang. >> >> On the plus side, with 6.6.8 and the old patch which I understand is not >> the ideal solution, I can confirm that I've been running problem free >> until now, so thanks again for that interim patch. >> >> Thanks, >> Marc > > _______________________________________________ > Intel-wired-lan mailing list > Intel-wired-lan@osuosl.org > https://lists.osuosl.org/mailman/listinfo/intel-wired-lan
On Mon, Dec 25, 2023 at 10:03:23AM +0200, Sasha Neftin wrote: > > > I can't patch that kernel easily. How exactly do I disable runtime PM > > > from the kernel command line for "that device" which I'm not even sure > > > > Change <device>/power/control from "auto" to "on". > > Need to figure out your controller location in a file system via lspci/lspci > -t and then change to "on" > For example: echo on > > /sys/devices/pci0000\:00/0000\:00\:1c.0/0000\:ae\:00.0/power/control > > We are starting to look at this problem, but I can't reproduce the problem > on my machines yet. Thanks. I realized it was going to be hard either way if the boot hangs before I get to a command prompt, which was what was happening yesterday. I had to boot ubuntu to debug a sound issue, and it was very tricky since most of the time it hung before I got to a command prompt, but I was finally able to get it to work long enough to debug the sound issue and go back to my self built kernel to port over the sound config I needed. I wish I could tell you exactly how to reproduct this in a more useful way, sorry about that. Marc
On Wed, Dec 06, 2023 at 08:44:48AM -0800, Jakub Kicinski wrote: > On Wed, 6 Dec 2023 11:39:32 +0100 Johannes Berg wrote: > > As reported by Marc MERLIN, at least one driver (igc) wants or > > needs to acquire the RTNL inside suspend/resume ops, which can > > be called from here in ethtool if runtime PM is enabled. > > > > Allow this by doing runtime PM transitions without the RTNL > > held. For the ioctl to have the same operations order, this > > required reworking the code to separately check validity and > > do the operation. For the netlink code, this now has to do > > the runtime_pm_put a bit later. > > I was really, really hoping that this would serve as a motivation > for Intel to sort out the igb/igc implementation. The flow AFAICT > is ndo_open() starts the NIC, the calls pm_sus, which shuts the NIC > back down immediately (!?) then it schedules a link check from a work It's not like that. pm_runtime_put() in igc_open() does not disable device. It calls runtime_idle callback which check if there is link and if is not, schedule device suspend in 5 second, otherwise device stays running. Work watchdog_task runs periodically and also check for link changes. > queue, which opens it again (!?). It's a source of never ending bugs. Maybe there are issues there and igc pm runtime implementation needs improvements, with lockings or otherwise. Some folks are looking at this. But I think for this particular deadlock problem reverting of below commits should be considered: bd869245a3dc net: core: try to runtime-resume detached device in __dev_open f32a21376573 ethtool: runtime-resume netdev parent before ethtool ioctl ops First, the deadlock should be addressed also in older kernels and refactoring is not really backportable fix. Second, I don't think network stack should do any calls to pm_runtime* . This should be fully device driver specific, as this depends on device driver implementation of power saving. IMHO if it's desirable to resume disabled device on requests from user space, then netif_device_detach() should not be used in runtime suspend. Thoughts ? Regards Stanislaw
On 03.01.2024 11:30, Stanislaw Gruszka wrote: > On Wed, Dec 06, 2023 at 08:44:48AM -0800, Jakub Kicinski wrote: >> On Wed, 6 Dec 2023 11:39:32 +0100 Johannes Berg wrote: >>> As reported by Marc MERLIN, at least one driver (igc) wants or >>> needs to acquire the RTNL inside suspend/resume ops, which can >>> be called from here in ethtool if runtime PM is enabled. >>> >>> Allow this by doing runtime PM transitions without the RTNL >>> held. For the ioctl to have the same operations order, this >>> required reworking the code to separately check validity and >>> do the operation. For the netlink code, this now has to do >>> the runtime_pm_put a bit later. >> >> I was really, really hoping that this would serve as a motivation >> for Intel to sort out the igb/igc implementation. The flow AFAICT >> is ndo_open() starts the NIC, the calls pm_sus, which shuts the NIC >> back down immediately (!?) then it schedules a link check from a work > > It's not like that. pm_runtime_put() in igc_open() does not disable device. > It calls runtime_idle callback which check if there is link and if is > not, schedule device suspend in 5 second, otherwise device stays running. > > Work watchdog_task runs periodically and also check for link changes. > >> queue, which opens it again (!?). It's a source of never ending bugs. > > Maybe there are issues there and igc pm runtime implementation needs > improvements, with lockings or otherwise. Some folks are looking at this. > But I think for this particular deadlock problem reverting of below commits > should be considered: > > bd869245a3dc net: core: try to runtime-resume detached device in __dev_open > f32a21376573 ethtool: runtime-resume netdev parent before ethtool ioctl ops > Reverting bd869245a3dc would break existing stuff. > First, the deadlock should be addressed also in older kernels and > refactoring is not really backportable fix. > You could simply disable igc runtime pm on older kernel versions if backporting a proper fix would be too cumbersome. > Second, I don't think network stack should do any calls to pm_runtime* . It's not unusual that subsystem core code deals with runtime pm. E.g. see all the runtime pm calls in drivers/pci/pci.c IMO it's exactly the purpose of the RPM API to encapsulate the device-specific (r)pm features. > This should be fully device driver specific, as this depends on device > driver implementation of power saving. IMHO if it's desirable to > resume disabled device on requests from user space, then > netif_device_detach() should not be used in runtime suspend. > > Thoughts ? > > Regards > Stanislaw >
On Wed, Jan 03, 2024 at 12:24:18PM +0100, Heiner Kallweit wrote: > On 03.01.2024 11:30, Stanislaw Gruszka wrote: > > On Wed, Dec 06, 2023 at 08:44:48AM -0800, Jakub Kicinski wrote: > >> On Wed, 6 Dec 2023 11:39:32 +0100 Johannes Berg wrote: > >>> As reported by Marc MERLIN, at least one driver (igc) wants or > >>> needs to acquire the RTNL inside suspend/resume ops, which can > >>> be called from here in ethtool if runtime PM is enabled. > >>> > >>> Allow this by doing runtime PM transitions without the RTNL > >>> held. For the ioctl to have the same operations order, this > >>> required reworking the code to separately check validity and > >>> do the operation. For the netlink code, this now has to do > >>> the runtime_pm_put a bit later. > >> > >> I was really, really hoping that this would serve as a motivation > >> for Intel to sort out the igb/igc implementation. The flow AFAICT > >> is ndo_open() starts the NIC, the calls pm_sus, which shuts the NIC > >> back down immediately (!?) then it schedules a link check from a work > > > > It's not like that. pm_runtime_put() in igc_open() does not disable device. > > It calls runtime_idle callback which check if there is link and if is > > not, schedule device suspend in 5 second, otherwise device stays running. > > > > Work watchdog_task runs periodically and also check for link changes. > > > >> queue, which opens it again (!?). It's a source of never ending bugs. > > > > Maybe there are issues there and igc pm runtime implementation needs > > improvements, with lockings or otherwise. Some folks are looking at this. > > But I think for this particular deadlock problem reverting of below commits > > should be considered: > > > > bd869245a3dc net: core: try to runtime-resume detached device in __dev_open > > f32a21376573 ethtool: runtime-resume netdev parent before ethtool ioctl ops > > Reverting bd869245a3dc would break existing stuff. > > > First, the deadlock should be addressed also in older kernels and > > refactoring is not really backportable fix. > > > You could simply disable igc runtime pm on older kernel versions > if backporting a proper fix would be too cumbersome. It would be better to have pm working on older kernels as it use to. > > Second, I don't think network stack should do any calls to pm_runtime* . > > It's not unusual that subsystem core code deals with runtime pm. > E.g. see all the runtime pm calls in drivers/pci/pci.c > IMO it's exactly the purpose of the RPM API to encapsulate the > device-specific (r)pm features. PCI is bus layer that control device probe/remove, suspend/resume, etc, it has to do this. To make proper companion non-bus subsystem should be used i.e. sound, drm, bluetooth ... all of those do not pm_runtime in core layer and leave that to drivers. One exception is block layer with it's blk-pm.c , but that it's also more like library that is used by the drivers. Regards Stanislaw
On Wed, 3 Jan 2024 11:30:17 +0100 Stanislaw Gruszka wrote: > > I was really, really hoping that this would serve as a motivation > > for Intel to sort out the igb/igc implementation. The flow AFAICT > > is ndo_open() starts the NIC, the calls pm_sus, which shuts the NIC > > back down immediately (!?) then it schedules a link check from a work > > It's not like that. pm_runtime_put() in igc_open() does not disable device. > It calls runtime_idle callback which check if there is link and if is > not, schedule device suspend in 5 second, otherwise device stays running. Hm, I missed the 5 sec delay there. Next question for me is - how does it not deadlock in the open? igc_open() __igc_open(resuming=false) if (!resuming) pm_runtime_get_sync(&pdev->dev); igc_resume() rtnl_lock()
On Wed, Jan 03, 2024 at 03:34:05PM -0800, Jakub Kicinski wrote: > On Wed, 3 Jan 2024 11:30:17 +0100 Stanislaw Gruszka wrote: > > > I was really, really hoping that this would serve as a motivation > > > for Intel to sort out the igb/igc implementation. The flow AFAICT > > > is ndo_open() starts the NIC, the calls pm_sus, which shuts the NIC > > > back down immediately (!?) then it schedules a link check from a work > > > > It's not like that. pm_runtime_put() in igc_open() does not disable device. > > It calls runtime_idle callback which check if there is link and if is > > not, schedule device suspend in 5 second, otherwise device stays running. > > Hm, I missed the 5 sec delay there. Next question for me is - how does > it not deadlock in the open? > > igc_open() > __igc_open(resuming=false) > if (!resuming) > pm_runtime_get_sync(&pdev->dev); > > igc_resume() > rtnl_lock() If device was not suspended, pm_runtime_get_sync() will increase dev->power.usage_count counter and cancel pending rpm suspend request if any. There is race condition though, more about that below. If device was suspended, we could not get to igc_open() since it was marked as detached and fail netif_device_present() check in __dev_open(). That was the behaviour before bd869245a3dc. There is small race window between with igc_open() and scheduled runtime suspend, if at the same time dev_open() is done and dev->power.suspend_timer expire: open: pm_suspend_timer_fh: rtnl_lock() rpm_suspend() igc_runtime_suspend() __igc_shutdown() rtnl_lock() __igc_open() pm_runtime_get_sync(): waits for rpm suspend callback done This needs to be addressed, but it's not that this can happen all the time. To trigger this someone has to remove the cable and exactly after 5 seconds do ip link set up. Regards Stanislaw
On 04.01.2024 09:25, Stanislaw Gruszka wrote: > On Wed, Jan 03, 2024 at 03:34:05PM -0800, Jakub Kicinski wrote: >> On Wed, 3 Jan 2024 11:30:17 +0100 Stanislaw Gruszka wrote: >>>> I was really, really hoping that this would serve as a motivation >>>> for Intel to sort out the igb/igc implementation. The flow AFAICT >>>> is ndo_open() starts the NIC, the calls pm_sus, which shuts the NIC >>>> back down immediately (!?) then it schedules a link check from a work >>> >>> It's not like that. pm_runtime_put() in igc_open() does not disable device. >>> It calls runtime_idle callback which check if there is link and if is >>> not, schedule device suspend in 5 second, otherwise device stays running. >> >> Hm, I missed the 5 sec delay there. Next question for me is - how does >> it not deadlock in the open? >> >> igc_open() >> __igc_open(resuming=false) >> if (!resuming) >> pm_runtime_get_sync(&pdev->dev); >> >> igc_resume() >> rtnl_lock() > > If device was not suspended, pm_runtime_get_sync() will increase > dev->power.usage_count counter and cancel pending rpm suspend > request if any. There is race condition though, more about that > below. > > If device was suspended, we could not get to igc_open() since it > was marked as detached and fail netif_device_present() check in > __dev_open(). That was the behaviour before bd869245a3dc. > > There is small race window between with igc_open() and scheduled > runtime suspend, if at the same time dev_open() is done and > dev->power.suspend_timer expire: > > open: pm_suspend_timer_fh: > > rtnl_lock() > rpm_suspend() > igc_runtime_suspend() > __igc_shutdown() > rtnl_lock() > > __igc_open() > pm_runtime_get_sync(): > waits for rpm suspend callback done > > This needs to be addressed, but it's not that this can happen > all the time. To trigger this someone has to remove the > cable and exactly after 5 seconds do ip link set up. > For me the main question is the following. In igc_resume() you have rtnl_lock(); if (!err && netif_running(netdev)) err = __igc_open(netdev, true); if (!err) netif_device_attach(netdev); rtnl_unlock(); Why is the global rtnl_lock() needed here? The netdev is in detached state what protects from e.g. userspace activity, see all the netif_device_present() checks in net core. > Regards > Stanislaw
On Thu, 4 Jan 2024 10:05:12 +0100 Heiner Kallweit wrote: > > If device was not suspended, pm_runtime_get_sync() will increase > > dev->power.usage_count counter and cancel pending rpm suspend > > request if any. There is race condition though, more about that > > below. > > > > If device was suspended, we could not get to igc_open() since it > > was marked as detached and fail netif_device_present() check in > > __dev_open(). That was the behaviour before bd869245a3dc. __dev_open() tries to resume as well, and is also under rtnl_lock. So that resume call somehow must never happen or users would see -ENODEV? Sorry for the basic questions, the flow is confusing :S > > There is small race window between with igc_open() and scheduled > > runtime suspend, if at the same time dev_open() is done and > > dev->power.suspend_timer expire: > > > > open: pm_suspend_timer_fh: > > > > rtnl_lock() > > rpm_suspend() > > igc_runtime_suspend() > > __igc_shutdown() > > rtnl_lock() > > > > __igc_open() > > pm_runtime_get_sync(): > > waits for rpm suspend callback done > > > > This needs to be addressed, but it's not that this can happen > > all the time. To trigger this someone has to remove the > > cable and exactly after 5 seconds do ip link set up. Or tries to up exactly 5 sec after probe? > For me the main question is the following. In igc_resume() you have > > rtnl_lock(); > if (!err && netif_running(netdev)) > err = __igc_open(netdev, true); > > if (!err) > netif_device_attach(netdev); > rtnl_unlock(); > > Why is the global rtnl_lock() needed here? The netdev is in detached > state what protects from e.g. userspace activity, see all the > netif_device_present() checks in net core. That'd assume there are no RPM calls outside networking in this driver. Perhaps there aren't but that also sounds wobbly.
On Thu, Jan 04, 2024 at 10:05:12AM +0100, Heiner Kallweit wrote: > On 04.01.2024 09:25, Stanislaw Gruszka wrote: > For me the main question is the following. In igc_resume() you have > > rtnl_lock(); > if (!err && netif_running(netdev)) > err = __igc_open(netdev, true); > > if (!err) > netif_device_attach(netdev); > rtnl_unlock(); > > Why is the global rtnl_lock() needed here? The netdev is in detached > state what protects from e.g. userspace activity, see all the > netif_device_present() checks in net core. Good question. Initially I thought that the lock can be removed for the exact reason you wrote. I.e. the analogus change as you did for igb could de done ( ac8c58f5b535 ). But after more detailed examination I can see the need for lock. __igc_open() calls at least one function that require rtnl_lock: netif_set_real_num_rx_queues(). Despite using netif_device_attach() without the rtnl_lock at various places it's not safe. After: if (!test_and_set_bit(__LINK_STATE_PRESENT, &dev->state) && netif_running(dev)) { the full link down can be performed without rtnl_lock, so we can do netif_tx_wake_all_queues(dev); __netdev_watchdog_up(dev); during closing or after device is closed. Just found those two. I think there could be more reasons. Regards Stanislaw
On Thu, Jan 04, 2024 at 08:16:56AM -0800, Jakub Kicinski wrote: > On Thu, 4 Jan 2024 10:05:12 +0100 Heiner Kallweit wrote: > > > If device was not suspended, pm_runtime_get_sync() will increase > > > dev->power.usage_count counter and cancel pending rpm suspend > > > request if any. There is race condition though, more about that > > > below. > > > > > > If device was suspended, we could not get to igc_open() since it > > > was marked as detached and fail netif_device_present() check in > > > __dev_open(). That was the behaviour before bd869245a3dc. > > __dev_open() tries to resume as well, and is also under rtnl_lock. This one is plain 100% deadlock for igc (and igb before ac8c58f5b535) I'm opting for remove those rpm calls from __dev_open() and ethtool. The only thing that prevent that deadlock to happen all the time, is that rpm is disabled by default (for PCI devices). When pci driver want to rpm be default enabled, it has to call pm_runtime_allow(). Otherwise user has to enable it by: echo auto > /sys/bus/pci/devices/PCI_ID/power/control But this could be also done by some power saving user-space software. This is most probable reason way Marc reported that he can not boot his laptop due to this deadlock. Other unlikely possibility that for some reason rpm was enabled by default, but it should not be for PCI since: bb910a7040e9 ("PCI/PM Runtime: Make runtime PM of PCI devices inactive by default") > So that resume call somehow must never happen or users would see > -ENODEV? Sorry for the basic questions, the flow is confusing :S If we talk about situation before rpm calls were added to net core (i.e. < 5.9) there was open/ethtool -ENODEV error when igc/igb was runtime suspend due to netif_device_present() check fail. That was by design, what for open the device and loose energy if there is no cable and device can not be used anyway ? > > > There is small race window between with igc_open() and scheduled > > > runtime suspend, if at the same time dev_open() is done and > > > dev->power.suspend_timer expire: > > > > > > open: pm_suspend_timer_fh: > > > > > > rtnl_lock() > > > rpm_suspend() > > > igc_runtime_suspend() > > > __igc_shutdown() > > > rtnl_lock() > > > > > > __igc_open() > > > pm_runtime_get_sync(): > > > waits for rpm suspend callback done > > > > > > This needs to be addressed, but it's not that this can happen > > > all the time. To trigger this someone has to remove the > > > cable and exactly after 5 seconds do ip link set up. > > Or tries to up exactly 5 sec after probe? Just after probe rpm is disabled, so 5 sec after enabling rpm (with cable removed) or 5 sec after cable remove (with rpm enabled). > > For me the main question is the following. In igc_resume() you have > > > > rtnl_lock(); > > if (!err && netif_running(netdev)) > > err = __igc_open(netdev, true); > > > > if (!err) > > netif_device_attach(netdev); > > rtnl_unlock(); > > > > Why is the global rtnl_lock() needed here? The netdev is in detached > > state what protects from e.g. userspace activity, see all the > > netif_device_present() checks in net core. > > That'd assume there are no RPM calls outside networking in this driver. > Perhaps there aren't but that also sounds wobbly. They are in PCI layer. For example when disabling rpm (reverting auto in power/control) by: echo on > /sys/bus/pci/devices/PCI_ID/power/control Regards Stanislaw
On Fri, 5 Jan 2024 12:53:42 +0100 Stanislaw Gruszka wrote: > On Thu, Jan 04, 2024 at 08:16:56AM -0800, Jakub Kicinski wrote: > > __dev_open() tries to resume as well, and is also under rtnl_lock. > > This one is plain 100% deadlock for igc (and igb before ac8c58f5b535) > I'm opting for remove those rpm calls from __dev_open() and ethtool. I don't know what gets powered down, exactly, in this device, so I can't give you a concrete example. But usually there's at least one ndo / ethtool callback which needs to resume the device (and already holds rtnl_lock). Taking rtnl_lock on the resume path is fundamentally broken. Removing the rpm calls from the core is just going to lead to a whack-a-mole of bugs in the drivers themselves. IOW I look at the RPM calls in the core as a canary for people doing the wrong thing :( > > So that resume call somehow must never happen or users would see > > -ENODEV? Sorry for the basic questions, the flow is confusing :S > > If we talk about situation before rpm calls were added to net core > (i.e. < 5.9) there was open/ethtool -ENODEV error when igc/igb > was runtime suspend due to netif_device_present() check fail. > > That was by design, what for open the device and loose > energy if there is no cable and device can not be used anyway ? I think "link" means actual link up here, no? As opposed to no cable plugged in. If I understand that right - the device would have to train the link in DOWN state in order for the device to be opened? That would be quite wasteful in terms of power. Regardless, returning -ENODEV is really not how netdevs should behave. That's what carrier reporting is for! :(
On Fri, Jan 05, 2024 at 07:30:01AM -0800, Jakub Kicinski wrote: > On Fri, 5 Jan 2024 12:53:42 +0100 Stanislaw Gruszka wrote: > > On Thu, Jan 04, 2024 at 08:16:56AM -0800, Jakub Kicinski wrote: > > > __dev_open() tries to resume as well, and is also under rtnl_lock. > > > > This one is plain 100% deadlock for igc (and igb before ac8c58f5b535) > > I'm opting for remove those rpm calls from __dev_open() and ethtool. > > I don't know what gets powered down, exactly, in this device, > so I can't give you a concrete example. But usually there's > at least one ndo / ethtool callback which needs to resume > the device (and already holds rtnl_lock). Taking rtnl_lock > on the resume path is fundamentally broken. I agree with that. > Removing the > rpm calls from the core is just going to lead to a whack-a-mole > of bugs in the drivers themselves. > > IOW I look at the RPM calls in the core as a canary for people > doing the wrong thing :( Hmm, this one I don't understand, what other bugs could pop up after reverting bd869245a3dcc and others that added rpm calls for the net core? > > > So that resume call somehow must never happen or users would see > > > -ENODEV? Sorry for the basic questions, the flow is confusing :S > > > > If we talk about situation before rpm calls were added to net core > > (i.e. < 5.9) there was open/ethtool -ENODEV error when igc/igb > > was runtime suspend due to netif_device_present() check fail. > > > > That was by design, what for open the device and loose > > energy if there is no cable and device can not be used anyway ? > > I think "link" means actual link up here, no? As opposed to no cable > plugged in. If I understand that right - the device would have to train > the link in DOWN state in order for the device to be opened? > That would be quite wasteful in terms of power. I ment no cable plugged. When igc device was runtime suspended, and user connected the cable, user has to power device up via on > power/control and then ip link set up. > Regardless, returning -ENODEV is really not how netdevs should behave. > That's what carrier reporting is for! :( Ok, I can agrre with that. But I think this should be achived by not using netif_device_detach() in rpm suspend, not by if (!netif_device_present(dev)) { /* may be detached because parent is runtime-suspended */ if (dev->dev.parent) pm_runtime_resume(dev->dev.parent); if (!netif_device_present(dev)) return -ENODEV; } Regards Stanislaw
On Fri, 5 Jan 2024 17:29:16 +0100 Stanislaw Gruszka wrote: > > Removing the rpm calls from the core is just going to lead to a > > whack-a-mole of bugs in the drivers themselves. > > > > IOW I look at the RPM calls in the core as a canary for people > > doing the wrong thing :( > > Hmm, this one I don't understand, what other bugs could pop up > after reverting bd869245a3dcc and others that added rpm calls > for the net core? IDK what igc powers down, but if there's any ndo or ethtool callback which needs to access a register that requires power to be resumed - it will deadlock on rtnl exactly the same. Looking at igc_ethtool I see: static int igc_ethtool_begin(struct net_device *netdev) { struct igc_adapter *adapter = netdev_priv(netdev); pm_runtime_get_sync(&adapter->pdev->dev); return 0; } static void igc_ethtool_complete(struct net_device *netdev) { struct igc_adapter *adapter = netdev_priv(netdev); pm_runtime_put(&adapter->pdev->dev); } so unless we think that returning -ENODEV from all ethtool calls when cable is not plugged in is okay - removing the PM resume from the core doesn't buy us much :(
On Fri, Jan 05, 2024 at 07:02:18PM -0800, Jakub Kicinski wrote: > On Fri, 5 Jan 2024 17:29:16 +0100 Stanislaw Gruszka wrote: > > > Removing the rpm calls from the core is just going to lead to a > > > whack-a-mole of bugs in the drivers themselves. > > > > > > IOW I look at the RPM calls in the core as a canary for people > > > doing the wrong thing :( > > > > Hmm, this one I don't understand, what other bugs could pop up > > after reverting bd869245a3dcc and others that added rpm calls > > for the net core? > > IDK what igc powers down, From what I can tell basically everything, it's full shutdown. > but if there's any ndo or ethtool > callback which needs to access a register that requires power > to be resumed - it will deadlock on rtnl exactly the same. > > Looking at igc_ethtool I see: > > static int igc_ethtool_begin(struct net_device *netdev) > { > struct igc_adapter *adapter = netdev_priv(netdev); > > pm_runtime_get_sync(&adapter->pdev->dev); > return 0; > } > > static void igc_ethtool_complete(struct net_device *netdev) > { > struct igc_adapter *adapter = netdev_priv(netdev); > > pm_runtime_put(&adapter->pdev->dev); > } > > so unless we think that returning -ENODEV from all ethtool calls > when cable is not plugged in is okay - removing the PM resume > from the core doesn't buy us much :( It would address the regression in simple fix that can be send to -stable. Event if -ENODEV for all ethtool ops and open is not good, it's still better than deadlocking whole system. I agree RPM for igc is not perfect and has issues that need to be fix. People are working on it inspired by e1000e implementation. Is should address the main requirement: no rtnl_lock on resume path - waking up device when needed on ndo/ethtool. But that would not be simple fix AFICT, more likely it will be reimplementation of the whole thing. Additionally, in context of ethtool, previously each driver that implement RPM, woke up the device for actual HW access, and don't when only memory was used. For example e1000e has fine tuned ethtool ops. Some others like cadence/macb or renesas/sh_eth went event further and have their pm_runtime_resume_get_sync() in register access functions. Now a hardware is powered up on every ethtool op regardless of actual need. So I think that the calls are only needed for some drivers, but for others are detrimental. Would adding new netdev->priv_flags for calling them be acceptable ? Regards Stanislaw
diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c index a977f8903467..7c0e56985eb8 100644 --- a/net/ethtool/ioctl.c +++ b/net/ethtool/ioctl.c @@ -2768,26 +2768,18 @@ static int ethtool_set_fecparam(struct net_device *dev, void __user *useraddr) /* The main entry point in this file. Called from net/core/dev_ioctl.c */ static int -__dev_ethtool(struct net *net, struct ifreq *ifr, void __user *useraddr, - u32 ethcmd, struct ethtool_devlink_compat *devlink_state) +__dev_ethtool_check(struct net_device *dev, void __user *useraddr, + u32 ethcmd, u32 *sub_cmd) { - struct net_device *dev; - u32 sub_cmd; - int rc; - netdev_features_t old_features; - - dev = __dev_get_by_name(net, ifr->ifr_name); - if (!dev) - return -ENODEV; - if (ethcmd == ETHTOOL_PERQUEUE) { - if (copy_from_user(&sub_cmd, useraddr + sizeof(ethcmd), sizeof(sub_cmd))) + if (copy_from_user(sub_cmd, useraddr + sizeof(ethcmd), sizeof(*sub_cmd))) return -EFAULT; } else { - sub_cmd = ethcmd; + *sub_cmd = ethcmd; } + /* Allow some commands to be done by anyone */ - switch (sub_cmd) { + switch (*sub_cmd) { case ETHTOOL_GSET: case ETHTOOL_GDRVINFO: case ETHTOOL_GMSGLVL: @@ -2826,22 +2818,28 @@ __dev_ethtool(struct net *net, struct ifreq *ifr, void __user *useraddr, case ETHTOOL_GFECPARAM: break; default: - if (!ns_capable(net->user_ns, CAP_NET_ADMIN)) + if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN)) return -EPERM; } - if (dev->dev.parent) - pm_runtime_get_sync(dev->dev.parent); + return 0; +} - if (!netif_device_present(dev)) { - rc = -ENODEV; - goto out; - } +static int +__dev_ethtool_do(struct net_device *dev, struct ifreq *ifr, + void __user *useraddr, u32 ethcmd, u32 sub_cmd, + struct ethtool_devlink_compat *devlink_state) +{ + netdev_features_t old_features; + int rc; + + if (!netif_device_present(dev)) + return -ENODEV; if (dev->ethtool_ops->begin) { rc = dev->ethtool_ops->begin(dev); if (rc < 0) - goto out; + return rc; } old_features = dev->features; @@ -3052,7 +3050,7 @@ __dev_ethtool(struct net *net, struct ifreq *ifr, void __user *useraddr, rc = ethtool_set_fecparam(dev, useraddr); break; default: - rc = -EOPNOTSUPP; + return -EOPNOTSUPP; } if (dev->ethtool_ops->complete) @@ -3060,9 +3058,6 @@ __dev_ethtool(struct net *net, struct ifreq *ifr, void __user *useraddr, if (old_features != dev->features) netdev_features_change(dev); -out: - if (dev->dev.parent) - pm_runtime_put(dev->dev.parent); return rc; } @@ -3070,7 +3065,9 @@ __dev_ethtool(struct net *net, struct ifreq *ifr, void __user *useraddr, int dev_ethtool(struct net *net, struct ifreq *ifr, void __user *useraddr) { struct ethtool_devlink_compat *state; - u32 ethcmd; + struct net_device *dev = NULL; + netdevice_tracker dev_tracker; + u32 ethcmd, subcmd; int rc; if (copy_from_user(ðcmd, useraddr, sizeof(ethcmd))) @@ -3090,9 +3087,26 @@ int dev_ethtool(struct net *net, struct ifreq *ifr, void __user *useraddr) break; } + dev = netdev_get_by_name(net, ifr->ifr_name, &dev_tracker, GFP_KERNEL); + if (!dev) { + rc = -ENODEV; + goto exit_free; + } + + rc = __dev_ethtool_check(dev, useraddr, ethcmd, &subcmd); + if (rc) + goto exit_free; + + if (dev->dev.parent) + pm_runtime_get_sync(dev->dev.parent); + rtnl_lock(); - rc = __dev_ethtool(net, ifr, useraddr, ethcmd, state); + rc = __dev_ethtool_do(dev, ifr, useraddr, ethcmd, subcmd, state); rtnl_unlock(); + + if (dev->dev.parent) + pm_runtime_put(dev->dev.parent); + if (rc) goto exit_free; @@ -3115,6 +3129,7 @@ int dev_ethtool(struct net *net, struct ifreq *ifr, void __user *useraddr) } exit_free: + netdev_put(dev, &dev_tracker); if (state->devlink) devlink_put(state->devlink); kfree(state); diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c index fe3553f60bf3..67e2dd893330 100644 --- a/net/ethtool/netlink.c +++ b/net/ethtool/netlink.c @@ -34,39 +34,23 @@ int ethnl_ops_begin(struct net_device *dev) { int ret; - if (!dev) - return -ENODEV; - - if (dev->dev.parent) - pm_runtime_get_sync(dev->dev.parent); - if (!netif_device_present(dev) || - dev->reg_state == NETREG_UNREGISTERING) { - ret = -ENODEV; - goto err; - } + dev->reg_state == NETREG_UNREGISTERING) + return -ENODEV; if (dev->ethtool_ops->begin) { ret = dev->ethtool_ops->begin(dev); if (ret) - goto err; + return ret; } return 0; -err: - if (dev->dev.parent) - pm_runtime_put(dev->dev.parent); - - return ret; } void ethnl_ops_complete(struct net_device *dev) { if (dev->ethtool_ops->complete) dev->ethtool_ops->complete(dev); - - if (dev->dev.parent) - pm_runtime_put(dev->dev.parent); } /** @@ -602,6 +586,14 @@ static int ethnl_default_set_doit(struct sk_buff *skb, struct genl_info *info) goto out_dev; } + if (!req_info.dev) { + ret = -ENODEV; + goto out_dev; + } + + if (req_info.dev->dev.parent) + pm_runtime_get_sync(req_info.dev->dev.parent); + rtnl_lock(); ret = ethnl_ops_begin(req_info.dev); if (ret < 0) @@ -617,6 +609,8 @@ static int ethnl_default_set_doit(struct sk_buff *skb, struct genl_info *info) ethnl_ops_complete(req_info.dev); out_rtnl: rtnl_unlock(); + if (req_info.dev->dev.parent) + pm_runtime_put(req_info.dev->dev.parent); out_dev: ethnl_parse_header_dev_put(&req_info); return ret;