Message ID | 20220801233403.258871-1-f.fainelli@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 744d23c71af39c7dc77ac7c3cac87ae86a181a85 |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] net: phy: Warn about incorrect mdio_bus_phy_resume() state | expand |
Hello: This patch was applied to netdev/net.git (master) by Jakub Kicinski <kuba@kernel.org>: On Mon, 1 Aug 2022 16:34:03 -0700 you wrote: > Calling mdio_bus_phy_resume() with neither the PHY state machine set to > PHY_HALTED nor phydev->mac_managed_pm set to true is a good indication > that we can produce a race condition looking like this: > > CPU0 CPU1 > bcmgenet_resume > -> phy_resume > -> phy_init_hw > -> phy_start > -> phy_resume > phy_start_aneg() > mdio_bus_phy_resume > -> phy_resume > -> phy_write(..., BMCR_RESET) > -> usleep() -> phy_read() > > [...] Here is the summary with links: - [net] net: phy: Warn about incorrect mdio_bus_phy_resume() state https://git.kernel.org/netdev/net/c/744d23c71af3 You are awesome, thank you!
Hi All, On 02.08.2022 01:34, Florian Fainelli wrote: > Calling mdio_bus_phy_resume() with neither the PHY state machine set to > PHY_HALTED nor phydev->mac_managed_pm set to true is a good indication > that we can produce a race condition looking like this: > > CPU0 CPU1 > bcmgenet_resume > -> phy_resume > -> phy_init_hw > -> phy_start > -> phy_resume > phy_start_aneg() > mdio_bus_phy_resume > -> phy_resume > -> phy_write(..., BMCR_RESET) > -> usleep() -> phy_read() > > with the phy_resume() function triggering a PHY behavior that might have > to be worked around with (see bf8bfc4336f7 ("net: phy: broadcom: Fix > brcm_fet_config_init()") for instance) that ultimately leads to an error > reading from the PHY. > > Fixes: fba863b81604 ("net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM") > Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> This patch, as probably intended, triggers a warning during system suspend/resume cycle in the SMSC911x driver. I've observed it on ARM Juno R1 board on the kernel compiled from next-202208010: ------------[ cut here ]------------ WARNING: CPU: 1 PID: 398 at drivers/net/phy/phy_device.c:323 mdio_bus_phy_resume+0x34/0xc8 Modules linked in: smsc911x cpufreq_powersave cpufreq_conservative crct10dif_ce ip_tables x_tables ipv6 [last unloaded: smsc911x] CPU: 1 PID: 398 Comm: rtcwake Not tainted 5.19.0+ #940 Hardware name: ARM Juno development board (r1) (DT) pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : mdio_bus_phy_resume+0x34/0xc8 lr : dpm_run_callback+0x74/0x350 ... Call trace: mdio_bus_phy_resume+0x34/0xc8 dpm_run_callback+0x74/0x350 device_resume+0xb8/0x258 dpm_resume+0x120/0x4a8 dpm_resume_end+0x14/0x28 suspend_devices_and_enter+0x164/0xa60 pm_suspend+0x25c/0x3a8 state_store+0x84/0x108 kobj_attr_store+0x14/0x28 sysfs_kf_write+0x60/0x70 kernfs_fop_write_iter+0x124/0x1a8 new_sync_write+0xd0/0x190 vfs_write+0x208/0x478 ksys_write+0x64/0xf0 __arm64_sys_write+0x14/0x20 invoke_syscall+0x40/0xf8 el0_svc_common.constprop.3+0x8c/0x120 do_el0_svc+0x28/0xc8 el0_svc+0x48/0xd0 el0t_64_sync_handler+0x94/0xb8 el0t_64_sync+0x15c/0x160 irq event stamp: 24406 hardirqs last enabled at (24405): [<ffff8000090c4734>] _raw_spin_unlock_irqrestore+0x8c/0x90 hardirqs last disabled at (24406): [<ffff8000090b3164>] el1_dbg+0x24/0x88 softirqs last enabled at (24144): [<ffff800008010488>] _stext+0x488/0x5cc softirqs last disabled at (24139): [<ffff80000809bf98>] irq_exit_rcu+0x168/0x1a8 ---[ end trace 0000000000000000 ]--- I hope the above information will help fixing the driver. > --- > drivers/net/phy/phy_device.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c > index 46acddd865a7..608de5a94165 100644 > --- a/drivers/net/phy/phy_device.c > +++ b/drivers/net/phy/phy_device.c > @@ -316,6 +316,12 @@ static __maybe_unused int mdio_bus_phy_resume(struct device *dev) > > phydev->suspended_by_mdio_bus = 0; > > + /* If we managed to get here with the PHY state machine in a state other > + * than PHY_HALTED this is an indication that something went wrong and > + * we should most likely be using MAC managed PM and we are not. > + */ > + WARN_ON(phydev->state != PHY_HALTED && !phydev->mac_managed_pm); > + > ret = phy_init_hw(phydev); > if (ret < 0) > return ret; Best regards
On 8/12/22 04:19, Marek Szyprowski wrote: > Hi All, > > On 02.08.2022 01:34, Florian Fainelli wrote: >> Calling mdio_bus_phy_resume() with neither the PHY state machine set to >> PHY_HALTED nor phydev->mac_managed_pm set to true is a good indication >> that we can produce a race condition looking like this: >> >> CPU0 CPU1 >> bcmgenet_resume >> -> phy_resume >> -> phy_init_hw >> -> phy_start >> -> phy_resume >> phy_start_aneg() >> mdio_bus_phy_resume >> -> phy_resume >> -> phy_write(..., BMCR_RESET) >> -> usleep() -> phy_read() >> >> with the phy_resume() function triggering a PHY behavior that might have >> to be worked around with (see bf8bfc4336f7 ("net: phy: broadcom: Fix >> brcm_fet_config_init()") for instance) that ultimately leads to an error >> reading from the PHY. >> >> Fixes: fba863b81604 ("net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM") >> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> > > This patch, as probably intended, triggers a warning during system > suspend/resume cycle in the SMSC911x driver. I've observed it on ARM > Juno R1 board on the kernel compiled from next-202208010: > > ------------[ cut here ]------------ > WARNING: CPU: 1 PID: 398 at drivers/net/phy/phy_device.c:323 > mdio_bus_phy_resume+0x34/0xc8 > Modules linked in: smsc911x cpufreq_powersave cpufreq_conservative > crct10dif_ce ip_tables x_tables ipv6 [last unloaded: smsc911x] > CPU: 1 PID: 398 Comm: rtcwake Not tainted 5.19.0+ #940 > Hardware name: ARM Juno development board (r1) (DT) > pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) > pc : mdio_bus_phy_resume+0x34/0xc8 > lr : dpm_run_callback+0x74/0x350 > ... > Call trace: > mdio_bus_phy_resume+0x34/0xc8 > dpm_run_callback+0x74/0x350 > device_resume+0xb8/0x258 > dpm_resume+0x120/0x4a8 > dpm_resume_end+0x14/0x28 > suspend_devices_and_enter+0x164/0xa60 > pm_suspend+0x25c/0x3a8 > state_store+0x84/0x108 > kobj_attr_store+0x14/0x28 > sysfs_kf_write+0x60/0x70 > kernfs_fop_write_iter+0x124/0x1a8 > new_sync_write+0xd0/0x190 > vfs_write+0x208/0x478 > ksys_write+0x64/0xf0 > __arm64_sys_write+0x14/0x20 > invoke_syscall+0x40/0xf8 > el0_svc_common.constprop.3+0x8c/0x120 > do_el0_svc+0x28/0xc8 > el0_svc+0x48/0xd0 > el0t_64_sync_handler+0x94/0xb8 > el0t_64_sync+0x15c/0x160 > irq event stamp: 24406 > hardirqs last enabled at (24405): [<ffff8000090c4734>] > _raw_spin_unlock_irqrestore+0x8c/0x90 > hardirqs last disabled at (24406): [<ffff8000090b3164>] el1_dbg+0x24/0x88 > softirqs last enabled at (24144): [<ffff800008010488>] _stext+0x488/0x5cc > softirqs last disabled at (24139): [<ffff80000809bf98>] > irq_exit_rcu+0x168/0x1a8 > ---[ end trace 0000000000000000 ]--- > > I hope the above information will help fixing the driver. Yes this is catching an actual issue in the driver in that the PHY state machine is still running while the system is trying to suspend. We could go about fixing it in a different number of ways, though I believe this one is probably correct enough to work and fix the warning: diff --git a/drivers/net/ethernet/smsc/smsc911x.c b/drivers/net/ethernet/smsc/smsc911x.c index 3bf20211cceb..e9c0668a4dc0 100644 --- a/drivers/net/ethernet/smsc/smsc911x.c +++ b/drivers/net/ethernet/smsc/smsc911x.c @@ -1037,6 +1037,8 @@ static int smsc911x_mii_probe(struct net_device *dev) return ret; } + /* Indicate that the MAC is responsible for managing PHY PM */ + phydev->mac_managed_pm = true; phy_attached_info(phydev); phy_set_max_speed(phydev, SPEED_100); @@ -2587,6 +2589,8 @@ static int smsc911x_suspend(struct device *dev) if (netif_running(ndev)) { netif_stop_queue(ndev); netif_device_detach(ndev); + if (!device_may_wakeup(dev)) + phy_suspend(dev->phydev); } /* enable wake on LAN, energy detection and the external PME @@ -2628,6 +2632,8 @@ static int smsc911x_resume(struct device *dev) if (netif_running(ndev)) { netif_device_attach(ndev); netif_start_queue(ndev); + if (!device_may_wakeup(dev)) + phy_resume(dev->phydev); } return 0;
On 12.08.2022 18:32, Florian Fainelli wrote: > On 8/12/22 04:19, Marek Szyprowski wrote: >> Hi All, >> >> On 02.08.2022 01:34, Florian Fainelli wrote: >>> Calling mdio_bus_phy_resume() with neither the PHY state machine set to >>> PHY_HALTED nor phydev->mac_managed_pm set to true is a good indication >>> that we can produce a race condition looking like this: >>> >>> CPU0 CPU1 >>> bcmgenet_resume >>> -> phy_resume >>> -> phy_init_hw >>> -> phy_start >>> -> phy_resume >>> phy_start_aneg() >>> mdio_bus_phy_resume >>> -> phy_resume >>> -> phy_write(..., BMCR_RESET) >>> -> usleep() -> phy_read() >>> >>> with the phy_resume() function triggering a PHY behavior that might >>> have >>> to be worked around with (see bf8bfc4336f7 ("net: phy: broadcom: Fix >>> brcm_fet_config_init()") for instance) that ultimately leads to an >>> error >>> reading from the PHY. >>> >>> Fixes: fba863b81604 ("net: phy: make PHY PM ops a no-op if MAC >>> driver manages PHY PM") >>> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> >> >> This patch, as probably intended, triggers a warning during system >> suspend/resume cycle in the SMSC911x driver. I've observed it on ARM >> Juno R1 board on the kernel compiled from next-202208010: >> >> ------------[ cut here ]------------ >> WARNING: CPU: 1 PID: 398 at drivers/net/phy/phy_device.c:323 >> mdio_bus_phy_resume+0x34/0xc8 >> Modules linked in: smsc911x cpufreq_powersave cpufreq_conservative >> crct10dif_ce ip_tables x_tables ipv6 [last unloaded: smsc911x] >> CPU: 1 PID: 398 Comm: rtcwake Not tainted 5.19.0+ #940 >> Hardware name: ARM Juno development board (r1) (DT) >> pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) >> pc : mdio_bus_phy_resume+0x34/0xc8 >> lr : dpm_run_callback+0x74/0x350 >> ... >> Call trace: >> mdio_bus_phy_resume+0x34/0xc8 >> dpm_run_callback+0x74/0x350 >> device_resume+0xb8/0x258 >> dpm_resume+0x120/0x4a8 >> dpm_resume_end+0x14/0x28 >> suspend_devices_and_enter+0x164/0xa60 >> pm_suspend+0x25c/0x3a8 >> state_store+0x84/0x108 >> kobj_attr_store+0x14/0x28 >> sysfs_kf_write+0x60/0x70 >> kernfs_fop_write_iter+0x124/0x1a8 >> new_sync_write+0xd0/0x190 >> vfs_write+0x208/0x478 >> ksys_write+0x64/0xf0 >> __arm64_sys_write+0x14/0x20 >> invoke_syscall+0x40/0xf8 >> el0_svc_common.constprop.3+0x8c/0x120 >> do_el0_svc+0x28/0xc8 >> el0_svc+0x48/0xd0 >> el0t_64_sync_handler+0x94/0xb8 >> el0t_64_sync+0x15c/0x160 >> irq event stamp: 24406 >> hardirqs last enabled at (24405): [<ffff8000090c4734>] >> _raw_spin_unlock_irqrestore+0x8c/0x90 >> hardirqs last disabled at (24406): [<ffff8000090b3164>] >> el1_dbg+0x24/0x88 >> softirqs last enabled at (24144): [<ffff800008010488>] >> _stext+0x488/0x5cc >> softirqs last disabled at (24139): [<ffff80000809bf98>] >> irq_exit_rcu+0x168/0x1a8 >> ---[ end trace 0000000000000000 ]--- >> >> I hope the above information will help fixing the driver. > > Yes this is catching an actual issue in the driver in that the PHY > state machine is still running while the system is trying to suspend. > We could go about fixing it in a different number of ways, though I > believe this one is probably correct enough to work and fix the warning: Right, this fixes the warning (after fixing the minor compile issue, see below). Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> > > diff --git a/drivers/net/ethernet/smsc/smsc911x.c > b/drivers/net/ethernet/smsc/smsc911x.c > index 3bf20211cceb..e9c0668a4dc0 100644 > --- a/drivers/net/ethernet/smsc/smsc911x.c > +++ b/drivers/net/ethernet/smsc/smsc911x.c > @@ -1037,6 +1037,8 @@ static int smsc911x_mii_probe(struct net_device > *dev) > return ret; > } > > + /* Indicate that the MAC is responsible for managing PHY PM */ > + phydev->mac_managed_pm = true; > phy_attached_info(phydev); > > phy_set_max_speed(phydev, SPEED_100); > @@ -2587,6 +2589,8 @@ static int smsc911x_suspend(struct device *dev) > if (netif_running(ndev)) { > netif_stop_queue(ndev); > netif_device_detach(ndev); > + if (!device_may_wakeup(dev)) > + phy_suspend(dev->phydev); phy_suspend(ndev->phydev); > } > > /* enable wake on LAN, energy detection and the external PME > @@ -2628,6 +2632,8 @@ static int smsc911x_resume(struct device *dev) > if (netif_running(ndev)) { > netif_device_attach(ndev); > netif_start_queue(ndev); > + if (!device_may_wakeup(dev)) > + phy_resume(dev->phydev); phy_resume(ndev->phydev); > } > > return 0; > Best regards
Hi Florian, On Fri, Aug 12, 2022 at 6:39 PM Florian Fainelli <f.fainelli@gmail.com> wrote: > On 8/12/22 04:19, Marek Szyprowski wrote: > > On 02.08.2022 01:34, Florian Fainelli wrote: > >> Calling mdio_bus_phy_resume() with neither the PHY state machine set to > >> PHY_HALTED nor phydev->mac_managed_pm set to true is a good indication > >> that we can produce a race condition looking like this: > >> > >> CPU0 CPU1 > >> bcmgenet_resume > >> -> phy_resume > >> -> phy_init_hw > >> -> phy_start > >> -> phy_resume > >> phy_start_aneg() > >> mdio_bus_phy_resume > >> -> phy_resume > >> -> phy_write(..., BMCR_RESET) > >> -> usleep() -> phy_read() > >> > >> with the phy_resume() function triggering a PHY behavior that might have > >> to be worked around with (see bf8bfc4336f7 ("net: phy: broadcom: Fix > >> brcm_fet_config_init()") for instance) that ultimately leads to an error > >> reading from the PHY. > >> > >> Fixes: fba863b81604 ("net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM") > >> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> > > > > This patch, as probably intended, triggers a warning during system > > suspend/resume cycle in the SMSC911x driver. I've observed it on ARM > > Juno R1 board on the kernel compiled from next-202208010: > > > > ------------[ cut here ]------------ > > WARNING: CPU: 1 PID: 398 at drivers/net/phy/phy_device.c:323 > > mdio_bus_phy_resume+0x34/0xc8 I am seeing the same on the ape6evm and kzm9g development boards with smsc911x Ethernet, and on various boards with Renesas Ethernet (sh_eth or ravb) if Wake-on-LAN is disabled. > Yes this is catching an actual issue in the driver in that the PHY state > machine is still running while the system is trying to suspend. We could > go about fixing it in a different number of ways, though I believe this > one is probably correct enough to work and fix the warning: > --- a/drivers/net/ethernet/smsc/smsc911x.c > +++ b/drivers/net/ethernet/smsc/smsc911x.c > @@ -1037,6 +1037,8 @@ static int smsc911x_mii_probe(struct net_device *dev) > return ret; > } > > + /* Indicate that the MAC is responsible for managing PHY PM */ > + phydev->mac_managed_pm = true; > phy_attached_info(phydev); > > phy_set_max_speed(phydev, SPEED_100); > @@ -2587,6 +2589,8 @@ static int smsc911x_suspend(struct device *dev) > if (netif_running(ndev)) { > netif_stop_queue(ndev); > netif_device_detach(ndev); > + if (!device_may_wakeup(dev)) > + phy_suspend(dev->phydev); > } > > /* enable wake on LAN, energy detection and the external PME > @@ -2628,6 +2632,8 @@ static int smsc911x_resume(struct device *dev) > if (netif_running(ndev)) { > netif_device_attach(ndev); > netif_start_queue(ndev); > + if (!device_may_wakeup(dev)) > + phy_resume(dev->phydev); > } > > return 0; Thanks for your patch, but unfortunately this does not work on ape6evm and kzm9g, where the smsc911x device is connected to a power-managed bus. It looks like the PHY registers are accessed while the device is already suspended, causing a crash during system suspend: 8<--- cut here --- Unhandled fault: imprecise external abort (0x1406) at 0x00000000 [00000000] *pgd=00000000 Internal error: : 1406 [#1] SMP ARM CPU: 2 PID: 75 Comm: kworker/2:2 Not tainted 6.0.0-rc1-ape6evm-00977-gdc70725fbca5-dirty #375 Hardware name: Generic R8A73A4 (Flattened Device Tree) Workqueue: events_power_efficient phy_state_machine PC is at smsc911x_reg_read+0x30/0x48 LR is at smsc911x_reg_read+0x30/0x48 pc : [<c03807cc>] lr : [<c03807cc>] psr: 20030093 sp : f0891e30 ip : 00000000 fp : eff98605 r10: c092af80 r9 : c202e6e8 r8 : 20030013 r7 : c202e70c r6 : 000000a4 r5 : 20030093 r4 : c202e6c0 r3 : f0a31000 r2 : 00000002 r1 : f0a310a4 r0 : 00000000 Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment none Control: 10c5387d Table: 4552006a DAC: 00000051 Register r0 information: NULL pointer Register r1 information: 0-page vmalloc region starting at 0xf0a31000 allocated at smsc911x_drv_probe+0x108/0x934 Register r2 information: non-paged memory Register r3 information: 0-page vmalloc region starting at 0xf0a31000 allocated at smsc911x_drv_probe+0x108/0x934 Register r4 information: slab kmalloc-4k start c202e000 pointer offset 1728 size 4096 Register r5 information: non-paged memory Register r6 information: non-paged memory Register r7 information: slab kmalloc-4k start c202e000 pointer offset 1804 size 4096 Register r8 information: non-paged memory Register r9 information: slab kmalloc-4k start c202e000 pointer offset 1768 size 4096 Register r10 information: non-slab/vmalloc memory Register r11 information: non-slab/vmalloc memory Register r12 information: NULL pointer Process kworker/2:2 (pid: 75, stack limit = 0x5239c21f) Stack: (0xf0891e30 to 0xf0892000) 1e20: c202e6c0 00000006 c19da000 c202e6c0 1e40: 20030013 c03815bc 00000000 00000001 c19da000 c0381820 00000001 c19da000 1e60: c19da000 00000000 c39eacc0 00000000 c092af80 c037e694 c19da000 00000001 1e80: 00000000 c19da758 c19da000 00000001 00000000 c037e888 c29a0000 c29a03f4 1ea0: 00000078 c29a0448 c39eacc0 c037c878 c29a0000 c037cab4 c29a0000 c29a03f4 1ec0: c29a0000 c037fbc8 c29a0000 c29a03f4 c29a0000 c29a0448 00000004 c0377540 1ee0: c3ac8f00 c29a03f4 c29a0000 c0378588 009a03f4 c07cce88 c3ac8f00 c29a03f4 1f00: eff96780 00000000 eff98600 00000080 c092af80 c00426f0 00000001 00000000 1f20: c00425c4 c07cce88 c07bb574 00000000 c13fa5b8 c0da3f6c 00000000 c071c1da 1f40: 00000000 c07cce88 00000000 c3ac8f00 c3ac8f18 eff96780 c092a665 c07c9d00 1f60: eff967bc c0934e20 00000000 c0042b30 c2b8a500 c2ba65c0 c3ac8880 f0901ebc 1f80: c00428f0 c3ac8f00 00000000 c00494c4 c2ba65c0 c00493f4 00000000 00000000 1fa0: 00000000 00000000 00000000 c0009108 00000000 00000000 00000000 00000000 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000 smsc911x_reg_read from smsc911x_mac_read+0x4c/0xa0 smsc911x_mac_read from smsc911x_mii_read+0x38/0xb4 smsc911x_mii_read from __mdiobus_read+0x70/0xc4 __mdiobus_read from mdiobus_read+0x34/0x48 mdiobus_read from genphy_update_link+0x10/0xc8 genphy_update_link from genphy_read_status+0x10/0xc4 genphy_read_status from lan87xx_read_status+0x10/0x11c lan87xx_read_status from phy_check_link_status+0x5c/0xbc phy_check_link_status from phy_state_machine+0x78/0x218 phy_state_machine from process_one_work+0x2f0/0x4c4 process_one_work from worker_thread+0x240/0x2d0 worker_thread from kthread+0xd0/0xe0 kthread from ret_from_fork+0x14/0x2c Exception stack(0xf0891fb0 to 0xf0891ff8) 1fa0: 00000000 00000000 00000000 00000000 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 Code: e5933000 e1a05000 e1a00004 e12fff33 (e1a01005) ---[ end trace 0000000000000000 ]--- Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
On 8/16/2022 6:20 AM, Geert Uytterhoeven wrote: > Hi Florian, > > On Fri, Aug 12, 2022 at 6:39 PM Florian Fainelli <f.fainelli@gmail.com> wrote: >> On 8/12/22 04:19, Marek Szyprowski wrote: >>> On 02.08.2022 01:34, Florian Fainelli wrote: >>>> Calling mdio_bus_phy_resume() with neither the PHY state machine set to >>>> PHY_HALTED nor phydev->mac_managed_pm set to true is a good indication >>>> that we can produce a race condition looking like this: >>>> >>>> CPU0 CPU1 >>>> bcmgenet_resume >>>> -> phy_resume >>>> -> phy_init_hw >>>> -> phy_start >>>> -> phy_resume >>>> phy_start_aneg() >>>> mdio_bus_phy_resume >>>> -> phy_resume >>>> -> phy_write(..., BMCR_RESET) >>>> -> usleep() -> phy_read() >>>> >>>> with the phy_resume() function triggering a PHY behavior that might have >>>> to be worked around with (see bf8bfc4336f7 ("net: phy: broadcom: Fix >>>> brcm_fet_config_init()") for instance) that ultimately leads to an error >>>> reading from the PHY. >>>> >>>> Fixes: fba863b81604 ("net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM") >>>> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> >>> >>> This patch, as probably intended, triggers a warning during system >>> suspend/resume cycle in the SMSC911x driver. I've observed it on ARM >>> Juno R1 board on the kernel compiled from next-202208010: >>> >>> ------------[ cut here ]------------ >>> WARNING: CPU: 1 PID: 398 at drivers/net/phy/phy_device.c:323 >>> mdio_bus_phy_resume+0x34/0xc8 > > I am seeing the same on the ape6evm and kzm9g development > boards with smsc911x Ethernet, and on various boards with Renesas > Ethernet (sh_eth or ravb) if Wake-on-LAN is disabled. > >> Yes this is catching an actual issue in the driver in that the PHY state >> machine is still running while the system is trying to suspend. We could >> go about fixing it in a different number of ways, though I believe this >> one is probably correct enough to work and fix the warning: > >> --- a/drivers/net/ethernet/smsc/smsc911x.c >> +++ b/drivers/net/ethernet/smsc/smsc911x.c >> @@ -1037,6 +1037,8 @@ static int smsc911x_mii_probe(struct net_device *dev) >> return ret; >> } >> >> + /* Indicate that the MAC is responsible for managing PHY PM */ >> + phydev->mac_managed_pm = true; >> phy_attached_info(phydev); >> >> phy_set_max_speed(phydev, SPEED_100); >> @@ -2587,6 +2589,8 @@ static int smsc911x_suspend(struct device *dev) >> if (netif_running(ndev)) { >> netif_stop_queue(ndev); >> netif_device_detach(ndev); >> + if (!device_may_wakeup(dev)) >> + phy_suspend(dev->phydev); >> } >> >> /* enable wake on LAN, energy detection and the external PME >> @@ -2628,6 +2632,8 @@ static int smsc911x_resume(struct device *dev) >> if (netif_running(ndev)) { >> netif_device_attach(ndev); >> netif_start_queue(ndev); >> + if (!device_may_wakeup(dev)) >> + phy_resume(dev->phydev); >> } >> >> return 0; > > Thanks for your patch, but unfortunately this does not work on ape6evm > and kzm9g, where the smsc911x device is connected to a power-managed > bus. It looks like the PHY registers are accessed while the device > is already suspended, causing a crash during system suspend: Does it work better if you replace phy_suspend() with phy_stop() and phy_resume() with phy_start()?
Hi Florian, On Wed, Aug 17, 2022 at 4:28 AM Florian Fainelli <f.fainelli@gmail.com> wrote: > On 8/16/2022 6:20 AM, Geert Uytterhoeven wrote: > > On Fri, Aug 12, 2022 at 6:39 PM Florian Fainelli <f.fainelli@gmail.com> wrote: > >> On 8/12/22 04:19, Marek Szyprowski wrote: > >>> On 02.08.2022 01:34, Florian Fainelli wrote: > >>>> Calling mdio_bus_phy_resume() with neither the PHY state machine set to > >>>> PHY_HALTED nor phydev->mac_managed_pm set to true is a good indication > >>>> that we can produce a race condition looking like this: > >>>> > >>>> CPU0 CPU1 > >>>> bcmgenet_resume > >>>> -> phy_resume > >>>> -> phy_init_hw > >>>> -> phy_start > >>>> -> phy_resume > >>>> phy_start_aneg() > >>>> mdio_bus_phy_resume > >>>> -> phy_resume > >>>> -> phy_write(..., BMCR_RESET) > >>>> -> usleep() -> phy_read() > >>>> > >>>> with the phy_resume() function triggering a PHY behavior that might have > >>>> to be worked around with (see bf8bfc4336f7 ("net: phy: broadcom: Fix > >>>> brcm_fet_config_init()") for instance) that ultimately leads to an error > >>>> reading from the PHY. > >>>> > >>>> Fixes: fba863b81604 ("net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM") > >>>> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> > >>> > >>> This patch, as probably intended, triggers a warning during system > >>> suspend/resume cycle in the SMSC911x driver. I've observed it on ARM > >>> Juno R1 board on the kernel compiled from next-202208010: > >>> > >>> ------------[ cut here ]------------ > >>> WARNING: CPU: 1 PID: 398 at drivers/net/phy/phy_device.c:323 > >>> mdio_bus_phy_resume+0x34/0xc8 > > > > I am seeing the same on the ape6evm and kzm9g development > > boards with smsc911x Ethernet, and on various boards with Renesas > > Ethernet (sh_eth or ravb) if Wake-on-LAN is disabled. > > > >> Yes this is catching an actual issue in the driver in that the PHY state > >> machine is still running while the system is trying to suspend. We could > >> go about fixing it in a different number of ways, though I believe this > >> one is probably correct enough to work and fix the warning: > > > >> --- a/drivers/net/ethernet/smsc/smsc911x.c > >> +++ b/drivers/net/ethernet/smsc/smsc911x.c > >> @@ -1037,6 +1037,8 @@ static int smsc911x_mii_probe(struct net_device *dev) > >> return ret; > >> } > >> > >> + /* Indicate that the MAC is responsible for managing PHY PM */ > >> + phydev->mac_managed_pm = true; > >> phy_attached_info(phydev); > >> > >> phy_set_max_speed(phydev, SPEED_100); > >> @@ -2587,6 +2589,8 @@ static int smsc911x_suspend(struct device *dev) > >> if (netif_running(ndev)) { > >> netif_stop_queue(ndev); > >> netif_device_detach(ndev); > >> + if (!device_may_wakeup(dev)) > >> + phy_suspend(dev->phydev); > >> } > >> > >> /* enable wake on LAN, energy detection and the external PME > >> @@ -2628,6 +2632,8 @@ static int smsc911x_resume(struct device *dev) > >> if (netif_running(ndev)) { > >> netif_device_attach(ndev); > >> netif_start_queue(ndev); > >> + if (!device_may_wakeup(dev)) > >> + phy_resume(dev->phydev); > >> } > >> > >> return 0; > > > > Thanks for your patch, but unfortunately this does not work on ape6evm > > and kzm9g, where the smsc911x device is connected to a power-managed > > bus. It looks like the PHY registers are accessed while the device > > is already suspended, causing a crash during system suspend: > > Does it work better if you replace phy_suspend() with phy_stop() and > phy_resume() with phy_start()? Thank you, much better! Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
On 17.08.2022 11:18, Geert Uytterhoeven wrote: > On Wed, Aug 17, 2022 at 4:28 AM Florian Fainelli <f.fainelli@gmail.com> wrote: >> On 8/16/2022 6:20 AM, Geert Uytterhoeven wrote: >>> On Fri, Aug 12, 2022 at 6:39 PM Florian Fainelli <f.fainelli@gmail.com> wrote: >>>> On 8/12/22 04:19, Marek Szyprowski wrote: >>>>> On 02.08.2022 01:34, Florian Fainelli wrote: >>>>>> Calling mdio_bus_phy_resume() with neither the PHY state machine set to >>>>>> PHY_HALTED nor phydev->mac_managed_pm set to true is a good indication >>>>>> that we can produce a race condition looking like this: >>>>>> >>>>>> CPU0 CPU1 >>>>>> bcmgenet_resume >>>>>> -> phy_resume >>>>>> -> phy_init_hw >>>>>> -> phy_start >>>>>> -> phy_resume >>>>>> phy_start_aneg() >>>>>> mdio_bus_phy_resume >>>>>> -> phy_resume >>>>>> -> phy_write(..., BMCR_RESET) >>>>>> -> usleep() -> phy_read() >>>>>> >>>>>> with the phy_resume() function triggering a PHY behavior that might have >>>>>> to be worked around with (see bf8bfc4336f7 ("net: phy: broadcom: Fix >>>>>> brcm_fet_config_init()") for instance) that ultimately leads to an error >>>>>> reading from the PHY. >>>>>> >>>>>> Fixes: fba863b81604 ("net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM") >>>>>> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> >>>>> This patch, as probably intended, triggers a warning during system >>>>> suspend/resume cycle in the SMSC911x driver. I've observed it on ARM >>>>> Juno R1 board on the kernel compiled from next-202208010: >>>>> >>>>> ------------[ cut here ]------------ >>>>> WARNING: CPU: 1 PID: 398 at drivers/net/phy/phy_device.c:323 >>>>> mdio_bus_phy_resume+0x34/0xc8 >>> I am seeing the same on the ape6evm and kzm9g development >>> boards with smsc911x Ethernet, and on various boards with Renesas >>> Ethernet (sh_eth or ravb) if Wake-on-LAN is disabled. >>> >>>> Yes this is catching an actual issue in the driver in that the PHY state >>>> machine is still running while the system is trying to suspend. We could >>>> go about fixing it in a different number of ways, though I believe this >>>> one is probably correct enough to work and fix the warning: >>>> --- a/drivers/net/ethernet/smsc/smsc911x.c >>>> +++ b/drivers/net/ethernet/smsc/smsc911x.c >>>> @@ -1037,6 +1037,8 @@ static int smsc911x_mii_probe(struct net_device *dev) >>>> return ret; >>>> } >>>> >>>> + /* Indicate that the MAC is responsible for managing PHY PM */ >>>> + phydev->mac_managed_pm = true; >>>> phy_attached_info(phydev); >>>> >>>> phy_set_max_speed(phydev, SPEED_100); >>>> @@ -2587,6 +2589,8 @@ static int smsc911x_suspend(struct device *dev) >>>> if (netif_running(ndev)) { >>>> netif_stop_queue(ndev); >>>> netif_device_detach(ndev); >>>> + if (!device_may_wakeup(dev)) >>>> + phy_suspend(dev->phydev); >>>> } >>>> >>>> /* enable wake on LAN, energy detection and the external PME >>>> @@ -2628,6 +2632,8 @@ static int smsc911x_resume(struct device *dev) >>>> if (netif_running(ndev)) { >>>> netif_device_attach(ndev); >>>> netif_start_queue(ndev); >>>> + if (!device_may_wakeup(dev)) >>>> + phy_resume(dev->phydev); >>>> } >>>> >>>> return 0; >>> Thanks for your patch, but unfortunately this does not work on ape6evm >>> and kzm9g, where the smsc911x device is connected to a power-managed >>> bus. It looks like the PHY registers are accessed while the device >>> is already suspended, causing a crash during system suspend: >> Does it work better if you replace phy_suspend() with phy_stop() and >> phy_resume() with phy_start()? > Thank you, much better! > Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> It also works for me. Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Best regards
On Tue, Aug 16, 2022 at 3:20 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > On Fri, Aug 12, 2022 at 6:39 PM Florian Fainelli <f.fainelli@gmail.com> wrote: > > On 8/12/22 04:19, Marek Szyprowski wrote: > > > On 02.08.2022 01:34, Florian Fainelli wrote: > > >> Calling mdio_bus_phy_resume() with neither the PHY state machine set to > > >> PHY_HALTED nor phydev->mac_managed_pm set to true is a good indication > > >> that we can produce a race condition looking like this: > > >> > > >> CPU0 CPU1 > > >> bcmgenet_resume > > >> -> phy_resume > > >> -> phy_init_hw > > >> -> phy_start > > >> -> phy_resume > > >> phy_start_aneg() > > >> mdio_bus_phy_resume > > >> -> phy_resume > > >> -> phy_write(..., BMCR_RESET) > > >> -> usleep() -> phy_read() > > >> > > >> with the phy_resume() function triggering a PHY behavior that might have > > >> to be worked around with (see bf8bfc4336f7 ("net: phy: broadcom: Fix > > >> brcm_fet_config_init()") for instance) that ultimately leads to an error > > >> reading from the PHY. > > >> > > >> Fixes: fba863b81604 ("net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM") > > >> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> > > > > > > This patch, as probably intended, triggers a warning during system > > > suspend/resume cycle in the SMSC911x driver. I've observed it on ARM > > > Juno R1 board on the kernel compiled from next-202208010: > > > > > > ------------[ cut here ]------------ > > > WARNING: CPU: 1 PID: 398 at drivers/net/phy/phy_device.c:323 > > > mdio_bus_phy_resume+0x34/0xc8 > > I am seeing the same on the ape6evm and kzm9g development > boards with smsc911x Ethernet, and on various boards with Renesas So the smsc911x issue was fixed by commit 3ce9f2bef7552893 ("net: smsc911x: Stop and start PHY during suspend and resume"). > Ethernet (sh_eth or ravb) if Wake-on-LAN is disabled. The issue is still seen with sh_eth and ravb. I have sent to fixes: https://lore.kernel.org/r/c6e1331b9bef61225fa4c09db3ba3e2e7214ba2d.1663598886.git.geert+renesas@glider.be https://lore.kernel.org/r/c6e1331b9bef61225fa4c09db3ba3e2e7214ba2d.1663598886.git.geert+renesas@glider.be Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
On Mon, 1 Aug 2022 16:34:03 -0700 Florian Fainelli wrote: > Calling mdio_bus_phy_resume() with neither the PHY state machine set to > PHY_HALTED nor phydev->mac_managed_pm set to true is a good indication > that we can produce a race condition looking like this: > > CPU0 CPU1 > bcmgenet_resume > -> phy_resume > -> phy_init_hw > -> phy_start > -> phy_resume > phy_start_aneg() > mdio_bus_phy_resume > -> phy_resume > -> phy_write(..., BMCR_RESET) > -> usleep() -> phy_read() > > with the phy_resume() function triggering a PHY behavior that might have > to be worked around with (see bf8bfc4336f7 ("net: phy: broadcom: Fix > brcm_fet_config_init()") for instance) that ultimately leads to an error > reading from the PHY. Hi Florian! There were some follow ups on this one, were all the known reports covered at this point or there are still platforms to tweak?
On 9/22/2022 5:31 AM, Jakub Kicinski wrote: > On Mon, 1 Aug 2022 16:34:03 -0700 Florian Fainelli wrote: >> Calling mdio_bus_phy_resume() with neither the PHY state machine set to >> PHY_HALTED nor phydev->mac_managed_pm set to true is a good indication >> that we can produce a race condition looking like this: >> >> CPU0 CPU1 >> bcmgenet_resume >> -> phy_resume >> -> phy_init_hw >> -> phy_start >> -> phy_resume >> phy_start_aneg() >> mdio_bus_phy_resume >> -> phy_resume >> -> phy_write(..., BMCR_RESET) >> -> usleep() -> phy_read() >> >> with the phy_resume() function triggering a PHY behavior that might have >> to be worked around with (see bf8bfc4336f7 ("net: phy: broadcom: Fix >> brcm_fet_config_init()") for instance) that ultimately leads to an error >> reading from the PHY. > > Hi Florian! There were some follow ups on this one, were all the known > reports covered at this point or there are still platforms to tweak? This is the only active thread that I am aware of, and Lukas and I are in agreement as to what to do next: https://lore.kernel.org/all/20220918191333.GA2107@wunner.de/ expect him to submit a follow on the warning condition to deal with the smsc95xx USB driver. Thanks!
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index 46acddd865a7..608de5a94165 100644 --- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c @@ -316,6 +316,12 @@ static __maybe_unused int mdio_bus_phy_resume(struct device *dev) phydev->suspended_by_mdio_bus = 0; + /* If we managed to get here with the PHY state machine in a state other + * than PHY_HALTED this is an indication that something went wrong and + * we should most likely be using MAC managed PM and we are not. + */ + WARN_ON(phydev->state != PHY_HALTED && !phydev->mac_managed_pm); + ret = phy_init_hw(phydev); if (ret < 0) return ret;
Calling mdio_bus_phy_resume() with neither the PHY state machine set to PHY_HALTED nor phydev->mac_managed_pm set to true is a good indication that we can produce a race condition looking like this: CPU0 CPU1 bcmgenet_resume -> phy_resume -> phy_init_hw -> phy_start -> phy_resume phy_start_aneg() mdio_bus_phy_resume -> phy_resume -> phy_write(..., BMCR_RESET) -> usleep() -> phy_read() with the phy_resume() function triggering a PHY behavior that might have to be worked around with (see bf8bfc4336f7 ("net: phy: broadcom: Fix brcm_fet_config_init()") for instance) that ultimately leads to an error reading from the PHY. Fixes: fba863b81604 ("net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> --- drivers/net/phy/phy_device.c | 6 ++++++ 1 file changed, 6 insertions(+)