diff mbox series

[net-next,v3,5/7] usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling

Message ID 748ac44eeb97b209f66182f3788d2a49d7bc28fe.1652343655.git.lukas@wunner.de (mailing list archive)
State Accepted
Commit 1ce8b37241ed291af56f7a49bbdbf20c08728e88
Headers show
Series Polling be gone on LAN95xx | expand

Commit Message

Lukas Wunner May 12, 2022, 8:42 a.m. UTC
Link status of SMSC LAN95xx chips is polled once per second, even though
they're capable of signaling PHY interrupts through the MAC layer.

Forward those interrupts to the PHY driver to avoid polling.  Benefits
are reduced bus traffic, reduced CPU overhead and quicker interface
bringup.

Polling was introduced in 2016 by commit d69d16949346 ("usbnet:
smsc95xx: fix link detection for disabled autonegotiation").
Back then, the LAN95xx driver neglected to enable the ENERGYON interrupt,
hence couldn't detect link-up events when auto-negotiation was disabled.
The proper solution would have been to enable the ENERGYON interrupt
instead of polling.

Since then, PHY handling was moved from the LAN95xx driver to the SMSC
PHY driver with commit 05b35e7eb9a1 ("smsc95xx: add phylib support").
That PHY driver is capable of link detection with auto-negotiation
disabled because it enables the ENERGYON interrupt.

Note that signaling interrupts through the MAC layer not only works with
the integrated PHY, but also with an external PHY, provided its
interrupt pin is attached to LAN95xx's nPHY_INT pin.

In the unlikely event that the interrupt pin of an external PHY is
attached to a GPIO of the SoC (or not connected at all), the driver can
be amended to retrieve the irq from the PHY's of_node.

To forward PHY interrupts to phylib, it is not sufficient to call
phy_mac_interrupt().  Instead, the PHY's interrupt handler needs to run
so that PHY interrupts are cleared.  That's because according to page
119 of the LAN950x datasheet, "The source of this interrupt is a level.
The interrupt persists until it is cleared in the PHY."

https://www.microchip.com/content/dam/mchp/documents/UNG/ProductDocuments/DataSheets/LAN950x-Data-Sheet-DS00001875D.pdf

Therefore, create an IRQ domain with a single IRQ for the PHY.  In the
future, the IRQ domain may be extended to support the 11 GPIOs on the
LAN95xx.

Normally the PHY interrupt should be masked until the PHY driver has
cleared it.  However masking requires a (sleeping) USB transaction and
interrupts are received in (non-sleepable) softirq context.  I decided
not to mask the interrupt at all (by using the dummy_irq_chip's noop
->irq_mask() callback):  The USB interrupt endpoint is polled in 1 msec
intervals and normally that's sufficient to wake the PHY driver's IRQ
thread and have it clear the interrupt.  If it does take longer, worst
thing that can happen is the IRQ thread is woken again.  No big deal.

Because PHY interrupts are now perpetually enabled, there's no need to
selectively enable them on suspend.  So remove all invocations of
smsc95xx_enable_phy_wakeup_interrupts().

In smsc95xx_resume(), move the call of phy_init_hw() before
usbnet_resume() (which restarts the status URB) to ensure that the PHY
is fully initialized when an interrupt is handled.

Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> # LAN9514/9512/9500
Tested-by: Ferry Toth <fntoth@gmail.com> # LAN9514
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch> # from a PHY perspective
Cc: Andre Edich <andre.edich@microchip.com>
---
Only change since v2:
 * Drop call to __irq_enter_raw() which worked around a warning in
   generic_handle_domain_irq().  That warning is gone since
   792ea6a074ae (queued on tip.git/irq/urgent).
   (Marc Zyngier, Thomas Gleixner)

 drivers/net/usb/smsc95xx.c | 113 ++++++++++++++++++++-----------------
 1 file changed, 61 insertions(+), 52 deletions(-)

Comments

Marek Szyprowski May 17, 2022, 10:18 a.m. UTC | #1
Hi Lukas,

On 12.05.2022 10:42, Lukas Wunner wrote:
> Link status of SMSC LAN95xx chips is polled once per second, even though
> they're capable of signaling PHY interrupts through the MAC layer.
>
> Forward those interrupts to the PHY driver to avoid polling.  Benefits
> are reduced bus traffic, reduced CPU overhead and quicker interface
> bringup.
>
> Polling was introduced in 2016 by commit d69d16949346 ("usbnet:
> smsc95xx: fix link detection for disabled autonegotiation").
> Back then, the LAN95xx driver neglected to enable the ENERGYON interrupt,
> hence couldn't detect link-up events when auto-negotiation was disabled.
> The proper solution would have been to enable the ENERGYON interrupt
> instead of polling.
>
> Since then, PHY handling was moved from the LAN95xx driver to the SMSC
> PHY driver with commit 05b35e7eb9a1 ("smsc95xx: add phylib support").
> That PHY driver is capable of link detection with auto-negotiation
> disabled because it enables the ENERGYON interrupt.
>
> Note that signaling interrupts through the MAC layer not only works with
> the integrated PHY, but also with an external PHY, provided its
> interrupt pin is attached to LAN95xx's nPHY_INT pin.
>
> In the unlikely event that the interrupt pin of an external PHY is
> attached to a GPIO of the SoC (or not connected at all), the driver can
> be amended to retrieve the irq from the PHY's of_node.
>
> To forward PHY interrupts to phylib, it is not sufficient to call
> phy_mac_interrupt().  Instead, the PHY's interrupt handler needs to run
> so that PHY interrupts are cleared.  That's because according to page
> 119 of the LAN950x datasheet, "The source of this interrupt is a level.
> The interrupt persists until it is cleared in the PHY."
>
> https://www.microchip.com/content/dam/mchp/documents/UNG/ProductDocuments/DataSheets/LAN950x-Data-Sheet-DS00001875D.pdf
>
> Therefore, create an IRQ domain with a single IRQ for the PHY.  In the
> future, the IRQ domain may be extended to support the 11 GPIOs on the
> LAN95xx.
>
> Normally the PHY interrupt should be masked until the PHY driver has
> cleared it.  However masking requires a (sleeping) USB transaction and
> interrupts are received in (non-sleepable) softirq context.  I decided
> not to mask the interrupt at all (by using the dummy_irq_chip's noop
> ->irq_mask() callback):  The USB interrupt endpoint is polled in 1 msec
> intervals and normally that's sufficient to wake the PHY driver's IRQ
> thread and have it clear the interrupt.  If it does take longer, worst
> thing that can happen is the IRQ thread is woken again.  No big deal.
>
> Because PHY interrupts are now perpetually enabled, there's no need to
> selectively enable them on suspend.  So remove all invocations of
> smsc95xx_enable_phy_wakeup_interrupts().
>
> In smsc95xx_resume(), move the call of phy_init_hw() before
> usbnet_resume() (which restarts the status URB) to ensure that the PHY
> is fully initialized when an interrupt is handled.
>
> Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> # LAN9514/9512/9500
> Tested-by: Ferry Toth <fntoth@gmail.com> # LAN9514
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Reviewed-by: Andrew Lunn <andrew@lunn.ch> # from a PHY perspective
> Cc: Andre Edich <andre.edich@microchip.com>

This patch landed in the recent linux next-20220516 as commit 
1ce8b37241ed ("usbnet: smsc95xx: Forward PHY interrupts to PHY driver to 
avoid polling"). Unfortunately it breaks smsc95xx usb ethernet operation 
after system suspend-resume cycle. On the Odroid XU3 board I got the 
following warning in the kernel log:

# time rtcwake -s10 -mmem
rtcwake: wakeup from "mem" using /dev/rtc0 at Tue May 17 09:16:07 2022
PM: suspend entry (deep)
Filesystems sync: 0.001 seconds
Freezing user space processes ... (elapsed 0.002 seconds) done.
OOM killer disabled.
Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
printk: Suspending console(s) (use no_console_suspend to debug)
smsc95xx 4-1.1:1.0 eth0: entering SUSPEND2 mode
smsc95xx 4-1.1:1.0 eth0: Failed to read reg index 0x00000114: -113
smsc95xx 4-1.1:1.0 eth0: Error reading MII_ACCESS
smsc95xx 4-1.1:1.0 eth0: __smsc95xx_mdio_read: MII is busy
------------[ cut here ]------------
WARNING: CPU: 2 PID: 73 at drivers/net/phy/phy.c:946 
phy_state_machine+0x98/0x28c
Modules linked in: snd_soc_hdmi_codec snd_soc_odroid governor_passive 
snd_soc_i2s exynos_bus snd_soc_idma snd_soc_s3c_dma exynosdrm 
analogix_dp snd_soc_max98090 snd_soc_core ac97_bus snd_pcm_dmaengine 
snd_pcm clk_s2mps11 rtc_s5m snd_timer snd soundcore ina2xx exynos_gsc 
pwm_samsung exynos_adc s5p_jpeg ohci_exynosv4l2_mem2mem phy_exynos_usb2 
panfrost ehci_exynos s5p_mfc drm_shmem_helper videobuf2_dma_contig 
videobuf2_memops videobuf2_v4l2 videobuf2_common gpu_sched videodev mc 
exynos_ppmu exynos5422_dmc exynos_nocp s5p_sss rtc_s3c exynos_rng 
s3c2410_wdt s5p_cec pwm_fan
CPU: 2 PID: 73 Comm: kworker/2:1 Tainted: G        W 
5.18.0-rc6-01433-g1ce8b37241ed #5040
Hardware name: Samsung Exynos (Flattened Device Tree)
Workqueue: events_power_efficient phy_state_machine
  unwind_backtrace from show_stack+0x10/0x14
  show_stack from dump_stack_lvl+0x40/0x4c
  dump_stack_lvl from __warn+0xc8/0x13c
  __warn from warn_slowpath_fmt+0x5c/0xb4
  warn_slowpath_fmt from phy_state_machine+0x98/0x28c
  phy_state_machine from process_one_work+0x1ec/0x4cc
  process_one_work from worker_thread+0x58/0x54c
  worker_thread from kthread+0xd0/0xec
  kthread from ret_from_fork+0x14/0x2c
Exception stack(0xf0aa9fb0 to 0xf0aa9ff8)
...
---[ end trace 0000000000000000 ]---

It looks that the driver's suspend/resume operations might need some 
adjustments. After the system suspend/resume cycle the driver is not 
operational anymore. Reverting the $subject patch on top of linux 
next-20220516 restores ethernet operation after system suspend/resume.

> ---
> Only change since v2:
>   * Drop call to __irq_enter_raw() which worked around a warning in
>     generic_handle_domain_irq().  That warning is gone since
>     792ea6a074ae (queued on tip.git/irq/urgent).
>     (Marc Zyngier, Thomas Gleixner)
>
>   drivers/net/usb/smsc95xx.c | 113 ++++++++++++++++++++-----------------
>   1 file changed, 61 insertions(+), 52 deletions(-)

 > ...

Best regards
Lukas Wunner May 19, 2022, 7:08 p.m. UTC | #2
On Tue, May 17, 2022 at 12:18:45PM +0200, Marek Szyprowski wrote:
> This patch landed in the recent linux next-20220516 as commit 
> 1ce8b37241ed ("usbnet: smsc95xx: Forward PHY interrupts to PHY driver to 
> avoid polling"). Unfortunately it breaks smsc95xx usb ethernet operation 
> after system suspend-resume cycle. On the Odroid XU3 board I got the 
> following warning in the kernel log:
> 
> # time rtcwake -s10 -mmem
> rtcwake: wakeup from "mem" using /dev/rtc0 at Tue May 17 09:16:07 2022
> PM: suspend entry (deep)
> Filesystems sync: 0.001 seconds
> Freezing user space processes ... (elapsed 0.002 seconds) done.
> OOM killer disabled.
> Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> printk: Suspending console(s) (use no_console_suspend to debug)
> smsc95xx 4-1.1:1.0 eth0: entering SUSPEND2 mode
> smsc95xx 4-1.1:1.0 eth0: Failed to read reg index 0x00000114: -113
> smsc95xx 4-1.1:1.0 eth0: Error reading MII_ACCESS
> smsc95xx 4-1.1:1.0 eth0: __smsc95xx_mdio_read: MII is busy
> ------------[ cut here ]------------
> WARNING: CPU: 2 PID: 73 at drivers/net/phy/phy.c:946
> phy_state_machine+0x98/0x28c
[...]
> It looks that the driver's suspend/resume operations might need some 
> adjustments. After the system suspend/resume cycle the driver is not 
> operational anymore. Reverting the $subject patch on top of linux 
> next-20220516 restores ethernet operation after system suspend/resume.

Thanks a lot for the report.  It seems the PHY is signaling a link change
shortly before system sleep and by the time the phy_state_machine() worker
gets around to handle it, the device has already been suspended and thus
refuses any further USB requests with -EHOSTUNREACH (-113):

usb_suspend_both()
  usb_suspend_interface()
    smsc95xx_suspend()
      usbnet_suspend()
        __usbnet_status_stop_force() # stops interrupt polling,
                                     # link change is signaled before this

  udev->can_submit = 0               # refuse further URBs

Assuming the above theory is correct, calling phy_stop_machine()
after usbnet_suspend() would be sufficient to fix the issue.
It cancels the phy_state_machine() worker.

The small patch below does that.  Could you give it a spin?

Taking a step back though, I'm wondering if there's a bigger problem here:
This is a USB device, so we stop receiving interrupts once the Interrupt
Endpoint is no longer polled.  But what if a PHY's interrupt is attached
to a GPIO of the SoC and that interrupt is raised while the system is
suspending?  The interrupt handler may likewise try to reach an
inaccessible (suspended) device.

The right thing to do would probably be to signal wakeup.  But the
PHY drivers' irq handlers instead schedule the phy_state_machine().
Perhaps we need something like the following at the top of
phy_state_machine():

	if (phydev->suspended) {
		pm_wakeup_dev_event(&phydev->mdio.dev, 0, true);
		return;
	}

However, phydev->suspended is set at the *bottom* of phy_suspend(),
it would have to be set at the *top* of mdio_bus_phy_suspend()
for the above to be correct.  Hmmm...

Thanks,

Lukas

-- >8 --
diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
index bd03e16..d351a6c 100644
--- a/drivers/net/usb/smsc95xx.c
+++ b/drivers/net/usb/smsc95xx.c
@@ -1201,6 +1201,7 @@ static int smsc95xx_bind(struct usbnet *dev, struct usb_interface *intf)
 	}
 
 	pdata->phydev->irq = phy_irq;
+	pdata->phydev->mac_managed_pm = true;
 	pdata->phydev->is_internal = is_internal_phy;
 
 	/* detect device revision as different features may be available */
@@ -1496,6 +1497,9 @@ static int smsc95xx_suspend(struct usb_interface *intf, pm_message_t message)
 		return ret;
 	}
 
+	if (netif_running(dev->net))
+		phy_stop(pdata->phydev);
+
 	if (pdata->suspend_flags) {
 		netdev_warn(dev->net, "error during last resume\n");
 		pdata->suspend_flags = 0;
@@ -1778,6 +1782,8 @@ static int smsc95xx_resume(struct usb_interface *intf)
 	}
 
 	phy_init_hw(pdata->phydev);
+	if (netif_running(dev->net))
+		phy_start(pdata->phydev);
 
 	ret = usbnet_resume(intf);
 	if (ret < 0)
Marek Szyprowski May 19, 2022, 9:22 p.m. UTC | #3
Hi Lukas,

On 19.05.2022 21:08, Lukas Wunner wrote:
> On Tue, May 17, 2022 at 12:18:45PM +0200, Marek Szyprowski wrote:
>> This patch landed in the recent linux next-20220516 as commit
>> 1ce8b37241ed ("usbnet: smsc95xx: Forward PHY interrupts to PHY driver to
>> avoid polling"). Unfortunately it breaks smsc95xx usb ethernet operation
>> after system suspend-resume cycle. On the Odroid XU3 board I got the
>> following warning in the kernel log:
>>
>> # time rtcwake -s10 -mmem
>> rtcwake: wakeup from "mem" using /dev/rtc0 at Tue May 17 09:16:07 2022
>> PM: suspend entry (deep)
>> Filesystems sync: 0.001 seconds
>> Freezing user space processes ... (elapsed 0.002 seconds) done.
>> OOM killer disabled.
>> Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
>> printk: Suspending console(s) (use no_console_suspend to debug)
>> smsc95xx 4-1.1:1.0 eth0: entering SUSPEND2 mode
>> smsc95xx 4-1.1:1.0 eth0: Failed to read reg index 0x00000114: -113
>> smsc95xx 4-1.1:1.0 eth0: Error reading MII_ACCESS
>> smsc95xx 4-1.1:1.0 eth0: __smsc95xx_mdio_read: MII is busy
>> ------------[ cut here ]------------
>> WARNING: CPU: 2 PID: 73 at drivers/net/phy/phy.c:946
>> phy_state_machine+0x98/0x28c
> [...]
>> It looks that the driver's suspend/resume operations might need some
>> adjustments. After the system suspend/resume cycle the driver is not
>> operational anymore. Reverting the $subject patch on top of linux
>> next-20220516 restores ethernet operation after system suspend/resume.
> Thanks a lot for the report.  It seems the PHY is signaling a link change
> shortly before system sleep and by the time the phy_state_machine() worker
> gets around to handle it, the device has already been suspended and thus
> refuses any further USB requests with -EHOSTUNREACH (-113):
>
> usb_suspend_both()
>    usb_suspend_interface()
>      smsc95xx_suspend()
>        usbnet_suspend()
>          __usbnet_status_stop_force() # stops interrupt polling,
>                                       # link change is signaled before this
>
>    udev->can_submit = 0               # refuse further URBs
>
> Assuming the above theory is correct, calling phy_stop_machine()
> after usbnet_suspend() would be sufficient to fix the issue.
> It cancels the phy_state_machine() worker.
>
> The small patch below does that.  Could you give it a spin?

That's it. Your analysis is right and the patch fixes the issue. Thanks!

Feel free to add:

Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>

Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>

> Taking a step back though, I'm wondering if there's a bigger problem here:
> This is a USB device, so we stop receiving interrupts once the Interrupt
> Endpoint is no longer polled.  But what if a PHY's interrupt is attached
> to a GPIO of the SoC and that interrupt is raised while the system is
> suspending?  The interrupt handler may likewise try to reach an
> inaccessible (suspended) device.
>
> The right thing to do would probably be to signal wakeup.  But the
> PHY drivers' irq handlers instead schedule the phy_state_machine().
> Perhaps we need something like the following at the top of
> phy_state_machine():
>
> 	if (phydev->suspended) {
> 		pm_wakeup_dev_event(&phydev->mdio.dev, 0, true);
> 		return;
> 	}
>
> However, phydev->suspended is set at the *bottom* of phy_suspend(),
> it would have to be set at the *top* of mdio_bus_phy_suspend()
> for the above to be correct.  Hmmm...
Well, your concern sounds valid, but I don't have a board with such hw 
configuration, so I cannot really test.
> Thanks,
>
> Lukas
>
> -- >8 --
> diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
> index bd03e16..d351a6c 100644
> --- a/drivers/net/usb/smsc95xx.c
> +++ b/drivers/net/usb/smsc95xx.c
> @@ -1201,6 +1201,7 @@ static int smsc95xx_bind(struct usbnet *dev, struct usb_interface *intf)
>   	}
>   
>   	pdata->phydev->irq = phy_irq;
> +	pdata->phydev->mac_managed_pm = true;
>   	pdata->phydev->is_internal = is_internal_phy;
>   
>   	/* detect device revision as different features may be available */
> @@ -1496,6 +1497,9 @@ static int smsc95xx_suspend(struct usb_interface *intf, pm_message_t message)
>   		return ret;
>   	}
>   
> +	if (netif_running(dev->net))
> +		phy_stop(pdata->phydev);
> +
>   	if (pdata->suspend_flags) {
>   		netdev_warn(dev->net, "error during last resume\n");
>   		pdata->suspend_flags = 0;
> @@ -1778,6 +1782,8 @@ static int smsc95xx_resume(struct usb_interface *intf)
>   	}
>   
>   	phy_init_hw(pdata->phydev);
> +	if (netif_running(dev->net))
> +		phy_start(pdata->phydev);
>   
>   	ret = usbnet_resume(intf);
>   	if (ret < 0)
>
Best regards
Lukas Wunner May 23, 2022, 9:43 a.m. UTC | #4
On Thu, May 19, 2022 at 11:22:36PM +0200, Marek Szyprowski wrote:
> On 19.05.2022 21:08, Lukas Wunner wrote:
> > Taking a step back though, I'm wondering if there's a bigger problem here:
> > This is a USB device, so we stop receiving interrupts once the Interrupt
> > Endpoint is no longer polled.  But what if a PHY's interrupt is attached
> > to a GPIO of the SoC and that interrupt is raised while the system is
> > suspending?  The interrupt handler may likewise try to reach an
> > inaccessible (suspended) device.
> >
> > The right thing to do would probably be to signal wakeup.  But the
> > PHY drivers' irq handlers instead schedule the phy_state_machine().
> > Perhaps we need something like the following at the top of
> > phy_state_machine():
> >
> > 	if (phydev->suspended) {
> > 		pm_wakeup_dev_event(&phydev->mdio.dev, 0, true);
> > 		return;
> > 	}
> >
> > However, phydev->suspended is set at the *bottom* of phy_suspend(),
> > it would have to be set at the *top* of mdio_bus_phy_suspend()
> > for the above to be correct.  Hmmm...
> 
> Well, your concern sounds valid, but I don't have a board with such hw 
> configuration, so I cannot really test.

I'm torn whether I should submit the quick fix in my last e-mail
or attempt to address the deeper issue.  The quick fix would ensure
v5.19-rc1 isn't broken, but if possible I'd rather address the deeper
issue...

Below is another patch.  Would you mind testing if it fixes the problem
for you?  It's a replacement for the patch in my last e-mail and seeks
to fix the problem for all drivers, not just smsc95xx.  If you don't
have time to test it, let me know and I'll just submit the quick fix
in my previous e-mail.

BTW, getting a PHY interrupt on suspend seems like a corner case to me,
so I'm amazed you found this and seem to be able to reproduce it 100%.
Out of curiosity, is this a CI test you're performing?

Thanks,

Lukas

-- >8 --

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index ef62f357b76d..c2442c38d312 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -31,6 +31,7 @@
 #include <linux/io.h>
 #include <linux/uaccess.h>
 #include <linux/atomic.h>
+#include <linux/suspend.h>
 #include <net/netlink.h>
 #include <net/genetlink.h>
 #include <net/sock.h>
@@ -976,6 +977,25 @@ static irqreturn_t phy_interrupt(int irq, void *phy_dat)
 	struct phy_driver *drv = phydev->drv;
 	irqreturn_t ret;
 
+	if (IS_ENABLED(CONFIG_PM_SLEEP) &&
+	    (phydev->mdio.dev.power.is_prepared ||
+	     phydev->mdio.dev.power.is_suspended)) {
+		struct net_device *netdev = phydev->attached_dev;
+
+		if (netdev) {
+			struct device *parent = netdev->dev.parent;
+
+			if (netdev->wol_enabled)
+				pm_system_wakeup();
+			else if (device_may_wakeup(&netdev->dev))
+				pm_wakeup_dev_event(&netdev->dev, 0, true);
+			else if (parent && device_may_wakeup(parent))
+				pm_wakeup_dev_event(parent, 0, true);
+		}
+
+		return IRQ_HANDLED;
+	}
+
 	mutex_lock(&phydev->lock);
 	ret = drv->handle_interrupt(phydev);
 	mutex_unlock(&phydev->lock);
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 431a8719c635..da6d70ddf167 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -283,8 +283,11 @@ static __maybe_unused int mdio_bus_phy_suspend(struct device *dev)
 	 * may call phy routines that try to grab the same lock, and that may
 	 * lead to a deadlock.
 	 */
-	if (phydev->attached_dev && phydev->adjust_link)
+	if (phydev->attached_dev && phydev->adjust_link) {
+		if (phy_interrupt_is_valid(phydev))
+			synchronize_irq(phydev->irq);
 		phy_stop_machine(phydev);
+	}
 
 	if (!mdio_bus_phy_may_suspend(phydev))
 		return 0;
Marek Szyprowski May 23, 2022, 11:38 a.m. UTC | #5
Hi Lukas,

On 23.05.2022 11:43, Lukas Wunner wrote:
> On Thu, May 19, 2022 at 11:22:36PM +0200, Marek Szyprowski wrote:
>> On 19.05.2022 21:08, Lukas Wunner wrote:
>>> Taking a step back though, I'm wondering if there's a bigger problem here:
>>> This is a USB device, so we stop receiving interrupts once the Interrupt
>>> Endpoint is no longer polled.  But what if a PHY's interrupt is attached
>>> to a GPIO of the SoC and that interrupt is raised while the system is
>>> suspending?  The interrupt handler may likewise try to reach an
>>> inaccessible (suspended) device.
>>>
>>> The right thing to do would probably be to signal wakeup.  But the
>>> PHY drivers' irq handlers instead schedule the phy_state_machine().
>>> Perhaps we need something like the following at the top of
>>> phy_state_machine():
>>>
>>> 	if (phydev->suspended) {
>>> 		pm_wakeup_dev_event(&phydev->mdio.dev, 0, true);
>>> 		return;
>>> 	}
>>>
>>> However, phydev->suspended is set at the *bottom* of phy_suspend(),
>>> it would have to be set at the *top* of mdio_bus_phy_suspend()
>>> for the above to be correct.  Hmmm...
>> Well, your concern sounds valid, but I don't have a board with such hw
>> configuration, so I cannot really test.
> I'm torn whether I should submit the quick fix in my last e-mail
> or attempt to address the deeper issue.  The quick fix would ensure
> v5.19-rc1 isn't broken, but if possible I'd rather address the deeper
> issue...
>
> Below is another patch.  Would you mind testing if it fixes the problem
> for you?  It's a replacement for the patch in my last e-mail and seeks
> to fix the problem for all drivers, not just smsc95xx.  If you don't
> have time to test it, let me know and I'll just submit the quick fix
> in my previous e-mail.

I've just tested it on top of next-20220519 and I was not able to 
reproduce the issue, so it looks it also fixes the issue. :)

Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>

> BTW, getting a PHY interrupt on suspend seems like a corner case to me,
> so I'm amazed you found this and seem to be able to reproduce it 100%.
> Out of curiosity, is this a CI test you're performing?

I've have some semi-automated (based on simple bash scripts) tests 
utilizing remote test boards (with remote power on/off control, serial 
console, tftp booting). This issue was quite easy to reproduce, even 
manually. Maybe it is somehow specific to the Odroid-XU3/XU3-lite boards 
and the way the smsc95xx USB ethernet chip is connected there, but it 
happens there usually in 2 of 3 suspend/resume tests.

Best regards
Andrew Lunn May 23, 2022, 12:34 p.m. UTC | #6
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -283,8 +283,11 @@ static __maybe_unused int mdio_bus_phy_suspend(struct device *dev)
>  	 * may call phy routines that try to grab the same lock, and that may
>  	 * lead to a deadlock.
>  	 */
> -	if (phydev->attached_dev && phydev->adjust_link)
> +	if (phydev->attached_dev && phydev->adjust_link) {
> +		if (phy_interrupt_is_valid(phydev))
> +			synchronize_irq(phydev->irq);
>  		phy_stop_machine(phydev);
> +	}

What is this hunk trying to achieve? As far as i know, interrupts have
not been disabled. So as soon as the call to synchronize_irq()
finishes, could well be another interrupt happens.

	  Andrew
Lukas Wunner May 23, 2022, 1:47 p.m. UTC | #7
On Mon, May 23, 2022 at 02:34:49PM +0200, Andrew Lunn wrote:
> > --- a/drivers/net/phy/phy_device.c
> > +++ b/drivers/net/phy/phy_device.c
> > @@ -283,8 +283,11 @@ static __maybe_unused int mdio_bus_phy_suspend(struct device *dev)
> >  	 * may call phy routines that try to grab the same lock, and that may
> >  	 * lead to a deadlock.
> >  	 */
> > -	if (phydev->attached_dev && phydev->adjust_link)
> > +	if (phydev->attached_dev && phydev->adjust_link) {
> > +		if (phy_interrupt_is_valid(phydev))
> > +			synchronize_irq(phydev->irq);
> >  		phy_stop_machine(phydev);
> > +	}
> 
> What is this hunk trying to achieve? As far as i know, interrupts have
> not been disabled. So as soon as the call to synchronize_irq()
> finishes, could well be another interrupt happens.

That other interrupt would bail out of phy_interrupt() because
the is_prepared flag is set on the PHY's struct device, see
first hunk of the patch.

The problem is that an interrupt may occur before the system
sleep transition commences.  phy_interrupt() will notice that
is_prepared is not (yet) set, hence invokes drv->handle_interrupt().
Let's say the IRQ thread is preempted at that point, the system
sleep transition is started and mdio_bus_phy_suspend() is run.
It calls phy_stop_machine(), so the state machine is now stopped.
Now phy_interrupt() continues, and the PHY driver's ->handle_interrupt()
callback starts the state machine.  Boom, that's not what we want.

So the synchronize_irq() ensures that any already running
phy_interrupt() runs to completion before phy_stop_machine()
is called.  It doesn't matter if another interrupt occurs
because then is_prepared will have been set and therefore
phy_interrupt() won't call drv->handle_interrupt().

Let me know if I haven't explained it in sufficient clarity,
I'll be happy to try again. :)

I'm more concerned about the first hunk of the patch because I'm
not sure I got the wakeup stuff right...

Thanks,

Lukas
Andrew Lunn May 24, 2022, 12:52 a.m. UTC | #8
On Mon, May 23, 2022 at 03:47:09PM +0200, Lukas Wunner wrote:
> On Mon, May 23, 2022 at 02:34:49PM +0200, Andrew Lunn wrote:
> > > --- a/drivers/net/phy/phy_device.c
> > > +++ b/drivers/net/phy/phy_device.c
> > > @@ -283,8 +283,11 @@ static __maybe_unused int mdio_bus_phy_suspend(struct device *dev)
> > >  	 * may call phy routines that try to grab the same lock, and that may
> > >  	 * lead to a deadlock.
> > >  	 */
> > > -	if (phydev->attached_dev && phydev->adjust_link)
> > > +	if (phydev->attached_dev && phydev->adjust_link) {
> > > +		if (phy_interrupt_is_valid(phydev))
> > > +			synchronize_irq(phydev->irq);
> > >  		phy_stop_machine(phydev);
> > > +	}
> > 
> > What is this hunk trying to achieve? As far as i know, interrupts have
> > not been disabled. So as soon as the call to synchronize_irq()
> > finishes, could well be another interrupt happens.
> 
> That other interrupt would bail out of phy_interrupt() because
> the is_prepared flag is set on the PHY's struct device, see
> first hunk of the patch.
> 
> The problem is that an interrupt may occur before the system
> sleep transition commences.  phy_interrupt() will notice that
> is_prepared is not (yet) set, hence invokes drv->handle_interrupt().
> Let's say the IRQ thread is preempted at that point, the system
> sleep transition is started and mdio_bus_phy_suspend() is run.
> It calls phy_stop_machine(), so the state machine is now stopped.
> Now phy_interrupt() continues, and the PHY driver's ->handle_interrupt()
> callback starts the state machine.  Boom, that's not what we want.
> 
> So the synchronize_irq() ensures that any already running
> phy_interrupt() runs to completion before phy_stop_machine()
> is called.  It doesn't matter if another interrupt occurs
> because then is_prepared will have been set and therefore
> phy_interrupt() won't call drv->handle_interrupt().
> 
> Let me know if I haven't explained it in sufficient clarity,
> I'll be happy to try again. :)

I think some comments are needed. If i don't understand what is going
on, i'm sure others don't as well.

    Andrew
Andrew Lunn May 24, 2022, 1:08 a.m. UTC | #9
> @@ -976,6 +977,25 @@ static irqreturn_t phy_interrupt(int irq, void *phy_dat)
>  	struct phy_driver *drv = phydev->drv;
>  	irqreturn_t ret;
>  
> +	if (IS_ENABLED(CONFIG_PM_SLEEP) &&
> +	    (phydev->mdio.dev.power.is_prepared ||
> +	     phydev->mdio.dev.power.is_suspended)) {
> +		struct net_device *netdev = phydev->attached_dev;
> +
> +		if (netdev) {
> +			struct device *parent = netdev->dev.parent;
> +
> +			if (netdev->wol_enabled)
> +				pm_system_wakeup();
> +			else if (device_may_wakeup(&netdev->dev))
> +				pm_wakeup_dev_event(&netdev->dev, 0, true);
> +			else if (parent && device_may_wakeup(parent))
> +				pm_wakeup_dev_event(parent, 0, true);
> +		}
> +
> +		return IRQ_HANDLED;

I'm not sure you can just throw the interrupt away. There have been
issues with WoL, where the WoL signal has been applied to a PMC, not
an actual interrupt. Yet the PHY driver assumes it is an
interrupt. And in order for WoL to work correctly, it needs the
interrupt handler to be called. We said the hardware is broken, WoL
cannot work for that setup.

Here you have correct hardware, but you are throwing the interrupt
away, which will have the same result. So i think you need to abort
the suspend, get the bus working again, and call the interrupt
handler. If this is a WoL interrupt you are supposed to be waking up
anyway.

	Andrew
Marek Szyprowski May 24, 2022, 6:16 a.m. UTC | #10
On 24.05.2022 03:08, Andrew Lunn wrote:
>> @@ -976,6 +977,25 @@ static irqreturn_t phy_interrupt(int irq, void *phy_dat)
>>   	struct phy_driver *drv = phydev->drv;
>>   	irqreturn_t ret;
>>   
>> +	if (IS_ENABLED(CONFIG_PM_SLEEP) &&
>> +	    (phydev->mdio.dev.power.is_prepared ||
>> +	     phydev->mdio.dev.power.is_suspended)) {
>> +		struct net_device *netdev = phydev->attached_dev;
>> +
>> +		if (netdev) {
>> +			struct device *parent = netdev->dev.parent;
>> +
>> +			if (netdev->wol_enabled)
>> +				pm_system_wakeup();
>> +			else if (device_may_wakeup(&netdev->dev))
>> +				pm_wakeup_dev_event(&netdev->dev, 0, true);
>> +			else if (parent && device_may_wakeup(parent))
>> +				pm_wakeup_dev_event(parent, 0, true);
>> +		}
>> +
>> +		return IRQ_HANDLED;
> I'm not sure you can just throw the interrupt away. There have been
> issues with WoL, where the WoL signal has been applied to a PMC, not
> an actual interrupt. Yet the PHY driver assumes it is an
> interrupt. And in order for WoL to work correctly, it needs the
> interrupt handler to be called. We said the hardware is broken, WoL
> cannot work for that setup.
>
> Here you have correct hardware, but you are throwing the interrupt
> away, which will have the same result. So i think you need to abort
> the suspend, get the bus working again, and call the interrupt
> handler. If this is a WoL interrupt you are supposed to be waking up
> anyway.

This hardware doesn't support wake-on-lan. It looks somehow that it 
manages to throw an interrupt just a moment before the power regulator 
for the whole usb bus is cut off.

Best regards
Andrew Lunn May 24, 2022, 12:03 p.m. UTC | #11
On Tue, May 24, 2022 at 08:16:23AM +0200, Marek Szyprowski wrote:
> On 24.05.2022 03:08, Andrew Lunn wrote:
> >> @@ -976,6 +977,25 @@ static irqreturn_t phy_interrupt(int irq, void *phy_dat)
> >>   	struct phy_driver *drv = phydev->drv;
> >>   	irqreturn_t ret;
> >>   
> >> +	if (IS_ENABLED(CONFIG_PM_SLEEP) &&
> >> +	    (phydev->mdio.dev.power.is_prepared ||
> >> +	     phydev->mdio.dev.power.is_suspended)) {
> >> +		struct net_device *netdev = phydev->attached_dev;
> >> +
> >> +		if (netdev) {
> >> +			struct device *parent = netdev->dev.parent;
> >> +
> >> +			if (netdev->wol_enabled)
> >> +				pm_system_wakeup();
> >> +			else if (device_may_wakeup(&netdev->dev))
> >> +				pm_wakeup_dev_event(&netdev->dev, 0, true);
> >> +			else if (parent && device_may_wakeup(parent))
> >> +				pm_wakeup_dev_event(parent, 0, true);
> >> +		}
> >> +
> >> +		return IRQ_HANDLED;
> > I'm not sure you can just throw the interrupt away. There have been
> > issues with WoL, where the WoL signal has been applied to a PMC, not
> > an actual interrupt. Yet the PHY driver assumes it is an
> > interrupt. And in order for WoL to work correctly, it needs the
> > interrupt handler to be called. We said the hardware is broken, WoL
> > cannot work for that setup.
> >
> > Here you have correct hardware, but you are throwing the interrupt
> > away, which will have the same result. So i think you need to abort
> > the suspend, get the bus working again, and call the interrupt
> > handler. If this is a WoL interrupt you are supposed to be waking up
> > anyway.
> 
> This hardware doesn't support wake-on-lan. It looks somehow that it 
> manages to throw an interrupt just a moment before the power regulator 
> for the whole usb bus is cut off.

Your hardware might not support WOL, but this is generic code. It
needs to work for all hardware.

As for this hardware, if it does not support WOL, why are interrupts
still enabled?

      Andrew
Lukas Wunner May 24, 2022, 12:13 p.m. UTC | #12
On Tue, May 24, 2022 at 03:08:07AM +0200, Andrew Lunn wrote:
> > @@ -976,6 +977,25 @@ static irqreturn_t phy_interrupt(int irq, void *phy_dat)
> >  	struct phy_driver *drv = phydev->drv;
> >  	irqreturn_t ret;
> >  
> > +	if (IS_ENABLED(CONFIG_PM_SLEEP) &&
> > +	    (phydev->mdio.dev.power.is_prepared ||
> > +	     phydev->mdio.dev.power.is_suspended)) {
> > +		struct net_device *netdev = phydev->attached_dev;
> > +
> > +		if (netdev) {
> > +			struct device *parent = netdev->dev.parent;
> > +
> > +			if (netdev->wol_enabled)
> > +				pm_system_wakeup();
> > +			else if (device_may_wakeup(&netdev->dev))
> > +				pm_wakeup_dev_event(&netdev->dev, 0, true);
> > +			else if (parent && device_may_wakeup(parent))
> > +				pm_wakeup_dev_event(parent, 0, true);
> > +		}
> > +
> > +		return IRQ_HANDLED;
> 
> I'm not sure you can just throw the interrupt away. There have been
> issues with WoL, where the WoL signal has been applied to a PMC, not
> an actual interrupt. Yet the PHY driver assumes it is an
> interrupt. And in order for WoL to work correctly, it needs the
> interrupt handler to be called. We said the hardware is broken, WoL
> cannot work for that setup.
> 
> Here you have correct hardware, but you are throwing the interrupt
> away, which will have the same result. So i think you need to abort
> the suspend, get the bus working again, and call the interrupt
> handler. If this is a WoL interrupt you are supposed to be waking up
> anyway.

mdio_bus_phy_resume() does trigger the state machine via
phy_start_machine(), so link state changes *are* detected after wakeup.

But you're saying that's not sufficient and you really want the
PHY driver's IRQ handler to be called, do I understand that correctly?

That could be achieved with a flag indicating that the IRQ handler
needs to be rerun after resume.  A simple invocation of irq_wake_thread()
will then achieve that.

It has also occurred to me that not clearing the IRQ may lead to
an interrupt storm if it's level-triggered.  So I need to disable
the IRQ and re-enable it after the PHY has been resumed.
Back to the drawing board...

Thanks,

Lukas
Lukas Wunner May 26, 2022, 1:55 p.m. UTC | #13
On Tue, May 24, 2022 at 02:03:52PM +0200, Andrew Lunn wrote:
> As for this hardware, if it does not support WOL, why are interrupts
> still enabled?

LAN95xx chips do support WoL and will signal a USB wake event.
But whether that actually results in resume from system sleep
depends on the capabilities of the SoC and its USB host controller.

LAN95xx supports a variety of wake options (WoL, PHY Energy Detect, ...)
and can use either its integrated SMSC PHY or an external PHY.
I'm not sure all wake options will work with arbitrary external PHYs.

If WoL or Wake on PHY Energy Detect is not used, we just program the
LAN95xx to enter a deeper power state which results in the respective
wake events being ignored.  As a result, interrupts may be left enabled
even though they're not used as a wakeup source.  The phylib doesn't
provide an API to selectively disable or enable interrupts, other than
phy_stop() and phy_start(), which does a lot more.

The patch I've submitted today treats such unnecessarily enabled
interrupts leniently:  It will not signal wakeup if that wasn't enabled
and just remembers that an interrupt occurred.  The interrupt will be
replayed upon resume and that's it.

Thanks,

Lukas
Andrew Lunn June 6, 2022, 1:28 a.m. UTC | #14
> mdio_bus_phy_resume() does trigger the state machine via
> phy_start_machine(), so link state changes *are* detected after wakeup.
> 
> But you're saying that's not sufficient and you really want the
> PHY driver's IRQ handler to be called, do I understand that correctly?

It is an interrupt, so i would expect the handler to be called. I've
never looked deeply how the kernel handles this, but maybe there is
some core support for this. The kernel does know about wake up
interrupts. The interesting bit is how do you defer the interrupt
until you have enough of the system running again you can actually
service the interrupt.

PHY interrupts mostly are level, not edge, because there are multiple
sources of interrupts within the PHY. So you do need to clear the
interrupt source, or you are going to get a storm, as you pointed out.
But being a level might actually help you. It fires once to get you
out of sleep, and then fires again when the interrupt controller is
resumed and is enabled.

	  Andrew
Marek Szyprowski Aug. 26, 2022, 6:51 a.m. UTC | #15
On 19.05.2022 23:22, Marek Szyprowski wrote:
> On 19.05.2022 21:08, Lukas Wunner wrote:
>> On Tue, May 17, 2022 at 12:18:45PM +0200, Marek Szyprowski wrote:
>>> This patch landed in the recent linux next-20220516 as commit
>>> 1ce8b37241ed ("usbnet: smsc95xx: Forward PHY interrupts to PHY 
>>> driver to
>>> avoid polling"). Unfortunately it breaks smsc95xx usb ethernet 
>>> operation
>>> after system suspend-resume cycle. On the Odroid XU3 board I got the
>>> following warning in the kernel log:
>>>
>>> # time rtcwake -s10 -mmem
>>> rtcwake: wakeup from "mem" using /dev/rtc0 at Tue May 17 09:16:07 2022
>>> PM: suspend entry (deep)
>>> Filesystems sync: 0.001 seconds
>>> Freezing user space processes ... (elapsed 0.002 seconds) done.
>>> OOM killer disabled.
>>> Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
>>> printk: Suspending console(s) (use no_console_suspend to debug)
>>> smsc95xx 4-1.1:1.0 eth0: entering SUSPEND2 mode
>>> smsc95xx 4-1.1:1.0 eth0: Failed to read reg index 0x00000114: -113
>>> smsc95xx 4-1.1:1.0 eth0: Error reading MII_ACCESS
>>> smsc95xx 4-1.1:1.0 eth0: __smsc95xx_mdio_read: MII is busy
>>> ------------[ cut here ]------------
>>> WARNING: CPU: 2 PID: 73 at drivers/net/phy/phy.c:946
>>> phy_state_machine+0x98/0x28c
>> [...]
>>> It looks that the driver's suspend/resume operations might need some
>>> adjustments. After the system suspend/resume cycle the driver is not
>>> operational anymore. Reverting the $subject patch on top of linux
>>> next-20220516 restores ethernet operation after system suspend/resume.
>> Thanks a lot for the report.  It seems the PHY is signaling a link 
>> change
>> shortly before system sleep and by the time the phy_state_machine() 
>> worker
>> gets around to handle it, the device has already been suspended and thus
>> refuses any further USB requests with -EHOSTUNREACH (-113):
>>
>> usb_suspend_both()
>>    usb_suspend_interface()
>>      smsc95xx_suspend()
>>        usbnet_suspend()
>>          __usbnet_status_stop_force() # stops interrupt polling,
>>                                       # link change is signaled 
>> before this
>>
>>    udev->can_submit = 0               # refuse further URBs
>>
>> Assuming the above theory is correct, calling phy_stop_machine()
>> after usbnet_suspend() would be sufficient to fix the issue.
>> It cancels the phy_state_machine() worker.
>>
>> The small patch below does that.  Could you give it a spin?
>
> That's it. Your analysis is right and the patch fixes the issue. Thanks!
>
> Feel free to add:
>
> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
>
> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>


Gentle ping for the final patch...

It looks that the similar fix is posted for other drivers, i.e.:

https://lore.kernel.org/all/20220825023951.3220-1-f.fainelli@gmail.com/


>
>> Taking a step back though, I'm wondering if there's a bigger problem 
>> here:
>> This is a USB device, so we stop receiving interrupts once the Interrupt
>> Endpoint is no longer polled.  But what if a PHY's interrupt is attached
>> to a GPIO of the SoC and that interrupt is raised while the system is
>> suspending?  The interrupt handler may likewise try to reach an
>> inaccessible (suspended) device.
>>
>> The right thing to do would probably be to signal wakeup.  But the
>> PHY drivers' irq handlers instead schedule the phy_state_machine().
>> Perhaps we need something like the following at the top of
>> phy_state_machine():
>>
>>     if (phydev->suspended) {
>>         pm_wakeup_dev_event(&phydev->mdio.dev, 0, true);
>>         return;
>>     }
>>
>> However, phydev->suspended is set at the *bottom* of phy_suspend(),
>> it would have to be set at the *top* of mdio_bus_phy_suspend()
>> for the above to be correct.  Hmmm...
> Well, your concern sounds valid, but I don't have a board with such hw 
> configuration, so I cannot really test.
>> Thanks,
>>
>> Lukas
>>
>> -- >8 --
>> diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
>> index bd03e16..d351a6c 100644
>> --- a/drivers/net/usb/smsc95xx.c
>> +++ b/drivers/net/usb/smsc95xx.c
>> @@ -1201,6 +1201,7 @@ static int smsc95xx_bind(struct usbnet *dev, 
>> struct usb_interface *intf)
>>       }
>>         pdata->phydev->irq = phy_irq;
>> +    pdata->phydev->mac_managed_pm = true;
>>       pdata->phydev->is_internal = is_internal_phy;
>>         /* detect device revision as different features may be 
>> available */
>> @@ -1496,6 +1497,9 @@ static int smsc95xx_suspend(struct 
>> usb_interface *intf, pm_message_t message)
>>           return ret;
>>       }
>>   +    if (netif_running(dev->net))
>> +        phy_stop(pdata->phydev);
>> +
>>       if (pdata->suspend_flags) {
>>           netdev_warn(dev->net, "error during last resume\n");
>>           pdata->suspend_flags = 0;
>> @@ -1778,6 +1782,8 @@ static int smsc95xx_resume(struct usb_interface 
>> *intf)
>>       }
>>         phy_init_hw(pdata->phydev);
>> +    if (netif_running(dev->net))
>> +        phy_start(pdata->phydev);
>>         ret = usbnet_resume(intf);
>>       if (ret < 0)
>>
> Best regards

Best regards
Lukas Wunner Aug. 26, 2022, 7:19 a.m. UTC | #16
On Fri, Aug 26, 2022 at 08:51:58AM +0200, Marek Szyprowski wrote:
> On 19.05.2022 23:22, Marek Szyprowski wrote:
> > On 19.05.2022 21:08, Lukas Wunner wrote:
> >> On Tue, May 17, 2022 at 12:18:45PM +0200, Marek Szyprowski wrote:
> >>> This patch landed in the recent linux next-20220516 as commit
> >>> 1ce8b37241ed ("usbnet: smsc95xx: Forward PHY interrupts to PHY 
> >>> driver to
> >>> avoid polling"). Unfortunately it breaks smsc95xx usb ethernet 
> >>> operation
> >>> after system suspend-resume cycle. On the Odroid XU3 board I got the
> >>> following warning in the kernel log:
> >>>
> >>> # time rtcwake -s10 -mmem
> >>> rtcwake: wakeup from "mem" using /dev/rtc0 at Tue May 17 09:16:07 2022
> >>> PM: suspend entry (deep)
> >>> Filesystems sync: 0.001 seconds
> >>> Freezing user space processes ... (elapsed 0.002 seconds) done.
> >>> OOM killer disabled.
> >>> Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> >>> printk: Suspending console(s) (use no_console_suspend to debug)
> >>> smsc95xx 4-1.1:1.0 eth0: entering SUSPEND2 mode
> >>> smsc95xx 4-1.1:1.0 eth0: Failed to read reg index 0x00000114: -113
> >>> smsc95xx 4-1.1:1.0 eth0: Error reading MII_ACCESS
> >>> smsc95xx 4-1.1:1.0 eth0: __smsc95xx_mdio_read: MII is busy
> >>> ------------[ cut here ]------------
> >>> WARNING: CPU: 2 PID: 73 at drivers/net/phy/phy.c:946
> >>> phy_state_machine+0x98/0x28c
> >> [...]
> >>> It looks that the driver's suspend/resume operations might need some
> >>> adjustments. After the system suspend/resume cycle the driver is not
> >>> operational anymore. Reverting the $subject patch on top of linux
> >>> next-20220516 restores ethernet operation after system suspend/resume.
> >> Thanks a lot for the report. It seems the PHY is signaling a link 
> >> change
> >> shortly before system sleep and by the time the phy_state_machine() 
> >> worker
> >> gets around to handle it, the device has already been suspended and thus
> >> refuses any further USB requests with -EHOSTUNREACH (-113):
[...]
> >> Assuming the above theory is correct, calling phy_stop_machine()
> >> after usbnet_suspend() would be sufficient to fix the issue.
> >> It cancels the phy_state_machine() worker.
> >>
> >> The small patch below does that. Could you give it a spin?
> >
> > That's it. Your analysis is right and the patch fixes the issue. Thanks!
> >
> > Feel free to add:
> >
> > Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
> >
> > Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
> 
> Gentle ping for the final patch...

Hm?  Actually this issue is supposed to be fixed by mainline commit
1758bde2e4aa ("net: phy: Don't trigger state machine while in suspend").

The initial fix attempt that you're replying to should not be necessary
with that commit.

Are you still seeing issues even with 1758bde2e4aa applied?
Or are you maybe using a custom downstream tree which is missing that commit?

Thanks,

Lukas
Marek Szyprowski Aug. 26, 2022, 7:41 a.m. UTC | #17
Hi Lukas,

On 26.08.2022 09:19, Lukas Wunner wrote:
> On Fri, Aug 26, 2022 at 08:51:58AM +0200, Marek Szyprowski wrote:
>> On 19.05.2022 23:22, Marek Szyprowski wrote:
>>> On 19.05.2022 21:08, Lukas Wunner wrote:
>>>> On Tue, May 17, 2022 at 12:18:45PM +0200, Marek Szyprowski wrote:
>>>>> This patch landed in the recent linux next-20220516 as commit
>>>>> 1ce8b37241ed ("usbnet: smsc95xx: Forward PHY interrupts to PHY
>>>>> driver to
>>>>> avoid polling"). Unfortunately it breaks smsc95xx usb ethernet
>>>>> operation
>>>>> after system suspend-resume cycle. On the Odroid XU3 board I got the
>>>>> following warning in the kernel log:
>>>>>
>>>>> # time rtcwake -s10 -mmem
>>>>> rtcwake: wakeup from "mem" using /dev/rtc0 at Tue May 17 09:16:07 2022
>>>>> PM: suspend entry (deep)
>>>>> Filesystems sync: 0.001 seconds
>>>>> Freezing user space processes ... (elapsed 0.002 seconds) done.
>>>>> OOM killer disabled.
>>>>> Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
>>>>> printk: Suspending console(s) (use no_console_suspend to debug)
>>>>> smsc95xx 4-1.1:1.0 eth0: entering SUSPEND2 mode
>>>>> smsc95xx 4-1.1:1.0 eth0: Failed to read reg index 0x00000114: -113
>>>>> smsc95xx 4-1.1:1.0 eth0: Error reading MII_ACCESS
>>>>> smsc95xx 4-1.1:1.0 eth0: __smsc95xx_mdio_read: MII is busy
>>>>> ------------[ cut here ]------------
>>>>> WARNING: CPU: 2 PID: 73 at drivers/net/phy/phy.c:946
>>>>> phy_state_machine+0x98/0x28c
>>>> [...]
>>>>> It looks that the driver's suspend/resume operations might need some
>>>>> adjustments. After the system suspend/resume cycle the driver is not
>>>>> operational anymore. Reverting the $subject patch on top of linux
>>>>> next-20220516 restores ethernet operation after system suspend/resume.
>>>> Thanks a lot for the report. It seems the PHY is signaling a link
>>>> change
>>>> shortly before system sleep and by the time the phy_state_machine()
>>>> worker
>>>> gets around to handle it, the device has already been suspended and thus
>>>> refuses any further USB requests with -EHOSTUNREACH (-113):
> [...]
>>>> Assuming the above theory is correct, calling phy_stop_machine()
>>>> after usbnet_suspend() would be sufficient to fix the issue.
>>>> It cancels the phy_state_machine() worker.
>>>>
>>>> The small patch below does that. Could you give it a spin?
>>> That's it. Your analysis is right and the patch fixes the issue. Thanks!
>>>
>>> Feel free to add:
>>>
>>> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
>>>
>>> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
>> Gentle ping for the final patch...
> Hm?  Actually this issue is supposed to be fixed by mainline commit
> 1758bde2e4aa ("net: phy: Don't trigger state machine while in suspend").
>
> The initial fix attempt that you're replying to should not be necessary
> with that commit.
>
> Are you still seeing issues even with 1758bde2e4aa applied?
> Or are you maybe using a custom downstream tree which is missing that commit?

On Linux next-20220825 I still get the following warning during 
suspend/resume cycle:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 1483 at drivers/net/phy/phy_device.c:323 
mdio_bus_phy_resume+0x10c/0x110
Modules linked in: exynos_gsc s5p_jpeg s5p_mfc videobuf2_dma_contig 
v4l2_mem2mem videobuf2_memops videobuf2_v4l2 videobuf2_common videodev 
mc s5p_cec
CPU: 0 PID: 1483 Comm: rtcwake Not tainted 6.0.0-rc2-next-20220825 #5482
Hardware name: Samsung Exynos (Flattened Device Tree)
  unwind_backtrace from show_stack+0x10/0x14
  show_stack from dump_stack_lvl+0x58/0x70
  dump_stack_lvl from __warn+0xc8/0x220
  __warn from warn_slowpath_fmt+0x5c/0xb4
  warn_slowpath_fmt from mdio_bus_phy_resume+0x10c/0x110
  mdio_bus_phy_resume from dpm_run_callback+0x94/0x208
  dpm_run_callback from device_resume+0x124/0x21c
  device_resume from dpm_resume+0x108/0x278
  dpm_resume from dpm_resume_end+0xc/0x18
  dpm_resume_end from suspend_devices_and_enter+0x208/0x70c
  suspend_devices_and_enter from pm_suspend+0x364/0x430
  pm_suspend from state_store+0x68/0xc8
  state_store from kernfs_fop_write_iter+0x110/0x1d4
  kernfs_fop_write_iter from vfs_write+0x1c4/0x2ac
  vfs_write from ksys_write+0x5c/0xd4
  ksys_write from ret_fast_syscall+0x0/0x1c
Exception stack(0xf2ee5fa8 to 0xf2ee5ff0)
5fa0:                   00000004 0002b438 00000004 0002b438 00000004 
00000000
5fc0: 00000004 0002b438 000291b0 00000004 0002b438 00000004 befd9c1c 
00028160
5fe0: 0000006c befd9ae8 b6eb4148 b6f118a4
irq event stamp: 58381
hardirqs last  enabled at (58393): [<c019ff28>] vprintk_emit+0x320/0x344
hardirqs last disabled at (58400): [<c019fedc>] vprintk_emit+0x2d4/0x344
softirqs last  enabled at (58258): [<c0101694>] __do_softirq+0x354/0x618
softirqs last disabled at (58247): [<c012dd18>] __irq_exit_rcu+0x140/0x1ec
---[ end trace 0000000000000000 ]---

The mentioned patch fixes it.

Best regards
Lukas Wunner Aug. 26, 2022, 7:53 a.m. UTC | #18
On Fri, Aug 26, 2022 at 09:41:46AM +0200, Marek Szyprowski wrote:
> On 26.08.2022 09:19, Lukas Wunner wrote:
> > On Fri, Aug 26, 2022 at 08:51:58AM +0200, Marek Szyprowski wrote:
> >> On 19.05.2022 23:22, Marek Szyprowski wrote:
> >>> On 19.05.2022 21:08, Lukas Wunner wrote:
> >>>> On Tue, May 17, 2022 at 12:18:45PM +0200, Marek Szyprowski wrote:
> >>>>> This patch landed in the recent linux next-20220516 as commit
> >>>>> 1ce8b37241ed ("usbnet: smsc95xx: Forward PHY interrupts to PHY
> >>>>> driver to
> >>>>> avoid polling"). Unfortunately it breaks smsc95xx usb ethernet
> >>>>> operation
> >>>>> after system suspend-resume cycle. On the Odroid XU3 board I got the
> >>>>> following warning in the kernel log:
> >>>>>
> >>>>> # time rtcwake -s10 -mmem
> >>>>> rtcwake: wakeup from "mem" using /dev/rtc0 at Tue May 17 09:16:07 2022
> >>>>> PM: suspend entry (deep)
> >>>>> Filesystems sync: 0.001 seconds
> >>>>> Freezing user space processes ... (elapsed 0.002 seconds) done.
> >>>>> OOM killer disabled.
> >>>>> Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> >>>>> printk: Suspending console(s) (use no_console_suspend to debug)
> >>>>> smsc95xx 4-1.1:1.0 eth0: entering SUSPEND2 mode
> >>>>> smsc95xx 4-1.1:1.0 eth0: Failed to read reg index 0x00000114: -113
> >>>>> smsc95xx 4-1.1:1.0 eth0: Error reading MII_ACCESS
> >>>>> smsc95xx 4-1.1:1.0 eth0: __smsc95xx_mdio_read: MII is busy
> >>>>> ------------[ cut here ]------------
> >>>>> WARNING: CPU: 2 PID: 73 at drivers/net/phy/phy.c:946
> >>>>> phy_state_machine+0x98/0x28c
> >>>> [...]
> >>>>> It looks that the driver's suspend/resume operations might need some
> >>>>> adjustments. After the system suspend/resume cycle the driver is not
> >>>>> operational anymore. Reverting the $subject patch on top of linux
> >>>>> next-20220516 restores ethernet operation after system suspend/resume.
> >>>> Thanks a lot for the report. It seems the PHY is signaling a link
> >>>> change
> >>>> shortly before system sleep and by the time the phy_state_machine()
> >>>> worker
> >>>> gets around to handle it, the device has already been suspended and thus
> >>>> refuses any further USB requests with -EHOSTUNREACH (-113):
> > [...]
> >>>> Assuming the above theory is correct, calling phy_stop_machine()
> >>>> after usbnet_suspend() would be sufficient to fix the issue.
> >>>> It cancels the phy_state_machine() worker.
> >>>>
> >>>> The small patch below does that. Could you give it a spin?
> >>> That's it. Your analysis is right and the patch fixes the issue. Thanks!
> >>>
> >>> Feel free to add:
> >>>
> >>> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
> >>>
> >>> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
> >> Gentle ping for the final patch...
> > Hm?  Actually this issue is supposed to be fixed by mainline commit
> > 1758bde2e4aa ("net: phy: Don't trigger state machine while in suspend").
> >
> > The initial fix attempt that you're replying to should not be necessary
> > with that commit.
> >
> > Are you still seeing issues even with 1758bde2e4aa applied?
> > Or are you maybe using a custom downstream tree which is missing that commit?
> 
> On Linux next-20220825 I still get the following warning during 
> suspend/resume cycle:
> 
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 1483 at drivers/net/phy/phy_device.c:323 
> mdio_bus_phy_resume+0x10c/0x110
> Modules linked in: exynos_gsc s5p_jpeg s5p_mfc videobuf2_dma_contig 
> v4l2_mem2mem videobuf2_memops videobuf2_v4l2 videobuf2_common videodev 
> mc s5p_cec
> CPU: 0 PID: 1483 Comm: rtcwake Not tainted 6.0.0-rc2-next-20220825 #5482
> Hardware name: Samsung Exynos (Flattened Device Tree)
>   unwind_backtrace from show_stack+0x10/0x14
>   show_stack from dump_stack_lvl+0x58/0x70
>   dump_stack_lvl from __warn+0xc8/0x220
>   __warn from warn_slowpath_fmt+0x5c/0xb4
>   warn_slowpath_fmt from mdio_bus_phy_resume+0x10c/0x110
>   mdio_bus_phy_resume from dpm_run_callback+0x94/0x208
>   dpm_run_callback from device_resume+0x124/0x21c
>   device_resume from dpm_resume+0x108/0x278
>   dpm_resume from dpm_resume_end+0xc/0x18
>   dpm_resume_end from suspend_devices_and_enter+0x208/0x70c
>   suspend_devices_and_enter from pm_suspend+0x364/0x430
>   pm_suspend from state_store+0x68/0xc8
>   state_store from kernfs_fop_write_iter+0x110/0x1d4
>   kernfs_fop_write_iter from vfs_write+0x1c4/0x2ac
>   vfs_write from ksys_write+0x5c/0xd4
>   ksys_write from ret_fast_syscall+0x0/0x1c
> Exception stack(0xf2ee5fa8 to 0xf2ee5ff0)
> 5fa0:                   00000004 0002b438 00000004 0002b438 00000004 
> 00000000
> 5fc0: 00000004 0002b438 000291b0 00000004 0002b438 00000004 befd9c1c 
> 00028160
> 5fe0: 0000006c befd9ae8 b6eb4148 b6f118a4
> irq event stamp: 58381
> hardirqs last  enabled at (58393): [<c019ff28>] vprintk_emit+0x320/0x344
> hardirqs last disabled at (58400): [<c019fedc>] vprintk_emit+0x2d4/0x344
> softirqs last  enabled at (58258): [<c0101694>] __do_softirq+0x354/0x618
> softirqs last disabled at (58247): [<c012dd18>] __irq_exit_rcu+0x140/0x1ec
> ---[ end trace 0000000000000000 ]---
> 
> The mentioned patch fixes it.

Color me confused.

With "the mentioned patch", are you referring to 1758bde2e4aa
or are you referring to the little test patch in this e-mail:
https://lore.kernel.org/netdev/20220519190841.GA30869@wunner.de/

There's a Tested-by from you on 1758bde2e4aa, so I assume the
commit fixed the issue at the time.  Does it still occur
intermittently?  Or does it occur every time?  In the latter case,
I'd assume some other change was made in the meantime and that
other change exposed the issue again...

Thanks,

Lukas
Marek Szyprowski Aug. 26, 2022, 8:29 a.m. UTC | #19
Hi Lukas,

On 26.08.2022 09:53, Lukas Wunner wrote:
> On Fri, Aug 26, 2022 at 09:41:46AM +0200, Marek Szyprowski wrote:
>> On 26.08.2022 09:19, Lukas Wunner wrote:
>>> On Fri, Aug 26, 2022 at 08:51:58AM +0200, Marek Szyprowski wrote:
>>>> On 19.05.2022 23:22, Marek Szyprowski wrote:
>>>>> On 19.05.2022 21:08, Lukas Wunner wrote:
>>>>>> On Tue, May 17, 2022 at 12:18:45PM +0200, Marek Szyprowski wrote:
>>>>>>> This patch landed in the recent linux next-20220516 as commit
>>>>>>> 1ce8b37241ed ("usbnet: smsc95xx: Forward PHY interrupts to PHY
>>>>>>> driver to
>>>>>>> avoid polling"). Unfortunately it breaks smsc95xx usb ethernet
>>>>>>> operation
>>>>>>> after system suspend-resume cycle. On the Odroid XU3 board I got the
>>>>>>> following warning in the kernel log:
>>>>>>>
>>>>>>> # time rtcwake -s10 -mmem
>>>>>>> rtcwake: wakeup from "mem" using /dev/rtc0 at Tue May 17 09:16:07 2022
>>>>>>> PM: suspend entry (deep)
>>>>>>> Filesystems sync: 0.001 seconds
>>>>>>> Freezing user space processes ... (elapsed 0.002 seconds) done.
>>>>>>> OOM killer disabled.
>>>>>>> Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
>>>>>>> printk: Suspending console(s) (use no_console_suspend to debug)
>>>>>>> smsc95xx 4-1.1:1.0 eth0: entering SUSPEND2 mode
>>>>>>> smsc95xx 4-1.1:1.0 eth0: Failed to read reg index 0x00000114: -113
>>>>>>> smsc95xx 4-1.1:1.0 eth0: Error reading MII_ACCESS
>>>>>>> smsc95xx 4-1.1:1.0 eth0: __smsc95xx_mdio_read: MII is busy
>>>>>>> ------------[ cut here ]------------
>>>>>>> WARNING: CPU: 2 PID: 73 at drivers/net/phy/phy.c:946
>>>>>>> phy_state_machine+0x98/0x28c
>>>>>> [...]
>>>>>>> It looks that the driver's suspend/resume operations might need some
>>>>>>> adjustments. After the system suspend/resume cycle the driver is not
>>>>>>> operational anymore. Reverting the $subject patch on top of linux
>>>>>>> next-20220516 restores ethernet operation after system suspend/resume.
>>>>>> Thanks a lot for the report. It seems the PHY is signaling a link
>>>>>> change
>>>>>> shortly before system sleep and by the time the phy_state_machine()
>>>>>> worker
>>>>>> gets around to handle it, the device has already been suspended and thus
>>>>>> refuses any further USB requests with -EHOSTUNREACH (-113):
>>> [...]
>>>>>> Assuming the above theory is correct, calling phy_stop_machine()
>>>>>> after usbnet_suspend() would be sufficient to fix the issue.
>>>>>> It cancels the phy_state_machine() worker.
>>>>>>
>>>>>> The small patch below does that. Could you give it a spin?
>>>>> That's it. Your analysis is right and the patch fixes the issue. Thanks!
>>>>>
>>>>> Feel free to add:
>>>>>
>>>>> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
>>>>>
>>>>> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
>>>> Gentle ping for the final patch...
>>> Hm?  Actually this issue is supposed to be fixed by mainline commit
>>> 1758bde2e4aa ("net: phy: Don't trigger state machine while in suspend").
>>>
>>> The initial fix attempt that you're replying to should not be necessary
>>> with that commit.
>>>
>>> Are you still seeing issues even with 1758bde2e4aa applied?
>>> Or are you maybe using a custom downstream tree which is missing that commit?
>> On Linux next-20220825 I still get the following warning during
>> suspend/resume cycle:
>>
>> ------------[ cut here ]------------
>> WARNING: CPU: 0 PID: 1483 at drivers/net/phy/phy_device.c:323
>> mdio_bus_phy_resume+0x10c/0x110
>> Modules linked in: exynos_gsc s5p_jpeg s5p_mfc videobuf2_dma_contig
>> v4l2_mem2mem videobuf2_memops videobuf2_v4l2 videobuf2_common videodev
>> mc s5p_cec
>> CPU: 0 PID: 1483 Comm: rtcwake Not tainted 6.0.0-rc2-next-20220825 #5482
>> Hardware name: Samsung Exynos (Flattened Device Tree)
>>    unwind_backtrace from show_stack+0x10/0x14
>>    show_stack from dump_stack_lvl+0x58/0x70
>>    dump_stack_lvl from __warn+0xc8/0x220
>>    __warn from warn_slowpath_fmt+0x5c/0xb4
>>    warn_slowpath_fmt from mdio_bus_phy_resume+0x10c/0x110
>>    mdio_bus_phy_resume from dpm_run_callback+0x94/0x208
>>    dpm_run_callback from device_resume+0x124/0x21c
>>    device_resume from dpm_resume+0x108/0x278
>>    dpm_resume from dpm_resume_end+0xc/0x18
>>    dpm_resume_end from suspend_devices_and_enter+0x208/0x70c
>>    suspend_devices_and_enter from pm_suspend+0x364/0x430
>>    pm_suspend from state_store+0x68/0xc8
>>    state_store from kernfs_fop_write_iter+0x110/0x1d4
>>    kernfs_fop_write_iter from vfs_write+0x1c4/0x2ac
>>    vfs_write from ksys_write+0x5c/0xd4
>>    ksys_write from ret_fast_syscall+0x0/0x1c
>> Exception stack(0xf2ee5fa8 to 0xf2ee5ff0)
>> 5fa0:                   00000004 0002b438 00000004 0002b438 00000004
>> 00000000
>> 5fc0: 00000004 0002b438 000291b0 00000004 0002b438 00000004 befd9c1c
>> 00028160
>> 5fe0: 0000006c befd9ae8 b6eb4148 b6f118a4
>> irq event stamp: 58381
>> hardirqs last  enabled at (58393): [<c019ff28>] vprintk_emit+0x320/0x344
>> hardirqs last disabled at (58400): [<c019fedc>] vprintk_emit+0x2d4/0x344
>> softirqs last  enabled at (58258): [<c0101694>] __do_softirq+0x354/0x618
>> softirqs last disabled at (58247): [<c012dd18>] __irq_exit_rcu+0x140/0x1ec
>> ---[ end trace 0000000000000000 ]---
>>
>> The mentioned patch fixes it.
> Color me confused.
>
> With "the mentioned patch", are you referring to 1758bde2e4aa
> or are you referring to the little test patch in this e-mail:
> https://lore.kernel.org/netdev/20220519190841.GA30869@wunner.de/
>
> There's a Tested-by from you on 1758bde2e4aa, so I assume the
> commit fixed the issue at the time.  Does it still occur
> intermittently?  Or does it occur every time?  In the latter case,
> I'd assume some other change was made in the meantime and that
> other change exposed the issue again...

The issue seems to be exposed again. On 1758bde2e4aa everything works 
fine and there is no warning during suspend/resume cycle. Then I've 
noticed that warning again while playing with current linux-next. I will 
check when it got broken and report again.

Best regards
Marek Szyprowski Aug. 29, 2022, 11:40 a.m. UTC | #20
Hi All,

On 26.08.2022 10:29, Marek Szyprowski wrote:
> On 26.08.2022 09:53, Lukas Wunner wrote:
>> On Fri, Aug 26, 2022 at 09:41:46AM +0200, Marek Szyprowski wrote:
>>> On 26.08.2022 09:19, Lukas Wunner wrote:
>>>> On Fri, Aug 26, 2022 at 08:51:58AM +0200, Marek Szyprowski wrote:
>>>>> On 19.05.2022 23:22, Marek Szyprowski wrote:
>>>>>> On 19.05.2022 21:08, Lukas Wunner wrote:
>>>>>>> On Tue, May 17, 2022 at 12:18:45PM +0200, Marek Szyprowski wrote:
>>>>>>>> This patch landed in the recent linux next-20220516 as commit
>>>>>>>> 1ce8b37241ed ("usbnet: smsc95xx: Forward PHY interrupts to PHY
>>>>>>>> driver to
>>>>>>>> avoid polling"). Unfortunately it breaks smsc95xx usb ethernet
>>>>>>>> operation
>>>>>>>> after system suspend-resume cycle. On the Odroid XU3 board I 
>>>>>>>> got the
>>>>>>>> following warning in the kernel log:
>>>>>>>>
>>>>>>>> # time rtcwake -s10 -mmem
>>>>>>>> rtcwake: wakeup from "mem" using /dev/rtc0 at Tue May 17 
>>>>>>>> 09:16:07 2022
>>>>>>>> PM: suspend entry (deep)
>>>>>>>> Filesystems sync: 0.001 seconds
>>>>>>>> Freezing user space processes ... (elapsed 0.002 seconds) done.
>>>>>>>> OOM killer disabled.
>>>>>>>> Freezing remaining freezable tasks ... (elapsed 0.001 seconds) 
>>>>>>>> done.
>>>>>>>> printk: Suspending console(s) (use no_console_suspend to debug)
>>>>>>>> smsc95xx 4-1.1:1.0 eth0: entering SUSPEND2 mode
>>>>>>>> smsc95xx 4-1.1:1.0 eth0: Failed to read reg index 0x00000114: -113
>>>>>>>> smsc95xx 4-1.1:1.0 eth0: Error reading MII_ACCESS
>>>>>>>> smsc95xx 4-1.1:1.0 eth0: __smsc95xx_mdio_read: MII is busy
>>>>>>>> ------------[ cut here ]------------
>>>>>>>> WARNING: CPU: 2 PID: 73 at drivers/net/phy/phy.c:946
>>>>>>>> phy_state_machine+0x98/0x28c
>>>>>>> [...]
>>>>>>>> It looks that the driver's suspend/resume operations might need 
>>>>>>>> some
>>>>>>>> adjustments. After the system suspend/resume cycle the driver 
>>>>>>>> is not
>>>>>>>> operational anymore. Reverting the $subject patch on top of linux
>>>>>>>> next-20220516 restores ethernet operation after system 
>>>>>>>> suspend/resume.
>>>>>>> Thanks a lot for the report. It seems the PHY is signaling a link
>>>>>>> change
>>>>>>> shortly before system sleep and by the time the phy_state_machine()
>>>>>>> worker
>>>>>>> gets around to handle it, the device has already been suspended 
>>>>>>> and thus
>>>>>>> refuses any further USB requests with -EHOSTUNREACH (-113):
>>>> [...]
>>>>>>> Assuming the above theory is correct, calling phy_stop_machine()
>>>>>>> after usbnet_suspend() would be sufficient to fix the issue.
>>>>>>> It cancels the phy_state_machine() worker.
>>>>>>>
>>>>>>> The small patch below does that. Could you give it a spin?
>>>>>> That's it. Your analysis is right and the patch fixes the issue. 
>>>>>> Thanks!
>>>>>>
>>>>>> Feel free to add:
>>>>>>
>>>>>> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
>>>>>>
>>>>>> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
>>>>> Gentle ping for the final patch...
>>>> Hm?  Actually this issue is supposed to be fixed by mainline commit
>>>> 1758bde2e4aa ("net: phy: Don't trigger state machine while in 
>>>> suspend").
>>>>
>>>> The initial fix attempt that you're replying to should not be 
>>>> necessary
>>>> with that commit.
>>>>
>>>> Are you still seeing issues even with 1758bde2e4aa applied?
>>>> Or are you maybe using a custom downstream tree which is missing 
>>>> that commit?
>>> On Linux next-20220825 I still get the following warning during
>>> suspend/resume cycle:
>>>
>>> ------------[ cut here ]------------
>>> WARNING: CPU: 0 PID: 1483 at drivers/net/phy/phy_device.c:323
>>> mdio_bus_phy_resume+0x10c/0x110
>>> Modules linked in: exynos_gsc s5p_jpeg s5p_mfc videobuf2_dma_contig
>>> v4l2_mem2mem videobuf2_memops videobuf2_v4l2 videobuf2_common videodev
>>> mc s5p_cec
>>> CPU: 0 PID: 1483 Comm: rtcwake Not tainted 6.0.0-rc2-next-20220825 
>>> #5482
>>> Hardware name: Samsung Exynos (Flattened Device Tree)
>>>    unwind_backtrace from show_stack+0x10/0x14
>>>    show_stack from dump_stack_lvl+0x58/0x70
>>>    dump_stack_lvl from __warn+0xc8/0x220
>>>    __warn from warn_slowpath_fmt+0x5c/0xb4
>>>    warn_slowpath_fmt from mdio_bus_phy_resume+0x10c/0x110
>>>    mdio_bus_phy_resume from dpm_run_callback+0x94/0x208
>>>    dpm_run_callback from device_resume+0x124/0x21c
>>>    device_resume from dpm_resume+0x108/0x278
>>>    dpm_resume from dpm_resume_end+0xc/0x18
>>>    dpm_resume_end from suspend_devices_and_enter+0x208/0x70c
>>>    suspend_devices_and_enter from pm_suspend+0x364/0x430
>>>    pm_suspend from state_store+0x68/0xc8
>>>    state_store from kernfs_fop_write_iter+0x110/0x1d4
>>>    kernfs_fop_write_iter from vfs_write+0x1c4/0x2ac
>>>    vfs_write from ksys_write+0x5c/0xd4
>>>    ksys_write from ret_fast_syscall+0x0/0x1c
>>> Exception stack(0xf2ee5fa8 to 0xf2ee5ff0)
>>> 5fa0:                   00000004 0002b438 00000004 0002b438 00000004
>>> 00000000
>>> 5fc0: 00000004 0002b438 000291b0 00000004 0002b438 00000004 befd9c1c
>>> 00028160
>>> 5fe0: 0000006c befd9ae8 b6eb4148 b6f118a4
>>> irq event stamp: 58381
>>> hardirqs last  enabled at (58393): [<c019ff28>] 
>>> vprintk_emit+0x320/0x344
>>> hardirqs last disabled at (58400): [<c019fedc>] 
>>> vprintk_emit+0x2d4/0x344
>>> softirqs last  enabled at (58258): [<c0101694>] 
>>> __do_softirq+0x354/0x618
>>> softirqs last disabled at (58247): [<c012dd18>] 
>>> __irq_exit_rcu+0x140/0x1ec
>>> ---[ end trace 0000000000000000 ]---
>>>
>>> The mentioned patch fixes it.
>> Color me confused.
>>
>> With "the mentioned patch", are you referring to 1758bde2e4aa
>> or are you referring to the little test patch in this e-mail:
>> https://lore.kernel.org/netdev/20220519190841.GA30869@wunner.de/
>>
>> There's a Tested-by from you on 1758bde2e4aa, so I assume the
>> commit fixed the issue at the time.  Does it still occur
>> intermittently?  Or does it occur every time?  In the latter case,
>> I'd assume some other change was made in the meantime and that
>> other change exposed the issue again...
>
> The issue seems to be exposed again. On 1758bde2e4aa everything works 
> fine and there is no warning during suspend/resume cycle. Then I've 
> noticed that warning again while playing with current linux-next. I 
> will check when it got broken and report again.

I've finally traced what has happened. I've double checked and indeed 
the 1758bde2e4aa commit fixed the issue on next-20220516 kernel and as 
such it has been merged to linus tree. Then the commit 744d23c71af3 
("net: phy: Warn about incorrect mdio_bus_phy_resume() state") has been 
merged to linus tree, which triggers a new warning during the 
suspend/resume cycle with smsc95xx driver. Please note, that the 
smsc95xx still works fine regardless that warning. However it look that 
the commit 1758bde2e4aa only hide a real problem, which the commit 
744d23c71af3 warns about.

Probably a proper fix for smsc95xx driver is to call phy_stop/start 
during suspend/resume cycle, like in similar patches for other drivers:

https://lore.kernel.org/all/20220825023951.3220-1-f.fainelli@gmail.com/

Best regards
Lukas Wunner Sept. 18, 2022, 7:13 p.m. UTC | #21
[cc += Florian]

On Mon, Aug 29, 2022 at 01:40:05PM +0200, Marek Szyprowski wrote:
> I've finally traced what has happened. I've double checked and indeed 
> the 1758bde2e4aa commit fixed the issue on next-20220516 kernel and as 
> such it has been merged to linus tree. Then the commit 744d23c71af3 
> ("net: phy: Warn about incorrect mdio_bus_phy_resume() state") has been 
> merged to linus tree, which triggers a new warning during the 
> suspend/resume cycle with smsc95xx driver. Please note, that the 
> smsc95xx still works fine regardless that warning. However it look that 
> the commit 1758bde2e4aa only hide a real problem, which the commit 
> 744d23c71af3 warns about.
> 
> Probably a proper fix for smsc95xx driver is to call phy_stop/start 
> during suspend/resume cycle, like in similar patches for other drivers:
> 
> https://lore.kernel.org/all/20220825023951.3220-1-f.fainelli@gmail.com/

No, smsc95xx.c relies on mdio_bus_phy_{suspend,resume}() and there's
no need to call phy_{stop,start}().

744d23c71af3 was flawed and 6dbe852c379f has already fixed a portion
of the fallout.

However the WARN() condition still seems too broad and causes false
positives such as in your case.  In particular, mdio_bus_phy_suspend()
may leave the device in PHY_UP state, so that's a legal state that
needs to be exempted from the WARN().

Does the issue still appear even after 6dbe852c379f?

If it does, could you test whether exempting PHY_UP silences the
gratuitous WARN splat?  I.e.:

-	WARN_ON(phydev->state != PHY_HALTED && phydev->state != PHY_READY);
+	WARN_ON(phydev->state != PHY_HALTED && phydev->state != PHY_READY &&
+		phydev->state != PHY_UP);

Thanks,

Lukas
Florian Fainelli Sept. 18, 2022, 8:41 p.m. UTC | #22
On 9/18/2022 12:13 PM, Lukas Wunner wrote:
> [cc += Florian]
> 
> On Mon, Aug 29, 2022 at 01:40:05PM +0200, Marek Szyprowski wrote:
>> I've finally traced what has happened. I've double checked and indeed
>> the 1758bde2e4aa commit fixed the issue on next-20220516 kernel and as
>> such it has been merged to linus tree. Then the commit 744d23c71af3
>> ("net: phy: Warn about incorrect mdio_bus_phy_resume() state") has been
>> merged to linus tree, which triggers a new warning during the
>> suspend/resume cycle with smsc95xx driver. Please note, that the
>> smsc95xx still works fine regardless that warning. However it look that
>> the commit 1758bde2e4aa only hide a real problem, which the commit
>> 744d23c71af3 warns about.
>>
>> Probably a proper fix for smsc95xx driver is to call phy_stop/start
>> during suspend/resume cycle, like in similar patches for other drivers:
>>
>> https://lore.kernel.org/all/20220825023951.3220-1-f.fainelli@gmail.com/
> 
> No, smsc95xx.c relies on mdio_bus_phy_{suspend,resume}() and there's
> no need to call phy_{stop,start}() >
> 744d23c71af3 was flawed and 6dbe852c379f has already fixed a portion
> of the fallout.
> 
> However the WARN() condition still seems too broad and causes false
> positives such as in your case.  In particular, mdio_bus_phy_suspend()
> may leave the device in PHY_UP state, so that's a legal state that
> needs to be exempted from the WARN().

How is that a legal state when the PHY should be suspended? Even if we 
are interrupt driven, the state machine should be stopped, does not mean 
that Wake-on-LAN or other activity interrupts should be disabled.

> 
> Does the issue still appear even after 6dbe852c379f?
> 
> If it does, could you test whether exempting PHY_UP silences the
> gratuitous WARN splat?  I.e.:

If you allow PHY_UP, then the warning becomes effectively useless, so I 
don't believe this is quite what you want to do here.
Lukas Wunner Sept. 18, 2022, 8:55 p.m. UTC | #23
On Sun, Sep 18, 2022 at 01:41:13PM -0700, Florian Fainelli wrote:
> On 9/18/2022 12:13 PM, Lukas Wunner wrote:
> > On Mon, Aug 29, 2022 at 01:40:05PM +0200, Marek Szyprowski wrote:
> > > I've finally traced what has happened. I've double checked and indeed
> > > the 1758bde2e4aa commit fixed the issue on next-20220516 kernel and as
> > > such it has been merged to linus tree. Then the commit 744d23c71af3
> > > ("net: phy: Warn about incorrect mdio_bus_phy_resume() state") has been
> > > merged to linus tree, which triggers a new warning during the
> > > suspend/resume cycle with smsc95xx driver. Please note, that the
> > > smsc95xx still works fine regardless that warning. However it look that
> > > the commit 1758bde2e4aa only hide a real problem, which the commit
> > > 744d23c71af3 warns about.
> > > 
> > > Probably a proper fix for smsc95xx driver is to call phy_stop/start
> > > during suspend/resume cycle, like in similar patches for other drivers:
> > > 
> > > https://lore.kernel.org/all/20220825023951.3220-1-f.fainelli@gmail.com/
> > 
> > No, smsc95xx.c relies on mdio_bus_phy_{suspend,resume}() and there's
> > no need to call phy_{stop,start}() >
> > 744d23c71af3 was flawed and 6dbe852c379f has already fixed a portion
> > of the fallout.
> > 
> > However the WARN() condition still seems too broad and causes false
> > positives such as in your case.  In particular, mdio_bus_phy_suspend()
> > may leave the device in PHY_UP state, so that's a legal state that
> > needs to be exempted from the WARN().
> 
> How is that a legal state when the PHY should be suspended? Even if we are
> interrupt driven, the state machine should be stopped, does not mean that
> Wake-on-LAN or other activity interrupts should be disabled.

mdio_bus_phy_suspend()
  phy_stop_machine()
    phydev->state = PHY_UP  #  if (phydev->state >= PHY_UP)

So apparently PHY_UP is a legal state for a suspended PHY.


> > Does the issue still appear even after 6dbe852c379f?
> > 
> > If it does, could you test whether exempting PHY_UP silences the
> > gratuitous WARN splat?  I.e.:
> 
> If you allow PHY_UP, then the warning becomes effectively useless, so I
> don't believe this is quite what you want to do here.

Hm, maybe the WARN() should be dropped altogether?

Thanks,

Lukas
Florian Fainelli Sept. 18, 2022, 10:11 p.m. UTC | #24
On 9/18/2022 1:55 PM, Lukas Wunner wrote:
> On Sun, Sep 18, 2022 at 01:41:13PM -0700, Florian Fainelli wrote:
>> On 9/18/2022 12:13 PM, Lukas Wunner wrote:
>>> On Mon, Aug 29, 2022 at 01:40:05PM +0200, Marek Szyprowski wrote:
>>>> I've finally traced what has happened. I've double checked and indeed
>>>> the 1758bde2e4aa commit fixed the issue on next-20220516 kernel and as
>>>> such it has been merged to linus tree. Then the commit 744d23c71af3
>>>> ("net: phy: Warn about incorrect mdio_bus_phy_resume() state") has been
>>>> merged to linus tree, which triggers a new warning during the
>>>> suspend/resume cycle with smsc95xx driver. Please note, that the
>>>> smsc95xx still works fine regardless that warning. However it look that
>>>> the commit 1758bde2e4aa only hide a real problem, which the commit
>>>> 744d23c71af3 warns about.
>>>>
>>>> Probably a proper fix for smsc95xx driver is to call phy_stop/start
>>>> during suspend/resume cycle, like in similar patches for other drivers:
>>>>
>>>> https://lore.kernel.org/all/20220825023951.3220-1-f.fainelli@gmail.com/
>>>
>>> No, smsc95xx.c relies on mdio_bus_phy_{suspend,resume}() and there's
>>> no need to call phy_{stop,start}() >
>>> 744d23c71af3 was flawed and 6dbe852c379f has already fixed a portion
>>> of the fallout.
>>>
>>> However the WARN() condition still seems too broad and causes false
>>> positives such as in your case.  In particular, mdio_bus_phy_suspend()
>>> may leave the device in PHY_UP state, so that's a legal state that
>>> needs to be exempted from the WARN().
>>
>> How is that a legal state when the PHY should be suspended? Even if we are
>> interrupt driven, the state machine should be stopped, does not mean that
>> Wake-on-LAN or other activity interrupts should be disabled.
> 
> mdio_bus_phy_suspend()
>    phy_stop_machine()
>      phydev->state = PHY_UP  #  if (phydev->state >= PHY_UP)
> 
> So apparently PHY_UP is a legal state for a suspended PHY.

It is not clear to me why, however. Sure it does ensure that when we 
resume we set needs_aneg = true but this feels like a hack in the sense 
that we are setting the PHY in a provisional state in anticipation for 
what might come next.

> 
> 
>>> Does the issue still appear even after 6dbe852c379f?
>>>
>>> If it does, could you test whether exempting PHY_UP silences the
>>> gratuitous WARN splat?  I.e.:
>>
>> If you allow PHY_UP, then the warning becomes effectively useless, so I
>> don't believe this is quite what you want to do here.
> 
> Hm, maybe the WARN() should be dropped altogether?

And then be left with debugging similar problems that prompted me to 
submit the patch in the first place, no thank you. I guess I would 
rather accept that PHY_UP needs to be special cased then.
Marek Szyprowski Sept. 22, 2022, 1:21 p.m. UTC | #25
On 18.09.2022 21:13, Lukas Wunner wrote:
> [cc += Florian]
>
> On Mon, Aug 29, 2022 at 01:40:05PM +0200, Marek Szyprowski wrote:
>> I've finally traced what has happened. I've double checked and indeed
>> the 1758bde2e4aa commit fixed the issue on next-20220516 kernel and as
>> such it has been merged to linus tree. Then the commit 744d23c71af3
>> ("net: phy: Warn about incorrect mdio_bus_phy_resume() state") has been
>> merged to linus tree, which triggers a new warning during the
>> suspend/resume cycle with smsc95xx driver. Please note, that the
>> smsc95xx still works fine regardless that warning. However it look that
>> the commit 1758bde2e4aa only hide a real problem, which the commit
>> 744d23c71af3 warns about.
>>
>> Probably a proper fix for smsc95xx driver is to call phy_stop/start
>> during suspend/resume cycle, like in similar patches for other drivers:
>>
>> https://lore.kernel.org/all/20220825023951.3220-1-f.fainelli@gmail.com/
> No, smsc95xx.c relies on mdio_bus_phy_{suspend,resume}() and there's
> no need to call phy_{stop,start}().
>
> 744d23c71af3 was flawed and 6dbe852c379f has already fixed a portion
> of the fallout.
>
> However the WARN() condition still seems too broad and causes false
> positives such as in your case.  In particular, mdio_bus_phy_suspend()
> may leave the device in PHY_UP state, so that's a legal state that
> needs to be exempted from the WARN().
>
> Does the issue still appear even after 6dbe852c379f?
>
> If it does, could you test whether exempting PHY_UP silences the
> gratuitous WARN splat?  I.e.:
>
> -	WARN_ON(phydev->state != PHY_HALTED && phydev->state != PHY_READY);
> +	WARN_ON(phydev->state != PHY_HALTED && phydev->state != PHY_READY &&
> +		phydev->state != PHY_UP);

Yes, this hides this warning in my case. I've tested on linux 
next-20220921 with the above change.

Best regards
Lukas Wunner Sept. 23, 2022, 4:20 a.m. UTC | #26
On Sun, Sep 18, 2022 at 03:11:47PM -0700, Florian Fainelli wrote:
> On 9/18/2022 1:55 PM, Lukas Wunner wrote:
> > On Sun, Sep 18, 2022 at 01:41:13PM -0700, Florian Fainelli wrote:
> > > On 9/18/2022 12:13 PM, Lukas Wunner wrote:
> > > > On Mon, Aug 29, 2022 at 01:40:05PM +0200, Marek Szyprowski wrote:
> > > > > I've finally traced what has happened. I've double checked and indeed
> > > > > the 1758bde2e4aa commit fixed the issue on next-20220516 kernel and as
> > > > > such it has been merged to linus tree. Then the commit 744d23c71af3
> > > > > ("net: phy: Warn about incorrect mdio_bus_phy_resume() state") has been
> > > > > merged to linus tree, which triggers a new warning during the
> > > > > suspend/resume cycle with smsc95xx driver. Please note, that the
> > > > > smsc95xx still works fine regardless that warning. However it look that
> > > > > the commit 1758bde2e4aa only hide a real problem, which the commit
> > > > > 744d23c71af3 warns about.
> > > > > 
> > > > > Probably a proper fix for smsc95xx driver is to call phy_stop/start
> > > > > during suspend/resume cycle, like in similar patches for other drivers:
> > > > > 
> > > > > https://lore.kernel.org/all/20220825023951.3220-1-f.fainelli@gmail.com/
> > > > 
> > > > No, smsc95xx.c relies on mdio_bus_phy_{suspend,resume}() and there's
> > > > no need to call phy_{stop,start}() >
> > > > 744d23c71af3 was flawed and 6dbe852c379f has already fixed a portion
> > > > of the fallout.
> > > > 
> > > > However the WARN() condition still seems too broad and causes false
> > > > positives such as in your case.  In particular, mdio_bus_phy_suspend()
> > > > may leave the device in PHY_UP state, so that's a legal state that
> > > > needs to be exempted from the WARN().
> > > 
> > > How is that a legal state when the PHY should be suspended? Even if we are
> > > interrupt driven, the state machine should be stopped, does not mean that
> > > Wake-on-LAN or other activity interrupts should be disabled.
> > 
> > mdio_bus_phy_suspend()
> >    phy_stop_machine()
> >      phydev->state = PHY_UP  #  if (phydev->state >= PHY_UP)
> > 
> > So apparently PHY_UP is a legal state for a suspended PHY.
> 
> It is not clear to me why, however. Sure it does ensure that when we resume
> we set needs_aneg = true but this feels like a hack in the sense that we are
> setting the PHY in a provisional state in anticipation for what might come
> next.

I've just submitted a fix so that at least v6.0 doesn't get released
with a false-positive WARN splat on resume:

https://lore.kernel.org/netdev/8128fdb51eeebc9efbf3776a4097363a1317aaf1.1663905575.git.lukas@wunner.de/

I guess we can look into making the state setting more logical in a
separate step.


> > > If you allow PHY_UP, then the warning becomes effectively useless, so I
> > > don't believe this is quite what you want to do here.
> > 
> > Hm, maybe the WARN() should be dropped altogether?
> 
> And then be left with debugging similar problems that prompted me to submit
> the patch in the first place, no thank you. I guess I would rather accept
> that PHY_UP needs to be special cased then.

I've interpreted that as an Acked-by for exempting PHY_UP.
If that was not what you wanted, please speak up.

Thanks,

Lukas
diff mbox series

Patch

diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
index 6309ff8e75de..bd03e16f98a1 100644
--- a/drivers/net/usb/smsc95xx.c
+++ b/drivers/net/usb/smsc95xx.c
@@ -18,6 +18,8 @@ 
 #include <linux/usb/usbnet.h>
 #include <linux/slab.h>
 #include <linux/of_net.h>
+#include <linux/irq.h>
+#include <linux/irqdomain.h>
 #include <linux/mdio.h>
 #include <linux/phy.h>
 #include <net/selftests.h>
@@ -53,6 +55,9 @@ 
 #define SUSPEND_ALLMODES		(SUSPEND_SUSPEND0 | SUSPEND_SUSPEND1 | \
 					 SUSPEND_SUSPEND2 | SUSPEND_SUSPEND3)
 
+#define SMSC95XX_NR_IRQS		(1) /* raise to 12 for GPIOs */
+#define PHY_HWIRQ			(SMSC95XX_NR_IRQS - 1)
+
 struct smsc95xx_priv {
 	u32 mac_cr;
 	u32 hash_hi;
@@ -61,6 +66,9 @@  struct smsc95xx_priv {
 	spinlock_t mac_cr_lock;
 	u8 features;
 	u8 suspend_flags;
+	struct irq_chip irqchip;
+	struct irq_domain *irqdomain;
+	struct fwnode_handle *irqfwnode;
 	struct mii_bus *mdiobus;
 	struct phy_device *phydev;
 };
@@ -597,6 +605,8 @@  static void smsc95xx_mac_update_fullduplex(struct usbnet *dev)
 
 static void smsc95xx_status(struct usbnet *dev, struct urb *urb)
 {
+	struct smsc95xx_priv *pdata = dev->driver_priv;
+	unsigned long flags;
 	u32 intdata;
 
 	if (urb->actual_length != 4) {
@@ -608,11 +618,15 @@  static void smsc95xx_status(struct usbnet *dev, struct urb *urb)
 	intdata = get_unaligned_le32(urb->transfer_buffer);
 	netif_dbg(dev, link, dev->net, "intdata: 0x%08X\n", intdata);
 
+	local_irq_save(flags);
+
 	if (intdata & INT_ENP_PHY_INT_)
-		;
+		generic_handle_domain_irq(pdata->irqdomain, PHY_HWIRQ);
 	else
 		netdev_warn(dev->net, "unexpected interrupt, intdata=0x%08X\n",
 			    intdata);
+
+	local_irq_restore(flags);
 }
 
 /* Enable or disable Tx & Rx checksum offload engines */
@@ -1080,8 +1094,9 @@  static int smsc95xx_bind(struct usbnet *dev, struct usb_interface *intf)
 {
 	struct smsc95xx_priv *pdata;
 	bool is_internal_phy;
+	char usb_path[64];
+	int ret, phy_irq;
 	u32 val;
-	int ret;
 
 	printk(KERN_INFO SMSC_CHIPNAME " v" SMSC_DRIVER_VERSION "\n");
 
@@ -1121,10 +1136,38 @@  static int smsc95xx_bind(struct usbnet *dev, struct usb_interface *intf)
 	if (ret)
 		goto free_pdata;
 
+	/* create irq domain for use by PHY driver and GPIO consumers */
+	usb_make_path(dev->udev, usb_path, sizeof(usb_path));
+	pdata->irqfwnode = irq_domain_alloc_named_fwnode(usb_path);
+	if (!pdata->irqfwnode) {
+		ret = -ENOMEM;
+		goto free_pdata;
+	}
+
+	pdata->irqdomain = irq_domain_create_linear(pdata->irqfwnode,
+						    SMSC95XX_NR_IRQS,
+						    &irq_domain_simple_ops,
+						    pdata);
+	if (!pdata->irqdomain) {
+		ret = -ENOMEM;
+		goto free_irqfwnode;
+	}
+
+	phy_irq = irq_create_mapping(pdata->irqdomain, PHY_HWIRQ);
+	if (!phy_irq) {
+		ret = -ENOENT;
+		goto remove_irqdomain;
+	}
+
+	pdata->irqchip = dummy_irq_chip;
+	pdata->irqchip.name = SMSC_CHIPNAME;
+	irq_set_chip_and_handler_name(phy_irq, &pdata->irqchip,
+				      handle_simple_irq, "phy");
+
 	pdata->mdiobus = mdiobus_alloc();
 	if (!pdata->mdiobus) {
 		ret = -ENOMEM;
-		goto free_pdata;
+		goto dispose_irq;
 	}
 
 	ret = smsc95xx_read_reg(dev, HW_CFG, &val);
@@ -1157,6 +1200,7 @@  static int smsc95xx_bind(struct usbnet *dev, struct usb_interface *intf)
 		goto unregister_mdio;
 	}
 
+	pdata->phydev->irq = phy_irq;
 	pdata->phydev->is_internal = is_internal_phy;
 
 	/* detect device revision as different features may be available */
@@ -1199,6 +1243,15 @@  static int smsc95xx_bind(struct usbnet *dev, struct usb_interface *intf)
 free_mdio:
 	mdiobus_free(pdata->mdiobus);
 
+dispose_irq:
+	irq_dispose_mapping(phy_irq);
+
+remove_irqdomain:
+	irq_domain_remove(pdata->irqdomain);
+
+free_irqfwnode:
+	irq_domain_free_fwnode(pdata->irqfwnode);
+
 free_pdata:
 	kfree(pdata);
 	return ret;
@@ -1211,6 +1264,9 @@  static void smsc95xx_unbind(struct usbnet *dev, struct usb_interface *intf)
 	phy_disconnect(dev->net->phydev);
 	mdiobus_unregister(pdata->mdiobus);
 	mdiobus_free(pdata->mdiobus);
+	irq_dispose_mapping(irq_find_mapping(pdata->irqdomain, PHY_HWIRQ));
+	irq_domain_remove(pdata->irqdomain);
+	irq_domain_free_fwnode(pdata->irqfwnode);
 	netif_dbg(dev, ifdown, dev->net, "free pdata\n");
 	kfree(pdata);
 }
@@ -1235,29 +1291,6 @@  static u32 smsc_crc(const u8 *buffer, size_t len, int filter)
 	return crc << ((filter % 2) * 16);
 }
 
-static int smsc95xx_enable_phy_wakeup_interrupts(struct usbnet *dev, u16 mask)
-{
-	int ret;
-
-	netdev_dbg(dev->net, "enabling PHY wakeup interrupts\n");
-
-	/* read to clear */
-	ret = smsc95xx_mdio_read_nopm(dev, PHY_INT_SRC);
-	if (ret < 0)
-		return ret;
-
-	/* enable interrupt source */
-	ret = smsc95xx_mdio_read_nopm(dev, PHY_INT_MASK);
-	if (ret < 0)
-		return ret;
-
-	ret |= mask;
-
-	smsc95xx_mdio_write_nopm(dev, PHY_INT_MASK, ret);
-
-	return 0;
-}
-
 static int smsc95xx_link_ok_nopm(struct usbnet *dev)
 {
 	int ret;
@@ -1424,7 +1457,6 @@  static int smsc95xx_enter_suspend3(struct usbnet *dev)
 static int smsc95xx_autosuspend(struct usbnet *dev, u32 link_up)
 {
 	struct smsc95xx_priv *pdata = dev->driver_priv;
-	int ret;
 
 	if (!netif_running(dev->net)) {
 		/* interface is ifconfig down so fully power down hw */
@@ -1443,27 +1475,10 @@  static int smsc95xx_autosuspend(struct usbnet *dev, u32 link_up)
 		}
 
 		netdev_dbg(dev->net, "autosuspend entering SUSPEND1\n");
-
-		/* enable PHY wakeup events for if cable is attached */
-		ret = smsc95xx_enable_phy_wakeup_interrupts(dev,
-			PHY_INT_MASK_ANEG_COMP_);
-		if (ret < 0) {
-			netdev_warn(dev->net, "error enabling PHY wakeup ints\n");
-			return ret;
-		}
-
 		netdev_info(dev->net, "entering SUSPEND1 mode\n");
 		return smsc95xx_enter_suspend1(dev);
 	}
 
-	/* enable PHY wakeup events so we remote wakeup if cable is pulled */
-	ret = smsc95xx_enable_phy_wakeup_interrupts(dev,
-		PHY_INT_MASK_LINK_DOWN_);
-	if (ret < 0) {
-		netdev_warn(dev->net, "error enabling PHY wakeup ints\n");
-		return ret;
-	}
-
 	netdev_dbg(dev->net, "autosuspend entering SUSPEND3\n");
 	return smsc95xx_enter_suspend3(dev);
 }
@@ -1529,13 +1544,6 @@  static int smsc95xx_suspend(struct usb_interface *intf, pm_message_t message)
 	}
 
 	if (pdata->wolopts & WAKE_PHY) {
-		ret = smsc95xx_enable_phy_wakeup_interrupts(dev,
-			(PHY_INT_MASK_ANEG_COMP_ | PHY_INT_MASK_LINK_DOWN_));
-		if (ret < 0) {
-			netdev_warn(dev->net, "error enabling PHY wakeup ints\n");
-			goto done;
-		}
-
 		/* if link is down then configure EDPD and enter SUSPEND1,
 		 * otherwise enter SUSPEND0 below
 		 */
@@ -1769,11 +1777,12 @@  static int smsc95xx_resume(struct usb_interface *intf)
 			return ret;
 	}
 
+	phy_init_hw(pdata->phydev);
+
 	ret = usbnet_resume(intf);
 	if (ret < 0)
 		netdev_warn(dev->net, "usbnet_resume error\n");
 
-	phy_init_hw(pdata->phydev);
 	return ret;
 }