diff mbox series

[RFC,net-next,v1,2/2] net. phy: dp83tg720: Add randomized polling intervals for unstable link detection

Message ID 20241127131011.92800-2-o.rempel@pengutronix.de (mailing list archive)
State RFC
Delegated to: Netdev Maintainers
Headers show
Series [RFC,net-next,v1,1/2] net: phy: Add support for driver-specific next update time | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net-next
netdev/apply fail Patch does not apply to net-next-1

Commit Message

Oleksij Rempel Nov. 27, 2024, 1:10 p.m. UTC
Address the limitations of the DP83TG720 PHY, which cannot reliably detect or
report a stable link state. To handle this, the PHY must be periodically reset
when the link is down. However, synchronized reset intervals between the PHY
and its link partner can result in a deadlock, preventing the link from
re-establishing.

This change introduces a randomized polling interval when the link is down to
desynchronize resets between link partners.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
---
 drivers/net/phy/dp83tg720.c | 76 +++++++++++++++++++++++++++++++++++++
 1 file changed, 76 insertions(+)

--
2.39.5

Comments

Andrew Lunn Nov. 27, 2024, 3:37 p.m. UTC | #1
On Wed, Nov 27, 2024 at 02:10:11PM +0100, Oleksij Rempel wrote:
> Address the limitations of the DP83TG720 PHY, which cannot reliably detect or
> report a stable link state. To handle this, the PHY must be periodically reset
> when the link is down. However, synchronized reset intervals between the PHY
> and its link partner can result in a deadlock, preventing the link from
> re-establishing.
> 
> This change introduces a randomized polling interval when the link is down to
> desynchronize resets between link partners.

Hi Oleksij

What other solutions did you try? I'm wondering if this is more
complex than it needs to be. Could you add a random delay in
dp83tg720_read_status() when it decides to do a reset?

	Andrew
Heiner Kallweit Nov. 27, 2024, 5:33 p.m. UTC | #2
On 27.11.2024 14:10, Oleksij Rempel wrote:
> Address the limitations of the DP83TG720 PHY, which cannot reliably detect or
> report a stable link state. To handle this, the PHY must be periodically reset
> when the link is down. However, synchronized reset intervals between the PHY
> and its link partner can result in a deadlock, preventing the link from
> re-establishing.
> 
Out of curiosity: This PHY isn't normally quirky, but completely broken.
Why would anybody use it?

> This change introduces a randomized polling interval when the link is down to
> desynchronize resets between link partners.
> 
> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
> ---
>  drivers/net/phy/dp83tg720.c | 76 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 76 insertions(+)
> 
> diff --git a/drivers/net/phy/dp83tg720.c b/drivers/net/phy/dp83tg720.c
> index f56659d41b31..64c65454cf94 100644
> --- a/drivers/net/phy/dp83tg720.c
> +++ b/drivers/net/phy/dp83tg720.c
> @@ -4,12 +4,31 @@
>   */
>  #include <linux/bitfield.h>
>  #include <linux/ethtool_netlink.h>
> +#include <linux/jiffies.h>
>  #include <linux/kernel.h>
>  #include <linux/module.h>
>  #include <linux/phy.h>
> +#include <linux/random.h>
> 
>  #include "open_alliance_helpers.h"
> 
> +/*
> + * DP83TG720S_POLL_ACTIVE_LINK - Polling interval in milliseconds when the link
> + *				 is active.
> + * DP83TG720S_POLL_NO_LINK_MIN - Minimum polling interval in milliseconds when
> + *				 the link is down.
> + * DP83TG720S_POLL_NO_LINK_MAX - Maximum polling interval in milliseconds when
> + *				 the link is down.
> + *
> + * These values are not documented or officially recommended by the vendor but
> + * were determined through empirical testing. They achieve a good balance in
> + * minimizing the number of reset retries while ensuring reliable link recovery
> + * within a reasonable timeframe.
> + */
> +#define DP83TG720S_POLL_ACTIVE_LINK		1000
> +#define DP83TG720S_POLL_NO_LINK_MIN		100
> +#define DP83TG720S_POLL_NO_LINK_MAX		1000
> +
>  #define DP83TG720S_PHY_ID			0x2000a284
> 
>  /* MDIO_MMD_VEND2 registers */
> @@ -355,6 +374,11 @@ static int dp83tg720_read_status(struct phy_device *phydev)
>  		if (ret)
>  			return ret;
> 
> +		/* The sleep value is based on testing with the DP83TG720S-Q1
> +		 * PHY. The PHY needs some time to recover from a link loss.
> +		 */
What is the issue during this "time to recover"?
Is errata information available from the vendor?

> +		msleep(600);
> +
>  		/* After HW reset we need to restore master/slave configuration.
>  		 * genphy_c45_pma_baset1_read_master_slave() call will be done
>  		 * by the dp83tg720_config_aneg() function.
> @@ -482,6 +506,57 @@ static int dp83tg720_probe(struct phy_device *phydev)
>  	return 0;
>  }
> 
> +/**
> + * dp83tg720_phy_get_next_update_time - Determine the next update time for PHY
> + *                                      state
> + * @phydev: Pointer to the phy_device structure
> + *
> + * This function addresses a limitation of the DP83TG720 PHY, which cannot
> + * reliably detect or report a stable link state. To recover from such
> + * scenarios, the PHY must be periodically reset when the link is down. However,
> + * if the link partner also runs Linux with the same driver, synchronized reset
> + * intervals can lead to a deadlock where the link never establishes due to
> + * simultaneous resets on both sides.
> + *
> + * To avoid this, the function implements randomized polling intervals when the
> + * link is down. It ensures that reset intervals are desynchronized by
> + * introducing a random delay between a configured minimum and maximum range.
> + * When the link is up, a fixed polling interval is used to minimize overhead.
> + *
> + * This mechanism guarantees that the link will reestablish within 10 seconds
> + * in the worst-case scenario.
> + *
> + * Return: Time (in milliseconds) until the next update event for the PHY state
> + * machine.
> + */
> +static unsigned int dp83tg720_phy_get_next_update_time(struct phy_device *phydev)
> +{
> +	unsigned int jiffy_ms = jiffies_to_msecs(1); /* Jiffy granularity in ms */
> +	unsigned int next_time_ms;
> +
> +	if (phydev->link) {
> +		/* When the link is up, use a fixed 1000ms interval */
> +		next_time_ms = DP83TG720S_POLL_ACTIVE_LINK;
> +	} else {
> +		unsigned int min_jiffies, max_jiffies, rand_jiffies;
> +		/* When the link is down, randomize interval between
> +		 * configured min/max
> +		 */
> +
> +		/* Convert min and max to jiffies */
> +		min_jiffies = msecs_to_jiffies(DP83TG720S_POLL_NO_LINK_MIN);
> +		max_jiffies = msecs_to_jiffies(DP83TG720S_POLL_NO_LINK_MAX);
> +
> +		/* Randomize in the jiffie range and convert back to ms */
> +		rand_jiffies = min_jiffies +
> +			get_random_u32_below(max_jiffies - min_jiffies + 1);
> +		next_time_ms = jiffies_to_msecs(rand_jiffies);
> +	}
> +
> +	/* Ensure the polling time is at least one jiffy */
> +	return max(next_time_ms, jiffy_ms);
> +}
> +
>  static struct phy_driver dp83tg720_driver[] = {
>  {
>  	PHY_ID_MATCH_MODEL(DP83TG720S_PHY_ID),
> @@ -500,6 +575,7 @@ static struct phy_driver dp83tg720_driver[] = {
>  	.get_link_stats	= dp83tg720_get_link_stats,
>  	.get_phy_stats	= dp83tg720_get_phy_stats,
>  	.update_stats	= dp83tg720_update_stats,
> +	.get_next_update_time = dp83tg720_phy_get_next_update_time,
> 
>  	.suspend	= genphy_suspend,
>  	.resume		= genphy_resume,
> --
> 2.39.5
>
Oleksij Rempel Nov. 28, 2024, 7:57 a.m. UTC | #3
On Wed, Nov 27, 2024 at 06:33:03PM +0100, Heiner Kallweit wrote:
> On 27.11.2024 14:10, Oleksij Rempel wrote:
> > Address the limitations of the DP83TG720 PHY, which cannot reliably detect or
> > report a stable link state. To handle this, the PHY must be periodically reset
> > when the link is down. However, synchronized reset intervals between the PHY
> > and its link partner can result in a deadlock, preventing the link from
> > re-establishing.
> > 
> Out of curiosity: This PHY isn't normally quirky, but completely broken.
> Why would anybody use it?

Is it rhetorical question, or you are really curios? I can answer it, but it
will not be a short one ¯\_(ツ)_/¯

> >  /* MDIO_MMD_VEND2 registers */
> > @@ -355,6 +374,11 @@ static int dp83tg720_read_status(struct phy_device *phydev)
> >  		if (ret)
> >  			return ret;
> > 
> > +		/* The sleep value is based on testing with the DP83TG720S-Q1
> > +		 * PHY. The PHY needs some time to recover from a link loss.
> > +		 */
> What is the issue during this "time to recover"?
> Is errata information available from the vendor?

I didn't found errata documentation for this chip. But there is
"DP83TC81x, DP83TG72x Software Implementation Guide" SNLA404, describing
the need of reset each 100ms and PHY Reset + PHY Initialization
sequences:
https://www.ti.com/lit/an/snla404/snla404.pdf

So far, i was not able to find any justification for this delay, except
it helped to reduce amount of PHY resets.
Oleksij Rempel Nov. 28, 2024, 8:07 a.m. UTC | #4
Hi Andrew,

On Wed, Nov 27, 2024 at 04:37:49PM +0100, Andrew Lunn wrote:
> On Wed, Nov 27, 2024 at 02:10:11PM +0100, Oleksij Rempel wrote:
> > Address the limitations of the DP83TG720 PHY, which cannot reliably detect or
> > report a stable link state. To handle this, the PHY must be periodically reset
> > when the link is down. However, synchronized reset intervals between the PHY
> > and its link partner can result in a deadlock, preventing the link from
> > re-establishing.
> > 
> > This change introduces a randomized polling interval when the link is down to
> > desynchronize resets between link partners.
> 
> Hi Oleksij
> 
> What other solutions did you try? I'm wondering if this is more
> complex than it needs to be. Could you add a random delay in
> dp83tg720_read_status() when it decides to do a reset?

Yes, this would be possible, but there are multiple reasons I decided to
go this way:
- in link down case, it is better to increase polling frequency, it
  allows to reduce link up time.
- there are PHYs, for example an integrated to LAN9372 which supports
  only link down interrupt. As long as link is down, it should be
  polled.
- i'm working on generic PHY stats support and PHYs need to be polled,
  even with IRQ support, just less frequently.

I can add it to the commit message.
Andrew Lunn Nov. 28, 2024, 2:29 p.m. UTC | #5
On Thu, Nov 28, 2024 at 09:07:14AM +0100, Oleksij Rempel wrote:
> Hi Andrew,
> 
> On Wed, Nov 27, 2024 at 04:37:49PM +0100, Andrew Lunn wrote:
> > On Wed, Nov 27, 2024 at 02:10:11PM +0100, Oleksij Rempel wrote:
> > > Address the limitations of the DP83TG720 PHY, which cannot reliably detect or
> > > report a stable link state. To handle this, the PHY must be periodically reset
> > > when the link is down. However, synchronized reset intervals between the PHY
> > > and its link partner can result in a deadlock, preventing the link from
> > > re-establishing.
> > > 
> > > This change introduces a randomized polling interval when the link is down to
> > > desynchronize resets between link partners.
> > 
> > Hi Oleksij
> > 
> > What other solutions did you try? I'm wondering if this is more
> > complex than it needs to be. Could you add a random delay in
> > dp83tg720_read_status() when it decides to do a reset?
> 
> Yes, this would be possible, but there are multiple reasons I decided to
> go this way:
> - in link down case, it is better to increase polling frequency, it
>   allows to reduce link up time.
> - there are PHYs, for example an integrated to LAN9372 which supports
>   only link down interrupt. As long as link is down, it should be
>   polled.
> - i'm working on generic PHY stats support and PHYs need to be polled,
>   even with IRQ support, just less frequently.
> 
> I can add it to the commit message.

Yes, more justification would be good.

In general, we try to hide workarounds for broken devices in the
driver, not expose it to all drivers. Variable rate polling, and
polling even when interrupt are enabled does however sound
useful. Cable testing might also be able to use it.

	Andrew
diff mbox series

Patch

diff --git a/drivers/net/phy/dp83tg720.c b/drivers/net/phy/dp83tg720.c
index f56659d41b31..64c65454cf94 100644
--- a/drivers/net/phy/dp83tg720.c
+++ b/drivers/net/phy/dp83tg720.c
@@ -4,12 +4,31 @@ 
  */
 #include <linux/bitfield.h>
 #include <linux/ethtool_netlink.h>
+#include <linux/jiffies.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/phy.h>
+#include <linux/random.h>

 #include "open_alliance_helpers.h"

+/*
+ * DP83TG720S_POLL_ACTIVE_LINK - Polling interval in milliseconds when the link
+ *				 is active.
+ * DP83TG720S_POLL_NO_LINK_MIN - Minimum polling interval in milliseconds when
+ *				 the link is down.
+ * DP83TG720S_POLL_NO_LINK_MAX - Maximum polling interval in milliseconds when
+ *				 the link is down.
+ *
+ * These values are not documented or officially recommended by the vendor but
+ * were determined through empirical testing. They achieve a good balance in
+ * minimizing the number of reset retries while ensuring reliable link recovery
+ * within a reasonable timeframe.
+ */
+#define DP83TG720S_POLL_ACTIVE_LINK		1000
+#define DP83TG720S_POLL_NO_LINK_MIN		100
+#define DP83TG720S_POLL_NO_LINK_MAX		1000
+
 #define DP83TG720S_PHY_ID			0x2000a284

 /* MDIO_MMD_VEND2 registers */
@@ -355,6 +374,11 @@  static int dp83tg720_read_status(struct phy_device *phydev)
 		if (ret)
 			return ret;

+		/* The sleep value is based on testing with the DP83TG720S-Q1
+		 * PHY. The PHY needs some time to recover from a link loss.
+		 */
+		msleep(600);
+
 		/* After HW reset we need to restore master/slave configuration.
 		 * genphy_c45_pma_baset1_read_master_slave() call will be done
 		 * by the dp83tg720_config_aneg() function.
@@ -482,6 +506,57 @@  static int dp83tg720_probe(struct phy_device *phydev)
 	return 0;
 }

+/**
+ * dp83tg720_phy_get_next_update_time - Determine the next update time for PHY
+ *                                      state
+ * @phydev: Pointer to the phy_device structure
+ *
+ * This function addresses a limitation of the DP83TG720 PHY, which cannot
+ * reliably detect or report a stable link state. To recover from such
+ * scenarios, the PHY must be periodically reset when the link is down. However,
+ * if the link partner also runs Linux with the same driver, synchronized reset
+ * intervals can lead to a deadlock where the link never establishes due to
+ * simultaneous resets on both sides.
+ *
+ * To avoid this, the function implements randomized polling intervals when the
+ * link is down. It ensures that reset intervals are desynchronized by
+ * introducing a random delay between a configured minimum and maximum range.
+ * When the link is up, a fixed polling interval is used to minimize overhead.
+ *
+ * This mechanism guarantees that the link will reestablish within 10 seconds
+ * in the worst-case scenario.
+ *
+ * Return: Time (in milliseconds) until the next update event for the PHY state
+ * machine.
+ */
+static unsigned int dp83tg720_phy_get_next_update_time(struct phy_device *phydev)
+{
+	unsigned int jiffy_ms = jiffies_to_msecs(1); /* Jiffy granularity in ms */
+	unsigned int next_time_ms;
+
+	if (phydev->link) {
+		/* When the link is up, use a fixed 1000ms interval */
+		next_time_ms = DP83TG720S_POLL_ACTIVE_LINK;
+	} else {
+		unsigned int min_jiffies, max_jiffies, rand_jiffies;
+		/* When the link is down, randomize interval between
+		 * configured min/max
+		 */
+
+		/* Convert min and max to jiffies */
+		min_jiffies = msecs_to_jiffies(DP83TG720S_POLL_NO_LINK_MIN);
+		max_jiffies = msecs_to_jiffies(DP83TG720S_POLL_NO_LINK_MAX);
+
+		/* Randomize in the jiffie range and convert back to ms */
+		rand_jiffies = min_jiffies +
+			get_random_u32_below(max_jiffies - min_jiffies + 1);
+		next_time_ms = jiffies_to_msecs(rand_jiffies);
+	}
+
+	/* Ensure the polling time is at least one jiffy */
+	return max(next_time_ms, jiffy_ms);
+}
+
 static struct phy_driver dp83tg720_driver[] = {
 {
 	PHY_ID_MATCH_MODEL(DP83TG720S_PHY_ID),
@@ -500,6 +575,7 @@  static struct phy_driver dp83tg720_driver[] = {
 	.get_link_stats	= dp83tg720_get_link_stats,
 	.get_phy_stats	= dp83tg720_get_phy_stats,
 	.update_stats	= dp83tg720_update_stats,
+	.get_next_update_time = dp83tg720_phy_get_next_update_time,

 	.suspend	= genphy_suspend,
 	.resume		= genphy_resume,