diff mbox series

[v2,1/1] net: phy: micrel: Add workaround for incomplete autonegotiation

Message ID 20231227231657.15152-1-asmaa@nvidia.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [v2,1/1] net: phy: micrel: Add workaround for incomplete autonegotiation | expand

Checks

Context Check Description
netdev/series_format warning Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1113 this patch: 1113
netdev/cc_maintainers fail 6 maintainers not CCed: edumazet@google.com pabeni@redhat.com kuba@kernel.org linux@armlinux.org.uk hkallweit1@gmail.com andrew@lunn.ch
netdev/build_clang success Errors and warnings before: 1140 this patch: 1140
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1140 this patch: 1140
netdev/checkpatch fail ERROR: code indent should use tabs where possible WARNING: please, no spaces at the start of a line
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Asmaa Mnebhi Dec. 27, 2023, 11:16 p.m. UTC
Very rarely, the KSZ9031 fails to complete autonegotiation although it was
initiated via phy_start(). As a result, the link stays down. Restarting
autonegotiation when in this state solves the issue.

Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
---
v1->v2:
- Use msleep() instead of mdelay()

 drivers/net/phy/micrel.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

Comments

Florian Fainelli Dec. 28, 2023, 8:43 a.m. UTC | #1
On 12/28/2023 12:16 AM, Asmaa Mnebhi wrote:
> Very rarely, the KSZ9031 fails to complete autonegotiation although it was
> initiated via phy_start(). As a result, the link stays down. Restarting
> autonegotiation when in this state solves the issue.
> 
> Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>

Is there a Micrel errata associated with this work around that could be 
referenced here?
Asmaa Mnebhi Dec. 28, 2023, 1:37 p.m. UTC | #2
> On 12/28/2023 12:16 AM, Asmaa Mnebhi wrote:
> > Very rarely, the KSZ9031 fails to complete autonegotiation although it
> > was initiated via phy_start(). As a result, the link stays down.
> > Restarting autonegotiation when in this state solves the issue.
> >
> > Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
> 
> Is there a Micrel errata associated with this work around that could be
> referenced here?

Hi Florian,

No there isn’t. This is based on observations and comparison with the behavior and testing of other PHYs. For example, we don’t see this issue with the Vitesse PHY.

Thanks.
Asmaa
Heiner Kallweit Dec. 28, 2023, 2:09 p.m. UTC | #3
On 28.12.2023 14:37, Asmaa Mnebhi wrote:
>  > On 12/28/2023 12:16 AM, Asmaa Mnebhi wrote:
>>> Very rarely, the KSZ9031 fails to complete autonegotiation although it
>>> was initiated via phy_start(). As a result, the link stays down.
>>> Restarting autonegotiation when in this state solves the issue.
>>>
>>> Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
>>
>> Is there a Micrel errata associated with this work around that could be
>> referenced here?
> 
> Hi Florian,
> 
> No there isn’t. This is based on observations and comparison with the behavior and testing of other PHYs. For example, we don’t see this issue with the Vitesse PHY.
> 
The Microchip KSZ9031 errata documentation lists few link-related errata.
May any of these be relevant in your case? If not, please check with Microchip.
KSZ9031 isn't new, and most likely we would have seen such reports before,
if there's an actual issue.
I'd like to avoid that we add code to work around an issue that is specific
to your setup.

> Thanks.
> Asmaa
Heiner Kallweit Dec. 28, 2023, 2:17 p.m. UTC | #4
On 28.12.2023 00:16, Asmaa Mnebhi wrote:
> Very rarely, the KSZ9031 fails to complete autonegotiation although it was
> initiated via phy_start(). As a result, the link stays down. Restarting
> autonegotiation when in this state solves the issue.
> 
The patch isn't addressed to all relevant maintainers. Please use the
get_maintainers script.

You should use the net/net-next annotation to make clear whether this should
be treated as a fix (in this case add a Fixes tag) or net-next material.

> Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
> ---
> v1->v2:
> - Use msleep() instead of mdelay()
> 
>  drivers/net/phy/micrel.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c
> index 08e3915001c3..9952a073413f 100644
> --- a/drivers/net/phy/micrel.c
> +++ b/drivers/net/phy/micrel.c
> @@ -1475,6 +1475,7 @@ static int ksz9031_get_features(struct phy_device *phydev)
>  
>  static int ksz9031_read_status(struct phy_device *phydev)
>  {
> +	u8 timeout = 10;
>  	int err;
>  	int regval;
>  
> @@ -1494,6 +1495,22 @@ static int ksz9031_read_status(struct phy_device *phydev)
>  		return genphy_config_aneg(phydev);
>  	}
>  
> +	/* KSZ9031's autonegotiation takes normally 4-5 seconds to complete.
> +	 * Occasionally it fails to complete autonegotiation. The workaround is
> +	 * to restart it.
> +	 */
> +        if (phydev->autoneg == AUTONEG_ENABLE) {
> +		while (timeout) {
> +			if (phy_aneg_done(phydev))
> +				break;
> +			msleep(1000);
> +			timeout--;

It's not too nice to do this synchronously. Even in the non-problem case this
will block the phylib state machine for seconds. Better find a way to do it
asynchronously.

> +		};
> +
> +		if (timeout == 0)
> +			phy_restart_aneg(phydev);
> +	}
> +
>  	return 0;
>  }
>
Asmaa Mnebhi Dec. 28, 2023, 9:21 p.m. UTC | #5
> >> Is there a Micrel errata associated with this work around that could
> >> be referenced here?
> >
> > Hi Florian,
> >
> > No there isn’t. This is based on observations and comparison with the
> behavior and testing of other PHYs. For example, we don’t see this issue with
> the Vitesse PHY.
> >
> The Microchip KSZ9031 errata documentation lists few link-related errata.
> May any of these be relevant in your case? If not, please check with Microchip.
> KSZ9031 isn't new, and most likely we would have seen such reports before, if
> there's an actual issue.
> I'd like to avoid that we add code to work around an issue that is specific to
> your setup.
> 
Thanks Heiner. I went over the errata and there are couple of issues which could result in the link not coming up:

1) Module 1: Device fails to link after Asymmetric Pause capability is set
The micrel.c driver already has a workaround for this and I have verified that when our issue reproduces, only symmetric pause is enabled.

2) Module 5: Auto-Negotiation link-up failure / long link-up time due to default FLP interval setting
The micrel.c driver also already has a workaround for this. 

Apart from the erratas, I see that there were other KSZ9031 issues for which workarounds were needed in the kernel:
1) commit d2fd719bcb0e83cb39cfee22ee800f98a56eceb3
net/phy: micrel: Add workaround for bad autoneg

2) commit c1a8d0a3accf64a014d605e6806ce05d1c17adf1
net: phy: micrel: ksz9031: reconfigure autoneg after phy autoneg workaround

in our case, I have verified that we don’t stumble upon the above 2 bugs (idle error count is 0x0).

Our QA sees that autonegotiation fails to complete after rebooting the system > 2000 times. So it is hard to reproduce.
Our OOB MAC is connected to the Micrel KSZ9031, which is connected to a switch.
I have checked that phy_start() calls phy_start_aneg() and that the genphy_restart_aneg() sets the BMCR_ANRESTART bit. After that, it doesn’t matter how long we wait, the PHY autonegotiation doesn’t complete and the link is down. Restarting autonegotiation a second time solves the issue. 

I will share this information with Microchip. I hope they can help.

Thanks.
Asmaa
diff mbox series

Patch

diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c
index 08e3915001c3..9952a073413f 100644
--- a/drivers/net/phy/micrel.c
+++ b/drivers/net/phy/micrel.c
@@ -1475,6 +1475,7 @@  static int ksz9031_get_features(struct phy_device *phydev)
 
 static int ksz9031_read_status(struct phy_device *phydev)
 {
+	u8 timeout = 10;
 	int err;
 	int regval;
 
@@ -1494,6 +1495,22 @@  static int ksz9031_read_status(struct phy_device *phydev)
 		return genphy_config_aneg(phydev);
 	}
 
+	/* KSZ9031's autonegotiation takes normally 4-5 seconds to complete.
+	 * Occasionally it fails to complete autonegotiation. The workaround is
+	 * to restart it.
+	 */
+        if (phydev->autoneg == AUTONEG_ENABLE) {
+		while (timeout) {
+			if (phy_aneg_done(phydev))
+				break;
+			msleep(1000);
+			timeout--;
+		};
+
+		if (timeout == 0)
+			phy_restart_aneg(phydev);
+	}
+
 	return 0;
 }