Message ID | fb736ae9a0af7616c20c36264aaec8702abc84ae.1728056939.git.daniel@makrotopia.org (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net-next] net: phy: realtek: check validity of 10GbE link-partner advertisement | expand |
On Fri, Oct 04, 2024 at 04:50:36PM +0100, Daniel Golle wrote: > Only use link-partner advertisement bits for 10GbE modes if they are > actually valid. Check LOCALOK and REMOTEOK bits and clear 10GbE modes > unless both of them are set. > This prevents misinterpreting the stale 2500M link-partner advertisement > bit in case a subsequent linkpartner doesn't do any NBase-T > advertisement at all. > > Signed-off-by: Daniel Golle <daniel@makrotopia.org> > --- > drivers/net/phy/realtek.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c > index c4d0d93523ad..d276477cf511 100644 > --- a/drivers/net/phy/realtek.c > +++ b/drivers/net/phy/realtek.c > @@ -927,6 +927,10 @@ static int rtl822x_read_status(struct phy_device *phydev) > if (lpadv < 0) > return lpadv; > > + if (!(lpadv & MDIO_AN_10GBT_STAT_REMOK) || > + !(lpadv & MDIO_AN_10GBT_STAT_LOCOK)) > + lpadv = 0; > + > mii_10gbt_stat_mod_linkmode_lpa_t(phydev->lp_advertising, > lpadv); I know lpadv is coming from a vendor register, but does MDIO_AN_10GBT_STAT_LOCOK and MDIO_AN_10GBT_STAT_REMOK apply if it was also from the register defined in 802.3? I'm just wondering if this test should be inside mii_10gbt_stat_mod_linkmode_lpa_t()? Andrew
On Fri, Oct 04, 2024 at 11:17:28PM +0200, Andrew Lunn wrote: > On Fri, Oct 04, 2024 at 04:50:36PM +0100, Daniel Golle wrote: > > Only use link-partner advertisement bits for 10GbE modes if they are > > actually valid. Check LOCALOK and REMOTEOK bits and clear 10GbE modes > > unless both of them are set. > > This prevents misinterpreting the stale 2500M link-partner advertisement > > bit in case a subsequent linkpartner doesn't do any NBase-T > > advertisement at all. > > > > Signed-off-by: Daniel Golle <daniel@makrotopia.org> > > --- > > drivers/net/phy/realtek.c | 4 ++++ > > 1 file changed, 4 insertions(+) > > > > diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c > > index c4d0d93523ad..d276477cf511 100644 > > --- a/drivers/net/phy/realtek.c > > +++ b/drivers/net/phy/realtek.c > > @@ -927,6 +927,10 @@ static int rtl822x_read_status(struct phy_device *phydev) > > if (lpadv < 0) > > return lpadv; > > > > + if (!(lpadv & MDIO_AN_10GBT_STAT_REMOK) || > > + !(lpadv & MDIO_AN_10GBT_STAT_LOCOK)) > > + lpadv = 0; > > + > > mii_10gbt_stat_mod_linkmode_lpa_t(phydev->lp_advertising, > > lpadv); > > I know lpadv is coming from a vendor register, but does > MDIO_AN_10GBT_STAT_LOCOK and MDIO_AN_10GBT_STAT_REMOK apply if it was > also from the register defined in 802.3? I'm just wondering if this > test should be inside mii_10gbt_stat_mod_linkmode_lpa_t()? Yes, it does apply and I thought the same, but as mii_10gbt_stat_mod_linkmode_lpa_t is used in various places without checking those two bits we may break other PHYs which may not use them (and apparently this is mostly a problem on RealTek PHYs where all the other bits in the register persist in case of a non-NBase-T- capable subsequent link-partner after initially being connected to an NBase-T-capable one). Maybe we could introduce a new function mii_10gbt_stat_mod_linkmode_lpa_validate_t() which calls mii_10gbt_stat_mod_linkmode_lpa_t() but checks LOCOK and REMOK as a precondition?
On Fri, Oct 04, 2024 at 11:06:01PM +0100, Daniel Golle wrote: > On Fri, Oct 04, 2024 at 11:17:28PM +0200, Andrew Lunn wrote: > > On Fri, Oct 04, 2024 at 04:50:36PM +0100, Daniel Golle wrote: > > > Only use link-partner advertisement bits for 10GbE modes if they are > > > actually valid. Check LOCALOK and REMOTEOK bits and clear 10GbE modes > > > unless both of them are set. > > > This prevents misinterpreting the stale 2500M link-partner advertisement > > > bit in case a subsequent linkpartner doesn't do any NBase-T > > > advertisement at all. > > > > > > Signed-off-by: Daniel Golle <daniel@makrotopia.org> > > > --- > > > drivers/net/phy/realtek.c | 4 ++++ > > > 1 file changed, 4 insertions(+) > > > > > > diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c > > > index c4d0d93523ad..d276477cf511 100644 > > > --- a/drivers/net/phy/realtek.c > > > +++ b/drivers/net/phy/realtek.c > > > @@ -927,6 +927,10 @@ static int rtl822x_read_status(struct phy_device *phydev) > > > if (lpadv < 0) > > > return lpadv; > > > > > > + if (!(lpadv & MDIO_AN_10GBT_STAT_REMOK) || > > > + !(lpadv & MDIO_AN_10GBT_STAT_LOCOK)) > > > + lpadv = 0; > > > + > > > mii_10gbt_stat_mod_linkmode_lpa_t(phydev->lp_advertising, > > > lpadv); > > > > I know lpadv is coming from a vendor register, but does > > MDIO_AN_10GBT_STAT_LOCOK and MDIO_AN_10GBT_STAT_REMOK apply if it was > > also from the register defined in 802.3? I'm just wondering if this > > test should be inside mii_10gbt_stat_mod_linkmode_lpa_t()? > > Yes, it does apply and I thought the same, but as > mii_10gbt_stat_mod_linkmode_lpa_t is used in various places without > checking those two bits we may break other PHYs which may not use > them (and apparently this is mostly a problem on RealTek PHYs where > all the other bits in the register persist in case of a non-NBase-T- > capable subsequent link-partner after initially being connected to > an NBase-T-capable one). > > Maybe we could introduce a new function > mii_10gbt_stat_mod_linkmode_lpa_validate_t() > which calls mii_10gbt_stat_mod_linkmode_lpa_t() but checks LOCOK and > REMOK as a precondition? Isn't the link status supposed to indicate link down of LOCOK is clear? Maybe checking these bits should be included in the link status check, and if not set, then phydev->link should be cleared?
On Tue, Oct 08, 2024 at 12:10:07PM +0100, Russell King (Oracle) wrote: > On Fri, Oct 04, 2024 at 11:06:01PM +0100, Daniel Golle wrote: > > On Fri, Oct 04, 2024 at 11:17:28PM +0200, Andrew Lunn wrote: > > > On Fri, Oct 04, 2024 at 04:50:36PM +0100, Daniel Golle wrote: > > > > Only use link-partner advertisement bits for 10GbE modes if they are > > > > actually valid. Check LOCALOK and REMOTEOK bits and clear 10GbE modes > > > > unless both of them are set. > > > > This prevents misinterpreting the stale 2500M link-partner advertisement > > > > bit in case a subsequent linkpartner doesn't do any NBase-T > > > > advertisement at all. > > > > > > > > Signed-off-by: Daniel Golle <daniel@makrotopia.org> > > > > --- > > > > drivers/net/phy/realtek.c | 4 ++++ > > > > 1 file changed, 4 insertions(+) > > > > > > > > diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c > > > > index c4d0d93523ad..d276477cf511 100644 > > > > --- a/drivers/net/phy/realtek.c > > > > +++ b/drivers/net/phy/realtek.c > > > > @@ -927,6 +927,10 @@ static int rtl822x_read_status(struct phy_device *phydev) > > > > if (lpadv < 0) > > > > return lpadv; > > > > > > > > + if (!(lpadv & MDIO_AN_10GBT_STAT_REMOK) || > > > > + !(lpadv & MDIO_AN_10GBT_STAT_LOCOK)) > > > > + lpadv = 0; > > > > + > > > > mii_10gbt_stat_mod_linkmode_lpa_t(phydev->lp_advertising, > > > > lpadv); > > > > > > I know lpadv is coming from a vendor register, but does > > > MDIO_AN_10GBT_STAT_LOCOK and MDIO_AN_10GBT_STAT_REMOK apply if it was > > > also from the register defined in 802.3? I'm just wondering if this > > > test should be inside mii_10gbt_stat_mod_linkmode_lpa_t()? > > > > Yes, it does apply and I thought the same, but as > > mii_10gbt_stat_mod_linkmode_lpa_t is used in various places without > > checking those two bits we may break other PHYs which may not use > > them (and apparently this is mostly a problem on RealTek PHYs where > > all the other bits in the register persist in case of a non-NBase-T- > > capable subsequent link-partner after initially being connected to > > an NBase-T-capable one). > > > > Maybe we could introduce a new function > > mii_10gbt_stat_mod_linkmode_lpa_validate_t() > > which calls mii_10gbt_stat_mod_linkmode_lpa_t() but checks LOCOK and > > REMOK as a precondition? > > Isn't the link status supposed to indicate link down of LOCOK > is clear? > > Maybe checking these bits should be included in the link status > check, and if not set, then phydev->link should be cleared? At least in case of those RealTek PHYs the situation is a bit different: The AN_10GBT bits do get set according to the link partner advertisement in case the link partner does any 10GBT advertisement at all. Now, if after being connected to a link partner which does 10GBT advertisement you subsequently connect to a link partner which doesn't do any 10GBT advertisement at all, the previously advertised bits remain in the register, just REMOK and LOCOK aren't set. That obviously doesn't imply that the link is down. I noticed it because I would see a downshift warning in kernel logs even though the new link partner was not capable of connecting with speeds higher than 1000MBit/s. Note that some that this doesn't happen with all 1GBE NICs, because some seem to carry out 10GBT advertisement but just all empty while others just don't do any 10GBT advertisement at all.
On Tue, Oct 08, 2024 at 12:59:17PM +0100, Daniel Golle wrote: > On Tue, Oct 08, 2024 at 12:10:07PM +0100, Russell King (Oracle) wrote: > > On Fri, Oct 04, 2024 at 11:06:01PM +0100, Daniel Golle wrote: > > > On Fri, Oct 04, 2024 at 11:17:28PM +0200, Andrew Lunn wrote: > > > > On Fri, Oct 04, 2024 at 04:50:36PM +0100, Daniel Golle wrote: > > > > > Only use link-partner advertisement bits for 10GbE modes if they are > > > > > actually valid. Check LOCALOK and REMOTEOK bits and clear 10GbE modes > > > > > unless both of them are set. > > > > > This prevents misinterpreting the stale 2500M link-partner advertisement > > > > > bit in case a subsequent linkpartner doesn't do any NBase-T > > > > > advertisement at all. > > > > > > > > > > Signed-off-by: Daniel Golle <daniel@makrotopia.org> > > > > > --- > > > > > drivers/net/phy/realtek.c | 4 ++++ > > > > > 1 file changed, 4 insertions(+) > > > > > > > > > > diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c > > > > > index c4d0d93523ad..d276477cf511 100644 > > > > > --- a/drivers/net/phy/realtek.c > > > > > +++ b/drivers/net/phy/realtek.c > > > > > @@ -927,6 +927,10 @@ static int rtl822x_read_status(struct phy_device *phydev) > > > > > if (lpadv < 0) > > > > > return lpadv; > > > > > > > > > > + if (!(lpadv & MDIO_AN_10GBT_STAT_REMOK) || > > > > > + !(lpadv & MDIO_AN_10GBT_STAT_LOCOK)) > > > > > + lpadv = 0; > > > > > + > > > > > mii_10gbt_stat_mod_linkmode_lpa_t(phydev->lp_advertising, > > > > > lpadv); > > > > > > > > I know lpadv is coming from a vendor register, but does > > > > MDIO_AN_10GBT_STAT_LOCOK and MDIO_AN_10GBT_STAT_REMOK apply if it was > > > > also from the register defined in 802.3? I'm just wondering if this > > > > test should be inside mii_10gbt_stat_mod_linkmode_lpa_t()? > > > > > > Yes, it does apply and I thought the same, but as > > > mii_10gbt_stat_mod_linkmode_lpa_t is used in various places without > > > checking those two bits we may break other PHYs which may not use > > > them (and apparently this is mostly a problem on RealTek PHYs where > > > all the other bits in the register persist in case of a non-NBase-T- > > > capable subsequent link-partner after initially being connected to > > > an NBase-T-capable one). > > > > > > Maybe we could introduce a new function > > > mii_10gbt_stat_mod_linkmode_lpa_validate_t() > > > which calls mii_10gbt_stat_mod_linkmode_lpa_t() but checks LOCOK and > > > REMOK as a precondition? > > > > Isn't the link status supposed to indicate link down of LOCOK > > is clear? > > > > Maybe checking these bits should be included in the link status > > check, and if not set, then phydev->link should be cleared? > > At least in case of those RealTek PHYs the situation is a bit different: > The AN_10GBT bits do get set according to the link partner advertisement > in case the link partner does any 10GBT advertisement at all. > Now, if after being connected to a link partner which does 10GBT > advertisement you subsequently connect to a link partner which doesn't > do any 10GBT advertisement at all, the previously advertised bits > remain in the register, just REMOK and LOCOK aren't set. > That obviously doesn't imply that the link is down. > I noticed it because I would see a downshift warning in kernel logs > even though the new link partner was not capable of connecting with > speeds higher than 1000MBit/s. > Note that some that this doesn't happen with all 1GBE NICs, because > some seem to carry out 10GBT advertisement but just all empty while > others just don't do any 10GBT advertisement at all. Let's start checking what we're doing with regards to this register. 7.33.11 (Link Partner 10GBASE-T capability) states that this is only valid when the Page received bit (7.1.6) has been set. This is the BMSR_ANEGCOMPLETE / MDIO_AN_STAT1_COMPLETE bit. Looking at rtl822x_read_status, which is called directly as a .read_status() method, it reads some register that might be the equivalent of MMD 7 Register 33 (if that's what 0xa5d, 0x13 is, 0xa5d, 0x12 seems to be MDIO_AN_10GBT_CTRL) whether or not the link is up and whether or not AN has completed. It's only conditional on Autoneg being enabled. However, we don't look at 7.1.6, which is wrong according to 802.3. So I think the first thing that's needed here is that needs fixing - we should only be reading the LP ability registers when (a) we have link, and (b) when the PHY indicates that config pages have been received. The next thing that needs fixing is to add support for checking these LOCOK/REMOK bits - and if these are specific to the result of the negotiation (there's some hints in 802.3 that's the case, as there are other registers with similar bits in, but I haven't looked deeply at it) then, since the resolution is done in core PHY code, I think we need another method into drivers to check these bits once resolution has occurred. However, I need to do more reading to work this out.
On Tue, Oct 08, 2024 at 01:45:03PM +0100, Russell King (Oracle) wrote: > Let's start checking what we're doing with regards to this register. > > 7.33.11 (Link Partner 10GBASE-T capability) states that this is only > valid when the Page received bit (7.1.6) has been set. This is the > BMSR_ANEGCOMPLETE / MDIO_AN_STAT1_COMPLETE bit. > > Looking at rtl822x_read_status, which is called directly as a > .read_status() method, it reads some register that might be the > equivalent of MMD 7 Register 33 (if that's what 0xa5d, 0x13 is, > 0xa5d, 0x12 seems to be MDIO_AN_10GBT_CTRL) whether or not the link > is up and whether or not AN has completed. It's only conditional on > Autoneg being enabled. > > However, we don't look at 7.1.6, which is wrong according to 802.3. > So I think the first thing that's needed here is that needs fixing > - we should only be reading the LP ability registers when (a) we > have link, and (b) when the PHY indicates that config pages have > been received. > > The next thing that needs fixing is to add support for checking > these LOCOK/REMOK bits - and if these are specific to the result of > the negotiation (there's some hints in 802.3 that's the case, as > there are other registers with similar bits in, but I haven't > looked deeply at it) then, since the resolution is done in core > PHY code, I think we need another method into drivers to check > these bits once resolution has occurred. Okay, I think the problem is down to the order in which Realtek is doing stuff. genphy_read_status() calls genphy_update_link(), which updates phydev->link and phydev->autoneg_complete from the BMSR, and then goes on to call genphy_read_lpa(). Looking at genphy_read_lpa(): if (phydev->autoneg == AUTONEG_ENABLE) { if (!phydev->autoneg_complete) { mii_stat1000_mod_linkmode_lpa_t(phydev->lp_advertising, 0); mii_lpa_mod_linkmode_lpa_t(phydev->lp_advertising, 0); return 0; } So, if BMSR_ANEGCOMPLETE is not set, then we zero the 1G FD/HD, Autoneg, Pause, Asym Pause and 100/10M FD/HD fields in the LPA leaving everything else alone - and then do nothing further. In other words, we don't read the LPA registers and update these bits. Looking at genphy_c45_read_lpa(): if (!(val & MDIO_AN_STAT1_COMPLETE)) { linkmode_clear_bit(ETHTOOL_LINK_MODE_Autoneg_BIT, phydev->lp_advertising); mii_10gbt_stat_mod_linkmode_lpa_t(phydev->lp_advertising, 0); mii_adv_mod_linkmode_adv_t(phydev->lp_advertising, 0); phydev->pause = 0; phydev->asym_pause = 0; return 0; So that's basically the same thing - if the MDIO_AN_STAT1_COMPLETE is clear, then we clear the 10G stuff. I think that mii_adv_mod_linkmode_adv_t() is wrong here, it should be mii_lpa_mod_linkmode_lpa_t(). However, the principle here is that if !autoneg_complete, then the modes that would've been set by the respective function need to be cleared. Now, rtl822x_read_status() reads the 10G status, modifying phydev->lp_advertising before then going on to call rtlgen_read_status(), which then calls genphy_read_status(), which in turn will then call genphy_read_lpa(). First, this is the wrong way around. Realtek needs to call genphy_read_status() so that phydev->link and phydev->autoneg_complete are both updated to the current status. Then, it needs to check whether AN is enabled, and whether autoneg has completed and deal with both situations. Afterwards, it then *possibly* needs to read its speed register and decode that to phydev->speed, but I don't see the point of that when it's (a) not able to also decode the duplex from that register, and (b) when we've already resolved it ourselves from the link mode. What I'd be worried about is if the PHY does a down-shift to a different speed _and_ duplex from what was resolved - and thus whether we should even be enabling downshift on this PHY. Maybe there's a bit in 0xa43 0x12 that gives us the duplex as well? In other words: static int rtl822x_read_status(struct phy_device *phydev) { int lpadv, ret; ret = rtlgen_read_status(phydev); if (ret < 0) return ret; if (phydev->autoneg == AUTONEG_DISABLE) return 0; if (!phydev->autoneg_complete) { mii_10gbt_stat_mod_linkmode_lpa_t(phydev->lp_advertising, 0); return 0; } lpadv = phy_read_paged(phydev, 0xa5d, 0x13); if (lpadv < 0) return lpadv; mii_10gbt_stat_mod_linkmode_lpa_t(phydev->lp_advertising, lpadv); phy_resolve_aneg_linkmode(phydev); return 0; } That should at least get proper behaviour in the link partner advertising bitmap rather than the weirdness that Realtek is doing. (BTW, other drivers should be audited for the same bug!) Now, if we still have the stale 2500M problem, then I'd suggest: if (phydev->speed >= 2500 && !(lpadv & MDIO_AN_10GBT_STAT_LOCOK)) { /* Possible stale advertisement causing incorrect * resolution. */ mii_10gbt_stat_mod_linkmode_lpa_t(phydev->lp_advertising, 0); phy_resolve_aneg_linkmode(phydev); } here. However, if we keep the rtlgen_decode_speed() stuff, and can fix the duplex issue, then the phy_resolve_aneg_linkmode() calls should not be necessary, and it should be moved _after_ this to ensure that phydev->speed (and phydev->duplex) are correctly set.
Hi Russell, On Tue, Oct 08, 2024 at 03:27:21PM +0100, Russell King (Oracle) wrote: > Okay, I think the problem is down to the order in which Realtek is > doing stuff. > [...] > Now, rtl822x_read_status() reads the 10G status, modifying > phydev->lp_advertising before then going on to call > rtlgen_read_status(), which then calls genphy_read_status(), which > in turn will then call genphy_read_lpa(). > > First, this is the wrong way around. Realtek needs to call > genphy_read_status() so that phydev->link and phydev->autoneg_complete > are both updated to the current status. First of all thanks a lot for diving down that rabbit hole with me! > > Then, it needs to check whether AN is enabled, and whether autoneg > has completed and deal with both situations. > > Afterwards, it then *possibly* needs to read its speed register and > decode that to phydev->speed, but I don't see the point of that when > it's (a) not able to also decode the duplex from that register, and > (b) when we've already resolved it ourselves from the link mode. > What I'd be worried about is if the PHY does a down-shift to a > different speed _and_ duplex from what was resolved - and thus > whether we should even be enabling downshift on this PHY. Maybe > there's a bit in 0xa43 0x12 that gives us the duplex as well? > > In other words: > > static int rtl822x_read_status(struct phy_device *phydev) > { > int lpadv, ret; > > ret = rtlgen_read_status(phydev); > if (ret < 0) > return ret; > > if (phydev->autoneg == AUTONEG_DISABLE) > return 0; > > if (!phydev->autoneg_complete) { > mii_10gbt_stat_mod_linkmode_lpa_t(phydev->lp_advertising, 0); > return 0; > } > > lpadv = phy_read_paged(phydev, 0xa5d, 0x13); > if (lpadv < 0) > return lpadv; > > mii_10gbt_stat_mod_linkmode_lpa_t(phydev->lp_advertising, lpadv); > phy_resolve_aneg_linkmode(phydev); > > return 0; > } > > That should at least get proper behaviour in the link partner > advertising bitmap rather than the weirdness that Realtek is doing. > (BTW, other drivers should be audited for the same bug!) Got it, always do genphy_read_status() first thing, as that will clear things and set autoneg_complete. Similarly, when dealing with the same PHY in C45 mode, I noticed that phy->autoneg_complete never gets set, but rather we have to check it via genphy_c45_aneg_done(phydev) and clear bits set by mii_stat1000_mod_linkmode_lpa_t(). Doing so for C45 access, and following your suggestion above for C22 resolves the issue without any need to check MDIO_AN_10GBT_STAT_LOCOK or MDIO_AN_10GBT_STAT_REMOK. > [...] > However, if we keep the rtlgen_decode_speed() stuff, and can fix the > duplex issue, then the phy_resolve_aneg_linkmode() calls should not > be necessary, and it should be moved _after_ this to ensure that > phydev->speed (and phydev->duplex) are correctly set. PHY Specific Status Register, MMD 31.0xA434 also carries duplex information in bit 3 as well as more useful information. Probably rtlgen_decode_speed() should be renamed to rtlgen_decode_physr() and decode most of that. I'll post a series taking care of all of that shortly. Again, thanks a lot for the extremely insightful lesson! Cheers Daniel
diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c index c4d0d93523ad..d276477cf511 100644 --- a/drivers/net/phy/realtek.c +++ b/drivers/net/phy/realtek.c @@ -927,6 +927,10 @@ static int rtl822x_read_status(struct phy_device *phydev) if (lpadv < 0) return lpadv; + if (!(lpadv & MDIO_AN_10GBT_STAT_REMOK) || + !(lpadv & MDIO_AN_10GBT_STAT_LOCOK)) + lpadv = 0; + mii_10gbt_stat_mod_linkmode_lpa_t(phydev->lp_advertising, lpadv); }
Only use link-partner advertisement bits for 10GbE modes if they are actually valid. Check LOCALOK and REMOTEOK bits and clear 10GbE modes unless both of them are set. This prevents misinterpreting the stale 2500M link-partner advertisement bit in case a subsequent linkpartner doesn't do any NBase-T advertisement at all. Signed-off-by: Daniel Golle <daniel@makrotopia.org> --- drivers/net/phy/realtek.c | 4 ++++ 1 file changed, 4 insertions(+)