mbox series

[net-next,0/5] net: phylink: fix PCS without autoneg

Message ID Z3_n_5BXkxQR4zEG@shell.armlinux.org.uk (mailing list archive)
Headers show
Series net: phylink: fix PCS without autoneg | expand

Message

Russell King (Oracle) Jan. 9, 2025, 3:15 p.m. UTC
Hi,

Eric Woudstra reported that a PCS attached using 2500base-X does not
see link when phylink is using in-band mode, but autoneg is disabled,
despite there being a valid 2500base-X signal being received. We have
these settings:

	act_link_an_mode = MLO_AN_INBAND
	pcs_neg_mode = PHYLINK_PCS_NEG_INBAND_DISABLED

Eric diagnosed it to phylink_decode_c37_word() setting state->link
false because the full-duplex bit isn't set in the non-existent link
partner advertisement word (which doesn't exist because in-band
autoneg is disabled!)

The test in phylink_mii_c22_pcs_decode_state() is supposed to catch
this state, but since we converted PCS to use neg_mode, testing the
Autoneg in the local advertisement is no longer sufficient - we need
to be looking at the neg_mode, which currently isn't provided.

We need to provide this via the .pcs_get_state() method, and this
will require modifying all PCS implementations to add the extra
argument to this method.

Patch 1 uses the PCS neg_mode in phylink_mac_pcs_get_state() to correct
the now obsolute usage of the Autoneg bit in the advertisement.

Patch 2 passes neg_mode into the .pcs_get_state() method, and updates
all users.

Patch 3 adds neg_mode as an argument to the various clause 22 state
decoder functions in phylink, modifying drivers to pass the neg_mode
through.

Patch 4 makes use of phylink_mii_c22_pcs_decode_state() rather than
using the Autoneg bit in the advertising field.

Patch 5 may be required for Eric's case - it ensures that we report
the correct state for interface types that we support only one set
of modes for when autoneg is disabled.

 drivers/net/dsa/b53/b53_serdes.c                   |  4 +-
 drivers/net/dsa/mt7530.c                           |  2 +-
 drivers/net/dsa/mv88e6xxx/pcs-6185.c               |  1 +
 drivers/net/dsa/mv88e6xxx/pcs-6352.c               |  1 +
 drivers/net/dsa/mv88e6xxx/pcs-639x.c               |  5 +-
 drivers/net/dsa/qca/qca8k-8xxx.c                   |  2 +-
 drivers/net/ethernet/cadence/macb_main.c           |  3 +-
 drivers/net/ethernet/freescale/fman/fman_dtsec.c   |  4 +-
 drivers/net/ethernet/marvell/mvneta.c              |  2 +-
 drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c    |  2 +
 .../net/ethernet/marvell/prestera/prestera_main.c  |  1 +
 drivers/net/ethernet/meta/fbnic/fbnic_phylink.c    |  2 +-
 .../net/ethernet/microchip/lan966x/lan966x_main.h  |  2 +-
 .../ethernet/microchip/lan966x/lan966x_phylink.c   |  3 +-
 .../net/ethernet/microchip/lan966x/lan966x_port.c  |  4 +-
 .../net/ethernet/microchip/sparx5/sparx5_phylink.c |  2 +-
 drivers/net/ethernet/xilinx/xilinx_axienet_main.c  |  3 +-
 drivers/net/pcs/pcs-lynx.c                         |  4 +-
 drivers/net/pcs/pcs-mtk-lynxi.c                    |  4 +-
 drivers/net/pcs/pcs-xpcs.c                         |  7 +--
 drivers/net/phy/phylink.c                          | 60 ++++++++++++++++------
 include/linux/phylink.h                            | 11 ++--
 22 files changed, 87 insertions(+), 42 deletions(-)

Comments

Russell King (Oracle) Jan. 13, 2025, 9:21 a.m. UTC | #1
Hi,

Eric Woudstra reported that a PCS attached using 2500base-X does not
see link when phylink is using in-band mode, but autoneg is disabled,
despite there being a valid 2500base-X signal being received. We have
these settings:

	act_link_an_mode = MLO_AN_INBAND
	pcs_neg_mode = PHYLINK_PCS_NEG_INBAND_DISABLED

Eric diagnosed it to phylink_decode_c37_word() setting state->link
false because the full-duplex bit isn't set in the non-existent link
partner advertisement word (which doesn't exist because in-band
autoneg is disabled!)

The test in phylink_mii_c22_pcs_decode_state() is supposed to catch
this state, but since we converted PCS to use neg_mode, testing the
Autoneg in the local advertisement is no longer sufficient - we need
to be looking at the neg_mode, which currently isn't provided.

We need to provide this via the .pcs_get_state() method, and this
will require modifying all PCS implementations to add the extra
argument to this method.

Patch 1 uses the PCS neg_mode in phylink_mac_pcs_get_state() to correct
the now obsolute usage of the Autoneg bit in the advertisement.

Patch 2 passes neg_mode into the .pcs_get_state() method, and updates
all users.

Patch 3 adds neg_mode as an argument to the various clause 22 state
decoder functions in phylink, modifying drivers to pass the neg_mode
through.

Patch 4 makes use of phylink_mii_c22_pcs_decode_state() rather than
using the Autoneg bit in the advertising field.

Patch 5 may be required for Eric's case - it ensures that we report
the correct state for interface types that we support only one set
of modes for when autoneg is disabled.

Changes in v2:
- Add test for NULL pcs in patch 1

I haven't added Eric's t-b because I used a different fix in patch 1.

 drivers/net/dsa/b53/b53_serdes.c                   |  4 +-
 drivers/net/dsa/mt7530.c                           |  2 +-
 drivers/net/dsa/mv88e6xxx/pcs-6185.c               |  1 +
 drivers/net/dsa/mv88e6xxx/pcs-6352.c               |  1 +
 drivers/net/dsa/mv88e6xxx/pcs-639x.c               |  5 +-
 drivers/net/dsa/qca/qca8k-8xxx.c                   |  2 +-
 drivers/net/ethernet/cadence/macb_main.c           |  3 +-
 drivers/net/ethernet/freescale/fman/fman_dtsec.c   |  4 +-
 drivers/net/ethernet/marvell/mvneta.c              |  2 +-
 drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c    |  2 +
 .../net/ethernet/marvell/prestera/prestera_main.c  |  1 +
 drivers/net/ethernet/meta/fbnic/fbnic_phylink.c    |  2 +-
 .../net/ethernet/microchip/lan966x/lan966x_main.h  |  2 +-
 .../ethernet/microchip/lan966x/lan966x_phylink.c   |  3 +-
 .../net/ethernet/microchip/lan966x/lan966x_port.c  |  4 +-
 .../net/ethernet/microchip/sparx5/sparx5_phylink.c |  2 +-
 drivers/net/ethernet/xilinx/xilinx_axienet_main.c  |  3 +-
 drivers/net/pcs/pcs-lynx.c                         |  4 +-
 drivers/net/pcs/pcs-mtk-lynxi.c                    |  4 +-
 drivers/net/pcs/pcs-xpcs.c                         |  7 +--
 drivers/net/phy/phylink.c                          | 60 ++++++++++++++++------
 include/linux/phylink.h                            | 11 ++--
 22 files changed, 87 insertions(+), 42 deletions(-)
Russell King (Oracle) Jan. 13, 2025, 10:14 a.m. UTC | #2
On Mon, Jan 13, 2025 at 09:21:18AM +0000, Russell King (Oracle) wrote:
> Hi,
> 
> Eric Woudstra reported that a PCS attached using 2500base-X does not
> see link when phylink is using in-band mode, but autoneg is disabled,
> despite there being a valid 2500base-X signal being received. We have
> these settings:

Please ignore this inappropriately threaded cover message. Thanks.
Eric Woudstra Jan. 14, 2025, 10:59 a.m. UTC | #3
On 1/13/25 10:21 AM, Russell King (Oracle) wrote:
> Hi,
> 
> Eric Woudstra reported that a PCS attached using 2500base-X does not
> see link when phylink is using in-band mode, but autoneg is disabled,
> despite there being a valid 2500base-X signal being received. We have
> these settings:
> 
> 	act_link_an_mode = MLO_AN_INBAND
> 	pcs_neg_mode = PHYLINK_PCS_NEG_INBAND_DISABLED
> 
> Eric diagnosed it to phylink_decode_c37_word() setting state->link
> false because the full-duplex bit isn't set in the non-existent link
> partner advertisement word (which doesn't exist because in-band
> autoneg is disabled!)
> 
> The test in phylink_mii_c22_pcs_decode_state() is supposed to catch
> this state, but since we converted PCS to use neg_mode, testing the
> Autoneg in the local advertisement is no longer sufficient - we need
> to be looking at the neg_mode, which currently isn't provided.
> 
> We need to provide this via the .pcs_get_state() method, and this
> will require modifying all PCS implementations to add the extra
> argument to this method.
> 
> Patch 1 uses the PCS neg_mode in phylink_mac_pcs_get_state() to correct
> the now obsolute usage of the Autoneg bit in the advertisement.
> 
> Patch 2 passes neg_mode into the .pcs_get_state() method, and updates
> all users.
> 
> Patch 3 adds neg_mode as an argument to the various clause 22 state
> decoder functions in phylink, modifying drivers to pass the neg_mode
> through.
> 
> Patch 4 makes use of phylink_mii_c22_pcs_decode_state() rather than
> using the Autoneg bit in the advertising field.
> 
> Patch 5 may be required for Eric's case - it ensures that we report
> the correct state for interface types that we support only one set
> of modes for when autoneg is disabled.
> 
> Changes in v2:
> - Add test for NULL pcs in patch 1
> 
> I haven't added Eric's t-b because I used a different fix in patch 1.

So I tested this V2 patch and with the first link up command, I get the
link up which is functional end to end.

Tested-by: Eric Woudstra <ericwouds@gmail.com>

PS, FYI: I do however still have the difference in phylink_mac_config().

At first the link is up with (before phy attached):
phylink_mac_config: mode=inband/2500base-x

When the phy is attached, phylink_mac_config() is not called, but we
have functional link.

When I do: ethtool -s eth1 advertise 0x28, then:
phylink_mac_config: mode=inband/sgmii

And back again: ethtool -s eth1 advertise 0x800000000028, then:
phylink_mac_config: mode=phy/2500base-x

I can share more log entries if needed.
Russell King (Oracle) Jan. 14, 2025, 11:14 a.m. UTC | #4
On Tue, Jan 14, 2025 at 11:59:06AM +0100, Eric Woudstra wrote:
> On 1/13/25 10:21 AM, Russell King (Oracle) wrote:
> > Hi,
> > 
> > Eric Woudstra reported that a PCS attached using 2500base-X does not
> > see link when phylink is using in-band mode, but autoneg is disabled,
> > despite there being a valid 2500base-X signal being received. We have
> > these settings:
> > 
> > 	act_link_an_mode = MLO_AN_INBAND
> > 	pcs_neg_mode = PHYLINK_PCS_NEG_INBAND_DISABLED
> > 
> > Eric diagnosed it to phylink_decode_c37_word() setting state->link
> > false because the full-duplex bit isn't set in the non-existent link
> > partner advertisement word (which doesn't exist because in-band
> > autoneg is disabled!)
> > 
> > The test in phylink_mii_c22_pcs_decode_state() is supposed to catch
> > this state, but since we converted PCS to use neg_mode, testing the
> > Autoneg in the local advertisement is no longer sufficient - we need
> > to be looking at the neg_mode, which currently isn't provided.
> > 
> > We need to provide this via the .pcs_get_state() method, and this
> > will require modifying all PCS implementations to add the extra
> > argument to this method.
> > 
> > Patch 1 uses the PCS neg_mode in phylink_mac_pcs_get_state() to correct
> > the now obsolute usage of the Autoneg bit in the advertisement.
> > 
> > Patch 2 passes neg_mode into the .pcs_get_state() method, and updates
> > all users.
> > 
> > Patch 3 adds neg_mode as an argument to the various clause 22 state
> > decoder functions in phylink, modifying drivers to pass the neg_mode
> > through.
> > 
> > Patch 4 makes use of phylink_mii_c22_pcs_decode_state() rather than
> > using the Autoneg bit in the advertising field.
> > 
> > Patch 5 may be required for Eric's case - it ensures that we report
> > the correct state for interface types that we support only one set
> > of modes for when autoneg is disabled.
> > 
> > Changes in v2:
> > - Add test for NULL pcs in patch 1
> > 
> > I haven't added Eric's t-b because I used a different fix in patch 1.
> 
> So I tested this V2 patch and with the first link up command, I get the
> link up which is functional end to end.
> 
> Tested-by: Eric Woudstra <ericwouds@gmail.com>
> 
> PS, FYI: I do however still have the difference in phylink_mac_config().
> 
> At first the link is up with (before phy attached):
> phylink_mac_config: mode=inband/2500base-x
> 
> When the phy is attached, phylink_mac_config() is not called, but we
> have functional link.
> 
> When I do: ethtool -s eth1 advertise 0x28, then:
> phylink_mac_config: mode=inband/sgmii
> 
> And back again: ethtool -s eth1 advertise 0x800000000028, then:
> phylink_mac_config: mode=phy/2500base-x
> 
> I can share more log entries if needed.

I'm not worried - there's a sixth patch that I need to send once this
set of five are merged that will fix the above weirdness.

Thanks for testing.