Message ID | 20240906104145.9587-1-jeffd@silicom-usa.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | ixgbe: Manual AN-37 for troublesome link partners for X550 SFI | expand |
On Fri, Sep 06, 2024 at 06:41:45AM -0400, Jeff Daly wrote: > Resubmit commit 565736048bd5 ("ixgbe: Manual AN-37 for troublesome link > partners for X550 SFI") > > Some (Juniper MX5) SFP link partners exhibit a disinclination to > autonegotiate with X550 configured in SFI mode. This patch enables > a manual AN-37 restart to work around the problem. > > Resubmitted patch includes a module parameter (default disabled) to > isolate changes. Module parameters are not liked in networking code. They are very user unfriendly, and poorly documented. Why do you need it? Is this change risky? Andrew
> On Fri, Sep 06, 2024 at 06:41:45AM -0400, Jeff Daly wrote: > > Resubmit commit 565736048bd5 ("ixgbe: Manual AN-37 for troublesome > > link partners for X550 SFI") > > > > Some (Juniper MX5) SFP link partners exhibit a disinclination to > > autonegotiate with X550 configured in SFI mode. This patch enables a > > manual AN-37 restart to work around the problem. > > > > Resubmitted patch includes a module parameter (default disabled) to > > isolate changes. > > Module parameters are not liked in networking code. They are very user > unfriendly, and poorly documented. Completely understood, which is why the original patch didn't include this. > > Why do you need it? Is this change risky? > > Andrew It turns out that the patch works fine for the specific issue it's trying to address (Juniper switch), but for (seemingly all) other devices it breaks the autonegotiation. A few months back it was reported that there were issues with Cisco switches (which we didn't have to test with). The parameter was added in order to isolate the specific changes from affecting any other hardware.
> It turns out that the patch works fine for the specific issue it's trying to address (Juniper switch), > but for (seemingly all) other devices it breaks the autonegotiation. So it sounds like you need to figure out the nitty-gritty details of what is going on with the Juniper switch. Once you understand that, you might be able to find a workaround which works for all systems. Andrew
> > > It turns out that the patch works fine for the specific issue it's > > trying to address (Juniper switch), but for (seemingly all) other devices it breaks > the autonegotiation. > > So it sounds like you need to figure out the nitty-gritty details of what is going > on with the Juniper switch. Once you understand that, you might be able to find > a workaround which works for all systems. > > Andrew This was originally worked out by Doug Boom at Intel. It had to do with autonegotiation not being the part of the SFP optics when the Denverton X550 Si was released and was thus not POR for DNV. The Juniper switches however won't exit their AN sequence unless an AN37 transaction is seen. Other switch vendors recover gracefully when the right encoding is discovered, not using AN37 transactions, but not Juniper. Since DNV doesn't do AN37 in SFP auto mode, there's an endless loop. (Technically, the switches *could* be updated to new firmware that should have this capability, but apparently a logistical issue for at least one of our customers.) Going back through my emails, Doug did mention that it would possibly cause issues with other switches, but it wasn't anything we, or (until just recently) anyone else had observed. A quote from Doug: "that AN37 fix pretty much only works with the Juniper switches, and can cause problems with other switches." Initially I wanted to have this patch wrapped in a module parameter to avoid any potential issues, but that was shot down for the same reasons you initially commented about. The unwrapped patch was accepted however. It was a couple years before the potential other switch issue actually showed up and the patch was reverted. Our customer still wants the code in the mainline kernel driver, maintaining a separate patch was not something that was acceptable to them, so we are. This was all gone around with Intel a couple of years before and the solution for a non-updated Juniper MX5T switch is orthogonal to other switch support, thus the patch with the module parameter.
> This was originally worked out by Doug Boom at Intel. It had to do > with autonegotiation not being the part of the SFP optics when the > Denverton X550 Si was released and was thus not POR for DNV. The > Juniper switches however won't exit their AN sequence unless an AN37 > transaction is seen. I wounder what 802.3 says about this. I suspect the Juniper switch is within the standard here, and the x550 is broken. > Other switch vendors recover gracefully when the right encoding is > discovered, not using AN37 transactions, but not Juniper. We have seen similar things in the Linux core PHY handling, but mostly around 2500BaseX MAC and PHY drivers. A lot of vendors implement what they call over clocked SGMII, rather than 2500BaseX. But SGMII signalling makes no sense when overclocked to 2.5GHz, so they just disable it, leaving no signalling at all. Some 25000BaseX PHYs handle this, they gracefully fall back to sensible defaults when they discover they are connected to a broken MAC. Others need telling they are connected to a broken MAC which does not perform signalling. But it is easier for a MAC-PHY relationship, everything is on one board, we know all the details, and can work around the issues. > Since DNV doesn't do AN37 in SFP auto mode, there's an endless loop. > (Technically, the switches *could* be updated to new firmware that > should have this capability, but apparently a logistical issue for > at least one of our customers.) I would say that is the wrong solution, i don't think the switch is doing anything wrong. But the devil is in the details, check 802.3. > Going back through my emails, Doug did mention that it would possibly cause issues with other switches, but it wasn't anything we, or (until just recently) anyone else had observed. A quote from Doug: > > "that AN37 fix pretty much only works with the Juniper switches, and can cause problems with other switches." LOS from the from the SFP cage will tell you there is something on the other end of the link. It is not a particularly reliable signal, since it just means there is light. Is there any indication the link is not usable? You could wait 10 seconds after LOS is inactive, and if there is no usable link kick off the workaround. If after 10 seconds the link is still not usable, turn the workaround off again. Flip flop every 10 seconds. Hopefully the initial 10 seconds delay means you won't upset switches which currently work, and after 10 seconds, you gain a link to switches that really do expect AN37. Andrew
> > This was originally worked out by Doug Boom at Intel. It had to do > > with autonegotiation not being the part of the SFP optics when the > > Denverton X550 Si was released and was thus not POR for DNV. The > > Juniper switches however won't exit their AN sequence unless an AN37 > > transaction is seen. > > I wounder what 802.3 says about this. I suspect the Juniper switch is within the > standard here, and the x550 is broken. > Paraphrasing Doug: X550 on DNV lacks the AN37 to SFI like the other ixgbe devices so AN37 SFI doesn't work with DNV unless both sides force autonegotiation. The Juniper switch won't exit AN until it sees an AN37 transaction, on stock DNV this won't occur. There's no timeout with AN37 in the spec so Juniper implements the protocol according to spec, but this means with no AN37 coming from DNV it loops forever. Other vendors (and probably Juniper too) saw the hole in the spec and have a timeout and some recovery where it locks correctly (not via AN37), which make other switches work ok with DNV, but these still have the endless loop. > > Other switch vendors recover gracefully when the right encoding is > > discovered, not using AN37 transactions, but not Juniper. > Snip > LOS from the from the SFP cage will tell you there is something on the other end > of the link. It is not a particularly reliable signal, since it just means there is light. > Is there any indication the link is not usable? You could wait 10 seconds after > LOS is inactive, and if there is no usable link kick off the workaround. If after 10 > seconds the link is still not usable, turn the workaround off again. Flip flop every > 10 seconds. > > Hopefully the initial 10 seconds delay means you won't upset switches which > currently work, and after 10 seconds, you gain a link to switches that really do > expect AN37. > > Andrew I'll look into this.
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c index 8b8404d8c946..ef77df0f94a6 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c @@ -157,6 +157,11 @@ module_param(allow_unsupported_sfp, bool, 0444); MODULE_PARM_DESC(allow_unsupported_sfp, "Allow unsupported and untested SFP+ modules on 82599-based adapters"); +static bool manual_an37_for_sfi; +module_param(manual_an37_for_sfi, bool, 0444); +MODULE_PARM_DESC(manual_an37_for_sfi, + "Manual AN-37 for troublesome link partners for X550 SFI"); + #define DEFAULT_MSG_ENABLE (NETIF_MSG_DRV|NETIF_MSG_PROBE|NETIF_MSG_LINK) static int debug = -1; module_param(debug, int, 0); @@ -10977,6 +10982,9 @@ static int ixgbe_probe(struct pci_dev *pdev, const struct pci_device_id *ent) if (allow_unsupported_sfp) hw->allow_unsupported_sfp = allow_unsupported_sfp; + if (manual_an37_for_sfi) + hw->manual_an37_for_sfi = manual_an37_for_sfi; + /* reset_hw fills in the perm_addr as well */ hw->phy.reset_if_overtemp = true; err = hw->mac.ops.reset_hw(hw); diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h index 346e3d9114a8..288bb2be3c23 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h @@ -3654,6 +3654,7 @@ struct ixgbe_hw { bool allow_unsupported_sfp; bool wol_enabled; bool need_crosstalk_fix; + bool manual_an37_for_sfi; }; struct ixgbe_info { @@ -3675,7 +3676,9 @@ struct ixgbe_info { #define IXGBE_KRM_LINK_S1(P) ((P) ? 0x8200 : 0x4200) #define IXGBE_KRM_LINK_CTRL_1(P) ((P) ? 0x820C : 0x420C) #define IXGBE_KRM_AN_CNTL_1(P) ((P) ? 0x822C : 0x422C) +#define IXGBE_KRM_AN_CNTL_4(P) ((P) ? 0x8238 : 0x4238) #define IXGBE_KRM_AN_CNTL_8(P) ((P) ? 0x8248 : 0x4248) +#define IXGBE_KRM_PCS_KX_AN(P) ((P) ? 0x9918 : 0x5918) #define IXGBE_KRM_SGMII_CTRL(P) ((P) ? 0x82A0 : 0x42A0) #define IXGBE_KRM_LP_BASE_PAGE_HIGH(P) ((P) ? 0x836C : 0x436C) #define IXGBE_KRM_DSP_TXFFE_STATE_4(P) ((P) ? 0x8634 : 0x4634) @@ -3685,6 +3688,7 @@ struct ixgbe_info { #define IXGBE_KRM_PMD_FLX_MASK_ST20(P) ((P) ? 0x9054 : 0x5054) #define IXGBE_KRM_TX_COEFF_CTRL_1(P) ((P) ? 0x9520 : 0x5520) #define IXGBE_KRM_RX_ANA_CTL(P) ((P) ? 0x9A00 : 0x5A00) +#define IXGBE_KRM_FLX_TMRS_CTRL_ST31(P) ((P) ? 0x9180 : 0x5180) #define IXGBE_KRM_PMD_FLX_MASK_ST20_SFI_10G_DA ~(0x3 << 20) #define IXGBE_KRM_PMD_FLX_MASK_ST20_SFI_10G_SR BIT(20) diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c index a5f644934445..e3117ccf092c 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c @@ -1726,6 +1726,58 @@ static int ixgbe_setup_sfi_x550a(struct ixgbe_hw *hw, ixgbe_link_speed *speed) IXGBE_KRM_PMD_FLX_MASK_ST20(hw->bus.lan_id), IXGBE_SB_IOSF_TARGET_KR_PHY, reg_val); + if (hw->manual_an37_for_sfi) { + /* change mode enforcement rules to hybrid */ + (void)mac->ops.read_iosf_sb_reg(hw, + IXGBE_KRM_FLX_TMRS_CTRL_ST31(hw->bus.lan_id), + IXGBE_SB_IOSF_TARGET_KR_PHY, ®_val); + reg_val |= 0x0400; + + (void)mac->ops.write_iosf_sb_reg(hw, + IXGBE_KRM_FLX_TMRS_CTRL_ST31(hw->bus.lan_id), + IXGBE_SB_IOSF_TARGET_KR_PHY, reg_val); + + /* manually control the config */ + (void)mac->ops.read_iosf_sb_reg(hw, + IXGBE_KRM_LINK_CTRL_1(hw->bus.lan_id), + IXGBE_SB_IOSF_TARGET_KR_PHY, ®_val); + reg_val |= 0x20002240; + + (void)mac->ops.write_iosf_sb_reg(hw, + IXGBE_KRM_LINK_CTRL_1(hw->bus.lan_id), + IXGBE_SB_IOSF_TARGET_KR_PHY, reg_val); + + /* move the AN base page values */ + (void)mac->ops.read_iosf_sb_reg(hw, + IXGBE_KRM_PCS_KX_AN(hw->bus.lan_id), + IXGBE_SB_IOSF_TARGET_KR_PHY, ®_val); + reg_val |= 0x1; + + (void)mac->ops.write_iosf_sb_reg(hw, + IXGBE_KRM_PCS_KX_AN(hw->bus.lan_id), + IXGBE_SB_IOSF_TARGET_KR_PHY, reg_val); + + /* set the AN37 over CB mode */ + (void)mac->ops.read_iosf_sb_reg(hw, + IXGBE_KRM_AN_CNTL_4(hw->bus.lan_id), + IXGBE_SB_IOSF_TARGET_KR_PHY, ®_val); + reg_val |= 0x20000000; + + (void)mac->ops.write_iosf_sb_reg(hw, + IXGBE_KRM_AN_CNTL_4(hw->bus.lan_id), + IXGBE_SB_IOSF_TARGET_KR_PHY, reg_val); + + /* restart AN manually */ + (void)mac->ops.read_iosf_sb_reg(hw, + IXGBE_KRM_LINK_CTRL_1(hw->bus.lan_id), + IXGBE_SB_IOSF_TARGET_KR_PHY, ®_val); + reg_val |= IXGBE_KRM_LINK_CTRL_1_TETH_AN_RESTART; + + (void)mac->ops.write_iosf_sb_reg(hw, + IXGBE_KRM_LINK_CTRL_1(hw->bus.lan_id), + IXGBE_SB_IOSF_TARGET_KR_PHY, reg_val); + } + /* Toggle port SW reset by AN reset. */ status = ixgbe_restart_an_internal_phy_x550em(hw);
Resubmit commit 565736048bd5 ("ixgbe: Manual AN-37 for troublesome link partners for X550 SFI") Some (Juniper MX5) SFP link partners exhibit a disinclination to autonegotiate with X550 configured in SFI mode. This patch enables a manual AN-37 restart to work around the problem. Resubmitted patch includes a module parameter (default disabled) to isolate changes. Signed-off-by: Jeff Daly <jeffd@silicom-usa.com> --- drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 8 +++ drivers/net/ethernet/intel/ixgbe/ixgbe_type.h | 4 ++ drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c | 52 +++++++++++++++++++ 3 files changed, 64 insertions(+)