diff mbox series

ixgbe: Manual AN-37 for troublesome link partners for X550 SFI

Message ID 20240906104145.9587-1-jeffd@silicom-usa.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series ixgbe: Manual AN-37 for troublesome link partners for X550 SFI | expand

Checks

Context Check Description
netdev/series_format warning Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 16 this patch: 16
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 7 of 7 maintainers
netdev/build_clang success Errors and warnings before: 16 this patch: 16
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api fail Found: 'module_param' was: 0 now: 1
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 16 this patch: 16
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 101 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 83 this patch: 83
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-09-06--21-00 (tests: 721)

Commit Message

Jeff Daly Sept. 6, 2024, 10:41 a.m. UTC
Resubmit commit 565736048bd5 ("ixgbe: Manual AN-37 for troublesome link
partners for X550 SFI")

Some (Juniper MX5) SFP link partners exhibit a disinclination to
autonegotiate with X550 configured in SFI mode.  This patch enables
a manual AN-37 restart to work around the problem.

Resubmitted patch includes a module parameter (default disabled) to
isolate changes.

Signed-off-by: Jeff Daly <jeffd@silicom-usa.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  8 +++
 drivers/net/ethernet/intel/ixgbe/ixgbe_type.h |  4 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c | 52 +++++++++++++++++++
 3 files changed, 64 insertions(+)

Comments

Andrew Lunn Sept. 6, 2024, 3:54 p.m. UTC | #1
On Fri, Sep 06, 2024 at 06:41:45AM -0400, Jeff Daly wrote:
> Resubmit commit 565736048bd5 ("ixgbe: Manual AN-37 for troublesome link
> partners for X550 SFI")
> 
> Some (Juniper MX5) SFP link partners exhibit a disinclination to
> autonegotiate with X550 configured in SFI mode.  This patch enables
> a manual AN-37 restart to work around the problem.
> 
> Resubmitted patch includes a module parameter (default disabled) to
> isolate changes.

Module parameters are not liked in networking code. They are very user
unfriendly, and poorly documented.

Why do you need it? Is this change risky?

	Andrew
Jeff Daly Sept. 6, 2024, 8:49 p.m. UTC | #2
> On Fri, Sep 06, 2024 at 06:41:45AM -0400, Jeff Daly wrote:
> > Resubmit commit 565736048bd5 ("ixgbe: Manual AN-37 for troublesome
> > link partners for X550 SFI")
> >
> > Some (Juniper MX5) SFP link partners exhibit a disinclination to
> > autonegotiate with X550 configured in SFI mode.  This patch enables a
> > manual AN-37 restart to work around the problem.
> >
> > Resubmitted patch includes a module parameter (default disabled) to
> > isolate changes.
> 
> Module parameters are not liked in networking code. They are very user
> unfriendly, and poorly documented.

Completely understood, which is why the original patch didn't include this.

> 
> Why do you need it? Is this change risky?
> 
>         Andrew

It turns out that the patch works fine for the specific issue it's trying to address (Juniper switch), 
but for (seemingly all) other devices it breaks the autonegotiation.  A few months back it was 
reported that there were issues with Cisco switches (which we didn't have to test with).  The
parameter was added in order to isolate the specific changes from affecting any other hardware.
Andrew Lunn Sept. 6, 2024, 9:16 p.m. UTC | #3
> It turns out that the patch works fine for the specific issue it's trying to address (Juniper switch), 
> but for (seemingly all) other devices it breaks the autonegotiation.

So it sounds like you need to figure out the nitty-gritty details of
what is going on with the Juniper switch. Once you understand that,
you might be able to find a workaround which works for all systems.

    Andrew
Jeff Daly Sept. 9, 2024, 3:46 p.m. UTC | #4
> 
> > It turns out that the patch works fine for the specific issue it's
> > trying to address (Juniper switch), but for (seemingly all) other devices it breaks
> the autonegotiation.
> 
> So it sounds like you need to figure out the nitty-gritty details of what is going
> on with the Juniper switch. Once you understand that, you might be able to find
> a workaround which works for all systems.
> 
>     Andrew

This was originally worked out by Doug Boom at Intel.  It had to do with autonegotiation not being the part of the SFP optics when the Denverton X550 Si was released and was thus not POR for DNV.   The Juniper switches however won't exit their AN sequence unless an AN37 transaction is seen.  Other switch vendors recover gracefully when the right encoding is discovered, not using AN37 transactions, but not Juniper.  Since DNV doesn't do AN37 in SFP auto mode, there's an endless loop.   (Technically, the switches *could* be updated to new firmware that should have this capability, but apparently a logistical issue for at least one of our customers.)  

Going back through my emails, Doug did mention that it would possibly cause issues with other switches, but it wasn't anything we, or (until just recently) anyone else had observed.  A quote from Doug:  

"that AN37 fix pretty much only works with the Juniper switches, and can cause problems with other switches."

Initially I wanted to have this patch wrapped in a module parameter to avoid any potential issues, but that was shot down for the same reasons you initially commented about.  The unwrapped patch was accepted however.  It was a couple years before the potential other switch issue actually showed up and the patch was reverted.  Our customer still wants the code in the mainline kernel driver, maintaining a separate patch was not something that was acceptable to them, so we are.  This was all gone around with Intel a couple of years before and the solution for a non-updated Juniper MX5T switch is orthogonal to other switch support, thus the patch with the module parameter.
Andrew Lunn Sept. 9, 2024, 7:35 p.m. UTC | #5
> This was originally worked out by Doug Boom at Intel.  It had to do
> with autonegotiation not being the part of the SFP optics when the
> Denverton X550 Si was released and was thus not POR for DNV.  The
> Juniper switches however won't exit their AN sequence unless an AN37
> transaction is seen.

I wounder what 802.3 says about this. I suspect the Juniper switch is
within the standard here, and the x550 is broken.

> Other switch vendors recover gracefully when the right encoding is
> discovered, not using AN37 transactions, but not Juniper.

We have seen similar things in the Linux core PHY handling, but mostly
around 2500BaseX MAC and PHY drivers. A lot of vendors implement what
they call over clocked SGMII, rather than 2500BaseX. But SGMII
signalling makes no sense when overclocked to 2.5GHz, so they just
disable it, leaving no signalling at all. Some 25000BaseX PHYs handle
this, they gracefully fall back to sensible defaults when they
discover they are connected to a broken MAC. Others need telling they
are connected to a broken MAC which does not perform signalling. But
it is easier for a MAC-PHY relationship, everything is on one board,
we know all the details, and can work around the issues.

> Since DNV doesn't do AN37 in SFP auto mode, there's an endless loop.
> (Technically, the switches *could* be updated to new firmware that
> should have this capability, but apparently a logistical issue for
> at least one of our customers.)

I would say that is the wrong solution, i don't think the switch is
doing anything wrong. But the devil is in the details, check 802.3.

> Going back through my emails, Doug did mention that it would possibly cause issues with other switches, but it wasn't anything we, or (until just recently) anyone else had observed.  A quote from Doug:  
> 
> "that AN37 fix pretty much only works with the Juniper switches, and can cause problems with other switches."

LOS from the from the SFP cage will tell you there is something on the
other end of the link. It is not a particularly reliable signal, since
it just means there is light. Is there any indication the link is not
usable? You could wait 10 seconds after LOS is inactive, and if there
is no usable link kick off the workaround. If after 10 seconds the
link is still not usable, turn the workaround off again. Flip flop
every 10 seconds.

Hopefully the initial 10 seconds delay means you won't upset switches
which currently work, and after 10 seconds, you gain a link to
switches that really do expect AN37.

	Andrew
Jeff Daly Sept. 10, 2024, 5:51 p.m. UTC | #6
> > This was originally worked out by Doug Boom at Intel.  It had to do
> > with autonegotiation not being the part of the SFP optics when the
> > Denverton X550 Si was released and was thus not POR for DNV.  The
> > Juniper switches however won't exit their AN sequence unless an AN37
> > transaction is seen.
> 
> I wounder what 802.3 says about this. I suspect the Juniper switch is within the
> standard here, and the x550 is broken.
>

Paraphrasing Doug: X550 on DNV lacks the AN37 to SFI like the other ixgbe devices so AN37 SFI doesn't work with DNV unless both sides force autonegotiation.
The Juniper switch won't exit AN until it sees an AN37 transaction, on stock DNV this won't occur.  There's no timeout with AN37 in the spec so Juniper 
implements the protocol according to spec, but this means with no AN37 coming from DNV it loops forever.  Other vendors (and probably Juniper too) saw the
hole in the spec and have a timeout and some recovery where it locks correctly (not via AN37), which make other switches work ok with DNV, but these still
have the endless loop. 
 
> > Other switch vendors recover gracefully when the right encoding is
> > discovered, not using AN37 transactions, but not Juniper.
> 

Snip

> LOS from the from the SFP cage will tell you there is something on the other end
> of the link. It is not a particularly reliable signal, since it just means there is light.
> Is there any indication the link is not usable? You could wait 10 seconds after
> LOS is inactive, and if there is no usable link kick off the workaround. If after 10
> seconds the link is still not usable, turn the workaround off again. Flip flop every
> 10 seconds.
> 
> Hopefully the initial 10 seconds delay means you won't upset switches which
> currently work, and after 10 seconds, you gain a link to switches that really do
> expect AN37.
> 
>         Andrew

I'll look into this.
diff mbox series

Patch

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 8b8404d8c946..ef77df0f94a6 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -157,6 +157,11 @@  module_param(allow_unsupported_sfp, bool, 0444);
 MODULE_PARM_DESC(allow_unsupported_sfp,
 		 "Allow unsupported and untested SFP+ modules on 82599-based adapters");
 
+static bool manual_an37_for_sfi;
+module_param(manual_an37_for_sfi, bool, 0444);
+MODULE_PARM_DESC(manual_an37_for_sfi,
+		 "Manual AN-37 for troublesome link partners for X550 SFI");
+
 #define DEFAULT_MSG_ENABLE (NETIF_MSG_DRV|NETIF_MSG_PROBE|NETIF_MSG_LINK)
 static int debug = -1;
 module_param(debug, int, 0);
@@ -10977,6 +10982,9 @@  static int ixgbe_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (allow_unsupported_sfp)
 		hw->allow_unsupported_sfp = allow_unsupported_sfp;
 
+	if (manual_an37_for_sfi)
+		hw->manual_an37_for_sfi = manual_an37_for_sfi;
+
 	/* reset_hw fills in the perm_addr as well */
 	hw->phy.reset_if_overtemp = true;
 	err = hw->mac.ops.reset_hw(hw);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
index 346e3d9114a8..288bb2be3c23 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
@@ -3654,6 +3654,7 @@  struct ixgbe_hw {
 	bool				allow_unsupported_sfp;
 	bool				wol_enabled;
 	bool				need_crosstalk_fix;
+	bool				manual_an37_for_sfi;
 };
 
 struct ixgbe_info {
@@ -3675,7 +3676,9 @@  struct ixgbe_info {
 #define IXGBE_KRM_LINK_S1(P)		((P) ? 0x8200 : 0x4200)
 #define IXGBE_KRM_LINK_CTRL_1(P)	((P) ? 0x820C : 0x420C)
 #define IXGBE_KRM_AN_CNTL_1(P)		((P) ? 0x822C : 0x422C)
+#define IXGBE_KRM_AN_CNTL_4(P)		((P) ? 0x8238 : 0x4238)
 #define IXGBE_KRM_AN_CNTL_8(P)		((P) ? 0x8248 : 0x4248)
+#define IXGBE_KRM_PCS_KX_AN(P)		((P) ? 0x9918 : 0x5918)
 #define IXGBE_KRM_SGMII_CTRL(P)		((P) ? 0x82A0 : 0x42A0)
 #define IXGBE_KRM_LP_BASE_PAGE_HIGH(P)	((P) ? 0x836C : 0x436C)
 #define IXGBE_KRM_DSP_TXFFE_STATE_4(P)	((P) ? 0x8634 : 0x4634)
@@ -3685,6 +3688,7 @@  struct ixgbe_info {
 #define IXGBE_KRM_PMD_FLX_MASK_ST20(P)	((P) ? 0x9054 : 0x5054)
 #define IXGBE_KRM_TX_COEFF_CTRL_1(P)	((P) ? 0x9520 : 0x5520)
 #define IXGBE_KRM_RX_ANA_CTL(P)		((P) ? 0x9A00 : 0x5A00)
+#define IXGBE_KRM_FLX_TMRS_CTRL_ST31(P)	((P) ? 0x9180 : 0x5180)
 
 #define IXGBE_KRM_PMD_FLX_MASK_ST20_SFI_10G_DA		~(0x3 << 20)
 #define IXGBE_KRM_PMD_FLX_MASK_ST20_SFI_10G_SR		BIT(20)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c
index a5f644934445..e3117ccf092c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c
@@ -1726,6 +1726,58 @@  static int ixgbe_setup_sfi_x550a(struct ixgbe_hw *hw, ixgbe_link_speed *speed)
 				IXGBE_KRM_PMD_FLX_MASK_ST20(hw->bus.lan_id),
 				IXGBE_SB_IOSF_TARGET_KR_PHY, reg_val);
 
+	if (hw->manual_an37_for_sfi) {
+		/* change mode enforcement rules to hybrid */
+		(void)mac->ops.read_iosf_sb_reg(hw,
+			IXGBE_KRM_FLX_TMRS_CTRL_ST31(hw->bus.lan_id),
+			IXGBE_SB_IOSF_TARGET_KR_PHY, &reg_val);
+			reg_val |= 0x0400;
+
+		(void)mac->ops.write_iosf_sb_reg(hw,
+			IXGBE_KRM_FLX_TMRS_CTRL_ST31(hw->bus.lan_id),
+			IXGBE_SB_IOSF_TARGET_KR_PHY, reg_val);
+
+		/* manually control the config */
+		(void)mac->ops.read_iosf_sb_reg(hw,
+			IXGBE_KRM_LINK_CTRL_1(hw->bus.lan_id),
+			IXGBE_SB_IOSF_TARGET_KR_PHY, &reg_val);
+		reg_val |= 0x20002240;
+
+		(void)mac->ops.write_iosf_sb_reg(hw,
+			IXGBE_KRM_LINK_CTRL_1(hw->bus.lan_id),
+			IXGBE_SB_IOSF_TARGET_KR_PHY, reg_val);
+
+		/* move the AN base page values */
+		(void)mac->ops.read_iosf_sb_reg(hw,
+			IXGBE_KRM_PCS_KX_AN(hw->bus.lan_id),
+			IXGBE_SB_IOSF_TARGET_KR_PHY, &reg_val);
+		reg_val |= 0x1;
+
+		(void)mac->ops.write_iosf_sb_reg(hw,
+			IXGBE_KRM_PCS_KX_AN(hw->bus.lan_id),
+			IXGBE_SB_IOSF_TARGET_KR_PHY, reg_val);
+
+		/* set the AN37 over CB mode */
+		(void)mac->ops.read_iosf_sb_reg(hw,
+			IXGBE_KRM_AN_CNTL_4(hw->bus.lan_id),
+			IXGBE_SB_IOSF_TARGET_KR_PHY, &reg_val);
+		reg_val |= 0x20000000;
+
+		(void)mac->ops.write_iosf_sb_reg(hw,
+			IXGBE_KRM_AN_CNTL_4(hw->bus.lan_id),
+			IXGBE_SB_IOSF_TARGET_KR_PHY, reg_val);
+
+		/* restart AN manually */
+		(void)mac->ops.read_iosf_sb_reg(hw,
+			IXGBE_KRM_LINK_CTRL_1(hw->bus.lan_id),
+			IXGBE_SB_IOSF_TARGET_KR_PHY, &reg_val);
+		reg_val |= IXGBE_KRM_LINK_CTRL_1_TETH_AN_RESTART;
+
+		(void)mac->ops.write_iosf_sb_reg(hw,
+			IXGBE_KRM_LINK_CTRL_1(hw->bus.lan_id),
+			IXGBE_SB_IOSF_TARGET_KR_PHY, reg_val);
+	}
+
 	/* Toggle port SW reset by AN reset. */
 	status = ixgbe_restart_an_internal_phy_x550em(hw);