Message ID | 20240520-net-2024-05-20-revert-silicom-switch-workaround-v1-1-50f80f261c94@intel.com (mailing list archive) |
---|---|
State | Accepted |
Commit | b35b1c0b4e166a427395deaf61e3140495dfcb89 |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] Revert "ixgbe: Manual AN-37 for troublesome link partners for X550 SFI" | expand |
On Mon, May 20, 2024 at 05:21:27PM -0700, Jacob Keller wrote: > This reverts commit 565736048bd5f9888990569993c6b6bfdf6dcb6d. > > According to the commit, it implements a manual AN-37 for some > "troublesome" Juniper MX5 switches. This appears to be a workaround for a > particular switch. > > It has been reported that this causes a severe breakage for other switches, > including a Cisco 3560CX-12PD-S. > > The code appears to be a workaround for a specific switch which fails to > link in SFI mode. It expects to see AN-37 auto negotiation in order to > link. The Cisco switch is not expecting AN-37 auto negotiation. When the > device starts the manual AN-37, the Cisco switch decides that the port is > confused and stops attempting to link with it. This persists until a power > cycle. A simple driver unload and reload does not resolve the issue, even > if loading with a version of the driver which lacks this workaround. > > The authors of the workaround commit have not responded with > clarifications, and the result of the workaround is complete failure to > connect with other switches. > > This appears to be a case where the driver can either "correctly" link with > the Juniper MX5 switch, at the cost of bricking the link with the Cisco > switch, or it can behave properly for the Cisco switch, but fail to link > with the Junipir MX5 switch. I do not know enough about the standards > involved to clearly determine whether either switch is at fault or behaving > incorrectly. Nor do I know whether there exists some alternative fix which > corrects behavior with both switches. > > Revert the workaround for the Juniper switch. > > Fixes: 565736048bd5 ("ixgbe: Manual AN-37 for troublesome link partners for X550 SFI") > Link: https://lore.kernel.org/netdev/cbe874db-9ac9-42b8-afa0-88ea910e1e99@intel.com/T/ > Link: https://forum.proxmox.com/threads/intel-x553-sfp-ixgbe-no-go-on-pve8.135129/#post-612291 > Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> > Cc: Jeff Daly <jeffd@silicom-usa.com> > Cc: kernel.org-fo5k2w@ycharbi.fr One of those awkward situations where the only known (in this case to Jacob and me) resolution to a regression is itself a regression (for a different setup). I think that in these kind of situations it's best to go back to how things were. Reviewed-by: Simon Horman <horms@kernel.org>
> -----Original Message----- > From: Simon Horman <horms@kernel.org> > Sent: Tuesday, May 21, 2024 12:42 PM > To: Jacob Keller <jacob.e.keller@intel.com> > Cc: netdev@vger.kernel.org; Jeff Daly <jeffd@silicom-usa.com>; kernel.org- > fo5k2w@ycharbi.fr > Subject: Re: [PATCH net] Revert "ixgbe: Manual AN-37 for troublesome link > partners for X550 SFI" > > Caution: This is an external email. Please take care when clicking links or > opening attachments. > > > On Mon, May 20, 2024 at 05:21:27PM -0700, Jacob Keller wrote: > > This reverts commit 565736048bd5f9888990569993c6b6bfdf6dcb6d. > > > > According to the commit, it implements a manual AN-37 for some > > "troublesome" Juniper MX5 switches. This appears to be a workaround > > for a particular switch. > > > > It has been reported that this causes a severe breakage for other > > switches, including a Cisco 3560CX-12PD-S. > > > > The code appears to be a workaround for a specific switch which fails > > to link in SFI mode. It expects to see AN-37 auto negotiation in order > > to link. The Cisco switch is not expecting AN-37 auto negotiation. > > When the device starts the manual AN-37, the Cisco switch decides that > > the port is confused and stops attempting to link with it. This > > persists until a power cycle. A simple driver unload and reload does > > not resolve the issue, even if loading with a version of the driver which lacks > this workaround. > > > > The authors of the workaround commit have not responded with > > clarifications, and the result of the workaround is complete failure > > to connect with other switches. > > > > This appears to be a case where the driver can either "correctly" link > > with the Juniper MX5 switch, at the cost of bricking the link with the > > Cisco switch, or it can behave properly for the Cisco switch, but fail > > to link with the Junipir MX5 switch. I do not know enough about the > > standards involved to clearly determine whether either switch is at > > fault or behaving incorrectly. Nor do I know whether there exists some > > alternative fix which corrects behavior with both switches. > > > > Revert the workaround for the Juniper switch. > > > > Fixes: 565736048bd5 ("ixgbe: Manual AN-37 for troublesome link > > partners for X550 SFI") > > Link: > > https://lore.kernel.org/netdev/cbe874db-9ac9-42b8-afa0-88ea910e1e99@in > > tel.com/T/ > > Link: > > https://forum.proxmox.com/threads/intel-x553-sfp-ixgbe-no-go-on-pve8.1 > > 35129/#post-612291 > > Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> > > Cc: Jeff Daly <jeffd@silicom-usa.com> > > Cc: kernel.org-fo5k2w@ycharbi.fr > > One of those awkward situations where the only known (in this case to Jacob > and me) resolution to a regression is itself a regression (for a different setup). > > I think that in these kind of situations it's best to go back to how things were. > > Reviewed-by: Simon Horman <horms@kernel.org> In principle, I don't disagree..... However, our customer was very sensitive to having any patches/workarounds needed for their configuration be part of the upstream. Aside from maintaining our own patchset (or figuring out whether there's a patch that works for everyone) is there a better solution?
If any of you have the skills to develop a patch that tries to satisfy everyone, please know that I'm always available for testing on my hardware. If Jeff also has the possibilities, it's not impossible that we could come to a consensus. All we'd have to do would be to test the behavior of our equipment in the problematic situation. Isn't there someone at Intel who can contribute their expertise on the underlying technical reasons for the problem (obviously level 1 OSI) in order to guide us towards a state-of-the-art solution? Best regards. 21 mai 2024 à 18:49 "Jeff Daly" <jeffd@silicom-usa.com> a écrit: > > > > > -----Original Message----- > > From: Simon Horman <horms@kernel.org> > > Sent: Tuesday, May 21, 2024 12:42 PM > > To: Jacob Keller <jacob.e.keller@intel.com> > > Cc: netdev@vger.kernel.org; Jeff Daly <jeffd@silicom-usa.com>; kernel.org- > > fo5k2w@ycharbi.fr > > Subject: Re: [PATCH net] Revert "ixgbe: Manual AN-37 for troublesome link > > partners for X550 SFI" > > > > Caution: This is an external email. Please take care when clicking links or > > opening attachments. > > > > > > On Mon, May 20, 2024 at 05:21:27PM -0700, Jacob Keller wrote: > > This reverts commit 565736048bd5f9888990569993c6b6bfdf6dcb6d. > > > > According to the commit, it implements a manual AN-37 for some > > "troublesome" Juniper MX5 switches. This appears to be a workaround > > for a particular switch. > > > > It has been reported that this causes a severe breakage for other > > switches, including a Cisco 3560CX-12PD-S. > > > > The code appears to be a workaround for a specific switch which fails > > to link in SFI mode. It expects to see AN-37 auto negotiation in order > > to link. The Cisco switch is not expecting AN-37 auto negotiation. > > When the device starts the manual AN-37, the Cisco switch decides that > > the port is confused and stops attempting to link with it. This > > persists until a power cycle. A simple driver unload and reload does > > not resolve the issue, even if loading with a version of the driver which lacks > > this workaround. > > > > The authors of the workaround commit have not responded with > > clarifications, and the result of the workaround is complete failure > > to connect with other switches. > > > > This appears to be a case where the driver can either "correctly" link > > with the Juniper MX5 switch, at the cost of bricking the link with the > > Cisco switch, or it can behave properly for the Cisco switch, but fail > > to link with the Junipir MX5 switch. I do not know enough about the > > standards involved to clearly determine whether either switch is at > > fault or behaving incorrectly. Nor do I know whether there exists some > > alternative fix which corrects behavior with both switches. > > > > Revert the workaround for the Juniper switch. > > > > Fixes: 565736048bd5 ("ixgbe: Manual AN-37 for troublesome link > > partners for X550 SFI") > > Link: > > https://lore.kernel.org/netdev/cbe874db-9ac9-42b8-afa0-88ea910e1e99@in > > tel.com/T/ > > Link: > > https://forum.proxmox.com/threads/intel-x553-sfp-ixgbe-no-go-on-pve8.1 > > 35129/#post-612291 > > Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> > > Cc: Jeff Daly <jeffd@silicom-usa.com> > > Cc: kernel.org-fo5k2w@ycharbi.fr > > > > One of those awkward situations where the only known (in this case to Jacob > > and me) resolution to a regression is itself a regression (for a different setup). > > > > I think that in these kind of situations it's best to go back to how things were. > > > > Reviewed-by: Simon Horman <horms@kernel.org> > > > > In principle, I don't disagree..... However, our customer was very sensitive to having any patches/workarounds needed for their configuration be part of the upstream. Aside from maintaining our own patchset (or figuring out whether there's a patch that works for everyone) is there a better solution? >
On 5/21/2024 10:12 AM, kernel.org-fo5k2w@ycharbi.fr wrote: > If any of you have the skills to develop a patch that tries to satisfy everyone, please know that I'm always available for testing on my hardware. If Jeff also has the possibilities, it's not impossible that we could come to a consensus. All we'd have to do would be to test the behavior of our equipment in the problematic situation. > I would love a solution which fixes both cases. I don't currently have any idea what it would be. > Isn't there someone at Intel who can contribute their expertise on the underlying technical reasons for the problem (obviously level 1 OSI) in order to guide us towards a state-of-the-art solution? > > Best regards. > Unfortunately I do not know anyone still here who has the expertise to solve this. The out-of-tree ixgbe driver does not have the fix from Silicom, so from Intel's perspective the correct implementation matches the need for the Cisco switch... > On 5/21/2024 9:49 AM, Jeff Daly wrote: >> >> >>> -----Original Message----- >>> From: Simon Horman <horms@kernel.org> >>> Sent: Tuesday, May 21, 2024 12:42 PM >>> To: Jacob Keller <jacob.e.keller@intel.com> >>> Cc: netdev@vger.kernel.org; Jeff Daly <jeffd@silicom-usa.com>; kernel.org- >>> fo5k2w@ycharbi.fr >>> Subject: Re: [PATCH net] Revert "ixgbe: Manual AN-37 for troublesome link >>> partners for X550 SFI" >>> >>> One of those awkward situations where the only known (in this case to Jacob >>> and me) resolution to a regression is itself a regression (for a different setup). >>> >>> I think that in these kind of situations it's best to go back to how things were. >>> >>> Reviewed-by: Simon Horman <horms@kernel.org> >> >> In principle, I don't disagree..... However, our customer was very sensitive to having any patches/workarounds needed for their configuration be part of the upstream. Aside from maintaining our own patchset (or figuring out whether there's a patch that works for everyone) is there a better solution? >> >> We're somewhat stuck between a rock and a hard place here. I don't have full context for the problem, however I did manage to get a little more info about this from internal bugs. Here is the facts as i understand it: 1. The Juniper MX5 switch appears to require clause 37 auto negotiation (AN-37) to link. 2. The Cisco 3560CX-12PD-S appears to reject AN-37 as invalid and stops trying to link if it sees it for this case. 3. As far as I understand, AN-37 is intended for 1G links, and is not generally supported or used in 10GB? It looks like the way this fix applies affects all 10GB SFP links, which results in the issues with the Cisco switch. For context, this document was the best I found from a quick google search: https://www.ieee802.org/3/by/public/Mar15/booth_3by_01_0315.pdf It appears the Cisco device is linking at some form of 10GB according to the bug report here: > show interface status | include Te1/0/1 > Te1/0/1 --- Vers Qotom --- connected trunk full 10G > SFP-10GBase-CX1 The link is an SFP-10GBase-CX1? @Jeff, can you provide any further details about the Juniper MX5 switch case that the original change fixes? The function being modified here is the ixgbe_setup_sfi_x550a, which is called for setting up SFI, and the description says "Used to connect the internal PHY directly to an SFP cage without auto-negotiation" It is only called by ixgbe_setup_mac_link_sfp_n which is supposed to configure the PHY for native SFP support for IXGBE_DEV_ID_X550EM_A_SFP_N (0x15C4). No other device type is changed with this. Every comment here implies that this has no auto negotiation, which makes it extremely weird to me that we try to enable AN-37 in this flow. Without more context, my gut instinct is that the Cisco switch is likely following the general expectations here compared to the Juniper switch. I also don't see a good way currently to have the driver select between the options, if both cases are standard SFP. It can't know what its linked against. If we try the AN-37 flow with Cisco, it essentially bricks the link until a reboot. Even reloading to the out-of-tree driver which doesn't do this AN-37 flow fails to recover link. This makes any sort of "fallback" mechanism unlikely to work. Unless we can find some obvious way to distinguish the two cases, or there is fundamentally a different fix for the Juniper case, I don't see how we can support both flows. I guess there is the option of some sort of toggle via ethtool/otherwise to allow selection... But users might try to enable this when link is faulty and end up hitting the case where once we try the AN-37, the remote switch refuses to try again until a cycle.
Hello: This patch was applied to netdev/net.git (main) by Paolo Abeni <pabeni@redhat.com>: On Mon, 20 May 2024 17:21:27 -0700 you wrote: > This reverts commit 565736048bd5f9888990569993c6b6bfdf6dcb6d. > > According to the commit, it implements a manual AN-37 for some > "troublesome" Juniper MX5 switches. This appears to be a workaround for a > particular switch. > > It has been reported that this causes a severe breakage for other switches, > including a Cisco 3560CX-12PD-S. > > [...] Here is the summary with links: - [net] Revert "ixgbe: Manual AN-37 for troublesome link partners for X550 SFI" https://git.kernel.org/netdev/net/c/b35b1c0b4e16 You are awesome, thank you!
On 5/21/2024 2:05 PM, Jacob Keller wrote: > > > On 5/21/2024 10:12 AM, kernel.org-fo5k2w@ycharbi.fr wrote: >> If any of you have the skills to develop a patch that tries to satisfy everyone, please know that I'm always available for testing on my hardware. If Jeff also has the possibilities, it's not impossible that we could come to a consensus. All we'd have to do would be to test the behavior of our equipment in the problematic situation. >> > > I would love a solution which fixes both cases. I don't currently have > any idea what it would be. > It looks like netdev pulled the revert. Given the lack of a full understanding of the original fix posted from Jeff, I think this is the right decision. >> Isn't there someone at Intel who can contribute their expertise on the underlying technical reasons for the problem (obviously level 1 OSI) in order to guide us towards a state-of-the-art solution? >> I did create an internal ticket here to track and try to get a reproduction so that some of our link experts can diagnose the issue being seen. I hope to have news I can report on this soon. > I guess there is the option of some sort of toggle via ethtool/otherwise > to allow selection... But users might try to enable this when link is > faulty and end up hitting the case where once we try the AN-37, the > remote switch refuses to try again until a cycle. > Given that we have two cases where our current understanding is a need for mutually exclusive behavior, we (Intel) would be open to some sort of config option, flag, or otherwise toggle to enable the Silicom folks without breaking everything else. I don't know what the acceptance for such an idea is with the rest of the community. I hope that internal reproduction task above may lead to a better understanding and possibly a fix that can resolve both cases.
Jacob, initially a config flag option was put forward but because of the maturity of the driver, that option was shot down by the maintainers. > -----Original Message----- > From: Jacob Keller <jacob.e.keller@intel.com> > Sent: Thursday, May 23, 2024 12:15 PM > To: kernel.org-fo5k2w@ycharbi.fr; Jeff Daly <jeffd@silicom-usa.com>; Simon > Horman <horms@kernel.org> > Cc: netdev@vger.kernel.org > Subject: Re: [PATCH net] Revert "ixgbe: Manual AN-37 for troublesome link > partners for X550 SFI" > > Caution: This is an external email. Please take care when clicking links or > opening attachments. > > > On 5/21/2024 2:05 PM, Jacob Keller wrote: > > > > > > On 5/21/2024 10:12 AM, kernel.org-fo5k2w@ycharbi.fr wrote: > >> If any of you have the skills to develop a patch that tries to satisfy everyone, > please know that I'm always available for testing on my hardware. If Jeff also > has the possibilities, it's not impossible that we could come to a consensus. All > we'd have to do would be to test the behavior of our equipment in the > problematic situation. > >> > > > > I would love a solution which fixes both cases. I don't currently have > > any idea what it would be. > > > > It looks like netdev pulled the revert. Given the lack of a full understanding of the > original fix posted from Jeff, I think this is the right decision. > > >> Isn't there someone at Intel who can contribute their expertise on the > underlying technical reasons for the problem (obviously level 1 OSI) in order to > guide us towards a state-of-the-art solution? > >> > > I did create an internal ticket here to track and try to get a reproduction so that > some of our link experts can diagnose the issue being seen. > > I hope to have news I can report on this soon. > > > I guess there is the option of some sort of toggle via > > ethtool/otherwise to allow selection... But users might try to enable > > this when link is faulty and end up hitting the case where once we try > > the AN-37, the remote switch refuses to try again until a cycle. > > > > Given that we have two cases where our current understanding is a need for > mutually exclusive behavior, we (Intel) would be open to some sort of config > option, flag, or otherwise toggle to enable the Silicom folks without breaking > everything else. I don't know what the acceptance for such an idea is with the > rest of the community. > > I hope that internal reproduction task above may lead to a better understanding > and possibly a fix that can resolve both cases.
> It looks like netdev pulled the revert. Given the lack of a full > understanding of the original fix posted from Jeff, I think this is the > right decision. Thank you very much for your investment. I hope a solution can be found for Jeff. > I did create an internal ticket here to track and try to get a > reproduction so that some of our link experts can diagnose the issue > being seen. > > I hope to have news I can report on this soon. Super. Cela va peut-être faire remonter d'autres problèmes sous-jacent dans l'implémentation de la norme. > > > > > I guess there is the option of some sort of toggle via ethtool/otherwise > > to allow selection... But users might try to enable this when link is > > faulty and end up hitting the case where once we try the AN-37, the > > remote switch refuses to try again until a cycle. > > > > Given that we have two cases where our current understanding is a need > for mutually exclusive behavior, we (Intel) would be open to some sort > of config option, flag, or otherwise toggle to enable the Silicom folks > without breaking everything else. I don't know what the acceptance for > such an idea is with the rest of the community. > > I hope that internal reproduction task above may lead to a better > understanding and possibly a fix that can resolve both cases. > > The link is an SFP-10GBase-CX1? As I understand it, CX1 is the name given to Twinax copper cables such as the one I used in the experiments in this thread. It's therefore a priori the right value to display for this kind of connection (instead of “10000baseT/Full”). Thanks again for your hard work in finding a solution. You can always contact me later so that I can carry out tests if you need. The machine is at least available for 1 year for testing purposes. Best regards, Yohan.
On 5/23/2024 9:49 AM, kernel.org-fo5k2w@ycharbi.fr wrote: >> The link is an SFP-10GBase-CX1? > > As I understand it, CX1 is the name given to Twinax copper cables such as the one I used in the experiments in this thread. It's therefore a priori the right value to display for this kind of connection (instead of “10000baseT/Full”). > I'm filing an internal tracking bug regarding this as well. > Thanks again for your hard work in finding a solution. You can always contact me later so that I can carry out tests if you need. The machine is at least available for 1 year for testing purposes. > > Best regards, > Yohan. Appreciate it. Hopefully we can find something that works for both cases.
On 5/23/2024 9:49 AM, kernel.org-fo5k2w@ycharbi.fr wrote: >> > >> The link is an SFP-10GBase-CX1? > > As I understand it, CX1 is the name given to Twinax copper cables such as the one I used in the experiments in this thread. It's therefore a priori the right value to display for this kind of connection (instead of “10000baseT/Full”). > One more thing: Could you confirm if this behavior appears in the 5.19.9 driver from the Intel website or source forge? I'm curious if this is a case of a fix that never got published to the netdev community. Thanks, Jake
> One more thing: Could you confirm if this behavior appears in the 5.19.9 > driver from the Intel website or source forge? I'm curious if this is a > case of a fix that never got published to the netdev community. > > Thanks, > Jake I can confirm that the result of the “Supported link modes:” section is identical with the Intel ixgbe-5.19.9 driver: uname -r 5.10.0-29-amd64 apt install linux-headers-$(uname -r) gcc make wget https://downloadmirror.intel.com/812532/ixgbe-5.19.9.tar.gz tar xf ixgbe-5.19.9.tar.gz -C /usr/local/src/ make -j 8 modinfo ./ixgbe.ko rmmod ixgbe modprobe dca insmod ./ixgbe.ko # eno1 is up ethtool eno1 Settings for eno1: Supported ports: [ FIBRE ] Supported link modes: 10000baseT/Full Supported pause frame use: Symmetric Supports auto-negotiation: No Supported FEC modes: Not reported Advertised link modes: 10000baseT/Full Advertised pause frame use: Symmetric Advertised auto-negotiation: No Advertised FEC modes: Not reported Speed: 10000Mb/s Duplex: Full Auto-negotiation: off Port: Direct Attach Copper PHYAD: 0 Transceiver: internal Supports Wake-on: d Wake-on: d Current message level: 0x00000007 (7) drv probe link Link detected: yes
On 5/23/2024 1:19 PM, kernel.org-fo5k2w@ycharbi.fr wrote: >> One more thing: Could you confirm if this behavior appears in the 5.19.9 >> driver from the Intel website or source forge? I'm curious if this is a >> case of a fix that never got published to the netdev community. >> >> Thanks, >> Jake > > I can confirm that the result of the “Supported link modes:” section is identical with the Intel ixgbe-5.19.9 driver: > uname -r > 5.10.0-29-amd64 > > apt install linux-headers-$(uname -r) gcc make > wget https://downloadmirror.intel.com/812532/ixgbe-5.19.9.tar.gz > tar xf ixgbe-5.19.9.tar.gz -C /usr/local/src/ > make -j 8 > > modinfo ./ixgbe.ko > rmmod ixgbe > modprobe dca > insmod ./ixgbe.ko > > # eno1 is up > ethtool eno1 > Settings for eno1: > Supported ports: [ FIBRE ] > Supported link modes: 10000baseT/Full > Supported pause frame use: Symmetric > Supports auto-negotiation: No > Supported FEC modes: Not reported > Advertised link modes: 10000baseT/Full > Advertised pause frame use: Symmetric > Advertised auto-negotiation: No > Advertised FEC modes: Not reported > Speed: 10000Mb/s > Duplex: Full > Auto-negotiation: off > Port: Direct Attach Copper > PHYAD: 0 > Transceiver: internal > Supports Wake-on: d > Wake-on: d > Current message level: 0x00000007 (7) > drv probe link > Link detected: yes Thanks. At a glance from reviewing code, it looks like ixgbe simply collapses all non-backplane links into 10000baseT/Full.
Looking for a solution that would satisfy everyone.... > -----Original Message----- > From: Jacob Keller <jacob.e.keller@intel.com> > Sent: Thursday, May 23, 2024 12:15 PM > To: kernel.org-fo5k2w@ycharbi.fr; Jeff Daly <jeffd@silicom-usa.com>; Simon > Horman <horms@kernel.org> > Cc: netdev@vger.kernel.org > Subject: Re: [PATCH net] Revert "ixgbe: Manual AN-37 for troublesome link > partners for X550 SFI" > > Caution: This is an external email. Please take care when clicking links or > opening attachments. > > > On 5/21/2024 2:05 PM, Jacob Keller wrote: > > > > > > On 5/21/2024 10:12 AM, kernel.org-fo5k2w@ycharbi.fr wrote: > >> If any of you have the skills to develop a patch that tries to satisfy everyone, > please know that I'm always available for testing on my hardware. If Jeff also > has the possibilities, it's not impossible that we could come to a consensus. All > we'd have to do would be to test the behavior of our equipment in the > problematic situation. > >> > > > > I would love a solution which fixes both cases. I don't currently have > > any idea what it would be. > > > > It looks like netdev pulled the revert. Given the lack of a full understanding of the > original fix posted from Jeff, I think this is the right decision. > > >> Isn't there someone at Intel who can contribute their expertise on the > underlying technical reasons for the problem (obviously level 1 OSI) in order to > guide us towards a state-of-the-art solution? > >> > > I did create an internal ticket here to track and try to get a reproduction so that > some of our link experts can diagnose the issue being seen. > > I hope to have news I can report on this soon. > > > I guess there is the option of some sort of toggle via > > ethtool/otherwise to allow selection... But users might try to enable > > this when link is faulty and end up hitting the case where once we try > > the AN-37, the remote switch refuses to try again until a cycle. > > > > Given that we have two cases where our current understanding is a need for > mutually exclusive behavior, we (Intel) would be open to some sort of config > option, flag, or otherwise toggle to enable the Silicom folks without breaking > everything else. I don't know what the acceptance for such an idea is with the > rest of the community. > Originally, this was the idea that was put forward, and if I recall it was quashed by the maintainers due to the maturity of the driver. I was told that introducing new config options were a no-go. If there's a possibility of reworking the patch to introduce a new config option during module loading that would be specific to our fix, I'm sure we can come up with something appropriate.... I just don't want to try to push something that will never get accepted, but I think in this case it's something that could be warranted. I understand the desire to not have special workaround code for a singular case, but in this instance I think it's the only option. > I hope that internal reproduction task above may lead to a better understanding > and possibly a fix that can resolve both cases.
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h index 897fe357b65b..346e3d9114a8 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h @@ -3675,9 +3675,7 @@ struct ixgbe_info { #define IXGBE_KRM_LINK_S1(P) ((P) ? 0x8200 : 0x4200) #define IXGBE_KRM_LINK_CTRL_1(P) ((P) ? 0x820C : 0x420C) #define IXGBE_KRM_AN_CNTL_1(P) ((P) ? 0x822C : 0x422C) -#define IXGBE_KRM_AN_CNTL_4(P) ((P) ? 0x8238 : 0x4238) #define IXGBE_KRM_AN_CNTL_8(P) ((P) ? 0x8248 : 0x4248) -#define IXGBE_KRM_PCS_KX_AN(P) ((P) ? 0x9918 : 0x5918) #define IXGBE_KRM_SGMII_CTRL(P) ((P) ? 0x82A0 : 0x42A0) #define IXGBE_KRM_LP_BASE_PAGE_HIGH(P) ((P) ? 0x836C : 0x436C) #define IXGBE_KRM_DSP_TXFFE_STATE_4(P) ((P) ? 0x8634 : 0x4634) @@ -3687,7 +3685,6 @@ struct ixgbe_info { #define IXGBE_KRM_PMD_FLX_MASK_ST20(P) ((P) ? 0x9054 : 0x5054) #define IXGBE_KRM_TX_COEFF_CTRL_1(P) ((P) ? 0x9520 : 0x5520) #define IXGBE_KRM_RX_ANA_CTL(P) ((P) ? 0x9A00 : 0x5A00) -#define IXGBE_KRM_FLX_TMRS_CTRL_ST31(P) ((P) ? 0x9180 : 0x5180) #define IXGBE_KRM_PMD_FLX_MASK_ST20_SFI_10G_DA ~(0x3 << 20) #define IXGBE_KRM_PMD_FLX_MASK_ST20_SFI_10G_SR BIT(20) diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c index 2decb0710b6e..a5f644934445 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c @@ -1722,59 +1722,9 @@ static int ixgbe_setup_sfi_x550a(struct ixgbe_hw *hw, ixgbe_link_speed *speed) return -EINVAL; } - (void)mac->ops.write_iosf_sb_reg(hw, - IXGBE_KRM_PMD_FLX_MASK_ST20(hw->bus.lan_id), - IXGBE_SB_IOSF_TARGET_KR_PHY, reg_val); - - /* change mode enforcement rules to hybrid */ - (void)mac->ops.read_iosf_sb_reg(hw, - IXGBE_KRM_FLX_TMRS_CTRL_ST31(hw->bus.lan_id), - IXGBE_SB_IOSF_TARGET_KR_PHY, ®_val); - reg_val |= 0x0400; - - (void)mac->ops.write_iosf_sb_reg(hw, - IXGBE_KRM_FLX_TMRS_CTRL_ST31(hw->bus.lan_id), - IXGBE_SB_IOSF_TARGET_KR_PHY, reg_val); - - /* manually control the config */ - (void)mac->ops.read_iosf_sb_reg(hw, - IXGBE_KRM_LINK_CTRL_1(hw->bus.lan_id), - IXGBE_SB_IOSF_TARGET_KR_PHY, ®_val); - reg_val |= 0x20002240; - - (void)mac->ops.write_iosf_sb_reg(hw, - IXGBE_KRM_LINK_CTRL_1(hw->bus.lan_id), - IXGBE_SB_IOSF_TARGET_KR_PHY, reg_val); - - /* move the AN base page values */ - (void)mac->ops.read_iosf_sb_reg(hw, - IXGBE_KRM_PCS_KX_AN(hw->bus.lan_id), - IXGBE_SB_IOSF_TARGET_KR_PHY, ®_val); - reg_val |= 0x1; - - (void)mac->ops.write_iosf_sb_reg(hw, - IXGBE_KRM_PCS_KX_AN(hw->bus.lan_id), - IXGBE_SB_IOSF_TARGET_KR_PHY, reg_val); - - /* set the AN37 over CB mode */ - (void)mac->ops.read_iosf_sb_reg(hw, - IXGBE_KRM_AN_CNTL_4(hw->bus.lan_id), - IXGBE_SB_IOSF_TARGET_KR_PHY, ®_val); - reg_val |= 0x20000000; - - (void)mac->ops.write_iosf_sb_reg(hw, - IXGBE_KRM_AN_CNTL_4(hw->bus.lan_id), - IXGBE_SB_IOSF_TARGET_KR_PHY, reg_val); - - /* restart AN manually */ - (void)mac->ops.read_iosf_sb_reg(hw, - IXGBE_KRM_LINK_CTRL_1(hw->bus.lan_id), - IXGBE_SB_IOSF_TARGET_KR_PHY, ®_val); - reg_val |= IXGBE_KRM_LINK_CTRL_1_TETH_AN_RESTART; - - (void)mac->ops.write_iosf_sb_reg(hw, - IXGBE_KRM_LINK_CTRL_1(hw->bus.lan_id), - IXGBE_SB_IOSF_TARGET_KR_PHY, reg_val); + status = mac->ops.write_iosf_sb_reg(hw, + IXGBE_KRM_PMD_FLX_MASK_ST20(hw->bus.lan_id), + IXGBE_SB_IOSF_TARGET_KR_PHY, reg_val); /* Toggle port SW reset by AN reset. */ status = ixgbe_restart_an_internal_phy_x550em(hw);
This reverts commit 565736048bd5f9888990569993c6b6bfdf6dcb6d. According to the commit, it implements a manual AN-37 for some "troublesome" Juniper MX5 switches. This appears to be a workaround for a particular switch. It has been reported that this causes a severe breakage for other switches, including a Cisco 3560CX-12PD-S. The code appears to be a workaround for a specific switch which fails to link in SFI mode. It expects to see AN-37 auto negotiation in order to link. The Cisco switch is not expecting AN-37 auto negotiation. When the device starts the manual AN-37, the Cisco switch decides that the port is confused and stops attempting to link with it. This persists until a power cycle. A simple driver unload and reload does not resolve the issue, even if loading with a version of the driver which lacks this workaround. The authors of the workaround commit have not responded with clarifications, and the result of the workaround is complete failure to connect with other switches. This appears to be a case where the driver can either "correctly" link with the Juniper MX5 switch, at the cost of bricking the link with the Cisco switch, or it can behave properly for the Cisco switch, but fail to link with the Junipir MX5 switch. I do not know enough about the standards involved to clearly determine whether either switch is at fault or behaving incorrectly. Nor do I know whether there exists some alternative fix which corrects behavior with both switches. Revert the workaround for the Juniper switch. Fixes: 565736048bd5 ("ixgbe: Manual AN-37 for troublesome link partners for X550 SFI") Link: https://lore.kernel.org/netdev/cbe874db-9ac9-42b8-afa0-88ea910e1e99@intel.com/T/ Link: https://forum.proxmox.com/threads/intel-x553-sfp-ixgbe-no-go-on-pve8.135129/#post-612291 Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Cc: Jeff Daly <jeffd@silicom-usa.com> Cc: kernel.org-fo5k2w@ycharbi.fr --- drivers/net/ethernet/intel/ixgbe/ixgbe_type.h | 3 -- drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c | 56 ++------------------------- 2 files changed, 3 insertions(+), 56 deletions(-) --- base-commit: e4a87abf588536d1cdfb128595e6e680af5cf3ed change-id: 20240520-net-2024-05-20-revert-silicom-switch-workaround-48a6a7a9b0a0 Best regards,