Message ID | ZGIkGmyL8yL1q1zp@shell.armlinux.org.uk (mailing list archive) |
---|---|
State | RFC |
Headers | show |
Series | [RFC] Providing a helper for PCS inband negotiation | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Not a local patch |
> 2. XLGMII.. Looking at the XPCS driver, it's unclear whether Clause 73 > AN gets used for this. A quick scan of IEEE 802.3 suggests that > XLGMII doesn't have any support for any inband signalling, and it's > just an intermediary protocol between the MAC (more specifically the > RS, but for the purposes of this I'll just refer to MAC) and the > attached PCS, and any autonegotiation happens after the XLGMII link. So isn't XLGMII then a generic PHY thing, not a phylink thing? Or am i not correctly understanding how drivers/phy/marvell/phy-*-comphy.c and drivers/phy/microchip/*_serdev.c fit into the overall picture? Andrew
On Mon, May 15, 2023 at 03:58:09PM +0200, Andrew Lunn wrote: > > 2. XLGMII.. Looking at the XPCS driver, it's unclear whether Clause 73 > > AN gets used for this. A quick scan of IEEE 802.3 suggests that > > XLGMII doesn't have any support for any inband signalling, and it's > > just an intermediary protocol between the MAC (more specifically the > > RS, but for the purposes of this I'll just refer to MAC) and the > > attached PCS, and any autonegotiation happens after the XLGMII link. > > So isn't XLGMII then a generic PHY thing, not a phylink thing? Honestly, I'm really not sure. It feels more like an internal-SoC thing. See my last diagram why I think this. > Or am i not correctly understanding how > drivers/phy/marvell/phy-*-comphy.c and > drivers/phy/microchip/*_serdev.c fit into the overall picture? In the case of serdes-based protocols with an external PHY, the general structure we have is: ------------------------------+ SoC | +-----+ +-----+ +--------+ | +--------------+ | MAC +---+ PCS +--+ SERDES +------------+ Ethernet PHY +---- Media +-----+ +-----+ +--------+ | ^ +--------------+ ------------------------------+ | | PHY_INTERFACE_MODE_xxx has referred to this bit, and this is the bit where inband can occur. In the case of multi-rate implementations, there can be one of many different PCS that can be placed there, and the SERDES handles converting the data stream from the PCS into its appropriate electrical form, essentially covering the PMA and PMD functions. (That same SERDES block generally can also handle PCIe and SATA.) In the case of non-serdes based protocols, then it's essentially the same as the above, but with the PCS and SERDES blocks removed. For Fibre based connections: ------------------------------+ SoC | +-----+ +-----+ +--------+ | | MAC +---+ PCS +--+ SERDES +------------ Media +-----+ +-----+ +--------+ | ^ ------------------------------+ | | PHY_INTERFACE_MODE_xxx has referred to this bit, and 1000BASE-X runs negotiation on this. 10GBASE-R also used for fibre has no negotiation, but there's still the 10GBASE-R PCS and the 802.3 PMA/PMD are subsumed by the SERDES block. What I think seems to be the case with XPCS is: ------------------------------+ SoC | +-----+ +-----+ +--------+ | | MAC +---+ PCS +--+ SERDES +----------- Media +-----+ ^ +-----+ +--------+ | --------|---------------------+ | PHY_INTERFACE_MODE_xxx seems to be referring to this bit. When clause 73 negotiation is used, that happens where I've stated "media" above, and that's involved with negotiating backplane protocols e.g. 40GBASE-KR4, 40GBASE-CR4, 25GBASE-KR, 10GBASE-KR, 1000BASE-KX etc on the bit I've called "Media" above. However, we can also have this for a fibre link: ------------------------------+ SoC | +-----+ +-----+ +--------+ | | MAC +---+ PCS +--+ SERDES +----------- Fibre +-----+ ^ +-----+ +--------+ | ^ --------|---------------------+ | | | XLGMII 40GBASE-R Given that in this case, we'd want PHY_INTERFACE_MODE_xxx to say 40GBASE-R, using the existing PHY_INTERFACE_MODE_xxx to specify at where I've pointed XLGMII just makes things confused... but in the case above with clause 73 negotiation, we wouldn't have a standard PHY_INTERFACE_MODE_xxx specifier for the external "media" side because that's dependent on the result of the negotiation. So... this seems to be a right can of wriggly things.
On Mon, May 15, 2023 at 04:34:33PM +0100, Russell King (Oracle) wrote: > On Mon, May 15, 2023 at 03:58:09PM +0200, Andrew Lunn wrote: > > > 2. XLGMII.. Looking at the XPCS driver, it's unclear whether Clause 73 > > > AN gets used for this. A quick scan of IEEE 802.3 suggests that > > > XLGMII doesn't have any support for any inband signalling, and it's > > > just an intermediary protocol between the MAC (more specifically the > > > RS, but for the purposes of this I'll just refer to MAC) and the > > > attached PCS, and any autonegotiation happens after the XLGMII link. > > > > So isn't XLGMII then a generic PHY thing, not a phylink thing? > > Honestly, I'm really not sure. It feels more like an internal-SoC > thing. See my last diagram why I think this. > > > Or am i not correctly understanding how > > drivers/phy/marvell/phy-*-comphy.c and > > drivers/phy/microchip/*_serdev.c fit into the overall picture? > > In the case of serdes-based protocols with an external PHY, the general > structure we have is: > > ------------------------------+ > SoC | > +-----+ +-----+ +--------+ | +--------------+ > | MAC +---+ PCS +--+ SERDES +------------+ Ethernet PHY +---- Media > +-----+ +-----+ +--------+ | ^ +--------------+ > ------------------------------+ | > | > PHY_INTERFACE_MODE_xxx has referred to this bit, and this is the bit > where inband can occur. > > In the case of multi-rate implementations, there can be one of many > different PCS that can be placed there, and the SERDES handles > converting the data stream from the PCS into its appropriate > electrical form, essentially covering the PMA and PMD functions. > (That same SERDES block generally can also handle PCIe and SATA.) > > In the case of non-serdes based protocols, then it's essentially the > same as the above, but with the PCS and SERDES blocks removed. > > For Fibre based connections: > > ------------------------------+ > SoC | > +-----+ +-----+ +--------+ | > | MAC +---+ PCS +--+ SERDES +------------ Media > +-----+ +-----+ +--------+ | ^ > ------------------------------+ | > | > PHY_INTERFACE_MODE_xxx has referred to this bit, and 1000BASE-X runs > negotiation on this. 10GBASE-R also used for fibre has no negotiation, > but there's still the 10GBASE-R PCS and the 802.3 PMA/PMD are subsumed > by the SERDES block. > > What I think seems to be the case with XPCS is: > > ------------------------------+ > SoC | > +-----+ +-----+ +--------+ | > | MAC +---+ PCS +--+ SERDES +----------- Media > +-----+ ^ +-----+ +--------+ | > --------|---------------------+ > | > PHY_INTERFACE_MODE_xxx seems to be referring to this bit. When clause 73 > negotiation is used, that happens where I've stated "media" above, and > that's involved with negotiating backplane protocols e.g. 40GBASE-KR4, > 40GBASE-CR4, 25GBASE-KR, 10GBASE-KR, 1000BASE-KX etc on the bit I've > called "Media" above. > > However, we can also have this for a fibre link: > > ------------------------------+ > SoC | > +-----+ +-----+ +--------+ | > | MAC +---+ PCS +--+ SERDES +----------- Fibre > +-----+ ^ +-----+ +--------+ | ^ > --------|---------------------+ | > | | > XLGMII 40GBASE-R > > Given that in this case, we'd want PHY_INTERFACE_MODE_xxx to say > 40GBASE-R, using the existing PHY_INTERFACE_MODE_xxx to specify at > where I've pointed XLGMII just makes things confused... but in the > case above with clause 73 negotiation, we wouldn't have a standard > PHY_INTERFACE_MODE_xxx specifier for the external "media" side > because that's dependent on the result of the negotiation. > > So... this seems to be a right can of wriggly things. I should point out that in many of these cases, this may also be true: --------+ +---------------------+ SoC | | PHY | +-----+ | | +-----+ +--------+ | | MAC +-------+ PCS +--+ SERDES +------------ Media +-----+ | ^ | +-----+ +--------+ | ^ --------+ | +---------------------+ | | | GMII 1000BASE-X XGMII 10GBASE-R XLGMII 40GBASE-R One would expect PHY_INTERFACE_MODE_xxx to refer to the first in this case. If we decide that is the right model, then we shouldn't have added PHY_INTERFACE_MODE_1000BASEX! Then we get to the fun that is SGMII, which if we think the above is the right model, we get GMII, a PHY that converts GMII to SGMII, and then another PHY that converts SGMII to whatever the media wants. In all of these cases, the interface mode on the left has no bearing on whether the PHY block in the above (whether its separate or part of the SoC) performs any kind of negotiation.
Hello, On Mon, May 15, 2023 at 01:22:50PM +0100, Russell King (Oracle) wrote: > 1. Should 10GBASE-KR be included in the SGMII et.al. case in the code? > Any other interface modes that should be there? Obviously, > PHYLINK_PCS_NEG_NONE is not correct for 10GBASE-KR since it does use > inband AN. Does it make sense for the user to disable inband AN for > 10GBASE-KR? If so, maybe it should be under the 1000base-X case. What physical process (reference to IEEE 802.3 clause is fine) would be controlled by the phylink_pcs_neg_mode() function for the so-called 10GBASE-KR phy-mode?
On Mon, May 15, 2023 at 10:56:16PM +0300, Vladimir Oltean wrote: > Hello, > > On Mon, May 15, 2023 at 01:22:50PM +0100, Russell King (Oracle) wrote: > > 1. Should 10GBASE-KR be included in the SGMII et.al. case in the code? > > Any other interface modes that should be there? Obviously, > > PHYLINK_PCS_NEG_NONE is not correct for 10GBASE-KR since it does use > > inband AN. Does it make sense for the user to disable inband AN for > > 10GBASE-KR? If so, maybe it should be under the 1000base-X case. > > What physical process (reference to IEEE 802.3 clause is fine) would be > controlled by the phylink_pcs_neg_mode() function for the so-called > 10GBASE-KR phy-mode? Clause 73.1: "While implementation of Auto-Negotiation is mandatory for Backplane Ethernet PHY s, the use of Auto-Negotiation is optional." "The Auto-Negotiation function allows the devices to switch between the various operational modes in an orderly fashion, permits management to disable or enable the Auto-Negotiation function, and allows management to select a specific operational mode." "The Auto-Negotiation function also provides a parallel detection function to allow Backplane Ethernet devices to connect to other Backplane Ethernet devices that have Auto-Negotiation disabled and interoperate with legacy devices that do not support Clause 73 Auto-Negotiation." So, my reading of these statements is that the _user_ should be able to control via ethtool whether Clause 73 negotiation is performed on a 10GBASE-KR (or any other backplane link that uses clause 73 negotiation.) Having extracted that from 802.3, I now believe it should be treated the same as 1000BASE-X, and the Autoneg bit in ethtool should determine whether Clause 73 negotiation is used for 10GBASE-KR (and any other Clause 73 using protocol.)
On Mon, May 15, 2023 at 10:45:21PM +0100, Russell King (Oracle) wrote: > Clause 73.1: > > So, my reading of these statements is that the _user_ should be > able to control via ethtool whether Clause 73 negotiation is > performed on a 10GBASE-KR (or any other backplane link that > uses clause 73 negotiation.) Having extracted that from 802.3, > I now believe it should be treated the same as 1000BASE-X, and > the Autoneg bit in ethtool should determine whether Clause 73 > negotiation is used for 10GBASE-KR (and any other Clause 73 > using protocol.) Having said that copper backplane link modes should be treated "the same" as fiber link modes w.r.t. ethtool -s autoneg, it should also be said that there are significant differences between clause 37 and 73 autoneg too. Clause 73 negotiates the actual use of 10GBase-KR as a SERDES protocol through the copper backplane in favor of other "Base-K*" alternative link modes, so it's not quite proper to say that 10GBase-KR is a clause 73 using protocol. To me, the goals of clause 73 autoneg are much more similar to those of the twisted pair autoneg process - clause 28, which similarly selects between different media side protocols in the PHY, using a priority resolution function. For those, we use phylib and the phy_device structure. What are the merits of using phylink_pcs for copper backplanes and not phylib?
On Tue, May 16, 2023 at 01:08:33AM +0300, Vladimir Oltean wrote: > On Mon, May 15, 2023 at 10:45:21PM +0100, Russell King (Oracle) wrote: > > Clause 73.1: > > > > So, my reading of these statements is that the _user_ should be > > able to control via ethtool whether Clause 73 negotiation is > > performed on a 10GBASE-KR (or any other backplane link that > > uses clause 73 negotiation.) Having extracted that from 802.3, > > I now believe it should be treated the same as 1000BASE-X, and > > the Autoneg bit in ethtool should determine whether Clause 73 > > negotiation is used for 10GBASE-KR (and any other Clause 73 > > using protocol.) > > Having said that copper backplane link modes should be treated "the > same" as fiber link modes w.r.t. ethtool -s autoneg, it should also be > said that there are significant differences between clause 37 and 73 > autoneg too. > > Clause 73 negotiates the actual use of 10GBase-KR as a SERDES protocol > through the copper backplane in favor of other "Base-K*" alternative > link modes, so it's not quite proper to say that 10GBase-KR is a clause > 73 using protocol. I believe it is correct to state it as such, because: 72.1: Table 72–1—Physical Layer clauses associated with the 10GBASE-KR PMD 73—Auto-Negotiation for Backplane Ethernet Required Essentially, 802.3 doesn't permit 10GBASE-KR without hardware support for Clause 73 AN (but that AN doesn't have to be enabled by management software.) > To me, the goals of clause 73 autoneg are much more similar to those of > the twisted pair autoneg process - clause 28, which similarly selects > between different media side protocols in the PHY, using a priority > resolution function. For those, we use phylib and the phy_device > structure. What are the merits of using phylink_pcs for copper backplanes > and not phylib? I agree with you on that, because not only does that fit better with our ethernet PHY model, but it also means PHY_INTERFACE_MODE_XLGMII makes sense. However, by that same token, 1000BASE-X should never have been a PHY_INTERFACE_MODE_xxx, and this should also have been treated purely as a PHY. Taking that still further, this means SGMII, which is 1000BASE-X but modified for Cisco's purposes, would effectively be a media converting PHY sat between the MAC/RS and the "real" ethernet PHY. In this case, PHY_INTERFACE_MODE_SGMII might make sense because the "real" ethernet PHY needs to know that. Then there's 1000BASE-X used to connect a "real" ethernet PHY to the MAC/RS, which means 1000BASE-X can't really be any different from SGMII. This all makes the whole thing extremely muddy, but this deviates away from the original topic, because we're now into a "what should we call a PCS" vs "what should we call a PHY" discussion. Then we'll get into a discussion about phylib, difficulties with net_device only being able to have one phylib device, stacked PHYs, and phylib not being able to cope with non-MDIO based devices that we find on embedded platforms (some which don't even offer anything that approximates the 802.3 register set, so could never be a phylib driver.) It even impacts on the DT description, since what does "managed = "in-band-status";" mean if we start considering 1000base-X the same way as 1000base-T and the "PHY" protocol for 1000base-X becomes GMII. A GMII link has no inband AN, so "managed = "in-band-status";" at that point makes no sense. That is definitely a can of worms I do *not* want to open with this discussion - and much of the above has a long history and considerably pre-dates phylink. My original question was more around: how do we decide what we currently have as a PCS should use inband negotiation. For SGMII and close relatives, and 1000BASE-X it's been obvious. For 2500BASE-X less so (due to vendors coming up with it before its been standardised.) We have implementations using this for other protocols, so it's a question that needs answering for these other protocols. However, if we did want to extend this topic, then there are a number of questions that really need to be asked is about the XPCS driver. Such as - what does 1000BASE-KX, 10000BASE-KX4, 10000BASE-KR and 2500BASE-X have to do with USXGMII, and why are there no copper ethtool modes listed when a USXGMII link can definitely support being connected to a copper PHY? (See xpcs_usxgmii_features[]). Why does XPCS think that USXGMII uses Clause 73 AN? (see the first entry in synopsys_xpcs_compat[].) xpcs_sgmii_features[] only mentions copper linkmodes, but as we know, it's common for copper PHYs to also support fibre with an auto- selection mechanism. So, 1000BASE-X is definitely possible if SGMII is supported, so why isn't it listed. As previously said, 1000BASE-X can be connected to a PHY that does 1000BASE-T, so why does xpcs_1000basex_features[] not mention 1000baseT_Full... there's probably more here as well. Interestingly, xpcs_2500basex_features[] _does_ mention both 2500BASE-X and 2500BASE-T, but I think that only does because I happened to comment on it during a review. I think xpcs is another can of worms, but is an easier can of worms to solve than trying to sort out that "what's an ethernet PHY" question we seem to be heading towards (which I think would be a mammoth task, even back when phylink didn't exist, to sort out.)
On Tue, May 16, 2023 at 12:36:33AM +0100, Russell King (Oracle) wrote: > > Clause 73 negotiates the actual use of 10GBase-KR as a SERDES protocol > > through the copper backplane in favor of other "Base-K*" alternative > > link modes, so it's not quite proper to say that 10GBase-KR is a clause > > 73 using protocol. > > I believe it is correct to state it as such, because: > > 72.1: Table 72–1—Physical Layer clauses associated with the 10GBASE-KR PMD > > 73—Auto-Negotiation for Backplane Ethernet Required > > Essentially, 802.3 doesn't permit 10GBASE-KR without hardware support > for Clause 73 AN (but that AN doesn't have to be enabled by management > software.) Just like clause 40 (PCS and PMA for 1000BASE-T) requires clause 28 AN to be supported. But, when the autoneg process begins, the use of 10GBase-KR as a protocol over the backplane link hasn't even been yet established, so I find it unnatural to speak of clause 73 autoneg as something that 10GBase-KR has. The reason why I'm insisting on this is because to me, to treat clause 73 as an in-band autoneg process of 10GBase-KR sounds like a reversal of causality. The clause 73 link codeword has a Technology Ability field through which 10GBase-KR, 1GBase-KX etc are advertised as supported protocols. If C73 is an inband protocol of 10GBase-KR, what should the local PCS advertise for its Technology Ability? Only 10GBase-KR, because this is what is implied by treating it as an attribute of 10GBase-KR, no? But that would be a denatured way of negotiating - advertise a single link mode, take it or leave it. And what other inband autoneg protocols permit, say, starting from SGMII and ending in 1000Base-X? Clause 73 can't be directly compared to what we currently mean by managed = "in-band-status". Not only is C37 autoneg not directly comparable to C73, but they are not mutually exclusive, either. I would say they are more or less orthogonal. More below. I don't believe that toggling clause 73 autoneg based on phylink_pcs_neg_mode() makes much sense. > > To me, the goals of clause 73 autoneg are much more similar to those of > > the twisted pair autoneg process - clause 28, which similarly selects > > between different media side protocols in the PHY, using a priority > > resolution function. For those, we use phylib and the phy_device > > structure. What are the merits of using phylink_pcs for copper backplanes > > and not phylib? > > I agree with you on that, because not only does that fit better with > our ethernet PHY model, but it also means PHY_INTERFACE_MODE_XLGMII > makes sense. > > However, by that same token, 1000BASE-X should never have been a > PHY_INTERFACE_MODE_xxx, and this should also have been treated purely > as a PHY. > > Taking that still further, this means SGMII, which is 1000BASE-X but > modified for Cisco's purposes, would effectively be a media converting > PHY sat between the MAC/RS and the "real" ethernet PHY. In this case, > PHY_INTERFACE_MODE_SGMII might make sense because the "real" ethernet > PHY needs to know that. > > Then there's 1000BASE-X used to connect a "real" ethernet PHY to the > MAC/RS, which means 1000BASE-X can't really be any different from > SGMII. > > This all makes the whole thing extremely muddy, but this deviates away > from the original topic, because we're now into a "what should we call > a PCS" vs "what should we call a PHY" discussion. Then we'll get into > a discussion about phylib, difficulties with net_device only being > able to have one phylib device, stacked PHYs, and phylib not being > able to cope with non-MDIO based devices that we find on embedded > platforms (some which don't even offer anything that approximates the > 802.3 register set, so could never be a phylib driver.) > > It even impacts on the DT description, since what does "managed = > "in-band-status";" mean if we start considering 1000base-X the same > way as 1000base-T and the "PHY" protocol for 1000base-X becomes GMII. > A GMII link has no inband AN, so "managed = "in-band-status";" at > that point makes no sense. > > That is definitely a can of worms I do *not* want to open with this > discussion - and much of the above has a long history and considerably > pre-dates phylink. I don't have much of a problem accepting that not everything we have in-tree is consistent/correct. But if you think it's too big of a can of worms to open, okay. > > My original question was more around: how do we decide what we > currently have as a PCS should use inband negotiation. > > For SGMII and close relatives, and 1000BASE-X it's been obvious. For > 2500BASE-X less so (due to vendors coming up with it before its been > standardised.) > > We have implementations using this for other protocols, so it's > a question that needs answering for these other protocols. > > > However, if we did want to extend this topic, then there are a number > of questions that really need to be asked is about the XPCS driver. > Such as - what does 1000BASE-KX, 10000BASE-KX4, 10000BASE-KR and > 2500BASE-X have to do with USXGMII, and why are there no copper > ethtool modes listed when a USXGMII link can definitely support > being connected to a copper PHY? (See xpcs_usxgmii_features[]). > > Why does XPCS think that USXGMII uses Clause 73 AN? (see the first > entry in synopsys_xpcs_compat[].) First, in principle USXGMII and clause 73 are not mutually exclusive. It is possible to use clause 73 to advertise 10GBase-KR as a link mode, and that will give you link training for proper 3-tap electrical equalization over the copper backplane. Then, once C73 AN/LT ended and 10GBase-KR has been established, is possible to configure the 10GBase-R PCS to enable C37 USXGMII to select the actual data rate via symbol replication, while the SERDES lane remains at 10GBaud. At least, the XPCS seems to permit enabling symbol replication in conjunction with 10GBase-KR. Cross-checking what the XPCS driver does with the XPCS databook, I think it's this: It doesn't use C37 autoneg, but it still uses the USXGMII replicator. It uses C73 (standard or perhaps a vendor-specific form) to advertise the 1000baseKX_Full, 2500baseX_Full, 10000baseKX4_Full, 10000baseKR_Full link modes. Each of these corresponds to a bit in the SR_AN_ADV3 register. Note that 2500baseX_Full doesn't have a "K", and for good reason - because 802.3 Table 73–4—"Technology Ability Field encoding" doesn't specify 2500Base-KX as a copper backplane mode that exists. But XPCS does! To quote the databook: | When DATA[32] is set to 1, it indicates that the local device | supports 2.5GBASE-KX speed mode. And since it advertises this through a code word visible to the link partner, I would suggest this means that its C73 hardware implementation is non-standard. Then, there's the entire issue of what the software does. When 1GBase-KX is established through C73 autoneg, the XPCS driver will leave the lane in 10GBase-R mode, and enable 1:10 symbol replication in the USXGMII block. Sure, this is a 1Gbit data rate, but it uses the 64b/66b encoding (BASE-R), and not the advertised 8b/10b encoding (BASE-X)! So it will likely only work between 2 XPCS devices!!! Then, there's the entire issue that the code, as it was originally introduced, is not the same as it is now. For example, this bit in xpcs_do_config(): switch (compat->an_mode) { case DW_AN_C73: if (phylink_autoneg_inband(mode)) { ret = xpcs_config_aneg_c73(xpcs, compat); if (ret) return ret; } break; used to look at state->an_enabled rather than phylink_autoneg_inband(). Through my idiocy, I inadvertently converted that in commit 11059740e616 ("net: pcs: xpcs: convert to phylink_pcs_ops"). > > xpcs_sgmii_features[] only mentions copper linkmodes, but as we know, > it's common for copper PHYs to also support fibre with an auto- > selection mechanism. So, 1000BASE-X is definitely possible if SGMII > is supported, so why isn't it listed. Most likely explanation is that XPCS has never been paired up until now to such a PHY. > As previously said, 1000BASE-X can be connected to a PHY that does > 1000BASE-T, so why does xpcs_1000basex_features[] not mention > 1000baseT_Full... there's probably more here as well. > > Interestingly, xpcs_2500basex_features[] _does_ mention both > 2500BASE-X and 2500BASE-T, but I think that only does because I > happened to comment on it during a review. > > I think xpcs is another can of worms, but is an easier can of worms > to solve than trying to sort out that "what's an ethernet PHY" > question we seem to be heading towards (which I think would be a > mammoth task, even back when phylink didn't exist, to sort out.) I wasn't necessarily going to go all the way into "what's a PHY?". I just want to clarify some terms such that we can agree what is correct and what is not. I believe that much of what's currently in XPCS w.r.t. C73 is not correct, partly through initial intention and partly through blind conversions such as mine.
On Tue, May 16, 2023 at 12:00:09PM +0300, Vladimir Oltean wrote: > On Tue, May 16, 2023 at 12:36:33AM +0100, Russell King (Oracle) wrote: > > > Clause 73 negotiates the actual use of 10GBase-KR as a SERDES protocol > > > through the copper backplane in favor of other "Base-K*" alternative > > > link modes, so it's not quite proper to say that 10GBase-KR is a clause > > > 73 using protocol. > > > > I believe it is correct to state it as such, because: > > > > 72.1: Table 72–1—Physical Layer clauses associated with the 10GBASE-KR PMD > > > > 73—Auto-Negotiation for Backplane Ethernet Required > > > > Essentially, 802.3 doesn't permit 10GBASE-KR without hardware support > > for Clause 73 AN (but that AN doesn't have to be enabled by management > > software.) > > Just like clause 40 (PCS and PMA for 1000BASE-T) requires clause 28 AN > to be supported. But, when the autoneg process begins, the use of > 10GBase-KR as a protocol over the backplane link hasn't even been yet > established, so I find it unnatural to speak of clause 73 autoneg as > something that 10GBase-KR has. > > The reason why I'm insisting on this is because to me, to treat clause > 73 as an in-band autoneg process of 10GBase-KR sounds like a reversal of > causality. The clause 73 link codeword has a Technology Ability field > through which 10GBase-KR, 1GBase-KX etc are advertised as supported > protocols. If C73 is an inband protocol of 10GBase-KR, what should the > local PCS advertise for its Technology Ability? Only 10GBase-KR, because > this is what is implied by treating it as an attribute of 10GBase-KR, no? I'm not going to get hung up on this, because I don't regard the point as being particularly important to this discussion. You are right that Clause 73 AN selects whether e.g. 10GBASE-KR gets used, and I don't disagree with that. We're just entering into what seems like a pointless debate that eats up email bandwidth. > But that would be a denatured way of negotiating - advertise a single > link mode, take it or leave it. And what other inband autoneg protocols > permit, say, starting from SGMII and ending in 1000Base-X? Clause 73 > can't be directly compared to what we currently mean by managed = > "in-band-status". > > Not only is C37 autoneg not directly comparable to C73, but they are not > mutually exclusive, either. I would say they are more or less orthogonal. > More below. > > I don't believe that toggling clause 73 autoneg based on phylink_pcs_neg_mode() > makes much sense. I agree, which means phylink_pcs_neg_mode() needs to document this so that people don't think it does - and that solves both of the issues I was bringing up in my original email. > > However, if we did want to extend this topic, then there are a number > > of questions that really need to be asked is about the XPCS driver. > > Such as - what does 1000BASE-KX, 10000BASE-KX4, 10000BASE-KR and > > 2500BASE-X have to do with USXGMII, and why are there no copper > > ethtool modes listed when a USXGMII link can definitely support > > being connected to a copper PHY? (See xpcs_usxgmii_features[]). > > > > Why does XPCS think that USXGMII uses Clause 73 AN? (see the first > > entry in synopsys_xpcs_compat[].) > > First, in principle USXGMII and clause 73 are not mutually exclusive. > > It is possible to use clause 73 to advertise 10GBase-KR as a link mode, > and that will give you link training for proper 3-tap electrical > equalization over the copper backplane. > > Then, once C73 AN/LT ended and 10GBase-KR has been established, is > possible to configure the 10GBase-R PCS to enable C37 USXGMII to select > the actual data rate via symbol replication, while the SERDES lane > remains at 10GBaud. At least, the XPCS seems to permit enabling symbol > replication in conjunction with 10GBase-KR. My comments are against the driver as it stands today, not some theoretical case that the hardware may support. What I'm getting at is if the interface mode is PHY_INTERFACE_MODE_USXGMII, then... okay... we _may_ wish to do clause 73 negotiation advertising 10GBASE-KR and then do clause 73 for the USXGMII control word - but the driver doesn't do this as far as I can see. If C73 AN is being used, it merely reads the C73 state and returns the resolution from that. Any speed information that a USXGMII PHY passes back over the C37 inband signalling would be ignored because there seems to be no provision for the USXGMII inband signalling. So I'm confused what the xpcs driver _actually_ does when USXGMII mode is selected by PHY_INTERFACE_MODE_USXGMII, because looking at the driver, it doesn't look like it's USXGMII at all. > Then, there's the entire issue that the code, as it was originally > introduced, is not the same as it is now. For example, this bit in > xpcs_do_config(): > > switch (compat->an_mode) { > case DW_AN_C73: > if (phylink_autoneg_inband(mode)) { > ret = xpcs_config_aneg_c73(xpcs, compat); > if (ret) > return ret; > } > break; > > used to look at state->an_enabled rather than phylink_autoneg_inband(). > Through my idiocy, I inadvertently converted that in commit 11059740e616 > ("net: pcs: xpcs: convert to phylink_pcs_ops"). If we want to change that back to the old behaviour, that needs to be: if (test_bit(ETHTOOL_LINK_MODE_Autoneg_BIT, advertising)) { ... } break; but that wouldn't ever have been sufficient, even when the code was using an_enabled, because both of these reflect the user configuration. (an_enabled was just a proxy for this Autoneg bit). I'm going to call both of these an "AN indicator" in the question below. Isn't it rather perverse that the driver configures AN if this AN indicator is set, but then does nothing if it isn't? > > xpcs_sgmii_features[] only mentions copper linkmodes, but as we know, > > it's common for copper PHYs to also support fibre with an auto- > > selection mechanism. So, 1000BASE-X is definitely possible if SGMII > > is supported, so why isn't it listed. > > Most likely explanation is that XPCS has never been paired up until now > to such a PHY. So it's probably safe to add ETHTOOL_LINK_MODE_1000baseX_Full_BIT there - thanks. > > As previously said, 1000BASE-X can be connected to a PHY that does > > 1000BASE-T, so why does xpcs_1000basex_features[] not mention > > 1000baseT_Full... there's probably more here as well. > > > > Interestingly, xpcs_2500basex_features[] _does_ mention both > > 2500BASE-X and 2500BASE-T, but I think that only does because I > > happened to comment on it during a review. > > > > I think xpcs is another can of worms, but is an easier can of worms > > to solve than trying to sort out that "what's an ethernet PHY" > > question we seem to be heading towards (which I think would be a > > mammoth task, even back when phylink didn't exist, to sort out.) > > I wasn't necessarily going to go all the way into "what's a PHY?". > I just want to clarify some terms such that we can agree what is correct > and what is not. I believe that much of what's currently in XPCS w.r.t. > C73 is not correct, partly through initial intention and partly through > blind conversions such as mine. Right, that's probably why I'm having a hard time interpreting what this driver is doing when it comes to these modes that makes use clause 73. As this is the only phylink-using implementation that involves clause 73 at present, I would like to ensure that there's a clear resolution of the expected behaviour before we get further implementations, and preferably document what's expected.
On Tue, May 16, 2023 at 12:49:44PM +0100, Russell King (Oracle) wrote: > What I'm getting at is if the interface mode is > PHY_INTERFACE_MODE_USXGMII, then... okay... we _may_ wish to do > clause 73 negotiation advertising 10GBASE-KR and then do clause 73 ~~ 37 > for the USXGMII control word - but the driver doesn't do this as far > as I can see. If C73 AN is being used, it merely reads the C73 > state and returns the resolution from that. Any speed information that > a USXGMII PHY passes back over the C37 inband signalling would be > ignored because there seems to be no provision for the USXGMII > inband signalling. > > So I'm confused what the xpcs driver _actually_ does when USXGMII > mode is selected by PHY_INTERFACE_MODE_USXGMII, because looking at > the driver, it doesn't look like it's USXGMII at all. If what you're looking for is strictly the USXGMII in-band autoneg code word derived from C37, then the short answer is that you won't find it, even when going back to the initial commit fcb26bd2b6ca ("net: phy: Add Synopsys DesignWare XPCS MDIO module"). To the larger question 'what does XPCS actually do in phy-mode "usxgmii"', I guess the simple answer looking at the aforementioned initial commit is 'it violates the advertised coding scheme by using BASE-R instead of BASE-X for speeds lower than 10G', and 'if you don't want the USXGMII replicator just use phy-mode "10gbase-kr" which behaves basically the same except it doesn't advertise the rate-adapted speeds'. Disclaimer: I'm saying this based solely on the code and documentation. > If we want to change that back to the old behaviour, that needs to > be: > if (test_bit(ETHTOOL_LINK_MODE_Autoneg_BIT, advertising)) { > ... > } > break; Ok. I should send a patch fixing this, since I introduced the behavior change. I'll add your suggested tag. > but that wouldn't ever have been sufficient, even when the code was > using an_enabled, because both of these reflect the user configuration. > (an_enabled was just a proxy for this Autoneg bit). I'm going to call > both of these an "AN indicator" in the question below. > > Isn't it rather perverse that the driver configures AN if this AN > indicator is set, but then does nothing if it isn't? Maybe. My copy of the databook is parameterized based on instantiation-dependent variables, and I can't really tell what are the out-of-reset register values for hardware I don't have, so it is hard for me to infer what is the behavior when AN is not enabled. > As this is the only phylink-using implementation that involves clause > 73 at present, I would like to ensure that there's a clear resolution > of the expected behaviour before we get further implementations, and > preferably document what's expected. +1 that's where I would like this to go, too.
diff --git a/include/linux/phylink.h b/include/linux/phylink.h index 8968094503d6..465248818276 100644 --- a/include/linux/phylink.h +++ b/include/linux/phylink.h @@ -21,6 +21,18 @@ enum { MLO_AN_FIXED, /* Fixed-link mode */ MLO_AN_INBAND, /* In-band protocol */ + /* PCS "negotiation" mode. + * PHYLINK_PCS_NEG_NONE - protocol has no inband capability + * PHYLINK_PCS_NEG_OUTBAND - some out of band or fixed link setting + * PHYLINK_PCS_NEG_INBAND_DISABLED - inband mode disabled, e.g. + * 1000base-X with autoneg off + * PHYLINK_PCS_NEG_INBAND_ENABLED - inband mode enabled + */ + PHYLINK_PCS_NEG_NONE = 0, + PHYLINK_PCS_NEG_OUTBAND, + PHYLINK_PCS_NEG_INBAND_DISABLED, + PHYLINK_PCS_NEG_INBAND_ENABLED, + /* MAC_SYM_PAUSE and MAC_ASYM_PAUSE are used when configuring our * autonegotiation advertisement. They correspond to the PAUSE and * ASM_DIR bits defined by 802.3, respectively. @@ -79,6 +91,67 @@ static inline bool phylink_autoneg_inband(unsigned int mode) return mode == MLO_AN_INBAND; } +/** + * phylink_pcs_neg_mode() - helper to determine PCS inband mode + * @mode: one of %MLO_AN_FIXED, %MLO_AN_PHY, %MLO_AN_INBAND. + * @interface: interface mode to be used + * @advertising: adertisement ethtool link mode mask + * + * Determines the negotiation mode to be used by the PCS, and returns + * one of: + * %PHYLINK_PCS_NEG_NONE: interface mode does not support inband + * %PHYLINK_PCS_NEG_OUTBAND: an out of band mode (e.g. reading the PHY, + * or fixed link) will be used. + * %PHYLINK_PCS_NEG_INBAND_DISABLED: inband mode selected but autoneg disabled + * %PHYLINK_PCS_NEG_INBAND_ENABLED: inband mode selected and autoneg enabled + */ +static inline unsigned int phylink_pcs_neg_mode(unsigned int mode, + phy_interface_t interface, + const unsigned long *advertising) +{ + unsigned int neg_mode; + + switch (interface) { + case PHY_INTERFACE_MODE_SGMII: + case PHY_INTERFACE_MODE_QSGMII: + case PHY_INTERFACE_MODE_QUSGMII: + case PHY_INTERFACE_MODE_USXGMII: + /* These protocols are designed for use with a PHY which + * communicates its negotiation result back to the MAC via + * inband communication. Note: there exist PHYs that run + * with SGMII but do not send the inband data. + */ + if (!phylink_autoneg_inband(mode)) + neg_mode = PHYLINK_PCS_NEG_OUTBAND; + else + neg_mode = PHYLINK_PCS_NEG_INBAND_ENABLED; + break; + + case PHY_INTERFACE_MODE_1000BASEX: + case PHY_INTERFACE_MODE_2500BASEX: + /* 1000base-X is designed for use media-side for Fibre + * connections, and thus the Autoneg bit needs to be + * taken into account. We also do this for 2500base-X + * as well, but drivers may not support this, so may + * need to override this. + */ + if (!phylink_autoneg_inband(mode)) + neg_mode = PHYLINK_PCS_NEG_OUTBAND; + else if (linkmode_test_bit(ETHTOOL_LINK_MODE_Autoneg_BIT, + advertising)) + neg_mode = PHYLINK_PCS_NEG_INBAND_ENABLED; + else + neg_mode = PHYLINK_PCS_NEG_INBAND_DISABLED; + break; + + default: + neg_mode = PHYLINK_PCS_NEG_NONE; + break; + } + + return neg_mode; +} + /** * struct phylink_link_state - link state structure * @advertising: ethtool bitmask containing advertised link modes