Message ID | 20211123164027.15618-5-kabel@kernel.org (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | Extend `phy-mode` to string array | expand |
On Tue, Nov 23, 2021 at 05:40:23PM +0100, Marek Behún wrote: > Now that the 'phy-mode' property can be a string array containing more > PHY modes (all that are supported by the board), update the bitmap of > interfaces supported by the MAC with this property. > > Normally this would be a simple intersection (of interfaces supported by > the current implementation of the driver and interfaces supported by the > board), but we need to keep being backwards compatible with older DTs, > which may only define one mode, since, as Russell King says, > conventionally phy-mode has meant "this is the mode we want to operate > the PHY interface in" which was fine when PHYs didn't change their > mode depending on the media speed > > An example is DT defining > phy-mode = "sgmii"; > but the board supporting also 1000base-x and 2500base-x. > > Add the following logic to keep this backwards compatiblity: > - if more PHY modes are defined, do a simple intersection > - if one PHY mode is defined: > - if it is sgmii, 1000base-x or 2500base-x, add all three and then do > the intersection > - if it is 10gbase-r or usxgmii, add both, and also 5gbase-r, > 2500base-x, 1000base-x and sgmii, and then do the intersection > > This is simple enough and should work for all boards. > > Nonetheless it is possible (although extremely unlikely, in my opinion) > that a board will be found that (for example) defines > phy-mode = "sgmii"; > and the MAC drivers supports sgmii, 1000base-x and 2500base-x, but the > board DOESN'T support 2500base-x, because of electrical reasons (since > the frequency is 2.5x of sgmii). > Our code will in this case incorrectly infer also support for > 2500base-x. To avoid this, the board maintainer should either change DTS > to > phy-mode = "sgmii", "1000base-x"; > and update device tree on all boards, or, if that is impossible, add a > fix into the function we are introducing in this commit. > > Another example would be a board with device-tree defining > phy-mode = "10gbase-r"; > We infer from this all other modes (sgmii, 1000base-x, 2500base-x, > 5gbase-r, usxgmii), and these then get filtered by those supported by > the driver. But it is possible that a driver supports all of these > modes, and yet not all are supported because the board has an older > version of the TF-A firmware, which implements changing of PHY modes via > SMC calls. For this case, the board maintainer should either provide all > supported modes in the phy-mode property, or add a fix into this > function that somehow checks for this situation. But this is a really > unprobable scenario, in my opinion. > > Signed-off-by: Marek Behún <kabel@kernel.org> > --- > Changes since v1: > - added 10gbase-r example scenario to commit message > - changed phylink_update_phy_modes() so that if supported_interfaces is > empty (an unconverted driver that doesn't fill up this member), we > leave it empty > - rewritten phylink_update_phy_modes() according to Sean Anderson's > comment: use phy_interface_and/or() instead of several > if (test_bit) set_bit > --- > drivers/net/phy/phylink.c | 70 +++++++++++++++++++++++++++++++++++++++ > 1 file changed, 70 insertions(+) > > diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c > index 3603c024109a..d2300a3a60ec 100644 > --- a/drivers/net/phy/phylink.c > +++ b/drivers/net/phy/phylink.c > @@ -564,6 +564,74 @@ static int phylink_parse_fixedlink(struct phylink *pl, > return 0; > } > > +static void phylink_update_phy_modes(struct phylink *pl, > + struct fwnode_handle *fwnode) > +{ > + unsigned long *supported = pl->config->supported_interfaces; > + DECLARE_PHY_INTERFACE_MASK(modes); > + > + /* FIXME: If supported is empty, leave it as it is. This happens for > + * unconverted drivers that don't fill up supported_interfaces. Once all > + * such drivers are converted, we can drop this. > + */ > + if (phy_interface_empty(supported)) > + return; > + > + if (fwnode_get_phy_modes(fwnode, modes) < 0) > + return; > + > + /* If no modes are defined in fwnode, interpret it as all modes > + * supported by the MAC are supported by the board. > + */ > + if (phy_interface_empty(modes)) > + return; > + > + /* We want the intersection of given supported modes with those defined > + * in DT. > + * > + * Some older device-trees mention only one of `sgmii`, `1000base-x` or > + * `2500base-x`, while supporting all three. Other mention `10gbase-r` > + * or `usxgmii`, while supporting both, and also `sgmii`, `1000base-x`, > + * `2500base-x` and `5gbase-r`. > + * For backwards compatibility with these older DTs, make it so that if > + * one of these modes is mentioned in DT and MAC supports more of them, > + * keep all that are supported according to the logic above. > + * > + * Nonetheless it is possible that a device may support only one mode, > + * for example 1000base-x, due to strapping pins or some other reasons. > + * If a specific device supports only the mode mentioned in DT, the > + * exception should be made here with of_machine_is_compatible(). > + */ > + if (bitmap_weight(modes, PHY_INTERFACE_MODE_MAX) == 1) { I like the idea of extending the mask of phy_modes based on the phylink_config.supported_interfaces bitmap that the driver populates (assuming, of course, that it's converted correctly to this new format, and I looked through the implementations and just found a bug). I think it might just work with old device trees, too. In fact, it may work so well, that I would even be tempted to ask "can we defer for a while updating the device trees and bindings documents with an array, just keep the phy modes as an array internally inside the kernel?" On the Marvell boards that you're working with, do you have an example board on which not all the PHY modes supported by the driver might be available? Do you also know the reason why? You give an example in the comments about 1000base-X and strapping, can you expand on that? Because I think it's a bit strange to create a framework for fixups now, when we don't even know what kind of stuff may be broken. The PHY modes (effectively SERDES protocols) might not be what you actually want to restrict. I mean, restricting the PHY modes is like saying: "the MAC supports USXGMII and 10GBase-R, the PHY supports USXGMII and 10GBase-R, I know how to configure both of them in each mode, but on this board, USXGMII works and 10GBase-R doesn't". ?! It may make more sense, when the time comes for the first fixup, to put a cap on the maximum gross data rate attainable on that particular lane (including stuff such as USXGMII symbol repetition), instead of having to list out all the PHY modes that a driver might or might not support right now. Imagine what pain it will be to update device trees each time a driver gains software support for a SERDES protocol it didn't support before. But if you cap the lane frequency, you can make phylink_update_phy_modes() deduce what isn't acceptable using simple logic. Also, if there's something to be broken by this change, why would we put an of_machine_is_compatible() here, and why wouldn't we instead update the phylink_config.supported_interfaces directly in the driver? I think it's the driver's responsibility for passing a valid mask of supported interfaces. > + DECLARE_PHY_INTERFACE_MASK(mask); > + bool lower = false; > + > + if (test_bit(PHY_INTERFACE_MODE_10GBASER, modes) || > + test_bit(PHY_INTERFACE_MODE_USXGMII, modes)) { > + phy_interface_zero(mask); > + __set_bit(PHY_INTERFACE_MODE_5GBASER, mask); > + __set_bit(PHY_INTERFACE_MODE_10GBASER, mask); > + __set_bit(PHY_INTERFACE_MODE_USXGMII, mask); > + phy_interface_and(mask, supported, mask); > + phy_interface_or(modes, modes, mask); > + lower = true; > + } > + > + if (lower || (test_bit(PHY_INTERFACE_MODE_SGMII, modes) || > + test_bit(PHY_INTERFACE_MODE_1000BASEX, modes) || > + test_bit(PHY_INTERFACE_MODE_2500BASEX, modes))) { > + phy_interface_zero(mask); > + __set_bit(PHY_INTERFACE_MODE_SGMII, mask); > + __set_bit(PHY_INTERFACE_MODE_1000BASEX, mask); > + __set_bit(PHY_INTERFACE_MODE_2500BASEX, mask); > + phy_interface_and(mask, supported, mask); > + phy_interface_or(modes, modes, mask); > + } > + } > + > + phy_interface_and(supported, supported, modes); > +} > + > static int phylink_parse_mode(struct phylink *pl, struct fwnode_handle *fwnode) > { > struct fwnode_handle *dn; > @@ -1157,6 +1225,8 @@ struct phylink *phylink_create(struct phylink_config *config, > __set_bit(PHYLINK_DISABLE_STOPPED, &pl->phylink_disable_state); > timer_setup(&pl->link_poll, phylink_fixed_poll, 0); > > + phylink_update_phy_modes(pl, fwnode); > + > bitmap_fill(pl->supported, __ETHTOOL_LINK_MODE_MASK_NBITS); > linkmode_copy(pl->link_config.advertising, pl->supported); > phylink_validate(pl, pl->supported, &pl->link_config); > -- > 2.32.0 >
Hi Vladimir, On Tue, 23 Nov 2021 23:24:41 +0200 Vladimir Oltean <olteanv@gmail.com> wrote: > > + /* We want the intersection of given supported modes with those defined > > + * in DT. > > + * > > + * Some older device-trees mention only one of `sgmii`, `1000base-x` or > > + * `2500base-x`, while supporting all three. Other mention `10gbase-r` > > + * or `usxgmii`, while supporting both, and also `sgmii`, `1000base-x`, > > + * `2500base-x` and `5gbase-r`. > > + * For backwards compatibility with these older DTs, make it so that if > > + * one of these modes is mentioned in DT and MAC supports more of them, > > + * keep all that are supported according to the logic above. > > + * > > + * Nonetheless it is possible that a device may support only one mode, > > + * for example 1000base-x, due to strapping pins or some other reasons. > > + * If a specific device supports only the mode mentioned in DT, the > > + * exception should be made here with of_machine_is_compatible(). > > + */ > > + if (bitmap_weight(modes, PHY_INTERFACE_MODE_MAX) == 1) { > > I like the idea of extending the mask of phy_modes based on the > phylink_config.supported_interfaces bitmap that the driver populates > (assuming, of course, that it's converted correctly to this new format, > and I looked through the implementations and just found a bug). Russell is working on converting/fixing the drivers, you can look at his changes at http://git.armlinux.org.uk/cgit/linux-arm.git/log/?h=net-queue look at the first two pages, search for "populate supported_interfaces" > I think > it might just work with old device trees, too. In fact, it may work so > well, that I would even be tempted to ask "can we defer for a while > updating the device trees and bindings documents with an array, just > keep the phy modes as an array internally inside the kernel?" We don't need to update the device-trees immediately, but I really think the DT binding should be ready to allow multiple modes, if someone does want to update. > On the Marvell boards that you're working with, do you have an example > board on which not all the PHY modes supported by the driver might be > available? Do you also know the reason why? You give an example in the > comments about 1000base-X and strapping, can you expand on that? On Macchiatobin, both the mvpp2 MAC driver and the marvell10g PHY driver support xaui mode, but that mode requires 4 SerDes lanes, and the boards wires only one lane between the SOC and the PHY, so obviously it cannot be used. Thinking about it, maybe I should have put this into the commit message. > Because I think it's a bit strange to create a framework for fixups now, > when we don't even know what kind of stuff may be broken. The PHY modes > (effectively SERDES protocols) might not be what you actually want to restrict. > I mean, restricting the PHY modes is like saying: "the MAC supports > USXGMII and 10GBase-R, the PHY supports USXGMII and 10GBase-R, I know > how to configure both of them in each mode, but on this board, USXGMII > works and 10GBase-R doesn't". > > ?! As I said above, the best example is with SerDeses which use multiple lanes where not all may be wired. But this backward compatibility code concerns only one-lane SerDeses: usxgmii, 10gbaser-r, 5gbase-r, 2500base-x, 1000base-x, sgmii. For these, I can think only of 2 possibilities why a fixup might be needed to restrict some mode when it is supported both by MAC and PHY: 1) the mode is not supported by the board because it has too large frequency and the wiring on the board interferes with it. Example: - device tree defined phy-mode = "sgmii" - from this we infer also 1000base-x and 2500base-x - but 2500base-x works at 2.5x the frequency of 1000base-x/sgmii - the board wiring does not work at that frequency I don't know of any such board, but I know that such thing is possible, because for example the connetor on Turris MOX modules allows only frequencies up to 6 GHz. So it is not impossible for there to be boards where only 2 GHz is supported... 2) the mode is not supported by the board because the generic PHY driver uses SMC calls to the firmware to change the SerDes mode, but the board may have old version of firmware which does not support SMC call correctly (or at all). This was a real issue with TF-A firmware on Macchiatobin, where the 5gbase-r mode was not supported in older versions > It may make more sense, when the time comes for the first fixup, to put > a cap on the maximum gross data rate attainable on that particular lane > (including stuff such as USXGMII symbol repetition), instead of having > to list out all the PHY modes that a driver might or might not support > right now. Imagine what pain it will be to update device trees each time > a driver gains software support for a SERDES protocol it didn't support > before. But if you cap the lane frequency, you can make phylink_update_phy_modes() > deduce what isn't acceptable using simple logic. So we would need to specify max data rate, usxgmii symbol repetition, and number of lanes connected, and even then it could be not enough. I think that simply specifying all the phy-modes that the HW supports is simpler. The devicetree does not need and should not be updated each time software gains support for another SerDes protocol. The devicetree should be updated only once, to specify all SerDes protocols that the hardware supports (the SOC/MAC, the PHY, and the board wiring). The software then takes from this list only those modes that the drivers support. So no need to update device-tree each time SW gains support for new SerDes mode. > Also, if there's something to be broken by this change, why would we put > an of_machine_is_compatible() here, and why wouldn't we instead update > the phylink_config.supported_interfaces directly in the driver? I think > it's the driver's responsibility for passing a valid mask of supported > interfaces. The whole idea of this code is to guarantee backward compatibility with older device-trees. In my opinion (and I think Russell agrees with me), this should be at one place, instead of putting various weird exceptions into various MAC drivers. Marek
On Tue, Nov 23, 2021 at 11:27:13PM +0100, Marek Behún wrote: > Hi Vladimir, > > On Tue, 23 Nov 2021 23:24:41 +0200 > Vladimir Oltean <olteanv@gmail.com> wrote: > > > > + /* We want the intersection of given supported modes with those defined > > > + * in DT. > > > + * > > > + * Some older device-trees mention only one of `sgmii`, `1000base-x` or > > > + * `2500base-x`, while supporting all three. Other mention `10gbase-r` > > > + * or `usxgmii`, while supporting both, and also `sgmii`, `1000base-x`, > > > + * `2500base-x` and `5gbase-r`. > > > + * For backwards compatibility with these older DTs, make it so that if > > > + * one of these modes is mentioned in DT and MAC supports more of them, > > > + * keep all that are supported according to the logic above. > > > + * > > > + * Nonetheless it is possible that a device may support only one mode, > > > + * for example 1000base-x, due to strapping pins or some other reasons. > > > + * If a specific device supports only the mode mentioned in DT, the > > > + * exception should be made here with of_machine_is_compatible(). > > > + */ > > > + if (bitmap_weight(modes, PHY_INTERFACE_MODE_MAX) == 1) { > > > > I like the idea of extending the mask of phy_modes based on the > > phylink_config.supported_interfaces bitmap that the driver populates > > (assuming, of course, that it's converted correctly to this new format, > > and I looked through the implementations and just found a bug). > > Russell is working on converting/fixing the drivers, you can look at his > changes at > http://git.armlinux.org.uk/cgit/linux-arm.git/log/?h=net-queue > look at the first two pages, search for "populate supported_interfaces" This one is a mistake in one of the existing conversions, I looked at the git tree that you pointed to and I don't see any fixup patch on that driver. I'm currently testing a fix, but it might take until tomorrow until I post anything. > > I think > > it might just work with old device trees, too. In fact, it may work so > > well, that I would even be tempted to ask "can we defer for a while > > updating the device trees and bindings documents with an array, just > > keep the phy modes as an array internally inside the kernel?" > > We don't need to update the device-trees immediately, but I really > think the DT binding should be ready to allow multiple modes, if > someone does want to update. > > > On the Marvell boards that you're working with, do you have an example > > board on which not all the PHY modes supported by the driver might be > > available? Do you also know the reason why? You give an example in the > > comments about 1000base-X and strapping, can you expand on that? > > On Macchiatobin, both the mvpp2 MAC driver and the marvell10g PHY > driver support xaui mode, but that mode requires 4 SerDes lanes, and > the boards wires only one lane between the SOC and the PHY, so obviously > it cannot be used. Thinking about it, maybe I should have put this into > the commit message. Yes, so 4 lanes of 3.125 GHz or higher could support XAUI, 1 lane of 3.125 GHz or higher could support 2500base-X but not XAUI. Sounds like simple logic. > > > Because I think it's a bit strange to create a framework for fixups now, > > when we don't even know what kind of stuff may be broken. The PHY modes > > (effectively SERDES protocols) might not be what you actually want to restrict. > > I mean, restricting the PHY modes is like saying: "the MAC supports > > USXGMII and 10GBase-R, the PHY supports USXGMII and 10GBase-R, I know > > how to configure both of them in each mode, but on this board, USXGMII > > works and 10GBase-R doesn't". > > > > ?! > > As I said above, the best example is with SerDeses which use multiple > lanes where not all may be wired. But this backward compatibility code > concerns only one-lane SerDeses: usxgmii, 10gbaser-r, 5gbase-r, > 2500base-x, 1000base-x, sgmii. Right, why so? From phy-mode = "xaui", you could also infer support for single-lane SERDES protocols up to 3.125 GHz, aren't you interested in doing that too? > For these, I can think only of 2 > possibilities why a fixup might be needed to restrict some mode when it > is supported both by MAC and PHY: > 1) the mode is not supported by the board because it has too large > frequency and the wiring on the board interferes with it. > Example: > - device tree defined phy-mode = "sgmii" > - from this we infer also 1000base-x and 2500base-x > - but 2500base-x works at 2.5x the frequency of 1000base-x/sgmii > - the board wiring does not work at that frequency > > I don't know of any such board, but I know that such thing is > possible, because for example the connetor on Turris MOX modules > allows only frequencies up to 6 GHz. So it is not impossible for > there to be boards where only 2 GHz is supported... Right. We're saying the same thing, basically. > 2) the mode is not supported by the board because the generic PHY > driver uses SMC calls to the firmware to change the SerDes mode, but > the board may have old version of firmware which does not support > SMC call correctly (or at all). This was a real issue with TF-A > firmware on Macchiatobin, where the 5gbase-r mode was not supported > in older versions Ok, so in your proposal, U-Boot would have to fix up the device tree based on a certain version of ATF, and remove "5gbase-r" from the phy-mode array. Whereas in my proposal, the mvpp2 driver would need to find out about the ATF version in use, and remove "5gbase-r" from the supported interfaces. As a user, I'd certainly prefer if Linux could figure this one out. > > It may make more sense, when the time comes for the first fixup, to put > > a cap on the maximum gross data rate attainable on that particular lane > > (including stuff such as USXGMII symbol repetition), instead of having > > to list out all the PHY modes that a driver might or might not support > > right now. Imagine what pain it will be to update device trees each time > > a driver gains software support for a SERDES protocol it didn't support > > before. But if you cap the lane frequency, you can make phylink_update_phy_modes() > > deduce what isn't acceptable using simple logic. > > So we would need to specify max data rate, usxgmii symbol repetition, No, not symbol repetition. Just data rate and number of lanes. > and number of lanes connected, and even then it could be not enough. > I think that simply specifying all the phy-modes that the HW supports > is simpler. > > The devicetree does not need and should not be updated each time > software gains support for another SerDes protocol. The devicetree > should be updated only once, to specify all SerDes protocols that the > hardware supports (the SOC/MAC, the PHY, and the board wiring). The > software then takes from this list only those modes that the drivers > support. So no need to update device-tree each time SW gains support > for new SerDes mode. This implies that when you bring up a board and write the device tree for it, you know that PHY mode X works without ever testing it. What if it doesn't work when you finally add support for it? Now you already have one DT blob in circulation. That's why I'm saying that maybe it could be better if we could think in terms that are a bit more physical and easy to characterize. > > Also, if there's something to be broken by this change, why would we put > > an of_machine_is_compatible() here, and why wouldn't we instead update > > the phylink_config.supported_interfaces directly in the driver? I think > > it's the driver's responsibility for passing a valid mask of supported > > interfaces. > > The whole idea of this code is to guarantee backward compatibility with > older device-trees. In my opinion (and I think Russell agrees with me), > this should be at one place, instead of putting various weird > exceptions into various MAC drivers. Yes, but they're more flexible in the driver... What if the check is not as simple as a machine compatible (think about the ATF firmware example you gave).
On Wed, Nov 24, 2021 at 12:54:18AM +0200, Vladimir Oltean wrote: > This implies that when you bring up a board and write the device tree > for it, you know that PHY mode X works without ever testing it. What if > it doesn't work when you finally add support for it? Now you already > have one DT blob in circulation. That's why I'm saying that maybe it > could be better if we could think in terms that are a bit more physical > and easy to characterize. However, it doesn't solve the problem. Let's take an example. The 3310 supports a mode where it runs in XAUI/5GBASE-R/2500BASE-X/SGMII depending on the negotiated media parameters. XAUI is four lanes of 3.125Gbaud. 5GBASE-R is one lane of 5.15625Gbaud. Let's say you're using this, and test the 10G speed using XAUI, intending the other speeds to work. So you put in DT that you support four lanes and up to 5.15625Gbaud. Later, you discover that 5GBASE-R doesn't work because there's an electrical issue with the board. You now have DT in circulation which doesn't match the capabilities of the hardware. How is this any different from the situation you describe above? To me, it seems to be exactly the same problem. So, I don't think one can get away from these kinds of issues - where you create a board, do insufficient testing, publish a DT description, and then through further testing discover you have to restrict the hardware capabilities in DT. In fact, this is sadly an entirely normal process - problems always get found after boards have been sent out and initial DT has been created. A good example is the 6th switch port on the original Clearfog boards. This was connected to an external PHY at address 0 on the MDIO bus behind the switch. However, the 88E6176 switch already has an internal PHY at address 0, so the external PHY can't be accessed. Consequently, port 6 is unreliable. That only came to light some time later, and resulted in the DT needing to be modified. There are always problems that need DT to be fixed - I don't think it's possible to get away from that by changing what or how something is described. In fact, I think trying to make that argument actually shows a misunderstanding of the process of hardware bringup. Just like software, hardware is buggy and takes time to be debugged, and that process continues after the first version of DT for a board has been produced, and there will always be changes required to DT. I'm not saying that describing the maximum frequency and lanes doesn't have merit, I'm merely taking issue with the basis of your argument.
On Wed, 24 Nov 2021 00:54:18 +0200 Vladimir Oltean <olteanv@gmail.com> wrote: > > As I said above, the best example is with SerDeses which use multiple > > lanes where not all may be wired. But this backward compatibility code > > concerns only one-lane SerDeses: usxgmii, 10gbaser-r, 5gbase-r, > > 2500base-x, 1000base-x, sgmii. > > Right, why so? From phy-mode = "xaui", you could also infer support for > single-lane SERDES protocols up to 3.125 GHz, aren't you interested in > doing that too? OK, this would seem to make sense if I dropped the DT binding change for now, and just updated the code to change the supported_interfaces member so that marvell10g driver could select appropriate mode. I will think about it. Thanks. > > 2) the mode is not supported by the board because the generic PHY > > driver uses SMC calls to the firmware to change the SerDes mode, but > > the board may have old version of firmware which does not support > > SMC call correctly (or at all). This was a real issue with TF-A > > firmware on Macchiatobin, where the 5gbase-r mode was not supported > > in older versions > > Ok, so in your proposal, U-Boot would have to fix up the device tree > based on a certain version of ATF, and remove "5gbase-r" from the > phy-mode array. Whereas in my proposal, the mvpp2 driver would need to > find out about the ATF version in use, and remove "5gbase-r" from the > supported interfaces. > > As a user, I'd certainly prefer if Linux could figure this one out. You are right here, it would really be better for the mvpp2 driver to do this discovering. > This implies that when you bring up a board and write the device tree > for it, you know that PHY mode X works without ever testing it. What if > it doesn't work when you finally add support for it? Now you already > have one DT blob in circulation. That's why I'm saying that maybe it > could be better if we could think in terms that are a bit more physical > and easy to characterize. The thing is that this same could happen with your proposal of max-data-rate + number-of-lanes, as Russell explained in his reply. > > The whole idea of this code is to guarantee backward compatibility with > > older device-trees. In my opinion (and I think Russell agrees with me), > > this should be at one place, instead of putting various weird > > exceptions into various MAC drivers. > > Yes, but they're more flexible in the driver... What if the check is not > as simple as a machine compatible (think about the ATF firmware example > you gave). You persuaded me, it makes more sense in the driver. Marek
On Wed, Nov 24, 2021 at 11:04:37AM +0000, Russell King (Oracle) wrote: > On Wed, Nov 24, 2021 at 12:54:18AM +0200, Vladimir Oltean wrote: > > This implies that when you bring up a board and write the device tree > > for it, you know that PHY mode X works without ever testing it. What if > > it doesn't work when you finally add support for it? Now you already > > have one DT blob in circulation. That's why I'm saying that maybe it > > could be better if we could think in terms that are a bit more physical > > and easy to characterize. > > However, it doesn't solve the problem. Let's take an example. > > The 3310 supports a mode where it runs in XAUI/5GBASE-R/2500BASE-X/SGMII > depending on the negotiated media parameters. > > XAUI is four lanes of 3.125Gbaud. > 5GBASE-R is one lane of 5.15625Gbaud. > > Let's say you're using this, and test the 10G speed using XAUI, > intending the other speeds to work. So you put in DT that you support > four lanes and up to 5.15625Gbaud. Yes, see, the blame's on you if you do that. You effectively declared that the lane is able of sustaining a data rate higher than you've actually had proof it does (5.156 vs 3.125). The reason why I'm making this suggestion is because I think it lends itself better to the way in which hardware manufacturers work. A hobbyist like me has no choice than to test the highest data rate when determining what frequency to declare in the DT (it's the same thing for spi-max-frequency and other limilar DT properties, really), but hardware people have simulations based on IBIS-AMI models, they can do SERDES self-tests using PRBS patterns, lots of stuff to characterize what frequency a lane is rated for, without actually speaking any Ethernet protocol on it. In fact there are lots of people who can do this stuff (which I know mostly nothing about) with precision without even knowing how to even type a simple "ls" inside a Linux shell. > Later, you discover that 5GBASE-R doesn't work because there's an > electrical issue with the board. You now have DT in circulation > which doesn't match the capabilities of the hardware. > > How is this any different from the situation you describe above? > To me, it seems to be exactly the same problem. To err is human, of course. But one thing I think we learned from the old implementation of phylink_validate is that it gets very tiring to keep adding PHY modes, and we always seem to miss some. When that array will be described in DT, it could be just a tad more painful to maintain. > So, I don't think one can get away from these kinds of issues - where > you create a board, do insufficient testing, publish a DT description, > and then through further testing discover you have to restrict the > hardware capabilities in DT. In fact, this is sadly an entirely normal > process - problems always get found after boards have been sent out > and initial DT has been created. > > A good example is the 6th switch port on the original Clearfog boards. > This was connected to an external PHY at address 0 on the MDIO bus > behind the switch. However, the 88E6176 switch already has an internal > PHY at address 0, so the external PHY can't be accessed. Consequently, > port 6 is unreliable. That only came to light some time later, and > resulted in the DT needing to be modified. Sure, but this is a bit unrelated. > There are always problems that need DT to be fixed - I don't think it's > possible to get away from that by changing what or how something is > described. In fact, I think trying to make that argument actually shows > a misunderstanding of the process of hardware bringup. Oh, how so? > Just like software, hardware is buggy and takes time to be debugged, > and that process continues after the first version of DT for a board > has been produced, and there will always be changes required to DT. > > I'm not saying that describing the maximum frequency and lanes doesn't > have merit, I'm merely taking issue with the basis of your argument. > > -- > RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ > FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
On Wed, 24 Nov 2021 14:04:41 +0200 Vladimir Oltean <olteanv@gmail.com> wrote: > On Wed, Nov 24, 2021 at 11:04:37AM +0000, Russell King (Oracle) wrote: > > On Wed, Nov 24, 2021 at 12:54:18AM +0200, Vladimir Oltean wrote: > > > This implies that when you bring up a board and write the device tree > > > for it, you know that PHY mode X works without ever testing it. What if > > > it doesn't work when you finally add support for it? Now you already > > > have one DT blob in circulation. That's why I'm saying that maybe it > > > could be better if we could think in terms that are a bit more physical > > > and easy to characterize. > > > > However, it doesn't solve the problem. Let's take an example. > > > > The 3310 supports a mode where it runs in XAUI/5GBASE-R/2500BASE-X/SGMII > > depending on the negotiated media parameters. > > > > XAUI is four lanes of 3.125Gbaud. > > 5GBASE-R is one lane of 5.15625Gbaud. > > > > Let's say you're using this, and test the 10G speed using XAUI, > > intending the other speeds to work. So you put in DT that you support > > four lanes and up to 5.15625Gbaud. > > Yes, see, the blame's on you if you do that.You effectively declared > that the lane is able of sustaining a data rate higher than you've > actually had proof it does (5.156 vs 3.125). But the blame is on the DT writer in the same way if they declare support for a PHY mode that wasn't tested. (Or at least self-tests with PRBS patterns at given frequency.) > The reason why I'm making this suggestion is because I think it lends > itself better to the way in which hardware manufacturers work. > A hobbyist like me has no choice than to test the highest data rate when > determining what frequency to declare in the DT (it's the same thing for > spi-max-frequency and other limilar DT properties, really), but hardware > people have simulations based on IBIS-AMI models, they can do SERDES > self-tests using PRBS patterns, lots of stuff to characterize what > frequency a lane is rated for, without actually speaking any Ethernet > protocol on it. In fact there are lots of people who can do this stuff > (which I know mostly nothing about) with precision without even knowing > how to even type a simple "ls" inside a Linux shell. > > > Later, you discover that 5GBASE-R doesn't work because there's an > > electrical issue with the board. You now have DT in circulation > > which doesn't match the capabilities of the hardware. > > > > How is this any different from the situation you describe above? > > To me, it seems to be exactly the same problem. > > To err is human, of course. But one thing I think we learned from the > old implementation of phylink_validate is that it gets very tiring to > keep adding PHY modes, and we always seem to miss some. When that array > will be described in DT, it could be just a tad more painful to maintain. The thing is that we will still need the `phy-mode` property, it can't be deprecated IMO. There are non-SerDes modes, like rgmii, which may use different pins than SerDes modes. There may theoretically also be a SoC or PHY where the lanes for XAUI do not share pins with the lane of 2500base-x, and this lane may not be wired. Tis true that I don't know of any such hardware and it probably does not and will not exist, but we don't know that for sure and this is a case where your proposal will fail and the phy-mode extension would work nicely. Maybe we need opinions from other people here. Marek
On Wed, Nov 24, 2021 at 01:17:03PM +0100, Marek Behún wrote: > On Wed, 24 Nov 2021 14:04:41 +0200 > Vladimir Oltean <olteanv@gmail.com> wrote: > > > On Wed, Nov 24, 2021 at 11:04:37AM +0000, Russell King (Oracle) wrote: > > > On Wed, Nov 24, 2021 at 12:54:18AM +0200, Vladimir Oltean wrote: > > > > This implies that when you bring up a board and write the device tree > > > > for it, you know that PHY mode X works without ever testing it. What if > > > > it doesn't work when you finally add support for it? Now you already > > > > have one DT blob in circulation. That's why I'm saying that maybe it > > > > could be better if we could think in terms that are a bit more physical > > > > and easy to characterize. > > > > > > However, it doesn't solve the problem. Let's take an example. > > > > > > The 3310 supports a mode where it runs in XAUI/5GBASE-R/2500BASE-X/SGMII > > > depending on the negotiated media parameters. > > > > > > XAUI is four lanes of 3.125Gbaud. > > > 5GBASE-R is one lane of 5.15625Gbaud. > > > > > > Let's say you're using this, and test the 10G speed using XAUI, > > > intending the other speeds to work. So you put in DT that you support > > > four lanes and up to 5.15625Gbaud. > > > > Yes, see, the blame's on you if you do that.You effectively declared > > that the lane is able of sustaining a data rate higher than you've > > actually had proof it does (5.156 vs 3.125). > > But the blame is on the DT writer in the same way if they declare > support for a PHY mode that wasn't tested. (Or at least self-tests with > PRBS patterns at given frequency.) I think we're running around in circles on this one. I think there's an overlap between the supported_interfaces and the phy-mode array in device tree. Going back to your example with the SMC calls unsupported by ATF, you may run into the surprise that you have a phy-mode in the device tree which you need to mask out in Linux, via supported_interfaces. You would have needed to mask it out anyway, even with my proposal, so you don't gain anything except the extra burden of spelling out 5 lines of a phy-mode array in the device tree. It's going to be a nightmare for review, it isn't obvious to say that "this phy-mode shouldn't be here" or "you're missing this phy-mode". > > The reason why I'm making this suggestion is because I think it lends > > itself better to the way in which hardware manufacturers work. > > A hobbyist like me has no choice than to test the highest data rate when > > determining what frequency to declare in the DT (it's the same thing for > > spi-max-frequency and other limilar DT properties, really), but hardware > > people have simulations based on IBIS-AMI models, they can do SERDES > > self-tests using PRBS patterns, lots of stuff to characterize what > > frequency a lane is rated for, without actually speaking any Ethernet > > protocol on it. In fact there are lots of people who can do this stuff > > (which I know mostly nothing about) with precision without even knowing > > how to even type a simple "ls" inside a Linux shell. > > > > > Later, you discover that 5GBASE-R doesn't work because there's an > > > electrical issue with the board. You now have DT in circulation > > > which doesn't match the capabilities of the hardware. > > > > > > How is this any different from the situation you describe above? > > > To me, it seems to be exactly the same problem. > > > > To err is human, of course. But one thing I think we learned from the > > old implementation of phylink_validate is that it gets very tiring to > > keep adding PHY modes, and we always seem to miss some. When that array > > will be described in DT, it could be just a tad more painful to maintain. > > The thing is that we will still need the `phy-mode` property, it can't > be deprecated IMO. Wait a minute, who said anything about deprecating it? I just said "let's not make it an array, in the actual device tree". The phy-mode was, and will remain, the initial MII-side protocol, which can or cannot be changed at runtime. > There are non-SerDes modes, like rgmii, which may use different pins > than SerDes modes. And, as discussed, there is no use case for dynamic switchover between RGMII and a SERDES lane. > There may theoretically also be a SoC or PHY where the lanes for XAUI > do not share pins with the lane of 2500base-x, and this lane may not be > wired. Tis true that I don't know of any such hardware and it probably > does not and will not exist, but we don't know that for sure and this is > a case where your proposal will fail and the phy-mode extension would > work nicely. We haven't gotten to discussing any of the details yet, but my proposal of course would have been to describe the number of SERDES lanes and max frequency in the COMPHY nodes, and have the COMPHY tell you what is supported. At least with Layerscape, that would be the model. So when you have pinmuxing, you just get the capabilities from the lane in use. > > Maybe we need opinions from other people here. Other opinions are welcome of course.
On Wed, 24 Nov 2021 14:31:35 +0200 Vladimir Oltean <olteanv@gmail.com> wrote: > > > To err is human, of course. But one thing I think we learned from the > > > old implementation of phylink_validate is that it gets very tiring to > > > keep adding PHY modes, and we always seem to miss some. When that array > > > will be described in DT, it could be just a tad more painful to maintain. > > > > The thing is that we will still need the `phy-mode` property, it can't > > be deprecated IMO. > > Wait a minute, who said anything about deprecating it? I just said > "let's not make it an array, in the actual device tree". The phy-mode > was, and will remain, the initial MII-side protocol, which can or cannot > be changed at runtime. Hello Vladimir, I was told multiple times that device-tree should not specify how the software should behave (given multiple HW options). This has not been always followed, but it is preferred. Now the 'phy-mode' property, if interpreted as "the initial MII-side protocol" would break this rule. This is therefore another reason why it should either be extended to an array, or, if we went with your proposal of 'num-lanes' + 'max-freq', deprecated (at least for serdes modes). But it can't be deprecated entirely, IMO (because of non serdes protocols). I thought more about your proposal of 'num-lanes' + 'max-freq' vs extending 'phy-mode'. - 'num-lanes' + 'max-freq' are IMO closer to the idea of device-tree, since they describe exactly how the parts of the device are connected to each other - otherwise I think your argument against extending 'phy-mode' because of people declaring support for modes that weren't properly tested and would later be found broken is invalid, since the same could happen for 'num-lanes' + 'max-freq' properties - the 'phy-mode' property already exists. I think if we went with the 'num-lanes' + 'max-freq' proposal, we would need to deprecate 'phy-mode' for serdes protocols (at least for situations where multiple modes can be used, since then 'phy-mode' would go against the idea of the rule I mentioned in first paragraph) Vladimir, Rob has now given R-B for the 'phy-mode' extension patch. I am wondering now what to do, since other people haven't given their opinions here. Whether to re-send the series, and maybe start discussing there, or waiting more. Marek
On Thu, Dec 02, 2021 at 11:25:50PM +0100, Marek Behún wrote: > On Wed, 24 Nov 2021 14:31:35 +0200 > Vladimir Oltean <olteanv@gmail.com> wrote: > > > > > To err is human, of course. But one thing I think we learned from the > > > > old implementation of phylink_validate is that it gets very tiring to > > > > keep adding PHY modes, and we always seem to miss some. When that array > > > > will be described in DT, it could be just a tad more painful to maintain. > > > > > > The thing is that we will still need the `phy-mode` property, it can't > > > be deprecated IMO. > > > > Wait a minute, who said anything about deprecating it? I just said > > "let's not make it an array, in the actual device tree". The phy-mode > > was, and will remain, the initial MII-side protocol, which can or cannot > > be changed at runtime. > > Hello Vladimir, > > I was told multiple times that device-tree should not specify how the > software should behave (given multiple HW options). This has not been > always followed, but it is preferred. > > Now the 'phy-mode' property, if interpreted as "the initial MII-side > protocol" would break this rule. Isn't the rule already broken? Like it or not, this is exactly what the phy-mode property is today. Even the compatibility mode added by your patch right here _depends_ on that exact interpretation. > This is therefore another reason why it should either be extended to an > array, The reason being what, exactly? To me it is still non sequitur. I think you are saying that by listing all supported PHY modes, it becomes less of a "hardware configuration" and more of a "hardware description". But my point is that "all PHY modes" is a pretty moving target, and it shouldn't be put in the device tree because that should be rather stable. > or, if we went with your proposal of 'num-lanes' + 'max-freq', > deprecated (at least for serdes modes). But it can't be deprecated > entirely, IMO (because of non serdes protocols). You haven't exactly formulated why you think the implication is that phy-mode could be deprecated, if kept "not an array". Because it can't. If I understand you correctly, you are basically implying that in an ideal world where all MAC drivers and all PHY drivers could list out all their capabilities, a given MAC + PHY pair would just work out the highest common SERDES protocol that can be electrically supported. But as long as the Generic PHY driver is going to be a thing, this will be technically impossible, because there isn't any IEEE standardized way of determining SERDES protocols. So no, the phy-mode will continue to have its role of specifying the initial MII-side protocol, which can or cannot be changed at runtime. Also, as a second point, my guess is that there will always be SERDES blocks where changing the protocol at runtime is so complicated, or has so many restrictions, that "wont't change" is indistinguishable from "cant't change". And the network drivers that use these SERDES blocks will therefore use the device tree to specify the phy-mode that the SERDES lane is preconfigured for, which is not the same as describing the capability, technically speaking, but is nonetheless virtually indistinguishable. So I don't see that the phy-mode should become deprecated either way. > > I thought more about your proposal of 'num-lanes' + 'max-freq' vs > extending 'phy-mode'. > > - 'num-lanes' + 'max-freq' are IMO closer to the idea of device-tree, > since they describe exactly how the parts of the device are connected > to each other > - otherwise I think your argument against extending 'phy-mode' because > of people declaring support for modes that weren't properly tested and > would later be found broken is invalid, since the same could happen > for 'num-lanes' + 'max-freq' properties To reiterate my points: - The worst cases of the phy-mode array vs the physical properties of the lanes are about just as bad, if you're set out to misuse them. I.e. you can first declare that a lane is capable of 1.125 GHz, then "oh, you know what, now I tested 3.125 GHz and that works too", then "as a matter of fact, 10.3125 GHz works too", then "hey, there are actually 4 lanes, not just one!". I'm sorry, nobody can help you if you do that. - The best cases seem to be a bit more disproportionate on the other hand (i.e. when things are done "right" by device tree authors): With the phy-mode array, if you don't want to declare things that haven't been tested, it gets clunky: you'd have an ever increasing array of gibberish, and multiple versions of it in flight. On the other hand, doing the 'num-lanes' + 'max-freq' thing right would IMO mean that you validate the electrical parameters of the SERDES lane at the maximum intended and testable data rate and width. I truly believe this can be done orthogonally to actually testing an Ethernet protocol, which is why I'm actually suggesting it. This description can't bit rot no matter what you do, if it was done correctly (see above). It also appears easier to look at physical properties and say "yes, this looks about right", than looking at a potentially large phy-mode array where some modes are oddly missing and you don't really know why. - If you discover that some phy-modes can't be supported despite the electrical parameters allowing it (your SMC firmware example), it becomes "not a device tree problem" if you don't put the phy-mode array there in the first place. So my proposal simply comes from trying to optimize for the "doing things right" case (hopefully this would also be the common case). If we look at ways in which people will find ways to screw things up, I'm sure that the sky is the limit. > - the 'phy-mode' property already exists. I think if we went with the > 'num-lanes' + 'max-freq' proposal, we would need to deprecate > 'phy-mode' for serdes protocols (at least for situations where > multiple modes can be used, since then 'phy-mode' would go against > the idea of the rule I mentioned in first paragraph) I don't understand this argument, I'm sorry. I only suggested that you don't introduce a phy-mode array until there is a proven need for it. You've said that your current use cases are fine with just the "compatibility" mode where you extend the single phy-mode from the device tree based on supported_interfaces. I don't think you _have_ to introduce my suggested electrical parameters right now, either. It is just an alternative solution to an unspecified problem. But one other aspect that maybe should be considered is the fact that your compatibility mode would introduce some weirdness: if the phy-mode array in the device tree has exactly one element, then you say "oh, this is an old device tree, let me fix this up based on supported_interfaces and potentially change towards protocols that aren't present in the device tree". But if the phy-mode array has more than one element, you don't allow changing towards protocols outside of that array. That is strange. I think _you_ are changing the meaning of the phy-mode property. If you restrict the array of phy-modes via some other mechanism (like dynamically through code that populates supported_interfaces based on stuff like SMC firmware queries, lane electrical parameters), then transforming the phy-mode property into an array has literally no benefits, and there won't be any backwards compatibility problem to handle. > Vladimir, Rob has now given R-B for the 'phy-mode' extension patch. > > I am wondering now what to do, since other people haven't given their > opinions here. Whether to re-send the series, and maybe start discussing > there, or waiting more. Go with the approach you feel the most comfortable justifying, I have no stake here. In fact I would be glad too if other people shared their opinion on how to move things forward.
diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c index 3603c024109a..d2300a3a60ec 100644 --- a/drivers/net/phy/phylink.c +++ b/drivers/net/phy/phylink.c @@ -564,6 +564,74 @@ static int phylink_parse_fixedlink(struct phylink *pl, return 0; } +static void phylink_update_phy_modes(struct phylink *pl, + struct fwnode_handle *fwnode) +{ + unsigned long *supported = pl->config->supported_interfaces; + DECLARE_PHY_INTERFACE_MASK(modes); + + /* FIXME: If supported is empty, leave it as it is. This happens for + * unconverted drivers that don't fill up supported_interfaces. Once all + * such drivers are converted, we can drop this. + */ + if (phy_interface_empty(supported)) + return; + + if (fwnode_get_phy_modes(fwnode, modes) < 0) + return; + + /* If no modes are defined in fwnode, interpret it as all modes + * supported by the MAC are supported by the board. + */ + if (phy_interface_empty(modes)) + return; + + /* We want the intersection of given supported modes with those defined + * in DT. + * + * Some older device-trees mention only one of `sgmii`, `1000base-x` or + * `2500base-x`, while supporting all three. Other mention `10gbase-r` + * or `usxgmii`, while supporting both, and also `sgmii`, `1000base-x`, + * `2500base-x` and `5gbase-r`. + * For backwards compatibility with these older DTs, make it so that if + * one of these modes is mentioned in DT and MAC supports more of them, + * keep all that are supported according to the logic above. + * + * Nonetheless it is possible that a device may support only one mode, + * for example 1000base-x, due to strapping pins or some other reasons. + * If a specific device supports only the mode mentioned in DT, the + * exception should be made here with of_machine_is_compatible(). + */ + if (bitmap_weight(modes, PHY_INTERFACE_MODE_MAX) == 1) { + DECLARE_PHY_INTERFACE_MASK(mask); + bool lower = false; + + if (test_bit(PHY_INTERFACE_MODE_10GBASER, modes) || + test_bit(PHY_INTERFACE_MODE_USXGMII, modes)) { + phy_interface_zero(mask); + __set_bit(PHY_INTERFACE_MODE_5GBASER, mask); + __set_bit(PHY_INTERFACE_MODE_10GBASER, mask); + __set_bit(PHY_INTERFACE_MODE_USXGMII, mask); + phy_interface_and(mask, supported, mask); + phy_interface_or(modes, modes, mask); + lower = true; + } + + if (lower || (test_bit(PHY_INTERFACE_MODE_SGMII, modes) || + test_bit(PHY_INTERFACE_MODE_1000BASEX, modes) || + test_bit(PHY_INTERFACE_MODE_2500BASEX, modes))) { + phy_interface_zero(mask); + __set_bit(PHY_INTERFACE_MODE_SGMII, mask); + __set_bit(PHY_INTERFACE_MODE_1000BASEX, mask); + __set_bit(PHY_INTERFACE_MODE_2500BASEX, mask); + phy_interface_and(mask, supported, mask); + phy_interface_or(modes, modes, mask); + } + } + + phy_interface_and(supported, supported, modes); +} + static int phylink_parse_mode(struct phylink *pl, struct fwnode_handle *fwnode) { struct fwnode_handle *dn; @@ -1157,6 +1225,8 @@ struct phylink *phylink_create(struct phylink_config *config, __set_bit(PHYLINK_DISABLE_STOPPED, &pl->phylink_disable_state); timer_setup(&pl->link_poll, phylink_fixed_poll, 0); + phylink_update_phy_modes(pl, fwnode); + bitmap_fill(pl->supported, __ETHTOOL_LINK_MODE_MASK_NBITS); linkmode_copy(pl->link_config.advertising, pl->supported); phylink_validate(pl, pl->supported, &pl->link_config);
Now that the 'phy-mode' property can be a string array containing more PHY modes (all that are supported by the board), update the bitmap of interfaces supported by the MAC with this property. Normally this would be a simple intersection (of interfaces supported by the current implementation of the driver and interfaces supported by the board), but we need to keep being backwards compatible with older DTs, which may only define one mode, since, as Russell King says, conventionally phy-mode has meant "this is the mode we want to operate the PHY interface in" which was fine when PHYs didn't change their mode depending on the media speed An example is DT defining phy-mode = "sgmii"; but the board supporting also 1000base-x and 2500base-x. Add the following logic to keep this backwards compatiblity: - if more PHY modes are defined, do a simple intersection - if one PHY mode is defined: - if it is sgmii, 1000base-x or 2500base-x, add all three and then do the intersection - if it is 10gbase-r or usxgmii, add both, and also 5gbase-r, 2500base-x, 1000base-x and sgmii, and then do the intersection This is simple enough and should work for all boards. Nonetheless it is possible (although extremely unlikely, in my opinion) that a board will be found that (for example) defines phy-mode = "sgmii"; and the MAC drivers supports sgmii, 1000base-x and 2500base-x, but the board DOESN'T support 2500base-x, because of electrical reasons (since the frequency is 2.5x of sgmii). Our code will in this case incorrectly infer also support for 2500base-x. To avoid this, the board maintainer should either change DTS to phy-mode = "sgmii", "1000base-x"; and update device tree on all boards, or, if that is impossible, add a fix into the function we are introducing in this commit. Another example would be a board with device-tree defining phy-mode = "10gbase-r"; We infer from this all other modes (sgmii, 1000base-x, 2500base-x, 5gbase-r, usxgmii), and these then get filtered by those supported by the driver. But it is possible that a driver supports all of these modes, and yet not all are supported because the board has an older version of the TF-A firmware, which implements changing of PHY modes via SMC calls. For this case, the board maintainer should either provide all supported modes in the phy-mode property, or add a fix into this function that somehow checks for this situation. But this is a really unprobable scenario, in my opinion. Signed-off-by: Marek Behún <kabel@kernel.org> --- Changes since v1: - added 10gbase-r example scenario to commit message - changed phylink_update_phy_modes() so that if supported_interfaces is empty (an unconverted driver that doesn't fill up this member), we leave it empty - rewritten phylink_update_phy_modes() according to Sean Anderson's comment: use phy_interface_and/or() instead of several if (test_bit) set_bit --- drivers/net/phy/phylink.c | 70 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 70 insertions(+)