Message ID | Z0XEWGqLJ8okNSIr@shell.armlinux.org.uk (mailing list archive) |
---|---|
Headers | show |
Series | net: phylink managed EEE support | expand |
On Tue, Nov 26, 2024 at 12:51:36PM +0000, Russell King (Oracle) wrote: > Patch 11 adds phylink managed EEE support. Two new MAC APIs are added, > to enable and disable LPI. The enable method is passed the LPI timer > setting which it is expected to program into the hardware, and also a > flag ehther the transmit clock should be stopped. > > *** There are open questions here. Eagle eyed reviewers will notice > pl->config->lpi_interfaces. There are MACs out there which only > support LPI signalling on a subset of their interface types. Phylib > doesn't understand this. I'm handling this at the moment by simply > not activating LPI at the MAC, but that leads to ethtool --show-eee > suggesting that EEE is active when it isn't. > *** Should we pass the phy_interface_t to these functions? > *** Should mac_enable_tx_lpi() be allowed to fail if the MAC doesn't > support the interface mode? There is another point to raise here - should we have a "validate_eee" method in struct phylink_mac_ops so that MAC drivers can validate settings such as the tx_lpi_timer value can be programmed into the hardware? We do have the situation on Marvell platforms where the programmed value depends on the MAC speed, and is only 8 bit, which makes validating its value rather difficult - at 1G speeds, it's a resolution of 1us so we can support up to 255us. At 100M speeds, it's 10us, supporting up to 2.55ms. This makes it awkward to be able to validate the set_eee() settings are sane for the hardware. Should Marvell platforms instead implement a hrtimer above this? That sounds a bit problematical to manage sanely.
Hi Russell, On Tue, Nov 26, 2024 at 01:01:11PM +0000, Russell King (Oracle) wrote: > On Tue, Nov 26, 2024 at 12:51:36PM +0000, Russell King (Oracle) wrote: > > Patch 11 adds phylink managed EEE support. Two new MAC APIs are added, > > to enable and disable LPI. The enable method is passed the LPI timer > > setting which it is expected to program into the hardware, and also a > > flag ehther the transmit clock should be stopped. > > > > *** There are open questions here. Eagle eyed reviewers will notice > > pl->config->lpi_interfaces. There are MACs out there which only > > support LPI signalling on a subset of their interface types. Phylib > > doesn't understand this. I'm handling this at the moment by simply > > not activating LPI at the MAC, but that leads to ethtool --show-eee > > suggesting that EEE is active when it isn't. > > *** Should we pass the phy_interface_t to these functions? > > *** Should mac_enable_tx_lpi() be allowed to fail if the MAC doesn't > > support the interface mode? > > There is another point to raise here - should we have a "validate_eee" > method in struct phylink_mac_ops so that MAC drivers can validate > settings such as the tx_lpi_timer value can be programmed into the > hardware? > > We do have the situation on Marvell platforms where the programmed > value depends on the MAC speed, and is only 8 bit, which makes > validating its value rather difficult - at 1G speeds, it's a > resolution of 1us so we can support up to 255us. At 100M speeds, > it's 10us, supporting up to 2.55ms. This makes it awkward to be able > to validate the set_eee() settings are sane for the hardware. Should > Marvell platforms instead implement a hrtimer above this? That sounds > a bit problematical to manage sanely. I agree that tx_lpi_timer can be a problem, and this is not just a Marvell issue. For example, I think the FEC MAC on i.MX8MP might also be affected. But I can't confirm this because I don't have an i.MX8MP board with a PHY that supports MAC-controlled EEE mode. The Realtek PHY I have uses PHY-controlled EEE (SmartEEE). Except for this, I think there should be sane default values for tx_lpi_timer. The IEEE 802.3-2022 standard (Section 78.2) describes EEE timing, but it doesn’t give a clear recommendation for tx_lpi_timer. IMO, the best value for tx_lpi_timer should be the sum of the time needed to put the full chain (MAC -> PHY -> remote PHY) into sleep mode and the time needed to wake the chain. These times are link-speed specific, so defaults should consider PHY timings for each link speed. Except for tx_lpi_timer, some MACs also allow configuration of the Twake timer. For example, the FEC driver uses the lpi_timer value to configure the Twake timer, and the LAN78xx driver also provides a configurable Twake timer register. If the Twake timer is not configured properly, or if the system has quirks causing the actual Twake time to be longer than expected, it can result in frame corruption. Regards, Oleksij
On Tue, Nov 26, 2024 at 03:21:26PM +0100, Oleksij Rempel wrote: > Hi Russell, > > On Tue, Nov 26, 2024 at 01:01:11PM +0000, Russell King (Oracle) wrote: > > On Tue, Nov 26, 2024 at 12:51:36PM +0000, Russell King (Oracle) wrote: > > > Patch 11 adds phylink managed EEE support. Two new MAC APIs are added, > > > to enable and disable LPI. The enable method is passed the LPI timer > > > setting which it is expected to program into the hardware, and also a > > > flag ehther the transmit clock should be stopped. > > > > > > *** There are open questions here. Eagle eyed reviewers will notice > > > pl->config->lpi_interfaces. There are MACs out there which only > > > support LPI signalling on a subset of their interface types. Phylib > > > doesn't understand this. I'm handling this at the moment by simply > > > not activating LPI at the MAC, but that leads to ethtool --show-eee > > > suggesting that EEE is active when it isn't. > > > *** Should we pass the phy_interface_t to these functions? > > > *** Should mac_enable_tx_lpi() be allowed to fail if the MAC doesn't > > > support the interface mode? > > > > There is another point to raise here - should we have a "validate_eee" > > method in struct phylink_mac_ops so that MAC drivers can validate > > settings such as the tx_lpi_timer value can be programmed into the > > hardware? > > > > We do have the situation on Marvell platforms where the programmed > > value depends on the MAC speed, and is only 8 bit, which makes > > validating its value rather difficult - at 1G speeds, it's a > > resolution of 1us so we can support up to 255us. At 100M speeds, > > it's 10us, supporting up to 2.55ms. This makes it awkward to be able > > to validate the set_eee() settings are sane for the hardware. Should > > Marvell platforms instead implement a hrtimer above this? That sounds > > a bit problematical to manage sanely. > > I agree that tx_lpi_timer can be a problem, and this is not just a > Marvell issue. For example, I think the FEC MAC on i.MX8MP might also > be affected. But I can't confirm this because I don't have an i.MX8MP > board with a PHY that supports MAC-controlled EEE mode. The Realtek PHY > I have uses PHY-controlled EEE (SmartEEE). > > Except for this, I think there should be sane default values for > tx_lpi_timer. The IEEE 802.3-2022 standard (Section 78.2) describes EEE > timing, but it doesn’t give a clear recommendation for tx_lpi_timer. There are of course some parameters of EEE that should be fixed, and IEEE specifies them in that section. These are Ts, Tq and Tr, and IEEE gives a range of values for these which need to be conformed with in a compliant implementation. Please don't get confused by the mvneta/mvpp2 implementation, there are parameters for Ts and Tw, but the Ts value is not the same as Ts in IEEE. IEEE defines it as the period of time between the PHY transmitting sleep and turning all transmitters off. Marvell define it as the minimum time from the Tx FIFO being empty to asserting LPI - quite different! Other parameters depend on the implementation, such as the propagation delay of data through the PHY. These, we don't currently take account of. I don't recall off-hand whether there's any standards defined registers describing these parameters in the PHY (I need to delve into 802.3...) That's all needed for computing Tw. > IMO, the best value for tx_lpi_timer should be the sum of the time > needed to put the full chain (MAC -> PHY -> remote PHY) into sleep mode > and the time needed to wake the chain. These times are link-speed > specific, so defaults should consider PHY timings for each link speed. One can only make a set of defaults that depend on the speed if we also give the user the ability to set the tx_lpi_timer values on a per-speed basis, otherwise how do we update the values when set_eee() gets called? Also how do we know what the requirements of the remote PHY are? I think the only way that could be known is by exchanging the EEE TLV with the link partner as described in section 78. > Except for tx_lpi_timer, some MACs also allow configuration of the Twake > timer. For example, the FEC driver uses the lpi_timer value to > configure the Twake timer, and the LAN78xx driver also provides a > configurable Twake timer register. > > If the Twake timer is not configured properly, or if the system has > quirks causing the actual Twake time to be longer than expected, it can > result in frame corruption. I have been aware of it, and to me it sounds like another can of worms that right now I'd rather not open until we have solved the basics of EEE and got MAC drivers into better shape for the basics. I had been wondering whether we would eventually need phylink to pass Tw along with the LPI timer, and I think eventually we would need to - and we also need some infrastructure for the EEE TLV, both sending it at the appropriate time, and processing it when received. I don't think we have any of that at the moment, so it would be another chunk of development.
On Tue, Nov 26, 2024 at 12:51:36PM +0000, Russell King (Oracle) wrote: > In doing this, I came across the fact that the addition of phylib > managed EEE support has actually broken a huge number of drivers - > phylib will now overwrite all members of struct ethtool_keee whether > the netdev driver wants it or not. This leads to weird scenarios where > doing a get_eee() op followed by a set_eee() op results in e.g. > tx_lpi_timer being zeroed, because the MAC driver doesn't know it needs > to initialise phylib's phydev->eee_cfg.tx_lpi_timer member. This mess > really needs urgently addressing, and I believe it came about because > Andrew's patches were only partly merged via another party - I guess > highlighting the inherent danger of "thou shalt limit your patch series > to no more than 15 patches" when one has a subsystem who's in-kernel > API is changing. Note that, I think, fec_main.c isn't broken, although a quick review only looking at fec_enet_get_eee() may suggest otherwise: static int fec_enet_get_eee(struct net_device *ndev, struct ethtool_keee *edata) { struct fec_enet_private *fep = netdev_priv(ndev); struct ethtool_keee *p = &fep->eee; if (!(fep->quirks & FEC_QUIRK_HAS_EEE)) return -EOPNOTSUPP; if (!netif_running(ndev)) return -ENETDOWN; edata->tx_lpi_timer = p->tx_lpi_timer; // <=========================== return phy_ethtool_get_eee(ndev->phydev, edata); } static int fec_enet_set_eee(struct net_device *ndev, struct ethtool_keee *edata) { ... p->tx_lpi_timer = edata->tx_lpi_timer; Since the driver does not touch phydev->eee_cfg.tx_lpi_timer, phy_ethtool_get_eee() above will overwrite edata->tx_lpi_timer with zero. If ethtool does a read-modify-write on the EEE settings, then fec_enet_set_eee() will be passed a value of zero for edata->tx_lpi_timer. This will result in FEC_LPI_SLEEP and FEC_LPI_WAKE being written with zero, and from what I can see in fec_enet_eee_mode_set(), that disables EEE. The saving grace for this driver is that p->tx_lpi_timer also starts off as zero. A better implementation would be to get rid of p->tx_lpi_timer entirely, and instead rely on phydev->eee_cfg.tx_lpi_timer to be managed by phylib, obtaining its value from there in fec_enet_eee_mode_set(). At least the driver doesn't attempt to maintain any other EEE state! So something like this: diff --git a/drivers/net/ethernet/freescale/fec.h b/drivers/net/ethernet/freescale/fec.h index 1cca0425d493..c81f2ea588f2 100644 --- a/drivers/net/ethernet/freescale/fec.h +++ b/drivers/net/ethernet/freescale/fec.h @@ -671,8 +671,6 @@ struct fec_enet_private { unsigned int tx_time_itr; unsigned int itr_clk_rate; - /* tx lpi eee mode */ - struct ethtool_keee eee; unsigned int clk_ref_rate; /* ptp clock period in ns*/ diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c index 1b55047c0237..25c842835d52 100644 --- a/drivers/net/ethernet/freescale/fec_main.c +++ b/drivers/net/ethernet/freescale/fec_main.c @@ -2045,14 +2045,14 @@ static int fec_enet_us_to_tx_cycle(struct net_device *ndev, int us) return us * (fep->clk_ref_rate / 1000) / 1000; } -static int fec_enet_eee_mode_set(struct net_device *ndev, bool enable) +static int fec_enet_eee_mode_set(struct net_device *ndev, u32 lpi_timer, + bool enable) { struct fec_enet_private *fep = netdev_priv(ndev); - struct ethtool_keee *p = &fep->eee; unsigned int sleep_cycle, wake_cycle; if (enable) { - sleep_cycle = fec_enet_us_to_tx_cycle(ndev, p->tx_lpi_timer); + sleep_cycle = fec_enet_us_to_tx_cycle(lpi_timer); wake_cycle = sleep_cycle; } else { sleep_cycle = 0; @@ -2105,7 +2105,9 @@ static void fec_enet_adjust_link(struct net_device *ndev) napi_enable(&fep->napi); } if (fep->quirks & FEC_QUIRK_HAS_EEE) - fec_enet_eee_mode_set(ndev, phy_dev->enable_tx_lpi); + fec_enet_eee_mode_set(ndev, + phy_dev->eee_cfg.tx_lpi_timer, + phy_dev->enable_tx_lpi); } else { if (fep->link) { netif_stop_queue(ndev); @@ -3181,7 +3183,6 @@ static int fec_enet_get_eee(struct net_device *ndev, struct ethtool_keee *edata) { struct fec_enet_private *fep = netdev_priv(ndev); - struct ethtool_keee *p = &fep->eee; if (!(fep->quirks & FEC_QUIRK_HAS_EEE)) return -EOPNOTSUPP; @@ -3189,8 +3190,6 @@ fec_enet_get_eee(struct net_device *ndev, struct ethtool_keee *edata) if (!netif_running(ndev)) return -ENETDOWN; - edata->tx_lpi_timer = p->tx_lpi_timer; - return phy_ethtool_get_eee(ndev->phydev, edata); } @@ -3198,7 +3197,6 @@ static int fec_enet_set_eee(struct net_device *ndev, struct ethtool_keee *edata) { struct fec_enet_private *fep = netdev_priv(ndev); - struct ethtool_keee *p = &fep->eee; if (!(fep->quirks & FEC_QUIRK_HAS_EEE)) return -EOPNOTSUPP; @@ -3206,8 +3204,6 @@ fec_enet_set_eee(struct net_device *ndev, struct ethtool_keee *edata) if (!netif_running(ndev)) return -ENETDOWN; - p->tx_lpi_timer = edata->tx_lpi_timer; - return phy_ethtool_set_eee(ndev->phydev, edata); } Another patch to be added to my stack...
On Tue, Nov 26, 2024 at 12:51:36PM +0000, Russell King (Oracle) wrote: > In doing this, I came across the fact that the addition of phylib > managed EEE support has actually broken a huge number of drivers - > phylib will now overwrite all members of struct ethtool_keee whether > the netdev driver wants it or not. This leads to weird scenarios where > doing a get_eee() op followed by a set_eee() op results in e.g. > tx_lpi_timer being zeroed, because the MAC driver doesn't know it needs > to initialise phylib's phydev->eee_cfg.tx_lpi_timer member. This mess > really needs urgently addressing, and I believe it came about because > Andrew's patches were only partly merged via another party - I guess > highlighting the inherent danger of "thou shalt limit your patch series > to no more than 15 patches" when one has a subsystem who's in-kernel > API is changing. Looking at the rtl8169 driver, it looks pretty similar to the Marvell situation. The value stored in tp->tx_lpi_timer is apparently, according to r8169_get_tx_lpi_timer_us(), a value in bytes, not in a unit of time. So it's dependent on the negotiated speed, and thus we can't read it to set the initial phydev->eee_cfg.tx_lpi_timer state, because in the _probe() function, the PHY may not have negotiated a speed. However, this driver writes keee->tx_lpi_timer after phy_ethtool_get_eee() which means that it overrides phylib, so hasn't been broken. Therefore, I think rtl8169 is fine.
On Tue, Nov 26, 2024 at 12:51:36PM +0000, Russell King (Oracle) wrote: > In doing this, I came across the fact that the addition of phylib > managed EEE support has actually broken a huge number of drivers - > phylib will now overwrite all members of struct ethtool_keee whether > the netdev driver wants it or not. This leads to weird scenarios where > doing a get_eee() op followed by a set_eee() op results in e.g. > tx_lpi_timer being zeroed, because the MAC driver doesn't know it needs > to initialise phylib's phydev->eee_cfg.tx_lpi_timer member. This mess > really needs urgently addressing, and I believe it came about because > Andrew's patches were only partly merged via another party - I guess > highlighting the inherent danger of "thou shalt limit your patch series > to no more than 15 patches" when one has a subsystem who's in-kernel > API is changing. Looking at the two TI offerings that call phy_ethtool_get_eee(), both of them call the phylib functions from their ethtool ops, but it looks like the driver does diddly squat with LPI state, which makes me wonder why they implemented the calls to phy_ethtool_get_eee() and phy_ethtool_set_eee(), since EEE will not be functional unless the PHY has been configured with a SmartEEE mode outside the kernel.
For archive: static void sxgbe_set_eee_timer(void __iomem *ioaddr, const int ls, const int tw) { int value = ((tw & 0xffff)) | ((ls & 0x7ff) << 16); /* Program the timers in the LPI timer control register: * LS: minimum time (ms) for which the link * status from PHY should be ok before transmitting * the LPI pattern. * TW: minimum time (us) for which the core waits * after it has stopped transmitting the LPI pattern. */ writel(value, ioaddr + SXGBE_CORE_LPI_TIMER_CTRL); } bool sxgbe_eee_init(struct sxgbe_priv_data * const priv) { .... /* MAC core supports the EEE feature. */ if (priv->hw_cap.eee) { /* Check if the PHY supports EEE */ if (phy_init_eee(ndev->phydev, true)) return false; timer_setup(&priv->eee_ctrl_timer, sxgbe_eee_ctrl_timer, 0); priv->eee_ctrl_timer.expires = SXGBE_LPI_TIMER(eee_timer); add_timer(&priv->eee_ctrl_timer); priv->hw->mac->set_eee_timer(priv->ioaddr, SXGBE_DEFAULT_LPI_TIMER, priv->tx_lpi_timer); ^^^^^^^^^^^^ LPI timer is used for Twake timer. } In case user configures lpi_timer value to too low, it will case frame corruption instead of expected performance drop.
On 27.11.2024 11:49, Russell King (Oracle) wrote: > On Tue, Nov 26, 2024 at 12:51:36PM +0000, Russell King (Oracle) wrote: >> In doing this, I came across the fact that the addition of phylib >> managed EEE support has actually broken a huge number of drivers - >> phylib will now overwrite all members of struct ethtool_keee whether >> the netdev driver wants it or not. This leads to weird scenarios where >> doing a get_eee() op followed by a set_eee() op results in e.g. >> tx_lpi_timer being zeroed, because the MAC driver doesn't know it needs >> to initialise phylib's phydev->eee_cfg.tx_lpi_timer member. This mess >> really needs urgently addressing, and I believe it came about because >> Andrew's patches were only partly merged via another party - I guess >> highlighting the inherent danger of "thou shalt limit your patch series >> to no more than 15 patches" when one has a subsystem who's in-kernel >> API is changing. > > Looking at the rtl8169 driver, it looks pretty similar to the Marvell > situation. The value stored in tp->tx_lpi_timer is apparently, > according to r8169_get_tx_lpi_timer_us(), a value in bytes, not in a > unit of time. So it's dependent on the negotiated speed, and thus we > can't read it to set the initial phydev->eee_cfg.tx_lpi_timer state, > because in the _probe() function, the PHY may not have negotiated a > speed. > Right, hw stores the tx_lpi_timer in bytes. Driver's default value is one frame plus a few bytes. It doesn't use phydev->eee_cfg.tx_lpi_timer. set_eee() op isn't implemented for tx_lpi_timer, because no one ever asked for it and I'm not aware of any good use case. > However, this driver writes keee->tx_lpi_timer after > phy_ethtool_get_eee() which means that it overrides phylib, so hasn't > been broken. > > Therefore, I think rtl8169 is fine. >
On Wed, Nov 27, 2024 at 11:20:08AM +0000, Russell King (Oracle) wrote: > On Tue, Nov 26, 2024 at 12:51:36PM +0000, Russell King (Oracle) wrote: > > In doing this, I came across the fact that the addition of phylib > > managed EEE support has actually broken a huge number of drivers - > > phylib will now overwrite all members of struct ethtool_keee whether > > the netdev driver wants it or not. This leads to weird scenarios where > > doing a get_eee() op followed by a set_eee() op results in e.g. > > tx_lpi_timer being zeroed, because the MAC driver doesn't know it needs > > to initialise phylib's phydev->eee_cfg.tx_lpi_timer member. This mess > > really needs urgently addressing, and I believe it came about because > > Andrew's patches were only partly merged via another party - I guess > > highlighting the inherent danger of "thou shalt limit your patch series > > to no more than 15 patches" when one has a subsystem who's in-kernel > > API is changing. > > Looking at the two TI offerings that call phy_ethtool_get_eee(), both > of them call the phylib functions from their ethtool ops, but it looks > like the driver does diddly squat with LPI state, which makes me wonder > why they implemented the calls to phy_ethtool_get_eee() and > phy_ethtool_set_eee(), since EEE will not be functional unless the PHY > has been configured with a SmartEEE mode outside the kernel. Probably because they did not know what they were doing, and it got past reviewers. Well, actually: commit a090994980a15f8cc14fc188b5929bd61d2ae9c3 Author: Yegor Yefremov <yegorslists@googlemail.com> Date: Mon Nov 28 09:41:33 2016 +0100 cpsw: ethtool: add support for getting/setting EEE registers Add the ability to query and set Energy Efficient Ethernet parameters via ethtool for applicable devices. This patch doesn't activate full EEE support in cpsw driver, but it enables reading and writing EEE advertising settings. This way one can disable advertising EEE for certain speeds. Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com> Acked-by: Rami Rosen <roszenrami@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Seems like somebody had an issue and did the minimum to work around the issue. This also suggests the hardware is doing EEE by default, hopefully with some sort of sensible hardware defaults. Andrew