mbox series

[RFC,net-next,00/23] net: phylink managed EEE support

Message ID Z0XEWGqLJ8okNSIr@shell.armlinux.org.uk (mailing list archive)
Headers show
Series net: phylink managed EEE support | expand

Message

Russell King (Oracle) Nov. 26, 2024, 12:51 p.m. UTC
Hi,

Adding managed EEE support to phylink has been on the cards ever since
the idea in phylib was mooted. This overly large series attempts to do
so. I've included all the patches as it's important to get the driver
patches out there.

In doing this, I came across the fact that the addition of phylib
managed EEE support has actually broken a huge number of drivers -
phylib will now overwrite all members of struct ethtool_keee whether
the netdev driver wants it or not. This leads to weird scenarios where
doing a get_eee() op followed by a set_eee() op results in e.g.
tx_lpi_timer being zeroed, because the MAC driver doesn't know it needs
to initialise phylib's phydev->eee_cfg.tx_lpi_timer member. This mess
really needs urgently addressing, and I believe it came about because
Andrew's patches were only partly merged via another party - I guess
highlighting the inherent danger of "thou shalt limit your patch series
to no more than 15 patches" when one has a subsystem who's in-kernel
API is changing.

I am ignoring that limit for this posting precisely because of this.
I think we need to have a discussion about it, because if it ends up
causing breakage, then we're doing something wrong.

One of the drivers that got broken was stmmac, so this series also
includes a number of patches that fix it before converting stmmac to
phylink managed EEE. I can point to many many more that are similarly
broken.

Also inflating this series are two important patches that have been
submitted for the NET tree, but which aren't yet part of the net-next
tree - thus making this series larger than really necessary. If it
weren't for both of these issues, then this series would be exactly
15 patches.

Anyway, these patches...

Patch 1 and 2 are patches that have been submitted and possibly applied
to the net tree.

Patch 3 changes the Marvell driver to use the state we store in
struct phy_device, rather than manually calling
phydev->eee_cfg.eee_enabled.

Patch 4 avoids genphy_c45_ethtool_get_eee() setting ->eee_enabled, as
we copy that from phydev->eee_cfg.eee_enabled later, and after patch 3
mo one uses this after calling genphy_c45_ethtool_get_eee(). In fact,
the only caller of this function now is phy_ethtool_get_eee().

As all callers to genphy_c45_eee_is_active() now pass NULL as its
is_enabled flag, this is no longer useful. Remove the argument in
patch 5.

Patch 6 updates the phylib documentation to make it absolutely clear
that phy_ethtool_get_eee() now fills in all members of struct
ethtool_keee, which is why we now have so many buggy network drivers.
We need to decide how to fix this mess.

Patch 7 adds a definition for the clock stop capable bit in the PCS
MMD status register.

Patch 8 adds a phylib API to query whether the PHY allows the transmit
xMII clock to be stopped while in LPI mode. This capability is for MAC
drivers to save power when LPI is active, to allow them to stop their
transmit clock.

Patch 9 adds another phylib API to configure whether the receive xMII
clock may be disabled by the PHY. We do have an existing API,
phy_init_eee(), but... it only allows the control bit to be set which
is weird - what if a boot firmware or previous kernel has set this bit
and we want it clear?

Patch 10 finally starts on the phylink parts of this, extracting from
phylink_resolve() the detection of link-up. (Yes, okay, I could've
dropped this patch, but with 23 patches, it's not going to make that
much difference.)

Patch 11 adds phylink managed EEE support. Two new MAC APIs are added,
to enable and disable LPI. The enable method is passed the LPI timer
setting which it is expected to program into the hardware, and also a
flag ehther the transmit clock should be stopped.

 *** There are open questions here. Eagle eyed reviewers will notice
   pl->config->lpi_interfaces. There are MACs out there which only
   support LPI signalling on a subset of their interface types. Phylib
   doesn't understand this. I'm handling this at the moment by simply
   not activating LPI at the MAC, but that leads to ethtool --show-eee
   suggesting that EEE is active when it isn't.
 *** Should we pass the phy_interface_t to these functions?
 *** Should mac_enable_tx_lpi() be allowed to fail if the MAC doesn't
   support the interface mode?

An example of a MAC that this is the case are the Marvell ones - both
NETA and PP2 only support LPI signalling when connected via SGMII,
which makes being connected to a PHY which changes its link mode
problematical.

The remainder of the patches address the driver sides, which are
necessary to actually test this stuff out. The exception are the stmmac
patches.

The first four stmmac patches show what is necessary across many drivers
to fix the current phylib EEE mess.

The 5th stmmac patch makes reporting of EEE errors dependent on whether
EEE is supported by stmmac or not - I can't see why one would want
anything else (maybe someone can enlighten me?)

The 6th stmmac patch converts to use phy_eee_rx_clock_stop(), thereby
ensuring that, if desired, the RX clock will not be stopped by the PHY
when in LPI mode (which as noted above is something that phy_init_eee()
doesn't do.) Given that we know stmmac has issues if the RX clock is
stopped, this could be a bug fix.

The final patch converts stmmac to phylink managed EEE.

 drivers/net/ethernet/marvell/mvneta.c              | 118 ++++++++++--------
 drivers/net/ethernet/marvell/mvpp2/mvpp2.h         |   5 +
 drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c    |  88 ++++++++++++++
 drivers/net/ethernet/microchip/lan743x_ethtool.c   |  21 ----
 drivers/net/ethernet/microchip/lan743x_main.c      |  39 ++++--
 drivers/net/ethernet/microchip/lan743x_main.h      |   1 -
 drivers/net/ethernet/stmicro/stmmac/stmmac.h       |   1 -
 .../net/ethernet/stmicro/stmmac/stmmac_ethtool.c   |  25 +---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  |  68 ++++++++---
 drivers/net/phy/marvell.c                          |   4 +-
 drivers/net/phy/phy-c45.c                          |  15 +--
 drivers/net/phy/phy.c                              | 106 +++++++++++-----
 drivers/net/phy/phylink.c                          | 134 +++++++++++++++++++--
 include/linux/phy.h                                |   6 +-
 include/linux/phylink.h                            |  44 +++++++
 include/uapi/linux/mdio.h                          |   1 +
 16 files changed, 505 insertions(+), 171 deletions(-)

Comments

Russell King (Oracle) Nov. 26, 2024, 1:01 p.m. UTC | #1
On Tue, Nov 26, 2024 at 12:51:36PM +0000, Russell King (Oracle) wrote:
> Patch 11 adds phylink managed EEE support. Two new MAC APIs are added,
> to enable and disable LPI. The enable method is passed the LPI timer
> setting which it is expected to program into the hardware, and also a
> flag ehther the transmit clock should be stopped.
> 
>  *** There are open questions here. Eagle eyed reviewers will notice
>    pl->config->lpi_interfaces. There are MACs out there which only
>    support LPI signalling on a subset of their interface types. Phylib
>    doesn't understand this. I'm handling this at the moment by simply
>    not activating LPI at the MAC, but that leads to ethtool --show-eee
>    suggesting that EEE is active when it isn't.
>  *** Should we pass the phy_interface_t to these functions?
>  *** Should mac_enable_tx_lpi() be allowed to fail if the MAC doesn't
>    support the interface mode?

There is another point to raise here - should we have a "validate_eee"
method in struct phylink_mac_ops so that MAC drivers can validate
settings such as the tx_lpi_timer value can be programmed into the
hardware?

We do have the situation on Marvell platforms where the programmed
value depends on the MAC speed, and is only 8 bit, which makes
validating its value rather difficult - at 1G speeds, it's a
resolution of 1us so we can support up to 255us. At 100M speeds,
it's 10us, supporting up to 2.55ms. This makes it awkward to be able
to validate the set_eee() settings are sane for the hardware. Should
Marvell platforms instead implement a hrtimer above this? That sounds
a bit problematical to manage sanely.
Oleksij Rempel Nov. 26, 2024, 2:21 p.m. UTC | #2
Hi Russell,

On Tue, Nov 26, 2024 at 01:01:11PM +0000, Russell King (Oracle) wrote:
> On Tue, Nov 26, 2024 at 12:51:36PM +0000, Russell King (Oracle) wrote:
> > Patch 11 adds phylink managed EEE support. Two new MAC APIs are added,
> > to enable and disable LPI. The enable method is passed the LPI timer
> > setting which it is expected to program into the hardware, and also a
> > flag ehther the transmit clock should be stopped.
> > 
> >  *** There are open questions here. Eagle eyed reviewers will notice
> >    pl->config->lpi_interfaces. There are MACs out there which only
> >    support LPI signalling on a subset of their interface types. Phylib
> >    doesn't understand this. I'm handling this at the moment by simply
> >    not activating LPI at the MAC, but that leads to ethtool --show-eee
> >    suggesting that EEE is active when it isn't.
> >  *** Should we pass the phy_interface_t to these functions?
> >  *** Should mac_enable_tx_lpi() be allowed to fail if the MAC doesn't
> >    support the interface mode?
> 
> There is another point to raise here - should we have a "validate_eee"
> method in struct phylink_mac_ops so that MAC drivers can validate
> settings such as the tx_lpi_timer value can be programmed into the
> hardware?
> 
> We do have the situation on Marvell platforms where the programmed
> value depends on the MAC speed, and is only 8 bit, which makes
> validating its value rather difficult - at 1G speeds, it's a
> resolution of 1us so we can support up to 255us. At 100M speeds,
> it's 10us, supporting up to 2.55ms. This makes it awkward to be able
> to validate the set_eee() settings are sane for the hardware. Should
> Marvell platforms instead implement a hrtimer above this? That sounds
> a bit problematical to manage sanely.

I agree that tx_lpi_timer can be a problem, and this is not just a
Marvell issue.  For example, I think the FEC MAC on i.MX8MP might also
be affected.  But I can't confirm this because I don't have an i.MX8MP
board with a PHY that supports MAC-controlled EEE mode. The Realtek PHY
I have uses PHY-controlled EEE (SmartEEE).

Except for this, I think there should be sane default values for
tx_lpi_timer.  The IEEE 802.3-2022 standard (Section 78.2) describes EEE
timing, but it doesn’t give a clear recommendation for tx_lpi_timer.
IMO, the best value for tx_lpi_timer should be the sum of the time
needed to put the full chain (MAC -> PHY -> remote PHY) into sleep mode
and the time needed to wake the chain. These times are link-speed
specific, so defaults should consider PHY timings for each link speed.

Except for tx_lpi_timer, some MACs also allow configuration of the Twake
timer.  For example, the FEC driver uses the lpi_timer value to
configure the Twake timer, and the LAN78xx driver also provides a
configurable Twake timer register.

If the Twake timer is not configured properly, or if the system has
quirks causing the actual Twake time to be longer than expected, it can
result in frame corruption. 

Regards,
Oleksij
Russell King (Oracle) Nov. 26, 2024, 4:14 p.m. UTC | #3
On Tue, Nov 26, 2024 at 03:21:26PM +0100, Oleksij Rempel wrote:
> Hi Russell,
> 
> On Tue, Nov 26, 2024 at 01:01:11PM +0000, Russell King (Oracle) wrote:
> > On Tue, Nov 26, 2024 at 12:51:36PM +0000, Russell King (Oracle) wrote:
> > > Patch 11 adds phylink managed EEE support. Two new MAC APIs are added,
> > > to enable and disable LPI. The enable method is passed the LPI timer
> > > setting which it is expected to program into the hardware, and also a
> > > flag ehther the transmit clock should be stopped.
> > > 
> > >  *** There are open questions here. Eagle eyed reviewers will notice
> > >    pl->config->lpi_interfaces. There are MACs out there which only
> > >    support LPI signalling on a subset of their interface types. Phylib
> > >    doesn't understand this. I'm handling this at the moment by simply
> > >    not activating LPI at the MAC, but that leads to ethtool --show-eee
> > >    suggesting that EEE is active when it isn't.
> > >  *** Should we pass the phy_interface_t to these functions?
> > >  *** Should mac_enable_tx_lpi() be allowed to fail if the MAC doesn't
> > >    support the interface mode?
> > 
> > There is another point to raise here - should we have a "validate_eee"
> > method in struct phylink_mac_ops so that MAC drivers can validate
> > settings such as the tx_lpi_timer value can be programmed into the
> > hardware?
> > 
> > We do have the situation on Marvell platforms where the programmed
> > value depends on the MAC speed, and is only 8 bit, which makes
> > validating its value rather difficult - at 1G speeds, it's a
> > resolution of 1us so we can support up to 255us. At 100M speeds,
> > it's 10us, supporting up to 2.55ms. This makes it awkward to be able
> > to validate the set_eee() settings are sane for the hardware. Should
> > Marvell platforms instead implement a hrtimer above this? That sounds
> > a bit problematical to manage sanely.
> 
> I agree that tx_lpi_timer can be a problem, and this is not just a
> Marvell issue.  For example, I think the FEC MAC on i.MX8MP might also
> be affected.  But I can't confirm this because I don't have an i.MX8MP
> board with a PHY that supports MAC-controlled EEE mode. The Realtek PHY
> I have uses PHY-controlled EEE (SmartEEE).
> 
> Except for this, I think there should be sane default values for
> tx_lpi_timer.  The IEEE 802.3-2022 standard (Section 78.2) describes EEE
> timing, but it doesn’t give a clear recommendation for tx_lpi_timer.

There are of course some parameters of EEE that should be fixed, and
IEEE specifies them in that section. These are Ts, Tq and Tr, and IEEE
gives a range of values for these which need to be conformed with in a
compliant implementation.

Please don't get confused by the mvneta/mvpp2 implementation, there are
parameters for Ts and Tw, but the Ts value is not the same as Ts in
IEEE. IEEE defines it as the period of time between the PHY transmitting
sleep and turning all transmitters off. Marvell define it as the minimum
time from the Tx FIFO being empty to asserting LPI - quite different!

Other parameters depend on the implementation, such as the propagation
delay of data through the PHY. These, we don't currently take account
of. I don't recall off-hand whether there's any standards defined
registers describing these parameters in the PHY (I need to delve into
802.3...) That's all needed for computing Tw.

> IMO, the best value for tx_lpi_timer should be the sum of the time
> needed to put the full chain (MAC -> PHY -> remote PHY) into sleep mode
> and the time needed to wake the chain. These times are link-speed
> specific, so defaults should consider PHY timings for each link speed.

One can only make a set of defaults that depend on the speed if we also
give the user the ability to set the tx_lpi_timer values on a per-speed
basis, otherwise how do we update the values when set_eee() gets called?

Also how do we know what the requirements of the remote PHY are? I think
the only way that could be known is by exchanging the EEE TLV with the
link partner as described in section 78.

> Except for tx_lpi_timer, some MACs also allow configuration of the Twake
> timer.  For example, the FEC driver uses the lpi_timer value to
> configure the Twake timer, and the LAN78xx driver also provides a
> configurable Twake timer register.
> 
> If the Twake timer is not configured properly, or if the system has
> quirks causing the actual Twake time to be longer than expected, it can
> result in frame corruption. 

I have been aware of it, and to me it sounds like another can of worms
that right now I'd rather not open until we have solved the basics of
EEE and got MAC drivers into better shape for the basics. I had been
wondering whether we would eventually need phylink to pass Tw along
with the LPI timer, and I think eventually we would need to - and we
also need some infrastructure for the EEE TLV, both sending it at the
appropriate time, and processing it when received. I don't think we
have any of that at the moment, so it would be another chunk of
development.
Russell King (Oracle) Nov. 26, 2024, 4:55 p.m. UTC | #4
On Tue, Nov 26, 2024 at 12:51:36PM +0000, Russell King (Oracle) wrote:
> In doing this, I came across the fact that the addition of phylib
> managed EEE support has actually broken a huge number of drivers -
> phylib will now overwrite all members of struct ethtool_keee whether
> the netdev driver wants it or not. This leads to weird scenarios where
> doing a get_eee() op followed by a set_eee() op results in e.g.
> tx_lpi_timer being zeroed, because the MAC driver doesn't know it needs
> to initialise phylib's phydev->eee_cfg.tx_lpi_timer member. This mess
> really needs urgently addressing, and I believe it came about because
> Andrew's patches were only partly merged via another party - I guess
> highlighting the inherent danger of "thou shalt limit your patch series
> to no more than 15 patches" when one has a subsystem who's in-kernel
> API is changing.

Note that, I think, fec_main.c isn't broken, although a quick review
only looking at fec_enet_get_eee() may suggest otherwise:

static int
fec_enet_get_eee(struct net_device *ndev, struct ethtool_keee *edata)
{
        struct fec_enet_private *fep = netdev_priv(ndev);
        struct ethtool_keee *p = &fep->eee;

        if (!(fep->quirks & FEC_QUIRK_HAS_EEE))
                return -EOPNOTSUPP;

        if (!netif_running(ndev))
                return -ENETDOWN;

        edata->tx_lpi_timer = p->tx_lpi_timer; // <===========================

        return phy_ethtool_get_eee(ndev->phydev, edata);
}

static int
fec_enet_set_eee(struct net_device *ndev, struct ethtool_keee *edata)
{
...
        p->tx_lpi_timer = edata->tx_lpi_timer;

Since the driver does not touch phydev->eee_cfg.tx_lpi_timer,
phy_ethtool_get_eee() above will overwrite edata->tx_lpi_timer with
zero. If ethtool does a read-modify-write on the EEE settings, then
fec_enet_set_eee() will be passed a value of zero for
edata->tx_lpi_timer.

This will result in FEC_LPI_SLEEP and FEC_LPI_WAKE being written with
zero, and from what I can see in fec_enet_eee_mode_set(), that disables
EEE.

The saving grace for this driver is that p->tx_lpi_timer also starts
off as zero.

A better implementation would be to get rid of p->tx_lpi_timer
entirely, and instead rely on phydev->eee_cfg.tx_lpi_timer to be
managed by phylib, obtaining its value from there in
fec_enet_eee_mode_set(). At least the driver doesn't attempt to
maintain any other EEE state!

So something like this:


diff --git a/drivers/net/ethernet/freescale/fec.h b/drivers/net/ethernet/freescale/fec.h
index 1cca0425d493..c81f2ea588f2 100644
--- a/drivers/net/ethernet/freescale/fec.h
+++ b/drivers/net/ethernet/freescale/fec.h
@@ -671,8 +671,6 @@ struct fec_enet_private {
 	unsigned int tx_time_itr;
 	unsigned int itr_clk_rate;
 
-	/* tx lpi eee mode */
-	struct ethtool_keee eee;
 	unsigned int clk_ref_rate;
 
 	/* ptp clock period in ns*/
diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 1b55047c0237..25c842835d52 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -2045,14 +2045,14 @@ static int fec_enet_us_to_tx_cycle(struct net_device *ndev, int us)
 	return us * (fep->clk_ref_rate / 1000) / 1000;
 }
 
-static int fec_enet_eee_mode_set(struct net_device *ndev, bool enable)
+static int fec_enet_eee_mode_set(struct net_device *ndev, u32 lpi_timer,
+				 bool enable)
 {
 	struct fec_enet_private *fep = netdev_priv(ndev);
-	struct ethtool_keee *p = &fep->eee;
 	unsigned int sleep_cycle, wake_cycle;
 
 	if (enable) {
-		sleep_cycle = fec_enet_us_to_tx_cycle(ndev, p->tx_lpi_timer);
+		sleep_cycle = fec_enet_us_to_tx_cycle(lpi_timer);
 		wake_cycle = sleep_cycle;
 	} else {
 		sleep_cycle = 0;
@@ -2105,7 +2105,9 @@ static void fec_enet_adjust_link(struct net_device *ndev)
 			napi_enable(&fep->napi);
 		}
 		if (fep->quirks & FEC_QUIRK_HAS_EEE)
-			fec_enet_eee_mode_set(ndev, phy_dev->enable_tx_lpi);
+			fec_enet_eee_mode_set(ndev,
+					      phy_dev->eee_cfg.tx_lpi_timer,
+					      phy_dev->enable_tx_lpi);
 	} else {
 		if (fep->link) {
 			netif_stop_queue(ndev);
@@ -3181,7 +3183,6 @@ static int
 fec_enet_get_eee(struct net_device *ndev, struct ethtool_keee *edata)
 {
 	struct fec_enet_private *fep = netdev_priv(ndev);
-	struct ethtool_keee *p = &fep->eee;
 
 	if (!(fep->quirks & FEC_QUIRK_HAS_EEE))
 		return -EOPNOTSUPP;
@@ -3189,8 +3190,6 @@ fec_enet_get_eee(struct net_device *ndev, struct ethtool_keee *edata)
 	if (!netif_running(ndev))
 		return -ENETDOWN;
 
-	edata->tx_lpi_timer = p->tx_lpi_timer;
-
 	return phy_ethtool_get_eee(ndev->phydev, edata);
 }
 
@@ -3198,7 +3197,6 @@ static int
 fec_enet_set_eee(struct net_device *ndev, struct ethtool_keee *edata)
 {
 	struct fec_enet_private *fep = netdev_priv(ndev);
-	struct ethtool_keee *p = &fep->eee;
 
 	if (!(fep->quirks & FEC_QUIRK_HAS_EEE))
 		return -EOPNOTSUPP;
@@ -3206,8 +3204,6 @@ fec_enet_set_eee(struct net_device *ndev, struct ethtool_keee *edata)
 	if (!netif_running(ndev))
 		return -ENETDOWN;
 
-	p->tx_lpi_timer = edata->tx_lpi_timer;
-
 	return phy_ethtool_set_eee(ndev->phydev, edata);
 }
 
Another patch to be added to my stack...