Message ID | 20241114081653.3939346-2-yong.liang.choong@linux.intel.com (mailing list archive) |
---|---|
State | New |
Series | Fix 'ethtool --show-eee' during initial stage |
On Thu, Nov 14, 2024 at 04:16:52PM +0800, Choong Yong Liang wrote:
> Not all PHYs have EEE enabled by default. For example, Marvell PHYs are
> designed to have EEE hardware disabled during the initial state, and it
> needs to be configured to turn it on again.
>
> This patch reads the PHY configuration and sets it as the initial value for
> eee_cfg.tx_lpi_enabled and eee_cfg.eee_enabled instead of having them set to
> true by default.

eee_cfg.tx_lpi_enabled is something phylib tracks, and it merely means
that LPI needs to be enabled at the MAC if EEE was negotiated:

 * @tx_lpi_enabled: Whether the interface should assert its tx lpi, given
 *	that eee was negotiated.

eee_cfg.eee_enabled means that EEE mode was enabled - which is user
configuration:

 * @eee_enabled: EEE configured mode (enabled/disabled).

phy_probe() reads the initial PHY state and sets things up
appropriately.

However, there is a point where the EEE configuration (advertisement,
and therefore eee_enabled state) is written to the PHY, and that should
be config_aneg(). Looking at the Marvell driver, it's calling
genphy_config_aneg() which eventually calls
genphy_c45_an_config_eee_aneg() which does this (via
__genphy_config_aneg()).

Please investigate why the hardware state is going out of sync with the
software state.

Thanks.

> void phy_support_eee(struct phy_device *phydev)
> {
> +	bool is_enabled = true;
> +
> +	genphy_c45_eee_is_active(phydev, NULL, NULL, &is_enabled);
>  	linkmode_copy(phydev->advertising_eee, phydev->supported_eee);
> -	phydev->eee_cfg.tx_lpi_enabled = true;
> -	phydev->eee_cfg.eee_enabled = true;
> +	phydev->eee_cfg.tx_lpi_enabled = is_enabled;
> +	phydev->eee_cfg.eee_enabled = is_enabled;

This is almost certainly incorrect, because eee_enabled should only be
set when phydev->advertising_eee (which should track the hardware EEE
advertisement programmed into the PHY) is non-zero.

Note that phy_support_eee() must be called _before_ phy_start(). I haven't
checked whether stmmac does this.

Thanks.
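The difference between the two eee_cfg flags described in this reply can be sketched as a small standalone C model. This is an illustration only, not phylib code: struct mock_eee_cfg, the MOCK_EEE_* bits and both helper functions are hypothetical names, a plain uint32_t bitmask stands in for a kernel linkmode mask, and "negotiated" is simplified to "EEE is configured on and both ends advertise a common EEE mode".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins: one bit per EEE link mode instead of a
 * kernel linkmode bitmap. */
#define MOCK_EEE_100TX	(1u << 0)
#define MOCK_EEE_1000T	(1u << 1)

/* Simplified model of the two flags discussed above; not the real
 * kernel structure. */
struct mock_eee_cfg {
	bool tx_lpi_enabled;	/* MAC should assert Tx LPI, given EEE was negotiated */
	bool eee_enabled;	/* user-configured EEE mode (enabled/disabled) */
};

/* Assumption for this sketch: EEE counts as negotiated when it is
 * configured on and both link partners advertise a common EEE mode. */
static bool mock_eee_negotiated(const struct mock_eee_cfg *cfg,
				uint32_t local_adv, uint32_t lp_adv)
{
	return cfg->eee_enabled && (local_adv & lp_adv) != 0;
}

/* tx_lpi_enabled alone does not turn LPI on; it only takes effect
 * once EEE has actually been negotiated. */
static bool mock_mac_should_tx_lpi(const struct mock_eee_cfg *cfg,
				   uint32_t local_adv, uint32_t lp_adv)
{
	return cfg->tx_lpi_enabled && mock_eee_negotiated(cfg, local_adv, lp_adv);
}
```

In this model, a configuration with both flags set still yields no Tx LPI when the local advertisement is empty, which is why the advertisement programmed into the PHY and the eee_enabled state need to agree.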
On Thu, Nov 14, 2024 at 09:23:48AM +0000, Russell King (Oracle) wrote:
> However, there is a point where the EEE configuration (advertisement,
> and therefore eee_enabled state) is written to the PHY, and that should
> be config_aneg(). Looking at the Marvell driver, it's calling
> genphy_config_aneg() which eventually calls
> genphy_c45_an_config_eee_aneg() which does this (via
> __genphy_config_aneg()).
>
> Please investigate why the hardware state is going out of sync with the
> software state.

I think I've found the issue.

We have phydev->eee_enabled and phydev->eee_cfg.eee_enabled, which looks
like a bug to me. We write to phydev->eee_cfg.eee_enabled in
phy_support_eee(), leaving phydev->eee_enabled untouched.

However, most other places are using phydev->eee_enabled.

This is (a) confusing and (b) wrong, and having the two members leads
to this confusion, and makes the code more difficult to follow (unless
one has already clocked that there are these two different things both
called eee_enabled).

This is my untested prototype patch to fix this - it may cause breakage
elsewhere:

diff --git a/drivers/net/phy/phy-c45.c b/drivers/net/phy/phy-c45.c
index c1b3576c307f..2d64d3f293e5 100644
--- a/drivers/net/phy/phy-c45.c
+++ b/drivers/net/phy/phy-c45.c
@@ -943,7 +943,7 @@ EXPORT_SYMBOL_GPL(genphy_c45_read_eee_abilities);
  */
 int genphy_c45_an_config_eee_aneg(struct phy_device *phydev)
 {
-	if (!phydev->eee_enabled) {
+	if (!phydev->eee_cfg.eee_enabled) {
 		__ETHTOOL_DECLARE_LINK_MODE_MASK(adv) = {};
 
 		return genphy_c45_write_eee_adv(phydev, adv);
@@ -1576,8 +1576,6 @@ int genphy_c45_ethtool_set_eee(struct phy_device *phydev,
 		}
 	}
 
-	phydev->eee_enabled = data->eee_enabled;
-
 	ret = genphy_c45_an_config_eee_aneg(phydev);
 	if (ret > 0) {
 		ret = phy_restart_aneg(phydev);
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index bc24c9f2786b..b26bb33cd1d4 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -3589,12 +3589,12 @@ static int phy_probe(struct device *dev)
 	/* There is no "enabled" flag. If PHY is advertising, assume it is
 	 * kind of enabled.
 	 */
-	phydev->eee_enabled = !linkmode_empty(phydev->advertising_eee);
+	phydev->eee_cfg.eee_enabled = !linkmode_empty(phydev->advertising_eee);
 
 	/* Some PHYs may advertise, by default, not support EEE modes. So,
 	 * we need to clean them.
 	 */
-	if (phydev->eee_enabled)
+	if (phydev->eee_cfg.eee_enabled)
 		linkmode_and(phydev->advertising_eee, phydev->supported_eee,
 			     phydev->advertising_eee);
 
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 1e4127c495c0..33905e9672a7 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -601,7 +601,6 @@ struct macsec_ops;
  * @adv_old: Saved advertised while power saving for WoL
  * @supported_eee: supported PHY EEE linkmodes
  * @advertising_eee: Currently advertised EEE linkmodes
- * @eee_enabled: Flag indicating whether the EEE feature is enabled
  * @enable_tx_lpi: When True, MAC should transmit LPI to PHY
  * @eee_cfg: User configuration of EEE
  * @lp_advertising: Current link partner advertised linkmodes
@@ -721,7 +720,6 @@ struct phy_device {
 	/* used for eee validation and configuration*/
 	__ETHTOOL_DECLARE_LINK_MODE_MASK(supported_eee);
 	__ETHTOOL_DECLARE_LINK_MODE_MASK(advertising_eee);
-	bool eee_enabled;
 
 	/* Host supported PHY interface types. Should be ignored if empty. */
 	DECLARE_PHY_INTERFACE_MASK(host_interfaces);
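The effect of the duplicated flag can be reproduced in a standalone C model of the sequence discussed in this message: probe, then phy_support_eee(), then the EEE part of config_aneg(). This is a sketch under stated assumptions, not kernel code: every mock_* name is hypothetical, a uint32_t bitmask stands in for a linkmode mask, hw_eee_adv stands in for the PHY's EEE advertisement register, and the `unified` parameter selects between the pre-patch behaviour (a top-level eee_enabled member) and the prototype's behaviour (eee_cfg.eee_enabled only).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct phy_device. */
struct mock_phy {
	uint32_t supported_eee;		/* EEE modes the PHY supports */
	uint32_t advertising_eee;	/* software copy of the advertisement */
	uint32_t hw_eee_adv;		/* stand-in for the PHY's advertisement register */
	bool eee_enabled;		/* duplicate member removed by the prototype */
	struct {
		bool tx_lpi_enabled;
		bool eee_enabled;
	} eee_cfg;
};

/* Select which "enabled" flag probe and config_aneg use. */
static bool *mock_flag(struct mock_phy *phy, bool unified)
{
	return unified ? &phy->eee_cfg.eee_enabled : &phy->eee_enabled;
}

/* Models the probe step: derive "enabled" from the advertisement read
 * back from the PHY, then mask out unsupported modes. */
static void mock_phy_probe(struct mock_phy *phy, bool unified)
{
	phy->advertising_eee = phy->hw_eee_adv;
	*mock_flag(phy, unified) = phy->advertising_eee != 0;
	if (*mock_flag(phy, unified))
		phy->advertising_eee &= phy->supported_eee;
}

/* Models phy_support_eee() as it stands before the patch under review:
 * it only ever writes the eee_cfg members. */
static void mock_phy_support_eee(struct mock_phy *phy)
{
	phy->advertising_eee = phy->supported_eee;
	phy->eee_cfg.tx_lpi_enabled = true;
	phy->eee_cfg.eee_enabled = true;
}

/* Models the EEE part of config_aneg(): write an empty advertisement
 * when the selected flag says EEE is disabled. */
static void mock_config_eee_aneg(struct mock_phy *phy, bool unified)
{
	phy->hw_eee_adv = *mock_flag(phy, unified) ? phy->advertising_eee : 0;
}

/* Run the whole sequence and return what ends up "in the PHY". */
static uint32_t mock_bring_up(uint32_t supported, uint32_t hw_default_adv,
			      bool unified)
{
	struct mock_phy phy = {
		.supported_eee = supported,
		.hw_eee_adv = hw_default_adv,
	};

	mock_phy_probe(&phy, unified);
	mock_phy_support_eee(&phy);
	mock_config_eee_aneg(&phy, unified);
	return phy.hw_eee_adv;
}
```

For a PHY that powers up with an empty EEE advertisement, mock_bring_up(0x3, 0x0, false) leaves the register at 0 even though eee_cfg.eee_enabled is true, while mock_bring_up(0x3, 0x0, true) programs 0x3. A PHY that already advertises by default behaves the same either way, which may be why the mismatch only shows up on some hardware.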
On Thu, Nov 14, 2024 at 10:05:52AM +0000, Russell King (Oracle) wrote:
> I think I've found the issue.
>
> We have phydev->eee_enabled and phydev->eee_cfg.eee_enabled, which looks
> like a bug to me. We write to phydev->eee_cfg.eee_enabled in
> phy_support_eee(), leaving phydev->eee_enabled untouched.
>
> However, most other places are using phydev->eee_enabled.
>
> This is (a) confusing and (b) wrong, and having the two members leads
> to this confusion, and makes the code more difficult to follow (unless
> one has already clocked that there are these two different things both
> called eee_enabled).
>
> This is my untested prototype patch to fix this - it may cause breakage
> elsewhere:

As mentioned in the other thread:

Without a call to phy_support_eee():

EEE settings for eth2:
        EEE status: disabled
        Tx LPI: disabled
        Supported EEE link modes:  100baseT/Full
                                   1000baseT/Full
        Advertised EEE link modes:  Not reported
        Link partner advertised EEE link modes:  100baseT/Full
                                                 1000baseT/Full

With a call to phy_support_eee():

EEE settings for eth2:
        EEE status: enabled - active
        Tx LPI: 0 (us)
        Supported EEE link modes:  100baseT/Full
                                   1000baseT/Full
        Advertised EEE link modes:  100baseT/Full
                                    1000baseT/Full
        Link partner advertised EEE link modes:  100baseT/Full
                                                 1000baseT/Full

So the EEE status is now behaving correctly, and the Marvell PHY is
being programmed with the advertisement correctly.
On 14/11/2024 6:16 pm, Russell King (Oracle) wrote:
> So the EEE status is now behaving correctly, and the Marvell PHY is
> being programmed with the advertisement correctly.

Thank you for all the suggestions, the provided prototype, and the tested
results. I will study the suggestions in depth, test the provided
prototype, and provide more feedback.
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 499797646580..b4fa40c2371a 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -3010,9 +3010,12 @@ EXPORT_SYMBOL_GPL(phy_advertise_eee_all);
  */
 void phy_support_eee(struct phy_device *phydev)
 {
+	bool is_enabled = true;
+
+	genphy_c45_eee_is_active(phydev, NULL, NULL, &is_enabled);
 	linkmode_copy(phydev->advertising_eee, phydev->supported_eee);
-	phydev->eee_cfg.tx_lpi_enabled = true;
-	phydev->eee_cfg.eee_enabled = true;
+	phydev->eee_cfg.tx_lpi_enabled = is_enabled;
+	phydev->eee_cfg.eee_enabled = is_enabled;
 }
 EXPORT_SYMBOL(phy_support_eee);
Not all PHYs have EEE enabled by default. For example, Marvell PHYs are
designed to have EEE hardware disabled during the initial state, and it
needs to be configured to turn it on again.

This patch reads the PHY configuration and sets it as the initial value for
eee_cfg.tx_lpi_enabled and eee_cfg.eee_enabled instead of having them set to
true by default.

Fixes: 49168d1980e2 ("net: phy: Add phy_support_eee() indicating MAC support EEE")
Cc: <stable@vger.kernel.org>
Signed-off-by: Choong Yong Liang <yong.liang.choong@linux.intel.com>
---
 drivers/net/phy/phy_device.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)