Message ID | 20240913121230.2620122-1-vladimir.oltean@nxp.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 194ef9d0de9021df4a0ba8b112f91e56adaddd22 |
Delegated to: | Netdev Maintainers |
Series | [net] net: phy: aquantia: fix -ETIMEDOUT PHY probe failure when firmware not present |
On Fri, 13 Sept 2024 at 14:12, Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
>
> [...]
>
> Only compile-tested. However, my timeout timer expired waiting for
> reactions on the thread with Bartosz' original patch, and Hans-Frieder
> Vogt wrote a message in his cover letter implying that the patch fixes
> the issue for him. Any Tested-by: tags are welcome.
>

Still works on sa8775p-ride v3

Tested-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
On Fri, Sep 13, 2024 at 03:18:42PM +0200, Bartosz Golaszewski wrote:
> Still works on sa8775p-ride v3
>
> Tested-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>

Thanks for testing, I appreciate it.
On 13.09.2024 14.12, Vladimir Oltean wrote:
> [...]

Tested-by: Hans-Frieder Vogt <hfdevel@gmx.net>

Hans
Hello:

This patch was applied to netdev/net.git (main)
by Paolo Abeni <pabeni@redhat.com>:

On Fri, 13 Sep 2024 15:12:30 +0300 you wrote:
> The author of the blamed commit apparently did not notice something
> about aqr_wait_reset_complete(): it polls the exact same register -
> MDIO_MMD_VEND1:VEND1_GLOBAL_FW_ID - as aqr_firmware_load().
>
> Thus, the entire logic after the introduction of aqr_wait_reset_complete() is
> now completely side-stepped, because if aqr_wait_reset_complete()
> succeeds, MDIO_MMD_VEND1:VEND1_GLOBAL_FW_ID could have only been a
> non-zero value. The handling of the case where the register reads as 0
> is dead code, due to the previous -ETIMEDOUT having stopped execution
> and returning a fatal error to the caller. We never attempt to load
> new firmware if no firmware is present.
>
> [...]

Here is the summary with links:
  - [net] net: phy: aquantia: fix -ETIMEDOUT PHY probe failure when firmware not present
    https://git.kernel.org/netdev/net/c/194ef9d0de90

You are awesome, thank you!
diff --git a/drivers/net/phy/aquantia/aquantia_firmware.c b/drivers/net/phy/aquantia/aquantia_firmware.c
index 524627a36c6f..dac6464b5fe2 100644
--- a/drivers/net/phy/aquantia/aquantia_firmware.c
+++ b/drivers/net/phy/aquantia/aquantia_firmware.c
@@ -353,26 +353,32 @@ int aqr_firmware_load(struct phy_device *phydev)
 {
 	int ret;
 
-	ret = aqr_wait_reset_complete(phydev);
-	if (ret)
-		return ret;
-
-	/* Check if the firmware is not already loaded by pooling
-	 * the current version returned by the PHY. If 0 is returned,
-	 * no firmware is loaded.
+	/* Check if the firmware is not already loaded by polling
+	 * the current version returned by the PHY.
 	 */
-	ret = phy_read_mmd(phydev, MDIO_MMD_VEND1, VEND1_GLOBAL_FW_ID);
-	if (ret > 0)
-		goto exit;
-
-	ret = aqr_firmware_load_nvmem(phydev);
-	if (!ret)
-		goto exit;
-
-	ret = aqr_firmware_load_fs(phydev);
-	if (ret)
+	ret = aqr_wait_reset_complete(phydev);
+	switch (ret) {
+	case 0:
+		/* Some firmware is loaded => do nothing */
+		return 0;
+	case -ETIMEDOUT:
+		/* VEND1_GLOBAL_FW_ID still reads 0 after 2 seconds of polling.
+		 * We don't have full confidence that no firmware is loaded (in
+		 * theory it might just not have loaded yet), but we will
+		 * assume that, and load a new image.
+		 */
+		ret = aqr_firmware_load_nvmem(phydev);
+		if (!ret)
+			return ret;
+
+		ret = aqr_firmware_load_fs(phydev);
+		if (ret)
+			return ret;
+		break;
+	default:
+		/* PHY read error, propagate it to the caller */
 		return ret;
+	}
 
-exit:
 	return 0;
 }
diff --git a/drivers/net/phy/aquantia/aquantia_main.c b/drivers/net/phy/aquantia/aquantia_main.c
index e982e9ce44a5..57b8b8f400fd 100644
--- a/drivers/net/phy/aquantia/aquantia_main.c
+++ b/drivers/net/phy/aquantia/aquantia_main.c
@@ -435,6 +435,9 @@ static int aqr107_set_tunable(struct phy_device *phydev,
 	}
 }
 
+#define AQR_FW_WAIT_SLEEP_US	20000
+#define AQR_FW_WAIT_TIMEOUT_US	2000000
+
 /* If we configure settings whilst firmware is still initializing the chip,
  * then these settings may be overwritten. Therefore make sure chip
  * initialization has completed. Use presence of the firmware ID as
@@ -444,11 +447,19 @@ static int aqr107_set_tunable(struct phy_device *phydev,
  */
 int aqr_wait_reset_complete(struct phy_device *phydev)
 {
-	int val;
+	int ret, val;
+
+	ret = read_poll_timeout(phy_read_mmd, val, val != 0,
+				AQR_FW_WAIT_SLEEP_US, AQR_FW_WAIT_TIMEOUT_US,
+				false, phydev, MDIO_MMD_VEND1,
+				VEND1_GLOBAL_FW_ID);
+	if (val < 0) {
+		phydev_err(phydev, "Failed to read VEND1_GLOBAL_FW_ID: %pe\n",
+			   ERR_PTR(val));
+		return val;
+	}
 
-	return phy_read_mmd_poll_timeout(phydev, MDIO_MMD_VEND1,
-					 VEND1_GLOBAL_FW_ID, val, val != 0,
-					 20000, 2000000, false);
+	return ret;
 }
 
 static void aqr107_chip_info(struct phy_device *phydev)
The author of the blamed commit apparently did not notice something
about aqr_wait_reset_complete(): it polls the exact same register -
MDIO_MMD_VEND1:VEND1_GLOBAL_FW_ID - as aqr_firmware_load().

Thus, the entire logic after the introduction of aqr_wait_reset_complete() is
now completely side-stepped, because if aqr_wait_reset_complete()
succeeds, MDIO_MMD_VEND1:VEND1_GLOBAL_FW_ID could have only been a
non-zero value. The handling of the case where the register reads as 0
is dead code, due to the previous -ETIMEDOUT having stopped execution
and returning a fatal error to the caller. We never attempt to load
new firmware if no firmware is present.

Based on static code analysis, I guess we should simply introduce a
switch/case statement based on the return code from aqr_wait_reset_complete(),
to determine whether to load firmware or not. I am not intending to
change the procedure through which the driver determines whether to load
firmware or not, as I am unaware of alternative possibilities.

At the same time, Russell King suggests that if aqr_wait_reset_complete()
is expected to return -ETIMEDOUT as part of normal operation and not
just catastrophic failure, the use of phy_read_mmd_poll_timeout() is
improper, since that has an embedded print inside. Just open-code a
call to read_poll_timeout() to avoid printing -ETIMEDOUT, but continue
printing actual read errors from the MDIO bus.

Fixes: ad649a1fac37 ("net: phy: aquantia: wait for FW reset before checking the vendor ID")
Reported-by: Clark Wang <xiaoning.wang@nxp.com>
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Closes: https://lore.kernel.org/netdev/8ac00a45-ac61-41b4-9f74-d18157b8b6bf@nvidia.com/
Reported-by: Hans-Frieder Vogt <hfdevel@gmx.net>
Closes: https://lore.kernel.org/netdev/c7c1a3ae-be97-4929-8d89-04c8aa870209@gmx.net/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
Only compile-tested. However, my timeout timer expired waiting for
reactions on the thread with Bartosz' original patch, and Hans-Frieder
Vogt wrote a message in his cover letter implying that the patch fixes
the issue for him. Any Tested-by: tags are welcome.

 drivers/net/phy/aquantia/aquantia_firmware.c | 42 +++++++++++---------
 drivers/net/phy/aquantia/aquantia_main.c     | 19 +++++++--
 2 files changed, 39 insertions(+), 22 deletions(-)
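
For readers without a kernel tree at hand, the three-way outcome described in the commit message (firmware ID seen, timeout, or bus read error) can be illustrated with a small userspace sketch. This is not the driver code: every name below (sketch_wait_fw_id, sketch_decide, the fake readers) is hypothetical, the poll loop counts attempts instead of sleeping for real time, and it checks for a read error on each iteration rather than after the poll as the patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an MDIO register read: returns the register
 * value (>= 0) or a negative errno on a bus failure.
 */
typedef int (*reg_read_fn)(void *ctx);

/* Poll until the register reads non-zero, a read fails, or the attempts
 * run out. Returns 0, a negative read error, or -ETIMEDOUT respectively.
 */
static int sketch_wait_fw_id(reg_read_fn read_reg, void *ctx, int max_polls)
{
	int i, val;

	assert(read_reg != NULL);

	for (i = 0; i < max_polls; i++) {
		val = read_reg(ctx);
		if (val < 0)
			return val;	/* bus error, propagate as-is */
		if (val != 0)
			return 0;	/* some firmware ID is present */
	}

	return -ETIMEDOUT;		/* register stayed at 0 */
}

enum sketch_action { SKETCH_KEEP_FW, SKETCH_LOAD_FW, SKETCH_FAIL };

/* Map the wait result onto the decision made by the switch/case in the
 * patch: a timeout is treated as "no firmware", not as a fatal error.
 */
static enum sketch_action sketch_decide(int wait_ret)
{
	switch (wait_ret) {
	case 0:
		return SKETCH_KEEP_FW;
	case -ETIMEDOUT:
		return SKETCH_LOAD_FW;
	default:
		return SKETCH_FAIL;
	}
}

/* Fake register readers used only for illustration */
static int read_always_zero(void *ctx) { (void)ctx; return 0; }
static int read_fw_present(void *ctx) { (void)ctx; return 0x0105; }
static int read_bus_error(void *ctx) { (void)ctx; return -EIO; }

/* Reads 0 twice, then a non-zero ID; ctx points to an int call counter */
static int read_after_three(void *ctx)
{
	int *calls = ctx;

	return ++(*calls) >= 3 ? 0x0105 : 0;
}
```

With these fake readers, a register that always reads 0 leads to SKETCH_LOAD_FW, a non-zero ID to SKETCH_KEEP_FW, and -EIO to SKETCH_FAIL. Before the fix, the timeout case was returned to the caller as a fatal error, so the firmware load path was never reached.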