Message ID | 20240708075023.14893-3-brgl@bgdev.pl (mailing list archive) |
---|---|
State | Accepted |
Commit | ad649a1fac37e390de5b4211b902ec666c4cd202 |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | net: phy: aquantia: enable support for aqr115c | expand |
Hi Bartosz, On 08/07/2024 08:50, Bartosz Golaszewski wrote: > From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> > > Checking the firmware register before it complete the boot process makes > no sense, it will report 0 even if FW is available from internal memory. > Always wait for FW to boot before continuing or we'll unnecessarily try > to load it from nvmem/filesystem and fail. > > Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> > --- > drivers/net/phy/aquantia/aquantia_firmware.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/drivers/net/phy/aquantia/aquantia_firmware.c b/drivers/net/phy/aquantia/aquantia_firmware.c > index 0c9640ef153b..524627a36c6f 100644 > --- a/drivers/net/phy/aquantia/aquantia_firmware.c > +++ b/drivers/net/phy/aquantia/aquantia_firmware.c > @@ -353,6 +353,10 @@ int aqr_firmware_load(struct phy_device *phydev) > { > int ret; > > + ret = aqr_wait_reset_complete(phydev); > + if (ret) > + return ret; > + > /* Check if the firmware is not already loaded by pooling > * the current version returned by the PHY. If 0 is returned, > * no firmware is loaded. Although this fixed another issue we were seeing with this driver, we have been reviewing this change and have a question about it. According to the description for the function aqr_wait_reset_complete() this function is intended to give the device time to load firmware and check there is a valid firmware ID. If a valid firmware ID (non-zero) is detected, then aqr_wait_reset_complete() will return 0 (because phy_read_mmd_poll_timeout() returns 0 on success and -ETIMEDOUT upon a timeout). If it times out, then it would appear that with the above code we don't attempt to load the firmware by any other means? Hence, I was wondering if we want this ... diff --git a/drivers/net/phy/aquantia/aquantia_firmware.c b/drivers/net/phy/aquantia/aquantia_firmware.c index 524627a36c6f..a167f42ae36b 100644 --- a/drivers/net/phy/aquantia/aquantia_firmware.c +++ b/drivers/net/phy/aquantia/aquantia_firmware.c @@ -353,16 +353,12 @@ int aqr_firmware_load(struct phy_device *phydev) { int ret; - ret = aqr_wait_reset_complete(phydev); - if (ret) - return ret; - - /* Check if the firmware is not already loaded by pooling + /* Check if the firmware is not already loaded by polling * the current version returned by the PHY. If 0 is returned, - * no firmware is loaded. + * firmware is loaded. */ - ret = phy_read_mmd(phydev, MDIO_MMD_VEND1, VEND1_GLOBAL_FW_ID); - if (ret > 0) + ret = aqr_wait_reset_complete(phydev); + if (!ret) goto exit; ret = aqr_firmware_load_nvmem(phydev); Our Aquantia PHY has a SPI-NOR and so we don't to test the other firmware loading cases. Jon
On Tue, Jul 30, 2024 at 10:59:59AM +0100, Jon Hunter wrote: > Hi Bartosz, > > On 08/07/2024 08:50, Bartosz Golaszewski wrote: > > From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> > > > > Checking the firmware register before it complete the boot process makes > > no sense, it will report 0 even if FW is available from internal memory. > > Always wait for FW to boot before continuing or we'll unnecessarily try > > to load it from nvmem/filesystem and fail. > > > > Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> > > --- > > drivers/net/phy/aquantia/aquantia_firmware.c | 4 ++++ > > 1 file changed, 4 insertions(+) > > > > diff --git a/drivers/net/phy/aquantia/aquantia_firmware.c b/drivers/net/phy/aquantia/aquantia_firmware.c > > index 0c9640ef153b..524627a36c6f 100644 > > --- a/drivers/net/phy/aquantia/aquantia_firmware.c > > +++ b/drivers/net/phy/aquantia/aquantia_firmware.c > > @@ -353,6 +353,10 @@ int aqr_firmware_load(struct phy_device *phydev) > > { > > int ret; > > + ret = aqr_wait_reset_complete(phydev); > > + if (ret) > > + return ret; > > + > > /* Check if the firmware is not already loaded by pooling > > * the current version returned by the PHY. If 0 is returned, > > * no firmware is loaded. > > > Although this fixed another issue we were seeing with this driver, we have > been reviewing this change and have a question about it. > > According to the description for the function aqr_wait_reset_complete() this > function is intended to give the device time to load firmware and check > there is a valid firmware ID. > > If a valid firmware ID (non-zero) is detected, then > aqr_wait_reset_complete() will return 0 (because phy_read_mmd_poll_timeout() > returns 0 on success and -ETIMEDOUT upon a timeout). > > If it times out, then it would appear that with the above code we don't > attempt to load the firmware by any other means? I'm also wondering about aqr_wait_reset_complete(). It uses phy_read_mmd_poll_timeout(), which prints an error message if it times out (which means no firmware has been loaded.) If we're then going on to attempt to load firmware, the error is not an error at all. So, I think while phy_read_poll_timeout() is nice and convenient, we need something like: #define phy_read_poll_timeout_quiet(phydev, regnum, val, cond, sleep_us, \ timeout_us, sleep_before_read) \ ({ \ int __ret, __val; \ __ret = read_poll_timeout(__val = phy_read, val, \ __val < 0 || (cond), \ sleep_us, timeout_us, sleep_before_read, phydev, regnum); \ if (__val < 0) \ __ret = __val; \ __ret; \ }) #define phy_read_poll_timeout(phydev, regnum, val, cond, sleep_us, \ timeout_us, sleep_before_read) \ ({ \ int __ret = phy_read_poll_timeout_quiet(phydev, regnum, val, cond, \ sleep_us, timeout_us, \ sleep_before_read); \ if (__ret) \ phydev_err(phydev, "%s failed: %d\n", __func__, __ret); \ __ret; \ }) and aqr_wait_reset_complete() needs to use phy_read_poll_timeout_quiet().
Hi Jon, On Tue, Jul 30, 2024 at 10:59:59AM +0100, Jon Hunter wrote: > Hi Bartosz, > > On 08/07/2024 08:50, Bartosz Golaszewski wrote: > > From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> > > > > Checking the firmware register before it complete the boot process makes > > no sense, it will report 0 even if FW is available from internal memory. > > Always wait for FW to boot before continuing or we'll unnecessarily try > > to load it from nvmem/filesystem and fail. > > > > Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> > > --- > > drivers/net/phy/aquantia/aquantia_firmware.c | 4 ++++ > > 1 file changed, 4 insertions(+) > > > > diff --git a/drivers/net/phy/aquantia/aquantia_firmware.c b/drivers/net/phy/aquantia/aquantia_firmware.c > > index 0c9640ef153b..524627a36c6f 100644 > > --- a/drivers/net/phy/aquantia/aquantia_firmware.c > > +++ b/drivers/net/phy/aquantia/aquantia_firmware.c > > @@ -353,6 +353,10 @@ int aqr_firmware_load(struct phy_device *phydev) > > { > > int ret; > > + ret = aqr_wait_reset_complete(phydev); > > + if (ret) > > + return ret; > > + > > /* Check if the firmware is not already loaded by pooling > > * the current version returned by the PHY. If 0 is returned, > > * no firmware is loaded. > > > Although this fixed another issue we were seeing with this driver, we have > been reviewing this change and have a question about it. > > According to the description for the function aqr_wait_reset_complete() this > function is intended to give the device time to load firmware and check > there is a valid firmware ID. > > If a valid firmware ID (non-zero) is detected, then > aqr_wait_reset_complete() will return 0 (because phy_read_mmd_poll_timeout() > returns 0 on success and -ETIMEDOUT upon a timeout). > > If it times out, then it would appear that with the above code we don't > attempt to load the firmware by any other means? > > Hence, I was wondering if we want this ... > > diff --git a/drivers/net/phy/aquantia/aquantia_firmware.c > b/drivers/net/phy/aquantia/aquantia_firmware.c > index 524627a36c6f..a167f42ae36b 100644 > --- a/drivers/net/phy/aquantia/aquantia_firmware.c > +++ b/drivers/net/phy/aquantia/aquantia_firmware.c > @@ -353,16 +353,12 @@ int aqr_firmware_load(struct phy_device *phydev) > { > int ret; > > - ret = aqr_wait_reset_complete(phydev); > - if (ret) > - return ret; > - > - /* Check if the firmware is not already loaded by pooling > + /* Check if the firmware is not already loaded by polling > * the current version returned by the PHY. If 0 is returned, > - * no firmware is loaded. > + * firmware is loaded. > */ > - ret = phy_read_mmd(phydev, MDIO_MMD_VEND1, VEND1_GLOBAL_FW_ID); > - if (ret > 0) > + ret = aqr_wait_reset_complete(phydev); > + if (!ret) > goto exit; > > ret = aqr_firmware_load_nvmem(phydev); I agree with your analysis and we also noticed this. But actually, you wouldn't want to ignore other return codes from phy_read_mmd_poll_timeout() like real errors from phy_read_mmd(): -ENODEV, -ENXIO etc. I found that the logic is more readable with a switch/case statement as below. diff --git a/drivers/net/phy/aquantia/aquantia_firmware.c b/drivers/net/phy/aquantia/aquantia_firmware.c index 524627a36c6f..d839f64471bd 100644 --- a/drivers/net/phy/aquantia/aquantia_firmware.c +++ b/drivers/net/phy/aquantia/aquantia_firmware.c @@ -353,26 +353,33 @@ int aqr_firmware_load(struct phy_device *phydev) { int ret; - ret = aqr_wait_reset_complete(phydev); - if (ret) - return ret; - - /* Check if the firmware is not already loaded by pooling - * the current version returned by the PHY. If 0 is returned, - * no firmware is loaded. + /* Check if the firmware is not already loaded by polling + * the current version returned by the PHY. */ - ret = phy_read_mmd(phydev, MDIO_MMD_VEND1, VEND1_GLOBAL_FW_ID); - if (ret > 0) - goto exit; + ret = aqr_wait_reset_complete(phydev); + switch (ret) { + case 0: + /* Some firmware is loaded => do nothing */ + return 0; + case -ETIMEDOUT: + /* VEND1_GLOBAL_FW_ID still reads 0 after 2 seconds of polling. + * We don't have full confidence that no firmware is loaded (in + * theory it might just not have loaded yet), but we will + * assume that, and load a new image. + */ + ret = aqr_firmware_load_nvmem(phydev); + if (!ret) + goto exit; - ret = aqr_firmware_load_nvmem(phydev); - if (!ret) - goto exit; + ret = aqr_firmware_load_fs(phydev); + if (ret) + return ret; - ret = aqr_firmware_load_fs(phydev); - if (ret) + break; + default: + /* PHY read error, propagate it to the caller */ return ret; + } -exit: return 0; }
Hi Russell, On Tue, Jul 30, 2024 at 12:23:45PM +0100, Russell King (Oracle) wrote: > > If it times out, then it would appear that with the above code we don't > > attempt to load the firmware by any other means? > > I'm also wondering about aqr_wait_reset_complete(). It uses > phy_read_mmd_poll_timeout(), which prints an error message if it times > out (which means no firmware has been loaded.) If we're then going on to > attempt to load firmware, the error is not an error at all. So, I think > while phy_read_poll_timeout() is nice and convenient, we need something > like: > > #define phy_read_poll_timeout_quiet(phydev, regnum, val, cond, sleep_us, \ > timeout_us, sleep_before_read) \ > ({ \ > int __ret, __val; \ > __ret = read_poll_timeout(__val = phy_read, val, \ > __val < 0 || (cond), \ > sleep_us, timeout_us, sleep_before_read, phydev, regnum); \ > if (__val < 0) \ > __ret = __val; \ > __ret; \ > }) > > #define phy_read_poll_timeout(phydev, regnum, val, cond, sleep_us, \ > timeout_us, sleep_before_read) \ > ({ \ > int __ret = phy_read_poll_timeout_quiet(phydev, regnum, val, cond, \ > sleep_us, timeout_us, \ > sleep_before_read); \ > if (__ret) \ > phydev_err(phydev, "%s failed: %d\n", __func__, __ret); \ > __ret; \ > }) > > and aqr_wait_reset_complete() needs to use phy_read_poll_timeout_quiet(). I agree that aqr_wait_reset_complete() shouldn't have built-in prints in it, as long as failures are also expected. Maybe an alternative option would be for aqr_wait_reset_complete() to manually roll a call to read_poll_timeout(), considering how it would be nice for _actual_ errors (not -ETIMEDOUT) from phy_read_mmd() to still be logged. But it seems strange that the driver has to time out on a 2 second poll, and then it's still not sure why VEND1_GLOBAL_FW_ID still reads 0? Is it because there's no firmware, or because there is, but it hasn't waited for long enough? I haven't followed the development of AQR firmware loading. Isn't there a faster and more reliable way of determining whether there is firmware in the first place? It could give the driver a 2 second boot-time speedup, plus more confidence.
diff --git a/drivers/net/phy/aquantia/aquantia_firmware.c b/drivers/net/phy/aquantia/aquantia_firmware.c index 0c9640ef153b..524627a36c6f 100644 --- a/drivers/net/phy/aquantia/aquantia_firmware.c +++ b/drivers/net/phy/aquantia/aquantia_firmware.c @@ -353,6 +353,10 @@ int aqr_firmware_load(struct phy_device *phydev) { int ret; + ret = aqr_wait_reset_complete(phydev); + if (ret) + return ret; + /* Check if the firmware is not already loaded by pooling * the current version returned by the PHY. If 0 is returned, * no firmware is loaded.