diff mbox series

[RESEND,net-next,v3,2/4] net: phy: aquantia: wait for FW reset before checking the vendor ID

Message ID 20240708075023.14893-3-brgl@bgdev.pl (mailing list archive)
State Accepted
Commit ad649a1fac37e390de5b4211b902ec666c4cd202
Delegated to: Netdev Maintainers
Headers show
Series net: phy: aquantia: enable support for aqr115c | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 816 this patch: 816
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 2 maintainers not CCed: robimarko@gmail.com ansuelsmth@gmail.com
netdev/build_clang success Errors and warnings before: 821 this patch: 821
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 821 this patch: 821
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 10 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-07-08--15-00 (tests: 694)

Commit Message

Bartosz Golaszewski July 8, 2024, 7:50 a.m. UTC
From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>

Checking the firmware register before it complete the boot process makes
no sense, it will report 0 even if FW is available from internal memory.
Always wait for FW to boot before continuing or we'll unnecessarily try
to load it from nvmem/filesystem and fail.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
---
 drivers/net/phy/aquantia/aquantia_firmware.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Jon Hunter July 30, 2024, 9:59 a.m. UTC | #1
Hi Bartosz,

On 08/07/2024 08:50, Bartosz Golaszewski wrote:
> From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
> 
> Checking the firmware register before it complete the boot process makes
> no sense, it will report 0 even if FW is available from internal memory.
> Always wait for FW to boot before continuing or we'll unnecessarily try
> to load it from nvmem/filesystem and fail.
> 
> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
> ---
>   drivers/net/phy/aquantia/aquantia_firmware.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/net/phy/aquantia/aquantia_firmware.c b/drivers/net/phy/aquantia/aquantia_firmware.c
> index 0c9640ef153b..524627a36c6f 100644
> --- a/drivers/net/phy/aquantia/aquantia_firmware.c
> +++ b/drivers/net/phy/aquantia/aquantia_firmware.c
> @@ -353,6 +353,10 @@ int aqr_firmware_load(struct phy_device *phydev)
>   {
>   	int ret;
>   
> +	ret = aqr_wait_reset_complete(phydev);
> +	if (ret)
> +		return ret;
> +
>   	/* Check if the firmware is not already loaded by pooling
>   	 * the current version returned by the PHY. If 0 is returned,
>   	 * no firmware is loaded.


Although this fixed another issue we were seeing with this driver, we 
have been reviewing this change and have a question about it.

According to the description for the function aqr_wait_reset_complete() 
this function is intended to give the device time to load firmware and 
check there is a valid firmware ID.

If a valid firmware ID (non-zero) is detected, then 
aqr_wait_reset_complete() will return 0 (because 
phy_read_mmd_poll_timeout() returns 0 on success and -ETIMEDOUT upon a 
timeout).

If it times out, then it would appear that with the above code we don't 
attempt to load the firmware by any other means?

Hence, I was wondering if we want this ...

diff --git a/drivers/net/phy/aquantia/aquantia_firmware.c 
b/drivers/net/phy/aquantia/aquantia_firmware.c
index 524627a36c6f..a167f42ae36b 100644
--- a/drivers/net/phy/aquantia/aquantia_firmware.c
+++ b/drivers/net/phy/aquantia/aquantia_firmware.c
@@ -353,16 +353,12 @@ int aqr_firmware_load(struct phy_device *phydev)
  {
         int ret;

-       ret = aqr_wait_reset_complete(phydev);
-       if (ret)
-               return ret;
-
-       /* Check if the firmware is not already loaded by pooling
+       /* Check if the firmware is not already loaded by polling
          * the current version returned by the PHY. If 0 is returned,
-        * no firmware is loaded.
+        * firmware is loaded.
          */
-       ret = phy_read_mmd(phydev, MDIO_MMD_VEND1, VEND1_GLOBAL_FW_ID);
-       if (ret > 0)
+       ret = aqr_wait_reset_complete(phydev);
+       if (!ret)
                 goto exit;

         ret = aqr_firmware_load_nvmem(phydev);


Our Aquantia PHY has a SPI-NOR and so we don't to test the other 
firmware loading cases.

Jon
Russell King (Oracle) July 30, 2024, 11:23 a.m. UTC | #2
On Tue, Jul 30, 2024 at 10:59:59AM +0100, Jon Hunter wrote:
> Hi Bartosz,
> 
> On 08/07/2024 08:50, Bartosz Golaszewski wrote:
> > From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
> > 
> > Checking the firmware register before it complete the boot process makes
> > no sense, it will report 0 even if FW is available from internal memory.
> > Always wait for FW to boot before continuing or we'll unnecessarily try
> > to load it from nvmem/filesystem and fail.
> > 
> > Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
> > ---
> >   drivers/net/phy/aquantia/aquantia_firmware.c | 4 ++++
> >   1 file changed, 4 insertions(+)
> > 
> > diff --git a/drivers/net/phy/aquantia/aquantia_firmware.c b/drivers/net/phy/aquantia/aquantia_firmware.c
> > index 0c9640ef153b..524627a36c6f 100644
> > --- a/drivers/net/phy/aquantia/aquantia_firmware.c
> > +++ b/drivers/net/phy/aquantia/aquantia_firmware.c
> > @@ -353,6 +353,10 @@ int aqr_firmware_load(struct phy_device *phydev)
> >   {
> >   	int ret;
> > +	ret = aqr_wait_reset_complete(phydev);
> > +	if (ret)
> > +		return ret;
> > +
> >   	/* Check if the firmware is not already loaded by pooling
> >   	 * the current version returned by the PHY. If 0 is returned,
> >   	 * no firmware is loaded.
> 
> 
> Although this fixed another issue we were seeing with this driver, we have
> been reviewing this change and have a question about it.
> 
> According to the description for the function aqr_wait_reset_complete() this
> function is intended to give the device time to load firmware and check
> there is a valid firmware ID.
> 
> If a valid firmware ID (non-zero) is detected, then
> aqr_wait_reset_complete() will return 0 (because phy_read_mmd_poll_timeout()
> returns 0 on success and -ETIMEDOUT upon a timeout).
> 
> If it times out, then it would appear that with the above code we don't
> attempt to load the firmware by any other means?

I'm also wondering about aqr_wait_reset_complete(). It uses
phy_read_mmd_poll_timeout(), which prints an error message if it times
out (which means no firmware has been loaded.) If we're then going on to
attempt to load firmware, the error is not an error at all. So, I think
while phy_read_poll_timeout() is nice and convenient, we need something
like:

#define phy_read_poll_timeout_quiet(phydev, regnum, val, cond, sleep_us, \
                                    timeout_us, sleep_before_read) \
({ \
        int __ret, __val; \
        __ret = read_poll_timeout(__val = phy_read, val, \
                                  __val < 0 || (cond), \
                sleep_us, timeout_us, sleep_before_read, phydev, regnum); \
        if (__val < 0) \
                __ret = __val; \
        __ret; \
})

#define phy_read_poll_timeout(phydev, regnum, val, cond, sleep_us, \
                                timeout_us, sleep_before_read) \
({ \
        int __ret = phy_read_poll_timeout_quiet(phydev, regnum, val, cond, \
						sleep_us, timeout_us, \
						sleep_before_read); \
        if (__ret) \
                phydev_err(phydev, "%s failed: %d\n", __func__, __ret); \
        __ret; \
})

and aqr_wait_reset_complete() needs to use phy_read_poll_timeout_quiet().
Vladimir Oltean Aug. 6, 2024, 11:27 a.m. UTC | #3
Hi Jon,

On Tue, Jul 30, 2024 at 10:59:59AM +0100, Jon Hunter wrote:
> Hi Bartosz,
> 
> On 08/07/2024 08:50, Bartosz Golaszewski wrote:
> > From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
> > 
> > Checking the firmware register before it complete the boot process makes
> > no sense, it will report 0 even if FW is available from internal memory.
> > Always wait for FW to boot before continuing or we'll unnecessarily try
> > to load it from nvmem/filesystem and fail.
> > 
> > Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
> > ---
> >   drivers/net/phy/aquantia/aquantia_firmware.c | 4 ++++
> >   1 file changed, 4 insertions(+)
> > 
> > diff --git a/drivers/net/phy/aquantia/aquantia_firmware.c b/drivers/net/phy/aquantia/aquantia_firmware.c
> > index 0c9640ef153b..524627a36c6f 100644
> > --- a/drivers/net/phy/aquantia/aquantia_firmware.c
> > +++ b/drivers/net/phy/aquantia/aquantia_firmware.c
> > @@ -353,6 +353,10 @@ int aqr_firmware_load(struct phy_device *phydev)
> >   {
> >   	int ret;
> > +	ret = aqr_wait_reset_complete(phydev);
> > +	if (ret)
> > +		return ret;
> > +
> >   	/* Check if the firmware is not already loaded by pooling
> >   	 * the current version returned by the PHY. If 0 is returned,
> >   	 * no firmware is loaded.
> 
> 
> Although this fixed another issue we were seeing with this driver, we have
> been reviewing this change and have a question about it.
> 
> According to the description for the function aqr_wait_reset_complete() this
> function is intended to give the device time to load firmware and check
> there is a valid firmware ID.
> 
> If a valid firmware ID (non-zero) is detected, then
> aqr_wait_reset_complete() will return 0 (because phy_read_mmd_poll_timeout()
> returns 0 on success and -ETIMEDOUT upon a timeout).
> 
> If it times out, then it would appear that with the above code we don't
> attempt to load the firmware by any other means?
> 
> Hence, I was wondering if we want this ...
> 
> diff --git a/drivers/net/phy/aquantia/aquantia_firmware.c
> b/drivers/net/phy/aquantia/aquantia_firmware.c
> index 524627a36c6f..a167f42ae36b 100644
> --- a/drivers/net/phy/aquantia/aquantia_firmware.c
> +++ b/drivers/net/phy/aquantia/aquantia_firmware.c
> @@ -353,16 +353,12 @@ int aqr_firmware_load(struct phy_device *phydev)
>  {
>         int ret;
> 
> -       ret = aqr_wait_reset_complete(phydev);
> -       if (ret)
> -               return ret;
> -
> -       /* Check if the firmware is not already loaded by pooling
> +       /* Check if the firmware is not already loaded by polling
>          * the current version returned by the PHY. If 0 is returned,
> -        * no firmware is loaded.
> +        * firmware is loaded.
>          */
> -       ret = phy_read_mmd(phydev, MDIO_MMD_VEND1, VEND1_GLOBAL_FW_ID);
> -       if (ret > 0)
> +       ret = aqr_wait_reset_complete(phydev);
> +       if (!ret)
>                 goto exit;
> 
>         ret = aqr_firmware_load_nvmem(phydev);

I agree with your analysis and we also noticed this.

But actually, you wouldn't want to ignore other return codes from
phy_read_mmd_poll_timeout() like real errors from phy_read_mmd():
-ENODEV, -ENXIO etc.

I found that the logic is more readable with a switch/case statement as below.

diff --git a/drivers/net/phy/aquantia/aquantia_firmware.c b/drivers/net/phy/aquantia/aquantia_firmware.c
index 524627a36c6f..d839f64471bd 100644
--- a/drivers/net/phy/aquantia/aquantia_firmware.c
+++ b/drivers/net/phy/aquantia/aquantia_firmware.c
@@ -353,26 +353,33 @@ int aqr_firmware_load(struct phy_device *phydev)
 {
 	int ret;
 
-	ret = aqr_wait_reset_complete(phydev);
-	if (ret)
-		return ret;
-
-	/* Check if the firmware is not already loaded by pooling
-	 * the current version returned by the PHY. If 0 is returned,
-	 * no firmware is loaded.
+	/* Check if the firmware is not already loaded by polling
+	 * the current version returned by the PHY.
 	 */
-	ret = phy_read_mmd(phydev, MDIO_MMD_VEND1, VEND1_GLOBAL_FW_ID);
-	if (ret > 0)
-		goto exit;
+	ret = aqr_wait_reset_complete(phydev);
+	switch (ret) {
+	case 0:
+		/* Some firmware is loaded => do nothing */
+		return 0;
+	case -ETIMEDOUT:
+		/* VEND1_GLOBAL_FW_ID still reads 0 after 2 seconds of polling.
+		 * We don't have full confidence that no firmware is loaded (in
+		 * theory it might just not have loaded yet), but we will
+		 * assume that, and load a new image.
+		 */
+		ret = aqr_firmware_load_nvmem(phydev);
+		if (!ret)
+			goto exit;
 
-	ret = aqr_firmware_load_nvmem(phydev);
-	if (!ret)
-		goto exit;
+		ret = aqr_firmware_load_fs(phydev);
+		if (ret)
+			return ret;
 
-	ret = aqr_firmware_load_fs(phydev);
-	if (ret)
+		break;
+	default:
+		/* PHY read error, propagate it to the caller */
 		return ret;
+	}
 
-exit:
 	return 0;
 }
Vladimir Oltean Aug. 6, 2024, 11:36 a.m. UTC | #4
Hi Russell,

On Tue, Jul 30, 2024 at 12:23:45PM +0100, Russell King (Oracle) wrote:
> > If it times out, then it would appear that with the above code we don't
> > attempt to load the firmware by any other means?
> 
> I'm also wondering about aqr_wait_reset_complete(). It uses
> phy_read_mmd_poll_timeout(), which prints an error message if it times
> out (which means no firmware has been loaded.) If we're then going on to
> attempt to load firmware, the error is not an error at all. So, I think
> while phy_read_poll_timeout() is nice and convenient, we need something
> like:
> 
> #define phy_read_poll_timeout_quiet(phydev, regnum, val, cond, sleep_us, \
>                                     timeout_us, sleep_before_read) \
> ({ \
>         int __ret, __val; \
>         __ret = read_poll_timeout(__val = phy_read, val, \
>                                   __val < 0 || (cond), \
>                 sleep_us, timeout_us, sleep_before_read, phydev, regnum); \
>         if (__val < 0) \
>                 __ret = __val; \
>         __ret; \
> })
> 
> #define phy_read_poll_timeout(phydev, regnum, val, cond, sleep_us, \
>                                 timeout_us, sleep_before_read) \
> ({ \
>         int __ret = phy_read_poll_timeout_quiet(phydev, regnum, val, cond, \
> 						sleep_us, timeout_us, \
> 						sleep_before_read); \
>         if (__ret) \
>                 phydev_err(phydev, "%s failed: %d\n", __func__, __ret); \
>         __ret; \
> })
> 
> and aqr_wait_reset_complete() needs to use phy_read_poll_timeout_quiet().

I agree that aqr_wait_reset_complete() shouldn't have built-in prints in it,
as long as failures are also expected. Maybe an alternative option would
be for aqr_wait_reset_complete() to manually roll a call to read_poll_timeout(),
considering how it would be nice for _actual_ errors (not -ETIMEDOUT)
from phy_read_mmd() to still be logged.

But it seems strange that the driver has to time out on a 2 second poll,
and then it's still not sure why VEND1_GLOBAL_FW_ID still reads 0?
Is it because there's no firmware, or because there is, but it hasn't
waited for long enough?

I haven't followed the development of AQR firmware loading. Isn't there
a faster and more reliable way of determining whether there is firmware
in the first place? It could give the driver a 2 second boot-time speedup,
plus more confidence.
diff mbox series

Patch

diff --git a/drivers/net/phy/aquantia/aquantia_firmware.c b/drivers/net/phy/aquantia/aquantia_firmware.c
index 0c9640ef153b..524627a36c6f 100644
--- a/drivers/net/phy/aquantia/aquantia_firmware.c
+++ b/drivers/net/phy/aquantia/aquantia_firmware.c
@@ -353,6 +353,10 @@  int aqr_firmware_load(struct phy_device *phydev)
 {
 	int ret;
 
+	ret = aqr_wait_reset_complete(phydev);
+	if (ret)
+		return ret;
+
 	/* Check if the firmware is not already loaded by pooling
 	 * the current version returned by the PHY. If 0 is returned,
 	 * no firmware is loaded.