diff mbox series

[net,v1] net: dsa: mv88e6xxx: Make unsupported C45 reads return 0xffff

Message ID 20240120192125.1340857-1-andrew@lunn.ch (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [net,v1] net: dsa: mv88e6xxx: Make unsupported C45 reads return 0xffff | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success SINGLE THREAD; Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1077 this patch: 1077
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 0 of 0 maintainers
netdev/build_clang success Errors and warnings before: 1095 this patch: 1095
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1095 this patch: 1095
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 8 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-01-20--21-00 (tests: 403)

Commit Message

Andrew Lunn Jan. 20, 2024, 7:21 p.m. UTC
When there is no device on the bus for a given address, the pull up
resistor on the data line results in the read returning 0xffff. The
phylib core code understands this when scanning for devices on the
bus, and a number of MDIO bus masters make use of this as a way to
indicate they cannot perform the read.

Make us of this as a minimal fix for stable where the mv88e6xxx
returns EOPNOTSUPP when the hardware does not support C45, but phylib
interprets this as a fatal error, which it should not be.

Cc: stable@vger.kernel.org
Reported-by: Tim Menninger <tmenninger@purestorage.com>
Fixes: 1a136ca2e089 ("net: mdio: scan bus based on bus capabilities for C22 and C45")
Fixes: da099a7fb13d ("net: phy: Remove probe_capabilities")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
---
 drivers/net/dsa/mv88e6xxx/chip.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Vladimir Oltean Jan. 22, 2024, 12:24 p.m. UTC | #1
Hi Andrew,

On Sat, Jan 20, 2024 at 08:21:25PM +0100, Andrew Lunn wrote:
> When there is no device on the bus for a given address, the pull up
> resistor on the data line results in the read returning 0xffff. The
> phylib core code understands this when scanning for devices on the
> bus, and a number of MDIO bus masters make use of this as a way to
> indicate they cannot perform the read.
> 
> Make us of this as a minimal fix for stable where the mv88e6xxx

s/us/use/

Also, what is the "proper" fix if this is the minimal one for stable?

> returns EOPNOTSUPP when the hardware does not support C45, but phylib
> interprets this as a fatal error, which it should not be.

I think the commit message is a bit backwards, it starts with an
explanation of the solution without ever clarifying exactly what is
the problem.

At least it could have referenced the old thread which explains that:
https://lore.kernel.org/netdev/CAO-L_44YVi0HDk4gC9QijMZrYNGoKtfH7qsXOwtDwM4VrFRDHw@mail.gmail.com/

> 
> Cc: stable@vger.kernel.org
> Reported-by: Tim Menninger <tmenninger@purestorage.com>
> Fixes: 1a136ca2e089 ("net: mdio: scan bus based on bus capabilities for C22 and C45")
> Fixes: da099a7fb13d ("net: phy: Remove probe_capabilities")
> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
> ---
>  drivers/net/dsa/mv88e6xxx/chip.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
> index 383b3c4d6f59..614cabb5c1b0 100644
> --- a/drivers/net/dsa/mv88e6xxx/chip.c
> +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> @@ -3659,7 +3659,7 @@ static int mv88e6xxx_mdio_read_c45(struct mii_bus *bus, int phy, int devad,
>  	int err;
>  
>  	if (!chip->info->ops->phy_read_c45)
> -		return -EOPNOTSUPP;
> +		return 0xffff;
>  
>  	mv88e6xxx_reg_lock(chip);
>  	err = chip->info->ops->phy_read_c45(chip, bus, phy, devad, reg, &val);
> -- 
> 2.43.0
>

Is this an RFC pending testing from Tim? Or have you reproduced the
problem and confirmed that this fixes it? It's not clear how the old
thread ended.
Vladimir Oltean Jan. 22, 2024, 12:29 p.m. UTC | #2
On Mon, Jan 22, 2024 at 02:24:57PM +0200, Vladimir Oltean wrote:
> > Fixes: 1a136ca2e089 ("net: mdio: scan bus based on bus capabilities for C22 and C45")
> > Fixes: da099a7fb13d ("net: phy: Remove probe_capabilities")

Also: commit da099a7fb13d ("net: phy: Remove probe_capabilities") is not
a functional change, so I don't see why it should be blamed? I suppose
'git bisect' would find 1a136ca2e089 ("net: mdio: scan bus based on bus
capabilities for C22 and C45")?
Vladimir Oltean Jan. 22, 2024, 12:52 p.m. UTC | #3
On Mon, Jan 22, 2024 at 02:29:21PM +0200, Vladimir Oltean wrote:
> On Mon, Jan 22, 2024 at 02:24:57PM +0200, Vladimir Oltean wrote:
> > > Fixes: 1a136ca2e089 ("net: mdio: scan bus based on bus capabilities for C22 and C45")
> > > Fixes: da099a7fb13d ("net: phy: Remove probe_capabilities")
> 
> Also: commit da099a7fb13d ("net: phy: Remove probe_capabilities") is not
> a functional change, so I don't see why it should be blamed? I suppose
> 'git bisect' would find 1a136ca2e089 ("net: mdio: scan bus based on bus
> capabilities for C22 and C45")?

One reason why this patch is not better than Tim's is because it does
not tell phylib straight away that there are no C45 capabilities on the
bus. It lets phylib scan the bus for all addresses, which yes, is not as
slow because no actual MDIO access is performed, but is also not as fast
as simply omitting the c45 ops altogether. I think it would be good to
state what would there be to lose if we just went for Tim's approach.
Andrew Lunn Jan. 22, 2024, 1:39 p.m. UTC | #4
On Mon, Jan 22, 2024 at 02:24:57PM +0200, Vladimir Oltean wrote:
> Hi Andrew,
> 
> On Sat, Jan 20, 2024 at 08:21:25PM +0100, Andrew Lunn wrote:
> > When there is no device on the bus for a given address, the pull up
> > resistor on the data line results in the read returning 0xffff. The
> > phylib core code understands this when scanning for devices on the
> > bus, and a number of MDIO bus masters make use of this as a way to
> > indicate they cannot perform the read.
> > 
> > Make us of this as a minimal fix for stable where the mv88e6xxx
> 
> s/us/use/
> 
> Also, what is the "proper" fix if this is the minimal one for stable?

Hi Vladimir

I have a patchset for net-next, once it opens. I looked at how C22 and
C45 differ in handling error codes. C22 allows the MDIO bus driver to
return -ENODEV to indicate its impossible for a device to be at a
given address. The scan code then skips that address and continues to
the next address. Current C45 code would turn that -ENODEV into an
-EIO and consider it fatal. So i change the C45 code to allow for
-ENODEV in the same way, and change the mv88e6xxx driver to return
-ENODEV if there are is no C45 read op.

Since making the handling of the error codes uniform is more than a
simple fix, i decided on a minimal fix for net.

Thanks for the comments on the commit message, i will address them
soon.

	Andrew
Tim Menninger Jan. 22, 2024, 8:44 p.m. UTC | #5
On Mon, Jan 22, 2024 at 5:39 AM Andrew Lunn <andrew@lunn.ch> wrote:
>
> On Mon, Jan 22, 2024 at 02:24:57PM +0200, Vladimir Oltean wrote:
> > Hi Andrew,
> >
> > On Sat, Jan 20, 2024 at 08:21:25PM +0100, Andrew Lunn wrote:
> > > When there is no device on the bus for a given address, the pull up
> > > resistor on the data line results in the read returning 0xffff. The
> > > phylib core code understands this when scanning for devices on the
> > > bus, and a number of MDIO bus masters make use of this as a way to
> > > indicate they cannot perform the read.
> > >
> > > Make us of this as a minimal fix for stable where the mv88e6xxx
> >
> > s/us/use/
> >
> > Also, what is the "proper" fix if this is the minimal one for stable?
>
> Hi Vladimir
>
> I have a patchset for net-next, once it opens. I looked at how C22 and
> C45 differ in handling error codes. C22 allows the MDIO bus driver to
> return -ENODEV to indicate its impossible for a device to be at a
> given address. The scan code then skips that address and continues to
> the next address. Current C45 code would turn that -ENODEV into an
> -EIO and consider it fatal. So i change the C45 code to allow for
> -ENODEV in the same way, and change the mv88e6xxx driver to return
> -ENODEV if there are is no C45 read op.
>
> Since making the handling of the error codes uniform is more than a
> simple fix, i decided on a minimal fix for net.
>
> Thanks for the comments on the commit message, i will address them
> soon.
>
>         Andrew

I'm not sure I fully agree with returning 0xffff here, and especially not
for just one of the four functions (reads and writes, c22 and c45). If the
end goal is to unify error handling, what if we keep the return values as
they are, i.e. continue to return -EOPNOTSUPP, and then in get_phy_c22_id
and get_phy_c45_ids on error we do something like:

    return (phy_reg == -EIO || phy_reg == -ENODEV || phy_reg == -EOPNOTSUPP)
        ? -ENODEV : -EIO;

So the diff looks something like (just getting a point across, haven't
tried or style checked this)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 3611ea64875e..f21f07f33f06 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -758,12 +758,14 @@ static int get_phy_c45_devs_in_pkg(struct
mii_bus *bus, int addr, int dev_addr,

        phy_reg = mdiobus_c45_read(bus, addr, dev_addr, MDIO_DEVS2);
        if (phy_reg < 0)
-               return -EIO;
+               return (phy_reg == -EIO || phy_reg == -ENODEV ||
phy_reg == -EOPNOTSUPP)
+                       ? -ENODEV : -EIO;
        *devices_in_package = phy_reg << 16;

        phy_reg = mdiobus_c45_read(bus, addr, dev_addr, MDIO_DEVS1);
        if (phy_reg < 0)
-               return -EIO;
+               return (phy_reg == -EIO || phy_reg == -ENODEV ||
phy_reg == -EOPNOTSUPP)
+                       ? -ENODEV : -EIO;
        *devices_in_package |= phy_reg;

        return 0;
@@ -882,7 +884,8 @@ static int get_phy_c22_id(struct mii_bus *bus, int
addr, u32 *phy_id)
        phy_reg = mdiobus_read(bus, addr, MII_PHYSID1);
        if (phy_reg < 0) {
                /* returning -ENODEV doesn't stop bus scanning */
-               return (phy_reg == -EIO || phy_reg == -ENODEV) ? -ENODEV : -EIO;
+               return (phy_reg == -EIO || phy_reg == -ENODEV ||
phy_reg == -EOPNOTSUPP)
+                       ? -ENODEV : -EIO;
        }

        *phy_id = phy_reg << 16;
@@ -891,7 +894,8 @@ static int get_phy_c22_id(struct mii_bus *bus, int
addr, u32 *phy_id)
        phy_reg = mdiobus_read(bus, addr, MII_PHYSID2);
        if (phy_reg < 0) {
                /* returning -ENODEV doesn't stop bus scanning */
-               return (phy_reg == -EIO || phy_reg == -ENODEV) ? -ENODEV : -EIO;
+               return (phy_reg == -EIO || phy_reg == -ENODEV ||
phy_reg == -EOPNOTSUPP)
+                       ? -ENODEV : -EIO;
        }

        *phy_id |= phy_reg;

This might even resemble what you had in mind in your initial feedback...

Tim
Andrew Lunn Jan. 22, 2024, 11:01 p.m. UTC | #6
> I'm not sure I fully agree with returning 0xffff here, and especially not
> for just one of the four functions (reads and writes, c22 and c45). If the
> end goal is to unify error handling, what if we keep the return values as
> they are, i.e. continue to return -EOPNOTSUPP, and then in get_phy_c22_id
> and get_phy_c45_ids on error we do something like:
> 
>     return (phy_reg == -EIO || phy_reg == -ENODEV || phy_reg == -EOPNOTSUPP)
>         ? -ENODEV : -EIO;

As i said to Vladimir, what i posted so far is just a minimal fix for
stable. After that, i have two patches for net-next, which are the
full, clean fix. And the first patch is similar to what you suggest:

+++ b/drivers/net/phy/phy_device.c
@@ -780,7 +780,7 @@ static int get_phy_c45_devs_in_pkg(struct mii_bus *bus, int addr, int dev_addr,
  * and identifiers in @c45_ids.
  *
  * Returns zero on success, %-EIO on bus access error, or %-ENODEV if
- * the "devices in package" is invalid.
+ * the "devices in package" is invalid or no device responds.
  */
 static int get_phy_c45_ids(struct mii_bus *bus, int addr,
                           struct phy_c45_device_ids *c45_ids)
@@ -803,7 +803,10 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr,
                         */
                        ret = phy_c45_probe_present(bus, addr, i);
                        if (ret < 0)
-                               return -EIO;
+                               /* returning -ENODEV doesn't stop bus
+                                * scanning */
+                               return (phy_reg == -EIO ||
+                                       phy_reg == -ENODEV) ? -ENODEV : -EIO;
 
                        if (!ret)
                                continue;

This makes C22 and C45 handling of -ENODEV the same.

I then have another patch which changed mv88e6xxx to return -ENODEV.
I cannot post the net-next patches for merging until the net patch is
accepted and then merged into net-next.

  Andrew
Tim Menninger Jan. 23, 2024, 3:27 p.m. UTC | #7
On Mon, Jan 22, 2024 at 3:01 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> > I'm not sure I fully agree with returning 0xffff here, and especially not
> > for just one of the four functions (reads and writes, c22 and c45). If the
> > end goal is to unify error handling, what if we keep the return values as
> > they are, i.e. continue to return -EOPNOTSUPP, and then in get_phy_c22_id
> > and get_phy_c45_ids on error we do something like:
> >
> >     return (phy_reg == -EIO || phy_reg == -ENODEV || phy_reg == -EOPNOTSUPP)
> >         ? -ENODEV : -EIO;
>
> As i said to Vladimir, what i posted so far is just a minimal fix for
> stable. After that, i have two patches for net-next, which are the
> full, clean fix. And the first patch is similar to what you suggest:
>
> +++ b/drivers/net/phy/phy_device.c
> @@ -780,7 +780,7 @@ static int get_phy_c45_devs_in_pkg(struct mii_bus *bus, int addr, int dev_addr,
>   * and identifiers in @c45_ids.
>   *
>   * Returns zero on success, %-EIO on bus access error, or %-ENODEV if
> - * the "devices in package" is invalid.
> + * the "devices in package" is invalid or no device responds.
>   */
>  static int get_phy_c45_ids(struct mii_bus *bus, int addr,
>                            struct phy_c45_device_ids *c45_ids)
> @@ -803,7 +803,10 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr,
>                          */
>                         ret = phy_c45_probe_present(bus, addr, i);
>                         if (ret < 0)
> -                               return -EIO;
> +                               /* returning -ENODEV doesn't stop bus
> +                                * scanning */
> +                               return (phy_reg == -EIO ||
> +                                       phy_reg == -ENODEV) ? -ENODEV : -EIO;
>
>                         if (!ret)
>                                 continue;
>
> This makes C22 and C45 handling of -ENODEV the same.
>
> I then have another patch which changed mv88e6xxx to return -ENODEV.
> I cannot post the net-next patches for merging until the net patch is
> accepted and then merged into net-next.
>
>   Andrew

Does that mean if there's a device there but it doesn't support C45 (no
phy_read_c45), it will now return ENODEV?

I suppose that's my only nit but at the end of the day I'm not unhappy with it.

Thank you for taking the time to look at this with me. Is there anything
else you need from me?
Andrew Lunn Jan. 23, 2024, 10:59 p.m. UTC | #8
> Does that mean if there's a device there but it doesn't support C45 (no
> phy_read_c45), it will now return ENODEV?

Yes, mv88e6xxx_mdio_read_c45() will return -ENODEV if
chip->info->ops->phy_read_c45 is NULL. That will cause the scan of
that address to immediately skip to the next address. This is old
behaviour for C22:

commit 02a6efcab675fe32815d824837784c3f42a7d892
Author: Alexandre Belloni <alexandre.belloni@bootlin.com>
Date:   Tue Apr 24 18:09:04 2018 +0200

    net: phy: allow scanning busses with missing phys
    
    Some MDIO busses will error out when trying to read a phy address with no
    phy present at that address. In that case, probing the bus will fail
    because __mdiobus_register() is scanning the bus for all possible phys
    addresses.
    
    In case MII_PHYSID1 returns -EIO or -ENODEV, consider there is no phy at
    this address and set the phy ID to 0xffffffff which is then properly
    handled in get_phy_device().

And there are a few MDIO bus drivers which make use of this, e.g.

static int lan9303_phy_read(struct dsa_switch *ds, int phy, int regnum)
{
        struct lan9303 *chip = ds->priv;
        int phy_base = chip->phy_addr_base;

        if (phy == phy_base)
                return lan9303_virt_phy_reg_read(chip, regnum);
        if (phy > phy_base + 2)
                return -ENODEV;

        return chip->ops->phy_read(chip, phy, regnum);

This Ethernet switch supports only a number of PHY addresses, and
returns -ENODEV for the rest.

So its a legitimate way to say there is nothing here.

You suggestion of allowing ENOPSUPP for C45 would of fixed the
problem, but C22 and C45 would support different error codes, which i
don't like. Its better to be uniform.

	Andrew
Tim Menninger Jan. 24, 2024, 3:59 p.m. UTC | #9
On Tue, Jan 23, 2024 at 2:59 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> > Does that mean if there's a device there but it doesn't support C45 (no
> > phy_read_c45), it will now return ENODEV?
>
> Yes, mv88e6xxx_mdio_read_c45() will return -ENODEV if
> chip->info->ops->phy_read_c45 is NULL. That will cause the scan of
> that address to immediately skip to the next address. This is old
> behaviour for C22:
>
> commit 02a6efcab675fe32815d824837784c3f42a7d892
> Author: Alexandre Belloni <alexandre.belloni@bootlin.com>
> Date:   Tue Apr 24 18:09:04 2018 +0200
>
>     net: phy: allow scanning busses with missing phys
>
>     Some MDIO busses will error out when trying to read a phy address with no
>     phy present at that address. In that case, probing the bus will fail
>     because __mdiobus_register() is scanning the bus for all possible phys
>     addresses.
>
>     In case MII_PHYSID1 returns -EIO or -ENODEV, consider there is no phy at
>     this address and set the phy ID to 0xffffffff which is then properly
>     handled in get_phy_device().
>
> And there are a few MDIO bus drivers which make use of this, e.g.
>
> static int lan9303_phy_read(struct dsa_switch *ds, int phy, int regnum)
> {
>         struct lan9303 *chip = ds->priv;
>         int phy_base = chip->phy_addr_base;
>
>         if (phy == phy_base)
>                 return lan9303_virt_phy_reg_read(chip, regnum);
>         if (phy > phy_base + 2)
>                 return -ENODEV;
>
>         return chip->ops->phy_read(chip, phy, regnum);
>
> This Ethernet switch supports only a number of PHY addresses, and
> returns -ENODEV for the rest.
>
> So its a legitimate way to say there is nothing here.
>
> You suggestion of allowing ENOPSUPP for C45 would of fixed the
> problem, but C22 and C45 would support different error codes, which i
> don't like. Its better to be uniform.
>
>         Andrew

Excellent, color me convinced.
diff mbox series

Patch

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 383b3c4d6f59..614cabb5c1b0 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -3659,7 +3659,7 @@  static int mv88e6xxx_mdio_read_c45(struct mii_bus *bus, int phy, int devad,
 	int err;
 
 	if (!chip->info->ops->phy_read_c45)
-		return -EOPNOTSUPP;
+		return 0xffff;
 
 	mv88e6xxx_reg_lock(chip);
 	err = chip->info->ops->phy_read_c45(chip, bus, phy, devad, reg, &val);