Message ID | 20240120192125.1340857-1-andrew@lunn.ch (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net,v1] net: dsa: mv88e6xxx: Make unsupported C45 reads return 0xffff | expand |
Hi Andrew, On Sat, Jan 20, 2024 at 08:21:25PM +0100, Andrew Lunn wrote: > When there is no device on the bus for a given address, the pull up > resistor on the data line results in the read returning 0xffff. The > phylib core code understands this when scanning for devices on the > bus, and a number of MDIO bus masters make use of this as a way to > indicate they cannot perform the read. > > Make us of this as a minimal fix for stable where the mv88e6xxx s/us/use/ Also, what is the "proper" fix if this is the minimal one for stable? > returns EOPNOTSUPP when the hardware does not support C45, but phylib > interprets this as a fatal error, which it should not be. I think the commit message is a bit backwards, it starts with an explanation of the solution without ever clarifying exactly what is the problem. At least it could have referenced the old thread which explains that: https://lore.kernel.org/netdev/CAO-L_44YVi0HDk4gC9QijMZrYNGoKtfH7qsXOwtDwM4VrFRDHw@mail.gmail.com/ > > Cc: stable@vger.kernel.org > Reported-by: Tim Menninger <tmenninger@purestorage.com> > Fixes: 1a136ca2e089 ("net: mdio: scan bus based on bus capabilities for C22 and C45") > Fixes: da099a7fb13d ("net: phy: Remove probe_capabilities") > Signed-off-by: Andrew Lunn <andrew@lunn.ch> > --- > drivers/net/dsa/mv88e6xxx/chip.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c > index 383b3c4d6f59..614cabb5c1b0 100644 > --- a/drivers/net/dsa/mv88e6xxx/chip.c > +++ b/drivers/net/dsa/mv88e6xxx/chip.c > @@ -3659,7 +3659,7 @@ static int mv88e6xxx_mdio_read_c45(struct mii_bus *bus, int phy, int devad, > int err; > > if (!chip->info->ops->phy_read_c45) > - return -EOPNOTSUPP; > + return 0xffff; > > mv88e6xxx_reg_lock(chip); > err = chip->info->ops->phy_read_c45(chip, bus, phy, devad, reg, &val); > -- > 2.43.0 > Is this an RFC pending testing from Tim? Or have you reproduced the problem and confirmed that this fixes it? It's not clear how the old thread ended.
On Mon, Jan 22, 2024 at 02:24:57PM +0200, Vladimir Oltean wrote: > > Fixes: 1a136ca2e089 ("net: mdio: scan bus based on bus capabilities for C22 and C45") > > Fixes: da099a7fb13d ("net: phy: Remove probe_capabilities") Also: commit da099a7fb13d ("net: phy: Remove probe_capabilities") is not a functional change, so I don't see why it should be blamed? I suppose 'git bisect' would find 1a136ca2e089 ("net: mdio: scan bus based on bus capabilities for C22 and C45")?
On Mon, Jan 22, 2024 at 02:29:21PM +0200, Vladimir Oltean wrote: > On Mon, Jan 22, 2024 at 02:24:57PM +0200, Vladimir Oltean wrote: > > > Fixes: 1a136ca2e089 ("net: mdio: scan bus based on bus capabilities for C22 and C45") > > > Fixes: da099a7fb13d ("net: phy: Remove probe_capabilities") > > Also: commit da099a7fb13d ("net: phy: Remove probe_capabilities") is not > a functional change, so I don't see why it should be blamed? I suppose > 'git bisect' would find 1a136ca2e089 ("net: mdio: scan bus based on bus > capabilities for C22 and C45")? One reason why this patch is not better than Tim's is because it does not tell phylib straight away that there are no C45 capabilities on the bus. It lets phylib scan the bus for all addresses, which yes, is not as slow because no actual MDIO access is performed, but is also not as fast as simply omitting the c45 ops altogether. I think it would be good to state what would there be to lose if we just went for Tim's approach.
On Mon, Jan 22, 2024 at 02:24:57PM +0200, Vladimir Oltean wrote: > Hi Andrew, > > On Sat, Jan 20, 2024 at 08:21:25PM +0100, Andrew Lunn wrote: > > When there is no device on the bus for a given address, the pull up > > resistor on the data line results in the read returning 0xffff. The > > phylib core code understands this when scanning for devices on the > > bus, and a number of MDIO bus masters make use of this as a way to > > indicate they cannot perform the read. > > > > Make us of this as a minimal fix for stable where the mv88e6xxx > > s/us/use/ > > Also, what is the "proper" fix if this is the minimal one for stable? Hi Vladimir I have a patchset for net-next, once it opens. I looked at how C22 and C45 differ in handling error codes. C22 allows the MDIO bus driver to return -ENODEV to indicate its impossible for a device to be at a given address. The scan code then skips that address and continues to the next address. Current C45 code would turn that -ENODEV into an -EIO and consider it fatal. So i change the C45 code to allow for -ENODEV in the same way, and change the mv88e6xxx driver to return -ENODEV if there are is no C45 read op. Since making the handling of the error codes uniform is more than a simple fix, i decided on a minimal fix for net. Thanks for the comments on the commit message, i will address them soon. Andrew
On Mon, Jan 22, 2024 at 5:39 AM Andrew Lunn <andrew@lunn.ch> wrote: > > On Mon, Jan 22, 2024 at 02:24:57PM +0200, Vladimir Oltean wrote: > > Hi Andrew, > > > > On Sat, Jan 20, 2024 at 08:21:25PM +0100, Andrew Lunn wrote: > > > When there is no device on the bus for a given address, the pull up > > > resistor on the data line results in the read returning 0xffff. The > > > phylib core code understands this when scanning for devices on the > > > bus, and a number of MDIO bus masters make use of this as a way to > > > indicate they cannot perform the read. > > > > > > Make us of this as a minimal fix for stable where the mv88e6xxx > > > > s/us/use/ > > > > Also, what is the "proper" fix if this is the minimal one for stable? > > Hi Vladimir > > I have a patchset for net-next, once it opens. I looked at how C22 and > C45 differ in handling error codes. C22 allows the MDIO bus driver to > return -ENODEV to indicate its impossible for a device to be at a > given address. The scan code then skips that address and continues to > the next address. Current C45 code would turn that -ENODEV into an > -EIO and consider it fatal. So i change the C45 code to allow for > -ENODEV in the same way, and change the mv88e6xxx driver to return > -ENODEV if there are is no C45 read op. > > Since making the handling of the error codes uniform is more than a > simple fix, i decided on a minimal fix for net. > > Thanks for the comments on the commit message, i will address them > soon. > > Andrew I'm not sure I fully agree with returning 0xffff here, and especially not for just one of the four functions (reads and writes, c22 and c45). If the end goal is to unify error handling, what if we keep the return values as they are, i.e. continue to return -EOPNOTSUPP, and then in get_phy_c22_id and get_phy_c45_ids on error we do something like: return (phy_reg == -EIO || phy_reg == -ENODEV || phy_reg == -EOPNOTSUPP) ? -ENODEV : -EIO; So the diff looks something like (just getting a point across, haven't tried or style checked this) diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index 3611ea64875e..f21f07f33f06 100644 --- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c @@ -758,12 +758,14 @@ static int get_phy_c45_devs_in_pkg(struct mii_bus *bus, int addr, int dev_addr, phy_reg = mdiobus_c45_read(bus, addr, dev_addr, MDIO_DEVS2); if (phy_reg < 0) - return -EIO; + return (phy_reg == -EIO || phy_reg == -ENODEV || phy_reg == -EOPNOTSUPP) + ? -ENODEV : -EIO; *devices_in_package = phy_reg << 16; phy_reg = mdiobus_c45_read(bus, addr, dev_addr, MDIO_DEVS1); if (phy_reg < 0) - return -EIO; + return (phy_reg == -EIO || phy_reg == -ENODEV || phy_reg == -EOPNOTSUPP) + ? -ENODEV : -EIO; *devices_in_package |= phy_reg; return 0; @@ -882,7 +884,8 @@ static int get_phy_c22_id(struct mii_bus *bus, int addr, u32 *phy_id) phy_reg = mdiobus_read(bus, addr, MII_PHYSID1); if (phy_reg < 0) { /* returning -ENODEV doesn't stop bus scanning */ - return (phy_reg == -EIO || phy_reg == -ENODEV) ? -ENODEV : -EIO; + return (phy_reg == -EIO || phy_reg == -ENODEV || phy_reg == -EOPNOTSUPP) + ? -ENODEV : -EIO; } *phy_id = phy_reg << 16; @@ -891,7 +894,8 @@ static int get_phy_c22_id(struct mii_bus *bus, int addr, u32 *phy_id) phy_reg = mdiobus_read(bus, addr, MII_PHYSID2); if (phy_reg < 0) { /* returning -ENODEV doesn't stop bus scanning */ - return (phy_reg == -EIO || phy_reg == -ENODEV) ? -ENODEV : -EIO; + return (phy_reg == -EIO || phy_reg == -ENODEV || phy_reg == -EOPNOTSUPP) + ? -ENODEV : -EIO; } *phy_id |= phy_reg; This might even resemble what you had in mind in your initial feedback... Tim
> I'm not sure I fully agree with returning 0xffff here, and especially not > for just one of the four functions (reads and writes, c22 and c45). If the > end goal is to unify error handling, what if we keep the return values as > they are, i.e. continue to return -EOPNOTSUPP, and then in get_phy_c22_id > and get_phy_c45_ids on error we do something like: > > return (phy_reg == -EIO || phy_reg == -ENODEV || phy_reg == -EOPNOTSUPP) > ? -ENODEV : -EIO; As i said to Vladimir, what i posted so far is just a minimal fix for stable. After that, i have two patches for net-next, which are the full, clean fix. And the first patch is similar to what you suggest: +++ b/drivers/net/phy/phy_device.c @@ -780,7 +780,7 @@ static int get_phy_c45_devs_in_pkg(struct mii_bus *bus, int addr, int dev_addr, * and identifiers in @c45_ids. * * Returns zero on success, %-EIO on bus access error, or %-ENODEV if - * the "devices in package" is invalid. + * the "devices in package" is invalid or no device responds. */ static int get_phy_c45_ids(struct mii_bus *bus, int addr, struct phy_c45_device_ids *c45_ids) @@ -803,7 +803,10 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, */ ret = phy_c45_probe_present(bus, addr, i); if (ret < 0) - return -EIO; + /* returning -ENODEV doesn't stop bus + * scanning */ + return (phy_reg == -EIO || + phy_reg == -ENODEV) ? -ENODEV : -EIO; if (!ret) continue; This makes C22 and C45 handling of -ENODEV the same. I then have another patch which changed mv88e6xxx to return -ENODEV. I cannot post the net-next patches for merging until the net patch is accepted and then merged into net-next. Andrew
On Mon, Jan 22, 2024 at 3:01 PM Andrew Lunn <andrew@lunn.ch> wrote: > > > I'm not sure I fully agree with returning 0xffff here, and especially not > > for just one of the four functions (reads and writes, c22 and c45). If the > > end goal is to unify error handling, what if we keep the return values as > > they are, i.e. continue to return -EOPNOTSUPP, and then in get_phy_c22_id > > and get_phy_c45_ids on error we do something like: > > > > return (phy_reg == -EIO || phy_reg == -ENODEV || phy_reg == -EOPNOTSUPP) > > ? -ENODEV : -EIO; > > As i said to Vladimir, what i posted so far is just a minimal fix for > stable. After that, i have two patches for net-next, which are the > full, clean fix. And the first patch is similar to what you suggest: > > +++ b/drivers/net/phy/phy_device.c > @@ -780,7 +780,7 @@ static int get_phy_c45_devs_in_pkg(struct mii_bus *bus, int addr, int dev_addr, > * and identifiers in @c45_ids. > * > * Returns zero on success, %-EIO on bus access error, or %-ENODEV if > - * the "devices in package" is invalid. > + * the "devices in package" is invalid or no device responds. > */ > static int get_phy_c45_ids(struct mii_bus *bus, int addr, > struct phy_c45_device_ids *c45_ids) > @@ -803,7 +803,10 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, > */ > ret = phy_c45_probe_present(bus, addr, i); > if (ret < 0) > - return -EIO; > + /* returning -ENODEV doesn't stop bus > + * scanning */ > + return (phy_reg == -EIO || > + phy_reg == -ENODEV) ? -ENODEV : -EIO; > > if (!ret) > continue; > > This makes C22 and C45 handling of -ENODEV the same. > > I then have another patch which changed mv88e6xxx to return -ENODEV. > I cannot post the net-next patches for merging until the net patch is > accepted and then merged into net-next. > > Andrew Does that mean if there's a device there but it doesn't support C45 (no phy_read_c45), it will now return ENODEV? I suppose that's my only nit but at the end of the day I'm not unhappy with it. Thank you for taking the time to look at this with me. Is there anything else you need from me?
> Does that mean if there's a device there but it doesn't support C45 (no > phy_read_c45), it will now return ENODEV? Yes, mv88e6xxx_mdio_read_c45() will return -ENODEV if chip->info->ops->phy_read_c45 is NULL. That will cause the scan of that address to immediately skip to the next address. This is old behaviour for C22: commit 02a6efcab675fe32815d824837784c3f42a7d892 Author: Alexandre Belloni <alexandre.belloni@bootlin.com> Date: Tue Apr 24 18:09:04 2018 +0200 net: phy: allow scanning busses with missing phys Some MDIO busses will error out when trying to read a phy address with no phy present at that address. In that case, probing the bus will fail because __mdiobus_register() is scanning the bus for all possible phys addresses. In case MII_PHYSID1 returns -EIO or -ENODEV, consider there is no phy at this address and set the phy ID to 0xffffffff which is then properly handled in get_phy_device(). And there are a few MDIO bus drivers which make use of this, e.g. static int lan9303_phy_read(struct dsa_switch *ds, int phy, int regnum) { struct lan9303 *chip = ds->priv; int phy_base = chip->phy_addr_base; if (phy == phy_base) return lan9303_virt_phy_reg_read(chip, regnum); if (phy > phy_base + 2) return -ENODEV; return chip->ops->phy_read(chip, phy, regnum); This Ethernet switch supports only a number of PHY addresses, and returns -ENODEV for the rest. So its a legitimate way to say there is nothing here. You suggestion of allowing ENOPSUPP for C45 would of fixed the problem, but C22 and C45 would support different error codes, which i don't like. Its better to be uniform. Andrew
On Tue, Jan 23, 2024 at 2:59 PM Andrew Lunn <andrew@lunn.ch> wrote: > > > Does that mean if there's a device there but it doesn't support C45 (no > > phy_read_c45), it will now return ENODEV? > > Yes, mv88e6xxx_mdio_read_c45() will return -ENODEV if > chip->info->ops->phy_read_c45 is NULL. That will cause the scan of > that address to immediately skip to the next address. This is old > behaviour for C22: > > commit 02a6efcab675fe32815d824837784c3f42a7d892 > Author: Alexandre Belloni <alexandre.belloni@bootlin.com> > Date: Tue Apr 24 18:09:04 2018 +0200 > > net: phy: allow scanning busses with missing phys > > Some MDIO busses will error out when trying to read a phy address with no > phy present at that address. In that case, probing the bus will fail > because __mdiobus_register() is scanning the bus for all possible phys > addresses. > > In case MII_PHYSID1 returns -EIO or -ENODEV, consider there is no phy at > this address and set the phy ID to 0xffffffff which is then properly > handled in get_phy_device(). > > And there are a few MDIO bus drivers which make use of this, e.g. > > static int lan9303_phy_read(struct dsa_switch *ds, int phy, int regnum) > { > struct lan9303 *chip = ds->priv; > int phy_base = chip->phy_addr_base; > > if (phy == phy_base) > return lan9303_virt_phy_reg_read(chip, regnum); > if (phy > phy_base + 2) > return -ENODEV; > > return chip->ops->phy_read(chip, phy, regnum); > > This Ethernet switch supports only a number of PHY addresses, and > returns -ENODEV for the rest. > > So its a legitimate way to say there is nothing here. > > You suggestion of allowing ENOPSUPP for C45 would of fixed the > problem, but C22 and C45 would support different error codes, which i > don't like. Its better to be uniform. > > Andrew Excellent, color me convinced.
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c index 383b3c4d6f59..614cabb5c1b0 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.c +++ b/drivers/net/dsa/mv88e6xxx/chip.c @@ -3659,7 +3659,7 @@ static int mv88e6xxx_mdio_read_c45(struct mii_bus *bus, int phy, int devad, int err; if (!chip->info->ops->phy_read_c45) - return -EOPNOTSUPP; + return 0xffff; mv88e6xxx_reg_lock(chip); err = chip->info->ops->phy_read_c45(chip, bus, phy, devad, reg, &val);
When there is no device on the bus for a given address, the pull up resistor on the data line results in the read returning 0xffff. The phylib core code understands this when scanning for devices on the bus, and a number of MDIO bus masters make use of this as a way to indicate they cannot perform the read. Make us of this as a minimal fix for stable where the mv88e6xxx returns EOPNOTSUPP when the hardware does not support C45, but phylib interprets this as a fatal error, which it should not be. Cc: stable@vger.kernel.org Reported-by: Tim Menninger <tmenninger@purestorage.com> Fixes: 1a136ca2e089 ("net: mdio: scan bus based on bus capabilities for C22 and C45") Fixes: da099a7fb13d ("net: phy: Remove probe_capabilities") Signed-off-by: Andrew Lunn <andrew@lunn.ch> --- drivers/net/dsa/mv88e6xxx/chip.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)