Message ID | 20220816163701.1578850-1-sean.anderson@seco.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] net: phy: Warn if phy is attached when removing | expand |
On Tue, 16 Aug 2022 12:37:01 -0400 Sean Anderson wrote: > netdevs using phylib can be oopsed from userspace in the following > manner: > > $ ip link set $iface up > $ echo $(basename $(readlink /sys/class/net/$iface/phydev)) > \ > /sys/class/net/$iface/phydev/driver/unbind > $ ip link set $iface down > > However, the traceback provided is a bit too late, since it does not > capture the root of the problem (unbinding the driver). It's also > possible that the memory has been reallocated if sufficient time passes > between when the phy is detached and when the netdev touches the phy > (which could result in silent memory corruption). Add a warning at the > source of the problem. A future patch could make this more robust by > calling dev_close. Hm, so we're adding the warning to get more detailed reports "from the field"? Guess we've all done that, so fair. Acks? It can still make -rc2 if that matters...
On 8/18/22 1:24 PM, Jakub Kicinski wrote: > On Tue, 16 Aug 2022 12:37:01 -0400 Sean Anderson wrote: >> netdevs using phylib can be oopsed from userspace in the following >> manner: >> >> $ ip link set $iface up >> $ echo $(basename $(readlink /sys/class/net/$iface/phydev)) > \ >> /sys/class/net/$iface/phydev/driver/unbind >> $ ip link set $iface down >> >> However, the traceback provided is a bit too late, since it does not >> capture the root of the problem (unbinding the driver). It's also >> possible that the memory has been reallocated if sufficient time passes >> between when the phy is detached and when the netdev touches the phy >> (which could result in silent memory corruption). Add a warning at the >> source of the problem. A future patch could make this more robust by >> calling dev_close. > > Hm, so we're adding the warning to get more detailed reports "from the > field"? Guess we've all done that, so fair. My suspicion is that this case never occurs, since users don't expect to be able to remove the phy while the interface is running (and so don't attempt it). If we do end up getting reports of this bug, then we will need to create a more robust fix. My intention is to take the same strategy for PCS devices as whatever we do here, since the issue is analogous. --Sean
On Tue, 16 Aug 2022 12:37:01 -0400 Sean Anderson wrote: > netdevs using phylib can be oopsed from userspace in the following > manner: > > $ ip link set $iface up > $ echo $(basename $(readlink /sys/class/net/$iface/phydev)) > \ > /sys/class/net/$iface/phydev/driver/unbind > $ ip link set $iface down > > However, the traceback provided is a bit too late, since it does not > capture the root of the problem (unbinding the driver). It's also > possible that the memory has been reallocated if sufficient time passes > between when the phy is detached and when the netdev touches the phy > (which could result in silent memory corruption). Add a warning at the > source of the problem. A future patch could make this more robust by > calling dev_close. FWIW, I think DaveM marked this patch as changes requested. I don't really know enough to have an opinion. PHY maintainers, anyone willing to cast a vote?
On Fri, Aug 19, 2022 at 04:45:19PM -0700, Jakub Kicinski wrote: > On Tue, 16 Aug 2022 12:37:01 -0400 Sean Anderson wrote: > > netdevs using phylib can be oopsed from userspace in the following > > manner: > > > > $ ip link set $iface up > > $ echo $(basename $(readlink /sys/class/net/$iface/phydev)) > \ > > /sys/class/net/$iface/phydev/driver/unbind > > $ ip link set $iface down > > > > However, the traceback provided is a bit too late, since it does not > > capture the root of the problem (unbinding the driver). It's also > > possible that the memory has been reallocated if sufficient time passes > > between when the phy is detached and when the netdev touches the phy > > (which could result in silent memory corruption). Add a warning at the > > source of the problem. A future patch could make this more robust by > > calling dev_close. > > FWIW, I think DaveM marked this patch as changes requested. > > I don't really know enough to have an opinion. > > PHY maintainers, anyone willing to cast a vote? I don't think Linus is a fan of using WARN_ON() as an assert() replacement, which this feels very much like that kind of thing. I don't see much point in using WARN_ON() here as we'll soon get a kernel oops anyway, and the backtrace WARN_ON() will produce isn't useful - it'll be just noise. So, I'd tone it down to something like: if (phydev->attached_dev) phydev_err(phydev, "Removing in-use PHY, expect an oops"); Maybe even introduce phydev_crit() just for this message. Since we have bazillions of network drivers that hold a reference to the phydev, I don't think we can do much better than this for phylib. It would be a massive effort to go through all the network drivers and try to work out how to fix them. Addressing the PCS part of the patch posting and unrelated to what we do for phylib... However, I don't see "we'll do this for phylib, so we can do it for PCS as well" as much of a sane argument - we don't have bazillions of network drivers to fix to make this work for PCS. We currently have no removable PCS (they don't appear with a driver so aren't even bound to anything.) So we should add the appropriate handling when we start to do this. Phylink has the capability to take the link down when something goes away, and if the PCS goes away, that's exactly what should happen, rather than oopsing the kernel. As MAC drivers hold a reference to the PCS instances, as they need to select the appropriate one, how do MAC drivers get to know that the PCS has gone away to drop their reference - and tell phylink that the PCS has gone. That's the problem that needs solving to allow PCS to be unbound if we're going to make them first class citizens of the driver model. I am no fan of "but XYZ doesn't care about it, so why should we care" arguments. If I remember correctly, phylib pre-dates the device model, and had the device model retrofitted, so it was a best-efforts attempt - and suffered back then with the same problem of needing lots of drivers to be changed in non-trivial ways. We have the chance here to come up with something better - and I think that chance should be used to full effect.
On 8/19/22 8:20 PM, Russell King (Oracle) wrote: > On Fri, Aug 19, 2022 at 04:45:19PM -0700, Jakub Kicinski wrote: >> On Tue, 16 Aug 2022 12:37:01 -0400 Sean Anderson wrote: >> > netdevs using phylib can be oopsed from userspace in the following >> > manner: >> > >> > $ ip link set $iface up >> > $ echo $(basename $(readlink /sys/class/net/$iface/phydev)) > \ >> > /sys/class/net/$iface/phydev/driver/unbind >> > $ ip link set $iface down >> > >> > However, the traceback provided is a bit too late, since it does not >> > capture the root of the problem (unbinding the driver). It's also >> > possible that the memory has been reallocated if sufficient time passes >> > between when the phy is detached and when the netdev touches the phy >> > (which could result in silent memory corruption). Add a warning at the >> > source of the problem. A future patch could make this more robust by >> > calling dev_close. >> >> FWIW, I think DaveM marked this patch as changes requested. >> >> I don't really know enough to have an opinion. >> >> PHY maintainers, anyone willing to cast a vote? > > I don't think Linus is a fan of using WARN_ON() as an assert() > replacement, which this feels very much like that kind of thing. > I don't see much point in using WARN_ON() here as we'll soon get > a kernel oops anyway, and the backtrace WARN_ON() will produce isn't > useful - it'll be just noise. > > So, I'd tone it down to something like: > > if (phydev->attached_dev) > phydev_err(phydev, "Removing in-use PHY, expect an oops"); That's fine by me > Maybe even introduce phydev_crit() just for this message. > > Since we have bazillions of network drivers that hold a reference to > the phydev, I don't think we can do much better than this for phylib. > It would be a massive effort to go through all the network drivers > and try to work out how to fix them. In the last thread I posted this snippet: diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index a74b320f5b27..05894e1c3e59 100644 --- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c @@ -27,6 +27,7 @@ #include <linux/phy.h> #include <linux/phy_led_triggers.h> #include <linux/property.h> +#include <linux/rtnetlink.h> #include <linux/sfp.h> #include <linux/skbuff.h> #include <linux/slab.h> @@ -3111,6 +3112,13 @@ static int phy_remove(struct device *dev) { struct phy_device *phydev = to_phy_device(dev); + // I'm pretty sure this races with multiple unbinds... + rtnl_lock(); + device_unlock(dev); + dev_close(phydev->attached_dev); + device_lock(dev); + rtnl_unlock(); + WARN_ON(phydev->attached_dev); + cancel_delayed_work_sync(&phydev->state_queue); mutex_lock(&phydev->lock); --- Would this be acceptable? Can the locking be fixed? > Addressing the PCS part of the patch posting and unrelated to what we > do for phylib... > > However, I don't see "we'll do this for phylib, so we can do it for > PCS as well" as much of a sane argument - we don't have bazillions > of network drivers to fix to make this work for PCS. We currently > have no removable PCS (they don't appear with a driver so aren't > even bound to anything.) So we should add the appropriate handling > when we start to do this. > > Phylink has the capability to take the link down when something goes > away, and if the PCS goes away, that's exactly what should happen, > rather than oopsing the kernel. Yeah, but we can't just call phylink_stop; we have to call the function which will call phylink_stop, which is different for MAC drivers and for DSA drivers. I think we'd need something like struct phylink_pcs *pcs_get(struct device *dev, const char *id, void (*detach)(struct phylink_pcs *, void *), void *priv) which would also require that pcs_get is called before phylink_start, instead of in probe (which is what every existing user does). > As MAC drivers hold a reference to the PCS instances, as they need to > select the appropriate one, how do MAC drivers get to know that the > PCS has gone away to drop their reference - and tell phylink that the > PCS has gone. That's the problem that needs solving to allow PCS to > be unbound if we're going to make them first class citizens of the > driver model. > > I am no fan of "but XYZ doesn't care about it, so why should we care" > arguments. If I remember correctly, phylib pre-dates the device model, > and had the device model retrofitted, so it was a best-efforts > attempt - and suffered back then with the same problem of needing > lots of drivers to be changed in non-trivial ways. > > We have the chance here to come up with something better - and I think > that chance should be used to full effect. --- This whole thing has me asking the question: why do we allow unbinding in-use devices in the first place? --Sean
On Mon, Aug 22, 2022 at 12:00:28PM -0400, Sean Anderson wrote: > In the last thread I posted this snippet: > > diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c > index a74b320f5b27..05894e1c3e59 100644 > --- a/drivers/net/phy/phy_device.c > +++ b/drivers/net/phy/phy_device.c > @@ -27,6 +27,7 @@ > #include <linux/phy.h> > #include <linux/phy_led_triggers.h> > #include <linux/property.h> > +#include <linux/rtnetlink.h> > #include <linux/sfp.h> > #include <linux/skbuff.h> > #include <linux/slab.h> > @@ -3111,6 +3112,13 @@ static int phy_remove(struct device *dev) > { > struct phy_device *phydev = to_phy_device(dev); > > + // I'm pretty sure this races with multiple unbinds... > + rtnl_lock(); > + device_unlock(dev); > + dev_close(phydev->attached_dev); > + device_lock(dev); > + rtnl_unlock(); > + WARN_ON(phydev->attached_dev); > + > cancel_delayed_work_sync(&phydev->state_queue); > > mutex_lock(&phydev->lock); > --- > > Would this be acceptable? Can the locking be fixed? I can't comment on that. > > Addressing the PCS part of the patch posting and unrelated to what we > > do for phylib... > > > > However, I don't see "we'll do this for phylib, so we can do it for > > PCS as well" as much of a sane argument - we don't have bazillions > > of network drivers to fix to make this work for PCS. We currently > > have no removable PCS (they don't appear with a driver so aren't > > even bound to anything.) So we should add the appropriate handling > > when we start to do this. > > > > Phylink has the capability to take the link down when something goes > > away, and if the PCS goes away, that's exactly what should happen, > > rather than oopsing the kernel. > > Yeah, but we can't just call phylink_stop; we have to call the function > which will call phylink_stop, which is different for MAC drivers and > for DSA drivers. I think that's way too much and breaks the phylink design. phylink_stop is designed to be called only from ndo_close() - and to be paired with phylink_start(). When I talk about "taking the link down" what I mean by that is telling the network device that the *link* *is* *down* and no more. In other words, having phylink_resolve() know that there should be a PCS but it's gone, and therefore the link should not come up in its current configuration. > I think we'd need something like > > struct phylink_pcs *pcs_get(struct device *dev, const char *id, > void (*detach)(struct phylink_pcs *, void *), > void *priv) > > which would also require that pcs_get is called before phylink_start, > instead of in probe (which is what every existing user does). That would at least allow the MAC driver to take action when the PCS gets removed. > This whole thing has me asking the question: why do we allow unbinding > in-use devices in the first place? The driver model was designed that way from the start, because in most cases when something is unplugged from the system, the "remove" driver callback is just a notification that the device has already gone. Failing it makes no sense, because software can't magic the device back. Take an example of a USB device. The user unplugs it, it's gone from the system, but the system briefly still believes the device to be present for a while. It eventually notices that the device has gone, and the USB layer unregisters the struct device - which has the effect of unbinding the device from the driver and eventually cleaning up the struct device. This can and does happen with Ethernet PHYs ever since we started supporting SFPs with Ethernet PHYs. The same thing is true there - you can pull the module at any moment, it will be gone, and the system will catch up with its physical disconnection some point later. It's no good trying to make ->remove say "no, the device is still in use, I won't let you remove it" because there's nothing software can do to prevent the going of the device - the device has already physically gone. I don't think that's the case with PCS - do we really have any PCS that can be disconnected from the system while it's running? Maybe ones in a FPGA and the FPGA can be reprogrammed at runtime (yes, people have really done this in the past.) If we don't want to support "hotpluggable" PCS, then there's a simple solution to this - the driver model has the facility to suppress the bind/unbind driver files in sysfs, which means userspace can't unbind the device. If there's also no way to make the struct device go away in the kernel, then effectively the driver can then only be unbound if the driver is built as a module. At that point, we always have the option of insisting that PCS drivers are never modules - and then we have the situation where a PCS will never disappear from the system once the driver has picked up on it. Of course, those PCS that are tightly bound to their MACs can still come and go along with their MACs, but it's up to the MAC driver to make that happen safely (unregistering the netdev first, which will have the effect of calling ndo_close(), taking the network device down and preventing further access to the netdev - standard netdev MAC driver stuff.)
On 8/22/22 12:32 PM, Russell King (Oracle) wrote: > On Mon, Aug 22, 2022 at 12:00:28PM -0400, Sean Anderson wrote: >> In the last thread I posted this snippet: >> >> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c >> index a74b320f5b27..05894e1c3e59 100644 >> --- a/drivers/net/phy/phy_device.c >> +++ b/drivers/net/phy/phy_device.c >> @@ -27,6 +27,7 @@ >> #include <linux/phy.h> >> #include <linux/phy_led_triggers.h> >> #include <linux/property.h> >> +#include <linux/rtnetlink.h> >> #include <linux/sfp.h> >> #include <linux/skbuff.h> >> #include <linux/slab.h> >> @@ -3111,6 +3112,13 @@ static int phy_remove(struct device *dev) >> { >> struct phy_device *phydev = to_phy_device(dev); >> >> + // I'm pretty sure this races with multiple unbinds... >> + rtnl_lock(); >> + device_unlock(dev); >> + dev_close(phydev->attached_dev); >> + device_lock(dev); >> + rtnl_unlock(); >> + WARN_ON(phydev->attached_dev); >> + >> cancel_delayed_work_sync(&phydev->state_queue); >> >> mutex_lock(&phydev->lock); >> --- >> >> Would this be acceptable? Can the locking be fixed? > > I can't comment on that. :l I feel like doing device_unlock is very wrong, but someone in the dev_close call calls device_lock and it deadlocks. >> > Addressing the PCS part of the patch posting and unrelated to what we >> > do for phylib... >> > >> > However, I don't see "we'll do this for phylib, so we can do it for >> > PCS as well" as much of a sane argument - we don't have bazillions >> > of network drivers to fix to make this work for PCS. We currently >> > have no removable PCS (they don't appear with a driver so aren't >> > even bound to anything.) So we should add the appropriate handling >> > when we start to do this. >> > >> > Phylink has the capability to take the link down when something goes >> > away, and if the PCS goes away, that's exactly what should happen, >> > rather than oopsing the kernel. >> >> Yeah, but we can't just call phylink_stop; we have to call the function >> which will call phylink_stop, which is different for MAC drivers and >> for DSA drivers. > > I think that's way too much and breaks the phylink design. phylink_stop > is designed to be called only from ndo_close() - and to be paired with > phylink_start(). Well, the driver almost certainly wants to bring the link down (so that it can stop using the PCS) Maybe we just need to call phylink_run_resolve_and_disable(pl, PHYLINK_DISABLE_STOPPED)? > When I talk about "taking the link down" what I mean by that is telling > the network device that the *link* *is* *down* and no more. In other > words, having phylink_resolve() know that there should be a PCS but it's > gone, and therefore the link should not come up in its current > configuration. Fundamentally the driver is the one which owns the PCS, not phylink. The driver is perfectly capable of touching the PCS by accident or on purpose, including calling PCS operations directly. We already have two in-tree drivers which do this (mt7530 and mvpp2). I also used this method when conversion dpaa to phylink. In the patch before the conversion, I switched to the lynx PCS instead of using the existing (ad-hoc) helpers. So we can't just handle everything in phylink; the driver has to be involved too. And of course, what happens when the link is brought back up again? The driver will once again offer the (now bogus) PCS. Will we have to track dead PCSs forever? What happens if another (valid) PCS gets allocated at the same address as the old PCS? >> I think we'd need something like >> >> struct phylink_pcs *pcs_get(struct device *dev, const char *id, >> void (*detach)(struct phylink_pcs *, void *), >> void *priv) >> >> which would also require that pcs_get is called before phylink_start, >> instead of in probe (which is what every existing user does). > > That would at least allow the MAC driver to take action when the PCS > gets removed. > >> This whole thing has me asking the question: why do we allow unbinding >> in-use devices in the first place? > > The driver model was designed that way from the start, because in most > cases when something is unplugged from the system, the "remove" driver > callback is just a notification that the device has already gone. > Failing it makes no sense, because software can't magic the device > back. > > Take an example of a USB device. The user unplugs it, it's gone from > the system, but the system briefly still believes the device to be > present for a while. It eventually notices that the device has gone, > and the USB layer unregisters the struct device - which has the effect > of unbinding the device from the driver and eventually cleaning up the > struct device. > > This can and does happen with Ethernet PHYs ever since we started > supporting SFPs with Ethernet PHYs. The same thing is true there - > you can pull the module at any moment, it will be gone, and the > system will catch up with its physical disconnection some point later. > It's no good trying to make ->remove say "no, the device is still in > use, I won't let you remove it" because there's nothing software can > do to prevent the going of the device - the device has already > physically gone. OK, that clears things up. > I don't think that's the case with PCS - do we really have any PCS > that can be disconnected from the system while it's running? Maybe > ones in a FPGA and the FPGA can be reprogrammed at runtime (yes, > people have really done this in the past.) This is actually a pretty good case for PCS drivers, since the MAC has no idea what kind of PCS it's hooked up to when there's a PCS on the FPGA. > If we don't want to support "hotpluggable" PCS, then there's a > simple solution to this - the driver model has the facility to suppress > the bind/unbind driver files in sysfs, which means userspace can't > unbind the device. If there's also no way to make the struct device go > away in the kernel, then effectively the driver can then only be > unbound if the driver is built as a module. > > At that point, we always have the option of insisting that PCS drivers > are never modules - and then we have the situation where a PCS will > never disappear from the system once the driver has picked up on it. Well can't we increment the module refcount? > Of course, those PCS that are tightly bound to their MACs can still > come and go along with their MACs, but it's up to the MAC driver to > make that happen safely (unregistering the netdev first, which will > have the effect of calling ndo_close(), taking the network device > down and preventing further access to the netdev - standard netdev > MAC driver stuff.) And I'm not too worried about that; there's no need to create a separate device for the PCS if it's always present. --Sean
> In the last thread I posted this snippet: > > diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c > index a74b320f5b27..05894e1c3e59 100644 > --- a/drivers/net/phy/phy_device.c > +++ b/drivers/net/phy/phy_device.c > @@ -27,6 +27,7 @@ > #include <linux/phy.h> > #include <linux/phy_led_triggers.h> > #include <linux/property.h> > +#include <linux/rtnetlink.h> > #include <linux/sfp.h> > #include <linux/skbuff.h> > #include <linux/slab.h> > @@ -3111,6 +3112,13 @@ static int phy_remove(struct device *dev) > { > struct phy_device *phydev = to_phy_device(dev); > > + // I'm pretty sure this races with multiple unbinds... > + rtnl_lock(); > + device_unlock(dev); > + dev_close(phydev->attached_dev); > + device_lock(dev); > + rtnl_unlock(); > + WARN_ON(phydev->attached_dev); > + > cancel_delayed_work_sync(&phydev->state_queue); > > mutex_lock(&phydev->lock); > --- > > Would this be acceptable? Can the locking be fixed? Code like this should not be hidden in the PHY layer. If we decide to go down this path it probably should be in net/core/dev.c. I suggest you talk to the maintainers of that file, probably Eric Dumazet, give him a clear explanation of the problem, and see what he suggests. Andrew
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index 0c6efd792690..d75ca97f74d4 100644 --- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c @@ -3119,6 +3119,8 @@ static int phy_remove(struct device *dev) cancel_delayed_work_sync(&phydev->state_queue); + WARN_ON(phydev->attached_dev); + mutex_lock(&phydev->lock); phydev->state = PHY_DOWN; mutex_unlock(&phydev->lock);
netdevs using phylib can be oopsed from userspace in the following manner: $ ip link set $iface up $ echo $(basename $(readlink /sys/class/net/$iface/phydev)) > \ /sys/class/net/$iface/phydev/driver/unbind $ ip link set $iface down However, the traceback provided is a bit too late, since it does not capture the root of the problem (unbinding the driver). It's also possible that the memory has been reallocated if sufficient time passes between when the phy is detached and when the netdev touches the phy (which could result in silent memory corruption). Add a warning at the source of the problem. A future patch could make this more robust by calling dev_close. Signed-off-by: Sean Anderson <sean.anderson@seco.com> --- This is a result of discussion around my attempt to make PCS devices proper devices [1]. Russell pointed out [2] that someone could unbind the PCS at any time. However, the same issue applies to ethernet phys, as well as serdes phys. Both of these can be unbound at any time, yet no precautions are taken to avoid dereferencing a stale pointer. As I discussed [3], we have (in general) four ways to approach this - Just warn and accept that we are going to oops later on - Create devlinks between the consumer/supplier - Create a composite device composed of the consumer and its suppliers - Add a callback (such as dev_close) and call it when the consumer goes away It is (of course) also possible to rewrite phylib so that devices are not used (preventing the phy from being unbound). However, I suspect that this is quite undesirable. This patch implements the first option, which, while fixing nothing, at least provides some better debug information. [1] https://lore.kernel.org/netdev/20220711160519.741990-1-sean.anderson@seco.com/ [2] https://lore.kernel.org/netdev/YsyPGMOiIGktUlqD@shell.armlinux.org.uk/ [3] https://lore.kernel.org/netdev/9747f8ef-66b3-0870-cbc0-c1783896b30d@seco.com/ drivers/net/phy/phy_device.c | 2 ++ 1 file changed, 2 insertions(+)