Message ID | 20210901225053.1205571-2-vladimir.oltean@nxp.com (mailing list archive) |
---|---|
State | RFC, archived |
Series | Make the PHY library stop being so greedy when binding the generic PHY driver |
On Thu, Sep 02, 2021 at 01:50:51AM +0300, Vladimir Oltean wrote: > There are systems where the PHY driver might get its probe deferred due > to a missing supplier, like an interrupt-parent, gpio, clock or whatever. > > If the phy_attach_direct call happens right in between probe attempts, > the PHY library is greedy and assumes that a specific driver will never > appear, so it just binds the generic PHY driver. > > In certain cases this is the wrong choice, because some PHYs simply need > the specific driver. The specific PHY driver was going to probe, given > enough time, but this doesn't seem to matter to phy_attach_direct. > > To solve this, make phy_attach_direct check whether a specific PHY > driver is pending or not, and if it is, just defer the probing of the > MAC that's connecting to us a bit more too. > > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> > --- > drivers/base/dd.c | 21 +++++++++++++++++++-- > drivers/net/phy/phy_device.c | 8 ++++++++ > include/linux/device.h | 1 + > 3 files changed, 28 insertions(+), 2 deletions(-) > > diff --git a/drivers/base/dd.c b/drivers/base/dd.c > index 1c379d20812a..b22073b0acd2 100644 > --- a/drivers/base/dd.c > +++ b/drivers/base/dd.c > @@ -128,13 +128,30 @@ static void deferred_probe_work_func(struct work_struct *work) > } > static DECLARE_WORK(deferred_probe_work, deferred_probe_work_func); > > +static bool __device_pending_probe(struct device *dev) > +{ > + return !list_empty(&dev->p->deferred_probe); > +} > + > +bool device_pending_probe(struct device *dev) > +{ > + bool pending; > + > + mutex_lock(&deferred_probe_mutex); > + pending = __device_pending_probe(dev); > + mutex_unlock(&deferred_probe_mutex); > + > + return pending; > +} > +EXPORT_SYMBOL_GPL(device_pending_probe); > + > void driver_deferred_probe_add(struct device *dev) > { > if (!dev->can_match) > return; > > mutex_lock(&deferred_probe_mutex); > - if (list_empty(&dev->p->deferred_probe)) { > + if (!__device_pending_probe(dev)) { > 
dev_dbg(dev, "Added to deferred list\n"); > list_add_tail(&dev->p->deferred_probe, &deferred_probe_pending_list); > } > @@ -144,7 +161,7 @@ void driver_deferred_probe_add(struct device *dev) > void driver_deferred_probe_del(struct device *dev) > { > mutex_lock(&deferred_probe_mutex); > - if (!list_empty(&dev->p->deferred_probe)) { > + if (__device_pending_probe(dev)) { > dev_dbg(dev, "Removed from deferred list\n"); > list_del_init(&dev->p->deferred_probe); > __device_set_deferred_probe_reason(dev, NULL); > diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c > index 52310df121de..2c22a32f0a1c 100644 > --- a/drivers/net/phy/phy_device.c > +++ b/drivers/net/phy/phy_device.c > @@ -1386,8 +1386,16 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev, > > /* Assume that if there is no driver, that it doesn't > * exist, and we should use the genphy driver. > + * The exception is during probing, when the PHY driver might have > + * attempted a probe but has requested deferral. Since there might be > + * MAC drivers which also attach to the PHY during probe time, try > + * harder to bind the specific PHY driver, and defer the MAC driver's > + * probing until then. Wait, no, this should not be a "special" thing, and why would the list of deferred probe show this? If a bus wants to have this type of "generic vs. specific" logic, then it needs to handle it in the bus logic itself as that does NOT fit into the normal driver model at all. Don't try to get a "hint" of this by messing with the probe function list. thanks, greg k-h
On Thu, Sep 02, 2021 at 07:43:10AM +0200, Greg Kroah-Hartman wrote:
> Wait, no, this should not be a "special" thing, and why would the list
> of deferred probe show this?

Why as in why would it work/do what I want, or as in why would you want to do that?

> If a bus wants to have this type of "generic vs. specific" logic, then
> it needs to handle it in the bus logic itself as that does NOT fit into
> the normal driver model at all. Don't try to get a "hint" of this by
> messing with the probe function list.

Where and how? Do you have an example?
On Thu, Sep 02, 2021 at 01:11:50PM +0300, Vladimir Oltean wrote:
> On Thu, Sep 02, 2021 at 07:43:10AM +0200, Greg Kroah-Hartman wrote:
> > Wait, no, this should not be a "special" thing, and why would the list
> > of deferred probe show this?
>
> Why as in why would it work/do what I want, or as in why would you want to do that?

Both! :)

> > If a bus wants to have this type of "generic vs. specific" logic, then
> > it needs to handle it in the bus logic itself as that does NOT fit into
> > the normal driver model at all. Don't try to get a "hint" of this by
> > messing with the probe function list.
>
> Where and how? Do you have an example?

No I do not, sorry, most busses do not do this for obvious ordering / loading / we are not that crazy reasons.

What is causing this all to suddenly break? The devlink stuff?

thanks,

greg k-h
On Thu, Sep 02, 2021 at 12:37:34PM +0200, Greg Kroah-Hartman wrote:
> On Thu, Sep 02, 2021 at 01:11:50PM +0300, Vladimir Oltean wrote:
> > On Thu, Sep 02, 2021 at 07:43:10AM +0200, Greg Kroah-Hartman wrote:
> > > Wait, no, this should not be a "special" thing, and why would the list
> > > of deferred probe show this?
> >
> > Why as in why would it work/do what I want, or as in why would you want to do that?
>
> Both! :)

So first: why would it work.

You seem to have a misconception that I am "messing with the probe function list". I am not, I am just exporting the information whether the device had a driver which returned -EPROBE_DEFER during probe, or not. For that I am looking at the presence of this device on the deferred_probe_pending_list.

driver_probe_device
-> if (ret == -EPROBE_DEFER || ret == EPROBE_DEFER)
	driver_deferred_probe_add(dev);
	-> list_add_tail(&dev->p->deferred_probe, &deferred_probe_pending_list);

driver_bound
-> driver_deferred_probe_del
   -> list_del_init(&dev->p->deferred_probe);

So the presence of "dev" inside deferred_probe_pending_list means precisely that a driver is pending to be bound.

Second: why would I want to do that.

In the case of PHY devices, the driver binding process starts here:

phy_device_register
-> device_add

It begins synchronously, but may not finish due to probe deferral. So after device_add finishes, phydev->drv might be NULL due to 2 reasons:

1. -EPROBE_DEFER triggered by "somebody", either by the PHY driver probe function itself, or by third parties (like device_links_check_suppliers happening to notice that before even calling the driver's probe fn). Anyway, the distinction between these 2 is pretty much irrelevant.

2. There genuinely was no driver loaded in the system for this PHY. Note that the way things are written, the Generic PHY driver will not match on any device in phy_bus_match(). It is bound manually, separately.

The PHY library is absolutely happy to work with a headless chicken, a phydev with a NULL phydev->drv. Just search for "if (!phydev->drv)" inside drivers/net/phy/phy.c and drivers/net/phy/phy_device.c. However, the phydev walking with a NULL drv can only last for so long. An Ethernet port will soon need that PHY device, and will attach to it. There are many code paths, all ending in phy_attach_direct. However, when an Ethernet port decides to attach to a PHY device is completely asynchronous to the lifetime of the PHY device itself. This moment is where a driver is really needed, and if none is present, the generic one is force-bound. My patch only distinguishes between case 1 and 2 for which phydev->drv might be NULL. It avoids force-binding the generic PHY when a specific PHY driver was found, but did not finish binding due to probe deferral. > > > If a bus wants to have this type of "generic vs. specific" logic, then > > > it needs to handle it in the bus logic itself as that does NOT fit into > > > the normal driver model at all. Don't try to get a "hint" of this by > > > messing with the probe function list. > > > > Where and how? Do you have an example? > > No I do not, sorry, most busses do not do this for obvious ordering / > loading / we are not that crazy reasons. > > What is causing this all to suddenly break? The devlink stuff? There was a report related to fw_devlink indeed, however strictly speaking, I wouldn't say it is the cause of all this. It is pretty uncommon for a PHY device to defer probing I think, hence the bad assumptions made around it.
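
The distinction between case 1 and case 2 can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `toy_phy`, `toy_pending_probe()` and `toy_phy_attach()` are hypothetical stand-ins invented for illustration, not kernel types or functions, and the `on_deferred_list` flag merely models the `!list_empty(&dev->p->deferred_probe)` test from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* EPROBE_DEFER is kernel-internal (value 517) and is not in userspace errno.h. */
#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517
#endif

/* Toy stand-ins for illustration only; these are not kernel structures. */
struct toy_driver {
	const char *name;
};

struct toy_phy {
	const struct toy_driver *drv;	/* NULL while no driver is bound */
	bool on_deferred_list;		/* models !list_empty(&dev->p->deferred_probe) */
};

static const struct toy_driver toy_genphy = { "Generic PHY" };

/* Models the device_pending_probe() helper proposed in the patch. */
static bool toy_pending_probe(const struct toy_phy *phy)
{
	return phy->on_deferred_list;
}

/* Models only the driver-selection step of the patched phy_attach_direct(). */
static int toy_phy_attach(struct toy_phy *phy)
{
	if (!phy->drv) {
		/* Case 1: a specific driver deferred its probe, so wait for it. */
		if (toy_pending_probe(phy))
			return -EPROBE_DEFER;
		/* Case 2: no specific driver exists, so force-bind the generic one. */
		phy->drv = &toy_genphy;
	}
	return 0;
}
```

In this model a PHY sitting on the deferred list makes the attach fail with -EPROBE_DEFER and stay unbound, while a PHY with no candidate driver at all falls back to the generic driver exactly as before.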
On Thu, Sep 2, 2021 at 7:43 AM Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > On Thu, Sep 02, 2021 at 01:50:51AM +0300, Vladimir Oltean wrote: > > There are systems where the PHY driver might get its probe deferred due > > to a missing supplier, like an interrupt-parent, gpio, clock or whatever. > > > > If the phy_attach_direct call happens right in between probe attempts, > > the PHY library is greedy and assumes that a specific driver will never > > appear, so it just binds the generic PHY driver. > > > > In certain cases this is the wrong choice, because some PHYs simply need > > the specific driver. The specific PHY driver was going to probe, given > > enough time, but this doesn't seem to matter to phy_attach_direct. > > > > To solve this, make phy_attach_direct check whether a specific PHY > > driver is pending or not, and if it is, just defer the probing of the > > MAC that's connecting to us a bit more too. > > > > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> > > --- > > drivers/base/dd.c | 21 +++++++++++++++++++-- > > drivers/net/phy/phy_device.c | 8 ++++++++ > > include/linux/device.h | 1 + > > 3 files changed, 28 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/base/dd.c b/drivers/base/dd.c > > index 1c379d20812a..b22073b0acd2 100644 > > --- a/drivers/base/dd.c > > +++ b/drivers/base/dd.c > > @@ -128,13 +128,30 @@ static void deferred_probe_work_func(struct work_struct *work) > > } > > static DECLARE_WORK(deferred_probe_work, deferred_probe_work_func); > > > > +static bool __device_pending_probe(struct device *dev) > > +{ > > + return !list_empty(&dev->p->deferred_probe); > > +} > > + > > +bool device_pending_probe(struct device *dev) > > +{ > > + bool pending; > > + > > + mutex_lock(&deferred_probe_mutex); > > + pending = __device_pending_probe(dev); > > + mutex_unlock(&deferred_probe_mutex); > > + > > + return pending; > > +} > > +EXPORT_SYMBOL_GPL(device_pending_probe); > > + > > void driver_deferred_probe_add(struct 
device *dev) > > { > > if (!dev->can_match) > > return; > > > > mutex_lock(&deferred_probe_mutex); > > - if (list_empty(&dev->p->deferred_probe)) { > > + if (!__device_pending_probe(dev)) { > > dev_dbg(dev, "Added to deferred list\n"); > > list_add_tail(&dev->p->deferred_probe, &deferred_probe_pending_list); > > } > > @@ -144,7 +161,7 @@ void driver_deferred_probe_add(struct device *dev) > > void driver_deferred_probe_del(struct device *dev) > > { > > mutex_lock(&deferred_probe_mutex); > > - if (!list_empty(&dev->p->deferred_probe)) { > > + if (__device_pending_probe(dev)) { > > dev_dbg(dev, "Removed from deferred list\n"); > > list_del_init(&dev->p->deferred_probe); > > __device_set_deferred_probe_reason(dev, NULL); > > diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c > > index 52310df121de..2c22a32f0a1c 100644 > > --- a/drivers/net/phy/phy_device.c > > +++ b/drivers/net/phy/phy_device.c > > @@ -1386,8 +1386,16 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev, > > > > /* Assume that if there is no driver, that it doesn't > > * exist, and we should use the genphy driver. > > + * The exception is during probing, when the PHY driver might have > > + * attempted a probe but has requested deferral. Since there might be > > + * MAC drivers which also attach to the PHY during probe time, try > > + * harder to bind the specific PHY driver, and defer the MAC driver's > > + * probing until then. > > Wait, no, this should not be a "special" thing, and why would the list > of deferred probe show this? > > If a bus wants to have this type of "generic vs. specific" logic, then > it needs to handle it in the bus logic itself as that does NOT fit into > the normal driver model at all. Well, I think that this is a general issue and it appears to me to be present in the driver core too, at least to some extent. 
Namely, if there are two drivers matching the same device and the first one's ->probe() returns -EPROBE_DEFER, that will be converted to EPROBE_DEFER by really_probe(), so driver_probe_device() will pass it to __device_attach_driver() which then will return 0. Thus bus_for_each_drv() will call __device_attach_driver() for the second matching driver even though the first one may still probe successfully later.

To me, this really is a variant of "if a driver has failed to probe, try another one" which phy_attach_direct() appears to be doing and in both cases the probing of the "alternative" is premature if the probing of the original driver has been deferred.

> Don't try to get a "hint" of this by messing with the probe function list.

I agree that this doesn't look particularly clean, but then I'm wondering how to address this cleanly.
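
The driver-core behaviour described in this message can be modelled in a few lines of userspace C. This is a rough sketch that follows the description above rather than the actual kernel source: every `toy_*` name is hypothetical, and the functions only imitate the return-value conventions attributed to really_probe(), __device_attach_driver() and bus_for_each_drv().

```c
#include <stddef.h>

/* Kernel-internal value, defined here because userspace errno.h lacks it. */
#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517
#endif

struct toy_device;

struct toy_driver {
	const char *name;
	int (*probe)(struct toy_device *dev);
};

struct toy_device {
	const struct toy_driver *bound;	/* driver that ended up bound, if any */
	int deferred;			/* set when some driver asked for a retry */
};

static int probe_defers(struct toy_device *dev) { (void)dev; return -EPROBE_DEFER; }
static int probe_ok(struct toy_device *dev) { (void)dev; return 0; }

/* Imitates really_probe(): -EPROBE_DEFER becomes positive EPROBE_DEFER. */
static int toy_really_probe(struct toy_device *dev, const struct toy_driver *drv)
{
	int ret = drv->probe(dev);

	if (ret == -EPROBE_DEFER) {
		dev->deferred = 1;
		return EPROBE_DEFER;
	}
	if (ret == 0)
		dev->bound = drv;
	return ret;
}

/* Imitates __device_attach_driver(): returns 1 to stop iterating, 0 to go on. */
static int toy_attach_driver(struct toy_device *dev, const struct toy_driver *drv)
{
	return toy_really_probe(dev, drv) == 0;
}

/* Imitates bus_for_each_drv(): walk the matching drivers until one binds. */
static void toy_device_attach(struct toy_device *dev,
			      const struct toy_driver *drvs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (toy_attach_driver(dev, &drvs[i]))
			break;
}
```

With a first driver that defers and a second one that succeeds, the model ends up with the second driver bound while the device is also marked as deferred, which is the premature-fallback situation being pointed out.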
On Thu, Sep 02, 2021 at 01:50:51AM +0300, Vladimir Oltean wrote:
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index 52310df121de..2c22a32f0a1c 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -1386,8 +1386,16 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
>
>  	/* Assume that if there is no driver, that it doesn't
>  	 * exist, and we should use the genphy driver.
> +	 * The exception is during probing, when the PHY driver might have
> +	 * attempted a probe but has requested deferral. Since there might be
> +	 * MAC drivers which also attach to the PHY during probe time, try
> +	 * harder to bind the specific PHY driver, and defer the MAC driver's
> +	 * probing until then.
>  	 */
>  	if (!d->driver) {
> +		if (device_pending_probe(d))
> +			return -EPROBE_DEFER;

Something else that concerns me here.

As noted, many network drivers attempt to attach their PHY when the
device is brought up, and not during their probe function.

Taking a driver at random:

drivers/net/ethernet/renesas/sh_eth.c

sh_eth_phy_init() calls of_phy_connect() or phy_connect(), which
ultimately calls phy_attach_direct() and propagates the error code
via an error pointer.

sh_eth_phy_init() propagates the error code to its caller,
sh_eth_phy_start(). This is called from sh_eth_open(), which
propagates the error code. This is called from .ndo_open... and it's
highly likely -EPROBE_DEFER will end up being returned to userspace
through either netlink or netdev ioctls.

Since EPROBE_DEFER is not an error number that we export to
userspace, this should basically never be exposed to userspace, yet
we have a path that it _could_ be exposed if the above condition
is true.

If device_pending_probe() returns true e.g. during initial boot up
while modules are being loaded - maybe the phy driver doesn't have
all the resources it needs because of some other module that hasn't
finished initialising - then we have a window where this will be
exposed to userspace.

So, do we need to fix all the network drivers to do something if
their .ndo_open method encounters this? If so, what? Sleep a bit
and try again? How many times to retry? Convert the error code into
something else, causing userspace to fail where it worked before? If
so which error code?

I think this needs to be thought through a bit better. In this case,
I feel that throwing -EPROBE_DEFER to solve one problem with one
subsystem can result in new problems elsewhere.

We did have an idea at one point about reserving some flag bits in
phydev->dev_flags for phylib use, but I don't think that happened.
If this is the direction we want to go, I think we need to have a
flag in dev_flags so that callers opt-in to the new behaviour whereas
callers such as from .ndo_open keep the old behaviour - because they
just aren't set up to handle an -EPROBE_DEFER return from these
functions.
On Thu, Sep 02, 2021 at 07:50:16PM +0100, Russell King (Oracle) wrote: > On Thu, Sep 02, 2021 at 01:50:51AM +0300, Vladimir Oltean wrote: > > diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c > > index 52310df121de..2c22a32f0a1c 100644 > > --- a/drivers/net/phy/phy_device.c > > +++ b/drivers/net/phy/phy_device.c > > @@ -1386,8 +1386,16 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev, > > > > /* Assume that if there is no driver, that it doesn't > > * exist, and we should use the genphy driver. > > + * The exception is during probing, when the PHY driver might have > > + * attempted a probe but has requested deferral. Since there might be > > + * MAC drivers which also attach to the PHY during probe time, try > > + * harder to bind the specific PHY driver, and defer the MAC driver's > > + * probing until then. > > */ > > if (!d->driver) { > > + if (device_pending_probe(d)) > > + return -EPROBE_DEFER; > > Something else that concerns me here. > > As noted, many network drivers attempt to attach their PHY when the > device is brought up, and not during their probe function. > > Taking a driver at random: > > drivers/net/ethernet/renesas/sh_eth.c > > sh_eth_phy_init() calls of_phy_connect() or phy_connect(), which > ultimately calls phy_attach_direct() and propagates the error code > via an error pointer. > > sh_eth_phy_init() propagates the error code to its caller, > sh_eth_phy_start(). This is called from sh_eth_open(), which > probagates the error code. This is called from .ndo_open... and it's > highly likely -EPROBE_DEFER will end up being returned to userspace > through either netlink or netdev ioctls. > > Since EPROBE_DEFER is not an error number that we export to > userspace, this should basically never be exposed to userspace, yet > we have a path that it _could_ be exposed if the above condition > is true. > > If device_pending_probe() returns true e.g. 
during initial boot up > while modules are being loaded - maybe the phy driver doesn't have > all the resources it needs because of some other module that hasn't > finished initialising - then we have a window where this will be > exposed to userspace. > > So, do we need to fix all the network drivers to do something if > their .ndo_open method encounters this? If so, what? Sleep a bit > and try again? How many times to retry? Convert the error code into > something else, causing userspace to fail where it worked before? If > so which error code? It depends what is the outcome you're going for. If there's a PHY driver pending, I would do something to wait for that if I could, it would be silly for the PHY driver to be loading but the PHY to still be bound to genphy. I feel that connecting to the PHY from the probe path is the overall cleaner way to go since it deals with this automatically, but due to the sheer volume of drivers that connect from .ndo_open, modifying them in bulk is out of the question. Something sensible needs to happen with them too, and 'genphy is what you get' might be just that, which is basically what is happening without these patches. On that note, I don't know whether there is any objective advantage to connecting to the PHY at .ndo_open time. > > I think this needs to be thought through a bit better. In this case, > I feel that throwing -EPROBE_DEFER to solve one problem with one > subsystem can result in new problems elsewhere. > > We did have an idea at one point about reserving some flag bits in > phydev->dev_flags for phylib use, but I don't think that happened. > If this is the direction we want to go, I think we need to have a > flag in dev_flags so that callers opt-in to the new behaviour whereas > callers such as from .ndo_open keep the old behaviour - because they > just aren't setup to handle an -EPROBE_DEFER return from these > functions. Or that, yes. 
I hadn't actually thought about using PHY flags, but I suppose callers which already can cope with EPROBE_DEFER (they connect from probe) can opt into that.
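
The opt-in idea discussed here might look roughly like the following userspace sketch. It rests on assumptions throughout: no such flag is defined anywhere in the thread, so `FLAG_PHY_DEFER_OK`, `flag_phy` and `flag_phy_attach()` are hypothetical names chosen purely for illustration, and the real `dev_flags` plumbing is not modelled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Kernel-internal value, defined here because userspace errno.h lacks it. */
#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517
#endif

/* Hypothetical opt-in bit; nothing with this name exists in phylib. */
#define FLAG_PHY_DEFER_OK (1u << 0)

static const char flag_genphy[] = "Generic PHY";

/* Toy PHY device for illustration only. */
struct flag_phy {
	const char *drv;	/* NULL while no driver is bound */
	bool probe_pending;	/* a specific driver has deferred its probe */
};

/*
 * Callers that attach from their probe path pass FLAG_PHY_DEFER_OK and may
 * get -EPROBE_DEFER back. Callers that attach from .ndo_open pass 0 and keep
 * the old behaviour of falling back to the generic driver.
 */
static int flag_phy_attach(struct flag_phy *phy, uint32_t flags)
{
	if (phy->drv)
		return 0;
	if (phy->probe_pending && (flags & FLAG_PHY_DEFER_OK))
		return -EPROBE_DEFER;
	phy->drv = flag_genphy;
	return 0;
}
```

The point of the design is that the error code can only reach callers that declared they can handle it, so an .ndo_open path never sees -EPROBE_DEFER and never leaks it to userspace.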
On Thu, Sep 02, 2021 at 07:50:16PM +0100, Russell King (Oracle) wrote: > On Thu, Sep 02, 2021 at 01:50:51AM +0300, Vladimir Oltean wrote: > > diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c > > index 52310df121de..2c22a32f0a1c 100644 > > --- a/drivers/net/phy/phy_device.c > > +++ b/drivers/net/phy/phy_device.c > > @@ -1386,8 +1386,16 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev, > > > > /* Assume that if there is no driver, that it doesn't > > * exist, and we should use the genphy driver. > > + * The exception is during probing, when the PHY driver might have > > + * attempted a probe but has requested deferral. Since there might be > > + * MAC drivers which also attach to the PHY during probe time, try > > + * harder to bind the specific PHY driver, and defer the MAC driver's > > + * probing until then. > > */ > > if (!d->driver) { > > + if (device_pending_probe(d)) > > + return -EPROBE_DEFER; > > Something else that concerns me here. > > As noted, many network drivers attempt to attach their PHY when the > device is brought up, and not during their probe function. Yes, this is going to be a problem. I agree it is too late to return -EPROBE_DEFER. Maybe phy_attach_direct() needs to wait around, if the device is still on the deferred list, otherwise use genphy. And maybe a timeout and return -ENODEV, which is not 100% correct, we know the device exists, we just cannot drive it. Can we tell we are in the context of a driver probe? Or do we need to add a parameter to the various phy_attach API calls to let the core know if this is probe or open? This is more likely to be a problem with NFS root, with the kernel bringing up an interface as soon as its registered. userspace bringing up interfaces is generally much later, and udev tends to wait around until there are no more driver load requests before the boot continues. Andrew
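
The "wait around, then time out" suggestion can be sketched as a bounded polling loop. This is a toy model under assumptions, not a proposal for the real phy_attach_direct(): `wait_phy`, `wait_poll()` and `wait_phy_attach()` are hypothetical, the deferred-probe retries are simulated by a countdown, and where real code would sleep between polls the sketch simply loops.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

static const char wait_specific[] = "specific PHY driver";
static const char wait_genphy[] = "Generic PHY";

/* Toy PHY device for illustration only. */
struct wait_phy {
	const char *drv;	/* NULL while no driver is bound */
	int polls_until_bound;	/* > 0: binds after that many polls; otherwise never */
	bool pending;		/* still on the (simulated) deferred list */
};

/* Simulates one round of deferred-probe retries happening in the background. */
static void wait_poll(struct wait_phy *phy)
{
	if (phy->pending && phy->polls_until_bound > 0 &&
	    --phy->polls_until_bound == 0) {
		phy->drv = wait_specific;
		phy->pending = false;
	}
}

/* Waits while a probe is pending, gives up after max_polls with -ENODEV. */
static int wait_phy_attach(struct wait_phy *phy, int max_polls)
{
	for (int i = 0; !phy->drv && phy->pending; i++) {
		if (i == max_polls)
			return -ENODEV;
		wait_poll(phy);	/* real code would sleep here instead of spinning */
	}
	/* Nothing pending and still no driver: fall back to the generic one. */
	if (!phy->drv)
		phy->drv = wait_genphy;
	return 0;
}
```

As the message itself notes, -ENODEV is not a perfect fit for the timeout case, since the device is known to exist and merely cannot be driven yet.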
On 9/2/2021 12:51 PM, Andrew Lunn wrote: > On Thu, Sep 02, 2021 at 07:50:16PM +0100, Russell King (Oracle) wrote: >> On Thu, Sep 02, 2021 at 01:50:51AM +0300, Vladimir Oltean wrote: >>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c >>> index 52310df121de..2c22a32f0a1c 100644 >>> --- a/drivers/net/phy/phy_device.c >>> +++ b/drivers/net/phy/phy_device.c >>> @@ -1386,8 +1386,16 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev, >>> >>> /* Assume that if there is no driver, that it doesn't >>> * exist, and we should use the genphy driver. >>> + * The exception is during probing, when the PHY driver might have >>> + * attempted a probe but has requested deferral. Since there might be >>> + * MAC drivers which also attach to the PHY during probe time, try >>> + * harder to bind the specific PHY driver, and defer the MAC driver's >>> + * probing until then. >>> */ >>> if (!d->driver) { >>> + if (device_pending_probe(d)) >>> + return -EPROBE_DEFER; >> >> Something else that concerns me here. >> >> As noted, many network drivers attempt to attach their PHY when the >> device is brought up, and not during their probe function. > > Yes, this is going to be a problem. I agree it is too late to return > -EPROBE_DEFER. Maybe phy_attach_direct() needs to wait around, if the > device is still on the deferred list, otherwise use genphy. And maybe > a timeout and return -ENODEV, which is not 100% correct, we know the > device exists, we just cannot drive it. Is it really going to be a problem though? The two cases where this will matter is if we use IP auto-configuration within the kernel, which this patchset ought to be helping with, if we are already in user-space and the PHY is connected at .ndo_open() time, there is a whole lot of things that did happen prior to getting there, such as udevd using modaliases in order to load every possible module we might, so I am debating whether we will really see a probe deferral at all. 
>
> Can we tell we are in the context of a driver probe? Or do we need to
> add a parameter to the various phy_attach API calls to let the core
> know if this is probe or open?

Actually we do: the RTNL lock will be held during ndo_open and it won't during driver probe.

>
> This is more likely to be a problem with NFS root, with the kernel
> bringing up an interface as soon as its registered. userspace bringing
> up interfaces is generally much later, and udev tends to wait around
> until there are no more driver load requests before the boot
> continues.

See my point above, with Vladimir's change, we should have fw_devlink do its job such that by the time the network interface is needed for IP auto-configuration, all of its depending resources should also be ready, would they not?
On Thu, Sep 02, 2021 at 01:33:57PM -0700, Florian Fainelli wrote: > On 9/2/2021 12:51 PM, Andrew Lunn wrote: > > On Thu, Sep 02, 2021 at 07:50:16PM +0100, Russell King (Oracle) wrote: > > > On Thu, Sep 02, 2021 at 01:50:51AM +0300, Vladimir Oltean wrote: > > > > diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c > > > > index 52310df121de..2c22a32f0a1c 100644 > > > > --- a/drivers/net/phy/phy_device.c > > > > +++ b/drivers/net/phy/phy_device.c > > > > @@ -1386,8 +1386,16 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev, > > > > /* Assume that if there is no driver, that it doesn't > > > > * exist, and we should use the genphy driver. > > > > + * The exception is during probing, when the PHY driver might have > > > > + * attempted a probe but has requested deferral. Since there might be > > > > + * MAC drivers which also attach to the PHY during probe time, try > > > > + * harder to bind the specific PHY driver, and defer the MAC driver's > > > > + * probing until then. > > > > */ > > > > if (!d->driver) { > > > > + if (device_pending_probe(d)) > > > > + return -EPROBE_DEFER; > > > > > > Something else that concerns me here. > > > > > > As noted, many network drivers attempt to attach their PHY when the > > > device is brought up, and not during their probe function. > > > > Yes, this is going to be a problem. I agree it is too late to return > > -EPROBE_DEFER. Maybe phy_attach_direct() needs to wait around, if the > > device is still on the deferred list, otherwise use genphy. And maybe > > a timeout and return -ENODEV, which is not 100% correct, we know the > > device exists, we just cannot drive it. > > Is it really going to be a problem though? 
> The two cases where this will
> matter is if we use IP auto-configuration within the kernel, which this
> patchset ought to be helping with

There is no handling of EPROBE_DEFER in the IP auto-configuration code while trying to bring up interfaces:

	for_each_netdev(&init_net, dev) {
		if (ic_is_init_dev(dev)) {
			...
			oflags = dev->flags;
			if (dev_change_flags(dev, oflags | IFF_UP, NULL) < 0) {
				pr_err("IP-Config: Failed to open %s\n", dev->name);
				continue;
			}

So, the only way this could be reliable is if we can guarantee that all deferred probes will have been retried by the time we get here. Do we have that guarantee?

> if we are already in user-space and the
> PHY is connected at .ndo_open() time, there is a whole lot of things that
> did happen prior to getting there, such as udevd using modaliases in order
> to load every possible module we might, so I am debating whether we will
> really see a probe deferral at all.

As can be seen from my recent posts which show on Debian Buster that interfaces are attempted to be brought up while e.g. mv88e6xxx is still probing, we can't make any guarantees that things have "settled" by the time userspace attempts to bring up the network interfaces.

I may have more on why that is happening... I won't post it here, I'll post to the other thread.

> > Can we tell we are in the context of a driver probe? Or do we need to
> > add a parameter to the various phy_attach API calls to let the core
> > know if this is probe or open?
>
> Actually we do the RTNL lock will be held during ndo_open and it won't
> during driver probe.

That's probably an unreliable indicator. DPAA2 has weirdness in the
way it can dynamically create and destroy network interfaces, which
does lead to problems with the rtnl lock. I've been carrying a patch
from NXP for this for almost two years now, which NXP still haven't
submitted:

http://git.armlinux.org.uk/cgit/linux-arm.git/commit/?h=cex7&id=a600f2ee50223e9bcdcf86b65b4c427c0fd425a4

... and I've no idea why that patch never made mainline. I need it
to avoid the stated deadlock on SolidRun Honeycomb platforms when
creating additional network interfaces for the SFP cages in userspace.
On Thu, Sep 02, 2021 at 10:33:03PM +0100, Russell King (Oracle) wrote: > That's probably an unreliable indicator. DPAA2 has weirdness in the > way it can dynamically create and destroy network interfaces, which > does lead to problems with the rtnl lock. I've been carrying a patch > from NXP for this for almost two years now, which NXP still haven't > submitted: > > http://git.armlinux.org.uk/cgit/linux-arm.git/commit/?h=cex7&id=a600f2ee50223e9bcdcf86b65b4c427c0fd425a4 > > ... and I've no idea why that patch never made mainline. I need it > to avoid the stated deadlock on SolidRun Honeycomb platforms when > creating additional network interfaces for the SFP cages in userspace. Ah, nice, I've copied that broken logic for the dpaa2-switch too: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=d52ef12f7d6c016f3b249db95af33f725e3dd065 So why don't you send the patch? I can send it too if you want to, one for the switch and one for the DPNI driver.
On Fri, Sep 03, 2021 at 12:39:49AM +0300, Vladimir Oltean wrote:
> On Thu, Sep 02, 2021 at 10:33:03PM +0100, Russell King (Oracle) wrote:
> > That's probably an unreliable indicator. DPAA2 has weirdness in the
> > way it can dynamically create and destroy network interfaces, which
> > does lead to problems with the rtnl lock. I've been carrying a patch
> > from NXP for this for almost two years now, which NXP still haven't
> > submitted:
> >
> > http://git.armlinux.org.uk/cgit/linux-arm.git/commit/?h=cex7&id=a600f2ee50223e9bcdcf86b65b4c427c0fd425a4
> >
> > ... and I've no idea why that patch never made mainline. I need it
> > to avoid the stated deadlock on SolidRun Honeycomb platforms when
> > creating additional network interfaces for the SFP cages in userspace.
>
> Ah, nice, I've copied that broken logic for the dpaa2-switch too:
> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=d52ef12f7d6c016f3b249db95af33f725e3dd065
>
> So why don't you send the patch? I can send it too if you want to, one
> for the switch and one for the DPNI driver.

Sorry, I mis-stated. NXP did submit that exact patch, but it's actually
incorrect for the reason I stated when it was sent:

https://patchwork.ozlabs.org/project/netdev/patch/1574363727-5437-2-git-send-email-ioana.ciornei@nxp.com/

I did miss the rtnl_lock() around phylink_disconnect_phy() in the
description of the race, which goes some way towards hiding it, but
there is still a race between phylink_destroy() and another thread
calling dpaa2_eth_get_link_ksettings(), and priv->mac being freed:

static int
dpaa2_eth_get_link_ksettings(struct net_device *net_dev,
			     struct ethtool_link_ksettings *link_settings)
{
	struct dpaa2_eth_priv *priv = netdev_priv(net_dev);

	if (dpaa2_eth_is_type_phy(priv))
		return phylink_ethtool_ksettings_get(priv->mac->phylink,
						     link_settings);

which dereferences priv->mac and priv->mac->phylink, vs:

static irqreturn_t dpni_irq0_handler_thread(int irq_num, void *arg)
{
...
	if (status & DPNI_IRQ_EVENT_ENDPOINT_CHANGED) {
		dpaa2_eth_set_mac_addr(netdev_priv(net_dev));
		dpaa2_eth_update_tx_fqids(priv);

		if (dpaa2_eth_has_mac(priv))
			dpaa2_eth_disconnect_mac(priv);
		else
			dpaa2_eth_connect_mac(priv);
	}

static void dpaa2_eth_disconnect_mac(struct dpaa2_eth_priv *priv)
{
	if (dpaa2_eth_is_type_phy(priv))
		dpaa2_mac_disconnect(priv->mac);

	if (!dpaa2_eth_has_mac(priv))
		return;

	dpaa2_mac_close(priv->mac);
	kfree(priv->mac);		<== potential use after free bug by
	priv->mac = NULL;		<== dpaa2_eth_get_link_ksettings()
}

void dpaa2_mac_disconnect(struct dpaa2_mac *mac)
{
	if (!mac->phylink)
		return;

	phylink_disconnect_phy(mac->phylink);
	phylink_destroy(mac->phylink);	<== another use-after-free bug via
					    dpaa2_eth_get_link_ksettings()
	dpaa2_pcs_destroy(mac);
}

Note that phylink_destroy() is documented as:

 * Note: the rtnl lock must not be held when calling this function.

because it calls sfp_bus_del_upstream(), which will take the rtnl lock
itself.

An alternative solution would be to remove the rtnl locking from sfp_bus_del_upstream(), but then force _everyone_ to take the rtnl lock before calling phylink_destroy() - meaning a larger block of code ends up executing under the lock than is really necessary.

However, as I stated in my review of the patch "As I've already stated, the phylink is not designed to be created and destroyed on a published network device." That still remains true today, and it seems that the issue has never been fixed in DPAA2 despite having been pointed out.
On Thu, Sep 02, 2021 at 11:24:39PM +0100, Russell King (Oracle) wrote:
> On Fri, Sep 03, 2021 at 12:39:49AM +0300, Vladimir Oltean wrote:
> > On Thu, Sep 02, 2021 at 10:33:03PM +0100, Russell King (Oracle) wrote:
> > > That's probably an unreliable indicator. DPAA2 has weirdness in the
> > > way it can dynamically create and destroy network interfaces, which
> > > does lead to problems with the rtnl lock. I've been carrying a patch
> > > from NXP for this for almost two years now, which NXP still haven't
> > > submitted:
> > >
> > > http://git.armlinux.org.uk/cgit/linux-arm.git/commit/?h=cex7&id=a600f2ee50223e9bcdcf86b65b4c427c0fd425a4
> > >
> > > ... and I've no idea why that patch never made mainline. I need it
> > > to avoid the stated deadlock on SolidRun Honeycomb platforms when
> > > creating additional network interfaces for the SFP cages in userspace.
> >
> > Ah, nice, I've copied that broken logic for the dpaa2-switch too:
> > https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=d52ef12f7d6c016f3b249db95af33f725e3dd065
> >
> > So why don't you send the patch? I can send it too if you want to, one
> > for the switch and one for the DPNI driver.
>
> Sorry, I mis-stated. NXP did submit that exact patch, but it's actually
> incorrect for the reason I stated when it was sent:
>
> https://patchwork.ozlabs.org/project/netdev/patch/1574363727-5437-2-git-send-email-ioana.ciornei@nxp.com/

So why are you carrying it then?

> I did miss the rtnl_lock() around phylink_disconnect_phy() in the
> description of the race, which goes someway towards hiding it, but
> there is still a race between phylink_destroy() and another thread
> calling dpaa2_eth_get_link_ksettings(), and priv->mac being freed:
>
> static int
> dpaa2_eth_get_link_ksettings(struct net_device *net_dev,
> 			     struct ethtool_link_ksettings *link_settings)
> {
> 	struct dpaa2_eth_priv *priv = netdev_priv(net_dev);
>
> 	if (dpaa2_eth_is_type_phy(priv))
> 		return phylink_ethtool_ksettings_get(priv->mac->phylink,
> 						     link_settings);
>
> which dereferences priv->mac and priv->mac->phylink, vs:
>
> static irqreturn_t dpni_irq0_handler_thread(int irq_num, void *arg)
> {
> ...
> 	if (status & DPNI_IRQ_EVENT_ENDPOINT_CHANGED) {
> 		dpaa2_eth_set_mac_addr(netdev_priv(net_dev));
> 		dpaa2_eth_update_tx_fqids(priv);
>
> 		if (dpaa2_eth_has_mac(priv))
> 			dpaa2_eth_disconnect_mac(priv);
> 		else
> 			dpaa2_eth_connect_mac(priv);
> 	}
>
> static void dpaa2_eth_disconnect_mac(struct dpaa2_eth_priv *priv)
> {
> 	if (dpaa2_eth_is_type_phy(priv))
> 		dpaa2_mac_disconnect(priv->mac);
>
> 	if (!dpaa2_eth_has_mac(priv))
> 		return;
>
> 	dpaa2_mac_close(priv->mac);
> 	kfree(priv->mac);		<== potential use after free bug by
> 	priv->mac = NULL;		<== dpaa2_eth_get_link_ksettings()
> }

Okay, so this needs to stay under the rtnetlink mutex, to serialize with
dpaa2_eth_get_link_ksettings which is already under the rtnetlink mutex.
So the way in which rtnl_lock is taken right now is actually fine in a
way.

>
> void dpaa2_mac_disconnect(struct dpaa2_mac *mac)
> {
> 	if (!mac->phylink)
> 		return;
>
> 	phylink_disconnect_phy(mac->phylink);
> 	phylink_destroy(mac->phylink); <== another use-after-free bug via
> 					   dpaa2_eth_get_link_ksettings()
> 	dpaa2_pcs_destroy(mac);
> }
>
> Note that phylink_destroy() is documented as:
>
>  * Note: the rtnl lock must not be held when calling this function.
>
> because it calls sfp_bus_del_upstream(), which will take the rtnl lock
> itself. An alternative solution would be to remove the rtnl locking
> from sfp_bus_del_upstream(), but then force _everyone_ to take the
> rtnl lock before calling phylink_destroy() - meaning a larger block of
> code ends up executing under the lock than is really necessary.

So phylink_destroy has exactly 20 call sites, it is not that bad?

And as for "larger block than necessary" - doesn't the dpaa2 prolonged
usage count as necessary?

> However, as I stated in my review of the patch "As I've already stated,
> the phylink is not designed to be created and destroyed on a published
> network device." That still remains true today, and it seems that the
> issue has never been fixed in DPAA2 despite having been pointed out.

So what would you do, exactly, to "fix" the issue that a DPNI can connect
and disconnect at runtime from a DPMAC?

Also, "X is not designed to Y" doesn't really say much, given a bit of
will power. Linux was not designed to run on non-i386 either.

Any other issues besides needing to take rtnl_mutex top-level when
calling phylink_destroy? Since phylink_disconnect_phy needs it anyway,
and phylink_destroy ends up calling sfp_bus_del_upstream which takes the
same mutex again, and drivers that connect/disconnect at probe/remove
time end up calling both in a row, I don't think there is much of an
issue to speak of, or that the rework would be that difficult.

> > Note that phylink_destroy() is documented as:
> >
> >  * Note: the rtnl lock must not be held when calling this function.
> > ...
>
> Any other issues besides needing to take rtnl_mutex top-level when
> calling phylink_destroy?

We should try to keep phylink_create and phylink_destroy symmetrical:

/**
 * phylink_create() - create a phylink instance
 * @config: a pointer to the target &struct phylink_config
 * @fwnode: a pointer to a &struct fwnode_handle describing the network
 *	interface
 * @iface: the desired link mode defined by &typedef phy_interface_t
 * @mac_ops: a pointer to a &struct phylink_mac_ops for the MAC.
 *
 * Create a new phylink instance, and parse the link parameters found in @np.
 * This will parse in-band modes, fixed-link or SFP configuration.
 *
 * Note: the rtnl lock must not be held when calling this function.

Having different locking requirements will catch people out.

Interestingly, there is no ASSERT_NO_RTNL(). Maybe we should add such
a macro.

	Andrew

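The ASSERT_NO_RTNL() idea can be illustrated with a small userspace
model. This is only a sketch under stated assumptions: every model_*
name is hypothetical, a pthread mutex stands in for the rtnl mutex, and
a per-thread flag records whether the calling thread holds it. A real
kernel macro would need some notion of the lock owner as well (for
example via lockdep), because merely knowing that the mutex is locked
by someone does not say whether the caller is the one holding it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for the rtnl mutex (model only, not kernel code). */
static pthread_mutex_t model_rtnl = PTHREAD_MUTEX_INITIALIZER;

/* True only while the *calling* thread holds model_rtnl. */
static _Thread_local bool model_rtnl_held_by_me;

void model_rtnl_lock(void)
{
	pthread_mutex_lock(&model_rtnl);
	model_rtnl_held_by_me = true;
}

void model_rtnl_unlock(void)
{
	/* Unlocking a lock this thread does not hold is a caller bug. */
	assert(model_rtnl_held_by_me);
	model_rtnl_held_by_me = false;
	pthread_mutex_unlock(&model_rtnl);
}

/*
 * Hypothetical ASSERT_NO_RTNL() equivalent: warn and return false when
 * the caller already holds the lock, return true otherwise.
 */
bool model_assert_no_rtnl(const char *func)
{
	if (model_rtnl_held_by_me) {
		fprintf(stderr, "%s: called with rtnl held\n", func);
		return false;
	}
	return true;
}

/*
 * Models a function documented as "the rtnl lock must not be held when
 * calling this function" because it takes the lock itself further down.
 */
int model_destroy(void)
{
	if (!model_assert_no_rtnl(__func__))
		return -1;	/* bail out instead of self-deadlocking */

	model_rtnl_lock();
	/* ... teardown that needs the lock would go here ... */
	model_rtnl_unlock();
	return 0;
}
```

In this model, calling model_destroy() with the lock held is reported
and refused, while calling it without the lock succeeds. The early
return is a convenience of the model; it is an assumption, not a
statement about what a kernel assertion would do.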
On Fri, Sep 03, 2021 at 01:02:06AM +0200, Andrew Lunn wrote:
> We should try to keep phylink_create and phylink_destroy symmetrical:
>
> /**
>  * phylink_create() - create a phylink instance
>  * @config: a pointer to the target &struct phylink_config
>  * @fwnode: a pointer to a &struct fwnode_handle describing the network
>  *	interface
>  * @iface: the desired link mode defined by &typedef phy_interface_t
>  * @mac_ops: a pointer to a &struct phylink_mac_ops for the MAC.
>  *
>  * Create a new phylink instance, and parse the link parameters found in @np.
>  * This will parse in-band modes, fixed-link or SFP configuration.
>  *
>  * Note: the rtnl lock must not be held when calling this function.
>
> Having different locking requirements will catch people out.
>
> Interestingly, there is no ASSERT_NO_RTNL(). Maybe we should add such
> a macro.

In this case, the easiest might be to just take a different mutex in
dpaa2 which serializes all places that access the priv->mac references.
I don't know exactly why the SFP bus needs the rtnl_mutex, I've removed
those locks and will see what fails tomorrow, but I don't think dpaa2
has a good enough justification to take the rtnl_mutex just so that it
can connect and disconnect to the MAC freely at runtime.

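The "different mutex in dpaa2" idea above can be sketched in userspace
as follows. This is a minimal model, not the dpaa2 driver: struct
model_priv, struct model_mac, mac_lock and all model_* functions are
hypothetical stand-ins, and a pthread mutex plays the role of a
driver-private kernel mutex. It only shows the pattern of serializing
every access to priv->mac so that a reader never dereferences a freed
object.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the MAC object hanging off the netdev priv. */
struct model_mac {
	int link_speed;		/* stands in for state reached via mac->phylink */
};

struct model_priv {
	pthread_mutex_t mac_lock;	/* hypothetical lock protecting ->mac */
	struct model_mac *mac;		/* NULL while no MAC is connected */
};

void model_priv_init(struct model_priv *priv)
{
	pthread_mutex_init(&priv->mac_lock, NULL);
	priv->mac = NULL;
}

/* Endpoint connected: build the MAC object, then publish it under the lock. */
int model_connect_mac(struct model_priv *priv, int link_speed)
{
	struct model_mac *mac = malloc(sizeof(*mac));
	struct model_mac *old;

	if (!mac)
		return -ENOMEM;
	mac->link_speed = link_speed;

	pthread_mutex_lock(&priv->mac_lock);
	old = priv->mac;
	priv->mac = mac;
	pthread_mutex_unlock(&priv->mac_lock);

	free(old);	/* NULL unless a MAC was already connected */
	return 0;
}

/* Endpoint disconnected: unpublish under the lock, free afterwards. */
void model_disconnect_mac(struct model_priv *priv)
{
	struct model_mac *mac;

	pthread_mutex_lock(&priv->mac_lock);
	mac = priv->mac;
	priv->mac = NULL;
	pthread_mutex_unlock(&priv->mac_lock);

	/* Readers only dereference ->mac under mac_lock, so none can see it now. */
	free(mac);
}

/* Reader side, modelling an ethtool-style query that needs priv->mac. */
int model_get_link_speed(struct model_priv *priv, int *speed)
{
	int ret = -ENODEV;

	assert(speed != NULL);

	pthread_mutex_lock(&priv->mac_lock);
	if (priv->mac) {
		*speed = priv->mac->link_speed;
		ret = 0;
	}
	pthread_mutex_unlock(&priv->mac_lock);

	return ret;
}
```

Because the pointer is cleared under the lock before the object is
freed, the teardown itself runs without the lock held in this model.
Whether the same split would be safe in the real driver depends on what
else reaches the phylink instance, which this sketch does not cover.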
On Fri, Sep 03, 2021 at 02:26:07AM +0300, Vladimir Oltean wrote:
> On Fri, Sep 03, 2021 at 01:02:06AM +0200, Andrew Lunn wrote:
> > We should try to keep phylink_create and phylink_destroy symmetrical:
> >
> > /**
> >  * phylink_create() - create a phylink instance
> >  * @config: a pointer to the target &struct phylink_config
> >  * @fwnode: a pointer to a &struct fwnode_handle describing the network
> >  *	interface
> >  * @iface: the desired link mode defined by &typedef phy_interface_t
> >  * @mac_ops: a pointer to a &struct phylink_mac_ops for the MAC.
> >  *
> >  * Create a new phylink instance, and parse the link parameters found in @np.
> >  * This will parse in-band modes, fixed-link or SFP configuration.
> >  *
> >  * Note: the rtnl lock must not be held when calling this function.
> >
> > Having different locking requirements will catch people out.
> >
> > Interestingly, there is no ASSERT_NO_RTNL(). Maybe we should add such
> > a macro.
>
> In this case, the easiest might be to just take a different mutex in
> dpaa2 which serializes all places that access the priv->mac references.
> I don't know exactly why the SFP bus needs the rtnl_mutex, I've removed
> those locks and will see what fails tomorrow, but I don't think dpaa2
> has a good enough justification to take the rtnl_mutex just so that it
> can connect and disconnect to the MAC freely at runtime.

It needs it to ensure that the sfp-bus code is safe. sfp-bus code
sits between phylink and the sfp stuff, and will be called from
either side. It can't have its own lock, because that gives lockdep
splats.

Removing a lock and then running the kernel is a down right stupid
way to test to see if a lock is necessary.

That approach is like having built a iron bridge, covered it in paint,
then you remove most the bolts, and then test to see whether it's safe
for vehicles to travel over it by riding your bicycle across it and
declaring it safe.

Sorry, but if you think "remove lock, run kernel, if it works fine
the lock is unnecessary" is a valid approach, then you've just
disqualified yourself from discussing this topic any further.
Locking is done by knowing the code and code analysis, not by
playing "does the code fail if I remove it" games. I am utterly
shocked that you think that this is a valid approach.

On Thu, Sep 02, 2021 at 11:24:39PM +0100, Russell King (Oracle) wrote:
> On Fri, Sep 03, 2021 at 12:39:49AM +0300, Vladimir Oltean wrote:
> > On Thu, Sep 02, 2021 at 10:33:03PM +0100, Russell King (Oracle) wrote:
> > > That's probably an unreliable indicator. DPAA2 has weirdness in the
> > > way it can dynamically create and destroy network interfaces, which
> > > does lead to problems with the rtnl lock. I've been carrying a patch
> > > from NXP for this for almost two years now, which NXP still haven't
> > > submitted:
> > >
> > > http://git.armlinux.org.uk/cgit/linux-arm.git/commit/?h=cex7&id=a600f2ee50223e9bcdcf86b65b4c427c0fd425a4
> > >
> > > ... and I've no idea why that patch never made mainline. I need it
> > > to avoid the stated deadlock on SolidRun Honeycomb platforms when
> > > creating additional network interfaces for the SFP cages in userspace.
> >
> > Ah, nice, I've copied that broken logic for the dpaa2-switch too:
> > https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=d52ef12f7d6c016f3b249db95af33f725e3dd065
> >
> > So why don't you send the patch? I can send it too if you want to, one
> > for the switch and one for the DPNI driver.
>
> Sorry, I mis-stated. NXP did submit that exact patch, but it's actually
> incorrect for the reason I stated when it was sent:
>
> https://patchwork.ozlabs.org/project/netdev/patch/1574363727-5437-2-git-send-email-ioana.ciornei@nxp.com/
>
> I did miss the rtnl_lock() around phylink_disconnect_phy() in the
> description of the race, which goes someway towards hiding it, but
> there is still a race between phylink_destroy() and another thread
> calling dpaa2_eth_get_link_ksettings(), and priv->mac being freed:
>
> static int
> dpaa2_eth_get_link_ksettings(struct net_device *net_dev,
> 			     struct ethtool_link_ksettings *link_settings)
> {
> 	struct dpaa2_eth_priv *priv = netdev_priv(net_dev);
>
> 	if (dpaa2_eth_is_type_phy(priv))
> 		return phylink_ethtool_ksettings_get(priv->mac->phylink,
> 						     link_settings);
>
> which dereferences priv->mac and priv->mac->phylink, vs:
>
> static irqreturn_t dpni_irq0_handler_thread(int irq_num, void *arg)
> {
> ...
> 	if (status & DPNI_IRQ_EVENT_ENDPOINT_CHANGED) {
> 		dpaa2_eth_set_mac_addr(netdev_priv(net_dev));
> 		dpaa2_eth_update_tx_fqids(priv);
>
> 		if (dpaa2_eth_has_mac(priv))
> 			dpaa2_eth_disconnect_mac(priv);
> 		else
> 			dpaa2_eth_connect_mac(priv);
> 	}
>
> static void dpaa2_eth_disconnect_mac(struct dpaa2_eth_priv *priv)
> {
> 	if (dpaa2_eth_is_type_phy(priv))
> 		dpaa2_mac_disconnect(priv->mac);
>
> 	if (!dpaa2_eth_has_mac(priv))
> 		return;
>
> 	dpaa2_mac_close(priv->mac);
> 	kfree(priv->mac);		<== potential use after free bug by
> 	priv->mac = NULL;		<== dpaa2_eth_get_link_ksettings()
> }
>
> void dpaa2_mac_disconnect(struct dpaa2_mac *mac)
> {
> 	if (!mac->phylink)
> 		return;
>
> 	phylink_disconnect_phy(mac->phylink);
> 	phylink_destroy(mac->phylink); <== another use-after-free bug via
> 					   dpaa2_eth_get_link_ksettings()
> 	dpaa2_pcs_destroy(mac);
> }
>
> Note that phylink_destroy() is documented as:
>
>  * Note: the rtnl lock must not be held when calling this function.
>
> because it calls sfp_bus_del_upstream(), which will take the rtnl lock
> itself. An alternative solution would be to remove the rtnl locking
> from sfp_bus_del_upstream(), but then force _everyone_ to take the
> rtnl lock before calling phylink_destroy() - meaning a larger block of
> code ends up executing under the lock than is really necessary.
>
> However, as I stated in my review of the patch "As I've already stated,
> the phylink is not designed to be created and destroyed on a published
> network device." That still remains true today, and it seems that the
> issue has never been fixed in DPAA2 despite having been pointed out.
>

My attempt to fix this issue was that patch that you just pointed at.
Taking your feedback into account (that phylink is not designed to be
created and destroyed on a published networking device) I really do not
know what other viable solution to send out.

The alternative here would have been to just have a different driver for
the MAC side (probing on dpmac objects) that creates the phylink
instance at probe time and then is just used by the dpaa2-eth driver
when it connects to a dpmac. This way no phylink is created/destroyed
dynamically. This was the architecture of my initial attempt at
supporting phylink in DPAA2.

https://patchwork.ozlabs.org/project/netdev/patch/1560470153-26155-5-git-send-email-ioana.ciornei@nxp.com/

If you have any suggestion on how I should go about fixing this, please
let me know.

Ioana

On Fri, Sep 03, 2021 at 01:04:19AM +0100, Russell King (Oracle) wrote:
> Removing a lock and then running the kernel is a down right stupid
> way to test to see if a lock is necessary.
>
> That approach is like having built a iron bridge, covered it in paint,
> then you remove most the bolts, and then test to see whether it's safe
> for vehicles to travel over it by riding your bicycle across it and
> declaring it safe.
>
> Sorry, but if you think "remove lock, run kernel, if it works fine
> the lock is unnecessary" is a valid approach, then you've just
> disqualified yourself from discussing this topic any further.
> Locking is done by knowing the code and code analysis, not by
> playing "does the code fail if I remove it" games. I am utterly
> shocked that you think that this is a valid approach.

... and this is exactly why you will no longer get any attention from me
on this topic. Good luck.

On Fri, Sep 03, 2021 at 11:48:22PM +0300, Vladimir Oltean wrote:
> On Fri, Sep 03, 2021 at 01:04:19AM +0100, Russell King (Oracle) wrote:
> > Removing a lock and then running the kernel is a down right stupid
> > way to test to see if a lock is necessary.
> >
> > That approach is like having built a iron bridge, covered it in paint,
> > then you remove most the bolts, and then test to see whether it's safe
> > for vehicles to travel over it by riding your bicycle across it and
> > declaring it safe.
> >
> > Sorry, but if you think "remove lock, run kernel, if it works fine
> > the lock is unnecessary" is a valid approach, then you've just
> > disqualified yourself from discussing this topic any further.
> > Locking is done by knowing the code and code analysis, not by
> > playing "does the code fail if I remove it" games. I am utterly
> > shocked that you think that this is a valid approach.
>
> ... and this is exactly why you will no longer get any attention from me
> on this topic. Good luck.

Good, because your approach to this to me reads as "I don't think you
know what the hell you're doing so I'm going to remove a lock to test
whether it is needed." Effectively, that action is an insult towards
me as the author of that code.

And as I said, if you think that's a valid approach, then quite frankly
I don't want you touching my code, because you clearly don't know what
you're doing as you aren't willing to put the necessary effort in to
understanding the code.

Removing a lock and running the kernel is _never_ a valid way to see
whether the lock is required or not. The only way is via code analysis.

I wonder whether you'd take the same approach with filesystems or
memory management code. Why don't you try removing some locks from
those subsystems and see how long your filesystems last?

You could have asked why the lock was necessary, and I would have
described it. That would have been the civil approach. Maybe even
put forward a hypothesis why you think the lock isn't necessary, but
no, you decide that the best way to go about this is to remove the
lock and see whether the kernel breaks.

It may shock you to know that those of us who have been working on
the kernel for almost 30 years and have seen the evolution of the
kernel from uniprocessor to SMP, have had to debug race conditions
caused by a lack of locking know very well that you can have what
seems to be a functioning kernel despite missing locks - and such a
kernel can last quite a long time and only show up the race quite
rarely. This is exactly why "lets remove the lock and see if it
breaks" is a completely invalid approach. I'm sorry that you don't
seem to realise just how stupid a suggestion that was.

I can tell you now: removing the locks you proposed will not show an
immediate problem, but by removing those locks you will definitely
open up race conditions between driver binding events on the SFP
side and network usage on the netdev side which will only occur
rarely.

And just because they only happen rarely is not a justification to
remove locks, no matter how inconvenient those locks may be.

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 1c379d20812a..b22073b0acd2 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -128,13 +128,30 @@ static void deferred_probe_work_func(struct work_struct *work)
 }
 static DECLARE_WORK(deferred_probe_work, deferred_probe_work_func);
 
+static bool __device_pending_probe(struct device *dev)
+{
+	return !list_empty(&dev->p->deferred_probe);
+}
+
+bool device_pending_probe(struct device *dev)
+{
+	bool pending;
+
+	mutex_lock(&deferred_probe_mutex);
+	pending = __device_pending_probe(dev);
+	mutex_unlock(&deferred_probe_mutex);
+
+	return pending;
+}
+EXPORT_SYMBOL_GPL(device_pending_probe);
+
 void driver_deferred_probe_add(struct device *dev)
 {
 	if (!dev->can_match)
 		return;
 
 	mutex_lock(&deferred_probe_mutex);
-	if (list_empty(&dev->p->deferred_probe)) {
+	if (!__device_pending_probe(dev)) {
 		dev_dbg(dev, "Added to deferred list\n");
 		list_add_tail(&dev->p->deferred_probe, &deferred_probe_pending_list);
 	}
@@ -144,7 +161,7 @@ void driver_deferred_probe_add(struct device *dev)
 void driver_deferred_probe_del(struct device *dev)
 {
 	mutex_lock(&deferred_probe_mutex);
-	if (!list_empty(&dev->p->deferred_probe)) {
+	if (__device_pending_probe(dev)) {
 		dev_dbg(dev, "Removed from deferred list\n");
 		list_del_init(&dev->p->deferred_probe);
 		__device_set_deferred_probe_reason(dev, NULL);
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 52310df121de..2c22a32f0a1c 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -1386,8 +1386,16 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
 
 	/* Assume that if there is no driver, that it doesn't
 	 * exist, and we should use the genphy driver.
+	 * The exception is during probing, when the PHY driver might have
+	 * attempted a probe but has requested deferral. Since there might be
+	 * MAC drivers which also attach to the PHY during probe time, try
+	 * harder to bind the specific PHY driver, and defer the MAC driver's
+	 * probing until then.
 	 */
 	if (!d->driver) {
+		if (device_pending_probe(d))
+			return -EPROBE_DEFER;
+
 		if (phydev->is_c45)
 			d->driver = &genphy_c45_driver.mdiodrv.driver;
 		else
diff --git a/include/linux/device.h b/include/linux/device.h
index e270cb740b9e..505e77715789 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -889,6 +889,7 @@ int __must_check driver_attach(struct device_driver *drv);
 void device_initial_probe(struct device *dev);
 int __must_check device_reprobe(struct device *dev);
 
+bool device_pending_probe(struct device *dev);
 bool device_is_bound(struct device *dev);
 
 /*

There are systems where the PHY driver might get its probe deferred due
to a missing supplier, like an interrupt-parent, gpio, clock or whatever.

If the phy_attach_direct call happens right in between probe attempts,
the PHY library is greedy and assumes that a specific driver will never
appear, so it just binds the generic PHY driver.

In certain cases this is the wrong choice, because some PHYs simply need
the specific driver. The specific PHY driver was going to probe, given
enough time, but this doesn't seem to matter to phy_attach_direct.

To solve this, make phy_attach_direct check whether a specific PHY
driver is pending or not, and if it is, just defer the probing of the
MAC that's connecting to us a bit more too.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/base/dd.c            | 21 +++++++++++++++++++--
 drivers/net/phy/phy_device.c |  8 ++++++++
 include/linux/device.h       |  1 +
 3 files changed, 28 insertions(+), 2 deletions(-)
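
The decision described in the commit message above can be modelled in a
few lines of userspace C. This is only a sketch: struct model_phy, the
probe_deferred flag (standing in for device_pending_probe()) and the
model_* names are hypothetical, and MODEL_EPROBE_DEFER simply mirrors
the kernel-internal EPROBE_DEFER value of 517, which is not part of the
userspace errno set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the kernel-internal EPROBE_DEFER value (model constant). */
#define MODEL_EPROBE_DEFER 517

struct model_driver {
	const char *name;
};

struct model_phy {
	const struct model_driver *driver;	/* NULL until a driver binds */
	bool probe_deferred;	/* models device_pending_probe() being true */
	bool is_c45;
};

/* Stand-ins for the two generic PHY drivers. */
static const struct model_driver model_genphy = { "Generic PHY" };
static const struct model_driver model_genphy_c45 = { "Generic Clause 45 PHY" };

/*
 * Models the attach-time choice: keep a specific driver if one is bound,
 * ask the caller to retry if the specific driver's probe is still
 * pending, and only otherwise fall back to a generic driver.
 */
int model_phy_attach(struct model_phy *phy)
{
	assert(phy != NULL);

	if (!phy->driver) {
		if (phy->probe_deferred)
			return -MODEL_EPROBE_DEFER;

		phy->driver = phy->is_c45 ? &model_genphy_c45 : &model_genphy;
	}

	return 0;
}
```

A caller in this model that sees -MODEL_EPROBE_DEFER is expected to
retry later, which corresponds to the MAC driver's own probe being
deferred until the specific PHY driver has had its chance to bind.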