[net] net: phy: Warn if phy is attached when removing

Message ID	20220816163701.1578850-1-sean.anderson@seco.com (mailing list archive)
State	Changes Requested
Delegated to:	Netdev Maintainers
Headers	show Return-Path: <netdev-owner@kernel.org> From: Sean Anderson <sean.anderson@seco.com> To: Andrew Lunn <andrew@lunn.ch>, Heiner Kallweit <hkallweit1@gmail.com>, Russell King <linux@armlinux.org.uk>, netdev@vger.kernel.org Cc: Paolo Abeni <pabeni@redhat.com>, Jakub Kicinski <kuba@kernel.org>, "David S . Miller" <davem@davemloft.net>, linux-kernel@vger.kernel.org, Eric Dumazet <edumazet@google.com>, Vladimir Oltean <olteanv@gmail.com>, Sean Anderson <sean.anderson@seco.com> Subject: [PATCH net] net: phy: Warn if phy is attached when removing Date: Tue, 16 Aug 2022 12:37:01 -0400 Message-Id: <20220816163701.1578850-1-sean.anderson@seco.com> Content-Transfer-Encoding: 8bit Content-Type: text/plain MIME-Version: 1.0 Precedence: bulk
Series	[net] net: phy: Warn if phy is attached when removing \| expand [net] net: phy: Warn if phy is attached when removing

Context	Check	Description
netdev/tree_selection	success	Clearly marked for net
netdev/fixes_present	fail	Series targets non-next tree, but doesn't contain any Fixes tags
netdev/subject_prefix	success	Link
netdev/cover_letter	success	Single patches do not need cover letters
netdev/patch_count	success	Link
netdev/header_inline	success	No static functions without inline keyword in header files
netdev/build_32bit	success	Errors and warnings before: 85 this patch: 85
netdev/cc_maintainers	success	CCed 8 of 8 maintainers
netdev/build_clang	success	Errors and warnings before: 0 this patch: 0
netdev/module_param	success	Was 0 now: 0
netdev/verify_signedoff	success	Signed-off-by tag matches author and committer
netdev/check_selftest	success	No net selftest shell script
netdev/verify_fixes	success	No Fixes tag
netdev/build_allmodconfig_warn	success	Errors and warnings before: 85 this patch: 85
netdev/checkpatch	success	total: 0 errors, 0 warnings, 0 checks, 8 lines checked
netdev/kdoc	success	Errors and warnings before: 0 this patch: 0
netdev/source_inline	success	Was 0 now: 0

Sean Anderson Aug. 16, 2022, 4:37 p.m. UTC

netdevs using phylib can be oopsed from userspace in the following
manner:

$ ip link set $iface up
$ echo $(basename $(readlink /sys/class/net/$iface/phydev)) > \
      /sys/class/net/$iface/phydev/driver/unbind
$ ip link set $iface down

However, the traceback provided is a bit too late, since it does not
capture the root of the problem (unbinding the driver). It's also
possible that the memory has been reallocated if sufficient time passes
between when the phy is detached and when the netdev touches the phy
(which could result in silent memory corruption). Add a warning at the
source of the problem. A future patch could make this more robust by
calling dev_close.

Signed-off-by: Sean Anderson <sean.anderson@seco.com>
---
This is a result of discussion around my attempt to make PCS devices
proper devices [1]. Russell pointed out [2] that someone could unbind
the PCS at any time. However, the same issue applies to ethernet phys,
as well as serdes phys. Both of these can be unbound at any time, yet
no precautions are taken to avoid dereferencing a stale pointer.

As I discussed [3], we have (in general) four ways to approach this

- Just warn and accept that we are going to oops later on
- Create devlinks between the consumer/supplier
- Create a composite device composed of the consumer and its suppliers
- Add a callback (such as dev_close) and call it when the consumer goes
  away

It is (of course) also possible to rewrite phylib so that devices are
not used (preventing the phy from being unbound). However, I suspect
that this is quite undesirable.

This patch implements the first option, which, while fixing nothing, at
least provides some better debug information.

[1] https://lore.kernel.org/netdev/20220711160519.741990-1-sean.anderson@seco.com/
[2] https://lore.kernel.org/netdev/YsyPGMOiIGktUlqD@shell.armlinux.org.uk/
[3] https://lore.kernel.org/netdev/9747f8ef-66b3-0870-cbc0-c1783896b30d@seco.com/

 drivers/net/phy/phy_device.c | 2 ++
 1 file changed, 2 insertions(+)

Jakub Kicinski Aug. 18, 2022, 5:24 p.m. UTC | #1

On Tue, 16 Aug 2022 12:37:01 -0400 Sean Anderson wrote:
> netdevs using phylib can be oopsed from userspace in the following
> manner:
> 
> $ ip link set $iface up
> $ echo $(basename $(readlink /sys/class/net/$iface/phydev)) > \
>       /sys/class/net/$iface/phydev/driver/unbind
> $ ip link set $iface down
> 
> However, the traceback provided is a bit too late, since it does not
> capture the root of the problem (unbinding the driver). It's also
> possible that the memory has been reallocated if sufficient time passes
> between when the phy is detached and when the netdev touches the phy
> (which could result in silent memory corruption). Add a warning at the
> source of the problem. A future patch could make this more robust by
> calling dev_close.

Hm, so we're adding the warning to get more detailed reports "from the
field"? Guess we've all done that, so fair.

Acks? It can still make -rc2 if that matters...

Sean Anderson Aug. 18, 2022, 5:32 p.m. UTC | #2

On 8/18/22 1:24 PM, Jakub Kicinski wrote:
> On Tue, 16 Aug 2022 12:37:01 -0400 Sean Anderson wrote:
>> netdevs using phylib can be oopsed from userspace in the following
>> manner:
>> 
>> $ ip link set $iface up
>> $ echo $(basename $(readlink /sys/class/net/$iface/phydev)) > \
>>       /sys/class/net/$iface/phydev/driver/unbind
>> $ ip link set $iface down
>> 
>> However, the traceback provided is a bit too late, since it does not
>> capture the root of the problem (unbinding the driver). It's also
>> possible that the memory has been reallocated if sufficient time passes
>> between when the phy is detached and when the netdev touches the phy
>> (which could result in silent memory corruption). Add a warning at the
>> source of the problem. A future patch could make this more robust by
>> calling dev_close.
> 
> Hm, so we're adding the warning to get more detailed reports "from the
> field"? Guess we've all done that, so fair.

My suspicion is that this case never occurs, since users don't expect to
be able to remove the phy while the interface is running (and so don't
attempt it). If we do end up getting reports of this bug, then we will
need to create a more robust fix. My intention is to take the same
strategy for PCS devices as whatever we do here, since the issue is
analogous.

--Sean

Jakub Kicinski Aug. 19, 2022, 11:45 p.m. UTC | #3

On Tue, 16 Aug 2022 12:37:01 -0400 Sean Anderson wrote:
> netdevs using phylib can be oopsed from userspace in the following
> manner:
> 
> $ ip link set $iface up
> $ echo $(basename $(readlink /sys/class/net/$iface/phydev)) > \
>       /sys/class/net/$iface/phydev/driver/unbind
> $ ip link set $iface down
> 
> However, the traceback provided is a bit too late, since it does not
> capture the root of the problem (unbinding the driver). It's also
> possible that the memory has been reallocated if sufficient time passes
> between when the phy is detached and when the netdev touches the phy
> (which could result in silent memory corruption). Add a warning at the
> source of the problem. A future patch could make this more robust by
> calling dev_close.

FWIW, I think DaveM marked this patch as changes requested.

I don't really know enough to have an opinion.

PHY maintainers, anyone willing to cast a vote?

Russell King (Oracle) Aug. 20, 2022, 12:20 a.m. UTC | #4

On Fri, Aug 19, 2022 at 04:45:19PM -0700, Jakub Kicinski wrote:
> On Tue, 16 Aug 2022 12:37:01 -0400 Sean Anderson wrote:
> > netdevs using phylib can be oopsed from userspace in the following
> > manner:
> > 
> > $ ip link set $iface up
> > $ echo $(basename $(readlink /sys/class/net/$iface/phydev)) > \
> >       /sys/class/net/$iface/phydev/driver/unbind
> > $ ip link set $iface down
> > 
> > However, the traceback provided is a bit too late, since it does not
> > capture the root of the problem (unbinding the driver). It's also
> > possible that the memory has been reallocated if sufficient time passes
> > between when the phy is detached and when the netdev touches the phy
> > (which could result in silent memory corruption). Add a warning at the
> > source of the problem. A future patch could make this more robust by
> > calling dev_close.
> 
> FWIW, I think DaveM marked this patch as changes requested.
> 
> I don't really know enough to have an opinion.
> 
> PHY maintainers, anyone willing to cast a vote?

I don't think Linus is a fan of using WARN_ON() as an assert()
replacement, which this feels very much like that kind of thing.
I don't see much point in using WARN_ON() here as we'll soon get
a kernel oops anyway, and the backtrace WARN_ON() will produce isn't
useful - it'll be just noise.

So, I'd tone it down to something like:

	if (phydev->attached_dev)
		phydev_err(phydev, "Removing in-use PHY, expect an oops");

Maybe even introduce phydev_crit() just for this message.

Since we have bazillions of network drivers that hold a reference to
the phydev, I don't think we can do much better than this for phylib.
It would be a massive effort to go through all the network drivers
and try to work out how to fix them.

Addressing the PCS part of the patch posting and unrelated to what we
do for phylib...

However, I don't see "we'll do this for phylib, so we can do it for
PCS as well" as much of a sane argument - we don't have bazillions
of network drivers to fix to make this work for PCS. We currently
have no removable PCS (they don't appear with a driver so aren't
even bound to anything.) So we should add the appropriate handling
when we start to do this.

Phylink has the capability to take the link down when something goes
away, and if the PCS goes away, that's exactly what should happen,
rather than oopsing the kernel.

As MAC drivers hold a reference to the PCS instances, as they need to
select the appropriate one, how do MAC drivers get to know that the
PCS has gone away to drop their reference - and tell phylink that the
PCS has gone. That's the problem that needs solving to allow PCS to
be unbound if we're going to make them first class citizens of the
driver model.

I am no fan of "but XYZ doesn't care about it, so why should we care"
arguments. If I remember correctly, phylib pre-dates the device model,
and had the device model retrofitted, so it was a best-efforts
attempt - and suffered back then with the same problem of needing
lots of drivers to be changed in non-trivial ways.

We have the chance here to come up with something better - and I think
that chance should be used to full effect.

Sean Anderson Aug. 22, 2022, 4 p.m. UTC | #5

On 8/19/22 8:20 PM, Russell King (Oracle) wrote:
> On Fri, Aug 19, 2022 at 04:45:19PM -0700, Jakub Kicinski wrote:
>> On Tue, 16 Aug 2022 12:37:01 -0400 Sean Anderson wrote:
>> > netdevs using phylib can be oopsed from userspace in the following
>> > manner:
>> > 
>> > $ ip link set $iface up
>> > $ echo $(basename $(readlink /sys/class/net/$iface/phydev)) > \
>> >       /sys/class/net/$iface/phydev/driver/unbind
>> > $ ip link set $iface down
>> > 
>> > However, the traceback provided is a bit too late, since it does not
>> > capture the root of the problem (unbinding the driver). It's also
>> > possible that the memory has been reallocated if sufficient time passes
>> > between when the phy is detached and when the netdev touches the phy
>> > (which could result in silent memory corruption). Add a warning at the
>> > source of the problem. A future patch could make this more robust by
>> > calling dev_close.
>> 
>> FWIW, I think DaveM marked this patch as changes requested.
>> 
>> I don't really know enough to have an opinion.
>> 
>> PHY maintainers, anyone willing to cast a vote?
> 
> I don't think Linus is a fan of using WARN_ON() as an assert()
> replacement, which this feels very much like that kind of thing.
> I don't see much point in using WARN_ON() here as we'll soon get
> a kernel oops anyway, and the backtrace WARN_ON() will produce isn't
> useful - it'll be just noise.
> 
> So, I'd tone it down to something like:
> 
> 	if (phydev->attached_dev)
> 		phydev_err(phydev, "Removing in-use PHY, expect an oops");

That's fine by me

> Maybe even introduce phydev_crit() just for this message.
> 
> Since we have bazillions of network drivers that hold a reference to
> the phydev, I don't think we can do much better than this for phylib.
> It would be a massive effort to go through all the network drivers
> and try to work out how to fix them.

In the last thread I posted this snippet:

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index a74b320f5b27..05894e1c3e59 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -27,6 +27,7 @@
 #include <linux/phy.h>
 #include <linux/phy_led_triggers.h>
 #include <linux/property.h>
+#include <linux/rtnetlink.h>
 #include <linux/sfp.h>
 #include <linux/skbuff.h>
 #include <linux/slab.h>
@@ -3111,6 +3112,13 @@ static int phy_remove(struct device *dev)
 {
        struct phy_device *phydev = to_phy_device(dev);
 
+	// I'm pretty sure this races with multiple unbinds...
+       rtnl_lock();
+       device_unlock(dev);
+       dev_close(phydev->attached_dev);
+       device_lock(dev);
+       rtnl_unlock();
+       WARN_ON(phydev->attached_dev);
+
        cancel_delayed_work_sync(&phydev->state_queue);
 
        mutex_lock(&phydev->lock);
---

Would this be acceptable? Can the locking be fixed?

> Addressing the PCS part of the patch posting and unrelated to what we
> do for phylib...
> 
> However, I don't see "we'll do this for phylib, so we can do it for
> PCS as well" as much of a sane argument - we don't have bazillions
> of network drivers to fix to make this work for PCS. We currently
> have no removable PCS (they don't appear with a driver so aren't
> even bound to anything.) So we should add the appropriate handling
> when we start to do this.
> 
> Phylink has the capability to take the link down when something goes
> away, and if the PCS goes away, that's exactly what should happen,
> rather than oopsing the kernel.

Yeah, but we can't just call phylink_stop; we have to call the function
which will call phylink_stop, which is different for MAC drivers and
for DSA drivers. I think we'd need something like

struct phylink_pcs *pcs_get(struct device *dev, const char *id,
			    void (*detach)(struct phylink_pcs *, void *),
			    void *priv)

which would also require that pcs_get is called before phylink_start,
instead of in probe (which is what every existing user does).

> As MAC drivers hold a reference to the PCS instances, as they need to
> select the appropriate one, how do MAC drivers get to know that the
> PCS has gone away to drop their reference - and tell phylink that the
> PCS has gone. That's the problem that needs solving to allow PCS to
> be unbound if we're going to make them first class citizens of the
> driver model.
> 
> I am no fan of "but XYZ doesn't care about it, so why should we care"
> arguments. If I remember correctly, phylib pre-dates the device model,
> and had the device model retrofitted, so it was a best-efforts
> attempt - and suffered back then with the same problem of needing
> lots of drivers to be changed in non-trivial ways.
> 
> We have the chance here to come up with something better - and I think
> that chance should be used to full effect.

---

This whole thing has me asking the question: why do we allow unbinding
in-use devices in the first place?

--Sean

Russell King (Oracle) Aug. 22, 2022, 4:32 p.m. UTC | #6

On Mon, Aug 22, 2022 at 12:00:28PM -0400, Sean Anderson wrote:
> In the last thread I posted this snippet:
> 
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index a74b320f5b27..05894e1c3e59 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -27,6 +27,7 @@
>  #include <linux/phy.h>
>  #include <linux/phy_led_triggers.h>
>  #include <linux/property.h>
> +#include <linux/rtnetlink.h>
>  #include <linux/sfp.h>
>  #include <linux/skbuff.h>
>  #include <linux/slab.h>
> @@ -3111,6 +3112,13 @@ static int phy_remove(struct device *dev)
>  {
>         struct phy_device *phydev = to_phy_device(dev);
>  
> +	// I'm pretty sure this races with multiple unbinds...
> +       rtnl_lock();
> +       device_unlock(dev);
> +       dev_close(phydev->attached_dev);
> +       device_lock(dev);
> +       rtnl_unlock();
> +       WARN_ON(phydev->attached_dev);
> +
>         cancel_delayed_work_sync(&phydev->state_queue);
>  
>         mutex_lock(&phydev->lock);
> ---
> 
> Would this be acceptable? Can the locking be fixed?

I can't comment on that.

> > Addressing the PCS part of the patch posting and unrelated to what we
> > do for phylib...
> > 
> > However, I don't see "we'll do this for phylib, so we can do it for
> > PCS as well" as much of a sane argument - we don't have bazillions
> > of network drivers to fix to make this work for PCS. We currently
> > have no removable PCS (they don't appear with a driver so aren't
> > even bound to anything.) So we should add the appropriate handling
> > when we start to do this.
> > 
> > Phylink has the capability to take the link down when something goes
> > away, and if the PCS goes away, that's exactly what should happen,
> > rather than oopsing the kernel.
> 
> Yeah, but we can't just call phylink_stop; we have to call the function
> which will call phylink_stop, which is different for MAC drivers and
> for DSA drivers.

I think that's way too much and breaks the phylink design. phylink_stop
is designed to be called only from ndo_close() - and to be paired with
phylink_start().

When I talk about "taking the link down" what I mean by that is telling
the network device that the *link* *is* *down* and no more. In other
words, having phylink_resolve() know that there should be a PCS but it's
gone, and therefore the link should not come up in its current
configuration.

> I think we'd need something like
> 
> struct phylink_pcs *pcs_get(struct device *dev, const char *id,
> 			    void (*detach)(struct phylink_pcs *, void *),
> 			    void *priv)
> 
> which would also require that pcs_get is called before phylink_start,
> instead of in probe (which is what every existing user does).

That would at least allow the MAC driver to take action when the PCS
gets removed.

> This whole thing has me asking the question: why do we allow unbinding
> in-use devices in the first place?

The driver model was designed that way from the start, because in most
cases when something is unplugged from the system, the "remove" driver
callback is just a notification that the device has already gone.
Failing it makes no sense, because software can't magic the device
back.

Take an example of a USB device. The user unplugs it, it's gone from
the system, but the system briefly still believes the device to be
present for a while. It eventually notices that the device has gone,
and the USB layer unregisters the struct device - which has the effect
of unbinding the device from the driver and eventually cleaning up the
struct device.

This can and does happen with Ethernet PHYs ever since we started
supporting SFPs with Ethernet PHYs. The same thing is true there -
you can pull the module at any moment, it will be gone, and the
system will catch up with its physical disconnection some point later.
It's no good trying to make ->remove say "no, the device is still in
use, I won't let you remove it" because there's nothing software can
do to prevent the going of the device - the device has already
physically gone.

I don't think that's the case with PCS - do we really have any PCS
that can be disconnected from the system while it's running? Maybe
ones in a FPGA and the FPGA can be reprogrammed at runtime (yes,
people have really done this in the past.)

If we don't want to support "hotpluggable" PCS, then there's a
simple solution to this - the driver model has the facility to suppress
the bind/unbind driver files in sysfs, which means userspace can't
unbind the device. If there's also no way to make the struct device go
away in the kernel, then effectively the driver can then only be
unbound if the driver is built as a module.

At that point, we always have the option of insisting that PCS drivers
are never modules - and then we have the situation where a PCS will
never disappear from the system once the driver has picked up on it.

Of course, those PCS that are tightly bound to their MACs can still
come and go along with their MACs, but it's up to the MAC driver to
make that happen safely (unregistering the netdev first, which will
have the effect of calling ndo_close(), taking the network device
down and preventing further access to the netdev - standard netdev
MAC driver stuff.)

Sean Anderson Aug. 22, 2022, 5:02 p.m. UTC | #7

On 8/22/22 12:32 PM, Russell King (Oracle) wrote:
> On Mon, Aug 22, 2022 at 12:00:28PM -0400, Sean Anderson wrote:
>> In the last thread I posted this snippet:
>> 
>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>> index a74b320f5b27..05894e1c3e59 100644
>> --- a/drivers/net/phy/phy_device.c
>> +++ b/drivers/net/phy/phy_device.c
>> @@ -27,6 +27,7 @@
>>  #include <linux/phy.h>
>>  #include <linux/phy_led_triggers.h>
>>  #include <linux/property.h>
>> +#include <linux/rtnetlink.h>
>>  #include <linux/sfp.h>
>>  #include <linux/skbuff.h>
>>  #include <linux/slab.h>
>> @@ -3111,6 +3112,13 @@ static int phy_remove(struct device *dev)
>>  {
>>         struct phy_device *phydev = to_phy_device(dev);
>>  
>> +	// I'm pretty sure this races with multiple unbinds...
>> +       rtnl_lock();
>> +       device_unlock(dev);
>> +       dev_close(phydev->attached_dev);
>> +       device_lock(dev);
>> +       rtnl_unlock();
>> +       WARN_ON(phydev->attached_dev);
>> +
>>         cancel_delayed_work_sync(&phydev->state_queue);
>>  
>>         mutex_lock(&phydev->lock);
>> ---
>> 
>> Would this be acceptable? Can the locking be fixed?
> 
> I can't comment on that.

:l

I feel like doing device_unlock is very wrong, but someone in the dev_close
call calls device_lock and it deadlocks.

>> > Addressing the PCS part of the patch posting and unrelated to what we
>> > do for phylib...
>> > 
>> > However, I don't see "we'll do this for phylib, so we can do it for
>> > PCS as well" as much of a sane argument - we don't have bazillions
>> > of network drivers to fix to make this work for PCS. We currently
>> > have no removable PCS (they don't appear with a driver so aren't
>> > even bound to anything.) So we should add the appropriate handling
>> > when we start to do this.
>> > 
>> > Phylink has the capability to take the link down when something goes
>> > away, and if the PCS goes away, that's exactly what should happen,
>> > rather than oopsing the kernel.
>> 
>> Yeah, but we can't just call phylink_stop; we have to call the function
>> which will call phylink_stop, which is different for MAC drivers and
>> for DSA drivers.
> 
> I think that's way too much and breaks the phylink design. phylink_stop
> is designed to be called only from ndo_close() - and to be paired with
> phylink_start().

Well, the driver almost certainly wants to bring the link down (so that
it can stop using the PCS) Maybe we just need to call
phylink_run_resolve_and_disable(pl, PHYLINK_DISABLE_STOPPED)?

> When I talk about "taking the link down" what I mean by that is telling
> the network device that the *link* *is* *down* and no more. In other
> words, having phylink_resolve() know that there should be a PCS but it's
> gone, and therefore the link should not come up in its current
> configuration.

Fundamentally the driver is the one which owns the PCS, not phylink. The
driver is perfectly capable of touching the PCS by accident or on purpose,
including calling PCS operations directly. We already have two in-tree
drivers which do this (mt7530 and mvpp2). I also used this method when
conversion dpaa to phylink. In the patch before the conversion, I
switched to the lynx PCS instead of using the existing (ad-hoc) helpers.

So we can't just handle everything in phylink; the driver has to be
involved too. And of course, what happens when the link is brought back
up again? The driver will once again offer the (now bogus) PCS. Will we
have to track dead PCSs forever? What happens if another (valid) PCS gets
allocated at the same address as the old PCS?

>> I think we'd need something like
>> 
>> struct phylink_pcs *pcs_get(struct device *dev, const char *id,
>> 			    void (*detach)(struct phylink_pcs *, void *),
>> 			    void *priv)
>> 
>> which would also require that pcs_get is called before phylink_start,
>> instead of in probe (which is what every existing user does).
> 
> That would at least allow the MAC driver to take action when the PCS
> gets removed.
> 
>> This whole thing has me asking the question: why do we allow unbinding
>> in-use devices in the first place?
> 
> The driver model was designed that way from the start, because in most
> cases when something is unplugged from the system, the "remove" driver
> callback is just a notification that the device has already gone.
> Failing it makes no sense, because software can't magic the device
> back.
> 
> Take an example of a USB device. The user unplugs it, it's gone from
> the system, but the system briefly still believes the device to be
> present for a while. It eventually notices that the device has gone,
> and the USB layer unregisters the struct device - which has the effect
> of unbinding the device from the driver and eventually cleaning up the
> struct device.
> 
> This can and does happen with Ethernet PHYs ever since we started
> supporting SFPs with Ethernet PHYs. The same thing is true there -
> you can pull the module at any moment, it will be gone, and the
> system will catch up with its physical disconnection some point later.
> It's no good trying to make ->remove say "no, the device is still in
> use, I won't let you remove it" because there's nothing software can
> do to prevent the going of the device - the device has already
> physically gone.

OK, that clears things up.

> I don't think that's the case with PCS - do we really have any PCS
> that can be disconnected from the system while it's running? Maybe
> ones in a FPGA and the FPGA can be reprogrammed at runtime (yes,
> people have really done this in the past.)

This is actually a pretty good case for PCS drivers, since the MAC
has no idea what kind of PCS it's hooked up to when there's a PCS on
the FPGA.

> If we don't want to support "hotpluggable" PCS, then there's a
> simple solution to this - the driver model has the facility to suppress
> the bind/unbind driver files in sysfs, which means userspace can't
> unbind the device. If there's also no way to make the struct device go
> away in the kernel, then effectively the driver can then only be
> unbound if the driver is built as a module.
> 
> At that point, we always have the option of insisting that PCS drivers
> are never modules - and then we have the situation where a PCS will
> never disappear from the system once the driver has picked up on it.

Well can't we increment the module refcount?

> Of course, those PCS that are tightly bound to their MACs can still
> come and go along with their MACs, but it's up to the MAC driver to
> make that happen safely (unregistering the netdev first, which will
> have the effect of calling ndo_close(), taking the network device
> down and preventing further access to the netdev - standard netdev
> MAC driver stuff.)

And I'm not too worried about that; there's no need to create a separate
device for the PCS if it's always present.

--Sean

Andrew Lunn Aug. 22, 2022, 5:09 p.m. UTC | #8

> In the last thread I posted this snippet:
> 
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index a74b320f5b27..05894e1c3e59 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -27,6 +27,7 @@
>  #include <linux/phy.h>
>  #include <linux/phy_led_triggers.h>
>  #include <linux/property.h>
> +#include <linux/rtnetlink.h>
>  #include <linux/sfp.h>
>  #include <linux/skbuff.h>
>  #include <linux/slab.h>
> @@ -3111,6 +3112,13 @@ static int phy_remove(struct device *dev)
>  {
>         struct phy_device *phydev = to_phy_device(dev);
>  
> +	// I'm pretty sure this races with multiple unbinds...
> +       rtnl_lock();
> +       device_unlock(dev);
> +       dev_close(phydev->attached_dev);
> +       device_lock(dev);
> +       rtnl_unlock();
> +       WARN_ON(phydev->attached_dev);
> +
>         cancel_delayed_work_sync(&phydev->state_queue);
>  
>         mutex_lock(&phydev->lock);
> ---
> 
> Would this be acceptable? Can the locking be fixed?

Code like this should not be hidden in the PHY layer. If we decide to
go down this path it probably should be in net/core/dev.c.

I suggest you talk to the maintainers of that file, probably Eric
Dumazet, give him a clear explanation of the problem, and see what he
suggests.

	Andrew

[net] net: phy: Warn if phy is attached when removing

Checks

Commit Message

Comments

Patch