Message ID | 20210901225053.1205571-1-vladimir.oltean@nxp.com |
---|---|
Series | Make the PHY library stop being so greedy when binding the generic PHY driver |
On Thu, Sep 02, 2021 at 01:50:50AM +0300, Vladimir Oltean wrote:
> The central point of that discussion is that DSA seems "broken" for
> expecting the PHY driver to probe immediately on PHYs belonging to the
> internal MDIO buses of switches. A few suggestions were made about what
> to do, but some were not satisfactory and some did not solve the problem.

I think you need to describe the mechanism here. Why would a PHY
belonging to an internal MDIO bus of a switch not probe immediately?
What resources may not be available?

If we have a DSA driver that tries to probe the PHYs before e.g. the
interrupt controller inside the DSA switch has been configured, aren't
we just making completely unnecessary problems for ourselves?

Wouldn't it be saner to ensure that the interrupt controller has been
set up and become available prior to attempting to set up anything that
relies upon that interrupt controller?

From what I see of Marvell switches, the internal PHYs only ever rely
on internal resources of the switch they are embedded in.

External PHYs to the switch are a different matter - these can rely on
external clocks, and in that scenario, it would make sense for a
deferred probe to cause the entire switch to defer, since we don't
have all the resources for the switch to be functional (and, because we
want the PHYs to be present at switch probe time, not when we try to
bring up the interface, I don't see there's much other choice.)

Trying to move that to interface-up time /will/ break userspace - for
example, Debian's interfaces(8) bridge support will become unreliable,
and probably a whole host of other userspace. It will cause regressions
and instability to userspace. So that's a big no.

Maybe I'm missing exactly what the problem is...
On Thu, Sep 02, 2021 at 01:19:27PM +0100, Russell King (Oracle) wrote:
> On Thu, Sep 02, 2021 at 01:50:50AM +0300, Vladimir Oltean wrote:
> > The central point of that discussion is that DSA seems "broken" for
> > expecting the PHY driver to probe immediately on PHYs belonging to the
> > internal MDIO buses of switches. A few suggestions were made about what
> > to do, but some were not satisfactory and some did not solve the problem.
>
> I think you need to describe the mechanism here. Why would a PHY
> belonging to an internal MDIO bus of a switch not probe immediately?
> What resources may not be available?

As you point out below, the interrupt controller is what is not available.
There is a mechanism called fw_devlink which infers links from one OF
node to another based on phandles. When you have an interrupt-parent,
that OF node becomes a supplier to you. Those OF node links are then
transferred to device links once the devices having those OF nodes are
created.

> If we have a DSA driver that tries to probe the PHYs before e.g. the
> interrupt controller inside the DSA switch has been configured, aren't
> we just making completely unnecessary problems for ourselves?

This is not what happens. If that were the case, of course I would fix
_that_, and not in this way.

> Wouldn't it be saner to ensure that the interrupt controller has been
> set up and become available prior to attempting to set up anything that
> relies upon that interrupt controller?

The interrupt controller _has_ been set up. The trouble is that the
interrupt controller has the same OF node as the switch itself, and the
same OF node. Therefore, fw_devlink waits for the _entire_ switch to
finish probing; it has no insight into the fact that the
dependency is just on the interrupt controller.

> From what I see of Marvell switches, the internal PHYs only ever rely
> on internal resources of the switch they are embedded in.
>
> External PHYs to the switch are a different matter - these can rely on
> external clocks, and in that scenario, it would make sense for a
> deferred probe to cause the entire switch to defer, since we don't
> have all the resources for the switch to be functional (and, because we
> want the PHYs to be present at switch probe time, not when we try to
> bring up the interface, I don't see there's much other choice.)
>
> Trying to move that to interface-up time /will/ break userspace - for
> example, Debian's interfaces(8) bridge support will become unreliable,
> and probably a whole host of other userspace. It will cause regressions
> and instability to userspace. So that's a big no.

Why a big no?
I expect there to be 2 call paths of phy_attach_direct:
- At probe time. Both the MAC driver and the PHY driver are probing.
  This is what this patch addresses. There is no issue with returning
  -EPROBE_DEFER at that time, since drivers connect to the PHY before
  they register their netdev. So if connecting defers, there is no
  netdev to unregister, and user space knows nothing of this.
- At .ndo_open time. This is where it maybe gets interesting, but not to
  user space. If you open a netdev and it connects to the PHY then, I
  wouldn't expect the PHY to be undergoing a probing process; all of
  that should have been settled by then, should it not? Where it might
  get interesting is with NFS root, and I admit I haven't tested that.
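The two call paths of phy_attach_direct described above can be modelled in a few lines of userspace C. This is a toy sketch, not kernel code: toy_mac, toy_connect_phy(), toy_mac_probe() and toy_mac_open() are hypothetical names, and EPROBE_DEFER is redefined locally with the kernel's value. It only illustrates the ordering argument: a deferral at probe time happens before any netdev exists, whereas the same error at .ndo_open time is returned to whoever brought the interface up.

```c
#include <assert.h>
#include <stdbool.h>

/* Kernel-internal "retry this probe later" code; the value matches
 * include/linux/errno.h, redefined here because userspace lacks it. */
#define EPROBE_DEFER 517

/* Hypothetical stand-in for a MAC driver instance. */
struct toy_mac {
	bool phy_driver_bound;  /* has the PHY's specific driver probed? */
	bool netdev_registered; /* is the interface visible to userspace? */
};

/* Stand-in for connecting to the PHY: defer instead of falling back to
 * the generic PHY driver when the specific driver is not bound yet. */
static int toy_connect_phy(const struct toy_mac *mac)
{
	return mac->phy_driver_bound ? 0 : -EPROBE_DEFER;
}

/* Call path 1, probe time: connect first, register the netdev last,
 * so a deferral leaves nothing behind for userspace to observe. */
static int toy_mac_probe(struct toy_mac *mac)
{
	int err = toy_connect_phy(mac);

	if (err)
		return err; /* the driver core would retry this probe later */

	mac->netdev_registered = true; /* where register_netdev() would run */
	return 0;
}

/* Call path 2, .ndo_open time: the netdev already exists, so any error
 * here is reported to whoever brought the interface up. */
static int toy_mac_open(struct toy_mac *mac)
{
	assert(mac->netdev_registered);
	return toy_connect_phy(mac);
}
```

The ordering inside toy_mac_probe() carries the argument: as long as the PHY is connected before the netdev is registered, a deferral is invisible to userspace. Russell's objection further down concerns whether the second path can really assume that probing has already settled.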
On Thu, Sep 02, 2021 at 03:35:32PM +0300, Vladimir Oltean wrote:
> The interrupt controller _has_ been set up. The trouble is that the
> interrupt controller has the same OF node as the switch itself, and the
> same OF node. Therefore, fw_devlink waits for the _entire_ switch to

...and the same struct device, not "OF node" repeated twice, silly me.
On Thu, Sep 02, 2021 at 03:35:32PM +0300, Vladimir Oltean wrote:
> On Thu, Sep 02, 2021 at 01:19:27PM +0100, Russell King (Oracle) wrote:
> > Trying to move that to interface-up time /will/ break userspace - for
> > example, Debian's interfaces(8) bridge support will become unreliable,
> > and probably a whole host of other userspace. It will cause regressions
> > and instability to userspace. So that's a big no.
>
> Why a big no?

Fundamental rule of kernel programming: we do not break existing
userspace.

Debian has had support for configuring bridges at boot time via
the interfaces file for years. Breaking that is going to upset a
lot of people (me included), resulting in busted networks. It
would be a sure way to make oneself unpopular.

> I expect there to be 2 call paths of phy_attach_direct:
> - At probe time. Both the MAC driver and the PHY driver are probing.
>   This is what this patch addresses. There is no issue with returning
>   -EPROBE_DEFER at that time, since drivers connect to the PHY before
>   they register their netdev. So if connecting defers, there is no
>   netdev to unregister, and user space knows nothing of this.
> - At .ndo_open time. This is where it maybe gets interesting, but not to
>   user space. If you open a netdev and it connects to the PHY then, I
>   wouldn't expect the PHY to be undergoing a probing process; all of
>   that should have been settled by then, should it not? Where it might
>   get interesting is with NFS root, and I admit I haven't tested that.

I don't think you can make that assumption. Consider the case where
systemd is being used, DSA stuff is modular, and we're trying to
set up a bridge device on DSA. DSA could be probing while the bridge
is being set up.

Sadly, this isn't theoretical. I've ended up needing:

  pre-up sleep 1

in my bridge configuration to allow time for DSA to finish probing.
It's not a pleasant solution, nor a particularly reliable one at
that, but it currently works around the problem.

We don't need more cases of this kind of thing leading to boot time
unreliability... Or if we do, then we're turning Linux into Windows,
where you can end up with different behaviours each time the system
is booted, depending on the exact order that various stuff comes up.
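The `pre-up sleep 1` workaround waits a fixed second whether or not the switch ports exist yet. A sketch of the bounded-polling alternative, under stated assumptions: wait_for_ports() and iface_exists() are hypothetical helpers, the 100 ms interval is arbitrary, and existence via if_nametoindex() is only a proxy, since a DSA port can be registered before the switch tree has finished setting up. The bridge-utils-interfaces(5) option `bridge_waitport` is documented as waiting up to a given time for ports to become available, which is similar in spirit; this is not its implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <net/if.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Predicate telling whether a port exists; injectable so it can be faked. */
typedef bool (*port_exists_fn)(const char *name);

/* Real predicate: if_nametoindex() returns 0 when no such interface exists. */
static bool iface_exists(const char *name)
{
	return if_nametoindex(name) != 0;
}

/* Poll every 100 ms until every named port exists or roughly timeout_ms
 * has elapsed (only sleep time is counted, not wall-clock time).
 * Returns true if all ports appeared in time. */
static bool wait_for_ports(const char *const *ports, size_t n,
			   unsigned int timeout_ms, port_exists_fn exists)
{
	const struct timespec step = { .tv_sec = 0, .tv_nsec = 100L * 1000 * 1000 };
	unsigned int waited_ms = 0;

	assert(exists != NULL);
	for (;;) {
		size_t i = 0;

		while (i < n && exists(ports[i]))
			i++;
		if (i == n)
			return true;  /* every port is present */
		if (waited_ms >= timeout_ms)
			return false; /* give up rather than wait forever */
		nanosleep(&step, NULL);
		waited_ms += 100;
	}
}
```

Unlike a fixed sleep, this returns as soon as the ports are there and fails visibly when they never appear, but it still only papers over the ordering between kernel probing and userspace configuration.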
On Thu, Sep 02, 2021 at 02:26:35PM +0100, Russell King (Oracle) wrote:
> On Thu, Sep 02, 2021 at 03:35:32PM +0300, Vladimir Oltean wrote:
> > Why a big no?
>
> Fundamental rule of kernel programming: we do not break existing
> userspace.

Of course, I wasn't asking why we shouldn't be breaking user space, but
about the specifics of why this change would do that.

> I don't think you can make that assumption. Consider the case where
> systemd is being used, DSA stuff is modular, and we're trying to
> set up a bridge device on DSA. DSA could be probing while the bridge
> is being set up.
>
> Sadly, this isn't theoretical. I've ended up needing:
>
>   pre-up sleep 1
>
> in my bridge configuration to allow time for DSA to finish probing.
> It's not a pleasant solution, nor a particularly reliable one at
> that, but it currently works around the problem.

What problem? This is the first time I've heard of this report, and you
should definitely not need that.

I do have a system set up to use systemd-networkd, and I did want to try
this out:

$ for file in /etc/systemd/network/*; do echo ${file}; cat ${file}; done
/etc/systemd/network/br0.netdev
[NetDev]
Name=br0
Kind=bridge
[Bridge]
VLANFiltering=no
DefaultPVID=1
STP=no
[VLAN]
MVRP=no
/etc/systemd/network/br0.network
[Match]
Name=br0
[Network]
DHCP=ipv4
/etc/systemd/network/eth0.network
[Match]
Name=eth0
[Network]
Bridge=br0
/etc/systemd/network/eth1.network
[Match]
Name=eth1
[Network]
Bridge=br0
/etc/systemd/network/eth2.network
[Match]
Name=eth2
[Network]
LinkLocalAddressing=yes
/etc/systemd/network/swp.network
[Match]
Name=swp*
[Network]
BindCarrier=eth2
Bridge=br0

# Before
# bridge link
7: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br0 state forwarding priority 32 cost 4
8: eth1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 master br0 state disabled priority 32 cost 100

# Kick off the probing
$ insmod sja1105.ko
[ 34.922908] sja1105 spi0.1: Probed switch chip: SJA1105T
$ insmod tag_sja1105.ko
$ echo spi0.1 > /sys/bus/spi/drivers/sja1105/bind
[ 51.345993] sja1105 spi0.1: Probed switch chip: SJA1105T
[ 51.378063] sja1105 spi0.1 swp5 (uninitialized): PHY [mdio@2d24000:06] driver [Broadcom BCM5464] (irq=POLL)
[ 51.389880] sja1105 spi0.1 swp2 (uninitialized): PHY [mdio@2d24000:03] driver [Broadcom BCM5464] (irq=POLL)
[ 51.401806] sja1105 spi0.1 swp3 (uninitialized): PHY [mdio@2d24000:04] driver [Broadcom BCM5464] (irq=POLL)
[ 51.413710] sja1105 spi0.1 swp4 (uninitialized): PHY [mdio@2d24000:05] driver [Broadcom BCM5464] (irq=POLL)
[ 51.424859] fsl-gianfar soc:ethernet@2d90000 eth2: Link is Up - 1Gbps/Full - flow control off
[ 51.453768] sja1105 spi0.1: configuring for fixed/rgmii link mode
[ 51.460094] device eth2 entered promiscuous mode
[ 51.464856] DSA: tree 0 setup
[ 51.477105] br0: port 3(swp2) entered blocking state
[ 51.478394] sja1105 spi0.1: Link is Up - 1Gbps/Full - flow control off
[ 51.482080] br0: port 3(swp2) entered disabled state
[ 51.531585] device swp2 entered promiscuous mode
[ 51.550365] sja1105 spi0.1 swp2: configuring for phy/rgmii-id link mode
[ 51.559631] br0: port 4(swp5) entered blocking state
[ 51.564597] br0: port 4(swp5) entered disabled state
[ 51.586224] device swp5 entered promiscuous mode
[ 51.647483] sja1105 spi0.1 swp5: configuring for phy/rgmii-id link mode
[ 51.665995] br0: port 5(swp4) entered blocking state
[ 51.671004] br0: port 5(swp4) entered disabled state
[ 51.677991] device swp4 entered promiscuous mode
[ 51.685967] br0: port 6(swp3) entered blocking state
[ 51.690935] br0: port 6(swp3) entered disabled state
[ 51.698246] device swp3 entered promiscuous mode
[ 51.746640] sja1105 spi0.1 swp4: configuring for phy/rgmii-id link mode
[ 51.754986] sja1105 spi0.1 swp3: configuring for phy/rgmii-id link mode
[ 54.716225] sja1105 spi0.1 swp2: Link is Up - 1Gbps/Full - flow control off
[ 54.723208] IPv6: ADDRCONF(NETDEV_CHANGE): swp2: link becomes ready
[ 54.729620] br0: port 3(swp2) entered blocking state
[ 54.734576] br0: port 3(swp2) entered forwarding state
[ 54.796136] sja1105 spi0.1 swp5: Link is Up - 1Gbps/Full - flow control off
[ 54.803117] IPv6: ADDRCONF(NETDEV_CHANGE): swp5: link becomes ready
[ 54.809527] br0: port 4(swp5) entered blocking state
[ 54.814484] br0: port 4(swp5) entered forwarding state
[ 54.876397] sja1105 spi0.1 swp3: Link is Up - 1Gbps/Full - flow control off
[ 54.883378] IPv6: ADDRCONF(NETDEV_CHANGE): swp3: link becomes ready
[ 54.889790] br0: port 6(swp3) entered blocking state
[ 54.894744] br0: port 6(swp3) entered forwarding state

# After
$ bridge link
7: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br0 state forwarding priority 32 cost 4
8: eth1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 master br0 state disabled priority 32 cost 100
12: swp5@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br0 state forwarding priority 32 cost 4
13: swp2@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br0 state forwarding priority 32 cost 4
14: swp3@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br0 state forwarding priority 32 cost 4
15: swp4@eth2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 master br0 state disabled priority 32 cost 100

The ports are ready to pass traffic, and are doing so. So what does "wait
for DSA to finish probing" mean? What driver, kernel and systemd-networkd
version is this, exactly, and what is it that needs waiting for?
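One plausible reading of why the two setups behave differently, sketched as a toy model: Russell's ifupdown configuration lists the bridge ports explicitly and the setup scripts walk that list once, whereas the systemd-networkd setup above matches `Name=swp*` and can apply the .network file to each link as it appears. Everything below is hypothetical (toy_bridge, one_shot_setup(), on_new_link()); POSIX fnmatch() stands in for the glob match and is not a claim about networkd's implementation.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_PORTS 8

/* Hypothetical toy bridge: just records member names in join order. */
struct toy_bridge {
	const char *members[TOY_MAX_PORTS];
	size_t n;
};

static void toy_enslave(struct toy_bridge *br, const char *port)
{
	assert(br != NULL);
	if (br->n < TOY_MAX_PORTS)
		br->members[br->n++] = port;
}

/* One-shot strategy: walk the configured list once, at the moment the
 * script runs; ports that are not registered yet are silently skipped. */
static void one_shot_setup(struct toy_bridge *br,
			   const char *const *wanted, size_t n_wanted,
			   const char *const *present, size_t n_present)
{
	for (size_t i = 0; i < n_wanted; i++) {
		for (size_t j = 0; j < n_present; j++) {
			if (strcmp(wanted[i], present[j]) == 0) {
				toy_enslave(br, wanted[i]);
				break;
			}
		}
	}
}

/* Event-driven strategy: called once per new link as it appears; a glob
 * such as "swp*" (as in a [Match] Name= line) decides membership. */
static void on_new_link(struct toy_bridge *br, const char *pattern,
			const char *ifname)
{
	if (fnmatch(pattern, ifname, 0) == 0)
		toy_enslave(br, ifname);
}
```

If lan5, lan4 and lan3 are the only ports registered when the script runs, as in Russell's log, the one-shot pass enslaves only lan4; the event-driven pass picks up lan2 whenever it is eventually registered.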
On Thu, Sep 02, 2021 at 06:23:42PM +0300, Vladimir Oltean wrote:
> On Thu, Sep 02, 2021 at 02:26:35PM +0100, Russell King (Oracle) wrote:
> > Sadly, this isn't theoretical. I've ended up needing:
> >
> >   pre-up sleep 1
> >
> > in my bridge configuration to allow time for DSA to finish probing.
> > It's not a pleasant solution, nor a particularly reliable one at
> > that, but it currently works around the problem.
>
> What problem? This is the first time I've heard of this report, and you
> should definitely not need that.

I found it when upgrading the Clearfog by the DSL modems to v5.13.
With the previously working kernel (v5.7), it never had a problem.
With v5.13, it failed to add all the lan ports into the bridge, because
the bridge was still being set up by the kernel while userspace was
trying to configure it.

Note that I have extra debug in my kernels, hence the extra messages:

Aug 30 11:29:52 sw-dsl kernel: [ 3.308583] Marvell 88E1540 mv88e6xxx-0:03: probe: irq=78
Aug 30 11:29:52 sw-dsl kernel: [ 3.308595] Marvell 88E1540 mv88e6xxx-0:03: probe: irq=78
Aug 30 11:29:52 sw-dsl kernel: [ 3.332403] Marvell 88E1540 mv88e6xxx-0:04: probe: irq=79
Aug 30 11:29:52 sw-dsl kernel: [ 3.332415] Marvell 88E1540 mv88e6xxx-0:04: probe: irq=79
Aug 30 11:29:52 sw-dsl kernel: [ 3.412638] Marvell 88E1545 mv88e6xxx-0:0f: probe: irq=-1
Aug 30 11:29:52 sw-dsl kernel: [ 3.412649] Marvell 88E1545 mv88e6xxx-0:0f: probe: irq=-1
Aug 30 11:29:52 sw-dsl kernel: [ 3.515888] libphy: mv88e6xxx SMI: probed

Here, userspace starts configuring eno1, the ethernet port connected to
the DSA switch:

Aug 30 11:29:52 sw-dsl kernel: [ 3.536090] mvneta f1030000.ethernet eno1: configuring for inband/1000base-x link mode
Aug 30 11:29:52 sw-dsl kernel: [ 3.536109] mvneta f1030000.ethernet eno1: major config 1000base-x
Aug 30 11:29:52 sw-dsl kernel: [ 3.536117] mvneta f1030000.ethernet eno1: phylink_mac_config: mode=inband/1000base-x/Unknown/Unknown adv=0000000,00000200,00002240 pause=04 link=0 an=1
Aug 30 11:29:52 sw-dsl kernel: [ 3.536135] mvneta f1030000.ethernet eno1: mac link down
Aug 30 11:29:52 sw-dsl kernel: [ 3.536135] mvneta f1030000.ethernet eno1: mac link down
Aug 30 11:29:52 sw-dsl kernel: [ 3.536146] mvneta f1030000.ethernet eno1: mac link down
Aug 30 11:29:52 sw-dsl kernel: [ 3.536146] mvneta f1030000.ethernet eno1: mac link down
Aug 30 11:29:52 sw-dsl kernel: [ 3.572013] mvneta f1030000.ethernet eno1: mac link up
Aug 30 11:29:52 sw-dsl kernel: [ 3.572016] mvneta f1030000.ethernet eno1: mac link up
Aug 30 11:29:52 sw-dsl kernel: [ 3.572046] mvneta f1030000.ethernet eno1: Link is Up - 1Gbps/Full - flow control rx/tx
Aug 30 11:29:52 sw-dsl kernel: [ 3.657820] 8021q: 802.1Q VLAN Support v1.8

We get the link to eno1 going down/up due to DSA's actions:

Aug 30 11:29:53 sw-dsl kernel: [ 4.291882] mvneta f1030000.ethernet eno1: Link is Down
Aug 30 11:29:53 sw-dsl kernel: [ 4.309425] mvneta f1030000.ethernet eno1: mac link down
Aug 30 11:29:53 sw-dsl kernel: [ 4.309425] mvneta f1030000.ethernet eno1: mac link down
Aug 30 11:29:53 sw-dsl kernel: [ 4.309440] mvneta f1030000.ethernet eno1: configuring for inband/1000base-x link mode
Aug 30 11:29:53 sw-dsl kernel: [ 4.309447] mvneta f1030000.ethernet eno1: major config 1000base-x
Aug 30 11:29:53 sw-dsl kernel: [ 4.309454] mvneta f1030000.ethernet eno1: phylink_mac_config: mode=inband/1000base-x/Unknown/Unknown adv=0000000,00000200,00002240 pause=04 link=0 an=1
Aug 30 11:29:53 sw-dsl kernel: [ 4.345013] mvneta f1030000.ethernet eno1: mac link up
Aug 30 11:29:53 sw-dsl kernel: [ 4.345014] mvneta f1030000.ethernet eno1: mac link up
Aug 30 11:29:53 sw-dsl kernel: [ 4.345036] mvneta f1030000.ethernet eno1: Link is Up - 1Gbps/Full - flow control rx/tx

DSA then starts initialising the ports:

Aug 30 11:29:53 sw-dsl kernel: [ 4.397647] mv88e6085 f1072004.mdio-mii:04 lan5 (uninitialized): PHY [mv88e6xxx-0:00] driver [Marvell 88E1540] (irq=75)
Aug 30 11:29:53 sw-dsl kernel: [ 4.397663] mv88e6085 f1072004.mdio-mii:04 lan5 (uninitialized): phy: setting supported 0000000,00000000,000022ef advertising 0000000,00000000,000022ef
Aug 30 11:29:53 sw-dsl kernel: [ 4.493080] mv88e6085 f1072004.mdio-mii:04 lan4 (uninitialized): PHY [mv88e6xxx-0:01] driver [Marvell 88E1540] (irq=76)
Aug 30 11:29:53 sw-dsl kernel: [ 4.493093] mv88e6085 f1072004.mdio-mii:04 lan4 (uninitialized): phy: setting supported 0000000,00000000,000022ef advertising 0000000,00000000,000022ef
Aug 30 11:29:53 sw-dsl kernel: [ 4.577070] mv88e6085 f1072004.mdio-mii:04 lan3 (uninitialized): PHY [mv88e6xxx-0:02] driver [Marvell 88E1540] (irq=77)
Aug 30 11:29:53 sw-dsl kernel: [ 4.577081] mv88e6085 f1072004.mdio-mii:04 lan3 (uninitialized): phy: setting supported 0000000,00000000,000022ef advertising 0000000,00000000,000022ef

Meanwhile, userspace is trying to set up the bridge and has already
tried to add the not-yet-existent lan2 at this point; lan4, however, was
created just in time, so Debian's bridge support adds it to the brdsl
bridge:

Aug 30 11:29:53 sw-dsl kernel: [ 4.652237] brdsl: port 1(lan4) entered blocking state
Aug 30 11:29:53 sw-dsl kernel: [ 4.652250] brdsl: port 1(lan4) entered disabled state

DSA continues setting up the other ports, here lan2, but the bridge
setup scripts have already moved on past lan2:

Aug 30 11:29:53 sw-dsl kernel: [ 4.674038] mv88e6085 f1072004.mdio-mii:04 lan2 (uninitialized): PHY [mv88e6xxx-0:03] driver [Marvell 88E1540] (irq=78)
Aug 30 11:29:53 sw-dsl kernel: [ 4.674052] mv88e6085 f1072004.mdio-mii:04 lan2 (uninitialized): phy: setting supported 0000000,00000000,000022ef advertising 0000000,00000000,000022ef
Aug 30 11:29:53 sw-dsl kernel: [ 4.674612] device lan4 entered promiscuous mode
Aug 30 11:29:53 sw-dsl kernel: [ 4.785886] device eno1 entered promiscuous mode
Aug 30 11:29:53 sw-dsl kernel: [ 4.786971] mv88e6085 f1072004.mdio-mii:04 lan4: configuring for phy/gmii link mode
Aug 30 11:29:53 sw-dsl kernel: [ 4.786980] mv88e6085 f1072004.mdio-mii:04 lan4: major config gmii
Aug 30 11:29:53 sw-dsl kernel: [ 4.786986] mv88e6085 f1072004.mdio-mii:04 lan4: phylink_mac_config: mode=phy/gmii/Unknown/Unknown adv=0000000,00000000,00000000 pause=00 link=0 an=0
Aug 30 11:29:53 sw-dsl kernel: [ 4.786996] mv88e6085 f1072004.mdio-mii:04: p1: dsa_port_phylink_mac_config()
Aug 30 11:29:53 sw-dsl kernel: [ 4.789977] 8021q: adding VLAN 0 to HW filter on device lan4
Aug 30 11:29:53 sw-dsl kernel: [ 4.836720] brdsl: port 2(eno2) entered blocking state
Aug 30 11:29:53 sw-dsl kernel: [ 4.836733] brdsl: port 2(eno2) entered disabled state

Here, the SFP port (on eno2) is added to the bridge.
Aug 30 11:29:53 sw-dsl kernel: [ 4.836907] device eno2 entered promiscuous mode Aug 30 11:29:53 sw-dsl kernel: [ 4.837011] brdsl: port 2(eno2) entered blocking state Aug 30 11:29:53 sw-dsl kernel: [ 4.837019] brdsl: port 2(eno2) entered forwarding state Aug 30 11:29:53 sw-dsl kernel: [ 4.837058] IPv6: ADDRCONF(NETDEV_CHANGE): brdsl: link becomes ready Aug 30 11:29:53 sw-dsl kernel: [ 4.846989] mv88e6085 f1072004.mdio-mii:04 lan4: phy link down gmii/Unknown/Unknown/off Aug 30 11:29:53 sw-dsl kernel: [ 4.896264] mv88e6085 f1072004.mdio-mii:04 lan1 (uninitialized): PHY [mv88e6xxx-0:04] driver [Marvell 88E1540] (irq=79) Aug 30 11:29:53 sw-dsl kernel: [ 4.896278] mv88e6085 f1072004.mdio-mii:04 lan1 (uninitialized): phy: setting supported 0000000,00000000,000022ef advertising 0000000,00000000,000022ef Aug 30 11:29:53 sw-dsl kernel: [ 4.934514] DSA: tree 0 setup Here, the DSA tree has finally finished initialising in the kernel. Aug 30 11:29:53 sw-dsl kernel: [ 4.986877] mv88e6085 f1072004.mdio-mii:04 lan1: configuring for phy/gmii link mode Aug 30 11:29:53 sw-dsl kernel: [ 4.986890] mv88e6085 f1072004.mdio-mii:04 lan1: major config gmii Aug 30 11:29:53 sw-dsl kernel: [ 4.986896] mv88e6085 f1072004.mdio-mii:04 lan1: phylink_mac_config: mode=phy/gmii/Unknown/Unknown adv=0000000,00000000,00000000 pause=00 link=0 an=0 Aug 30 11:29:53 sw-dsl kernel: [ 4.986907] mv88e6085 f1072004.mdio-mii:04: p4: dsa_port_phylink_mac_config() Aug 30 11:29:53 sw-dsl kernel: [ 4.990199] 8021q: adding VLAN 0 to HW filter on device lan1 Aug 30 11:29:54 sw-dsl kernel: [ 5.041313] mv88e6085 f1072004.mdio-mii:04 lan1: phy link down gmii/Unknown/Unknown/off Aug 30 11:29:56 sw-dsl kernel: [ 7.630016] mv88e6085 f1072004.mdio-mii:04 lan4: phy link up gmii/1Gbps/Full/off Aug 30 11:29:56 sw-dsl kernel: [ 7.630031] mv88e6085 f1072004.mdio-mii:04 lan4: phylink_mac_config: mode=phy/gmii/1Gbps/Full adv=0000000,00000000,00000000 pause=00 link=1 an=0 Aug 30 11:29:56 sw-dsl kernel: [ 7.630043] mv88e6085 
f1072004.mdio-mii:04: p1: dsa_port_phylink_mac_config() Aug 30 11:29:56 sw-dsl kernel: [ 7.630294] mv88e6085 f1072004.mdio-mii:04 lan4: Link is Up - 1Gbps/Full - flow control off Aug 30 11:29:56 sw-dsl kernel: [ 7.630312] brdsl: port 1(lan4) entered blocking state Aug 30 11:29:56 sw-dsl kernel: [ 7.630321] brdsl: port 1(lan4) entered forwarding state I then notice that my Internet connection hasn't come back, so I start poking about with it, first adding it to the bridge: Aug 30 11:31:13 sw-dsl kernel: [ 84.990122] brdsl: port 3(lan2) entered blocking state Aug 30 11:31:13 sw-dsl kernel: [ 84.990134] brdsl: port 3(lan2) entered disabled state Aug 30 11:31:14 sw-dsl kernel: [ 85.063971] device lan2 entered promiscuous mode And then setting it to up state and configuring its vlan settings: Aug 30 11:32:45 sw-dsl kernel: [ 176.476090] mv88e6085 f1072004.mdio-mii:04 lan2: configuring for phy/gmii link mode Aug 30 11:32:45 sw-dsl kernel: [ 176.476103] mv88e6085 f1072004.mdio-mii:04 lan2: major config gmii Aug 30 11:32:45 sw-dsl kernel: [ 176.476109] mv88e6085 f1072004.mdio-mii:04 lan2: phylink_mac_config: mode=phy/gmii/Unknown/Unknown adv=0000000,00000000,00000000 pause=00 link=0 an=0 Aug 30 11:32:45 sw-dsl kernel: [ 176.476120] mv88e6085 f1072004.mdio-mii:04: p3: dsa_port_phylink_mac_config() Aug 30 11:32:45 sw-dsl kernel: [ 176.479495] 8021q: adding VLAN 0 to HW filter on device lan2 Aug 30 11:32:45 sw-dsl kernel: [ 176.537796] mv88e6085 f1072004.mdio-mii:04 lan2: phy link down gmii/Unknown/Unknown/off Aug 30 11:32:48 sw-dsl kernel: [ 179.280863] mv88e6085 f1072004.mdio-mii:04 lan2: phy link up gmii/1Gbps/Full/rx/tx Aug 30 11:32:48 sw-dsl kernel: [ 179.280877] mv88e6085 f1072004.mdio-mii:04 lan2: phylink_mac_config: mode=phy/gmii/1Gbps/Full adv=0000000,00000000,00000000 pause=03 link=1 an=0 Aug 30 11:32:48 sw-dsl kernel: [ 179.280888] mv88e6085 f1072004.mdio-mii:04: p3: dsa_port_phylink_mac_config() Aug 30 11:32:48 sw-dsl kernel: [ 179.280894] mv88e6085 
f1072004.mdio-mii:04: p3: dsa_port_phylink_mac_link_up()
Aug 30 11:32:48 sw-dsl kernel: [ 179.282958] mv88e6085 f1072004.mdio-mii:04 lan2: Link is Up - 1Gbps/Full - flow control rx/tx

I had:

iface brdsl inet manual
    bridge-ports lan2 lan4
    bridge-maxwait 0
    up brctl addif $IFACE eno2

I now have:

iface brdsl inet manual
    bridge-ports lan2 lan4
    bridge-waitport 10
    bridge-maxwait 0
    pre-up sleep 1
    up brctl addif $IFACE eno2

to ensure that all ports get properly configured.

What can be seen from the above is that there is most definitely a race. It is possible to start configuring a DSA switch before the DSA switch driver has finished being probed by the kernel.

Here is the kernel log from v5.7, which has never shown these problems, because DSA always seemed to set up everything in kernel space before userspace began configuring it:

Aug 25 23:03:54 sw-dsl kernel: [ 5.793137] mvneta f1030000.ethernet eno1: configuring for inband/1000base-x link mode
Aug 25 23:03:54 sw-dsl kernel: [ 5.793148] mvneta f1030000.ethernet eno1: config interface 1000base-x
Aug 25 23:03:54 sw-dsl kernel: [ 5.793157] mvneta f1030000.ethernet eno1: phylink_mac_config: mode=inband/1000base-x/Unknown/Unknown adv=000,00000200,00002240 pause=04 link=0 an=1
Aug 25 23:03:54 sw-dsl kernel: [ 5.793168] mvneta f1030000.ethernet eno1: mac link down
Aug 25 23:03:54 sw-dsl kernel: [ 5.793170] mvneta f1030000.ethernet eno1: mac link down
Aug 25 23:03:54 sw-dsl kernel: [ 5.819769] mvneta f1030000.ethernet eno1: mac link up
Aug 25 23:03:54 sw-dsl kernel: [ 5.819792] mvneta f1030000.ethernet eno1: Link is Up - 1Gbps/Full - flow control rx/tx
Aug 25 23:03:54 sw-dsl kernel: [ 5.948900] 8021q: 802.1Q VLAN Support v1.8
6.459779] mv88e6085 f1072004.mdio-mii:04: nonfatal error -95 setting MTU on port 0
Aug 25 23:03:54 sw-dsl kernel: [ 6.462890] mv88e6085 f1072004.mdio-mii:04 lan5 (uninitialized): PHY [mv88e6xxx-0:00] driver [Marvell 88E1540] (irq=67)
Aug 25 23:03:54 sw-dsl kernel: [ 6.462905] mv88e6085
f1072004.mdio-mii:04 lan5 (uninitialized): phy: setting supported 000,00000000,000022ef advertising 000,00000000,000022ef Aug 25 23:03:54 sw-dsl kernel: [ 6.465904] mv88e6085 f1072004.mdio-mii:04: nonfatal error -95 setting MTU on port 1 Aug 25 23:03:54 sw-dsl kernel: [ 6.468101] mv88e6085 f1072004.mdio-mii:04 lan4 (uninitialized): PHY [mv88e6xxx-0:01] driver [Marvell 88E1540] (irq=68) Aug 25 23:03:54 sw-dsl kernel: [ 6.468109] mv88e6085 f1072004.mdio-mii:04 lan4 (uninitialized): phy: setting supported 000,00000000,000022ef advertising 000,00000000,000022ef Aug 25 23:03:54 sw-dsl kernel: [ 6.472162] mv88e6085 f1072004.mdio-mii:04: nonfatal error -95 setting MTU on port 2 Aug 25 23:03:54 sw-dsl kernel: [ 6.474247] mv88e6085 f1072004.mdio-mii:04 lan3 (uninitialized): PHY [mv88e6xxx-0:02] driver [Marvell 88E1540] (irq=69) Aug 25 23:03:54 sw-dsl kernel: [ 6.474261] mv88e6085 f1072004.mdio-mii:04 lan3 (uninitialized): phy: setting supported 000,00000000,000022ef advertising 000,00000000,000022ef Aug 25 23:03:54 sw-dsl kernel: [ 6.481824] mv88e6085 f1072004.mdio-mii:04: nonfatal error -95 setting MTU on port 3 Aug 25 23:03:54 sw-dsl kernel: [ 6.486354] mv88e6085 f1072004.mdio-mii:04 lan2 (uninitialized): PHY [mv88e6xxx-0:03] driver [Marvell 88E1540] (irq=70) Aug 25 23:03:54 sw-dsl kernel: [ 6.486363] mv88e6085 f1072004.mdio-mii:04 lan2 (uninitialized): phy: setting supported 000,00000000,000022ef advertising 000,00000000,000022ef Aug 25 23:03:54 sw-dsl kernel: [ 6.498494] mv88e6085 f1072004.mdio-mii:04: nonfatal error -95 setting MTU on port 4 Aug 25 23:03:54 sw-dsl kernel: [ 6.502272] mv88e6085 f1072004.mdio-mii:04 lan1 (uninitialized): PHY [mv88e6xxx-0:04] driver [Marvell 88E1540] (irq=71) Aug 25 23:03:54 sw-dsl kernel: [ 6.502279] mv88e6085 f1072004.mdio-mii:04 lan1 (uninitialized): phy: setting supported 000,00000000,000022ef advertising 000,00000000,000022ef Aug 25 23:03:54 sw-dsl kernel: [ 6.532258] mv88e6085 f1072004.mdio-mii:04: nonfatal error -95 setting MTU on 
port 6 Aug 25 23:03:54 sw-dsl kernel: [ 6.535877] mvneta f1030000.ethernet eno1: Link is Down Aug 25 23:03:54 sw-dsl kernel: [ 6.541733] mvneta f1030000.ethernet eno1: configuring for inband/1000base-x link mode Aug 25 23:03:54 sw-dsl kernel: [ 6.541741] mvneta f1030000.ethernet eno1: config interface 1000base-x Aug 25 23:03:54 sw-dsl kernel: [ 6.541754] mvneta f1030000.ethernet eno1: phylink_mac_config: mode=inband/1000base-x/Unknown/Unknown adv=000,00000200,00002240 pause=04 link=0 an=1 Aug 25 23:03:54 sw-dsl kernel: [ 6.541771] mvneta f1030000.ethernet eno1: mac link down Aug 25 23:03:54 sw-dsl kernel: [ 6.541779] mvneta f1030000.ethernet eno1: mac link down Aug 25 23:03:54 sw-dsl kernel: [ 6.541907] DSA: tree 0 setup Here, the kernel DSA switch driver has finished doing its setup before we even get to configuring the bridge device below. Aug 25 23:03:54 sw-dsl kernel: [ 6.569105] mvneta f1030000.ethernet eno1: mac link up Aug 25 23:03:54 sw-dsl kernel: [ 6.569113] mvneta f1030000.ethernet eno1: mac link up Aug 25 23:03:54 sw-dsl kernel: [ 6.569139] mvneta f1030000.ethernet eno1: Link is Up - 1Gbps/Full - flow control rx/tx Aug 25 23:03:55 sw-dsl kernel: [ 6.931763] brdsl: port 1(lan2) entered blocking state Aug 25 23:03:55 sw-dsl kernel: [ 6.931769] brdsl: port 1(lan2) entered disabled state Aug 25 23:03:55 sw-dsl kernel: [ 6.932863] device lan2 entered promiscuous mode Aug 25 23:03:55 sw-dsl kernel: [ 7.032838] device eno1 entered promiscuous mode Aug 25 23:03:55 sw-dsl kernel: [ 7.032902] mv88e6085 f1072004.mdio-mii:04 lan2: configuring for phy/gmii link mode Aug 25 23:03:55 sw-dsl kernel: [ 7.032907] mv88e6085 f1072004.mdio-mii:04 lan2: config interface gmii Aug 25 23:03:55 sw-dsl kernel: [ 7.032916] mv88e6085 f1072004.mdio-mii:04 lan2: phylink_mac_config: mode=phy/gmii/Unknown/Unknown adv=000,00000000,00000000 pause=00 link=0 an=0 Aug 25 23:03:55 sw-dsl kernel: [ 7.032920] mv88e6085 f1072004.mdio-mii:04: p3: dsa_port_phylink_mac_config() Aug 25 23:03:55 
sw-dsl kernel: [ 7.037225] 8021q: adding VLAN 0 to HW filter on device lan2 Aug 25 23:03:55 sw-dsl kernel: [ 7.044979] brdsl: port 2(lan4) entered blocking state Aug 25 23:03:55 sw-dsl kernel: [ 7.044985] brdsl: port 2(lan4) entered disabled state Aug 25 23:03:55 sw-dsl kernel: [ 7.056189] device lan4 entered promiscuous mode Aug 25 23:03:55 sw-dsl kernel: [ 7.107067] mv88e6085 f1072004.mdio-mii:04 lan4: configuring for phy/gmii link mode Aug 25 23:03:55 sw-dsl kernel: [ 7.107073] mv88e6085 f1072004.mdio-mii:04 lan4: config interface gmii Aug 25 23:03:55 sw-dsl kernel: [ 7.107080] mv88e6085 f1072004.mdio-mii:04 lan4: phylink_mac_config: mode=phy/gmii/Unknown/Unknown adv=000,00000000,00000000 pause=00 link=0 an=0 Aug 25 23:03:55 sw-dsl kernel: [ 7.107084] mv88e6085 f1072004.mdio-mii:04: p1: dsa_port_phylink_mac_config() Aug 25 23:03:55 sw-dsl kernel: [ 7.118831] 8021q: adding VLAN 0 to HW filter on device lan4 Aug 25 23:03:55 sw-dsl kernel: [ 7.153604] brdsl: port 3(eno2) entered blocking state Aug 25 23:03:55 sw-dsl kernel: [ 7.153610] brdsl: port 3(eno2) entered disabled state Aug 25 23:03:55 sw-dsl kernel: [ 7.153720] mv88e6085 f1072004.mdio-mii:04 lan2: phy link down gmii/Unknown/Unknown/off Aug 25 23:03:55 sw-dsl kernel: [ 7.153790] device eno2 entered promiscuous mode Aug 25 23:03:55 sw-dsl kernel: [ 7.153890] brdsl: port 3(eno2) entered blocking state Aug 25 23:03:55 sw-dsl kernel: [ 7.153895] brdsl: port 3(eno2) entered forwarding state Aug 25 23:03:55 sw-dsl kernel: [ 7.153930] IPv6: ADDRCONF(NETDEV_CHANGE): brdsl: link becomes ready Aug 25 23:03:55 sw-dsl kernel: [ 7.295739] mv88e6085 f1072004.mdio-mii:04 lan4: phy link down gmii/Unknown/Unknown/off Aug 25 23:03:55 sw-dsl kernel: [ 7.575615] mv88e6085 f1072004.mdio-mii:04 lan1: configuring for phy/gmii link mode Aug 25 23:03:55 sw-dsl kernel: [ 7.575622] mv88e6085 f1072004.mdio-mii:04 lan1: config interface gmii Aug 25 23:03:55 sw-dsl kernel: [ 7.575630] mv88e6085 f1072004.mdio-mii:04 lan1: 
phylink_mac_config: mode=phy/gmii/Unknown/Unknown adv=000,00000000,00000000 pause=00 link=0 an=0 Aug 25 23:03:55 sw-dsl kernel: [ 7.575634] mv88e6085 f1072004.mdio-mii:04: p4: dsa_port_phylink_mac_config() Aug 25 23:03:55 sw-dsl kernel: [ 7.579334] 8021q: adding VLAN 0 to HW filter on device lan1 Aug 25 23:03:55 sw-dsl kernel: [ 7.635966] mv88e6085 f1072004.mdio-mii:04 lan1: phy link down gmii/Unknown/Unknown/off
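The race described above comes down to the order of two kernel messages: the first "brdsl: port" line, which shows userspace enslaving ports, and "DSA: tree 0 setup". The following is a minimal illustrative sketch that checks this order in a syslog excerpt. The helper names and the regular expression are hypothetical and do not belong to any existing tool. The sample lines are copied from the v5.13 and v5.7 logs above. Only the first bridge event is compared, so treat the result as a heuristic.

```python
import re

# Matches the "[ seconds.micro]" kernel timestamp in a syslog line.
# Kernel timestamps are monotonic within one boot, so they can be compared.
TIMESTAMP = re.compile(r"\[\s*(\d+\.\d+)\]")


def first_timestamp(lines, needle):
    """Return the kernel timestamp of the first line containing needle, or None."""
    for line in lines:
        if needle in line:
            match = TIMESTAMP.search(line)
            if match:
                return float(match.group(1))
    return None


def bridge_precedes_dsa_setup(lines, bridge="brdsl"):
    """True if userspace began adding ports to the bridge before the kernel
    logged "DSA: tree 0 setup", i.e. the configuration raced DSA probing."""
    bridge_ts = first_timestamp(lines, f"{bridge}: port")
    dsa_ts = first_timestamp(lines, "DSA: tree 0 setup")
    if bridge_ts is None or dsa_ts is None:
        raise ValueError("log excerpt lacks a bridge or DSA setup message")
    return bridge_ts < dsa_ts


v5_13 = [
    "Aug 30 11:29:53 sw-dsl kernel: [ 4.652237] brdsl: port 1(lan4) entered blocking state",
    "Aug 30 11:29:53 sw-dsl kernel: [ 4.934514] DSA: tree 0 setup",
]
v5_7 = [
    "Aug 25 23:03:54 sw-dsl kernel: [ 6.541907] DSA: tree 0 setup",
    "Aug 25 23:03:55 sw-dsl kernel: [ 6.931763] brdsl: port 1(lan2) entered blocking state",
]

print(bridge_precedes_dsa_setup(v5_13))  # True: bridge setup started before DSA finished
print(bridge_precedes_dsa_setup(v5_7))   # False: DSA tree was set up first
```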
On Thu, Sep 02, 2021 at 05:31:44PM +0100, Russell King (Oracle) wrote: > On Thu, Sep 02, 2021 at 06:23:42PM +0300, Vladimir Oltean wrote: > > On Thu, Sep 02, 2021 at 02:26:35PM +0100, Russell King (Oracle) wrote: > > > Debian has had support for configuring bridges at boot time via > > > the interfaces file for years. Breaking that is going to upset a > > > lot of people (me included) resulting in busted networks. It > > > would be a sure way to make oneself unpopular. > > > > > > > I expect there to be 2 call paths of phy_attach_direct: > > > > - At probe time. Both the MAC driver and the PHY driver are probing. > > > > This is what has this patch addresses. There is no issue to return > > > > -EPROBE_DEFER at that time, since drivers connect to the PHY before > > > > they register their netdev. So if connecting defers, there is no > > > > netdev to unregister, and user space knows nothing of this. > > > > - At .ndo_open time. This is where it maybe gets interesting, but not to > > > > user space. If you open a netdev and it connects to the PHY then, I > > > > wouldn't expect the PHY to be undergoing a probing process, all of > > > > that should have been settled by then, should it not? Where it might > > > > get interesting is with NFS root, and I admit I haven't tested that. > > > > > > I don't think you can make that assumption. Consider the case where > > > systemd is being used, DSA stuff is modular, and we're trying to > > > setup a bridge device on DSA. DSA could be probing while the bridge > > > is being setup. > > > > > > Sadly, this isn't theoretical. I've ended up needing: > > > > > > pre-up sleep 1 > > > > > > in my bridge configuration to allow time for DSA to finish probing. > > > It's not a pleasant solution, nor a particularly reliable one at > > > that, but it currently works around the problem. > > > > What problem? This is the first time I've heard of this report, and you > > should definitely not need that. 
> > I found it when upgrading the Clearfog by the DSL modems to v5.13. > When I rebooted it with a previously working kernel (v5.7) it has > never had a problem. With v5.13, it failed to add all the lan ports > into the bridge, because the bridge was still being setup by the > kernel while userspace was trying to configure it. Note that I have > extra debug in my kernels, hence the extra messages: Ok, first you talked about the interfaces file, then systemd. If it's not about systemd's network manager then I don't see how it is relevant. What package and version is this exactly, ifupdown, ifupdown2, ifupdown-ng, busybox ifupdown? I think they all use the interfaces file. > Aug 30 11:29:52 sw-dsl kernel: [ 3.308583] Marvell 88E1540 mv88e6xxx-0:03: probe: irq=78 > Aug 30 11:29:52 sw-dsl kernel: [ 3.308595] Marvell 88E1540 mv88e6xxx-0:03: probe: irq=78 > Aug 30 11:29:52 sw-dsl kernel: [ 3.332403] Marvell 88E1540 mv88e6xxx-0:04: probe: irq=79 > Aug 30 11:29:52 sw-dsl kernel: [ 3.332415] Marvell 88E1540 mv88e6xxx-0:04: probe: irq=79 > Aug 30 11:29:52 sw-dsl kernel: [ 3.412638] Marvell 88E1545 mv88e6xxx-0:0f: probe: irq=-1 > Aug 30 11:29:52 sw-dsl kernel: [ 3.412649] Marvell 88E1545 mv88e6xxx-0:0f: probe: irq=-1 > Aug 30 11:29:52 sw-dsl kernel: [ 3.515888] libphy: mv88e6xxx SMI: probed > > Here, userspace starts configuring eno1, the ethernet port connected > to the DSA switch: > > Aug 30 11:29:52 sw-dsl kernel: [ 3.536090] mvneta f1030000.ethernet eno1: configuring for inband/1000base-x link mode > Aug 30 11:29:52 sw-dsl kernel: [ 3.536109] mvneta f1030000.ethernet eno1: major config 1000base-x > Aug 30 11:29:52 sw-dsl kernel: [ 3.536117] mvneta f1030000.ethernet eno1: phylink_mac_config: mode=inband/1000base-x/Unknown/Unknown adv=0000000,00000200,00002240 pause=04 link=0 an=1 > Aug 30 11:29:52 sw-dsl kernel: [ 3.536135] mvneta f1030000.ethernet eno1: mac link down > Aug 30 11:29:52 sw-dsl kernel: [ 3.536135] mvneta f1030000.ethernet eno1: mac link down > Aug 30 
11:29:52 sw-dsl kernel: [ 3.536146] mvneta f1030000.ethernet eno1: mac link down
> Aug 30 11:29:52 sw-dsl kernel: [ 3.536146] mvneta f1030000.ethernet eno1: mac link down
> Aug 30 11:29:52 sw-dsl kernel: [ 3.572013] mvneta f1030000.ethernet eno1: mac link up
> Aug 30 11:29:52 sw-dsl kernel: [ 3.572016] mvneta f1030000.ethernet eno1: mac link up
> Aug 30 11:29:52 sw-dsl kernel: [ 3.572046] mvneta f1030000.ethernet eno1: Link is Up - 1Gbps/Full - flow control rx/tx
> Aug 30 11:29:52 sw-dsl kernel: [ 3.657820] 8021q: 802.1Q VLAN Support v1.8
>
> We get the link to eno1 going down/up due to DSA's actions:

What "actions"? There were only two DSA changes related to the state of the master interface, and DSA never forces the master to go down. Quite the opposite: it brings the master up when it needs to, and it brings the user ports down when the master goes down. See:

9d5ef190e561 ("net: dsa: automatically bring up DSA master when opening user port")
c0a8a9c27493 ("net: dsa: automatically bring user ports down when master goes down")

So if eno1 goes down and that causes breakage, DSA did not trigger it. Also, please note that eno1 goes down in your "working" example too.
> > Aug 30 11:29:53 sw-dsl kernel: [ 4.291882] mvneta f1030000.ethernet eno1: Link is Down > Aug 30 11:29:53 sw-dsl kernel: [ 4.309425] mvneta f1030000.ethernet eno1: mac link down > Aug 30 11:29:53 sw-dsl kernel: [ 4.309425] mvneta f1030000.ethernet eno1: mac link down > Aug 30 11:29:53 sw-dsl kernel: [ 4.309440] mvneta f1030000.ethernet eno1: configuring for inband/1000base-x link mode > Aug 30 11:29:53 sw-dsl kernel: [ 4.309447] mvneta f1030000.ethernet eno1: major config 1000base-x > Aug 30 11:29:53 sw-dsl kernel: [ 4.309454] mvneta f1030000.ethernet eno1: phylink_mac_config: mode=inband/1000base-x/Unknown/Unknown adv=0000000,00000200,00002240 pause=04 link=0 an=1 > Aug 30 11:29:53 sw-dsl kernel: [ 4.345013] mvneta f1030000.ethernet eno1: mac link up > Aug 30 11:29:53 sw-dsl kernel: [ 4.345014] mvneta f1030000.ethernet eno1: mac link up > Aug 30 11:29:53 sw-dsl kernel: [ 4.345036] mvneta f1030000.ethernet eno1: Link is Up - 1Gbps/Full - flow control rx/tx > > DSA then starts initialising the ports: > > Aug 30 11:29:53 sw-dsl kernel: [ 4.397647] mv88e6085 f1072004.mdio-mii:04 lan5 (uninitialized): PHY [mv88e6xxx-0:00] driver [Marvell 88E1540] (irq=75) > Aug 30 11:29:53 sw-dsl kernel: [ 4.397663] mv88e6085 f1072004.mdio-mii:04 lan5 (uninitialized): phy: setting supported 0000000,00000000,000022ef advertising > 0000000,00000000,000022ef > Aug 30 11:29:53 sw-dsl kernel: [ 4.493080] mv88e6085 f1072004.mdio-mii:04 lan4 (uninitialized): PHY [mv88e6xxx-0:01] driver [Marvell 88E1540] (irq=76) > Aug 30 11:29:53 sw-dsl kernel: [ 4.493093] mv88e6085 f1072004.mdio-mii:04 lan4 (uninitialized): phy: setting supported 0000000,00000000,000022ef advertising > 0000000,00000000,000022ef > Aug 30 11:29:53 sw-dsl kernel: [ 4.577070] mv88e6085 f1072004.mdio-mii:04 lan3 (uninitialized): PHY [mv88e6xxx-0:02] driver [Marvell 88E1540] (irq=77) > Aug 30 11:29:53 sw-dsl kernel: [ 4.577081] mv88e6085 f1072004.mdio-mii:04 lan3 (uninitialized): phy: setting supported 0000000,00000000,000022ef 
advertising > 0000000,00000000,000022ef > > Meanwhile userspace is trying to setup the bridge while this is going > on, and has tried to add the non-existent lan2 at this point, but > lan4 has just been created in time, so Debian's bridge support adds > it to the brdsl bridge: > > Aug 30 11:29:53 sw-dsl kernel: [ 4.652237] brdsl: port 1(lan4) entered blocking state > Aug 30 11:29:53 sw-dsl kernel: [ 4.652250] brdsl: port 1(lan4) entered disabled state > > DSA continues setting up the other ports, here lan2, but the bridge > setup scripts have already moved on past lan2. How does this program know that lan2 exists before it starts attempting to enslave it to a bridge via the brctl program, and what does DSA do to violate that assumption? > > Aug 30 11:29:53 sw-dsl kernel: [ 4.674038] mv88e6085 f1072004.mdio-mii:04 lan2 (uninitialized): PHY [mv88e6xxx-0:03] driver [Marvell 88E1540] (irq=78) > Aug 30 11:29:53 sw-dsl kernel: [ 4.674052] mv88e6085 f1072004.mdio-mii:04 lan2 (uninitialized): phy: setting supported 0000000,00000000,000022ef advertising > 0000000,00000000,000022ef > Aug 30 11:29:53 sw-dsl kernel: [ 4.674612] device lan4 entered promiscuous mode > Aug 30 11:29:53 sw-dsl kernel: [ 4.785886] device eno1 entered promiscuous mode > Aug 30 11:29:53 sw-dsl kernel: [ 4.786971] mv88e6085 f1072004.mdio-mii:04 lan4: configuring for phy/gmii link mode > Aug 30 11:29:53 sw-dsl kernel: [ 4.786980] mv88e6085 f1072004.mdio-mii:04 lan4: major config gmii > Aug 30 11:29:53 sw-dsl kernel: [ 4.786986] mv88e6085 f1072004.mdio-mii:04 lan4: phylink_mac_config: mode=phy/gmii/Unknown/Unknown adv=0000000,00000000,00000000 pause=00 link=0 an=0 > Aug 30 11:29:53 sw-dsl kernel: [ 4.786996] mv88e6085 f1072004.mdio-mii:04: p1: dsa_port_phylink_mac_config() > Aug 30 11:29:53 sw-dsl kernel: [ 4.789977] 8021q: adding VLAN 0 to HW filter > on device lan4 > Aug 30 11:29:53 sw-dsl kernel: [ 4.836720] brdsl: port 2(eno2) entered blocking state > Aug 30 11:29:53 sw-dsl kernel: [ 4.836733] brdsl: 
port 2(eno2) entered disabled state > > Here, the SFP port (on eno2) is added to the bridge. > > Aug 30 11:29:53 sw-dsl kernel: [ 4.836907] device eno2 entered promiscuous mode > Aug 30 11:29:53 sw-dsl kernel: [ 4.837011] brdsl: port 2(eno2) entered blocking state > Aug 30 11:29:53 sw-dsl kernel: [ 4.837019] brdsl: port 2(eno2) entered forwarding state > Aug 30 11:29:53 sw-dsl kernel: [ 4.837058] IPv6: ADDRCONF(NETDEV_CHANGE): brdsl: link becomes ready > Aug 30 11:29:53 sw-dsl kernel: [ 4.846989] mv88e6085 f1072004.mdio-mii:04 lan4: phy link down gmii/Unknown/Unknown/off > Aug 30 11:29:53 sw-dsl kernel: [ 4.896264] mv88e6085 f1072004.mdio-mii:04 lan1 (uninitialized): PHY [mv88e6xxx-0:04] driver [Marvell 88E1540] (irq=79) > Aug 30 11:29:53 sw-dsl kernel: [ 4.896278] mv88e6085 f1072004.mdio-mii:04 lan1 (uninitialized): phy: setting supported 0000000,00000000,000022ef advertising > 0000000,00000000,000022ef > Aug 30 11:29:53 sw-dsl kernel: [ 4.934514] DSA: tree 0 setup > > Here, the DSA tree has finally finished initialising in the kernel. 
> > Aug 30 11:29:53 sw-dsl kernel: [ 4.986877] mv88e6085 f1072004.mdio-mii:04 lan1: configuring for phy/gmii link mode > Aug 30 11:29:53 sw-dsl kernel: [ 4.986890] mv88e6085 f1072004.mdio-mii:04 lan1: major config gmii > Aug 30 11:29:53 sw-dsl kernel: [ 4.986896] mv88e6085 f1072004.mdio-mii:04 lan1: phylink_mac_config: mode=phy/gmii/Unknown/Unknown adv=0000000,00000000,00000000 pause=00 link=0 an=0 > Aug 30 11:29:53 sw-dsl kernel: [ 4.986907] mv88e6085 f1072004.mdio-mii:04: p4: dsa_port_phylink_mac_config() > Aug 30 11:29:53 sw-dsl kernel: [ 4.990199] 8021q: adding VLAN 0 to HW filter > on device lan1 > Aug 30 11:29:54 sw-dsl kernel: [ 5.041313] mv88e6085 f1072004.mdio-mii:04 lan1: phy link down gmii/Unknown/Unknown/off > Aug 30 11:29:56 sw-dsl kernel: [ 7.630016] mv88e6085 f1072004.mdio-mii:04 lan4: phy link up gmii/1Gbps/Full/off > Aug 30 11:29:56 sw-dsl kernel: [ 7.630031] mv88e6085 f1072004.mdio-mii:04 lan4: phylink_mac_config: mode=phy/gmii/1Gbps/Full adv=0000000,00000000,00000000 pause=00 link=1 an=0 > Aug 30 11:29:56 sw-dsl kernel: [ 7.630043] mv88e6085 f1072004.mdio-mii:04: p1: dsa_port_phylink_mac_config() > Aug 30 11:29:56 sw-dsl kernel: [ 7.630294] mv88e6085 f1072004.mdio-mii:04 lan4: Link is Up - 1Gbps/Full - flow control off > Aug 30 11:29:56 sw-dsl kernel: [ 7.630312] brdsl: port 1(lan4) entered blocking state > Aug 30 11:29:56 sw-dsl kernel: [ 7.630321] brdsl: port 1(lan4) entered forwarding state > > I then notice that my Internet connection hasn't come back, so I start > poking about with it, first adding it to the bridge: > > Aug 30 11:31:13 sw-dsl kernel: [ 84.990122] brdsl: port 3(lan2) entered blocking state > Aug 30 11:31:13 sw-dsl kernel: [ 84.990134] brdsl: port 3(lan2) entered disabled state > Aug 30 11:31:14 sw-dsl kernel: [ 85.063971] device lan2 entered promiscuous mode > > And then setting it to up state and configuring its vlan settings: > > Aug 30 11:32:45 sw-dsl kernel: [ 176.476090] mv88e6085 f1072004.mdio-mii:04 lan2: configuring 
for phy/gmii link mode
> Aug 30 11:32:45 sw-dsl kernel: [ 176.476103] mv88e6085 f1072004.mdio-mii:04 lan2: major config gmii
> Aug 30 11:32:45 sw-dsl kernel: [ 176.476109] mv88e6085 f1072004.mdio-mii:04 lan2: phylink_mac_config: mode=phy/gmii/Unknown/Unknown adv=0000000,00000000,00000000 pause=00 link=0 an=0
> Aug 30 11:32:45 sw-dsl kernel: [ 176.476120] mv88e6085 f1072004.mdio-mii:04: p3: dsa_port_phylink_mac_config()
> Aug 30 11:32:45 sw-dsl kernel: [ 176.479495] 8021q: adding VLAN 0 to HW filter on device lan2
> Aug 30 11:32:45 sw-dsl kernel: [ 176.537796] mv88e6085 f1072004.mdio-mii:04 lan2: phy link down gmii/Unknown/Unknown/off
> Aug 30 11:32:48 sw-dsl kernel: [ 179.280863] mv88e6085 f1072004.mdio-mii:04 lan2: phy link up gmii/1Gbps/Full/rx/tx
> Aug 30 11:32:48 sw-dsl kernel: [ 179.280877] mv88e6085 f1072004.mdio-mii:04 lan2: phylink_mac_config: mode=phy/gmii/1Gbps/Full adv=0000000,00000000,00000000 pause=03 link=1 an=0
> Aug 30 11:32:48 sw-dsl kernel: [ 179.280888] mv88e6085 f1072004.mdio-mii:04: p3: dsa_port_phylink_mac_config()
> Aug 30 11:32:48 sw-dsl kernel: [ 179.280894] mv88e6085 f1072004.mdio-mii:04: p3: dsa_port_phylink_mac_link_up()
> Aug 30 11:32:48 sw-dsl kernel: [ 179.282958] mv88e6085 f1072004.mdio-mii:04 lan2: Link is Up - 1Gbps/Full - flow control rx/tx
>
> I had:
>
> iface brdsl inet manual
>     bridge-ports lan2 lan4
>     bridge-maxwait 0
>     up brctl addif $IFACE eno2
>
> I now have:
> iface brdsl inet manual
>     bridge-ports lan2 lan4
>     bridge-waitport 10
>     bridge-maxwait 0
>     pre-up sleep 1
>     up brctl addif $IFACE eno2

I searched Google for the "bridge-ports" keyword in relation to ifupdown and could not find the source code of a program that parses it. Could you let me know where to find the source code of the program you are using?

>
> to ensure that all ports get properly configured.
>
> What can be seen from the above is that there is most definitely a race.
> It is possible to start configuring a DSA switch before the DSA switch > driver has finished being probed by the kernel. > > Here is the kernel log from v5.7 which has never showed these problems, > because DSA seemed to always setup everything in kernel space prior to > userspace beginning configuration: > > Aug 25 23:03:54 sw-dsl kernel: [ 5.793137] mvneta f1030000.ethernet eno1: configuring for inband/1000base-x link mode > Aug 25 23:03:54 sw-dsl kernel: [ 5.793148] mvneta f1030000.ethernet eno1: config interface 1000base-x > Aug 25 23:03:54 sw-dsl kernel: [ 5.793157] mvneta f1030000.ethernet eno1: phylink_mac_config: mode=inband/1000base-x/Unknown/Unknown adv=000,00000200,00002240 pause=04 link=0 an=1 > Aug 25 23:03:54 sw-dsl kernel: [ 5.793168] mvneta f1030000.ethernet eno1: mac link down > Aug 25 23:03:54 sw-dsl kernel: [ 5.793170] mvneta f1030000.ethernet eno1: mac link down > Aug 25 23:03:54 sw-dsl kernel: [ 5.819769] mvneta f1030000.ethernet eno1: mac link up > Aug 25 23:03:54 sw-dsl kernel: [ 5.819792] mvneta f1030000.ethernet eno1: Link is Up - 1Gbps/Full - flow control rx/tx > Aug 25 23:03:54 sw-dsl kernel: [ 5.948900] 8021q: 802.1Q VLAN Support v1.8 > 6.459779] mv88e6085 f1072004.mdio-mii:04: nonfatal error -95 setting MTU on port 0 > Aug 25 23:03:54 sw-dsl kernel: [ 6.462890] mv88e6085 f1072004.mdio-mii:04 lan5 (uninitialized): PHY [mv88e6xxx-0:00] driver [Marvell 88E1540] (irq=67) > Aug 25 23:03:54 sw-dsl kernel: [ 6.462905] mv88e6085 f1072004.mdio-mii:04 lan5 (uninitialized): phy: setting supported 000,00000000,000022ef advertising 000,00000000,000022ef > Aug 25 23:03:54 sw-dsl kernel: [ 6.465904] mv88e6085 f1072004.mdio-mii:04: nonfatal error -95 setting MTU on port 1 > Aug 25 23:03:54 sw-dsl kernel: [ 6.468101] mv88e6085 f1072004.mdio-mii:04 lan4 (uninitialized): PHY [mv88e6xxx-0:01] driver [Marvell 88E1540] (irq=68) > Aug 25 23:03:54 sw-dsl kernel: [ 6.468109] mv88e6085 f1072004.mdio-mii:04 lan4 (uninitialized): phy: setting supported 
000,00000000,000022ef advertising 000,00000000,000022ef > Aug 25 23:03:54 sw-dsl kernel: [ 6.472162] mv88e6085 f1072004.mdio-mii:04: nonfatal error -95 setting MTU on port 2 > Aug 25 23:03:54 sw-dsl kernel: [ 6.474247] mv88e6085 f1072004.mdio-mii:04 lan3 (uninitialized): PHY [mv88e6xxx-0:02] driver [Marvell 88E1540] (irq=69) > Aug 25 23:03:54 sw-dsl kernel: [ 6.474261] mv88e6085 f1072004.mdio-mii:04 lan3 (uninitialized): phy: setting supported 000,00000000,000022ef advertising 000,00000000,000022ef > Aug 25 23:03:54 sw-dsl kernel: [ 6.481824] mv88e6085 f1072004.mdio-mii:04: nonfatal error -95 setting MTU on port 3 > Aug 25 23:03:54 sw-dsl kernel: [ 6.486354] mv88e6085 f1072004.mdio-mii:04 lan2 (uninitialized): PHY [mv88e6xxx-0:03] driver [Marvell 88E1540] (irq=70) > Aug 25 23:03:54 sw-dsl kernel: [ 6.486363] mv88e6085 f1072004.mdio-mii:04 lan2 (uninitialized): phy: setting supported 000,00000000,000022ef advertising 000,00000000,000022ef > Aug 25 23:03:54 sw-dsl kernel: [ 6.498494] mv88e6085 f1072004.mdio-mii:04: nonfatal error -95 setting MTU on port 4 > Aug 25 23:03:54 sw-dsl kernel: [ 6.502272] mv88e6085 f1072004.mdio-mii:04 lan1 (uninitialized): PHY [mv88e6xxx-0:04] driver [Marvell 88E1540] (irq=71) > Aug 25 23:03:54 sw-dsl kernel: [ 6.502279] mv88e6085 f1072004.mdio-mii:04 lan1 (uninitialized): phy: setting supported 000,00000000,000022ef advertising 000,00000000,000022ef > Aug 25 23:03:54 sw-dsl kernel: [ 6.532258] mv88e6085 f1072004.mdio-mii:04: nonfatal error -95 setting MTU on port 6 > Aug 25 23:03:54 sw-dsl kernel: [ 6.535877] mvneta f1030000.ethernet eno1: Link is Down > Aug 25 23:03:54 sw-dsl kernel: [ 6.541733] mvneta f1030000.ethernet eno1: configuring for inband/1000base-x link mode > Aug 25 23:03:54 sw-dsl kernel: [ 6.541741] mvneta f1030000.ethernet eno1: config interface 1000base-x > Aug 25 23:03:54 sw-dsl kernel: [ 6.541754] mvneta f1030000.ethernet eno1: phylink_mac_config: mode=inband/1000base-x/Unknown/Unknown adv=000,00000200,00002240 
pause=04 link=0 an=1 > Aug 25 23:03:54 sw-dsl kernel: [ 6.541771] mvneta f1030000.ethernet eno1: mac link down > Aug 25 23:03:54 sw-dsl kernel: [ 6.541779] mvneta f1030000.ethernet eno1: mac link down > Aug 25 23:03:54 sw-dsl kernel: [ 6.541907] DSA: tree 0 setup > > Here, the kernel DSA switch driver has finished doing its setup > before we even get to configuring the bridge device below. > > Aug 25 23:03:54 sw-dsl kernel: [ 6.569105] mvneta f1030000.ethernet eno1: mac link up > Aug 25 23:03:54 sw-dsl kernel: [ 6.569113] mvneta f1030000.ethernet eno1: mac link up > Aug 25 23:03:54 sw-dsl kernel: [ 6.569139] mvneta f1030000.ethernet eno1: Link is Up - 1Gbps/Full - flow control rx/tx > Aug 25 23:03:55 sw-dsl kernel: [ 6.931763] brdsl: port 1(lan2) entered blocking state > Aug 25 23:03:55 sw-dsl kernel: [ 6.931769] brdsl: port 1(lan2) entered disabled state > Aug 25 23:03:55 sw-dsl kernel: [ 6.932863] device lan2 entered promiscuous mode > Aug 25 23:03:55 sw-dsl kernel: [ 7.032838] device eno1 entered promiscuous mode > Aug 25 23:03:55 sw-dsl kernel: [ 7.032902] mv88e6085 f1072004.mdio-mii:04 lan2: configuring for phy/gmii link mode > Aug 25 23:03:55 sw-dsl kernel: [ 7.032907] mv88e6085 f1072004.mdio-mii:04 lan2: config interface gmii > Aug 25 23:03:55 sw-dsl kernel: [ 7.032916] mv88e6085 f1072004.mdio-mii:04 lan2: phylink_mac_config: mode=phy/gmii/Unknown/Unknown adv=000,00000000,00000000 pause=00 link=0 an=0 > Aug 25 23:03:55 sw-dsl kernel: [ 7.032920] mv88e6085 f1072004.mdio-mii:04: p3: dsa_port_phylink_mac_config() > Aug 25 23:03:55 sw-dsl kernel: [ 7.037225] 8021q: adding VLAN 0 to HW filter > on device lan2 > Aug 25 23:03:55 sw-dsl kernel: [ 7.044979] brdsl: port 2(lan4) entered blocking state > Aug 25 23:03:55 sw-dsl kernel: [ 7.044985] brdsl: port 2(lan4) entered disabled state > Aug 25 23:03:55 sw-dsl kernel: [ 7.056189] device lan4 entered promiscuous mode > Aug 25 23:03:55 sw-dsl kernel: [ 7.107067] mv88e6085 f1072004.mdio-mii:04 lan4: configuring for 
phy/gmii link mode > Aug 25 23:03:55 sw-dsl kernel: [ 7.107073] mv88e6085 f1072004.mdio-mii:04 lan4: config interface gmii > Aug 25 23:03:55 sw-dsl kernel: [ 7.107080] mv88e6085 f1072004.mdio-mii:04 lan4: phylink_mac_config: mode=phy/gmii/Unknown/Unknown adv=000,00000000,00000000 pause=00 link=0 an=0 > Aug 25 23:03:55 sw-dsl kernel: [ 7.107084] mv88e6085 f1072004.mdio-mii:04: p1: dsa_port_phylink_mac_config() > Aug 25 23:03:55 sw-dsl kernel: [ 7.118831] 8021q: adding VLAN 0 to HW filter > on device lan4 > Aug 25 23:03:55 sw-dsl kernel: [ 7.153604] brdsl: port 3(eno2) entered blocking state > Aug 25 23:03:55 sw-dsl kernel: [ 7.153610] brdsl: port 3(eno2) entered disabled state > Aug 25 23:03:55 sw-dsl kernel: [ 7.153720] mv88e6085 f1072004.mdio-mii:04 lan2: phy link down gmii/Unknown/Unknown/off > Aug 25 23:03:55 sw-dsl kernel: [ 7.153790] device eno2 entered promiscuous mode > Aug 25 23:03:55 sw-dsl kernel: [ 7.153890] brdsl: port 3(eno2) entered blocking state > Aug 25 23:03:55 sw-dsl kernel: [ 7.153895] brdsl: port 3(eno2) entered forwarding state > Aug 25 23:03:55 sw-dsl kernel: [ 7.153930] IPv6: ADDRCONF(NETDEV_CHANGE): brdsl: link becomes ready > Aug 25 23:03:55 sw-dsl kernel: [ 7.295739] mv88e6085 f1072004.mdio-mii:04 lan4: phy link down gmii/Unknown/Unknown/off > Aug 25 23:03:55 sw-dsl kernel: [ 7.575615] mv88e6085 f1072004.mdio-mii:04 lan1: configuring for phy/gmii link mode > Aug 25 23:03:55 sw-dsl kernel: [ 7.575622] mv88e6085 f1072004.mdio-mii:04 lan1: config interface gmii > Aug 25 23:03:55 sw-dsl kernel: [ 7.575630] mv88e6085 f1072004.mdio-mii:04 lan1: phylink_mac_config: mode=phy/gmii/Unknown/Unknown adv=000,00000000,00000000 pause=00 link=0 an=0 > Aug 25 23:03:55 sw-dsl kernel: [ 7.575634] mv88e6085 f1072004.mdio-mii:04: p4: dsa_port_phylink_mac_config() > Aug 25 23:03:55 sw-dsl kernel: [ 7.579334] 8021q: adding VLAN 0 to HW filter > on device lan1 > Aug 25 23:03:55 sw-dsl kernel: [ 7.635966] mv88e6085 f1072004.mdio-mii:04 lan1: phy link down 
gmii/Unknown/Unknown/off > > -- > RMK's Patch system: https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.armlinux.org.uk%2Fdeveloper%2Fpatches%2F&data=04%7C01%7Cvladimir.oltean%40nxp.com%7C4226a7652ae7497284df08d96e2f29e4%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C637661971114812881%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=6hDf%2FS%2FMnpRhzEYuW14zuaEAcaTgdMsQJPpmR9WA5cI%3D&reserved=0 > FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
On Thu, Sep 02, 2021 at 05:10:34PM +0000, Vladimir Oltean wrote: > On Thu, Sep 02, 2021 at 05:31:44PM +0100, Russell King (Oracle) wrote: > > On Thu, Sep 02, 2021 at 06:23:42PM +0300, Vladimir Oltean wrote: > > > On Thu, Sep 02, 2021 at 02:26:35PM +0100, Russell King (Oracle) wrote: > > > > Debian has had support for configuring bridges at boot time via > > > > the interfaces file for years. Breaking that is going to upset a > > > > lot of people (me included) resulting in busted networks. It > > > > would be a sure way to make oneself unpopular. > > > > > > > > > I expect there to be 2 call paths of phy_attach_direct: > > > > > - At probe time. Both the MAC driver and the PHY driver are probing. > > > > > This is what this patch addresses. There is no issue with returning > > > > > -EPROBE_DEFER at that time, since drivers connect to the PHY before > > > > > they register their netdev. So if connecting defers, there is no > > > > > netdev to unregister, and user space knows nothing of this. > > > > > - At .ndo_open time. This is where it maybe gets interesting, but not to > > > > > user space. If you open a netdev and it connects to the PHY then, I > > > > > wouldn't expect the PHY to be undergoing a probing process, all of > > > > > that should have been settled by then, should it not? Where it might > > > > > get interesting is with NFS root, and I admit I haven't tested that. > > > > > > > > I don't think you can make that assumption. Consider the case where > > > > systemd is being used, DSA stuff is modular, and we're trying to > > > > set up a bridge device on DSA. DSA could be probing while the bridge > > > > is being set up. > > > > > > > > Sadly, this isn't theoretical. I've ended up needing: > > > > > > > > pre-up sleep 1 > > > > > > > > in my bridge configuration to allow time for DSA to finish probing. > > > > It's not a pleasant solution, nor a particularly reliable one at > > > > that, but it currently works around the problem. > > > > > > What problem? 
This is the first time I've heard of this report, and you > > > should definitely not need that. > > > > I found it when upgrading the Clearfog by the DSL modems to v5.13. > > When I rebooted it with a previously working kernel (v5.7) it has > > never had a problem. With v5.13, it failed to add all the lan ports > > into the bridge, because the bridge was still being set up by the > > kernel while userspace was trying to configure it. Note that I have > > extra debug in my kernels, hence the extra messages: > > Ok, first you talked about the interfaces file, then systemd. If it's > not about systemd's network manager then I don't see how it is relevant. You're reading things into what I write that I did not write... I said: "Consider the case where systemd is being used, DSA stuff is modular, and we're trying to set up a bridge device on DSA." That does not mean I'm using systemd's network manager - which is something I know little about and have never used. The reason I mentioned systemd is precisely because with systemd, you get a hell of a lot happening in parallel - and that's significant in this case, because it's very clear that modules are being loaded in parallel with networking being brought up - and that is where the problems begin. In fact, modules themselves get loaded in parallel with systemd. > What package and version is this exactly, ifupdown, ifupdown2, > ifupdown-ng, busybox ifupdown? I think they all use the interfaces file. It's a standard, up-to-date Debian oldstable (buster) install, not yet upgraded to bullseye: ifupdown 0.8.35, bridge-utils 1.6-2. > > We get the link to eno1 going down/up due to DSA's actions: > > What "actions"? There were only 2 DSA changes related to the state of > the master interface, but DSA never forces the master to go down. Quite > the opposite, it forces the master up when it needs to, and it goes down > when the master goes down.
See: > > 9d5ef190e561 ("net: dsa: automatically bring up DSA master when opening user port") > c0a8a9c27493 ("net: dsa: automatically bring user ports down when master goes down") mv88e6xxx will temporarily force the link down while the port is being configured if one asks it to operate in in-band mode (which I have.) > So if eno1 goes down and that causes breakage, DSA did not trigger it. > Also, please note that eno1 goes down in your "working" example too. I'm not complaining about this. It's a non-problem. It does however serve as an indication where we are through the bring-up. > > Aug 30 11:29:53 sw-dsl kernel: [ 4.291882] mvneta f1030000.ethernet eno1: Link is Down > > Aug 30 11:29:53 sw-dsl kernel: [ 4.309425] mvneta f1030000.ethernet eno1: mac link down > > Aug 30 11:29:53 sw-dsl kernel: [ 4.309425] mvneta f1030000.ethernet eno1: mac link down > > Aug 30 11:29:53 sw-dsl kernel: [ 4.309440] mvneta f1030000.ethernet eno1: configuring for inband/1000base-x link mode > > Aug 30 11:29:53 sw-dsl kernel: [ 4.309447] mvneta f1030000.ethernet eno1: major config 1000base-x > > Aug 30 11:29:53 sw-dsl kernel: [ 4.309454] mvneta f1030000.ethernet eno1: phylink_mac_config: mode=inband/1000base-x/Unknown/Unknown adv=0000000,00000200,00002240 pause=04 link=0 an=1 > > Aug 30 11:29:53 sw-dsl kernel: [ 4.345013] mvneta f1030000.ethernet eno1: mac link up > > Aug 30 11:29:53 sw-dsl kernel: [ 4.345014] mvneta f1030000.ethernet eno1: mac link up > > Aug 30 11:29:53 sw-dsl kernel: [ 4.345036] mvneta f1030000.ethernet eno1: Link is Up - 1Gbps/Full - flow control rx/tx > > > > DSA then starts initialising the ports: > > > > Aug 30 11:29:53 sw-dsl kernel: [ 4.397647] mv88e6085 f1072004.mdio-mii:04 lan5 (uninitialized): PHY [mv88e6xxx-0:00] driver [Marvell 88E1540] (irq=75) > > Aug 30 11:29:53 sw-dsl kernel: [ 4.397663] mv88e6085 f1072004.mdio-mii:04 lan5 (uninitialized): phy: setting supported 0000000,00000000,000022ef advertising > > 0000000,00000000,000022ef > > Aug 30 11:29:53 
sw-dsl kernel: [ 4.493080] mv88e6085 f1072004.mdio-mii:04 lan4 (uninitialized): PHY [mv88e6xxx-0:01] driver [Marvell 88E1540] (irq=76) > > Aug 30 11:29:53 sw-dsl kernel: [ 4.493093] mv88e6085 f1072004.mdio-mii:04 lan4 (uninitialized): phy: setting supported 0000000,00000000,000022ef advertising > > 0000000,00000000,000022ef > > Aug 30 11:29:53 sw-dsl kernel: [ 4.577070] mv88e6085 f1072004.mdio-mii:04 lan3 (uninitialized): PHY [mv88e6xxx-0:02] driver [Marvell 88E1540] (irq=77) > > Aug 30 11:29:53 sw-dsl kernel: [ 4.577081] mv88e6085 f1072004.mdio-mii:04 lan3 (uninitialized): phy: setting supported 0000000,00000000,000022ef advertising > > 0000000,00000000,000022ef > > > > Meanwhile userspace is trying to setup the bridge while this is going > > on, and has tried to add the non-existent lan2 at this point, but > > lan4 has just been created in time, so Debian's bridge support adds > > it to the brdsl bridge: > > > > Aug 30 11:29:53 sw-dsl kernel: [ 4.652237] brdsl: port 1(lan4) entered blocking state > > Aug 30 11:29:53 sw-dsl kernel: [ 4.652250] brdsl: port 1(lan4) entered disabled state > > > > DSA continues setting up the other ports, here lan2, but the bridge > > setup scripts have already moved on past lan2. > > How does this program know that lan2 exists before it starts attempting > to enslave it to a bridge via the brctl program, and what does DSA do to > violate that assumption? This is the whole point I'm trying to get across to you - these are _scripts_. They aren't some fancy program that runs in the background. They assume that the interfaces are already there - as can be seen from my v5.7 log, they are. With v5.13, they aren't because stuff starts coming up while DSA is still initialising. 
> > Aug 30 11:29:53 sw-dsl kernel: [ 4.674038] mv88e6085 f1072004.mdio-mii:04 lan2 (uninitialized): PHY [mv88e6xxx-0:03] driver [Marvell 88E1540] (irq=78) > > Aug 30 11:29:53 sw-dsl kernel: [ 4.674052] mv88e6085 f1072004.mdio-mii:04 lan2 (uninitialized): phy: setting supported 0000000,00000000,000022ef advertising > > 0000000,00000000,000022ef > > Aug 30 11:29:53 sw-dsl kernel: [ 4.674612] device lan4 entered promiscuous mode > > Aug 30 11:29:53 sw-dsl kernel: [ 4.785886] device eno1 entered promiscuous mode > > Aug 30 11:29:53 sw-dsl kernel: [ 4.786971] mv88e6085 f1072004.mdio-mii:04 lan4: configuring for phy/gmii link mode > > Aug 30 11:29:53 sw-dsl kernel: [ 4.786980] mv88e6085 f1072004.mdio-mii:04 lan4: major config gmii > > Aug 30 11:29:53 sw-dsl kernel: [ 4.786986] mv88e6085 f1072004.mdio-mii:04 lan4: phylink_mac_config: mode=phy/gmii/Unknown/Unknown adv=0000000,00000000,00000000 pause=00 link=0 an=0 > > Aug 30 11:29:53 sw-dsl kernel: [ 4.786996] mv88e6085 f1072004.mdio-mii:04: p1: dsa_port_phylink_mac_config() > > Aug 30 11:29:53 sw-dsl kernel: [ 4.789977] 8021q: adding VLAN 0 to HW filter > > on device lan4 > > Aug 30 11:29:53 sw-dsl kernel: [ 4.836720] brdsl: port 2(eno2) entered blocking state > > Aug 30 11:29:53 sw-dsl kernel: [ 4.836733] brdsl: port 2(eno2) entered disabled state > > > > Here, the SFP port (on eno2) is added to the bridge. 
> > > > Aug 30 11:29:53 sw-dsl kernel: [ 4.836907] device eno2 entered promiscuous mode > > Aug 30 11:29:53 sw-dsl kernel: [ 4.837011] brdsl: port 2(eno2) entered blocking state > > Aug 30 11:29:53 sw-dsl kernel: [ 4.837019] brdsl: port 2(eno2) entered forwarding state > > Aug 30 11:29:53 sw-dsl kernel: [ 4.837058] IPv6: ADDRCONF(NETDEV_CHANGE): brdsl: link becomes ready > > Aug 30 11:29:53 sw-dsl kernel: [ 4.846989] mv88e6085 f1072004.mdio-mii:04 lan4: phy link down gmii/Unknown/Unknown/off > > Aug 30 11:29:53 sw-dsl kernel: [ 4.896264] mv88e6085 f1072004.mdio-mii:04 lan1 (uninitialized): PHY [mv88e6xxx-0:04] driver [Marvell 88E1540] (irq=79) > > Aug 30 11:29:53 sw-dsl kernel: [ 4.896278] mv88e6085 f1072004.mdio-mii:04 lan1 (uninitialized): phy: setting supported 0000000,00000000,000022ef advertising > > 0000000,00000000,000022ef > > Aug 30 11:29:53 sw-dsl kernel: [ 4.934514] DSA: tree 0 setup > > > > Here, the DSA tree has finally finished initialising in the kernel. > > > > Aug 30 11:29:53 sw-dsl kernel: [ 4.986877] mv88e6085 f1072004.mdio-mii:04 lan1: configuring for phy/gmii link mode > > Aug 30 11:29:53 sw-dsl kernel: [ 4.986890] mv88e6085 f1072004.mdio-mii:04 lan1: major config gmii > > Aug 30 11:29:53 sw-dsl kernel: [ 4.986896] mv88e6085 f1072004.mdio-mii:04 lan1: phylink_mac_config: mode=phy/gmii/Unknown/Unknown adv=0000000,00000000,00000000 pause=00 link=0 an=0 > > Aug 30 11:29:53 sw-dsl kernel: [ 4.986907] mv88e6085 f1072004.mdio-mii:04: p4: dsa_port_phylink_mac_config() > > Aug 30 11:29:53 sw-dsl kernel: [ 4.990199] 8021q: adding VLAN 0 to HW filter > > on device lan1 > > Aug 30 11:29:54 sw-dsl kernel: [ 5.041313] mv88e6085 f1072004.mdio-mii:04 lan1: phy link down gmii/Unknown/Unknown/off > > Aug 30 11:29:56 sw-dsl kernel: [ 7.630016] mv88e6085 f1072004.mdio-mii:04 lan4: phy link up gmii/1Gbps/Full/off > > Aug 30 11:29:56 sw-dsl kernel: [ 7.630031] mv88e6085 f1072004.mdio-mii:04 lan4: phylink_mac_config: mode=phy/gmii/1Gbps/Full 
adv=0000000,00000000,00000000 pause=00 link=1 an=0 > > Aug 30 11:29:56 sw-dsl kernel: [ 7.630043] mv88e6085 f1072004.mdio-mii:04: p1: dsa_port_phylink_mac_config() > > Aug 30 11:29:56 sw-dsl kernel: [ 7.630294] mv88e6085 f1072004.mdio-mii:04 lan4: Link is Up - 1Gbps/Full - flow control off > > Aug 30 11:29:56 sw-dsl kernel: [ 7.630312] brdsl: port 1(lan4) entered blocking state > > Aug 30 11:29:56 sw-dsl kernel: [ 7.630321] brdsl: port 1(lan4) entered forwarding state > > > > I then notice that my Internet connection hasn't come back, so I start > > poking about with it, first adding it to the bridge: > > > > Aug 30 11:31:13 sw-dsl kernel: [ 84.990122] brdsl: port 3(lan2) entered blocking state > > Aug 30 11:31:13 sw-dsl kernel: [ 84.990134] brdsl: port 3(lan2) entered disabled state > > Aug 30 11:31:14 sw-dsl kernel: [ 85.063971] device lan2 entered promiscuous mode > > > > And then setting it to up state and configuring its vlan settings: > > > > Aug 30 11:32:45 sw-dsl kernel: [ 176.476090] mv88e6085 f1072004.mdio-mii:04 lan2: configuring for phy/gmii link mode > > Aug 30 11:32:45 sw-dsl kernel: [ 176.476103] mv88e6085 f1072004.mdio-mii:04 lan2: major config gmii > > Aug 30 11:32:45 sw-dsl kernel: [ 176.476109] mv88e6085 f1072004.mdio-mii:04 lan2: phylink_mac_config: mode=phy/gmii/Unknown/Unknown adv=0000000,00000000,00000000 pause=00 link=0 an=0 > > Aug 30 11:32:45 sw-dsl kernel: [ 176.476120] mv88e6085 f1072004.mdio-mii:04: p3: dsa_port_phylink_mac_config() > > Aug 30 11:32:45 sw-dsl kernel: [ 176.479495] 8021q: adding VLAN 0 to HW filter > > on device lan2 > > Aug 30 11:32:45 sw-dsl kernel: [ 176.537796] mv88e6085 f1072004.mdio-mii:04 lan2: phy link down gmii/Unknown/Unknown/off > > Aug 30 11:32:48 sw-dsl kernel: [ 179.280863] mv88e6085 f1072004.mdio-mii:04 lan2: phy link up gmii/1Gbps/Full/rx/tx > > Aug 30 11:32:48 sw-dsl kernel: [ 179.280877] mv88e6085 f1072004.mdio-mii:04 lan2: phylink_mac_config: mode=phy/gmii/1Gbps/Full adv=0000000,00000000,00000000 
pause=03 link=1 an=0 > > Aug 30 11:32:48 sw-dsl kernel: [ 179.280888] mv88e6085 f1072004.mdio-mii:04: p3: dsa_port_phylink_mac_config() > > Aug 30 11:32:48 sw-dsl kernel: [ 179.280894] mv88e6085 f1072004.mdio-mii:04: p3: dsa_port_phylink_mac_link_up() > > Aug 30 11:32:48 sw-dsl kernel: [ 179.282958] mv88e6085 f1072004.mdio-mii:04 lan2: Link is Up - 1Gbps/Full - flow control rx/tx > > > > I had: > > > > iface brdsl inet manual > > bridge-ports lan2 lan4 > > bridge-maxwait 0 > > up brctl addif $IFACE eno2 > > > > I now have: > > iface brdsl inet manual > > bridge-ports lan2 lan4 > > bridge-waitport 10 > > bridge-maxwait 0 > > pre-up sleep 1 > > up brctl addif $IFACE eno2 > > I searched Google for the "bridge-ports" keyword relative to ifupdown > and could not find the source code of a program which parses this. Could > you let me know what is the source code of the program you are using? It's a script: see /lib/bridge-utils/ifupdown.sh in the Debian bridge-utils package. Also see the ifup man page - ifup converts much of the interfaces file into environment variables for the hook scripts it calls in /etc/network/*.d to make use of. So e.g. bridge-ports becomes $IF_BRIDGE_PORTS etc. Debian has probably been using this method since shortly after bridge support was introduced - it's been around for a very long time. > > to ensure that all ports get properly configured. > > > > What can be seen from the above is that there is most definitely a race. > > It is possible to start configuring a DSA switch before the DSA switch > > driver has finished being probed by the kernel.
> > > > Here is the kernel log from v5.7 which has never showed these problems, > > because DSA seemed to always setup everything in kernel space prior to > > userspace beginning configuration: > > > > Aug 25 23:03:54 sw-dsl kernel: [ 5.793137] mvneta f1030000.ethernet eno1: configuring for inband/1000base-x link mode > > Aug 25 23:03:54 sw-dsl kernel: [ 5.793148] mvneta f1030000.ethernet eno1: config interface 1000base-x > > Aug 25 23:03:54 sw-dsl kernel: [ 5.793157] mvneta f1030000.ethernet eno1: phylink_mac_config: mode=inband/1000base-x/Unknown/Unknown adv=000,00000200,00002240 pause=04 link=0 an=1 > > Aug 25 23:03:54 sw-dsl kernel: [ 5.793168] mvneta f1030000.ethernet eno1: mac link down > > Aug 25 23:03:54 sw-dsl kernel: [ 5.793170] mvneta f1030000.ethernet eno1: mac link down > > Aug 25 23:03:54 sw-dsl kernel: [ 5.819769] mvneta f1030000.ethernet eno1: mac link up > > Aug 25 23:03:54 sw-dsl kernel: [ 5.819792] mvneta f1030000.ethernet eno1: Link is Up - 1Gbps/Full - flow control rx/tx > > Aug 25 23:03:54 sw-dsl kernel: [ 5.948900] 8021q: 802.1Q VLAN Support v1.8 > > 6.459779] mv88e6085 f1072004.mdio-mii:04: nonfatal error -95 setting MTU on port 0 > > Aug 25 23:03:54 sw-dsl kernel: [ 6.462890] mv88e6085 f1072004.mdio-mii:04 lan5 (uninitialized): PHY [mv88e6xxx-0:00] driver [Marvell 88E1540] (irq=67) > > Aug 25 23:03:54 sw-dsl kernel: [ 6.462905] mv88e6085 f1072004.mdio-mii:04 lan5 (uninitialized): phy: setting supported 000,00000000,000022ef advertising 000,00000000,000022ef > > Aug 25 23:03:54 sw-dsl kernel: [ 6.465904] mv88e6085 f1072004.mdio-mii:04: nonfatal error -95 setting MTU on port 1 > > Aug 25 23:03:54 sw-dsl kernel: [ 6.468101] mv88e6085 f1072004.mdio-mii:04 lan4 (uninitialized): PHY [mv88e6xxx-0:01] driver [Marvell 88E1540] (irq=68) > > Aug 25 23:03:54 sw-dsl kernel: [ 6.468109] mv88e6085 f1072004.mdio-mii:04 lan4 (uninitialized): phy: setting supported 000,00000000,000022ef advertising 000,00000000,000022ef > > Aug 25 23:03:54 sw-dsl kernel: [ 
6.472162] mv88e6085 f1072004.mdio-mii:04: nonfatal error -95 setting MTU on port 2 > > Aug 25 23:03:54 sw-dsl kernel: [ 6.474247] mv88e6085 f1072004.mdio-mii:04 lan3 (uninitialized): PHY [mv88e6xxx-0:02] driver [Marvell 88E1540] (irq=69) > > Aug 25 23:03:54 sw-dsl kernel: [ 6.474261] mv88e6085 f1072004.mdio-mii:04 lan3 (uninitialized): phy: setting supported 000,00000000,000022ef advertising 000,00000000,000022ef > > Aug 25 23:03:54 sw-dsl kernel: [ 6.481824] mv88e6085 f1072004.mdio-mii:04: nonfatal error -95 setting MTU on port 3 > > Aug 25 23:03:54 sw-dsl kernel: [ 6.486354] mv88e6085 f1072004.mdio-mii:04 lan2 (uninitialized): PHY [mv88e6xxx-0:03] driver [Marvell 88E1540] (irq=70) > > Aug 25 23:03:54 sw-dsl kernel: [ 6.486363] mv88e6085 f1072004.mdio-mii:04 lan2 (uninitialized): phy: setting supported 000,00000000,000022ef advertising 000,00000000,000022ef > > Aug 25 23:03:54 sw-dsl kernel: [ 6.498494] mv88e6085 f1072004.mdio-mii:04: nonfatal error -95 setting MTU on port 4 > > Aug 25 23:03:54 sw-dsl kernel: [ 6.502272] mv88e6085 f1072004.mdio-mii:04 lan1 (uninitialized): PHY [mv88e6xxx-0:04] driver [Marvell 88E1540] (irq=71) > > Aug 25 23:03:54 sw-dsl kernel: [ 6.502279] mv88e6085 f1072004.mdio-mii:04 lan1 (uninitialized): phy: setting supported 000,00000000,000022ef advertising 000,00000000,000022ef > > Aug 25 23:03:54 sw-dsl kernel: [ 6.532258] mv88e6085 f1072004.mdio-mii:04: nonfatal error -95 setting MTU on port 6 > > Aug 25 23:03:54 sw-dsl kernel: [ 6.535877] mvneta f1030000.ethernet eno1: Link is Down > > Aug 25 23:03:54 sw-dsl kernel: [ 6.541733] mvneta f1030000.ethernet eno1: configuring for inband/1000base-x link mode > > Aug 25 23:03:54 sw-dsl kernel: [ 6.541741] mvneta f1030000.ethernet eno1: config interface 1000base-x > > Aug 25 23:03:54 sw-dsl kernel: [ 6.541754] mvneta f1030000.ethernet eno1: phylink_mac_config: mode=inband/1000base-x/Unknown/Unknown adv=000,00000200,00002240 pause=04 link=0 an=1 > > Aug 25 23:03:54 sw-dsl kernel: [ 6.541771] 
mvneta f1030000.ethernet eno1: mac link down > > Aug 25 23:03:54 sw-dsl kernel: [ 6.541779] mvneta f1030000.ethernet eno1: mac link down > > Aug 25 23:03:54 sw-dsl kernel: [ 6.541907] DSA: tree 0 setup > > > > Here, the kernel DSA switch driver has finished doing its setup > > before we even get to configuring the bridge device below. > > > > Aug 25 23:03:54 sw-dsl kernel: [ 6.569105] mvneta f1030000.ethernet eno1: mac link up > > Aug 25 23:03:54 sw-dsl kernel: [ 6.569113] mvneta f1030000.ethernet eno1: mac link up > > Aug 25 23:03:54 sw-dsl kernel: [ 6.569139] mvneta f1030000.ethernet eno1: Link is Up - 1Gbps/Full - flow control rx/tx > > Aug 25 23:03:55 sw-dsl kernel: [ 6.931763] brdsl: port 1(lan2) entered blocking state > > Aug 25 23:03:55 sw-dsl kernel: [ 6.931769] brdsl: port 1(lan2) entered disabled state > > Aug 25 23:03:55 sw-dsl kernel: [ 6.932863] device lan2 entered promiscuous mode > > Aug 25 23:03:55 sw-dsl kernel: [ 7.032838] device eno1 entered promiscuous mode > > Aug 25 23:03:55 sw-dsl kernel: [ 7.032902] mv88e6085 f1072004.mdio-mii:04 lan2: configuring for phy/gmii link mode > > Aug 25 23:03:55 sw-dsl kernel: [ 7.032907] mv88e6085 f1072004.mdio-mii:04 lan2: config interface gmii > > Aug 25 23:03:55 sw-dsl kernel: [ 7.032916] mv88e6085 f1072004.mdio-mii:04 lan2: phylink_mac_config: mode=phy/gmii/Unknown/Unknown adv=000,00000000,00000000 pause=00 link=0 an=0 > > Aug 25 23:03:55 sw-dsl kernel: [ 7.032920] mv88e6085 f1072004.mdio-mii:04: p3: dsa_port_phylink_mac_config() > > Aug 25 23:03:55 sw-dsl kernel: [ 7.037225] 8021q: adding VLAN 0 to HW filter > > on device lan2 > > Aug 25 23:03:55 sw-dsl kernel: [ 7.044979] brdsl: port 2(lan4) entered blocking state > > Aug 25 23:03:55 sw-dsl kernel: [ 7.044985] brdsl: port 2(lan4) entered disabled state > > Aug 25 23:03:55 sw-dsl kernel: [ 7.056189] device lan4 entered promiscuous mode > > Aug 25 23:03:55 sw-dsl kernel: [ 7.107067] mv88e6085 f1072004.mdio-mii:04 lan4: configuring for phy/gmii link mode > > 
Aug 25 23:03:55 sw-dsl kernel: [ 7.107073] mv88e6085 f1072004.mdio-mii:04 lan4: config interface gmii > > Aug 25 23:03:55 sw-dsl kernel: [ 7.107080] mv88e6085 f1072004.mdio-mii:04 lan4: phylink_mac_config: mode=phy/gmii/Unknown/Unknown adv=000,00000000,00000000 pause=00 link=0 an=0 > > Aug 25 23:03:55 sw-dsl kernel: [ 7.107084] mv88e6085 f1072004.mdio-mii:04: p1: dsa_port_phylink_mac_config() > > Aug 25 23:03:55 sw-dsl kernel: [ 7.118831] 8021q: adding VLAN 0 to HW filter > > on device lan4 > > Aug 25 23:03:55 sw-dsl kernel: [ 7.153604] brdsl: port 3(eno2) entered blocking state > > Aug 25 23:03:55 sw-dsl kernel: [ 7.153610] brdsl: port 3(eno2) entered disabled state > > Aug 25 23:03:55 sw-dsl kernel: [ 7.153720] mv88e6085 f1072004.mdio-mii:04 lan2: phy link down gmii/Unknown/Unknown/off > > Aug 25 23:03:55 sw-dsl kernel: [ 7.153790] device eno2 entered promiscuous mode > > Aug 25 23:03:55 sw-dsl kernel: [ 7.153890] brdsl: port 3(eno2) entered blocking state > > Aug 25 23:03:55 sw-dsl kernel: [ 7.153895] brdsl: port 3(eno2) entered forwarding state > > Aug 25 23:03:55 sw-dsl kernel: [ 7.153930] IPv6: ADDRCONF(NETDEV_CHANGE): brdsl: link becomes ready > > Aug 25 23:03:55 sw-dsl kernel: [ 7.295739] mv88e6085 f1072004.mdio-mii:04 lan4: phy link down gmii/Unknown/Unknown/off > > Aug 25 23:03:55 sw-dsl kernel: [ 7.575615] mv88e6085 f1072004.mdio-mii:04 lan1: configuring for phy/gmii link mode > > Aug 25 23:03:55 sw-dsl kernel: [ 7.575622] mv88e6085 f1072004.mdio-mii:04 lan1: config interface gmii > > Aug 25 23:03:55 sw-dsl kernel: [ 7.575630] mv88e6085 f1072004.mdio-mii:04 lan1: phylink_mac_config: mode=phy/gmii/Unknown/Unknown adv=000,00000000,00000000 pause=00 link=0 an=0 > > Aug 25 23:03:55 sw-dsl kernel: [ 7.575634] mv88e6085 f1072004.mdio-mii:04: p4: dsa_port_phylink_mac_config() > > Aug 25 23:03:55 sw-dsl kernel: [ 7.579334] 8021q: adding VLAN 0 to HW filter > > on device lan1 > > Aug 25 23:03:55 sw-dsl kernel: [ 7.635966] mv88e6085 f1072004.mdio-mii:04 lan1: phy 
link down gmii/Unknown/Unknown/off
On Thu, Sep 02, 2021 at 06:50:43PM +0100, Russell King (Oracle) wrote: > On Thu, Sep 02, 2021 at 05:10:34PM +0000, Vladimir Oltean wrote: > > On Thu, Sep 02, 2021 at 05:31:44PM +0100, Russell King (Oracle) wrote: > > > On Thu, Sep 02, 2021 at 06:23:42PM +0300, Vladimir Oltean wrote: > > > > On Thu, Sep 02, 2021 at 02:26:35PM +0100, Russell King (Oracle) wrote: > > > > > Debian has had support for configuring bridges at boot time via > > > > > the interfaces file for years. Breaking that is going to upset a > > > > > lot of people (me included) resulting in busted networks. It > > > > > would be a sure way to make oneself unpopular. > > > > > > > > > > > I expect there to be 2 call paths of phy_attach_direct: > > > > > > - At probe time. Both the MAC driver and the PHY driver are probing. > > > > > > This is what this patch addresses. There is no issue with returning > > > > > > -EPROBE_DEFER at that time, since drivers connect to the PHY before > > > > > > they register their netdev. So if connecting defers, there is no > > > > > > netdev to unregister, and user space knows nothing of this. > > > > > > - At .ndo_open time. This is where it maybe gets interesting, but not to > > > > > > user space. If you open a netdev and it connects to the PHY then, I > > > > > > wouldn't expect the PHY to be undergoing a probing process, all of > > > > > > that should have been settled by then, should it not? Where it might > > > > > > get interesting is with NFS root, and I admit I haven't tested that. > > > > > > > > > > I don't think you can make that assumption. Consider the case where > > > > > systemd is being used, DSA stuff is modular, and we're trying to > > > > > set up a bridge device on DSA. DSA could be probing while the bridge > > > > > is being set up. > > > > > > > > > > Sadly, this isn't theoretical. I've ended up needing: > > > > > > > > > > pre-up sleep 1 > > > > > > > > > > in my bridge configuration to allow time for DSA to finish probing.
> > > > > It's not a pleasant solution, nor a particularly reliable one at > > > > > that, but it currently works around the problem. > > > > > > > > What problem? This is the first time I've heard of this report, and you > > > > should definitely not need that. > > > > > > I found it when upgrading the Clearfog by the DSL modems to v5.13. > > > When I rebooted it with a previously working kernel (v5.7) it has > > > never had a problem. With v5.13, it failed to add all the lan ports > > > into the bridge, because the bridge was still being set up by the > > > kernel while userspace was trying to configure it. Note that I have > > > extra debug in my kernels, hence the extra messages: > > > > Ok, first you talked about the interfaces file, then systemd. If it's > > not about systemd's network manager then I don't see how it is relevant. > > You're reading things into what I write that I did not write... I said: > > "Consider the case where systemd is being used, DSA stuff is modular, > and we're trying to set up a bridge device on DSA." > > That does not mean I'm using systemd's network manager - which is > something I know little about and have never used. You should definitely try it out: it gets a lot of new features added all the time, it uses the netlink interface, and it reacts to udev events. > The reason I mentioned systemd is precisely because with systemd, you > get a hell of a lot happening in parallel - and that's significant in > this case, because it's very clear that modules are being loaded in > parallel with networking being brought up - and that is where the > problems begin. In fact, modules themselves get loaded in parallel > with systemd. So that's what I don't understand. You're saying that the ifupdown service runs in parallel with systemd-modules-load.service, and networking is a kernel module? Doesn't that mean it behaves as expected, then? 
/shrug/ Have you tried adding an 'After=systemd-modules-load.service' dependency to the ifupdown service?
I don't think DSA is so bad that it registers its net devices outside of the process context in which insmod mv88e6xxx.ko is called. Quite the opposite: I think (but I haven't actually taken a close look yet) that the component stuff Saravana is proposing would do exactly that. So you "fix" one issue and introduce another.
On Thu, Sep 02, 2021 at 10:05:07PM +0300, Vladimir Oltean wrote: > On Thu, Sep 02, 2021 at 06:50:43PM +0100, Russell King (Oracle) wrote: > > On Thu, Sep 02, 2021 at 05:10:34PM +0000, Vladimir Oltean wrote: > > > On Thu, Sep 02, 2021 at 05:31:44PM +0100, Russell King (Oracle) wrote: > > > > On Thu, Sep 02, 2021 at 06:23:42PM +0300, Vladimir Oltean wrote: > > > > > On Thu, Sep 02, 2021 at 02:26:35PM +0100, Russell King (Oracle) wrote: > > > > > > Debian has had support for configuring bridges at boot time via > > > > > > the interfaces file for years. Breaking that is going to upset a > > > > > > lot of people (me included) resulting in busted networks. It > > > > > > would be a sure way to make oneself unpopular. > > > > > > > > > > > > > I expect there to be 2 call paths of phy_attach_direct: > > > > > > > - At probe time. Both the MAC driver and the PHY driver are probing. > > > > > > > This is what this patch addresses. There is no issue with returning > > > > > > > -EPROBE_DEFER at that time, since drivers connect to the PHY before > > > > > > > they register their netdev. So if connecting defers, there is no > > > > > > > netdev to unregister, and user space knows nothing of this. > > > > > > > - At .ndo_open time. This is where it maybe gets interesting, but not to > > > > > > > user space. If you open a netdev and it connects to the PHY then, I > > > > > > > wouldn't expect the PHY to be undergoing a probing process, all of > > > > > > > that should have been settled by then, should it not? Where it might > > > > > > > get interesting is with NFS root, and I admit I haven't tested that. > > > > > > > > > > > > I don't think you can make that assumption. Consider the case where > > > > > > systemd is being used, DSA stuff is modular, and we're trying to > > > > > > set up a bridge device on DSA. DSA could be probing while the bridge > > > > > > is being set up. > > > > > > > > > > > > Sadly, this isn't theoretical. 
I've ended up needing: > > > > > > > > > > > > pre-up sleep 1 > > > > > > > > > > > > in my bridge configuration to allow time for DSA to finish probing. > > > > > > It's not a pleasant solution, nor a particularly reliable one at > > > > > > that, but it currently works around the problem. > > > > > > > > > > What problem? This is the first time I've heard of this report, and you > > > > > should definitely not need that. > > > > > > > > I found it when upgrading the Clearfog by the DSL modems to v5.13. > > > > When I rebooted it with a previously working kernel (v5.7) it has > > > > never had a problem. With v5.13, it failed to add all the lan ports > > > > into the bridge, because the bridge was still being set up by the > > > > kernel while userspace was trying to configure it. Note that I have > > > > extra debug in my kernels, hence the extra messages: > > > > > > Ok, first you talked about the interfaces file, then systemd. If it's > > > not about systemd's network manager then I don't see how it is relevant. > > > > You're reading things into what I write that I did not write... I said: > > > > "Consider the case where systemd is being used, DSA stuff is modular, > > and we're trying to set up a bridge device on DSA." > > > > That does not mean I'm using systemd's network manager - which is > > something I know little about and have never used. > > You should definitely try it out: it gets a lot of new features added > all the time, it uses the netlink interface, and it reacts to udev events. > > > The reason I mentioned systemd is precisely because with systemd, you > > get a hell of a lot happening in parallel - and that's significant in > > this case, because it's very clear that modules are being loaded in > > parallel with networking being brought up - and that is where the > > problems begin. In fact, modules themselves get loaded in parallel > > with systemd. > > So that's what I don't understand.
You're saying that the ifupdown > service runs in parallel with systemd-modules-load.service, and > networking is a kernel module? Doesn't that mean it behaves as expected, > then? /shrug/ > Have you tried adding an 'After=systemd-modules-load.service' dependency > to the ifupdown service? I don't think DSA is so bad that it > registers its net devices outside of the process context in which > insmod mv88e6xxx.ko is called. Quite the opposite: I think (but I > haven't actually taken a close look yet) that the component stuff > Saravana is proposing would do exactly that. So you "fix" one issue and > introduce another.

# systemctl list-dependencies networking.service
networking.service
├─ifupdown-pre.service
├─system.slice
└─network.target

# systemctl list-dependencies ifupdown-pre.service
ifupdown-pre.service
├─system.slice
└─systemd-udevd.service

Looking in the service files for a better idea:

networking.service:
Requires=ifupdown-pre.service
Wants=network.target
After=local-fs.target network-pre.target apparmor.service systemd-sysctl.service systemd-modules-load.service ifupdown-pre.service
Before=network.target shutdown.target network-online.target

ifupdown-pre.service:
Wants=systemd-udevd.service
After=systemd-udev-trigger.service
Before=network.target

So the dependency you mention is already present, as is a dependency on udev. The problem is that udev does all the automatic module loading asynchronously and in a multithreaded way. I don't think there's a way to make systemd wait for all module loads to complete.
> The interrupt controller _has_ been set up. The trouble is that the > interrupt controller has the same OF node as the switch itself, and the > same OF node. Therefore, fw_devlink waits for the _entire_ switch to > finish probing, it doesn't have insight into the fact that the > dependency is just on the interrupt controller. That seems to be the problem. fw_devlink appears to think probe is an atomic operation: a device is either not probed or fully probed, whereas the drivers are making use of it being non-atomic. Maybe fw_devlink needs a third state, probing. When deciding whether a device can be probed, and it depends on a device which is currently probing, it could look deeper, follow the phandle and see whether the resource is actually available? Andrew
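Andrew's suggestion can be sketched as a small userspace model. Everything here is hypothetical (`supplier_state`, `SUPPLIER_PROBING`, `consumer_may_probe` and the resource table are illustrative names, not fw_devlink code): a supplier gains a third, "probing", state, and while it is in that state a consumer may probe only if the specific resource its phandle refers to, such as the switch's interrupt controller, has already been registered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical supplier states. SUPPLIER_PROBING is the proposed third
 * state; the other two correspond to today's binary view. */
enum supplier_state {
	SUPPLIER_NOT_PROBED,
	SUPPLIER_PROBING,
	SUPPLIER_PROBED,
};

/* Hypothetical: one resource the supplier exports, e.g. an IRQ domain
 * that a consumer's interrupt-parent phandle points at. */
struct resource_model {
	const char *name;
	bool available;
};

struct supplier_model {
	enum supplier_state state;
	const struct resource_model *resources;
	size_t nr_resources;
};

/* Decide whether a consumer that needs resources[res_idx] may probe now. */
static bool consumer_may_probe(const struct supplier_model *s, size_t res_idx)
{
	assert(s != NULL);

	switch (s->state) {
	case SUPPLIER_PROBED:
		return true;
	case SUPPLIER_PROBING:
		/* "Look deeper": the supplier is mid-probe, but the specific
		 * resource may already have been registered. */
		return res_idx < s->nr_resources &&
		       s->resources[res_idx].available;
	case SUPPLIER_NOT_PROBED:
	default:
		return false;
	}
}
```

In this model, a PHY whose interrupt-parent is the switch could probe as soon as the switch has registered its interrupt controller, even though the switch's own probe has not yet returned.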
On Thu, Sep 02, 2021 at 09:03:01PM +0100, Russell King (Oracle) wrote: > On Thu, Sep 02, 2021 at 10:05:07PM +0300, Vladimir Oltean wrote: > > On Thu, Sep 02, 2021 at 06:50:43PM +0100, Russell King (Oracle) wrote: > > > On Thu, Sep 02, 2021 at 05:10:34PM +0000, Vladimir Oltean wrote: > > > > On Thu, Sep 02, 2021 at 05:31:44PM +0100, Russell King (Oracle) wrote: > > > > > On Thu, Sep 02, 2021 at 06:23:42PM +0300, Vladimir Oltean wrote: > > > > > > On Thu, Sep 02, 2021 at 02:26:35PM +0100, Russell King (Oracle) wrote: > > > > > > > Debian has had support for configuring bridges at boot time via > > > > > > > the interfaces file for years. Breaking that is going to upset a > > > > > > > lot of people (me included) resulting in busted networks. It > > > > > > > would be a sure way to make oneself unpopular. > > > > > > > > > > > > > > > I expect there to be 2 call paths of phy_attach_direct: > > > > > > > > - At probe time. Both the MAC driver and the PHY driver are probing. > > > > > > > > This is what has this patch addresses. There is no issue to return > > > > > > > > -EPROBE_DEFER at that time, since drivers connect to the PHY before > > > > > > > > they register their netdev. So if connecting defers, there is no > > > > > > > > netdev to unregister, and user space knows nothing of this. > > > > > > > > - At .ndo_open time. This is where it maybe gets interesting, but not to > > > > > > > > user space. If you open a netdev and it connects to the PHY then, I > > > > > > > > wouldn't expect the PHY to be undergoing a probing process, all of > > > > > > > > that should have been settled by then, should it not? Where it might > > > > > > > > get interesting is with NFS root, and I admit I haven't tested that. > > > > > > > > > > > > > > I don't think you can make that assumption. Consider the case where > > > > > > > systemd is being used, DSA stuff is modular, and we're trying to > > > > > > > setup a bridge device on DSA. 
DSA could be probing while the bridge > > > > > > > is being setup. > > > > > > > > > > > > > > Sadly, this isn't theoretical. I've ended up needing: > > > > > > > > > > > > > > pre-up sleep 1 > > > > > > > > > > > > > > in my bridge configuration to allow time for DSA to finish probing. > > > > > > > It's not a pleasant solution, nor a particularly reliable one at > > > > > > > that, but it currently works around the problem. > > > > > > > > > > > > What problem? This is the first time I've heard of this report, and you > > > > > > should definitely not need that. > > > > > > > > > > I found it when upgrading the Clearfog by the DSL modems to v5.13. > > > > > When I rebooted it with a previously working kernel (v5.7) it has > > > > > never had a problem. With v5.13, it failed to add all the lan ports > > > > > into the bridge, because the bridge was still being setup by the > > > > > kernel while userspace was trying to configure it. Note that I have > > > > > extra debug in my kernels, hence the extra messages: > > > > > > > > Ok, first you talked about the interfaces file, then systemd. If it's > > > > not about systemd's network manager then I don't see how it is relevant. > > > > > > You're reading in stuff to what I write that I did not write... I said: > > > > > > "Consider the case where systemd is being used, DSA stuff is modular, > > > and we're trying to setup a bridge device on DSA." > > > > > > That does not mean I'm using systemd's network manager - which is > > > something I know little about and have never used. > > > > You should definitely try it out, it gets a lot of new features added > > all the time, it uses the netlink interface, it reacts on udev events. 
> > > > > The reason I mentioned systemd is precisely because with systemd, you > > > get a hell of a lot happening parallel - and that's significiant in > > > this case, because it's very clear that modules are being loaded in > > > parallel with networking being brought up - and that is where the > > > problems begin. In fact, modules themselves get loaded in paralllel > > > with systemd. > > > > So that's what I don't understand. You're saying that the ifupdown > > service runs in parallel with systemd-modules-load.service, and > > networking is a kernel module? Doesn't that mean it behaves as expected, > > then? /shrug/ > > Have you tried adding an 'After=systemd-modules-load.service' dependency > > to the ifupdown service? I don't think that DSA is that bad that it > > registers its net devices outside of the process context in which the > > insmod mv88e6xxx.ko is called. Quite the opposite, I think (but I > > haven't actually taken a close look yet) that the component stuff > > Saravana is proposing would do exactly that. So you "fix" one issue, you > > introduce another. > > # systemctl list-dependencies networking.service > networking.service > ├─ifupdown-pre.service > ├─system.slice > └─network.target > # systemctl list-dependencies ifupdown-pre.service > ifupdown-pre.service > ├─system.slice > └─systemd-udevd.service > > Looking in the service files for a better idea: > > networking.service: > Requires=ifupdown-pre.service > Wants=network.target > After=local-fs.target network-pre.target apparmor.service systemd-sysctl.service systemd-modules-load.service ifupdown-pre.service > Before=network.target shutdown.target network-online.target > > ifupdown-pre.service: > Wants=systemd-udevd.service > After=systemd-udev-trigger.service > Before=network.target > > So, the dependency you mention is already present. As is a dependency > on udev. The problem is udev does all the automatic module loading > asynchronously and in a multithreaded way. 
> > I don't think there's a way to make systemd wait for all module loads > to complete. So ifupdown-pre.service has a call to "udevadm settle". This "watches the udev event queue, and exits if all current events are handled", according to the man page. But which current events? ifupdown-pre.service does not have the dependency on systemd-modules-load.service, just networking.service does. So maybe ifupdown-pre.service does not wait for DSA to finish initializing, then it tells networking.service that all is ok.
On Thu, Sep 02, 2021 at 11:21:24PM +0300, Vladimir Oltean wrote: > On Thu, Sep 02, 2021 at 09:03:01PM +0100, Russell King (Oracle) wrote: > > # systemctl list-dependencies networking.service > > networking.service > > ├─ifupdown-pre.service > > ├─system.slice > > └─network.target > > # systemctl list-dependencies ifupdown-pre.service > > ifupdown-pre.service > > ├─system.slice > > └─systemd-udevd.service > > > > Looking in the service files for a better idea: > > > > networking.service: > > Requires=ifupdown-pre.service > > Wants=network.target > > After=local-fs.target network-pre.target apparmor.service systemd-sysctl.service systemd-modules-load.service ifupdown-pre.service > > Before=network.target shutdown.target network-online.target > > > > ifupdown-pre.service: > > Wants=systemd-udevd.service > > After=systemd-udev-trigger.service > > Before=network.target > > > > So, the dependency you mention is already present. As is a dependency > > on udev. The problem is udev does all the automatic module loading > > asynchronously and in a multithreaded way. > > > > I don't think there's a way to make systemd wait for all module loads > > to complete. > > So ifupdown-pre.service has a call to "udevadm settle". This "watches > the udev event queue, and exits if all current events are handled", > according to the man page. But which current events? ifupdown-pre.service > does not have the dependency on systemd-modules-load.service, just > networking.service does. So maybe ifupdown-pre.service does not wait for > DSA to finish initializing, then it tells networking.service that all is ok. ifupdown-pre.service does have a call to udevadm settle, and that does get called from what I can tell. systemd-modules-load.service is an entire red herring. The only module listed in the various modules-load.d directories is "tun" for openvpn (which isn't currently being used.) 
As I've already told you (and you seem to have ignored), DSA gets loaded by udev, not by systemd-modules-load.service. systemd-modules-load.service is irrelevant to my situation. I think there's a problem with "and exits if all current events are handled" - does that mean it's fired off a modprobe process which is in progress, or does that mean that the modprobe process has completed. Given that we can see that ifup is being run while the DSA module is still in the middle of probing, the latter interpretation can not be true - unless systemd is ignoring the dependencies. Or just in general, systemd being systemd (I have very little faith in systemd behaving as it should.)
On Thu, Sep 02, 2021 at 10:07:49PM +0200, Andrew Lunn wrote: > > The interrupt controller _has_ been set up. The trouble is that the > > interrupt controller has the same OF node as the switch itself, and the > > same OF node. Therefore, fw_devlink waits for the _entire_ switch to > > finish probing, it doesn't have insight into the fact that the > > dependency is just on the interrupt controller. > > That seems to be the problem. fw_devlink appears to think probe is an > atomic operation. A device is not probed, or full probed. Where as the > drivers are making use of it being non atomic. > > Maybe fw_devlink needs the third state, probing. And when deciding if > a device can be probed and depends on a device which is currently > probing, it looks deeper, follows the phandle and see if the resource > is actually available?

This is interesting because there already exists a device link state for when the consumer is "probing", but for the supplier, it's binary:

/**
 * enum device_link_state - Device link states.
 * @DL_STATE_NONE: The presence of the drivers is not being tracked.
 * @DL_STATE_DORMANT: None of the supplier/consumer drivers is present.
 * @DL_STATE_AVAILABLE: The supplier driver is present, but the consumer is not.
 * @DL_STATE_CONSUMER_PROBE: The consumer is probing (supplier driver present).
 * @DL_STATE_ACTIVE: Both the supplier and consumer drivers are present.
 * @DL_STATE_SUPPLIER_UNBIND: The supplier driver is unbinding.
 */

The check that's killing us is in device_links_check_suppliers, and is for DL_STATE_AVAILABLE:

	list_for_each_entry(link, &dev->links.suppliers, c_node) {
		if (!(link->flags & DL_FLAG_MANAGED))
			continue;

		if (link->status != DL_STATE_AVAILABLE &&
		    !(link->flags & DL_FLAG_SYNC_STATE_ONLY)) {
			device_links_missing_supplier(dev);
			dev_err(dev, "probe deferral - supplier %s not ready\n",
				dev_name(link->supplier));
			ret = -EPROBE_DEFER;
			break;
		}
		WRITE_ONCE(link->status, DL_STATE_CONSUMER_PROBE);
	}

Anyway, I was expecting quite a different reaction from this patch series, and especially one from Saravana. We are essentially battling to handle an -EPROBE_DEFER we don't need (the battle might be worth it though, in the general case, which is one of the reasons I posted them). But these patches also solve DSA's issue with the circular dependency between the switch and its internal PHYs, and nobody seems to have asked the most important question: why? The PHY should return -EPROBE_DEFER ad infinitum, since its supplier has never finished probing by the time it calls phy_attach_direct.
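The circular wait described above can be reproduced in a simplified userspace model of the quoted loop. The types, the two booleans standing in for the link flags, and all `model_` names are assumptions made for illustration; only the decision logic follows the quoted `device_links_check_suppliers` code. In the model, the PHY's link to the switch stays non-AVAILABLE for as long as the switch's probe has not returned, which is the window in which, per the message above, phy_attach_direct is called.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* 517 is the kernel's numeric value of EPROBE_DEFER (not a userspace errno). */
#define MODEL_EPROBE_DEFER 517

/* Subset of the link states from the kernel-doc quoted above; the
 * numeric values here belong to the model only. */
enum model_link_state {
	MODEL_DL_STATE_DORMANT,
	MODEL_DL_STATE_AVAILABLE,
	MODEL_DL_STATE_CONSUMER_PROBE,
	MODEL_DL_STATE_ACTIVE,
};

struct model_link {
	bool managed;		/* stands in for DL_FLAG_MANAGED */
	bool sync_state_only;	/* stands in for DL_FLAG_SYNC_STATE_ONLY */
	enum model_link_state status;
};

/* Model of the quoted loop: any managed, non-sync-state-only supplier
 * link that is not AVAILABLE defers the consumer. The kernel marks links
 * as it goes and undoes that on failure; the model simply checks every
 * link before marking any of them. */
static int model_check_suppliers(struct model_link *links, size_t n)
{
	assert(links != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!links[i].managed)
			continue;
		if (links[i].status != MODEL_DL_STATE_AVAILABLE &&
		    !links[i].sync_state_only)
			return -MODEL_EPROBE_DEFER;
	}
	for (size_t i = 0; i < n; i++)
		if (links[i].managed)
			links[i].status = MODEL_DL_STATE_CONSUMER_PROBE;
	return 0;
}
```

Because in this scenario the switch cannot finish probing until its ports have connected to their PHYs, and the PHYs cannot pass this check until the switch has finished, the deferral result repeats on every retry in the model.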
On Thu, Sep 02, 2021 at 11:32:48PM +0300, Vladimir Oltean wrote: > But these patches also solve DSA's issue with the circular dependency > between the switch and its internal PHYs, and nobody seems to have asked > the most important question: why? Surely you specified that in your cover message and in the patch that actually fixes the problem, as one always should do.
On Thu, Sep 02, 2021 at 01:50:50AM +0300, Vladimir Oltean wrote: > This is a continuation of the discussion on patch "[v1,1/2] driver core: > fw_devlink: Add support for FWNODE_FLAG_BROKEN_PARENT" from here: > https://patchwork.kernel.org/project/netdevbpf/patch/20210826074526.825517-2-saravanak@google.com/ > > Summary: in a complex combination of device dependencies which is not > really relevant to what is being proposed here, DSA ends up calling > phylink_of_phy_connect during a period in which the PHY driver goes > through a series of probe deferral events. > > The central point of that discussion is that DSA seems "broken" for > expecting the PHY driver to probe immediately on PHYs belonging to the > internal MDIO buses of switches. A few suggestions were made about what > to do, but some were not satisfactory and some did not solve the problem. > > In fact, fw_devlink, the mechanism that causes the PHY driver to defer > probing in this particular case, has some significant "issues" too, but > its "issues" are only in quotes "because at worst it'd allow a few > unnecessary deferred probes": > https://patchwork.kernel.org/project/netdevbpf/patch/20210826074526.825517-2-saravanak@google.com/#24418895 > > So if that's the criterion by which an issue is an issue, maybe we > should take a step back and look at the bigger picture. > > There is nothing about the idea that a PHY might defer probing, or about > the changes proposed here, that has anything with DSA. Furthermore, the > changes done by this series solve the problem in the same way: "they > allow a few unnecessary deferred probes" <- in this case they provoke > this to the caller of phy_attach_direct. > > If we look at commit 16983507742c ("net: phy: probe PHY drivers > synchronously"), we see that the PHY library expectation is for the PHY > device to have a PHY driver bound to it as soon as device_add() finishes. 
> > Well, as it turns out, in case the PHY device has any supplier which is > not ready, this is not possible, but that patch still seems to ensure > that the process of binding a driver to the device has at least started. > That process will continue for a while, and will race with > phy_attach_direct calls, so we need to make the latter observe the fact > that a driver is struggling to probe, and wait for it a bit more. > > What I've not tested is loading the PHY module at runtime, and seeing > how phy_attach_direct behaves then. I expect that this change set will > not alter the behavior in that case: the genphy will still bind to a > device with no driver, and phy_attach_direct will not return -EPROBE_DEFER > in that case. > > I might not be very versed in the device core/internals, but the patches > make sense to me, and worked as intended from the first try on my system > (Turris MOX with mv88e6xxx), which was modified to make the same "sins" > as those called out in the thread above: > > - use PHY interrupts provided by the switch itself as an interrupt-controller > - call of_mdiobus_register from setup() and not from probe(), so as to > not circumvent fw_devlink's limitations, and still get to hit the PHY > probe deferral conditions. > > So feedback and testing on other platforms is very appreciated. > > Vladimir Oltean (3): > net: phy: don't bind genphy in phy_attach_direct if the specific > driver defers probe > net: dsa: destroy the phylink instance on any error in > dsa_slave_phy_setup > net: dsa: allow the phy_connect() call to return -EPROBE_DEFER > > drivers/base/dd.c | 21 +++++++++++++++++++-- > drivers/net/phy/phy_device.c | 8 ++++++++ > include/linux/device.h | 1 + > net/dsa/dsa2.c | 2 ++ > net/dsa/slave.c | 12 +++++------- > 5 files changed, 35 insertions(+), 9 deletions(-) > > -- > 2.25.1 > Ouch, I just realized that Saravana, the person whose reaction I've been waiting for the most, is not copied.... 
Saravana, you can find the thread here to sync up with what has been discussed: https://patchwork.kernel.org/project/netdevbpf/cover/20210901225053.1205571-1-vladimir.oltean@nxp.com/ Sorry.
On Thu, Sep 2, 2021 at 3:05 PM Vladimir Oltean <olteanv@gmail.com> wrote: > > On Thu, Sep 02, 2021 at 01:50:50AM +0300, Vladimir Oltean wrote: > > This is a continuation of the discussion on patch "[v1,1/2] driver core: > > fw_devlink: Add support for FWNODE_FLAG_BROKEN_PARENT" from here: > > https://patchwork.kernel.org/project/netdevbpf/patch/20210826074526.825517-2-saravanak@google.com/ > > > > Summary: in a complex combination of device dependencies which is not > > really relevant to what is being proposed here, DSA ends up calling > > phylink_of_phy_connect during a period in which the PHY driver goes > > through a series of probe deferral events. > > > > The central point of that discussion is that DSA seems "broken" for > > expecting the PHY driver to probe immediately on PHYs belonging to the > > internal MDIO buses of switches. A few suggestions were made about what > > to do, but some were not satisfactory and some did not solve the problem. > > > > In fact, fw_devlink, the mechanism that causes the PHY driver to defer > > probing in this particular case, has some significant "issues" too, but > > its "issues" are only in quotes "because at worst it'd allow a few > > unnecessary deferred probes": > > https://patchwork.kernel.org/project/netdevbpf/patch/20210826074526.825517-2-saravanak@google.com/#24418895 > > > > So if that's the criterion by which an issue is an issue, maybe we > > should take a step back and look at the bigger picture. > > > > There is nothing about the idea that a PHY might defer probing, or about > > the changes proposed here, that has anything with DSA. Furthermore, the > > changes done by this series solve the problem in the same way: "they > > allow a few unnecessary deferred probes" <- in this case they provoke > > this to the caller of phy_attach_direct. 
> > > > If we look at commit 16983507742c ("net: phy: probe PHY drivers > > synchronously"), we see that the PHY library expectation is for the PHY > > device to have a PHY driver bound to it as soon as device_add() finishes. > > > > Well, as it turns out, in case the PHY device has any supplier which is > > not ready, this is not possible, but that patch still seems to ensure > > that the process of binding a driver to the device has at least started. > > That process will continue for a while, and will race with > > phy_attach_direct calls, so we need to make the latter observe the fact > > that a driver is struggling to probe, and wait for it a bit more. > > > > What I've not tested is loading the PHY module at runtime, and seeing > > how phy_attach_direct behaves then. I expect that this change set will > > not alter the behavior in that case: the genphy will still bind to a > > device with no driver, and phy_attach_direct will not return -EPROBE_DEFER > > in that case. > > > > I might not be very versed in the device core/internals, but the patches > > make sense to me, and worked as intended from the first try on my system > > (Turris MOX with mv88e6xxx), which was modified to make the same "sins" > > as those called out in the thread above: > > > > - use PHY interrupts provided by the switch itself as an interrupt-controller > > - call of_mdiobus_register from setup() and not from probe(), so as to > > not circumvent fw_devlink's limitations, and still get to hit the PHY > > probe deferral conditions. > > > > So feedback and testing on other platforms is very appreciated. 
> > > > Vladimir Oltean (3): > > net: phy: don't bind genphy in phy_attach_direct if the specific > > driver defers probe > > net: dsa: destroy the phylink instance on any error in > > dsa_slave_phy_setup > > net: dsa: allow the phy_connect() call to return -EPROBE_DEFER > > > > drivers/base/dd.c | 21 +++++++++++++++++++-- > > drivers/net/phy/phy_device.c | 8 ++++++++ > > include/linux/device.h | 1 + > > net/dsa/dsa2.c | 2 ++ > > net/dsa/slave.c | 12 +++++------- > > 5 files changed, 35 insertions(+), 9 deletions(-) > > > > -- > > 2.25.1 > > > > Ouch, I just realized that Saravana, the person whose reaction I've been > waiting for the most, is not copied.... > > Saravana, you can find the thread here to sync up with what has been > discussed: > https://patchwork.kernel.org/project/netdevbpf/cover/20210901225053.1205571-1-vladimir.oltean@nxp.com/ Woah! The thread blew up. > > Sorry. No worries. I'll read through the thread later and maybe provide more responses, but one thing I wanted to say right away: Don't depend on dev->p->deferred_probe. It can be "empty" for a device that has returned -EPROBE_DEFER for a bunch of reasons: 1. When the device is in the middle of being reattempted, it would be empty. You can't hold any lock that'll ensure correctness either because deferred probe locking is a mess (I'm working on cleaning that up). 2. I'm working on actually not adding devices to that list if there's a known supplier that hasn't been probed yet. No point retrying it again and again for every deferred probe trigger when we know it's going to fail. And we'll basically get topological probe ordering. Your closest bet right now is d->can_match. Only caveat is that it's not cleared if the actual driver gets unregistered. -Saravana