Message ID | 20240930-phy-handle-fw-devlink-v1-1-4ea46acfcc12@redhat.com (mailing list archive) |
---|---|
State | RFC |
Headers | show |
Series | [RFT] of: property: fw_devlink: Add support for the "phy-handle" binding | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Not a local patch |
On Mon, Sep 30, 2024 at 2:28 PM Andrew Halaney <ahalaney@redhat.com> wrote: > > Add support for parsing the phy-handle binding so that fw_devlink can > enforce the dependency. This prevents MACs (that use this binding to > claim they're using the corresponding phy) from probing prior to the > phy, unless the phy is a child of the MAC (which results in a > dependency cycle) or similar. > > For some motivation, imagine a device topology like so: > > ðernet0 { > phy-mode = "sgmii"; > phy-handle = <&sgmii_phy0>; > > mdio { > compatible = "snps,dwmac-mdio"; > sgmii_phy0: phy@8 { > compatible = "ethernet-phy-id0141.0dd4"; > reg = <0x8>; > device_type = "ethernet-phy"; > }; > > sgmii_phy1: phy@a { > compatible = "ethernet-phy-id0141.0dd4"; > reg = <0xa>; > device_type = "ethernet-phy"; > }; > }; > }; > > ðernet1 { > phy-mode = "sgmii"; > phy-handle = <&sgmii_phy1>; > }; > > Here ethernet1 depends on sgmii_phy1 to function properly. In the below > link an issue is reported where ethernet1 is probed and used prior to > sgmii_phy1, resulting in a failure to get things working for ethernet1. > With this change in place ethernet1 doesn't probe until sgmii_phy1 is > ready, resulting in ethernet1 functioning properly. > > ethernet0 consumes sgmii_phy0, but this dependency isn't enforced > via the device_links backing fw_devlink since ethernet0 is the parent of > sgmii_phy0. Here's a log showing that in action: > > [ 7.000432] qcom-ethqos 23040000.ethernet: Fixed dependency cycle(s) with /soc@0/ethernet@23040000/mdio/phy@8 > > With this change in place ethernet1's dependency is properly described, > and it doesn't probe prior to sgmii_phy1 being available. > > Link: https://lore.kernel.org/netdev/7723d4l2kqgrez3yfauvp2ueu6awbizkrq4otqpsqpytzp45q2@rju2nxmqu4ew/ > Suggested-by: Serge Semin <fancer.lancer@gmail.com> > Signed-off-by: Andrew Halaney <ahalaney@redhat.com> > --- > I've marked this as an RFT because when looking through old mailing > list discusssions and kernel tech talks on this subject, I was unable > to really understand why in the past phy-handle had been left out. There > were some loose references to circular dependencies (which seem more or > less handled by fw_devlink to me), and the fact that a lot of behavior > happens in ndo_open() (but I couldn't quite grok the concern there). > > I'd appreciate more testing by others and some feedback from those who > know this a bit better to indicate whether fw_devlink is ready to handle > this or not. > > At least in my narrow point of view, it's working well for me. I do want this to land and I'm fairly certain it'll break something. But it's been so long that I don't remember what it was. I think it has to do with the generic phy driver not working well with fw_devlink because it doesn't go through the device driver model. But like you said, it's been a while and fw_devlink has improved since then (I think). So please go ahead and give this a shot. If you can help fix any issues this highlights, I'd really appreciate it and I'd be happy to guide you through what I think needs to happen. But I don't think I have the time to fix it myself. Overly optimistic: Acked-by: Saravana Kannan <saravanak@google.com> -Saravana > > Thanks, > Andrew > --- > drivers/of/property.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/of/property.c b/drivers/of/property.c > index 11b922fde7af..4a2fca75e1c6 100644 > --- a/drivers/of/property.c > +++ b/drivers/of/property.c > @@ -1220,6 +1220,7 @@ DEFINE_SIMPLE_PROP(hwlocks, "hwlocks", "#hwlock-cells") > DEFINE_SIMPLE_PROP(extcon, "extcon", NULL) > DEFINE_SIMPLE_PROP(nvmem_cells, "nvmem-cells", "#nvmem-cell-cells") > DEFINE_SIMPLE_PROP(phys, "phys", "#phy-cells") > +DEFINE_SIMPLE_PROP(phy_handle, "phy-handle", NULL) > DEFINE_SIMPLE_PROP(wakeup_parent, "wakeup-parent", NULL) > DEFINE_SIMPLE_PROP(pinctrl0, "pinctrl-0", NULL) > DEFINE_SIMPLE_PROP(pinctrl1, "pinctrl-1", NULL) > @@ -1366,6 +1367,7 @@ static const struct supplier_bindings of_supplier_bindings[] = { > { .parse_prop = parse_extcon, }, > { .parse_prop = parse_nvmem_cells, }, > { .parse_prop = parse_phys, }, > + { .parse_prop = parse_phy_handle, }, > { .parse_prop = parse_wakeup_parent, }, > { .parse_prop = parse_pinctrl0, }, > { .parse_prop = parse_pinctrl1, }, > > --- > base-commit: cea5425829f77e476b03702426f6b3701299b925 > change-id: 20240829-phy-handle-fw-devlink-b829622200ae > > Best regards, > -- > Andrew Halaney <ahalaney@redhat.com> >
On Mon, Sep 30, 2024 at 05:12:42PM GMT, Saravana Kannan wrote: > On Mon, Sep 30, 2024 at 2:28 PM Andrew Halaney <ahalaney@redhat.com> wrote: > > > > Add support for parsing the phy-handle binding so that fw_devlink can > > enforce the dependency. This prevents MACs (that use this binding to > > claim they're using the corresponding phy) from probing prior to the > > phy, unless the phy is a child of the MAC (which results in a > > dependency cycle) or similar. > > > > For some motivation, imagine a device topology like so: > > > > ðernet0 { > > phy-mode = "sgmii"; > > phy-handle = <&sgmii_phy0>; > > > > mdio { > > compatible = "snps,dwmac-mdio"; > > sgmii_phy0: phy@8 { > > compatible = "ethernet-phy-id0141.0dd4"; > > reg = <0x8>; > > device_type = "ethernet-phy"; > > }; > > > > sgmii_phy1: phy@a { > > compatible = "ethernet-phy-id0141.0dd4"; > > reg = <0xa>; > > device_type = "ethernet-phy"; > > }; > > }; > > }; > > > > ðernet1 { > > phy-mode = "sgmii"; > > phy-handle = <&sgmii_phy1>; > > }; > > > > Here ethernet1 depends on sgmii_phy1 to function properly. In the below > > link an issue is reported where ethernet1 is probed and used prior to > > sgmii_phy1, resulting in a failure to get things working for ethernet1. > > With this change in place ethernet1 doesn't probe until sgmii_phy1 is > > ready, resulting in ethernet1 functioning properly. > > > > ethernet0 consumes sgmii_phy0, but this dependency isn't enforced > > via the device_links backing fw_devlink since ethernet0 is the parent of > > sgmii_phy0. Here's a log showing that in action: > > > > [ 7.000432] qcom-ethqos 23040000.ethernet: Fixed dependency cycle(s) with /soc@0/ethernet@23040000/mdio/phy@8 > > > > With this change in place ethernet1's dependency is properly described, > > and it doesn't probe prior to sgmii_phy1 being available. > > > > Link: https://lore.kernel.org/netdev/7723d4l2kqgrez3yfauvp2ueu6awbizkrq4otqpsqpytzp45q2@rju2nxmqu4ew/ > > Suggested-by: Serge Semin <fancer.lancer@gmail.com> > > Signed-off-by: Andrew Halaney <ahalaney@redhat.com> > > --- > > I've marked this as an RFT because when looking through old mailing > > list discusssions and kernel tech talks on this subject, I was unable > > to really understand why in the past phy-handle had been left out. There > > were some loose references to circular dependencies (which seem more or > > less handled by fw_devlink to me), and the fact that a lot of behavior > > happens in ndo_open() (but I couldn't quite grok the concern there). > > > > I'd appreciate more testing by others and some feedback from those who > > know this a bit better to indicate whether fw_devlink is ready to handle > > this or not. > > > > At least in my narrow point of view, it's working well for me. > > I do want this to land and I'm fairly certain it'll break something. > But it's been so long that I don't remember what it was. I think it > has to do with the generic phy driver not working well with fw_devlink > because it doesn't go through the device driver model. Let me see if I can hack something up on this board (which has a decent dependency tree for testing this stuff) to use the generic phy driver instead of the marvell one that it needs and see how that goes. It won't *actually* work from a phy perspective, but it will at least test out the driver core bits here I think. > > But like you said, it's been a while and fw_devlink has improved since > then (I think). So please go ahead and give this a shot. If you can > help fix any issues this highlights, I'd really appreciate it and I'd > be happy to guide you through what I think needs to happen. But I > don't think I have the time to fix it myself. Sure, I tend to agree. Let me check the generic phy driver path for any issues and if that test seems to go okay I too am of the opinion that without any solid reasoning against this we enable it and battle through (revert and fix after the fact if necessary) any newly identified issues that prevent phy-handle and fw_devlink have with each other. > > Overly optimistic: > Acked-by: Saravana Kannan <saravanak@google.com> > > -Saravana >
> Let me see if I can hack something up on this board (which has a decent > dependency tree for testing this stuff) to use the generic phy driver > instead of the marvell one that it needs and see how that goes. It won't > *actually* work from a phy perspective, but it will at least test out > the driver core bits here I think. > > > > > But like you said, it's been a while and fw_devlink has improved since > > then (I think). So please go ahead and give this a shot. If you can > > help fix any issues this highlights, I'd really appreciate it and I'd > > be happy to guide you through what I think needs to happen. But I > > don't think I have the time to fix it myself. > > Sure, I tend to agree. Let me check the generic phy driver path for any > issues and if that test seems to go okay I too am of the opinion that > without any solid reasoning against this we enable it and battle through > (revert and fix after the fact if necessary) any newly identified issues > that prevent phy-handle and fw_devlink have with each other. Here is one for you to look at: https://elixir.bootlin.com/linux/v6.11.1/source/drivers/net/ethernet/freescale/fec_main.c#L2481 I don't know if there is a real issue here, but if the order of the probe gets swapped, i think the usage of mii_cnt will break. I don't remember what broke last time, and i'm currently too lazy to go look. But maybe take a look at devices like this: arch/arm/boot/dts/nxp/vf/vf610-zii-dev-rev-b.dts mdio-mux { compatible = "mdio-mux-gpio"; pinctrl-0 = <&pinctrl_mdio_mux>; pinctrl-names = "default"; gpios = <&gpio0 8 GPIO_ACTIVE_HIGH &gpio0 9 GPIO_ACTIVE_HIGH &gpio0 24 GPIO_ACTIVE_HIGH &gpio0 25 GPIO_ACTIVE_HIGH>; mdio-parent-bus = <&mdio1>; #address-cells = <1>; #size-cells = <0>; mdio_mux_1: mdio@1 { reg = <1>; #address-cells = <1>; #size-cells = <0>; switch0: switch@0 { compatible = "marvell,mv88e6085"; pinctrl-0 = <&pinctrl_gpio_switch0>; pinctrl-names = "default"; reg = <0>; dsa,member = <0 0>; interrupt-parent = <&gpio0>; interrupts = <27 IRQ_TYPE_LEVEL_LOW>; interrupt-controller; #interrupt-cells = <2>; eeprom-length = <512>; ports { #address-cells = <1>; #size-cells = <0>; port@0 { reg = <0>; label = "lan0"; phy-handle = <&switch0phy0>; }; port@1 { reg = <1>; label = "lan1"; phy-handle = <&switch0phy1>; }; port@2 { reg = <2>; label = "lan2"; phy-handle = <&switch0phy2>; }; switch0port5: port@5 { reg = <5>; label = "dsa"; phy-mode = "rgmii-txid"; link = <&switch1port6 &switch2port9>; fixed-link { speed = <1000>; full-duplex; }; }; port@6 { reg = <6>; phy-mode = "rmii"; ethernet = <&fec1>; fixed-link { speed = <100>; full-duplex; }; }; }; mdio { #address-cells = <1>; #size-cells = <0>; switch0phy0: switch0phy0@0 { reg = <0>; interrupt-parent = <&switch0>; interrupts = <0 IRQ_TYPE_LEVEL_HIGH>; }; switch0phy1: switch1phy0@1 { reg = <1>; interrupt-parent = <&switch0>; interrupts = <1 IRQ_TYPE_LEVEL_HIGH>; }; switch0phy2: switch1phy0@2 { reg = <2>; interrupt-parent = <&switch0>; interrupts = <2 IRQ_TYPE_LEVEL_HIGH>; }; }; }; }; This would be an example of circular dependencies, with the interrupt properties closing the loop. switch -> mdio -> phy ^ | |----------------+ Andrew
On Tue, Oct 01, 2024 at 02:22:23PM GMT, Andrew Halaney wrote: > On Mon, Sep 30, 2024 at 05:12:42PM GMT, Saravana Kannan wrote: > > On Mon, Sep 30, 2024 at 2:28 PM Andrew Halaney <ahalaney@redhat.com> wrote: > > > > > > Add support for parsing the phy-handle binding so that fw_devlink can > > > enforce the dependency. This prevents MACs (that use this binding to > > > claim they're using the corresponding phy) from probing prior to the > > > phy, unless the phy is a child of the MAC (which results in a > > > dependency cycle) or similar. > > > > > > For some motivation, imagine a device topology like so: > > > > > > ðernet0 { > > > phy-mode = "sgmii"; > > > phy-handle = <&sgmii_phy0>; > > > > > > mdio { > > > compatible = "snps,dwmac-mdio"; > > > sgmii_phy0: phy@8 { > > > compatible = "ethernet-phy-id0141.0dd4"; > > > reg = <0x8>; > > > device_type = "ethernet-phy"; > > > }; > > > > > > sgmii_phy1: phy@a { > > > compatible = "ethernet-phy-id0141.0dd4"; > > > reg = <0xa>; > > > device_type = "ethernet-phy"; > > > }; > > > }; > > > }; > > > > > > ðernet1 { > > > phy-mode = "sgmii"; > > > phy-handle = <&sgmii_phy1>; > > > }; > > > > > > Here ethernet1 depends on sgmii_phy1 to function properly. In the below > > > link an issue is reported where ethernet1 is probed and used prior to > > > sgmii_phy1, resulting in a failure to get things working for ethernet1. > > > With this change in place ethernet1 doesn't probe until sgmii_phy1 is > > > ready, resulting in ethernet1 functioning properly. > > > > > > ethernet0 consumes sgmii_phy0, but this dependency isn't enforced > > > via the device_links backing fw_devlink since ethernet0 is the parent of > > > sgmii_phy0. Here's a log showing that in action: > > > > > > [ 7.000432] qcom-ethqos 23040000.ethernet: Fixed dependency cycle(s) with /soc@0/ethernet@23040000/mdio/phy@8 > > > > > > With this change in place ethernet1's dependency is properly described, > > > and it doesn't probe prior to sgmii_phy1 being available. > > > > > > Link: https://lore.kernel.org/netdev/7723d4l2kqgrez3yfauvp2ueu6awbizkrq4otqpsqpytzp45q2@rju2nxmqu4ew/ > > > Suggested-by: Serge Semin <fancer.lancer@gmail.com> > > > Signed-off-by: Andrew Halaney <ahalaney@redhat.com> > > > --- > > > I've marked this as an RFT because when looking through old mailing > > > list discusssions and kernel tech talks on this subject, I was unable > > > to really understand why in the past phy-handle had been left out. There > > > were some loose references to circular dependencies (which seem more or > > > less handled by fw_devlink to me), and the fact that a lot of behavior > > > happens in ndo_open() (but I couldn't quite grok the concern there). > > > > > > I'd appreciate more testing by others and some feedback from those who > > > know this a bit better to indicate whether fw_devlink is ready to handle > > > this or not. > > > > > > At least in my narrow point of view, it's working well for me. > > > > I do want this to land and I'm fairly certain it'll break something. > > But it's been so long that I don't remember what it was. I think it > > has to do with the generic phy driver not working well with fw_devlink > > because it doesn't go through the device driver model. > > Let me see if I can hack something up on this board (which has a decent > dependency tree for testing this stuff) to use the generic phy driver > instead of the marvell one that it needs and see how that goes. It won't > *actually* work from a phy perspective, but it will at least test out > the driver core bits here I think. > > > > > But like you said, it's been a while and fw_devlink has improved since > > then (I think). So please go ahead and give this a shot. If you can > > help fix any issues this highlights, I'd really appreciate it and I'd > > be happy to guide you through what I think needs to happen. But I > > don't think I have the time to fix it myself. > > Sure, I tend to agree. Let me check the generic phy driver path for any > issues and if that test seems to go okay I too am of the opinion that > without any solid reasoning against this we enable it and battle through > (revert and fix after the fact if necessary) any newly identified issues > that prevent phy-handle and fw_devlink have with each other. > Hmmm, yes the generic phy driver path for this doesn't seem to work well. Its fine and dandy if there's no device_link (current situation), but if there is one (say with my patch and in my example above between ethernet1 and phy@a, you can ignore the ethernet0 relationship since its a cycle and therefore no device_link is created as mentioned in the patch) you run into problems with the generic phy driver. In my original test you can see I use the marvell driver for the phy. In that case things work well. In the generic phy case however, ethernet1's probe is actually delayed far past phy@a. Here's some logs that show that the device_link getting "relaxed" due to no driver being bound, which has fw_devlink thinking this supplier phy isn't going to get a driver ever, so it finally tries to unblock (probe) the consumer (ethernet1): [ 40.695570] platform 23000000.ethernet: Relaxing link with stmmac-0:0a [ 40.702274] CPU: 4 UID: 0 PID: 111 Comm: kworker/u34:1 Not tainted 6.12.0-rc1-next-20240930-00004-gb766c5527800-dirty #155 [ 40.713605] Hardware name: Qualcomm SA8775P Ride (DT) [ 40.718789] Workqueue: events_unbound deferred_probe_work_func [ 40.724774] Call trace: [ 40.727295] dump_backtrace+0x108/0x190 [ 40.731233] show_stack+0x24/0x38 [ 40.734638] dump_stack_lvl+0x40/0x88 [ 40.738406] dump_stack+0x18/0x28 [ 40.741811] fw_devlink_unblock_consumers+0x78/0xe8 [ 40.746824] device_add+0x290/0x3f8 [ 40.750411] phy_device_register+0x6c/0xd0 [ 40.754615] fwnode_mdiobus_phy_device_register+0xe8/0x178 [ 40.760246] fwnode_mdiobus_register_phy+0x214/0x268 [ 40.765344] __of_mdiobus_parse_phys+0x80/0x280 [ 40.769995] __of_mdiobus_register+0xd0/0x230 [ 40.774465] stmmac_mdio_register+0x220/0x3c8 [stmmac] [ 40.779755] stmmac_dvr_probe+0x91c/0xd70 [stmmac] [ 40.784682] devm_stmmac_pltfr_probe+0x54/0xe0 [stmmac_platform] [ 40.790846] qcom_ethqos_probe+0x404/0x438 [dwmac_qcom_ethqos] [ 40.796830] platform_probe+0x94/0xd8 If I understand correctly that's because the generic phy driver is bound during a MAC's (like ethernet1 here) phylink_fwnode_phy_connect() call in ndo_open() currently.. here's another dump_stack() (yes I abuse that alot) showing when that happens: [ 42.980611] net end1: Before phylink_fwnode_phy_connect [ 42.986011] CPU: 4 UID: 0 PID: 310 Comm: NetworkManager Not tainted 6.12.0-rc1-next-20240930-00004-gb766c5527800-dirty #156 [ 42.997436] Hardware name: Qualcomm SA8775P Ride (DT) [ 43.002632] Call trace: [ 43.005152] dump_backtrace+0x108/0x190 [ 43.009106] show_stack+0x24/0x38 [ 43.012518] dump_stack_lvl+0x40/0x88 [ 43.016290] dump_stack+0x18/0x28 [ 43.019701] phy_attach_direct+0x2d4/0x3e0 [ 43.023918] phylink_fwnode_phy_connect+0xc4/0x178 [ 43.028848] __stmmac_open+0x698/0x6e0 [stmmac] [ 43.033534] stmmac_open+0x54/0xe0 [stmmac] [ 43.037850] __dev_open+0x110/0x228 [ 43.041442] __dev_change_flags+0xbc/0x1d0 And here's the code for the binding of the generic phy driver: /** * phy_attach_direct - attach a network device to a given PHY device pointer * @dev: network device to attach * @phydev: Pointer to phy_device to attach * @flags: PHY device's dev_flags * @interface: PHY device's interface * * Description: Called by drivers to attach to a particular PHY * device. The phy_device is found, and properly hooked up * to the phy_driver. If no driver is attached, then a * generic driver is used. The phy_device is given a ptr to * the attaching device, and given a callback for link status * change. The phy_device is returned to the attaching driver. * This function takes a reference on the phy device. */ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev, u32 flags, phy_interface_t interface) { struct mii_bus *bus = phydev->mdio.bus; struct device *d = &phydev->mdio.dev; struct module *ndev_owner = NULL; bool using_genphy = false; int err; /* For Ethernet device drivers that register their own MDIO bus, we * will have bus->owner match ndev_mod, so we do not want to increment * our own module->refcnt here, otherwise we would not be able to * unload later on. */ if (dev) ndev_owner = dev->dev.parent->driver->owner; if (ndev_owner != bus->owner && !try_module_get(bus->owner)) { phydev_err(phydev, "failed to get the bus module\n"); return -EIO; } get_device(d); /* Assume that if there is no driver, that it doesn't * exist, and we should use the genphy driver. */ if (!d->driver) { if (phydev->is_c45) d->driver = &genphy_c45_driver.mdiodrv.driver; else d->driver = &genphy_driver.mdiodrv.driver; using_genphy = true; dump_stack(); } if (!try_module_get(d->driver->owner)) { phydev_err(phydev, "failed to get the device driver module\n"); err = -EIO; goto error_put_device; } if (using_genphy) { err = d->driver->probe(d); if (err >= 0) err = device_bind_driver(d); if (err) goto error_module_put; } ... } Something will need to be done for the generic phy driver case before this patch could be considered acceptable as this would slow the boot time for the topology I described in the patch description if the generic phy driver was used.
On Wed, Oct 2, 2024 at 12:34 PM Andrew Halaney <ahalaney@redhat.com> wrote: > > On Tue, Oct 01, 2024 at 02:22:23PM GMT, Andrew Halaney wrote: > > On Mon, Sep 30, 2024 at 05:12:42PM GMT, Saravana Kannan wrote: > > > On Mon, Sep 30, 2024 at 2:28 PM Andrew Halaney <ahalaney@redhat.com> wrote: > > > > > > > > Add support for parsing the phy-handle binding so that fw_devlink can > > > > enforce the dependency. This prevents MACs (that use this binding to > > > > claim they're using the corresponding phy) from probing prior to the > > > > phy, unless the phy is a child of the MAC (which results in a > > > > dependency cycle) or similar. > > > > > > > > For some motivation, imagine a device topology like so: > > > > > > > > ðernet0 { > > > > phy-mode = "sgmii"; > > > > phy-handle = <&sgmii_phy0>; > > > > > > > > mdio { > > > > compatible = "snps,dwmac-mdio"; > > > > sgmii_phy0: phy@8 { > > > > compatible = "ethernet-phy-id0141.0dd4"; > > > > reg = <0x8>; > > > > device_type = "ethernet-phy"; > > > > }; > > > > > > > > sgmii_phy1: phy@a { > > > > compatible = "ethernet-phy-id0141.0dd4"; > > > > reg = <0xa>; > > > > device_type = "ethernet-phy"; > > > > }; > > > > }; > > > > }; > > > > > > > > ðernet1 { > > > > phy-mode = "sgmii"; > > > > phy-handle = <&sgmii_phy1>; > > > > }; > > > > > > > > Here ethernet1 depends on sgmii_phy1 to function properly. In the below > > > > link an issue is reported where ethernet1 is probed and used prior to > > > > sgmii_phy1, resulting in a failure to get things working for ethernet1. > > > > With this change in place ethernet1 doesn't probe until sgmii_phy1 is > > > > ready, resulting in ethernet1 functioning properly. > > > > > > > > ethernet0 consumes sgmii_phy0, but this dependency isn't enforced > > > > via the device_links backing fw_devlink since ethernet0 is the parent of > > > > sgmii_phy0. Here's a log showing that in action: > > > > > > > > [ 7.000432] qcom-ethqos 23040000.ethernet: Fixed dependency cycle(s) with /soc@0/ethernet@23040000/mdio/phy@8 > > > > > > > > With this change in place ethernet1's dependency is properly described, > > > > and it doesn't probe prior to sgmii_phy1 being available. > > > > > > > > Link: https://lore.kernel.org/netdev/7723d4l2kqgrez3yfauvp2ueu6awbizkrq4otqpsqpytzp45q2@rju2nxmqu4ew/ > > > > Suggested-by: Serge Semin <fancer.lancer@gmail.com> > > > > Signed-off-by: Andrew Halaney <ahalaney@redhat.com> > > > > --- > > > > I've marked this as an RFT because when looking through old mailing > > > > list discusssions and kernel tech talks on this subject, I was unable > > > > to really understand why in the past phy-handle had been left out. There > > > > were some loose references to circular dependencies (which seem more or > > > > less handled by fw_devlink to me), and the fact that a lot of behavior > > > > happens in ndo_open() (but I couldn't quite grok the concern there). > > > > > > > > I'd appreciate more testing by others and some feedback from those who > > > > know this a bit better to indicate whether fw_devlink is ready to handle > > > > this or not. > > > > > > > > At least in my narrow point of view, it's working well for me. > > > > > > I do want this to land and I'm fairly certain it'll break something. > > > But it's been so long that I don't remember what it was. I think it > > > has to do with the generic phy driver not working well with fw_devlink > > > because it doesn't go through the device driver model. > > > > Let me see if I can hack something up on this board (which has a decent > > dependency tree for testing this stuff) to use the generic phy driver > > instead of the marvell one that it needs and see how that goes. It won't > > *actually* work from a phy perspective, but it will at least test out > > the driver core bits here I think. > > > > > > > > But like you said, it's been a while and fw_devlink has improved since > > > then (I think). So please go ahead and give this a shot. If you can > > > help fix any issues this highlights, I'd really appreciate it and I'd > > > be happy to guide you through what I think needs to happen. But I > > > don't think I have the time to fix it myself. > > > > Sure, I tend to agree. Let me check the generic phy driver path for any > > issues and if that test seems to go okay I too am of the opinion that > > without any solid reasoning against this we enable it and battle through > > (revert and fix after the fact if necessary) any newly identified issues > > that prevent phy-handle and fw_devlink have with each other. > > > > Hmmm, yes the generic phy driver path for this > doesn't seem to work well. Its fine and dandy if there's > no device_link (current situation), but if there is one > (say with my patch and in my example above between ethernet1 and phy@a, > you can ignore the ethernet0 relationship since its a cycle > and therefore no device_link is created as mentioned in the patch) > you run into problems with the generic phy driver. > > In my original test you can see I use the marvell driver > for the phy. In that case things work well. In the generic phy > case however, ethernet1's probe is actually delayed far past > phy@a. Here's some logs that show that the device_link getting > "relaxed" due to no driver being bound, which has fw_devlink > thinking this supplier phy isn't going to get a driver ever, > so it finally tries to unblock (probe) the consumer (ethernet1): > > [ 40.695570] platform 23000000.ethernet: Relaxing link with stmmac-0:0a > [ 40.702274] CPU: 4 UID: 0 PID: 111 Comm: kworker/u34:1 Not tainted 6.12.0-rc1-next-20240930-00004-gb766c5527800-dirty #155 > [ 40.713605] Hardware name: Qualcomm SA8775P Ride (DT) > [ 40.718789] Workqueue: events_unbound deferred_probe_work_func > [ 40.724774] Call trace: > [ 40.727295] dump_backtrace+0x108/0x190 > [ 40.731233] show_stack+0x24/0x38 > [ 40.734638] dump_stack_lvl+0x40/0x88 > [ 40.738406] dump_stack+0x18/0x28 > [ 40.741811] fw_devlink_unblock_consumers+0x78/0xe8 > [ 40.746824] device_add+0x290/0x3f8 > [ 40.750411] phy_device_register+0x6c/0xd0 > [ 40.754615] fwnode_mdiobus_phy_device_register+0xe8/0x178 > [ 40.760246] fwnode_mdiobus_register_phy+0x214/0x268 > [ 40.765344] __of_mdiobus_parse_phys+0x80/0x280 > [ 40.769995] __of_mdiobus_register+0xd0/0x230 > [ 40.774465] stmmac_mdio_register+0x220/0x3c8 [stmmac] > [ 40.779755] stmmac_dvr_probe+0x91c/0xd70 [stmmac] > [ 40.784682] devm_stmmac_pltfr_probe+0x54/0xe0 [stmmac_platform] > [ 40.790846] qcom_ethqos_probe+0x404/0x438 [dwmac_qcom_ethqos] > [ 40.796830] platform_probe+0x94/0xd8 > > If I understand correctly that's because the generic phy driver > is bound during a MAC's (like ethernet1 here) phylink_fwnode_phy_connect() call > in ndo_open() currently.. here's another dump_stack() (yes I abuse that alot) > showing when that happens: > > [ 42.980611] net end1: Before phylink_fwnode_phy_connect > [ 42.986011] CPU: 4 UID: 0 PID: 310 Comm: NetworkManager Not tainted 6.12.0-rc1-next-20240930-00004-gb766c5527800-dirty #156 > [ 42.997436] Hardware name: Qualcomm SA8775P Ride (DT) > [ 43.002632] Call trace: > [ 43.005152] dump_backtrace+0x108/0x190 > [ 43.009106] show_stack+0x24/0x38 > [ 43.012518] dump_stack_lvl+0x40/0x88 > [ 43.016290] dump_stack+0x18/0x28 > [ 43.019701] phy_attach_direct+0x2d4/0x3e0 > [ 43.023918] phylink_fwnode_phy_connect+0xc4/0x178 > [ 43.028848] __stmmac_open+0x698/0x6e0 [stmmac] > [ 43.033534] stmmac_open+0x54/0xe0 [stmmac] > [ 43.037850] __dev_open+0x110/0x228 > [ 43.041442] __dev_change_flags+0xbc/0x1d0 > > > And here's the code for the binding of the generic phy driver: > > /** > * phy_attach_direct - attach a network device to a given PHY device pointer > * @dev: network device to attach > * @phydev: Pointer to phy_device to attach > * @flags: PHY device's dev_flags > * @interface: PHY device's interface > * > * Description: Called by drivers to attach to a particular PHY > * device. The phy_device is found, and properly hooked up > * to the phy_driver. If no driver is attached, then a > * generic driver is used. The phy_device is given a ptr to > * the attaching device, and given a callback for link status > * change. The phy_device is returned to the attaching driver. > * This function takes a reference on the phy device. > */ > int phy_attach_direct(struct net_device *dev, struct phy_device *phydev, > u32 flags, phy_interface_t interface) > { > struct mii_bus *bus = phydev->mdio.bus; > struct device *d = &phydev->mdio.dev; > struct module *ndev_owner = NULL; > bool using_genphy = false; > int err; > > /* For Ethernet device drivers that register their own MDIO bus, we > * will have bus->owner match ndev_mod, so we do not want to increment > * our own module->refcnt here, otherwise we would not be able to > * unload later on. > */ > if (dev) > ndev_owner = dev->dev.parent->driver->owner; > if (ndev_owner != bus->owner && !try_module_get(bus->owner)) { > phydev_err(phydev, "failed to get the bus module\n"); > return -EIO; > } > > get_device(d); > > /* Assume that if there is no driver, that it doesn't > * exist, and we should use the genphy driver. > */ > if (!d->driver) { > if (phydev->is_c45) > d->driver = &genphy_c45_driver.mdiodrv.driver; > else > d->driver = &genphy_driver.mdiodrv.driver; > > using_genphy = true; > dump_stack(); > } > > if (!try_module_get(d->driver->owner)) { > phydev_err(phydev, "failed to get the device driver module\n"); > err = -EIO; > goto error_put_device; > } > > if (using_genphy) { > err = d->driver->probe(d); > if (err >= 0) > err = device_bind_driver(d); > > if (err) > goto error_module_put; > } > > ... > } > > Something will need to be done for the generic phy driver case before > this patch could be considered acceptable as this would slow the boot time > for the topology I described in the patch description if the generic phy > driver was used. Right. And the way to do that is to move the generic phy driver matching to go through the normal probe model instead of doing a direct driver attach or directly calling the probe function. It's possible and I had a mental model a while ago, but didn't have the time to get around to it. Basically, if we find that none of the drivers match, we need to trigger something like -EPROBE_DEFER again and then match it with the generic phy driver. Or figure out some other way for the generic phy driver to NOT match if a better driver is available. Once we do that, I think the rest should be easy to fix. Let me know if there's anything I can do to help you move this forward. -Saravana
diff --git a/drivers/of/property.c b/drivers/of/property.c index 11b922fde7af..4a2fca75e1c6 100644 --- a/drivers/of/property.c +++ b/drivers/of/property.c @@ -1220,6 +1220,7 @@ DEFINE_SIMPLE_PROP(hwlocks, "hwlocks", "#hwlock-cells") DEFINE_SIMPLE_PROP(extcon, "extcon", NULL) DEFINE_SIMPLE_PROP(nvmem_cells, "nvmem-cells", "#nvmem-cell-cells") DEFINE_SIMPLE_PROP(phys, "phys", "#phy-cells") +DEFINE_SIMPLE_PROP(phy_handle, "phy-handle", NULL) DEFINE_SIMPLE_PROP(wakeup_parent, "wakeup-parent", NULL) DEFINE_SIMPLE_PROP(pinctrl0, "pinctrl-0", NULL) DEFINE_SIMPLE_PROP(pinctrl1, "pinctrl-1", NULL) @@ -1366,6 +1367,7 @@ static const struct supplier_bindings of_supplier_bindings[] = { { .parse_prop = parse_extcon, }, { .parse_prop = parse_nvmem_cells, }, { .parse_prop = parse_phys, }, + { .parse_prop = parse_phy_handle, }, { .parse_prop = parse_wakeup_parent, }, { .parse_prop = parse_pinctrl0, }, { .parse_prop = parse_pinctrl1, },
Add support for parsing the phy-handle binding so that fw_devlink can enforce the dependency. This prevents MACs (that use this binding to claim they're using the corresponding phy) from probing prior to the phy, unless the phy is a child of the MAC (which results in a dependency cycle) or similar. For some motivation, imagine a device topology like so: ðernet0 { phy-mode = "sgmii"; phy-handle = <&sgmii_phy0>; mdio { compatible = "snps,dwmac-mdio"; sgmii_phy0: phy@8 { compatible = "ethernet-phy-id0141.0dd4"; reg = <0x8>; device_type = "ethernet-phy"; }; sgmii_phy1: phy@a { compatible = "ethernet-phy-id0141.0dd4"; reg = <0xa>; device_type = "ethernet-phy"; }; }; }; ðernet1 { phy-mode = "sgmii"; phy-handle = <&sgmii_phy1>; }; Here ethernet1 depends on sgmii_phy1 to function properly. In the below link an issue is reported where ethernet1 is probed and used prior to sgmii_phy1, resulting in a failure to get things working for ethernet1. With this change in place ethernet1 doesn't probe until sgmii_phy1 is ready, resulting in ethernet1 functioning properly. ethernet0 consumes sgmii_phy0, but this dependency isn't enforced via the device_links backing fw_devlink since ethernet0 is the parent of sgmii_phy0. Here's a log showing that in action: [ 7.000432] qcom-ethqos 23040000.ethernet: Fixed dependency cycle(s) with /soc@0/ethernet@23040000/mdio/phy@8 With this change in place ethernet1's dependency is properly described, and it doesn't probe prior to sgmii_phy1 being available. Link: https://lore.kernel.org/netdev/7723d4l2kqgrez3yfauvp2ueu6awbizkrq4otqpsqpytzp45q2@rju2nxmqu4ew/ Suggested-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Andrew Halaney <ahalaney@redhat.com> --- I've marked this as an RFT because when looking through old mailing list discusssions and kernel tech talks on this subject, I was unable to really understand why in the past phy-handle had been left out. There were some loose references to circular dependencies (which seem more or less handled by fw_devlink to me), and the fact that a lot of behavior happens in ndo_open() (but I couldn't quite grok the concern there). I'd appreciate more testing by others and some feedback from those who know this a bit better to indicate whether fw_devlink is ready to handle this or not. At least in my narrow point of view, it's working well for me. Thanks, Andrew --- drivers/of/property.c | 2 ++ 1 file changed, 2 insertions(+) --- base-commit: cea5425829f77e476b03702426f6b3701299b925 change-id: 20240829-phy-handle-fw-devlink-b829622200ae Best regards,