Message ID | 20220711160519.741990-1-sean.anderson@seco.com (mailing list archive) |
---|---|
Headers | show |
Series | net: pcs: Add support for devices probed in the "usual" manner | expand |
Hi Sean, On Mon, Jul 11, 2022 at 12:05:10PM -0400, Sean Anderson wrote: > For a long time, PCSs have been tightly coupled with their MACs. For > this reason, the MAC creates the "phy" or mdio device, and then passes > it to the PCS to initialize. This has a few disadvantages: > > - Each MAC must re-implement the same steps to look up/create a PCS > - The PCS cannot use functions tied to device lifetime, such as devm_*. > - Generally, the PCS does not have easy access to its device tree node > > I'm not sure if these are terribly large disadvantages. In fact, I'm not > sure if this series provides any benefit which could not be achieved > with judicious use of helper functions. In any case, here it is. > > NB: Several (later) patches in this series should not be applied. See > the notes in each commit for details on when they can be applied. Sorry to burst your bubble, but the networking drivers on NXP LS1028A (device tree at arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi, drivers at drivers/net/ethernet/freescale/enetc/ and drivers/net/dsa/ocelot/) do not use the Lynx PCS through a pcs-handle, because the Lynx PCS in fact has no backing OF node there, nor do the internal MDIO buses of the ENETC and of the switch. It seems that I need to point this out explicitly: you need to provide at least a working migration path to your PCS driver model. Currently there isn't one, and as a result, networking is broken on the LS1028A with this patch set.
Hi Vladimir, On 7/19/22 11:25 AM, Vladimir Oltean wrote: > Hi Sean, > > On Mon, Jul 11, 2022 at 12:05:10PM -0400, Sean Anderson wrote: >> For a long time, PCSs have been tightly coupled with their MACs. For >> this reason, the MAC creates the "phy" or mdio device, and then passes >> it to the PCS to initialize. This has a few disadvantages: >> >> - Each MAC must re-implement the same steps to look up/create a PCS >> - The PCS cannot use functions tied to device lifetime, such as devm_*. >> - Generally, the PCS does not have easy access to its device tree node >> >> I'm not sure if these are terribly large disadvantages. In fact, I'm not >> sure if this series provides any benefit which could not be achieved >> with judicious use of helper functions. In any case, here it is. >> >> NB: Several (later) patches in this series should not be applied. See >> the notes in each commit for details on when they can be applied. > > Sorry to burst your bubble, but the networking drivers on NXP LS1028A > (device tree at arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi, drivers > at drivers/net/ethernet/freescale/enetc/ and drivers/net/dsa/ocelot/) > do not use the Lynx PCS through a pcs-handle, because the Lynx PCS in > fact has no backing OF node there, nor do the internal MDIO buses of the > ENETC and of the switch. > > It seems that I need to point this out explicitly: you need to provide > at least a working migration path to your PCS driver model. Currently > there isn't one, and as a result, networking is broken on the LS1028A > with this patch set. > Please refer to patches 4, 5, and 6. --Sean
On Tue, Jul 19, 2022 at 11:28:42AM -0400, Sean Anderson wrote: > Hi Vladimir, > > On 7/19/22 11:25 AM, Vladimir Oltean wrote: > > Hi Sean, > > > > On Mon, Jul 11, 2022 at 12:05:10PM -0400, Sean Anderson wrote: > >> For a long time, PCSs have been tightly coupled with their MACs. For > >> this reason, the MAC creates the "phy" or mdio device, and then passes > >> it to the PCS to initialize. This has a few disadvantages: > >> > >> - Each MAC must re-implement the same steps to look up/create a PCS > >> - The PCS cannot use functions tied to device lifetime, such as devm_*. > >> - Generally, the PCS does not have easy access to its device tree node > >> > >> I'm not sure if these are terribly large disadvantages. In fact, I'm not > >> sure if this series provides any benefit which could not be achieved > >> with judicious use of helper functions. In any case, here it is. > >> > >> NB: Several (later) patches in this series should not be applied. See > >> the notes in each commit for details on when they can be applied. > > > > Sorry to burst your bubble, but the networking drivers on NXP LS1028A > > (device tree at arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi, drivers > > at drivers/net/ethernet/freescale/enetc/ and drivers/net/dsa/ocelot/) > > do not use the Lynx PCS through a pcs-handle, because the Lynx PCS in > > fact has no backing OF node there, nor do the internal MDIO buses of the > > ENETC and of the switch. > > > > It seems that I need to point this out explicitly: you need to provide > > at least a working migration path to your PCS driver model. Currently > > there isn't one, and as a result, networking is broken on the LS1028A > > with this patch set. > > > > Please refer to patches 4, 5, and 6. I don't understand, could you be more clear? Are you saying that I shouldn't have applied patch 9 while testing? When would be a good moment to apply patch 9?
On 7/19/22 11:38 AM, Vladimir Oltean wrote: > On Tue, Jul 19, 2022 at 11:28:42AM -0400, Sean Anderson wrote: >> Hi Vladimir, >> >> On 7/19/22 11:25 AM, Vladimir Oltean wrote: >> > Hi Sean, >> > >> > On Mon, Jul 11, 2022 at 12:05:10PM -0400, Sean Anderson wrote: >> >> For a long time, PCSs have been tightly coupled with their MACs. For >> >> this reason, the MAC creates the "phy" or mdio device, and then passes >> >> it to the PCS to initialize. This has a few disadvantages: >> >> >> >> - Each MAC must re-implement the same steps to look up/create a PCS >> >> - The PCS cannot use functions tied to device lifetime, such as devm_*. >> >> - Generally, the PCS does not have easy access to its device tree node >> >> >> >> I'm not sure if these are terribly large disadvantages. In fact, I'm not >> >> sure if this series provides any benefit which could not be achieved >> >> with judicious use of helper functions. In any case, here it is. >> >> >> >> NB: Several (later) patches in this series should not be applied. See >> >> the notes in each commit for details on when they can be applied. >> > >> > Sorry to burst your bubble, but the networking drivers on NXP LS1028A >> > (device tree at arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi, drivers >> > at drivers/net/ethernet/freescale/enetc/ and drivers/net/dsa/ocelot/) >> > do not use the Lynx PCS through a pcs-handle, because the Lynx PCS in >> > fact has no backing OF node there, nor do the internal MDIO buses of the >> > ENETC and of the switch. >> > >> > It seems that I need to point this out explicitly: you need to provide >> > at least a working migration path to your PCS driver model. Currently >> > there isn't one, and as a result, networking is broken on the LS1028A >> > with this patch set. >> > >> >> Please refer to patches 4, 5, and 6. > > I don't understand, could you be more clear? Are you saying that I > shouldn't have applied patch 9 while testing? When would be a good > moment to apply patch 9? I'm saying that patches 4 and 5 [1] provide "...a working migration path to [my] PCS driver model." Since enetc/ocelot do not use devicetree for the PCS, patch 9 should have no effect. That said, if you've tested this on actual hardware, I'm interested in your results. I do not have access to enetc/ocelot hardware, so I was unable to test whether my proposed migration would work. --Sean [1] I listed 6 but it seems like it just has some small hunks which should have been in 5 instead
On Tue, Jul 19, 2022 at 11:46:23AM -0400, Sean Anderson wrote: > I'm saying that patches 4 and 5 [1] provide "...a working migration > path to [my] PCS driver model." Since enetc/ocelot do not use > devicetree for the PCS, patch 9 should have no effect. > > That said, if you've tested this on actual hardware, I'm interested > in your results. I do not have access to enetc/ocelot hardware, so > I was unable to test whether my proposed migration would work. > > --Sean > > [1] I listed 6 but it seems like it just has some small hunks which should have been in 5 instead Got it, thanks. So things actually work up until the end, after fixing the compilation errors and warnings and applying my phy_mask patch first. However, as mentioned by Russell King, this patch set now gives us the possibility of doing this, which happily kills the system: echo "0000:00:00.5-imdio:03" > /sys/bus/mdio_bus/drivers/lynx-pcs/unbind For your information, pcs-rzn1-miic.c already has a device_link_add() call to its consumer, and it does avoid the unbinding problem. It is a bit of a heavy hammer as Russell points out (a DSA switch is a single struct device, but has multiple net_devices and phylink instances, and the switch device would be unregistered in its entirety), but on the other hand, this is one of the simpler things we can do, until we have something more fine-grained. I, for one, am perfectly happy with a device link. The alternative would be reworking phylink to react on PCS devices coming and going. I don't even know what the implications are upon mac_select_pcs() and such...
On 7/19/22 2:11 PM, Vladimir Oltean wrote: > On Tue, Jul 19, 2022 at 11:46:23AM -0400, Sean Anderson wrote: >> I'm saying that patches 4 and 5 [1] provide "...a working migration >> path to [my] PCS driver model." Since enetc/ocelot do not use >> devicetree for the PCS, patch 9 should have no effect. >> >> That said, if you've tested this on actual hardware, I'm interested >> in your results. I do not have access to enetc/ocelot hardware, so >> I was unable to test whether my proposed migration would work. >> >> --Sean >> >> [1] I listed 6 but it seems like it just has some small hunks which should have been in 5 instead > > Got it, thanks. So things actually work up until the end, after fixing > the compilation errors and warnings and applying my phy_mask patch first. > However, as mentioned by Russell King, this patch set now gives us the > possibility of doing this, which happily kills the system: > > echo "0000:00:00.5-imdio:03" > /sys/bus/mdio_bus/drivers/lynx-pcs/unbind > > For your information, pcs-rzn1-miic.c already has a device_link_add() > call to its consumer, and it does avoid the unbinding problem. It is a > bit of a heavy hammer as Russell points out (a DSA switch is a single > struct device, but has multiple net_devices and phylink instances, and > the switch device would be unregistered in its entirety), but on the > other hand, this is one of the simpler things we can do, until we have > something more fine-grained. I, for one, am perfectly happy with a > device link. The alternative would be reworking phylink to react on PCS > devices coming and going. I don't even know what the implications are > upon mac_select_pcs() and such... We could do it, but it'd be a pretty big hack. Something like the following. Phylink would need to be modified to grab the lock before every op and check if the PCS is dead or not. This is of course still not optimal, since there's no way to re-attach a PCS once it goes away. IMO a better solution is to use devlink and submit a patch to add notifications which the MAC driver can register for. That way it can find out when the PCS goes away and potentially do something about it (or just let itself get removed). --- drivers/net/pcs/core.c | 115 +++++++++++++++++++++++++++---------- drivers/net/pcs/pcs-lynx.c | 20 +++---- include/linux/pcs.h | 23 +++++++- include/linux/phylink.h | 19 +----- 4 files changed, 117 insertions(+), 60 deletions(-) diff --git a/drivers/net/pcs/core.c b/drivers/net/pcs/core.c index 782a4cdd19b2..46e4168802db 100644 --- a/drivers/net/pcs/core.c +++ b/drivers/net/pcs/core.c @@ -10,42 +10,83 @@ #include <linux/phylink.h> #include <linux/property.h> +struct phylink_pcs { + struct mutex lock; + struct list_head list; + struct device *dev; + void *priv; + const struct phylink_pcs_ops *ops; + int refs; + bool dead; + bool poll; +}; + static LIST_HEAD(pcs_devices); static DEFINE_MUTEX(pcs_mutex); /** * pcs_register() - register a new PCS - * @pcs: the PCS to register + * @init: Initialization data for a new PCS * * Registers a new PCS which can be automatically attached to a phylink. * - * Return: 0 on success, or -errno on error + * Return: A new PCS, or an error pointer */ -int pcs_register(struct phylink_pcs *pcs) +struct phylink_pcs *pcs_register(struct device *dev, + struct pcs_init *init) { - if (!pcs->dev || !pcs->ops) - return -EINVAL; - if (!pcs->ops->pcs_an_restart || !pcs->ops->pcs_config || - !pcs->ops->pcs_get_state) - return -EINVAL; + struct phylink_pcs *pcs; + if (!init->ops) + return ERR_PTR(-EINVAL); + if (!init->ops->pcs_an_restart || !init->ops->pcs_config || + !init->ops->pcs_get_state) + return ERR_PTR(-EINVAL); + + pcs = kzalloc(sizeof(*pcs), GFP_KERNEL); + if (!pcs) + return ERR_PTR(-ENOMEM); + + pcs->dev = dev; + pcs->priv = init->priv; + pcs->ops = init->ops; + pcs->poll = init->poll; + pcs->refs = 1; + mutex_init(&pcs->lock); INIT_LIST_HEAD(&pcs->list); + mutex_lock(&pcs_mutex); list_add(&pcs->list, &pcs_devices); mutex_unlock(&pcs_mutex); - return 0; + return pcs; } EXPORT_SYMBOL_GPL(pcs_register); +static void pcs_free(struct phylink_pcs *pcs) +{ + int refs; + + refs = --pcs->refs; + mutex_unlock(&pcs->lock); + if (refs) + return; + + WARN_ON(!pcs->dead); + mutex_lock(&pcs_mutex); + list_del(&pcs->list); + mutex_unlock(&pcs_mutex); + kfree(pcs); +} + /** * pcs_unregister() - unregister a PCS * @pcs: a PCS previously registered with pcs_register() */ void pcs_unregister(struct phylink_pcs *pcs) { - mutex_lock(&pcs_mutex); - list_del(&pcs->list); - mutex_unlock(&pcs_mutex); + mutex_lock(&pcs->lock); + pcs->dead = true; + pcs_free(pcs); } EXPORT_SYMBOL_GPL(pcs_unregister); @@ -65,26 +106,25 @@ static void devm_pcs_release(struct device *dev, void *res) * * Return: 0 on success, or -errno on failure */ -int devm_pcs_register(struct device *dev, struct phylink_pcs *pcs) +struct phylink_pcs *devm_pcs_register(struct device *dev, + struct pcs_init *init) { struct phylink_pcs **pcsp; - int ret; + struct phylink_pcs *pcs; pcsp = devres_alloc(devm_pcs_release, sizeof(*pcsp), GFP_KERNEL); if (!pcsp) - return -ENOMEM; + return ERR_PTR(-ENOMEM); - ret = pcs_register(pcs); - if (ret) { + pcs = pcs_register(dev, init); + if (IS_ERR(pcs)) { devres_free(pcsp); - return ret; + } else { + *pcsp = pcs; + devres_add(dev, pcsp); } - - *pcsp = pcs; - devres_add(dev, pcsp); - - return ret; + return pcs; } EXPORT_SYMBOL_GPL(devm_pcs_register); @@ -106,16 +146,20 @@ static struct phylink_pcs *pcs_find(const struct fwnode_handle *fwnode, mutex_lock(&pcs_mutex); list_for_each_entry(pcs, &pcs_devices, list) { - if (dev && pcs->dev == dev) - goto out; - if (fwnode && pcs->dev->fwnode == fwnode) - goto out; + mutex_lock(&pcs->lock); + if (!pcs->dead) { + if (dev && pcs->dev == dev) + goto out; + if (fwnode && pcs->dev->fwnode == fwnode) + goto out; + } + mutex_unlock(&pcs->lock); } pcs = NULL; out: mutex_unlock(&pcs_mutex); - pr_devel("%s: looking for %pfwf or %s %s...%s found\n", __func__, + pr_debug("%s: looking for %pfwf or %s %s...%s found\n", __func__, fwnode, dev ? dev_driver_string(dev) : "(null)", dev ? dev_name(dev) : "(null)", pcs ? " not" : ""); return pcs; @@ -132,10 +176,15 @@ static struct phylink_pcs *pcs_find(const struct fwnode_handle *fwnode, */ static struct phylink_pcs *pcs_get_tail(struct phylink_pcs *pcs) { + bool got_module; + if (!pcs) return ERR_PTR(-EPROBE_DEFER); - if (!try_module_get(pcs->ops->owner)) + got_module = try_module_get(pcs->ops->owner); + pcs->refs += got_module; + mutex_unlock(&pcs->lock); + if (!got_module) return ERR_PTR(-ENODEV); get_device(pcs->dev); @@ -222,5 +271,13 @@ void pcs_put(struct phylink_pcs *pcs) put_device(pcs->dev); module_put(pcs->ops->owner); + mutex_lock(&pcs->lock); + pcs_free(pcs); } EXPORT_SYMBOL_GPL(pcs_put); + +void *pcs_get_priv(struct phylink_pcs *pcs) +{ + return pcs->priv; +} +EXPORT_SYMBOL_GPL(pcs_get_priv); diff --git a/drivers/net/pcs/pcs-lynx.c b/drivers/net/pcs/pcs-lynx.c index c3e2c4a6fab6..f792f2a7cdf2 100644 --- a/drivers/net/pcs/pcs-lynx.c +++ b/drivers/net/pcs/pcs-lynx.c @@ -26,7 +26,6 @@ #define IF_MODE_HALF_DUPLEX BIT(4) struct lynx_pcs { - struct phylink_pcs pcs; struct mdio_device *mdio; }; @@ -37,8 +36,7 @@ enum sgmii_speed { SGMII_SPEED_2500 = 2, }; -#define phylink_pcs_to_lynx(pl_pcs) container_of((pl_pcs), struct lynx_pcs, pcs) -#define lynx_to_phylink_pcs(lynx) (&(lynx)->pcs) +#define phylink_pcs_to_lynx(pl_pcs) pcs_get_priv(pl_pcs) static void lynx_pcs_get_state_usxgmii(struct mdio_device *pcs, struct phylink_link_state *state) @@ -318,21 +316,23 @@ static const struct phylink_pcs_ops lynx_pcs_phylink_ops = { static int lynx_pcs_probe(struct mdio_device *mdio) { struct device *dev = &mdio->dev; + struct phylink_pcs *pcs; struct lynx_pcs *lynx; - int ret; + struct pcs_init init; lynx = devm_kzalloc(dev, sizeof(*lynx), GFP_KERNEL); if (!lynx) return -ENOMEM; lynx->mdio = mdio; - lynx->pcs.dev = dev; - lynx->pcs.ops = &lynx_pcs_phylink_ops; - lynx->pcs.poll = true; + init.priv = lynx; + init.ops = &lynx_pcs_phylink_ops; + init.poll = true; - ret = devm_pcs_register(dev, &lynx->pcs); - if (ret) - return dev_err_probe(dev, ret, "could not register PCS\n"); + pcs = devm_pcs_register(dev, &init); + if (IS_ERR(pcs)) + return dev_err_probe(dev, PTR_ERR(pcs), + "could not register PCS\n"); dev_info(dev, "probed\n"); return 0; } diff --git a/include/linux/pcs.h b/include/linux/pcs.h index 00e76594e03c..2605603149ec 100644 --- a/include/linux/pcs.h +++ b/include/linux/pcs.h @@ -6,12 +6,27 @@ #ifndef _PCS_H #define _PCS_H -struct phylink_pcs; struct fwnode; +struct phylink_pcs; +struct phylink_pcs_ops; -int pcs_register(struct phylink_pcs *pcs); +/** + * struct pcs_init - PCS initialization data + * @priv: the device's private data + * @ops: a pointer to the &struct phylink_pcs_ops structure + * @poll: poll the PCS for link changes + */ +struct pcs_init { + void *priv; + const struct phylink_pcs_ops *ops; + bool poll; +}; + +struct phylink_pcs *pcs_register(struct device *dev, + struct pcs_init *init); void pcs_unregister(struct phylink_pcs *pcs); -int devm_pcs_register(struct device *dev, struct phylink_pcs *pcs); +struct phylink_pcs *devm_pcs_register(struct device *dev, + struct pcs_init *init); struct phylink_pcs *_pcs_get_by_fwnode(const struct fwnode_handle *fwnode, const char *id, bool optional); struct phylink_pcs *pcs_get_by_provider(const struct device *dev); @@ -30,4 +45,6 @@ static inline struct phylink_pcs return _pcs_get_by_fwnode(fwnode, id, true); } +void *pcs_get_priv(struct phylink_pcs *pcs); + #endif /* PCS_H */ diff --git a/include/linux/phylink.h b/include/linux/phylink.h index a713e70108a1..864536d1b293 100644 --- a/include/linux/phylink.h +++ b/include/linux/phylink.h @@ -392,24 +392,7 @@ void mac_link_up(struct phylink_config *config, struct phy_device *phy, int speed, int duplex, bool tx_pause, bool rx_pause); #endif -struct phylink_pcs_ops; - -/** - * struct phylink_pcs - PHYLINK PCS instance - * @dev: the device associated with this PCS - * @ops: a pointer to the &struct phylink_pcs_ops structure - * @list: internal list of PCS devices - * @poll: poll the PCS for link changes - * - * This structure is designed to be embedded within the PCS private data, - * and will be passed between phylink and the PCS. - */ -struct phylink_pcs { - struct device *dev; - const struct phylink_pcs_ops *ops; - struct list_head list; - bool poll; -}; +struct phylink_pcs; /** * struct phylink_pcs_ops - MAC PCS operations structure.
On Tue, Jul 19, 2022 at 03:34:45PM -0400, Sean Anderson wrote: > We could do it, but it'd be a pretty big hack. Something like the > following. Phylink would need to be modified to grab the lock before > every op and check if the PCS is dead or not. This is of course still > not optimal, since there's no way to re-attach a PCS once it goes away. You assume it's just phylink who operates on a PCS structure, but if you include your search pool to also cover include/linux/pcs/pcs-xpcs.h, you'll see a bunch of exported functions which are called directly by the client drivers (stmmac, sja1105). At this stage it gets pretty hard to validate that drivers won't attempt from any code path to do something stupid with a dead PCS. All in all it creates an environment with insanely weak guarantees; that's pretty hard to get behind IMO. > IMO a better solution is to use devlink and submit a patch to add > notifications which the MAC driver can register for. That way it can > find out when the PCS goes away and potentially do something about it > (or just let itself get removed). Not sure I understand what connection there is between devlink (device links) and PCS {de}registration notifications. We could probably add those notifications without any intervention from the device core: we would just need to make this new PCS "core" to register an blocking_notifier_call_chain to which interested drivers could add their notifier blocks. How a certain phylink user is going to determine that "hey, this PCS is definitely mine and I can use it" is an open question. In any case, my expectation is that we have a notifier chain, we can at least continue operating (avoid unbinding the struct device), but essentially move our phylink_create/phylink_destroy calls to within those notifier blocks. Again, retrofitting this model to existing drivers, phylink API (and maybe even its internal structure) is something that's hard to hop on board of; I think it's a solution waiting for a problem, and I don't have an interest to develop or even review it.
On 7/20/22 9:53 AM, Vladimir Oltean wrote: > On Tue, Jul 19, 2022 at 03:34:45PM -0400, Sean Anderson wrote: >> We could do it, but it'd be a pretty big hack. Something like the >> following. Phylink would need to be modified to grab the lock before >> every op and check if the PCS is dead or not. This is of course still >> not optimal, since there's no way to re-attach a PCS once it goes away. > > You assume it's just phylink who operates on a PCS structure, but if you > include your search pool to also cover include/linux/pcs/pcs-xpcs.h, > you'll see a bunch of exported functions which are called directly by > the client drivers (stmmac, sja1105). At this stage it gets pretty hard > to validate that drivers won't attempt from any code path to do > something stupid with a dead PCS. All in all it creates an environment > with insanely weak guarantees; that's pretty hard to get behind IMO. Right. To do this properly, we'd need wrapper functions for all the PCS operations. And the super-weak guarantees is why I referred to this as a "hack". But we could certainly make it so that removing a PCS might not bring down the MAC. >> IMO a better solution is to use devlink and submit a patch to add >> notifications which the MAC driver can register for. That way it can >> find out when the PCS goes away and potentially do something about it >> (or just let itself get removed). > > Not sure I understand what connection there is between devlink (device > links) and PCS {de}registration notifications. The default action when a supplier is going to be removed is to remove the consumers. However, it'd be nice to notify the consumer beforehand. If we used device links, this would need to be integrated (since otherwise we'd only find out that a PCS was gone after the MAC was gone too). > We could probably add those > notifications without any intervention from the device core: we would > just need to make this new PCS "core" to register an blocking_notifier_call_chain > to which interested drivers could add their notifier blocks. How a> certain phylink user is going to determine that "hey, this PCS is > definitely mine and I can use it" is an open question. In any case, my > expectation is that we have a notifier chain, we can at least continue > operating (avoid unbinding the struct device), but essentially move our > phylink_create/phylink_destroy calls to within those notifier blocks. > Again, retrofitting this model to existing drivers, phylink API (and > maybe even its internal structure) is something that's hard to hop on > board of; I think it's a solution waiting for a problem, and I don't > have an interest to develop or even review it. I don't either. I'd much rather just bring down the whole MAC when any PCS gets removed. Whatever we decide on doing here should also be done for (serdes) phys as well, since they have all the same pitfalls. For that reason I'd rather use a generic, non-intrusive solution like device links. I know Russell mentioned composite devices, but I think those would have similar advantages/drawbacks as a device-link-based solution (unbinding of one device unbinds the rest). --Sean
On 7/21/22 5:41 PM, Sean Anderson wrote: > On 7/20/22 9:53 AM, Vladimir Oltean wrote: >> On Tue, Jul 19, 2022 at 03:34:45PM -0400, Sean Anderson wrote: >>> We could do it, but it'd be a pretty big hack. Something like the >>> following. Phylink would need to be modified to grab the lock before >>> every op and check if the PCS is dead or not. This is of course still >>> not optimal, since there's no way to re-attach a PCS once it goes away. >> >> You assume it's just phylink who operates on a PCS structure, but if you >> include your search pool to also cover include/linux/pcs/pcs-xpcs.h, >> you'll see a bunch of exported functions which are called directly by >> the client drivers (stmmac, sja1105). At this stage it gets pretty hard >> to validate that drivers won't attempt from any code path to do >> something stupid with a dead PCS. All in all it creates an environment >> with insanely weak guarantees; that's pretty hard to get behind IMO. > > Right. To do this properly, we'd need wrapper functions for all the PCS > operations. And the super-weak guarantees is why I referred to this as a > "hack". But we could certainly make it so that removing a PCS might not > bring down the MAC. > >>> IMO a better solution is to use devlink and submit a patch to add >>> notifications which the MAC driver can register for. That way it can >>> find out when the PCS goes away and potentially do something about it >>> (or just let itself get removed). >> >> Not sure I understand what connection there is between devlink (device >> links) and PCS {de}registration notifications. > > The default action when a supplier is going to be removed is to remove > the consumers. However, it'd be nice to notify the consumer beforehand. > If we used device links, this would need to be integrated (since otherwise > we'd only find out that a PCS was gone after the MAC was gone too). > >> We could probably add those >> notifications without any intervention from the device core: we would >> just need to make this new PCS "core" to register an blocking_notifier_call_chain >> to which interested drivers could add their notifier blocks. How a> certain phylink user is going to determine that "hey, this PCS is >> definitely mine and I can use it" is an open question. In any case, my >> expectation is that we have a notifier chain, we can at least continue >> operating (avoid unbinding the struct device), but essentially move our >> phylink_create/phylink_destroy calls to within those notifier blocks. >> Again, retrofitting this model to existing drivers, phylink API (and >> maybe even its internal structure) is something that's hard to hop on >> board of; I think it's a solution waiting for a problem, and I don't >> have an interest to develop or even review it. > > I don't either. I'd much rather just bring down the whole MAC when any > PCS gets removed. Whatever we decide on doing here should also be done > for (serdes) phys as well, since they have all the same pitfalls. For > that reason I'd rather use a generic, non-intrusive solution like device > links. I know Russell mentioned composite devices, but I think those > would have similar advantages/drawbacks as a device-link-based solution > (unbinding of one device unbinds the rest). OK, I've thought about this a bit more. Right now, you can crash the kernel by unbinding a phy (either serdes or networking) and waiting for a state change. The serdes problem could probably be solved by strengthening the existing device_link_add calls. This will of course unbind the netdev if you unbind the serdes. I think this is not a common use case; if a user does this they probably know what they're doing (or not). The problem with ethernet phys is a bit worse. It's very common to have something like + netdev | +-+ mdiobus | +-- phy which poses a problem for device links. The phy is a child of the netdev, so it should be unbound first. device_link_add will see this and refuse to create a link, since such a link implies that netdev should be unbound before phy. One solution might be something like: diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index a74b320f5b27..05894e1c3e59 100644 --- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c @@ -27,6 +27,7 @@ #include <linux/phy.h> #include <linux/phy_led_triggers.h> #include <linux/property.h> +#include <linux/rtnetlink.h> #include <linux/sfp.h> #include <linux/skbuff.h> #include <linux/slab.h> @@ -3111,6 +3112,13 @@ static int phy_remove(struct device *dev) { struct phy_device *phydev = to_phy_device(dev); + // I'm pretty sure this races with multiple unbinds... + rtnl_lock(); + device_unlock(dev); + dev_close(phydev->attached_dev); + device_lock(dev); + rtnl_unlock(); + WARN_ON(phydev->attached_dev); + cancel_delayed_work_sync(&phydev->state_queue); mutex_lock(&phydev->lock); which is probably the least intrusive we can get. But this isn't very nice for PCSs. First, PCSs are not always used by netdevs. So there's no generic way to ask the user "please clean up this PCS." Additionally, most PCS users expect the PCS to be around for the lifetime of the driver (or at least until they're done using it). Maybe the best solution is to provide some helper functions to use with bus_register_notifier and just let the drivers fend for themselves. Or possibly default to a devlink, but allow registering a notifier instead. --Sean