Message ID | E1qChay-00Fmrf-9Y@rmk-PC.armlinux.org.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | dsa/88e6xxx/phylink changes after the next merge window | expand |
Hi Russell, Does there need to be any locking when calling phylink_pcs_change? I noticed that you call it from threaded IRQ context in [1]. Can that race with phylink_major_config? --Sean [1] https://lore.kernel.org/all/E1qJruX-00Gkk8-RY@rmk-PC.armlinux.org.uk/
On Tue, Jan 23, 2024 at 02:46:15PM -0500, Sean Anderson wrote: > Hi Russell, > > Does there need to be any locking when calling phylink_pcs_change? I > noticed that you call it from threaded IRQ context in [1]. Can that race > with phylink_major_config? What kind of scenario are you thinking may require locking? I guess the possibility would be if pcs->phylink changes and the compiler reads it multiple times - READ_ONCE() should solve that. However, in terms of the mechanics, there's no race. During the initial bringup, the resolve worker isn't started until after phylink_major_config() has completed (it's started at phylink_enable_and_run_resolve().) So, if phylink_pcs_change() gets called while in phylink_major_config() there, it'll see that pl->phylink_disable_state is non-zero, and won't queue the work. The next one is within the worker itself - and there can only be one instance of the worker running in totality. So, if phylink_pcs_change() gets called while phylink_major_config() is running from this path, the only thing it'll do is re-schedule the resolve worker to run another iteration which is harmless (whether or not the PCS is still current.) The last case is phylink_ethtool_ksettings_set(). This runs under the state_mutex, which locks out the resolve worker (since it also takes that mutex). So calling phylink_pcs_change() should be pretty harmless _unless_ the compiler re-reads pcs->phylink multiple times inside phylink_pcs_change(), which I suppose with modern compilers is possible. Hence my suggestion above about READ_ONCE() for that. Have you encountered an OOPS because pcs->phylink has become NULL? Or have you spotted another issue?
On 1/23/24 15:07, Russell King (Oracle) wrote: > On Tue, Jan 23, 2024 at 02:46:15PM -0500, Sean Anderson wrote: >> Hi Russell, >> >> Does there need to be any locking when calling phylink_pcs_change? I >> noticed that you call it from threaded IRQ context in [1]. Can that race >> with phylink_major_config? > > What kind of scenario are you thinking may require locking? Can't we at least get a spurious bounce? E.g. pcs_major_config() pcs_disable(old_pcs) /* masks IRQ */ old_pcs->phylink = NULL; new_pcs->phylink = pl; ... pcs_enable(new_pcs) /* unmasks IRQ */ ... pcs_handle_irq(new_pcs) /* Link up IRQ */ phylink_pcs_change(new_pcs, true) phylink_run_resolve(pl) phylink_resolve(pl) /* Link up */ pcs_handle_irq(old_pcs) /* Link down IRQ (pending from before pcs_disable) */ phylink_pcs_change(old_pcs, false) phylink_run_resolve(pl) /* Doesn't see the NULL */ phylink_resolve(pl) /* Link down; retrigger */ phylink_resolve(pl) /* Link up */ And mac_link_dropped probably needs WRITE_ONCE in order to take advantage of the ordering provided by queue_work. > I guess the possibility would be if pcs->phylink changes and the > compiler reads it multiple times - READ_ONCE() should solve that. > > However, in terms of the mechanics, there's no race. > > During the initial bringup, the resolve worker isn't started until > after phylink_major_config() has completed (it's started at > phylink_enable_and_run_resolve().) So, if phylink_pcs_change() > gets called while in phylink_major_config() there, it'll see > that pl->phylink_disable_state is non-zero, and won't queue the > work. > > The next one is within the worker itself - and there can only > be one instance of the worker running in totality. So, if > phylink_pcs_change() gets called while phylink_major_config() is > running from this path, the only thing it'll do is re-schedule > the resolve worker to run another iteration which is harmless > (whether or not the PCS is still current.) > > The last case is phylink_ethtool_ksettings_set(). This runs under > the state_mutex, which locks out the resolve worker (since it also > takes that mutex). > > So calling phylink_pcs_change() should be pretty harmless _unless_ > the compiler re-reads pcs->phylink multiple times inside > phylink_pcs_change(), which I suppose with modern compilers is > possible. Hence my suggestion above about READ_ONCE() for that. > > Have you encountered an OOPS because pcs->phylink has become NULL? > Or have you spotted another issue? I was looking at extending this code, and I was wondering if I needed to e.g. take RTNL first. Thanks for the quick response. --Sean
On Tue, Jan 23, 2024 at 03:33:57PM -0500, Sean Anderson wrote: > On 1/23/24 15:07, Russell King (Oracle) wrote: > > On Tue, Jan 23, 2024 at 02:46:15PM -0500, Sean Anderson wrote: > >> Hi Russell, > >> > >> Does there need to be any locking when calling phylink_pcs_change? I > >> noticed that you call it from threaded IRQ context in [1]. Can that race > >> with phylink_major_config? > > > > What kind of scenario are you thinking may require locking? > > Can't we at least get a spurious bounce? E.g. > > pcs_major_config() > pcs_disable(old_pcs) /* masks IRQ */ > old_pcs->phylink = NULL; > new_pcs->phylink = pl; > ... > pcs_enable(new_pcs) /* unmasks IRQ */ > ... > > pcs_handle_irq(new_pcs) /* Link up IRQ */ > phylink_pcs_change(new_pcs, true) > phylink_run_resolve(pl) > > phylink_resolve(pl) > /* Link up */ By this time, old_pcs->phylink has been set to NULL as you mentioned above. > pcs_handle_irq(old_pcs) /* Link down IRQ (pending from before pcs_disable) */ > phylink_pcs_change(old_pcs, false) > phylink_run_resolve(pl) /* Doesn't see the NULL */ So here, phylink_pcs_change(old_pcs, ...) will read old_pcs->phylink and find that it's NULL, and do nothing. > > I guess the possibility would be if pcs->phylink changes and the > > compiler reads it multiple times - READ_ONCE() should solve that. > > > > However, in terms of the mechanics, there's no race. > > > > During the initial bringup, the resolve worker isn't started until > > after phylink_major_config() has completed (it's started at > > phylink_enable_and_run_resolve().) So, if phylink_pcs_change() > > gets called while in phylink_major_config() there, it'll see > > that pl->phylink_disable_state is non-zero, and won't queue the > > work. > > > > The next one is within the worker itself - and there can only > > be one instance of the worker running in totality. So, if > > phylink_pcs_change() gets called while phylink_major_config() is > > running from this path, the only thing it'll do is re-schedule > > the resolve worker to run another iteration which is harmless > > (whether or not the PCS is still current.) > > > > The last case is phylink_ethtool_ksettings_set(). This runs under > > the state_mutex, which locks out the resolve worker (since it also > > takes that mutex). > > > > So calling phylink_pcs_change() should be pretty harmless _unless_ > > the compiler re-reads pcs->phylink multiple times inside > > phylink_pcs_change(), which I suppose with modern compilers is > > possible. Hence my suggestion above about READ_ONCE() for that. > > > > Have you encountered an OOPS because pcs->phylink has become NULL? > > Or have you spotted another issue? > > I was looking at extending this code, and I was wondering if I needed > to e.g. take RTNL first. Thanks for the quick response. Note that phylink_mac_change() gets called in irq context, so this stuff can't take any mutexes or the rtnl. It is also intended that phylink_pcs_change() is similarly callable in irq context.
On 1/23/24 16:05, Russell King (Oracle) wrote: > On Tue, Jan 23, 2024 at 03:33:57PM -0500, Sean Anderson wrote: >> On 1/23/24 15:07, Russell King (Oracle) wrote: >>> On Tue, Jan 23, 2024 at 02:46:15PM -0500, Sean Anderson wrote: >>>> Hi Russell, >>>> >>>> Does there need to be any locking when calling >>>> phylink_pcs_change? I noticed that you call it from threaded >>>> IRQ context in [1]. Can that race with phylink_major_config? >>> >>> What kind of scenario are you thinking may require locking? >> >> Can't we at least get a spurious bounce? E.g. >> >> pcs_major_config() pcs_disable(old_pcs) /* masks IRQ */ >> old_pcs->phylink = NULL; new_pcs->phylink = pl; ... >> pcs_enable(new_pcs) /* unmasks IRQ */ ... >> >> pcs_handle_irq(new_pcs) /* Link up IRQ */ >> phylink_pcs_change(new_pcs, true) phylink_run_resolve(pl) >> >> phylink_resolve(pl) /* Link up */ > > By this time, old_pcs->phylink has been set to NULL as you mentioned > above. > >> pcs_handle_irq(old_pcs) /* Link down IRQ (pending from before >> pcs_disable) */ phylink_pcs_change(old_pcs, false) >> phylink_run_resolve(pl) /* Doesn't see the NULL */ > > So here, phylink_pcs_change(old_pcs, ...) will read old_pcs->phylink > and find that it's NULL, and do nothing. This can happen on another CPU. There are no memory barriers on the read side (until queue_work), so there's no guarantee that other CPUs will see the write. --Sean >>> I guess the possibility would be if pcs->phylink changes and the >>> compiler reads it multiple times - READ_ONCE() should solve >>> that. >>> >>> However, in terms of the mechanics, there's no race. >>> >>> During the initial bringup, the resolve worker isn't started >>> until after phylink_major_config() has completed (it's started >>> at phylink_enable_and_run_resolve().) So, if >>> phylink_pcs_change() gets called while in phylink_major_config() >>> there, it'll see that pl->phylink_disable_state is non-zero, and >>> won't queue the work. >>> >>> The next one is within the worker itself - and there can only be >>> one instance of the worker running in totality. So, if >>> phylink_pcs_change() gets called while phylink_major_config() is >>> running from this path, the only thing it'll do is re-schedule >>> the resolve worker to run another iteration which is harmless >>> (whether or not the PCS is still current.) >>> >>> The last case is phylink_ethtool_ksettings_set(). This runs >>> under the state_mutex, which locks out the resolve worker (since >>> it also takes that mutex). >>> >>> So calling phylink_pcs_change() should be pretty harmless >>> _unless_ the compiler re-reads pcs->phylink multiple times >>> inside phylink_pcs_change(), which I suppose with modern >>> compilers is possible. Hence my suggestion above about >>> READ_ONCE() for that. >>> >>> Have you encountered an OOPS because pcs->phylink has become >>> NULL? Or have you spotted another issue? >> >> I was looking at extending this code, and I was wondering if I >> needed to e.g. take RTNL first. Thanks for the quick response. > > Note that phylink_mac_change() gets called in irq context, so this > stuff can't take any mutexes or the rtnl. It is also intended that > phylink_pcs_change() is similarly callable in irq context. >
diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c index 9840a2952309..71b1012ef3be 100644 --- a/drivers/net/phy/phylink.c +++ b/drivers/net/phy/phylink.c @@ -1137,6 +1137,11 @@ static void phylink_major_config(struct phylink *pl, bool restart, if (pcs_changed) { phylink_pcs_disable(pl->pcs); + if (pl->pcs) + pl->pcs->phylink = NULL; + + pcs->phylink = pl; + pl->pcs = pcs; } @@ -1991,6 +1996,14 @@ void phylink_disconnect_phy(struct phylink *pl) } EXPORT_SYMBOL_GPL(phylink_disconnect_phy); +static void phylink_link_changed(struct phylink *pl, bool up, const char *what) +{ + if (!up) + pl->mac_link_dropped = true; + phylink_run_resolve(pl); + phylink_dbg(pl, "%s link %s\n", what, up ? "up" : "down"); +} + /** * phylink_mac_change() - notify phylink of a change in MAC state * @pl: a pointer to a &struct phylink returned from phylink_create() @@ -2001,13 +2014,30 @@ EXPORT_SYMBOL_GPL(phylink_disconnect_phy); */ void phylink_mac_change(struct phylink *pl, bool up) { - if (!up) - pl->mac_link_dropped = true; - phylink_run_resolve(pl); - phylink_dbg(pl, "mac link %s\n", up ? "up" : "down"); + phylink_link_changed(pl, up, "mac"); } EXPORT_SYMBOL_GPL(phylink_mac_change); +/** + * phylink_pcs_change() - notify phylink of a change to PCS link state + * @pcs: pointer to &struct phylink_pcs + * @up: indicates whether the link is currently up. + * + * The PCS driver should call this when the state of its link changes + * (e.g. link failure, new negotiation results, etc.) Note: it should + * not determine "up" by reading the BMSR. If in doubt about the link + * state at interrupt time, then pass true if pcs_get_state() returns + * the latched link-down state, otherwise pass false. + */ +void phylink_pcs_change(struct phylink_pcs *pcs, bool up) +{ + struct phylink *pl = pcs->phylink; + + if (pl) + phylink_link_changed(pl, up, "pcs"); +} +EXPORT_SYMBOL_GPL(phylink_pcs_change); + static irqreturn_t phylink_link_handler(int irq, void *data) { struct phylink *pl = data; diff --git a/include/linux/phylink.h b/include/linux/phylink.h index e609b208fa3d..734fb4ca9cf7 100644 --- a/include/linux/phylink.h +++ b/include/linux/phylink.h @@ -9,6 +9,7 @@ struct device_node; struct ethtool_cmd; struct fwnode_handle; struct net_device; +struct phylink; enum { MLO_PAUSE_NONE, @@ -518,14 +519,19 @@ struct phylink_pcs_ops; /** * struct phylink_pcs - PHYLINK PCS instance * @ops: a pointer to the &struct phylink_pcs_ops structure + * @phylink: pointer to &struct phylink_config * @neg_mode: provide PCS neg mode via "mode" argument * @poll: poll the PCS for link changes * * This structure is designed to be embedded within the PCS private data, * and will be passed between phylink and the PCS. + * + * The @phylink member is private to phylink and must not be touched by + * the PCS driver. */ struct phylink_pcs { const struct phylink_pcs_ops *ops; + struct phylink *phylink; bool neg_mode; bool poll; }; @@ -697,6 +703,7 @@ int phylink_fwnode_phy_connect(struct phylink *pl, void phylink_disconnect_phy(struct phylink *); void phylink_mac_change(struct phylink *, bool up); +void phylink_pcs_change(struct phylink_pcs *, bool up); void phylink_start(struct phylink *); void phylink_stop(struct phylink *);
Add a function, phylink_pcs_change() which can be used by PCs drivers to notify phylink about changes to the PCS link state. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> --- drivers/net/phy/phylink.c | 38 ++++++++++++++++++++++++++++++++++---- include/linux/phylink.h | 7 +++++++ 2 files changed, 41 insertions(+), 4 deletions(-)