diff mbox series

[RFC,net-next,03/14] net: phylink: add support for PCS link change notifications

Message ID E1qChay-00Fmrf-9Y@rmk-PC.armlinux.org.uk (mailing list archive)
State New, archived
Headers show
Series dsa/88e6xxx/phylink changes after the next merge window | expand

Commit Message

Russell King (Oracle) June 23, 2023, 2:17 p.m. UTC
Add a function, phylink_pcs_change() which can be used by PCs drivers
to notify phylink about changes to the PCS link state.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
 drivers/net/phy/phylink.c | 38 ++++++++++++++++++++++++++++++++++----
 include/linux/phylink.h   |  7 +++++++
 2 files changed, 41 insertions(+), 4 deletions(-)

Comments

Sean Anderson Jan. 23, 2024, 7:46 p.m. UTC | #1
Hi Russell,

Does there need to be any locking when calling phylink_pcs_change? I
noticed that you call it from threaded IRQ context in [1]. Can that race
with phylink_major_config?

--Sean

[1] https://lore.kernel.org/all/E1qJruX-00Gkk8-RY@rmk-PC.armlinux.org.uk/
Russell King (Oracle) Jan. 23, 2024, 8:07 p.m. UTC | #2
On Tue, Jan 23, 2024 at 02:46:15PM -0500, Sean Anderson wrote:
> Hi Russell,
> 
> Does there need to be any locking when calling phylink_pcs_change? I
> noticed that you call it from threaded IRQ context in [1]. Can that race
> with phylink_major_config?

What kind of scenario are you thinking may require locking?

I guess the possibility would be if pcs->phylink changes and the
compiler reads it multiple times - READ_ONCE() should solve that.

However, in terms of the mechanics, there's no race.

During the initial bringup, the resolve worker isn't started until
after phylink_major_config() has completed (it's started at
phylink_enable_and_run_resolve().) So, if phylink_pcs_change()
gets called while in phylink_major_config() there, it'll see
that pl->phylink_disable_state is non-zero, and won't queue the
work.

The next one is within the worker itself - and there can only
be one instance of the worker running in totality. So, if
phylink_pcs_change() gets called while phylink_major_config() is
running from this path, the only thing it'll do is re-schedule
the resolve worker to run another iteration which is harmless
(whether or not the PCS is still current.)

The last case is phylink_ethtool_ksettings_set(). This runs under
the state_mutex, which locks out the resolve worker (since it also
takes that mutex).

So calling phylink_pcs_change() should be pretty harmless _unless_
the compiler re-reads pcs->phylink multiple times inside
phylink_pcs_change(), which I suppose with modern compilers is
possible. Hence my suggestion above about READ_ONCE() for that.

Have you encountered an OOPS because pcs->phylink has become NULL?
Or have you spotted another issue?
Sean Anderson Jan. 23, 2024, 8:33 p.m. UTC | #3
On 1/23/24 15:07, Russell King (Oracle) wrote:
> On Tue, Jan 23, 2024 at 02:46:15PM -0500, Sean Anderson wrote:
>> Hi Russell,
>> 
>> Does there need to be any locking when calling phylink_pcs_change? I
>> noticed that you call it from threaded IRQ context in [1]. Can that race
>> with phylink_major_config?
> 
> What kind of scenario are you thinking may require locking?

Can't we at least get a spurious bounce? E.g.

pcs_major_config()
  pcs_disable(old_pcs) /* masks IRQ */
  old_pcs->phylink = NULL;
  new_pcs->phylink = pl;
  ...
  pcs_enable(new_pcs) /* unmasks IRQ */
  ...

pcs_handle_irq(new_pcs) /* Link up IRQ */
  phylink_pcs_change(new_pcs, true)
    phylink_run_resolve(pl)

phylink_resolve(pl)
  /* Link up */

pcs_handle_irq(old_pcs) /* Link down IRQ (pending from before pcs_disable) */
  phylink_pcs_change(old_pcs, false)
    phylink_run_resolve(pl) /* Doesn't see the NULL */

phylink_resolve(pl)
  /* Link down; retrigger */
phylink_resolve(pl)
  /* Link up */

And mac_link_dropped probably needs WRITE_ONCE in order to take
advantage of the ordering provided by queue_work.

> I guess the possibility would be if pcs->phylink changes and the
> compiler reads it multiple times - READ_ONCE() should solve that.
> 
> However, in terms of the mechanics, there's no race.
> 
> During the initial bringup, the resolve worker isn't started until
> after phylink_major_config() has completed (it's started at
> phylink_enable_and_run_resolve().) So, if phylink_pcs_change()
> gets called while in phylink_major_config() there, it'll see
> that pl->phylink_disable_state is non-zero, and won't queue the
> work.
> 
> The next one is within the worker itself - and there can only
> be one instance of the worker running in totality. So, if
> phylink_pcs_change() gets called while phylink_major_config() is
> running from this path, the only thing it'll do is re-schedule
> the resolve worker to run another iteration which is harmless
> (whether or not the PCS is still current.)
> 
> The last case is phylink_ethtool_ksettings_set(). This runs under
> the state_mutex, which locks out the resolve worker (since it also
> takes that mutex).
> 
> So calling phylink_pcs_change() should be pretty harmless _unless_
> the compiler re-reads pcs->phylink multiple times inside
> phylink_pcs_change(), which I suppose with modern compilers is
> possible. Hence my suggestion above about READ_ONCE() for that.
> 
> Have you encountered an OOPS because pcs->phylink has become NULL?
> Or have you spotted another issue?

I was looking at extending this code, and I was wondering if I needed
to e.g. take RTNL first. Thanks for the quick response.

--Sean
Russell King (Oracle) Jan. 23, 2024, 9:05 p.m. UTC | #4
On Tue, Jan 23, 2024 at 03:33:57PM -0500, Sean Anderson wrote:
> On 1/23/24 15:07, Russell King (Oracle) wrote:
> > On Tue, Jan 23, 2024 at 02:46:15PM -0500, Sean Anderson wrote:
> >> Hi Russell,
> >> 
> >> Does there need to be any locking when calling phylink_pcs_change? I
> >> noticed that you call it from threaded IRQ context in [1]. Can that race
> >> with phylink_major_config?
> > 
> > What kind of scenario are you thinking may require locking?
> 
> Can't we at least get a spurious bounce? E.g.
> 
> pcs_major_config()
>   pcs_disable(old_pcs) /* masks IRQ */
>   old_pcs->phylink = NULL;
>   new_pcs->phylink = pl;
>   ...
>   pcs_enable(new_pcs) /* unmasks IRQ */
>   ...
> 
> pcs_handle_irq(new_pcs) /* Link up IRQ */
>   phylink_pcs_change(new_pcs, true)
>     phylink_run_resolve(pl)
> 
> phylink_resolve(pl)
>   /* Link up */

By this time, old_pcs->phylink has been set to NULL as you mentioned
above.

> pcs_handle_irq(old_pcs) /* Link down IRQ (pending from before pcs_disable) */
>   phylink_pcs_change(old_pcs, false)
>     phylink_run_resolve(pl) /* Doesn't see the NULL */

So here, phylink_pcs_change(old_pcs, ...) will read old_pcs->phylink and
find that it's NULL, and do nothing.

> > I guess the possibility would be if pcs->phylink changes and the
> > compiler reads it multiple times - READ_ONCE() should solve that.
> > 
> > However, in terms of the mechanics, there's no race.
> > 
> > During the initial bringup, the resolve worker isn't started until
> > after phylink_major_config() has completed (it's started at
> > phylink_enable_and_run_resolve().) So, if phylink_pcs_change()
> > gets called while in phylink_major_config() there, it'll see
> > that pl->phylink_disable_state is non-zero, and won't queue the
> > work.
> > 
> > The next one is within the worker itself - and there can only
> > be one instance of the worker running in totality. So, if
> > phylink_pcs_change() gets called while phylink_major_config() is
> > running from this path, the only thing it'll do is re-schedule
> > the resolve worker to run another iteration which is harmless
> > (whether or not the PCS is still current.)
> > 
> > The last case is phylink_ethtool_ksettings_set(). This runs under
> > the state_mutex, which locks out the resolve worker (since it also
> > takes that mutex).
> > 
> > So calling phylink_pcs_change() should be pretty harmless _unless_
> > the compiler re-reads pcs->phylink multiple times inside
> > phylink_pcs_change(), which I suppose with modern compilers is
> > possible. Hence my suggestion above about READ_ONCE() for that.
> > 
> > Have you encountered an OOPS because pcs->phylink has become NULL?
> > Or have you spotted another issue?
> 
> I was looking at extending this code, and I was wondering if I needed
> to e.g. take RTNL first. Thanks for the quick response.

Note that phylink_mac_change() gets called in irq context, so this
stuff can't take any mutexes or the rtnl. It is also intended that
phylink_pcs_change() is similarly callable in irq context.
Sean Anderson Jan. 23, 2024, 9:09 p.m. UTC | #5
On 1/23/24 16:05, Russell King (Oracle) wrote:
> On Tue, Jan 23, 2024 at 03:33:57PM -0500, Sean Anderson wrote:
>> On 1/23/24 15:07, Russell King (Oracle) wrote:
>>> On Tue, Jan 23, 2024 at 02:46:15PM -0500, Sean Anderson wrote:
>>>> Hi Russell,
>>>> 
>>>> Does there need to be any locking when calling
>>>> phylink_pcs_change? I noticed that you call it from threaded
>>>> IRQ context in [1]. Can that race with phylink_major_config?
>>> 
>>> What kind of scenario are you thinking may require locking?
>> 
>> Can't we at least get a spurious bounce? E.g.
>> 
>> pcs_major_config() pcs_disable(old_pcs) /* masks IRQ */ 
>> old_pcs->phylink = NULL; new_pcs->phylink = pl; ... 
>> pcs_enable(new_pcs) /* unmasks IRQ */ ...
>> 
>> pcs_handle_irq(new_pcs) /* Link up IRQ */ 
>> phylink_pcs_change(new_pcs, true) phylink_run_resolve(pl)
>> 
>> phylink_resolve(pl) /* Link up */
> 
> By this time, old_pcs->phylink has been set to NULL as you mentioned 
> above.
> 
>> pcs_handle_irq(old_pcs) /* Link down IRQ (pending from before
>> pcs_disable) */ phylink_pcs_change(old_pcs, false) 
>> phylink_run_resolve(pl) /* Doesn't see the NULL */
> 
> So here, phylink_pcs_change(old_pcs, ...) will read old_pcs->phylink
> and find that it's NULL, and do nothing.

This can happen on another CPU. There are no memory barriers on the read
side (until queue_work), so there's no guarantee that other CPUs will
see the write.

--Sean

>>> I guess the possibility would be if pcs->phylink changes and the 
>>> compiler reads it multiple times - READ_ONCE() should solve
>>> that.
>>> 
>>> However, in terms of the mechanics, there's no race.
>>> 
>>> During the initial bringup, the resolve worker isn't started
>>> until after phylink_major_config() has completed (it's started
>>> at phylink_enable_and_run_resolve().) So, if
>>> phylink_pcs_change() gets called while in phylink_major_config()
>>> there, it'll see that pl->phylink_disable_state is non-zero, and
>>> won't queue the work.
>>> 
>>> The next one is within the worker itself - and there can only be
>>> one instance of the worker running in totality. So, if 
>>> phylink_pcs_change() gets called while phylink_major_config() is 
>>> running from this path, the only thing it'll do is re-schedule 
>>> the resolve worker to run another iteration which is harmless 
>>> (whether or not the PCS is still current.)
>>> 
>>> The last case is phylink_ethtool_ksettings_set(). This runs
>>> under the state_mutex, which locks out the resolve worker (since
>>> it also takes that mutex).
>>> 
>>> So calling phylink_pcs_change() should be pretty harmless
>>> _unless_ the compiler re-reads pcs->phylink multiple times
>>> inside phylink_pcs_change(), which I suppose with modern
>>> compilers is possible. Hence my suggestion above about
>>> READ_ONCE() for that.
>>> 
>>> Have you encountered an OOPS because pcs->phylink has become
>>> NULL? Or have you spotted another issue?
>> 
>> I was looking at extending this code, and I was wondering if I
>> needed to e.g. take RTNL first. Thanks for the quick response.
> 
> Note that phylink_mac_change() gets called in irq context, so this 
> stuff can't take any mutexes or the rtnl. It is also intended that 
> phylink_pcs_change() is similarly callable in irq context.
>
diff mbox series

Patch

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 9840a2952309..71b1012ef3be 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -1137,6 +1137,11 @@  static void phylink_major_config(struct phylink *pl, bool restart,
 	if (pcs_changed) {
 		phylink_pcs_disable(pl->pcs);
 
+		if (pl->pcs)
+			pl->pcs->phylink = NULL;
+
+		pcs->phylink = pl;
+
 		pl->pcs = pcs;
 	}
 
@@ -1991,6 +1996,14 @@  void phylink_disconnect_phy(struct phylink *pl)
 }
 EXPORT_SYMBOL_GPL(phylink_disconnect_phy);
 
+static void phylink_link_changed(struct phylink *pl, bool up, const char *what)
+{
+	if (!up)
+		pl->mac_link_dropped = true;
+	phylink_run_resolve(pl);
+	phylink_dbg(pl, "%s link %s\n", what, up ? "up" : "down");
+}
+
 /**
  * phylink_mac_change() - notify phylink of a change in MAC state
  * @pl: a pointer to a &struct phylink returned from phylink_create()
@@ -2001,13 +2014,30 @@  EXPORT_SYMBOL_GPL(phylink_disconnect_phy);
  */
 void phylink_mac_change(struct phylink *pl, bool up)
 {
-	if (!up)
-		pl->mac_link_dropped = true;
-	phylink_run_resolve(pl);
-	phylink_dbg(pl, "mac link %s\n", up ? "up" : "down");
+	phylink_link_changed(pl, up, "mac");
 }
 EXPORT_SYMBOL_GPL(phylink_mac_change);
 
+/**
+ * phylink_pcs_change() - notify phylink of a change to PCS link state
+ * @pcs: pointer to &struct phylink_pcs
+ * @up: indicates whether the link is currently up.
+ *
+ * The PCS driver should call this when the state of its link changes
+ * (e.g. link failure, new negotiation results, etc.) Note: it should
+ * not determine "up" by reading the BMSR. If in doubt about the link
+ * state at interrupt time, then pass true if pcs_get_state() returns
+ * the latched link-down state, otherwise pass false.
+ */
+void phylink_pcs_change(struct phylink_pcs *pcs, bool up)
+{
+	struct phylink *pl = pcs->phylink;
+
+	if (pl)
+		phylink_link_changed(pl, up, "pcs");
+}
+EXPORT_SYMBOL_GPL(phylink_pcs_change);
+
 static irqreturn_t phylink_link_handler(int irq, void *data)
 {
 	struct phylink *pl = data;
diff --git a/include/linux/phylink.h b/include/linux/phylink.h
index e609b208fa3d..734fb4ca9cf7 100644
--- a/include/linux/phylink.h
+++ b/include/linux/phylink.h
@@ -9,6 +9,7 @@  struct device_node;
 struct ethtool_cmd;
 struct fwnode_handle;
 struct net_device;
+struct phylink;
 
 enum {
 	MLO_PAUSE_NONE,
@@ -518,14 +519,19 @@  struct phylink_pcs_ops;
 /**
  * struct phylink_pcs - PHYLINK PCS instance
  * @ops: a pointer to the &struct phylink_pcs_ops structure
+ * @phylink: pointer to &struct phylink_config
  * @neg_mode: provide PCS neg mode via "mode" argument
  * @poll: poll the PCS for link changes
  *
  * This structure is designed to be embedded within the PCS private data,
  * and will be passed between phylink and the PCS.
+ *
+ * The @phylink member is private to phylink and must not be touched by
+ * the PCS driver.
  */
 struct phylink_pcs {
 	const struct phylink_pcs_ops *ops;
+	struct phylink *phylink;
 	bool neg_mode;
 	bool poll;
 };
@@ -697,6 +703,7 @@  int phylink_fwnode_phy_connect(struct phylink *pl,
 void phylink_disconnect_phy(struct phylink *);
 
 void phylink_mac_change(struct phylink *, bool up);
+void phylink_pcs_change(struct phylink_pcs *, bool up);
 
 void phylink_start(struct phylink *);
 void phylink_stop(struct phylink *);