[net-next,1/3] net: stmmac: call phylink_start() and phylink_stop() in XDP functions

Message ID E1u3XG6-000EJg-V8@rmk-PC.armlinux.org.uk (mailing list archive)
State New
Series net: stmmac: fix setting RE and TE inappropriately

Commit Message

Russell King (Oracle) April 12, 2025, 9:34 a.m. UTC
Phylink does not permit drivers to mess with the netif carrier, as
this will de-synchronise phylink with the MAC driver. Moreover,
setting and clearing the TE and RE bits via stmmac_mac_set() in this
path is also wrong as the link may not be up.

Replace the netif_carrier_on(), netif_carrier_off() and
stmmac_mac_set() calls with the appropriate phylink_start() and
phylink_stop() calls, thereby allowing phylink to manage the netif
carrier and TE/RE bits through the .mac_link_up() and .mac_link_down()
methods.

Note that RE should only be set after the DMA is ready to avoid the
receive FIFO between the MAC and DMA blocks overflowing, so
phylink_start() needs to be placed after DMA has been started.

Tested-by: Furong Xu <0x1207@gmail.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)
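
The ordering argument in the commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `struct toy_mac` and every `toy_*` function are hypothetical stand-ins, not kernel APIs, and link resolution is modelled as synchronous, which is a simplification of how phylink behaves. The idea it shows is that TE/RE and the carrier only change inside the link up/down callbacks, and that starting DMA before the phylink start call means RE is never set while DMA is stopped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy user-space model. None of this is the real kernel API. */
struct toy_mac {
	bool dma_running;   /* DMA channels started */
	bool phy_link_up;   /* physical link; XDP changes never touch it */
	bool started;       /* between toy_phylink_start() and _stop() */
	bool carrier;       /* stands in for the netif carrier state */
	bool te_re;         /* stands in for the MAC TE/RE enable bits */
	int  re_before_dma; /* times RE was set while DMA was stopped */
};

/* Stand-in for link resolution (synchronous here for simplicity). */
static void toy_resolve(struct toy_mac *m)
{
	bool want_up = m->started && m->phy_link_up;

	if (want_up && !m->carrier) {
		/* Models the .mac_link_up() callback. */
		if (!m->dma_running)
			m->re_before_dma++;
		m->te_re = true;
		m->carrier = true;
		printf("toy: link is up\n");
	} else if (!want_up && m->carrier) {
		/* Models the .mac_link_down() callback. */
		m->te_re = false;
		m->carrier = false;
		printf("toy: link is down\n");
	}
}

static void toy_phylink_start(struct toy_mac *m)
{
	m->started = true;
	toy_resolve(m);
}

static void toy_phylink_stop(struct toy_mac *m)
{
	m->started = false;
	toy_resolve(m);
}

/* Mirrors the patched ordering: start DMA first, then start phylink. */
static void toy_xdp_open(struct toy_mac *m)
{
	m->dma_running = true;
	toy_phylink_start(m);
}

/* Mirrors the patched ordering: stop phylink before tearing down DMA. */
static void toy_xdp_release(struct toy_mac *m)
{
	toy_phylink_stop(m);
	m->dma_running = false;
}

/* Runs both scenarios; returns 0 if every expectation holds. */
static int toy_selfcheck(void)
{
	struct toy_mac up = { .phy_link_up = true };
	struct toy_mac down = { .phy_link_up = false };

	toy_xdp_open(&up);
	assert(up.carrier && up.te_re && up.re_before_dma == 0);
	toy_xdp_release(&up);
	assert(!up.carrier && !up.te_re && up.phy_link_up);

	/* With the physical link down, TE/RE are never set. */
	toy_xdp_open(&down);
	assert(!down.carrier && !down.te_re);
	return 0;
}
```

Calling `toy_selfcheck()` from a `main()` exercises both cases. In the model, the carrier and the link messages toggle across an XDP reconfiguration while `phy_link_up` is never modified.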

Comments

Jakub Kicinski April 15, 2025, 12:43 a.m. UTC | #1
On Sat, 12 Apr 2025 10:34:42 +0100 Russell King (Oracle) wrote:
> Phylink does not permit drivers to mess with the netif carrier, as
> this will de-synchronise phylink with the MAC driver. Moreover,
> setting and clearing the TE and RE bits via stmmac_mac_set() in this
> path is also wrong as the link may not be up.
> 
> Replace the netif_carrier_on(), netif_carrier_off() and
> stmmac_mac_set() calls with the appropriate phylink_start() and
> phylink_stop() calls, thereby allowing phylink to manage the netif
> carrier and TE/RE bits through the .mac_link_up() and .mac_link_down()
> methods.
> 
> Note that RE should only be set after the DMA is ready to avoid the
> receive FIFO between the MAC and DMA blocks overflowing, so
> phylink_start() needs to be placed after DMA has been started.

IIUC this will cause a link loss when XDP is installed, if not disregard
the rest of the email.

Any idea why it's necessary to mess with the link for XDP changes?
Is there no way to discard all the traffic and let the queues go
idle without dropping the link?

I think we should mention in the commit message that the side effect is
link loss on XDP on / off. I don't know of any other driver which would
need this, stmmac is a real gift..

Russell King (Oracle) April 15, 2025, 9:54 a.m. UTC | #2
On Mon, Apr 14, 2025 at 05:43:42PM -0700, Jakub Kicinski wrote:
> On Sat, 12 Apr 2025 10:34:42 +0100 Russell King (Oracle) wrote:
> > Phylink does not permit drivers to mess with the netif carrier, as
> > this will de-synchronise phylink with the MAC driver. Moreover,
> > setting and clearing the TE and RE bits via stmmac_mac_set() in this
> > path is also wrong as the link may not be up.
> > 
> > Replace the netif_carrier_on(), netif_carrier_off() and
> > stmmac_mac_set() calls with the appropriate phylink_start() and
> > phylink_stop() calls, thereby allowing phylink to manage the netif
> > carrier and TE/RE bits through the .mac_link_up() and .mac_link_down()
> > methods.
> > 
> > Note that RE should only be set after the DMA is ready to avoid the
> > receive FIFO between the MAC and DMA blocks overflowing, so
> > phylink_start() needs to be placed after DMA has been started.
> 
> IIUC this will cause a link loss when XDP is installed, if not disregard
> the rest of the email.

It will, because the author who added XDP support to stmmac decided it
was easier to tear everything down and rebuild, which meant (presumably)
that it was necessary to use netif_carrier_off() to stop the net layer
queueing packets to the driver. I'm just guessing - I know nothing
about XDP, and never knowingly used it.

> Any idea why it's necessary to mess with the link for XDP changes?

Depends what you mean by "link". If you're asking why it messes with
netif_carrier_foo(), my best guess is as above. However, phylink
drivers are not allowed to mess with the netif_carrier state (as the
commit message states.) This is not a new requirement, it's always
been this way with phylink, and this pre-dates the addition of XDP
to this driver.

As long as the code requires the netif_carrier to be turned off, the
only way to guarantee that in a phylink using driver is as per this
patch.

I'm guessing that the reason it does this is because it completely
takes down the MAC and tx/rx rings to reprogram everything from
scratch, and thus any interference from a packet coming in to be
transmitted is going to cause problems.

> I think we should mention in the commit message that the side effect is
> link loss on XDP on / off. I don't know of any other driver which would
> need this, stmmac is a real gift..

I'll add that. However, it would be nice to find a different solution
for XDP on this driver.

Russell King (Oracle) April 16, 2025, 6:03 p.m. UTC | #3
On Tue, Apr 15, 2025 at 10:54:44AM +0100, Russell King (Oracle) wrote:
> On Mon, Apr 14, 2025 at 05:43:42PM -0700, Jakub Kicinski wrote:
> > IIUC this will cause a link loss when XDP is installed, if not disregard
> > the rest of the email.
> 
> It will, because the author who added XDP support to stmmac decided it
> was easier to tear everything down and rebuild, which meant (presumably)
> that it was necessary to use netif_carrier_off() to stop the net layer
> queueing packets to the driver. I'm just guessing - I know nothing
> about XDP, and never knowingly used it.
> 
> > Any idea why it's necessary to mess with the link for XDP changes?
> 
> Depends what you mean by "link". If you're asking why it messes with
> netif_carrier_foo(), my best guess is as above. However, phylink
> drivers are not allowed to mess with the netif_carrier state (as the
> commit message states.) This is not a new requirement, it's always
> been this way with phylink, and this pre-dates the addition of XDP
> to this driver.
> 
> As long as the code requires the netif_carrier to be turned off, the
> only way to guarantee that in a phylink using driver is as per this
> patch.
> 
> I'm guessing that the reason it does this is because it completely
> takes down the MAC and tx/rx rings to reprogram everything from
> scratch, and thus any interference from a packet coming in to be
> transmitted is going to cause problems.

I'd like the "what do you mean by link" clarified before I update the
commit message.

If you're referring to the carrier state via netif_carrier_off() /
netif_carrier_on(), then nothing actually changes in that respect
because the carrier manipulation is being done by the driver today,
behind phylink's back. That changes to inside phylink with phylink's
knowledge.

It is my understanding that netif_carrier_off() / netif_carrier_on()
get notified to userspace, so this is visible today when XDP changes.

If you are referring to the messages that appear on the kernel console,
then yes, phylink will print those in addition, which actually makes
it more consistent with what's being reported to userspace.

Which of these you are referring to changes what I should say in the
commit message. E.g.

"We retain the changes to carrier state, which are already being
reported to userspace as link loss/link gain events, but we gain
kernel messages reporting the link state."

if you're referring to the carrier state. Or maybe:

"This change will have the side effect of printing link messages to
the kernel log, even though the physical link hasn't changed state.
This matches the carrier state."

if you're referring to the additional kernel messages.

Jakub Kicinski April 16, 2025, 10:37 p.m. UTC | #4
On Wed, 16 Apr 2025 19:03:19 +0100 Russell King (Oracle) wrote:
> "This change will have the side effect of printing link messages to
> the kernel log, even though the physical link hasn't changed state.
> This matches the carrier state."

So I did misunderstand. I thought we lose the physical link. This
paragraph looks good, then; it would have corrected my guess.

Patch

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 59d07d0d3369..24eaabd1445e 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -6922,6 +6922,8 @@  void stmmac_xdp_release(struct net_device *dev)
 	/* Ensure tx function is not running */
 	netif_tx_disable(dev);
 
+	phylink_stop(priv->phylink);
+
 	/* Disable NAPI process */
 	stmmac_disable_all_queues(priv);
 
@@ -6937,14 +6939,10 @@  void stmmac_xdp_release(struct net_device *dev)
 	/* Release and free the Rx/Tx resources */
 	free_dma_desc_resources(priv, &priv->dma_conf);
 
-	/* Disable the MAC Rx/Tx */
-	stmmac_mac_set(priv, priv->ioaddr, false);
-
 	/* set trans_start so we don't get spurious
 	 * watchdogs during reset
 	 */
 	netif_trans_update(dev);
-	netif_carrier_off(dev);
 }
 
 int stmmac_xdp_open(struct net_device *dev)
@@ -7026,25 +7024,25 @@  int stmmac_xdp_open(struct net_device *dev)
 		hrtimer_setup(&tx_q->txtimer, stmmac_tx_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
 	}
 
-	/* Enable the MAC Rx/Tx */
-	stmmac_mac_set(priv, priv->ioaddr, true);
-
 	/* Start Rx & Tx DMA Channels */
 	stmmac_start_all_dma(priv);
 
+	phylink_start(priv->phylink);
+
 	ret = stmmac_request_irq(dev);
 	if (ret)
 		goto irq_error;
 
 	/* Enable NAPI process*/
 	stmmac_enable_all_queues(priv);
-	netif_carrier_on(dev);
 	netif_tx_start_all_queues(dev);
 	stmmac_enable_all_dma_irq(priv);
 
 	return 0;
 
 irq_error:
+	phylink_stop(priv->phylink);
+
 	for (chan = 0; chan < priv->plat->tx_queues_to_use; chan++)
 		hrtimer_cancel(&priv->dma_conf.tx_queue[chan].txtimer);