Message ID | 20241209113204.175015-1-nikita.yoush@cogentembedded.com (mailing list archive) |
---|---|
State | New |
Delegated to: | Geert Uytterhoeven |
Series | [net] net: renesas: rswitch: handle stop vs interrupt race |
Hello Nikita-san,

Thank you for your patch!

> From: Nikita Yushchenko, Sent: Monday, December 9, 2024 8:32 PM
>
> Currently the stop routine of rswitch driver does not immediately
> prevent hardware from continuing to update descriptors and requesting
> interrupts.
>
> It can happen that when rswitch_stop() executes the masking of
> interrupts from the queues of the port being closed, napi poll for
> that port is already scheduled or running on a different CPU. When
> execution of this napi poll completes, it will unmask the interrupts.
> And unmasked interrupt can fire after rswitch_stop() returns from
> napi_disable() call. Then, the handler won't mask it, because
> napi_schedule_prep() will return false, and interrupt storm will
> happen.
>
> This can't be fixed by making rswitch_stop() call napi_disable() before
> masking interrupts. In this case, the interrupt storm will happen if
> interrupt fires between napi_disable() and masking.
>
> Fix this by checking for priv->opened_ports bit when unmasking
> interrupts after napi poll. For that to be consistent, move
> priv->opened_ports changes into spinlock-protected areas, and reorder
> other operations in rswitch_open() and rswitch_stop() accordingly.

We should add a Fixes tag for net.git here. I think the following tag
is better because the first commit had this issue. Although this fixing
patch cannot be applied on the first commit, I believe this is no matter
about the Fixes tag [1].

Fixes: 3590918b5d07 ("net: ethernet: renesas: Add support for "Ethernet Switch"")

> Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>

I could not apply this patch on net.git / main branch and the branch +
your patches [2] though, the fixed code looks good. So,

Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/5.Posting.rst?h=v6.12#n204
[2] https://patchwork.kernel.org/project/netdevbpf/list/?series=915669

Best regards,
Yoshihiro Shimoda

> ---
>  drivers/net/ethernet/renesas/rswitch.c | 33 ++++++++++++++------------
>  1 file changed, 18 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/net/ethernet/renesas/rswitch.c b/drivers/net/ethernet/renesas/rswitch.c
> index 6ca5f72193eb..a33f74e1c447 100644
> --- a/drivers/net/ethernet/renesas/rswitch.c
> +++ b/drivers/net/ethernet/renesas/rswitch.c
> @@ -918,8 +918,10 @@ static int rswitch_poll(struct napi_struct *napi, int budget)
>
>  	if (napi_complete_done(napi, budget - quota)) {
>  		spin_lock_irqsave(&priv->lock, flags);
> -		rswitch_enadis_data_irq(priv, rdev->tx_queue->index, true);
> -		rswitch_enadis_data_irq(priv, rdev->rx_queue->index, true);
> +		if (test_bit(rdev->port, priv->opened_ports)) {
> +			rswitch_enadis_data_irq(priv, rdev->tx_queue->index, true);
> +			rswitch_enadis_data_irq(priv, rdev->rx_queue->index, true);
> +		}
>  		spin_unlock_irqrestore(&priv->lock, flags);
>  	}
>
> @@ -1582,20 +1584,20 @@ static int rswitch_open(struct net_device *ndev)
>  	struct rswitch_device *rdev = netdev_priv(ndev);
>  	unsigned long flags;
>
> -	phy_start(ndev->phydev);
> +	if (bitmap_empty(rdev->priv->opened_ports, RSWITCH_NUM_PORTS))
> +		iowrite32(GWCA_TS_IRQ_BIT, rdev->priv->addr + GWTSDIE);
>
>  	napi_enable(&rdev->napi);
> -	netif_start_queue(ndev);
>
>  	spin_lock_irqsave(&rdev->priv->lock, flags);
> +	bitmap_set(rdev->priv->opened_ports, rdev->port, 1);
>  	rswitch_enadis_data_irq(rdev->priv, rdev->tx_queue->index, true);
>  	rswitch_enadis_data_irq(rdev->priv, rdev->rx_queue->index, true);
>  	spin_unlock_irqrestore(&rdev->priv->lock, flags);
>
> -	if (bitmap_empty(rdev->priv->opened_ports, RSWITCH_NUM_PORTS))
> -		iowrite32(GWCA_TS_IRQ_BIT, rdev->priv->addr + GWTSDIE);
> +	phy_start(ndev->phydev);
>
> -	bitmap_set(rdev->priv->opened_ports, rdev->port, 1);
> +	netif_start_queue(ndev);
>
>  	return 0;
>  };
> @@ -1607,7 +1609,16 @@ static int rswitch_stop(struct net_device *ndev)
>  	unsigned long flags;
>
>  	netif_tx_stop_all_queues(ndev);
> +
> +	phy_stop(ndev->phydev);
> +
> +	spin_lock_irqsave(&rdev->priv->lock, flags);
> +	rswitch_enadis_data_irq(rdev->priv, rdev->tx_queue->index, false);
> +	rswitch_enadis_data_irq(rdev->priv, rdev->rx_queue->index, false);
>  	bitmap_clear(rdev->priv->opened_ports, rdev->port, 1);
> +	spin_unlock_irqrestore(&rdev->priv->lock, flags);
> +
> +	napi_disable(&rdev->napi);
>
>  	if (bitmap_empty(rdev->priv->opened_ports, RSWITCH_NUM_PORTS))
>  		iowrite32(GWCA_TS_IRQ_BIT, rdev->priv->addr + GWTSDID);
> @@ -1620,14 +1631,6 @@ static int rswitch_stop(struct net_device *ndev)
>  		kfree(ts_info);
>  	}
>
> -	spin_lock_irqsave(&rdev->priv->lock, flags);
> -	rswitch_enadis_data_irq(rdev->priv, rdev->tx_queue->index, false);
> -	rswitch_enadis_data_irq(rdev->priv, rdev->rx_queue->index, false);
> -	spin_unlock_irqrestore(&rdev->priv->lock, flags);
> -
> -	phy_stop(ndev->phydev);
> -	napi_disable(&rdev->napi);
> -
>  	return 0;
>  };
>
> --
> 2.39.5

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 9 Dec 2024 16:32:04 +0500 you wrote:
> Currently the stop routine of rswitch driver does not immediately
> prevent hardware from continuing to update descriptors and requesting
> interrupts.
>
> It can happen that when rswitch_stop() executes the masking of
> interrupts from the queues of the port being closed, napi poll for
> that port is already scheduled or running on a different CPU. When
> execution of this napi poll completes, it will unmask the interrupts.
> And unmasked interrupt can fire after rswitch_stop() returns from
> napi_disable() call. Then, the handler won't mask it, because
> napi_schedule_prep() will return false, and interrupt storm will
> happen.
>
> [...]

Here is the summary with links:
  - [net] net: renesas: rswitch: handle stop vs interrupt race
    https://git.kernel.org/netdev/net/c/3dd002f20098

You are awesome, thank you!

diff --git a/drivers/net/ethernet/renesas/rswitch.c b/drivers/net/ethernet/renesas/rswitch.c
index 6ca5f72193eb..a33f74e1c447 100644
--- a/drivers/net/ethernet/renesas/rswitch.c
+++ b/drivers/net/ethernet/renesas/rswitch.c
@@ -918,8 +918,10 @@ static int rswitch_poll(struct napi_struct *napi, int budget)
 
 	if (napi_complete_done(napi, budget - quota)) {
 		spin_lock_irqsave(&priv->lock, flags);
-		rswitch_enadis_data_irq(priv, rdev->tx_queue->index, true);
-		rswitch_enadis_data_irq(priv, rdev->rx_queue->index, true);
+		if (test_bit(rdev->port, priv->opened_ports)) {
+			rswitch_enadis_data_irq(priv, rdev->tx_queue->index, true);
+			rswitch_enadis_data_irq(priv, rdev->rx_queue->index, true);
+		}
 		spin_unlock_irqrestore(&priv->lock, flags);
 	}
 
@@ -1582,20 +1584,20 @@ static int rswitch_open(struct net_device *ndev)
 	struct rswitch_device *rdev = netdev_priv(ndev);
 	unsigned long flags;
 
-	phy_start(ndev->phydev);
+	if (bitmap_empty(rdev->priv->opened_ports, RSWITCH_NUM_PORTS))
+		iowrite32(GWCA_TS_IRQ_BIT, rdev->priv->addr + GWTSDIE);
 
 	napi_enable(&rdev->napi);
-	netif_start_queue(ndev);
 
 	spin_lock_irqsave(&rdev->priv->lock, flags);
+	bitmap_set(rdev->priv->opened_ports, rdev->port, 1);
 	rswitch_enadis_data_irq(rdev->priv, rdev->tx_queue->index, true);
 	rswitch_enadis_data_irq(rdev->priv, rdev->rx_queue->index, true);
 	spin_unlock_irqrestore(&rdev->priv->lock, flags);
 
-	if (bitmap_empty(rdev->priv->opened_ports, RSWITCH_NUM_PORTS))
-		iowrite32(GWCA_TS_IRQ_BIT, rdev->priv->addr + GWTSDIE);
+	phy_start(ndev->phydev);
 
-	bitmap_set(rdev->priv->opened_ports, rdev->port, 1);
+	netif_start_queue(ndev);
 
 	return 0;
 };
@@ -1607,7 +1609,16 @@ static int rswitch_stop(struct net_device *ndev)
 	unsigned long flags;
 
 	netif_tx_stop_all_queues(ndev);
+
+	phy_stop(ndev->phydev);
+
+	spin_lock_irqsave(&rdev->priv->lock, flags);
+	rswitch_enadis_data_irq(rdev->priv, rdev->tx_queue->index, false);
+	rswitch_enadis_data_irq(rdev->priv, rdev->rx_queue->index, false);
 	bitmap_clear(rdev->priv->opened_ports, rdev->port, 1);
+	spin_unlock_irqrestore(&rdev->priv->lock, flags);
+
+	napi_disable(&rdev->napi);
 
 	if (bitmap_empty(rdev->priv->opened_ports, RSWITCH_NUM_PORTS))
 		iowrite32(GWCA_TS_IRQ_BIT, rdev->priv->addr + GWTSDID);
@@ -1620,14 +1631,6 @@ static int rswitch_stop(struct net_device *ndev)
 		kfree(ts_info);
 	}
 
-	spin_lock_irqsave(&rdev->priv->lock, flags);
-	rswitch_enadis_data_irq(rdev->priv, rdev->tx_queue->index, false);
-	rswitch_enadis_data_irq(rdev->priv, rdev->rx_queue->index, false);
-	spin_unlock_irqrestore(&rdev->priv->lock, flags);
-
-	phy_stop(ndev->phydev);
-	napi_disable(&rdev->napi);
-
 	return 0;
 };

Currently the stop routine of rswitch driver does not immediately
prevent hardware from continuing to update descriptors and requesting
interrupts.

It can happen that when rswitch_stop() executes the masking of
interrupts from the queues of the port being closed, napi poll for
that port is already scheduled or running on a different CPU. When
execution of this napi poll completes, it will unmask the interrupts.
And unmasked interrupt can fire after rswitch_stop() returns from
napi_disable() call. Then, the handler won't mask it, because
napi_schedule_prep() will return false, and interrupt storm will
happen.

This can't be fixed by making rswitch_stop() call napi_disable() before
masking interrupts. In this case, the interrupt storm will happen if
interrupt fires between napi_disable() and masking.

Fix this by checking for priv->opened_ports bit when unmasking
interrupts after napi poll. For that to be consistent, move
priv->opened_ports changes into spinlock-protected areas, and reorder
other operations in rswitch_open() and rswitch_stop() accordingly.

Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
---
 drivers/net/ethernet/renesas/rswitch.c | 33 ++++++++++++++------------
 1 file changed, 18 insertions(+), 15 deletions(-)
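
The interleaving described in the commit message can be replayed with a small userspace model. This is only a sketch, not driver code: struct port_model, its fields, and every model_* / replay_* function are hypothetical names invented for illustration. Each function stands in for one locked section or one kernel call, so running them in sequence reproduces a single fixed ordering of the two CPUs; no real locking or NAPI is involved.

```c
#include <stdbool.h>

/* Hypothetical single-port model; names are illustrative, not the driver's. */
struct port_model {
	bool opened;       /* stands in for the port's bit in priv->opened_ports */
	bool irq_unmasked; /* stands in for the per-queue data interrupt enable */
	bool napi_enabled; /* false once napi_disable() has returned */
};

/* rswitch_open(): enable NAPI, then set the bit and unmask in one locked section. */
static void model_open(struct port_model *p)
{
	p->napi_enabled = true;
	p->opened = true;
	p->irq_unmasked = true;
}

/* rswitch_stop(), locked section: mask interrupts and clear the opened bit. */
static void model_stop_mask(struct port_model *p)
{
	p->irq_unmasked = false;
	p->opened = false;
}

/* rswitch_stop(), after the locked section: napi_disable() has returned. */
static void model_napi_disable(struct port_model *p)
{
	p->napi_enabled = false;
}

/* Tail of the napi poll. check_opened == false models the pre-patch code. */
static void model_poll_complete(struct port_model *p, bool check_opened)
{
	if (!check_opened || p->opened)
		p->irq_unmasked = true;
}

/* Interrupt handler: returns true if the interrupt fires and nobody masks it. */
static bool model_irq_storms(struct port_model *p)
{
	if (!p->irq_unmasked)
		return false;            /* masked, nothing fires */
	if (p->napi_enabled) {       /* napi_schedule_prep() would succeed */
		p->irq_unmasked = false; /* handler masks and schedules the poll */
		return false;
	}
	return true;                 /* left unmasked: interrupt storm */
}

/* Replays the ordering from the commit message; returns true on a storm. */
static bool replay_stop_race(bool check_opened)
{
	struct port_model p = { 0 };

	model_open(&p);
	(void)model_irq_storms(&p);            /* irq fires, poll gets scheduled */
	model_stop_mask(&p);                   /* CPU A: stop masks interrupts */
	model_poll_complete(&p, check_opened); /* CPU B: poll finishes */
	model_napi_disable(&p);                /* CPU A: napi_disable() returns */
	return model_irq_storms(&p);           /* does a late interrupt storm? */
}
```

In this model, replay_stop_race(false) ends with the interrupt unmasked and no way to schedule the poll, while replay_stop_race(true) leaves it masked because the opened bit was already cleared when the poll finished.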