Message ID | 20231127122420.3706751-1-yoshihiro.shimoda.uh@renesas.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 9870257a0a338cd8d6c1cddab74e703f490f6779 |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net,v4] ravb: Fix races between ravb_tx_timeout_work() and net related ops | expand |
On 11/27/23 3:24 PM, Yoshihiro Shimoda wrote: > Fix races between ravb_tx_timeout_work() and functions of net_device_ops > and ethtool_ops by using rtnl_trylock() and rtnl_unlock(). Note that > since ravb_close() is under the rtnl lock and calls cancel_work_sync(), > ravb_tx_timeout_work() should calls rtnl_trylock(). Otherwise, a deadlock > may happen in ravb_tx_timeout_work() like below: > > CPU0 CPU1 > ravb_tx_timeout() > schedule_work() > ... > __dev_close_many() > // Under rtnl lock > ravb_close() > cancel_work_sync() > // Waiting > ravb_tx_timeout_work() > rtnl_lock() > // This is possible to cause a deadlock > > If rtnl_trylock() fails, rescheduling the work with sleep for 1 msec. > > Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper") > Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> [...] > diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c > index c70cff80cc99..7c007ecd3ff6 100644 > --- a/drivers/net/ethernet/renesas/ravb_main.c > +++ b/drivers/net/ethernet/renesas/ravb_main.c > @@ -1874,6 +1874,12 @@ static void ravb_tx_timeout_work(struct work_struct *work) > struct net_device *ndev = priv->ndev; > int error; > > + if (!rtnl_trylock()) { > + usleep_range(1000, 2000); Why not msleep(1) though? > + schedule_work(&priv->work); > + return; > + } > + > netif_tx_stop_all_queues(ndev); > > /* Stop PTP Clock driver */ [...] MBR, Sergey
Hi Sergey, > From: Sergey Shtylyov, Sent: Tuesday, November 28, 2023 1:28 AM > > On 11/27/23 3:24 PM, Yoshihiro Shimoda wrote: > > > Fix races between ravb_tx_timeout_work() and functions of net_device_ops > > and ethtool_ops by using rtnl_trylock() and rtnl_unlock(). Note that > > since ravb_close() is under the rtnl lock and calls cancel_work_sync(), > > ravb_tx_timeout_work() should calls rtnl_trylock(). Otherwise, a deadlock > > may happen in ravb_tx_timeout_work() like below: > > > > CPU0 CPU1 > > ravb_tx_timeout() > > schedule_work() > > ... > > __dev_close_many() > > // Under rtnl lock > > ravb_close() > > cancel_work_sync() > > // Waiting > > ravb_tx_timeout_work() > > rtnl_lock() > > // This is possible to cause a deadlock > > > > If rtnl_trylock() fails, rescheduling the work with sleep for 1 msec. > > > > Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper") > > Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> > > Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Thank you very much for your review! > [...] > > > diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c > > index c70cff80cc99..7c007ecd3ff6 100644 > > --- a/drivers/net/ethernet/renesas/ravb_main.c > > +++ b/drivers/net/ethernet/renesas/ravb_main.c > > @@ -1874,6 +1874,12 @@ static void ravb_tx_timeout_work(struct work_struct *work) > > struct net_device *ndev = priv->ndev; > > int error; > > > > + if (!rtnl_trylock()) { > > + usleep_range(1000, 2000); > > Why not msleep(1) though? It's for the following guideline: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/timers/timers-howto.rst?h=v6.7-rc3 ----- SLEEPING FOR ~USECS OR SMALL MSECS ( 10us - 20ms): * Use usleep_range ----- Best regards, Yoshihiro Shimoda > > + schedule_work(&priv->work); > > + return; > > + } > > + > > netif_tx_stop_all_queues(ndev); > > > > /* Stop PTP Clock driver */ > [...] > > MBR, Sergey
Hello: This patch was applied to netdev/net.git (main) by Jakub Kicinski <kuba@kernel.org>: On Mon, 27 Nov 2023 21:24:20 +0900 you wrote: > Fix races between ravb_tx_timeout_work() and functions of net_device_ops > and ethtool_ops by using rtnl_trylock() and rtnl_unlock(). Note that > since ravb_close() is under the rtnl lock and calls cancel_work_sync(), > ravb_tx_timeout_work() should calls rtnl_trylock(). Otherwise, a deadlock > may happen in ravb_tx_timeout_work() like below: > > CPU0 CPU1 > ravb_tx_timeout() > schedule_work() > ... > __dev_close_many() > // Under rtnl lock > ravb_close() > cancel_work_sync() > // Waiting > ravb_tx_timeout_work() > rtnl_lock() > // This is possible to cause a deadlock > > [...] Here is the summary with links: - [net,v4] ravb: Fix races between ravb_tx_timeout_work() and net related ops https://git.kernel.org/netdev/net/c/9870257a0a33 You are awesome, thank you!
diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c index c70cff80cc99..7c007ecd3ff6 100644 --- a/drivers/net/ethernet/renesas/ravb_main.c +++ b/drivers/net/ethernet/renesas/ravb_main.c @@ -1874,6 +1874,12 @@ static void ravb_tx_timeout_work(struct work_struct *work) struct net_device *ndev = priv->ndev; int error; + if (!rtnl_trylock()) { + usleep_range(1000, 2000); + schedule_work(&priv->work); + return; + } + netif_tx_stop_all_queues(ndev); /* Stop PTP Clock driver */ @@ -1907,7 +1913,7 @@ static void ravb_tx_timeout_work(struct work_struct *work) */ netdev_err(ndev, "%s: ravb_dmac_init() failed, error %d\n", __func__, error); - return; + goto out_unlock; } ravb_emac_init(ndev); @@ -1917,6 +1923,9 @@ static void ravb_tx_timeout_work(struct work_struct *work) ravb_ptp_init(ndev, priv->pdev); netif_tx_start_all_queues(ndev); + +out_unlock: + rtnl_unlock(); } /* Packet transmit function for Ethernet AVB */
Fix races between ravb_tx_timeout_work() and functions of net_device_ops and ethtool_ops by using rtnl_trylock() and rtnl_unlock(). Note that since ravb_close() is under the rtnl lock and calls cancel_work_sync(), ravb_tx_timeout_work() should calls rtnl_trylock(). Otherwise, a deadlock may happen in ravb_tx_timeout_work() like below: CPU0 CPU1 ravb_tx_timeout() schedule_work() ... __dev_close_many() // Under rtnl lock ravb_close() cancel_work_sync() // Waiting ravb_tx_timeout_work() rtnl_lock() // This is possible to cause a deadlock If rtnl_trylock() fails, rescheduling the work with sleep for 1 msec. Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper") Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> --- Changes from v3: https://lore.kernel.org/all/20231115022644.2316961-1-yoshihiro.shimoda.uh@renesas.com/ - Based on v2 patch. This means that delayed work is not used. - Add rescheduling the work if rtnl_trylock() fails. - Drop Reviewed-by tags because implementation was changed. Changes from v2: https://lore.kernel.org/netdev/20231019113308.1133944-1-yoshihiro.shimoda.uh@renesas.com/ - Add rescheduling if rtnl_trylock() fails and the netif is still running and update commit description for it. - Add Reviewed-by tags. drivers/net/ethernet/renesas/ravb_main.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)