diff mbox series

[net,v2] ravb: Fix races between ravb_tx_timeout_work() and net related ops

Message ID 20231019113308.1133944-1-yoshihiro.shimoda.uh@renesas.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [net,v2] ravb: Fix races between ravb_tx_timeout_work() and net related ops | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1362 this patch: 1362
netdev/cc_maintainers fail 1 blamed authors not CCed: mitsuhiro.kimura.kc@renesas.com; 2 maintainers not CCed: wsa+renesas@sang-engineering.com mitsuhiro.kimura.kc@renesas.com
netdev/build_clang success Errors and warnings before: 1386 this patch: 1386
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1386 this patch: 1386
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 26 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Yoshihiro Shimoda Oct. 19, 2023, 11:33 a.m. UTC
Fix races between ravb_tx_timeout_work() and functions of net_device_ops
and ethtool_ops by using rtnl_trylock() and rtnl_unlock(). Note that
since ravb_close() is under the rtnl lock and calls cancel_work_sync(),
ravb_tx_timeout_work() should calls rtnl_trylock(). Otherwise, a deadlock
may happen in ravb_tx_timeout_work() like below:

CPU0			CPU1
			ravb_tx_timeout()
			schedule_work()
...
__dev_close_many()
// Under rtnl lock
ravb_close()
cancel_work_sync()
// Waiting
			ravb_tx_timeout_work()
			rtnl_lock()
			// This is possible to cause a deadlock

Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
---
Changes from v1:
https://lore.kernel.org/all/20231017085341.813335-1-yoshihiro.shimoda.uh@renesas.com/
 - Modify commit description.
 - Use goto in a error path.

 drivers/net/ethernet/renesas/ravb_main.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Comments

Sergey Shtylyov Oct. 19, 2023, 12:36 p.m. UTC | #1
Hello!

On 10/19/23 2:33 PM, Yoshihiro Shimoda wrote:

> Fix races between ravb_tx_timeout_work() and functions of net_device_ops
> and ethtool_ops by using rtnl_trylock() and rtnl_unlock(). Note that
> since ravb_close() is under the rtnl lock and calls cancel_work_sync(),
> ravb_tx_timeout_work() should calls rtnl_trylock(). Otherwise, a deadlock
> may happen in ravb_tx_timeout_work() like below:
> 
> CPU0			CPU1
> 			ravb_tx_timeout()
> 			schedule_work()
> ...
> __dev_close_many()
> // Under rtnl lock
> ravb_close()
> cancel_work_sync()
> // Waiting
> 			ravb_tx_timeout_work()
> 			rtnl_lock()
> 			// This is possible to cause a deadlock
> 
> Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper")
> Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>

Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>

[...]
> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
> index 0ef0b88b7145..300c1885e1e1 100644
> --- a/drivers/net/ethernet/renesas/ravb_main.c
> +++ b/drivers/net/ethernet/renesas/ravb_main.c
> @@ -1874,6 +1874,9 @@ static void ravb_tx_timeout_work(struct work_struct *work)
>  	struct net_device *ndev = priv->ndev;
>  	int error;
>  
> +	if (!rtnl_trylock())
> +		return;

   I wonder if we should reschedule the work here...

[...]

MBR, Sergey
Simon Horman Oct. 23, 2023, 11:16 a.m. UTC | #2
On Thu, Oct 19, 2023 at 08:33:08PM +0900, Yoshihiro Shimoda wrote:
> Fix races between ravb_tx_timeout_work() and functions of net_device_ops
> and ethtool_ops by using rtnl_trylock() and rtnl_unlock(). Note that
> since ravb_close() is under the rtnl lock and calls cancel_work_sync(),
> ravb_tx_timeout_work() should calls rtnl_trylock(). Otherwise, a deadlock
> may happen in ravb_tx_timeout_work() like below:
> 
> CPU0			CPU1
> 			ravb_tx_timeout()
> 			schedule_work()
> ...
> __dev_close_many()
> // Under rtnl lock
> ravb_close()
> cancel_work_sync()
> // Waiting
> 			ravb_tx_timeout_work()
> 			rtnl_lock()
> 			// This is possible to cause a deadlock
> 
> Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper")
> Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>

Reviewed-by: Simon Horman <horms@kernel.org>
Yoshihiro Shimoda Nov. 14, 2023, 8:52 a.m. UTC | #3
Hello,

> From: Yoshihiro Shimoda, Sent: Thursday, October 19, 2023 8:33 PM
> 
> Fix races between ravb_tx_timeout_work() and functions of net_device_ops
> and ethtool_ops by using rtnl_trylock() and rtnl_unlock(). Note that
> since ravb_close() is under the rtnl lock and calls cancel_work_sync(),
> ravb_tx_timeout_work() should calls rtnl_trylock(). Otherwise, a deadlock
> may happen in ravb_tx_timeout_work() like below:
> 
> CPU0			CPU1
> 			ravb_tx_timeout()
> 			schedule_work()
> ...
> __dev_close_many()
> // Under rtnl lock
> ravb_close()
> cancel_work_sync()
> // Waiting
> 			ravb_tx_timeout_work()
> 			rtnl_lock()
> 			// This is possible to cause a deadlock
> 
> Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper")
> Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>

Gentle ping. I confirmed that I could apply this patch on
the latest net.git / main branch.

Best regards,
Yoshihiro Shimoda

> ---
> Changes from v1:
<snip URL>
>  - Modify commit description.
>  - Use goto in a error path.
> 
>  drivers/net/ethernet/renesas/ravb_main.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
> index 0ef0b88b7145..300c1885e1e1 100644
> --- a/drivers/net/ethernet/renesas/ravb_main.c
> +++ b/drivers/net/ethernet/renesas/ravb_main.c
> @@ -1874,6 +1874,9 @@ static void ravb_tx_timeout_work(struct work_struct *work)
>  	struct net_device *ndev = priv->ndev;
>  	int error;
> 
> +	if (!rtnl_trylock())
> +		return;
> +
>  	netif_tx_stop_all_queues(ndev);
> 
>  	/* Stop PTP Clock driver */
> @@ -1907,7 +1910,7 @@ static void ravb_tx_timeout_work(struct work_struct *work)
>  		 */
>  		netdev_err(ndev, "%s: ravb_dmac_init() failed, error %d\n",
>  			   __func__, error);
> -		return;
> +		goto out_unlock;
>  	}
>  	ravb_emac_init(ndev);
> 
> @@ -1917,6 +1920,9 @@ static void ravb_tx_timeout_work(struct work_struct *work)
>  		ravb_ptp_init(ndev, priv->pdev);
> 
>  	netif_tx_start_all_queues(ndev);
> +
> +out_unlock:
> +	rtnl_unlock();
>  }
> 
>  /* Packet transmit function for Ethernet AVB */
> --
> 2.25.1
Jakub Kicinski Nov. 14, 2023, 10:09 p.m. UTC | #4
On Tue, 14 Nov 2023 08:52:29 +0000 Yoshihiro Shimoda wrote:
> Gentle ping. I confirmed that I could apply this patch on
> the latest net.git / main branch.

At a glance the suggestion from Sergei makes sense.
You need to reply to him.
Yoshihiro Shimoda Nov. 14, 2023, 11:48 p.m. UTC | #5
> From: Jakub Kicinski, Sent: Wednesday, November 15, 2023 7:10 AM
> 
> On Tue, 14 Nov 2023 08:52:29 +0000 Yoshihiro Shimoda wrote:
> > Gentle ping. I confirmed that I could apply this patch on
> > the latest net.git / main branch.
> 
> At a glance the suggestion from Sergei makes sense.
> You need to reply to him.

Thank you for your reply. I completely overlooked his suggestion.
I'll check it and reply to him.

Best regards,
Yoshihiro Shimoda
Yoshihiro Shimoda Nov. 15, 2023, 2:09 a.m. UTC | #6
Hello Sergey,

> From: Sergey Shtylyov, Sent: Thursday, October 19, 2023 9:36 PM
> 
> Hello!
> 
> On 10/19/23 2:33 PM, Yoshihiro Shimoda wrote:
> 
> > Fix races between ravb_tx_timeout_work() and functions of net_device_ops
> > and ethtool_ops by using rtnl_trylock() and rtnl_unlock(). Note that
> > since ravb_close() is under the rtnl lock and calls cancel_work_sync(),
> > ravb_tx_timeout_work() should calls rtnl_trylock(). Otherwise, a deadlock
> > may happen in ravb_tx_timeout_work() like below:
> >
> > CPU0			CPU1
> > 			ravb_tx_timeout()
> > 			schedule_work()
> > ...
> > __dev_close_many()
> > // Under rtnl lock
> > ravb_close()
> > cancel_work_sync()
> > // Waiting
> > 			ravb_tx_timeout_work()
> > 			rtnl_lock()
> > 			// This is possible to cause a deadlock
> >
> > Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper")
> > Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> 
> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>

Thank you for your review!

> [...]
> > diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
> > index 0ef0b88b7145..300c1885e1e1 100644
> > --- a/drivers/net/ethernet/renesas/ravb_main.c
> > +++ b/drivers/net/ethernet/renesas/ravb_main.c
> > @@ -1874,6 +1874,9 @@ static void ravb_tx_timeout_work(struct work_struct *work)
> >  	struct net_device *ndev = priv->ndev;
> >  	int error;
> >
> > +	if (!rtnl_trylock())
> > +		return;
> 
>    I wonder if we should reschedule the work here...

I think so. But, it should reschedule the work if the netif is still running because
Use-after-free issue happens again when cancel_work_sync() is calling. Also, I also think
we should use schedule_delayed_work() instead. So, I'll submit such a patch as v3.

Best regards,
Yoshihiro Shimoda

> [...]
> 
> MBR, Sergey
Sergey Shtylyov Nov. 15, 2023, 6:31 p.m. UTC | #7
On 11/15/23 5:09 AM, Yoshihiro Shimoda wrote:
[...]

>>> Fix races between ravb_tx_timeout_work() and functions of net_device_ops
>>> and ethtool_ops by using rtnl_trylock() and rtnl_unlock(). Note that
>>> since ravb_close() is under the rtnl lock and calls cancel_work_sync(),
>>> ravb_tx_timeout_work() should calls rtnl_trylock(). Otherwise, a deadlock
>>> may happen in ravb_tx_timeout_work() like below:
>>>
>>> CPU0			CPU1
>>> 			ravb_tx_timeout()
>>> 			schedule_work()
>>> ...
>>> __dev_close_many()
>>> // Under rtnl lock
>>> ravb_close()
>>> cancel_work_sync()
>>> // Waiting
>>> 			ravb_tx_timeout_work()
>>> 			rtnl_lock()
>>> 			// This is possible to cause a deadlock
>>>
>>> Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper")
>>> Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
>>
>> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>

[...]

>>> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
>>> index 0ef0b88b7145..300c1885e1e1 100644
>>> --- a/drivers/net/ethernet/renesas/ravb_main.c
>>> +++ b/drivers/net/ethernet/renesas/ravb_main.c
>>> @@ -1874,6 +1874,9 @@ static void ravb_tx_timeout_work(struct work_struct *work)
>>>  	struct net_device *ndev = priv->ndev;
>>>  	int error;
>>>
>>> +	if (!rtnl_trylock())
>>> +		return;
>>
>>    I wonder if we should reschedule the work here...
> 
> I think so. But, it should reschedule the work if the netif is still running because
> Use-after-free issue happens again when cancel_work_sync() is calling. Also, I also think
> we should use schedule_delayed_work() instead. So, I'll submit such a patch as v3.

   I'm not really sure about that one. Note that cancel_work_sync() should
work with the works requeueing themselves, the comments say...

> Best regards,
> Yoshihiro Shimoda

[...]

MBR, Sergey
Yoshihiro Shimoda Nov. 16, 2023, 2:15 a.m. UTC | #8
Hello,

> From: Sergey Shtylyov, Sent: Thursday, November 16, 2023 3:31 AM
> 
> On 11/15/23 5:09 AM, Yoshihiro Shimoda wrote:
> [...]
> 
> >>> Fix races between ravb_tx_timeout_work() and functions of net_device_ops
> >>> and ethtool_ops by using rtnl_trylock() and rtnl_unlock(). Note that
> >>> since ravb_close() is under the rtnl lock and calls cancel_work_sync(),
> >>> ravb_tx_timeout_work() should calls rtnl_trylock(). Otherwise, a deadlock
> >>> may happen in ravb_tx_timeout_work() like below:
> >>>
> >>> CPU0			CPU1
> >>> 			ravb_tx_timeout()
> >>> 			schedule_work()
> >>> ...
> >>> __dev_close_many()
> >>> // Under rtnl lock
> >>> ravb_close()
> >>> cancel_work_sync()
> >>> // Waiting
> >>> 			ravb_tx_timeout_work()
> >>> 			rtnl_lock()
> >>> 			// This is possible to cause a deadlock
> >>>
> >>> Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper")
> >>> Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> >>
> >> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
> 
> [...]
> 
> >>> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
> >>> index 0ef0b88b7145..300c1885e1e1 100644
> >>> --- a/drivers/net/ethernet/renesas/ravb_main.c
> >>> +++ b/drivers/net/ethernet/renesas/ravb_main.c
> >>> @@ -1874,6 +1874,9 @@ static void ravb_tx_timeout_work(struct work_struct *work)
> >>>  	struct net_device *ndev = priv->ndev;
> >>>  	int error;
> >>>
> >>> +	if (!rtnl_trylock())
> >>> +		return;
> >>
> >>    I wonder if we should reschedule the work here...
> >
> > I think so. But, it should reschedule the work if the netif is still running because
> > Use-after-free issue happens again when cancel_work_sync() is calling. Also, I also think
> > we should use schedule_delayed_work() instead. So, I'll submit such a patch as v3.
> 
>    I'm not really sure about that one. Note that cancel_work_sync() should
> work with the works requeueing themselves, the comments say...

I'm sorry, I completely mistook to explain this... I should have said:
It should not reschedule the work if the netif is not running because
          ~~~                                     ~~~
use-after-free issue happens again when cancel_work_sync() is called from ravb_remove().
                                                           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Also, I completely misunderstood the behavior of cancel_{schedule_}work_sync().
In the function(s), since WORK_STRUCT_PENDING_BIT is set, schedule_{delayed_}work()
will not schedule the work anymore. So, I'll drop a condition netif_running()
from the ravb_tx_timeout_work().

Best regards,
Yoshihiro Shimoda


> > Best regards,
> > Yoshihiro Shimoda
> 
> [...]
> 
> MBR, Sergey
Sergey Shtylyov Nov. 16, 2023, 6:11 p.m. UTC | #9
On 11/16/23 5:15 AM, Yoshihiro Shimoda wrote:
[...]

>>>>> Fix races between ravb_tx_timeout_work() and functions of net_device_ops
>>>>> and ethtool_ops by using rtnl_trylock() and rtnl_unlock(). Note that
>>>>> since ravb_close() is under the rtnl lock and calls cancel_work_sync(),
>>>>> ravb_tx_timeout_work() should calls rtnl_trylock(). Otherwise, a deadlock
>>>>> may happen in ravb_tx_timeout_work() like below:
>>>>>
>>>>> CPU0			CPU1
>>>>> 			ravb_tx_timeout()
>>>>> 			schedule_work()
>>>>> ...
>>>>> __dev_close_many()
>>>>> // Under rtnl lock
>>>>> ravb_close()
>>>>> cancel_work_sync()
>>>>> // Waiting
>>>>> 			ravb_tx_timeout_work()
>>>>> 			rtnl_lock()
>>>>> 			// This is possible to cause a deadlock
>>>>>
>>>>> Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper")
>>>>> Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
>>>>
>>>> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
>>
>> [...]
>>
>>>>> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
>>>>> index 0ef0b88b7145..300c1885e1e1 100644
>>>>> --- a/drivers/net/ethernet/renesas/ravb_main.c
>>>>> +++ b/drivers/net/ethernet/renesas/ravb_main.c
>>>>> @@ -1874,6 +1874,9 @@ static void ravb_tx_timeout_work(struct work_struct *work)
>>>>>  	struct net_device *ndev = priv->ndev;
>>>>>  	int error;
>>>>>
>>>>> +	if (!rtnl_trylock())
>>>>> +		return;
>>>>
>>>>    I wonder if we should reschedule the work here...
>>>
>>> I think so. But, it should reschedule the work if the netif is still running because
>>> Use-after-free issue happens again when cancel_work_sync() is calling. Also, I also think
>>> we should use schedule_delayed_work() instead. So, I'll submit such a patch as v3.
>>
>>    I'm not really sure about that one. Note that cancel_work_sync() should
>> work with the works requeueing themselves, the comments say...
> 
> I'm sorry, I completely mistook to explain this... I should have said:

   Don't worry, my uncertainty was about using the "delayed" flavor of
the works. :-)

> It should not reschedule the work if the netif is not running because
>           ~~~                                     ~~~
> use-after-free issue happens again when cancel_work_sync() is called from ravb_remove().
>                                                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~

   Well, it's called from ravb_close() -- which is called by the networking
core when unregister_netdev() is called bt ravb_remove()...

> Also, I completely misunderstood the behavior of cancel_{schedule_}work_sync().

   cancel_{delayed_}work_sync(), you meant...

> In the function(s), since WORK_STRUCT_PENDING_BIT is set, schedule_{delayed_}work()
> will not schedule the work anymore. So, I'll drop a condition netif_running()
> from the ravb_tx_timeout_work().

  Hm, this caused me to rummage in the work queue code for more time than
I could afford... still not sure what you meant... :-/

> Best regards,
> Yoshihiro Shimoda
[...]

MBR, Sergey
Yoshihiro Shimoda Nov. 17, 2023, 12:07 a.m. UTC | #10
> From: Sergey Shtylyov, Sent: Friday, November 17, 2023 3:12 AM
> 
> On 11/16/23 5:15 AM, Yoshihiro Shimoda wrote:
> [...]
> 
> >>>>> Fix races between ravb_tx_timeout_work() and functions of net_device_ops
> >>>>> and ethtool_ops by using rtnl_trylock() and rtnl_unlock(). Note that
> >>>>> since ravb_close() is under the rtnl lock and calls cancel_work_sync(),
> >>>>> ravb_tx_timeout_work() should calls rtnl_trylock(). Otherwise, a deadlock
> >>>>> may happen in ravb_tx_timeout_work() like below:
> >>>>>
> >>>>> CPU0			CPU1
> >>>>> 			ravb_tx_timeout()
> >>>>> 			schedule_work()
> >>>>> ...
> >>>>> __dev_close_many()
> >>>>> // Under rtnl lock
> >>>>> ravb_close()
> >>>>> cancel_work_sync()
> >>>>> // Waiting
> >>>>> 			ravb_tx_timeout_work()
> >>>>> 			rtnl_lock()
> >>>>> 			// This is possible to cause a deadlock
> >>>>>
> >>>>> Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper")
> >>>>> Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> >>>>
> >>>> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
> >>
> >> [...]
> >>
> >>>>> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
> >>>>> index 0ef0b88b7145..300c1885e1e1 100644
> >>>>> --- a/drivers/net/ethernet/renesas/ravb_main.c
> >>>>> +++ b/drivers/net/ethernet/renesas/ravb_main.c
> >>>>> @@ -1874,6 +1874,9 @@ static void ravb_tx_timeout_work(struct work_struct *work)
> >>>>>  	struct net_device *ndev = priv->ndev;
> >>>>>  	int error;
> >>>>>
> >>>>> +	if (!rtnl_trylock())
> >>>>> +		return;
> >>>>
> >>>>    I wonder if we should reschedule the work here...
> >>>
> >>> I think so. But, it should reschedule the work if the netif is still running because
> >>> Use-after-free issue happens again when cancel_work_sync() is calling. Also, I also think
> >>> we should use schedule_delayed_work() instead. So, I'll submit such a patch as v3.
> >>
> >>    I'm not really sure about that one. Note that cancel_work_sync() should
> >> work with the works requeueing themselves, the comments say...
> >
> > I'm sorry, I completely mistook to explain this... I should have said:
> 
>    Don't worry, my uncertainty was about using the "delayed" flavor of
> the works. :-)
> 
> > It should not reschedule the work if the netif is not running because
> >           ~~~                                     ~~~
> > use-after-free issue happens again when cancel_work_sync() is called from ravb_remove().
> >                                                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
>    Well, it's called from ravb_close() -- which is called by the networking
> core when unregister_netdev() is called bt ravb_remove()...

You're correct. I'm sorry for my lack explanation again...

> > Also, I completely misunderstood the behavior of cancel_{schedule_}work_sync().
> 
>    cancel_{delayed_}work_sync(), you meant...

Yes...

> > In the function(s), since WORK_STRUCT_PENDING_BIT is set, schedule_{delayed_}work()
> > will not schedule the work anymore. So, I'll drop a condition netif_running()
> > from the ravb_tx_timeout_work().
> 
>   Hm, this caused me to rummage in the work queue code for more time than
> I could afford... still not sure what you meant... :-/

I'm sorry for bothering you about this topic...
In the v3 patch, the rescheduling code was:
---
+	if (!rtnl_trylock()) {
+		if (netif_running(ndev))
+			schedule_delayed_work(&priv->work, msecs_to_jiffies(10));
+		return;
+	}
---

However, we can implement this like the following:
---
+	if (!rtnl_trylock()) {
+		schedule_delayed_work(&priv->work, msecs_to_jiffies(10));
+		return;
+	}
---

The schedule_{delayed}_work() will not be queued after cancel_{delayed_}work_sync()
was called, because WORK_STRUCT_PENDING_BIT was set in cancel_{delayed_}work_sync()
like the following:
---
cancel_work_sync()
-> __cancel_work_timer()
  -> try_to_grab_pending()
   -> if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, ...)

schedule_work()
 -> queue_work()
  -> queue_work_on()
   -> if (test_and_set_bit(WORK_STRUCT_PENDING_BIT, ...)
    -> __queue_work()
---

Best regards,
Yoshihiro Shimoda

> > Best regards,
> > Yoshihiro Shimoda
> [...]
> 
> MBR, Sergey
Sergey Shtylyov Nov. 23, 2023, 5:15 p.m. UTC | #11
Hello!

   Sorry for the late reply -- the damn dozen of the AVB patches fell on me this
Monday... :-/

On 11/17/23 3:07 AM, Yoshihiro Shimoda wrote:
[...]

>>> In the function(s), since WORK_STRUCT_PENDING_BIT is set, schedule_{delayed_}work()
>>> will not schedule the work anymore. So, I'll drop a condition netif_running()
>>> from the ravb_tx_timeout_work().
>>
>>   Hm, this caused me to rummage in the work queue code for more time than
>> I could afford... still not sure what you meant... :-/
> 
> I'm sorry for bothering you about this topic...
> In the v3 patch, the rescheduling code was:
> ---
> +	if (!rtnl_trylock()) {
> +		if (netif_running(ndev))
> +			schedule_delayed_work(&priv->work, msecs_to_jiffies(10));
> +		return;
> +	}
> ---
> 
> However, we can implement this like the following:
> ---
> +	if (!rtnl_trylock()) {
> +		schedule_delayed_work(&priv->work, msecs_to_jiffies(10));
> +		return;
> +	}
> ---
> 
> The schedule_{delayed}_work() will not be queued after cancel_{delayed_}work_sync()
> was called, because WORK_STRUCT_PENDING_BIT was set in cancel_{delayed_}work_sync()
> like the following:
> ---
> cancel_work_sync()
> -> __cancel_work_timer()
>   -> try_to_grab_pending()
>    -> if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, ...)
> 
> schedule_work()
>  -> queue_work()
>   -> queue_work_on()
>    -> if (test_and_set_bit(WORK_STRUCT_PENDING_BIT, ...)

   You seem to have lost ! here. :-)

>     -> __queue_work()

   Ah! Now it makes perfect sense. Sorry, this somehow evaded me... :-/

> ---
> 
> Best regards,
> Yoshihiro Shimoda

[...]

MBR, Sergey
diff mbox series

Patch

diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index 0ef0b88b7145..300c1885e1e1 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -1874,6 +1874,9 @@  static void ravb_tx_timeout_work(struct work_struct *work)
 	struct net_device *ndev = priv->ndev;
 	int error;
 
+	if (!rtnl_trylock())
+		return;
+
 	netif_tx_stop_all_queues(ndev);
 
 	/* Stop PTP Clock driver */
@@ -1907,7 +1910,7 @@  static void ravb_tx_timeout_work(struct work_struct *work)
 		 */
 		netdev_err(ndev, "%s: ravb_dmac_init() failed, error %d\n",
 			   __func__, error);
-		return;
+		goto out_unlock;
 	}
 	ravb_emac_init(ndev);
 
@@ -1917,6 +1920,9 @@  static void ravb_tx_timeout_work(struct work_struct *work)
 		ravb_ptp_init(ndev, priv->pdev);
 
 	netif_tx_start_all_queues(ndev);
+
+out_unlock:
+	rtnl_unlock();
 }
 
 /* Packet transmit function for Ethernet AVB */