diff mbox series

[net,v2] net: Handle napi_schedule() calls from non-interrupt

Message ID 20250223221708.27130-1-frederic@kernel.org (mailing list archive)
State Under Review
Delegated to: Netdev Maintainers
Headers show
Series [net,v2] net: Handle napi_schedule() calls from non-interrupt | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present fail Series targets non-next tree, but doesn't contain any Fixes tags
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 1 maintainers not CCed: horms@kernel.org
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 5 this patch: 5
netdev/checkpatch warning WARNING: Possible repeated word: 'add'
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 79 this patch: 79
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2025-02-25--15-00 (tests: 895)

Commit Message

Frederic Weisbecker Feb. 23, 2025, 10:17 p.m. UTC
napi_schedule() is expected to be called either:

* From an interrupt, where raised softirqs are handled on IRQ exit

* From a softirq disabled section, where raised softirqs are handled on
  the next call to local_bh_enable().

* From a softirq handler, where raised softirqs are handled on the next
  round in do_softirq(), or further deferred to a dedicated kthread.

Other bare tasks context may end up ignoring the raised NET_RX vector
until the next random softirq handling opportunity, which may not
happen before a while if the CPU goes idle afterwards with the tick
stopped.

Such "misuses" have been detected on several places thanks to messages
of the kind:

	"NOHZ tick-stop error: local softirq work is pending, handler #08!!!"

For example:

       __raise_softirq_irqoff
        __napi_schedule
        rtl8152_runtime_resume.isra.0
        rtl8152_resume
        usb_resume_interface.isra.0
        usb_resume_both
        __rpm_callback
        rpm_callback
        rpm_resume
        __pm_runtime_resume
        usb_autoresume_device
        usb_remote_wakeup
        hub_event
        process_one_work
        worker_thread
        kthread
        ret_from_fork
        ret_from_fork_asm

And also:

* drivers/net/usb/r8152.c::rtl_work_func_t
* drivers/net/netdevsim/netdev.c::nsim_start_xmit

There is a long history of issues of this kind:

	019edd01d174 ("ath10k: sdio: Add missing BH locking around napi_schdule()")
	330068589389 ("idpf: disable local BH when scheduling napi for marker packets")
	e3d5d70cb483 ("net: lan78xx: fix "softirq work is pending" error")
	e55c27ed9ccf ("mt76: mt7615: add missing bh-disable around rx napi schedule")
	c0182aa98570 ("mt76: mt7915: add missing bh-disable around tx napi enable/schedule")
	970be1dff26d ("mt76: disable BH around napi_schedule() calls")
	019edd01d174 ("ath10k: sdio: Add missing BH locking around napi_schdule()")
	30bfec4fec59 ("can: rx-offload: can_rx_offload_threaded_irq_finish(): add new  function to be called from threaded interrupt")
	e63052a5dd3c ("mlx5e: add add missing BH locking around napi_schdule()")
	83a0c6e58901 ("i40e: Invoke softirqs after napi_reschedule")
	bd4ce941c8d5 ("mlx4: Invoke softirqs after napi_reschedule")
	8cf699ec849f ("mlx4: do not call napi_schedule() without care")
	ec13ee80145c ("virtio_net: invoke softirqs after __napi_schedule")

This shows that relying on the caller to arrange a proper context for
the softirqs to be handled while calling napi_schedule() is very fragile
and error prone. Also fixing them can also prove challenging if the
caller may be called from different kinds of contexts.

Therefore fix this from napi_schedule() itself with waking up ksoftirqd
when softirqs are raised from task contexts.

Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reported-by: Jakub Kicinski <kuba@kernel.org>
Reported-by: Francois Romieu <romieu@fr.zoreil.com>
Closes: https://lore.kernel.org/lkml/354a2690-9bbf-4ccb-8769-fa94707a9340@molgen.mpg.de/
Cc: Breno Leitao <leitao@debian.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 net/core/dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Eric Dumazet Feb. 26, 2025, 10:31 a.m. UTC | #1
On Sun, Feb 23, 2025 at 11:17 PM Frederic Weisbecker
<frederic@kernel.org> wrote:
>
> napi_schedule() is expected to be called either:
>
> * From an interrupt, where raised softirqs are handled on IRQ exit
>
> * From a softirq disabled section, where raised softirqs are handled on
>   the next call to local_bh_enable().
>
> * From a softirq handler, where raised softirqs are handled on the next
>   round in do_softirq(), or further deferred to a dedicated kthread.
>
> Other bare tasks context may end up ignoring the raised NET_RX vector
> until the next random softirq handling opportunity, which may not
> happen before a while if the CPU goes idle afterwards with the tick
> stopped.
>
> Such "misuses" have been detected on several places thanks to messages
> of the kind:
>
>         "NOHZ tick-stop error: local softirq work is pending, handler #08!!!"
>
> For example:
>
>        __raise_softirq_irqoff
>         __napi_schedule
>         rtl8152_runtime_resume.isra.0
>         rtl8152_resume
>         usb_resume_interface.isra.0
>         usb_resume_both
>         __rpm_callback
>         rpm_callback
>         rpm_resume
>         __pm_runtime_resume
>         usb_autoresume_device
>         usb_remote_wakeup
>         hub_event
>         process_one_work
>         worker_thread
>         kthread
>         ret_from_fork
>         ret_from_fork_asm
>
> And also:
>
> * drivers/net/usb/r8152.c::rtl_work_func_t
> * drivers/net/netdevsim/netdev.c::nsim_start_xmit
>
> There is a long history of issues of this kind:
>
>         019edd01d174 ("ath10k: sdio: Add missing BH locking around napi_schdule()")
>         330068589389 ("idpf: disable local BH when scheduling napi for marker packets")
>         e3d5d70cb483 ("net: lan78xx: fix "softirq work is pending" error")
>         e55c27ed9ccf ("mt76: mt7615: add missing bh-disable around rx napi schedule")
>         c0182aa98570 ("mt76: mt7915: add missing bh-disable around tx napi enable/schedule")
>         970be1dff26d ("mt76: disable BH around napi_schedule() calls")
>         019edd01d174 ("ath10k: sdio: Add missing BH locking around napi_schdule()")
>         30bfec4fec59 ("can: rx-offload: can_rx_offload_threaded_irq_finish(): add new  function to be called from threaded interrupt")
>         e63052a5dd3c ("mlx5e: add add missing BH locking around napi_schdule()")
>         83a0c6e58901 ("i40e: Invoke softirqs after napi_reschedule")
>         bd4ce941c8d5 ("mlx4: Invoke softirqs after napi_reschedule")
>         8cf699ec849f ("mlx4: do not call napi_schedule() without care")
>         ec13ee80145c ("virtio_net: invoke softirqs after __napi_schedule")
>
> This shows that relying on the caller to arrange a proper context for
> the softirqs to be handled while calling napi_schedule() is very fragile
> and error prone. Also fixing them can also prove challenging if the
> caller may be called from different kinds of contexts.
>
> Therefore fix this from napi_schedule() itself with waking up ksoftirqd
> when softirqs are raised from task contexts.
>
> Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
> Reported-by: Jakub Kicinski <kuba@kernel.org>
> Reported-by: Francois Romieu <romieu@fr.zoreil.com>
> Closes: https://lore.kernel.org/lkml/354a2690-9bbf-4ccb-8769-fa94707a9340@molgen.mpg.de/
> Cc: Breno Leitao <leitao@debian.org>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: Francois Romieu <romieu@fr.zoreil.com>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> ---
>  net/core/dev.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 80e415ccf2c8..5c1b93a3f50a 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4693,7 +4693,7 @@ static inline void ____napi_schedule(struct softnet_data *sd,
>          * we have to raise NET_RX_SOFTIRQ.
>          */
>         if (!sd->in_net_rx_action)
> -               __raise_softirq_irqoff(NET_RX_SOFTIRQ);
> +               raise_softirq_irqoff(NET_RX_SOFTIRQ);

Your patch is fine, but would silence performance bugs.

I would probably add something to let network developers be aware of them.

diff --git a/net/core/dev.c b/net/core/dev.c
index 1b252e9459fdbde42f6fb71dc146692c7f7ec17a..ae8882a622943a81ddd8e2d141df685637e334b6
100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4762,8 +4762,10 @@ static inline void ____napi_schedule(struct
softnet_data *sd,
        /* If not called from net_rx_action()
         * we have to raise NET_RX_SOFTIRQ.
         */
-       if (!sd->in_net_rx_action)
-               __raise_softirq_irqoff(NET_RX_SOFTIRQ);
+       if (!sd->in_net_rx_action) {
+               raise_softirq_irqoff(NET_RX_SOFTIRQ);
+               DEBUG_NET_WARN_ON_ONCE(!in_interrupt());
+       }
 }

 #ifdef CONFIG_RPS


Looking at the list of patches, I can see idpf fix was not very good,
I will submit another patch.
Frederic Weisbecker Feb. 26, 2025, 1:21 p.m. UTC | #2
Le Wed, Feb 26, 2025 at 11:31:24AM +0100, Eric Dumazet a écrit :
> On Sun, Feb 23, 2025 at 11:17 PM Frederic Weisbecker
> <frederic@kernel.org> wrote:
> >
> > napi_schedule() is expected to be called either:
> >
> > * From an interrupt, where raised softirqs are handled on IRQ exit
> >
> > * From a softirq disabled section, where raised softirqs are handled on
> >   the next call to local_bh_enable().
> >
> > * From a softirq handler, where raised softirqs are handled on the next
> >   round in do_softirq(), or further deferred to a dedicated kthread.
> >
> > Other bare tasks context may end up ignoring the raised NET_RX vector
> > until the next random softirq handling opportunity, which may not
> > happen before a while if the CPU goes idle afterwards with the tick
> > stopped.
> >
> > Such "misuses" have been detected on several places thanks to messages
> > of the kind:
> >
> >         "NOHZ tick-stop error: local softirq work is pending, handler #08!!!"
> >
> > For example:
> >
> >        __raise_softirq_irqoff
> >         __napi_schedule
> >         rtl8152_runtime_resume.isra.0
> >         rtl8152_resume
> >         usb_resume_interface.isra.0
> >         usb_resume_both
> >         __rpm_callback
> >         rpm_callback
> >         rpm_resume
> >         __pm_runtime_resume
> >         usb_autoresume_device
> >         usb_remote_wakeup
> >         hub_event
> >         process_one_work
> >         worker_thread
> >         kthread
> >         ret_from_fork
> >         ret_from_fork_asm
> >
> > And also:
> >
> > * drivers/net/usb/r8152.c::rtl_work_func_t
> > * drivers/net/netdevsim/netdev.c::nsim_start_xmit
> >
> > There is a long history of issues of this kind:
> >
> >         019edd01d174 ("ath10k: sdio: Add missing BH locking around napi_schdule()")
> >         330068589389 ("idpf: disable local BH when scheduling napi for marker packets")
> >         e3d5d70cb483 ("net: lan78xx: fix "softirq work is pending" error")
> >         e55c27ed9ccf ("mt76: mt7615: add missing bh-disable around rx napi schedule")
> >         c0182aa98570 ("mt76: mt7915: add missing bh-disable around tx napi enable/schedule")
> >         970be1dff26d ("mt76: disable BH around napi_schedule() calls")
> >         019edd01d174 ("ath10k: sdio: Add missing BH locking around napi_schdule()")
> >         30bfec4fec59 ("can: rx-offload: can_rx_offload_threaded_irq_finish(): add new  function to be called from threaded interrupt")
> >         e63052a5dd3c ("mlx5e: add add missing BH locking around napi_schdule()")
> >         83a0c6e58901 ("i40e: Invoke softirqs after napi_reschedule")
> >         bd4ce941c8d5 ("mlx4: Invoke softirqs after napi_reschedule")
> >         8cf699ec849f ("mlx4: do not call napi_schedule() without care")
> >         ec13ee80145c ("virtio_net: invoke softirqs after __napi_schedule")
> >
> > This shows that relying on the caller to arrange a proper context for
> > the softirqs to be handled while calling napi_schedule() is very fragile
> > and error prone. Also fixing them can also prove challenging if the
> > caller may be called from different kinds of contexts.
> >
> > Therefore fix this from napi_schedule() itself with waking up ksoftirqd
> > when softirqs are raised from task contexts.
> >
> > Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
> > Reported-by: Jakub Kicinski <kuba@kernel.org>
> > Reported-by: Francois Romieu <romieu@fr.zoreil.com>
> > Closes: https://lore.kernel.org/lkml/354a2690-9bbf-4ccb-8769-fa94707a9340@molgen.mpg.de/
> > Cc: Breno Leitao <leitao@debian.org>
> > Cc: Jakub Kicinski <kuba@kernel.org>
> > Cc: "David S. Miller" <davem@davemloft.net>
> > Cc: Eric Dumazet <edumazet@google.com>
> > Cc: Paolo Abeni <pabeni@redhat.com>
> > Cc: Francois Romieu <romieu@fr.zoreil.com>
> > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > ---
> >  net/core/dev.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 80e415ccf2c8..5c1b93a3f50a 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -4693,7 +4693,7 @@ static inline void ____napi_schedule(struct softnet_data *sd,
> >          * we have to raise NET_RX_SOFTIRQ.
> >          */
> >         if (!sd->in_net_rx_action)
> > -               __raise_softirq_irqoff(NET_RX_SOFTIRQ);
> > +               raise_softirq_irqoff(NET_RX_SOFTIRQ);
> 
> Your patch is fine, but would silence performance bugs.
> 
> I would probably add something to let network developers be aware of them.
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 1b252e9459fdbde42f6fb71dc146692c7f7ec17a..ae8882a622943a81ddd8e2d141df685637e334b6
> 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4762,8 +4762,10 @@ static inline void ____napi_schedule(struct
> softnet_data *sd,
>         /* If not called from net_rx_action()
>          * we have to raise NET_RX_SOFTIRQ.
>          */
> -       if (!sd->in_net_rx_action)
> -               __raise_softirq_irqoff(NET_RX_SOFTIRQ);
> +       if (!sd->in_net_rx_action) {
> +               raise_softirq_irqoff(NET_RX_SOFTIRQ);
> +               DEBUG_NET_WARN_ON_ONCE(!in_interrupt());
> +       }

That looks good and looks like what I did initially:

https://lore.kernel.org/lkml/20250212174329.53793-2-frederic@kernel.org/

Do you prefer me doing it over DEBUG_NET_WARN_ON_ONCE() or with lockdep
like in the link?

Thanks.

>  }
> 
>  #ifdef CONFIG_RPS
> 
> 
> Looking at the list of patches, I can see idpf fix was not very good,
> I will submit another patch.
Eric Dumazet Feb. 26, 2025, 1:34 p.m. UTC | #3
On Wed, Feb 26, 2025 at 2:21 PM Frederic Weisbecker <frederic@kernel.org> wrote:
>

> That looks good and looks like what I did initially:
>
> https://lore.kernel.org/lkml/20250212174329.53793-2-frederic@kernel.org/
>
> Do you prefer me doing it over DEBUG_NET_WARN_ON_ONCE() or with lockdep
> like in the link?

To be clear, I have not tried this thing yet.

Perhaps let your patch as is (for stable backports), and put the debug
stuff only after some tests, in net-next.

It is very possible that napi_schedule() in the problematic cases were
not on a fast path anyway.

Reviewed-by: Eric Dumazet <edumazet@google.com>
Frederic Weisbecker Feb. 26, 2025, 1:38 p.m. UTC | #4
Le Wed, Feb 26, 2025 at 02:34:39PM +0100, Eric Dumazet a écrit :
> On Wed, Feb 26, 2025 at 2:21 PM Frederic Weisbecker <frederic@kernel.org> wrote:
> >
> 
> > That looks good and looks like what I did initially:
> >
> > https://lore.kernel.org/lkml/20250212174329.53793-2-frederic@kernel.org/
> >
> > Do you prefer me doing it over DEBUG_NET_WARN_ON_ONCE() or with lockdep
> > like in the link?
> 
> To be clear, I have not tried this thing yet.
> 
> Perhaps let your patch as is (for stable backports), and put the debug
> stuff only after some tests, in net-next.

Ok.

> 
> It is very possible that napi_schedule() in the problematic cases were
> not on a fast path anyway.

That was my assumption but I must confess I don't know well this realm.

> 
> Reviewed-by: Eric Dumazet <edumazet@google.com>

Thanks!
diff mbox series

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index 80e415ccf2c8..5c1b93a3f50a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4693,7 +4693,7 @@  static inline void ____napi_schedule(struct softnet_data *sd,
 	 * we have to raise NET_RX_SOFTIRQ.
 	 */
 	if (!sd->in_net_rx_action)
-		__raise_softirq_irqoff(NET_RX_SOFTIRQ);
+		raise_softirq_irqoff(NET_RX_SOFTIRQ);
 }
 
 #ifdef CONFIG_RPS