Message ID | 20250223221708.27130-1-frederic@kernel.org (mailing list archive) |
---|---|
State | Under Review |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net,v2] net: Handle napi_schedule() calls from non-interrupt | expand |
On Sun, Feb 23, 2025 at 11:17 PM Frederic Weisbecker <frederic@kernel.org> wrote: > > napi_schedule() is expected to be called either: > > * From an interrupt, where raised softirqs are handled on IRQ exit > > * From a softirq disabled section, where raised softirqs are handled on > the next call to local_bh_enable(). > > * From a softirq handler, where raised softirqs are handled on the next > round in do_softirq(), or further deferred to a dedicated kthread. > > Other bare tasks context may end up ignoring the raised NET_RX vector > until the next random softirq handling opportunity, which may not > happen before a while if the CPU goes idle afterwards with the tick > stopped. > > Such "misuses" have been detected on several places thanks to messages > of the kind: > > "NOHZ tick-stop error: local softirq work is pending, handler #08!!!" > > For example: > > __raise_softirq_irqoff > __napi_schedule > rtl8152_runtime_resume.isra.0 > rtl8152_resume > usb_resume_interface.isra.0 > usb_resume_both > __rpm_callback > rpm_callback > rpm_resume > __pm_runtime_resume > usb_autoresume_device > usb_remote_wakeup > hub_event > process_one_work > worker_thread > kthread > ret_from_fork > ret_from_fork_asm > > And also: > > * drivers/net/usb/r8152.c::rtl_work_func_t > * drivers/net/netdevsim/netdev.c::nsim_start_xmit > > There is a long history of issues of this kind: > > 019edd01d174 ("ath10k: sdio: Add missing BH locking around napi_schdule()") > 330068589389 ("idpf: disable local BH when scheduling napi for marker packets") > e3d5d70cb483 ("net: lan78xx: fix "softirq work is pending" error") > e55c27ed9ccf ("mt76: mt7615: add missing bh-disable around rx napi schedule") > c0182aa98570 ("mt76: mt7915: add missing bh-disable around tx napi enable/schedule") > 970be1dff26d ("mt76: disable BH around napi_schedule() calls") > 019edd01d174 ("ath10k: sdio: Add missing BH locking around napi_schdule()") > 30bfec4fec59 ("can: rx-offload: can_rx_offload_threaded_irq_finish(): add new function to be called from threaded interrupt") > e63052a5dd3c ("mlx5e: add add missing BH locking around napi_schdule()") > 83a0c6e58901 ("i40e: Invoke softirqs after napi_reschedule") > bd4ce941c8d5 ("mlx4: Invoke softirqs after napi_reschedule") > 8cf699ec849f ("mlx4: do not call napi_schedule() without care") > ec13ee80145c ("virtio_net: invoke softirqs after __napi_schedule") > > This shows that relying on the caller to arrange a proper context for > the softirqs to be handled while calling napi_schedule() is very fragile > and error prone. Also fixing them can also prove challenging if the > caller may be called from different kinds of contexts. > > Therefore fix this from napi_schedule() itself with waking up ksoftirqd > when softirqs are raised from task contexts. > > Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> > Reported-by: Jakub Kicinski <kuba@kernel.org> > Reported-by: Francois Romieu <romieu@fr.zoreil.com> > Closes: https://lore.kernel.org/lkml/354a2690-9bbf-4ccb-8769-fa94707a9340@molgen.mpg.de/ > Cc: Breno Leitao <leitao@debian.org> > Cc: Jakub Kicinski <kuba@kernel.org> > Cc: "David S. Miller" <davem@davemloft.net> > Cc: Eric Dumazet <edumazet@google.com> > Cc: Paolo Abeni <pabeni@redhat.com> > Cc: Francois Romieu <romieu@fr.zoreil.com> > Signed-off-by: Frederic Weisbecker <frederic@kernel.org> > --- > net/core/dev.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/core/dev.c b/net/core/dev.c > index 80e415ccf2c8..5c1b93a3f50a 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -4693,7 +4693,7 @@ static inline void ____napi_schedule(struct softnet_data *sd, > * we have to raise NET_RX_SOFTIRQ. > */ > if (!sd->in_net_rx_action) > - __raise_softirq_irqoff(NET_RX_SOFTIRQ); > + raise_softirq_irqoff(NET_RX_SOFTIRQ); Your patch is fine, but would silence performance bugs. I would probably add something to let network developers be aware of them. diff --git a/net/core/dev.c b/net/core/dev.c index 1b252e9459fdbde42f6fb71dc146692c7f7ec17a..ae8882a622943a81ddd8e2d141df685637e334b6 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4762,8 +4762,10 @@ static inline void ____napi_schedule(struct softnet_data *sd, /* If not called from net_rx_action() * we have to raise NET_RX_SOFTIRQ. */ - if (!sd->in_net_rx_action) - __raise_softirq_irqoff(NET_RX_SOFTIRQ); + if (!sd->in_net_rx_action) { + raise_softirq_irqoff(NET_RX_SOFTIRQ); + DEBUG_NET_WARN_ON_ONCE(!in_interrupt()); + } } #ifdef CONFIG_RPS Looking at the list of patches, I can see idpf fix was not very good, I will submit another patch.
Le Wed, Feb 26, 2025 at 11:31:24AM +0100, Eric Dumazet a écrit : > On Sun, Feb 23, 2025 at 11:17 PM Frederic Weisbecker > <frederic@kernel.org> wrote: > > > > napi_schedule() is expected to be called either: > > > > * From an interrupt, where raised softirqs are handled on IRQ exit > > > > * From a softirq disabled section, where raised softirqs are handled on > > the next call to local_bh_enable(). > > > > * From a softirq handler, where raised softirqs are handled on the next > > round in do_softirq(), or further deferred to a dedicated kthread. > > > > Other bare tasks context may end up ignoring the raised NET_RX vector > > until the next random softirq handling opportunity, which may not > > happen before a while if the CPU goes idle afterwards with the tick > > stopped. > > > > Such "misuses" have been detected on several places thanks to messages > > of the kind: > > > > "NOHZ tick-stop error: local softirq work is pending, handler #08!!!" > > > > For example: > > > > __raise_softirq_irqoff > > __napi_schedule > > rtl8152_runtime_resume.isra.0 > > rtl8152_resume > > usb_resume_interface.isra.0 > > usb_resume_both > > __rpm_callback > > rpm_callback > > rpm_resume > > __pm_runtime_resume > > usb_autoresume_device > > usb_remote_wakeup > > hub_event > > process_one_work > > worker_thread > > kthread > > ret_from_fork > > ret_from_fork_asm > > > > And also: > > > > * drivers/net/usb/r8152.c::rtl_work_func_t > > * drivers/net/netdevsim/netdev.c::nsim_start_xmit > > > > There is a long history of issues of this kind: > > > > 019edd01d174 ("ath10k: sdio: Add missing BH locking around napi_schdule()") > > 330068589389 ("idpf: disable local BH when scheduling napi for marker packets") > > e3d5d70cb483 ("net: lan78xx: fix "softirq work is pending" error") > > e55c27ed9ccf ("mt76: mt7615: add missing bh-disable around rx napi schedule") > > c0182aa98570 ("mt76: mt7915: add missing bh-disable around tx napi enable/schedule") > > 970be1dff26d ("mt76: disable BH around napi_schedule() calls") > > 019edd01d174 ("ath10k: sdio: Add missing BH locking around napi_schdule()") > > 30bfec4fec59 ("can: rx-offload: can_rx_offload_threaded_irq_finish(): add new function to be called from threaded interrupt") > > e63052a5dd3c ("mlx5e: add add missing BH locking around napi_schdule()") > > 83a0c6e58901 ("i40e: Invoke softirqs after napi_reschedule") > > bd4ce941c8d5 ("mlx4: Invoke softirqs after napi_reschedule") > > 8cf699ec849f ("mlx4: do not call napi_schedule() without care") > > ec13ee80145c ("virtio_net: invoke softirqs after __napi_schedule") > > > > This shows that relying on the caller to arrange a proper context for > > the softirqs to be handled while calling napi_schedule() is very fragile > > and error prone. Also fixing them can also prove challenging if the > > caller may be called from different kinds of contexts. > > > > Therefore fix this from napi_schedule() itself with waking up ksoftirqd > > when softirqs are raised from task contexts. > > > > Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> > > Reported-by: Jakub Kicinski <kuba@kernel.org> > > Reported-by: Francois Romieu <romieu@fr.zoreil.com> > > Closes: https://lore.kernel.org/lkml/354a2690-9bbf-4ccb-8769-fa94707a9340@molgen.mpg.de/ > > Cc: Breno Leitao <leitao@debian.org> > > Cc: Jakub Kicinski <kuba@kernel.org> > > Cc: "David S. Miller" <davem@davemloft.net> > > Cc: Eric Dumazet <edumazet@google.com> > > Cc: Paolo Abeni <pabeni@redhat.com> > > Cc: Francois Romieu <romieu@fr.zoreil.com> > > Signed-off-by: Frederic Weisbecker <frederic@kernel.org> > > --- > > net/core/dev.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/net/core/dev.c b/net/core/dev.c > > index 80e415ccf2c8..5c1b93a3f50a 100644 > > --- a/net/core/dev.c > > +++ b/net/core/dev.c > > @@ -4693,7 +4693,7 @@ static inline void ____napi_schedule(struct softnet_data *sd, > > * we have to raise NET_RX_SOFTIRQ. > > */ > > if (!sd->in_net_rx_action) > > - __raise_softirq_irqoff(NET_RX_SOFTIRQ); > > + raise_softirq_irqoff(NET_RX_SOFTIRQ); > > Your patch is fine, but would silence performance bugs. > > I would probably add something to let network developers be aware of them. > > diff --git a/net/core/dev.c b/net/core/dev.c > index 1b252e9459fdbde42f6fb71dc146692c7f7ec17a..ae8882a622943a81ddd8e2d141df685637e334b6 > 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -4762,8 +4762,10 @@ static inline void ____napi_schedule(struct > softnet_data *sd, > /* If not called from net_rx_action() > * we have to raise NET_RX_SOFTIRQ. > */ > - if (!sd->in_net_rx_action) > - __raise_softirq_irqoff(NET_RX_SOFTIRQ); > + if (!sd->in_net_rx_action) { > + raise_softirq_irqoff(NET_RX_SOFTIRQ); > + DEBUG_NET_WARN_ON_ONCE(!in_interrupt()); > + } That looks good and looks like what I did initially: https://lore.kernel.org/lkml/20250212174329.53793-2-frederic@kernel.org/ Do you prefer me doing it over DEBUG_NET_WARN_ON_ONCE() or with lockdep like in the link? Thanks. > } > > #ifdef CONFIG_RPS > > > Looking at the list of patches, I can see idpf fix was not very good, > I will submit another patch.
On Wed, Feb 26, 2025 at 2:21 PM Frederic Weisbecker <frederic@kernel.org> wrote: > > That looks good and looks like what I did initially: > > https://lore.kernel.org/lkml/20250212174329.53793-2-frederic@kernel.org/ > > Do you prefer me doing it over DEBUG_NET_WARN_ON_ONCE() or with lockdep > like in the link? To be clear, I have not tried this thing yet. Perhaps let your patch as is (for stable backports), and put the debug stuff only after some tests, in net-next. It is very possible that napi_schedule() in the problematic cases were not on a fast path anyway. Reviewed-by: Eric Dumazet <edumazet@google.com>
Le Wed, Feb 26, 2025 at 02:34:39PM +0100, Eric Dumazet a écrit : > On Wed, Feb 26, 2025 at 2:21 PM Frederic Weisbecker <frederic@kernel.org> wrote: > > > > > That looks good and looks like what I did initially: > > > > https://lore.kernel.org/lkml/20250212174329.53793-2-frederic@kernel.org/ > > > > Do you prefer me doing it over DEBUG_NET_WARN_ON_ONCE() or with lockdep > > like in the link? > > To be clear, I have not tried this thing yet. > > Perhaps let your patch as is (for stable backports), and put the debug > stuff only after some tests, in net-next. Ok. > > It is very possible that napi_schedule() in the problematic cases were > not on a fast path anyway. That was my assumption but I must confess I don't know well this realm. > > Reviewed-by: Eric Dumazet <edumazet@google.com> Thanks!
diff --git a/net/core/dev.c b/net/core/dev.c index 80e415ccf2c8..5c1b93a3f50a 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4693,7 +4693,7 @@ static inline void ____napi_schedule(struct softnet_data *sd, * we have to raise NET_RX_SOFTIRQ. */ if (!sd->in_net_rx_action) - __raise_softirq_irqoff(NET_RX_SOFTIRQ); + raise_softirq_irqoff(NET_RX_SOFTIRQ); } #ifdef CONFIG_RPS
napi_schedule() is expected to be called either: * From an interrupt, where raised softirqs are handled on IRQ exit * From a softirq disabled section, where raised softirqs are handled on the next call to local_bh_enable(). * From a softirq handler, where raised softirqs are handled on the next round in do_softirq(), or further deferred to a dedicated kthread. Other bare tasks context may end up ignoring the raised NET_RX vector until the next random softirq handling opportunity, which may not happen before a while if the CPU goes idle afterwards with the tick stopped. Such "misuses" have been detected on several places thanks to messages of the kind: "NOHZ tick-stop error: local softirq work is pending, handler #08!!!" For example: __raise_softirq_irqoff __napi_schedule rtl8152_runtime_resume.isra.0 rtl8152_resume usb_resume_interface.isra.0 usb_resume_both __rpm_callback rpm_callback rpm_resume __pm_runtime_resume usb_autoresume_device usb_remote_wakeup hub_event process_one_work worker_thread kthread ret_from_fork ret_from_fork_asm And also: * drivers/net/usb/r8152.c::rtl_work_func_t * drivers/net/netdevsim/netdev.c::nsim_start_xmit There is a long history of issues of this kind: 019edd01d174 ("ath10k: sdio: Add missing BH locking around napi_schdule()") 330068589389 ("idpf: disable local BH when scheduling napi for marker packets") e3d5d70cb483 ("net: lan78xx: fix "softirq work is pending" error") e55c27ed9ccf ("mt76: mt7615: add missing bh-disable around rx napi schedule") c0182aa98570 ("mt76: mt7915: add missing bh-disable around tx napi enable/schedule") 970be1dff26d ("mt76: disable BH around napi_schedule() calls") 019edd01d174 ("ath10k: sdio: Add missing BH locking around napi_schdule()") 30bfec4fec59 ("can: rx-offload: can_rx_offload_threaded_irq_finish(): add new function to be called from threaded interrupt") e63052a5dd3c ("mlx5e: add add missing BH locking around napi_schdule()") 83a0c6e58901 ("i40e: Invoke softirqs after napi_reschedule") bd4ce941c8d5 ("mlx4: Invoke softirqs after napi_reschedule") 8cf699ec849f ("mlx4: do not call napi_schedule() without care") ec13ee80145c ("virtio_net: invoke softirqs after __napi_schedule") This shows that relying on the caller to arrange a proper context for the softirqs to be handled while calling napi_schedule() is very fragile and error prone. Also fixing them can also prove challenging if the caller may be called from different kinds of contexts. Therefore fix this from napi_schedule() itself with waking up ksoftirqd when softirqs are raised from task contexts. Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Reported-by: Jakub Kicinski <kuba@kernel.org> Reported-by: Francois Romieu <romieu@fr.zoreil.com> Closes: https://lore.kernel.org/lkml/354a2690-9bbf-4ccb-8769-fa94707a9340@molgen.mpg.de/ Cc: Breno Leitao <leitao@debian.org> Cc: Jakub Kicinski <kuba@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Francois Romieu <romieu@fr.zoreil.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> --- net/core/dev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)