Message ID | 57e66b364f1b6f09c9bc0316742c3b14f4ce83bd.1683526542.git.pabeni@redhat.com (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | revert: "softirq: Let ksoftirqd do its job" | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Not a local patch |
Paolo! On Mon, May 08 2023 at 08:17, Paolo Abeni wrote: > Due to the mentioned commit, when the ksoftirqd processes take charge > of softirq processing, the system can experience high latencies. > > In the past a few workarounds have been implemented for specific > side-effects of the above: > > commit 1ff688209e2e ("watchdog: core: make sure the watchdog_worker is not deferred") > commit 8d5755b3f77b ("watchdog: softdog: fire watchdog even if softirqs do not get to run") > commit 217f69743681 ("net: busy-poll: allow preemption in sk_busy_loop()") > commit 3c53776e29f8 ("Mark HI and TASKLET softirq synchronous") > > but the latency problem still exists in real-life workloads, see the > link below. > > The reverted commit intended to solve a live-lock scenario that can now > be addressed with the NAPI threaded mode, introduced with commit > 29863d41bb6e ("net: implement threaded-able napi poll loop support"), > and nowadays in a pretty stable status. > > While a complete solution to put softirq processing under nice resource > control would be preferable, that has proven to be a very hard task. In > the short term, remove the main pain point, and also simplify a bit the > current softirq implementation. > > Note that this change also reverts commit 3c53776e29f8 ("Mark HI and > TASKLET softirq synchronous") and commit 1342d8080f61 ("softirq: Don't > skip softirq execution when softirq thread is parking"), which are > direct follow-ups of the feature commit. A single change is preferred to > avoid known bad intermediate states introduced by a patch series > reverting them individually. I'm fine with this change, but I definitely want that to be acked/reviewed by the other stakeholders in the networking arena. Thanks, tglx
On Mon, 8 May 2023 08:17:44 +0200 Paolo Abeni wrote: > Due to the mentioned commit, when the ksoftirqd processes take charge > of softirq processing, the system can experience high latencies. > > In the past a few workarounds have been implemented for specific > side-effects of the above: > > commit 1ff688209e2e ("watchdog: core: make sure the watchdog_worker is not deferred") > commit 8d5755b3f77b ("watchdog: softdog: fire watchdog even if softirqs do not get to run") > commit 217f69743681 ("net: busy-poll: allow preemption in sk_busy_loop()") > commit 3c53776e29f8 ("Mark HI and TASKLET softirq synchronous") > > but the latency problem still exists in real-life workloads, see the > link below. > > The reverted commit intended to solve a live-lock scenario that can now > be addressed with the NAPI threaded mode, introduced with commit > 29863d41bb6e ("net: implement threaded-able napi poll loop support"), > and nowadays in a pretty stable status. > > While a complete solution to put softirq processing under nice resource > control would be preferable, that has proven to be a very hard task. In > the short term, remove the main pain point, and also simplify a bit the > current softirq implementation. > > Note that this change also reverts commit 3c53776e29f8 ("Mark HI and > TASKLET softirq synchronous") and commit 1342d8080f61 ("softirq: Don't > skip softirq execution when softirq thread is parking"), which are > direct follow-ups of the feature commit. A single change is preferred to > avoid known bad intermediate states introduced by a patch series > reverting them individually. > > Link: https://lore.kernel.org/netdev/305d7742212cbe98621b16be782b0562f1012cb6.camel@redhat.com/ > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > Tested-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org>
On Tue, May 9, 2023 at 3:42 AM Jakub Kicinski <kuba@kernel.org> wrote: > > On Mon, 8 May 2023 08:17:44 +0200 Paolo Abeni wrote: > > Due to the mentioned commit, when the ksoftirqd processes take charge > > of softirq processing, the system can experience high latencies. > > > > In the past a few workarounds have been implemented for specific > > side-effects of the above: > > > > commit 1ff688209e2e ("watchdog: core: make sure the watchdog_worker is not deferred") > > commit 8d5755b3f77b ("watchdog: softdog: fire watchdog even if softirqs do not get to run") > > commit 217f69743681 ("net: busy-poll: allow preemption in sk_busy_loop()") > > commit 3c53776e29f8 ("Mark HI and TASKLET softirq synchronous") > > > > but the latency problem still exists in real-life workloads, see the > > link below. > > > > The reverted commit intended to solve a live-lock scenario that can now > > be addressed with the NAPI threaded mode, introduced with commit > > 29863d41bb6e ("net: implement threaded-able napi poll loop support"), > > and nowadays in a pretty stable status. > > > > While a complete solution to put softirq processing under nice resource > > control would be preferable, that has proven to be a very hard task. In > > the short term, remove the main pain point, and also simplify a bit the > > current softirq implementation. > > > > Note that this change also reverts commit 3c53776e29f8 ("Mark HI and > > TASKLET softirq synchronous") and commit 1342d8080f61 ("softirq: Don't > > skip softirq execution when softirq thread is parking"), which are > > direct follow-ups of the feature commit. A single change is preferred to > > avoid known bad intermediate states introduced by a patch series > > reverting them individually. > > > > Link: https://lore.kernel.org/netdev/305d7742212cbe98621b16be782b0562f1012cb6.camel@redhat.com/ > > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > > Tested-by: Jason Xing <kerneljasonxing@gmail.com> > > Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Thanks.
On 2023-05-08 08:17:44 [+0200], Paolo Abeni wrote: > Due to the mentioned commit, when the ksoftirqd processes take charge > of softirq processing, the system can experience high latencies. Yes. RT wise I tried a lot to keep ksoftirqd from getting scheduled. With this change, it makes the life a lot easier. Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Sebastian
diff --git a/kernel/softirq.c b/kernel/softirq.c index 1b725510dd0f..807b34ccd797 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -80,21 +80,6 @@ static void wakeup_softirqd(void) wake_up_process(tsk); } -/* - * If ksoftirqd is scheduled, we do not want to process pending softirqs - * right now. Let ksoftirqd handle this at its own rate, to get fairness, - * unless we're doing some of the synchronous softirqs. - */ -#define SOFTIRQ_NOW_MASK ((1 << HI_SOFTIRQ) | (1 << TASKLET_SOFTIRQ)) -static bool ksoftirqd_running(unsigned long pending) -{ - struct task_struct *tsk = __this_cpu_read(ksoftirqd); - - if (pending & SOFTIRQ_NOW_MASK) - return false; - return tsk && task_is_running(tsk) && !__kthread_should_park(tsk); -} - #ifdef CONFIG_TRACE_IRQFLAGS DEFINE_PER_CPU(int, hardirqs_enabled); DEFINE_PER_CPU(int, hardirq_context); @@ -236,7 +221,7 @@ void __local_bh_enable_ip(unsigned long ip, unsigned int cnt) goto out; pending = local_softirq_pending(); - if (!pending || ksoftirqd_running(pending)) + if (!pending) goto out; /* @@ -432,9 +417,6 @@ static inline bool should_wake_ksoftirqd(void) static inline void invoke_softirq(void) { - if (ksoftirqd_running(local_softirq_pending())) - return; - if (!force_irqthreads() || !__this_cpu_read(ksoftirqd)) { #ifdef CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK /* @@ -468,7 +450,7 @@ asmlinkage __visible void do_softirq(void) pending = local_softirq_pending(); - if (pending && !ksoftirqd_running(pending)) + if (pending) do_softirq_own_stack(); local_irq_restore(flags);