
[2/9] softirq: Use sched_clock() based timeout

Message ID 20230505113315.3307723-3-liujian56@huawei.com (mailing list archive)
State New, archived
Series fix softlockup in run_timer_softirq

Commit Message

Liu Jian May 5, 2023, 11:33 a.m. UTC
From: Peter Zijlstra <peterz@infradead.org>

Replace the jiffies based timeout with a sched_clock() based one.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Liu Jian <liujian56@huawei.com>
---
 kernel/softirq.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)
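
(For illustration only, not part of the patch: a minimal user-space
sketch of the delta-comparison idiom the new check relies on. The
names now_ns() and budget_left() are made up for this example, and
CLOCK_MONOTONIC merely stands in for sched_clock().)

#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define MAX_SOFTIRQ_TIME_NS     (2ULL * 1000 * 1000)    /* 2ms, as in the patch */

/* Stand-in for sched_clock(): a monotonic nanosecond counter. */
static uint64_t now_ns(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

/*
 * Written as "now - start < limit" rather than "now < start + limit":
 * the unsigned subtraction still yields the elapsed time if the
 * counter ever wraps, mirroring what time_before() provided for the
 * jiffies based version.
 */
static bool budget_left(uint64_t start)
{
        return now_ns() - start < MAX_SOFTIRQ_TIME_NS;
}

int main(void)
{
        uint64_t start = now_ns();

        while (budget_left(start))
                ;       /* spin for ~2ms, then fall through */
        return 0;
}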

Comments

Jason Xing May 8, 2023, 4:08 a.m. UTC | #1
On Fri, May 5, 2023 at 7:25 PM Liu Jian <liujian56@huawei.com> wrote:
>
> From: Peter Zijlstra <peterz@infradead.org>
>
> Replace the jiffies based timeout with a sched_clock() based one.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Liu Jian <liujian56@huawei.com>
> ---
>  kernel/softirq.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index bff5debf6ce6..59f16a9af5d1 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -27,6 +27,7 @@
>  #include <linux/tick.h>
>  #include <linux/irq.h>
>  #include <linux/wait_bit.h>
> +#include <linux/sched/clock.h>
>
>  #include <asm/softirq_stack.h>
>
> @@ -489,7 +490,7 @@ asmlinkage __visible void do_softirq(void)
>   * we want to handle softirqs as soon as possible, but they
>   * should not be able to lock up the box.
>   */
> -#define MAX_SOFTIRQ_TIME  msecs_to_jiffies(2)
> +#define MAX_SOFTIRQ_TIME       (2 * NSEC_PER_MSEC)

I wonder if it affects servers that set HZ to a value other than the
default of 1000.

Thanks,
Jason

>  #define MAX_SOFTIRQ_RESTART 10
>
>  #ifdef CONFIG_TRACE_IRQFLAGS
> @@ -527,9 +528,9 @@ static inline void lockdep_softirq_end(bool in_hardirq) { }
>
>  asmlinkage __visible void __softirq_entry __do_softirq(void)
>  {
> -       unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
>         unsigned long old_flags = current->flags;
>         int max_restart = MAX_SOFTIRQ_RESTART;
> +       u64 start = sched_clock();
>         struct softirq_action *h;
>         unsigned long pending;
>         unsigned int vec_nr;
> @@ -584,7 +585,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
>
>         pending = local_softirq_pending();
>         if (pending) {
> -               if (time_before(jiffies, end) && !need_resched() &&
> +               if (sched_clock() - start < MAX_SOFTIRQ_TIME && !need_resched() &&
>                     --max_restart)
>                         goto restart;
>
> --
> 2.34.1
>
>
Thomas Gleixner May 8, 2023, 5:51 p.m. UTC | #2
On Mon, May 08 2023 at 12:08, Jason Xing wrote:
> On Fri, May 5, 2023 at 7:25 PM Liu Jian <liujian56@huawei.com> wrote:
>> @@ -489,7 +490,7 @@ asmlinkage __visible void do_softirq(void)
>>   * we want to handle softirqs as soon as possible, but they
>>   * should not be able to lock up the box.
>>   */
>> -#define MAX_SOFTIRQ_TIME  msecs_to_jiffies(2)
>> +#define MAX_SOFTIRQ_TIME       (2 * NSEC_PER_MSEC)
>
> I wonder if it affects servers that set HZ to a value other than the
> default of 1000.

The result of msecs_to_jiffies(2) for different HZ values:

HZ=100     1
HZ=250     1
HZ=1000    2

So depending on when the softirq processing starts, this gives the
following ranges in which the timeout ends:

HZ=100    0 - 10ms
HZ=250    0 -  4ms
HZ=1000   1 -  2ms
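
The same numbers fall out of a small user-space sketch (a rough
approximation of the kernel's round-up behaviour, not its actual
implementation; msecs_to_jiffies_approx() is a made-up name):

#include <stdio.h>

/* Approximate msecs_to_jiffies(): round the delay up to whole ticks. */
static unsigned long msecs_to_jiffies_approx(unsigned int ms, unsigned int hz)
{
        return (ms * hz + 999) / 1000;
}

int main(void)
{
        unsigned int hz[] = { 100, 250, 1000 };

        for (int i = 0; i < 3; i++) {
                unsigned long j = msecs_to_jiffies_approx(2, hz[i]);
                unsigned int tick_ms = 1000 / hz[i];

                /*
                 * "jiffies + j" expires between (j - 1) and j full
                 * ticks from now, depending on where in the current
                 * tick the softirq processing starts.
                 */
                printf("HZ=%-4u jiffies=%lu window=%lu..%lu ms\n",
                       hz[i], j, (j - 1) * tick_ms, j * tick_ms);
        }
        return 0;
}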

But as the various softirq handlers have their own notion of timeouts,
loop limits, etc., and the timeout is only checked after _all_ pending
bits of each iteration have been processed, the outcome of this is a
lottery.
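
A toy model of the loop structure makes that visible (hypothetical
helpers, not the kernel's code): every pending handler runs to
completion before the budget is consulted, so a single pass can
overshoot the 2ms by whatever the handlers decide to consume.

#include <stdint.h>

#define BUDGET_NS       (2ULL * 1000 * 1000)    /* the patch's 2ms */
#define MAX_RESTART     10

/* Hypothetical stand-ins for sched_clock() and the pending-bit machinery. */
static uint64_t fake_now;
static unsigned long fake_pending = 0x3;

static uint64_t clock_ns(void)          { return fake_now; }
static void run_handler(unsigned nr)    { (void)nr; fake_now += 500000; }

static void do_softirq_model(void)
{
        uint64_t start = clock_ns();
        int max_restart = MAX_RESTART;
        unsigned long pending;

restart:
        pending = fake_pending;
        fake_pending = 0;

        /* Every set bit is handled; the budget is NOT consulted in here. */
        while (pending) {
                unsigned nr = __builtin_ctzl(pending);

                run_handler(nr);        /* may run arbitrarily long */
                pending &= ~(1UL << nr);
        }

        /*
         * Only after a full pass are the budget and the restart count
         * checked (the real code also tests need_resched() here).
         */
        if (fake_pending && clock_ns() - start < BUDGET_NS && --max_restart)
                goto restart;
}

int main(void)
{
        do_softirq_model();
        return 0;
}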

Due to that, the sched_clock() change per se won't have much impact,
but once the overall changes to consolidate the break conditions are in
place, I think it will have observable effects.

Making this consistent is definitely a good thing, but it won't solve
the underlying problem of soft interrupt processing at all.

We definitely need to spend more thought on pulling things out of soft
interrupt context so that these functionalities come under proper
resource control by the scheduler.

Thanks,

        tglx

Patch

diff --git a/kernel/softirq.c b/kernel/softirq.c
index bff5debf6ce6..59f16a9af5d1 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -27,6 +27,7 @@ 
 #include <linux/tick.h>
 #include <linux/irq.h>
 #include <linux/wait_bit.h>
+#include <linux/sched/clock.h>
 
 #include <asm/softirq_stack.h>
 
@@ -489,7 +490,7 @@  asmlinkage __visible void do_softirq(void)
  * we want to handle softirqs as soon as possible, but they
  * should not be able to lock up the box.
  */
-#define MAX_SOFTIRQ_TIME  msecs_to_jiffies(2)
+#define MAX_SOFTIRQ_TIME	(2 * NSEC_PER_MSEC)
 #define MAX_SOFTIRQ_RESTART 10
 
 #ifdef CONFIG_TRACE_IRQFLAGS
@@ -527,9 +528,9 @@  static inline void lockdep_softirq_end(bool in_hardirq) { }
 
 asmlinkage __visible void __softirq_entry __do_softirq(void)
 {
-	unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
 	unsigned long old_flags = current->flags;
 	int max_restart = MAX_SOFTIRQ_RESTART;
+	u64 start = sched_clock();
 	struct softirq_action *h;
 	unsigned long pending;
 	unsigned int vec_nr;
@@ -584,7 +585,7 @@  asmlinkage __visible void __softirq_entry __do_softirq(void)
 
 	pending = local_softirq_pending();
 	if (pending) {
-		if (time_before(jiffies, end) && !need_resched() &&
+		if (sched_clock() - start < MAX_SOFTIRQ_TIME && !need_resched() &&
 		    --max_restart)
 			goto restart;