
sched: Fix rq->nr_iowait ordering

Message ID 20201117093829.GD3121429@hirez.programming.kicks-ass.net (mailing list archive)
State New, archived
Series sched: Fix rq->nr_iowait ordering

Commit Message

Peter Zijlstra Nov. 17, 2020, 9:38 a.m. UTC
And poking at this reminded me of an older email from TJ that seems to
have stagnated.

---
Subject: sched: Fix rq->nr_iowait ordering
From: Peter Zijlstra <peterz@infradead.org>
Date: Thu, 24 Sep 2020 13:50:42 +0200

  schedule()				ttwu()
    deactivate_task();			  if (p->on_rq && ...) // false
					    atomic_dec(&task_rq(p)->nr_iowait);
    if (prev->in_iowait)
      atomic_inc(&rq->nr_iowait);

Allows nr_iowait to be decremented before it gets incremented,
resulting in more dodgy IO-wait numbers than usual.

Note that because we can now do ttwu_queue_wakelist() before
p->on_cpu==0, we lose the natural ordering and have to further delay
the decrement.
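
For illustration only, here is a minimal userspace sketch of that window
(C11 atomics, invented setup, not kernel code): the decrement lands while
the matching increment is still pending, so anything sampling the counter
in between sees a value that is one too low, or transiently negative when
nothing else is in iowait.

#include <stdatomic.h>
#include <stdio.h>

static atomic_int nr_iowait;	/* stand-in for rq->nr_iowait */

int main(void)
{
	/* Some other task is already sleeping in iowait. */
	atomic_fetch_add(&nr_iowait, 1);

	/* ttwu(): sees p->on_rq == 0 and decrements first ... */
	atomic_fetch_sub(&nr_iowait, 1);

	/* ... a reader samples here: 0 instead of the expected 1 ... */
	printf("sampled nr_iowait = %d\n", atomic_load(&nr_iowait));

	/* ... and only then does schedule() on the other CPU increment
	 * for the task that was racing into iowait. */
	atomic_fetch_add(&nr_iowait, 1);

	return 0;
}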

Fixes: Fixes: c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu")
Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/core.c |   15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

Comments

Mel Gorman Nov. 17, 2020, 11:43 a.m. UTC | #1
On Tue, Nov 17, 2020 at 10:38:29AM +0100, Peter Zijlstra wrote:
> Subject: sched: Fix rq->nr_iowait ordering
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Thu, 24 Sep 2020 13:50:42 +0200
> 
>   schedule()				ttwu()
>     deactivate_task();			  if (p->on_rq && ...) // false
> 					    atomic_dec(&task_rq(p)->nr_iowait);
>     if (prev->in_iowait)
>       atomic_inc(&rq->nr_iowait);
> 
> Allows nr_iowait to be decremented before it gets incremented,
> resulting in more dodgy IO-wait numbers than usual.
> 
> Note that because we can now do ttwu_queue_wakelist() before
> p->on_cpu==0, we lose the natural ordering and have to further delay
> the decrement.
> 
> Fixes: Fixes: c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu")
> Reported-by: Tejun Heo <tj@kernel.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

s/Fixes: Fixes:/Fixes:/

Ok, very minor hazard that the same logic now gets duplicated and someone
might try to "fix" that, but git blame should help. Otherwise it makes sense:
I've received more than one "bug" report complaining that the number was
larger than expected even when no other problem was present, so

Acked-by: Mel Gorman <mgorman@techsingularity.net>

Patch

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2949,7 +2949,12 @@  ttwu_do_activate(struct rq *rq, struct t
 #ifdef CONFIG_SMP
 	if (wake_flags & WF_MIGRATED)
 		en_flags |= ENQUEUE_MIGRATED;
+	else
 #endif
+	if (p->in_iowait) {
+		delayacct_blkio_end(p);
+		atomic_dec(&task_rq(p)->nr_iowait);
+	}
 
 	activate_task(rq, p, en_flags);
 	ttwu_do_wakeup(rq, p, wake_flags, rf);
@@ -3336,11 +3341,6 @@  try_to_wake_up(struct task_struct *p, un
 	if (READ_ONCE(p->on_rq) && ttwu_runnable(p, wake_flags))
 		goto unlock;
 
-	if (p->in_iowait) {
-		delayacct_blkio_end(p);
-		atomic_dec(&task_rq(p)->nr_iowait);
-	}
-
 #ifdef CONFIG_SMP
 	/*
 	 * Ensure we load p->on_cpu _after_ p->on_rq, otherwise it would be
@@ -3411,6 +3411,11 @@  try_to_wake_up(struct task_struct *p, un
 
 	cpu = select_task_rq(p, p->wake_cpu, wake_flags | WF_TTWU);
 	if (task_cpu(p) != cpu) {
+		if (p->in_iowait) {
+			delayacct_blkio_end(p);
+			atomic_dec(&task_rq(p)->nr_iowait);
+		}
+
 		wake_flags |= WF_MIGRATED;
 		psi_ttwu_dequeue(p);
 		set_task_cpu(p, cpu);
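
For orientation, the first hunk resolves to roughly the following shape once
applied (a paraphrase of the diff above with surrounding context elided and
explanatory comments added here; not a verbatim copy of kernel/sched/core.c):

#ifdef CONFIG_SMP
	if (wake_flags & WF_MIGRATED)
		en_flags |= ENQUEUE_MIGRATED;
	else
#endif
	/*
	 * Non-migrated (or !SMP) wakeup: by the time ttwu_do_activate()
	 * runs, the waking side has waited for p->on_cpu to drop, so the
	 * increment done by schedule() on the old CPU is ordered before
	 * this decrement, and task_rq(p) is still the rq that took it.
	 */
	if (p->in_iowait) {
		delayacct_blkio_end(p);
		atomic_dec(&task_rq(p)->nr_iowait);
	}

	activate_task(rq, p, en_flags);
	ttwu_do_wakeup(rq, p, wake_flags, rf);

The migrated case is instead handled by the last hunk: there the decrement is
issued just before set_task_cpu(), while task_rq(p) still names the old
runqueue that received the matching increment.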