[v2] sched/core: Preempt current task in favour of bound kthread

A running task can wake up a per-CPU bound kthread on the same CPU.
If the currently running task doesn't yield the CPU before the next load
balance operation, the scheduler detects a load imbalance and tries to
balance the load. However, this load balance fails because the waiting
task is bound to the CPU, while the running task cannot be moved by the
regular load balancer. Finally, the active load balancer kicks in and
moves the running task to a different CPU/core. Moving the task to a
different CPU/core can lose cache affinity, leading to poor performance.
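
The sequence above can be sketched as a small user-space model. This is
only an illustration, not kernel code: struct toy_task, toy_can_pull()
and toy_balance() are hypothetical names, and the real load balancer
considers far more state than the two fields used here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified task model; not the kernel's task_struct. */
struct toy_task {
	bool running;		/* currently executing on the CPU */
	int nr_cpus_allowed;	/* number of CPUs in the affinity mask */
};

enum toy_balance_result {
	TOY_MOVED_BY_REGULAR_BALANCE,
	TOY_MOVED_BY_ACTIVE_BALANCE,
	TOY_NOTHING_MOVED,
};

/* Assumption: regular balancing only pulls queued, non-pinned tasks. */
static bool toy_can_pull(const struct toy_task *t)
{
	return !t->running && t->nr_cpus_allowed > 1;
}

/*
 * Model one balance attempt on a CPU holding a running task (curr)
 * and one waiting task (waiter).
 */
static enum toy_balance_result toy_balance(const struct toy_task *curr,
					   const struct toy_task *waiter)
{
	assert(curr && waiter);

	/* The waiter is the only candidate for a regular pull. */
	if (toy_can_pull(waiter))
		return TOY_MOVED_BY_REGULAR_BALANCE;

	/* Fallback: active balance pushes the running task away. */
	if (curr->running && curr->nr_cpus_allowed > 1)
		return TOY_MOVED_BY_ACTIVE_BALANCE;

	return TOY_NOTHING_MOVED;
}
```

With a running task allowed on several CPUs and a waiter pinned to one
CPU, the model ends in the active-balance branch, which is the
cache-unfriendly migration described above.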

This is more likely to happen if the currently running task is CPU
intensive and sched_wake_up_granularity is set to a larger value.
When sched_wake_up_granularity was relatively small, it was observed
that the bound thread would complete before the load balancer chose
to move the cache-hot task to a different CPU.

To deal with this situation, make the currently running task yield to a
per-CPU bound kthread, provided the kthread is not CPU intensive.
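
A minimal sketch of that decision, written as a user-space predicate.
It is not the code in this patch: struct toy_wakee, its fields, and the
cpu_intensive_ns threshold are assumptions made for illustration, and
how the patch itself judges "not CPU intensive" is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of the task being woken. */
struct toy_wakee {
	bool is_kthread;		/* kernel thread, not a user task */
	int nr_cpus_allowed;		/* 1 means bound to a single CPU */
	unsigned long recent_cpu_ns;	/* assumed recent CPU usage figure */
};

/*
 * Return true if the currently running task should yield to the wakee.
 * cpu_intensive_ns is an assumed threshold above which the wakee is
 * treated as CPU intensive and gets no special treatment.
 */
static bool toy_should_yield_to(const struct toy_wakee *w,
				unsigned long cpu_intensive_ns)
{
	assert(w);

	bool per_cpu_bound_kthread = w->is_kthread && w->nr_cpus_allowed == 1;
	bool light = w->recent_cpu_ns < cpu_intensive_ns;

	return per_cpu_bound_kthread && light;
}
```

Both conditions have to hold: a bound kthread that is itself CPU
intensive, or a light task that is not a bound kthread, leaves the
normal wakeup preemption rules in charge.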

/pboffline/hwcct_prg_old/lib/fsperf -t overwrite --noclean -f 5g -b 4k /pboffline

(With sched_wake_up_granularity set to 15ms)

Performance counter stats for 'system wide' (5 runs):
event                                     v5.4                        v5.4 + patch(v2)
probe:active_load_balance_cpu_stop       1,919  ( +-  2.89% )                   5  ( +- 12.56% )
sched:sched_waking                     441,535  ( +-  0.17% )             901,174  ( +-  0.25% )
sched:sched_wakeup                     441,533  ( +-  0.17% )             901,172  ( +-  0.25% )
sched:sched_wakeup_new                   2,436  ( +-  8.08% )                 525  ( +-  2.57% )
sched:sched_switch                     797,007  ( +-  0.26% )           1,458,463  ( +-  0.24% )
sched:sched_migrate_task                20,998  ( +-  1.04% )               2,279  ( +-  3.47% )
sched:sched_process_free                 2,436  ( +-  7.90% )                 527  ( +-  2.30% )
sched:sched_process_exit                 2,451  ( +-  7.85% )                 542  ( +-  2.24% )
sched:sched_wait_task                        7  ( +- 21.20% )                   1  ( +- 77.46% )
sched:sched_process_wait                 3,951  ( +-  9.14% )                 816  ( +-  3.52% )
sched:sched_process_fork                 2,435  ( +-  8.09% )                 524  ( +-  2.58% )
sched:sched_process_exec                 1,023  ( +- 12.21% )                 198  ( +-  3.23% )
sched:sched_wake_idle_without_ipi      187,794  ( +-  1.14% )             348,565  ( +-  0.34% )

Elapsed time in seconds          289.43 +- 1.42 ( +-  0.49% )    72.6013 +- 0.0417 ( +-  0.06% )

Throughput results

v5.4
Trigger time:................... 0.842679 s   (Throughput:     6075.86 MB/s)
Asynchronous submit time:.......   1.0184 s   (Throughput:     5027.49 MB/s)
Synchronous submit time:........        0 s   (Throughput:           0 MB/s)
I/O time:.......................   263.17 s   (Throughput:      19.455 MB/s)
Ratio trigger time to I/O time:.0.00320202

v5.4 + patch(v2)
Trigger time:................... 0.853973 s   (Throughput:      5995.5 MB/s)
Asynchronous submit time:....... 0.768092 s   (Throughput:     6665.86 MB/s)
Synchronous submit time:........        0 s   (Throughput:           0 MB/s)
I/O time:.......................  44.0267 s   (Throughput:     116.292 MB/s)
Ratio trigger time to I/O time:.0.0193966
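
The figures above can be compared with a few lines of arithmetic. The
helper below is only a convenience for reading the tables; toy_ratio()
is a hypothetical name, and the inputs in the usage note are copied
from the 15ms results.

```c
#include <assert.h>

/* Ratio of two positive measurements, e.g. old/new elapsed time. */
static double toy_ratio(double numerator, double denominator)
{
	assert(denominator > 0.0);
	return numerator / denominator;
}
```

For example, toy_ratio(289.43, 72.6013) gives the elapsed-time speedup
(roughly 4x), toy_ratio(116.292, 19.455) gives the I/O throughput gain
(roughly 6x), and toy_ratio(0.842679, 263.17) reproduces the reported
"Ratio trigger time to I/O time" for v5.4.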

(With sched_wake_up_granularity set to 4ms)

Performance counter stats for 'system wide' (5 runs):
event                                         v5.4                        v5.4 + patch(v2)
probe:active_load_balance_cpu_stop               6  ( +-  6.03% )                   5  ( +- 23.20% )
sched:sched_waking                         899,880  ( +-  0.38% )             899,737  ( +-  0.41% )
sched:sched_wakeup                         899,878  ( +-  0.38% )             899,736  ( +-  0.41% )
sched:sched_wakeup_new                         622  ( +- 11.95% )                 499  ( +-  1.08% )
sched:sched_switch                       1,458,214  ( +-  0.40% )           1,451,374  ( +-  0.32% )
sched:sched_migrate_task                     3,120  ( +- 10.00% )               2,500  ( +- 10.86% )
sched:sched_process_free                       608  ( +- 12.18% )                 484  ( +-  1.19% )
sched:sched_process_exit                       623  ( +- 11.91% )                 499  ( +-  1.15% )
sched:sched_wait_task                            1  ( +- 31.18% )                   1  ( +- 31.18% )
sched:sched_process_wait                       998  ( +- 13.22% )                 765  ( +-  0.16% )
sched:sched_process_fork                       622  ( +- 11.95% )                 498  ( +-  1.08% )
sched:sched_process_exec                       242  ( +- 13.81% )                 183  ( +-  0.48% )
sched:sched_wake_idle_without_ipi          349,165  ( +-  0.35% )             347,773  ( +-  0.43% )

Elapsed time in seconds           72.8560 +- 0.0768 ( +-  0.11% )     72.4327 +- 0.0797 ( +-  0.11% )

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
Changelog:
v1 : http://lore.kernel.org/lkml/20191209165122.GA27229@linux.vnet.ibm.com
v1->v2: Pass the right params to try_to_wake_up as correctly pointed out
by Dave Chinner

 kernel/sched/core.c  |  7 ++++++-
 kernel/sched/fair.c  | 23 ++++++++++++++++++++++-
 kernel/sched/sched.h |  3 ++-
 3 files changed, 30 insertions(+), 3 deletions(-)

Message-ID: <20191210054330.GF27253@linux.vnet.ibm.com>
Date: Tue, 10 Dec 2019 11:13:30 +0530
From: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Peter Zijlstra <peterz@infradead.org>, Phil Auld <pauld@redhat.com>,
    Ming Lei <ming.lei@redhat.com>, linux-block@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org,
    linux-kernel@vger.kernel.org, Jeff Moyer <jmoyer@redhat.com>,
    Dave Chinner <dchinner@redhat.com>, Eric Sandeen <sandeen@redhat.com>,
    Christoph Hellwig <hch@lst.de>, Jens Axboe <axboe@kernel.dk>,
    Ingo Molnar <mingo@redhat.com>, Tejun Heo <tj@kernel.org>,
    Vincent Guittot <vincent.guittot@linaro.org>
In-Reply-To: <20191209231743.GA19256@dread.disaster.area>