Message ID | 20210427023758.4048-1-song.bao.hua@hisilicon.com (mailing list archive) |
---|---|
State | New, archived |
Series | sched/fair: don't use waker's cpu if the waker of sync wake-up is interrupt |
On Tue, 2021-04-27 at 14:37 +1200, Barry Song wrote:
>
> To fix this issue, this patch checks the waker of sync wake-up is a task
> but not an interrupt. In this case, the waker will schedule out and give
> CPU to wakee.

That rash "the waker will schedule out" assumption, ie this really
really is a synchronous wakeup, may be true in your particular case,
but the sync hint is so overused as to be fairly meaningless. We've
squabbled with it repeatedly over the years because of that. It really
should either become more of a communication of intent than it
currently is, or just go away.

I'd argue for go away, simply because there is no way for the kernel to
know that there isn't more work directly behind any particular wakeup.

Take a pipe, does shoving some bits through a pipe mean you have no
further use of your CPU? IFF you're doing nothing but playing
ping-pong, sure it does, but how many real proggies have zero overlap
among its threads of execution? The mere notion of threaded apps having
no overlap *to be converted to throughput* is dainbramaged, which should
be the death knell of the sync wakeup hint. Threaded apps can't do
stuff like, oh, networking, which uses the sync hint heavily, without
at least to some extent defeating the purpose of threading if we were
to take the hint seriously. Heck, just look at the beauty (gak) of
wake_wide(). It was born specifically to combat the pure-evil side of
the sync wakeup hint.

Bah, 'nuff "Danger Will Robinson, that thing will *eat you*!!" ;-)

	-Mike
> -----Original Message-----
> From: Mike Galbraith [mailto:efault@gmx.de]
> Sent: Tuesday, April 27, 2021 4:22 PM
> To: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>;
> vincent.guittot@linaro.org; mingo@redhat.com; peterz@infradead.org;
> dietmar.eggemann@arm.com; rostedt@goodmis.org; bsegall@google.com;
> mgorman@suse.de
> Cc: valentin.schneider@arm.com; juri.lelli@redhat.com; bristot@redhat.com;
> linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org; xuwei (O)
> <xuwei5@huawei.com>; Zengtao (B) <prime.zeng@hisilicon.com>;
> guodong.xu@linaro.org; yangyicong <yangyicong@huawei.com>; Liguozhu (Kenneth)
> <liguozhu@hisilicon.com>; linuxarm@openeuler.org; wanghuiqiang
> <wanghuiqiang@huawei.com>; xieyongjia (A) <xieyongjia1@huawei.com>
> Subject: Re: [PATCH] sched/fair: don't use waker's cpu if the waker of sync
> wake-up is interrupt
>
> On Tue, 2021-04-27 at 14:37 +1200, Barry Song wrote:
> >
> > To fix this issue, this patch checks the waker of sync wake-up is a task
> > but not an interrupt. In this case, the waker will schedule out and give
> > CPU to wakee.
>
> That rash "the waker will schedule out" assumption, ie this really
> really is a synchronous wakeup, may be true in your particular case,

Hi Mike,
For my particular case, sync hint is used by interrupt but not task, so
"the waker will schedule out" is false because the existing task is not
relevant at all. Here the description "the waker will schedule out" is
just copying the general idea of sync wake-up though the real code might
overuse this hint.

> but the sync hint is so overused as to be fairly meaningless. We've
> squabbled with it repeatedly over the years because of that. It really
> should either become more of a communication of intent than it
> currently is, or just go away.

I agree sync hint might have been overused by other kernel subsystem.
But this patch will at least fix a case: sync waker is interrupt,
in this case, the existing task has nothing to do with waker and wakee,
so this case should be excluded from wake_affine_idle().

>
> I'd argue for go away, simply because there is no way for the kernel to
> know that there isn't more work directly behind any particular wakeup.

To some extent, "go away" might be a good choice. But this is a more
aggressive action. For those cases where the waker really will schedule
out, the wakee won't be able to take advantage of getting the idle cpu
of the waker.

>
> Take a pipe, does shoving some bits through a pipe mean you have no
> further use of your CPU? IFF you're doing nothing but playing
> ping-pong, sure it does, but how many real proggies have zero overlap
> among its threads of execution? The mere notion of threaded apps having
> no overlap *to be converted to throughput* is dainbramaged, which should
> be the death knell of the sync wakeup hint. Threaded apps can't do
> stuff like, oh, networking, which uses the sync hint heavily, without
> at least to some extent defeating the purpose of threading if we were
> to take the hint seriously. Heck, just look at the beauty (gak) of
> wake_wide(). It was born specifically to combat the pure-evil side of
> the sync wakeup hint.

As above, removing the code of migrating wakee to the cpu of sync waker
could be an option, but it needs more investigation.

>
> Bah, 'nuff "Danger Will Robinson, that thing will *eat you*!!" ;-)
>
> 	-Mike

Thanks
Barry
On Tue, 2021-04-27 at 04:44 +0000, Song Bao Hua (Barry Song) wrote:
>
>
> I agree sync hint might have been overused by other kernel subsystem.
> But this patch will at least fix a case: sync waker is interrupt,
> in this case, the existing task has nothing to do with waker and wakee,
> so this case should be excluded from wake_affine_idle().

I long ago tried filtering interrupt wakeups, and met some surprises.
Wakeup twiddling always managing to end up being a rob Peter to pay
Paul operation despite our best efforts, here's hoping that your pile
of stolen cycles is small enough to escape performance bot notice :)

	-Mike
> -----Original Message-----
> From: Mike Galbraith [mailto:efault@gmx.de]
> Sent: Tuesday, April 27, 2021 5:55 PM
> To: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>;
> vincent.guittot@linaro.org; mingo@redhat.com; peterz@infradead.org;
> dietmar.eggemann@arm.com; rostedt@goodmis.org; bsegall@google.com;
> mgorman@suse.de
> Cc: valentin.schneider@arm.com; juri.lelli@redhat.com; bristot@redhat.com;
> linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org; xuwei (O)
> <xuwei5@huawei.com>; Zengtao (B) <prime.zeng@hisilicon.com>;
> guodong.xu@linaro.org; yangyicong <yangyicong@huawei.com>; Liguozhu (Kenneth)
> <liguozhu@hisilicon.com>; linuxarm@openeuler.org; wanghuiqiang
> <wanghuiqiang@huawei.com>; xieyongjia (A) <xieyongjia1@huawei.com>
> Subject: Re: [PATCH] sched/fair: don't use waker's cpu if the waker of sync
> wake-up is interrupt
>
> On Tue, 2021-04-27 at 04:44 +0000, Song Bao Hua (Barry Song) wrote:
> >
> >
> > I agree sync hint might have been overused by other kernel subsystem.
> > But this patch will at least fix a case: sync waker is interrupt,
> > in this case, the existing task has nothing to do with waker and wakee,
> > so this case should be excluded from wake_affine_idle().
>
> I long ago tried filtering interrupt wakeups, and met some surprises.
> Wakeup twiddling always managing to end up being a rob Peter to pay
> Paul operation despite our best efforts, here's hoping that your pile
> of stolen cycles is small enough to escape performance bot notice :)

Would you like to share the link you did before to filter interrupt
wakeups? The wake-up path has hundreds of lines of code, so I don't
expect that reading preempt_count will cause performance losses visible
to the bot. But who knows :-)

>
> 	-Mike

Thanks
Barry
On Tue, 2021-04-27 at 06:05 +0000, Song Bao Hua (Barry Song) wrote:
>
> Would you like to share the link you did before to filter interrupt
> wakeups?

Can't, failed experiments go only to my Bitmaster-9000 patch shredder
to avoid needing a snow plow to get to near my box. Besides, it was
long ago in a much changed code base.

	-Mike
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6d73bdbb2d40..8ad2d732033d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5829,7 +5829,12 @@ wake_affine_idle(int this_cpu, int prev_cpu, int sync)
 	if (available_idle_cpu(this_cpu) && cpus_share_cache(this_cpu, prev_cpu))
 		return available_idle_cpu(prev_cpu) ? prev_cpu : this_cpu;
 
-	if (sync && cpu_rq(this_cpu)->nr_running == 1)
+	/*
+	 * If this is a sync wake-up and the only running thread is just
+	 * waker, thus, waker is not interrupt, we assume wakee will get
+	 * the cpu of waker soon
+	 */
+	if (sync && cpu_rq(this_cpu)->nr_running == 1 && in_task())
 		return this_cpu;
 
 	if (available_idle_cpu(prev_cpu))
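For readers unfamiliar with in_task(), here is a userspace sketch of what the added check tests. It is not kernel code: the bit widths and the name model_in_task() are assumptions modelled on include/linux/preempt.h from roughly this kernel generation, and the layout has differed between kernel versions.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * ASSUMED preempt_count layout (not taken from this thread):
 * bits 0-7 preemption depth, 8-15 softirq, 16-19 hardirq, 20-23 NMI.
 */
#define PREEMPT_BITS   8
#define SOFTIRQ_BITS   8
#define HARDIRQ_BITS   4
#define NMI_BITS       4

#define SOFTIRQ_SHIFT  PREEMPT_BITS
#define HARDIRQ_SHIFT  (SOFTIRQ_SHIFT + SOFTIRQ_BITS)
#define NMI_SHIFT      (HARDIRQ_SHIFT + HARDIRQ_BITS)

#define SOFTIRQ_OFFSET (1u << SOFTIRQ_SHIFT)
#define HARDIRQ_OFFSET (1u << HARDIRQ_SHIFT)
#define HARDIRQ_MASK   (((1u << HARDIRQ_BITS) - 1) << HARDIRQ_SHIFT)
#define NMI_MASK       (((1u << NMI_BITS) - 1) << NMI_SHIFT)

/*
 * Hypothetical stand-in for in_task(): true when the counter shows no NMI,
 * no hardirq, and no softirq currently being served. Only the lowest
 * softirq bit is tested, so "bottom halves disabled" (which adds
 * 2 * SOFTIRQ_OFFSET) still counts as task context.
 */
static bool model_in_task(uint32_t preempt_count)
{
	return !(preempt_count & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET));
}
```

Under these assumptions model_in_task(0) is true, while a counter carrying HARDIRQ_OFFSET or SOFTIRQ_OFFSET yields false, which is the case the patch wants to exclude from the sync shortcut.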
A severe qperf performance decrease was reported in the below use case:
For a hardware with 2 NUMA nodes, node0 has cpu0-31, node1 has cpu32-63.
Ethernet is located in node1.

Run the below commands:
$ taskset -c 32-63 stress -c 32 &
$ qperf 192.168.50.166 tcp_lat
tcp_lat:
    latency = 2.95ms.

Normally the latency should be less than 20us. But in the above test,
latency increased dramatically to 2.95ms.

This is caused by ping-pong of qperf between node0 and node1. Since it
is a sync wake-up and waker's nr_running == 1, WAKE_AFFINE will pull
qperf to node1, but LB will soon migrate qperf back to node0.

Unlike a normal sync wake-up coming from a task, the waker in the above
test is an interrupt and nr_running happens to be 1 since stress starts
32 threads on node1 with 32 cpus.

Testing also shows the performance of qperf won't drop if the number
of threads is increased to 64, 96 or larger values:
$ taskset -c 32-63 stress -c 96 &
$ qperf 192.168.50.166 tcp_lat
tcp_lat:
    latency = 14.7us.

Obviously "-c 96" makes "cpu_rq(this_cpu)->nr_running == 1" false in
wake_affine_idle() so WAKE_AFFINE won't pull qperf to node1.

To fix this issue, this patch checks that the waker of a sync wake-up
is a task and not an interrupt. In this case, the waker will schedule
out and give CPU to wakee.

Reported-by: Yongjia Xie <xieyongjia1@huawei.com>
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
---
 kernel/sched/fair.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)
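The decision described in the commit message can be sketched as a small userspace model. This is an illustration only: struct model_cpu, model_wake_affine_idle() and MODEL_NO_CPU are hypothetical names, the cache-sharing test and the waker context are passed in as plain flags, and MODEL_NO_CPU merely stands in for the function's "no preference" result.

```c
#include <stdbool.h>

#define MODEL_NO_CPU (-1)	/* hypothetical "no preference" result */

/* Hypothetical per-CPU state: runnable task count and an idle flag. */
struct model_cpu {
	unsigned int nr_running;
	bool idle;
};

/*
 * Model of the patched check order in wake_affine_idle().
 * waker_in_task plays the role of in_task(); passing true for every
 * wakeup reproduces the behaviour before the patch.
 */
static int model_wake_affine_idle(const struct model_cpu *cpus,
				  int this_cpu, int prev_cpu,
				  bool sync, bool waker_in_task,
				  bool share_cache)
{
	/* An idle waker CPU that shares cache with prev_cpu is preferred. */
	if (cpus[this_cpu].idle && share_cache)
		return cpus[prev_cpu].idle ? prev_cpu : this_cpu;

	/* Sync shortcut: only trusted when the waker is a lone task. */
	if (sync && cpus[this_cpu].nr_running == 1 && waker_in_task)
		return this_cpu;

	/* Otherwise stay on the previous CPU if it is idle. */
	if (cpus[prev_cpu].idle)
		return prev_cpu;

	return MODEL_NO_CPU;
}
```

With one stress thread per CPU (the "-c 32" run) the waker CPU has nr_running == 1, so a wakeup treated as coming from a task returns this_cpu, whereas flagging the waker as an interrupt falls through to an idle prev_cpu. With three threads per CPU (the "-c 96" run) the shortcut is skipped in both cases.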