[RFC,v2,7/7] sched/idle: update poll time when wakeup from idle

Message ID 1504007201-12904-8-git-send-email-yang.zhang.wz@gmail.com (mailing list archive)
State New, archived

Commit Message

Yang Zhang Aug. 29, 2017, 11:46 a.m. UTC
ttwu_do_wakeup() already updates avg_idle when a CPU wakes up from idle.
Reuse that logic to update the poll time as well. Updating the poll time
in ttwu_do_wakeup() may be a little late, but test results show no
obvious performance gap compared with updating it in the irq handler.

One problem is that idle_stamp is only used with the CFS scheduler. That
is acceptable, since CFS is the default scheduling policy and considering
only it should be enough.

Signed-off-by: Yang Zhang <yang.zhang.wz@gmail.com>
Signed-off-by: Quan Xu <quan.xu0@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
---
 include/linux/sched/idle.h | 4 ++++
 kernel/sched/core.c        | 4 ++++
 kernel/sched/idle.c        | 7 +++++++
 3 files changed, 15 insertions(+)

Comments

Peter Zijlstra Aug. 29, 2017, 12:46 p.m. UTC | #1
On Tue, Aug 29, 2017 at 11:46:41AM +0000, Yang Zhang wrote:
> In ttwu_do_wakeup, it will update avg_idle when wakeup from idle. Here
> we just reuse this logic to update the poll time. It may be a little
> late to update the poll in ttwu_do_wakeup, but the test result shows no
> obvious performance gap compare with updating poll in irq handler.
> 
> one problem is that idle_stamp only used when using CFS scheduler. But
> it is ok since it is the default policy for scheduler and only consider
> it should enough.
> 
> Signed-off-by: Yang Zhang <yang.zhang.wz@gmail.com>
> Signed-off-by: Quan Xu <quan.xu0@gmail.com>

Same broken SoB chain, and not a useful word on why you need to adjust
this crap to begin with. What you want that poll duration to be related
to is the cost of a VMEXIT/VMENTER cycle, not however long we happened
to be idle.

So no.

Yang Zhang Sept. 1, 2017, 7:30 a.m. UTC | #2
On 2017/8/29 20:46, Peter Zijlstra wrote:
> On Tue, Aug 29, 2017 at 11:46:41AM +0000, Yang Zhang wrote:
>> In ttwu_do_wakeup, it will update avg_idle when wakeup from idle. Here
>> we just reuse this logic to update the poll time. It may be a little
>> late to update the poll in ttwu_do_wakeup, but the test result shows no
>> obvious performance gap compare with updating poll in irq handler.
>>
>> one problem is that idle_stamp only used when using CFS scheduler. But
>> it is ok since it is the default policy for scheduler and only consider
>> it should enough.
>>
>> Signed-off-by: Yang Zhang <yang.zhang.wz@gmail.com>
>> Signed-off-by: Quan Xu <quan.xu0@gmail.com>
> 
> Same broken SoB chain, and not a useful word on why you need to adjust
> this crap to begin with. What you want that poll duration to be related
> to is the cost of a VMEXIT/VMENTER cycle, not however long we happened
> to be idle.

Actually, we should compare the cost of VMEXIT/VMENTER with the real
duration in idle. We have a rough number for the cost of one
VMEXIT/VMENTER (about 2k~4k cycles, depending on the underlying CPU),
and the idle path introduces 4~5 VMENTER/VMEXITs, which may add about
7us of latency on average. So we set the poll duration to 10us by
default.
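
As a rough check of the numbers above, here is a minimal sketch of the
arithmetic. The 2 GHz clock rate used in the usage note is an assumption
(no CPU frequency is given in this thread), and both function names are
hypothetical helpers, not kernel APIs.

```c
#include <assert.h>

/*
 * Convert a cycle count to microseconds.
 * cpu_mhz is the clock rate in MHz, i.e. cycles per microsecond.
 */
static double cycles_to_us(double cycles, double cpu_mhz)
{
	assert(cpu_mhz > 0.0);
	return cycles / cpu_mhz;
}

/*
 * Latency added by `transitions` VMEXIT/VMENTER round trips,
 * each costing `cycles_each` cycles, at the given clock rate.
 */
static double idle_path_latency_us(double transitions, double cycles_each,
				   double cpu_mhz)
{
	return cycles_to_us(transitions * cycles_each, cpu_mhz);
}
```

With the assumed 2000 MHz clock, idle_path_latency_us(4, 2000, 2000)
gives 4 us, idle_path_latency_us(4.5, 3000, 2000) gives 6.75 us, and
idle_path_latency_us(5, 4000, 2000) gives 10 us, so the midpoint is in
line with the "about 7us" estimate. A different clock rate shifts all
three values proportionally.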

Another problem is that there is no good way to measure the duration in
idle. avg_idle is the only way I have found so far. Do you have any
suggestions for doing it better? Thanks.
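
For reference, a userspace sketch of how avg_idle behaves, modeled on my
reading of the update in ttwu_do_wakeup(): a running average that moves
one eighth of the way toward each new idle sample and is then clamped to
a maximum. account_idle() is a hypothetical name, and the division by 8
stands in for the kernel's right shift by 3, so rounding of negative
differences may differ slightly.

```c
#include <stdint.h>

/* Move *avg one eighth of the way toward the new sample. */
static void update_avg(uint64_t *avg, uint64_t sample)
{
	int64_t diff = (int64_t)(sample - *avg);

	/* Unsigned wraparound makes adding a negative step work as expected. */
	*avg += (uint64_t)(diff / 8);
}

/*
 * Fold one idle period (in ns) into the running average and clamp it,
 * mirroring the "if (rq->avg_idle > max)" check in the patch context.
 */
static uint64_t account_idle(uint64_t *avg_idle, uint64_t idle_ns,
			     uint64_t max_ns)
{
	update_avg(avg_idle, idle_ns);
	if (*avg_idle > max_ns)
		*avg_idle = max_ns;
	return *avg_idle;
}
```

Starting from an average of 800 ns, a 1600 ns idle period moves it to
900 ns and a 0 ns period moves it to 700 ns. The value this patch passes
to update_poll_duration() is this smoothed idle length, which tracks how
long the CPU was idle rather than what a VMEXIT/VMENTER costs.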

Quan Xu Sept. 29, 2017, 10:29 a.m. UTC | #3
On 2017/8/29 20:46, Peter Zijlstra wrote:
> On Tue, Aug 29, 2017 at 11:46:41AM +0000, Yang Zhang wrote:
>> In ttwu_do_wakeup, it will update avg_idle when wakeup from idle. Here
>> we just reuse this logic to update the poll time. It may be a little
>> late to update the poll in ttwu_do_wakeup, but the test result shows no
>> obvious performance gap compare with updating poll in irq handler.
>>
>> one problem is that idle_stamp only used when using CFS scheduler. But
>> it is ok since it is the default policy for scheduler and only consider
>> it should enough.
>>
>> Signed-off-by: Yang Zhang <yang.zhang.wz@gmail.com>
>> Signed-off-by: Quan Xu <quan.xu0@gmail.com>
> Same broken SoB chain, and not a useful word on why you need to adjust
> this crap to begin with. What you want that poll duration to be related
> to is the cost of a VMEXIT/VMENTER cycle, not however long we happened
> to be idle.
>
> So no.

Peter,

I think you are right.

IIUC, the time we happened to be idle may contain a chain of
VMEXIT/VMENTER cycles, which (except for the last VMEXIT/VMENTER
cycles) would mainly be just for idle loops, right?

As you mentioned, the poll duration should be related to the cost of
__a__ VMEXIT/VMENTER cycle. However, it is very difficult to measure a
VMEXIT/VMENTER cycle accurately from a KVM guest. We could find an
approximate one by dropping the idle loops from the time we happened
to be idle. Does that make sense?

Quan

Patch

diff --git a/include/linux/sched/idle.h b/include/linux/sched/idle.h
index 5ca63eb..6e0554d 100644
--- a/include/linux/sched/idle.h
+++ b/include/linux/sched/idle.h
@@ -12,6 +12,10 @@  enum cpu_idle_type {
 
 extern void wake_up_if_idle(int cpu);
 
+#if defined(CONFIG_SMP) && defined(CONFIG_PARAVIRT)
+extern void update_poll_duration(unsigned long idle_duration);
+#endif
+
 /*
  * Idle thread specific functions to determine the need_resched
  * polling state.
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0869b20..25be9a3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1678,6 +1678,10 @@  static void ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags,
 		if (rq->avg_idle > max)
 			rq->avg_idle = max;
 
+#if defined(CONFIG_PARAVIRT)
+		update_poll_duration(rq->avg_idle);
+#endif
+
 		rq->idle_stamp = 0;
 	}
 #endif
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index b374744..7eb8559 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -101,6 +101,13 @@  void __cpuidle default_idle_call(void)
 	}
 }
 
+#if defined(CONFIG_SMP) && defined(CONFIG_PARAVIRT)
+void update_poll_duration(unsigned long idle_duration)
+{
+	paravirt_idle_update_poll_duration(idle_duration);
+}
+#endif
+
 static int call_cpuidle(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 		      int next_state)
 {