| Message ID | 1504007201-12904-8-git-send-email-yang.zhang.wz@gmail.com (mailing list archive) |
|---|---|
| State | New, archived |
On Tue, Aug 29, 2017 at 11:46:41AM +0000, Yang Zhang wrote:
> In ttwu_do_wakeup, it will update avg_idle when wakeup from idle. Here
> we just reuse this logic to update the poll time. It may be a little
> late to update the poll in ttwu_do_wakeup, but the test result shows no
> obvious performance gap compare with updating poll in irq handler.
>
> one problem is that idle_stamp only used when using CFS scheduler. But
> it is ok since it is the default policy for scheduler and only consider
> it should enough.
>
> Signed-off-by: Yang Zhang <yang.zhang.wz@gmail.com>
> Signed-off-by: Quan Xu <quan.xu0@gmail.com>

Same broken SoB chain, and not a useful word on why you need to adjust
this crap to begin with. What you want that poll duration to be related
to is the cost of a VMEXIT/VMENTER cycle, not however long we happened
to be idle.

So no.
On 2017/8/29 20:46, Peter Zijlstra wrote:
> On Tue, Aug 29, 2017 at 11:46:41AM +0000, Yang Zhang wrote:
>> In ttwu_do_wakeup, it will update avg_idle when wakeup from idle. Here
>> we just reuse this logic to update the poll time. It may be a little
>> late to update the poll in ttwu_do_wakeup, but the test result shows no
>> obvious performance gap compare with updating poll in irq handler.
>>
>> one problem is that idle_stamp only used when using CFS scheduler. But
>> it is ok since it is the default policy for scheduler and only consider
>> it should enough.
>>
>> Signed-off-by: Yang Zhang <yang.zhang.wz@gmail.com>
>> Signed-off-by: Quan Xu <quan.xu0@gmail.com>
>
> Same broken SoB chain, and not a useful word on why you need to adjust
> this crap to begin with. What you want that poll duration to be related
> to is the cost of a VMEXIT/VMENTER cycle, not however long we happened
> to be idle.

Actually, we should compare the cost of VMEXIT/VMENTER with the real
duration spent in idle. We have a rough number for the cost of one
VMEXIT/VMENTER (about 2k~4k cycles, depending on the underlying CPU),
and the idle path introduces 4~5 VMEXIT/VMENTERs, which adds about 7us
of latency on average. So we set the poll duration to 10us by default.

Another problem is that there is no good way to measure the duration
spent in idle; avg_idle is the only way I have found so far. Do you
have any suggestion for doing it better? Thanks.
On 2017/8/29 20:46, Peter Zijlstra wrote:
> On Tue, Aug 29, 2017 at 11:46:41AM +0000, Yang Zhang wrote:
>> In ttwu_do_wakeup, it will update avg_idle when wakeup from idle. Here
>> we just reuse this logic to update the poll time. It may be a little
>> late to update the poll in ttwu_do_wakeup, but the test result shows no
>> obvious performance gap compare with updating poll in irq handler.
>>
>> one problem is that idle_stamp only used when using CFS scheduler. But
>> it is ok since it is the default policy for scheduler and only consider
>> it should enough.
>>
>> Signed-off-by: Yang Zhang <yang.zhang.wz@gmail.com>
>> Signed-off-by: Quan Xu <quan.xu0@gmail.com>
>
> Same broken SoB chain, and not a useful word on why you need to adjust
> this crap to begin with. What you want that poll duration to be related
> to is the cost of a VMEXIT/VMENTER cycle, not however long we happened
> to be idle.
>
> So no.

Peter, I think you are right. IIUC, the time we happened to be idle may
contain a chain of VMEXIT/VMENTER cycles, which would be mainly (except
for the last VMEXIT/VMENTER cycle) just idle loops, right?

As you mentioned, the poll duration should be related to the cost of
__a__ VMEXIT/VMENTER cycle. However, it is very difficult to measure a
VMEXIT/VMENTER cycle accurately from a KVM guest. We could find an
approximate one by dropping the idle loops from the time we happened to
be idle. Does that make sense?

Quan
diff --git a/include/linux/sched/idle.h b/include/linux/sched/idle.h
index 5ca63eb..6e0554d 100644
--- a/include/linux/sched/idle.h
+++ b/include/linux/sched/idle.h
@@ -12,6 +12,10 @@ enum cpu_idle_type {

 extern void wake_up_if_idle(int cpu);

+#if defined(CONFIG_SMP) && defined(CONFIG_PARAVIRT)
+extern void update_poll_duration(unsigned long idle_duration);
+#endif
+
 /*
  * Idle thread specific functions to determine the need_resched
  * polling state.
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0869b20..25be9a3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1678,6 +1678,10 @@ static void ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags,
 		if (rq->avg_idle > max)
 			rq->avg_idle = max;

+#if defined(CONFIG_PARAVIRT)
+		update_poll_duration(rq->avg_idle);
+#endif
+
 		rq->idle_stamp = 0;
 	}
 #endif
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index b374744..7eb8559 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -101,6 +101,13 @@ void __cpuidle default_idle_call(void)
 	}
 }

+#if defined(CONFIG_SMP) && defined(CONFIG_PARAVIRT)
+void update_poll_duration(unsigned long idle_duration)
+{
+	paravirt_idle_update_poll_duration(idle_duration);
+}
+#endif
+
 static int call_cpuidle(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 			int next_state)
 {