Message ID | 150307947767.29525.16150424729950084786.stgit@Solace.fritz.box (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
>>> On 18.08.17 at 20:04, <dario.faggioli@citrix.com> wrote: > --- a/xen/arch/x86/cpu/mwait-idle.c > +++ b/xen/arch/x86/cpu/mwait-idle.c > @@ -741,9 +741,8 @@ static void mwait_idle(void) > } > > cpufreq_dbs_timer_suspend(); > - > sched_tick_suspend(); > - /* sched_tick_suspend() can raise TIMER_SOFTIRQ. Process it now. */ > + /* Timer related operations can raise TIMER_SOFTIRQ. Process it now. */ > process_pending_softirqs(); Is this a leftover from v1? Otherwise, why do you do the adjustment here but not in acpi_processor_idle()? > --- a/xen/common/rcupdate.c > +++ b/xen/common/rcupdate.c > @@ -84,8 +84,37 @@ struct rcu_data { > int cpu; > struct rcu_head barrier; > long last_rs_qlen; /* qlen during the last resched */ > + > + /* 3) idle CPUs handling */ > + struct timer idle_timer; > + bool idle_timer_active; > }; > > +/* > + * If a CPU with RCU callbacks queued goes idle, when the grace period is > + * not finished yet, how can we make sure that the callbacks will eventually > + * be executed? In Linux (2.6.21, the first "tickless idle" Linux kernel), > + * the periodic timer tick would not be stopped for such CPU. Here in Xen, > + * we (may) don't even have a periodic timer tick, so we need to use a > + * special purpose timer. > + * > + * Such timer: > + * 1) is armed only when a CPU with an RCU callback(s) queued goes idle > + * before the end of the current grace period (_not_ for any CPUs that > + * go idle!); > + * 2) when it fires, it is only re-armed if the grace period is still > + * running; > + * 3) it is stopped immediately, if the CPU wakes up from idle and > + * resumes 'normal' execution. > + * > + * About how far in the future the timer should be programmed each time, > + * it's hard to tell (guess!!). Since this mimics Linux's periodic timer > + * tick, take values used there as an indication. In Linux 2.6.21, tick > + * period can be 10ms, 4ms, 3.33ms or 1ms. Let's use 10ms, to enable > + * at least some power saving on the CPU that is going idle. > + */ > +#define RCU_IDLE_TIMER_PERIOD MILLISECS(10) With you even mentioning that the original Linux code has ways to use different values, wouldn't it be worth allowing this to be command line controllable? Jan
On 08/18/2017 07:04 PM, Dario Faggioli wrote: > On the CPU where a callback is queued, cpu_is_haltable() > returns false (due to rcu_needs_cpu() being itself false). > That means the CPU would spin inside idle_loop(), continuously > calling do_softirq(), and, in there, continuously checking > rcu_pending(), in a tight loop. > > Let's instead allow the CPU to really go idle, but make sure, > by arming a timer, that we periodically check whether the > grace period has come to an ended. As the period of the > timer, we pick a value that makes thing look like what > happens in Linux, with the periodic tick (as this code > comes from there). > > Note that the timer will *only* be armed on CPUs that are > going idle while having queued RCU callbacks. On CPUs that > don't, there won't be any timer, and their sleep won't be > interrupted (and even for CPUs with callbacks, we only > expect an handful of wakeups at most, but that depends on > the system load, as much as from other things). > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
On 08/22/2017 02:04 PM, Jan Beulich wrote: >>>> On 18.08.17 at 20:04, <dario.faggioli@citrix.com> wrote: >> --- a/xen/arch/x86/cpu/mwait-idle.c >> +++ b/xen/arch/x86/cpu/mwait-idle.c >> @@ -741,9 +741,8 @@ static void mwait_idle(void) >> } >> >> cpufreq_dbs_timer_suspend(); >> - >> sched_tick_suspend(); >> - /* sched_tick_suspend() can raise TIMER_SOFTIRQ. Process it now. */ >> + /* Timer related operations can raise TIMER_SOFTIRQ. Process it now. */ >> process_pending_softirqs(); > > Is this a leftover from v1? Otherwise, why do you do the adjustment > here but not in acpi_processor_idle()? > >> --- a/xen/common/rcupdate.c >> +++ b/xen/common/rcupdate.c >> @@ -84,8 +84,37 @@ struct rcu_data { >> int cpu; >> struct rcu_head barrier; >> long last_rs_qlen; /* qlen during the last resched */ >> + >> + /* 3) idle CPUs handling */ >> + struct timer idle_timer; >> + bool idle_timer_active; >> }; >> >> +/* >> + * If a CPU with RCU callbacks queued goes idle, when the grace period is >> + * not finished yet, how can we make sure that the callbacks will eventually >> + * be executed? In Linux (2.6.21, the first "tickless idle" Linux kernel), >> + * the periodic timer tick would not be stopped for such CPU. Here in Xen, >> + * we (may) don't even have a periodic timer tick, so we need to use a >> + * special purpose timer. >> + * >> + * Such timer: >> + * 1) is armed only when a CPU with an RCU callback(s) queued goes idle >> + * before the end of the current grace period (_not_ for any CPUs that >> + * go idle!); >> + * 2) when it fires, it is only re-armed if the grace period is still >> + * running; >> + * 3) it is stopped immediately, if the CPU wakes up from idle and >> + * resumes 'normal' execution. >> + * >> + * About how far in the future the timer should be programmed each time, >> + * it's hard to tell (guess!!). Since this mimics Linux's periodic timer >> + * tick, take values used there as an indication. In Linux 2.6.21, tick >> + * period can be 10ms, 4ms, 3.33ms or 1ms. Let's use 10ms, to enable >> + * at least some power saving on the CPU that is going idle. >> + */ >> +#define RCU_IDLE_TIMER_PERIOD MILLISECS(10) > > With you even mentioning that the original Linux code has ways > to use different values, wouldn't it be worth allowing this to be > command line controllable? Apart from this, are you OK with the patch? Dario is on holiday, and I think it would be good to get this functionality in sooner rather than later to shake out as many bugs as possible. Would you be willing to let the idle timer period be set with a follow-up patch? I'm happy to propagate the comment change to acpi_processor_idle() if necessary. -George
>>> On 29.08.17 at 18:06, <george.dunlap@citrix.com> wrote: > On 08/22/2017 02:04 PM, Jan Beulich wrote: >>>>> On 18.08.17 at 20:04, <dario.faggioli@citrix.com> wrote: >>> --- a/xen/arch/x86/cpu/mwait-idle.c >>> +++ b/xen/arch/x86/cpu/mwait-idle.c >>> @@ -741,9 +741,8 @@ static void mwait_idle(void) >>> } >>> >>> cpufreq_dbs_timer_suspend(); >>> - >>> sched_tick_suspend(); >>> - /* sched_tick_suspend() can raise TIMER_SOFTIRQ. Process it now. */ >>> + /* Timer related operations can raise TIMER_SOFTIRQ. Process it now. */ >>> process_pending_softirqs(); >> >> Is this a leftover from v1? Otherwise, why do you do the adjustment >> here but not in acpi_processor_idle()? >> >>> --- a/xen/common/rcupdate.c >>> +++ b/xen/common/rcupdate.c >>> @@ -84,8 +84,37 @@ struct rcu_data { >>> int cpu; >>> struct rcu_head barrier; >>> long last_rs_qlen; /* qlen during the last resched */ >>> + >>> + /* 3) idle CPUs handling */ >>> + struct timer idle_timer; >>> + bool idle_timer_active; >>> }; >>> >>> +/* >>> + * If a CPU with RCU callbacks queued goes idle, when the grace period is >>> + * not finished yet, how can we make sure that the callbacks will > eventually >>> + * be executed? In Linux (2.6.21, the first "tickless idle" Linux kernel), >>> + * the periodic timer tick would not be stopped for such CPU. Here in Xen, >>> + * we (may) don't even have a periodic timer tick, so we need to use a >>> + * special purpose timer. >>> + * >>> + * Such timer: >>> + * 1) is armed only when a CPU with an RCU callback(s) queued goes idle >>> + * before the end of the current grace period (_not_ for any CPUs that >>> + * go idle!); >>> + * 2) when it fires, it is only re-armed if the grace period is still >>> + * running; >>> + * 3) it is stopped immediately, if the CPU wakes up from idle and >>> + * resumes 'normal' execution. >>> + * >>> + * About how far in the future the timer should be programmed each time, >>> + * it's hard to tell (guess!!). Since this mimics Linux's periodic timer >>> + * tick, take values used there as an indication. In Linux 2.6.21, tick >>> + * period can be 10ms, 4ms, 3.33ms or 1ms. Let's use 10ms, to enable >>> + * at least some power saving on the CPU that is going idle. >>> + */ >>> +#define RCU_IDLE_TIMER_PERIOD MILLISECS(10) >> >> With you even mentioning that the original Linux code has ways >> to use different values, wouldn't it be worth allowing this to be >> command line controllable? > > Apart from this, are you OK with the patch? Yes. > Dario is on holiday, and I think it would be good to get this > functionality in sooner rather than later to shake out as many bugs as > possible. Would you be willing to let the idle timer period be set with > a follow-up patch? Yes. > I'm happy to propagate the comment change to acpi_processor_idle() if > necessary. Propagate? So far I was of the opinion that the change had been intentionally dropped there, but was mistakenly left in place in mwait_idle(). Or if the comment was intended to be changed (in both places) I can't see how that would belong into this patch, as I think sched_tick_suspend() could have fiddled with timers even before. But in the end I don't care all that much if the comment adjustment gets done here or in a separate patch - my main wish really is for the two places to stay in sync. Jan
On Wed, Aug 30, 2017 at 8:18 AM, Jan Beulich <JBeulich@suse.com> wrote: >> Apart from this, are you OK with the patch? > > Yes. > >> Dario is on holiday, and I think it would be good to get this >> functionality in sooner rather than later to shake out as many bugs as >> possible. Would you be willing to let the idle timer period be set with >> a follow-up patch? > > Yes. > >> I'm happy to propagate the comment change to acpi_processor_idle() if >> necessary. > > Propagate? So far I was of the opinion that the change had been > intentionally dropped there, but was mistakenly left in place in > mwait_idle(). Or if the comment was intended to be changed (in > both places) I can't see how that would belong into this patch, as > I think sched_tick_suspend() could have fiddled with timers even > before. But in the end I don't care all that much if the comment > adjustment gets done here or in a separate patch - my main wish > really is for the two places to stay in sync. OK -- I've checked in through patch 5 with that comment hunk removed. Thanks, -George
On Wed, 2017-08-30 at 01:18 -0600, Jan Beulich wrote: > > > > On 29.08.17 at 18:06, <george.dunlap@citrix.com> wrote: > > > > On 08/22/2017 02:04 PM, Jan Beulich wrote: > > > > > > On 18.08.17 at 20:04, <dario.faggioli@citrix.com> wrote: > > > > > > > > --- a/xen/arch/x86/cpu/mwait-idle.c > > > > +++ b/xen/arch/x86/cpu/mwait-idle.c > > > > @@ -741,9 +741,8 @@ static void mwait_idle(void) > > > > } > > > > > > > > cpufreq_dbs_timer_suspend(); > > > > - > > > > sched_tick_suspend(); > > > > - /* sched_tick_suspend() can raise TIMER_SOFTIRQ. > > > > Process it now. */ > > > > + /* Timer related operations can raise TIMER_SOFTIRQ. > > > > Process it now. */ > > > > process_pending_softirqs(); > > > > > > Is this a leftover from v1? Otherwise, why do you do the > > > adjustment > > > here but not in acpi_processor_idle()? > > > > > > > --- a/xen/common/rcupdate.c > > > > +++ b/xen/common/rcupdate.c > > > > @@ -84,8 +84,37 @@ struct rcu_data { > > > > int cpu; > > > > struct rcu_head barrier; > > > > long last_rs_qlen; /* qlen during the last > > > > resched */ > > > > + > > > > + /* 3) idle CPUs handling */ > > > > + struct timer idle_timer; > > > > + bool idle_timer_active; > > > > }; > > > > > > > > +/* > > > > + * If a CPU with RCU callbacks queued goes idle, when the > > > > grace period is > > > > + * not finished yet, how can we make sure that the callbacks > > > > will > > > > eventually > > > > + * be executed? In Linux (2.6.21, the first "tickless idle" > > > > Linux kernel), > > > > + * the periodic timer tick would not be stopped for such CPU. > > > > Here in Xen, > > > > + * we (may) don't even have a periodic timer tick, so we need > > > > to use a > > > > + * special purpose timer. > > > > + * > > > > + * Such timer: > > > > + * 1) is armed only when a CPU with an RCU callback(s) queued > > > > goes idle > > > > + * before the end of the current grace period (_not_ for > > > > any CPUs that > > > > + * go idle!); > > > > + * 2) when it fires, it is only re-armed if the grace period > > > > is still > > > > + * running; > > > > + * 3) it is stopped immediately, if the CPU wakes up from idle > > > > and > > > > + * resumes 'normal' execution. > > > > + * > > > > + * About how far in the future the timer should be programmed > > > > each time, > > > > + * it's hard to tell (guess!!). Since this mimics Linux's > > > > periodic timer > > > > + * tick, take values used there as an indication. In Linux > > > > 2.6.21, tick > > > > + * period can be 10ms, 4ms, 3.33ms or 1ms. Let's use 10ms, to > > > > enable > > > > + * at least some power saving on the CPU that is going idle. > > > > + */ > > > > +#define RCU_IDLE_TIMER_PERIOD MILLISECS(10) > > > > > > With you even mentioning that the original Linux code has ways > > > to use different values, wouldn't it be worth allowing this to be > > > command line controllable? > > > > Dario is on holiday, and I think it would be good to get this > > functionality in sooner rather than later to shake out as many bugs > > as > > possible. Would you be willing to let the idle timer period be set > > with > > a follow-up patch? > So, I'm back, and can do such a patch. Do we want to enforce a maximum value, to try to at least avoid severe injuries, even for users that shot themselves in the foot? Or we just accept anything which is below STIME_MAX? I personally would only accept values smaller than 100ms (In fact, I was keeping it below that level in patch 6, where the heuristics was implemnted) or, if we really want, 1s. Going above these values is basically equivalent to saying that the bug this series has fixed was not really a bug! :-/ Regards, Dario
>>> On 05.09.17 at 19:13, <dario.faggioli@citrix.com> wrote: > On Wed, 2017-08-30 at 01:18 -0600, Jan Beulich wrote: >> > > > On 29.08.17 at 18:06, <george.dunlap@citrix.com> wrote: >> > Dario is on holiday, and I think it would be good to get this >> > functionality in sooner rather than later to shake out as many bugs >> > as >> > possible. Would you be willing to let the idle timer period be set >> > with >> > a follow-up patch? >> > So, I'm back, and can do such a patch. > > Do we want to enforce a maximum value, to try to at least avoid severe > injuries, even for users that shot themselves in the foot? Or we just > accept anything which is below STIME_MAX? > > I personally would only accept values smaller than 100ms (In fact, I > was keeping it below that level in patch 6, where the heuristics was > implemnted) or, if we really want, 1s. > > Going above these values is basically equivalent to saying that the bug > this series has fixed was not really a bug! :-/ I think an upper limit would be good to have. Jan
diff --git a/xen/arch/x86/cpu/mwait-idle.c b/xen/arch/x86/cpu/mwait-idle.c index 762dff1..b6770ea 100644 --- a/xen/arch/x86/cpu/mwait-idle.c +++ b/xen/arch/x86/cpu/mwait-idle.c @@ -741,9 +741,8 @@ static void mwait_idle(void) } cpufreq_dbs_timer_suspend(); - sched_tick_suspend(); - /* sched_tick_suspend() can raise TIMER_SOFTIRQ. Process it now. */ + /* Timer related operations can raise TIMER_SOFTIRQ. Process it now. */ process_pending_softirqs(); /* Interrupts must be disabled for C2 and higher transitions. */ diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c index 12ae7da..871936f 100644 --- a/xen/common/rcupdate.c +++ b/xen/common/rcupdate.c @@ -84,8 +84,37 @@ struct rcu_data { int cpu; struct rcu_head barrier; long last_rs_qlen; /* qlen during the last resched */ + + /* 3) idle CPUs handling */ + struct timer idle_timer; + bool idle_timer_active; }; +/* + * If a CPU with RCU callbacks queued goes idle, when the grace period is + * not finished yet, how can we make sure that the callbacks will eventually + * be executed? In Linux (2.6.21, the first "tickless idle" Linux kernel), + * the periodic timer tick would not be stopped for such CPU. Here in Xen, + * we (may) don't even have a periodic timer tick, so we need to use a + * special purpose timer. + * + * Such timer: + * 1) is armed only when a CPU with an RCU callback(s) queued goes idle + * before the end of the current grace period (_not_ for any CPUs that + * go idle!); + * 2) when it fires, it is only re-armed if the grace period is still + * running; + * 3) it is stopped immediately, if the CPU wakes up from idle and + * resumes 'normal' execution. + * + * About how far in the future the timer should be programmed each time, + * it's hard to tell (guess!!). Since this mimics Linux's periodic timer + * tick, take values used there as an indication. In Linux 2.6.21, tick + * period can be 10ms, 4ms, 3.33ms or 1ms. Let's use 10ms, to enable + * at least some power saving on the CPU that is going idle. + */ +#define RCU_IDLE_TIMER_PERIOD MILLISECS(10) + static DEFINE_PER_CPU(struct rcu_data, rcu_data); static int blimit = 10; @@ -404,7 +433,45 @@ int rcu_needs_cpu(int cpu) { struct rcu_data *rdp = &per_cpu(rcu_data, cpu); - return (!!rdp->curlist || rcu_pending(cpu)); + return (rdp->curlist && !rdp->idle_timer_active) || rcu_pending(cpu); +} + +/* + * Timer for making sure the CPU where a callback is queued does + * periodically poke rcu_pedning(), so that it will invoke the callback + * not too late after the end of the grace period. + */ +void rcu_idle_timer_start() +{ + struct rcu_data *rdp = &this_cpu(rcu_data); + + /* + * Note that we don't check rcu_pending() here. In fact, we don't want + * the timer armed on CPUs that are in the process of quiescing while + * going idle, unless they really are the ones with a queued callback. + */ + if (likely(!rdp->curlist)) + return; + + set_timer(&rdp->idle_timer, NOW() + RCU_IDLE_TIMER_PERIOD); + rdp->idle_timer_active = true; +} + +void rcu_idle_timer_stop() +{ + struct rcu_data *rdp = &this_cpu(rcu_data); + + if (likely(!rdp->idle_timer_active)) + return; + + rdp->idle_timer_active = false; + stop_timer(&rdp->idle_timer); +} + +static void rcu_idle_timer_handler(void* data) +{ + /* Nothing, really... Just count the number of times we fire */ + perfc_incr(rcu_idle_timer); } void rcu_check_callbacks(int cpu) @@ -425,6 +492,8 @@ static void rcu_move_batch(struct rcu_data *this_rdp, struct rcu_head *list, static void rcu_offline_cpu(struct rcu_data *this_rdp, struct rcu_ctrlblk *rcp, struct rcu_data *rdp) { + kill_timer(&rdp->idle_timer); + /* If the cpu going offline owns the grace period we can block * indefinitely waiting for it, so flush it here. */ @@ -453,6 +522,7 @@ static void rcu_init_percpu_data(int cpu, struct rcu_ctrlblk *rcp, rdp->qs_pending = 0; rdp->cpu = cpu; rdp->blimit = blimit; + init_timer(&rdp->idle_timer, rcu_idle_timer_handler, rdp, cpu); } static int cpu_callback( diff --git a/xen/common/schedule.c b/xen/common/schedule.c index c6f4817..8827921 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -1904,6 +1904,7 @@ void sched_tick_suspend(void) sched = per_cpu(scheduler, cpu); SCHED_OP(sched, tick_suspend, cpu); rcu_idle_enter(cpu); + rcu_idle_timer_start(); } void sched_tick_resume(void) @@ -1911,6 +1912,7 @@ void sched_tick_resume(void) struct scheduler *sched; unsigned int cpu = smp_processor_id(); + rcu_idle_timer_stop(); rcu_idle_exit(cpu); sched = per_cpu(scheduler, cpu); SCHED_OP(sched, tick_resume, cpu); diff --git a/xen/include/xen/perfc_defn.h b/xen/include/xen/perfc_defn.h index 53849af..ca446e5 100644 --- a/xen/include/xen/perfc_defn.h +++ b/xen/include/xen/perfc_defn.h @@ -12,6 +12,8 @@ PERFCOUNTER(calls_from_multicall, "calls from multicall") PERFCOUNTER(irqs, "#interrupts") PERFCOUNTER(ipis, "#IPIs") +PERFCOUNTER(rcu_idle_timer, "RCU: idle_timer") + /* Generic scheduler counters (applicable to all schedulers) */ PERFCOUNTER(sched_irq, "sched: timer") PERFCOUNTER(sched_run, "sched: runs through scheduler") diff --git a/xen/include/xen/rcupdate.h b/xen/include/xen/rcupdate.h index 561ac43..3402eb5 100644 --- a/xen/include/xen/rcupdate.h +++ b/xen/include/xen/rcupdate.h @@ -149,4 +149,7 @@ int rcu_barrier(void); void rcu_idle_enter(unsigned int cpu); void rcu_idle_exit(unsigned int cpu); +void rcu_idle_timer_start(void); +void rcu_idle_timer_stop(void); + #endif /* __XEN_RCUPDATE_H */
On the CPU where a callback is queued, cpu_is_haltable() returns false (due to rcu_needs_cpu() being itself false). That means the CPU would spin inside idle_loop(), continuously calling do_softirq(), and, in there, continuously checking rcu_pending(), in a tight loop. Let's instead allow the CPU to really go idle, but make sure, by arming a timer, that we periodically check whether the grace period has come to an ended. As the period of the timer, we pick a value that makes thing look like what happens in Linux, with the periodic tick (as this code comes from there). Note that the timer will *only* be armed on CPUs that are going idle while having queued RCU callbacks. On CPUs that don't, there won't be any timer, and their sleep won't be interrupted (and even for CPUs with callbacks, we only expect an handful of wakeups at most, but that depends on the system load, as much as from other things). Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> --- Cc: Jan Beulich <jbeulich@suse.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: George Dunlap <George.Dunlap@eu.citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Tim Deegan <tim@xen.org> Cc: Wei Liu <wei.liu2@citrix.com> Cc: George Dunlap <george.dunlap@eu.citrix.com> Cc: Julien Grall <julien.grall@arm.com> --- Changes from v1: * clarified changelog; * fix style/indentation issues; * deal with RCU idle timer in tick suspension logic; * as a consequence of the point above, the timer now fires, so kill the ASSERT_UNREACHABLE, and put a perfcounter there (to count the times it triggers); * add a comment about the value chosen for programming the idle timer; * avoid pointless/bogus '!!' and void* casts; * rearrange the rcu_needs_cpu() return condition; * add a comment to clarify why we don't want to check rcu_pending() in rcu_idle_timer_start(). --- xen/arch/x86/cpu/mwait-idle.c | 3 +- xen/common/rcupdate.c | 72 ++++++++++++++++++++++++++++++++++++++++- xen/common/schedule.c | 2 + xen/include/xen/perfc_defn.h | 2 + xen/include/xen/rcupdate.h | 3 ++ 5 files changed, 79 insertions(+), 3 deletions(-)