Message ID: 147520403328.22544.3265744862320473651.stgit@Solace.fritz.box (mailing list archive)
State: New, archived
On 30/09/16 03:53, Dario Faggioli wrote:
> When a vcpu explicitly yields it is usually giving
> us a hint: "let someone else run and come back
> to me in a bit."
>
> Credit2 isn't, so far, doing anything when a vcpu
> yields, which means a yield is basically a NOP (well,
> actually, it's pure overhead, as it causes the scheduler
> to kick in, but the result is --at least 99% of the time--
> that the very same vcpu that yielded continues to run).
>
> Implement a "preempt bias", to be applied to yielding
> vcpus. Basically, when evaluating what vcpu to run next,
> if a vcpu that has just yielded is encountered, we give
> it a credit penalty, and check whether there is anyone
> else that would better take over the cpu (of course,
> if there isn't, the yielding vcpu will continue).
>
> The value of this bias can be configured with a boot
> time parameter, and the default is set to 1 ms.
>
> Also, add a yield performance counter, and fix the
> style of a couple of comments.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Hmm, I'm sorry for not asking this earlier -- but what's the rationale
for having a "yield bias", rather than just choosing the next runnable
guy in the queue on yield, regardless of what his credits are?

If snext has 9ms of credit and the top runnable guy on the runqueue has
7.8ms of credit, doesn't it make sense to run him instead of making
snext run again and burn his credit?

Couldn't this be done by changing yield_bias to a boolean yield, and...

> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Anshul Makkar <anshul.makkar@citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> Changes from v1:
>  * add _us to the parameter name, as suggested during review;
>  * get rid of the minimum value for the yield bias;
>  * apply the yield bias via subtraction of credits from the yielding
>    vcpu, rather than via addition to all the others;
>  * merge the Credit2 bits of what was patch 7 here, as suggested
>    during review.
> ---
>  docs/misc/xen-command-line.markdown |   10 +++++
>  xen/common/sched_credit2.c          |   76 +++++++++++++++++++++++++++--------
>  xen/common/schedule.c               |    2 +
>  xen/include/xen/perfc_defn.h        |    1
>  4 files changed, 71 insertions(+), 18 deletions(-)
>
> diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
> index 8ff57fa..4fd3460 100644
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -1395,6 +1395,16 @@ Choose the default scheduler.
>  ### sched\_credit2\_migrate\_resist
>  > `= <integer>`
>
> +### sched\_credit2\_yield\_bias\_us
> +> `= <integer>`
> +
> +> Default: `1000`
> +
> +Set how much a yielding vcpu will be penalized, in order to actually
> +give a chance to run to some other vcpu. This is basically a bias, in
> +favour of the non-yielding vcpus, expressed in microseconds (default
> +is 1ms).
> +
>  ### sched\_credit\_tslice\_ms
>  > `= <integer>`
>
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 72e31b5..fde61ef 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -144,6 +144,8 @@
>  #define CSCHED2_MIGRATE_RESIST ((opt_migrate_resist)*MICROSECS(1))
>  /* How much to "compensate" a vcpu for L2 migration */
>  #define CSCHED2_MIGRATE_COMPENSATION MICROSECS(50)
> +/* How big of a bias we should have against a yielding vcpu */
> +#define CSCHED2_YIELD_BIAS ((opt_yield_bias)*MICROSECS(1))
>  /* Reset: Value below which credit will be reset. */
>  #define CSCHED2_CREDIT_RESET 0
>  /* Max timer: Maximum time a guest can be run for. */
> @@ -181,11 +183,20 @@
>   */
>  #define __CSFLAG_runq_migrate_request 3
>  #define CSFLAG_runq_migrate_request (1<<__CSFLAG_runq_migrate_request)
> -
> +/*
> + * CSFLAG_vcpu_yield: this vcpu was running, and has called vcpu_yield(). The
> + * scheduler is invoked to see if we can give the cpu to someone else, and
> + * get back to the yielding vcpu in a while.
> + */
> +#define __CSFLAG_vcpu_yield 4
> +#define CSFLAG_vcpu_yield (1<<__CSFLAG_vcpu_yield)
>
>  static unsigned int __read_mostly opt_migrate_resist = 500;
>  integer_param("sched_credit2_migrate_resist", opt_migrate_resist);
>
> +static unsigned int __read_mostly opt_yield_bias = 1000;
> +integer_param("sched_credit2_yield_bias_us", opt_yield_bias);
> +
>  /*
>   * Useful macros
>   */
> @@ -1431,6 +1442,14 @@ out:
>  }
>
>  static void
> +csched2_vcpu_yield(const struct scheduler *ops, struct vcpu *v)
> +{
> +    struct csched2_vcpu * const svc = CSCHED2_VCPU(v);
> +
> +    __set_bit(__CSFLAG_vcpu_yield, &svc->flags);
> +}
> +
> +static void
>  csched2_context_saved(const struct scheduler *ops, struct vcpu *vc)
>  {
>      struct csched2_vcpu * const svc = CSCHED2_VCPU(vc);
> @@ -2250,26 +2269,39 @@ runq_candidate(struct csched2_runqueue_data *rqd,
>      struct list_head *iter;
>      struct csched2_vcpu *snext = NULL;
>      struct csched2_private *prv = CSCHED2_PRIV(per_cpu(scheduler, cpu));
> +    /*
> +     * If scurr is yielding, temporarily subtract CSCHED2_YIELD_BIAS
> +     * credits to it (where "temporarily" means "for the sake of just
> +     * this scheduling decision).
> +     */
> +    int yield_bias = 0, snext_credit;
>
>      *skipped = 0;
>
> -    /* Default to current if runnable, idle otherwise */
> -    if ( vcpu_runnable(scurr->vcpu) )
> -        snext = scurr;
> -    else
> -        snext = CSCHED2_VCPU(idle_vcpu[cpu]);
> -
>      /*
>       * Return the current vcpu if it has executed for less than ratelimit.
>       * Adjuststment for the selected vcpu's credit and decision
>       * for how long it will run will be taken in csched2_runtime.
> +     *
> +     * Note that, if scurr is yielding, we don't let rate limiting kick in.
> +     * In fact, it may be the case that scurr is about to spin, and there's
> +     * no point forcing it to do so until rate limiting expires.
>       */
> -    if ( prv->ratelimit_us && !is_idle_vcpu(scurr->vcpu) &&
> -         vcpu_runnable(scurr->vcpu) &&
> -         (now - scurr->vcpu->runstate.state_entry_time) <
> -          MICROSECS(prv->ratelimit_us) )
> +    if ( __test_and_clear_bit(__CSFLAG_vcpu_yield, &scurr->flags) )
> +        yield_bias = CSCHED2_YIELD_BIAS;
> +    else if ( prv->ratelimit_us && !is_idle_vcpu(scurr->vcpu) &&
> +              vcpu_runnable(scurr->vcpu) &&
> +              (now - scurr->vcpu->runstate.state_entry_time) <
> +               MICROSECS(prv->ratelimit_us) )
>          return scurr;
>
> +    /* Default to current if runnable, idle otherwise */
> +    if ( vcpu_runnable(scurr->vcpu) )
> +        snext = scurr;
> +    else
> +        snext = CSCHED2_VCPU(idle_vcpu[cpu]);
> +
> +    snext_credit = snext->credit - yield_bias;
>      list_for_each( iter, &rqd->runq )
>      {
>          struct csched2_vcpu * svc = list_entry(iter, struct csched2_vcpu, runq_elem);
> @@ -2293,19 +2325,23 @@ runq_candidate(struct csched2_runqueue_data *rqd,
>              continue;
>          }
>
> -        /* If this is on a different processor, don't pull it unless
> -         * its credit is at least CSCHED2_MIGRATE_RESIST higher. */
> +        /*
> +         * If this is on a different processor, don't pull it unless
> +         * its credit is at least CSCHED2_MIGRATE_RESIST higher.
> +         */
>          if ( svc->vcpu->processor != cpu
> -             && snext->credit + CSCHED2_MIGRATE_RESIST > svc->credit )
> +             && snext_credit + CSCHED2_MIGRATE_RESIST > svc->credit )
>          {
>              (*skipped)++;
>              SCHED_STAT_CRANK(migrate_resisted);
>              continue;
>          }
>
> -        /* If the next one on the list has more credit than current
> -         * (or idle, if current is not runnable), choose it. */
> -        if ( svc->credit > snext->credit )
> +        /*
> +         * If the next one on the list has more credit than current
> +         * (or idle, if current is not runnable), choose it.
> +         */
> +        if ( svc->credit > snext_credit )
>              snext = svc;

...changing this conditional to the following?

    if ( yield || svc->credit > snext->credit )

 -George
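To make the arithmetic of the question above concrete: under the v2 logic,
and with the default bias, George's 9ms/7.8ms example actually still leaves
the yielder running. A standalone C sketch (illustrative values only;
MICROSECS() is redefined locally to mirror Xen's, and this is not code from
the patch):

#include <stdio.h>
#include <stdint.h>

/* Local stand-in for Xen's MICROSECS(): credits are tracked in ns. */
#define MICROSECS(us) ((int64_t)(us) * 1000)

int main(void)
{
    int64_t snext_credit = MICROSECS(9000); /* yielding vcpu: 9ms of credit */
    int64_t top_credit   = MICROSECS(7800); /* head of the runq: 7.8ms      */
    int64_t yield_bias   = MICROSECS(1000); /* CSCHED2_YIELD_BIAS default   */

    /* The v2 patch compares the queue against the yielder's *biased*
     * credit... */
    int64_t biased = snext_credit - yield_bias;

    /* ...and 9ms - 1ms = 8ms still beats 7.8ms, so the yielder runs on.
     * George's boolean-yield alternative would pick the queued vcpu. */
    printf("%s keeps the cpu\n",
           top_credit > biased ? "queued vcpu" : "yielding vcpu");
    return 0;
}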
On Fri, 2016-09-30 at 13:52 +0100, George Dunlap wrote:
> On 30/09/16 03:53, Dario Faggioli wrote:
> >
> > When a vcpu explicitly yields it is usually giving
> > us a hint: "let someone else run and come back
> > to me in a bit."
> >
> > Credit2 isn't, so far, doing anything when a vcpu
> > yields, which means a yield is basically a NOP (well,
> > actually, it's pure overhead, as it causes the scheduler
> > to kick in, but the result is --at least 99% of the time--
> > that the very same vcpu that yielded continues to run).
> >
> > Implement a "preempt bias", to be applied to yielding
> > vcpus. Basically, when evaluating what vcpu to run next,
> > if a vcpu that has just yielded is encountered, we give
> > it a credit penalty, and check whether there is anyone
> > else that would better take over the cpu (of course,
> > if there isn't, the yielding vcpu will continue).
> >
> > The value of this bias can be configured with a boot
> > time parameter, and the default is set to 1 ms.
> >
> > Also, add a yield performance counter, and fix the
> > style of a couple of comments.
> >
> > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
>
> Hmm, I'm sorry for not asking this earlier -- but what's the
> rationale for having a "yield bias", rather than just choosing the
> next runnable guy in the queue on yield, regardless of what his
> credits are?
>
Flexibility, I'd say. Flexibility in deciding 'how strong' a yield
should be and, e.g., in having two different values for two cpupools
(not possible with just this patch, but not a big deal to do in
future).

Sure, if we only think about the spinlock case, that's not really
important (if good at all). OTOH, if you think of yielding as a way
of saying: <<hey, I've still got things to do, and I could go on, but
if there's anyone that has something more important, I'm fine letting
him run for a while>>, well, this implementation gives you a way of
quantifying that "while".

But of course, more flexibility often means more complexity. And in
this case, rather than complexity in the code, what would be hard is
coming up with a good value for a specific workload. IOW, right now we
have no yield at all. Instead of adding a "yield switch", it's
implemented as a "yield knob", which has its up and down sides. I
personally like knobs a lot more than switches... But I see the risk
of people starting to turn the knob, expecting wonders, and being
disappointed (and complaining!) if things don't improve for them! :-P

> If snext has 9ms of credit and the top runnable guy on the runqueue
> has 7.8ms of credit, doesn't it make sense to run him instead of
> making snext run again and burn his credit?
>
Again, in the one use case for which yield is most popular (the
spinlock one), what you say totally makes sense. Which makes me think
that, even if we were to keep (or go back to) using the bias, I'd
probably go with a default value higher than 1ms worth of credits.

*ANYWAY*, you asked for a rationale, and this is mine. All this being
said, though, I honestly think the simple solution you're hinting at
is better, at least for now. Also, I've just tried to implement it
and, yes, doing as you suggest here works, and the code is simpler.
Therefore, I'm going for that. :-)

I've just seen you have applied most of the series already. I'll send
a v3 consisting only of the remaining patches, with this one modified
as suggested.

Thanks and Regards,
Dario
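For reference, the shape of the selection logic Dario is agreeing to
(George's boolean yield instead of a credit bias) can be mocked up in
userspace like this. Toy types and values, purely illustrative; the real
change lands in runq_candidate():

#include <stdbool.h>
#include <stdio.h>

/* Toy stand-ins for csched2_vcpu and the credit-ordered runqueue. */
struct toy_vcpu { const char *name; long credit; };

/* Pick the next vcpu: like runq_candidate(), but with George's boolean
 * yield replacing the biased credit comparison. */
static const struct toy_vcpu *
pick_next(const struct toy_vcpu *scurr, bool yield,
          const struct toy_vcpu *runq, int n)
{
    const struct toy_vcpu *snext = scurr;

    for ( int i = 0; i < n; i++ )
    {
        /* On yield, the head of the queue wins outright; otherwise only
         * a vcpu with strictly more credit preempts snext. */
        if ( yield || runq[i].credit > snext->credit )
            snext = &runq[i];
        break;  /* as in runq_candidate(): "if we got this far, break" */
    }
    return snext;
}

int main(void)
{
    struct toy_vcpu scurr = { "d1v0", 9000000 };       /* 9ms of credit   */
    struct toy_vcpu runq[] = { { "d2v0", 7800000 } };  /* 7.8ms of credit */

    printf("no yield -> %s\n", pick_next(&scurr, false, runq, 1)->name);
    printf("yield    -> %s\n", pick_next(&scurr, true,  runq, 1)->name);
    return 0;
}

With these numbers the non-yield case keeps d1v0 on the cpu, while a yield
hands it to d2v0 regardless of credit, which is exactly the behavioural
difference the two emails above are discussing.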
diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 8ff57fa..4fd3460 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1395,6 +1395,16 @@ Choose the default scheduler.
 ### sched\_credit2\_migrate\_resist
 > `= <integer>`
 
+### sched\_credit2\_yield\_bias\_us
+> `= <integer>`
+
+> Default: `1000`
+
+Set how much a yielding vcpu will be penalized, in order to actually
+give a chance to run to some other vcpu. This is basically a bias, in
+favour of the non-yielding vcpus, expressed in microseconds (default
+is 1ms).
+
 ### sched\_credit\_tslice\_ms
 > `= <integer>`
 
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 72e31b5..fde61ef 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -144,6 +144,8 @@
 #define CSCHED2_MIGRATE_RESIST ((opt_migrate_resist)*MICROSECS(1))
 /* How much to "compensate" a vcpu for L2 migration */
 #define CSCHED2_MIGRATE_COMPENSATION MICROSECS(50)
+/* How big of a bias we should have against a yielding vcpu */
+#define CSCHED2_YIELD_BIAS ((opt_yield_bias)*MICROSECS(1))
 /* Reset: Value below which credit will be reset. */
 #define CSCHED2_CREDIT_RESET 0
 /* Max timer: Maximum time a guest can be run for. */
@@ -181,11 +183,20 @@
  */
 #define __CSFLAG_runq_migrate_request 3
 #define CSFLAG_runq_migrate_request (1<<__CSFLAG_runq_migrate_request)
-
+/*
+ * CSFLAG_vcpu_yield: this vcpu was running, and has called vcpu_yield(). The
+ * scheduler is invoked to see if we can give the cpu to someone else, and
+ * get back to the yielding vcpu in a while.
+ */
+#define __CSFLAG_vcpu_yield 4
+#define CSFLAG_vcpu_yield (1<<__CSFLAG_vcpu_yield)
 
 static unsigned int __read_mostly opt_migrate_resist = 500;
 integer_param("sched_credit2_migrate_resist", opt_migrate_resist);
 
+static unsigned int __read_mostly opt_yield_bias = 1000;
+integer_param("sched_credit2_yield_bias_us", opt_yield_bias);
+
 /*
  * Useful macros
  */
@@ -1431,6 +1442,14 @@ out:
 }
 
 static void
+csched2_vcpu_yield(const struct scheduler *ops, struct vcpu *v)
+{
+    struct csched2_vcpu * const svc = CSCHED2_VCPU(v);
+
+    __set_bit(__CSFLAG_vcpu_yield, &svc->flags);
+}
+
+static void
 csched2_context_saved(const struct scheduler *ops, struct vcpu *vc)
 {
     struct csched2_vcpu * const svc = CSCHED2_VCPU(vc);
@@ -2250,26 +2269,39 @@ runq_candidate(struct csched2_runqueue_data *rqd,
     struct list_head *iter;
     struct csched2_vcpu *snext = NULL;
     struct csched2_private *prv = CSCHED2_PRIV(per_cpu(scheduler, cpu));
+    /*
+     * If scurr is yielding, temporarily subtract CSCHED2_YIELD_BIAS
+     * credits to it (where "temporarily" means "for the sake of just
+     * this scheduling decision).
+     */
+    int yield_bias = 0, snext_credit;
 
     *skipped = 0;
 
-    /* Default to current if runnable, idle otherwise */
-    if ( vcpu_runnable(scurr->vcpu) )
-        snext = scurr;
-    else
-        snext = CSCHED2_VCPU(idle_vcpu[cpu]);
-
     /*
      * Return the current vcpu if it has executed for less than ratelimit.
      * Adjuststment for the selected vcpu's credit and decision
      * for how long it will run will be taken in csched2_runtime.
+     *
+     * Note that, if scurr is yielding, we don't let rate limiting kick in.
+     * In fact, it may be the case that scurr is about to spin, and there's
+     * no point forcing it to do so until rate limiting expires.
      */
-    if ( prv->ratelimit_us && !is_idle_vcpu(scurr->vcpu) &&
-         vcpu_runnable(scurr->vcpu) &&
-         (now - scurr->vcpu->runstate.state_entry_time) <
-          MICROSECS(prv->ratelimit_us) )
+    if ( __test_and_clear_bit(__CSFLAG_vcpu_yield, &scurr->flags) )
+        yield_bias = CSCHED2_YIELD_BIAS;
+    else if ( prv->ratelimit_us && !is_idle_vcpu(scurr->vcpu) &&
+              vcpu_runnable(scurr->vcpu) &&
+              (now - scurr->vcpu->runstate.state_entry_time) <
+               MICROSECS(prv->ratelimit_us) )
         return scurr;
 
+    /* Default to current if runnable, idle otherwise */
+    if ( vcpu_runnable(scurr->vcpu) )
+        snext = scurr;
+    else
+        snext = CSCHED2_VCPU(idle_vcpu[cpu]);
+
+    snext_credit = snext->credit - yield_bias;
     list_for_each( iter, &rqd->runq )
     {
         struct csched2_vcpu * svc = list_entry(iter, struct csched2_vcpu, runq_elem);
@@ -2293,19 +2325,23 @@ runq_candidate(struct csched2_runqueue_data *rqd,
             continue;
         }
 
-        /* If this is on a different processor, don't pull it unless
-         * its credit is at least CSCHED2_MIGRATE_RESIST higher. */
+        /*
+         * If this is on a different processor, don't pull it unless
+         * its credit is at least CSCHED2_MIGRATE_RESIST higher.
+         */
         if ( svc->vcpu->processor != cpu
-             && snext->credit + CSCHED2_MIGRATE_RESIST > svc->credit )
+             && snext_credit + CSCHED2_MIGRATE_RESIST > svc->credit )
         {
             (*skipped)++;
             SCHED_STAT_CRANK(migrate_resisted);
             continue;
         }
 
-        /* If the next one on the list has more credit than current
-         * (or idle, if current is not runnable), choose it. */
-        if ( svc->credit > snext->credit )
+        /*
+         * If the next one on the list has more credit than current
+         * (or idle, if current is not runnable), choose it.
+         */
+        if ( svc->credit > snext_credit )
             snext = svc;
 
         /* In any case, if we got this far, break. */
@@ -2391,7 +2427,8 @@ csched2_schedule(
      */
     if ( tasklet_work_scheduled )
     {
-        trace_var(TRC_CSCHED2_SCHED_TASKLET, 1, 0, NULL);
+        __clear_bit(__CSFLAG_vcpu_yield, &scurr->flags);
+        trace_var(TRC_CSCHED2_SCHED_TASKLET, 1, 0, NULL);
         snext = CSCHED2_VCPU(idle_vcpu[cpu]);
     }
     else
@@ -2923,6 +2960,8 @@ csched2_init(struct scheduler *ops)
     printk(XENLOG_INFO "load tracking window lenght %llu ns\n",
            1ULL << opt_load_window_shift);
 
+    printk(XENLOG_INFO "yield bias value %d us\n", opt_yield_bias);
+
     /* Basically no CPU information is available at this point; just
      * set up basic structures, and a callback when the CPU info is
      * available. */
@@ -2975,6 +3014,7 @@ static const struct scheduler sched_credit2_def = {
     .sleep          = csched2_vcpu_sleep,
     .wake           = csched2_vcpu_wake,
+    .yield          = csched2_vcpu_yield,
 
     .adjust         = csched2_dom_cntl,
     .adjust_global  = csched2_sys_cntl,
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 104d203..5b444c4 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -947,6 +947,8 @@ long vcpu_yield(void)
     SCHED_OP(VCPU2OP(v), yield, v);
     vcpu_schedule_unlock_irq(lock, v);
 
+    SCHED_STAT_CRANK(vcpu_yield);
+
     TRACE_2D(TRC_SCHED_YIELD, current->domain->domain_id, current->vcpu_id);
     raise_softirq(SCHEDULE_SOFTIRQ);
     return 0;
diff --git a/xen/include/xen/perfc_defn.h b/xen/include/xen/perfc_defn.h
index 4a835b8..900fddd 100644
--- a/xen/include/xen/perfc_defn.h
+++ b/xen/include/xen/perfc_defn.h
@@ -23,6 +23,7 @@
 PERFCOUNTER(vcpu_alloc,             "sched: vcpu_alloc")
 PERFCOUNTER(vcpu_insert,            "sched: vcpu_insert")
 PERFCOUNTER(vcpu_remove,            "sched: vcpu_remove")
 PERFCOUNTER(vcpu_sleep,             "sched: vcpu_sleep")
+PERFCOUNTER(vcpu_yield,             "sched: vcpu_yield")
 PERFCOUNTER(vcpu_wake_running,      "sched: vcpu_wake_running")
 PERFCOUNTER(vcpu_wake_onrunq,       "sched: vcpu_wake_onrunq")
 PERFCOUNTER(vcpu_wake_runnable,     "sched: vcpu_wake_runnable")
When a vcpu explicitly yields it is usually giving us a hint: "let
someone else run and come back to me in a bit."

Credit2 isn't, so far, doing anything when a vcpu yields, which means a
yield is basically a NOP (well, actually, it's pure overhead, as it
causes the scheduler to kick in, but the result is --at least 99% of
the time-- that the very same vcpu that yielded continues to run).

Implement a "preempt bias", to be applied to yielding vcpus. Basically,
when evaluating what vcpu to run next, if a vcpu that has just yielded
is encountered, we give it a credit penalty, and check whether there is
anyone else that would better take over the cpu (of course, if there
isn't, the yielding vcpu will continue).

The value of this bias can be configured with a boot time parameter,
and the default is set to 1 ms.

Also, add a yield performance counter, and fix the style of a couple
of comments.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Anshul Makkar <anshul.makkar@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes from v1:
 * add _us to the parameter name, as suggested during review;
 * get rid of the minimum value for the yield bias;
 * apply the yield bias via subtraction of credits from the yielding
   vcpu, rather than via addition to all the others;
 * merge the Credit2 bits of what was patch 7 here, as suggested during
   review.
---
 docs/misc/xen-command-line.markdown |   10 +++++
 xen/common/sched_credit2.c          |   76 +++++++++++++++++++++++++++--------
 xen/common/schedule.c               |    2 +
 xen/include/xen/perfc_defn.h        |    1
 4 files changed, 71 insertions(+), 18 deletions(-)
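Had the v2 knob gone in as posted, it would have been set like any other
integer_param, from the Xen command line. For example, on a distro whose
GRUB scripts honour GRUB_CMDLINE_XEN (an assumption about the boot setup;
the value here is purely illustrative, not a recommendation):

    # /etc/default/grub -- then regenerate the grub config and reboot
    GRUB_CMDLINE_XEN="sched=credit2 sched_credit2_yield_bias_us=2000"

With the v3 approach agreed above, yield becomes a pure on/off behaviour
and no such tuning knob exists.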