| Message ID | 20190809145833.1020-14-jgross@suse.com (mailing list archive) |
|---|---|
| State | Superseded |
| Series | xen: add core scheduling support |
On 09.08.2019 16:57, Juergen Gross wrote:
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -407,6 +407,8 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
>      {
>          get_sched_res(v->processor)->curr = unit;
>          v->is_running = 1;
> +        unit->is_running = 1;
> +        unit->state_entry_time = NOW();
>      }
>      else
>      {

Are idle vCPU-s also going to get grouped into units (I'm sorry,
I don't recall)? If so, just like further down I'd be putting
under question the multiple writing of the field. I'd kind of
expect the unit and all vCPU-s within it to get an identical
state entry time stored.

Also both here and further down I'm puzzled to see that the
unit's field doesn't get set at the same place in code where
the vCPU's field gets set.

> @@ -1663,8 +1666,10 @@ static void schedule(void)
>       * switch, else lost_records resume will not work properly.
>       */
>
> -    ASSERT(!next->is_running);
> +    ASSERT(!next->sched_unit->is_running);
>      next->is_running = 1;
> +    next->sched_unit->is_running = 1;
> +    next->sched_unit->state_entry_time = now;

Isn't the ASSERT() you delete still supposed to be true? In which case
wouldn't it be worthwhile to retain it?

> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -272,6 +272,11 @@ struct sched_unit {
>      struct sched_resource *res;
>      int unit_id;
>
> +    /* Last time unit got (de-)scheduled. */
> +    uint64_t state_entry_time;
> +
> +    /* Currently running on a CPU? */
> +    bool is_running;
>      /* Does soft affinity actually play a role (given hard affinity)? */
>      bool soft_aff_effective;
>      /* Bitmask of CPUs on which this VCPU may run. */

I'm noticing this here, but it may well have been an issue earlier
already (and there may well be later adjustments invalidating my
remark, and of course it's the end result of this series which
matters in the long run): Could you see about adding/removing
fields to this struct (and generally of course also others)
minimizing the number / overall size of holes?

Jan
On 04.09.19 17:06, Jan Beulich wrote:
> On 09.08.2019 16:57, Juergen Gross wrote:
>> --- a/xen/common/schedule.c
>> +++ b/xen/common/schedule.c
>> @@ -407,6 +407,8 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
>>      {
>>          get_sched_res(v->processor)->curr = unit;
>>          v->is_running = 1;
>> +        unit->is_running = 1;
>> +        unit->state_entry_time = NOW();
>>      }
>>      else
>>      {
>
> Are idle vCPU-s also going to get grouped into units (I'm sorry,
> I don't recall)? If so, just like further down I'd be putting
> under question the multiple writing of the field. I'd kind of
> expect the unit and all vCPU-s within it to get an identical
> state entry time stored.

When an idle vcpu is being created its cpu is in no cpupool yet
(a free cpu). Those cpus are always subject to cpu scheduling
(granularity 1). Only when cpus are put into a cpupool is the
granularity adjusted accordingly and the idle vcpus possibly
grouped in units (see patch 45).

Regarding the state entry time: different vcpus of a unit can be
in different states (blocked/running, etc.), so their state entry
times will generally differ. A unit's state entry time will
reflect its scheduling events (being scheduled/descheduled).

> Also both here and further down I'm puzzled to see that the
> unit's field doesn't get set at the same place in code where
> the vCPU's field gets set.

Right. The states of a vcpu and the unit it is belonging to are
coupled, but not identical.

>> @@ -1663,8 +1666,10 @@ static void schedule(void)
>>       * switch, else lost_records resume will not work properly.
>>       */
>>
>> -    ASSERT(!next->is_running);
>> +    ASSERT(!next->sched_unit->is_running);
>>      next->is_running = 1;
>> +    next->sched_unit->is_running = 1;
>> +    next->sched_unit->state_entry_time = now;
>
> Isn't the ASSERT() you delete still supposed to be true? In which case
> wouldn't it be worthwhile to retain it?

No, it is not true any longer. All but one of a unit's vcpus can be
blocked, resulting in the related idle vcpus running. This means that
even with one unit being replaced by another one, the vcpu can be the
same.

>> --- a/xen/include/xen/sched.h
>> +++ b/xen/include/xen/sched.h
>> @@ -272,6 +272,11 @@ struct sched_unit {
>>      struct sched_resource *res;
>>      int unit_id;
>>
>> +    /* Last time unit got (de-)scheduled. */
>> +    uint64_t state_entry_time;
>> +
>> +    /* Currently running on a CPU? */
>> +    bool is_running;
>>      /* Does soft affinity actually play a role (given hard affinity)? */
>>      bool soft_aff_effective;
>>      /* Bitmask of CPUs on which this VCPU may run. */
>
> I'm noticing this here, but it may well have been an issue earlier
> already (and there may well be later adjustments invalidating my
> remark, and of course it's the end result of this series which
> matters in the long run): Could you see about adding/removing
> fields to this struct (and generally of course also others)
> minimizing the number / overall size of holes?

Hmm, yes.

Juergen
On 11.09.2019 15:44, Juergen Gross wrote:
> On 04.09.19 17:06, Jan Beulich wrote:
>> On 09.08.2019 16:57, Juergen Gross wrote:
>>> --- a/xen/common/schedule.c
>>> +++ b/xen/common/schedule.c
>>> @@ -407,6 +407,8 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
>>>      {
>>>          get_sched_res(v->processor)->curr = unit;
>>>          v->is_running = 1;
>>> +        unit->is_running = 1;
>>> +        unit->state_entry_time = NOW();
>>>      }
>>>      else
>>>      {
>>
>> Are idle vCPU-s also going to get grouped into units (I'm sorry,
>> I don't recall)? If so, just like further down I'd be putting
>> under question the multiple writing of the field. I'd kind of
>> expect the unit and all vCPU-s within it to get an identical
>> state entry time stored.
>
> When an idle vcpu is being created its cpu is in no cpupool yet
> (a free cpu). Those cpus are always subject to cpu scheduling
> (granularity 1). Only when cpus are put into a cpupool the
> granularity is adjusted accordingly and the idle-vcpus are
> possibly grouped in units (see patch 45).
>
> Regarding the state entry time: different vcpus of a unit can be
> in different states (blocked/running, etc.), so their state entry
> times will generally differ. A unit's state entry time will
> reflect its scheduling events (being scheduled/descheduled).

But this doesn't (to me) clarify whether the multiple writing
(once per vCPU) is actually correct. Obviously whether this is
merely cosmetic depends on what the consumers of this data are.

>> Also both here and further down I'm puzzled to see that the
>> unit's field doesn't get set at the same place in code where
>> the vCPU's field gets set.
>
> Right. The states of a vcpu and the unit it is belonging to are
> coupled, but not identical.

And this is not (going to be) a problem?

Jan
On 11.09.19 17:06, Jan Beulich wrote:
> On 11.09.2019 15:44, Juergen Gross wrote:
>> On 04.09.19 17:06, Jan Beulich wrote:
>>> On 09.08.2019 16:57, Juergen Gross wrote:
>>>> --- a/xen/common/schedule.c
>>>> +++ b/xen/common/schedule.c
>>>> @@ -407,6 +407,8 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
>>>>      {
>>>>          get_sched_res(v->processor)->curr = unit;
>>>>          v->is_running = 1;
>>>> +        unit->is_running = 1;
>>>> +        unit->state_entry_time = NOW();
>>>>      }
>>>>      else
>>>>      {
>>>
>>> Are idle vCPU-s also going to get grouped into units (I'm sorry,
>>> I don't recall)? If so, just like further down I'd be putting
>>> under question the multiple writing of the field. I'd kind of
>>> expect the unit and all vCPU-s within it to get an identical
>>> state entry time stored.
>>
>> When an idle vcpu is being created its cpu is in no cpupool yet
>> (a free cpu). Those cpus are always subject to cpu scheduling
>> (granularity 1). Only when cpus are put into a cpupool the
>> granularity is adjusted accordingly and the idle-vcpus are
>> possibly grouped in units (see patch 45).
>>
>> Regarding the state entry time: different vcpus of a unit can be
>> in different states (blocked/running, etc.), so their state entry
>> times will generally differ. A unit's state entry time will
>> reflect its scheduling events (being scheduled/descheduled).
>
> But this doesn't (to me) clarify whether the multiple writing
> (once per vCPU) is actually correct. Obviously whether this is
> merely cosmetic depends on what the consumers of this data are.

There is no multiple writing. This is a preparatory patch for being
able to remove the vcpu dependencies from the individual schedulers.
Later, when multiple vcpus per unit are introduced, there will be a
clear distinction between vcpu switching (context switches) and
sched_unit switching (scheduling events). vcpu data will be adjusted
at context_switch time, while unit data will be adjusted only when
scheduling units.

>>> Also both here and further down I'm puzzled to see that the
>>> unit's field doesn't get set at the same place in code where
>>> the vCPU's field gets set.
>>
>> Right. The states of a vcpu and the unit it is belonging to are
>> coupled, but not identical.
>
> And this is not (going to be) a problem?

With a proper distinction as explained above, no.

Juergen
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 339702f3ec..143c7543e8 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -709,7 +709,7 @@ __csched_vcpu_is_migrateable(const struct csched_private *prv, struct vcpu *vc,
      * The caller is supposed to have already checked that vc is also
      * not running.
      */
-    ASSERT(!vc->is_running);
+    ASSERT(!vc->sched_unit->is_running);
 
     return !__csched_vcpu_is_cache_hot(prv, svc) &&
            cpumask_test_cpu(dest_cpu, mask);
@@ -1033,7 +1033,8 @@ csched_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
 
     lock = unit_schedule_lock_irq(unit);
 
-    if ( !__vcpu_on_runq(svc) && vcpu_runnable(vc) && !vc->is_running )
+    if ( !__vcpu_on_runq(svc) && vcpu_runnable(vc) &&
+         !vc->sched_unit->is_running )
         runq_insert(svc);
 
     unit_schedule_unlock_irq(lock, unit);
@@ -1646,8 +1647,9 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
          * vCPUs with useful soft affinities in some sort of bitmap
          * or counter.
          */
-        if ( vc->is_running || (balance_step == BALANCE_SOFT_AFFINITY &&
-             !has_soft_affinity(vc->sched_unit)) )
+        if ( vc->sched_unit->is_running ||
+             (balance_step == BALANCE_SOFT_AFFINITY &&
+              !has_soft_affinity(vc->sched_unit)) )
             continue;
 
         affinity_balance_cpumask(vc->sched_unit, balance_step, cpumask_scratch);
@@ -1855,7 +1857,7 @@ csched_schedule(
                     (unsigned char *)&d);
     }
 
-    runtime = now - current->runstate.state_entry_time;
+    runtime = now - current->sched_unit->state_entry_time;
     if ( runtime < 0 ) /* Does this ever happen? */
         runtime = 0;
 
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index ed83381ddb..5487d087e1 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -1284,7 +1284,7 @@ runq_insert(const struct scheduler *ops, struct csched2_unit *svc)
     ASSERT(&svc->rqd->runq == runq);
     ASSERT(!is_idle_vcpu(svc->vcpu));
-    ASSERT(!svc->vcpu->is_running);
+    ASSERT(!svc->vcpu->sched_unit->is_running);
     ASSERT(!(svc->flags & CSFLAG_scheduled));
 
     list_for_each( iter, runq )
@@ -1341,8 +1341,8 @@ static inline bool is_preemptable(const struct csched2_unit *svc,
     if ( ratelimit <= CSCHED2_RATELIMIT_TICKLE_TOLERANCE )
         return true;
 
-    ASSERT(svc->vcpu->is_running);
-    return now - svc->vcpu->runstate.state_entry_time >
+    ASSERT(svc->vcpu->sched_unit->is_running);
+    return now - svc->vcpu->sched_unit->state_entry_time >
            ratelimit - CSCHED2_RATELIMIT_TICKLE_TOLERANCE;
 }
@@ -2932,7 +2932,7 @@ csched2_dom_cntl(
         {
             svc = csched2_unit(v->sched_unit);
             lock = unit_schedule_lock(svc->vcpu->sched_unit);
-            if ( v->is_running )
+            if ( v->sched_unit->is_running )
             {
                 unsigned int cpu = v->processor;
                 struct csched2_runqueue_data *rqd = c2rqd(ops, cpu);
@@ -3205,8 +3205,8 @@ csched2_runtime(const struct scheduler *ops, int cpu,
     if ( prv->ratelimit_us )
     {
         s_time_t ratelimit_min = MICROSECS(prv->ratelimit_us);
-        if ( snext->vcpu->is_running )
-            ratelimit_min = snext->vcpu->runstate.state_entry_time +
+        if ( snext->vcpu->sched_unit->is_running )
+            ratelimit_min = snext->vcpu->sched_unit->state_entry_time +
                             MICROSECS(prv->ratelimit_us) - now;
         if ( ratelimit_min > min_time )
             min_time = ratelimit_min;
@@ -3303,7 +3303,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
      * no point forcing it to do so until rate limiting expires.
      */
     if ( !yield && prv->ratelimit_us && vcpu_runnable(scurr->vcpu) &&
-         (now - scurr->vcpu->runstate.state_entry_time) <
+         (now - scurr->vcpu->sched_unit->state_entry_time) <
           MICROSECS(prv->ratelimit_us) )
     {
         if ( unlikely(tb_init_done) )
@@ -3314,7 +3314,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
             } d;
             d.dom = scurr->vcpu->domain->domain_id;
             d.vcpu = scurr->vcpu->vcpu_id;
-            d.runtime = now - scurr->vcpu->runstate.state_entry_time;
+            d.runtime = now - scurr->vcpu->sched_unit->state_entry_time;
             __trace_var(TRC_CSCHED2_RATELIMIT, 1,
                         sizeof(d),
                         (unsigned char *)&d);
@@ -3562,7 +3562,7 @@ csched2_schedule(
     if ( snext != scurr )
     {
         ASSERT(snext->rqd == rqd);
-        ASSERT(!snext->vcpu->is_running);
+        ASSERT(!snext->vcpu->sched_unit->is_running);
 
         runq_remove(snext);
         __set_bit(__CSFLAG_scheduled, &snext->flags);
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index b662924483..df1a874c97 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -906,7 +906,7 @@ rt_unit_insert(const struct scheduler *ops, struct sched_unit *unit)
     {
         replq_insert(ops, svc);
 
-        if ( !vc->is_running )
+        if ( !unit->is_running )
             runq_insert(ops, svc);
     }
     unit_schedule_unlock_irq(lock, unit);
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 8cbb6320c6..03119af25c 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -407,6 +407,8 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
     {
         get_sched_res(v->processor)->curr = unit;
         v->is_running = 1;
+        unit->is_running = 1;
+        unit->state_entry_time = NOW();
     }
     else
     {
@@ -727,7 +729,8 @@ static void vcpu_migrate_finish(struct vcpu *v)
      * context_saved(); and in any case, if the bit is cleared, then
      * someone else has already done the work so we don't need to.
      */
-    if ( v->is_running || !test_bit(_VPF_migrating, &v->pause_flags) )
+    if ( v->sched_unit->is_running ||
+         !test_bit(_VPF_migrating, &v->pause_flags) )
         return;
 
     old_cpu = new_cpu = v->processor;
@@ -781,7 +784,7 @@ static void vcpu_migrate_finish(struct vcpu *v)
      * because they both happen in (different) spinlock regions, and those
      * regions are strictly serialised.
      */
-    if ( v->is_running ||
+    if ( v->sched_unit->is_running ||
          !test_and_clear_bit(_VPF_migrating, &v->pause_flags) )
     {
         sched_spin_unlock_double(old_lock, new_lock, flags);
@@ -809,7 +812,7 @@ void vcpu_force_reschedule(struct vcpu *v)
 {
     spinlock_t *lock = unit_schedule_lock_irq(v->sched_unit);
 
-    if ( v->is_running )
+    if ( v->sched_unit->is_running )
         vcpu_migrate_start(v);
 
     unit_schedule_unlock_irq(lock, v->sched_unit);
@@ -1663,8 +1666,10 @@ static void schedule(void)
      * switch, else lost_records resume will not work properly.
      */
 
-    ASSERT(!next->is_running);
+    ASSERT(!next->sched_unit->is_running);
     next->is_running = 1;
+    next->sched_unit->is_running = 1;
+    next->sched_unit->state_entry_time = now;
 
     pcpu_schedule_unlock_irq(lock, cpu);
@@ -1686,6 +1691,8 @@ void context_saved(struct vcpu *prev)
 
     smp_wmb();
     prev->is_running = 0;
+    prev->sched_unit->is_running = 0;
+    prev->sched_unit->state_entry_time = NOW();
 
     /* Check for migration request /after/ clearing running flag. */
     smp_mb();
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index fa2dbf47ee..f506c0cbd4 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -272,6 +272,11 @@ struct sched_unit {
     struct sched_resource *res;
     int unit_id;
 
+    /* Last time unit got (de-)scheduled. */
+    uint64_t state_entry_time;
+
+    /* Currently running on a CPU? */
+    bool is_running;
     /* Does soft affinity actually play a role (given hard affinity)? */
     bool soft_aff_effective;
     /* Bitmask of CPUs on which this VCPU may run. */
Add an is_running indicator to struct sched_unit which will be set
whenever the unit is being scheduled. Switch scheduler code to use
unit->is_running instead of vcpu->is_running for scheduling decisions.

At the same time introduce a state_entry_time field in struct
sched_unit being updated whenever the is_running indicator is changed.
Use that new field in the schedulers instead of the similar vcpu field.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
RFC V2: fix arm build, don't drop v->is_running
---
 xen/common/sched_credit.c  | 12 +++++++-----
 xen/common/sched_credit2.c | 18 +++++++++---------
 xen/common/sched_rt.c      |  2 +-
 xen/common/schedule.c      | 15 +++++++++++----
 xen/include/xen/sched.h    |  5 +++++
 5 files changed, 33 insertions(+), 19 deletions(-)