Message ID: 1456430736-4606-1-git-send-email-tiche@seas.upenn.edu (mailing list archive)
State: New, archived
Hey again,

Thanks for turning this around so quickly.

We are getting closer and closer, although (of course :-)) I have some
more comments.

However, is there a particular reason why you are keeping the RFC tag?
As long as you keep it, it's like saying that you are chasing feedback,
but that you yourself do not think the patch should be considered for
upstreaming. As far as my opinion goes, this patch is not ready to go in
right now (as I said, I've got questions and comments), but its status
is way past RFC.

On Thu, 2016-02-25 at 15:05 -0500, Tianyang Chen wrote:
> changes since v5:
> removed unnecessary vcpu_on_replq() checks
> deadline_queue_insert() returns a flag to indicate if it's
> needed to re-program the timer
> removed unnecessary timer checks
> added helper functions to remove vcpus from queues
> coding style
>
> Changes since v4:
> removed unnecessary replenishment queue checks in vcpu_wake()
> extended replq_remove() to all cases in vcpu_sleep()
> used _deadline_queue_insert() helper function for both queues
> _replq_insert() and _replq_remove() program timer internally
>
> Changes since v3:
> removed running queue.
> added repl queue to keep track of repl events.
> timer is now per scheduler.
> timer is init on a valid cpu in a cpupool.
>
So, this does not belong here. It is ok to have it in this part of the
email, but it should not land in the actual commit changelog once the
patch is committed into Xen's git tree.

The way to achieve that is to put this summary of changes below the
actual changelog, and below the Signed-off-by lines, after a marker
that looks like this: "---".

> Budget replenishment and enforcement are separated by adding
> a replenishment timer, which fires at the next most imminent
> release time of all runnable vcpus.
>
> A new runningq has been added to keep track of all vcpus that
> are on pcpus.
>
Mmm... is this the proper changelog? runningq is something we discussed,
and that appeared in v2, but it is certainly no longer here... :-O

> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> index 2e5430f..16f77f9 100644
> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
>
> @@ -213,8 +220,14 @@ static inline struct list_head *rt_depletedq(const struct scheduler *ops)
>      return &rt_priv(ops)->depletedq;
>  }
>
> +static inline struct list_head *rt_replq(const struct scheduler *ops)
> +{
> +    return &rt_priv(ops)->replq;
> +}
> +
>  /*
> - * Queue helper functions for runq and depletedq
> + * Queue helper functions for runq, depletedq
> + * and replenishment event queue
>
Full stop. :-) In any case, I'd change this into something like:
"Helper functions for manipulating the runqueue, the depleted queue,
and the replenishment events queue."

> @@ -228,6 +241,18 @@ __q_elem(struct list_head *elem)
>      return list_entry(elem, struct rt_vcpu, q_elem);
>  }
>
> +static struct rt_vcpu *
> +__replq_elem(struct list_head *elem)
> +{
> +    return list_entry(elem, struct rt_vcpu, replq_elem);
> +}
> +
> +static int
> +__vcpu_on_replq(const struct rt_vcpu *svc)
> +{
> +    return !list_empty(&svc->replq_elem);
> +}
> +
Ok, sorry for changing my mind again, but I really can't stand seeing
these underscores. Please, rename these as replq_elem() and
vcpu_on_replq(). There is no sensible reason why we should prefix these
with '__'.
I know, that will create some amount of inconsistency, but:
- there is inconsistency already (here and in other sched_* files)
- not introducing more __-prefixed functions is a step in the right
  direction; we'll convert the ones that are already there over time.

> + * Removing a vcpu from the replenishment queue could
> + * re-program the timer for the next replenishment event
> + * if there is any on the list
> + */
> +static inline void
> +__replq_remove(const struct scheduler *ops, struct rt_vcpu *svc)
> +{
> +    struct rt_private *prv = rt_priv(ops);
> +    struct list_head *replq = rt_replq(ops);
> +    struct timer* repl_timer = prv->repl_timer;
> +
> +    /*
> +     * Disarm the timer before re-programming it.
> +     * It doesn't matter if the vcpu to be removed
> +     * is on top of the list or not because the timer
> +     * is stopped and needs to be re-programmed anyway
> +     */
> +    stop_timer(repl_timer);
> +
> +    deadline_queue_remove(&svc->replq_elem);
> +
> +    /* re-arm the timer for the next replenishment event */
> +    if( !list_empty(replq) )
> +    {
> +        struct rt_vcpu *svc_next = __replq_elem(replq->next);
> +        set_timer(repl_timer, svc_next->cur_deadline);
> +    }
>
Wait, maybe you misunderstood and/or I did not make myself clear enough
(in which case, sorry). I never meant to say "always stop the timer".
Actually, in one of my last emails I said the opposite, and I think that
would be the best thing to do.

Do you think there is any specific reason why we need to always stop
and restart it? If not, I think we can:
- have deadline_queue_remove() also return whether the element removed
  was the first one (i.e., the one the timer was programmed after);
- if it was, re-program the timer after the new front of the queue;
- if it wasn't, there is nothing to do.

Thoughts?

> +/*
> + * A utility function that inserts a vcpu to a
> + * queue based on certain order (EDF). The returned
> + * value is 1 if a vcpu has been inserted to the
> + * front of a list
> + */
> +static int
> +deadline_queue_insert(struct rt_vcpu * (*_get_q_elem)(struct list_head *elem),
> +    struct rt_vcpu *svc, struct list_head *elem, struct list_head *queue)
> +{
> +    struct list_head *iter;
> +    int pos = 0;
> +
> +    list_for_each(iter, queue)
> +    {
> +        struct rt_vcpu * iter_svc = (*_get_q_elem)(iter);
> +        if ( svc->cur_deadline <= iter_svc->cur_deadline )
> +            break;
> +
> +        pos++;
> +    }
> +
> +    list_add_tail(elem, iter);
> +    return !pos;
> }
Ok.

> @@ -405,22 +500,38 @@ __runq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
>
>      /* add svc to runq if svc still has budget */
>      if ( svc->cur_budget > 0 )
> -    {
> -        list_for_each(iter, runq)
> -        {
> -            struct rt_vcpu * iter_svc = __q_elem(iter);
> -            if ( svc->cur_deadline <= iter_svc->cur_deadline )
> -                break;
> -        }
> -        list_add_tail(&svc->q_elem, iter);
> -    }
> +        deadline_queue_insert(&__q_elem, svc, &svc->q_elem, runq);
>      else
>      {
>          list_add(&svc->q_elem, &prv->depletedq);
> +        ASSERT( __vcpu_on_replq(svc) );
>
So, by doing this, you're telling me that, if the vcpu is added to the
depleted queue, there must be a replenishment planned for it (or the
ASSERT() would fail).

But if we are adding it to the runq, there may or may not be a
replenishment planned for it.

Is this what we want? Why must there not already be a replenishment
planned (to happen at its next deadline) for a vcpu going into the runq?

> /*
> + * Insert svc into the replenishment event list
> + * in replenishment time order.
> + * vcpus that need to be replenished earlier go first.
> + * The timer may be re-programmed if svc is inserted
> + * at the front of the event list.
> + */
> +static void
> +__replq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
> +{
> +    struct list_head *replq = rt_replq(ops);
> +    struct rt_private *prv = rt_priv(ops);
> +    struct timer *repl_timer = prv->repl_timer;
> +    int set;
> +
> +    ASSERT( !__vcpu_on_replq(svc) );
> +
> +    set = deadline_queue_insert(&__replq_elem, svc, &svc->replq_elem, replq);
> +
> +    if( set )
> +        set_timer(repl_timer,svc->cur_deadline);
> +}
A matter of taste, mostly, but I'd avoid the local variable (if this
function at some point becomes more complex, then we'll see; but for
now, I think it's ok to just use the return value of
deadline_queue_insert() inside the if()).

> @@ -868,6 +968,8 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
>          set_bit(__RTDS_delayed_runq_add, &scurr->flags);
>
>      snext->last_start = now;
> +
You need to add neither this blank line...

> +    ret.time = -1; /* if an idle vcpu is picked */
>      if ( !is_idle_vcpu(snext->vcpu) )
>      {
>          if ( snext != scurr )
> @@ -880,9 +982,11 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
>              snext->vcpu->processor = cpu;
>              ret.migrated = 1;
>          }
> +
> +        ret.time = snext->budget; /* invoke the scheduler next time */
> +
...nor this one.

>      }
> @@ -924,6 +1028,8 @@ rt_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
>          __q_remove(svc);
>      else if ( svc->flags & RTDS_delayed_runq_add )
>          clear_bit(__RTDS_delayed_runq_add, &svc->flags);
> +
> +    __replq_remove(ops, svc);
What I said in my last email is that you can probably get rid of this
(see below, when commenting on context_saved()).

> @@ -1051,6 +1153,22 @@ rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
>      else
>          SCHED_STAT_CRANK(vcpu_wake_not_runnable);
>
> +    /*
> +     * If a deadline passed while svc was asleep/blocked, we need new
> +     * scheduling parameters ( a new deadline and full budget), and
> +     * also a new replenishment event
> +     */
> +    if ( now >= svc->cur_deadline)
> +    {
> +        rt_update_deadline(now, svc);
> +        __replq_remove(ops, svc);
> +    }
> +
> +    if( !__vcpu_on_replq(svc) )
> +    {
> +        __replq_insert(ops, svc);
> +        ASSERT( vcpu_runnable(vc) );
Mmm... what's this assert about?

> @@ -1087,10 +1193,6 @@ static void
>  rt_context_saved(const struct scheduler *ops, struct vcpu *vc)
>  {
>      struct rt_vcpu *svc = rt_vcpu(vc);
> -    struct rt_vcpu *snext = NULL;
> -    struct rt_dom *sdom = NULL;
> -    struct rt_private *prv = rt_priv(ops);
> -    cpumask_t *online;
>      spinlock_t *lock = vcpu_schedule_lock_irq(vc);
>
>      clear_bit(__RTDS_scheduled, &svc->flags);
> @@ -1102,14 +1204,7 @@ rt_context_saved(const struct scheduler *ops, struct vcpu *vc)
>          likely(vcpu_runnable(vc)) )
>      {
>          __runq_insert(ops, svc);
> -        __repl_update(ops, NOW());
> -
> -        ASSERT(!list_empty(&prv->sdom));
> -        sdom = list_entry(prv->sdom.next, struct rt_dom, sdom_elem);
> -        online = cpupool_domain_cpumask(sdom->dom);
> -        snext = __runq_pick(ops, online); /* pick snext from ALL cpus */
> -
> -        runq_tickle(ops, snext);
> +        runq_tickle(ops, svc);
>      }
So, here we are.
What I meant was to make this function look more or less like this:

static void
rt_context_saved(const struct scheduler *ops, struct vcpu *vc)
{
    struct rt_vcpu *svc = rt_vcpu(vc);
    spinlock_t *lock = vcpu_schedule_lock_irq(vc);

    clear_bit(__RTDS_scheduled, &svc->flags);
    /* not insert idle vcpu to runq */
    if ( is_idle_vcpu(vc) )
        goto out;

    if ( test_and_clear_bit(__RTDS_delayed_runq_add, &svc->flags) &&
         likely(vcpu_runnable(vc)) )
    {
        __runq_insert(ops, svc);
        runq_tickle(ops, svc);
    }
    else
        __replq_remove(ops, svc);
out:
    vcpu_schedule_unlock_irq(lock, vc);
}

And, as said above, if you do this, try also removing the
__replq_remove() call from rt_vcpu_sleep(); this one should cover for
that (and, actually, more than just that!).

After all this, check whether you still have the assert in
__replq_insert() triggering, and let me know.

Thanks and Regards,
Dario
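For reference, a minimal sketch of the timer-reprogramming scheme being
proposed here, reusing the patch's helper names (the exact shape is
illustrative only, an assumption rather than the code that was
eventually committed):

/*
 * Sketch: have the removal helper take the queue too, so it can
 * report whether the element removed was the head of the queue
 * (i.e., the event the timer was programmed after).
 */
static inline int
deadline_queue_remove(struct list_head *queue, struct list_head *elem)
{
    int first = (queue->next == elem);

    list_del_init(elem);
    return first;
}

static inline void
__replq_remove(const struct scheduler *ops, struct rt_vcpu *svc)
{
    struct rt_private *prv = rt_priv(ops);
    struct list_head *replq = rt_replq(ops);
    struct timer *repl_timer = prv->repl_timer;

    ASSERT( __vcpu_on_replq(svc) );

    if ( deadline_queue_remove(replq, &svc->replq_elem) )
    {
        /* The timer was programmed for this vcpu: re-arm it for the
         * new head of the queue, or stop it if the queue is empty. */
        if ( !list_empty(replq) )
            set_timer(repl_timer, __replq_elem(replq->next)->cur_deadline);
        else
            stop_timer(repl_timer);
    }
}

With this, removing any element other than the head leaves the timer
alone, which is the common case.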
On 2/25/2016 6:31 PM, Dario Faggioli wrote:
> Hey again,
>
> Thanks for turning this around so quickly.
>
> We are getting closer and closer, although (of course :-)) I have some
> more comments.
>
> However, is there a particular reason why you are keeping the RFC tag?
> As long as you keep it, it's like saying that you are chasing feedback,
> but that you yourself do not think the patch should be considered for
> upstreaming. As far as my opinion goes, this patch is not ready to go
> in right now (as I said, I've got questions and comments), but its
> status is way past RFC.
>
Oh OK, I had the impression that, since RFC means "request for comments"
and I am indeed asking for comments, the tag should always be used.

> On Thu, 2016-02-25 at 15:05 -0500, Tianyang Chen wrote:
>> changes since v5:
>> removed unnecessary vcpu_on_replq() checks
>> deadline_queue_insert() returns a flag to indicate if it's
>> needed to re-program the timer
>> removed unnecessary timer checks
>> added helper functions to remove vcpus from queues
>> coding style
>>
>> Changes since v4:
>> removed unnecessary replenishment queue checks in vcpu_wake()
>> extended replq_remove() to all cases in vcpu_sleep()
>> used _deadline_queue_insert() helper function for both queues
>> _replq_insert() and _replq_remove() program timer internally
>>
>> Changes since v3:
>> removed running queue.
>> added repl queue to keep track of repl events.
>> timer is now per scheduler.
>> timer is init on a valid cpu in a cpupool.
>>
> So, this does not belong here. It is ok to have it in this part of the
> email, but it should not land in the actual commit changelog once the
> patch is committed into Xen's git tree.
>
> The way to achieve that is to put this summary of changes below the
> actual changelog, and below the Signed-off-by lines, after a marker
> that looks like this: "---".
>
>> Budget replenishment and enforcement are separated by adding
>> a replenishment timer, which fires at the next most imminent
>> release time of all runnable vcpus.
>>
>> A new runningq has been added to keep track of all vcpus that
>> are on pcpus.
>>
> Mmm... is this the proper changelog? runningq is something we
> discussed, and that appeared in v2, but it is certainly no longer
> here... :-O
>
oops...

> Wait, maybe you misunderstood and/or I did not make myself clear enough
> (in which case, sorry). I never meant to say "always stop the timer".
> Actually, in one of my last emails I said the opposite, and I think
> that would be the best thing to do.
>
> Do you think there is any specific reason why we need to always stop
> and restart it? If not, I think we can:
> - have deadline_queue_remove() also return whether the element removed
>   was the first one (i.e., the one the timer was programmed after);
> - if it was, re-program the timer after the new front of the queue;
> - if it wasn't, there is nothing to do.
>
> Thoughts?
>
It was my thought originally that the timer needs to be re-programmed
only when the top vcpu is taken off. So did you mean that, when I
manipulated repl_timer->expires manually, the timer should have been
stopped using the proper timer API? That manipulation is gone now.
Also, set_timer() internally disables the timer, so I assume it's safe
just to call set_timer() directly, right?
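To illustrate the point about set_timer() (next_event here is just a
stand-in for the next replenishment time; the premise, as stated in the
thread, is that set_timer() internally disarms a pending timer before
re-arming it):

    /* Explicit, but the stop is redundant: */
    stop_timer(repl_timer);
    set_timer(repl_timer, next_event);

    /* Equivalent, and sufficient on its own: */
    set_timer(repl_timer, next_event);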
>> @@ -405,22 +500,38 @@ __runq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
>>
>>      /* add svc to runq if svc still has budget */
>>      if ( svc->cur_budget > 0 )
>> -    {
>> -        list_for_each(iter, runq)
>> -        {
>> -            struct rt_vcpu * iter_svc = __q_elem(iter);
>> -            if ( svc->cur_deadline <= iter_svc->cur_deadline )
>> -                break;
>> -        }
>> -        list_add_tail(&svc->q_elem, iter);
>> -    }
>> +        deadline_queue_insert(&__q_elem, svc, &svc->q_elem, runq);
>>      else
>>      {
>>          list_add(&svc->q_elem, &prv->depletedq);
>> +        ASSERT( __vcpu_on_replq(svc) );
>>
> So, by doing this, you're telling me that, if the vcpu is added to the
> depleted queue, there must be a replenishment planned for it (or the
> ASSERT() would fail).
>
> But if we are adding it to the runq, there may or may not be a
> replenishment planned for it.
>
> Is this what we want? Why must there not already be a replenishment
> planned (to happen at its next deadline) for a vcpu going into the
> runq?
>
It looks like the current code doesn't add a vcpu to the replenishment
queue when vcpu_insert() is called. When the scheduler is initialized,
all the vcpus are added to the replenishment queue only after waking up
from sleep. This needs to be changed (adding the vcpu to the replq in
vcpu_insert()) to make it consistent, in the sense that whenever
rt_vcpu_insert() is called, the vcpu needs to have a corresponding
replenishment event queued.

This way, the ASSERT() covers both cases in __runq_insert(), enforcing
the fact that "when a vcpu is inserted to the runq/depletedq, a
replenishment event is waiting for it".

>> @@ -1087,10 +1193,6 @@ static void
>>  rt_context_saved(const struct scheduler *ops, struct vcpu *vc)
>>  {
>>      struct rt_vcpu *svc = rt_vcpu(vc);
>> -    struct rt_vcpu *snext = NULL;
>> -    struct rt_dom *sdom = NULL;
>> -    struct rt_private *prv = rt_priv(ops);
>> -    cpumask_t *online;
>>      spinlock_t *lock = vcpu_schedule_lock_irq(vc);
>>
>>      clear_bit(__RTDS_scheduled, &svc->flags);
>> @@ -1102,14 +1204,7 @@ rt_context_saved(const struct scheduler *ops, struct vcpu *vc)
>>          likely(vcpu_runnable(vc)) )
>>      {
>>          __runq_insert(ops, svc);
>> -        __repl_update(ops, NOW());
>> -
>> -        ASSERT(!list_empty(&prv->sdom));
>> -        sdom = list_entry(prv->sdom.next, struct rt_dom, sdom_elem);
>> -        online = cpupool_domain_cpumask(sdom->dom);
>> -        snext = __runq_pick(ops, online); /* pick snext from ALL cpus */
>> -
>> -        runq_tickle(ops, snext);
>> +        runq_tickle(ops, svc);
>>      }
>>
> So, here we are.
>
> What I meant was to make this function look more or less like this:
>
> static void
> rt_context_saved(const struct scheduler *ops, struct vcpu *vc)
> {
>     struct rt_vcpu *svc = rt_vcpu(vc);
>     spinlock_t *lock = vcpu_schedule_lock_irq(vc);
>
>     clear_bit(__RTDS_scheduled, &svc->flags);
>     /* not insert idle vcpu to runq */
>     if ( is_idle_vcpu(vc) )
>         goto out;
>
>     if ( test_and_clear_bit(__RTDS_delayed_runq_add, &svc->flags) &&
>          likely(vcpu_runnable(vc)) )
>     {
>         __runq_insert(ops, svc);
>         runq_tickle(ops, svc);
>     }
>     else
>         __replq_remove(ops, svc);
> out:
>     vcpu_schedule_unlock_irq(lock, vc);
> }
>
> And, as said above, if you do this, try also removing the
> __replq_remove() call from rt_vcpu_sleep(); this one should cover for
> that (and, actually, more than just that!).
>
> After all this, check whether you still have the assert in
> __replq_insert() triggering, and let me know.
So after moving the __replq_remove() call to rt_context_saved(), the
assert in __replq_insert() still fails when dom0 boots up.
(XEN) Assertion '!vcpu_on_replq(svc)' failed at sched_rt.c:524
(XEN) ----[ Xen-4.7-unstable x86_64 debug=y Tainted: C ]----
(XEN) CPU: 0
(XEN) RIP: e008:[<ffff82d080128c07>] sched_rt.c#__replq_insert+0x2b/0x64
(XEN) RFLAGS: 0000000000010002 CONTEXT: hypervisor (d0v3)
(XEN) rax: 0000000000000001 rbx: ffff83023b522940 rcx: 0000000000000001
(XEN) rdx: 00000031bb1b9980 rsi: ffff83023b486760 rdi: ffff83023b486760
(XEN) rbp: ffff8300bfcffd48 rsp: ffff8300bfcffd28 r8: 0000000000000004
(XEN) r9: 00000000deadbeef r10: ffff82d08025f5a0 r11: 0000000000000206
(XEN) r12: ffff83023b486760 r13: ffff83023b522d80 r14: ffff83023b4b5000
(XEN) r15: 000000023a6e774b cr0: 0000000080050033 cr4: 00000000000406a0
(XEN) cr3: 0000000231c0d000 cr2: ffff8802200d81f8
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) Xen stack trace from rsp=ffff8300bfcffd28:
(XEN)    ffff82d08019642e ffff83023b486760 ffff8300bfd47000 ffff82d080299b00
(XEN)    ffff8300bfcffd88 ffff82d08012a072 ffff8300bfcffd70 ffff8300bfd47000
(XEN)    000000023a6e75c0 ffff83023b522940 ffff82d08032bc00 0000000000000282
(XEN)    ffff8300bfcffdd8 ffff82d08012be2c ffff83023b4b5000 ffff83023b4f1000
(XEN)    ffff8300bfd44000 ffff8300bfd47000 0000000000000001 ffff83023b4b40c0
(XEN)    ffff880230c14440 0000000000000000 ffff8300bfcffde8 ffff82d08012c347
(XEN)    ffff8300bfcffe08 ffff82d080169cea ffff83023b4b5000 0000000000000003
(XEN)    ffff8300bfcffe18 ffff82d080169d65 ffff8300bfcffe38 ffff82d08010762a
(XEN)    ffff83023b4b40c0 ffff83023b4b5000 ffff8300bfcffe68 ffff82d08010822a
(XEN)    ffff8300bfcffe68 fffffffffffffff2 ffff88022061dcb4 0000000000000000
(XEN)    ffff8300bfcffef8 ffff82d0801096fc 0000000000000001 ffff8300bfcfff18
(XEN)    ffff8300bfcffef8 ffff82d080240e85 ffff8300bfcf8000 0000000000000000
(XEN)    0000000000000246 ffffffff810013aa 0000000000000003 ffffffff810013aa
(XEN)    ffff8300bfcffee8 ffff8300bfd44000 ffff8802205e8000 0000000000000000
(XEN)    ffff880230c14440 0000000000000000 00007cff403000c7 ffff82d0802439e2
(XEN)    ffffffff8100140a 0000000000000020 0000000000000003 ffff880230c71900
(XEN)    ffff8802206584d0 ffff880220658000 ffff88022061dcb8 0000000000000000
(XEN)    0000000000000206 0000000000000000 ffff880223000168 ffff880223408e00
(XEN)    0000000000000020 ffffffff8100140a 0000000000000000 ffff88022061dcb4
(XEN)    0000000000000004 0001010000000000 ffffffff8100140a 000000000000e033
(XEN) Xen call trace:
(XEN)    [<ffff82d080128c07>] sched_rt.c#__replq_insert+0x2b/0x64
(XEN)    [<ffff82d08012a072>] sched_rt.c#rt_vcpu_wake+0xf2/0x12c
(XEN)    [<ffff82d08012be2c>] vcpu_wake+0x213/0x3d4
(XEN)    [<ffff82d08012c347>] vcpu_unblock+0x4b/0x4d
(XEN)    [<ffff82d080169cea>] vcpu_kick+0x20/0x6f
(XEN)    [<ffff82d080169d65>] vcpu_mark_events_pending+0x2c/0x2f
(XEN)    [<ffff82d08010762a>] event_2l.c#evtchn_2l_set_pending+0xa9/0xb9
(XEN)    [<ffff82d08010822a>] evtchn_send+0x158/0x183
(XEN)    [<ffff82d0801096fc>] do_event_channel_op+0xe21/0x147d
(XEN)    [<ffff82d0802439e2>] lstar_enter+0xe2/0x13c
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion '!vcpu_on_replq(svc)' failed at sched_rt.c:524
(XEN) ****************************************

Thanks,
Tianyang
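A sketch of the invariant described above, with the ASSERT() hoisted so
it covers both the runq and the depletedq branches (illustrative only;
it assumes replenishment events are also queued from rt_vcpu_insert(),
as proposed in this sub-thread):

static void
__runq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
{
    struct rt_private *prv = rt_priv(ops);
    struct list_head *runq = rt_runq(ops);

    ASSERT( spin_is_locked(&prv->lock) );
    ASSERT( !__vcpu_on_q(svc) );
    /* Every queued vcpu must have a pending replenishment event. */
    ASSERT( __vcpu_on_replq(svc) );

    /* add svc to runq if svc still has budget */
    if ( svc->cur_budget > 0 )
        deadline_queue_insert(&__q_elem, svc, &svc->q_elem, runq);
    else
        list_add(&svc->q_elem, &prv->depletedq);
}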
On Fri, 2016-02-26 at 00:15 -0500, Tianyang Chen wrote:
> On 2/25/2016 6:31 PM, Dario Faggioli wrote:
> > As far as my opinion goes, this patch is not ready to go in right
> > now (as I said, I've got questions and comments), but its status is
> > way past RFC.
> >
> Oh OK, I had the impression that, since RFC means "request for
> comments" and I am indeed asking for comments, the tag should always
> be used.
>
Exactly. Everyone is asking for comments when sending out a patch
series, and that's why it's not necessary to state it with the tag...
it's always the case, so there is no need to say it every time!

And, for the same reason, this means that, when the tag is there,
you're not only asking for comments and/or, if everything is ok,
inclusion; you're asking for "just some feedback on a draft", and, as I
said, we're beyond that phase. :-)

> > Wait, maybe you misunderstood and/or I did not make myself clear
> > enough (in which case, sorry). I never meant to say "always stop
> > the timer". Actually, in one of my last emails I said the opposite,
> > and I think that would be the best thing to do.
> >
> > Do you think there is any specific reason why we need to always
> > stop and restart it? If not, I think we can:
> > - have deadline_queue_remove() also return whether the element
> >   removed was the first one (i.e., the one the timer was programmed
> >   after);
> > - if it was, re-program the timer after the new front of the queue;
> > - if it wasn't, there is nothing to do.
> >
> > Thoughts?
> >
> It was my thought originally that the timer needs to be re-programmed
> only when the top vcpu is taken off. So did you mean that, when I
> manipulated repl_timer->expires manually, the timer should have been
> stopped using the proper timer API? That manipulation is gone now.
>
I know... this is mostly my fault, for commenting on this in two
different emails. So, basically, yes, I meant that, if you want to
fiddle with the timer --like you were doing with those
'repl_timer->expires = 0'-- you need to do it properly, with the proper
API, locking, etc. Then, in a subsequent email, I said that you only
need to do that in a subset of the cases in which this function is
called.

Of course, the desired result is the combination of the two
considerations above, i.e.:
- only stop/restart the timer when necessary,
- if necessary, do it properly.

> Also, set_timer() internally disables the timer, so I assume it's
> safe just to call set_timer() directly, right?
>
Yes, it is.

> > > @@ -405,22 +500,38 @@ __runq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
> > >
> > >      /* add svc to runq if svc still has budget */
> > >      if ( svc->cur_budget > 0 )
> > > -    {
> > > -        list_for_each(iter, runq)
> > > -        {
> > > -            struct rt_vcpu * iter_svc = __q_elem(iter);
> > > -            if ( svc->cur_deadline <= iter_svc->cur_deadline )
> > > -                break;
> > > -        }
> > > -        list_add_tail(&svc->q_elem, iter);
> > > -    }
> > > +        deadline_queue_insert(&__q_elem, svc, &svc->q_elem, runq);
> > >      else
> > >      {
> > >          list_add(&svc->q_elem, &prv->depletedq);
> > > +        ASSERT( __vcpu_on_replq(svc) );
> > >
> > So, by doing this, you're telling me that, if the vcpu is added to
> > the depleted queue, there must be a replenishment planned for it
> > (or the ASSERT() would fail).
> >
> > But if we are adding it to the runq, there may or may not be a
> > replenishment planned for it.
> >
> > Is this what we want?
> > Why must there not already be a replenishment planned (to happen at
> > its next deadline) for a vcpu going into the runq?
> >
> It looks like the current code doesn't add a vcpu to the replenishment
> queue when vcpu_insert() is called. When the scheduler is initialized,
> all the vcpus are added to the replenishment queue only after waking
> up from sleep. This needs to be changed (adding the vcpu to the replq
> in vcpu_insert()) to make it consistent, in the sense that whenever
> rt_vcpu_insert() is called, the vcpu needs to have a corresponding
> replenishment event queued.
>
> This way, the ASSERT() covers both cases in __runq_insert(), enforcing
> the fact that "when a vcpu is inserted to the runq/depletedq, a
> replenishment event is waiting for it".
>
I am ok with this (calling replq_insert() in rt_vcpu_insert()). Just
not unconditionally, though: a vcpu can, e.g., be paused when the
insert_vcpu hook is called, and in that case, I don't think we want to
enqueue the replenishment event, do we?

> > static void
> > rt_context_saved(const struct scheduler *ops, struct vcpu *vc)
> > {
> >     struct rt_vcpu *svc = rt_vcpu(vc);
> >     spinlock_t *lock = vcpu_schedule_lock_irq(vc);
> >
> >     clear_bit(__RTDS_scheduled, &svc->flags);
> >     /* not insert idle vcpu to runq */
> >     if ( is_idle_vcpu(vc) )
> >         goto out;
> >
> >     if ( test_and_clear_bit(__RTDS_delayed_runq_add, &svc->flags) &&
> >          likely(vcpu_runnable(vc)) )
> >     {
> >         __runq_insert(ops, svc);
> >         runq_tickle(ops, svc);
> >     }
> >     else
> >         __replq_remove(ops, svc);
> > out:
> >     vcpu_schedule_unlock_irq(lock, vc);
> > }
> >
> > And, as said above, if you do this, try also removing the
> > __replq_remove() call from rt_vcpu_sleep(); this one should cover
> > for that (and, actually, more than just that!).
> >
> > After all this, check whether you still have the assert in
> > __replq_insert() triggering, and let me know.
> So after moving the __replq_remove() call to rt_context_saved(), the
> assert in __replq_insert() still fails when dom0 boots up.
>
Well, maybe removing __replq_remove() from rt_vcpu_sleep() is not
entirely ok, as if we do that we fail to deal with the case of when
(still in rt_vcpu_sleep()) __vcpu_on_q(svc) is true.

So, I'd say, do as I said above wrt rt_context_saved(). For
rt_vcpu_sleep(), you can try something like this:

static void
rt_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
{
    struct rt_vcpu * const svc = rt_vcpu(vc);

    BUG_ON( is_idle_vcpu(vc) );
    SCHED_STAT_CRANK(vcpu_sleep);

    if ( curr_on_cpu(vc->processor) == vc )
        cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
    else if ( __vcpu_on_q(svc) )
    {
        __q_remove(svc);
        __replq_remove(ops, svc);  <=== *** LOOK HERE ***
    }
    else if ( svc->flags & RTDS_delayed_runq_add )
        clear_bit(__RTDS_delayed_runq_add, &svc->flags);
}

In fact, in both the first and the third case, we will at some point
pass through rt_context_saved(), and hit the __replq_remove() that I
made you put there.

In the case in the middle, as the vcpu was just queued, and removing it
from the runq (or depletedq, in the case of this scheduler) is enough
to make it go to sleep, we won't go through
rt_schedule()-->rt_context_saved(), and hence the __replq_remove()
won't be called.

Sorry for the oversight; can you try this?

That being said...
> (XEN) Assertion '!vcpu_on_replq(svc)' failed at sched_rt.c:524
> (XEN) ----[ Xen-4.7-unstable x86_64 debug=y Tainted: C ]----
>
> (XEN) Xen call trace:
> (XEN)    [<ffff82d080128c07>] sched_rt.c#__replq_insert+0x2b/0x64
> (XEN)    [<ffff82d08012a072>] sched_rt.c#rt_vcpu_wake+0xf2/0x12c
> (XEN)    [<ffff82d08012be2c>] vcpu_wake+0x213/0x3d4
> (XEN)    [<ffff82d08012c347>] vcpu_unblock+0x4b/0x4d
> (XEN)    [<ffff82d080169cea>] vcpu_kick+0x20/0x6f
> (XEN)    [<ffff82d080169d65>] vcpu_mark_events_pending+0x2c/0x2f
> (XEN)    [<ffff82d08010762a>] event_2l.c#evtchn_2l_set_pending+0xa9/0xb9
> (XEN)    [<ffff82d08010822a>] evtchn_send+0x158/0x183
> (XEN)    [<ffff82d0801096fc>] do_event_channel_op+0xe21/0x147d
> (XEN)    [<ffff82d0802439e2>] lstar_enter+0xe2/0x13c
> (XEN)
>
... This says that we call __replq_insert() from rt_vcpu_wake() and
find out in there that vcpu_on_replq() is true.

However, in the v6 code for rt_vcpu_wake(), I can see this:

+    if( !__vcpu_on_replq(svc) )
+    {
+        __replq_insert(ops, svc);
+        ASSERT( vcpu_runnable(vc) );
+    }

which would make me think that, if vcpu_on_replq() is true, we
shouldn't be calling __replq_insert() in the first place.

So, have you made other changes wrt v6 when trying this?

Regards,
Dario
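One possible shape for the conditional insertion discussed above: a
hypothetical rt_vcpu_insert() that queues a replenishment event only
for a runnable, not-currently-running vcpu. The is_running test and the
exact ordering are assumptions, but the event has to be queued before
__runq_insert() if that function is to assert on it for the depletedq
case:

static void
rt_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
{
    struct rt_vcpu *svc = rt_vcpu(vc);
    s_time_t now = NOW();
    spinlock_t *lock;

    BUG_ON( is_idle_vcpu(vc) );

    lock = vcpu_schedule_lock_irq(vc);

    if ( now >= svc->cur_deadline )
        rt_update_deadline(now, svc);

    /* A paused vcpu is not runnable, so no event gets queued for it. */
    if ( !__vcpu_on_q(svc) && vcpu_runnable(vc) && !vc->is_running )
    {
        /* Queue the replenishment event first: __runq_insert()
         * expects it to be there if svc goes to the depletedq. */
        __replq_insert(ops, svc);
        __runq_insert(ops, svc);
    }

    vcpu_schedule_unlock_irq(lock, vc);
}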
On 2/26/2016 4:11 AM, Dario Faggioli wrote:
>> It looks like the current code doesn't add a vcpu to the
>> replenishment queue when vcpu_insert() is called. When the scheduler
>> is initialized, all the vcpus are added to the replenishment queue
>> only after waking up from sleep. This needs to be changed (adding
>> the vcpu to the replq in vcpu_insert()) to make it consistent, in
>> the sense that whenever rt_vcpu_insert() is called, the vcpu needs
>> to have a corresponding replenishment event queued.
>>
>> This way, the ASSERT() covers both cases in __runq_insert(),
>> enforcing the fact that "when a vcpu is inserted to the
>> runq/depletedq, a replenishment event is waiting for it".
>>
> I am ok with this (calling replq_insert() in rt_vcpu_insert()). Just
> not unconditionally, though: a vcpu can, e.g., be paused when the
> insert_vcpu hook is called, and in that case, I don't think we want
> to enqueue the replenishment event, do we?
>
Yes.

>>> static void
>>> rt_context_saved(const struct scheduler *ops, struct vcpu *vc)
>>> {
>>>     struct rt_vcpu *svc = rt_vcpu(vc);
>>>     spinlock_t *lock = vcpu_schedule_lock_irq(vc);
>>>
>>>     clear_bit(__RTDS_scheduled, &svc->flags);
>>>     /* not insert idle vcpu to runq */
>>>     if ( is_idle_vcpu(vc) )
>>>         goto out;
>>>
>>>     if ( test_and_clear_bit(__RTDS_delayed_runq_add, &svc->flags) &&
>>>          likely(vcpu_runnable(vc)) )
>>>     {
>>>         __runq_insert(ops, svc);
>>>         runq_tickle(ops, svc);
>>>     }
>>>     else
>>>         __replq_remove(ops, svc);
>>> out:
>>>     vcpu_schedule_unlock_irq(lock, vc);
>>> }
>>>
>>> And, as said above, if you do this, try also removing the
>>> __replq_remove() call from rt_vcpu_sleep(); this one should cover
>>> for that (and, actually, more than just that!).
>>>
>>> After all this, check whether you still have the assert in
>>> __replq_insert() triggering, and let me know.
>> So after moving the __replq_remove() call to rt_context_saved(), the
>> assert in __replq_insert() still fails when dom0 boots up.
>>
> Well, maybe removing __replq_remove() from rt_vcpu_sleep() is not
> entirely ok, as if we do that we fail to deal with the case of when
> (still in rt_vcpu_sleep()) __vcpu_on_q(svc) is true.
>
> So, I'd say, do as I said above wrt rt_context_saved(). For
> rt_vcpu_sleep(), you can try something like this:
>
> static void
> rt_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
> {
>     struct rt_vcpu * const svc = rt_vcpu(vc);
>
>     BUG_ON( is_idle_vcpu(vc) );
>     SCHED_STAT_CRANK(vcpu_sleep);
>
>     if ( curr_on_cpu(vc->processor) == vc )
>         cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
>     else if ( __vcpu_on_q(svc) )
>     {
>         __q_remove(svc);
>         __replq_remove(ops, svc);  <=== *** LOOK HERE ***
>     }
>     else if ( svc->flags & RTDS_delayed_runq_add )
>         clear_bit(__RTDS_delayed_runq_add, &svc->flags);
> }
>
> In fact, in both the first and the third case, we will at some point
> pass through rt_context_saved(), and hit the __replq_remove() that I
> made you put there.
>
> In the case in the middle, as the vcpu was just queued, and removing
> it from the runq (or depletedq, in the case of this scheduler) is
> enough to make it go to sleep, we won't go through
> rt_schedule()-->rt_context_saved(), and hence the __replq_remove()
> won't be called.
>
> Sorry for the oversight; can you try this?
>
> That being said...
>
>> (XEN) Assertion '!vcpu_on_replq(svc)' failed at sched_rt.c:524
>> (XEN) ----[ Xen-4.7-unstable x86_64 debug=y Tainted: C ]----
>>
>> (XEN) Xen call trace:
>> (XEN)    [<ffff82d080128c07>] sched_rt.c#__replq_insert+0x2b/0x64
>> (XEN)    [<ffff82d08012a072>] sched_rt.c#rt_vcpu_wake+0xf2/0x12c
>> (XEN)    [<ffff82d08012be2c>] vcpu_wake+0x213/0x3d4
>> (XEN)    [<ffff82d08012c347>] vcpu_unblock+0x4b/0x4d
>> (XEN)    [<ffff82d080169cea>] vcpu_kick+0x20/0x6f
>> (XEN)    [<ffff82d080169d65>] vcpu_mark_events_pending+0x2c/0x2f
>> (XEN)    [<ffff82d08010762a>] event_2l.c#evtchn_2l_set_pending+0xa9/0xb9
>> (XEN)    [<ffff82d08010822a>] evtchn_send+0x158/0x183
>> (XEN)    [<ffff82d0801096fc>] do_event_channel_op+0xe21/0x147d
>> (XEN)    [<ffff82d0802439e2>] lstar_enter+0xe2/0x13c
>> (XEN)
>>
> ... This says that we call __replq_insert() from rt_vcpu_wake() and
> find out in there that vcpu_on_replq() is true.
>
> However, in the v6 code for rt_vcpu_wake(), I can see this:
>
> +    if( !__vcpu_on_replq(svc) )
> +    {
> +        __replq_insert(ops, svc);
> +        ASSERT( vcpu_runnable(vc) );
> +    }
>
> which would make me think that, if vcpu_on_replq() is true, we
> shouldn't be calling __replq_insert() in the first place.
>
> So, have you made other changes wrt v6 when trying this?
The v6 code I submitted doesn't have that if statement commented out.
But when I tried commenting it out, the assertion failed.

It fails in v6 with:

rt_vcpu_sleep(): removing the replenishment event in all cases
rt_context_saved(): not removing replenishment events
rt_vcpu_wake(): not checking if the event is already queued.

or with:

rt_vcpu_sleep(): not removing the replenishment event at all
rt_context_saved(): removing replenishment events if not runnable
rt_vcpu_wake(): not checking if the event is already queued.

Also with:

rt_vcpu_sleep(): removing the replenishment event if the vcpu is on
the runq/depletedq
rt_context_saved(): removing replenishment events if not runnable
rt_vcpu_wake(): not checking if the event is already queued.

I added debug prints in all these functions and noticed that it could
be caused by a race between spurious wakeups and context switching.
See the following events for the last modification above:

(XEN) cpu1 picked idle
(XEN) d0 attempted to change d0v1's CR4 flags 00000620 -> 00040660
(XEN) cpu2 picked idle
(XEN) vcpu1 sleeps on cpu
(XEN) cpu0 picked idle
(XEN) vcpu1 context saved not runnable
(XEN) vcpu1 wakes up nowhere
(XEN) cpu0 picked vcpu1
(XEN) vcpu1 sleeps on cpu
(XEN) cpu0 picked idle
(XEN) vcpu1 context saved not runnable
(XEN) vcpu1 wakes up nowhere
(XEN) cpu0 picked vcpu1
(XEN) cpu0 picked idle
(XEN) vcpu1 context saved not runnable
(XEN) cpu0 picked vcpu0
(XEN) vcpu1 wakes up nowhere
(XEN) cpu1 picked vcpu1    *** vcpu1 is on a cpu
(XEN) cpu1 picked idle     *** vcpu1 is waiting to be context switched
(XEN) vcpu2 wakes up nowhere
(XEN) cpu0 picked vcpu0
(XEN) cpu2 picked vcpu2
(XEN) cpu0 picked vcpu0
(XEN) cpu0 picked vcpu0
(XEN) d0 attempted to change d0v2's CR4 flags 00000620 -> 00040660
(XEN) cpu0 picked vcpu0
(XEN) vcpu2 sleeps on cpu
(XEN) vcpu1 wakes up nowhere    *** vcpu1 wakes up without sleep?

(XEN) Assertion '!__vcpu_on_replq(svc)' failed at sched_rt.c:526
(XEN) ----[ Xen-4.7-unstable x86_64 debug=y Tainted: C ]----
(XEN) CPU: 0
(XEN) RIP: e008:[<ffff82d08012a151>] sched_rt.c#rt_vcpu_wake+0x11f/0x17b
...
(XEN) Xen call trace:
(XEN)    [<ffff82d08012a151>] sched_rt.c#rt_vcpu_wake+0x11f/0x17b
(XEN)    [<ffff82d08012bf2c>] vcpu_wake+0x213/0x3d4
(XEN)    [<ffff82d08012c447>] vcpu_unblock+0x4b/0x4d
(XEN)    [<ffff82d080169cea>] vcpu_kick+0x20/0x6f
(XEN)    [<ffff82d080169d65>] vcpu_mark_events_pending+0x2c/0x2f
(XEN)    [<ffff82d08010762a>] event_2l.c#evtchn_2l_set_pending+0xa9/0xb9
(XEN)    [<ffff82d080108312>] send_guest_vcpu_virq+0x9d/0xba
(XEN)    [<ffff82d080196cbe>] send_timer_event+0xe/0x10
(XEN)    [<ffff82d08012a7b5>] schedule.c#vcpu_singleshot_timer_fn+0x9/0xb
(XEN)    [<ffff82d080131978>] timer.c#execute_timer+0x4e/0x6c
(XEN)    [<ffff82d080131ab9>] timer.c#timer_softirq_action+0xdd/0x213
(XEN)    [<ffff82d08012df32>] softirq.c#__do_softirq+0x82/0x8d
(XEN)    [<ffff82d08012df8a>] do_softirq+0x13/0x15
(XEN)    [<ffff82d080243ad1>] cpufreq.c#process_softirqs+0x21/0x30

So, it looks like a spurious wakeup for vcpu1 happens before it has
been completely context-switched off a cpu. But rt_vcpu_wake() didn't
see it on a cpu via curr_on_cpu(), so it fell through the first two
returns.

I guess the replenishment queue check is necessary for this situation?

Thanks,
Tianyang
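If this analysis is right, the wake path has to tolerate finding the
previous wakeup's replenishment event still queued. A sketch of what
the relevant part of rt_vcpu_wake() might become (a guess at a fix
under the above assumption, not the final code):

    /*
     * If a deadline passed while svc was asleep/blocked, we need new
     * scheduling parameters (a new deadline and full budget), and a
     * new replenishment event in place of any stale one.
     */
    if ( now >= svc->cur_deadline )
    {
        rt_update_deadline(now, svc);
        if ( __vcpu_on_replq(svc) )
            __replq_remove(ops, svc);
    }

    /*
     * A spurious wakeup racing with context saving may find the event
     * from the previous wakeup still queued; only insert if absent.
     */
    if ( !__vcpu_on_replq(svc) )
        __replq_insert(ops, svc);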
On Fri, 2016-02-26 at 12:28 -0500, Tianyang Chen wrote:
> > So, have you made other changes wrt v6 when trying this?
> The v6 code I submitted doesn't have that if statement commented out.
> But when I tried commenting it out, the assertion failed.
>
Ok, thanks for these tests. Can you send (just quick-&-dirtily, as an
attachment to a reply to this email; no need for a proper re-submission
of a new version) the patch that does this:

> rt_vcpu_sleep(): removing the replenishment event if the vcpu is on
> the runq/depletedq
> rt_context_saved(): removing replenishment events if not runnable
> rt_vcpu_wake(): not checking if the event is already queued.
>
> I added debug prints in all these functions and noticed that it could
> be caused by a race between spurious wakeups and context switching.
>
And the code that produces this debug output as well?

> (XEN) cpu1 picked idle
> (XEN) d0 attempted to change d0v1's CR4 flags 00000620 -> 00040660
> (XEN) cpu2 picked idle
> (XEN) vcpu1 sleeps on cpu
> (XEN) cpu0 picked idle
> (XEN) vcpu1 context saved not runnable
> (XEN) vcpu1 wakes up nowhere
> (XEN) cpu0 picked vcpu1
> (XEN) vcpu1 sleeps on cpu
> (XEN) cpu0 picked idle
> (XEN) vcpu1 context saved not runnable
> (XEN) vcpu1 wakes up nowhere
> (XEN) cpu0 picked vcpu1
> (XEN) cpu0 picked idle
> (XEN) vcpu1 context saved not runnable
> (XEN) cpu0 picked vcpu0
> (XEN) vcpu1 wakes up nowhere
> (XEN) cpu1 picked vcpu1    *** vcpu1 is on a cpu
> (XEN) cpu1 picked idle     *** vcpu1 is waiting to be context switched
> (XEN) vcpu2 wakes up nowhere
> (XEN) cpu0 picked vcpu0
> (XEN) cpu2 picked vcpu2
> (XEN) cpu0 picked vcpu0
> (XEN) cpu0 picked vcpu0
> (XEN) d0 attempted to change d0v2's CR4 flags 00000620 -> 00040660
> (XEN) cpu0 picked vcpu0
> (XEN) vcpu2 sleeps on cpu
> (XEN) vcpu1 wakes up nowhere    *** vcpu1 wakes up without sleep?
>
> (XEN) Assertion '!__vcpu_on_replq(svc)' failed at sched_rt.c:526
> (XEN) ----[ Xen-4.7-unstable x86_64 debug=y Tainted: C ]----
> (XEN) CPU: 0
> (XEN) RIP: e008:[<ffff82d08012a151>] sched_rt.c#rt_vcpu_wake+0x11f/0x17b
> ...
> (XEN) Xen call trace:
> (XEN)    [<ffff82d08012a151>] sched_rt.c#rt_vcpu_wake+0x11f/0x17b
> (XEN)    [<ffff82d08012bf2c>] vcpu_wake+0x213/0x3d4
> (XEN)    [<ffff82d08012c447>] vcpu_unblock+0x4b/0x4d
> (XEN)    [<ffff82d080169cea>] vcpu_kick+0x20/0x6f
> (XEN)    [<ffff82d080169d65>] vcpu_mark_events_pending+0x2c/0x2f
> (XEN)    [<ffff82d08010762a>] event_2l.c#evtchn_2l_set_pending+0xa9/0xb9
> (XEN)    [<ffff82d080108312>] send_guest_vcpu_virq+0x9d/0xba
> (XEN)    [<ffff82d080196cbe>] send_timer_event+0xe/0x10
> (XEN)    [<ffff82d08012a7b5>] schedule.c#vcpu_singleshot_timer_fn+0x9/0xb
> (XEN)    [<ffff82d080131978>] timer.c#execute_timer+0x4e/0x6c
> (XEN)    [<ffff82d080131ab9>] timer.c#timer_softirq_action+0xdd/0x213
> (XEN)    [<ffff82d08012df32>] softirq.c#__do_softirq+0x82/0x8d
> (XEN)    [<ffff82d08012df8a>] do_softirq+0x13/0x15
> (XEN)    [<ffff82d080243ad1>] cpufreq.c#process_softirqs+0x21/0x30
>
> So, it looks like a spurious wakeup for vcpu1 happens before it has
> been completely context-switched off a cpu. But rt_vcpu_wake() didn't
> see it on a cpu via curr_on_cpu(), so it fell through the first two
> returns.
>
> I guess the replenishment queue check is necessary for this situation?
>
Perhaps, but I first want to make sure we understand what is really
happening.

Regards,
Dario
Attached.

Tianyang

On 2016-02-26 13:09, Dario Faggioli wrote:
> On Fri, 2016-02-26 at 12:28 -0500, Tianyang Chen wrote:
>> > So, have you made other changes wrt v6 when trying this?
>> The v6 code I submitted doesn't have that if statement commented out.
>> But when I tried commenting it out, the assertion failed.
>>
> Ok, thanks for these tests. Can you send (just quick-&-dirtily, as an
> attachment to a reply to this email; no need for a proper
> re-submission of a new version) the patch that does this:
>
>> rt_vcpu_sleep(): removing the replenishment event if the vcpu is on
>> the runq/depletedq
>> rt_context_saved(): removing replenishment events if not runnable
>> rt_vcpu_wake(): not checking if the event is already queued.
>>
>> I added debug prints in all these functions and noticed that it
>> could be caused by a race between spurious wakeups and context
>> switching.
>>
> And the code that produces this debug output as well?
>
>> (XEN) cpu1 picked idle
>> (XEN) d0 attempted to change d0v1's CR4 flags 00000620 -> 00040660
>> (XEN) cpu2 picked idle
>> (XEN) vcpu1 sleeps on cpu
>> (XEN) cpu0 picked idle
>> (XEN) vcpu1 context saved not runnable
>> (XEN) vcpu1 wakes up nowhere
>> (XEN) cpu0 picked vcpu1
>> (XEN) vcpu1 sleeps on cpu
>> (XEN) cpu0 picked idle
>> (XEN) vcpu1 context saved not runnable
>> (XEN) vcpu1 wakes up nowhere
>> (XEN) cpu0 picked vcpu1
>> (XEN) cpu0 picked idle
>> (XEN) vcpu1 context saved not runnable
>> (XEN) cpu0 picked vcpu0
>> (XEN) vcpu1 wakes up nowhere
>> (XEN) cpu1 picked vcpu1    *** vcpu1 is on a cpu
>> (XEN) cpu1 picked idle     *** vcpu1 is waiting to be context switched
>> (XEN) vcpu2 wakes up nowhere
>> (XEN) cpu0 picked vcpu0
>> (XEN) cpu2 picked vcpu2
>> (XEN) cpu0 picked vcpu0
>> (XEN) cpu0 picked vcpu0
>> (XEN) d0 attempted to change d0v2's CR4 flags 00000620 -> 00040660
>> (XEN) cpu0 picked vcpu0
>> (XEN) vcpu2 sleeps on cpu
>> (XEN) vcpu1 wakes up nowhere    *** vcpu1 wakes up without sleep?
>>
>> (XEN) Assertion '!__vcpu_on_replq(svc)' failed at sched_rt.c:526
>> (XEN) ----[ Xen-4.7-unstable x86_64 debug=y Tainted: C ]----
>> (XEN) CPU: 0
>> (XEN) RIP: e008:[<ffff82d08012a151>] sched_rt.c#rt_vcpu_wake+0x11f/0x17b
>> ...
>> (XEN) Xen call trace:
>> (XEN)    [<ffff82d08012a151>] sched_rt.c#rt_vcpu_wake+0x11f/0x17b
>> (XEN)    [<ffff82d08012bf2c>] vcpu_wake+0x213/0x3d4
>> (XEN)    [<ffff82d08012c447>] vcpu_unblock+0x4b/0x4d
>> (XEN)    [<ffff82d080169cea>] vcpu_kick+0x20/0x6f
>> (XEN)    [<ffff82d080169d65>] vcpu_mark_events_pending+0x2c/0x2f
>> (XEN)    [<ffff82d08010762a>] event_2l.c#evtchn_2l_set_pending+0xa9/0xb9
>> (XEN)    [<ffff82d080108312>] send_guest_vcpu_virq+0x9d/0xba
>> (XEN)    [<ffff82d080196cbe>] send_timer_event+0xe/0x10
>> (XEN)    [<ffff82d08012a7b5>] schedule.c#vcpu_singleshot_timer_fn+0x9/0xb
>> (XEN)    [<ffff82d080131978>] timer.c#execute_timer+0x4e/0x6c
>> (XEN)    [<ffff82d080131ab9>] timer.c#timer_softirq_action+0xdd/0x213
>> (XEN)    [<ffff82d08012df32>] softirq.c#__do_softirq+0x82/0x8d
>> (XEN)    [<ffff82d08012df8a>] do_softirq+0x13/0x15
>> (XEN)    [<ffff82d080243ad1>] cpufreq.c#process_softirqs+0x21/0x30
>>
>> So, it looks like a spurious wakeup for vcpu1 happens before it has
>> been completely context-switched off a cpu. But rt_vcpu_wake()
>> didn't see it on a cpu via curr_on_cpu(), so it fell through the
>> first two returns.
>>
>> I guess the replenishment queue check is necessary for this
>> situation?
>>
> Perhaps, but I first want to make sure we understand what is really
> happening.
>
> Regards,
> Dario
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 2e5430f..16f77f9 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -16,6 +16,7 @@
 #include <xen/delay.h>
 #include <xen/event.h>
 #include <xen/time.h>
+#include <xen/timer.h>
 #include <xen/perfc.h>
 #include <xen/sched-if.h>
 #include <xen/softirq.h>
@@ -87,7 +88,7 @@
 #define RTDS_DEFAULT_BUDGET     (MICROSECS(4000))
 
 #define UPDATE_LIMIT_SHIFT      10
-#define MAX_SCHEDULE            (MILLISECS(1))
+
 /*
  * Flags
  */
@@ -142,6 +143,9 @@ static cpumask_var_t *_cpumask_scratch;
  */
 static unsigned int nr_rt_ops;
 
+/* handler for the replenishment timer */
+static void repl_handler(void *data);
+
 /*
  * Systme-wide private data, include global RunQueue/DepletedQ
  * Global lock is referenced by schedule_data.schedule_lock from all
@@ -152,7 +156,9 @@ struct rt_private {
     struct list_head sdom;      /* list of availalbe domains, used for dump */
     struct list_head runq;      /* ordered list of runnable vcpus */
     struct list_head depletedq; /* unordered list of depleted vcpus */
+    struct list_head replq;     /* ordered list of vcpus that need replenishment */
     cpumask_t tickled;          /* cpus been tickled */
+    struct timer *repl_timer;   /* replenishment timer */
 };
 
 /*
@@ -160,6 +166,7 @@ struct rt_private {
  */
 struct rt_vcpu {
     struct list_head q_elem;    /* on the runq/depletedq list */
+    struct list_head replq_elem;/* on the repl event list */
 
     /* Up-pointers */
     struct rt_dom *sdom;
@@ -213,8 +220,14 @@ static inline struct list_head *rt_depletedq(const struct scheduler *ops)
     return &rt_priv(ops)->depletedq;
 }
 
+static inline struct list_head *rt_replq(const struct scheduler *ops)
+{
+    return &rt_priv(ops)->replq;
+}
+
 /*
- * Queue helper functions for runq and depletedq
+ * Queue helper functions for runq, depletedq
+ * and replenishment event queue
  */
 static int
 __vcpu_on_q(const struct rt_vcpu *svc)
@@ -228,6 +241,18 @@ __q_elem(struct list_head *elem)
     return list_entry(elem, struct rt_vcpu, q_elem);
 }
 
+static struct rt_vcpu *
+__replq_elem(struct list_head *elem)
+{
+    return list_entry(elem, struct rt_vcpu, replq_elem);
+}
+
+static int
+__vcpu_on_replq(const struct rt_vcpu *svc)
+{
+    return !list_empty(&svc->replq_elem);
+}
+
 /*
  * Debug related code, dump vcpu/cpu information
  */
@@ -288,7 +313,7 @@ rt_dump_pcpu(const struct scheduler *ops, int cpu)
 static void
 rt_dump(const struct scheduler *ops)
 {
-    struct list_head *runq, *depletedq, *iter;
+    struct list_head *runq, *depletedq, *replq, *iter;
     struct rt_private *prv = rt_priv(ops);
     struct rt_vcpu *svc;
     struct rt_dom *sdom;
@@ -301,6 +326,7 @@ rt_dump(const struct scheduler *ops)
 
     runq = rt_runq(ops);
     depletedq = rt_depletedq(ops);
+    replq = rt_replq(ops);
 
     printk("Global RunQueue info:\n");
     list_for_each( iter, runq )
@@ -316,6 +342,13 @@ rt_dump(const struct scheduler *ops)
         rt_dump_vcpu(ops, svc);
     }
 
+    printk("Global Replenishment Event info:\n");
+    list_for_each( iter, replq )
+    {
+        svc = __replq_elem(iter);
+        rt_dump_vcpu(ops, svc);
+    }
+
     printk("Domain info:\n");
     list_for_each( iter, &prv->sdom )
     {
@@ -380,11 +413,74 @@ rt_update_deadline(s_time_t now, struct rt_vcpu *svc)
     return;
 }
 
+/* a helper function that only removes a vcpu from a queue */
+static inline void
+deadline_queue_remove(struct list_head *elem)
+{
+    list_del_init(elem);
+}
+
 static inline void
 __q_remove(struct rt_vcpu *svc)
 {
     if ( __vcpu_on_q(svc) )
-        list_del_init(&svc->q_elem);
+        deadline_queue_remove(&svc->q_elem);
+}
+
+/*
+ * Removing a vcpu from the replenishment queue could
+ * re-program the timer for the next replenishment event
+ * if there is any on the list
+ */
+static inline void
+__replq_remove(const struct scheduler *ops, struct rt_vcpu *svc)
+{
+    struct rt_private *prv = rt_priv(ops);
+    struct list_head *replq = rt_replq(ops);
+    struct timer* repl_timer = prv->repl_timer;
+
+    /*
+     * Disarm the timer before re-programming it.
+     * It doesn't matter if the vcpu to be removed
+     * is on top of the list or not because the timer
+     * is stopped and needs to be re-programmed anyway
+     */
+    stop_timer(repl_timer);
+
+    deadline_queue_remove(&svc->replq_elem);
+
+    /* re-arm the timer for the next replenishment event */
+    if( !list_empty(replq) )
+    {
+        struct rt_vcpu *svc_next = __replq_elem(replq->next);
+        set_timer(repl_timer, svc_next->cur_deadline);
+    }
+}
+
+/*
+ * A utility function that inserts a vcpu to a
+ * queue based on certain order (EDF). The returned
+ * value is 1 if a vcpu has been inserted to the
+ * front of a list
+ */
+static int
+deadline_queue_insert(struct rt_vcpu * (*_get_q_elem)(struct list_head *elem),
+    struct rt_vcpu *svc, struct list_head *elem, struct list_head *queue)
+{
+    struct list_head *iter;
+    int pos = 0;
+
+    list_for_each(iter, queue)
+    {
+        struct rt_vcpu * iter_svc = (*_get_q_elem)(iter);
+        if ( svc->cur_deadline <= iter_svc->cur_deadline )
+            break;
+
+        pos++;
+    }
+
+    list_add_tail(elem, iter);
+    return !pos;
 }
 
 /*
@@ -397,7 +493,6 @@ __runq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
 {
     struct rt_private *prv = rt_priv(ops);
     struct list_head *runq = rt_runq(ops);
-    struct list_head *iter;
 
     ASSERT( spin_is_locked(&prv->lock) );
 
@@ -405,22 +500,38 @@ __runq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
 
     /* add svc to runq if svc still has budget */
     if ( svc->cur_budget > 0 )
-    {
-        list_for_each(iter, runq)
-        {
-            struct rt_vcpu * iter_svc = __q_elem(iter);
-            if ( svc->cur_deadline <= iter_svc->cur_deadline )
-                break;
-        }
-        list_add_tail(&svc->q_elem, iter);
-    }
+        deadline_queue_insert(&__q_elem, svc, &svc->q_elem, runq);
     else
     {
         list_add(&svc->q_elem, &prv->depletedq);
+        ASSERT( __vcpu_on_replq(svc) );
     }
 }
 
 /*
+ * Insert svc into the replenishment event list
+ * in replenishment time order.
+ * vcpus that need to be replenished earlier go first.
+ * The timer may be re-programmed if svc is inserted
+ * at the front of the event list.
+ */
+static void
+__replq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
+{
+    struct list_head *replq = rt_replq(ops);
+    struct rt_private *prv = rt_priv(ops);
+    struct timer *repl_timer = prv->repl_timer;
+    int set;
+
+    ASSERT( !__vcpu_on_replq(svc) );
+
+    set = deadline_queue_insert(&__replq_elem, svc, &svc->replq_elem, replq);
+
+    if( set )
+        set_timer(repl_timer,svc->cur_deadline);
+}
+
+/*
  * Init/Free related code
  */
 static int
@@ -449,11 +560,18 @@ rt_init(struct scheduler *ops)
     INIT_LIST_HEAD(&prv->sdom);
     INIT_LIST_HEAD(&prv->runq);
     INIT_LIST_HEAD(&prv->depletedq);
+    INIT_LIST_HEAD(&prv->replq);
 
     cpumask_clear(&prv->tickled);
 
     ops->sched_data = prv;
 
+    /*
+     * The timer initialization will happen later when
+     * the first pcpu is added to this pool in alloc_pdata
+     */
+    prv->repl_timer = NULL;
+
     return 0;
 
  no_mem:
@@ -473,6 +591,10 @@ rt_deinit(const struct scheduler *ops)
         xfree(_cpumask_scratch);
         _cpumask_scratch = NULL;
     }
+
+    kill_timer(prv->repl_timer);
+    xfree(prv->repl_timer);
+
     xfree(prv);
 }
 
@@ -493,6 +615,17 @@ rt_alloc_pdata(const struct scheduler *ops, int cpu)
     if ( !alloc_cpumask_var(&_cpumask_scratch[cpu]) )
         return NULL;
 
+    if( prv->repl_timer == NULL )
+    {
+        /* allocate the timer on the first cpu of this pool */
+        prv->repl_timer = xzalloc(struct timer);
+
+        if(prv->repl_timer == NULL )
+            return NULL;
+
+        init_timer(prv->repl_timer, repl_handler, (void *)ops, cpu);
+    }
+
     /* 1 indicates alloc. succeed in schedule.c */
     return (void *)1;
 }
@@ -586,6 +719,7 @@ rt_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
         return NULL;
 
     INIT_LIST_HEAD(&svc->q_elem);
+    INIT_LIST_HEAD(&svc->replq_elem);
     svc->flags = 0U;
     svc->sdom = dd;
     svc->vcpu = vc;
@@ -609,7 +743,8 @@ rt_free_vdata(const struct scheduler *ops, void *priv)
 }
 
 /*
- * This function is called in sched_move_domain() in schedule.c
+ * It is called in sched_move_domain() and sched_init_vcpu
+ * in schedule.c
  * When move a domain to a new cpupool.
  * It inserts vcpus of moving domain to the scheduler's RunQ in
  * dest. cpupool.
@@ -651,6 +786,10 @@ rt_vcpu_remove(const struct scheduler *ops, struct vcpu *vc)
     lock = vcpu_schedule_lock_irq(vc);
     if ( __vcpu_on_q(svc) )
         __q_remove(svc);
+
+    if( __vcpu_on_replq(svc) )
+        __replq_remove(ops,svc);
+
     vcpu_schedule_unlock_irq(lock, vc);
 }
 
@@ -785,44 +924,6 @@ __runq_pick(const struct scheduler *ops, const cpumask_t *mask)
 }
 
 /*
- * Update vcpu's budget and
- * sort runq by insert the modifed vcpu back to runq
- * lock is grabbed before calling this function
- */
-static void
-__repl_update(const struct scheduler *ops, s_time_t now)
-{
-    struct list_head *runq = rt_runq(ops);
-    struct list_head *depletedq = rt_depletedq(ops);
-    struct list_head *iter;
-    struct list_head *tmp;
-    struct rt_vcpu *svc = NULL;
-
-    list_for_each_safe(iter, tmp, runq)
-    {
-        svc = __q_elem(iter);
-        if ( now < svc->cur_deadline )
-            break;
-
-        rt_update_deadline(now, svc);
-        /* reinsert the vcpu if its deadline is updated */
-        __q_remove(svc);
-        __runq_insert(ops, svc);
-    }
-
-    list_for_each_safe(iter, tmp, depletedq)
-    {
-        svc = __q_elem(iter);
-        if ( now >= svc->cur_deadline )
-        {
-            rt_update_deadline(now, svc);
-            __q_remove(svc); /* remove from depleted queue */
-            __runq_insert(ops, svc); /* add to runq */
-        }
-    }
-}
-
-/*
  * schedule function for rt scheduler.
  * The lock is already grabbed in schedule.c, no need to lock here
  */
@@ -841,7 +942,6 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
 
     /* burn_budget would return for IDLE VCPU */
     burn_budget(ops, scurr, now);
-    __repl_update(ops, now);
 
     if ( tasklet_work_scheduled )
     {
@@ -868,6 +968,8 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
         set_bit(__RTDS_delayed_runq_add, &scurr->flags);
 
     snext->last_start = now;
+
+    ret.time = -1; /* if an idle vcpu is picked */
     if ( !is_idle_vcpu(snext->vcpu) )
     {
         if ( snext != scurr )
@@ -880,9 +982,11 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
             snext->vcpu->processor = cpu;
             ret.migrated = 1;
         }
+
+        ret.time = snext->budget; /* invoke the scheduler next time */
+
     }
 
-    ret.time = MIN(snext->budget, MAX_SCHEDULE); /* sched quantum */
     ret.task = snext->vcpu;
 
     /* TRACE */
@@ -924,6 +1028,8 @@ rt_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
         __q_remove(svc);
     else if ( svc->flags & RTDS_delayed_runq_add )
         clear_bit(__RTDS_delayed_runq_add, &svc->flags);
+
+    __replq_remove(ops, svc);
 }
 
 /*
@@ -1026,10 +1132,6 @@ rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
 {
     struct rt_vcpu * const svc = rt_vcpu(vc);
     s_time_t now = NOW();
-    struct rt_private *prv = rt_priv(ops);
-    struct rt_vcpu *snext = NULL; /* highest priority on RunQ */
-    struct rt_dom *sdom = NULL;
-    cpumask_t *online;
 
     BUG_ON( is_idle_vcpu(vc) );
 
@@ -1051,6 +1153,22 @@ rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
     else
         SCHED_STAT_CRANK(vcpu_wake_not_runnable);
 
+    /*
+     * If a deadline passed while svc was asleep/blocked, we need new
+     * scheduling parameters ( a new deadline and full budget), and
+     * also a new replenishment event
+     */
+    if ( now >= svc->cur_deadline)
+    {
+        rt_update_deadline(now, svc);
+        __replq_remove(ops, svc);
+    }
+
+    if( !__vcpu_on_replq(svc) )
+    {
+        __replq_insert(ops, svc);
+        ASSERT( vcpu_runnable(vc) );
+    }
     /* If context hasn't been saved for this vcpu yet, we can't put it on
      * the Runqueue/DepletedQ. Instead, we set a flag so that it will be
      * put on the Runqueue/DepletedQ after the context has been saved.
@@ -1061,22 +1179,10 @@ rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
         return;
     }
 
-    if ( now >= svc->cur_deadline)
-        rt_update_deadline(now, svc);
-
     /* insert svc to runq/depletedq because svc is not in queue now */
     __runq_insert(ops, svc);
 
-    __repl_update(ops, now);
-
-    ASSERT(!list_empty(&prv->sdom));
-    sdom = list_entry(prv->sdom.next, struct rt_dom, sdom_elem);
-    online = cpupool_domain_cpumask(sdom->dom);
-    snext = __runq_pick(ops, online); /* pick snext from ALL valid cpus */
-
-    runq_tickle(ops, snext);
-
-    return;
+    runq_tickle(ops, svc);
 }
 
 /*
@@ -1087,10 +1193,6 @@ static void
 rt_context_saved(const struct scheduler *ops, struct vcpu *vc)
 {
     struct rt_vcpu *svc = rt_vcpu(vc);
-    struct rt_vcpu *snext = NULL;
-    struct rt_dom *sdom = NULL;
-    struct rt_private *prv = rt_priv(ops);
-    cpumask_t *online;
     spinlock_t *lock = vcpu_schedule_lock_irq(vc);
 
     clear_bit(__RTDS_scheduled, &svc->flags);
@@ -1102,14 +1204,7 @@ rt_context_saved(const struct scheduler *ops, struct vcpu *vc)
         likely(vcpu_runnable(vc)) )
     {
         __runq_insert(ops, svc);
-        __repl_update(ops, NOW());
-
-        ASSERT(!list_empty(&prv->sdom));
-        sdom = list_entry(prv->sdom.next, struct rt_dom, sdom_elem);
-        online = cpupool_domain_cpumask(sdom->dom);
-        snext = __runq_pick(ops, online); /* pick snext from ALL cpus */
-
-        runq_tickle(ops, snext);
+        runq_tickle(ops, svc);
     }
 out:
     vcpu_schedule_unlock_irq(lock, vc);
@@ -1168,6 +1263,71 @@ rt_dom_cntl(
     return rc;
 }
 
+/*
+ * The replenishment timer handler picks vcpus
+ * from the replq and does the actual replenishment
+ */
+static void repl_handler(void *data){
+    unsigned long flags;
+    s_time_t now = NOW();
+    struct scheduler *ops = data;
+    struct rt_private *prv = rt_priv(ops);
+    struct list_head *replq = rt_replq(ops);
+    struct timer *repl_timer = prv->repl_timer;
+    struct list_head *iter, *tmp;
+    struct rt_vcpu *svc = NULL;
+
+    spin_lock_irqsave(&prv->lock, flags);
+
+    stop_timer(repl_timer);
+
+    list_for_each_safe(iter, tmp, replq)
+    {
+        svc = __replq_elem(iter);
+
+        if ( now < svc->cur_deadline )
+            break;
+
+        rt_update_deadline(now, svc);
+
+        /*
+         * when the replenishment happens
+         * svc is either on a pcpu or on
+         * runq/depletedq
+         */
+        if( __vcpu_on_q(svc) )
+        {
+            /* put back to runq */
+            __q_remove(svc);
+            __runq_insert(ops, svc);
+        }
+
+        /*
+         * tickle regardless where it's at
+         * because a running vcpu could have
+         * a later deadline than others after
+         * replenishment
+         */
+        runq_tickle(ops, svc);
+
+        /*
+         * update replenishment event queue
+         * without reprogramming the timer
+         */
+        deadline_queue_remove(&svc->replq_elem);
+        deadline_queue_insert(&__replq_elem, svc, &svc->replq_elem, replq);
+    }
+
+    /*
+     * use the vcpu that's on the top
+     * or else don't program the timer
+     */
+    if( !list_empty(replq) )
+        set_timer(repl_timer, __replq_elem(replq->next)->cur_deadline);
+
+    spin_unlock_irqrestore(&prv->lock, flags);
+}
+
 static struct rt_private _rt_priv;
 
 static const struct scheduler sched_rtds_def = {