Message ID | 20230921122352.2307574-2-george.dunlap@cloud.com (mailing list archive)
---|---
State | New, archived
Series | [v2,1/2] credit: Limit load balancing to once per millisecond
On Thu, Sep 21, 2023 at 1:23 PM George Dunlap <george.dunlap@cloud.com> wrote:
>
> On large systems with many vcpus yielding due to spinlock priority
> inversion, it's not uncommon for a vcpu to yield its timeslice, only
> to be immediately stolen by another pcpu looking for higher-priority
> work.
>
> To prevent this:
>
> * Keep the YIELD flag until a vcpu is removed from a runqueue
>
> * When looking for work to steal, skip vcpus which have yielded
>
> NB that this does mean that sometimes a VM is inserted into an empty
> runqueue; handle that case.
>
> Signed-off-by: George Dunlap <george.dunlap@cloud.com>

Marcus,

Just wanted to verify my interpretation of the testing you did of this
patch several months ago:

1. On the problematic workload, mean execution time for the task under
heavy load was around 12 seconds
2. With only patch 2 of this series (0004 in your tests), mean
execution time under heavy load was around 5 seconds
3. With only patch 1 of this series (0003 in your tests), mean
execution time under heavy load was around 3 seconds
4. With both patch 1 and patch 2 of this series (0003+0004 in your
tests), mean execution time under heavy load was also around 3 seconds

So both patches independently exhibit an improvement; but the combined
effect is about the same as the first patch.

Assuming those results are accurate, I would argue that we should take
both patches.  Does anyone want to argue we should only take the first
one?

 -George
Hi George,

> On Sep 21, 2023, at 20:23, George Dunlap <george.dunlap@cloud.com> wrote:
>
> On large systems with many vcpus yielding due to spinlock priority
> inversion, it's not uncommon for a vcpu to yield its timeslice, only
> to be immediately stolen by another pcpu looking for higher-priority
> work.
>
> To prevent this:
>
> * Keep the YIELD flag until a vcpu is removed from a runqueue
>
> * When looking for work to steal, skip vcpus which have yielded
>
> NB that this does mean that sometimes a VM is inserted into an empty
> runqueue; handle that case.
>
> Signed-off-by: George Dunlap <george.dunlap@cloud.com>

Release-acked-by: Henry Wang <Henry.Wang@arm.com>

Kind regards,
Henry
On 21.09.23 14:23, George Dunlap wrote:
> On large systems with many vcpus yielding due to spinlock priority
> inversion, it's not uncommon for a vcpu to yield its timeslice, only
> to be immediately stolen by another pcpu looking for higher-priority
> work.
>
> To prevent this:
>
> * Keep the YIELD flag until a vcpu is removed from a runqueue
>
> * When looking for work to steal, skip vcpus which have yielded
>
> NB that this does mean that sometimes a VM is inserted into an empty
> runqueue; handle that case.
>
> Signed-off-by: George Dunlap <george.dunlap@cloud.com>
> ---
> Changes since v1:
>  - Moved a comment tweak to the right patch
>
> CC: Dario Faggioli <dfaggioli@suse.com>
> ---
>  xen/common/sched/credit.c | 25 ++++++++++++++-----------
>  1 file changed, 14 insertions(+), 11 deletions(-)
>
> diff --git a/xen/common/sched/credit.c b/xen/common/sched/credit.c
> index 5c06f596d2..38a6f6fa6d 100644
> --- a/xen/common/sched/credit.c
> +++ b/xen/common/sched/credit.c
> @@ -298,14 +298,10 @@ __runq_insert(struct csched_unit *svc)
>       * runnable unit if we can. The next runq_sort will bring it forward
>       * within 30ms if the queue too long.
>       */
>      if ( test_bit(CSCHED_FLAG_UNIT_YIELD, &svc->flags)
> -         && __runq_elem(iter)->pri > CSCHED_PRI_IDLE )
> -    {
> +         && __runq_elem(iter)->pri > CSCHED_PRI_IDLE
> +         && iter->next != runq)

Style

>          iter=iter->next;
>
> -        /* Some sanity checks */
> -        BUG_ON(iter == runq);
> -    }
> -
>      list_add_tail(&svc->runq_elem, iter);
>  }
>
> @@ -321,6 +317,11 @@ __runq_remove(struct csched_unit *svc)
>  {
>      BUG_ON( !__unit_on_runq(svc) );
>      list_del_init(&svc->runq_elem);
> +
> +    /*
> +     * Clear YIELD flag when scheduling back in
> +     */
> +    clear_bit(CSCHED_FLAG_UNIT_YIELD, &svc->flags);
>  }
>
>  static inline void
> @@ -1637,6 +1638,13 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
>          if ( speer->pri <= pri )
>              break;
>
> +        /*
> +         * Don't steal a UNIT which has yielded; it's waiting for a
> +         * reason
> +         */
> +        if (test_bit(CSCHED_FLAG_UNIT_YIELD, &speer->flags))

Style

> +            continue;
> +
>          /* Is this UNIT runnable on our PCPU? */
>          unit = speer->unit;
>          BUG_ON( is_idle_unit(unit) );
> @@ -1954,11 +1962,6 @@ static void cf_check csched_schedule(
>              dec_nr_runnable(sched_cpu);
>          }
>
> -        /*
> -         * Clear YIELD flag before scheduling out
> -         */
> -        clear_bit(CSCHED_FLAG_UNIT_YIELD, &scurr->flags);
> -
>          do {
>              snext = __runq_elem(runq->next);

With the style issues fixed:

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen
On Thu, 21 Sept 2023 at 15:14, George Dunlap <george.dunlap@cloud.com> wrote:
>
> On Thu, Sep 21, 2023 at 1:23 PM George Dunlap <george.dunlap@cloud.com> wrote:
> >
> > On large systems with many vcpus yielding due to spinlock priority
> > inversion, it's not uncommon for a vcpu to yield its timeslice, only
> > to be immediately stolen by another pcpu looking for higher-priority
> > work.
> >
> > To prevent this:
> >
> > * Keep the YIELD flag until a vcpu is removed from a runqueue
> >
> > * When looking for work to steal, skip vcpus which have yielded
> >
>
> Marcus,
>
> Just wanted to verify my interpretation of the testing you did of this
> patch several months ago:
>
> 1. On the problematic workload, mean execution time for the task under
> heavy load was around 12 seconds
> 2. With only patch 2 of this series (0004 in your tests), mean
> execution time under heavy load was around 5 seconds
> 3. With only patch 1 of this series (0003 in your tests), mean
> execution time under heavy load was around 3 seconds
> 4. With both patch 1 and patch 2 of this series (0003+0004 in your
> tests), mean execution time under heavy load was also around 3 seconds
>
> So both patches independently exhibit an improvement; but the combined
> effect is about the same as the first patch.
>

Yes, thanks for the summary, that's pretty much it.

Marcus
diff --git a/xen/common/sched/credit.c b/xen/common/sched/credit.c
index 5c06f596d2..38a6f6fa6d 100644
--- a/xen/common/sched/credit.c
+++ b/xen/common/sched/credit.c
@@ -298,14 +298,10 @@ __runq_insert(struct csched_unit *svc)
      * runnable unit if we can. The next runq_sort will bring it forward
      * within 30ms if the queue too long.
      */
     if ( test_bit(CSCHED_FLAG_UNIT_YIELD, &svc->flags)
-         && __runq_elem(iter)->pri > CSCHED_PRI_IDLE )
-    {
+         && __runq_elem(iter)->pri > CSCHED_PRI_IDLE
+         && iter->next != runq)
         iter=iter->next;
 
-        /* Some sanity checks */
-        BUG_ON(iter == runq);
-    }
-
     list_add_tail(&svc->runq_elem, iter);
 }
 
@@ -321,6 +317,11 @@ __runq_remove(struct csched_unit *svc)
 {
     BUG_ON( !__unit_on_runq(svc) );
     list_del_init(&svc->runq_elem);
+
+    /*
+     * Clear YIELD flag when scheduling back in
+     */
+    clear_bit(CSCHED_FLAG_UNIT_YIELD, &svc->flags);
 }
 
 static inline void
@@ -1637,6 +1638,13 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
         if ( speer->pri <= pri )
             break;
 
+        /*
+         * Don't steal a UNIT which has yielded; it's waiting for a
+         * reason
+         */
+        if (test_bit(CSCHED_FLAG_UNIT_YIELD, &speer->flags))
+            continue;
+
         /* Is this UNIT runnable on our PCPU? */
         unit = speer->unit;
         BUG_ON( is_idle_unit(unit) );
@@ -1954,11 +1962,6 @@ static void cf_check csched_schedule(
             dec_nr_runnable(sched_cpu);
         }
 
-        /*
-         * Clear YIELD flag before scheduling out
-         */
-        clear_bit(CSCHED_FLAG_UNIT_YIELD, &scurr->flags);
-
         do {
             snext = __runq_elem(runq->next);
On large systems with many vcpus yielding due to spinlock priority
inversion, it's not uncommon for a vcpu to yield its timeslice, only
to be immediately stolen by another pcpu looking for higher-priority
work.

To prevent this:

* Keep the YIELD flag until a vcpu is removed from a runqueue

* When looking for work to steal, skip vcpus which have yielded

NB that this does mean that sometimes a VM is inserted into an empty
runqueue; handle that case.

Signed-off-by: George Dunlap <george.dunlap@cloud.com>
---
Changes since v1:
 - Moved a comment tweak to the right patch

CC: Dario Faggioli <dfaggioli@suse.com>
---
 xen/common/sched/credit.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)