Message ID | 147981141529.15399.3103284119825700755.stgit@Solace.fritz.box (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Tue, Nov 22, 2016 at 11:43:35AM +0100, Dario Faggioli wrote: > Since b047f888d489 ("xen: sched: leave CPUs doing tasklet > work alone") a cpu executing a tasklet, is not marked as > idle. > > Therefore: > - avoid asserting that we can't find the idle vcpu running > on one of them, which is not true, > - avoid triggering a preemption on them (and add an assert > checking that). > > This fixes a bug identified by OSSTest, in flight 102372 > (on ARM, but it's not at all ARM specific), where the > ASSERT() was triggering like this: > > (XEN) Xen call trace: > (XEN) [<0022af78>] sched_credit2.c#runq_tickle+0x3e8/0x61c (PC) > (XEN) [<0022aedc>] sched_credit2.c#runq_tickle+0x34c/0x61c (LR) > (XEN) [<0022b644>] sched_credit2.c#csched2_context_saved+0x128/0x1a4 > (XEN) [<0023303c>] context_saved+0x7c/0xa4 > (XEN) [<0024f660>] domain.c#schedule_tail+0x2b4/0x308 > (XEN) [<0024faac>] context_switch+0x80/0x94 > (XEN) [<0022ff48>] schedule.c#schedule+0x76c/0x7ec > (XEN) [<002338d4>] softirq.c#__do_softirq+0xcc/0xec > (XEN) [<00233968>] do_softirq+0x18/0x28 > (XEN) [<00261084>] leave_hypervisor_tail+0x58/0x88 > (XEN) [<002649d0>] entry.o#return_to_guest+0xc/0xb8 > (XEN) > (XEN) **************************************** > (XEN) Panic on CPU 1: > (XEN) Assertion '!is_idle_vcpu(cur->vcpu)' failed at sched_credit2.c:1009 > (XEN) **************************************** > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> > --- > Cc: George Dunlap <george.dunlap@citrix.com> > Cc: Wei Liu <wei.liu2@citrix.com> I would like to have this fix, so in principle: Release-acked-by: Wei Liu <wei.liu2@citrix.com> I haven't read the content of this patch though.
>>> On 22.11.16 at 11:43, <dario.faggioli@citrix.com> wrote: > Since b047f888d489 ("xen: sched: leave CPUs doing tasklet > work alone") a cpu executing a tasklet, is not marked as > idle. > > Therefore: > - avoid asserting that we can't find the idle vcpu running > on one of them, which is not true, > - avoid triggering a preemption on them (and add an assert > checking that). > > This fixes a bug identified by OSSTest, in flight 102372 > (on ARM, but it's not at all ARM specific), where the > ASSERT() was triggering like this: > > (XEN) Xen call trace: > (XEN) [<0022af78>] sched_credit2.c#runq_tickle+0x3e8/0x61c (PC) > (XEN) [<0022aedc>] sched_credit2.c#runq_tickle+0x34c/0x61c (LR) > (XEN) [<0022b644>] sched_credit2.c#csched2_context_saved+0x128/0x1a4 > (XEN) [<0023303c>] context_saved+0x7c/0xa4 > (XEN) [<0024f660>] domain.c#schedule_tail+0x2b4/0x308 > (XEN) [<0024faac>] context_switch+0x80/0x94 > (XEN) [<0022ff48>] schedule.c#schedule+0x76c/0x7ec > (XEN) [<002338d4>] softirq.c#__do_softirq+0xcc/0xec > (XEN) [<00233968>] do_softirq+0x18/0x28 > (XEN) [<00261084>] leave_hypervisor_tail+0x58/0x88 > (XEN) [<002649d0>] entry.o#return_to_guest+0xc/0xb8 > (XEN) > (XEN) **************************************** > (XEN) Panic on CPU 1: > (XEN) Assertion '!is_idle_vcpu(cur->vcpu)' failed at sched_credit2.c:1009 > (XEN) **************************************** > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> with two remarks: > @@ -1032,6 +1037,8 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now) > } > } > > + ASSERT(ipid == -1 || !is_idle_vcpu(curr_on_cpu(ipid))); > + > /* > * Only switch to another processor if the credit difference is > * greater than the migrate resistance. If you moved this past the if() following this comment, the ipid == -1 case would already be taken care of, simplifying the code. And then, having looked back at the commit mentioned in the description, that one resulted in two constructs like (taking the code as it looks now) if ( cpumask_test_cpu(cpu, &rqd->idle) ) { __cpumask_clear_cpu(cpu, &rqd->idle); ... Is there a reason this can't or shouldn't be if ( __cpumask_test_and_clear_cpu(cpu, &rqd->idle) ) { ... ? Jan
On 22/11/16 12:21, Jan Beulich wrote: >>>> On 22.11.16 at 11:43, <dario.faggioli@citrix.com> wrote: >> Since b047f888d489 ("xen: sched: leave CPUs doing tasklet >> work alone") a cpu executing a tasklet, is not marked as >> idle. >> >> Therefore: >> - avoid asserting that we can't find the idle vcpu running >> on one of them, which is not true, >> - avoid triggering a preemption on them (and add an assert >> checking that). >> >> This fixes a bug identified by OSSTest, in flight 102372 >> (on ARM, but it's not at all ARM specific), where the >> ASSERT() was triggering like this: >> >> (XEN) Xen call trace: >> (XEN) [<0022af78>] sched_credit2.c#runq_tickle+0x3e8/0x61c (PC) >> (XEN) [<0022aedc>] sched_credit2.c#runq_tickle+0x34c/0x61c (LR) >> (XEN) [<0022b644>] sched_credit2.c#csched2_context_saved+0x128/0x1a4 >> (XEN) [<0023303c>] context_saved+0x7c/0xa4 >> (XEN) [<0024f660>] domain.c#schedule_tail+0x2b4/0x308 >> (XEN) [<0024faac>] context_switch+0x80/0x94 >> (XEN) [<0022ff48>] schedule.c#schedule+0x76c/0x7ec >> (XEN) [<002338d4>] softirq.c#__do_softirq+0xcc/0xec >> (XEN) [<00233968>] do_softirq+0x18/0x28 >> (XEN) [<00261084>] leave_hypervisor_tail+0x58/0x88 >> (XEN) [<002649d0>] entry.o#return_to_guest+0xc/0xb8 >> (XEN) >> (XEN) **************************************** >> (XEN) Panic on CPU 1: >> (XEN) Assertion '!is_idle_vcpu(cur->vcpu)' failed at sched_credit2.c:1009 >> (XEN) **************************************** >> >> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> > > Reviewed-by: Jan Beulich <jbeulich@suse.com> > with two remarks: > >> @@ -1032,6 +1037,8 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now) >> } >> } >> >> + ASSERT(ipid == -1 || !is_idle_vcpu(curr_on_cpu(ipid))); >> + >> /* >> * Only switch to another processor if the credit difference is >> * greater than the migrate resistance. > > If you moved this past the if() following this comment, the ipid == -1 > case would already be taken care of, simplifying the code. > > And then, having looked back at the commit mentioned in the > description, that one resulted in two constructs like (taking the > code as it looks now) > > if ( cpumask_test_cpu(cpu, &rqd->idle) ) > { > __cpumask_clear_cpu(cpu, &rqd->idle); > ... > > Is there a reason this can't or shouldn't be > > if ( __cpumask_test_and_clear_cpu(cpu, &rqd->idle) ) Avoid cache line trashing? Juergen
On Tue, 2016-11-22 at 04:21 -0700, Jan Beulich wrote: > > > > On 22.11.16 at 11:43, <dario.faggioli@citrix.com> wrote: > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> > > Reviewed-by: Jan Beulich <jbeulich@suse.com> > Thanks a lot for looking at the patch. > with two remarks: > > [...] > > If you moved this past the if() following this comment, the ipid == > -1 > case would already be taken care of, simplifying the code. > True. Though about it, but was two minded. I think the ASSERT is more 'descriptive' if it's kept as close as possible to the for (and in fact, I now even regret having put a blank line in between). But I see your point and agree that it's looking awkward when you see it together with the following if. So, yes, all in all, I think I like your idea better. > And then, having looked back at the commit mentioned in the > description, that one resulted in two constructs like (taking the > code as it looks now) > > if ( cpumask_test_cpu(cpu, &rqd->idle) ) > { > __cpumask_clear_cpu(cpu, &rqd->idle); > ... > > Is there a reason this can't or shouldn't be > > if ( __cpumask_test_and_clear_cpu(cpu, &rqd->idle) ) > { > ... > ? > It can indeed. You can find it done as it is right now in other places, in scheduling code, though, I guess for cache betterness, as Juergen is saying. I remember not being sure which way to go, and eventually leaning toward this one. I'm still not sure what's best, but it'd be a cleanup/optimization, and would IMO require, if done, more than just that (e.g., comments should be improved). So I'd be inclined to consider this 4.9 material... But thanks for bringing it up. :-) So, I'm resending this patch with the ASSERT moved below the if, and I'm keeping your Reviewed-by. Hope that's ok. Thanks again and Regards, Dario
>>> On 22.11.16 at 12:52, <dario.faggioli@citrix.com> wrote: > On Tue, 2016-11-22 at 04:21 -0700, Jan Beulich wrote: >> > > > On 22.11.16 at 11:43, <dario.faggioli@citrix.com> wrote: >> And then, having looked back at the commit mentioned in the >> description, that one resulted in two constructs like (taking the >> code as it looks now) >> >> if ( cpumask_test_cpu(cpu, &rqd->idle) ) >> { >> __cpumask_clear_cpu(cpu, &rqd->idle); >> ... >> >> Is there a reason this can't or shouldn't be >> >> if ( __cpumask_test_and_clear_cpu(cpu, &rqd->idle) ) >> { >> ... >> ? >> > It can indeed. You can find it done as it is right now in other places, > in scheduling code, though, I guess for cache betterness, as Juergen is > saying. I remember not being sure which way to go, and eventually > leaning toward this one. The question really is whether this is in fact very frequently executed (and hence bouncing cache lines). I can't easily tell, but I'd guess the actual schedule functions shouldn't typically run more than once every few milliseconds. > I'm still not sure what's best, but it'd be a cleanup/optimization, and > would IMO require, if done, more than just that (e.g., comments should > be improved). So I'd be inclined to consider this 4.9 material... Of course, it was just something I've noticed while looking at that code. > So, I'm resending this patch with the ASSERT moved below the if, and > I'm keeping your Reviewed-by. Hope that's ok. And again of course. Jan
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c index 1f26553..ac5ef7d 100644 --- a/xen/common/sched_credit2.c +++ b/xen/common/sched_credit2.c @@ -1006,7 +1006,12 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now) cur = CSCHED2_VCPU(curr_on_cpu(i)); - ASSERT(!is_idle_vcpu(cur->vcpu)); + /* + * Even if the cpu is not in rqd->idle, it may be running the + * idle vcpu, if it's doing tasklet work. Just skip it. + */ + if ( is_idle_vcpu(cur->vcpu) ) + continue; /* Update credits for current to see if we want to preempt. */ burn_credits(rqd, cur, now); @@ -1032,6 +1037,8 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now) } } + ASSERT(ipid == -1 || !is_idle_vcpu(curr_on_cpu(ipid))); + /* * Only switch to another processor if the credit difference is * greater than the migrate resistance.
Since b047f888d489 ("xen: sched: leave CPUs doing tasklet work alone") a cpu executing a tasklet, is not marked as idle. Therefore: - avoid asserting that we can't find the idle vcpu running on one of them, which is not true, - avoid triggering a preemption on them (and add an assert checking that). This fixes a bug identified by OSSTest, in flight 102372 (on ARM, but it's not at all ARM specific), where the ASSERT() was triggering like this: (XEN) Xen call trace: (XEN) [<0022af78>] sched_credit2.c#runq_tickle+0x3e8/0x61c (PC) (XEN) [<0022aedc>] sched_credit2.c#runq_tickle+0x34c/0x61c (LR) (XEN) [<0022b644>] sched_credit2.c#csched2_context_saved+0x128/0x1a4 (XEN) [<0023303c>] context_saved+0x7c/0xa4 (XEN) [<0024f660>] domain.c#schedule_tail+0x2b4/0x308 (XEN) [<0024faac>] context_switch+0x80/0x94 (XEN) [<0022ff48>] schedule.c#schedule+0x76c/0x7ec (XEN) [<002338d4>] softirq.c#__do_softirq+0xcc/0xec (XEN) [<00233968>] do_softirq+0x18/0x28 (XEN) [<00261084>] leave_hypervisor_tail+0x58/0x88 (XEN) [<002649d0>] entry.o#return_to_guest+0xc/0xb8 (XEN) (XEN) **************************************** (XEN) Panic on CPU 1: (XEN) Assertion '!is_idle_vcpu(cur->vcpu)' failed at sched_credit2.c:1009 (XEN) **************************************** Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> --- Cc: George Dunlap <george.dunlap@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> --- xen/common/sched_credit2.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)