
[3/5] xen: credit2: fix shutdown/suspend when playing with cpupools.

Message ID 148467401548.27920.15109928323593413208.stgit@Solace.fritz.box (mailing list archive)
State New, archived

Commit Message

Dario Faggioli Jan. 17, 2017, 5:26 p.m. UTC
In fact, during shutdown/suspend, we temporarily move all
the vCPUs to the BSP (i.e., pCPU 0, as of now). For Credit2
domains, we call csched2_vcpu_migrate(), which expects to find
the target pCPU in the domain's cpupool.

Therefore, if Credit2 is the default scheduler and we have
removed pCPU 0 from cpupool0, shutdown/suspend fails like
this:

 RIP:    e008:[<ffff82d08012906d>] sched_credit2.c#migrate+0x274/0x2d1
 Xen call trace:
    [<ffff82d08012906d>] sched_credit2.c#migrate+0x274/0x2d1
    [<ffff82d080129138>] sched_credit2.c#csched2_vcpu_migrate+0x6e/0x86
    [<ffff82d08012c468>] schedule.c#vcpu_move_locked+0x69/0x6f
    [<ffff82d08012ec14>] cpu_disable_scheduler+0x3d7/0x430
    [<ffff82d08019669b>] __cpu_disable+0x299/0x2b0
    [<ffff82d0801012f8>] cpu.c#take_cpu_down+0x2f/0x38
    [<ffff82d0801312d8>] stop_machine.c#stopmachine_action+0x7f/0x8d
    [<ffff82d0801330b8>] tasklet.c#do_tasklet_work+0x74/0xab
    [<ffff82d0801333ed>] do_tasklet+0x66/0x8b
    [<ffff82d080166a73>] domain.c#idle_loop+0x3b/0x5e

 ****************************************
 Panic on CPU 8:
 Assertion 'svc->vcpu->processor < nr_cpu_ids' failed at sched_credit2.c:1729
 ****************************************

On the other hand, if Credit2 is the scheduler of another
pool, when trying (still during shutdown/suspend) to move
the vCPUs of the Credit2 domains to pCPU 0, it figures
out that pCPU 0 is not a Credit2 pCPU, and fails like this:

 RIP:    e008:[<ffff82d08012916b>] sched_credit2.c#csched2_vcpu_migrate+0xa1/0x107
 Xen call trace:
    [<ffff82d08012916b>] sched_credit2.c#csched2_vcpu_migrate+0xa1/0x107
    [<ffff82d08012c4e9>] schedule.c#vcpu_move_locked+0x69/0x6f
    [<ffff82d08012edfc>] cpu_disable_scheduler+0x3d7/0x430
    [<ffff82d08019687b>] __cpu_disable+0x299/0x2b0
    [<ffff82d0801012f8>] cpu.c#take_cpu_down+0x2f/0x38
    [<ffff82d0801314c0>] stop_machine.c#stopmachine_action+0x7f/0x8d
    [<ffff82d0801332a0>] tasklet.c#do_tasklet_work+0x74/0xab
    [<ffff82d0801335d5>] do_tasklet+0x66/0x8b
    [<ffff82d080166c53>] domain.c#idle_loop+0x3b/0x5e

The solution is to recognise this specific situation inside
csched2_vcpu_migrate() and, since it is something temporary that
only happens during shutdown/suspend, deal with it quickly.

Then, in the resume path, in restore_vcpu_affinity(), things
are set back to normal, and a new v->processor is chosen, for
each vCPU, from the proper set of pCPUs (i.e., the ones of
the proper cpupool).

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
---
This is a bugfix, and should be backported to 4.8.

Note that Credit2 being used, either as default scheduler or in a cpupool is
what triggers the bug, but it's actually more a general thing, which would
affect any scheduler that remaps the runqueue locks.
---
 xen/common/sched_credit2.c |   32 +++++++++++++++++++++++++++++++-
 xen/common/schedule.c      |   25 ++++++++++++++++++++++---
 2 files changed, 53 insertions(+), 4 deletions(-)

Comments

George Dunlap Jan. 23, 2017, 3:42 p.m. UTC | #1
On Tue, Jan 17, 2017 at 5:26 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> In fact, during shutdown/suspend, we temporarily move all
> the vCPUs to the BSP (i.e., pCPU 0, as of now). For Credit2
> domains, we call csched2_vcpu_migrate(), which expects to find
> the target pCPU in the domain's cpupool.
>
> Therefore, if Credit2 is the default scheduler and we have
> removed pCPU 0 from cpupool0, shutdown/suspend fails like
> this:
>
>  RIP:    e008:[<ffff82d08012906d>] sched_credit2.c#migrate+0x274/0x2d1
>  Xen call trace:
>     [<ffff82d08012906d>] sched_credit2.c#migrate+0x274/0x2d1
>     [<ffff82d080129138>] sched_credit2.c#csched2_vcpu_migrate+0x6e/0x86
>     [<ffff82d08012c468>] schedule.c#vcpu_move_locked+0x69/0x6f
>     [<ffff82d08012ec14>] cpu_disable_scheduler+0x3d7/0x430
>     [<ffff82d08019669b>] __cpu_disable+0x299/0x2b0
>     [<ffff82d0801012f8>] cpu.c#take_cpu_down+0x2f/0x38
>     [<ffff82d0801312d8>] stop_machine.c#stopmachine_action+0x7f/0x8d
>     [<ffff82d0801330b8>] tasklet.c#do_tasklet_work+0x74/0xab
>     [<ffff82d0801333ed>] do_tasklet+0x66/0x8b
>     [<ffff82d080166a73>] domain.c#idle_loop+0x3b/0x5e
>
>  ****************************************
>  Panic on CPU 8:
>  Assertion 'svc->vcpu->processor < nr_cpu_ids' failed at sched_credit2.c:1729
>  ****************************************
>
> On the other hand, if Credit2 is the scheduler of another
> pool, when trying (still during shutdown/suspend) to move
> the vCPUs of the Credit2 domains to pCPU 0, it figures
> out that pCPU 0 is not a Credit2 pCPU, and fails like this:
>
>  RIP:    e008:[<ffff82d08012916b>] sched_credit2.c#csched2_vcpu_migrate+0xa1/0x107
>  Xen call trace:
>     [<ffff82d08012916b>] sched_credit2.c#csched2_vcpu_migrate+0xa1/0x107
>     [<ffff82d08012c4e9>] schedule.c#vcpu_move_locked+0x69/0x6f
>     [<ffff82d08012edfc>] cpu_disable_scheduler+0x3d7/0x430
>     [<ffff82d08019687b>] __cpu_disable+0x299/0x2b0
>     [<ffff82d0801012f8>] cpu.c#take_cpu_down+0x2f/0x38
>     [<ffff82d0801314c0>] stop_machine.c#stopmachine_action+0x7f/0x8d
>     [<ffff82d0801332a0>] tasklet.c#do_tasklet_work+0x74/0xab
>     [<ffff82d0801335d5>] do_tasklet+0x66/0x8b
>     [<ffff82d080166c53>] domain.c#idle_loop+0x3b/0x5e
>
> The solution is to recognise this specific situation inside
> csched2_vcpu_migrate() and, since it is something temporary that
> only happens during shutdown/suspend, deal with it quickly.
>
> Then, in the resume path, in restore_vcpu_affinity(), things
> are set back to normal, and a new v->processor is chosen, for
> each vCPU, from the proper set of pCPUs (i.e., the ones of
> the proper cpupool).
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Acked-by: George Dunlap <george.dunlap@citrix.com>

Patch

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index ce0e146..2ce738d 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -1946,13 +1946,43 @@  static void
 csched2_vcpu_migrate(
     const struct scheduler *ops, struct vcpu *vc, unsigned int new_cpu)
 {
+    struct domain *d = vc->domain;
     struct csched2_vcpu * const svc = CSCHED2_VCPU(vc);
     struct csched2_runqueue_data *trqd;
+    s_time_t now = NOW();
 
     /* Check if new_cpu is valid */
     ASSERT(cpumask_test_cpu(new_cpu, &CSCHED2_PRIV(ops)->initialized));
     ASSERT(cpumask_test_cpu(new_cpu, vc->cpu_hard_affinity));
 
+    /*
+     * Being passed a target pCPU which is outside of our cpupool is only
+     * valid if we are shutting down (or doing ACPI suspend), and we are
+     * moving everyone to BSP, no matter whether or not BSP is inside our
+     * cpupool.
+     *
+     * And since there indeed is the chance that it is not part of it, all
+     * we must do is remove _and_ unassign the vCPU from any runqueue, as
+     * well as updating v->processor with the target, so that the suspend
+     * process can continue.
+     *
+     * It will then be during resume that a new, meaningful, value for
+     * v->processor will be chosen, and during actual domain unpause that
+     * the vCPU will be assigned to and added to the proper runqueue.
+     */
+    if ( unlikely(!cpumask_test_cpu(new_cpu, cpupool_domain_cpumask(d))) )
+    {
+        ASSERT(system_state == SYS_STATE_suspend);
+        if ( __vcpu_on_runq(svc) )
+        {
+            __runq_remove(svc);
+            update_load(ops, svc->rqd, NULL, -1, now);
+        }
+        __runq_deassign(svc);
+        vc->processor = new_cpu;
+        return;
+    }
+
     trqd = RQD(ops, new_cpu);
 
     /*
@@ -1964,7 +1994,7 @@  csched2_vcpu_migrate(
      * pointing to a pcpu where we can't run any longer.
      */
     if ( trqd != svc->rqd )
-        migrate(ops, svc, trqd, NOW());
+        migrate(ops, svc, trqd, now);
     else
         vc->processor = new_cpu;
 }
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 5b444c4..36ff2e9 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -633,8 +633,11 @@  void vcpu_force_reschedule(struct vcpu *v)
 
 void restore_vcpu_affinity(struct domain *d)
 {
+    unsigned int cpu = smp_processor_id();
     struct vcpu *v;
 
+    ASSERT(system_state == SYS_STATE_resume);
+
     for_each_vcpu ( d, v )
     {
         spinlock_t *lock = vcpu_schedule_lock_irq(v);
@@ -643,18 +646,34 @@  void restore_vcpu_affinity(struct domain *d)
         {
             cpumask_copy(v->cpu_hard_affinity, v->cpu_hard_affinity_saved);
             v->affinity_broken = 0;
+
         }
 
-        if ( v->processor == smp_processor_id() )
+        /*
+         * During suspend (in cpu_disable_scheduler()), we moved every vCPU
+         * to BSP (which, as of now, is pCPU 0), as a temporary measure to
+         * allow the nonboot processors to have their data structure freed
+     * and go to sleep. But nothing guarantees that the BSP is a valid
+         * pCPU for a particular domain.
+         *
+         * Therefore, here, before actually unpausing the domains, we should
+         * set v->processor of each of their vCPUs to something that will
+     * make sense for the scheduler of the cpupool in which they are.
+         */
+        cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+                    cpupool_domain_cpumask(v->domain));
+        v->processor = cpumask_any(cpumask_scratch_cpu(cpu));
+
+        if ( v->processor == cpu )
         {
             set_bit(_VPF_migrating, &v->pause_flags);
-            vcpu_schedule_unlock_irq(lock, v);
+            spin_unlock_irq(lock);
             vcpu_sleep_nosync(v);
             vcpu_migrate(v);
         }
         else
         {
-            vcpu_schedule_unlock_irq(lock, v);
+            spin_unlock_irq(lock);
         }
     }