From patchwork Tue Jan 17 17:26:55 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dario Faggioli
X-Patchwork-Id: 9521591
From: Dario Faggioli
To: xen-devel@lists.xenproject.org
Cc: George Dunlap
Date: Tue, 17 Jan 2017 18:26:55 +0100
Message-ID: <148467401548.27920.15109928323593413208.stgit@Solace.fritz.box>
In-Reply-To: <148467379229.27920.2367500429219327194.stgit@Solace.fritz.box>
References: <148467379229.27920.2367500429219327194.stgit@Solace.fritz.box>
User-Agent: StGit/0.17.1-dirty
Subject: [Xen-devel] [PATCH 3/5] xen: credit2: fix shutdown/suspend when playing with cpupools.
List-Id: Xen developer discussion

In fact, during shutdown/suspend, we temporarily move all the vCPUs to
the BSP (i.e., pCPU 0, as of now).
For Credit2 domains, that means calling csched2_vcpu_migrate(), which
expects to find the target pCPU in the domain's pool. Therefore, if
Credit2 is the default scheduler and we have removed pCPU 0 from
cpupool0, shutdown/suspend fails like this:

 RIP:    e008:[] sched_credit2.c#migrate+0x274/0x2d1
 Xen call trace:
    [] sched_credit2.c#migrate+0x274/0x2d1
    [] sched_credit2.c#csched2_vcpu_migrate+0x6e/0x86
    [] schedule.c#vcpu_move_locked+0x69/0x6f
    [] cpu_disable_scheduler+0x3d7/0x430
    [] __cpu_disable+0x299/0x2b0
    [] cpu.c#take_cpu_down+0x2f/0x38
    [] stop_machine.c#stopmachine_action+0x7f/0x8d
    [] tasklet.c#do_tasklet_work+0x74/0xab
    [] do_tasklet+0x66/0x8b
    [] domain.c#idle_loop+0x3b/0x5e

 ****************************************
 Panic on CPU 8:
 Assertion 'svc->vcpu->processor < nr_cpu_ids' failed at sched_credit2.c:1729
 ****************************************

On the other hand, if Credit2 is the scheduler of another pool, when
trying (still during shutdown/suspend) to move the vCPUs of the Credit2
domains to pCPU 0, it figures out that pCPU 0 is not a Credit2 pCPU,
and fails like this:

 RIP:    e008:[] sched_credit2.c#csched2_vcpu_migrate+0xa1/0x107
 Xen call trace:
    [] sched_credit2.c#csched2_vcpu_migrate+0xa1/0x107
    [] schedule.c#vcpu_move_locked+0x69/0x6f
    [] cpu_disable_scheduler+0x3d7/0x430
    [] __cpu_disable+0x299/0x2b0
    [] cpu.c#take_cpu_down+0x2f/0x38
    [] stop_machine.c#stopmachine_action+0x7f/0x8d
    [] tasklet.c#do_tasklet_work+0x74/0xab
    [] do_tasklet+0x66/0x8b
    [] domain.c#idle_loop+0x3b/0x5e

The solution is to recognise this specific situation inside
csched2_vcpu_migrate() and, since it is only a temporary state that
occurs during shutdown/suspend, deal with it quickly. Then, in the
resume path, in restore_vcpu_affinity(), things are set back to normal,
and a new v->processor is chosen, for each vCPU, from the proper set of
pCPUs (i.e., the ones of the proper cpupool).

Signed-off-by: Dario Faggioli
Acked-by: George Dunlap
---
Cc: George Dunlap
---
This is a bugfix, and should be backported to 4.8.

Note that Credit2 being used, either as the default scheduler or in a
cpupool, is what triggers the bug, but the issue is actually more
general, and would affect any scheduler that remaps the runqueue locks.
---
 xen/common/sched_credit2.c |   32 +++++++++++++++++++++++++++++++-
 xen/common/schedule.c      |   25 ++++++++++++++++++++++---
 2 files changed, 53 insertions(+), 4 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index ce0e146..2ce738d 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -1946,13 +1946,43 @@ static void
 csched2_vcpu_migrate(
     const struct scheduler *ops, struct vcpu *vc, unsigned int new_cpu)
 {
+    struct domain *d = vc->domain;
     struct csched2_vcpu * const svc = CSCHED2_VCPU(vc);
     struct csched2_runqueue_data *trqd;
+    s_time_t now = NOW();
 
     /* Check if new_cpu is valid */
     ASSERT(cpumask_test_cpu(new_cpu, &CSCHED2_PRIV(ops)->initialized));
     ASSERT(cpumask_test_cpu(new_cpu, vc->cpu_hard_affinity));
 
+    /*
+     * Being passed a target pCPU which is outside of our cpupool is only
+     * valid if we are shutting down (or doing ACPI suspend), and we are
+     * moving everyone to BSP, no matter whether or not BSP is inside our
+     * cpupool.
+     *
+     * And since there indeed is the chance that it is not part of it, all
+     * we must do is remove _and_ unassign the vCPU from any runqueue, as
+     * well as update v->processor with the target, so that the suspend
+     * process can continue.
+     *
+     * It will then be during resume that a new, meaningful, value for
+     * v->processor will be chosen, and during actual domain unpause that
+     * the vCPU will be assigned to and added to the proper runqueue.
+     */
+    if ( unlikely(!cpumask_test_cpu(new_cpu, cpupool_domain_cpumask(d))) )
+    {
+        ASSERT(system_state == SYS_STATE_suspend);
+        if ( __vcpu_on_runq(svc) )
+        {
+            __runq_remove(svc);
+            update_load(ops, svc->rqd, NULL, -1, now);
+        }
+        __runq_deassign(svc);
+        vc->processor = new_cpu;
+        return;
+    }
+
     trqd = RQD(ops, new_cpu);
 
     /*
@@ -1964,7 +1994,7 @@ csched2_vcpu_migrate(
      * pointing to a pcpu where we can't run any longer.
      */
     if ( trqd != svc->rqd )
-        migrate(ops, svc, trqd, NOW());
+        migrate(ops, svc, trqd, now);
     else
         vc->processor = new_cpu;
 }
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 5b444c4..36ff2e9 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -633,8 +633,11 @@ void vcpu_force_reschedule(struct vcpu *v)
 
 void restore_vcpu_affinity(struct domain *d)
 {
+    unsigned int cpu = smp_processor_id();
     struct vcpu *v;
 
+    ASSERT(system_state == SYS_STATE_resume);
+
     for_each_vcpu ( d, v )
     {
         spinlock_t *lock = vcpu_schedule_lock_irq(v);
@@ -643,18 +646,34 @@ void restore_vcpu_affinity(struct domain *d)
         {
             cpumask_copy(v->cpu_hard_affinity, v->cpu_hard_affinity_saved);
             v->affinity_broken = 0;
+        }
 
-        if ( v->processor == smp_processor_id() )
+        /*
+         * During suspend (in cpu_disable_scheduler()), we moved every vCPU
+         * to BSP (which, as of now, is pCPU 0), as a temporary measure to
+         * allow the nonboot processors to have their data structure freed
+         * and go to sleep. But nothing guarantees that the BSP is a valid
+         * pCPU for a particular domain.
+         *
+         * Therefore, here, before actually unpausing the domains, we should
+         * set v->processor of each of their vCPUs to something that will
+         * make sense for the scheduler of the cpupool they are in.
+         */
+        cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+                    cpupool_domain_cpumask(v->domain));
+        v->processor = cpumask_any(cpumask_scratch_cpu(cpu));
+
+        if ( v->processor == cpu )
         {
             set_bit(_VPF_migrating, &v->pause_flags);
-            vcpu_schedule_unlock_irq(lock, v);
+            spin_unlock_irq(lock);
             vcpu_sleep_nosync(v);
             vcpu_migrate(v);
         }
         else
         {
-            vcpu_schedule_unlock_irq(lock, v);
+            spin_unlock_irq(lock);
         }
     }
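
For readers who prefer to see the logic outside of the diff context, below is a
small, self-contained toy program that condenses the two decisions the patch
makes. It is not Xen code: every type, bitmask and helper in it is a simplified
stand-in invented for illustration; only the idea comes from the hunks above
(a target pCPU outside the domain's cpupool is tolerated solely while the host
is suspending, and on resume v->processor is re-picked from the intersection of
hard affinity and the cpupool's pCPUs).

/*
 * Toy model of the two decisions introduced by the patch. Not Xen code;
 * all types and helpers are made-up stand-ins.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

enum toy_system_state { STATE_ACTIVE, STATE_SUSPEND, STATE_RESUME };

static enum toy_system_state state = STATE_ACTIVE;

struct toy_vcpu {
    unsigned int processor;   /* pCPU the vCPU currently sits on */
    bool on_runq;             /* queued on a (Credit2-like) runqueue? */
    uint64_t hard_affinity;   /* bitmask of pCPUs it may run on */
    uint64_t pool_mask;       /* bitmask of pCPUs in its domain's cpupool */
};

static bool cpu_in_mask(uint64_t mask, unsigned int cpu)
{
    return (mask >> cpu) & 1;
}

/* Shape of the early-return path added to csched2_vcpu_migrate(). */
static void toy_migrate(struct toy_vcpu *v, unsigned int new_cpu)
{
    if ( !cpu_in_mask(v->pool_mask, new_cpu) )
    {
        /* Only legitimate while everyone is being moved to the BSP. */
        assert(state == STATE_SUSPEND);
        v->on_runq = false;      /* stands in for __runq_remove()/__runq_deassign() */
        v->processor = new_cpu;  /* park it; resume picks a proper pCPU later */
        return;
    }
    v->processor = new_cpu;      /* normal, in-pool migration */
}

/* Shape of the v->processor re-selection added to restore_vcpu_affinity(). */
static void toy_restore_affinity(struct toy_vcpu *v)
{
    uint64_t ok = v->hard_affinity & v->pool_mask;  /* like cpumask_and() */

    assert(state == STATE_RESUME && ok != 0);
    for ( unsigned int cpu = 0; cpu < 64; cpu++ )
        if ( cpu_in_mask(ok, cpu) )                 /* like cpumask_any() */
        {
            v->processor = cpu;
            break;
        }
}

int main(void)
{
    /* The domain's cpupool holds pCPUs 2 and 3; the BSP (pCPU 0) is outside it. */
    struct toy_vcpu v = { .processor = 2, .on_runq = true,
                          .hard_affinity = 0xF, .pool_mask = 0xC };

    state = STATE_SUSPEND;
    toy_migrate(&v, 0);                     /* moved to the BSP: tolerated */
    printf("suspended: parked on pCPU %u\n", v.processor);

    state = STATE_RESUME;
    toy_restore_affinity(&v);               /* back onto a pCPU of its pool */
    printf("resumed: now on pCPU %u\n", v.processor);
    return 0;
}

The split mirrors the commit message: the suspend side deliberately does not
look for a "better" pCPU, because the state is transient, while the resume
side is the one place where a meaningful v->processor is computed again.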