From patchwork Tue Jan 17 17:26:46 2017
X-Patchwork-Submitter: Dario Faggioli
X-Patchwork-Id: 9521589
From: Dario Faggioli
To: xen-devel@lists.xenproject.org
Date: Tue, 17 Jan 2017 18:26:46 +0100
Message-ID: <148467400670.27920.10444838852821010432.stgit@Solace.fritz.box>
In-Reply-To: <148467379229.27920.2367500429219327194.stgit@Solace.fritz.box>
References: <148467379229.27920.2367500429219327194.stgit@Solace.fritz.box>
User-Agent: StGit/0.17.1-dirty
Cc: George Dunlap, Juergen Gross, Jan Beulich
Subject: [Xen-devel] [PATCH 2/5] xen: credit2: never consider CPUs outside of our cpupool.
In fact, relying on the mask of which pCPUs belong to which Credit2
runqueue is not enough. If we only do that, when Credit2 is the boot
scheduler, we may ASSERT() or panic when moving a pCPU from Pool-0 to
another cpupool. This is because pCPUs outside of any pool are
considered part of cpupool0, which puts us at risk of a crash when
those same pCPUs are added to another pool and something other than
the idle domain is found running on them.

Note that, even if we prevent the above from happening (which is the
purpose of this patch), things are still pretty bad. In fact, when we
remove a pCPU from Pool-0:
 - in Credit1, we do *not* update prv->ncpus and prv->credit, which
   means we're considering the wrong total credits when doing
   accounting;
 - in Credit2, the pCPU remains part of one runqueue, and is hence at
   least considered during load balancing, even if no vCPU should
   really run there.

In Credit1, this "only" causes skewed accounting and no crashes
because there is a lot of `cpumask_and`ing going on with the cpumask
of the domains' cpupool (which, BTW, comes at a price).

A quick and not too involved (and easily backportable) solution for
Credit2 is to do exactly the same.

Signed-off-by: Dario Faggioli
---
Cc: George Dunlap
Cc: Juergen Gross
Cc: Jan Beulich
---
This is a bugfix, and should be backported to 4.8.
---
The proper solution would mean calling deinit_pdata() when removing a
pCPU from cpupool0, as well as a bit more code reshuffling. And,
although worth doing, it will certainly take more work, more time, and
will probably be harder (well, surely harder than this) to backport.
Therefore, I'd argue in favor of both taking and backporting this
change, which at least enables using Credit2 as the default scheduler
without risking a crash when creating a second cpupool. A proper
solution can then be proposed for Xen 4.9.

Finally, given the number of similar issues I've found and fixed
during the last release cycle, I think it would be good to take a
stab at simplifying the interface between cpupools and the
schedulers. :-O

Regards,
Dario
---
 xen/common/sched_credit2.c |   59 ++++++++++++++++++++++++++++----------------
 1 file changed, 38 insertions(+), 21 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 523922e..ce0e146 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -510,19 +510,22 @@ void smt_idle_mask_clear(unsigned int cpu, cpumask_t *mask)
  */
 static int get_fallback_cpu(struct csched2_vcpu *svc)
 {
-    int fallback_cpu, cpu = svc->vcpu->processor;
+    struct vcpu *v = svc->vcpu;
+    int cpu = v->processor;
 
-    if ( likely(cpumask_test_cpu(cpu, svc->vcpu->cpu_hard_affinity)) )
-        return cpu;
+    cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+                cpupool_domain_cpumask(v->domain));
 
-    cpumask_and(cpumask_scratch_cpu(cpu), svc->vcpu->cpu_hard_affinity,
-                &svc->rqd->active);
-    fallback_cpu = cpumask_first(cpumask_scratch_cpu(cpu));
-    if ( likely(fallback_cpu < nr_cpu_ids) )
-        return fallback_cpu;
+    if ( likely(cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) )
+        return cpu;
 
-    cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
-                cpupool_domain_cpumask(svc->vcpu->domain));
+    if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
+                                   &svc->rqd->active)) )
+    {
+        cpumask_and(cpumask_scratch_cpu(cpu), &svc->rqd->active,
+                    cpumask_scratch_cpu(cpu));
+        return cpumask_first(cpumask_scratch_cpu(cpu));
+    }
 
     ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)));
 
@@ -940,6 +943,9 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
                     (unsigned char *)&d);
     }
 
+    cpumask_and(cpumask_scratch_cpu(cpu), new->vcpu->cpu_hard_affinity,
+                cpupool_domain_cpumask(new->vcpu->domain));
+
     /*
      * First of all, consider idle cpus, checking if we can just
      * re-use the pcpu where we were running before.
@@ -952,7 +958,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
         cpumask_andnot(&mask, &rqd->idle, &rqd->smt_idle);
     else
         cpumask_copy(&mask, &rqd->smt_idle);
-    cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
+    cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
     i = cpumask_test_or_cycle(cpu, &mask);
     if ( i < nr_cpu_ids )
     {
@@ -967,7 +973,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
      * gone through the scheduler yet.
      */
     cpumask_andnot(&mask, &rqd->idle, &rqd->tickled);
-    cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
+    cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
     i = cpumask_test_or_cycle(cpu, &mask);
     if ( i < nr_cpu_ids )
     {
@@ -983,7 +989,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
      */
     cpumask_andnot(&mask, &rqd->active, &rqd->idle);
     cpumask_andnot(&mask, &mask, &rqd->tickled);
-    cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
+    cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
     if ( cpumask_test_cpu(cpu, &mask) )
     {
         cur = CSCHED2_VCPU(curr_on_cpu(cpu));
@@ -1525,6 +1531,9 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
         goto out;
     }
 
+    cpumask_and(cpumask_scratch_cpu(cpu), vc->cpu_hard_affinity,
+                cpupool_domain_cpumask(vc->domain));
+
     /*
      * First check to see if we're here because someone else suggested a place
      * for us to move.
@@ -1536,13 +1545,13 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
                 printk(XENLOG_WARNING "%s: target runqueue disappeared!\n",
                        __func__);
         }
-        else
+        else if ( cpumask_intersects(cpumask_scratch_cpu(cpu),
+                                     &svc->migrate_rqd->active) )
         {
-            cpumask_and(cpumask_scratch_cpu(cpu), vc->cpu_hard_affinity,
+            cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                         &svc->migrate_rqd->active);
             new_cpu = cpumask_any(cpumask_scratch_cpu(cpu));
-            if ( new_cpu < nr_cpu_ids )
-                goto out_up;
+            goto out_up;
         }
         /* Fall-through to normal cpu pick */
     }
@@ -1570,12 +1579,12 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
          */
         if ( rqd == svc->rqd )
         {
-            if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
+            if ( cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active) )
                 rqd_avgload = max_t(s_time_t, rqd->b_avgload - svc->avgload, 0);
         }
         else if ( spin_trylock(&rqd->lock) )
         {
-            if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
+            if ( cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active) )
                 rqd_avgload = rqd->b_avgload;
 
             spin_unlock(&rqd->lock);
@@ -1597,7 +1606,7 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
         goto out_up;
     }
 
-    cpumask_and(cpumask_scratch_cpu(cpu), vc->cpu_hard_affinity,
+    cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                 &prv->rqd[min_rqi].active);
     new_cpu = cpumask_any(cpumask_scratch_cpu(cpu));
     BUG_ON(new_cpu >= nr_cpu_ids);
@@ -1713,6 +1722,8 @@ static void migrate(const struct scheduler *ops,
         __runq_deassign(svc);
 
         cpumask_and(cpumask_scratch_cpu(cpu), svc->vcpu->cpu_hard_affinity,
+                    cpupool_domain_cpumask(svc->vcpu->domain));
+        cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                     &trqd->active);
         svc->vcpu->processor = cpumask_any(cpumask_scratch_cpu(cpu));
         ASSERT(svc->vcpu->processor < nr_cpu_ids);
@@ -1738,8 +1749,14 @@ static void migrate(const struct scheduler *ops,
 static bool_t vcpu_is_migrateable(struct csched2_vcpu *svc,
                                   struct csched2_runqueue_data *rqd)
 {
+    struct vcpu *v = svc->vcpu;
+    int cpu = svc->vcpu->processor;
+
+    cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+                cpupool_domain_cpumask(v->domain));
+
     return !(svc->flags & CSFLAG_runq_migrate_request) &&
-           cpumask_intersects(svc->vcpu->cpu_hard_affinity, &rqd->active);
+           cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active);
 }
 
 static void balance_load(const struct scheduler *ops, int cpu, s_time_t now)