From patchwork Thu Feb 9 10:32:53 2017
X-Patchwork-Submitter: Dario Faggioli
X-Patchwork-Id: 9564341
Message-ID: <1486636373.3042.28.camel@citrix.com>
From: Dario Faggioli
To: Jan Beulich
Date: Thu, 9 Feb 2017 11:32:53 +0100
In-Reply-To: <589C41BA020000780013816B@prv-mh.provo.novell.com>
References: <148467379229.27920.2367500429219327194.stgit@Solace.fritz.box>
 <148467400670.27920.10444838852821010432.stgit@Solace.fritz.box>
 <5894505C02000078001366A3@prv-mh.provo.novell.com>
 <1486135653.16676.5.camel@citrix.com>
 <5894B27B02000078001368EC@prv-mh.provo.novell.com>
 <1486572516.3042.15.camel@citrix.com>
 <589B5D460200007800137E42@prv-mh.provo.novell.com>
 <1486580125.3042.22.camel@citrix.com>
 <589C41BA020000780013816B@prv-mh.provo.novell.com>
Organization: Citrix Inc.
Cc: George Dunlap, xen-devel@lists.xenproject.org, Juergen Gross
Subject: Re: [Xen-devel] [PATCH 2/5] xen: credit2: never consider CPUs outside of our cpupool.
List-Id: Xen developer discussion

On Thu, 2017-02-09 at 02:17 -0700, Jan Beulich wrote:
> > > > On 08.02.17 at 19:55, wrote:
> I'm going to commit what I have later today, and I'll pull in the one
> extra backport next time round.
>
Ok, patch attached. I've tested it on top of current tip of staging-4.7.

Regards,
Dario
-- 
<> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

commit 09725f8af37415c30a1a53d4a34e67fabcba105d
Author: Dario Faggioli
Date:   Wed Feb 8 19:01:53 2017 +0100

    xen: credit2: never consider CPUs outside of our cpupool.

    In fact, relying on the mask of which pCPUs belong to which Credit2
    runqueue is not enough. If we only do that, when Credit2 is the boot
    scheduler, we may ASSERT() or panic when moving a pCPU from Pool-0
    to another cpupool. This is because pCPUs outside of any pool are
    considered part of cpupool0. This puts us at risk of a crash when
    those same pCPUs are added to another pool and something different
    from the idle domain is found to be running on them.
    Note that, even if we prevent the above from happening (which is
    the purpose of this patch), the situation is still pretty bad. In
    fact, when we remove a pCPU from Pool-0:
     - in Credit1, we do *not* update prv->ncpus and prv->credit, which
       means we're considering the wrong total credits when doing
       accounting;
     - in Credit2, the pCPU remains part of one runqueue, and is hence
       at least considered during load balancing, even if no vCPU
       should really run there.

    In Credit1, this "only" causes skewed accounting and no crashes
    because there is a lot of `cpumask_and`ing going on with the
    cpumask of the domains' cpupool (which, BTW, comes at a price).

    A quick and not too involved (and easily backportable) solution
    for Credit2 is to do exactly the same.

    Signed-off-by: Dario Faggioli
    Cc: Jan Beulich

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 25b4c91..35dad15 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -331,19 +331,22 @@ static int csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc);
  */
 static int get_fallback_cpu(struct csched2_vcpu *svc)
 {
-    int fallback_cpu, cpu = svc->vcpu->processor;
+    struct vcpu *v = svc->vcpu;
+    int cpu = v->processor;
 
-    if ( likely(cpumask_test_cpu(cpu, svc->vcpu->cpu_hard_affinity)) )
-        return cpu;
+    cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+                cpupool_domain_cpumask(v->domain));
 
-    cpumask_and(cpumask_scratch_cpu(cpu), svc->vcpu->cpu_hard_affinity,
-                &svc->rqd->active);
-    fallback_cpu = cpumask_first(cpumask_scratch_cpu(cpu));
-    if ( likely(fallback_cpu < nr_cpu_ids) )
-        return fallback_cpu;
+    if ( likely(cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) )
+        return cpu;
 
-    cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
-                cpupool_domain_cpumask(svc->vcpu->domain));
+    if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
+                                   &svc->rqd->active)) )
+    {
+        cpumask_and(cpumask_scratch_cpu(cpu), &svc->rqd->active,
+                    cpumask_scratch_cpu(cpu));
+        return cpumask_first(cpumask_scratch_cpu(cpu));
+    }
 
     ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)));
@@ -582,9 +585,12 @@ runq_tickle(const struct scheduler *ops, unsigned int cpu, struct csched2_vcpu *
         goto tickle;
     }
 
+    cpumask_and(cpumask_scratch_cpu(cpu), new->vcpu->cpu_hard_affinity,
+                cpupool_domain_cpumask(new->vcpu->domain));
+
     /* Get a mask of idle, but not tickled, that new is allowed to run on. */
     cpumask_andnot(&mask, &rqd->idle, &rqd->tickled);
-    cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
+    cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
 
     /* If it's not empty, choose one */
     i = cpumask_cycle(cpu, &mask);
@@ -599,7 +605,7 @@ runq_tickle(const struct scheduler *ops, unsigned int cpu, struct csched2_vcpu *
      * that new is allowed to run on. */
     cpumask_andnot(&mask, &rqd->active, &rqd->idle);
     cpumask_andnot(&mask, &mask, &rqd->tickled);
-    cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
+    cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
 
     for_each_cpu(i, &mask)
     {
@@ -1160,6 +1166,9 @@ choose_cpu(const struct scheduler *ops, struct vcpu *vc)
         return get_fallback_cpu(svc);
     }
 
+    cpumask_and(cpumask_scratch_cpu(cpu), vc->cpu_hard_affinity,
+                cpupool_domain_cpumask(vc->domain));
+
     /* First check to see if we're here because someone else suggested a place
      * for us to move. */
     if ( test_and_clear_bit(__CSFLAG_runq_migrate_request, &svc->flags) )
@@ -1169,16 +1178,14 @@ choose_cpu(const struct scheduler *ops, struct vcpu *vc)
             printk("%s: Runqueue migrate aborted because target runqueue disappeared!\n",
                    __func__);
         }
-        else
+        else if ( cpumask_intersects(cpumask_scratch_cpu(cpu),
+                                     &svc->migrate_rqd->active) )
         {
-            cpumask_and(cpumask_scratch_cpu(cpu), vc->cpu_hard_affinity,
+            cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                         &svc->migrate_rqd->active);
             new_cpu = cpumask_any(cpumask_scratch_cpu(cpu));
-            if ( new_cpu < nr_cpu_ids )
-            {
-                d2printk("%pv +\n", svc->vcpu);
-                goto out_up;
-            }
+            d2printk("%pv +\n", svc->vcpu);
+            goto out_up;
         }
         /* Fall-through to normal cpu pick */
     }
@@ -1208,12 +1215,12 @@ choose_cpu(const struct scheduler *ops, struct vcpu *vc)
          */
         if ( rqd == svc->rqd )
         {
-            if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
+            if ( cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active) )
                 rqd_avgload = rqd->b_avgload - svc->avgload;
         }
         else if ( spin_trylock(&rqd->lock) )
         {
-            if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
+            if ( cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active) )
                 rqd_avgload = rqd->b_avgload;
 
             spin_unlock(&rqd->lock);
@@ -1231,7 +1238,7 @@ choose_cpu(const struct scheduler *ops, struct vcpu *vc)
         new_cpu = get_fallback_cpu(svc);
     else
     {
-        cpumask_and(cpumask_scratch_cpu(cpu), vc->cpu_hard_affinity,
+        cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                     &prv->rqd[min_rqi].active);
         new_cpu = cpumask_any(cpumask_scratch_cpu(cpu));
         BUG_ON(new_cpu >= nr_cpu_ids);
@@ -1318,6 +1325,8 @@ static void migrate(const struct scheduler *ops,
         __runq_deassign(svc);
 
         cpumask_and(cpumask_scratch_cpu(cpu), svc->vcpu->cpu_hard_affinity,
+                    cpupool_domain_cpumask(svc->vcpu->domain));
+        cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                     &trqd->active);
         svc->vcpu->processor = cpumask_any(cpumask_scratch_cpu(cpu));
         BUG_ON(svc->vcpu->processor >= nr_cpu_ids);
@@ -1343,8 +1352,14 @@ static void migrate(const struct scheduler *ops,
 static bool_t vcpu_is_migrateable(struct csched2_vcpu *svc,
                                   struct csched2_runqueue_data *rqd)
 {
+    struct vcpu *v = svc->vcpu;
+    int cpu = svc->vcpu->processor;
+
+    cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+                cpupool_domain_cpumask(v->domain));
+
     return !(svc->flags & CSFLAG_runq_migrate_request) &&
-        cpumask_intersects(svc->vcpu->cpu_hard_affinity, &rqd->active);
+        cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active);
 }
 
 static void balance_load(const struct scheduler *ops, int cpu, s_time_t now)