From patchwork Wed Apr 6 17:24:16 2016
X-Patchwork-Submitter: Dario Faggioli
X-Patchwork-Id: 8764031
From: Dario Faggioli
To: xen-devel@lists.xenproject.org
Cc: Justin Weaver, George Dunlap
Date: Wed, 06 Apr 2016 19:24:16 +0200
Message-ID: <20160406172416.25877.79330.stgit@Solace.fritz.box>
In-Reply-To: <20160406170023.25877.15622.stgit@Solace.fritz.box>
References: <20160406170023.25877.15622.stgit@Solace.fritz.box>
User-Agent: StGit/0.17.1-dirty
Subject: [Xen-devel] [PATCH v2 11/11] xen: sched: implement vcpu hard affinity in Credit2

From: Justin Weaver

as it was still missing.

Note that this patch "only" implements hard affinity, i.e., the possibility
of specifying on what pCPUs a certain vCPU can run. Soft affinity (which
expresses a preference for vCPUs to run on certain pCPUs) is still not
supported by Credit2, even after this patch.

Signed-off-by: Justin Weaver
Signed-off-by: Dario Faggioli
Acked-by: George Dunlap
---
 xen/common/sched_credit2.c |  131 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 102 insertions(+), 29 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 084963a..03cd10c 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -310,6 +310,36 @@ struct csched2_dom {
     uint16_t nr_vcpus;
 };
 
+/*
+ * When a hard affinity change occurs, we may not be able to check some
+ * (any!) of the other runqueues, when looking for the best new processor
+ * for svc (as trylock-s in choose_cpu() can fail). If that happens, we
+ * pick, in order of decreasing preference:
+ *  - svc's current pcpu;
+ *  - another pcpu from svc's current runq;
+ *  - any cpu.
+ */
+static int get_fallback_cpu(struct csched2_vcpu *svc)
+{
+    int cpu;
+
+    if ( likely(cpumask_test_cpu(svc->vcpu->processor,
+                                 svc->vcpu->cpu_hard_affinity)) )
+        return svc->vcpu->processor;
+
+    cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
+                &svc->rqd->active);
+    cpu = cpumask_first(cpumask_scratch);
+    if ( likely(cpu < nr_cpu_ids) )
+        return cpu;
+
+    cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
+                cpupool_domain_cpumask(svc->vcpu->domain));
+
+    ASSERT(!cpumask_empty(cpumask_scratch));
+
+    return cpumask_first(cpumask_scratch);
+}
 
 /*
  * Time-to-credit, credit-to-time.
@@ -543,8 +573,9 @@ runq_tickle(const struct scheduler *ops, unsigned int cpu, struct csched2_vcpu *
         goto tickle;
     }
 
-    /* Get a mask of idle, but not tickled */
+    /* Get a mask of idle, but not tickled, that new is allowed to run on. */
     cpumask_andnot(&mask, &rqd->idle, &rqd->tickled);
+    cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
 
     /* If it's not empty, choose one */
     i = cpumask_cycle(cpu, &mask);
@@ -555,9 +586,11 @@ runq_tickle(const struct scheduler *ops, unsigned int cpu, struct csched2_vcpu *
     }
 
     /* Otherwise, look for the non-idle cpu with the lowest credit,
-     * skipping cpus which have been tickled but not scheduled yet */
+     * skipping cpus which have been tickled but not scheduled yet,
+     * that new is allowed to run on. */
     cpumask_andnot(&mask, &rqd->active, &rqd->idle);
     cpumask_andnot(&mask, &mask, &rqd->tickled);
+    cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
 
     for_each_cpu(i, &mask)
     {
@@ -1107,9 +1140,8 @@ choose_cpu(const struct scheduler *ops, struct vcpu *vc)
             d2printk("%pv -\n", svc->vcpu);
             clear_bit(__CSFLAG_runq_migrate_request, &svc->flags);
         }
-        /* Leave it where it is for now. When we actually pay attention
-         * to affinity we'll have to figure something out... */
-        return vc->processor;
+
+        return get_fallback_cpu(svc);
     }
 
     /* First check to see if we're here because someone else suggested a place
@@ -1120,45 +1152,56 @@ choose_cpu(const struct scheduler *ops, struct vcpu *vc)
         {
             printk("%s: Runqueue migrate aborted because target runqueue disappeared!\n",
                    __func__);
-            /* Fall-through to normal cpu pick */
         }
         else
         {
-            d2printk("%pv +\n", svc->vcpu);
-            new_cpu = cpumask_cycle(vc->processor, &svc->migrate_rqd->active);
-            goto out_up;
+            cpumask_and(cpumask_scratch, vc->cpu_hard_affinity,
+                        &svc->migrate_rqd->active);
+            new_cpu = cpumask_any(cpumask_scratch);
+            if ( new_cpu < nr_cpu_ids )
+            {
+                d2printk("%pv +\n", svc->vcpu);
+                goto out_up;
+            }
         }
+        /* Fall-through to normal cpu pick */
     }
 
-    /* FIXME: Pay attention to cpu affinity */
-
     min_avgload = MAX_LOAD;
 
     /* Find the runqueue with the lowest instantaneous load */
     for_each_cpu(i, &prv->active_queues)
     {
         struct csched2_runqueue_data *rqd;
-        s_time_t rqd_avgload;
+        s_time_t rqd_avgload = MAX_LOAD;
 
         rqd = prv->rqd + i;
 
-        /* If checking a different runqueue, grab the lock,
-         * read the avg, and then release the lock.
+        /*
+         * If checking a different runqueue, grab the lock, check hard
+         * affinity, read the avg, and then release the lock.
          *
          * If on our own runqueue, don't grab or release the lock;
         * but subtract our own load from the runqueue load to simulate
-         * impartiality */
+         * impartiality.
+         *
+         * Note that, if svc's hard affinity has changed, this is the
+         * first time when we see such change, so it is indeed possible
+         * that none of the cpus in svc's current runqueue is in our
+         * (new) hard affinity!
+         */
         if ( rqd == svc->rqd )
         {
-            rqd_avgload = rqd->b_avgload - svc->avgload;
+            if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
+                rqd_avgload = rqd->b_avgload - svc->avgload;
         }
         else if ( spin_trylock(&rqd->lock) )
         {
-            rqd_avgload = rqd->b_avgload;
+            if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
+                rqd_avgload = rqd->b_avgload;
+
             spin_unlock(&rqd->lock);
         }
-        else
-            continue;
 
         if ( rqd_avgload < min_avgload )
         {
@@ -1167,12 +1210,14 @@ choose_cpu(const struct scheduler *ops, struct vcpu *vc)
         }
     }
 
-    /* We didn't find anyone (most likely because of spinlock contention); leave it where it is */
+    /* We didn't find anyone (most likely because of spinlock contention). */
     if ( min_rqi == -1 )
-        new_cpu = vc->processor;
+        new_cpu = get_fallback_cpu(svc);
     else
     {
-        new_cpu = cpumask_cycle(vc->processor, &prv->rqd[min_rqi].active);
+        cpumask_and(cpumask_scratch, vc->cpu_hard_affinity,
+                    &prv->rqd[min_rqi].active);
+        new_cpu = cpumask_any(cpumask_scratch);
         BUG_ON(new_cpu >= nr_cpu_ids);
     }
 
@@ -1252,7 +1297,12 @@ static void migrate(const struct scheduler *ops,
             on_runq=1;
         }
         __runq_deassign(svc);
-        svc->vcpu->processor = cpumask_any(&trqd->active);
+
+        cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
+                    &trqd->active);
+        svc->vcpu->processor = cpumask_any(cpumask_scratch);
+        BUG_ON(svc->vcpu->processor >= nr_cpu_ids);
+
         __runq_assign(svc, trqd);
         if ( on_runq )
         {
@@ -1266,6 +1316,17 @@ static void migrate(const struct scheduler *ops,
     }
 }
 
+/*
+ * It makes sense to consider migrating svc to rqd if:
+ *  - svc is not already flagged to migrate,
+ *  - svc is allowed to run on at least one of the pcpus of rqd.
+ */
+static bool_t vcpu_is_migrateable(struct csched2_vcpu *svc,
+                                  struct csched2_runqueue_data *rqd)
+{
+    return !(svc->flags & CSFLAG_runq_migrate_request) &&
+           cpumask_intersects(svc->vcpu->cpu_hard_affinity, &rqd->active);
+}
 
 static void balance_load(const struct scheduler *ops, int cpu, s_time_t now)
 {
@@ -1374,8 +1435,7 @@ retry:
         __update_svc_load(ops, push_svc, 0, now);
 
-        /* Skip this one if it's already been flagged to migrate */
-        if ( push_svc->flags & CSFLAG_runq_migrate_request )
+        if ( !vcpu_is_migrateable(push_svc, st.orqd) )
             continue;
 
         list_for_each( pull_iter, &st.orqd->svc )
         {
@@ -1387,8 +1447,7 @@ retry:
                 __update_svc_load(ops, pull_svc, 0, now);
             }
 
-            /* Skip this one if it's already been flagged to migrate */
-            if ( pull_svc->flags & CSFLAG_runq_migrate_request )
+            if ( !vcpu_is_migrateable(pull_svc, st.lrqd) )
                 continue;
 
             consider(&st, push_svc, pull_svc);
@@ -1404,8 +1463,7 @@ retry:
     {
         struct csched2_vcpu * pull_svc = list_entry(pull_iter, struct csched2_vcpu, rqd_elem);
 
-        /* Skip this one if it's already been flagged to migrate */
-        if ( pull_svc->flags & CSFLAG_runq_migrate_request )
+        if ( !vcpu_is_migrateable(pull_svc, st.lrqd) )
            continue;
 
         /* Consider pull only */
@@ -1444,11 +1502,22 @@ csched2_vcpu_migrate(
 
     /* Check if new_cpu is valid */
     BUG_ON(!cpumask_test_cpu(new_cpu, &CSCHED2_PRIV(ops)->initialized));
+    ASSERT(cpumask_test_cpu(new_cpu, vc->cpu_hard_affinity));
 
     trqd = RQD(ops, new_cpu);
 
+    /*
+     * Do the actual movement toward new_cpu, and update vc->processor.
+     * If we are changing runqueue, migrate() takes care of everything.
+     * If we are not changing runqueue, we need to update vc->processor
+     * here. In fact, if, for instance, we are here because the vcpu's
+     * hard affinity changed, we don't want to risk leaving vc->processor
+     * pointing to a pcpu where we can't run any longer.
+     */
     if ( trqd != svc->rqd )
         migrate(ops, svc, trqd, NOW());
+    else
+        vc->processor = new_cpu;
 }
 
 static int
@@ -1671,6 +1740,10 @@ runq_candidate(struct csched2_runqueue_data *rqd,
     {
         struct csched2_vcpu * svc = list_entry(iter, struct csched2_vcpu, runq_elem);
 
+        /* Only consider vcpus that are allowed to run on this processor. */
+        if ( !cpumask_test_cpu(cpu, svc->vcpu->cpu_hard_affinity) )
+            continue;
+
         /* If this is on a different processor, don't pull it unless
          * its credit is at least CSCHED2_MIGRATE_RESIST higher. */
         if ( svc->vcpu->processor != cpu
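
For readers who want to see the idea outside of the hypervisor: get_fallback_cpu() and
the choose_cpu() changes all follow the same "constrain the candidate mask by hard
affinity, then fall back in decreasing order of preference" pattern. Below is a
standalone, simplified sketch of that pattern, using plain 64-bit masks instead of
Xen's cpumask_t and made-up helper names (fallback_cpu, first_cpu); it is only an
illustration, not code from this patch.

    #include <stdint.h>
    #include <stdio.h>

    /* Simplified stand-in for Xen's cpumask_t: one bit per pCPU. */
    typedef uint64_t mask_t;

    /* First set bit, or -1 if empty (-1 plays the role of nr_cpu_ids here). */
    static int first_cpu(mask_t m)
    {
        return m ? __builtin_ctzll(m) : -1;   /* GCC/Clang builtin */
    }

    /*
     * Same preference order as get_fallback_cpu():
     *  1. the current pCPU, if it is still within the hard affinity;
     *  2. a pCPU of the current runqueue that is within the hard affinity;
     *  3. any pCPU of the cpupool that is within the hard affinity.
     */
    static int fallback_cpu(int cur, mask_t hard, mask_t runq_active, mask_t pool)
    {
        if ( hard & (1ULL << cur) )
            return cur;

        if ( hard & runq_active )
            return first_cpu(hard & runq_active);

        /* Mirrors the ASSERT() in get_fallback_cpu(): hard affinity and the
         * cpupool are assumed never to be disjoint. */
        return first_cpu(hard & pool);
    }

    int main(void)
    {
        /* vCPU on pCPU 1; affinity just changed to {4,5}; its runqueue covers pCPUs 0-3. */
        mask_t hard = (1ULL << 4) | (1ULL << 5);
        mask_t runq = 0x0fULL;   /* pCPUs 0-3 */
        mask_t pool = 0xffULL;   /* pCPUs 0-7 */

        printf("fallback cpu = %d\n", fallback_cpu(1, hard, runq, pool)); /* prints 4 */
        return 0;
    }

In the actual patch the second and third steps are done by and-ing into
cpumask_scratch and taking cpumask_first(), with the non-emptiness of the last
intersection guaranteed by the toolstack and checked via ASSERT().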