From patchwork Fri Jun 16 14:14:04 2017
X-Patchwork-Submitter: Dario Faggioli
X-Patchwork-Id: 9791819
From: Dario Faggioli
To: xen-devel@lists.xenproject.org
Date: Fri, 16 Jun 2017 16:14:04 +0200
Message-ID: <149762244440.11899.3927310982261940597.stgit@Solace.fritz.box>
In-Reply-To: <149762114626.11899.6393770850121347748.stgit@Solace.fritz.box>
References: <149762114626.11899.6393770850121347748.stgit@Solace.fritz.box>
User-Agent: StGit/0.17.1-dirty
Cc: Anshul Makkar, "Justin T. Weaver", George Dunlap
Subject: [Xen-devel] [PATCH 4/7] xen: credit2: soft-affinity awareness in csched2_cpu_pick()

We want to find the runqueue with the least average load, and to do
that, we scan through all the runqueues.

It is, therefore, enough that, during such a scan:
- we identify the runqueue with the least load, among the ones that
  have pcpus that are part of the soft affinity of the vcpu we're
  calling pick on;
- we identify the same, but for hard affinity.

At this point, we can decide whether to go for the runqueue with the
least load among the ones with some soft-affinity, or overall.

Therefore, at the price of some code reshuffling, we can avoid
scanning the runqueues twice (once per kind of affinity), and do
everything in a single pass.

(Also, kill a spurious ';' in the definition of MAX_LOAD.)

Signed-off-by: Dario Faggioli
Signed-off-by: Justin T. Weaver
Reviewed-by: George Dunlap
---
Cc: George Dunlap
Cc: Anshul Makkar
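As a side note for reviewers, the core of the change is turning the scan
into a single pass that tracks two minima at once. Below is a minimal,
self-contained sketch of the idea; it is plain C, not the actual Xen
code: pick_runq(), its parameters and the load/in_soft arrays are made
up for illustration, and runqueues outside hard affinity are assumed to
have been filtered out already, as the patch does with its 'continue'.

#include <limits.h>
#include <stdbool.h>

/*
 * Toy model of the single-pass scan: track, at the same time, the
 * least loaded runqueue overall (i.e., among the hard-affinity ones,
 * which is all of them here) and the least loaded runqueue among the
 * ones with cpus in the vcpu's soft affinity.
 */
int pick_runq(const long *load, const bool *in_soft,
              int nr_rqs, bool has_soft)
{
    long min_load = LONG_MAX, min_s_load = LONG_MAX;
    int i, min_rqi = -1, min_s_rqi = -1;

    for ( i = 0; i < nr_rqs; i++ )
    {
        /* "Soft-affinity minimum": only runqs intersecting soft affinity. */
        if ( has_soft && in_soft[i] && load[i] < min_s_load )
        {
            min_s_load = load[i];
            min_s_rqi = i;
        }
        /* "Hard-affinity minimum": kept updated in any case. */
        if ( load[i] < min_load )
        {
            min_load = load[i];
            min_rqi = i;
        }
    }

    /* Prefer the least loaded runq with some soft affinity, if any. */
    return (has_soft && min_s_rqi != -1) ? min_s_rqi : min_rqi;
}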
---
 xen/common/sched_credit2.c | 117 ++++++++++++++++++++++++++++++++++++--------
 1 file changed, 97 insertions(+), 20 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 54f6e21..fb97ff7 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -1725,14 +1725,16 @@ csched2_context_saved(const struct scheduler *ops, struct vcpu *vc)
     vcpu_schedule_unlock_irq(lock, vc);
 }
 
-#define MAX_LOAD (STIME_MAX);
+#define MAX_LOAD (STIME_MAX)
 static int
 csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
 {
     struct csched2_private *prv = csched2_priv(ops);
-    int i, min_rqi = -1, new_cpu, cpu = vc->processor;
+    int i, min_rqi = -1, min_s_rqi = -1;
+    unsigned int new_cpu, cpu = vc->processor;
     struct csched2_vcpu *svc = csched2_vcpu(vc);
-    s_time_t min_avgload = MAX_LOAD;
+    s_time_t min_avgload = MAX_LOAD, min_s_avgload = MAX_LOAD;
+    bool has_soft;
 
     ASSERT(!cpumask_empty(&prv->active_queues));
 
@@ -1781,17 +1783,35 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
         else if ( cpumask_intersects(cpumask_scratch_cpu(cpu),
                                      &svc->migrate_rqd->active) )
         {
+            /*
+             * If we've been asked to move to migrate_rqd, we should just do
+             * that, which we actually do by returning one cpu from that runq.
+             * There is no need to take care of soft affinity, as that will
+             * happen in runq_tickle().
+             */
             cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                         &svc->migrate_rqd->active);
             new_cpu = cpumask_cycle(svc->migrate_rqd->pick_bias,
                                     cpumask_scratch_cpu(cpu));
+
             svc->migrate_rqd->pick_bias = new_cpu;
             goto out_up;
         }
         /* Fall-through to normal cpu pick */
     }
 
-    /* Find the runqueue with the lowest average load. */
+    /*
+     * What we want is:
+     *  - if we have soft affinity, the runqueue with the lowest average
+     *    load, among the ones that contain cpus in our soft affinity; this
+     *    represents the best runq on which we would want to run.
+     *  - the runqueue with the lowest average load among the ones that
+     *    contain cpus in our hard affinity; this represents the best runq
+     *    on which we can run.
+     *
+     * Find both runqueues in one pass.
+     */
+    has_soft = has_soft_affinity(vc, vc->cpu_hard_affinity);
     for_each_cpu(i, &prv->active_queues)
     {
         struct csched2_runqueue_data *rqd;
@@ -1800,31 +1820,51 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
         rqd = prv->rqd + i;
 
         /*
-         * If checking a different runqueue, grab the lock, check hard
-         * affinity, read the avg, and then release the lock.
+         * If none of the cpus of this runqueue is in svc's hard-affinity,
+         * skip the runqueue.
+         *
+         * Note that, in case svc's hard-affinity has changed, this is the
+         * first time when we see such change, so it is indeed possible
+         * that we end up skipping svc's current runqueue.
+         */
+        if ( !cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active) )
+            continue;
+
+        /*
+         * If checking a different runqueue, grab the lock, read the avg,
+         * and then release the lock.
          *
          * If on our own runqueue, don't grab or release the lock;
          * but subtract our own load from the runqueue load to simulate
         * impartiality.
-         *
-         * Note that, if svc's hard affinity has changed, this is the
-         * first time when we see such change, so it is indeed possible
-         * that none of the cpus in svc's current runqueue is in our
-         * (new) hard affinity!
          */
         if ( rqd == svc->rqd )
         {
-            if ( cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active) )
-                rqd_avgload = max_t(s_time_t, rqd->b_avgload - svc->avgload, 0);
+            rqd_avgload = max_t(s_time_t, rqd->b_avgload - svc->avgload, 0);
         }
         else if ( spin_trylock(&rqd->lock) )
         {
-            if ( cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active) )
-                rqd_avgload = rqd->b_avgload;
-
+            rqd_avgload = rqd->b_avgload;
             spin_unlock(&rqd->lock);
         }
 
+        /*
+         * If svc has a soft-affinity, and some cpus of rqd are part of it,
+         * see if we need to update the "soft-affinity minimum".
+         */
+        if ( has_soft &&
+             rqd_avgload < min_s_avgload )
+        {
+            cpumask_t mask;
+
+            cpumask_and(&mask, cpumask_scratch_cpu(cpu), &rqd->active);
+            if ( cpumask_intersects(&mask, svc->vcpu->cpu_soft_affinity) )
+            {
+                min_s_avgload = rqd_avgload;
+                min_s_rqi = i;
+            }
+        }
+        /* In any case, keep the "hard-affinity minimum" updated too. */
         if ( rqd_avgload < min_avgload )
         {
             min_avgload = rqd_avgload;
@@ -1832,17 +1872,54 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
         }
     }
 
-    /* We didn't find anyone (most likely because of spinlock contention). */
-    if ( min_rqi == -1 )
+    if ( has_soft && min_s_rqi != -1 )
+    {
+        /*
+         * We have soft affinity, and we have a candidate runq, so go for it.
+         *
+         * Note that, to obtain the soft-affinity mask, we "just" put what we
+         * have in cpumask_scratch in && with vc->cpu_soft_affinity. This is
+         * ok because:
+         * - we know that vc->cpu_hard_affinity and vc->cpu_soft_affinity have
+         *   a non-empty intersection (because has_soft is true);
+         * - we have vc->cpu_hard_affinity & cpupool_domain_cpumask() already
+         *   in cpumask_scratch, so we save a lot of work doing it like this.
+         *
+         * It's kind of like open coding affinity_balance_cpumask() but, in
+         * this specific case, calling that would mean a lot of (unnecessary)
+         * cpumask operations.
+         */
+        cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
+                    vc->cpu_soft_affinity);
+        cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
+                    &prv->rqd[min_s_rqi].active);
+    }
+    else if ( min_rqi != -1 )
     {
+        /*
+         * Either we don't have soft-affinity, or we do, but we did not find
+         * any suitable runq. But we did find one when considering hard
+         * affinity, so go for it.
+         *
+         * cpumask_scratch already has vc->cpu_hard_affinity &
+         * cpupool_domain_cpumask() in it, so it's enough that we filter
+         * with the cpus of the runq.
+         */
+        cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
+                    &prv->rqd[min_rqi].active);
+    }
+    else
+    {
+        /*
+         * We didn't find anyone at all (most likely because of spinlock
+         * contention).
+         */
         new_cpu = get_fallback_cpu(svc);
         min_rqi = c2r(ops, new_cpu);
         min_avgload = prv->rqd[min_rqi].b_avgload;
         goto out_up;
     }
 
-    cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
-                &prv->rqd[min_rqi].active);
     new_cpu = cpumask_cycle(prv->rqd[min_rqi].pick_bias,
                             cpumask_scratch_cpu(cpu));
     prv->rqd[min_rqi].pick_bias = new_cpu;
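As an aside, the cpumask narrowing in the last hunk may be easier to
follow through a toy model. What follows is plain C with made-up mask
values, not Xen code: 'scratch' plays the role of cpumask_scratch_cpu(cpu),
which enters this code already holding
vc->cpu_hard_affinity & cpupool_domain_cpumask().

#include <stdio.h>

int main(void)
{
    unsigned long hard = 0x0f;      /* stands in for vc->cpu_hard_affinity  */
    unsigned long pool = 0xff;      /* .. for cpupool_domain_cpumask()      */
    unsigned long soft = 0x05;      /* .. for vc->cpu_soft_affinity         */
    unsigned long rq_active = 0x03; /* .. for prv->rqd[min_s_rqi].active    */

    unsigned long scratch = hard & pool; /* set up before the scan          */
    scratch &= soft;                     /* keep only soft-affinity cpus    */
    scratch &= rq_active;                /* keep only cpus of the chosen runq */

    printf("pick among cpus in mask %#lx\n", scratch); /* prints 0x1 */
    return 0;
}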