From patchwork Thu Jul 27 12:05:53 2017
X-Patchwork-Submitter: Dario Faggioli
X-Patchwork-Id: 9866697
From: Dario Faggioli
To: xen-devel@lists.xenproject.org
Cc: "Justin T. Weaver", George Dunlap, Anshul Makkar
Date: Thu, 27 Jul 2017 14:05:53 +0200
Message-ID: <150115715350.6767.2140393293186342043.stgit@Solace>
In-Reply-To: <150115657192.6767.15778617807307106582.stgit@Solace>
References: <150115657192.6767.15778617807307106582.stgit@Solace>
User-Agent: StGit/0.17.1-dirty
Subject: [Xen-devel] [PATCH v2 3/6] xen: credit2: soft-affinity awareness in csched2_cpu_pick()

We want to find the runqueue with the least average load and, to do
that, we scan through all the runqueues.

It is, therefore, enough that, during such a scan:
- we identify the runqueue with the least load among the ones that
  have pcpus that are part of the soft affinity of the vcpu we're
  calling pick on;
- we identify the same, but for hard affinity.

At this point, we can decide whether to go for the runqueue with the
least load among the ones with some soft affinity, or overall.

Therefore, at the price of some code reshuffling, we can avoid
scanning the runqueues twice (once per affinity balancing step), and
find both candidates in a single pass.

(Also, kill a spurious ';' in the definition of MAX_LOAD.)
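For illustration, here is a minimal, self-contained sketch of the
one-pass, two-minima scan. This is not the hypervisor code: runqueues,
loads and affinities are modelled with plain arrays and 64-bit masks,
and every name in it (pick_runq, NR_RQS, the example loads) is invented
for the example; the real code uses Xen's cpumask API, per-runqueue
locks and the per-cpu scratch mask set up by the caller.

/*
 * Sketch only: one scan, two minima -- the best runqueue overall
 * (within the hard affinity) and the best among the runqueues that
 * also contain soft-affinity cpus.
 */
#include <stdint.h>
#include <stdio.h>

#define NR_RQS 4

typedef uint64_t cpumask_t;         /* one bit per cpu is enough here */

static int pick_runq(const cpumask_t rq_active[NR_RQS],
                     const int64_t rq_avgload[NR_RQS],
                     cpumask_t hard, cpumask_t soft)
{
    int64_t min_avgload = INT64_MAX, min_s_avgload = INT64_MAX;
    int i, min_rqi = -1, min_s_rqi = -1;
    /* Soft affinity only matters where it overlaps the hard one. */
    int has_soft = (hard & soft) != 0;

    for ( i = 0; i < NR_RQS; i++ )
    {
        /* Skip runqueues with no cpu at all in our hard affinity. */
        if ( !(rq_active[i] & hard) )
            continue;

        /* Track the "soft-affinity minimum"... */
        if ( has_soft && rq_avgload[i] < min_s_avgload &&
             (rq_active[i] & hard & soft) )
        {
            min_s_avgload = rq_avgload[i];
            min_s_rqi = i;
        }
        /* ...and, in any case, the "hard-affinity minimum" too. */
        if ( rq_avgload[i] < min_avgload )
        {
            min_avgload = rq_avgload[i];
            min_rqi = i;
        }
    }

    /* Prefer the best runqueue with soft-affinity cpus, if any. */
    return (has_soft && min_s_rqi != -1) ? min_s_rqi : min_rqi;
}

int main(void)
{
    cpumask_t active[NR_RQS]  = { 0x03, 0x0c, 0x30, 0xc0 };
    int64_t   avgload[NR_RQS] = { 10, 5, 7, 2 };

    /* hard affinity: cpus 0-5; soft affinity: cpus 4-5 (runq 2) */
    printf("picked runq %d\n", pick_runq(active, avgload, 0x3f, 0x30));
    return 0;
}

This prints "picked runq 2": runq 3 is skipped despite being the least
loaded (none of its cpus is in the hard affinity), and the
soft-affinity candidate (runq 2, load 7) is preferred to the overall
minimum (runq 1, load 5), which is exactly the trade-off the patch
makes.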
Signed-off-by: Dario Faggioli
Signed-off-by: Justin T. Weaver
Reviewed-by: George Dunlap
---
Cc: Anshul Makkar
---
 xen/common/sched_credit2.c | 117 ++++++++++++++++++++++++++++++++++++--------
 1 file changed, 97 insertions(+), 20 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index aa8f169..8237a0a 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -1761,14 +1761,16 @@ csched2_context_saved(const struct scheduler *ops, struct vcpu *vc)
     vcpu_schedule_unlock_irq(lock, vc);
 }
 
-#define MAX_LOAD (STIME_MAX);
+#define MAX_LOAD (STIME_MAX)
 static int
 csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
 {
     struct csched2_private *prv = csched2_priv(ops);
-    int i, min_rqi = -1, new_cpu, cpu = vc->processor;
+    int i, min_rqi = -1, min_s_rqi = -1;
+    unsigned int new_cpu, cpu = vc->processor;
     struct csched2_vcpu *svc = csched2_vcpu(vc);
-    s_time_t min_avgload = MAX_LOAD;
+    s_time_t min_avgload = MAX_LOAD, min_s_avgload = MAX_LOAD;
+    bool has_soft;
 
     ASSERT(!cpumask_empty(&prv->active_queues));
 
@@ -1819,17 +1821,35 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
         else if ( cpumask_intersects(cpumask_scratch_cpu(cpu),
                                      &svc->migrate_rqd->active) )
         {
+            /*
+             * If we've been asked to move to migrate_rqd, we should just do
+             * that, which we actually do by returning one cpu from that runq.
+             * There is no need to take care of soft affinity, as that will
+             * happen in runq_tickle().
+             */
             cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                         &svc->migrate_rqd->active);
             new_cpu = cpumask_cycle(svc->migrate_rqd->pick_bias,
                                     cpumask_scratch_cpu(cpu));
+
             svc->migrate_rqd->pick_bias = new_cpu;
             goto out_up;
         }
         /* Fall-through to normal cpu pick */
     }
 
-    /* Find the runqueue with the lowest average load. */
+    /*
+     * What we want is:
+     *  - if we have soft affinity, the runqueue with the lowest average
+     *    load, among the ones that contain cpus in our soft affinity; this
+     *    represents the best runq on which we would want to run.
+     *  - the runqueue with the lowest average load, among the ones that
+     *    contain cpus in our hard affinity; this represents the best runq
+     *    on which we can run.
+     *
+     * Find both runqueues in one pass.
+     */
+    has_soft = has_soft_affinity(vc, vc->cpu_hard_affinity);
     for_each_cpu(i, &prv->active_queues)
     {
         struct csched2_runqueue_data *rqd;
@@ -1838,31 +1858,51 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
         rqd = prv->rqd + i;
 
         /*
-         * If checking a different runqueue, grab the lock, check hard
-         * affinity, read the avg, and then release the lock.
+         * If none of the cpus of this runqueue is in svc's hard-affinity,
+         * skip the runqueue.
+         *
+         * Note that, in case svc's hard-affinity has changed, this is the
+         * first time we see such change, so it is indeed possible that we
+         * end up skipping svc's current runqueue.
+         */
+        if ( !cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active) )
+            continue;
+
+        /*
+         * If checking a different runqueue, grab the lock, read the avg,
+         * and then release the lock.
          *
          * If on our own runqueue, don't grab or release the lock;
         * but subtract our own load from the runqueue load to simulate
         * impartiality.
-         *
-         * Note that, if svc's hard affinity has changed, this is the
-         * first time when we see such change, so it is indeed possible
-         * that none of the cpus in svc's current runqueue is in our
-         * (new) hard affinity!
         */
         if ( rqd == svc->rqd )
         {
-            if ( cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active) )
-                rqd_avgload = max_t(s_time_t, rqd->b_avgload - svc->avgload, 0);
+            rqd_avgload = max_t(s_time_t, rqd->b_avgload - svc->avgload, 0);
         }
         else if ( spin_trylock(&rqd->lock) )
         {
-            if ( cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active) )
-                rqd_avgload = rqd->b_avgload;
-
+            rqd_avgload = rqd->b_avgload;
             spin_unlock(&rqd->lock);
         }
 
+        /*
+         * If svc has a soft-affinity, and some cpus of rqd are part of it,
+         * see if we need to update the "soft-affinity minimum".
+         */
+        if ( has_soft &&
+             rqd_avgload < min_s_avgload )
+        {
+            cpumask_t mask;
+
+            cpumask_and(&mask, cpumask_scratch_cpu(cpu), &rqd->active);
+            if ( cpumask_intersects(&mask, svc->vcpu->cpu_soft_affinity) )
+            {
+                min_s_avgload = rqd_avgload;
+                min_s_rqi = i;
+            }
+        }
+        /* In any case, keep the "hard-affinity minimum" updated too. */
         if ( rqd_avgload < min_avgload )
         {
             min_avgload = rqd_avgload;
@@ -1870,17 +1910,54 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
         }
     }
 
-    /* We didn't find anyone (most likely because of spinlock contention). */
-    if ( min_rqi == -1 )
+    if ( has_soft && min_s_rqi != -1 )
+    {
+        /*
+         * We have soft affinity, and we have a candidate runq, so go for it.
+         *
+         * Note that, to obtain the soft-affinity mask, we "just" put what we
+         * have in cpumask_scratch in && with vc->cpu_soft_affinity. This is
+         * ok because:
+         * - we know that vc->cpu_hard_affinity and vc->cpu_soft_affinity have
+         *   a non-empty intersection (because has_soft is true);
+         * - we have vc->cpu_hard_affinity & cpupool_domain_cpumask() already
+         *   in cpumask_scratch, so doing it like this saves a lot of work.
+         *
+         * It's kind of like open coding affinity_balance_cpumask() but, in
+         * this specific case, calling that would mean a lot of (unnecessary)
+         * cpumask operations.
+         */
+        cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
+                    vc->cpu_soft_affinity);
+        cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
+                    &prv->rqd[min_s_rqi].active);
+    }
+    else if ( min_rqi != -1 )
     {
+        /*
+         * Either we don't have soft-affinity, or we do, but we did not find
+         * any suitable runq. But we did find one when considering hard
+         * affinity, so go for it.
+         *
+         * cpumask_scratch already has vc->cpu_hard_affinity &
+         * cpupool_domain_cpumask() in it, so it's enough that we filter
+         * with the cpus of the runq.
+         */
+        cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
+                    &prv->rqd[min_rqi].active);
+    }
+    else
+    {
+        /*
+         * We didn't find anyone at all (most likely because of spinlock
+         * contention).
+         */
         new_cpu = get_fallback_cpu(svc);
         min_rqi = c2r(new_cpu);
         min_avgload = prv->rqd[min_rqi].b_avgload;
         goto out_up;
     }
 
-    cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
-                &prv->rqd[min_rqi].active);
     new_cpu = cpumask_cycle(prv->rqd[min_rqi].pick_bias,
                             cpumask_scratch_cpu(cpu));
     prv->rqd[min_rqi].pick_bias = new_cpu;
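One more illustrative sketch (invented names again; a 64-bit integer
stands in for a cpumask), showing why the soft-affinity branch above
can get away with a single extra AND instead of calling
affinity_balance_cpumask(): cpumask_scratch already holds
hard affinity & cpupool cpus at that point, so one more AND with the
soft affinity builds the soft-balancing mask, and a last AND restricts
it to the cpus of the chosen runqueue.

#include <stdio.h>

typedef unsigned long long mask_t;  /* stand-in for a Xen cpumask */

int main(void)
{
    mask_t cpupool_cpus  = 0xff;    /* cpus 0-7 are in the cpupool  */
    mask_t hard_affinity = 0x3f;    /* hard affinity: cpus 0-5      */
    mask_t soft_affinity = 0x30;    /* soft affinity: cpus 4-5      */
    mask_t runq_active   = 0xf0;    /* chosen runqueue: cpus 4-7    */

    /* What the code has in cpumask_scratch before the branch. */
    mask_t scratch = hard_affinity & cpupool_cpus;

    scratch &= soft_affinity;  /* first AND: soft-affinity balancing  */
    scratch &= runq_active;    /* second AND: cpus of the chosen runq */

    printf("pick new_cpu from 0x%llx\n", scratch);  /* 0x30: cpus 4,5 */
    return 0;
}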