From patchwork Fri Sep 30 02:53:53 2016
X-Patchwork-Submitter: Dario Faggioli
X-Patchwork-Id: 9357459
From: Dario Faggioli
To: xen-devel@lists.xenproject.org
Date: Fri, 30 Sep 2016 04:53:53 +0200
Message-ID: <147520403328.22544.3265744862320473651.stgit@Solace.fritz.box>
In-Reply-To: <147520253247.22544.10673844222866363947.stgit@Solace.fritz.box>
References: <147520253247.22544.10673844222866363947.stgit@Solace.fritz.box>
User-Agent: StGit/0.17.1-dirty
MIME-Version: 1.0
Cc: George Dunlap, Andrew Cooper, Anshul Makkar, Jan Beulich
Subject: [Xen-devel] [PATCH v2 05/10] xen: credit2: implement yield()

When a vcpu explicitly yields, it is usually giving us a hint along the lines
of "let someone else run and come back to me in a bit." Credit2 isn't, so far,
doing anything when a vcpu yields, which means a yield is basically a NOP
(well, actually, it's pure overhead, as it causes the scheduler to kick in,
but the result is, at least 99% of the time, that the very same vcpu that
yielded continues to run).

Implement a "preempt bias", to be applied to yielding vcpus. Basically, when
evaluating which vcpu to run next, if a vcpu that has just yielded is
encountered, we give it a credit penalty, and check whether there is anyone
else better suited to take over the cpu (of course, if there isn't, the
yielding vcpu will continue to run).

The value of this bias can be configured with a boot time parameter, and the
default is set to 1 ms.

Also, add a yield performance counter, and fix the style of a couple of
comments.
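To make the mechanism above concrete, here is a minimal stand-alone C sketch
of the idea (illustration only: pick_next(), struct vcpu_model and the numbers
are invented for this example, they are not the Xen structures the patch
touches; the real logic lives in runq_candidate() in the diff below):

/*
 * Minimal model of the "preempt bias" (illustration only, not Xen code):
 * when the currently running vcpu has yielded, the runqueue candidates are
 * compared against its credit minus a fixed bias, so a waiting vcpu with
 * only slightly less credit can still win the cpu.
 */
#include <stdbool.h>
#include <stdio.h>

#define YIELD_BIAS_US 1000  /* mirrors the patch's default of 1 ms */

struct vcpu_model {
    const char *name;
    int credit;    /* remaining credit, expressed in us for simplicity */
    bool yielded;  /* set when the vcpu called yield */
};

/* Pick between the current vcpu and one runqueue candidate. */
static const struct vcpu_model *
pick_next(const struct vcpu_model *curr, const struct vcpu_model *cand)
{
    /* Penalize curr only for this decision; its stored credit is untouched. */
    int curr_credit = curr->credit - (curr->yielded ? YIELD_BIAS_US : 0);

    return (cand->credit > curr_credit) ? cand : curr;
}

int main(void)
{
    struct vcpu_model a = { "yielder", 1500, true };
    struct vcpu_model b = { "waiter",  1200, false };

    /* Without the bias, a would keep running; with it, b wins (1200 > 1500-1000). */
    printf("next: %s\n", pick_next(&a, &b)->name);
    return 0;
}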
Signed-off-by: Dario Faggioli
---
Cc: George Dunlap
Cc: Anshul Makkar
Cc: Jan Beulich
Cc: Andrew Cooper
---
Changes from v1:
 * add _us to the parameter name, as suggested during review;
 * get rid of the minimum value for the yield bias;
 * apply the yield bias via subtraction of credits to the yielding vcpu,
   rather than via addition to all the others;
 * merge the Credit2 bits of what was patch 7 here, as suggested during review.
---
 docs/misc/xen-command-line.markdown |   10 +++++
 xen/common/sched_credit2.c          |   76 +++++++++++++++++++++++++++--------
 xen/common/schedule.c               |    2 +
 xen/include/xen/perfc_defn.h        |    1 
 4 files changed, 71 insertions(+), 18 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 8ff57fa..4fd3460 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1395,6 +1395,16 @@ Choose the default scheduler.
 ### sched\_credit2\_migrate\_resist
 > `=<integer>`
 
+### sched\_credit2\_yield\_bias\_us
+> `=<integer>`
+
+> Default: `1000`
+
+Set how much a yielding vcpu will be penalized, in order to actually
+give a chance to run to some other vcpu. This is basically a bias, in
+favour of the non-yielding vcpus, expressed in microseconds (default
+is 1ms).
+
 ### sched\_credit\_tslice\_ms
 > `=<integer>`

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 72e31b5..fde61ef 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -144,6 +144,8 @@
 #define CSCHED2_MIGRATE_RESIST ((opt_migrate_resist)*MICROSECS(1))
 /* How much to "compensate" a vcpu for L2 migration */
 #define CSCHED2_MIGRATE_COMPENSATION MICROSECS(50)
+/* How big of a bias we should have against a yielding vcpu */
+#define CSCHED2_YIELD_BIAS ((opt_yield_bias)*MICROSECS(1))
 /* Reset: Value below which credit will be reset. */
 #define CSCHED2_CREDIT_RESET 0
 /* Max timer: Maximum time a guest can be run for. */
@@ -181,11 +183,20 @@
  */
 #define __CSFLAG_runq_migrate_request 3
 #define CSFLAG_runq_migrate_request (1<<__CSFLAG_runq_migrate_request)
-
+/*
+ * CSFLAG_vcpu_yield: this vcpu was running, and has called vcpu_yield(). The
+ * scheduler is invoked to see if we can give the cpu to someone else, and
+ * get back to the yielding vcpu in a while.
+ */
+#define __CSFLAG_vcpu_yield 4
+#define CSFLAG_vcpu_yield (1<<__CSFLAG_vcpu_yield)
 
 static unsigned int __read_mostly opt_migrate_resist = 500;
 integer_param("sched_credit2_migrate_resist", opt_migrate_resist);
 
+static unsigned int __read_mostly opt_yield_bias = 1000;
+integer_param("sched_credit2_yield_bias_us", opt_yield_bias);
+
 /*
  * Useful macros
  */
@@ -1431,6 +1442,14 @@ out:
 }
 
 static void
+csched2_vcpu_yield(const struct scheduler *ops, struct vcpu *v)
+{
+    struct csched2_vcpu * const svc = CSCHED2_VCPU(v);
+
+    __set_bit(__CSFLAG_vcpu_yield, &svc->flags);
+}
+
+static void
 csched2_context_saved(const struct scheduler *ops, struct vcpu *vc)
 {
     struct csched2_vcpu * const svc = CSCHED2_VCPU(vc);
@@ -2250,26 +2269,39 @@ runq_candidate(struct csched2_runqueue_data *rqd,
     struct list_head *iter;
     struct csched2_vcpu *snext = NULL;
     struct csched2_private *prv = CSCHED2_PRIV(per_cpu(scheduler, cpu));
+    /*
+     * If scurr is yielding, temporarily subtract CSCHED2_YIELD_BIAS
+     * credits from it (where "temporarily" means "for the sake of just
+     * this scheduling decision").
+     */
+    int yield_bias = 0, snext_credit;
 
     *skipped = 0;
 
-    /* Default to current if runnable, idle otherwise */
-    if ( vcpu_runnable(scurr->vcpu) )
-        snext = scurr;
-    else
-        snext = CSCHED2_VCPU(idle_vcpu[cpu]);
-
     /*
      * Return the current vcpu if it has executed for less than ratelimit.
      * Adjuststment for the selected vcpu's credit and decision
      * for how long it will run will be taken in csched2_runtime.
+     *
+     * Note that, if scurr is yielding, we don't let rate limiting kick in.
+     * In fact, it may be the case that scurr is about to spin, and there's
+     * no point forcing it to do so until rate limiting expires.
      */
-    if ( prv->ratelimit_us && !is_idle_vcpu(scurr->vcpu) &&
-         vcpu_runnable(scurr->vcpu) &&
-         (now - scurr->vcpu->runstate.state_entry_time) <
-          MICROSECS(prv->ratelimit_us) )
+    if ( __test_and_clear_bit(__CSFLAG_vcpu_yield, &scurr->flags) )
+        yield_bias = CSCHED2_YIELD_BIAS;
+    else if ( prv->ratelimit_us && !is_idle_vcpu(scurr->vcpu) &&
+              vcpu_runnable(scurr->vcpu) &&
+              (now - scurr->vcpu->runstate.state_entry_time) <
+               MICROSECS(prv->ratelimit_us) )
         return scurr;
 
+    /* Default to current if runnable, idle otherwise */
+    if ( vcpu_runnable(scurr->vcpu) )
+        snext = scurr;
+    else
+        snext = CSCHED2_VCPU(idle_vcpu[cpu]);
+
+    snext_credit = snext->credit - yield_bias;
 
     list_for_each( iter, &rqd->runq )
     {
         struct csched2_vcpu * svc = list_entry(iter, struct csched2_vcpu, runq_elem);
@@ -2293,19 +2325,23 @@ runq_candidate(struct csched2_runqueue_data *rqd,
             continue;
         }
 
-        /* If this is on a different processor, don't pull it unless
-         * its credit is at least CSCHED2_MIGRATE_RESIST higher. */
+        /*
+         * If this is on a different processor, don't pull it unless
+         * its credit is at least CSCHED2_MIGRATE_RESIST higher.
+         */
         if ( svc->vcpu->processor != cpu
-             && snext->credit + CSCHED2_MIGRATE_RESIST > svc->credit )
+             && snext_credit + CSCHED2_MIGRATE_RESIST > svc->credit )
         {
             (*skipped)++;
             SCHED_STAT_CRANK(migrate_resisted);
             continue;
         }
 
-        /* If the next one on the list has more credit than current
-         * (or idle, if current is not runnable), choose it. */
-        if ( svc->credit > snext->credit )
+        /*
+         * If the next one on the list has more credit than current
+         * (or idle, if current is not runnable), choose it.
+         */
+        if ( svc->credit > snext_credit )
             snext = svc;
 
         /* In any case, if we got this far, break. */
@@ -2391,7 +2427,8 @@ csched2_schedule(
      */
     if ( tasklet_work_scheduled )
     {
-        trace_var(TRC_CSCHED2_SCHED_TASKLET, 1, 0, NULL);
+        __clear_bit(__CSFLAG_vcpu_yield, &scurr->flags);
+        trace_var(TRC_CSCHED2_SCHED_TASKLET, 1, 0, NULL);
         snext = CSCHED2_VCPU(idle_vcpu[cpu]);
     }
     else
@@ -2923,6 +2960,8 @@ csched2_init(struct scheduler *ops)
     printk(XENLOG_INFO "load tracking window lenght %llu ns\n",
            1ULL << opt_load_window_shift);
 
+    printk(XENLOG_INFO "yield bias value %d us\n", opt_yield_bias);
+
     /* Basically no CPU information is available at this point; just
      * set up basic structures, and a callback when the CPU info is
      * available.
      */
@@ -2975,6 +3014,7 @@ static const struct scheduler sched_credit2_def = {
 
     .sleep          = csched2_vcpu_sleep,
     .wake           = csched2_vcpu_wake,
+    .yield          = csched2_vcpu_yield,
 
     .adjust         = csched2_dom_cntl,
     .adjust_global  = csched2_sys_cntl,
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 104d203..5b444c4 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -947,6 +947,8 @@ long vcpu_yield(void)
     SCHED_OP(VCPU2OP(v), yield, v);
     vcpu_schedule_unlock_irq(lock, v);
 
+    SCHED_STAT_CRANK(vcpu_yield);
+
     TRACE_2D(TRC_SCHED_YIELD, current->domain->domain_id, current->vcpu_id);
     raise_softirq(SCHEDULE_SOFTIRQ);
     return 0;
diff --git a/xen/include/xen/perfc_defn.h b/xen/include/xen/perfc_defn.h
index 4a835b8..900fddd 100644
--- a/xen/include/xen/perfc_defn.h
+++ b/xen/include/xen/perfc_defn.h
@@ -23,6 +23,7 @@ PERFCOUNTER(vcpu_alloc,             "sched: vcpu_alloc")
 PERFCOUNTER(vcpu_insert,            "sched: vcpu_insert")
 PERFCOUNTER(vcpu_remove,            "sched: vcpu_remove")
 PERFCOUNTER(vcpu_sleep,             "sched: vcpu_sleep")
+PERFCOUNTER(vcpu_yield,             "sched: vcpu_yield")
 PERFCOUNTER(vcpu_wake_running,      "sched: vcpu_wake_running")
 PERFCOUNTER(vcpu_wake_onrunq,       "sched: vcpu_wake_onrunq")
 PERFCOUNTER(vcpu_wake_runnable,     "sched: vcpu_wake_runnable")
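As a usage note (not part of the patch): since the new knob is introduced via
a plain integer_param(), it is passed on the Xen hypervisor command line set
up by the boot loader, next to the scheduler selection. For example, with an
illustrative value of 500 us instead of the 1000 us default:

  # Xen hypervisor command line fragment -- illustrative values only
  sched=credit2 sched_credit2_yield_bias_us=500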