From patchwork Sat Sep 14 08:52:44 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: =?utf-8?b?SsO8cmdlbiBHcm/Dnw==?= <jgross@suse.com>
X-Patchwork-Id: 11145639
Return-Path: <SRS0=7ja4=XJ=lists.xenproject.org=xen-devel-bounces@kernel.org>
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org
 [172.30.200.123])
	by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 457BF112B
	for <patchwork-xen-devel@patchwork.kernel.org>;
 Sat, 14 Sep 2019 08:55:43 +0000 (UTC)
Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id 2BB5020717
	for <patchwork-xen-devel@patchwork.kernel.org>;
 Sat, 14 Sep 2019 08:55:43 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2BB5020717
Authentication-Results: mail.kernel.org;
 dmarc=none (p=none dis=none) header.from=suse.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=xen-devel-bounces@lists.xenproject.org
Received: from localhost ([127.0.0.1] helo=lists.xenproject.org)
	by lists.xenproject.org with esmtp (Exim 4.89)
	(envelope-from <xen-devel-bounces@lists.xenproject.org>)
	id 1i93pN-0001ej-93; Sat, 14 Sep 2019 08:54:45 +0000
Received: from all-amaz-eas1.inumbo.com ([34.197.232.57]
 helo=us1-amaz-eas2.inumbo.com)
 by lists.xenproject.org with esmtp (Exim 4.89)
 (envelope-from <SRS0=rDpt=XJ=suse.com=jgross@srs-us1.protection.inumbo.net>)
 id 1i93pL-0001cB-OR
 for xen-devel@lists.xenproject.org; Sat, 14 Sep 2019 08:54:43 +0000
X-Inumbo-ID: 117ab549-d6cd-11e9-95c1-12813bfff9fa
Received: from mx1.suse.de (unknown [195.135.220.15])
 by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS
 id 117ab549-d6cd-11e9-95c1-12813bfff9fa;
 Sat, 14 Sep 2019 08:53:08 +0000 (UTC)
X-Virus-Scanned: by amavisd-new at test-mx.suse.de
Received: from relay2.suse.de (unknown [195.135.220.254])
 by mx1.suse.de (Postfix) with ESMTP id 6D918B67B;
 Sat, 14 Sep 2019 08:53:07 +0000 (UTC)
From: Juergen Gross <jgross@suse.com>
To: xen-devel@lists.xenproject.org
Date: Sat, 14 Sep 2019 10:52:44 +0200
Message-Id: <20190914085251.18816-41-jgross@suse.com>
X-Mailer: git-send-email 2.16.4
In-Reply-To: <20190914085251.18816-1-jgross@suse.com>
References: <20190914085251.18816-1-jgross@suse.com>
Subject: [Xen-devel] [PATCH v3 40/47] xen/sched: prepare per-cpupool
 scheduling granularity
X-BeenThere: xen-devel@lists.xenproject.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Xen developer discussion <xen-devel.lists.xenproject.org>
List-Unsubscribe: <https://lists.xenproject.org/mailman/options/xen-devel>,
 <mailto:xen-devel-request@lists.xenproject.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xenproject.org>
List-Help: <mailto:xen-devel-request@lists.xenproject.org?subject=help>
List-Subscribe: <https://lists.xenproject.org/mailman/listinfo/xen-devel>,
 <mailto:xen-devel-request@lists.xenproject.org?subject=subscribe>
Cc: Juergen Gross <jgross@suse.com>, Tim Deegan <tim@xen.org>,
 Stefano Stabellini <sstabellini@kernel.org>, Wei Liu <wl@xen.org>,
 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
 George Dunlap <George.Dunlap@eu.citrix.com>,
 Andrew Cooper <andrew.cooper3@citrix.com>,
 Ian Jackson <ian.jackson@eu.citrix.com>, Dario Faggioli <dfaggioli@suse.com>,
 Julien Grall <julien.grall@arm.com>, Jan Beulich <jbeulich@suse.com>
MIME-Version: 1.0
Errors-To: xen-devel-bounces@lists.xenproject.org
Sender: "Xen-devel" <xen-devel-bounces@lists.xenproject.org>

Onlining and offlining cpus with core scheduling is rather complicated,
as the cpus are brought online or taken offline one by one, while the
scheduler wants to handle them per core.

As the future plan is to be able to select the scheduling granularity
per cpupool, prepare for that by storing the granularity in struct
cpupool and struct sched_resource (we need it there for free cpus, which
are not associated with any cpupool). Free cpus will always use
granularity 1.

Store the selected granularity option (cpu, core or socket) in the
cpupool as well, as we will need it to select the appropriate cpu mask
when populating the cpupool with cpus.

This makes onlining and offlining of cpus much easier and avoids
writing code which would need to be thrown away later.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V1: new patch
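
For reviewers, a minimal standalone sketch of the lookup rule this patch
introduces: a scheduling resource inherits the granularity of its cpupool,
and a free cpu (no cpupool) falls back to 1. The structs below are
simplified stand-ins holding only the fields added here, not the real Xen
definitions. The helpers res_assign(), units_needed() and same_unit() are
hypothetical names used for illustration; they only mirror the expressions
used in schedule_cpu_switch(), sched_move_domain() and sched_alloc_unit().

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the Xen types; only the new fields are shown. */
enum sched_gran { SCHED_GRAN_cpu, SCHED_GRAN_core, SCHED_GRAN_socket };

struct cpupool {
    unsigned int    granularity;      /* vcpus per sched_unit in this pool */
    enum sched_gran opt_granularity;  /* selected option: cpu/core/socket  */
};

struct sched_resource {
    struct cpupool *cpupool;          /* NULL for a free cpu */
    unsigned int    granularity;
};

/* Hypothetical helper mirroring "c ? c->granularity : 1". */
static void res_assign(struct sched_resource *sr, struct cpupool *c)
{
    sr->granularity = c ? c->granularity : 1;
    sr->cpupool = c;
}

/*
 * Hypothetical helper: number of units needed for max_vcpus vcpus,
 * i.e. DIV_ROUND_UP(max_vcpus, granularity). Unlike the patch, a NULL
 * pool is tolerated here and treated as granularity 1.
 */
static unsigned int units_needed(unsigned int max_vcpus,
                                 const struct cpupool *c)
{
    unsigned int gran = c ? c->granularity : 1;

    assert(gran != 0);
    return (max_vcpus + gran - 1) / gran;
}

/* Two vcpus share a unit when vcpu_id / gran is equal for both. */
static int same_unit(unsigned int id_a, unsigned int id_b, unsigned int gran)
{
    return id_a / gran == id_b / gran;
}
```

With granularity 2, vcpus 2 and 3 end up in the same unit while vcpus 1
and 2 do not, and a domain with 5 vcpus needs 3 units.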
---
 xen/common/cpupool.c       |  2 ++
 xen/common/schedule.c      | 27 +++++++++++++++++----------
 xen/include/xen/sched-if.h | 12 ++++++++++++
 3 files changed, 31 insertions(+), 10 deletions(-)

diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
index e0333a8417..c7d8a748d4 100644
--- a/xen/common/cpupool.c
+++ b/xen/common/cpupool.c
@@ -175,6 +175,8 @@ static struct cpupool *cpupool_create(
             return NULL;
         }
     }
+    c->granularity = sched_granularity;
+    c->opt_granularity = opt_sched_granularity;
 
     *q = c;
 
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index e5b7678dc0..b3c1aa0821 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -56,7 +56,8 @@ int sched_ratelimit_us = SCHED_DEFAULT_RATELIMIT_US;
 integer_param("sched_ratelimit_us", sched_ratelimit_us);
 
 /* Number of vcpus per struct sched_unit. */
-static unsigned int __read_mostly sched_granularity = 1;
+enum sched_gran __read_mostly opt_sched_granularity = SCHED_GRAN_cpu;
+unsigned int __read_mostly sched_granularity = 1;
 bool __read_mostly sched_disable_smt_switching;
 const cpumask_t *sched_res_mask = &cpumask_all;
 
@@ -412,10 +413,10 @@ static struct sched_unit *sched_alloc_unit(struct vcpu *v)
 {
     struct sched_unit *unit, **prev_unit;
     struct domain *d = v->domain;
+    unsigned int gran = d->cpupool ? d->cpupool->granularity : 1;
 
     for_each_sched_unit ( d, unit )
-        if ( unit->vcpu_list->vcpu_id / sched_granularity ==
-             v->vcpu_id / sched_granularity )
+        if ( unit->vcpu_list->vcpu_id / gran == v->vcpu_id / gran )
             break;
 
     if ( unit )
@@ -582,7 +583,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
         return PTR_ERR(domdata);
 
     unit_priv = xzalloc_array(void *,
-                              DIV_ROUND_UP(d->max_vcpus, sched_granularity));
+                              DIV_ROUND_UP(d->max_vcpus, c->granularity));
     if ( unit_priv == NULL )
     {
         sched_free_domdata(c->sched, domdata);
@@ -1825,11 +1826,11 @@ static void sched_switch_units(struct sched_resource *sd,
         if ( is_idle_unit(prev) )
         {
             prev->runstate_cnt[RUNSTATE_running] = 0;
-            prev->runstate_cnt[RUNSTATE_runnable] = sched_granularity;
+            prev->runstate_cnt[RUNSTATE_runnable] = sd->granularity;
         }
         if ( is_idle_unit(next) )
         {
-            next->runstate_cnt[RUNSTATE_running] = sched_granularity;
+            next->runstate_cnt[RUNSTATE_running] = sd->granularity;
             next->runstate_cnt[RUNSTATE_runnable] = 0;
         }
     }
@@ -1978,7 +1979,7 @@ void sched_context_switched(struct vcpu *vprev, struct vcpu *vnext)
     else
     {
         vcpu_context_saved(vprev, vnext);
-        if ( sched_granularity == 1 )
+        if ( sd->granularity == 1 )
             unit_context_saved(sd);
     }
 
@@ -2089,11 +2090,12 @@ static struct sched_unit *sched_wait_rendezvous_in(struct sched_unit *prev,
 {
     struct sched_unit *next;
     struct vcpu *v;
+    unsigned int gran = get_sched_res(cpu)->granularity;
 
     if ( !--prev->rendezvous_in_cnt )
     {
         next = do_schedule(prev, now, cpu);
-        atomic_set(&next->rendezvous_out_cnt, sched_granularity + 1);
+        atomic_set(&next->rendezvous_out_cnt, gran + 1);
         return next;
     }
 
@@ -2213,6 +2215,7 @@ static void schedule(void)
     struct sched_resource *sd;
     spinlock_t           *lock;
     int cpu = smp_processor_id();
+    unsigned int          gran = get_sched_res(cpu)->granularity;
 
     ASSERT_NOT_IN_ATOMIC();
 
@@ -2238,11 +2241,11 @@ static void schedule(void)
 
     stop_timer(&sd->s_timer);
 
-    if ( sched_granularity > 1 )
+    if ( gran > 1 )
     {
         cpumask_t mask;
 
-        prev->rendezvous_in_cnt = sched_granularity;
+        prev->rendezvous_in_cnt = gran;
         cpumask_andnot(&mask, sd->cpus, cpumask_of(cpu));
         cpumask_raise_softirq(&mask, SCHED_SLAVE_SOFTIRQ);
         next = sched_wait_rendezvous_in(prev, &lock, cpu, now);
@@ -2308,6 +2311,9 @@ static int cpu_schedule_up(unsigned int cpu)
     init_timer(&sd->s_timer, s_timer_fn, NULL, cpu);
     atomic_set(&per_cpu(sched_urgent_count, cpu), 0);
 
+    /* We start with cpu granularity. */
+    sd->granularity = 1;
+
     /* Boot CPU is dealt with later in scheduler_init(). */
     if ( cpu == 0 )
         return 0;
@@ -2598,6 +2604,7 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
     sched_free_vdata(old_ops, vpriv_old);
     sched_free_pdata(old_ops, ppriv_old, cpu);
 
+    get_sched_res(cpu)->granularity = c ? c->granularity : 1;
     get_sched_res(cpu)->cpupool = c;
     /* When a cpu is added to a pool, trigger it to go pick up some work */
     if ( c != NULL )
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 5625cafb6e..cb58bad0ff 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -25,6 +25,15 @@ extern int sched_ratelimit_us;
 /* Scheduling resource mask. */
 extern const cpumask_t *sched_res_mask;
 
+/* Number of vcpus per struct sched_unit. */
+enum sched_gran {
+    SCHED_GRAN_cpu,
+    SCHED_GRAN_core,
+    SCHED_GRAN_socket
+};
+extern enum sched_gran opt_sched_granularity;
+extern unsigned int sched_granularity;
+
 /*
  * In order to allow a scheduler to remap the lock->cpu mapping,
  * we have a per-cpu pointer, along with a pre-allocated set of
@@ -48,6 +57,7 @@ struct sched_resource {
 
     /* Cpu with lowest id in scheduling resource. */
     unsigned int        master_cpu;
+    unsigned int        granularity;
     const cpumask_t    *cpus;           /* cpus covered by this struct     */
 };
 
@@ -532,6 +542,8 @@ struct cpupool
     struct cpupool   *next;
     struct scheduler *sched;
     atomic_t         refcnt;
+    unsigned int     granularity;
+    enum sched_gran  opt_granularity;
 };
 
 #define cpupool_online_cpumask(_pool) \