From patchwork Wed Apr 6 17:24:16 2016
X-Patchwork-Submitter: Dario Faggioli
X-Patchwork-Id: 8764031
From: Dario Faggioli
To: xen-devel@lists.xenproject.org
Cc: Justin Weaver, George Dunlap
Date: Wed, 06 Apr 2016 19:24:16 +0200
Message-ID: <20160406172416.25877.79330.stgit@Solace.fritz.box>
In-Reply-To: <20160406170023.25877.15622.stgit@Solace.fritz.box>
References: <20160406170023.25877.15622.stgit@Solace.fritz.box>
User-Agent: StGit/0.17.1-dirty
Subject: [Xen-devel] [PATCH v2 11/11] xen: sched: implement vcpu hard affinity in Credit2

From: Justin Weaver

as it was still missing.

Note that this patch "only" implements hard affinity, i.e., the possibility
of specifying on what pCPUs a certain vCPU can run. Soft affinity (which
expresses a preference for vCPUs to run on certain pCPUs) is still not
supported by Credit2, even after this patch.

Signed-off-by: Justin Weaver
Signed-off-by: Dario Faggioli
Acked-by: George Dunlap
---
 xen/common/sched_credit2.c |  131 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 102 insertions(+), 29 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 084963a..03cd10c 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -310,6 +310,36 @@ struct csched2_dom {
     uint16_t nr_vcpus;
 };
 
+/*
+ * When a hard affinity change occurs, we may not be able to check some
+ * (any!) of the other runqueues, when looking for the best new processor
+ * for svc (as trylock-s in choose_cpu() can fail). If that happens, we
+ * pick, in order of decreasing preference:
+ *  - svc's current pcpu;
+ *  - another pcpu from svc's current runq;
+ *  - any cpu.
+ */
+static int get_fallback_cpu(struct csched2_vcpu *svc)
+{
+    int cpu;
+
+    if ( likely(cpumask_test_cpu(svc->vcpu->processor,
+                                 svc->vcpu->cpu_hard_affinity)) )
+        return svc->vcpu->processor;
+
+    cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
+                &svc->rqd->active);
+    cpu = cpumask_first(cpumask_scratch);
+    if ( likely(cpu < nr_cpu_ids) )
+        return cpu;
+
+    cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
+                cpupool_domain_cpumask(svc->vcpu->domain));
+
+    ASSERT(!cpumask_empty(cpumask_scratch));
+
+    return cpumask_first(cpumask_scratch);
+}
 
 /*
  * Time-to-credit, credit-to-time.
@@ -543,8 +573,9 @@ runq_tickle(const struct scheduler *ops, unsigned int cpu, struct csched2_vcpu *
         goto tickle;
     }
 
-    /* Get a mask of idle, but not tickled */
+    /* Get a mask of idle, but not tickled, that new is allowed to run on. */
     cpumask_andnot(&mask, &rqd->idle, &rqd->tickled);
+    cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
 
     /* If it's not empty, choose one */
     i = cpumask_cycle(cpu, &mask);
@@ -555,9 +586,11 @@ runq_tickle(const struct scheduler *ops, unsigned int cpu, struct csched2_vcpu *
     }
 
     /* Otherwise, look for the non-idle cpu with the lowest credit,
-     * skipping cpus which have been tickled but not scheduled yet */
+     * skipping cpus which have been tickled but not scheduled yet,
+     * that new is allowed to run on. */
     cpumask_andnot(&mask, &rqd->active, &rqd->idle);
     cpumask_andnot(&mask, &mask, &rqd->tickled);
+    cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
 
     for_each_cpu(i, &mask)
     {
@@ -1107,9 +1140,8 @@ choose_cpu(const struct scheduler *ops, struct vcpu *vc)
             d2printk("%pv -\n", svc->vcpu);
             clear_bit(__CSFLAG_runq_migrate_request, &svc->flags);
         }
-        /* Leave it where it is for now. When we actually pay attention
-         * to affinity we'll have to figure something out... */
-        return vc->processor;
+
+        return get_fallback_cpu(svc);
     }
 
     /* First check to see if we're here because someone else suggested a place
@@ -1120,45 +1152,56 @@ choose_cpu(const struct scheduler *ops, struct vcpu *vc)
         {
             printk("%s: Runqueue migrate aborted because target runqueue disappeared!\n",
                    __func__);
-            /* Fall-through to normal cpu pick */
         }
         else
         {
-            d2printk("%pv +\n", svc->vcpu);
-            new_cpu = cpumask_cycle(vc->processor, &svc->migrate_rqd->active);
-            goto out_up;
+            cpumask_and(cpumask_scratch, vc->cpu_hard_affinity,
+                        &svc->migrate_rqd->active);
+            new_cpu = cpumask_any(cpumask_scratch);
+            if ( new_cpu < nr_cpu_ids )
+            {
+                d2printk("%pv +\n", svc->vcpu);
+                goto out_up;
+            }
         }
+        /* Fall-through to normal cpu pick */
     }
 
-    /* FIXME: Pay attention to cpu affinity */
-
     min_avgload = MAX_LOAD;
 
     /* Find the runqueue with the lowest instantaneous load */
     for_each_cpu(i, &prv->active_queues)
     {
         struct csched2_runqueue_data *rqd;
-        s_time_t rqd_avgload;
+        s_time_t rqd_avgload = MAX_LOAD;
 
         rqd = prv->rqd + i;
 
-        /* If checking a different runqueue, grab the lock,
-         * read the avg, and then release the lock.
+        /*
+         * If checking a different runqueue, grab the lock, check hard
+         * affinity, read the avg, and then release the lock.
          *
          * If on our own runqueue, don't grab or release the lock;
         * but subtract our own load from the runqueue load to simulate
-         * impartiality */
+         * impartiality.
+         *
+         * Note that, if svc's hard affinity has changed, this is the
+         * first time when we see such change, so it is indeed possible
+         * that none of the cpus in svc's current runqueue is in our
+         * (new) hard affinity!
+         */
         if ( rqd == svc->rqd )
         {
-            rqd_avgload = rqd->b_avgload - svc->avgload;
+            if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
+                rqd_avgload = rqd->b_avgload - svc->avgload;
         }
         else if ( spin_trylock(&rqd->lock) )
         {
-            rqd_avgload = rqd->b_avgload;
+            if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
+                rqd_avgload = rqd->b_avgload;
+
             spin_unlock(&rqd->lock);
         }
-        else
-            continue;
 
         if ( rqd_avgload < min_avgload )
         {
@@ -1167,12 +1210,14 @@ choose_cpu(const struct scheduler *ops, struct vcpu *vc)
         }
     }
 
-    /* We didn't find anyone (most likely because of spinlock contention); leave it where it is */
+    /* We didn't find anyone (most likely because of spinlock contention). */
     if ( min_rqi == -1 )
-        new_cpu = vc->processor;
+        new_cpu = get_fallback_cpu(svc);
     else
     {
-        new_cpu = cpumask_cycle(vc->processor, &prv->rqd[min_rqi].active);
+        cpumask_and(cpumask_scratch, vc->cpu_hard_affinity,
+                    &prv->rqd[min_rqi].active);
+        new_cpu = cpumask_any(cpumask_scratch);
         BUG_ON(new_cpu >= nr_cpu_ids);
     }
 
@@ -1252,7 +1297,12 @@ static void migrate(const struct scheduler *ops,
             on_runq=1;
         }
         __runq_deassign(svc);
-        svc->vcpu->processor = cpumask_any(&trqd->active);
+
+        cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
+                    &trqd->active);
+        svc->vcpu->processor = cpumask_any(cpumask_scratch);
+        BUG_ON(svc->vcpu->processor >= nr_cpu_ids);
+
         __runq_assign(svc, trqd);
         if ( on_runq )
         {
@@ -1266,6 +1316,17 @@ static void migrate(const struct scheduler *ops,
     }
 }
 
+/*
+ * It makes sense to consider migrating svc to rqd if:
+ *  - svc is not already flagged to migrate,
+ *  - svc is allowed to run on at least one of the pcpus of rqd.
+ */
+static bool_t vcpu_is_migrateable(struct csched2_vcpu *svc,
+                                  struct csched2_runqueue_data *rqd)
+{
+    return !(svc->flags & CSFLAG_runq_migrate_request) &&
+           cpumask_intersects(svc->vcpu->cpu_hard_affinity, &rqd->active);
+}
 
 static void balance_load(const struct scheduler *ops, int cpu, s_time_t now)
 {
@@ -1374,8 +1435,7 @@ retry:
         __update_svc_load(ops, push_svc, 0, now);
 
-        /* Skip this one if it's already been flagged to migrate */
-        if ( push_svc->flags & CSFLAG_runq_migrate_request )
+        if ( !vcpu_is_migrateable(push_svc, st.orqd) )
             continue;
 
         list_for_each( pull_iter, &st.orqd->svc )
         {
@@ -1387,8 +1447,7 @@ retry:
                 __update_svc_load(ops, pull_svc, 0, now);
             }
 
-            /* Skip this one if it's already been flagged to migrate */
-            if ( pull_svc->flags & CSFLAG_runq_migrate_request )
+            if ( !vcpu_is_migrateable(pull_svc, st.lrqd) )
                 continue;
 
             consider(&st, push_svc, pull_svc);
@@ -1404,8 +1463,7 @@ retry:
     {
         struct csched2_vcpu * pull_svc = list_entry(pull_iter, struct csched2_vcpu, rqd_elem);
 
-        /* Skip this one if it's already been flagged to migrate */
-        if ( pull_svc->flags & CSFLAG_runq_migrate_request )
+        if ( !vcpu_is_migrateable(pull_svc, st.lrqd) )
            continue;
 
         /* Consider pull only */
@@ -1444,11 +1502,22 @@ csched2_vcpu_migrate(
 
     /* Check if new_cpu is valid */
     BUG_ON(!cpumask_test_cpu(new_cpu, &CSCHED2_PRIV(ops)->initialized));
+    ASSERT(cpumask_test_cpu(new_cpu, vc->cpu_hard_affinity));
 
     trqd = RQD(ops, new_cpu);
 
+    /*
+     * Do the actual movement toward new_cpu, and update vc->processor.
+     * If we are changing runqueue, migrate() takes care of everything.
+     * If we are not changing runqueue, we need to update vc->processor
+     * here. In fact, if, for instance, we are here because the vcpu's
+     * hard affinity changed, we don't want to risk leaving vc->processor
+     * pointing to a pcpu where we can't run any longer.
+     */
     if ( trqd != svc->rqd )
         migrate(ops, svc, trqd, NOW());
+    else
+        vc->processor = new_cpu;
 }
 
 static int
@@ -1671,6 +1740,10 @@ runq_candidate(struct csched2_runqueue_data *rqd,
     {
         struct csched2_vcpu * svc = list_entry(iter, struct csched2_vcpu, runq_elem);
 
+        /* Only consider vcpus that are allowed to run on this processor. */
+        if ( !cpumask_test_cpu(cpu, svc->vcpu->cpu_hard_affinity) )
+            continue;
+
         /* If this is on a different processor, don't pull it unless
          * its credit is at least CSCHED2_MIGRATE_RESIST higher. */
         if ( svc->vcpu->processor != cpu
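
For readers who want to see the idea outside of the hypervisor: get_fallback_cpu() and
the choose_cpu() changes all follow the same "constrain the candidate mask by hard
affinity, then fall back in decreasing order of preference" pattern. Below is a
standalone, simplified sketch of that pattern, using plain 64-bit masks instead of
Xen's cpumask_t and made-up helper names (fallback_cpu, first_cpu); it is only an
illustration, not code from this patch.

    #include <stdint.h>
    #include <stdio.h>

    /* Simplified stand-in for Xen's cpumask_t: one bit per pCPU. */
    typedef uint64_t mask_t;

    /* First set bit, or -1 if empty (-1 plays the role of nr_cpu_ids here). */
    static int first_cpu(mask_t m)
    {
        return m ? __builtin_ctzll(m) : -1;   /* GCC/Clang builtin */
    }

    /*
     * Same preference order as get_fallback_cpu():
     *  1. the current pCPU, if it is still within the hard affinity;
     *  2. a pCPU of the current runqueue that is within the hard affinity;
     *  3. any pCPU of the cpupool that is within the hard affinity.
     */
    static int fallback_cpu(int cur, mask_t hard, mask_t runq_active, mask_t pool)
    {
        if ( hard & (1ULL << cur) )
            return cur;

        if ( hard & runq_active )
            return first_cpu(hard & runq_active);

        /* Mirrors the ASSERT() in get_fallback_cpu(): hard affinity and the
         * cpupool are assumed never to be disjoint. */
        return first_cpu(hard & pool);
    }

    int main(void)
    {
        /* vCPU on pCPU 1; affinity just changed to {4,5}; its runqueue covers pCPUs 0-3. */
        mask_t hard = (1ULL << 4) | (1ULL << 5);
        mask_t runq = 0x0fULL;   /* pCPUs 0-3 */
        mask_t pool = 0xffULL;   /* pCPUs 0-7 */

        printf("fallback cpu = %d\n", fallback_cpu(1, hard, runq, pool)); /* prints 4 */
        return 0;
    }

In the actual patch the second and third steps are done by and-ing into
cpumask_scratch and taking cpumask_first(), with the non-emptiness of the last
intersection guaranteed by the toolstack and checked via ASSERT().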