From patchwork Thu Mar 2 10:38:19 2017
From: Dario Faggioli
To: xen-devel@lists.xenproject.org
Date: Thu, 02 Mar 2017 11:38:19 +0100
Message-ID: <148845109955.23452.14312315410693510946.stgit@Solace.fritz.box>
In-Reply-To: <148844531279.23452.17528540110704914171.stgit@Solace.fritz.box>
References: <148844531279.23452.17528540110704914171.stgit@Solace.fritz.box>
User-Agent: StGit/0.17.1-dirty
Cc: George Dunlap, Andrew Cooper
Subject: [Xen-devel] [PATCH 3/6] xen: credit1: increase efficiency and scalability of load balancing.
During load balancing, we check the non-idle pCPUs to see if they have
runnable but not running vCPUs that can be stolen by, and set to run on,
currently idle pCPUs.

If a pCPU has only one running (or runnable) vCPU, though, we don't want
to steal it from there, and it's therefore pointless bothering with it
(especially considering that bothering means trying to take its runqueue
lock!).

On large systems, when load is only slightly higher than the number of
pCPUs (i.e., there are just a few more active vCPUs than the number of
the pCPUs), this may mean that:
 - we go through all the pCPUs,
 - for each one, we (try to) take its runqueue lock,
 - we figure out there's actually nothing to be stolen!

To mitigate this, we introduce here the concept of overloaded runqueues,
and a cpumask where to record what pCPUs are in such state. An overloaded
runqueue has at least 2 runnable vCPUs (plus the idle one, which is always
there). Typically, this means 1 vCPU is running, and 1 is sitting in the
runqueue, and can hence be stolen.

Then, in csched_load_balance(), it is enough to go over the overloaded
pCPUs, instead of all non-idle pCPUs, which is better.

Signed-off-by: Dario Faggioli
---
Cc: George Dunlap
Cc: Andrew Cooper
---
I'm Cc-ing Andy on this patch, because we've discussed once about doing
something like this upstream.
---
 xen/common/sched_credit.c |   56 ++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 47 insertions(+), 9 deletions(-)

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 2b13e99..529b6c7 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -171,6 +171,7 @@ struct csched_pcpu {
     struct timer ticker;
     unsigned int tick;
     unsigned int idle_bias;
+    unsigned int nr_runnable;
 };
 
 /*
@@ -221,6 +222,7 @@ struct csched_private {
     uint32_t ncpus;
     struct timer master_ticker;
     unsigned int master;
+    cpumask_var_t overloaded;
     cpumask_var_t idlers;
     cpumask_var_t cpus;
     uint32_t weight;
@@ -263,7 +265,10 @@ static inline bool_t is_runq_idle(unsigned int cpu)
 static inline void
 __runq_insert(struct csched_vcpu *svc)
 {
-    const struct list_head * const runq = RUNQ(svc->vcpu->processor);
+    unsigned int cpu = svc->vcpu->processor;
+    const struct list_head * const runq = RUNQ(cpu);
+    struct csched_private * const prv = CSCHED_PRIV(per_cpu(scheduler, cpu));
+    struct csched_pcpu * const spc = CSCHED_PCPU(cpu);
     struct list_head *iter;
 
     BUG_ON( __vcpu_on_runq(svc) );
@@ -288,12 +293,37 @@ __runq_insert(struct csched_vcpu *svc)
     }
 
     list_add_tail(&svc->runq_elem, iter);
+
+    /*
+     * If there is more than just the idle vCPU and a "regular" vCPU runnable
+     * on the runqueue of this pCPU, mark it as overloaded (so other pCPUs
+     * can come and pick up some work).
+     */
+    if ( ++spc->nr_runnable > 2 &&
+         !cpumask_test_cpu(cpu, prv->overloaded) )
+        cpumask_set_cpu(cpu, prv->overloaded);
 }
 
 static inline void
 __runq_remove(struct csched_vcpu *svc)
 {
+    unsigned int cpu = svc->vcpu->processor;
+    struct csched_private * const prv = CSCHED_PRIV(per_cpu(scheduler, cpu));
+    struct csched_pcpu * const spc = CSCHED_PCPU(cpu);
+
     BUG_ON( !__vcpu_on_runq(svc) );
+
+    /*
+     * Mark the CPU as no longer overloaded when we drop to having only
+     * 1 vCPU in its runqueue. In fact, this means that just the idle
+     * vCPU and a "regular" vCPU are around.
+     */
+    if ( --spc->nr_runnable <= 2 &&
+         cpumask_test_cpu(cpu, prv->overloaded) )
+        cpumask_clear_cpu(cpu, prv->overloaded);
+
+    ASSERT(spc->nr_runnable >= 1);
+
     list_del_init(&svc->runq_elem);
 }
 
@@ -590,6 +620,7 @@ init_pdata(struct csched_private *prv, struct csched_pcpu *spc, int cpu)
     /* Start off idling... */
     BUG_ON(!is_idle_vcpu(curr_on_cpu(cpu)));
     cpumask_set_cpu(cpu, prv->idlers);
+    spc->nr_runnable = 1;
 }
 
 static void
@@ -1704,8 +1735,8 @@ csched_load_balance(struct csched_private *prv, int cpu,
     peer_node = node;
     do
     {
-        /* Find out what the !idle are in this node */
-        cpumask_andnot(&workers, online, prv->idlers);
+        /* Select the pCPUs in this node that have work we can steal. */
+        cpumask_and(&workers, online, prv->overloaded);
         cpumask_and(&workers, &workers, &node_to_cpumask(peer_node));
         __cpumask_clear_cpu(cpu, &workers);
 
@@ -1989,7 +2020,8 @@ csched_dump_pcpu(const struct scheduler *ops, int cpu)
     runq = &spc->runq;
 
     cpumask_scnprintf(cpustr, sizeof(cpustr), per_cpu(cpu_sibling_mask, cpu));
-    printk("CPU[%02d] sort=%d, sibling=%s, ", cpu, spc->runq_sort_last, cpustr);
+    printk("CPU[%02d] nr_run=%d, sort=%d, sibling=%s, ",
+           cpu, spc->nr_runnable, spc->runq_sort_last, cpustr);
     cpumask_scnprintf(cpustr, sizeof(cpustr), per_cpu(cpu_core_mask, cpu));
     printk("core=%s\n", cpustr);
 
@@ -2027,7 +2059,7 @@ csched_dump(const struct scheduler *ops)
 
     spin_lock_irqsave(&prv->lock, flags);
 
-#define idlers_buf keyhandler_scratch
+#define cpumask_buf keyhandler_scratch
 
     printk("info:\n"
            "\tncpus = %u\n"
@@ -2055,8 +2087,10 @@ csched_dump(const struct scheduler *ops)
            prv->ticks_per_tslice,
            vcpu_migration_delay);
 
-    cpumask_scnprintf(idlers_buf, sizeof(idlers_buf), prv->idlers);
-    printk("idlers: %s\n", idlers_buf);
+    cpumask_scnprintf(cpumask_buf, sizeof(cpumask_buf), prv->idlers);
+    printk("idlers: %s\n", cpumask_buf);
+    cpumask_scnprintf(cpumask_buf, sizeof(cpumask_buf), prv->overloaded);
+    printk("overloaded: %s\n", cpumask_buf);
 
     printk("active vcpus:\n");
     loop = 0;
@@ -2079,7 +2113,7 @@ csched_dump(const struct scheduler *ops)
             vcpu_schedule_unlock(lock, svc->vcpu);
         }
     }
-#undef idlers_buf
+#undef cpumask_buf
 
     spin_unlock_irqrestore(&prv->lock, flags);
 }
@@ -2093,8 +2127,11 @@ csched_init(struct scheduler *ops)
     if ( prv == NULL )
         return -ENOMEM;
     if ( !zalloc_cpumask_var(&prv->cpus) ||
-         !zalloc_cpumask_var(&prv->idlers) )
+         !zalloc_cpumask_var(&prv->idlers) ||
+         !zalloc_cpumask_var(&prv->overloaded) )
     {
+        free_cpumask_var(prv->overloaded);
+        free_cpumask_var(prv->idlers);
         free_cpumask_var(prv->cpus);
         xfree(prv);
         return -ENOMEM;
@@ -2141,6 +2178,7 @@ csched_deinit(struct scheduler *ops)
         ops->sched_data = NULL;
         free_cpumask_var(prv->cpus);
         free_cpumask_var(prv->idlers);
+        free_cpumask_var(prv->overloaded);
         xfree(prv);
     }
 }