From patchwork Thu Jul 27 12:05:38 2017
X-Patchwork-Submitter: Dario Faggioli
X-Patchwork-Id: 9866719
From: Dario Faggioli <raistlin.df@gmail.com>
To: xen-devel@lists.xenproject.org
Date: Thu, 27 Jul 2017 14:05:38 +0200
Message-ID: <150115713877.6767.4795115900091736740.stgit@Solace>
In-Reply-To: <150115657192.6767.15778617807307106582.stgit@Solace>
References: <150115657192.6767.15778617807307106582.stgit@Solace>
User-Agent: StGit/0.17.1-dirty
Cc: Wei Liu, Ian Jackson, George Dunlap, Anshul Makkar
Subject: [Xen-devel] [PATCH v2 1/6] xen/tools: credit2: soft-affinity awareness in runq_tickle()

Soft-affinity support is usually implemented by means of a two step
"balancing loop",
where:
 - during the first step, we consider soft-affinity (if the vcpu has one);
 - during the second (if we get to it), we consider hard-affinity.

In runq_tickle(), we need to do that for checking whether we can execute
the waking vCPU on a pCPU that is idle. In fact, we want to be sure
that, if there is an idle pCPU in the vCPU's soft affinity, we'll use
it. If there are no such idle pCPUs, though, and we have to check
non-idle ones, we can avoid the loop and do both the hard and
soft-affinity checks in one pass.

In fact, we can scan the runqueue and compute a "score" for each vCPU
which is running on each pCPU. The idea is, since we may have to
preempt someone:
 - try to make sure that the waking vCPU will run inside its
   soft-affinity,
 - try to preempt someone that is running outside of its own
   soft-affinity.

The value of the score is added to a trace record, so xenalyze's code
and tools/xentrace/formats are updated accordingly.

Suggested-by: George Dunlap
Signed-off-by: Dario Faggioli
Reviewed-by: George Dunlap
---
Cc: Anshul Makkar
Cc: Ian Jackson
Cc: Wei Liu
---
 tools/xentrace/formats     |    2
 tools/xentrace/xenalyze.c  |    7 +
 xen/common/sched_credit2.c |  214 +++++++++++++++++++++++++++++---------------
 3 files changed, 146 insertions(+), 77 deletions(-)

diff --git a/tools/xentrace/formats b/tools/xentrace/formats
index c1f584f..f39182a 100644
--- a/tools/xentrace/formats
+++ b/tools/xentrace/formats
@@ -53,7 +53,7 @@
 0x00022202  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:runq_pos       [ dom:vcpu = 0x%(1)08x, pos = %(2)d]
 0x00022203  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:credit burn    [ dom:vcpu = 0x%(1)08x, credit = %(2)d, delta = %(3)d ]
 0x00022204  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:credit_add
-0x00022205  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:tickle_check   [ dom:vcpu = 0x%(1)08x, credit = %(2)d ]
+0x00022205  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:tickle_check   [ dom:vcpu = 0x%(1)08x, credit = %(2)d, score = %(3)d ]
 0x00022206  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:tickle         [ cpu = %(1)d ]
 0x00022207  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:credit_reset   [ dom:vcpu = 0x%(1)08x, cr_start = %(2)d, cr_end = %(3)d, mult = %(4)d ]
 0x00022208  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  csched2:sched_tasklet

diff --git a/tools/xentrace/xenalyze.c b/tools/xentrace/xenalyze.c
index 24cce2a..39fc35f 100644
--- a/tools/xentrace/xenalyze.c
+++ b/tools/xentrace/xenalyze.c
@@ -7692,11 +7692,12 @@ void sched_process(struct pcpu_info *p)
             if(opt.dump_all) {
                 struct {
                     unsigned int vcpuid:16, domid:16;
-                    int credit;
+                    int credit, score;
                 } *r = (typeof(r))ri->d;

-                printf(" %s csched2:tickle_check d%uv%u, credit = %d\n",
-                       ri->dump_header, r->domid, r->vcpuid, r->credit);
+                printf(" %s csched2:tickle_check d%uv%u, credit = %d, score = %d\n",
+                       ri->dump_header, r->domid, r->vcpuid,
+                       r->credit, r->score);
             }
             break;
         case TRC_SCHED_CLASS_EVT(CSCHED2, 6): /* TICKLE */

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 29c002a..57e77df 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -1146,6 +1146,73 @@ tickle_cpu(unsigned int cpu, struct csched2_runqueue_data *rqd)
 }

 /*
+ * Score to preempt the target cpu. Return a negative number if the
+ * credit isn't high enough; if it is, favor a preemption on cpu in
+ * this order:
+ *  - cpu is in new's soft-affinity, not in cur's soft-affinity
+ *    (2 x CSCHED2_CREDIT_INIT score bonus);
+ *  - cpu is in new's soft-affinity and cur's soft-affinity, or
+ *    cpu is not in new's soft-affinity, nor in cur's soft-affinity
+ *    (1 x CSCHED2_CREDIT_INIT score bonus);
+ *  - cpu is not in new's soft-affinity, while it is in cur's soft-affinity
+ *    (no bonus).
+ *
+ * Within the same class, the highest difference of credit.
+ */
+static s_time_t tickle_score(struct csched2_runqueue_data *rqd, s_time_t now,
+                             struct csched2_vcpu *new, unsigned int cpu)
+{
+    struct csched2_vcpu * cur = csched2_vcpu(curr_on_cpu(cpu));
+    s_time_t score;
+
+    /*
+     * We are dealing with cpus that are marked non-idle (i.e., that are not
+     * in rqd->idle). However, some of them may be running their idle vcpu,
+     * if taking care of tasklets. In that case, we want to leave it alone.
+     */
+    if ( unlikely(is_idle_vcpu(cur->vcpu)) )
+        return -1;
+
+    burn_credits(rqd, cur, now);
+
+    score = new->credit - cur->credit;
+    if ( new->vcpu->processor != cpu )
+        score -= CSCHED2_MIGRATE_RESIST;
+
+    /*
+     * If score is positive, it means new has enough credits (i.e.,
+     * new->credit > cur->credit + CSCHED2_MIGRATE_RESIST).
+     *
+     * Let's compute the bonuses for soft-affinities.
+     */
+    if ( score > 0 )
+    {
+        if ( cpumask_test_cpu(cpu, new->vcpu->cpu_soft_affinity) )
+            score += CSCHED2_CREDIT_INIT;
+
+        if ( !cpumask_test_cpu(cpu, cur->vcpu->cpu_soft_affinity) )
+            score += CSCHED2_CREDIT_INIT;
+    }
+
+    if ( unlikely(tb_init_done) )
+    {
+        struct {
+            unsigned vcpu:16, dom:16;
+            int credit, score;
+        } d;
+        d.dom = cur->vcpu->domain->domain_id;
+        d.vcpu = cur->vcpu->vcpu_id;
+        d.credit = cur->credit;
+        d.score = score;
+        __trace_var(TRC_CSCHED2_TICKLE_CHECK, 1,
+                    sizeof(d),
+                    (unsigned char *)&d);
+    }
+
+    return score;
+}
+
+/*
  * Check what processor it is best to 'wake', for picking up a vcpu that has
  * just been put (back) in the runqueue. Logic is as follows:
  *  1. if there are idle processors in the runq, wake one of them;
@@ -1165,11 +1232,11 @@ static void
 runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
 {
     int i, ipid = -1;
-    s_time_t lowest = (1<<30);
-    unsigned int cpu = new->vcpu->processor;
+    s_time_t max = 0;
+    unsigned int bs, cpu = new->vcpu->processor;
     struct csched2_runqueue_data *rqd = c2rqd(ops, cpu);
+    cpumask_t *online = cpupool_domain_cpumask(new->vcpu->domain);
     cpumask_t mask;
-    struct csched2_vcpu * cur;

     ASSERT(new->rqd == rqd);

@@ -1189,109 +1256,110 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
             (unsigned char *)&d);
     }

-    cpumask_and(cpumask_scratch_cpu(cpu), new->vcpu->cpu_hard_affinity,
-                cpupool_domain_cpumask(new->vcpu->domain));
-
-    /*
-     * First of all, consider idle cpus, checking if we can just
-     * re-use the pcpu where we were running before.
-     *
-     * If there are cores where all the siblings are idle, consider
-     * them first, honoring whatever the spreading-vs-consolidation
-     * SMT policy wants us to do.
-     */
-    if ( unlikely(sched_smt_power_savings) )
-        cpumask_andnot(&mask, &rqd->idle, &rqd->smt_idle);
-    else
-        cpumask_copy(&mask, &rqd->smt_idle);
-    cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
-    i = cpumask_test_or_cycle(cpu, &mask);
-    if ( i < nr_cpu_ids )
+    for_each_affinity_balance_step( bs )
     {
-        SCHED_STAT_CRANK(tickled_idle_cpu);
-        ipid = i;
-        goto tickle;
+        /* Just skip first step, if we don't have a soft affinity */
+        if ( bs == BALANCE_SOFT_AFFINITY &&
+             !has_soft_affinity(new->vcpu, new->vcpu->cpu_hard_affinity) )
+            continue;
+
+        affinity_balance_cpumask(new->vcpu, bs, cpumask_scratch_cpu(cpu));
+
+        /*
+         * First of all, consider idle cpus, checking if we can just
+         * re-use the pcpu where we were running before.
+         *
+         * If there are cores where all the siblings are idle, consider
+         * them first, honoring whatever the spreading-vs-consolidation
+         * SMT policy wants us to do.
+         */
+        if ( unlikely(sched_smt_power_savings) )
+        {
+            cpumask_andnot(&mask, &rqd->idle, &rqd->smt_idle);
+            cpumask_and(&mask, &mask, online);
+        }
+        else
+            cpumask_and(&mask, &rqd->smt_idle, online);
+        cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
+        i = cpumask_test_or_cycle(cpu, &mask);
+        if ( i < nr_cpu_ids )
+        {
+            SCHED_STAT_CRANK(tickled_idle_cpu);
+            ipid = i;
+            goto tickle;
+        }
+
+        /*
+         * If there are no fully idle cores, check all idlers, after
+         * having filtered out pcpus that have been tickled but haven't
+         * gone through the scheduler yet.
+         */
+        cpumask_andnot(&mask, &rqd->idle, &rqd->tickled);
+        cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu), online);
+        cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
+        i = cpumask_test_or_cycle(cpu, &mask);
+        if ( i < nr_cpu_ids )
+        {
+            SCHED_STAT_CRANK(tickled_idle_cpu);
+            ipid = i;
+            goto tickle;
+        }
     }

     /*
-     * If there are no fully idle cores, check all idlers, after
-     * having filtered out pcpus that have been tickled but haven't
-     * gone through the scheduler yet.
+     * Note that, if we are here, it means we have done the hard-affinity
+     * balancing step of the loop, and hence what we have in cpumask_scratch
+     * is what we put there for last, i.e., new's vcpu_hard_affinity & online
+     * which is exactly what we need for the next part of the function.
      */
-    cpumask_andnot(&mask, &rqd->idle, &rqd->tickled);
-    cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
-    i = cpumask_test_or_cycle(cpu, &mask);
-    if ( i < nr_cpu_ids )
-    {
-        SCHED_STAT_CRANK(tickled_idle_cpu);
-        ipid = i;
-        goto tickle;
-    }

     /*
      * Otherwise, look for the non-idle (and non-tickled) processors with
      * the lowest credit, among the ones new is allowed to run on. Again,
      * the cpu were it was running on would be the best candidate.
+     *
+     * For deciding which cpu to tickle, we use tickle_score(), which will
+     * factor in both new's soft-affinity, and the soft-affinity of the
+     * vcpu running on each cpu that we consider.
      */
     cpumask_andnot(&mask, &rqd->active, &rqd->idle);
     cpumask_andnot(&mask, &mask, &rqd->tickled);
     cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
     if ( __cpumask_test_and_clear_cpu(cpu, &mask) )
     {
-        cur = csched2_vcpu(curr_on_cpu(cpu));
-        burn_credits(rqd, cur, now);
+        s_time_t score = tickle_score(rqd, now, new, cpu);

-        if ( cur->credit < new->credit )
+        if ( score > max )
         {
-            SCHED_STAT_CRANK(tickled_busy_cpu);
+            max = score;
             ipid = cpu;
-            goto tickle;
+
+            /* If this is in new's soft affinity, just take it */
+            if ( cpumask_test_cpu(cpu, new->vcpu->cpu_soft_affinity) )
+            {
+                SCHED_STAT_CRANK(tickled_busy_cpu);
+                goto tickle;
+            }
         }
     }

     for_each_cpu(i, &mask)
     {
+        s_time_t score;
+
         /* Already looked at this one above */
         ASSERT(i != cpu);

-        cur = csched2_vcpu(curr_on_cpu(i));
-
-        /*
-         * Even if the cpu is not in rqd->idle, it may be running the
-         * idle vcpu, if it's doing tasklet work. Just skip it.
-         */
-        if ( is_idle_vcpu(cur->vcpu) )
-            continue;
-
-        /* Update credits for current to see if we want to preempt. */
-        burn_credits(rqd, cur, now);
+        score = tickle_score(rqd, now, new, i);

-        if ( cur->credit < lowest )
+        if ( score > max )
         {
+            max = score;
             ipid = i;
-            lowest = cur->credit;
-        }
-
-        if ( unlikely(tb_init_done) )
-        {
-            struct {
-                unsigned vcpu:16, dom:16;
-                int credit;
-            } d;
-            d.dom = cur->vcpu->domain->domain_id;
-            d.vcpu = cur->vcpu->vcpu_id;
-            d.credit = cur->credit;
-            __trace_var(TRC_CSCHED2_TICKLE_CHECK, 1,
-                        sizeof(d),
-                        (unsigned char *)&d);
         }
     }

-    /*
-     * Only switch to another processor if the credit difference is
-     * greater than the migrate resistance.
-     */
-    if ( ipid == -1 || lowest + CSCHED2_MIGRATE_RESIST > new->credit )
+    if ( ipid == -1 )
     {
         SCHED_STAT_CRANK(tickled_no_cpu);
         return;