From patchwork Fri Jun 16 14:14:25 2017
X-Patchwork-Submitter: Dario Faggioli
X-Patchwork-Id: 9791823
From: Dario Faggioli
To: xen-devel@lists.xenproject.org
Cc: George Dunlap, Anshul Makkar
Date: Fri, 16 Jun 2017 16:14:25 +0200
Message-ID: <149762246518.11899.6388937948873905095.stgit@Solace.fritz.box>
In-Reply-To: <149762114626.11899.6393770850121347748.stgit@Solace.fritz.box>
Subject: [Xen-devel] [PATCH 7/7] xen: credit2: try to avoid tickling cpus subject to ratelimiting

With context switching ratelimiting enabled, the following pattern is
quite common in a scheduling trace:

     0.000845622 |||||||||||.x||| d32768v12 csched2:runq_insert d0v13, position 0
     0.000845831 |||||||||||.x||| d32768v12 csched2:runq_tickle_new d0v13, processor = 12, credit = 10135529
     0.000846546 |||||||||||.x||| d32768v12 csched2:burn_credits d2v7, credit = 2619231, delta = 255937
 [1] 0.000846739 |||||||||||.x||| d32768v12 csched2:runq_tickle cpu 12
     [...]
 [2] 0.000850597 ||||||||||||x||| d32768v12 csched2:schedule cpu 12, rq# 1, busy, SMT busy, tickled
     0.000850760 ||||||||||||x||| d32768v12 csched2:burn_credits d2v7, credit = 2614028, delta = 5203
 [3] 0.000851022 ||||||||||||x||| d32768v12 csched2:ratelimit triggered
 [4] 0.000851614 ||||||||||||x||| d32768v12 runstate_continue d2v7 running->running

Basically, what happens is that runq_tickle() realizes that d0v13
should preempt d2v7, which is running on cpu 12, as it has more credit
(10135529 vs. 2619231). It therefore tickles cpu 12 [1], which, in
turn, schedules [2].

But --surprise surprise-- d2v7 has run for less than the ratelimit
interval [3], and hence it is _not_ preempted, and continues to run
[4]. This indeed looks fine. Actually, this is exactly what
ratelimiting is there for. Note, however, that:
 1) we interrupted cpu 12 for nothing;
 2) what if, say on cpu 8, there is a vcpu that has:
    + less credit than d0v13 (so d0v13 can preempt it),
    + more credit than d2v7 (that's why it was not selected to be
      preempted),
    + run for more than the ratelimiting interval (so it can really
      be scheduled out)?

With this patch, if we are in case 2), we realize that tickling cpu 12
would be pointless, and we continue looking, eventually finding and
tickling cpu 8.
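To make the ratelimit check concrete, here is a minimal user-space sketch of the same test that is_preemptable() performs in the diff below. It is only an illustration. s_time_t, MICROSECS() and struct vcpu_view are stand-ins assumed for this sketch. The real function takes a struct csched2_vcpu and reads svc->vcpu->runstate.state_entry_time, and it uses Xen's ASSERT() where this sketch uses assert().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* User-space stand-ins (assumed): Xen's s_time_t is in nanoseconds. */
typedef int64_t s_time_t;
#define MICROSECS(us) ((s_time_t)(us) * 1000LL)
#define CSCHED2_RATELIMIT_TICKLE_TOLERANCE MICROSECS(50)

/* Hypothetical minimal view of a vcpu: only the fields the check reads. */
struct vcpu_view {
    bool is_running;
    s_time_t state_entry_time; /* when it entered its current runstate */
};

/*
 * Same logic as is_preemptable() in the patch. If the ratelimit is no
 * larger than the tolerance, preemption is always allowed. Otherwise it
 * is allowed only once the vcpu has run for more than
 * (ratelimit - tolerance).
 */
static bool is_preemptable(const struct vcpu_view *v, s_time_t now,
                           s_time_t ratelimit)
{
    if (ratelimit <= CSCHED2_RATELIMIT_TICKLE_TOLERANCE)
        return true;

    assert(v->is_running);
    return now - v->state_entry_time >
           ratelimit - CSCHED2_RATELIMIT_TICKLE_TOLERANCE;
}
```

Take a hypothetical 1000 µs ratelimit. A vcpu that has run for 100 µs is not preemptable. One that has run for 960 µs is preemptable, because it is past the 950 µs threshold, that is, within the 50 µs tolerance of the limit. With the ratelimit set to 0, the runtime is not looked at at all.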
Signed-off-by: Dario Faggioli
Reviewed-by: George Dunlap
---
Cc: George Dunlap
Cc: Anshul Makkar
---
 xen/common/sched_credit2.c | 30 ++++++++++++++++++++++++++----
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index bbda790..c45bc03 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -160,6 +160,8 @@
 #define CSCHED2_MIGRATE_RESIST ((opt_migrate_resist)*MICROSECS(1))
 /* How much to "compensate" a vcpu for L2 migration. */
 #define CSCHED2_MIGRATE_COMPENSATION MICROSECS(50)
+/* How tolerant we should be when peeking at runtime of vcpus on other cpus */
+#define CSCHED2_RATELIMIT_TICKLE_TOLERANCE MICROSECS(50)
 /* Reset: Value below which credit will be reset. */
 #define CSCHED2_CREDIT_RESET 0
 /* Max timer: Maximum time a guest can be run for. */
@@ -1167,6 +1169,23 @@ tickle_cpu(unsigned int cpu, struct csched2_runqueue_data *rqd)
 }
 
 /*
+ * What we want to know is whether svc, which we assume to be running on some
+ * pcpu, can be interrupted and preempted (which, so far, basically means
+ * whether or not it already run for more than the ratelimit, to which we
+ * apply some tolerance).
+ */
+static inline bool is_preemptable(const struct csched2_vcpu *svc,
+                                  s_time_t now, s_time_t ratelimit)
+{
+    if ( ratelimit <= CSCHED2_RATELIMIT_TICKLE_TOLERANCE )
+        return true;
+
+    ASSERT(svc->vcpu->is_running);
+    return now - svc->vcpu->runstate.state_entry_time >
+           ratelimit - CSCHED2_RATELIMIT_TICKLE_TOLERANCE;
+}
+
+/*
  * Score to preempt the target cpu. Return a negative number if the
  * credit isn't high enough; if it is, favor a preemption on cpu in
  * this order:
@@ -1180,10 +1199,12 @@ tickle_cpu(unsigned int cpu, struct csched2_runqueue_data *rqd)
  *
  * Within the same class, the highest difference of credit.
  */
-static s_time_t tickle_score(struct csched2_runqueue_data *rqd, s_time_t now,
+static s_time_t tickle_score(const struct scheduler *ops, s_time_t now,
                              struct csched2_vcpu *new, unsigned int cpu)
 {
+    struct csched2_runqueue_data *rqd = c2rqd(ops, cpu);
     struct csched2_vcpu * cur = csched2_vcpu(curr_on_cpu(cpu));
+    struct csched2_private *prv = csched2_priv(ops);
     s_time_t score;
 
     /*
@@ -1191,7 +1212,8 @@ static s_time_t tickle_score(struct csched2_runqueue_data *rqd, s_time_t now,
      * in rqd->idle). However, some of them may be running their idle vcpu,
      * if taking care of tasklets. In that case, we want to leave it alone.
      */
-    if ( unlikely(is_idle_vcpu(cur->vcpu)) )
+    if ( unlikely(is_idle_vcpu(cur->vcpu) ||
+                  !is_preemptable(cur, now, MICROSECS(prv->ratelimit_us))) )
         return -1;
 
     burn_credits(rqd, cur, now);
@@ -1348,7 +1370,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
     cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
     if ( __cpumask_test_and_clear_cpu(cpu, &mask) )
     {
-        s_time_t score = tickle_score(rqd, now, new, cpu);
+        s_time_t score = tickle_score(ops, now, new, cpu);
 
         if ( score > max )
         {
@@ -1371,7 +1393,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
         /* Already looked at this one above */
         ASSERT(i != cpu);
 
-        score = tickle_score(rqd, now, new, i);
+        score = tickle_score(ops, now, new, i);
 
         if ( score > max )
         {