From patchwork Wed Nov  1 01:46:33 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dongli Zhang
X-Patchwork-Id: 10035835
From: Dongli Zhang
To: xen-devel@lists.xenproject.org, linux-kernel@vger.kernel.org
Date: Wed, 1 Nov 2017 09:46:33 +0800
Message-Id: <1509500793-9896-1-git-send-email-dongli.zhang@oracle.com>
X-Mailer: git-send-email 2.7.4
Cc: jgross@suse.com, boris.ostrovsky@oracle.com, joao.m.martins@oracle.com
Subject: [Xen-devel] [PATCH v6 1/1] xen/time: do not decrease steal time
 after live migration on xen
List-Id: Xen developer discussion

After a guest live migration on Xen, the steal time in /proc/stat
(cpustat[CPUTIME_STEAL]) might decrease, because the steal value returned
by xen_steal_clock() might be less than this_rq()->prev_steal_time, which
is derived from the previous return value of xen_steal_clock().

For instance, the steal time of each vcpu is 335 before live migration.

cpu  198 0 368 200064 1962 0 0 1340 0 0
cpu0 38 0 81 50063 492 0 0 335 0 0
cpu1 65 0 97 49763 634 0 0 335 0 0
cpu2 38 0 81 50098 462 0 0 335 0 0
cpu3 56 0 107 50138 374 0 0 335 0 0

After live migration, the steal time is reduced to 312.

cpu  200 0 370 200330 1971 0 0 1248 0 0
cpu0 38 0 82 50123 500 0 0 312 0 0
cpu1 65 0 97 49832 634 0 0 312 0 0
cpu2 39 0 82 50167 462 0 0 312 0 0
cpu3 56 0 107 50207 374 0 0 312 0 0

Since runstate times are cumulative and are cleared by the Xen hypervisor
during live migration, the idea of this patch is to snapshot the runstate
times before the live-migration suspend and, after resume, add them to
global percpu variables. Once the guest VM is resumed,
xen_get_runstate_snapshot_cpu() always returns the sum of the new
runstate times and the previously accumulated times stored in the global
percpu variables.

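For illustration only (this is not part of the patch), the scheme described
above can be modelled in user space. The names hv_time, old_time, saved,
before_suspend(), migrate_reset() and after_resume() are hypothetical
stand-ins, and the assumption that every counter restarts at zero on the
destination host is taken from the description above rather than from the
hypervisor source.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NR_CPUS   4	/* hypothetical vCPU count for this model */
#define NR_STATES 4	/* running, runnable, blocked, offline */

/* Stand-in for the hypervisor's cumulative counters; a migration resets them. */
static uint64_t hv_time[NR_CPUS][NR_STATES];
/* Time carried over from before earlier migrations (cf. old_runstate_time). */
static uint64_t old_time[NR_CPUS][NR_STATES];
/* Snapshot taken just before suspend (cf. runstate_delta). */
static uint64_t saved[NR_CPUS][NR_STATES];
static int have_saved;

/* Value the guest reports: fresh counter plus everything carried over. */
static uint64_t guest_time(unsigned int cpu, unsigned int state)
{
	assert(cpu < NR_CPUS && state < NR_STATES);
	return hv_time[cpu][state] + old_time[cpu][state];
}

/* Corresponds to action -1 in the patch: remember the current counters. */
static void before_suspend(void)
{
	memcpy(saved, hv_time, sizeof(saved));
	have_saved = 1;
}

/* Model assumption: the destination host restarts every counter at zero. */
static void migrate_reset(void)
{
	memset(hv_time, 0, sizeof(hv_time));
}

/*
 * cancelled == 0: resumed in a new domain, so fold the snapshot in.
 * cancelled != 0: checkpoint or cancelled suspend; the counters were not
 * reset, so the snapshot is simply dropped.
 */
static void after_resume(int cancelled)
{
	unsigned int cpu, s;

	if (!have_saved)
		return;
	if (!cancelled)
		for (cpu = 0; cpu < NR_CPUS; cpu++)
			for (s = 0; s < NR_STATES; s++)
				old_time[cpu][s] += saved[cpu][s];
	have_saved = 0;
}
```

Calling before_suspend(), migrate_reset() and after_resume(0) in that order
leaves guest_time() unchanged across the simulated migration, whereas
after_resume(1) leaves old_time untouched so that nothing is counted twice
when the counters were never reset.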
The comment above the call to HYPERVISOR_suspend() has been removed
because it is inaccurate: the call can return an error code (e.g.,
possibly -EPERM in the future).

A similar but more severe issue affects earlier kernels, Linux 4.8-4.10,
as discussed by Michael Las at
https://0xstubs.org/debugging-a-flaky-cpu-steal-time-counter-on-a-paravirtualized-xen-guest.
There the steal time overflows, which leads to 100% st usage in the top
command. A backport of this patch would fix that issue.

References: https://0xstubs.org/debugging-a-flaky-cpu-steal-time-counter-on-a-paravirtualized-xen-guest
Signed-off-by: Dongli Zhang
Reviewed-by: Boris Ostrovsky
---
Changed since v1:
  * relocate modification to xen_get_runstate_snapshot_cpu

Changed since v2:
  * accumulate runstate times before live migration

Changed since v3:
  * do not accumulate times in the case of guest checkpointing

Changed since v4:
  * allocate array of vcpu_runstate_info to reduce number of memory
    allocations

Changed since v5:
  * remove old incorrect comment above hypercall and mention it in the
    commit message
  * rename xen_accumulate_runstate_time() to xen_manage_runstate_time()
  * move global static pointer into xen_manage_runstate_time()
  * change warn and alert to pr_warn_once() or pr_warn()
  * change kcalloc to kmalloc_array
  * do not add RUNSTATE_max, which would change the Xen ABI; use 4 in the
    code instead

---
 drivers/xen/manage.c  |  7 ++---
 drivers/xen/time.c    | 71 +++++++++++++++++++++++++++++++++++++++++++++++++--
 include/xen/xen-ops.h |  1 +
 3 files changed, 72 insertions(+), 7 deletions(-)

diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
index c425d03..8835065 100644
--- a/drivers/xen/manage.c
+++ b/drivers/xen/manage.c
@@ -72,18 +72,15 @@ static int xen_suspend(void *data)
 	}
 
 	gnttab_suspend();
+	xen_manage_runstate_time(-1);
 	xen_arch_pre_suspend();
 
-	/*
-	 * This hypercall returns 1 if suspend was cancelled
-	 * or the domain was merely checkpointed, and 0 if it
-	 * is resuming in a new domain.
-	 */
 	si->cancelled = HYPERVISOR_suspend(xen_pv_domain()
                                            ? virt_to_gfn(xen_start_info)
                                            : 0);
 
 	xen_arch_post_suspend(si->cancelled);
+	xen_manage_runstate_time(si->cancelled ? 1 : 0);
 	gnttab_resume();
 
 	if (!si->cancelled) {
diff --git a/drivers/xen/time.c b/drivers/xen/time.c
index ac5f23f..65a0b25 100644
--- a/drivers/xen/time.c
+++ b/drivers/xen/time.c
@@ -19,6 +19,8 @@
 /* runstate info updated by Xen */
 static DEFINE_PER_CPU(struct vcpu_runstate_info, xen_runstate);
 
+static DEFINE_PER_CPU(u64[4], old_runstate_time);
+
 /* return an consistent snapshot of 64-bit time/counter value */
 static u64 get64(const u64 *p)
 {
@@ -47,8 +49,8 @@ static u64 get64(const u64 *p)
 	return ret;
 }
 
-static void xen_get_runstate_snapshot_cpu(struct vcpu_runstate_info *res,
-					  unsigned int cpu)
+static void xen_get_runstate_snapshot_cpu_delta(
+			      struct vcpu_runstate_info *res, unsigned int cpu)
 {
 	u64 state_time;
 	struct vcpu_runstate_info *state;
@@ -66,6 +68,71 @@ static void xen_get_runstate_snapshot_cpu(struct vcpu_runstate_info *res,
 		 (state_time & XEN_RUNSTATE_UPDATE));
 }
 
+static void xen_get_runstate_snapshot_cpu(struct vcpu_runstate_info *res,
+					  unsigned int cpu)
+{
+	int i;
+
+	xen_get_runstate_snapshot_cpu_delta(res, cpu);
+
+	for (i = 0; i < 4; i++)
+		res->time[i] += per_cpu(old_runstate_time, cpu)[i];
+}
+
+void xen_manage_runstate_time(int action)
+{
+	static struct vcpu_runstate_info *runstate_delta;
+	struct vcpu_runstate_info state;
+	int cpu, i;
+
+	switch (action) {
+	case -1: /* backup runstate time before suspend */
+		if (unlikely(runstate_delta))
+			pr_warn_once("%s: memory leak as runstate_delta is not NULL\n",
+					__func__);
+
+		runstate_delta = kmalloc_array(num_possible_cpus(),
+					sizeof(*runstate_delta),
+					GFP_ATOMIC);
+		if (unlikely(!runstate_delta)) {
+			pr_warn("%s: failed to allocate runstate_delta\n",
+					__func__);
+			return;
+		}
+
+		for_each_possible_cpu(cpu) {
+			xen_get_runstate_snapshot_cpu_delta(&state, cpu);
+			memcpy(runstate_delta[cpu].time, state.time,
+					sizeof(runstate_delta[cpu].time));
+		}
+
+		break;
+
+	case 0: /* backup runstate time after resume */
+		if (unlikely(!runstate_delta)) {
+			pr_warn("%s: cannot accumulate runstate time as runstate_delta is NULL\n",
+					__func__);
+			return;
+		}
+
+		for_each_possible_cpu(cpu) {
+			for (i = 0; i < 4; i++)
+				per_cpu(old_runstate_time, cpu)[i] +=
+					runstate_delta[cpu].time[i];
+		}
+
+		break;
+
+	default: /* do not accumulate runstate time for checkpointing */
+		break;
+	}
+
+	if (action != -1 && runstate_delta) {
+		kfree(runstate_delta);
+		runstate_delta = NULL;
+	}
+}
+
 /*
  * Runstate accounting
  */
diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
index 218e6aa..0907227 100644
--- a/include/xen/xen-ops.h
+++ b/include/xen/xen-ops.h
@@ -32,6 +32,7 @@ void xen_resume_notifier_unregister(struct notifier_block *nb);
 bool xen_vcpu_stolen(int vcpu);
 void xen_setup_runstate_info(int cpu);
 void xen_time_setup_guest(void);
+void xen_manage_runstate_time(int action);
 void xen_get_runstate_snapshot(struct vcpu_runstate_info *res);
 u64 xen_steal_clock(int cpu);
 
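
For anyone testing this, the before/after /proc/stat samples quoted in the
commit message can be compared mechanically. The following is a minimal
user-space sketch, not part of the patch; parse_steal() and
steal_is_monotonic() are hypothetical helper names, and the sketch assumes
the usual /proc/stat column order (user, nice, system, idle, iowait, irq,
softirq, steal, ...).

```c
#include <assert.h>
#include <stdio.h>

/*
 * Return the steal column (the 8th numeric field) of one "cpu" line
 * from /proc/stat, or -1 if the line does not parse.
 */
static long long parse_steal(const char *line)
{
	char name[16];
	unsigned long long user, nice, sys, idle, iowait, irq, softirq, steal;

	assert(line != NULL);
	if (sscanf(line, "%15s %llu %llu %llu %llu %llu %llu %llu %llu",
		   name, &user, &nice, &sys, &idle, &iowait, &irq, &softirq,
		   &steal) != 9)
		return -1;
	return (long long)steal;
}

/*
 * Return 1 if steal did not go backwards between the two samples,
 * 0 if it decreased, and -1 if either line failed to parse.
 */
static int steal_is_monotonic(const char *before, const char *after)
{
	long long a = parse_steal(before);
	long long b = parse_steal(after);

	if (a < 0 || b < 0)
		return -1;
	return b >= a;
}
```

Fed the cpu0 lines quoted in the commit message, steal_is_monotonic()
reports a decrease (335 before, 312 after), which is the behaviour the
patch is meant to prevent.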