From patchwork Tue Jan 7 23:43:06 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anchal Agarwal X-Patchwork-Id: 11322379 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id AB9F4138D for ; Tue, 7 Jan 2020 23:43:36 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 6A819222D9 for ; Tue, 7 Jan 2020 23:43:36 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="vFyaKuFv" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6A819222D9 Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=amazon.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id A5EA98E0009; Tue, 7 Jan 2020 18:43:35 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id A0EA58E0001; Tue, 7 Jan 2020 18:43:35 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8FEEC8E0009; Tue, 7 Jan 2020 18:43:35 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0233.hostedemail.com [216.40.44.233]) by kanga.kvack.org (Postfix) with ESMTP id 79B1D8E0001 for ; Tue, 7 Jan 2020 18:43:35 -0500 (EST) Received: from smtpin18.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with SMTP id 32FB18249980 for ; Tue, 7 Jan 2020 23:43:35 +0000 (UTC) X-FDA: 76352467590.18.lunch89_596996dea971d X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,prvs=268257c0c=anchalag@amazon.com,:tglx@linutronix.de:mingo@redhat.com:bp@alien8.de:hpa@zytor.com:x86@kernel.org:boris.ostrovsky@oracle.com:jgross@suse.com:linux-pm@vger.kernel.org::kamatam@amazon.com:sstabellini@kernel.org:konrad.wilk@oracle.co:roger.pau@citrix.com:axboe@kernel.dk:davem@davemloft.net:rjw@rjwysocki.net:len.brown@intel.com:pavel@ucw.cz:peterz@infradead.org:eduval@amazon.com:sblbir@amazon.com:anchalag@amazon.com:xen-devel@lists.xenproject.org:vkuznets@redhat.com:netdev@vger.kernel.org:linux-kernel@vger.kernel.org:woodhouse@dev-dsk-anchalag-2a-9c2d1d96.us-west-2.amazon.com:dwmw@amazon.co.uk:fllinden@amaozn.com:anchalag@amazon.com,RULES_HIT:30012:30034:30045:30054:30064:30069:30070:30075:30079:30080,0,RBL:207.171.184.29:@amazon.com:.lbl8.mailshell.net-62.18.0.100 66.10.201.10,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:25,LUA_SUMMARY:none X-HE-Tag: lunch89_596996dea971d X-Filterd-Recvd-Size: 8635 Received: from smtp-fw-9102.amazon.com (smtp-fw-9102.amazon.com [207.171.184.29]) by imf16.hostedemail.com (Postfix) with ESMTP for ; Tue, 7 Jan 2020 23:43:34 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1578440615; x=1609976615; h=date:from:to:cc:subject:message-id:mime-version; bh=rdcg1lWycoz2N6RBe/TY1zMNmoMIsPWFhPaqjiy0cqo=; b=vFyaKuFvdk6gTKUientBsLj0fvn4rbn/bmbVNCCEJzBpM+jDCusRRwJX KuXQzsoDkfPtZVQ5H6Z/ytAbz9hMlBTKmtXgHKN+4/1Iw4GJVk5LpBxvT OyBFCtb4ne8FSLrSyT0OSIiXWGMURTnPnAFl2/ESy8TgahkHFsIhIJuBy 0=; IronPort-SDR: axEFrB4PTHlFKe/HXSD29MdTq89JZXh7B1fKqv3wb6QDgYRWQBHq3hPDRCvX3S368jrGZCfCgc QU7WOYAKbDQA== X-IronPort-AV: E=Sophos;i="5.69,407,1571702400"; d="scan'208";a="17326029" Received: from sea32-co-svc-lb4-vlan3.sea.corp.amazon.com (HELO email-inbound-relay-1e-a70de69e.us-east-1.amazon.com) ([10.47.23.38]) by smtp-border-fw-out-9102.sea19.amazon.com with ESMTP; 07 Jan 2020 23:43:32 +0000 Received: from EX13MTAUEE002.ant.amazon.com (iad55-ws-svc-p15-lb9-vlan2.iad.amazon.com [10.40.159.162]) by email-inbound-relay-1e-a70de69e.us-east-1.amazon.com (Postfix) with ESMTPS id E2104A01E2; Tue, 7 Jan 2020 23:43:24 +0000 (UTC) Received: from EX13D08UEE002.ant.amazon.com (10.43.62.92) by EX13MTAUEE002.ant.amazon.com (10.43.62.24) with Microsoft SMTP Server (TLS) id 15.0.1367.3; Tue, 7 Jan 2020 23:43:06 +0000 Received: from EX13MTAUEA002.ant.amazon.com (10.43.61.77) by EX13D08UEE002.ant.amazon.com (10.43.62.92) with Microsoft SMTP Server (TLS) id 15.0.1367.3; Tue, 7 Jan 2020 23:43:06 +0000 Received: from dev-dsk-anchalag-2a-9c2d1d96.us-west-2.amazon.com (172.22.96.68) by mail-relay.amazon.com (10.43.61.169) with Microsoft SMTP Server id 15.0.1236.3 via Frontend Transport; Tue, 7 Jan 2020 23:43:06 +0000 Received: by dev-dsk-anchalag-2a-9c2d1d96.us-west-2.amazon.com (Postfix, from userid 4335130) id 0A80440E65; Tue, 7 Jan 2020 23:43:06 +0000 (UTC) Date: Tue, 7 Jan 2020 23:43:06 +0000 From: Anchal Agarwal To: , , , , , , , , , , , , , , , , , , , , , , , , , , , , CC: Subject: [RFC PATCH V2 07/11] x86/xen: save and restore steal clock during hibernation Message-ID: <20200107234306.GA18610@dev-dsk-anchalag-2a-9c2d1d96.us-west-2.amazon.com> MIME-Version: 1.0 Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Munehisa Kamata Currently, steal time accounting code in scheduler expects steal clock callback to provide monotonically increasing value. If the accounting code receives a smaller value than previous one, it uses a negative value to calculate steal time and results in incorrectly updated idle and steal time accounting. This breaks userspace tools which read /proc/stat. top - 08:05:35 up 2:12, 3 users, load average: 0.00, 0.07, 0.23 Tasks: 80 total, 1 running, 79 sleeping, 0 stopped, 0 zombie Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,30100.0%id, 0.0%wa, 0.0%hi, 0.0%si, -1253874204672.0%st This can actually happen when a Xen PVHVM guest gets restored from hibernation, because such a restored guest is just a fresh domain from Xen perspective and the time information in runstate info starts over from scratch. Introduce xen_save_steal_clock() which saves current steal clock values of all present CPUs in runstate info into per-cpu variables during system core ops suspend callbacks. Its couterpart, xen_restore_steal_clock(), restores a boot CPU's steal clock in the system core resume callback. It sets offset if it found the current values in runstate info are smaller than previous ones. xen_steal_clock() is also modified to use the offset to ensure that scheduler only sees monotonically increasing number. For non-boot CPUs, restore after they're brought up, because runstate info for non-boot CPUs are not active until then. [Anchal Changelog: Merged patch xen/time: introduce xen_{save,restore}_steal_clock with this one for better code readability] Signed-off-by: Anchal Agarwal Signed-off-by: Munehisa Kamata --- arch/x86/xen/suspend.c | 13 ++++++++++++- arch/x86/xen/time.c | 3 +++ drivers/xen/time.c | 28 +++++++++++++++++++++++++++- include/xen/xen-ops.h | 2 ++ 4 files changed, 44 insertions(+), 2 deletions(-) diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c index 784c4484100b..dae0f74f5390 100644 --- a/arch/x86/xen/suspend.c +++ b/arch/x86/xen/suspend.c @@ -91,12 +91,20 @@ void xen_arch_suspend(void) static int xen_syscore_suspend(void) { struct xen_remove_from_physmap xrfp; - int ret; + int cpu, ret; /* Xen suspend does similar stuffs in its own logic */ if (xen_suspend_mode_is_xen_suspend()) return 0; + for_each_present_cpu(cpu) { + /* + * Nonboot CPUs are already offline, but the last copy of + * runstate info is still accessible. + */ + xen_save_steal_clock(cpu); + } + xrfp.domid = DOMID_SELF; xrfp.gpfn = __pa(HYPERVISOR_shared_info) >> PAGE_SHIFT; @@ -118,6 +126,9 @@ static void xen_syscore_resume(void) pvclock_resume(); + /* Nonboot CPUs will be resumed when they're brought up */ + xen_restore_steal_clock(smp_processor_id()); + gnttab_resume(); } diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c index befbdd8b17f0..8cf632dda605 100644 --- a/arch/x86/xen/time.c +++ b/arch/x86/xen/time.c @@ -537,6 +537,9 @@ static void xen_hvm_setup_cpu_clockevents(void) { int cpu = smp_processor_id(); xen_setup_runstate_info(cpu); + if (cpu) + xen_restore_steal_clock(cpu); + /* * xen_setup_timer(cpu) - snprintf is bad in atomic context. Hence * doing it xen_hvm_cpu_notify (which gets called by smp_init during diff --git a/drivers/xen/time.c b/drivers/xen/time.c index 0968859c29d0..3713d716070c 100644 --- a/drivers/xen/time.c +++ b/drivers/xen/time.c @@ -20,6 +20,8 @@ /* runstate info updated by Xen */ static DEFINE_PER_CPU(struct vcpu_runstate_info, xen_runstate); +static DEFINE_PER_CPU(u64, xen_prev_steal_clock); +static DEFINE_PER_CPU(u64, xen_steal_clock_offset); static DEFINE_PER_CPU(u64[4], old_runstate_time); @@ -149,7 +151,7 @@ bool xen_vcpu_stolen(int vcpu) return per_cpu(xen_runstate, vcpu).state == RUNSTATE_runnable; } -u64 xen_steal_clock(int cpu) +static u64 __xen_steal_clock(int cpu) { struct vcpu_runstate_info state; @@ -157,6 +159,30 @@ u64 xen_steal_clock(int cpu) return state.time[RUNSTATE_runnable] + state.time[RUNSTATE_offline]; } +u64 xen_steal_clock(int cpu) +{ + return __xen_steal_clock(cpu) + per_cpu(xen_steal_clock_offset, cpu); +} + +void xen_save_steal_clock(int cpu) +{ + per_cpu(xen_prev_steal_clock, cpu) = xen_steal_clock(cpu); +} + +void xen_restore_steal_clock(int cpu) +{ + u64 steal_clock = __xen_steal_clock(cpu); + + if (per_cpu(xen_prev_steal_clock, cpu) > steal_clock) { + /* Need to update the offset */ + per_cpu(xen_steal_clock_offset, cpu) = + per_cpu(xen_prev_steal_clock, cpu) - steal_clock; + } else { + /* Avoid unnecessary steal clock warp */ + per_cpu(xen_steal_clock_offset, cpu) = 0; + } +} + void xen_setup_runstate_info(int cpu) { struct vcpu_register_runstate_memory_area area; diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h index 3b3992b5b0c2..12b3f4474a05 100644 --- a/include/xen/xen-ops.h +++ b/include/xen/xen-ops.h @@ -37,6 +37,8 @@ void xen_time_setup_guest(void); void xen_manage_runstate_time(int action); void xen_get_runstate_snapshot(struct vcpu_runstate_info *res); u64 xen_steal_clock(int cpu); +void xen_save_steal_clock(int cpu); +void xen_restore_steal_clock(int cpu); int xen_setup_shutdown_event(void);