From patchwork Thu Oct 19 13:39:17 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Joao Martins X-Patchwork-Id: 10016899 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 80E3F60224 for ; Thu, 19 Oct 2017 13:42:00 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7390628B4D for ; Thu, 19 Oct 2017 13:42:00 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 6742928D18; Thu, 19 Oct 2017 13:42:00 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.2 required=2.0 tests=BAYES_00, RCVD_IN_DNSWL_MED, UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 88A6228B4D for ; Thu, 19 Oct 2017 13:41:59 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.84_2) (envelope-from ) id 1e5B3H-0002cD-8y; Thu, 19 Oct 2017 13:39:59 +0000 Received: from mail6.bemta6.messagelabs.com ([193.109.254.103]) by lists.xenproject.org with esmtp (Exim 4.84_2) (envelope-from ) id 1e5B3G-0002bg-EK for xen-devel@lists.xenproject.org; Thu, 19 Oct 2017 13:39:58 +0000 Received: from [85.158.143.35] by server-8.bemta-6.messagelabs.com id D4/15-13910-D2BA8E95; Thu, 19 Oct 2017 13:39:57 +0000 X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFupjkeJIrShJLcpLzFFi42KZM10+UFdr9Yt Ig5+ztS2+b5nM5MDocfjDFZYAxijWzLyk/IoE1owD1x6wFRz3qJi8O7+BcYNtFyMXh5DAZCaJ ZfvOsEA4vxklHn1fzwrhbGCUuLSzj72LkRPI6WaUOLBCGcRmE9CTaD3/mRnEFhFwkHj5bgkjS AOzwC4miVOrJ4MlhAUcJXpXtTGB2CwCqhIN3x+ADeIVsJM4sWQ2G4gtISAvsavtIiuIzSlgL7 Fo/VJWiGV2Ete3drJC1BhL9M3qY5nAyLeAkWEVo0ZxalFZapGukYVeUlFmekZJbmJmjq6hgZl ebmpxcWJ6ak5iUrFecn7uJkZgqDAAwQ7G82sDDzFKcjApifJ+rHoRKcSXlJ9SmZFYnBFfVJqT WnyIUYaDQ0mC9+BKoJxgUWp6akVaZg4waGHSEhw8SiK8U0DSvMUFibnFmekQqVOMxhzHNl3+w 8TRcfPuHyYhlrz8vFQpcd5bIKUCIKUZpXlwg2DRdIlRVkqYlxHoNCGegtSi3MwSVPlXjOIcjE rCvItBpvBk5pXA7XsFdAoT0Cns9mCnlCQipKQaGH02eP2d1XzFeEbX9hSVngWS+cVzJ89dVnb eYTPP36ehdy0cas+ZN027tTyH35qr4ZjoTZma3zvTRBzV8vxWHokWtzipVzPp+6lb1wX/er51 P7HR4NQE1mcWZzq2Rq+z12Saf27+tN6fB8NniNhe2pCgP4NBvt7Fr+0kS5JkRf3dmVMN57RKC imxFGckGmoxFxUnAgBzThF/oQIAAA== X-Env-Sender: joao.m.martins@oracle.com X-Msg-Ref: server-15.tower-21.messagelabs.com!1508420393!77816342!1 X-Originating-IP: [156.151.31.81] X-SpamReason: No, hits=0.0 required=7.0 tests=sa_preprocessor: VHJ1c3RlZCBJUDogMTU2LjE1MS4zMS44MSA9PiAyODgzMzk=\n X-StarScan-Received: X-StarScan-Version: 9.4.45; banners=-,-,- X-VirusChecked: Checked Received: (qmail 46820 invoked from network); 19 Oct 2017 13:39:54 -0000 Received: from userp1040.oracle.com (HELO userp1040.oracle.com) (156.151.31.81) by server-15.tower-21.messagelabs.com with DHE-RSA-AES256-GCM-SHA384 encrypted SMTP; 19 Oct 2017 13:39:54 -0000 Received: from aserv0021.oracle.com (aserv0021.oracle.com [141.146.126.233]) by userp1040.oracle.com (Sentrion-MTA-4.3.2/Sentrion-MTA-4.3.2) with ESMTP id v9JDdf9S006665 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 19 Oct 2017 13:39:42 GMT Received: from userv0122.oracle.com (userv0122.oracle.com [156.151.31.75]) by aserv0021.oracle.com (8.14.4/8.14.4) with ESMTP id v9JDdfMO019385 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 19 Oct 2017 13:39:41 GMT Received: from abhmp0003.oracle.com (abhmp0003.oracle.com [141.146.116.9]) by userv0122.oracle.com (8.14.4/8.14.4) with ESMTP id v9JDdeFP031540; Thu, 19 Oct 2017 13:39:40 GMT Received: from paddy.uk.oracle.com (/10.175.233.11) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Thu, 19 Oct 2017 06:39:39 -0700 From: Joao Martins To: linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org Date: Thu, 19 Oct 2017 14:39:17 +0100 Message-Id: <20171019133918.18367-5-joao.m.martins@oracle.com> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20171019133918.18367-1-joao.m.martins@oracle.com> References: <20171019133918.18367-1-joao.m.martins@oracle.com> X-Source-IP: aserv0021.oracle.com [141.146.126.233] Cc: Juergen Gross , x86@kernel.org, Andy Lutomirski , Ingo Molnar , Thomas Gleixner , "H. Peter Anvin" , Boris Ostrovsky , Joao Martins Subject: [Xen-devel] [PATCH v7 4/5] x86/xen/time: setup vcpu 0 time info page X-BeenThere: xen-devel@lists.xen.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: xen-devel-bounces@lists.xen.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP In order to support pvclock vdso on xen we need to setup the time info page for vcpu 0 and register the page with Xen using the VCPUOP_register_vcpu_time_memory_area hypercall. This hypercall will also forcefully update the pvti which will set some of the necessary flags for vdso. Afterwards we check if it supports the PVCLOCK_TSC_STABLE_BIT flag which is mandatory for having vdso/vsyscall support. And if so, it will set the cpu 0 pvti that will be later on used when mapping the vdso image. The xen headers are also updated to include the new hypercall for registering the secondary vcpu_time_info struct. Signed-off-by: Joao Martins Reviewed-by: Juergen Gross Reviewed-by: Boris Ostrovsky --- Changes since v6: * Add Boris RoB Changes since v5: * Move xen_setup_vsyscall_time_info within the PVCLOCK_TSC_STABLE_BIT clause added in predecessor patch. Changes since v4: * Remove pvclock_set_flags since predecessor patch will set in xen_time_init. Consequently pvti local variable is not so useful and doesn't make things more clear - therefore remove it. * Adjust comment on xen_setup_vsyscall_time_info() * Add Juergen's Reviewed-by (Retained as there wasn't functional changes) Changes since v3: (Comments from Juergen) * Remove _t added suffix from *GUEST_HANDLE* when sync vcpu.h with the latest Changes since v2: (Comments from Juergen) * Omit the blank after the cast on all 3 occurrences. * Change last VCLOCK_PVCLOCK message to be more descriptive * Sync the complete vcpu.h header instead of just adding the needed one. (IOW adding VCPUOP_get_physid) Changes since v1: * Check flags ahead to see if the primary clock can use PVCLOCK_TSC_STABLE_BIT even if secondary registration fails. (Comments from Boris) * Remove addr, addr variables; * Change first pr_debug to pr_warn; * Change last pr_debug to pr_notice; * Add routine to solely register secondary time info. * Move xen_clock to outside xen_setup_vsyscall_time_info to allow restore path to simply re-register secondary time info. Let us handle the restore path more gracefully without re-allocating a page. * Removed cpu argument from xen_setup_vsyscall_time_info() * Adjustment failed registration error messages/loglevel to be the same * Also teardown secondary time info on suspend Changes since RFC: (Comments from Boris and David) * Remove Kconfig option * Use get_zeroed_page/free/page * Remove the hypercall availability check * Unregister pvti with arg.addr.v = NULL if stable bit isn't supported. (New) * Set secondary copy on restore such that it works on migration. * Drop global xen_clock variable and stash it locally on xen_setup_vsyscall_time_info. * WARN_ON(ret) if we fail to unregister the pvti. --- arch/x86/xen/suspend.c | 4 ++ arch/x86/xen/time.c | 90 +++++++++++++++++++++++++++++++++++++++++++- arch/x86/xen/xen-ops.h | 2 + include/xen/interface/vcpu.h | 42 +++++++++++++++++++++ 4 files changed, 137 insertions(+), 1 deletion(-) diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c index d6b1680693a9..800ed36ecfba 100644 --- a/arch/x86/xen/suspend.c +++ b/arch/x86/xen/suspend.c @@ -16,6 +16,8 @@ void xen_arch_pre_suspend(void) { + xen_save_time_memory_area(); + if (xen_pv_domain()) xen_pv_pre_suspend(); } @@ -26,6 +28,8 @@ void xen_arch_post_suspend(int cancelled) xen_pv_post_suspend(cancelled); else xen_hvm_post_suspend(cancelled); + + xen_restore_time_memory_area(); } static void xen_vcpu_notify_restore(void *data) diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c index fc0148d3a70d..dec966fbe888 100644 --- a/arch/x86/xen/time.c +++ b/arch/x86/xen/time.c @@ -370,6 +370,92 @@ static const struct pv_time_ops xen_time_ops __initconst = { .steal_clock = xen_steal_clock, }; +static struct pvclock_vsyscall_time_info *xen_clock __read_mostly; + +void xen_save_time_memory_area(void) +{ + struct vcpu_register_time_memory_area t; + int ret; + + if (!xen_clock) + return; + + t.addr.v = NULL; + + ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area, 0, &t); + if (ret != 0) + pr_notice("Cannot save secondary vcpu_time_info (err %d)", + ret); + else + clear_page(xen_clock); +} + +void xen_restore_time_memory_area(void) +{ + struct vcpu_register_time_memory_area t; + int ret; + + if (!xen_clock) + return; + + t.addr.v = &xen_clock->pvti; + + ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area, 0, &t); + + /* + * We don't disable VCLOCK_PVCLOCK entirely if it fails to register the + * secondary time info with Xen or if we migrated to a host without the + * necessary flags. On both of these cases what happens is either + * process seeing a zeroed out pvti or seeing no PVCLOCK_TSC_STABLE_BIT + * bit set. Userspace checks the latter and if 0, it discards the data + * in pvti and fallbacks to a system call for a reliable timestamp. + */ + if (ret != 0) + pr_notice("Cannot restore secondary vcpu_time_info (err %d)", + ret); +} + +static void xen_setup_vsyscall_time_info(void) +{ + struct vcpu_register_time_memory_area t; + struct pvclock_vsyscall_time_info *ti; + int ret; + + ti = (struct pvclock_vsyscall_time_info *)get_zeroed_page(GFP_KERNEL); + if (!ti) + return; + + t.addr.v = &ti->pvti; + + ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area, 0, &t); + if (ret) { + pr_notice("xen: VCLOCK_PVCLOCK not supported (err %d)\n", ret); + free_page((unsigned long)ti); + return; + } + + /* + * If primary time info had this bit set, secondary should too since + * it's the same data on both just different memory regions. But we + * still check it in case hypervisor is buggy. + */ + if (!(ti->pvti.flags & PVCLOCK_TSC_STABLE_BIT)) { + t.addr.v = NULL; + ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area, + 0, &t); + if (!ret) + free_page((unsigned long)ti); + + pr_notice("xen: VCLOCK_PVCLOCK not supported (tsc unstable)\n"); + return; + } + + xen_clock = ti; + pvclock_set_pvti_cpu0_va(xen_clock); + + xen_clocksource.archdata.vclock_mode = VCLOCK_PVCLOCK; +} + static void __init xen_time_init(void) { struct pvclock_vcpu_time_info *pvti; @@ -401,8 +487,10 @@ static void __init xen_time_init(void) * bit is supported hence speeding up Xen clocksource. */ pvti = &__this_cpu_read(xen_vcpu)->time; - if (pvti->flags & PVCLOCK_TSC_STABLE_BIT) + if (pvti->flags & PVCLOCK_TSC_STABLE_BIT) { pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT); + xen_setup_vsyscall_time_info(); + } xen_setup_runstate_info(cpu); xen_setup_timer(cpu); diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h index c8a6d224f7ed..f96dbedb33d4 100644 --- a/arch/x86/xen/xen-ops.h +++ b/arch/x86/xen/xen-ops.h @@ -69,6 +69,8 @@ void xen_setup_runstate_info(int cpu); void xen_teardown_timer(int cpu); u64 xen_clocksource_read(void); void xen_setup_cpu_clockevents(void); +void xen_save_time_memory_area(void); +void xen_restore_time_memory_area(void); void __init xen_init_time_ops(void); void __init xen_hvm_init_time_ops(void); diff --git a/include/xen/interface/vcpu.h b/include/xen/interface/vcpu.h index 98188c87f5c1..504c71601511 100644 --- a/include/xen/interface/vcpu.h +++ b/include/xen/interface/vcpu.h @@ -178,4 +178,46 @@ DEFINE_GUEST_HANDLE_STRUCT(vcpu_register_vcpu_info); /* Send an NMI to the specified VCPU. @extra_arg == NULL. */ #define VCPUOP_send_nmi 11 + +/* + * Get the physical ID information for a pinned vcpu's underlying physical + * processor. The physical ID informmation is architecture-specific. + * On x86: id[31:0]=apic_id, id[63:32]=acpi_id. + * This command returns -EINVAL if it is not a valid operation for this VCPU. + */ +#define VCPUOP_get_physid 12 /* arg == vcpu_get_physid_t */ +struct vcpu_get_physid { + uint64_t phys_id; +}; +DEFINE_GUEST_HANDLE_STRUCT(vcpu_get_physid); +#define xen_vcpu_physid_to_x86_apicid(physid) ((uint32_t)(physid)) +#define xen_vcpu_physid_to_x86_acpiid(physid) ((uint32_t)((physid) >> 32)) + +/* + * Register a memory location to get a secondary copy of the vcpu time + * parameters. The master copy still exists as part of the vcpu shared + * memory area, and this secondary copy is updated whenever the master copy + * is updated (and using the same versioning scheme for synchronisation). + * + * The intent is that this copy may be mapped (RO) into userspace so + * that usermode can compute system time using the time info and the + * tsc. Usermode will see an array of vcpu_time_info structures, one + * for each vcpu, and choose the right one by an existing mechanism + * which allows it to get the current vcpu number (such as via a + * segment limit). It can then apply the normal algorithm to compute + * system time from the tsc. + * + * @extra_arg == pointer to vcpu_register_time_info_memory_area structure. + */ +#define VCPUOP_register_vcpu_time_memory_area 13 +DEFINE_GUEST_HANDLE_STRUCT(vcpu_time_info); +struct vcpu_register_time_memory_area { + union { + GUEST_HANDLE(vcpu_time_info) h; + struct pvclock_vcpu_time_info *v; + uint64_t p; + } addr; +}; +DEFINE_GUEST_HANDLE_STRUCT(vcpu_register_time_memory_area); + #endif /* __XEN_PUBLIC_VCPU_H__ */