diff mbox

[v5,3/4] x86/xen/time: setup vcpu 0 time info page

Message ID 20171002183122.3961-4-joao.m.martins@oracle.com (mailing list archive)
State New, archived
Headers show

Commit Message

Joao Martins Oct. 2, 2017, 6:31 p.m. UTC
In order to support pvclock vdso on xen we need to setup the time
info page for vcpu 0 and register the page with Xen using the
VCPUOP_register_vcpu_time_memory_area hypercall. This hypercall
will also forcefully update the pvti which will set some of the
necessary flags for vdso. Afterwards we check if it supports the
PVCLOCK_TSC_STABLE_BIT flag which is mandatory for having
vdso/vsyscall support. And if so, it will set the cpu 0 pvti that
will be later on used when mapping the vdso image.

The xen headers are also updated to include the new hypercall for
registering the secondary vcpu_time_info struct.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
---
Changes since v4:
 * Remove pvclock_set_flags since predecessor patch will set in
 xen_time_init. Consequently pvti local variable is not so useful
 and doesn't make things more clear - therefore remove it.
 * Adjust comment on xen_setup_vsyscall_time_info()
 * Add Juergen's Reviewed-by (Retained as there wasn't functional
 changes)

Changes since v3:
 (Comments from Juergen)
 * Remove _t added suffix from *GUEST_HANDLE* when sync vcpu.h
 with the latest

Changes since v2:
 (Comments from Juergen)
 * Omit the blank after the cast on all 3 occurrences.
 * Change last VCLOCK_PVCLOCK message to be more descriptive
 * Sync the complete vcpu.h header instead of just adding the
 needed one. (IOW adding VCPUOP_get_physid)

Changes since v1:
 * Check flags ahead to see if the  primary clock can use
 PVCLOCK_TSC_STABLE_BIT even if secondary registration fails.
 (Comments from Boris)
 * Remove addr, addr variables;
 * Change first pr_debug to pr_warn;
 * Change last pr_debug to pr_notice;
 * Add routine to solely register secondary time info.
 * Move xen_clock to outside xen_setup_vsyscall_time_info to allow
 restore path to simply re-register secondary time info. Let us
 handle the restore path more gracefully without re-allocating a
 page.
 * Removed cpu argument from xen_setup_vsyscall_time_info()
 * Adjustment failed registration error messages/loglevel to be the same
 * Also teardown secondary time info on suspend

Changes since RFC:
 (Comments from Boris and David)
 * Remove Kconfig option
 * Use get_zeroed_page/free/page
 * Remove the hypercall availability check
 * Unregister pvti with arg.addr.v = NULL if stable bit isn't supported.
 (New)
 * Set secondary copy on restore such that it works on migration.
 * Drop global xen_clock variable and stash it locally on
 xen_setup_vsyscall_time_info.
 * WARN_ON(ret) if we fail to unregister the pvti.
---
 arch/x86/xen/suspend.c       |  4 ++
 arch/x86/xen/time.c          | 87 ++++++++++++++++++++++++++++++++++++++++++++
 arch/x86/xen/xen-ops.h       |  2 +
 include/xen/interface/vcpu.h | 42 +++++++++++++++++++++
 4 files changed, 135 insertions(+)

Comments

Boris Ostrovsky Oct. 2, 2017, 6:44 p.m. UTC | #1
> +
> +static void xen_setup_vsyscall_time_info(void)
> +{
> +	struct vcpu_register_time_memory_area t;
> +	struct pvclock_vsyscall_time_info *ti;
> +	int ret;


In the previous version you'd return immediately if
PVCLOCK_TSC_STABLE_BIT was not set. Don't you still need to check this?
Especially give...


> +
> +	ti = (struct pvclock_vsyscall_time_info *)get_zeroed_page(GFP_KERNEL);
> +	if (!ti)
> +		return;
> +
> +	t.addr.v = &ti->pvti;
> +
> +	ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area, 0, &t);
> +	if (ret) {
> +		pr_notice("xen: VCLOCK_PVCLOCK not supported (err %d)\n", ret);
> +		free_page((unsigned long)ti);
> +		return;
> +	}
> +
> +	/*
> +	 * If primary time info had this bit set, secondary should too since

... this comment?

-boris

> +	 * it's the same data on both just different memory regions. But we
> +	 * still check it in case hypervisor is buggy.
> +	 */
> +	if (!(ti->pvti.flags & PVCLOCK_TSC_STABLE_BIT)) {
> +		t.addr.v = NULL;
> +		ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area,
> +					 0, &t);
> +		if (!ret)
> +			free_page((unsigned long)ti);
> +
> +		pr_notice("xen: VCLOCK_PVCLOCK not supported (tsc unstable)\n");
> +		return;
> +	}
> +
> +	xen_clock = ti;
> +	pvclock_set_pvti_cpu0_va(xen_clock);
> +
> +	xen_clocksource.archdata.vclock_mode = VCLOCK_PVCLOCK;
> +}
> +
>
Joao Martins Oct. 2, 2017, 8:58 p.m. UTC | #2
On 10/02/2017 07:44 PM, Boris Ostrovsky wrote:
> 
>> +
>> +static void xen_setup_vsyscall_time_info(void)
>> +{
>> +	struct vcpu_register_time_memory_area t;
>> +	struct pvclock_vsyscall_time_info *ti;
>> +	int ret;
> 
> 
> In the previous version you'd return immediately if
> PVCLOCK_TSC_STABLE_BIT was not set. Don't you still need to check this?
> Especially give...
> 
Yes, my mistake.

When moving the primary info check I changed the comment below, but should have
moved the call to xen_setup_vsyscall_time_info() into the newly added if ()
clause added in the previous patch. Let me move that inside the conditional and
respin in v6.

Joao

> 
>> +
>> +	ti = (struct pvclock_vsyscall_time_info *)get_zeroed_page(GFP_KERNEL);
>> +	if (!ti)
>> +		return;
>> +
>> +	t.addr.v = &ti->pvti;
>> +
>> +	ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area, 0, &t);
>> +	if (ret) {
>> +		pr_notice("xen: VCLOCK_PVCLOCK not supported (err %d)\n", ret);
>> +		free_page((unsigned long)ti);
>> +		return;
>> +	}
>> +
>> +	/*
>> +	 * If primary time info had this bit set, secondary should too since
> 
> ... this comment?
> 
> -boris
diff mbox

Patch

diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
index d6b1680693a9..800ed36ecfba 100644
--- a/arch/x86/xen/suspend.c
+++ b/arch/x86/xen/suspend.c
@@ -16,6 +16,8 @@ 
 
 void xen_arch_pre_suspend(void)
 {
+	xen_save_time_memory_area();
+
 	if (xen_pv_domain())
 		xen_pv_pre_suspend();
 }
@@ -26,6 +28,8 @@  void xen_arch_post_suspend(int cancelled)
 		xen_pv_post_suspend(cancelled);
 	else
 		xen_hvm_post_suspend(cancelled);
+
+	xen_restore_time_memory_area();
 }
 
 static void xen_vcpu_notify_restore(void *data)
diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index fc0148d3a70d..aa8bb87601f3 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -370,6 +370,92 @@  static const struct pv_time_ops xen_time_ops __initconst = {
 	.steal_clock = xen_steal_clock,
 };
 
+static struct pvclock_vsyscall_time_info *xen_clock __read_mostly;
+
+void xen_save_time_memory_area(void)
+{
+	struct vcpu_register_time_memory_area t;
+	int ret;
+
+	if (!xen_clock)
+		return;
+
+	t.addr.v = NULL;
+
+	ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area, 0, &t);
+	if (ret != 0)
+		pr_notice("Cannot save secondary vcpu_time_info (err %d)",
+			  ret);
+	else
+		clear_page(xen_clock);
+}
+
+void xen_restore_time_memory_area(void)
+{
+	struct vcpu_register_time_memory_area t;
+	int ret;
+
+	if (!xen_clock)
+		return;
+
+	t.addr.v = &xen_clock->pvti;
+
+	ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area, 0, &t);
+
+	/*
+	 * We don't disable VCLOCK_PVCLOCK entirely if it fails to register the
+	 * secondary time info with Xen or if we migrated to a host without the
+	 * necessary flags. On both of these cases what happens is either
+	 * process seeing a zeroed out pvti or seeing no PVCLOCK_TSC_STABLE_BIT
+	 * bit set. Userspace checks the latter and if 0, it discards the data
+	 * in pvti and fallbacks to a system call for a reliable timestamp.
+	 */
+	if (ret != 0)
+		pr_notice("Cannot restore secondary vcpu_time_info (err %d)",
+			  ret);
+}
+
+static void xen_setup_vsyscall_time_info(void)
+{
+	struct vcpu_register_time_memory_area t;
+	struct pvclock_vsyscall_time_info *ti;
+	int ret;
+
+	ti = (struct pvclock_vsyscall_time_info *)get_zeroed_page(GFP_KERNEL);
+	if (!ti)
+		return;
+
+	t.addr.v = &ti->pvti;
+
+	ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area, 0, &t);
+	if (ret) {
+		pr_notice("xen: VCLOCK_PVCLOCK not supported (err %d)\n", ret);
+		free_page((unsigned long)ti);
+		return;
+	}
+
+	/*
+	 * If primary time info had this bit set, secondary should too since
+	 * it's the same data on both just different memory regions. But we
+	 * still check it in case hypervisor is buggy.
+	 */
+	if (!(ti->pvti.flags & PVCLOCK_TSC_STABLE_BIT)) {
+		t.addr.v = NULL;
+		ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area,
+					 0, &t);
+		if (!ret)
+			free_page((unsigned long)ti);
+
+		pr_notice("xen: VCLOCK_PVCLOCK not supported (tsc unstable)\n");
+		return;
+	}
+
+	xen_clock = ti;
+	pvclock_set_pvti_cpu0_va(xen_clock);
+
+	xen_clocksource.archdata.vclock_mode = VCLOCK_PVCLOCK;
+}
+
 static void __init xen_time_init(void)
 {
 	struct pvclock_vcpu_time_info *pvti;
@@ -405,6 +491,7 @@  static void __init xen_time_init(void)
 		pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
 
 	xen_setup_runstate_info(cpu);
+	xen_setup_vsyscall_time_info();
 	xen_setup_timer(cpu);
 	xen_setup_cpu_clockevents();
 
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index c8a6d224f7ed..f96dbedb33d4 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -69,6 +69,8 @@  void xen_setup_runstate_info(int cpu);
 void xen_teardown_timer(int cpu);
 u64 xen_clocksource_read(void);
 void xen_setup_cpu_clockevents(void);
+void xen_save_time_memory_area(void);
+void xen_restore_time_memory_area(void);
 void __init xen_init_time_ops(void);
 void __init xen_hvm_init_time_ops(void);
 
diff --git a/include/xen/interface/vcpu.h b/include/xen/interface/vcpu.h
index 98188c87f5c1..504c71601511 100644
--- a/include/xen/interface/vcpu.h
+++ b/include/xen/interface/vcpu.h
@@ -178,4 +178,46 @@  DEFINE_GUEST_HANDLE_STRUCT(vcpu_register_vcpu_info);
 
 /* Send an NMI to the specified VCPU. @extra_arg == NULL. */
 #define VCPUOP_send_nmi             11
+
+/*
+ * Get the physical ID information for a pinned vcpu's underlying physical
+ * processor.  The physical ID informmation is architecture-specific.
+ * On x86: id[31:0]=apic_id, id[63:32]=acpi_id.
+ * This command returns -EINVAL if it is not a valid operation for this VCPU.
+ */
+#define VCPUOP_get_physid           12 /* arg == vcpu_get_physid_t */
+struct vcpu_get_physid {
+	uint64_t phys_id;
+};
+DEFINE_GUEST_HANDLE_STRUCT(vcpu_get_physid);
+#define xen_vcpu_physid_to_x86_apicid(physid) ((uint32_t)(physid))
+#define xen_vcpu_physid_to_x86_acpiid(physid) ((uint32_t)((physid) >> 32))
+
+/*
+ * Register a memory location to get a secondary copy of the vcpu time
+ * parameters.  The master copy still exists as part of the vcpu shared
+ * memory area, and this secondary copy is updated whenever the master copy
+ * is updated (and using the same versioning scheme for synchronisation).
+ *
+ * The intent is that this copy may be mapped (RO) into userspace so
+ * that usermode can compute system time using the time info and the
+ * tsc.  Usermode will see an array of vcpu_time_info structures, one
+ * for each vcpu, and choose the right one by an existing mechanism
+ * which allows it to get the current vcpu number (such as via a
+ * segment limit).  It can then apply the normal algorithm to compute
+ * system time from the tsc.
+ *
+ * @extra_arg == pointer to vcpu_register_time_info_memory_area structure.
+ */
+#define VCPUOP_register_vcpu_time_memory_area   13
+DEFINE_GUEST_HANDLE_STRUCT(vcpu_time_info);
+struct vcpu_register_time_memory_area {
+	union {
+		GUEST_HANDLE(vcpu_time_info) h;
+		struct pvclock_vcpu_time_info *v;
+		uint64_t p;
+	} addr;
+};
+DEFINE_GUEST_HANDLE_STRUCT(vcpu_register_time_memory_area);
+
 #endif /* __XEN_PUBLIC_VCPU_H__ */