Message ID | 1485365586-21653-3-git-send-email-joao.m.martins@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 01/25/2017 12:33 PM, Joao Martins wrote: > In order to support pvclock vdso on xen we need to setup the time > info page for vcpu 0 and register the page with Xen using the > VCPUOP_register_vcpu_time_memory_area hypercall. This hypercall > will also forcefully update the pvti which will set some of the > necessary flags for vdso. Afterwards we check if it supports the > PVCLOCK_TSC_STABLE_BIT flag which is mandatory for having > vdso/vsyscall support. And if so, it will set the cpu 0 pvti that > will be later on used when mapping the vdso image. > > The xen headers are also updated to include the new hypercall for > registering the secondary vcpu_time_info struct. > > Signed-off-by: Joao Martins <joao.m.martins@oracle.com> > --- > Changes since RFC: > (Comments from Boris and David) > * Remove Kconfig option > * Use get_zeroed_page/free/page > * Remove the hypercall availability check > * Unregister pvti with arg.addr.v = NULL if stable bit isn't supported. > (New) > * Set secondary copy on restore such that it works on migration. > * Drop global xen_clock variable and stash it locally on > xen_setup_vsyscall_time_info. > * WARN_ON(ret) if we fail to unregister the pvti. > --- > arch/x86/xen/enlighten.c | 2 ++ > arch/x86/xen/time.c | 51 ++++++++++++++++++++++++++++++++++++++++++++ > arch/x86/xen/xen-ops.h | 1 + > include/xen/interface/vcpu.h | 28 ++++++++++++++++++++++++ > 4 files changed, 82 insertions(+) > > diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c > index 51ef952..15d271d 100644 > --- a/arch/x86/xen/enlighten.c > +++ b/arch/x86/xen/enlighten.c > @@ -270,6 +270,8 @@ void xen_vcpu_restore(void) This is called for PV only. What about HVM? > HYPERVISOR_vcpu_op(VCPUOP_up, xen_vcpu_nr(cpu), NULL)) > BUG(); > } > + > + xen_setup_vsyscall_time_info(0); Do we need to tear down time memory area on VCPU suspend? > } > > static void __init xen_banner(void) > diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c > index 1e69956..e90f703 100644 > --- a/arch/x86/xen/time.c > +++ b/arch/x86/xen/time.c > @@ -367,6 +367,56 @@ static const struct pv_time_ops xen_time_ops __initconst = { > .steal_clock = xen_steal_clock, > }; > > +int xen_setup_vsyscall_time_info(int cpu) > +{ > + struct pvclock_vsyscall_time_info *xen_clock; > + struct vcpu_register_time_memory_area t; > + struct pvclock_vcpu_time_info *pvti; > + unsigned long addr; > + u8 flags; > + int ret; > + > + addr = get_zeroed_page(GFP_KERNEL); > + if (!addr) > + return -ENOMEM; > + > + xen_clock = (struct pvclock_vsyscall_time_info *) addr; > + memset(xen_clock, 0, PAGE_SIZE); You don't really need addr and there is no reason to memset the page to zero, given that you got it with get_zeroed_page(). > + > + t.addr.v = &xen_clock->pvti; > + > + ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area, > + cpu, &t); > + > + if (ret) { > + pr_debug("xen: cannot register vcpu_time_info err %d\n", ret); pr_warn() would be more appropriate I think. You also have blank line before 'if'. > + free_page(addr); > + return ret; > + } > + > + pvti = &xen_clock->pvti; > + flags = pvti->flags; I don't think you need these, given that you only reference flags once below. > + > + if (!(flags & PVCLOCK_TSC_STABLE_BIT)) { > + t.addr.v = NULL; > + ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area, > + cpu, &t); > + if (!ret) > + free_page(addr); > + > + WARN_ON(ret); Did someone ask for WARN_ON()? (from your changelog it looks like it could have been either David or me). I think pr_warn() is sufficient, just like above. > + pr_debug("xen: VCLOCK_PVCLOCK not supported\n"); pr_notice() -boris > + return -ENOTSUPP; > + } > + > + pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT); > + pvclock_set_pvti_cpu0_va(xen_clock); > + > + xen_clocksource.archdata.vclock_mode = VCLOCK_PVCLOCK; > + > + return 0; > +} > + > static void __init xen_time_init(void) > { > int cpu = smp_processor_id(); > @@ -393,6 +443,7 @@ static void __init xen_time_init(void) > setup_force_cpu_cap(X86_FEATURE_TSC); > > xen_setup_runstate_info(cpu); > + xen_setup_vsyscall_time_info(cpu); > xen_setup_timer(cpu); > xen_setup_cpu_clockevents(); > > diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h > index ac0a2b0..4036d15 100644 > --- a/arch/x86/xen/xen-ops.h > +++ b/arch/x86/xen/xen-ops.h > @@ -66,6 +66,7 @@ void __init xen_vmalloc_p2m_tree(void); > void xen_init_irq_ops(void); > void xen_setup_timer(int cpu); > void xen_setup_runstate_info(int cpu); > +int xen_setup_vsyscall_time_info(int cpu); > void xen_teardown_timer(int cpu); > u64 xen_clocksource_read(void); > void xen_setup_cpu_clockevents(void); > diff --git a/include/xen/interface/vcpu.h b/include/xen/interface/vcpu.h > index 98188c8..8da788c 100644 > --- a/include/xen/interface/vcpu.h > +++ b/include/xen/interface/vcpu.h > @@ -178,4 +178,32 @@ DEFINE_GUEST_HANDLE_STRUCT(vcpu_register_vcpu_info); > > /* Send an NMI to the specified VCPU. @extra_arg == NULL. */ > #define VCPUOP_send_nmi 11 > + > +/* > + * Register a memory location to get a secondary copy of the vcpu time > + * parameters. The master copy still exists as part of the vcpu shared > + * memory area, and this secondary copy is updated whenever the master copy > + * is updated (and using the same versioning scheme for synchronisation). > + * > + * The intent is that this copy may be mapped (RO) into userspace so > + * that usermode can compute system time using the time info and the > + * tsc. Usermode will see an array of vcpu_time_info structures, one > + * for each vcpu, and choose the right one by an existing mechanism > + * which allows it to get the current vcpu number (such as via a > + * segment limit). It can then apply the normal algorithm to compute > + * system time from the tsc. > + * > + * @extra_arg == pointer to vcpu_register_time_info_memory_area structure. > + */ > +#define VCPUOP_register_vcpu_time_memory_area 13 > +DEFINE_GUEST_HANDLE_STRUCT(vcpu_time_info_t); > +struct vcpu_register_time_memory_area { > + union { > + GUEST_HANDLE(vcpu_time_info_t) h; > + struct pvclock_vcpu_time_info *v; > + uint64_t p; > + } addr; > +}; > +DEFINE_GUEST_HANDLE_STRUCT(vcpu_register_time_memory_area_t); > + > #endif /* __XEN_PUBLIC_VCPU_H__ */
On 01/25/2017 07:26 PM, Boris Ostrovsky wrote: > On 01/25/2017 12:33 PM, Joao Martins wrote: >> In order to support pvclock vdso on xen we need to setup the time >> info page for vcpu 0 and register the page with Xen using the >> VCPUOP_register_vcpu_time_memory_area hypercall. This hypercall >> will also forcefully update the pvti which will set some of the >> necessary flags for vdso. Afterwards we check if it supports the >> PVCLOCK_TSC_STABLE_BIT flag which is mandatory for having >> vdso/vsyscall support. And if so, it will set the cpu 0 pvti that >> will be later on used when mapping the vdso image. >> >> The xen headers are also updated to include the new hypercall for >> registering the secondary vcpu_time_info struct. >> >> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> >> --- >> Changes since RFC: >> (Comments from Boris and David) >> * Remove Kconfig option >> * Use get_zeroed_page/free/page >> * Remove the hypercall availability check >> * Unregister pvti with arg.addr.v = NULL if stable bit isn't supported. >> (New) >> * Set secondary copy on restore such that it works on migration. >> * Drop global xen_clock variable and stash it locally on >> xen_setup_vsyscall_time_info. >> * WARN_ON(ret) if we fail to unregister the pvti. >> --- >> arch/x86/xen/enlighten.c | 2 ++ >> arch/x86/xen/time.c | 51 ++++++++++++++++++++++++++++++++++++++++++++ >> arch/x86/xen/xen-ops.h | 1 + >> include/xen/interface/vcpu.h | 28 ++++++++++++++++++++++++ >> 4 files changed, 82 insertions(+) >> >> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c >> index 51ef952..15d271d 100644 >> --- a/arch/x86/xen/enlighten.c >> +++ b/arch/x86/xen/enlighten.c >> @@ -270,6 +270,8 @@ void xen_vcpu_restore(void) > > This is called for PV only. What about HVM? The call is missing from xen_hvm_post_suspend(...), I will add it. > >> HYPERVISOR_vcpu_op(VCPUOP_up, xen_vcpu_nr(cpu), NULL)) >> BUG(); >> } >> + >> + xen_setup_vsyscall_time_info(0); > > Do we need to tear down time memory area on VCPU suspend? I also missed that; otherwise I am leaking a page. I could also rework this patch such that the initially allocated xen_clock page is reused and simply register/unregister in save/restore paths. This would probably mean adding one extra helper to register the vcpu_time info and perhaps make xen_setup_vsyscall_time_info a bit simpler. >> } >> >> static void __init xen_banner(void) >> diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c >> index 1e69956..e90f703 100644 >> --- a/arch/x86/xen/time.c >> +++ b/arch/x86/xen/time.c >> @@ -367,6 +367,56 @@ static const struct pv_time_ops xen_time_ops __initconst = { >> .steal_clock = xen_steal_clock, >> }; >> >> +int xen_setup_vsyscall_time_info(int cpu) >> +{ >> + struct pvclock_vsyscall_time_info *xen_clock; >> + struct vcpu_register_time_memory_area t; >> + struct pvclock_vcpu_time_info *pvti; >> + unsigned long addr; >> + u8 flags; >> + int ret; >> + >> + addr = get_zeroed_page(GFP_KERNEL); >> + if (!addr) >> + return -ENOMEM; >> + >> + xen_clock = (struct pvclock_vsyscall_time_info *) addr; >> + memset(xen_clock, 0, PAGE_SIZE); > > You don't really need addr The reason I had this was to avoid to save one cast to unsigned long (on free_page paths). But maybe it's not worth it and looking at the rest of the x86/xen code, this doesn't seem to be the case. I will remove it. > and there is no reason to memset the page to > zero, given that you got it with get_zeroed_page(). Yeap, Fixed. > >> + >> + t.addr.v = &xen_clock->pvti; >> + >> + ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area, >> + cpu, &t); >> + >> + if (ret) { >> + pr_debug("xen: cannot register vcpu_time_info err %d\n", ret); > > pr_warn() would be more appropriate I think. You also have blank line > before 'if'. Fixed. >> + free_page(addr); >> + return ret; >> + } >> + >> + pvti = &xen_clock->pvti; >> + flags = pvti->flags; > > I don't think you need these, given that you only reference flags once > below. > Indeed, fixed as well. >> + >> + if (!(flags & PVCLOCK_TSC_STABLE_BIT)) { >> + t.addr.v = NULL; >> + ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area, >> + cpu, &t); >> + if (!ret) >> + free_page(addr); >> + >> + WARN_ON(ret); > > Did someone ask for WARN_ON()? (from your changelog it looks like it > could have been either David or me). I think pr_warn() is sufficient, > just like above. Looking again at previous comments no, this WARN_ON wasn't asked. I can changed it to the similar above as you are suggesting. > >> + pr_debug("xen: VCLOCK_PVCLOCK not supported\n"); > > pr_notice() Fixed it too. > > -boris > >> + return -ENOTSUPP; >> + } >> + >> + pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT); >> + pvclock_set_pvti_cpu0_va(xen_clock); >> + >> + xen_clocksource.archdata.vclock_mode = VCLOCK_PVCLOCK; >> + >> + return 0; >> +} >> + >> static void __init xen_time_init(void) >> { >> int cpu = smp_processor_id(); >> @@ -393,6 +443,7 @@ static void __init xen_time_init(void) >> setup_force_cpu_cap(X86_FEATURE_TSC); >> >> xen_setup_runstate_info(cpu); >> + xen_setup_vsyscall_time_info(cpu); >> xen_setup_timer(cpu); >> xen_setup_cpu_clockevents(); >> >> diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h >> index ac0a2b0..4036d15 100644 >> --- a/arch/x86/xen/xen-ops.h >> +++ b/arch/x86/xen/xen-ops.h >> @@ -66,6 +66,7 @@ void __init xen_vmalloc_p2m_tree(void); >> void xen_init_irq_ops(void); >> void xen_setup_timer(int cpu); >> void xen_setup_runstate_info(int cpu); >> +int xen_setup_vsyscall_time_info(int cpu); >> void xen_teardown_timer(int cpu); >> u64 xen_clocksource_read(void); >> void xen_setup_cpu_clockevents(void); >> diff --git a/include/xen/interface/vcpu.h b/include/xen/interface/vcpu.h >> index 98188c8..8da788c 100644 >> --- a/include/xen/interface/vcpu.h >> +++ b/include/xen/interface/vcpu.h >> @@ -178,4 +178,32 @@ DEFINE_GUEST_HANDLE_STRUCT(vcpu_register_vcpu_info); >> >> /* Send an NMI to the specified VCPU. @extra_arg == NULL. */ >> #define VCPUOP_send_nmi 11 >> + >> +/* >> + * Register a memory location to get a secondary copy of the vcpu time >> + * parameters. The master copy still exists as part of the vcpu shared >> + * memory area, and this secondary copy is updated whenever the master copy >> + * is updated (and using the same versioning scheme for synchronisation). >> + * >> + * The intent is that this copy may be mapped (RO) into userspace so >> + * that usermode can compute system time using the time info and the >> + * tsc. Usermode will see an array of vcpu_time_info structures, one >> + * for each vcpu, and choose the right one by an existing mechanism >> + * which allows it to get the current vcpu number (such as via a >> + * segment limit). It can then apply the normal algorithm to compute >> + * system time from the tsc. >> + * >> + * @extra_arg == pointer to vcpu_register_time_info_memory_area structure. >> + */ >> +#define VCPUOP_register_vcpu_time_memory_area 13 >> +DEFINE_GUEST_HANDLE_STRUCT(vcpu_time_info_t); >> +struct vcpu_register_time_memory_area { >> + union { >> + GUEST_HANDLE(vcpu_time_info_t) h; >> + struct pvclock_vcpu_time_info *v; >> + uint64_t p; >> + } addr; >> +}; >> +DEFINE_GUEST_HANDLE_STRUCT(vcpu_register_time_memory_area_t); >> + >> #endif /* __XEN_PUBLIC_VCPU_H__ */ > >
On Wed, Jan 25, 2017 at 9:33 AM, Joao Martins <joao.m.martins@oracle.com> wrote: > In order to support pvclock vdso on xen we need to setup the time > info page for vcpu 0 and register the page with Xen using the > VCPUOP_register_vcpu_time_memory_area hypercall. This hypercall > will also forcefully update the pvti which will set some of the > necessary flags for vdso. Afterwards we check if it supports the > PVCLOCK_TSC_STABLE_BIT flag which is mandatory for having > vdso/vsyscall support. And if so, it will set the cpu 0 pvti that > will be later on used when mapping the vdso image. > > The xen headers are also updated to include the new hypercall for > registering the secondary vcpu_time_info struct. No objections from me assuming the code is correct.
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c index 51ef952..15d271d 100644 --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -270,6 +270,8 @@ void xen_vcpu_restore(void) HYPERVISOR_vcpu_op(VCPUOP_up, xen_vcpu_nr(cpu), NULL)) BUG(); } + + xen_setup_vsyscall_time_info(0); } static void __init xen_banner(void) diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c index 1e69956..e90f703 100644 --- a/arch/x86/xen/time.c +++ b/arch/x86/xen/time.c @@ -367,6 +367,56 @@ static const struct pv_time_ops xen_time_ops __initconst = { .steal_clock = xen_steal_clock, }; +int xen_setup_vsyscall_time_info(int cpu) +{ + struct pvclock_vsyscall_time_info *xen_clock; + struct vcpu_register_time_memory_area t; + struct pvclock_vcpu_time_info *pvti; + unsigned long addr; + u8 flags; + int ret; + + addr = get_zeroed_page(GFP_KERNEL); + if (!addr) + return -ENOMEM; + + xen_clock = (struct pvclock_vsyscall_time_info *) addr; + memset(xen_clock, 0, PAGE_SIZE); + + t.addr.v = &xen_clock->pvti; + + ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area, + cpu, &t); + + if (ret) { + pr_debug("xen: cannot register vcpu_time_info err %d\n", ret); + free_page(addr); + return ret; + } + + pvti = &xen_clock->pvti; + flags = pvti->flags; + + if (!(flags & PVCLOCK_TSC_STABLE_BIT)) { + t.addr.v = NULL; + ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area, + cpu, &t); + if (!ret) + free_page(addr); + + WARN_ON(ret); + pr_debug("xen: VCLOCK_PVCLOCK not supported\n"); + return -ENOTSUPP; + } + + pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT); + pvclock_set_pvti_cpu0_va(xen_clock); + + xen_clocksource.archdata.vclock_mode = VCLOCK_PVCLOCK; + + return 0; +} + static void __init xen_time_init(void) { int cpu = smp_processor_id(); @@ -393,6 +443,7 @@ static void __init xen_time_init(void) setup_force_cpu_cap(X86_FEATURE_TSC); xen_setup_runstate_info(cpu); + xen_setup_vsyscall_time_info(cpu); xen_setup_timer(cpu); xen_setup_cpu_clockevents(); diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h index ac0a2b0..4036d15 100644 --- a/arch/x86/xen/xen-ops.h +++ b/arch/x86/xen/xen-ops.h @@ -66,6 +66,7 @@ void __init xen_vmalloc_p2m_tree(void); void xen_init_irq_ops(void); void xen_setup_timer(int cpu); void xen_setup_runstate_info(int cpu); +int xen_setup_vsyscall_time_info(int cpu); void xen_teardown_timer(int cpu); u64 xen_clocksource_read(void); void xen_setup_cpu_clockevents(void); diff --git a/include/xen/interface/vcpu.h b/include/xen/interface/vcpu.h index 98188c8..8da788c 100644 --- a/include/xen/interface/vcpu.h +++ b/include/xen/interface/vcpu.h @@ -178,4 +178,32 @@ DEFINE_GUEST_HANDLE_STRUCT(vcpu_register_vcpu_info); /* Send an NMI to the specified VCPU. @extra_arg == NULL. */ #define VCPUOP_send_nmi 11 + +/* + * Register a memory location to get a secondary copy of the vcpu time + * parameters. The master copy still exists as part of the vcpu shared + * memory area, and this secondary copy is updated whenever the master copy + * is updated (and using the same versioning scheme for synchronisation). + * + * The intent is that this copy may be mapped (RO) into userspace so + * that usermode can compute system time using the time info and the + * tsc. Usermode will see an array of vcpu_time_info structures, one + * for each vcpu, and choose the right one by an existing mechanism + * which allows it to get the current vcpu number (such as via a + * segment limit). It can then apply the normal algorithm to compute + * system time from the tsc. + * + * @extra_arg == pointer to vcpu_register_time_info_memory_area structure. + */ +#define VCPUOP_register_vcpu_time_memory_area 13 +DEFINE_GUEST_HANDLE_STRUCT(vcpu_time_info_t); +struct vcpu_register_time_memory_area { + union { + GUEST_HANDLE(vcpu_time_info_t) h; + struct pvclock_vcpu_time_info *v; + uint64_t p; + } addr; +}; +DEFINE_GUEST_HANDLE_STRUCT(vcpu_register_time_memory_area_t); + #endif /* __XEN_PUBLIC_VCPU_H__ */
In order to support pvclock vdso on xen we need to setup the time info page for vcpu 0 and register the page with Xen using the VCPUOP_register_vcpu_time_memory_area hypercall. This hypercall will also forcefully update the pvti which will set some of the necessary flags for vdso. Afterwards we check if it supports the PVCLOCK_TSC_STABLE_BIT flag which is mandatory for having vdso/vsyscall support. And if so, it will set the cpu 0 pvti that will be later on used when mapping the vdso image. The xen headers are also updated to include the new hypercall for registering the secondary vcpu_time_info struct. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> --- Changes since RFC: (Comments from Boris and David) * Remove Kconfig option * Use get_zeroed_page/free/page * Remove the hypercall availability check * Unregister pvti with arg.addr.v = NULL if stable bit isn't supported. (New) * Set secondary copy on restore such that it works on migration. * Drop global xen_clock variable and stash it locally on xen_setup_vsyscall_time_info. * WARN_ON(ret) if we fail to unregister the pvti. --- arch/x86/xen/enlighten.c | 2 ++ arch/x86/xen/time.c | 51 ++++++++++++++++++++++++++++++++++++++++++++ arch/x86/xen/xen-ops.h | 1 + include/xen/interface/vcpu.h | 28 ++++++++++++++++++++++++ 4 files changed, 82 insertions(+)