Message ID | 20161114154015.GA30048@amt.cnet (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 14/11/2016 16:40, Marcelo Tosatti wrote: > static bool kvmclock_src_use_reliable_get_clock(void *opaque) > { > KVMClockState *s = opaque; > > /* > * On machine types that support reliable KVM_GET_CLOCK, > * if host kernel does provide reliable KVM_GET_CLOCK, > * set src_use_reliable_get_clock=true so that destination > * avoids reading kvmclock from memory. > */ > if (s->mach_use_reliable_get_clock && kvm_has_adjust_clock_stable()) > { > s->src_use_reliable_get_clock = true; > } > > return s->mach_use_reliable_get_clock; > } > > > Ah, OK, done. s->src_use_reliable_get_clock should not be set with KVM_CHECK_EXTENSION, but rather from the flags returned by KVM_GET_CLOCK. > So s->src_use_reliable_get_clock is only used to indicate > to the destination that: "you can use KVM_GET_CLOCK value, > its safe". Yes, we agree. I was listing all the points, not just those where we disagree. Actually I'm not sure where we disagree, except on using flags from KVM_CHECK_EXTENSION vs. flags from KVM_GET_CLOCK... Paolo
On Mon, Nov 14, 2016 at 05:43:33PM +0100, Paolo Bonzini wrote: > > > On 14/11/2016 16:40, Marcelo Tosatti wrote: > > static bool kvmclock_src_use_reliable_get_clock(void *opaque) > > { > > KVMClockState *s = opaque; > > > > /* > > * On machine types that support reliable KVM_GET_CLOCK, > > * if host kernel does provide reliable KVM_GET_CLOCK, > > * set src_use_reliable_get_clock=true so that destination > > * avoids reading kvmclock from memory. > > */ > > if (s->mach_use_reliable_get_clock && kvm_has_adjust_clock_stable()) > > { > > s->src_use_reliable_get_clock = true; > > } > > > > return s->mach_use_reliable_get_clock; > > } > > > > > > Ah, OK, done. > > s->src_use_reliable_get_clock should not be set with > KVM_CHECK_EXTENSION, but rather from the flags returned by KVM_GET_CLOCK. Well, thats not right: What matters is the presence of get_kvmclock_ns which returns a value that the guest sees. get_kernel_monotonic_clock() + kvmclock_offset + (rdtsc() - tsc_timestamp) IOW what the guest sees. And you changed that in commit 108b249c453dd7132599ab6dc7e435a7036c193f Author: Paolo Bonzini <pbonzini@redhat.com> Date: Thu Sep 1 14:21:03 2016 +0200 KVM: x86: introduce get_kvmclock_ns And the correct behaviour (once KVM_GET_CLOCK is fixed per previous message to return rdtsc - tsc_timestamp for the non masterclock case) depends on this commit above, not on masterclock. > > So s->src_use_reliable_get_clock is only used to indicate > > to the destination that: "you can use KVM_GET_CLOCK value, > > its safe". > > Yes, we agree. I was listing all the points, not just those where we > disagree. Actually I'm not sure where we disagree, except on using > flags from KVM_CHECK_EXTENSION vs. flags from KVM_GET_CLOCK... > > Paolo
On 14/11/2016 18:13, Marcelo Tosatti wrote: > On Mon, Nov 14, 2016 at 05:43:33PM +0100, Paolo Bonzini wrote: >> >> >> On 14/11/2016 16:40, Marcelo Tosatti wrote: >>> static bool kvmclock_src_use_reliable_get_clock(void *opaque) >>> { >>> KVMClockState *s = opaque; >>> >>> /* >>> * On machine types that support reliable KVM_GET_CLOCK, >>> * if host kernel does provide reliable KVM_GET_CLOCK, >>> * set src_use_reliable_get_clock=true so that destination >>> * avoids reading kvmclock from memory. >>> */ >>> if (s->mach_use_reliable_get_clock && kvm_has_adjust_clock_stable()) >>> { >>> s->src_use_reliable_get_clock = true; >>> } >>> >>> return s->mach_use_reliable_get_clock; >>> } >>> >>> >>> Ah, OK, done. >> >> s->src_use_reliable_get_clock should not be set with >> KVM_CHECK_EXTENSION, but rather from the flags returned by KVM_GET_CLOCK. > > Well, thats not right: What matters is the presence of get_kvmclock_ns > which returns a value that the guest sees. > > get_kernel_monotonic_clock() + kvmclock_offset + > (rdtsc() - tsc_timestamp) > > IOW what the guest sees. And you changed that in > > commit 108b249c453dd7132599ab6dc7e435a7036c193f > Author: Paolo Bonzini <pbonzini@redhat.com> > Date: Thu Sep 1 14:21:03 2016 +0200 > > KVM: x86: introduce get_kvmclock_ns > > And the correct behaviour (once KVM_GET_CLOCK is fixed per > previous message to return rdtsc - tsc_timestamp for the > non masterclock case) depends on this commit above, > not on masterclock. This commit in turn only gets the correct behavior if "vcpu->arch.hv_clock.flags & PVCLOCK_TSC_STABLE_BIT" (and it will be changed soon to ka->use_masterclock). KVM_CHECK_EXTENSION can still return KVM_CLOCK_TSC_STABLE even if the masterclock is disabled, because KVM_CHECK_EXTENSION only tells you which flags are known for this version of the KVM module. To see if the masterclock is enabled _now_, you need to check what KVM_GET_CLOCK sets in the flags. From the KVM_CLOCK_TSC_STABLE patch: user_ns.flags = kvm->arch.use_master_clock ? KVM_CLOCK_TSC_STABLE : 0; Thanks, Paolo >>> So s->src_use_reliable_get_clock is only used to indicate >>> to the destination that: "you can use KVM_GET_CLOCK value, >>> its safe". >> >> Yes, we agree. I was listing all the points, not just those where we >> disagree. Actually I'm not sure where we disagree, except on using >> flags from KVM_CHECK_EXTENSION vs. flags from KVM_GET_CLOCK... >> >> Paolo
On Mon, Nov 14, 2016 at 06:20:29PM +0100, Paolo Bonzini wrote: > > > On 14/11/2016 18:13, Marcelo Tosatti wrote: > > On Mon, Nov 14, 2016 at 05:43:33PM +0100, Paolo Bonzini wrote: > >> > >> > >> On 14/11/2016 16:40, Marcelo Tosatti wrote: > >>> static bool kvmclock_src_use_reliable_get_clock(void *opaque) > >>> { > >>> KVMClockState *s = opaque; > >>> > >>> /* > >>> * On machine types that support reliable KVM_GET_CLOCK, > >>> * if host kernel does provide reliable KVM_GET_CLOCK, > >>> * set src_use_reliable_get_clock=true so that destination > >>> * avoids reading kvmclock from memory. > >>> */ > >>> if (s->mach_use_reliable_get_clock && kvm_has_adjust_clock_stable()) > >>> { > >>> s->src_use_reliable_get_clock = true; > >>> } > >>> > >>> return s->mach_use_reliable_get_clock; > >>> } > >>> > >>> > >>> Ah, OK, done. > >> > >> s->src_use_reliable_get_clock should not be set with > >> KVM_CHECK_EXTENSION, but rather from the flags returned by KVM_GET_CLOCK. > > > > Well, thats not right: What matters is the presence of get_kvmclock_ns > > which returns a value that the guest sees. > > > > get_kernel_monotonic_clock() + kvmclock_offset + > > (rdtsc() - tsc_timestamp) > > > > IOW what the guest sees. And you changed that in > > > > commit 108b249c453dd7132599ab6dc7e435a7036c193f > > Author: Paolo Bonzini <pbonzini@redhat.com> > > Date: Thu Sep 1 14:21:03 2016 +0200 > > > > KVM: x86: introduce get_kvmclock_ns > > > > And the correct behaviour (once KVM_GET_CLOCK is fixed per > > previous message to return rdtsc - tsc_timestamp for the > > non masterclock case) depends on this commit above, > > not on masterclock. > > This commit in turn only gets the correct behavior if > "vcpu->arch.hv_clock.flags & PVCLOCK_TSC_STABLE_BIT" (and it will be > changed soon to ka->use_masterclock). KVM_CHECK_EXTENSION can still > return KVM_CLOCK_TSC_STABLE even if the masterclock is disabled, > because KVM_CHECK_EXTENSION only tells you which flags are known for > this version of the KVM module. What QEMU wants is to use KVM_GET_CLOCK at pre_save independently of whether masterclock is enabled or not... it just depends on KVM_GET_CLOCK being correct for the masterclock case (108b249c453dd7132599ab6dc7e435a7036c193f). So a "reliable KVM_GET_CLOCK" (that does not timebackward when masterclock is enabled) is much simpler to userspace than "whether masterclock is enabled or not". If you have a reason why that should not be the case, let me know. > To see if the masterclock is enabled _now_, you need to check what > KVM_GET_CLOCK sets in the flags. From the KVM_CLOCK_TSC_STABLE patch: > > user_ns.flags = kvm->arch.use_master_clock ? KVM_CLOCK_TSC_STABLE : 0; Again, whether masterclock is enable is independent of being able to use KVM_GET_CLOCK at pre_save.
On Mon, Nov 14, 2016 at 04:15:18PM -0200, Marcelo Tosatti wrote: > On Mon, Nov 14, 2016 at 06:20:29PM +0100, Paolo Bonzini wrote: > > > > > > On 14/11/2016 18:13, Marcelo Tosatti wrote: > > > On Mon, Nov 14, 2016 at 05:43:33PM +0100, Paolo Bonzini wrote: > > >> > > >> > > >> On 14/11/2016 16:40, Marcelo Tosatti wrote: > > >>> static bool kvmclock_src_use_reliable_get_clock(void *opaque) > > >>> { > > >>> KVMClockState *s = opaque; > > >>> > > >>> /* > > >>> * On machine types that support reliable KVM_GET_CLOCK, > > >>> * if host kernel does provide reliable KVM_GET_CLOCK, > > >>> * set src_use_reliable_get_clock=true so that destination > > >>> * avoids reading kvmclock from memory. > > >>> */ > > >>> if (s->mach_use_reliable_get_clock && kvm_has_adjust_clock_stable()) > > >>> { > > >>> s->src_use_reliable_get_clock = true; > > >>> } > > >>> > > >>> return s->mach_use_reliable_get_clock; > > >>> } > > >>> > > >>> > > >>> Ah, OK, done. > > >> > > >> s->src_use_reliable_get_clock should not be set with > > >> KVM_CHECK_EXTENSION, but rather from the flags returned by KVM_GET_CLOCK. > > > > > > Well, thats not right: What matters is the presence of get_kvmclock_ns > > > which returns a value that the guest sees. > > > > > > get_kernel_monotonic_clock() + kvmclock_offset + > > > (rdtsc() - tsc_timestamp) > > > > > > IOW what the guest sees. And you changed that in > > > > > > commit 108b249c453dd7132599ab6dc7e435a7036c193f > > > Author: Paolo Bonzini <pbonzini@redhat.com> > > > Date: Thu Sep 1 14:21:03 2016 +0200 > > > > > > KVM: x86: introduce get_kvmclock_ns > > > > > > And the correct behaviour (once KVM_GET_CLOCK is fixed per > > > previous message to return rdtsc - tsc_timestamp for the > > > non masterclock case) depends on this commit above, > > > not on masterclock. > > > > This commit in turn only gets the correct behavior if > > "vcpu->arch.hv_clock.flags & PVCLOCK_TSC_STABLE_BIT" (and it will be > > changed soon to ka->use_masterclock). KVM_CHECK_EXTENSION can still > > return KVM_CLOCK_TSC_STABLE even if the masterclock is disabled, > > because KVM_CHECK_EXTENSION only tells you which flags are known for > > this version of the KVM module. > > What QEMU wants is to use KVM_GET_CLOCK at pre_save independently > of whether masterclock is enabled or not... it just depends > on KVM_GET_CLOCK being correct for the masterclock case > (108b249c453dd7132599ab6dc7e435a7036c193f). > > So a "reliable KVM_GET_CLOCK" (that does not timebackward > when masterclock is enabled) is much simpler to userspace > than "whether masterclock is enabled or not". > > If you have a reason why that should not be the case, > let me know. > > > To see if the masterclock is enabled _now_, you need to check what > > KVM_GET_CLOCK sets in the flags. From the KVM_CLOCK_TSC_STABLE patch: > > > > user_ns.flags = kvm->arch.use_master_clock ? KVM_CLOCK_TSC_STABLE : 0; > > Again, whether masterclock is enable is independent of > being able to use KVM_GET_CLOCK at pre_save. Is this point OK ? Using break; + case KVM_CAP_ADJUST_CLOCK: + r = KVM_CLOCK_TSC_STABLE; + break; To infer whether KVM_GET_CLOCK is fixed for the monotonic case.
On 17/11/2016 13:16, Marcelo Tosatti wrote: > On Mon, Nov 14, 2016 at 04:15:18PM -0200, Marcelo Tosatti wrote: >> On Mon, Nov 14, 2016 at 06:20:29PM +0100, Paolo Bonzini wrote: >>> >>> >>> On 14/11/2016 18:13, Marcelo Tosatti wrote: >>>> On Mon, Nov 14, 2016 at 05:43:33PM +0100, Paolo Bonzini wrote: >>>>> >>>>> >>>>> On 14/11/2016 16:40, Marcelo Tosatti wrote: >>>>>> static bool kvmclock_src_use_reliable_get_clock(void *opaque) >>>>>> { >>>>>> KVMClockState *s = opaque; >>>>>> >>>>>> /* >>>>>> * On machine types that support reliable KVM_GET_CLOCK, >>>>>> * if host kernel does provide reliable KVM_GET_CLOCK, >>>>>> * set src_use_reliable_get_clock=true so that destination >>>>>> * avoids reading kvmclock from memory. >>>>>> */ >>>>>> if (s->mach_use_reliable_get_clock && kvm_has_adjust_clock_stable()) >>>>>> { >>>>>> s->src_use_reliable_get_clock = true; >>>>>> } >>>>>> >>>>>> return s->mach_use_reliable_get_clock; >>>>>> } >>>>>> >>>>>> >>>>>> Ah, OK, done. >>>>> >>>>> s->src_use_reliable_get_clock should not be set with >>>>> KVM_CHECK_EXTENSION, but rather from the flags returned by KVM_GET_CLOCK. >>>> >>>> Well, thats not right: What matters is the presence of get_kvmclock_ns >>>> which returns a value that the guest sees. >>>> >>>> get_kernel_monotonic_clock() + kvmclock_offset + >>>> (rdtsc() - tsc_timestamp) >>>> >>>> IOW what the guest sees. And you changed that in >>>> >>>> commit 108b249c453dd7132599ab6dc7e435a7036c193f >>>> Author: Paolo Bonzini <pbonzini@redhat.com> >>>> Date: Thu Sep 1 14:21:03 2016 +0200 >>>> >>>> KVM: x86: introduce get_kvmclock_ns >>>> >>>> And the correct behaviour (once KVM_GET_CLOCK is fixed per >>>> previous message to return rdtsc - tsc_timestamp for the >>>> non masterclock case) depends on this commit above, >>>> not on masterclock. >>> >>> This commit in turn only gets the correct behavior if >>> "vcpu->arch.hv_clock.flags & PVCLOCK_TSC_STABLE_BIT" (and it will be >>> changed soon to ka->use_masterclock). KVM_CHECK_EXTENSION can still >>> return KVM_CLOCK_TSC_STABLE even if the masterclock is disabled, >>> because KVM_CHECK_EXTENSION only tells you which flags are known for >>> this version of the KVM module. >> >> What QEMU wants is to use KVM_GET_CLOCK at pre_save independently >> of whether masterclock is enabled or not... it just depends >> on KVM_GET_CLOCK being correct for the masterclock case >> (108b249c453dd7132599ab6dc7e435a7036c193f). >> >> So a "reliable KVM_GET_CLOCK" (that does not timebackward >> when masterclock is enabled) is much simpler to userspace >> than "whether masterclock is enabled or not". >> >> If you have a reason why that should not be the case, >> let me know. >> >>> To see if the masterclock is enabled _now_, you need to check what >>> KVM_GET_CLOCK sets in the flags. From the KVM_CLOCK_TSC_STABLE patch: >>> >>> user_ns.flags = kvm->arch.use_master_clock ? KVM_CLOCK_TSC_STABLE : 0; >> >> Again, whether masterclock is enable is independent of >> being able to use KVM_GET_CLOCK at pre_save. > > Is this point OK ? > > Using > > break; > + case KVM_CAP_ADJUST_CLOCK: > + r = KVM_CLOCK_TSC_STABLE; > + break; > > To infer whether KVM_GET_CLOCK is fixed for the monotonic case. Yes, I still haven't digested why it is correct (I need to read again what you wrote), but it is indeed correct to use KVM_CAP_ADJUST_CLOCK this way. Paolo
Index: qemu-mig-advance-clock/hw/i386/kvm/clock.c =================================================================== --- qemu-mig-advance-clock.orig/hw/i386/kvm/clock.c 2016-11-14 10:40:39.748116312 -0200 +++ qemu-mig-advance-clock/hw/i386/kvm/clock.c 2016-11-14 13:38:29.299955042 -0200 @@ -36,6 +36,12 @@ uint64_t clock; bool clock_valid; + + /* whether machine supports reliable KVM_GET_CLOCK */ + bool mach_use_reliable_get_clock; + + /* whether source host supported reliable KVM_GET_CLOCK */ + bool src_use_reliable_get_clock; } KVMClockState; struct pvclock_vcpu_time_info { @@ -81,6 +87,19 @@ return nsec + time.system_time; } +static uint64_t kvm_get_clock(void) +{ + struct kvm_clock_data data; + int ret; + + ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data); + if (ret < 0) { + fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret)); + abort(); + } + return data.clock; +} + static void kvmclock_vm_state_change(void *opaque, int running, RunState state) { @@ -91,15 +110,37 @@ if (running) { struct kvm_clock_data data = {}; - uint64_t time_at_migration = kvmclock_current_nsec(s); + uint64_t pvclock_via_mem = 0; - s->clock_valid = false; + /* local (running VM) restore */ + if (s->clock_valid) { + /* + * if host does not support reliable KVM_GET_CLOCK, + * read kvmclock value from memory + */ + if (!kvm_has_adjust_clock_stable()) { + pvclock_via_mem = kvmclock_current_nsec(s); + } + /* migration/savevm/init restore */ + } else { + /* + * use s->clock in case machine uses reliable + * get clock and source host supported + * reliable get clock + */ + if (!(s->mach_use_reliable_get_clock && + s->src_use_reliable_get_clock)) { + pvclock_via_mem = kvmclock_current_nsec(s); + } + } - /* We can't rely on the migrated clock value, just discard it */ - if (time_at_migration) { - s->clock = time_at_migration; + /* We can't rely on the saved clock value, just discard it */ + if (pvclock_via_mem) { + s->clock = pvclock_via_mem; } + s->clock_valid = false; + data.clock = s->clock; ret = kvm_vm_ioctl(kvm_state, KVM_SET_CLOCK, &data); if (ret < 0) { @@ -120,8 +161,6 @@ } } } else { - struct kvm_clock_data data; - int ret; if (s->clock_valid) { return; @@ -129,13 +168,7 @@ kvm_synchronize_all_tsc(); - ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data); - if (ret < 0) { - fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret)); - abort(); - } - s->clock = data.clock; - + s->clock = kvm_get_clock(); /* * If the VM is stopped, declare the clock state valid to * avoid re-reading it on next vmsave (which would return @@ -152,22 +185,69 @@ qemu_add_vm_change_state_handler(kvmclock_vm_state_change, s); } +static bool kvmclock_src_use_reliable_get_clock(void *opaque) +{ + KVMClockState *s = opaque; + + /* + * On machine types that support reliable KVM_GET_CLOCK, + * if host kernel does provide reliable KVM_GET_CLOCK, + * set src_use_reliable_get_clock=true so that destination + * avoids reading kvmclock from memory. + */ + if (s->mach_use_reliable_get_clock && kvm_has_adjust_clock_stable()) { + s->src_use_reliable_get_clock = true; + } + + return s->mach_use_reliable_get_clock; +} + +static const VMStateDescription kvmclock_reliable_get_clock = { + .name = "kvmclock/src_use_reliable_get_clock", + .version_id = 1, + .minimum_version_id = 1, + .needed = kvmclock_src_use_reliable_get_clock, + .fields = (VMStateField[]) { + VMSTATE_BOOL(src_use_reliable_get_clock, KVMClockState), + VMSTATE_END_OF_LIST() + } +}; + +static void kvmclock_pre_save(void *opaque) +{ + KVMClockState *s = opaque; + + s->clock = kvm_get_clock(); +} + static const VMStateDescription kvmclock_vmsd = { .name = "kvmclock", .version_id = 1, .minimum_version_id = 1, + .pre_save = kvmclock_pre_save, .fields = (VMStateField[]) { VMSTATE_UINT64(clock, KVMClockState), VMSTATE_END_OF_LIST() + }, + .subsections = (const VMStateDescription * []) { + &kvmclock_reliable_get_clock, + NULL } }; +static Property kvmclock_properties[] = { + DEFINE_PROP_BOOL("mach_use_reliable_get_clock", KVMClockState, + mach_use_reliable_get_clock, true), + DEFINE_PROP_END_OF_LIST(), +}; + static void kvmclock_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); dc->realize = kvmclock_realize; dc->vmsd = &kvmclock_vmsd; + dc->props = kvmclock_properties; } static const TypeInfo kvmclock_info = { Index: qemu-mig-advance-clock/include/hw/i386/pc.h =================================================================== --- qemu-mig-advance-clock.orig/include/hw/i386/pc.h 2016-11-14 10:40:39.748116312 -0200 +++ qemu-mig-advance-clock/include/hw/i386/pc.h 2016-11-14 10:40:48.059128649 -0200 @@ -370,6 +370,11 @@ #define PC_COMPAT_2_7 \ HW_COMPAT_2_7 \ {\ + .driver = "kvmclock",\ + .property = "mach_use_reliable_get_clock",\ + .value = "off",\ + },\ + {\ .driver = TYPE_X86_CPU,\ .property = "l3-cache",\ .value = "off",\ Index: qemu-mig-advance-clock/target-i386/kvm.c =================================================================== --- qemu-mig-advance-clock.orig/target-i386/kvm.c 2016-11-14 10:40:39.750116314 -0200 +++ qemu-mig-advance-clock/target-i386/kvm.c 2016-11-14 10:40:48.061128652 -0200 @@ -117,6 +117,13 @@ return kvm_check_extension(kvm_state, KVM_CAP_X86_SMM); } +bool kvm_has_adjust_clock_stable(void) +{ + int ret = kvm_check_extension(kvm_state, KVM_CAP_ADJUST_CLOCK); + + return (ret == KVM_CLOCK_TSC_STABLE); +} + bool kvm_allows_irq0_override(void) { return !kvm_irqchip_in_kernel() || kvm_has_gsi_routing(); Index: qemu-mig-advance-clock/target-i386/kvm_i386.h =================================================================== --- qemu-mig-advance-clock.orig/target-i386/kvm_i386.h 2016-11-14 10:40:39.751116316 -0200 +++ qemu-mig-advance-clock/target-i386/kvm_i386.h 2016-11-14 10:40:48.061128652 -0200 @@ -17,6 +17,7 @@ bool kvm_allows_irq0_override(void); bool kvm_has_smm(void); +bool kvm_has_adjust_clock_stable(void); void kvm_synchronize_all_tsc(void); void kvm_arch_reset_vcpu(X86CPU *cs); void kvm_arch_do_init_vcpu(X86CPU *cs);