From patchwork Mon Nov 14 15:40:18 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Marcelo Tosatti X-Patchwork-Id: 9427727 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 6452C6047D for ; Mon, 14 Nov 2016 15:41:31 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5445A28921 for ; Mon, 14 Nov 2016 15:41:31 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 493EC2893C; Mon, 14 Nov 2016 15:41:31 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 8CF4928921 for ; Mon, 14 Nov 2016 15:41:30 +0000 (UTC) Received: from localhost ([::1]:40917 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1c6JNx-0002HN-RE for patchwork-qemu-devel@patchwork.kernel.org; Mon, 14 Nov 2016 10:41:29 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:34447) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1c6JNK-0002Cn-Df for qemu-devel@nongnu.org; Mon, 14 Nov 2016 10:40:52 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1c6JNH-0007OQ-G5 for qemu-devel@nongnu.org; Mon, 14 Nov 2016 10:40:50 -0500 Received: from mx1.redhat.com ([209.132.183.28]:57856) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1c6JNH-0007OE-7v for qemu-devel@nongnu.org; Mon, 14 Nov 2016 10:40:47 -0500 Received: from int-mx09.intmail.prod.int.phx2.redhat.com (int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 7A7736DDCB; Mon, 14 Nov 2016 15:40:46 +0000 (UTC) Received: from amt.cnet (vpn1-4-30.gru2.redhat.com [10.97.4.30]) by int-mx09.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id uAEFeg9o010674; Mon, 14 Nov 2016 10:40:43 -0500 Received: from amt.cnet (localhost [127.0.0.1]) by amt.cnet (Postfix) with ESMTP id 92EF01008E5; Mon, 14 Nov 2016 13:40:22 -0200 (BRST) Received: (from marcelo@localhost) by amt.cnet (8.14.7/8.14.7/Submit) id uAEFeJpq032589; Mon, 14 Nov 2016 13:40:19 -0200 Date: Mon, 14 Nov 2016 13:40:18 -0200 From: Marcelo Tosatti To: Paolo Bonzini Message-ID: <20161114154015.GA30048@amt.cnet> References: <20161114123628.703911091@redhat.com> <20161114123700.158592605@redhat.com> <20161114140028.GA25935@amt.cnet> <62d634ab-70ad-4be7-1622-f2e3a9d865fe@redhat.com> <20161114145054.GA28663@amt.cnet> <67bffd95-2e4e-7273-c154-a3fdfe622387@redhat.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <67bffd95-2e4e-7273-c154-a3fdfe622387@redhat.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-Scanned-By: MIMEDefang 2.68 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.28]); Mon, 14 Nov 2016 15:40:46 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 209.132.183.28 Subject: Re: [Qemu-devel] [qemu patch 2/2] kvmclock: reduce kvmclock difference on migration X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Eduardo Habkost , kvm@vger.kernel.org, Juan Quintela , Radim Krcmar , qemu-devel@nongnu.org, "Dr. David Alan Gilbert" Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" X-Virus-Scanned: ClamAV using ClamSMTP On Mon, Nov 14, 2016 at 04:00:38PM +0100, Paolo Bonzini wrote: > > > On 14/11/2016 15:50, Marcelo Tosatti wrote: > > Well, i didnt want to mix the meaning of the variables: > > > > + /* whether machine supports reliable KVM_GET_CLOCK */ > > + bool mach_use_reliable_get_clock; > > + > > + /* whether source host supported reliable KVM_GET_CLOCK */ > > + bool src_use_reliable_get_clock; > > > > See the comments on top (later if you look at the variable, > > then have to think: well it has one name, but its disabled > > by that other path as well, so its more than its > > name,etc...). > > > >> I'm thinking "mach_use_reliable_get_clock is just for migration, > > > > Thats whether the machine supports it. New machines have it enabled, > > olders don't. > > Yes. > > >> src_use_reliable_get_clock is the state". > > > > Thats whether the migration source supported it. > > But it's not used only for migration. It's used on every vmstate change > (running->stop and stop->running, isn't it? No. Its used only if s->clock_valid == false, which means either: migration/savevm/init. Now for savevm there is a bug: > I think that, apart from > the migration case, it's better to use s->clock if kvmclock is stable, > even on older machine types. Yes, its already doing that: /* local (running VM) restore */ if (s->clock_valid) { /* * if host does not support reliable KVM_GET_CLOCK, * read kvmclock value from memory */ if (!kvm_has_adjust_clock_stable()) { time_at_migration = kvmclock_current_nsec(s); } It only uses kvmclock_current_nsec() if kvm_has_adjust_clock_stable is false. > >>>>> +static bool kvmclock_src_use_reliable_get_clock(void *opaque) > >>>>> +{ > >>>>> + KVMClockState *s = opaque; > >>>>> + > >>>>> + /* > >>>>> + * On machine types that support reliable KVM_GET_CLOCK, > >>>>> + * if host kernel does provide reliable KVM_GET_CLOCK, > >>>>> + * set src_use_reliable_get_clock=true so that destination > >>>>> + * avoids reading kvmclock from memory. > >>>>> + */ > >>>>> + if (s->mach_use_reliable_get_clock && kvm_has_adjust_clock_stable()) { > >>>>> + s->src_use_reliable_get_clock = true; > >>>>> + } > >>>>> + > >>>>> + return s->src_use_reliable_get_clock; > >>>>> +} > >>>> > >>>> Here you can just return s->mach_use_reliable_get_clock. > >>> > >>> mach_use_reliable_get_clock can be true but host might not support it. > >> > >> Yes, but the "needed" function is only required to avoid breaking > >> pc-i440fx-2.7 and earlier. > > > > "needed" is required so that the migration between: > > > > SRC DEST BEHAVIOUR > > ~support supports on migration read from guest, > > on stop/cont use > > kvm_get_clock/kvm_set_clock > > > > Destination does not use KVM_GET_CLOCK value (which is > > broken and should not be used). > > If needed returns false, the destination will see > src_use_reliable_get_clock = false anyway. Causing it to read the clock from memory, thats what should happen. > If needed returns true, the destination can still see > src_use_reliable_get_clock = false if that's the value. You are asking me to change from: +static bool kvmclock_src_use_reliable_get_clock(void *opaque) +{ + KVMClockState *s = opaque; + + /* + * On machine types that support reliable KVM_GET_CLOCK, + * if host kernel does provide reliable KVM_GET_CLOCK, + * set src_use_reliable_get_clock=true so that destination + * avoids reading kvmclock from memory. + */ + if (s->mach_use_reliable_get_clock && kvm_has_adjust_clock_stable()) { + s->src_use_reliable_get_clock = true; + } + + return s->src_use_reliable_get_clock; +} to static bool kvmclock_src_use_reliable_get_clock(void *opaque) { KVMClockState *s = opaque; /* * On machine types that support reliable KVM_GET_CLOCK, * if host kernel does provide reliable KVM_GET_CLOCK, * set src_use_reliable_get_clock=true so that destination * avoids reading kvmclock from memory. */ if (s->mach_use_reliable_get_clock && kvm_has_adjust_clock_stable()) { s->src_use_reliable_get_clock = true; } return s->mach_use_reliable_get_clock; } Ah, OK, done. > needed src_use_reliable_get_clock effect > false false use kvmclock_current_nsec > false true use kvmclock_current_nsec > true false use kvmclock_current_nsec > true true use s->clock > > > So the idea is: > > - set s->clock and s->reliable_get_clock on every KVM_GET_CLOCK Already do that (apart from s->clock_valid which is necessary). > - on migration source, use subsections and > s->mach_use_reliable_get_clock to avoid breaking migration on pre-2.8 > machine types Migration is already broken when you use different machine types. So s->src_use_reliable_get_clock is only used to indicate to the destination that: "you can use KVM_GET_CLOCK value, its safe". > - on migration destination, use .pre_load so that s->reliable_get_clock > is initialized to false on older machine types Thats what mach_use_reliable_get_clock does already: static Property kvmclock_properties[] = { DEFINE_PROP_BOOL("mach_use_reliable_get_clock", KVMClockState, mach_use_reliable_get_clock, true), DEFINE_PROP_END_OF_LIST(), }; + .driver = "kvmclock",\ + .property = "mach_use_reliable_get_clock",\ + .value = "off",\ + },\ + {\ > > >> If you return true here, you can still > >> migrate a "false" value for src_use_reliable_get_clock. > > > > But the source only uses a reliable KVM_GET_CLOCK if > > both conditions are true. > > > > And the subsection is only needed if the source > > uses a reliable KVM_GET_CLOCK. > > Yes, but the point of the subsection is just to avoid breaking migration > format. Its a new creative use of the subsection. > >> It is the same as "is using masterclock", which is actually a stricter > >> condition than the KVM_CHECK_EXTENSION return value. The right check to > >> use is whether masterclock is in use, > > > > Actually its "has a reliable KVM_GET_CLOCK" (which returns > > get_kernel_clock() + (rdtsc() - tsc_timestamp), > > > > "broken KVM_GET_CLOCK" = get_kernel_clock() > > Yes. And these end up being the same. No. You can have masterclock enabled, KVM_TSC_STABLE set in pvclock flags, and unreliable KVM_GET_CLOCK (without your patch to KVM_GET_CLOCK). Paolo not sure if there is anything further you want me to change at this point ? Index: qemu-mig-advance-clock/hw/i386/kvm/clock.c =================================================================== --- qemu-mig-advance-clock.orig/hw/i386/kvm/clock.c 2016-11-14 10:40:39.748116312 -0200 +++ qemu-mig-advance-clock/hw/i386/kvm/clock.c 2016-11-14 13:38:29.299955042 -0200 @@ -36,6 +36,12 @@ uint64_t clock; bool clock_valid; + + /* whether machine supports reliable KVM_GET_CLOCK */ + bool mach_use_reliable_get_clock; + + /* whether source host supported reliable KVM_GET_CLOCK */ + bool src_use_reliable_get_clock; } KVMClockState; struct pvclock_vcpu_time_info { @@ -81,6 +87,19 @@ return nsec + time.system_time; } +static uint64_t kvm_get_clock(void) +{ + struct kvm_clock_data data; + int ret; + + ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data); + if (ret < 0) { + fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret)); + abort(); + } + return data.clock; +} + static void kvmclock_vm_state_change(void *opaque, int running, RunState state) { @@ -91,15 +110,37 @@ if (running) { struct kvm_clock_data data = {}; - uint64_t time_at_migration = kvmclock_current_nsec(s); + uint64_t pvclock_via_mem = 0; - s->clock_valid = false; + /* local (running VM) restore */ + if (s->clock_valid) { + /* + * if host does not support reliable KVM_GET_CLOCK, + * read kvmclock value from memory + */ + if (!kvm_has_adjust_clock_stable()) { + pvclock_via_mem = kvmclock_current_nsec(s); + } + /* migration/savevm/init restore */ + } else { + /* + * use s->clock in case machine uses reliable + * get clock and source host supported + * reliable get clock + */ + if (!(s->mach_use_reliable_get_clock && + s->src_use_reliable_get_clock)) { + pvclock_via_mem = kvmclock_current_nsec(s); + } + } - /* We can't rely on the migrated clock value, just discard it */ - if (time_at_migration) { - s->clock = time_at_migration; + /* We can't rely on the saved clock value, just discard it */ + if (pvclock_via_mem) { + s->clock = pvclock_via_mem; } + s->clock_valid = false; + data.clock = s->clock; ret = kvm_vm_ioctl(kvm_state, KVM_SET_CLOCK, &data); if (ret < 0) { @@ -120,8 +161,6 @@ } } } else { - struct kvm_clock_data data; - int ret; if (s->clock_valid) { return; @@ -129,13 +168,7 @@ kvm_synchronize_all_tsc(); - ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data); - if (ret < 0) { - fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret)); - abort(); - } - s->clock = data.clock; - + s->clock = kvm_get_clock(); /* * If the VM is stopped, declare the clock state valid to * avoid re-reading it on next vmsave (which would return @@ -152,22 +185,69 @@ qemu_add_vm_change_state_handler(kvmclock_vm_state_change, s); } +static bool kvmclock_src_use_reliable_get_clock(void *opaque) +{ + KVMClockState *s = opaque; + + /* + * On machine types that support reliable KVM_GET_CLOCK, + * if host kernel does provide reliable KVM_GET_CLOCK, + * set src_use_reliable_get_clock=true so that destination + * avoids reading kvmclock from memory. + */ + if (s->mach_use_reliable_get_clock && kvm_has_adjust_clock_stable()) { + s->src_use_reliable_get_clock = true; + } + + return s->mach_use_reliable_get_clock; +} + +static const VMStateDescription kvmclock_reliable_get_clock = { + .name = "kvmclock/src_use_reliable_get_clock", + .version_id = 1, + .minimum_version_id = 1, + .needed = kvmclock_src_use_reliable_get_clock, + .fields = (VMStateField[]) { + VMSTATE_BOOL(src_use_reliable_get_clock, KVMClockState), + VMSTATE_END_OF_LIST() + } +}; + +static void kvmclock_pre_save(void *opaque) +{ + KVMClockState *s = opaque; + + s->clock = kvm_get_clock(); +} + static const VMStateDescription kvmclock_vmsd = { .name = "kvmclock", .version_id = 1, .minimum_version_id = 1, + .pre_save = kvmclock_pre_save, .fields = (VMStateField[]) { VMSTATE_UINT64(clock, KVMClockState), VMSTATE_END_OF_LIST() + }, + .subsections = (const VMStateDescription * []) { + &kvmclock_reliable_get_clock, + NULL } }; +static Property kvmclock_properties[] = { + DEFINE_PROP_BOOL("mach_use_reliable_get_clock", KVMClockState, + mach_use_reliable_get_clock, true), + DEFINE_PROP_END_OF_LIST(), +}; + static void kvmclock_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); dc->realize = kvmclock_realize; dc->vmsd = &kvmclock_vmsd; + dc->props = kvmclock_properties; } static const TypeInfo kvmclock_info = { Index: qemu-mig-advance-clock/include/hw/i386/pc.h =================================================================== --- qemu-mig-advance-clock.orig/include/hw/i386/pc.h 2016-11-14 10:40:39.748116312 -0200 +++ qemu-mig-advance-clock/include/hw/i386/pc.h 2016-11-14 10:40:48.059128649 -0200 @@ -370,6 +370,11 @@ #define PC_COMPAT_2_7 \ HW_COMPAT_2_7 \ {\ + .driver = "kvmclock",\ + .property = "mach_use_reliable_get_clock",\ + .value = "off",\ + },\ + {\ .driver = TYPE_X86_CPU,\ .property = "l3-cache",\ .value = "off",\ Index: qemu-mig-advance-clock/target-i386/kvm.c =================================================================== --- qemu-mig-advance-clock.orig/target-i386/kvm.c 2016-11-14 10:40:39.750116314 -0200 +++ qemu-mig-advance-clock/target-i386/kvm.c 2016-11-14 10:40:48.061128652 -0200 @@ -117,6 +117,13 @@ return kvm_check_extension(kvm_state, KVM_CAP_X86_SMM); } +bool kvm_has_adjust_clock_stable(void) +{ + int ret = kvm_check_extension(kvm_state, KVM_CAP_ADJUST_CLOCK); + + return (ret == KVM_CLOCK_TSC_STABLE); +} + bool kvm_allows_irq0_override(void) { return !kvm_irqchip_in_kernel() || kvm_has_gsi_routing(); Index: qemu-mig-advance-clock/target-i386/kvm_i386.h =================================================================== --- qemu-mig-advance-clock.orig/target-i386/kvm_i386.h 2016-11-14 10:40:39.751116316 -0200 +++ qemu-mig-advance-clock/target-i386/kvm_i386.h 2016-11-14 10:40:48.061128652 -0200 @@ -17,6 +17,7 @@ bool kvm_allows_irq0_override(void); bool kvm_has_smm(void); +bool kvm_has_adjust_clock_stable(void); void kvm_synchronize_all_tsc(void); void kvm_arch_reset_vcpu(X86CPU *cs); void kvm_arch_do_init_vcpu(X86CPU *cs);