From patchwork Mon Nov 14 15:40:18 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Marcelo Tosatti <mtosatti@redhat.com>
X-Patchwork-Id: 9427727
Return-Path: 
 <qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org>
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org
	[172.30.200.125])
	by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id
	6452C6047D for <patchwork-qemu-devel@patchwork.kernel.org>;
	Mon, 14 Nov 2016 15:41:31 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5445A28921
	for <patchwork-qemu-devel@patchwork.kernel.org>;
	Mon, 14 Nov 2016 15:41:31 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id 493EC2893C; Mon, 14 Nov 2016 15:41:31 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	pdx-wl-mail.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI
	autolearn=unavailable version=3.3.1
Received: from lists.gnu.org (lists.gnu.org [208.118.235.17])
	(using TLSv1 with cipher AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 8CF4928921
	for <patchwork-qemu-devel@patchwork.kernel.org>;
	Mon, 14 Nov 2016 15:41:30 +0000 (UTC)
Received: from localhost ([::1]:40917 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from
	<qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org>)
	id 1c6JNx-0002HN-RE for patchwork-qemu-devel@patchwork.kernel.org;
	Mon, 14 Nov 2016 10:41:29 -0500
Received: from eggs.gnu.org ([2001:4830:134:3::10]:34447)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <mtosatti@redhat.com>) id 1c6JNK-0002Cn-Df
	for qemu-devel@nongnu.org; Mon, 14 Nov 2016 10:40:52 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <mtosatti@redhat.com>) id 1c6JNH-0007OQ-G5
	for qemu-devel@nongnu.org; Mon, 14 Nov 2016 10:40:50 -0500
Received: from mx1.redhat.com ([209.132.183.28]:57856)
	by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)
	(Exim 4.71) (envelope-from <mtosatti@redhat.com>) id 1c6JNH-0007OE-7v
	for qemu-devel@nongnu.org; Mon, 14 Nov 2016 10:40:47 -0500
Received: from int-mx09.intmail.prod.int.phx2.redhat.com
	(int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
	bits)) (No client certificate requested)
	by mx1.redhat.com (Postfix) with ESMTPS id 7A7736DDCB;
	Mon, 14 Nov 2016 15:40:46 +0000 (UTC)
Received: from amt.cnet (vpn1-4-30.gru2.redhat.com [10.97.4.30])
	by int-mx09.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with
	ESMTP id uAEFeg9o010674; Mon, 14 Nov 2016 10:40:43 -0500
Received: from amt.cnet (localhost [127.0.0.1])
	by amt.cnet (Postfix) with ESMTP id 92EF01008E5;
	Mon, 14 Nov 2016 13:40:22 -0200 (BRST)
Received: (from marcelo@localhost)
	by amt.cnet (8.14.7/8.14.7/Submit) id uAEFeJpq032589;
	Mon, 14 Nov 2016 13:40:19 -0200
Date: Mon, 14 Nov 2016 13:40:18 -0200
From: Marcelo Tosatti <mtosatti@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20161114154015.GA30048@amt.cnet>
References: <20161114123628.703911091@redhat.com>
	<20161114123700.158592605@redhat.com>
	<a3e00c38-4d4d-d353-231d-9eb0233139e2@redhat.com>
	<20161114140028.GA25935@amt.cnet>
	<62d634ab-70ad-4be7-1622-f2e3a9d865fe@redhat.com>
	<20161114145054.GA28663@amt.cnet>
	<67bffd95-2e4e-7273-c154-a3fdfe622387@redhat.com>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <67bffd95-2e4e-7273-c154-a3fdfe622387@redhat.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Scanned-By: MIMEDefang 2.68 on 10.5.11.22
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
	(mx1.redhat.com [10.5.110.28]);
	Mon, 14 Nov 2016 15:40:46 +0000 (UTC)
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
X-Received-From: 209.132.183.28
Subject: Re: [Qemu-devel] [qemu patch 2/2] kvmclock: reduce kvmclock
	difference on migration
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
Cc: Eduardo Habkost <ehabkost@redhat.com>, kvm@vger.kernel.org,
	Juan Quintela <quintela@redhat.com>,
	Radim Krcmar <rkrcmar@redhat.com>, qemu-devel@nongnu.org,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>
Errors-To: 
 qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org
Sender: "Qemu-devel"
	<qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org>
X-Virus-Scanned: ClamAV using ClamSMTP

On Mon, Nov 14, 2016 at 04:00:38PM +0100, Paolo Bonzini wrote:
> 
> 
> On 14/11/2016 15:50, Marcelo Tosatti wrote:
> > Well, i didnt want to mix the meaning of the variables:
> > 
> > +    /* whether machine supports reliable KVM_GET_CLOCK */
> > +    bool mach_use_reliable_get_clock;
> > +
> > +    /* whether source host supported reliable KVM_GET_CLOCK */
> > +    bool src_use_reliable_get_clock;
> > 
> > See the comments on top (later if you look at the variable, 
> > then have to think: well it has one name, but its disabled 
> > by that other path as well, so its more than its 
> > name,etc...).
> > 
> >> I'm thinking "mach_use_reliable_get_clock is just for migration,
> > 
> > Thats whether the machine supports it. New machines have it enabled,
> > olders don't.
> 
> Yes.
> 
> >> src_use_reliable_get_clock is the state". 
> > 
> > Thats whether the migration source supported it.
> 
> But it's not used only for migration.  It's used on every vmstate change
> (running->stop and stop->running, isn't it?  

No. Its used only if s->clock_valid == false, which means either:

migration/savevm/init.

Now for savevm there is a bug: 

> I think that, apart from
> the migration case, it's better to use s->clock if kvmclock is stable,
> even on older machine types.

Yes, its already doing that:

        /* local (running VM) restore */
        if (s->clock_valid) {
            /*
             * if host does not support reliable KVM_GET_CLOCK,
             * read kvmclock value from memory
             */
            if (!kvm_has_adjust_clock_stable()) {
                time_at_migration = kvmclock_current_nsec(s);
            }

It only uses kvmclock_current_nsec() if kvm_has_adjust_clock_stable is 
false.

> >>>>> +static bool kvmclock_src_use_reliable_get_clock(void *opaque)
> >>>>> +{
> >>>>> +    KVMClockState *s = opaque;
> >>>>> +
> >>>>> +    /*
> >>>>> +     * On machine types that support reliable KVM_GET_CLOCK,
> >>>>> +     * if host kernel does provide reliable KVM_GET_CLOCK,
> >>>>> +     * set src_use_reliable_get_clock=true so that destination
> >>>>> +     * avoids reading kvmclock from memory.
> >>>>> +     */
> >>>>> +    if (s->mach_use_reliable_get_clock && kvm_has_adjust_clock_stable()) {
> >>>>> +        s->src_use_reliable_get_clock = true;
> >>>>> +    }
> >>>>> +
> >>>>> +    return s->src_use_reliable_get_clock;
> >>>>> +}
> >>>>
> >>>> Here you can just return s->mach_use_reliable_get_clock. 
> >>>
> >>> mach_use_reliable_get_clock can be true but host might not support it.
> >>
> >> Yes, but the "needed" function is only required to avoid breaking
> >> pc-i440fx-2.7 and earlier. 
> > 
> > "needed" is required so that the migration between:
> > 
> > SRC             DEST                BEHAVIOUR
> > ~support        supports            on migration read from guest,
> >                                     on stop/cont use
> >                                     kvm_get_clock/kvm_set_clock
> > 
> > Destination does not use KVM_GET_CLOCK value (which is
> > broken and should not be used).
> 
> If needed returns false, the destination will see
> src_use_reliable_get_clock = false anyway.

Causing it to read the clock from memory, thats what 
should happen.

> If needed returns true, the destination can still see
> src_use_reliable_get_clock = false if that's the value.

You are asking me to change from:

+static bool kvmclock_src_use_reliable_get_clock(void *opaque)
+{
+    KVMClockState *s = opaque;
+
+    /*
+     * On machine types that support reliable KVM_GET_CLOCK,
+     * if host kernel does provide reliable KVM_GET_CLOCK,
+     * set src_use_reliable_get_clock=true so that destination
+     * avoids reading kvmclock from memory.
+     */
+    if (s->mach_use_reliable_get_clock && kvm_has_adjust_clock_stable()) {
+        s->src_use_reliable_get_clock = true;
+    }
+
+    return s->src_use_reliable_get_clock;
+}

to

static bool kvmclock_src_use_reliable_get_clock(void *opaque)
{
    KVMClockState *s = opaque;

    /*
     * On machine types that support reliable KVM_GET_CLOCK,
     * if host kernel does provide reliable KVM_GET_CLOCK,
     * set src_use_reliable_get_clock=true so that destination
     * avoids reading kvmclock from memory.
     */
    if (s->mach_use_reliable_get_clock && kvm_has_adjust_clock_stable())
    {
        s->src_use_reliable_get_clock = true;
    }

    return s->mach_use_reliable_get_clock;
}


Ah, OK, done.


> needed	src_use_reliable_get_clock	effect
> false	false				use kvmclock_current_nsec
> false	true				use kvmclock_current_nsec
> true	false				use kvmclock_current_nsec
> true	true				use s->clock
> 
> 
> So the idea is:
> 
> - set s->clock and s->reliable_get_clock on every KVM_GET_CLOCK

Already do that (apart from s->clock_valid which is necessary).

> - on migration source, use subsections and
> s->mach_use_reliable_get_clock to avoid breaking migration on pre-2.8
> machine types

Migration is already broken when you use different machine types.

So s->src_use_reliable_get_clock is only used to indicate 
to the destination that: "you can use KVM_GET_CLOCK value, 
its safe".

> - on migration destination, use .pre_load so that s->reliable_get_clock
> is initialized to false on older machine types

Thats what mach_use_reliable_get_clock does already:

static Property kvmclock_properties[] = {
    DEFINE_PROP_BOOL("mach_use_reliable_get_clock", KVMClockState,
                      mach_use_reliable_get_clock, true),
    DEFINE_PROP_END_OF_LIST(),
};

+        .driver   = "kvmclock",\
+        .property = "mach_use_reliable_get_clock",\
+        .value    = "off",\
+    },\
+    {\

> 
> >> If you return true here, you can still
> >> migrate a "false" value for src_use_reliable_get_clock.
> > 
> > But the source only uses a reliable KVM_GET_CLOCK if 
> > both conditions are true.
> > 
> > And the subsection is only needed if the source
> > uses a reliable KVM_GET_CLOCK.
> 
> Yes, but the point of the subsection is just to avoid breaking migration
> format.

Its a new creative use of the subsection. 

> >> It is the same as "is using masterclock", which is actually a stricter
> >> condition than the KVM_CHECK_EXTENSION return value.  The right check to
> >> use is whether masterclock is in use, 
> > 
> > Actually its "has a reliable KVM_GET_CLOCK" (which returns 
> > get_kernel_clock() + (rdtsc() - tsc_timestamp), 
> > 
> > "broken KVM_GET_CLOCK" =  get_kernel_clock()
> 
> Yes.  And these end up being the same.

No. You can have masterclock enabled, KVM_TSC_STABLE set in pvclock
flags, and unreliable KVM_GET_CLOCK (without your patch to
KVM_GET_CLOCK).


Paolo not sure if there is anything further you want me to 
change at this point ?

Index: qemu-mig-advance-clock/hw/i386/kvm/clock.c
===================================================================
--- qemu-mig-advance-clock.orig/hw/i386/kvm/clock.c	2016-11-14 10:40:39.748116312 -0200
+++ qemu-mig-advance-clock/hw/i386/kvm/clock.c	2016-11-14 13:38:29.299955042 -0200
@@ -36,6 +36,12 @@
 
     uint64_t clock;
     bool clock_valid;
+
+    /* whether machine supports reliable KVM_GET_CLOCK */
+    bool mach_use_reliable_get_clock;
+
+    /* whether source host supported reliable KVM_GET_CLOCK */
+    bool src_use_reliable_get_clock;
 } KVMClockState;
 
 struct pvclock_vcpu_time_info {
@@ -81,6 +87,19 @@
     return nsec + time.system_time;
 }
 
+static uint64_t kvm_get_clock(void)
+{
+    struct kvm_clock_data data;
+    int ret;
+
+    ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
+    if (ret < 0) {
+        fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
+                abort();
+    }
+    return data.clock;
+}
+
 static void kvmclock_vm_state_change(void *opaque, int running,
                                      RunState state)
 {
@@ -91,15 +110,37 @@
 
     if (running) {
         struct kvm_clock_data data = {};
-        uint64_t time_at_migration = kvmclock_current_nsec(s);
+        uint64_t pvclock_via_mem = 0;
 
-        s->clock_valid = false;
+        /* local (running VM) restore */
+        if (s->clock_valid) {
+            /*
+             * if host does not support reliable KVM_GET_CLOCK,
+             * read kvmclock value from memory
+             */
+            if (!kvm_has_adjust_clock_stable()) {
+                pvclock_via_mem = kvmclock_current_nsec(s);
+            }
+        /* migration/savevm/init restore */
+        } else {
+            /*
+             * use s->clock in case machine uses reliable
+             * get clock and source host supported
+             * reliable get clock
+             */
+            if (!(s->mach_use_reliable_get_clock &&
+                  s->src_use_reliable_get_clock)) {
+                pvclock_via_mem = kvmclock_current_nsec(s);
+            }
+        }
 
-        /* We can't rely on the migrated clock value, just discard it */
-        if (time_at_migration) {
-            s->clock = time_at_migration;
+        /* We can't rely on the saved clock value, just discard it */
+        if (pvclock_via_mem) {
+            s->clock = pvclock_via_mem;
         }
 
+        s->clock_valid = false;
+
         data.clock = s->clock;
         ret = kvm_vm_ioctl(kvm_state, KVM_SET_CLOCK, &data);
         if (ret < 0) {
@@ -120,8 +161,6 @@
             }
         }
     } else {
-        struct kvm_clock_data data;
-        int ret;
 
         if (s->clock_valid) {
             return;
@@ -129,13 +168,7 @@
 
         kvm_synchronize_all_tsc();
 
-        ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
-        if (ret < 0) {
-            fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
-            abort();
-        }
-        s->clock = data.clock;
-
+        s->clock = kvm_get_clock();
         /*
          * If the VM is stopped, declare the clock state valid to
          * avoid re-reading it on next vmsave (which would return
@@ -152,22 +185,69 @@
     qemu_add_vm_change_state_handler(kvmclock_vm_state_change, s);
 }
 
+static bool kvmclock_src_use_reliable_get_clock(void *opaque)
+{
+    KVMClockState *s = opaque;
+
+    /*
+     * On machine types that support reliable KVM_GET_CLOCK,
+     * if host kernel does provide reliable KVM_GET_CLOCK,
+     * set src_use_reliable_get_clock=true so that destination
+     * avoids reading kvmclock from memory.
+     */
+    if (s->mach_use_reliable_get_clock && kvm_has_adjust_clock_stable()) {
+        s->src_use_reliable_get_clock = true;
+    }
+
+    return s->mach_use_reliable_get_clock;
+}
+
+static const VMStateDescription kvmclock_reliable_get_clock = {
+    .name = "kvmclock/src_use_reliable_get_clock",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = kvmclock_src_use_reliable_get_clock,
+    .fields = (VMStateField[]) {
+        VMSTATE_BOOL(src_use_reliable_get_clock, KVMClockState),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static void kvmclock_pre_save(void *opaque)
+{
+    KVMClockState *s = opaque;
+
+    s->clock = kvm_get_clock();
+}
+
 static const VMStateDescription kvmclock_vmsd = {
     .name = "kvmclock",
     .version_id = 1,
     .minimum_version_id = 1,
+    .pre_save = kvmclock_pre_save,
     .fields = (VMStateField[]) {
         VMSTATE_UINT64(clock, KVMClockState),
         VMSTATE_END_OF_LIST()
+    },
+    .subsections = (const VMStateDescription * []) {
+        &kvmclock_reliable_get_clock,
+        NULL
     }
 };
 
+static Property kvmclock_properties[] = {
+    DEFINE_PROP_BOOL("mach_use_reliable_get_clock", KVMClockState,
+                      mach_use_reliable_get_clock, true),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
 static void kvmclock_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
 
     dc->realize = kvmclock_realize;
     dc->vmsd = &kvmclock_vmsd;
+    dc->props = kvmclock_properties;
 }
 
 static const TypeInfo kvmclock_info = {
Index: qemu-mig-advance-clock/include/hw/i386/pc.h
===================================================================
--- qemu-mig-advance-clock.orig/include/hw/i386/pc.h	2016-11-14 10:40:39.748116312 -0200
+++ qemu-mig-advance-clock/include/hw/i386/pc.h	2016-11-14 10:40:48.059128649 -0200
@@ -370,6 +370,11 @@
 #define PC_COMPAT_2_7 \
     HW_COMPAT_2_7 \
     {\
+        .driver   = "kvmclock",\
+        .property = "mach_use_reliable_get_clock",\
+        .value    = "off",\
+    },\
+    {\
         .driver   = TYPE_X86_CPU,\
         .property = "l3-cache",\
         .value    = "off",\
Index: qemu-mig-advance-clock/target-i386/kvm.c
===================================================================
--- qemu-mig-advance-clock.orig/target-i386/kvm.c	2016-11-14 10:40:39.750116314 -0200
+++ qemu-mig-advance-clock/target-i386/kvm.c	2016-11-14 10:40:48.061128652 -0200
@@ -117,6 +117,13 @@
     return kvm_check_extension(kvm_state, KVM_CAP_X86_SMM);
 }
 
+bool kvm_has_adjust_clock_stable(void)
+{
+    int ret = kvm_check_extension(kvm_state, KVM_CAP_ADJUST_CLOCK);
+
+    return (ret == KVM_CLOCK_TSC_STABLE);
+}
+
 bool kvm_allows_irq0_override(void)
 {
     return !kvm_irqchip_in_kernel() || kvm_has_gsi_routing();
Index: qemu-mig-advance-clock/target-i386/kvm_i386.h
===================================================================
--- qemu-mig-advance-clock.orig/target-i386/kvm_i386.h	2016-11-14 10:40:39.751116316 -0200
+++ qemu-mig-advance-clock/target-i386/kvm_i386.h	2016-11-14 10:40:48.061128652 -0200
@@ -17,6 +17,7 @@
 
 bool kvm_allows_irq0_override(void);
 bool kvm_has_smm(void);
+bool kvm_has_adjust_clock_stable(void);
 void kvm_synchronize_all_tsc(void);
 void kvm_arch_reset_vcpu(X86CPU *cs);
 void kvm_arch_do_init_vcpu(X86CPU *cs);