[qemu,2/2] kvmclock: reduce kvmclock difference on migration

On Mon, Nov 14, 2016 at 04:00:38PM +0100, Paolo Bonzini wrote:
> 
> 
> On 14/11/2016 15:50, Marcelo Tosatti wrote:
> > Well, i didnt want to mix the meaning of the variables:
> > 
> > +    /* whether machine supports reliable KVM_GET_CLOCK */
> > +    bool mach_use_reliable_get_clock;
> > +
> > +    /* whether source host supported reliable KVM_GET_CLOCK */
> > +    bool src_use_reliable_get_clock;
> > 
> > See the comments on top (later if you look at the variable, 
> > then have to think: well it has one name, but its disabled 
> > by that other path as well, so its more than its 
> > name,etc...).
> > 
> >> I'm thinking "mach_use_reliable_get_clock is just for migration,
> > 
> > Thats whether the machine supports it. New machines have it enabled,
> > olders don't.
> 
> Yes.
> 
> >> src_use_reliable_get_clock is the state". 
> > 
> > Thats whether the migration source supported it.
> 
> But it's not used only for migration.  It's used on every vmstate change
> (running->stop and stop->running, isn't it?  

No. Its used only if s->clock_valid == false, which means either:

migration/savevm/init.

Now for savevm there is a bug: 

> I think that, apart from
> the migration case, it's better to use s->clock if kvmclock is stable,
> even on older machine types.

Yes, its already doing that:

        /* local (running VM) restore */
        if (s->clock_valid) {
            /*
             * if host does not support reliable KVM_GET_CLOCK,
             * read kvmclock value from memory
             */
            if (!kvm_has_adjust_clock_stable()) {
                time_at_migration = kvmclock_current_nsec(s);
            }

It only uses kvmclock_current_nsec() if kvm_has_adjust_clock_stable is 
false.

> >>>>> +static bool kvmclock_src_use_reliable_get_clock(void *opaque)
> >>>>> +{
> >>>>> +    KVMClockState *s = opaque;
> >>>>> +
> >>>>> +    /*
> >>>>> +     * On machine types that support reliable KVM_GET_CLOCK,
> >>>>> +     * if host kernel does provide reliable KVM_GET_CLOCK,
> >>>>> +     * set src_use_reliable_get_clock=true so that destination
> >>>>> +     * avoids reading kvmclock from memory.
> >>>>> +     */
> >>>>> +    if (s->mach_use_reliable_get_clock && kvm_has_adjust_clock_stable()) {
> >>>>> +        s->src_use_reliable_get_clock = true;
> >>>>> +    }
> >>>>> +
> >>>>> +    return s->src_use_reliable_get_clock;
> >>>>> +}
> >>>>
> >>>> Here you can just return s->mach_use_reliable_get_clock. 
> >>>
> >>> mach_use_reliable_get_clock can be true but host might not support it.
> >>
> >> Yes, but the "needed" function is only required to avoid breaking
> >> pc-i440fx-2.7 and earlier. 
> > 
> > "needed" is required so that the migration between:
> > 
> > SRC             DEST                BEHAVIOUR
> > ~support        supports            on migration read from guest,
> >                                     on stop/cont use
> >                                     kvm_get_clock/kvm_set_clock
> > 
> > Destination does not use KVM_GET_CLOCK value (which is
> > broken and should not be used).
> 
> If needed returns false, the destination will see
> src_use_reliable_get_clock = false anyway.

Causing it to read the clock from memory, thats what 
should happen.

> If needed returns true, the destination can still see
> src_use_reliable_get_clock = false if that's the value.

You are asking me to change from:

+static bool kvmclock_src_use_reliable_get_clock(void *opaque)
+{
+    KVMClockState *s = opaque;
+
+    /*
+     * On machine types that support reliable KVM_GET_CLOCK,
+     * if host kernel does provide reliable KVM_GET_CLOCK,
+     * set src_use_reliable_get_clock=true so that destination
+     * avoids reading kvmclock from memory.
+     */
+    if (s->mach_use_reliable_get_clock && kvm_has_adjust_clock_stable()) {
+        s->src_use_reliable_get_clock = true;
+    }
+
+    return s->src_use_reliable_get_clock;
+}

to

static bool kvmclock_src_use_reliable_get_clock(void *opaque)
{
    KVMClockState *s = opaque;

    /*
     * On machine types that support reliable KVM_GET_CLOCK,
     * if host kernel does provide reliable KVM_GET_CLOCK,
     * set src_use_reliable_get_clock=true so that destination
     * avoids reading kvmclock from memory.
     */
    if (s->mach_use_reliable_get_clock && kvm_has_adjust_clock_stable())
    {
        s->src_use_reliable_get_clock = true;
    }

    return s->mach_use_reliable_get_clock;
}

Ah, OK, done.

> needed	src_use_reliable_get_clock	effect
> false	false				use kvmclock_current_nsec
> false	true				use kvmclock_current_nsec
> true	false				use kvmclock_current_nsec
> true	true				use s->clock
> 
> 
> So the idea is:
> 
> - set s->clock and s->reliable_get_clock on every KVM_GET_CLOCK

Already do that (apart from s->clock_valid which is necessary).

> - on migration source, use subsections and
> s->mach_use_reliable_get_clock to avoid breaking migration on pre-2.8
> machine types

Migration is already broken when you use different machine types.

So s->src_use_reliable_get_clock is only used to indicate 
to the destination that: "you can use KVM_GET_CLOCK value, 
its safe".

> - on migration destination, use .pre_load so that s->reliable_get_clock
> is initialized to false on older machine types

Thats what mach_use_reliable_get_clock does already:

static Property kvmclock_properties[] = {
    DEFINE_PROP_BOOL("mach_use_reliable_get_clock", KVMClockState,
                      mach_use_reliable_get_clock, true),
    DEFINE_PROP_END_OF_LIST(),
};

+        .driver   = "kvmclock",\
+        .property = "mach_use_reliable_get_clock",\
+        .value    = "off",\
+    },\
+    {\

> 
> >> If you return true here, you can still
> >> migrate a "false" value for src_use_reliable_get_clock.
> > 
> > But the source only uses a reliable KVM_GET_CLOCK if 
> > both conditions are true.
> > 
> > And the subsection is only needed if the source
> > uses a reliable KVM_GET_CLOCK.
> 
> Yes, but the point of the subsection is just to avoid breaking migration
> format.

Its a new creative use of the subsection. 

> >> It is the same as "is using masterclock", which is actually a stricter
> >> condition than the KVM_CHECK_EXTENSION return value.  The right check to
> >> use is whether masterclock is in use, 
> > 
> > Actually its "has a reliable KVM_GET_CLOCK" (which returns 
> > get_kernel_clock() + (rdtsc() - tsc_timestamp), 
> > 
> > "broken KVM_GET_CLOCK" =  get_kernel_clock()
> 
> Yes.  And these end up being the same.

No. You can have masterclock enabled, KVM_TSC_STABLE set in pvclock
flags, and unreliable KVM_GET_CLOCK (without your patch to
KVM_GET_CLOCK).

Paolo not sure if there is anything further you want me to 
change at this point ?

[qemu,2/2] kvmclock: reduce kvmclock difference on migration

Commit Message

Comments

Patch