
[v3,2/2] i386/kvm: lower requirements for Hyper-V frequency MSRs exposure

Message ID 20180320173500.32065-3-vkuznets@redhat.com (mailing list archive)
State New, archived

Commit Message

Vitaly Kuznetsov March 20, 2018, 5:35 p.m. UTC
Requiring tsc_is_stable_and_known() is too restrictive: even without INVTSC,
nested Hyper-V-on-KVM enables TSC pages for its guests, e.g. when
Reenlightenment MSRs are present. The presence of frequency MSRs doesn't mean
these frequencies are stable; it just means they're available for reading.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 target/i386/kvm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Roman Kagan March 21, 2018, 12:10 p.m. UTC | #1
On Tue, Mar 20, 2018 at 06:35:00PM +0100, Vitaly Kuznetsov wrote:
> Requiring tsc_is_stable_and_known() is too restrictive: even without INVTCS
> nested Hyper-V-on-KVM enables TSC pages for its guests e.g. when
> Reenlightenment MSRs are present. Presence of frequency MSRs doesn't mean
> these frequencies are stable, it just means they're available for reading.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  target/i386/kvm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/target/i386/kvm.c b/target/i386/kvm.c
> index 7d9f9ca0b1..74fc3d3b2c 100644
> --- a/target/i386/kvm.c
> +++ b/target/i386/kvm.c
> @@ -651,7 +651,7 @@ static int hyperv_handle_properties(CPUState *cs)
>          env->features[FEAT_HYPERV_EAX] |= HV_TIME_REF_COUNT_AVAILABLE;
>          env->features[FEAT_HYPERV_EAX] |= HV_REFERENCE_TSC_AVAILABLE;
>  
> -        if (has_msr_hv_frequencies && tsc_is_stable_and_known(env)) {
> +        if (has_msr_hv_frequencies && env->tsc_khz) {
>              env->features[FEAT_HYPERV_EAX] |= HV_ACCESS_FREQUENCY_MSRS;
>              env->features[FEAT_HYPERV_EDX] |= HV_FREQUENCY_MSRS_AVAILABLE;
>          }

I suggest that we add a corresponding cpu property here, too.  The guest
may legitimately rely on these MSRs when it sees the support in CPUID,
and migrating from a kernel with the feature supported (4.14+) to an
older one will make it crash.

Roman.
Vitaly Kuznetsov March 21, 2018, 1:18 p.m. UTC | #2
Roman Kagan <rkagan@virtuozzo.com> writes:

> On Tue, Mar 20, 2018 at 06:35:00PM +0100, Vitaly Kuznetsov wrote:
>> Requiring tsc_is_stable_and_known() is too restrictive: even without INVTCS
>> nested Hyper-V-on-KVM enables TSC pages for its guests e.g. when
>> Reenlightenment MSRs are present. Presence of frequency MSRs doesn't mean
>> these frequencies are stable, it just means they're available for reading.
>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>>  target/i386/kvm.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/target/i386/kvm.c b/target/i386/kvm.c
>> index 7d9f9ca0b1..74fc3d3b2c 100644
>> --- a/target/i386/kvm.c
>> +++ b/target/i386/kvm.c
>> @@ -651,7 +651,7 @@ static int hyperv_handle_properties(CPUState *cs)
>>          env->features[FEAT_HYPERV_EAX] |= HV_TIME_REF_COUNT_AVAILABLE;
>>          env->features[FEAT_HYPERV_EAX] |= HV_REFERENCE_TSC_AVAILABLE;
>>  
>> -        if (has_msr_hv_frequencies && tsc_is_stable_and_known(env)) {
>> +        if (has_msr_hv_frequencies && env->tsc_khz) {
>>              env->features[FEAT_HYPERV_EAX] |= HV_ACCESS_FREQUENCY_MSRS;
>>              env->features[FEAT_HYPERV_EDX] |= HV_FREQUENCY_MSRS_AVAILABLE;
>>          }
>
> I suggest that we add a corresponding cpu property here, too.  The guest
> may legitimately rely on these msrs when it sees the support in CPUID,
> and migrating from a kernel with the feature supported (4.14+) to an
> older one will make it crash.
>

This can be arranged, but what happens to people who use these features
today? Assuming they also passed 'invtsc', they already have a stable TSC
page clocksource (when the Hyper-V role is enabled), but when we start
requiring a new 'hv_frequency' cpu property they'll suddenly lose what
they have...
Paolo Bonzini March 21, 2018, 3:33 p.m. UTC | #3
On 20/03/2018 18:35, Vitaly Kuznetsov wrote:
> +        if (has_msr_hv_frequencies && env->tsc_khz) {
>              env->features[FEAT_HYPERV_EAX] |= HV_ACCESS_FREQUENCY_MSRS;
>              env->features[FEAT_HYPERV_EDX] |= HV_FREQUENCY_MSRS_AVAILABLE;
>          }

Since you have added cpu->hyperv_reenlightenment, I'd rather change this
so that the "license to change guest ABI across migration" doesn't apply
to any more cases.  We can exploit the fact that Windows doesn't even use
the MSRs unless either invtsc or re-enlightenment is present.  Something
like this:

       if (has_msr_hv_frequencies && env->tsc_khz &&
           (tsc_is_stable_and_known(env) ||
            cpu->hyperv_reenlightenment))

will make the MSRs visible in all useful cases, without having to add
yet another knob.

(Don't worry, this backwards-compatibility stuff is the hardest part.
I'm so happy that Eduardo is the one maintaining it :)).

Paolo
Vitaly Kuznetsov March 21, 2018, 4:17 p.m. UTC | #4
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 20/03/2018 18:35, Vitaly Kuznetsov wrote:
>> +        if (has_msr_hv_frequencies && env->tsc_khz) {
>>              env->features[FEAT_HYPERV_EAX] |= HV_ACCESS_FREQUENCY_MSRS;
>>              env->features[FEAT_HYPERV_EDX] |= HV_FREQUENCY_MSRS_AVAILABLE;
>>          }
>
> Since you have added cpu->hyperv_reenlightenment, I'd rather change this
> so that we don't make the "license to change guest ABI across migration"
> apply more.  We can exploit the fact that Windows doesn't even use the
> MSRs unless either invtsc or re-enlightenment is present.  Something
> like this:
>
>        if (has_msr_hv_frequencies && env->tsc_khz &&
>            (tsc_is_stable_and_known(env) ||
>             cpu->hyperv_reenlightenment))
>
> will make the MSRs visible in all useful cases, without having to add
> yet another knob.
>

Can be arranged, of course.

(What I'm worried about with all our hv_* knobs is that the more of them
we have, the easier it is to assemble some frankenstein which won't look
like any existing Hyper-V version; we're probably not doing a very good
job testing all possible hv_* combinations, as people probably use 'all
or nothing'. If we end up finding a bug in Windows with some weird
hv_* combination, it's unlikely Microsoft will bother fixing it, as it
doesn't reproduce on any existing Hyper-V version.

That said, it would be great to eventually have something like an
'hv_ws2012r2' property making us look exactly like a real WS2012R2.
Unfortunately, I'm unsure about a path to get there).

> (Don't worry, this backwards-compatibility stuff is the hardest part.
> I'm so happy that Eduardo is the one maintaining it :)).

I feel the pain :-) Thanks for the reviews!
Roman Kagan March 21, 2018, 4:47 p.m. UTC | #5
On Wed, Mar 21, 2018 at 04:33:33PM +0100, Paolo Bonzini wrote:
> On 20/03/2018 18:35, Vitaly Kuznetsov wrote:
> > +        if (has_msr_hv_frequencies && env->tsc_khz) {
> >              env->features[FEAT_HYPERV_EAX] |= HV_ACCESS_FREQUENCY_MSRS;
> >              env->features[FEAT_HYPERV_EDX] |= HV_FREQUENCY_MSRS_AVAILABLE;
> >          }
> 
> Since you have added cpu->hyperv_reenlightenment, I'd rather change this
> so that we don't make the "license to change guest ABI across migration"
> apply more.  We can exploit the fact that Windows doesn't even use the
> MSRs unless either invtsc or re-enlightenment is present.

That's not a given.  It reportedly doesn't use these MSRs *now*, but I
see no reason for it not to start using them at some point, say, to avoid
TSC calibration.  E.g. Linux already does so with its hv_get_tsc_khz().

And the bad thing is that it'll kill your guest with #GP when you
migrate from a new KVM to an old one.

Roman.
Roman Kagan March 21, 2018, 4:57 p.m. UTC | #6
On Wed, Mar 21, 2018 at 02:18:54PM +0100, Vitaly Kuznetsov wrote:
> Roman Kagan <rkagan@virtuozzo.com> writes:
> 
> > On Tue, Mar 20, 2018 at 06:35:00PM +0100, Vitaly Kuznetsov wrote:
> >> Requiring tsc_is_stable_and_known() is too restrictive: even without INVTCS
> >> nested Hyper-V-on-KVM enables TSC pages for its guests e.g. when
> >> Reenlightenment MSRs are present. Presence of frequency MSRs doesn't mean
> >> these frequencies are stable, it just means they're available for reading.
> >> 
> >> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> >> ---
> >>  target/i386/kvm.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >> 
> >> diff --git a/target/i386/kvm.c b/target/i386/kvm.c
> >> index 7d9f9ca0b1..74fc3d3b2c 100644
> >> --- a/target/i386/kvm.c
> >> +++ b/target/i386/kvm.c
> >> @@ -651,7 +651,7 @@ static int hyperv_handle_properties(CPUState *cs)
> >>          env->features[FEAT_HYPERV_EAX] |= HV_TIME_REF_COUNT_AVAILABLE;
> >>          env->features[FEAT_HYPERV_EAX] |= HV_REFERENCE_TSC_AVAILABLE;
> >>  
> >> -        if (has_msr_hv_frequencies && tsc_is_stable_and_known(env)) {
> >> +        if (has_msr_hv_frequencies && env->tsc_khz) {
> >>              env->features[FEAT_HYPERV_EAX] |= HV_ACCESS_FREQUENCY_MSRS;
> >>              env->features[FEAT_HYPERV_EDX] |= HV_FREQUENCY_MSRS_AVAILABLE;
> >>          }
> >
> > I suggest that we add a corresponding cpu property here, too.  The guest
> > may legitimately rely on these msrs when it sees the support in CPUID,
> > and migrating from a kernel with the feature supported (4.14+) to an
> > older one will make it crash.
> >
> 
> This can be arranged, but what happens to people who use these features
> today? Assuming they also passed 'invtsc' they have stable TSC page
> clocksource already (when Hyper-V role is enabled) but when we start
> requesting a new 'hv_frequency' cpu property they'll suddenly lose what
> they have...

I see two cases here:

1) people start a new VM, and discover that their old configuration is
   not enough for this feature to work.

   They need to reconfigure and restart the VM.  This costs them some
   time investigating and restarting, but not data.

2) people migrate from a QEMU without ->hv_frequency, to a new one with
   ->hv_frequency=off (assuming on both ends KVM supports the frequency
   MSRs).

   With the current implementation in KVM, this will only result in the
   feature bits disappearing from the respective CPUID leaf, but the
   MSRs themselves will continue working as they used to.  So the guest
   either won't notice or will check the CPUID and adjust.


Am I missing anything?

Roman.
Roman Kagan March 21, 2018, 5:17 p.m. UTC | #7
On Wed, Mar 21, 2018 at 05:17:38PM +0100, Vitaly Kuznetsov wrote:
> (What I'm worried about with all our hv_* knobs is that the more of them
> we have, the easier it is to assemble some frankenstein which won't look
> like any existing Hyper-V version; we're probably not doing a very good
> job testing all possible hv_* combinations, as people probably use 'all
> or nothing'. If we end up finding a bug in Windows with some weird
> hv_* combination, it's unlikely Microsoft will bother fixing it, as it
> doesn't reproduce on any existing Hyper-V version.

I agree that this is getting cumbersome, but, given that features get
added incrementally and we need to be able to maintain backwards
compatibility, I'm afraid this is unavoidable.

> That said, it would be great to eventually have something like an
> 'hv_ws2012r2' property making us look exactly like a real WS2012R2.
> Unfortunately, I'm unsure about a path to get there).

I'm tempted to delegate this -- combining features into user-friendly
sets -- to the upper layers: libvirt or even something on top of it.

Roman.
Eduardo Habkost March 21, 2018, 8:06 p.m. UTC | #8
On Wed, Mar 21, 2018 at 08:17:55PM +0300, Roman Kagan wrote:
> On Wed, Mar 21, 2018 at 05:17:38PM +0100, Vitaly Kuznetsov wrote:
> > (What I'm worried about with all our hv_* knobs is that the more of them
> > we have, the easier it is to assemble some frankenstein which won't look
> > like any existing Hyper-V version; we're probably not doing a very good
> > job testing all possible hv_* combinations, as people probably use 'all
> > or nothing'. If we end up finding a bug in Windows with some weird
> > hv_* combination, it's unlikely Microsoft will bother fixing it, as it
> > doesn't reproduce on any existing Hyper-V version.
> 
> I agree that this is getting cumbersome, but, given that features get
> added incrementally and we need to be able to maintain backwards
> compatibility, I'm afraid this is unavoidable.
> 
> > That said, it would be great to eventually have something like an
> > 'hv_ws2012r2' property making us look exactly like a real WS2012R2.
> > Unfortunately, I'm unsure about a path to get there).
> 
> I'm tempted to delegate this -- combining features into user-friendly
> sets -- to the upper layers: libvirt or even something on top of it.

Sounds simpler to me; otherwise we would need a mechanism to tell
the upper layers which of those named sets are usable on the current
host.
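What such a mechanism would have to answer can be sketched as a set-membership check: a named preset expands to a list of hv-* properties, and it is usable only if the host supports all of them. The preset name, its membership, and the host property lists are placeholders invented for illustration; this is not a libvirt or QEMU interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A named set of hv-* CPU properties (hypothetical). */
struct hv_preset {
    const char *name;
    const char *const *props;
    size_t nprops;
};

/* Membership invented for illustration only. */
static const char *const example_props[] = {
    "hv-time", "hv-frequencies", "hv-reenlightenment",
};

static const struct hv_preset example_preset = {
    "example-preset", example_props,
    sizeof(example_props) / sizeof(example_props[0]),
};

/* Linear search of the host's supported property names. */
static bool host_supports(const char *prop, const char *const *host_props,
                          size_t nhost)
{
    for (size_t i = 0; i < nhost; i++)
        if (strcmp(prop, host_props[i]) == 0)
            return true;
    return false;
}

/* A preset is usable only if every property in it is supported. */
static bool preset_usable(const struct hv_preset *p,
                          const char *const *host_props, size_t nhost)
{
    assert(p != NULL);
    for (size_t i = 0; i < p->nprops; i++)
        if (!host_supports(p->props[i], host_props, nhost))
            return false;
    return true;
}
```

Keeping presets in the upper layers means only the per-property support list has to cross the QEMU boundary; the expansion and the usability check stay with the management tool.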
Eduardo Habkost March 21, 2018, 8:19 p.m. UTC | #9
On Wed, Mar 21, 2018 at 07:57:29PM +0300, Roman Kagan wrote:
> On Wed, Mar 21, 2018 at 02:18:54PM +0100, Vitaly Kuznetsov wrote:
> > Roman Kagan <rkagan@virtuozzo.com> writes:
> > 
> > > On Tue, Mar 20, 2018 at 06:35:00PM +0100, Vitaly Kuznetsov wrote:
> > >> Requiring tsc_is_stable_and_known() is too restrictive: even without INVTCS
> > >> nested Hyper-V-on-KVM enables TSC pages for its guests e.g. when
> > >> Reenlightenment MSRs are present. Presence of frequency MSRs doesn't mean
> > >> these frequencies are stable, it just means they're available for reading.
> > >> 
> > >> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> > >> ---
> > >>  target/i386/kvm.c | 2 +-
> > >>  1 file changed, 1 insertion(+), 1 deletion(-)
> > >> 
> > >> diff --git a/target/i386/kvm.c b/target/i386/kvm.c
> > >> index 7d9f9ca0b1..74fc3d3b2c 100644
> > >> --- a/target/i386/kvm.c
> > >> +++ b/target/i386/kvm.c
> > >> @@ -651,7 +651,7 @@ static int hyperv_handle_properties(CPUState *cs)
> > >>          env->features[FEAT_HYPERV_EAX] |= HV_TIME_REF_COUNT_AVAILABLE;
> > >>          env->features[FEAT_HYPERV_EAX] |= HV_REFERENCE_TSC_AVAILABLE;
> > >>  
> > >> -        if (has_msr_hv_frequencies && tsc_is_stable_and_known(env)) {
> > >> +        if (has_msr_hv_frequencies && env->tsc_khz) {
> > >>              env->features[FEAT_HYPERV_EAX] |= HV_ACCESS_FREQUENCY_MSRS;
> > >>              env->features[FEAT_HYPERV_EDX] |= HV_FREQUENCY_MSRS_AVAILABLE;
> > >>          }
> > >
> > > I suggest that we add a corresponding cpu property here, too.  The guest
> > > may legitimately rely on these msrs when it sees the support in CPUID,
> > > and migrating from a kernel with the feature supported (4.14+) to an
> > > older one will make it crash.
> > >
> > 
> > This can be arranged, but what happens to people who use these features
> > today? Assuming they also passed 'invtsc' they have stable TSC page
> > clocksource already (when Hyper-V role is enabled) but when we start
> > requesting a new 'hv_frequency' cpu property they'll suddenly lose what
> > they have...
> 
> I see two cases here:
> 
> 1) people start a new VM, and discover that their old configuration is
>    not enough for this feature to work.
> 
>    They need to reconfigure and restart the VM.  This costs them some
>    time investigating and restarting, but not data.

If we keep machine-type compatibility, people will need to do
that only if they change the machine-type (or use the "pc" or
"q35" aliases).  If they copy the old configuration, it will keep
working.

machine-type compatibility also makes the following case a bit
safer:

> 
> 2) people migrate from a QEMU without ->hv_frequency, to a new one with
>    ->hv_frequency=off (assuming on both ends KVM supports the frequency
>    MSRs).
> 
>    With the current implementation in KVM, this will only result in the
>    feature bits disappearing from the respective CPUID leaf, but the
>    MSRs themselves will continue working as they used to.  So the guest
>    either won't notice or will check the CPUID and adjust.

If we keep machine-type compatibility, the CPUID bit won't
disappear for the guest while the MSRs keep working.


Whichever solution we choose, we can still have guests crashing
if migrating a pc-2.11 machine from a 4.14+ host kernel to a host
with an older kernel.  But I don't think there's a way out of
this, except requiring an explicit "hv-frequencies" CPU option on
newer machine-types.
Roman Kagan March 22, 2018, 1 p.m. UTC | #10
On Wed, Mar 21, 2018 at 05:19:24PM -0300, Eduardo Habkost wrote:
> On Wed, Mar 21, 2018 at 07:57:29PM +0300, Roman Kagan wrote:
> > On Wed, Mar 21, 2018 at 02:18:54PM +0100, Vitaly Kuznetsov wrote:
> > > Roman Kagan <rkagan@virtuozzo.com> writes:
> > > 
> > > > On Tue, Mar 20, 2018 at 06:35:00PM +0100, Vitaly Kuznetsov wrote:
> > > >> Requiring tsc_is_stable_and_known() is too restrictive: even without INVTCS
> > > >> nested Hyper-V-on-KVM enables TSC pages for its guests e.g. when
> > > >> Reenlightenment MSRs are present. Presence of frequency MSRs doesn't mean
> > > >> these frequencies are stable, it just means they're available for reading.
> > > >> 
> > > >> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> > > >> ---
> > > >>  target/i386/kvm.c | 2 +-
> > > >>  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >> 
> > > >> diff --git a/target/i386/kvm.c b/target/i386/kvm.c
> > > >> index 7d9f9ca0b1..74fc3d3b2c 100644
> > > >> --- a/target/i386/kvm.c
> > > >> +++ b/target/i386/kvm.c
> > > >> @@ -651,7 +651,7 @@ static int hyperv_handle_properties(CPUState *cs)
> > > >>          env->features[FEAT_HYPERV_EAX] |= HV_TIME_REF_COUNT_AVAILABLE;
> > > >>          env->features[FEAT_HYPERV_EAX] |= HV_REFERENCE_TSC_AVAILABLE;
> > > >>  
> > > >> -        if (has_msr_hv_frequencies && tsc_is_stable_and_known(env)) {
> > > >> +        if (has_msr_hv_frequencies && env->tsc_khz) {
> > > >>              env->features[FEAT_HYPERV_EAX] |= HV_ACCESS_FREQUENCY_MSRS;
> > > >>              env->features[FEAT_HYPERV_EDX] |= HV_FREQUENCY_MSRS_AVAILABLE;
> > > >>          }
> > > >
> > > > I suggest that we add a corresponding cpu property here, too.  The guest
> > > > may legitimately rely on these msrs when it sees the support in CPUID,
> > > > and migrating from a kernel with the feature supported (4.14+) to an
> > > > older one will make it crash.
> > > >
> > > 
> > > This can be arranged, but what happens to people who use these features
> > > today? Assuming they also passed 'invtsc' they have stable TSC page
> > > clocksource already (when Hyper-V role is enabled) but when we start
> > > requesting a new 'hv_frequency' cpu property they'll suddenly lose what
> > > they have...
> > 
> > I see two cases here:
> > 
> > 1) people start a new VM, and discover that their old configuration is
> >    not enough for this feature to work.
> > 
> >    They need to reconfigure and restart the VM.  This costs them some
> >    time investigating and restarting, but not data.
> 
> If we keep machine-type compatibility, people will need to do
> that only if they change the machine-type (or use the "pc" or
> "q35" aliases).  If they copy the old configuration, it will keep
> working.

The problem is that the feature is not fixed by the machine-type, due to
the forgotten property: it only depends on the KVM version.  So, once
(if) we add the property and make the feature deterministic, we'll lose
compatibility one way or another.

Or are you suggesting that for pre-2.12 machine types we leave the
property at "decided by your KVM" state?

> 
> machine-type compatibility also makes the following case a bit
> safer:
> 
> > 
> > 2) people migrate from a QEMU without ->hv_frequency, to a new one with
> >    ->hv_frequency=off (assuming on both ends KVM supports the frequency
> >    MSRs).
> > 
> >    With the current implementation in KVM, this will only result in the
> >    feature bits disappearing from the respective CPUID leaf, but the
> >    MSRs themselves will continue working as they used to.  So the guest
> >    either won't notice or will check the CPUID and adjust.
> 
> If we keep machine-type compatibility, the CPUID bit won't
> disappear for the guest while the MSRs keep working.
> 
> 
> Whichever solution we choose, we can still have guests crashing
> if migrating a pc-2.11 machine from a 4.14+ host kernel to a host
> with an older kernel.  But I don't think there's a way out of
> this, except requiring an explicit "hv-frequencies" CPU option on
> newer machine-types.

What's wrong with requiring it, as we do for all other hv_* properties?

Roman.
Eduardo Habkost March 22, 2018, 1:22 p.m. UTC | #11
On Thu, Mar 22, 2018 at 04:00:14PM +0300, Roman Kagan wrote:
> On Wed, Mar 21, 2018 at 05:19:24PM -0300, Eduardo Habkost wrote:
> > On Wed, Mar 21, 2018 at 07:57:29PM +0300, Roman Kagan wrote:
> > > On Wed, Mar 21, 2018 at 02:18:54PM +0100, Vitaly Kuznetsov wrote:
> > > > Roman Kagan <rkagan@virtuozzo.com> writes:
> > > > 
> > > > > On Tue, Mar 20, 2018 at 06:35:00PM +0100, Vitaly Kuznetsov wrote:
> > > > >> Requiring tsc_is_stable_and_known() is too restrictive: even without INVTCS
> > > > >> nested Hyper-V-on-KVM enables TSC pages for its guests e.g. when
> > > > >> Reenlightenment MSRs are present. Presence of frequency MSRs doesn't mean
> > > > >> these frequencies are stable, it just means they're available for reading.
> > > > >> 
> > > > >> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> > > > >> ---
> > > > >>  target/i386/kvm.c | 2 +-
> > > > >>  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > >> 
> > > > >> diff --git a/target/i386/kvm.c b/target/i386/kvm.c
> > > > >> index 7d9f9ca0b1..74fc3d3b2c 100644
> > > > >> --- a/target/i386/kvm.c
> > > > >> +++ b/target/i386/kvm.c
> > > > >> @@ -651,7 +651,7 @@ static int hyperv_handle_properties(CPUState *cs)
> > > > >>          env->features[FEAT_HYPERV_EAX] |= HV_TIME_REF_COUNT_AVAILABLE;
> > > > >>          env->features[FEAT_HYPERV_EAX] |= HV_REFERENCE_TSC_AVAILABLE;
> > > > >>  
> > > > >> -        if (has_msr_hv_frequencies && tsc_is_stable_and_known(env)) {
> > > > >> +        if (has_msr_hv_frequencies && env->tsc_khz) {
> > > > >>              env->features[FEAT_HYPERV_EAX] |= HV_ACCESS_FREQUENCY_MSRS;
> > > > >>              env->features[FEAT_HYPERV_EDX] |= HV_FREQUENCY_MSRS_AVAILABLE;
> > > > >>          }
> > > > >
> > > > > I suggest that we add a corresponding cpu property here, too.  The guest
> > > > > may legitimately rely on these msrs when it sees the support in CPUID,
> > > > > and migrating from a kernel with the feature supported (4.14+) to an
> > > > > older one will make it crash.
> > > > >
> > > > 
> > > > This can be arranged, but what happens to people who use these features
> > > > today? Assuming they also passed 'invtsc' they have stable TSC page
> > > > clocksource already (when Hyper-V role is enabled) but when we start
> > > > requesting a new 'hv_frequency' cpu property they'll suddenly lose what
> > > > they have...
> > > 
> > > I see two cases here:
> > > 
> > > 1) people start a new VM, and discover that their old configuration is
> > >    not enough for this feature to work.
> > > 
> > >    They need to reconfigure and restart the VM.  This costs them some
> > >    time investigating and restarting, but not data.
> > 
> > If we keep machine-type compatibility, people will need to do
> > that only if they change the machine-type (or use the "pc" or
> > "q35" aliases).  If they copy the old configuration, it will keep
> > working.
> 
> The problem is that the feature is not fixed by the machine-type, due to
> the forgotten property: it only depends on the KVM version.  So, once
> (if) we add the property and make the feature deterministic, we'll lose
> compatibility one way or another.
> 
> Or are you suggesting that for pre-2.12 machine types we leave the
> property at "decided by your KVM" state?

Yes, that's what I mean.  This looks like the only way to avoid
losing features by just cold-rebooting an existing VM.

The scenario I'm thinking is this:

1) pc-2.11 VM started on host running QEMU 2.11
2) VM migrated to a host containing this patch
3) 1 year later, the VM is shut down and booted again.
4) Things stop working inside the VM because hv-frequency is
   unexpectedly gone.

Machine-type compatibility code would avoid (4).
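The "decided by your KVM" state for pre-2.12 machine types can be modelled as a third property value next to on/off. In this hypothetical sketch, the enum values, the version check, and the refuse-to-start rule for "on" are assumptions, not QEMU code; it shows that a cold reboot of a pc-2.11 guest on a host that implements the MSRs keeps exposing them, which is what avoids step (4).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tri-state for the frequency-MSR feature. */
enum hv_freq_mode { HV_FREQ_OFF, HV_FREQ_ON, HV_FREQ_AUTO };

enum boot_result { BOOT_WITHOUT_MSRS, BOOT_WITH_MSRS, BOOT_REFUSED };

/* Hypothetical machine-type default: pre-2.12 types keep the old
 * KVM-dependent behaviour, newer ones require an explicit opt-in. */
static enum hv_freq_mode default_mode(int major, int minor)
{
    bool pre_2_12 = major < 2 || (major == 2 && minor < 12);
    return pre_2_12 ? HV_FREQ_AUTO : HV_FREQ_OFF;
}

/* What one (cold) boot produces on a host with or without KVM support. */
static enum boot_result boot(enum hv_freq_mode mode, bool kvm_has_msrs)
{
    switch (mode) {
    case HV_FREQ_AUTO:
        return kvm_has_msrs ? BOOT_WITH_MSRS : BOOT_WITHOUT_MSRS;
    case HV_FREQ_ON:
        return kvm_has_msrs ? BOOT_WITH_MSRS : BOOT_REFUSED;
    case HV_FREQ_OFF:
        return BOOT_WITHOUT_MSRS;
    }
    assert(!"invalid hv_freq_mode");
    return BOOT_REFUSED;
}
```

The trade-off raised in the next reply is visible here too: AUTO keeps the feature across reboots, but it still lets a running guest migrate to a host that lacks the MSRs.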


> 
> > 
> > machine-type compatibility also makes the following case a bit
> > safer:
> > 
> > > 
> > > 2) people migrate from a QEMU without ->hv_frequency, to a new one with
> > >    ->hv_frequency=off (assuming on both ends KVM supports the frequency
> > >    MSRs).
> > > 
> > >    With the current implementation in KVM, this will only result in the
> > >    feature bits disappearing from the respective CPUID leaf, but the
> > >    MSRs themselves will continue working as they used to.  So the guest
> > >    either won't notice or will check the CPUID and adjust.
> > 
> > If we keep machine-type compatibility, the CPUID bit won't
> > disappear for the guest while the MSRs keep working.
> > 
> > 
> > Whichever solution we choose, we can still have guests crashing
> > if migrating a pc-2.11 machine from a 4.14+ host kernel to a host
> > with an older kernel.  But I don't think there's a way out of
> > this, except requiring an explicit "hv-frequencies" CPU option on
> > newer machine-types.
> 
> What's wrong with requiring it, as we do for all other hv_* properties?

On new machine-types, nothing wrong.

On existing machine-types, see above.
Roman Kagan March 22, 2018, 1:58 p.m. UTC | #12
On Thu, Mar 22, 2018 at 10:22:18AM -0300, Eduardo Habkost wrote:
> On Thu, Mar 22, 2018 at 04:00:14PM +0300, Roman Kagan wrote:
> > On Wed, Mar 21, 2018 at 05:19:24PM -0300, Eduardo Habkost wrote:
> > > On Wed, Mar 21, 2018 at 07:57:29PM +0300, Roman Kagan wrote:
> > > > On Wed, Mar 21, 2018 at 02:18:54PM +0100, Vitaly Kuznetsov wrote:
> > > > > Roman Kagan <rkagan@virtuozzo.com> writes:
> > > > > 
> > > > > > On Tue, Mar 20, 2018 at 06:35:00PM +0100, Vitaly Kuznetsov wrote:
> > > > > >> Requiring tsc_is_stable_and_known() is too restrictive: even without INVTCS
> > > > > >> nested Hyper-V-on-KVM enables TSC pages for its guests e.g. when
> > > > > >> Reenlightenment MSRs are present. Presence of frequency MSRs doesn't mean
> > > > > >> these frequencies are stable, it just means they're available for reading.
> > > > > >> 
> > > > > >> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> > > > > >> ---
> > > > > >>  target/i386/kvm.c | 2 +-
> > > > > >>  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > >> 
> > > > > >> diff --git a/target/i386/kvm.c b/target/i386/kvm.c
> > > > > >> index 7d9f9ca0b1..74fc3d3b2c 100644
> > > > > >> --- a/target/i386/kvm.c
> > > > > >> +++ b/target/i386/kvm.c
> > > > > >> @@ -651,7 +651,7 @@ static int hyperv_handle_properties(CPUState *cs)
> > > > > >>          env->features[FEAT_HYPERV_EAX] |= HV_TIME_REF_COUNT_AVAILABLE;
> > > > > >>          env->features[FEAT_HYPERV_EAX] |= HV_REFERENCE_TSC_AVAILABLE;
> > > > > >>  
> > > > > >> -        if (has_msr_hv_frequencies && tsc_is_stable_and_known(env)) {
> > > > > >> +        if (has_msr_hv_frequencies && env->tsc_khz) {
> > > > > >>              env->features[FEAT_HYPERV_EAX] |= HV_ACCESS_FREQUENCY_MSRS;
> > > > > >>              env->features[FEAT_HYPERV_EDX] |= HV_FREQUENCY_MSRS_AVAILABLE;
> > > > > >>          }
> > > > > >
> > > > > > I suggest that we add a corresponding cpu property here, too.  The guest
> > > > > > may legitimately rely on these msrs when it sees the support in CPUID,
> > > > > > and migrating from a kernel with the feature supported (4.14+) to an
> > > > > > older one will make it crash.
> > > > > >
> > > > > 
> > > > > This can be arranged, but what happens to people who use these features
> > > > > today? Assuming they also passed 'invtsc' they have stable TSC page
> > > > > clocksource already (when Hyper-V role is enabled) but when we start
> > > > > requesting a new 'hv_frequency' cpu property they'll suddenly lose what
> > > > > they have...
> > > > 
> > > > I see two cases here:
> > > > 
> > > > 1) people start a new VM, and discover that their old configuration is
> > > >    not enough for this feature to work.
> > > > 
> > > >    They need to reconfigure and restart the VM.  This costs them some
> > > >    time investigating and restarting, but not data.
> > > 
> > > If we keep machine-type compatibility, people will need to do
> > > that only if they change the machine-type (or use the "pc" or
> > > "q35" aliases).  If they copy the old configuration, it will keep
> > > working.
> > 
> > The problem is that the feature is not fixed by the machine-type, due to
> > the forgotten property: it only depends on the KVM version.  So, once
> > (if) we add the property and make the feature deterministic, we'll lose
> > compatibility one way or another.
> > 
> > Or are you suggesting that for pre-2.12 machine types we leave the
> > property at "decided by your KVM" state?
> 
> Yes, that's what I mean.  This looks like the only way to avoid
> losing features by just cold-rebooting an existing VM.
> 
> The scenario I'm thinking is this:
> 
> 1) pc-2.11 VM started on host running QEMU 2.11
> 2) VM migrated to a host containing this patch
> 3) 1 year later, the VM is shut down and booted again.
> 4) Things stop working inside the VM because hv-frequency is
>    unexpectedly gone.
> 
> Machine-type compatibility code would avoid (4).

Right, but (4) typically means that you fail to start your workload from
a clean state, so you just go and fix it; no data is lost.

Compare this to a migration to an older KVM which results in your guest
crashing, where you risk data loss and still have to meddle with
configs.

> > > machine-type compatibility also makes the following case a bit
> > > safer:
> > > 
> > > > 
> > > > 2) people migrate from a QEMU without ->hv_frequency, to a new one with
> > > >    ->hv_frequency=off (assuming on both ends KVM supports the frequency
> > > >    MSRs).
> > > > 
> > > >    With the current implementation in KVM, this will only result in the
> > > >    feature bits disappearing from the respective CPUID leaf, but the
> > > >    MSRs themselves will continue working as they used to.  So the guest
> > > >    either won't notice or will check the CPUID and adjust.
> > > 
> > > If we keep machine-type compatibility, the CPUID bit won't
> > > disappear for the guest while the MSRs keep working.
> > > 
> > > 
> > > Whichever solution we choose, we can still have guests crashing
> > > if migrating a pc-2.11 machine from a 4.14+ host kernel to a host
> > > with an older kernel.  But I don't think there's a way out of
> > > this, except requiring an explicit "hv-frequencies" CPU option on
> > > newer machine-types.
> > 
> > What's wrong with requiring it, as we do for all other hv_* properties?
> 
> On new machine-types, nothing wrong.
> 
> On existing machine-types, see above.

I wonder if the following can cater to all relevant cases:

- hv_frequencies property is added, defaulting to "off", so that new
  users of this feature would need to explicitly turn it on;

- on pre-2.12 machine types, it's set to the value of hv_time property
  by the compat code, so that on VMs where this feature could
  potentially be present it would become required; as a result, these
  configurations will refuse to start on insufficiently capable KVM,
  preventing the migration attempts.

Am I missing any scenarios that aren't covered?

Thanks,
Roman.
Eduardo Habkost March 22, 2018, 6:38 p.m. UTC | #13
On Thu, Mar 22, 2018 at 04:58:03PM +0300, Roman Kagan wrote:
> On Thu, Mar 22, 2018 at 10:22:18AM -0300, Eduardo Habkost wrote:
> > On Thu, Mar 22, 2018 at 04:00:14PM +0300, Roman Kagan wrote:
> > > On Wed, Mar 21, 2018 at 05:19:24PM -0300, Eduardo Habkost wrote:
> > > > On Wed, Mar 21, 2018 at 07:57:29PM +0300, Roman Kagan wrote:
> > > > > On Wed, Mar 21, 2018 at 02:18:54PM +0100, Vitaly Kuznetsov wrote:
> > > > > > Roman Kagan <rkagan@virtuozzo.com> writes:
> > > > > > 
> > > > > > >
> > > > > > > I suggest that we add a corresponding cpu property here, too.  The guest
> > > > > > > may legitimately rely on these msrs when it sees the support in CPUID,
> > > > > > > and migrating from a kernel with the feature supported (4.14+) to an
> > > > > > > older one will make it crash.
> > > > > > >
> > > > > > 
> > > > > > This can be arranged, but what happens to people who use these features
> > > > > > today? Assuming they also passed 'invtsc' they have stable TSC page
> > > > > > clocksource already (when Hyper-V role is enabled) but when we start
> > > > > > requesting a new 'hv_frequency' cpu property they'll suddenly lose what
> > > > > > they have...
> > > > > 
> > > > > I see two cases here:
> > > > > 
> > > > > 1) people start a new VM, and discover that their old configuration is
> > > > >    not enough for this feature to work.
> > > > > 
> > > > >    They need to reconfigure and restart the VM.  This costs them some
> > > > >    time investigating and restarting, but not data.
> > > > 
> > > > If we keep machine-type compatibility, people will need to do
> > > > that only if they change the machine-type (or use the "pc" or
> > > > "q35" aliases).  If they copy the old configuration, it will keep
> > > > working.
> > > 
> > > The problem is that the feature is not fixed by the machine-type, due to
> > > the forgotten property: it only depends on the KVM version.  So, once
> > > (if) we add the property and make the feature deterministic, we'll lose
> > > compatibility one way or another.
> > > 
> > > Or are you suggesting that for pre-2.12 machine types we leave the
> > > property at "decided by your KVM" state?
> > 
> > Yes, that's what I mean.  This looks like the only way to avoid
> > losing features by just cold-rebooting an existing VM.
> > 
> > The scenario I'm thinking is this:
> > 
> > 1) pc-2.11 VM started on host running QEMU 2.11
> > 2) VM migrated to a host containing this patch
> > 3) 1 year later, the VM is shut down and booted again.
> > 4) Things stop working inside the VM because hv-frequency is
> >    unexpectedly gone.
> > 
> > Machine-type compatibility code would avoid (4).
> 
> Right, but (4) typically means that you fail to start your workload from
> a clean state, so you just go and fix it; no data is lost.
> 
> Compare this to a migration to an older KVM which results in your guest
> crashing, where you risk data loss and still have to meddle with
> configs.

True. To make it worse, we are already unable to avoid this crash
on existing VMs without a reboot.  The only case where we can fix
this is if live-migration to older KVM happens after the guest
was rebooted when running on a newer QEMU version.  :(



> 
> > > > machine-type compatibility also makes the following case a bit
> > > > safer:
> > > > 
> > > > > 
> > > > > 2) people migrate from a QEMU without ->hv_frequency, to a new one with
> > > > >    ->hv_frequency=off (assuming on both ends KVM supports the frequency
> > > > >    MSRs).
> > > > > 
> > > > >    With the current implementation in KVM, this will only result in the
> > > > >    feature bits disappearing from the respective CPUID leaf, but the
> > > > >    MSRs themselves will continue working as they used to.  So the guest
> > > > >    either won't notice or will check the CPUID and adjust.
> > > > 
> > > > If we keep machine-type compatibility, the CPUID bit won't
> > > > disappear for the guest while the MSRs keep working.
> > > > 
> > > > 
> > > > Whichever solution we choose, we can still have guests crashing
> > > > if migrating a pc-2.11 machine from a 4.14+ host kernel to a host
> > > > with an older kernel.  But I don't think there's a way out of
> > > > this, except requiring an explicit "hv-frequencies" CPU option on
> > > > newer machine-types.
> > > 
> > > What's wrong with requiring it, as we do for all other hv_* properties?
> > 
> > On new machine-types, nothing wrong.
> > 
> > On existing machine-types, see above.
> 
> I wonder if the following can cater to all relevant cases:
> 
> - hv_frequencies property is added, defaulting to "off", so that new
>   users of this feature would need to explicitly turn it on;
> 
> - on pre-2.12 machine types, it's set to the value of hv_time property
>   by the compat code, so that on VMs where this feature could
>   potentially be present it would become required; as a result, these
>   configurations will refuse to start on insufficiently capable KVM,
>   preventing the migration attempts.

This sounds like the safest option.  The cost will be the
inconvenience of being unable to run pc-2.11 on hosts with older
KVM (Linux < v4.14, without commit
72c139bacfa386145d7bbb68c47c8824716153b6), and the need to
explicitly enable hv-frequencies on pc-2.12 and newer.

> 
> Am I missing any scenarios that aren't covered?
> 

It looks like the guest can still crash if we migrate
"QEMU-2.12 -machine pc-2.11 -cpu ...,+hv-time" to a host running
QEMU 2.11 and Linux < 4.14.  I wonder if there's a way to avoid
that?  If there's a way to avoid that with extra migration
subsections, is it worth the effort/complexity?
Roman Kagan March 23, 2018, 9:45 a.m. UTC | #14
On Thu, Mar 22, 2018 at 03:38:13PM -0300, Eduardo Habkost wrote:
> On Thu, Mar 22, 2018 at 04:58:03PM +0300, Roman Kagan wrote:
> > On Thu, Mar 22, 2018 at 10:22:18AM -0300, Eduardo Habkost wrote:
> > > On Thu, Mar 22, 2018 at 04:00:14PM +0300, Roman Kagan wrote:
> > > > On Wed, Mar 21, 2018 at 05:19:24PM -0300, Eduardo Habkost wrote:
> > > > > On Wed, Mar 21, 2018 at 07:57:29PM +0300, Roman Kagan wrote:
> > > > > > On Wed, Mar 21, 2018 at 02:18:54PM +0100, Vitaly Kuznetsov wrote:
> > > > > > > Roman Kagan <rkagan@virtuozzo.com> writes:
> > > > > > > 
> > > > > > > >
> > > > > > > > I suggest that we add a corresponding cpu property here, too.  The guest
> > > > > > > > may legitimately rely on these msrs when it sees the support in CPUID,
> > > > > > > > and migrating from a kernel with the feature supported (4.14+) to an
> > > > > > > > older one will make it crash.
> > > > > > > >
> > > > > > > 
> > > > > > > This can be arranged, but what happens to people who use these features
> > > > > > > today? Assuming they also passed 'invtsc' they have stable TSC page
> > > > > > > clocksource already (when Hyper-V role is enabled) but when we start
> > > > > > > requesting a new 'hv_frequency' cpu property they'll suddenly lose what
> > > > > > > they have...
> > > > > > 
> > > > > > I see two cases here:
> > > > > > 
> > > > > > 1) people start a new VM, and discover that their old configuration is
> > > > > >    not enough for this feature to work.
> > > > > > 
> > > > > >    They need to reconfigure and restart the VM.  This costs them some
> > > > > >    time investigating and restarting, but not data.
> > > > > 
> > > > > If we keep machine-type compatibility, people will need to do
> > > > > that only if they change the machine-type (or use the "pc" or
> > > > > "q35" aliases).  If they copy the old configuration, it will keep
> > > > > working.
> > > > 
> > > > The problem is that the feature is not fixed by the machine-type, due to
> > > > the forgotten property: it only depends on the KVM version.  So, once
> > > > (if) we add the property and make the feature deterministic, we'll lose
> > > > compatibility one way or another.
> > > > 
> > > > Or are you suggesting that for pre-2.12 machine types we leave the
> > > > property at "decided by your KVM" state?
> > > 
> > > Yes, that's what I mean.  This looks like the only way to avoid
> > > losing features by just cold-rebooting an existing VM.
> > > 
> > > The scenario I'm thinking is this:
> > > 
> > > 1) pc-2.11 VM started on host running QEMU 2.11
> > > 2) VM migrated to a host containing this patch
> > > 3) 1 year later, the VM is shut down and booted again.
> > > 4) Things stop working inside the VM because hv-frequency is
> > >    unexpectedly gone.
> > > 
> > > Machine-type compatibility code would avoid (4).
> > 
> > Right, but (4) typically means that you fail to start your workload from
> > a clean state, so you just go and fix it; no data is lost.
> > 
> > Compare this to a migration to an older KVM which results in your guest
> > crashing, where you risk data loss and still have to meddle with
> > configs.
> 
> True. To make it worse, we are already unable to avoid this crash
> on existing VMs without a reboot.  The only case where we can fix
> this is if live-migration to older KVM happens after the guest
> was rebooted when running on a newer QEMU version.  :(

Hmm, I thought the scheme I outlined below covered (== blocked) live
migration QEMU-2.11/KVM-4.14+ -> QEMU-2.12(machine-2.11)/KVM-4.13-,
didn't it?

> > > > > machine-type compatibility also makes the following case a bit
> > > > > safer:
> > > > > 
> > > > > > 
> > > > > > 2) people migrate from a QEMU without ->hv_frequency, to a new one with
> > > > > >    ->hv_frequency=off (assuming on both ends KVM supports the frequency
> > > > > >    MSRs).
> > > > > > 
> > > > > >    With the current implementation in KVM, this will only result in the
> > > > > >    feature bits disappearing from the respective CPUID leaf, but the
> > > > > >    MSRs themselves will continue working as they used to.  So the guest
> > > > > >    either won't notice or will check the CPUID and adjust.
> > > > > 
> > > > > If we keep machine-type compatibility, the CPUID bit won't
> > > > > disappear for the guest while the MSRs keep working.
> > > > > 
> > > > > 
> > > > > Whichever solution we choose, we can still have guests crashing
> > > > > if migrating a pc-2.11 machine from a 4.14+ host kernel to a host
> > > > > with an older kernel.  But I don't think there's a way out of
> > > > > this, except requiring an explicit "hv-frequencies" CPU option on
> > > > > newer machine-types.
> > > > 
> > > > What's wrong with requiring it, as we do for all other hv_* properties?
> > > 
> > > On new machine-types, nothing wrong.
> > > 
> > > On existing machine-types, see above.
> > 
> > I wonder if the following can cater to all relevant cases:
> > 
> > - hv_frequencies property is added, defaulting to "off", so that new
> >   users of this feature would need to explicitly turn it on;
> > 
> > - on pre-2.12 machine types, it's set to the value of hv_time property
> >   by the compat code, so that on VMs where this feature could
> >   potentially be present it would become required; as a result, these
> >   configurations will refuse to start on insufficiently capable KVM,
> >   preventing the migration attempts.
> 
> This sounds like the safest option.  The cost will be the
> inconvenience of being unable to run pc-2.11 on hosts with older
> KVM (Linux < v4.14, without commit
> 72c139bacfa386145d7bbb68c47c8824716153b6),

not completely unable: people will have to add "hv_frequencies=off" to
their cpu spec

> and the need to explicitly enable hv-frequencies on pc-2.12 and newer.

which is the standard situation for all new features.

> > Am I missing any scenarios that aren't covered?
> > 
> 
> It looks like the guest can still crash if we migrate
> "QEMU-2.12 -machine pc-2.11 -cpu ...,+hv-time" to a host running
> QEMU 2.11 and Linux < 4.14.

Indeed :(

> I wonder if there's a way to avoid that?  If there's a way to avoid
> that with extra migration subsections,

I guess this should work.

> is it worth the effort/complexity?

This is a judgement call.  For vendors this is a non-issue because most
of them haven't even started shipping 2.11, so they just don't have VMs
with this problem in the field.

So, taking the effort/complexity vs safety tradeoff into account, we can
consider an alternative approach: just add hv_frequencies (default=off)
cpu property to 2.12 and 2.11-stable, and ignore the cases where it's
run on QEMU versions without explicit control over this feature.  Would
it be too much against the current policy?

Roman.
Eduardo Habkost March 23, 2018, 7:48 p.m. UTC | #15
On Fri, Mar 23, 2018 at 12:45:30PM +0300, Roman Kagan wrote:
> On Thu, Mar 22, 2018 at 03:38:13PM -0300, Eduardo Habkost wrote:
> > On Thu, Mar 22, 2018 at 04:58:03PM +0300, Roman Kagan wrote:
> > > On Thu, Mar 22, 2018 at 10:22:18AM -0300, Eduardo Habkost wrote:
> > > > On Thu, Mar 22, 2018 at 04:00:14PM +0300, Roman Kagan wrote:
> > > > > On Wed, Mar 21, 2018 at 05:19:24PM -0300, Eduardo Habkost wrote:
> > > > > > On Wed, Mar 21, 2018 at 07:57:29PM +0300, Roman Kagan wrote:
> > > > > > > On Wed, Mar 21, 2018 at 02:18:54PM +0100, Vitaly Kuznetsov wrote:
> > > > > > > > Roman Kagan <rkagan@virtuozzo.com> writes:
> > > > > > > > 
> > > > > > > > >
> > > > > > > > > I suggest that we add a corresponding cpu property here, too.  The guest
> > > > > > > > > may legitimately rely on these msrs when it sees the support in CPUID,
> > > > > > > > > and migrating from a kernel with the feature supported (4.14+) to an
> > > > > > > > > older one will make it crash.
> > > > > > > > >
> > > > > > > > 
> > > > > > > > This can be arranged, but what happens to people who use these features
> > > > > > > > today? Assuming they also passed 'invtsc' they have stable TSC page
> > > > > > > > clocksource already (when Hyper-V role is enabled) but when we start
> > > > > > > > requesting a new 'hv_frequency' cpu property they'll suddenly lose what
> > > > > > > > they have...
> > > > > > > 
> > > > > > > I see two cases here:
> > > > > > > 
> > > > > > > 1) people start a new VM, and discover that their old configuration is
> > > > > > >    not enough for this feature to work.
> > > > > > > 
> > > > > > >    They need to reconfigure and restart the VM.  This costs them some
> > > > > > >    time investigating and restarting, but not data.
> > > > > > 
> > > > > > If we keep machine-type compatibility, people will need to do
> > > > > > that only if they change the machine-type (or use the "pc" or
> > > > > > "q35" aliases).  If they copy the old configuration, it will keep
> > > > > > working.
> > > > > 
> > > > > The problem is that the feature is not fixed by the machine-type, due to
> > > > > the forgotten property: it only depends on the KVM version.  So, once
> > > > > (if) we add the property and make the feature deterministic, we'll lose
> > > > > compatibility one way or another.
> > > > > 
> > > > > Or are you suggesting that for pre-2.12 machine types we leave the
> > > > > property at "decided by your KVM" state?
> > > > 
> > > > Yes, that's what I mean.  This looks like the only way to avoid
> > > > losing features by just cold-rebooting an existing VM.
> > > > 
> > > > The scenario I'm thinking is this:
> > > > 
> > > > 1) pc-2.11 VM started on host running QEMU 2.11
> > > > 2) VM migrated to a host containing this patch
> > > > 3) 1 year later, the VM is shut down and booted again.
> > > > 4) Things stop working inside the VM because hv-frequency is
> > > >    unexpectedly gone.
> > > > 
> > > > Machine-type compatibility code would avoid (4).
> > > 
> > > Right, but (4) typically means that you fail to start your workload from
> > > a clean state, so you just go and fix it; no data is lost.
> > > 
> > > Compare this to a migration to an older KVM which results in your guest
> > > crashing, where you risk data loss and still have to meddle with
> > > configs.
> > 
> > True. To make it worse, we are already unable to avoid this crash
> > on existing VMs without a reboot.  The only case where we can fix
> > this is if live-migration to older KVM happens after the guest
> > was rebooted when running on a newer QEMU version.  :(
> 
> Hmm, I thought the scheme I outlined below covered (== blocked) live
> migration QEMU-2.11/KVM-4.14+ -> QEMU-2.12(machine-2.11)/KVM-4.13-,
> didn't it?

It should, but what about migration
QEMU-2.12(pc-2.11)/KVM-4.14 -> QEMU-2.11(pc-2.11)/KVM-4.13?

Or, more specifically:
QEMU-2.11(pc-2.11)/KVM-4.14 ->
QEMU-2.12(pc-2.11)/KVM-4.14 -> QEMU-2.11(machine-2.11)/KVM-4.13?
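A minimal C sketch of why the last hop in such a chain cannot be protected, under simplifying assumptions: each host is reduced to two hypothetical flags (whether its QEMU knows the property and can refuse to start, and whether its kernel implements the frequency MSRs), and a guest that saw the CPUID bits at boot is assumed to crash when it lands on a kernel without the MSRs. `Host`, `walk_chain()` and the outcome names are illustrative only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical host model; not QEMU code. */
typedef struct {
    bool qemu_has_property;  /* QEMU 2.12+: may refuse to start without the MSRs */
    bool kvm_has_freq_msrs;  /* Linux 4.14+: implements the frequency MSRs */
} Host;

typedef enum { CHAIN_OK, CHAIN_BLOCKED, CHAIN_GUEST_CRASH } ChainOutcome;

typedef struct {
    ChainOutcome outcome;
    size_t hop;  /* index of the offending destination; 0 when CHAIN_OK */
} ChainResult;

/*
 * Walk a live-migration chain hosts[0] -> hosts[1] -> ... -> hosts[n-1].
 * guest_saw_cpuid_bits: the guest booted on hosts[0] with the bits exposed.
 * feature_required: a property-aware destination QEMU treats the feature as
 * required for this VM (e.g. via compat code) and refuses to start without it.
 */
static ChainResult walk_chain(const Host *hosts, size_t n,
                              bool guest_saw_cpuid_bits,
                              bool feature_required)
{
    ChainResult r = { CHAIN_OK, 0 };

    if (!guest_saw_cpuid_bits) {
        return r;  /* the guest never relies on the MSRs */
    }
    for (size_t i = 1; i < n; i++) {
        if (hosts[i].kvm_has_freq_msrs) {
            continue;  /* MSRs keep working on this destination */
        }
        r.hop = i;
        r.outcome = (hosts[i].qemu_has_property && feature_required)
                        ? CHAIN_BLOCKED      /* migration fails, guest stays put */
                        : CHAIN_GUEST_CRASH; /* guest touches a missing MSR */
        return r;
    }
    return r;
}
```

With the three hosts from the chain above (QEMU 2.11/4.14, QEMU 2.12/4.14, QEMU 2.11/4.13), the walk reports a crash at the final hop: an old destination QEMU has no property to check, so nothing on that side can block the migration.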

> 
> > > > > > machine-type compatibility also makes the following case a bit
> > > > > > safer:
> > > > > > 
> > > > > > > 
> > > > > > > 2) people migrate from a QEMU without ->hv_frequency, to a new one with
> > > > > > >    ->hv_frequency=off (assuming on both ends KVM supports the frequency
> > > > > > >    MSRs).
> > > > > > > 
> > > > > > >    With the current implementation in KVM, this will only result in the
> > > > > > >    feature bits disappearing from the respective CPUID leaf, but the
> > > > > > >    MSRs themselves will continue working as they used to.  So the guest
> > > > > > >    either won't notice or will check the CPUID and adjust.
> > > > > > 
> > > > > > If we keep machine-type compatibility, the CPUID bit won't
> > > > > > disappear for the guest while the MSRs keep working.
> > > > > > 
> > > > > > 
> > > > > > Whichever solution we choose, we can still have guests crashing
> > > > > > if migrating a pc-2.11 machine from a 4.14+ host kernel to a host
> > > > > > with an older kernel.  But I don't think there's a way out of
> > > > > > this, except requiring an explicit "hv-frequencies" CPU option on
> > > > > > newer machine-types.
> > > > > 
> > > > > What's wrong with requiring it, as we do for all other hv_* properties?
> > > > 
> > > > On new machine-types, nothing wrong.
> > > > 
> > > > On existing machine-types, see above.
> > > 
> > > I wonder if the following can cater to all relevant cases:
> > > 
> > > - hv_frequencies property is added, defaulting to "off", so that new
> > >   users of this feature would need to explicitly turn it on;
> > > 
> > > - on pre-2.12 machine types, it's set to the value of hv_time property
> > >   by the compat code, so that on VMs where this feature could
> > >   potentially be present it would become required; as a result, these
> > >   configurations will refuse to start on insufficiently capable KVM,
> > >   preventing the migration attempts.
> > 
> > This sounds like the safest option.  The cost will be the
> > inconvenience of being unable to run pc-2.11 on hosts with older
> > KVM (Linux < v4.14, without commit
> > 72c139bacfa386145d7bbb68c47c8824716153b6),
> 
> not completely unable: people will have to add "hv_frequencies=off" to
> their cpu spec

This is different from the patch you sent, which sets
hv-frequencies=off by default on pc-2.11 too.

(And now I see you described this approach in the last paragraph below. :)

> 
> > and the need to explicitly enable hv-frequencies on pc-2.12 and newer.
> 
> which is the standard situation for all new features.

Yes, no question on what we want to do on pc-2.12.

> 
> > > Am I missing any scenarios that aren't covered?
> > > 
> > 
> > It looks like the guest can still crash if we migrate
> > "QEMU-2.12 -machine pc-2.11 -cpu ...,+hv-time" to a host running
> > QEMU 2.11 and Linux < 4.14.
> 
> Indeed :(

Well, your patch fixes it by not enabling hv-frequencies by
default on any machine-type.  Do you see any gotchas?


> 
> > I wonder if there's a way to avoid that?  If there's a way to avoid
> > that with extra migration subsections,
> 
> I guess this should work.
> 
> > is it worth the effort/complexity?
> 
> This is a judgement call.  For vendors this is a non-issue because most
> of them haven't even started shipping 2.11, so they just don't have VMs
> with this problem in the field.
> 
> So, taking the effort/complexity vs safety tradeoff into account, we can
> consider an alternative approach: just add hv_frequencies (default=off)
> cpu property to 2.12 and 2.11-stable, and ignore the cases where it's
> run on QEMU versions without explicit control over this feature.  Would
> it be too much against the current policy?
> 

With this, we can just declare that QEMU v2.11.0 + Linux 4.14+
was broken, and advise people to upgrade QEMU.

I think this is the most reasonable option we have.  See my reply to
the patch you sent.
Roman Kagan March 26, 2018, 2:20 p.m. UTC | #16
On Fri, Mar 23, 2018 at 04:48:27PM -0300, Eduardo Habkost wrote:
> On Fri, Mar 23, 2018 at 12:45:30PM +0300, Roman Kagan wrote:
> > On Thu, Mar 22, 2018 at 03:38:13PM -0300, Eduardo Habkost wrote:
> > > It looks like the guest can still crash if we migrate
> > > "QEMU-2.12 -machine pc-2.11 -cpu ...,+hv-time" to a host running
> > > QEMU 2.11 and Linux < 4.14.
> > 
> > Indeed :(
> 
> Well, your patch fixes it by not enabling hv-frequencies by
> default on any machine-type.  Do you see any gotchas?

Only the need to add a new property in -stable.  I used to think that
was frowned upon.

> > So, taking the effort/complexity vs safety tradeoff into account, we can
> > consider an alternative approach: just add hv_frequencies (default=off)
> > cpu property to 2.12 and 2.11-stable, and ignore the cases where it's
> > run on QEMU versions without explicit control over this feature.  Would
> > it be too much against the current policy?
> > 
> 
> With this, we can just declare that QEMU v2.11.0 + Linux 4.14+
> was broken, and advise people to upgrade QEMU.
> 
> I think this is the most reasonable option we have.  See my reply to
> the patch you sent.

Great, let's try going down this route.

Thanks,
Roman.

Patch

diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index 7d9f9ca0b1..74fc3d3b2c 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -651,7 +651,7 @@  static int hyperv_handle_properties(CPUState *cs)
         env->features[FEAT_HYPERV_EAX] |= HV_TIME_REF_COUNT_AVAILABLE;
         env->features[FEAT_HYPERV_EAX] |= HV_REFERENCE_TSC_AVAILABLE;
 
-        if (has_msr_hv_frequencies && tsc_is_stable_and_known(env)) {
+        if (has_msr_hv_frequencies && env->tsc_khz) {
             env->features[FEAT_HYPERV_EAX] |= HV_ACCESS_FREQUENCY_MSRS;
             env->features[FEAT_HYPERV_EDX] |= HV_FREQUENCY_MSRS_AVAILABLE;
         }
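The effect of the one-line change can be modeled outside QEMU. In this sketch, `TscState` and the `*_model`/`freq_msrs_exposed_*` functions are hypothetical stand-ins. The shape given to `tsc_is_stable_and_known()` (frequency known, plus either an invariant TSC or a user-specified frequency) is an assumption about the QEMU helper, not a quotation of it.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant CPUX86State fields. */
typedef struct {
    unsigned tsc_khz;       /* guest TSC frequency in kHz, 0 if unknown */
    unsigned user_tsc_khz;  /* frequency pinned explicitly by the user, or 0 */
    bool invtsc;            /* invariant TSC exposed to the guest */
} TscState;

/* Assumed shape of tsc_is_stable_and_known(). */
static bool tsc_is_stable_and_known_model(const TscState *s)
{
    return s->tsc_khz != 0 && (s->invtsc || s->user_tsc_khz != 0);
}

/* Condition before the patch: the frequency must also be stable. */
static bool freq_msrs_exposed_old(bool has_msr_hv_frequencies, const TscState *s)
{
    return has_msr_hv_frequencies && tsc_is_stable_and_known_model(s);
}

/* Condition after the patch: a known frequency is enough to expose the MSRs. */
static bool freq_msrs_exposed_new(bool has_msr_hv_frequencies, const TscState *s)
{
    return has_msr_hv_frequencies && s->tsc_khz != 0;
}
```

Under this model, a guest with a known TSC frequency but no invariant TSC and no user-pinned frequency gains the frequency MSRs after the patch. That matches the commit message's point that the MSRs only promise a readable frequency, not a stable one.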