diff mbox series

[3/3] i386: Make sure kvm_arch_set_tsc_khz() succeeds on migration when 'hv-reenlightenment' was exposed

Message ID 20210318160249.1084178-4-vkuznets@redhat.com (mailing list archive)
State New, archived
Headers show
Series i386: Make sure TSC frequency is preserved across migration when Hyper-V reenlightenment is in use | expand

Commit Message

Vitaly Kuznetsov March 18, 2021, 4:02 p.m. UTC
KVM doesn't fully support Hyper-V reenlightenment notifications on
migration. In particular, it doesn't support emulating TSC frequency
of the source host by trapping all TSC accesses so unless TSC scaling
is supported on the destination host and KVM_SET_TSC_KHZ succeeds, it
is unsafe to proceed with migration.

Normally, we only require KVM_SET_TSC_KHZ to succeed when 'user_tsc_khz'
was set and just 'try' KVM_SET_TSC_KHZ without otherwise.

Introduce a new vmstate section (which is added when the guest has
reenlightenment feature enabled) and add env.tsc_khz to it. We already
have env.tsc_khz packed in 'cpu/tsc_khz' but we don't want to be dependent
on the section order.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 target/i386/kvm/hyperv.h |  1 +
 target/i386/kvm/kvm.c    | 11 +++++++++++
 target/i386/machine.c    | 37 +++++++++++++++++++++++++++++++++++++
 3 files changed, 49 insertions(+)

Comments

Paolo Bonzini March 18, 2021, 4:12 p.m. UTC | #1
On 18/03/21 17:02, Vitaly Kuznetsov wrote:
> KVM doesn't fully support Hyper-V reenlightenment notifications on
> migration. In particular, it doesn't support emulating TSC frequency
> of the source host by trapping all TSC accesses so unless TSC scaling
> is supported on the destination host and KVM_SET_TSC_KHZ succeeds, it
> is unsafe to proceed with migration.
> 
> Normally, we only require KVM_SET_TSC_KHZ to succeed when 'user_tsc_khz'
> was set and just 'try' KVM_SET_TSC_KHZ without otherwise.
> 
> Introduce a new vmstate section (which is added when the guest has
> reenlightenment feature enabled) and add env.tsc_khz to it. We already
> have env.tsc_khz packed in 'cpu/tsc_khz' but we don't want to be dependent
> on the section order.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>

Could we instead fail to load the reenlightenment section if 
user_tsc_khz was not set?  This seems to be user (well, management) 
error really, since reenlightenment has to be enabled manually (or with 
hv-passthrough which blocks migration too).

Paolo
Vitaly Kuznetsov March 18, 2021, 4:38 p.m. UTC | #2
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 18/03/21 17:02, Vitaly Kuznetsov wrote:
>> KVM doesn't fully support Hyper-V reenlightenment notifications on
>> migration. In particular, it doesn't support emulating TSC frequency
>> of the source host by trapping all TSC accesses so unless TSC scaling
>> is supported on the destination host and KVM_SET_TSC_KHZ succeeds, it
>> is unsafe to proceed with migration.
>> 
>> Normally, we only require KVM_SET_TSC_KHZ to succeed when 'user_tsc_khz'
>> was set and just 'try' KVM_SET_TSC_KHZ without otherwise.
>> 
>> Introduce a new vmstate section (which is added when the guest has
>> reenlightenment feature enabled) and add env.tsc_khz to it. We already
>> have env.tsc_khz packed in 'cpu/tsc_khz' but we don't want to be dependent
>> on the section order.
>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>
> Could we instead fail to load the reenlightenment section if 
> user_tsc_khz was not set?  This seems to be user (well, management) 
> error really, since reenlightenment has to be enabled manually (or with 
> hv-passthrough which blocks migration too).

Yes, we certainly could do that but what's the added value of
user_tsc_khz which upper layer will have to set explicitly (probably to
the tsc frequency of the source host anyway)? In case we just want to
avoid calling KVM_SET_TSC_KHZ twice, we can probably achieve that by
adding a CPU flag or something.
Paolo Bonzini March 18, 2021, 5:36 p.m. UTC | #3
On 18/03/21 17:38, Vitaly Kuznetsov wrote:
>> Could we instead fail to load the reenlightenment section if
>> user_tsc_khz was not set?  This seems to be user (well, management)
>> error really, since reenlightenment has to be enabled manually (or with
>> hv-passthrough which blocks migration too).
>
> Yes, we certainly could do that but what's the added value of
> user_tsc_khz which upper layer will have to set explicitly (probably to
> the tsc frequency of the source host anyway)? In case we just want to
> avoid calling KVM_SET_TSC_KHZ twice, we can probably achieve that by
> adding a CPU flag or something.

What I want to achieve is to forbid migration of VMs with 
reenlightenment, if they don't also specify tsc-khz to the frequency of 
the TSC on the source host.  We can't check it at the beginning of 
migration, but at least we can check it at the end.

Maybe we're talking about two different things?

Paolo
Marcelo Tosatti March 18, 2021, 6:03 p.m. UTC | #4
On Thu, Mar 18, 2021 at 05:38:00PM +0100, Vitaly Kuznetsov wrote:
> Paolo Bonzini <pbonzini@redhat.com> writes:
> 
> > On 18/03/21 17:02, Vitaly Kuznetsov wrote:
> >> KVM doesn't fully support Hyper-V reenlightenment notifications on
> >> migration. In particular, it doesn't support emulating TSC frequency
> >> of the source host by trapping all TSC accesses so unless TSC scaling
> >> is supported on the destination host and KVM_SET_TSC_KHZ succeeds, it
> >> is unsafe to proceed with migration.
> >> 
> >> Normally, we only require KVM_SET_TSC_KHZ to succeed when 'user_tsc_khz'
> >> was set and just 'try' KVM_SET_TSC_KHZ without otherwise.
> >> 
> >> Introduce a new vmstate section (which is added when the guest has
> >> reenlightenment feature enabled) and add env.tsc_khz to it. We already
> >> have env.tsc_khz packed in 'cpu/tsc_khz' but we don't want to be dependent
> >> on the section order.
> >> 
> >> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> >
> > Could we instead fail to load the reenlightenment section if 
> > user_tsc_khz was not set?  This seems to be user (well, management) 
> > error really, since reenlightenment has to be enabled manually (or with 
> > hv-passthrough which blocks migration too).

Seems to match the strategy of the patchset...

> Yes, we certainly could do that but what's the added value of
> user_tsc_khz which upper layer will have to set explicitly (probably to
> the tsc frequency of the source host anyway)?

Yes. I think what happened was "evolution":

1) Added support to set tsc frequency (with hardware multiplier)
in KVM, so add -tsc-khz VAL (kHz) option to KVM.

2) Scaling is enabled only if -tsc-khz VAL is supplied.

3) libvirt switches to using -tsc-khz HVAL, where HVAL it retrieves
from KVM_GET_TSC_KHZ of newly created KVM_CREATE_VM instance.

It could have been done inside qemu instead.

> In case we just want to avoid calling KVM_SET_TSC_KHZ twice, we can probably achieve that by
> adding a CPU flag or something.

Avoid calling KVM_SET_TSC_KHZ twice ? Don't see why you would avoid
that.
Dr. David Alan Gilbert March 18, 2021, 8:13 p.m. UTC | #5
* Vitaly Kuznetsov (vkuznets@redhat.com) wrote:
> KVM doesn't fully support Hyper-V reenlightenment notifications on
> migration. In particular, it doesn't support emulating TSC frequency
> of the source host by trapping all TSC accesses so unless TSC scaling
> is supported on the destination host and KVM_SET_TSC_KHZ succeeds, it
> is unsafe to proceed with migration.
> 
> Normally, we only require KVM_SET_TSC_KHZ to succeed when 'user_tsc_khz'
> was set and just 'try' KVM_SET_TSC_KHZ without otherwise.
> 
> Introduce a new vmstate section (which is added when the guest has
> reenlightenment feature enabled) and add env.tsc_khz to it. We already
> have env.tsc_khz packed in 'cpu/tsc_khz' but we don't want to be dependent
> on the section order.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  target/i386/kvm/hyperv.h |  1 +
>  target/i386/kvm/kvm.c    | 11 +++++++++++
>  target/i386/machine.c    | 37 +++++++++++++++++++++++++++++++++++++
>  3 files changed, 49 insertions(+)
> 
> diff --git a/target/i386/kvm/hyperv.h b/target/i386/kvm/hyperv.h
> index 67543296c3a4..c65e5c85c4d3 100644
> --- a/target/i386/kvm/hyperv.h
> +++ b/target/i386/kvm/hyperv.h
> @@ -20,6 +20,7 @@
>  
>  #ifdef CONFIG_KVM
>  int kvm_hv_handle_exit(X86CPU *cpu, struct kvm_hyperv_exit *exit);
> +int kvm_hv_tsc_frequency_loaded(X86CPU *cpu);
>  #endif
>  
>  int hyperv_x86_synic_add(X86CPU *cpu);
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 7fe9f527103c..f6c4093778e9 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -1460,6 +1460,17 @@ static int hyperv_init_vcpu(X86CPU *cpu)
>      return 0;
>  }
>  
> +int kvm_hv_tsc_frequency_loaded(X86CPU *cpu)
> +{
> +    CPUState *cs = CPU(cpu);
> +
> +    /*
> +     * KVM doens't fully support re-enlightenment notifications so we need to
                 ^^
tpyo

Dave

> +     * make sure TSC frequency doesn't change upon migration.
> +     */
> +    return kvm_arch_set_tsc_khz(cs);
> +}
> +
>  static Error *invtsc_mig_blocker;
>  
>  #define KVM_MAX_CPUID_ENTRIES  100
> diff --git a/target/i386/machine.c b/target/i386/machine.c
> index 715620c58809..369a8f1e7a7a 100644
> --- a/target/i386/machine.c
> +++ b/target/i386/machine.c
> @@ -896,6 +896,42 @@ static const VMStateDescription vmstate_msr_hyperv_reenlightenment = {
>          VMSTATE_END_OF_LIST()
>      }
>  };
> +
> +static bool hyperv_tsc_frequency_needed(void *opaque)
> +{
> +    X86CPU *cpu = opaque;
> +    CPUX86State *env = &cpu->env;
> +
> +    return env->tsc_khz != 0 && (env->msr_hv_reenlightenment_control ||
> +                                 env->msr_hv_tsc_emulation_control);
> +}
> +
> +static int hyperv_tsc_frequency_post_load(void *opaque, int version_id)
> +{
> +    X86CPU *cpu = opaque;
> +    int r;
> +
> +    r = kvm_hv_tsc_frequency_loaded(cpu);
> +    if (r) {
> +        error_report("Failed to set the desired TSC frequency and "
> +                     "reenlightenment was exposed");
> +        return -EINVAL;
> +    }
> +
> +    return 0;
> +}
> +
> +static const VMStateDescription vmstate_msr_hyperv_tsc_frequency = {
> +    .name = "cpu/msr_hyperv_tsc_frequency",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .needed = hyperv_tsc_frequency_needed,
> +    .post_load = hyperv_tsc_frequency_post_load,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_INT64(env.tsc_khz, X86CPU),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
>  #endif
>  
>  static bool avx512_needed(void *opaque)
> @@ -1495,6 +1531,7 @@ VMStateDescription vmstate_x86_cpu = {
>          &vmstate_msr_hyperv_synic,
>          &vmstate_msr_hyperv_stimer,
>          &vmstate_msr_hyperv_reenlightenment,
> +        &vmstate_msr_hyperv_tsc_frequency,
>  #endif
>          &vmstate_avx512,
>          &vmstate_xss,
> -- 
> 2.30.2
>
Vitaly Kuznetsov March 19, 2021, 9:41 a.m. UTC | #6
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 18/03/21 17:38, Vitaly Kuznetsov wrote:
>>> Could we instead fail to load the reenlightenment section if
>>> user_tsc_khz was not set?  This seems to be user (well, management)
>>> error really, since reenlightenment has to be enabled manually (or with
>>> hv-passthrough which blocks migration too).
>>
>> Yes, we certainly could do that but what's the added value of
>> user_tsc_khz which upper layer will have to set explicitly (probably to
>> the tsc frequency of the source host anyway)? In case we just want to
>> avoid calling KVM_SET_TSC_KHZ twice, we can probably achieve that by
>> adding a CPU flag or something.
>
> What I want to achieve is to forbid migration of VMs with 
> reenlightenment, if they don't also specify tsc-khz to the frequency of 
> the TSC on the source host.  We can't check it at the beginning of 
> migration, but at least we can check it at the end.
>
> Maybe we're talking about two different things?

No, your suggestion basically extends mine and I'm just trying to
understand the benefit. With my suggestion, it is not required to
specify tsc-khz on the source, we just take 'native' tsc frequency as a
reference. Post-migration, we require that KVM_SET_TSC_KHZ succeeds (and
not just 'try' like kvm_arch_put_registers() does so we effectively
break migration when we are unable to set the desired TSC frequency
(also at the end).

With your suggestion to require user_tsc_khz (as I see it) it'll work
the exact same way but there's an additional burden on the
user/management to explicitly specify tsc-khz on the command line and I
believe that almost always this is going to match 'native' tsc frequency
of the source host.
Vitaly Kuznetsov March 19, 2021, 9:46 a.m. UTC | #7
Marcelo Tosatti <mtosatti@redhat.com> writes:

> On Thu, Mar 18, 2021 at 05:38:00PM +0100, Vitaly Kuznetsov wrote:
>> Paolo Bonzini <pbonzini@redhat.com> writes:
>> 
>> > On 18/03/21 17:02, Vitaly Kuznetsov wrote:
>> >> KVM doesn't fully support Hyper-V reenlightenment notifications on
>> >> migration. In particular, it doesn't support emulating TSC frequency
>> >> of the source host by trapping all TSC accesses so unless TSC scaling
>> >> is supported on the destination host and KVM_SET_TSC_KHZ succeeds, it
>> >> is unsafe to proceed with migration.
>> >> 
>> >> Normally, we only require KVM_SET_TSC_KHZ to succeed when 'user_tsc_khz'
>> >> was set and just 'try' KVM_SET_TSC_KHZ without otherwise.
>> >> 
>> >> Introduce a new vmstate section (which is added when the guest has
>> >> reenlightenment feature enabled) and add env.tsc_khz to it. We already
>> >> have env.tsc_khz packed in 'cpu/tsc_khz' but we don't want to be dependent
>> >> on the section order.
>> >> 
>> >> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> >
>> > Could we instead fail to load the reenlightenment section if 
>> > user_tsc_khz was not set?  This seems to be user (well, management) 
>> > error really, since reenlightenment has to be enabled manually (or with 
>> > hv-passthrough which blocks migration too).
>
> Seems to match the strategy of the patchset...
>
>> Yes, we certainly could do that but what's the added value of
>> user_tsc_khz which upper layer will have to set explicitly (probably to
>> the tsc frequency of the source host anyway)?
>
> Yes. I think what happened was "evolution":
>
> 1) Added support to set tsc frequency (with hardware multiplier)
> in KVM, so add -tsc-khz VAL (kHz) option to KVM.
>
> 2) Scaling is enabled only if -tsc-khz VAL is supplied.
>
> 3) libvirt switches to using -tsc-khz HVAL, where HVAL it retrieves
> from KVM_GET_TSC_KHZ of newly created KVM_CREATE_VM instance.
>
> It could have been done inside qemu instead.
>
>> In case we just want to avoid calling KVM_SET_TSC_KHZ twice, we can probably achieve that by
>> adding a CPU flag or something.
>
> Avoid calling KVM_SET_TSC_KHZ twice ? Don't see why you would avoid
> that.
>

Actually, we already do KVM_SET_TSC_KHZ twice, my patch adds just
another call for KVM_SET_TSC_KHZ. We already do one call in
kvm_arch_put_registers() but we don't propagate errors from it so in case
TSC scaling is unsupported, migration still succeeds and this is
intentional unless 'tsc-khz' was explicitly specified. When 'tsc-khz' is
specified, the error is propageted from kvm_arch_init_vcpu() (second
call site). We can also achieve the goal of this patch if we follow
Paolo's suggestion: just make 'tsc-khz' a must with reenlightenment.
Paolo Bonzini March 19, 2021, 11:04 a.m. UTC | #8
On 19/03/21 10:41, Vitaly Kuznetsov wrote:
>> What I want to achieve is to forbid migration of VMs with
>> reenlightenment, if they don't also specify tsc-khz to the frequency of
>> the TSC on the source host.  We can't check it at the beginning of
>> migration, but at least we can check it at the end.
>>
>> Maybe we're talking about two different things?
> No, your suggestion basically extends mine and I'm just trying to
> understand the benefit. With my suggestion, it is not required to
> specify tsc-khz on the source, we just take 'native' tsc frequency as a
> reference. Post-migration, we require that KVM_SET_TSC_KHZ succeeds (and
> not just 'try' like kvm_arch_put_registers() does so we effectively
> break migration when we are unable to set the desired TSC frequency
> (also at the end).

Oh, okay, I understand the confusion; I was thinking of checking for 
user_tsc_khz in the post-load function for reenlightenment, not in the 
command line processing.

Paolo
Vitaly Kuznetsov March 19, 2021, 12:02 p.m. UTC | #9
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 19/03/21 10:41, Vitaly Kuznetsov wrote:
>>> What I want to achieve is to forbid migration of VMs with
>>> reenlightenment, if they don't also specify tsc-khz to the frequency of
>>> the TSC on the source host.  We can't check it at the beginning of
>>> migration, but at least we can check it at the end.
>>>
>>> Maybe we're talking about two different things?
>> No, your suggestion basically extends mine and I'm just trying to
>> understand the benefit. With my suggestion, it is not required to
>> specify tsc-khz on the source, we just take 'native' tsc frequency as a
>> reference. Post-migration, we require that KVM_SET_TSC_KHZ succeeds (and
>> not just 'try' like kvm_arch_put_registers() does so we effectively
>> break migration when we are unable to set the desired TSC frequency
>> (also at the end).
>
> Oh, okay, I understand the confusion; I was thinking of checking for 
> user_tsc_khz in the post-load function for reenlightenment, not in the 
> command line processing.

Got it, yes, we can avoid this additional vmstate section and just add a
.post_load() hook to the existing vmstate_msr_hyperv_reenlightenment.
This will make 'tsc-frequency=' command line option mandatory for
successful migration with reenlightenment. Let me draft an alternative
patch.
diff mbox series

Patch

diff --git a/target/i386/kvm/hyperv.h b/target/i386/kvm/hyperv.h
index 67543296c3a4..c65e5c85c4d3 100644
--- a/target/i386/kvm/hyperv.h
+++ b/target/i386/kvm/hyperv.h
@@ -20,6 +20,7 @@ 
 
 #ifdef CONFIG_KVM
 int kvm_hv_handle_exit(X86CPU *cpu, struct kvm_hyperv_exit *exit);
+int kvm_hv_tsc_frequency_loaded(X86CPU *cpu);
 #endif
 
 int hyperv_x86_synic_add(X86CPU *cpu);
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 7fe9f527103c..f6c4093778e9 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1460,6 +1460,17 @@  static int hyperv_init_vcpu(X86CPU *cpu)
     return 0;
 }
 
+int kvm_hv_tsc_frequency_loaded(X86CPU *cpu)
+{
+    CPUState *cs = CPU(cpu);
+
+    /*
+     * KVM doens't fully support re-enlightenment notifications so we need to
+     * make sure TSC frequency doesn't change upon migration.
+     */
+    return kvm_arch_set_tsc_khz(cs);
+}
+
 static Error *invtsc_mig_blocker;
 
 #define KVM_MAX_CPUID_ENTRIES  100
diff --git a/target/i386/machine.c b/target/i386/machine.c
index 715620c58809..369a8f1e7a7a 100644
--- a/target/i386/machine.c
+++ b/target/i386/machine.c
@@ -896,6 +896,42 @@  static const VMStateDescription vmstate_msr_hyperv_reenlightenment = {
         VMSTATE_END_OF_LIST()
     }
 };
+
+static bool hyperv_tsc_frequency_needed(void *opaque)
+{
+    X86CPU *cpu = opaque;
+    CPUX86State *env = &cpu->env;
+
+    return env->tsc_khz != 0 && (env->msr_hv_reenlightenment_control ||
+                                 env->msr_hv_tsc_emulation_control);
+}
+
+static int hyperv_tsc_frequency_post_load(void *opaque, int version_id)
+{
+    X86CPU *cpu = opaque;
+    int r;
+
+    r = kvm_hv_tsc_frequency_loaded(cpu);
+    if (r) {
+        error_report("Failed to set the desired TSC frequency and "
+                     "reenlightenment was exposed");
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
+static const VMStateDescription vmstate_msr_hyperv_tsc_frequency = {
+    .name = "cpu/msr_hyperv_tsc_frequency",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = hyperv_tsc_frequency_needed,
+    .post_load = hyperv_tsc_frequency_post_load,
+    .fields = (VMStateField[]) {
+        VMSTATE_INT64(env.tsc_khz, X86CPU),
+        VMSTATE_END_OF_LIST()
+    }
+};
 #endif
 
 static bool avx512_needed(void *opaque)
@@ -1495,6 +1531,7 @@  VMStateDescription vmstate_x86_cpu = {
         &vmstate_msr_hyperv_synic,
         &vmstate_msr_hyperv_stimer,
         &vmstate_msr_hyperv_reenlightenment,
+        &vmstate_msr_hyperv_tsc_frequency,
 #endif
         &vmstate_avx512,
         &vmstate_xss,