Message ID | 20200102203926.1179743-1-vkuznets@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] i386/kvm: fix enlightened VMCS with fine-grained VMX feature enablement | expand |
On 02/01/20 21:39, Vitaly Kuznetsov wrote: > When enlightened VMCS is enabled, certain VMX controls disappear, e.g. > posted interrupts for PINBASED_CTLS. With fine-grained VMX feature > enablement QEMU tries to do KVM_SET_MSRS with default (matching CPU > model) values and fails as KVM doesn't allow to set now-unsupported > controls. > > The ideal solution for the issue would probably be to re-read VMX > feature MSRs after enabling KVM_CAP_HYPERV_ENLIGHTENED_VMCS, however, > this doesn't seem to be possible: currently, KVM returns global > &vmcs_config.nested values for VMX MSRs when userspace does KVM_GET_MSR. > > It is also possible to modify KVM to apply 'evmcs filtering' to VMX > MSRs when userspace tries to set them and hide the issue but this doesn't > seem to be entirely correct. > > It is unfortunate that we now need to support the list of VMX features > disabled by enlightened VMCS in QEMU. When (and if) enlightened VMCS v2 > arrives we'll need to fix QEMU and allow previously disabled features. > > Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> > --- > - I don't quite like this workaround myself, thus RFC. I'm sure someone > will suggest a better alternative. The patch itself is not a big deal, the only thing is that we should probably check that we get warnings if the "forbidden" features are explicitly requested by the user. On the other hand, for something like "-cpu Haswell,vmx,hv_evmcs" there should be no warnings. That said, I'm not sure about the whole idea of disabling features, even in the kernel. I would prefer if the guest knew that it cannot use these features if using eVMCS. Is this the case for Windows? If so, we should teach guest-side KVM about this, not QEMU. Paolo > --- > target/i386/kvm.c | 24 ++++++++++++++++++++++++ > 1 file changed, 24 insertions(+) > > diff --git a/target/i386/kvm.c b/target/i386/kvm.c > index 0b511906e3fe..1b0589b79358 100644 > --- a/target/i386/kvm.c > +++ b/target/i386/kvm.c > @@ -1198,6 +1198,30 @@ static int hyperv_handle_properties(CPUState *cs, > } > > if (!r) { > + /* > + * Certain VMX controls are unsupported when enlightened VMCS is > + * enabled, filter them out here so we don't attempt to set them > + * with KVM_SET_MSR even if they are supported by CPU model. > + * The list below is for eVMCS version 1. > + */ > + env->features[FEAT_VMX_PINBASED_CTLS] &= > + ~(VMX_PIN_BASED_VMX_PREEMPTION_TIMER | > + VMX_PIN_BASED_POSTED_INTR); > + env->features[FEAT_VMX_SECONDARY_CTLS] &= > + ~(VMX_SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY | > + VMX_SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES | > + VMX_SECONDARY_EXEC_APIC_REGISTER_VIRT | > + VMX_SECONDARY_EXEC_ENABLE_PML | > + VMX_SECONDARY_EXEC_ENABLE_VMFUNC | > + VMX_SECONDARY_EXEC_SHADOW_VMCS | > + /* VMX_SECONDARY_EXEC_TSC_SCALING | */ > + VMX_SECONDARY_EXEC_PAUSE_LOOP_EXITING); > + env->features[FEAT_VMX_ENTRY_CTLS] &= > + ~VMX_VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL; > + env->features[FEAT_VMX_EXIT_CTLS] &= > + ~VMX_VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL; > + env->features[FEAT_VMX_VMFUNC] &= ~MSR_VMX_VMFUNC_EPT_SWITCHING; > + > env->features[FEAT_HV_RECOMM_EAX] |= > HV_ENLIGHTENED_VMCS_RECOMMENDED; > env->features[FEAT_HV_NESTED_EAX] = evmcs_version; >
Paolo Bonzini <pbonzini@redhat.com> writes: > On 02/01/20 21:39, Vitaly Kuznetsov wrote: >> When enlightened VMCS is enabled, certain VMX controls disappear, e.g. >> posted interrupts for PINBASED_CTLS. With fine-grained VMX feature >> enablement QEMU tries to do KVM_SET_MSRS with default (matching CPU >> model) values and fails as KVM doesn't allow to set now-unsupported >> controls. >> >> The ideal solution for the issue would probably be to re-read VMX >> feature MSRs after enabling KVM_CAP_HYPERV_ENLIGHTENED_VMCS, however, >> this doesn't seem to be possible: currently, KVM returns global >> &vmcs_config.nested values for VMX MSRs when userspace does KVM_GET_MSR. >> >> It is also possible to modify KVM to apply 'evmcs filtering' to VMX >> MSRs when userspace tries to set them and hide the issue but this doesn't >> seem to be entirely correct. >> >> It is unfortunate that we now need to support the list of VMX features >> disabled by enlightened VMCS in QEMU. When (and if) enlightened VMCS v2 >> arrives we'll need to fix QEMU and allow previously disabled features. >> >> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> >> --- >> - I don't quite like this workaround myself, thus RFC. I'm sure someone >> will suggest a better alternative. > > The patch itself is not a big deal, the only thing is that we should > probably check that we get warnings if the "forbidden" features are > explicitly requested by the user. On the other hand, for something like > "-cpu Haswell,vmx,hv_evmcs" there should be no warnings. > > That said, I'm not sure about the whole idea of disabling features, even > in the kernel. I would prefer if the guest knew that it cannot use > these features if using eVMCS. Is this the case for Windows? Honestly I forgot the story why we filtered out these features upon eVMCS enablement in KVM. As there are no corresponding eVMCS fields, there's no way a guest can actually use them. I'm going to check that nothing breaks if we remove the filter. I'll go and test Hyper-V 2016 and 2019. > If so, we should teach guest-side KVM about this, not QEMU. This is not required when enabling eVMCS on a genuine Hyper-V because it correctly filters out unsupported features, however, to not break KVM-on-KVM-using-eVMCS case we'll have to move the filter from host to guest. Thanks!
On 07/01/20 13:08, Vitaly Kuznetsov wrote: > Honestly I forgot the story why we filtered out these features upon > eVMCS enablement in KVM. As there are no corresponding eVMCS fields, > there's no way a guest can actually use them. Well, mostly because we mimicked what Hyper-V was doing I guess. > I'm going to check that nothing breaks if we remove the filter. I'll go > and test Hyper-V 2016 and 2019. KVM would break, right? But we can mark that patch as stable material. Paolo >> If so, we should teach guest-side KVM about this, not QEMU. > > This is not required when enabling eVMCS on a genuine Hyper-V because it > correctly filters out unsupported features, however, to not break > KVM-on-KVM-using-eVMCS case we'll have to move the filter from host to > guest. > > Thanks! >
Paolo Bonzini <pbonzini@redhat.com> writes: > On 07/01/20 13:08, Vitaly Kuznetsov wrote: >> Honestly I forgot the story why we filtered out these features upon >> eVMCS enablement in KVM. As there are no corresponding eVMCS fields, >> there's no way a guest can actually use them. > > Well, mostly because we mimicked what Hyper-V was doing I guess. > An update from reverse-engineering trenches. I ran some tests to see if we can just drop the filtering and there is only one problematic control which Hyper-V enables: SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES the problem with it is that we don't have 'apic_access_addr' field in eVMCS ('virtual_apic_page_addr' is there). By running the same setup with eVMCS disabled I figured out which address can be hardcoded to make it boot. My guess was that the fields is present but not documented properly, I tried scanning eVMCS for the value but with no luck so far. I'll try to fish some information out of Microsoft.
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES > > the problem with it is that we don't have 'apic_access_addr' field in > eVMCS ('virtual_apic_page_addr' is there). By running the same setup > with eVMCS disabled I figured out which address can be hardcoded to make > it boot. > Maybe it's really hard coded (what is the value? Is it consistent across Hyper-V version?) Can you try changing KVM to enable it and see if the hardcoded APIC access address works? Paolo >
Paolo Bonzini <pbonzini@redhat.com> writes: > SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES >> >> the problem with it is that we don't have 'apic_access_addr' field in >> eVMCS ('virtual_apic_page_addr' is there). By running the same setup >> with eVMCS disabled I figured out which address can be hardcoded to make >> it boot. >> > > Maybe it's really hard coded (what is the value? Is it consistent across > Hyper-V version?) Can you try changing KVM to enable it and see if the > hardcoded APIC access address works? Unfortunately, it's not the same even between BIOS and UEFI booted WS2016 (Gen1/Gen2). It is, however, the same across reboot for the same image so for example for WS2016Gen1 it is '0x294000' so if I do: diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index fab7451a5793..8366b2a02b3b 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -1433,6 +1433,8 @@ static int copy_enlightened_to_vmcs12(struct vcpu_vmx *vmx) vmcs12->tpr_threshold = evmcs->tpr_threshold; vmcs12->guest_rip = evmcs->guest_rip; + vmcs12->apic_access_addr = 0x294000; it all works. I'm really puzzled.
Paolo Bonzini <pbonzini@redhat.com> writes: > On 07/01/20 13:08, Vitaly Kuznetsov wrote: >> Honestly I forgot the story why we filtered out these features upon >> eVMCS enablement in KVM. As there are no corresponding eVMCS fields, >> there's no way a guest can actually use them. > > Well, mostly because we mimicked what Hyper-V was doing I guess. > >> I'm going to check that nothing breaks if we remove the filter. I'll go >> and test Hyper-V 2016 and 2019. > > KVM would break, right? But we can mark that patch as stable material. > While we are trying to understand how APIC virtualization works without apic_access_addr field (maybe it doesn't?), should we fix the immediate issue with QEMU-4.2 with a hack like: diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c index 72359709cdc1..038297e63396 100644 --- a/arch/x86/kvm/vmx/evmcs.c +++ b/arch/x86/kvm/vmx/evmcs.c @@ -357,15 +357,15 @@ int nested_enable_evmcs(struct kvm_vcpu *vcpu, if (vmcs_version) *vmcs_version = nested_get_evmcs_version(vcpu); - /* We don't support disabling the feature for simplicity. */ - if (evmcs_already_enabled) - return 0; - - vmx->nested.msrs.pinbased_ctls_high &= ~EVMCS1_UNSUPPORTED_PINCTRL; - vmx->nested.msrs.entry_ctls_high &= ~EVMCS1_UNSUPPORTED_VMENTRY_CTRL; - vmx->nested.msrs.exit_ctls_high &= ~EVMCS1_UNSUPPORTED_VMEXIT_CTRL; - vmx->nested.msrs.secondary_ctls_high &= ~EVMCS1_UNSUPPORTED_2NDEXEC; - vmx->nested.msrs.vmfunc_controls &= ~EVMCS1_UNSUPPORTED_VMFUNC; - return 0; } + +void nested_evmcs_filter_control_msr(u32 msr_index, u64 *pdata) { + /* + * Enlightened VMCS doesn't have apic_access_addr field but Hyper-V + * still tries to enable SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES when + * it is available, filter it out + */ + if (msr_index == MSR_IA32_VMX_PROCBASED_CTLS2) + *pdata &= ~((u64)SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES << 32); +} diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h index 07ebf6882a45..b88d9807a796 100644 --- a/arch/x86/kvm/vmx/evmcs.h +++ b/arch/x86/kvm/vmx/evmcs.h @@ -201,5 +201,6 @@ bool nested_enlightened_vmentry(struct kvm_vcpu *vcpu, u64 *evmcs_gpa); uint16_t nested_get_evmcs_version(struct kvm_vcpu *vcpu); int nested_enable_evmcs(struct kvm_vcpu *vcpu, uint16_t *vmcs_version); +void nested_evmcs_filter_control_msr(u32 msr_index, u64 *pdata); #endif /* __KVM_X86_VMX_EVMCS_H */ diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index e3394c839dea..8eb74618b8d8 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -1849,8 +1849,14 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC: if (!nested_vmx_allowed(vcpu)) return 1; - return vmx_get_vmx_msr(&vmx->nested.msrs, msr_info->index, - &msr_info->data); + if (vmx_get_vmx_msr(&vmx->nested.msrs, msr_info->index, + &msr_info->data)) + return 1; + if (!msr_info->host_initiated && + vmx->nested.enlightened_vmcs_enabled) + nested_evmcs_filter_control_msr(msr_info->index, + &msr_info->data); + break; case MSR_IA32_RTIT_CTL: if (pt_mode != PT_MODE_HOST_GUEST) return 1; This should probably be complemented with a patch to not enable unsupported controls when KVM is acting as a guest on eVMCS + a check that none of the unsupported controls are enabled. What do you think?
diff --git a/target/i386/kvm.c b/target/i386/kvm.c index 0b511906e3fe..1b0589b79358 100644 --- a/target/i386/kvm.c +++ b/target/i386/kvm.c @@ -1198,6 +1198,30 @@ static int hyperv_handle_properties(CPUState *cs, } if (!r) { + /* + * Certain VMX controls are unsupported when enlightened VMCS is + * enabled, filter them out here so we don't attempt to set them + * with KVM_SET_MSR even if they are supported by CPU model. + * The list below is for eVMCS version 1. + */ + env->features[FEAT_VMX_PINBASED_CTLS] &= + ~(VMX_PIN_BASED_VMX_PREEMPTION_TIMER | + VMX_PIN_BASED_POSTED_INTR); + env->features[FEAT_VMX_SECONDARY_CTLS] &= + ~(VMX_SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY | + VMX_SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES | + VMX_SECONDARY_EXEC_APIC_REGISTER_VIRT | + VMX_SECONDARY_EXEC_ENABLE_PML | + VMX_SECONDARY_EXEC_ENABLE_VMFUNC | + VMX_SECONDARY_EXEC_SHADOW_VMCS | + /* VMX_SECONDARY_EXEC_TSC_SCALING | */ + VMX_SECONDARY_EXEC_PAUSE_LOOP_EXITING); + env->features[FEAT_VMX_ENTRY_CTLS] &= + ~VMX_VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL; + env->features[FEAT_VMX_EXIT_CTLS] &= + ~VMX_VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL; + env->features[FEAT_VMX_VMFUNC] &= ~MSR_VMX_VMFUNC_EPT_SWITCHING; + env->features[FEAT_HV_RECOMM_EAX] |= HV_ENLIGHTENED_VMCS_RECOMMENDED; env->features[FEAT_HV_NESTED_EAX] = evmcs_version;
When enlightened VMCS is enabled, certain VMX controls disappear, e.g. posted interrupts for PINBASED_CTLS. With fine-grained VMX feature enablement QEMU tries to do KVM_SET_MSRS with default (matching CPU model) values and fails as KVM doesn't allow to set now-unsupported controls. The ideal solution for the issue would probably be to re-read VMX feature MSRs after enabling KVM_CAP_HYPERV_ENLIGHTENED_VMCS, however, this doesn't seem to be possible: currently, KVM returns global &vmcs_config.nested values for VMX MSRs when userspace does KVM_GET_MSR. It is also possible to modify KVM to apply 'evmcs filtering' to VMX MSRs when userspace tries to set them and hide the issue but this doesn't seem to be entirely correct. It is unfortunate that we now need to support the list of VMX features disabled by enlightened VMCS in QEMU. When (and if) enlightened VMCS v2 arrives we'll need to fix QEMU and allow previously disabled features. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> --- - I don't quite like this workaround myself, thus RFC. I'm sure someone will suggest a better alternative. --- target/i386/kvm.c | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+)