Message ID | 20190819214650.41991-2-nikita.leshchenko@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | KVM: nVMX: Improve HLT activity support | expand |
On Tue, Aug 20, 2019 at 12:46:49AM +0300, Nikita Leshenko wrote: > Before this commit, userspace could disable the GUEST_ACTIVITY_HLT bit in > VMX_MISC yet KVM would happily accept GUEST_ACTIVITY_HLT activity state in > VMCS12. We can fix it by either failing VM entries with HLT activity state when > it's not supported or by disallowing clearing this bit. > > The latter is preferable. If we go with the former, to disable > GUEST_ACTIVITY_HLT userspace also has to make CPU_BASED_HLT_EXITING a "must be > 1" control, otherwise KVM will be presenting a bogus model to L1. > > Don't fail writes that disable GUEST_ACTIVITY_HLT to maintain backwards > compatibility. Paolo, do we actually need to maintain backwards compatibility in this case? This seems like a good candidate for "fix the bug and see who yells". > Reviewed-by: Liran Alon <liran.alon@oracle.com> > Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> > Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com> > --- > arch/x86/kvm/vmx/nested.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c > index 46af3a5e9209..24734946ec75 100644 > --- a/arch/x86/kvm/vmx/nested.c > +++ b/arch/x86/kvm/vmx/nested.c > @@ -1102,6 +1102,14 @@ static int vmx_restore_vmx_misc(struct vcpu_vmx *vmx, u64 data) > if (vmx_misc_mseg_revid(data) != vmx_misc_mseg_revid(vmx_misc)) > return -EINVAL; > > + /* > + * We always support HLT activity state. In the past it was possible to > + * turn HLT bit off (without actually turning off HLT activity state > + * support) so we don't fail vmx_restore_vmx_misc if this bit is turned > + * off. > + */ > + data |= VMX_MISC_ACTIVITY_HLT; > + > vmx->nested.msrs.misc_low = data; > vmx->nested.msrs.misc_high = data >> 32; > > -- > 2.20.1 >
On Mon, Aug 19, 2019 at 3:11 PM Sean Christopherson <sean.j.christopherson@intel.com> wrote: > > On Tue, Aug 20, 2019 at 12:46:49AM +0300, Nikita Leshenko wrote: > > Before this commit, userspace could disable the GUEST_ACTIVITY_HLT bit in > > VMX_MISC yet KVM would happily accept GUEST_ACTIVITY_HLT activity state in > > VMCS12. We can fix it by either failing VM entries with HLT activity state when > > it's not supported or by disallowing clearing this bit. > > > > The latter is preferable. If we go with the former, to disable > > GUEST_ACTIVITY_HLT userspace also has to make CPU_BASED_HLT_EXITING a "must be > > 1" control, otherwise KVM will be presenting a bogus model to L1. > > > > Don't fail writes that disable GUEST_ACTIVITY_HLT to maintain backwards > > compatibility. > > Paolo, do we actually need to maintain backwards compatibility in this > case? This seems like a good candidate for "fix the bug and see who yells". Google's userspace clears bit 6. Please don't fail that write!
On Wed, Aug 21, 2019 at 01:59:20PM -0700, Jim Mattson wrote: > On Mon, Aug 19, 2019 at 3:11 PM Sean Christopherson > <sean.j.christopherson@intel.com> wrote: > > > > On Tue, Aug 20, 2019 at 12:46:49AM +0300, Nikita Leshenko wrote: > > > Before this commit, userspace could disable the GUEST_ACTIVITY_HLT bit in > > > VMX_MISC yet KVM would happily accept GUEST_ACTIVITY_HLT activity state in > > > VMCS12. We can fix it by either failing VM entries with HLT activity state when > > > it's not supported or by disallowing clearing this bit. > > > > > > The latter is preferable. If we go with the former, to disable > > > GUEST_ACTIVITY_HLT userspace also has to make CPU_BASED_HLT_EXITING a "must be > > > 1" control, otherwise KVM will be presenting a bogus model to L1. > > > > > > Don't fail writes that disable GUEST_ACTIVITY_HLT to maintain backwards > > > compatibility. > > > > Paolo, do we actually need to maintain backwards compatibility in this > > case? This seems like a good candidate for "fix the bug and see who yells". > > Google's userspace clears bit 6. Please don't fail that write! Booooo. Requiring CPU_BASED_HLT_EXITING to be forced to 1 is probably off the table since the bits are in two separate MSRs, i.e. we run into a chicken-and-egg scenario. Silently forcing GUEST_ACTIVITY_HLT seems wrong, especially if userspace has also forced CPU_BASED_HLT_EXITING, e.g. KVM would be letting L1 do something userspace explicitly asked KVM to prevent. Generally speaking, KVM lets userspace do dumb things so long as it doesn't break the kernel, so, what if we change sync_vmcs02_to_vmcs12() to propagate GUEST_ACTIVITY_HLT to vmcs12 if and only if the activity state exists? That might be better than causing a nested VM-Enter to fail? I'm also a-ok doing nothing and fulling putting the onus on userspace to get the config correct.
On Wed, Aug 21, 2019 at 3:22 PM Sean Christopherson <sean.j.christopherson@intel.com> wrote: > > On Wed, Aug 21, 2019 at 01:59:20PM -0700, Jim Mattson wrote: > > On Mon, Aug 19, 2019 at 3:11 PM Sean Christopherson > > <sean.j.christopherson@intel.com> wrote: > > > > > > On Tue, Aug 20, 2019 at 12:46:49AM +0300, Nikita Leshenko wrote: > > > > Before this commit, userspace could disable the GUEST_ACTIVITY_HLT bit in > > > > VMX_MISC yet KVM would happily accept GUEST_ACTIVITY_HLT activity state in > > > > VMCS12. We can fix it by either failing VM entries with HLT activity state when > > > > it's not supported or by disallowing clearing this bit. > > > > > > > > The latter is preferable. If we go with the former, to disable > > > > GUEST_ACTIVITY_HLT userspace also has to make CPU_BASED_HLT_EXITING a "must be > > > > 1" control, otherwise KVM will be presenting a bogus model to L1. > > > > > > > > Don't fail writes that disable GUEST_ACTIVITY_HLT to maintain backwards > > > > compatibility. > > > > > > Paolo, do we actually need to maintain backwards compatibility in this > > > case? This seems like a good candidate for "fix the bug and see who yells". > > > > Google's userspace clears bit 6. Please don't fail that write! > > Booooo. > Supporting activity state HLT is on our list of things to do, but I'm not convinced that kvm actually handles it properly yet. For instance... What happens if L1 launches L2 into activity state HLT with a zero-valued VMX preemption timer? (Maybe this is fixed now?) What happens if "monitor trap flag" is set and "HLT exiting" is clear in the vmcs12, and immediately on VM-entry, L2 executes HLT? (Yes, this is a special case of MTF being broken when L0 emulates an L2 instruction.) I'm sure there are other interesting scenarios that haven't been validated.
On Wed, Aug 21, 2019 at 04:01:49PM -0700, Jim Mattson wrote: > On Wed, Aug 21, 2019 at 3:22 PM Sean Christopherson > <sean.j.christopherson@intel.com> wrote: > > > > On Wed, Aug 21, 2019 at 01:59:20PM -0700, Jim Mattson wrote: > > > On Mon, Aug 19, 2019 at 3:11 PM Sean Christopherson > > > <sean.j.christopherson@intel.com> wrote: > > > > > > > > On Tue, Aug 20, 2019 at 12:46:49AM +0300, Nikita Leshenko wrote: > > > > > Before this commit, userspace could disable the GUEST_ACTIVITY_HLT bit in > > > > > VMX_MISC yet KVM would happily accept GUEST_ACTIVITY_HLT activity state in > > > > > VMCS12. We can fix it by either failing VM entries with HLT activity state when > > > > > it's not supported or by disallowing clearing this bit. > > > > > > > > > > The latter is preferable. If we go with the former, to disable > > > > > GUEST_ACTIVITY_HLT userspace also has to make CPU_BASED_HLT_EXITING a "must be > > > > > 1" control, otherwise KVM will be presenting a bogus model to L1. > > > > > > > > > > Don't fail writes that disable GUEST_ACTIVITY_HLT to maintain backwards > > > > > compatibility. > > > > > > > > Paolo, do we actually need to maintain backwards compatibility in this > > > > case? This seems like a good candidate for "fix the bug and see who yells". > > > > > > Google's userspace clears bit 6. Please don't fail that write! > > > > Booooo. > > > > Supporting activity state HLT is on our list of things to do, but I'm > not convinced that kvm actually handles it properly yet. For > instance... I fully understand why you'd want to hide it from L1, I was just bummed that we couldn't go with a quick and dirty fix :-) > What happens if L1 launches L2 into activity state HLT with a > zero-valued VMX preemption timer? (Maybe this is fixed now?) I think that one got fixed in vmx_start_preemption_timer(). > What happens if "monitor trap flag" is set and "HLT exiting" is clear > in the vmcs12, and immediately on VM-entry, L2 executes HLT? (Yes, > this is a special case of MTF being broken when L0 emulates an L2 > instruction.) > > I'm sure there are other interesting scenarios that haven't been validated.
On Mon, Aug 19, 2019 at 2:47 PM Nikita Leshenko <nikita.leshchenko@oracle.com> wrote: > > Before this commit, userspace could disable the GUEST_ACTIVITY_HLT bit in > VMX_MISC yet KVM would happily accept GUEST_ACTIVITY_HLT activity state in > VMCS12. We can fix it by either failing VM entries with HLT activity state when > it's not supported or by disallowing clearing this bit. > > The latter is preferable. If we go with the former, to disable > GUEST_ACTIVITY_HLT userspace also has to make CPU_BASED_HLT_EXITING a "must be > 1" control, otherwise KVM will be presenting a bogus model to L1. > > Don't fail writes that disable GUEST_ACTIVITY_HLT to maintain backwards > compatibility. > > Reviewed-by: Liran Alon <liran.alon@oracle.com> > Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> > Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com> > --- > arch/x86/kvm/vmx/nested.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c > index 46af3a5e9209..24734946ec75 100644 > --- a/arch/x86/kvm/vmx/nested.c > +++ b/arch/x86/kvm/vmx/nested.c > @@ -1102,6 +1102,14 @@ static int vmx_restore_vmx_misc(struct vcpu_vmx *vmx, u64 data) > if (vmx_misc_mseg_revid(data) != vmx_misc_mseg_revid(vmx_misc)) > return -EINVAL; > > + /* > + * We always support HLT activity state. In the past it was possible to > + * turn HLT bit off (without actually turning off HLT activity state > + * support) so we don't fail vmx_restore_vmx_misc if this bit is turned > + * off. > + */ > + data |= VMX_MISC_ACTIVITY_HLT; > + > vmx->nested.msrs.misc_low = data; > vmx->nested.msrs.misc_high = data >> 32; > This change breaks live migration to an upgraded kernel, since it doesn't allow the IA32_VMX_MISC MSR to be restored to its original value. I think this warrants a quirk.
> On 21 Aug 2019, at 23:59, Jim Mattson <jmattson@google.com> wrote: > > On Mon, Aug 19, 2019 at 3:11 PM Sean Christopherson > <sean.j.christopherson@intel.com> wrote: >> >> On Tue, Aug 20, 2019 at 12:46:49AM +0300, Nikita Leshenko wrote: >>> Before this commit, userspace could disable the GUEST_ACTIVITY_HLT bit in >>> VMX_MISC yet KVM would happily accept GUEST_ACTIVITY_HLT activity state in >>> VMCS12. We can fix it by either failing VM entries with HLT activity state when >>> it's not supported or by disallowing clearing this bit. >>> >>> The latter is preferable. If we go with the former, to disable >>> GUEST_ACTIVITY_HLT userspace also has to make CPU_BASED_HLT_EXITING a "must be >>> 1" control, otherwise KVM will be presenting a bogus model to L1. >>> >>> Don't fail writes that disable GUEST_ACTIVITY_HLT to maintain backwards >>> compatibility. >> >> Paolo, do we actually need to maintain backwards compatibility in this >> case? This seems like a good candidate for "fix the bug and see who yells". > > Google's userspace clears bit 6. Please don't fail that write! What happens if the guest tries to use HLT activity state regardless of the bit? Currently in KVM the guest will succeed, since there are no checks on VM entry for that. I previously submitted a patch[1] that adds a check for that, what do you think about it? [1] https://patchwork.kernel.org/patch/11092209/ Nikita
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index 46af3a5e9209..24734946ec75 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -1102,6 +1102,14 @@ static int vmx_restore_vmx_misc(struct vcpu_vmx *vmx, u64 data) if (vmx_misc_mseg_revid(data) != vmx_misc_mseg_revid(vmx_misc)) return -EINVAL; + /* + * We always support HLT activity state. In the past it was possible to + * turn HLT bit off (without actually turning off HLT activity state + * support) so we don't fail vmx_restore_vmx_misc if this bit is turned + * off. + */ + data |= VMX_MISC_ACTIVITY_HLT; + vmx->nested.msrs.misc_low = data; vmx->nested.msrs.misc_high = data >> 32;