Message ID | 20231205234956.1156210-1-michael.roth@amd.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | KVM: SEV: Fix handling of EFER_LMA bit when SEV-ES is enabled | expand |
On Tue, Dec 05, 2023, Michael Roth wrote: > In general, activating long mode involves setting the EFER_LME bit in > the EFER register and then enabling the X86_CR0_PG bit in the CR0 > register. At this point, the EFER_LMA bit will be set automatically by > hardware. > > In the case of SVM/SEV guests where writes to CR0 are intercepted, it's > necessary for the host to set EFER_LMA on behalf of the guest since > hardware does not see the actual CR0 write. > > In the case of SEV-ES guests where writes to CR0 are trapped instead of > intercepted, the hardware *does* see/record the write to CR0 before > exiting and passing the value on to the host, so as part of enabling > SEV-ES support commit f1c6366e3043 ("KVM: SVM: Add required changes to > support intercepts under SEV-ES") dropped special handling of the > EFER_LMA bit with the understanding that it would be set automatically. > > However, since the guest never explicitly sets the EFER_LMA bit, the > host never becomes aware that it has been set. This becomes problematic > when userspace tries to get/set the EFER values via > KVM_GET_SREGS/KVM_SET_SREGS, since the EFER contents tracked by the host > will be missing the EFER_LMA bit, and when userspace attempts to pass > the EFER value back via KVM_SET_SREGS it will fail a sanity check that > asserts that EFER_LMA should always be set when X86_CR0_PG and EFER_LME > are set. > > Fix this by always inferring the value of EFER_LMA based on X86_CR0_PG > and EFER_LME, regardless of whether or not SEV-ES is enabled. Blech. This is a hack to fix even worse hacks. KVM ignores CR0/CR4/EFER values that are set via KVM_SET_SREGS, i.e. KVM is rejecting an EFER value that it will never consume, which is ridiculous. And the fact that you're not trying to have KVM actually set state further strengthens my assertion that tracking CR0/CR4/EFER in KVM is pointless necessary for SEV-ES+ guests[1]. This also fixes the _source_, which makes it far less useful, e.g. live migrating to fixed version of KVM won't work. So my very strong preference is to first skip the kvm_is_valid_sregs() check, and then in a follow-up series disable the CR0/CR4/EFER traps and probably use maximal (according to the guest's capabilities) "defaults" for CR0/CR4/EFER (and I suppose XCR0 and XSS too), i.e. assume the vCPU is in 64-bit mode with everything enabled. My understanding is that SVM_VMGEXIT_AP_CREATION is going to force KVM to assume maximal state anyways since KVM will have no way of verifying what state is actually shoved into the VMSA, i.e. emulating INIT is wildly broken[2]. Side topic, Peter suspected that KVM _does_ need to let userspace set CR8 since that's not captured in the VMSA[3]. [1] https://lore.kernel.org/all/YJla8vpwqCxqgS8C@google.com [2] https://lore.kernel.org/all/20231016132819.1002933-38-michael.roth@amd.com [3] https://lore.kernel.org/all/CAMkAt6oL9tfF5rvP0htbQNDPr50Zk41Q4KP-dM0N+SJ7xmsWvw@mail.gmail.com diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 2c924075f6f1..6fb2b913009e 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -11620,7 +11620,8 @@ static int __set_sregs_common(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs, int idx; struct desc_ptr dt; - if (!kvm_is_valid_sregs(vcpu, sregs)) + if (!vcpu->arch.guest_state_protected && + !kvm_is_valid_sregs(vcpu, sregs)) return -EINVAL; apic_base_msr.data = sregs->apic_base;
On 12/5/23 17:49, Michael Roth wrote: > In general, activating long mode involves setting the EFER_LME bit in > the EFER register and then enabling the X86_CR0_PG bit in the CR0 > register. At this point, the EFER_LMA bit will be set automatically by > hardware. > > In the case of SVM/SEV guests where writes to CR0 are intercepted, it's > necessary for the host to set EFER_LMA on behalf of the guest since > hardware does not see the actual CR0 write. > > In the case of SEV-ES guests where writes to CR0 are trapped instead of > intercepted, the hardware *does* see/record the write to CR0 before > exiting and passing the value on to the host, so as part of enabling > SEV-ES support commit f1c6366e3043 ("KVM: SVM: Add required changes to > support intercepts under SEV-ES") dropped special handling of the > EFER_LMA bit with the understanding that it would be set automatically. Maybe add here that it was thought that the change to EFER would generate a trap. However, a trap isn't generated for hardware changes to EFER and only explicit writes to EFER generate the trap. Then the "However" can be dropped from the start of the next sentence. > > However, since the guest never explicitly sets the EFER_LMA bit, the > host never becomes aware that it has been set. This becomes problematic > when userspace tries to get/set the EFER values via > KVM_GET_SREGS/KVM_SET_SREGS, since the EFER contents tracked by the host > will be missing the EFER_LMA bit, and when userspace attempts to pass > the EFER value back via KVM_SET_SREGS it will fail a sanity check that > asserts that EFER_LMA should always be set when X86_CR0_PG and EFER_LME > are set. > > Fix this by always inferring the value of EFER_LMA based on X86_CR0_PG > and EFER_LME, regardless of whether or not SEV-ES is enabled. > > Fixes: f1c6366e3043 ("KVM: SVM: Add required changes to support intercepts under SEV-ES") > Suggested-by: Tom Lendacky <thomas.lendacky@amd.com> > Cc: Sean Christopherson <seanjc@google.com> > Cc: Paolo Bonzini <pbonzini@redhat.com> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Ingo Molnar <mingo@redhat.com> > Cc: Borislav Petkov <bp@alien8.de> > Cc: Dave Hansen <dave.hansen@linux.intel.com> > Cc: "H. Peter Anvin" <hpa@zytor.com> > Cc: Tom Lendacky <thomas.lendacky@amd.com> > Cc: x86@kernel.org > Cc: linux-kernel@vger.kernel.org > Signed-off-by: Michael Roth <michael.roth@amd.com> Acked-by: Tom Lendacky <thomas.lendacky@amd.com> > --- > arch/x86/kvm/svm/svm.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c > index 5d75a1732da4..b31d4f2deb66 100644 > --- a/arch/x86/kvm/svm/svm.c > +++ b/arch/x86/kvm/svm/svm.c > @@ -1869,7 +1869,7 @@ void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) > bool old_paging = is_paging(vcpu); > > #ifdef CONFIG_X86_64 > - if (vcpu->arch.efer & EFER_LME && !vcpu->arch.guest_state_protected) { > + if (vcpu->arch.efer & EFER_LME) { > if (!is_paging(vcpu) && (cr0 & X86_CR0_PG)) { > vcpu->arch.efer |= EFER_LMA; > svm->vmcb->save.efer |= EFER_LMA | EFER_LME;
On Tue, 2023-12-05 at 17:49 -0600, Michael Roth wrote: > In general, activating long mode involves setting the EFER_LME bit in > the EFER register and then enabling the X86_CR0_PG bit in the CR0 > register. At this point, the EFER_LMA bit will be set automatically by > hardware. > > In the case of SVM/SEV guests where writes to CR0 are intercepted, it's > necessary for the host to set EFER_LMA on behalf of the guest since > hardware does not see the actual CR0 write. Could you explain in which case the writes to CR0 will be still intercepted? It's for CPUs that only support SEV-ES and nothing beyond it? > > In the case of SEV-ES guests where writes to CR0 are trapped instead of > intercepted, the hardware *does* see/record the write to CR0 before > exiting and passing the value on to the host, so as part of enabling > SEV-ES support commit f1c6366e3043 ("KVM: SVM: Add required changes to > support intercepts under SEV-ES") dropped special handling of the > EFER_LMA bit with the understanding that it would be set automatically. > > However, since the guest never explicitly sets the EFER_LMA bit, the > host never becomes aware that it has been set. This becomes problematic > when userspace tries to get/set the EFER values via > KVM_GET_SREGS/KVM_SET_SREGS, since the EFER contents tracked by the host > will be missing the EFER_LMA bit, and when userspace attempts to pass > the EFER value back via KVM_SET_SREGS it will fail a sanity check that > asserts that EFER_LMA should always be set when X86_CR0_PG and EFER_LME > are set. > > Fix this by always inferring the value of EFER_LMA based on X86_CR0_PG > and EFER_LME, regardless of whether or not SEV-ES is enabled. > > Fixes: f1c6366e3043 ("KVM: SVM: Add required changes to support intercepts under SEV-ES") > Suggested-by: Tom Lendacky <thomas.lendacky@amd.com> > Cc: Sean Christopherson <seanjc@google.com> > Cc: Paolo Bonzini <pbonzini@redhat.com> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Ingo Molnar <mingo@redhat.com> > Cc: Borislav Petkov <bp@alien8.de> > Cc: Dave Hansen <dave.hansen@linux.intel.com> > Cc: "H. Peter Anvin" <hpa@zytor.com> > Cc: Tom Lendacky <thomas.lendacky@amd.com> > Cc: x86@kernel.org > Cc: linux-kernel@vger.kernel.org > Signed-off-by: Michael Roth <michael.roth@amd.com> > --- > arch/x86/kvm/svm/svm.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c > index 5d75a1732da4..b31d4f2deb66 100644 > --- a/arch/x86/kvm/svm/svm.c > +++ b/arch/x86/kvm/svm/svm.c > @@ -1869,7 +1869,7 @@ void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) > bool old_paging = is_paging(vcpu); > > #ifdef CONFIG_X86_64 > - if (vcpu->arch.efer & EFER_LME && !vcpu->arch.guest_state_protected) { > + if (vcpu->arch.efer & EFER_LME) { > if (!is_paging(vcpu) && (cr0 & X86_CR0_PG)) { > vcpu->arch.efer |= EFER_LMA; > svm->vmcb->save.efer |= EFER_LMA | EFER_LME; Purely from the point of view of not confusing future readers of this code: Due to encrypted guest state, if I understand correctly, the 'svm->vmcb->save.efer' is only given for the hypervisor to see but not to modify. While the modification of save.efer is a nop, can we still guard it with 'vcpu->arch.guest_state_protected'? Besides these nitpicks: Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Best regards, Maxim Levitsky
On Wed, Dec 6, 2023 at 4:28 PM Sean Christopherson <seanjc@google.com> wrote: > Blech. This is a hack to fix even worse hacks. KVM ignores CR0/CR4/EFER values > that are set via KVM_SET_SREGS, i.e. KVM is rejecting an EFER value that it will > never consume, which is ridiculous. And the fact that you're not trying to have > KVM actually set state further strengthens my assertion that tracking CR0/CR4/EFER > in KVM is pointless necessary for SEV-ES+ guests[1]. I agree that KVM is not going to consume CR0/CR4/EFER. I disagree that it's a good idea to have a value of vcpu->arch.efer that is architecturally impossible (so much so that it would fail vmentry in a non-SEV-ES guest). I also agree that changing the source is not particularly useful, but then changing the destination can be easily done in userspace. In other words, bugfix or not this can and should be merged as a code cleanup (though your older "[PATCH 1/2] KVM: SVM: Update EFER software model on CR0 trap for SEV-ES" is nicer in that it clarifies that svm->vmcb->save.efer is not used, and that's what I would like to apply). > So my very strong preference is to first skip the kvm_is_valid_sregs() check No, please don't. If you want to add a quirk that, when disabled, causes all guest state get/set ioctls to fail, go ahead. But invalid processor state remains invalid, and should be rejected, even when KVM won't consume it. > My understanding is that SVM_VMGEXIT_AP_CREATION is going to force KVM to assume > maximal state anyways since KVM will have no way of verifying what state is actually > shoved into the VMSA, i.e. emulating INIT is wildly broken[2]. Yes, or alternatively a way to pass CR0/CR4/EFER from the guest should be included in the VMGEXIT spec. > Side topic, Peter suspected that KVM _does_ need to let userspace set CR8 since > that's not captured in the VMSA[3]. Makes sense, and then we would have to apply the 2/2 patch from 2021 as well. But for now I'll leave that aside. Paolo > [1] https://lore.kernel.org/all/YJla8vpwqCxqgS8C@google.com > [2] https://lore.kernel.org/all/20231016132819.1002933-38-michael.roth@amd.com > [3] https://lore.kernel.org/all/CAMkAt6oL9tfF5rvP0htbQNDPr50Zk41Q4KP-dM0N+SJ7xmsWvw@mail.gmail.com > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 2c924075f6f1..6fb2b913009e 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -11620,7 +11620,8 @@ static int __set_sregs_common(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs, > int idx; > struct desc_ptr dt; > > - if (!kvm_is_valid_sregs(vcpu, sregs)) > + if (!vcpu->arch.guest_state_protected && > + !kvm_is_valid_sregs(vcpu, sregs)) > return -EINVAL; > > apic_base_msr.data = sregs->apic_base; >
On Fri, Dec 08, 2023, Paolo Bonzini wrote: > On Wed, Dec 6, 2023 at 4:28 PM Sean Christopherson <seanjc@google.com> wrote: > > So my very strong preference is to first skip the kvm_is_valid_sregs() check > > No, please don't. If you want to add a quirk that, when disabled, > causes all guest state get/set ioctls to fail, go ahead. But invalid > processor state remains invalid, and should be rejected, even when KVM > won't consume it. Ugh, true, KVM should still reject garbage.
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 5d75a1732da4..b31d4f2deb66 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -1869,7 +1869,7 @@ void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) bool old_paging = is_paging(vcpu); #ifdef CONFIG_X86_64 - if (vcpu->arch.efer & EFER_LME && !vcpu->arch.guest_state_protected) { + if (vcpu->arch.efer & EFER_LME) { if (!is_paging(vcpu) && (cr0 & X86_CR0_PG)) { vcpu->arch.efer |= EFER_LMA; svm->vmcb->save.efer |= EFER_LMA | EFER_LME;
In general, activating long mode involves setting the EFER_LME bit in the EFER register and then enabling the X86_CR0_PG bit in the CR0 register. At this point, the EFER_LMA bit will be set automatically by hardware. In the case of SVM/SEV guests where writes to CR0 are intercepted, it's necessary for the host to set EFER_LMA on behalf of the guest since hardware does not see the actual CR0 write. In the case of SEV-ES guests where writes to CR0 are trapped instead of intercepted, the hardware *does* see/record the write to CR0 before exiting and passing the value on to the host, so as part of enabling SEV-ES support commit f1c6366e3043 ("KVM: SVM: Add required changes to support intercepts under SEV-ES") dropped special handling of the EFER_LMA bit with the understanding that it would be set automatically. However, since the guest never explicitly sets the EFER_LMA bit, the host never becomes aware that it has been set. This becomes problematic when userspace tries to get/set the EFER values via KVM_GET_SREGS/KVM_SET_SREGS, since the EFER contents tracked by the host will be missing the EFER_LMA bit, and when userspace attempts to pass the EFER value back via KVM_SET_SREGS it will fail a sanity check that asserts that EFER_LMA should always be set when X86_CR0_PG and EFER_LME are set. Fix this by always inferring the value of EFER_LMA based on X86_CR0_PG and EFER_LME, regardless of whether or not SEV-ES is enabled. Fixes: f1c6366e3043 ("KVM: SVM: Add required changes to support intercepts under SEV-ES") Suggested-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86@kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Michael Roth <michael.roth@amd.com> --- arch/x86/kvm/svm/svm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)