
[v3,2/2] KVM: x86: nSVM/nVMX: Fix RSM logic leading to L2 VM-Entries

Message ID 20240501202934.1365061-3-kishen.maloor@intel.com (mailing list archive)
State New, archived
Series: Address syzkaller warnings in nested VM-Exit after RSM

Commit Message

Kishen Maloor May 1, 2024, 8:29 p.m. UTC
Syzkaller found a warning triggered in nested_vmx_vmexit(): the
nested_run_pending flag was non-zero at VM-Exit. Trying to cancel a
pending VM-Entry is generally considered a bug, but in this particular
scenario the kernel's behavior appears correct.

Syzkaller scenario:
1) Set up vCPUs
2) Run some code with KVM_RUN in L2 as a nested guest
3) Return from KVM_RUN
4) Inject KVM_SMI into the VCPU
5) Set the EFER register to the value 0x2501 with KVM_SET_SREGS
6) Run some code on the VCPU using KVM_RUN
7) Observe the following behavior:

kvm_smm_transition: vcpu 0: entering SMM, smbase 0x30000
kvm_entry: vcpu 0, rip 0x8000
kvm_entry: vcpu 0, rip 0x8000
kvm_entry: vcpu 0, rip 0x8002
kvm_smm_transition: vcpu 0: leaving SMM, smbase 0x30000
kvm_nested_vmenter: rip: 0x0000000000008002 vmcs: 0x0000000000007000
                    nested_rip: 0x0000000000000000 int_ctl: 0x00000000
                    event_inj: 0x00000000 nested_ept=n guest
                    cr3: 0x0000000000002000
kvm_nested_vmexit_inject: reason: TRIPLE_FAULT ext_inf1: 0x0000000000000000
                          ext_inf2: 0x0000000000000000 ext_int: 0x00000000
                          ext_int_err: 0x00000000
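
For reference, a minimal userspace sketch of steps 4-6 above (a
hypothetical illustration, not the actual syzkaller reproducer: error
checking, guest memory setup, and the nested-guest payload from steps
1-2 are elided, so the KVM_RUN calls exit immediately):

/* repro_sketch.c: abbreviated model of the ioctl sequence above.
 * Hypothetical illustration, not the syzkaller reproducer: guest
 * memory and the nested (L2) guest payload are elided, and error
 * checking is omitted for brevity.
 */
#include <fcntl.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
	int kvm = open("/dev/kvm", O_RDWR);
	int vm = ioctl(kvm, KVM_CREATE_VM, 0);
	int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);
	struct kvm_sregs sregs;

	/* Steps 1-3: memory setup and an initial KVM_RUN in L2 go here. */

	/* Step 4: inject an SMI into the vCPU. */
	ioctl(vcpu, KVM_SMI, 0);

	/* Step 5: load an invalid EFER value (0x2501) via KVM_SET_SREGS. */
	ioctl(vcpu, KVM_GET_SREGS, &sregs);
	sregs.efer = 0x2501;
	ioctl(vcpu, KVM_SET_SREGS, &sregs);

	/* Step 6: re-enter the guest; RSM emulation then attempts to
	 * load the invalid EFER, and the warning fires on the
	 * resulting nested VM-Exit.
	 */
	ioctl(vcpu, KVM_RUN, 0);

	close(vcpu);
	close(vm);
	close(kvm);
	return 0;
}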

What happened here is that an SMI was injected immediately and the
handler was invoked at address 0x8000; all is good. Later, an RSM
instruction is executed via the instruction emulator to return to the
nested VM. em_rsm() is called, which leads to emulator_leave_smm().
Part of this function invokes the vendor VMX/SVM callback, in this
case vmx_leave_smm(). It sets up a pending re-entry to the guest VM by
calling nested_vmx_enter_non_root_mode() and sets the
nested_run_pending flag to 1. Unfortunately, later in
emulator_leave_smm(), rsm_load_state_64() fails because the write of
the invalid EFER value to the MSR is rejected. This results in
em_rsm() invoking the triple_fault callback. At this point it's clear
that KVM should perform a VM-Exit, but nested_run_pending is left set
to 1.
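
The ordering can be condensed into a small, runnable userspace model
(the stubs below merely stand in for the KVM internals named above;
they are not the kernel code):

/* ordering_bug.c: userspace model of the pre-patch RSM flow. */
#include <stdbool.h>
#include <stdio.h>

static bool nested_run_pending;

/* Stands in for vmx_leave_smm()/svm_leave_smm(): pre-patch, the
 * pending nested VM-Entry is flagged before SMRAM state is loaded.
 */
static int leave_smm(void)
{
	nested_run_pending = true;	/* set prematurely */
	return 0;
}

/* Stands in for rsm_load_state_64(): the write of the invalid EFER
 * (0x2501) is rejected, so loading state from SMRAM fails.
 */
static int rsm_load_state(void)
{
	return -1;
}

int main(void)
{
	leave_smm();
	if (rsm_load_state()) {
		/* Stands in for em_rsm()'s triple_fault path: KVM now
		 * has to synthesize a shutdown, yet the stale flag
		 * claims a nested VM-Entry is still pending.
		 */
		printf("triple fault with nested_run_pending=%d\n",
		       nested_run_pending);
	}
	return 0;
}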

A similar flow occurs for SVM; the bug also reproduces on AMD platforms.

[Bug analysis above by Michal Wilczynski]

To address the issue, this change sets the nested_run_pending flag only
after a successful emulation of the RSM instruction. Previously, it
was set (prematurely) inside the vendor-specific leave_smm() callback since
commit 759cbd59674a ("KVM: x86: nSVM/nVMX: set nested_run_pending on VM
entry which is a result of RSM").

Fixes: 759cbd59674a ("KVM: x86: nSVM/nVMX: set nested_run_pending on VM entry which is a result of RSM")
Reported-by: Zheyu Ma <zheyuma97@gmail.com>
Closes: https://lore.kernel.org/all/CAMhUBjmXMYsEoVYw_M8hSZjBMHh24i88QYm-RY6HDta5YZ7Wgw@mail.gmail.com
Analyzed-by: Michal Wilczynski <michal.wilczynski@intel.com>
Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
---
 arch/x86/kvm/smm.c     | 12 ++++++++++--
 arch/x86/kvm/svm/svm.c |  2 --
 arch/x86/kvm/vmx/vmx.c |  1 -
 3 files changed, 10 insertions(+), 5 deletions(-)

Comments

Sean Christopherson Aug. 16, 2024, 5:22 p.m. UTC | #1
Sorry for the super slow review, I was dreading looking at KVM's SMM mess :-)

On Wed, May 01, 2024, Kishen Maloor wrote:
> diff --git a/arch/x86/kvm/smm.c b/arch/x86/kvm/smm.c
> index d06d43d8d2aa..b1dac967f1a5 100644
> --- a/arch/x86/kvm/smm.c
> +++ b/arch/x86/kvm/smm.c
> @@ -633,8 +633,16 @@ int emulator_leave_smm(struct x86_emulate_ctxt *ctxt)
>  
>  #ifdef CONFIG_X86_64
>  	if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
> -		return rsm_load_state_64(ctxt, &smram.smram64);
> +		ret = rsm_load_state_64(ctxt, &smram.smram64);
>  	else
>  #endif
> -		return rsm_load_state_32(ctxt, &smram.smram32);
> +		ret = rsm_load_state_32(ctxt, &smram.smram32);
> +
> +	/*
> +	 * Set nested_run_pending to ensure completion of a nested VM-Entry
> +	 */
> +	if (ret == X86EMUL_CONTINUE && ctxt->ops->is_guest_mode(ctxt))

No need to bounce through the emulation context, this can simply be

	if (ret == X86EMUL_CONTINUE && is_guest_mode(vcpu))

> +		vcpu->arch.nested_run_pending = 1;

That said, while I 100% agree that KVM shouldn't set nested_run_pending before
loading state from SMRAM, I think it's actually the wrong fix.  I am fairly certain
that the real issue is that KVM is synthesizing shutdown _for L2_.  Neither the
SDM nor the APM states that a shutdown on RSM to guest mode can trigger a VM-Exit,
Intel's pseudocode explicitly says state is loaded from SMRAM before transitioning
the CPU back to non-root mode, and AMD saves VMCB state in SMRAM, i.e. it's not
treated differently (like Intel does for VMX state).

So, the more correct, and amusingly easier, fix is to forcibly leave nested mode
prior to signalling shutdown.  I'll post a patch for that.
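
A rough sketch of that approach (illustrative only; the exact hook
point and the kvm_leave_nested() helper name are assumptions about
the eventual fix, not Sean's posted patch):

	/* Illustrative fragment, not the posted patch: force the vCPU
	 * out of guest mode before requesting the shutdown, so the
	 * synthesized shutdown applies to the whole vCPU (per the
	 * SDM/APM) rather than causing a bogus nested VM-Exit with
	 * nested_run_pending still set. kvm_leave_nested() here names
	 * whatever helper performs that step.
	 */
	if (is_guest_mode(vcpu))
		kvm_leave_nested(vcpu);

	kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);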

I think I'll also grab patch 1 and post it in a slightly larger cleanup series
at some point.  Making nested_run_pending a common field is worthwhile regardless
of this SMM mess.

Patch

diff --git a/arch/x86/kvm/smm.c b/arch/x86/kvm/smm.c
index d06d43d8d2aa..b1dac967f1a5 100644
--- a/arch/x86/kvm/smm.c
+++ b/arch/x86/kvm/smm.c
@@ -633,8 +633,16 @@  int emulator_leave_smm(struct x86_emulate_ctxt *ctxt)
 
 #ifdef CONFIG_X86_64
 	if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
-		return rsm_load_state_64(ctxt, &smram.smram64);
+		ret = rsm_load_state_64(ctxt, &smram.smram64);
 	else
 #endif
-		return rsm_load_state_32(ctxt, &smram.smram32);
+		ret = rsm_load_state_32(ctxt, &smram.smram32);
+
+	/*
+	 * Set nested_run_pending to ensure completion of a nested VM-Entry
+	 */
+	if (ret == X86EMUL_CONTINUE && ctxt->ops->is_guest_mode(ctxt))
+		vcpu->arch.nested_run_pending = 1;
+
+	return ret;
 }
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index debc53b73ea3..4c3f0e1f0dd0 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4699,8 +4699,6 @@  static int svm_leave_smm(struct kvm_vcpu *vcpu, const union kvm_smram *smram)
 	if (ret)
 		goto unmap_save;
 
-	vcpu->arch.nested_run_pending = 1;
-
 unmap_save:
 	kvm_vcpu_unmap(vcpu, &map_save, true);
 unmap_map:
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e83439ecd956..e66fc14b54be 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8214,7 +8214,6 @@  static int vmx_leave_smm(struct kvm_vcpu *vcpu, const union kvm_smram *smram)
 		if (ret)
 			return ret;
 
-		vcpu->arch.nested_run_pending = 1;
 		vmx->nested.smm.guest_mode = false;
 	}
 	return 0;