KVM: nSVM: Enter guest mode before initializing nested NPT MMU

Message ID 20250130010825.220346-1-seanjc@google.com (mailing list archive)

Commit Message

Sean Christopherson Jan. 30, 2025, 1:08 a.m. UTC
When preparing vmcb02 for nested VMRUN (or state restore), "enter" guest
mode prior to initializing the MMU for nested NPT so that guest_mode is
set in the MMU's role.  KVM's model is that all L2 MMUs are tagged with
guest_mode, as the behavior of hypervisor MMUs tends to be significantly
different than kernel MMUs.

Practically speaking, the bug is relatively benign, as KVM only directly
queries role.guest_mode in kvm_mmu_free_guest_mode_roots(), which SVM
doesn't use, and in paths that are optimizations (mmu_page_zap_pte() and
shadow_mmu_try_split_huge_pages()).

And while the role is incorporated into shadow page usage, because nested
NPT requires KVM to be using NPT for L1, reusing shadow pages across L1
and L2 is impossible as L1 MMUs will always have direct=1, while L2 MMUs
will have direct=0.

Hoist the TLB processing and setting of HF_GUEST_MASK to the beginning
of the flow instead of forcing guest_mode in the MMU, as nothing in
nested_vmcb02_prepare_control() between the old and new locations touches
TLB flush requests or HF_GUEST_MASK, i.e. there's no reason to present
inconsistent vCPU state to the MMU.

Fixes: 69cb877487de ("KVM: nSVM: move MMU setup to nested_prepare_vmcb_control")
Cc: stable@vger.kernel.org
Reported-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c    |  2 +-
 arch/x86/kvm/svm/nested.c | 10 +++++-----
 2 files changed, 6 insertions(+), 6 deletions(-)


base-commit: eb723766b1030a23c38adf2348b7c3d1409d11f0

Comments

Yosry Ahmed Jan. 30, 2025, 2:17 a.m. UTC | #1
On Wed, Jan 29, 2025 at 05:08:25PM -0800, Sean Christopherson wrote:
> When preparing vmcb02 for nested VMRUN (or state restore), "enter" guest
> mode prior to initializing the MMU for nested NPT so that guest_mode is
> set in the MMU's role.  KVM's model is that all L2 MMUs are tagged with
> guest_mode, as the behavior of hypervisor MMUs tends to be significantly
> different than kernel MMUs.
> 
> Practically speaking, the bug is relatively benign, as KVM only directly
> queries role.guest_mode in kvm_mmu_free_guest_mode_roots(), which SVM
> doesn't use, and in paths that are optimizations (mmu_page_zap_pte() and
> shadow_mmu_try_split_huge_pages()).

Just curious, what about kvm_mmu_page_ad_need_write_protect()? Is it
also benign?

> 
> And while the role is incorporated into shadow page usage, because nested
> NPT requires KVM to be using NPT for L1, reusing shadow pages across L1
> and L2 is impossible as L1 MMUs will always have direct=1, while L2 MMUs
> will have direct=0.
> 
> Hoist the TLB processing and setting of HF_GUEST_MASK to the beginning
> of the flow instead of forcing guest_mode in the MMU, as nothing in
> nested_vmcb02_prepare_control() between the old and new locations touches
> TLB flush requests or HF_GUEST_MASK, i.e. there's no reason to present
> inconsistent vCPU state to the MMU.
> 
> Fixes: 69cb877487de ("KVM: nSVM: move MMU setup to nested_prepare_vmcb_control")
> Cc: stable@vger.kernel.org
> Reported-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

FWIW:

Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev>

> ---
>  arch/x86/kvm/mmu/mmu.c    |  2 +-
>  arch/x86/kvm/svm/nested.c | 10 +++++-----
>  2 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 74fa38ebddbf..3ff55a18347d 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5524,7 +5524,7 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
>  	union kvm_mmu_page_role root_role;
>  
>  	/* NPT requires CR0.PG=1. */
> -	WARN_ON_ONCE(cpu_role.base.direct);
> +	WARN_ON_ONCE(cpu_role.base.direct || !cpu_role.base.guest_mode);
>  
>  	root_role = cpu_role.base;
>  	root_role.level = kvm_mmu_get_tdp_level(vcpu);
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index d77b094d9a4d..04c375bf1ac2 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -646,6 +646,11 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
>  	u32 pause_count12;
>  	u32 pause_thresh12;
>  
> +	nested_svm_transition_tlb_flush(vcpu);
> +
> +	/* Enter Guest-Mode */
> +	enter_guest_mode(vcpu);
> +
>  	/*
>  	 * Filled at exit: exit_code, exit_code_hi, exit_info_1, exit_info_2,
>  	 * exit_int_info, exit_int_info_err, next_rip, insn_len, insn_bytes.
> @@ -762,11 +767,6 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
>  		}
>  	}
>  
> -	nested_svm_transition_tlb_flush(vcpu);
> -
> -	/* Enter Guest-Mode */
> -	enter_guest_mode(vcpu);
> -
>  	/*
>  	 * Merge guest and host intercepts - must be called with vcpu in
>  	 * guest-mode to take effect.
> 
> base-commit: eb723766b1030a23c38adf2348b7c3d1409d11f0
> -- 
> 2.48.1.262.g85cc9f2d1e-goog
>
Sean Christopherson Jan. 30, 2025, 4:13 p.m. UTC | #2
On Thu, Jan 30, 2025, Yosry Ahmed wrote:
> On Wed, Jan 29, 2025 at 05:08:25PM -0800, Sean Christopherson wrote:
> > When preparing vmcb02 for nested VMRUN (or state restore), "enter" guest
> > mode prior to initializing the MMU for nested NPT so that guest_mode is
> > set in the MMU's role.  KVM's model is that all L2 MMUs are tagged with
> > guest_mode, as the behavior of hypervisor MMUs tends to be significantly
> > different than kernel MMUs.
> > 
> > Practically speaking, the bug is relatively benign, as KVM only directly
> > queries role.guest_mode in kvm_mmu_free_guest_mode_roots(), which SVM
> > doesn't use, and in paths that are optimizations (mmu_page_zap_pte() and
> > shadow_mmu_try_split_huge_pages()).
> 
> Just curious, what about kvm_mmu_page_ad_need_write_protect()?

Doh, I missed that usage.

> Is it also benign?

Yes.  Better to be lucky than good :-)

That path forces KVM to use write-protection instead of dirty-bit based Page
Modification Logging (PML) when L2 is active, because the GPAs captured by the
CPU would be L2 GPAs, not L1 GPAs, and there's no guarantee that the L2=>L1
translation would be valid when KVM processes the PML buffer.  To ensure the
correct page gets marked dirty, KVM uses its standard write-protect scheme when
running L2, even if KVM is using PML to dirty log L1 accesses.

Lucky for me, PML isn't supported on any AMD CPUs.
Yosry Ahmed Jan. 30, 2025, 4:36 p.m. UTC | #3
On Thu, Jan 30, 2025 at 08:13:10AM -0800, Sean Christopherson wrote:
> On Thu, Jan 30, 2025, Yosry Ahmed wrote:
> > On Wed, Jan 29, 2025 at 05:08:25PM -0800, Sean Christopherson wrote:
> > > When preparing vmcb02 for nested VMRUN (or state restore), "enter" guest
> > > mode prior to initializing the MMU for nested NPT so that guest_mode is
> > > set in the MMU's role.  KVM's model is that all L2 MMUs are tagged with
> > > guest_mode, as the behavior of hypervisor MMUs tends to be significantly
> > > different than kernel MMUs.
> > > 
> > > Practically speaking, the bug is relatively benign, as KVM only directly
> > > queries role.guest_mode in kvm_mmu_free_guest_mode_roots(), which SVM
> > > doesn't use, and in paths that are optimizations (mmu_page_zap_pte() and
> > > shadow_mmu_try_split_huge_pages()).
> > 
> > Just curious, what about kvm_mmu_page_ad_need_write_protect()?
> 
> Doh, I missed that usage.
> 
> > Is it also benign?
> 
> Yes.  Better to be lucky than good :-)
> 
> That path forces KVM to use write-protection instead of dirty-bit based Page
> Modification Logging (PML) when L2 is active, because the GPAs captured by the
> CPU would be L2 GPAs, not L1 GPAs, and there's no guarantee that the L2=>L1
> translation would be valid when KVM processes the PML buffer.  To ensure the
> correct page gets marked dirty, KVM uses its standard write-protect scheme when
> running L2, even if KVM is using PML to dirty log L1 accesses.
> 
> Lucky for me, PML isn't supported on any AMD CPUs.

Well, that worked out nicely, and probably explains why the bug was not
noticed. Thanks for explaining it!
Patch

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 74fa38ebddbf..3ff55a18347d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5524,7 +5524,7 @@  void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
 	union kvm_mmu_page_role root_role;
 
 	/* NPT requires CR0.PG=1. */
-	WARN_ON_ONCE(cpu_role.base.direct);
+	WARN_ON_ONCE(cpu_role.base.direct || !cpu_role.base.guest_mode);
 
 	root_role = cpu_role.base;
 	root_role.level = kvm_mmu_get_tdp_level(vcpu);
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index d77b094d9a4d..04c375bf1ac2 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -646,6 +646,11 @@  static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
 	u32 pause_count12;
 	u32 pause_thresh12;
 
+	nested_svm_transition_tlb_flush(vcpu);
+
+	/* Enter Guest-Mode */
+	enter_guest_mode(vcpu);
+
 	/*
 	 * Filled at exit: exit_code, exit_code_hi, exit_info_1, exit_info_2,
 	 * exit_int_info, exit_int_info_err, next_rip, insn_len, insn_bytes.
@@ -762,11 +767,6 @@  static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
 		}
 	}
 
-	nested_svm_transition_tlb_flush(vcpu);
-
-	/* Enter Guest-Mode */
-	enter_guest_mode(vcpu);
-
 	/*
 	 * Merge guest and host intercepts - must be called with vcpu in
 	 * guest-mode to take effect.