Message ID: 20221117143242.102721-10-mlevitsk@redhat.com (mailing list archive)
State: New, archived
Series: SVM: vNMI (with my fixes)
On Thu, Nov 17, 2022, Maxim Levitsky wrote:
> When the vNMI is enabled, the only case when the KVM will use an NMI
> window is when the vNMI injection is pending.
>
> In this case on next IRET/RSM/STGI, the injection has to be complete
> and a new NMI can be injected.
>
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  arch/x86/kvm/svm/svm.c | 19 ++++++++++++-------
>  1 file changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index cfec4c98bb589b..eaa30f8ace518d 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -2477,7 +2477,10 @@ static int iret_interception(struct kvm_vcpu *vcpu)
>  	struct vcpu_svm *svm = to_svm(vcpu);
>
>  	++vcpu->stat.nmi_window_exits;
> -	vcpu->arch.hflags |= HF_IRET_MASK;
> +
> +	if (!is_vnmi_enabled(svm))
> +		vcpu->arch.hflags |= HF_IRET_MASK;

Ugh, HF_IRET_MASK is such a terrible name/flag. Given that it lives with GIF and
NMI, one would naturally think that it means "IRET is intercepted", but it really
means "KVM just intercepted an IRET and is waiting for NMIs to become unblocked".

And on a related topic, why on earth are GIF, NMI, and IRET tracked in hflags?
They are 100% SVM concepts. IMO, this code would be much easier to follow by
making them bools in vcpu_svm with more descriptive names.

> +
>  	if (!sev_es_guest(vcpu->kvm)) {
>  		svm_clr_intercept(svm, INTERCEPT_IRET);
>  		svm->nmi_iret_rip = kvm_rip_read(vcpu);

The vNMI interaction with this logic is confusing, as nmi_iret_rip doesn't need
to be captured for the vNMI case. SEV-ES actually has unrelated reasons for not
reading RIP vs. not intercepting IRET, they just got bundled together here for
convenience.

This is also an opportunity to clean up the SEV-ES interaction with IRET
interception, which is splattered all over the place and isn't documented
anywhere. E.g. (with an HF_IRET_MASK => awaiting_iret_completion change)

	/*
	 * For SEV-ES guests, KVM must not rely on IRET to detect NMI unblocking as
	 * #VC->IRET in the guest will result in KVM thinking NMIs are unblocked before
	 * the guest is ready for a new NMI. Architecturally, KVM is 100% correct to
	 * treat NMIs as unblocked on IRET, but the guest-host ABI for SEV-ES guests is
	 * that KVM must wait for an explicit "NMI Complete" from the guest.
	 */
	static void svm_disable_iret_interception(struct vcpu_svm *svm)
	{
		if (!sev_es_guest(svm->vcpu.kvm))
			svm_clr_intercept(svm, INTERCEPT_IRET);
	}

	static void svm_enable_iret_interception(struct vcpu_svm *svm)
	{
		if (!sev_es_guest(svm->vcpu.kvm))
			svm_set_intercept(svm, INTERCEPT_IRET);
	}

	static int iret_interception(struct kvm_vcpu *vcpu)
	{
		struct vcpu_svm *svm = to_svm(vcpu);

		++vcpu->stat.nmi_window_exits;

		/*
		 * No need to wait for the IRET to complete if vNMIs are enabled as
		 * hardware will automatically process the pending NMI when NMIs are
		 * unblocked from the guest's perspective.
		 */
		if (!is_vnmi_enabled(svm)) {
			svm->awaiting_iret_completion = true;

			/*
			 * The guest's RIP is inaccessible for SEV-ES guests, just
			 * assume forward progress was made on the next VM-Exit.
			 */
			if (!sev_es_guest(vcpu->kvm))
				svm->nmi_iret_rip = kvm_rip_read(vcpu);
		}

		svm_disable_iret_interception(svm);

		kvm_make_request(KVM_REQ_EVENT, vcpu);
		return 1;
	}

> @@ -3735,9 +3738,6 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
>
> -	if (is_vnmi_enabled(svm))
> -		return;
> -
>  	if ((vcpu->arch.hflags & (HF_NMI_MASK | HF_IRET_MASK)) == HF_NMI_MASK)
>  		return; /* IRET will cause a vm exit */

As much as I like incremental patches, in this case I'm having a hell of a time
reviewing the code as the vNMI logic ends up being split across four patches.
E.g. in this particular case, the above requires knowing that svm_inject_nmi()
never sets HF_NMI_MASK when vNMI is enabled.

In the next version, any objection to squashing patches 7-10 into a single "Add
non-nested vNMI support" patch?

As for this code, IMO some pre-work to change the flow would help with the vNMI
case. The GIF=0 logic overrides legacy NMI blocking, and so can be handled first.
And I vote to explicitly set INTERCEPT_IRET in the above case instead of relying
on INTERCEPT_IRET to already be set by svm_inject_nmi().

That would yield this as a final result:

	static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
	{
		struct vcpu_svm *svm = to_svm(vcpu);

		/*
		 * GIF=0 blocks NMIs irrespective of legacy NMI blocking. No need to
		 * intercept or single-step IRET if GIF=0, just intercept STGI.
		 */
		if (!gif_set(svm)) {
			if (vgif)
				svm_set_intercept(svm, INTERCEPT_STGI);
			return;
		}

		/*
		 * NMI is blocked, either because an NMI is in service or because KVM
		 * just injected an NMI. If KVM is waiting for an intercepted IRET to
		 * complete, single-step the IRET to wait for NMIs to become unblocked.
		 * Otherwise, intercept the guest's next IRET.
		 */
		if (svm->awaiting_iret_completion) {
			svm->nmi_singlestep_guest_rflags = svm_get_rflags(vcpu);
			svm->nmi_singlestep = true;
			svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
		} else {
			svm_set_intercept(svm, INTERCEPT_IRET);
		}
	}

> @@ -3751,9 +3751,14 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
>  	 * Something prevents NMI from been injected. Single step over possible
>  	 * problem (IRET or exception injection or interrupt shadow)
>  	 */
> -	svm->nmi_singlestep_guest_rflags = svm_get_rflags(vcpu);
> -	svm->nmi_singlestep = true;
> -	svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
> +
> +	if (is_vnmi_enabled(svm)) {
> +		svm_set_intercept(svm, INTERCEPT_IRET);

This will break SEV-ES. Per commit 4444dfe4050b ("KVM: SVM: Add NMI support for
an SEV-ES guest"), the hypervisor must not rely on IRET interception to detect
NMI unblocking for SEV-ES guests.

As above, I think we should provide helpers to toggle IRET interception to reduce
the probability of breaking SEV-ES.

> +	} else {
> +		svm->nmi_singlestep_guest_rflags = svm_get_rflags(vcpu);
> +		svm->nmi_singlestep = true;
> +		svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
> +	}
>  }
>
>  static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
> --
> 2.34.3
>
On Thu, 2022-11-17 at 18:21 +0000, Sean Christopherson wrote:
> On Thu, Nov 17, 2022, Maxim Levitsky wrote:
> > When the vNMI is enabled, the only case when the KVM will use an NMI
> > window is when the vNMI injection is pending.
> >
> > In this case on next IRET/RSM/STGI, the injection has to be complete
> > and a new NMI can be injected.
> >
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >  arch/x86/kvm/svm/svm.c | 19 ++++++++++++-------
> >  1 file changed, 12 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index cfec4c98bb589b..eaa30f8ace518d 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -2477,7 +2477,10 @@ static int iret_interception(struct kvm_vcpu *vcpu)
> >  	struct vcpu_svm *svm = to_svm(vcpu);
> >
> >  	++vcpu->stat.nmi_window_exits;
> > -	vcpu->arch.hflags |= HF_IRET_MASK;
> > +
> > +	if (!is_vnmi_enabled(svm))
> > +		vcpu->arch.hflags |= HF_IRET_MASK;
>
> Ugh, HF_IRET_MASK is such a terrible name/flag. Given that it lives with GIF
> and NMI, one would naturally think that it means "IRET is intercepted", but it
> really means "KVM just intercepted an IRET and is waiting for NMIs to become
> unblocked".
>
> And on a related topic, why on earth are GIF, NMI, and IRET tracked in hflags?
> They are 100% SVM concepts. IMO, this code would be much easier to follow by
> making them bools in vcpu_svm with more descriptive names.
>
> > +
> >  	if (!sev_es_guest(vcpu->kvm)) {
> >  		svm_clr_intercept(svm, INTERCEPT_IRET);
> >  		svm->nmi_iret_rip = kvm_rip_read(vcpu);
>
> The vNMI interaction with this logic is confusing, as nmi_iret_rip doesn't need
> to be captured for the vNMI case. SEV-ES actually has unrelated reasons for not
> reading RIP vs. not intercepting IRET, they just got bundled together here for
> convenience.

Yes, this can be cleaned up, again I didn't want to change too much of the code.

> This is also an opportunity to clean up the SEV-ES interaction with IRET
> interception, which is splattered all over the place and isn't documented
> anywhere.
>
> E.g. (with an HF_IRET_MASK => awaiting_iret_completion change)
>
> 	/*
> 	 * For SEV-ES guests, KVM must not rely on IRET to detect NMI unblocking as
> 	 * #VC->IRET in the guest will result in KVM thinking NMIs are unblocked before
> 	 * the guest is ready for a new NMI. Architecturally, KVM is 100% correct to
> 	 * treat NMIs as unblocked on IRET, but the guest-host ABI for SEV-ES guests is
> 	 * that KVM must wait for an explicit "NMI Complete" from the guest.
> 	 */
> 	static void svm_disable_iret_interception(struct vcpu_svm *svm)
> 	{
> 		if (!sev_es_guest(svm->vcpu.kvm))
> 			svm_clr_intercept(svm, INTERCEPT_IRET);
> 	}
>
> 	static void svm_enable_iret_interception(struct vcpu_svm *svm)
> 	{
> 		if (!sev_es_guest(svm->vcpu.kvm))
> 			svm_set_intercept(svm, INTERCEPT_IRET);
> 	}

This makes sense, but doesn't have to be done in this patch series IMHO.

> 	static int iret_interception(struct kvm_vcpu *vcpu)
> 	{
> 		struct vcpu_svm *svm = to_svm(vcpu);
>
> 		++vcpu->stat.nmi_window_exits;
>
> 		/*
> 		 * No need to wait for the IRET to complete if vNMIs are enabled as
> 		 * hardware will automatically process the pending NMI when NMIs are
> 		 * unblocked from the guest's perspective.
> 		 */
> 		if (!is_vnmi_enabled(svm)) {
> 			svm->awaiting_iret_completion = true;
>
> 			/*
> 			 * The guest's RIP is inaccessible for SEV-ES guests, just
> 			 * assume forward progress was made on the next VM-Exit.
> 			 */
> 			if (!sev_es_guest(vcpu->kvm))
> 				svm->nmi_iret_rip = kvm_rip_read(vcpu);
> 		}
>
> 		svm_disable_iret_interception(svm);
>
> 		kvm_make_request(KVM_REQ_EVENT, vcpu);
> 		return 1;
> 	}
>
> > @@ -3735,9 +3738,6 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
> >  {
> >  	struct vcpu_svm *svm = to_svm(vcpu);
> >
> > -	if (is_vnmi_enabled(svm))
> > -		return;
> > -
> >  	if ((vcpu->arch.hflags & (HF_NMI_MASK | HF_IRET_MASK)) == HF_NMI_MASK)
> >  		return; /* IRET will cause a vm exit */
>
> As much as I like incremental patches, in this case I'm having a hell of a time
> reviewing the code as the vNMI logic ends up being split across four patches.
> E.g. in this particular case, the above requires knowing that svm_inject_nmi()
> never sets HF_NMI_MASK when vNMI is enabled.
>
> In the next version, any objection to squashing patches 7-10 into a single "Add
> non-nested vNMI support" patch?

No objection at all - again since this is not my patch series, I didn't want to
make too many invasive changes to it.

> As for this code, IMO some pre-work to change the flow would help with the vNMI
> case. The GIF=0 logic overrides legacy NMI blocking, and so can be handled first.
> And I vote to explicitly set INTERCEPT_IRET in the above case instead of relying
> on INTERCEPT_IRET to already be set by svm_inject_nmi().
>
> That would yield this as a final result:
>
> 	static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
> 	{
> 		struct vcpu_svm *svm = to_svm(vcpu);
>
> 		/*
> 		 * GIF=0 blocks NMIs irrespective of legacy NMI blocking. No need to
> 		 * intercept or single-step IRET if GIF=0, just intercept STGI.
> 		 */
> 		if (!gif_set(svm)) {
> 			if (vgif)
> 				svm_set_intercept(svm, INTERCEPT_STGI);
> 			return;
> 		}
>
> 		/*
> 		 * NMI is blocked, either because an NMI is in service or because KVM
> 		 * just injected an NMI. If KVM is waiting for an intercepted IRET to
> 		 * complete, single-step the IRET to wait for NMIs to become unblocked.
> 		 * Otherwise, intercept the guest's next IRET.
> 		 */
> 		if (svm->awaiting_iret_completion) {
> 			svm->nmi_singlestep_guest_rflags = svm_get_rflags(vcpu);
> 			svm->nmi_singlestep = true;
> 			svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
> 		} else {
> 			svm_set_intercept(svm, INTERCEPT_IRET);
> 		}
> 	}
>
> > @@ -3751,9 +3751,14 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
> >  	 * Something prevents NMI from been injected. Single step over possible
> >  	 * problem (IRET or exception injection or interrupt shadow)
> >  	 */
> > -	svm->nmi_singlestep_guest_rflags = svm_get_rflags(vcpu);
> > -	svm->nmi_singlestep = true;
> > -	svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
> > +
> > +	if (is_vnmi_enabled(svm)) {
> > +		svm_set_intercept(svm, INTERCEPT_IRET);
>
> This will break SEV-ES. Per commit 4444dfe4050b ("KVM: SVM: Add NMI support for
> an SEV-ES guest"), the hypervisor must not rely on IRET interception to detect
> NMI unblocking for SEV-ES guests. As above, I think we should provide helpers to
> toggle IRET interception to reduce the probability of breaking SEV-ES.

Yes, one more reason for the helpers, I didn't notice that I missed that 'if'.

> > +	} else {
> > +		svm->nmi_singlestep_guest_rflags = svm_get_rflags(vcpu);
> > +		svm->nmi_singlestep = true;
> > +		svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
> > +	}
> > }
> >
> > static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
> > --
> > 2.34.3
> >

Best regards,
	Maxim Levitsky
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index cfec4c98bb589b..eaa30f8ace518d 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2477,7 +2477,10 @@ static int iret_interception(struct kvm_vcpu *vcpu)
 	struct vcpu_svm *svm = to_svm(vcpu);
 
 	++vcpu->stat.nmi_window_exits;
-	vcpu->arch.hflags |= HF_IRET_MASK;
+
+	if (!is_vnmi_enabled(svm))
+		vcpu->arch.hflags |= HF_IRET_MASK;
+
 	if (!sev_es_guest(vcpu->kvm)) {
 		svm_clr_intercept(svm, INTERCEPT_IRET);
 		svm->nmi_iret_rip = kvm_rip_read(vcpu);
@@ -3735,9 +3738,6 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	if (is_vnmi_enabled(svm))
-		return;
-
 	if ((vcpu->arch.hflags & (HF_NMI_MASK | HF_IRET_MASK)) == HF_NMI_MASK)
 		return; /* IRET will cause a vm exit */
 
@@ -3751,9 +3751,14 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
 	 * Something prevents NMI from been injected. Single step over possible
 	 * problem (IRET or exception injection or interrupt shadow)
 	 */
-	svm->nmi_singlestep_guest_rflags = svm_get_rflags(vcpu);
-	svm->nmi_singlestep = true;
-	svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
+
+	if (is_vnmi_enabled(svm)) {
+		svm_set_intercept(svm, INTERCEPT_IRET);
+	} else {
+		svm->nmi_singlestep_guest_rflags = svm_get_rflags(vcpu);
+		svm->nmi_singlestep = true;
+		svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
+	}
 }
 
 static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
When vNMI is enabled, the only case in which KVM will use an NMI window
is when a vNMI injection is pending.

In this case, on the next IRET/RSM/STGI the injection has to be complete
and a new NMI can be injected.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/svm.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)