Message ID: 20211213104634.199141-6-mlevitsk@redhat.com (mailing list archive)
State: New, archived
Series: RFC: KVM: SVM: Allow L1's AVIC to co-exist with nesting
On Mon, Dec 13, 2021, Maxim Levitsky wrote:
> @@ -1486,6 +1485,12 @@ struct kvm_x86_ops {
> 	int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err);
> 
> 	void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
> +
> +	/*
> +	 * Returns false if for some reason APICv (e.g guest mode)
> +	 * must be inhibited on this vCPU

Comment is wrong, code returns 'true' if AVIC is inhibited due to is_guest_mode().
Even better, rename the hook to something that's more self-documenting.

vcpu_is_apicv_inhibited() jumps to mind, but that's a bad name since it's not
called by kvm_vcpu_apicv_active().  Maybe vcpu_has_apicv_inhibit_condition()?

> +	 */
> +	bool (*apicv_check_inhibit)(struct kvm_vcpu *vcpu);
> };
> 
> struct kvm_x86_nested_ops {
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 34f62da2fbadd..5a8304938f51e 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -734,6 +734,11 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
> 	return 0;
> }
> 
> +bool avic_is_vcpu_inhibited(struct kvm_vcpu *vcpu)

This should follow whatever wording we decide on for the kvm_x86_ops hook.  In
isolation, this name is too close to kvm_vcpu_apicv_active() and could be wrongly
assumed to mean that APICv is not inhibited for _any_ reason on this vCPU if it
returns false.

> +{
> +	return is_guest_mode(vcpu);
> +}
> +
> bool svm_dy_apicv_has_pending_interrupt(struct kvm_vcpu *vcpu)
> {
> 	return false;

...

> @@ -4486,6 +4493,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
> 	.complete_emulated_msr = svm_complete_emulated_msr,
> 
> 	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
> +	.apicv_check_inhibit = avic_is_vcpu_inhibited,

This can technically be NULL if nested=0.

> };
> 
> /*
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index daa8ca84afccd..545684ea37353 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -590,6 +590,7 @@ void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
> void svm_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr);
> void svm_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr);
> int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec);
> +bool avic_is_vcpu_inhibited(struct kvm_vcpu *vcpu);
> bool svm_dy_apicv_has_pending_interrupt(struct kvm_vcpu *vcpu);
> int svm_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
> 		       uint32_t guest_irq, bool set);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 81a74d86ee5eb..125599855af47 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9161,6 +9161,10 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
> 		r = kvm_check_nested_events(vcpu);
> 		if (r < 0)
> 			goto out;
> +
> +		/* Nested VM exit might need to update APICv status */
> +		if (kvm_check_request(KVM_REQ_APICV_UPDATE, vcpu))
> +			kvm_vcpu_update_apicv(vcpu);
> 	}
> 
> 	/* try to inject new event if pending */
> @@ -9538,6 +9542,10 @@ void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
> 	down_read(&vcpu->kvm->arch.apicv_update_lock);
> 
> 	activate = kvm_apicv_activated(vcpu->kvm);
> +
> +	if (kvm_x86_ops.apicv_check_inhibit)
> +		activate = activate && !kvm_x86_ops.apicv_check_inhibit(vcpu);

Might as well use static_call().  This would also be a candidate for
DEFINE_STATIC_CALL_RET0, though that's overkill if this is the only call site.

> +
> 	if (vcpu->arch.apicv_active == activate)
> 		goto out;
> 
> @@ -9935,7 +9943,10 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> 	 * per-VM state, and responsing vCPUs must wait for the update
> 	 * to complete before servicing KVM_REQ_APICV_UPDATE.
> 	 */
> -	WARN_ON_ONCE(kvm_apicv_activated(vcpu->kvm) != kvm_vcpu_apicv_active(vcpu));
> +	if (!is_guest_mode(vcpu))
> +		WARN_ON_ONCE(kvm_apicv_activated(vcpu->kvm) != kvm_vcpu_apicv_active(vcpu));
> +	else
> +		WARN_ON(kvm_vcpu_apicv_active(vcpu));

Won't this fire on VMX?

> 
> 	exit_fastpath = static_call(kvm_x86_run)(vcpu);
> 	if (likely(exit_fastpath != EXIT_FASTPATH_REENTER_GUEST))
> --
> 2.26.3
> 
On Wed, 2022-01-05 at 21:56 +0000, Sean Christopherson wrote:
> On Mon, Dec 13, 2021, Maxim Levitsky wrote:
> > @@ -1486,6 +1485,12 @@ struct kvm_x86_ops {
> > 	int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err);
> > 
> > 	void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
> > +
> > +	/*
> > +	 * Returns false if for some reason APICv (e.g guest mode)
> > +	 * must be inhibited on this vCPU
> 
> Comment is wrong, code returns 'true' if AVIC is inhibited due to is_guest_mode().
> Even better, rename the hook to something that's more self-documenting.
> 
> vcpu_is_apicv_inhibited() jumps to mind, but that's a bad name since it's not
> called by kvm_vcpu_apicv_active().  Maybe vcpu_has_apicv_inhibit_condition()?

Yep. I also kind of don't like the name, but I didn't think of anything better.
vcpu_has_apicv_inhibit_condition seems a good idea.

> 
> > +	 */
> > +	bool (*apicv_check_inhibit)(struct kvm_vcpu *vcpu);
> > };
> > 
> > struct kvm_x86_nested_ops {
> > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > index 34f62da2fbadd..5a8304938f51e 100644
> > --- a/arch/x86/kvm/svm/avic.c
> > +++ b/arch/x86/kvm/svm/avic.c
> > @@ -734,6 +734,11 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
> > 	return 0;
> > }
> > 
> > +bool avic_is_vcpu_inhibited(struct kvm_vcpu *vcpu)
> 
> This should follow whatever wording we decide on for the kvm_x86_ops hook.  In
> isolation, this name is too close to kvm_vcpu_apicv_active() and could be wrongly
> assumed to mean that APICv is not inhibited for _any_ reason on this vCPU if it
> returns false.

I will think of a better name.

> 
> > +{
> > +	return is_guest_mode(vcpu);
> > +}
> > +
> > bool svm_dy_apicv_has_pending_interrupt(struct kvm_vcpu *vcpu)
> > {
> > 	return false;
> 
> ...
> 
> > @@ -4486,6 +4493,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
> > 	.complete_emulated_msr = svm_complete_emulated_msr,
> > 
> > 	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
> > +	.apicv_check_inhibit = avic_is_vcpu_inhibited,
> 
> This can technically be NULL if nested=0.

Good idea, it is now possible after the recent refactoring.

> 
> > };
> > 
> > /*
> > diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> > index daa8ca84afccd..545684ea37353 100644
> > --- a/arch/x86/kvm/svm/svm.h
> > +++ b/arch/x86/kvm/svm/svm.h
> > @@ -590,6 +590,7 @@ void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
> > void svm_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr);
> > void svm_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr);
> > int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec);
> > +bool avic_is_vcpu_inhibited(struct kvm_vcpu *vcpu);
> > bool svm_dy_apicv_has_pending_interrupt(struct kvm_vcpu *vcpu);
> > int svm_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
> > 		       uint32_t guest_irq, bool set);
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 81a74d86ee5eb..125599855af47 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -9161,6 +9161,10 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
> > 		r = kvm_check_nested_events(vcpu);
> > 		if (r < 0)
> > 			goto out;
> > +
> > +		/* Nested VM exit might need to update APICv status */
> > +		if (kvm_check_request(KVM_REQ_APICV_UPDATE, vcpu))
> > +			kvm_vcpu_update_apicv(vcpu);
> > 	}
> > 
> > 	/* try to inject new event if pending */
> > @@ -9538,6 +9542,10 @@ void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
> > 	down_read(&vcpu->kvm->arch.apicv_update_lock);
> > 
> > 	activate = kvm_apicv_activated(vcpu->kvm);
> > +
> > +	if (kvm_x86_ops.apicv_check_inhibit)
> > +		activate = activate && !kvm_x86_ops.apicv_check_inhibit(vcpu);
> 
> Might as well use static_call().  This would also be a candidate for
> DEFINE_STATIC_CALL_RET0, though that's overkill if this is the only call site.

This is also something that should be done, but I prefer to do this in one go.
There are several nested related functions that were not converted to
static_call (like .check_nested_events).

Also I recently found that we have KVM_X86_OP and KVM_X86_OP_NULL which are the
same thing - another thing for refactoring, so I prefer to refactor this
in one patch series.

> 
> > +
> > 	if (vcpu->arch.apicv_active == activate)
> > 		goto out;
> > 
> > @@ -9935,7 +9943,10 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> > 	 * per-VM state, and responsing vCPUs must wait for the update
> > 	 * to complete before servicing KVM_REQ_APICV_UPDATE.
> > 	 */
> > -	WARN_ON_ONCE(kvm_apicv_activated(vcpu->kvm) != kvm_vcpu_apicv_active(vcpu));
> > +	if (!is_guest_mode(vcpu))
> > +		WARN_ON_ONCE(kvm_apicv_activated(vcpu->kvm) != kvm_vcpu_apicv_active(vcpu));
> > +	else
> > +		WARN_ON(kvm_vcpu_apicv_active(vcpu));
> 
> Won't this fire on VMX?

Yes it will! Good catch.
It's almost like I would like to have an .apicv_is_avic boolean, for such cases :-)
I'll think of something.

Best regards,
	Maxim Levitsky

> 
> > 
> > 	exit_fastpath = static_call(kvm_x86_run)(vcpu);
> > 	if (likely(exit_fastpath != EXIT_FASTPATH_REENTER_GUEST))
> > --
> > 2.26.3
> > 
On Thu, Jan 06, 2022, Maxim Levitsky wrote:
> Also I recently found that we have KVM_X86_OP and KVM_X86_OP_NULL which are the
> same thing - another thing for refactoring, so I prefer to refactor this
> in one patch series.

I have a mostly-complete patch series that removes KVM_X86_OP_NULL, will get it
finished and posted after I get through my review backlog.
diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index c2b007171abd2..d9d7459ef9e8f 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -119,6 +119,7 @@ KVM_X86_OP_NULL(enable_direct_tlbflush)
 KVM_X86_OP_NULL(migrate_timers)
 KVM_X86_OP(msr_filter_changed)
 KVM_X86_OP_NULL(complete_emulated_msr)
+KVM_X86_OP_NULL(apicv_check_inhibit);
 
 #undef KVM_X86_OP
 #undef KVM_X86_OP_NULL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e863d569c89a4..78b3793cc08c5 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1036,7 +1036,6 @@ struct kvm_x86_msr_filter {
 
 #define APICV_INHIBIT_REASON_DISABLE    0
 #define APICV_INHIBIT_REASON_HYPERV     1
-#define APICV_INHIBIT_REASON_NESTED     2
 #define APICV_INHIBIT_REASON_IRQWIN     3
 #define APICV_INHIBIT_REASON_PIT_REINJ  4
 #define APICV_INHIBIT_REASON_X2APIC     5
@@ -1486,6 +1485,12 @@ struct kvm_x86_ops {
 	int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err);
 
 	void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
+
+	/*
+	 * Returns false if for some reason APICv (e.g guest mode)
+	 * must be inhibited on this vCPU
+	 */
+	bool (*apicv_check_inhibit)(struct kvm_vcpu *vcpu);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 34f62da2fbadd..5a8304938f51e 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -734,6 +734,11 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
 	return 0;
 }
 
+bool avic_is_vcpu_inhibited(struct kvm_vcpu *vcpu)
+{
+	return is_guest_mode(vcpu);
+}
+
 bool svm_dy_apicv_has_pending_interrupt(struct kvm_vcpu *vcpu)
 {
 	return false;
@@ -950,7 +955,6 @@ bool svm_check_apicv_inhibit_reasons(ulong bit)
 	ulong supported = BIT(APICV_INHIBIT_REASON_DISABLE) |
 			  BIT(APICV_INHIBIT_REASON_ABSENT) |
 			  BIT(APICV_INHIBIT_REASON_HYPERV) |
-			  BIT(APICV_INHIBIT_REASON_NESTED) |
 			  BIT(APICV_INHIBIT_REASON_IRQWIN) |
 			  BIT(APICV_INHIBIT_REASON_PIT_REINJ) |
 			  BIT(APICV_INHIBIT_REASON_X2APIC) |
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index cf206855ebf09..bf17c2d7cf321 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -551,11 +551,6 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
 	 * exit_int_info, exit_int_info_err, next_rip, insn_len, insn_bytes.
 	 */
 
-	/*
-	 * Also covers avic_vapic_bar, avic_backing_page, avic_logical_id,
-	 * avic_physical_id.
-	 */
-	WARN_ON(kvm_apicv_activated(svm->vcpu.kvm));
 	/* Copied from vmcb01.  msrpm_base can be overwritten later.  */
 	svm->vmcb->control.nested_ctl = svm->vmcb01.ptr->control.nested_ctl;
@@ -659,6 +654,9 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa,
 
 	svm_set_gif(svm, true);
 
+	if (kvm_vcpu_apicv_active(vcpu))
+		kvm_make_request(KVM_REQ_APICV_UPDATE, vcpu);
+
 	return 0;
 }
 
@@ -923,6 +921,9 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	if (unlikely(svm->vmcb->save.rflags & X86_EFLAGS_TF))
 		kvm_queue_exception(&(svm->vcpu), DB_VECTOR);
 
+	if (kvm_apicv_activated(vcpu->kvm))
+		kvm_make_request(KVM_REQ_APICV_UPDATE, vcpu);
+
 	return 0;
 }
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 468cc385c35f0..e4228580286e8 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1383,7 +1383,8 @@ static void svm_set_vintr(struct vcpu_svm *svm)
 	/*
 	 * The following fields are ignored when AVIC is enabled
 	 */
-	WARN_ON(kvm_apicv_activated(svm->vcpu.kvm));
+	if (!is_guest_mode(&svm->vcpu))
+		WARN_ON(kvm_apicv_activated(svm->vcpu.kvm));
 
 	svm_set_intercept(svm, INTERCEPT_VINTR);
 
@@ -2853,10 +2854,16 @@ static int interrupt_window_interception(struct kvm_vcpu *vcpu)
 	svm_clear_vintr(to_svm(vcpu));
 
 	/*
-	 * For AVIC, the only reason to end up here is ExtINTs.
+	 * If not running nested, for AVIC, the only reason to end up here is ExtINTs.
 	 * In this case AVIC was temporarily disabled for
 	 * requesting the IRQ window and we have to re-enable it.
+	 *
+	 * If running nested, still uninhibit the AVIC in case irq window
+	 * was requested when it was not running nested.
+	 * All vCPUs which run nested will have their AVIC still
+	 * inhibited due to AVIC inhibition override for that.
 	 */
+
 	kvm_request_apicv_update(vcpu->kvm, true, APICV_INHIBIT_REASON_IRQWIN);
 
 	++vcpu->stat.irq_window_exits;
@@ -3410,8 +3417,16 @@ static void svm_enable_irq_window(struct kvm_vcpu *vcpu)
 		 * unless we have pending ExtINT since it cannot be injected
 		 * via AVIC. In such case, we need to temporarily disable AVIC,
 		 * and fallback to injecting IRQ via V_IRQ.
+		 *
+		 * If running nested, this vCPU will use separate page tables
+		 * which don't have L1's AVIC mapped, and the AVIC is
+		 * already inhibited thus there is no need for global
+		 * AVIC inhibition.
 		 */
-		kvm_request_apicv_update(vcpu->kvm, false, APICV_INHIBIT_REASON_IRQWIN);
+
+		if (!is_guest_mode(vcpu))
+			kvm_request_apicv_update(vcpu->kvm, false, APICV_INHIBIT_REASON_IRQWIN);
+
 		svm_set_vintr(svm);
 	}
 }
@@ -3881,14 +3896,6 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 		if (guest_cpuid_has(vcpu, X86_FEATURE_X2APIC))
 			kvm_request_apicv_update(vcpu->kvm, false,
 						 APICV_INHIBIT_REASON_X2APIC);
-
-		/*
-		 * Currently, AVIC does not work with nested virtualization.
-		 * So, we disable AVIC when cpuid for SVM is set in the L1 guest.
-		 */
-		if (nested && guest_cpuid_has(vcpu, X86_FEATURE_SVM))
-			kvm_request_apicv_update(vcpu->kvm, false,
-						 APICV_INHIBIT_REASON_NESTED);
 	}
 	init_vmcb_after_set_cpuid(vcpu);
 }
@@ -4486,6 +4493,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.complete_emulated_msr = svm_complete_emulated_msr,
 
 	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
+	.apicv_check_inhibit = avic_is_vcpu_inhibited,
 };
 
 /*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index daa8ca84afccd..545684ea37353 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -590,6 +590,7 @@ void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
 void svm_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr);
 void svm_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr);
 int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec);
+bool avic_is_vcpu_inhibited(struct kvm_vcpu *vcpu);
 bool svm_dy_apicv_has_pending_interrupt(struct kvm_vcpu *vcpu);
 int svm_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
 		       uint32_t guest_irq, bool set);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 81a74d86ee5eb..125599855af47 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9161,6 +9161,10 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
 		r = kvm_check_nested_events(vcpu);
 		if (r < 0)
 			goto out;
+
+		/* Nested VM exit might need to update APICv status */
+		if (kvm_check_request(KVM_REQ_APICV_UPDATE, vcpu))
+			kvm_vcpu_update_apicv(vcpu);
 	}
 
 	/* try to inject new event if pending */
@@ -9538,6 +9542,10 @@ void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
 	down_read(&vcpu->kvm->arch.apicv_update_lock);
 
 	activate = kvm_apicv_activated(vcpu->kvm);
+
+	if (kvm_x86_ops.apicv_check_inhibit)
+		activate = activate && !kvm_x86_ops.apicv_check_inhibit(vcpu);
+
 	if (vcpu->arch.apicv_active == activate)
 		goto out;
 
@@ -9935,7 +9943,10 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	 * per-VM state, and responsing vCPUs must wait for the update
 	 * to complete before servicing KVM_REQ_APICV_UPDATE.
 	 */
-	WARN_ON_ONCE(kvm_apicv_activated(vcpu->kvm) != kvm_vcpu_apicv_active(vcpu));
+	if (!is_guest_mode(vcpu))
+		WARN_ON_ONCE(kvm_apicv_activated(vcpu->kvm) != kvm_vcpu_apicv_active(vcpu));
+	else
+		WARN_ON(kvm_vcpu_apicv_active(vcpu));
 
 	exit_fastpath = static_call(kvm_x86_run)(vcpu);
 	if (likely(exit_fastpath != EXIT_FASTPATH_REENTER_GUEST))
Inhibit the AVIC of the vCPU that is running nested for the duration of the
nested run, so that all interrupts arriving from both its vCPU siblings and
from KVM are delivered using normal IPIs and cause that vCPU to vmexit.

Note that unlike normal AVIC inhibition, there is no need to update the AVIC
mmio memslot, because the nested guest uses its own set of paging tables.
That also means that AVIC doesn't need to be inhibited VM wide.

Note that in theory, when a nested guest doesn't intercept physical
interrupts, we could continue using AVIC to deliver them to it, but we don't
bother doing so. Besides, once nested AVIC is implemented, the nested guest
will likely use it, which would rule out this optimization anyway (the real
AVIC can't support both L1 and L2 at the same time).

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  7 ++++++-
 arch/x86/kvm/svm/avic.c            |  6 +++++-
 arch/x86/kvm/svm/nested.c          | 11 ++++++-----
 arch/x86/kvm/svm/svm.c             | 30 +++++++++++++++++++-----------
 arch/x86/kvm/svm/svm.h             |  1 +
 arch/x86/kvm/x86.c                 | 13 ++++++++++++-
 7 files changed, 50 insertions(+), 19 deletions(-)