Message ID | 20241101192114.1810198-3-seanjc@google.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | KVM: nVMX: Fix an SVI update bug with passthrough APIC | expand |
>@@ -6873,6 +6873,23 @@ void vmx_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr) > u16 status; > u8 old; > >+ /* >+ * If L2 is active, defer the SVI update until vmcs01 is loaded, as SVI >+ * is only relevant for if and only if Virtual Interrupt Delivery is >+ * enabled in vmcs12, and if VID is enabled then L2 EOIs affect L2's >+ * vAPIC, not L1's vAPIC. KVM must update vmcs01 on the next nested >+ * VM-Exit, otherwise L1 with run with a stale SVI. >+ */ >+ if (is_guest_mode(vcpu)) { >+ /* >+ * KVM is supposed to forward intercepted L2 EOIs to L1 if VID >+ * is enabled in vmcs12; as above, the EOIs affect L2's vAPIC. >+ */ >+ WARN_ON_ONCE(nested_cpu_has_vid(get_vmcs12(vcpu))); This function can be called in other scenarios, e.g., from kvm_apic_set_state(). Is it possible for the userspace VMM to set the APIC state while L2 is running and VID is enabled for L2? If so, this warning could be triggered by the userspace VMM. Anyway, I verified this patch can fix the reported issue. So, Tested-by: Chao Gao <chao.gao@intel.com> >+ to_vmx(vcpu)->nested.update_vmcs01_hwapic_isr = true; >+ return; >+ } >+ > if (max_isr == -1) > max_isr = 0;
On Mon, Nov 04, 2024, Chao Gao wrote: > >@@ -6873,6 +6873,23 @@ void vmx_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr) > > u16 status; > > u8 old; > > > >+ /* > >+ * If L2 is active, defer the SVI update until vmcs01 is loaded, as SVI > >+ * is only relevant for if and only if Virtual Interrupt Delivery is > >+ * enabled in vmcs12, and if VID is enabled then L2 EOIs affect L2's > >+ * vAPIC, not L1's vAPIC. KVM must update vmcs01 on the next nested > >+ * VM-Exit, otherwise L1 with run with a stale SVI. > >+ */ > >+ if (is_guest_mode(vcpu)) { > >+ /* > >+ * KVM is supposed to forward intercepted L2 EOIs to L1 if VID > >+ * is enabled in vmcs12; as above, the EOIs affect L2's vAPIC. > >+ */ > >+ WARN_ON_ONCE(nested_cpu_has_vid(get_vmcs12(vcpu))); > > This function can be called in other scenarios, e.g., from > kvm_apic_set_state(). Is it possible for the userspace VMM to set the APIC > state while L2 is running and VID is enabled for L2? If so, this warning could > be triggered by the userspace VMM. Ugh, yeah, that's reachable. Bummer. Ooh! I think we can use the recently added "wants_to_run" to filter out state stuffing. I.e. this WARN_ON_ONCE(vcpu->wants_to_run && nested_cpu_has_vid(get_vmcs12(vcpu))); plus an updated comment.
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 5be2be44a188..66751bf9d4f4 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -816,6 +816,17 @@ static inline void apic_clear_isr(int vec, struct kvm_lapic *apic) } } +void kvm_apic_update_hwapic_isr(struct kvm_vcpu *vcpu) +{ + struct kvm_lapic *apic = vcpu->arch.apic; + + if (WARN_ON_ONCE(!lapic_in_kernel(vcpu)) || !apic->apicv_active) + return; + + kvm_x86_call(hwapic_isr_update)(apic->vcpu, apic_find_highest_isr(apic)); +} +EXPORT_SYMBOL_GPL(kvm_apic_update_hwapic_isr); + int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu) { /* This may race with setting of irr in __apic_accept_irq() and diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h index 1b8ef9856422..469a6f20e2db 100644 --- a/arch/x86/kvm/lapic.h +++ b/arch/x86/kvm/lapic.h @@ -122,6 +122,7 @@ int kvm_set_apic_base(struct kvm_vcpu *vcpu, struct msr_data *msr_info); int kvm_apic_get_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s); int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s); enum lapic_mode kvm_get_apic_mode(struct kvm_vcpu *vcpu); +void kvm_apic_update_hwapic_isr(struct kvm_vcpu *vcpu); int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu); u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index 746cb41c5b98..0111539fcea1 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -5036,6 +5036,11 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason, kvm_make_request(KVM_REQ_APICV_UPDATE, vcpu); } + if (vmx->nested.update_vmcs01_hwapic_isr) { + vmx->nested.update_vmcs01_hwapic_isr = false; + kvm_apic_update_hwapic_isr(vcpu); + } + if ((vm_exit_reason != -1) && (enable_shadow_vmcs || nested_vmx_is_evmptr12_valid(vmx))) vmx->nested.need_vmcs12_to_shadow_sync = true; diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index fe9887a5fa4a..a3513fc05a01 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -6873,6 +6873,23 @@ void vmx_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr) u16 status; u8 old; + /* + * If L2 is active, defer the SVI update until vmcs01 is loaded, as SVI + * is only relevant for if and only if Virtual Interrupt Delivery is + * enabled in vmcs12, and if VID is enabled then L2 EOIs affect L2's + * vAPIC, not L1's vAPIC. KVM must update vmcs01 on the next nested + * VM-Exit, otherwise L1 with run with a stale SVI. + */ + if (is_guest_mode(vcpu)) { + /* + * KVM is supposed to forward intercepted L2 EOIs to L1 if VID + * is enabled in vmcs12; as above, the EOIs affect L2's vAPIC. + */ + WARN_ON_ONCE(nested_cpu_has_vid(get_vmcs12(vcpu))); + to_vmx(vcpu)->nested.update_vmcs01_hwapic_isr = true; + return; + } + if (max_isr == -1) max_isr = 0; diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index 43f573f6ca46..892302022094 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -176,6 +176,7 @@ struct nested_vmx { bool reload_vmcs01_apic_access_page; bool update_vmcs01_cpu_dirty_logging; bool update_vmcs01_apicv_status; + bool update_vmcs01_hwapic_isr; /* * Enlightened VMCS has been enabled. It does not mean that L1 has to