KVM: x86/xen: Inject vCPU upcall vector when local APIC is enabled

Message ID e95615834074b72b15138a7176953c014b870e14.camel@infradead.org (mailing list archive)
State New, archived
Series KVM: x86/xen: Inject vCPU upcall vector when local APIC is enabled

Commit Message

David Woodhouse Oct. 25, 2023, 11:11 p.m. UTC
From: David Woodhouse <dwmw@amazon.co.uk>

Linux guests since commit b1c3497e604d ("x86/xen: Add support for
HVMOP_set_evtchn_upcall_vector") in v6.0 use the per-vCPU upcall
vector when it's advertised in the Xen CPUID leaves.

This is injected through the local APIC as an MSI, as opposed to the
older system vector which was merely injected by the hypervisor any
time the CPU was able to receive an interrupt and the upcall_pending
flag is set in its vcpu_info.

Effectively, that makes it edge triggered instead of level triggered.

We lose edges.

Specifically, when the local APIC is *disabled*, delivering the MSI
will fail. Xen checks the vcpu_info->evtchn_upcall_pending flag when
enabling the local APIC for a vCPU and injects the vector immediately
if so.

Since userspace doesn't get to notice when the guest enables a local
APIC which is emulated in KVM, KVM needs to do the same.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---

This fixes the case when the kernel is handling event channels. And
thankfully, to support the SCHEDOP_poll wakeups, we expect userspace to
*promise* that it'll deliver all events through the kernel and never go
setting bits in the port bitmap for itself. Yay!
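
To make the lost edge from the commit message concrete, here is a toy
userspace model of it. This is not KVM code: struct toy_vcpu and every
toy_* function are made up for illustration, and the model ignores
everything about real APIC delivery except "disabled means the MSI
fails". It only shows why re-checking the pending flag at enable time
recovers the event.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one vCPU; all names are illustrative, not KVM's. */
struct toy_vcpu {
	bool apic_enabled;      /* software-enable state of the local APIC */
	bool upcall_pending;    /* stands in for evtchn_upcall_pending */
	uint8_t upcall_vector;  /* per-vCPU upcall vector, 0 if unset */
	unsigned int delivered; /* interrupts that reached the guest */
};

/* Edge-style delivery: the attempt fails while the APIC is disabled. */
static bool toy_inject_vector(struct toy_vcpu *v)
{
	if (!v->apic_enabled || !v->upcall_vector)
		return false;
	v->delivered++;
	return true;
}

/* Raise an event: set the pending flag, then try to deliver once. */
static bool toy_raise_event(struct toy_vcpu *v)
{
	v->upcall_pending = true;
	return toy_inject_vector(v);
}

/*
 * Enable the APIC. With recheck set, look at the pending flag and
 * inject again, which is the behaviour the patch adds. Without it,
 * an event raised while the APIC was disabled is never delivered.
 */
static void toy_enable_apic(struct toy_vcpu *v, bool recheck)
{
	v->apic_enabled = true;
	if (recheck && v->upcall_vector && v->upcall_pending)
		toy_inject_vector(v);
}
```

Raising an event on a vCPU with the APIC disabled and then calling
toy_enable_apic(v, false) leaves delivered at 0; the same sequence with
recheck set to true ends with delivered at 1.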

However... what about older kernels which don't have the
KVM_XEN_HVM_CONFIG_EVTCHN_SEND capability and the VMM does it all?
Well, AFAICT they're screwed. I don't think the VMM gets to *notice*
when the local APIC is enabled/disabled? Right now, QEMU doesn't even
get 'info lapic' right in its monitor, although that's fixable.

We could make QEMU notice when kvm_irqchip_send_msi() reports a
delivery failure, and set a flag which makes it poll the APIC state on
every exit until the APIC is re-enabled? That's kind of awful though.
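
For what it's worth, the shape of that fallback might look roughly like
the toy model below. None of these names exist in QEMU; struct
toy_vmm_vcpu, vmm_send_msi(), vmm_raise_event() and vmm_on_exit() are
hypothetical, and reading apic_enabled directly stands in for whatever
(presumably expensive) APIC state query the VMM would really need. The
polls counter is there to show the cost: every exit pays for a query
while the flag is set.

```c
#include <stdbool.h>

/* Hypothetical VMM-side per-vCPU state; not real QEMU structures. */
struct toy_vmm_vcpu {
	bool apic_enabled;      /* what an APIC state query would return */
	bool upcall_pending;    /* guest-visible pending flag */
	bool poll_apic;         /* set after a failed MSI delivery */
	unsigned int delivered; /* MSIs that reached the guest */
	unsigned int polls;     /* exits that paid for an APIC query */
};

/* Stand-in for an MSI send that reports failure on a disabled APIC. */
static bool vmm_send_msi(struct toy_vmm_vcpu *v)
{
	if (!v->apic_enabled)
		return false;
	v->delivered++;
	return true;
}

/* Raise an event; on delivery failure, arm the per-exit poll. */
static void vmm_raise_event(struct toy_vmm_vcpu *v)
{
	v->upcall_pending = true;
	if (!vmm_send_msi(v))
		v->poll_apic = true;
}

/* Called on every exit to userspace. */
static void vmm_on_exit(struct toy_vmm_vcpu *v)
{
	if (!v->poll_apic)
		return;

	v->polls++;             /* the APIC state query happens here */
	if (!v->apic_enabled)
		return;         /* still disabled, keep polling */

	if (v->upcall_pending)
		vmm_send_msi(v);
	v->poll_apic = false;   /* APIC is back, stop polling */
}
```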

Any better options I'm missing?

Thankfully, it's not a particularly common failure mode. For it to show
up, I needed to:
 • Use Linux v6.0+ in the guest,
 • Specifically ask QEMU to emulate Xen 4.17 or newer, and
 • Hot-online CPUs in the guest.



 arch/x86/kvm/lapic.c |  5 ++++-
 arch/x86/kvm/xen.c   |  2 +-
 arch/x86/kvm/xen.h   | 18 ++++++++++++++++++
 3 files changed, 23 insertions(+), 2 deletions(-)
Patch

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index dcd60b39e794..f3f097d70b69 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -41,6 +41,7 @@ 
 #include "ioapic.h"
 #include "trace.h"
 #include "x86.h"
+#include "xen.h"
 #include "cpuid.h"
 #include "hyperv.h"
 #include "smm.h"
@@ -499,8 +500,10 @@  static inline void apic_set_spiv(struct kvm_lapic *apic, u32 val)
 	}
 
 	/* Check if there are APF page ready requests pending */
-	if (enabled)
+	if (enabled) {
 		kvm_make_request(KVM_REQ_APF_READY, apic->vcpu);
+		kvm_xen_enable_lapic(apic->vcpu);
+	}
 }
 
 static inline void kvm_apic_set_xapic_id(struct kvm_lapic *apic, u8 id)
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 40edf4d1974c..0ea6016ad132 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -471,7 +471,7 @@  void kvm_xen_update_runstate(struct kvm_vcpu *v, int state)
 		kvm_xen_update_runstate_guest(v, state == RUNSTATE_runnable);
 }
 
-static void kvm_xen_inject_vcpu_vector(struct kvm_vcpu *v)
+void kvm_xen_inject_vcpu_vector(struct kvm_vcpu *v)
 {
 	struct kvm_lapic_irq irq = { };
 	int r;
diff --git a/arch/x86/kvm/xen.h b/arch/x86/kvm/xen.h
index f8f1fe22d090..8eba3943b246 100644
--- a/arch/x86/kvm/xen.h
+++ b/arch/x86/kvm/xen.h
@@ -18,6 +18,7 @@  extern struct static_key_false_deferred kvm_xen_enabled;
 
 int __kvm_xen_has_interrupt(struct kvm_vcpu *vcpu);
 void kvm_xen_inject_pending_events(struct kvm_vcpu *vcpu);
+void kvm_xen_inject_vcpu_vector(struct kvm_vcpu *vcpu);
 int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data);
 int kvm_xen_vcpu_get_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data);
 int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data);
@@ -36,6 +37,19 @@  int kvm_xen_setup_evtchn(struct kvm *kvm,
 			 const struct kvm_irq_routing_entry *ue);
 void kvm_xen_update_tsc_info(struct kvm_vcpu *vcpu);
 
+static inline void kvm_xen_enable_lapic(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * The local APIC is being enabled. If the per-vCPU upcall vector is
+	 * set and the vCPU's evtchn_upcall_pending flag is set, inject the
+	 * interrupt.
+	 */
+	if (static_branch_unlikely(&kvm_xen_enabled.key) &&
+	    vcpu->arch.xen.vcpu_info_cache.active &&
+	    vcpu->arch.xen.upcall_vector && __kvm_xen_has_interrupt(vcpu))
+		kvm_xen_inject_vcpu_vector(vcpu);
+}
+
 static inline bool kvm_xen_msr_enabled(struct kvm *kvm)
 {
 	return static_branch_unlikely(&kvm_xen_enabled.key) &&
@@ -101,6 +115,10 @@  static inline void kvm_xen_destroy_vcpu(struct kvm_vcpu *vcpu)
 {
 }
 
+static inline void kvm_xen_enable_lapic(struct kvm_vcpu *vcpu)
+{
+}
+
 static inline bool kvm_xen_msr_enabled(struct kvm *kvm)
 {
 	return false;