From patchwork Wed Oct 25 23:11:02 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: David Woodhouse
X-Patchwork-Id: 13436916
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 68799C25B6B
	for ; Wed, 25 Oct 2023 23:11:11 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S229877AbjJYXLL (ORCPT ); Wed, 25 Oct 2023 19:11:11 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60890 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S229709AbjJYXLJ (ORCPT );
	Wed, 25 Oct 2023 19:11:09 -0400
Received: from casper.infradead.org (casper.infradead.org
	[IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix)
	with ESMTPS id 4476EA3 for ; Wed, 25 Oct 2023 16:11:06 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
	d=infradead.org; s=casper.20170209; h=MIME-Version:Content-Type:Date:Cc:To:
	From:Subject:Message-ID:Sender:Reply-To:Content-Transfer-Encoding:Content-ID:
	Content-Description:In-Reply-To:References;
	bh=0unIc+HRHCa9G5yOPsrIgYuKh+PMvqm3Sr3oat2XQyo=; b=MmzyD8RDRZqBdm+rJSPyMt5WXC
	T0Bus+jprvpp7XMcXqOZtkvb2Vbr6TYkHRAqmWf2cAB/ba6j3bsXqTCU38F70btLfGsLAOVVsGtZo
	EVwYadG2xErtzHYlW/IpAlJkHIVXmtZIJn+pTk4se57kxjjoNTCpnA2D1VfZc0F5CjQS9HJeaGuHj
	66RHGGrUWobdMhyP5LV2dFIupcbwK+XB854yg5xY2I4UFCXRqngXJAtDLo9BAkotJrfwraHdICu5m
	+bp11ilqnRvK86sdsz7t6WzT/616G7hVQ9t9rdjjYwN3WD2Dk16xsEXCDiIC+8JDe/nvLonKndNpl
	qvR8EmNA==;
Received: from [2001:8b0:10b:5:5bd0:63a6:23b2:1881]
	(helo=u3832b3a9db3152.ant.amazon.com) by casper.infradead.org with
	esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1qvn1i-00BXjr-Qf;
	Wed, 25 Oct 2023 23:11:02 +0000
Message-ID:
Subject: [PATCH] KVM: x86/xen: Inject vCPU upcall vector when local APIC is
 enabled
From: David Woodhouse
To: kvm@vger.kernel.org, qemu-devel
Cc: Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86@kernel.org, "H. Peter Anvin",
	Paul Durrant
Date: Thu, 26 Oct 2023 00:11:02 +0100
User-Agent: Evolution 3.44.4-0ubuntu2
MIME-Version: 1.0
X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org.
	See http://www.infradead.org/rpr.html
Precedence: bulk
List-ID:
X-Mailing-List: kvm@vger.kernel.org

From: David Woodhouse

Linux guests since commit b1c3497e604d ("x86/xen: Add support for
HVMOP_set_evtchn_upcall_vector") in v6.0 use the per-vCPU upcall vector
when it's advertised in the Xen CPUID leaves.

This is injected through the local APIC as an MSI, as opposed to the
older system vector, which was merely injected by the hypervisor any time
the CPU was able to receive an interrupt and the upcall_pending flag was
set in its vcpu_info.

Effectively, that makes it edge triggered instead of level triggered.
We lose edges.

Specifically, when the local APIC is *disabled*, delivering the MSI will
fail. Xen checks the vcpu_info->evtchn_upcall_pending flag when enabling
the local APIC for a vCPU and injects the vector immediately if it is set.

Since userspace doesn't get to notice when the guest enables a local APIC
which is emulated in KVM, KVM needs to do the same.

Signed-off-by: David Woodhouse
---
This fixes the case when the kernel is handling event channels. And
thankfully, to support the SCHEDOP_poll wakeups, we expect userspace to
*promise* that it'll deliver all events through the kernel and never go
setting bits in the port bitmap for itself. Yay!

However... what about older kernels which don't have the
KVM_XEN_HVM_CONFIG_EVTCHN_SEND capability and the VMM does it all?
Well, AFAICT they're screwed. I don't think the VMM gets to *notice* when
the local APIC is enabled/disabled? Right now, QEMU doesn't even get
'info lapic' right in its monitor, although that's fixable.
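
For anyone who wants to poke at the lost-edge problem outside a guest, here
is a minimal user-space sketch of it. It is a toy model, not KVM code: every
toy_* name is invented for illustration, the vector value is arbitrary, and
the 'recheck' flag only stands in for the check on APIC enable that the
commit message describes.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for one vCPU. None of these names exist in KVM. */
struct toy_vcpu {
	bool apic_enabled;    /* software-enable state of the local APIC */
	bool upcall_pending;  /* models vcpu_info->evtchn_upcall_pending */
	unsigned char vector; /* per-vCPU upcall vector, 0 means "not set" */
	int delivered;        /* upcall interrupts the guest actually saw */
};

/* Edge-style delivery: it only works if the APIC is enabled right now. */
static bool toy_inject(struct toy_vcpu *v)
{
	if (!v->apic_enabled)
		return false; /* the MSI is dropped and nobody retries */
	v->delivered++;
	return true;
}

/* An event arrives: set the pending flag, then try to inject exactly once. */
static void toy_raise_event(struct toy_vcpu *v)
{
	v->upcall_pending = true;
	toy_inject(v);
}

/* The guest enables its APIC. 'recheck' models the check done on enable. */
static void toy_enable_apic(struct toy_vcpu *v, bool recheck)
{
	v->apic_enabled = true;
	if (recheck && v->vector && v->upcall_pending)
		toy_inject(v);
}

/*
 * An event arrives while the APIC is still disabled, then the guest enables
 * the APIC. Returns how many upcalls the guest ended up receiving.
 */
int toy_hotplug_scenario(bool recheck)
{
	struct toy_vcpu v = { .vector = 0xf3 }; /* arbitrary example vector */

	toy_raise_event(&v);          /* APIC disabled: the edge is lost */
	toy_enable_apic(&v, recheck);
	assert(v.upcall_pending);     /* the flag stays set in both cases */
	return v.delivered;
}
```

Calling toy_hotplug_scenario(false) models the behaviour before this patch:
the pending flag is set but no interrupt ever arrives. With
toy_hotplug_scenario(true) the flag is looked at again when the APIC is
enabled, so the guest gets its upcall.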

We could make QEMU notice when kvm_irqchip_send_msi() reports a delivery
failure, and set a flag which makes it poll the APIC state on every exit
until the APIC is re-enabled? That's kind of awful though. Any better
options I'm missing?

Thankfully, it's not a particularly common failure mode. For it to show
up, I needed to:
 • Use Linux v6.0+ in the guest,
 • Specifically ask QEMU to emulate Xen 4.17 or newer, and
 • Hot-online CPUs in the guest.

 arch/x86/kvm/lapic.c |  5 ++++-
 arch/x86/kvm/xen.c   |  2 +-
 arch/x86/kvm/xen.h   | 18 ++++++++++++++++++
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index dcd60b39e794..f3f097d70b69 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -41,6 +41,7 @@
 #include "ioapic.h"
 #include "trace.h"
 #include "x86.h"
+#include "xen.h"
 #include "cpuid.h"
 #include "hyperv.h"
 #include "smm.h"
@@ -499,8 +500,10 @@ static inline void apic_set_spiv(struct kvm_lapic *apic, u32 val)
 	}
 
 	/* Check if there are APF page ready requests pending */
-	if (enabled)
+	if (enabled) {
 		kvm_make_request(KVM_REQ_APF_READY, apic->vcpu);
+		kvm_xen_enable_lapic(apic->vcpu);
+	}
 }
 
 static inline void kvm_apic_set_xapic_id(struct kvm_lapic *apic, u8 id)
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 40edf4d1974c..0ea6016ad132 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -471,7 +471,7 @@ void kvm_xen_update_runstate(struct kvm_vcpu *v, int state)
 	kvm_xen_update_runstate_guest(v, state == RUNSTATE_runnable);
 }
 
-static void kvm_xen_inject_vcpu_vector(struct kvm_vcpu *v)
+void kvm_xen_inject_vcpu_vector(struct kvm_vcpu *v)
 {
 	struct kvm_lapic_irq irq = { };
 	int r;
diff --git a/arch/x86/kvm/xen.h b/arch/x86/kvm/xen.h
index f8f1fe22d090..8eba3943b246 100644
--- a/arch/x86/kvm/xen.h
+++ b/arch/x86/kvm/xen.h
@@ -18,6 +18,7 @@ extern struct static_key_false_deferred kvm_xen_enabled;
 
 int __kvm_xen_has_interrupt(struct kvm_vcpu *vcpu);
 void kvm_xen_inject_pending_events(struct kvm_vcpu *vcpu);
+void kvm_xen_inject_vcpu_vector(struct kvm_vcpu *vcpu);
 int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data);
 int kvm_xen_vcpu_get_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data);
 int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data);
@@ -36,6 +37,19 @@ int kvm_xen_setup_evtchn(struct kvm *kvm,
 			 const struct kvm_irq_routing_entry *ue);
 void kvm_xen_update_tsc_info(struct kvm_vcpu *vcpu);
 
+static inline void kvm_xen_enable_lapic(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * The local APIC is being enabled. If the per-vCPU upcall vector is
+	 * set and the vCPU's evtchn_upcall_pending flag is set, inject the
+	 * interrupt.
+	 */
+	if (static_branch_unlikely(&kvm_xen_enabled.key) &&
+	    vcpu->arch.xen.vcpu_info_cache.active &&
+	    vcpu->arch.xen.upcall_vector && __kvm_xen_has_interrupt(vcpu))
+		kvm_xen_inject_vcpu_vector(vcpu);
+}
+
 static inline bool kvm_xen_msr_enabled(struct kvm *kvm)
 {
 	return static_branch_unlikely(&kvm_xen_enabled.key) &&
@@ -101,6 +115,10 @@ static inline void kvm_xen_destroy_vcpu(struct kvm_vcpu *vcpu)
 {
 }
 
+static inline void kvm_xen_enable_lapic(struct kvm_vcpu *vcpu)
+{
+}
+
 static inline bool kvm_xen_msr_enabled(struct kvm *kvm)
 {
 	return false;
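
For VMMs on kernels without KVM_XEN_HVM_CONFIG_EVTCHN_SEND, the polling
fallback floated in the notes (arm a flag on MSI delivery failure, then
check the APIC on every exit) might look roughly like the sketch below. This
is only a user-space toy with invented toy_* names. It is not QEMU code, and
in a real VMM the 'apic_enabled' field would have to be refreshed from the
kernel on each exit (presumably with something like the KVM_GET_LAPIC
ioctl), which is the expensive part.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical VMM-side bookkeeping for one vCPU. Not QEMU's real state. */
struct toy_vmm_vcpu {
	bool apic_enabled;   /* result of an assumed APIC state query */
	bool upcall_pending; /* mirrors vcpu_info->evtchn_upcall_pending */
	bool retry_armed;    /* set after an MSI delivery failure */
	int delivered;       /* upcalls that actually reached the guest */
};

/* Stand-in for sending the upcall MSI. Fails while the APIC is disabled. */
static bool toy_send_msi(struct toy_vmm_vcpu *v)
{
	if (!v->apic_enabled)
		return false;
	v->delivered++;
	return true;
}

/* Event delivery path: on failure, arm the polling flag. */
static void toy_deliver_event(struct toy_vmm_vcpu *v)
{
	v->upcall_pending = true;
	if (!toy_send_msi(v))
		v->retry_armed = true;
}

/* Runs on every vCPU exit. Does nothing unless a failure was recorded. */
static void toy_on_vcpu_exit(struct toy_vmm_vcpu *v)
{
	if (!v->retry_armed || !v->apic_enabled)
		return;
	v->retry_armed = false;
	if (v->upcall_pending)
		toy_send_msi(v);
}

/* Returns the number of upcalls delivered once the guest enables its APIC. */
int toy_poll_scenario(void)
{
	struct toy_vmm_vcpu v = { 0 };

	toy_deliver_event(&v);     /* APIC off: delivery fails, retry armed */
	toy_on_vcpu_exit(&v);      /* APIC still off: nothing happens */
	assert(v.delivered == 0);

	v.apic_enabled = true;     /* guest enables its local APIC */
	toy_on_vcpu_exit(&v);      /* next exit notices and re-sends */
	return v.delivered;
}
```

The retry only costs anything while retry_armed is set, but it still relies
on an exit happening soon after the guest enables the APIC. That is part of
why the in-kernel check in this patch is the nicer fix where it applies.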