From patchwork Fri Oct 9 10:46:09 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Woodhouse X-Patchwork-Id: 11825569 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D745A6CA for ; Fri, 9 Oct 2020 10:46:29 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id AD6F6222C8 for ; Fri, 9 Oct 2020 10:46:29 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="cQC3Sh+2" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387815AbgJIKq3 (ORCPT ); Fri, 9 Oct 2020 06:46:29 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47014 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2387787AbgJIKq0 (ORCPT ); Fri, 9 Oct 2020 06:46:26 -0400 Received: from merlin.infradead.org (merlin.infradead.org [IPv6:2001:8b0:10b:1231::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 31665C0613D9; Fri, 9 Oct 2020 03:46:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=merlin.20170209; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description; bh=CsaHshU52iFBF6irbPkHUNEv7y5J4Xj0UjALSbLyJyI=; b=cQC3Sh+2XWLLTkMC7OJJgsl7li a7twtM3yjvC6QruFzDuG8X9D3e7DM7Gh0+f+yZP1s8OyQSarZBHxk7zG6701WCYZD47DIWAgbz1uG mU+5OAWNyLTlJDbQeL+ub5H6IMVa42pFWho2raiy5M1idEC9HTyDzkD34kSz4/klBGQW1p/GzKimz jKykXI4gBhxC/Zxn7ChRnz/d913FTfIMg8s29h/1Ii5ZcM8yrQsttQNsb3OVS4dXWIr9zmHQqXp4p NfrdWPVmvMW1zG9TRqBYgu4cGNHgUJzpTDDwPo3wOR4mKRutuKYHDW8zUXbcoF/+4aJU4xkLcbJx7 ZBOeMUGQ==; Received: from i7.infradead.org ([2001:8b0:10b:1:21e:67ff:fecb:7a92]) by merlin.infradead.org with esmtpsa (Exim 4.92.3 #3 (Red Hat Linux)) id 1kQpuq-00050T-1Y; Fri, 09 Oct 2020 10:46:24 +0000 Received: from dwoodhou by i7.infradead.org with local (Exim 4.93 #3 (Red Hat Linux)) id 1kQpup-005W3u-17; Fri, 09 Oct 2020 11:46:23 +0100 From: David Woodhouse To: x86@kernel.org Cc: kvm , Thomas Gleixner , Paolo Bonzini , linux-kernel , linux-hyperv@vger.kernel.org Subject: [PATCH v2 1/8] x86/apic: Fix x2apic enablement without interrupt remapping Date: Fri, 9 Oct 2020 11:46:09 +0100 Message-Id: <20201009104616.1314746-2-dwmw2@infradead.org> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20201009104616.1314746-1-dwmw2@infradead.org> References: <803bb6b2212e65c568c84ff6882c2aa8a0ee03d5.camel@infradead.org> <20201009104616.1314746-1-dwmw2@infradead.org> MIME-Version: 1.0 Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by merlin.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: David Woodhouse Currently, Linux as a hypervisor guest will enable x2apic only if there are no CPUs present at boot time with an APIC ID above 255. Hotplugging a CPU later with a higher APIC ID would result in a CPU which cannot be targeted by external interrupts. Add a filter in x2apic_apic_id_valid() which can be used to prevent such CPUs from coming online, and allow x2apic to be enabled even if they are present at boot time. Fixes: ce69a784504 ("x86/apic: Enable x2APIC without interrupt remapping under KVM") Signed-off-by: David Woodhouse --- arch/x86/include/asm/apic.h | 1 + arch/x86/kernel/apic/apic.c | 14 ++++++++------ arch/x86/kernel/apic/x2apic_phys.c | 9 +++++++++ 3 files changed, 18 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h index 1c129abb7f09..b0fd204e0023 100644 --- a/arch/x86/include/asm/apic.h +++ b/arch/x86/include/asm/apic.h @@ -259,6 +259,7 @@ static inline u64 native_x2apic_icr_read(void) extern int x2apic_mode; extern int x2apic_phys; +extern void __init x2apic_set_max_apicid(u32 apicid); extern void __init check_x2apic(void); extern void x2apic_setup(void); static inline int x2apic_enabled(void) diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index b3eef1d5c903..113f6ca7b828 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -1841,20 +1841,22 @@ static __init void try_to_enable_x2apic(int remap_mode) return; if (remap_mode != IRQ_REMAP_X2APIC_MODE) { - /* IR is required if there is APIC ID > 255 even when running - * under KVM + /* + * Using X2APIC without IR is not architecturally supported + * on bare metal but may be supported in guests. */ - if (max_physical_apicid > 255 || - !x86_init.hyper.x2apic_available()) { + if (!x86_init.hyper.x2apic_available()) { pr_info("x2apic: IRQ remapping doesn't support X2APIC mode\n"); x2apic_disable(); return; } /* - * without IR all CPUs can be addressed by IOAPIC/MSI - * only in physical mode + * Without IR, all CPUs can be addressed by IOAPIC/MSI only + * in physical mode, and CPUs with an APIC ID that cannnot + * be addressed must not be brought online. */ + x2apic_set_max_apicid(255); x2apic_phys = 1; } x2apic_enable(); diff --git a/arch/x86/kernel/apic/x2apic_phys.c b/arch/x86/kernel/apic/x2apic_phys.c index bc9693841353..e14eae6d6ea7 100644 --- a/arch/x86/kernel/apic/x2apic_phys.c +++ b/arch/x86/kernel/apic/x2apic_phys.c @@ -8,6 +8,12 @@ int x2apic_phys; static struct apic apic_x2apic_phys; +static u32 x2apic_max_apicid __ro_after_init; + +void __init x2apic_set_max_apicid(u32 apicid) +{ + x2apic_max_apicid = apicid; +} static int __init set_x2apic_phys_mode(char *arg) { @@ -98,6 +104,9 @@ static int x2apic_phys_probe(void) /* Common x2apic functions, also used by x2apic_cluster */ int x2apic_apic_id_valid(u32 apicid) { + if (x2apic_max_apicid && apicid > x2apic_max_apicid) + return 0; + return 1; } From patchwork Fri Oct 9 10:46:10 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Woodhouse X-Patchwork-Id: 11825579 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D19F16CA for ; Fri, 9 Oct 2020 10:47:23 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A77CD222B9 for ; Fri, 9 Oct 2020 10:47:23 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="GZjnKez0" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387839AbgJIKrX (ORCPT ); Fri, 9 Oct 2020 06:47:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47024 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2387790AbgJIKq1 (ORCPT ); Fri, 9 Oct 2020 06:46:27 -0400 Received: from merlin.infradead.org (merlin.infradead.org [IPv6:2001:8b0:10b:1231::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 37AFCC0613DA; Fri, 9 Oct 2020 03:46:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=merlin.20170209; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description; bh=nIfuILCsIGon8Erk55Ntq680fWPgiNl5Dn9hBdKaUu8=; b=GZjnKez0xwYHxC+pKfa6q70uxS SAUEa/jjusT371+OcJpvssYY+kBwmE7vdcSXSVX8j5hjkPbZr2shwiqQVN5nLrXji01SQM2OnkJDq pB7SCCmyOl/G5qtDG2CzQIzfm9tB5fQmwQLD21ZP0pKliWlyVFMAydSIO2eqyfI1EZzzdSWhxJ97B NbRidNxyhdKW9mJCHpkRMQzFUcLXOKjO1iHgZKCoICKo37bSGn//hyo85VDzzLja69HGHop2SvjhL SCNFE2E04erSzO1+4RnnG271RVETdP5oRUGOO+XG30ekfnnAcvzqHV9OmRnC7X57UBuzN7gF4KvF8 9S1V/t0g==; Received: from i7.infradead.org ([2001:8b0:10b:1:21e:67ff:fecb:7a92]) by merlin.infradead.org with esmtpsa (Exim 4.92.3 #3 (Red Hat Linux)) id 1kQpuq-00050U-2Y; Fri, 09 Oct 2020 10:46:24 +0000 Received: from dwoodhou by i7.infradead.org with local (Exim 4.93 #3 (Red Hat Linux)) id 1kQpup-005W3z-1t; Fri, 09 Oct 2020 11:46:23 +0100 From: David Woodhouse To: x86@kernel.org Cc: kvm , Thomas Gleixner , Paolo Bonzini , linux-kernel , linux-hyperv@vger.kernel.org Subject: [PATCH v2 2/8] x86/msi: Only use high bits of MSI address for DMAR unit Date: Fri, 9 Oct 2020 11:46:10 +0100 Message-Id: <20201009104616.1314746-3-dwmw2@infradead.org> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20201009104616.1314746-1-dwmw2@infradead.org> References: <803bb6b2212e65c568c84ff6882c2aa8a0ee03d5.camel@infradead.org> <20201009104616.1314746-1-dwmw2@infradead.org> MIME-Version: 1.0 Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by merlin.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: David Woodhouse The Intel IOMMU has an MSI-like configuration for its interrupt, but it isn't really MSI. So it gets to abuse the high 32 bits of the address, and puts the high 24 bits of the extended APIC ID there. This isn't something that can be used in the general case for real MSIs, since external devices using the high bits of the address would be performing writes to actual memory space above 4GiB, not targeted at the APIC. Factor the hack out and allow it only to be used when appropriate, adding a WARN_ON_ONCE() if other MSIs are targeted at an unreachable APIC ID. That should never happen since the compatibility MSI messages are not used when Interrupt Remapping is enabled. The x2apic_enabled() check isn't needed because Linux won't bring up CPUs with higher APIC IDs unless IR and x2apic are enabled anyway. Signed-off-by: David Woodhouse --- arch/x86/kernel/apic/msi.c | 33 +++++++++++++++++++++++++++------ 1 file changed, 27 insertions(+), 6 deletions(-) diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c index 6313f0a05db7..516df47bde73 100644 --- a/arch/x86/kernel/apic/msi.c +++ b/arch/x86/kernel/apic/msi.c @@ -23,13 +23,11 @@ struct irq_domain *x86_pci_msi_default_domain __ro_after_init; -static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg) +static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg, + bool dmar) { msg->address_hi = MSI_ADDR_BASE_HI; - if (x2apic_enabled()) - msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid); - msg->address_lo = MSI_ADDR_BASE_LO | ((apic->irq_dest_mode == 0) ? @@ -43,18 +41,29 @@ static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg) MSI_DATA_LEVEL_ASSERT | MSI_DATA_DELIVERY_FIXED | MSI_DATA_VECTOR(cfg->vector); + + /* + * Only the IOMMU itself can use the trick of putting destination + * APIC ID into the high bits of the address. Anything else would + * just be writing to memory if it tried that, and needs IR to + * address higher APIC IDs. + */ + if (dmar) + msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid); + else + WARN_ON_ONCE(MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid)); } void x86_vector_msi_compose_msg(struct irq_data *data, struct msi_msg *msg) { - __irq_msi_compose_msg(irqd_cfg(data), msg); + __irq_msi_compose_msg(irqd_cfg(data), msg, false); } static void irq_msi_update_msg(struct irq_data *irqd, struct irq_cfg *cfg) { struct msi_msg msg[2] = { [1] = { }, }; - __irq_msi_compose_msg(cfg, msg); + __irq_msi_compose_msg(cfg, msg, false); irq_data_get_irq_chip(irqd)->irq_write_msi_msg(irqd, msg); } @@ -276,6 +285,17 @@ struct irq_domain *arch_create_remap_msi_irq_domain(struct irq_domain *parent, #endif #ifdef CONFIG_DMAR_TABLE +/* + * The Intel IOMMU (ab)uses the high bits of the MSI address to contain the + * high bits of the destination APIC ID. This can't be done in the general + * case for MSIs as it would be targeting real memory above 4GiB not the + * APIC. + */ +static void dmar_msi_compose_msg(struct irq_data *data, struct msi_msg *msg) +{ + __irq_msi_compose_msg(irqd_cfg(data), msg, true); +} + static void dmar_msi_write_msg(struct irq_data *data, struct msi_msg *msg) { dmar_msi_write(data->irq, msg); @@ -288,6 +308,7 @@ static struct irq_chip dmar_msi_controller = { .irq_ack = irq_chip_ack_parent, .irq_set_affinity = msi_domain_set_affinity, .irq_retrigger = irq_chip_retrigger_hierarchy, + .irq_compose_msi_msg = dmar_msi_compose_msg, .irq_write_msi_msg = dmar_msi_write_msg, .flags = IRQCHIP_SKIP_SET_WAKE, }; From patchwork Fri Oct 9 10:46:11 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Woodhouse X-Patchwork-Id: 11825577 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3394E15E6 for ; Fri, 9 Oct 2020 10:47:15 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 07896222EB for ; Fri, 9 Oct 2020 10:47:14 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="iIrdS+ck" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387895AbgJIKrK (ORCPT ); Fri, 9 Oct 2020 06:47:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47020 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2387791AbgJIKq1 (ORCPT ); Fri, 9 Oct 2020 06:46:27 -0400 Received: from merlin.infradead.org (merlin.infradead.org [IPv6:2001:8b0:10b:1231::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3D6A5C0613DB; Fri, 9 Oct 2020 03:46:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=merlin.20170209; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description; bh=kuTqzhVG0LNb29VAmYWMEQAkS2Bnr6LWN3rexef6i8o=; b=iIrdS+ckQwuVrNYOdpqaSiorVa qRlO8pbCglkMo20YCllSvoOE5u4zgtkMjWA4qYSiU9GOJFFXaFHNS8tWwbQUyOchwBHerFPz0X2vq gWK+hn9D5dt94wn7AmtGs9X3SzdClWvYgNAsS++IGPB3Oeroid8KydreIG+VDCgk0sZA4vBbPj00r EtcB++wlrsEylsyMC5hqGZ0PiGo41x5lQK3GFy4OoVn5J86NPZGhRXCp+F2OTjhG7t/F4/Z39GEeT iHgD2ly7jm/pk/ojws6qtbdHpXhoFrhqrILAGOIMO/rTx7KhGGI+ToDhraDVjXoY4XTFRTq/OWAhO DVAD4PBg==; Received: from i7.infradead.org ([2001:8b0:10b:1:21e:67ff:fecb:7a92]) by merlin.infradead.org with esmtpsa (Exim 4.92.3 #3 (Red Hat Linux)) id 1kQpuq-00050V-3M; Fri, 09 Oct 2020 10:46:24 +0000 Received: from dwoodhou by i7.infradead.org with local (Exim 4.93 #3 (Red Hat Linux)) id 1kQpup-005W44-2i; Fri, 09 Oct 2020 11:46:23 +0100 From: David Woodhouse To: x86@kernel.org Cc: kvm , Thomas Gleixner , Paolo Bonzini , linux-kernel , linux-hyperv@vger.kernel.org Subject: [PATCH v2 3/8] x86/apic: Always provide irq_compose_msi_msg() method for vector domain Date: Fri, 9 Oct 2020 11:46:11 +0100 Message-Id: <20201009104616.1314746-4-dwmw2@infradead.org> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20201009104616.1314746-1-dwmw2@infradead.org> References: <803bb6b2212e65c568c84ff6882c2aa8a0ee03d5.camel@infradead.org> <20201009104616.1314746-1-dwmw2@infradead.org> MIME-Version: 1.0 Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by merlin.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: David Woodhouse This shouldn't be dependent on PCI_MSI. HPET and I/OAPIC can deliver interrupts through MSI without having any PCI in the system at all. Signed-off-by: David Woodhouse --- arch/x86/include/asm/apic.h | 8 +++----- arch/x86/kernel/apic/apic.c | 32 +++++++++++++++++++++++++++++++ arch/x86/kernel/apic/msi.c | 36 ----------------------------------- arch/x86/kernel/apic/vector.c | 6 ++++++ 4 files changed, 41 insertions(+), 41 deletions(-) diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h index b0fd204e0023..f5b88fb723bf 100644 --- a/arch/x86/include/asm/apic.h +++ b/arch/x86/include/asm/apic.h @@ -521,12 +521,10 @@ static inline void apic_smt_update(void) { } #endif struct msi_msg; +struct irq_cfg; -#ifdef CONFIG_PCI_MSI -void x86_vector_msi_compose_msg(struct irq_data *data, struct msi_msg *msg); -#else -# define x86_vector_msi_compose_msg NULL -#endif +extern void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg, + bool dmar); extern void ioapic_zap_locks(void); diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index 113f6ca7b828..fba3ba383ad2 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -50,6 +50,7 @@ #include #include #include +#include #include #include #include @@ -2480,6 +2481,37 @@ int hard_smp_processor_id(void) return read_apic_id(); } +void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg, + bool dmar) +{ + msg->address_hi = MSI_ADDR_BASE_HI; + + msg->address_lo = + MSI_ADDR_BASE_LO | + ((apic->irq_dest_mode == 0) ? + MSI_ADDR_DEST_MODE_PHYSICAL : + MSI_ADDR_DEST_MODE_LOGICAL) | + MSI_ADDR_REDIRECTION_CPU | + MSI_ADDR_DEST_ID(cfg->dest_apicid); + + msg->data = + MSI_DATA_TRIGGER_EDGE | + MSI_DATA_LEVEL_ASSERT | + MSI_DATA_DELIVERY_FIXED | + MSI_DATA_VECTOR(cfg->vector); + + /* + * Only the IOMMU itself can use the trick of putting destination + * APIC ID into the high bits of the address. Anything else would + * just be writing to memory if it tried that, and needs IR to + * address higher APIC IDs. + */ + if (dmar) + msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid); + else + WARN_ON_ONCE(MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid)); +} + /* * Override the generic EOI implementation with an optimized version. * Only called during early boot when only one CPU is active and with diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c index 516df47bde73..de585cfa4d6c 100644 --- a/arch/x86/kernel/apic/msi.c +++ b/arch/x86/kernel/apic/msi.c @@ -23,42 +23,6 @@ struct irq_domain *x86_pci_msi_default_domain __ro_after_init; -static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg, - bool dmar) -{ - msg->address_hi = MSI_ADDR_BASE_HI; - - msg->address_lo = - MSI_ADDR_BASE_LO | - ((apic->irq_dest_mode == 0) ? - MSI_ADDR_DEST_MODE_PHYSICAL : - MSI_ADDR_DEST_MODE_LOGICAL) | - MSI_ADDR_REDIRECTION_CPU | - MSI_ADDR_DEST_ID(cfg->dest_apicid); - - msg->data = - MSI_DATA_TRIGGER_EDGE | - MSI_DATA_LEVEL_ASSERT | - MSI_DATA_DELIVERY_FIXED | - MSI_DATA_VECTOR(cfg->vector); - - /* - * Only the IOMMU itself can use the trick of putting destination - * APIC ID into the high bits of the address. Anything else would - * just be writing to memory if it tried that, and needs IR to - * address higher APIC IDs. - */ - if (dmar) - msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid); - else - WARN_ON_ONCE(MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid)); -} - -void x86_vector_msi_compose_msg(struct irq_data *data, struct msi_msg *msg) -{ - __irq_msi_compose_msg(irqd_cfg(data), msg, false); -} - static void irq_msi_update_msg(struct irq_data *irqd, struct irq_cfg *cfg) { struct msi_msg msg[2] = { [1] = { }, }; diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c index 1eac53632786..bb2e2a2488a5 100644 --- a/arch/x86/kernel/apic/vector.c +++ b/arch/x86/kernel/apic/vector.c @@ -818,6 +818,12 @@ void apic_ack_edge(struct irq_data *irqd) apic_ack_irq(irqd); } +static void x86_vector_msi_compose_msg(struct irq_data *data, + struct msi_msg *msg) +{ + __irq_msi_compose_msg(irqd_cfg(data), msg, false); +} + static struct irq_chip lapic_controller = { .name = "APIC", .irq_ack = apic_ack_edge, From patchwork Fri Oct 9 10:46:12 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: David Woodhouse X-Patchwork-Id: 11825571 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A81FC6CA for ; Fri, 9 Oct 2020 10:46:30 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 5EEDF222E9 for ; Fri, 9 Oct 2020 10:46:30 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="j5rAuQlV" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387821AbgJIKq3 (ORCPT ); Fri, 9 Oct 2020 06:46:29 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47008 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2387784AbgJIKq0 (ORCPT ); Fri, 9 Oct 2020 06:46:26 -0400 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1DE3AC0613D5; Fri, 9 Oct 2020 03:46:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Sender:Content-Transfer-Encoding: Content-Type:MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc: To:From:Reply-To:Content-ID:Content-Description; bh=bdghlBusCsH3kpLXSnhxxUf/8cOhgwwRUP0AR3Z0byw=; b=j5rAuQlVREykhOxWr7eEsZTMtm 6ERPmnEYJIbCFbCV1UUp833RTi2eVfqqKx70Xop4DCldVMLeXHl976xp0Ax3oruLd/9DnNuWgi5Cz l134Oku7cHPkE/I3Jk0BlBOPzcI1w5z58WZCzD+b09iiaagBv/0YF9Er+NjpcleBTMae3IPdvaiDY FLAVhVWldvWhFodxIXcUDyAv751yWXts8hIDKBLP8vtE/LMQNT6VjctiiIPRqqs4YxnhROVOLSp2u tsZcujmnT434XEwR8PYBPuLhO+iG2aSjNM1Zw8tz/A8EiBdmv/JZoZjcG3P5TcwBor3mM3A2KT25f Q+6hvCiw==; Received: from i7.infradead.org ([2001:8b0:10b:1:21e:67ff:fecb:7a92]) by casper.infradead.org with esmtpsa (Exim 4.92.3 #3 (Red Hat Linux)) id 1kQpup-0000jr-HD; Fri, 09 Oct 2020 10:46:23 +0000 Received: from dwoodhou by i7.infradead.org with local (Exim 4.93 #3 (Red Hat Linux)) id 1kQpup-005W49-3W; Fri, 09 Oct 2020 11:46:23 +0100 From: David Woodhouse To: x86@kernel.org Cc: kvm , Thomas Gleixner , Paolo Bonzini , linux-kernel , linux-hyperv@vger.kernel.org Subject: [PATCH v2 4/8] x86/ioapic: Handle Extended Destination ID field in RTE Date: Fri, 9 Oct 2020 11:46:12 +0100 Message-Id: <20201009104616.1314746-5-dwmw2@infradead.org> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20201009104616.1314746-1-dwmw2@infradead.org> References: <803bb6b2212e65c568c84ff6882c2aa8a0ee03d5.camel@infradead.org> <20201009104616.1314746-1-dwmw2@infradead.org> MIME-Version: 1.0 Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: David Woodhouse Bits 63-48 of the I/OAPIC Redirection Table Entry map directly to bits 19-4 of the address used in the resulting MSI cycle. Historically, the x86 MSI format only used the top 8 of those 16 bits as the destination APIC ID, and the "Extended Destination ID" in the lower 8 bits was unused. With interrupt remapping, the lowest bit of the Extended Destination ID (bit 48 of RTE, bit 4 of MSI address) is now used to indicate a remappable format MSI. A hypervisor can use the other 7 bits of the Extended Destination ID to permit guests to address up to 15 bits of APIC IDs, thus allowing 32768 vCPUs before having to expose a vIOMMU and interrupt remapping to the guest. This enlightenment could theoretically be transparent to the I/OAPIC code if it were always generating its RTE from an MSI message created by the parent irqchip. That cleanup will happen separately but doesn't cover all cases — for the ExtINT hackery and restoring boot mode, RTEs are still generated locally. So we have to teach the I/OAPIC about the ext_dest bits anyway. No behavioural change in this patch, since nothing yet permits APIC IDs above 255 to be used with the non-IR I/OAPIC domain. Signed-off-by: David Woodhouse --- arch/x86/include/asm/io_apic.h | 3 ++- arch/x86/kernel/apic/io_apic.c | 19 +++++++++++++------ 2 files changed, 15 insertions(+), 7 deletions(-) diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h index a1a26f6d3aa4..5bb3cf4ff2fd 100644 --- a/arch/x86/include/asm/io_apic.h +++ b/arch/x86/include/asm/io_apic.h @@ -78,7 +78,8 @@ struct IO_APIC_route_entry { mask : 1, /* 0: enabled, 1: disabled */ __reserved_2 : 15; - __u32 __reserved_3 : 24, + __u32 __reserved_3 : 17, + virt_ext_dest : 7, dest : 8; } __attribute__ ((packed)); diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c index a33380059db6..54f6a029b1d1 100644 --- a/arch/x86/kernel/apic/io_apic.c +++ b/arch/x86/kernel/apic/io_apic.c @@ -1239,10 +1239,10 @@ static void io_apic_print_entries(unsigned int apic, unsigned int nr_entries) buf, (ir_entry->index2 << 15) | ir_entry->index, ir_entry->zero); else - printk(KERN_DEBUG "%s, %s, D(%02X), M(%1d)\n", + printk(KERN_DEBUG "%s, %s, D(%02X%02X), M(%1d)\n", buf, entry.dest_mode == IOAPIC_DEST_MODE_LOGICAL ? - "logical " : "physical", + "logical " : "physical", entry.virt_ext_dest, entry.dest, entry.delivery_mode); } } @@ -1410,6 +1410,7 @@ void native_restore_boot_irq_mode(void) */ if (ioapic_i8259.pin != -1) { struct IO_APIC_route_entry entry; + u32 apic_id = read_apic_id(); memset(&entry, 0, sizeof(entry)); entry.mask = IOAPIC_UNMASKED; @@ -1417,7 +1418,8 @@ void native_restore_boot_irq_mode(void) entry.polarity = IOAPIC_POL_HIGH; entry.dest_mode = IOAPIC_DEST_MODE_PHYSICAL; entry.delivery_mode = dest_ExtINT; - entry.dest = read_apic_id(); + entry.dest = apic_id & 0xff; + entry.virt_ext_dest = apic_id >> 8; /* * Add it to the IO-APIC irq-routing table: @@ -1861,7 +1863,8 @@ static void ioapic_configure_entry(struct irq_data *irqd) * ioapic chip to verify that. */ if (irqd->chip == &ioapic_chip) { - mpd->entry.dest = cfg->dest_apicid; + mpd->entry.dest = cfg->dest_apicid & 0xff; + mpd->entry.virt_ext_dest = cfg->dest_apicid >> 8; mpd->entry.vector = cfg->vector; } for_each_irq_pin(entry, mpd->irq_2_pin) @@ -2027,6 +2030,7 @@ static inline void __init unlock_ExtINT_logic(void) int apic, pin, i; struct IO_APIC_route_entry entry0, entry1; unsigned char save_control, save_freq_select; + u32 apic_id; pin = find_isa_irq_pin(8, mp_INT); if (pin == -1) { @@ -2042,11 +2046,13 @@ static inline void __init unlock_ExtINT_logic(void) entry0 = ioapic_read_entry(apic, pin); clear_IO_APIC_pin(apic, pin); + apic_id = hard_smp_processor_id(); memset(&entry1, 0, sizeof(entry1)); entry1.dest_mode = IOAPIC_DEST_MODE_PHYSICAL; entry1.mask = IOAPIC_UNMASKED; - entry1.dest = hard_smp_processor_id(); + entry1.dest = apic_id & 0xff; + entry1.virt_ext_dest = apic_id >> 8; entry1.delivery_mode = dest_ExtINT; entry1.polarity = entry0.polarity; entry1.trigger = IOAPIC_EDGE; @@ -2949,7 +2955,8 @@ static void mp_setup_entry(struct irq_cfg *cfg, struct mp_chip_data *data, memset(entry, 0, sizeof(*entry)); entry->delivery_mode = apic->irq_delivery_mode; entry->dest_mode = apic->irq_dest_mode; - entry->dest = cfg->dest_apicid; + entry->dest = cfg->dest_apicid & 0xff; + entry->virt_ext_dest = cfg->dest_apicid >> 8; entry->vector = cfg->vector; entry->trigger = data->trigger; entry->polarity = data->polarity; From patchwork Fri Oct 9 10:46:13 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Woodhouse X-Patchwork-Id: 11825583 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0BA6B6CA for ; Fri, 9 Oct 2020 10:47:35 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D9C4022284 for ; Fri, 9 Oct 2020 10:47:34 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="aEzTBEL6" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387944AbgJIKr3 (ORCPT ); Fri, 9 Oct 2020 06:47:29 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47018 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2387789AbgJIKq1 (ORCPT ); Fri, 9 Oct 2020 06:46:27 -0400 Received: from merlin.infradead.org (merlin.infradead.org [IPv6:2001:8b0:10b:1231::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 43092C0613DC; Fri, 9 Oct 2020 03:46:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=merlin.20170209; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description; bh=a+wMpSAv5Ff83kPG4jc6b3ut+FENx6kjjKE2+uIBrrI=; b=aEzTBEL6vwUnRSBXzybFd0nFex qzGuFC05JDkxWUXUIZmzJHYeSoPZDQWXc5LW5J++oD2WlIWPJJvy6sAg+6YwtGjpO9Nq7S7rAdYeA aaGiT83/bOMaU0xqmfl4Ve4vl1/nhwkvzkhLpydM6hfIXeAQYNbyrIF2D9S9ToCxoyT9bt120jzDZ xVFQdtTq9ha0UMI0/Bpzdp4rx+3T3ekouyYrX2Gjo/5quAHYjuehQiiJsVcDSBt0504ROhYqgEb6h 48aaxtyPs62S933io+3yFsPU3PWFl4P0huWuax9mABfW3hHvCO2xU2oEtUIHbsy7AJVZMG6yBw117 1YIvc22g==; Received: from i7.infradead.org ([2001:8b0:10b:1:21e:67ff:fecb:7a92]) by merlin.infradead.org with esmtpsa (Exim 4.92.3 #3 (Red Hat Linux)) id 1kQpuq-00050W-4r; Fri, 09 Oct 2020 10:46:24 +0000 Received: from dwoodhou by i7.infradead.org with local (Exim 4.93 #3 (Red Hat Linux)) id 1kQpup-005W4E-4N; Fri, 09 Oct 2020 11:46:23 +0100 From: David Woodhouse To: x86@kernel.org Cc: kvm , Thomas Gleixner , Paolo Bonzini , linux-kernel , linux-hyperv@vger.kernel.org Subject: [PATCH v2 5/8] x86/apic: Support 15 bits of APIC ID in MSI where available Date: Fri, 9 Oct 2020 11:46:13 +0100 Message-Id: <20201009104616.1314746-6-dwmw2@infradead.org> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20201009104616.1314746-1-dwmw2@infradead.org> References: <803bb6b2212e65c568c84ff6882c2aa8a0ee03d5.camel@infradead.org> <20201009104616.1314746-1-dwmw2@infradead.org> MIME-Version: 1.0 Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by merlin.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: David Woodhouse Some hypervisors can allow the guest to use the Extended Destination ID field in the MSI address to address up to 32768 CPUs. This applies to all downstream devices which generate MSI cycles, including HPET, I/OAPIC and PCI MSI. HPET and PCI MSI use the same __irq_msi_compose_msg() function, while I/OAPIC generates its own and had support for the extended bits added in a previous commit. Signed-off-by: David Woodhouse --- arch/x86/include/asm/x86_init.h | 2 ++ arch/x86/kernel/apic/apic.c | 26 ++++++++++++++++++++++++-- arch/x86/kernel/x86_init.c | 1 + 3 files changed, 27 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h index 397196fae24d..96a719fb2e53 100644 --- a/arch/x86/include/asm/x86_init.h +++ b/arch/x86/include/asm/x86_init.h @@ -114,6 +114,7 @@ struct x86_init_pci { * @init_platform: platform setup * @guest_late_init: guest late init * @x2apic_available: X2APIC detection + * @msi_ext_dest_id: MSI supports 15-bit APIC IDs * @init_mem_mapping: setup early mappings during init_mem_mapping() * @init_after_bootmem: guest init after boot allocator is finished */ @@ -121,6 +122,7 @@ struct x86_hyper_init { void (*init_platform)(void); void (*guest_late_init)(void); bool (*x2apic_available)(void); + bool (*msi_ext_dest_id)(void); void (*init_mem_mapping)(void); void (*init_after_bootmem)(void); }; diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index fba3ba383ad2..62e769c267c3 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -94,6 +94,11 @@ static unsigned int disabled_cpu_apicid __ro_after_init = BAD_APICID; */ static int apic_extnmi __ro_after_init = APIC_EXTNMI_BSP; +/* + * Hypervisor supports 15 bits of APIC ID in MSI Extended Destination ID + */ +static bool virt_ext_dest_id __ro_after_init; + /* * Map cpu index to physical APIC ID */ @@ -1842,6 +1847,8 @@ static __init void try_to_enable_x2apic(int remap_mode) return; if (remap_mode != IRQ_REMAP_X2APIC_MODE) { + u32 apic_limit = 255; + /* * Using X2APIC without IR is not architecturally supported * on bare metal but may be supported in guests. @@ -1852,12 +1859,22 @@ static __init void try_to_enable_x2apic(int remap_mode) return; } + /* + * If the hypervisor supports extended destination ID in + * MSI, that increases the maximum APIC ID that can be + * used for non-remapped IRQ domains. + */ + if (x86_init.hyper.msi_ext_dest_id()) { + virt_ext_dest_id = 1; + apic_limit = 32767; + } + /* * Without IR, all CPUs can be addressed by IOAPIC/MSI only * in physical mode, and CPUs with an APIC ID that cannnot * be addressed must not be brought online. */ - x2apic_set_max_apicid(255); + x2apic_set_max_apicid(apic_limit); x2apic_phys = 1; } x2apic_enable(); @@ -2504,10 +2521,15 @@ void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg, * Only the IOMMU itself can use the trick of putting destination * APIC ID into the high bits of the address. Anything else would * just be writing to memory if it tried that, and needs IR to - * address higher APIC IDs. + * address APICs which can't be addressed in the normal 32-bit + * address range at 0xFFExxxxx. That is typically just 8 bits, but + * some hypervisors allow the extended destination ID field in bits + * 11-5 to be used, giving support for 15 bits of APIC IDs in total. */ if (dmar) msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid); + else if (virt_ext_dest_id && cfg->dest_apicid < 0x8000) + msg->address_lo |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid) >> 3; else WARN_ON_ONCE(MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid)); } diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c index a3038d8deb6a..8b395821cb8d 100644 --- a/arch/x86/kernel/x86_init.c +++ b/arch/x86/kernel/x86_init.c @@ -110,6 +110,7 @@ struct x86_init_ops x86_init __initdata = { .init_platform = x86_init_noop, .guest_late_init = x86_init_noop, .x2apic_available = bool_x86_init_noop, + .msi_ext_dest_id = bool_x86_init_noop, .init_mem_mapping = x86_init_noop, .init_after_bootmem = x86_init_noop, }, From patchwork Fri Oct 9 10:46:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Woodhouse X-Patchwork-Id: 11825575 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EBFA36CA for ; Fri, 9 Oct 2020 10:46:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C393022277 for ; Fri, 9 Oct 2020 10:46:53 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="rR69r17f" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387836AbgJIKqb (ORCPT ); Fri, 9 Oct 2020 06:46:31 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47012 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2387786AbgJIKq0 (ORCPT ); Fri, 9 Oct 2020 06:46:26 -0400 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2DDC9C0613D8; Fri, 9 Oct 2020 03:46:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description; bh=Y7hX4CMOL2NxcxTbKTC0sEPMs6ptoPSjwtvMLF2dqqU=; b=rR69r17f2NQGqPQJmIblGD4ino vxUBUAOQfzpTfcb4lcmYrDD7Usp+eHQ+pQTaatzu5xuD9w6BL4QEtOHDWTRqq2DXv71N+iw9gtlkq Pdq+MhMlTekNQZd8A7v7SzhmZ/jOpwSk3Ec/XqK7TAiTWHkTq9a4JdKxk/1nql/BuSoGbuP+iPvmM AinqGUYVT9tSSHwLDxlg1yadv2XQ3SJg7cqdg6FbXgEYHwy0QskUaoMoSJ9RtgAC4s3lDkQNRV7in JSqIImIgb+X4bMkJizwH1Jrl+WUkTUkdrxSycIW7MS9TrMhRGX8IxRbib6T3oajUC29i/rEJ7brXF vvNbH/eA==; Received: from i7.infradead.org ([2001:8b0:10b:1:21e:67ff:fecb:7a92]) by casper.infradead.org with esmtpsa (Exim 4.92.3 #3 (Red Hat Linux)) id 1kQpup-0000js-Iy; Fri, 09 Oct 2020 10:46:23 +0000 Received: from dwoodhou by i7.infradead.org with local (Exim 4.93 #3 (Red Hat Linux)) id 1kQpup-005W4J-59; Fri, 09 Oct 2020 11:46:23 +0100 From: David Woodhouse To: x86@kernel.org Cc: kvm , Thomas Gleixner , Paolo Bonzini , linux-kernel , linux-hyperv@vger.kernel.org Subject: [PATCH v2 6/8] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID Date: Fri, 9 Oct 2020 11:46:14 +0100 Message-Id: <20201009104616.1314746-7-dwmw2@infradead.org> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20201009104616.1314746-1-dwmw2@infradead.org> References: <803bb6b2212e65c568c84ff6882c2aa8a0ee03d5.camel@infradead.org> <20201009104616.1314746-1-dwmw2@infradead.org> MIME-Version: 1.0 Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: David Woodhouse This allows the host to indicate that MSI emulation supports 15-bit destination IDs, allowing up to 32768 CPUs without interrupt remapping. cf. https://patchwork.kernel.org/patch/11816693/ for qemu Signed-off-by: David Woodhouse Acked-by: Paolo Bonzini --- Documentation/virt/kvm/cpuid.rst | 4 ++++ arch/x86/include/uapi/asm/kvm_para.h | 1 + arch/x86/kernel/kvm.c | 6 ++++++ 3 files changed, 11 insertions(+) diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst index a7dff9186bed..1726b5925d2b 100644 --- a/Documentation/virt/kvm/cpuid.rst +++ b/Documentation/virt/kvm/cpuid.rst @@ -92,6 +92,10 @@ KVM_FEATURE_ASYNC_PF_INT 14 guest checks this feature bit async pf acknowledgment msr 0x4b564d07. +KVM_FEATURE_MSI_EXT_DEST_ID 15 guest checks this feature bit + before using extended destination + ID bits in MSI address bits 11-5. + KVM_FEATURE_CLOCSOURCE_STABLE_BIT 24 host will warn if no guest-side per-cpu warps are expeced in kvmclock diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h index 812e9b4c1114..950afebfba88 100644 --- a/arch/x86/include/uapi/asm/kvm_para.h +++ b/arch/x86/include/uapi/asm/kvm_para.h @@ -32,6 +32,7 @@ #define KVM_FEATURE_POLL_CONTROL 12 #define KVM_FEATURE_PV_SCHED_YIELD 13 #define KVM_FEATURE_ASYNC_PF_INT 14 +#define KVM_FEATURE_MSI_EXT_DEST_ID 15 #define KVM_HINTS_REALTIME 0 diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index 1b51b727b140..4986b4399aef 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -743,12 +743,18 @@ static void __init kvm_init_platform(void) x86_platform.apic_post_init = kvm_apic_init; } +static bool __init kvm_msi_ext_dest_id(void) +{ + return kvm_para_has_feature(KVM_FEATURE_MSI_EXT_DEST_ID); +} + const __initconst struct hypervisor_x86 x86_hyper_kvm = { .name = "KVM", .detect = kvm_detect, .type = X86_HYPER_KVM, .init.guest_late_init = kvm_guest_init, .init.x2apic_available = kvm_para_available, + .init.msi_ext_dest_id = kvm_msi_ext_dest_id, .init.init_platform = kvm_init_platform, }; From patchwork Fri Oct 9 10:46:16 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Woodhouse X-Patchwork-Id: 11825573 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8CEB16CA for ; Fri, 9 Oct 2020 10:46:32 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4AF7822277 for ; Fri, 9 Oct 2020 10:46:32 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="mqAp2hge" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387827AbgJIKqa (ORCPT ); Fri, 9 Oct 2020 06:46:30 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47016 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2387788AbgJIKq1 (ORCPT ); Fri, 9 Oct 2020 06:46:27 -0400 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 25058C0613D6; Fri, 9 Oct 2020 03:46:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description; bh=k8rruyYtHOtLX+T0n4yk5E6HXM140fcZiSnIws1lqdU=; b=mqAp2hgew7NDqlipuzrwMXHsIF Kp/WOkDEEVdHZ52DN/Lu5fyi60Fae8oCrcvtl7NvX2jd+ipkQU1Y7gw899ZwVSKLIAVDzmcG03mgj etHhilhMVBbGinNorZG5McFqAs6TcehkM/HJsBnakzt6LbiB4Vo73Dk0YcECpxXMkSCBQaaiVZ2jG zIsUp+ofjfkN3B2EhgV33k08bnHgNvJzq8hzpR90ZfR+Wvd9KJp3fm39RmXXzyaBVlUDQVE5egmuv tcrLOf0oTz/Xvu0hrO0NI6kEFMRVoQP8zAOh7rtmdXmz72DUIoUQxJ5bvL2jS2HZBBE3dy1Hll83n 2OmATyrw==; Received: from i7.infradead.org ([2001:8b0:10b:1:21e:67ff:fecb:7a92]) by casper.infradead.org with esmtpsa (Exim 4.92.3 #3 (Red Hat Linux)) id 1kQpup-0000ju-KG; Fri, 09 Oct 2020 10:46:23 +0000 Received: from dwoodhou by i7.infradead.org with local (Exim 4.93 #3 (Red Hat Linux)) id 1kQpup-005W4T-6w; Fri, 09 Oct 2020 11:46:23 +0100 From: David Woodhouse To: x86@kernel.org Cc: kvm , Thomas Gleixner , Paolo Bonzini , linux-kernel , linux-hyperv@vger.kernel.org Subject: [PATCH v2 8/8] x86/ioapic: Generate RTE directly from parent irqchip's MSI message Date: Fri, 9 Oct 2020 11:46:16 +0100 Message-Id: <20201009104616.1314746-9-dwmw2@infradead.org> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20201009104616.1314746-1-dwmw2@infradead.org> References: <803bb6b2212e65c568c84ff6882c2aa8a0ee03d5.camel@infradead.org> <20201009104616.1314746-1-dwmw2@infradead.org> MIME-Version: 1.0 Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: David Woodhouse The I/OAPIC generates an MSI cycle with address/data bits taken from its Redirection Table Entry in some combination which used to make sense, but now is just a bunch of bits which get passed through in some seemingly arbitrary order. Instead of making IRQ remapping drivers directly frob the I/OAPIC RTE, let them just do their job and generate an MSI message. The bit swizzling to turn that MSI message into the IOAPIC's RTE is the same in all cases, since it's a function of the I/OAPIC hardware. The IRQ remappers have no real need to get involved with that. The only slight caveat is that the I/OAPIC is interpreting some of those fields too, and it does want the 'vector' field to be unique to make EOI work. The AMD IOMMU happens to put its IRTE index in the bits that the I/OAPIC thinks are the vector field, and accommodates this requirement by reserving the first 32 indices for the I/OAPIC. The Intel IOMMU doesn't actually use the bits that the I/OAPIC thinks are the vector field, so it fills in the 'pin' value there instead. Signed-off-by: David Woodhouse Signed-off-by: David Woodhouse Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse --- arch/x86/include/asm/hw_irq.h | 11 +++--- arch/x86/include/asm/msidef.h | 2 ++ arch/x86/kernel/apic/io_apic.c | 55 ++++++++++++++++++----------- drivers/iommu/amd/iommu.c | 14 -------- drivers/iommu/hyperv-iommu.c | 31 ---------------- drivers/iommu/intel/irq_remapping.c | 19 +++------- 6 files changed, 46 insertions(+), 86 deletions(-) diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h index a4aeeaace040..aabd8f1b6bb0 100644 --- a/arch/x86/include/asm/hw_irq.h +++ b/arch/x86/include/asm/hw_irq.h @@ -45,12 +45,11 @@ enum irq_alloc_type { }; struct ioapic_alloc_info { - int pin; - int node; - u32 trigger : 1; - u32 polarity : 1; - u32 valid : 1; - struct IO_APIC_route_entry *entry; + int pin; + int node; + u32 trigger : 1; + u32 polarity : 1; + u32 valid : 1; }; struct uv_alloc_info { diff --git a/arch/x86/include/asm/msidef.h b/arch/x86/include/asm/msidef.h index ee2f8ccc32d0..37c3d2d492c9 100644 --- a/arch/x86/include/asm/msidef.h +++ b/arch/x86/include/asm/msidef.h @@ -18,6 +18,7 @@ #define MSI_DATA_DELIVERY_MODE_SHIFT 8 #define MSI_DATA_DELIVERY_FIXED (0 << MSI_DATA_DELIVERY_MODE_SHIFT) #define MSI_DATA_DELIVERY_LOWPRI (1 << MSI_DATA_DELIVERY_MODE_SHIFT) +#define MSI_DATA_DELIVERY_MODE_MASK (3 << MSI_DATA_DELIVERY_MODE_SHIFT) #define MSI_DATA_LEVEL_SHIFT 14 #define MSI_DATA_LEVEL_DEASSERT (0 << MSI_DATA_LEVEL_SHIFT) @@ -37,6 +38,7 @@ #define MSI_ADDR_DEST_MODE_SHIFT 2 #define MSI_ADDR_DEST_MODE_PHYSICAL (0 << MSI_ADDR_DEST_MODE_SHIFT) #define MSI_ADDR_DEST_MODE_LOGICAL (1 << MSI_ADDR_DEST_MODE_SHIFT) +#define MSI_ADDR_DEST_MODE_MASK (1 << MSI_DATA_DELIVERY_MODE_SHIFT) #define MSI_ADDR_REDIRECTION_SHIFT 3 #define MSI_ADDR_REDIRECTION_CPU (0 << MSI_ADDR_REDIRECTION_SHIFT) diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c index 54f6a029b1d1..ca2da19d5c55 100644 --- a/arch/x86/kernel/apic/io_apic.c +++ b/arch/x86/kernel/apic/io_apic.c @@ -48,6 +48,7 @@ #include /* time_after() */ #include #include +#include #include #include @@ -63,6 +64,7 @@ #include #include #include +#include #include @@ -1851,22 +1853,36 @@ static void ioapic_ir_ack_level(struct irq_data *irq_data) eoi_ioapic_pin(data->entry.vector, data); } +static void mp_swizzle_msi_dest_bits(struct irq_data *irq_data, void *_entry) +{ + struct msi_msg msg; + u32 *entry = _entry; + + irq_chip_compose_msi_msg(irq_data, &msg); + + /* + * They're in a bit of a random order for historical reasons, but + * the IO/APIC is just a device for turning interrupt lines into + * MSIs, and various bits of the MSI addr/data are just swizzled + * into/from the bits of Redirection Table Entry. + */ + entry[0] &= 0xfffff000; + entry[0] |= (msg.data & (MSI_DATA_DELIVERY_MODE_MASK | + MSI_DATA_VECTOR_MASK)); + entry[0] |= (msg.address_lo & MSI_ADDR_DEST_MODE_MASK) << 9; + + entry[1] &= 0xffff; + entry[1] |= (msg.address_lo & MSI_ADDR_DEST_ID_MASK) << 12; +} + + static void ioapic_configure_entry(struct irq_data *irqd) { struct mp_chip_data *mpd = irqd->chip_data; - struct irq_cfg *cfg = irqd_cfg(irqd); struct irq_pin_list *entry; - /* - * Only update when the parent is the vector domain, don't touch it - * if the parent is the remapping domain. Check the installed - * ioapic chip to verify that. - */ - if (irqd->chip == &ioapic_chip) { - mpd->entry.dest = cfg->dest_apicid & 0xff; - mpd->entry.virt_ext_dest = cfg->dest_apicid >> 8; - mpd->entry.vector = cfg->vector; - } + mp_swizzle_msi_dest_bits(irqd, &mpd->entry); + for_each_irq_pin(entry, mpd->irq_2_pin) __ioapic_write_entry(entry->apic, entry->pin, mpd->entry); } @@ -2949,15 +2965,14 @@ static void mp_irqdomain_get_attr(u32 gsi, struct mp_chip_data *data, } } -static void mp_setup_entry(struct irq_cfg *cfg, struct mp_chip_data *data, - struct IO_APIC_route_entry *entry) +static void mp_setup_entry(struct irq_data *irq_data, struct mp_chip_data *data) { + struct IO_APIC_route_entry *entry = &data->entry; + memset(entry, 0, sizeof(*entry)); - entry->delivery_mode = apic->irq_delivery_mode; - entry->dest_mode = apic->irq_dest_mode; - entry->dest = cfg->dest_apicid & 0xff; - entry->virt_ext_dest = cfg->dest_apicid >> 8; - entry->vector = cfg->vector; + + mp_swizzle_msi_dest_bits(irq_data, entry); + entry->trigger = data->trigger; entry->polarity = data->polarity; /* @@ -2995,7 +3010,6 @@ int mp_irqdomain_alloc(struct irq_domain *domain, unsigned int virq, if (!data) return -ENOMEM; - info->ioapic.entry = &data->entry; ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, info); if (ret < 0) { kfree(data); @@ -3013,8 +3027,7 @@ int mp_irqdomain_alloc(struct irq_domain *domain, unsigned int virq, add_pin_to_irq_node(data, ioapic_alloc_attr_node(info), ioapic, pin); local_irq_save(flags); - if (info->ioapic.entry) - mp_setup_entry(cfg, data, info->ioapic.entry); + mp_setup_entry(irq_data, data); mp_register_handler(virq, data->trigger); if (virq < nr_legacy_irqs()) legacy_pic->mask(virq); diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c index ef64e01f66d7..13d0a8f42d56 100644 --- a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -3597,7 +3597,6 @@ static void irq_remapping_prepare_irte(struct amd_ir_data *data, { struct irq_2_irte *irte_info = &data->irq_2_irte; struct msi_msg *msg = &data->msi_entry; - struct IO_APIC_route_entry *entry; struct amd_iommu *iommu = amd_iommu_rlookup_table[devid]; if (!iommu) @@ -3611,19 +3610,6 @@ static void irq_remapping_prepare_irte(struct amd_ir_data *data, switch (info->type) { case X86_IRQ_ALLOC_TYPE_IOAPIC: - /* Setup IOAPIC entry */ - entry = info->ioapic.entry; - info->ioapic.entry = NULL; - memset(entry, 0, sizeof(*entry)); - entry->vector = index; - entry->mask = 0; - entry->trigger = info->ioapic.trigger; - entry->polarity = info->ioapic.polarity; - /* Mask level triggered irqs. */ - if (info->ioapic.trigger) - entry->mask = 1; - break; - case X86_IRQ_ALLOC_TYPE_HPET: case X86_IRQ_ALLOC_TYPE_PCI_MSI: case X86_IRQ_ALLOC_TYPE_PCI_MSIX: diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c index e09e2d734c57..37dd485a5640 100644 --- a/drivers/iommu/hyperv-iommu.c +++ b/drivers/iommu/hyperv-iommu.c @@ -40,7 +40,6 @@ static int hyperv_ir_set_affinity(struct irq_data *data, { struct irq_data *parent = data->parent_data; struct irq_cfg *cfg = irqd_cfg(data); - struct IO_APIC_route_entry *entry; int ret; /* Return error If new irq affinity is out of ioapic_max_cpumask. */ @@ -51,9 +50,6 @@ static int hyperv_ir_set_affinity(struct irq_data *data, if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE) return ret; - entry = data->chip_data; - entry->dest = cfg->dest_apicid; - entry->vector = cfg->vector; send_cleanup_vector(cfg); return 0; @@ -89,20 +85,6 @@ static int hyperv_irq_remapping_alloc(struct irq_domain *domain, irq_data->chip = &hyperv_ir_chip; - /* - * If there is interrupt remapping function of IOMMU, setting irq - * affinity only needs to change IRTE of IOMMU. But Hyper-V doesn't - * support interrupt remapping function, setting irq affinity of IO-APIC - * interrupts still needs to change IO-APIC registers. But ioapic_ - * configure_entry() will ignore value of cfg->vector and cfg-> - * dest_apicid when IO-APIC's parent irq domain is not the vector - * domain.(See ioapic_configure_entry()) In order to setting vector - * and dest_apicid to IO-APIC register, IO-APIC entry pointer is saved - * in the chip_data and hyperv_irq_remapping_activate()/hyperv_ir_set_ - * affinity() set vector and dest_apicid directly into IO-APIC entry. - */ - irq_data->chip_data = info->ioapic.entry; - /* * Hypver-V IO APIC irq affinity should be in the scope of * ioapic_max_cpumask because no irq remapping support. @@ -119,22 +101,9 @@ static void hyperv_irq_remapping_free(struct irq_domain *domain, irq_domain_free_irqs_common(domain, virq, nr_irqs); } -static int hyperv_irq_remapping_activate(struct irq_domain *domain, - struct irq_data *irq_data, bool reserve) -{ - struct irq_cfg *cfg = irqd_cfg(irq_data); - struct IO_APIC_route_entry *entry = irq_data->chip_data; - - entry->dest = cfg->dest_apicid; - entry->vector = cfg->vector; - - return 0; -} - static const struct irq_domain_ops hyperv_ir_domain_ops = { .alloc = hyperv_irq_remapping_alloc, .free = hyperv_irq_remapping_free, - .activate = hyperv_irq_remapping_activate, }; static int __init hyperv_prepare_irq_remapping(void) diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c index 0cfce1d3b7bb..511dfb4884bc 100644 --- a/drivers/iommu/intel/irq_remapping.c +++ b/drivers/iommu/intel/irq_remapping.c @@ -1265,7 +1265,6 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data, struct irq_alloc_info *info, int index, int sub_handle) { - struct IR_IO_APIC_route_entry *entry; struct irte *irte = &data->irte_entry; struct msi_msg *msg = &data->msi_entry; @@ -1281,23 +1280,15 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data, irte->avail, irte->vector, irte->dest_id, irte->sid, irte->sq, irte->svt); - entry = (struct IR_IO_APIC_route_entry *)info->ioapic.entry; - info->ioapic.entry = NULL; - memset(entry, 0, sizeof(*entry)); - entry->index2 = (index >> 15) & 0x1; - entry->zero = 0; - entry->format = 1; - entry->index = (index & 0x7fff); /* * IO-APIC RTE will be configured with virtual vector. * irq handler will do the explicit EOI to the io-apic. */ - entry->vector = info->ioapic.pin; - entry->mask = 0; /* enable IRQ */ - entry->trigger = info->ioapic.trigger; - entry->polarity = info->ioapic.polarity; - if (info->ioapic.trigger) - entry->mask = 1; /* Mask level triggered irqs. */ + msg->data = info->ioapic.pin; + msg->address_hi = MSI_ADDR_BASE_HI; + msg->address_lo = MSI_ADDR_BASE_LO | MSI_ADDR_IR_EXT_INT | + MSI_ADDR_IR_INDEX1(index) | + MSI_ADDR_IR_INDEX2(index); break; case X86_IRQ_ALLOC_TYPE_HPET: