From patchwork Tue Aug 8 15:59:00 2017
X-Patchwork-Submitter: Boris Ostrovsky
X-Patchwork-Id: 9888621
From: Boris Ostrovsky <boris.ostrovsky@oracle.com>
To: xen-devel@lists.xen.org
Cc: andrew.cooper3@citrix.com, jbeulich@suse.com
Date: Tue, 8 Aug 2017 11:59:00 -0400
Message-Id: <1502207940-7407-1-git-send-email-boris.ostrovsky@oracle.com>
Subject: [Xen-devel] [PATCH v2] x86/apic/x2apic: Share IRQ vector between cluster members only when no cpumask is specified

We have a limited number (slightly under NR_DYNAMIC_VECTORS=192) of IRQ
vectors available to each processor. Currently, when x2apic cluster mode
is used (the default), each vector is shared among all processors in the
cluster. With many IRQs (as is the case on systems with multiple SR-IOV
cards) and few clusters (e.g. a single socket) there is a good chance
that we will run out of vectors.

This patch reduces vector sharing between processors by assigning a
vector to a single processor when the assignment request (via
__assign_irq_vector()) comes without an explicit specification of which
processors are expected to share the interrupt. This typically happens
at boot time (or possibly on PCI hotplug) when create_irq(NUMA_NO_NODE)
is called. When __assign_irq_vector() is called from set_desc_affinity(),
which provides a sharing mask, vector sharing continues to be performed
as before.

This patch to some extent mirrors Linux commit d872818dbbee
("x86/apic/x2apic: Use multiple cluster members for the irq destination
only with the explicit affinity").

Note that this change still does not guarantee that we never run out of
vectors.
For example, on a single-core system we are effectively back to the
single cluster/socket case of the original code.

Signed-off-by: Boris Ostrovsky
---
Changes in v2:
* Instead of relying on having mask set to TARGET_CPUS as an indication
  that the caller doesn't care about how vectors are shared, allow
  passing a NULL mask to __assign_irq_vector() (and then to
  vector_allocation_cpumask()).

 xen/arch/x86/genapic/delivery.c              | 6 ++++--
 xen/arch/x86/genapic/x2apic.c                | 6 +++++-
 xen/arch/x86/irq.c                           | 9 +++++----
 xen/include/asm-x86/genapic.h                | 9 ++++++---
 xen/include/asm-x86/mach-generic/mach_apic.h | 3 ++-
 5 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/genapic/delivery.c b/xen/arch/x86/genapic/delivery.c
index ced92a1..3cb65d2 100644
--- a/xen/arch/x86/genapic/delivery.c
+++ b/xen/arch/x86/genapic/delivery.c
@@ -30,7 +30,8 @@ void __init clustered_apic_check_flat(void)
     printk("Enabling APIC mode: Flat. Using %d I/O APICs\n", nr_ioapics);
 }
 
-const cpumask_t *vector_allocation_cpumask_flat(int cpu)
+const cpumask_t *vector_allocation_cpumask_flat(int cpu,
+                                                const cpumask_t *cpumask)
 {
     return &cpu_online_map;
 }
@@ -58,7 +59,8 @@ void __init clustered_apic_check_phys(void)
     printk("Enabling APIC mode: Phys. Using %d I/O APICs\n", nr_ioapics);
 }
 
-const cpumask_t *vector_allocation_cpumask_phys(int cpu)
+const cpumask_t *vector_allocation_cpumask_phys(int cpu,
+                                                const cpumask_t *cpumask)
 {
     return cpumask_of(cpu);
 }
diff --git a/xen/arch/x86/genapic/x2apic.c b/xen/arch/x86/genapic/x2apic.c
index 5fffb31..b12d529 100644
--- a/xen/arch/x86/genapic/x2apic.c
+++ b/xen/arch/x86/genapic/x2apic.c
@@ -72,8 +72,12 @@ static void __init clustered_apic_check_x2apic(void)
 {
 }
 
-static const cpumask_t *vector_allocation_cpumask_x2apic_cluster(int cpu)
+static const cpumask_t *vector_allocation_cpumask_x2apic_cluster(int cpu,
+                                                 const cpumask_t *cpumask)
 {
+    if ( !cpumask )
+        return cpumask_of(cpu);
+
     return per_cpu(cluster_cpus, cpu);
 }
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 57e6c18..a0385a3 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -450,11 +450,12 @@ static int __assign_irq_vector(
     static int current_vector = FIRST_DYNAMIC_VECTOR, current_offset = 0;
     int cpu, err, old_vector;
     cpumask_t tmp_mask;
+    const cpumask_t *initial_mask = mask ?: TARGET_CPUS;
     vmask_t *irq_used_vectors = NULL;
 
     old_vector = irq_to_vector(irq);
     if (old_vector > 0) {
-        cpumask_and(&tmp_mask, mask, &cpu_online_map);
+        cpumask_and(&tmp_mask, initial_mask, &cpu_online_map);
         if (cpumask_intersects(&tmp_mask, desc->arch.cpu_mask)) {
             desc->arch.vector = old_vector;
             return 0;
@@ -476,7 +477,7 @@ static int __assign_irq_vector(
     else
         irq_used_vectors = irq_get_used_vector_mask(irq);
 
-    for_each_cpu(cpu, mask) {
+    for_each_cpu(cpu, initial_mask) {
         int new_cpu;
         int vector, offset;
@@ -484,7 +485,7 @@ static int __assign_irq_vector(
         if (!cpu_online(cpu))
             continue;
 
-        cpumask_and(&tmp_mask, vector_allocation_cpumask(cpu),
+        cpumask_and(&tmp_mask, vector_allocation_cpumask(cpu, mask),
                     &cpu_online_map);
 
         vector = current_vector;
@@ -550,7 +551,7 @@ int assign_irq_vector(int irq, const cpumask_t *mask)
     BUG_ON(irq >= nr_irqs || irq < 0);
 
     spin_lock_irqsave(&vector_lock, flags);
-    ret = __assign_irq_vector(irq, desc, mask ?: TARGET_CPUS);
+    ret = __assign_irq_vector(irq, desc, mask);
     if (!ret) {
         ret = desc->arch.vector;
         cpumask_copy(desc->affinity, desc->arch.cpu_mask);
diff --git a/xen/include/asm-x86/genapic.h b/xen/include/asm-x86/genapic.h
index 5496ab0..306544d 100644
--- a/xen/include/asm-x86/genapic.h
+++ b/xen/include/asm-x86/genapic.h
@@ -34,7 +34,8 @@ struct genapic {
     void (*init_apic_ldr)(void);
     void (*clustered_apic_check)(void);
     const cpumask_t *(*target_cpus)(void);
-    const cpumask_t *(*vector_allocation_cpumask)(int cpu);
+    const cpumask_t *(*vector_allocation_cpumask)(int cpu,
+                                                  const cpumask_t *mask);
     unsigned int (*cpu_mask_to_apicid)(const cpumask_t *cpumask);
     void (*send_IPI_mask)(const cpumask_t *mask, int vector);
     void (*send_IPI_self)(uint8_t vector);
@@ -58,7 +59,8 @@ void init_apic_ldr_flat(void);
 void clustered_apic_check_flat(void);
 unsigned int cpu_mask_to_apicid_flat(const cpumask_t *cpumask);
 void send_IPI_mask_flat(const cpumask_t *mask, int vector);
-const cpumask_t *vector_allocation_cpumask_flat(int cpu);
+const cpumask_t *vector_allocation_cpumask_flat(int cpu,
+                                                const cpumask_t *mask);
 #define GENAPIC_FLAT \
     .int_delivery_mode = dest_LowestPrio, \
     .int_dest_mode = 1 /* logical delivery */, \
@@ -74,7 +76,8 @@ void init_apic_ldr_phys(void);
 void clustered_apic_check_phys(void);
 unsigned int cpu_mask_to_apicid_phys(const cpumask_t *cpumask);
 void send_IPI_mask_phys(const cpumask_t *mask, int vector);
-const cpumask_t *vector_allocation_cpumask_phys(int cpu);
+const cpumask_t *vector_allocation_cpumask_phys(int cpu,
+                                                const cpumask_t *mask);
 #define GENAPIC_PHYS \
     .int_delivery_mode = dest_Fixed, \
     .int_dest_mode = 0 /* physical delivery */, \
diff --git a/xen/include/asm-x86/mach-generic/mach_apic.h b/xen/include/asm-x86/mach-generic/mach_apic.h
index 03e9e8a..60a32c5 100644
--- a/xen/include/asm-x86/mach-generic/mach_apic.h
+++ b/xen/include/asm-x86/mach-generic/mach_apic.h
@@ -16,7 +16,8 @@
 #define init_apic_ldr (genapic->init_apic_ldr)
 #define clustered_apic_check (genapic->clustered_apic_check)
 #define cpu_mask_to_apicid (genapic->cpu_mask_to_apicid)
-#define vector_allocation_cpumask(cpu) (genapic->vector_allocation_cpumask(cpu))
+#define vector_allocation_cpumask(cpu, mask) \
+    (genapic->vector_allocation_cpumask(cpu, mask))
 static inline void enable_apic_mode(void)
 {