[net-next] net: set default rss queues num to physical cores / 2

Message ID 20220315091832.13873-1-ihuguet@redhat.com (mailing list archive)
State Accepted
Commit 046e1537a3cf0adc68fe865b5dc9a7e731cc63b3
Delegated to: Netdev Maintainers

Checks

Context                         Check    Description
netdev/tree_selection           success  Clearly marked for net-next
netdev/fixes_present            success  Fixes tag not required for -next series
netdev/subject_prefix           success  Link
netdev/cover_letter             success  Single patches do not need cover letters
netdev/patch_count              success  Link
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 4769 this patch: 4769
netdev/cc_maintainers           success  CCed 4 of 4 maintainers
netdev/build_clang              success  Errors and warnings before: 824 this patch: 824
netdev/module_param             success  Was 0 now: 0
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/verify_fixes             success  No Fixes tag
netdev/build_allmodconfig_warn  success  Errors and warnings before: 4924 this patch: 4924
netdev/checkpatch               warning  WARNING: line length of 82 exceeds 80 columns
netdev/kdoc                     success  Errors and warnings before: 1 this patch: 1
netdev/source_inline            success  Was 0 now: 0

Commit Message

Íñigo Huguet March 15, 2022, 9:18 a.m. UTC
Network drivers can call netif_get_num_default_rss_queues() to get the
default number of receive queues to use. Right now, this default number
is min(8, num_online_cpus()).

Instead, as suggested by Jakub, use the number of physical cores divided
by 2. This avoids wasting CPU resources and avoids using both hardware
threads of a core, while still scaling on high-end processors with many
cores.

As an exception, select 2 queues for processors with 2 cores, because
otherwise they would not take any advantage of RSS despite being SMP
capable.

Tested: Processor Intel Xeon E5-2620 (2 sockets, 6 cores/socket, 2
threads/core). NIC Broadcom NetXtreme II BCM57810 (10 Gbps). Ran some
tests with `perf stat iperf3 -R`, with parallelism of 1, 8 and 24,
with the following results:
- Number of queues: 6 (instead of 8)
- Network throughput: not affected
- CPU usage: 0.05-0.12 CPUs more than before (with 24 CPUs, this is
  only 0.2-0.5% higher)
- Context switches: reduced by 7-50%, more noticeably with a higher
  number of parallel threads

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
---
 include/linux/netdevice.h |  1 -
 net/core/dev.c            | 20 ++++++++++++++++----
 2 files changed, 16 insertions(+), 5 deletions(-)
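
As a rough illustration of the arithmetic described in the commit message,
the old and new defaults can be compared in a small user-space sketch. The
function names `old_default_rss_queues` and `new_default_rss_queues` are
hypothetical and are not part of the patch; the kdump special case (which
always yields 1 queue) is left out for brevity.

```c
#include <assert.h>

/*
 * Old default, as stated in the commit message: min(8, online CPUs).
 * Every hardware thread counts as a CPU here.
 * (Hypothetical helper name, for illustration only.)
 */
static int old_default_rss_queues(int online_cpus)
{
	assert(online_cpus > 0);
	return online_cpus < 8 ? online_cpus : 8;
}

/*
 * New default: the number of physical cores if there are only 1 or 2,
 * otherwise half of them, rounded up.
 * (Hypothetical helper name, for illustration only.)
 */
static int new_default_rss_queues(int physical_cores)
{
	assert(physical_cores > 0);
	return physical_cores > 2 ? (physical_cores + 1) / 2 : physical_cores;
}
```

For the test machine above (24 online CPUs, 12 physical cores), the old rule
gives 8 queues and the new rule gives 6, which matches the reported result.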

Comments

Jakub Kicinski March 17, 2022, 3:10 a.m. UTC | #1
On Tue, 15 Mar 2022 10:18:32 +0100 Íñigo Huguet wrote:
> Network drivers can call netif_get_num_default_rss_queues() to get the
> default number of receive queues to use. Right now, this default number
> is min(8, num_online_cpus()).
>
> Instead, as suggested by Jakub, use the number of physical cores divided
> by 2. This avoids wasting CPU resources and avoids using both hardware
> threads of a core, while still scaling on high-end processors with many
> cores.
>
> As an exception, select 2 queues for processors with 2 cores, because
> otherwise they would not take any advantage of RSS despite being SMP
> capable.
>
> Tested: Processor Intel Xeon E5-2620 (2 sockets, 6 cores/socket, 2
> threads/core). NIC Broadcom NetXtreme II BCM57810 (10 Gbps). Ran some
> tests with `perf stat iperf3 -R`, with parallelism of 1, 8 and 24,
> with the following results:
> - Number of queues: 6 (instead of 8)
> - Network throughput: not affected
> - CPU usage: 0.05-0.12 CPUs more than before (with 24 CPUs, this is
>   only 0.2-0.5% higher)
> - Context switches: reduced by 7-50%, more noticeably with a higher
>   number of parallel threads

Thanks for following up, Inigo!

Heads up for the maintainers of drivers which use
netif_get_num_default_rss_queues() today - please note the above -
the default number of Rx queues may change for you starting with
the 5.18 kernel.

patchwork-bot+netdevbpf@kernel.org March 18, 2022, 9:10 p.m. UTC | #2
Hello:

This patch was applied to netdev/net-next.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 15 Mar 2022 10:18:32 +0100 you wrote:
> Network drivers can call netif_get_num_default_rss_queues() to get the
> default number of receive queues to use. Right now, this default number
> is min(8, num_online_cpus()).
>
> Instead, as suggested by Jakub, use the number of physical cores divided
> by 2. This avoids wasting CPU resources and avoids using both hardware
> threads of a core, while still scaling on high-end processors with many
> cores.
> 
> [...]

Here is the summary with links:
  - [net-next] net: set default rss queues num to physical cores / 2
    https://git.kernel.org/netdev/net-next/c/046e1537a3cf

You are awesome, thank you!

Patch

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0d994710b335..db9874ed79d9 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3664,7 +3664,6 @@  static inline unsigned int get_netdev_rx_queue_index(
 }
 #endif
 
-#define DEFAULT_MAX_NUM_RSS_QUEUES	(8)
 int netif_get_num_default_rss_queues(void);
 
 enum skb_free_reason {
diff --git a/net/core/dev.c b/net/core/dev.c
index 75bab5b0dbae..8e0cc5f2020d 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2990,13 +2990,25 @@  EXPORT_SYMBOL(netif_set_real_num_queues);
 /**
  * netif_get_num_default_rss_queues - default number of RSS queues
  *
- * This routine should set an upper limit on the number of RSS queues
- * used by default by multiqueue devices.
+ * Default value is the number of physical cores if there are only 1 or 2, or
+ * divided by 2 if there are more.
  */
 int netif_get_num_default_rss_queues(void)
 {
-	return is_kdump_kernel() ?
-		1 : min_t(int, DEFAULT_MAX_NUM_RSS_QUEUES, num_online_cpus());
+	cpumask_var_t cpus;
+	int cpu, count = 0;
+
+	if (unlikely(is_kdump_kernel() || !zalloc_cpumask_var(&cpus, GFP_KERNEL)))
+		return 1;
+
+	cpumask_copy(cpus, cpu_online_mask);
+	for_each_cpu(cpu, cpus) {
+		++count;
+		cpumask_andnot(cpus, cpus, topology_sibling_cpumask(cpu));
+	}
+	free_cpumask_var(cpus);
+
+	return count > 2 ? DIV_ROUND_UP(count, 2) : count;
 }
 EXPORT_SYMBOL(netif_get_num_default_rss_queues);
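
The core-counting loop in the hunk above can be approximated in user space
with plain bitmasks. This is only a sketch under stated assumptions: the
function name `count_physical_cores`, the 64-CPU limit and the `siblings`
array (standing in for topology_sibling_cpumask()) are illustrative and not
kernel API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * online:   bit i is set if CPU i is online.
 * siblings: siblings[i] is the mask of hardware threads that share a
 *           physical core with CPU i, including CPU i itself.
 *
 * Walk the online CPUs in order. Each CPU still present in the working
 * mask represents a core not seen yet, so count it and clear all of its
 * siblings from the mask so they are skipped later.
 */
static int count_physical_cores(uint64_t online, const uint64_t *siblings,
				int ncpus)
{
	uint64_t remaining = online;
	int count = 0;

	assert(ncpus > 0 && ncpus <= 64);
	for (int cpu = 0; cpu < ncpus; cpu++) {
		if (!(remaining & (UINT64_C(1) << cpu)))
			continue;
		++count;
		remaining &= ~siblings[cpu];
	}
	return count;
}
```

With four online CPUs arranged as two cores of two threads each (sibling
masks 0x3, 0x3, 0xC, 0xC), the function returns 2, and the patch's final
expression then leaves that value unchanged because it is not greater than 2.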