| Message ID | 20220315091832.13873-1-ihuguet@redhat.com |
|---|---|
| State | Accepted |
| Commit | 046e1537a3cf0adc68fe865b5dc9a7e731cc63b3 |
| Delegated to | Netdev Maintainers |
| Series | [net-next] net: set default rss queues num to physical cores / 2 |
On Tue, 15 Mar 2022 10:18:32 +0100 Íñigo Huguet wrote:
> Network drivers can call netif_get_num_default_rss_queues() to get the
> default number of receive queues to use. Right now, this default number
> is min(8, num_online_cpus()).
>
> Instead, as suggested by Jakub, use the number of physical cores
> divided by 2. This avoids wasting CPU resources, avoids using both
> threads of the same core, and still scales for high-end processors
> with many cores.
>
> As an exception, select 2 queues for processors with 2 cores, because
> otherwise they would take no advantage of RSS despite being SMP
> capable.
>
> Tested: Processor Intel Xeon E5-2620 (2 sockets, 6 cores/socket, 2
> threads/core). NIC Broadcom NetXtreme II BCM57810 (10 Gbps). Ran some
> tests with `perf stat iperf3 -R`, with parallelisms of 1, 8 and 24,
> getting the following results:
> - Number of queues: 6 (instead of 8)
> - Network throughput: not affected
> - CPU usage: 0.05-0.12 more CPUs utilized than before (with 24 CPUs,
>   this is only 0.2-0.5% higher)
> - Context switches reduced by 7-50%, more noticeably when using a
>   higher number of parallel threads

Thanks for following up, Inigo!

Heads up for the maintainers of drivers which use
netif_get_num_default_rss_queues() today - please note the above - the
default number of Rx queues may change for you starting with the 5.18
kernel.
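To estimate what the new default will be on a particular machine without booting a patched kernel, the heuristic can be approximated from userspace. The following is a minimal sketch, not the kernel implementation: it counts unique (physical_package_id, core_id) pairs exposed under /sys/devices/system/cpu and then halves counts above 2 the same way the patch does. The 4096-CPU scan cap and the linear dedup are illustrative simplifications.

```c
/*
 * Userspace sketch of the new default-RSS-queues heuristic.
 * Assumes the standard sysfs CPU topology layout; MAX_CPUS is an
 * arbitrary scan limit for illustration.
 */
#include <stdio.h>

#define MAX_CPUS 4096

/* Read one integer from a sysfs topology file; -1 if CPU is absent/offline. */
static int read_topology_id(int cpu, const char *name)
{
	char path[128];
	FILE *f;
	int val = -1;

	snprintf(path, sizeof(path),
		 "/sys/devices/system/cpu/cpu%d/topology/%s", cpu, name);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%d", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}

int main(void)
{
	int pkg[MAX_CPUS], core[MAX_CPUS];
	int seen = 0, cpu, i;

	for (cpu = 0; cpu < MAX_CPUS; cpu++) {
		int p = read_topology_id(cpu, "physical_package_id");
		int c = read_topology_id(cpu, "core_id");

		if (p < 0 || c < 0)
			continue;	/* offline or nonexistent CPU */

		/* Count each (package, core) pair once: SMT siblings share it. */
		for (i = 0; i < seen; i++)
			if (pkg[i] == p && core[i] == c)
				break;
		if (i == seen) {
			pkg[seen] = p;
			core[seen] = c;
			seen++;
		}
	}

	/* Same rounding as DIV_ROUND_UP(count, 2); 1-2 cores kept as-is. */
	printf("physical cores: %d, default RSS queues: %d\n",
	       seen, seen > 2 ? (seen + 1) / 2 : seen);
	return 0;
}
```

On the Xeon E5-2620 setup described above (2 sockets x 6 cores), this should report 12 physical cores and 6 queues, matching the "6 instead of 8" result.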
Hello:

This patch was applied to netdev/net-next.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 15 Mar 2022 10:18:32 +0100 you wrote:
> Network drivers can call netif_get_num_default_rss_queues() to get the
> default number of receive queues to use. Right now, this default number
> is min(8, num_online_cpus()).
>
> Instead, as suggested by Jakub, use the number of physical cores
> divided by 2. This avoids wasting CPU resources, avoids using both
> threads of the same core, and still scales for high-end processors
> with many cores.
>
> [...]

Here is the summary with links:
  - [net-next] net: set default rss queues num to physical cores / 2
    https://git.kernel.org/netdev/net-next/c/046e1537a3cf

You are awesome, thank you!
```diff
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0d994710b335..db9874ed79d9 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3664,7 +3664,6 @@ static inline unsigned int get_netdev_rx_queue_index(
 }
 #endif
 
-#define DEFAULT_MAX_NUM_RSS_QUEUES	(8)
 int netif_get_num_default_rss_queues(void);
 
 enum skb_free_reason {
diff --git a/net/core/dev.c b/net/core/dev.c
index 75bab5b0dbae..8e0cc5f2020d 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2990,13 +2990,25 @@ EXPORT_SYMBOL(netif_set_real_num_queues);
 
 /**
  * netif_get_num_default_rss_queues - default number of RSS queues
  *
- * This routine should set an upper limit on the number of RSS queues
- * used by default by multiqueue devices.
+ * Default value is the number of physical cores if there are only 1 or 2, or
+ * divided by 2 if there are more.
  */
 int netif_get_num_default_rss_queues(void)
 {
-	return is_kdump_kernel() ?
-		1 : min_t(int, DEFAULT_MAX_NUM_RSS_QUEUES, num_online_cpus());
+	cpumask_var_t cpus;
+	int cpu, count = 0;
+
+	if (unlikely(is_kdump_kernel() || !zalloc_cpumask_var(&cpus, GFP_KERNEL)))
+		return 1;
+
+	cpumask_copy(cpus, cpu_online_mask);
+	for_each_cpu(cpu, cpus) {
+		++count;
+		cpumask_andnot(cpus, cpus, topology_sibling_cpumask(cpu));
+	}
+	free_cpumask_var(cpus);
+
+	return count > 2 ? DIV_ROUND_UP(count, 2) : count;
 }
 EXPORT_SYMBOL(netif_get_num_default_rss_queues);
```
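For driver maintainers acting on Jakub's heads-up: the helper only suggests a default, and callers normally clamp it against their own hardware limits before allocating queues. A hedged sketch of that common pattern follows; MY_HW_MAX_RX_QUEUES and my_pick_rx_queue_count() are hypothetical names, not part of this patch.

```c
// SPDX-License-Identifier: GPL-2.0
/*
 * Sketch of the usual driver-side pattern: treat the helper's return
 * value as a hint and clamp it to the device's capabilities.
 */
#include <linux/minmax.h>
#include <linux/netdevice.h>

#define MY_HW_MAX_RX_QUEUES 16	/* hypothetical per-device limit */

static unsigned int my_pick_rx_queue_count(void)
{
	return min_t(unsigned int, MY_HW_MAX_RX_QUEUES,
		     netif_get_num_default_rss_queues());
}
```

A driver would then typically feed the result into netif_set_real_num_rx_queues() (and the Tx equivalent) during setup.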
Network drivers can call netif_get_num_default_rss_queues() to get the
default number of receive queues to use. Right now, this default number
is min(8, num_online_cpus()).

Instead, as suggested by Jakub, use the number of physical cores
divided by 2. This avoids wasting CPU resources, avoids using both
threads of the same core, and still scales for high-end processors
with many cores.

As an exception, select 2 queues for processors with 2 cores, because
otherwise they would take no advantage of RSS despite being SMP
capable.

Tested: Processor Intel Xeon E5-2620 (2 sockets, 6 cores/socket, 2
threads/core). NIC Broadcom NetXtreme II BCM57810 (10 Gbps). Ran some
tests with `perf stat iperf3 -R`, with parallelisms of 1, 8 and 24,
getting the following results:
- Number of queues: 6 (instead of 8)
- Network throughput: not affected
- CPU usage: 0.05-0.12 more CPUs utilized than before (with 24 CPUs,
  this is only 0.2-0.5% higher)
- Context switches reduced by 7-50%, more noticeably when using a
  higher number of parallel threads

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
---
 include/linux/netdevice.h |  1 -
 net/core/dev.c            | 20 ++++++++++++++++----
 2 files changed, 16 insertions(+), 5 deletions(-)
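To double-check what the helper returns on a running 5.18+ kernel (for example, to reproduce the "6 instead of 8" figure above), one option is a throwaway probe module. This is an illustrative sketch, not part of the patch; it deliberately fails init with -ENODEV so the value is logged without leaving the module loaded.

```c
// SPDX-License-Identifier: GPL-2.0
/*
 * Hypothetical probe module: log the current default RSS queue count.
 * netif_get_num_default_rss_queues() is exported, so a GPL module can
 * call it directly.
 */
#include <linux/errno.h>
#include <linux/module.h>
#include <linux/netdevice.h>

static int __init rssq_probe_init(void)
{
	pr_info("netif_get_num_default_rss_queues() = %d\n",
		netif_get_num_default_rss_queues());
	/* Fail on purpose: message is printed, nothing stays loaded. */
	return -ENODEV;
}
module_init(rssq_probe_init);

MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Print the default RSS queue count (illustrative sketch)");
```

After `insmod` reports the expected failure, the value appears in `dmesg`.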