Message ID | 20220216094139.15989-1-ihuguet@redhat.com (mailing list archive)
---|---
Series | sfc: optimize RXQs count and affinities
On Wed, 16 Feb 2022 10:41:37 +0100 Íñigo Huguet wrote:
> In the sfc driver, one RX queue per physical core was allocated by
> default. Later on, IRQ affinities were set, spreading the IRQs across
> all NUMA-local CPUs.
>
> However, this default configuration turns out to be far from optimal on
> many modern systems. Specifically, on systems with hyperthreading and
> 2 NUMA nodes, affinities are set so that IRQs are handled by all logical
> cores of one NUMA node. Handling IRQs on both hyperthreading siblings
> brings no benefit, and setting affinities to one queue per physical core
> is not a good idea either, because there is a performance penalty for
> moving data across nodes (I was able to verify this with some XDP tests
> using pktgen).
>
> These patches reduce the default number of channels to one per physical
> core in the local NUMA node. Then, they set IRQ affinities to CPUs in
> the local NUMA node only. This way we save hardware resources, since
> channels are a limited resource. We also leave more room for XDP_TX
> channels without hitting the driver's limit of 32 channels per
> interface.
>
> Performance tests using iperf with a SFC9140 device showed no
> performance penalty from reducing the number of channels.
>
> RX XDP tests showed that performance can drop to less than half if the
> IRQ is handled by a CPU in a different NUMA node, which doesn't happen
> with the new defaults from these patches.

Martin, Ed, any thoughts?
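For context, a minimal sketch of how "one channel per physical core on the local NUMA node" could be computed with standard kernel topology helpers. The function name count_local_physical_cores and the fallback paths are illustrative assumptions, not code from the patches:

```c
#include <linux/cpumask.h>
#include <linux/topology.h>
#include <linux/numa.h>
#include <linux/gfp.h>

/* Sketch: count one online CPU per physical core on the device's local
 * NUMA node, skipping hyperthread siblings. Falls back to all online
 * CPUs if the node is unknown or the mask allocation fails.
 */
static unsigned int count_local_physical_cores(int numa_node)
{
	const struct cpumask *node_mask;
	cpumask_var_t seen;
	unsigned int count = 0;
	int cpu;

	node_mask = (numa_node == NUMA_NO_NODE) ?
		cpu_online_mask : cpumask_of_node(numa_node);

	if (!zalloc_cpumask_var(&seen, GFP_KERNEL))
		return num_online_cpus();

	for_each_cpu(cpu, node_mask) {
		if (!cpu_online(cpu))
			continue;
		if (cpumask_test_cpu(cpu, seen))
			continue;	/* hyperthread sibling already counted */
		cpumask_or(seen, seen, topology_sibling_cpumask(cpu));
		count++;
	}

	free_cpumask_var(seen);
	return count ?: num_online_cpus();
}
```

With a count derived this way, the IRQ affinity hints would then only need to cover CPUs from the same NUMA-local mask, which is what keeps the RX XDP path off remote-node CPUs in the tests described above.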