Message ID | 20220216094139.15989-1-ihuguet@redhat.com (mailing list archive)
---|---
Series | sfc: optimize RXQs count and affinities
On Wed, 16 Feb 2022 10:41:37 +0100 Íñigo Huguet wrote:
> In the sfc driver, one RX queue per physical core was allocated by
> default. Later on, IRQ affinities were set, spreading the IRQs across
> all NUMA-local CPUs.
>
> However, this default configuration turns out to be far from optimal on
> many modern systems. Specifically, on systems with hyperthreading and
> 2 NUMA nodes, affinities are set so that IRQs are handled by all logical
> cores of one NUMA node. Handling IRQs on both hyperthreading siblings
> brings no benefit, and setting affinities to one queue per physical core
> is not a good idea either, because there is a performance penalty for
> moving data across nodes (I was able to verify this with some XDP tests
> using pktgen).
>
> These patches reduce the default number of channels to one per physical
> core in the local NUMA node. Then, they set IRQ affinities to CPUs in
> the local NUMA node only. This way we save hardware resources, since
> channels are a limited resource. We also leave more room for XDP_TX
> channels without hitting the driver's limit of 32 channels per
> interface.
>
> Performance tests using iperf with a SFC9140 device showed no
> performance penalty from reducing the number of channels.
>
> RX XDP tests showed that performance can drop to less than half if the
> IRQ is handled by a CPU in a different NUMA node, which doesn't happen
> with the new defaults from these patches.

Martin, Ed, any thoughts?
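For context, a minimal sketch of how "one channel per physical core on the local NUMA node" could be computed with standard kernel topology helpers. The function name count_local_physical_cores and the fallback paths are illustrative assumptions, not code from the patches:

```c
#include <linux/cpumask.h>
#include <linux/topology.h>
#include <linux/numa.h>
#include <linux/gfp.h>

/* Sketch: count one online CPU per physical core on the device's local
 * NUMA node, skipping hyperthread siblings. Falls back to all online
 * CPUs if the node is unknown or the mask allocation fails.
 */
static unsigned int count_local_physical_cores(int numa_node)
{
	const struct cpumask *node_mask;
	cpumask_var_t seen;
	unsigned int count = 0;
	int cpu;

	node_mask = (numa_node == NUMA_NO_NODE) ?
		cpu_online_mask : cpumask_of_node(numa_node);

	if (!zalloc_cpumask_var(&seen, GFP_KERNEL))
		return num_online_cpus();

	for_each_cpu(cpu, node_mask) {
		if (!cpu_online(cpu))
			continue;
		if (cpumask_test_cpu(cpu, seen))
			continue;	/* hyperthread sibling already counted */
		cpumask_or(seen, seen, topology_sibling_cpumask(cpu));
		count++;
	}

	free_cpumask_var(seen);
	return count ?: num_online_cpus();
}
```

With a count derived this way, the IRQ affinity hints would then only need to cover CPUs from the same NUMA-local mask, which is what keeps the RX XDP path off remote-node CPUs in the tests described above.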