
[net-next,v3,0/4] eth: mlx4: use the page pool for Rx buffers

Message ID: 20250213010635.1354034-1-kuba@kernel.org

Message

Jakub Kicinski Feb. 13, 2025, 1:06 a.m. UTC
Convert mlx4 to page pool. I've been sitting on these patches for
over a year, and Jonathan Lemon had a similar series years before
that. We never deployed it or sent it upstream because it didn't
really show much of a perf win under normal load (admittedly, I think
the real testing was done before Ilias's work on recycling).

During the v6.9 kernel rollout, Meta's CDN team noticed that machines
with CX3 Pro (mlx4) are prone to overloads (double-digit % of CPU time
spent mapping buffers in the IOMMU). The problem does not occur with
modern NICs, so I dusted off this series, and reportedly it still
works. It makes the problem go away: no overloads, and perf is back
in line with older kernels. Something must have changed in the IOMMU
code, I guess.
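
For the curious: with PP_FLAG_DMA_MAP, the page pool maps a page into
the IOMMU once, when the page is first allocated, and keeps that
mapping alive while the page cycles through the pool, so the
per-packet map/unmap work drops out of the hot path. A minimal sketch
of the generic pattern (the page_pool_* calls are the real API; the
surrounding helpers are hypothetical, not code from this series):

#include <net/page_pool/helpers.h>

/* Hypothetical helpers showing the generic page pool Rx pattern. */
static struct page *rx_get_page(struct page_pool *pool, dma_addr_t *dma)
{
	struct page *page;

	/* Served from the recycle cache when possible; the expensive
	 * IOMMU mapping only happens for brand-new pages.
	 */
	page = page_pool_alloc_pages(pool, GFP_ATOMIC | __GFP_NOWARN);
	if (!page)
		return NULL;

	*dma = page_pool_get_dma_addr(page);
	return page;
}

static void rx_drop_page(struct page_pool *pool, struct page *page)
{
	/* Return the page with its DMA mapping intact, instead of
	 * dma_unmap_page() + put_page().
	 */
	page_pool_recycle_direct(pool, page);
}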

This series is very simple and can very likely be optimized further.
The thing is, I don't have access to any CX3 Pro NICs; they only
exist in CDN locations that haven't had a HW refresh for a while. So
I can say this series survives a week under traffic with XDP enabled,
but my ability to iterate and improve it is a bit limited.

v3:
 - use priv->rx_skb_size for the buffer size calculation
 - use priv->dma_dir for the DMA mapping direction, instead of always
   BIDIR (both changes are illustrated in the sketch below)
v2: https://lore.kernel.org/20250211192141.619024-1-kuba@kernel.org
 - remove unnecessary .max_size (nit by Ido)
 - change the pool size
 - fix the XDP xmit support description
v1: https://lore.kernel.org/20250205031213.358973-1-kuba@kernel.org
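
Concretely, the v3 notes map onto the pool setup roughly as follows.
This is a sketch rather than the patch itself: page_pool_params and
page_pool_create() are the real page pool API, but the mlx4-side
names (ring->pp, ring->size, priv->ddev) and the exact sizing
expression are illustrative assumptions:

#include <linux/err.h>
#include <net/page_pool/helpers.h>

/* Sketch of the Rx pool creation implied by the changelog above;
 * mlx4-side field names are illustrative, not verbatim from the patch.
 */
static int mlx4_en_create_rx_page_pool(struct mlx4_en_priv *priv,
				       struct mlx4_en_rx_ring *ring,
				       int node)
{
	struct page_pool_params pp = {
		/* let the pool own and reuse the DMA mappings */
		.flags		= PP_FLAG_DMA_MAP,
		.order		= 0,
		/* v3: derive buffer sizing from priv->rx_skb_size */
		.pool_size	= ring->size *
				  DIV_ROUND_UP(priv->rx_skb_size, PAGE_SIZE),
		.nid		= node,
		.dev		= priv->ddev,
		/* v3: honor the ring's DMA direction, not always BIDIR */
		.dma_dir	= priv->dma_dir,
	};

	ring->pp = page_pool_create(&pp);
	return PTR_ERR_OR_ZERO(ring->pp);
}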

Jakub Kicinski (4):
  eth: mlx4: create a page pool for Rx
  eth: mlx4: don't try to complete XDP frames in netpoll
  eth: mlx4: remove the local XDP fast-recycling ring
  eth: mlx4: use the page pool for Rx buffers

 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |  15 +--
 drivers/net/ethernet/mellanox/mlx4/en_rx.c   | 119 +++++++------------
 drivers/net/ethernet/mellanox/mlx4/en_tx.c   |  17 ++-
 3 files changed, 52 insertions(+), 99 deletions(-)

Comments

patchwork-bot+netdevbpf@kernel.org Feb. 18, 2025, 11:40 p.m. UTC
Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 12 Feb 2025 17:06:31 -0800 you wrote:
> Convert mlx4 to page pool. I've been sitting on these patches for
> over a year, and Jonathan Lemon had a similar series years before
> that. We never deployed it or sent it upstream because it didn't
> really show much of a perf win under normal load (admittedly, I think
> the real testing was done before Ilias's work on recycling).
> 
> During the v6.9 kernel rollout, Meta's CDN team noticed that machines
> with CX3 Pro (mlx4) are prone to overloads (double-digit % of CPU time
> spent mapping buffers in the IOMMU). The problem does not occur with
> modern NICs, so I dusted off this series, and reportedly it still
> works. It makes the problem go away: no overloads, and perf is back
> in line with older kernels. Something must have changed in the IOMMU
> code, I guess.
> 
> [...]

Here is the summary with links:
  - [net-next,v3,1/4] eth: mlx4: create a page pool for Rx
    (no matching commit)
  - [net-next,v3,2/4] eth: mlx4: don't try to complete XDP frames in netpoll
    https://git.kernel.org/netdev/net-next/c/8fdeafd66eda
  - [net-next,v3,3/4] eth: mlx4: remove the local XDP fast-recycling ring
    (no matching commit)
  - [net-next,v3,4/4] eth: mlx4: use the page pool for Rx buffers
    (no matching commit)

You are awesome, thank you!