[bpf-next,2/2] xsk: introduce generic almost-zerocopy xmit

Message ID	20210330231528.546284-3-alobakin@pm.me (mailing list archive)
State	Superseded
Delegated to:	BPF
Headers	show Return-Path: <bpf-owner@kernel.org> Date: Tue, 30 Mar 2021 23:15:59 +0000 To: Alexei Starovoitov <ast@kernel.org>, Daniel Borkmann <daniel@iogearbox.net> From: Alexander Lobakin <alobakin@pm.me> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>, =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= <bjorn@kernel.org>, Magnus Karlsson <magnus.karlsson@intel.com>, Jonathan Lemon <jonathan.lemon@gmail.com>, "David S. Miller" <davem@davemloft.net>, Jakub Kicinski <kuba@kernel.org>, Jesper Dangaard Brouer <hawk@kernel.org>, John Fastabend <john.fastabend@gmail.com>, Andrii Nakryiko <andrii@kernel.org>, Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>, Yonghong Song <yhs@fb.com>, KP Singh <kpsingh@kernel.org>, Alexander Lobakin <alobakin@pm.me>, netdev@vger.kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org Reply-To: Alexander Lobakin <alobakin@pm.me> Subject: [PATCH bpf-next 2/2] xsk: introduce generic almost-zerocopy xmit Message-ID: <20210330231528.546284-3-alobakin@pm.me> In-Reply-To: <20210330231528.546284-1-alobakin@pm.me> References: <20210330231528.546284-1-alobakin@pm.me> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Precedence: bulk
Series	xsk: introduce generic almost-zerocopy xmit \| expand [bpf-next,0/2] xsk: introduce generic almost-zerocopy xmit [bpf-next,1/2] xsk: speed-up generic full-copy xmit [bpf-next,2/2] xsk: introduce generic almost-zerocopy xmit

Message ID

20210330231528.546284-3-alobakin@pm.me (mailing list archive)

State

Superseded

Delegated to:

BPF

Headers

Date: Tue, 30 Mar 2021 23:15:59 +0000
To: Alexei Starovoitov <ast@kernel.org>,
        Daniel Borkmann <daniel@iogearbox.net>
From: Alexander Lobakin <alobakin@pm.me>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>,
 =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= <bjorn@kernel.org>,
 Magnus Karlsson <magnus.karlsson@intel.com>,
 Jonathan Lemon <jonathan.lemon@gmail.com>,
 "David S. Miller" <davem@davemloft.net>, Jakub Kicinski <kuba@kernel.org>,
 Jesper Dangaard Brouer <hawk@kernel.org>,
 John Fastabend <john.fastabend@gmail.com>,
 Andrii Nakryiko <andrii@kernel.org>, Martin KaFai Lau <kafai@fb.com>,
 Song Liu <songliubraving@fb.com>, Yonghong Song <yhs@fb.com>,
 KP Singh <kpsingh@kernel.org>, Alexander Lobakin <alobakin@pm.me>,
 netdev@vger.kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Reply-To: Alexander Lobakin <alobakin@pm.me>
Subject: [PATCH bpf-next 2/2] xsk: introduce generic almost-zerocopy xmit
Message-ID: <20210330231528.546284-3-alobakin@pm.me>
In-Reply-To: <20210330231528.546284-1-alobakin@pm.me>
References: <20210330231528.546284-1-alobakin@pm.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Precedence: bulk

Series

xsk: introduce generic almost-zerocopy xmit | expand

Checks

Context	Check	Description
netdev/cover_letter	success	Link
netdev/fixes_present	success	Link
netdev/patch_count	success	Link
netdev/tree_selection	success	Clearly marked for bpf-next
netdev/subject_prefix	success	Link
netdev/cc_maintainers	success	CCed 16 of 16 maintainers
netdev/source_inline	success	Was 0 now: 0
netdev/verify_signedoff	success	Link
netdev/module_param	success	Was 0 now: 0
netdev/build_32bit	success	Errors and warnings before: 1 this patch: 1
netdev/kdoc	success	Errors and warnings before: 0 this patch: 0
netdev/verify_fixes	success	Link
netdev/checkpatch	success	total: 0 errors, 0 warnings, 0 checks, 66 lines checked
netdev/build_allmodconfig_warn	success	Errors and warnings before: 1 this patch: 1
netdev/header_inline	success	Link

Context

Check

Description

netdev/cover_letter

success

Link

netdev/fixes_present

success

Link

netdev/patch_count

success

Link

netdev/tree_selection

success

Clearly marked for bpf-next

netdev/subject_prefix

success

Link

netdev/cc_maintainers

success

CCed 16 of 16 maintainers

netdev/source_inline

success

Was 0 now: 0

netdev/verify_signedoff

success

Link

netdev/module_param

success

Was 0 now: 0

netdev/build_32bit

success

Errors and warnings before: 1 this patch: 1

netdev/kdoc

success

Errors and warnings before: 0 this patch: 0

netdev/verify_fixes

success

Link

netdev/checkpatch

success

total: 0 errors, 0 warnings, 0 checks, 66 lines checked

netdev/build_allmodconfig_warn

success

Errors and warnings before: 1 this patch: 1

netdev/header_inline

success

Link

Commit Message

Alexander Lobakin March 30, 2021, 11:15 p.m. UTC

The reasons behind IFF_TX_SKB_NO_LINEAR are:
 - most drivers expect skb with the linear space;
 - most drivers expect hard header in the linear space;
 - many drivers need some headroom to insert custom headers
   and/or pull headers from frags (pskb_may_pull() etc.).

With some bits of overhead, we can satisfy all of this without
inducing full buffer data copy.

Now frames that are no lesser than 128 bytes (to mitigate allocation
overhead) are also being built using zerocopy path (if the device and
driver support S/G xmit, which is almost always true).
We allocate 256* additional bytes for skb linear space and pull hard
header there (aligning its end by 16 bytes for platforms with
NET_IP_ALIGN). The rest of the buffer data is just pinned as frags.
A room of at least 242 bytes is left for any driver needs.

We could just pass the buffer to eth_get_headlen() to minimize
allocation overhead and be able to copy all the headers into the
linear space, but the flow dissection procedure tends to be more
expensive than the current approach.

IFF_TX_SKB_NO_LINEAR path remains unchanged and is still actual and
generally faster.

* The value of 256 bytes is kinda "magic", it can be found in lots
  of drivers and places of core code and it is believed that 256
  bytes are enough to store any headers of any frame.

Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
---
 net/xdp/xsk.c | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

--
2.31.1

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 41f8f21b3348..090ff9c096a3 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -445,6 +445,9 @@  static void xsk_destruct_skb(struct sk_buff *skb)
 	sock_wfree(skb);
 }

+#define XSK_SKB_HEADLEN		256
+#define XSK_COPY_THRESHOLD	(XSK_SKB_HEADLEN / 2)
+
 static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
 					      struct xdp_desc *desc)
 {
@@ -452,13 +455,22 @@  static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
 	u32 hr, len, ts, offset, copy, copied;
 	struct sk_buff *skb;
 	struct page *page;
+	bool need_pull;
 	void *buffer;
 	int err, i;
 	u64 addr;

 	hr = max(NET_SKB_PAD, L1_CACHE_ALIGN(xs->dev->needed_headroom));
+	len = hr;
+
+	need_pull = !(xs->dev->priv_flags & IFF_TX_SKB_NO_LINEAR);
+	if (need_pull) {
+		len += XSK_SKB_HEADLEN;
+		len += NET_IP_ALIGN;
+		hr += NET_IP_ALIGN;
+	}

-	skb = sock_alloc_send_skb(&xs->sk, hr, 1, &err);
+	skb = sock_alloc_send_skb(&xs->sk, len, 1, &err);
 	if (unlikely(!skb))
 		return ERR_PTR(err);

@@ -488,6 +500,11 @@  static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
 	skb->data_len += len;
 	skb->truesize += ts;

+	if (need_pull && unlikely(!__pskb_pull_tail(skb, ETH_HLEN))) {
+		kfree_skb(skb);
+		return ERR_PTR(-ENOMEM);
+	}
+
 	refcount_add(ts, &xs->sk.sk_wmem_alloc);

 	return skb;
@@ -498,19 +515,20 @@  static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
 {
 	struct net_device *dev = xs->dev;
 	struct sk_buff *skb;
+	u32 len = desc->len;

-	if (dev->priv_flags & IFF_TX_SKB_NO_LINEAR) {
+	if ((dev->priv_flags & IFF_TX_SKB_NO_LINEAR) ||
+	    (len >= XSK_COPY_THRESHOLD && likely(dev->features & NETIF_F_SG))) {
 		skb = xsk_build_skb_zerocopy(xs, desc);
 		if (IS_ERR(skb))
 			return skb;
 	} else {
-		u32 hr, tr, len;
 		void *buffer;
+		u32 hr, tr;
 		int err;

 		hr = max(NET_SKB_PAD, L1_CACHE_ALIGN(dev->needed_headroom));
 		tr = dev->needed_tailroom;
-		len = desc->len;

 		skb = sock_alloc_send_skb(&xs->sk, hr + len + tr, 1, &err);
 		if (unlikely(!skb))