From patchwork Thu Feb 18 20:49:41 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Lobakin X-Patchwork-Id: 12094305 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 401FFC433E6 for ; Thu, 18 Feb 2021 20:51:09 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 000B064EC2 for ; Thu, 18 Feb 2021 20:51:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230121AbhBRUvA (ORCPT ); Thu, 18 Feb 2021 15:51:00 -0500 Received: from mail1.protonmail.ch ([185.70.40.18]:32019 "EHLO mail1.protonmail.ch" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229577AbhBRUui (ORCPT ); Thu, 18 Feb 2021 15:50:38 -0500 Date: Thu, 18 Feb 2021 20:49:41 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pm.me; s=protonmail; t=1613681385; bh=qd8YH/ZJZyXiTeuWuSjWkQwVFPKKu5hK9PwUOTkC+Yg=; h=Date:To:From:Cc:Reply-To:Subject:In-Reply-To:References:From; b=JLd3BFFqmqlK5YvYKyOBF0u66U0ZD/xadJkOEHNpUEcbgUfxmp7gc7rH2wVptKKzH 5aAoeIWMLst6Eeg638jiyGQcFw6P+QJQ2188JLLvojAQDeruvMiu1FsPOdpSNOsklH a4Z/nZ4MHwJVZ4JrFN3dJGHKz/cja7BJBHYdA6apqn9TUPgj9D46DGUdeim587GmQV q0wLuDEH+RswdJ8Ue8XqJEekiKzIP7g6aN1Ma+HwihRRnZiCXNxuSCV7c4TrnJ8QeF wdIvLq1Xc9aLxYMe+vMXzvdGKBr/I/pzGJHEIXzRwNnftx/MK+Lo2AZalHxHQLhzT5 UAG3ILzYOqrlg== To: Daniel Borkmann , Magnus Karlsson From: Alexander Lobakin Cc: "Michael S. Tsirkin" , Jason Wang , "David S. Miller" , Jakub Kicinski , Jonathan Lemon , Alexei Starovoitov , =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Jesper Dangaard Brouer , John Fastabend , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , KP Singh , Paolo Abeni , Eric Dumazet , Xuan Zhuo , Dust Li , Alexander Lobakin , virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, bpf@vger.kernel.org Reply-To: Alexander Lobakin Subject: [PATCH v8 bpf-next 1/5] netdevice: add missing IFF_PHONY_HEADROOM self-definition Message-ID: <20210218204908.5455-2-alobakin@pm.me> In-Reply-To: <20210218204908.5455-1-alobakin@pm.me> References: <20210218204908.5455-1-alobakin@pm.me> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net This is harmless for now, but can be fatal for future refactors. Fixes: 871b642adebe3 ("netdev: introduce ndo_set_rx_headroom") Signed-off-by: Alexander Lobakin Acked-by: John Fastabend --- include/linux/netdevice.h | 1 + 1 file changed, 1 insertion(+) -- 2.30.1 diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index ddf4cfc12615..3b6f82c2c271 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1577,6 +1577,7 @@ enum netdev_priv_flags { #define IFF_L3MDEV_SLAVE IFF_L3MDEV_SLAVE #define IFF_TEAM IFF_TEAM #define IFF_RXFH_CONFIGURED IFF_RXFH_CONFIGURED +#define IFF_PHONY_HEADROOM IFF_PHONY_HEADROOM #define IFF_MACSEC IFF_MACSEC #define IFF_NO_RX_HANDLER IFF_NO_RX_HANDLER #define IFF_FAILOVER IFF_FAILOVER From patchwork Thu Feb 18 20:50:02 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Lobakin X-Patchwork-Id: 12094307 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1D788C433E6 for ; Thu, 18 Feb 2021 20:51:32 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id CD23564EB6 for ; Thu, 18 Feb 2021 20:51:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230205AbhBRUvP (ORCPT ); Thu, 18 Feb 2021 15:51:15 -0500 Received: from mail-40136.protonmail.ch ([185.70.40.136]:54191 "EHLO mail-40136.protonmail.ch" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229946AbhBRUux (ORCPT ); Thu, 18 Feb 2021 15:50:53 -0500 Date: Thu, 18 Feb 2021 20:50:02 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pm.me; s=protonmail; t=1613681403; bh=UT5dqOxXemXMBrQO9+qpLgvawJH5qOALCJiqi4jXfPQ=; h=Date:To:From:Cc:Reply-To:Subject:In-Reply-To:References:From; b=ZzPhaU1Um8KKnztCWdjI7U1gntZ54rIbwLoyG7z7JO3++LUhiAORy0MgqXE/++yRk gBTBajl4Kw+vw+g+fNl3uQ8uOaSdtOL3wWa6H1BirrmQ79QJeQINP6Y5tWT8pOd6Qa DalmENxYUoppmXBC22JeA1fzilIOKF9HLjjHxP9zpv2M4z9XHggpcu1r3cAsTZnKVS rMOV7DQNcbsVP7P7z7+p4p2gNmg2+8ODetnUuCQzveJ5BJPdSY6uh3GJp14Aqkq8AG G/GKp35+v2K8Zz8j2nkQPpgw1O2d+zI2K7JTnG9803cnqeuIA9qFz44X+VSE3iLusm f2Wse1HssAk2g== To: Daniel Borkmann , Magnus Karlsson From: Alexander Lobakin Cc: "Michael S. Tsirkin" , Jason Wang , "David S. Miller" , Jakub Kicinski , Jonathan Lemon , Alexei Starovoitov , =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Jesper Dangaard Brouer , John Fastabend , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , KP Singh , Paolo Abeni , Eric Dumazet , Xuan Zhuo , Dust Li , Alexander Lobakin , virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, bpf@vger.kernel.org Reply-To: Alexander Lobakin Subject: [PATCH v8 bpf-next 2/5] net: add priv_flags for allow tx skb without linear Message-ID: <20210218204908.5455-3-alobakin@pm.me> In-Reply-To: <20210218204908.5455-1-alobakin@pm.me> References: <20210218204908.5455-1-alobakin@pm.me> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Xuan Zhuo In some cases, we hope to construct skb directly based on the existing memory without copying data. In this case, the page will be placed directly in the skb, and the linear space of skb is empty. But unfortunately, many the network card does not support this operation. For example Mellanox Technologies MT27710 Family [ConnectX-4 Lx] will get the following error message: mlx5_core 0000:3b:00.1 eth1: Error cqe on cqn 0x817, ci 0x8, qn 0x1dbb, opcode 0xd, syndrome 0x1, vendor syndrome 0x68 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000030: 00 00 00 00 60 10 68 01 0a 00 1d bb 00 0f 9f d2 WQE DUMP: WQ size 1024 WQ cur size 0, WQE index 0xf, len: 64 00000000: 00 00 0f 0a 00 1d bb 03 00 00 00 08 00 00 00 00 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000020: 00 00 00 2b 00 08 00 00 00 00 00 05 9e e3 08 00 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 mlx5_core 0000:3b:00.1 eth1: ERR CQE on SQ: 0x1dbb So a priv_flag is added here to indicate whether the network card supports this feature. Signed-off-by: Xuan Zhuo Suggested-by: Alexander Lobakin [ alobakin: give a new flag more detailed description ] Signed-off-by: Alexander Lobakin Acked-by: John Fastabend --- include/linux/netdevice.h | 4 ++++ 1 file changed, 4 insertions(+) -- 2.30.1 diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 3b6f82c2c271..6cef47b76cc6 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1518,6 +1518,8 @@ struct net_device_ops { * @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device * @IFF_L3MDEV_RX_HANDLER: only invoke the rx handler of L3 master device * @IFF_LIVE_RENAME_OK: rename is allowed while device is up and running + * @IFF_TX_SKB_NO_LINEAR: device/driver is capable of xmitting frames with + * skb_headlen(skb) == 0 (data starts from frag0) */ enum netdev_priv_flags { IFF_802_1Q_VLAN = 1<<0, @@ -1551,6 +1553,7 @@ enum netdev_priv_flags { IFF_FAILOVER_SLAVE = 1<<28, IFF_L3MDEV_RX_HANDLER = 1<<29, IFF_LIVE_RENAME_OK = 1<<30, + IFF_TX_SKB_NO_LINEAR = 1<<31, }; #define IFF_802_1Q_VLAN IFF_802_1Q_VLAN @@ -1584,6 +1587,7 @@ enum netdev_priv_flags { #define IFF_FAILOVER_SLAVE IFF_FAILOVER_SLAVE #define IFF_L3MDEV_RX_HANDLER IFF_L3MDEV_RX_HANDLER #define IFF_LIVE_RENAME_OK IFF_LIVE_RENAME_OK +#define IFF_TX_SKB_NO_LINEAR IFF_TX_SKB_NO_LINEAR /** * struct net_device - The DEVICE structure. From patchwork Thu Feb 18 20:50:17 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Lobakin X-Patchwork-Id: 12094309 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4AE94C433E9 for ; Thu, 18 Feb 2021 20:51:32 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0343764EBA for ; Thu, 18 Feb 2021 20:51:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230334AbhBRUvX (ORCPT ); Thu, 18 Feb 2021 15:51:23 -0500 Received: from mail-40136.protonmail.ch ([185.70.40.136]:46390 "EHLO mail-40136.protonmail.ch" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230112AbhBRUvA (ORCPT ); Thu, 18 Feb 2021 15:51:00 -0500 Date: Thu, 18 Feb 2021 20:50:17 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pm.me; s=protonmail; t=1613681418; bh=nvb2yx8DAQSWWcjoTwgm5AP3LHJtzlTi3X02uTTAjO0=; h=Date:To:From:Cc:Reply-To:Subject:In-Reply-To:References:From; b=bjUASJEJZpVOK3U8eOaxIx7AQETLm4KBLHSNpWpoE+Q2dy76YRTXZis9tFDicFGUG YqD+gKmNSKojjy1YVZou7MpSnrzgxtyJLttJnYwHwOiQ7r50/pGUZfByj41FdhoKWT QakuOv57clNV7tUMsBd+0vaRlFKOtD0SZKXe3OTm/UFfcEVVWWe1SkjMKNUC3GmptP 9TDLJiZicwLzZpT+DXJH8jEf52j9vz+miTl+PRG3GozSTSp3WuBrOD+IpFLoRY1sGh iXaaGX113Xplqxtu6r9KVQUJgr6n4BeHtvjcGZFp0Ez1HsjE8HCUDm5qYms/FzlJc9 s632M29CivQyw== To: Daniel Borkmann , Magnus Karlsson From: Alexander Lobakin Cc: "Michael S. Tsirkin" , Jason Wang , "David S. Miller" , Jakub Kicinski , Jonathan Lemon , Alexei Starovoitov , =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Jesper Dangaard Brouer , John Fastabend , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , KP Singh , Paolo Abeni , Eric Dumazet , Xuan Zhuo , Dust Li , Alexander Lobakin , virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, bpf@vger.kernel.org Reply-To: Alexander Lobakin Subject: [PATCH v8 bpf-next 3/5] virtio-net: support IFF_TX_SKB_NO_LINEAR Message-ID: <20210218204908.5455-4-alobakin@pm.me> In-Reply-To: <20210218204908.5455-1-alobakin@pm.me> References: <20210218204908.5455-1-alobakin@pm.me> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Xuan Zhuo Virtio net supports the case where the skb linear space is empty, so add priv_flags. Signed-off-by: Xuan Zhuo Acked-by: Michael S. Tsirkin Signed-off-by: Alexander Lobakin Acked-by: John Fastabend --- drivers/net/virtio_net.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) -- 2.30.1 diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index ba8e63792549..f2ff6c3906c1 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -2972,7 +2972,8 @@ static int virtnet_probe(struct virtio_device *vdev) return -ENOMEM; /* Set up network device as normal. */ - dev->priv_flags |= IFF_UNICAST_FLT | IFF_LIVE_ADDR_CHANGE; + dev->priv_flags |= IFF_UNICAST_FLT | IFF_LIVE_ADDR_CHANGE | + IFF_TX_SKB_NO_LINEAR; dev->netdev_ops = &virtnet_netdev; dev->features = NETIF_F_HIGHDMA; From patchwork Thu Feb 18 20:50:31 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Lobakin X-Patchwork-Id: 12094311 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 02053C433E6 for ; Thu, 18 Feb 2021 20:52:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A5C6B64EC2 for ; Thu, 18 Feb 2021 20:52:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230465AbhBRUvl (ORCPT ); Thu, 18 Feb 2021 15:51:41 -0500 Received: from mail-40133.protonmail.ch ([185.70.40.133]:55513 "EHLO mail-40133.protonmail.ch" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230209AbhBRUvT (ORCPT ); Thu, 18 Feb 2021 15:51:19 -0500 Date: Thu, 18 Feb 2021 20:50:31 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pm.me; s=protonmail; t=1613681436; bh=owSrMjomC3UBcnN55YEs7Jgmm2WXTHJ5680gLftdiFo=; h=Date:To:From:Cc:Reply-To:Subject:In-Reply-To:References:From; b=e3+8yjP4BwqRmhn71hIU3iQn4c6MwxGpDLPoIOKsOGawdxCkNZ0LEwQhKUdhNE+Wl W/mQLdf22X0Rbg8ur47JLw6ZL/r63HyOIlYiR8orF7aWMYtLheYdpFkiZCX5LbT4Ox zn5JHItknuwNjdpbP4XXuBcZ/pdS/e2Tq9i0kd+abvHFYFebZEwKT1uzLVgDPngZ6e 6QHnnYrbG8OTVv4OOo08mJ81l+9nuKN7I9ADINx3PzrRVsL1xz8dk1QbzHAszHKqAC dnBXqzBFGekWe/Lr+k09fC3MhURuTKUdP8EZncb6uUSTz+OXlP1WwMAcY3FtkFaRVS vqYPSt4o7CR2Q== To: Daniel Borkmann , Magnus Karlsson From: Alexander Lobakin Cc: "Michael S. Tsirkin" , Jason Wang , "David S. Miller" , Jakub Kicinski , Jonathan Lemon , Alexei Starovoitov , =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Jesper Dangaard Brouer , John Fastabend , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , KP Singh , Paolo Abeni , Eric Dumazet , Xuan Zhuo , Dust Li , Alexander Lobakin , virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, bpf@vger.kernel.org Reply-To: Alexander Lobakin Subject: [PATCH v8 bpf-next 4/5] xsk: respect device's headroom and tailroom on generic xmit path Message-ID: <20210218204908.5455-5-alobakin@pm.me> In-Reply-To: <20210218204908.5455-1-alobakin@pm.me> References: <20210218204908.5455-1-alobakin@pm.me> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net xsk_generic_xmit() allocates a new skb and then queues it for xmitting. The size of new skb's headroom is desc->len, so it comes to the driver/device with no reserved headroom and/or tailroom. Lots of drivers need some headroom (and sometimes tailroom) to prepend (and/or append) some headers or data, e.g. CPU tags, device-specific headers/descriptors (LSO, TLS etc.), and if case of no available space skb_cow_head() will reallocate the skb. Reallocations are unwanted on fast-path, especially when it comes to XDP, so generic XSK xmit should reserve the spaces declared in dev->needed_headroom and dev->needed tailroom to avoid them. Note on max(NET_SKB_PAD, L1_CACHE_ALIGN(dev->needed_headroom)): Usually, output functions reserve LL_RESERVED_SPACE(dev), which consists of dev->hard_header_len + dev->needed_headroom, aligned by 16. However, on XSK xmit hard header is already here in the chunk, so hard_header_len is not needed. But it'd still be better to align data up to cacheline, while reserving no less than driver requests for headroom. NET_SKB_PAD here is to double-insure there will be no reallocations even when the driver advertises no needed_headroom, but in fact need it (not so rare case). Fixes: 35fcde7f8deb ("xsk: support for Tx") Signed-off-by: Alexander Lobakin Acked-by: Magnus Karlsson Acked-by: John Fastabend --- net/xdp/xsk.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) -- 2.30.1 diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 4faabd1ecfd1..143979ea4165 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -454,12 +454,16 @@ static int xsk_generic_xmit(struct sock *sk) struct sk_buff *skb; unsigned long flags; int err = 0; + u32 hr, tr; mutex_lock(&xs->mutex); if (xs->queue_id >= xs->dev->real_num_tx_queues) goto out; + hr = max(NET_SKB_PAD, L1_CACHE_ALIGN(xs->dev->needed_headroom)); + tr = xs->dev->needed_tailroom; + while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) { char *buffer; u64 addr; @@ -471,11 +475,13 @@ static int xsk_generic_xmit(struct sock *sk) } len = desc.len; - skb = sock_alloc_send_skb(sk, len, 1, &err); + skb = sock_alloc_send_skb(sk, hr + len + tr, 1, &err); if (unlikely(!skb)) goto out; + skb_reserve(skb, hr); skb_put(skb, len); + addr = desc.addr; buffer = xsk_buff_raw_get_data(xs->pool, addr); err = skb_store_bits(skb, 0, buffer, len); From patchwork Thu Feb 18 20:50:45 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Lobakin X-Patchwork-Id: 12094313 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2634BC433E9 for ; Thu, 18 Feb 2021 20:52:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id CF71664EB6 for ; Thu, 18 Feb 2021 20:52:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230481AbhBRUvs (ORCPT ); Thu, 18 Feb 2021 15:51:48 -0500 Received: from mail2.protonmail.ch ([185.70.40.22]:20619 "EHLO mail2.protonmail.ch" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230386AbhBRUvh (ORCPT ); Thu, 18 Feb 2021 15:51:37 -0500 Date: Thu, 18 Feb 2021 20:50:45 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pm.me; s=protonmail; t=1613681453; bh=zQHj3/HKAPzvECNoTNIwcCk1EJt5vsh7hS7FKXiS/sc=; h=Date:To:From:Cc:Reply-To:Subject:In-Reply-To:References:From; b=ZaAJ7yWJwbDMTAjxsDj8pZLoYRkqBi4sWW6AeXYn8ntBUm9OQI5VlFBidbm4Vk/J9 ScDTN4/bAGVmPuAluGnPDBRMifCI1ymQtFsRxuYJ6G0Az22QJrTuj5oW544niOYTtS v9uwIEzsYCZHI/oRdRfIvvP3jEbGnlrGTVoH1qdF2F9QzwB5BeA47PecaskfDA6JoQ I8OA+wXsv7hX7om4UZLwv1iqohoTus65qybOhrGoX0Cf9JdmlahNuST4NoTNU5upx7 eZym8XS+4yqz5zsNMX+h7iuTfut+UtaJS/QLWYb67mXxzPzqNqnmekBPZyJdo8FAnf fnEW6twX25uzg== To: Daniel Borkmann , Magnus Karlsson From: Alexander Lobakin Cc: "Michael S. Tsirkin" , Jason Wang , "David S. Miller" , Jakub Kicinski , Jonathan Lemon , Alexei Starovoitov , =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Jesper Dangaard Brouer , John Fastabend , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , KP Singh , Paolo Abeni , Eric Dumazet , Xuan Zhuo , Dust Li , Alexander Lobakin , virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, bpf@vger.kernel.org Reply-To: Alexander Lobakin Subject: [PATCH v8 bpf-next 5/5] xsk: build skb by page (aka generic zerocopy xmit) Message-ID: <20210218204908.5455-6-alobakin@pm.me> In-Reply-To: <20210218204908.5455-1-alobakin@pm.me> References: <20210218204908.5455-1-alobakin@pm.me> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Xuan Zhuo This patch is used to construct skb based on page to save memory copy overhead. This function is implemented based on IFF_TX_SKB_NO_LINEAR. Only the network card priv_flags supports IFF_TX_SKB_NO_LINEAR will use page to directly construct skb. If this feature is not supported, it is still necessary to copy data to construct skb. ---------------- Performance Testing ------------ The test environment is Aliyun ECS server. Test cmd: ``` xdpsock -i eth0 -t -S -s ``` Test result data: size 64 512 1024 1500 copy 1916747 1775988 1600203 1440054 page 1974058 1953655 1945463 1904478 percent 3.0% 10.0% 21.58% 32.3% Signed-off-by: Xuan Zhuo Reviewed-by: Dust Li [ alobakin: - expand subject to make it clearer; - improve skb->truesize calculation; - reserve some headroom in skb for drivers; - tailroom is not needed as skb is non-linear ] Signed-off-by: Alexander Lobakin Acked-by: Magnus Karlsson Acked-by: John Fastabend --- net/xdp/xsk.c | 120 ++++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 96 insertions(+), 24 deletions(-) -- 2.30.1 diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 143979ea4165..a71ed664da0a 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -445,6 +445,97 @@ static void xsk_destruct_skb(struct sk_buff *skb) sock_wfree(skb); } +static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs, + struct xdp_desc *desc) +{ + struct xsk_buff_pool *pool = xs->pool; + u32 hr, len, ts, offset, copy, copied; + struct sk_buff *skb; + struct page *page; + void *buffer; + int err, i; + u64 addr; + + hr = max(NET_SKB_PAD, L1_CACHE_ALIGN(xs->dev->needed_headroom)); + + skb = sock_alloc_send_skb(&xs->sk, hr, 1, &err); + if (unlikely(!skb)) + return ERR_PTR(err); + + skb_reserve(skb, hr); + + addr = desc->addr; + len = desc->len; + ts = pool->unaligned ? len : pool->chunk_size; + + buffer = xsk_buff_raw_get_data(pool, addr); + offset = offset_in_page(buffer); + addr = buffer - pool->addrs; + + for (copied = 0, i = 0; copied < len; i++) { + page = pool->umem->pgs[addr >> PAGE_SHIFT]; + get_page(page); + + copy = min_t(u32, PAGE_SIZE - offset, len - copied); + skb_fill_page_desc(skb, i, page, offset, copy); + + copied += copy; + addr += copy; + offset = 0; + } + + skb->len += len; + skb->data_len += len; + skb->truesize += ts; + + refcount_add(ts, &xs->sk.sk_wmem_alloc); + + return skb; +} + +static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, + struct xdp_desc *desc) +{ + struct net_device *dev = xs->dev; + struct sk_buff *skb; + + if (dev->priv_flags & IFF_TX_SKB_NO_LINEAR) { + skb = xsk_build_skb_zerocopy(xs, desc); + if (IS_ERR(skb)) + return skb; + } else { + u32 hr, tr, len; + void *buffer; + int err; + + hr = max(NET_SKB_PAD, L1_CACHE_ALIGN(dev->needed_headroom)); + tr = dev->needed_tailroom; + len = desc->len; + + skb = sock_alloc_send_skb(&xs->sk, hr + len + tr, 1, &err); + if (unlikely(!skb)) + return ERR_PTR(err); + + skb_reserve(skb, hr); + skb_put(skb, len); + + buffer = xsk_buff_raw_get_data(xs->pool, desc->addr); + err = skb_store_bits(skb, 0, buffer, len); + if (unlikely(err)) { + kfree_skb(skb); + return ERR_PTR(err); + } + } + + skb->dev = dev; + skb->priority = xs->sk.sk_priority; + skb->mark = xs->sk.sk_mark; + skb_shinfo(skb)->destructor_arg = (void *)(long)desc->addr; + skb->destructor = xsk_destruct_skb; + + return skb; +} + static int xsk_generic_xmit(struct sock *sk) { struct xdp_sock *xs = xdp_sk(sk); @@ -454,56 +545,37 @@ static int xsk_generic_xmit(struct sock *sk) struct sk_buff *skb; unsigned long flags; int err = 0; - u32 hr, tr; mutex_lock(&xs->mutex); if (xs->queue_id >= xs->dev->real_num_tx_queues) goto out; - hr = max(NET_SKB_PAD, L1_CACHE_ALIGN(xs->dev->needed_headroom)); - tr = xs->dev->needed_tailroom; - while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) { - char *buffer; - u64 addr; - u32 len; - if (max_batch-- == 0) { err = -EAGAIN; goto out; } - len = desc.len; - skb = sock_alloc_send_skb(sk, hr + len + tr, 1, &err); - if (unlikely(!skb)) + skb = xsk_build_skb(xs, &desc); + if (IS_ERR(skb)) { + err = PTR_ERR(skb); goto out; + } - skb_reserve(skb, hr); - skb_put(skb, len); - - addr = desc.addr; - buffer = xsk_buff_raw_get_data(xs->pool, addr); - err = skb_store_bits(skb, 0, buffer, len); /* This is the backpressure mechanism for the Tx path. * Reserve space in the completion queue and only proceed * if there is space in it. This avoids having to implement * any buffering in the Tx path. */ spin_lock_irqsave(&xs->pool->cq_lock, flags); - if (unlikely(err) || xskq_prod_reserve(xs->pool->cq)) { + if (xskq_prod_reserve(xs->pool->cq)) { spin_unlock_irqrestore(&xs->pool->cq_lock, flags); kfree_skb(skb); goto out; } spin_unlock_irqrestore(&xs->pool->cq_lock, flags); - skb->dev = xs->dev; - skb->priority = sk->sk_priority; - skb->mark = sk->sk_mark; - skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr; - skb->destructor = xsk_destruct_skb; - err = __dev_direct_xmit(skb, xs->queue_id); if (err == NETDEV_TX_BUSY) { /* Tell user-space to retry the send */