From patchwork Wed Feb 17 12:00:54 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexander Lobakin
X-Patchwork-Id: 12091491
X-Patchwork-Delegate: bpf@iogearbox.net
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,
	INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id E6357C433E0
	for ; Wed, 17 Feb 2021 12:02:09 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id B0FC464E76
	for ; Wed, 17 Feb 2021 12:02:09 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S232533AbhBQMBw (ORCPT ); Wed, 17 Feb 2021 07:01:52 -0500
Received: from mail1.protonmail.ch ([185.70.40.18]:40026 "EHLO
	mail1.protonmail.ch" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S232496AbhBQMBs (ORCPT ); Wed, 17 Feb 2021 07:01:48 -0500
Date: Wed, 17 Feb 2021 12:00:54 +0000
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pm.me;
	s=protonmail; t=1613563264;
	bh=f5a51AJWUIP1+g9AKjCCLIkq+KzMf7BABzoVckGuAUU=;
	h=Date:To:From:Cc:Reply-To:Subject:In-Reply-To:References:From;
	b=f0NFekYpkvZVy3mI4UEYe7UqpDTEFJfKN5BnZhxYMH631ecg9GONcBNuGk5EGj4KP
	 vPbdBYfbRqVlxTmaNr/pfwLeYcWUAdgMCpzSnJotGyq9yOFu66bmhsjlEDsAb2JpGw
	 Dp4lYik/+6sdAxP93kxdqwF8pwaifimb+rwC/sV0FXfAVmK+MGg5XFXIyjjGcmH3Mg
	 wFpoRnw6BsWAsamRR3x5VzMZRgZBKOjSuWeNJg99guRgIESWz0oxchIl6o3yrjVZII
	 QB0cBLFgmdRkHaBdqHhxFDDB7/5r5lZ+DdDL6CPNoTjxNjpqNRqkxyQsK9zme81u9N
	 RODymtlcXBNRg==
To: Daniel Borkmann , Magnus Karlsson
From: Alexander Lobakin
Cc: "Michael S. Tsirkin" , Jason Wang , "David S. Miller" ,
	Jakub Kicinski , Jonathan Lemon , Alexei Starovoitov ,
	=?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Jesper Dangaard Brouer ,
	John Fastabend , Andrii Nakryiko , Martin KaFai Lau , Song Liu ,
	Yonghong Song , KP Singh , Paolo Abeni , Eric Dumazet , Xuan Zhuo ,
	Dust Li , Alexander Lobakin ,
	virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, bpf@vger.kernel.org
Reply-To: Alexander Lobakin
Subject: [PATCH v7 bpf-next 1/6] netdev_priv_flags: add missing
 IFF_PHONY_HEADROOM self-definition
Message-ID: <20210217120003.7938-2-alobakin@pm.me>
In-Reply-To: <20210217120003.7938-1-alobakin@pm.me>
References: <20210217120003.7938-1-alobakin@pm.me>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: bpf@vger.kernel.org
X-Patchwork-Delegate: bpf@iogearbox.net

This is harmless for now, but becomes fatal for the subsequent patch.

Fixes: 871b642adebe3 ("netdev: introduce ndo_set_rx_headroom")
Signed-off-by: Alexander Lobakin
---
 include/linux/netdevice.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ddf4cfc12615..3b6f82c2c271 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1577,6 +1577,7 @@ enum netdev_priv_flags {
 #define IFF_L3MDEV_SLAVE IFF_L3MDEV_SLAVE
 #define IFF_TEAM IFF_TEAM
 #define IFF_RXFH_CONFIGURED IFF_RXFH_CONFIGURED
+#define IFF_PHONY_HEADROOM IFF_PHONY_HEADROOM
 #define IFF_MACSEC IFF_MACSEC
 #define IFF_NO_RX_HANDLER IFF_NO_RX_HANDLER
 #define IFF_FAILOVER IFF_FAILOVER

From patchwork Wed Feb 17 12:01:10 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexander Lobakin
X-Patchwork-Id: 12091493
X-Patchwork-Delegate: bpf@iogearbox.net
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,
	INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,UPPERCASE_50_75
	autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id A081EC433DB
	for ; Wed, 17 Feb 2021 12:03:32 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 6291164E76
	for ; Wed, 17 Feb 2021 12:03:32 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S232111AbhBQMDS (ORCPT ); Wed, 17 Feb 2021 07:03:18 -0500
Received: from mail-40133.protonmail.ch ([185.70.40.133]:14358 "EHLO
	mail-40133.protonmail.ch" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S232556AbhBQMC0 (ORCPT ); Wed, 17 Feb 2021 07:02:26 -0500
Date: Wed, 17 Feb 2021 12:01:10 +0000
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pm.me;
	s=protonmail; t=1613563276;
	bh=6DL4KLQOQC3zKbOQ+sDV/WMzc6ub+ldnNzx24Bfbo8c=;
	h=Date:To:From:Cc:Reply-To:Subject:In-Reply-To:References:From;
	b=cERpgKPIoX8HUR9e8LXW7Tj0GZ3D1maFvT1wfGzcyQ8RnTvB/evnZg4jXQOHLMUFe
	 oBc0QxAqUfgt4wFGSSQZoa6qKK1I8lY2neXTWd9MJIAX0d6DEu7zgFoJ0XWWEIIj7X
	 dXITrQoqVjATrTjUpARuXR4WqU62IMJSKSZCQprqFyd4TlhVqMcqULX5zRN4bwxAuR
	 Ccp7cMUPf9tL1b4VUJU/asvhPDHLn+OYBPUUZ230Xl+FnklsyAtWHWNieZVUjs2YiG
	 D1NJItf0vUz56NULOCBlj77SGh5ZYqf8WZ5LlmRAVlw4MvfZGPXuR53Refu0D2LzSs
	 JUOXK6H6eO1yw==
To: Daniel Borkmann , Magnus Karlsson
From: Alexander Lobakin
Cc: "Michael S. Tsirkin" , Jason Wang , "David S. Miller" ,
	Jakub Kicinski , Jonathan Lemon , Alexei Starovoitov ,
	=?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Jesper Dangaard Brouer ,
	John Fastabend , Andrii Nakryiko , Martin KaFai Lau , Song Liu ,
	Yonghong Song , KP Singh , Paolo Abeni , Eric Dumazet , Xuan Zhuo ,
	Dust Li , Alexander Lobakin ,
	virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, bpf@vger.kernel.org, kernel test robot
Reply-To: Alexander Lobakin
Subject: [PATCH v7 bpf-next 2/6] netdevice: check for
 net_device::priv_flags bitfield overflow
Message-ID: <20210217120003.7938-3-alobakin@pm.me>
In-Reply-To: <20210217120003.7938-1-alobakin@pm.me>
References: <20210217120003.7938-1-alobakin@pm.me>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: bpf@vger.kernel.org
X-Patchwork-Delegate: bpf@iogearbox.net

We almost ran out of unsigned int bitwidth. Define priv flags and
check for potential overflow in the fashion of netdev_features_t.
Defined this way, priv_flags can be easily expanded later by just
changing its typedef.

Signed-off-by: Alexander Lobakin
Reported-by: kernel test robot # Inverted assert condition
---
 include/linux/netdevice.h | 199 ++++++++++++++++++++------------------
 1 file changed, 105 insertions(+), 94 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 3b6f82c2c271..2c1a642ecdc0 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1483,107 +1483,118 @@ struct net_device_ops {
  *
  * You should have a pretty good reason to be extending these flags.
  *
- * @IFF_802_1Q_VLAN: 802.1Q VLAN device
- * @IFF_EBRIDGE: Ethernet bridging device
- * @IFF_BONDING: bonding master or slave
- * @IFF_ISATAP: ISATAP interface (RFC4214)
- * @IFF_WAN_HDLC: WAN HDLC device
- * @IFF_XMIT_DST_RELEASE: dev_hard_start_xmit() is allowed to
+ * @IFF_802_1Q_VLAN_BIT: 802.1Q VLAN device
+ * @IFF_EBRIDGE_BIT: Ethernet bridging device
+ * @IFF_BONDING_BIT: bonding master or slave
+ * @IFF_ISATAP_BIT: ISATAP interface (RFC4214)
+ * @IFF_WAN_HDLC_BIT: WAN HDLC device
+ * @IFF_XMIT_DST_RELEASE_BIT: dev_hard_start_xmit() is allowed to
  *	release skb->dst
- * @IFF_DONT_BRIDGE: disallow bridging this ether dev
- * @IFF_DISABLE_NETPOLL: disable netpoll at run-time
- * @IFF_MACVLAN_PORT: device used as macvlan port
- * @IFF_BRIDGE_PORT: device used as bridge port
- * @IFF_OVS_DATAPATH: device used as Open vSwitch datapath port
- * @IFF_TX_SKB_SHARING: The interface supports sharing skbs on transmit
- * @IFF_UNICAST_FLT: Supports unicast filtering
- * @IFF_TEAM_PORT: device used as team port
- * @IFF_SUPP_NOFCS: device supports sending custom FCS
- * @IFF_LIVE_ADDR_CHANGE: device supports hardware address
+ * @IFF_DONT_BRIDGE_BIT: disallow bridging this ether dev
+ * @IFF_DISABLE_NETPOLL_BIT: disable netpoll at run-time
+ * @IFF_MACVLAN_PORT_BIT: device used as macvlan port
+ * @IFF_BRIDGE_PORT_BIT: device used as bridge port
+ * @IFF_OVS_DATAPATH_BIT: device used as Open vSwitch datapath port
+ * @IFF_TX_SKB_SHARING_BIT: The interface supports sharing skbs on transmit
+ * @IFF_UNICAST_FLT_BIT: Supports unicast filtering
+ * @IFF_TEAM_PORT_BIT: device used as team port
+ * @IFF_SUPP_NOFCS_BIT: device supports sending custom FCS
+ * @IFF_LIVE_ADDR_CHANGE_BIT: device supports hardware address
  *	change when it's running
- * @IFF_MACVLAN: Macvlan device
- * @IFF_XMIT_DST_RELEASE_PERM: IFF_XMIT_DST_RELEASE not taking into account
+ * @IFF_MACVLAN_BIT: Macvlan device
+ * @IFF_XMIT_DST_RELEASE_PERM_BIT: IFF_XMIT_DST_RELEASE not taking into account
  *	underlying stacked devices
- * @IFF_L3MDEV_MASTER: device is an L3 master device
- * @IFF_NO_QUEUE: device can run without qdisc attached
- * @IFF_OPENVSWITCH: device is a Open vSwitch master
- * @IFF_L3MDEV_SLAVE: device is enslaved to an L3 master device
- * @IFF_TEAM: device is a team device
- * @IFF_RXFH_CONFIGURED: device has had Rx Flow indirection table configured
- * @IFF_PHONY_HEADROOM: the headroom value is controlled by an external
+ * @IFF_L3MDEV_MASTER_BIT: device is an L3 master device
+ * @IFF_NO_QUEUE_BIT: device can run without qdisc attached
+ * @IFF_OPENVSWITCH_BIT: device is a Open vSwitch master
+ * @IFF_L3MDEV_SLAVE_BIT: device is enslaved to an L3 master device
+ * @IFF_TEAM_BIT: device is a team device
+ * @IFF_RXFH_CONFIGURED_BIT: device has had Rx Flow indirection table configured
+ * @IFF_PHONY_HEADROOM_BIT: the headroom value is controlled by an external
  *	entity (i.e. the master device for bridged veth)
- * @IFF_MACSEC: device is a MACsec device
- * @IFF_NO_RX_HANDLER: device doesn't support the rx_handler hook
- * @IFF_FAILOVER: device is a failover master device
- * @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device
- * @IFF_L3MDEV_RX_HANDLER: only invoke the rx handler of L3 master device
- * @IFF_LIVE_RENAME_OK: rename is allowed while device is up and running
+ * @IFF_MACSEC_BIT: device is a MACsec device
+ * @IFF_NO_RX_HANDLER_BIT: device doesn't support the rx_handler hook
+ * @IFF_FAILOVER_BIT: device is a failover master device
+ * @IFF_FAILOVER_SLAVE_BIT: device is lower dev of a failover master device
+ * @IFF_L3MDEV_RX_HANDLER_BIT: only invoke the rx handler of L3 master device
+ * @IFF_LIVE_RENAME_OK_BIT: rename is allowed while device is up and running
+ *
+ * @NETDEV_PRIV_FLAG_COUNT: total priv flags count
  */
 enum netdev_priv_flags {
-	IFF_802_1Q_VLAN = 1<<0,
-	IFF_EBRIDGE = 1<<1,
-	IFF_BONDING = 1<<2,
-	IFF_ISATAP = 1<<3,
-	IFF_WAN_HDLC = 1<<4,
-	IFF_XMIT_DST_RELEASE = 1<<5,
-	IFF_DONT_BRIDGE = 1<<6,
-	IFF_DISABLE_NETPOLL = 1<<7,
-	IFF_MACVLAN_PORT = 1<<8,
-	IFF_BRIDGE_PORT = 1<<9,
-	IFF_OVS_DATAPATH = 1<<10,
-	IFF_TX_SKB_SHARING = 1<<11,
-	IFF_UNICAST_FLT = 1<<12,
-	IFF_TEAM_PORT = 1<<13,
-	IFF_SUPP_NOFCS = 1<<14,
-	IFF_LIVE_ADDR_CHANGE = 1<<15,
-	IFF_MACVLAN = 1<<16,
-	IFF_XMIT_DST_RELEASE_PERM = 1<<17,
-	IFF_L3MDEV_MASTER = 1<<18,
-	IFF_NO_QUEUE = 1<<19,
-	IFF_OPENVSWITCH = 1<<20,
-	IFF_L3MDEV_SLAVE = 1<<21,
-	IFF_TEAM = 1<<22,
-	IFF_RXFH_CONFIGURED = 1<<23,
-	IFF_PHONY_HEADROOM = 1<<24,
-	IFF_MACSEC = 1<<25,
-	IFF_NO_RX_HANDLER = 1<<26,
-	IFF_FAILOVER = 1<<27,
-	IFF_FAILOVER_SLAVE = 1<<28,
-	IFF_L3MDEV_RX_HANDLER = 1<<29,
-	IFF_LIVE_RENAME_OK = 1<<30,
+	IFF_802_1Q_VLAN_BIT,
+	IFF_EBRIDGE_BIT,
+	IFF_BONDING_BIT,
+	IFF_ISATAP_BIT,
+	IFF_WAN_HDLC_BIT,
+	IFF_XMIT_DST_RELEASE_BIT,
+	IFF_DONT_BRIDGE_BIT,
+	IFF_DISABLE_NETPOLL_BIT,
+	IFF_MACVLAN_PORT_BIT,
+	IFF_BRIDGE_PORT_BIT,
+	IFF_OVS_DATAPATH_BIT,
+	IFF_TX_SKB_SHARING_BIT,
+	IFF_UNICAST_FLT_BIT,
+	IFF_TEAM_PORT_BIT,
+	IFF_SUPP_NOFCS_BIT,
+	IFF_LIVE_ADDR_CHANGE_BIT,
+	IFF_MACVLAN_BIT,
+	IFF_XMIT_DST_RELEASE_PERM_BIT,
+	IFF_L3MDEV_MASTER_BIT,
+	IFF_NO_QUEUE_BIT,
+	IFF_OPENVSWITCH_BIT,
+	IFF_L3MDEV_SLAVE_BIT,
+	IFF_TEAM_BIT,
+	IFF_RXFH_CONFIGURED_BIT,
+	IFF_PHONY_HEADROOM_BIT,
+	IFF_MACSEC_BIT,
+	IFF_NO_RX_HANDLER_BIT,
+	IFF_FAILOVER_BIT,
+	IFF_FAILOVER_SLAVE_BIT,
+	IFF_L3MDEV_RX_HANDLER_BIT,
+	IFF_LIVE_RENAME_OK_BIT,
+
+	NETDEV_PRIV_FLAG_COUNT,
 };
 
-#define IFF_802_1Q_VLAN IFF_802_1Q_VLAN
-#define IFF_EBRIDGE IFF_EBRIDGE
-#define IFF_BONDING IFF_BONDING
-#define IFF_ISATAP IFF_ISATAP
-#define IFF_WAN_HDLC IFF_WAN_HDLC
-#define IFF_XMIT_DST_RELEASE IFF_XMIT_DST_RELEASE
-#define IFF_DONT_BRIDGE IFF_DONT_BRIDGE
-#define IFF_DISABLE_NETPOLL IFF_DISABLE_NETPOLL
-#define IFF_MACVLAN_PORT IFF_MACVLAN_PORT
-#define IFF_BRIDGE_PORT IFF_BRIDGE_PORT
-#define IFF_OVS_DATAPATH IFF_OVS_DATAPATH
-#define IFF_TX_SKB_SHARING IFF_TX_SKB_SHARING
-#define IFF_UNICAST_FLT IFF_UNICAST_FLT
-#define IFF_TEAM_PORT IFF_TEAM_PORT
-#define IFF_SUPP_NOFCS IFF_SUPP_NOFCS
-#define IFF_LIVE_ADDR_CHANGE IFF_LIVE_ADDR_CHANGE
-#define IFF_MACVLAN IFF_MACVLAN
-#define IFF_XMIT_DST_RELEASE_PERM IFF_XMIT_DST_RELEASE_PERM
-#define IFF_L3MDEV_MASTER IFF_L3MDEV_MASTER
-#define IFF_NO_QUEUE IFF_NO_QUEUE
-#define IFF_OPENVSWITCH IFF_OPENVSWITCH
-#define IFF_L3MDEV_SLAVE IFF_L3MDEV_SLAVE
-#define IFF_TEAM IFF_TEAM
-#define IFF_RXFH_CONFIGURED IFF_RXFH_CONFIGURED
-#define IFF_PHONY_HEADROOM IFF_PHONY_HEADROOM
-#define IFF_MACSEC IFF_MACSEC
-#define IFF_NO_RX_HANDLER IFF_NO_RX_HANDLER
-#define IFF_FAILOVER IFF_FAILOVER
-#define IFF_FAILOVER_SLAVE IFF_FAILOVER_SLAVE
-#define IFF_L3MDEV_RX_HANDLER IFF_L3MDEV_RX_HANDLER
-#define IFF_LIVE_RENAME_OK IFF_LIVE_RENAME_OK
+typedef u32 netdev_priv_flags_t;
+static_assert(sizeof(netdev_priv_flags_t) * BITS_PER_BYTE >=
+	      NETDEV_PRIV_FLAG_COUNT);
+
+#define __IFF_BIT(bit) ((netdev_priv_flags_t)1 << (bit))
+#define __IFF(name) __IFF_BIT(IFF_##name##_BIT)
+
+#define IFF_802_1Q_VLAN __IFF(802_1Q_VLAN)
+#define IFF_EBRIDGE __IFF(EBRIDGE)
+#define IFF_BONDING __IFF(BONDING)
+#define IFF_ISATAP __IFF(ISATAP)
+#define IFF_WAN_HDLC __IFF(WAN_HDLC)
+#define IFF_XMIT_DST_RELEASE __IFF(XMIT_DST_RELEASE)
+#define IFF_DONT_BRIDGE __IFF(DONT_BRIDGE)
+#define IFF_DISABLE_NETPOLL __IFF(DISABLE_NETPOLL)
+#define IFF_MACVLAN_PORT __IFF(MACVLAN_PORT)
+#define IFF_BRIDGE_PORT __IFF(BRIDGE_PORT)
+#define IFF_OVS_DATAPATH __IFF(OVS_DATAPATH)
+#define IFF_TX_SKB_SHARING __IFF(TX_SKB_SHARING)
+#define IFF_UNICAST_FLT __IFF(UNICAST_FLT)
+#define IFF_TEAM_PORT __IFF(TEAM_PORT)
+#define IFF_SUPP_NOFCS __IFF(SUPP_NOFCS)
+#define IFF_LIVE_ADDR_CHANGE __IFF(LIVE_ADDR_CHANGE)
+#define IFF_MACVLAN __IFF(MACVLAN)
+#define IFF_XMIT_DST_RELEASE_PERM __IFF(XMIT_DST_RELEASE_PERM)
+#define IFF_L3MDEV_MASTER __IFF(L3MDEV_MASTER)
+#define IFF_NO_QUEUE __IFF(NO_QUEUE)
+#define IFF_OPENVSWITCH __IFF(OPENVSWITCH)
+#define IFF_L3MDEV_SLAVE __IFF(L3MDEV_SLAVE)
+#define IFF_TEAM __IFF(TEAM)
+#define IFF_RXFH_CONFIGURED __IFF(RXFH_CONFIGURED)
+#define IFF_PHONY_HEADROOM __IFF(PHONY_HEADROOM)
+#define IFF_MACSEC __IFF(MACSEC)
+#define IFF_NO_RX_HANDLER __IFF(NO_RX_HANDLER)
+#define IFF_FAILOVER __IFF(FAILOVER)
+#define IFF_FAILOVER_SLAVE __IFF(FAILOVER_SLAVE)
+#define IFF_L3MDEV_RX_HANDLER __IFF(L3MDEV_RX_HANDLER)
+#define IFF_LIVE_RENAME_OK __IFF(LIVE_RENAME_OK)
 
 /**
  * struct net_device - The DEVICE structure.
@@ -1876,7 +1887,7 @@ struct net_device {
 
 	/* Read-mostly cache-line for fast-path access */
 	unsigned int flags;
-	unsigned int priv_flags;
+	netdev_priv_flags_t priv_flags;
 	const struct net_device_ops *netdev_ops;
 	int ifindex;
 	unsigned short gflags;

From patchwork Wed Feb 17 12:01:21 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexander Lobakin
X-Patchwork-Id: 12091497
X-Patchwork-Delegate: bpf@iogearbox.net
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,
	INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 2F50EC433DB
	for ; Wed, 17 Feb 2021 12:03:36 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id F050664E5F
	for ; Wed, 17 Feb 2021 12:03:35 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S232496AbhBQMDX (ORCPT ); Wed, 17 Feb 2021 07:03:23 -0500
Received: from mail2.protonmail.ch ([185.70.40.22]:40051 "EHLO
	mail2.protonmail.ch" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S232563AbhBQMCZ (ORCPT ); Wed, 17 Feb 2021 07:02:25 -0500
Date: Wed, 17 Feb 2021 12:01:21 +0000
DKIM-Signature: v=1;
	a=rsa-sha256; c=relaxed/relaxed; d=pm.me;
	s=protonmail; t=1613563282;
	bh=uvsje6kXLionjEyqOvW16bSJRxCvIh50mtSZCtnJSvo=;
	h=Date:To:From:Cc:Reply-To:Subject:In-Reply-To:References:From;
	b=kl/0zkCDlZIHzDLdP/5tOn1YP0vKzpfihguDgaZk8EWz3aLwjyIeJUELieMqjCIGM
	 StSJiu1OwQS4YKpoRsdNqv3A02k+fmOl2mKE2Q5FWECgR87zDRWXTVugtSju+qTHtZ
	 3aM1A9OJtge0ioCnQpDpKLx4KO5e9R/ArpWICScYzx6905sMBXVyLUsyBhp9aupr+r
	 YZny4I8Ix59HJ3rWh22kboNLIMswzbjb/2C85/kUPt0rk083qhgQFOwBcaq9VmFxaE
	 NxE0xFKVZXz+rWoEnfT719F74HayUYByBxGwKug0dFRB4baEuZxykwOKdPyyK8rxGx
	 IBOSJ5UkptpFA==
To: Daniel Borkmann , Magnus Karlsson
From: Alexander Lobakin
Cc: "Michael S. Tsirkin" , Jason Wang , "David S. Miller" ,
	Jakub Kicinski , Jonathan Lemon , Alexei Starovoitov ,
	=?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Jesper Dangaard Brouer ,
	John Fastabend , Andrii Nakryiko , Martin KaFai Lau , Song Liu ,
	Yonghong Song , KP Singh , Paolo Abeni , Eric Dumazet , Xuan Zhuo ,
	Dust Li , Alexander Lobakin ,
	virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, bpf@vger.kernel.org
Reply-To: Alexander Lobakin
Subject: [PATCH v7 bpf-next 3/6] net: add priv_flags for allow tx skb without
 linear
Message-ID: <20210217120003.7938-4-alobakin@pm.me>
In-Reply-To: <20210217120003.7938-1-alobakin@pm.me>
References: <20210217120003.7938-1-alobakin@pm.me>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: bpf@vger.kernel.org
X-Patchwork-Delegate: bpf@iogearbox.net

From: Xuan Zhuo

In some cases, we want to construct an skb directly on top of existing
memory without copying data. In that case, the page is placed directly
in the skb, and the linear space of the skb is empty. Unfortunately,
many network cards do not support this operation.

For example, the Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
reports the following error:

mlx5_core 0000:3b:00.1 eth1: Error cqe on cqn 0x817, ci 0x8, qn 0x1dbb, opcode 0xd, syndrome 0x1, vendor syndrome 0x68
00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000030: 00 00 00 00 60 10 68 01 0a 00 1d bb 00 0f 9f d2
WQE DUMP: WQ size 1024 WQ cur size 0, WQE index 0xf, len: 64
00000000: 00 00 0f 0a 00 1d bb 03 00 00 00 08 00 00 00 00
00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000020: 00 00 00 2b 00 08 00 00 00 00 00 05 9e e3 08 00
00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
mlx5_core 0000:3b:00.1 eth1: ERR CQE on SQ: 0x1dbb

So a priv_flag is added here to indicate whether the network card
supports this feature.

Signed-off-by: Xuan Zhuo
Suggested-by: Alexander Lobakin
[ alobakin: give a new flag more detailed description ]
Signed-off-by: Alexander Lobakin
---
 include/linux/netdevice.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2c1a642ecdc0..1186ba901ad3 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1518,6 +1518,8 @@ struct net_device_ops {
  * @IFF_FAILOVER_SLAVE_BIT: device is lower dev of a failover master device
  * @IFF_L3MDEV_RX_HANDLER_BIT: only invoke the rx handler of L3 master device
  * @IFF_LIVE_RENAME_OK_BIT: rename is allowed while device is up and running
+ * @IFF_TX_SKB_NO_LINEAR_BIT: device/driver is capable of xmitting frames with
+ *	skb_headlen(skb) == 0 (data starts from frag0)
  *
  * @NETDEV_PRIV_FLAG_COUNT: total priv flags count
  */
@@ -1553,6 +1555,7 @@ enum netdev_priv_flags {
 	IFF_FAILOVER_SLAVE_BIT,
 	IFF_L3MDEV_RX_HANDLER_BIT,
 	IFF_LIVE_RENAME_OK_BIT,
+	IFF_TX_SKB_NO_LINEAR_BIT,
 
 	NETDEV_PRIV_FLAG_COUNT,
 };
@@ -1595,6 +1598,7 @@ static_assert(sizeof(netdev_priv_flags_t) * BITS_PER_BYTE >=
 #define IFF_FAILOVER_SLAVE __IFF(FAILOVER_SLAVE)
 #define IFF_L3MDEV_RX_HANDLER __IFF(L3MDEV_RX_HANDLER)
 #define IFF_LIVE_RENAME_OK __IFF(LIVE_RENAME_OK)
+#define IFF_TX_SKB_NO_LINEAR __IFF(TX_SKB_NO_LINEAR)
 
 /**
  * struct net_device - The DEVICE structure.

From patchwork Wed Feb 17 12:01:35 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexander Lobakin
X-Patchwork-Id: 12091495
X-Patchwork-Delegate: bpf@iogearbox.net
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,
	INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED
	autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id B806BC433E9
	for ; Wed, 17 Feb 2021 12:03:32 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 8148A64E57
	for ; Wed, 17 Feb 2021 12:03:32 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S232110AbhBQMDP (ORCPT ); Wed, 17 Feb 2021 07:03:15 -0500
Received: from mail-40134.protonmail.ch ([185.70.40.134]:30456 "EHLO
	mail-40134.protonmail.ch" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S232546AbhBQMCZ (ORCPT ); Wed, 17 Feb 2021 07:02:25 -0500
Date: Wed, 17 Feb 2021 12:01:35 +0000
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pm.me;
	s=protonmail; t=1613563299;
	bh=lgg1wNkMT8XQOATCoVN+lIKDRAuSPLux4jRMr6UVN+I=;
	h=Date:To:From:Cc:Reply-To:Subject:In-Reply-To:References:From;
	b=mKH57b/z3ygQp6bN9b2bkh7Wr+ip9/4vHvJE7HLfSm5hNvA4WyKWgIPnVTnoMqpXM
	 vtyKe6sDEJ1uflLtcYfameo5ZiDops+pvjmpkoBomfRiDeDDQIUG84i6lzxK90JH8U
	 EGltuSEVQV/LI6GwACuVarzi7128Mmdzq6aN8Kb3X/7mW7XvZy3LpVNEfR+DB9jRzT
	 YoBw9CuWg+G5vb7AA085gATPcVd80A/qV9SeKxgmsQf5Qgwd+CBDP/9YbS37GmD74Y
	 BB2sIh9gO69YfGbcchQAOcj4TmO3Hk1NkvToTI6kTsh4srZKrE7CQtPBIKeONemobO
	 YbGxCnHQyUO3g==
To: Daniel Borkmann , Magnus Karlsson
From: Alexander Lobakin
Cc: "Michael S. Tsirkin" , Jason Wang , "David S. Miller" ,
	Jakub Kicinski , Jonathan Lemon , Alexei Starovoitov ,
	=?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Jesper Dangaard Brouer ,
	John Fastabend , Andrii Nakryiko , Martin KaFai Lau , Song Liu ,
	Yonghong Song , KP Singh , Paolo Abeni , Eric Dumazet , Xuan Zhuo ,
	Dust Li , Alexander Lobakin ,
	virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, bpf@vger.kernel.org
Reply-To: Alexander Lobakin
Subject: [PATCH v7 bpf-next 4/6] virtio-net: support IFF_TX_SKB_NO_LINEAR
Message-ID: <20210217120003.7938-5-alobakin@pm.me>
In-Reply-To: <20210217120003.7938-1-alobakin@pm.me>
References: <20210217120003.7938-1-alobakin@pm.me>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: bpf@vger.kernel.org
X-Patchwork-Delegate: bpf@iogearbox.net

From: Xuan Zhuo

Virtio net supports the case where the skb linear space is empty, so add
priv_flags.

Signed-off-by: Xuan Zhuo
Acked-by: Michael S. Tsirkin
Signed-off-by: Alexander Lobakin
---
 drivers/net/virtio_net.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index ba8e63792549..f2ff6c3906c1 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2972,7 +2972,8 @@ static int virtnet_probe(struct virtio_device *vdev)
 		return -ENOMEM;
 
 	/* Set up network device as normal. */
-	dev->priv_flags |= IFF_UNICAST_FLT | IFF_LIVE_ADDR_CHANGE;
+	dev->priv_flags |= IFF_UNICAST_FLT | IFF_LIVE_ADDR_CHANGE |
+			   IFF_TX_SKB_NO_LINEAR;
 	dev->netdev_ops = &virtnet_netdev;
 	dev->features = NETIF_F_HIGHDMA;
 

From patchwork Wed Feb 17 12:01:46 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexander Lobakin
X-Patchwork-Id: 12091499
X-Patchwork-Delegate: bpf@iogearbox.net
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,
	INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED
	autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 39D77C433E6
	for ; Wed, 17 Feb 2021 12:03:40 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 0E6E764E57
	for ; Wed, 17 Feb 2021 12:03:40 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S232553AbhBQMDb (ORCPT ); Wed, 17 Feb 2021 07:03:31 -0500
Received: from mail-40136.protonmail.ch ([185.70.40.136]:28215 "EHLO
	mail-40136.protonmail.ch" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S232574AbhBQMCe (ORCPT ); Wed, 17 Feb 2021 07:02:34 -0500
Date: Wed, 17 Feb 2021 12:01:46 +0000
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pm.me;
	s=protonmail; t=1613563312;
	bh=rD+msxxQv7QdHAN2CCbl57zhJ/ALAb4aGQXqzumABFM=;
	h=Date:To:From:Cc:Reply-To:Subject:In-Reply-To:References:From;
	b=PT3LFaRlDYIV38fZI3I77r0E6FRjWTnolRzDMij8v+KHiOlF1PrjH6OD/eTLWIG/r
	 vRO5YwIToJaBCQfuG3D8h85eevr57gs7i+gJp8O7wHlSUu4al7QRs91SMIn077+2EP
	 4kYtYfHvMHVtcaW4f5bWsWWJ9vPRwjAoSx2Bhkh9/QRN45SuYmKeujZIcUIcvP0bc6
	 BwMwNEc7+xBMyTU7alJelqBMMVLlanUgXbNCTImULuUCu5gDwKZsHSAH8+Z8/w0PbH
	 RK6FzU95XwQ1b6MFWnLms29T1f6zGNv+w5d0Qr0PbX27GRFMFxtAhAmH3gqAOA6NBc
	 Srtl/5xLIkMfg==
To: Daniel Borkmann , Magnus Karlsson
From: Alexander Lobakin
Cc: "Michael S. Tsirkin" , Jason Wang , "David S. Miller" ,
	Jakub Kicinski , Jonathan Lemon , Alexei Starovoitov ,
	=?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Jesper Dangaard Brouer ,
	John Fastabend , Andrii Nakryiko , Martin KaFai Lau , Song Liu ,
	Yonghong Song , KP Singh , Paolo Abeni , Eric Dumazet , Xuan Zhuo ,
	Dust Li , Alexander Lobakin ,
	virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, bpf@vger.kernel.org
Reply-To: Alexander Lobakin
Subject: [PATCH v7 bpf-next 5/6] xsk: respect device's headroom and tailroom
 on generic xmit path
Message-ID: <20210217120003.7938-6-alobakin@pm.me>
In-Reply-To: <20210217120003.7938-1-alobakin@pm.me>
References: <20210217120003.7938-1-alobakin@pm.me>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: bpf@vger.kernel.org
X-Patchwork-Delegate: bpf@iogearbox.net

xsk_generic_xmit() allocates a new skb and then queues it for
xmitting. The new skb is allocated with a size of exactly desc->len,
so it comes to the driver/device with no reserved headroom and/or
tailroom. Lots of drivers need some headroom (and sometimes tailroom)
to prepend (and/or append) some headers or data, e.g. CPU tags,
device-specific headers/descriptors (LSO, TLS etc.), and in case of
no available space skb_cow_head() will reallocate the skb.
Reallocations are unwanted on fast-path, especially when it comes to
XDP, so generic XSK xmit should reserve the spaces declared in
dev->needed_headroom and dev->needed_tailroom to avoid them.

Note on max(NET_SKB_PAD, L1_CACHE_ALIGN(dev->needed_headroom)):

Usually, output functions reserve LL_RESERVED_SPACE(dev), which
consists of dev->hard_header_len + dev->needed_headroom, aligned
by 16.
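
For illustration only (not part of the patch): a minimal userspace
sketch of the two headroom computations this note compares. The
SKETCH_* constants and sketch_* helpers are hypothetical stand-ins;
the kernel's real LL_RESERVED_SPACE(), NET_SKB_PAD and L1_CACHE_ALIGN()
are architecture-dependent, and a 64-byte cache line is assumed here.

```c
/* Assumed values for illustration; the kernel derives these per-arch. */
#define SKETCH_L1_CACHE_BYTES	64u
#define SKETCH_NET_SKB_PAD	64u	/* kernel: max(32, L1_CACHE_BYTES) */
#define SKETCH_HH_DATA_MOD	16u

/* Round x up to the next multiple of a (a must be a power of two). */
unsigned int sketch_align_up(unsigned int x, unsigned int a)
{
	return (x + a - 1) & ~(a - 1);
}

/*
 * Mirrors the shape of LL_RESERVED_SPACE(): the sum of hard_header_len
 * and needed_headroom is rounded down to a multiple of 16, then one
 * more 16-byte unit is added.
 */
unsigned int sketch_ll_reserved_space(unsigned int hard_header_len,
				      unsigned int needed_headroom)
{
	return ((hard_header_len + needed_headroom) &
		~(SKETCH_HH_DATA_MOD - 1)) + SKETCH_HH_DATA_MOD;
}

/*
 * Mirrors max(NET_SKB_PAD, L1_CACHE_ALIGN(needed_headroom)):
 * hard_header_len is left out, the result is cacheline-aligned and
 * never smaller than the default skb padding.
 */
unsigned int sketch_xsk_headroom(unsigned int needed_headroom)
{
	unsigned int aligned = sketch_align_up(needed_headroom,
					       SKETCH_L1_CACHE_BYTES);

	return aligned > SKETCH_NET_SKB_PAD ? aligned : SKETCH_NET_SKB_PAD;
}
```

Under these assumptions, a device with hard_header_len = 14 and
needed_headroom = 0 gets 64 bytes from the XSK formula, whereas the
usual LL_RESERVED_SPACE-style reservation would be 16 bytes.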
However, on XSK xmit the hard header is already present in the chunk,
so hard_header_len is not needed. But it'd still be better to align
data up to a cacheline, while reserving no less than the driver
requests for headroom. NET_SKB_PAD here is to doubly ensure there will
be no reallocations even when the driver advertises no
needed_headroom, but in fact needs it (a not so rare case).

Fixes: 35fcde7f8deb ("xsk: support for Tx")
Signed-off-by: Alexander Lobakin
Acked-by: Magnus Karlsson
---
 net/xdp/xsk.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 4faabd1ecfd1..143979ea4165 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -454,12 +454,16 @@ static int xsk_generic_xmit(struct sock *sk)
 	struct sk_buff *skb;
 	unsigned long flags;
 	int err = 0;
+	u32 hr, tr;
 
 	mutex_lock(&xs->mutex);
 
 	if (xs->queue_id >= xs->dev->real_num_tx_queues)
 		goto out;
 
+	hr = max(NET_SKB_PAD, L1_CACHE_ALIGN(xs->dev->needed_headroom));
+	tr = xs->dev->needed_tailroom;
+
 	while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) {
 		char *buffer;
 		u64 addr;
@@ -471,11 +475,13 @@ static int xsk_generic_xmit(struct sock *sk)
 		}
 
 		len = desc.len;
-		skb = sock_alloc_send_skb(sk, len, 1, &err);
+		skb = sock_alloc_send_skb(sk, hr + len + tr, 1, &err);
 		if (unlikely(!skb))
 			goto out;
 
+		skb_reserve(skb, hr);
 		skb_put(skb, len);
+
 		addr = desc.addr;
 		buffer = xsk_buff_raw_get_data(xs->pool, addr);
 		err = skb_store_bits(skb, 0, buffer, len);

From patchwork Wed Feb 17 12:01:59 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexander Lobakin
X-Patchwork-Id: 12091501
X-Patchwork-Delegate: bpf@iogearbox.net
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,
	INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED
	autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id B1655C43381
	for ; Wed, 17 Feb 2021 12:03:41 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 8796E64E57
	for ; Wed, 17 Feb 2021 12:03:41 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S232578AbhBQMDg (ORCPT ); Wed, 17 Feb 2021 07:03:36 -0500
Received: from mail-40133.protonmail.ch ([185.70.40.133]:15758 "EHLO
	mail-40133.protonmail.ch" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S232582AbhBQMDB (ORCPT ); Wed, 17 Feb 2021 07:03:01 -0500
Date: Wed, 17 Feb 2021 12:01:59 +0000
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pm.me;
	s=protonmail; t=1613563321;
	bh=OZJJvVw+aRM0kcaqH3GC0aHgPjhIDjrz+ohzOH7Tkpo=;
	h=Date:To:From:Cc:Reply-To:Subject:In-Reply-To:References:From;
	b=nxV/mreQyJ0mA1F1uMARWnnjx0RJUHWn0diaIFMDwZp+bGv6ro71Qk4osWyxON1xF
	 ZeZ4ZAGw1PQgQBoKsynpJ5Ii5tiwsxNv7fYB7jq6zXD0VOS78kLyEqrs0Od/NQMXTA
	 Vf9bxDUZpZlqji7PdQ8eX+LdaAolvBBWqUdxTicuwYMk0q0999/5vuTAeM7VUc5ibN
	 OhuboFRaGtaY/1Iq0HAiFhSAri0YB3LTLo7v1AA8vZ2p71smZl8DPKXTZLyDQ5i6bl
	 M5Uca3ra83D08IGXAo+KCWVnz6yd1GVx3yTZpu5Eg07sVaTibaZsDcbmqwQ5oqg42d
	 FojfMjd0qfZqQ==
To: Daniel Borkmann , Magnus Karlsson
From: Alexander Lobakin
Cc: "Michael S. Tsirkin" , Jason Wang , "David S. Miller" ,
	Jakub Kicinski , Jonathan Lemon , Alexei Starovoitov ,
	=?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , Jesper Dangaard Brouer ,
	John Fastabend , Andrii Nakryiko , Martin KaFai Lau , Song Liu ,
	Yonghong Song , KP Singh , Paolo Abeni , Eric Dumazet , Xuan Zhuo ,
	Dust Li , Alexander Lobakin ,
	virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, bpf@vger.kernel.org
Reply-To: Alexander Lobakin
Subject: [PATCH v7 bpf-next 6/6] xsk: build skb by page (aka generic zerocopy
 xmit)
Message-ID: <20210217120003.7938-7-alobakin@pm.me>
In-Reply-To: <20210217120003.7938-1-alobakin@pm.me>
References: <20210217120003.7938-1-alobakin@pm.me>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: bpf@vger.kernel.org
X-Patchwork-Delegate: bpf@iogearbox.net

From: Xuan Zhuo

This patch constructs the skb directly from pages to save memory copy
overhead.

It relies on IFF_TX_SKB_NO_LINEAR: only network cards whose priv_flags
include IFF_TX_SKB_NO_LINEAR will have the skb built directly from
pages. If the feature is not supported, the data still has to be
copied to construct the skb.

---------------- Performance Testing ------------

The test environment is Aliyun ECS server.
Test cmd:
```
xdpsock -i eth0 -t -S -s
```

Test result data:

size    64      512     1024    1500
copy    1916747 1775988 1600203 1440054
page    1974058 1953655 1945463 1904478
percent 3.0%    10.0%   21.58%  32.3%

Signed-off-by: Xuan Zhuo
Reviewed-by: Dust Li
[ alobakin:
 - expand subject to make it clearer;
 - improve skb->truesize calculation;
 - reserve some headroom in skb for drivers;
 - tailroom is not needed as skb is non-linear ]
Signed-off-by: Alexander Lobakin
Acked-by: Magnus Karlsson
Acked-by: John Fastabend
---
 net/xdp/xsk.c | 120 ++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 96 insertions(+), 24 deletions(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 143979ea4165..a71ed664da0a 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -445,6 +445,97 @@ static void xsk_destruct_skb(struct sk_buff *skb)
 	sock_wfree(skb);
 }
 
+static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
+					      struct xdp_desc *desc)
+{
+	struct xsk_buff_pool *pool = xs->pool;
+	u32 hr, len, ts, offset, copy, copied;
+	struct sk_buff *skb;
+	struct page *page;
+	void *buffer;
+	int err, i;
+	u64 addr;
+
+	hr = max(NET_SKB_PAD, L1_CACHE_ALIGN(xs->dev->needed_headroom));
+
+	skb = sock_alloc_send_skb(&xs->sk, hr, 1, &err);
+	if (unlikely(!skb))
+		return ERR_PTR(err);
+
+	skb_reserve(skb, hr);
+
+	addr = desc->addr;
+	len = desc->len;
+	ts = pool->unaligned ? len : pool->chunk_size;
+
+	buffer = xsk_buff_raw_get_data(pool, addr);
+	offset = offset_in_page(buffer);
+	addr = buffer - pool->addrs;
+
+	for (copied = 0, i = 0; copied < len; i++) {
+		page = pool->umem->pgs[addr >> PAGE_SHIFT];
+		get_page(page);
+
+		copy = min_t(u32, PAGE_SIZE - offset, len - copied);
+		skb_fill_page_desc(skb, i, page, offset, copy);
+
+		copied += copy;
+		addr += copy;
+		offset = 0;
+	}
+
+	skb->len += len;
+	skb->data_len += len;
+	skb->truesize += ts;
+
+	refcount_add(ts, &xs->sk.sk_wmem_alloc);
+
+	return skb;
+}
+
+static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
+				     struct xdp_desc *desc)
+{
+	struct net_device *dev = xs->dev;
+	struct sk_buff *skb;
+
+	if (dev->priv_flags & IFF_TX_SKB_NO_LINEAR) {
+		skb = xsk_build_skb_zerocopy(xs, desc);
+		if (IS_ERR(skb))
+			return skb;
+	} else {
+		u32 hr, tr, len;
+		void *buffer;
+		int err;
+
+		hr = max(NET_SKB_PAD, L1_CACHE_ALIGN(dev->needed_headroom));
+		tr = dev->needed_tailroom;
+		len = desc->len;
+
+		skb = sock_alloc_send_skb(&xs->sk, hr + len + tr, 1, &err);
+		if (unlikely(!skb))
+			return ERR_PTR(err);
+
+		skb_reserve(skb, hr);
+		skb_put(skb, len);
+
+		buffer = xsk_buff_raw_get_data(xs->pool, desc->addr);
+		err = skb_store_bits(skb, 0, buffer, len);
+		if (unlikely(err)) {
+			kfree_skb(skb);
+			return ERR_PTR(err);
+		}
+	}
+
+	skb->dev = dev;
+	skb->priority = xs->sk.sk_priority;
+	skb->mark = xs->sk.sk_mark;
+	skb_shinfo(skb)->destructor_arg = (void *)(long)desc->addr;
+	skb->destructor = xsk_destruct_skb;
+
+	return skb;
+}
+
 static int xsk_generic_xmit(struct sock *sk)
 {
 	struct xdp_sock *xs = xdp_sk(sk);
@@ -454,56 +545,37 @@ static int xsk_generic_xmit(struct sock *sk)
 	struct sk_buff *skb;
 	unsigned long flags;
 	int err = 0;
-	u32 hr, tr;
 
 	mutex_lock(&xs->mutex);
 
 	if (xs->queue_id >= xs->dev->real_num_tx_queues)
 		goto out;
 
-	hr = max(NET_SKB_PAD, L1_CACHE_ALIGN(xs->dev->needed_headroom));
-	tr = xs->dev->needed_tailroom;
-
 	while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) {
-		char *buffer;
-		u64 addr;
-		u32 len;
-
 		if (max_batch-- == 0) {
 			err = -EAGAIN;
 			goto out;
 		}
 
-		len = desc.len;
-		skb = sock_alloc_send_skb(sk, hr + len + tr, 1, &err);
-		if (unlikely(!skb))
+		skb = xsk_build_skb(xs, &desc);
+		if (IS_ERR(skb)) {
+			err = PTR_ERR(skb);
 			goto out;
+		}
 
-		skb_reserve(skb, hr);
-		skb_put(skb, len);
-
-		addr = desc.addr;
-		buffer = xsk_buff_raw_get_data(xs->pool, addr);
-		err = skb_store_bits(skb, 0, buffer, len);
 		/* This is the backpressure mechanism for the Tx path.
 		 * Reserve space in the completion queue and only proceed
 		 * if there is space in it. This avoids having to implement
 		 * any buffering in the Tx path.
 		 */
 		spin_lock_irqsave(&xs->pool->cq_lock, flags);
-		if (unlikely(err) || xskq_prod_reserve(xs->pool->cq)) {
+		if (xskq_prod_reserve(xs->pool->cq)) {
 			spin_unlock_irqrestore(&xs->pool->cq_lock, flags);
 			kfree_skb(skb);
 			goto out;
 		}
 		spin_unlock_irqrestore(&xs->pool->cq_lock, flags);
 
-		skb->dev = xs->dev;
-		skb->priority = sk->sk_priority;
-		skb->mark = sk->sk_mark;
-		skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr;
-		skb->destructor = xsk_destruct_skb;
-
 		err = __dev_direct_xmit(skb, xs->queue_id);
 		if (err == NETDEV_TX_BUSY) {
 			/* Tell user-space to retry the send */