From patchwork Thu Dec 21 13:26:53 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Fijalkowski, Maciej" X-Patchwork-Id: 13501869 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2A14377F39; Thu, 21 Dec 2023 13:27:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="DPv2gPhT" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1703165228; x=1734701228; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=eJyulSxFkLgss8vVHYcG6J+Lt6BM/8SR04Fb1v9dFa8=; b=DPv2gPhTpWO+uz32TyqPsayrBC3h80tDM1tnoPukdCzaN0ZLU5XhjhoU tFMaF/0Hs2J0RspFhUA07B9t1oF5W/VO5+8tf5YYs29UXua0efOnEN3oq ddFcV1XS7jhNyF5WyTnwtD6sZaIrbwfbg2QpdJVqndhvsRonl5GBg1zY7 ibkWdrXZfEsFReWPS3LYhz/76VjD6+wIfIAOWmsiCI9dHUOlEgtkNSs+Z W7uoJ3FvGOym3b0RXgemv33wxmmbbdy9l09tYnYKLeX2PBYMzbPJZ405W rYCPA/ow9ADDDRQ4us8m4QTPUqS8sM430g7X3z4SKZzo9uyZhH94++fOO g==; X-IronPort-AV: E=McAfee;i="6600,9927,10930"; a="3205545" X-IronPort-AV: E=Sophos;i="6.04,293,1695711600"; d="scan'208";a="3205545" Received: from orviesa001.jf.intel.com ([10.64.159.141]) by fmvoesa104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Dec 2023 05:27:08 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.04,293,1695711600"; d="scan'208";a="24955779" Received: from boxer.igk.intel.com ([10.102.20.173]) by orviesa001.jf.intel.com with ESMTP; 21 Dec 2023 05:27:06 -0800 From: Maciej Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, bjorn@kernel.org, maciej.fijalkowski@intel.com, echaudro@redhat.com, lorenzo@kernel.org, tirthendu.sarkar@intel.com Subject: [PATCH v3 bpf 1/4] xsk: recycle buffer in case Rx queue was full Date: Thu, 21 Dec 2023 14:26:53 +0100 Message-Id: <20231221132656.384606-2-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20231221132656.384606-1-maciej.fijalkowski@intel.com> References: <20231221132656.384606-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: bpf@iogearbox.net Add missing xsk_buff_free() call when __xsk_rcv_zc() failed to produce descriptor to XSK Rx queue. Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX") Acked-by: Magnus Karlsson Signed-off-by: Maciej Fijalkowski --- net/xdp/xsk.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 9f13aa3353e3..1eadfac03cc4 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -167,8 +167,10 @@ static int xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len) contd = XDP_PKT_CONTD; err = __xsk_rcv_zc(xs, xskb, len, contd); - if (err || likely(!frags)) - goto out; + if (err) + goto err; + if (likely(!frags)) + return 0; xskb_list = &xskb->pool->xskb_list; list_for_each_entry_safe(pos, tmp, xskb_list, xskb_list_node) { @@ -177,11 +179,13 @@ static int xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len) len = pos->xdp.data_end - pos->xdp.data; err = __xsk_rcv_zc(xs, pos, len, contd); if (err) - return err; + goto err; list_del(&pos->xskb_list_node); } -out: + return 0; +err: + xsk_buff_free(xdp); return err; } From patchwork Thu Dec 21 13:26:54 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Fijalkowski, Maciej" X-Patchwork-Id: 13501870 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 237D86D6C0; Thu, 21 Dec 2023 13:27:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="g/Mln8U9" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1703165231; x=1734701231; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=gy+3JGGJvg5YbIavzYqCZcWzf4+wmHieQCNFt2txYqE=; b=g/Mln8U9mKpUVxmppVxs3Nv4zN1BAmcjRGCd6vjM81DHEEBHRAZ8AO0q MudW7mMqyzzpoXju6jd21sls06BiVjDlzomxYJPdBcvRYMO24B6yP5XAM g/odl1fn0QZD+ultOKFhL+QtrBcfch83KqXMjqZp+FRVmcgsyStVNOGch Xx67QsnkgTjmlQOJjr8XLu2Ukqz1Wt+U3aYw4TohWX+y6CSFAEK7Bo2Wf P0/X9aqF6OISBGCX12Se5G7eZvicTS4oNDqJkfOJgbup7SB+JF44b0ZT2 bClfbaRRkkZgfMyNUCNqjHdks6ZHGhCbSe+liaY27IfOJNaV7lCXAsO+6 A==; X-IronPort-AV: E=McAfee;i="6600,9927,10930"; a="3205560" X-IronPort-AV: E=Sophos;i="6.04,293,1695711600"; d="scan'208";a="3205560" Received: from orviesa001.jf.intel.com ([10.64.159.141]) by fmvoesa104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Dec 2023 05:27:11 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.04,293,1695711600"; d="scan'208";a="24955794" Received: from boxer.igk.intel.com ([10.102.20.173]) by orviesa001.jf.intel.com with ESMTP; 21 Dec 2023 05:27:08 -0800 From: Maciej Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, bjorn@kernel.org, maciej.fijalkowski@intel.com, echaudro@redhat.com, lorenzo@kernel.org, tirthendu.sarkar@intel.com Subject: [PATCH v3 bpf 2/4] xsk: fix usage of multi-buffer BPF helpers for ZC XDP Date: Thu, 21 Dec 2023 14:26:54 +0100 Message-Id: <20231221132656.384606-3-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20231221132656.384606-1-maciej.fijalkowski@intel.com> References: <20231221132656.384606-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: bpf@iogearbox.net Currently when packet is shrunk via bpf_xdp_adjust_tail(), null ptr dereference happens: [1136314.192256] BUG: kernel NULL pointer dereference, address: 0000000000000034 [1136314.203943] #PF: supervisor read access in kernel mode [1136314.213768] #PF: error_code(0x0000) - not-present page [1136314.223550] PGD 0 P4D 0 [1136314.230684] Oops: 0000 [#1] PREEMPT SMP NOPTI [1136314.239621] CPU: 8 PID: 54203 Comm: xdpsock Not tainted 6.6.0+ #257 [1136314.250469] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019 [1136314.265615] RIP: 0010:__xdp_return+0x6c/0x210 [1136314.274653] Code: ad 00 48 8b 47 08 49 89 f8 a8 01 0f 85 9b 01 00 00 0f 1f 44 00 00 f0 41 ff 48 34 75 32 4c 89 c7 e9 79 cd 80 ff 83 fe 03 75 17 41 34 01 0f 85 02 01 00 00 48 89 cf e9 22 cc 1e 00 e9 3d d2 86 [1136314.302907] RSP: 0018:ffffc900089f8db0 EFLAGS: 00010246 [1136314.312967] RAX: ffffc9003168aed0 RBX: ffff8881c3300000 RCX: 0000000000000000 [1136314.324953] RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffffc9003168c000 [1136314.336929] RBP: 0000000000000ae0 R08: 0000000000000002 R09: 0000000000010000 [1136314.348844] R10: ffffc9000e495000 R11: 0000000000000040 R12: 0000000000000001 [1136314.360706] R13: 0000000000000524 R14: ffffc9003168aec0 R15: 0000000000000001 [1136314.373298] FS: 00007f8df8bbcb80(0000) GS:ffff8897e0e00000(0000) knlGS:0000000000000000 [1136314.386105] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [1136314.396532] CR2: 0000000000000034 CR3: 00000001aa912002 CR4: 00000000007706f0 [1136314.408377] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [1136314.420173] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [1136314.431890] PKRU: 55555554 [1136314.439143] Call Trace: [1136314.446058] [1136314.452465] ? __die+0x20/0x70 [1136314.459881] ? page_fault_oops+0x15b/0x440 [1136314.468305] ? exc_page_fault+0x6a/0x150 [1136314.476491] ? asm_exc_page_fault+0x22/0x30 [1136314.484927] ? __xdp_return+0x6c/0x210 [1136314.492863] bpf_xdp_adjust_tail+0x155/0x1d0 [1136314.501269] bpf_prog_ccc47ae29d3b6570_xdp_sock_prog+0x15/0x60 [1136314.511263] ice_clean_rx_irq_zc+0x206/0xc60 [ice] [1136314.520222] ? ice_xmit_zc+0x6e/0x150 [ice] [1136314.528506] ice_napi_poll+0x467/0x670 [ice] [1136314.536858] ? ttwu_do_activate.constprop.0+0x8f/0x1a0 [1136314.546010] __napi_poll+0x29/0x1b0 [1136314.553462] net_rx_action+0x133/0x270 [1136314.561619] __do_softirq+0xbe/0x28e [1136314.569303] do_softirq+0x3f/0x60 This comes from __xdp_return() call with xdp_buff argument passed as NULL which is supposed to be consumed by xsk_buff_free() call. To address this properly, in ZC case, a node that represents the frag being removed has to be pulled out of xskb_list. Introduce appriopriate xsk helpers to do such node operation and use them accordingly within bpf_xdp_adjust_tail(). Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX") Acked-by: Magnus Karlsson # For the xsk header part Signed-off-by: Maciej Fijalkowski --- include/net/xdp_sock_drv.h | 26 +++++++++++++++++++++ net/core/filter.c | 48 +++++++++++++++++++++++++++++++------- 2 files changed, 65 insertions(+), 9 deletions(-) diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h index b62bb8525a5f..3d35ac0f838b 100644 --- a/include/net/xdp_sock_drv.h +++ b/include/net/xdp_sock_drv.h @@ -159,6 +159,23 @@ static inline struct xdp_buff *xsk_buff_get_frag(struct xdp_buff *first) return ret; } +static inline void xsk_buff_del_tail(struct xdp_buff *tail) +{ + struct xdp_buff_xsk *xskb = container_of(tail, struct xdp_buff_xsk, xdp); + + list_del(&xskb->xskb_list_node); +} + +static inline struct xdp_buff *xsk_buff_get_tail(struct xdp_buff *first) +{ + struct xdp_buff_xsk *xskb = container_of(first, struct xdp_buff_xsk, xdp); + struct xdp_buff_xsk *frag; + + frag = list_last_entry(&xskb->pool->xskb_list, struct xdp_buff_xsk, + xskb_list_node); + return &frag->xdp; +} + static inline void xsk_buff_set_size(struct xdp_buff *xdp, u32 size) { xdp->data = xdp->data_hard_start + XDP_PACKET_HEADROOM; @@ -350,6 +367,15 @@ static inline struct xdp_buff *xsk_buff_get_frag(struct xdp_buff *first) return NULL; } +static inline void xsk_buff_del_tail(struct xdp_buff *tail) +{ +} + +static inline struct xdp_buff *xsk_buff_get_tail(struct xdp_buff *first) +{ + return NULL; +} + static inline void xsk_buff_set_size(struct xdp_buff *xdp, u32 size) { } diff --git a/net/core/filter.c b/net/core/filter.c index 24061f29c9dd..1e20196687fd 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -83,6 +83,7 @@ #include #include #include +#include #include "dev.h" @@ -4096,6 +4097,42 @@ static int bpf_xdp_frags_increase_tail(struct xdp_buff *xdp, int offset) return 0; } +static void __shrink_data(struct xdp_buff *xdp, struct xdp_mem_info *mem_info, + skb_frag_t *frag, int shrink) +{ + if (mem_info->type == MEM_TYPE_XSK_BUFF_POOL) { + struct xdp_buff *tail = xsk_buff_get_tail(xdp); + + if (tail) + tail->data_end -= shrink; + } + skb_frag_size_sub(frag, shrink); +} + +static bool shrink_data(struct xdp_buff *xdp, skb_frag_t *frag, int shrink) +{ + struct xdp_mem_info *mem_info = &xdp->rxq->mem; + + if (skb_frag_size(frag) == shrink) { + struct page *page = skb_frag_page(frag); + struct xdp_buff *zc_frag = NULL; + + if (mem_info->type == MEM_TYPE_XSK_BUFF_POOL) { + zc_frag = xsk_buff_get_tail(xdp); + + if (zc_frag) { + xdp_buff_clear_frags_flag(zc_frag); + xsk_buff_del_tail(zc_frag); + } + } + + __xdp_return(page_address(page), mem_info, false, zc_frag); + return true; + } + __shrink_data(xdp, mem_info, frag, shrink); + return false; +} + static int bpf_xdp_frags_shrink_tail(struct xdp_buff *xdp, int offset) { struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp); @@ -4110,17 +4147,10 @@ static int bpf_xdp_frags_shrink_tail(struct xdp_buff *xdp, int offset) len_free += shrink; offset -= shrink; - - if (skb_frag_size(frag) == shrink) { - struct page *page = skb_frag_page(frag); - - __xdp_return(page_address(page), &xdp->rxq->mem, - false, NULL); + if (shrink_data(xdp, frag, shrink)) n_frags_free++; - } else { - skb_frag_size_sub(frag, shrink); + else break; - } } sinfo->nr_frags -= n_frags_free; sinfo->xdp_frags_size -= len_free; From patchwork Thu Dec 21 13:26:55 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Fijalkowski, Maciej" X-Patchwork-Id: 13501871 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0DF326D6D6; Thu, 21 Dec 2023 13:27:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="MEpLdwoF" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1703165234; x=1734701234; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=cOVAeHrs0kqbdh4sWPWFBuE+qdHFXP1QuAfmOSxXlpw=; b=MEpLdwoF6yeWpaRqRVwzShFoejY56SpOm60o/3YNY7EDHxCL8F1cyGrs O98SnX3dtAiYECpVsT0BB2uydaKD4esdkfBokQhDmoJc8dtPDOJnsSNDN wBGe5GYUcRM5UNKTt8RN1nAurKf85luR9Pk5ID7CKK0AqFbsvm9cqZ1B1 mUCn6V2M8ww7Ilh6tHln1P0Etsdc89vle0rtRdktNzlv2bqFD9jN4dyUy 8sDPY1GeiMtrLB7M0NqZSlJfVNwZRocqr/6aLTdqB/Xg1j2RZ3RKJORcN +80HXoWnh1YmEUk0NFAmTJ7G32QkWEPbm+wyYo3+3xhrn8TfI/Q4hCUxI w==; X-IronPort-AV: E=McAfee;i="6600,9927,10930"; a="3205569" X-IronPort-AV: E=Sophos;i="6.04,293,1695711600"; d="scan'208";a="3205569" Received: from orviesa001.jf.intel.com ([10.64.159.141]) by fmvoesa104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Dec 2023 05:27:14 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.04,293,1695711600"; d="scan'208";a="24955805" Received: from boxer.igk.intel.com ([10.102.20.173]) by orviesa001.jf.intel.com with ESMTP; 21 Dec 2023 05:27:11 -0800 From: Maciej Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, bjorn@kernel.org, maciej.fijalkowski@intel.com, echaudro@redhat.com, lorenzo@kernel.org, tirthendu.sarkar@intel.com Subject: [PATCH v3 bpf 3/4] ice: work on pre-XDP prog frag count Date: Thu, 21 Dec 2023 14:26:55 +0100 Message-Id: <20231221132656.384606-4-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20231221132656.384606-1-maciej.fijalkowski@intel.com> References: <20231221132656.384606-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: bpf@iogearbox.net Fix an OOM panic in XDP_DRV mode when a XDP program shrinks a multi-buffer packet by 4k bytes and then redirects it to an AF_XDP socket. Since support for handling multi-buffer frames was added to XDP, usage of bpf_xdp_adjust_tail() helper within XDP program can free the page that given fragment occupies and in turn decrease the fragment count within skb_shared_info that is embedded in xdp_buff struct. In current ice driver codebase, it can become problematic when page recycling logic decides not to reuse the page. In such case, __page_frag_cache_drain() is used with ice_rx_buf::pagecnt_bias that was not adjusted after refcount of page was changed by XDP prog which in turn does not drain the refcount to 0 and page is never freed. To address this, let us store the count of frags before the XDP program was executed on Rx ring struct. This will be used to compare with current frag count from skb_shared_info embedded in xdp_buff. A smaller value in the latter indicates that XDP prog freed frag(s). Then, for given delta decrement pagecnt_bias for XDP_DROP verdict. While at it, let us also handle the EOP frag within ice_set_rx_bufs_act() to make our life easier, so all of the adjustments needed to be applied against freed frags are performed in the single place. Fixes: 2fba7dc5157b ("ice: Add support for XDP multi-buffer on Rx side") Signed-off-by: Maciej Fijalkowski --- drivers/net/ethernet/intel/ice/ice_txrx.c | 14 ++++++--- drivers/net/ethernet/intel/ice/ice_txrx.h | 1 + drivers/net/ethernet/intel/ice/ice_txrx_lib.h | 31 +++++++++++++------ 3 files changed, 32 insertions(+), 14 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c index 59617f055e35..1760e81379cc 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c @@ -603,9 +603,7 @@ ice_run_xdp(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp, ret = ICE_XDP_CONSUMED; } exit: - rx_buf->act = ret; - if (unlikely(xdp_buff_has_frags(xdp))) - ice_set_rx_bufs_act(xdp, rx_ring, ret); + ice_set_rx_bufs_act(xdp, rx_ring, ret); } /** @@ -893,14 +891,17 @@ ice_add_xdp_frag(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp, } if (unlikely(sinfo->nr_frags == MAX_SKB_FRAGS)) { - if (unlikely(xdp_buff_has_frags(xdp))) - ice_set_rx_bufs_act(xdp, rx_ring, ICE_XDP_CONSUMED); + ice_set_rx_bufs_act(xdp, rx_ring, ICE_XDP_CONSUMED); return -ENOMEM; } __skb_fill_page_desc_noacc(sinfo, sinfo->nr_frags++, rx_buf->page, rx_buf->page_offset, size); sinfo->xdp_frags_size += size; + /* remember frag count before XDP prog execution; bpf_xdp_adjust_tail() + * can pop off frags but driver has to handle it on its own + */ + rx_ring->nr_frags = sinfo->nr_frags; if (page_is_pfmemalloc(rx_buf->page)) xdp_buff_set_frag_pfmemalloc(xdp); @@ -1251,6 +1252,7 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget) xdp->data = NULL; rx_ring->first_desc = ntc; + rx_ring->nr_frags = 0; continue; construct_skb: if (likely(ice_ring_uses_build_skb(rx_ring))) @@ -1266,10 +1268,12 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget) ICE_XDP_CONSUMED); xdp->data = NULL; rx_ring->first_desc = ntc; + rx_ring->nr_frags = 0; break; } xdp->data = NULL; rx_ring->first_desc = ntc; + rx_ring->nr_frags = 0; stat_err_bits = BIT(ICE_RX_FLEX_DESC_STATUS0_RXE_S); if (unlikely(ice_test_staterr(rx_desc->wb.status_error0, diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h index b3379ff73674..af955b0e5dc5 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.h +++ b/drivers/net/ethernet/intel/ice/ice_txrx.h @@ -358,6 +358,7 @@ struct ice_rx_ring { struct ice_tx_ring *xdp_ring; struct ice_rx_ring *next; /* pointer to next ring in q_vector */ struct xsk_buff_pool *xsk_pool; + u32 nr_frags; dma_addr_t dma; /* physical address of ring */ u16 rx_buf_len; u8 dcb_tc; /* Traffic class of ring */ diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.h b/drivers/net/ethernet/intel/ice/ice_txrx_lib.h index 762047508619..afcead4baef4 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.h +++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.h @@ -12,26 +12,39 @@ * act: action to store onto Rx buffers related to XDP buffer parts * * Set action that should be taken before putting Rx buffer from first frag - * to one before last. Last one is handled by caller of this function as it - * is the EOP frag that is currently being processed. This function is - * supposed to be called only when XDP buffer contains frags. + * to the last. */ static inline void ice_set_rx_bufs_act(struct xdp_buff *xdp, const struct ice_rx_ring *rx_ring, const unsigned int act) { - const struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp); - u32 first = rx_ring->first_desc; - u32 nr_frags = sinfo->nr_frags; + u32 sinfo_frags = xdp_get_shared_info_from_buff(xdp)->nr_frags; + u32 nr_frags = rx_ring->nr_frags + 1; + u32 idx = rx_ring->first_desc; u32 cnt = rx_ring->count; struct ice_rx_buf *buf; for (int i = 0; i < nr_frags; i++) { - buf = &rx_ring->rx_buf[first]; + buf = &rx_ring->rx_buf[idx]; buf->act = act; - if (++first == cnt) - first = 0; + if (++idx == cnt) + idx = 0; + } + + /* adjust pagecnt_bias on frags freed by XDP prog */ + if (sinfo_frags < rx_ring->nr_frags && act == ICE_XDP_CONSUMED) { + u32 delta = rx_ring->nr_frags - sinfo_frags; + + while (delta) { + if (idx == 0) + idx = cnt - 1; + else + idx--; + buf = &rx_ring->rx_buf[idx]; + buf->pagecnt_bias--; + delta--; + } } } From patchwork Thu Dec 21 13:26:56 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Fijalkowski, Maciej" X-Patchwork-Id: 13501872 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 165846D6E7; Thu, 21 Dec 2023 13:27:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="mE7vbur8" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1703165237; x=1734701237; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=q2093ZnsMZvgQaMPyjj3n9Ui20GdMLK+vKTbvIUoFAc=; b=mE7vbur89We6B47HVffqqYirgoBNFgOe5xZgM2IWpP/WhAWml6LUD/Jw qpdHvejaWhFHriPLP3aeNJVERABHLgo5ARrEV/iRemTDlLmGy5IpHV4wh Rw6H5rxXUmdRDcP0u5qqsDJgntXCCH5o0VPiM0A/NStytemVZpDQ7U7S2 VFS58BhkkodBL5juoKb80ost/2dYWEpLeL8tceS4hr5DL3HWkEQWjeyow iQ6Du2Vvp953Ew2hiOftNFKSGCcseMllFjBNv814mVGAtlRq97D2e8QvZ 7RQ0Pmc4qIFmjBffcn1LbVIFvBqIQHEnqsCLZY49LoAHeY+93PGab5VkC Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10930"; a="3205579" X-IronPort-AV: E=Sophos;i="6.04,293,1695711600"; d="scan'208";a="3205579" Received: from orviesa001.jf.intel.com ([10.64.159.141]) by fmvoesa104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Dec 2023 05:27:17 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.04,293,1695711600"; d="scan'208";a="24955815" Received: from boxer.igk.intel.com ([10.102.20.173]) by orviesa001.jf.intel.com with ESMTP; 21 Dec 2023 05:27:14 -0800 From: Maciej Fijalkowski To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com, bjorn@kernel.org, maciej.fijalkowski@intel.com, echaudro@redhat.com, lorenzo@kernel.org, tirthendu.sarkar@intel.com Subject: [PATCH v3 bpf 4/4] i40e: handle multi-buffer packets that are shrunk by xdp prog Date: Thu, 21 Dec 2023 14:26:56 +0100 Message-Id: <20231221132656.384606-5-maciej.fijalkowski@intel.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20231221132656.384606-1-maciej.fijalkowski@intel.com> References: <20231221132656.384606-1-maciej.fijalkowski@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: bpf@iogearbox.net From: Tirthendu Sarkar XDP programs can shrink packets by calling the bpf_xdp_adjust_tail() helper function. For multi-buffer packets this may lead to reduction of frag count stored in skb_shared_info area of the xdp_buff struct. This results in issues with the current handling of XDP_PASS and XDP_DROP cases. For XDP_PASS, currently skb is being built using frag count of xdp_buffer before it was processed by XDP prog and thus will result in an inconsistent skb when frag count gets reduced by XDP prog. To fix this, get correct frag count while building the skb instead of using pre-obtained frag count. For XDP_DROP, current page recycling logic will not reuse the page but instead will adjust the pagecnt_bias so that the page can be freed. This again results in inconsistent behavior as the page count has already been changed by the helper while freeing the frag(s) as part of shrinking the packet. To fix this, only adjust pagecnt_bias for buffers that are stillpart of the packet post-xdp prog run. Fixes: e213ced19bef ("i40e: add support for XDP multi-buffer Rx") Reported-by: Maciej Fijalkowski Tested-by: Maciej Fijalkowski Reviewed-by: Maciej Fijalkowski Signed-off-by: Tirthendu Sarkar --- drivers/net/ethernet/intel/i40e/i40e_txrx.c | 42 ++++++++++++--------- 1 file changed, 24 insertions(+), 18 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c index b82df5bdfac0..b098ca2c11af 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c @@ -2099,7 +2099,8 @@ static void i40e_put_rx_buffer(struct i40e_ring *rx_ring, static void i40e_process_rx_buffs(struct i40e_ring *rx_ring, int xdp_res, struct xdp_buff *xdp) { - u32 next = rx_ring->next_to_clean; + u32 nr_frags = xdp_get_shared_info_from_buff(xdp)->nr_frags; + u32 next = rx_ring->next_to_clean, i = 0; struct i40e_rx_buffer *rx_buffer; xdp->flags = 0; @@ -2112,10 +2113,10 @@ static void i40e_process_rx_buffs(struct i40e_ring *rx_ring, int xdp_res, if (!rx_buffer->page) continue; - if (xdp_res == I40E_XDP_CONSUMED) - rx_buffer->pagecnt_bias++; - else + if (xdp_res != I40E_XDP_CONSUMED) i40e_rx_buffer_flip(rx_buffer, xdp->frame_sz); + else if (i++ <= nr_frags) + rx_buffer->pagecnt_bias++; /* EOP buffer will be put in i40e_clean_rx_irq() */ if (next == rx_ring->next_to_process) @@ -2129,20 +2130,20 @@ static void i40e_process_rx_buffs(struct i40e_ring *rx_ring, int xdp_res, * i40e_construct_skb - Allocate skb and populate it * @rx_ring: rx descriptor ring to transact packets on * @xdp: xdp_buff pointing to the data - * @nr_frags: number of buffers for the packet * * This function allocates an skb. It then populates it with the page * data from the current receive descriptor, taking care to set up the * skb correctly. */ static struct sk_buff *i40e_construct_skb(struct i40e_ring *rx_ring, - struct xdp_buff *xdp, - u32 nr_frags) + struct xdp_buff *xdp) { unsigned int size = xdp->data_end - xdp->data; struct i40e_rx_buffer *rx_buffer; + struct skb_shared_info *sinfo; unsigned int headlen; struct sk_buff *skb; + u32 nr_frags; /* prefetch first cache line of first page */ net_prefetch(xdp->data); @@ -2180,6 +2181,10 @@ static struct sk_buff *i40e_construct_skb(struct i40e_ring *rx_ring, memcpy(__skb_put(skb, headlen), xdp->data, ALIGN(headlen, sizeof(long))); + if (unlikely(xdp_buff_has_frags(xdp))) { + sinfo = xdp_get_shared_info_from_buff(xdp); + nr_frags = sinfo->nr_frags; + } rx_buffer = i40e_rx_bi(rx_ring, rx_ring->next_to_clean); /* update all of the pointers */ size -= headlen; @@ -2199,9 +2204,8 @@ static struct sk_buff *i40e_construct_skb(struct i40e_ring *rx_ring, } if (unlikely(xdp_buff_has_frags(xdp))) { - struct skb_shared_info *sinfo, *skinfo = skb_shinfo(skb); + struct skb_shared_info *skinfo = skb_shinfo(skb); - sinfo = xdp_get_shared_info_from_buff(xdp); memcpy(&skinfo->frags[skinfo->nr_frags], &sinfo->frags[0], sizeof(skb_frag_t) * nr_frags); @@ -2224,17 +2228,17 @@ static struct sk_buff *i40e_construct_skb(struct i40e_ring *rx_ring, * i40e_build_skb - Build skb around an existing buffer * @rx_ring: Rx descriptor ring to transact packets on * @xdp: xdp_buff pointing to the data - * @nr_frags: number of buffers for the packet * * This function builds an skb around an existing Rx buffer, taking care * to set up the skb correctly and avoid any memcpy overhead. */ static struct sk_buff *i40e_build_skb(struct i40e_ring *rx_ring, - struct xdp_buff *xdp, - u32 nr_frags) + struct xdp_buff *xdp) { unsigned int metasize = xdp->data - xdp->data_meta; + struct skb_shared_info *sinfo; struct sk_buff *skb; + u32 nr_frags; /* Prefetch first cache line of first page. If xdp->data_meta * is unused, this points exactly as xdp->data, otherwise we @@ -2243,6 +2247,11 @@ static struct sk_buff *i40e_build_skb(struct i40e_ring *rx_ring, */ net_prefetch(xdp->data_meta); + if (unlikely(xdp_buff_has_frags(xdp))) { + sinfo = xdp_get_shared_info_from_buff(xdp); + nr_frags = sinfo->nr_frags; + } + /* build an skb around the page buffer */ skb = napi_build_skb(xdp->data_hard_start, xdp->frame_sz); if (unlikely(!skb)) @@ -2255,9 +2264,6 @@ static struct sk_buff *i40e_build_skb(struct i40e_ring *rx_ring, skb_metadata_set(skb, metasize); if (unlikely(xdp_buff_has_frags(xdp))) { - struct skb_shared_info *sinfo; - - sinfo = xdp_get_shared_info_from_buff(xdp); xdp_update_skb_shared_info(skb, nr_frags, sinfo->xdp_frags_size, nr_frags * xdp->frame_sz, @@ -2546,7 +2552,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget, /* Update ntc and bump cleaned count if not in the * middle of mb packet. */ - if (rx_ring->next_to_clean == ntp) { + if (!xdp->data) { rx_ring->next_to_clean = rx_ring->next_to_process; cleaned_count++; @@ -2602,9 +2608,9 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget, total_rx_bytes += size; } else { if (ring_uses_build_skb(rx_ring)) - skb = i40e_build_skb(rx_ring, xdp, nfrags); + skb = i40e_build_skb(rx_ring, xdp); else - skb = i40e_construct_skb(rx_ring, xdp, nfrags); + skb = i40e_construct_skb(rx_ring, xdp); /* drop if we failed to retrieve a buffer */ if (!skb) {