From patchwork Mon Feb 5 11:35:12 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lorenzo Bianconi X-Patchwork-Id: 13545296 X-Patchwork-Delegate: kuba@kernel.org Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3FE4718E29; Mon, 5 Feb 2024 11:35:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707132944; cv=none; b=M7zqSFU2qlSm8AkbtvYRAtAPvGI4CquruJPGtk7Bg/viMhCQ5kwyMCHn72qObmUJg2Yd2xDXIwQaXCPxqw/x3dVvuUvvZnYuziS0g/ajQ74NU5iicnWjLzjF0X5+WC+qzHF0rPR4hVQyQYRphJyOqT6pyf2aesvD/4/sYdWksUs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707132944; c=relaxed/simple; bh=zG1TxOZhUAK5wcAuXan93WEIPBI/5Cwiiz9izcb2ytA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=cu/2wQmIRbrY+OhlVHTu2KJrP6ZpB99PEL0t+8y4aT8L20GDbp9CvpAIBHyunaoLSw49JgVzg5b4MEG6I4w47a4NqXK+fQEt9MlxZ89jlAOtZeyICh9EN/gBCXFJ3kLu9AuROiS3pMVu5IEGZYIB/adfJ/cNujeYALoXTBORqKI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=l4QQGKc2; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="l4QQGKc2" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 2BEC4C433F1; Mon, 5 Feb 2024 11:35:43 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1707132943; bh=zG1TxOZhUAK5wcAuXan93WEIPBI/5Cwiiz9izcb2ytA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=l4QQGKc2AGkFPvG8M/JHRY0uEudv+KQ4YDGK95QM1IxspYFALoTg+sbvI+QbQQQtR UzR7OdXhLt/9V/RNqD3hHSEQuJfz2NtoKHYpaaRq0S7ZAxs+qgbUsrxlNjknxrfpmw No+qMQ6DRJJIc9Bl/eYrgIzfFLc0RAAU7sjPl3L1m9rKf/S0E9o4+F+4HCdQo+WlnU 28f94M4yz4ISSR9nbVYasvzyL+e+ubh9TqpWuAlIKsspx5gbCbVvp+vBmVDTCU/vJA N6tt0KmZ5nGFGHXtF9z6B+ao/MqKIHMyr3Pj6tr6pCK0IJZTG5VtFAAsF1brcX6sCY 5pV9Wie5rwsMw== From: Lorenzo Bianconi To: netdev@vger.kernel.org Cc: lorenzo.bianconi@redhat.com, davem@davemloft.net, kuba@kernel.org, edumazet@google.com, pabeni@redhat.com, bpf@vger.kernel.org, toke@redhat.com, willemdebruijn.kernel@gmail.com, jasowang@redhat.com, sdf@google.com, hawk@kernel.org, ilias.apalodimas@linaro.org, linyunsheng@huawei.com Subject: [PATCH v8 net-next 1/4] net: add generic percpu page_pool allocator Date: Mon, 5 Feb 2024 12:35:12 +0100 Message-ID: <1b7af9f7bac0d144e3fd44dc62320763349e728b.1707132752.git.lorenzo@kernel.org> X-Mailer: git-send-email 2.43.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org Introduce generic percpu page_pools allocator. Moreover add page_pool_create_percpu() and cpuid filed in page_pool struct in order to recycle the page in the page_pool "hot" cache if napi_pp_put_page() is running on the same cpu. This is a preliminary patch to add xdp multi-buff support for xdp running in generic mode. Acked-by: Jesper Dangaard Brouer Reviewed-by: Toke Hoiland-Jorgensen Signed-off-by: Lorenzo Bianconi Acked-by: Ilias Apalodimas --- include/net/page_pool/types.h | 3 +++ net/core/dev.c | 45 +++++++++++++++++++++++++++++++++++ net/core/page_pool.c | 23 ++++++++++++++---- net/core/skbuff.c | 5 ++-- 4 files changed, 70 insertions(+), 6 deletions(-) diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h index 76481c465375..3828396ae60c 100644 --- a/include/net/page_pool/types.h +++ b/include/net/page_pool/types.h @@ -128,6 +128,7 @@ struct page_pool_stats { struct page_pool { struct page_pool_params_fast p; + int cpuid; bool has_init_callback; long frag_users; @@ -203,6 +204,8 @@ struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp); struct page *page_pool_alloc_frag(struct page_pool *pool, unsigned int *offset, unsigned int size, gfp_t gfp); struct page_pool *page_pool_create(const struct page_pool_params *params); +struct page_pool *page_pool_create_percpu(const struct page_pool_params *params, + int cpuid); struct xdp_mem_info; diff --git a/net/core/dev.c b/net/core/dev.c index 27ba057d06c4..235421d313c3 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -153,6 +153,8 @@ #include #include #include +#include +#include #include "dev.h" #include "net-sysfs.h" @@ -450,6 +452,12 @@ static RAW_NOTIFIER_HEAD(netdev_chain); DEFINE_PER_CPU_ALIGNED(struct softnet_data, softnet_data); EXPORT_PER_CPU_SYMBOL(softnet_data); +/* Page_pool has a lockless array/stack to alloc/recycle pages. + * PP consumers must pay attention to run APIs in the appropriate context + * (e.g. NAPI context). + */ +static DEFINE_PER_CPU_ALIGNED(struct page_pool *, system_page_pool); + #ifdef CONFIG_LOCKDEP /* * register_netdevice() inits txq->_xmit_lock and sets lockdep class @@ -11697,6 +11705,27 @@ static void __init net_dev_struct_check(void) * */ +/* We allocate 256 pages for each CPU if PAGE_SHIFT is 12 */ +#define SYSTEM_PERCPU_PAGE_POOL_SIZE ((1 << 20) / PAGE_SIZE) + +static int net_page_pool_create(int cpuid) +{ +#if IS_ENABLED(CONFIG_PAGE_POOL) + struct page_pool_params page_pool_params = { + .pool_size = SYSTEM_PERCPU_PAGE_POOL_SIZE, + .nid = NUMA_NO_NODE, + }; + struct page_pool *pp_ptr; + + pp_ptr = page_pool_create_percpu(&page_pool_params, cpuid); + if (IS_ERR(pp_ptr)) + return -ENOMEM; + + per_cpu(system_page_pool, cpuid) = pp_ptr; +#endif + return 0; +} + /* * This is called single threaded during boot, so no need * to take the rtnl semaphore. @@ -11749,6 +11778,9 @@ static int __init net_dev_init(void) init_gro_hash(&sd->backlog); sd->backlog.poll = process_backlog; sd->backlog.weight = weight_p; + + if (net_page_pool_create(i)) + goto out; } dev_boot_phase = 0; @@ -11776,6 +11808,19 @@ static int __init net_dev_init(void) WARN_ON(rc < 0); rc = 0; out: + if (rc < 0) { + for_each_possible_cpu(i) { + struct page_pool *pp_ptr; + + pp_ptr = per_cpu(system_page_pool, i); + if (!pp_ptr) + continue; + + page_pool_destroy(pp_ptr); + per_cpu(system_page_pool, i) = NULL; + } + } + return rc; } diff --git a/net/core/page_pool.c b/net/core/page_pool.c index 4933762e5a6b..89c835fcf094 100644 --- a/net/core/page_pool.c +++ b/net/core/page_pool.c @@ -171,13 +171,16 @@ static void page_pool_producer_unlock(struct page_pool *pool, } static int page_pool_init(struct page_pool *pool, - const struct page_pool_params *params) + const struct page_pool_params *params, + int cpuid) { unsigned int ring_qsize = 1024; /* Default */ memcpy(&pool->p, ¶ms->fast, sizeof(pool->p)); memcpy(&pool->slow, ¶ms->slow, sizeof(pool->slow)); + pool->cpuid = cpuid; + /* Validate only known flags were used */ if (pool->p.flags & ~(PP_FLAG_ALL)) return -EINVAL; @@ -253,10 +256,12 @@ static void page_pool_uninit(struct page_pool *pool) } /** - * page_pool_create() - create a page pool. + * page_pool_create_percpu() - create a page pool for a given cpu. * @params: parameters, see struct page_pool_params + * @cpuid: cpu identifier */ -struct page_pool *page_pool_create(const struct page_pool_params *params) +struct page_pool * +page_pool_create_percpu(const struct page_pool_params *params, int cpuid) { struct page_pool *pool; int err; @@ -265,7 +270,7 @@ struct page_pool *page_pool_create(const struct page_pool_params *params) if (!pool) return ERR_PTR(-ENOMEM); - err = page_pool_init(pool, params); + err = page_pool_init(pool, params, cpuid); if (err < 0) goto err_free; @@ -282,6 +287,16 @@ struct page_pool *page_pool_create(const struct page_pool_params *params) kfree(pool); return ERR_PTR(err); } +EXPORT_SYMBOL(page_pool_create_percpu); + +/** + * page_pool_create() - create a page pool + * @params: parameters, see struct page_pool_params + */ +struct page_pool *page_pool_create(const struct page_pool_params *params) +{ + return page_pool_create_percpu(params, -1); +} EXPORT_SYMBOL(page_pool_create); static void page_pool_return_page(struct page_pool *pool, struct page *page); diff --git a/net/core/skbuff.c b/net/core/skbuff.c index edbbef563d4d..9e5eb47b4025 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -923,9 +923,10 @@ bool napi_pp_put_page(struct page *page, bool napi_safe) */ if (napi_safe || in_softirq()) { const struct napi_struct *napi = READ_ONCE(pp->p.napi); + unsigned int cpuid = smp_processor_id(); - allow_direct = napi && - READ_ONCE(napi->list_owner) == smp_processor_id(); + allow_direct = napi && READ_ONCE(napi->list_owner) == cpuid; + allow_direct |= (pp->cpuid == cpuid); } /* Driver set this to memory recycling info. Reset it on recycle. From patchwork Mon Feb 5 11:35:13 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lorenzo Bianconi X-Patchwork-Id: 13545297 X-Patchwork-Delegate: kuba@kernel.org Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2CC5D19472; Mon, 5 Feb 2024 11:35:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707132948; cv=none; b=GQ9zEvl0ocMd24m8VfIug8k3VGw7fbJMjPGWLMDtk9SyY1rlSLRlUb+blqtmqE1MR+nnL67xWr0RlgxmTCsJ//eQulzj+qSO3oZy4SIhHF2q4QLn96bJgAeVAaqbb5ofxT2i6J7pjX5engUXRNig+UFgs9ceGqzyZFRE3dRGo7g= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707132948; c=relaxed/simple; bh=Zv9Lrce4BD7qPjKQmWg+2Ft8wIC05bvsbMHpZu4x0mM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=QeRPBfBKDebBD0WbIKrS8L/n7SBHLxY3qJZEDMUvDoZz1PJFftvJ2cSoUfF5MI9xqQyHq/EYsfB603eKk6kGxkRP3wtxPAdtS/oFCUc447MfJbYQ9I3I7qAEeP4WfRtTUBb4b1oVS0CXzN0fEraTVgq8D+ZS5N+0Hv7k3AEJPsU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=LLERJeCM; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="LLERJeCM" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 2AAF4C433F1; Mon, 5 Feb 2024 11:35:47 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1707132947; bh=Zv9Lrce4BD7qPjKQmWg+2Ft8wIC05bvsbMHpZu4x0mM=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=LLERJeCMdxJPGPM8jGJq/UfYWAq8lxU3Z3Q4okkll0WZfS8EOtOdg2AVUEkq3qbc/ ZQZQUljiF9h24ppg/mElrl3Ak7PW2C6TrwKeUbzzVdydWa+15fhb10RTfl9xyNgZR3 Xc0Zj/c9z+xr9yc9eSasEXWUgOMH/tVJa6rpH907fqtP4Y9+qybrxOa/kmeomGFIBT ePxtv4VZrxHAbinfVvaESmk1f4frUNjdwrCcso0Yibap3lonKBlDmanls4rnPN69X3 d2naazI6a9VdUoApYI3GlMaUNZED/WEoZo8DBeJUkTaftz9xAZOk+tyfa4EYWd3IEi BUl1lMmUZF/dQ== From: Lorenzo Bianconi To: netdev@vger.kernel.org Cc: lorenzo.bianconi@redhat.com, davem@davemloft.net, kuba@kernel.org, edumazet@google.com, pabeni@redhat.com, bpf@vger.kernel.org, toke@redhat.com, willemdebruijn.kernel@gmail.com, jasowang@redhat.com, sdf@google.com, hawk@kernel.org, ilias.apalodimas@linaro.org, linyunsheng@huawei.com Subject: [PATCH v8 net-next 2/4] xdp: rely on skb pointer reference in do_xdp_generic and netif_receive_generic_xdp Date: Mon, 5 Feb 2024 12:35:13 +0100 Message-ID: X-Mailer: git-send-email 2.43.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org Rely on skb pointer reference instead of the skb pointer in do_xdp_generic and netif_receive_generic_xdp routine signatures. This is a preliminary patch to add multi-buff support for xdp running in generic mode where we will need to reallocate the skb to avoid linearization and we will need to make it visible to do_xdp_generic() caller. Acked-by: Jesper Dangaard Brouer Reviewed-by: Toke Hoiland-Jorgensen Signed-off-by: Lorenzo Bianconi --- drivers/net/tun.c | 4 ++-- include/linux/netdevice.h | 2 +- net/core/dev.c | 16 +++++++++------- 3 files changed, 12 insertions(+), 10 deletions(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index b472f2c972d8..bc80fc1d576e 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1926,7 +1926,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile, rcu_read_lock(); xdp_prog = rcu_dereference(tun->xdp_prog); if (xdp_prog) { - ret = do_xdp_generic(xdp_prog, skb); + ret = do_xdp_generic(xdp_prog, &skb); if (ret != XDP_PASS) { rcu_read_unlock(); local_bh_enable(); @@ -2516,7 +2516,7 @@ static int tun_xdp_one(struct tun_struct *tun, skb_record_rx_queue(skb, tfile->queue_index); if (skb_xdp) { - ret = do_xdp_generic(xdp_prog, skb); + ret = do_xdp_generic(xdp_prog, &skb); if (ret != XDP_PASS) { ret = 0; goto out; diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 118c40258d07..7eee99a58200 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3958,7 +3958,7 @@ static inline void dev_consume_skb_any(struct sk_buff *skb) u32 bpf_prog_run_generic_xdp(struct sk_buff *skb, struct xdp_buff *xdp, struct bpf_prog *xdp_prog); void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog); -int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb); +int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff **pskb); int netif_rx(struct sk_buff *skb); int __netif_rx(struct sk_buff *skb); diff --git a/net/core/dev.c b/net/core/dev.c index 235421d313c3..0cd25dcac9d9 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4936,10 +4936,11 @@ u32 bpf_prog_run_generic_xdp(struct sk_buff *skb, struct xdp_buff *xdp, return act; } -static u32 netif_receive_generic_xdp(struct sk_buff *skb, +static u32 netif_receive_generic_xdp(struct sk_buff **pskb, struct xdp_buff *xdp, struct bpf_prog *xdp_prog) { + struct sk_buff *skb = *pskb; u32 act = XDP_DROP; /* Reinjected packets coming from act_mirred or similar should @@ -5020,24 +5021,24 @@ void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog) static DEFINE_STATIC_KEY_FALSE(generic_xdp_needed_key); -int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb) +int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff **pskb) { if (xdp_prog) { struct xdp_buff xdp; u32 act; int err; - act = netif_receive_generic_xdp(skb, &xdp, xdp_prog); + act = netif_receive_generic_xdp(pskb, &xdp, xdp_prog); if (act != XDP_PASS) { switch (act) { case XDP_REDIRECT: - err = xdp_do_generic_redirect(skb->dev, skb, + err = xdp_do_generic_redirect((*pskb)->dev, *pskb, &xdp, xdp_prog); if (err) goto out_redir; break; case XDP_TX: - generic_xdp_tx(skb, xdp_prog); + generic_xdp_tx(*pskb, xdp_prog); break; } return XDP_DROP; @@ -5045,7 +5046,7 @@ int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb) } return XDP_PASS; out_redir: - kfree_skb_reason(skb, SKB_DROP_REASON_XDP); + kfree_skb_reason(*pskb, SKB_DROP_REASON_XDP); return XDP_DROP; } EXPORT_SYMBOL_GPL(do_xdp_generic); @@ -5368,7 +5369,8 @@ static int __netif_receive_skb_core(struct sk_buff **pskb, bool pfmemalloc, int ret2; migrate_disable(); - ret2 = do_xdp_generic(rcu_dereference(skb->dev->xdp_prog), skb); + ret2 = do_xdp_generic(rcu_dereference(skb->dev->xdp_prog), + &skb); migrate_enable(); if (ret2 != XDP_PASS) { From patchwork Mon Feb 5 11:35:14 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lorenzo Bianconi X-Patchwork-Id: 13545298 X-Patchwork-Delegate: kuba@kernel.org Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EA1BE199A1; Mon, 5 Feb 2024 11:35:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707132952; cv=none; b=loR1cspRepRZB/bcXOK8YpJgdbAr3m9lBNGo+xv8Hzcr8/UoWL9qHJ/bN1zmuPOrvJ29YjbtL143wKWRIEO4CPVpZFgpUOlQh2qlwZidOhG3VBYRPqEMkzCJMtLmm1itEIIWbjw5GsKtewW3Br4vAHYyz6dbid7GOf007OHthm8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707132952; c=relaxed/simple; bh=Yt3pdKfgSXpxlGDJWl+NR1sRJOtDzhYXOhv8fFfYiZs=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=HtwgYgv9is0Ui+vSwEMWONxbNS1RPwKIu9ghRUD/Ma6ZsC8GftM+4oH05bwNQhISDOGANyBTt8dSdL4KrdyCeaaUmu/wtWcbk6oGS7uM/Kwv2IVRCy852QVeTKiG7pqTIyuOimDR1okQRl7GqS+/TTn72EB8wniUKvEGu6L9SO4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=RWATxO06; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="RWATxO06" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0CB02C433C7; Mon, 5 Feb 2024 11:35:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1707132951; bh=Yt3pdKfgSXpxlGDJWl+NR1sRJOtDzhYXOhv8fFfYiZs=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=RWATxO06qCWUFIXe7r8+wBOEc2UjNy2w0JhdGP2mnc4z2TCYBZxKvT7oDDVVzXgX/ DhH5PgEbdenuFs6AKvYt1jLj0fcQwW2TxLU0jj+Chpa8qV1KqqWw+4E/Kz6rMorE1P f3MsscUShORWZ1XHE+Mz8MVjWyEOg5z1O6haXeiMY58xb82GqF2xmaiUP5FkivbZ8+ yEgjkB011u66BQ57nl0Rg9mbi4HMWXmNSCyVlEjFaS+6kMCmrX/1M9kx/G87tvBugi q93Zl/r9mRaKMz3yE2ydJoe0RIhLNFpjvY4bL4jY2HHZ2VKCUJ8F0D2P+ySqjeZmkw 4iH5D5jgfzDtQ== From: Lorenzo Bianconi To: netdev@vger.kernel.org Cc: lorenzo.bianconi@redhat.com, davem@davemloft.net, kuba@kernel.org, edumazet@google.com, pabeni@redhat.com, bpf@vger.kernel.org, toke@redhat.com, willemdebruijn.kernel@gmail.com, jasowang@redhat.com, sdf@google.com, hawk@kernel.org, ilias.apalodimas@linaro.org, linyunsheng@huawei.com Subject: [PATCH v8 net-next 3/4] xdp: add multi-buff support for xdp running in generic mode Date: Mon, 5 Feb 2024 12:35:14 +0100 Message-ID: <75a3f36681379289f70ce536b5bf9ac984e05f49.1707132752.git.lorenzo@kernel.org> X-Mailer: git-send-email 2.43.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org Similar to native xdp, do not always linearize the skb in netif_receive_generic_xdp routine but create a non-linear xdp_buff to be processed by the eBPF program. This allow to add multi-buffer support for xdp running in generic mode. Acked-by: Jesper Dangaard Brouer Reviewed-by: Toke Hoiland-Jorgensen Signed-off-by: Lorenzo Bianconi --- include/linux/skbuff.h | 2 + net/core/dev.c | 70 +++++++++++++++++++++++--------- net/core/skbuff.c | 91 ++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 144 insertions(+), 19 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 2dde34c29203..def3d8689c3d 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -3446,6 +3446,8 @@ static inline void skb_frag_ref(struct sk_buff *skb, int f) __skb_frag_ref(&skb_shinfo(skb)->frags[f]); } +int skb_cow_data_for_xdp(struct page_pool *pool, struct sk_buff **pskb, + struct bpf_prog *prog); bool napi_pp_put_page(struct page *page, bool napi_safe); static inline void diff --git a/net/core/dev.c b/net/core/dev.c index 0cd25dcac9d9..78afaf31f61e 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4874,6 +4874,12 @@ u32 bpf_prog_run_generic_xdp(struct sk_buff *skb, struct xdp_buff *xdp, xdp_init_buff(xdp, frame_sz, &rxqueue->xdp_rxq); xdp_prepare_buff(xdp, hard_start, skb_headroom(skb) - mac_len, skb_headlen(skb) + mac_len, true); + if (skb_is_nonlinear(skb)) { + skb_shinfo(skb)->xdp_frags_size = skb->data_len; + xdp_buff_set_frags_flag(xdp); + } else { + xdp_buff_clear_frags_flag(xdp); + } orig_data_end = xdp->data_end; orig_data = xdp->data; @@ -4903,6 +4909,14 @@ u32 bpf_prog_run_generic_xdp(struct sk_buff *skb, struct xdp_buff *xdp, skb->len += off; /* positive on grow, negative on shrink */ } + /* XDP frag metadata (e.g. nr_frags) are updated in eBPF helpers + * (e.g. bpf_xdp_adjust_tail), we need to update data_len here. + */ + if (xdp_buff_has_frags(xdp)) + skb->data_len = skb_shinfo(skb)->xdp_frags_size; + else + skb->data_len = 0; + /* check if XDP changed eth hdr such SKB needs update */ eth = (struct ethhdr *)xdp->data; if ((orig_eth_type != eth->h_proto) || @@ -4936,12 +4950,35 @@ u32 bpf_prog_run_generic_xdp(struct sk_buff *skb, struct xdp_buff *xdp, return act; } +static int +netif_skb_check_for_xdp(struct sk_buff **pskb, struct bpf_prog *prog) +{ + struct sk_buff *skb = *pskb; + int err, hroom, troom; + + if (!skb_cow_data_for_xdp(this_cpu_read(system_page_pool), pskb, prog)) + return 0; + + /* In case we have to go down the path and also linearize, + * then lets do the pskb_expand_head() work just once here. + */ + hroom = XDP_PACKET_HEADROOM - skb_headroom(skb); + troom = skb->tail + skb->data_len - skb->end; + err = pskb_expand_head(skb, + hroom > 0 ? ALIGN(hroom, NET_SKB_PAD) : 0, + troom > 0 ? troom + 128 : 0, GFP_ATOMIC); + if (err) + return err; + + return skb_linearize(skb); +} + static u32 netif_receive_generic_xdp(struct sk_buff **pskb, struct xdp_buff *xdp, struct bpf_prog *xdp_prog) { struct sk_buff *skb = *pskb; - u32 act = XDP_DROP; + u32 mac_len, act = XDP_DROP; /* Reinjected packets coming from act_mirred or similar should * not get XDP generic processing. @@ -4949,41 +4986,36 @@ static u32 netif_receive_generic_xdp(struct sk_buff **pskb, if (skb_is_redirected(skb)) return XDP_PASS; - /* XDP packets must be linear and must have sufficient headroom - * of XDP_PACKET_HEADROOM bytes. This is the guarantee that also - * native XDP provides, thus we need to do it here as well. + /* XDP packets must have sufficient headroom of XDP_PACKET_HEADROOM + * bytes. This is the guarantee that also native XDP provides, + * thus we need to do it here as well. */ + mac_len = skb->data - skb_mac_header(skb); + __skb_push(skb, mac_len); + if (skb_cloned(skb) || skb_is_nonlinear(skb) || skb_headroom(skb) < XDP_PACKET_HEADROOM) { - int hroom = XDP_PACKET_HEADROOM - skb_headroom(skb); - int troom = skb->tail + skb->data_len - skb->end; - - /* In case we have to go down the path and also linearize, - * then lets do the pskb_expand_head() work just once here. - */ - if (pskb_expand_head(skb, - hroom > 0 ? ALIGN(hroom, NET_SKB_PAD) : 0, - troom > 0 ? troom + 128 : 0, GFP_ATOMIC)) - goto do_drop; - if (skb_linearize(skb)) + if (netif_skb_check_for_xdp(pskb, xdp_prog)) goto do_drop; } - act = bpf_prog_run_generic_xdp(skb, xdp, xdp_prog); + __skb_pull(*pskb, mac_len); + + act = bpf_prog_run_generic_xdp(*pskb, xdp, xdp_prog); switch (act) { case XDP_REDIRECT: case XDP_TX: case XDP_PASS: break; default: - bpf_warn_invalid_xdp_action(skb->dev, xdp_prog, act); + bpf_warn_invalid_xdp_action((*pskb)->dev, xdp_prog, act); fallthrough; case XDP_ABORTED: - trace_xdp_exception(skb->dev, xdp_prog, act); + trace_xdp_exception((*pskb)->dev, xdp_prog, act); fallthrough; case XDP_DROP: do_drop: - kfree_skb(skb); + kfree_skb(*pskb); break; } diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 9e5eb47b4025..bdb94749f05d 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -895,6 +895,97 @@ static bool is_pp_page(struct page *page) return (page->pp_magic & ~0x3UL) == PP_SIGNATURE; } +static int skb_pp_cow_data(struct page_pool *pool, struct sk_buff **pskb, + unsigned int headroom) +{ +#if IS_ENABLED(CONFIG_PAGE_POOL) + u32 size, truesize, len, max_head_size, off; + struct sk_buff *skb = *pskb, *nskb; + int err, i, head_off; + void *data; + + /* XDP does not support fraglist so we need to linearize + * the skb. + */ + if (skb_has_frag_list(skb)) + return -EOPNOTSUPP; + + max_head_size = SKB_WITH_OVERHEAD(PAGE_SIZE - headroom); + if (skb->len > max_head_size + MAX_SKB_FRAGS * PAGE_SIZE) + return -ENOMEM; + + size = min_t(u32, skb->len, max_head_size); + truesize = SKB_HEAD_ALIGN(size) + headroom; + data = page_pool_dev_alloc_va(pool, &truesize); + if (!data) + return -ENOMEM; + + nskb = napi_build_skb(data, truesize); + if (!nskb) { + page_pool_free_va(pool, data, true); + return -ENOMEM; + } + + skb_reserve(nskb, headroom); + skb_copy_header(nskb, skb); + skb_mark_for_recycle(nskb); + + err = skb_copy_bits(skb, 0, nskb->data, size); + if (err) { + consume_skb(nskb); + return err; + } + skb_put(nskb, size); + + head_off = skb_headroom(nskb) - skb_headroom(skb); + skb_headers_offset_update(nskb, head_off); + + off = size; + len = skb->len - off; + for (i = 0; i < MAX_SKB_FRAGS && off < skb->len; i++) { + struct page *page; + u32 page_off; + + size = min_t(u32, len, PAGE_SIZE); + truesize = size; + + page = page_pool_dev_alloc(pool, &page_off, &truesize); + if (!data) { + consume_skb(nskb); + return -ENOMEM; + } + + skb_add_rx_frag(nskb, i, page, page_off, size, truesize); + err = skb_copy_bits(skb, off, page_address(page) + page_off, + size); + if (err) { + consume_skb(nskb); + return err; + } + + len -= size; + off += size; + } + + consume_skb(skb); + *pskb = nskb; + + return 0; +#else + return -EOPNOTSUPP; +#endif +} + +int skb_cow_data_for_xdp(struct page_pool *pool, struct sk_buff **pskb, + struct bpf_prog *prog) +{ + if (!prog->aux->xdp_has_frags) + return -EINVAL; + + return skb_pp_cow_data(pool, pskb, XDP_PACKET_HEADROOM); +} +EXPORT_SYMBOL(skb_cow_data_for_xdp); + #if IS_ENABLED(CONFIG_PAGE_POOL) bool napi_pp_put_page(struct page *page, bool napi_safe) { From patchwork Mon Feb 5 11:35:15 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lorenzo Bianconi X-Patchwork-Id: 13545299 X-Patchwork-Delegate: kuba@kernel.org Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 949A0199B8; Mon, 5 Feb 2024 11:35:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707132955; cv=none; b=jqMShpGiHcOuBhyOp9Tqq76D14HjkatHYcKSjx0Ea3OMsyQvgGxcsAY8fqE1jHfkeIr2y8ZTze6EyrGb+BzAWARAdiUCPeSCenOX31eLmndgGHtiPK6UVezkgQfG3CAmvsvjwqKqGGkO3h1ehQ4MpSg1eS1s3asNQc9c21qMO4E= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707132955; c=relaxed/simple; bh=npnGTxqdAfxgLB2h7ew4AfQNQpV2lZxre+vKff1Nc9U=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=UMafEv77CEvzfrnHHVMPHGemD9+o0Bry1mq7dPhg5Somp+0/w1ArvoXBsXaMo5i+s28Gsg/DD4dUch5z97TkKuB4WoyUwzFciDsmZgsGeXf3mlE5e49ZZu0rIoyzZg/rk1Zi+6xLhXzILgwIdGef0CloL58RI4bUxkQvLZf1nTQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=sUYmeCYS; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="sUYmeCYS" Received: by smtp.kernel.org (Postfix) with ESMTPSA id DFEBEC433C7; Mon, 5 Feb 2024 11:35:54 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1707132955; bh=npnGTxqdAfxgLB2h7ew4AfQNQpV2lZxre+vKff1Nc9U=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=sUYmeCYSimpi1aLTo3AHZwqHvFjX57nknHyXoKLknTGF20IDIPx3iRzdxx3fwqC4e H1lze1Hotf2vICSPJjRLq4QA4Mu5r0y5KVjIEmZ4390R2Eqi3Hi7tRG28H8web9iwv kYhZ9PEJA++fr0iXXUh6TOx38XWM9ewbwyeAJzbrjkHDisnPn1LMLTB8k9bLOimNt3 PNiGvQl//4RKX/r2eG8sE1zie5JuA/duDQTZs6+xxtAlYikETM3m1hEyX9CvItaFq4 ZAjCVrs4ZFXIXgB4vgGicUW6SckMvKYVdmcBEW7AvkqUE7pMzkBwYIicNrNTpHK48Q esWHiCXANsw9w== From: Lorenzo Bianconi To: netdev@vger.kernel.org Cc: lorenzo.bianconi@redhat.com, davem@davemloft.net, kuba@kernel.org, edumazet@google.com, pabeni@redhat.com, bpf@vger.kernel.org, toke@redhat.com, willemdebruijn.kernel@gmail.com, jasowang@redhat.com, sdf@google.com, hawk@kernel.org, ilias.apalodimas@linaro.org, linyunsheng@huawei.com Subject: [PATCH v8 net-next 4/4] veth: rely on skb_cow_data_for_xdp utility routine Date: Mon, 5 Feb 2024 12:35:15 +0100 Message-ID: <0585c5f792121d8cfb29d8e0224b00d1e4a30f84.1707132752.git.lorenzo@kernel.org> X-Mailer: git-send-email 2.43.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org Rely on skb_cow_data_for_xdp utility routine and remove duplicated code. Acked-by: Jesper Dangaard Brouer Reviewed-by: Toke Hoiland-Jorgensen Signed-off-by: Lorenzo Bianconi --- drivers/net/veth.c | 79 +++------------------------------------------- 1 file changed, 5 insertions(+), 74 deletions(-) diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 578e36ea1589..a7a541c1a374 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -721,7 +721,8 @@ static void veth_xdp_get(struct xdp_buff *xdp) static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq, struct xdp_buff *xdp, - struct sk_buff **pskb) + struct sk_buff **pskb, + struct bpf_prog *prog) { struct sk_buff *skb = *pskb; u32 frame_sz; @@ -729,80 +730,10 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq, if (skb_shared(skb) || skb_head_is_locked(skb) || skb_shinfo(skb)->nr_frags || skb_headroom(skb) < XDP_PACKET_HEADROOM) { - u32 size, len, max_head_size, off, truesize, page_offset; - struct sk_buff *nskb; - struct page *page; - int i, head_off; - void *va; - - /* We need a private copy of the skb and data buffers since - * the ebpf program can modify it. We segment the original skb - * into order-0 pages without linearize it. - * - * Make sure we have enough space for linear and paged area - */ - max_head_size = SKB_WITH_OVERHEAD(PAGE_SIZE - - VETH_XDP_HEADROOM); - if (skb->len > PAGE_SIZE * MAX_SKB_FRAGS + max_head_size) - goto drop; - - size = min_t(u32, skb->len, max_head_size); - truesize = SKB_HEAD_ALIGN(size) + VETH_XDP_HEADROOM; - - /* Allocate skb head */ - va = page_pool_dev_alloc_va(rq->page_pool, &truesize); - if (!va) - goto drop; - - nskb = napi_build_skb(va, truesize); - if (!nskb) { - page_pool_free_va(rq->page_pool, va, true); + if (skb_cow_data_for_xdp(rq->page_pool, pskb, prog)) goto drop; - } - - skb_reserve(nskb, VETH_XDP_HEADROOM); - skb_copy_header(nskb, skb); - skb_mark_for_recycle(nskb); - - if (skb_copy_bits(skb, 0, nskb->data, size)) { - consume_skb(nskb); - goto drop; - } - skb_put(nskb, size); - head_off = skb_headroom(nskb) - skb_headroom(skb); - skb_headers_offset_update(nskb, head_off); - - /* Allocate paged area of new skb */ - off = size; - len = skb->len - off; - - for (i = 0; i < MAX_SKB_FRAGS && off < skb->len; i++) { - size = min_t(u32, len, PAGE_SIZE); - truesize = size; - - page = page_pool_dev_alloc(rq->page_pool, &page_offset, - &truesize); - if (!page) { - consume_skb(nskb); - goto drop; - } - - skb_add_rx_frag(nskb, i, page, page_offset, size, - truesize); - if (skb_copy_bits(skb, off, - page_address(page) + page_offset, - size)) { - consume_skb(nskb); - goto drop; - } - - len -= size; - off += size; - } - - consume_skb(skb); - skb = nskb; + skb = *pskb; } /* SKB "head" area always have tailroom for skb_shared_info */ @@ -850,7 +781,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq, } __skb_push(skb, skb->data - skb_mac_header(skb)); - if (veth_convert_skb_to_xdp_buff(rq, xdp, &skb)) + if (veth_convert_skb_to_xdp_buff(rq, xdp, &skb, xdp_prog)) goto drop; vxbuf.skb = skb;