From patchwork Fri Jan 7 21:54:36 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rgensen?= X-Patchwork-Id: 12707096 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B6D2EC433FE for ; Fri, 7 Jan 2022 21:54:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230246AbiAGVyt (ORCPT ); Fri, 7 Jan 2022 16:54:49 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:26054 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230142AbiAGVys (ORCPT ); Fri, 7 Jan 2022 16:54:48 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1641592487; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=KAcD94qdlwjlnNCoiQr7ntwNIbP3ohSgAIuV4csJwJA=; b=Tux+/Uh4xLnzbRdd/tUBNXBtNSOM/J8xFetgN3I2QrzT1lKA8IEB53GcRW9ZjVhzOURcy3 Nqy3ECASOoXPJFB3ogqfS/Qk9k5XvUBZYxNzGdygregAN9TbJyCTLTsyX6J4utcHLE0PJU x9rhjGonZhB5FMdgemaYzXcqGWUatZI= Received: from mail-ed1-f69.google.com (mail-ed1-f69.google.com [209.85.208.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-492-WPrEpGKOOAWC5G3a2N8aiQ-1; Fri, 07 Jan 2022 16:54:46 -0500 X-MC-Unique: WPrEpGKOOAWC5G3a2N8aiQ-1 Received: by mail-ed1-f69.google.com with SMTP id y18-20020a056402271200b003fa16a5debcso5715503edd.14 for ; Fri, 07 Jan 2022 13:54:46 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=KAcD94qdlwjlnNCoiQr7ntwNIbP3ohSgAIuV4csJwJA=; b=7wd+sL7di1wXZiWV92qOEMAFc0RQ5bjG0hJlsLiIRbMPrCmUUOXYgqvXI0/bbTX9GP OJnKGAfc6NYokPfBXOcjRD1tlqLB6Swohrm849YUmHal98gRNZIfwDDKGeWjovAR/gTO ufgsfgu0w1DF+aJ8K+zjeaWMXoHJ/ib4y7Rikn0mpPTRmiPDeDtw69GkBxAUP/LaRk4e b84xSoIqgx2ix21HWBSgWdgxaJo3duNPI8aZrUc0FxMfa27y4jqeWWCTEPD3cOoynChu 2V3NTq3t1bGjmA0h5AZ9F+zVlXbXwEDho27XlW+mnxUocwJINuWFc+RDSdSiMpXopZe8 8U0A== X-Gm-Message-State: AOAM530rOsqL633VIAxpg8dZC//fc8g2lw9mWoX5emrCQlpJNQ19DoTR xhwwq3BBrKqDCbqnQ+WkSRNYQ26MKpizerZbxzlEzSNdefuLP523EX55LLXvj6jJrFGwL5X73Wz MkyPAM7m5sw8MFdY5 X-Received: by 2002:a17:907:212c:: with SMTP id qo12mr4844530ejb.108.1641592484483; Fri, 07 Jan 2022 13:54:44 -0800 (PST) X-Google-Smtp-Source: ABdhPJxfVNKszgeD2TnzBoKwc5nlN44/0KK16QVDgL+XT+EouOQHhv3g92pKRy4bdpe4Pc0WiEv4yg== X-Received: by 2002:a17:907:212c:: with SMTP id qo12mr4844519ejb.108.1641592484118; Fri, 07 Jan 2022 13:54:44 -0800 (PST) Received: from alrua-x1.borgediget.toke.dk ([45.145.92.2]) by smtp.gmail.com with ESMTPSA id x20sm2508023edd.28.2022.01.07.13.54.42 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 07 Jan 2022 13:54:43 -0800 (PST) Received: by alrua-x1.borgediget.toke.dk (Postfix, from userid 1000) id 80455181F2C; Fri, 7 Jan 2022 22:54:41 +0100 (CET) From: =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rgensen?= To: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , "David S. Miller" , Jakub Kicinski , Jesper Dangaard Brouer Cc: =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rgensen?= , netdev@vger.kernel.org, bpf@vger.kernel.org Subject: [PATCH bpf-next v7 1/3] bpf: Add "live packet" mode for XDP in bpf_prog_run() Date: Fri, 7 Jan 2022 22:54:36 +0100 Message-Id: <20220107215438.321922-2-toke@redhat.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220107215438.321922-1-toke@redhat.com> References: <20220107215438.321922-1-toke@redhat.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net This adds support for running XDP programs through bpf_prog_run() in a mode that enables live packet processing of the resulting frames. Previous uses of bpf_prog_run() for XDP returned the XDP program return code and the modified packet data to userspace, which is useful for unit testing of XDP programs. The existing bpf_prog_run() for XDP allows userspace to set the ingress ifindex and RXQ number as part of the context object being passed to the kernel. This patch reuses that code, but adds a new mode with different semantics, which can be selected with the new BPF_F_TEST_XDP_LIVE_FRAMES flag. When running bpf_prog_run() in this mode, the XDP program return codes will be honoured: returning XDP_PASS will result in the frame being injected into the networking stack as if it came from the selected networking interface, while returning XDP_TX and XDP_REDIRECT will result in the frame being transmitted out that interface. XDP_TX is translated into an XDP_REDIRECT operation to the same interface, since the real XDP_TX action is only possible from within the network drivers themselves, not from the process context where bpf_prog_run() is executed. Internally, this new mode of operation creates a page pool instance while setting up the test run, and feeds pages from that into the XDP program. The setup cost of this is amortised over the number of repetitions specified by userspace. To support the performance testing use case, we further optimise the setup step so that all pages in the pool are pre-initialised with the packet data, and pre-computed context and xdp_frame objects stored at the start of each page. This makes it possible to entirely avoid touching the page content on each XDP program invocation, and enables sending up to 12 Mpps/core on my test box. Because the data pages are recycled by the page pool, and the test runner doesn't re-initialise them for each run, subsequent invocations of the XDP program will see the packet data in the state it was after the last time it ran on that particular page. This means that an XDP program that modifies the packet before redirecting it has to be careful about which assumptions it makes about the packet content, but that is only an issue for the most naively written programs. Enabling the new flag is only allowed when not setting ctx_out and data_out in the test specification, since using it means frames will be redirected somewhere else, so they can't be returned. Signed-off-by: Toke Høiland-Jørgensen --- include/uapi/linux/bpf.h | 2 + kernel/bpf/Kconfig | 1 + net/bpf/test_run.c | 290 ++++++++++++++++++++++++++++++++- tools/include/uapi/linux/bpf.h | 2 + 4 files changed, 287 insertions(+), 8 deletions(-) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index b0383d371b9a..5ef20deaf49f 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -1225,6 +1225,8 @@ enum { /* If set, run the test on the cpu specified by bpf_attr.test.cpu */ #define BPF_F_TEST_RUN_ON_CPU (1U << 0) +/* If set, XDP frames will be transmitted after processing */ +#define BPF_F_TEST_XDP_LIVE_FRAMES (1U << 1) /* type for BPF_ENABLE_STATS */ enum bpf_stats_type { diff --git a/kernel/bpf/Kconfig b/kernel/bpf/Kconfig index d24d518ddd63..c8c920020d11 100644 --- a/kernel/bpf/Kconfig +++ b/kernel/bpf/Kconfig @@ -30,6 +30,7 @@ config BPF_SYSCALL select TASKS_TRACE_RCU select BINARY_PRINTF select NET_SOCK_MSG if NET + select PAGE_POOL if NET default n help Enable the bpf() system call that allows to manipulate BPF programs diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index 46dd95755967..c110daccd6e5 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -14,6 +14,7 @@ #include #include #include +#include #include #include #include @@ -52,10 +53,11 @@ static void bpf_test_timer_leave(struct bpf_test_timer *t) rcu_read_unlock(); } -static bool bpf_test_timer_continue(struct bpf_test_timer *t, u32 repeat, int *err, u32 *duration) +static bool bpf_test_timer_continue(struct bpf_test_timer *t, int iterations, + u32 repeat, int *err, u32 *duration) __must_hold(rcu) { - t->i++; + t->i += iterations; if (t->i >= repeat) { /* We're done. */ t->time_spent += ktime_get_ns() - t->time_start; @@ -87,6 +89,270 @@ static bool bpf_test_timer_continue(struct bpf_test_timer *t, u32 repeat, int *e return false; } +/* We put this struct at the head of each page with a context and frame + * initialised when the page is allocated, so we don't have to do this on each + * repetition of the test run. + */ +struct xdp_page_head { + struct xdp_buff orig_ctx; + struct xdp_buff ctx; + struct xdp_frame frm; + u8 data[]; +}; + +struct xdp_test_data { + struct xdp_buff *orig_ctx; + struct xdp_rxq_info rxq; + struct net_device *dev; + struct page_pool *pp; + u32 frame_cnt; +}; + +#define TEST_XDP_FRAME_SIZE (PAGE_SIZE - sizeof(struct xdp_page_head) \ + - sizeof(struct skb_shared_info)) +#define TEST_XDP_BATCH 64 + +static void xdp_test_run_init_page(struct page *page, void *arg) +{ + struct xdp_page_head *head = phys_to_virt(page_to_phys(page)); + struct xdp_buff *new_ctx, *orig_ctx; + u32 headroom = XDP_PACKET_HEADROOM; + struct xdp_test_data *xdp = arg; + size_t frm_len, meta_len; + struct xdp_frame *frm; + void *data; + + orig_ctx = xdp->orig_ctx; + frm_len = orig_ctx->data_end - orig_ctx->data_meta; + meta_len = orig_ctx->data - orig_ctx->data_meta; + headroom -= meta_len; + + new_ctx = &head->ctx; + frm = &head->frm; + data = &head->data; + memcpy(data + headroom, orig_ctx->data_meta, frm_len); + + xdp_init_buff(new_ctx, TEST_XDP_FRAME_SIZE, &xdp->rxq); + xdp_prepare_buff(new_ctx, data, headroom, frm_len, true); + new_ctx->data = new_ctx->data_meta + meta_len; + + xdp_update_frame_from_buff(new_ctx, frm); + frm->mem = new_ctx->rxq->mem; + + memcpy(&head->orig_ctx, new_ctx, sizeof(head->orig_ctx)); +} + +static int xdp_test_run_setup(struct xdp_test_data *xdp, struct xdp_buff *orig_ctx) +{ + struct xdp_mem_info mem = {}; + struct page_pool *pp; + int err; + struct page_pool_params pp_params = { + .order = 0, + .flags = 0, + .pool_size = NAPI_POLL_WEIGHT * 2, + .nid = NUMA_NO_NODE, + .max_len = TEST_XDP_FRAME_SIZE, + .init_callback = xdp_test_run_init_page, + .init_arg = xdp, + }; + + pp = page_pool_create(&pp_params); + if (IS_ERR(pp)) + return PTR_ERR(pp); + + /* will copy 'mem.id' into pp->xdp_mem_id */ + err = xdp_reg_mem_model(&mem, MEM_TYPE_PAGE_POOL, pp); + if (err) { + page_pool_destroy(pp); + return err; + } + xdp->pp = pp; + + /* We create a 'fake' RXQ referencing the original dev, but with an + * xdp_mem_info pointing to our page_pool + */ + xdp_rxq_info_reg(&xdp->rxq, orig_ctx->rxq->dev, 0, 0); + xdp->rxq.mem.type = MEM_TYPE_PAGE_POOL; + xdp->rxq.mem.id = pp->xdp_mem_id; + xdp->dev = orig_ctx->rxq->dev; + xdp->orig_ctx = orig_ctx; + + return 0; +} + +static void xdp_test_run_teardown(struct xdp_test_data *xdp) +{ + struct xdp_mem_info mem = { + .id = xdp->pp->xdp_mem_id, + .type = MEM_TYPE_PAGE_POOL, + }; + + xdp_unreg_mem_model(&mem); +} + +static bool ctx_was_changed(struct xdp_page_head *head) +{ + return head->orig_ctx.data != head->ctx.data || + head->orig_ctx.data_meta != head->ctx.data_meta || + head->orig_ctx.data_end != head->ctx.data_end; +} + +static void reset_ctx(struct xdp_page_head *head) +{ + if (likely(!ctx_was_changed(head))) + return; + + head->ctx.data = head->orig_ctx.data; + head->ctx.data_meta = head->orig_ctx.data_meta; + head->ctx.data_end = head->orig_ctx.data_end; + xdp_update_frame_from_buff(&head->ctx, &head->frm); +} + +static int xdp_recv_frames(struct xdp_frame **frames, int nframes, + struct net_device *dev) +{ + gfp_t gfp = __GFP_ZERO | GFP_ATOMIC; + void *skbs[TEST_XDP_BATCH]; + int i, n; + LIST_HEAD(list); + + n = kmem_cache_alloc_bulk(skbuff_head_cache, gfp, nframes, skbs); + if (unlikely(n == 0)) { + for (i = 0; i < nframes; i++) + xdp_return_frame(frames[i]); + return -ENOMEM; + } + + for (i = 0; i < nframes; i++) { + struct xdp_frame *xdpf = frames[i]; + struct sk_buff *skb = skbs[i]; + + skb = __xdp_build_skb_from_frame(xdpf, skb, dev); + if (!skb) { + xdp_return_frame(xdpf); + continue; + } + + list_add_tail(&skb->list, &list); + } + netif_receive_skb_list(&list); + + return 0; +} + +static int xdp_test_run_batch(struct xdp_test_data *xdp, struct bpf_prog *prog, + u32 repeat) +{ + struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + int err = 0, act, ret, i, nframes = 0, batch_sz; + struct xdp_frame *frames[TEST_XDP_BATCH]; + struct xdp_page_head *head; + struct xdp_frame *frm; + bool redirect = false; + struct xdp_buff *ctx; + struct page *page; + + batch_sz = min_t(u32, repeat, TEST_XDP_BATCH); + local_bh_disable(); + xdp_set_return_frame_no_direct(); + + for (i = 0; i < batch_sz; i++) { + page = page_pool_dev_alloc_pages(xdp->pp); + if (!page) { + err = -ENOMEM; + goto out; + } + + head = phys_to_virt(page_to_phys(page)); + reset_ctx(head); + ctx = &head->ctx; + frm = &head->frm; + xdp->frame_cnt++; + + act = bpf_prog_run_xdp(prog, ctx); + + /* if program changed pkt bounds we need to update the xdp_frame */ + if (unlikely(ctx_was_changed(head))) { + err = xdp_update_frame_from_buff(ctx, frm); + if (err) { + xdp_return_buff(ctx); + goto out; + } + } + + switch (act) { + case XDP_TX: + /* we can't do a real XDP_TX since we're not in the + * driver, so turn it into a REDIRECT back to the same + * index + */ + ri->tgt_index = xdp->dev->ifindex; + ri->map_id = INT_MAX; + ri->map_type = BPF_MAP_TYPE_UNSPEC; + fallthrough; + case XDP_REDIRECT: + redirect = true; + err = xdp_do_redirect_frame(xdp->dev, ctx, frm, prog); + if (err) { + xdp_return_buff(ctx); + goto out; + } + break; + case XDP_PASS: + frames[nframes++] = frm; + break; + default: + bpf_warn_invalid_xdp_action(NULL, prog, act); + fallthrough; + case XDP_DROP: + xdp_return_buff(ctx); + break; + } + } + +out: + if (redirect) + xdp_do_flush(); + if (nframes) { + ret = xdp_recv_frames(frames, nframes, xdp->dev); + if (ret) + err = ret; + } + + xdp_clear_return_frame_no_direct(); + local_bh_enable(); + return err; +} + +static int bpf_test_run_xdp_live(struct bpf_prog *prog, struct xdp_buff *ctx, + u32 repeat, u32 *time) + +{ + struct bpf_test_timer t = { .mode = NO_MIGRATE }; + struct xdp_test_data xdp = {}; + int ret; + + if (!repeat) + repeat = 1; + + ret = xdp_test_run_setup(&xdp, ctx); + if (ret) + return ret; + + bpf_test_timer_enter(&t); + do { + xdp.frame_cnt = 0; + ret = xdp_test_run_batch(&xdp, prog, repeat - t.i); + if (unlikely(ret < 0)) + break; + } while (bpf_test_timer_continue(&t, xdp.frame_cnt, repeat, &ret, time)); + bpf_test_timer_leave(&t); + + xdp_test_run_teardown(&xdp); + return ret; +} + static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, u32 *retval, u32 *time, bool xdp) { @@ -118,7 +384,7 @@ static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, *retval = bpf_prog_run_xdp(prog, ctx); else *retval = bpf_prog_run(prog, ctx); - } while (bpf_test_timer_continue(&t, repeat, &ret, time)); + } while (bpf_test_timer_continue(&t, 1, repeat, &ret, time)); bpf_reset_run_ctx(old_ctx); bpf_test_timer_leave(&t); @@ -757,13 +1023,14 @@ static void xdp_convert_buff_to_md(struct xdp_buff *xdp, struct xdp_md *xdp_md) int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr, union bpf_attr __user *uattr) { + bool do_live = (kattr->test.flags & BPF_F_TEST_XDP_LIVE_FRAMES); u32 tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); u32 headroom = XDP_PACKET_HEADROOM; u32 size = kattr->test.data_size_in; u32 repeat = kattr->test.repeat; struct netdev_rx_queue *rxqueue; struct xdp_buff xdp = {}; - u32 retval, duration; + u32 retval = 0, duration; struct xdp_md *ctx; u32 max_data_sz; void *data; @@ -773,6 +1040,9 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr, prog->expected_attach_type == BPF_XDP_CPUMAP) return -EINVAL; + if (kattr->test.flags & ~BPF_F_TEST_XDP_LIVE_FRAMES) + return -EINVAL; + ctx = bpf_ctx_init(kattr, sizeof(struct xdp_md)); if (IS_ERR(ctx)) return PTR_ERR(ctx); @@ -781,7 +1051,8 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr, /* There can't be user provided data before the meta data */ if (ctx->data_meta || ctx->data_end != size || ctx->data > ctx->data_end || - unlikely(xdp_metalen_invalid(ctx->data))) + unlikely(xdp_metalen_invalid(ctx->data)) || + (do_live && (kattr->test.data_out || kattr->test.ctx_out))) goto free_ctx; /* Meta data is allocated from the headroom */ headroom -= ctx->data; @@ -807,7 +1078,10 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr, if (repeat > 1) bpf_prog_change_xdp(NULL, prog); - ret = bpf_test_run(prog, &xdp, repeat, &retval, &duration, true); + if (do_live) + ret = bpf_test_run_xdp_live(prog, &xdp, repeat, &duration); + else + ret = bpf_test_run(prog, &xdp, repeat, &retval, &duration, true); /* We convert the xdp_buff back to an xdp_md before checking the return * code so the reference count of any held netdevice will be decremented * even if the test run failed. @@ -905,7 +1179,7 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog, do { retval = bpf_flow_dissect(prog, &ctx, eth->h_proto, ETH_HLEN, size, flags); - } while (bpf_test_timer_continue(&t, repeat, &ret, &duration)); + } while (bpf_test_timer_continue(&t, 1, repeat, &ret, &duration)); bpf_test_timer_leave(&t); if (ret < 0) @@ -1000,7 +1274,7 @@ int bpf_prog_test_run_sk_lookup(struct bpf_prog *prog, const union bpf_attr *kat do { ctx.selected_sk = NULL; retval = BPF_PROG_SK_LOOKUP_RUN_ARRAY(progs, ctx, bpf_prog_run); - } while (bpf_test_timer_continue(&t, repeat, &ret, &duration)); + } while (bpf_test_timer_continue(&t, 1, repeat, &ret, &duration)); bpf_test_timer_leave(&t); if (ret < 0) diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index b0383d371b9a..5ef20deaf49f 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -1225,6 +1225,8 @@ enum { /* If set, run the test on the cpu specified by bpf_attr.test.cpu */ #define BPF_F_TEST_RUN_ON_CPU (1U << 0) +/* If set, XDP frames will be transmitted after processing */ +#define BPF_F_TEST_XDP_LIVE_FRAMES (1U << 1) /* type for BPF_ENABLE_STATS */ enum bpf_stats_type { From patchwork Fri Jan 7 21:54:37 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rgensen?= X-Patchwork-Id: 12707094 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D57AEC433F5 for ; Fri, 7 Jan 2022 21:54:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230203AbiAGVyr (ORCPT ); Fri, 7 Jan 2022 16:54:47 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:31807 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230129AbiAGVyr (ORCPT ); Fri, 7 Jan 2022 16:54:47 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1641592486; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=j+vG9Qa9OkGa6HWCa2Inpz3+8aGl216Qd1qEl2tq6fs=; b=auh8F/aSf7R/ZXJeuGScyRoloJlmL8KPuTukIyLPwVLvkNuC9/cEur6LS7/TkTMhORWzTl iQ0y/UA78NvaxFwcOLkDJUi5+F2OTMdrp5AljEk/vBf9XmUqa0Wlx/iYWgyq5t0f8+L802 MLC16N0PYfvtIQC9XUMwHS2QRO0dJSQ= Received: from mail-ed1-f72.google.com (mail-ed1-f72.google.com [209.85.208.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-370-Loj-b5l7OmeoOoZLinyvOw-1; Fri, 07 Jan 2022 16:54:45 -0500 X-MC-Unique: Loj-b5l7OmeoOoZLinyvOw-1 Received: by mail-ed1-f72.google.com with SMTP id s7-20020a056402520700b003f841380832so5738336edd.5 for ; Fri, 07 Jan 2022 13:54:45 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=j+vG9Qa9OkGa6HWCa2Inpz3+8aGl216Qd1qEl2tq6fs=; b=44MO5x7GrU+QJ4yuQNb/IppFVXVyIk+5HDXiCQqtSILRbXeVVwcQcfqcP3Yvnr5u6t gCkqslJmhMHrqzw4Icy4Z53v6/6Aw+BM4Kfv1sjVZOjDzgkF2cNKcd0N1g0Fv6SpVW4o 8ekAOMPkJS7whw6wZpUWei2t4ZfV3jbnOOldqC6k2fDPg+jxowAaRZzb38ZJkC+19Fk5 boX+ujkVVjIooZN5TQjrBQHH3GKQBqGfxzJFg9D0A+Fuqnpzda/DyexpUPE/eF1jZYSJ GmPpPwoMWYumAwgNe0cJR+rFHF12Mw32iLv6o4knhsX0aAMts44v/BT6CuVDGqkLsbpu SMjg== X-Gm-Message-State: AOAM530ex+iH9lm1URLiJB1eGGU5Cyi3PSEh5J9dy0WVhfXGmdNJLhsA dLzUL0ll6YkEbrehiUEidm2vOZLWBA+NgljMwS1U1WxxDZKRbSJSp+Cg37ayZ0knLpi/d0dkYyf gQSKelAqmp3S8QwrX X-Received: by 2002:a17:907:d29:: with SMTP id gn41mr53650748ejc.124.1641592483860; Fri, 07 Jan 2022 13:54:43 -0800 (PST) X-Google-Smtp-Source: ABdhPJy40LQrWwZFXRylXH/1vsdIoGMX06eJwTDefuArTsBOPiSVY2fwcREoSGAW4tVQUZG1f2FVzA== X-Received: by 2002:a17:907:d29:: with SMTP id gn41mr53650733ejc.124.1641592483510; Fri, 07 Jan 2022 13:54:43 -0800 (PST) Received: from alrua-x1.borgediget.toke.dk ([45.145.92.2]) by smtp.gmail.com with ESMTPSA id ho14sm1726455ejc.73.2022.01.07.13.54.42 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 07 Jan 2022 13:54:42 -0800 (PST) Received: by alrua-x1.borgediget.toke.dk (Postfix, from userid 1000) id 2F50A181F2E; Fri, 7 Jan 2022 22:54:42 +0100 (CET) From: =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rgensen?= To: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , "David S. Miller" , Jakub Kicinski , Jesper Dangaard Brouer Cc: =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rgensen?= , Shuah Khan , netdev@vger.kernel.org, bpf@vger.kernel.org Subject: [PATCH bpf-next v7 2/3] selftests/bpf: Move open_netns() and close_netns() into network_helpers.c Date: Fri, 7 Jan 2022 22:54:37 +0100 Message-Id: <20220107215438.321922-3-toke@redhat.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220107215438.321922-1-toke@redhat.com> References: <20220107215438.321922-1-toke@redhat.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net These will also be used by the xdp_do_redirect test being added in the next commit. Signed-off-by: Toke Høiland-Jørgensen --- tools/testing/selftests/bpf/network_helpers.c | 86 ++++++++++++++++++ tools/testing/selftests/bpf/network_helpers.h | 9 ++ .../selftests/bpf/prog_tests/tc_redirect.c | 87 ------------------- 3 files changed, 95 insertions(+), 87 deletions(-) diff --git a/tools/testing/selftests/bpf/network_helpers.c b/tools/testing/selftests/bpf/network_helpers.c index 6db1af8fdee7..2bb1f9b3841d 100644 --- a/tools/testing/selftests/bpf/network_helpers.c +++ b/tools/testing/selftests/bpf/network_helpers.c @@ -1,18 +1,25 @@ // SPDX-License-Identifier: GPL-2.0-only +#define _GNU_SOURCE + #include #include #include #include #include +#include #include +#include +#include #include #include #include +#include #include "bpf_util.h" #include "network_helpers.h" +#include "test_progs.h" #define clean_errno() (errno == 0 ? "None" : strerror(errno)) #define log_err(MSG, ...) ({ \ @@ -356,3 +363,82 @@ char *ping_command(int family) } return "ping"; } + +struct nstoken { + int orig_netns_fd; +}; + +static int setns_by_fd(int nsfd) +{ + int err; + + err = setns(nsfd, CLONE_NEWNET); + close(nsfd); + + if (!ASSERT_OK(err, "setns")) + return err; + + /* Switch /sys to the new namespace so that e.g. /sys/class/net + * reflects the devices in the new namespace. + */ + err = unshare(CLONE_NEWNS); + if (!ASSERT_OK(err, "unshare")) + return err; + + /* Make our /sys mount private, so the following umount won't + * trigger the global umount in case it's shared. + */ + err = mount("none", "/sys", NULL, MS_PRIVATE, NULL); + if (!ASSERT_OK(err, "remount private /sys")) + return err; + + err = umount2("/sys", MNT_DETACH); + if (!ASSERT_OK(err, "umount2 /sys")) + return err; + + err = mount("sysfs", "/sys", "sysfs", 0, NULL); + if (!ASSERT_OK(err, "mount /sys")) + return err; + + err = mount("bpffs", "/sys/fs/bpf", "bpf", 0, NULL); + if (!ASSERT_OK(err, "mount /sys/fs/bpf")) + return err; + + return 0; +} + +struct nstoken *open_netns(const char *name) +{ + int nsfd; + char nspath[PATH_MAX]; + int err; + struct nstoken *token; + + token = malloc(sizeof(struct nstoken)); + if (!ASSERT_OK_PTR(token, "malloc token")) + return NULL; + + token->orig_netns_fd = open("/proc/self/ns/net", O_RDONLY); + if (!ASSERT_GE(token->orig_netns_fd, 0, "open /proc/self/ns/net")) + goto fail; + + snprintf(nspath, sizeof(nspath), "%s/%s", "/var/run/netns", name); + nsfd = open(nspath, O_RDONLY | O_CLOEXEC); + if (!ASSERT_GE(nsfd, 0, "open netns fd")) + goto fail; + + err = setns_by_fd(nsfd); + if (!ASSERT_OK(err, "setns_by_fd")) + goto fail; + + return token; +fail: + free(token); + return NULL; +} + +void close_netns(struct nstoken *token) +{ + ASSERT_OK(setns_by_fd(token->orig_netns_fd), "setns_by_fd"); + free(token); +} diff --git a/tools/testing/selftests/bpf/network_helpers.h b/tools/testing/selftests/bpf/network_helpers.h index d198181a5648..a4b3b2f9877b 100644 --- a/tools/testing/selftests/bpf/network_helpers.h +++ b/tools/testing/selftests/bpf/network_helpers.h @@ -55,4 +55,13 @@ int make_sockaddr(int family, const char *addr_str, __u16 port, struct sockaddr_storage *addr, socklen_t *len); char *ping_command(int family); +struct nstoken; +/** + * open_netns() - Switch to specified network namespace by name. + * + * Returns token with which to restore the original namespace + * using close_netns(). + */ +struct nstoken *open_netns(const char *name); +void close_netns(struct nstoken *token); #endif diff --git a/tools/testing/selftests/bpf/prog_tests/tc_redirect.c b/tools/testing/selftests/bpf/prog_tests/tc_redirect.c index c2426df58e17..0a13c7eb40f3 100644 --- a/tools/testing/selftests/bpf/prog_tests/tc_redirect.c +++ b/tools/testing/selftests/bpf/prog_tests/tc_redirect.c @@ -17,10 +17,8 @@ #include #include #include -#include #include #include -#include #include #include @@ -84,91 +82,6 @@ static int write_file(const char *path, const char *newval) return 0; } -struct nstoken { - int orig_netns_fd; -}; - -static int setns_by_fd(int nsfd) -{ - int err; - - err = setns(nsfd, CLONE_NEWNET); - close(nsfd); - - if (!ASSERT_OK(err, "setns")) - return err; - - /* Switch /sys to the new namespace so that e.g. /sys/class/net - * reflects the devices in the new namespace. - */ - err = unshare(CLONE_NEWNS); - if (!ASSERT_OK(err, "unshare")) - return err; - - /* Make our /sys mount private, so the following umount won't - * trigger the global umount in case it's shared. - */ - err = mount("none", "/sys", NULL, MS_PRIVATE, NULL); - if (!ASSERT_OK(err, "remount private /sys")) - return err; - - err = umount2("/sys", MNT_DETACH); - if (!ASSERT_OK(err, "umount2 /sys")) - return err; - - err = mount("sysfs", "/sys", "sysfs", 0, NULL); - if (!ASSERT_OK(err, "mount /sys")) - return err; - - err = mount("bpffs", "/sys/fs/bpf", "bpf", 0, NULL); - if (!ASSERT_OK(err, "mount /sys/fs/bpf")) - return err; - - return 0; -} - -/** - * open_netns() - Switch to specified network namespace by name. - * - * Returns token with which to restore the original namespace - * using close_netns(). - */ -static struct nstoken *open_netns(const char *name) -{ - int nsfd; - char nspath[PATH_MAX]; - int err; - struct nstoken *token; - - token = malloc(sizeof(struct nstoken)); - if (!ASSERT_OK_PTR(token, "malloc token")) - return NULL; - - token->orig_netns_fd = open("/proc/self/ns/net", O_RDONLY); - if (!ASSERT_GE(token->orig_netns_fd, 0, "open /proc/self/ns/net")) - goto fail; - - snprintf(nspath, sizeof(nspath), "%s/%s", "/var/run/netns", name); - nsfd = open(nspath, O_RDONLY | O_CLOEXEC); - if (!ASSERT_GE(nsfd, 0, "open netns fd")) - goto fail; - - err = setns_by_fd(nsfd); - if (!ASSERT_OK(err, "setns_by_fd")) - goto fail; - - return token; -fail: - free(token); - return NULL; -} - -static void close_netns(struct nstoken *token) -{ - ASSERT_OK(setns_by_fd(token->orig_netns_fd), "setns_by_fd"); - free(token); -} - static int netns_setup_namespaces(const char *verb) { const char * const *ns = namespaces; From patchwork Fri Jan 7 21:54:38 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rgensen?= X-Patchwork-Id: 12707095 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EC423C4332F for ; Fri, 7 Jan 2022 21:54:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230205AbiAGVys (ORCPT ); Fri, 7 Jan 2022 16:54:48 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:21105 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230142AbiAGVyr (ORCPT ); Fri, 7 Jan 2022 16:54:47 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1641592486; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=C30WN8dCHLdbfh99BPY5gF72+lpcQTAlFKzrznTegsU=; b=CYJ16vu59XijbLvb5/HxYhzkPwTxPcXNIWElriJXuzxHH+SfuOy839janPwjz6JtD8VKCH +srwUYbER02lU8e4aGNX60vV4eGEBguhSAVHbxoYmfE+vY242kkvsTNUECxQ+t5NDTA3Hh xegCIRaavLPSpGC41pcB3Oj2am2E/YI= Received: from mail-ed1-f70.google.com (mail-ed1-f70.google.com [209.85.208.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-418-jnMM7sMrNbCz9AR_jUXUqw-1; Fri, 07 Jan 2022 16:54:45 -0500 X-MC-Unique: jnMM7sMrNbCz9AR_jUXUqw-1 Received: by mail-ed1-f70.google.com with SMTP id ec25-20020a0564020d5900b003fc074c5d21so1390306edb.19 for ; Fri, 07 Jan 2022 13:54:45 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=C30WN8dCHLdbfh99BPY5gF72+lpcQTAlFKzrznTegsU=; b=7HXsjlSERX97LthZ5wzvUrGU1fBN1vB/uJeZXR6sPhoL+wl5FKx7VDNF2RpiS0O7iB /1IBta1W8dab/0IB653I6YapMiimVj1wOMZs68Ue7Q/D8GCzjuUsnAVayC52JqGR6DTN 3W3t5K86cAH+mU+11CEWggkiaxsTlmhqbktMsLOuhRpmaLn4vTDF1jjM7PJaI89Vm+kK Hcd39QmVO5CVTEFPEC+spC7SAUmkCShkc23Mvnks70t4Me6z9dPYUPmdxiPzQ+SXdyhb ZAE8M0V3TP0ihiuf61xNdVlH2zGiNhlYWBws0ivi6+Z5ke/4EqlqfcvAf4RYfA5vG0cP mhVA== X-Gm-Message-State: AOAM532/q7J8zPvjVdNBc7QmNPzjCD0714zkT8s41CVPDtZxdD9zcIXQ G1RZa/tv6Q8iqwE2HPbLlc4+3x+dFO2JeLkLPdD2uCOke/Dx8/lYjdTIR53VlREV7PgHZGHdZ9E qJtzHDCKgbrsBjGeH X-Received: by 2002:aa7:d793:: with SMTP id s19mr5060958edq.85.1641592484341; Fri, 07 Jan 2022 13:54:44 -0800 (PST) X-Google-Smtp-Source: ABdhPJyKsYzPPw6ysf/axAQAzPdvgaSMHYoAKMptNcrAvW7tbLlECwPRB/JBWVW+BKHmSQp02wB0wQ== X-Received: by 2002:aa7:d793:: with SMTP id s19mr5060935edq.85.1641592483930; Fri, 07 Jan 2022 13:54:43 -0800 (PST) Received: from alrua-x1.borgediget.toke.dk ([45.145.92.2]) by smtp.gmail.com with ESMTPSA id 16sm1760693ejx.149.2022.01.07.13.54.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 07 Jan 2022 13:54:43 -0800 (PST) Received: by alrua-x1.borgediget.toke.dk (Postfix, from userid 1000) id D7B0E181F2F; Fri, 7 Jan 2022 22:54:42 +0100 (CET) From: =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rgensen?= To: Alexei Starovoitov , Daniel Borkmann , "David S. Miller" , Jakub Kicinski , Jesper Dangaard Brouer , John Fastabend , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , KP Singh Cc: =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rgensen?= , Shuah Khan , netdev@vger.kernel.org, bpf@vger.kernel.org Subject: [PATCH bpf-next v7 3/3] selftests/bpf: Add selftest for XDP_REDIRECT in bpf_prog_run() Date: Fri, 7 Jan 2022 22:54:38 +0100 Message-Id: <20220107215438.321922-4-toke@redhat.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220107215438.321922-1-toke@redhat.com> References: <20220107215438.321922-1-toke@redhat.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net This adds a selftest for the XDP_REDIRECT facility in bpf_prog_run, that redirects packets into a veth and counts them using an XDP program on the other side of the veth pair and a TC program on the local side of the veth. Signed-off-by: Toke Høiland-Jørgensen --- .../bpf/prog_tests/xdp_do_redirect.c | 175 ++++++++++++++++++ .../bpf/progs/test_xdp_do_redirect.c | 85 +++++++++ 2 files changed, 260 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c create mode 100644 tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c new file mode 100644 index 000000000000..2c17edc3bd5a --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c @@ -0,0 +1,175 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "test_xdp_do_redirect.skel.h" + +#define SYS(fmt, ...) \ + ({ \ + char cmd[1024]; \ + snprintf(cmd, sizeof(cmd), fmt, ##__VA_ARGS__); \ + if (!ASSERT_OK(system(cmd), cmd)) \ + goto out; \ + }) + +struct udp_packet { + struct ethhdr eth; + struct ipv6hdr iph; + struct udphdr udp; + __u8 payload[64 - sizeof(struct udphdr) + - sizeof(struct ethhdr) - sizeof(struct ipv6hdr)]; +} __packed; + +static struct udp_packet pkt_udp = { + .eth.h_proto = __bpf_constant_htons(ETH_P_IPV6), + .eth.h_dest = {0x00, 0x11, 0x22, 0x33, 0x44, 0x55}, + .eth.h_source = {0x66, 0x77, 0x88, 0x99, 0xaa, 0xbb}, + .iph.version = 6, + .iph.nexthdr = IPPROTO_UDP, + .iph.payload_len = bpf_htons(sizeof(struct udp_packet) + - offsetof(struct udp_packet, udp)), + .iph.hop_limit = 2, + .iph.saddr.s6_addr16 = {bpf_htons(0xfc00), 0, 0, 0, 0, 0, 0, bpf_htons(1)}, + .iph.daddr.s6_addr16 = {bpf_htons(0xfc00), 0, 0, 0, 0, 0, 0, bpf_htons(2)}, + .udp.source = bpf_htons(1), + .udp.dest = bpf_htons(1), + .udp.len = bpf_htons(sizeof(struct udp_packet) + - offsetof(struct udp_packet, udp)), + .payload = {0x42}, /* receiver XDP program matches on this */ +}; + +static int attach_tc_prog(struct bpf_tc_hook *hook, int fd) +{ + DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = 1, .priority = 1, .prog_fd = fd); + int ret; + + ret = bpf_tc_hook_create(hook); + if (!ASSERT_OK(ret, "create tc hook")) + return ret; + + ret = bpf_tc_attach(hook, &opts); + if (!ASSERT_OK(ret, "bpf_tc_attach")) { + bpf_tc_hook_destroy(hook); + return ret; + } + + return 0; +} + +#define NUM_PKTS 1000000 +void test_xdp_do_redirect(void) +{ + int err, xdp_prog_fd, tc_prog_fd, ifindex_src, ifindex_dst; + char data[sizeof(pkt_udp) + sizeof(__u32)]; + struct test_xdp_do_redirect *skel = NULL; + struct nstoken *nstoken = NULL; + struct bpf_link *link; + + struct xdp_md ctx_in = { .data = sizeof(__u32), + .data_end = sizeof(data) }; + DECLARE_LIBBPF_OPTS(bpf_test_run_opts, opts, + .data_in = &data, + .data_size_in = sizeof(data), + .ctx_in = &ctx_in, + .ctx_size_in = sizeof(ctx_in), + .flags = BPF_F_TEST_XDP_LIVE_FRAMES, + .repeat = NUM_PKTS, + ); + DECLARE_LIBBPF_OPTS(bpf_tc_hook, tc_hook, + .attach_point = BPF_TC_INGRESS); + + memcpy(&data[sizeof(__u32)], &pkt_udp, sizeof(pkt_udp)); + *((__u32 *)data) = 0x42; /* metadata test value */ + + skel = test_xdp_do_redirect__open(); + if (!ASSERT_OK_PTR(skel, "skel")) + return; + + /* The XDP program we run with bpf_prog_run() will cycle through all + * three xmit (PASS/TX/REDIRECT) return codes starting from above, and + * ending up with PASS, so we should end up with two packets on the dst + * iface and NUM_PKTS-2 in the TC hook. We match the packets on the UDP + * payload. + */ + SYS("ip netns add testns"); + nstoken = open_netns("testns"); + if (!ASSERT_OK_PTR(nstoken, "setns")) + goto out; + + SYS("ip link add veth_src type veth peer name veth_dst"); + SYS("ip link set dev veth_src address 00:11:22:33:44:55"); + SYS("ip link set dev veth_dst address 66:77:88:99:aa:bb"); + SYS("ip link set dev veth_src up"); + SYS("ip link set dev veth_dst up"); + SYS("ip addr add dev veth_src fc00::1/64"); + SYS("ip addr add dev veth_dst fc00::2/64"); + SYS("ip neigh add fc00::2 dev veth_src lladdr 66:77:88:99:aa:bb"); + + /* We enable forwarding in the test namespace because that will cause + * the packets that go through the kernel stack (with XDP_PASS) to be + * forwarded back out the same interface (because of the packet dst + * combined with the interface addresses). When this happens, the + * regular forwarding path will end up going through the same + * veth_xdp_xmit() call as the XDP_REDIRECT code, which can cause a + * deadlock if it happens on the same CPU. There's a local_bh_disable() + * in the test_run code to prevent this, but an earlier version of the + * code didn't have this, so we keep the test behaviour to make sure the + * bug doesn't resurface. + */ + SYS("sysctl -qw net.ipv6.conf.all.forwarding=1"); + + ifindex_src = if_nametoindex("veth_src"); + ifindex_dst = if_nametoindex("veth_dst"); + if (!ASSERT_NEQ(ifindex_src, 0, "ifindex_src") || + !ASSERT_NEQ(ifindex_dst, 0, "ifindex_dst")) + goto out; + + memcpy(skel->rodata->expect_dst, &pkt_udp.eth.h_dest, ETH_ALEN); + skel->rodata->ifindex_out = ifindex_src; /* redirect back to the same iface */ + skel->rodata->ifindex_in = ifindex_src; + ctx_in.ingress_ifindex = ifindex_src; + tc_hook.ifindex = ifindex_src; + + if (!ASSERT_OK(test_xdp_do_redirect__load(skel), "load")) + goto out; + + link = bpf_program__attach_xdp(skel->progs.xdp_count_pkts, ifindex_dst); + if (!ASSERT_OK_PTR(link, "prog_attach")) + goto out; + skel->links.xdp_count_pkts = link; + + tc_prog_fd = bpf_program__fd(skel->progs.tc_count_pkts); + if (attach_tc_prog(&tc_hook, tc_prog_fd)) + goto out; + + xdp_prog_fd = bpf_program__fd(skel->progs.xdp_redirect); + err = bpf_prog_test_run_opts(xdp_prog_fd, &opts); + if (!ASSERT_OK(err, "prog_run")) + goto out_tc; + + /* wait for the packets to be flushed */ + kern_sync_rcu(); + + /* There will be one packet sent through XDP_REDIRECT and one through + * XDP_TX; these will show up on the XDP counting program, while the + * rest will be counted at the TC ingress hook (and the counting program + * resets the packet payload so they don't get counted twice even though + * they are re-xmited out the veth device + */ + ASSERT_EQ(skel->bss->pkts_seen_xdp, 2, "pkt_count_xdp"); + ASSERT_EQ(skel->bss->pkts_seen_tc, NUM_PKTS - 2, "pkt_count_tc"); + +out_tc: + bpf_tc_hook_destroy(&tc_hook); +out: + if (nstoken) + close_netns(nstoken); + system("ip netns del testns"); + test_xdp_do_redirect__destroy(skel); +} diff --git a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c new file mode 100644 index 000000000000..af3cffccc794 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c @@ -0,0 +1,85 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include + +#define ETH_ALEN 6 +const volatile int ifindex_out; +const volatile int ifindex_in; +const volatile __u8 expect_dst[ETH_ALEN]; +volatile int pkts_seen_xdp = 0; +volatile int pkts_seen_tc = 0; +volatile int retcode = XDP_REDIRECT; + +SEC("xdp") +int xdp_redirect(struct xdp_md *xdp) +{ + __u32 *metadata = (void *)(long)xdp->data_meta; + void *data = (void *)(long)xdp->data; + int ret = retcode; + + if (xdp->ingress_ifindex != ifindex_in) + return XDP_ABORTED; + + if (metadata + 1 > data) + return XDP_ABORTED; + + if (*metadata != 0x42) + return XDP_ABORTED; + + if (bpf_xdp_adjust_meta(xdp, 4)) + return XDP_ABORTED; + + if (retcode > XDP_PASS) + retcode--; + + if (ret == XDP_REDIRECT) + return bpf_redirect(ifindex_out, 0); + + return ret; +} + +static bool check_pkt(void *data, void *data_end) +{ + struct ethhdr *eth = data; + struct ipv6hdr *iph = (void *)(eth + 1); + struct udphdr *udp = (void *)(iph + 1); + __u8 *payload = (void *)(udp + 1); + + if (payload + 1 > data_end) + return false; + + if (iph->nexthdr != IPPROTO_UDP || *payload != 0x42) + return false; + + /* reset the payload so the same packet doesn't get counted twice when + * it cycles back through the kernel path and out the dst veth + */ + *payload = 0; + return true; +} + +SEC("xdp") +int xdp_count_pkts(struct xdp_md *xdp) +{ + void *data = (void *)(long)xdp->data; + void *data_end = (void *)(long)xdp->data_end; + + if (check_pkt(data, data_end)) + pkts_seen_xdp++; + + return XDP_PASS; +} + +SEC("tc") +int tc_count_pkts(struct __sk_buff *skb) +{ + void *data = (void *)(long)skb->data; + void *data_end = (void *)(long)skb->data_end; + + if (check_pkt(data, data_end)) + pkts_seen_tc++; + + return 0; +} + +char _license[] SEC("license") = "GPL";