From patchwork Tue Mar 9 07:29:45 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Hangbin Liu X-Patchwork-Id: 12124267 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9B2ABC433E0 for ; Tue, 9 Mar 2021 07:31:12 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6233465199 for ; Tue, 9 Mar 2021 07:31:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230148AbhCIHal (ORCPT ); Tue, 9 Mar 2021 02:30:41 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52480 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230266AbhCIHab (ORCPT ); Tue, 9 Mar 2021 02:30:31 -0500 Received: from mail-pj1-x1033.google.com (mail-pj1-x1033.google.com [IPv6:2607:f8b0:4864:20::1033]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5A126C06174A; Mon, 8 Mar 2021 23:30:31 -0800 (PST) Received: by mail-pj1-x1033.google.com with SMTP id q6-20020a17090a4306b02900c42a012202so4689774pjg.5; Mon, 08 Mar 2021 23:30:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=+sheySVEK9Z1Io2UrHFOydpD6doQmnBTSUZe429hMLs=; b=oyo7XgQo7g7lHEXeP/Ve73IyQiVVpDafn6Z/fJTitnHzW2KgapJskC9+hkKBler6fL ZgEqEcX4brhkd6BJ42dF/snJ47yFIQX0QfHT3JfpUZzOf4Q+8tvZHsOxaVXcFy2AIrEy +bvPPlBmrQNJMqQfQXhHXR5q7tOatPlxqeYtrEAYpOfQvDcAQjP27Ha1iFQEIkcxGHm2 EwDrOOJj9e9Y5QSsgukYMDJt82gfIjOmrTxWAVHurg5fr1iYemzpouPpm8ne46HCbuCq /OwK02VeAKlxh8AWb+shZhdDwX6q85htzH05kHFEHd7NpXXN5JYHoPZ6/u9OhVhUpLJ8 Lzhg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=+sheySVEK9Z1Io2UrHFOydpD6doQmnBTSUZe429hMLs=; b=ZtIknYcMsdKnzqN0WwNaSeG1Sv1YhwvIH6YfkTrTdP4VDmsX0u0cYKnoXqZRFGE3UR sDRvObJYCA9U2iu1VSeIcPivk73x2UwBWUGwwH63MRMjWyU61mesF667AmKlF7qlZyY9 /i6ysLC6oKhXzvpmcDIKZ1gQ3xkhqzSDOz4gjg+ZjSoj6g61E8/7wCpVz3knUlzepvuJ ciqWHu6Nhf4NrDt/yNByYVSVw4yLkIz+AJRr/CWZOb0KkBhb2svbyCkQe6ob2qle7uvK qzAnR2+uQeoHexhTKkoaTOzRncUJ8xIRoDECnVRumDxUcWEr7eZm0c5MuSCeJzL4W98c 6Pnw== X-Gm-Message-State: AOAM530BZwnI4jK0OQRWoyUux2HZ+NBRN5Yu5xmD3vc97k3LdUm8OuOQ HHFXIeJGqga0bbOqpcwu5liy23um/FA= X-Google-Smtp-Source: ABdhPJzn1shhHi7teAZQFY1zuoVw1GzVclk+1e2mTaJRZcLVT2vDgb2g71XyKAMRRLYKinv0W2zu7A== X-Received: by 2002:a17:902:8218:b029:e6:190e:48e with SMTP id x24-20020a1709028218b02900e6190e048emr10383604pln.33.1615275030666; Mon, 08 Mar 2021 23:30:30 -0800 (PST) Received: from Leo-laptop-t470s.redhat.com ([209.132.188.80]) by smtp.gmail.com with ESMTPSA id j3sm11521007pgk.24.2021.03.08.23.30.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 08 Mar 2021 23:30:30 -0800 (PST) From: Hangbin Liu To: bpf@vger.kernel.org Cc: netdev@vger.kernel.org, =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rgensen?= , Jiri Benc , Jesper Dangaard Brouer , Eelco Chaudron , ast@kernel.org, Daniel Borkmann , Lorenzo Bianconi , David Ahern , Andrii Nakryiko , Alexei Starovoitov , John Fastabend , Maciej Fijalkowski , Hangbin Liu Subject: [PATCH bpf-next 1/4] bpf: run devmap xdp_prog on flush instead of bulk enqueue Date: Tue, 9 Mar 2021 15:29:45 +0800 Message-Id: <20210309072948.2127710-2-liuhangbin@gmail.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210309072948.2127710-1-liuhangbin@gmail.com> References: <20210309072948.2127710-1-liuhangbin@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Jesper Dangaard Brouer This changes the devmap XDP program support to run the program when the bulk queue is flushed instead of before the frame is enqueued. This has a couple of benefits: - It "sorts" the packets by destination devmap entry, and then runs the same BPF program on all the packets in sequence. This ensures that we keep the XDP program and destination device properties hot in I-cache. - It makes the multicast implementation simpler because it can just enqueue packets using bq_enqueue() without having to deal with the devmap program at all. The drawback is that if the devmap program drops the packet, the enqueue step is redundant. However, arguably this is mostly visible in a micro-benchmark, and with more mixed traffic the I-cache benefit should win out. The performance impact of just this patch is as follows: The bq_xmit_all's logic is also refactored and error label is removed. When bq_xmit_all() is called from bq_enqueue(), another packet will always be enqueued immediately after, so clearing dev_rx, xdp_prog and flush_node in bq_xmit_all() is redundant. Let's move the clear to __dev_flush(), and only check them once in bq_enqueue() since they are all modified together. By using xdp_redirect_map in sample/bpf and send pkts via pktgen cmd: ./pktgen_sample03_burst_single_flow.sh -i eno1 -d $dst_ip -m $dst_mac -t 10 -s 64 There are about +/- 0.1M deviation for native testing, the performance improved for the base-case, but some drop back with xdp devmap prog attached. Version | Test | Generic | Native | Native + 2nd xdp_prog 5.10 rc6 | xdp_redirect_map i40e->i40e | 2.0M | 9.1M | 8.0M 5.10 rc6 | xdp_redirect_map i40e->veth | 1.7M | 11.0M | 9.7M 5.10 rc6 + patch | xdp_redirect_map i40e->i40e | 2.0M | 9.5M | 7.5M 5.10 rc6 + patch | xdp_redirect_map i40e->veth | 1.7M | 11.6M | 9.1M Reviewed-by: Maciej Fijalkowski Acked-by: Toke Høiland-Jørgensen Acked-by: John Fastabend Signed-off-by: Jesper Dangaard Brouer Signed-off-by: Hangbin Liu --- kernel/bpf/devmap.c | 146 +++++++++++++++++++++++++------------------- 1 file changed, 84 insertions(+), 62 deletions(-) diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c index 85d9d1b72a33..f80cf5036d39 100644 --- a/kernel/bpf/devmap.c +++ b/kernel/bpf/devmap.c @@ -57,6 +57,7 @@ struct xdp_dev_bulk_queue { struct list_head flush_node; struct net_device *dev; struct net_device *dev_rx; + struct bpf_prog *xdp_prog; unsigned int count; }; @@ -327,46 +328,92 @@ bool dev_map_can_have_prog(struct bpf_map *map) return false; } +static int dev_map_bpf_prog_run(struct bpf_prog *xdp_prog, + struct xdp_frame **frames, int n, + struct net_device *dev) +{ + struct xdp_txq_info txq = { .dev = dev }; + struct xdp_buff xdp; + int i, nframes = 0; + + for (i = 0; i < n; i++) { + struct xdp_frame *xdpf = frames[i]; + u32 act; + int err; + + xdp_convert_frame_to_buff(xdpf, &xdp); + xdp.txq = &txq; + + act = bpf_prog_run_xdp(xdp_prog, &xdp); + switch (act) { + case XDP_PASS: + err = xdp_update_frame_from_buff(&xdp, xdpf); + if (unlikely(err < 0)) + xdp_return_frame_rx_napi(xdpf); + else + frames[nframes++] = xdpf; + break; + default: + bpf_warn_invalid_xdp_action(act); + fallthrough; + case XDP_ABORTED: + trace_xdp_exception(dev, xdp_prog, act); + fallthrough; + case XDP_DROP: + xdp_return_frame_rx_napi(xdpf); + break; + } + } + return nframes; /* sent frames count */ +} + static void bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags) { struct net_device *dev = bq->dev; - int sent = 0, drops = 0, err = 0; + unsigned int cnt = bq->count; + int drops = 0, err = 0; + int to_send = cnt; + int sent = cnt; int i; - if (unlikely(!bq->count)) + if (unlikely(!cnt)) return; - for (i = 0; i < bq->count; i++) { + for (i = 0; i < cnt; i++) { struct xdp_frame *xdpf = bq->q[i]; prefetch(xdpf); } - sent = dev->netdev_ops->ndo_xdp_xmit(dev, bq->count, bq->q, flags); + if (bq->xdp_prog) { + to_send = dev_map_bpf_prog_run(bq->xdp_prog, bq->q, cnt, dev); + if (!to_send) { + sent = 0; + goto out; + } + drops = cnt - to_send; + } + + sent = dev->netdev_ops->ndo_xdp_xmit(dev, to_send, bq->q, flags); if (sent < 0) { err = sent; sent = 0; - goto error; + + /* If ndo_xdp_xmit fails with an errno, no frames have been + * xmit'ed and it's our responsibility to them free all. + */ + for (i = 0; i < cnt - drops; i++) { + struct xdp_frame *xdpf = bq->q[i]; + + xdp_return_frame_rx_napi(xdpf); + } } - drops = bq->count - sent; out: + drops = cnt - sent; bq->count = 0; trace_xdp_devmap_xmit(bq->dev_rx, dev, sent, drops, err); - bq->dev_rx = NULL; - __list_del_clearprev(&bq->flush_node); return; -error: - /* If ndo_xdp_xmit fails with an errno, no frames have been - * xmit'ed and it's our responsibility to them free all. - */ - for (i = 0; i < bq->count; i++) { - struct xdp_frame *xdpf = bq->q[i]; - - xdp_return_frame_rx_napi(xdpf); - drops++; - } - goto out; } /* __dev_flush is called from xdp_do_flush() which _must_ be signaled @@ -384,8 +431,12 @@ void __dev_flush(void) struct list_head *flush_list = this_cpu_ptr(&dev_flush_list); struct xdp_dev_bulk_queue *bq, *tmp; - list_for_each_entry_safe(bq, tmp, flush_list, flush_node) + list_for_each_entry_safe(bq, tmp, flush_list, flush_node) { bq_xmit_all(bq, XDP_XMIT_FLUSH); + bq->dev_rx = NULL; + bq->xdp_prog = NULL; + __list_del_clearprev(&bq->flush_node); + } } /* rcu_read_lock (from syscall and BPF contexts) ensures that if a delete and/or @@ -408,7 +459,7 @@ struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key) * Thus, safe percpu variable access. */ static void bq_enqueue(struct net_device *dev, struct xdp_frame *xdpf, - struct net_device *dev_rx) + struct net_device *dev_rx, struct bpf_prog *xdp_prog) { struct list_head *flush_list = this_cpu_ptr(&dev_flush_list); struct xdp_dev_bulk_queue *bq = this_cpu_ptr(dev->xdp_bulkq); @@ -419,18 +470,22 @@ static void bq_enqueue(struct net_device *dev, struct xdp_frame *xdpf, /* Ingress dev_rx will be the same for all xdp_frame's in * bulk_queue, because bq stored per-CPU and must be flushed * from net_device drivers NAPI func end. + * + * Do the same with xdp_prog and flush_list since these fields + * are only ever modified together. */ - if (!bq->dev_rx) + if (!bq->dev_rx) { bq->dev_rx = dev_rx; + bq->xdp_prog = xdp_prog; + list_add(&bq->flush_node, flush_list); + } bq->q[bq->count++] = xdpf; - - if (!bq->flush_node.prev) - list_add(&bq->flush_node, flush_list); } static inline int __xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp, - struct net_device *dev_rx) + struct net_device *dev_rx, + struct bpf_prog *xdp_prog) { struct xdp_frame *xdpf; int err; @@ -446,42 +501,14 @@ static inline int __xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp, if (unlikely(!xdpf)) return -EOVERFLOW; - bq_enqueue(dev, xdpf, dev_rx); + bq_enqueue(dev, xdpf, dev_rx, xdp_prog); return 0; } -static struct xdp_buff *dev_map_run_prog(struct net_device *dev, - struct xdp_buff *xdp, - struct bpf_prog *xdp_prog) -{ - struct xdp_txq_info txq = { .dev = dev }; - u32 act; - - xdp_set_data_meta_invalid(xdp); - xdp->txq = &txq; - - act = bpf_prog_run_xdp(xdp_prog, xdp); - switch (act) { - case XDP_PASS: - return xdp; - case XDP_DROP: - break; - default: - bpf_warn_invalid_xdp_action(act); - fallthrough; - case XDP_ABORTED: - trace_xdp_exception(dev, xdp_prog, act); - break; - } - - xdp_return_buff(xdp); - return NULL; -} - int dev_xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp, struct net_device *dev_rx) { - return __xdp_enqueue(dev, xdp, dev_rx); + return __xdp_enqueue(dev, xdp, dev_rx, NULL); } int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp, @@ -489,12 +516,7 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp, { struct net_device *dev = dst->dev; - if (dst->xdp_prog) { - xdp = dev_map_run_prog(dev, xdp, dst->xdp_prog); - if (!xdp) - return 0; - } - return __xdp_enqueue(dev, xdp, dev_rx); + return __xdp_enqueue(dev, xdp, dev_rx, dst->xdp_prog); } int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb, From patchwork Tue Mar 9 07:29:46 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hangbin Liu X-Patchwork-Id: 12124271 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0F268C43381 for ; Tue, 9 Mar 2021 07:31:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B4CE3652B7 for ; Tue, 9 Mar 2021 07:31:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230198AbhCIHal (ORCPT ); Tue, 9 Mar 2021 02:30:41 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52504 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230269AbhCIHah (ORCPT ); Tue, 9 Mar 2021 02:30:37 -0500 Received: from mail-pf1-x431.google.com (mail-pf1-x431.google.com [IPv6:2607:f8b0:4864:20::431]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 05A8EC06174A; Mon, 8 Mar 2021 23:30:37 -0800 (PST) Received: by mail-pf1-x431.google.com with SMTP id x7so5452020pfi.7; Mon, 08 Mar 2021 23:30:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=ucfB0blmLkGEM6tg+Rdo3Vq8KNRGcH6GBoVqyR79Dpc=; b=IyXjIjF2HV/k1tkc1OLkSwBzo+mvJMvJu5mis7CT9BfgZ5NwN9upKa7Z41zcQnYg22 fAtswdiOQAY2LvSVu2QwDR1VrXbphWB1TPB5JYEol00nltkjuP34OL/zFHk4s4opLKzW L9iBNXC6oWXIqsqbDP38Cokyjtc1WCoeKkH+lLQRSiEcYNj83JrwsQgDGy9vw5QRqj1I BJ7voiPNwyE+rUhb2eXiZt8uu9gBurxJnLPftvlN1UE0VyLc7g1Kp7atDm5EbK3BzxKA 22UYXRRlPEq7QZfS6FteMuB6gZDHBNRCrOoGnL8GpCmtSL1WKvHkSN7FLJMNKOiQEKbO RQ6w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=ucfB0blmLkGEM6tg+Rdo3Vq8KNRGcH6GBoVqyR79Dpc=; b=QMV+hUnf0t8vZLFBG0+20y5b/cyh+idjMHZCE/gm/BB9p+yplKoZoilmZQGSg8v2j8 r7eOY6N+s1mLPXBbcLh49nhbez9yts6U3m8iggStAU9BwzwNi8y7fUssyhpAhgZcGGMQ C8FUFy2eehqCUVxi/boB/p9ItAGdGu9lG3ljS9uplArT5V7ewaKKmSpCb/NIS1V/irTm 2n6FQ8glqarzsf0DyF5aJEVIYB00ySQQvhsOepVjrFAy8XlZsalf5J63RMuFKnnxDmDO Z7aUKapzweZbGLGeOQtRBdOny0t6FVKhkGmzddrmYlDTTyxOFOSS7tSd/6wCByOyqDA9 h9FQ== X-Gm-Message-State: AOAM533SrWt0o5gaGujKprO3l5rdiiV16wxS5PIf8/OvERRe5epXR/D/ b43plDohU1OAONJ7RqYgYzH6Mt2XaTw= X-Google-Smtp-Source: ABdhPJxbmc01HcmpCrJ89gwsO1ib3WmmrfKMAC92M4QpkEkY2tDjbc0il8qzVLXBJdZYjsJsHbST+Q== X-Received: by 2002:a62:5302:0:b029:1ee:c519:8cdd with SMTP id h2-20020a6253020000b02901eec5198cddmr24874920pfb.79.1615275036103; Mon, 08 Mar 2021 23:30:36 -0800 (PST) Received: from Leo-laptop-t470s.redhat.com ([209.132.188.80]) by smtp.gmail.com with ESMTPSA id j3sm11521007pgk.24.2021.03.08.23.30.31 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 08 Mar 2021 23:30:35 -0800 (PST) From: Hangbin Liu To: bpf@vger.kernel.org Cc: netdev@vger.kernel.org, =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rgensen?= , Jiri Benc , Jesper Dangaard Brouer , Eelco Chaudron , ast@kernel.org, Daniel Borkmann , Lorenzo Bianconi , David Ahern , Andrii Nakryiko , Alexei Starovoitov , John Fastabend , Maciej Fijalkowski , Hangbin Liu Subject: [PATCH bpf-next 2/4] xdp: extend xdp_redirect_map with broadcast support Date: Tue, 9 Mar 2021 15:29:46 +0800 Message-Id: <20210309072948.2127710-3-liuhangbin@gmail.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210309072948.2127710-1-liuhangbin@gmail.com> References: <20210309072948.2127710-1-liuhangbin@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net This patch add two flags BPF_F_BROADCAST and BPF_F_EXCLUDE_INGRESS to extend xdp_redirect_map for broadcast support. Keep the general data path in net/core/filter.c and the native data path in kernel/bpf/devmap.c so we can use direct calls to get better performace. Here is the performance result by using xdp_redirect_{map, map_multi} in sample/bpf and send pkts via pktgen cmd: ./pktgen_sample03_burst_single_flow.sh -i eno1 -d $dst_ip -m $dst_mac -t 10 -s 64 There are some drop back as we need to loop the map and get each interface. Version | Test | Generic | Native 5.11 | redirect_map i40e->i40e | 1.9M | 9.3M 5.11 | redirect_map i40e->veth | 1.5M | 11.2M 5.11 + patch | redirect_map i40e->i40e | 1.9M | 9.6M 5.11 + patch | redirect_map i40e->veth | 1.5M | 11.9M 5.11 + patch | redirect_map_multi i40e->i40e | 1.5M | 7.7M 5.11 + patch | redirect_map_multi i40e->veth | 1.2M | 9.1M 5.11 + patch | redirect_map_multi i40e->mlx4+veth | 0.9M | 3.2M Signed-off-by: Hangbin Liu Reported-by: kernel test robot --- include/linux/bpf.h | 16 +++++ include/net/xdp.h | 1 + include/uapi/linux/bpf.h | 16 ++++- kernel/bpf/devmap.c | 119 +++++++++++++++++++++++++++++++++ net/core/filter.c | 74 ++++++++++++++++++-- net/core/xdp.c | 29 ++++++++ tools/include/uapi/linux/bpf.h | 16 ++++- 7 files changed, 260 insertions(+), 11 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index c931bc97019d..bb07ccd170f2 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1458,6 +1458,9 @@ int dev_xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp, struct net_device *dev_rx); int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp, struct net_device *dev_rx); +bool dst_dev_is_ingress(struct bpf_dtab_netdev *obj, int ifindex); +int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx, + struct bpf_map *map, bool exclude_ingress); int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb, struct bpf_prog *xdp_prog); bool dev_map_can_have_prog(struct bpf_map *map); @@ -1630,6 +1633,19 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp, return 0; } +static inline +bool dst_dev_is_ingress(struct bpf_dtab_netdev *obj, int ifindex) +{ + return false; +} + +static inline +int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx, + struct bpf_map *map, bool exclude_ingress) +{ + return 0; +} + struct sk_buff; static inline int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, diff --git a/include/net/xdp.h b/include/net/xdp.h index a5bc214a49d9..5533f0ab2afc 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -170,6 +170,7 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf, struct sk_buff *xdp_build_skb_from_frame(struct xdp_frame *xdpf, struct net_device *dev); int xdp_alloc_skb_bulk(void **skbs, int n_skb, gfp_t gfp); +struct xdp_frame *xdpf_clone(struct xdp_frame *xdpf); static inline void xdp_convert_frame_to_buff(struct xdp_frame *frame, struct xdp_buff *xdp) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 2d3036e292a9..4cca312f80ca 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -2508,8 +2508,12 @@ union bpf_attr { * The lower two bits of *flags* are used as the return code if * the map lookup fails. This is so that the return value can be * one of the XDP program return codes up to **XDP_TX**, as chosen - * by the caller. Any higher bits in the *flags* argument must be - * unset. + * by the caller. The higher bits of *flags* can be set to + * BPF_F_BROADCAST or BPF_F_EXCLUDE_INGRESS as defined below. + * + * With BPF_F_BROADCAST the packet will be broadcasted to all the + * interfaces in the map. with BPF_F_EXCLUDE_INGRESS the ingress + * interface will be excluded when do broadcasting. * * See also **bpf_redirect**\ (), which only supports redirecting * to an ifindex, but doesn't require a map to do so. @@ -5004,6 +5008,14 @@ enum { BPF_F_BPRM_SECUREEXEC = (1ULL << 0), }; +/* Flags for bpf_redirect_map helper */ +enum { + BPF_F_BROADCAST = (1ULL << 3), + BPF_F_EXCLUDE_INGRESS = (1ULL << 4), +}; +#define BPF_F_ACTION_MASK (XDP_ABORTED | XDP_DROP | XDP_PASS | XDP_TX) +#define BPF_F_REDIR_MASK (BPF_F_BROADCAST | BPF_F_EXCLUDE_INGRESS) + #define __bpf_md_ptr(type, name) \ union { \ type name; \ diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c index f80cf5036d39..ad616a043d2a 100644 --- a/kernel/bpf/devmap.c +++ b/kernel/bpf/devmap.c @@ -519,6 +519,125 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp, return __xdp_enqueue(dev, xdp, dev_rx, dst->xdp_prog); } +/* Use direct call in fast path instead of map->ops->map_get_next_key() */ +static int devmap_get_next_key(struct bpf_map *map, void *key, void *next_key) +{ + switch (map->map_type) { + case BPF_MAP_TYPE_DEVMAP: + return dev_map_get_next_key(map, key, next_key); + case BPF_MAP_TYPE_DEVMAP_HASH: + return dev_map_hash_get_next_key(map, key, next_key); + default: + break; + } + + return -ENOENT; +} + +bool dst_dev_is_ingress(struct bpf_dtab_netdev *dst, int ifindex) +{ + return dst->dev->ifindex == ifindex; +} + +static struct bpf_dtab_netdev *devmap_get_next_obj(struct xdp_buff *xdp, + struct bpf_map *map, + u32 *key, u32 *next_key, + int ex_ifindex) +{ + struct bpf_dtab_netdev *obj; + struct net_device *dev; + u32 *tmp_key = key; + u32 index; + int err; + + err = devmap_get_next_key(map, tmp_key, next_key); + if (err) + return NULL; + + /* When using dev map hash, we could restart the hashtab traversal + * in case the key has been updated/removed in the mean time. + * So we may end up potentially looping due to traversal restarts + * from first elem. + * + * Let's use map's max_entries to limit the loop number. + */ + for (index = 0; index < map->max_entries; index++) { + switch (map->map_type) { + case BPF_MAP_TYPE_DEVMAP: + obj = __dev_map_lookup_elem(map, *next_key); + break; + case BPF_MAP_TYPE_DEVMAP_HASH: + obj = __dev_map_hash_lookup_elem(map, *next_key); + break; + default: + break; + } + + if (!obj || dst_dev_is_ingress(obj, ex_ifindex)) + goto find_next; + + dev = obj->dev; + + if (!dev->netdev_ops->ndo_xdp_xmit) + goto find_next; + + err = xdp_ok_fwd_dev(dev, xdp->data_end - xdp->data); + if (unlikely(err)) + goto find_next; + + return obj; + +find_next: + tmp_key = next_key; + err = devmap_get_next_key(map, tmp_key, next_key); + if (err) + break; + } + + return NULL; +} + +int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx, + struct bpf_map *map, bool exclude_ingress) +{ + struct bpf_dtab_netdev *obj = NULL, *next_obj = NULL; + struct xdp_frame *xdpf, *nxdpf; + int ex_ifindex; + u32 key, next_key; + + ex_ifindex = exclude_ingress ? dev_rx->ifindex : 0; + + /* Find first available obj */ + obj = devmap_get_next_obj(xdp, map, NULL, &key, ex_ifindex); + if (!obj) + return -ENOENT; + + xdpf = xdp_convert_buff_to_frame(xdp); + if (unlikely(!xdpf)) + return -EOVERFLOW; + + for (;;) { + /* Check if we still have one more available obj */ + next_obj = devmap_get_next_obj(xdp, map, &key, &next_key, ex_ifindex); + if (!next_obj) { + bq_enqueue(obj->dev, xdpf, dev_rx, obj->xdp_prog); + return 0; + } + + nxdpf = xdpf_clone(xdpf); + if (unlikely(!nxdpf)) { + xdp_return_frame_rx_napi(xdpf); + return -ENOMEM; + } + + bq_enqueue(obj->dev, nxdpf, dev_rx, obj->xdp_prog); + + /* Deal with next obj */ + obj = next_obj; + key = next_key; + } +} + int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb, struct bpf_prog *xdp_prog) { diff --git a/net/core/filter.c b/net/core/filter.c index 588b19ba0da8..b191718186ef 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -3919,12 +3919,17 @@ static const struct bpf_func_proto bpf_xdp_adjust_meta_proto = { }; static int __bpf_tx_xdp_map(struct net_device *dev_rx, void *fwd, - struct bpf_map *map, struct xdp_buff *xdp) + struct bpf_map *map, struct xdp_buff *xdp, + u32 flags) { switch (map->map_type) { case BPF_MAP_TYPE_DEVMAP: case BPF_MAP_TYPE_DEVMAP_HASH: - return dev_map_enqueue(fwd, xdp, dev_rx); + if (flags & BPF_F_REDIR_BROADCAST) + return dev_map_enqueue_multi(xdp, dev_rx, map, + flags & BPF_F_REDIR_EXCLUDE_INGRESS); + else + return dev_map_enqueue(fwd, xdp, dev_rx); case BPF_MAP_TYPE_CPUMAP: return cpu_map_enqueue(fwd, xdp, dev_rx); case BPF_MAP_TYPE_XSKMAP: @@ -3998,7 +4003,7 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp, err = dev_xdp_enqueue(fwd, xdp, dev); } else { - err = __bpf_tx_xdp_map(dev, fwd, map, xdp); + err = __bpf_tx_xdp_map(dev, fwd, map, xdp, ri->flags); } if (unlikely(err)) @@ -4012,6 +4017,57 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp, } EXPORT_SYMBOL_GPL(xdp_do_redirect); +static int dev_map_redirect_multi(struct net_device *dev, struct sk_buff *skb, + struct bpf_prog *xdp_prog, struct bpf_map *map, + bool exclude_ingress) +{ + struct bpf_dtab_netdev *dst; + u32 key, next_key, index; + struct sk_buff *nskb; + void *fwd; + int err; + + err = map->ops->map_get_next_key(map, NULL, &key); + if (err) + return err; + + /* When using dev map hash, we could restart the hashtab traversal + * in case the key has been updated/removed in the mean time. + * So we may end up potentially looping due to traversal restarts + * from first elem. + * + * Let's use map's max_entries to limit the loop number. + */ + + for (index = 0; index < map->max_entries; index++) { + fwd = __xdp_map_lookup_elem(map, key); + if (fwd) { + dst = (struct bpf_dtab_netdev *)fwd; + if (dst_dev_is_ingress(dst, exclude_ingress ? dev->ifindex : 0)) + goto find_next; + + nskb = skb_clone(skb, GFP_ATOMIC); + if (!nskb) + return -ENOMEM; + + /* Try forword next one no mater the current forward + * succeed or not. + */ + dev_map_generic_redirect(dst, nskb, xdp_prog); + } + +find_next: + err = map->ops->map_get_next_key(map, &key, &next_key); + if (err) + break; + + key = next_key; + } + + consume_skb(skb); + return 0; +} + static int xdp_do_generic_redirect_map(struct net_device *dev, struct sk_buff *skb, struct xdp_buff *xdp, @@ -4031,7 +4087,11 @@ static int xdp_do_generic_redirect_map(struct net_device *dev, map->map_type == BPF_MAP_TYPE_DEVMAP_HASH) { struct bpf_dtab_netdev *dst = fwd; - err = dev_map_generic_redirect(dst, skb, xdp_prog); + if (ri->flags & BPF_F_REDIR_BROADCAST) + err = dev_map_redirect_multi(dev, skb, xdp_prog, map, + ri->flags & BPF_F_REDIR_EXCLUDE_INGRESS); + else + err = dev_map_generic_redirect(dst, skb, xdp_prog); if (unlikely(err)) goto err; } else if (map->map_type == BPF_MAP_TYPE_XSKMAP) { @@ -4115,18 +4175,18 @@ BPF_CALL_3(bpf_xdp_redirect_map, struct bpf_map *, map, u32, ifindex, struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); /* Lower bits of the flags are used as return code on lookup failure */ - if (unlikely(flags > XDP_TX)) + if (unlikely(flags & ~(BPF_F_ACTION_MASK | BPF_F_REDIR_MASK))) return XDP_ABORTED; ri->tgt_value = __xdp_map_lookup_elem(map, ifindex); - if (unlikely(!ri->tgt_value)) { + if (unlikely(!ri->tgt_value) && !(flags & BPF_F_REDIR_BROADCAST)) { /* If the lookup fails we want to clear out the state in the * redirect_info struct completely, so that if an eBPF program * performs multiple lookups, the last one always takes * precedence. */ WRITE_ONCE(ri->map, NULL); - return flags; + return flags & BPF_F_ACTION_MASK; } ri->flags = flags; diff --git a/net/core/xdp.c b/net/core/xdp.c index 05354976c1fc..aba84d04642b 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -583,3 +583,32 @@ struct sk_buff *xdp_build_skb_from_frame(struct xdp_frame *xdpf, return __xdp_build_skb_from_frame(xdpf, skb, dev); } EXPORT_SYMBOL_GPL(xdp_build_skb_from_frame); + +struct xdp_frame *xdpf_clone(struct xdp_frame *xdpf) +{ + unsigned int headroom, totalsize; + struct xdp_frame *nxdpf; + struct page *page; + void *addr; + + headroom = xdpf->headroom + sizeof(*xdpf); + totalsize = headroom + xdpf->len; + + if (unlikely(totalsize > PAGE_SIZE)) + return NULL; + page = dev_alloc_page(); + if (!page) + return NULL; + addr = page_to_virt(page); + + memcpy(addr, xdpf, totalsize); + + nxdpf = addr; + nxdpf->data = addr + headroom; + nxdpf->frame_sz = PAGE_SIZE; + nxdpf->mem.type = MEM_TYPE_PAGE_ORDER0; + nxdpf->mem.id = 0; + + return nxdpf; +} +EXPORT_SYMBOL_GPL(xdpf_clone); diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 2d3036e292a9..4cca312f80ca 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -2508,8 +2508,12 @@ union bpf_attr { * The lower two bits of *flags* are used as the return code if * the map lookup fails. This is so that the return value can be * one of the XDP program return codes up to **XDP_TX**, as chosen - * by the caller. Any higher bits in the *flags* argument must be - * unset. + * by the caller. The higher bits of *flags* can be set to + * BPF_F_BROADCAST or BPF_F_EXCLUDE_INGRESS as defined below. + * + * With BPF_F_BROADCAST the packet will be broadcasted to all the + * interfaces in the map. with BPF_F_EXCLUDE_INGRESS the ingress + * interface will be excluded when do broadcasting. * * See also **bpf_redirect**\ (), which only supports redirecting * to an ifindex, but doesn't require a map to do so. @@ -5004,6 +5008,14 @@ enum { BPF_F_BPRM_SECUREEXEC = (1ULL << 0), }; +/* Flags for bpf_redirect_map helper */ +enum { + BPF_F_BROADCAST = (1ULL << 3), + BPF_F_EXCLUDE_INGRESS = (1ULL << 4), +}; +#define BPF_F_ACTION_MASK (XDP_ABORTED | XDP_DROP | XDP_PASS | XDP_TX) +#define BPF_F_REDIR_MASK (BPF_F_BROADCAST | BPF_F_EXCLUDE_INGRESS) + #define __bpf_md_ptr(type, name) \ union { \ type name; \ From patchwork Tue Mar 9 07:29:47 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hangbin Liu X-Patchwork-Id: 12124275 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 66502C4332B for ; Tue, 9 Mar 2021 07:31:45 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 41355652B6 for ; Tue, 9 Mar 2021 07:31:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230047AbhCIHbM (ORCPT ); Tue, 9 Mar 2021 02:31:12 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52520 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229607AbhCIHam (ORCPT ); Tue, 9 Mar 2021 02:30:42 -0500 Received: from mail-pg1-x52f.google.com (mail-pg1-x52f.google.com [IPv6:2607:f8b0:4864:20::52f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4FECEC06174A; Mon, 8 Mar 2021 23:30:42 -0800 (PST) Received: by mail-pg1-x52f.google.com with SMTP id n10so8177950pgl.10; Mon, 08 Mar 2021 23:30:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=MprfcIK4cl1HUWnXo0IhEBI6LPXX2CStbuzUiBcOqfI=; b=KCtQGvdRjkpLtK3KoQp9KfEcKI9mSmmKQLX9o//dnqEP4fhsFEEZddd07T2BPqYdgm N2S1Lr6SakMyXpVPc3WdtBW7rfoxTsV2E9PVHeFHYNz+ZrLkgieT0jJacxnFLYyI2wWj HMp3rblvx6jf+tOb11bOTFE611HF9VWYSVPC29oxuUTkNVAeM898wBiRGv2DwVzOLewr LICoSXqohbpXXhNDCdCrPkxkJLQrXyydiYfsd1YqvAnrwNyTIdtKJJ1ckN3BlZFwwdtY pOkRa9ecQOkqnSl66Bu3QiSLJK16XgCQTVFGDaX+gGdv76yuAWU1bmXwIIN2ihHa+lJY rfEg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=MprfcIK4cl1HUWnXo0IhEBI6LPXX2CStbuzUiBcOqfI=; b=ouo2NNam3yBNRoonPZEg7UAIMl5uwutmFJQUmYfYksUEOmDQUAT3GPEIBWvh6Q5DmU dNRpwTt37Lj6fxN+GjrNrwQejKCycJA79ducnm+vXgSER5qwVl2+0Q0AB9pZXCyvs/so udMGB9A9Hp1b3plNpKmj/T+x7+nVAZBwZEFFSwroJyGmN6ciS3tPvQZSR9cdEJTtLzrt RWTaE40U/8dVLQ7QiS7hbTGhTanDrrMCEtAQ6DR9OIsjW+5346cTM3KQlG9Bd52sDnPK hh451nG5vSbN1TwAej9LOfFKYKvJc+YrQN8BWkjx1nYDUcCxfsJ6CCjPMc7uGAMtTBTe Dhgw== X-Gm-Message-State: AOAM532sFSXpdQFRayGCLMpFaEqmPyzR/+ksqzYYCglITGHH54KpRtjU vMclLVe4m2A9fiRexfGPCUaLmLs39Y0= X-Google-Smtp-Source: ABdhPJwvMtTctnb8hv4S9e1XJj5PV/zCNOXOyf5v+rkQ6Qv83kdsnTQDdJGCa9Q5r58XpC8LjNhaag== X-Received: by 2002:a63:751c:: with SMTP id q28mr3924718pgc.352.1615275041427; Mon, 08 Mar 2021 23:30:41 -0800 (PST) Received: from Leo-laptop-t470s.redhat.com ([209.132.188.80]) by smtp.gmail.com with ESMTPSA id j3sm11521007pgk.24.2021.03.08.23.30.36 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 08 Mar 2021 23:30:41 -0800 (PST) From: Hangbin Liu To: bpf@vger.kernel.org Cc: netdev@vger.kernel.org, =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rgensen?= , Jiri Benc , Jesper Dangaard Brouer , Eelco Chaudron , ast@kernel.org, Daniel Borkmann , Lorenzo Bianconi , David Ahern , Andrii Nakryiko , Alexei Starovoitov , John Fastabend , Maciej Fijalkowski , Hangbin Liu Subject: [PATCH bpf-next 3/4] sample/bpf: add xdp_redirect_map_multi for redirect_map broadcast test Date: Tue, 9 Mar 2021 15:29:47 +0800 Message-Id: <20210309072948.2127710-4-liuhangbin@gmail.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210309072948.2127710-1-liuhangbin@gmail.com> References: <20210309072948.2127710-1-liuhangbin@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net This is a sample for xdp redirect broadcast. In the sample we could forward all packets between given interfaces. There is also an option -X that could enable 2nd xdp_prog on egress interface. Signed-off-by: Hangbin Liu --- samples/bpf/Makefile | 3 + samples/bpf/xdp_redirect_map_multi_kern.c | 87 +++++++ samples/bpf/xdp_redirect_map_multi_user.c | 302 ++++++++++++++++++++++ 3 files changed, 392 insertions(+) create mode 100644 samples/bpf/xdp_redirect_map_multi_kern.c create mode 100644 samples/bpf/xdp_redirect_map_multi_user.c diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile index 45ceca4e2c70..520434ea966f 100644 --- a/samples/bpf/Makefile +++ b/samples/bpf/Makefile @@ -41,6 +41,7 @@ tprogs-y += test_map_in_map tprogs-y += per_socket_stats_example tprogs-y += xdp_redirect tprogs-y += xdp_redirect_map +tprogs-y += xdp_redirect_map_multi tprogs-y += xdp_redirect_cpu tprogs-y += xdp_monitor tprogs-y += xdp_rxq_info @@ -99,6 +100,7 @@ test_map_in_map-objs := test_map_in_map_user.o per_socket_stats_example-objs := cookie_uid_helper_example.o xdp_redirect-objs := xdp_redirect_user.o xdp_redirect_map-objs := xdp_redirect_map_user.o +xdp_redirect_map_multi-objs := xdp_redirect_map_multi_user.o xdp_redirect_cpu-objs := xdp_redirect_cpu_user.o xdp_monitor-objs := xdp_monitor_user.o xdp_rxq_info-objs := xdp_rxq_info_user.o @@ -160,6 +162,7 @@ always-y += tcp_tos_reflect_kern.o always-y += tcp_dumpstats_kern.o always-y += xdp_redirect_kern.o always-y += xdp_redirect_map_kern.o +always-y += xdp_redirect_map_multi_kern.o always-y += xdp_redirect_cpu_kern.o always-y += xdp_monitor_kern.o always-y += xdp_rxq_info_kern.o diff --git a/samples/bpf/xdp_redirect_map_multi_kern.c b/samples/bpf/xdp_redirect_map_multi_kern.c new file mode 100644 index 000000000000..e6be70225ee1 --- /dev/null +++ b/samples/bpf/xdp_redirect_map_multi_kern.c @@ -0,0 +1,87 @@ +// SPDX-License-Identifier: GPL-2.0 +#define KBUILD_MODNAME "foo" +#include +#include +#include +#include +#include +#include + +struct { + __uint(type, BPF_MAP_TYPE_DEVMAP_HASH); + __uint(key_size, sizeof(int)); + __uint(value_size, sizeof(int)); + __uint(max_entries, 32); +} forward_map_general SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_DEVMAP_HASH); + __uint(key_size, sizeof(int)); + __uint(value_size, sizeof(struct bpf_devmap_val)); + __uint(max_entries, 32); +} forward_map_native SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY); + __type(key, u32); + __type(value, long); + __uint(max_entries, 1); +} rxcnt SEC(".maps"); + +/* map to store egress interfaces mac addresses, set the + * max_entries to 1 and extend it in user sapce prog. + */ +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __type(key, u32); + __type(value, __be64); + __uint(max_entries, 1); +} mac_map SEC(".maps"); + +static int xdp_redirect_map(struct xdp_md *ctx, void *forward_map) +{ + long *value; + u32 key = 0; + + /* count packet in global counter */ + value = bpf_map_lookup_elem(&rxcnt, &key); + if (value) + *value += 1; + + return bpf_redirect_map(forward_map, key, BPF_F_REDIR_MASK); +} + +SEC("xdp_redirect_general") +int xdp_redirect_map_general(struct xdp_md *ctx) +{ + return xdp_redirect_map(ctx, &forward_map_general); +} + +SEC("xdp_redirect_native") +int xdp_redirect_map_native(struct xdp_md *ctx) +{ + return xdp_redirect_map(ctx, &forward_map_native); +} + +SEC("xdp_devmap/map_prog") +int xdp_devmap_prog(struct xdp_md *ctx) +{ + void *data_end = (void *)(long)ctx->data_end; + void *data = (void *)(long)ctx->data; + u32 key = ctx->egress_ifindex; + struct ethhdr *eth = data; + __be64 *mac; + u64 nh_off; + + nh_off = sizeof(*eth); + if (data + nh_off > data_end) + return XDP_DROP; + + mac = bpf_map_lookup_elem(&mac_map, &key); + if (mac) + __builtin_memcpy(eth->h_source, mac, ETH_ALEN); + + return XDP_PASS; +} + +char _license[] SEC("license") = "GPL"; diff --git a/samples/bpf/xdp_redirect_map_multi_user.c b/samples/bpf/xdp_redirect_map_multi_user.c new file mode 100644 index 000000000000..84cdbbed20b7 --- /dev/null +++ b/samples/bpf/xdp_redirect_map_multi_user.c @@ -0,0 +1,302 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "bpf_util.h" +#include +#include + +#define MAX_IFACE_NUM 32 + +static __u32 xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST; +static int ifaces[MAX_IFACE_NUM] = {}; +static int rxcnt_map_fd; + +static void int_exit(int sig) +{ + __u32 prog_id = 0; + int i; + + for (i = 0; ifaces[i] > 0; i++) { + if (bpf_get_link_xdp_id(ifaces[i], &prog_id, xdp_flags)) { + printf("bpf_get_link_xdp_id failed\n"); + exit(1); + } + if (prog_id) + bpf_set_link_xdp_fd(ifaces[i], -1, xdp_flags); + } + + exit(0); +} + +static void poll_stats(int interval) +{ + unsigned int nr_cpus = bpf_num_possible_cpus(); + __u64 values[nr_cpus], prev[nr_cpus]; + + memset(prev, 0, sizeof(prev)); + + while (1) { + __u64 sum = 0; + __u32 key = 0; + int i; + + sleep(interval); + assert(bpf_map_lookup_elem(rxcnt_map_fd, &key, values) == 0); + for (i = 0; i < nr_cpus; i++) + sum += (values[i] - prev[i]); + if (sum) + printf("Forwarding %10llu pkt/s\n", sum / interval); + memcpy(prev, values, sizeof(values)); + } +} + +static int get_mac_addr(unsigned int ifindex, void *mac_addr) +{ + char ifname[IF_NAMESIZE]; + struct ifreq ifr; + int fd, ret = -1; + + fd = socket(AF_INET, SOCK_DGRAM, 0); + if (fd < 0) + return ret; + + if (!if_indextoname(ifindex, ifname)) + goto err_out; + + strcpy(ifr.ifr_name, ifname); + + if (ioctl(fd, SIOCGIFHWADDR, &ifr) != 0) + goto err_out; + + memcpy(mac_addr, ifr.ifr_hwaddr.sa_data, 6 * sizeof(char)); + ret = 0; + +err_out: + close(fd); + return ret; +} + +static int update_mac_map(struct bpf_object *obj) +{ + int i, ret = -1, mac_map_fd; + unsigned char mac_addr[6]; + unsigned int ifindex; + + mac_map_fd = bpf_object__find_map_fd_by_name(obj, "mac_map"); + if (mac_map_fd < 0) { + printf("find mac map fd failed\n"); + return ret; + } + + for (i = 0; ifaces[i] > 0; i++) { + ifindex = ifaces[i]; + + ret = get_mac_addr(ifindex, mac_addr); + if (ret < 0) { + printf("get interface %d mac failed\n", ifindex); + return ret; + } + + ret = bpf_map_update_elem(mac_map_fd, &ifindex, mac_addr, 0); + if (ret) { + perror("bpf_update_elem mac_map_fd"); + return ret; + } + } + + return 0; +} + +static void usage(const char *prog) +{ + fprintf(stderr, + "usage: %s [OPTS] ...\n" + "OPTS:\n" + " -S use skb-mode\n" + " -N enforce native mode\n" + " -F force loading prog\n" + " -X load xdp program on egress\n", + prog); +} + +int main(int argc, char **argv) +{ + int i, ret, opt, forward_map_fd, max_ifindex = 0; + struct bpf_program *ingress_prog, *egress_prog; + int ingress_prog_fd, egress_prog_fd = 0; + struct bpf_devmap_val devmap_val; + bool attach_egress_prog = false; + char ifname[IF_NAMESIZE]; + struct bpf_map *mac_map; + struct bpf_object *obj; + unsigned int ifindex; + char filename[256]; + + while ((opt = getopt(argc, argv, "SNFX")) != -1) { + switch (opt) { + case 'S': + xdp_flags |= XDP_FLAGS_SKB_MODE; + break; + case 'N': + /* default, set below */ + break; + case 'F': + xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST; + break; + case 'X': + attach_egress_prog = true; + break; + default: + usage(basename(argv[0])); + return 1; + } + } + + if (!(xdp_flags & XDP_FLAGS_SKB_MODE)) { + xdp_flags |= XDP_FLAGS_DRV_MODE; + } else if (attach_egress_prog) { + printf("Load xdp program on egress with SKB mode not supported yet\n"); + return 1; + } + + if (optind == argc) { + printf("usage: %s ...\n", argv[0]); + return 1; + } + + printf("Get interfaces"); + for (i = 0; i < MAX_IFACE_NUM && argv[optind + i]; i++) { + ifaces[i] = if_nametoindex(argv[optind + i]); + if (!ifaces[i]) + ifaces[i] = strtoul(argv[optind + i], NULL, 0); + if (!if_indextoname(ifaces[i], ifname)) { + perror("Invalid interface name or i"); + return 1; + } + + /* Find the largest index number */ + if (ifaces[i] > max_ifindex) + max_ifindex = ifaces[i]; + + printf(" %d", ifaces[i]); + } + printf("\n"); + + snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]); + + obj = bpf_object__open(filename); + if (libbpf_get_error(obj)) { + printf("ERROR: opening BPF object file failed\n"); + obj = NULL; + goto err_out; + } + + /* Reset the map size to max ifindex + 1 */ + if (attach_egress_prog) { + mac_map = bpf_object__find_map_by_name(obj, "mac_map"); + ret = bpf_map__resize(mac_map, max_ifindex + 1); + if (ret < 0) { + printf("ERROR: reset mac map size failed\n"); + goto err_out; + } + } + + /* load BPF program */ + if (bpf_object__load(obj)) { + printf("ERROR: loading BPF object file failed\n"); + goto err_out; + } + + if (xdp_flags & XDP_FLAGS_SKB_MODE) { + ingress_prog = bpf_object__find_program_by_name(obj, "xdp_redirect_map_general"); + forward_map_fd = bpf_object__find_map_fd_by_name(obj, "forward_map_general"); + } else { + ingress_prog = bpf_object__find_program_by_name(obj, "xdp_redirect_map_native"); + forward_map_fd = bpf_object__find_map_fd_by_name(obj, "forward_map_native"); + } + if (!ingress_prog || forward_map_fd < 0) { + printf("finding ingress_prog/forward_map in obj file failed\n"); + goto err_out; + } + + ingress_prog_fd = bpf_program__fd(ingress_prog); + if (ingress_prog_fd < 0) { + printf("find ingress_prog fd failed\n"); + goto err_out; + } + + rxcnt_map_fd = bpf_object__find_map_fd_by_name(obj, "rxcnt"); + if (rxcnt_map_fd < 0) { + printf("bpf_object__find_map_fd_by_name failed\n"); + goto err_out; + } + + if (attach_egress_prog) { + /* Update mac_map with all egress interfaces' mac addr */ + if (update_mac_map(obj) < 0) { + printf("Error: update mac map failed"); + goto err_out; + } + + /* Find egress prog fd */ + egress_prog = bpf_object__find_program_by_name(obj, "xdp_devmap_prog"); + if (!egress_prog) { + printf("finding egress_prog in obj file failed\n"); + goto err_out; + } + egress_prog_fd = bpf_program__fd(egress_prog); + if (egress_prog_fd < 0) { + printf("find egress_prog fd failed\n"); + goto err_out; + } + } + + /* Remove attached program when program is interrupted or killed */ + signal(SIGINT, int_exit); + signal(SIGTERM, int_exit); + + /* Init forward multicast groups */ + for (i = 0; ifaces[i] > 0; i++) { + ifindex = ifaces[i]; + + /* bind prog_fd to each interface */ + ret = bpf_set_link_xdp_fd(ifindex, ingress_prog_fd, xdp_flags); + if (ret) { + printf("Set xdp fd failed on %d\n", ifindex); + goto err_out; + } + + /* Add all the interfaces to forward group and attach + * egress devmap programe if exist + */ + devmap_val.ifindex = ifindex; + devmap_val.bpf_prog.fd = egress_prog_fd; + ret = bpf_map_update_elem(forward_map_fd, &ifindex, &devmap_val, 0); + if (ret) { + perror("bpf_map_update_elem forward_map"); + goto err_out; + } + } + + poll_stats(2); + + return 0; + +err_out: + return 1; +} From patchwork Tue Mar 9 07:29:48 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hangbin Liu X-Patchwork-Id: 12124273 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 09BF2C433E9 for ; Tue, 9 Mar 2021 07:31:45 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DADB26520C for ; Tue, 9 Mar 2021 07:31:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230269AbhCIHbN (ORCPT ); Tue, 9 Mar 2021 02:31:13 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52542 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230285AbhCIHar (ORCPT ); Tue, 9 Mar 2021 02:30:47 -0500 Received: from mail-pf1-x435.google.com (mail-pf1-x435.google.com [IPv6:2607:f8b0:4864:20::435]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C054DC06174A; Mon, 8 Mar 2021 23:30:47 -0800 (PST) Received: by mail-pf1-x435.google.com with SMTP id t29so8768274pfg.11; Mon, 08 Mar 2021 23:30:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=kHrqjlvxbU7kze/xntv5k8iOR3RHcsQUEj6Atilesjg=; b=tIHQvHC8eDOAfYTnTV4DLGlW+WIM6ccOU2Rh5QNW86rKqh0BMXeshhCVbjqQks9QZO trbckBaCFHLJ9VnApZjJQi+wpDLCVs8sNCl8Q+cqrm8eDPa45mBRzdpDI8U3zgOwlNOp f43sXPSBgFLDybQNZgUEXifEiRSKd28ibjcdUJCQbxwnHjm9BDVV3oW57w3wDu3lXMXV cVqkinwlkX6I96qcv+K0NpH+/6futXpBf3YEDu99EAKooEn1a6dsHi3l1cQrJUGepoGK WnS2Pe96dRXwA+r9kAdweIk1YIJTyoWejM6pRnmluT5PGeLzL/FTfA2k5IupZOFJ8Dt4 uYyA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=kHrqjlvxbU7kze/xntv5k8iOR3RHcsQUEj6Atilesjg=; b=QmMrF3BUuRMOfGv75huge3dyAogLcaQxXZBNXbyG1GWKonabx7x+doOXTwN3QQqpvX uDqlIIDQK6rnM4SMydFJaCnizbYy96fTxsWzszJn4Sd+CxM1ZdAV0+FV7uBheKtLlnv/ x2Hvh8pBWS7NFeLKmHNEhiAuWElMgSRP0qHYwhm56KQSuhxbThAuELWTB288l1o3A7GP 8es7Pwj3MLr/E38627rcEzPZTL3aTZ63lpLsfKvFrgMfQ8Z7JtwhlMPMir+juImCO82o 4Ld+flWSSaFYvmTiDrvpo1Dt2+KEschyjUr4D7CEfslT3ifP7BTRM6Vn3mfGF33+W2Li 5jHw== X-Gm-Message-State: AOAM532BknhS5pqSQ+Gh5j4SEtwF3OLbmUhNy0p0ad1wfEG059B5B5ge MWRw9H/n1sTlc9EebNaeKKqjLNGpznI= X-Google-Smtp-Source: ABdhPJx6m2nDjXrmjiyEYpEL/71EfCCfSh/v9EKHM9mlb3kEE/xqIzT234yFUEWz6jQnWIBEHHlotw== X-Received: by 2002:a63:520e:: with SMTP id g14mr5101406pgb.350.1615275046504; Mon, 08 Mar 2021 23:30:46 -0800 (PST) Received: from Leo-laptop-t470s.redhat.com ([209.132.188.80]) by smtp.gmail.com with ESMTPSA id j3sm11521007pgk.24.2021.03.08.23.30.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 08 Mar 2021 23:30:46 -0800 (PST) From: Hangbin Liu To: bpf@vger.kernel.org Cc: netdev@vger.kernel.org, =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rgensen?= , Jiri Benc , Jesper Dangaard Brouer , Eelco Chaudron , ast@kernel.org, Daniel Borkmann , Lorenzo Bianconi , David Ahern , Andrii Nakryiko , Alexei Starovoitov , John Fastabend , Maciej Fijalkowski , Hangbin Liu Subject: [PATCH bpf-next 4/4] selftests/bpf: add xdp_redirect_multi test Date: Tue, 9 Mar 2021 15:29:48 +0800 Message-Id: <20210309072948.2127710-5-liuhangbin@gmail.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210309072948.2127710-1-liuhangbin@gmail.com> References: <20210309072948.2127710-1-liuhangbin@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net Add a bpf selftest for new helper xdp_redirect_map_multi(). In this test there are 3 forward groups and 1 exclude group. The test will redirect each interface's packets to all the interfaces in the forward group, and exclude the interface in exclude map. Two maps (DEVMAP, DEVMAP_HASH) and two xdp modes (generic, drive) will be tested. XDP egress program will also be tested by setting pkt src MAC to egress interface's MAC address. For more test details, you can find it in the test script. Here is the test result. ]# ./test_xdp_redirect_multi.sh Pass: xdpgeneric arp ns1-2 Pass: xdpgeneric arp ns1-3 Pass: xdpgeneric arp ns1-4 Pass: xdpgeneric ping ns1-2 Pass: xdpgeneric ping ns1-3 Pass: xdpgeneric ping ns1-4 Pass: xdpdrv arp ns1-2 Pass: xdpdrv arp ns1-3 Pass: xdpdrv arp ns1-4 Pass: xdpdrv ping ns1-2 Pass: xdpdrv ping ns1-3 Pass: xdpdrv ping ns1-4 Pass: xdpegress mac ns1-2 Pass: xdpegress mac ns1-3 Pass: xdpegress mac ns1-4 Summary: PASS 15, FAIL 0 Signed-off-by: Hangbin Liu --- tools/testing/selftests/bpf/Makefile | 3 +- .../bpf/progs/xdp_redirect_multi_kern.c | 96 +++++++ .../selftests/bpf/test_xdp_redirect_multi.sh | 187 ++++++++++++++ .../selftests/bpf/xdp_redirect_multi.c | 236 ++++++++++++++++++ 4 files changed, 521 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/bpf/progs/xdp_redirect_multi_kern.c create mode 100755 tools/testing/selftests/bpf/test_xdp_redirect_multi.sh create mode 100644 tools/testing/selftests/bpf/xdp_redirect_multi.c diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile index c3999587bc23..e9112899a9a5 100644 --- a/tools/testing/selftests/bpf/Makefile +++ b/tools/testing/selftests/bpf/Makefile @@ -49,6 +49,7 @@ TEST_FILES = xsk_prereqs.sh \ # Order correspond to 'make run_tests' order TEST_PROGS := test_kmod.sh \ test_xdp_redirect.sh \ + test_xdp_redirect_multi.sh \ test_xdp_meta.sh \ test_xdp_veth.sh \ test_offload.py \ @@ -79,7 +80,7 @@ TEST_PROGS_EXTENDED := with_addr.sh \ TEST_GEN_PROGS_EXTENDED = test_sock_addr test_skb_cgroup_id_user \ flow_dissector_load test_flow_dissector test_tcp_check_syncookie_user \ test_lirc_mode2_user xdping test_cpp runqslower bench bpf_testmod.ko \ - xdpxceiver + xdpxceiver xdp_redirect_multi TEST_CUSTOM_PROGS = $(OUTPUT)/urandom_read diff --git a/tools/testing/selftests/bpf/progs/xdp_redirect_multi_kern.c b/tools/testing/selftests/bpf/progs/xdp_redirect_multi_kern.c new file mode 100644 index 000000000000..738f48ff3ddc --- /dev/null +++ b/tools/testing/selftests/bpf/progs/xdp_redirect_multi_kern.c @@ -0,0 +1,96 @@ +// SPDX-License-Identifier: GPL-2.0 +#define KBUILD_MODNAME "foo" +#include +#include +#include +#include +#include +#include + +#include +#include +#include + +/* It would be easier to use a key:if_index, value:if_index map, but it + * will need a very large entries as the if_index number may get very large, + * this would affect the performace. So the DEVMAP here is just for testing. + */ +struct { + __uint(type, BPF_MAP_TYPE_DEVMAP); + __uint(key_size, sizeof(int)); + __uint(value_size, sizeof(int)); + __uint(max_entries, 1024); +} map_v4 SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_DEVMAP_HASH); + __uint(key_size, sizeof(int)); + __uint(value_size, sizeof(int)); + __uint(max_entries, 128); +} map_all SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_DEVMAP_HASH); + __uint(key_size, sizeof(int)); + __uint(value_size, sizeof(struct bpf_devmap_val)); + __uint(max_entries, 128); +} map_egress SEC(".maps"); + +/* map to store egress interfaces mac addresses */ +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __type(key, __u32); + __type(value, __be64); + __uint(max_entries, 128); +} mac_map SEC(".maps"); + +SEC("xdp_redirect_map_multi") +int xdp_redirect_map_multi_prog(struct xdp_md *ctx) +{ + void *data_end = (void *)(long)ctx->data_end; + void *data = (void *)(long)ctx->data; + struct ethhdr *eth = data; + __u16 h_proto; + __u64 nh_off; + + nh_off = sizeof(*eth); + if (data + nh_off > data_end) + return XDP_DROP; + + h_proto = eth->h_proto; + + if (h_proto == bpf_htons(ETH_P_IP)) + return bpf_redirect_map(&map_v4, 0, BPF_F_REDIR_MASK); + else + return bpf_redirect_map(&map_all, 0, BPF_F_REDIR_MASK); +} + +/* The following 2 progs are for 2nd devmap prog testing */ +SEC("xdp_redirect_map_ingress") +int xdp_redirect_map_all_prog(struct xdp_md *ctx) +{ + return bpf_redirect_map(&map_egress, 0, BPF_F_REDIR_MASK); +} + +SEC("xdp_devmap/map_prog") +int xdp_devmap_prog(struct xdp_md *ctx) +{ + void *data_end = (void *)(long)ctx->data_end; + void *data = (void *)(long)ctx->data; + __u32 key = ctx->egress_ifindex; + struct ethhdr *eth = data; + __u64 nh_off; + __be64 *mac; + + nh_off = sizeof(*eth); + if (data + nh_off > data_end) + return XDP_DROP; + + mac = bpf_map_lookup_elem(&mac_map, &key); + if (mac) + __builtin_memcpy(eth->h_source, mac, ETH_ALEN); + + return XDP_PASS; +} + +char _license[] SEC("license") = "GPL"; diff --git a/tools/testing/selftests/bpf/test_xdp_redirect_multi.sh b/tools/testing/selftests/bpf/test_xdp_redirect_multi.sh new file mode 100755 index 000000000000..e070d9de2c3a --- /dev/null +++ b/tools/testing/selftests/bpf/test_xdp_redirect_multi.sh @@ -0,0 +1,187 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# +# Test topology: +# - - - - - - - - - - - - - - - - - - - - - - - - - +# | veth1 veth2 veth3 veth4 | ... init net +# - -| - - - - - - | - - - - - - | - - - - - - | - - +# --------- --------- --------- --------- +# | veth0 | | veth0 | | veth0 | | veth0 | ... +# --------- --------- --------- --------- +# ns1 ns2 ns3 ns4 +# +# Forward maps: +# map_all has interfaces: veth1, veth2, veth3, veth4, ... (All traffic except IPv4) +# map_v4 has interfaces: veth1, veth3, veth4, ... (For IPv4 traffic only) +# map_egress has all interfaces and redirect all pkts +# Map type: +# map_v4 use DEVMAP, others use DEVMAP_HASH +# +# Test modules: +# XDP modes: generic, native, native + egress_prog +# +# Test cases: +# ARP: +# ns1 -> gw: ns2, ns3, ns4 should receive the arp request +# IPv4: +# ping test: ns1 -> ns2 (block), ns1 -> ns3 (pass), ns1 -> ns4 (pass) +# egress_prog: +# all src mac should be egress interface's mac +# + + +# netns numbers +NUM=4 +IFACES="" +DRV_MODE="xdpgeneric xdpdrv xdpegress" +PASS=0 +FAIL=0 + +test_pass() +{ + echo "Pass: $@" + PASS=$((PASS + 1)) +} + +test_fail() +{ + echo "fail: $@" + FAIL=$((FAIL + 1)) +} + +clean_up() +{ + for i in $(seq $NUM); do + ip link del veth$i 2> /dev/null + ip netns del ns$i 2> /dev/null + done +} + +# Kselftest framework requirement - SKIP code is 4. +check_env() +{ + ip link set dev lo xdpgeneric off &>/dev/null + if [ $? -ne 0 ];then + echo "selftests: [SKIP] Could not run test without the ip xdpgeneric support" + exit 4 + fi + + which tcpdump &>/dev/null + if [ $? -ne 0 ];then + echo "selftests: [SKIP] Could not run test without tcpdump" + exit 4 + fi +} + +setup_ns() +{ + local mode=$1 + IFACES="" + + if [ "$mode" = "xdpegress" ]; then + mode="xdpdrv" + fi + + for i in $(seq $NUM); do + ip netns add ns$i + ip link add veth$i type veth peer name veth0 netns ns$i + ip link set veth$i up + ip -n ns$i link set veth0 up + + ip -n ns$i addr add 192.0.2.$i/24 dev veth0 + ip -n ns$i addr add 2001:db8::$i/64 dev veth0 + ip -n ns$i link set veth0 $mode obj \ + xdp_dummy.o sec xdp_dummy &> /dev/null || \ + { test_fail "Unable to load dummy xdp" && exit 1; } + IFACES="$IFACES veth$i" + veth_mac[$i]=$(ip link show veth$i | awk '/link\/ether/ {print $2}') + done +} + +do_egress_tests() +{ + local mode=$1 + + # mac test + ip netns exec ns2 tcpdump -e -i veth0 -nn -l -e &> mac_ns1-2_${mode}.log & + ip netns exec ns3 tcpdump -e -i veth0 -nn -l -e &> mac_ns1-3_${mode}.log & + ip netns exec ns4 tcpdump -e -i veth0 -nn -l -e &> mac_ns1-4_${mode}.log & + ip netns exec ns1 ping 192.0.2.254 -c 4 &> /dev/null + sleep 2 + pkill -9 tcpdump + + # mac check + grep -q "${veth_mac[2]} > ff:ff:ff:ff:ff:ff" mac_ns1-2_${mode}.log && \ + test_pass "$mode mac ns1-2" || test_fail "$mode mac ns1-2" + grep -q "${veth_mac[3]} > ff:ff:ff:ff:ff:ff" mac_ns1-3_${mode}.log && \ + test_pass "$mode mac ns1-3" || test_fail "$mode mac ns1-3" + grep -q "${veth_mac[4]} > ff:ff:ff:ff:ff:ff" mac_ns1-4_${mode}.log && \ + test_pass "$mode mac ns1-4" || test_fail "$mode mac ns1-4" +} + +do_ping_tests() +{ + local mode=$1 + + # arp test + ip netns exec ns2 tcpdump -i veth0 -nn -l -e &> arp_ns1-2_${mode}.log & + ip netns exec ns3 tcpdump -i veth0 -nn -l -e &> arp_ns1-3_${mode}.log & + ip netns exec ns4 tcpdump -i veth0 -nn -l -e &> arp_ns1-4_${mode}.log & + ip netns exec ns1 ping 192.0.2.254 -c 4 &> /dev/null + sleep 2 + pkill -9 tcpdump + grep -q "Request who-has 192.0.2.254 tell 192.0.2.1" arp_ns1-2_${mode}.log && \ + test_pass "$mode arp ns1-2" || test_fail "$mode arp ns1-2" + grep -q "Request who-has 192.0.2.254 tell 192.0.2.1" arp_ns1-3_${mode}.log && \ + test_pass "$mode arp ns1-3" || test_fail "$mode arp ns1-3" + grep -q "Request who-has 192.0.2.254 tell 192.0.2.1" arp_ns1-4_${mode}.log && \ + test_pass "$mode arp ns1-4" || test_fail "$mode arp ns1-4" + + # ping test + ip netns exec ns1 ping 192.0.2.2 -c 4 &> /dev/null && \ + test_fail "$mode ping ns1-2" || test_pass "$mode ping ns1-2" + ip netns exec ns1 ping 192.0.2.3 -c 4 &> /dev/null && \ + test_pass "$mode ping ns1-3" || test_pass "$mode ping ns1-3" + ip netns exec ns1 ping 192.0.2.4 -c 4 &> /dev/null && \ + test_pass "$mode ping ns1-4" || test_fail "$mode ping ns1-4" +} + +do_tests() +{ + local mode=$1 + local drv_p + + case ${mode} in + xdpdrv) drv_p="-N";; + xdpegress) drv_p="-X";; + xdpgeneric) drv_p="-S";; + esac + + ./xdp_redirect_multi $drv_p $IFACES &> xdp_redirect_${mode}.log & + xdp_pid=$! + sleep 10 + + if [ "$mode" = "xdpegress" ]; then + do_egress_tests $mode + else + do_ping_tests $mode + fi + + kill $xdp_pid +} + +trap clean_up 0 2 3 6 9 + +check_env +rm -f xdp_redirect_*.log arp_ns*.log mac_ns*.log + +for mode in ${DRV_MODE}; do + setup_ns $mode + do_tests $mode + sleep 10 + clean_up + sleep 5 +done + +echo "Summary: PASS $PASS, FAIL $FAIL" +[ $FAIL -eq 0 ] && exit 0 || exit 1 diff --git a/tools/testing/selftests/bpf/xdp_redirect_multi.c b/tools/testing/selftests/bpf/xdp_redirect_multi.c new file mode 100644 index 000000000000..6a282dde90bd --- /dev/null +++ b/tools/testing/selftests/bpf/xdp_redirect_multi.c @@ -0,0 +1,236 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "bpf_util.h" +#include +#include + +#define MAX_IFACE_NUM 32 +#define MAX_INDEX_NUM 1024 + +static __u32 xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST; +static int ifaces[MAX_IFACE_NUM] = {}; + +static void int_exit(int sig) +{ + __u32 prog_id = 0; + int i; + + for (i = 0; ifaces[i] > 0; i++) { + if (bpf_get_link_xdp_id(ifaces[i], &prog_id, xdp_flags)) { + printf("bpf_get_link_xdp_id failed\n"); + exit(1); + } + if (prog_id) + bpf_set_link_xdp_fd(ifaces[i], -1, xdp_flags); + } + + exit(0); +} + +static int get_mac_addr(unsigned int ifindex, void *mac_addr) +{ + char ifname[IF_NAMESIZE]; + struct ifreq ifr; + int fd, ret = -1; + + fd = socket(AF_INET, SOCK_DGRAM, 0); + if (fd < 0) + return ret; + + if (!if_indextoname(ifindex, ifname)) + goto err_out; + + strcpy(ifr.ifr_name, ifname); + + if (ioctl(fd, SIOCGIFHWADDR, &ifr) != 0) + goto err_out; + + memcpy(mac_addr, ifr.ifr_hwaddr.sa_data, 6 * sizeof(char)); + ret = 0; + +err_out: + close(fd); + return ret; +} + +static void usage(const char *prog) +{ + fprintf(stderr, + "usage: %s [OPTS] ...\n" + "OPTS:\n" + " -S use skb-mode\n" + " -N enforce native mode\n" + " -F force loading prog\n" + " -X load xdp program on egress\n", + prog); +} + +int main(int argc, char **argv) +{ + int prog_fd, group_all, group_v4, mac_map; + struct bpf_program *ingress_prog, *egress_prog; + struct bpf_prog_load_attr prog_load_attr = { + .prog_type = BPF_PROG_TYPE_UNSPEC, + }; + int i, ret, opt, egress_prog_fd = 0; + struct bpf_devmap_val devmap_val; + bool attach_egress_prog = false; + unsigned char mac_addr[6]; + char ifname[IF_NAMESIZE]; + struct bpf_object *obj; + unsigned int ifindex; + char filename[256]; + + while ((opt = getopt(argc, argv, "SNFX")) != -1) { + switch (opt) { + case 'S': + xdp_flags |= XDP_FLAGS_SKB_MODE; + break; + case 'N': + /* default, set below */ + break; + case 'F': + xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST; + break; + case 'X': + attach_egress_prog = true; + break; + default: + usage(basename(argv[0])); + return 1; + } + } + + if (!(xdp_flags & XDP_FLAGS_SKB_MODE)) { + xdp_flags |= XDP_FLAGS_DRV_MODE; + } else if (attach_egress_prog) { + printf("Load xdp program on egress with SKB mode not supported yet\n"); + goto err_out; + } + + if (optind == argc) { + printf("usage: %s ...\n", argv[0]); + goto err_out; + } + + printf("Get interfaces"); + for (i = 0; i < MAX_IFACE_NUM && argv[optind + i]; i++) { + ifaces[i] = if_nametoindex(argv[optind + i]); + if (!ifaces[i]) + ifaces[i] = strtoul(argv[optind + i], NULL, 0); + if (!if_indextoname(ifaces[i], ifname)) { + perror("Invalid interface name or i"); + goto err_out; + } + if (ifaces[i] > MAX_INDEX_NUM) { + printf("Interface index to large\n"); + goto err_out; + } + printf(" %d", ifaces[i]); + } + printf("\n"); + + snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]); + prog_load_attr.file = filename; + + if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd)) + goto err_out; + + if (attach_egress_prog) + group_all = bpf_object__find_map_fd_by_name(obj, "map_egress"); + else + group_all = bpf_object__find_map_fd_by_name(obj, "map_all"); + group_v4 = bpf_object__find_map_fd_by_name(obj, "map_v4"); + mac_map = bpf_object__find_map_fd_by_name(obj, "mac_map"); + + if (group_all < 0 || group_v4 < 0 || mac_map < 0) { + printf("bpf_object__find_map_fd_by_name failed\n"); + goto err_out; + } + + if (attach_egress_prog) { + /* Find ingress/egress prog for 2nd xdp prog */ + ingress_prog = bpf_object__find_program_by_name(obj, "xdp_redirect_map_all_prog"); + egress_prog = bpf_object__find_program_by_name(obj, "xdp_devmap_prog"); + if (!ingress_prog || !egress_prog) { + printf("finding ingress/egress_prog in obj file failed\n"); + goto err_out; + } + prog_fd = bpf_program__fd(ingress_prog); + egress_prog_fd = bpf_program__fd(egress_prog); + if (prog_fd < 0 || egress_prog_fd < 0) { + printf("find egress_prog fd failed\n"); + goto err_out; + } + } + + signal(SIGINT, int_exit); + signal(SIGTERM, int_exit); + + /* Init forward multicast groups and exclude group */ + for (i = 0; ifaces[i] > 0; i++) { + ifindex = ifaces[i]; + + if (attach_egress_prog) { + ret = get_mac_addr(ifindex, mac_addr); + if (ret < 0) { + printf("get interface %d mac failed\n", ifindex); + goto err_out; + } + ret = bpf_map_update_elem(mac_map, &ifindex, mac_addr, 0); + if (ret) { + perror("bpf_update_elem mac_map failed\n"); + goto err_out; + } + } + + /* Add all the interfaces to group all */ + devmap_val.ifindex = ifindex; + devmap_val.bpf_prog.fd = egress_prog_fd; + ret = bpf_map_update_elem(group_all, &ifindex, &devmap_val, 0); + if (ret) { + perror("bpf_map_update_elem"); + goto err_out; + } + + /* For testing: skip adding the 2nd interfaces to group v4 */ + if (i != 1) { + ret = bpf_map_update_elem(group_v4, &ifindex, &ifindex, 0); + if (ret) { + perror("bpf_map_update_elem"); + goto err_out; + } + } + + /* bind prog_fd to each interface */ + ret = bpf_set_link_xdp_fd(ifindex, prog_fd, xdp_flags); + if (ret) { + printf("Set xdp fd failed on %d\n", ifindex); + goto err_out; + } + } + + /* sleep some time for testing */ + sleep(999); + + return 0; + +err_out: + return 1; +}