[PATCHv11,bpf-next,2/4] xdp: extend xdp_redirect_map with broadcast support

This patch adds two flags BPF_F_BROADCAST and BPF_F_EXCLUDE_INGRESS to
extend xdp_redirect_map for broadcast support.

With BPF_F_BROADCAST the packet will be broadcast to all the interfaces
in the map. With BPF_F_EXCLUDE_INGRESS the ingress interface will be
excluded when broadcasting.
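
The flag semantics above can be sketched in plain user-space C. This is only
a model, not the kernel code: the SKETCH_* names, the bit positions, the
fixed map size and the "0 means empty slot" convention are all assumptions
made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins for the two flags; bit positions are assumed. */
#define SKETCH_F_BROADCAST       (1ULL << 3)
#define SKETCH_F_EXCLUDE_INGRESS (1ULL << 4)

#define SKETCH_MAP_SIZE 4

/* Toy device map: an ifindex of 0 models an empty slot. */
struct sketch_devmap {
	int ifindex[SKETCH_MAP_SIZE];
};

/*
 * Collect the interfaces a broadcast frame would be sent to.
 * Returns the number of targets written to out[].
 */
static size_t sketch_broadcast_targets(const struct sketch_devmap *map,
				       int ingress_ifindex, uint64_t flags,
				       int out[SKETCH_MAP_SIZE])
{
	size_t n = 0;

	/* This helper only models the broadcast path. */
	assert(flags & SKETCH_F_BROADCAST);

	for (size_t i = 0; i < SKETCH_MAP_SIZE; i++) {
		int dev = map->ifindex[i];

		if (dev == 0)
			continue; /* empty slot: keep walking */
		if ((flags & SKETCH_F_EXCLUDE_INGRESS) && dev == ingress_ifindex)
			continue; /* do not send back out the ingress device */
		out[n++] = dev;
	}
	return n;
}
```

With a map holding ifindexes {2, 0, 3, 4} and ingress ifindex 2, the model
yields three targets for SKETCH_F_BROADCAST alone and two when
SKETCH_F_EXCLUDE_INGRESS is OR'ed in.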

When getting the devices in dev hash map via dev_map_hash_get_next_key(),
there is a possibility that we fall back to the first key when a device
was removed. This will duplicate packets on some interfaces. So just walk
all the buckets to avoid this issue. For dev array map, we also walk the
whole map to find valid interfaces.
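
The duplication problem can be reproduced with a toy model. The sketch below
is user-space C, not the devmap code: struct toy_map, toy_get_next_key() and
both walkers are hypothetical, and the only behaviour taken from the text
above is that a next-key lookup restarts from the first key when the current
key has been removed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NKEYS 3

/* Toy hash map: a fixed key order plus a presence bit per key. */
struct toy_map {
	int  keys[NKEYS];
	bool present[NKEYS];
};

/* Next-key lookup that restarts from the first key if *cur is gone. */
static bool toy_get_next_key(const struct toy_map *m, const int *cur, int *next)
{
	size_t start = 0;

	if (cur) {
		for (size_t i = 0; i < NKEYS; i++) {
			if (m->present[i] && m->keys[i] == *cur) {
				start = i + 1;
				break;
			}
		}
		/* cur not found: start stays 0, so iteration restarts */
	}
	for (size_t i = start; i < NKEYS; i++) {
		if (m->present[i]) {
			*next = m->keys[i];
			return true;
		}
	}
	return false;
}

/* Key-based walk; `victim` is removed right after it has been visited. */
static void walk_by_key(struct toy_map *m, int victim, int sent[NKEYS])
{
	int prev = 0, key = 0;
	bool have_prev = false;
	int budget = 2 * NKEYS; /* guard against endless restarts */

	assert(m && sent);
	while (budget-- > 0 &&
	       toy_get_next_key(m, have_prev ? &prev : NULL, &key)) {
		for (size_t i = 0; i < NKEYS; i++) {
			if (m->keys[i] != key)
				continue;
			sent[i]++;
			if (key == victim)
				m->present[i] = false;
		}
		prev = key;
		have_prev = true;
	}
}

/* Direct walk over every slot; a removal cannot cause a restart. */
static void walk_slots(struct toy_map *m, int victim, int sent[NKEYS])
{
	assert(m && sent);
	for (size_t i = 0; i < NKEYS; i++) {
		if (!m->present[i])
			continue;
		sent[i]++;
		if (m->keys[i] == victim)
			m->present[i] = false;
	}
}
```

With keys {10, 20, 30} and key 20 removed just after being visited, the
key-based walk sends twice to key 10, while the direct walk sends exactly
once to each entry.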

Function bpf_clear_redirect_map() was removed in
commit ee75aef23afe ("bpf, xdp: Restructure redirect actions").
Add it back as we need to use ri->map again.
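
As a rough illustration of why such a helper is needed once ri->map is
cached again: a map that is going away must not stay referenced from any
per-CPU redirect info. The sketch below is a user-space model using C11
atomics. All sketch_* names and the fixed CPU count are assumptions, and it
is not the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define SKETCH_NR_CPUS 4

struct sketch_map {
	int id;
};

/* One slot per CPU, caching the map picked by the last redirect call. */
struct sketch_redirect_info {
	_Atomic(struct sketch_map *) map;
};

static struct sketch_redirect_info sketch_ri[SKETCH_NR_CPUS];

/* Drop every per-CPU reference to a map that is about to be freed. */
static void sketch_clear_redirect_map(struct sketch_map *map)
{
	assert(map != NULL);

	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		struct sketch_map *expected = map;

		/*
		 * Compare-and-swap: only a slot that still points at `map`
		 * is reset to NULL; slots holding another map are untouched.
		 */
		atomic_compare_exchange_strong(&sketch_ri[cpu].map, &expected,
					       (struct sketch_map *)NULL);
	}
}
```

The compare-and-swap keeps the clear from overwriting a slot that another
CPU has meanwhile pointed at a different map.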

With test topology:
  +-------------------+             +-------------------+
  | Host A (i40e 10G) |  ---------- | eno1(i40e 10G)    |
  +-------------------+             |                   |
                                    |   Host B          |
  +-------------------+             |                   |
  | Host C (i40e 10G) |  ---------- | eno2(i40e 10G)    |
  +-------------------+             |                   |
                                    |          +------+ |
                                    | veth0 -- | Peer | |
                                    | veth1 -- |      | |
                                    | veth2 -- |  NS  | |
                                    |          +------+ |
                                    +-------------------+

On Host A:
 # pktgen/pktgen_sample03_burst_single_flow.sh -i eno1 -d $dst_ip -m $dst_mac -s 64

On Host B (Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz, 128G Memory):
Use xdp_redirect_map and xdp_redirect_map_multi in samples/bpf for testing.
All the veth peers in the NS have an XDP_DROP program loaded. The
forward_map max_entries in xdp_redirect_map_multi is modified to 4.

Testing the performance impact on the regular xdp_redirect path with and
without patch (to check impact of additional check for broadcast mode):

5.12 rc4         | redirect_map        i40e->i40e      |    2.0M |  9.7M
5.12 rc4         | redirect_map        i40e->veth      |    1.7M | 11.8M
5.12 rc4 + patch | redirect_map        i40e->i40e      |    2.0M |  9.6M
5.12 rc4 + patch | redirect_map        i40e->veth      |    1.7M | 11.7M

Testing the performance when cloning packets with the redirect_map_multi
test, using a redirect map size of 4, filled with 1-3 devices:

5.12 rc4 + patch | redirect_map multi  i40e->veth (x1) |    1.7M | 11.4M
5.12 rc4 + patch | redirect_map multi  i40e->veth (x2) |    1.1M |  4.3M
5.12 rc4 + patch | redirect_map multi  i40e->veth (x3) |    0.8M |  2.6M

Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>

---
v11:
a) Use unlikely() when checking if this is for broadcast redirecting.
b) Fix the tracepoint NULL pointer issue Jesper found.
c) Remove BPF_F_REDIR_MASK and just OR the flags to make it clearer to
   the reader what's going on.
d) Add the performance numbers with multiple veth interfaces to the commit
   description.

v10:
As Jesper reminded: revert xchg() and use READ/WRITE_ONCE when reading or
writing the map pointer, as xchg() is an atomic operation and can be expensive.

v9: no update

v8:
Use hlist_for_each_entry_rcu() when looping over the devmap hash objs.

v7:
No need to free xdpf in dev_map_enqueue_clone() if xdpf_clone failed.
Also return -EOVERFLOW if xdp_convert_buff_to_frame() failed, the same
as the other callers do.

v6:
Fix an skb leak in the error path for generic XDP.

v5:
a) Use xchg() instead of READ/WRITE_ONCE; no need to clear ri->flags
   in xdp_do_redirect().
b) Do not use get_next_key() as we may restart looping from the first key
   when a dev is removed from or updated in the hash map. Just walk the map
   directly to get all the devices and ignore newly added/deleted objects.
c) Loop over the whole array map instead of stopping at the first hole.

v4:
a) add a new argument flag_mask to __bpf_xdp_redirect_map() to filter out
invalid maps.
b) __bpf_xdp_redirect_map() sets the map pointer if the broadcast flag
is set and clears it if the flag isn't set
c) xdp_do_redirect() does the READ_ONCE/WRITE_ONCE on ri->map to check
if we should enqueue multi

v3:
a) Rebase the code on Björn's "bpf, xdp: Restructure redirect actions".
   - Add struct bpf_map *map back to struct bpf_redirect_info as we need
     it for multicast.
   - Add bpf_clear_redirect_map() back for devmap.c
   - Add devmap_lookup_elem() as we need it in general path.
b) remove tmp_key in devmap_get_next_obj()

v2: Fix flag renaming issue in v1
---
 include/linux/bpf.h            |  20 ++++
 include/linux/filter.h         |  18 +++-
 include/net/xdp.h              |   1 +
 include/trace/events/xdp.h     |   6 +-
 include/uapi/linux/bpf.h       |  16 ++-
 kernel/bpf/cpumap.c            |   3 +-
 kernel/bpf/devmap.c            | 183 ++++++++++++++++++++++++++++++++-
 net/core/filter.c              |  37 ++++++-
 net/core/xdp.c                 |  29 ++++++
 net/xdp/xskmap.c               |   3 +-
 tools/include/uapi/linux/bpf.h |  16 ++-
 11 files changed, 317 insertions(+), 15 deletions(-)

Message ID	20210428071916.204820-3-liuhangbin@gmail.com (mailing list archive)
State	New, archived
Delegated to:	BPF
From	Hangbin Liu <liuhangbin@gmail.com>
To	bpf@vger.kernel.org
Cc	netdev@vger.kernel.org, Toke Høiland-Jørgensen <toke@redhat.com>, Jiri Benc <jbenc@redhat.com>, Jesper Dangaard Brouer <brouer@redhat.com>, Eelco Chaudron <echaudro@redhat.com>, ast@kernel.org, Daniel Borkmann <daniel@iogearbox.net>, Lorenzo Bianconi <lorenzo.bianconi@redhat.com>, David Ahern <dsahern@gmail.com>, Andrii Nakryiko <andrii.nakryiko@gmail.com>, Alexei Starovoitov <alexei.starovoitov@gmail.com>, John Fastabend <john.fastabend@gmail.com>, Maciej Fijalkowski <maciej.fijalkowski@intel.com>, Björn Töpel <bjorn.topel@gmail.com>, Martin KaFai Lau <kafai@fb.com>
Date	Wed, 28 Apr 2021 15:19:14 +0800
Series	xdp: extend xdp_redirect_map with broadcast support
  [PATCHv11,bpf-next,0/4] xdp: extend xdp_redirect_map with broadcast support
  [PATCHv11,bpf-next,1/4] bpf: run devmap xdp_prog on flush instead of bulk enqueue
  [PATCHv11,bpf-next,2/4] xdp: extend xdp_redirect_map with broadcast support
  [PATCHv11,bpf-next,3/4] sample/bpf: add xdp_redirect_map_multi for redirect_map broadcast test
  [PATCHv11,bpf-next,4/4] selftests/bpf: add xdp_redirect_multi test

Context	Check	Description
netdev/cover_letter	success	Link
netdev/fixes_present	success	Link
netdev/patch_count	success	Link
netdev/tree_selection	success	Clearly marked for bpf-next
netdev/subject_prefix	success	Link
netdev/cc_maintainers	warning	14 maintainers not CCed: joe@cilium.io jonathan.lemon@gmail.com yhs@fb.com kpsingh@kernel.org mingo@redhat.com andrii@kernel.org hawk@kernel.org rostedt@goodmis.org magnus.karlsson@intel.com songliubraving@fb.com bjorn@kernel.org davem@davemloft.net quentin@isovalent.com kuba@kernel.org
netdev/source_inline	success	Was 0 now: 0
netdev/verify_signedoff	success	Link
netdev/module_param	success	Was 0 now: 0
netdev/build_32bit	success	Errors and warnings before: 11991 this patch: 11991
netdev/kdoc	success	Errors and warnings before: 0 this patch: 0
netdev/verify_fixes	success	Link
netdev/checkpatch	warning	WARNING: line length of 81 exceeds 80 columns WARNING: line length of 83 exceeds 80 columns WARNING: line length of 84 exceeds 80 columns WARNING: line length of 86 exceeds 80 columns WARNING: line length of 87 exceeds 80 columns WARNING: line length of 88 exceeds 80 columns WARNING: line length of 89 exceeds 80 columns WARNING: please, no space before tabs
netdev/build_allmodconfig_warn	success	Errors and warnings before: 12478 this patch: 12478
netdev/header_inline	success	Link
