From patchwork Tue Dec 1 17:23:45 2020
X-Patchwork-Submitter: Björn Töpel
X-Patchwork-Id: 11943389
X-Patchwork-Delegate: bpf@iogearbox.net
From: Björn Töpel
To: ast@kernel.org, daniel@iogearbox.net, netdev@vger.kernel.org, bpf@vger.kernel.org
Cc: Björn Töpel, kafai@fb.com, songliubraving@fb.com, yhs@fb.com,
 andrii@kernel.org, john.fastabend@gmail.com, hawk@kernel.org, kuba@kernel.org,
 magnus.karlsson@intel.com, maciej.fijalkowski@intel.com
Subject: [PATCH bpf-next] bpf, xdp: add bpf_redirect{,_map}() leaf node detection and optimization
Date: Tue, 1 Dec 2020 18:23:45 +0100
Message-Id: <20201201172345.264053-1-bjorn.topel@gmail.com>
X-Mailer: git-send-email 2.27.0

From: Björn Töpel

Teach the verifier to detect whether all calls to bpf_redirect{,_map}()
are leaf nodes, i.e.:

  return bpf_redirect_map(...);

or

  ret = bpf_redirect_map(...);
  if (ret != 0)
          return ret;

If so, we can apply an optimization to the XDP path. Instead of
calling bpf_redirect_map() followed by xdp_do_redirect(), we simply
perform the work of xdp_do_redirect() from bpf_redirect_map(). By
doing so we do fewer loads/stores/checks and save some cycles.

The XDP core will introspect the XDP program to check whether the
optimization can be performed, by checking the "redirect_opt" bit in
the bpf_prog structure.

The bpf_redirect_info structure is extended with two new members:
xdp_prog_redirect_opt and xdp. The xdp_prog_redirect_opt member is the
program currently executing the helper, and it doubles as a flag in
the XDP core that the optimization is turned on. The xdp member is the
xdp_buff/context currently being processed.

The verifier detection is currently very simplistic, and is aimed at
very simple XDP programs such as the libbpf AF_XDP XDP program. If BPF
tail calls or bpf2bpf calls are used, the optimization is disabled.
(A minimal example of a qualifying program is sketched after the
patch.)

Performance is up by ~5% (Mpps) for the xdp_redirect_map and xdpsock
samples, and by ~3% for bpf_redirect() programs.

An interesting extension would be to support an indirect jump
instruction/proper tail calls (only for helpers) in BPF, so that the
call could be elided in favor of a jump.

Thanks to Maciej for the internal code review.

Signed-off-by: Björn Töpel
---
 include/linux/bpf_verifier.h |   3 ++
 include/linux/filter.h       |  30 +++++++++--
 kernel/bpf/verifier.c        |  68 ++++++++++++++++++++++++
 net/core/dev.c               |   2 +-
 net/core/filter.c            | 100 +++++++++++++++++++++++++++++++++--
 5 files changed, 195 insertions(+), 8 deletions(-)

base-commit: ba0581749fec389e55c9d761f2716f8fcbefced5

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 306869d4743b..74e7e2f89251 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -423,6 +423,9 @@ struct bpf_verifier_env {
 	u32 peak_states;
 	/* longest register parentage chain walked for liveness marking */
 	u32 longest_mark_read_walk;
+	/* Are all leaf nodes redirect_map? */
+	bool all_leaves_redirect;
+	u32 redirect_call_cnt;
 };
 
 __printf(2, 0) void bpf_verifier_vlog(struct bpf_verifier_log *log,
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 1b62397bd124..6509ced898a2 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -534,7 +534,8 @@ struct bpf_prog {
 				kprobe_override:1, /* Do we override a kprobe? */
 				has_callchain_buf:1, /* callchain buffer allocated? */
 				enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */
-				call_get_stack:1; /* Do we call bpf_get_stack() or bpf_get_stackid() */
+				call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */
+				redirect_opt:1; /* All bpf_redirect{,_map}() are leaf calls */
 	enum bpf_prog_type	type;		/* Type of BPF program */
 	enum bpf_attach_type	expected_attach_type; /* For some prog types */
 	u32			len;		/* Number of filter blocks */
@@ -622,6 +623,8 @@ struct bpf_redirect_info {
 	struct bpf_map *map;
 	u32 kern_flags;
 	struct bpf_nh_params nh;
+	const struct bpf_prog *xdp_prog_redirect_opt;
+	struct xdp_buff *xdp;
 };
 
 DECLARE_PER_CPU(struct bpf_redirect_info, bpf_redirect_info);
@@ -734,6 +737,13 @@ DECLARE_BPF_DISPATCHER(xdp)
 static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
 					    struct xdp_buff *xdp)
 {
+	if (prog->redirect_opt) {
+		struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
+
+		ri->xdp_prog_redirect_opt = prog;
+		ri->xdp = xdp;
+	}
+
 	/* Caller needs to hold rcu_read_lock() (!), otherwise program
 	 * can be released while still running, or map elements could be
 	 * freed early while still having concurrent users. XDP fastpath
@@ -743,6 +753,11 @@ static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
 	return __BPF_PROG_RUN(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
 }
 
+static __always_inline u32 bpf_prog_run_xdp_skb(const struct bpf_prog *prog, struct xdp_buff *xdp)
+{
+	return __BPF_PROG_RUN(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
+}
+
 void bpf_prog_change_xdp(struct bpf_prog *prev_prog, struct bpf_prog *prog);
 
 static inline u32 bpf_prog_insn_size(const struct bpf_prog *prog)
@@ -951,9 +966,16 @@ static inline int xdp_ok_fwd_dev(const struct net_device *fwd,
  */
 int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb,
 			    struct xdp_buff *xdp, struct bpf_prog *prog);
-int xdp_do_redirect(struct net_device *dev,
-		    struct xdp_buff *xdp,
-		    struct bpf_prog *prog);
+int __xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp, struct bpf_prog *prog);
+static inline int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
+				  struct bpf_prog *prog)
+{
+	if (prog->redirect_opt)
+		return 0;
+
+	return __xdp_do_redirect(dev, xdp, prog);
+}
+
 void xdp_do_flush(void);
 
 /* The xdp_do_flush_map() helper has been renamed to drop the _map suffix, as
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index e333ce43f281..9ede6f1bca37 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5032,6 +5032,58 @@ static int check_reference_leak(struct bpf_verifier_env *env)
 	return state->acquired_refs ? -EINVAL : 0;
 }
 
+static void check_redirect_opt(struct bpf_verifier_env *env, int func_id, int insn_idx)
+{
+	struct bpf_insn *insns = env->prog->insnsi;
+	int insn_cnt = env->prog->len;
+	struct bpf_insn *insn;
+	bool is_leaf = false;
+
+	if (!(func_id == BPF_FUNC_redirect || func_id == BPF_FUNC_redirect_map))
+		return;
+
+	/* Naive peephole leaf node checking */
+	insn_idx++;
+	if (insn_idx >= insn_cnt)
+		return;
+
+	insn = &insns[insn_idx];
+	switch (insn->code) {
+	/* Is the instruction following the call an exit? */
+	case BPF_JMP | BPF_EXIT:
+		is_leaf = true;
+		break;
+	/* Follow the true branch of "if return value (r/w0) is not
+	 * zero", and look for exit.
+	 */
+	case BPF_JMP | BPF_JSGT | BPF_K:
+	case BPF_JMP32 | BPF_JSGT | BPF_K:
+	case BPF_JMP | BPF_JGT | BPF_K:
+	case BPF_JMP32 | BPF_JGT | BPF_K:
+	case BPF_JMP | BPF_JNE | BPF_K:
+	case BPF_JMP32 | BPF_JNE | BPF_K:
+		if (insn->dst_reg == BPF_REG_0 && insn->imm == 0) {
+			insn_idx += insn->off + 1;
+			if (insn_idx >= insn_cnt)
+				break;
+
+			insn = &insns[insn_idx];
+			is_leaf = insn->code == (BPF_JMP | BPF_EXIT);
+		}
+		break;
+	default:
+		break;
+	}
+
+	if (!env->redirect_call_cnt++) {
+		env->all_leaves_redirect = is_leaf;
+		return;
+	}
+
+	if (!is_leaf)
+		env->all_leaves_redirect = false;
+}
+
 static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn_idx)
 {
 	const struct bpf_func_proto *fn = NULL;
@@ -5125,6 +5177,8 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
 		}
 	}
 
+	check_redirect_opt(env, func_id, insn_idx);
+
 	regs = cur_regs(env);
 
 	/* check that flags argument in get_local_storage(map, flags) is 0,
@@ -11894,6 +11948,17 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 	return 0;
 }
 
+static void validate_redirect_opt(struct bpf_verifier_env *env)
+{
+	if (env->subprog_cnt != 1)
+		return;
+
+	if (env->subprog_info[0].has_tail_call)
+		return;
+
+	env->prog->redirect_opt = env->all_leaves_redirect;
+}
+
 struct btf *bpf_get_btf_vmlinux(void)
 {
 	if (!btf_vmlinux && IS_ENABLED(CONFIG_DEBUG_INFO_BTF)) {
@@ -12092,6 +12157,9 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr,
 	if (ret == 0)
 		adjust_btf_func(env);
 
+	if (ret == 0)
+		validate_redirect_opt(env);
+
 err_release_maps:
 	if (!env->prog->aux->used_maps)
 		/* if we didn't copy map pointers into bpf_prog_info, release
diff --git a/net/core/dev.c b/net/core/dev.c
index 3b6b0e175fe7..d31f97ea955b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4654,7 +4654,7 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
 	rxqueue = netif_get_rxqueue(skb);
 	xdp->rxq = &rxqueue->xdp_rxq;
 
-	act = bpf_prog_run_xdp(xdp_prog, xdp);
+	act = bpf_prog_run_xdp_skb(xdp_prog, xdp);
 
 	/* check if bpf_xdp_adjust_head was used */
 	off = xdp->data - orig_data;
diff --git a/net/core/filter.c b/net/core/filter.c
index 2ca5eecebacf..f5a0d29aa272 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3981,8 +3981,8 @@ void bpf_clear_redirect_map(struct bpf_map *map)
 	}
 }
 
-int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
-		    struct bpf_prog *xdp_prog)
+int __xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
+		      struct bpf_prog *xdp_prog)
 {
 	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
 	struct bpf_map *map = READ_ONCE(ri->map);
@@ -4015,7 +4015,7 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
 	_trace_xdp_redirect_map_err(dev, xdp_prog, fwd, map, index, err);
 	return err;
 }
-EXPORT_SYMBOL_GPL(xdp_do_redirect);
+EXPORT_SYMBOL_GPL(__xdp_do_redirect);
 
 static int xdp_do_generic_redirect_map(struct net_device *dev,
 				       struct sk_buff *skb,
@@ -4091,6 +4091,36 @@ int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb,
 	return err;
 }
 
+static u64 __bpf_xdp_redirect_opt(u32 index, struct bpf_redirect_info *ri)
+{
+	const struct bpf_prog *xdp_prog;
+	struct net_device *fwd, *dev;
+	struct xdp_buff *xdp;
+	int err;
+
+	xdp_prog = ri->xdp_prog_redirect_opt;
+	xdp = ri->xdp;
+	dev = xdp->rxq->dev;
+
+	ri->xdp_prog_redirect_opt = NULL;
+
+	fwd = dev_get_by_index_rcu(dev_net(dev), index);
+	if (unlikely(!fwd)) {
+		err = -EINVAL;
+		goto err;
+	}
+
+	err = dev_xdp_enqueue(fwd, xdp, dev);
+	if (unlikely(err))
+		goto err;
+
+	_trace_xdp_redirect_map(dev, xdp_prog, fwd, NULL, index);
+	return XDP_REDIRECT;
+err:
+	_trace_xdp_redirect_map_err(dev, xdp_prog, fwd, NULL, index, err);
+	return XDP_ABORTED;
+}
+
 BPF_CALL_2(bpf_xdp_redirect, u32, ifindex, u64, flags)
 {
 	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
@@ -4098,6 +4128,9 @@ BPF_CALL_2(bpf_xdp_redirect, u32, ifindex, u64, flags)
 	if (unlikely(flags))
 		return XDP_ABORTED;
 
+	if (ri->xdp_prog_redirect_opt)
+		return __bpf_xdp_redirect_opt(ifindex, ri);
+
 	ri->flags = flags;
 	ri->tgt_index = ifindex;
 	ri->tgt_value = NULL;
@@ -4114,6 +4147,64 @@ static const struct bpf_func_proto bpf_xdp_redirect_proto = {
 	.arg2_type      = ARG_ANYTHING,
 };
 
+static u64 __bpf_xdp_redirect_map_opt(struct bpf_map *map, u32 index, u64 flags,
+				      struct bpf_redirect_info *ri)
+{
+	const struct bpf_prog *xdp_prog;
+	struct net_device *dev;
+	struct xdp_buff *xdp;
+	void *val;
+	int err;
+
+	xdp_prog = ri->xdp_prog_redirect_opt;
+	xdp = ri->xdp;
+	dev = xdp->rxq->dev;
+
+	ri->xdp_prog_redirect_opt = NULL;
+
+	switch (map->map_type) {
+	case BPF_MAP_TYPE_DEVMAP: {
+		val = __dev_map_lookup_elem(map, index);
+		if (unlikely(!val))
+			return flags;
+		err = dev_map_enqueue(val, xdp, dev);
+		break;
+	}
+	case BPF_MAP_TYPE_DEVMAP_HASH: {
+		val = __dev_map_hash_lookup_elem(map, index);
+		if (unlikely(!val))
+			return flags;
+		err = dev_map_enqueue(val, xdp, dev);
+		break;
+	}
+	case BPF_MAP_TYPE_CPUMAP: {
+		val = __cpu_map_lookup_elem(map, index);
+		if (unlikely(!val))
+			return flags;
+		err = cpu_map_enqueue(val, xdp, dev);
+		break;
+	}
+	case BPF_MAP_TYPE_XSKMAP: {
+		val = __xsk_map_lookup_elem(map, index);
+		if (unlikely(!val))
+			return flags;
+		err = __xsk_map_redirect(val, xdp);
+		break;
+	}
+	default:
+		return flags;
+	}
+
+	if (unlikely(err))
+		goto err;
+
+	_trace_xdp_redirect_map(dev, xdp_prog, val, map, index);
+	return XDP_REDIRECT;
+err:
+	_trace_xdp_redirect_map_err(dev, xdp_prog, val, map, index, err);
+	return XDP_ABORTED;
+}
+
 BPF_CALL_3(bpf_xdp_redirect_map, struct bpf_map *, map, u32, ifindex,
 	   u64, flags)
 {
@@ -4123,6 +4214,9 @@ BPF_CALL_3(bpf_xdp_redirect_map, struct bpf_map *, map, u32, ifindex,
 	if (unlikely(flags > XDP_TX))
 		return XDP_ABORTED;
 
+	if (ri->xdp_prog_redirect_opt)
+		return __bpf_xdp_redirect_map_opt(map, ifindex, flags, ri);
+
 	ri->tgt_value = __xdp_map_lookup_elem(map, ifindex);
 	if (unlikely(!ri->tgt_value)) {
 		/* If the lookup fails we want to clear out the state in the
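
For illustration only (not part of the patch): a minimal sketch of an XDP
program whose single bpf_redirect_map() call is a leaf call, i.e. the shape
check_redirect_opt() recognizes (helper call immediately followed by exit),
similar to the libbpf AF_XDP program mentioned in the commit message. The
map name xsks_map, its size, and the program/section names are assumptions
made for this example.

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  /* Illustrative XSKMAP; name and max_entries are assumptions. */
  struct {
          __uint(type, BPF_MAP_TYPE_XSKMAP);
          __uint(max_entries, 64);
          __uint(key_size, sizeof(int));
          __uint(value_size, sizeof(int));
  } xsks_map SEC(".maps");

  SEC("xdp")
  int xdp_redirect_leaf(struct xdp_md *ctx)
  {
          /* The helper call is the last thing the program does, so the
           * instruction after the call is a BPF_EXIT and the verifier's
           * peephole check sees a leaf call; with a single subprog and no
           * tail calls, prog->redirect_opt gets set.
           */
          return bpf_redirect_map(&xsks_map, ctx->rx_queue_index, XDP_PASS);
  }

  char _license[] SEC("license") = "GPL";

The "ret = bpf_redirect_map(...); if (ret != 0) return ret;" form from the
commit message may also qualify, depending on how the compiler lays out the
branch (the check only follows JNE/JGT/JSGT-against-zero on r0/w0 to an exit).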