From patchwork Tue Jul 26 18:47:04 2022
From: Joanne Koong
To: bpf@vger.kernel.org
Cc: andrii@kernel.org, daniel@iogearbox.net, ast@kernel.org, Joanne Koong
Subject: [PATCH bpf-next v1 1/3] bpf: Add skb dynptrs
Date: Tue, 26 Jul 2022 11:47:04 -0700
Message-Id: <20220726184706.954822-2-joannelkoong@gmail.com>
In-Reply-To: <20220726184706.954822-1-joannelkoong@gmail.com>
References: <20220726184706.954822-1-joannelkoong@gmail.com>

Add skb dynptrs, which are dynptrs whose underlying pointer points to an skb. The dynptr acts on skb data. skb dynptrs have two main benefits. One is that they allow operations on sizes that are not statically known at compile-time (eg variable-sized accesses). Another is that parsing the packet data through dynptrs (instead of through direct access of skb->data and skb->data_end) can be more ergonomic and less brittle (eg no manual if-checks that accesses stay within the bounds of data_end).

For bpf prog types that don't support writes on skb data, the dynptr is read-only (writes and data slices are not permitted). Reads through the dynptr can access data in the non-linear paged buffers; for writes and data slices, however, if the data is in a paged buffer, the user must first call bpf_skb_pull_data to pull the data into the linear portion.

Additionally, any helper call that changes the underlying packet buffer (eg bpf_skb_pull_data) invalidates any data slices of the associated dynptr.

Right now, skb dynptrs can only be constructed from skbs that are the bpf program context; as such, there is no need for any reference tracking or release on skb dynptrs.
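For illustration, here is a minimal sketch of the intended usage from a TC program. This is not code from this series; it assumes the new helper is visible to programs (eg via helper definitions regenerated from the updated uapi header):

    #include <linux/bpf.h>
    #include <linux/if_ether.h>
    #include <linux/pkt_cls.h>
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_endian.h>

    /* Sketch: parse the Ethernet header through an skb dynptr.
     * bpf_dynptr_read() copies the bytes out, so it works even when
     * they live in non-linear (paged) data.
     */
    SEC("tc")
    int dynptr_parse_eth(struct __sk_buff *skb)
    {
    	struct bpf_dynptr ptr;
    	struct ethhdr eth;

    	if (bpf_dynptr_from_skb(skb, 0, &ptr))
    		return TC_ACT_SHOT;

    	if (bpf_dynptr_read(&eth, sizeof(eth), &ptr, 0, 0))
    		return TC_ACT_SHOT;

    	return eth.h_proto == bpf_htons(ETH_P_IP) ? TC_ACT_OK : TC_ACT_SHOT;
    }

    char _license[] SEC("license") = "GPL";

Note that no bounds checks against data_end are needed; the helpers return an error if the requested range is out of bounds.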
Signed-off-by: Joanne Koong --- include/linux/bpf.h | 8 ++++- include/linux/filter.h | 4 +++ include/uapi/linux/bpf.h | 42 ++++++++++++++++++++++++-- kernel/bpf/helpers.c | 54 +++++++++++++++++++++++++++++++++- kernel/bpf/verifier.c | 43 +++++++++++++++++++++++---- net/core/filter.c | 53 ++++++++++++++++++++++++++++++--- tools/include/uapi/linux/bpf.h | 42 ++++++++++++++++++++++++-- 7 files changed, 229 insertions(+), 17 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 20c26aed7896..7fbd4324c848 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -407,11 +407,14 @@ enum bpf_type_flag { /* Size is known at compile time. */ MEM_FIXED_SIZE = BIT(10 + BPF_BASE_TYPE_BITS), + /* DYNPTR points to sk_buff */ + DYNPTR_TYPE_SKB = BIT(11 + BPF_BASE_TYPE_BITS), + __BPF_TYPE_FLAG_MAX, __BPF_TYPE_LAST_FLAG = __BPF_TYPE_FLAG_MAX - 1, }; -#define DYNPTR_TYPE_FLAG_MASK (DYNPTR_TYPE_LOCAL | DYNPTR_TYPE_RINGBUF) +#define DYNPTR_TYPE_FLAG_MASK (DYNPTR_TYPE_LOCAL | DYNPTR_TYPE_RINGBUF | DYNPTR_TYPE_SKB) /* Max number of base types. */ #define BPF_BASE_TYPE_LIMIT (1UL << BPF_BASE_TYPE_BITS) @@ -2556,12 +2559,15 @@ enum bpf_dynptr_type { BPF_DYNPTR_TYPE_LOCAL, /* Underlying data is a ringbuf record */ BPF_DYNPTR_TYPE_RINGBUF, + /* Underlying data is a sk_buff */ + BPF_DYNPTR_TYPE_SKB, }; void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data, enum bpf_dynptr_type type, u32 offset, u32 size); void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr); int bpf_dynptr_check_size(u32 size); +void bpf_dynptr_set_rdonly(struct bpf_dynptr_kern *ptr); #ifdef CONFIG_BPF_LSM void bpf_cgroup_atype_get(u32 attach_btf_id, int cgroup_atype); diff --git a/include/linux/filter.h b/include/linux/filter.h index a5f21dc3c432..649063d9cbfd 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -1532,4 +1532,8 @@ static __always_inline int __bpf_xdp_redirect_map(struct bpf_map *map, u32 ifind return XDP_REDIRECT; } +int __bpf_skb_load_bytes(const struct sk_buff *skb, u32 offset, void *to, u32 len); +int __bpf_skb_store_bytes(struct sk_buff *skb, u32 offset, const void *from, + u32 len, u64 flags); + #endif /* __LINUX_FILTER_H__ */ diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 59a217ca2dfd..0730cd198a7f 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -5241,11 +5241,22 @@ union bpf_attr { * Description * Write *len* bytes from *src* into *dst*, starting from *offset* * into *dst*. - * *flags* is currently unused. + * + * *flags* must be 0 except for skb-type dynptrs. + * + * For skb-type dynptrs: + * * if *offset* + *len* extends into the skb's paged buffers, the user + * should manually pull the skb with bpf_skb_pull and then try again. + * + * * *flags* are a combination of **BPF_F_RECOMPUTE_CSUM** (automatically + * recompute the checksum for the packet after storing the bytes) and + * **BPF_F_INVALIDATE_HASH** (set *skb*\ **->hash**, *skb*\ + * **->swhash** and *skb*\ **->l4hash** to 0). * Return * 0 on success, -E2BIG if *offset* + *len* exceeds the length * of *dst*'s data, -EINVAL if *dst* is an invalid dynptr or if *dst* - * is a read-only dynptr or if *flags* is not 0. + * is a read-only dynptr or if *flags* is not correct, -EAGAIN if for + * skb-type dynptrs the write extends into the skb's paged buffers. * * void *bpf_dynptr_data(struct bpf_dynptr *ptr, u32 offset, u32 len) * Description @@ -5253,10 +5264,19 @@ union bpf_attr { * * *len* must be a statically known value. 
The returned data slice * is invalidated whenever the dynptr is invalidated. + * + * For skb-type dynptrs: + * * if *offset* + *len* extends into the skb's paged buffers, + * the user should manually pull the skb with bpf_skb_pull and then + * try again. + * + * * the data slice is automatically invalidated anytime a + * helper call that changes the underlying packet buffer + * (eg bpf_skb_pull) is called. * Return * Pointer to the underlying dynptr data, NULL if the dynptr is * read-only, if the dynptr is invalid, or if the offset and length - * is out of bounds. + * is out of bounds or in a paged buffer for skb-type dynptrs. * * s64 bpf_tcp_raw_gen_syncookie_ipv4(struct iphdr *iph, struct tcphdr *th, u32 th_len) * Description @@ -5331,6 +5351,21 @@ union bpf_attr { * **-EACCES** if the SYN cookie is not valid. * * **-EPROTONOSUPPORT** if CONFIG_IPV6 is not builtin. + * + * long bpf_dynptr_from_skb(struct sk_buff *skb, u64 flags, struct bpf_dynptr *ptr) + * Description + * Get a dynptr to the data in *skb*. *skb* must be the BPF program + * context. Depending on program type, the dynptr may be read-only, + * in which case trying to obtain a direct data slice to it through + * bpf_dynptr_data will return an error. + * + * Calls that change the *skb*'s underlying packet buffer + * (eg bpf_skb_pull_data) do not invalidate the dynptr, but they do + * invalidate any data slices associated with the dynptr. + * + * *flags* is currently unused, it must be 0 for now. + * Return + * 0 on success or -EINVAL if flags is not 0. */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -5541,6 +5576,7 @@ union bpf_attr { FN(tcp_raw_gen_syncookie_ipv6), \ FN(tcp_raw_check_syncookie_ipv4), \ FN(tcp_raw_check_syncookie_ipv6), \ + FN(dynptr_from_skb), \ /* */ /* integer value in 'imm' field of BPF_CALL instruction selects which helper diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 1f961f9982d2..21a806057e9e 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -1425,11 +1425,21 @@ static bool bpf_dynptr_is_rdonly(struct bpf_dynptr_kern *ptr) return ptr->size & DYNPTR_RDONLY_BIT; } +void bpf_dynptr_set_rdonly(struct bpf_dynptr_kern *ptr) +{ + ptr->size |= DYNPTR_RDONLY_BIT; +} + static void bpf_dynptr_set_type(struct bpf_dynptr_kern *ptr, enum bpf_dynptr_type type) { ptr->size |= type << DYNPTR_TYPE_SHIFT; } +static enum bpf_dynptr_type bpf_dynptr_get_type(const struct bpf_dynptr_kern *ptr) +{ + return (ptr->size & ~(DYNPTR_RDONLY_BIT)) >> DYNPTR_TYPE_SHIFT; +} + static u32 bpf_dynptr_get_size(struct bpf_dynptr_kern *ptr) { return ptr->size & DYNPTR_SIZE_MASK; @@ -1500,6 +1510,7 @@ static const struct bpf_func_proto bpf_dynptr_from_mem_proto = { BPF_CALL_5(bpf_dynptr_read, void *, dst, u32, len, struct bpf_dynptr_kern *, src, u32, offset, u64, flags) { + enum bpf_dynptr_type type; int err; if (!src->data || flags) @@ -1509,6 +1520,11 @@ BPF_CALL_5(bpf_dynptr_read, void *, dst, u32, len, struct bpf_dynptr_kern *, src if (err) return err; + type = bpf_dynptr_get_type(src); + + if (type == BPF_DYNPTR_TYPE_SKB) + return __bpf_skb_load_bytes(src->data, src->offset + offset, dst, len); + memcpy(dst, src->data + src->offset + offset, len); return 0; @@ -1528,15 +1544,38 @@ static const struct bpf_func_proto bpf_dynptr_read_proto = { BPF_CALL_5(bpf_dynptr_write, struct bpf_dynptr_kern *, dst, u32, offset, void *, src, u32, len, u64, flags) { + enum bpf_dynptr_type type; int err; - if (!dst->data || flags || bpf_dynptr_is_rdonly(dst)) + if (!dst->data || bpf_dynptr_is_rdonly(dst)) return -EINVAL; 
err = bpf_dynptr_check_off_len(dst, offset, len); if (err) return err; + type = bpf_dynptr_get_type(dst); + + if (flags) { + if (type == BPF_DYNPTR_TYPE_SKB) { + if (flags & ~(BPF_F_RECOMPUTE_CSUM | BPF_F_INVALIDATE_HASH)) + return -EINVAL; + } else { + return -EINVAL; + } + } + + if (type == BPF_DYNPTR_TYPE_SKB) { + struct sk_buff *skb = dst->data; + + /* if the data is paged, the caller needs to pull it first */ + if (dst->offset + offset + len > skb->len - skb->data_len) + return -EAGAIN; + + return __bpf_skb_store_bytes(skb, dst->offset + offset, src, len, + flags); + } + memcpy(dst->data + dst->offset + offset, src, len); return 0; @@ -1555,6 +1594,7 @@ static const struct bpf_func_proto bpf_dynptr_write_proto = { BPF_CALL_3(bpf_dynptr_data, struct bpf_dynptr_kern *, ptr, u32, offset, u32, len) { + enum bpf_dynptr_type type; int err; if (!ptr->data) @@ -1567,6 +1607,18 @@ BPF_CALL_3(bpf_dynptr_data, struct bpf_dynptr_kern *, ptr, u32, offset, u32, len if (bpf_dynptr_is_rdonly(ptr)) return 0; + type = bpf_dynptr_get_type(ptr); + + if (type == BPF_DYNPTR_TYPE_SKB) { + struct sk_buff *skb = ptr->data; + + /* if the data is paged, the caller needs to pull it first */ + if (ptr->offset + offset + len > skb->len - skb->data_len) + return 0; + + return (unsigned long)(skb->data + ptr->offset + offset); + } + return (unsigned long)(ptr->data + ptr->offset + offset); } diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 0d523741a543..0838653eeb4e 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -263,6 +263,7 @@ struct bpf_call_arg_meta { u32 subprogno; struct bpf_map_value_off_desc *kptr_off_desc; u8 uninit_dynptr_regno; + enum bpf_dynptr_type type; }; struct btf *btf_vmlinux; @@ -678,6 +679,8 @@ static enum bpf_dynptr_type arg_to_dynptr_type(enum bpf_arg_type arg_type) return BPF_DYNPTR_TYPE_LOCAL; case DYNPTR_TYPE_RINGBUF: return BPF_DYNPTR_TYPE_RINGBUF; + case DYNPTR_TYPE_SKB: + return BPF_DYNPTR_TYPE_SKB; default: return BPF_DYNPTR_TYPE_INVALID; } @@ -5820,12 +5823,14 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env, return __check_ptr_off_reg(env, reg, regno, fixed_off_ok); } -static u32 stack_slot_get_id(struct bpf_verifier_env *env, struct bpf_reg_state *reg) +static void stack_slot_get_dynptr_info(struct bpf_verifier_env *env, struct bpf_reg_state *reg, + struct bpf_call_arg_meta *meta) { struct bpf_func_state *state = func(env, reg); int spi = get_spi(reg->off); - return state->stack[spi].spilled_ptr.id; + meta->ref_obj_id = state->stack[spi].spilled_ptr.id; + meta->type = state->stack[spi].spilled_ptr.dynptr.type; } static int check_func_arg(struct bpf_verifier_env *env, u32 arg, @@ -6052,6 +6057,9 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg, case DYNPTR_TYPE_RINGBUF: err_extra = "ringbuf "; break; + case DYNPTR_TYPE_SKB: + err_extra = "skb "; + break; default: break; } @@ -6065,8 +6073,10 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg, verbose(env, "verifier internal error: multiple refcounted args in BPF_FUNC_dynptr_data"); return -EFAULT; } - /* Find the id of the dynptr we're tracking the reference of */ - meta->ref_obj_id = stack_slot_get_id(env, reg); + /* Find the id and the type of the dynptr we're tracking + * the reference of. 
+ */ + stack_slot_get_dynptr_info(env, reg, meta); } } break; @@ -7406,7 +7416,11 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn regs[BPF_REG_0].type = PTR_TO_TCP_SOCK | ret_flag; } else if (base_type(ret_type) == RET_PTR_TO_ALLOC_MEM) { mark_reg_known_zero(env, regs, BPF_REG_0); - regs[BPF_REG_0].type = PTR_TO_MEM | ret_flag; + if (func_id == BPF_FUNC_dynptr_data && + meta.type == BPF_DYNPTR_TYPE_SKB) + regs[BPF_REG_0].type = PTR_TO_PACKET | ret_flag; + else + regs[BPF_REG_0].type = PTR_TO_MEM | ret_flag; regs[BPF_REG_0].mem_size = meta.mem_size; } else if (base_type(ret_type) == RET_PTR_TO_MEM_OR_BTF_ID) { const struct btf_type *t; @@ -14132,6 +14146,25 @@ static int do_misc_fixups(struct bpf_verifier_env *env) goto patch_call_imm; } + if (insn->imm == BPF_FUNC_dynptr_from_skb) { + if (!may_access_direct_pkt_data(env, NULL, BPF_WRITE)) + insn_buf[0] = BPF_MOV32_IMM(BPF_REG_4, true); + else + insn_buf[0] = BPF_MOV32_IMM(BPF_REG_4, false); + insn_buf[1] = *insn; + cnt = 2; + + new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt); + if (!new_prog) + return -ENOMEM; + + delta += cnt - 1; + env->prog = new_prog; + prog = new_prog; + insn = new_prog->insnsi + i + delta; + goto patch_call_imm; + } + /* BPF_EMIT_CALL() assumptions in some of the map_gen_lookup * and other inlining handlers are currently limited to 64 bit * only. diff --git a/net/core/filter.c b/net/core/filter.c index 5669248aff25..312f99deb759 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -1681,8 +1681,8 @@ static inline void bpf_pull_mac_rcsum(struct sk_buff *skb) skb_postpull_rcsum(skb, skb_mac_header(skb), skb->mac_len); } -BPF_CALL_5(bpf_skb_store_bytes, struct sk_buff *, skb, u32, offset, - const void *, from, u32, len, u64, flags) +int __bpf_skb_store_bytes(struct sk_buff *skb, u32 offset, const void *from, + u32 len, u64 flags) { void *ptr; @@ -1707,6 +1707,12 @@ BPF_CALL_5(bpf_skb_store_bytes, struct sk_buff *, skb, u32, offset, return 0; } +BPF_CALL_5(bpf_skb_store_bytes, struct sk_buff *, skb, u32, offset, + const void *, from, u32, len, u64, flags) +{ + return __bpf_skb_store_bytes(skb, offset, from, len, flags); +} + static const struct bpf_func_proto bpf_skb_store_bytes_proto = { .func = bpf_skb_store_bytes, .gpl_only = false, @@ -1718,8 +1724,7 @@ static const struct bpf_func_proto bpf_skb_store_bytes_proto = { .arg5_type = ARG_ANYTHING, }; -BPF_CALL_4(bpf_skb_load_bytes, const struct sk_buff *, skb, u32, offset, - void *, to, u32, len) +int __bpf_skb_load_bytes(const struct sk_buff *skb, u32 offset, void *to, u32 len) { void *ptr; @@ -1738,6 +1743,12 @@ BPF_CALL_4(bpf_skb_load_bytes, const struct sk_buff *, skb, u32, offset, return -EFAULT; } +BPF_CALL_4(bpf_skb_load_bytes, const struct sk_buff *, skb, u32, offset, + void *, to, u32, len) +{ + return __bpf_skb_load_bytes(skb, offset, to, len); +} + static const struct bpf_func_proto bpf_skb_load_bytes_proto = { .func = bpf_skb_load_bytes, .gpl_only = false, @@ -1849,6 +1860,32 @@ static const struct bpf_func_proto bpf_skb_pull_data_proto = { .arg2_type = ARG_ANYTHING, }; +/* is_rdonly is set by the verifier */ +BPF_CALL_4(bpf_dynptr_from_skb, struct sk_buff *, skb, u64, flags, + struct bpf_dynptr_kern *, ptr, u32, is_rdonly) +{ + if (flags) { + bpf_dynptr_set_null(ptr); + return -EINVAL; + } + + bpf_dynptr_init(ptr, skb, BPF_DYNPTR_TYPE_SKB, 0, skb->len); + + if (is_rdonly) + bpf_dynptr_set_rdonly(ptr); + + return 0; +} + +static const struct bpf_func_proto bpf_dynptr_from_skb_proto = { + .func = 
bpf_dynptr_from_skb, + .gpl_only = false, + .ret_type = RET_INTEGER, + .arg1_type = ARG_PTR_TO_CTX, + .arg2_type = ARG_ANYTHING, + .arg3_type = ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_SKB | MEM_UNINIT, +}; + BPF_CALL_1(bpf_sk_fullsock, struct sock *, sk) { return sk_fullsock(sk) ? (unsigned long)sk : (unsigned long)NULL; @@ -7808,6 +7845,8 @@ sk_filter_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) return &bpf_get_socket_uid_proto; case BPF_FUNC_perf_event_output: return &bpf_skb_event_output_proto; + case BPF_FUNC_dynptr_from_skb: + return &bpf_dynptr_from_skb_proto; default: return bpf_sk_base_func_proto(func_id); } @@ -7991,6 +8030,8 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) return &bpf_tcp_raw_check_syncookie_ipv6_proto; #endif #endif + case BPF_FUNC_dynptr_from_skb: + return &bpf_dynptr_from_skb_proto; default: return bpf_sk_base_func_proto(func_id); } @@ -8186,6 +8227,8 @@ sk_skb_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) case BPF_FUNC_skc_lookup_tcp: return &bpf_skc_lookup_tcp_proto; #endif + case BPF_FUNC_dynptr_from_skb: + return &bpf_dynptr_from_skb_proto; default: return bpf_sk_base_func_proto(func_id); } @@ -8224,6 +8267,8 @@ lwt_out_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) return &bpf_get_smp_processor_id_proto; case BPF_FUNC_skb_under_cgroup: return &bpf_skb_under_cgroup_proto; + case BPF_FUNC_dynptr_from_skb: + return &bpf_dynptr_from_skb_proto; default: return bpf_sk_base_func_proto(func_id); } diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 59a217ca2dfd..0730cd198a7f 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -5241,11 +5241,22 @@ union bpf_attr { * Description * Write *len* bytes from *src* into *dst*, starting from *offset* * into *dst*. - * *flags* is currently unused. + * + * *flags* must be 0 except for skb-type dynptrs. + * + * For skb-type dynptrs: + * * if *offset* + *len* extends into the skb's paged buffers, the user + * should manually pull the skb with bpf_skb_pull and then try again. + * + * * *flags* are a combination of **BPF_F_RECOMPUTE_CSUM** (automatically + * recompute the checksum for the packet after storing the bytes) and + * **BPF_F_INVALIDATE_HASH** (set *skb*\ **->hash**, *skb*\ + * **->swhash** and *skb*\ **->l4hash** to 0). * Return * 0 on success, -E2BIG if *offset* + *len* exceeds the length * of *dst*'s data, -EINVAL if *dst* is an invalid dynptr or if *dst* - * is a read-only dynptr or if *flags* is not 0. + * is a read-only dynptr or if *flags* is not correct, -EAGAIN if for + * skb-type dynptrs the write extends into the skb's paged buffers. * * void *bpf_dynptr_data(struct bpf_dynptr *ptr, u32 offset, u32 len) * Description @@ -5253,10 +5264,19 @@ union bpf_attr { * * *len* must be a statically known value. The returned data slice * is invalidated whenever the dynptr is invalidated. + * + * For skb-type dynptrs: + * * if *offset* + *len* extends into the skb's paged buffers, + * the user should manually pull the skb with bpf_skb_pull and then + * try again. + * + * * the data slice is automatically invalidated anytime a + * helper call that changes the underlying packet buffer + * (eg bpf_skb_pull) is called. * Return * Pointer to the underlying dynptr data, NULL if the dynptr is * read-only, if the dynptr is invalid, or if the offset and length - * is out of bounds. + * is out of bounds or in a paged buffer for skb-type dynptrs. 
* * s64 bpf_tcp_raw_gen_syncookie_ipv4(struct iphdr *iph, struct tcphdr *th, u32 th_len) * Description @@ -5331,6 +5351,21 @@ union bpf_attr { * **-EACCES** if the SYN cookie is not valid. * * **-EPROTONOSUPPORT** if CONFIG_IPV6 is not builtin. + * + * long bpf_dynptr_from_skb(struct sk_buff *skb, u64 flags, struct bpf_dynptr *ptr) + * Description + * Get a dynptr to the data in *skb*. *skb* must be the BPF program + * context. Depending on program type, the dynptr may be read-only, + * in which case trying to obtain a direct data slice to it through + * bpf_dynptr_data will return an error. + * + * Calls that change the *skb*'s underlying packet buffer + * (eg bpf_skb_pull_data) do not invalidate the dynptr, but they do + * invalidate any data slices associated with the dynptr. + * + * *flags* is currently unused, it must be 0 for now. + * Return + * 0 on success or -EINVAL if flags is not 0. */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -5541,6 +5576,7 @@ union bpf_attr { FN(tcp_raw_gen_syncookie_ipv6), \ FN(tcp_raw_check_syncookie_ipv4), \ FN(tcp_raw_check_syncookie_ipv6), \ + FN(dynptr_from_skb), \ /* */ /* integer value in 'imm' field of BPF_CALL instruction selects which helper

From patchwork Tue Jul 26 18:47:05 2022
From: Joanne Koong
To: bpf@vger.kernel.org
Cc: andrii@kernel.org, daniel@iogearbox.net, ast@kernel.org, Joanne Koong
Subject: [PATCH bpf-next v1 2/3] bpf: Add xdp dynptrs
Date: Tue, 26 Jul 2022 11:47:05 -0700
Message-Id: <20220726184706.954822-3-joannelkoong@gmail.com>
In-Reply-To: <20220726184706.954822-1-joannelkoong@gmail.com>
References: <20220726184706.954822-1-joannelkoong@gmail.com>

Add xdp dynptrs, which are dynptrs whose underlying pointer points to an xdp_buff. The dynptr acts on xdp data. xdp dynptrs have two main benefits. One is that they allow operations on sizes that are not statically known at compile-time (eg variable-sized accesses). Another is that parsing the packet data through dynptrs (instead of through direct access of xdp->data and xdp->data_end) can be more ergonomic and less brittle (eg no manual if-checks that accesses stay within the bounds of data_end).
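A minimal sketch of the intended usage from an XDP program (illustrative only, not code from this series; same assumptions as the skb sketch in patch 1/3):

    #include <linux/bpf.h>
    #include <linux/if_ether.h>
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_endian.h>

    /* Sketch: get a direct data slice through an xdp dynptr.
     * bpf_dynptr_data() returns NULL if the requested range would span
     * across fragments, in which case bpf_dynptr_read() can be used to
     * copy the bytes out instead.
     */
    SEC("xdp")
    int dynptr_parse_eth(struct xdp_md *xdp)
    {
    	struct bpf_dynptr ptr;
    	struct ethhdr *eth;

    	if (bpf_dynptr_from_xdp(xdp, 0, &ptr))
    		return XDP_DROP;

    	eth = bpf_dynptr_data(&ptr, 0, sizeof(*eth));
    	if (!eth)
    		return XDP_DROP;

    	return eth->h_proto == bpf_htons(ETH_P_IP) ? XDP_PASS : XDP_DROP;
    }

    char _license[] SEC("license") = "GPL";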
For reads and writes on the dynptr, this includes reading/writing from/to and across fragments. For data slices, direct access to data in fragments is also permitted, but access across fragments is not. Any helper calls that change the underlying packet buffer (eg bpf_xdp_adjust_head) invalidates any data slices of the associated dynptr. Signed-off-by: Joanne Koong --- include/linux/bpf.h | 8 +++++- include/linux/filter.h | 3 +++ include/uapi/linux/bpf.h | 20 +++++++++++++-- kernel/bpf/helpers.c | 10 ++++++++ kernel/bpf/verifier.c | 7 +++++- net/core/filter.c | 46 +++++++++++++++++++++++++++++----- tools/include/uapi/linux/bpf.h | 20 +++++++++++++-- 7 files changed, 102 insertions(+), 12 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 7fbd4324c848..77e2c94cce52 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -410,11 +410,15 @@ enum bpf_type_flag { /* DYNPTR points to sk_buff */ DYNPTR_TYPE_SKB = BIT(11 + BPF_BASE_TYPE_BITS), + /* DYNPTR points to xdp_buff */ + DYNPTR_TYPE_XDP = BIT(12 + BPF_BASE_TYPE_BITS), + __BPF_TYPE_FLAG_MAX, __BPF_TYPE_LAST_FLAG = __BPF_TYPE_FLAG_MAX - 1, }; -#define DYNPTR_TYPE_FLAG_MASK (DYNPTR_TYPE_LOCAL | DYNPTR_TYPE_RINGBUF | DYNPTR_TYPE_SKB) +#define DYNPTR_TYPE_FLAG_MASK (DYNPTR_TYPE_LOCAL | DYNPTR_TYPE_RINGBUF | DYNPTR_TYPE_SKB \ + | DYNPTR_TYPE_XDP) /* Max number of base types. */ #define BPF_BASE_TYPE_LIMIT (1UL << BPF_BASE_TYPE_BITS) @@ -2561,6 +2565,8 @@ enum bpf_dynptr_type { BPF_DYNPTR_TYPE_RINGBUF, /* Underlying data is a sk_buff */ BPF_DYNPTR_TYPE_SKB, + /* Underlying data is a xdp_buff */ + BPF_DYNPTR_TYPE_XDP, }; void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data, diff --git a/include/linux/filter.h b/include/linux/filter.h index 649063d9cbfd..80f030239877 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -1535,5 +1535,8 @@ static __always_inline int __bpf_xdp_redirect_map(struct bpf_map *map, u32 ifind int __bpf_skb_load_bytes(const struct sk_buff *skb, u32 offset, void *to, u32 len); int __bpf_skb_store_bytes(struct sk_buff *skb, u32 offset, const void *from, u32 len, u64 flags); +int __bpf_xdp_load_bytes(struct xdp_buff *xdp, u32 offset, void *buf, u32 len); +int __bpf_xdp_store_bytes(struct xdp_buff *xdp, u32 offset, void *buf, u32 len); +void *bpf_xdp_pointer(struct xdp_buff *xdp, u32 offset, u32 len); #endif /* __LINUX_FILTER_H__ */ diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 0730cd198a7f..559f9ba8b497 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -5270,13 +5270,15 @@ union bpf_attr { * the user should manually pull the skb with bpf_skb_pull and then * try again. * + * For skb-type and xdp-type dynptrs: * * the data slice is automatically invalidated anytime a * helper call that changes the underlying packet buffer - * (eg bpf_skb_pull) is called. + * (eg bpf_skb_pull, bpf_xdp_adjust_head) is called. * Return * Pointer to the underlying dynptr data, NULL if the dynptr is * read-only, if the dynptr is invalid, or if the offset and length - * is out of bounds or in a paged buffer for skb-type dynptrs. + * is out of bounds or in a paged buffer for skb-type dynptrs or + * across fragments for xdp-type dynptrs. * * s64 bpf_tcp_raw_gen_syncookie_ipv4(struct iphdr *iph, struct tcphdr *th, u32 th_len) * Description @@ -5366,6 +5368,19 @@ union bpf_attr { * *flags* is currently unused, it must be 0 for now. * Return * 0 on success or -EINVAL if flags is not 0. 
+ * + * long bpf_dynptr_from_xdp(struct xdp_buff *xdp_md, u64 flags, struct bpf_dynptr *ptr) + * Description + * Get a dynptr to the data in *xdp_md*. *xdp_md* must be the BPF program + * context. + * + * Calls that change the *xdp_md*'s underlying packet buffer + * (eg bpf_xdp_adjust_head) do not invalidate the dynptr, but they do + * invalidate any data slices associated with the dynptr. + * + * *flags* is currently unused, it must be 0 for now. + * Return + * 0 on success, -EINVAL if flags is not 0. */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -5577,6 +5592,7 @@ union bpf_attr { FN(tcp_raw_check_syncookie_ipv4), \ FN(tcp_raw_check_syncookie_ipv6), \ FN(dynptr_from_skb), \ + FN(dynptr_from_xdp), \ /* */ /* integer value in 'imm' field of BPF_CALL instruction selects which helper diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 21a806057e9e..3c6e349790f5 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -1524,6 +1524,8 @@ BPF_CALL_5(bpf_dynptr_read, void *, dst, u32, len, struct bpf_dynptr_kern *, src if (type == BPF_DYNPTR_TYPE_SKB) return __bpf_skb_load_bytes(src->data, src->offset + offset, dst, len); + else if (type == BPF_DYNPTR_TYPE_XDP) + return __bpf_xdp_load_bytes(src->data, src->offset + offset, dst, len); memcpy(dst, src->data + src->offset + offset, len); @@ -1574,6 +1576,8 @@ BPF_CALL_5(bpf_dynptr_write, struct bpf_dynptr_kern *, dst, u32, offset, void *, return __bpf_skb_store_bytes(skb, dst->offset + offset, src, len, flags); + } else if (type == BPF_DYNPTR_TYPE_XDP) { + return __bpf_xdp_store_bytes(dst->data, dst->offset + offset, src, len); } memcpy(dst->data + dst->offset + offset, src, len); @@ -1617,6 +1621,12 @@ BPF_CALL_3(bpf_dynptr_data, struct bpf_dynptr_kern *, ptr, u32, offset, u32, len return 0; return (unsigned long)(skb->data + ptr->offset + offset); + } else if (type == BPF_DYNPTR_TYPE_XDP) { + /* if the requested data in across fragments, then it cannot + * be accessed directly - bpf_xdp_pointer will return NULL + */ + return (unsigned long)bpf_xdp_pointer(ptr->data, + ptr->offset + offset, len); } return (unsigned long)(ptr->data + ptr->offset + offset); diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 0838653eeb4e..6bb1f68539a8 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -681,6 +681,8 @@ static enum bpf_dynptr_type arg_to_dynptr_type(enum bpf_arg_type arg_type) return BPF_DYNPTR_TYPE_RINGBUF; case DYNPTR_TYPE_SKB: return BPF_DYNPTR_TYPE_SKB; + case DYNPTR_TYPE_XDP: + return BPF_DYNPTR_TYPE_XDP; default: return BPF_DYNPTR_TYPE_INVALID; } @@ -6060,6 +6062,9 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg, case DYNPTR_TYPE_SKB: err_extra = "skb "; break; + case DYNPTR_TYPE_XDP: + err_extra = "xdp "; + break; default: break; } @@ -7417,7 +7422,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn } else if (base_type(ret_type) == RET_PTR_TO_ALLOC_MEM) { mark_reg_known_zero(env, regs, BPF_REG_0); if (func_id == BPF_FUNC_dynptr_data && - meta.type == BPF_DYNPTR_TYPE_SKB) + (meta.type == BPF_DYNPTR_TYPE_SKB || meta.type == BPF_DYNPTR_TYPE_XDP)) regs[BPF_REG_0].type = PTR_TO_PACKET | ret_flag; else regs[BPF_REG_0].type = PTR_TO_MEM | ret_flag; diff --git a/net/core/filter.c b/net/core/filter.c index 312f99deb759..3c8ba88eabb4 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -3825,7 +3825,29 @@ static const struct bpf_func_proto sk_skb_change_head_proto = { .arg3_type = ARG_ANYTHING, }; -BPF_CALL_1(bpf_xdp_get_buff_len, struct 
xdp_buff*, xdp) +BPF_CALL_3(bpf_dynptr_from_xdp, struct xdp_buff*, xdp, u64, flags, + struct bpf_dynptr_kern *, ptr) +{ + if (flags) { + bpf_dynptr_set_null(ptr); + return -EINVAL; + } + + bpf_dynptr_init(ptr, xdp, BPF_DYNPTR_TYPE_XDP, 0, xdp_get_buff_len(xdp)); + + return 0; +} + +static const struct bpf_func_proto bpf_dynptr_from_xdp_proto = { + .func = bpf_dynptr_from_xdp, + .gpl_only = false, + .ret_type = RET_INTEGER, + .arg1_type = ARG_PTR_TO_CTX, + .arg2_type = ARG_ANYTHING, + .arg3_type = ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_XDP | MEM_UNINIT, +}; + +BPF_CALL_1(bpf_xdp_get_buff_len, struct xdp_buff*, xdp) { return xdp_get_buff_len(xdp); } @@ -3927,7 +3949,7 @@ static void bpf_xdp_copy_buf(struct xdp_buff *xdp, unsigned long off, } } -static void *bpf_xdp_pointer(struct xdp_buff *xdp, u32 offset, u32 len) +void *bpf_xdp_pointer(struct xdp_buff *xdp, u32 offset, u32 len) { struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp); u32 size = xdp->data_end - xdp->data; @@ -3958,8 +3980,7 @@ static void *bpf_xdp_pointer(struct xdp_buff *xdp, u32 offset, u32 len) return offset + len <= size ? addr + offset : NULL; } -BPF_CALL_4(bpf_xdp_load_bytes, struct xdp_buff *, xdp, u32, offset, - void *, buf, u32, len) +int __bpf_xdp_load_bytes(struct xdp_buff *xdp, u32 offset, void *buf, u32 len) { void *ptr; @@ -3975,6 +3996,12 @@ BPF_CALL_4(bpf_xdp_load_bytes, struct xdp_buff *, xdp, u32, offset, return 0; } +BPF_CALL_4(bpf_xdp_load_bytes, struct xdp_buff *, xdp, u32, offset, + void *, buf, u32, len) +{ + return __bpf_xdp_load_bytes(xdp, offset, buf, len); +} + static const struct bpf_func_proto bpf_xdp_load_bytes_proto = { .func = bpf_xdp_load_bytes, .gpl_only = false, @@ -3985,8 +4012,7 @@ static const struct bpf_func_proto bpf_xdp_load_bytes_proto = { .arg4_type = ARG_CONST_SIZE, }; -BPF_CALL_4(bpf_xdp_store_bytes, struct xdp_buff *, xdp, u32, offset, - void *, buf, u32, len) +int __bpf_xdp_store_bytes(struct xdp_buff *xdp, u32 offset, void *buf, u32 len) { void *ptr; @@ -4002,6 +4028,12 @@ BPF_CALL_4(bpf_xdp_store_bytes, struct xdp_buff *, xdp, u32, offset, return 0; } +BPF_CALL_4(bpf_xdp_store_bytes, struct xdp_buff *, xdp, u32, offset, + void *, buf, u32, len) +{ + return __bpf_xdp_store_bytes(xdp, offset, buf, len); +} + static const struct bpf_func_proto bpf_xdp_store_bytes_proto = { .func = bpf_xdp_store_bytes, .gpl_only = false, @@ -8091,6 +8123,8 @@ xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) return &bpf_tcp_raw_check_syncookie_ipv6_proto; #endif #endif + case BPF_FUNC_dynptr_from_xdp: + return &bpf_dynptr_from_xdp_proto; default: return bpf_sk_base_func_proto(func_id); } diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 0730cd198a7f..559f9ba8b497 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -5270,13 +5270,15 @@ union bpf_attr { * the user should manually pull the skb with bpf_skb_pull and then * try again. * + * For skb-type and xdp-type dynptrs: * * the data slice is automatically invalidated anytime a * helper call that changes the underlying packet buffer - * (eg bpf_skb_pull) is called. + * (eg bpf_skb_pull, bpf_xdp_adjust_head) is called. * Return * Pointer to the underlying dynptr data, NULL if the dynptr is * read-only, if the dynptr is invalid, or if the offset and length - * is out of bounds or in a paged buffer for skb-type dynptrs. + * is out of bounds or in a paged buffer for skb-type dynptrs or + * across fragments for xdp-type dynptrs. 
* * s64 bpf_tcp_raw_gen_syncookie_ipv4(struct iphdr *iph, struct tcphdr *th, u32 th_len) * Description @@ -5366,6 +5368,19 @@ union bpf_attr { * *flags* is currently unused, it must be 0 for now. * Return * 0 on success or -EINVAL if flags is not 0. + * + * long bpf_dynptr_from_xdp(struct xdp_buff *xdp_md, u64 flags, struct bpf_dynptr *ptr) + * Description + * Get a dynptr to the data in *xdp_md*. *xdp_md* must be the BPF program + * context. + * + * Calls that change the *xdp_md*'s underlying packet buffer + * (eg bpf_xdp_adjust_head) do not invalidate the dynptr, but they do + * invalidate any data slices associated with the dynptr. + * + * *flags* is currently unused, it must be 0 for now. + * Return + * 0 on success, -EINVAL if flags is not 0. */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -5577,6 +5592,7 @@ union bpf_attr { FN(tcp_raw_check_syncookie_ipv4), \ FN(tcp_raw_check_syncookie_ipv6), \ FN(dynptr_from_skb), \ + FN(dynptr_from_xdp), \ /* */ /* integer value in 'imm' field of BPF_CALL instruction selects which helper

From patchwork Tue Jul 26 18:47:06 2022
From: Joanne Koong
To: bpf@vger.kernel.org
Cc: andrii@kernel.org, daniel@iogearbox.net, ast@kernel.org, Joanne Koong
Subject: [PATCH bpf-next v1 3/3] selftests/bpf: tests for using dynptrs to parse skb and xdp buffers
Date: Tue, 26 Jul 2022 11:47:06 -0700
Message-Id: <20220726184706.954822-4-joannelkoong@gmail.com>
In-Reply-To: <20220726184706.954822-1-joannelkoong@gmail.com>
References: <20220726184706.954822-1-joannelkoong@gmail.com>

Test skb and xdp dynptr functionality in the following ways:

1) progs/test_xdp.c
   * Change existing test to use dynptrs to parse xdp data

     There were no noticeable differences in user + system time between the original version and the version using dynptrs. Averaging the time for 10 runs (run using "time ./test_progs -t xdp_bpf2bpf"):
         original version: 0.0449 sec
         with dynptrs: 0.0429 sec

2) progs/test_l4lb_noinline.c
   * Change existing test to use dynptrs to parse skb data

     There were no noticeable differences in user + system time between the original version and the version using dynptrs.
Averaging the time for 10 runs (run using "time ./test_progs -t l4lb_all/l4lb_noinline"): original version: 0.0502 sec with dynptrs: 0.055 sec For number of processed verifier instructions: original version: 6284 insns with dynptrs: 2538 insns 3) progs/test_dynptr_xdp.c * Add sample code for parsing tcp hdr opt lookup using dynptrs. This logic is lifted from a real-world use case of packet parsing in katran [0], a layer 4 load balancer 4) progs/dynptr_success.c * Add test case "test_skb_readonly" for testing attempts at writes / data slices on a prog type with read-only skb ctx. 5) progs/dynptr_fail.c * Add test cases "skb_invalid_data_slice" and "xdp_invalid_data_slice" for testing that helpers that modify the underlying packet buffer automatically invalidate the associated data slice. * Add test cases "skb_invalid_ctx" and "xdp_invalid_ctx" for testing that prog types that do not support bpf_dynptr_from_skb/xdp don't have access to the API. [0] https://github.com/facebookincubator/katran/blob/main/katran/lib/bpf/pckt_parsing.h Signed-off-by: Joanne Koong --- .../testing/selftests/bpf/prog_tests/dynptr.c | 85 ++++++++++--- .../selftests/bpf/prog_tests/dynptr_xdp.c | 49 ++++++++ .../testing/selftests/bpf/progs/dynptr_fail.c | 76 ++++++++++++ .../selftests/bpf/progs/dynptr_success.c | 32 +++++ .../selftests/bpf/progs/test_dynptr_xdp.c | 115 ++++++++++++++++++ .../selftests/bpf/progs/test_l4lb_noinline.c | 71 +++++------ tools/testing/selftests/bpf/progs/test_xdp.c | 95 +++++++-------- .../selftests/bpf/test_tcp_hdr_options.h | 1 + 8 files changed, 416 insertions(+), 108 deletions(-) create mode 100644 tools/testing/selftests/bpf/prog_tests/dynptr_xdp.c create mode 100644 tools/testing/selftests/bpf/progs/test_dynptr_xdp.c diff --git a/tools/testing/selftests/bpf/prog_tests/dynptr.c b/tools/testing/selftests/bpf/prog_tests/dynptr.c index bcf80b9f7c27..c40631f33c7b 100644 --- a/tools/testing/selftests/bpf/prog_tests/dynptr.c +++ b/tools/testing/selftests/bpf/prog_tests/dynptr.c @@ -2,6 +2,7 @@ /* Copyright (c) 2022 Facebook */ #include +#include #include "dynptr_fail.skel.h" #include "dynptr_success.skel.h" @@ -11,8 +12,8 @@ static char obj_log_buf[1048576]; static struct { const char *prog_name; const char *expected_err_msg; -} dynptr_tests[] = { - /* failure cases */ +} verifier_error_tests[] = { + /* these cases should trigger a verifier error */ {"ringbuf_missing_release1", "Unreleased reference id=1"}, {"ringbuf_missing_release2", "Unreleased reference id=2"}, {"ringbuf_missing_release_callback", "Unreleased reference id"}, @@ -42,11 +43,25 @@ static struct { {"release_twice_callback", "arg 1 is an unacquired reference"}, {"dynptr_from_mem_invalid_api", "Unsupported reg type fp for bpf_dynptr_from_mem data"}, + {"skb_invalid_data_slice", "invalid mem access 'scalar'"}, + {"xdp_invalid_data_slice", "invalid mem access 'scalar'"}, + {"skb_invalid_ctx", "unknown func bpf_dynptr_from_skb"}, + {"xdp_invalid_ctx", "unknown func bpf_dynptr_from_xdp"}, +}; + +enum test_setup_type { + SETUP_SYSCALL_SLEEP, + SETUP_SKB_PROG, +}; - /* success cases */ - {"test_read_write", NULL}, - {"test_data_slice", NULL}, - {"test_ringbuf", NULL}, +static struct { + const char *prog_name; + enum test_setup_type type; +} runtime_tests[] = { + {"test_read_write", SETUP_SYSCALL_SLEEP}, + {"test_data_slice", SETUP_SYSCALL_SLEEP}, + {"test_ringbuf", SETUP_SYSCALL_SLEEP}, + {"test_skb_readonly", SETUP_SKB_PROG}, }; static void verify_fail(const char *prog_name, const char *expected_err_msg) @@ -85,7 +100,7 @@ static 
void verify_fail(const char *prog_name, const char *expected_err_msg) dynptr_fail__destroy(skel); } -static void verify_success(const char *prog_name) +static void run_tests(const char *prog_name, enum test_setup_type setup_type) { struct dynptr_success *skel; struct bpf_program *prog; @@ -107,15 +122,42 @@ static void verify_success(const char *prog_name) if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name")) goto cleanup; - link = bpf_program__attach(prog); - if (!ASSERT_OK_PTR(link, "bpf_program__attach")) - goto cleanup; + switch (setup_type) { + case SETUP_SYSCALL_SLEEP: + link = bpf_program__attach(prog); + if (!ASSERT_OK_PTR(link, "bpf_program__attach")) + goto cleanup; - usleep(1); + usleep(1); - ASSERT_EQ(skel->bss->err, 0, "err"); + bpf_link__destroy(link); + break; + case SETUP_SKB_PROG: + { + int prog_fd, err; + char buf[64]; + + prog_fd = bpf_program__fd(prog); + if (CHECK_FAIL(prog_fd < 0)) + goto cleanup; + + LIBBPF_OPTS(bpf_test_run_opts, topts, + .data_in = &pkt_v4, + .data_size_in = sizeof(pkt_v4), + .data_out = buf, + .data_size_out = sizeof(buf), + .repeat = 1, + ); - bpf_link__destroy(link); + err = bpf_prog_test_run_opts(prog_fd, &topts); + + if (!ASSERT_OK(err, "test_run")) + goto cleanup; + + break; + } + } + ASSERT_EQ(skel->bss->err, 0, "err"); cleanup: dynptr_success__destroy(skel); @@ -125,14 +167,17 @@ void test_dynptr(void) { int i; - for (i = 0; i < ARRAY_SIZE(dynptr_tests); i++) { - if (!test__start_subtest(dynptr_tests[i].prog_name)) + for (i = 0; i < ARRAY_SIZE(verifier_error_tests); i++) { + if (!test__start_subtest(verifier_error_tests[i].prog_name)) + continue; + + verify_fail(verifier_error_tests[i].prog_name, + verifier_error_tests[i].expected_err_msg); + } + for (i = 0; i < ARRAY_SIZE(runtime_tests); i++) { + if (!test__start_subtest(runtime_tests[i].prog_name)) continue; - if (dynptr_tests[i].expected_err_msg) - verify_fail(dynptr_tests[i].prog_name, - dynptr_tests[i].expected_err_msg); - else - verify_success(dynptr_tests[i].prog_name); + run_tests(runtime_tests[i].prog_name, runtime_tests[i].type); } } diff --git a/tools/testing/selftests/bpf/prog_tests/dynptr_xdp.c b/tools/testing/selftests/bpf/prog_tests/dynptr_xdp.c new file mode 100644 index 000000000000..ca775d126b60 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/dynptr_xdp.c @@ -0,0 +1,49 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include "test_dynptr_xdp.skel.h" +#include "test_tcp_hdr_options.h" + +struct test_pkt { + struct ipv6_packet pk6_v6; + u8 options[16]; +} __packed; + +void test_dynptr_xdp(void) +{ + struct test_dynptr_xdp *skel; + char buf[128]; + int err; + + skel = test_dynptr_xdp__open_and_load(); + if (!ASSERT_OK_PTR(skel, "skel_open_and_load")) + return; + + struct test_pkt pkt = { + .pk6_v6.eth.h_proto = __bpf_constant_htons(ETH_P_IPV6), + .pk6_v6.iph.nexthdr = IPPROTO_TCP, + .pk6_v6.iph.payload_len = __bpf_constant_htons(MAGIC_BYTES), + .pk6_v6.tcp.urg_ptr = 123, + .pk6_v6.tcp.doff = 9, /* 16 bytes of options */ + + .options = { + TCPOPT_MSS, 4, 0x05, 0xB4, TCPOPT_NOP, TCPOPT_NOP, + skel->rodata->tcp_hdr_opt_kind_tpr, 6, 0, 0, 0, 9, TCPOPT_EOL + }, + }; + + LIBBPF_OPTS(bpf_test_run_opts, topts, + .data_in = &pkt, + .data_size_in = sizeof(pkt), + .data_out = buf, + .data_size_out = sizeof(buf), + .repeat = 3, + ); + + err = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.xdp_ingress_v6), &topts); + ASSERT_OK(err, "ipv6 test_run"); + ASSERT_EQ(skel->bss->server_id, 0x9000000, "server id"); + ASSERT_EQ(topts.retval, XDP_PASS, "ipv6 
test_run retval"); + + test_dynptr_xdp__destroy(skel); +} diff --git a/tools/testing/selftests/bpf/progs/dynptr_fail.c b/tools/testing/selftests/bpf/progs/dynptr_fail.c index c1814938a5fd..4e3f853b2d02 100644 --- a/tools/testing/selftests/bpf/progs/dynptr_fail.c +++ b/tools/testing/selftests/bpf/progs/dynptr_fail.c @@ -5,6 +5,7 @@ #include #include #include +#include #include "bpf_misc.h" char _license[] SEC("license") = "GPL"; @@ -622,3 +623,78 @@ int dynptr_from_mem_invalid_api(void *ctx) return 0; } + +/* The data slice is invalidated whenever a helper changes packet data */ +SEC("?tc") +int skb_invalid_data_slice(struct __sk_buff *skb) +{ + struct bpf_dynptr ptr; + struct ethhdr *hdr; + + bpf_dynptr_from_skb(skb, 0, &ptr); + hdr = bpf_dynptr_data(&ptr, 0, sizeof(*hdr)); + if (!hdr) + return SK_DROP; + + hdr->h_proto = 12; + + if (bpf_skb_pull_data(skb, skb->len)) + return SK_DROP; + + /* this should fail */ + hdr->h_proto = 1; + + return SK_PASS; +} + +/* The data slice is invalidated whenever a helper changes packet data */ +SEC("?xdp") +int xdp_invalid_data_slice(struct xdp_md *xdp) +{ + struct bpf_dynptr ptr; + struct ethhdr *hdr1, *hdr2; + + bpf_dynptr_from_xdp(xdp, 0, &ptr); + hdr1 = bpf_dynptr_data(&ptr, 0, sizeof(*hdr1)); + if (!hdr1) + return XDP_DROP; + + hdr2 = bpf_dynptr_data(&ptr, 0, sizeof(*hdr2)); + if (!hdr2) + return XDP_DROP; + + hdr1->h_proto = 12; + hdr2->h_proto = 12; + + if (bpf_xdp_adjust_head(xdp, 0 - (int)sizeof(*hdr1))) + return XDP_DROP; + + /* this should fail */ + hdr2->h_proto = 1; + + return XDP_PASS; +} + +/* Only supported prog type can create skb-type dynptrs */ +SEC("?raw_tp/sys_nanosleep") +int skb_invalid_ctx(void *ctx) +{ + struct bpf_dynptr ptr; + + /* this should fail */ + bpf_dynptr_from_skb(ctx, 0, &ptr); + + return 0; +} + +/* Only supported prog type can create xdp-type dynptrs */ +SEC("?raw_tp/sys_nanosleep") +int xdp_invalid_ctx(void *ctx) +{ + struct bpf_dynptr ptr; + + /* this should fail */ + bpf_dynptr_from_xdp(ctx, 0, &ptr); + + return 0; +} diff --git a/tools/testing/selftests/bpf/progs/dynptr_success.c b/tools/testing/selftests/bpf/progs/dynptr_success.c index a3a6103c8569..ffddd6ddc7fb 100644 --- a/tools/testing/selftests/bpf/progs/dynptr_success.c +++ b/tools/testing/selftests/bpf/progs/dynptr_success.c @@ -162,3 +162,35 @@ int test_ringbuf(void *ctx) bpf_ringbuf_discard_dynptr(&ptr, 0); return 0; } + +SEC("cgroup_skb/egress") +int test_skb_readonly(void *ctx) +{ + __u8 write_data[2] = {1, 2}; + struct bpf_dynptr ptr; + void *data; + int ret; + + err = 1; + + if (bpf_dynptr_from_skb(ctx, 0, &ptr)) + return 0; + err++; + + data = bpf_dynptr_data(&ptr, 0, 1); + if (data) + /* it's an error if data is not NULL since cgroup skbs + * are read only + */ + return 0; + err++; + + ret = bpf_dynptr_write(&ptr, 0, write_data, sizeof(write_data), 0); + /* since cgroup skbs are read only, writes should fail */ + if (ret != -EINVAL) + return 0; + + err = 0; + + return 0; +} diff --git a/tools/testing/selftests/bpf/progs/test_dynptr_xdp.c b/tools/testing/selftests/bpf/progs/test_dynptr_xdp.c new file mode 100644 index 000000000000..c879dfb6370a --- /dev/null +++ b/tools/testing/selftests/bpf/progs/test_dynptr_xdp.c @@ -0,0 +1,115 @@ +// SPDX-License-Identifier: GPL-2.0 + +/* This logic is lifted from a real-world use case of packet parsing, used in + * the open source library katran, a layer 4 load balancer. + * + * This test demonstrates how to parse packet contents using dynptrs. 
+ * + * https://github.com/facebookincubator/katran/blob/main/katran/lib/bpf/pckt_parsing.h + */ + +#include +#include +#include +#include +#include +#include +#include +#include "test_tcp_hdr_options.h" + +char _license[] SEC("license") = "GPL"; + +/* Arbitrarily picked unused value from IANA TCP Option Kind Numbers */ +const __u32 tcp_hdr_opt_kind_tpr = 0xB7; +/* Length of the tcp header option */ +const __u32 tcp_hdr_opt_len_tpr = 6; +/* maximum number of header options to check to lookup server_id */ +const __u32 tcp_hdr_opt_max_opt_checks = 15; + +__u32 server_id; + +static int parse_hdr_opt(struct bpf_dynptr *ptr, __u32 *off, __u8 *hdr_bytes_remaining, + __u32 *server_id) +{ + __u8 *tcp_opt, kind, hdr_len; + __u8 *data; + + data = bpf_dynptr_data(ptr, *off, sizeof(kind) + sizeof(hdr_len) + + sizeof(*server_id)); + if (!data) + return -1; + + kind = data[0]; + + if (kind == TCPOPT_EOL) + return -1; + + if (kind == TCPOPT_NOP) { + *off += 1; + /* continue to the next option */ + *hdr_bytes_remaining -= 1; + + return 0; + } + + if (*hdr_bytes_remaining < 2) + return -1; + + hdr_len = data[1]; + if (hdr_len > *hdr_bytes_remaining) + return -1; + + if (kind == tcp_hdr_opt_kind_tpr) { + if (hdr_len != tcp_hdr_opt_len_tpr) + return -1; + + memcpy(server_id, (__u32 *)(data + 2), sizeof(*server_id)); + return 1; + } + + *off += hdr_len; + *hdr_bytes_remaining -= hdr_len; + + return 0; +} + +SEC("xdp") +int xdp_ingress_v6(struct xdp_md *xdp) +{ + __u8 hdr_bytes_remaining; + struct tcphdr *tcp_hdr; + __u8 tcp_hdr_opt_len; + int err = 0; + __u32 off; + + struct bpf_dynptr ptr; + + bpf_dynptr_from_xdp(xdp, 0, &ptr); + + off = sizeof(struct ethhdr) + sizeof(struct ipv6hdr); + + tcp_hdr = bpf_dynptr_data(&ptr, off, sizeof(*tcp_hdr)); + if (!tcp_hdr) + return XDP_DROP; + + tcp_hdr_opt_len = (tcp_hdr->doff * 4) - sizeof(struct tcphdr); + if (tcp_hdr_opt_len < tcp_hdr_opt_len_tpr) + return XDP_DROP; + + hdr_bytes_remaining = tcp_hdr_opt_len; + + off += sizeof(struct tcphdr); + + /* max number of bytes of options in tcp header is 40 bytes */ + for (int i = 0; i < tcp_hdr_opt_max_opt_checks; i++) { + err = parse_hdr_opt(&ptr, &off, &hdr_bytes_remaining, &server_id); + + if (err || !hdr_bytes_remaining) + break; + } + + if (!server_id) + return XDP_DROP; + + return XDP_PASS; +} diff --git a/tools/testing/selftests/bpf/progs/test_l4lb_noinline.c b/tools/testing/selftests/bpf/progs/test_l4lb_noinline.c index c8bc0c6947aa..1fef7868ea8b 100644 --- a/tools/testing/selftests/bpf/progs/test_l4lb_noinline.c +++ b/tools/testing/selftests/bpf/progs/test_l4lb_noinline.c @@ -230,21 +230,18 @@ static __noinline bool get_packet_dst(struct real_definition **real, return true; } -static __noinline int parse_icmpv6(void *data, void *data_end, __u64 off, +static __noinline int parse_icmpv6(struct bpf_dynptr *skb_ptr, __u64 off, struct packet_description *pckt) { struct icmp6hdr *icmp_hdr; struct ipv6hdr *ip6h; - icmp_hdr = data + off; - if (icmp_hdr + 1 > data_end) + icmp_hdr = bpf_dynptr_data(skb_ptr, off, sizeof(*icmp_hdr) + sizeof(*ip6h)); + if (!icmp_hdr) return TC_ACT_SHOT; if (icmp_hdr->icmp6_type != ICMPV6_PKT_TOOBIG) return TC_ACT_OK; - off += sizeof(struct icmp6hdr); - ip6h = data + off; - if (ip6h + 1 > data_end) - return TC_ACT_SHOT; + ip6h = (struct ipv6hdr *)(icmp_hdr + 1); pckt->proto = ip6h->nexthdr; pckt->flags |= F_ICMP; memcpy(pckt->srcv6, ip6h->daddr.s6_addr32, 16); @@ -252,22 +249,19 @@ static __noinline int parse_icmpv6(void *data, void *data_end, __u64 off, return TC_ACT_UNSPEC; } -static __noinline 
int parse_icmp(void *data, void *data_end, __u64 off, +static __noinline int parse_icmp(struct bpf_dynptr *skb_ptr, __u64 off, struct packet_description *pckt) { struct icmphdr *icmp_hdr; struct iphdr *iph; - icmp_hdr = data + off; - if (icmp_hdr + 1 > data_end) + icmp_hdr = bpf_dynptr_data(skb_ptr, off, sizeof(*icmp_hdr) + sizeof(*iph)); + if (!icmp_hdr) return TC_ACT_SHOT; if (icmp_hdr->type != ICMP_DEST_UNREACH || icmp_hdr->code != ICMP_FRAG_NEEDED) return TC_ACT_OK; - off += sizeof(struct icmphdr); - iph = data + off; - if (iph + 1 > data_end) - return TC_ACT_SHOT; + iph = (struct iphdr *)(icmp_hdr + 1); if (iph->ihl != 5) return TC_ACT_SHOT; pckt->proto = iph->protocol; @@ -277,13 +271,13 @@ static __noinline int parse_icmp(void *data, void *data_end, __u64 off, return TC_ACT_UNSPEC; } -static __noinline bool parse_udp(void *data, __u64 off, void *data_end, +static __noinline bool parse_udp(struct bpf_dynptr *skb_ptr, __u64 off, struct packet_description *pckt) { struct udphdr *udp; - udp = data + off; - if (udp + 1 > data_end) + udp = bpf_dynptr_data(skb_ptr, off, sizeof(*udp)); + if (!udp) return false; if (!(pckt->flags & F_ICMP)) { @@ -296,13 +290,13 @@ static __noinline bool parse_udp(void *data, __u64 off, void *data_end, return true; } -static __noinline bool parse_tcp(void *data, __u64 off, void *data_end, +static __noinline bool parse_tcp(struct bpf_dynptr *skb_ptr, __u64 off, struct packet_description *pckt) { struct tcphdr *tcp; - tcp = data + off; - if (tcp + 1 > data_end) + tcp = bpf_dynptr_data(skb_ptr, off, sizeof(*tcp)); + if (!tcp) return false; if (tcp->syn) @@ -318,12 +312,11 @@ static __noinline bool parse_tcp(void *data, __u64 off, void *data_end, return true; } -static __noinline int process_packet(void *data, __u64 off, void *data_end, +static __noinline int process_packet(struct bpf_dynptr *skb_ptr, + struct eth_hdr *eth, __u64 off, bool is_ipv6, struct __sk_buff *skb) { - void *pkt_start = (void *)(long)skb->data; struct packet_description pckt = {}; - struct eth_hdr *eth = pkt_start; struct bpf_tunnel_key tkey = {}; struct vip_stats *data_stats; struct real_definition *dst; @@ -344,8 +337,8 @@ static __noinline int process_packet(void *data, __u64 off, void *data_end, tkey.tunnel_ttl = 64; if (is_ipv6) { - ip6h = data + off; - if (ip6h + 1 > data_end) + ip6h = bpf_dynptr_data(skb_ptr, off, sizeof(*ip6h)); + if (!ip6h) return TC_ACT_SHOT; iph_len = sizeof(struct ipv6hdr); @@ -356,7 +349,7 @@ static __noinline int process_packet(void *data, __u64 off, void *data_end, if (protocol == IPPROTO_FRAGMENT) { return TC_ACT_SHOT; } else if (protocol == IPPROTO_ICMPV6) { - action = parse_icmpv6(data, data_end, off, &pckt); + action = parse_icmpv6(skb_ptr, off, &pckt); if (action >= 0) return action; off += IPV6_PLUS_ICMP_HDR; @@ -365,10 +358,8 @@ static __noinline int process_packet(void *data, __u64 off, void *data_end, memcpy(pckt.dstv6, ip6h->daddr.s6_addr32, 16); } } else { - iph = data + off; - if (iph + 1 > data_end) - return TC_ACT_SHOT; - if (iph->ihl != 5) + iph = bpf_dynptr_data(skb_ptr, off, sizeof(*iph)); + if (!iph || iph->ihl != 5) return TC_ACT_SHOT; protocol = iph->protocol; @@ -379,7 +370,7 @@ static __noinline int process_packet(void *data, __u64 off, void *data_end, if (iph->frag_off & PCKT_FRAGMENTED) return TC_ACT_SHOT; if (protocol == IPPROTO_ICMP) { - action = parse_icmp(data, data_end, off, &pckt); + action = parse_icmp(skb_ptr, off, &pckt); if (action >= 0) return action; off += IPV4_PLUS_ICMP_HDR; @@ -391,10 +382,10 @@ static __noinline int 
process_packet(void *data, __u64 off, void *data_end, protocol = pckt.proto; if (protocol == IPPROTO_TCP) { - if (!parse_tcp(data, off, data_end, &pckt)) + if (!parse_tcp(skb_ptr, off, &pckt)) return TC_ACT_SHOT; } else if (protocol == IPPROTO_UDP) { - if (!parse_udp(data, off, data_end, &pckt)) + if (!parse_udp(skb_ptr, off, &pckt)) return TC_ACT_SHOT; } else { return TC_ACT_SHOT; @@ -450,20 +441,22 @@ static __noinline int process_packet(void *data, __u64 off, void *data_end, SEC("tc") int balancer_ingress(struct __sk_buff *ctx) { - void *data_end = (void *)(long)ctx->data_end; - void *data = (void *)(long)ctx->data; - struct eth_hdr *eth = data; + struct bpf_dynptr ptr; + struct eth_hdr *eth; __u32 eth_proto; __u32 nh_off; nh_off = sizeof(struct eth_hdr); - if (data + nh_off > data_end) + + bpf_dynptr_from_skb(ctx, 0, &ptr); + eth = bpf_dynptr_data(&ptr, 0, sizeof(*eth)); + if (!eth) return TC_ACT_SHOT; eth_proto = eth->eth_proto; if (eth_proto == bpf_htons(ETH_P_IP)) - return process_packet(data, nh_off, data_end, false, ctx); + return process_packet(&ptr, eth, nh_off, false, ctx); else if (eth_proto == bpf_htons(ETH_P_IPV6)) - return process_packet(data, nh_off, data_end, true, ctx); + return process_packet(&ptr, eth, nh_off, true, ctx); else return TC_ACT_SHOT; } diff --git a/tools/testing/selftests/bpf/progs/test_xdp.c b/tools/testing/selftests/bpf/progs/test_xdp.c index d7a9a74b7245..2272b56a8777 100644 --- a/tools/testing/selftests/bpf/progs/test_xdp.c +++ b/tools/testing/selftests/bpf/progs/test_xdp.c @@ -20,6 +20,12 @@ #include #include "test_iptunnel_common.h" +const size_t tcphdr_sz = sizeof(struct tcphdr); +const size_t udphdr_sz = sizeof(struct udphdr); +const size_t ethhdr_sz = sizeof(struct ethhdr); +const size_t iphdr_sz = sizeof(struct iphdr); +const size_t ipv6hdr_sz = sizeof(struct ipv6hdr); + struct { __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY); __uint(max_entries, 256); @@ -43,8 +49,7 @@ static __always_inline void count_tx(__u32 protocol) *rxcnt_count += 1; } -static __always_inline int get_dport(void *trans_data, void *data_end, - __u8 protocol) +static __always_inline int get_dport(void *trans_data, __u8 protocol) { struct tcphdr *th; struct udphdr *uh; @@ -52,13 +57,9 @@ static __always_inline int get_dport(void *trans_data, void *data_end, switch (protocol) { case IPPROTO_TCP: th = (struct tcphdr *)trans_data; - if (th + 1 > data_end) - return -1; return th->dest; case IPPROTO_UDP: uh = (struct udphdr *)trans_data; - if (uh + 1 > data_end) - return -1; return uh->dest; default: return 0; @@ -75,14 +76,13 @@ static __always_inline void set_ethhdr(struct ethhdr *new_eth, new_eth->h_proto = h_proto; } -static __always_inline int handle_ipv4(struct xdp_md *xdp) +static __always_inline int handle_ipv4(struct xdp_md *xdp, struct bpf_dynptr *xdp_ptr) { - void *data_end = (void *)(long)xdp->data_end; - void *data = (void *)(long)xdp->data; + struct bpf_dynptr new_xdp_ptr; struct iptnl_info *tnl; struct ethhdr *new_eth; struct ethhdr *old_eth; - struct iphdr *iph = data + sizeof(struct ethhdr); + struct iphdr *iph; __u16 *next_iph; __u16 payload_len; struct vip vip = {}; @@ -90,10 +90,12 @@ static __always_inline int handle_ipv4(struct xdp_md *xdp) __u32 csum = 0; int i; - if (iph + 1 > data_end) + iph = bpf_dynptr_data(xdp_ptr, ethhdr_sz, + iphdr_sz + (tcphdr_sz > udphdr_sz ? 
tcphdr_sz : udphdr_sz)); + if (!iph) return XDP_DROP; - dport = get_dport(iph + 1, data_end, iph->protocol); + dport = get_dport(iph + 1, iph->protocol); if (dport == -1) return XDP_DROP; @@ -108,37 +110,33 @@ static __always_inline int handle_ipv4(struct xdp_md *xdp) if (!tnl || tnl->family != AF_INET) return XDP_PASS; - if (bpf_xdp_adjust_head(xdp, 0 - (int)sizeof(struct iphdr))) + if (bpf_xdp_adjust_head(xdp, 0 - (int)iphdr_sz)) return XDP_DROP; - data = (void *)(long)xdp->data; - data_end = (void *)(long)xdp->data_end; - - new_eth = data; - iph = data + sizeof(*new_eth); - old_eth = data + sizeof(*iph); - - if (new_eth + 1 > data_end || - old_eth + 1 > data_end || - iph + 1 > data_end) + bpf_dynptr_from_xdp(xdp, 0, &new_xdp_ptr); + new_eth = bpf_dynptr_data(&new_xdp_ptr, 0, ethhdr_sz + iphdr_sz + ethhdr_sz); + if (!new_eth) return XDP_DROP; + iph = (struct iphdr *)(new_eth + 1); + old_eth = (struct ethhdr *)(iph + 1); + set_ethhdr(new_eth, old_eth, tnl, bpf_htons(ETH_P_IP)); iph->version = 4; - iph->ihl = sizeof(*iph) >> 2; + iph->ihl = iphdr_sz >> 2; iph->frag_off = 0; iph->protocol = IPPROTO_IPIP; iph->check = 0; iph->tos = 0; - iph->tot_len = bpf_htons(payload_len + sizeof(*iph)); + iph->tot_len = bpf_htons(payload_len + iphdr_sz); iph->daddr = tnl->daddr.v4; iph->saddr = tnl->saddr.v4; iph->ttl = 8; next_iph = (__u16 *)iph; #pragma clang loop unroll(full) - for (i = 0; i < sizeof(*iph) >> 1; i++) + for (i = 0; i < iphdr_sz >> 1; i++) csum += *next_iph++; iph->check = ~((csum & 0xffff) + (csum >> 16)); @@ -148,22 +146,23 @@ static __always_inline int handle_ipv4(struct xdp_md *xdp) return XDP_TX; } -static __always_inline int handle_ipv6(struct xdp_md *xdp) +static __always_inline int handle_ipv6(struct xdp_md *xdp, struct bpf_dynptr *xdp_ptr) { - void *data_end = (void *)(long)xdp->data_end; - void *data = (void *)(long)xdp->data; + struct bpf_dynptr new_xdp_ptr; struct iptnl_info *tnl; struct ethhdr *new_eth; struct ethhdr *old_eth; - struct ipv6hdr *ip6h = data + sizeof(struct ethhdr); + struct ipv6hdr *ip6h; __u16 payload_len; struct vip vip = {}; int dport; - if (ip6h + 1 > data_end) + ip6h = bpf_dynptr_data(xdp_ptr, ethhdr_sz, + ipv6hdr_sz + (tcphdr_sz > udphdr_sz ? 
tcphdr_sz : udphdr_sz)); + if (!ip6h) return XDP_DROP; - dport = get_dport(ip6h + 1, data_end, ip6h->nexthdr); + dport = get_dport(ip6h + 1, ip6h->nexthdr); if (dport == -1) return XDP_DROP; @@ -178,26 +177,23 @@ static __always_inline int handle_ipv6(struct xdp_md *xdp) if (!tnl || tnl->family != AF_INET6) return XDP_PASS; - if (bpf_xdp_adjust_head(xdp, 0 - (int)sizeof(struct ipv6hdr))) + if (bpf_xdp_adjust_head(xdp, 0 - (int)ipv6hdr_sz)) return XDP_DROP; - data = (void *)(long)xdp->data; - data_end = (void *)(long)xdp->data_end; - - new_eth = data; - ip6h = data + sizeof(*new_eth); - old_eth = data + sizeof(*ip6h); - - if (new_eth + 1 > data_end || old_eth + 1 > data_end || - ip6h + 1 > data_end) + bpf_dynptr_from_xdp(xdp, 0, &new_xdp_ptr); + new_eth = bpf_dynptr_data(&new_xdp_ptr, 0, ethhdr_sz + ipv6hdr_sz + ethhdr_sz); + if (!new_eth) return XDP_DROP; + ip6h = (struct ipv6hdr *)(new_eth + 1); + old_eth = (struct ethhdr *)(ip6h + 1); + set_ethhdr(new_eth, old_eth, tnl, bpf_htons(ETH_P_IPV6)); ip6h->version = 6; ip6h->priority = 0; memset(ip6h->flow_lbl, 0, sizeof(ip6h->flow_lbl)); - ip6h->payload_len = bpf_htons(bpf_ntohs(payload_len) + sizeof(*ip6h)); + ip6h->payload_len = bpf_htons(bpf_ntohs(payload_len) + ipv6hdr_sz); ip6h->nexthdr = IPPROTO_IPV6; ip6h->hop_limit = 8; memcpy(ip6h->saddr.s6_addr32, tnl->saddr.v6, sizeof(tnl->saddr.v6)); @@ -211,21 +207,22 @@ static __always_inline int handle_ipv6(struct xdp_md *xdp) SEC("xdp") int _xdp_tx_iptunnel(struct xdp_md *xdp) { - void *data_end = (void *)(long)xdp->data_end; - void *data = (void *)(long)xdp->data; - struct ethhdr *eth = data; + struct bpf_dynptr ptr; + struct ethhdr *eth; __u16 h_proto; - if (eth + 1 > data_end) + bpf_dynptr_from_xdp(xdp, 0, &ptr); + eth = bpf_dynptr_data(&ptr, 0, ethhdr_sz); + if (!eth) return XDP_DROP; h_proto = eth->h_proto; if (h_proto == bpf_htons(ETH_P_IP)) - return handle_ipv4(xdp); + return handle_ipv4(xdp, &ptr); else if (h_proto == bpf_htons(ETH_P_IPV6)) - return handle_ipv6(xdp); + return handle_ipv6(xdp, &ptr); else return XDP_DROP; } diff --git a/tools/testing/selftests/bpf/test_tcp_hdr_options.h b/tools/testing/selftests/bpf/test_tcp_hdr_options.h index 6118e3ab61fc..56c9f8a3ad3d 100644 --- a/tools/testing/selftests/bpf/test_tcp_hdr_options.h +++ b/tools/testing/selftests/bpf/test_tcp_hdr_options.h @@ -50,6 +50,7 @@ struct linum_err { #define TCPOPT_EOL 0 #define TCPOPT_NOP 1 +#define TCPOPT_MSS 2 #define TCPOPT_WINDOW 3 #define TCPOPT_EXP 254
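To make the read-only skb semantics from patch 1/3 concrete, here is a condensed sketch of what the test_skb_readonly selftest above exercises (illustrative only; assumes the series is applied and the helper declarations are available):

    #include <errno.h>
    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    /* cgroup skbs are read-only: reads succeed, but writes return
     * -EINVAL and direct data slices come back NULL.
     */
    SEC("cgroup_skb/egress")
    int skb_readonly_demo(struct __sk_buff *skb)
    {
    	struct bpf_dynptr ptr;
    	__u8 byte = 0;

    	if (bpf_dynptr_from_skb(skb, 0, &ptr))
    		return 1;

    	/* reading through the dynptr is allowed */
    	bpf_dynptr_read(&byte, sizeof(byte), &ptr, 0, 0);

    	/* writing and slicing are not */
    	if (bpf_dynptr_write(&ptr, 0, &byte, sizeof(byte), 0) != -EINVAL)
    		return 0;
    	if (bpf_dynptr_data(&ptr, 0, 1))
    		return 0;

    	return 1;
    }

    char _license[] SEC("license") = "GPL";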