| Message ID | 20220822235649.2218031-3-joannelkoong@gmail.com (mailing list archive) |
|---|---|
| State | Superseded |
| Delegated to: | BPF |
| Series | Add skb + xdp dynptrs |
+Cc XDP folks On Tue, 23 Aug 2022 at 02:12, Joanne Koong <joannelkoong@gmail.com> wrote: > > Add xdp dynptrs, which are dynptrs whose underlying pointer points > to a xdp_buff. The dynptr acts on xdp data. xdp dynptrs have two main > benefits. One is that they allow operations on sizes that are not > statically known at compile-time (eg variable-sized accesses). > Another is that parsing the packet data through dynptrs (instead of > through direct access of xdp->data and xdp->data_end) can be more > ergonomic and less brittle (eg does not need manual if checking for > being within bounds of data_end). > > For reads and writes on the dynptr, this includes reading/writing > from/to and across fragments. For data slices, direct access to It's a bit awkward to have such a difference between xdp and skb dynptr's read/write. I understand why it is the way it is, but it still doesn't feel right. I'm not sure if we can reconcile the differences, but it makes writing common code for both xdp and tc harder as it needs to be aware of the differences (and then the flags for dynptr_write would differ too). So we're 90% there but not the whole way... > data in fragments is also permitted, but access across fragments > is not. The returned data slice is reg type PTR_TO_PACKET | PTR_MAYBE_NULL. > > Any helper calls that change the underlying packet buffer (eg > bpf_xdp_adjust_head) invalidates any data slices of the associated > dynptr. Whenever such a helper call is made, the verifier marks any > PTR_TO_PACKET reg type (which includes xdp dynptr slices since they are > PTR_TO_PACKETs) as unknown. The stack trace for this is > check_helper_call() -> clear_all_pkt_pointers() -> > __clear_all_pkt_pointers() -> mark_reg_unknown() > > For examples of how xdp dynptrs can be used, please see the attached > selftests. > > Signed-off-by: Joanne Koong <joannelkoong@gmail.com> > --- > include/linux/bpf.h | 6 ++++- > include/linux/filter.h | 3 +++ > include/uapi/linux/bpf.h | 25 +++++++++++++++--- > kernel/bpf/helpers.c | 14 ++++++++++- > kernel/bpf/verifier.c | 8 +++++- > net/core/filter.c | 46 +++++++++++++++++++++++++++++----- > tools/include/uapi/linux/bpf.h | 25 +++++++++++++++--- > 7 files changed, 112 insertions(+), 15 deletions(-) > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > index 30615d1a0c13..455a215b6c57 100644 > --- a/include/linux/bpf.h > +++ b/include/linux/bpf.h > @@ -410,11 +410,15 @@ enum bpf_type_flag { > /* DYNPTR points to sk_buff */ > DYNPTR_TYPE_SKB = BIT(11 + BPF_BASE_TYPE_BITS), > > + /* DYNPTR points to xdp_buff */ > + DYNPTR_TYPE_XDP = BIT(12 + BPF_BASE_TYPE_BITS), > + > __BPF_TYPE_FLAG_MAX, > __BPF_TYPE_LAST_FLAG = __BPF_TYPE_FLAG_MAX - 1, > }; > > -#define DYNPTR_TYPE_FLAG_MASK (DYNPTR_TYPE_LOCAL | DYNPTR_TYPE_RINGBUF | DYNPTR_TYPE_SKB) > +#define DYNPTR_TYPE_FLAG_MASK (DYNPTR_TYPE_LOCAL | DYNPTR_TYPE_RINGBUF | DYNPTR_TYPE_SKB \ > + | DYNPTR_TYPE_XDP) > > /* Max number of base types. 
*/ > #define BPF_BASE_TYPE_LIMIT (1UL << BPF_BASE_TYPE_BITS) > diff --git a/include/linux/filter.h b/include/linux/filter.h > index 649063d9cbfd..80f030239877 100644 > --- a/include/linux/filter.h > +++ b/include/linux/filter.h > @@ -1535,5 +1535,8 @@ static __always_inline int __bpf_xdp_redirect_map(struct bpf_map *map, u32 ifind > int __bpf_skb_load_bytes(const struct sk_buff *skb, u32 offset, void *to, u32 len); > int __bpf_skb_store_bytes(struct sk_buff *skb, u32 offset, const void *from, > u32 len, u64 flags); > +int __bpf_xdp_load_bytes(struct xdp_buff *xdp, u32 offset, void *buf, u32 len); > +int __bpf_xdp_store_bytes(struct xdp_buff *xdp, u32 offset, void *buf, u32 len); > +void *bpf_xdp_pointer(struct xdp_buff *xdp, u32 offset, u32 len); > > #endif /* __LINUX_FILTER_H__ */ > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > index 320e6b95d95c..9feea29eebcd 100644 > --- a/include/uapi/linux/bpf.h > +++ b/include/uapi/linux/bpf.h > @@ -5283,13 +5283,18 @@ union bpf_attr { > * and try again. > * > * * The data slice is automatically invalidated anytime > - * **bpf_dynptr_write**\ () or a helper call that changes > - * the underlying packet buffer (eg **bpf_skb_pull_data**\ ()) > + * **bpf_dynptr_write**\ () is called. > + * > + * For skb-type and xdp-type dynptrs: > + * * The data slice is automatically invalidated anytime a > + * helper call that changes the underlying packet buffer > + * (eg **bpf_skb_pull_data**\ (), **bpf_xdp_adjust_head**\ ()) > * is called. > * Return > * Pointer to the underlying dynptr data, NULL if the dynptr is > * read-only, if the dynptr is invalid, or if the offset and length > - * is out of bounds or in a paged buffer for skb-type dynptrs. > + * is out of bounds or in a paged buffer for skb-type dynptrs or > + * across fragments for xdp-type dynptrs. > * > * s64 bpf_tcp_raw_gen_syncookie_ipv4(struct iphdr *iph, struct tcphdr *th, u32 th_len) > * Description > @@ -5388,6 +5393,19 @@ union bpf_attr { > * *flags* is currently unused, it must be 0 for now. > * Return > * 0 on success or -EINVAL if flags is not 0. > + * > + * long bpf_dynptr_from_xdp(struct xdp_buff *xdp_md, u64 flags, struct bpf_dynptr *ptr) > + * Description > + * Get a dynptr to the data in *xdp_md*. *xdp_md* must be the BPF program > + * context. > + * > + * Calls that change the *xdp_md*'s underlying packet buffer > + * (eg **bpf_xdp_adjust_head**\ ()) do not invalidate the dynptr, but > + * they do invalidate any data slices associated with the dynptr. > + * > + * *flags* is currently unused, it must be 0 for now. > + * Return > + * 0 on success, -EINVAL if flags is not 0. 
> */ > #define __BPF_FUNC_MAPPER(FN) \ > FN(unspec), \ > @@ -5600,6 +5618,7 @@ union bpf_attr { > FN(tcp_raw_check_syncookie_ipv6), \ > FN(ktime_get_tai_ns), \ > FN(dynptr_from_skb), \ > + FN(dynptr_from_xdp), \ > /* */ > > /* integer value in 'imm' field of BPF_CALL instruction selects which helper > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c > index 471a01a9b6ae..2b9dc4c6de04 100644 > --- a/kernel/bpf/helpers.c > +++ b/kernel/bpf/helpers.c > @@ -1541,6 +1541,8 @@ BPF_CALL_5(bpf_dynptr_read, void *, dst, u32, len, struct bpf_dynptr_kern *, src > return 0; > case BPF_DYNPTR_TYPE_SKB: > return __bpf_skb_load_bytes(src->data, src->offset + offset, dst, len); > + case BPF_DYNPTR_TYPE_XDP: > + return __bpf_xdp_load_bytes(src->data, src->offset + offset, dst, len); > default: > WARN(true, "bpf_dynptr_read: unknown dynptr type %d\n", type); > return -EFAULT; > @@ -1583,6 +1585,10 @@ BPF_CALL_5(bpf_dynptr_write, struct bpf_dynptr_kern *, dst, u32, offset, void *, > case BPF_DYNPTR_TYPE_SKB: > return __bpf_skb_store_bytes(dst->data, dst->offset + offset, src, len, > flags); > + case BPF_DYNPTR_TYPE_XDP: > + if (flags) > + return -EINVAL; > + return __bpf_xdp_store_bytes(dst->data, dst->offset + offset, src, len); > default: > WARN(true, "bpf_dynptr_write: unknown dynptr type %d\n", type); > return -EFAULT; > @@ -1616,7 +1622,7 @@ BPF_CALL_3(bpf_dynptr_data, struct bpf_dynptr_kern *, ptr, u32, offset, u32, len > > type = bpf_dynptr_get_type(ptr); > > - /* Only skb dynptrs can get read-only data slices, because the > + /* Only skb and xdp dynptrs can get read-only data slices, because the > * verifier enforces PTR_TO_PACKET accesses > */ > is_rdonly = bpf_dynptr_is_rdonly(ptr); > @@ -1640,6 +1646,12 @@ BPF_CALL_3(bpf_dynptr_data, struct bpf_dynptr_kern *, ptr, u32, offset, u32, len > data = skb->data; > break; > } > + case BPF_DYNPTR_TYPE_XDP: > + /* if the requested data in across fragments, then it cannot > + * be accessed directly - bpf_xdp_pointer will return NULL > + */ > + return (unsigned long)bpf_xdp_pointer(ptr->data, > + ptr->offset + offset, len); > default: > WARN(true, "bpf_dynptr_data: unknown dynptr type %d\n", type); > return 0; > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > index 1ea295f47525..d33648eee188 100644 > --- a/kernel/bpf/verifier.c > +++ b/kernel/bpf/verifier.c > @@ -686,6 +686,8 @@ static enum bpf_dynptr_type arg_to_dynptr_type(enum bpf_arg_type arg_type) > return BPF_DYNPTR_TYPE_RINGBUF; > case DYNPTR_TYPE_SKB: > return BPF_DYNPTR_TYPE_SKB; > + case DYNPTR_TYPE_XDP: > + return BPF_DYNPTR_TYPE_XDP; > default: > return BPF_DYNPTR_TYPE_INVALID; > } > @@ -6078,6 +6080,9 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg, > case DYNPTR_TYPE_SKB: > err_extra = "skb "; > break; > + case DYNPTR_TYPE_XDP: > + err_extra = "xdp "; > + break; > } > > verbose(env, "Expected an initialized %sdynptr as arg #%d\n", > @@ -7439,7 +7444,8 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn > mark_reg_known_zero(env, regs, BPF_REG_0); > > if (func_id == BPF_FUNC_dynptr_data && > - dynptr_type == BPF_DYNPTR_TYPE_SKB) { > + (dynptr_type == BPF_DYNPTR_TYPE_SKB || > + dynptr_type == BPF_DYNPTR_TYPE_XDP)) { > regs[BPF_REG_0].type = PTR_TO_PACKET | ret_flag; > regs[BPF_REG_0].range = meta.mem_size; It doesn't seem like this is safe. Since PTR_TO_PACKET's range can be modified by comparisons with packet pointers loaded from the xdp/skb ctx, how do we distinguish e.g. 
between a pkt slice obtained from some frag in a multi-buff XDP vs pkt pointer from a linear area? Someone can compare data_meta from ctx with PTR_TO_PACKET from bpf_dynptr_data on xdp dynptr (which might be pointing to a xdp mb frag). While MAX_PACKET_OFF is 0xffff, it can still be used to do OOB access for the linear area. reg_is_init_pkt_pointer will return true as modified range is not considered for it. Same kind of issues when doing comparison with data_end from ctx (though maybe you won't be able to do incorrect data access at runtime using that). I had a pkt_uid field in my patch [0] which disallowed comparisons among bpf_packet_pointer slices. Each call assigned a fresh pkt_uid, and that disabled comparisons for them. reg->id is used for var_off range propagation so it cannot be reused. Coming back to this: What we really want here is a PTR_TO_MEM with a mem_size, so maybe you should go that route instead of PTR_TO_PACKET (and add a type tag to maybe pretty print it also as a packet pointer in verifier log), or add some way to distinguish slice vs non-slice pkt pointers like I did in my patch. You might also want to add some tests for this corner case (there are some later in [0] if you want to reuse them). So TBH, I kinda dislike my own solution in [0] :). The complexity does not seem worth it. The pkt_uid distinction is more useful (and actually would be needed) in Toke's xdp queueing series, where in a dequeue program you have multiple xdp_mds and want scoped slice invalidations (i.e. adjust_head on one xdp_md doesn't invalidate slices of some other xdp_md). Here we can just get away with normal PTR_TO_MEM. ... Or just let me know if you handle this correctly already, or if this won't be an actual problem :). [0]: https://lore.kernel.org/bpf/20220306234311.452206-3-memxor@gmail.com
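To make the hazard Kumar describes concrete: a minimal sketch, assuming the bpf_dynptr_from_xdp()/bpf_dynptr_data() signatures from this series (illustrative, not code from the thread):

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int unsafe_slice_compare(struct xdp_md *ctx)
{
	void *data_end = (void *)(long)ctx->data_end;
	struct bpf_dynptr ptr;
	__u8 *slice;

	bpf_dynptr_from_xdp(ctx, 0, &ptr);
	/* an offset that may land in a frag on a multi-buff xdp_buff */
	slice = bpf_dynptr_data(&ptr, 4000, 1);
	if (!slice)
		return XDP_DROP;
	/* ctx->data_end bounds only the linear area; if slice points into
	 * a frag, this comparison lets the verifier grow slice's range far
	 * past the frag's real length (MAX_PACKET_OFF is 0xffff)
	 */
	if ((void *)slice + 64 <= data_end)
		slice[63] = 0; /* potentially out of bounds at runtime */
	return XDP_PASS;
}
```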
On Mon, Aug 22, 2022 at 7:31 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote: > > +Cc XDP folks > > On Tue, 23 Aug 2022 at 02:12, Joanne Koong <joannelkoong@gmail.com> wrote: > > > > Add xdp dynptrs, which are dynptrs whose underlying pointer points > > to a xdp_buff. The dynptr acts on xdp data. xdp dynptrs have two main > > benefits. One is that they allow operations on sizes that are not > > statically known at compile-time (eg variable-sized accesses). > > Another is that parsing the packet data through dynptrs (instead of > > through direct access of xdp->data and xdp->data_end) can be more > > ergonomic and less brittle (eg does not need manual if checking for > > being within bounds of data_end). > > > > For reads and writes on the dynptr, this includes reading/writing > > from/to and across fragments. For data slices, direct access to > > It's a bit awkward to have such a difference between xdp and skb > dynptr's read/write. I understand why it is the way it is, but it > still doesn't feel right. I'm not sure if we can reconcile the > differences, but it makes writing common code for both xdp and tc > harder as it needs to be aware of the differences (and then the flags > for dynptr_write would differ too). So we're 90% there but not the > whole way... Yeah, it'd be great if the behavior for skb/xdp progs could be the same, but I'm not seeing a better solution here (unless we invalidate data slices on writes in xdp progs, just to make it match more :P). Regarding having 2 different interfaces bpf_dynptr_from_{skb/xdp}, I'm not convinced this is much of a problem - xdp and skb programs already have different interfaces for doing things (eg bpf_{skb/xdp}_{store/load}_bytes). > > > data in fragments is also permitted, but access across fragments > > is not. The returned data slice is reg type PTR_TO_PACKET | PTR_MAYBE_NULL. > > > > Any helper calls that change the underlying packet buffer (eg > > bpf_xdp_adjust_head) invalidates any data slices of the associated > > dynptr. Whenever such a helper call is made, the verifier marks any > > PTR_TO_PACKET reg type (which includes xdp dynptr slices since they are > > PTR_TO_PACKETs) as unknown. The stack trace for this is > > check_helper_call() -> clear_all_pkt_pointers() -> > > __clear_all_pkt_pointers() -> mark_reg_unknown() > > > > For examples of how xdp dynptrs can be used, please see the attached > > selftests. 
> > > > Signed-off-by: Joanne Koong <joannelkoong@gmail.com> > > --- > > include/linux/bpf.h | 6 ++++- > > include/linux/filter.h | 3 +++ > > include/uapi/linux/bpf.h | 25 +++++++++++++++--- > > kernel/bpf/helpers.c | 14 ++++++++++- > > kernel/bpf/verifier.c | 8 +++++- > > net/core/filter.c | 46 +++++++++++++++++++++++++++++----- > > tools/include/uapi/linux/bpf.h | 25 +++++++++++++++--- > > 7 files changed, 112 insertions(+), 15 deletions(-) > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > > index 30615d1a0c13..455a215b6c57 100644 > > --- a/include/linux/bpf.h > > +++ b/include/linux/bpf.h > > @@ -410,11 +410,15 @@ enum bpf_type_flag { > > /* DYNPTR points to sk_buff */ > > DYNPTR_TYPE_SKB = BIT(11 + BPF_BASE_TYPE_BITS), > > > > + /* DYNPTR points to xdp_buff */ > > + DYNPTR_TYPE_XDP = BIT(12 + BPF_BASE_TYPE_BITS), > > + > > __BPF_TYPE_FLAG_MAX, > > __BPF_TYPE_LAST_FLAG = __BPF_TYPE_FLAG_MAX - 1, > > }; > > > > -#define DYNPTR_TYPE_FLAG_MASK (DYNPTR_TYPE_LOCAL | DYNPTR_TYPE_RINGBUF | DYNPTR_TYPE_SKB) > > +#define DYNPTR_TYPE_FLAG_MASK (DYNPTR_TYPE_LOCAL | DYNPTR_TYPE_RINGBUF | DYNPTR_TYPE_SKB \ > > + | DYNPTR_TYPE_XDP) > > > > /* Max number of base types. */ > > #define BPF_BASE_TYPE_LIMIT (1UL << BPF_BASE_TYPE_BITS) > > diff --git a/include/linux/filter.h b/include/linux/filter.h > > index 649063d9cbfd..80f030239877 100644 > > --- a/include/linux/filter.h > > +++ b/include/linux/filter.h > > @@ -1535,5 +1535,8 @@ static __always_inline int __bpf_xdp_redirect_map(struct bpf_map *map, u32 ifind > > int __bpf_skb_load_bytes(const struct sk_buff *skb, u32 offset, void *to, u32 len); > > int __bpf_skb_store_bytes(struct sk_buff *skb, u32 offset, const void *from, > > u32 len, u64 flags); > > +int __bpf_xdp_load_bytes(struct xdp_buff *xdp, u32 offset, void *buf, u32 len); > > +int __bpf_xdp_store_bytes(struct xdp_buff *xdp, u32 offset, void *buf, u32 len); > > +void *bpf_xdp_pointer(struct xdp_buff *xdp, u32 offset, u32 len); > > > > #endif /* __LINUX_FILTER_H__ */ > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > > index 320e6b95d95c..9feea29eebcd 100644 > > --- a/include/uapi/linux/bpf.h > > +++ b/include/uapi/linux/bpf.h > > @@ -5283,13 +5283,18 @@ union bpf_attr { > > * and try again. > > * > > * * The data slice is automatically invalidated anytime > > - * **bpf_dynptr_write**\ () or a helper call that changes > > - * the underlying packet buffer (eg **bpf_skb_pull_data**\ ()) > > + * **bpf_dynptr_write**\ () is called. > > + * > > + * For skb-type and xdp-type dynptrs: > > + * * The data slice is automatically invalidated anytime a > > + * helper call that changes the underlying packet buffer > > + * (eg **bpf_skb_pull_data**\ (), **bpf_xdp_adjust_head**\ ()) > > * is called. > > * Return > > * Pointer to the underlying dynptr data, NULL if the dynptr is > > * read-only, if the dynptr is invalid, or if the offset and length > > - * is out of bounds or in a paged buffer for skb-type dynptrs. > > + * is out of bounds or in a paged buffer for skb-type dynptrs or > > + * across fragments for xdp-type dynptrs. > > * > > * s64 bpf_tcp_raw_gen_syncookie_ipv4(struct iphdr *iph, struct tcphdr *th, u32 th_len) > > * Description > > @@ -5388,6 +5393,19 @@ union bpf_attr { > > * *flags* is currently unused, it must be 0 for now. > > * Return > > * 0 on success or -EINVAL if flags is not 0. > > + * > > + * long bpf_dynptr_from_xdp(struct xdp_buff *xdp_md, u64 flags, struct bpf_dynptr *ptr) > > + * Description > > + * Get a dynptr to the data in *xdp_md*. 
*xdp_md* must be the BPF program > > + * context. > > + * > > + * Calls that change the *xdp_md*'s underlying packet buffer > > + * (eg **bpf_xdp_adjust_head**\ ()) do not invalidate the dynptr, but > > + * they do invalidate any data slices associated with the dynptr. > > + * > > + * *flags* is currently unused, it must be 0 for now. > > + * Return > > + * 0 on success, -EINVAL if flags is not 0. > > */ > > #define __BPF_FUNC_MAPPER(FN) \ > > FN(unspec), \ > > @@ -5600,6 +5618,7 @@ union bpf_attr { > > FN(tcp_raw_check_syncookie_ipv6), \ > > FN(ktime_get_tai_ns), \ > > FN(dynptr_from_skb), \ > > + FN(dynptr_from_xdp), \ > > /* */ > > > > /* integer value in 'imm' field of BPF_CALL instruction selects which helper > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c > > index 471a01a9b6ae..2b9dc4c6de04 100644 > > --- a/kernel/bpf/helpers.c > > +++ b/kernel/bpf/helpers.c > > @@ -1541,6 +1541,8 @@ BPF_CALL_5(bpf_dynptr_read, void *, dst, u32, len, struct bpf_dynptr_kern *, src > > return 0; > > case BPF_DYNPTR_TYPE_SKB: > > return __bpf_skb_load_bytes(src->data, src->offset + offset, dst, len); > > + case BPF_DYNPTR_TYPE_XDP: > > + return __bpf_xdp_load_bytes(src->data, src->offset + offset, dst, len); > > default: > > WARN(true, "bpf_dynptr_read: unknown dynptr type %d\n", type); > > return -EFAULT; > > @@ -1583,6 +1585,10 @@ BPF_CALL_5(bpf_dynptr_write, struct bpf_dynptr_kern *, dst, u32, offset, void *, > > case BPF_DYNPTR_TYPE_SKB: > > return __bpf_skb_store_bytes(dst->data, dst->offset + offset, src, len, > > flags); > > + case BPF_DYNPTR_TYPE_XDP: > > + if (flags) > > + return -EINVAL; > > + return __bpf_xdp_store_bytes(dst->data, dst->offset + offset, src, len); > > default: > > WARN(true, "bpf_dynptr_write: unknown dynptr type %d\n", type); > > return -EFAULT; > > @@ -1616,7 +1622,7 @@ BPF_CALL_3(bpf_dynptr_data, struct bpf_dynptr_kern *, ptr, u32, offset, u32, len > > > > type = bpf_dynptr_get_type(ptr); > > > > - /* Only skb dynptrs can get read-only data slices, because the > > + /* Only skb and xdp dynptrs can get read-only data slices, because the > > * verifier enforces PTR_TO_PACKET accesses > > */ > > is_rdonly = bpf_dynptr_is_rdonly(ptr); > > @@ -1640,6 +1646,12 @@ BPF_CALL_3(bpf_dynptr_data, struct bpf_dynptr_kern *, ptr, u32, offset, u32, len > > data = skb->data; > > break; > > } > > + case BPF_DYNPTR_TYPE_XDP: > > + /* if the requested data in across fragments, then it cannot > > + * be accessed directly - bpf_xdp_pointer will return NULL > > + */ > > + return (unsigned long)bpf_xdp_pointer(ptr->data, > > + ptr->offset + offset, len); > > default: > > WARN(true, "bpf_dynptr_data: unknown dynptr type %d\n", type); > > return 0; > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > > index 1ea295f47525..d33648eee188 100644 > > --- a/kernel/bpf/verifier.c > > +++ b/kernel/bpf/verifier.c > > @@ -686,6 +686,8 @@ static enum bpf_dynptr_type arg_to_dynptr_type(enum bpf_arg_type arg_type) > > return BPF_DYNPTR_TYPE_RINGBUF; > > case DYNPTR_TYPE_SKB: > > return BPF_DYNPTR_TYPE_SKB; > > + case DYNPTR_TYPE_XDP: > > + return BPF_DYNPTR_TYPE_XDP; > > default: > > return BPF_DYNPTR_TYPE_INVALID; > > } > > @@ -6078,6 +6080,9 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg, > > case DYNPTR_TYPE_SKB: > > err_extra = "skb "; > > break; > > + case DYNPTR_TYPE_XDP: > > + err_extra = "xdp "; > > + break; > > } > > > > verbose(env, "Expected an initialized %sdynptr as arg #%d\n", > > @@ -7439,7 +7444,8 @@ static int check_helper_call(struct 
bpf_verifier_env *env, struct bpf_insn *insn > > mark_reg_known_zero(env, regs, BPF_REG_0); > > > > if (func_id == BPF_FUNC_dynptr_data && > > - dynptr_type == BPF_DYNPTR_TYPE_SKB) { > > + (dynptr_type == BPF_DYNPTR_TYPE_SKB || > > + dynptr_type == BPF_DYNPTR_TYPE_XDP)) { > > regs[BPF_REG_0].type = PTR_TO_PACKET | ret_flag; > > regs[BPF_REG_0].range = meta.mem_size; > > It doesn't seem like this is safe. Since PTR_TO_PACKET's range can be > modified by comparisons with packet pointers loaded from the xdp/skb > ctx, how do we distinguish e.g. between a pkt slice obtained from some > frag in a multi-buff XDP vs pkt pointer from a linear area? > > Someone can compare data_meta from ctx with PTR_TO_PACKET from > bpf_dynptr_data on xdp dynptr (which might be pointing to a xdp mb > frag). While MAX_PACKET_OFF is 0xffff, it can still be used to do OOB > access for the linear area. reg_is_init_pkt_pointer will return true > as modified range is not considered for it. Same kind of issues when > doing comparison with data_end from ctx (though maybe you won't be > able to do incorrect data access at runtime using that). > > I had a pkt_uid field in my patch [0] which disallowed comparisons > among bpf_packet_pointer slices. Each call assigned a fresh pkt_uid, > and that disabled comparisons for them. reg->id is used for var_off > range propagation so it cannot be reused. > > Coming back to this: What we really want here is a PTR_TO_MEM with a > mem_size, so maybe you should go that route instead of PTR_TO_PACKET > (and add a type tag to maybe pretty print it also as a packet pointer > in verifier log), or add some way to distinguish slice vs non-slice > pkt pointers like I did in my patch. You might also want to add some > tests for this corner case (there are some later in [0] if you want to > reuse them). > > So TBH, I kinda dislike my own solution in [0] :). The complexity does > not seem worth it. The pkt_uid distinction is more useful (and > actually would be needed) in Toke's xdp queueing series, where in a > dequeue program you have multiple xdp_mds and want scoped slice > invalidations (i.e. adjust_head on one xdp_md doesn't invalidate > slices of some other xdp_md). Here we can just get away with normal > PTR_TO_MEM. > > ... Or just let me know if you handle this correctly already, or if > this won't be an actual problem :). Ooh interesting, I hadn't previously taken a look at try_match_pkt_pointers(), thanks for mentioning it :) The cleanest solution to me is to add the flag "DYNPTR_TYPE_{SKB/XDP}" to PTR_TO_PACKET and change reg_is_init_pkt_pointer() to return false if the DYNPTR_TYPE_{SKB/XDP} flag is present. I prefer this over returning PTR_TO_MEM because it seems more robust (eg if in the future we reject x behavior on the packet data reg types, this will automatically apply to the data slices), and because it'll keep the logic more efficient/simpler for the case when the pkt pointer has to be cleared after any helper that changes pkt data is called (aka the case where the data slice gets invalidated). What are your thoughts? > > [0]: https://lore.kernel.org/bpf/20220306234311.452206-3-memxor@gmail.com
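A rough sketch of what that proposal could look like in the verifier; reg_is_init_pkt_pointer() is paraphrased here from kernel/bpf/verifier.c of this era, and the flag check is the new part:

```c
static bool reg_is_init_pkt_pointer(const struct bpf_reg_state *reg,
				    enum bpf_reg_type which)
{
	/* dynptr-derived slices may point into frags, so never treat them
	 * as the ctx packet pointers that comparisons are allowed to refine
	 */
	if (reg->type & (DYNPTR_TYPE_SKB | DYNPTR_TYPE_XDP))
		return false;
	/* existing check, paraphrased: an "initial" pkt pointer has not
	 * been offset or range-narrowed yet
	 */
	return base_type(reg->type) == which &&
	       reg->off == 0 && reg->range == 0;
}
```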
Hi Joanne, Thank you for the patch! Yet something to improve: [auto build test ERROR on bpf-next/master] url: https://github.com/intel-lab-lkp/linux/commits/Joanne-Koong/Add-skb-xdp-dynptrs/20220823-080022 base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master config: microblaze-buildonly-randconfig-r003-20220823 (https://download.01.org/0day-ci/archive/20220824/202208240923.X2tgBh6Y-lkp@intel.com/config) compiler: microblaze-linux-gcc (GCC) 12.1.0 reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/intel-lab-lkp/linux/commit/6d832ed17cea3ede1cd48f885a73144d9c4d800a git remote add linux-review https://github.com/intel-lab-lkp/linux git fetch --no-tags linux-review Joanne-Koong/Add-skb-xdp-dynptrs/20220823-080022 git checkout 6d832ed17cea3ede1cd48f885a73144d9c4d800a # save the config file mkdir build_dir && cp config build_dir/.config COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=microblaze SHELL=/bin/bash If you fix the issue, kindly add following tag where applicable Reported-by: kernel test robot <lkp@intel.com> All errors (new ones prefixed by >>): microblaze-linux-ld: kernel/bpf/helpers.o: in function `____bpf_dynptr_read': kernel/bpf/helpers.c:1543: undefined reference to `__bpf_skb_load_bytes' >> microblaze-linux-ld: kernel/bpf/helpers.c:1545: undefined reference to `__bpf_xdp_load_bytes' microblaze-linux-ld: kernel/bpf/helpers.o: in function `____bpf_dynptr_write': kernel/bpf/helpers.c:1586: undefined reference to `__bpf_skb_store_bytes' >> microblaze-linux-ld: kernel/bpf/helpers.c:1591: undefined reference to `__bpf_xdp_store_bytes' microblaze-linux-ld: kernel/bpf/helpers.o: in function `____bpf_dynptr_data': kernel/bpf/helpers.c:1653: undefined reference to `bpf_xdp_pointer' pahole: .tmp_vmlinux.btf: Invalid argument .btf.vmlinux.bin.o: file not recognized: file format not recognized
Joanne Koong <joannelkoong@gmail.com> writes: > On Mon, Aug 22, 2022 at 7:31 PM Kumar Kartikeya Dwivedi > <memxor@gmail.com> wrote: >> >> +Cc XDP folks >> >> On Tue, 23 Aug 2022 at 02:12, Joanne Koong <joannelkoong@gmail.com> wrote: >> > >> > Add xdp dynptrs, which are dynptrs whose underlying pointer points >> > to a xdp_buff. The dynptr acts on xdp data. xdp dynptrs have two main >> > benefits. One is that they allow operations on sizes that are not >> > statically known at compile-time (eg variable-sized accesses). >> > Another is that parsing the packet data through dynptrs (instead of >> > through direct access of xdp->data and xdp->data_end) can be more >> > ergonomic and less brittle (eg does not need manual if checking for >> > being within bounds of data_end). >> > >> > For reads and writes on the dynptr, this includes reading/writing >> > from/to and across fragments. For data slices, direct access to >> >> It's a bit awkward to have such a difference between xdp and skb >> dynptr's read/write. I understand why it is the way it is, but it >> still doesn't feel right. I'm not sure if we can reconcile the >> differences, but it makes writing common code for both xdp and tc >> harder as it needs to be aware of the differences (and then the flags >> for dynptr_write would differ too). So we're 90% there but not the >> whole way... > > Yeah, it'd be great if the behavior for skb/xdp progs could be the > same, but I'm not seeing a better solution here (unless we invalidate > data slices on writes in xdp progs, just to make it match more :P). > > Regarding having 2 different interfaces bpf_dynptr_from_{skb/xdp}, I'm > not convinced this is much of a problem - xdp and skb programs already > have different interfaces for doing things (eg > bpf_{skb/xdp}_{store/load}_bytes). This is true, but it's quite possible to paper over these differences and write BPF code that works for both TC and XDP. Subtle semantic differences in otherwise identical functions makes this harder. Today you can write a function like: static inline int parse_pkt(void *data, void* data_end) { /* parse data */ } And call it like: SEC("xdp") int parse_xdp(struct xdp_md *ctx) { return parse_pkt(ctx->data, ctx->data_end); } SEC("tc") int parse_tc(struct __sk_buff *skb) { return parse_pkt(skb->data, skb->data_end); } IMO the goal should be to be able to do the equivalent for dynptrs, like: static inline int parse_pkt(struct bpf_dynptr *ptr) { __u64 *data; data = bpf_dynptr_data(ptr, 0, sizeof(*data)); if (!data) return 0; /* parse data */ } SEC("xdp") int parse_xdp(struct xdp_md *ctx) { struct bpf_dynptr ptr; bpf_dynptr_from_xdp(ctx, 0, &ptr); return parse_pkt(&ptr); } SEC("tc") int parse_tc(struct __sk_buff *skb) { struct bpf_dynptr ptr; bpf_dynptr_from_skb(skb, 0, &ptr); return parse_pkt(&ptr); } If the dynptr-based parse_pkt() function has to take special care to figure out where the dynptr comes from, it makes it a lot more difficult to write reusable packet parsing functions. So I'd be in favour of restricting the dynptr interface to the lowest common denominator of the skb and xdp interfaces even if that makes things slightly more awkward in the specialised cases... -Toke
On Wed, Aug 24, 2022 at 3:39 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > Joanne Koong <joannelkoong@gmail.com> writes: > > > On Mon, Aug 22, 2022 at 7:31 PM Kumar Kartikeya Dwivedi > > <memxor@gmail.com> wrote: > >> > >> +Cc XDP folks > >> > >> On Tue, 23 Aug 2022 at 02:12, Joanne Koong <joannelkoong@gmail.com> wrote: > >> > > >> > Add xdp dynptrs, which are dynptrs whose underlying pointer points > >> > to a xdp_buff. The dynptr acts on xdp data. xdp dynptrs have two main > >> > benefits. One is that they allow operations on sizes that are not > >> > statically known at compile-time (eg variable-sized accesses). > >> > Another is that parsing the packet data through dynptrs (instead of > >> > through direct access of xdp->data and xdp->data_end) can be more > >> > ergonomic and less brittle (eg does not need manual if checking for > >> > being within bounds of data_end). > >> > > >> > For reads and writes on the dynptr, this includes reading/writing > >> > from/to and across fragments. For data slices, direct access to > >> > >> It's a bit awkward to have such a difference between xdp and skb > >> dynptr's read/write. I understand why it is the way it is, but it > >> still doesn't feel right. I'm not sure if we can reconcile the > >> differences, but it makes writing common code for both xdp and tc > >> harder as it needs to be aware of the differences (and then the flags > >> for dynptr_write would differ too). So we're 90% there but not the > >> whole way... > > > > Yeah, it'd be great if the behavior for skb/xdp progs could be the > > same, but I'm not seeing a better solution here (unless we invalidate > > data slices on writes in xdp progs, just to make it match more :P). > > > > Regarding having 2 different interfaces bpf_dynptr_from_{skb/xdp}, I'm > > not convinced this is much of a problem - xdp and skb programs already > > have different interfaces for doing things (eg > > bpf_{skb/xdp}_{store/load}_bytes). > > This is true, but it's quite possible to paper over these differences > and write BPF code that works for both TC and XDP. Subtle semantic > differences in otherwise identical functions makes this harder. > > Today you can write a function like: > > static inline int parse_pkt(void *data, void* data_end) > { > /* parse data */ > } > > And call it like: > > SEC("xdp") > int parse_xdp(struct xdp_md *ctx) > { > return parse_pkt(ctx->data, ctx->data_end); > } > > SEC("tc") > int parse_tc(struct __sk_buff *skb) > { > return parse_pkt(skb->data, skb->data_end); > } > > > IMO the goal should be to be able to do the equivalent for dynptrs, like: > > static inline int parse_pkt(struct bpf_dynptr *ptr) > { > __u64 *data; > > data = bpf_dynptr_data(ptr, 0, sizeof(*data)); > if (!data) > return 0; > /* parse data */ > } > > SEC("xdp") > int parse_xdp(struct xdp_md *ctx) > { > struct bpf_dynptr ptr; > > bpf_dynptr_from_xdp(ctx, 0, &ptr); > return parse_pkt(&ptr); > } > > SEC("tc") > int parse_tc(struct __sk_buff *skb) > { > struct bpf_dynptr ptr; > > bpf_dynptr_from_skb(skb, 0, &ptr); > return parse_pkt(&ptr); > } > To clarify, this is already possible when using data slices, since the behavior for data slices is equivalent between xdp and tc programs for non-fragmented accesses. From looking through the selftests, I anticipate that data slices will be the main way programs interact with dynptrs. 
For the cases where the program may write into frags, then bpf_dynptr_write will be needed (which is where functionality between xdp and tc start differing) - today, we're not able to write common code that writes into the frags since tc uses bpf_skb_store_bytes and xdp uses bpf_xdp_store_bytes. I'm more and more liking the idea of limiting xdp to match the constraints of skb given that both you, Kumar, and Jakub have mentioned that portability between xdp and skb would be useful for users :) What are your thoughts on this API: 1) bpf_dynptr_data() Before: for skb-type progs: - data slices in fragments is not supported for xdp-type progs: - data slices in fragments is supported as long as it is in a contiguous frag (eg not across frags) Now: for skb + xdp type progs: - data slices in fragments is not supported 2) bpf_dynptr_write() Before: for skb-type progs: - all data slices are invalidated after a write for xdp-type progs: - nothing Now: for skb + xdp type progs: - all data slices are invalidated after a write This will unite the functionality for skb and xdp programs across bpf_dyntpr_data, bpf_dynptr_write, and bpf_dynptr_read. As for whether we should unite bpf_dynptr_from_skb and bpf_dynptr_from_xdp into one common bpf_dynptr_from_packet as Jakub brought in [0], I'm leaning towards no because 1) if in the future there's some irreconcilable aspect between skb and xdp that gets added, that'll be hard to support since the expectation is that there is just one overall "packet dynptr" 2) the "packet dynptr" view is not completely accurate (eg bpf_dynptr_write accepts flags from skb progs and not xdp progs) 3) this adds some additional hardcoding in the verifier since there's no organic mapping between prog type -> prog ctx [0] https://lore.kernel.org/bpf/20220726184706.954822-1-joannelkoong@gmail.com/T/#m1438f89152b1d0e539fe60a9376482bbc9de7b6e > > If the dynptr-based parse_pkt() function has to take special care to > figure out where the dynptr comes from, it makes it a lot more difficult > to write reusable packet parsing functions. So I'd be in favour of > restricting the dynptr interface to the lowest common denominator of the > skb and xdp interfaces even if that makes things slightly more awkward > in the specialised cases... > > -Toke >
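Under the unified semantics proposed above, a common parsing helper can fall back to bpf_dynptr_read() whenever the slice is refused. A minimal sketch, assuming this series' helper signatures:

```c
#include <linux/bpf.h>
#include <linux/udp.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

/* works unchanged for skb- and xdp-backed dynptrs */
static __always_inline int parse_udp_dport(struct bpf_dynptr *ptr, __u32 off)
{
	struct udphdr *udp, buf;

	udp = bpf_dynptr_data(ptr, off, sizeof(*udp));
	if (!udp) {
		/* slice refused (paged area / crosses a frag): copy instead */
		if (bpf_dynptr_read(&buf, sizeof(buf), ptr, off, 0))
			return -1;
		udp = &buf;
	}
	return bpf_ntohs(udp->dest);
}
```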
On Wed, 24 Aug 2022 at 00:27, Joanne Koong <joannelkoong@gmail.com> wrote: > > On Mon, Aug 22, 2022 at 7:31 PM Kumar Kartikeya Dwivedi > <memxor@gmail.com> wrote: > > > [...] > > > if (func_id == BPF_FUNC_dynptr_data && > > > - dynptr_type == BPF_DYNPTR_TYPE_SKB) { > > > + (dynptr_type == BPF_DYNPTR_TYPE_SKB || > > > + dynptr_type == BPF_DYNPTR_TYPE_XDP)) { > > > regs[BPF_REG_0].type = PTR_TO_PACKET | ret_flag; > > > regs[BPF_REG_0].range = meta.mem_size; > > > > It doesn't seem like this is safe. Since PTR_TO_PACKET's range can be > > modified by comparisons with packet pointers loaded from the xdp/skb > > ctx, how do we distinguish e.g. between a pkt slice obtained from some > > frag in a multi-buff XDP vs pkt pointer from a linear area? > > > > Someone can compare data_meta from ctx with PTR_TO_PACKET from > > bpf_dynptr_data on xdp dynptr (which might be pointing to a xdp mb > > frag). While MAX_PACKET_OFF is 0xffff, it can still be used to do OOB > > access for the linear area. reg_is_init_pkt_pointer will return true > > as modified range is not considered for it. Same kind of issues when > > doing comparison with data_end from ctx (though maybe you won't be > > able to do incorrect data access at runtime using that). > > > > I had a pkt_uid field in my patch [0] which disallowed comparisons > > among bpf_packet_pointer slices. Each call assigned a fresh pkt_uid, > > and that disabled comparisons for them. reg->id is used for var_off > > range propagation so it cannot be reused. > > > > Coming back to this: What we really want here is a PTR_TO_MEM with a > > mem_size, so maybe you should go that route instead of PTR_TO_PACKET > > (and add a type tag to maybe pretty print it also as a packet pointer > > in verifier log), or add some way to distinguish slice vs non-slice > > pkt pointers like I did in my patch. You might also want to add some > > tests for this corner case (there are some later in [0] if you want to > > reuse them). > > > > So TBH, I kinda dislike my own solution in [0] :). The complexity does > > not seem worth it. The pkt_uid distinction is more useful (and > > actually would be needed) in Toke's xdp queueing series, where in a > > dequeue program you have multiple xdp_mds and want scoped slice > > invalidations (i.e. adjust_head on one xdp_md doesn't invalidate > > slices of some other xdp_md). Here we can just get away with normal > > PTR_TO_MEM. > > > > ... Or just let me know if you handle this correctly already, or if > > this won't be an actual problem :). > > Ooh interesting, I hadn't previously taken a look at > try_match_pkt_pointers(), thanks for mentioning it :) > > The cleanest solution to me is to add the flag "DYNPTR_TYPE_{SKB/XDP}" > to PTR_TO_PACKET and change reg_is_init_pkt_pointer() to return false > if the DYNPTR_TYPE_{SKB/XDP} flag is present. I prefer this over > returning PTR_TO_MEM because it seems more robust (eg if in the future > we reject x behavior on the packet data reg types, this will > automatically apply to the data slices), and because it'll keep the > logic more efficient/simpler for the case when the pkt pointer has to > be cleared after any helper that changes pkt data is called (aka the > case where the data slice gets invalidated). > > What are your thoughts? > Thinking more deeply about it, probably not, we need more work here. I remember _now_ why I chose the pkt_uid approach (and this tells us my commit log lacks all the details about the motivation :( ). 
Consider how equivalency checking for packet pointers works in regsafe. It is checking type, then if old range > cur range, then offs, etc. The problem is, while we now don't prune on access to ptr_to_pkt vs ptr_to_pkt | dynptr_pkt types in same reg (since type differs we return false), we still prune if old range of ptr_to_pkt | dynptr_pkt > cur range of ptr_to_pkt | dynptr_pkt. Both could be pointing into separate frags, so this assumption would be incorrect. I would be able to trick the verifier into accessing data beyond the length of a different frag, by first making sure one line of exploration is verified, and then changing the register in another branch reaching the same branch target. Helpers can take packet pointers so the access can become a pruning point. It would think the rest of the stuff is safe, while they are not equivalent at all. It is ok if they are bit by bit equivalent (same type, range, off, etc.). If you start returning false whenever you see this type tag set, it will become too conservative (it considers reg copies of the same dynptr_data lookup as distinct). So you need some kind of id assigned during dynptr_data lookup to distinguish them.
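For reference, the equivalence check being discussed, paraphrased from regsafe() in kernel/bpf/verifier.c of this era (comments condensed). Note that a cached (old) state prunes the current one only when the current range is at least as large:

```c
	case PTR_TO_PACKET_META:
	case PTR_TO_PACKET:
		if (rcur->type != rold->type)
			return false;
		/* We must have at least as much range as the old ptr
		 * did, so that any accesses which were safe before are
		 * still safe.
		 */
		if (rold->range > rcur->range)
			return false;
		/* If the offsets don't match, we can't trust the cached
		 * result
		 */
		if (rold->off != rcur->off)
			return false;
		/* id relations must be preserved */
		if (rold->id && !check_ids(rold->id, rcur->id, idmap))
			return false;
		/* new val must satisfy old val knowledge */
		return range_within(rold, rcur) &&
		       tnum_in(rold->var_off, rcur->var_off);
```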
On Wed, 24 Aug 2022 at 20:11, Joanne Koong <joannelkoong@gmail.com> wrote: > > On Wed, Aug 24, 2022 at 3:39 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > > > Joanne Koong <joannelkoong@gmail.com> writes: > > >> [...] > > >> It's a bit awkward to have such a difference between xdp and skb > > >> dynptr's read/write. I understand why it is the way it is, but it > > >> still doesn't feel right. I'm not sure if we can reconcile the > > >> differences, but it makes writing common code for both xdp and tc > > >> harder as it needs to be aware of the differences (and then the flags > > >> for dynptr_write would differ too). So we're 90% there but not the > > >> whole way... > > > > > > Yeah, it'd be great if the behavior for skb/xdp progs could be the > > > same, but I'm not seeing a better solution here (unless we invalidate > > > data slices on writes in xdp progs, just to make it match more :P). > > > > > > Regarding having 2 different interfaces bpf_dynptr_from_{skb/xdp}, I'm > > > not convinced this is much of a problem - xdp and skb programs already > > > have different interfaces for doing things (eg > > > bpf_{skb/xdp}_{store/load}_bytes). > > > > This is true, but it's quite possible to paper over these differences > > and write BPF code that works for both TC and XDP. Subtle semantic > > differences in otherwise identical functions makes this harder. > > > > Today you can write a function like: > > > > static inline int parse_pkt(void *data, void* data_end) > > { > > /* parse data */ > > } > > > > And call it like: > > > > SEC("xdp") > > int parse_xdp(struct xdp_md *ctx) > > { > > return parse_pkt(ctx->data, ctx->data_end); > > } > > > > SEC("tc") > > int parse_tc(struct __sk_buff *skb) > > { > > return parse_pkt(skb->data, skb->data_end); > > } > > > > > > IMO the goal should be to be able to do the equivalent for dynptrs, like: > > > > static inline int parse_pkt(struct bpf_dynptr *ptr) > > { > > __u64 *data; > > > > data = bpf_dynptr_data(ptr, 0, sizeof(*data)); > > if (!data) > > return 0; > > /* parse data */ > > } > > > > SEC("xdp") > > int parse_xdp(struct xdp_md *ctx) > > { > > struct bpf_dynptr ptr; > > > > bpf_dynptr_from_xdp(ctx, 0, &ptr); > > return parse_pkt(&ptr); > > } > > > > SEC("tc") > > int parse_tc(struct __sk_buff *skb) > > { > > struct bpf_dynptr ptr; > > > > bpf_dynptr_from_skb(skb, 0, &ptr); > > return parse_pkt(&ptr); > > } > > > > To clarify, this is already possible when using data slices, since the > behavior for data slices is equivalent between xdp and tc programs for > non-fragmented accesses. From looking through the selftests, I > anticipate that data slices will be the main way programs interact > with dynptrs. For the cases where the program may write into frags, > then bpf_dynptr_write will be needed (which is where functionality > between xdp and tc start differing) - today, we're not able to write > common code that writes into the frags since tc uses > bpf_skb_store_bytes and xdp uses bpf_xdp_store_bytes. Yes, we cannot write code that just swaps those two calls right now. The key difference is, the two above are separate functions already and have different semantics. Here in this set you have the same function, but with different semantics depending on ctx, which is the point of contention. 
> > I'm more and more liking the idea of limiting xdp to match the > constraints of skb given that both you, Kumar, and Jakub have > mentioned that portability between xdp and skb would be useful for > users :) > > What are your thoughts on this API: > > 1) bpf_dynptr_data() > > Before: > for skb-type progs: > - data slices in fragments is not supported > > for xdp-type progs: > - data slices in fragments is supported as long as it is in a > contiguous frag (eg not across frags) > > Now: > for skb + xdp type progs: > - data slices in fragments is not supported > > > 2) bpf_dynptr_write() > > Before: > for skb-type progs: > - all data slices are invalidated after a write > > for xdp-type progs: > - nothing > > Now: > for skb + xdp type progs: > - all data slices are invalidated after a write > There is also the other option: failing to write until you pull skb, which looks a lot better to me if we are adding this helper anyway... > This will unite the functionality for skb and xdp programs across > bpf_dyntpr_data, bpf_dynptr_write, and bpf_dynptr_read. As for whether > we should unite bpf_dynptr_from_skb and bpf_dynptr_from_xdp into one > common bpf_dynptr_from_packet as Jakub brought in [0], I'm leaning > towards no because 1) if in the future there's some irreconcilable > aspect between skb and xdp that gets added, that'll be hard to support > since the expectation is that there is just one overall "packet > dynptr" 2) the "packet dynptr" view is not completely accurate (eg > bpf_dynptr_write accepts flags from skb progs and not xdp progs) 3) > this adds some additional hardcoding in the verifier since there's no > organic mapping between prog type -> prog ctx > There is, e.g. see how btf_get_prog_ctx_type is doing it (unless I misunderstood what you meant). I also had a different tangential question (unrelated to 'reconciliation'). Let's forget about that for a moment. I was listening to the talk here [0]. At 12:00, I can hear someone mentioning that you'd have a dynptr for each segment/frag. Right now, you have a skb/xdp dynptr, which is representing discontiguous memory, i.e. you represent all linear page + fragments using one dynptr. This seems a bit inconsistent with the idea of a dynptr, since it's conceptually a slice of variable length memory. dynptr_data gives you a constant length slice into dynptr's variable length memory (which the verifier can track bounds of statically). So thinking about the idea in [0], one way would be to change from_xdp/from_skb helpers to take an index (into the frags array), and return dynptr for each frag. Or determine frag from offset + len. For the skb case it might require kmap_atomic to access the frag dynptr, which might need a paired kunmap_atomic call as well, and it may not return dynptr for frags in cloned skbs, but for XDP multi-buff all that doesn't count. Was this approach/line of thinking considered and/or rejected? What were the reasons? Just trying to understand the background here. What I'm wondering is that we already have helpers to do reads and writes that you are wrapping over in dynptr interfaces, but what we're missing is to be able to get direct access to frags (or 'next' ctx, essentially). When we know we can avoid pulls, it might be cheaper to then do access this way to skb, right? Especially for a case where just a part of the header lies in the next frag? But I'm no expert here, so I might be missing some subtleties. 
[0]: https://www.youtube.com/watch?v=EZ5PDrXvs7M > > [0] https://lore.kernel.org/bpf/20220726184706.954822-1-joannelkoong@gmail.com/T/#m1438f89152b1d0e539fe60a9376482bbc9de7b6e > > > > > If the dynptr-based parse_pkt() function has to take special care to > > figure out where the dynptr comes from, it makes it a lot more difficult > > to write reusable packet parsing functions. So I'd be in favour of > > restricting the dynptr interface to the lowest common denominator of the > > skb and xdp interfaces even if that makes things slightly more awkward > > in the specialised cases... > > > > -Toke > >
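The per-frag variant Kumar floats might look something like this; bpf_dynptr_from_xdp_frag() is invented here purely for illustration, nothing like it exists in this series:

```c
/* hypothetical helper, invented for illustration only:
 * one dynptr per frag, index 0 = linear area, 1..N = frags
 */
extern long bpf_dynptr_from_xdp_frag(struct xdp_md *xdp, __u32 frag_idx,
				     struct bpf_dynptr *ptr);

SEC("xdp")
int parse_first_frag(struct xdp_md *ctx)
{
	struct bpf_dynptr frag;
	__u8 *data;

	if (bpf_dynptr_from_xdp_frag(ctx, 1, &frag)) /* hypothetical */
		return XDP_DROP;
	data = bpf_dynptr_data(&frag, 0, 4);
	if (!data)
		return XDP_DROP;
	return data[0] ? XDP_PASS : XDP_DROP;
}
```

Note that in this shape the caller has to know which frag holds the bytes it wants, which is exactly the ergonomic cost Joanne raises in her reply below.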
On Wed, Aug 24, 2022 at 4:04 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote: > > On Wed, 24 Aug 2022 at 20:11, Joanne Koong <joannelkoong@gmail.com> wrote: > > > > On Wed, Aug 24, 2022 at 3:39 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > > > > > Joanne Koong <joannelkoong@gmail.com> writes: > > > >> [...] > > > >> It's a bit awkward to have such a difference between xdp and skb > > > >> dynptr's read/write. I understand why it is the way it is, but it > > > >> still doesn't feel right. I'm not sure if we can reconcile the > > > >> differences, but it makes writing common code for both xdp and tc > > > >> harder as it needs to be aware of the differences (and then the flags > > > >> for dynptr_write would differ too). So we're 90% there but not the > > > >> whole way... > > > > > > > > Yeah, it'd be great if the behavior for skb/xdp progs could be the > > > > same, but I'm not seeing a better solution here (unless we invalidate > > > > data slices on writes in xdp progs, just to make it match more :P). > > > > > > > > Regarding having 2 different interfaces bpf_dynptr_from_{skb/xdp}, I'm > > > > not convinced this is much of a problem - xdp and skb programs already > > > > have different interfaces for doing things (eg > > > > bpf_{skb/xdp}_{store/load}_bytes). > > > > > > This is true, but it's quite possible to paper over these differences > > > and write BPF code that works for both TC and XDP. Subtle semantic > > > differences in otherwise identical functions makes this harder. > > > > > > Today you can write a function like: > > > > > > static inline int parse_pkt(void *data, void* data_end) > > > { > > > /* parse data */ > > > } > > > > > > And call it like: > > > > > > SEC("xdp") > > > int parse_xdp(struct xdp_md *ctx) > > > { > > > return parse_pkt(ctx->data, ctx->data_end); > > > } > > > > > > SEC("tc") > > > int parse_tc(struct __sk_buff *skb) > > > { > > > return parse_pkt(skb->data, skb->data_end); > > > } > > > > > > > > > IMO the goal should be to be able to do the equivalent for dynptrs, like: > > > > > > static inline int parse_pkt(struct bpf_dynptr *ptr) > > > { > > > __u64 *data; > > > > > > data = bpf_dynptr_data(ptr, 0, sizeof(*data)); > > > if (!data) > > > return 0; > > > /* parse data */ > > > } > > > > > > SEC("xdp") > > > int parse_xdp(struct xdp_md *ctx) > > > { > > > struct bpf_dynptr ptr; > > > > > > bpf_dynptr_from_xdp(ctx, 0, &ptr); > > > return parse_pkt(&ptr); > > > } > > > > > > SEC("tc") > > > int parse_tc(struct __sk_buff *skb) > > > { > > > struct bpf_dynptr ptr; > > > > > > bpf_dynptr_from_skb(skb, 0, &ptr); > > > return parse_pkt(&ptr); > > > } > > > > > > > To clarify, this is already possible when using data slices, since the > > behavior for data slices is equivalent between xdp and tc programs for > > non-fragmented accesses. From looking through the selftests, I > > anticipate that data slices will be the main way programs interact > > with dynptrs. For the cases where the program may write into frags, > > then bpf_dynptr_write will be needed (which is where functionality > > between xdp and tc start differing) - today, we're not able to write > > common code that writes into the frags since tc uses > > bpf_skb_store_bytes and xdp uses bpf_xdp_store_bytes. > > Yes, we cannot write code that just swaps those two calls right now. > The key difference is, the two above are separate functions already > and have different semantics. 
Here in this set you have the same > function, but with different semantics depending on ctx, which is the > point of contention. > > > > > I'm more and more liking the idea of limiting xdp to match the > > constraints of skb given that both you, Kumar, and Jakub have > > mentioned that portability between xdp and skb would be useful for > > users :) > > > > What are your thoughts on this API: > > > > 1) bpf_dynptr_data() > > > > Before: > > for skb-type progs: > > - data slices in fragments is not supported > > > > for xdp-type progs: > > - data slices in fragments is supported as long as it is in a > > contiguous frag (eg not across frags) > > > > Now: > > for skb + xdp type progs: > > - data slices in fragments is not supported > > > > > > 2) bpf_dynptr_write() > > > > Before: > > for skb-type progs: > > - all data slices are invalidated after a write > > > > for xdp-type progs: > > - nothing > > > > Now: > > for skb + xdp type progs: > > - all data slices are invalidated after a write > > > > There is also the other option: failing to write until you pull skb, > which looks a lot better to me if we are adding this helper anyway... This was also previously discussed in [0]. After using skb dynptrs for the test_cls_redirect_dynptr.c selftest [1], I'm convinced that allowing bpf_dynptr_writes into frags is the correct way to go. There are instances where the data is probably in head with a slight off-chance that it's in a fragment - being able to call bpf_dynptr_write makes it super easy whereas the alternative is either 1) always pulling, which in most cases will be unnecessary or 2) adding special casing for the case where the bpf_dynptr_write fails. As well, failing to write until you pull the skb will hurt being able to write common code that both xdp/skb can use. [0] https://lore.kernel.org/bpf/20220726184706.954822-1-joannelkoong@gmail.com/T/#md6c17d9916f5937a9ae9dfca11e815e4b89009fb [1] https://lore.kernel.org/bpf/20220822235649.2218031-4-joannelkoong@gmail.com/ > > > This will unite the functionality for skb and xdp programs across > > bpf_dyntpr_data, bpf_dynptr_write, and bpf_dynptr_read. As for whether > > we should unite bpf_dynptr_from_skb and bpf_dynptr_from_xdp into one > > common bpf_dynptr_from_packet as Jakub brought in [0], I'm leaning > > towards no because 1) if in the future there's some irreconcilable > > aspect between skb and xdp that gets added, that'll be hard to support > > since the expectation is that there is just one overall "packet > > dynptr" 2) the "packet dynptr" view is not completely accurate (eg > > bpf_dynptr_write accepts flags from skb progs and not xdp progs) 3) > > this adds some additional hardcoding in the verifier since there's no > > organic mapping between prog type -> prog ctx > > > > There is, e.g. see how btf_get_prog_ctx_type is doing it (unless I > misunderstood what you meant). Cool, that should be pretty straightforwarwd then; from what it looks like, btf_get_prog_ctx_type() + 1 will give the btf id and then from that use btf_type_by_id() to get the corresponding struct btf_type, then use btf_type->name_off to check whether the name corresponds to "__sk_buff" or "xdp_md" > > I also had a different tangential question (unrelated to > 'reconciliation'). Let's forget about that for a moment. I was > listening to the talk here [0]. At 12:00, I can hear someone > mentioning that you'd have a dynptr for each segment/frag. > > Right now, you have a skb/xdp dynptr, which is representing > discontiguous memory, i.e. 
you represent all linear page + fragments > using one dynptr. This seems a bit inconsistent with the idea of a > dynptr, since it's conceptually a slice of variable length memory. > dynptr_data gives you a constant length slice into dynptr's variable > length memory (which the verifier can track bounds of statically). > > So thinking about the idea in [0], one way would be to change > from_xdp/from_skb helpers to take an index (into the frags array), and > return dynptr for each frag. Or determine frag from offset + len. For > the skb case it might require kmap_atomic to access the frag dynptr, > which might need a paired kunmap_atomic call as well, and it may not > return dynptr for frags in cloned skbs, but for XDP multi-buff all > that doesn't count. > > Was this approach/line of thinking considered and/or rejected? What > were the reasons? Just trying to understand the background here. > > What I'm wondering is that we already have helpers to do reads and > writes that you are wrapping over in dynptr interfaces, but what we're > missing is to be able to get direct access to frags (or 'next' ctx, > essentially). When we know we can avoid pulls, it might be cheaper to > then do access this way to skb, right? Especially for a case where > just a part of the header lies in the next frag? Imo, having a separate dynptr for each frag is confusing for users. My intuition is that in the majority of cases, the user doesn't care how it's laid out at the frag level, they just want to write some data at some offset and length. If every frag is a separate dynptr, then the user needs to be aware whether they're crossing frag (and thus dynptr) boundaries. In cases where the user doesn't know if the data is across head/frag or across frags, then their prog code will need to be more complex to handle these different scenarios. There's the additional issue as well that this doesn't work for cloned skb frags - now users need to know whether the skb is cloned or uncloned to know whether to pull first. Imo it makes the interface more confusing. I'm still not sure what is gained by having separate dynptrs for separate frags. If the goal is to avoid pulls, then we're able to do that without having separate dynptrs. If a write is to an uncloned frag, then bpf_dynptr_write() can kmap, write the data, and kunmap (though it's still unclear as to whether this should be done, since maybe it's faster to pull and do large numbers of writes instead of multiple kmap/kunmap calls). > > But I'm no expert here, so I might be missing some subtleties. > > [0]: https://www.youtube.com/watch?v=EZ5PDrXvs7M > > > > > > > > > > > > > [0] https://lore.kernel.org/bpf/20220726184706.954822-1-joannelkoong@gmail.com/T/#m1438f89152b1d0e539fe60a9376482bbc9de7b6e > > > > > > > > If the dynptr-based parse_pkt() function has to take special care to > > > figure out where the dynptr comes from, it makes it a lot more difficult > > > to write reusable packet parsing functions. So I'd be in favour of > > > restricting the dynptr interface to the lowest common denominator of the > > > skb and xdp interfaces even if that makes things slightly more awkward > > > in the specialised cases... > > > > > > -Toke > > >
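A small example of the pattern Joanne describes: with a write helper that spans head and frags, the caller need not know where the target byte lives (signature from this series; the offset bookkeeping is the caller's):

```c
#include <linux/bpf.h>
#include <linux/ip.h>
#include <stddef.h>
#include <bpf/bpf_helpers.h>

/* set the IPv4 TOS byte without caring whether it sits in the linear
 * area or a frag
 */
static __always_inline int set_tos(struct bpf_dynptr *ptr, __u32 ip_off, __u8 tos)
{
	return bpf_dynptr_write(ptr, ip_off + offsetof(struct iphdr, tos),
				&tos, sizeof(tos), 0);
}
```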
On Wed, Aug 24, 2022 at 2:11 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote: > > On Wed, 24 Aug 2022 at 00:27, Joanne Koong <joannelkoong@gmail.com> wrote: > > > > On Mon, Aug 22, 2022 at 7:31 PM Kumar Kartikeya Dwivedi > > <memxor@gmail.com> wrote: > > > > [...] > > > > if (func_id == BPF_FUNC_dynptr_data && > > > > - dynptr_type == BPF_DYNPTR_TYPE_SKB) { > > > > + (dynptr_type == BPF_DYNPTR_TYPE_SKB || > > > > + dynptr_type == BPF_DYNPTR_TYPE_XDP)) { > > > > regs[BPF_REG_0].type = PTR_TO_PACKET | ret_flag; > > > > regs[BPF_REG_0].range = meta.mem_size; > > > > > > It doesn't seem like this is safe. Since PTR_TO_PACKET's range can be > > > modified by comparisons with packet pointers loaded from the xdp/skb > > > ctx, how do we distinguish e.g. between a pkt slice obtained from some > > > frag in a multi-buff XDP vs pkt pointer from a linear area? > > > > > > Someone can compare data_meta from ctx with PTR_TO_PACKET from > > > bpf_dynptr_data on xdp dynptr (which might be pointing to a xdp mb > > > frag). While MAX_PACKET_OFF is 0xffff, it can still be used to do OOB > > > access for the linear area. reg_is_init_pkt_pointer will return true > > > as modified range is not considered for it. Same kind of issues when > > > doing comparison with data_end from ctx (though maybe you won't be > > > able to do incorrect data access at runtime using that). > > > > > > I had a pkt_uid field in my patch [0] which disallowed comparisons > > > among bpf_packet_pointer slices. Each call assigned a fresh pkt_uid, > > > and that disabled comparisons for them. reg->id is used for var_off > > > range propagation so it cannot be reused. > > > > > > Coming back to this: What we really want here is a PTR_TO_MEM with a > > > mem_size, so maybe you should go that route instead of PTR_TO_PACKET > > > (and add a type tag to maybe pretty print it also as a packet pointer > > > in verifier log), or add some way to distinguish slice vs non-slice > > > pkt pointers like I did in my patch. You might also want to add some > > > tests for this corner case (there are some later in [0] if you want to > > > reuse them). > > > > > > So TBH, I kinda dislike my own solution in [0] :). The complexity does > > > not seem worth it. The pkt_uid distinction is more useful (and > > > actually would be needed) in Toke's xdp queueing series, where in a > > > dequeue program you have multiple xdp_mds and want scoped slice > > > invalidations (i.e. adjust_head on one xdp_md doesn't invalidate > > > slices of some other xdp_md). Here we can just get away with normal > > > PTR_TO_MEM. > > > > > > ... Or just let me know if you handle this correctly already, or if > > > this won't be an actual problem :). > > > > Ooh interesting, I hadn't previously taken a look at > > try_match_pkt_pointers(), thanks for mentioning it :) > > > > The cleanest solution to me is to add the flag "DYNPTR_TYPE_{SKB/XDP}" > > to PTR_TO_PACKET and change reg_is_init_pkt_pointer() to return false > > if the DYNPTR_TYPE_{SKB/XDP} flag is present. I prefer this over > > returning PTR_TO_MEM because it seems more robust (eg if in the future > > we reject x behavior on the packet data reg types, this will > > automatically apply to the data slices), and because it'll keep the > > logic more efficient/simpler for the case when the pkt pointer has to > > be cleared after any helper that changes pkt data is called (aka the > > case where the data slice gets invalidated). > > > > What are your thoughts? 
> > > > Thinking more deeply about it, probably not, we need more work here. I > remember _now_ why I chose the pkt_uid approach (and this tells us my > commit log lacks all the details about the motivation :( ). > > Consider how equivalency checking for packet pointers works in > regsafe. It is checking type, then if old range > cur range, then > offs, etc. > > The problem is, while we now don't prune on access to ptr_to_pkt vs > ptr_to_pkt | dynptr_pkt types in same reg (since type differs we > return false), we still prune if old range of ptr_to_pkt | dynptr_pkt > > cur range of ptr_to_pkt | dynptr_pkt. Both could be pointing into > separate frags, so this assumption would be incorrect. I would be able > to trick the verifier into accessing data beyond the length of a > different frag, by first making sure one line of exploration is > verified, and then changing the register in another branch reaching > the same branch target. Helpers can take packet pointers so the access > can become a pruning point. It would think the rest of the stuff is > safe, while they are not equivalent at all. It is ok if they are bit > by bit equivalent (same type, range, off, etc.). Thanks for the explanation. To clarify, if old range of ptr_to_pkt > cur range of ptr_to_pkt, what gets pruned? Is it access to cur range of ptr_to_pkt since if old range > cur range, then if old range is acceptable cur range must definitely be acceptable? > > If you start returning false whenever you see this type tag set, it > will become too conservative (it considers reg copies of the same > dynptr_data lookup as distinct). So you need some kind of id assigned > during dynptr_data lookup to distinguish them. What about if the dynptr_pkt type tag is set, then we compare the ranges as well? If the ranges are the same, then we return true, else false. Does that work?
On Wed, Aug 24, 2022 at 4:04 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote: > > On Wed, 24 Aug 2022 at 20:11, Joanne Koong <joannelkoong@gmail.com> wrote: > > > > On Wed, Aug 24, 2022 at 3:39 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > > > > > Joanne Koong <joannelkoong@gmail.com> writes: > > > >> [...] > > > >> It's a bit awkward to have such a difference between xdp and skb > > > >> dynptr's read/write. I understand why it is the way it is, but it > > > >> still doesn't feel right. I'm not sure if we can reconcile the > > > >> differences, but it makes writing common code for both xdp and tc > > > >> harder as it needs to be aware of the differences (and then the flags > > > >> for dynptr_write would differ too). So we're 90% there but not the > > > >> whole way... > > > > > > > > Yeah, it'd be great if the behavior for skb/xdp progs could be the > > > > same, but I'm not seeing a better solution here (unless we invalidate > > > > data slices on writes in xdp progs, just to make it match more :P). > > > > > > > > Regarding having 2 different interfaces bpf_dynptr_from_{skb/xdp}, I'm > > > > not convinced this is much of a problem - xdp and skb programs already > > > > have different interfaces for doing things (eg > > > > bpf_{skb/xdp}_{store/load}_bytes). > > > > > > This is true, but it's quite possible to paper over these differences > > > and write BPF code that works for both TC and XDP. Subtle semantic > > > differences in otherwise identical functions makes this harder. > > > > > > Today you can write a function like: > > > > > > static inline int parse_pkt(void *data, void* data_end) > > > { > > > /* parse data */ > > > } > > > > > > And call it like: > > > > > > SEC("xdp") > > > int parse_xdp(struct xdp_md *ctx) > > > { > > > return parse_pkt(ctx->data, ctx->data_end); > > > } > > > > > > SEC("tc") > > > int parse_tc(struct __sk_buff *skb) > > > { > > > return parse_pkt(skb->data, skb->data_end); > > > } > > > > > > > > > IMO the goal should be to be able to do the equivalent for dynptrs, like: > > > > > > static inline int parse_pkt(struct bpf_dynptr *ptr) > > > { > > > __u64 *data; > > > > > > data = bpf_dynptr_data(ptr, 0, sizeof(*data)); > > > if (!data) > > > return 0; > > > /* parse data */ > > > } > > > > > > SEC("xdp") > > > int parse_xdp(struct xdp_md *ctx) > > > { > > > struct bpf_dynptr ptr; > > > > > > bpf_dynptr_from_xdp(ctx, 0, &ptr); > > > return parse_pkt(&ptr); > > > } > > > > > > SEC("tc") > > > int parse_tc(struct __sk_buff *skb) > > > { > > > struct bpf_dynptr ptr; > > > > > > bpf_dynptr_from_skb(skb, 0, &ptr); > > > return parse_pkt(&ptr); > > > } > > > > > > > To clarify, this is already possible when using data slices, since the > > behavior for data slices is equivalent between xdp and tc programs for > > non-fragmented accesses. From looking through the selftests, I > > anticipate that data slices will be the main way programs interact > > with dynptrs. For the cases where the program may write into frags, > > then bpf_dynptr_write will be needed (which is where functionality > > between xdp and tc start differing) - today, we're not able to write > > common code that writes into the frags since tc uses > > bpf_skb_store_bytes and xdp uses bpf_xdp_store_bytes. > > Yes, we cannot write code that just swaps those two calls right now. > The key difference is, the two above are separate functions already > and have different semantics. 
Here in this set you have the same > function, but with different semantics depending on ctx, which is the > point of contention. bpf_dynptr_{read,write}() are effectively virtual methods of dynptr and we have a few different implementations of dynptr, depending on what data they are wrapping, so having different semantics depending on ctx makes sense, if they are dictated by good reasons (like good performance for skb and xdp). > > > > > I'm more and more liking the idea of limiting xdp to match the > > constraints of skb given that both you, Kumar, and Jakub have > > mentioned that portability between xdp and skb would be useful for > > users :) > > > > What are your thoughts on this API: > > > > 1) bpf_dynptr_data() > > > > Before: > > for skb-type progs: > > - data slices in fragments is not supported > > > > for xdp-type progs: > > - data slices in fragments is supported as long as it is in a > > contiguous frag (eg not across frags) > > > > Now: > > for skb + xdp type progs: > > - data slices in fragments is not supported > > > > > > 2) bpf_dynptr_write() > > > > Before: > > for skb-type progs: > > - all data slices are invalidated after a write > > > > for xdp-type progs: > > - nothing > > > > Now: > > for skb + xdp type progs: > > - all data slices are invalidated after a write > > > > There is also the other option: failing to write until you pull skb, > which looks a lot better to me if we are adding this helper anyway... Wouldn't this kill performance for typical cases when writes don't go into frags? > > > This will unite the functionality for skb and xdp programs across > > bpf_dyntpr_data, bpf_dynptr_write, and bpf_dynptr_read. As for whether > > we should unite bpf_dynptr_from_skb and bpf_dynptr_from_xdp into one > > common bpf_dynptr_from_packet as Jakub brought in [0], I'm leaning > > towards no because 1) if in the future there's some irreconcilable > > aspect between skb and xdp that gets added, that'll be hard to support > > since the expectation is that there is just one overall "packet > > dynptr" 2) the "packet dynptr" view is not completely accurate (eg > > bpf_dynptr_write accepts flags from skb progs and not xdp progs) 3) > > this adds some additional hardcoding in the verifier since there's no > > organic mapping between prog type -> prog ctx > > > > There is, e.g. see how btf_get_prog_ctx_type is doing it (unless I > misunderstood what you meant). > > I also had a different tangential question (unrelated to > 'reconciliation'). Let's forget about that for a moment. I was > listening to the talk here [0]. At 12:00, I can hear someone > mentioning that you'd have a dynptr for each segment/frag. One of those people was me :) I think what you are referring to was an idea that for frags we might add a special iterator function that will call a callback for each linear frag. And in such a case the linear frag will be presented to the callback as a dynptr (just as an interface to a statically unknown memory region, it will be DYNPTR_LOCAL at that point, probably). It's different from DYNPTR_SKB that Joanne is adding here, though. > > Right now, you have a skb/xdp dynptr, which is representing > discontiguous memory, i.e. you represent all linear page + fragments > using one dynptr. This seems a bit inconsistent with the idea of a > dynptr, since it's conceptually a slice of variable length memory. > dynptr_data gives you a constant length slice into dynptr's variable > length memory (which the verifier can track bounds of statically). 
All the data in skb is conceptually a slice of variable length memory with no gaps in between. It's just physically discontiguous, but there is still linear offset addressing (conceptually). So I think it makes sense to represent skb data as a whole as one SKB dynptr. There are technical restrictions on direct memory access through bpf_dynptr_data(), inevitably. > > So thinking about the idea in [0], one way would be to change > from_xdp/from_skb helpers to take an index (into the frags array), and > return dynptr for each frag. Or determine frag from offset + len. For > the skb case it might require kmap_atomic to access the frag dynptr, > which might need a paired kunmap_atomic call as well, and it may not > return dynptr for frags in cloned skbs, but for XDP multi-buff all > that doesn't count. We could do something like bpf_dynptr_from_skb_frag(skb, N) and get access to frag #N (where we can special case 0 being linear skb data), but that would be DYNPTR_LOCAL at that point, probably? And it's a separate extension in addition to DYNPTR_SKB. Having generic bpf_dynptr_{read,write}() working over the entire skb range is very useful, and this per-frag dynptr would prevent that. So let's keep those two separate. > > Was this approach/line of thinking considered and/or rejected? What > were the reasons? Just trying to understand the background here. > > What I'm wondering is that we already have helpers to do reads and > writes that you are wrapping over in dynptr interfaces, but what we're > missing is to be able to get direct access to frags (or 'next' ctx, > essentially). When we know we can avoid pulls, it might be cheaper to > then do access this way to skb, right? Especially for a case where > just a part of the header lies in the next frag? > > But I'm no expert here, so I might be missing some subtleties. > > [0]: https://www.youtube.com/watch?v=EZ5PDrXvs7M > > > > > > > > > > > > > [0] https://lore.kernel.org/bpf/20220726184706.954822-1-joannelkoong@gmail.com/T/#m1438f89152b1d0e539fe60a9376482bbc9de7b6e > > > > > > > > If the dynptr-based parse_pkt() function has to take special care to > > > figure out where the dynptr comes from, it makes it a lot more difficult > > > to write reusable packet parsing functions. So I'd be in favour of > > > restricting the dynptr interface to the lowest common denominator of the > > > skb and xdp interfaces even if that makes things slightly more awkward > > > in the specialised cases... > > > > > > -Toke > > >
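As a sketch of the linear-addressing view described above (helper signatures from this series; the fixed offset is arbitrary), a read at a given packet offset looks the same whether the bytes sit in the linear area or in some frag:

/* The dynptr hides the physical layout and presents one linear range
 * of packet bytes; offset 3000 may be deep inside the frags.
 */
static __always_inline int read_field(struct bpf_dynptr *ptr, __u32 *val)
{
	return bpf_dynptr_read(val, sizeof(*val), ptr, 3000, 0);
}

A hypothetical per-frag bpf_dynptr_from_skb_frag(skb, N) would instead make the caller map a frag index and an intra-frag offset to this packet offset by hand.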
On Thu, 25 Aug 2022 at 22:39, Joanne Koong <joannelkoong@gmail.com> wrote: > > On Wed, Aug 24, 2022 at 2:11 PM Kumar Kartikeya Dwivedi > <memxor@gmail.com> wrote: > > > > On Wed, 24 Aug 2022 at 00:27, Joanne Koong <joannelkoong@gmail.com> wrote: > > > > > > On Mon, Aug 22, 2022 at 7:31 PM Kumar Kartikeya Dwivedi > > > <memxor@gmail.com> wrote: > > > > > [...] > > > > > if (func_id == BPF_FUNC_dynptr_data && > > > > > - dynptr_type == BPF_DYNPTR_TYPE_SKB) { > > > > > + (dynptr_type == BPF_DYNPTR_TYPE_SKB || > > > > > + dynptr_type == BPF_DYNPTR_TYPE_XDP)) { > > > > > regs[BPF_REG_0].type = PTR_TO_PACKET | ret_flag; > > > > > regs[BPF_REG_0].range = meta.mem_size; > > > > > > > > It doesn't seem like this is safe. Since PTR_TO_PACKET's range can be > > > > modified by comparisons with packet pointers loaded from the xdp/skb > > > > ctx, how do we distinguish e.g. between a pkt slice obtained from some > > > > frag in a multi-buff XDP vs pkt pointer from a linear area? > > > > > > > > Someone can compare data_meta from ctx with PTR_TO_PACKET from > > > > bpf_dynptr_data on xdp dynptr (which might be pointing to a xdp mb > > > > frag). While MAX_PACKET_OFF is 0xffff, it can still be used to do OOB > > > > access for the linear area. reg_is_init_pkt_pointer will return true > > > > as modified range is not considered for it. Same kind of issues when > > > > doing comparison with data_end from ctx (though maybe you won't be > > > > able to do incorrect data access at runtime using that). > > > > > > > > I had a pkt_uid field in my patch [0] which disallowed comparisons > > > > among bpf_packet_pointer slices. Each call assigned a fresh pkt_uid, > > > > and that disabled comparisons for them. reg->id is used for var_off > > > > range propagation so it cannot be reused. > > > > > > > > Coming back to this: What we really want here is a PTR_TO_MEM with a > > > > mem_size, so maybe you should go that route instead of PTR_TO_PACKET > > > > (and add a type tag to maybe pretty print it also as a packet pointer > > > > in verifier log), or add some way to distinguish slice vs non-slice > > > > pkt pointers like I did in my patch. You might also want to add some > > > > tests for this corner case (there are some later in [0] if you want to > > > > reuse them). > > > > > > > > So TBH, I kinda dislike my own solution in [0] :). The complexity does > > > > not seem worth it. The pkt_uid distinction is more useful (and > > > > actually would be needed) in Toke's xdp queueing series, where in a > > > > dequeue program you have multiple xdp_mds and want scoped slice > > > > invalidations (i.e. adjust_head on one xdp_md doesn't invalidate > > > > slices of some other xdp_md). Here we can just get away with normal > > > > PTR_TO_MEM. > > > > > > > > ... Or just let me know if you handle this correctly already, or if > > > > this won't be an actual problem :). > > > > > > Ooh interesting, I hadn't previously taken a look at > > > try_match_pkt_pointers(), thanks for mentioning it :) > > > > > > The cleanest solution to me is to add the flag "DYNPTR_TYPE_{SKB/XDP}" > > > to PTR_TO_PACKET and change reg_is_init_pkt_pointer() to return false > > > if the DYNPTR_TYPE_{SKB/XDP} flag is present. 
I prefer this over > > > returning PTR_TO_MEM because it seems more robust (eg if in the future > > > we reject x behavior on the packet data reg types, this will > > > automatically apply to the data slices), and because it'll keep the > > > logic more efficient/simpler for the case when the pkt pointer has to > > > be cleared after any helper that changes pkt data is called (aka the > > > case where the data slice gets invalidated). > > > > > > What are your thoughts? > > > > > > > Thinking more deeply about it, probably not, we need more work here. I > > remember _now_ why I chose the pkt_uid approach (and this tells us my > > commit log lacks all the details about the motivation :( ). > > > > Consider how equivalency checking for packet pointers works in > > regsafe. It is checking type, then if old range > cur range, then > > offs, etc. > > > > The problem is, while we now don't prune on access to ptr_to_pkt vs > > ptr_to_pkt | dynptr_pkt types in same reg (since type differs we > > return false), we still prune if old range of ptr_to_pkt | dynptr_pkt > > > cur range of ptr_to_pkt | dynptr_pkt. Both could be pointing into > > separate frags, so this assumption would be incorrect. I would be able > > to trick the verifier into accessing data beyond the length of a > > different frag, by first making sure one line of exploration is > > verified, and then changing the register in another branch reaching > > the same branch target. Helpers can take packet pointers so the access > > can become a pruning point. It would think the rest of the stuff is > > safe, while they are not equivalent at all. It is ok if they are bit > > by bit equivalent (same type, range, off, etc.). > > Thanks for the explanation. To clarify, if old range of ptr_to_pkt > > cur range of ptr_to_pkt, what gets pruned? Is it access to cur range > of ptr_to_pkt since if old range > cur range, then if old range is > acceptable cur range must definitely be acceptable? No, my description was bad :). We return false when old_range > cur_range, i.e. the path is considered safe and not explored again when old_range <= cur_range (pruning), otherwise we continue verifying. Consider if it was doing pkt[cur_range + 1] access in the path we are about to explore again (already verified for old_range). That is <= old_range, but > cur_range, so it would be problematic if we pruned search for old_range > cur_range. So maybe it won't be a problem here, and just the current range checks for pkt pointer slices is fine even if they belong to different frags. I didn't craft any test case when writing my previous reply. Especially since you will disable comparisons, one cannot relearn range again using var_off + comparison, which closes another loophole. It just seems simpler to me to be a bit more conservative, since it is only an optimization. There might be some corner case lurking we can't think of right now. But I leave the judgement up to you if you can reason about it. In either case it would be good to include some comments in the commit log about all this. Meanwhile, looking at the current code, I'm more inclined to suggest PTR_TO_MEM (and handle invalidation specially), but again, I will leave it up to you to decide. When we do += var_off to a pkt reg, its range is reset to zero, compared to PTR_TO_MEM, where off + var_off (smin/umax) is used to check against the actual size for an access, which is a bit more flexible. 
The reason to reset range is that it will be relearned using comparisons and transferred to copies (reg->id is assigned for each += var_off), which doesn't apply to slice pointers (essentially the only reason to keep them is being able to pick them for invalidation); we try to disable the rest of the pkt pointer magic in the verifier anyway. pkt_reg->umax_value influences the prog->aux->max_pkt_offset (and iiuc it can reach that point with range == 0 after += var_off, and zero_size_allowed == true). It only seems to be used by Netronome's eBPF offload for now, but it is still a bit confusing if slice pkt pointers cause this to change. > > > > > If you start returning false whenever you see this type tag set, it > > will become too conservative (it considers reg copies of the same > > dynptr_data lookup as distinct). So you need some kind of id assigned > > during dynptr_data lookup to distinguish them. > > What about if the dynptr_pkt type tag is set, then we compare the > ranges as well? If the ranges are the same, then we return true, else > false. Does that work? Seems like it, and the 'true' part is already covered by the memcmp at the start of the function, I think.
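For readers following the range discussion, here is a simplified paraphrase of the PTR_TO_PACKET handling in regsafe() (kernel/bpf/verifier.c); this is a sketch, not the verbatim kernel code, and the off/var_off/id checks are elided:

static bool pkt_ptr_regsafe(const struct bpf_reg_state *rold,
			    const struct bpf_reg_state *rcur)
{
	if (rold->type != rcur->type)
		return false;
	/* Accesses up to rold->range were proven safe in the old
	 * (already-verified) state, so pruning is only sound if the
	 * current pointer has at least that much range.
	 */
	if (rold->range > rcur->range)
		return false;
	/* ... off and var_off compatibility checks elided ... */
	return true;
}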
On Thu, Aug 25, 2022 at 01:04:16AM +0200, Kumar Kartikeya Dwivedi wrote: > On Wed, 24 Aug 2022 at 20:11, Joanne Koong <joannelkoong@gmail.com> wrote: > > I'm more and more liking the idea of limiting xdp to match the > > constraints of skb given that both you, Kumar, and Jakub have > > mentioned that portability between xdp and skb would be useful for > > users :) > > > > What are your thoughts on this API: > > > > 1) bpf_dynptr_data() > > > > Before: > > for skb-type progs: > > - data slices in fragments is not supported > > > > for xdp-type progs: > > - data slices in fragments is supported as long as it is in a > > contiguous frag (eg not across frags) > > > > Now: > > for skb + xdp type progs: > > - data slices in fragments is not supported I don't think it is necessary (or helpful) to restrict the xdp slice from getting a fragment. In any case, the xdp prog has to deal with the case that bpf_dynptr_data(xdp_dynptr, offset, len) is across two fragments. Although unlikely, it still needs to handle it (dynptr_data returns NULL) properly by using bpf_dynptr_read(). The same goes for the skb case: it also needs to handle dynptr_data returning NULL. Something like Kumar's sample in [0] should work for both xdp and skb dynptrs, with the helpers replaced by bpf_dynptr_{data,read,write}(). [0]: https://lore.kernel.org/bpf/20220726184706.954822-1-joannelkoong@gmail.com/T/#mf082a11403bc76fa56fde4de79a25c660680285c > > > > > > 2) bpf_dynptr_write() > > > > Before: > > for skb-type progs: > > - all data slices are invalidated after a write > > > > for xdp-type progs: > > - nothing > > > > Now: > > for skb + xdp type progs: > > - all data slices are invalidated after a write I wonder if the 'Before' behavior can be kept as is. The bpf prog that runs in both xdp and bpf should be the one always expecting the data-slice will be invalidated and it has to call the bpf_dynptr_data() again after writing. Yes, it is unnecessary for xdp but the bpf prog needs to do the same anyway if the verifier was the one enforcing it for both skb and xdp dynptr types. If the bpf prog is written for xdp alone, then it has no need to re-get the bpf_dynptr_data() after writing. > > There is also the other option: failing to write until you pull skb, > which looks a lot better to me if we are adding this helper anyway...
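The fallback pattern Martin describes might look like the following sketch (helper signatures from this series; the function name is illustrative and the parsing body is omitted):

static __always_inline int parse_eth(struct bpf_dynptr *ptr)
{
	struct ethhdr *eth, copy;

	eth = bpf_dynptr_data(ptr, 0, sizeof(*eth));
	if (!eth) {
		/* the header straddles a frag boundary (xdp) or sits in
		 * a paged area (skb): fall back to copying it out
		 */
		if (bpf_dynptr_read(&copy, sizeof(copy), ptr, 0, 0))
			return -1;
		eth = &copy;
	}
	/* parse eth->h_proto etc. */
	return 0;
}

The same function body works for a dynptr obtained from either bpf_dynptr_from_skb() or bpf_dynptr_from_xdp(), which is the portability being pointed at here.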
On Thu, Aug 25, 2022 at 11:37:08PM -0700, Martin KaFai Lau wrote: > On Thu, Aug 25, 2022 at 01:04:16AM +0200, Kumar Kartikeya Dwivedi wrote: > > On Wed, 24 Aug 2022 at 20:11, Joanne Koong <joannelkoong@gmail.com> wrote: > > > I'm more and more liking the idea of limiting xdp to match the > > > constraints of skb given that both you, Kumar, and Jakub have > > > mentioned that portability between xdp and skb would be useful for > > > users :) > > > > > > What are your thoughts on this API: > > > > > > 1) bpf_dynptr_data() > > > > > > Before: > > > for skb-type progs: > > > - data slices in fragments is not supported > > > > > > for xdp-type progs: > > > - data slices in fragments is supported as long as it is in a > > > contiguous frag (eg not across frags) > > > > > > Now: > > > for skb + xdp type progs: > > > - data slices in fragments is not supported > I don't think it is necessary (or help) to restrict xdp slice from getting > a fragment. In any case, the xdp prog has to deal with the case > that bpf_dynptr_data(xdp_dynptr, offset, len) is across two fragments. > Although unlikely, it still need to handle it (dynptr_data returns NULL) > properly by using bpf_dynptr_read(). The same that the skb case > also needs to handle dynptr_data returning NULL. > > Something like Kumar's sample in [0] should work for both > xdp and skb dynptr but replace the helpers with > bpf_dynptr_{data,read,write}(). > > [0]: https://lore.kernel.org/bpf/20220726184706.954822-1-joannelkoong@gmail.com/T/#mf082a11403bc76fa56fde4de79a25c660680285c > > > > > > > > > > 2) bpf_dynptr_write() > > > > > > Before: > > > for skb-type progs: > > > - all data slices are invalidated after a write > > > > > > for xdp-type progs: > > > - nothing > > > > > > Now: > > > for skb + xdp type progs: > > > - all data slices are invalidated after a write > I wonder if the 'Before' behavior can be kept as is. > > The bpf prog that runs in both xdp and bpf should be typo: both xdp and *skb > the one always expecting the data-slice will be invalidated and > it has to call the bpf_dynptr_data() again after writing. > Yes, it is unnecessary for xdp but the bpf prog needs to the > same anyway if the verifier was the one enforcing it for > both skb and xdp dynptr type. > > If the bpf prog is written for xdp alone, then it has > no need to re-get the bpf_dynptr_data() after writing. > > > > > > > > There is also the other option: failing to write until you pull skb, > > which looks a lot better to me if we are adding this helper anyway...
On Thu, Aug 25, 2022 at 4:19 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote: > > On Thu, 25 Aug 2022 at 22:39, Joanne Koong <joannelkoong@gmail.com> wrote: > > > > On Wed, Aug 24, 2022 at 2:11 PM Kumar Kartikeya Dwivedi > > <memxor@gmail.com> wrote: > > > > > > On Wed, 24 Aug 2022 at 00:27, Joanne Koong <joannelkoong@gmail.com> wrote: > > > > > > > > On Mon, Aug 22, 2022 at 7:31 PM Kumar Kartikeya Dwivedi > > > > <memxor@gmail.com> wrote: > > > > > > [...] > > > > > > if (func_id == BPF_FUNC_dynptr_data && > > > > > > - dynptr_type == BPF_DYNPTR_TYPE_SKB) { > > > > > > + (dynptr_type == BPF_DYNPTR_TYPE_SKB || > > > > > > + dynptr_type == BPF_DYNPTR_TYPE_XDP)) { > > > > > > regs[BPF_REG_0].type = PTR_TO_PACKET | ret_flag; > > > > > > regs[BPF_REG_0].range = meta.mem_size; > > > > > > > > > > It doesn't seem like this is safe. Since PTR_TO_PACKET's range can be > > > > > modified by comparisons with packet pointers loaded from the xdp/skb > > > > > ctx, how do we distinguish e.g. between a pkt slice obtained from some > > > > > frag in a multi-buff XDP vs pkt pointer from a linear area? > > > > > > > > > > Someone can compare data_meta from ctx with PTR_TO_PACKET from > > > > > bpf_dynptr_data on xdp dynptr (which might be pointing to a xdp mb > > > > > frag). While MAX_PACKET_OFF is 0xffff, it can still be used to do OOB > > > > > access for the linear area. reg_is_init_pkt_pointer will return true > > > > > as modified range is not considered for it. Same kind of issues when > > > > > doing comparison with data_end from ctx (though maybe you won't be > > > > > able to do incorrect data access at runtime using that). > > > > > > > > > > I had a pkt_uid field in my patch [0] which disallowed comparisons > > > > > among bpf_packet_pointer slices. Each call assigned a fresh pkt_uid, > > > > > and that disabled comparisons for them. reg->id is used for var_off > > > > > range propagation so it cannot be reused. > > > > > > > > > > Coming back to this: What we really want here is a PTR_TO_MEM with a > > > > > mem_size, so maybe you should go that route instead of PTR_TO_PACKET > > > > > (and add a type tag to maybe pretty print it also as a packet pointer > > > > > in verifier log), or add some way to distinguish slice vs non-slice > > > > > pkt pointers like I did in my patch. You might also want to add some > > > > > tests for this corner case (there are some later in [0] if you want to > > > > > reuse them). > > > > > > > > > > So TBH, I kinda dislike my own solution in [0] :). The complexity does > > > > > not seem worth it. The pkt_uid distinction is more useful (and > > > > > actually would be needed) in Toke's xdp queueing series, where in a > > > > > dequeue program you have multiple xdp_mds and want scoped slice > > > > > invalidations (i.e. adjust_head on one xdp_md doesn't invalidate > > > > > slices of some other xdp_md). Here we can just get away with normal > > > > > PTR_TO_MEM. > > > > > > > > > > ... Or just let me know if you handle this correctly already, or if > > > > > this won't be an actual problem :). > > > > > > > > Ooh interesting, I hadn't previously taken a look at > > > > try_match_pkt_pointers(), thanks for mentioning it :) > > > > > > > > The cleanest solution to me is to add the flag "DYNPTR_TYPE_{SKB/XDP}" > > > > to PTR_TO_PACKET and change reg_is_init_pkt_pointer() to return false > > > > if the DYNPTR_TYPE_{SKB/XDP} flag is present. 
I prefer this over > > > > returning PTR_TO_MEM because it seems more robust (eg if in the future > > > > we reject x behavior on the packet data reg types, this will > > > > automatically apply to the data slices), and because it'll keep the > > > > logic more efficient/simpler for the case when the pkt pointer has to > > > > be cleared after any helper that changes pkt data is called (aka the > > > > case where the data slice gets invalidated). > > > > > > > > What are your thoughts? > > > > > > > > > > Thinking more deeply about it, probably not, we need more work here. I > > > remember _now_ why I chose the pkt_uid approach (and this tells us my > > > commit log lacks all the details about the motivation :( ). > > > > > > Consider how equivalency checking for packet pointers works in > > > regsafe. It is checking type, then if old range > cur range, then > > > offs, etc. > > > > > > The problem is, while we now don't prune on access to ptr_to_pkt vs > > > ptr_to_pkt | dynptr_pkt types in same reg (since type differs we > > > return false), we still prune if old range of ptr_to_pkt | dynptr_pkt > > > > cur range of ptr_to_pkt | dynptr_pkt. Both could be pointing into > > > separate frags, so this assumption would be incorrect. I would be able > > > to trick the verifier into accessing data beyond the length of a > > > different frag, by first making sure one line of exploration is > > > verified, and then changing the register in another branch reaching > > > the same branch target. Helpers can take packet pointers so the access > > > can become a pruning point. It would think the rest of the stuff is > > > safe, while they are not equivalent at all. It is ok if they are bit > > > by bit equivalent (same type, range, off, etc.). > > > > Thanks for the explanation. To clarify, if old range of ptr_to_pkt > > > cur range of ptr_to_pkt, what gets pruned? Is it access to cur range > > of ptr_to_pkt since if old range > cur range, then if old range is > > acceptable cur range must definitely be acceptable? > > No, my description was bad :). > We return false when old_range > cur_range, i.e. the path is > considered safe and not explored again when old_range <= cur_range > (pruning), otherwise we continue verifying. > Consider if it was doing pkt[cur_range + 1] access in the path we are > about to explore again (already verified for old_range). That is <= > old_range, but > cur_range, so it would be problematic if we pruned > search for old_range > cur_range. Does "old_range" here refer to the range that was already previously verified as safe by the verifier? And "cur_range" is the new range that we are trying to figure out is safe or not? When you say "we return false when old_range > cur_range", what function are we returning false from? > > So maybe it won't be a problem here, and just the current range checks > for pkt pointer slices is fine even if they belong to different frags. > I didn't craft any test case when writing my previous reply. > Especially since you will disable comparisons, one cannot relearn > range again using var_off + comparison, which closes another loophole. > > It just seems simpler to me to be a bit more conservative, since it is > only an optimization. There might be some corner case lurking we can't > think of right now. But I leave the judgement up to you if you can > reason about it. In either case it would be good to include some > comments in the commit log about all this. 
> > Meanwhile, looking at the current code, I'm more inclined to suggest > PTR_TO_MEM (and handle invalidation specially), but again, I will > leave it up to you to decide. > > When we do += var_off to a pkt reg, its range is reset to zero, > compared to PTR_TO_MEM, where off + var_off (smin/umax) is used to > check against the actual size for an access, which is a bit more > flexible. The reason to reset range is that it will be relearned using > comparisons and transferred to copies (reg->id is assigned for each += Can you direct me to the function where this relearning happens? thanks! > var_off), which doesn't apply to slice pointers (essentially the only > reason to keep them is being able to pick them for invalidation), we > try to disable the rest of the pkt pointer magic in the verifier, > anyway. > > pkt_reg->umax_value influences the prog->aux->max_pkt_offset (and iiuc > it can reach that point with range == 0 after += var_off, and > zero_size_allowed == true), only seems to be used by netronome's ebpf > offload for now, but still a bit confusing if slice pkt pointers cause > this to change. My major takeaway from this discussion is that there's a lot of extra subtleties when reg is PTR_TO_PACKET :) I'm going to delve deeper into the source code, but from a glance, I think you're right that just assigning PTR_TO_MEM for the data slice will probably make things a lot more straightforward. thanks for the discussion! > > > > > > > > > If you start returning false whenever you see this type tag set, it > > > will become too conservative (it considers reg copies of the same > > > dynptr_data lookup as distinct). So you need some kind of id assigned > > > during dynptr_data lookup to distinguish them. > > > > What about if the dynptr_pkt type tag is set, then we compare the > > ranges as well? If the ranges are the same, then we return true, else > > false. Does that work? > > Seems like it, and true part is already covered by memcmp at the start > of the function, I think.
On Fri, 26 Aug 2022 at 20:23, Joanne Koong <joannelkoong@gmail.com> wrote: > > On Thu, Aug 25, 2022 at 4:19 PM Kumar Kartikeya Dwivedi > <memxor@gmail.com> wrote: > > > > On Thu, 25 Aug 2022 at 22:39, Joanne Koong <joannelkoong@gmail.com> wrote: > > > > > > On Wed, Aug 24, 2022 at 2:11 PM Kumar Kartikeya Dwivedi > > > <memxor@gmail.com> wrote: > > > > > > > > On Wed, 24 Aug 2022 at 00:27, Joanne Koong <joannelkoong@gmail.com> wrote: > > > > > > > > > > On Mon, Aug 22, 2022 at 7:31 PM Kumar Kartikeya Dwivedi > > > > > <memxor@gmail.com> wrote: > > > > > > > [...] > > > > > > > if (func_id == BPF_FUNC_dynptr_data && > > > > > > > - dynptr_type == BPF_DYNPTR_TYPE_SKB) { > > > > > > > + (dynptr_type == BPF_DYNPTR_TYPE_SKB || > > > > > > > + dynptr_type == BPF_DYNPTR_TYPE_XDP)) { > > > > > > > regs[BPF_REG_0].type = PTR_TO_PACKET | ret_flag; > > > > > > > regs[BPF_REG_0].range = meta.mem_size; > > > > > > > > > > > > It doesn't seem like this is safe. Since PTR_TO_PACKET's range can be > > > > > > modified by comparisons with packet pointers loaded from the xdp/skb > > > > > > ctx, how do we distinguish e.g. between a pkt slice obtained from some > > > > > > frag in a multi-buff XDP vs pkt pointer from a linear area? > > > > > > > > > > > > Someone can compare data_meta from ctx with PTR_TO_PACKET from > > > > > > bpf_dynptr_data on xdp dynptr (which might be pointing to a xdp mb > > > > > > frag). While MAX_PACKET_OFF is 0xffff, it can still be used to do OOB > > > > > > access for the linear area. reg_is_init_pkt_pointer will return true > > > > > > as modified range is not considered for it. Same kind of issues when > > > > > > doing comparison with data_end from ctx (though maybe you won't be > > > > > > able to do incorrect data access at runtime using that). > > > > > > > > > > > > I had a pkt_uid field in my patch [0] which disallowed comparisons > > > > > > among bpf_packet_pointer slices. Each call assigned a fresh pkt_uid, > > > > > > and that disabled comparisons for them. reg->id is used for var_off > > > > > > range propagation so it cannot be reused. > > > > > > > > > > > > Coming back to this: What we really want here is a PTR_TO_MEM with a > > > > > > mem_size, so maybe you should go that route instead of PTR_TO_PACKET > > > > > > (and add a type tag to maybe pretty print it also as a packet pointer > > > > > > in verifier log), or add some way to distinguish slice vs non-slice > > > > > > pkt pointers like I did in my patch. You might also want to add some > > > > > > tests for this corner case (there are some later in [0] if you want to > > > > > > reuse them). > > > > > > > > > > > > So TBH, I kinda dislike my own solution in [0] :). The complexity does > > > > > > not seem worth it. The pkt_uid distinction is more useful (and > > > > > > actually would be needed) in Toke's xdp queueing series, where in a > > > > > > dequeue program you have multiple xdp_mds and want scoped slice > > > > > > invalidations (i.e. adjust_head on one xdp_md doesn't invalidate > > > > > > slices of some other xdp_md). Here we can just get away with normal > > > > > > PTR_TO_MEM. > > > > > > > > > > > > ... Or just let me know if you handle this correctly already, or if > > > > > > this won't be an actual problem :). 
> > > > > > > > > > Ooh interesting, I hadn't previously taken a look at > > > > > try_match_pkt_pointers(), thanks for mentioning it :) > > > > > > > > > > The cleanest solution to me is to add the flag "DYNPTR_TYPE_{SKB/XDP}" > > > > > to PTR_TO_PACKET and change reg_is_init_pkt_pointer() to return false > > > > > if the DYNPTR_TYPE_{SKB/XDP} flag is present. I prefer this over > > > > > returning PTR_TO_MEM because it seems more robust (eg if in the future > > > > > we reject x behavior on the packet data reg types, this will > > > > > automatically apply to the data slices), and because it'll keep the > > > > > logic more efficient/simpler for the case when the pkt pointer has to > > > > > be cleared after any helper that changes pkt data is called (aka the > > > > > case where the data slice gets invalidated). > > > > > > > > > > What are your thoughts? > > > > > > > > > > > > > Thinking more deeply about it, probably not, we need more work here. I > > > > remember _now_ why I chose the pkt_uid approach (and this tells us my > > > > commit log lacks all the details about the motivation :( ). > > > > > > > > Consider how equivalency checking for packet pointers works in > > > > regsafe. It is checking type, then if old range > cur range, then > > > > offs, etc. > > > > > > > > The problem is, while we now don't prune on access to ptr_to_pkt vs > > > > ptr_to_pkt | dynptr_pkt types in same reg (since type differs we > > > > return false), we still prune if old range of ptr_to_pkt | dynptr_pkt > > > > > cur range of ptr_to_pkt | dynptr_pkt. Both could be pointing into > > > > separate frags, so this assumption would be incorrect. I would be able > > > > to trick the verifier into accessing data beyond the length of a > > > > different frag, by first making sure one line of exploration is > > > > verified, and then changing the register in another branch reaching > > > > the same branch target. Helpers can take packet pointers so the access > > > > can become a pruning point. It would think the rest of the stuff is > > > > safe, while they are not equivalent at all. It is ok if they are bit > > > > by bit equivalent (same type, range, off, etc.). > > > > > > Thanks for the explanation. To clarify, if old range of ptr_to_pkt > > > > cur range of ptr_to_pkt, what gets pruned? Is it access to cur range > > > of ptr_to_pkt since if old range > cur range, then if old range is > > > acceptable cur range must definitely be acceptable? > > > > No, my description was bad :). > > We return false when old_range > cur_range, i.e. the path is > > considered safe and not explored again when old_range <= cur_range > > (pruning), otherwise we continue verifying. > > Consider if it was doing pkt[cur_range + 1] access in the path we are > > about to explore again (already verified for old_range). That is <= > > old_range, but > cur_range, so it would be problematic if we pruned > > search for old_range > cur_range. > > Does "old_range" here refer to the range that was already previously > verified as safe by the verifier? And "cur_range" is the new range > that we are trying to figure out is safe or not? Yes, it's the range of the same reg in the already safe verified state from the point we are about to explore again (and considering 'pruning' the search if the current state would satisfy the safety properties as well). > > When you say "we return false when old_range > cur_range", what > function are we returning false from? > We return false from the regsafe function, i.e. 
it isn't safe if the old range was greater than the current range, otherwise we match other stuff before we return true (like off being equal, and var_off of old being within cur's). > > > > So maybe it won't be a problem here, and just the current range checks > > for pkt pointer slices is fine even if they belong to different frags. > > I didn't craft any test case when writing my previous reply. > > Especially since you will disable comparisons, one cannot relearn > > range again using var_off + comparison, which closes another loophole. > > > > It just seems simpler to me to be a bit more conservative, since it is > > only an optimization. There might be some corner case lurking we can't > > think of right now. But I leave the judgement up to you if you can > > reason about it. In either case it would be good to include some > > comments in the commit log about all this. > > > > Meanwhile, looking at the current code, I'm more inclined to suggest > > PTR_TO_MEM (and handle invalidation specially), but again, I will > > leave it up to you to decide. > > > > When we do += var_off to a pkt reg, its range is reset to zero, > > compared to PTR_TO_MEM, where off + var_off (smin/umax) is used to > > check against the actual size for an access, which is a bit more > > flexible. The reason to reset range is that it will be relearned using > > comparisons and transferred to copies (reg->id is assigned for each += > > Can you direct me to the function where this relearning happens? thanks! > See the code around the comment: /* something was added to pkt_ptr, set range to zero */ where it memsets the range to 0, then in adjust_ptr_min_max_vals, and then you see how find_good_pkt_pointers also takes into account reg->umax_value + off before setting reg->range after your next comparison (because at runtime the variable offset is already added to the pointer). > > var_off), which doesn't apply to slice pointers (essentially the only > > reason to keep them is being able to pick them for invalidation), we > > try to disable the rest of the pkt pointer magic in the verifier, > > anyway. > > > > pkt_reg->umax_value influences the prog->aux->max_pkt_offset (and iiuc > > it can reach that point with range == 0 after += var_off, and > > zero_size_allowed == true), only seems to be used by netronome's ebpf > > offload for now, but still a bit confusing if slice pkt pointers cause > > this to change. > > My major takeaway from this discussion is that there's a lot of extra > subtleties when reg is PTR_TO_PACKET :) I'm going to delve deeper into > the source code, but from a glance, I think you're right that just > assigning PTR_TO_MEM for the data slice will probably make things a > lot more straightforward. thanks for the discussion! Exactly. I think this is an internal verifier detail so we can definitely go back and change this later, but IMO we'd avoid a lot of headache if we just choose PTR_TO_MEM, since PTR_TO_PACKET has a lot of special semantics. > > > > > > > > > > > > > > If you start returning false whenever you see this type tag set, it > > > > will become too conservative (it considers reg copies of the same > > > > dynptr_data lookup as distinct). So you need some kind of id assigned > > > > during dynptr_data lookup to distinguish them. > > > > > > What about if the dynptr_pkt type tag is set, then we compare the > > > ranges as well? If the ranges are the same, then we return true, else > > > false. Does that work? 
> > > > Seems like it, and true part is already covered by memcmp at the start > > of the function, I think.
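As a concrete illustration of the reset-then-relearn behavior Kumar points to (this is standard pkt-pointer verification, not specific to this series; get_bounded_offset() is hypothetical and would have to be provably bounded for the verifier):

SEC("xdp")
int var_off_example(struct xdp_md *ctx)
{
	void *data = (void *)(long)ctx->data;
	void *data_end = (void *)(long)ctx->data_end;
	__u64 off = get_bounded_offset();

	data += off;			/* pkt reg: range reset to 0 */
	if (data + 4 > data_end)	/* find_good_pkt_pointers() */
		return XDP_DROP;	/* relearns range on fall-through */
	return *(__u32 *)data ? XDP_PASS : XDP_DROP;
}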
On Fri, 26 Aug 2022 at 08:37, Martin KaFai Lau <kafai@fb.com> wrote: > > On Thu, Aug 25, 2022 at 01:04:16AM +0200, Kumar Kartikeya Dwivedi wrote: > > On Wed, 24 Aug 2022 at 20:11, Joanne Koong <joannelkoong@gmail.com> wrote: > > > I'm more and more liking the idea of limiting xdp to match the > > > constraints of skb given that both you, Kumar, and Jakub have > > > mentioned that portability between xdp and skb would be useful for > > > users :) > > > > > > What are your thoughts on this API: > > > > > > 1) bpf_dynptr_data() > > > > > > Before: > > > for skb-type progs: > > > - data slices in fragments is not supported > > > > > > for xdp-type progs: > > > - data slices in fragments is supported as long as it is in a > > > contiguous frag (eg not across frags) > > > > > > Now: > > > for skb + xdp type progs: > > > - data slices in fragments is not supported > I don't think it is necessary (or help) to restrict xdp slice from getting > a fragment. In any case, the xdp prog has to deal with the case > that bpf_dynptr_data(xdp_dynptr, offset, len) is across two fragments. > Although unlikely, it still need to handle it (dynptr_data returns NULL) > properly by using bpf_dynptr_read(). The same that the skb case > also needs to handle dynptr_data returning NULL. > > Something like Kumar's sample in [0] should work for both > xdp and skb dynptr but replace the helpers with > bpf_dynptr_{data,read,write}(). > > [0]: https://lore.kernel.org/bpf/20220726184706.954822-1-joannelkoong@gmail.com/T/#mf082a11403bc76fa56fde4de79a25c660680285c > > > > > > > > > > 2) bpf_dynptr_write() > > > > > > Before: > > > for skb-type progs: > > > - all data slices are invalidated after a write > > > > > > for xdp-type progs: > > > - nothing > > > > > > Now: > > > for skb + xdp type progs: > > > - all data slices are invalidated after a write > I wonder if the 'Before' behavior can be kept as is. > > The bpf prog that runs in both xdp and bpf should be > the one always expecting the data-slice will be invalidated and > it has to call the bpf_dynptr_data() again after writing. > Yes, it is unnecessary for xdp but the bpf prog needs to the > same anyway if the verifier was the one enforcing it for > both skb and xdp dynptr type. > > If the bpf prog is written for xdp alone, then it has > no need to re-get the bpf_dynptr_data() after writing. > Yeah, compared to the alternative, I guess it's better how it is right now. It doesn't seem possible to meaningfully hide the differences without penalizing either case. It also wouldn't be good to make it less useful for XDP, since the whole point of this and the previous effort was exposing bpf_xdp_pointer to the user.
On Fri, Aug 26, 2022 at 12:09 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote: > > On Fri, 26 Aug 2022 at 08:37, Martin KaFai Lau <kafai@fb.com> wrote: > > > > On Thu, Aug 25, 2022 at 01:04:16AM +0200, Kumar Kartikeya Dwivedi wrote: > > > On Wed, 24 Aug 2022 at 20:11, Joanne Koong <joannelkoong@gmail.com> wrote: > > > > I'm more and more liking the idea of limiting xdp to match the > > > > constraints of skb given that both you, Kumar, and Jakub have > > > > mentioned that portability between xdp and skb would be useful for > > > > users :) > > > > > > > > What are your thoughts on this API: > > > > > > > > 1) bpf_dynptr_data() > > > > > > > > Before: > > > > for skb-type progs: > > > > - data slices in fragments is not supported > > > > > > > > for xdp-type progs: > > > > - data slices in fragments is supported as long as it is in a > > > > contiguous frag (eg not across frags) > > > > > > > > Now: > > > > for skb + xdp type progs: > > > > - data slices in fragments is not supported > > I don't think it is necessary (or help) to restrict xdp slice from getting > > a fragment. In any case, the xdp prog has to deal with the case > > that bpf_dynptr_data(xdp_dynptr, offset, len) is across two fragments. > > Although unlikely, it still need to handle it (dynptr_data returns NULL) > > properly by using bpf_dynptr_read(). The same that the skb case > > also needs to handle dynptr_data returning NULL. > > > > Something like Kumar's sample in [0] should work for both > > xdp and skb dynptr but replace the helpers with > > bpf_dynptr_{data,read,write}(). > > > > [0]: https://lore.kernel.org/bpf/20220726184706.954822-1-joannelkoong@gmail.com/T/#mf082a11403bc76fa56fde4de79a25c660680285c > > > > > > > > > > > > > > 2) bpf_dynptr_write() > > > > > > > > Before: > > > > for skb-type progs: > > > > - all data slices are invalidated after a write > > > > > > > > for xdp-type progs: > > > > - nothing > > > > > > > > Now: > > > > for skb + xdp type progs: > > > > - all data slices are invalidated after a write > > I wonder if the 'Before' behavior can be kept as is. > > > > The bpf prog that runs in both xdp and bpf should be > > the one always expecting the data-slice will be invalidated and > > it has to call the bpf_dynptr_data() again after writing. > > Yes, it is unnecessary for xdp but the bpf prog needs to the > > same anyway if the verifier was the one enforcing it for > > both skb and xdp dynptr type. > > > > If the bpf prog is written for xdp alone, then it has > > no need to re-get the bpf_dynptr_data() after writing. > > > > Yeah, compared to the alternative, I guess it's better how it is right > now. It doesn't seem possible to meaningfully hide the differences > without penalizing either case. It also wouldn't be good to make it > less useful for XDP, since the whole point of this and the previous > effort was exposing bpf_xdp_pointer to the user. I'll keep it as is for v5.
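Given that the write semantics stay as they are for v5, code meant to run on both skb and xdp dynptrs would simply assume the stricter (skb) behavior and re-acquire slices after a write. A minimal sketch under that assumption:

static __always_inline int write_then_read(struct bpf_dynptr *ptr)
{
	__u8 val = 1, *p;

	if (bpf_dynptr_write(ptr, 0, &val, sizeof(val), 0))
		return -1;
	/* for skb dynptrs the write invalidated all slices, so take the
	 * slice only after the write; for xdp this is merely redundant
	 */
	p = bpf_dynptr_data(ptr, 0, sizeof(*p));
	if (!p)
		return -1;
	return *p;
}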
diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 30615d1a0c13..455a215b6c57 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -410,11 +410,15 @@ enum bpf_type_flag { /* DYNPTR points to sk_buff */ DYNPTR_TYPE_SKB = BIT(11 + BPF_BASE_TYPE_BITS), + /* DYNPTR points to xdp_buff */ + DYNPTR_TYPE_XDP = BIT(12 + BPF_BASE_TYPE_BITS), + __BPF_TYPE_FLAG_MAX, __BPF_TYPE_LAST_FLAG = __BPF_TYPE_FLAG_MAX - 1, }; -#define DYNPTR_TYPE_FLAG_MASK (DYNPTR_TYPE_LOCAL | DYNPTR_TYPE_RINGBUF | DYNPTR_TYPE_SKB) +#define DYNPTR_TYPE_FLAG_MASK (DYNPTR_TYPE_LOCAL | DYNPTR_TYPE_RINGBUF | DYNPTR_TYPE_SKB \ + | DYNPTR_TYPE_XDP) /* Max number of base types. */ #define BPF_BASE_TYPE_LIMIT (1UL << BPF_BASE_TYPE_BITS) diff --git a/include/linux/filter.h b/include/linux/filter.h index 649063d9cbfd..80f030239877 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -1535,5 +1535,8 @@ static __always_inline int __bpf_xdp_redirect_map(struct bpf_map *map, u32 ifind int __bpf_skb_load_bytes(const struct sk_buff *skb, u32 offset, void *to, u32 len); int __bpf_skb_store_bytes(struct sk_buff *skb, u32 offset, const void *from, u32 len, u64 flags); +int __bpf_xdp_load_bytes(struct xdp_buff *xdp, u32 offset, void *buf, u32 len); +int __bpf_xdp_store_bytes(struct xdp_buff *xdp, u32 offset, void *buf, u32 len); +void *bpf_xdp_pointer(struct xdp_buff *xdp, u32 offset, u32 len); #endif /* __LINUX_FILTER_H__ */ diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 320e6b95d95c..9feea29eebcd 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -5283,13 +5283,18 @@ union bpf_attr { * and try again. * * * The data slice is automatically invalidated anytime - * **bpf_dynptr_write**\ () or a helper call that changes - * the underlying packet buffer (eg **bpf_skb_pull_data**\ ()) + * **bpf_dynptr_write**\ () is called. + * + * For skb-type and xdp-type dynptrs: + * * The data slice is automatically invalidated anytime a + * helper call that changes the underlying packet buffer + * (eg **bpf_skb_pull_data**\ (), **bpf_xdp_adjust_head**\ ()) * is called. * Return * Pointer to the underlying dynptr data, NULL if the dynptr is * read-only, if the dynptr is invalid, or if the offset and length - * is out of bounds or in a paged buffer for skb-type dynptrs. + * is out of bounds or in a paged buffer for skb-type dynptrs or + * across fragments for xdp-type dynptrs. * * s64 bpf_tcp_raw_gen_syncookie_ipv4(struct iphdr *iph, struct tcphdr *th, u32 th_len) * Description @@ -5388,6 +5393,19 @@ union bpf_attr { * *flags* is currently unused, it must be 0 for now. * Return * 0 on success or -EINVAL if flags is not 0. + * + * long bpf_dynptr_from_xdp(struct xdp_buff *xdp_md, u64 flags, struct bpf_dynptr *ptr) + * Description + * Get a dynptr to the data in *xdp_md*. *xdp_md* must be the BPF program + * context. + * + * Calls that change the *xdp_md*'s underlying packet buffer + * (eg **bpf_xdp_adjust_head**\ ()) do not invalidate the dynptr, but + * they do invalidate any data slices associated with the dynptr. + * + * *flags* is currently unused, it must be 0 for now. + * Return + * 0 on success, -EINVAL if flags is not 0. 
*/ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -5600,6 +5618,7 @@ union bpf_attr { FN(tcp_raw_check_syncookie_ipv6), \ FN(ktime_get_tai_ns), \ FN(dynptr_from_skb), \ + FN(dynptr_from_xdp), \ /* */ /* integer value in 'imm' field of BPF_CALL instruction selects which helper diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 471a01a9b6ae..2b9dc4c6de04 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -1541,6 +1541,8 @@ BPF_CALL_5(bpf_dynptr_read, void *, dst, u32, len, struct bpf_dynptr_kern *, src return 0; case BPF_DYNPTR_TYPE_SKB: return __bpf_skb_load_bytes(src->data, src->offset + offset, dst, len); + case BPF_DYNPTR_TYPE_XDP: + return __bpf_xdp_load_bytes(src->data, src->offset + offset, dst, len); default: WARN(true, "bpf_dynptr_read: unknown dynptr type %d\n", type); return -EFAULT; @@ -1583,6 +1585,10 @@ BPF_CALL_5(bpf_dynptr_write, struct bpf_dynptr_kern *, dst, u32, offset, void *, case BPF_DYNPTR_TYPE_SKB: return __bpf_skb_store_bytes(dst->data, dst->offset + offset, src, len, flags); + case BPF_DYNPTR_TYPE_XDP: + if (flags) + return -EINVAL; + return __bpf_xdp_store_bytes(dst->data, dst->offset + offset, src, len); default: WARN(true, "bpf_dynptr_write: unknown dynptr type %d\n", type); return -EFAULT; @@ -1616,7 +1622,7 @@ BPF_CALL_3(bpf_dynptr_data, struct bpf_dynptr_kern *, ptr, u32, offset, u32, len type = bpf_dynptr_get_type(ptr); - /* Only skb dynptrs can get read-only data slices, because the + /* Only skb and xdp dynptrs can get read-only data slices, because the * verifier enforces PTR_TO_PACKET accesses */ is_rdonly = bpf_dynptr_is_rdonly(ptr); @@ -1640,6 +1646,12 @@ BPF_CALL_3(bpf_dynptr_data, struct bpf_dynptr_kern *, ptr, u32, offset, u32, len data = skb->data; break; } + case BPF_DYNPTR_TYPE_XDP: + /* if the requested data is across fragments, then it cannot + * be accessed directly - bpf_xdp_pointer will return NULL + */ + return (unsigned long)bpf_xdp_pointer(ptr->data, + ptr->offset + offset, len); default: WARN(true, "bpf_dynptr_data: unknown dynptr type %d\n", type); return 0; diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 1ea295f47525..d33648eee188 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -686,6 +686,8 @@ static enum bpf_dynptr_type arg_to_dynptr_type(enum bpf_arg_type arg_type) return BPF_DYNPTR_TYPE_RINGBUF; case DYNPTR_TYPE_SKB: return BPF_DYNPTR_TYPE_SKB; + case DYNPTR_TYPE_XDP: + return BPF_DYNPTR_TYPE_XDP; default: return BPF_DYNPTR_TYPE_INVALID; } @@ -6078,6 +6080,9 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg, case DYNPTR_TYPE_SKB: err_extra = "skb "; break; + case DYNPTR_TYPE_XDP: + err_extra = "xdp "; + break; } verbose(env, "Expected an initialized %sdynptr as arg #%d\n", @@ -7439,7 +7444,8 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn mark_reg_known_zero(env, regs, BPF_REG_0); if (func_id == BPF_FUNC_dynptr_data && - dynptr_type == BPF_DYNPTR_TYPE_SKB) { + (dynptr_type == BPF_DYNPTR_TYPE_SKB || + dynptr_type == BPF_DYNPTR_TYPE_XDP)) { regs[BPF_REG_0].type = PTR_TO_PACKET | ret_flag; regs[BPF_REG_0].range = meta.mem_size; } else { diff --git a/net/core/filter.c b/net/core/filter.c index 5b204b42fb3e..54fbe8f511db 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -3825,7 +3825,29 @@ static const struct bpf_func_proto sk_skb_change_head_proto = { .arg3_type = ARG_ANYTHING, }; -BPF_CALL_1(bpf_xdp_get_buff_len, struct xdp_buff*, xdp) +BPF_CALL_3(bpf_dynptr_from_xdp, struct xdp_buff*, xdp, u64, flags, + struct 
bpf_dynptr_kern *, ptr) +{ + if (flags) { + bpf_dynptr_set_null(ptr); + return -EINVAL; + } + + bpf_dynptr_init(ptr, xdp, BPF_DYNPTR_TYPE_XDP, 0, xdp_get_buff_len(xdp)); + + return 0; +} + +static const struct bpf_func_proto bpf_dynptr_from_xdp_proto = { + .func = bpf_dynptr_from_xdp, + .gpl_only = false, + .ret_type = RET_INTEGER, + .arg1_type = ARG_PTR_TO_CTX, + .arg2_type = ARG_ANYTHING, + .arg3_type = ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_XDP | MEM_UNINIT, +}; + +BPF_CALL_1(bpf_xdp_get_buff_len, struct xdp_buff*, xdp) { return xdp_get_buff_len(xdp); } @@ -3927,7 +3949,7 @@ static void bpf_xdp_copy_buf(struct xdp_buff *xdp, unsigned long off, } } -static void *bpf_xdp_pointer(struct xdp_buff *xdp, u32 offset, u32 len) +void *bpf_xdp_pointer(struct xdp_buff *xdp, u32 offset, u32 len) { struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp); u32 size = xdp->data_end - xdp->data; @@ -3958,8 +3980,7 @@ static void *bpf_xdp_pointer(struct xdp_buff *xdp, u32 offset, u32 len) return offset + len <= size ? addr + offset : NULL; } -BPF_CALL_4(bpf_xdp_load_bytes, struct xdp_buff *, xdp, u32, offset, - void *, buf, u32, len) +int __bpf_xdp_load_bytes(struct xdp_buff *xdp, u32 offset, void *buf, u32 len) { void *ptr; @@ -3975,6 +3996,12 @@ BPF_CALL_4(bpf_xdp_load_bytes, struct xdp_buff *, xdp, u32, offset, return 0; } +BPF_CALL_4(bpf_xdp_load_bytes, struct xdp_buff *, xdp, u32, offset, + void *, buf, u32, len) +{ + return __bpf_xdp_load_bytes(xdp, offset, buf, len); +} + static const struct bpf_func_proto bpf_xdp_load_bytes_proto = { .func = bpf_xdp_load_bytes, .gpl_only = false, @@ -3985,8 +4012,7 @@ static const struct bpf_func_proto bpf_xdp_load_bytes_proto = { .arg4_type = ARG_CONST_SIZE, }; -BPF_CALL_4(bpf_xdp_store_bytes, struct xdp_buff *, xdp, u32, offset, - void *, buf, u32, len) +int __bpf_xdp_store_bytes(struct xdp_buff *xdp, u32 offset, void *buf, u32 len) { void *ptr; @@ -4002,6 +4028,12 @@ BPF_CALL_4(bpf_xdp_store_bytes, struct xdp_buff *, xdp, u32, offset, return 0; } +BPF_CALL_4(bpf_xdp_store_bytes, struct xdp_buff *, xdp, u32, offset, + void *, buf, u32, len) +{ + return __bpf_xdp_store_bytes(xdp, offset, buf, len); +} + static const struct bpf_func_proto bpf_xdp_store_bytes_proto = { .func = bpf_xdp_store_bytes, .gpl_only = false, @@ -8009,6 +8041,8 @@ xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) return &bpf_tcp_raw_check_syncookie_ipv6_proto; #endif #endif + case BPF_FUNC_dynptr_from_xdp: + return &bpf_dynptr_from_xdp_proto; default: return bpf_sk_base_func_proto(func_id); } diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 3f1800a2b77c..0d5b0117db2a 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -5283,13 +5283,18 @@ union bpf_attr { * and try again. * * * The data slice is automatically invalidated anytime - * **bpf_dynptr_write**\ () or a helper call that changes - * the underlying packet buffer (eg **bpf_skb_pull_data**\ ()) + * **bpf_dynptr_write**\ () is called. + * + * For skb-type and xdp-type dynptrs: + * * The data slice is automatically invalidated anytime a + * helper call that changes the underlying packet buffer + * (eg **bpf_skb_pull_data**\ (), **bpf_xdp_adjust_head**\ ()) * is called. * Return * Pointer to the underlying dynptr data, NULL if the dynptr is * read-only, if the dynptr is invalid, or if the offset and length - * is out of bounds or in a paged buffer for skb-type dynptrs. 
+ * is out of bounds or in a paged buffer for skb-type dynptrs or + * across fragments for xdp-type dynptrs. * * s64 bpf_tcp_raw_gen_syncookie_ipv4(struct iphdr *iph, struct tcphdr *th, u32 th_len) * Description @@ -5388,6 +5393,19 @@ union bpf_attr { * *flags* is currently unused, it must be 0 for now. * Return * 0 on success or -EINVAL if flags is not 0. + * + * long bpf_dynptr_from_xdp(struct xdp_buff *xdp_md, u64 flags, struct bpf_dynptr *ptr) + * Description + * Get a dynptr to the data in *xdp_md*. *xdp_md* must be the BPF program + * context. + * + * Calls that change the *xdp_md*'s underlying packet buffer + * (eg **bpf_xdp_adjust_head**\ ()) do not invalidate the dynptr, but + * they do invalidate any data slices associated with the dynptr. + * + * *flags* is currently unused, it must be 0 for now. + * Return + * 0 on success, -EINVAL if flags is not 0. */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -5600,6 +5618,7 @@ union bpf_attr { FN(tcp_raw_check_syncookie_ipv6), \ FN(ktime_get_tai_ns), \ FN(dynptr_from_skb), \ + FN(dynptr_from_xdp), \ /* */ /* integer value in 'imm' field of BPF_CALL instruction selects which helper
Add xdp dynptrs, which are dynptrs whose underlying pointer points to an xdp_buff. The dynptr acts on xdp data. xdp dynptrs have two main benefits. One is that they allow operations on sizes that are not statically known at compile-time (eg variable-sized accesses). Another is that parsing the packet data through dynptrs (instead of through direct access of xdp->data and xdp->data_end) can be more ergonomic and less brittle (eg does not need manual if checking for being within bounds of data_end). For reads and writes on the dynptr, this includes reading/writing from/to and across fragments. For data slices, direct access to data in fragments is also permitted, but access across fragments is not. The returned data slice is reg type PTR_TO_PACKET | PTR_MAYBE_NULL. Any helper calls that change the underlying packet buffer (eg bpf_xdp_adjust_head) invalidate any data slices of the associated dynptr. Whenever such a helper call is made, the verifier marks any PTR_TO_PACKET reg type (which includes xdp dynptr slices since they are PTR_TO_PACKETs) as unknown. The stack trace for this is check_helper_call() -> clear_all_pkt_pointers() -> __clear_all_pkt_pointers() -> mark_reg_unknown() For examples of how xdp dynptrs can be used, please see the attached selftests. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> --- include/linux/bpf.h | 6 ++++- include/linux/filter.h | 3 +++ include/uapi/linux/bpf.h | 25 +++++++++++++++--- kernel/bpf/helpers.c | 14 ++++++++++- kernel/bpf/verifier.c | 8 +++++- net/core/filter.c | 46 +++++++++++++++++++++++++++++----- tools/include/uapi/linux/bpf.h | 25 +++++++++++++++--- 7 files changed, 112 insertions(+), 15 deletions(-)
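For reference (the commit message defers to the selftests for real examples), a minimal usage sketch of the helper added by this patch; the parsing body is omitted and the program name is arbitrary:

SEC("xdp")
int xdp_dynptr_example(struct xdp_md *ctx)
{
	struct bpf_dynptr ptr;
	__u64 *data;

	if (bpf_dynptr_from_xdp(ctx, 0, &ptr))
		return XDP_DROP;

	/* NULL if bytes [0, 8) cross a frag boundary */
	data = bpf_dynptr_data(&ptr, 0, sizeof(*data));
	if (!data)
		return XDP_PASS;
	/* parse *data ... */
	return XDP_PASS;
}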