Message ID | 20220428211059.4065379-4-joannelkoong@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | BPF |
Headers | show |
Series | Dynamic pointers | expand |
On Thu, Apr 28, 2022 at 2:12 PM Joanne Koong <joannelkoong@gmail.com> wrote: > > Currently, our only way of writing dynamically-sized data into a ring > buffer is through bpf_ringbuf_output but this incurs an extra memcpy > cost. bpf_ringbuf_reserve + bpf_ringbuf_commit avoids this extra > memcpy, but it can only safely support reservation sizes that are > statically known since the verifier cannot guarantee that the bpf > program won’t access memory outside the reserved space. > > The bpf_dynptr abstraction allows for dynamically-sized ring buffer > reservations without the extra memcpy. > > There are 3 new APIs: > > long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags, struct bpf_dynptr *ptr); > void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags); > void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags); > > These closely follow the functionalities of the original ringbuf APIs. > For example, all ringbuffer dynptrs that have been reserved must be > either submitted or discarded before the program exits. > > Signed-off-by: Joanne Koong <joannelkoong@gmail.com> > --- Looks great! Modulo those four underscores, they are super confusing... > include/linux/bpf.h | 10 ++++- > include/uapi/linux/bpf.h | 35 +++++++++++++++++ > kernel/bpf/helpers.c | 6 +++ > kernel/bpf/ringbuf.c | 71 ++++++++++++++++++++++++++++++++++ > kernel/bpf/verifier.c | 18 +++++++-- > tools/include/uapi/linux/bpf.h | 35 +++++++++++++++++ > 6 files changed, 171 insertions(+), 4 deletions(-) > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > index 757440406962..10efbec99e93 100644 > --- a/include/linux/bpf.h > +++ b/include/linux/bpf.h > @@ -394,7 +394,10 @@ enum bpf_type_flag { > /* DYNPTR points to dynamically allocated memory. */ > DYNPTR_TYPE_MALLOC = BIT(8 + BPF_BASE_TYPE_BITS), > > - __BPF_TYPE_LAST_FLAG = DYNPTR_TYPE_MALLOC, > + /* DYNPTR points to a ringbuf record. */ > + DYNPTR_TYPE_RINGBUF = BIT(9 + BPF_BASE_TYPE_BITS), > + > + __BPF_TYPE_LAST_FLAG = DYNPTR_TYPE_RINGBUF, it's getting a bit old to have to update __BPF_TYPE_LAST_FLAG all the time, maybe let's do this: __BPF_TYPE_FLAG_MAX, __BPF_TYPE_LAST_FLAG = __BPF_TYPE_FLAG_MAX - 1, and never touch it again? > }; > [...] > + * > + * void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags) > + * Description > + * Discard reserved ring buffer sample through the dynptr > + * interface. This is a no-op if the dynptr is invalid/null. > + * > + * For more information on *flags*, please see > + * 'bpf_ringbuf_discard'. > + * Return > + * Nothing. Always succeeds. > */ let's also add bpf_dynptr_is_null() (or bpf_dynptr_is_valid(), not sure which one is more appropriate, probably just null one), so we can check in code whether some reservation was successful without knowing bpf_ringbuf_reserve_dynptr()'s return value > #define __BPF_FUNC_MAPPER(FN) \ > FN(unspec), \ [...] > +BPF_CALL_4(bpf_ringbuf_reserve_dynptr, struct bpf_map *, map, u32, size, u64, flags, > + struct bpf_dynptr_kern *, ptr) > +{ > + void *sample; > + int err; > + > + err = bpf_dynptr_check_size(size); > + if (err) { > + bpf_dynptr_set_null(ptr); > + return err; > + } > + > + sample = (void __force *)____bpf_ringbuf_reserve(map, size, flags); I was so confused by these four underscored for a bit... Is this what's defined inside BPF_CALL_4 (and thus makes it ungreppable). Can you instead just open-code container_of and __bpf_ringbuf_reserve() directly to make it a bit easier to follow? And flags check as well. It will so much easier to understand what's going on. > + > + if (!sample) { > + bpf_dynptr_set_null(ptr); > + return -EINVAL; > + } > + > + bpf_dynptr_init(ptr, sample, BPF_DYNPTR_TYPE_RINGBUF, 0, size); > + > + return 0; > +} > + > +const struct bpf_func_proto bpf_ringbuf_reserve_dynptr_proto = { > + .func = bpf_ringbuf_reserve_dynptr, > + .ret_type = RET_INTEGER, > + .arg1_type = ARG_CONST_MAP_PTR, > + .arg2_type = ARG_ANYTHING, > + .arg3_type = ARG_ANYTHING, > + .arg4_type = ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_RINGBUF | MEM_UNINIT, > +}; > + > +BPF_CALL_2(bpf_ringbuf_submit_dynptr, struct bpf_dynptr_kern *, ptr, u64, flags) > +{ > + if (!ptr->data) > + return 0; > + > + ____bpf_ringbuf_submit(ptr->data, flags); this just calls bpf_ringbuf_commit(), let's do it here explicitly as well > + > + bpf_dynptr_set_null(ptr); > + > + return 0; > +} > + > +const struct bpf_func_proto bpf_ringbuf_submit_dynptr_proto = { > + .func = bpf_ringbuf_submit_dynptr, > + .ret_type = RET_VOID, > + .arg1_type = ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_RINGBUF | OBJ_RELEASE, > + .arg2_type = ARG_ANYTHING, > +}; > + > +BPF_CALL_2(bpf_ringbuf_discard_dynptr, struct bpf_dynptr_kern *, ptr, u64, flags) > +{ > + if (!ptr->data) > + return 0; > + > + ____bpf_ringbuf_discard(ptr->data, flags); > + ditto > + bpf_dynptr_set_null(ptr); > + > + return 0; > +} > + [...]
On Fri, May 6, 2022 at 4:41 PM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: > > On Thu, Apr 28, 2022 at 2:12 PM Joanne Koong <joannelkoong@gmail.com> wrote: > > > > Currently, our only way of writing dynamically-sized data into a ring > > buffer is through bpf_ringbuf_output but this incurs an extra memcpy > > cost. bpf_ringbuf_reserve + bpf_ringbuf_commit avoids this extra > > memcpy, but it can only safely support reservation sizes that are > > statically known since the verifier cannot guarantee that the bpf > > program won’t access memory outside the reserved space. > > > > The bpf_dynptr abstraction allows for dynamically-sized ring buffer > > reservations without the extra memcpy. > > > > There are 3 new APIs: > > > > long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags, struct bpf_dynptr *ptr); > > void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags); > > void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags); > > > > These closely follow the functionalities of the original ringbuf APIs. > > For example, all ringbuffer dynptrs that have been reserved must be > > either submitted or discarded before the program exits. > > > > Signed-off-by: Joanne Koong <joannelkoong@gmail.com> > > --- > > Looks great! Modulo those four underscores, they are super confusing... > > > include/linux/bpf.h | 10 ++++- > > include/uapi/linux/bpf.h | 35 +++++++++++++++++ > > kernel/bpf/helpers.c | 6 +++ > > kernel/bpf/ringbuf.c | 71 ++++++++++++++++++++++++++++++++++ > > kernel/bpf/verifier.c | 18 +++++++-- > > tools/include/uapi/linux/bpf.h | 35 +++++++++++++++++ > > 6 files changed, 171 insertions(+), 4 deletions(-) > > [...] > > > > [...] > > > + * > > + * void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags) > > + * Description > > + * Discard reserved ring buffer sample through the dynptr > > + * interface. This is a no-op if the dynptr is invalid/null. > > + * > > + * For more information on *flags*, please see > > + * 'bpf_ringbuf_discard'. > > + * Return > > + * Nothing. Always succeeds. > > */ > > let's also add bpf_dynptr_is_null() (or bpf_dynptr_is_valid(), not > sure which one is more appropriate, probably just null one), so we can > check in code whether some reservation was successful without knowing > bpf_ringbuf_reserve_dynptr()'s return value I'm planning to add bpf_dynptr_is_null() in the 3rd dynptr patchset (convenience helpers). Do you prefer that this be part of this patchset instead? If so, do you think this should be part of the 2nd patch (aka the one where we set up the infra for dynptrs + implement malloc-type dynptrs) or this ringbuf patch or its own patch? > > [...] > [...]
On Mon, May 9, 2022 at 12:44 PM Joanne Koong <joannelkoong@gmail.com> wrote: > > On Fri, May 6, 2022 at 4:41 PM Andrii Nakryiko > <andrii.nakryiko@gmail.com> wrote: > > > > On Thu, Apr 28, 2022 at 2:12 PM Joanne Koong <joannelkoong@gmail.com> wrote: > > > > > > Currently, our only way of writing dynamically-sized data into a ring > > > buffer is through bpf_ringbuf_output but this incurs an extra memcpy > > > cost. bpf_ringbuf_reserve + bpf_ringbuf_commit avoids this extra > > > memcpy, but it can only safely support reservation sizes that are > > > statically known since the verifier cannot guarantee that the bpf > > > program won’t access memory outside the reserved space. > > > > > > The bpf_dynptr abstraction allows for dynamically-sized ring buffer > > > reservations without the extra memcpy. > > > > > > There are 3 new APIs: > > > > > > long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags, struct bpf_dynptr *ptr); > > > void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags); > > > void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags); > > > > > > These closely follow the functionalities of the original ringbuf APIs. > > > For example, all ringbuffer dynptrs that have been reserved must be > > > either submitted or discarded before the program exits. > > > > > > Signed-off-by: Joanne Koong <joannelkoong@gmail.com> > > > --- > > > > Looks great! Modulo those four underscores, they are super confusing... > > > > > include/linux/bpf.h | 10 ++++- > > > include/uapi/linux/bpf.h | 35 +++++++++++++++++ > > > kernel/bpf/helpers.c | 6 +++ > > > kernel/bpf/ringbuf.c | 71 ++++++++++++++++++++++++++++++++++ > > > kernel/bpf/verifier.c | 18 +++++++-- > > > tools/include/uapi/linux/bpf.h | 35 +++++++++++++++++ > > > 6 files changed, 171 insertions(+), 4 deletions(-) > > > > [...] > > > > > > > [...] > > > > > + * > > > + * void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags) > > > + * Description > > > + * Discard reserved ring buffer sample through the dynptr > > > + * interface. This is a no-op if the dynptr is invalid/null. > > > + * > > > + * For more information on *flags*, please see > > > + * 'bpf_ringbuf_discard'. > > > + * Return > > > + * Nothing. Always succeeds. > > > */ > > > > let's also add bpf_dynptr_is_null() (or bpf_dynptr_is_valid(), not > > sure which one is more appropriate, probably just null one), so we can > > check in code whether some reservation was successful without knowing > > bpf_ringbuf_reserve_dynptr()'s return value > I'm planning to add bpf_dynptr_is_null() in the 3rd dynptr patchset > (convenience helpers). Do you prefer that this be part of this > patchset instead? If so, do you think this should be part of the 2nd > patch (aka the one where we set up the infra for dynptrs + implement > malloc-type dynptrs) or this ringbuf patch or its own patch? No problem adding it in a follow up patch. BTW, is it still in the plan to be able to create bpf_dynptr() from map_value, global variables, etc? I.e., it's a LOCAL dynptr except memory is not on STACK. Something like int k = 123; struct my_val *v; struct bpf_dynptr p; v = bpf_map_lookup_elem(&my_map, &k); if (!v) return 0; bpf_dynptr_from_mem(&v->my_data, &p); /* p points inside my_map's value */ ? > > > > > [...] > > [...]
On Mon, May 9, 2022 at 1:28 PM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: > > On Mon, May 9, 2022 at 12:44 PM Joanne Koong <joannelkoong@gmail.com> wrote: > > > > On Fri, May 6, 2022 at 4:41 PM Andrii Nakryiko > > <andrii.nakryiko@gmail.com> wrote: > > > > > > On Thu, Apr 28, 2022 at 2:12 PM Joanne Koong <joannelkoong@gmail.com> wrote: > > > > > > > > Currently, our only way of writing dynamically-sized data into a ring > > > > buffer is through bpf_ringbuf_output but this incurs an extra memcpy > > > > cost. bpf_ringbuf_reserve + bpf_ringbuf_commit avoids this extra > > > > memcpy, but it can only safely support reservation sizes that are > > > > statically known since the verifier cannot guarantee that the bpf > > > > program won’t access memory outside the reserved space. > > > > > > > > The bpf_dynptr abstraction allows for dynamically-sized ring buffer > > > > reservations without the extra memcpy. > > > > > > > > There are 3 new APIs: > > > > > > > > long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags, struct bpf_dynptr *ptr); > > > > void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags); > > > > void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags); > > > > > > > > These closely follow the functionalities of the original ringbuf APIs. > > > > For example, all ringbuffer dynptrs that have been reserved must be > > > > either submitted or discarded before the program exits. > > > > > > > > Signed-off-by: Joanne Koong <joannelkoong@gmail.com> > > > > --- > > > > > > Looks great! Modulo those four underscores, they are super confusing... > > > > > > > include/linux/bpf.h | 10 ++++- > > > > include/uapi/linux/bpf.h | 35 +++++++++++++++++ > > > > kernel/bpf/helpers.c | 6 +++ > > > > kernel/bpf/ringbuf.c | 71 ++++++++++++++++++++++++++++++++++ > > > > kernel/bpf/verifier.c | 18 +++++++-- > > > > tools/include/uapi/linux/bpf.h | 35 +++++++++++++++++ > > > > 6 files changed, 171 insertions(+), 4 deletions(-) > > > > > > [...] > > > > > > > > > > [...] > > > > > > > + * > > > > + * void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags) > > > > + * Description > > > > + * Discard reserved ring buffer sample through the dynptr > > > > + * interface. This is a no-op if the dynptr is invalid/null. > > > > + * > > > > + * For more information on *flags*, please see > > > > + * 'bpf_ringbuf_discard'. > > > > + * Return > > > > + * Nothing. Always succeeds. > > > > */ > > > > > > let's also add bpf_dynptr_is_null() (or bpf_dynptr_is_valid(), not > > > sure which one is more appropriate, probably just null one), so we can > > > check in code whether some reservation was successful without knowing > > > bpf_ringbuf_reserve_dynptr()'s return value > > I'm planning to add bpf_dynptr_is_null() in the 3rd dynptr patchset > > (convenience helpers). Do you prefer that this be part of this > > patchset instead? If so, do you think this should be part of the 2nd > > patch (aka the one where we set up the infra for dynptrs + implement > > malloc-type dynptrs) or this ringbuf patch or its own patch? > > No problem adding it in a follow up patch. > > BTW, is it still in the plan to be able to create bpf_dynptr() from > map_value, global variables, etc? I.e., it's a LOCAL dynptr except > memory is not on STACK. > > Something like > > int k = 123; > struct my_val *v; > struct bpf_dynptr p; > > v = bpf_map_lookup_elem(&my_map, &k); > if (!v) return 0; > > bpf_dynptr_from_mem(&v->my_data, &p); > > /* p points inside my_map's value */ > > ? The plan is to still support some types of local dynptrs (eg dynptr to ctx skbuf / xdp data). If it would be useful to also have this for map_value, we can add this as well (the RCU protects against the map value being freed out from under the dynptr, I believe). > > > > > > > > > > [...] > > > [...]
On Mon, May 9, 2022 at 1:35 PM Joanne Koong <joannelkoong@gmail.com> wrote: > > On Mon, May 9, 2022 at 1:28 PM Andrii Nakryiko > <andrii.nakryiko@gmail.com> wrote: > > > > On Mon, May 9, 2022 at 12:44 PM Joanne Koong <joannelkoong@gmail.com> wrote: > > > > > > On Fri, May 6, 2022 at 4:41 PM Andrii Nakryiko > > > <andrii.nakryiko@gmail.com> wrote: > > > > > > > > On Thu, Apr 28, 2022 at 2:12 PM Joanne Koong <joannelkoong@gmail.com> wrote: > > > > > > > > > > Currently, our only way of writing dynamically-sized data into a ring > > > > > buffer is through bpf_ringbuf_output but this incurs an extra memcpy > > > > > cost. bpf_ringbuf_reserve + bpf_ringbuf_commit avoids this extra > > > > > memcpy, but it can only safely support reservation sizes that are > > > > > statically known since the verifier cannot guarantee that the bpf > > > > > program won’t access memory outside the reserved space. > > > > > > > > > > The bpf_dynptr abstraction allows for dynamically-sized ring buffer > > > > > reservations without the extra memcpy. > > > > > > > > > > There are 3 new APIs: > > > > > > > > > > long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags, struct bpf_dynptr *ptr); > > > > > void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags); > > > > > void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags); > > > > > > > > > > These closely follow the functionalities of the original ringbuf APIs. > > > > > For example, all ringbuffer dynptrs that have been reserved must be > > > > > either submitted or discarded before the program exits. > > > > > > > > > > Signed-off-by: Joanne Koong <joannelkoong@gmail.com> > > > > > --- > > > > > > > > Looks great! Modulo those four underscores, they are super confusing... > > > > > > > > > include/linux/bpf.h | 10 ++++- > > > > > include/uapi/linux/bpf.h | 35 +++++++++++++++++ > > > > > kernel/bpf/helpers.c | 6 +++ > > > > > kernel/bpf/ringbuf.c | 71 ++++++++++++++++++++++++++++++++++ > > > > > kernel/bpf/verifier.c | 18 +++++++-- > > > > > tools/include/uapi/linux/bpf.h | 35 +++++++++++++++++ > > > > > 6 files changed, 171 insertions(+), 4 deletions(-) > > > > > > > > [...] > > > > > > > > > > > > > [...] > > > > > > > > > + * > > > > > + * void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags) > > > > > + * Description > > > > > + * Discard reserved ring buffer sample through the dynptr > > > > > + * interface. This is a no-op if the dynptr is invalid/null. > > > > > + * > > > > > + * For more information on *flags*, please see > > > > > + * 'bpf_ringbuf_discard'. > > > > > + * Return > > > > > + * Nothing. Always succeeds. > > > > > */ > > > > > > > > let's also add bpf_dynptr_is_null() (or bpf_dynptr_is_valid(), not > > > > sure which one is more appropriate, probably just null one), so we can > > > > check in code whether some reservation was successful without knowing > > > > bpf_ringbuf_reserve_dynptr()'s return value > > > I'm planning to add bpf_dynptr_is_null() in the 3rd dynptr patchset > > > (convenience helpers). Do you prefer that this be part of this > > > patchset instead? If so, do you think this should be part of the 2nd > > > patch (aka the one where we set up the infra for dynptrs + implement > > > malloc-type dynptrs) or this ringbuf patch or its own patch? > > > > No problem adding it in a follow up patch. > > > > BTW, is it still in the plan to be able to create bpf_dynptr() from > > map_value, global variables, etc? I.e., it's a LOCAL dynptr except > > memory is not on STACK. > > > > Something like > > > > int k = 123; > > struct my_val *v; > > struct bpf_dynptr p; > > > > v = bpf_map_lookup_elem(&my_map, &k); > > if (!v) return 0; > > > > bpf_dynptr_from_mem(&v->my_data, &p); > > > > /* p points inside my_map's value */ > > > > ? > The plan is to still support some types of local dynptrs (eg dynptr to > ctx skbuf / xdp data). If it would be useful to also have this for > map_value, we can add this as well (the RCU protects against the map > value being freed out from under the dynptr, I believe). Yep, I think it's useful and should be pretty straightforward (there are no self-referential issues as with PTR_TO_STACK). > > > > > > > > > > > > > > > [...] > > > > [...]
diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 757440406962..10efbec99e93 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -394,7 +394,10 @@ enum bpf_type_flag { /* DYNPTR points to dynamically allocated memory. */ DYNPTR_TYPE_MALLOC = BIT(8 + BPF_BASE_TYPE_BITS), - __BPF_TYPE_LAST_FLAG = DYNPTR_TYPE_MALLOC, + /* DYNPTR points to a ringbuf record. */ + DYNPTR_TYPE_RINGBUF = BIT(9 + BPF_BASE_TYPE_BITS), + + __BPF_TYPE_LAST_FLAG = DYNPTR_TYPE_RINGBUF, }; /* Max number of base types. */ @@ -2203,6 +2206,9 @@ extern const struct bpf_func_proto bpf_ringbuf_reserve_proto; extern const struct bpf_func_proto bpf_ringbuf_submit_proto; extern const struct bpf_func_proto bpf_ringbuf_discard_proto; extern const struct bpf_func_proto bpf_ringbuf_query_proto; +extern const struct bpf_func_proto bpf_ringbuf_reserve_dynptr_proto; +extern const struct bpf_func_proto bpf_ringbuf_submit_dynptr_proto; +extern const struct bpf_func_proto bpf_ringbuf_discard_dynptr_proto; extern const struct bpf_func_proto bpf_skc_to_tcp6_sock_proto; extern const struct bpf_func_proto bpf_skc_to_tcp_sock_proto; extern const struct bpf_func_proto bpf_skc_to_tcp_timewait_sock_proto; @@ -2370,6 +2376,8 @@ enum bpf_dynptr_type { BPF_DYNPTR_TYPE_INVALID, /* Memory allocated dynamically by the kernel for the dynptr */ BPF_DYNPTR_TYPE_MALLOC, + /* Underlying data is a ringbuf record */ + BPF_DYNPTR_TYPE_RINGBUF, }; /* Since the upper 8 bits of dynptr->size is reserved, the diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 5a87ed654016..679f960d2514 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -5177,6 +5177,38 @@ union bpf_attr { * After this operation, *ptr* will be an invalidated dynptr. * Return * Void. + * + * long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags, struct bpf_dynptr *ptr) + * Description + * Reserve *size* bytes of payload in a ring buffer *ringbuf* + * through the dynptr interface. *flags* must be 0. + * + * Please note that a corresponding bpf_ringbuf_submit_dynptr or + * bpf_ringbuf_discard_dynptr must be called on *ptr*, even if the + * reservation fails. This is enforced by the verifier. + * Return + * 0 on success, or a negative error in case of failure. + * + * void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags) + * Description + * Submit reserved ring buffer sample, pointed to by *data*, + * through the dynptr interface. This is a no-op if the dynptr is + * invalid/null. + * + * For more information on *flags*, please see + * 'bpf_ringbuf_submit'. + * Return + * Nothing. Always succeeds. + * + * void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags) + * Description + * Discard reserved ring buffer sample through the dynptr + * interface. This is a no-op if the dynptr is invalid/null. + * + * For more information on *flags*, please see + * 'bpf_ringbuf_discard'. + * Return + * Nothing. Always succeeds. */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -5376,6 +5408,9 @@ union bpf_attr { FN(kptr_xchg), \ FN(dynptr_alloc), \ FN(dynptr_put), \ + FN(ringbuf_reserve_dynptr), \ + FN(ringbuf_submit_dynptr), \ + FN(ringbuf_discard_dynptr), \ /* */ /* integer value in 'imm' field of BPF_CALL instruction selects which helper diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index a4272e9239ea..2d6f2e28b580 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -1513,6 +1513,12 @@ bpf_base_func_proto(enum bpf_func_id func_id) return &bpf_ringbuf_discard_proto; case BPF_FUNC_ringbuf_query: return &bpf_ringbuf_query_proto; + case BPF_FUNC_ringbuf_reserve_dynptr: + return &bpf_ringbuf_reserve_dynptr_proto; + case BPF_FUNC_ringbuf_submit_dynptr: + return &bpf_ringbuf_submit_dynptr_proto; + case BPF_FUNC_ringbuf_discard_dynptr: + return &bpf_ringbuf_discard_dynptr_proto; case BPF_FUNC_for_each_map_elem: return &bpf_for_each_map_elem_proto; case BPF_FUNC_loop: diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c index 311264ab80c4..685bee459525 100644 --- a/kernel/bpf/ringbuf.c +++ b/kernel/bpf/ringbuf.c @@ -475,3 +475,74 @@ const struct bpf_func_proto bpf_ringbuf_query_proto = { .arg1_type = ARG_CONST_MAP_PTR, .arg2_type = ARG_ANYTHING, }; + +BPF_CALL_4(bpf_ringbuf_reserve_dynptr, struct bpf_map *, map, u32, size, u64, flags, + struct bpf_dynptr_kern *, ptr) +{ + void *sample; + int err; + + err = bpf_dynptr_check_size(size); + if (err) { + bpf_dynptr_set_null(ptr); + return err; + } + + sample = (void __force *)____bpf_ringbuf_reserve(map, size, flags); + + if (!sample) { + bpf_dynptr_set_null(ptr); + return -EINVAL; + } + + bpf_dynptr_init(ptr, sample, BPF_DYNPTR_TYPE_RINGBUF, 0, size); + + return 0; +} + +const struct bpf_func_proto bpf_ringbuf_reserve_dynptr_proto = { + .func = bpf_ringbuf_reserve_dynptr, + .ret_type = RET_INTEGER, + .arg1_type = ARG_CONST_MAP_PTR, + .arg2_type = ARG_ANYTHING, + .arg3_type = ARG_ANYTHING, + .arg4_type = ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_RINGBUF | MEM_UNINIT, +}; + +BPF_CALL_2(bpf_ringbuf_submit_dynptr, struct bpf_dynptr_kern *, ptr, u64, flags) +{ + if (!ptr->data) + return 0; + + ____bpf_ringbuf_submit(ptr->data, flags); + + bpf_dynptr_set_null(ptr); + + return 0; +} + +const struct bpf_func_proto bpf_ringbuf_submit_dynptr_proto = { + .func = bpf_ringbuf_submit_dynptr, + .ret_type = RET_VOID, + .arg1_type = ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_RINGBUF | OBJ_RELEASE, + .arg2_type = ARG_ANYTHING, +}; + +BPF_CALL_2(bpf_ringbuf_discard_dynptr, struct bpf_dynptr_kern *, ptr, u64, flags) +{ + if (!ptr->data) + return 0; + + ____bpf_ringbuf_discard(ptr->data, flags); + + bpf_dynptr_set_null(ptr); + + return 0; +} + +const struct bpf_func_proto bpf_ringbuf_discard_dynptr_proto = { + .func = bpf_ringbuf_discard_dynptr, + .ret_type = RET_VOID, + .arg1_type = ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_RINGBUF | OBJ_RELEASE, + .arg2_type = ARG_ANYTHING, +}; diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 16b7ea54a7e0..1b2ec1049368 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -672,13 +672,15 @@ static void mark_verifier_state_scratched(struct bpf_verifier_env *env) env->scratched_stack_slots = ~0ULL; } -#define DYNPTR_TYPE_FLAG_MASK DYNPTR_TYPE_MALLOC +#define DYNPTR_TYPE_FLAG_MASK (DYNPTR_TYPE_MALLOC | DYNPTR_TYPE_RINGBUF) static int arg_to_dynptr_type(enum bpf_arg_type arg_type) { switch (arg_type & DYNPTR_TYPE_FLAG_MASK) { case DYNPTR_TYPE_MALLOC: return BPF_DYNPTR_TYPE_MALLOC; + case DYNPTR_TYPE_RINGBUF: + return BPF_DYNPTR_TYPE_RINGBUF; default: return BPF_DYNPTR_TYPE_INVALID; } @@ -686,7 +688,7 @@ static int arg_to_dynptr_type(enum bpf_arg_type arg_type) static inline bool dynptr_type_refcounted(enum bpf_dynptr_type type) { - return type == BPF_DYNPTR_TYPE_MALLOC; + return type == BPF_DYNPTR_TYPE_MALLOC || type == BPF_DYNPTR_TYPE_RINGBUF; } static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg, @@ -6033,9 +6035,13 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg, case DYNPTR_TYPE_MALLOC: err_extra = "malloc "; break; + case DYNPTR_TYPE_RINGBUF: + err_extra = "ringbuf "; + break; default: break; } + verbose(env, "Expected an initialized %sdynptr as arg #%d\n", err_extra, arg + BPF_REG_1); return -EINVAL; @@ -6161,7 +6167,10 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env, case BPF_MAP_TYPE_RINGBUF: if (func_id != BPF_FUNC_ringbuf_output && func_id != BPF_FUNC_ringbuf_reserve && - func_id != BPF_FUNC_ringbuf_query) + func_id != BPF_FUNC_ringbuf_query && + func_id != BPF_FUNC_ringbuf_reserve_dynptr && + func_id != BPF_FUNC_ringbuf_submit_dynptr && + func_id != BPF_FUNC_ringbuf_discard_dynptr) goto error; break; case BPF_MAP_TYPE_STACK_TRACE: @@ -6277,6 +6286,9 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env, case BPF_FUNC_ringbuf_output: case BPF_FUNC_ringbuf_reserve: case BPF_FUNC_ringbuf_query: + case BPF_FUNC_ringbuf_reserve_dynptr: + case BPF_FUNC_ringbuf_submit_dynptr: + case BPF_FUNC_ringbuf_discard_dynptr: if (map->map_type != BPF_MAP_TYPE_RINGBUF) goto error; break; diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 5a87ed654016..679f960d2514 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -5177,6 +5177,38 @@ union bpf_attr { * After this operation, *ptr* will be an invalidated dynptr. * Return * Void. + * + * long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags, struct bpf_dynptr *ptr) + * Description + * Reserve *size* bytes of payload in a ring buffer *ringbuf* + * through the dynptr interface. *flags* must be 0. + * + * Please note that a corresponding bpf_ringbuf_submit_dynptr or + * bpf_ringbuf_discard_dynptr must be called on *ptr*, even if the + * reservation fails. This is enforced by the verifier. + * Return + * 0 on success, or a negative error in case of failure. + * + * void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags) + * Description + * Submit reserved ring buffer sample, pointed to by *data*, + * through the dynptr interface. This is a no-op if the dynptr is + * invalid/null. + * + * For more information on *flags*, please see + * 'bpf_ringbuf_submit'. + * Return + * Nothing. Always succeeds. + * + * void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags) + * Description + * Discard reserved ring buffer sample through the dynptr + * interface. This is a no-op if the dynptr is invalid/null. + * + * For more information on *flags*, please see + * 'bpf_ringbuf_discard'. + * Return + * Nothing. Always succeeds. */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -5376,6 +5408,9 @@ union bpf_attr { FN(kptr_xchg), \ FN(dynptr_alloc), \ FN(dynptr_put), \ + FN(ringbuf_reserve_dynptr), \ + FN(ringbuf_submit_dynptr), \ + FN(ringbuf_discard_dynptr), \ /* */ /* integer value in 'imm' field of BPF_CALL instruction selects which helper
Currently, our only way of writing dynamically-sized data into a ring buffer is through bpf_ringbuf_output but this incurs an extra memcpy cost. bpf_ringbuf_reserve + bpf_ringbuf_commit avoids this extra memcpy, but it can only safely support reservation sizes that are statically known since the verifier cannot guarantee that the bpf program won’t access memory outside the reserved space. The bpf_dynptr abstraction allows for dynamically-sized ring buffer reservations without the extra memcpy. There are 3 new APIs: long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags, struct bpf_dynptr *ptr); void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags); void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags); These closely follow the functionalities of the original ringbuf APIs. For example, all ringbuffer dynptrs that have been reserved must be either submitted or discarded before the program exits. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> --- include/linux/bpf.h | 10 ++++- include/uapi/linux/bpf.h | 35 +++++++++++++++++ kernel/bpf/helpers.c | 6 +++ kernel/bpf/ringbuf.c | 71 ++++++++++++++++++++++++++++++++++ kernel/bpf/verifier.c | 18 +++++++-- tools/include/uapi/linux/bpf.h | 35 +++++++++++++++++ 6 files changed, 171 insertions(+), 4 deletions(-)