diff mbox series

[bpf-next,v3,3/6] bpf: Dynptr support for ring buffers

Message ID 20220428211059.4065379-4-joannelkoong@gmail.com (mailing list archive)
State Superseded
Delegated to: BPF
Headers show
Series Dynamic pointers | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for bpf-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit fail Errors and warnings before: 1822 this patch: 1823
netdev/cc_maintainers warning 6 maintainers not CCed: songliubraving@fb.com netdev@vger.kernel.org kafai@fb.com yhs@fb.com john.fastabend@gmail.com kpsingh@kernel.org
netdev/build_clang success Errors and warnings before: 198 this patch: 198
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1832 this patch: 1832
netdev/checkpatch warning WARNING: line length of 81 exceeds 80 columns WARNING: line length of 84 exceeds 80 columns WARNING: line length of 94 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-PR fail PR summary
bpf/vmtest-bpf-next-VM_Test-1 fail Logs for Kernel LATEST on ubuntu-latest + selftests
bpf/vmtest-bpf-next-VM_Test-2 fail Logs for Kernel LATEST on z15 + selftests

Commit Message

Joanne Koong April 28, 2022, 9:10 p.m. UTC
Currently, our only way of writing dynamically-sized data into a ring
buffer is through bpf_ringbuf_output but this incurs an extra memcpy
cost. bpf_ringbuf_reserve + bpf_ringbuf_commit avoids this extra
memcpy, but it can only safely support reservation sizes that are
statically known since the verifier cannot guarantee that the bpf
program won’t access memory outside the reserved space.

The bpf_dynptr abstraction allows for dynamically-sized ring buffer
reservations without the extra memcpy.

There are 3 new APIs:

long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags, struct bpf_dynptr *ptr);
void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags);
void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags);

These closely follow the functionalities of the original ringbuf APIs.
For example, all ringbuffer dynptrs that have been reserved must be
either submitted or discarded before the program exits.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 include/linux/bpf.h            | 10 ++++-
 include/uapi/linux/bpf.h       | 35 +++++++++++++++++
 kernel/bpf/helpers.c           |  6 +++
 kernel/bpf/ringbuf.c           | 71 ++++++++++++++++++++++++++++++++++
 kernel/bpf/verifier.c          | 18 +++++++--
 tools/include/uapi/linux/bpf.h | 35 +++++++++++++++++
 6 files changed, 171 insertions(+), 4 deletions(-)

Comments

Andrii Nakryiko May 6, 2022, 11:41 p.m. UTC | #1
On Thu, Apr 28, 2022 at 2:12 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> Currently, our only way of writing dynamically-sized data into a ring
> buffer is through bpf_ringbuf_output but this incurs an extra memcpy
> cost. bpf_ringbuf_reserve + bpf_ringbuf_commit avoids this extra
> memcpy, but it can only safely support reservation sizes that are
> statically known since the verifier cannot guarantee that the bpf
> program won’t access memory outside the reserved space.
>
> The bpf_dynptr abstraction allows for dynamically-sized ring buffer
> reservations without the extra memcpy.
>
> There are 3 new APIs:
>
> long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags, struct bpf_dynptr *ptr);
> void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags);
> void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags);
>
> These closely follow the functionalities of the original ringbuf APIs.
> For example, all ringbuffer dynptrs that have been reserved must be
> either submitted or discarded before the program exits.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---

Looks great! Modulo those four underscores, they are super confusing...

>  include/linux/bpf.h            | 10 ++++-
>  include/uapi/linux/bpf.h       | 35 +++++++++++++++++
>  kernel/bpf/helpers.c           |  6 +++
>  kernel/bpf/ringbuf.c           | 71 ++++++++++++++++++++++++++++++++++
>  kernel/bpf/verifier.c          | 18 +++++++--
>  tools/include/uapi/linux/bpf.h | 35 +++++++++++++++++
>  6 files changed, 171 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 757440406962..10efbec99e93 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -394,7 +394,10 @@ enum bpf_type_flag {
>         /* DYNPTR points to dynamically allocated memory. */
>         DYNPTR_TYPE_MALLOC      = BIT(8 + BPF_BASE_TYPE_BITS),
>
> -       __BPF_TYPE_LAST_FLAG    = DYNPTR_TYPE_MALLOC,
> +       /* DYNPTR points to a ringbuf record. */
> +       DYNPTR_TYPE_RINGBUF     = BIT(9 + BPF_BASE_TYPE_BITS),
> +
> +       __BPF_TYPE_LAST_FLAG    = DYNPTR_TYPE_RINGBUF,

it's getting a bit old to have to update __BPF_TYPE_LAST_FLAG all the
time, maybe let's do this:

__BPF_TYPE_FLAG_MAX,
__BPF_TYPE_LAST_FLAG = __BPF_TYPE_FLAG_MAX - 1,

and never touch it again?

>  };
>

[...]

> + *
> + * void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags)
> + *     Description
> + *             Discard reserved ring buffer sample through the dynptr
> + *             interface. This is a no-op if the dynptr is invalid/null.
> + *
> + *             For more information on *flags*, please see
> + *             'bpf_ringbuf_discard'.
> + *     Return
> + *             Nothing. Always succeeds.
>   */

let's also add bpf_dynptr_is_null() (or bpf_dynptr_is_valid(), not
sure which one is more appropriate, probably just null one), so we can
check in code whether some reservation was successful without knowing
bpf_ringbuf_reserve_dynptr()'s return value


>  #define __BPF_FUNC_MAPPER(FN)          \
>         FN(unspec),                     \

[...]

> +BPF_CALL_4(bpf_ringbuf_reserve_dynptr, struct bpf_map *, map, u32, size, u64, flags,
> +          struct bpf_dynptr_kern *, ptr)
> +{
> +       void *sample;
> +       int err;
> +
> +       err = bpf_dynptr_check_size(size);
> +       if (err) {
> +               bpf_dynptr_set_null(ptr);
> +               return err;
> +       }
> +
> +       sample = (void __force *)____bpf_ringbuf_reserve(map, size, flags);

I was so confused by these four underscored for a bit... Is this
what's defined inside BPF_CALL_4 (and thus makes it ungreppable). Can
you instead just open-code container_of and __bpf_ringbuf_reserve()
directly to make it a bit easier to follow? And flags check as well.
It will so much easier to understand what's going on.

> +
> +       if (!sample) {
> +               bpf_dynptr_set_null(ptr);
> +               return -EINVAL;
> +       }
> +
> +       bpf_dynptr_init(ptr, sample, BPF_DYNPTR_TYPE_RINGBUF, 0, size);
> +
> +       return 0;
> +}
> +
> +const struct bpf_func_proto bpf_ringbuf_reserve_dynptr_proto = {
> +       .func           = bpf_ringbuf_reserve_dynptr,
> +       .ret_type       = RET_INTEGER,
> +       .arg1_type      = ARG_CONST_MAP_PTR,
> +       .arg2_type      = ARG_ANYTHING,
> +       .arg3_type      = ARG_ANYTHING,
> +       .arg4_type      = ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_RINGBUF | MEM_UNINIT,
> +};
> +
> +BPF_CALL_2(bpf_ringbuf_submit_dynptr, struct bpf_dynptr_kern *, ptr, u64, flags)
> +{
> +       if (!ptr->data)
> +               return 0;
> +
> +       ____bpf_ringbuf_submit(ptr->data, flags);

this just calls bpf_ringbuf_commit(), let's do it here explicitly as well

> +
> +       bpf_dynptr_set_null(ptr);
> +
> +       return 0;
> +}
> +
> +const struct bpf_func_proto bpf_ringbuf_submit_dynptr_proto = {
> +       .func           = bpf_ringbuf_submit_dynptr,
> +       .ret_type       = RET_VOID,
> +       .arg1_type      = ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_RINGBUF | OBJ_RELEASE,
> +       .arg2_type      = ARG_ANYTHING,
> +};
> +
> +BPF_CALL_2(bpf_ringbuf_discard_dynptr, struct bpf_dynptr_kern *, ptr, u64, flags)
> +{
> +       if (!ptr->data)
> +               return 0;
> +
> +       ____bpf_ringbuf_discard(ptr->data, flags);
> +

ditto


> +       bpf_dynptr_set_null(ptr);
> +
> +       return 0;
> +}
> +

[...]
Joanne Koong May 9, 2022, 7:44 p.m. UTC | #2
On Fri, May 6, 2022 at 4:41 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Apr 28, 2022 at 2:12 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > Currently, our only way of writing dynamically-sized data into a ring
> > buffer is through bpf_ringbuf_output but this incurs an extra memcpy
> > cost. bpf_ringbuf_reserve + bpf_ringbuf_commit avoids this extra
> > memcpy, but it can only safely support reservation sizes that are
> > statically known since the verifier cannot guarantee that the bpf
> > program won’t access memory outside the reserved space.
> >
> > The bpf_dynptr abstraction allows for dynamically-sized ring buffer
> > reservations without the extra memcpy.
> >
> > There are 3 new APIs:
> >
> > long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags, struct bpf_dynptr *ptr);
> > void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags);
> > void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags);
> >
> > These closely follow the functionalities of the original ringbuf APIs.
> > For example, all ringbuffer dynptrs that have been reserved must be
> > either submitted or discarded before the program exits.
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > ---
>
> Looks great! Modulo those four underscores, they are super confusing...
>
> >  include/linux/bpf.h            | 10 ++++-
> >  include/uapi/linux/bpf.h       | 35 +++++++++++++++++
> >  kernel/bpf/helpers.c           |  6 +++
> >  kernel/bpf/ringbuf.c           | 71 ++++++++++++++++++++++++++++++++++
> >  kernel/bpf/verifier.c          | 18 +++++++--
> >  tools/include/uapi/linux/bpf.h | 35 +++++++++++++++++
> >  6 files changed, 171 insertions(+), 4 deletions(-)
> >
[...]
> >
>
> [...]
>
> > + *
> > + * void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags)
> > + *     Description
> > + *             Discard reserved ring buffer sample through the dynptr
> > + *             interface. This is a no-op if the dynptr is invalid/null.
> > + *
> > + *             For more information on *flags*, please see
> > + *             'bpf_ringbuf_discard'.
> > + *     Return
> > + *             Nothing. Always succeeds.
> >   */
>
> let's also add bpf_dynptr_is_null() (or bpf_dynptr_is_valid(), not
> sure which one is more appropriate, probably just null one), so we can
> check in code whether some reservation was successful without knowing
> bpf_ringbuf_reserve_dynptr()'s return value
I'm planning to add bpf_dynptr_is_null() in the 3rd dynptr patchset
(convenience helpers). Do you prefer that this be part of this
patchset instead? If so, do you think this should be part of the 2nd
patch (aka the one where we set up the infra for dynptrs + implement
malloc-type dynptrs) or this ringbuf patch or its own patch?
>
>
[...]
> [...]
Andrii Nakryiko May 9, 2022, 8:28 p.m. UTC | #3
On Mon, May 9, 2022 at 12:44 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Fri, May 6, 2022 at 4:41 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Thu, Apr 28, 2022 at 2:12 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> > >
> > > Currently, our only way of writing dynamically-sized data into a ring
> > > buffer is through bpf_ringbuf_output but this incurs an extra memcpy
> > > cost. bpf_ringbuf_reserve + bpf_ringbuf_commit avoids this extra
> > > memcpy, but it can only safely support reservation sizes that are
> > > statically known since the verifier cannot guarantee that the bpf
> > > program won’t access memory outside the reserved space.
> > >
> > > The bpf_dynptr abstraction allows for dynamically-sized ring buffer
> > > reservations without the extra memcpy.
> > >
> > > There are 3 new APIs:
> > >
> > > long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags, struct bpf_dynptr *ptr);
> > > void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags);
> > > void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags);
> > >
> > > These closely follow the functionalities of the original ringbuf APIs.
> > > For example, all ringbuffer dynptrs that have been reserved must be
> > > either submitted or discarded before the program exits.
> > >
> > > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > > ---
> >
> > Looks great! Modulo those four underscores, they are super confusing...
> >
> > >  include/linux/bpf.h            | 10 ++++-
> > >  include/uapi/linux/bpf.h       | 35 +++++++++++++++++
> > >  kernel/bpf/helpers.c           |  6 +++
> > >  kernel/bpf/ringbuf.c           | 71 ++++++++++++++++++++++++++++++++++
> > >  kernel/bpf/verifier.c          | 18 +++++++--
> > >  tools/include/uapi/linux/bpf.h | 35 +++++++++++++++++
> > >  6 files changed, 171 insertions(+), 4 deletions(-)
> > >
> [...]
> > >
> >
> > [...]
> >
> > > + *
> > > + * void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags)
> > > + *     Description
> > > + *             Discard reserved ring buffer sample through the dynptr
> > > + *             interface. This is a no-op if the dynptr is invalid/null.
> > > + *
> > > + *             For more information on *flags*, please see
> > > + *             'bpf_ringbuf_discard'.
> > > + *     Return
> > > + *             Nothing. Always succeeds.
> > >   */
> >
> > let's also add bpf_dynptr_is_null() (or bpf_dynptr_is_valid(), not
> > sure which one is more appropriate, probably just null one), so we can
> > check in code whether some reservation was successful without knowing
> > bpf_ringbuf_reserve_dynptr()'s return value
> I'm planning to add bpf_dynptr_is_null() in the 3rd dynptr patchset
> (convenience helpers). Do you prefer that this be part of this
> patchset instead? If so, do you think this should be part of the 2nd
> patch (aka the one where we set up the infra for dynptrs + implement
> malloc-type dynptrs) or this ringbuf patch or its own patch?

No problem adding it in a follow up patch.

BTW, is it still in the plan to be able to create bpf_dynptr() from
map_value, global variables, etc? I.e., it's a LOCAL dynptr except
memory is not on STACK.

Something like

int k = 123;
struct my_val *v;
struct bpf_dynptr p;

v = bpf_map_lookup_elem(&my_map, &k);
if (!v) return 0;

bpf_dynptr_from_mem(&v->my_data, &p);

/* p points inside my_map's value */

?


> >
> >
> [...]
> > [...]
Joanne Koong May 9, 2022, 8:35 p.m. UTC | #4
On Mon, May 9, 2022 at 1:28 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Mon, May 9, 2022 at 12:44 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > On Fri, May 6, 2022 at 4:41 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Thu, Apr 28, 2022 at 2:12 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> > > >
> > > > Currently, our only way of writing dynamically-sized data into a ring
> > > > buffer is through bpf_ringbuf_output but this incurs an extra memcpy
> > > > cost. bpf_ringbuf_reserve + bpf_ringbuf_commit avoids this extra
> > > > memcpy, but it can only safely support reservation sizes that are
> > > > statically known since the verifier cannot guarantee that the bpf
> > > > program won’t access memory outside the reserved space.
> > > >
> > > > The bpf_dynptr abstraction allows for dynamically-sized ring buffer
> > > > reservations without the extra memcpy.
> > > >
> > > > There are 3 new APIs:
> > > >
> > > > long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags, struct bpf_dynptr *ptr);
> > > > void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags);
> > > > void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags);
> > > >
> > > > These closely follow the functionalities of the original ringbuf APIs.
> > > > For example, all ringbuffer dynptrs that have been reserved must be
> > > > either submitted or discarded before the program exits.
> > > >
> > > > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > > > ---
> > >
> > > Looks great! Modulo those four underscores, they are super confusing...
> > >
> > > >  include/linux/bpf.h            | 10 ++++-
> > > >  include/uapi/linux/bpf.h       | 35 +++++++++++++++++
> > > >  kernel/bpf/helpers.c           |  6 +++
> > > >  kernel/bpf/ringbuf.c           | 71 ++++++++++++++++++++++++++++++++++
> > > >  kernel/bpf/verifier.c          | 18 +++++++--
> > > >  tools/include/uapi/linux/bpf.h | 35 +++++++++++++++++
> > > >  6 files changed, 171 insertions(+), 4 deletions(-)
> > > >
> > [...]
> > > >
> > >
> > > [...]
> > >
> > > > + *
> > > > + * void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags)
> > > > + *     Description
> > > > + *             Discard reserved ring buffer sample through the dynptr
> > > > + *             interface. This is a no-op if the dynptr is invalid/null.
> > > > + *
> > > > + *             For more information on *flags*, please see
> > > > + *             'bpf_ringbuf_discard'.
> > > > + *     Return
> > > > + *             Nothing. Always succeeds.
> > > >   */
> > >
> > > let's also add bpf_dynptr_is_null() (or bpf_dynptr_is_valid(), not
> > > sure which one is more appropriate, probably just null one), so we can
> > > check in code whether some reservation was successful without knowing
> > > bpf_ringbuf_reserve_dynptr()'s return value
> > I'm planning to add bpf_dynptr_is_null() in the 3rd dynptr patchset
> > (convenience helpers). Do you prefer that this be part of this
> > patchset instead? If so, do you think this should be part of the 2nd
> > patch (aka the one where we set up the infra for dynptrs + implement
> > malloc-type dynptrs) or this ringbuf patch or its own patch?
>
> No problem adding it in a follow up patch.
>
> BTW, is it still in the plan to be able to create bpf_dynptr() from
> map_value, global variables, etc? I.e., it's a LOCAL dynptr except
> memory is not on STACK.
>
> Something like
>
> int k = 123;
> struct my_val *v;
> struct bpf_dynptr p;
>
> v = bpf_map_lookup_elem(&my_map, &k);
> if (!v) return 0;
>
> bpf_dynptr_from_mem(&v->my_data, &p);
>
> /* p points inside my_map's value */
>
> ?
The plan is to still support some types of local dynptrs (eg dynptr to
ctx skbuf / xdp data). If it would be useful to also have this for
map_value, we can add this as well (the RCU protects against the map
value being freed out from under the dynptr, I believe).
>
>
> > >
> > >
> > [...]
> > > [...]
Andrii Nakryiko May 9, 2022, 8:58 p.m. UTC | #5
On Mon, May 9, 2022 at 1:35 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Mon, May 9, 2022 at 1:28 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Mon, May 9, 2022 at 12:44 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> > >
> > > On Fri, May 6, 2022 at 4:41 PM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > >
> > > > On Thu, Apr 28, 2022 at 2:12 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> > > > >
> > > > > Currently, our only way of writing dynamically-sized data into a ring
> > > > > buffer is through bpf_ringbuf_output but this incurs an extra memcpy
> > > > > cost. bpf_ringbuf_reserve + bpf_ringbuf_commit avoids this extra
> > > > > memcpy, but it can only safely support reservation sizes that are
> > > > > statically known since the verifier cannot guarantee that the bpf
> > > > > program won’t access memory outside the reserved space.
> > > > >
> > > > > The bpf_dynptr abstraction allows for dynamically-sized ring buffer
> > > > > reservations without the extra memcpy.
> > > > >
> > > > > There are 3 new APIs:
> > > > >
> > > > > long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags, struct bpf_dynptr *ptr);
> > > > > void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags);
> > > > > void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags);
> > > > >
> > > > > These closely follow the functionalities of the original ringbuf APIs.
> > > > > For example, all ringbuffer dynptrs that have been reserved must be
> > > > > either submitted or discarded before the program exits.
> > > > >
> > > > > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > > > > ---
> > > >
> > > > Looks great! Modulo those four underscores, they are super confusing...
> > > >
> > > > >  include/linux/bpf.h            | 10 ++++-
> > > > >  include/uapi/linux/bpf.h       | 35 +++++++++++++++++
> > > > >  kernel/bpf/helpers.c           |  6 +++
> > > > >  kernel/bpf/ringbuf.c           | 71 ++++++++++++++++++++++++++++++++++
> > > > >  kernel/bpf/verifier.c          | 18 +++++++--
> > > > >  tools/include/uapi/linux/bpf.h | 35 +++++++++++++++++
> > > > >  6 files changed, 171 insertions(+), 4 deletions(-)
> > > > >
> > > [...]
> > > > >
> > > >
> > > > [...]
> > > >
> > > > > + *
> > > > > + * void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags)
> > > > > + *     Description
> > > > > + *             Discard reserved ring buffer sample through the dynptr
> > > > > + *             interface. This is a no-op if the dynptr is invalid/null.
> > > > > + *
> > > > > + *             For more information on *flags*, please see
> > > > > + *             'bpf_ringbuf_discard'.
> > > > > + *     Return
> > > > > + *             Nothing. Always succeeds.
> > > > >   */
> > > >
> > > > let's also add bpf_dynptr_is_null() (or bpf_dynptr_is_valid(), not
> > > > sure which one is more appropriate, probably just null one), so we can
> > > > check in code whether some reservation was successful without knowing
> > > > bpf_ringbuf_reserve_dynptr()'s return value
> > > I'm planning to add bpf_dynptr_is_null() in the 3rd dynptr patchset
> > > (convenience helpers). Do you prefer that this be part of this
> > > patchset instead? If so, do you think this should be part of the 2nd
> > > patch (aka the one where we set up the infra for dynptrs + implement
> > > malloc-type dynptrs) or this ringbuf patch or its own patch?
> >
> > No problem adding it in a follow up patch.
> >
> > BTW, is it still in the plan to be able to create bpf_dynptr() from
> > map_value, global variables, etc? I.e., it's a LOCAL dynptr except
> > memory is not on STACK.
> >
> > Something like
> >
> > int k = 123;
> > struct my_val *v;
> > struct bpf_dynptr p;
> >
> > v = bpf_map_lookup_elem(&my_map, &k);
> > if (!v) return 0;
> >
> > bpf_dynptr_from_mem(&v->my_data, &p);
> >
> > /* p points inside my_map's value */
> >
> > ?
> The plan is to still support some types of local dynptrs (eg dynptr to
> ctx skbuf / xdp data). If it would be useful to also have this for
> map_value, we can add this as well (the RCU protects against the map
> value being freed out from under the dynptr, I believe).

Yep, I think it's useful and should be pretty straightforward (there
are no self-referential issues as with PTR_TO_STACK).

> >
> >
> > > >
> > > >
> > > [...]
> > > > [...]
diff mbox series

Patch

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 757440406962..10efbec99e93 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -394,7 +394,10 @@  enum bpf_type_flag {
 	/* DYNPTR points to dynamically allocated memory. */
 	DYNPTR_TYPE_MALLOC	= BIT(8 + BPF_BASE_TYPE_BITS),
 
-	__BPF_TYPE_LAST_FLAG	= DYNPTR_TYPE_MALLOC,
+	/* DYNPTR points to a ringbuf record. */
+	DYNPTR_TYPE_RINGBUF	= BIT(9 + BPF_BASE_TYPE_BITS),
+
+	__BPF_TYPE_LAST_FLAG	= DYNPTR_TYPE_RINGBUF,
 };
 
 /* Max number of base types. */
@@ -2203,6 +2206,9 @@  extern const struct bpf_func_proto bpf_ringbuf_reserve_proto;
 extern const struct bpf_func_proto bpf_ringbuf_submit_proto;
 extern const struct bpf_func_proto bpf_ringbuf_discard_proto;
 extern const struct bpf_func_proto bpf_ringbuf_query_proto;
+extern const struct bpf_func_proto bpf_ringbuf_reserve_dynptr_proto;
+extern const struct bpf_func_proto bpf_ringbuf_submit_dynptr_proto;
+extern const struct bpf_func_proto bpf_ringbuf_discard_dynptr_proto;
 extern const struct bpf_func_proto bpf_skc_to_tcp6_sock_proto;
 extern const struct bpf_func_proto bpf_skc_to_tcp_sock_proto;
 extern const struct bpf_func_proto bpf_skc_to_tcp_timewait_sock_proto;
@@ -2370,6 +2376,8 @@  enum bpf_dynptr_type {
 	BPF_DYNPTR_TYPE_INVALID,
 	/* Memory allocated dynamically by the kernel for the dynptr */
 	BPF_DYNPTR_TYPE_MALLOC,
+	/* Underlying data is a ringbuf record */
+	BPF_DYNPTR_TYPE_RINGBUF,
 };
 
 /* Since the upper 8 bits of dynptr->size is reserved, the
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 5a87ed654016..679f960d2514 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5177,6 +5177,38 @@  union bpf_attr {
  *		After this operation, *ptr* will be an invalidated dynptr.
  *	Return
  *		Void.
+ *
+ * long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags, struct bpf_dynptr *ptr)
+ *	Description
+ *		Reserve *size* bytes of payload in a ring buffer *ringbuf*
+ *		through the dynptr interface. *flags* must be 0.
+ *
+ *		Please note that a corresponding bpf_ringbuf_submit_dynptr or
+ *		bpf_ringbuf_discard_dynptr must be called on *ptr*, even if the
+ *		reservation fails. This is enforced by the verifier.
+ *	Return
+ *		0 on success, or a negative error in case of failure.
+ *
+ * void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags)
+ *	Description
+ *		Submit reserved ring buffer sample, pointed to by *data*,
+ *		through the dynptr interface. This is a no-op if the dynptr is
+ *		invalid/null.
+ *
+ *		For more information on *flags*, please see
+ *		'bpf_ringbuf_submit'.
+ *	Return
+ *		Nothing. Always succeeds.
+ *
+ * void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags)
+ *	Description
+ *		Discard reserved ring buffer sample through the dynptr
+ *		interface. This is a no-op if the dynptr is invalid/null.
+ *
+ *		For more information on *flags*, please see
+ *		'bpf_ringbuf_discard'.
+ *	Return
+ *		Nothing. Always succeeds.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5376,6 +5408,9 @@  union bpf_attr {
 	FN(kptr_xchg),			\
 	FN(dynptr_alloc),		\
 	FN(dynptr_put),			\
+	FN(ringbuf_reserve_dynptr),	\
+	FN(ringbuf_submit_dynptr),	\
+	FN(ringbuf_discard_dynptr),	\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index a4272e9239ea..2d6f2e28b580 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1513,6 +1513,12 @@  bpf_base_func_proto(enum bpf_func_id func_id)
 		return &bpf_ringbuf_discard_proto;
 	case BPF_FUNC_ringbuf_query:
 		return &bpf_ringbuf_query_proto;
+	case BPF_FUNC_ringbuf_reserve_dynptr:
+		return &bpf_ringbuf_reserve_dynptr_proto;
+	case BPF_FUNC_ringbuf_submit_dynptr:
+		return &bpf_ringbuf_submit_dynptr_proto;
+	case BPF_FUNC_ringbuf_discard_dynptr:
+		return &bpf_ringbuf_discard_dynptr_proto;
 	case BPF_FUNC_for_each_map_elem:
 		return &bpf_for_each_map_elem_proto;
 	case BPF_FUNC_loop:
diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
index 311264ab80c4..685bee459525 100644
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@ -475,3 +475,74 @@  const struct bpf_func_proto bpf_ringbuf_query_proto = {
 	.arg1_type	= ARG_CONST_MAP_PTR,
 	.arg2_type	= ARG_ANYTHING,
 };
+
+BPF_CALL_4(bpf_ringbuf_reserve_dynptr, struct bpf_map *, map, u32, size, u64, flags,
+	   struct bpf_dynptr_kern *, ptr)
+{
+	void *sample;
+	int err;
+
+	err = bpf_dynptr_check_size(size);
+	if (err) {
+		bpf_dynptr_set_null(ptr);
+		return err;
+	}
+
+	sample = (void __force *)____bpf_ringbuf_reserve(map, size, flags);
+
+	if (!sample) {
+		bpf_dynptr_set_null(ptr);
+		return -EINVAL;
+	}
+
+	bpf_dynptr_init(ptr, sample, BPF_DYNPTR_TYPE_RINGBUF, 0, size);
+
+	return 0;
+}
+
+const struct bpf_func_proto bpf_ringbuf_reserve_dynptr_proto = {
+	.func		= bpf_ringbuf_reserve_dynptr,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_CONST_MAP_PTR,
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_ANYTHING,
+	.arg4_type	= ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_RINGBUF | MEM_UNINIT,
+};
+
+BPF_CALL_2(bpf_ringbuf_submit_dynptr, struct bpf_dynptr_kern *, ptr, u64, flags)
+{
+	if (!ptr->data)
+		return 0;
+
+	____bpf_ringbuf_submit(ptr->data, flags);
+
+	bpf_dynptr_set_null(ptr);
+
+	return 0;
+}
+
+const struct bpf_func_proto bpf_ringbuf_submit_dynptr_proto = {
+	.func		= bpf_ringbuf_submit_dynptr,
+	.ret_type	= RET_VOID,
+	.arg1_type	= ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_RINGBUF | OBJ_RELEASE,
+	.arg2_type	= ARG_ANYTHING,
+};
+
+BPF_CALL_2(bpf_ringbuf_discard_dynptr, struct bpf_dynptr_kern *, ptr, u64, flags)
+{
+	if (!ptr->data)
+		return 0;
+
+	____bpf_ringbuf_discard(ptr->data, flags);
+
+	bpf_dynptr_set_null(ptr);
+
+	return 0;
+}
+
+const struct bpf_func_proto bpf_ringbuf_discard_dynptr_proto = {
+	.func		= bpf_ringbuf_discard_dynptr,
+	.ret_type	= RET_VOID,
+	.arg1_type	= ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_RINGBUF | OBJ_RELEASE,
+	.arg2_type	= ARG_ANYTHING,
+};
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 16b7ea54a7e0..1b2ec1049368 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -672,13 +672,15 @@  static void mark_verifier_state_scratched(struct bpf_verifier_env *env)
 	env->scratched_stack_slots = ~0ULL;
 }
 
-#define DYNPTR_TYPE_FLAG_MASK		DYNPTR_TYPE_MALLOC
+#define DYNPTR_TYPE_FLAG_MASK (DYNPTR_TYPE_MALLOC | DYNPTR_TYPE_RINGBUF)
 
 static int arg_to_dynptr_type(enum bpf_arg_type arg_type)
 {
 	switch (arg_type & DYNPTR_TYPE_FLAG_MASK) {
 	case DYNPTR_TYPE_MALLOC:
 		return BPF_DYNPTR_TYPE_MALLOC;
+	case DYNPTR_TYPE_RINGBUF:
+		return BPF_DYNPTR_TYPE_RINGBUF;
 	default:
 		return BPF_DYNPTR_TYPE_INVALID;
 	}
@@ -686,7 +688,7 @@  static int arg_to_dynptr_type(enum bpf_arg_type arg_type)
 
 static inline bool dynptr_type_refcounted(enum bpf_dynptr_type type)
 {
-	return type == BPF_DYNPTR_TYPE_MALLOC;
+	return type == BPF_DYNPTR_TYPE_MALLOC || type == BPF_DYNPTR_TYPE_RINGBUF;
 }
 
 static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
@@ -6033,9 +6035,13 @@  static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 			case DYNPTR_TYPE_MALLOC:
 				err_extra = "malloc ";
 				break;
+			case DYNPTR_TYPE_RINGBUF:
+				err_extra = "ringbuf ";
+				break;
 			default:
 				break;
 			}
+
 			verbose(env, "Expected an initialized %sdynptr as arg #%d\n",
 				err_extra, arg + BPF_REG_1);
 			return -EINVAL;
@@ -6161,7 +6167,10 @@  static int check_map_func_compatibility(struct bpf_verifier_env *env,
 	case BPF_MAP_TYPE_RINGBUF:
 		if (func_id != BPF_FUNC_ringbuf_output &&
 		    func_id != BPF_FUNC_ringbuf_reserve &&
-		    func_id != BPF_FUNC_ringbuf_query)
+		    func_id != BPF_FUNC_ringbuf_query &&
+		    func_id != BPF_FUNC_ringbuf_reserve_dynptr &&
+		    func_id != BPF_FUNC_ringbuf_submit_dynptr &&
+		    func_id != BPF_FUNC_ringbuf_discard_dynptr)
 			goto error;
 		break;
 	case BPF_MAP_TYPE_STACK_TRACE:
@@ -6277,6 +6286,9 @@  static int check_map_func_compatibility(struct bpf_verifier_env *env,
 	case BPF_FUNC_ringbuf_output:
 	case BPF_FUNC_ringbuf_reserve:
 	case BPF_FUNC_ringbuf_query:
+	case BPF_FUNC_ringbuf_reserve_dynptr:
+	case BPF_FUNC_ringbuf_submit_dynptr:
+	case BPF_FUNC_ringbuf_discard_dynptr:
 		if (map->map_type != BPF_MAP_TYPE_RINGBUF)
 			goto error;
 		break;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 5a87ed654016..679f960d2514 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5177,6 +5177,38 @@  union bpf_attr {
  *		After this operation, *ptr* will be an invalidated dynptr.
  *	Return
  *		Void.
+ *
+ * long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags, struct bpf_dynptr *ptr)
+ *	Description
+ *		Reserve *size* bytes of payload in a ring buffer *ringbuf*
+ *		through the dynptr interface. *flags* must be 0.
+ *
+ *		Please note that a corresponding bpf_ringbuf_submit_dynptr or
+ *		bpf_ringbuf_discard_dynptr must be called on *ptr*, even if the
+ *		reservation fails. This is enforced by the verifier.
+ *	Return
+ *		0 on success, or a negative error in case of failure.
+ *
+ * void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags)
+ *	Description
+ *		Submit reserved ring buffer sample, pointed to by *data*,
+ *		through the dynptr interface. This is a no-op if the dynptr is
+ *		invalid/null.
+ *
+ *		For more information on *flags*, please see
+ *		'bpf_ringbuf_submit'.
+ *	Return
+ *		Nothing. Always succeeds.
+ *
+ * void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags)
+ *	Description
+ *		Discard reserved ring buffer sample through the dynptr
+ *		interface. This is a no-op if the dynptr is invalid/null.
+ *
+ *		For more information on *flags*, please see
+ *		'bpf_ringbuf_discard'.
+ *	Return
+ *		Nothing. Always succeeds.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5376,6 +5408,9 @@  union bpf_attr {
 	FN(kptr_xchg),			\
 	FN(dynptr_alloc),		\
 	FN(dynptr_put),			\
+	FN(ringbuf_reserve_dynptr),	\
+	FN(ringbuf_submit_dynptr),	\
+	FN(ringbuf_discard_dynptr),	\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper