[bpf-next,v1,7/8] bpf: Add bpf_dynptr_iterator

Message ID 20220908000254.3079129-8-joannelkoong@gmail.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Series Dynptr convenience helpers

Checks

Context Check Description
netdev/tree_selection success Clearly marked for bpf-next, async
netdev/apply fail Patch does not apply to bpf-next
bpf/vmtest-bpf-next-PR fail merge-conflict

Commit Message

Joanne Koong Sept. 8, 2022, 12:02 a.m. UTC
Add a new helper function, bpf_dynptr_iterator:

  long bpf_dynptr_iterator(struct bpf_dynptr *ptr, void *callback_fn,
			   void *callback_ctx, u64 flags)

where callback_fn is defined as:

  long (*callback_fn)(struct bpf_dynptr *ptr, void *ctx)

and callback_fn returns the number of bytes to advance the
dynptr by (or an error code in the case of error). The iteration
will stop if the callback_fn returns 0 or an error or tries to
advance by more bytes than available.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 include/uapi/linux/bpf.h       | 20 ++++++++++++++
 kernel/bpf/helpers.c           | 48 +++++++++++++++++++++++++++++++---
 kernel/bpf/verifier.c          | 27 +++++++++++++++++++
 tools/include/uapi/linux/bpf.h | 20 ++++++++++++++
 4 files changed, 111 insertions(+), 4 deletions(-)
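
For orientation, a minimal BPF-side usage sketch of the proposed helper.
The map, record layout, and program below are hypothetical;
bpf_dynptr_from_mem() and bpf_dynptr_read() are existing dynptr helpers,
and the bpf_dynptr_iterator() declaration is assumed from this series:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* hypothetical fixed-size record layout */
struct rec { __u64 v; };
struct data { struct rec recs[8]; };

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, int);
	__type(value, struct data);
} array_map SEC(".maps");

struct iter_ctx { __u64 sum; };

static long sum_cb(struct bpf_dynptr *ptr, struct iter_ctx *ctx)
{
	struct rec r;

	/* reads go through the current view; they do not advance it */
	if (bpf_dynptr_read(&r, sizeof(r), ptr, 0, 0))
		return -1;		/* an error stops the iteration */

	ctx->sum += r.v;
	return sizeof(r);		/* advance by one record; 0 stops */
}

SEC("tp/syscalls/sys_enter_getpid")
int sum_records(void *c)
{
	struct iter_ctx ctx = {};
	struct bpf_dynptr ptr;
	struct data *val;
	int key = 0;

	val = bpf_map_lookup_elem(&array_map, &key);
	if (!val)
		return 0;

	if (bpf_dynptr_from_mem(val, sizeof(*val), 0, &ptr))
		return 0;

	bpf_dynptr_iterator(&ptr, sum_cb, &ctx, 0);
	return 0;
}

char _license[] SEC("license") = "GPL";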

Comments

Kumar Kartikeya Dwivedi Sept. 19, 2022, 12:07 a.m. UTC | #1
On Thu, 8 Sept 2022 at 02:16, Joanne Koong <joannelkoong@gmail.com> wrote:
>
> Add a new helper function, bpf_dynptr_iterator:
>
>   long bpf_dynptr_iterator(struct bpf_dynptr *ptr, void *callback_fn,
>                            void *callback_ctx, u64 flags)
>
> where callback_fn is defined as:
>
>   long (*callback_fn)(struct bpf_dynptr *ptr, void *ctx)
>
> and callback_fn returns the number of bytes to advance the
> dynptr by (or an error code in the case of error). The iteration
> will stop if the callback_fn returns 0 or an error or tries to
> advance by more bytes than available.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---

This is buggy as is.

A user can just reinitialize the dynptr from inside the callback, and
then you will never stop advancing it inside your helper, therefore an
infinite loop can be constructed. The stack frame of the caller is
accessible using callback_ctx.

For example (modifying your selftest)

diff --git a/tools/testing/selftests/bpf/progs/dynptr_success.c
b/tools/testing/selftests/bpf/progs/dynptr_success.c
index 22164ad6df9d..a9e78316c508 100644
--- a/tools/testing/selftests/bpf/progs/dynptr_success.c
+++ b/tools/testing/selftests/bpf/progs/dynptr_success.c
@@ -540,6 +540,19 @@ struct iter_ctx {

 static int iter_callback(struct bpf_dynptr *ptr, struct iter_ctx *ctx)
 {
+       struct map_value *map_val;
+       int key = 0;
+
+       map_val = bpf_map_lookup_elem(&array_map2, &key);
+       if (!map_val) {
+               return 0;
+       }
+
+       *(void **)ptr = 0;
+       *((void **)ptr + 1) = 0;
+       bpf_dynptr_from_mem(map_val, 2, 0, ptr);
+       return 1;
+
        if (ctx->trigger_err_erange)
                return bpf_dynptr_get_size(ptr) + 1;

... leads to a lockup.

It doesn't have to be bpf_ringbuf_reserve_dynptr; it can just be
bpf_dynptr_from_mem, which also gets around reference state restrictions
inside callbacks.

You cannot prevent overwriting dynptr stack slots in general: some of
them don't have to be released, and forbidding overwrites would be
prohibitive for stack reuse.

So it seems like you need to temporarily mark both slots as immutable
in the caller's stack state during exploration of the callback.
Setting some flag in r1 for the callback is not enough (the callback
can reload a PTR_TO_STACK to the caller's stack frame pointing at the
dynptr from callback_ctx); the flag needs to be set in spilled_ptr.

Then operations that modify the view of the dynptr (e.g. trim, advance,
init functions) would not accept a dynptr with that type flag set,
while for others that only operate on the underlying data (e.g.
read/write/dynptr_data) you fold the flag.

It is the difference between struct bpf_dynptr * and const struct
bpf_dynptr *; we need to give the callback access to the latter.
I.e. it should still allow accessing the dynptr's view, but not modifying it.
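
Roughly, the contract that implies for the callback (illustrative
sketch only, not code from this series):

static long cb(struct bpf_dynptr *ptr, void *ctx)
{
	__u8 b;

	/* fine either way: only reads through the current view */
	bpf_dynptr_read(&b, sizeof(b), ptr, 0, 0);

	/* would be rejected for the const view, since these mutate or
	 * re-initialize the view itself:
	 *
	 *	bpf_dynptr_advance(ptr, 1);
	 *	bpf_dynptr_from_mem(buf, sizeof(buf), 0, ptr);
	 */
	return sizeof(b);
}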

And at the risk of sounding like a broken record (and same suggestion
as Martin in skb/xdp v6 set), the view's mutability should ideally
also be part of the verifier's state. That doesn't preclude runtime
tracking later, but there seems to be no strong motivation for that
right now.

> [...]
Andrii Nakryiko Sept. 28, 2022, 10:41 p.m. UTC | #2
On Wed, Sep 7, 2022 at 5:07 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> [...]
> +       while (ptr->size > 0) {
> +               nr_bytes = callback((u64)(long)ptr, (u64)(long)callback_ctx, 0, 0, 0);
> +               if (nr_bytes <= 0)
> +                       return nr_bytes;

callback is defined as returning long but here you are silently
truncating it to int. Let's either stick to longs or to ints.

> +
> +               err = __bpf_dynptr_advance(ptr, nr_bytes);

as Kumar pointed out, you can't modify ptr in place, you have to
create a local copy and bpf_dynptr_clone() it (so that for MALLOC
you'll bump refcnt). Then advance and pass it to callback. David has
such local dynptr use case in bpf_user_ringbuf_drain() helper.

> +               if (err)
> +                       return err;
> +       }
> +
> +       return 0;
> +}
> +

[...]
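
A hedged sketch of the helper's loop with that suggestion applied. It
assumes a plain struct copy is enough for non-refcounted dynptrs; a
MALLOC-backed dynptr would need its refcount bumped, which is what a
kernel-internal clone along the lines of bpf_dynptr_clone() would do.
It also keeps the callback's return value as a long, per the comment
above:

BPF_CALL_4(bpf_dynptr_iterator, struct bpf_dynptr_kern *, ptr, void *, callback_fn,
	   void *, callback_ctx, u64, flags)
{
	bpf_callback_t callback = (bpf_callback_t)callback_fn;
	struct bpf_dynptr_kern it;
	long nr_bytes;
	int err;

	if (!ptr->data || flags)
		return -EINVAL;

	/* iterate on a local view so *ptr is never advanced in place;
	 * a refcounted (MALLOC) dynptr would need a real clone here */
	it = *ptr;

	while (it.size > 0) {
		nr_bytes = callback((u64)(long)&it, (u64)(long)callback_ctx, 0, 0, 0);
		if (nr_bytes <= 0)
			return nr_bytes;

		err = __bpf_dynptr_advance(&it, nr_bytes);
		if (err)
			return err;
	}

	return 0;
}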
Andrii Nakryiko Sept. 28, 2022, 10:47 p.m. UTC | #3
On Sun, Sep 18, 2022 at 5:08 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Thu, 8 Sept 2022 at 02:16, Joanne Koong <joannelkoong@gmail.com> wrote:
> > [...]
>
> This is buggy as is.
>
> A user can just reinitialize the dynptr from inside the callback, and
> then you will never stop advancing it inside your helper, therefore an
> infinite loop can be constructed. The stack frame of the caller is
> accessible using callback_ctx.
>
> For example (modifying your selftest)
>
> diff --git a/tools/testing/selftests/bpf/progs/dynptr_success.c
> b/tools/testing/selftests/bpf/progs/dynptr_success.c
> index 22164ad6df9d..a9e78316c508 100644
> --- a/tools/testing/selftests/bpf/progs/dynptr_success.c
> +++ b/tools/testing/selftests/bpf/progs/dynptr_success.c
> @@ -540,6 +540,19 @@ struct iter_ctx {
>
>  static int iter_callback(struct bpf_dynptr *ptr, struct iter_ctx *ctx)
>  {
> +       struct map_value *map_val;
> +       int key = 0;
> +
> +       map_val = bpf_map_lookup_elem(&array_map2, &key);
> +       if (!map_val) {
> +               return 0;
> +       }
> +
> +       *(void **)ptr = 0;
> +       *((void **)ptr + 1) = 0;
> +       bpf_dynptr_from_mem(map_val, 2, 0, ptr);
> +       return 1;
> +
>         if (ctx->trigger_err_erange)
>                 return bpf_dynptr_get_size(ptr) + 1;
>
> ... leads to a lockup.
>
> It doesn't have to be bpf_ringbuf_reserve_dynptr; it can just be
> bpf_dynptr_from_mem, which also gets around reference state restrictions
> inside callbacks.
>
> You cannot prevent overwriting dynptr stack slots in general: some of
> them don't have to be released, and forbidding overwrites would be
> prohibitive for stack reuse.
>
> So it seems like you need to temporarily mark both slots as immutable
> in the caller's stack state during exploration of the callback.
> Setting some flag in r1 for the callback is not enough (the callback
> can reload a PTR_TO_STACK to the caller's stack frame pointing at the
> dynptr from callback_ctx); the flag needs to be set in spilled_ptr.

This sounds overcomplicated. We need a local copy of the dynptr for the
duration of the iteration and should work with that. Basically an
internal bpf_dynptr_clone(). See my other reply in this thread.

>
> Then operations that modify the view of the dynptr (e.g. trim, advance,
> init functions) would not accept a dynptr with that type flag set,
> while for others that only operate on the underlying data (e.g.
> read/write/dynptr_data) you fold the flag.
>
> It is the difference between struct bpf_dynptr * and const struct
> bpf_dynptr *; we need to give the callback access to the latter.
> I.e. it should still allow accessing the dynptr's view, but not modifying it.
>
> And at the risk of sounding like a broken record (and same suggestion
> as Martin in skb/xdp v6 set), the view's mutability should ideally
> also be part of the verifier's state. That doesn't preclude runtime
> tracking later, but there seems to be no strong motivation for that
> right now.

The unexpected NULL for bpf_dynptr_data() vs bpf_dynptr_data_rdonly()
argument from Martin is pretty convincing, I agree. So I guess I don't
mind tracking it statically at this point.

>
> >  include/uapi/linux/bpf.h       | 20 ++++++++++++++
> >  kernel/bpf/helpers.c           | 48 +++++++++++++++++++++++++++++++---
> >  kernel/bpf/verifier.c          | 27 +++++++++++++++++++
> >  tools/include/uapi/linux/bpf.h | 20 ++++++++++++++
> >  4 files changed, 111 insertions(+), 4 deletions(-)
> >

please trim irrelevant parts

[...]
Kumar Kartikeya Dwivedi Sept. 29, 2022, 12:31 a.m. UTC | #4
On Thu, 29 Sept 2022 at 01:04, Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Wed, Sep 7, 2022 at 5:07 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > [...]
> > +       while (ptr->size > 0) {
> > +               nr_bytes = callback((u64)(long)ptr, (u64)(long)callback_ctx, 0, 0, 0);
> > +               if (nr_bytes <= 0)
> > +                       return nr_bytes;
>
> callback is defined as returning long but here you are silently
> truncating it to int. Let's either stick to longs or to ints.
>
> > +
> > +               err = __bpf_dynptr_advance(ptr, nr_bytes);
>
> as Kumar pointed out, you can't modify ptr in place, you have to
> create a local copy and bpf_dynptr_clone() it (so that for MALLOC
> you'll bump refcnt). Then advance and pass it to callback. David has
> such local dynptr use case in bpf_user_ringbuf_drain() helper.
>

My solution was a bit overcomplicated because just creating a local
copy doesn't fix this completely: there's still the hole of writing
through arg#0 (which would no longer reflect runtime state, as writes
at runtime go to the clone while the verifier thinks you touched stack
slots), and it still allows constructing the infinite loop (because,
well, you can overwrite the dynptr through it). The 'side channel' of
writing to dynptr slots through callback_ctx is still there as well.
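
For instance (hypothetical program, in the same spirit as the selftest
hack above):

/* callback_ctx smuggles a pointer to the caller's on-stack dynptr,
 * so even if the helper iterates on a clone, the callback can still
 * clobber the stack slots the verifier is tracking */
struct sneaky_ctx {
	struct bpf_dynptr *orig;	/* set to &ptr in the caller */
};

static long cb(struct bpf_dynptr *ptr, struct sneaky_ctx *ctx)
{
	/* the runtime write lands on the caller's stack while the
	 * verifier still thinks a valid dynptr lives there */
	*(void **)ctx->orig = 0;
	return 1;
}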

Maybe the infinite loop _can_ be avoided if you clone inside each
iteration; that part isn't very clear.

My reasoning was that when you iterate, the container is always
immutable (to prevent iterator invalidation) while the mutability of
the elements is unaffected. Setting the flag in spilled_ptr covers all
bases (PTR_TO_STACK arg#0, access to it through callback_ctx, etc.).

But I totally agree with you that we should be working on a local copy
inside the helper and leave the original dynptr untouched.
I think then the first arg should not be PTR_TO_STACK but some other
pointer type. Maybe it should be its own register type, and
spilled_ptr should reflect the same register, which allows the dynptr
state that we track to be copied into the callback arg#0 easily.

For the case of writes to the original dynptr that is already broken
right now, we can't track the state of the stack across iterations
correctly anyway so I think that has to be left as is until your N
callbacks rework happens.

wdyt?
Andrii Nakryiko Sept. 29, 2022, 12:43 a.m. UTC | #5
On Wed, Sep 28, 2022 at 5:32 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Thu, 29 Sept 2022 at 01:04, Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > [...]
> > > +
> > > +               err = __bpf_dynptr_advance(ptr, nr_bytes);
> >
> > as Kumar pointed out, you can't modify ptr in place, you have to
> > create a local copy and bpf_dynptr_clone() it (so that for MALLOC
> > you'll bump refcnt). Then advance and pass it to callback. David has
> > such local dynptr use case in bpf_user_ringbuf_drain() helper.
> >
>
> [...]
>
> But I totally agree with you that we should be working on a local copy
> inside the helper and leave the original dynptr untouched.
> I think then the first arg should not be PTR_TO_STACK but some other
> pointer type. Maybe it should be its own register type, and
> spilled_ptr should reflect the same register, which allows the dynptr
> state that we track to be copied into the callback arg#0 easily.

Right, something like what David Vernet did with his
bpf_user_ringbuf_drain() helper that passes kernel-only (not
PTR_TO_STACK) dynptr into callback. If that implementation has the
same hole (being able to be modified), we should fix it the same way
in both cases, by not allowing this for PTR_TO_DYNPTR (or whatever the
new reg type is called).
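
For reference, the shape of that callback as merged for
bpf_user_ringbuf_drain(); the surrounding names are illustrative, and
user_ringbuf would be a BPF_MAP_TYPE_USER_RINGBUF map:

/* the dynptr handed to the callback is kernel-managed rather than a
 * program-constructed PTR_TO_STACK one */
static long drain_cb(struct bpf_dynptr *dynptr, void *ctx)
{
	__u32 hdr;

	if (bpf_dynptr_read(&hdr, sizeof(hdr), dynptr, 0, 0))
		return 1;	/* stop draining */
	/* ... consume the sample ... */
	return 0;		/* drain the next sample */
}

/* in the program: */
long n = bpf_user_ringbuf_drain(&user_ringbuf, drain_cb, NULL, 0);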

>
> For the case of writes to the original dynptr that is already broken
> right now, we can't track the state of the stack across iterations
> correctly anyway so I think that has to be left as is until your N
> callbacks rework happens.

Right, that's a separate issue.

>
> wdyt?
Kumar Kartikeya Dwivedi Oct. 2, 2022, 4:45 p.m. UTC | #6
On Thu, 29 Sept 2022 at 02:43, Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Wed, Sep 28, 2022 at 5:32 PM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > On Thu, 29 Sept 2022 at 01:04, Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > [...]
> >
> > But I totally agree with you that we should be working on a local copy
> > inside the helper and leave the original dynptr untouched.
> > I think then the first arg should not be PTR_TO_STACK but some other
> > pointer type. Maybe it should be its own register type, and
> > spilled_ptr should reflect the same register, which allows the dynptr
> > state that we track to be copied into the callback arg#0 easily.
>
> Right, something like what David Vernet did with his
> bpf_user_ringbuf_drain() helper that passes kernel-only (not
> PTR_TO_STACK) dynptr into callback. If that implementation has the
> same hole (being able to be modified), we should fix it the same way
> in both cases, by not allowing this for PTR_TO_DYNPTR (or whatever the
> new reg type is called).
>

Sorry for the late response.

Somehow I missed that series entirely. Yes, we should reuse that register type,
and it does seem like it needs to check for MEM_UNINIT to prevent
reinitialization. I'm rolling that fix into
my dynptr series that I'm sending next week, since it would lead to
lots of conflicts otherwise.
Then this set can use the same approach.
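
Rough illustration of the hole that MEM_UNINIT check closes; the
program is hypothetical (ringbuf would be a BPF_MAP_TYPE_RINGBUF map)
and is meant to fail verification:

/* re-initializing the kernel-managed dynptr arg from inside the
 * callback: dynptr-init helpers take ARG_PTR_TO_DYNPTR | MEM_UNINIT,
 * so an already-initialized PTR_TO_DYNPTR arg must be rejected */
static long drain_cb(struct bpf_dynptr *dynptr, void *ctx)
{
	/* should not pass the verifier once MEM_UNINIT is checked */
	bpf_ringbuf_reserve_dynptr(&ringbuf, 8, 0, dynptr);
	return 1;
}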
Andrii Nakryiko Oct. 3, 2022, 6:39 p.m. UTC | #7
On Sun, Oct 2, 2022 at 9:46 AM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> [...]
>
> Sorry for the late response.
>
> Somehow I missed that series entirely. Yes, we should reuse that register type,
> and it does seem like it needs to check for MEM_UNINIT to prevent
> reinitialization. I'm rolling that fix into
> my dynptr series that I'm sending next week, since it would lead to
> lots of conflicts otherwise.
> Then this set can use the same approach.

Sounds good, thanks!

Patch

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 16973fa4d073..ff78a94c262a 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5531,6 +5531,25 @@  union bpf_attr {
  *		losing access to the original view of the dynptr.
  *	Return
  *		0 on success, -EINVAL if the dynptr to clone is invalid.
+ *
+ * long bpf_dynptr_iterator(struct bpf_dynptr *ptr, void *callback_fn, void *callback_ctx, u64 flags)
+ *	Description
+ *		Iterate through the dynptr data, calling **callback_fn** on each
+ *		iteration with **callback_ctx** as the context parameter.
+ *		The **callback_fn** should be a static function and
+ *		the **callback_ctx** should be a pointer to the stack.
+ *		Currently **flags** is unused and must be 0.
+ *
+ *		long (\*callback_fn)(struct bpf_dynptr \*ptr, void \*ctx);
+ *
+ *		where **callback_fn** returns the number of bytes to advance
+ *		the dynptr by or an error. The iteration will stop if **callback_fn**
+ *		returns 0 or an error or tries to advance by more bytes than the
+ *		size of the dynptr.
+ *	Return
+ *		0 on success, -EINVAL if the dynptr is invalid or **flags** is not 0,
+ *		-ERANGE if attempting to iterate more bytes than available, or other
+ *		negative error if **callback_fn** returns an error.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5752,6 +5771,7 @@  union bpf_attr {
 	FN(dynptr_get_size),		\
 	FN(dynptr_get_offset),		\
 	FN(dynptr_clone),		\
+	FN(dynptr_iterator),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 667f1e213a61..519b3da06d49 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1653,13 +1653,11 @@  static const struct bpf_func_proto bpf_dynptr_data_proto = {
 	.arg3_type	= ARG_CONST_ALLOC_SIZE_OR_ZERO,
 };
 
-BPF_CALL_2(bpf_dynptr_advance, struct bpf_dynptr_kern *, ptr, u32, len)
+/* *ptr* should always be a valid dynptr */
+static int __bpf_dynptr_advance(struct bpf_dynptr_kern *ptr, u32 len)
 {
 	u32 size;
 
-	if (!ptr->data)
-		return -EINVAL;
-
 	size = __bpf_dynptr_get_size(ptr);
 
 	if (len > size)
@@ -1672,6 +1670,14 @@  BPF_CALL_2(bpf_dynptr_advance, struct bpf_dynptr_kern *, ptr, u32, len)
 	return 0;
 }
 
+BPF_CALL_2(bpf_dynptr_advance, struct bpf_dynptr_kern *, ptr, u32, len)
+{
+	if (!ptr->data)
+		return -EINVAL;
+
+	return __bpf_dynptr_advance(ptr, len);
+}
+
 static const struct bpf_func_proto bpf_dynptr_advance_proto = {
 	.func		= bpf_dynptr_advance,
 	.gpl_only	= false,
@@ -1783,6 +1789,38 @@  static const struct bpf_func_proto bpf_dynptr_clone_proto = {
 	.arg2_type	= ARG_PTR_TO_DYNPTR | MEM_UNINIT,
 };
 
+BPF_CALL_4(bpf_dynptr_iterator, struct bpf_dynptr_kern *, ptr, void *, callback_fn,
+	   void *, callback_ctx, u64, flags)
+{
+	bpf_callback_t callback = (bpf_callback_t)callback_fn;
+	int nr_bytes, err;
+
+	if (!ptr->data || flags)
+		return -EINVAL;
+
+	while (ptr->size > 0) {
+		nr_bytes = callback((u64)(long)ptr, (u64)(long)callback_ctx, 0, 0, 0);
+		if (nr_bytes <= 0)
+			return nr_bytes;
+
+		err = __bpf_dynptr_advance(ptr, nr_bytes);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static const struct bpf_func_proto bpf_dynptr_iterator_proto = {
+	.func		= bpf_dynptr_iterator,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_DYNPTR,
+	.arg2_type	= ARG_PTR_TO_FUNC,
+	.arg3_type	= ARG_PTR_TO_STACK_OR_NULL,
+	.arg4_type	= ARG_ANYTHING,
+};
+
 const struct bpf_func_proto bpf_get_current_task_proto __weak;
 const struct bpf_func_proto bpf_get_current_task_btf_proto __weak;
 const struct bpf_func_proto bpf_probe_read_user_proto __weak;
@@ -1869,6 +1907,8 @@  bpf_base_func_proto(enum bpf_func_id func_id)
 		return &bpf_dynptr_get_offset_proto;
 	case BPF_FUNC_dynptr_clone:
 		return &bpf_dynptr_clone_proto;
+	case BPF_FUNC_dynptr_iterator:
+		return &bpf_dynptr_iterator_proto;
 	default:
 		break;
 	}
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 2eb2a4410344..76108cd4ed85 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -6901,6 +6901,29 @@  static int set_map_elem_callback_state(struct bpf_verifier_env *env,
 	return 0;
 }
 
+static int set_dynptr_iterator_callback_state(struct bpf_verifier_env *env,
+					      struct bpf_func_state *caller,
+					      struct bpf_func_state *callee,
+					      int insn_idx)
+{
+	/* bpf_dynptr_iterator(struct bpf_dynptr *ptr, void *callback_fn,
+	 * void *callback_ctx, u64 flags);
+	 *
+	 * callback_fn(struct bpf_dynptr *ptr, void *callback_ctx);
+	 */
+	callee->regs[BPF_REG_1] = caller->regs[BPF_REG_1];
+	callee->regs[BPF_REG_2] = caller->regs[BPF_REG_3];
+	callee->callback_ret_range = tnum_range(0, U64_MAX);
+
+	/* unused */
+	__mark_reg_not_init(env, &callee->regs[BPF_REG_3]);
+	__mark_reg_not_init(env, &callee->regs[BPF_REG_4]);
+	__mark_reg_not_init(env, &callee->regs[BPF_REG_5]);
+
+	callee->in_callback_fn = true;
+	return 0;
+}
+
 static int set_loop_callback_state(struct bpf_verifier_env *env,
 				   struct bpf_func_state *caller,
 				   struct bpf_func_state *callee,
@@ -7472,6 +7495,10 @@  static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 
 		break;
 	}
+	case BPF_FUNC_dynptr_iterator:
+		err = __check_func_call(env, insn, insn_idx_p, meta.subprogno,
+					set_dynptr_iterator_callback_state);
+		break;
 	}
 
 	if (err)
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 16973fa4d073..ff78a94c262a 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5531,6 +5531,25 @@  union bpf_attr {
  *		losing access to the original view of the dynptr.
  *	Return
  *		0 on success, -EINVAL if the dynptr to clone is invalid.
+ *
+ * long bpf_dynptr_iterator(struct bpf_dynptr *ptr, void *callback_fn, void *callback_ctx, u64 flags)
+ *	Description
+ *		Iterate through the dynptr data, calling **callback_fn** on each
+ *		iteration with **callback_ctx** as the context parameter.
+ *		The **callback_fn** should be a static function and
+ *		the **callback_ctx** should be a pointer to the stack.
+ *		Currently **flags** is unused and must be 0.
+ *
+ *		long (\*callback_fn)(struct bpf_dynptr \*ptr, void \*ctx);
+ *
+ *		where **callback_fn** returns the number of bytes to advance
+ *		the dynptr by or an error. The iteration will stop if **callback_fn**
+ *		returns 0 or an error or tries to advance by more bytes than the
+ *		size of the dynptr.
+ *	Return
+ *		0 on success, -EINVAL if the dynptr is invalid or **flags** is not 0,
+ *		-ERANGE if attempting to iterate more bytes than available, or other
+ *		negative error if **callback_fn** returns an error.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5752,6 +5771,7 @@  union bpf_attr {
 	FN(dynptr_get_size),		\
 	FN(dynptr_get_offset),		\
 	FN(dynptr_clone),		\
+	FN(dynptr_iterator),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper