diff mbox series

[v1,bpf-next,4/5] bpf: Add bpf_dynptr_clone

Message ID 20230409033431.3992432-5-joannelkoong@gmail.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Series: Dynptr convenience helpers

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for bpf-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit fail Errors and warnings before: 67 this patch: 69
netdev/cc_maintainers warning 12 maintainers not CCed: song@kernel.org sdf@google.com haoluo@google.com yhs@fb.com kuba@kernel.org john.fastabend@gmail.com kpsingh@kernel.org jolsa@kernel.org hawk@kernel.org netdev@vger.kernel.org martin.lau@linux.dev davem@davemloft.net
netdev/build_clang success Errors and warnings before: 18 this patch: 18
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn fail Errors and warnings before: 67 this patch: 69
netdev/checkpatch warning WARNING: line length of 103 exceeds 80 columns WARNING: line length of 108 exceeds 80 columns WARNING: line length of 82 exceeds 80 columns WARNING: line length of 83 exceeds 80 columns WARNING: line length of 84 exceeds 80 columns WARNING: line length of 86 exceeds 80 columns WARNING: line length of 87 exceeds 80 columns WARNING: line length of 88 exceeds 80 columns WARNING: line length of 90 exceeds 80 columns WARNING: line length of 91 exceeds 80 columns WARNING: line length of 93 exceeds 80 columns WARNING: line length of 94 exceeds 80 columns WARNING: line length of 95 exceeds 80 columns WARNING: line length of 97 exceeds 80 columns WARNING: line length of 98 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-10 success Logs for test_maps on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-33 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-15 fail Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-20 fail Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-7 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-2 success Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-5 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-6 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-4 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-8 success Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-11 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-12 success Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-16 success Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-18 success Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-21 success Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-23 success Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-25 success Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-26 success Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-27 success Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-29 success Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-30 success Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-31 success Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-32 success Logs for test_verifier on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-34 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-35 success Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-9 success Logs for test_maps on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-13 success Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-14 success Logs for test_progs on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-17 success Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-19 success Logs for test_progs_no_alu32 on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-22 success Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-24 success Logs for test_progs_no_alu32_parallel on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-28 success Logs for test_progs_parallel on aarch64 with llvm-16

Commit Message

Joanne Koong April 9, 2023, 3:34 a.m. UTC
The cloned dynptr will point to the same data as its parent dynptr,
with the same type, offset, size and read-only properties.

Any writes to a dynptr will be reflected across all instances
(by 'instance', this means any dynptrs that point to the same
underlying data).

Please note that data slice and dynptr invalidations will affect all
instances as well. For example, if bpf_dynptr_write() is called on an
skb-type dynptr, all data slices of dynptr instances to that skb
will be invalidated as well (eg data slices of any clones, parents,
grandparents, ...). Another example is if a ringbuf dynptr is submitted,
any instance of that dynptr will be invalidated.

Changing the view of the dynptr (eg advancing the offset or
trimming the size) will only affect that dynptr and not affect any
other instances.

One example use case where cloning may be helpful is for hashing or
iterating through dynptr data. Cloning will allow the user to maintain
the original view of the dynptr for future use, while also allowing
views to smaller subsets of the data after the offset is advanced or the
size is trimmed.
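
A sketch of that use case (a hypothetical program fragment; it assumes
the kfunc names proposed in this series -- bpf_dynptr_clone,
bpf_dynptr_advance, bpf_dynptr_trim, bpf_dynptr_get_size -- plus a
ringbuf map named 'ringbuf', and that trim takes the number of bytes
to cut from the end):

```c
struct bpf_dynptr ptr, view;

bpf_ringbuf_reserve_dynptr(&ringbuf, 64, 0, &ptr);

/* keep the original view in 'ptr'; hash only bytes [8, 24) via 'view' */
if (!bpf_dynptr_clone(&ptr, &view)) {
	bpf_dynptr_advance(&view, 8);
	bpf_dynptr_trim(&view, bpf_dynptr_get_size(&view) - 16);
	/* ... hash the 16-byte window described by 'view' ... */
}

/* submitting invalidates 'ptr', 'view', and any of their data slices */
bpf_ringbuf_submit_dynptr(&ptr, 0);
```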

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 kernel/bpf/helpers.c  |  14 +++++
 kernel/bpf/verifier.c | 125 +++++++++++++++++++++++++++++++++++++-----
 2 files changed, 126 insertions(+), 13 deletions(-)

Comments

Andrii Nakryiko April 12, 2023, 10:12 p.m. UTC | #1
On Sat, Apr 8, 2023 at 8:34 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> The cloned dynptr will point to the same data as its parent dynptr,
> with the same type, offset, size and read-only properties.
>
> Any writes to a dynptr will be reflected across all instances
> (by 'instance', this means any dynptrs that point to the same
> underlying data).
>
> Please note that data slice and dynptr invalidations will affect all
> instances as well. For example, if bpf_dynptr_write() is called on an
> skb-type dynptr, all data slices of dynptr instances to that skb
> will be invalidated as well (eg data slices of any clones, parents,
> grandparents, ...). Another example is if a ringbuf dynptr is submitted,
> any instance of that dynptr will be invalidated.
>
> Changing the view of the dynptr (eg advancing the offset or
> trimming the size) will only affect that dynptr and not affect any
> other instances.
>
> One example use case where cloning may be helpful is for hashing or
> iterating through dynptr data. Cloning will allow the user to maintain
> the original view of the dynptr for future use, while also allowing
> views to smaller subsets of the data after the offset is advanced or the
> size is trimmed.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
>  kernel/bpf/helpers.c  |  14 +++++
>  kernel/bpf/verifier.c | 125 +++++++++++++++++++++++++++++++++++++-----
>  2 files changed, 126 insertions(+), 13 deletions(-)
>
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index bac4c6fe49f0..108f3bcfa6da 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2351,6 +2351,19 @@ __bpf_kfunc __u32 bpf_dynptr_get_offset(const struct bpf_dynptr_kern *ptr)
>         return ptr->offset;
>  }
>
> +__bpf_kfunc int bpf_dynptr_clone(struct bpf_dynptr_kern *ptr,
> +                                struct bpf_dynptr_kern *clone__uninit)

I think most uses of bpf_dynptr_clone() would be to get a partial
view (like you mentioned above, e.g., to do hashing of a part of
the memory range). So it feels like it would be the best UX if clone
allowed you to define a new range in one go. So e.g., to create a
"sub-dynptr" for the range of bytes [10, 30), it should be enough to:

struct bpf_dynptr orig_ptr, new_ptr;

bpf_dynptr_clone(&orig_ptr, 10, 30, &new_ptr);

Instead of:

bpf_dynptr_clone(&orig_ptr, &new_ptr);
bpf_dynptr_advance(&new_ptr, 10);
bpf_dynptr_trim(&new_ptr, bpf_dynptr_get_size(&orig_ptr) - 30);


This, btw, shows the awkwardness of the bpf_dynptr_trim() approach.

If someone really wanted an exact full-sized copy, it's trivial:

bpf_dynptr_clone(&orig_ptr, 0, bpf_dynptr_get_size(&orig_ptr), &new_ptr);


BTW, let's rename bpf_dynptr_get_size -> bpf_dynptr_size()? That
"get_" is a sore in the eye, IMO.


> +{
> +       if (!ptr->data) {
> +               bpf_dynptr_set_null(clone__uninit);
> +               return -EINVAL;
> +       }
> +
> +       memcpy(clone__uninit, ptr, sizeof(*clone__uninit));

why memcpy instead of `*clone__uninit = *ptr`?

> +
> +       return 0;
> +}
> +
>  __bpf_kfunc void *bpf_cast_to_kern_ctx(void *obj)
>  {
>         return obj;
> @@ -2429,6 +2442,7 @@ BTF_ID_FLAGS(func, bpf_dynptr_is_null)
>  BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
>  BTF_ID_FLAGS(func, bpf_dynptr_get_size)
>  BTF_ID_FLAGS(func, bpf_dynptr_get_offset)
> +BTF_ID_FLAGS(func, bpf_dynptr_clone)
>  BTF_SET8_END(common_btf_ids)
>
>  static const struct btf_kfunc_id_set common_kfunc_set = {
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 3660b573048a..804cb50050f9 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -302,6 +302,7 @@ struct bpf_kfunc_call_arg_meta {
>         struct {
>                 enum bpf_dynptr_type type;
>                 u32 id;
> +               u32 ref_obj_id;
>         } initialized_dynptr;
>         struct {
>                 u8 spi;
> @@ -963,24 +964,15 @@ static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_
>         return 0;
>  }
>
> -static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
> +static void invalidate_dynptr(struct bpf_verifier_env *env, struct bpf_func_state *state, int spi)
>  {
> -       struct bpf_func_state *state = func(env, reg);
> -       int spi, i;
> -
> -       spi = dynptr_get_spi(env, reg);
> -       if (spi < 0)
> -               return spi;
> +       int i;
>
>         for (i = 0; i < BPF_REG_SIZE; i++) {
>                 state->stack[spi].slot_type[i] = STACK_INVALID;
>                 state->stack[spi - 1].slot_type[i] = STACK_INVALID;
>         }
>
> -       /* Invalidate any slices associated with this dynptr */
> -       if (dynptr_type_refcounted(state->stack[spi].spilled_ptr.dynptr.type))
> -               WARN_ON_ONCE(release_reference(env, state->stack[spi].spilled_ptr.ref_obj_id));
> -
>         __mark_reg_not_init(env, &state->stack[spi].spilled_ptr);
>         __mark_reg_not_init(env, &state->stack[spi - 1].spilled_ptr);
>
> @@ -1007,6 +999,51 @@ static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_re
>          */
>         state->stack[spi].spilled_ptr.live |= REG_LIVE_WRITTEN;
>         state->stack[spi - 1].spilled_ptr.live |= REG_LIVE_WRITTEN;
> +}
> +
> +static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
> +{
> +       struct bpf_func_state *state = func(env, reg);
> +       int spi;
> +
> +       spi = dynptr_get_spi(env, reg);
> +       if (spi < 0)
> +               return spi;
> +
> +       if (dynptr_type_refcounted(state->stack[spi].spilled_ptr.dynptr.type)) {
> +               int ref_obj_id = state->stack[spi].spilled_ptr.ref_obj_id;
> +               int i;
> +
> +               /* If the dynptr has a ref_obj_id, then we need to invaldiate

typo: invalidate

> +                * two things:
> +                *
> +                * 1) Any dynptrs with a matching ref_obj_id (clones)
> +                * 2) Any slices associated with the ref_obj_id

I think this is where this dynptr_id confusion comes from. The rule
should be "any slices derived from this dynptr". But as mentioned on
another thread, it's a separate topic which we can address later.

> +                */
> +
> +               /* Invalidate any slices associated with this dynptr */
> +               WARN_ON_ONCE(release_reference(env, ref_obj_id));
> +
> +               /* Invalidate any dynptr clones */
> +               for (i = 1; i < state->allocated_stack / BPF_REG_SIZE; i++) {
> +                       if (state->stack[i].spilled_ptr.ref_obj_id == ref_obj_id) {

nit: invert if condition and continue to reduce nestedness of the rest
the loop body

> +                               /* it should always be the case that if the ref obj id
> +                                * matches then the stack slot also belongs to a
> +                                * dynptr
> +                                */
> +                               if (state->stack[i].slot_type[0] != STACK_DYNPTR) {
> +                                       verbose(env, "verifier internal error: misconfigured ref_obj_id\n");
> +                                       return -EFAULT;
> +                               }
> +                               if (state->stack[i].spilled_ptr.dynptr.first_slot)
> +                                       invalidate_dynptr(env, state, i);
> +                       }
> +               }
> +
> +               return 0;
> +       }
> +
> +       invalidate_dynptr(env, state, spi);
>
>         return 0;
>  }
> @@ -6967,6 +7004,50 @@ static int process_iter_next_call(struct bpf_verifier_env *env, int insn_idx,
>         return 0;
>  }
>
> +static int handle_dynptr_clone(struct bpf_verifier_env *env, enum bpf_arg_type arg_type,
> +                              int regno, int insn_idx, struct bpf_kfunc_call_arg_meta *meta)
> +{
> +       struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno];
> +       struct bpf_reg_state *first_reg_state, *second_reg_state;
> +       struct bpf_func_state *state = func(env, reg);
> +       enum bpf_dynptr_type dynptr_type = meta->initialized_dynptr.type;
> +       int err, spi, ref_obj_id;
> +
> +       if (!dynptr_type) {
> +               verbose(env, "verifier internal error: no dynptr type for bpf_dynptr_clone\n");
> +               return -EFAULT;
> +       }
> +       arg_type |= get_dynptr_type_flag(dynptr_type);


what about MEM_RDONLY and MEM_UNINIT flags, do we need to derive and add them?

> +
> +       err = process_dynptr_func(env, regno, insn_idx, arg_type);
> +       if (err < 0)
> +               return err;
> +
> +       spi = dynptr_get_spi(env, reg);
> +       if (spi < 0)
> +               return spi;
> +
> +       first_reg_state = &state->stack[spi].spilled_ptr;
> +       second_reg_state = &state->stack[spi - 1].spilled_ptr;
> +       ref_obj_id = first_reg_state->ref_obj_id;
> +
> +       /* reassign the clone the same dynptr id as the original */
> +       __mark_dynptr_reg(first_reg_state, dynptr_type, true, meta->initialized_dynptr.id);
> +       __mark_dynptr_reg(second_reg_state, dynptr_type, false, meta->initialized_dynptr.id);

I'm not sure why clone should have the same dynptr_id? Isn't it a new
instance of a dynptr? I get preserving ref_obj_id (if refcounted), but
why reusing dynptr_id?..


> +
> +       if (meta->initialized_dynptr.ref_obj_id) {
> +               /* release the new ref obj id assigned during process_dynptr_func */
> +               err = release_reference_state(cur_func(env), ref_obj_id);
> +               if (err)
> +                       return err;

ugh... this is not good to create reference just to release. If we
need to reuse logic, let's reuse the logic without parts that
shouldn't happen. Please check if we can split process_dynptr_func()
appropriately to allow this.

> +               /* reassign the clone the same ref obj id as the original */
> +               first_reg_state->ref_obj_id = meta->initialized_dynptr.ref_obj_id;
> +               second_reg_state->ref_obj_id = meta->initialized_dynptr.ref_obj_id;
> +       }
> +
> +       return 0;
> +}
> +

[...]
Joanne Koong April 14, 2023, 6:02 a.m. UTC | #2
On Wed, Apr 12, 2023 at 3:12 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Sat, Apr 8, 2023 at 8:34 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > The cloned dynptr will point to the same data as its parent dynptr,
> > with the same type, offset, size and read-only properties.
> >
> > Any writes to a dynptr will be reflected across all instances
> > (by 'instance', this means any dynptrs that point to the same
> > underlying data).
> >
> > Please note that data slice and dynptr invalidations will affect all
> > instances as well. For example, if bpf_dynptr_write() is called on an
> > skb-type dynptr, all data slices of dynptr instances to that skb
> > will be invalidated as well (eg data slices of any clones, parents,
> > grandparents, ...). Another example is if a ringbuf dynptr is submitted,
> > any instance of that dynptr will be invalidated.
> >
> > Changing the view of the dynptr (eg advancing the offset or
> > trimming the size) will only affect that dynptr and not affect any
> > other instances.
> >
> > One example use case where cloning may be helpful is for hashing or
> > iterating through dynptr data. Cloning will allow the user to maintain
> > the original view of the dynptr for future use, while also allowing
> > views to smaller subsets of the data after the offset is advanced or the
> > size is trimmed.
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > ---
> >  kernel/bpf/helpers.c  |  14 +++++
> >  kernel/bpf/verifier.c | 125 +++++++++++++++++++++++++++++++++++++-----
> >  2 files changed, 126 insertions(+), 13 deletions(-)
> >
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index bac4c6fe49f0..108f3bcfa6da 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -2351,6 +2351,19 @@ __bpf_kfunc __u32 bpf_dynptr_get_offset(const struct bpf_dynptr_kern *ptr)
> >         return ptr->offset;
> >  }
> >
> > +__bpf_kfunc int bpf_dynptr_clone(struct bpf_dynptr_kern *ptr,
> > +                                struct bpf_dynptr_kern *clone__uninit)
>
> I think most uses of bpf_dynptr_clone() would be to get a partial
> view (like you mentioned above, e.g., to do hashing of a part of
> the memory range). So it feels like it would be the best UX if clone
> allowed you to define a new range in one go. So e.g., to create a
> "sub-dynptr" for the range of bytes [10, 30), it should be enough to:
>
> struct bpf_dynptr orig_ptr, new_ptr;
>
> bpf_dynptr_clone(&orig_ptr, 10, 30, &new_ptr);
>
> Instead of:
>
> bpf_dynptr_clone(&orig_ptr, &new_ptr);
> bpf_dynptr_advance(&new_ptr, 10);
> bpf_dynptr_trim(&new_ptr, bpf_dynptr_get_size(&orig_ptr) - 30);
>

I don't think we can do this, because bpf_dynptr_clone() must never
fail (which could happen if we took in a range and the range was
invalid). In the case where the clone is of a reference-counted
dynptr (e.g. a ringbuf-type dynptr), the clone must be treated by the
verifier as a legitimate dynptr even if it is invalid, since the
verifier can't know ahead of time whether the clone call will
succeed. That means if the invalid clone dynptr is then passed into a
reference-releasing function, the verifier will release the
reference, but the actual reference won't be released at runtime
(since the clone dynptr is invalid), which would leak the reference.
An example is something like:

// invalid range is passed, an error is returned, and new_ptr is invalid
bpf_dynptr_clone(&ringbuf_ptr, 9999999, 9999999, &new_ptr);
// this releases the reference and invalidates both new_ptr and ringbuf_ptr
bpf_ringbuf_discard_dynptr(&new_ptr, 0);

At runtime, bpf_ringbuf_discard_dynptr() will be a no-op since new_ptr
is invalid - the ringbuf record still needs to be submitted/discarded,
but the verifier will think this has already happened.

>
> This, btw, shows the awkwardness of the bpf_dynptr_trim() approach.
>
> If someone really wanted an exact full-sized copy, it's trivial:
>
> bpf_dynptr_clone(&orig_ptr, 0, bpf_dynptr_get_size(&orig_ptr), &new_ptr);
>
>
> BTW, let's rename bpf_dynptr_get_size -> bpf_dynptr_size()? That
> "get_" is a sore in the eye, IMO.

Will do!
>
>
> > +{
> > +       if (!ptr->data) {
> > +               bpf_dynptr_set_null(clone__uninit);
> > +               return -EINVAL;
> > +       }
> > +
> > +       memcpy(clone__uninit, ptr, sizeof(*clone__uninit));
>
> why memcpy instead of `*clone__uninit = *ptr`?
>
No good reason :) I'll change this for v2
> > +
> > +       return 0;
> > +}
> > +
> >  __bpf_kfunc void *bpf_cast_to_kern_ctx(void *obj)
> >  {
> >         return obj;
> > @@ -2429,6 +2442,7 @@ BTF_ID_FLAGS(func, bpf_dynptr_is_null)
> >  BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
> >  BTF_ID_FLAGS(func, bpf_dynptr_get_size)
> >  BTF_ID_FLAGS(func, bpf_dynptr_get_offset)
> > +BTF_ID_FLAGS(func, bpf_dynptr_clone)
> >  BTF_SET8_END(common_btf_ids)
> >
> >  static const struct btf_kfunc_id_set common_kfunc_set = {
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 3660b573048a..804cb50050f9 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -302,6 +302,7 @@ struct bpf_kfunc_call_arg_meta {
> >         struct {
> >                 enum bpf_dynptr_type type;
> >                 u32 id;
> > +               u32 ref_obj_id;
> >         } initialized_dynptr;
> >         struct {
> >                 u8 spi;
> > @@ -963,24 +964,15 @@ static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_
> >         return 0;
> >  }
> >
> > -static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
> > +static void invalidate_dynptr(struct bpf_verifier_env *env, struct bpf_func_state *state, int spi)
> >  {
> > -       struct bpf_func_state *state = func(env, reg);
> > -       int spi, i;
> > -
> > -       spi = dynptr_get_spi(env, reg);
> > -       if (spi < 0)
> > -               return spi;
> > +       int i;
> >
> >         for (i = 0; i < BPF_REG_SIZE; i++) {
> >                 state->stack[spi].slot_type[i] = STACK_INVALID;
> >                 state->stack[spi - 1].slot_type[i] = STACK_INVALID;
> >         }
> >
> > -       /* Invalidate any slices associated with this dynptr */
> > -       if (dynptr_type_refcounted(state->stack[spi].spilled_ptr.dynptr.type))
> > -               WARN_ON_ONCE(release_reference(env, state->stack[spi].spilled_ptr.ref_obj_id));
> > -
> >         __mark_reg_not_init(env, &state->stack[spi].spilled_ptr);
> >         __mark_reg_not_init(env, &state->stack[spi - 1].spilled_ptr);
> >
> > @@ -1007,6 +999,51 @@ static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_re
> >          */
> >         state->stack[spi].spilled_ptr.live |= REG_LIVE_WRITTEN;
> >         state->stack[spi - 1].spilled_ptr.live |= REG_LIVE_WRITTEN;
> > +}
> > +
> > +static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
> > +{
> > +       struct bpf_func_state *state = func(env, reg);
> > +       int spi;
> > +
> > +       spi = dynptr_get_spi(env, reg);
> > +       if (spi < 0)
> > +               return spi;
> > +
> > +       if (dynptr_type_refcounted(state->stack[spi].spilled_ptr.dynptr.type)) {
> > +               int ref_obj_id = state->stack[spi].spilled_ptr.ref_obj_id;
> > +               int i;
> > +
> > +               /* If the dynptr has a ref_obj_id, then we need to invaldiate
>
> typo: invalidate
>
> > +                * two things:
> > +                *
> > +                * 1) Any dynptrs with a matching ref_obj_id (clones)
> > +                * 2) Any slices associated with the ref_obj_id
>
> I think this is where this dynptr_id confusion comes from. The rule
> should be "any slices derived from this dynptr". But as mentioned on
> another thread, it's a separate topic which we can address later.
>
If there's a parent and a clone, with slices derived from each, then
invalidating the clone needs to invalidate the slices associated with
the parent as well. So shouldn't it be "any slices associated with the
ref obj id" rather than "any slices derived from this dynptr"? (Also,
just a note: parent/clone slices will share the same ref obj id and
the same dynptr id, so checking against either does the same thing.)
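
Concretely, the invalidation semantics being argued for look like this
(a hypothetical fragment; bpf_dynptr_clone is the two-argument kfunc
from this series, and 'ringbuf' is an assumed ringbuf map):

```c
struct bpf_dynptr parent, clone;
void *parent_slice, *clone_slice;

bpf_ringbuf_reserve_dynptr(&ringbuf, 64, 0, &parent);
bpf_dynptr_clone(&parent, &clone);

parent_slice = bpf_dynptr_data(&parent, 0, 8);
clone_slice = bpf_dynptr_data(&clone, 0, 8);

/* Discarding through the clone releases the shared ref_obj_id, so the
 * verifier must invalidate parent, clone, parent_slice, and
 * clone_slice -- including the slice derived from the *parent*.
 */
bpf_ringbuf_discard_dynptr(&clone, 0);
```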

> > +                */
> > +
> > +               /* Invalidate any slices associated with this dynptr */
> > +               WARN_ON_ONCE(release_reference(env, ref_obj_id));
> > +
> > +               /* Invalidate any dynptr clones */
> > +               for (i = 1; i < state->allocated_stack / BPF_REG_SIZE; i++) {
> > +                       if (state->stack[i].spilled_ptr.ref_obj_id == ref_obj_id) {
>
> nit: invert if condition and continue to reduce nestedness of the rest
> the loop body

Ooh good idea
>
> > +                               /* it should always be the case that if the ref obj id
> > +                                * matches then the stack slot also belongs to a
> > +                                * dynptr
> > +                                */
> > +                               if (state->stack[i].slot_type[0] != STACK_DYNPTR) {
> > +                                       verbose(env, "verifier internal error: misconfigured ref_obj_id\n");
> > +                                       return -EFAULT;
> > +                               }
> > +                               if (state->stack[i].spilled_ptr.dynptr.first_slot)
> > +                                       invalidate_dynptr(env, state, i);
> > +                       }
> > +               }
> > +
> > +               return 0;
> > +       }
> > +
> > +       invalidate_dynptr(env, state, spi);
> >
> >         return 0;
> >  }
> > @@ -6967,6 +7004,50 @@ static int process_iter_next_call(struct bpf_verifier_env *env, int insn_idx,
> >         return 0;
> >  }
> >
> > +static int handle_dynptr_clone(struct bpf_verifier_env *env, enum bpf_arg_type arg_type,
> > +                              int regno, int insn_idx, struct bpf_kfunc_call_arg_meta *meta)
> > +{
> > +       struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno];
> > +       struct bpf_reg_state *first_reg_state, *second_reg_state;
> > +       struct bpf_func_state *state = func(env, reg);
> > +       enum bpf_dynptr_type dynptr_type = meta->initialized_dynptr.type;
> > +       int err, spi, ref_obj_id;
> > +
> > +       if (!dynptr_type) {
> > +               verbose(env, "verifier internal error: no dynptr type for bpf_dynptr_clone\n");
> > +               return -EFAULT;
> > +       }
> > +       arg_type |= get_dynptr_type_flag(dynptr_type);
>
>
> what about MEM_RDONLY and MEM_UNINIT flags, do we need to derive and add them?

I don't think we need MEM_UNINIT because we can't clone an
uninitialized dynptr. But yes, definitely MEM_RDONLY. I missed that; I
will add it in v2.

>
> > +
> > +       err = process_dynptr_func(env, regno, insn_idx, arg_type);
> > +       if (err < 0)
> > +               return err;
> > +
> > +       spi = dynptr_get_spi(env, reg);
> > +       if (spi < 0)
> > +               return spi;
> > +
> > +       first_reg_state = &state->stack[spi].spilled_ptr;
> > +       second_reg_state = &state->stack[spi - 1].spilled_ptr;
> > +       ref_obj_id = first_reg_state->ref_obj_id;
> > +
> > +       /* reassign the clone the same dynptr id as the original */
> > +       __mark_dynptr_reg(first_reg_state, dynptr_type, true, meta->initialized_dynptr.id);
> > +       __mark_dynptr_reg(second_reg_state, dynptr_type, false, meta->initialized_dynptr.id);
>
> I'm not sure why clone should have the same dynptr_id? Isn't it a new
> instance of a dynptr? I get preserving ref_obj_id (if refcounted), but
> why reusing dynptr_id?..
>
I think we also need to copy over the dynptr id because, in the case
of a non-reference-counted dynptr, if the parent (or clone) is
invalidated (e.g. by overwriting bytes of the dynptr on the stack), we
must also invalidate the slices of the clone (or parent).
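
A sketch of that non-reference-counted case (hypothetical; it uses the
existing bpf_dynptr_from_mem helper over a stack buffer and the clone
kfunc from this series):

```c
char buf[32];
struct bpf_dynptr parent, clone;
void *slice;

bpf_dynptr_from_mem(buf, sizeof(buf), 0, &parent);
bpf_dynptr_clone(&parent, &clone);

slice = bpf_dynptr_data(&clone, 0, 8);

/* Re-initializing 'parent' overwrites its stack slots. There is no
 * ref_obj_id here, so it is the shared dynptr_id that tells the
 * verifier that 'slice' (derived from the clone) must be invalidated
 * too.
 */
bpf_dynptr_from_mem(buf, sizeof(buf), 0, &parent);
```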
>
> > +
> > +       if (meta->initialized_dynptr.ref_obj_id) {
> > +               /* release the new ref obj id assigned during process_dynptr_func */
> > +               err = release_reference_state(cur_func(env), ref_obj_id);
> > +               if (err)
> > +                       return err;
>
> ugh... this is not good to create reference just to release. If we
> need to reuse logic, let's reuse the logic without parts that
> shouldn't happen. Please check if we can split process_dynptr_func()
> appropriately to allow this.

I'll change this for v2. I think the simplest approach would be to
have mark_stack_slots_dynptr() take a boolean (or similar) indicating
whether it should acquire a new reference state.
>
> > +               /* reassign the clone the same ref obj id as the original */
> > +               first_reg_state->ref_obj_id = meta->initialized_dynptr.ref_obj_id;
> > +               second_reg_state->ref_obj_id = meta->initialized_dynptr.ref_obj_id;
> > +       }
> > +
> > +       return 0;
> > +}
> > +
>
> [...]
kernel test robot April 17, 2023, 6:53 p.m. UTC | #3
Hi Joanne,

kernel test robot noticed the following build warnings:

[auto build test WARNING on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Joanne-Koong/bpf-Add-bpf_dynptr_trim-and-bpf_dynptr_advance/20230409-113652
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/20230409033431.3992432-5-joannelkoong%40gmail.com
patch subject: [PATCH v1 bpf-next 4/5] bpf: Add bpf_dynptr_clone
config: m68k-randconfig-s032-20230416 (https://download.01.org/0day-ci/archive/20230418/202304180233.Hk6WZE5M-lkp@intel.com/config)
compiler: m68k-linux-gcc (GCC) 12.1.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # apt-get install sparse
        # sparse version: v0.6.4-39-gce1a6720-dirty
        # https://github.com/intel-lab-lkp/linux/commit/d7830addcc26375f56b68655ddbfb44116b3e7f6
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Joanne-Koong/bpf-Add-bpf_dynptr_trim-and-bpf_dynptr_advance/20230409-113652
        git checkout d7830addcc26375f56b68655ddbfb44116b3e7f6
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=m68k olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=m68k SHELL=/bin/bash kernel/bpf/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202304180233.Hk6WZE5M-lkp@intel.com/

sparse warnings: (new ones prefixed by >>)
>> kernel/bpf/verifier.c:7020:41: sparse: sparse: mixing different enum types:
>> kernel/bpf/verifier.c:7020:41: sparse:    unsigned int enum bpf_arg_type
>> kernel/bpf/verifier.c:7020:41: sparse:    unsigned int enum bpf_type_flag
   kernel/bpf/verifier.c:18042:38: sparse: sparse: subtraction of functions? Share your drugs
   kernel/bpf/verifier.c: note: in included file (through include/linux/bpf.h, include/linux/bpf-cgroup.h):
   include/linux/bpfptr.h:65:40: sparse: sparse: cast to non-scalar
   include/linux/bpfptr.h:65:40: sparse: sparse: cast from non-scalar
   include/linux/bpfptr.h:65:40: sparse: sparse: cast to non-scalar
   include/linux/bpfptr.h:65:40: sparse: sparse: cast from non-scalar
   include/linux/bpfptr.h:65:40: sparse: sparse: cast to non-scalar
   include/linux/bpfptr.h:65:40: sparse: sparse: cast from non-scalar

vim +7020 kernel/bpf/verifier.c

  7006	
  7007	static int handle_dynptr_clone(struct bpf_verifier_env *env, enum bpf_arg_type arg_type,
  7008				       int regno, int insn_idx, struct bpf_kfunc_call_arg_meta *meta)
  7009	{
  7010		struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno];
  7011		struct bpf_reg_state *first_reg_state, *second_reg_state;
  7012		struct bpf_func_state *state = func(env, reg);
  7013		enum bpf_dynptr_type dynptr_type = meta->initialized_dynptr.type;
  7014		int err, spi, ref_obj_id;
  7015	
  7016		if (!dynptr_type) {
  7017			verbose(env, "verifier internal error: no dynptr type for bpf_dynptr_clone\n");
  7018			return -EFAULT;
  7019		}
> 7020		arg_type |= get_dynptr_type_flag(dynptr_type);
  7021	
  7022		err = process_dynptr_func(env, regno, insn_idx, arg_type);
  7023		if (err < 0)
  7024			return err;
  7025	
  7026		spi = dynptr_get_spi(env, reg);
  7027		if (spi < 0)
  7028			return spi;
  7029	
  7030		first_reg_state = &state->stack[spi].spilled_ptr;
  7031		second_reg_state = &state->stack[spi - 1].spilled_ptr;
  7032		ref_obj_id = first_reg_state->ref_obj_id;
  7033	
  7034		/* reassign the clone the same dynptr id as the original */
  7035		__mark_dynptr_reg(first_reg_state, dynptr_type, true, meta->initialized_dynptr.id);
  7036		__mark_dynptr_reg(second_reg_state, dynptr_type, false, meta->initialized_dynptr.id);
  7037	
  7038		if (meta->initialized_dynptr.ref_obj_id) {
  7039			/* release the new ref obj id assigned during process_dynptr_func */
  7040			err = release_reference_state(cur_func(env), ref_obj_id);
  7041			if (err)
  7042				return err;
  7043			/* reassign the clone the same ref obj id as the original */
  7044			first_reg_state->ref_obj_id = meta->initialized_dynptr.ref_obj_id;
  7045			second_reg_state->ref_obj_id = meta->initialized_dynptr.ref_obj_id;
  7046		}
  7047	
  7048		return 0;
  7049	}
  7050
Andrii Nakryiko April 17, 2023, 11:46 p.m. UTC | #4
On Thu, Apr 13, 2023 at 11:03 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Wed, Apr 12, 2023 at 3:12 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Sat, Apr 8, 2023 at 8:34 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> > >
> > > The cloned dynptr will point to the same data as its parent dynptr,
> > > with the same type, offset, size and read-only properties.
> > >
> > > Any writes to a dynptr will be reflected across all instances
> > > (by 'instance', this means any dynptrs that point to the same
> > > underlying data).
> > >
> > > Please note that data slice and dynptr invalidations will affect all
> > > instances as well. For example, if bpf_dynptr_write() is called on an
> > > skb-type dynptr, all data slices of dynptr instances to that skb
> > > will be invalidated as well (eg data slices of any clones, parents,
> > > grandparents, ...). Another example is if a ringbuf dynptr is submitted,
> > > any instance of that dynptr will be invalidated.
> > >
> > > Changing the view of the dynptr (eg advancing the offset or
> > > trimming the size) will only affect that dynptr and not affect any
> > > other instances.
> > >
> > > One example use case where cloning may be helpful is for hashing or
> > > iterating through dynptr data. Cloning will allow the user to maintain
> > > the original view of the dynptr for future use, while also allowing
> > > views to smaller subsets of the data after the offset is advanced or the
> > > size is trimmed.
> > >
> > > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > > ---
> > >  kernel/bpf/helpers.c  |  14 +++++
> > >  kernel/bpf/verifier.c | 125 +++++++++++++++++++++++++++++++++++++-----
> > >  2 files changed, 126 insertions(+), 13 deletions(-)
> > >
> > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > > index bac4c6fe49f0..108f3bcfa6da 100644
> > > --- a/kernel/bpf/helpers.c
> > > +++ b/kernel/bpf/helpers.c
> > > @@ -2351,6 +2351,19 @@ __bpf_kfunc __u32 bpf_dynptr_get_offset(const struct bpf_dynptr_kern *ptr)
> > >         return ptr->offset;
> > >  }
> > >
> > > +__bpf_kfunc int bpf_dynptr_clone(struct bpf_dynptr_kern *ptr,
> > > +                                struct bpf_dynptr_kern *clone__uninit)
> >
> > I think most of uses for bpf_dynptr_clone() would be to get a partial
> > view (like you mentioned above, to, e.g., do a hashing of a part of
> > the memory range). So it feels it would be best UX if clone would
> > allow you to define a new range in one go. So e.g., to create a
> > "sub-dynptr" for range of bytes [10, 30), it should be enough to:
> >
> > struct bpf_dynptr orig_ptr, new_ptr;
> >
> > bpf_dynptr_clone(&orig_ptr, 10, 30, &new_ptr);
> >
> > Instead of:
> >
> > bpf_dynptr_clone(&orig_ptr, &new_ptr);
> > bpf_dynptr_advance(&new_ptr, 10);
> > bpf_dynptr_trim(&new_ptr, bpf_dynptr_get_size(&orig_ptr) - 30);
> >
>
> I don't think we can do this because we can't have bpf_dynptr_clone()
> fail (which might happen if we take in a range, if the range is
> invalid). This is because in the case where the clone is of a
> reference-counted dynptr (eg like a ringbuf-type dynptr), the clone
> even if it's invalid must be treated by the verifier as a legit dynptr
> (since the verifier can't know ahead of time whether the clone call
> will succeed or not) which means if the invalid clone dynptr is then
> passed into a reference-releasing function, the verifier will release
> the reference but the actual reference won't be released at runtime
> (since the clone dynptr is invalid), which would leak the reference.
> An example is something like:
>
>  // invalid range is passed, error is returned and new_ptr is invalid
> bpf_dynptr_clone(&ringbuf_ptr, 9999999, 9999999, &new_ptr);
> // this releases the reference and invalidates both new_ptr and ringbuf_ptr
> bpf_ringbuf_discard_dynptr(&new_ptr, 0);
>
> At runtime, bpf_ringbuf_discard_dynptr() will be a no-op since new_ptr
> is invalid - the ringbuf record still needs to be submitted/discarded,
> but the verifier will think this already happened

Ah, tricky, good point. But ok, I guess with bpf_dynptr_adjust()
proposal in another email this would be ok:

bpf_dynptr_clone(..); /* always succeeds */
bpf_dynptr_adjust(&new_ptr, 10, 30); /* could fail to adjust, but
dynptr is left valid */

Right?

>
> >
> > This, btw, shows the awkwardness of the bpf_dynptr_trim() approach.
> >
> > If someone really wanted an exact full-sized copy, it's trivial:
> >
> > bpf_dynptr_clone(&orig_ptr, 0, bpf_dynptr_get_size(&orig_ptr), &new_ptr);
> >
> >
> > BTW, let's rename bpf_dynptr_get_size -> bpf_dynptr_size()? That
> > "get_" is a sore in the eye, IMO.
>
> Will do!
> >
> >
> > > +{
> > > +       if (!ptr->data) {
> > > +               bpf_dynptr_set_null(clone__uninit);
> > > +               return -EINVAL;
> > > +       }
> > > +
> > > +       memcpy(clone__uninit, ptr, sizeof(*clone__uninit));
> >
> > why memcpy instead of `*clone__uninit = *ptr`?
> >
> No good reason :) I'll change this for v2
> > > +
> > > +       return 0;
> > > +}
> > > +
> > >  __bpf_kfunc void *bpf_cast_to_kern_ctx(void *obj)
> > >  {
> > >         return obj;
> > > @@ -2429,6 +2442,7 @@ BTF_ID_FLAGS(func, bpf_dynptr_is_null)
> > >  BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
> > >  BTF_ID_FLAGS(func, bpf_dynptr_get_size)
> > >  BTF_ID_FLAGS(func, bpf_dynptr_get_offset)
> > > +BTF_ID_FLAGS(func, bpf_dynptr_clone)
> > >  BTF_SET8_END(common_btf_ids)
> > >
> > >  static const struct btf_kfunc_id_set common_kfunc_set = {
> > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > index 3660b573048a..804cb50050f9 100644
> > > --- a/kernel/bpf/verifier.c
> > > +++ b/kernel/bpf/verifier.c
> > > @@ -302,6 +302,7 @@ struct bpf_kfunc_call_arg_meta {
> > >         struct {
> > >                 enum bpf_dynptr_type type;
> > >                 u32 id;
> > > +               u32 ref_obj_id;
> > >         } initialized_dynptr;
> > >         struct {
> > >                 u8 spi;
> > > @@ -963,24 +964,15 @@ static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_
> > >         return 0;
> > >  }
> > >
> > > -static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
> > > +static void invalidate_dynptr(struct bpf_verifier_env *env, struct bpf_func_state *state, int spi)
> > >  {
> > > -       struct bpf_func_state *state = func(env, reg);
> > > -       int spi, i;
> > > -
> > > -       spi = dynptr_get_spi(env, reg);
> > > -       if (spi < 0)
> > > -               return spi;
> > > +       int i;
> > >
> > >         for (i = 0; i < BPF_REG_SIZE; i++) {
> > >                 state->stack[spi].slot_type[i] = STACK_INVALID;
> > >                 state->stack[spi - 1].slot_type[i] = STACK_INVALID;
> > >         }
> > >
> > > -       /* Invalidate any slices associated with this dynptr */
> > > -       if (dynptr_type_refcounted(state->stack[spi].spilled_ptr.dynptr.type))
> > > -               WARN_ON_ONCE(release_reference(env, state->stack[spi].spilled_ptr.ref_obj_id));
> > > -
> > >         __mark_reg_not_init(env, &state->stack[spi].spilled_ptr);
> > >         __mark_reg_not_init(env, &state->stack[spi - 1].spilled_ptr);
> > >
> > > @@ -1007,6 +999,51 @@ static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_re
> > >          */
> > >         state->stack[spi].spilled_ptr.live |= REG_LIVE_WRITTEN;
> > >         state->stack[spi - 1].spilled_ptr.live |= REG_LIVE_WRITTEN;
> > > +}
> > > +
> > > +static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
> > > +{
> > > +       struct bpf_func_state *state = func(env, reg);
> > > +       int spi;
> > > +
> > > +       spi = dynptr_get_spi(env, reg);
> > > +       if (spi < 0)
> > > +               return spi;
> > > +
> > > +       if (dynptr_type_refcounted(state->stack[spi].spilled_ptr.dynptr.type)) {
> > > +               int ref_obj_id = state->stack[spi].spilled_ptr.ref_obj_id;
> > > +               int i;
> > > +
> > > +               /* If the dynptr has a ref_obj_id, then we need to invaldiate
> >
> > typo: invalidate
> >
> > > +                * two things:
> > > +                *
> > > +                * 1) Any dynptrs with a matching ref_obj_id (clones)
> > > +                * 2) Any slices associated with the ref_obj_id
> >
> > I think this is where this dynptr_id confusion comes from. The rule
> > should be "any slices derived from this dynptr". But as mentioned on
> > another thread, it's a separate topic which we can address later.
> >
> If there's a parent and a clone and slices derived from the parent and
> slices derived from the clone, if the clone is invalidated then we
> need to invalidate slices associated with the parent as well. So
> shouldn't it be "any slices associated with the ref obj id" not "any
> slices derived from this dynptr"? (also just a note, parent/clone
> slices will share the same ref obj id and the same dynptr id, so
> checking against either does the same thing)

So, we have a ringbuf dynptr with ref_obj_id=1, id=2, ok? We clone it,
clone gets ref_obj_id=1, id=3. If either original dynptr or clone
dynptr is released due to bpf_ringbuf_discard_dynptr(), we invalidate
all the dynptrs with ref_obj_id=1. During invalidation of each dynptr,
we invalidate all the slices with ref_obj_id==dynptr's id. So we'll
invalidate slices derived from dynptr with id=2 (original dynptr), and
then all the slices derived from dynptr with id=3?

>
> > > +                */
> > > +
> > > +               /* Invalidate any slices associated with this dynptr */
> > > +               WARN_ON_ONCE(release_reference(env, ref_obj_id));
> > > +
> > > +               /* Invalidate any dynptr clones */
> > > +               for (i = 1; i < state->allocated_stack / BPF_REG_SIZE; i++) {
> > > +                       if (state->stack[i].spilled_ptr.ref_obj_id == ref_obj_id) {
> >
> > nit: invert if condition and continue to reduce nestedness of the rest
> > the loop body
>
> Ooh good idea
> >
> > > +                               /* it should always be the case that if the ref obj id
> > > +                                * matches then the stack slot also belongs to a
> > > +                                * dynptr
> > > +                                */
> > > +                               if (state->stack[i].slot_type[0] != STACK_DYNPTR) {
> > > +                                       verbose(env, "verifier internal error: misconfigured ref_obj_id\n");
> > > +                                       return -EFAULT;
> > > +                               }
> > > +                               if (state->stack[i].spilled_ptr.dynptr.first_slot)
> > > +                                       invalidate_dynptr(env, state, i);
> > > +                       }
> > > +               }
> > > +
> > > +               return 0;
> > > +       }
> > > +
> > > +       invalidate_dynptr(env, state, spi);
> > >
> > >         return 0;
> > >  }
> > > @@ -6967,6 +7004,50 @@ static int process_iter_next_call(struct bpf_verifier_env *env, int insn_idx,
> > >         return 0;
> > >  }
> > >
> > > +static int handle_dynptr_clone(struct bpf_verifier_env *env, enum bpf_arg_type arg_type,
> > > +                              int regno, int insn_idx, struct bpf_kfunc_call_arg_meta *meta)
> > > +{
> > > +       struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno];
> > > +       struct bpf_reg_state *first_reg_state, *second_reg_state;
> > > +       struct bpf_func_state *state = func(env, reg);
> > > +       enum bpf_dynptr_type dynptr_type = meta->initialized_dynptr.type;
> > > +       int err, spi, ref_obj_id;
> > > +
> > > +       if (!dynptr_type) {
> > > +               verbose(env, "verifier internal error: no dynptr type for bpf_dynptr_clone\n");
> > > +               return -EFAULT;
> > > +       }
> > > +       arg_type |= get_dynptr_type_flag(dynptr_type);
> >
> >
> > what about MEM_RDONLY and MEM_UNINIT flags, do we need to derive and add them?
>
> I don't think we need MEM_UNINIT because we can't clone an
> uninitialized dynptr. But yes, definitely MEM_RDONLY. I missed that, I
> will add it in in v2
>
> >
> > > +
> > > +       err = process_dynptr_func(env, regno, insn_idx, arg_type);
> > > +       if (err < 0)
> > > +               return err;
> > > +
> > > +       spi = dynptr_get_spi(env, reg);
> > > +       if (spi < 0)
> > > +               return spi;
> > > +
> > > +       first_reg_state = &state->stack[spi].spilled_ptr;
> > > +       second_reg_state = &state->stack[spi - 1].spilled_ptr;
> > > +       ref_obj_id = first_reg_state->ref_obj_id;
> > > +
> > > +       /* reassign the clone the same dynptr id as the original */
> > > +       __mark_dynptr_reg(first_reg_state, dynptr_type, true, meta->initialized_dynptr.id);
> > > +       __mark_dynptr_reg(second_reg_state, dynptr_type, false, meta->initialized_dynptr.id);
> >
> > I'm not sure why clone should have the same dynptr_id? Isn't it a new
> > instance of a dynptr? I get preserving ref_obj_id (if refcounted), but
> > why reusing dynptr_id?..
> >
> I think we need to also copy over the dynptr id because in the case of
> a non-reference counted dynptr, if the parent (or clone) is
> invalidated (eg overwriting bytes of the dynptr on the stack), we must
> also invalidate the slices of the clone (or parent)

yep, right now we'll have to do that because we have dynptr_id. But if
we get rid of it and stick to ref_obj_id and id, then clone would need
to get a new id, but keep ref_obj_id, right?

> >
> > > +
> > > +       if (meta->initialized_dynptr.ref_obj_id) {
> > > +               /* release the new ref obj id assigned during process_dynptr_func */
> > > +               err = release_reference_state(cur_func(env), ref_obj_id);
> > > +               if (err)
> > > +                       return err;
> >
> > ugh... this is not good to create reference just to release. If we
> > need to reuse logic, let's reuse the logic without parts that
> > shouldn't happen. Please check if we can split process_dynptr_func()
> > appropriately to allow this.
>
> I'll change this for v2. I think the simplest approach would be having
> mark_stack_slots_dynptr() take in a boolean or something that'll
> indicate whether it should acquire a new reference state or not
> >
> > > +               /* reassign the clone the same ref obj id as the original */
> > > +               first_reg_state->ref_obj_id = meta->initialized_dynptr.ref_obj_id;
> > > +               second_reg_state->ref_obj_id = meta->initialized_dynptr.ref_obj_id;
> > > +       }
> > > +
> > > +       return 0;
> > > +}
> > > +
> >
> > [...]
Joanne Koong April 19, 2023, 6:56 a.m. UTC | #5
On Mon, Apr 17, 2023 at 4:46 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Apr 13, 2023 at 11:03 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > On Wed, Apr 12, 2023 at 3:12 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Sat, Apr 8, 2023 at 8:34 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> > > >
> > > > [...]
> > > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > > > index bac4c6fe49f0..108f3bcfa6da 100644
> > > > --- a/kernel/bpf/helpers.c
> > > > +++ b/kernel/bpf/helpers.c
> > > > @@ -2351,6 +2351,19 @@ __bpf_kfunc __u32 bpf_dynptr_get_offset(const struct bpf_dynptr_kern *ptr)
> > > >         return ptr->offset;
> > > >  }
> > > >
> > > > +__bpf_kfunc int bpf_dynptr_clone(struct bpf_dynptr_kern *ptr,
> > > > +                                struct bpf_dynptr_kern *clone__uninit)
> > >
> > > I think most of uses for bpf_dynptr_clone() would be to get a partial
> > > view (like you mentioned above, to, e.g., do a hashing of a part of
> > > the memory range). So it feels it would be best UX if clone would
> > > allow you to define a new range in one go. So e.g., to create a
> > > "sub-dynptr" for range of bytes [10, 30), it should be enough to:
> > >
> > > struct bpf_dynptr orig_ptr, new_ptr;
> > >
> > > bpf_dynptr_clone(&orig_ptr, 10, 30, &new_ptr);
> > >
> > > Instead of:
> > >
> > > bpf_dynptr_clone(&orig_ptr, &new_ptr);
> > > bpf_dynptr_advance(&new_ptr, 10);
> > > bpf_dynptr_trim(&new_ptr, bpf_dynptr_get_size(&orig_ptr) - 30);
> > >
> >
> > I don't think we can do this because we can't have bpf_dynptr_clone()
> > fail (which might happen if we take in a range, if the range is
> > invalid). This is because in the case where the clone is of a
> > reference-counted dynptr (eg like a ringbuf-type dynptr), the clone
> > even if it's invalid must be treated by the verifier as a legit dynptr
> > (since the verifier can't know ahead of time whether the clone call
> > will succeed or not) which means if the invalid clone dynptr is then
> > passed into a reference-releasing function, the verifier will release
> > the reference but the actual reference won't be released at runtime
> > (since the clone dynptr is invalid), which would leak the reference.
> > An example is something like:
> >
> >  // invalid range is passed, error is returned and new_ptr is invalid
> > bpf_dynptr_clone(&ringbuf_ptr, 9999999, 9999999, &new_ptr);
> > // this releases the reference and invalidates both new_ptr and ringbuf_ptr
> > bpf_ringbuf_discard_dynptr(&new_ptr, 0);
> >
> > At runtime, bpf_ringbuf_discard_dynptr() will be a no-op since new_ptr
> > is invalid - the ringbuf record still needs to be submitted/discarded,
> > but the verifier will think this already happened
>
> Ah, tricky, good point. But ok, I guess with bpf_dynptr_adjust()
> proposal in another email this would be ok:
>
> bpf_dynptr_clone(..); /* always succeeds */
> bpf_dynptr_adjust(&new_ptr, 10, 30); /* could fail to adjust, but
> dynptr is left valid */
>
> Right?

Yes, this would be okay because if bpf_dynptr_adjust fails, the clone
is valid (it was unaffected), so submitting/discarding it later on
would correctly release the reference.
>
> >
> > >
> > > This, btw, shows the awkwardness of the bpf_dynptr_trim() approach.
> > >
> > > If someone really wanted an exact full-sized copy, it's trivial:
> > >
> > > bpf_dynptr_clone(&orig_ptr, 0, bpf_dynptr_get_size(&orig_ptr), &new_ptr);
> > >
> > >
[...]
> > > > +static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
> > > > +{
> > > > +       struct bpf_func_state *state = func(env, reg);
> > > > +       int spi;
> > > > +
> > > > +       spi = dynptr_get_spi(env, reg);
> > > > +       if (spi < 0)
> > > > +               return spi;
> > > > +
> > > > +       if (dynptr_type_refcounted(state->stack[spi].spilled_ptr.dynptr.type)) {
> > > > +               int ref_obj_id = state->stack[spi].spilled_ptr.ref_obj_id;
> > > > +               int i;
> > > > +
> > > > +               /* If the dynptr has a ref_obj_id, then we need to invaldiate
> > >
> > > typo: invalidate
> > >
> > > > +                * two things:
> > > > +                *
> > > > +                * 1) Any dynptrs with a matching ref_obj_id (clones)
> > > > +                * 2) Any slices associated with the ref_obj_id
> > >
> > > I think this is where this dynptr_id confusion comes from. The rule
> > > should be "any slices derived from this dynptr". But as mentioned on
> > > another thread, it's a separate topic which we can address later.
> > >
> > If there's a parent and a clone and slices derived from the parent and
> > slices derived from the clone, if the clone is invalidated then we
> > need to invalidate slices associated with the parent as well. So
> > shouldn't it be "any slices associated with the ref obj id" not "any
> > slices derived from this dynptr"? (also just a note, parent/clone
> > slices will share the same ref obj id and the same dynptr id, so
> > checking against either does the same thing)
>
> So, we have a ringbuf dynptr with ref_obj_id=1, id=2, ok? We clone it,
> clone gets ref_obj_id=1, id=3. If either original dynptr or clone
> dynptr is released due to bpf_ringbuf_discard_dynptr(), we invalidate
> all the dynptrs with ref_obj_id=1. During invalidation of each dynptr,
> we invalidate all the slices with ref_obj_id==dynptr's id. So we'll
> invalidate slices derived from dynptr with id=2 (original dynptr), and
> then all the slices derived from dynptr with id=3?
>
When we create a slice for a dynptr (through bpf_dynptr_data()), the
slice's reg->ref_obj_id is set to the dynptr's ref_obj_id (not the
dynptr's id). During invalidation of the dynptr, we invalidate all the
slices with that ref_obj_id, which means we invalidate all slices for
any parents/clones.
[...]
> > > > +
> > > > +       err = process_dynptr_func(env, regno, insn_idx, arg_type);
> > > > +       if (err < 0)
> > > > +               return err;
> > > > +
> > > > +       spi = dynptr_get_spi(env, reg);
> > > > +       if (spi < 0)
> > > > +               return spi;
> > > > +
> > > > +       first_reg_state = &state->stack[spi].spilled_ptr;
> > > > +       second_reg_state = &state->stack[spi - 1].spilled_ptr;
> > > > +       ref_obj_id = first_reg_state->ref_obj_id;
> > > > +
> > > > +       /* reassign the clone the same dynptr id as the original */
> > > > +       __mark_dynptr_reg(first_reg_state, dynptr_type, true, meta->initialized_dynptr.id);
> > > > +       __mark_dynptr_reg(second_reg_state, dynptr_type, false, meta->initialized_dynptr.id);
> > >
> > > I'm not sure why clone should have the same dynptr_id? Isn't it a new
> > > instance of a dynptr? I get preserving ref_obj_id (if refcounted), but
> > > why reusing dynptr_id?..
> > >
> > I think we need to also copy over the dynptr id because in the case of
> > a non-reference counted dynptr, if the parent (or clone) is
> > invalidated (eg overwriting bytes of the dynptr on the stack), we must
> > also invalidate the slices of the clone (or parent)
>
> yep, right now we'll have to do that because we have dynptr_id. But if
> we get rid of it and stick to ref_obj_id and id, then clone would need
> to get a new id, but keep ref_obj_id, right?
>
Thinking about this some more, I think you're right that we shouldn't
reuse the dynptr id. In fact, I think reusing it would lead to
incorrect behavior: in the example of a non-reference-counted dynptr,
if the parent dynptr is overwritten on the stack (and thus
invalidated), that shouldn't invalidate the slices of the clone at
all. I'll change this in the next version.
> > >
[...]
Andrii Nakryiko April 19, 2023, 4:34 p.m. UTC | #6
On Tue, Apr 18, 2023 at 11:56 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Mon, Apr 17, 2023 at 4:46 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Thu, Apr 13, 2023 at 11:03 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> > >
> > > On Wed, Apr 12, 2023 at 3:12 PM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > >
> > > > On Sat, Apr 8, 2023 at 8:34 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> > > > >
> > > > > [...]
> > > > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > > > > index bac4c6fe49f0..108f3bcfa6da 100644
> > > > > --- a/kernel/bpf/helpers.c
> > > > > +++ b/kernel/bpf/helpers.c
> > > > > @@ -2351,6 +2351,19 @@ __bpf_kfunc __u32 bpf_dynptr_get_offset(const struct bpf_dynptr_kern *ptr)
> > > > >         return ptr->offset;
> > > > >  }
> > > > >
> > > > > +__bpf_kfunc int bpf_dynptr_clone(struct bpf_dynptr_kern *ptr,
> > > > > +                                struct bpf_dynptr_kern *clone__uninit)
> > > >
> > > > I think most of uses for bpf_dynptr_clone() would be to get a partial
> > > > view (like you mentioned above, to, e.g., do a hashing of a part of
> > > > the memory range). So it feels it would be best UX if clone would
> > > > allow you to define a new range in one go. So e.g., to create a
> > > > "sub-dynptr" for range of bytes [10, 30), it should be enough to:
> > > >
> > > > struct bpf_dynptr orig_ptr, new_ptr;
> > > >
> > > > bpf_dynptr_clone(&orig_ptr, 10, 30, &new_ptr);
> > > >
> > > > Instead of:
> > > >
> > > > bpf_dynptr_clone(&orig_ptr, &new_ptr);
> > > > bpf_dynptr_advance(&new_ptr, 10);
> > > > bpf_dynptr_trim(&new_ptr, bpf_dynptr_get_size(&orig_ptr) - 30);
> > > >
> > >
> > > I don't think we can do this because we can't have bpf_dynptr_clone()
> > > fail (which might happen if we take in a range, if the range is
> > > invalid). This is because in the case where the clone is of a
> > > reference-counted dynptr (eg like a ringbuf-type dynptr), the clone
> > > even if it's invalid must be treated by the verifier as a legit dynptr
> > > (since the verifier can't know ahead of time whether the clone call
> > > will succeed or not) which means if the invalid clone dynptr is then
> > > passed into a reference-releasing function, the verifier will release
> > > the reference but the actual reference won't be released at runtime
> > > (since the clone dynptr is invalid), which would leak the reference.
> > > An example is something like:
> > >
> > >  // invalid range is passed, error is returned and new_ptr is invalid
> > > bpf_dynptr_clone(&ringbuf_ptr, 9999999, 9999999, &new_ptr);
> > > // this releases the reference and invalidates both new_ptr and ringbuf_ptr
> > > bpf_ringbuf_discard_dynptr(&new_ptr, 0);
> > >
> > > At runtime, bpf_ringbuf_discard_dynptr() will be a no-op since new_ptr
> > > is invalid - the ringbuf record still needs to be submitted/discarded,
> > > but the verifier will think this already happened
> >
> > Ah, tricky, good point. But ok, I guess with bpf_dynptr_adjust()
> > proposal in another email this would be ok:
> >
> > bpf_dynptr_clone(..); /* always succeeds */
> > bpf_dynptr_adjust(&new_ptr, 10, 30); /* could fail to adjust, but
> > dynptr is left valid */
> >
> > Right?
>
> Yes, this would be okay because if bpf_dynptr_adjust fails, the clone
> is valid (it was unaffected) so submitting/discarding it later on
> would correctly release the reference

ok, cool, let's do that and keep things simple

> >
> > >
> > > >
> > > > This, btw, shows the awkwardness of the bpf_dynptr_trim() approach.
> > > >
> > > > If someone really wanted an exact full-sized copy, it's trivial:
> > > >
> > > > bpf_dynptr_clone(&orig_ptr, 0, bpf_dynptr_get_size(&orig_ptr), &new_ptr);
> > > >
> > > >
> [...]
> > > > > +static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
> > > > > +{
> > > > > +       struct bpf_func_state *state = func(env, reg);
> > > > > +       int spi;
> > > > > +
> > > > > +       spi = dynptr_get_spi(env, reg);
> > > > > +       if (spi < 0)
> > > > > +               return spi;
> > > > > +
> > > > > +       if (dynptr_type_refcounted(state->stack[spi].spilled_ptr.dynptr.type)) {
> > > > > +               int ref_obj_id = state->stack[spi].spilled_ptr.ref_obj_id;
> > > > > +               int i;
> > > > > +
> > > > > +               /* If the dynptr has a ref_obj_id, then we need to invaldiate
> > > >
> > > > typo: invalidate
> > > >
> > > > > +                * two things:
> > > > > +                *
> > > > > +                * 1) Any dynptrs with a matching ref_obj_id (clones)
> > > > > +                * 2) Any slices associated with the ref_obj_id
> > > >
> > > > I think this is where this dynptr_id confusion comes from. The rule
> > > > should be "any slices derived from this dynptr". But as mentioned on
> > > > another thread, it's a separate topic which we can address later.
> > > >
> > > If there's a parent and a clone and slices derived from the parent and
> > > slices derived from the clone, if the clone is invalidated then we
> > > need to invalidate slices associated with the parent as well. So
> > > shouldn't it be "any slices associated with the ref obj id" not "any
> > > slices derived from this dynptr"? (also just a note, parent/clone
> > > slices will share the same ref obj id and the same dynptr id, so
> > > checking against either does the same thing)
> >
> > So, we have a ringbuf dynptr with ref_obj_id=1, id=2, ok? We clone it,
> > clone gets ref_obj_id=1, id=3. If either original dynptr or clone
> > dynptr is released due to bpf_ringbuf_discard_dynptr(), we invalidate
> > all the dynptrs with ref_obj_id=1. During invalidation of each dynptr,
> > we invalidate all the slices with ref_obj_id==dynptr's id. So we'll
> > invalidate slices derived from dynptr with id=2 (original dynptr), and
> > then all the slices derived from dynptr with id=3?
> >
> When we create a slice for a dynptr (through bpf_dynptr_data()), the
> slice's reg->ref_obj_id is set to the dynptr's ref_obj_id (not
> dynptr's id). During invalidation of the dynptr, we invalidate all the
> slices with that ref obj id, which means we invalidate all slices for
> any parents/clones.
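> For example (illustrative sketch, using the two-argument clone from
> this patch; the slice sizes are arbitrary):
>
> 	bpf_dynptr_clone(&parent, &clone);
> 	data1 = bpf_dynptr_data(&parent, 0, 8); /* slice gets parent's ref_obj_id */
> 	data2 = bpf_dynptr_data(&clone, 0, 8);  /* slice gets the same ref_obj_id */
> 	bpf_ringbuf_discard_dynptr(&clone, 0);  /* invalidates both data1 and data2 */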
> [...]

Yep, because of how we define that ref_obj_id should be in the refs array.
What I'm saying is that we should probably change that to be a more
general "ID of an object whose lifetime we are associated with". That
would need some verifier internals adjustments.

So let's wrap up this particular ref_obj_id revamp discussion for now,
it's a separate topic that will be just distracting us.

> > > > > +
> > > > > +       err = process_dynptr_func(env, regno, insn_idx, arg_type);
> > > > > +       if (err < 0)
> > > > > +               return err;
> > > > > +
> > > > > +       spi = dynptr_get_spi(env, reg);
> > > > > +       if (spi < 0)
> > > > > +               return spi;
> > > > > +
> > > > > +       first_reg_state = &state->stack[spi].spilled_ptr;
> > > > > +       second_reg_state = &state->stack[spi - 1].spilled_ptr;
> > > > > +       ref_obj_id = first_reg_state->ref_obj_id;
> > > > > +
> > > > > +       /* reassign the clone the same dynptr id as the original */
> > > > > +       __mark_dynptr_reg(first_reg_state, dynptr_type, true, meta->initialized_dynptr.id);
> > > > > +       __mark_dynptr_reg(second_reg_state, dynptr_type, false, meta->initialized_dynptr.id);
> > > >
> > > > I'm not sure why clone should have the same dynptr_id? Isn't it a new
> > > > instance of a dynptr? I get preserving ref_obj_id (if refcounted), but
> > > > why reusing dynptr_id?..
> > > >
> > > I think we need to also copy over the dynptr id because in the case of
> > > a non-reference counted dynptr, if the parent (or clone) is
> > > invalidated (eg overwriting bytes of the dynptr on the stack), we must
> > > also invalidate the slices of the clone (or parent)
> >
> > yep, right now we'll have to do that because we have dynptr_id. But if
> > we get rid of it and stick to ref_obj_id and id, then clone would need
> > to get a new id, but keep ref_obj_id, right?
> >
> Thinking about this some more, I think you're right that we shouldn't
> reuse the dynptr id. In fact, I think reusing it would lead to
> incorrect behavior - in the example of a non-reference-counted dynptr,
> if the parent dynptr is overwritten on the stack (and thus
> invalidated), that shouldn't invalidate the slices of the clone at
> all. I'll change this in the next version
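> Concretely (sketch with a local dynptr over stack memory, which is
> not reference-counted):
>
> 	bpf_dynptr_from_mem(buf, sizeof(buf), 0, &parent);
> 	bpf_dynptr_clone(&parent, &clone);
> 	data = bpf_dynptr_data(&clone, 0, 8);
> 	/* overwriting parent's stack slots invalidates parent, but
> 	 * the clone (and its slice "data") should stay usable
> 	 */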

ok

> > > >
> [...]

Patch

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index bac4c6fe49f0..108f3bcfa6da 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2351,6 +2351,19 @@  __bpf_kfunc __u32 bpf_dynptr_get_offset(const struct bpf_dynptr_kern *ptr)
 	return ptr->offset;
 }
 
+__bpf_kfunc int bpf_dynptr_clone(struct bpf_dynptr_kern *ptr,
+				 struct bpf_dynptr_kern *clone__uninit)
+{
+	if (!ptr->data) {
+		bpf_dynptr_set_null(clone__uninit);
+		return -EINVAL;
+	}
+
+	memcpy(clone__uninit, ptr, sizeof(*clone__uninit));
+
+	return 0;
+}
+
 __bpf_kfunc void *bpf_cast_to_kern_ctx(void *obj)
 {
 	return obj;
@@ -2429,6 +2442,7 @@  BTF_ID_FLAGS(func, bpf_dynptr_is_null)
 BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
 BTF_ID_FLAGS(func, bpf_dynptr_get_size)
 BTF_ID_FLAGS(func, bpf_dynptr_get_offset)
+BTF_ID_FLAGS(func, bpf_dynptr_clone)
 BTF_SET8_END(common_btf_ids)
 
 static const struct btf_kfunc_id_set common_kfunc_set = {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 3660b573048a..804cb50050f9 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -302,6 +302,7 @@  struct bpf_kfunc_call_arg_meta {
 	struct {
 		enum bpf_dynptr_type type;
 		u32 id;
+		u32 ref_obj_id;
 	} initialized_dynptr;
 	struct {
 		u8 spi;
@@ -963,24 +964,15 @@  static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_
 	return 0;
 }
 
-static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
+static void invalidate_dynptr(struct bpf_verifier_env *env, struct bpf_func_state *state, int spi)
 {
-	struct bpf_func_state *state = func(env, reg);
-	int spi, i;
-
-	spi = dynptr_get_spi(env, reg);
-	if (spi < 0)
-		return spi;
+	int i;
 
 	for (i = 0; i < BPF_REG_SIZE; i++) {
 		state->stack[spi].slot_type[i] = STACK_INVALID;
 		state->stack[spi - 1].slot_type[i] = STACK_INVALID;
 	}
 
-	/* Invalidate any slices associated with this dynptr */
-	if (dynptr_type_refcounted(state->stack[spi].spilled_ptr.dynptr.type))
-		WARN_ON_ONCE(release_reference(env, state->stack[spi].spilled_ptr.ref_obj_id));
-
 	__mark_reg_not_init(env, &state->stack[spi].spilled_ptr);
 	__mark_reg_not_init(env, &state->stack[spi - 1].spilled_ptr);
 
@@ -1007,6 +999,51 @@  static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_re
 	 */
 	state->stack[spi].spilled_ptr.live |= REG_LIVE_WRITTEN;
 	state->stack[spi - 1].spilled_ptr.live |= REG_LIVE_WRITTEN;
+}
+
+static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
+{
+	struct bpf_func_state *state = func(env, reg);
+	int spi;
+
+	spi = dynptr_get_spi(env, reg);
+	if (spi < 0)
+		return spi;
+
+	if (dynptr_type_refcounted(state->stack[spi].spilled_ptr.dynptr.type)) {
+		int ref_obj_id = state->stack[spi].spilled_ptr.ref_obj_id;
+		int i;
+
+		/* If the dynptr has a ref_obj_id, then we need to invalidate
+		 * two things:
+		 *
+		 * 1) Any dynptrs with a matching ref_obj_id (clones)
+		 * 2) Any slices associated with the ref_obj_id
+		 */
+
+		/* Invalidate any slices associated with this dynptr */
+		WARN_ON_ONCE(release_reference(env, ref_obj_id));
+
+		/* Invalidate any dynptr clones */
+		for (i = 1; i < state->allocated_stack / BPF_REG_SIZE; i++) {
+			if (state->stack[i].spilled_ptr.ref_obj_id == ref_obj_id) {
+				/* it should always be the case that if the ref obj id
+				 * matches then the stack slot also belongs to a
+				 * dynptr
+				 */
+				if (state->stack[i].slot_type[0] != STACK_DYNPTR) {
+					verbose(env, "verifier internal error: misconfigured ref_obj_id\n");
+					return -EFAULT;
+				}
+				if (state->stack[i].spilled_ptr.dynptr.first_slot)
+					invalidate_dynptr(env, state, i);
+			}
+		}
+
+		return 0;
+	}
+
+	invalidate_dynptr(env, state, spi);
 
 	return 0;
 }
@@ -6967,6 +7004,50 @@  static int process_iter_next_call(struct bpf_verifier_env *env, int insn_idx,
 	return 0;
 }
 
+static int handle_dynptr_clone(struct bpf_verifier_env *env, enum bpf_arg_type arg_type,
+			       int regno, int insn_idx, struct bpf_kfunc_call_arg_meta *meta)
+{
+	struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno];
+	struct bpf_reg_state *first_reg_state, *second_reg_state;
+	struct bpf_func_state *state = func(env, reg);
+	enum bpf_dynptr_type dynptr_type = meta->initialized_dynptr.type;
+	int err, spi, ref_obj_id;
+
+	if (!dynptr_type) {
+		verbose(env, "verifier internal error: no dynptr type for bpf_dynptr_clone\n");
+		return -EFAULT;
+	}
+	arg_type |= get_dynptr_type_flag(dynptr_type);
+
+	err = process_dynptr_func(env, regno, insn_idx, arg_type);
+	if (err < 0)
+		return err;
+
+	spi = dynptr_get_spi(env, reg);
+	if (spi < 0)
+		return spi;
+
+	first_reg_state = &state->stack[spi].spilled_ptr;
+	second_reg_state = &state->stack[spi - 1].spilled_ptr;
+	ref_obj_id = first_reg_state->ref_obj_id;
+
+	/* reassign the clone the same dynptr id as the original */
+	__mark_dynptr_reg(first_reg_state, dynptr_type, true, meta->initialized_dynptr.id);
+	__mark_dynptr_reg(second_reg_state, dynptr_type, false, meta->initialized_dynptr.id);
+
+	if (meta->initialized_dynptr.ref_obj_id) {
+		/* release the new ref obj id assigned during process_dynptr_func */
+		err = release_reference_state(cur_func(env), ref_obj_id);
+		if (err)
+			return err;
+		/* reassign the clone the same ref obj id as the original */
+		first_reg_state->ref_obj_id = meta->initialized_dynptr.ref_obj_id;
+		second_reg_state->ref_obj_id = meta->initialized_dynptr.ref_obj_id;
+	}
+
+	return 0;
+}
+
 static bool arg_type_is_mem_size(enum bpf_arg_type type)
 {
 	return type == ARG_CONST_SIZE ||
@@ -9615,6 +9696,7 @@  enum special_kfunc_type {
 	KF_bpf_dynptr_from_xdp,
 	KF_bpf_dynptr_slice,
 	KF_bpf_dynptr_slice_rdwr,
+	KF_bpf_dynptr_clone,
 };
 
 BTF_SET_START(special_kfunc_set)
@@ -9633,6 +9715,7 @@  BTF_ID(func, bpf_dynptr_from_skb)
 BTF_ID(func, bpf_dynptr_from_xdp)
 BTF_ID(func, bpf_dynptr_slice)
 BTF_ID(func, bpf_dynptr_slice_rdwr)
+BTF_ID(func, bpf_dynptr_clone)
 BTF_SET_END(special_kfunc_set)
 
 BTF_ID_LIST(special_kfunc_list)
@@ -9653,6 +9736,7 @@  BTF_ID(func, bpf_dynptr_from_skb)
 BTF_ID(func, bpf_dynptr_from_xdp)
 BTF_ID(func, bpf_dynptr_slice)
 BTF_ID(func, bpf_dynptr_slice_rdwr)
+BTF_ID(func, bpf_dynptr_clone)
 
 static bool is_kfunc_bpf_rcu_read_lock(struct bpf_kfunc_call_arg_meta *meta)
 {
@@ -10414,10 +10498,24 @@  static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 			if (is_kfunc_arg_uninit(btf, &args[i]))
 				dynptr_arg_type |= MEM_UNINIT;
 
-			if (meta->func_id == special_kfunc_list[KF_bpf_dynptr_from_skb])
+			if (meta->func_id == special_kfunc_list[KF_bpf_dynptr_from_skb]) {
 				dynptr_arg_type |= DYNPTR_TYPE_SKB;
-			else if (meta->func_id == special_kfunc_list[KF_bpf_dynptr_from_xdp])
+			} else if (meta->func_id == special_kfunc_list[KF_bpf_dynptr_from_xdp]) {
 				dynptr_arg_type |= DYNPTR_TYPE_XDP;
+			} else if (meta->func_id == special_kfunc_list[KF_bpf_dynptr_clone] &&
+				   (dynptr_arg_type & MEM_UNINIT)) {
+				/* bpf_dynptr_clone is special.
+				 *
+				 * we need to assign the clone the same dynptr type and
+				 * the clone needs to have the same id and ref_obj_id as
+				 * the original dynptr
+				 */
+				ret = handle_dynptr_clone(env, dynptr_arg_type, regno, insn_idx, meta);
+				if (ret < 0)
+					return ret;
+
+				break;
+			}
 
 			ret = process_dynptr_func(env, regno, insn_idx, dynptr_arg_type);
 			if (ret < 0)
@@ -10432,6 +10530,7 @@  static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 				}
 				meta->initialized_dynptr.id = id;
 				meta->initialized_dynptr.type = dynptr_get_type(env, reg);
+				meta->initialized_dynptr.ref_obj_id = dynptr_ref_obj_id(env, reg);
 			}
 
 			break;