Message ID | 20221217082506.1570898-3-davemarchevsky@fb.com (mailing list archive)
---|---
State | Changes Requested
Delegated to: | BPF
Series | BPF rbtree next-gen datastructure
On 12/17/22 3:24 AM, Dave Marchevsky wrote: > This patch introduces non-owning reference semantics to the verifier, > specifically linked_list API kfunc handling. release_on_unlock logic for > refs is refactored - with small functional changes - to implement these > semantics, and bpf_list_push_{front,back} are migrated to use them. > > When a list node is pushed to a list, the program still has a pointer to > the node: > > n = bpf_obj_new(typeof(*n)); > > bpf_spin_lock(&l); > bpf_list_push_back(&l, n); > /* n still points to the just-added node */ > bpf_spin_unlock(&l); > > What the verifier considers n to be after the push, and thus what can be > done with n, are changed by this patch. > > Common properties both before/after this patch: > * After push, n is only a valid reference to the node until end of > critical section > * After push, n cannot be pushed to any list > * After push, the program can read the node's fields using n > > Before: > * After push, n retains the ref_obj_id which it received on > bpf_obj_new, but the associated bpf_reference_state's > release_on_unlock field is set to true > * release_on_unlock field and associated logic is used to implement > "n is only a valid ref until end of critical section" > * After push, n cannot be written to, the node must be removed from > the list before writing to its fields > * After push, n is marked PTR_UNTRUSTED > > After: > * After push, n's ref is released and ref_obj_id set to 0. The > bpf_reg_state's non_owning_ref_lock struct is populated with the > currently active lock > * non_owning_ref_lock and logic is used to implement "n is only a > valid ref until end of critical section" > * n can be written to (except for special fields e.g. bpf_list_node, > timer, ...) > * No special type flag is added to n after push > > Summary of specific implementation changes to achieve the above: > > * release_on_unlock field, ref_set_release_on_unlock helper, and logic > to "release on unlock" based on that field are removed > > * The anonymous active_lock struct used by bpf_verifier_state is > pulled out into a named struct bpf_active_lock. > > * A non_owning_ref_lock field of type bpf_active_lock is added to > bpf_reg_state's PTR_TO_BTF_ID union > > * Helpers are added to use non_owning_ref_lock to implement non-owning > ref semantics as described above > * invalidate_non_owning_refs - helper to clobber all non-owning refs > matching a particular bpf_active_lock identity. Replaces > release_on_unlock logic in process_spin_lock. > * ref_set_non_owning_lock - set non_owning_ref_lock for a reg based > on current verifier state > * ref_convert_owning_non_owning - convert owning reference w/ > specified ref_obj_id to non-owning references. Setup > non_owning_ref_lock for each reg with that ref_obj_id and 0 out > its ref_obj_id > > * New KF_RELEASE_NON_OWN flag is added, to be used in conjunction with > KF_RELEASE to indicate that the release arg reg should be converted > to non-owning ref > * Plain KF_RELEASE would clobber all regs with ref_obj_id matching > the release arg reg's. KF_RELEASE_NON_OWN's logic triggers first - > doing ref_convert_owning_non_owning on the ref first, which > prevents the regs from being clobbered by 0ing out their > ref_obj_ids. The bpf_reference_state itself is still released via > release_reference as a result of the KF_RELEASE flag. 
> * KF_RELEASE | KF_RELEASE_NON_OWN are added to > bpf_list_push_{front,back} > > After these changes, linked_list's "release on unlock" logic continues > to function as before, except for the semantic differences noted above. > The patch immediately following this one makes minor changes to > linked_list selftests to account for the differing behavior. > > Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> > --- > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h > index 53d175cbaa02..cb417ffbbb84 100644 > --- a/include/linux/bpf_verifier.h > +++ b/include/linux/bpf_verifier.h > @@ -43,6 +43,22 @@ enum bpf_reg_liveness { > REG_LIVE_DONE = 0x8, /* liveness won't be updating this register anymore */ > }; [...] > struct bpf_reg_state { > /* Ordering of fields matters. See states_equal() */ > enum bpf_reg_type type; > @@ -68,6 +84,7 @@ struct bpf_reg_state { > struct { > struct btf *btf; > u32 btf_id; > + struct bpf_active_lock non_owning_ref_lock; > }; > I think it's possible for this to be a pointer by just pointing to struct bpf_verifier_state's active_lock. Why? * There can currently only be one active_lock at a time * non-owning refs are only valid in the critical section So if a verifier_state has an active_lock, any non-owning ref must've been obtained under that lock, and any non-owning ref not obtained under that lock must have been invalidated previously. This will keep bpf_reg_state size down. Will give it a shot for v3, wanted to leave it in current state for v2 so logic in this patch is easier to reason about. Actually, if above logic is correct, then only valid states for non_owning_ref_lock are "empty / null" and "same as current verifier_state", in which case this can go back to being a bool. But for non-spin_unlock invalidation points (e.g. rbtree_remove), we may want to keep additional info around to avoid invalidating everything, which would require re-introducing a non_owning_ref identity. > u32 mem_size; /* for PTR_TO_MEM | PTR_TO_MEM_OR_NULL */ > @@ -223,11 +240,6 @@ struct bpf_reference_state { > * exiting a callback function. > */ > int callback_ref; > - /* Mark the reference state to release the registers sharing the same id > - * on bpf_spin_unlock (for nodes that we will lose ownership to but are > - * safe to access inside the critical section). > - */ > - bool release_on_unlock; > }; > > /* state of the program: > @@ -328,21 +340,8 @@ struct bpf_verifier_state { > u32 branches; > u32 insn_idx; > u32 curframe; > - /* For every reg representing a map value or allocated object pointer, > - * we consider the tuple of (ptr, id) for them to be unique in verifier > - * context and conside them to not alias each other for the purposes of > - * tracking lock state. > - */ > - struct { > - /* This can either be reg->map_ptr or reg->btf. If ptr is NULL, > - * there's no active lock held, and other fields have no > - * meaning. If non-NULL, it indicates that a lock is held and > - * id member has the reg->id of the register which can be >= 0. > - */ > - void *ptr; > - /* This will be reg->id */ > - u32 id; > - } active_lock; > + > + struct bpf_active_lock active_lock; > bool speculative; > bool active_rcu_lock; [...]
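To make the before/after concrete, here is a minimal sketch of what the verifier accepts once this patch lands. The program shape loosely follows the linked_list selftests (the `private()` and `__contains()` annotations are the selftests' helper macros); the node_data type, field names, and section name are illustrative, not taken from this series:

```c
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include "bpf_experimental.h"

struct node_data {
	long data;
	struct bpf_list_node node;
};

private(A) struct bpf_spin_lock glock;
private(A) struct bpf_list_head ghead __contains(node_data, node);

SEC("tc")
int push_then_use(void *ctx)
{
	struct node_data *n;

	n = bpf_obj_new(typeof(*n));	/* owning ref, ref_obj_id > 0 */
	if (!n)
		return 0;

	bpf_spin_lock(&glock);
	bpf_list_push_back(&ghead, &n->node);
	/* n is now a non-owning ref: reads and - new in this patch -
	 * writes to normal fields are allowed until the end of the
	 * critical section...
	 */
	n->data = 42;
	/* ...but a second push is rejected, since n no longer owns
	 * the node:
	 *	bpf_list_push_back(&ghead, &n->node);
	 */
	bpf_spin_unlock(&glock);

	/* The unlock invalidated the non-owning ref; any use of n
	 * here is rejected:
	 *	n->data = 0;
	 */
	return 0;
}
```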
Hi Dave,

url:    https://github.com/intel-lab-lkp/linux/commits/Dave-Marchevsky/BPF-rbtree-next-gen-datastructure/20221217-162646
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/20221217082506.1570898-3-davemarchevsky%40fb.com
patch subject: [PATCH v2 bpf-next 02/13] bpf: Migrate release_on_unlock logic to non-owning ref semantics
config: x86_64-randconfig-m001
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0

If you fix the issue, kindly add the following tags where applicable:
| Reported-by: kernel test robot <lkp@intel.com>
| Reported-by: Dan Carpenter <error27@gmail.com>

smatch warnings:
kernel/bpf/verifier.c:6275 reg_find_field_offset() warn: variable dereferenced before check 'reg' (see line 6274)

vim +/reg +6275 kernel/bpf/verifier.c

4ed17b8d6842ba Dave Marchevsky 2022-12-17  6268  static struct btf_field *
4ed17b8d6842ba Dave Marchevsky 2022-12-17  6269  reg_find_field_offset(const struct bpf_reg_state *reg, s32 off, u32 fields)
4ed17b8d6842ba Dave Marchevsky 2022-12-17  6270  {
4ed17b8d6842ba Dave Marchevsky 2022-12-17  6271  	struct btf_field *field;
4ed17b8d6842ba Dave Marchevsky 2022-12-17  6272  	struct btf_record *rec;
4ed17b8d6842ba Dave Marchevsky 2022-12-17  6273
4ed17b8d6842ba Dave Marchevsky 2022-12-17 @6274  	rec = reg_btf_record(reg);
4ed17b8d6842ba Dave Marchevsky 2022-12-17 @6275  	if (!reg)

Is this supposed to test rec instead of reg?

4ed17b8d6842ba Dave Marchevsky 2022-12-17  6276  		return NULL;
4ed17b8d6842ba Dave Marchevsky 2022-12-17  6277
4ed17b8d6842ba Dave Marchevsky 2022-12-17  6278  	field = btf_record_find(rec, off, fields);
4ed17b8d6842ba Dave Marchevsky 2022-12-17  6279  	if (!field)
4ed17b8d6842ba Dave Marchevsky 2022-12-17  6280  		return NULL;
4ed17b8d6842ba Dave Marchevsky 2022-12-17  6281
4ed17b8d6842ba Dave Marchevsky 2022-12-17  6282  	return field;
4ed17b8d6842ba Dave Marchevsky 2022-12-17  6283  }
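Assuming Dan's reading is right and the null check is simply aimed at the wrong variable, the fix would presumably be the one-liner below (untested; only Dave can confirm the intent):

```c
static struct btf_field *
reg_find_field_offset(const struct bpf_reg_state *reg, s32 off, u32 fields)
{
	struct btf_field *field;
	struct btf_record *rec;

	rec = reg_btf_record(reg);	/* reg is already dereferenced here */
	if (!rec)			/* was: if (!reg) */
		return NULL;

	field = btf_record_find(rec, off, fields);
	if (!field)
		return NULL;

	return field;
}
```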
On Sat, Dec 17, 2022 at 12:24:55AM -0800, Dave Marchevsky wrote: > This patch introduces non-owning reference semantics to the verifier, > specifically linked_list API kfunc handling. release_on_unlock logic for > refs is refactored - with small functional changes - to implement these > semantics, and bpf_list_push_{front,back} are migrated to use them. > > When a list node is pushed to a list, the program still has a pointer to > the node: > > n = bpf_obj_new(typeof(*n)); > > bpf_spin_lock(&l); > bpf_list_push_back(&l, n); > /* n still points to the just-added node */ > bpf_spin_unlock(&l); > > What the verifier considers n to be after the push, and thus what can be > done with n, are changed by this patch. > > Common properties both before/after this patch: > * After push, n is only a valid reference to the node until end of > critical section > * After push, n cannot be pushed to any list > * After push, the program can read the node's fields using n > > Before: > * After push, n retains the ref_obj_id which it received on > bpf_obj_new, but the associated bpf_reference_state's > release_on_unlock field is set to true > * release_on_unlock field and associated logic is used to implement > "n is only a valid ref until end of critical section" > * After push, n cannot be written to, the node must be removed from > the list before writing to its fields > * After push, n is marked PTR_UNTRUSTED > > After: > * After push, n's ref is released and ref_obj_id set to 0. The > bpf_reg_state's non_owning_ref_lock struct is populated with the > currently active lock > * non_owning_ref_lock and logic is used to implement "n is only a > valid ref until end of critical section" > * n can be written to (except for special fields e.g. bpf_list_node, > timer, ...) > * No special type flag is added to n after push > > Summary of specific implementation changes to achieve the above: > > * release_on_unlock field, ref_set_release_on_unlock helper, and logic > to "release on unlock" based on that field are removed > > * The anonymous active_lock struct used by bpf_verifier_state is > pulled out into a named struct bpf_active_lock. > > * A non_owning_ref_lock field of type bpf_active_lock is added to > bpf_reg_state's PTR_TO_BTF_ID union > > * Helpers are added to use non_owning_ref_lock to implement non-owning > ref semantics as described above > * invalidate_non_owning_refs - helper to clobber all non-owning refs > matching a particular bpf_active_lock identity. Replaces > release_on_unlock logic in process_spin_lock. > * ref_set_non_owning_lock - set non_owning_ref_lock for a reg based > on current verifier state > * ref_convert_owning_non_owning - convert owning reference w/ > specified ref_obj_id to non-owning references. Setup > non_owning_ref_lock for each reg with that ref_obj_id and 0 out > its ref_obj_id > > * New KF_RELEASE_NON_OWN flag is added, to be used in conjunction with > KF_RELEASE to indicate that the release arg reg should be converted > to non-owning ref > * Plain KF_RELEASE would clobber all regs with ref_obj_id matching > the release arg reg's. KF_RELEASE_NON_OWN's logic triggers first - > doing ref_convert_owning_non_owning on the ref first, which > prevents the regs from being clobbered by 0ing out their > ref_obj_ids. The bpf_reference_state itself is still released via > release_reference as a result of the KF_RELEASE flag. 
> * KF_RELEASE | KF_RELEASE_NON_OWN are added to > bpf_list_push_{front,back} > > After these changes, linked_list's "release on unlock" logic continues > to function as before, except for the semantic differences noted above. > The patch immediately following this one makes minor changes to > linked_list selftests to account for the differing behavior. > > Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Hey Dave, I'm sorry to be chiming in a bit late in the game here, but I only finally had the time to fully review some of this stuff during the holiday-lull, and I have a few questions / concerns about the whole owning vs. non-owning refcount approach we're taking here. > --- > include/linux/bpf.h | 1 + > include/linux/bpf_verifier.h | 39 ++++----- > include/linux/btf.h | 17 ++-- > kernel/bpf/helpers.c | 4 +- > kernel/bpf/verifier.c | 164 ++++++++++++++++++++++++----------- > 5 files changed, 146 insertions(+), 79 deletions(-) > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > index 3de24cfb7a3d..f71571bf6adc 100644 > --- a/include/linux/bpf.h > +++ b/include/linux/bpf.h > @@ -180,6 +180,7 @@ enum btf_field_type { > BPF_KPTR = BPF_KPTR_UNREF | BPF_KPTR_REF, > BPF_LIST_HEAD = (1 << 4), > BPF_LIST_NODE = (1 << 5), > + BPF_GRAPH_NODE_OR_ROOT = BPF_LIST_NODE | BPF_LIST_HEAD, > }; > > struct btf_field_kptr { > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h > index 53d175cbaa02..cb417ffbbb84 100644 > --- a/include/linux/bpf_verifier.h > +++ b/include/linux/bpf_verifier.h > @@ -43,6 +43,22 @@ enum bpf_reg_liveness { > REG_LIVE_DONE = 0x8, /* liveness won't be updating this register anymore */ > }; > > +/* For every reg representing a map value or allocated object pointer, > + * we consider the tuple of (ptr, id) for them to be unique in verifier > + * context and conside them to not alias each other for the purposes of > + * tracking lock state. > + */ > +struct bpf_active_lock { > + /* This can either be reg->map_ptr or reg->btf. If ptr is NULL, > + * there's no active lock held, and other fields have no > + * meaning. If non-NULL, it indicates that a lock is held and > + * id member has the reg->id of the register which can be >= 0. > + */ > + void *ptr; > + /* This will be reg->id */ > + u32 id; > +}; > + > struct bpf_reg_state { > /* Ordering of fields matters. See states_equal() */ > enum bpf_reg_type type; > @@ -68,6 +84,7 @@ struct bpf_reg_state { > struct { > struct btf *btf; > u32 btf_id; > + struct bpf_active_lock non_owning_ref_lock; > }; > > u32 mem_size; /* for PTR_TO_MEM | PTR_TO_MEM_OR_NULL */ > @@ -223,11 +240,6 @@ struct bpf_reference_state { > * exiting a callback function. > */ > int callback_ref; > - /* Mark the reference state to release the registers sharing the same id > - * on bpf_spin_unlock (for nodes that we will lose ownership to but are > - * safe to access inside the critical section). > - */ > - bool release_on_unlock; > }; > > /* state of the program: > @@ -328,21 +340,8 @@ struct bpf_verifier_state { > u32 branches; > u32 insn_idx; > u32 curframe; > - /* For every reg representing a map value or allocated object pointer, > - * we consider the tuple of (ptr, id) for them to be unique in verifier > - * context and conside them to not alias each other for the purposes of > - * tracking lock state. > - */ > - struct { > - /* This can either be reg->map_ptr or reg->btf. If ptr is NULL, > - * there's no active lock held, and other fields have no > - * meaning. 
If non-NULL, it indicates that a lock is held and > - * id member has the reg->id of the register which can be >= 0. > - */ > - void *ptr; > - /* This will be reg->id */ > - u32 id; > - } active_lock; > + > + struct bpf_active_lock active_lock; > bool speculative; > bool active_rcu_lock; > > diff --git a/include/linux/btf.h b/include/linux/btf.h > index 5f628f323442..8aee3f7f4248 100644 > --- a/include/linux/btf.h > +++ b/include/linux/btf.h > @@ -15,10 +15,10 @@ > #define BTF_TYPE_EMIT_ENUM(enum_val) ((void)enum_val) > > /* These need to be macros, as the expressions are used in assembler input */ > -#define KF_ACQUIRE (1 << 0) /* kfunc is an acquire function */ > -#define KF_RELEASE (1 << 1) /* kfunc is a release function */ > -#define KF_RET_NULL (1 << 2) /* kfunc returns a pointer that may be NULL */ > -#define KF_KPTR_GET (1 << 3) /* kfunc returns reference to a kptr */ > +#define KF_ACQUIRE (1 << 0) /* kfunc is an acquire function */ > +#define KF_RELEASE (1 << 1) /* kfunc is a release function */ > +#define KF_RET_NULL (1 << 2) /* kfunc returns a pointer that may be NULL */ > +#define KF_KPTR_GET (1 << 3) /* kfunc returns reference to a kptr */ > /* Trusted arguments are those which are guaranteed to be valid when passed to > * the kfunc. It is used to enforce that pointers obtained from either acquire > * kfuncs, or from the main kernel on a tracepoint or struct_ops callback > @@ -67,10 +67,11 @@ > * return 0; > * } > */ > -#define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */ > -#define KF_SLEEPABLE (1 << 5) /* kfunc may sleep */ > -#define KF_DESTRUCTIVE (1 << 6) /* kfunc performs destructive actions */ > -#define KF_RCU (1 << 7) /* kfunc only takes rcu pointer arguments */ > +#define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */ > +#define KF_SLEEPABLE (1 << 5) /* kfunc may sleep */ > +#define KF_DESTRUCTIVE (1 << 6) /* kfunc performs destructive actions */ > +#define KF_RCU (1 << 7) /* kfunc only takes rcu pointer arguments */ > +#define KF_RELEASE_NON_OWN (1 << 8) /* kfunc converts its referenced arg into non-owning ref */ It would be nice if we could come up with new kfunc flag names that don't have 'RELEASE' in it. As is this is arguably a bit of a leaky abstraction given that kfunc authors now have to understand a notion of "releasing", "releasing but keeping a non-owning ref", and "releasing but it must be a non-owning reference". I know that in [0] you mention that the notions of owning and non-owning references are entirely relegated to graph-type maps, but I disagree. More below. [0]: https://lore.kernel.org/all/20221217082506.1570898-14-davemarchevsky@fb.com/ In general, IMO this muddies the existing, crystal-clear semantics of BPF object ownership and refcounting. Usually a "weak" or "non-owning" reference is a shadow of a strong reference, and "using" the weak reference requires attempting (because it could fail) to temporarily promote it to a strong reference. If successful, the object's existence is guaranteed until the weak pointer is demoted back to a weak pointer and/or the promoted strong pointer is released, and it's perfectly valid for an object's lifetime to be extended due to a promoted weak pointer not dropping its reference until after all the other strong pointer references have been dropped. The key point here is that a pointer's safety is entirely dictated by whether or not the holder has or is able to acquire a strong reference, and nothing more. 
In contrast, if I understand correctly, in this proposal a "non-owning" reference means that the object is guaranteed to be valid due to external factors such as a lock being held on the root node of the graph, and is used to e.g. signal whether an object has or has not yet been added as a node to an rbtree or a list. If so, IMO these are completely separate concepts from refcounting, and I don't think we should intertwine it with the acquire / release semantics that we currently use for ensuring object lifetime. Note that weak references are usually (if not always, at least in my experience) used to resolve circular dependencies where the reference would always be leaked if both sides had a strong reference. I don't think that applies here, where instead we're using "owning reference" to mean that ownership of the object has not yet been passed to a graph-type data structure, and "non-owning reference" to mean that the graph now owns the strong reference, but it's still safe to reference the object due to it being protected by some external synchronization mechanism like a lock. There's no danger of a circular dependency here, we just want to provide consistent API semantics. If we want to encapsulate notions of "safe due to a lock being held on a root node", and "pointer hasn't yet been inserted into the graph", I think we should consider adding some entirely separate abstractions. For example, something like PTR_GRAPH_STORED on the register type-modifier side for signaling whether a pointer has already been stored in a graph, and KF_GRAPH_INSERT, KF_GRAPH_REMOVE type kfunc flags for adding and removing from graphs respectively. I don't think we'd have to add anything at all for ensuring pointer safety from the lock being held, as the verifier should be able to figure out that a pointer that was inserted with KF_GRAPH_INSERT is safe to reference inside of the locked region of the lock associated with the root node. The refcnt of the object isn't relevant at all, it's the association of the root node with a specific lock. > > /* > * Return the name of the passed struct, if exists, or halt the build if for > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c > index af30c6cbd65d..e041409779c3 100644 > --- a/kernel/bpf/helpers.c > +++ b/kernel/bpf/helpers.c > @@ -2049,8 +2049,8 @@ BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE) > #endif > BTF_ID_FLAGS(func, bpf_obj_new_impl, KF_ACQUIRE | KF_RET_NULL) > BTF_ID_FLAGS(func, bpf_obj_drop_impl, KF_RELEASE) > -BTF_ID_FLAGS(func, bpf_list_push_front) > -BTF_ID_FLAGS(func, bpf_list_push_back) > +BTF_ID_FLAGS(func, bpf_list_push_front, KF_RELEASE | KF_RELEASE_NON_OWN) > +BTF_ID_FLAGS(func, bpf_list_push_back, KF_RELEASE | KF_RELEASE_NON_OWN) I don't think a helper should specify both of these flags together. IIUC, what this is saying is something along the lines of, "Release the reference, but rather than actually releasing it, just keep it and convert it into a non-owning reference". IMO KF_RELEASE should always mean, exclusively, "I'm releasing a previously-acquired strong reference to an object", and the expectation should be that the object cannot be referenced _at all_ afterwards, unless you happen to have another strong reference. IMO this is another sign that we should consider going in a different direction for owning vs. non-owning references. I don't think this makes sense from an object-refcounting perspective, but I readily admit that I could be missing a lot of important context here. [...] Thanks, David
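For readers less familiar with the pattern David is describing, here is a minimal sketch of weak-reference promotion using the kernel's refcount API; the struct layout and helper name are invented for illustration:

```c
#include <linux/refcount.h>

struct obj {
	refcount_t strong;	/* owning refs; payload destroyed at 0 */
	refcount_t weak;	/* weak refs pin only the struct itself */
};

/* "Using" a weak ref means first attempting promotion to a strong
 * ref, which fails once every strong ref has been dropped, so the
 * caller must handle NULL.
 */
static struct obj *obj_promote_weak(struct obj *o)
{
	return refcount_inc_not_zero(&o->strong) ? o : NULL;
}
```

The contrast with this series is that a non-owning ref never needs promotion: its validity comes from the held lock rather than from a refcount, which is the crux of the objection above.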
On Sat, Dec 17, 2022 at 12:24:55AM -0800, Dave Marchevsky wrote: > This patch introduces non-owning reference semantics to the verifier, > specifically linked_list API kfunc handling. release_on_unlock logic for > refs is refactored - with small functional changes - to implement these > semantics, and bpf_list_push_{front,back} are migrated to use them. > > When a list node is pushed to a list, the program still has a pointer to > the node: > > n = bpf_obj_new(typeof(*n)); > > bpf_spin_lock(&l); > bpf_list_push_back(&l, n); > /* n still points to the just-added node */ > bpf_spin_unlock(&l); > > What the verifier considers n to be after the push, and thus what can be > done with n, are changed by this patch. > > Common properties both before/after this patch: > * After push, n is only a valid reference to the node until end of > critical section > * After push, n cannot be pushed to any list > * After push, the program can read the node's fields using n correct. > Before: > * After push, n retains the ref_obj_id which it received on > bpf_obj_new, but the associated bpf_reference_state's > release_on_unlock field is set to true > * release_on_unlock field and associated logic is used to implement > "n is only a valid ref until end of critical section" > * After push, n cannot be written to, the node must be removed from > the list before writing to its fields > * After push, n is marked PTR_UNTRUSTED yep > After: > * After push, n's ref is released and ref_obj_id set to 0. The > bpf_reg_state's non_owning_ref_lock struct is populated with the > currently active lock > * non_owning_ref_lock and logic is used to implement "n is only a > valid ref until end of critical section" > * n can be written to (except for special fields e.g. bpf_list_node, > timer, ...) > * No special type flag is added to n after push yep. Great summary. > Summary of specific implementation changes to achieve the above: > > * release_on_unlock field, ref_set_release_on_unlock helper, and logic > to "release on unlock" based on that field are removed +1 > * The anonymous active_lock struct used by bpf_verifier_state is > pulled out into a named struct bpf_active_lock. ... > * A non_owning_ref_lock field of type bpf_active_lock is added to > bpf_reg_state's PTR_TO_BTF_ID union not great. see below. > * Helpers are added to use non_owning_ref_lock to implement non-owning > ref semantics as described above > * invalidate_non_owning_refs - helper to clobber all non-owning refs > matching a particular bpf_active_lock identity. Replaces > release_on_unlock logic in process_spin_lock. +1 > * ref_set_non_owning_lock - set non_owning_ref_lock for a reg based > on current verifier state +1 > * ref_convert_owning_non_owning - convert owning reference w/ > specified ref_obj_id to non-owning references. Setup > non_owning_ref_lock for each reg with that ref_obj_id and 0 out > its ref_obj_id +1 > * New KF_RELEASE_NON_OWN flag is added, to be used in conjunction with > KF_RELEASE to indicate that the release arg reg should be converted > to non-owning ref > * Plain KF_RELEASE would clobber all regs with ref_obj_id matching > the release arg reg's. KF_RELEASE_NON_OWN's logic triggers first - > doing ref_convert_owning_non_owning on the ref first, which > prevents the regs from being clobbered by 0ing out their > ref_obj_ids. The bpf_reference_state itself is still released via > release_reference as a result of the KF_RELEASE flag. 
> * KF_RELEASE | KF_RELEASE_NON_OWN are added to > bpf_list_push_{front,back} And this bit is confusing and not generalizable. As David noticed in his reply KF_RELEASE_NON_OWN is not a great name. It's hard to come up with a good name and it won't be generic anyway. The ref_convert_owning_non_owning has to be applied to a specific arg. The function itself is not KF_RELEASE in the current definition of it. The combination of KF_RELEASE|KF_RELEASE_NON_OWN is something new that should have been generic, but doesn't really work this way. In the next patches rbtree_root/node still has to have all the custom logic. KF_RELEASE_NON_OWN by itself is a nonsensical flag. Only combination of KF_RELEASE|KF_RELEASE_NON_OWN sort-of kinda makes sense, but still hard to understand what releases what. More below. > After these changes, linked_list's "release on unlock" logic continues > to function as before, except for the semantic differences noted above. > The patch immediately following this one makes minor changes to > linked_list selftests to account for the differing behavior. > > Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> > --- > include/linux/bpf.h | 1 + > include/linux/bpf_verifier.h | 39 ++++----- > include/linux/btf.h | 17 ++-- > kernel/bpf/helpers.c | 4 +- > kernel/bpf/verifier.c | 164 ++++++++++++++++++++++++----------- > 5 files changed, 146 insertions(+), 79 deletions(-) > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > index 3de24cfb7a3d..f71571bf6adc 100644 > --- a/include/linux/bpf.h > +++ b/include/linux/bpf.h > @@ -180,6 +180,7 @@ enum btf_field_type { > BPF_KPTR = BPF_KPTR_UNREF | BPF_KPTR_REF, > BPF_LIST_HEAD = (1 << 4), > BPF_LIST_NODE = (1 << 5), > + BPF_GRAPH_NODE_OR_ROOT = BPF_LIST_NODE | BPF_LIST_HEAD, > }; > > struct btf_field_kptr { > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h > index 53d175cbaa02..cb417ffbbb84 100644 > --- a/include/linux/bpf_verifier.h > +++ b/include/linux/bpf_verifier.h > @@ -43,6 +43,22 @@ enum bpf_reg_liveness { > REG_LIVE_DONE = 0x8, /* liveness won't be updating this register anymore */ > }; > > +/* For every reg representing a map value or allocated object pointer, > + * we consider the tuple of (ptr, id) for them to be unique in verifier > + * context and conside them to not alias each other for the purposes of > + * tracking lock state. > + */ > +struct bpf_active_lock { > + /* This can either be reg->map_ptr or reg->btf. If ptr is NULL, > + * there's no active lock held, and other fields have no > + * meaning. If non-NULL, it indicates that a lock is held and > + * id member has the reg->id of the register which can be >= 0. > + */ > + void *ptr; > + /* This will be reg->id */ > + u32 id; > +}; > + > struct bpf_reg_state { > /* Ordering of fields matters. See states_equal() */ > enum bpf_reg_type type; > @@ -68,6 +84,7 @@ struct bpf_reg_state { > struct { > struct btf *btf; > u32 btf_id; > + struct bpf_active_lock non_owning_ref_lock; In your other email you argue that pointer should be enough. I suspect that won't be correct. See fixes that Andrii did in states_equal() and regsafe(). In particular: if (!!old->active_lock.id != !!cur->active_lock.id) return false; if (old->active_lock.id && !check_ids(old->active_lock.id, cur->active_lock.id, env->idmap_scratch)) return false; We have to do the comparison of this new ID via idmap as well. I think introduction of struct bpf_active_lock and addition of it to bpf_reg_state is overkill. 
Here we can add 'u32 non_own_ref_obj_id;' only and compare it via idmap in regsafe(). I'm guessing you didn't like my 'active_lock_id' suggestion. Fine. non_own_ref_obj_id would match existing ref_obj_id at least. > }; > > u32 mem_size; /* for PTR_TO_MEM | PTR_TO_MEM_OR_NULL */ > @@ -223,11 +240,6 @@ struct bpf_reference_state { > * exiting a callback function. > */ > int callback_ref; > - /* Mark the reference state to release the registers sharing the same id > - * on bpf_spin_unlock (for nodes that we will lose ownership to but are > - * safe to access inside the critical section). > - */ > - bool release_on_unlock; > }; > > /* state of the program: > @@ -328,21 +340,8 @@ struct bpf_verifier_state { > u32 branches; > u32 insn_idx; > u32 curframe; > - /* For every reg representing a map value or allocated object pointer, > - * we consider the tuple of (ptr, id) for them to be unique in verifier > - * context and conside them to not alias each other for the purposes of > - * tracking lock state. > - */ > - struct { > - /* This can either be reg->map_ptr or reg->btf. If ptr is NULL, > - * there's no active lock held, and other fields have no > - * meaning. If non-NULL, it indicates that a lock is held and > - * id member has the reg->id of the register which can be >= 0. > - */ > - void *ptr; > - /* This will be reg->id */ > - u32 id; > - } active_lock; I would keep it as-is. > + > + struct bpf_active_lock active_lock; > bool speculative; > bool active_rcu_lock; > > diff --git a/include/linux/btf.h b/include/linux/btf.h > index 5f628f323442..8aee3f7f4248 100644 > --- a/include/linux/btf.h > +++ b/include/linux/btf.h > @@ -15,10 +15,10 @@ > #define BTF_TYPE_EMIT_ENUM(enum_val) ((void)enum_val) > > /* These need to be macros, as the expressions are used in assembler input */ > -#define KF_ACQUIRE (1 << 0) /* kfunc is an acquire function */ > -#define KF_RELEASE (1 << 1) /* kfunc is a release function */ > -#define KF_RET_NULL (1 << 2) /* kfunc returns a pointer that may be NULL */ > -#define KF_KPTR_GET (1 << 3) /* kfunc returns reference to a kptr */ > +#define KF_ACQUIRE (1 << 0) /* kfunc is an acquire function */ > +#define KF_RELEASE (1 << 1) /* kfunc is a release function */ > +#define KF_RET_NULL (1 << 2) /* kfunc returns a pointer that may be NULL */ > +#define KF_KPTR_GET (1 << 3) /* kfunc returns reference to a kptr */ > /* Trusted arguments are those which are guaranteed to be valid when passed to > * the kfunc. It is used to enforce that pointers obtained from either acquire > * kfuncs, or from the main kernel on a tracepoint or struct_ops callback > @@ -67,10 +67,11 @@ > * return 0; > * } > */ > -#define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */ > -#define KF_SLEEPABLE (1 << 5) /* kfunc may sleep */ > -#define KF_DESTRUCTIVE (1 << 6) /* kfunc performs destructive actions */ > -#define KF_RCU (1 << 7) /* kfunc only takes rcu pointer arguments */ > +#define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */ > +#define KF_SLEEPABLE (1 << 5) /* kfunc may sleep */ > +#define KF_DESTRUCTIVE (1 << 6) /* kfunc performs destructive actions */ > +#define KF_RCU (1 << 7) /* kfunc only takes rcu pointer arguments */ > +#define KF_RELEASE_NON_OWN (1 << 8) /* kfunc converts its referenced arg into non-owning ref */ No need for this flag. 
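Concretely, the regsafe() handling Alexei describes above would presumably mirror the existing active_lock.id checks in states_equal() — a sketch, assuming the reg field is collapsed to the proposed u32 non_own_ref_obj_id and using regsafe()'s existing idmap argument:

```c
	/* Inside regsafe(), alongside the existing reg->id checks:
	 * a non-owning ref in the old state only matches the current
	 * state if both have one and the ids correspond under the
	 * same mapping used for active_lock.id.
	 */
	if (!!rold->non_own_ref_obj_id != !!rcur->non_own_ref_obj_id)
		return false;
	if (rold->non_own_ref_obj_id &&
	    !check_ids(rold->non_own_ref_obj_id, rcur->non_own_ref_obj_id, idmap))
		return false;
```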
> /* > * Return the name of the passed struct, if exists, or halt the build if for > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c > index af30c6cbd65d..e041409779c3 100644 > --- a/kernel/bpf/helpers.c > +++ b/kernel/bpf/helpers.c > @@ -2049,8 +2049,8 @@ BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE) > #endif > BTF_ID_FLAGS(func, bpf_obj_new_impl, KF_ACQUIRE | KF_RET_NULL) > BTF_ID_FLAGS(func, bpf_obj_drop_impl, KF_RELEASE) > -BTF_ID_FLAGS(func, bpf_list_push_front) > -BTF_ID_FLAGS(func, bpf_list_push_back) > +BTF_ID_FLAGS(func, bpf_list_push_front, KF_RELEASE | KF_RELEASE_NON_OWN) > +BTF_ID_FLAGS(func, bpf_list_push_back, KF_RELEASE | KF_RELEASE_NON_OWN) No need for this. > BTF_ID_FLAGS(func, bpf_list_pop_front, KF_ACQUIRE | KF_RET_NULL) > BTF_ID_FLAGS(func, bpf_list_pop_back, KF_ACQUIRE | KF_RET_NULL) > BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS) > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > index 824e2242eae5..84b0660e2a76 100644 > --- a/kernel/bpf/verifier.c > +++ b/kernel/bpf/verifier.c > @@ -190,6 +190,10 @@ struct bpf_verifier_stack_elem { > > static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx); > static int release_reference(struct bpf_verifier_env *env, int ref_obj_id); > +static void invalidate_non_owning_refs(struct bpf_verifier_env *env, > + struct bpf_active_lock *lock); > +static int ref_set_non_owning_lock(struct bpf_verifier_env *env, > + struct bpf_reg_state *reg); > > static bool bpf_map_ptr_poisoned(const struct bpf_insn_aux_data *aux) > { > @@ -931,6 +935,9 @@ static void print_verifier_state(struct bpf_verifier_env *env, > verbose_a("id=%d", reg->id); > if (reg->ref_obj_id) > verbose_a("ref_obj_id=%d", reg->ref_obj_id); > + if (reg->non_owning_ref_lock.ptr) > + verbose_a("non_own_id=(%p,%d)", reg->non_owning_ref_lock.ptr, > + reg->non_owning_ref_lock.id); > if (t != SCALAR_VALUE) > verbose_a("off=%d", reg->off); > if (type_is_pkt_pointer(t)) > @@ -4820,7 +4827,8 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env, > return -EACCES; > } > > - if (type_is_alloc(reg->type) && !reg->ref_obj_id) { > + if (type_is_alloc(reg->type) && !reg->ref_obj_id && > + !reg->non_owning_ref_lock.ptr) { > verbose(env, "verifier internal error: ref_obj_id for allocated object must be non-zero\n"); > return -EFAULT; > } > @@ -5778,9 +5786,7 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno, > cur->active_lock.ptr = btf; > cur->active_lock.id = reg->id; > } else { > - struct bpf_func_state *fstate = cur_func(env); > void *ptr; > - int i; > > if (map) > ptr = map; > @@ -5796,25 +5802,11 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno, > verbose(env, "bpf_spin_unlock of different lock\n"); > return -EINVAL; > } > - cur->active_lock.ptr = NULL; > - cur->active_lock.id = 0; > > - for (i = fstate->acquired_refs - 1; i >= 0; i--) { > - int err; > + invalidate_non_owning_refs(env, &cur->active_lock); +1 > - /* Complain on error because this reference state cannot > - * be freed before this point, as bpf_spin_lock critical > - * section does not allow functions that release the > - * allocated object immediately. 
> - */ > - if (!fstate->refs[i].release_on_unlock) > - continue; > - err = release_reference(env, fstate->refs[i].id); > - if (err) { > - verbose(env, "failed to release release_on_unlock reference"); > - return err; > - } > - } > + cur->active_lock.ptr = NULL; > + cur->active_lock.id = 0; +1 > } > return 0; > } > @@ -6273,6 +6265,23 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno, > return 0; > } > > +static struct btf_field * > +reg_find_field_offset(const struct bpf_reg_state *reg, s32 off, u32 fields) > +{ > + struct btf_field *field; > + struct btf_record *rec; > + > + rec = reg_btf_record(reg); > + if (!reg) > + return NULL; > + > + field = btf_record_find(rec, off, fields); > + if (!field) > + return NULL; > + > + return field; > +} Doesn't look like that this helper is really necessary. > + > int check_func_arg_reg_off(struct bpf_verifier_env *env, > const struct bpf_reg_state *reg, int regno, > enum bpf_arg_type arg_type) > @@ -6294,6 +6303,18 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env, > */ > if (arg_type_is_dynptr(arg_type) && type == PTR_TO_STACK) > return 0; > + > + if (type == (PTR_TO_BTF_ID | MEM_ALLOC) && reg->off) { > + if (reg_find_field_offset(reg, reg->off, BPF_GRAPH_NODE_OR_ROOT)) > + return __check_ptr_off_reg(env, reg, regno, true); > + > + verbose(env, "R%d must have zero offset when passed to release func\n", > + regno); > + verbose(env, "No graph node or root found at R%d type:%s off:%d\n", regno, > + kernel_type_name(reg->btf, reg->btf_id), reg->off); > + return -EINVAL; > + } This bit is only necessary if we mark push_list as KF_RELEASE. Just don't add this mark and drop above. > + > /* Doing check_ptr_off_reg check for the offset will catch this > * because fixed_off_ok is false, but checking here allows us > * to give the user a better error message. > @@ -7055,6 +7076,20 @@ static int release_reference(struct bpf_verifier_env *env, > return 0; > } > > +static void invalidate_non_owning_refs(struct bpf_verifier_env *env, > + struct bpf_active_lock *lock) > +{ > + struct bpf_func_state *unused; > + struct bpf_reg_state *reg; > + > + bpf_for_each_reg_in_vstate(env->cur_state, unused, reg, ({ > + if (reg->non_owning_ref_lock.ptr && > + reg->non_owning_ref_lock.ptr == lock->ptr && > + reg->non_owning_ref_lock.id == lock->id) I think the lock.ptr = lock->ptr comparison is unnecessary to invalidate things. We're under active spin_lock here. All regs were checked earlier and id keeps incrementing. So we can just do 'u32 non_own_ref_obj_id'. > + __mark_reg_unknown(env, reg); > + })); > +} > + > static void clear_caller_saved_regs(struct bpf_verifier_env *env, > struct bpf_reg_state *regs) > { > @@ -8266,6 +8301,11 @@ static bool is_kfunc_release(struct bpf_kfunc_call_arg_meta *meta) > return meta->kfunc_flags & KF_RELEASE; > } > > +static bool is_kfunc_release_non_own(struct bpf_kfunc_call_arg_meta *meta) > +{ > + return meta->kfunc_flags & KF_RELEASE_NON_OWN; > +} > + No need. 
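Combining the two simplifications suggested above (drop the lock.ptr comparison, track only a u32 id), the patch's invalidate_non_owning_refs() could presumably shrink to something like this sketch:

```c
static void invalidate_non_owning_refs(struct bpf_verifier_env *env)
{
	struct bpf_func_state *unused;
	struct bpf_reg_state *reg;

	/* Only one spin_lock can be held at a time, so any live
	 * non-owning ref must have been created under the lock being
	 * released; no ptr/id matching is needed.
	 */
	bpf_for_each_reg_in_vstate(env->cur_state, unused, reg, ({
		if (reg->non_own_ref_obj_id)
			__mark_reg_unknown(env, reg);
	}));
}
```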
> static bool is_kfunc_trusted_args(struct bpf_kfunc_call_arg_meta *meta) > { > return meta->kfunc_flags & KF_TRUSTED_ARGS; > @@ -8651,38 +8691,55 @@ static int process_kf_arg_ptr_to_kptr(struct bpf_verifier_env *env, > return 0; > } > > -static int ref_set_release_on_unlock(struct bpf_verifier_env *env, u32 ref_obj_id) > +static int ref_set_non_owning_lock(struct bpf_verifier_env *env, struct bpf_reg_state *reg) > { > - struct bpf_func_state *state = cur_func(env); > + struct bpf_verifier_state *state = env->cur_state; > + > + if (!state->active_lock.ptr) { > + verbose(env, "verifier internal error: ref_set_non_owning_lock w/o active lock\n"); > + return -EFAULT; > + } > + > + if (reg->non_owning_ref_lock.ptr) { > + verbose(env, "verifier internal error: non_owning_ref_lock already set\n"); > + return -EFAULT; > + } > + > + reg->non_owning_ref_lock.id = state->active_lock.id; > + reg->non_owning_ref_lock.ptr = state->active_lock.ptr; > + return 0; > +} > + > +static int ref_convert_owning_non_owning(struct bpf_verifier_env *env, u32 ref_obj_id) > +{ > + struct bpf_func_state *state, *unused; > struct bpf_reg_state *reg; > int i; > > - /* bpf_spin_lock only allows calling list_push and list_pop, no BPF > - * subprogs, no global functions. This means that the references would > - * not be released inside the critical section but they may be added to > - * the reference state, and the acquired_refs are never copied out for a > - * different frame as BPF to BPF calls don't work in bpf_spin_lock > - * critical sections. > - */ > + state = cur_func(env); > + > if (!ref_obj_id) { > - verbose(env, "verifier internal error: ref_obj_id is zero for release_on_unlock\n"); > + verbose(env, "verifier internal error: ref_obj_id is zero for " > + "owning -> non-owning conversion\n"); > return -EFAULT; > } > + > for (i = 0; i < state->acquired_refs; i++) { > - if (state->refs[i].id == ref_obj_id) { > - if (state->refs[i].release_on_unlock) { > - verbose(env, "verifier internal error: expected false release_on_unlock"); > - return -EFAULT; > + if (state->refs[i].id != ref_obj_id) > + continue; > + > + /* Clear ref_obj_id here so release_reference doesn't clobber > + * the whole reg > + */ > + bpf_for_each_reg_in_vstate(env->cur_state, unused, reg, ({ > + if (reg->ref_obj_id == ref_obj_id) { > + reg->ref_obj_id = 0; > + ref_set_non_owning_lock(env, reg); +1 except ref_set_... name doesn't quite fit. reg_set_... is more accurate, no? and probably reg_set_non_own_ref_obj_id() ? Or just open code it? 
> } > - state->refs[i].release_on_unlock = true; > - /* Now mark everyone sharing same ref_obj_id as untrusted */ > - bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({ > - if (reg->ref_obj_id == ref_obj_id) > - reg->type |= PTR_UNTRUSTED; > - })); > - return 0; > - } > + })); > + return 0; > } > + > verbose(env, "verifier internal error: ref state missing for ref_obj_id\n"); > return -EFAULT; > } > @@ -8817,7 +8874,6 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env, > { > const struct btf_type *et, *t; > struct btf_field *field; > - struct btf_record *rec; > u32 list_node_off; > > if (meta->btf != btf_vmlinux || > @@ -8834,9 +8890,8 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env, > return -EINVAL; > } > > - rec = reg_btf_record(reg); > list_node_off = reg->off + reg->var_off.value; > - field = btf_record_find(rec, list_node_off, BPF_LIST_NODE); > + field = reg_find_field_offset(reg, list_node_off, BPF_LIST_NODE); > if (!field || field->offset != list_node_off) { > verbose(env, "bpf_list_node not found at offset=%u\n", list_node_off); > return -EINVAL; > @@ -8861,8 +8916,8 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env, > btf_name_by_offset(field->list_head.btf, et->name_off)); > return -EINVAL; > } > - /* Set arg#1 for expiration after unlock */ > - return ref_set_release_on_unlock(env, reg->ref_obj_id); > + > + return 0; and here we come to the main point. Can you just call ref_convert_owning_non_owning(env, reg->ref_obj_id) and release_reference() here? Everything will be so much simpler, no? > } > > static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta) > @@ -9132,11 +9187,11 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, > int *insn_idx_p) > { > const struct btf_type *t, *func, *func_proto, *ptr_type; > + u32 i, nargs, func_id, ptr_type_id, release_ref_obj_id; > struct bpf_reg_state *regs = cur_regs(env); > const char *func_name, *ptr_type_name; > bool sleepable, rcu_lock, rcu_unlock; > struct bpf_kfunc_call_arg_meta meta; > - u32 i, nargs, func_id, ptr_type_id; > int err, insn_idx = *insn_idx_p; > const struct btf_param *args; > const struct btf_type *ret_t; > @@ -9223,7 +9278,18 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, > * PTR_TO_BTF_ID in bpf_kfunc_arg_meta, do the release now. > */ > if (meta.release_regno) { > - err = release_reference(env, regs[meta.release_regno].ref_obj_id); > + err = 0; > + release_ref_obj_id = regs[meta.release_regno].ref_obj_id; > + > + if (is_kfunc_release_non_own(&meta)) > + err = ref_convert_owning_non_owning(env, release_ref_obj_id); > + if (err) { > + verbose(env, "kfunc %s#%d conversion of owning ref to non-owning failed\n", > + func_name, func_id); > + return err; > + } > + > + err = release_reference(env, release_ref_obj_id); and this bit won't be needed. and no need to guess in patch 1 which arg has to be released and converted to non_own. > if (err) { > verbose(env, "kfunc %s#%d reference has not been acquired before\n", > func_name, func_id); > -- > 2.30.2 >
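For reference, the simplification Alexei is suggesting for the tail of process_kf_arg_ptr_to_list_node() might look like the following sketch, with no new kfunc flags involved. Note that ref_obj_id has to be captured before the conversion, since ref_convert_owning_non_owning() zeroes it on every copy of the reg:

```c
	/* ...after the existing bpf_list_node/bpf_list_head checks... */
	u32 ref_obj_id = reg->ref_obj_id;
	int err;

	/* Convert every copy of the owning ref before releasing, so
	 * release_reference() no longer finds them by ref_obj_id and
	 * clobbers them.
	 */
	err = ref_convert_owning_non_owning(env, ref_obj_id);
	if (err)
		return err;

	return release_reference(env, ref_obj_id);
```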
On Wed, Dec 28, 2022 at 05:46:54PM -0600, David Vernet wrote: [...] > Hey Dave, > > I'm sorry to be chiming in a bit late in the game here, but I only > finally had the time to fully review some of this stuff during the > holiday-lull, and I have a few questions / concerns about the whole > owning vs. non-owning refcount approach we're taking here. After reading through sleeping on this and reading through the discussion in [0], I have some slight adjustments I want to make to my points here. [0]: https://lore.kernel.org/bpf/20221207230602.logjjjv3kwiiy6u3@macbook-pro-6.dhcp.thefacebook.com/ > > > --- > > include/linux/bpf.h | 1 + > > include/linux/bpf_verifier.h | 39 ++++----- > > include/linux/btf.h | 17 ++-- > > kernel/bpf/helpers.c | 4 +- > > kernel/bpf/verifier.c | 164 ++++++++++++++++++++++++----------- > > 5 files changed, 146 insertions(+), 79 deletions(-) > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > > index 3de24cfb7a3d..f71571bf6adc 100644 > > --- a/include/linux/bpf.h > > +++ b/include/linux/bpf.h > > @@ -180,6 +180,7 @@ enum btf_field_type { > > BPF_KPTR = BPF_KPTR_UNREF | BPF_KPTR_REF, > > BPF_LIST_HEAD = (1 << 4), > > BPF_LIST_NODE = (1 << 5), > > + BPF_GRAPH_NODE_OR_ROOT = BPF_LIST_NODE | BPF_LIST_HEAD, > > }; > > > > struct btf_field_kptr { > > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h > > index 53d175cbaa02..cb417ffbbb84 100644 > > --- a/include/linux/bpf_verifier.h > > +++ b/include/linux/bpf_verifier.h > > @@ -43,6 +43,22 @@ enum bpf_reg_liveness { > > REG_LIVE_DONE = 0x8, /* liveness won't be updating this register anymore */ > > }; > > > > +/* For every reg representing a map value or allocated object pointer, > > + * we consider the tuple of (ptr, id) for them to be unique in verifier > > + * context and conside them to not alias each other for the purposes of > > + * tracking lock state. > > + */ > > +struct bpf_active_lock { > > + /* This can either be reg->map_ptr or reg->btf. If ptr is NULL, > > + * there's no active lock held, and other fields have no > > + * meaning. If non-NULL, it indicates that a lock is held and > > + * id member has the reg->id of the register which can be >= 0. > > + */ > > + void *ptr; > > + /* This will be reg->id */ > > + u32 id; > > +}; > > + > > struct bpf_reg_state { > > /* Ordering of fields matters. See states_equal() */ > > enum bpf_reg_type type; > > @@ -68,6 +84,7 @@ struct bpf_reg_state { > > struct { > > struct btf *btf; > > u32 btf_id; > > + struct bpf_active_lock non_owning_ref_lock; > > }; > > > > u32 mem_size; /* for PTR_TO_MEM | PTR_TO_MEM_OR_NULL */ > > @@ -223,11 +240,6 @@ struct bpf_reference_state { > > * exiting a callback function. > > */ > > int callback_ref; > > - /* Mark the reference state to release the registers sharing the same id > > - * on bpf_spin_unlock (for nodes that we will lose ownership to but are > > - * safe to access inside the critical section). > > - */ > > - bool release_on_unlock; > > }; > > > > /* state of the program: > > @@ -328,21 +340,8 @@ struct bpf_verifier_state { > > u32 branches; > > u32 insn_idx; > > u32 curframe; > > - /* For every reg representing a map value or allocated object pointer, > > - * we consider the tuple of (ptr, id) for them to be unique in verifier > > - * context and conside them to not alias each other for the purposes of > > - * tracking lock state. > > - */ > > - struct { > > - /* This can either be reg->map_ptr or reg->btf. 
If ptr is NULL, > > - * there's no active lock held, and other fields have no > > - * meaning. If non-NULL, it indicates that a lock is held and > > - * id member has the reg->id of the register which can be >= 0. > > - */ > > - void *ptr; > > - /* This will be reg->id */ > > - u32 id; > > - } active_lock; > > + > > + struct bpf_active_lock active_lock; > > bool speculative; > > bool active_rcu_lock; > > > > diff --git a/include/linux/btf.h b/include/linux/btf.h > > index 5f628f323442..8aee3f7f4248 100644 > > --- a/include/linux/btf.h > > +++ b/include/linux/btf.h > > @@ -15,10 +15,10 @@ > > #define BTF_TYPE_EMIT_ENUM(enum_val) ((void)enum_val) > > > > /* These need to be macros, as the expressions are used in assembler input */ > > -#define KF_ACQUIRE (1 << 0) /* kfunc is an acquire function */ > > -#define KF_RELEASE (1 << 1) /* kfunc is a release function */ > > -#define KF_RET_NULL (1 << 2) /* kfunc returns a pointer that may be NULL */ > > -#define KF_KPTR_GET (1 << 3) /* kfunc returns reference to a kptr */ > > +#define KF_ACQUIRE (1 << 0) /* kfunc is an acquire function */ > > +#define KF_RELEASE (1 << 1) /* kfunc is a release function */ > > +#define KF_RET_NULL (1 << 2) /* kfunc returns a pointer that may be NULL */ > > +#define KF_KPTR_GET (1 << 3) /* kfunc returns reference to a kptr */ > > /* Trusted arguments are those which are guaranteed to be valid when passed to > > * the kfunc. It is used to enforce that pointers obtained from either acquire > > * kfuncs, or from the main kernel on a tracepoint or struct_ops callback > > @@ -67,10 +67,11 @@ > > * return 0; > > * } > > */ > > -#define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */ > > -#define KF_SLEEPABLE (1 << 5) /* kfunc may sleep */ > > -#define KF_DESTRUCTIVE (1 << 6) /* kfunc performs destructive actions */ > > -#define KF_RCU (1 << 7) /* kfunc only takes rcu pointer arguments */ > > +#define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */ > > +#define KF_SLEEPABLE (1 << 5) /* kfunc may sleep */ > > +#define KF_DESTRUCTIVE (1 << 6) /* kfunc performs destructive actions */ > > +#define KF_RCU (1 << 7) /* kfunc only takes rcu pointer arguments */ > > +#define KF_RELEASE_NON_OWN (1 << 8) /* kfunc converts its referenced arg into non-owning ref */ > > It would be nice if we could come up with new kfunc flag names that > don't have 'RELEASE' in it. As is this is arguably a bit of a leaky > abstraction given that kfunc authors now have to understand a notion of > "releasing", "releasing but keeping a non-owning ref", and "releasing > but it must be a non-owning reference". I know that in [0] you mention > that the notions of owning and non-owning references are entirely > relegated to graph-type maps, but I disagree. More below. > > [0]: https://lore.kernel.org/all/20221217082506.1570898-14-davemarchevsky@fb.com/ I see now why you said that owning and non-owning were entirely relegated to graph-type maps. I think the general idea of owning vs. non-owning references in the context of graph-type maps is reasonable, but I think the proposal here unintentionally does combine the concepts due to its naming choices, which is problematic. I'll briefly say one more thing below, but I'll continue the conversation on Alexei's email in [1] to keep everything in the same place. I'll be responding there shortly. 
[1]: https://lore.kernel.org/all/20221229035600.m43ayhidfisbl4sq@MacBook-Pro-6.local/ > > In general, IMO this muddies the existing, crystal-clear semantics of > BPF object ownership and refcounting. Usually a "weak" or "non-owning" > reference is a shadow of a strong reference, and "using" the weak > reference requires attempting (because it could fail) to temporarily > promote it to a strong reference. If successful, the object's existence > is guaranteed until the weak pointer is demoted back to a weak pointer > and/or the promoted strong pointer is released, and it's perfectly valid > for an object's lifetime to be extended due to a promoted weak pointer > not dropping its reference until after all the other strong pointer > references have been dropped. The key point here is that a pointer's > safety is entirely dictated by whether or not the holder has or is able > to acquire a strong reference, and nothing more. > > In contrast, if I understand correctly, in this proposal a "non-owning" > reference means that the object is guaranteed to be valid due to > external factors such as a lock being held on the root node of the > graph, and is used to e.g. signal whether an object has or has not yet > been added as a node to an rbtree or a list. If so, IMO these are > completely separate concepts from refcounting, and I don't think we > should intertwine it with the acquire / release semantics that we > currently use for ensuring object lifetime. > > Note that weak references are usually (if not always, at least in my > experience) used to resolve circular dependencies where the reference > would always be leaked if both sides had a strong reference. I don't > think that applies here, where instead we're using "owning reference" to > mean that ownership of the object has not yet been passed to a > graph-type data structure, and "non-owning reference" to mean that the > graph now owns the strong reference, but it's still safe to reference > the object due to it being protected by some external synchronization > mechanism like a lock. There's no danger of a circular dependency here, > we just want to provide consistent API semantics. > > If we want to encapsulate notions of "safe due to a lock being held on a > root node", and "pointer hasn't yet been inserted into the graph", I > think we should consider adding some entirely separate abstractions. For > example, something like PTR_GRAPH_STORED on the register type-modifier > side for signaling whether a pointer has already been stored in a graph, > and KF_GRAPH_INSERT, KF_GRAPH_REMOVE type kfunc flags for adding and > removing from graphs respectively. I don't think we'd have to add > anything at all for ensuring pointer safety from the lock being held, as > the verifier should be able to figure out that a pointer that was > inserted with KF_GRAPH_INSERT is safe to reference inside of the locked > region of the lock associated with the root node. The refcnt of the > object isn't relevant at all, it's the association of the root node with > a specific lock. 
> > > > > /* > > * Return the name of the passed struct, if exists, or halt the build if for > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c > > index af30c6cbd65d..e041409779c3 100644 > > --- a/kernel/bpf/helpers.c > > +++ b/kernel/bpf/helpers.c > > @@ -2049,8 +2049,8 @@ BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE) > > #endif > > BTF_ID_FLAGS(func, bpf_obj_new_impl, KF_ACQUIRE | KF_RET_NULL) > > BTF_ID_FLAGS(func, bpf_obj_drop_impl, KF_RELEASE) > > -BTF_ID_FLAGS(func, bpf_list_push_front) > > -BTF_ID_FLAGS(func, bpf_list_push_back) > > +BTF_ID_FLAGS(func, bpf_list_push_front, KF_RELEASE | KF_RELEASE_NON_OWN) > > +BTF_ID_FLAGS(func, bpf_list_push_back, KF_RELEASE | KF_RELEASE_NON_OWN) > > I don't think a helper should specify both of these flags together. > IIUC, what this is saying is something along the lines of, "Release the > reference, but rather than actually releasing it, just keep it and > convert it into a non-owning reference". IMO KF_RELEASE should always > mean, exclusively, "I'm releasing a previously-acquired strong reference > to an object", and the expectation should be that the object cannot be > referenced _at all_ afterwards, unless you happen to have another strong > reference. > > IMO this is another sign that we should consider going in a different > direction for owning vs. non-owning references. I don't think this I now don't feel that we should be going in a different direction for owning vs. non-owning as it applies to graphs as you said in [2]. I do think, however, that we have to revisit some naming choices, and possibly consider not adding new kfunc flags at all to enable this. I'll respond on the other thread to Alexei to keep the discussion in one place. [2]: https://lore.kernel.org/all/20221229035600.m43ayhidfisbl4sq@MacBook-Pro-6.local/ > makes sense from an object-refcounting perspective, but I readily admit > that I could be missing a lot of important context here. > > [...] > > Thanks, > David
On Wed, Dec 28, 2022 at 07:56:00PM -0800, Alexei Starovoitov wrote: > On Sat, Dec 17, 2022 at 12:24:55AM -0800, Dave Marchevsky wrote: > > This patch introduces non-owning reference semantics to the verifier, > > specifically linked_list API kfunc handling. release_on_unlock logic for > > refs is refactored - with small functional changes - to implement these > > semantics, and bpf_list_push_{front,back} are migrated to use them. > > > > When a list node is pushed to a list, the program still has a pointer to > > the node: > > > > n = bpf_obj_new(typeof(*n)); > > > > bpf_spin_lock(&l); > > bpf_list_push_back(&l, n); > > /* n still points to the just-added node */ > > bpf_spin_unlock(&l); > > > > What the verifier considers n to be after the push, and thus what can be > > done with n, are changed by this patch. > > > > Common properties both before/after this patch: > > * After push, n is only a valid reference to the node until end of > > critical section > > * After push, n cannot be pushed to any list > > * After push, the program can read the node's fields using n > > correct. > > > Before: > > * After push, n retains the ref_obj_id which it received on > > bpf_obj_new, but the associated bpf_reference_state's > > release_on_unlock field is set to true > > * release_on_unlock field and associated logic is used to implement > > "n is only a valid ref until end of critical section" > > * After push, n cannot be written to, the node must be removed from > > the list before writing to its fields > > * After push, n is marked PTR_UNTRUSTED > > yep > > > After: > > * After push, n's ref is released and ref_obj_id set to 0. The > > bpf_reg_state's non_owning_ref_lock struct is populated with the > > currently active lock > > * non_owning_ref_lock and logic is used to implement "n is only a > > valid ref until end of critical section" > > * n can be written to (except for special fields e.g. bpf_list_node, > > timer, ...) > > * No special type flag is added to n after push > > yep. > Great summary. > > > Summary of specific implementation changes to achieve the above: > > > > * release_on_unlock field, ref_set_release_on_unlock helper, and logic > > to "release on unlock" based on that field are removed > > +1 > > > * The anonymous active_lock struct used by bpf_verifier_state is > > pulled out into a named struct bpf_active_lock. > ... > > * A non_owning_ref_lock field of type bpf_active_lock is added to > > bpf_reg_state's PTR_TO_BTF_ID union > > not great. see below. > > > * Helpers are added to use non_owning_ref_lock to implement non-owning > > ref semantics as described above > > * invalidate_non_owning_refs - helper to clobber all non-owning refs > > matching a particular bpf_active_lock identity. Replaces > > release_on_unlock logic in process_spin_lock. > > +1 > > > * ref_set_non_owning_lock - set non_owning_ref_lock for a reg based > > on current verifier state > > +1 > > > * ref_convert_owning_non_owning - convert owning reference w/ > > specified ref_obj_id to non-owning references. Setup > > non_owning_ref_lock for each reg with that ref_obj_id and 0 out > > its ref_obj_id > > +1 > > > * New KF_RELEASE_NON_OWN flag is added, to be used in conjunction with > > KF_RELEASE to indicate that the release arg reg should be converted > > to non-owning ref > > * Plain KF_RELEASE would clobber all regs with ref_obj_id matching > > the release arg reg's. 
KF_RELEASE_NON_OWN's logic triggers first - > > doing ref_convert_owning_non_owning on the ref first, which > > prevents the regs from being clobbered by 0ing out their > > ref_obj_ids. The bpf_reference_state itself is still released via > > release_reference as a result of the KF_RELEASE flag. > > * KF_RELEASE | KF_RELEASE_NON_OWN are added to > > bpf_list_push_{front,back} > > And this bit is confusing and not generalizable.

+1 on both counts. If we want to make it generalizable, I think the only way to do so would be to generalize it across different graph map types. For example, to have kfunc flags like KF_GRAPH_INSERT and KF_GRAPH_REMOVE which signal to the verifier that "for this graph-type map which has a spin-lock associated with its root node that I expect to be held, I've {inserted, removed} the node {to, from} the graph, so adjust the refcnt / pointer type accordingly and then clean up when the lock is dropped."

I don't see any reason to add kfunc flags for that though, as the fact that the pointer in question refers to a node that has a root node that has a lock associated with it is already itself a special-case scenario. I think we should just special-case these kfuncs in the verifier as "graph-type" kfuncs in some static global array(s); see the sketch below, after the quoted flag definitions. That's probably less error prone anyways, and I don't see the typical kfunc writer ever needing to do this.

> As David noticed in his reply KF_RELEASE_NON_OWN is not a great name. > It's hard to come up with a good name and it won't be generic anyway. > The ref_convert_owning_non_owning has to be applied to a specific arg. > The function itself is not KF_RELEASE in the current definition of it. > The combination of KF_RELEASE|KF_RELEASE_NON_OWN is something new > that should have been generic, but doesn't really work this way. > In the next patches rbtree_root/node still has to have all the custom > logic. > KF_RELEASE_NON_OWN by itself is a nonsensical flag.

IMO if a flag doesn't make any sense on its own, or even possibly if it needs to be mutually exclusive with one or more other flags, it is probably never a correct building block. Even KF_TRUSTED_ARGS doesn't really make sense, as it's redundant if KF_RCU is specified. This is fine though, as IIUC our long-term plan is to get rid of KF_TRUSTED_ARGS and make it the default behavior for all kfuncs (not trying to hijack this thread with a tangential discussion about KF_TRUSTED_ARGS, just using this as an opportunity to point out something to keep in mind as we continue to add kfunc flags down the road).

> Only combination of KF_RELEASE|KF_RELEASE_NON_OWN sort-of kinda makes > sense, but still hard to understand what releases what.

I agree and I think this is an important point. IMO it is a worse tradeoff to try to generalize this by complicating the definition of a reference than it is to keep the refcounting APIs straightforward and well defined. As a basic building block, having an owning refcount should mean one thing: that the object will not be destroyed and is safe to dereference. When you start mixing in these graph-specific notions of references meaning different things in specific contexts, it compromises that and makes the API significantly less usable and extensible.

For example, at some point we may decide to add something like a kptr_weak_ref which would function exactly like an std::weak_ptr, except of course that it would wrap a kptr_ref instead of an std::shared_ptr.
IMO something like that is a more natural and generalizable building block that cleanly complements refcounting as it exists today. > More below. > > > After these changes, linked_list's "release on unlock" logic continues > > to function as before, except for the semantic differences noted above. > > The patch immediately following this one makes minor changes to > > linked_list selftests to account for the differing behavior. > > > > Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> > > --- > > include/linux/bpf.h | 1 + > > include/linux/bpf_verifier.h | 39 ++++----- > > include/linux/btf.h | 17 ++-- > > kernel/bpf/helpers.c | 4 +- > > kernel/bpf/verifier.c | 164 ++++++++++++++++++++++++----------- > > 5 files changed, 146 insertions(+), 79 deletions(-) > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > > index 3de24cfb7a3d..f71571bf6adc 100644 > > --- a/include/linux/bpf.h > > +++ b/include/linux/bpf.h > > @@ -180,6 +180,7 @@ enum btf_field_type { > > BPF_KPTR = BPF_KPTR_UNREF | BPF_KPTR_REF, > > BPF_LIST_HEAD = (1 << 4), > > BPF_LIST_NODE = (1 << 5), > > + BPF_GRAPH_NODE_OR_ROOT = BPF_LIST_NODE | BPF_LIST_HEAD, Can you update the rest of the elements here to keep common indentation? > > }; > > > > struct btf_field_kptr { > > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h > > index 53d175cbaa02..cb417ffbbb84 100644 > > --- a/include/linux/bpf_verifier.h > > +++ b/include/linux/bpf_verifier.h > > @@ -43,6 +43,22 @@ enum bpf_reg_liveness { > > REG_LIVE_DONE = 0x8, /* liveness won't be updating this register anymore */ > > }; > > > > +/* For every reg representing a map value or allocated object pointer, > > + * we consider the tuple of (ptr, id) for them to be unique in verifier > > + * context and conside them to not alias each other for the purposes of > > + * tracking lock state. > > + */ > > +struct bpf_active_lock { > > + /* This can either be reg->map_ptr or reg->btf. If ptr is NULL, > > + * there's no active lock held, and other fields have no > > + * meaning. If non-NULL, it indicates that a lock is held and > > + * id member has the reg->id of the register which can be >= 0. > > + */ > > + void *ptr; > > + /* This will be reg->id */ > > + u32 id; > > +}; > > + > > struct bpf_reg_state { > > /* Ordering of fields matters. See states_equal() */ > > enum bpf_reg_type type; > > @@ -68,6 +84,7 @@ struct bpf_reg_state { > > struct { > > struct btf *btf; > > u32 btf_id; > > + struct bpf_active_lock non_owning_ref_lock; > > In your other email you argue that pointer should be enough. > I suspect that won't be correct. > See fixes that Andrii did in states_equal() and regsafe(). > In particular: > if (!!old->active_lock.id != !!cur->active_lock.id) > return false; > > if (old->active_lock.id && > !check_ids(old->active_lock.id, cur->active_lock.id, env->idmap_scratch)) > return false; > > We have to do the comparison of this new ID via idmap as well. > > I think introduction of struct bpf_active_lock and addition of it > to bpf_reg_state is overkill. > Here we can add 'u32 non_own_ref_obj_id;' only and compare it via idmap in regsafe(). > I'm guessing you didn't like my 'active_lock_id' suggestion. Fine. > non_own_ref_obj_id would match existing ref_obj_id at least. > > > }; > > > > u32 mem_size; /* for PTR_TO_MEM | PTR_TO_MEM_OR_NULL */ > > @@ -223,11 +240,6 @@ struct bpf_reference_state { > > * exiting a callback function. 
> > */ > > int callback_ref; > > - /* Mark the reference state to release the registers sharing the same id > > - * on bpf_spin_unlock (for nodes that we will lose ownership to but are > > - * safe to access inside the critical section). > > - */ > > - bool release_on_unlock; > > }; > > > > /* state of the program: > > @@ -328,21 +340,8 @@ struct bpf_verifier_state { > > u32 branches; > > u32 insn_idx; > > u32 curframe; > > - /* For every reg representing a map value or allocated object pointer, > > - * we consider the tuple of (ptr, id) for them to be unique in verifier > > - * context and conside them to not alias each other for the purposes of > > - * tracking lock state. > > - */ > > - struct { > > - /* This can either be reg->map_ptr or reg->btf. If ptr is NULL, > > - * there's no active lock held, and other fields have no > > - * meaning. If non-NULL, it indicates that a lock is held and > > - * id member has the reg->id of the register which can be >= 0. > > - */ > > - void *ptr; > > - /* This will be reg->id */ > > - u32 id; > > - } active_lock; > > I would keep it as-is. > > > + > > + struct bpf_active_lock active_lock; > > bool speculative; > > bool active_rcu_lock; > > > > diff --git a/include/linux/btf.h b/include/linux/btf.h > > index 5f628f323442..8aee3f7f4248 100644 > > --- a/include/linux/btf.h > > +++ b/include/linux/btf.h > > @@ -15,10 +15,10 @@ > > #define BTF_TYPE_EMIT_ENUM(enum_val) ((void)enum_val) > > > > /* These need to be macros, as the expressions are used in assembler input */ > > -#define KF_ACQUIRE (1 << 0) /* kfunc is an acquire function */ > > -#define KF_RELEASE (1 << 1) /* kfunc is a release function */ > > -#define KF_RET_NULL (1 << 2) /* kfunc returns a pointer that may be NULL */ > > -#define KF_KPTR_GET (1 << 3) /* kfunc returns reference to a kptr */ > > +#define KF_ACQUIRE (1 << 0) /* kfunc is an acquire function */ > > +#define KF_RELEASE (1 << 1) /* kfunc is a release function */ > > +#define KF_RET_NULL (1 << 2) /* kfunc returns a pointer that may be NULL */ > > +#define KF_KPTR_GET (1 << 3) /* kfunc returns reference to a kptr */ > > /* Trusted arguments are those which are guaranteed to be valid when passed to > > * the kfunc. It is used to enforce that pointers obtained from either acquire > > * kfuncs, or from the main kernel on a tracepoint or struct_ops callback > > @@ -67,10 +67,11 @@ > > * return 0; > > * } > > */ > > -#define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */ > > -#define KF_SLEEPABLE (1 << 5) /* kfunc may sleep */ > > -#define KF_DESTRUCTIVE (1 << 6) /* kfunc performs destructive actions */ > > -#define KF_RCU (1 << 7) /* kfunc only takes rcu pointer arguments */ > > +#define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */ > > +#define KF_SLEEPABLE (1 << 5) /* kfunc may sleep */ > > +#define KF_DESTRUCTIVE (1 << 6) /* kfunc performs destructive actions */ > > +#define KF_RCU (1 << 7) /* kfunc only takes rcu pointer arguments */ > > +#define KF_RELEASE_NON_OWN (1 << 8) /* kfunc converts its referenced arg into non-owning ref */ > > No need for this flag. 
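To be concrete about the static-array special-casing I suggested above: a rough sketch only, with hypothetical set and helper names, but following the BTF_SET_START()/btf_id_set_contains() pattern the verifier already uses (cf. special_kfunc_set):

  BTF_SET_START(graph_insert_kfuncs)
  BTF_ID(func, bpf_list_push_front)
  BTF_ID(func, bpf_list_push_back)
  BTF_SET_END(graph_insert_kfuncs)

  /* Does this kfunc insert its node arg into a locked graph, i.e.
   * should that arg be converted to a non-owning ref? */
  static bool is_graph_insert_kfunc(u32 btf_id)
  {
          return btf_id_set_contains(&graph_insert_kfuncs, btf_id);
  }

Future graph datastructures would add their insert kfuncs to the set, without anything leaking into the generic KF_* flag namespace.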
> > > /* > > * Return the name of the passed struct, if exists, or halt the build if for > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c > > index af30c6cbd65d..e041409779c3 100644 > > --- a/kernel/bpf/helpers.c > > +++ b/kernel/bpf/helpers.c > > @@ -2049,8 +2049,8 @@ BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE) > > #endif > > BTF_ID_FLAGS(func, bpf_obj_new_impl, KF_ACQUIRE | KF_RET_NULL) > > BTF_ID_FLAGS(func, bpf_obj_drop_impl, KF_RELEASE) > > -BTF_ID_FLAGS(func, bpf_list_push_front) > > -BTF_ID_FLAGS(func, bpf_list_push_back) > > +BTF_ID_FLAGS(func, bpf_list_push_front, KF_RELEASE | KF_RELEASE_NON_OWN) > > +BTF_ID_FLAGS(func, bpf_list_push_back, KF_RELEASE | KF_RELEASE_NON_OWN) > > No need for this. > > > BTF_ID_FLAGS(func, bpf_list_pop_front, KF_ACQUIRE | KF_RET_NULL) > > BTF_ID_FLAGS(func, bpf_list_pop_back, KF_ACQUIRE | KF_RET_NULL) > > BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS) > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > > index 824e2242eae5..84b0660e2a76 100644 > > --- a/kernel/bpf/verifier.c > > +++ b/kernel/bpf/verifier.c > > @@ -190,6 +190,10 @@ struct bpf_verifier_stack_elem { > > > > static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx); > > static int release_reference(struct bpf_verifier_env *env, int ref_obj_id); > > +static void invalidate_non_owning_refs(struct bpf_verifier_env *env, > > + struct bpf_active_lock *lock); > > +static int ref_set_non_owning_lock(struct bpf_verifier_env *env, > > + struct bpf_reg_state *reg); > > > > static bool bpf_map_ptr_poisoned(const struct bpf_insn_aux_data *aux) > > { > > @@ -931,6 +935,9 @@ static void print_verifier_state(struct bpf_verifier_env *env, > > verbose_a("id=%d", reg->id); > > if (reg->ref_obj_id) > > verbose_a("ref_obj_id=%d", reg->ref_obj_id); > > + if (reg->non_owning_ref_lock.ptr) > > + verbose_a("non_own_id=(%p,%d)", reg->non_owning_ref_lock.ptr, > > + reg->non_owning_ref_lock.id); > > if (t != SCALAR_VALUE) > > verbose_a("off=%d", reg->off); > > if (type_is_pkt_pointer(t)) > > @@ -4820,7 +4827,8 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env, > > return -EACCES; > > } > > > > - if (type_is_alloc(reg->type) && !reg->ref_obj_id) { > > + if (type_is_alloc(reg->type) && !reg->ref_obj_id && > > + !reg->non_owning_ref_lock.ptr) { > > verbose(env, "verifier internal error: ref_obj_id for allocated object must be non-zero\n"); > > return -EFAULT; > > } > > @@ -5778,9 +5786,7 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno, > > cur->active_lock.ptr = btf; > > cur->active_lock.id = reg->id; > > } else { > > - struct bpf_func_state *fstate = cur_func(env); > > void *ptr; > > - int i; > > > > if (map) > > ptr = map; > > @@ -5796,25 +5802,11 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno, > > verbose(env, "bpf_spin_unlock of different lock\n"); > > return -EINVAL; > > } > > - cur->active_lock.ptr = NULL; > > - cur->active_lock.id = 0; > > > > - for (i = fstate->acquired_refs - 1; i >= 0; i--) { > > - int err; > > + invalidate_non_owning_refs(env, &cur->active_lock); > > +1 > > > - /* Complain on error because this reference state cannot > > - * be freed before this point, as bpf_spin_lock critical > > - * section does not allow functions that release the > > - * allocated object immediately. 
> > - */ > > - if (!fstate->refs[i].release_on_unlock) > > - continue; > > - err = release_reference(env, fstate->refs[i].id); > > - if (err) { > > - verbose(env, "failed to release release_on_unlock reference"); > > - return err; > > - } > > - } > > + cur->active_lock.ptr = NULL; > > + cur->active_lock.id = 0; > > +1 > > > } > > return 0; > > } > > @@ -6273,6 +6265,23 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno, > > return 0; > > } > > > > +static struct btf_field * > > +reg_find_field_offset(const struct bpf_reg_state *reg, s32 off, u32 fields) > > +{ > > + struct btf_field *field; > > + struct btf_record *rec; > > + > > + rec = reg_btf_record(reg); > > + if (!reg) > > + return NULL; > > + > > + field = btf_record_find(rec, off, fields); > > + if (!field) > > + return NULL; > > + > > + return field; > > +} > > Doesn't look like that this helper is really necessary. > > > + > > int check_func_arg_reg_off(struct bpf_verifier_env *env, > > const struct bpf_reg_state *reg, int regno, > > enum bpf_arg_type arg_type) > > @@ -6294,6 +6303,18 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env, > > */ > > if (arg_type_is_dynptr(arg_type) && type == PTR_TO_STACK) > > return 0; > > + > > + if (type == (PTR_TO_BTF_ID | MEM_ALLOC) && reg->off) { > > + if (reg_find_field_offset(reg, reg->off, BPF_GRAPH_NODE_OR_ROOT)) > > + return __check_ptr_off_reg(env, reg, regno, true); > > + > > + verbose(env, "R%d must have zero offset when passed to release func\n", > > + regno); > > + verbose(env, "No graph node or root found at R%d type:%s off:%d\n", regno, > > + kernel_type_name(reg->btf, reg->btf_id), reg->off); > > + return -EINVAL; > > + } > > This bit is only necessary if we mark push_list as KF_RELEASE. > Just don't add this mark and drop above. > > > + > > /* Doing check_ptr_off_reg check for the offset will catch this > > * because fixed_off_ok is false, but checking here allows us > > * to give the user a better error message. > > @@ -7055,6 +7076,20 @@ static int release_reference(struct bpf_verifier_env *env, > > return 0; > > } > > > > +static void invalidate_non_owning_refs(struct bpf_verifier_env *env, > > + struct bpf_active_lock *lock) > > +{ > > + struct bpf_func_state *unused; > > + struct bpf_reg_state *reg; > > + > > + bpf_for_each_reg_in_vstate(env->cur_state, unused, reg, ({ > > + if (reg->non_owning_ref_lock.ptr && > > + reg->non_owning_ref_lock.ptr == lock->ptr && > > + reg->non_owning_ref_lock.id == lock->id) > > I think the lock.ptr = lock->ptr comparison is unnecessary to invalidate things. > We're under active spin_lock here. All regs were checked earlier and id keeps incrementing. > So we can just do 'u32 non_own_ref_obj_id'. > > > + __mark_reg_unknown(env, reg); > > + })); > > +} > > + > > static void clear_caller_saved_regs(struct bpf_verifier_env *env, > > struct bpf_reg_state *regs) > > { > > @@ -8266,6 +8301,11 @@ static bool is_kfunc_release(struct bpf_kfunc_call_arg_meta *meta) > > return meta->kfunc_flags & KF_RELEASE; > > } > > > > +static bool is_kfunc_release_non_own(struct bpf_kfunc_call_arg_meta *meta) > > +{ > > + return meta->kfunc_flags & KF_RELEASE_NON_OWN; > > +} > > + > > No need. 
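FWIW, a bare u32 as suggested above would also keep the unlock-time invalidation simple. Something like this (sketch only, reusing the bpf_for_each_reg_in_vstate() macro from the quoted diff; field name taken from the suggestion above):

  static void invalidate_non_owning_refs(struct bpf_verifier_env *env, u32 id)
  {
          struct bpf_func_state *unused;
          struct bpf_reg_state *reg;

          /* We're under an active spin_lock and ids keep incrementing,
           * so matching the id alone finds every non-owning ref that
           * must be clobbered at unlock time. */
          bpf_for_each_reg_in_vstate(env->cur_state, unused, reg, ({
                  if (reg->non_own_ref_obj_id == id)
                          __mark_reg_unknown(env, reg);
          }));
  }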
> > > static bool is_kfunc_trusted_args(struct bpf_kfunc_call_arg_meta *meta) > > { > > return meta->kfunc_flags & KF_TRUSTED_ARGS; > > @@ -8651,38 +8691,55 @@ static int process_kf_arg_ptr_to_kptr(struct bpf_verifier_env *env, > > return 0; > > } > > > > -static int ref_set_release_on_unlock(struct bpf_verifier_env *env, u32 ref_obj_id) > > +static int ref_set_non_owning_lock(struct bpf_verifier_env *env, struct bpf_reg_state *reg) > > { > > - struct bpf_func_state *state = cur_func(env); > > + struct bpf_verifier_state *state = env->cur_state; > > + > > + if (!state->active_lock.ptr) { > > + verbose(env, "verifier internal error: ref_set_non_owning_lock w/o active lock\n"); > > + return -EFAULT; > > + } > > + > > + if (reg->non_owning_ref_lock.ptr) { > > + verbose(env, "verifier internal error: non_owning_ref_lock already set\n"); > > + return -EFAULT; > > + } > > + > > + reg->non_owning_ref_lock.id = state->active_lock.id; > > + reg->non_owning_ref_lock.ptr = state->active_lock.ptr; > > + return 0; > > +} > > + > > +static int ref_convert_owning_non_owning(struct bpf_verifier_env *env, u32 ref_obj_id) > > +{ > > + struct bpf_func_state *state, *unused; > > struct bpf_reg_state *reg; > > int i; > > > > - /* bpf_spin_lock only allows calling list_push and list_pop, no BPF > > - * subprogs, no global functions. This means that the references would > > - * not be released inside the critical section but they may be added to > > - * the reference state, and the acquired_refs are never copied out for a > > - * different frame as BPF to BPF calls don't work in bpf_spin_lock > > - * critical sections. > > - */ > > + state = cur_func(env); > > + > > if (!ref_obj_id) { > > - verbose(env, "verifier internal error: ref_obj_id is zero for release_on_unlock\n"); > > + verbose(env, "verifier internal error: ref_obj_id is zero for " > > + "owning -> non-owning conversion\n"); > > return -EFAULT; > > } > > + > > for (i = 0; i < state->acquired_refs; i++) { > > - if (state->refs[i].id == ref_obj_id) { > > - if (state->refs[i].release_on_unlock) { > > - verbose(env, "verifier internal error: expected false release_on_unlock"); > > - return -EFAULT; > > + if (state->refs[i].id != ref_obj_id) > > + continue; > > + > > + /* Clear ref_obj_id here so release_reference doesn't clobber > > + * the whole reg > > + */ > > + bpf_for_each_reg_in_vstate(env->cur_state, unused, reg, ({ > > + if (reg->ref_obj_id == ref_obj_id) { > > + reg->ref_obj_id = 0; > > + ref_set_non_owning_lock(env, reg); > > +1 except ref_set_... name doesn't quite fit. reg_set_... is more accurate, no? > and probably reg_set_non_own_ref_obj_id() ? > Or just open code it? 
> > > } > > - state->refs[i].release_on_unlock = true; > > - /* Now mark everyone sharing same ref_obj_id as untrusted */ > > - bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({ > > - if (reg->ref_obj_id == ref_obj_id) > > - reg->type |= PTR_UNTRUSTED; > > - })); > > - return 0; > > - } > > + })); > > + return 0; > > } > > + > > verbose(env, "verifier internal error: ref state missing for ref_obj_id\n"); > > return -EFAULT; > > } > > @@ -8817,7 +8874,6 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env, > > { > > const struct btf_type *et, *t; > > struct btf_field *field; > > - struct btf_record *rec; > > u32 list_node_off; > > > > if (meta->btf != btf_vmlinux || > > @@ -8834,9 +8890,8 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env, > > return -EINVAL; > > } > > > > - rec = reg_btf_record(reg); > > list_node_off = reg->off + reg->var_off.value; > > - field = btf_record_find(rec, list_node_off, BPF_LIST_NODE); > > + field = reg_find_field_offset(reg, list_node_off, BPF_LIST_NODE); > > if (!field || field->offset != list_node_off) { > > verbose(env, "bpf_list_node not found at offset=%u\n", list_node_off); > > return -EINVAL; > > @@ -8861,8 +8916,8 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env, > > btf_name_by_offset(field->list_head.btf, et->name_off)); > > return -EINVAL; > > } > > - /* Set arg#1 for expiration after unlock */ > > - return ref_set_release_on_unlock(env, reg->ref_obj_id); > > + > > + return 0; > > and here we come to the main point. > Can you just call > ref_convert_owning_non_owning(env, reg->ref_obj_id) and release_reference() here? > Everything will be so much simpler, no? > > > } > > > > static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta) > > @@ -9132,11 +9187,11 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, > > int *insn_idx_p) > > { > > const struct btf_type *t, *func, *func_proto, *ptr_type; > > + u32 i, nargs, func_id, ptr_type_id, release_ref_obj_id; > > struct bpf_reg_state *regs = cur_regs(env); > > const char *func_name, *ptr_type_name; > > bool sleepable, rcu_lock, rcu_unlock; > > struct bpf_kfunc_call_arg_meta meta; > > - u32 i, nargs, func_id, ptr_type_id; > > int err, insn_idx = *insn_idx_p; > > const struct btf_param *args; > > const struct btf_type *ret_t; > > @@ -9223,7 +9278,18 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, > > * PTR_TO_BTF_ID in bpf_kfunc_arg_meta, do the release now. > > */ > > if (meta.release_regno) { > > - err = release_reference(env, regs[meta.release_regno].ref_obj_id); > > + err = 0; > > + release_ref_obj_id = regs[meta.release_regno].ref_obj_id; > > + > > + if (is_kfunc_release_non_own(&meta)) > > + err = ref_convert_owning_non_owning(env, release_ref_obj_id); > > + if (err) { > > + verbose(env, "kfunc %s#%d conversion of owning ref to non-owning failed\n", > > + func_name, func_id); > > + return err; > > + } > > + > > + err = release_reference(env, release_ref_obj_id); > > and this bit won't be needed. > and no need to guess in patch 1 which arg has to be released and converted to non_own. > > > if (err) { > > verbose(env, "kfunc %s#%d reference has not been acquired before\n", > > func_name, func_id); > > -- > > 2.30.2 > >
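FWIW, the direct approach suggested above reads much simpler to me too. A sketch of what the tail of process_kf_arg_ptr_to_list_node() might look like (untested; note that ref_obj_id has to be saved up front, since ref_convert_owning_non_owning() zeroes it out on the matching regs before release_reference() runs):

          u32 ref_obj_id = reg->ref_obj_id;
          int err;

          err = ref_convert_owning_non_owning(env, ref_obj_id);
          if (err)
                  return err;

          /* The bpf_reference_state is gone, but the regs already had
           * their ref_obj_id cleared, so they survive the release as
           * non-owning refs scoped to the active lock. */
          return release_reference(env, ref_obj_id);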
On 12/28/22 10:56 PM, Alexei Starovoitov wrote: > On Sat, Dec 17, 2022 at 12:24:55AM -0800, Dave Marchevsky wrote: >> This patch introduces non-owning reference semantics to the verifier, >> specifically linked_list API kfunc handling. release_on_unlock logic for >> refs is refactored - with small functional changes - to implement these >> semantics, and bpf_list_push_{front,back} are migrated to use them. >> >> When a list node is pushed to a list, the program still has a pointer to >> the node: >> >> n = bpf_obj_new(typeof(*n)); >> >> bpf_spin_lock(&l); >> bpf_list_push_back(&l, n); >> /* n still points to the just-added node */ >> bpf_spin_unlock(&l); >> >> What the verifier considers n to be after the push, and thus what can be >> done with n, are changed by this patch. >> >> Common properties both before/after this patch: >> * After push, n is only a valid reference to the node until end of >> critical section >> * After push, n cannot be pushed to any list >> * After push, the program can read the node's fields using n > > correct. > >> Before: >> * After push, n retains the ref_obj_id which it received on >> bpf_obj_new, but the associated bpf_reference_state's >> release_on_unlock field is set to true >> * release_on_unlock field and associated logic is used to implement >> "n is only a valid ref until end of critical section" >> * After push, n cannot be written to, the node must be removed from >> the list before writing to its fields >> * After push, n is marked PTR_UNTRUSTED > > yep > >> After: >> * After push, n's ref is released and ref_obj_id set to 0. The >> bpf_reg_state's non_owning_ref_lock struct is populated with the >> currently active lock >> * non_owning_ref_lock and logic is used to implement "n is only a >> valid ref until end of critical section" >> * n can be written to (except for special fields e.g. bpf_list_node, >> timer, ...) >> * No special type flag is added to n after push > > yep. > Great summary. > >> Summary of specific implementation changes to achieve the above: >> >> * release_on_unlock field, ref_set_release_on_unlock helper, and logic >> to "release on unlock" based on that field are removed > > +1 > >> * The anonymous active_lock struct used by bpf_verifier_state is >> pulled out into a named struct bpf_active_lock. > ... >> * A non_owning_ref_lock field of type bpf_active_lock is added to >> bpf_reg_state's PTR_TO_BTF_ID union > > not great. see below. > >> * Helpers are added to use non_owning_ref_lock to implement non-owning >> ref semantics as described above >> * invalidate_non_owning_refs - helper to clobber all non-owning refs >> matching a particular bpf_active_lock identity. Replaces >> release_on_unlock logic in process_spin_lock. > > +1 > >> * ref_set_non_owning_lock - set non_owning_ref_lock for a reg based >> on current verifier state > > +1 > >> * ref_convert_owning_non_owning - convert owning reference w/ >> specified ref_obj_id to non-owning references. Setup >> non_owning_ref_lock for each reg with that ref_obj_id and 0 out >> its ref_obj_id > > +1 > >> * New KF_RELEASE_NON_OWN flag is added, to be used in conjunction with >> KF_RELEASE to indicate that the release arg reg should be converted >> to non-owning ref >> * Plain KF_RELEASE would clobber all regs with ref_obj_id matching >> the release arg reg's. KF_RELEASE_NON_OWN's logic triggers first - >> doing ref_convert_owning_non_owning on the ref first, which >> prevents the regs from being clobbered by 0ing out their >> ref_obj_ids. 
The bpf_reference_state itself is still released via >> release_reference as a result of the KF_RELEASE flag. >> * KF_RELEASE | KF_RELEASE_NON_OWN are added to >> bpf_list_push_{front,back} > > And this bit is confusing and not generalizable. > As David noticed in his reply KF_RELEASE_NON_OWN is not a great name. > It's hard to come up with a good name and it won't be generic anyway. > The ref_convert_owning_non_owning has to be applied to a specific arg. > The function itself is not KF_RELEASE in the current definition of it. > The combination of KF_RELEASE|KF_RELEASE_NON_OWN is something new > that should have been generic, but doesn't really work this way. > In the next patches rbtree_root/node still has to have all the custom > logic. > KF_RELEASE_NON_OWN by itself is a nonsensical flag. > Only combination of KF_RELEASE|KF_RELEASE_NON_OWN sort-of kinda makes > sense, but still hard to understand what releases what. > More below. > Addressed below (in response to your 'here we come to the main point' comment). >> After these changes, linked_list's "release on unlock" logic continues >> to function as before, except for the semantic differences noted above. >> The patch immediately following this one makes minor changes to >> linked_list selftests to account for the differing behavior. >> >> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> >> --- >> include/linux/bpf.h | 1 + >> include/linux/bpf_verifier.h | 39 ++++----- >> include/linux/btf.h | 17 ++-- >> kernel/bpf/helpers.c | 4 +- >> kernel/bpf/verifier.c | 164 ++++++++++++++++++++++++----------- >> 5 files changed, 146 insertions(+), 79 deletions(-) >> >> diff --git a/include/linux/bpf.h b/include/linux/bpf.h >> index 3de24cfb7a3d..f71571bf6adc 100644 >> --- a/include/linux/bpf.h >> +++ b/include/linux/bpf.h >> @@ -180,6 +180,7 @@ enum btf_field_type { >> BPF_KPTR = BPF_KPTR_UNREF | BPF_KPTR_REF, >> BPF_LIST_HEAD = (1 << 4), >> BPF_LIST_NODE = (1 << 5), >> + BPF_GRAPH_NODE_OR_ROOT = BPF_LIST_NODE | BPF_LIST_HEAD, >> }; >> >> struct btf_field_kptr { >> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h >> index 53d175cbaa02..cb417ffbbb84 100644 >> --- a/include/linux/bpf_verifier.h >> +++ b/include/linux/bpf_verifier.h >> @@ -43,6 +43,22 @@ enum bpf_reg_liveness { >> REG_LIVE_DONE = 0x8, /* liveness won't be updating this register anymore */ >> }; >> >> +/* For every reg representing a map value or allocated object pointer, >> + * we consider the tuple of (ptr, id) for them to be unique in verifier >> + * context and conside them to not alias each other for the purposes of >> + * tracking lock state. >> + */ >> +struct bpf_active_lock { >> + /* This can either be reg->map_ptr or reg->btf. If ptr is NULL, >> + * there's no active lock held, and other fields have no >> + * meaning. If non-NULL, it indicates that a lock is held and >> + * id member has the reg->id of the register which can be >= 0. >> + */ >> + void *ptr; >> + /* This will be reg->id */ >> + u32 id; >> +}; >> + >> struct bpf_reg_state { >> /* Ordering of fields matters. See states_equal() */ >> enum bpf_reg_type type; >> @@ -68,6 +84,7 @@ struct bpf_reg_state { >> struct { >> struct btf *btf; >> u32 btf_id; >> + struct bpf_active_lock non_owning_ref_lock; > > In your other email you argue that pointer should be enough. > I suspect that won't be correct. > See fixes that Andrii did in states_equal() and regsafe(). 
> In particular: > if (!!old->active_lock.id != !!cur->active_lock.id) > return false; > > if (old->active_lock.id && > !check_ids(old->active_lock.id, cur->active_lock.id, env->idmap_scratch)) > return false; > > We have to do the comparison of this new ID via idmap as well. > > I think introduction of struct bpf_active_lock and addition of it > to bpf_reg_state is overkill. > Here we can add 'u32 non_own_ref_obj_id;' only and compare it via idmap in regsafe(). > I'm guessing you didn't like my 'active_lock_id' suggestion. Fine. > non_own_ref_obj_id would match existing ref_obj_id at least. > >> }; >> >> u32 mem_size; /* for PTR_TO_MEM | PTR_TO_MEM_OR_NULL */ >> @@ -223,11 +240,6 @@ struct bpf_reference_state { >> * exiting a callback function. >> */ >> int callback_ref; >> - /* Mark the reference state to release the registers sharing the same id >> - * on bpf_spin_unlock (for nodes that we will lose ownership to but are >> - * safe to access inside the critical section). >> - */ >> - bool release_on_unlock; >> }; >> >> /* state of the program: >> @@ -328,21 +340,8 @@ struct bpf_verifier_state { >> u32 branches; >> u32 insn_idx; >> u32 curframe; >> - /* For every reg representing a map value or allocated object pointer, >> - * we consider the tuple of (ptr, id) for them to be unique in verifier >> - * context and conside them to not alias each other for the purposes of >> - * tracking lock state. >> - */ >> - struct { >> - /* This can either be reg->map_ptr or reg->btf. If ptr is NULL, >> - * there's no active lock held, and other fields have no >> - * meaning. If non-NULL, it indicates that a lock is held and >> - * id member has the reg->id of the register which can be >= 0. >> - */ >> - void *ptr; >> - /* This will be reg->id */ >> - u32 id; >> - } active_lock; > > I would keep it as-is. > >> + >> + struct bpf_active_lock active_lock; >> bool speculative; >> bool active_rcu_lock; >> >> diff --git a/include/linux/btf.h b/include/linux/btf.h >> index 5f628f323442..8aee3f7f4248 100644 >> --- a/include/linux/btf.h >> +++ b/include/linux/btf.h >> @@ -15,10 +15,10 @@ >> #define BTF_TYPE_EMIT_ENUM(enum_val) ((void)enum_val) >> >> /* These need to be macros, as the expressions are used in assembler input */ >> -#define KF_ACQUIRE (1 << 0) /* kfunc is an acquire function */ >> -#define KF_RELEASE (1 << 1) /* kfunc is a release function */ >> -#define KF_RET_NULL (1 << 2) /* kfunc returns a pointer that may be NULL */ >> -#define KF_KPTR_GET (1 << 3) /* kfunc returns reference to a kptr */ >> +#define KF_ACQUIRE (1 << 0) /* kfunc is an acquire function */ >> +#define KF_RELEASE (1 << 1) /* kfunc is a release function */ >> +#define KF_RET_NULL (1 << 2) /* kfunc returns a pointer that may be NULL */ >> +#define KF_KPTR_GET (1 << 3) /* kfunc returns reference to a kptr */ >> /* Trusted arguments are those which are guaranteed to be valid when passed to >> * the kfunc. 
It is used to enforce that pointers obtained from either acquire >> * kfuncs, or from the main kernel on a tracepoint or struct_ops callback >> @@ -67,10 +67,11 @@ >> * return 0; >> * } >> */ >> -#define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */ >> -#define KF_SLEEPABLE (1 << 5) /* kfunc may sleep */ >> -#define KF_DESTRUCTIVE (1 << 6) /* kfunc performs destructive actions */ >> -#define KF_RCU (1 << 7) /* kfunc only takes rcu pointer arguments */ >> +#define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */ >> +#define KF_SLEEPABLE (1 << 5) /* kfunc may sleep */ >> +#define KF_DESTRUCTIVE (1 << 6) /* kfunc performs destructive actions */ >> +#define KF_RCU (1 << 7) /* kfunc only takes rcu pointer arguments */ >> +#define KF_RELEASE_NON_OWN (1 << 8) /* kfunc converts its referenced arg into non-owning ref */ > > No need for this flag. > Addressed below (in re: your 'here we come to the main point' comment) >> /* >> * Return the name of the passed struct, if exists, or halt the build if for >> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c >> index af30c6cbd65d..e041409779c3 100644 >> --- a/kernel/bpf/helpers.c >> +++ b/kernel/bpf/helpers.c >> @@ -2049,8 +2049,8 @@ BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE) >> #endif >> BTF_ID_FLAGS(func, bpf_obj_new_impl, KF_ACQUIRE | KF_RET_NULL) >> BTF_ID_FLAGS(func, bpf_obj_drop_impl, KF_RELEASE) >> -BTF_ID_FLAGS(func, bpf_list_push_front) >> -BTF_ID_FLAGS(func, bpf_list_push_back) >> +BTF_ID_FLAGS(func, bpf_list_push_front, KF_RELEASE | KF_RELEASE_NON_OWN) >> +BTF_ID_FLAGS(func, bpf_list_push_back, KF_RELEASE | KF_RELEASE_NON_OWN) > > No need for this. > Addressed below (in re: your 'here we come to the main point' comment) >> BTF_ID_FLAGS(func, bpf_list_pop_front, KF_ACQUIRE | KF_RET_NULL) >> BTF_ID_FLAGS(func, bpf_list_pop_back, KF_ACQUIRE | KF_RET_NULL) >> BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS) >> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c >> index 824e2242eae5..84b0660e2a76 100644 >> --- a/kernel/bpf/verifier.c >> +++ b/kernel/bpf/verifier.c >> @@ -190,6 +190,10 @@ struct bpf_verifier_stack_elem { >> >> static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx); >> static int release_reference(struct bpf_verifier_env *env, int ref_obj_id); >> +static void invalidate_non_owning_refs(struct bpf_verifier_env *env, >> + struct bpf_active_lock *lock); >> +static int ref_set_non_owning_lock(struct bpf_verifier_env *env, >> + struct bpf_reg_state *reg); >> >> static bool bpf_map_ptr_poisoned(const struct bpf_insn_aux_data *aux) >> { >> @@ -931,6 +935,9 @@ static void print_verifier_state(struct bpf_verifier_env *env, >> verbose_a("id=%d", reg->id); >> if (reg->ref_obj_id) >> verbose_a("ref_obj_id=%d", reg->ref_obj_id); >> + if (reg->non_owning_ref_lock.ptr) >> + verbose_a("non_own_id=(%p,%d)", reg->non_owning_ref_lock.ptr, >> + reg->non_owning_ref_lock.id); >> if (t != SCALAR_VALUE) >> verbose_a("off=%d", reg->off); >> if (type_is_pkt_pointer(t)) >> @@ -4820,7 +4827,8 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env, >> return -EACCES; >> } >> >> - if (type_is_alloc(reg->type) && !reg->ref_obj_id) { >> + if (type_is_alloc(reg->type) && !reg->ref_obj_id && >> + !reg->non_owning_ref_lock.ptr) { >> verbose(env, "verifier internal error: ref_obj_id for allocated object must be non-zero\n"); >> return -EFAULT; >> } >> @@ -5778,9 +5786,7 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno, 
>> cur->active_lock.ptr = btf; >> cur->active_lock.id = reg->id; >> } else { >> - struct bpf_func_state *fstate = cur_func(env); >> void *ptr; >> - int i; >> >> if (map) >> ptr = map; >> @@ -5796,25 +5802,11 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno, >> verbose(env, "bpf_spin_unlock of different lock\n"); >> return -EINVAL; >> } >> - cur->active_lock.ptr = NULL; >> - cur->active_lock.id = 0; >> >> - for (i = fstate->acquired_refs - 1; i >= 0; i--) { >> - int err; >> + invalidate_non_owning_refs(env, &cur->active_lock); > > +1 > >> - /* Complain on error because this reference state cannot >> - * be freed before this point, as bpf_spin_lock critical >> - * section does not allow functions that release the >> - * allocated object immediately. >> - */ >> - if (!fstate->refs[i].release_on_unlock) >> - continue; >> - err = release_reference(env, fstate->refs[i].id); >> - if (err) { >> - verbose(env, "failed to release release_on_unlock reference"); >> - return err; >> - } >> - } >> + cur->active_lock.ptr = NULL; >> + cur->active_lock.id = 0; > > +1 > >> } >> return 0; >> } >> @@ -6273,6 +6265,23 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno, >> return 0; >> } >> >> +static struct btf_field * >> +reg_find_field_offset(const struct bpf_reg_state *reg, s32 off, u32 fields) >> +{ >> + struct btf_field *field; >> + struct btf_record *rec; >> + >> + rec = reg_btf_record(reg); >> + if (!reg) >> + return NULL; >> + >> + field = btf_record_find(rec, off, fields); >> + if (!field) >> + return NULL; >> + >> + return field; >> +} > > Doesn't look like that this helper is really necessary. > The helper is used in other places in the series. It saves some boilerplate since usually when reg's btf_record is fetched it's just to look for the presence of some field. If all uses of the helper in this patch are removed, I will move the def to first patch where it's used when I respin. >> + >> int check_func_arg_reg_off(struct bpf_verifier_env *env, >> const struct bpf_reg_state *reg, int regno, >> enum bpf_arg_type arg_type) >> @@ -6294,6 +6303,18 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env, >> */ >> if (arg_type_is_dynptr(arg_type) && type == PTR_TO_STACK) >> return 0; >> + >> + if (type == (PTR_TO_BTF_ID | MEM_ALLOC) && reg->off) { >> + if (reg_find_field_offset(reg, reg->off, BPF_GRAPH_NODE_OR_ROOT)) >> + return __check_ptr_off_reg(env, reg, regno, true); >> + >> + verbose(env, "R%d must have zero offset when passed to release func\n", >> + regno); >> + verbose(env, "No graph node or root found at R%d type:%s off:%d\n", regno, >> + kernel_type_name(reg->btf, reg->btf_id), reg->off); >> + return -EINVAL; >> + } > > This bit is only necessary if we mark push_list as KF_RELEASE. > Just don't add this mark and drop above. > Addressed below (in re: your 'here we come to the main point' comment) >> + >> /* Doing check_ptr_off_reg check for the offset will catch this >> * because fixed_off_ok is false, but checking here allows us >> * to give the user a better error message. 
>> @@ -7055,6 +7076,20 @@ static int release_reference(struct bpf_verifier_env *env, >> return 0; >> } >> >> +static void invalidate_non_owning_refs(struct bpf_verifier_env *env, >> + struct bpf_active_lock *lock) >> +{ >> + struct bpf_func_state *unused; >> + struct bpf_reg_state *reg; >> + >> + bpf_for_each_reg_in_vstate(env->cur_state, unused, reg, ({ >> + if (reg->non_owning_ref_lock.ptr && >> + reg->non_owning_ref_lock.ptr == lock->ptr && >> + reg->non_owning_ref_lock.id == lock->id) > > I think the lock.ptr = lock->ptr comparison is unnecessary to invalidate things. > We're under active spin_lock here. All regs were checked earlier and id keeps incrementing. > So we can just do 'u32 non_own_ref_obj_id'. > >> + __mark_reg_unknown(env, reg); >> + })); >> +} >> + >> static void clear_caller_saved_regs(struct bpf_verifier_env *env, >> struct bpf_reg_state *regs) >> { >> @@ -8266,6 +8301,11 @@ static bool is_kfunc_release(struct bpf_kfunc_call_arg_meta *meta) >> return meta->kfunc_flags & KF_RELEASE; >> } >> >> +static bool is_kfunc_release_non_own(struct bpf_kfunc_call_arg_meta *meta) >> +{ >> + return meta->kfunc_flags & KF_RELEASE_NON_OWN; >> +} >> + > > No need. > Addressed below (in re: your 'here we come to the main point' comment) >> static bool is_kfunc_trusted_args(struct bpf_kfunc_call_arg_meta *meta) >> { >> return meta->kfunc_flags & KF_TRUSTED_ARGS; >> @@ -8651,38 +8691,55 @@ static int process_kf_arg_ptr_to_kptr(struct bpf_verifier_env *env, >> return 0; >> } >> >> -static int ref_set_release_on_unlock(struct bpf_verifier_env *env, u32 ref_obj_id) >> +static int ref_set_non_owning_lock(struct bpf_verifier_env *env, struct bpf_reg_state *reg) >> { >> - struct bpf_func_state *state = cur_func(env); >> + struct bpf_verifier_state *state = env->cur_state; >> + >> + if (!state->active_lock.ptr) { >> + verbose(env, "verifier internal error: ref_set_non_owning_lock w/o active lock\n"); >> + return -EFAULT; >> + } >> + >> + if (reg->non_owning_ref_lock.ptr) { >> + verbose(env, "verifier internal error: non_owning_ref_lock already set\n"); >> + return -EFAULT; >> + } >> + >> + reg->non_owning_ref_lock.id = state->active_lock.id; >> + reg->non_owning_ref_lock.ptr = state->active_lock.ptr; >> + return 0; >> +} >> + >> +static int ref_convert_owning_non_owning(struct bpf_verifier_env *env, u32 ref_obj_id) >> +{ >> + struct bpf_func_state *state, *unused; >> struct bpf_reg_state *reg; >> int i; >> >> - /* bpf_spin_lock only allows calling list_push and list_pop, no BPF >> - * subprogs, no global functions. This means that the references would >> - * not be released inside the critical section but they may be added to >> - * the reference state, and the acquired_refs are never copied out for a >> - * different frame as BPF to BPF calls don't work in bpf_spin_lock >> - * critical sections. 
>> - */ >> + state = cur_func(env); >> + >> if (!ref_obj_id) { >> - verbose(env, "verifier internal error: ref_obj_id is zero for release_on_unlock\n"); >> + verbose(env, "verifier internal error: ref_obj_id is zero for " >> + "owning -> non-owning conversion\n"); >> return -EFAULT; >> } >> + >> for (i = 0; i < state->acquired_refs; i++) { >> - if (state->refs[i].id == ref_obj_id) { >> - if (state->refs[i].release_on_unlock) { >> - verbose(env, "verifier internal error: expected false release_on_unlock"); >> - return -EFAULT; >> + if (state->refs[i].id != ref_obj_id) >> + continue; >> + >> + /* Clear ref_obj_id here so release_reference doesn't clobber >> + * the whole reg >> + */ >> + bpf_for_each_reg_in_vstate(env->cur_state, unused, reg, ({ >> + if (reg->ref_obj_id == ref_obj_id) { >> + reg->ref_obj_id = 0; >> + ref_set_non_owning_lock(env, reg); > > +1 except ref_set_... name doesn't quite fit. reg_set_... is more accurate, no? > and probably reg_set_non_own_ref_obj_id() ? > Or just open code it? > I like reg_set_... Will change >> } >> - state->refs[i].release_on_unlock = true; >> - /* Now mark everyone sharing same ref_obj_id as untrusted */ >> - bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({ >> - if (reg->ref_obj_id == ref_obj_id) >> - reg->type |= PTR_UNTRUSTED; >> - })); >> - return 0; >> - } >> + })); >> + return 0; >> } >> + >> verbose(env, "verifier internal error: ref state missing for ref_obj_id\n"); >> return -EFAULT; >> } >> @@ -8817,7 +8874,6 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env, >> { >> const struct btf_type *et, *t; >> struct btf_field *field; >> - struct btf_record *rec; >> u32 list_node_off; >> >> if (meta->btf != btf_vmlinux || >> @@ -8834,9 +8890,8 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env, >> return -EINVAL; >> } >> >> - rec = reg_btf_record(reg); >> list_node_off = reg->off + reg->var_off.value; >> - field = btf_record_find(rec, list_node_off, BPF_LIST_NODE); >> + field = reg_find_field_offset(reg, list_node_off, BPF_LIST_NODE); >> if (!field || field->offset != list_node_off) { >> verbose(env, "bpf_list_node not found at offset=%u\n", list_node_off); >> return -EINVAL; >> @@ -8861,8 +8916,8 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env, >> btf_name_by_offset(field->list_head.btf, et->name_off)); >> return -EINVAL; >> } >> - /* Set arg#1 for expiration after unlock */ >> - return ref_set_release_on_unlock(env, reg->ref_obj_id); >> + >> + return 0; > > and here we come to the main point. > Can you just call > ref_convert_owning_non_owning(env, reg->ref_obj_id) and release_reference() here? > Everything will be so much simpler, no? > IIUC, your proposal here is what you'd like me to do instead of KF_RELEASE_NON_OWN kfunc flag. I think all the points you're making are interrelated, so I'll summarize my understanding of them and reply to them as a group here. 1) KF_RELEASE_NON_OWN shouldn't depend on KF_RELEASE and thus shouldn't require the flags to be used together. Why? It's confusing and KF_RELEASE_NON_OWN isn't really doing a 'release' by current KF_RELEASE semantics since it's converting to non-owning. This point seems reasonable to me. KF_RELEASE_NON_OWN wants to do many things that KF_RELEASE does, and in the same places (e.g. release_reference(), confirm that there's a referenced arg passed in). 
Adding KF_RELEASE_NON_OWN logic as a special-casing of KF_RELEASE logic in a few places reduced the amount of changes to the verifier, at the cost of tying the flags together. That cost doesn't seem worth it if it's confusing to read and muddies the meaning of 'RELEASE'. So agreed. Will make KF_RELEASE_NON_OWN logic separate from KF_RELEASE.

2) process_kf_arg_ptr_to_list_node, renamed __process_kf_arg_ptr_to_graph_node in further patches in the series, should handle the owning -> non-owning conversion, instead of relying on a kfunc flag.

This is how the implementation in v1 worked. In [0] both list_head and rb_root start using the same __process_kf_arg_ptr_to_datastructure_node helper, which does ref_set_release_on_unlock. Then in [1] the ref_set_release_on_unlock call is moved out of the __process helper because process_kf_arg_ptr_to_rbtree_node needs to special-case its behavior for bpf_rbtree_remove. That special casing led me to move away from doing the owning -> non-owning conversion in the process_ helpers. My logic was as follows:

* process_kf_arg_ptr_to_{list_node,rbtree_node} act on the arg reg when it's a specific type, but owning -> non-owning conversion is more of a function-level property. e.g. rbtree_add sees an owning rbtree_node and converts it to non-owning, but not all functions which take an owning rbtree_node will want to do this.
* Really it's both an arg/type-level behavior and a function-level behavior. stable BPF helper func proto's .arg1_type w/ base type and flags would be a better way of expressing this, IMO, as it'd remove the need to search for the arg-to-release with the current KF_RELEASE kfunc-level flag.
* The deeper into the arg reg helper functions that function-based special-casing is pushed, the harder the code becomes to understand.
* Retval special-casing based on function is less confusing since that's usually in 'check_kfunc_call' with minimal helper usage. This is why I kept some special-cased logic for rbtree_remove retval in this series, although ideally there would be a named semantic for that as well.

But as you mention in this patch and others, we can have this be function-level behavior without adding a new kfunc flag, by special-casing in the appropriate spots. This suggestion is pretty reasonable to me, but I'd like to make the case for keeping the named kfunc flags.

I made a mistake when I used 'generalize' to describe the purpose of moving logic behind kfunc flags. You and David Vernet correctly state that it's not likely that the logic will be used by many other kfuncs; even other graph datastructure API kfuncs won't be that numerous, so is it really 'general' functionality? Really what I meant by 'generalize' was 'give this behavior a name that clearly ties it to graph datastructure semantics'. By 'clearly ties it to ...', I mean:

* It should be obvious that the named semantic is not unique to the particular kfunc
* It should be obvious that the named semantic is tied to other named graph datastructure semantics
* When specific semantics are discussed in the documentation or on the mailing list, it should be easy to tie the concept being discussed to some specific code without ambiguity

Personally, whenever I see "func_id == BPF_FUNC_whatever" (or kfunc equivalent), it's not clear to me whether the logic that follows is unique to the helper or is due to the helper being e.g. a "special dynptr helper".
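To make that concrete, compare the two styles (a sketch; the special_kfunc_list lookup mirrors the pattern already in verifier.c, and is_kfunc_release_non_own() is the flag check from this patch):

  /* id-based special-casing: is this a one-off quirk of push_back, or
   * an instance of a named graph datastructure semantic? Hard to tell
   * from the call site alone. */
  if (meta.func_id == special_kfunc_list[KF_bpf_list_push_back])
          err = ref_convert_owning_non_owning(env, release_ref_obj_id);

  /* flag-based: the named semantic is visible at the call site */
  if (is_kfunc_release_non_own(&meta))
          err = ref_convert_owning_non_owning(env, release_ref_obj_id);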
For this graph ds stuff specifically, you had trouble understanding what I wanted to do until we stepped back from the specific implementation and talked about general semantics of what args / retval look like before / after kfunc call. Since we nailed down the semantics - in some detail - in earlier convos, and decided to document them outside of the code, it made sense to me to give them top-level names. Your (and David's) comment that "KF_RELEASE_NON_OWN is not a great name" is IMO acknowledgement that giving the semantic a _good_ name would be useful. How about I try to make the names better in v3 instead of removing the kfunc flags entirely? If you're still opposed after that, I will instead add helpers with comments like: /* Implement 'release non owning reference' semantic as described by graph * ds documentation */ void graph_ds_release_non_own_ref() { ... } To satisfy my bullets above. [0]: lore.kernel.org/bpf/20221206231000.3180914-8-davemarchevsky@fb.com [1]: lore.kernel.org/bpf/20221206231000.3180914-10-davemarchevsky@fb.com >> } >> >> static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta) >> @@ -9132,11 +9187,11 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, >> int *insn_idx_p) >> { >> const struct btf_type *t, *func, *func_proto, *ptr_type; >> + u32 i, nargs, func_id, ptr_type_id, release_ref_obj_id; >> struct bpf_reg_state *regs = cur_regs(env); >> const char *func_name, *ptr_type_name; >> bool sleepable, rcu_lock, rcu_unlock; >> struct bpf_kfunc_call_arg_meta meta; >> - u32 i, nargs, func_id, ptr_type_id; >> int err, insn_idx = *insn_idx_p; >> const struct btf_param *args; >> const struct btf_type *ret_t; >> @@ -9223,7 +9278,18 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, >> * PTR_TO_BTF_ID in bpf_kfunc_arg_meta, do the release now. >> */ >> if (meta.release_regno) { >> - err = release_reference(env, regs[meta.release_regno].ref_obj_id); >> + err = 0; >> + release_ref_obj_id = regs[meta.release_regno].ref_obj_id; >> + >> + if (is_kfunc_release_non_own(&meta)) >> + err = ref_convert_owning_non_owning(env, release_ref_obj_id); >> + if (err) { >> + verbose(env, "kfunc %s#%d conversion of owning ref to non-owning failed\n", >> + func_name, func_id); >> + return err; >> + } >> + >> + err = release_reference(env, release_ref_obj_id); > > and this bit won't be needed. > and no need to guess in patch 1 which arg has to be released and converted to non_own. > Addressed above (in re: your 'here we come to the main point' comment) >> if (err) { >> verbose(env, "kfunc %s#%d reference has not been acquired before\n", >> func_name, func_id); >> -- >> 2.30.2 >>
On 12/29/22 11:54 AM, David Vernet wrote: > On Wed, Dec 28, 2022 at 07:56:00PM -0800, Alexei Starovoitov wrote: >> On Sat, Dec 17, 2022 at 12:24:55AM -0800, Dave Marchevsky wrote: >>> This patch introduces non-owning reference semantics to the verifier, >>> specifically linked_list API kfunc handling. release_on_unlock logic for >>> refs is refactored - with small functional changes - to implement these >>> semantics, and bpf_list_push_{front,back} are migrated to use them. >>> >>> When a list node is pushed to a list, the program still has a pointer to >>> the node: >>> >>> n = bpf_obj_new(typeof(*n)); >>> >>> bpf_spin_lock(&l); >>> bpf_list_push_back(&l, n); >>> /* n still points to the just-added node */ >>> bpf_spin_unlock(&l); >>> >>> What the verifier considers n to be after the push, and thus what can be >>> done with n, are changed by this patch. >>> >>> Common properties both before/after this patch: >>> * After push, n is only a valid reference to the node until end of >>> critical section >>> * After push, n cannot be pushed to any list >>> * After push, the program can read the node's fields using n >> >> correct. >> >>> Before: >>> * After push, n retains the ref_obj_id which it received on >>> bpf_obj_new, but the associated bpf_reference_state's >>> release_on_unlock field is set to true >>> * release_on_unlock field and associated logic is used to implement >>> "n is only a valid ref until end of critical section" >>> * After push, n cannot be written to, the node must be removed from >>> the list before writing to its fields >>> * After push, n is marked PTR_UNTRUSTED >> >> yep >> >>> After: >>> * After push, n's ref is released and ref_obj_id set to 0. The >>> bpf_reg_state's non_owning_ref_lock struct is populated with the >>> currently active lock >>> * non_owning_ref_lock and logic is used to implement "n is only a >>> valid ref until end of critical section" >>> * n can be written to (except for special fields e.g. bpf_list_node, >>> timer, ...) >>> * No special type flag is added to n after push >> >> yep. >> Great summary. >> >>> Summary of specific implementation changes to achieve the above: >>> >>> * release_on_unlock field, ref_set_release_on_unlock helper, and logic >>> to "release on unlock" based on that field are removed >> >> +1 >> >>> * The anonymous active_lock struct used by bpf_verifier_state is >>> pulled out into a named struct bpf_active_lock. >> ... >>> * A non_owning_ref_lock field of type bpf_active_lock is added to >>> bpf_reg_state's PTR_TO_BTF_ID union >> >> not great. see below. >> >>> * Helpers are added to use non_owning_ref_lock to implement non-owning >>> ref semantics as described above >>> * invalidate_non_owning_refs - helper to clobber all non-owning refs >>> matching a particular bpf_active_lock identity. Replaces >>> release_on_unlock logic in process_spin_lock. >> >> +1 >> >>> * ref_set_non_owning_lock - set non_owning_ref_lock for a reg based >>> on current verifier state >> >> +1 >> >>> * ref_convert_owning_non_owning - convert owning reference w/ >>> specified ref_obj_id to non-owning references. Setup >>> non_owning_ref_lock for each reg with that ref_obj_id and 0 out >>> its ref_obj_id >> >> +1 >> >>> * New KF_RELEASE_NON_OWN flag is added, to be used in conjunction with >>> KF_RELEASE to indicate that the release arg reg should be converted >>> to non-owning ref >>> * Plain KF_RELEASE would clobber all regs with ref_obj_id matching >>> the release arg reg's. 
KF_RELEASE_NON_OWN's logic triggers first - >>> doing ref_convert_owning_non_owning on the ref first, which >>> prevents the regs from being clobbered by 0ing out their >>> ref_obj_ids. The bpf_reference_state itself is still released via >>> release_reference as a result of the KF_RELEASE flag. >>> * KF_RELEASE | KF_RELEASE_NON_OWN are added to >>> bpf_list_push_{front,back} >> >> And this bit is confusing and not generalizable. > > +1 on both counts. If we want to make it generalizable, I think the only > way to do would be to generalize it across different graph map types. > For example, to have kfunc flags like KF_GRAPH_INSERT and > KF_GRAPH_REMOVE which signal to the verifier that "for this graph-type > map which has a spin-lock associated with its root node that I expect to > be held, I've {inserted, removed} the node {to, from} the graph, so > adjust the refcnt / pointer type accordingly and then clean up when the > lock is dropped." > > I don't see any reason to add kfunc flags for that though, as the fact > that the pointer in question refers to a node that has a root node that > has a lock associated with it is already itself a special-case scenario. > I think we should just special-case these kfuncs in the verifier as > "graph-type" kfuncs in some static global array(s). That's probably > less error prone anyways, and I don't see the typical kfunc writer ever > needing to do this. > re: "generalizable" and "why add a kfunc flag at all", I addressed that in a side-reply to the msg which you're replying to here [0]. But to address your specific points: "the fact that the pointer in question refers to a node that has a root node that has a lock associated with it is already itself a special-case scenario" Are you trying to say here that because the arg is of a special type, special behavior should be tied to that arg type instead of the function? If so, that's addressed in [0]. A function with KF_RELEASE_NON_OWN semantics certainly does need args of a certain type in order to do its thing, but the semantic is really a function-level thing. If we can tag the function with it, then later check arg regs, that's preferable to checking kfunc id while processing arg regs, as the latter conflates "the arg is of this special type so the function should do X" with "the function does X so it must have an arg with this special type". [0]: https://lore.kernel.org/bpf/9763aed7-0284-e400-b4dc-ed01718d8e1e@meta.com/ >> As David noticed in his reply KF_RELEASE_NON_OWN is not a great name. >> It's hard to come up with a good name and it won't be generic anyway. >> The ref_convert_owning_non_owning has to be applied to a specific arg. >> The function itself is not KF_RELEASE in the current definition of it. >> The combination of KF_RELEASE|KF_RELEASE_NON_OWN is something new >> that should have been generic, but doesn't really work this way. >> In the next patches rbtree_root/node still has to have all the custom >> logic. >> KF_RELEASE_NON_OWN by itself is a nonsensical flag. > > IMO if a flag doesn't make any sense on its own, or even possibly if it > needs to be mutually exclusive with one or more other flags, it is > probably never a correct building block. Even KF_TRUSTED_ARGS doesn't > really make sense, as it's redundant if KF_RCU is specified. 
This is > fine though, as IIUC our long-term plan is to get rid of KF_TRUSTED_ARGS > and make it the default behavior for all kfuncs (not trying to hijack > this thread with a tangential discussion about KF_TRUSTED_ARGS, just > using this as an opportunity to point out something to keep in mind as > we continue to add kfunc flags down the road). > I'm fine with making KF_RELEASE_NON_OWN not depend on KF_RELEASE. Addressed in [0] above. >> Only the combination of KF_RELEASE|KF_RELEASE_NON_OWN sort of makes >> sense, but it's still hard to understand what releases what. > > I agree and I think this is an important point. IMO it is a worse > tradeoff to try to generalize this by complicating the definition of a > reference than it is to keep the refcounting APIs straightforward and > well defined. As a basic building block, having an owning refcount > should mean one thing: that the object will not be destroyed and is safe > to dereference. When you start mixing in these graph-specific notions of > references meaning different things in specific contexts, it compromises > that and makes the API significantly less usable and extensible. > "Generalize" was the wrong word for me to use here. Addressed in [0] above. Regarding polluting the meaning of "reference": owning and non-owning references are intentionally scoped to graph datastructures only, and have well-defined and documented meaning in that context. Elsewhere in the verifier "reference", "owning refcount", etc. are not well-defined, as folks have been adding whatever semantics they need to get their stuff working for some time. Scoping these new concepts to graph datastructures only is my attempt at making progress without adding to that confusion. > For example, at some point we may decide to add something like a > kptr_weak_ref which would function exactly like an std::weak_ptr, except > of course that it would wrap a kptr_ref instead of an std::shared_ptr. > IMO something like that is a more natural and generalizable building > block that cleanly complements refcounting as it exists today. > Any functionality that implements the desired semantics for rbtree / linked_list is fine with me. If it's a superset of what I'm adding here, happy to migrate. If changes to rbtree/linked_list APIs are needed to make such a migration possible, luckily it's all unstable kptr/kfunc, so that better future state isn't blocked by these semantics / implementation. All this next-gen datastructure work has been an exercise in YAGNI and scope reduction. Luckily since it's all unstable API we're not backing ourselves into any corners by doing so. re: the "std::weak_ptr" idea more specifically - despite obvious similarities to some rust or cpp concepts, I've been intentionally avoiding trying to sell this work as such or copying semantics wholesale. Better to focus on the specific things needed to move forward and avoid starting big-scope arguments like "should we just add std::shared_ptr semantics?" or "should the verifier be doing 'borrow checking' similar to rust, and if so to what extent". Don't get me wrong, I'd find such discussions interesting, but a YAGNI approach where such functionality is gradually added in response to concrete usecases will likely save much contentious back-and-forth. >> More below. >> >>> After these changes, linked_list's "release on unlock" logic continues >>> to function as before, except for the semantic differences noted above.
>>> The patch immediately following this one makes minor changes to >>> linked_list selftests to account for the differing behavior. >>> >>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> >>> --- >>> include/linux/bpf.h | 1 + >>> include/linux/bpf_verifier.h | 39 ++++----- >>> include/linux/btf.h | 17 ++-- >>> kernel/bpf/helpers.c | 4 +- >>> kernel/bpf/verifier.c | 164 ++++++++++++++++++++++++----------- >>> 5 files changed, 146 insertions(+), 79 deletions(-) >>> >>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h >>> index 3de24cfb7a3d..f71571bf6adc 100644 >>> --- a/include/linux/bpf.h >>> +++ b/include/linux/bpf.h >>> @@ -180,6 +180,7 @@ enum btf_field_type { >>> BPF_KPTR = BPF_KPTR_UNREF | BPF_KPTR_REF, >>> BPF_LIST_HEAD = (1 << 4), >>> BPF_LIST_NODE = (1 << 5), >>> + BPF_GRAPH_NODE_OR_ROOT = BPF_LIST_NODE | BPF_LIST_HEAD, > > Can you update the rest of the elements here to keep common indentation? > Ack >>> }; >>> >>> struct btf_field_kptr { >>> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h >>> index 53d175cbaa02..cb417ffbbb84 100644 >>> --- a/include/linux/bpf_verifier.h >>> +++ b/include/linux/bpf_verifier.h >>> @@ -43,6 +43,22 @@ enum bpf_reg_liveness { >>> REG_LIVE_DONE = 0x8, /* liveness won't be updating this register anymore */ >>> }; >>> >>> +/* For every reg representing a map value or allocated object pointer, >>> + * we consider the tuple of (ptr, id) for them to be unique in verifier >>> + * context and consider them to not alias each other for the purposes of >>> + * tracking lock state. >>> + */ >>> +struct bpf_active_lock { >>> + /* This can either be reg->map_ptr or reg->btf. If ptr is NULL, >>> + * there's no active lock held, and other fields have no >>> + * meaning. If non-NULL, it indicates that a lock is held and >>> + * id member has the reg->id of the register which can be >= 0. >>> + */ >>> + void *ptr; >>> + /* This will be reg->id */ >>> + u32 id; >>> +}; >>> + >>> struct bpf_reg_state { >>> /* Ordering of fields matters. See states_equal() */ >>> enum bpf_reg_type type; >>> @@ -68,6 +84,7 @@ struct bpf_reg_state { >>> struct { >>> struct btf *btf; >>> u32 btf_id; >>> + struct bpf_active_lock non_owning_ref_lock; >> >> In your other email you argue that pointer should be enough. >> I suspect that won't be correct. >> See fixes that Andrii did in states_equal() and regsafe(). >> In particular: >> if (!!old->active_lock.id != !!cur->active_lock.id) >> return false; >> >> if (old->active_lock.id && >> !check_ids(old->active_lock.id, cur->active_lock.id, env->idmap_scratch)) >> return false; >> >> We have to do the comparison of this new ID via idmap as well. >> >> I think introduction of struct bpf_active_lock and addition of it >> to bpf_reg_state is overkill. >> Here we can add 'u32 non_own_ref_obj_id;' only and compare it via idmap in regsafe(). >> I'm guessing you didn't like my 'active_lock_id' suggestion. Fine. >> non_own_ref_obj_id would match existing ref_obj_id at least. >> >>> }; >>> >>> u32 mem_size; /* for PTR_TO_MEM | PTR_TO_MEM_OR_NULL */ >>> @@ -223,11 +240,6 @@ struct bpf_reference_state { >>> * exiting a callback function. >>> */ >>> int callback_ref; >>> - /* Mark the reference state to release the registers sharing the same id >>> - * on bpf_spin_unlock (for nodes that we will lose ownership to but are >>> - * safe to access inside the critical section).
>>> - */ >>> - bool release_on_unlock; >>> }; >>> >>> /* state of the program: >>> @@ -328,21 +340,8 @@ struct bpf_verifier_state { >>> u32 branches; >>> u32 insn_idx; >>> u32 curframe; >>> - /* For every reg representing a map value or allocated object pointer, >>> - * we consider the tuple of (ptr, id) for them to be unique in verifier >>> - * context and conside them to not alias each other for the purposes of >>> - * tracking lock state. >>> - */ >>> - struct { >>> - /* This can either be reg->map_ptr or reg->btf. If ptr is NULL, >>> - * there's no active lock held, and other fields have no >>> - * meaning. If non-NULL, it indicates that a lock is held and >>> - * id member has the reg->id of the register which can be >= 0. >>> - */ >>> - void *ptr; >>> - /* This will be reg->id */ >>> - u32 id; >>> - } active_lock; >> >> I would keep it as-is. >> >>> + >>> + struct bpf_active_lock active_lock; >>> bool speculative; >>> bool active_rcu_lock; >>> >>> diff --git a/include/linux/btf.h b/include/linux/btf.h >>> index 5f628f323442..8aee3f7f4248 100644 >>> --- a/include/linux/btf.h >>> +++ b/include/linux/btf.h >>> @@ -15,10 +15,10 @@ >>> #define BTF_TYPE_EMIT_ENUM(enum_val) ((void)enum_val) >>> >>> /* These need to be macros, as the expressions are used in assembler input */ >>> -#define KF_ACQUIRE (1 << 0) /* kfunc is an acquire function */ >>> -#define KF_RELEASE (1 << 1) /* kfunc is a release function */ >>> -#define KF_RET_NULL (1 << 2) /* kfunc returns a pointer that may be NULL */ >>> -#define KF_KPTR_GET (1 << 3) /* kfunc returns reference to a kptr */ >>> +#define KF_ACQUIRE (1 << 0) /* kfunc is an acquire function */ >>> +#define KF_RELEASE (1 << 1) /* kfunc is a release function */ >>> +#define KF_RET_NULL (1 << 2) /* kfunc returns a pointer that may be NULL */ >>> +#define KF_KPTR_GET (1 << 3) /* kfunc returns reference to a kptr */ >>> /* Trusted arguments are those which are guaranteed to be valid when passed to >>> * the kfunc. It is used to enforce that pointers obtained from either acquire >>> * kfuncs, or from the main kernel on a tracepoint or struct_ops callback >>> @@ -67,10 +67,11 @@ >>> * return 0; >>> * } >>> */ >>> -#define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */ >>> -#define KF_SLEEPABLE (1 << 5) /* kfunc may sleep */ >>> -#define KF_DESTRUCTIVE (1 << 6) /* kfunc performs destructive actions */ >>> -#define KF_RCU (1 << 7) /* kfunc only takes rcu pointer arguments */ >>> +#define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */ >>> +#define KF_SLEEPABLE (1 << 5) /* kfunc may sleep */ >>> +#define KF_DESTRUCTIVE (1 << 6) /* kfunc performs destructive actions */ >>> +#define KF_RCU (1 << 7) /* kfunc only takes rcu pointer arguments */ >>> +#define KF_RELEASE_NON_OWN (1 << 8) /* kfunc converts its referenced arg into non-owning ref */ >> >> No need for this flag. 
>> >>> /* >>> * Return the name of the passed struct, if exists, or halt the build if for >>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c >>> index af30c6cbd65d..e041409779c3 100644 >>> --- a/kernel/bpf/helpers.c >>> +++ b/kernel/bpf/helpers.c >>> @@ -2049,8 +2049,8 @@ BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE) >>> #endif >>> BTF_ID_FLAGS(func, bpf_obj_new_impl, KF_ACQUIRE | KF_RET_NULL) >>> BTF_ID_FLAGS(func, bpf_obj_drop_impl, KF_RELEASE) >>> -BTF_ID_FLAGS(func, bpf_list_push_front) >>> -BTF_ID_FLAGS(func, bpf_list_push_back) >>> +BTF_ID_FLAGS(func, bpf_list_push_front, KF_RELEASE | KF_RELEASE_NON_OWN) >>> +BTF_ID_FLAGS(func, bpf_list_push_back, KF_RELEASE | KF_RELEASE_NON_OWN) >> >> No need for this. >> >>> BTF_ID_FLAGS(func, bpf_list_pop_front, KF_ACQUIRE | KF_RET_NULL) >>> BTF_ID_FLAGS(func, bpf_list_pop_back, KF_ACQUIRE | KF_RET_NULL) >>> BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS) >>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c >>> index 824e2242eae5..84b0660e2a76 100644 >>> --- a/kernel/bpf/verifier.c >>> +++ b/kernel/bpf/verifier.c >>> @@ -190,6 +190,10 @@ struct bpf_verifier_stack_elem { >>> >>> static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx); >>> static int release_reference(struct bpf_verifier_env *env, int ref_obj_id); >>> +static void invalidate_non_owning_refs(struct bpf_verifier_env *env, >>> + struct bpf_active_lock *lock); >>> +static int ref_set_non_owning_lock(struct bpf_verifier_env *env, >>> + struct bpf_reg_state *reg); >>> >>> static bool bpf_map_ptr_poisoned(const struct bpf_insn_aux_data *aux) >>> { >>> @@ -931,6 +935,9 @@ static void print_verifier_state(struct bpf_verifier_env *env, >>> verbose_a("id=%d", reg->id); >>> if (reg->ref_obj_id) >>> verbose_a("ref_obj_id=%d", reg->ref_obj_id); >>> + if (reg->non_owning_ref_lock.ptr) >>> + verbose_a("non_own_id=(%p,%d)", reg->non_owning_ref_lock.ptr, >>> + reg->non_owning_ref_lock.id); >>> if (t != SCALAR_VALUE) >>> verbose_a("off=%d", reg->off); >>> if (type_is_pkt_pointer(t)) >>> @@ -4820,7 +4827,8 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env, >>> return -EACCES; >>> } >>> >>> - if (type_is_alloc(reg->type) && !reg->ref_obj_id) { >>> + if (type_is_alloc(reg->type) && !reg->ref_obj_id && >>> + !reg->non_owning_ref_lock.ptr) { >>> verbose(env, "verifier internal error: ref_obj_id for allocated object must be non-zero\n"); >>> return -EFAULT; >>> } >>> @@ -5778,9 +5786,7 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno, >>> cur->active_lock.ptr = btf; >>> cur->active_lock.id = reg->id; >>> } else { >>> - struct bpf_func_state *fstate = cur_func(env); >>> void *ptr; >>> - int i; >>> >>> if (map) >>> ptr = map; >>> @@ -5796,25 +5802,11 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno, >>> verbose(env, "bpf_spin_unlock of different lock\n"); >>> return -EINVAL; >>> } >>> - cur->active_lock.ptr = NULL; >>> - cur->active_lock.id = 0; >>> >>> - for (i = fstate->acquired_refs - 1; i >= 0; i--) { >>> - int err; >>> + invalidate_non_owning_refs(env, &cur->active_lock); >> >> +1 >> >>> - /* Complain on error because this reference state cannot >>> - * be freed before this point, as bpf_spin_lock critical >>> - * section does not allow functions that release the >>> - * allocated object immediately. 
>>> - */ >>> - if (!fstate->refs[i].release_on_unlock) >>> - continue; >>> - err = release_reference(env, fstate->refs[i].id); >>> - if (err) { >>> - verbose(env, "failed to release release_on_unlock reference"); >>> - return err; >>> - } >>> - } >>> + cur->active_lock.ptr = NULL; >>> + cur->active_lock.id = 0; >> >> +1 >> >>> } >>> return 0; >>> } >>> @@ -6273,6 +6265,23 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno, >>> return 0; >>> } >>> >>> +static struct btf_field * >>> +reg_find_field_offset(const struct bpf_reg_state *reg, s32 off, u32 fields) >>> +{ >>> + struct btf_field *field; >>> + struct btf_record *rec; >>> + >>> + rec = reg_btf_record(reg); >>> + if (!rec) >>> + return NULL; >>> + >>> + field = btf_record_find(rec, off, fields); >>> + if (!field) >>> + return NULL; >>> + >>> + return field; >>> +} >> >> Doesn't look like this helper is really necessary. >> >>> + >>> int check_func_arg_reg_off(struct bpf_verifier_env *env, >>> const struct bpf_reg_state *reg, int regno, >>> enum bpf_arg_type arg_type) >>> @@ -6294,6 +6303,18 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env, >>> */ >>> if (arg_type_is_dynptr(arg_type) && type == PTR_TO_STACK) >>> return 0; >>> + >>> + if (type == (PTR_TO_BTF_ID | MEM_ALLOC) && reg->off) { >>> + if (reg_find_field_offset(reg, reg->off, BPF_GRAPH_NODE_OR_ROOT)) >>> + return __check_ptr_off_reg(env, reg, regno, true); >>> + >>> + verbose(env, "R%d must have zero offset when passed to release func\n", >>> + regno); >>> + verbose(env, "No graph node or root found at R%d type:%s off:%d\n", regno, >>> + kernel_type_name(reg->btf, reg->btf_id), reg->off); >>> + return -EINVAL; >>> + } >> >> This bit is only necessary if we mark list_push as KF_RELEASE. >> Just don't add this mark and drop above. >> >>> + >>> /* Doing check_ptr_off_reg check for the offset will catch this >>> * because fixed_off_ok is false, but checking here allows us >>> * to give the user a better error message. >>> @@ -7055,6 +7076,20 @@ static int release_reference(struct bpf_verifier_env *env, >>> return 0; >>> } >>> >>> +static void invalidate_non_owning_refs(struct bpf_verifier_env *env, >>> + struct bpf_active_lock *lock) >>> +{ >>> + struct bpf_func_state *unused; >>> + struct bpf_reg_state *reg; >>> + >>> + bpf_for_each_reg_in_vstate(env->cur_state, unused, reg, ({ >>> + if (reg->non_owning_ref_lock.ptr && >>> + reg->non_owning_ref_lock.ptr == lock->ptr && >>> + reg->non_owning_ref_lock.id == lock->id) >> >> I think the lock.ptr == lock->ptr comparison is unnecessary to invalidate things. >> We're under active spin_lock here. All regs were checked earlier and id keeps incrementing. >> So we can just do 'u32 non_own_ref_obj_id'. >> >>> + __mark_reg_unknown(env, reg); >>> + })); >>> +} >>> + >>> static void clear_caller_saved_regs(struct bpf_verifier_env *env, >>> struct bpf_reg_state *regs) >>> { >>> @@ -8266,6 +8301,11 @@ static bool is_kfunc_release(struct bpf_kfunc_call_arg_meta *meta) >>> return meta->kfunc_flags & KF_RELEASE; >>> } >>> >>> +static bool is_kfunc_release_non_own(struct bpf_kfunc_call_arg_meta *meta) >>> +{ >>> + return meta->kfunc_flags & KF_RELEASE_NON_OWN; >>> +} >>> + >> >> No need.
>> >>> static bool is_kfunc_trusted_args(struct bpf_kfunc_call_arg_meta *meta) >>> { >>> return meta->kfunc_flags & KF_TRUSTED_ARGS; >>> @@ -8651,38 +8691,55 @@ static int process_kf_arg_ptr_to_kptr(struct bpf_verifier_env *env, >>> return 0; >>> } >>> >>> -static int ref_set_release_on_unlock(struct bpf_verifier_env *env, u32 ref_obj_id) >>> +static int ref_set_non_owning_lock(struct bpf_verifier_env *env, struct bpf_reg_state *reg) >>> { >>> - struct bpf_func_state *state = cur_func(env); >>> + struct bpf_verifier_state *state = env->cur_state; >>> + >>> + if (!state->active_lock.ptr) { >>> + verbose(env, "verifier internal error: ref_set_non_owning_lock w/o active lock\n"); >>> + return -EFAULT; >>> + } >>> + >>> + if (reg->non_owning_ref_lock.ptr) { >>> + verbose(env, "verifier internal error: non_owning_ref_lock already set\n"); >>> + return -EFAULT; >>> + } >>> + >>> + reg->non_owning_ref_lock.id = state->active_lock.id; >>> + reg->non_owning_ref_lock.ptr = state->active_lock.ptr; >>> + return 0; >>> +} >>> + >>> +static int ref_convert_owning_non_owning(struct bpf_verifier_env *env, u32 ref_obj_id) >>> +{ >>> + struct bpf_func_state *state, *unused; >>> struct bpf_reg_state *reg; >>> int i; >>> >>> - /* bpf_spin_lock only allows calling list_push and list_pop, no BPF >>> - * subprogs, no global functions. This means that the references would >>> - * not be released inside the critical section but they may be added to >>> - * the reference state, and the acquired_refs are never copied out for a >>> - * different frame as BPF to BPF calls don't work in bpf_spin_lock >>> - * critical sections. >>> - */ >>> + state = cur_func(env); >>> + >>> if (!ref_obj_id) { >>> - verbose(env, "verifier internal error: ref_obj_id is zero for release_on_unlock\n"); >>> + verbose(env, "verifier internal error: ref_obj_id is zero for " >>> + "owning -> non-owning conversion\n"); >>> return -EFAULT; >>> } >>> + >>> for (i = 0; i < state->acquired_refs; i++) { >>> - if (state->refs[i].id == ref_obj_id) { >>> - if (state->refs[i].release_on_unlock) { >>> - verbose(env, "verifier internal error: expected false release_on_unlock"); >>> - return -EFAULT; >>> + if (state->refs[i].id != ref_obj_id) >>> + continue; >>> + >>> + /* Clear ref_obj_id here so release_reference doesn't clobber >>> + * the whole reg >>> + */ >>> + bpf_for_each_reg_in_vstate(env->cur_state, unused, reg, ({ >>> + if (reg->ref_obj_id == ref_obj_id) { >>> + reg->ref_obj_id = 0; >>> + ref_set_non_owning_lock(env, reg); >> >> +1 except ref_set_... name doesn't quite fit. reg_set_... is more accurate, no? >> and probably reg_set_non_own_ref_obj_id() ? >> Or just open code it? 
>> >>> } >>> - state->refs[i].release_on_unlock = true; >>> - /* Now mark everyone sharing same ref_obj_id as untrusted */ >>> - bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({ >>> - if (reg->ref_obj_id == ref_obj_id) >>> - reg->type |= PTR_UNTRUSTED; >>> - })); >>> - return 0; >>> - } >>> + })); >>> + return 0; >>> } >>> + >>> verbose(env, "verifier internal error: ref state missing for ref_obj_id\n"); >>> return -EFAULT; >>> } >>> @@ -8817,7 +8874,6 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env, >>> { >>> const struct btf_type *et, *t; >>> struct btf_field *field; >>> - struct btf_record *rec; >>> u32 list_node_off; >>> >>> if (meta->btf != btf_vmlinux || >>> @@ -8834,9 +8890,8 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env, >>> return -EINVAL; >>> } >>> >>> - rec = reg_btf_record(reg); >>> list_node_off = reg->off + reg->var_off.value; >>> - field = btf_record_find(rec, list_node_off, BPF_LIST_NODE); >>> + field = reg_find_field_offset(reg, list_node_off, BPF_LIST_NODE); >>> if (!field || field->offset != list_node_off) { >>> verbose(env, "bpf_list_node not found at offset=%u\n", list_node_off); >>> return -EINVAL; >>> @@ -8861,8 +8916,8 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env, >>> btf_name_by_offset(field->list_head.btf, et->name_off)); >>> return -EINVAL; >>> } >>> - /* Set arg#1 for expiration after unlock */ >>> - return ref_set_release_on_unlock(env, reg->ref_obj_id); >>> + >>> + return 0; >> >> and here we come to the main point. >> Can you just call >> ref_convert_owning_non_owning(env, reg->ref_obj_id) and release_reference() here? >> Everything will be so much simpler, no? >> >>> } >>> >>> static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta) >>> @@ -9132,11 +9187,11 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, >>> int *insn_idx_p) >>> { >>> const struct btf_type *t, *func, *func_proto, *ptr_type; >>> + u32 i, nargs, func_id, ptr_type_id, release_ref_obj_id; >>> struct bpf_reg_state *regs = cur_regs(env); >>> const char *func_name, *ptr_type_name; >>> bool sleepable, rcu_lock, rcu_unlock; >>> struct bpf_kfunc_call_arg_meta meta; >>> - u32 i, nargs, func_id, ptr_type_id; >>> int err, insn_idx = *insn_idx_p; >>> const struct btf_param *args; >>> const struct btf_type *ret_t; >>> @@ -9223,7 +9278,18 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, >>> * PTR_TO_BTF_ID in bpf_kfunc_arg_meta, do the release now. >>> */ >>> if (meta.release_regno) { >>> - err = release_reference(env, regs[meta.release_regno].ref_obj_id); >>> + err = 0; >>> + release_ref_obj_id = regs[meta.release_regno].ref_obj_id; >>> + >>> + if (is_kfunc_release_non_own(&meta)) >>> + err = ref_convert_owning_non_owning(env, release_ref_obj_id); >>> + if (err) { >>> + verbose(env, "kfunc %s#%d conversion of owning ref to non-owning failed\n", >>> + func_name, func_id); >>> + return err; >>> + } >>> + >>> + err = release_reference(env, release_ref_obj_id); >> >> and this bit won't be needed. >> and no need to guess in patch 1 which arg has to be released and converted to non_own. >> >>> if (err) { >>> verbose(env, "kfunc %s#%d reference has not been acquired before\n", >>> func_name, func_id); >>> -- >>> 2.30.2 >>>
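To make the semantics under discussion concrete, here is a minimal sketch of a program that the verifier would accept with this patch applied. It is modeled loosely on the linked_list selftests; the node type, the lock/list declarations, and the private()/__contains() conventions are illustrative assumptions, not something defined by this patch:

  struct node_data {
          long data;
          struct bpf_list_node node;
  };

  private(A) struct bpf_spin_lock lock;
  private(A) struct bpf_list_head head __contains(node_data, node);

  SEC("tc")
  long push_and_write(void *ctx)
  {
          struct node_data *n;

          n = bpf_obj_new(typeof(*n)); /* owning reference */
          if (!n)
                  return 0;

          bpf_spin_lock(&lock);
          bpf_list_push_back(&head, &n->node); /* n becomes a non-owning reference */
          n->data = 42;  /* allowed after this patch; rejected before it */
          /* pushing n to a list again here would be rejected: only an
           * owning reference may be pushed
           */
          bpf_spin_unlock(&lock); /* all matching non-owning refs invalidated */
          /* any further use of n is rejected by the verifier */
          return 0;
  }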
On Tue, Jan 17, 2023 at 8:07 AM Dave Marchevsky <davemarchevsky@meta.com> wrote: > > How about I try to make the names better in v3 instead of removing the kfunc > flags entirely? If you're still opposed after that, I will instead add helpers > with comments like: Please review what I did with this patch: https://patchwork.kernel.org/project/netdevbpf/patch/20221230010738.45277-1-alexei.starovoitov@gmail.com/
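For comparison, the flag-free structure Alexei sketches inline above - doing the owning to non-owning conversion where the bpf_list_node argument is processed, rather than via KF_RELEASE | KF_RELEASE_NON_OWN - would end process_kf_arg_ptr_to_list_node() roughly as follows. This is a sketch of the inline suggestion only, not necessarily what the linked patch does:

  /* hypothetical tail of process_kf_arg_ptr_to_list_node() */
  u32 release_ref_obj_id = reg->ref_obj_id;
  int err;

  /* Convert first: this 0s out ref_obj_id on every reg sharing it,
   * so those regs survive the release below instead of being clobbered.
   */
  err = ref_convert_owning_non_owning(env, release_ref_obj_id);
  if (err)
          return err;

  /* Then release the bpf_reference_state itself */
  return release_reference(env, release_ref_obj_id);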
This patch introduces non-owning reference semantics to the verifier,
specifically linked_list API kfunc handling. release_on_unlock logic for
refs is refactored - with small functional changes - to implement these
semantics, and bpf_list_push_{front,back} are migrated to use them.

When a list node is pushed to a list, the program still has a pointer to
the node:

  n = bpf_obj_new(typeof(*n));

  bpf_spin_lock(&l);
  bpf_list_push_back(&l, n);
  /* n still points to the just-added node */
  bpf_spin_unlock(&l);

What the verifier considers n to be after the push, and thus what can be
done with n, are changed by this patch.

Common properties both before/after this patch:

* After push, n is only a valid reference to the node until end of
  critical section
* After push, n cannot be pushed to any list
* After push, the program can read the node's fields using n

Before:

* After push, n retains the ref_obj_id which it received on
  bpf_obj_new, but the associated bpf_reference_state's
  release_on_unlock field is set to true
  * release_on_unlock field and associated logic is used to implement
    "n is only a valid ref until end of critical section"
* After push, n cannot be written to, the node must be removed from
  the list before writing to its fields
* After push, n is marked PTR_UNTRUSTED

After:

* After push, n's ref is released and ref_obj_id set to 0. The
  bpf_reg_state's non_owning_ref_lock struct is populated with the
  currently active lock
  * non_owning_ref_lock and logic is used to implement "n is only a
    valid ref until end of critical section"
* n can be written to (except for special fields e.g. bpf_list_node,
  timer, ...)
* No special type flag is added to n after push

Summary of specific implementation changes to achieve the above:

* release_on_unlock field, ref_set_release_on_unlock helper, and logic
  to "release on unlock" based on that field are removed

* The anonymous active_lock struct used by bpf_verifier_state is
  pulled out into a named struct bpf_active_lock.

* A non_owning_ref_lock field of type bpf_active_lock is added to
  bpf_reg_state's PTR_TO_BTF_ID union

* Helpers are added to use non_owning_ref_lock to implement non-owning
  ref semantics as described above
  * invalidate_non_owning_refs - helper to clobber all non-owning refs
    matching a particular bpf_active_lock identity. Replaces
    release_on_unlock logic in process_spin_lock.
  * ref_set_non_owning_lock - set non_owning_ref_lock for a reg based
    on current verifier state
  * ref_convert_owning_non_owning - convert owning reference w/
    specified ref_obj_id to non-owning references. Setup
    non_owning_ref_lock for each reg with that ref_obj_id and 0 out
    its ref_obj_id

* New KF_RELEASE_NON_OWN flag is added, to be used in conjunction with
  KF_RELEASE to indicate that the release arg reg should be converted
  to non-owning ref
  * Plain KF_RELEASE would clobber all regs with ref_obj_id matching
    the release arg reg's. KF_RELEASE_NON_OWN's logic triggers first -
    doing ref_convert_owning_non_owning on the ref first, which
    prevents the regs from being clobbered by 0ing out their
    ref_obj_ids. The bpf_reference_state itself is still released via
    release_reference as a result of the KF_RELEASE flag.
  * KF_RELEASE | KF_RELEASE_NON_OWN are added to
    bpf_list_push_{front,back}

After these changes, linked_list's "release on unlock" logic continues
to function as before, except for the semantic differences noted above.
The patch immediately following this one makes minor changes to
linked_list selftests to account for the differing behavior.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 include/linux/bpf.h          |   1 +
 include/linux/bpf_verifier.h |  39 ++++-----
 include/linux/btf.h          |  17 ++--
 kernel/bpf/helpers.c         |   4 +-
 kernel/bpf/verifier.c        | 164 ++++++++++++++++++++++++-----------
 5 files changed, 146 insertions(+), 79 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 3de24cfb7a3d..f71571bf6adc 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -180,6 +180,7 @@ enum btf_field_type {
 	BPF_KPTR       = BPF_KPTR_UNREF | BPF_KPTR_REF,
 	BPF_LIST_HEAD  = (1 << 4),
 	BPF_LIST_NODE  = (1 << 5),
+	BPF_GRAPH_NODE_OR_ROOT = BPF_LIST_NODE | BPF_LIST_HEAD,
 };
 
 struct btf_field_kptr {
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 53d175cbaa02..cb417ffbbb84 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -43,6 +43,22 @@ enum bpf_reg_liveness {
 	REG_LIVE_DONE = 0x8, /* liveness won't be updating this register anymore */
 };
 
+/* For every reg representing a map value or allocated object pointer,
+ * we consider the tuple of (ptr, id) for them to be unique in verifier
+ * context and consider them to not alias each other for the purposes of
+ * tracking lock state.
+ */
+struct bpf_active_lock {
+	/* This can either be reg->map_ptr or reg->btf. If ptr is NULL,
+	 * there's no active lock held, and other fields have no
+	 * meaning. If non-NULL, it indicates that a lock is held and
+	 * id member has the reg->id of the register which can be >= 0.
+	 */
+	void *ptr;
+	/* This will be reg->id */
+	u32 id;
+};
+
 struct bpf_reg_state {
 	/* Ordering of fields matters. See states_equal() */
 	enum bpf_reg_type type;
@@ -68,6 +84,7 @@ struct bpf_reg_state {
 		struct {
 			struct btf *btf;
 			u32 btf_id;
+			struct bpf_active_lock non_owning_ref_lock;
 		};
 
 		u32 mem_size; /* for PTR_TO_MEM | PTR_TO_MEM_OR_NULL */
@@ -223,11 +240,6 @@ struct bpf_reference_state {
 	 * exiting a callback function.
 	 */
 	int callback_ref;
-	/* Mark the reference state to release the registers sharing the same id
-	 * on bpf_spin_unlock (for nodes that we will lose ownership to but are
-	 * safe to access inside the critical section).
-	 */
-	bool release_on_unlock;
 };
 
 /* state of the program:
@@ -328,21 +340,8 @@ struct bpf_verifier_state {
 	u32 branches;
 	u32 insn_idx;
 	u32 curframe;
-	/* For every reg representing a map value or allocated object pointer,
-	 * we consider the tuple of (ptr, id) for them to be unique in verifier
-	 * context and conside them to not alias each other for the purposes of
-	 * tracking lock state.
-	 */
-	struct {
-		/* This can either be reg->map_ptr or reg->btf. If ptr is NULL,
-		 * there's no active lock held, and other fields have no
-		 * meaning. If non-NULL, it indicates that a lock is held and
-		 * id member has the reg->id of the register which can be >= 0.
-		 */
-		void *ptr;
-		/* This will be reg->id */
-		u32 id;
-	} active_lock;
+
+	struct bpf_active_lock active_lock;
 	bool speculative;
 	bool active_rcu_lock;
 
diff --git a/include/linux/btf.h b/include/linux/btf.h
index 5f628f323442..8aee3f7f4248 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -15,10 +15,10 @@
 #define BTF_TYPE_EMIT_ENUM(enum_val) ((void)enum_val)
 
 /* These need to be macros, as the expressions are used in assembler input */
-#define KF_ACQUIRE	(1 << 0) /* kfunc is an acquire function */
-#define KF_RELEASE	(1 << 1) /* kfunc is a release function */
-#define KF_RET_NULL	(1 << 2) /* kfunc returns a pointer that may be NULL */
-#define KF_KPTR_GET	(1 << 3) /* kfunc returns reference to a kptr */
+#define KF_ACQUIRE		(1 << 0) /* kfunc is an acquire function */
+#define KF_RELEASE		(1 << 1) /* kfunc is a release function */
+#define KF_RET_NULL		(1 << 2) /* kfunc returns a pointer that may be NULL */
+#define KF_KPTR_GET		(1 << 3) /* kfunc returns reference to a kptr */
 /* Trusted arguments are those which are guaranteed to be valid when passed to
  * the kfunc. It is used to enforce that pointers obtained from either acquire
  * kfuncs, or from the main kernel on a tracepoint or struct_ops callback
@@ -67,10 +67,11 @@
  *	return 0;
  * }
  */
-#define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */
-#define KF_SLEEPABLE    (1 << 5) /* kfunc may sleep */
-#define KF_DESTRUCTIVE  (1 << 6) /* kfunc performs destructive actions */
-#define KF_RCU          (1 << 7) /* kfunc only takes rcu pointer arguments */
+#define KF_TRUSTED_ARGS	(1 << 4) /* kfunc only takes trusted pointer arguments */
+#define KF_SLEEPABLE		(1 << 5) /* kfunc may sleep */
+#define KF_DESTRUCTIVE		(1 << 6) /* kfunc performs destructive actions */
+#define KF_RCU			(1 << 7) /* kfunc only takes rcu pointer arguments */
+#define KF_RELEASE_NON_OWN	(1 << 8) /* kfunc converts its referenced arg into non-owning ref */
 
 /*
  * Return the name of the passed struct, if exists, or halt the build if for
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index af30c6cbd65d..e041409779c3 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2049,8 +2049,8 @@ BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE)
 #endif
 BTF_ID_FLAGS(func, bpf_obj_new_impl, KF_ACQUIRE | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_obj_drop_impl, KF_RELEASE)
-BTF_ID_FLAGS(func, bpf_list_push_front)
-BTF_ID_FLAGS(func, bpf_list_push_back)
+BTF_ID_FLAGS(func, bpf_list_push_front, KF_RELEASE | KF_RELEASE_NON_OWN)
+BTF_ID_FLAGS(func, bpf_list_push_back, KF_RELEASE | KF_RELEASE_NON_OWN)
 BTF_ID_FLAGS(func, bpf_list_pop_front, KF_ACQUIRE | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_list_pop_back, KF_ACQUIRE | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 824e2242eae5..84b0660e2a76 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -190,6 +190,10 @@ struct bpf_verifier_stack_elem {
 
 static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx);
 static int release_reference(struct bpf_verifier_env *env, int ref_obj_id);
+static void invalidate_non_owning_refs(struct bpf_verifier_env *env,
+				       struct bpf_active_lock *lock);
+static int ref_set_non_owning_lock(struct bpf_verifier_env *env,
+				   struct bpf_reg_state *reg);
 
 static bool bpf_map_ptr_poisoned(const struct bpf_insn_aux_data *aux)
 {
@@ -931,6 +935,9 @@ static void print_verifier_state(struct bpf_verifier_env *env,
 			verbose_a("id=%d", reg->id);
 		if (reg->ref_obj_id)
 			verbose_a("ref_obj_id=%d", reg->ref_obj_id);
+		if (reg->non_owning_ref_lock.ptr)
+			verbose_a("non_own_id=(%p,%d)", reg->non_owning_ref_lock.ptr,
+				  reg->non_owning_ref_lock.id);
 		if (t != SCALAR_VALUE)
 			verbose_a("off=%d", reg->off);
 		if (type_is_pkt_pointer(t))
@@ -4820,7 +4827,8 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
 		return -EACCES;
 	}
 
-	if (type_is_alloc(reg->type) && !reg->ref_obj_id) {
+	if (type_is_alloc(reg->type) && !reg->ref_obj_id &&
+	    !reg->non_owning_ref_lock.ptr) {
 		verbose(env, "verifier internal error: ref_obj_id for allocated object must be non-zero\n");
 		return -EFAULT;
 	}
@@ -5778,9 +5786,7 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
 			cur->active_lock.ptr = btf;
 		cur->active_lock.id = reg->id;
 	} else {
-		struct bpf_func_state *fstate = cur_func(env);
 		void *ptr;
-		int i;
 
 		if (map)
 			ptr = map;
@@ -5796,25 +5802,11 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
 			verbose(env, "bpf_spin_unlock of different lock\n");
 			return -EINVAL;
 		}
-		cur->active_lock.ptr = NULL;
-		cur->active_lock.id = 0;
 
-		for (i = fstate->acquired_refs - 1; i >= 0; i--) {
-			int err;
+		invalidate_non_owning_refs(env, &cur->active_lock);
 
-			/* Complain on error because this reference state cannot
-			 * be freed before this point, as bpf_spin_lock critical
-			 * section does not allow functions that release the
-			 * allocated object immediately.
-			 */
-			if (!fstate->refs[i].release_on_unlock)
-				continue;
-			err = release_reference(env, fstate->refs[i].id);
-			if (err) {
-				verbose(env, "failed to release release_on_unlock reference");
-				return err;
-			}
-		}
+		cur->active_lock.ptr = NULL;
+		cur->active_lock.id = 0;
 	}
 	return 0;
 }
@@ -6273,6 +6265,23 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
 	return 0;
 }
 
+static struct btf_field *
+reg_find_field_offset(const struct bpf_reg_state *reg, s32 off, u32 fields)
+{
+	struct btf_field *field;
+	struct btf_record *rec;
+
+	rec = reg_btf_record(reg);
+	if (!rec)
+		return NULL;
+
+	field = btf_record_find(rec, off, fields);
+	if (!field)
+		return NULL;
+
+	return field;
+}
+
 int check_func_arg_reg_off(struct bpf_verifier_env *env,
 			   const struct bpf_reg_state *reg, int regno,
 			   enum bpf_arg_type arg_type)
@@ -6294,6 +6303,18 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
 	 */
 	if (arg_type_is_dynptr(arg_type) && type == PTR_TO_STACK)
 		return 0;
+
+	if (type == (PTR_TO_BTF_ID | MEM_ALLOC) && reg->off) {
+		if (reg_find_field_offset(reg, reg->off, BPF_GRAPH_NODE_OR_ROOT))
+			return __check_ptr_off_reg(env, reg, regno, true);
+
+		verbose(env, "R%d must have zero offset when passed to release func\n",
+			regno);
+		verbose(env, "No graph node or root found at R%d type:%s off:%d\n", regno,
+			kernel_type_name(reg->btf, reg->btf_id), reg->off);
+		return -EINVAL;
+	}
+
 	/* Doing check_ptr_off_reg check for the offset will catch this
 	 * because fixed_off_ok is false, but checking here allows us
 	 * to give the user a better error message.
@@ -7055,6 +7076,20 @@ static int release_reference(struct bpf_verifier_env *env,
 	return 0;
 }
 
+static void invalidate_non_owning_refs(struct bpf_verifier_env *env,
+				       struct bpf_active_lock *lock)
+{
+	struct bpf_func_state *unused;
+	struct bpf_reg_state *reg;
+
+	bpf_for_each_reg_in_vstate(env->cur_state, unused, reg, ({
+		if (reg->non_owning_ref_lock.ptr &&
+		    reg->non_owning_ref_lock.ptr == lock->ptr &&
+		    reg->non_owning_ref_lock.id == lock->id)
+			__mark_reg_unknown(env, reg);
+	}));
+}
+
 static void clear_caller_saved_regs(struct bpf_verifier_env *env,
 				    struct bpf_reg_state *regs)
 {
@@ -8266,6 +8301,11 @@ static bool is_kfunc_release(struct bpf_kfunc_call_arg_meta *meta)
 	return meta->kfunc_flags & KF_RELEASE;
 }
 
+static bool is_kfunc_release_non_own(struct bpf_kfunc_call_arg_meta *meta)
+{
+	return meta->kfunc_flags & KF_RELEASE_NON_OWN;
+}
+
 static bool is_kfunc_trusted_args(struct bpf_kfunc_call_arg_meta *meta)
 {
 	return meta->kfunc_flags & KF_TRUSTED_ARGS;
@@ -8651,38 +8691,55 @@ static int process_kf_arg_ptr_to_kptr(struct bpf_verifier_env *env,
 	return 0;
 }
 
-static int ref_set_release_on_unlock(struct bpf_verifier_env *env, u32 ref_obj_id)
+static int ref_set_non_owning_lock(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
 {
-	struct bpf_func_state *state = cur_func(env);
+	struct bpf_verifier_state *state = env->cur_state;
+
+	if (!state->active_lock.ptr) {
+		verbose(env, "verifier internal error: ref_set_non_owning_lock w/o active lock\n");
+		return -EFAULT;
+	}
+
+	if (reg->non_owning_ref_lock.ptr) {
+		verbose(env, "verifier internal error: non_owning_ref_lock already set\n");
+		return -EFAULT;
+	}
+
+	reg->non_owning_ref_lock.id = state->active_lock.id;
+	reg->non_owning_ref_lock.ptr = state->active_lock.ptr;
+	return 0;
+}
+
+static int ref_convert_owning_non_owning(struct bpf_verifier_env *env, u32 ref_obj_id)
+{
+	struct bpf_func_state *state, *unused;
 	struct bpf_reg_state *reg;
 	int i;
 
-	/* bpf_spin_lock only allows calling list_push and list_pop, no BPF
-	 * subprogs, no global functions. This means that the references would
-	 * not be released inside the critical section but they may be added to
-	 * the reference state, and the acquired_refs are never copied out for a
-	 * different frame as BPF to BPF calls don't work in bpf_spin_lock
-	 * critical sections.
-	 */
+	state = cur_func(env);
+
 	if (!ref_obj_id) {
-		verbose(env, "verifier internal error: ref_obj_id is zero for release_on_unlock\n");
+		verbose(env, "verifier internal error: ref_obj_id is zero for "
+			     "owning -> non-owning conversion\n");
 		return -EFAULT;
 	}
+
 	for (i = 0; i < state->acquired_refs; i++) {
-		if (state->refs[i].id == ref_obj_id) {
-			if (state->refs[i].release_on_unlock) {
-				verbose(env, "verifier internal error: expected false release_on_unlock");
-				return -EFAULT;
+		if (state->refs[i].id != ref_obj_id)
+			continue;
+
+		/* Clear ref_obj_id here so release_reference doesn't clobber
+		 * the whole reg
+		 */
+		bpf_for_each_reg_in_vstate(env->cur_state, unused, reg, ({
+			if (reg->ref_obj_id == ref_obj_id) {
+				reg->ref_obj_id = 0;
+				ref_set_non_owning_lock(env, reg);
 			}
-			state->refs[i].release_on_unlock = true;
-			/* Now mark everyone sharing same ref_obj_id as untrusted */
-			bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
-				if (reg->ref_obj_id == ref_obj_id)
-					reg->type |= PTR_UNTRUSTED;
-			}));
-			return 0;
-		}
+		}));
+		return 0;
 	}
+
 	verbose(env, "verifier internal error: ref state missing for ref_obj_id\n");
 	return -EFAULT;
 }
@@ -8817,7 +8874,6 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
 {
 	const struct btf_type *et, *t;
 	struct btf_field *field;
-	struct btf_record *rec;
 	u32 list_node_off;
 
 	if (meta->btf != btf_vmlinux ||
@@ -8834,9 +8890,8 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
 		return -EINVAL;
 	}
 
-	rec = reg_btf_record(reg);
 	list_node_off = reg->off + reg->var_off.value;
-	field = btf_record_find(rec, list_node_off, BPF_LIST_NODE);
+	field = reg_find_field_offset(reg, list_node_off, BPF_LIST_NODE);
 	if (!field || field->offset != list_node_off) {
 		verbose(env, "bpf_list_node not found at offset=%u\n", list_node_off);
 		return -EINVAL;
@@ -8861,8 +8916,8 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
 			btf_name_by_offset(field->list_head.btf, et->name_off));
 		return -EINVAL;
 	}
-	/* Set arg#1 for expiration after unlock */
-	return ref_set_release_on_unlock(env, reg->ref_obj_id);
+
+	return 0;
 }
 
 static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta)
@@ -9132,11 +9187,11 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			    int *insn_idx_p)
 {
 	const struct btf_type *t, *func, *func_proto, *ptr_type;
+	u32 i, nargs, func_id, ptr_type_id, release_ref_obj_id;
 	struct bpf_reg_state *regs = cur_regs(env);
 	const char *func_name, *ptr_type_name;
 	bool sleepable, rcu_lock, rcu_unlock;
 	struct bpf_kfunc_call_arg_meta meta;
-	u32 i, nargs, func_id, ptr_type_id;
 	int err, insn_idx = *insn_idx_p;
 	const struct btf_param *args;
 	const struct btf_type *ret_t;
@@ -9223,7 +9278,18 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 	 * PTR_TO_BTF_ID in bpf_kfunc_arg_meta, do the release now.
 	 */
 	if (meta.release_regno) {
-		err = release_reference(env, regs[meta.release_regno].ref_obj_id);
+		err = 0;
+		release_ref_obj_id = regs[meta.release_regno].ref_obj_id;
+
+		if (is_kfunc_release_non_own(&meta))
+			err = ref_convert_owning_non_owning(env, release_ref_obj_id);
+		if (err) {
+			verbose(env, "kfunc %s#%d conversion of owning ref to non-owning failed\n",
+				func_name, func_id);
+			return err;
+		}
+
+		err = release_reference(env, release_ref_obj_id);
 		if (err) {
 			verbose(env, "kfunc %s#%d reference has not been acquired before\n",
 				func_name, func_id);
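Finally, on the states_equal()/regsafe() concern raised above: if the per-reg state is reduced to a plain u32 as suggested, the comparison would presumably mirror the existing active_lock.id handling that Alexei quotes. A hedged sketch, assuming a hypothetical 'u32 non_own_ref_obj_id' field in bpf_reg_state and the existing check_ids() helper:

  /* sketch of a check regsafe() could apply to PTR_TO_BTF_ID | MEM_ALLOC regs */
  static bool non_own_ref_safe(const struct bpf_reg_state *rold,
                               const struct bpf_reg_state *rcur,
                               struct bpf_id_pair *idmap)
  {
          /* both states must agree on whether the reg is a non-owning ref */
          if (!!rold->non_own_ref_obj_id != !!rcur->non_own_ref_obj_id)
                  return false;
          if (!rold->non_own_ref_obj_id)
                  return true;
          /* ids only match under a consistent old->cur id mapping */
          return check_ids(rold->non_own_ref_obj_id,
                           rcur->non_own_ref_obj_id, idmap);
  }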