Message ID: 20221230010738.45277-1-alexei.starovoitov@gmail.com
State: New, archived
Delegated to: BPF
Series: [bpf-next,1/2] bpf: Migrate release_on_unlock logic to non-owning ref semantics
On Thu, Dec 29, 2022 at 5:07 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> @@ -11959,7 +11956,10 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
>  		dst_reg->type = PTR_TO_MAP_VALUE;
>  		dst_reg->off = aux->map_off;
>  		WARN_ON_ONCE(map->max_entries != 1);
> -		/* We want reg->id to be same (0) as map_value is not distinct */
> +		/* map->id is positive s32. Negative map->id will not collide with env->id_gen.
> +		 * This lets us track active_lock state as single u32 instead of pair { map|btf, id }
> +		 */
> +		dst_reg->id = -map->id;

Here is what I meant in my earlier reply to Dave's patches 1 and 2.
imo this is a simpler implementation of the same logic.
The only tricky part is the bit above, which is necessary
to use a single u32 in bpf_reg_state to track active_lock.
We can follow up with a cleanup of the active_lock comparison
in other places in the verifier.
Instead of:

	if (map)
		cur->active_lock.ptr = map;
	else
		cur->active_lock.ptr = btf;
	cur->active_lock.id = reg->id;

it will be:

	cur->active_lock_id = reg->id;

Another cleanup is needed to compare active_lock_id properly
in regsafe().

Thoughts?
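[A sketch of what that regsafe()/states_equal() cleanup could look like; the
helper name active_lock_safe() and the state-level active_lock_id field are
assumptions here, not part of any posted patch:]

	static bool active_lock_safe(struct bpf_verifier_state *old,
				     struct bpf_verifier_state *cur,
				     struct bpf_id_pair *idmap)
	{
		/* both states are outside a critical section */
		if (!old->active_lock_id && !cur->active_lock_id)
			return true;
		/* one state holds a lock, the other does not: not equivalent */
		if (!old->active_lock_id || !cur->active_lock_id)
			return false;
		/* both hold a lock: the two ids must correspond under idmap,
		 * exactly like any other reg->id comparison
		 */
		return check_ids(old->active_lock_id, cur->active_lock_id, idmap);
	}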
On 12/29/22 8:13 PM, Alexei Starovoitov wrote:
> On Thu, Dec 29, 2022 at 5:07 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>> @@ -11959,7 +11956,10 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
>>  		dst_reg->type = PTR_TO_MAP_VALUE;
>>  		dst_reg->off = aux->map_off;
>>  		WARN_ON_ONCE(map->max_entries != 1);
>> -		/* We want reg->id to be same (0) as map_value is not distinct */
>> +		/* map->id is positive s32. Negative map->id will not collide with env->id_gen.
>> +		 * This lets us track active_lock state as single u32 instead of pair { map|btf, id }
>> +		 */
>> +		dst_reg->id = -map->id;
>
> Here is what I meant in my earlier reply to Dave's patches 1 and 2.
> imo this is a simpler implementation of the same logic.
> The only tricky part is the bit above, which is necessary
> to use a single u32 in bpf_reg_state to track active_lock.
> We can follow up with a cleanup of the active_lock comparison
> in other places in the verifier.
> Instead of:
> 	if (map)
> 		cur->active_lock.ptr = map;
> 	else
> 		cur->active_lock.ptr = btf;
> 	cur->active_lock.id = reg->id;
> it will be:
> 	cur->active_lock_id = reg->id;
>
> Another cleanup is needed to compare active_lock_id properly
> in regsafe().
>
> Thoughts?

Before Kumar's commit d0d78c1df9b1d ("bpf: Allow locking bpf_spin_lock global
variables"), setting of dst_reg->id in that code path was guarded by a "does
the map value have a spin_lock" check:

	if (btf_record_has_field(map->record, BPF_SPIN_LOCK))
		dst_reg->id = ++env->id_gen;

Should we do that here as well? I'm not sure what the implications of
overzealous setting of dst_reg->id are.

More generally: I see why doing -map->id will not overlap with env->id_gen
in practice. For PTR_TO_BTF_ID, I'm not sure that we'll always have a nonzero
reg->id here. check_kfunc_call has

	if (reg_may_point_to_spin_lock(&regs[BPF_REG_0]) && !regs[BPF_REG_0].id)
		regs[BPF_REG_0].id = ++env->id_gen;

when checking the retval, but there's no such equivalent in check_helper_call,
which instead does

	if (type_may_be_null(regs[BPF_REG_0].type))
		regs[BPF_REG_0].id = ++env->id_gen;

and similar in a few paths.

Should we ensure that "any PTR_TO_BTF_ID reg that has a spin_lock must have a
nonzero id"? If we don't, a helper which returns a PTR_TO_BTF_ID w/ spin_lock
that isn't MAYBE_NULL will break this whole scheme.

Some future-proofing concerns:

Kumar's commit above mentions this in the summary:

	Note that this can be extended in the future to also remember offset
	used for locking, so that we can introduce multiple bpf_spin_lock fields
	in the same allocation.

When the above happens we'll need to go back to a struct - or some packed
int - to describe "lock identity" anyway.

IIRC in the long term we wanted to stop overloading reg->id's meaning, with
the ideal end state being reg->id used only for "null id" purposes. If so,
this is moving us back towards the overloaded state.

Happy to take this over and respin once we're done discussing, unless you
want to do it.
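[One way to enforce the nonzero-id invariant raised above would be to mirror
check_kfunc_call's retval handling in check_helper_call; a sketch, where the
exact placement in check_helper_call's retval path is an assumption:]

	/* ensure any helper-returned pointer that may contain a bpf_spin_lock
	 * carries a nonzero id, mirroring check_kfunc_call
	 */
	if (reg_may_point_to_spin_lock(&regs[BPF_REG_0]) && !regs[BPF_REG_0].id)
		regs[BPF_REG_0].id = ++env->id_gen;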
On Tue, Jan 17, 2023 at 05:42:39PM -0500, Dave Marchevsky wrote:
> On 12/29/22 8:13 PM, Alexei Starovoitov wrote:
> > On Thu, Dec 29, 2022 at 5:07 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> >> @@ -11959,7 +11956,10 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
> >>  		dst_reg->type = PTR_TO_MAP_VALUE;
> >>  		dst_reg->off = aux->map_off;
> >>  		WARN_ON_ONCE(map->max_entries != 1);
> >> -		/* We want reg->id to be same (0) as map_value is not distinct */
> >> +		/* map->id is positive s32. Negative map->id will not collide with env->id_gen.
> >> +		 * This lets us track active_lock state as single u32 instead of pair { map|btf, id }
> >> +		 */
> >> +		dst_reg->id = -map->id;
> >
> > Here is what I meant in my earlier reply to Dave's patches 1 and 2.
> > imo this is a simpler implementation of the same logic.
> > The only tricky part is the bit above, which is necessary
> > to use a single u32 in bpf_reg_state to track active_lock.
> > We can follow up with a cleanup of the active_lock comparison
> > in other places in the verifier.
> > Instead of:
> > 	if (map)
> > 		cur->active_lock.ptr = map;
> > 	else
> > 		cur->active_lock.ptr = btf;
> > 	cur->active_lock.id = reg->id;
> > it will be:
> > 	cur->active_lock_id = reg->id;
> >
> > Another cleanup is needed to compare active_lock_id properly
> > in regsafe().
> >
> > Thoughts?
>
> Before Kumar's commit d0d78c1df9b1d ("bpf: Allow locking bpf_spin_lock global
> variables"), setting of dst_reg->id in that code path was guarded by a "does
> the map value have a spin_lock" check:
>
> 	if (btf_record_has_field(map->record, BPF_SPIN_LOCK))
> 		dst_reg->id = ++env->id_gen;
>
> Should we do that here as well? I'm not sure what the implications of
> overzealous setting of dst_reg->id are.

That won't work, since for spin_locks in global data that 'id' has to be
stable. Hence I went with -map->id.

> More generally: I see why doing -map->id will not overlap with env->id_gen
> in practice. For PTR_TO_BTF_ID, I'm not sure that we'll always have a nonzero
> reg->id here. check_kfunc_call has
>
> 	if (reg_may_point_to_spin_lock(&regs[BPF_REG_0]) && !regs[BPF_REG_0].id)
> 		regs[BPF_REG_0].id = ++env->id_gen;

Correct.

> when checking the retval, but there's no such equivalent in check_helper_call,
> which instead does
>
> 	if (type_may_be_null(regs[BPF_REG_0].type))
> 		regs[BPF_REG_0].id = ++env->id_gen;
>
> and similar in a few paths.
>
> Should we ensure that "any PTR_TO_BTF_ID reg that has a spin_lock must have a
> nonzero id"? If we don't, a helper which returns a PTR_TO_BTF_ID w/ spin_lock
> that isn't MAYBE_NULL will break this whole scheme.

Correct. It's not a problem in practice, because there are few helpers that
return PTR_TO_BTF_ID and none of them point to a spin_lock.

> Some future-proofing concerns:
>
> Kumar's commit above mentions this in the summary:
>
> 	Note that this can be extended in the future to also remember offset
> 	used for locking, so that we can introduce multiple bpf_spin_lock fields
> 	in the same allocation.

Yeah. I forgot about this 'offset' idea.

> When the above happens we'll need to go back to a struct - or some packed
> int - to describe "lock identity" anyway.

Yeah. I was hoping we could represent all of 'active_lock' as a single id,
but 'offset' is hard to encode into an integer. We won't be able to add it
in the top 32 bits, since the resulting 64-bit 'id' would need to go through
check_ids(). Even if we extend idmap_scratch to be 64-bit it won't be correct.
Two different 32-bit IDs are ok to see and the states might still be
equivalent, but two different 'offset's are not ok. The offset in the 'old'
state and the offset in the 'cur' state have to be the same for the states
to be equivalent. So the future 'offset' would need to be compared in
regsafe() manually anyway. Since we'd have to do that, there is little point
in combining the active_lock.ptr comparison into a 32-bit id. Sigh.

> IIRC in the long term we wanted to stop overloading reg->id's meaning, with
> the ideal end state being reg->id used only for "null id" purposes. If so,
> this is moving us back towards the overloaded state.
>
> Happy to take this over and respin once we're done discussing, unless you
> want to do it.

Please go ahead. Let's scratch my -map->id idea for now. Let's go with
embedding 'struct bpf_active_lock { void *ptr; u32 id; }' into bpf_reg_state,
but please make sure that bpf_reg_state grows by 8 bytes only. Simply adding
bpf_active_lock in the btf_id part of it will grow it by 16.

Note the other idea you had, to keep a pointer to the active_lock in
bpf_reg_state, is too dangerous and probably incorrect. That pointer might
become dangling after we copy_verifier_state. All pointers in bpf_reg_state
need to be stable objects. The only exception is 'parent', which is very
tricky on its own.

Maybe we can do

	struct bpf_active_lock {
		u32 map_btf_id; /* instead of void *ptr */
		u32 id;
	};

and init map_btf_id with -map->id or btf->id. It's guaranteed not to overlap
and to be != 0. I don't know whether other pointers will end up in there.
Maybe premature optimization.
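[Spelling out that last suggestion as code — a sketch only; the initialization
site in process_spin_lock() is an assumption:]

	struct bpf_active_lock {
		/* -map->id for map-value locks, btf->id for allocated objects;
		 * replaces 'void *ptr' so bpf_reg_state grows by 8 bytes only
		 */
		u32 map_btf_id;
		/* reg->id of the register holding the lock */
		u32 id;
	};

	/* on bpf_spin_lock, in process_spin_lock(): */
	cur->active_lock.map_btf_id = map ? -map->id : btf->id;
	cur->active_lock.id = reg->id;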
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 127058cfec47..3fc41edff267 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -68,6 +68,7 @@ struct bpf_reg_state {
 		struct {
 			struct btf *btf;
 			u32 btf_id;
+			u32 active_lock_id;
 		};
 
 		u32 mem_size; /* for PTR_TO_MEM | PTR_TO_MEM_OR_NULL */
@@ -223,11 +224,6 @@ struct bpf_reference_state {
 	 * exiting a callback function.
 	 */
 	int callback_ref;
-	/* Mark the reference state to release the registers sharing the same id
-	 * on bpf_spin_unlock (for nodes that we will lose ownership to but are
-	 * safe to access inside the critical section).
-	 */
-	bool release_on_unlock;
 };
 
 /* state of the program:
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 4a25375ebb0d..3e7fd564132c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -190,6 +190,7 @@ struct bpf_verifier_stack_elem {
 
 static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx);
 static int release_reference(struct bpf_verifier_env *env, int ref_obj_id);
+static void invalidate_non_owning_refs(struct bpf_verifier_env *env, u32 lock_id);
 
 static bool bpf_map_ptr_poisoned(const struct bpf_insn_aux_data *aux)
 {
@@ -931,6 +932,8 @@ static void print_verifier_state(struct bpf_verifier_env *env,
 				verbose_a("id=%d", reg->id);
 			if (reg->ref_obj_id)
 				verbose_a("ref_obj_id=%d", reg->ref_obj_id);
+			if (reg->active_lock_id)
+				verbose_a("lock_id=%d", reg->active_lock_id);
 			if (t != SCALAR_VALUE)
 				verbose_a("off=%d", reg->off);
 			if (type_is_pkt_pointer(t))
@@ -5782,9 +5785,7 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
 			cur->active_lock.ptr = btf;
 		cur->active_lock.id = reg->id;
 	} else {
-		struct bpf_func_state *fstate = cur_func(env);
 		void *ptr;
-		int i;
 
 		if (map)
 			ptr = map;
@@ -5800,25 +5801,11 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
 			verbose(env, "bpf_spin_unlock of different lock\n");
 			return -EINVAL;
 		}
-		cur->active_lock.ptr = NULL;
-		cur->active_lock.id = 0;
 
-		for (i = fstate->acquired_refs - 1; i >= 0; i--) {
-			int err;
+		invalidate_non_owning_refs(env, cur->active_lock.id);
 
-			/* Complain on error because this reference state cannot
-			 * be freed before this point, as bpf_spin_lock critical
-			 * section does not allow functions that release the
-			 * allocated object immediately.
-			 */
-			if (!fstate->refs[i].release_on_unlock)
-				continue;
-			err = release_reference(env, fstate->refs[i].id);
-			if (err) {
-				verbose(env, "failed to release release_on_unlock reference");
-				return err;
-			}
-		}
+		cur->active_lock.ptr = NULL;
+		cur->active_lock.id = 0;
 	}
 	return 0;
 }
@@ -7081,6 +7068,17 @@ static int release_reference(struct bpf_verifier_env *env,
 	return 0;
 }
 
+static void invalidate_non_owning_refs(struct bpf_verifier_env *env, u32 lock_id)
+{
+	struct bpf_func_state *unused;
+	struct bpf_reg_state *reg;
+
+	bpf_for_each_reg_in_vstate(env->cur_state, unused, reg, ({
+		if (base_type(reg->type) == PTR_TO_BTF_ID && reg->active_lock_id == lock_id)
+			__mark_reg_unknown(env, reg);
+	}));
+}
+
 static void clear_caller_saved_regs(struct bpf_verifier_env *env,
 				    struct bpf_reg_state *regs)
 {
@@ -8583,38 +8581,38 @@ static int process_kf_arg_ptr_to_kptr(struct bpf_verifier_env *env,
 	return 0;
 }
 
-static int ref_set_release_on_unlock(struct bpf_verifier_env *env, u32 ref_obj_id)
+static int ref_convert_owning_non_owning(struct bpf_verifier_env *env, u32 ref_obj_id)
 {
-	struct bpf_func_state *state = cur_func(env);
+	struct bpf_func_state *state, *unused;
 	struct bpf_reg_state *reg;
 	int i;
 
-	/* bpf_spin_lock only allows calling list_push and list_pop, no BPF
-	 * subprogs, no global functions. This means that the references would
-	 * not be released inside the critical section but they may be added to
-	 * the reference state, and the acquired_refs are never copied out for a
-	 * different frame as BPF to BPF calls don't work in bpf_spin_lock
-	 * critical sections.
-	 */
+	state = cur_func(env);
+
 	if (!ref_obj_id) {
-		verbose(env, "verifier internal error: ref_obj_id is zero for release_on_unlock\n");
+		verbose(env, "verifier internal error: ref_obj_id is zero for "
+			     "owning -> non-owning conversion\n");
 		return -EFAULT;
 	}
+
 	for (i = 0; i < state->acquired_refs; i++) {
-		if (state->refs[i].id == ref_obj_id) {
-			if (state->refs[i].release_on_unlock) {
-				verbose(env, "verifier internal error: expected false release_on_unlock");
-				return -EFAULT;
+		if (state->refs[i].id != ref_obj_id)
+			continue;
+
+		bpf_for_each_reg_in_vstate(env->cur_state, unused, reg, ({
+			if (reg->ref_obj_id == ref_obj_id) {
+				/* Clear ref_obj_id only. The rest of PTR_TO_BTF_ID stays as-is */
+				reg->ref_obj_id = 0;
+				reg->active_lock_id = env->cur_state->active_lock.id;
 			}
-			state->refs[i].release_on_unlock = true;
-			/* Now mark everyone sharing same ref_obj_id as untrusted */
-			bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
-				if (reg->ref_obj_id == ref_obj_id)
-					reg->type |= PTR_UNTRUSTED;
-			}));
-			return 0;
-		}
+		}));
+
+		/* There are no referenced regs with this ref_obj_id in the current state.
+		 * Remove ref_obj_id from acquired_refs. It should definitely succeed.
+		 */
+		return release_reference_state(state, ref_obj_id);
 	}
+
 	verbose(env, "verifier internal error: ref state missing for ref_obj_id\n");
 	return -EFAULT;
 }
@@ -8794,8 +8792,7 @@ static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env,
 			btf_name_by_offset(field->graph_root.btf, et->name_off));
 		return -EINVAL;
 	}
-	/* Set arg#1 for expiration after unlock */
-	return ref_set_release_on_unlock(env, reg->ref_obj_id);
+	return ref_convert_owning_non_owning(env, reg->ref_obj_id);
 }
 
 static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta)
@@ -8997,7 +8994,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta)
 				return -EINVAL;
 			}
 			if (reg->type == (PTR_TO_BTF_ID | MEM_ALLOC) && !reg->ref_obj_id) {
-				verbose(env, "allocated object must be referenced\n");
+				verbose(env, "Allocated list_head must be referenced\n");
 				return -EINVAL;
 			}
 			ret = process_kf_arg_ptr_to_list_head(env, reg, regno, meta);
@@ -9010,7 +9007,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta)
 				return -EINVAL;
 			}
 			if (!reg->ref_obj_id) {
-				verbose(env, "allocated object must be referenced\n");
+				verbose(env, "Allocated list_node must be referenced\n");
 				return -EINVAL;
 			}
 			ret = process_kf_arg_ptr_to_list_node(env, reg, regno, meta);
@@ -11959,7 +11956,10 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
 		dst_reg->type = PTR_TO_MAP_VALUE;
 		dst_reg->off = aux->map_off;
 		WARN_ON_ONCE(map->max_entries != 1);
-		/* We want reg->id to be same (0) as map_value is not distinct */
+		/* map->id is positive s32. Negative map->id will not collide with env->id_gen.
+		 * This lets us track active_lock state as single u32 instead of pair { map|btf, id }
+		 */
+		dst_reg->id = -map->id;
 	} else if (insn->src_reg == BPF_PSEUDO_MAP_FD ||
 		   insn->src_reg == BPF_PSEUDO_MAP_IDX) {
 		dst_reg->type = CONST_PTR_TO_MAP;
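[For reference, a sketch of the program pattern the patch above governs. This
snippet is illustrative, not from the thread; the list kfuncs and the
private()/__contains() annotations follow the kernel selftests'
bpf_experimental.h conventions:]

	struct node_data {
		struct bpf_list_node node;
		int data;
	};

	private(A) struct bpf_spin_lock glock;
	private(A) struct bpf_list_head ghead __contains(node_data, node);

	SEC("tc")
	int list_example(void *ctx)
	{
		struct node_data *n = bpf_obj_new(typeof(*n));

		if (!n)
			return 0;
		bpf_spin_lock(&glock);
		/* n goes from owning to non-owning: ref_obj_id is cleared and
		 * active_lock_id is set by ref_convert_owning_non_owning()
		 */
		bpf_list_push_front(&ghead, &n->node);
		n->data = 1;	/* still usable inside the critical section */
		bpf_spin_unlock(&glock);
		/* here invalidate_non_owning_refs() has marked n unknown;
		 * a use of n->data past this point would be rejected
		 */
		return 0;
	}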