[RFC,v1,05/14] bpf: Implement BPF exception frame descriptor generation

Message ID	20240201042109.1150490-6-memxor@gmail.com (mailing list archive)
State	RFC
Delegated to:	BPF
Headers	show Received: from mail-lf1-f45.google.com (mail-lf1-f45.google.com [209.85.167.45]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E75983BB2C for <bpf@vger.kernel.org>; Thu, 1 Feb 2024 04:21:21 +0000 (UTC) From: Kumar Kartikeya Dwivedi <memxor@gmail.com> To: bpf@vger.kernel.org Cc: Alexei Starovoitov <ast@kernel.org>, Andrii Nakryiko <andrii@kernel.org>, Daniel Borkmann <daniel@iogearbox.net>, Martin KaFai Lau <martin.lau@kernel.org>, David Vernet <void@manifault.com>, Tejun Heo <tj@kernel.org>, Raj Sahu <rjsu26@vt.edu>, Dan Williams <djwillia@vt.edu>, Rishabh Iyer <rishabh.iyer@epfl.ch>, Sanidhya Kashyap <sanidhya.kashyap@epfl.ch> Subject: [RFC PATCH v1 05/14] bpf: Implement BPF exception frame descriptor generation Date: Thu, 1 Feb 2024 04:21:00 +0000 Message-Id: <20240201042109.1150490-6-memxor@gmail.com> In-Reply-To: <20240201042109.1150490-1-memxor@gmail.com> References: <20240201042109.1150490-1-memxor@gmail.com> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	Exceptions - Resource Cleanup \| expand [RFC,v1,00/14] Exceptions - Resource Cleanup [RFC,v1,01/14] bpf: Mark subprogs as throw reachable before do_check pass [RFC,v1,02/14] bpf: Process global subprog's exception propagation [RFC,v1,03/14] selftests/bpf: Add test for throwing global subprog with acquired refs [RFC,v1,04/14] bpf: Refactor check_pseudo_btf_id's BTF reference bump [RFC,v1,05/14] bpf: Implement BPF exception frame descriptor generation [RFC,v1,06/14] bpf: Adjust frame descriptor pc on instruction patching [RFC,v1,07/14] bpf: Use hidden subprog trampoline for bpf_throw [RFC,v1,08/14] bpf: Compute used callee saved registers for subprogs [RFC,v1,09/14] bpf, x86: Fix up pc offsets for frame descriptor entries [RFC,v1,10/14] bpf, x86: Implement runtime resource cleanup for exceptions [RFC,v1,11/14] bpf: Release references in verifier state when throwing exceptions [RFC,v1,12/14] bpf: Register cleanup dtors for runtime unwinding [RFC,v1,13/14] bpf: Make bpf_throw available to all program types [RFC,v1,14/14] selftests/bpf: Add tests for exceptions runtime cleanup

Context	Check	Description
bpf/vmtest-bpf-next-PR	success	PR summary
netdev/tree_selection	success	Guessing tree name failed - patch did not apply, async
bpf/vmtest-bpf-next-VM_Test-0	success	Logs for Lint
bpf/vmtest-bpf-next-VM_Test-1	success	Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-2	success	Logs for Unittests
bpf/vmtest-bpf-next-VM_Test-3	success	Logs for Validate matrix.py
bpf/vmtest-bpf-next-VM_Test-5	success	Logs for aarch64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-6	success	Logs for aarch64-gcc / test
bpf/vmtest-bpf-next-VM_Test-7	success	Logs for aarch64-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-9	success	Logs for s390x-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-4	fail	Logs for aarch64-gcc / build / build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-26	success	Logs for x86_64-llvm-17 / veristat
bpf/vmtest-bpf-next-VM_Test-34	success	Logs for x86_64-llvm-18 / veristat
bpf/vmtest-bpf-next-VM_Test-8	fail	Logs for s390x-gcc / build / build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-10	success	Logs for s390x-gcc / test
bpf/vmtest-bpf-next-VM_Test-11	success	Logs for s390x-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-12	success	Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-13	success	Logs for x86_64-gcc / build / build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-14	success	Logs for x86_64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-17	pending	Logs for x86_64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-22	success	Logs for x86_64-llvm-17 / build / build for x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-25	pending	Logs for x86_64-llvm-17 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-28	success	Logs for x86_64-llvm-17 / veristat
bpf/vmtest-bpf-next-VM_Test-29	success	Logs for x86_64-llvm-18 / build / build for x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-30	success	Logs for x86_64-llvm-18 / build-release / build for x86_64 with llvm-18 and -O2 optimization
bpf/vmtest-bpf-next-VM_Test-32	pending	Logs for x86_64-llvm-18 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-33	pending	Logs for x86_64-llvm-18 / test (test_progs_cpuv4, false, 360) / test_progs_cpuv4 on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-36	success	Logs for x86_64-llvm-18 / veristat
bpf/vmtest-bpf-next-VM_Test-15	success	Logs for x86_64-gcc / test (test_maps, false, 360) / test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-16	success	Logs for x86_64-gcc / test (test_progs, false, 360) / test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-18	success	Logs for x86_64-gcc / test (test_progs_no_alu32_parallel, true, 30) / test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-19	success	Logs for x86_64-gcc / test (test_progs_parallel, true, 30) / test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-20	success	Logs for x86_64-gcc / test (test_verifier, false, 360) / test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-21	success	Logs for x86_64-gcc / veristat / veristat on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-23	success	Logs for x86_64-llvm-17 / build-release / build for x86_64 with llvm-17 and -O2 optimization
bpf/vmtest-bpf-next-VM_Test-24	success	Logs for x86_64-llvm-17 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-27	success	Logs for x86_64-llvm-17 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-31	success	Logs for x86_64-llvm-18 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-35	success	Logs for x86_64-llvm-18 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-18

On Thu, 15 Feb 2024 at 19:24, Eduard Zingerman <eddyz87@gmail.com> wrote: > > On Thu, 2024-02-01 at 04:21 +0000, Kumar Kartikeya Dwivedi wrote: > > Question: are there any real-life programs adapted to use exceptions > with cleanup feature? It would be interesting to see how robust > one-descriptor-per-pc is in practice, and also how it affects memory > consumption during verification. > I tried it on sched-ext schedulers with this series with some helper macros and it worked well. I can post stats on memory consumption with the next version (or make it part of veristat, whichever is more desirable). I think the downside of this approach is when you have different resources in the same register/slot from two different paths which then hit the same bpf_throw. Surely, it will not work in that case the way things are set up right now. But there are ways to address that (ranging from compiler passes that hoist the throw call to the predecessor basic blocks of separate paths when we detect such a pattern, to emitting path hints of resource types during JIT which bpf_throw can pick up), but I didn't encounter any examples so far where this came up, and when it does, mostly you can rewrite things a bit differently to make it work. One of the changes though that I plan to make in the next posting is keeping duplicate entries for the same resource around when the first frame descriptor is created for a pc. This increases the chances of merging correctly in case the second path intersects with us in some slots, but for the purposes of releasing resources, that intersection is sufficient. Like you could have the same resource in R8 and R9, the second path may only have it in R9 and not R8, both can merge, if we erase R8 (or any other duplicates) in the original frame desc. Right now, we simply do a release_resource so whenever we encounter R8, we release its ref_obj_id and mark everything sharing the same id invalid, so we never encounter R9 in the verifier state. In the above case, these frame descs would not match but they could have if we had discovered R9 before R8. If you think this is not sufficient, please let me know. It is certainly tricky and I might be underestimating the difficulty of getting this right. > The algorithm makes sense to me, a few comments/nits below. > > [...] > > > +static int find_and_merge_frame_desc(struct bpf_verifier_env *env, struct bpf_exception_frame_desc_tab *fdtab, u64 pc, struct bpf_frame_desc_reg_entry *fd) > > +{ > > + struct bpf_exception_frame_desc **descs = NULL, *desc = NULL, *p; > > + int ret = 0; > > + > > + for (int i = 0; i < fdtab->cnt; i++) { > > + if (pc != fdtab->desc[i]->pc) > > + continue; > > + descs = &fdtab->desc[i]; > > + desc = fdtab->desc[i]; > > + break; > > + } > > + > > + if (!desc) { > > + verbose(env, "frame_desc: find_and_merge: cannot find frame descriptor for pc=%llu, creating new entry\n", pc); > > + return -ENOENT; > > + } > > + > > + if (fd->off < 0) > > + goto stack; > > Nit: maybe write it down as > > if (fd->off >= 0) > return merge_frame_desc(...); > > and avoid goto? > Ack, will fix. > [...] > > > +static int gen_exception_frame_desc_stack_entry(struct bpf_verifier_env *env, struct bpf_func_state *frame, int stack_off) > > +{ > > + int spi = stack_off / BPF_REG_SIZE, off = -stack_off - 1; > > + struct bpf_reg_state *reg, not_init_reg, null_reg; > > + int slot_type, ret; > > + > > + __mark_reg_not_init(env, &not_init_reg); > > + __mark_reg_known_zero(&null_reg); > > __mark_reg_known_zero() does not set .type field, > thus null_reg.type value is undefined. > Hmm, good catch, will fix it. > > + > > + slot_type = frame->stack[spi].slot_type[BPF_REG_SIZE - 1]; > > + reg = &frame->stack[spi].spilled_ptr; > > + > > + switch (slot_type) { > > + case STACK_SPILL: > > + /* We skip all kinds of scalar registers, except NULL values, which consume a slot. */ > > + if (is_spilled_scalar_reg(&frame->stack[spi]) && !register_is_null(&frame->stack[spi].spilled_ptr)) > > + break; > > + ret = gen_exception_frame_desc_reg_entry(env, reg, off, frame->frameno); > > + if (ret < 0) > > + return ret; > > + break; > > + case STACK_DYNPTR: > > + /* Keep iterating until we find the first slot. */ > > + if (!reg->dynptr.first_slot) > > + break; > > + ret = gen_exception_frame_desc_dynptr_entry(env, reg, off, frame->frameno); > > + if (ret < 0) > > + return ret; > > + break; > > + case STACK_ITER: > > + /* Keep iterating until we find the first slot. */ > > + if (!reg->ref_obj_id) > > + break; > > + ret = gen_exception_frame_desc_iter_entry(env, reg, off, frame->frameno); > > + if (ret < 0) > > + return ret; > > + break; > > + case STACK_MISC: > > + case STACK_INVALID: > > + /* Create an invalid entry for MISC and INVALID */ > > + ret = gen_exception_frame_desc_reg_entry(env, &not_init_reg, off, frame->frameno); > > + if (ret < 0) > > + return 0; > > No tests are failing if I comment out this block. > Looking at the merge_frame_desc() logic it appears to me that fd > entries with fd->type == NOT_INIT would only be merged with other > NOT_INIT entries. What is the point of having such entries at all? > Assume you figured it out based on the other email. Basically, creating entries so that no merge can occur for the slot. > > + break; > > + case STACK_ZERO: > > + reg = &null_reg; > > + for (int i = BPF_REG_SIZE - 1; i >= 0; i--) { > > + if (frame->stack[spi].slot_type[i] != STACK_ZERO) > > + reg = &not_init_reg; > > + } > > + ret = gen_exception_frame_desc_reg_entry(env, &null_reg, off, frame->frameno); > > + if (ret < 0) > > + return ret; > > Same here, no tests are failing if STACK_ZERO block is commented. > In general, what is the point of adding STACK_ZERO entries? > There is a logic in place to merge NULL and non-NULL entries, > but how is it different from not adding NULL entries in a first place? > find_and_merge_frame_desc() does a linear scan over bpf_exception_frame_desc->stack > and does not rely on entries being sorted by .off field. > Answered on the other email as well. I will add more tests for this case. STACK_ZERO also creates NULL entries, we just treat them the same as NULL reg and insert an entry using a fake reg initialized to zero. Also I think we need to pass reg, not &null_reg in this case. Will fix this part of the code and add tests. While it does not rely on entries being sorted, we still find the one with the same offset. I can sort for a binary search at runtime, I was mostly worried that constant re-sorting during verification may end up costing more when the number of frame descriptors is low, but I think it makes sense atleast for runtime when the array is only searched. > > + break; > > + default: > > + verbose(env, "verifier internal error: frame%d stack off=%d slot_type=%d missing handling for exception frame generation\n", > > + frame->frameno, off, slot_type); > > + return -EFAULT; > > + } > > + return 0; > > +}

diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 1ebbee1d648e..463c8d22ad72 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1424,6 +1424,7 @@ struct btf_mod_pair { }; struct bpf_kfunc_desc_tab; +struct bpf_exception_frame_desc_tab; struct bpf_prog_aux { atomic64_t refcnt; @@ -1518,6 +1519,7 @@ struct bpf_prog_aux { struct module *mod; u32 num_exentries; struct exception_table_entry *extable; + struct bpf_exception_frame_desc_tab *fdtab; union { struct work_struct work; struct rcu_head rcu; @@ -3367,4 +3369,29 @@ static inline bool bpf_is_subprog(const struct bpf_prog *prog) return prog->aux->func_idx != 0; } +struct bpf_frame_desc_reg_entry { + u32 type; + s16 spill_type; + union { + s16 off; + u16 regno; + }; + struct btf *btf; + u32 btf_id; +}; + +struct bpf_exception_frame_desc { + u64 pc; + u32 stack_cnt; + struct bpf_frame_desc_reg_entry regs[4]; + struct bpf_frame_desc_reg_entry stack[]; +}; + +struct bpf_exception_frame_desc_tab { + u32 cnt; + struct bpf_exception_frame_desc **desc; +}; + +void bpf_exception_frame_desc_tab_free(struct bpf_exception_frame_desc_tab *fdtab); + #endif /* _LINUX_BPF_H */ diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 5482701e6ad9..0113a3a940e2 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -631,6 +631,8 @@ struct bpf_subprog_info { u8 arg_cnt; struct bpf_subprog_arg_info args[MAX_BPF_FUNC_REG_ARGS]; + + struct bpf_exception_frame_desc_tab *fdtab; }; struct bpf_verifier_env; diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 71c459a51d9e..995a4dcfa970 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -2734,6 +2734,14 @@ static void bpf_free_used_btfs(struct bpf_prog_aux *aux) kfree(aux->used_btfs); } +void bpf_exception_frame_desc_tab_free(struct bpf_exception_frame_desc_tab *fdtab) +{ + if (!fdtab) + return; + for (int i = 0; i < fdtab->cnt; i++) + kfree(fdtab->desc[i]); +} + static void bpf_prog_free_deferred(struct work_struct *work) { struct bpf_prog_aux *aux; @@ -2747,6 +2755,11 @@ static void bpf_prog_free_deferred(struct work_struct *work) if (aux->cgroup_atype != CGROUP_BPF_ATTACH_TYPE_INVALID) bpf_cgroup_atype_put(aux->cgroup_atype); #endif + /* Free all exception frame descriptors */ + for (int i = 0; i < aux->func_cnt; i++) { + bpf_exception_frame_desc_tab_free(aux->func[i]->aux->fdtab); + aux->func[i]->aux->fdtab = NULL; + } bpf_free_used_maps(aux); bpf_free_used_btfs(aux); if (bpf_prog_is_dev_bound(aux)) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 03ad9a9d47c9..27233c308d83 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -10004,6 +10004,366 @@ record_func_key(struct bpf_verifier_env *env, struct bpf_call_arg_meta *meta, return 0; } +static void print_frame_desc_reg_entry(struct bpf_verifier_env *env, struct bpf_frame_desc_reg_entry *fd, const char *pfx) +{ + const char *type = fd->off < 0 ? "stack" : "reg"; + const char *key = fd->off < 0 ? "off" : "regno"; + const char *spill_type; + + switch (fd->spill_type) { + case STACK_INVALID: + spill_type = "<unknown>"; + break; + case STACK_SPILL: + spill_type = "reg"; + break; + case STACK_DYNPTR: + spill_type = "dynptr"; + break; + case STACK_ITER: + spill_type = "iter"; + break; + default: + spill_type = "???"; + break; + } + verbose(env, "frame_desc: %s%s fde: %s=%d spill_type=%s ", pfx, type, key, fd->off, spill_type); + if (fd->btf) { + const struct btf_type *t = btf_type_by_id(fd->btf, fd->btf_id); + + verbose(env, "type=%s%s btf=%s btf_id=%d\n", fd->off < 0 ? "" : "ptr_", + btf_name_by_offset(fd->btf, t->name_off), btf_get_name(fd->btf), fd->btf_id); + } else { + verbose(env, "type=%s\n", fd->spill_type == STACK_DYNPTR ? "ringbuf" : reg_type_str(env, fd->type)); + } +} + +static int merge_frame_desc(struct bpf_verifier_env *env, struct bpf_frame_desc_reg_entry *ofd, struct bpf_frame_desc_reg_entry *fd) +{ + int ofd_type, fd_type; + + /* If ofd->off/regno is 0, this is uninitialized reg entry, just merge new entry. */ + if (!ofd->off) + goto merge_new; + /* Exact merge for dynptr, iter, reg, stack. */ + if (!memcmp(ofd, fd, sizeof(*ofd))) + goto none; + /* First, for a successful merge, both spill_type should be same.*/ + if (ofd->spill_type != fd->spill_type) + goto fail; + /* Then, both should correspond to a reg or stack entry for non-exact merge. */ + if (ofd->spill_type != STACK_SPILL && ofd->spill_type != STACK_INVALID) + goto fail; + ofd_type = ofd->type; + fd_type = fd->type; + /* One of the old or new entry must be NULL, if both are not same. */ + if (ofd_type == fd_type) + goto none; + if (ofd_type != SCALAR_VALUE && fd_type != SCALAR_VALUE) + goto fail; + if (fd_type == SCALAR_VALUE) + goto none; + verbose(env, "frame_desc: merge: merging new frame desc entry into old\n"); + print_frame_desc_reg_entry(env, ofd, "old "); +merge_new: + if (!ofd->off) + verbose(env, "frame_desc: merge: creating new frame desc entry\n"); + print_frame_desc_reg_entry(env, fd, "new "); + *ofd = *fd; + return 0; +none: + verbose(env, "frame_desc: merge: no merging needed of new frame desc entry into old\n"); + print_frame_desc_reg_entry(env, ofd, "old "); + print_frame_desc_reg_entry(env, fd, "new "); + return 0; +fail: + verbose(env, "frame_desc: merge: failed to merge old and new frame desc entry\n"); + print_frame_desc_reg_entry(env, ofd, "old "); + print_frame_desc_reg_entry(env, fd, "new "); + return -EINVAL; +} + +static int find_and_merge_frame_desc(struct bpf_verifier_env *env, struct bpf_exception_frame_desc_tab *fdtab, u64 pc, struct bpf_frame_desc_reg_entry *fd) +{ + struct bpf_exception_frame_desc **descs = NULL, *desc = NULL, *p; + int ret = 0; + + for (int i = 0; i < fdtab->cnt; i++) { + if (pc != fdtab->desc[i]->pc) + continue; + descs = &fdtab->desc[i]; + desc = fdtab->desc[i]; + break; + } + + if (!desc) { + verbose(env, "frame_desc: find_and_merge: cannot find frame descriptor for pc=%llu, creating new entry\n", pc); + return -ENOENT; + } + + if (fd->off < 0) + goto stack; + /* We didn't find a match for regno or offset, fill it into the frame descriptor. */ + return merge_frame_desc(env, &desc->regs[fd->regno - BPF_REG_6], fd); + +stack: + for (int i = 0; i < desc->stack_cnt; i++) { + struct bpf_frame_desc_reg_entry *ofd = desc->stack + i; + + if (ofd->off != fd->off) + continue; + ret = merge_frame_desc(env, ofd, fd); + if (ret < 0) + return ret; + return 0; + } + p = krealloc(desc, offsetof(typeof(*desc), stack[desc->stack_cnt + 1]), GFP_USER | __GFP_ZERO); + if (!p) { + return -ENOMEM; + } + verbose(env, "frame_desc: merge: creating new frame desc entry\n"); + print_frame_desc_reg_entry(env, fd, "new "); + desc = p; + desc->stack[desc->stack_cnt] = *fd; + desc->stack_cnt++; + *descs = desc; + return 0; +} + +/* Implementation details: + * This function is responsible for pushing a prepared bpf_frame_desc_reg_entry + * into the frame descriptor array tied to each subprog. The first step is + * ensuring the array is allocated and has enough capacity. Second, we must find + * if there is an existing descriptor already for the program counter under + * consideration, and try to report an error if we see conflicting frame + * descriptor generation requests for the same instruction in the program. + * Note that by default, we let NULL registers and stack slots occupy an entry. + * This is done so that any future non-NULL registers or stack slots at the same + * regno or offset can be satisfied by changing the type of entry to a "stronger" + * pointer type. The release handler can deal with NULL or valid values, + * therefore such a logic allows handling cases where the program may only have + * a pointer in some of the program paths and NULL in others while reaching the + * same instruction that causes an exception to be thrown. + * Likewise, a NULL entry merges into the stronger pointer type entry when a + * frame descriptor already exists before pushing a new one. + */ +static int push_exception_frame_desc(struct bpf_verifier_env *env, int frameno, struct bpf_frame_desc_reg_entry *fd) +{ + struct bpf_func_state *frame = env->cur_state->frame[frameno], *curframe = cur_func(env); + struct bpf_subprog_info *si = subprog_info(env, frame->subprogno); + struct bpf_exception_frame_desc_tab *fdtab = si->fdtab; + struct bpf_exception_frame_desc **desc; + u64 pc = env->insn_idx; + int ret; + + /* If this is not the current frame, then we need to figure out the callsite + * for its callee to identify the pc. + */ + if (frameno != curframe->frameno) + pc = env->cur_state->frame[frameno + 1]->callsite; + + if (!fdtab) { + fdtab = kzalloc(sizeof(*si->fdtab), GFP_USER); + if (!fdtab) + return -ENOMEM; + fdtab->desc = kzalloc(sizeof(*fdtab->desc), GFP_USER); + if (!fdtab->desc) { + kfree(fdtab); + return -ENOMEM; + } + si->fdtab = fdtab; + } + + ret = find_and_merge_frame_desc(env, fdtab, pc, fd); + if (!ret) + return 0; + if (ret < 0 && ret != -ENOENT) + return ret; + /* We didn't find a frame descriptor for pc, grow the array and insert it. */ + desc = realloc_array(fdtab->desc, fdtab->cnt ?: 1, fdtab->cnt + 1, sizeof(*fdtab->desc)); + if (!desc) { + return -ENOMEM; + } + fdtab->desc = desc; + fdtab->desc[fdtab->cnt] = kzalloc(sizeof(*fdtab->desc[0]), GFP_USER); + if (!fdtab->desc[fdtab->cnt]) + return -ENOMEM; + fdtab->desc[fdtab->cnt]->pc = pc; + fdtab->cnt++; + return find_and_merge_frame_desc(env, fdtab, pc, fd); +} + +static int gen_exception_frame_desc_reg_entry(struct bpf_verifier_env *env, struct bpf_reg_state *reg, int off, int frameno) +{ + struct bpf_frame_desc_reg_entry fd = {}; + + if ((!reg->ref_obj_id && reg->type != NOT_INIT) || reg->type == SCALAR_VALUE) + return 0; + if (base_type(reg->type) == PTR_TO_BTF_ID) + fd.btf = reg->btf; + fd.type = reg->type & ~PTR_MAYBE_NULL; + fd.btf_id = fd.btf ? reg->btf_id : 0; + fd.spill_type = off < 0 ? STACK_SPILL : STACK_INVALID; + fd.off = off; + verbose(env, "frame_desc: frame%d: insn_idx=%d %s=%d size=%d ref_obj_id=%d type=%s\n", + frameno, env->insn_idx, off < 0 ? "off" : "regno", off, BPF_REG_SIZE, reg->ref_obj_id, reg_type_str(env, reg->type)); + return push_exception_frame_desc(env, frameno, &fd); +} + +static int gen_exception_frame_desc_dynptr_entry(struct bpf_verifier_env *env, struct bpf_reg_state *reg, int off, int frameno) +{ + struct bpf_frame_desc_reg_entry fd = {}; + int type = reg->dynptr.type; + + /* We only need to generate an entry when the dynptr is refcounted, + * otherwise it encapsulates no resource that needs to be released. + */ + if (!dynptr_type_refcounted(type)) + return 0; + switch (type) { + case BPF_DYNPTR_TYPE_RINGBUF: + fd.type = BPF_DYNPTR_TYPE_RINGBUF; + fd.spill_type = STACK_DYNPTR; + fd.off = off; + verbose(env, "frame_desc: frame%d: insn_idx=%d off=%d size=%lu dynptr_ringbuf\n", frameno, env->insn_idx, off, + BPF_DYNPTR_NR_SLOTS * BPF_REG_SIZE); + break; + default: + verbose(env, "verifier internal error: refcounted dynptr type unhandled for exception frame descriptor entry\n"); + return -EFAULT; + } + return push_exception_frame_desc(env, frameno, &fd); +} + +static int add_used_btf(struct bpf_verifier_env *env, struct btf *btf); + +static int gen_exception_frame_desc_iter_entry(struct bpf_verifier_env *env, struct bpf_reg_state *reg, int off, int frameno) +{ + struct bpf_frame_desc_reg_entry fd = {}; + struct btf *btf = reg->iter.btf; + u32 btf_id = reg->iter.btf_id; + const struct btf_type *t; + int ret; + + fd.btf = btf; + fd.type = reg->type; + fd.btf_id = btf_id; + fd.spill_type = STACK_ITER; + fd.off = off; + t = btf_type_by_id(btf, btf_id); + verbose(env, "frame_desc: frame%d: insn_idx=%d off=%d size=%u ref_obj_id=%d iter_%s\n", + frameno, env->insn_idx, off, t->size, reg->ref_obj_id, btf_name_by_offset(btf, t->name_off)); + btf_get(btf); + ret = add_used_btf(env, btf); + if (ret < 0) { + btf_put(btf); + return ret; + } + return push_exception_frame_desc(env, frameno, &fd); +} + +static int gen_exception_frame_desc_stack_entry(struct bpf_verifier_env *env, struct bpf_func_state *frame, int stack_off) +{ + int spi = stack_off / BPF_REG_SIZE, off = -stack_off - 1; + struct bpf_reg_state *reg, not_init_reg, null_reg; + int slot_type, ret; + + __mark_reg_not_init(env, &not_init_reg); + __mark_reg_known_zero(&null_reg); + + slot_type = frame->stack[spi].slot_type[BPF_REG_SIZE - 1]; + reg = &frame->stack[spi].spilled_ptr; + + switch (slot_type) { + case STACK_SPILL: + /* We skip all kinds of scalar registers, except NULL values, which consume a slot. */ + if (is_spilled_scalar_reg(&frame->stack[spi]) && !register_is_null(&frame->stack[spi].spilled_ptr)) + break; + ret = gen_exception_frame_desc_reg_entry(env, reg, off, frame->frameno); + if (ret < 0) + return ret; + break; + case STACK_DYNPTR: + /* Keep iterating until we find the first slot. */ + if (!reg->dynptr.first_slot) + break; + ret = gen_exception_frame_desc_dynptr_entry(env, reg, off, frame->frameno); + if (ret < 0) + return ret; + break; + case STACK_ITER: + /* Keep iterating until we find the first slot. */ + if (!reg->ref_obj_id) + break; + ret = gen_exception_frame_desc_iter_entry(env, reg, off, frame->frameno); + if (ret < 0) + return ret; + break; + case STACK_MISC: + case STACK_INVALID: + /* Create an invalid entry for MISC and INVALID */ + ret = gen_exception_frame_desc_reg_entry(env, &not_init_reg, off, frame->frameno); + if (ret < 0) + return 0; + break; + case STACK_ZERO: + reg = &null_reg; + for (int i = BPF_REG_SIZE - 1; i >= 0; i--) { + if (frame->stack[spi].slot_type[i] != STACK_ZERO) + reg = &not_init_reg; + } + ret = gen_exception_frame_desc_reg_entry(env, &null_reg, off, frame->frameno); + if (ret < 0) + return ret; + break; + default: + verbose(env, "verifier internal error: frame%d stack off=%d slot_type=%d missing handling for exception frame generation\n", + frame->frameno, off, slot_type); + return -EFAULT; + } + return 0; +} + +/* We generate exception descriptors for all frames at the current program + * counter. For caller frames, we use their callsite as their program counter, + * and we go on generating it until the main frame. + * + * It's necessary to detect whether the stack layout is different, in that case + * frame descriptor generation should fail and we cannot really support runtime + * unwinding in that case. + */ +static int gen_exception_frame_descs(struct bpf_verifier_env *env) +{ + struct bpf_reg_state not_init_reg; + int ret; + + __mark_reg_not_init(env, &not_init_reg); + + for (int frameno = env->cur_state->curframe; frameno >= 0; frameno--) { + struct bpf_func_state *frame = env->cur_state->frame[frameno]; + + verbose(env, "frame_desc: frame%d: Stack:\n", frameno); + for (int i = BPF_REG_SIZE - 1; i < frame->allocated_stack; i += BPF_REG_SIZE) { + ret = gen_exception_frame_desc_stack_entry(env, frame, i); + if (ret < 0) + return ret; + } + + verbose(env, "frame_desc: frame%d: Registers:\n", frameno); + for (int i = BPF_REG_6; i < BPF_REG_FP; i++) { + struct bpf_reg_state *reg = &frame->regs[i]; + + /* Treat havoc scalars as incompatible type. */ + if (reg->type == SCALAR_VALUE && !register_is_null(reg)) + reg = &not_init_reg; + ret = gen_exception_frame_desc_reg_entry(env, reg, i, frame->frameno); + if (ret < 0) + return ret; + } + } + return 0; +} + static int check_reference_leak(struct bpf_verifier_env *env, bool exception_exit) { struct bpf_func_state *state = cur_func(env); @@ -17694,12 +18054,18 @@ static int do_check(struct bpf_verifier_env *env) err = check_func_call(env, insn, &env->insn_idx); if (!err && env->cur_state->global_subprog_call_exception) { env->cur_state->global_subprog_call_exception = false; + err = gen_exception_frame_descs(env); + if (err < 0) + return err; exception_exit = true; goto process_bpf_exit_full; } } else if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL) { err = check_kfunc_call(env, insn, &env->insn_idx); if (!err && is_bpf_throw_kfunc(insn)) { + err = gen_exception_frame_descs(env); + if (err < 0) + return err; exception_exit = true; goto process_bpf_exit_full; } @@ -21184,6 +21550,8 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3 mutex_unlock(&bpf_verifier_lock); vfree(env->insn_aux_data); err_free_env: + for (int i = 0; i < env->subprog_cnt; i++) + bpf_exception_frame_desc_tab_free(env->subprog_info[i].fdtab); kfree(env); return ret; } diff --git a/tools/testing/selftests/bpf/progs/exceptions_fail.c b/tools/testing/selftests/bpf/progs/exceptions_fail.c index 28602f905d7d..5a517065b4e6 100644 --- a/tools/testing/selftests/bpf/progs/exceptions_fail.c +++ b/tools/testing/selftests/bpf/progs/exceptions_fail.c @@ -213,7 +213,7 @@ __noinline static int subprog_cb_ref(u32 i, void *ctx) } SEC("?tc") -__failure __msg("Unreleased reference") +__failure __msg("cannot be called from callback subprog 1") int reject_with_cb_reference(void *ctx) { struct foo *f; @@ -235,7 +235,7 @@ int reject_with_cb(void *ctx) } SEC("?tc") -__failure __msg("Unreleased reference") +__success int reject_with_subprog_reference(void *ctx) { return subprog_ref(ctx) + 1;

[RFC,v1,05/14] bpf: Implement BPF exception frame descriptor generation

Checks

Commit Message

Comments

Patch