Message ID | 20240821233440.1855263-8-martin.lau@linux.dev (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | BPF |
Headers | show |
Series | bpf: Add gen_epilogue and allow kfunc call in pro/epilogue | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Clearly marked for bpf-next, async |
netdev/apply | fail | Patch does not apply to bpf-next-0 |
On Wed, Aug 21, 2024 at 4:35 PM Martin KaFai Lau <martin.lau@linux.dev> wrote: > > From: Martin KaFai Lau <martin.lau@kernel.org> > > The existing prologue has been able to call bpf helper but not a kfunc. > This patch allows the prologue/epilogue to call the kfunc. > > The subsystem that implements the .gen_prologue and .gen_epilogue > can add the BPF_PSEUDO_KFUNC_CALL instruction with insn->imm > set to the btf func_id of the kfunc call. This part is the same > as the bpf prog loaded from the sys_bpf. I don't understand the value of this feature, since it seems pretty hard to use. The module (qdisc-bpf or else) would need to do something like patch 8/8: +BTF_ID_LIST(st_ops_epilogue_kfunc_list) +BTF_ID(func, bpf_kfunc_st_ops_inc10) +BTF_ID(func, bpf_kfunc_st_ops_inc100) just to be able to: BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, st_ops_epilogue_kfunc_list[0]); So a bunch of extra work on the module side and a bunch of work in this patch to enable such a pattern, but what is the value? gen_epilogue() can call arbitrary kernel function. It doesn't have to be a helper. kfunc-s provide calling convention conversion from bpf to native, but the same thing is achieved by BPF_CALL_N macro. The module can use that macro without adding an actual bpf helper to uapi bpf.h. Then in gen_epilogue() the extra bpf insn can use: BPF_EMIT_CALL(module_provided_helper_that_is_not_helper) which will use BPF_CALL_IMM(x) ((void *)(x) - (void *)__bpf_call_base) to populate imm. And JITs will emit jump to that wrapper code provided by BPF_CALL_N. And no need for this extra complexity in the verifier and its consumers that have to figure out (module_fd, btf_id) for kfunc just to fit into kfunc pattern with btf_distill_func_proto(). I guess one can argue that if such kfunc is already available to bpf prog then extra BPF_CALL_N wrapper for the same thing is a waste of kernel text, but this patch also adds quite a bit of kernel text. So the cost of BPF_CALL_N (which is a zero on x86) is acceptable.
On 8/21/24 6:32 PM, Alexei Starovoitov wrote: > On Wed, Aug 21, 2024 at 4:35 PM Martin KaFai Lau <martin.lau@linux.dev> wrote: >> >> From: Martin KaFai Lau <martin.lau@kernel.org> >> >> The existing prologue has been able to call bpf helper but not a kfunc. >> This patch allows the prologue/epilogue to call the kfunc. >> >> The subsystem that implements the .gen_prologue and .gen_epilogue >> can add the BPF_PSEUDO_KFUNC_CALL instruction with insn->imm >> set to the btf func_id of the kfunc call. This part is the same >> as the bpf prog loaded from the sys_bpf. > > I don't understand the value of this feature, since it seems > pretty hard to use. > The module (qdisc-bpf or else) would need to do something > like patch 8/8: > +BTF_ID_LIST(st_ops_epilogue_kfunc_list) > +BTF_ID(func, bpf_kfunc_st_ops_inc10) > +BTF_ID(func, bpf_kfunc_st_ops_inc100) > > just to be able to: > BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, > st_ops_epilogue_kfunc_list[0]); > > So a bunch of extra work on the module side and > a bunch of work in this patch to enable such a pattern, > but what is the value? > > gen_epilogue() can call arbitrary kernel function. > It doesn't have to be a helper. > kfunc-s provide calling convention conversion from bpf to native, > but the same thing is achieved by BPF_CALL_N macro. > The module can use that macro without adding an actual bpf helper > to uapi bpf.h. > Then in gen_epilogue() the extra bpf insn can use: > BPF_EMIT_CALL(module_provided_helper_that_is_not_helper) > which will use > BPF_CALL_IMM(x) ((void *)(x) - (void *)__bpf_call_base) BPF_EMIT_CALL() was my earlier thought. I switched to the kfunc in this patch because of the bpf_jit_supports_far_kfunc_call() support for the kernel module. Using kfunc call will make supporting it the same. I think the future bpf-qdisc can enforce built-in. bpf-tcp-cc has already been built-in only also. I think the hid_bpf is built-in only also. Another consideration is also holding the module refcnt when having an attachable bpf prog calling a kernel func implemented in a kernel module. iiuc, this is the reason why aux->kfunc_btf_tab holds a reference to the kernel module. This should not be a problem to struct_ops though because the struct_ops map is the one that is attachable instead of the struct_ops prog. The struct_ops map has already held a refcnt of the module. > to populate imm. > And JITs will emit jump to that wrapper code provided by > BPF_CALL_N. > > And no need for this extra complexity in the verifier and > its consumers that have to figure out (module_fd, btf_id) for > kfunc just to fit into kfunc pattern with btf_distill_func_proto(). > > I guess one can argue that if such kfunc is already available > to bpf prog then extra BPF_CALL_N wrapper for the same thing > is a waste of kernel text, but this patch also adds quite a bit of > kernel text. So the cost of BPF_CALL_N (which is a zero on x86) > is acceptable.
On Wed, Aug 21, 2024 at 11:10 PM Martin KaFai Lau <martin.lau@linux.dev> wrote: > > On 8/21/24 6:32 PM, Alexei Starovoitov wrote: > > On Wed, Aug 21, 2024 at 4:35 PM Martin KaFai Lau <martin.lau@linux.dev> wrote: > >> > >> From: Martin KaFai Lau <martin.lau@kernel.org> > >> > >> The existing prologue has been able to call bpf helper but not a kfunc. > >> This patch allows the prologue/epilogue to call the kfunc. > >> > >> The subsystem that implements the .gen_prologue and .gen_epilogue > >> can add the BPF_PSEUDO_KFUNC_CALL instruction with insn->imm > >> set to the btf func_id of the kfunc call. This part is the same > >> as the bpf prog loaded from the sys_bpf. > > > > I don't understand the value of this feature, since it seems > > pretty hard to use. > > The module (qdisc-bpf or else) would need to do something > > like patch 8/8: > > +BTF_ID_LIST(st_ops_epilogue_kfunc_list) > > +BTF_ID(func, bpf_kfunc_st_ops_inc10) > > +BTF_ID(func, bpf_kfunc_st_ops_inc100) > > > > just to be able to: > > BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, > > st_ops_epilogue_kfunc_list[0]); > > > > So a bunch of extra work on the module side and > > a bunch of work in this patch to enable such a pattern, > > but what is the value? > > > > gen_epilogue() can call arbitrary kernel function. > > It doesn't have to be a helper. > > kfunc-s provide calling convention conversion from bpf to native, > > but the same thing is achieved by BPF_CALL_N macro. > > The module can use that macro without adding an actual bpf helper > > to uapi bpf.h. > > Then in gen_epilogue() the extra bpf insn can use: > > BPF_EMIT_CALL(module_provided_helper_that_is_not_helper) > > which will use > > BPF_CALL_IMM(x) ((void *)(x) - (void *)__bpf_call_base) > > BPF_EMIT_CALL() was my earlier thought. I switched to the kfunc in this patch > because of the bpf_jit_supports_far_kfunc_call() support for the kernel module. > Using kfunc call will make supporting it the same. I believe far calls are typically slower, so it may be a foot gun. If something like qdisc-bpf adding a function call to bpf_exit it will be called every time the program is called, so it needs to be really fast. Allowing such callable funcs in modules may be a performance issue that we'd need to fix. So imo making a design requirement that such funcs for gen_epilogoue() need to be in kernel text is a good thing. > I think the future bpf-qdisc can enforce built-in. bpf-tcp-cc has already been > built-in only also. I think the hid_bpf is built-in only also. I don't think hid_bpf has any need for such gen_epilogue() adjustment. tcp-bpf-cc probably doesn't need it either. it's cleaner to fix up on the kernel side, no? qdisc-bpf and ->dev stuff is probably the only upcoming user. And that's a separate discussion. I'm not sure such gen_epilogoue() concept is really that great. Especially considering all the complexity involved.
On 8/22/24 6:47 AM, Alexei Starovoitov wrote: > On Wed, Aug 21, 2024 at 11:10 PM Martin KaFai Lau <martin.lau@linux.dev> wrote: >> >> On 8/21/24 6:32 PM, Alexei Starovoitov wrote: >>> On Wed, Aug 21, 2024 at 4:35 PM Martin KaFai Lau <martin.lau@linux.dev> wrote: >>>> >>>> From: Martin KaFai Lau <martin.lau@kernel.org> >>>> >>>> The existing prologue has been able to call bpf helper but not a kfunc. >>>> This patch allows the prologue/epilogue to call the kfunc. >>>> >>>> The subsystem that implements the .gen_prologue and .gen_epilogue >>>> can add the BPF_PSEUDO_KFUNC_CALL instruction with insn->imm >>>> set to the btf func_id of the kfunc call. This part is the same >>>> as the bpf prog loaded from the sys_bpf. >>> >>> I don't understand the value of this feature, since it seems >>> pretty hard to use. >>> The module (qdisc-bpf or else) would need to do something >>> like patch 8/8: >>> +BTF_ID_LIST(st_ops_epilogue_kfunc_list) >>> +BTF_ID(func, bpf_kfunc_st_ops_inc10) >>> +BTF_ID(func, bpf_kfunc_st_ops_inc100) >>> >>> just to be able to: >>> BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, >>> st_ops_epilogue_kfunc_list[0]); >>> >>> So a bunch of extra work on the module side and >>> a bunch of work in this patch to enable such a pattern, >>> but what is the value? >>> >>> gen_epilogue() can call arbitrary kernel function. >>> It doesn't have to be a helper. >>> kfunc-s provide calling convention conversion from bpf to native, >>> but the same thing is achieved by BPF_CALL_N macro. >>> The module can use that macro without adding an actual bpf helper >>> to uapi bpf.h. >>> Then in gen_epilogue() the extra bpf insn can use: >>> BPF_EMIT_CALL(module_provided_helper_that_is_not_helper) >>> which will use >>> BPF_CALL_IMM(x) ((void *)(x) - (void *)__bpf_call_base) >> >> BPF_EMIT_CALL() was my earlier thought. I switched to the kfunc in this patch >> because of the bpf_jit_supports_far_kfunc_call() support for the kernel module. >> Using kfunc call will make supporting it the same. > > I believe far calls are typically slower, > so it may be a foot gun. > If something like qdisc-bpf adding a function call to bpf_exit > it will be called every time the program is called, so > it needs to be really fast. > Allowing such callable funcs in modules may be a performance issue > that we'd need to fix. > So imo making a design requirement that such funcs for gen_epilogoue() > need to be in kernel text is a good thing. Agreed. Make sense. > >> I think the future bpf-qdisc can enforce built-in. bpf-tcp-cc has already been >> built-in only also. I think the hid_bpf is built-in only also. > > I don't think hid_bpf has any need for such gen_epilogue() adjustment. > tcp-bpf-cc probably doesn't need it either. > it's cleaner to fix up on the kernel side, no? tcp-bpf-cc can use it to fix snd_cwnd. We have seen a mistake that snd_cwnd was set to 0 (or negative, can't remember which one). >1 ops of the tcp_congestion_ops may update the snd_cwnd, so there will be multiple places it needs to do an extra check/fix in the kernel. It is usually not the fast path, so may be ok. It is not catastrophic as skb->dev. kfunc was not introduced at that time also. Otherwise, having a kfunc to set the snd_cwnd instead could have been an option. > qdisc-bpf and ->dev stuff is probably the only upcoming user. For skb->dev, may be having a dedicated kfuncs for skb->dev manipulation is the way to go? The example could be operations that need to touch the skb->rbnode/dev sharing pointer. For fixing ->dev in the kernel, there are multiple places doing ->dequeue and not sure if we need to include the child->dequeue also. This fixing could be refactored to a kernel function and probably need to a static key in this fast path case. > And that's a separate discussion. I'm not sure such gen_epilogoue() > concept is really that great. > Especially considering all the complexity involved. I am curious on the problem you pointed out at patch 1 regardless, I am going to give it a try and remove the kfunc call. I made kfunc call separated at patch 7 and 8 :) If it still looks too complex or there is no value on gen_epilogue, I am fine to table this set.
On Thu, Aug 22, 2024 at 10:38 AM Martin KaFai Lau <martin.lau@linux.dev> wrote: > > On 8/22/24 6:47 AM, Alexei Starovoitov wrote: > > On Wed, Aug 21, 2024 at 11:10 PM Martin KaFai Lau <martin.lau@linux.dev> wrote: > >> > >> On 8/21/24 6:32 PM, Alexei Starovoitov wrote: > >>> On Wed, Aug 21, 2024 at 4:35 PM Martin KaFai Lau <martin.lau@linux.dev> wrote: > >>>> > >>>> From: Martin KaFai Lau <martin.lau@kernel.org> > >>>> > >>>> The existing prologue has been able to call bpf helper but not a kfunc. > >>>> This patch allows the prologue/epilogue to call the kfunc. > >>>> > >>>> The subsystem that implements the .gen_prologue and .gen_epilogue > >>>> can add the BPF_PSEUDO_KFUNC_CALL instruction with insn->imm > >>>> set to the btf func_id of the kfunc call. This part is the same > >>>> as the bpf prog loaded from the sys_bpf. > >>> > >>> I don't understand the value of this feature, since it seems > >>> pretty hard to use. > >>> The module (qdisc-bpf or else) would need to do something > >>> like patch 8/8: > >>> +BTF_ID_LIST(st_ops_epilogue_kfunc_list) > >>> +BTF_ID(func, bpf_kfunc_st_ops_inc10) > >>> +BTF_ID(func, bpf_kfunc_st_ops_inc100) > >>> > >>> just to be able to: > >>> BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, > >>> st_ops_epilogue_kfunc_list[0]); > >>> > >>> So a bunch of extra work on the module side and > >>> a bunch of work in this patch to enable such a pattern, > >>> but what is the value? > >>> > >>> gen_epilogue() can call arbitrary kernel function. > >>> It doesn't have to be a helper. > >>> kfunc-s provide calling convention conversion from bpf to native, > >>> but the same thing is achieved by BPF_CALL_N macro. > >>> The module can use that macro without adding an actual bpf helper > >>> to uapi bpf.h. > >>> Then in gen_epilogue() the extra bpf insn can use: > >>> BPF_EMIT_CALL(module_provided_helper_that_is_not_helper) > >>> which will use > >>> BPF_CALL_IMM(x) ((void *)(x) - (void *)__bpf_call_base) > >> > >> BPF_EMIT_CALL() was my earlier thought. I switched to the kfunc in this patch > >> because of the bpf_jit_supports_far_kfunc_call() support for the kernel module. > >> Using kfunc call will make supporting it the same. > > > > I believe far calls are typically slower, > > so it may be a foot gun. > > If something like qdisc-bpf adding a function call to bpf_exit > > it will be called every time the program is called, so > > it needs to be really fast. > > Allowing such callable funcs in modules may be a performance issue > > that we'd need to fix. > > So imo making a design requirement that such funcs for gen_epilogoue() > > need to be in kernel text is a good thing. > > Agreed. Make sense. > > > > >> I think the future bpf-qdisc can enforce built-in. bpf-tcp-cc has already been > >> built-in only also. I think the hid_bpf is built-in only also. > > > > I don't think hid_bpf has any need for such gen_epilogue() adjustment. > > tcp-bpf-cc probably doesn't need it either. > > it's cleaner to fix up on the kernel side, no? > > tcp-bpf-cc can use it to fix snd_cwnd. We have seen a mistake that snd_cwnd was > set to 0 (or negative, can't remember which one). >1 ops of the > tcp_congestion_ops may update the snd_cwnd, so there will be multiple places it > needs to do an extra check/fix in the kernel. It is usually not the fast path, > so may be ok. > > It is not catastrophic as skb->dev. kfunc was not introduced at that time also. > Otherwise, having a kfunc to set the snd_cwnd instead could have been an option. > > > qdisc-bpf and ->dev stuff is probably the only upcoming user. > > For skb->dev, may be having a dedicated kfuncs for skb->dev manipulation is the > way to go? The example could be operations that need to touch the > skb->rbnode/dev sharing pointer. > > For fixing ->dev in the kernel, there are multiple places doing ->dequeue and > not sure if we need to include the child->dequeue also. This fixing could be > refactored to a kernel function and probably need to a static key in this fast > path case. > > > And that's a separate discussion. I'm not sure such gen_epilogoue() > > concept is really that great. > > Especially considering all the complexity involved. > > I am curious on the problem you pointed out at patch 1 regardless, I am going to > give it a try and remove the kfunc call. I made kfunc call separated at patch 7 > and 8 :) > > If it still looks too complex or there is no value on gen_epilogue, I am fine to > table this set. I think the patches 1-6 are fine and good to go. Mainly because they simplify landing of qdisc-bpf. Once all pieces are there we may revisit the need for gen_epilogoue() and whether there is an alternative. I would only drop 7 and 8 for now until it's absolutely needed.
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 5e995b7884fb..f4ac254a7661 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -2787,6 +2787,60 @@ static struct btf *find_kfunc_desc_btf(struct bpf_verifier_env *env, s16 offset) return btf_vmlinux ?: ERR_PTR(-ENOENT); } +static int find_kfunc_desc_btf_offset(struct bpf_verifier_env *env, struct btf *btf, + struct module *module, s16 *offset) +{ + struct bpf_kfunc_btf_tab *tab; + struct bpf_kfunc_btf *b; + s16 new_offset = 1; /* 0 is reserved for btf_vmlinux */ + u32 i; + + if (btf_is_vmlinux(btf)) { + *offset = 0; + return 0; + } + + tab = env->prog->aux->kfunc_btf_tab; + if (!tab) { + tab = kzalloc(sizeof(*tab), GFP_KERNEL); + if (!tab) + return -ENOMEM; + env->prog->aux->kfunc_btf_tab = tab; + } + + b = tab->descs; + for (i = 0; i < tab->nr_descs; i++) { + if (b[i].btf == btf) { + *offset = b[i].offset; + return 0; + } + /* tab->nr_descs (from the sys_bpf) max out at MAX_KFUNC_BTFS + * which is smaller than S16_MAX, so it will be able to find + * a new_offset to use. + */ + if (new_offset == b[i].offset) + new_offset++; + } + + if (tab->nr_descs == MAX_KFUNC_BTFS) { + verbose(env, "too many different module BTFs\n"); + return -E2BIG; + } + + if (!try_module_get(module)) + return -ENXIO; + + b = &tab->descs[tab->nr_descs++]; + btf_get(btf); + b->btf = btf; + b->module = module; + b->offset = new_offset; + *offset = new_offset; + sort(tab->descs, tab->nr_descs, sizeof(tab->descs[0]), + kfunc_btf_cmp_by_off, NULL); + return 0; +} + static int add_kfunc_call(struct bpf_verifier_env *env, u32 func_id, s16 offset) { const struct btf_type *func, *func_proto; @@ -19603,6 +19657,50 @@ static int opt_subreg_zext_lo32_rnd_hi32(struct bpf_verifier_env *env, return 0; } +static int fixup_pro_epilogue_kfunc(struct bpf_verifier_env *env, struct bpf_insn *insns, + int cnt, struct module *module) +{ + struct btf *btf; + u32 func_id; + int i, err; + s16 offset; + + for (i = 0; i < cnt; i++) { + if (!bpf_pseudo_kfunc_call(&insns[i])) + continue; + + /* The kernel may not have BTF available, so only + * try to get a btf if the pro/epilogue calls a kfunc. + */ + btf = btf_get_module_btf(module); + if (IS_ERR_OR_NULL(btf)) { + verbose(env, "cannot find BTF from %s for kfunc used in pro/epilogue\n", + module_name(module)); + return -EINVAL; + } + + func_id = insns[i].imm; + if (btf_is_vmlinux(btf) && + btf_id_set_contains(&special_kfunc_set, func_id)) { + verbose(env, "pro/epilogue cannot use special kfunc\n"); + btf_put(btf); + return -EINVAL; + } + + err = find_kfunc_desc_btf_offset(env, btf, module, &offset); + btf_put(btf); + if (err) + return err; + + insns[i].off = offset; + err = add_kfunc_call(env, func_id, offset); + if (err) + return err; + } + + return 0; +} + /* convert load instructions that access fields of a context type into a * sequence of instructions that access fields of the underlying structure: * struct __sk_buff -> struct sk_buff @@ -19612,21 +19710,27 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env) { struct bpf_subprog_info *subprogs = env->subprog_info; const struct bpf_verifier_ops *ops = env->ops; - int i, cnt, size, ctx_field_size, delta = 0, epilogue_cnt = 0; + int err, i, cnt, size, ctx_field_size, delta = 0, epilogue_cnt = 0; const int insn_cnt = env->prog->len; struct bpf_insn insn_buf[16], epilogue_buf[16], *insn; u32 target_size, size_default, off; struct bpf_prog *new_prog; enum bpf_access_type type; bool is_narrower_load; + struct module *module; if (ops->gen_epilogue) { + module = NULL; epilogue_cnt = ops->gen_epilogue(epilogue_buf, env->prog, - -(subprogs[0].stack_depth + 8), NULL); + -(subprogs[0].stack_depth + 8), &module); if (epilogue_cnt >= ARRAY_SIZE(epilogue_buf)) { verbose(env, "bpf verifier is misconfigured\n"); return -EINVAL; } else if (epilogue_cnt) { + err = fixup_pro_epilogue_kfunc(env, epilogue_buf, epilogue_cnt, module); + if (err) + return err; + /* Save the ARG_PTR_TO_CTX for the epilogue to use */ cnt = 0; subprogs[0].stack_depth += 8; @@ -19646,12 +19750,17 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env) verbose(env, "bpf verifier is misconfigured\n"); return -EINVAL; } + module = NULL; cnt = ops->gen_prologue(insn_buf, env->seen_direct_write, - env->prog, NULL); + env->prog, &module); if (cnt >= ARRAY_SIZE(insn_buf)) { verbose(env, "bpf verifier is misconfigured\n"); return -EINVAL; } else if (cnt) { + err = fixup_pro_epilogue_kfunc(env, insn_buf, cnt, module); + if (err) + return err; + new_prog = bpf_patch_insn_data(env, 0, insn_buf, cnt); if (!new_prog) return -ENOMEM;