[v2,bpf-next,7/8] bpf: Allow pro/epilogue to call kfunc

Message ID	20240821233440.1855263-8-martin.lau@linux.dev (mailing list archive)
State	Changes Requested
Delegated to:	BPF
Headers	From: Martin KaFai Lau <martin.lau@linux.dev>
To: bpf@vger.kernel.org
Cc: Alexei Starovoitov <ast@kernel.org>, Andrii Nakryiko <andrii@kernel.org>, Daniel Borkmann <daniel@iogearbox.net>, Eduard Zingerman <eddyz87@gmail.com>, Yonghong Song <yonghong.song@linux.dev>, Amery Hung <ameryhung@gmail.com>, kernel-team@meta.com
Subject: [PATCH v2 bpf-next 7/8] bpf: Allow pro/epilogue to call kfunc
Date: Wed, 21 Aug 2024 16:34:37 -0700
Message-ID: <20240821233440.1855263-8-martin.lau@linux.dev>
In-Reply-To: <20240821233440.1855263-1-martin.lau@linux.dev>
References: <20240821233440.1855263-1-martin.lau@linux.dev>
Series	bpf: Add gen_epilogue and allow kfunc call in pro/epilogue
[v2,bpf-next,0/8] bpf: Add gen_epilogue and allow kfunc call in pro/epilogue
[v2,bpf-next,1/8] bpf: Add gen_epilogue to bpf_verifier_ops
[v2,bpf-next,2/8] bpf: Export bpf_base_func_proto
[v2,bpf-next,3/8] selftests/bpf: attach struct_ops maps before test prog runs
[v2,bpf-next,4/8] selftests/bpf: Test gen_prologue and gen_epilogue
[v2,bpf-next,5/8] selftests/bpf: Add tailcall epilogue test
[v2,bpf-next,6/8] bpf: Add module parameter to gen_prologue and gen_epilogue
[v2,bpf-next,7/8] bpf: Allow pro/epilogue to call kfunc
[v2,bpf-next,8/8] selftests/bpf: Add kfunc call test in gen_prologue and gen_epilogue

Context	Check	Description
netdev/tree_selection	success	Clearly marked for bpf-next, async
netdev/apply	fail	Patch does not apply to bpf-next-0

Martin KaFai Lau Aug. 21, 2024, 11:34 p.m. UTC

From: Martin KaFai Lau <martin.lau@kernel.org>

The existing prologue can already call a bpf helper but not a kfunc.
This patch allows the prologue/epilogue to call a kfunc.

The subsystem that implements the .gen_prologue and .gen_epilogue
can add the BPF_PSEUDO_KFUNC_CALL instruction with insn->imm
set to the btf func_id of the kfunc call. This part is the same
as the bpf prog loaded from the sys_bpf.
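For illustration, here is a minimal userspace sketch of the call instruction described above. It assumes the struct layout and opcode constants of include/uapi/linux/bpf.h and the BPF_RAW_INSN macro of include/linux/filter.h, re-declared locally so the snippet builds outside the kernel. The names emit_kfunc_call() and is_kfunc_call() are hypothetical and are not part of this patch.

```c
#include <stdint.h>

/* Local mirror of the uapi struct bpf_insn layout (assumed, see lead-in). */
struct bpf_insn {
	uint8_t code;        /* opcode */
	uint8_t dst_reg : 4; /* destination register */
	uint8_t src_reg : 4; /* source register, carries the pseudo-call type */
	int16_t off;         /* signed offset */
	int32_t imm;         /* signed immediate */
};

/* Opcode bits and pseudo-call marker, with the same values as the uapi header. */
#define BPF_JMP               0x05
#define BPF_CALL              0x80
#define BPF_PSEUDO_KFUNC_CALL 2

/* Same argument order as the kernel macro: code, dst, src, off, imm. */
#define BPF_RAW_INSN(CODE, DST, SRC, OFF, IMM)          \
	((struct bpf_insn){ .code = (CODE),             \
			    .dst_reg = (DST),           \
			    .src_reg = (SRC),           \
			    .off = (OFF),               \
			    .imm = (IMM) })

/*
 * Hypothetical helper: build a kfunc call whose imm is the BTF func_id.
 * off is left at 0, which corresponds to the vmlinux BTF case.
 */
static struct bpf_insn emit_kfunc_call(int32_t btf_func_id)
{
	return BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0,
			    btf_func_id);
}

/* Hypothetical check: a call instruction marked as a kfunc call via src_reg. */
static int is_kfunc_call(const struct bpf_insn *insn)
{
	return insn->code == (BPF_JMP | BPF_CALL) &&
	       insn->src_reg == BPF_PSEUDO_KFUNC_CALL;
}
```

In this sketch a subsystem would call emit_kfunc_call() with an id taken from its BTF_ID_LIST; the id itself is resolved at kernel build time and is not something the snippet can know.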

Another piece is to have a way for the subsystem to tell the btf object
of the kfunc func_id. This patch uses the "struct module **module"
argument added to the .gen_prologue and .gen_epilogue
in the previous patch. The verifier will use btf_get_module_btf(module)
to find out the btf object.

The .gen_epi/prologue will usually set "*module = THIS_MODULE".
For now, only kfunc(s) from one module (or vmlinux)
can be used in the .gen_epi/prologue. In the future, the
.gen_epi/prologue can return an array of modules and use the
insn->off as an index into the array.

When the returned module is NULL, the btf is btf_vmlinux. Then the
insn->off stays at 0. This is the same as the sys_bpf.

When the btf is from a module, the btf needs an entry in
prog->aux->kfunc_btf_tab. The kfunc_btf_tab is currently
sorted by insn->off which is the offset to the attr->fd_array.

This module btf may or may not already be in the kfunc_btf_tab. A new function
"find_kfunc_desc_btf_offset" is added to search for an existing entry
that has the same btf. If one is found, its offset is used in
the insn->off. If not, the function picks an offset value
that is not used in the kfunc_btf_tab, adds a new entry
to kfunc_btf_tab, and sets the insn->off to this new offset.

Once the insn->off is determined (either reuse an existing one
or an unused one is found), it will call the existing add_kfunc_call()
and everything else should fall through.

Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
---
 kernel/bpf/verifier.c | 115 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 112 insertions(+), 3 deletions(-)
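
The offset lookup described in the commit message can be sketched in userspace as follows. This is not the code from the patch: struct btf_tab, struct btf_entry, MAX_KFUNC_BTFS and find_or_add_btf_offset() are hypothetical stand-ins, and the choice of the smallest unused positive offset is an assumption made for the example (the patch may pick an unused value differently). Offset 0 is treated as reserved for vmlinux, matching the commit message.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_KFUNC_BTFS 8 /* hypothetical capacity for this sketch */

struct btf_entry {
	const void *btf; /* stand-in for a struct btf pointer */
	int16_t offset;  /* value that would be written to insn->off */
};

struct btf_tab {
	struct btf_entry descs[MAX_KFUNC_BTFS]; /* kept sorted by offset */
	size_t nr;
};

/*
 * Return the offset already associated with btf, or insert a new entry
 * with the smallest unused positive offset. Returns -1 if the table is full.
 */
static int find_or_add_btf_offset(struct btf_tab *tab, const void *btf)
{
	int16_t candidate = 1; /* 0 is reserved for vmlinux */
	size_t i, pos;

	/* Reuse the existing entry if this btf is already in the table. */
	for (i = 0; i < tab->nr; i++)
		if (tab->descs[i].btf == btf)
			return tab->descs[i].offset;

	if (tab->nr == MAX_KFUNC_BTFS)
		return -1;

	/* The table is sorted, so one pass finds the first gap in offsets. */
	for (pos = 0; pos < tab->nr; pos++) {
		if (tab->descs[pos].offset > candidate)
			break;
		if (tab->descs[pos].offset == candidate)
			candidate++;
	}

	/* Shift later entries up by one to keep the table sorted. */
	for (i = tab->nr; i > pos; i--)
		tab->descs[i] = tab->descs[i - 1];

	tab->descs[pos].btf = btf;
	tab->descs[pos].offset = candidate;
	tab->nr++;
	return candidate;
}
```

Keeping the table sorted by offset means the search for a free value and the insertion position both fall out of the same pass.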

Alexei Starovoitov Aug. 22, 2024, 1:32 a.m. UTC | #1

On Wed, Aug 21, 2024 at 4:35 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> From: Martin KaFai Lau <martin.lau@kernel.org>
>
> The existing prologue has been able to call bpf helper but not a kfunc.
> This patch allows the prologue/epilogue to call the kfunc.
>
> The subsystem that implements the .gen_prologue and .gen_epilogue
> can add the BPF_PSEUDO_KFUNC_CALL instruction with insn->imm
> set to the btf func_id of the kfunc call. This part is the same
> as the bpf prog loaded from the sys_bpf.

I don't understand the value of this feature, since it seems
pretty hard to use.
The module (qdisc-bpf or else) would need to do something
like patch 8/8:
+BTF_ID_LIST(st_ops_epilogue_kfunc_list)
+BTF_ID(func, bpf_kfunc_st_ops_inc10)
+BTF_ID(func, bpf_kfunc_st_ops_inc100)

just to be able to:
  BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0,
               st_ops_epilogue_kfunc_list[0]);

So a bunch of extra work on the module side and
a bunch of work in this patch to enable such a pattern,
but what is the value?

gen_epilogue() can call arbitrary kernel function.
It doesn't have to be a helper.
kfunc-s provide calling convention conversion from bpf to native,
but the same thing is achieved by BPF_CALL_N macro.
The module can use that macro without adding an actual bpf helper
to uapi bpf.h.
Then in gen_epilogue() the extra bpf insn can use:
BPF_EMIT_CALL(module_provided_helper_that_is_not_helper)
which will use
BPF_CALL_IMM(x) ((void *)(x) - (void *)__bpf_call_base)
to populate imm.
And JITs will emit jump to that wrapper code provided by
BPF_CALL_N.

And no need for this extra complexity in the verifier and
its consumers that have to figure out (module_fd, btf_id) for
kfunc just to fit into kfunc pattern with btf_distill_func_proto().

I guess one can argue that if such kfunc is already available
to bpf prog then extra BPF_CALL_N wrapper for the same thing
is a waste of kernel text, but this patch also adds quite a bit of
kernel text. So the cost of BPF_CALL_N (which is zero on x86)
is acceptable.
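
As a rough userspace illustration of the BPF_CALL_IMM() arithmetic quoted above: imm holds the target address as a signed 32-bit distance from a base function, and the call site recovers the address by adding imm back to that base. In this sketch fake_call_base() stands in for __bpf_call_base, module_fixup() stands in for a BPF_CALL_N-style wrapper taking five u64 arguments, and encode_call_imm()/resolve_call_imm() are hypothetical names. The explicit range check is this example's own addition; it shows why a target more than 2 GiB away from the base cannot be expressed in a 32-bit imm.

```c
#include <stdint.h>

/* Shape of a BPF_CALL_N-style wrapper: five u64 arguments, u64 result. */
typedef uint64_t (*bpf_func_t)(uint64_t, uint64_t, uint64_t, uint64_t,
			       uint64_t);

/* Stand-in for __bpf_call_base: only its address matters here. */
static uint64_t fake_call_base(uint64_t r1, uint64_t r2, uint64_t r3,
			       uint64_t r4, uint64_t r5)
{
	(void)r1; (void)r2; (void)r3; (void)r4; (void)r5;
	return 0;
}

/* Hypothetical function that an epilogue might call. */
static uint64_t module_fixup(uint64_t r1, uint64_t r2, uint64_t r3,
			     uint64_t r4, uint64_t r5)
{
	(void)r2; (void)r3; (void)r4; (void)r5;
	return r1 + 10;
}

/* Encode fn as a 32-bit distance from the base; -1 if it does not fit. */
static int encode_call_imm(bpf_func_t fn, int32_t *imm)
{
	int64_t delta = (int64_t)((intptr_t)fn - (intptr_t)fake_call_base);

	if (delta < INT32_MIN || delta > INT32_MAX)
		return -1; /* target too far from the base for a 32-bit imm */
	*imm = (int32_t)delta;
	return 0;
}

/* Recover the call target from imm, as a call site would. */
static bpf_func_t resolve_call_imm(int32_t imm)
{
	return (bpf_func_t)((intptr_t)fake_call_base + imm);
}
```

Converting function pointers to intptr_t is implementation-defined in C; it behaves as expected on the common Linux toolchains, which is all this sketch assumes.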

Martin KaFai Lau Aug. 22, 2024, 6:09 a.m. UTC | #2

On 8/21/24 6:32 PM, Alexei Starovoitov wrote:
> On Wed, Aug 21, 2024 at 4:35 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>
>> From: Martin KaFai Lau <martin.lau@kernel.org>
>>
>> The existing prologue has been able to call bpf helper but not a kfunc.
>> This patch allows the prologue/epilogue to call the kfunc.
>>
>> The subsystem that implements the .gen_prologue and .gen_epilogue
>> can add the BPF_PSEUDO_KFUNC_CALL instruction with insn->imm
>> set to the btf func_id of the kfunc call. This part is the same
>> as the bpf prog loaded from the sys_bpf.
> 
> I don't understand the value of this feature, since it seems
> pretty hard to use.
> The module (qdisc-bpf or else) would need to do something
> like patch 8/8:
> +BTF_ID_LIST(st_ops_epilogue_kfunc_list)
> +BTF_ID(func, bpf_kfunc_st_ops_inc10)
> +BTF_ID(func, bpf_kfunc_st_ops_inc100)
> 
> just to be able to:
>    BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0,
>                 st_ops_epilogue_kfunc_list[0]);
> 
> So a bunch of extra work on the module side and
> a bunch of work in this patch to enable such a pattern,
> but what is the value?
> 
> gen_epilogue() can call arbitrary kernel function.
> It doesn't have to be a helper.
> kfunc-s provide calling convention conversion from bpf to native,
> but the same thing is achieved by BPF_CALL_N macro.
> The module can use that macro without adding an actual bpf helper
> to uapi bpf.h.
> Then in gen_epilogue() the extra bpf insn can use:
> BPF_EMIT_CALL(module_provided_helper_that_is_not_helper)
> which will use
> BPF_CALL_IMM(x) ((void *)(x) - (void *)__bpf_call_base)

BPF_EMIT_CALL() was my earlier thought. I switched to the kfunc in this patch
because of the bpf_jit_supports_far_kfunc_call() support for the kernel module.
Using a kfunc call means module functions are supported the same way.

I think the future bpf-qdisc can enforce built-in. bpf-tcp-cc is already
built-in only. I think hid_bpf is built-in only as well.

Another consideration is holding the module refcnt when an
attachable bpf prog calls a kernel func implemented in a kernel module. iiuc,
this is the reason why aux->kfunc_btf_tab holds a reference to the kernel
module. This should not be a problem for struct_ops though, because the struct_ops
map is the one that is attachable instead of the struct_ops prog. The struct_ops
map already holds a refcnt of the module.

> to populate imm.
> And JITs will emit jump to that wrapper code provided by
> BPF_CALL_N.
> 
> And no need for this extra complexity in the verifier and
> its consumers that have to figure out (module_fd, btf_id) for
> kfunc just to fit into kfunc pattern with btf_distill_func_proto().
> 
> I guess one can argue that if such kfunc is already available
> to bpf prog then extra BPF_CALL_N wrapper for the same thing
> is a waste of kernel text, but this patch also adds quite a bit of
> kernel text. So the cost of BPF_CALL_N (which is a zero on x86)
> is acceptable.

Alexei Starovoitov Aug. 22, 2024, 1:47 p.m. UTC | #3

On Wed, Aug 21, 2024 at 11:10 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 8/21/24 6:32 PM, Alexei Starovoitov wrote:
> > On Wed, Aug 21, 2024 at 4:35 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> >>
> >> From: Martin KaFai Lau <martin.lau@kernel.org>
> >>
> >> The existing prologue has been able to call bpf helper but not a kfunc.
> >> This patch allows the prologue/epilogue to call the kfunc.
> >>
> >> The subsystem that implements the .gen_prologue and .gen_epilogue
> >> can add the BPF_PSEUDO_KFUNC_CALL instruction with insn->imm
> >> set to the btf func_id of the kfunc call. This part is the same
> >> as the bpf prog loaded from the sys_bpf.
> >
> > I don't understand the value of this feature, since it seems
> > pretty hard to use.
> > The module (qdisc-bpf or else) would need to do something
> > like patch 8/8:
> > +BTF_ID_LIST(st_ops_epilogue_kfunc_list)
> > +BTF_ID(func, bpf_kfunc_st_ops_inc10)
> > +BTF_ID(func, bpf_kfunc_st_ops_inc100)
> >
> > just to be able to:
> >    BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0,
> >                 st_ops_epilogue_kfunc_list[0]);
> >
> > So a bunch of extra work on the module side and
> > a bunch of work in this patch to enable such a pattern,
> > but what is the value?
> >
> > gen_epilogue() can call arbitrary kernel function.
> > It doesn't have to be a helper.
> > kfunc-s provide calling convention conversion from bpf to native,
> > but the same thing is achieved by BPF_CALL_N macro.
> > The module can use that macro without adding an actual bpf helper
> > to uapi bpf.h.
> > Then in gen_epilogue() the extra bpf insn can use:
> > BPF_EMIT_CALL(module_provided_helper_that_is_not_helper)
> > which will use
> > BPF_CALL_IMM(x) ((void *)(x) - (void *)__bpf_call_base)
>
> BPF_EMIT_CALL() was my earlier thought. I switched to the kfunc in this patch
> because of the bpf_jit_supports_far_kfunc_call() support for the kernel module.
> Using kfunc call will make supporting it the same.

I believe far calls are typically slower,
so it may be a foot gun.
If something like qdisc-bpf adds a function call at bpf_exit,
it will be called every time the program is called, so
it needs to be really fast.
Allowing such callable funcs in modules may be a performance issue
that we'd need to fix.
So imo making a design requirement that such funcs for gen_epilogue()
need to be in kernel text is a good thing.

> I think the future bpf-qdisc can enforce built-in. bpf-tcp-cc has already been
> built-in only also. I think the hid_bpf is built-in only also.

I don't think hid_bpf has any need for such gen_epilogue() adjustment.
tcp-bpf-cc probably doesn't need it either.
It's cleaner to fix up on the kernel side, no?
qdisc-bpf and ->dev stuff is probably the only upcoming user.
And that's a separate discussion. I'm not sure such a gen_epilogue()
concept is really that great.
Especially considering all the complexity involved.

Martin KaFai Lau Aug. 22, 2024, 5:38 p.m. UTC | #4

On 8/22/24 6:47 AM, Alexei Starovoitov wrote:
> On Wed, Aug 21, 2024 at 11:10 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>
>> On 8/21/24 6:32 PM, Alexei Starovoitov wrote:
>>> On Wed, Aug 21, 2024 at 4:35 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>>>
>>>> From: Martin KaFai Lau <martin.lau@kernel.org>
>>>>
>>>> The existing prologue has been able to call bpf helper but not a kfunc.
>>>> This patch allows the prologue/epilogue to call the kfunc.
>>>>
>>>> The subsystem that implements the .gen_prologue and .gen_epilogue
>>>> can add the BPF_PSEUDO_KFUNC_CALL instruction with insn->imm
>>>> set to the btf func_id of the kfunc call. This part is the same
>>>> as the bpf prog loaded from the sys_bpf.
>>>
>>> I don't understand the value of this feature, since it seems
>>> pretty hard to use.
>>> The module (qdisc-bpf or else) would need to do something
>>> like patch 8/8:
>>> +BTF_ID_LIST(st_ops_epilogue_kfunc_list)
>>> +BTF_ID(func, bpf_kfunc_st_ops_inc10)
>>> +BTF_ID(func, bpf_kfunc_st_ops_inc100)
>>>
>>> just to be able to:
>>>     BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0,
>>>                  st_ops_epilogue_kfunc_list[0]);
>>>
>>> So a bunch of extra work on the module side and
>>> a bunch of work in this patch to enable such a pattern,
>>> but what is the value?
>>>
>>> gen_epilogue() can call arbitrary kernel function.
>>> It doesn't have to be a helper.
>>> kfunc-s provide calling convention conversion from bpf to native,
>>> but the same thing is achieved by BPF_CALL_N macro.
>>> The module can use that macro without adding an actual bpf helper
>>> to uapi bpf.h.
>>> Then in gen_epilogue() the extra bpf insn can use:
>>> BPF_EMIT_CALL(module_provided_helper_that_is_not_helper)
>>> which will use
>>> BPF_CALL_IMM(x) ((void *)(x) - (void *)__bpf_call_base)
>>
>> BPF_EMIT_CALL() was my earlier thought. I switched to the kfunc in this patch
>> because of the bpf_jit_supports_far_kfunc_call() support for the kernel module.
>> Using kfunc call will make supporting it the same.
> 
> I believe far calls are typically slower,
> so it may be a foot gun.
> If something like qdisc-bpf adding a function call to bpf_exit
> it will be called every time the program is called, so
> it needs to be really fast.
> Allowing such callable funcs in modules may be a performance issue
> that we'd need to fix.
> So imo making a design requirement that such funcs for gen_epilogoue()
> need to be in kernel text is a good thing.

Agreed. Makes sense.

> 
>> I think the future bpf-qdisc can enforce built-in. bpf-tcp-cc has already been
>> built-in only also. I think the hid_bpf is built-in only also.
> 
> I don't think hid_bpf has any need for such gen_epilogue() adjustment.
> tcp-bpf-cc probably doesn't need it either.
> it's cleaner to fix up on the kernel side, no?

tcp-bpf-cc can use it to fix snd_cwnd. We have seen a mistake where snd_cwnd was
set to 0 (or negative, can't remember which one). More than one op of
tcp_congestion_ops may update snd_cwnd, so there would be multiple places in the
kernel that need an extra check/fix. It is usually not the fast path,
so it may be ok.

It is not as catastrophic as skb->dev. kfuncs had not been introduced at that time either.
Otherwise, having a kfunc to set the snd_cwnd instead could have been an option.

> qdisc-bpf and ->dev stuff is probably the only upcoming user.

For skb->dev, maybe having dedicated kfuncs for skb->dev manipulation is the
way to go? An example could be operations that need to touch the
skb->rbnode/dev sharing pointer.

For fixing ->dev in the kernel, there are multiple places doing ->dequeue, and I am
not sure if we need to include the child->dequeue also. This fix could be
refactored into a kernel function and would probably need a static key in this fast
path case.

> And that's a separate discussion. I'm not sure such gen_epilogoue()
> concept is really that great.
> Especially considering all the complexity involved.

I am curious about the problem you pointed out in patch 1 regardless, so I am going to
give it a try and remove the kfunc call. I kept the kfunc call separate in patches 7
and 8 :)

If it still looks too complex or there is no value in gen_epilogue, I am fine to
table this set.

Alexei Starovoitov Aug. 22, 2024, 5:58 p.m. UTC | #5

On Thu, Aug 22, 2024 at 10:38 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 8/22/24 6:47 AM, Alexei Starovoitov wrote:
> > On Wed, Aug 21, 2024 at 11:10 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> >>
> >> On 8/21/24 6:32 PM, Alexei Starovoitov wrote:
> >>> On Wed, Aug 21, 2024 at 4:35 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> >>>>
> >>>> From: Martin KaFai Lau <martin.lau@kernel.org>
> >>>>
> >>>> The existing prologue has been able to call bpf helper but not a kfunc.
> >>>> This patch allows the prologue/epilogue to call the kfunc.
> >>>>
> >>>> The subsystem that implements the .gen_prologue and .gen_epilogue
> >>>> can add the BPF_PSEUDO_KFUNC_CALL instruction with insn->imm
> >>>> set to the btf func_id of the kfunc call. This part is the same
> >>>> as the bpf prog loaded from the sys_bpf.
> >>>
> >>> I don't understand the value of this feature, since it seems
> >>> pretty hard to use.
> >>> The module (qdisc-bpf or else) would need to do something
> >>> like patch 8/8:
> >>> +BTF_ID_LIST(st_ops_epilogue_kfunc_list)
> >>> +BTF_ID(func, bpf_kfunc_st_ops_inc10)
> >>> +BTF_ID(func, bpf_kfunc_st_ops_inc100)
> >>>
> >>> just to be able to:
> >>>     BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0,
> >>>                  st_ops_epilogue_kfunc_list[0]);
> >>>
> >>> So a bunch of extra work on the module side and
> >>> a bunch of work in this patch to enable such a pattern,
> >>> but what is the value?
> >>>
> >>> gen_epilogue() can call arbitrary kernel function.
> >>> It doesn't have to be a helper.
> >>> kfunc-s provide calling convention conversion from bpf to native,
> >>> but the same thing is achieved by BPF_CALL_N macro.
> >>> The module can use that macro without adding an actual bpf helper
> >>> to uapi bpf.h.
> >>> Then in gen_epilogue() the extra bpf insn can use:
> >>> BPF_EMIT_CALL(module_provided_helper_that_is_not_helper)
> >>> which will use
> >>> BPF_CALL_IMM(x) ((void *)(x) - (void *)__bpf_call_base)
> >>
> >> BPF_EMIT_CALL() was my earlier thought. I switched to the kfunc in this patch
> >> because of the bpf_jit_supports_far_kfunc_call() support for the kernel module.
> >> Using kfunc call will make supporting it the same.
> >
> > I believe far calls are typically slower,
> > so it may be a foot gun.
> > If something like qdisc-bpf adding a function call to bpf_exit
> > it will be called every time the program is called, so
> > it needs to be really fast.
> > Allowing such callable funcs in modules may be a performance issue
> > that we'd need to fix.
> > So imo making a design requirement that such funcs for gen_epilogoue()
> > need to be in kernel text is a good thing.
>
> Agreed. Make sense.
>
> >
> >> I think the future bpf-qdisc can enforce built-in. bpf-tcp-cc has already been
> >> built-in only also. I think the hid_bpf is built-in only also.
> >
> > I don't think hid_bpf has any need for such gen_epilogue() adjustment.
> > tcp-bpf-cc probably doesn't need it either.
> > it's cleaner to fix up on the kernel side, no?
>
> tcp-bpf-cc can use it to fix snd_cwnd. We have seen a mistake that snd_cwnd was
> set to 0 (or negative, can't remember which one). >1 ops of the
> tcp_congestion_ops may update the snd_cwnd, so there will be multiple places it
> needs to do an extra check/fix in the kernel. It is usually not the fast path,
> so may be ok.
>
> It is not catastrophic as skb->dev. kfunc was not introduced at that time also.
> Otherwise, having a kfunc to set the snd_cwnd instead could have been an option.
>
> > qdisc-bpf and ->dev stuff is probably the only upcoming user.
>
> For skb->dev, may be having a dedicated kfuncs for skb->dev manipulation is the
> way to go? The example could be operations that need to touch the
> skb->rbnode/dev sharing pointer.
>
> For fixing ->dev in the kernel, there are multiple places doing ->dequeue and
> not sure if we need to include the child->dequeue also. This fixing could be
> refactored to a kernel function and probably need to a static key in this fast
> path case.
>
> > And that's a separate discussion. I'm not sure such gen_epilogoue()
> > concept is really that great.
> > Especially considering all the complexity involved.
>
> I am curious on the problem you pointed out at patch 1 regardless, I am going to
> give it a try and remove the kfunc call. I made kfunc call separated at patch 7
> and 8 :)
>
> If it still looks too complex or there is no value on gen_epilogue, I am fine to
> table this set.

I think patches 1-6 are fine and good to go.
Mainly because they simplify landing of qdisc-bpf.
Once all pieces are there we may revisit the need for gen_epilogue()
and whether there is an alternative.
I would only drop 7 and 8 for now until it's absolutely needed.
