[bpf-next,v6,0/4] bpf trampoline for arm64

Message ID 20220625161255.547944-1-xukuohai@huawei.com (mailing list archive)

Message

Xu Kuohai June 25, 2022, 4:12 p.m. UTC
This patchset introduces bpf trampoline on arm64. A bpf trampoline converts
native calling convention to bpf calling convention and is used to implement
various bpf features, such as fentry, fexit, fmod_ret and struct_ops.

The trampoline introduced does essentially the same thing as the bpf
trampoline does on x86.
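
As a rough illustration of the calling convention conversion (a userspace
model only, not code from this series), the trampoline can be thought of as
spilling the native register arguments into an array on the stack and then
calling the bpf prog with a single pointer to that array. The names
demo_prog, demo_trampoline and bpf_prog_fn are hypothetical, and the
three-argument target is an assumption made for the example:

```c
#include <stdint.h>

/* Hypothetical stand-in for a JITed bpf prog: it takes one context pointer. */
typedef uint64_t (*bpf_prog_fn)(const uint64_t *ctx);

/* Hypothetical fentry-style prog: adds the first two traced arguments. */
static uint64_t demo_prog(const uint64_t *ctx)
{
	return ctx[0] + ctx[1];
}

/*
 * Model of a trampoline for a traced function with three arguments.
 * The native arguments (x0..x2 on arm64) are saved to a stack array,
 * and the prog receives a pointer to that array as its only argument.
 */
static uint64_t demo_trampoline(bpf_prog_fn prog,
				uint64_t a0, uint64_t a1, uint64_t a2)
{
	uint64_t args[3] = { a0, a1, a2 };

	return prog(args);
}
```

The real trampoline is generated machine code and also handles fexit,
fmod_ret and the call back into the traced function, which this model
leaves out.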

Tested on raspberry pi 4b and qemu:

 #18 /1     bpf_tcp_ca/dctcp:OK
 #18 /2     bpf_tcp_ca/cubic:OK
 #18 /3     bpf_tcp_ca/invalid_license:OK
 #18 /4     bpf_tcp_ca/dctcp_fallback:OK
 #18 /5     bpf_tcp_ca/rel_setsockopt:OK
 #18        bpf_tcp_ca:OK
 #51 /1     dummy_st_ops/dummy_st_ops_attach:OK
 #51 /2     dummy_st_ops/dummy_init_ret_value:OK
 #51 /3     dummy_st_ops/dummy_init_ptr_arg:OK
 #51 /4     dummy_st_ops/dummy_multiple_args:OK
 #51        dummy_st_ops:OK
 #57 /1     fexit_bpf2bpf/target_no_callees:OK
 #57 /2     fexit_bpf2bpf/target_yes_callees:OK
 #57 /3     fexit_bpf2bpf/func_replace:OK
 #57 /4     fexit_bpf2bpf/func_replace_verify:OK
 #57 /5     fexit_bpf2bpf/func_sockmap_update:OK
 #57 /6     fexit_bpf2bpf/func_replace_return_code:OK
 #57 /7     fexit_bpf2bpf/func_map_prog_compatibility:OK
 #57 /8     fexit_bpf2bpf/func_replace_multi:OK
 #57 /9     fexit_bpf2bpf/fmod_ret_freplace:OK
 #57        fexit_bpf2bpf:OK
 #237       xdp_bpf2bpf:OK

v6:
- Since Mark is refactoring arm64 ftrace to support long jump and reduce the
  ftrace trampoline overhead, it's not clear how we'll attach bpf trampoline
  to regular kernel functions, so remove ftrace related patches for now.
- Add long jump support for attaching bpf trampoline to bpf prog. Since bpf
  trampoline and bpf prog are allocated via vmalloc, there is a chance the
  distance between them exceeds the max branch range.
- Collect ACKs/Reviewed-bys. Not sure whether the ones for bpf_arch_text_poke()
  should be kept, since the changes to it are not trivial
- Update some commit messages and comments
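
For the long jump item above, the constraint is that an arm64 B/BL
instruction encodes a signed 26-bit word offset, so a single branch reaches
only about +/-128MB from its own address. A minimal sketch of such a range
check follows; branch_in_range is a hypothetical helper written for this
illustration and is not the function used in the patches:

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_128M (128LL * 1024 * 1024)

/*
 * Return true if a single B/BL placed at pc can reach target.
 * The offset must be 4-byte aligned and lie in [-128MB, 128MB).
 */
static bool branch_in_range(uint64_t pc, uint64_t target)
{
	int64_t offset = (int64_t)(target - pc);

	if (offset & 3)
		return false;

	return offset >= -SZ_128M && offset < SZ_128M;
}
```

When the check fails, a direct branch cannot be used and the jump has to go
through an address held in a register or loaded from memory instead.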

v5: https://lore.kernel.org/bpf/20220518131638.3401509-1-xukuohai@huawei.com/
- As Alexei suggested, remove is_valid_bpf_tramp_flags()

v4: https://lore.kernel.org/bpf/20220517071838.3366093-1-xukuohai@huawei.com/
- Run the test cases on raspberry pi 4b
- Rebase and add cookie to trampoline
- As Steve suggested, move trace_direct_tramp() back to entry-ftrace.S to
  avoid messing up generic code with architecture specific code
- As Jakub suggested, merge patch 4 and patch 5 of v3 to provide full function
  in one patch
- As Mark suggested, add a comment for the use of aarch64_insn_patch_text_nosync()
- Do not generate trampoline for long jump to avoid triggering ftrace_bug
- Round stack size to multiples of 16B to avoid SPAlignmentFault
- Use callee saved register x20 to reduce the use of mov_i64
- Add missing BTI J instructions
- Trivial spelling and code style fixes
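
For the stack size item above: arm64 requires SP to be 16-byte aligned when
it is used as a base address, otherwise an SP alignment fault can be raised.
The rounding itself is simple; round_up_16 below is a hypothetical helper
for illustration, not the code in the patch:

```c
#include <stdint.h>

/* Round a stack frame size up to the next multiple of 16 bytes. */
static uint32_t round_up_16(uint32_t size)
{
	return (size + 15u) & ~15u;
}
```

For example, a frame that needs 40 bytes is allocated as 48 bytes.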

v3: https://lore.kernel.org/bpf/20220424154028.1698685-1-xukuohai@huawei.com/
- Append test results for bpf_tcp_ca, dummy_st_ops, fexit_bpf2bpf,
  xdp_bpf2bpf
- Support poking bpf progs
- Fix return value of arch_prepare_bpf_trampoline() to be the total number
  of bytes instead of the number of instructions
- Do not check whether CONFIG_DYNAMIC_FTRACE_WITH_REGS is enabled in
  arch_prepare_bpf_trampoline, since the trampoline may be hooked to a bpf
  prog
- Restrict bpf_arch_text_poke() to poke bpf text only, as kernel functions
  are poked by ftrace
- Rewrite trace_direct_tramp() in inline assembly in trace_selftest.c
  to avoid messing up entry-ftrace.S
- Isolate arch_ftrace_set_direct_caller() with macro
  CONFIG_HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS to avoid compile error
  when this macro is disabled
- Some trivial code style fixes
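
On the return value item above: every A64 instruction is 4 bytes, so
returning an instruction count where a byte count is expected under-reports
the image size by a factor of four. A small sketch of the conversion, where
insns_to_bytes and A64_INSN_SIZE are names made up for this illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Each A64 instruction is one 32-bit word. */
#define A64_INSN_SIZE sizeof(uint32_t)

/* Convert a count of emitted instructions to a size in bytes. */
static size_t insns_to_bytes(size_t nr_insns)
{
	return nr_insns * A64_INSN_SIZE;
}
```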

v2: https://lore.kernel.org/bpf/20220414162220.1985095-1-xukuohai@huawei.com/
- Add Song's ACK
- Change the multi-line comment in is_valid_bpf_tramp_flags() into net
  style (patch 3)
- Fix an infinite loop issue in ftrace selftest (patch 2)
- Replace pt_regs->x0 with pt_regs->orig_x0 in patch 1 commit message
- Replace "bpf trampoline" with "custom trampoline" in patch 1, as
  ftrace direct call is not only used by bpf trampoline.

v1: https://lore.kernel.org/bpf/20220413054959.1053668-1-xukuohai@huawei.com/

Xu Kuohai (4):
  bpf: Remove is_valid_bpf_tramp_flags()
  arm64: Add LDR (literal) instruction
  bpf, arm64: Implement bpf_arch_text_poke() for arm64
  bpf, arm64: bpf trampoline for arm64

 arch/arm64/include/asm/insn.h |   3 +
 arch/arm64/lib/insn.c         |  30 +-
 arch/arm64/net/bpf_jit.h      |   7 +
 arch/arm64/net/bpf_jit_comp.c | 717 +++++++++++++++++++++++++++++++++-
 arch/x86/net/bpf_jit_comp.c   |  20 -
 kernel/bpf/bpf_struct_ops.c   |   3 +
 kernel/bpf/trampoline.c       |   3 +
 7 files changed, 742 insertions(+), 41 deletions(-)

Comments

Daniel Borkmann June 30, 2022, 9:12 p.m. UTC | #1
Hey Mark,

On 6/25/22 6:12 PM, Xu Kuohai wrote:
> This patchset introduces bpf trampoline on arm64. A bpf trampoline converts
> native calling convention to bpf calling convention and is used to implement
> various bpf features, such as fentry, fexit, fmod_ret and struct_ops.
> 
> The trampoline introduced does essentially the same thing as the bpf
> trampoline does on x86.
> 
> v6:
> - Since Mark is refactoring arm64 ftrace to support long jump and reduce the
>    ftrace trampoline overhead, it's not clear how we'll attach bpf trampoline
>    to regular kernel functions, so remove ftrace related patches for now.
> - Add long jump support for attaching bpf trampoline to bpf prog, since bpf
>    trampoline and bpf prog are allocated via vmalloc, there is chance the
>    distance exceeds the max branch range.
> - Collect ACK/Review-by, not sure if the ACK and Review-bys for bpf_arch_text_poke()
>    should be kept, since the changes to it is not trivial
> - Update some commit messages and comments

Given you've been taking a look and had objections in v5, would be great if you
can find some cycles for this v6.

Thanks a lot,
Daniel
Will Deacon July 5, 2022, 4 p.m. UTC | #2
Hi Daniel,

On Thu, Jun 30, 2022 at 11:12:54PM +0200, Daniel Borkmann wrote:
> On 6/25/22 6:12 PM, Xu Kuohai wrote:
> > This patchset introduces bpf trampoline on arm64. A bpf trampoline converts
> > native calling convention to bpf calling convention and is used to implement
> > various bpf features, such as fentry, fexit, fmod_ret and struct_ops.
> > 
> > The trampoline introduced does essentially the same thing as the bpf
> > trampoline does on x86.
> > 
> > v6:
> > - Since Mark is refactoring arm64 ftrace to support long jump and reduce the
> >    ftrace trampoline overhead, it's not clear how we'll attach bpf trampoline
> >    to regular kernel functions, so remove ftrace related patches for now.
> > - Add long jump support for attaching bpf trampoline to bpf prog, since bpf
> >    trampoline and bpf prog are allocated via vmalloc, there is chance the
> >    distance exceeds the max branch range.
> > - Collect ACK/Review-by, not sure if the ACK and Review-bys for bpf_arch_text_poke()
> >    should be kept, since the changes to it is not trivial
> > - Update some commit messages and comments
> 
> Given you've been taking a look and had objections in v5, would be great if you
> can find some cycles for this v6.

Mark's out at the moment, so I wouldn't hold this series up pending his ack.
However, I agree that it would be good if _somebody_ from the Arm side can
give it the once over, so I've added Jean-Philippe to cc in case he has time
for a quick review. KP said he would also have a look, as he is interested
in this series landing.

Failing that, I'll try to look this week, but I'm off next week and I don't
want this to miss the merge window on my account.

Cheers,

Will
KP Singh July 5, 2022, 6:34 p.m. UTC | #3
On Tue, Jul 5, 2022 at 6:00 PM Will Deacon <will@kernel.org> wrote:
>
> Hi Daniel,
>
> On Thu, Jun 30, 2022 at 11:12:54PM +0200, Daniel Borkmann wrote:
> > On 6/25/22 6:12 PM, Xu Kuohai wrote:
> > > This patchset introduces bpf trampoline on arm64. A bpf trampoline converts
> > > native calling convention to bpf calling convention and is used to implement
> > > various bpf features, such as fentry, fexit, fmod_ret and struct_ops.
> > >
> > > The trampoline introduced does essentially the same thing as the bpf
> > > trampoline does on x86.
> > >
> > > v6:
> > > - Since Mark is refactoring arm64 ftrace to support long jump and reduce the
> > >    ftrace trampoline overhead, it's not clear how we'll attach bpf trampoline
> > >    to regular kernel functions, so remove ftrace related patches for now.
> > > - Add long jump support for attaching bpf trampoline to bpf prog, since bpf
> > >    trampoline and bpf prog are allocated via vmalloc, there is chance the
> > >    distance exceeds the max branch range.
> > > - Collect ACK/Review-by, not sure if the ACK and Review-bys for bpf_arch_text_poke()
> > >    should be kept, since the changes to it is not trivial

+1 I need to give it another pass.

> > > - Update some commit messages and comments
> >
> > Given you've been taking a look and had objections in v5, would be great if you
> > can find some cycles for this v6.
>
> Mark's out at the moment, so I wouldn't hold this series up pending his ack.
> However, I agree that it would be good if _somebody_ from the Arm side can
> give it the once over, so I've added Jean-Philippe to cc in case he has time

Makes sense, Jean-Philippe has worked on BPF trampolines for ARM.

> for a quick review. KP said he would also have a look, as he is interested

Thank you so much Will, I will give this another pass before the end
of the week.

> in this series landing.
>
> Failing that, I'll try to look this week, but I'm off next week and I don't
> want this to miss the merge window on my account.

Thanks for being considerate. Much appreciated.

- KP

>
> Cheers,
>
> Will
Jean-Philippe Brucker July 6, 2022, 4:08 p.m. UTC | #4
On Tue, Jul 05, 2022 at 05:00:46PM +0100, Will Deacon wrote:
> > Given you've been taking a look and had objections in v5, would be great if you
> > can find some cycles for this v6.
> 
> Mark's out at the moment, so I wouldn't hold this series up pending his ack.
> However, I agree that it would be good if _somebody_ from the Arm side can
> give it the once over, so I've added Jean-Philippe to cc in case he has time
> for a quick review.

I'll take a look. Sorry for not catching this earlier, all versions of the
series somehow ended up in my spams :/

Thanks,
Jean

> KP said he would also have a look, as he is interested
> in this series landing.
> 
> Failing that, I'll try to look this week, but I'm off next week and I don't
> want this to miss the merge window on my account.
> 
> Cheers,
> 
> Will
Will Deacon July 6, 2022, 4:11 p.m. UTC | #5
On Wed, Jul 06, 2022 at 05:08:49PM +0100, Jean-Philippe Brucker wrote:
> On Tue, Jul 05, 2022 at 05:00:46PM +0100, Will Deacon wrote:
> > > Given you've been taking a look and had objections in v5, would be great if you
> > > can find some cycles for this v6.
> > 
> > Mark's out at the moment, so I wouldn't hold this series up pending his ack.
> > However, I agree that it would be good if _somebody_ from the Arm side can
> > give it the once over, so I've added Jean-Philippe to cc in case he has time
> > for a quick review.
> 
> I'll take a look. Sorry for not catching this earlier, all versions of the
> series somehow ended up in my spams :/

Yeah, same here. It was only Daniel's mail that hit my inbox!

Will
Xu Kuohai July 7, 2022, 2:56 a.m. UTC | #6
On 7/7/2022 12:11 AM, Will Deacon wrote:
> On Wed, Jul 06, 2022 at 05:08:49PM +0100, Jean-Philippe Brucker wrote:
>> On Tue, Jul 05, 2022 at 05:00:46PM +0100, Will Deacon wrote:
>>>> Given you've been taking a look and had objections in v5, would be great if you
>>>> can find some cycles for this v6.
>>>
>>> Mark's out at the moment, so I wouldn't hold this series up pending his ack.
>>> However, I agree that it would be good if _somebody_ from the Arm side can
>>> give it the once over, so I've added Jean-Philippe to cc in case he has time
>>> for a quick review.
>>
>> I'll take a look. Sorry for not catching this earlier, all versions of the
>> series somehow ended up in my spams :/
> 
> Yeah, same here. It was only Daniel's mail that hit my inbox!
> 
> Will

Sorry, there is a misconfiguration in the huawei.com mail server:

https://lore.kernel.org/all/20220523152516.7sr247i3bzwhr44w@quack3.lan/

Our IT admins are working on this issue and hopefully they'll fix it soon.
Xu Kuohai July 7, 2022, 3:35 a.m. UTC | #7
On 7/6/2022 2:34 AM, KP Singh wrote:
> On Tue, Jul 5, 2022 at 6:00 PM Will Deacon <will@kernel.org> wrote:
>>
>> Hi Daniel,
>>
>> On Thu, Jun 30, 2022 at 11:12:54PM +0200, Daniel Borkmann wrote:
>>> On 6/25/22 6:12 PM, Xu Kuohai wrote:
>>>> This patchset introduces bpf trampoline on arm64. A bpf trampoline converts
>>>> native calling convention to bpf calling convention and is used to implement
>>>> various bpf features, such as fentry, fexit, fmod_ret and struct_ops.
>>>>
>>>> The trampoline introduced does essentially the same thing as the bpf
>>>> trampoline does on x86.
>>>>
>>>> v6:
>>>> - Since Mark is refactoring arm64 ftrace to support long jump and reduce the
>>>>    ftrace trampoline overhead, it's not clear how we'll attach bpf trampoline
>>>>    to regular kernel functions, so remove ftrace related patches for now.
>>>> - Add long jump support for attaching bpf trampoline to bpf prog, since bpf
>>>>    trampoline and bpf prog are allocated via vmalloc, there is chance the
>>>>    distance exceeds the max branch range.
>>>> - Collect ACK/Review-by, not sure if the ACK and Review-bys for bpf_arch_text_poke()
>>>>    should be kept, since the changes to it is not trivial
> 
> +1 I need to give it another pass.

Thank you very much! But I have to admit a problem. This patchset does
not support attaching bpf trampoline to regular kernel functions with
ftrace. So LSM still does not work, since the LSM hooks themselves are
regular kernel functions. Sorry about that, and hopefully we'll find an
acceptable solution soon.

>>>> - Update some commit messages and comments
>>>
>>> Given you've been taking a look and had objections in v5, would be great if you
>>> can find some cycles for this v6.
>>
>> Mark's out at the moment, so I wouldn't hold this series up pending his ack.
>> However, I agree that it would be good if _somebody_ from the Arm side can
>> give it the once over, so I've added Jean-Philippe to cc in case he has time
> 
> Makes sense,  Jean-Philippe had worked on BPF trampolines for ARM.
> 
>> for a quick review. KP said he would also have a look, as he is interested
> 
> Thank you so much Will, I will give this another pass before the end
> of the week.
> 
>> in this series landing.
>>
>> Failing that, I'll try to look this week, but I'm off next week and I don't
>> want this to miss the merge window on my account.
> 
> Thanks for being considerate. Much appreciated.
> 
> - KP
> 
>>
>> Cheers,
>>
>> Will