Message ID | 20230125213817.1424447-1-iii@linux.ibm.com (mailing list archive) |
---|---|
Headers | show |
Series | Support bpf trampoline for s390x | expand |
On Wed, Jan 25, 2023 at 1:39 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote: > > Hi, > > This series implements poke, trampoline, kfunc, mixing subprogs and > tailcalls, and fixes a number of tests on s390x. > > The following failures still remain: > > #52 core_read_macros:FAIL > Uses BPF_PROBE_READ(), shouldn't there be BPF_PROBE_READ_KERNEL()? BPF_PROBE_READ(), similarly to BPF_CORE_READ() both use bpf_probe_read_kernel() internally, as it's most common use case. We have separate BPF_PROBE_READ_USER() and BPF_CORE_READ_USER() for when bpf_probe_read_user() has to be used. > > #82 get_stack_raw_tp:FAIL > get_stack_print_output:FAIL:user_stack corrupted user stack > Known issue: > We cannot reliably unwind userspace on s390x without DWARF. like in principle, or frame pointers (or some equivalent) needs to be configured for this to work? Asking also in the context of [0], where s390x was excluded. If there is actually a way to enable frame pointer-based stack unwinding on s390x, would be nice to hear from an expert. [0] https://pagure.io/fesco/issue/2923 > > #101 ksyms_module:FAIL > address of kernel function bpf_testmod_test_mod_kfunc is out of range > Known issue: > Kernel and modules are too far away from each other on s390x. > > #167 sk_assign:FAIL > Uses legacy map definitions in 'maps' section. Hm.. assuming new enough iproute2, new-style .maps definition should be supported, right? Let's convert map definition? > > #190 stacktrace_build_id:FAIL > Known issue: > We cannot reliably unwind userspace on s390x without DWARF. > > #211 test_bpffs:FAIL > iterators.bpf.c is broken on s390x, it uses BPF_CORE_READ(), shouldn't > there be BPF_CORE_READ_KERNEL()? BPF_CORE_READ() is that, so must be something else > > #218 test_profiler:FAIL > A lot of BPF_PROBE_READ() usages. ditto, something else > > #281 xdp_metadata:FAIL > See patch 24. > > #284 xdp_synproxy:FAIL > Verifier error: > ; value = bpf_tcp_raw_gen_syncookie_ipv4(hdr->ipv4, hdr->tcp, > 281: (79) r1 = *(u64 *)(r10 -80) ; R1_w=pkt(off=14,r=74,imm=0) R10=fp0 > 282: (bf) r2 = r8 ; R2_w=pkt(id=5,off=14,r=74,umax=60,var_off=(0x0; 0x3c)) R8=pkt(id=5,off=14,r=74,umax=60,var_off=(0x0; 0x3c)) > 283: (79) r3 = *(u64 *)(r10 -104) ; R3_w=scalar(umax=60,var_off=(0x0; 0x3c)) R10=fp0 > 284: (85) call bpf_tcp_raw_gen_syncookie_ipv4#204 > invalid access to packet, off=14 size=0, R2(id=5,off=14,r=74) > R2 offset is outside of the packet third arg to bpf_tcp_raw_gen_syncookie_ipv4() is defined as ARG_CONST_SIZE, so is required to be strictly positive, which doesn't seem to be "proven" to verifier. Please provided bigger log, it might another instance of needing to sprinkle barrier_var() around. And maybe thinking about using ARG_CONST_SIZE_OR_ZERO instead of ARG_CONST_SIZE. > > None of these seem to be due to the new changes. > > Best regards, > Ilya > > Ilya Leoshkevich (24): > selftests/bpf: Fix liburandom_read.so linker error > selftests/bpf: Fix symlink creation error > selftests/bpf: Fix fexit_stress on s390x > selftests/bpf: Fix trampoline_count on s390x > selftests/bpf: Fix kfree_skb on s390x > selftests/bpf: Set errno when urand_spawn() fails > selftests/bpf: Fix decap_sanity_ns cleanup > selftests/bpf: Fix verify_pkcs7_sig on s390x > selftests/bpf: Fix xdp_do_redirect on s390x > selftests/bpf: Fix cgrp_local_storage on s390x > selftests/bpf: Check stack_mprotect() return value > selftests/bpf: Increase SIZEOF_BPF_LOCAL_STORAGE_ELEM on s390x > selftests/bpf: Add a sign-extension test for kfuncs > selftests/bpf: Fix test_lsm on s390x > selftests/bpf: Fix test_xdp_adjust_tail_grow2 on s390x > selftests/bpf: Fix vmlinux test on s390x > libbpf: Read usdt arg spec with bpf_probe_read_kernel() > s390/bpf: Fix a typo in a comment > s390/bpf: Add expoline to tail calls > s390/bpf: Implement bpf_arch_text_poke() > bpf: btf: Add BTF_FMODEL_SIGNED_ARG flag > s390/bpf: Implement arch_prepare_bpf_trampoline() > s390/bpf: Implement bpf_jit_supports_subprog_tailcalls() > s390/bpf: Implement bpf_jit_supports_kfunc_call() > > arch/s390/net/bpf_jit_comp.c | 708 +++++++++++++++++- > include/linux/bpf.h | 8 + > include/linux/btf.h | 15 +- > kernel/bpf/btf.c | 16 +- > net/bpf/test_run.c | 9 + > tools/lib/bpf/usdt.bpf.h | 33 +- > tools/testing/selftests/bpf/Makefile | 7 +- > tools/testing/selftests/bpf/netcnt_common.h | 6 +- > .../selftests/bpf/prog_tests/bpf_cookie.c | 6 +- > .../bpf/prog_tests/cgrp_local_storage.c | 2 +- > .../selftests/bpf/prog_tests/decap_sanity.c | 2 +- > .../selftests/bpf/prog_tests/fexit_stress.c | 6 +- > .../selftests/bpf/prog_tests/kfree_skb.c | 2 +- > .../selftests/bpf/prog_tests/kfunc_call.c | 1 + > .../selftests/bpf/prog_tests/test_lsm.c | 3 +- > .../bpf/prog_tests/trampoline_count.c | 4 + > tools/testing/selftests/bpf/prog_tests/usdt.c | 1 + > .../bpf/prog_tests/verify_pkcs7_sig.c | 9 + > .../bpf/prog_tests/xdp_adjust_tail.c | 7 +- > .../bpf/prog_tests/xdp_do_redirect.c | 4 + > .../selftests/bpf/progs/kfunc_call_test.c | 18 + > tools/testing/selftests/bpf/progs/lsm.c | 7 +- > .../bpf/progs/test_verify_pkcs7_sig.c | 12 +- > .../selftests/bpf/progs/test_vmlinux.c | 4 +- > .../bpf/progs/test_xdp_adjust_tail_grow.c | 8 +- > 25 files changed, 816 insertions(+), 82 deletions(-) > > -- > 2.39.1 >
On Wed, 2023-01-25 at 16:45 -0800, Andrii Nakryiko wrote: > On Wed, Jan 25, 2023 at 1:39 PM Ilya Leoshkevich <iii@linux.ibm.com> > wrote: > > > > Hi, > > > > This series implements poke, trampoline, kfunc, mixing subprogs and > > tailcalls, and fixes a number of tests on s390x. > > > > The following failures still remain: > > > > #52 core_read_macros:FAIL > > Uses BPF_PROBE_READ(), shouldn't there be BPF_PROBE_READ_KERNEL()? > > BPF_PROBE_READ(), similarly to BPF_CORE_READ() both use > bpf_probe_read_kernel() internally, as it's most common use case. We > have separate BPF_PROBE_READ_USER() and BPF_CORE_READ_USER() for when > bpf_probe_read_user() has to be used. At least purely from the code perspective, BPF_PROBE_READ() seems to delegate to bpf_probe_read(). The following therefore helps with this test: --- a/tools/lib/bpf/bpf_core_read.h +++ b/tools/lib/bpf/bpf_core_read.h @@ -364,7 +364,7 @@ enum bpf_enum_value_kind { /* Non-CO-RE variant of BPF_CORE_READ_INTO() */ #define BPF_PROBE_READ_INTO(dst, src, a, ...) ({ \ - ___core_read(bpf_probe_read, bpf_probe_read, \ + ___core_read(bpf_probe_read_kernel, bpf_probe_read_kernel, \ dst, (src), a, ##__VA_ARGS__) \ }) @@ -400,7 +400,7 @@ enum bpf_enum_value_kind { /* Non-CO-RE variant of BPF_CORE_READ_STR_INTO() */ #define BPF_PROBE_READ_STR_INTO(dst, src, a, ...) ({ \ - ___core_read(bpf_probe_read_str, bpf_probe_read, \ + ___core_read(bpf_probe_read_kernel_str, bpf_probe_read_kernel, \ dst, (src), a, ##__VA_ARGS__) \ }) but I'm not sure if there are backward compatibility concerns, or if libbpf is supposed to rewrite this when !ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE. > > > > #82 get_stack_raw_tp:FAIL > > get_stack_print_output:FAIL:user_stack corrupted user stack > > Known issue: > > We cannot reliably unwind userspace on s390x without DWARF. > > like in principle, or frame pointers (or some equivalent) needs to be > configured for this to work? > > Asking also in the context of [0], where s390x was excluded. If there > is actually a way to enable frame pointer-based stack unwinding on > s390x, would be nice to hear from an expert. > > [0] https://pagure.io/fesco/issue/2923 For DWARFless unwinding we have -mbackchain (not to be confused with -fno-omit-frame-pointer, which we also have, but which just hurts performance without providing tangible benefits). -mbackchain has a few problems though: - It's not atomic. Here is a typical prologue with -mbackchain: 1: stmg %r11,%r15,88(%r15) # save non-volatile registers 2: lgr %r14,%r15 # %r14 = sp 3: lay %r15,-160(%r15) # sp -= 160 4: stg %r14,0(%r15) # *(void **)sp = %r14 The invariant here is that *(void **)%r15 is always a pointer to the next frame. This means that if we unwind from (4), we are totally broken. This does not happen if we unwind from any other instruction, but still. - Unwinding from (1)-(3) is not particularly good either. PSW points to the callee, but R15 points to the caller's frame. - Unwinding leaf functions is like the previous problem, but worse: they often do not establish a stack frame at all, so PSW and R15 are out of sync for the entire duration of the call. Therefore .eh_frame-based unwinding is preferred, since it covers all these corner cases and is therefore reliable. From what I understand, adding .eh_frame unwinding to the kernel is not desirable. In an internal discussion we had another idea though: would it be possible to have an eBPF program that does .eh_frame parsing and unwinding? I understand that it can be technically challenging at the moment, but the end result would not be exploitable by crafting malicious .eh_frame sections, won't loop endlessly, will have good performance, etc. > > #101 ksyms_module:FAIL > > address of kernel function bpf_testmod_test_mod_kfunc is out of > > range > > Known issue: > > Kernel and modules are too far away from each other on s390x. > > > > #167 sk_assign:FAIL > > Uses legacy map definitions in 'maps' section. > > Hm.. assuming new enough iproute2, new-style .maps definition should > be supported, right? Let's convert map definition? Yep, that worked. Will include in v2. > > #190 stacktrace_build_id:FAIL > > Known issue: > > We cannot reliably unwind userspace on s390x without DWARF. > > > > #211 test_bpffs:FAIL > > iterators.bpf.c is broken on s390x, it uses BPF_CORE_READ(), > > shouldn't > > there be BPF_CORE_READ_KERNEL()? > > BPF_CORE_READ() is that, so must be something else > > > > > #218 test_profiler:FAIL > > A lot of BPF_PROBE_READ() usages. > > ditto, something else > > > > > #281 xdp_metadata:FAIL > > See patch 24. > > > > #284 xdp_synproxy:FAIL > > Verifier error: > > ; value = bpf_tcp_raw_gen_syncookie_ipv4(hdr->ipv4, hdr->tcp, > > 281: (79) r1 = *(u64 *)(r10 -80) ; R1_w=pkt(off=14,r=74,imm=0) > > R10=fp0 > > 282: (bf) r2 = r8 ; > > R2_w=pkt(id=5,off=14,r=74,umax=60,var_off=(0x0; 0x3c)) > > R8=pkt(id=5,off=14,r=74,umax=60,var_off=(0x0; 0x3c)) > > 283: (79) r3 = *(u64 *)(r10 -104) ; > > R3_w=scalar(umax=60,var_off=(0x0; 0x3c)) R10=fp0 > > 284: (85) call bpf_tcp_raw_gen_syncookie_ipv4#204 > > invalid access to packet, off=14 size=0, R2(id=5,off=14,r=74) > > R2 offset is outside of the packet > > third arg to bpf_tcp_raw_gen_syncookie_ipv4() is defined as > ARG_CONST_SIZE, so is required to be strictly positive, which doesn't > seem to be "proven" to verifier. Please provided bigger log, it might > another instance of needing to sprinkle barrier_var() around. > > And maybe thinking about using ARG_CONST_SIZE_OR_ZERO instead of > ARG_CONST_SIZE. Here is the full log: https://gist.github.com/iii-i/8e20100c33ab6f0dffb5e6e51d1330e8 Apparently we do indeed lose a constraint established by if (hdr->tcp_len < sizeof(*hdr->tcp)). But the naive --- a/tools/testing/selftests/bpf/progs/xdp_synproxy_kern.c +++ b/tools/testing/selftests/bpf/progs/xdp_synproxy_kern.c @@ -401,6 +401,7 @@ static __always_inline int tcp_dissect(void *data, void *data_end, hdr->tcp_len = hdr->tcp->doff * 4; if (hdr->tcp_len < sizeof(*hdr->tcp)) return XDP_DROP; + barrier_var(hdr->tcp_len); return XDP_TX; } @@ -791,6 +792,7 @@ static __always_inline int syncookie_part2(void *ctx, void *data, void *data_end hdr->tcp_len = hdr->tcp->doff * 4; if (hdr->tcp_len < sizeof(*hdr->tcp)) return XDP_ABORTED; + barrier_var(hdr->tcp_len); return hdr->tcp->syn ? syncookie_handle_syn(hdr, ctx, data, data_end, xdp) : syncookie_handle_ack(hdr); does not help. > > > > > None of these seem to be due to the new changes. > > > > Best regards, > > Ilya > > > > Ilya Leoshkevich (24): > > selftests/bpf: Fix liburandom_read.so linker error > > selftests/bpf: Fix symlink creation error > > selftests/bpf: Fix fexit_stress on s390x > > selftests/bpf: Fix trampoline_count on s390x > > selftests/bpf: Fix kfree_skb on s390x > > selftests/bpf: Set errno when urand_spawn() fails > > selftests/bpf: Fix decap_sanity_ns cleanup > > selftests/bpf: Fix verify_pkcs7_sig on s390x > > selftests/bpf: Fix xdp_do_redirect on s390x > > selftests/bpf: Fix cgrp_local_storage on s390x > > selftests/bpf: Check stack_mprotect() return value > > selftests/bpf: Increase SIZEOF_BPF_LOCAL_STORAGE_ELEM on s390x > > selftests/bpf: Add a sign-extension test for kfuncs > > selftests/bpf: Fix test_lsm on s390x > > selftests/bpf: Fix test_xdp_adjust_tail_grow2 on s390x > > selftests/bpf: Fix vmlinux test on s390x > > libbpf: Read usdt arg spec with bpf_probe_read_kernel() > > s390/bpf: Fix a typo in a comment > > s390/bpf: Add expoline to tail calls > > s390/bpf: Implement bpf_arch_text_poke() > > bpf: btf: Add BTF_FMODEL_SIGNED_ARG flag > > s390/bpf: Implement arch_prepare_bpf_trampoline() > > s390/bpf: Implement bpf_jit_supports_subprog_tailcalls() > > s390/bpf: Implement bpf_jit_supports_kfunc_call() > > > > arch/s390/net/bpf_jit_comp.c | 708 > > +++++++++++++++++- > > include/linux/bpf.h | 8 + > > include/linux/btf.h | 15 +- > > kernel/bpf/btf.c | 16 +- > > net/bpf/test_run.c | 9 + > > tools/lib/bpf/usdt.bpf.h | 33 +- > > tools/testing/selftests/bpf/Makefile | 7 +- > > tools/testing/selftests/bpf/netcnt_common.h | 6 +- > > .../selftests/bpf/prog_tests/bpf_cookie.c | 6 +- > > .../bpf/prog_tests/cgrp_local_storage.c | 2 +- > > .../selftests/bpf/prog_tests/decap_sanity.c | 2 +- > > .../selftests/bpf/prog_tests/fexit_stress.c | 6 +- > > .../selftests/bpf/prog_tests/kfree_skb.c | 2 +- > > .../selftests/bpf/prog_tests/kfunc_call.c | 1 + > > .../selftests/bpf/prog_tests/test_lsm.c | 3 +- > > .../bpf/prog_tests/trampoline_count.c | 4 + > > tools/testing/selftests/bpf/prog_tests/usdt.c | 1 + > > .../bpf/prog_tests/verify_pkcs7_sig.c | 9 + > > .../bpf/prog_tests/xdp_adjust_tail.c | 7 +- > > .../bpf/prog_tests/xdp_do_redirect.c | 4 + > > .../selftests/bpf/progs/kfunc_call_test.c | 18 + > > tools/testing/selftests/bpf/progs/lsm.c | 7 +- > > .../bpf/progs/test_verify_pkcs7_sig.c | 12 +- > > .../selftests/bpf/progs/test_vmlinux.c | 4 +- > > .../bpf/progs/test_xdp_adjust_tail_grow.c | 8 +- > > 25 files changed, 816 insertions(+), 82 deletions(-) > > > > -- > > 2.39.1 > >
On Fri, Jan 27, 2023 at 8:51 AM Ilya Leoshkevich <iii@linux.ibm.com> wrote: > > On Wed, 2023-01-25 at 16:45 -0800, Andrii Nakryiko wrote: > > On Wed, Jan 25, 2023 at 1:39 PM Ilya Leoshkevich <iii@linux.ibm.com> > > wrote: > > > > > > Hi, > > > > > > This series implements poke, trampoline, kfunc, mixing subprogs and > > > tailcalls, and fixes a number of tests on s390x. > > > > > > The following failures still remain: > > > > > > #52 core_read_macros:FAIL > > > Uses BPF_PROBE_READ(), shouldn't there be BPF_PROBE_READ_KERNEL()? > > > > BPF_PROBE_READ(), similarly to BPF_CORE_READ() both use > > bpf_probe_read_kernel() internally, as it's most common use case. We > > have separate BPF_PROBE_READ_USER() and BPF_CORE_READ_USER() for when > > bpf_probe_read_user() has to be used. > > At least purely from the code perspective, BPF_PROBE_READ() seems to > delegate to bpf_probe_read(). The following therefore helps with this > test: > > --- a/tools/lib/bpf/bpf_core_read.h > +++ b/tools/lib/bpf/bpf_core_read.h > @@ -364,7 +364,7 @@ enum bpf_enum_value_kind { > > /* Non-CO-RE variant of BPF_CORE_READ_INTO() */ > #define BPF_PROBE_READ_INTO(dst, src, a, ...) ({ > \ > - ___core_read(bpf_probe_read, bpf_probe_read, > \ > + ___core_read(bpf_probe_read_kernel, bpf_probe_read_kernel, > \ > dst, (src), a, ##__VA_ARGS__) > \ > }) > > @@ -400,7 +400,7 @@ enum bpf_enum_value_kind { > > /* Non-CO-RE variant of BPF_CORE_READ_STR_INTO() */ > #define BPF_PROBE_READ_STR_INTO(dst, src, a, ...) ({ > \ > - ___core_read(bpf_probe_read_str, bpf_probe_read, > \ > + ___core_read(bpf_probe_read_kernel_str, bpf_probe_read_kernel, > \ > dst, (src), a, ##__VA_ARGS__) > \ > }) > > but I'm not sure if there are backward compatibility concerns, or if > libbpf is supposed to rewrite this when > !ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE. Oh, this is just a bug, it was an omission when we converted BPF_CORE_READ to bpf_probe_read_kernel. If you look at bpf_core_read definition, it is using bpf_probe_read_kernel, which is also the intent for BPF_PROBE_READ. Let's fix this. And there is no backwards compat concerns because libbpf will automatically convert calls to bpf_probe_read_{kernel,user}[_str] to bpf_probe_read[_str] if host kernel doesn't yet support kernel or user specific variants. > > > > > > > #82 get_stack_raw_tp:FAIL > > > get_stack_print_output:FAIL:user_stack corrupted user stack > > > Known issue: > > > We cannot reliably unwind userspace on s390x without DWARF. > > > > like in principle, or frame pointers (or some equivalent) needs to be > > configured for this to work? > > > > Asking also in the context of [0], where s390x was excluded. If there > > is actually a way to enable frame pointer-based stack unwinding on > > s390x, would be nice to hear from an expert. > > > > [0] https://pagure.io/fesco/issue/2923 > > For DWARFless unwinding we have -mbackchain (not to be confused with > -fno-omit-frame-pointer, which we also have, but which just hurts > performance without providing tangible benefits). > -mbackchain has a few problems though: > > - It's not atomic. Here is a typical prologue with -mbackchain: > > 1: stmg %r11,%r15,88(%r15) # save non-volatile registers > 2: lgr %r14,%r15 # %r14 = sp > 3: lay %r15,-160(%r15) # sp -= 160 > 4: stg %r14,0(%r15) # *(void **)sp = %r14 > > The invariant here is that *(void **)%r15 is always a pointer to the > next frame. This means that if we unwind from (4), we are totally > broken. This does not happen if we unwind from any other instruction, > but still. > > - Unwinding from (1)-(3) is not particularly good either. PSW points to > the callee, but R15 points to the caller's frame. > > - Unwinding leaf functions is like the previous problem, but worse: > they often do not establish a stack frame at all, so PSW and R15 are > out of sync for the entire duration of the call. > > Therefore .eh_frame-based unwinding is preferred, since it covers all > these corner cases and is therefore reliable. From what I understand, > adding .eh_frame unwinding to the kernel is not desirable. In an > internal discussion we had another idea though: would it be possible to > have an eBPF program that does .eh_frame parsing and unwinding? I > understand that it can be technically challenging at the moment, but > the end result would not be exploitable by crafting malicious > .eh_frame sections, won't loop endlessly, will have good performance, > etc. Thanks for details. This was all discussed at length in Fedora -fno-omit-frame-pointer discussion I linked above, so no real need to go over this again. .eh_frame-based unwinding on BPF side is possible, but only for processes that you knew about (and preprocessed) before you started profiling session. Pre-processing is memory and cpu-intensive operation on busy systems, and they will miss any processes started during profiling. So as a general approach for system-wide profiling it leaves a lot to be desired. Should we enable -mbackchain in our CI for s390x to be able to capture stack traces (even if on some instructions they might be incomplete or outright broken)? > > > > #101 ksyms_module:FAIL > > > address of kernel function bpf_testmod_test_mod_kfunc is out of > > > range > > > Known issue: > > > Kernel and modules are too far away from each other on s390x. > > > > > > #167 sk_assign:FAIL > > > Uses legacy map definitions in 'maps' section. > > > > Hm.. assuming new enough iproute2, new-style .maps definition should > > be supported, right? Let's convert map definition? > > Yep, that worked. Will include in v2. Nice. > > > > #190 stacktrace_build_id:FAIL > > > Known issue: > > > We cannot reliably unwind userspace on s390x without DWARF. > > > > > > #211 test_bpffs:FAIL > > > iterators.bpf.c is broken on s390x, it uses BPF_CORE_READ(), > > > shouldn't > > > there be BPF_CORE_READ_KERNEL()? > > > > BPF_CORE_READ() is that, so must be something else > > > > > > > > #218 test_profiler:FAIL > > > A lot of BPF_PROBE_READ() usages. > > > > ditto, something else > > > > > > > > #281 xdp_metadata:FAIL > > > See patch 24. > > > > > > #284 xdp_synproxy:FAIL > > > Verifier error: > > > ; value = bpf_tcp_raw_gen_syncookie_ipv4(hdr->ipv4, hdr->tcp, > > > 281: (79) r1 = *(u64 *)(r10 -80) ; R1_w=pkt(off=14,r=74,imm=0) > > > R10=fp0 > > > 282: (bf) r2 = r8 ; > > > R2_w=pkt(id=5,off=14,r=74,umax=60,var_off=(0x0; 0x3c)) > > > R8=pkt(id=5,off=14,r=74,umax=60,var_off=(0x0; 0x3c)) > > > 283: (79) r3 = *(u64 *)(r10 -104) ; > > > R3_w=scalar(umax=60,var_off=(0x0; 0x3c)) R10=fp0 > > > 284: (85) call bpf_tcp_raw_gen_syncookie_ipv4#204 > > > invalid access to packet, off=14 size=0, R2(id=5,off=14,r=74) > > > R2 offset is outside of the packet > > > > third arg to bpf_tcp_raw_gen_syncookie_ipv4() is defined as > > ARG_CONST_SIZE, so is required to be strictly positive, which doesn't > > seem to be "proven" to verifier. Please provided bigger log, it might > > another instance of needing to sprinkle barrier_var() around. > > > > And maybe thinking about using ARG_CONST_SIZE_OR_ZERO instead of > > ARG_CONST_SIZE. > > Here is the full log: > > https://gist.github.com/iii-i/8e20100c33ab6f0dffb5e6e51d1330e8 > > Apparently we do indeed lose a constraint established by > if (hdr->tcp_len < sizeof(*hdr->tcp)). But the naive The test is too big and unfamiliar for me to figure this out. And the problem is not upper bound, but lower bound. hdr->tcp_len is not proven to be strictly greater than zero, which is what verifier complains about. Not sure how it works on other arches right now. But I see that bpf_csum_diff defines size arguments as ARG_CONST_SIZE_OR_ZERO while bpf_tcp_raw_gen_syncookie_ipv4 has ARG_CONST_SIZE. I generally found ARG_CONST_SIZE way too problematic in practice, I'd say we should change it to ARG_CONST_SIZE_OR_ZERO. > > --- a/tools/testing/selftests/bpf/progs/xdp_synproxy_kern.c > +++ b/tools/testing/selftests/bpf/progs/xdp_synproxy_kern.c > @@ -401,6 +401,7 @@ static __always_inline int tcp_dissect(void *data, > void *data_end, > hdr->tcp_len = hdr->tcp->doff * 4; > if (hdr->tcp_len < sizeof(*hdr->tcp)) > return XDP_DROP; > + barrier_var(hdr->tcp_len); > > return XDP_TX; > } > @@ -791,6 +792,7 @@ static __always_inline int syncookie_part2(void > *ctx, void *data, void *data_end > hdr->tcp_len = hdr->tcp->doff * 4; > if (hdr->tcp_len < sizeof(*hdr->tcp)) > return XDP_ABORTED; > + barrier_var(hdr->tcp_len); > > return hdr->tcp->syn ? syncookie_handle_syn(hdr, ctx, data, > data_end, xdp) : > syncookie_handle_ack(hdr); > > does not help. > > > > > > > > > None of these seem to be due to the new changes. > > > > > > Best regards, > > > Ilya > > > > > > Ilya Leoshkevich (24): > > > selftests/bpf: Fix liburandom_read.so linker error > > > selftests/bpf: Fix symlink creation error > > > selftests/bpf: Fix fexit_stress on s390x > > > selftests/bpf: Fix trampoline_count on s390x > > > selftests/bpf: Fix kfree_skb on s390x > > > selftests/bpf: Set errno when urand_spawn() fails > > > selftests/bpf: Fix decap_sanity_ns cleanup > > > selftests/bpf: Fix verify_pkcs7_sig on s390x > > > selftests/bpf: Fix xdp_do_redirect on s390x > > > selftests/bpf: Fix cgrp_local_storage on s390x > > > selftests/bpf: Check stack_mprotect() return value > > > selftests/bpf: Increase SIZEOF_BPF_LOCAL_STORAGE_ELEM on s390x > > > selftests/bpf: Add a sign-extension test for kfuncs > > > selftests/bpf: Fix test_lsm on s390x > > > selftests/bpf: Fix test_xdp_adjust_tail_grow2 on s390x > > > selftests/bpf: Fix vmlinux test on s390x > > > libbpf: Read usdt arg spec with bpf_probe_read_kernel() > > > s390/bpf: Fix a typo in a comment > > > s390/bpf: Add expoline to tail calls > > > s390/bpf: Implement bpf_arch_text_poke() > > > bpf: btf: Add BTF_FMODEL_SIGNED_ARG flag > > > s390/bpf: Implement arch_prepare_bpf_trampoline() > > > s390/bpf: Implement bpf_jit_supports_subprog_tailcalls() > > > s390/bpf: Implement bpf_jit_supports_kfunc_call() > > > > > > arch/s390/net/bpf_jit_comp.c | 708 > > > +++++++++++++++++- > > > include/linux/bpf.h | 8 + > > > include/linux/btf.h | 15 +- > > > kernel/bpf/btf.c | 16 +- > > > net/bpf/test_run.c | 9 + > > > tools/lib/bpf/usdt.bpf.h | 33 +- > > > tools/testing/selftests/bpf/Makefile | 7 +- > > > tools/testing/selftests/bpf/netcnt_common.h | 6 +- > > > .../selftests/bpf/prog_tests/bpf_cookie.c | 6 +- > > > .../bpf/prog_tests/cgrp_local_storage.c | 2 +- > > > .../selftests/bpf/prog_tests/decap_sanity.c | 2 +- > > > .../selftests/bpf/prog_tests/fexit_stress.c | 6 +- > > > .../selftests/bpf/prog_tests/kfree_skb.c | 2 +- > > > .../selftests/bpf/prog_tests/kfunc_call.c | 1 + > > > .../selftests/bpf/prog_tests/test_lsm.c | 3 +- > > > .../bpf/prog_tests/trampoline_count.c | 4 + > > > tools/testing/selftests/bpf/prog_tests/usdt.c | 1 + > > > .../bpf/prog_tests/verify_pkcs7_sig.c | 9 + > > > .../bpf/prog_tests/xdp_adjust_tail.c | 7 +- > > > .../bpf/prog_tests/xdp_do_redirect.c | 4 + > > > .../selftests/bpf/progs/kfunc_call_test.c | 18 + > > > tools/testing/selftests/bpf/progs/lsm.c | 7 +- > > > .../bpf/progs/test_verify_pkcs7_sig.c | 12 +- > > > .../selftests/bpf/progs/test_vmlinux.c | 4 +- > > > .../bpf/progs/test_xdp_adjust_tail_grow.c | 8 +- > > > 25 files changed, 816 insertions(+), 82 deletions(-) > > > > > > -- > > > 2.39.1 > > > >
On Fri, 2023-01-27 at 09:24 -0800, Andrii Nakryiko wrote: > On Fri, Jan 27, 2023 at 8:51 AM Ilya Leoshkevich <iii@linux.ibm.com> > wrote: > > > > On Wed, 2023-01-25 at 16:45 -0800, Andrii Nakryiko wrote: > > > On Wed, Jan 25, 2023 at 1:39 PM Ilya Leoshkevich > > > <iii@linux.ibm.com> > > > wrote: > > > > > > > > Hi, > > > > > > > > This series implements poke, trampoline, kfunc, mixing subprogs > > > > and > > > > tailcalls, and fixes a number of tests on s390x. > > > > > > > > The following failures still remain: > > > > > > > > #52 core_read_macros:FAIL > > > > Uses BPF_PROBE_READ(), shouldn't there be > > > > BPF_PROBE_READ_KERNEL()? > > > > > > BPF_PROBE_READ(), similarly to BPF_CORE_READ() both use > > > bpf_probe_read_kernel() internally, as it's most common use case. > > > We > > > have separate BPF_PROBE_READ_USER() and BPF_CORE_READ_USER() for > > > when > > > bpf_probe_read_user() has to be used. > > > > At least purely from the code perspective, BPF_PROBE_READ() seems > > to > > delegate to bpf_probe_read(). The following therefore helps with > > this > > test: > > > > --- a/tools/lib/bpf/bpf_core_read.h > > +++ b/tools/lib/bpf/bpf_core_read.h > > @@ -364,7 +364,7 @@ enum bpf_enum_value_kind { > > > > /* Non-CO-RE variant of BPF_CORE_READ_INTO() */ > > #define BPF_PROBE_READ_INTO(dst, src, a, ...) ({ > > \ > > - ___core_read(bpf_probe_read, bpf_probe_read, > > \ > > + ___core_read(bpf_probe_read_kernel, bpf_probe_read_kernel, > > \ > > dst, (src), a, ##__VA_ARGS__) > > \ > > }) > > > > @@ -400,7 +400,7 @@ enum bpf_enum_value_kind { > > > > /* Non-CO-RE variant of BPF_CORE_READ_STR_INTO() */ > > #define BPF_PROBE_READ_STR_INTO(dst, src, a, ...) ({ > > \ > > - ___core_read(bpf_probe_read_str, bpf_probe_read, > > \ > > + ___core_read(bpf_probe_read_kernel_str, > > bpf_probe_read_kernel, > > \ > > dst, (src), a, ##__VA_ARGS__) > > \ > > }) > > > > but I'm not sure if there are backward compatibility concerns, or > > if > > libbpf is supposed to rewrite this when > > !ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE. > > Oh, this is just a bug, it was an omission when we converted > BPF_CORE_READ to bpf_probe_read_kernel. If you look at bpf_core_read > definition, it is using bpf_probe_read_kernel, which is also the > intent for BPF_PROBE_READ. Let's fix this. > > And there is no backwards compat concerns because libbpf will > automatically convert calls to bpf_probe_read_{kernel,user}[_str] to > bpf_probe_read[_str] if host kernel doesn't yet support kernel or > user > specific variants. Thanks for confirming! I will include the fix in v2. > > > > #82 get_stack_raw_tp:FAIL > > > > get_stack_print_output:FAIL:user_stack corrupted user stack > > > > Known issue: > > > > We cannot reliably unwind userspace on s390x without DWARF. > > > > > > like in principle, or frame pointers (or some equivalent) needs > > > to be > > > configured for this to work? > > > > > > Asking also in the context of [0], where s390x was excluded. If > > > there > > > is actually a way to enable frame pointer-based stack unwinding > > > on > > > s390x, would be nice to hear from an expert. > > > > > > [0] https://pagure.io/fesco/issue/2923 > > > > For DWARFless unwinding we have -mbackchain (not to be confused > > with > > -fno-omit-frame-pointer, which we also have, but which just hurts > > performance without providing tangible benefits). > > -mbackchain has a few problems though: > > > > - It's not atomic. Here is a typical prologue with -mbackchain: > > > > 1: stmg %r11,%r15,88(%r15) # save non-volatile > > registers > > 2: lgr %r14,%r15 # %r14 = sp > > 3: lay %r15,-160(%r15) # sp -= 160 > > 4: stg %r14,0(%r15) # *(void **)sp = %r14 > > > > The invariant here is that *(void **)%r15 is always a pointer to > > the > > next frame. This means that if we unwind from (4), we are totally > > broken. This does not happen if we unwind from any other > > instruction, > > but still. > > > > - Unwinding from (1)-(3) is not particularly good either. PSW > > points to > > the callee, but R15 points to the caller's frame. > > > > - Unwinding leaf functions is like the previous problem, but worse: > > they often do not establish a stack frame at all, so PSW and R15 > > are > > out of sync for the entire duration of the call. > > > > Therefore .eh_frame-based unwinding is preferred, since it covers > > all > > these corner cases and is therefore reliable. From what I > > understand, > > adding .eh_frame unwinding to the kernel is not desirable. In an > > internal discussion we had another idea though: would it be > > possible to > > have an eBPF program that does .eh_frame parsing and unwinding? I > > understand that it can be technically challenging at the moment, > > but > > the end result would not be exploitable by crafting malicious > > .eh_frame sections, won't loop endlessly, will have good > > performance, > > etc. > > Thanks for details. This was all discussed at length in Fedora > -fno-omit-frame-pointer discussion I linked above, so no real need to > go over this again. .eh_frame-based unwinding on BPF side is > possible, > but only for processes that you knew about (and preprocessed) before > you started profiling session. Pre-processing is memory and > cpu-intensive operation on busy systems, and they will miss any > processes started during profiling. So as a general approach for > system-wide profiling it leaves a lot to be desired. Thanks for the explanation, I'll read the thread and come back if I get some new ideas not listed here. > Should we enable -mbackchain in our CI for s390x to be able to > capture > stack traces (even if on some instructions they might be incomplete > or > outright broken)? Let's do it, I don't have anything against this. [...] > > > > > > Here is the full log: > > > > https://gist.github.com/iii-i/8e20100c33ab6f0dffb5e6e51d1330e8 > > > > Apparently we do indeed lose a constraint established by > > if (hdr->tcp_len < sizeof(*hdr->tcp)). But the naive > > The test is too big and unfamiliar for me to figure this out. And the > problem is not upper bound, but lower bound. hdr->tcp_len is not > proven to be strictly greater than zero, which is what verifier > complains about. Not sure how it works on other arches right now. > > > But I see that bpf_csum_diff defines size arguments as > ARG_CONST_SIZE_OR_ZERO while bpf_tcp_raw_gen_syncookie_ipv4 has > ARG_CONST_SIZE. I generally found ARG_CONST_SIZE way too problematic > in practice, I'd say we should change it to ARG_CONST_SIZE_OR_ZERO. Yes, this helps, and doesn't seem to introduce issues, since the minimum length is enforced inside this function anyway. I will include the change for bpf_tcp_raw_gen_syncookie_ipv{4,6} in v2; I guess some other functions may benefit from this as well, but this is outside the scope of this series. [...]