Message ID: 20221231163122.1360813-1-eddyz87@gmail.com (mailing list archive)
Series: Support for BPF_ST instruction in LLVM C compiler
On Sat, Dec 31, 2022 at 8:31 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> BPF has two documented (non-atomic) memory store instructions:
>
> BPF_STX: *(size *) (dst_reg + off) = src_reg
> BPF_ST : *(size *) (dst_reg + off) = imm32
>
> Currently the LLVM BPF back-end does not emit the BPF_ST instruction
> and does not allow one to be specified as inline assembly.
>
> Recently I've been exploring ways to port some of the verifier test
> cases from tools/testing/selftests/bpf/verifier/*.c to use inline
> assembly and the machinery provided in
> tools/testing/selftests/bpf/test_loader.c (which should hopefully
> simplify test maintenance).
> The BPF_ST instruction is popular in these tests: used in 52 of 94 files.
>
> While it is possible to adjust LLVM to only support BPF_ST for inline
> assembly blocks, that seems a bit wasteful. This patch-set contains the
> verifier changes necessary if LLVM is allowed to freely emit BPF_ST
> instructions (source code is available here [1]).

Would we gate LLVM's emitting of BPF_ST for C code behind some new
cpu=v4? What is the benefit of the compiler automatically emitting such
instructions? Thinking about the logistics: if there isn't much benefit,
then as a BPF application owner I wouldn't bother enabling this behavior
and risking regressions on old kernels that don't have these changes.

So I feel like the biggest benefit is being able to use this instruction
in embedded assembly, to make writing and maintaining tests easier.

> The changes include:
> - update to the verifier.c:check_stack_write_*() functions to track
>   constant values spilled to the stack via the BPF_ST instruction in
>   the same way stack spills of known registers by BPF_STX are tracked;
> - updates to verifier.c:convert_ctx_access() and its callbacks to
>   handle the BPF_ST instruction in a way similar to BPF_STX;
> - some test adjustments and a few new tests.
>
> With the above changes applied and LLVM from [1] all test_verifier,
> test_maps, test_progs and test_progs-no_alu32 test cases are passing.
>
> When built using the LLVM version from [1] BPF programs generated for
> selftests and Cilium programs (taken from [2]) see certain reduction
> in size, e.g. below are total numbers of instructions for files with
> over 5K instructions:
>
> File                               Insns    Insns    Insns   Diff
>                                    w/o      with     diff    pct
>                                    BPF_ST   BPF_ST
> cilium/bpf_host.o                  44620    43774     -846   -1.90%
> cilium/bpf_lxc.o                   36842    36060     -782   -2.12%
> cilium/bpf_overlay.o               23557    23186     -371   -1.57%
> cilium/bpf_xdp.o                   26397    25931     -466   -1.77%
> selftests/core_kern.bpf.o          12359    12359        0    0.00%
> selftests/linked_list_fail.bpf.o    5501     5302     -199   -3.62%
> selftests/profiler1.bpf.o          17828    17709     -119   -0.67%
> selftests/pyperf100.bpf.o          49793    49268     -525   -1.05%
> selftests/pyperf180.bpf.o          88738    87813     -925   -1.04%
> selftests/pyperf50.bpf.o           25388    25113     -275   -1.08%
> selftests/pyperf600.bpf.o          78330    78300      -30   -0.04%
> selftests/pyperf_global.bpf.o       5244     5188      -56   -1.07%
> selftests/pyperf_subprogs.bpf.o     5262     5192      -70   -1.33%
> selftests/strobemeta.bpf.o         17154    16065    -1089   -6.35%
> selftests/test_verif_scale2.bpf.o  11337    11337        0    0.00%
>
> (Instructions are counted by counting the number of instruction lines
> in disassembly).
>
> Is community interested in this work?
> Are there any omissions in my changes to the verifier?
>
> Known issue:
>
> There are two programs (one Cilium, one selftest) that exhibit
> anomalous increase in verifier processing time with this patch-set:
>
> File                 Program                        Insns (A)  Insns (B)  Insns (DIFF)
> -------------------  -----------------------------  ---------  ---------  --------------
> bpf_host.o           tail_ipv6_host_policy_ingress       1576       2403  +827 (+52.47%)
> map_kptr.bpf.o       test_map_kptr                         400        475   +75 (+18.75%)
> -------------------  -----------------------------  ---------  ---------  --------------
>
> These are under investigation.
>
> Thanks,
> Eduard
>
> [1] https://reviews.llvm.org/D140804
> [2] git@github.com:anakryiko/cilium.git
>
> Eduard Zingerman (5):
>   bpf: more precise stack write reasoning for BPF_ST instruction
>   selftests/bpf: check if verifier tracks constants spilled by
>     BPF_ST_MEM
>   bpf: allow ctx writes using BPF_ST_MEM instruction
>   selftests/bpf: test if pointer type is tracked for BPF_ST_MEM
>   selftests/bpf: don't match exact insn index in expected error message
>
>  kernel/bpf/cgroup.c                           |  49 +++++---
>  kernel/bpf/verifier.c                         | 102 +++++++++-------
>  net/core/filter.c                             |  72 ++++++------
>  .../selftests/bpf/prog_tests/log_fixup.c      |   2 +-
>  .../selftests/bpf/prog_tests/spin_lock.c      |   8 +-
>  .../bpf/verifier/bounds_mix_sign_unsign.c     | 110 ++++++++++--------
>  .../selftests/bpf/verifier/bpf_st_mem.c       |  29 +++++
>  tools/testing/selftests/bpf/verifier/ctx.c    |  11 --
>  tools/testing/selftests/bpf/verifier/unpriv.c |  23 ++++
>  9 files changed, 249 insertions(+), 157 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/verifier/bpf_st_mem.c
>
> --
> 2.39.0
> On Sat, Dec 31, 2022 at 8:31 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>>
>> BPF has two documented (non-atomic) memory store instructions:
>>
>> BPF_STX: *(size *) (dst_reg + off) = src_reg
>> BPF_ST : *(size *) (dst_reg + off) = imm32
>>
>> Currently LLVM BPF back-end does not emit BPF_ST instruction and does
>> not allow one to be specified as inline assembly.
>>
>> Recently I've been exploring ways to port some of the verifier test
>> cases from tools/testing/selftests/bpf/verifier/*.c to use inline
>> assembly and machinery provided in
>> tools/testing/selftests/bpf/test_loader.c (which should hopefully
>> simplify tests maintenance).
>> The BPF_ST instruction is popular in these tests: used in 52 of 94 files.
>>
>> While it is possible to adjust LLVM to only support BPF_ST for inline
>> assembly blocks it seems a bit wasteful. This patch-set contains a set
>> of changes to verifier necessary in case when LLVM is allowed to
>> freely emit BPF_ST instructions (source code is available here [1]).
>
> Would we gate LLVM's emitting of BPF_ST for C code behind some new
> cpu=v4? What is the benefit for the compiler to start automatically
> emitting such instructions? Thinking about the logistics: if there
> isn't much benefit, as a BPF application owner I wouldn't bother
> enabling this behavior and risking regressions on old kernels that
> don't have these changes.

Hmm, GCC happily generates BPF_ST instructions:

$ echo 'int v; void foo () { v = 666; }' | bpf-unknown-none-gcc -O2 -xc -S -o foo.s -
$ cat foo.s
        .file   "<stdin>"
        .text
        .align  3
        .global foo
        .type   foo, @function
foo:
        lddw    %r0,v
        stw     [%r0+0],666
        exit
        .size   foo, .-foo
        .global v
        .type   v, @object
        .lcomm  v,4,4
        .ident  "GCC: (GNU) 12.0.0 20211206 (experimental)"

It has been doing that since October 2019, I think before the cpu
versioning mechanism was put in place?

We weren't aware this was problematic. Does the verifier reject such
instructions?
> So I feel like the biggest benefit is to be able to use this > instruction in embedded assembly, to make writing and maintaining > tests easier. > >> The changes include: >> - update to verifier.c:check_stack_write_*() functions to track >> constant values spilled to stack via BPF_ST instruction in a same >> way stack spills of known registers by BPF_STX are tracked; >> - updates to verifier.c:convert_ctx_access() and it's callbacks to >> handle BPF_ST instruction in a way similar to BPF_STX; >> - some test adjustments and a few new tests. >> >> With the above changes applied and LLVM from [1] all test_verifier, >> test_maps, test_progs and test_progs-no_alu32 test cases are passing. >> >> When built using the LLVM version from [1] BPF programs generated for >> selftests and Cilium programs (taken from [2]) see certain reduction >> in size, e.g. below are total numbers of instructions for files with >> over 5K instructions: >> >> File Insns Insns Insns Diff >> w/o with diff pct >> BPF_ST BPF_ST >> cilium/bpf_host.o 44620 43774 -846 -1.90% >> cilium/bpf_lxc.o 36842 36060 -782 -2.12% >> cilium/bpf_overlay.o 23557 23186 -371 -1.57% >> cilium/bpf_xdp.o 26397 25931 -466 -1.77% >> selftests/core_kern.bpf.o 12359 12359 0 0.00% >> selftests/linked_list_fail.bpf.o 5501 5302 -199 -3.62% >> selftests/profiler1.bpf.o 17828 17709 -119 -0.67% >> selftests/pyperf100.bpf.o 49793 49268 -525 -1.05% >> selftests/pyperf180.bpf.o 88738 87813 -925 -1.04% >> selftests/pyperf50.bpf.o 25388 25113 -275 -1.08% >> selftests/pyperf600.bpf.o 78330 78300 -30 -0.04% >> selftests/pyperf_global.bpf.o 5244 5188 -56 -1.07% >> selftests/pyperf_subprogs.bpf.o 5262 5192 -70 -1.33% >> selftests/strobemeta.bpf.o 17154 16065 -1089 -6.35% >> selftests/test_verif_scale2.bpf.o 11337 11337 0 0.00% >> >> (Instructions are counted by counting the number of instruction lines >> in disassembly). >> >> Is community interested in this work? >> Are there any omissions in my changes to the verifier? 
>> >> Known issue: >> >> There are two programs (one Cilium, one selftest) that exhibit >> anomalous increase in verifier processing time with this patch-set: >> >> File Program Insns (A) Insns (B) Insns (DIFF) >> ------------------- ----------------------------- --------- --------- -------------- >> bpf_host.o tail_ipv6_host_policy_ingress 1576 2403 +827 (+52.47%) >> map_kptr.bpf.o test_map_kptr 400 475 +75 (+18.75%) >> ------------------- ----------------------------- --------- --------- -------------- >> >> These are under investigation. >> >> Thanks, >> Eduard >> >> [1] https://reviews.llvm.org/D140804 >> [2] git@github.com:anakryiko/cilium.git >> >> Eduard Zingerman (5): >> bpf: more precise stack write reasoning for BPF_ST instruction >> selftests/bpf: check if verifier tracks constants spilled by >> BPF_ST_MEM >> bpf: allow ctx writes using BPF_ST_MEM instruction >> selftests/bpf: test if pointer type is tracked for BPF_ST_MEM >> selftests/bpf: don't match exact insn index in expected error message >> >> kernel/bpf/cgroup.c | 49 +++++--- >> kernel/bpf/verifier.c | 102 +++++++++------- >> net/core/filter.c | 72 ++++++------ >> .../selftests/bpf/prog_tests/log_fixup.c | 2 +- >> .../selftests/bpf/prog_tests/spin_lock.c | 8 +- >> .../bpf/verifier/bounds_mix_sign_unsign.c | 110 ++++++++++-------- >> .../selftests/bpf/verifier/bpf_st_mem.c | 29 +++++ >> tools/testing/selftests/bpf/verifier/ctx.c | 11 -- >> tools/testing/selftests/bpf/verifier/unpriv.c | 23 ++++ >> 9 files changed, 249 insertions(+), 157 deletions(-) >> create mode 100644 tools/testing/selftests/bpf/verifier/bpf_st_mem.c >> >> -- >> 2.39.0 >>
On Thu, 2023-01-05 at 11:06 +0100, Jose E. Marchesi wrote: > > On Sat, Dec 31, 2022 at 8:31 AM Eduard Zingerman <eddyz87@gmail.com> wrote: > > > > > > BPF has two documented (non-atomic) memory store instructions: > > > > > > BPF_STX: *(size *) (dst_reg + off) = src_reg > > > BPF_ST : *(size *) (dst_reg + off) = imm32 > > > > > > Currently LLVM BPF back-end does not emit BPF_ST instruction and does > > > not allow one to be specified as inline assembly. > > > > > > Recently I've been exploring ways to port some of the verifier test > > > cases from tools/testing/selftests/bpf/verifier/*.c to use inline assembly > > > and machinery provided in tools/testing/selftests/bpf/test_loader.c > > > (which should hopefully simplify tests maintenance). > > > The BPF_ST instruction is popular in these tests: used in 52 of 94 files. > > > > > > While it is possible to adjust LLVM to only support BPF_ST for inline > > > assembly blocks it seems a bit wasteful. This patch-set contains a set > > > of changes to verifier necessary in case when LLVM is allowed to > > > freely emit BPF_ST instructions (source code is available here [1]). > > > > Would we gate LLVM's emitting of BPF_ST for C code behind some new > > cpu=v4? What is the benefit for compiler to start automatically emit > > such instructions? Such thinking about logistics, if there isn't much > > benefit, as BPF application owner I wouldn't bother enabling this > > behavior risking regressions on old kernels that don't have these > > changes. 
>
> Hmm, GCC happily generates BPF_ST instructions:
>
> $ echo 'int v; void foo () { v = 666; }' | bpf-unknown-none-gcc -O2 -xc -S -o foo.s -
> $ cat foo.s
>         .file   "<stdin>"
>         .text
>         .align  3
>         .global foo
>         .type   foo, @function
> foo:
>         lddw    %r0,v
>         stw     [%r0+0],666
>         exit
>         .size   foo, .-foo
>         .global v
>         .type   v, @object
>         .lcomm  v,4,4
>         .ident  "GCC: (GNU) 12.0.0 20211206 (experimental)"
>
> Been doing that since October 2019, I think before the cpu versioning
> mechanism was put in place?
>
> We weren't aware this was problematic. Does the verifier reject such
> instructions?

Interesting, do the BPF selftests generated by GCC pass the same way
they do when generated by clang?

I had to make the following changes to the verifier to get the
selftests to pass when the BPF_ST instruction is allowed for selection:

- patch #1 in this patchset: track values of constants written to the
  stack using BPF_ST. Currently these are tracked imprecisely, unlike
  writes using BPF_STX, e.g.:

    fp[-8] = 42;   currently the verifier assumes that fp[-8]=mmmmmmmm
                   after such an instruction, where m stands for "misc",
                   just a note that something is written at fp[-8].

    r1 = 42;       the verifier tracks r1=42 after this instruction.
    fp[-8] = r1;   the verifier tracks fp[-8]=42 after this instruction.

  So the patch makes both cases equivalent.

- patch #3 in this patchset: adjusts verifier.c:convert_ctx_access()
  to operate on BPF_ST alongside BPF_STX.

  Context parameters for some BPF program types are "fake" data
  structures. The verifier matches all BPF_STX and BPF_LDX
  instructions that operate on pointers to such contexts and rewrites
  these instructions. It might change an offset or add another layer
  of indirection, etc. E.g. see filter.c:bpf_convert_ctx_access().
  (This also implies that the verifier forbids writes to non-constant
  offsets inside such structures.)

  So the patch extends this logic to also handle BPF_ST.
> > > So I feel like the biggest benefit is to be able to use this > > instruction in embedded assembly, to make writing and maintaining > > tests easier. > > > > > The changes include: > > > - update to verifier.c:check_stack_write_*() functions to track > > > constant values spilled to stack via BPF_ST instruction in a same > > > way stack spills of known registers by BPF_STX are tracked; > > > - updates to verifier.c:convert_ctx_access() and it's callbacks to > > > handle BPF_ST instruction in a way similar to BPF_STX; > > > - some test adjustments and a few new tests. > > > > > > With the above changes applied and LLVM from [1] all test_verifier, > > > test_maps, test_progs and test_progs-no_alu32 test cases are passing. > > > > > > When built using the LLVM version from [1] BPF programs generated for > > > selftests and Cilium programs (taken from [2]) see certain reduction > > > in size, e.g. below are total numbers of instructions for files with > > > over 5K instructions: > > > > > > File Insns Insns Insns Diff > > > w/o with diff pct > > > BPF_ST BPF_ST > > > cilium/bpf_host.o 44620 43774 -846 -1.90% > > > cilium/bpf_lxc.o 36842 36060 -782 -2.12% > > > cilium/bpf_overlay.o 23557 23186 -371 -1.57% > > > cilium/bpf_xdp.o 26397 25931 -466 -1.77% > > > selftests/core_kern.bpf.o 12359 12359 0 0.00% > > > selftests/linked_list_fail.bpf.o 5501 5302 -199 -3.62% > > > selftests/profiler1.bpf.o 17828 17709 -119 -0.67% > > > selftests/pyperf100.bpf.o 49793 49268 -525 -1.05% > > > selftests/pyperf180.bpf.o 88738 87813 -925 -1.04% > > > selftests/pyperf50.bpf.o 25388 25113 -275 -1.08% > > > selftests/pyperf600.bpf.o 78330 78300 -30 -0.04% > > > selftests/pyperf_global.bpf.o 5244 5188 -56 -1.07% > > > selftests/pyperf_subprogs.bpf.o 5262 5192 -70 -1.33% > > > selftests/strobemeta.bpf.o 17154 16065 -1089 -6.35% > > > selftests/test_verif_scale2.bpf.o 11337 11337 0 0.00% > > > > > > (Instructions are counted by counting the number of instruction lines > > > in 
disassembly). > > > > > > Is community interested in this work? > > > Are there any omissions in my changes to the verifier? > > > > > > Known issue: > > > > > > There are two programs (one Cilium, one selftest) that exhibit > > > anomalous increase in verifier processing time with this patch-set: > > > > > > File Program Insns (A) Insns (B) Insns (DIFF) > > > ------------------- ----------------------------- --------- --------- -------------- > > > bpf_host.o tail_ipv6_host_policy_ingress 1576 2403 +827 (+52.47%) > > > map_kptr.bpf.o test_map_kptr 400 475 +75 (+18.75%) > > > ------------------- ----------------------------- --------- --------- -------------- > > > > > > These are under investigation. > > > > > > Thanks, > > > Eduard > > > > > > [1] https://reviews.llvm.org/D140804 > > > [2] git@github.com:anakryiko/cilium.git > > > > > > Eduard Zingerman (5): > > > bpf: more precise stack write reasoning for BPF_ST instruction > > > selftests/bpf: check if verifier tracks constants spilled by > > > BPF_ST_MEM > > > bpf: allow ctx writes using BPF_ST_MEM instruction > > > selftests/bpf: test if pointer type is tracked for BPF_ST_MEM > > > selftests/bpf: don't match exact insn index in expected error message > > > > > > kernel/bpf/cgroup.c | 49 +++++--- > > > kernel/bpf/verifier.c | 102 +++++++++------- > > > net/core/filter.c | 72 ++++++------ > > > .../selftests/bpf/prog_tests/log_fixup.c | 2 +- > > > .../selftests/bpf/prog_tests/spin_lock.c | 8 +- > > > .../bpf/verifier/bounds_mix_sign_unsign.c | 110 ++++++++++-------- > > > .../selftests/bpf/verifier/bpf_st_mem.c | 29 +++++ > > > tools/testing/selftests/bpf/verifier/ctx.c | 11 -- > > > tools/testing/selftests/bpf/verifier/unpriv.c | 23 ++++ > > > 9 files changed, 249 insertions(+), 157 deletions(-) > > > create mode 100644 tools/testing/selftests/bpf/verifier/bpf_st_mem.c > > > > > > -- > > > 2.39.0 > > >
> On Thu, 2023-01-05 at 11:06 +0100, Jose E. Marchesi wrote: >> > On Sat, Dec 31, 2022 at 8:31 AM Eduard Zingerman <eddyz87@gmail.com> wrote: >> > > >> > > BPF has two documented (non-atomic) memory store instructions: >> > > >> > > BPF_STX: *(size *) (dst_reg + off) = src_reg >> > > BPF_ST : *(size *) (dst_reg + off) = imm32 >> > > >> > > Currently LLVM BPF back-end does not emit BPF_ST instruction and does >> > > not allow one to be specified as inline assembly. >> > > >> > > Recently I've been exploring ways to port some of the verifier test >> > > cases from tools/testing/selftests/bpf/verifier/*.c to use inline assembly >> > > and machinery provided in tools/testing/selftests/bpf/test_loader.c >> > > (which should hopefully simplify tests maintenance). >> > > The BPF_ST instruction is popular in these tests: used in 52 of 94 files. >> > > >> > > While it is possible to adjust LLVM to only support BPF_ST for inline >> > > assembly blocks it seems a bit wasteful. This patch-set contains a set >> > > of changes to verifier necessary in case when LLVM is allowed to >> > > freely emit BPF_ST instructions (source code is available here [1]). >> > >> > Would we gate LLVM's emitting of BPF_ST for C code behind some new >> > cpu=v4? What is the benefit for compiler to start automatically emit >> > such instructions? Such thinking about logistics, if there isn't much >> > benefit, as BPF application owner I wouldn't bother enabling this >> > behavior risking regressions on old kernels that don't have these >> > changes. 
>>
>> Hmm, GCC happily generates BPF_ST instructions:
>>
>> $ echo 'int v; void foo () { v = 666; }' | bpf-unknown-none-gcc -O2 -xc -S -o foo.s -
>> $ cat foo.s
>>         .file   "<stdin>"
>>         .text
>>         .align  3
>>         .global foo
>>         .type   foo, @function
>> foo:
>>         lddw    %r0,v
>>         stw     [%r0+0],666
>>         exit
>>         .size   foo, .-foo
>>         .global v
>>         .type   v, @object
>>         .lcomm  v,4,4
>>         .ident  "GCC: (GNU) 12.0.0 20211206 (experimental)"
>>
>> Been doing that since October 2019, I think before the cpu versioning
>> mechanism was put in place?
>>
>> We weren't aware this was problematic. Does the verifier reject such
>> instructions?
>
> Interesting, do BPF selftests generated by GCC pass the same way they
> do if generated by clang?

We are still working on other issues in GCC; we are not yet on par with
clang when it comes to building and running the BPF selftests. So I
guess we didn't reach that particular breaking point yet ;)

> I had to do the following changes to the verifier to make the
> selftests pass when BPF_ST instruction is allowed for selection:

So, if the current state of affairs is that the kernel rejects BPF_ST
instructions, we shouldn't be generating them from GCC.

I like Andrii's idea of introducing -mcpu=v4 and conditioning the
generation of BPF_ST in the compiler on it. GCC defaults to -mcpu=v3
atm.

Yonghong, WDYT? Would that be acceptable for clang as well?

>
> - patch #1 in this patchset: track values of constants written to
>   stack using BPF_ST. Currently these are tracked imprecisely, unlike
>   the writes using BPF_STX, e.g.:
>
>     fp[-8] = 42;   currently verifier assumes that fp[-8]=mmmmmmmm
>                    after such instruction, where m stands for "misc",
>                    just a note that something is written at fp[-8].
>
>     r1 = 42;       verifier tracks r1=42 after this instruction.
>     fp[-8] = r1;   verifier tracks fp[-8]=42 after this instruction.
>
> So the patch makes both cases equivalent.
>
> - patch #3 in this patchset: adjusts verifier.c:convert_ctx_access()
>   to operate on BPF_ST alongside BPF_STX.
> > Context parameters for some BPF programs types are "fake" data > structures. The verifier matches all BPF_STX and BPF_LDX > instructions that operate on pointers to such contexts and rewrites > these instructions. It might change an offset or add another layer > of indirection, etc. E.g. see filter.c:bpf_convert_ctx_access(). > (This also implies that verifier forbids writes to non-constant > offsets inside such structures). > > So the patch extends this logic to also handle BPF_ST. > >> >> > So I feel like the biggest benefit is to be able to use this >> > instruction in embedded assembly, to make writing and maintaining >> > tests easier. >> > >> > > The changes include: >> > > - update to verifier.c:check_stack_write_*() functions to track >> > > constant values spilled to stack via BPF_ST instruction in a same >> > > way stack spills of known registers by BPF_STX are tracked; >> > > - updates to verifier.c:convert_ctx_access() and it's callbacks to >> > > handle BPF_ST instruction in a way similar to BPF_STX; >> > > - some test adjustments and a few new tests. >> > > >> > > With the above changes applied and LLVM from [1] all test_verifier, >> > > test_maps, test_progs and test_progs-no_alu32 test cases are passing. >> > > >> > > When built using the LLVM version from [1] BPF programs generated for >> > > selftests and Cilium programs (taken from [2]) see certain reduction >> > > in size, e.g. 
below are total numbers of instructions for files with >> > > over 5K instructions: >> > > >> > > File Insns Insns Insns Diff >> > > w/o with diff pct >> > > BPF_ST BPF_ST >> > > cilium/bpf_host.o 44620 43774 -846 -1.90% >> > > cilium/bpf_lxc.o 36842 36060 -782 -2.12% >> > > cilium/bpf_overlay.o 23557 23186 -371 -1.57% >> > > cilium/bpf_xdp.o 26397 25931 -466 -1.77% >> > > selftests/core_kern.bpf.o 12359 12359 0 0.00% >> > > selftests/linked_list_fail.bpf.o 5501 5302 -199 -3.62% >> > > selftests/profiler1.bpf.o 17828 17709 -119 -0.67% >> > > selftests/pyperf100.bpf.o 49793 49268 -525 -1.05% >> > > selftests/pyperf180.bpf.o 88738 87813 -925 -1.04% >> > > selftests/pyperf50.bpf.o 25388 25113 -275 -1.08% >> > > selftests/pyperf600.bpf.o 78330 78300 -30 -0.04% >> > > selftests/pyperf_global.bpf.o 5244 5188 -56 -1.07% >> > > selftests/pyperf_subprogs.bpf.o 5262 5192 -70 -1.33% >> > > selftests/strobemeta.bpf.o 17154 16065 -1089 -6.35% >> > > selftests/test_verif_scale2.bpf.o 11337 11337 0 0.00% >> > > >> > > (Instructions are counted by counting the number of instruction lines >> > > in disassembly). >> > > >> > > Is community interested in this work? >> > > Are there any omissions in my changes to the verifier? >> > > >> > > Known issue: >> > > >> > > There are two programs (one Cilium, one selftest) that exhibit >> > > anomalous increase in verifier processing time with this patch-set: >> > > >> > > File Program Insns (A) Insns (B) Insns (DIFF) >> > > ------------------- ----------------------------- --------- --------- -------------- >> > > bpf_host.o tail_ipv6_host_policy_ingress 1576 2403 +827 (+52.47%) >> > > map_kptr.bpf.o test_map_kptr 400 475 +75 (+18.75%) >> > > ------------------- ----------------------------- --------- --------- -------------- >> > > >> > > These are under investigation. 
>> > > >> > > Thanks, >> > > Eduard >> > > >> > > [1] https://reviews.llvm.org/D140804 >> > > [2] git@github.com:anakryiko/cilium.git >> > > >> > > Eduard Zingerman (5): >> > > bpf: more precise stack write reasoning for BPF_ST instruction >> > > selftests/bpf: check if verifier tracks constants spilled by >> > > BPF_ST_MEM >> > > bpf: allow ctx writes using BPF_ST_MEM instruction >> > > selftests/bpf: test if pointer type is tracked for BPF_ST_MEM >> > > selftests/bpf: don't match exact insn index in expected error message >> > > >> > > kernel/bpf/cgroup.c | 49 +++++--- >> > > kernel/bpf/verifier.c | 102 +++++++++------- >> > > net/core/filter.c | 72 ++++++------ >> > > .../selftests/bpf/prog_tests/log_fixup.c | 2 +- >> > > .../selftests/bpf/prog_tests/spin_lock.c | 8 +- >> > > .../bpf/verifier/bounds_mix_sign_unsign.c | 110 ++++++++++-------- >> > > .../selftests/bpf/verifier/bpf_st_mem.c | 29 +++++ >> > > tools/testing/selftests/bpf/verifier/ctx.c | 11 -- >> > > tools/testing/selftests/bpf/verifier/unpriv.c | 23 ++++ >> > > 9 files changed, 249 insertions(+), 157 deletions(-) >> > > create mode 100644 tools/testing/selftests/bpf/verifier/bpf_st_mem.c >> > > >> > > -- >> > > 2.39.0 >> > >
On Thu, Jan 05, 2023 at 02:07:05PM +0200, Eduard Zingerman wrote: > On Thu, 2023-01-05 at 11:06 +0100, Jose E. Marchesi wrote: > > > On Sat, Dec 31, 2022 at 8:31 AM Eduard Zingerman <eddyz87@gmail.com> wrote: > > > > > > > > BPF has two documented (non-atomic) memory store instructions: > > > > > > > > BPF_STX: *(size *) (dst_reg + off) = src_reg > > > > BPF_ST : *(size *) (dst_reg + off) = imm32 > > > > > > > > Currently LLVM BPF back-end does not emit BPF_ST instruction and does > > > > not allow one to be specified as inline assembly. > > > > > > > > Recently I've been exploring ways to port some of the verifier test > > > > cases from tools/testing/selftests/bpf/verifier/*.c to use inline assembly > > > > and machinery provided in tools/testing/selftests/bpf/test_loader.c > > > > (which should hopefully simplify tests maintenance). > > > > The BPF_ST instruction is popular in these tests: used in 52 of 94 files. > > > > > > > > While it is possible to adjust LLVM to only support BPF_ST for inline > > > > assembly blocks it seems a bit wasteful. This patch-set contains a set > > > > of changes to verifier necessary in case when LLVM is allowed to > > > > freely emit BPF_ST instructions (source code is available here [1]). > > > > > > Would we gate LLVM's emitting of BPF_ST for C code behind some new > > > cpu=v4? What is the benefit for compiler to start automatically emit > > > such instructions? Such thinking about logistics, if there isn't much > > > benefit, as BPF application owner I wouldn't bother enabling this > > > behavior risking regressions on old kernels that don't have these > > > changes. 
> > > > Hmm, GCC happily generates BPF_ST instructions: > > > > $ echo 'int v; void foo () { v = 666; }' | bpf-unknown-none-gcc -O2 -xc -S -o foo.s - > > $ cat foo.s > > .file "<stdin>" > > .text > > .align 3 > > .global foo > > .type foo, @function > > foo: > > lddw %r0,v > > stw [%r0+0],666 > > exit > > .size foo, .-foo > > .global v > > .type v, @object > > .lcomm v,4,4 > > .ident "GCC: (GNU) 12.0.0 20211206 (experimental)" > > > > Been doing that since October 2019, I think before the cpu versioning > > mechanism was got in place? > > > > We weren't aware this was problematic. Does the verifier reject such > > instructions? > > Interesting, do BPF selftests generated by GCC pass the same way they > do if generated by clang? > > I had to do the following changes to the verifier to make the > selftests pass when BPF_ST instruction is allowed for selection: > > - patch #1 in this patchset: track values of constants written to > stack using BPF_ST. Currently these are tracked imprecisely, unlike > the writes using BPF_STX, e.g.: > > fp[-8] = 42; currently verifier assumes that fp[-8]=mmmmmmmm > after such instruction, where m stands for "misc", > just a note that something is written at fp[-8]. > > r1 = 42; verifier tracks r1=42 after this instruction. > fp[-8] = r1; verifier tracks fp[-8]=42 after this instruction. > > So the patch makes both cases equivalent. > > - patch #3 in this patchset: adjusts verifier.c:convert_ctx_access() > to operate on BPF_ST alongside BPF_STX. > > Context parameters for some BPF programs types are "fake" data > structures. The verifier matches all BPF_STX and BPF_LDX > instructions that operate on pointers to such contexts and rewrites > these instructions. It might change an offset or add another layer > of indirection, etc. E.g. see filter.c:bpf_convert_ctx_access(). > (This also implies that verifier forbids writes to non-constant > offsets inside such structures). > > So the patch extends this logic to also handle BPF_ST. 
Patch 3 needs to land before llvm starts generating 'st' for ctx
access. That's clear, but I'm missing why patch 1 is necessary. Sure,
it makes the verifier understand scalar spills with 'st' and makes 'st'
equivalent to 'stx', but I'm missing why that is needed. What kind of
programs fail to be verified when llvm starts generating 'st'?

Regarding -mcpu=v4:
I think we need to add all of our upcoming instructions under a single
flag. Otherwise we'll have -mcpu=v5,v6,v7 and full combinations of them.

-mcpu=v4 could mean:
- ST
- sign extending loads
- sign extend a register
- 32-bit JA
- proper bswap insns: bswap16, bswap32, bswap64

The sign extension and 32-bit JA we've discussed earlier. The bswap has
been on my wish list forever. The existing TO_LE, TO_BE insns are really
odd from a compiler pov. The compiler should translate the bswap IR op
into a proper bswap insn, just like it does on all cpus.

Maybe add SDIV to -mcpu=v4 as well?
On 1/12/23 2:27 PM, Alexei Starovoitov wrote: > On Thu, Jan 05, 2023 at 02:07:05PM +0200, Eduard Zingerman wrote: >> On Thu, 2023-01-05 at 11:06 +0100, Jose E. Marchesi wrote: >>>> On Sat, Dec 31, 2022 at 8:31 AM Eduard Zingerman <eddyz87@gmail.com> wrote: >>>>> >>>>> BPF has two documented (non-atomic) memory store instructions: >>>>> >>>>> BPF_STX: *(size *) (dst_reg + off) = src_reg >>>>> BPF_ST : *(size *) (dst_reg + off) = imm32 >>>>> >>>>> Currently LLVM BPF back-end does not emit BPF_ST instruction and does >>>>> not allow one to be specified as inline assembly. >>>>> >>>>> Recently I've been exploring ways to port some of the verifier test >>>>> cases from tools/testing/selftests/bpf/verifier/*.c to use inline assembly >>>>> and machinery provided in tools/testing/selftests/bpf/test_loader.c >>>>> (which should hopefully simplify tests maintenance). >>>>> The BPF_ST instruction is popular in these tests: used in 52 of 94 files. >>>>> >>>>> While it is possible to adjust LLVM to only support BPF_ST for inline >>>>> assembly blocks it seems a bit wasteful. This patch-set contains a set >>>>> of changes to verifier necessary in case when LLVM is allowed to >>>>> freely emit BPF_ST instructions (source code is available here [1]). >>>> >>>> Would we gate LLVM's emitting of BPF_ST for C code behind some new >>>> cpu=v4? What is the benefit for compiler to start automatically emit >>>> such instructions? Such thinking about logistics, if there isn't much >>>> benefit, as BPF application owner I wouldn't bother enabling this >>>> behavior risking regressions on old kernels that don't have these >>>> changes. 
>>>
>>> Hmm, GCC happily generates BPF_ST instructions:
>>>
>>>   $ echo 'int v; void foo () { v = 666; }' | bpf-unknown-none-gcc -O2 -xc -S -o foo.s -
>>>   $ cat foo.s
>>>           .file   "<stdin>"
>>>           .text
>>>           .align  3
>>>           .global foo
>>>           .type   foo, @function
>>>   foo:
>>>           lddw    %r0,v
>>>           stw     [%r0+0],666
>>>           exit
>>>           .size   foo, .-foo
>>>           .global v
>>>           .type   v, @object
>>>           .lcomm  v,4,4
>>>           .ident  "GCC: (GNU) 12.0.0 20211206 (experimental)"
>>>
>>> Been doing that since October 2019, I think before the cpu versioning
>>> mechanism was put in place?
>>>
>>> We weren't aware this was problematic. Does the verifier reject such
>>> instructions?
>>
>> Interesting, do BPF selftests generated by GCC pass the same way they
>> do if generated by clang?
>>
>> I had to do the following changes to the verifier to make the
>> selftests pass when BPF_ST instruction is allowed for selection:
>>
>> - patch #1 in this patchset: track values of constants written to
>>   stack using BPF_ST. Currently these are tracked imprecisely, unlike
>>   the writes using BPF_STX, e.g.:
>>
>>     fp[-8] = 42;   currently verifier assumes that fp[-8]=mmmmmmmm
>>                    after such instruction, where m stands for "misc",
>>                    just a note that something is written at fp[-8].
>>
>>     r1 = 42;       verifier tracks r1=42 after this instruction.
>>     fp[-8] = r1;   verifier tracks fp[-8]=42 after this instruction.
>>
>>   So the patch makes both cases equivalent.
>>
>> - patch #3 in this patchset: adjusts verifier.c:convert_ctx_access()
>>   to operate on BPF_ST alongside BPF_STX.
>>
>>   Context parameters for some BPF program types are "fake" data
>>   structures. The verifier matches all BPF_STX and BPF_LDX
>>   instructions that operate on pointers to such contexts and rewrites
>>   these instructions. It might change an offset or add another layer
>>   of indirection, etc. E.g. see filter.c:bpf_convert_ctx_access().
>>   (This also implies that the verifier forbids writes to non-constant
>>   offsets inside such structures).
>>
>> So the patch extends this logic to also handle BPF_ST.
>
> The patch 3 is necessary to land before llvm starts generating 'st' for ctx access.
> That's clear, but I'm missing why patch 1 is necessary.
> Sure, it's making the verifier understand scalar spills with 'st' and
> makes 'st' equivalent to 'stx', but I'm missing why it's necessary.
> What kind of programs fail to be verified when llvm starts generating 'st' ?
>
> Regarding -mcpu=v4.
> I think we need to add all of our upcoming instructions as a single flag.
> Otherwise we'll have -mcpu=v5,v6,v7 and full combinations of them.
>
> -mcpu=v4 could mean:
> - ST
> - sign extending loads
> - sign extend a register
> - 32-bit JA
> - proper bswap insns: bswap16, bswap32, bswap64
>
> The sign and 32-bit JA we've discussed earlier.
> The bswap was on my wish list forever.
> The existing TO_LE, TO_BE insns are really odd from compiler pov.
> The compiler should translate bswap IR op into proper bswap insn
> just like it does on all cpus.
>
> Maybe add SDIV to -mcpu=v4 as well?

Right, we should add these insns in llvm17 with -mcpu=v4, so we
can keep the number of cpu generations minimum.
> On 1/12/23 2:27 PM, Alexei Starovoitov wrote:
>> [...]
>>
>> -mcpu=v4 could mean:
>> - ST
>> - sign extending loads
>> - sign extend a register
>> - 32-bit JA
>> - proper bswap insns: bswap16, bswap32, bswap64
>> [...]
>>
>> Maybe add SDIV to -mcpu=v4 as well?
>
> Right, we should add these insns in llvm17 with -mcpu=v4, so we
> can keep the number of cpu generations minimum.

How do you plan to encode the sign-extend load instructions?

I guess a possibility would be to use one of the available op-modes for
load instructions that are currently marked as reserved. For example:

  IMM = 0b000
  ABS = 0b001
  IND = 0b010
  MEM = 0b011
  SEM = 0b100  <- new

Then we would have the following code fields for sign-extending LDX
instructions:

  op-mode:SEM op-size:{W,H,B,DW} op-class:LDX
On Fri, 2023-01-13 at 09:53 +0100, Jose E. Marchesi wrote:
> > On 1/12/23 2:27 PM, Alexei Starovoitov wrote:
> > > [...]
> > > The patch 3 is necessary to land before llvm starts generating 'st'
> > > for ctx access.
> > > That's clear, but I'm missing why patch 1 is necessary.
> > > Sure, it's making the verifier understand scalar spills with 'st' and
> > > makes 'st' equivalent to 'stx', but I'm missing why it's necessary.
> > > What kind of programs fail to be verified when llvm starts generating 'st' ?

I should have added an example to the summary. There are a few
test_prog tests that fail w/o this patch, namely atomic_bounds,
dynptr, for_each, xdp_noinline, xdp_synproxy.

Here is how atomic_bounds looks:

  SEC("fentry/bpf_fentry_test1")
  int BPF_PROG(sub, int x)
  {
          int a = 0;
          int b = __sync_fetch_and_add(&a, 1);
          /* b is certainly 0 here. Can the verifier tell? */
          while (b)
                  continue;
          return 0;
  }

  Compiled w/o BPF_ST                      Compiled with BPF_ST

  <sub>:                                   <sub>:
    w1 = 0x0
    *(u32 *)(r10 - 0x4) = r1                 *(u32 *)(r10 - 0x4) = 0x0
    w1 = 0x1                                 w1 = 0x1
    lock *(u32 *)(r10 - 0x4) += r1           lock *(u32 *)(r10 - 0x4) += r1
    if w1 == 0x0 goto +0x1 <LBB0_2>          if w1 == 0x0 goto +0x1 <LBB0_2>

  <LBB0_1>:                                <LBB0_1>:
    goto -0x1 <LBB0_1>                       goto -0x1 <LBB0_1>

  <LBB0_2>:                                <LBB0_2>:
    w0 = 0x0                                 w0 = 0x0
    exit                                     exit

When compiled with BPF_ST and verified w/o the patch #1 the verification
log looks as follows:

  0: R1=ctx(off=0,imm=0) R10=fp0
  0: (62) *(u32 *)(r10 -4) = 0          ; R10=fp0 fp-8=mmmm????
  1: (b4) w1 = 1                        ; R1_w=1
  2: (c3) r1 = atomic_fetch_add((u32 *)(r10 -4), r1)    ; R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0 fp-8=mmmm????
  3: (16) if w1 == 0x0 goto pc+1        ; R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff))
  4: (05) goto pc-1
  4: (05) goto pc-1
  4: (05) goto pc-1
  4: (05) goto pc-1
  infinite loop detected at insn 4

When compiled w/o BPF_ST and verified w/o the patch #1 the verification
log looks as follows:

  func#0 @0
  reg type unsupported for arg#0 function sub#5
  0: R1=ctx(off=0,imm=0) R10=fp0
  0: (b4) w1 = 0                        ; R1_w=0
  1: (63) *(u32 *)(r10 -4) = r1
  last_idx 1 first_idx 0
  regs=2 stack=0 before 0: (b4) w1 = 0
  2: R1_w=0 R10=fp0 fp-8=0000????
  2: (b4) w1 = 1                        ; R1_w=1
  ; int b = __sync_fetch_and_add(&a, 1);
  3: (c3) r1 = atomic_fetch_add((u32 *)(r10 -4), r1)    ; R1_w=P0 R10=fp0 fp-8=mmmm????
  4: (16) if w1 == 0x0 goto pc+1
  6: R1_w=P0
  6: (b4) w0 = 0                        ; R0_w=0
  7: (95) exit
  processed 7 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0

The difference comes from the way the zero write to `r10-4` is
processed: with BPF_ST it is tracked as `fp-8=mmmm????` after the
write, without BPF_ST it is tracked as `fp-8=0000????` after the write.

This is caused by the way `check_stack_write_fixed_off()` is written.
For register spills it either saves the complete register state, or
STACK_ZERO if the register is known to be zero. However, for BPF_ST it
just saves STACK_MISC. Hence, the patch #1.

> > > Regarding -mcpu=v4.
> > > I think we need to add all of our upcoming instructions as a single flag.
> > > Otherwise we'll have -mcpu=v5,v6,v7 and full combinations of them.
> > >
> > > -mcpu=v4 could mean:
> > > - ST
> > > - sign extending loads
> > > - sign extend a register
> > > - 32-bit JA
> > > - proper bswap insns: bswap16, bswap32, bswap64
> > > [...]
> > > Maybe add SDIV to -mcpu=v4 as well?
There is also BPF_JSET "PC += off if dst & src" which is not currently
emitted by the LLVM backend as far as I can tell.

> > Right, we should add these insns in llvm17 with -mcpu=v4, so we
> > can keep the number of cpu generations minimum.
>
> How do you plan to encode the sign-extend load instructions?
>
> I guess a possibility would be to use one of the available op-modes for
> load instructions that are currently marked as reserved. For example:
>
>   IMM = 0b000
>   ABS = 0b001
>   IND = 0b010
>   MEM = 0b011
>   SEM = 0b100  <- new
>
> Then we would have the following code fields for sign-extending LDX
> instructions:
>
>   op-mode:SEM op-size:{W,H,B,DW} op-class:LDX

I second Jose's question about encoding for the new instructions.

> > > - sign extend a register

E.g. like this:

  BPF_SEXT = 0xb0

  opcode: BPF_SEXT | BPF_X | BPF_ALU
  imm32:  {8,16,32}      // to be consistent with BPF_END insn
       or BPF_{B,H,W}    // to be consistent with LDX/STX

Sign extend an 8, 16 or 32-bit value from src to a 64-bit dst register:

  dst = sext{8,16,32}(src)

> > > The sign and 32-bit JA we've discussed earlier.

Could you please share a link for this discussion?

> > > - proper bswap insns: bswap16, bswap32, bswap64

Could you please elaborate on this? Current instructions operate on a
single register, e.g.: dst_reg = htole16(dst_reg). It looks like this
could be reflected in x86 assembly as a single instruction using either
the bswap or xchg instructions (see [1] and [2]).
x86/net/bpf_jit_comp.c:1266 can probably be adjusted to use xchg for
the 16-bit case. For ARM, rev16, rev32 and rev64 allow using the same
register for source and destination ([3]).
[1] https://www.felixcloutier.com/x86/bswap
[2] https://www.felixcloutier.com/x86/xchg
[3] https://developer.arm.com/documentation/ddi0602/2022-06/Base-Instructions/REV16--Reverse-bytes-in-16-bit-halfwords-?lang=en
    https://developer.arm.com/documentation/ddi0602/2022-06/Base-Instructions/REV32--Reverse-bytes-in-32-bit-words-?lang=en
    https://developer.arm.com/documentation/ddi0602/2022-06/Base-Instructions/REV64--Reverse-Bytes--an-alias-of-REV-?lang=en
On 1/13/23 12:53 AM, Jose E. Marchesi wrote:
> [...]
>
> How do you plan to encode the sign-extend load instructions?
>
> I guess a possibility would be to use one of the available op-modes for
> load instructions that are currently marked as reserved. For example:
>
>   IMM = 0b000
>   ABS = 0b001
>   IND = 0b010
>   MEM = 0b011
>   SEM = 0b100  <- new
>
> Then we would have the following code fields for sign-extending LDX
> instructions:
>
>   op-mode:SEM op-size:{W,H,B,DW} op-class:LDX

Right, this is what I plan to do as well. See my incomplete llvm
prototype here:
  https://reviews.llvm.org/D133464
On 1/13/23 8:47 AM, Eduard Zingerman wrote:
> [...]
>
> The difference comes from the way zero write to `r10-4` is processed,
> with BPF_ST it is tracked as `fp-8=mmmm????` after write, without BPF_ST
> it is tracked as `fp-8=0000????` after write.
>
> Which is caused by the way `check_stack_write_fixed_off()` is written.
> For the register spills it either saves the complete register state or
> STACK_ZERO if register is known to be zero. However, for the BPF_ST it
> just saves STACK_MISC. Hence, the patch #1.

Thank you for explaining. Though llvm changes are not ready, let's land
the required kernel changes now. Please craft asm test cases for the
issues you've seen with BPF_ST.