Message ID | 20240208063015.3893418-1-yonghong.song@linux.dev (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | BPF |
Headers | show |
Series | [bpf-next,v2,1/2] bpf: Fix test verif_scale_strobemeta_subprogs failure due to llvm19 | expand |
On Wed, Feb 7, 2024 at 10:30 PM Yonghong Song <yonghong.song@linux.dev> wrote: > > With latest llvm19, I hit the following selftest failures with > > $ ./test_progs -j > libbpf: prog 'on_event': BPF program load failed: Permission denied > libbpf: prog 'on_event': -- BEGIN PROG LOAD LOG -- > combined stack size of 4 calls is 544. Too large > verification time 1344153 usec > stack depth 24+440+0+32 > processed 51008 insns (limit 1000000) max_states_per_insn 19 total_states 1467 peak_states 303 mark_read 146 > -- END PROG LOAD LOG -- > libbpf: prog 'on_event': failed to load: -13 > libbpf: failed to load object 'strobemeta_subprogs.bpf.o' > scale_test:FAIL:expect_success unexpected error: -13 (errno 13) > #498 verif_scale_strobemeta_subprogs:FAIL > > The verifier complains too big of the combined stack size (544 bytes) which > exceeds the maximum stack limit 512. This is a regression from llvm19 ([1]). > > In the above error log, the original stack depth is 24+440+0+32. > To satisfy interpreter's need, in verifier the stack depth is adjusted to > 32+448+32+32=544 which exceeds 512, hence the error. The same adjusted > stack size is also used for jit case. > > But the jitted codes could use smaller stack size. > > $ egrep -r stack_depth | grep round_up > arm64/net/bpf_jit_comp.c: ctx->stack_size = round_up(prog->aux->stack_depth, 16); > loongarch/net/bpf_jit.c: bpf_stack_adjust = round_up(ctx->prog->aux->stack_depth, 16); > powerpc/net/bpf_jit_comp.c: cgctx.stack_size = round_up(fp->aux->stack_depth, 16); > riscv/net/bpf_jit_comp32.c: round_up(ctx->prog->aux->stack_depth, STACK_ALIGN); > riscv/net/bpf_jit_comp64.c: bpf_stack_adjust = round_up(ctx->prog->aux->stack_depth, 16); > s390/net/bpf_jit_comp.c: u32 stack_depth = round_up(fp->aux->stack_depth, 8); > sparc/net/bpf_jit_comp_64.c: stack_needed += round_up(stack_depth, 16); > x86/net/bpf_jit_comp.c: EMIT3_off32(0x48, 0x81, 0xEC, round_up(stack_depth, 8)); > x86/net/bpf_jit_comp.c: int tcc_off = -4 - round_up(stack_depth, 8); > x86/net/bpf_jit_comp.c: round_up(stack_depth, 8)); > x86/net/bpf_jit_comp.c: int tcc_off = -4 - round_up(stack_depth, 8); > x86/net/bpf_jit_comp.c: EMIT3_off32(0x48, 0x81, 0xC4, round_up(stack_depth, 8)); > > In the above, STACK_ALIGN in riscv/net/bpf_jit_comp32.c is defined as 16. > So stack is aligned in either 8 or 16, x86/s390 having 8-byte stack alignment and > the rest having 16-byte alignment. > > This patch calculates total stack depth based on 16-byte alignment if jit is requested. > For the above failing case, the new stack size will be 32+448+0+32=512 and no verification > failure. llvm19 regression will be discussed separately in llvm upstream. > > The verifier change caused three test failures as these tests compared messages > with stack size. More specifically, > - test_global_funcs/global_func1, adjusted to interpreter only since verification will > succeed in jit mode. A new test will be added for jit mode later. > - async_stack_depth/{pseudo_call_check, async_call_root_check}, adjusted based on > new stack size calculation. > > [1] https://lore.kernel.org/bpf/32bde0f0-1881-46c9-931a-673be566c61d@linux.dev/ > > Suggested-by: Alexei Starovoitov <ast@kernel.org> > Signed-off-by: Yonghong Song <yonghong.song@linux.dev> > --- > kernel/bpf/verifier.c | 18 +++++++++++++----- > .../bpf/prog_tests/test_global_funcs.c | 5 ++++- > .../selftests/bpf/progs/async_stack_depth.c | 4 ++-- > 3 files changed, 19 insertions(+), 8 deletions(-) > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > index ddaf09db1175..6441a540904b 100644 > --- a/kernel/bpf/verifier.c > +++ b/kernel/bpf/verifier.c > @@ -5812,6 +5812,17 @@ static int check_ptr_alignment(struct bpf_verifier_env *env, > strict); > } > > +static int round_up_stack_depth(struct bpf_verifier_env *env, int stack_depth) > +{ > + if (env->prog->jit_requested) > + return round_up(stack_depth, 16); > + > + /* round up to 32-bytes, since this is granularity > + * of interpreter stack size > + */ > + return round_up(max_t(u32, stack_depth, 1), 32); > +} > + > /* starting from main bpf function walk all instructions of the function > * and recursively walk all callees that given function can call. > * Ignore jump and exit insns. > @@ -5855,10 +5866,7 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx) > depth); > return -EACCES; > } > - /* round up to 32-bytes, since this is granularity > - * of interpreter stack size > - */ > - depth += round_up(max_t(u32, subprog[idx].stack_depth, 1), 32); > + depth += round_up_stack_depth(env, subprog[idx].stack_depth); > if (depth > MAX_BPF_STACK) { > verbose(env, "combined stack size of %d calls is %d. Too large\n", > frame + 1, depth); > @@ -5952,7 +5960,7 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx) > */ > if (frame == 0) > return 0; > - depth -= round_up(max_t(u32, subprog[idx].stack_depth, 1), 32); > + depth -= round_up_stack_depth(env, subprog[idx].stack_depth); > frame--; > i = ret_insn[frame]; > idx = ret_prog[frame]; > diff --git a/tools/testing/selftests/bpf/prog_tests/test_global_funcs.c b/tools/testing/selftests/bpf/prog_tests/test_global_funcs.c > index e905cbaf6b3d..a3a41680b38e 100644 > --- a/tools/testing/selftests/bpf/prog_tests/test_global_funcs.c > +++ b/tools/testing/selftests/bpf/prog_tests/test_global_funcs.c > @@ -138,7 +138,10 @@ static void subtest_ctx_arg_rewrite(void) > > void test_test_global_funcs(void) > { > - RUN_TESTS(test_global_func1); > + if (!env.jit_enabled) { > + RUN_TESTS(test_global_func1); > + } > + > RUN_TESTS(test_global_func2); > RUN_TESTS(test_global_func3); > RUN_TESTS(test_global_func4); > diff --git a/tools/testing/selftests/bpf/progs/async_stack_depth.c b/tools/testing/selftests/bpf/progs/async_stack_depth.c > index 3517c0e01206..a03f313b7482 100644 > --- a/tools/testing/selftests/bpf/progs/async_stack_depth.c > +++ b/tools/testing/selftests/bpf/progs/async_stack_depth.c > @@ -30,7 +30,7 @@ static int bad_timer_cb(void *map, int *key, struct bpf_timer *timer) > } > > SEC("tc") > -__failure __msg("combined stack size of 2 calls is 576. Too large") > +__failure __msg("combined stack size of 2 calls is 544. Too large") maybe it's better to adjust existing test to fail in both JIT and !JIT modes, just not expect exact size in message. I.e., truncate the message to just "combined stack size of 2 calls is") and be done with it? > int pseudo_call_check(struct __sk_buff *ctx) > { > struct hmap_elem *elem; > @@ -45,7 +45,7 @@ int pseudo_call_check(struct __sk_buff *ctx) > } > > SEC("tc") > -__failure __msg("combined stack size of 2 calls is 608. Too large") > +__failure __msg("combined stack size of 2 calls is 576. Too large") > int async_call_root_check(struct __sk_buff *ctx) > { > struct hmap_elem *elem; > -- > 2.39.3 >
On 2/8/24 10:14 AM, Andrii Nakryiko wrote: > On Wed, Feb 7, 2024 at 10:30 PM Yonghong Song <yonghong.song@linux.dev> wrote: >> With latest llvm19, I hit the following selftest failures with >> >> $ ./test_progs -j >> libbpf: prog 'on_event': BPF program load failed: Permission denied >> libbpf: prog 'on_event': -- BEGIN PROG LOAD LOG -- >> combined stack size of 4 calls is 544. Too large >> verification time 1344153 usec >> stack depth 24+440+0+32 >> processed 51008 insns (limit 1000000) max_states_per_insn 19 total_states 1467 peak_states 303 mark_read 146 >> -- END PROG LOAD LOG -- >> libbpf: prog 'on_event': failed to load: -13 >> libbpf: failed to load object 'strobemeta_subprogs.bpf.o' >> scale_test:FAIL:expect_success unexpected error: -13 (errno 13) >> #498 verif_scale_strobemeta_subprogs:FAIL >> >> The verifier complains too big of the combined stack size (544 bytes) which >> exceeds the maximum stack limit 512. This is a regression from llvm19 ([1]). >> >> In the above error log, the original stack depth is 24+440+0+32. >> To satisfy interpreter's need, in verifier the stack depth is adjusted to >> 32+448+32+32=544 which exceeds 512, hence the error. The same adjusted >> stack size is also used for jit case. >> >> But the jitted codes could use smaller stack size. >> >> $ egrep -r stack_depth | grep round_up >> arm64/net/bpf_jit_comp.c: ctx->stack_size = round_up(prog->aux->stack_depth, 16); >> loongarch/net/bpf_jit.c: bpf_stack_adjust = round_up(ctx->prog->aux->stack_depth, 16); >> powerpc/net/bpf_jit_comp.c: cgctx.stack_size = round_up(fp->aux->stack_depth, 16); >> riscv/net/bpf_jit_comp32.c: round_up(ctx->prog->aux->stack_depth, STACK_ALIGN); >> riscv/net/bpf_jit_comp64.c: bpf_stack_adjust = round_up(ctx->prog->aux->stack_depth, 16); >> s390/net/bpf_jit_comp.c: u32 stack_depth = round_up(fp->aux->stack_depth, 8); >> sparc/net/bpf_jit_comp_64.c: stack_needed += round_up(stack_depth, 16); >> x86/net/bpf_jit_comp.c: EMIT3_off32(0x48, 0x81, 0xEC, round_up(stack_depth, 8)); >> x86/net/bpf_jit_comp.c: int tcc_off = -4 - round_up(stack_depth, 8); >> x86/net/bpf_jit_comp.c: round_up(stack_depth, 8)); >> x86/net/bpf_jit_comp.c: int tcc_off = -4 - round_up(stack_depth, 8); >> x86/net/bpf_jit_comp.c: EMIT3_off32(0x48, 0x81, 0xC4, round_up(stack_depth, 8)); >> >> In the above, STACK_ALIGN in riscv/net/bpf_jit_comp32.c is defined as 16. >> So stack is aligned in either 8 or 16, x86/s390 having 8-byte stack alignment and >> the rest having 16-byte alignment. >> >> This patch calculates total stack depth based on 16-byte alignment if jit is requested. >> For the above failing case, the new stack size will be 32+448+0+32=512 and no verification >> failure. llvm19 regression will be discussed separately in llvm upstream. >> >> The verifier change caused three test failures as these tests compared messages >> with stack size. More specifically, >> - test_global_funcs/global_func1, adjusted to interpreter only since verification will >> succeed in jit mode. A new test will be added for jit mode later. >> - async_stack_depth/{pseudo_call_check, async_call_root_check}, adjusted based on >> new stack size calculation. >> >> [1] https://lore.kernel.org/bpf/32bde0f0-1881-46c9-931a-673be566c61d@linux.dev/ >> >> Suggested-by: Alexei Starovoitov <ast@kernel.org> >> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> >> --- >> kernel/bpf/verifier.c | 18 +++++++++++++----- >> .../bpf/prog_tests/test_global_funcs.c | 5 ++++- >> .../selftests/bpf/progs/async_stack_depth.c | 4 ++-- >> 3 files changed, 19 insertions(+), 8 deletions(-) >> >> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c >> index ddaf09db1175..6441a540904b 100644 >> --- a/kernel/bpf/verifier.c >> +++ b/kernel/bpf/verifier.c >> @@ -5812,6 +5812,17 @@ static int check_ptr_alignment(struct bpf_verifier_env *env, >> strict); >> } >> >> +static int round_up_stack_depth(struct bpf_verifier_env *env, int stack_depth) >> +{ >> + if (env->prog->jit_requested) >> + return round_up(stack_depth, 16); >> + >> + /* round up to 32-bytes, since this is granularity >> + * of interpreter stack size >> + */ >> + return round_up(max_t(u32, stack_depth, 1), 32); >> +} >> + >> /* starting from main bpf function walk all instructions of the function >> * and recursively walk all callees that given function can call. >> * Ignore jump and exit insns. >> @@ -5855,10 +5866,7 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx) >> depth); >> return -EACCES; >> } >> - /* round up to 32-bytes, since this is granularity >> - * of interpreter stack size >> - */ >> - depth += round_up(max_t(u32, subprog[idx].stack_depth, 1), 32); >> + depth += round_up_stack_depth(env, subprog[idx].stack_depth); >> if (depth > MAX_BPF_STACK) { >> verbose(env, "combined stack size of %d calls is %d. Too large\n", >> frame + 1, depth); >> @@ -5952,7 +5960,7 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx) >> */ >> if (frame == 0) >> return 0; >> - depth -= round_up(max_t(u32, subprog[idx].stack_depth, 1), 32); >> + depth -= round_up_stack_depth(env, subprog[idx].stack_depth); >> frame--; >> i = ret_insn[frame]; >> idx = ret_prog[frame]; >> diff --git a/tools/testing/selftests/bpf/prog_tests/test_global_funcs.c b/tools/testing/selftests/bpf/prog_tests/test_global_funcs.c >> index e905cbaf6b3d..a3a41680b38e 100644 >> --- a/tools/testing/selftests/bpf/prog_tests/test_global_funcs.c >> +++ b/tools/testing/selftests/bpf/prog_tests/test_global_funcs.c >> @@ -138,7 +138,10 @@ static void subtest_ctx_arg_rewrite(void) >> >> void test_test_global_funcs(void) >> { >> - RUN_TESTS(test_global_func1); >> + if (!env.jit_enabled) { >> + RUN_TESTS(test_global_func1); >> + } >> + >> RUN_TESTS(test_global_func2); >> RUN_TESTS(test_global_func3); >> RUN_TESTS(test_global_func4); >> diff --git a/tools/testing/selftests/bpf/progs/async_stack_depth.c b/tools/testing/selftests/bpf/progs/async_stack_depth.c >> index 3517c0e01206..a03f313b7482 100644 >> --- a/tools/testing/selftests/bpf/progs/async_stack_depth.c >> +++ b/tools/testing/selftests/bpf/progs/async_stack_depth.c >> @@ -30,7 +30,7 @@ static int bad_timer_cb(void *map, int *key, struct bpf_timer *timer) >> } >> >> SEC("tc") >> -__failure __msg("combined stack size of 2 calls is 576. Too large") >> +__failure __msg("combined stack size of 2 calls is 544. Too large") > maybe it's better to adjust existing test to fail in both JIT and !JIT > modes, just not expect exact size in message. I.e., truncate the > message to just "combined stack size of 2 calls is") and be done with > it? Originally I did not add support for no-jit since interpreter does not support callback functions, so we have smaller combined stack size, jit mode will succeed and interpreter mode will fail. But in this particular case, verifier will fail before jit is involved. So for no-jit mode, verifier will still emit an error complaining the stack size. Will do what you suggested in the next version. > >> int pseudo_call_check(struct __sk_buff *ctx) >> { >> struct hmap_elem *elem; >> @@ -45,7 +45,7 @@ int pseudo_call_check(struct __sk_buff *ctx) >> } >> >> SEC("tc") >> -__failure __msg("combined stack size of 2 calls is 608. Too large") >> +__failure __msg("combined stack size of 2 calls is 576. Too large") >> int async_call_root_check(struct __sk_buff *ctx) >> { >> struct hmap_elem *elem; >> -- >> 2.39.3 >>
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index ddaf09db1175..6441a540904b 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -5812,6 +5812,17 @@ static int check_ptr_alignment(struct bpf_verifier_env *env, strict); } +static int round_up_stack_depth(struct bpf_verifier_env *env, int stack_depth) +{ + if (env->prog->jit_requested) + return round_up(stack_depth, 16); + + /* round up to 32-bytes, since this is granularity + * of interpreter stack size + */ + return round_up(max_t(u32, stack_depth, 1), 32); +} + /* starting from main bpf function walk all instructions of the function * and recursively walk all callees that given function can call. * Ignore jump and exit insns. @@ -5855,10 +5866,7 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx) depth); return -EACCES; } - /* round up to 32-bytes, since this is granularity - * of interpreter stack size - */ - depth += round_up(max_t(u32, subprog[idx].stack_depth, 1), 32); + depth += round_up_stack_depth(env, subprog[idx].stack_depth); if (depth > MAX_BPF_STACK) { verbose(env, "combined stack size of %d calls is %d. Too large\n", frame + 1, depth); @@ -5952,7 +5960,7 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx) */ if (frame == 0) return 0; - depth -= round_up(max_t(u32, subprog[idx].stack_depth, 1), 32); + depth -= round_up_stack_depth(env, subprog[idx].stack_depth); frame--; i = ret_insn[frame]; idx = ret_prog[frame]; diff --git a/tools/testing/selftests/bpf/prog_tests/test_global_funcs.c b/tools/testing/selftests/bpf/prog_tests/test_global_funcs.c index e905cbaf6b3d..a3a41680b38e 100644 --- a/tools/testing/selftests/bpf/prog_tests/test_global_funcs.c +++ b/tools/testing/selftests/bpf/prog_tests/test_global_funcs.c @@ -138,7 +138,10 @@ static void subtest_ctx_arg_rewrite(void) void test_test_global_funcs(void) { - RUN_TESTS(test_global_func1); + if (!env.jit_enabled) { + RUN_TESTS(test_global_func1); + } + RUN_TESTS(test_global_func2); RUN_TESTS(test_global_func3); RUN_TESTS(test_global_func4); diff --git a/tools/testing/selftests/bpf/progs/async_stack_depth.c b/tools/testing/selftests/bpf/progs/async_stack_depth.c index 3517c0e01206..a03f313b7482 100644 --- a/tools/testing/selftests/bpf/progs/async_stack_depth.c +++ b/tools/testing/selftests/bpf/progs/async_stack_depth.c @@ -30,7 +30,7 @@ static int bad_timer_cb(void *map, int *key, struct bpf_timer *timer) } SEC("tc") -__failure __msg("combined stack size of 2 calls is 576. Too large") +__failure __msg("combined stack size of 2 calls is 544. Too large") int pseudo_call_check(struct __sk_buff *ctx) { struct hmap_elem *elem; @@ -45,7 +45,7 @@ int pseudo_call_check(struct __sk_buff *ctx) } SEC("tc") -__failure __msg("combined stack size of 2 calls is 608. Too large") +__failure __msg("combined stack size of 2 calls is 576. Too large") int async_call_root_check(struct __sk_buff *ctx) { struct hmap_elem *elem;
With latest llvm19, I hit the following selftest failures with $ ./test_progs -j libbpf: prog 'on_event': BPF program load failed: Permission denied libbpf: prog 'on_event': -- BEGIN PROG LOAD LOG -- combined stack size of 4 calls is 544. Too large verification time 1344153 usec stack depth 24+440+0+32 processed 51008 insns (limit 1000000) max_states_per_insn 19 total_states 1467 peak_states 303 mark_read 146 -- END PROG LOAD LOG -- libbpf: prog 'on_event': failed to load: -13 libbpf: failed to load object 'strobemeta_subprogs.bpf.o' scale_test:FAIL:expect_success unexpected error: -13 (errno 13) #498 verif_scale_strobemeta_subprogs:FAIL The verifier complains too big of the combined stack size (544 bytes) which exceeds the maximum stack limit 512. This is a regression from llvm19 ([1]). In the above error log, the original stack depth is 24+440+0+32. To satisfy interpreter's need, in verifier the stack depth is adjusted to 32+448+32+32=544 which exceeds 512, hence the error. The same adjusted stack size is also used for jit case. But the jitted codes could use smaller stack size. $ egrep -r stack_depth | grep round_up arm64/net/bpf_jit_comp.c: ctx->stack_size = round_up(prog->aux->stack_depth, 16); loongarch/net/bpf_jit.c: bpf_stack_adjust = round_up(ctx->prog->aux->stack_depth, 16); powerpc/net/bpf_jit_comp.c: cgctx.stack_size = round_up(fp->aux->stack_depth, 16); riscv/net/bpf_jit_comp32.c: round_up(ctx->prog->aux->stack_depth, STACK_ALIGN); riscv/net/bpf_jit_comp64.c: bpf_stack_adjust = round_up(ctx->prog->aux->stack_depth, 16); s390/net/bpf_jit_comp.c: u32 stack_depth = round_up(fp->aux->stack_depth, 8); sparc/net/bpf_jit_comp_64.c: stack_needed += round_up(stack_depth, 16); x86/net/bpf_jit_comp.c: EMIT3_off32(0x48, 0x81, 0xEC, round_up(stack_depth, 8)); x86/net/bpf_jit_comp.c: int tcc_off = -4 - round_up(stack_depth, 8); x86/net/bpf_jit_comp.c: round_up(stack_depth, 8)); x86/net/bpf_jit_comp.c: int tcc_off = -4 - round_up(stack_depth, 8); x86/net/bpf_jit_comp.c: EMIT3_off32(0x48, 0x81, 0xC4, round_up(stack_depth, 8)); In the above, STACK_ALIGN in riscv/net/bpf_jit_comp32.c is defined as 16. So stack is aligned in either 8 or 16, x86/s390 having 8-byte stack alignment and the rest having 16-byte alignment. This patch calculates total stack depth based on 16-byte alignment if jit is requested. For the above failing case, the new stack size will be 32+448+0+32=512 and no verification failure. llvm19 regression will be discussed separately in llvm upstream. The verifier change caused three test failures as these tests compared messages with stack size. More specifically, - test_global_funcs/global_func1, adjusted to interpreter only since verification will succeed in jit mode. A new test will be added for jit mode later. - async_stack_depth/{pseudo_call_check, async_call_root_check}, adjusted based on new stack size calculation. [1] https://lore.kernel.org/bpf/32bde0f0-1881-46c9-931a-673be566c61d@linux.dev/ Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> --- kernel/bpf/verifier.c | 18 +++++++++++++----- .../bpf/prog_tests/test_global_funcs.c | 5 ++++- .../selftests/bpf/progs/async_stack_depth.c | 4 ++-- 3 files changed, 19 insertions(+), 8 deletions(-)