Message ID | 20230425174744.1758515-1-yhs@fb.com (mailing list archive) |
---|---|
State | Accepted |
Commit | f1f5553d91a11663a5761b78e61f70c1db0bbd2f |
Delegated to: | BPF |
Headers | show |
Series | [bpf-next] selftests/bpf: Fix selftest test_global_funcs/global_func1 failure with latest clang | expand |
On 4/25/23 7:47 PM, Yonghong Song wrote: > The selftest test_global_funcs/global_func1 failed with the latest clang17. > The reason is due to upstream ArgumentPromotionPass ([1]), > which may manipulate static function parameters and cause inlining > although the funciton is marked as noinline. > > The original code: > static __attribute__ ((noinline)) > int f0(int var, struct __sk_buff *skb) > { > return skb->len; > } > > __attribute__ ((noinline)) > int f1(struct __sk_buff *skb) > { > ... > return f0(0, skb) + skb->len; > } > > ... > > SEC("tc") > __failure __msg("combined stack size of 4 calls is 544") > int global_func1(struct __sk_buff *skb) > { > return f0(1, skb) + f1(skb) + f2(2, skb) + f3(3, skb, 4); > } > > After ArgumentPromotionPass, the code is translated to > static __attribute__ ((noinline)) > int f0(int var, int skb_len) > { > return skb_len; > } > > __attribute__ ((noinline)) > int f1(struct __sk_buff *skb) > { > ... > return f0(0, skb->len) + skb->len; > } > > ... > > SEC("tc") > __failure __msg("combined stack size of 4 calls is 544") > int global_func1(struct __sk_buff *skb) > { > return f0(1, skb->len) + f1(skb) + f2(2, skb) + f3(3, skb, 4); > } > > And later llvm InstCombine phase recognized that f0() > simplify returns the value of the second argument and removed f0() > completely and the final code looks like: > __attribute__ ((noinline)) > int f1(struct __sk_buff *skb) > { > ... > return skb->len + skb->len; > } > > ... > > SEC("tc") > __failure __msg("combined stack size of 4 calls is 544") > int global_func1(struct __sk_buff *skb) > { > return skb->len + f1(skb) + f2(2, skb) + f3(3, skb, 4); > } > > If f0() is not inlined, the verification will fail with stack size > 544 for a particular callchain. With f0() inlined, the maximum > stack size is 512 which is in the limit. > > Let us add a `asm volatile ("")` in f0() to prevent ArgumentPromotionPass > from hoisting the code to its caller, and this fixed the test failure. > > [1] https://reviews.llvm.org/D148269 > > Signed-off-by: Yonghong Song <yhs@fb.com> > --- > tools/testing/selftests/bpf/progs/test_global_func1.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/tools/testing/selftests/bpf/progs/test_global_func1.c b/tools/testing/selftests/bpf/progs/test_global_func1.c > index b85fc8c423ba..17a9f59bf5f3 100644 > --- a/tools/testing/selftests/bpf/progs/test_global_func1.c > +++ b/tools/testing/selftests/bpf/progs/test_global_func1.c > @@ -10,6 +10,8 @@ > static __attribute__ ((noinline)) > int f0(int var, struct __sk_buff *skb) > { > + asm volatile (""); > + > return skb->len; Is it planned to get this reverted before the final llvm/clang 17 is officially released (you mentioned the TTI hook in [1])? Thanks, Daniel
On 4/27/23 1:23 PM, Daniel Borkmann wrote: > On 4/25/23 7:47 PM, Yonghong Song wrote: >> The selftest test_global_funcs/global_func1 failed with the latest >> clang17. >> The reason is due to upstream ArgumentPromotionPass ([1]), >> which may manipulate static function parameters and cause inlining >> although the funciton is marked as noinline. >> >> The original code: >> static __attribute__ ((noinline)) >> int f0(int var, struct __sk_buff *skb) >> { >> return skb->len; >> } >> >> __attribute__ ((noinline)) >> int f1(struct __sk_buff *skb) >> { >> ... >> return f0(0, skb) + skb->len; >> } >> >> ... >> >> SEC("tc") >> __failure __msg("combined stack size of 4 calls is 544") >> int global_func1(struct __sk_buff *skb) >> { >> return f0(1, skb) + f1(skb) + f2(2, skb) + f3(3, skb, 4); >> } >> >> After ArgumentPromotionPass, the code is translated to >> static __attribute__ ((noinline)) >> int f0(int var, int skb_len) >> { >> return skb_len; >> } >> >> __attribute__ ((noinline)) >> int f1(struct __sk_buff *skb) >> { >> ... >> return f0(0, skb->len) + skb->len; >> } >> >> ... >> >> SEC("tc") >> __failure __msg("combined stack size of 4 calls is 544") >> int global_func1(struct __sk_buff *skb) >> { >> return f0(1, skb->len) + f1(skb) + f2(2, skb) + f3(3, skb, 4); >> } >> >> And later llvm InstCombine phase recognized that f0() >> simplify returns the value of the second argument and removed f0() >> completely and the final code looks like: >> __attribute__ ((noinline)) >> int f1(struct __sk_buff *skb) >> { >> ... >> return skb->len + skb->len; >> } >> >> ... >> >> SEC("tc") >> __failure __msg("combined stack size of 4 calls is 544") >> int global_func1(struct __sk_buff *skb) >> { >> return skb->len + f1(skb) + f2(2, skb) + f3(3, skb, 4); >> } >> >> If f0() is not inlined, the verification will fail with stack size >> 544 for a particular callchain. With f0() inlined, the maximum >> stack size is 512 which is in the limit. >> >> Let us add a `asm volatile ("")` in f0() to prevent ArgumentPromotionPass >> from hoisting the code to its caller, and this fixed the test failure. >> >> [1] >> https://reviews.llvm.org/D148269 >> Signed-off-by: Yonghong Song <yhs@fb.com> >> --- >> tools/testing/selftests/bpf/progs/test_global_func1.c | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/tools/testing/selftests/bpf/progs/test_global_func1.c >> b/tools/testing/selftests/bpf/progs/test_global_func1.c >> index b85fc8c423ba..17a9f59bf5f3 100644 >> --- a/tools/testing/selftests/bpf/progs/test_global_func1.c >> +++ b/tools/testing/selftests/bpf/progs/test_global_func1.c >> @@ -10,6 +10,8 @@ >> static __attribute__ ((noinline)) >> int f0(int var, struct __sk_buff *skb) >> { >> + asm volatile (""); >> + >> return skb->len; > > Is it planned to get this reverted before the final llvm/clang 17 is > officially released (you mentioned the TTI hook in [1])? No. This fix will not be reverted even with final clang17. The TTI hook I am used (https://reviews.llvm.org/D148551) is to prevent the optimization from increasing the number of parameter beyond 5. In this particular case, the number of arguments remains at 2, so BPF backend TTI hook has no effect. > > Thanks, > Daniel
Hello: This patch was applied to bpf/bpf-next.git (master) by Andrii Nakryiko <andrii@kernel.org>: On Tue, 25 Apr 2023 10:47:44 -0700 you wrote: > The selftest test_global_funcs/global_func1 failed with the latest clang17. > The reason is due to upstream ArgumentPromotionPass ([1]), > which may manipulate static function parameters and cause inlining > although the funciton is marked as noinline. > > The original code: > static __attribute__ ((noinline)) > int f0(int var, struct __sk_buff *skb) > { > return skb->len; > } > > [...] Here is the summary with links: - [bpf-next] selftests/bpf: Fix selftest test_global_funcs/global_func1 failure with latest clang https://git.kernel.org/bpf/bpf-next/c/f1f5553d91a1 You are awesome, thank you!
diff --git a/tools/testing/selftests/bpf/progs/test_global_func1.c b/tools/testing/selftests/bpf/progs/test_global_func1.c index b85fc8c423ba..17a9f59bf5f3 100644 --- a/tools/testing/selftests/bpf/progs/test_global_func1.c +++ b/tools/testing/selftests/bpf/progs/test_global_func1.c @@ -10,6 +10,8 @@ static __attribute__ ((noinline)) int f0(int var, struct __sk_buff *skb) { + asm volatile (""); + return skb->len; }
The selftest test_global_funcs/global_func1 failed with the latest clang17. The reason is due to upstream ArgumentPromotionPass ([1]), which may manipulate static function parameters and cause inlining although the funciton is marked as noinline. The original code: static __attribute__ ((noinline)) int f0(int var, struct __sk_buff *skb) { return skb->len; } __attribute__ ((noinline)) int f1(struct __sk_buff *skb) { ... return f0(0, skb) + skb->len; } ... SEC("tc") __failure __msg("combined stack size of 4 calls is 544") int global_func1(struct __sk_buff *skb) { return f0(1, skb) + f1(skb) + f2(2, skb) + f3(3, skb, 4); } After ArgumentPromotionPass, the code is translated to static __attribute__ ((noinline)) int f0(int var, int skb_len) { return skb_len; } __attribute__ ((noinline)) int f1(struct __sk_buff *skb) { ... return f0(0, skb->len) + skb->len; } ... SEC("tc") __failure __msg("combined stack size of 4 calls is 544") int global_func1(struct __sk_buff *skb) { return f0(1, skb->len) + f1(skb) + f2(2, skb) + f3(3, skb, 4); } And later llvm InstCombine phase recognized that f0() simplify returns the value of the second argument and removed f0() completely and the final code looks like: __attribute__ ((noinline)) int f1(struct __sk_buff *skb) { ... return skb->len + skb->len; } ... SEC("tc") __failure __msg("combined stack size of 4 calls is 544") int global_func1(struct __sk_buff *skb) { return skb->len + f1(skb) + f2(2, skb) + f3(3, skb, 4); } If f0() is not inlined, the verification will fail with stack size 544 for a particular callchain. With f0() inlined, the maximum stack size is 512 which is in the limit. Let us add a `asm volatile ("")` in f0() to prevent ArgumentPromotionPass from hoisting the code to its caller, and this fixed the test failure. [1] https://reviews.llvm.org/D148269 Signed-off-by: Yonghong Song <yhs@fb.com> --- tools/testing/selftests/bpf/progs/test_global_func1.c | 2 ++ 1 file changed, 2 insertions(+)