Message ID | CACYkzJ7AfZ4HMEzt7OV_T4N8RO4SJcFbyEVxCgVrkKS4uiOD=g@mail.gmail.com (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
Series | Failure in test_local_storage at bpf-next | expand |
On Tue, Oct 6, 2020 at 5:31 PM KP Singh <kpsingh@chromium.org> wrote: > > I noticed that test_local_storage is broken due to a BTF error at > bpf-next [67ed375530e2 ("samples: bpf: Driver interrupt statistics in > xdpsock")] > > ./test_progs -t test_local_storage > libbpf: prog 'socket_post_create': relo #0: parsing [28] struct socket + 0:0.1 2 This line is truncated, btw, please make sure you post the entire output next time. But, this seems like a bug in Clang, it produced invalid access index string "0:0.1", there shouldn't be any other separator except ':' in those strings. Yonghong, can you please take a look? This seems to be a very recent regression, I had to update to 6c7d713cf5d9bb188f1e73452a256386f0288bf7 sha from not-too-outdated version to repro this. > libbpf: prog 'socket_post_create': relo #0: failed to relocate: -22 > libbpf: failed to perform CO-RE relocations: -22 > libbpf: failed to load object 'local_storage' > libbpf: failed to load BPF skeleton 'local_storage': -22 > test_test_local_storage:FAIL:skel_load lsm skeleton failed > > by changing it to use vmlinux.h with: > [...] > > clang --version > clang version 12.0.0 (https://github.com/llvm/llvm-project.git > 6c7d713cf5d9bb188f1e73452a256386f0288bf7) > Target: x86_64-unknown-linux-gnu > Thread model: posix > > pahole --version > v1.18 > > This error goes away if I comment out the lsm/socket_post_create or > the lsm/socket_bind which makes me think that something in > bpf_core_apply_relo does not like two programs in the same object > having the same BTF type in its signature (but this just a guess, I > did not investigate more). I was wondering if anyone has any ideas > what could be going on here. > > PS: While working on task local storage, I noted that some of the > checks in this test were buggy and will send a patch to fix them as > well. > > - KP
On 10/6/20 6:23 PM, Andrii Nakryiko wrote: > On Tue, Oct 6, 2020 at 5:31 PM KP Singh <kpsingh@chromium.org> wrote: >> >> I noticed that test_local_storage is broken due to a BTF error at >> bpf-next [67ed375530e2 ("samples: bpf: Driver interrupt statistics in >> xdpsock")] >> >> ./test_progs -t test_local_storage >> libbpf: prog 'socket_post_create': relo #0: parsing [28] struct socket + 0:0.1 2 > > This line is truncated, btw, please make sure you post the entire > output next time. > > But, this seems like a bug in Clang, it produced invalid access index > string "0:0.1", there shouldn't be any other separator except ':' in > those strings. > > Yonghong, can you please take a look? This seems to be a very recent > regression, I had to update to > 6c7d713cf5d9bb188f1e73452a256386f0288bf7 sha from not-too-outdated > version to repro this. Sorry. This indeed is a llvm regression. The guilty patch is https://reviews.llvm.org/D88855 which adds NPM (new pass manager) support for BPF. The patch just merged this morning, thanks for catching the bug so fast. Since NPM is not used by default and the code refactoring looks okay, so I did not run selftests. But, yah, it does change some semantics of the code... I just put a fix at https://reviews.llvm.org/D88942. Hopefully to merge soon. > >> libbpf: prog 'socket_post_create': relo #0: failed to relocate: -22 >> libbpf: failed to perform CO-RE relocations: -22 >> libbpf: failed to load object 'local_storage' >> libbpf: failed to load BPF skeleton 'local_storage': -22 >> test_test_local_storage:FAIL:skel_load lsm skeleton failed >> >> by changing it to use vmlinux.h with: >> > > [...] > >> >> clang --version >> clang version 12.0.0 (https://github.com/llvm/llvm-project.git >> 6c7d713cf5d9bb188f1e73452a256386f0288bf7) >> Target: x86_64-unknown-linux-gnu >> Thread model: posix >> >> pahole --version >> v1.18 >> >> This error goes away if I comment out the lsm/socket_post_create or >> the lsm/socket_bind which makes me think that something in >> bpf_core_apply_relo does not like two programs in the same object >> having the same BTF type in its signature (but this just a guess, I >> did not investigate more). I was wondering if anyone has any ideas >> what could be going on here. >> >> PS: While working on task local storage, I noted that some of the >> checks in this test were buggy and will send a patch to fix them as >> well. >> >> - KP
On Tue, Oct 6, 2020 at 9:31 PM Yonghong Song <yhs@fb.com> wrote: > > > > On 10/6/20 6:23 PM, Andrii Nakryiko wrote: > > On Tue, Oct 6, 2020 at 5:31 PM KP Singh <kpsingh@chromium.org> wrote: > >> > >> I noticed that test_local_storage is broken due to a BTF error at > >> bpf-next [67ed375530e2 ("samples: bpf: Driver interrupt statistics in > >> xdpsock")] > >> > >> ./test_progs -t test_local_storage > >> libbpf: prog 'socket_post_create': relo #0: parsing [28] struct socket + 0:0.1 2 > > > > This line is truncated, btw, please make sure you post the entire > > output next time. > > > > But, this seems like a bug in Clang, it produced invalid access index > > string "0:0.1", there shouldn't be any other separator except ':' in > > those strings. > > > > Yonghong, can you please take a look? This seems to be a very recent > > regression, I had to update to > > 6c7d713cf5d9bb188f1e73452a256386f0288bf7 sha from not-too-outdated > > version to repro this. > > Sorry. This indeed is a llvm regression. The guilty patch is > https://reviews.llvm.org/D88855 which adds NPM (new pass manager) > support for BPF. The patch just merged this morning, thanks for catching > the bug so fast. Since NPM is not used by default and the code > refactoring looks okay, so I did not run selftests. But, yah, it does > change some semantics of the code... > > I just put a fix at https://reviews.llvm.org/D88942. > Hopefully to merge soon. No worries, thanks for a quick fix! > > > > >> libbpf: prog 'socket_post_create': relo #0: failed to relocate: -22 > >> libbpf: failed to perform CO-RE relocations: -22 > >> libbpf: failed to load object 'local_storage' > >> libbpf: failed to load BPF skeleton 'local_storage': -22 > >> test_test_local_storage:FAIL:skel_load lsm skeleton failed > >> > >> by changing it to use vmlinux.h with: > >> > > > > [...] > > > >> > >> clang --version > >> clang version 12.0.0 (https://github.com/llvm/llvm-project.git > >> 6c7d713cf5d9bb188f1e73452a256386f0288bf7) > >> Target: x86_64-unknown-linux-gnu > >> Thread model: posix > >> > >> pahole --version > >> v1.18 > >> > >> This error goes away if I comment out the lsm/socket_post_create or > >> the lsm/socket_bind which makes me think that something in > >> bpf_core_apply_relo does not like two programs in the same object > >> having the same BTF type in its signature (but this just a guess, I > >> did not investigate more). I was wondering if anyone has any ideas > >> what could be going on here. > >> > >> PS: While working on task local storage, I noted that some of the > >> checks in this test were buggy and will send a patch to fix them as > >> well. > >> > >> - KP
On Tue, Oct 6, 2020 at 9:31 PM Yonghong Song <yhs@fb.com> wrote: > > > > On 10/6/20 6:23 PM, Andrii Nakryiko wrote: > > On Tue, Oct 6, 2020 at 5:31 PM KP Singh <kpsingh@chromium.org> wrote: > >> > >> I noticed that test_local_storage is broken due to a BTF error at > >> bpf-next [67ed375530e2 ("samples: bpf: Driver interrupt statistics in > >> xdpsock")] > >> > >> ./test_progs -t test_local_storage > >> libbpf: prog 'socket_post_create': relo #0: parsing [28] struct socket + 0:0.1 2 > > > > This line is truncated, btw, please make sure you post the entire > > output next time. > > > > But, this seems like a bug in Clang, it produced invalid access index > > string "0:0.1", there shouldn't be any other separator except ':' in > > those strings. > > > > Yonghong, can you please take a look? This seems to be a very recent > > regression, I had to update to > > 6c7d713cf5d9bb188f1e73452a256386f0288bf7 sha from not-too-outdated > > version to repro this. > > Sorry. This indeed is a llvm regression. The guilty patch is > https://reviews.llvm.org/D88855 which adds NPM (new pass manager) > support for BPF. The patch just merged this morning, thanks for catching > the bug so fast. Since NPM is not used by default and the code > refactoring looks okay, so I did not run selftests. But, yah, it does > change some semantics of the code... but llvm tests were run, of course. Looks like we need to add more of them, so they can gate the landing. > I just put a fix at https://reviews.llvm.org/D88942. > Hopefully to merge soon. Thanks for the quick fix!
On 10/6/20 10:18 PM, Alexei Starovoitov wrote: > On Tue, Oct 6, 2020 at 9:31 PM Yonghong Song <yhs@fb.com> wrote: >> >> >> >> On 10/6/20 6:23 PM, Andrii Nakryiko wrote: >>> On Tue, Oct 6, 2020 at 5:31 PM KP Singh <kpsingh@chromium.org> wrote: >>>> >>>> I noticed that test_local_storage is broken due to a BTF error at >>>> bpf-next [67ed375530e2 ("samples: bpf: Driver interrupt statistics in >>>> xdpsock")] >>>> >>>> ./test_progs -t test_local_storage >>>> libbpf: prog 'socket_post_create': relo #0: parsing [28] struct socket + 0:0.1 2 >>> >>> This line is truncated, btw, please make sure you post the entire >>> output next time. >>> >>> But, this seems like a bug in Clang, it produced invalid access index >>> string "0:0.1", there shouldn't be any other separator except ':' in >>> those strings. >>> >>> Yonghong, can you please take a look? This seems to be a very recent >>> regression, I had to update to >>> 6c7d713cf5d9bb188f1e73452a256386f0288bf7 sha from not-too-outdated >>> version to repro this. >> >> Sorry. This indeed is a llvm regression. The guilty patch is >> https://reviews.llvm.org/D88855 which adds NPM (new pass manager) >> support for BPF. The patch just merged this morning, thanks for catching >> the bug so fast. Since NPM is not used by default and the code >> refactoring looks okay, so I did not run selftests. But, yah, it does >> change some semantics of the code... > > but llvm tests were run, of course. > Looks like we need to add more of them, so they can gate the landing. Right, just added two more tests to gate this particular kind of failure. Also just pushed a new version which is simpler compared to previous version. > >> I just put a fix at https://reviews.llvm.org/D88942 . >> Hopefully to merge soon. > > Thanks for the quick fix! >
On Wed, Oct 7, 2020 at 7:33 AM Yonghong Song <yhs@fb.com> wrote: > > > > On 10/6/20 10:18 PM, Alexei Starovoitov wrote: > > On Tue, Oct 6, 2020 at 9:31 PM Yonghong Song <yhs@fb.com> wrote: > >> > >> > >> > >> On 10/6/20 6:23 PM, Andrii Nakryiko wrote: > >>> On Tue, Oct 6, 2020 at 5:31 PM KP Singh <kpsingh@chromium.org> wrote: > >>>> > >>>> I noticed that test_local_storage is broken due to a BTF error at > >>>> bpf-next [67ed375530e2 ("samples: bpf: Driver interrupt statistics in > >>>> xdpsock")] > >>>> > >>>> ./test_progs -t test_local_storage > >>>> libbpf: prog 'socket_post_create': relo #0: parsing [28] struct socket + 0:0.1 2 > >>> > >>> This line is truncated, btw, please make sure you post the entire > >>> output next time. I just ran this again and it does not seem like it's truncated: ./test_progs -t test_local_storage libbpf: prog 'socket_post_create': relo #0: parsing [28] struct socket + 0:0.1 2 libbpf: prog 'socket_post_create': relo #0: failed to relocate: -22 libbpf: failed to perform CO-RE relocations: -22 libbpf: failed to load object 'local_storage' libbpf: failed to load BPF skeleton 'local_storage': -22 test_test_local_storage:FAIL:skel_load lsm skeleton failed Am I missing something? > >>> > >>> But, this seems like a bug in Clang, it produced invalid access index > >>> string "0:0.1", there shouldn't be any other separator except ':' in > >>> those strings. > >>> > >>> Yonghong, can you please take a look? This seems to be a very recent > >>> regression, I had to update to > >>> 6c7d713cf5d9bb188f1e73452a256386f0288bf7 sha from not-too-outdated > >>> version to repro this. > >> > >> Sorry. This indeed is a llvm regression. The guilty patch is > >> https://reviews.llvm.org/D88855 which adds NPM (new pass manager) > >> support for BPF. The patch just merged this morning, thanks for catching > >> the bug so fast. Since NPM is not used by default and the code > >> refactoring looks okay, so I did not run selftests. But, yah, it does > >> change some semantics of the code... > > > > but llvm tests were run, of course. > > Looks like we need to add more of them, so they can gate the landing. > > Right, just added two more tests to gate this particular kind of > failure. Also just pushed a new version which is simpler compared to > previous version. > > > > >> I just put a fix at https://reviews.llvm.org/D88942 . > >> Hopefully to merge soon. > > > > Thanks for the quick fix! Thank you so much! > >
On Thu, Oct 8, 2020 at 12:04 AM KP Singh <kpsingh@chromium.org> wrote: > > On Wed, Oct 7, 2020 at 7:33 AM Yonghong Song <yhs@fb.com> wrote: > > > > > > > > On 10/6/20 10:18 PM, Alexei Starovoitov wrote: > > > On Tue, Oct 6, 2020 at 9:31 PM Yonghong Song <yhs@fb.com> wrote: > > >> > > >> > > >> > > >> On 10/6/20 6:23 PM, Andrii Nakryiko wrote: > > >>> On Tue, Oct 6, 2020 at 5:31 PM KP Singh <kpsingh@chromium.org> wrote: > > >>>> > > >>>> I noticed that test_local_storage is broken due to a BTF error at > > >>>> bpf-next [67ed375530e2 ("samples: bpf: Driver interrupt statistics in > > >>>> xdpsock")] > > >>>> > > >>>> ./test_progs -t test_local_storage > > >>>> libbpf: prog 'socket_post_create': relo #0: parsing [28] struct socket + 0:0.1 2 > > >>> > > >>> This line is truncated, btw, please make sure you post the entire > > >>> output next time. > > I just ran this again and it does not seem like it's truncated: > > ./test_progs -t test_local_storage > libbpf: prog 'socket_post_create': relo #0: parsing [28] struct socket + 0:0.1 2 > libbpf: prog 'socket_post_create': relo #0: failed to relocate: -22 > libbpf: failed to perform CO-RE relocations: -22 > libbpf: failed to load object 'local_storage' > libbpf: failed to load BPF skeleton 'local_storage': -22 > test_test_local_storage:FAIL:skel_load lsm skeleton failed > > Am I missing something? And indeed I am. It was a bug in my script. For the record this should have been: root@kpsingh:~# ./test_progs -t test_local_storage libbpf: prog 'socket_post_create': relo #0: parsing [28] struct socket + 0:0.1 failed: -22 libbpf: prog 'socket_post_create': relo #0: failed to relocate: -22 libbpf: failed to perform CO-RE relocations: -22 libbpf: failed to load object 'local_storage' libbpf: failed to load BPF skeleton 'local_storage': -22 test_test_local_storage:FAIL:skel_load lsm skeleton failed #106 test_local_storage:FAIL Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED Thanks everyone for looking into this and fixing it! > > > >>> > > >>> But, this seems like a bug in Clang, it produced invalid access index > > >>> string "0:0.1", there shouldn't be any other separator except ':' in > > >>> those strings. > > >>> > > >>> Yonghong, can you please take a look? This seems to be a very recent > > >>> regression, I had to update to > > >>> 6c7d713cf5d9bb188f1e73452a256386f0288bf7 sha from not-too-outdated > > >>> version to repro this. > > >> > > >> Sorry. This indeed is a llvm regression. The guilty patch is > > >> https://reviews.llvm.org/D88855 which adds NPM (new pass manager) > > >> support for BPF. The patch just merged this morning, thanks for catching > > >> the bug so fast. Since NPM is not used by default and the code > > >> refactoring looks okay, so I did not run selftests. But, yah, it does > > >> change some semantics of the code... > > > > > > but llvm tests were run, of course. > > > Looks like we need to add more of them, so they can gate the landing. > > > > Right, just added two more tests to gate this particular kind of > > failure. Also just pushed a new version which is simpler compared to > > previous version. > > > > > > > >> I just put a fix at https://reviews.llvm.org/D88942 . > > >> Hopefully to merge soon. > > > > > > Thanks for the quick fix! > > Thank you so much! > > > >
diff --git a/tools/testing/selftests/bpf/progs/local_storage.c b/tools/testing/> index 0758ba229ae0..95fad5aca6af 100644 --- a/tools/testing/selftests/bpf/progs/local_storage.c +++ b/tools/testing/selftests/bpf/progs/local_storage.c @@ -4,9 +4,8 @@ * Copyright 2020 Google LLC. */ +#include "vmlinux.h" #include <errno.h> -#include <linux/bpf.h> -#include <stdbool.h> #include <bpf/bpf_helpers.h> #include <bpf/bpf_tracing.h> @@ -36,23 +35,6 @@ struct { __type(value, struct dummy_storage); } sk_storage_map SEC(".maps"); -/* TODO Use vmlinux.h once BTF pruning for embedded types is fixed. - */ -struct sock {} __attribute__((preserve_access_index)); -struct sockaddr {} __attribute__((preserve_access_index)); -struct socket { - struct sock *sk; -} __attribute__((preserve_access_index)); - -struct inode {} __attribute__((preserve_access_index)); -struct dentry { - struct inode *d_inode; -} __attribute__((preserve_access_index)); -struct file { - struct inode *f_inode; -} __attribute__((preserve_access_index)); - - SEC("lsm/inode_unlink") int BPF_PROG(unlink_hook, struct inode *dir, struct dentry *victim) {