diff mbox series

Failure in test_local_storage at bpf-next

Message ID CACYkzJ7AfZ4HMEzt7OV_T4N8RO4SJcFbyEVxCgVrkKS4uiOD=g@mail.gmail.com (mailing list archive)
State Not Applicable
Headers show
Series Failure in test_local_storage at bpf-next | expand

Commit Message

KP Singh Oct. 7, 2020, 12:27 a.m. UTC
I noticed that test_local_storage is broken due to a BTF error at
bpf-next [67ed375530e2 ("samples: bpf: Driver interrupt statistics in
xdpsock")]

./test_progs -t test_local_storage
libbpf: prog 'socket_post_create': relo #0: parsing [28] struct socket + 0:0.1 2
libbpf: prog 'socket_post_create': relo #0: failed to relocate: -22
libbpf: failed to perform CO-RE relocations: -22
libbpf: failed to load object 'local_storage'
libbpf: failed to load BPF skeleton 'local_storage': -22
test_test_local_storage:FAIL:skel_load lsm skeleton failed

by changing it to use vmlinux.h with:


I get a very similar error:

root@kpsingh:~# ./test_progs -t test_local_storage
libbpf: prog 'socket_post_create': relo #0: parsing [83] struct socket + 0:4.1 2
libbpf: prog 'socket_post_create': relo #0: failed to relocate: -22
libbpf: failed to perform CO-RE relocations: -22
libbpf: failed to load object 'local_storage'
libbpf: failed to load BPF skeleton 'local_storage': -22
test_test_local_storage:FAIL:skel_load lsm skeleton failed
#106 test_local_storage:FAIL
Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED

clang --version
clang version 12.0.0 (https://github.com/llvm/llvm-project.git
6c7d713cf5d9bb188f1e73452a256386f0288bf7)
Target: x86_64-unknown-linux-gnu
Thread model: posix

pahole --version
v1.18

This error goes away if I comment out the lsm/socket_post_create or
the lsm/socket_bind which makes me think that something in
bpf_core_apply_relo does not like two programs in the same object
having the same BTF type in its signature (but this just a guess, I
did not investigate more).  I was wondering if anyone has any ideas
what could be going on here.

PS: While working on task local storage, I noted that some of the
checks in this test were buggy and will send a patch to fix them as
well.

- KP

Comments

Andrii Nakryiko Oct. 7, 2020, 1:23 a.m. UTC | #1
On Tue, Oct 6, 2020 at 5:31 PM KP Singh <kpsingh@chromium.org> wrote:
>
> I noticed that test_local_storage is broken due to a BTF error at
> bpf-next [67ed375530e2 ("samples: bpf: Driver interrupt statistics in
> xdpsock")]
>
> ./test_progs -t test_local_storage
> libbpf: prog 'socket_post_create': relo #0: parsing [28] struct socket + 0:0.1 2

This line is truncated, btw, please make sure you post the entire
output next time.

But, this seems like a bug in Clang, it produced invalid access index
string "0:0.1", there shouldn't be any other separator except ':' in
those strings.

Yonghong, can you please take a look? This seems to be a very recent
regression, I had to update to
6c7d713cf5d9bb188f1e73452a256386f0288bf7 sha from not-too-outdated
version to repro this.

> libbpf: prog 'socket_post_create': relo #0: failed to relocate: -22
> libbpf: failed to perform CO-RE relocations: -22
> libbpf: failed to load object 'local_storage'
> libbpf: failed to load BPF skeleton 'local_storage': -22
> test_test_local_storage:FAIL:skel_load lsm skeleton failed
>
> by changing it to use vmlinux.h with:
>

[...]

>
> clang --version
> clang version 12.0.0 (https://github.com/llvm/llvm-project.git
> 6c7d713cf5d9bb188f1e73452a256386f0288bf7)
> Target: x86_64-unknown-linux-gnu
> Thread model: posix
>
> pahole --version
> v1.18
>
> This error goes away if I comment out the lsm/socket_post_create or
> the lsm/socket_bind which makes me think that something in
> bpf_core_apply_relo does not like two programs in the same object
> having the same BTF type in its signature (but this just a guess, I
> did not investigate more).  I was wondering if anyone has any ideas
> what could be going on here.
>
> PS: While working on task local storage, I noted that some of the
> checks in this test were buggy and will send a patch to fix them as
> well.
>
> - KP
Yonghong Song Oct. 7, 2020, 4:31 a.m. UTC | #2
On 10/6/20 6:23 PM, Andrii Nakryiko wrote:
> On Tue, Oct 6, 2020 at 5:31 PM KP Singh <kpsingh@chromium.org> wrote:
>>
>> I noticed that test_local_storage is broken due to a BTF error at
>> bpf-next [67ed375530e2 ("samples: bpf: Driver interrupt statistics in
>> xdpsock")]
>>
>> ./test_progs -t test_local_storage
>> libbpf: prog 'socket_post_create': relo #0: parsing [28] struct socket + 0:0.1 2
> 
> This line is truncated, btw, please make sure you post the entire
> output next time.
> 
> But, this seems like a bug in Clang, it produced invalid access index
> string "0:0.1", there shouldn't be any other separator except ':' in
> those strings.
> 
> Yonghong, can you please take a look? This seems to be a very recent
> regression, I had to update to
> 6c7d713cf5d9bb188f1e73452a256386f0288bf7 sha from not-too-outdated
> version to repro this.

Sorry. This indeed is a llvm regression. The guilty patch is
https://reviews.llvm.org/D88855 which adds NPM (new pass manager) 
support for BPF. The patch just merged this morning, thanks for catching 
the bug so fast. Since NPM is not used by default and the code 
refactoring looks okay, so I did not run selftests. But, yah, it does 
change some semantics of the code...

I just put a fix at https://reviews.llvm.org/D88942.
Hopefully to merge soon.

> 
>> libbpf: prog 'socket_post_create': relo #0: failed to relocate: -22
>> libbpf: failed to perform CO-RE relocations: -22
>> libbpf: failed to load object 'local_storage'
>> libbpf: failed to load BPF skeleton 'local_storage': -22
>> test_test_local_storage:FAIL:skel_load lsm skeleton failed
>>
>> by changing it to use vmlinux.h with:
>>
> 
> [...]
> 
>>
>> clang --version
>> clang version 12.0.0 (https://github.com/llvm/llvm-project.git
>> 6c7d713cf5d9bb188f1e73452a256386f0288bf7)
>> Target: x86_64-unknown-linux-gnu
>> Thread model: posix
>>
>> pahole --version
>> v1.18
>>
>> This error goes away if I comment out the lsm/socket_post_create or
>> the lsm/socket_bind which makes me think that something in
>> bpf_core_apply_relo does not like two programs in the same object
>> having the same BTF type in its signature (but this just a guess, I
>> did not investigate more).  I was wondering if anyone has any ideas
>> what could be going on here.
>>
>> PS: While working on task local storage, I noted that some of the
>> checks in this test were buggy and will send a patch to fix them as
>> well.
>>
>> - KP
Andrii Nakryiko Oct. 7, 2020, 4:44 a.m. UTC | #3
On Tue, Oct 6, 2020 at 9:31 PM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 10/6/20 6:23 PM, Andrii Nakryiko wrote:
> > On Tue, Oct 6, 2020 at 5:31 PM KP Singh <kpsingh@chromium.org> wrote:
> >>
> >> I noticed that test_local_storage is broken due to a BTF error at
> >> bpf-next [67ed375530e2 ("samples: bpf: Driver interrupt statistics in
> >> xdpsock")]
> >>
> >> ./test_progs -t test_local_storage
> >> libbpf: prog 'socket_post_create': relo #0: parsing [28] struct socket + 0:0.1 2
> >
> > This line is truncated, btw, please make sure you post the entire
> > output next time.
> >
> > But, this seems like a bug in Clang, it produced invalid access index
> > string "0:0.1", there shouldn't be any other separator except ':' in
> > those strings.
> >
> > Yonghong, can you please take a look? This seems to be a very recent
> > regression, I had to update to
> > 6c7d713cf5d9bb188f1e73452a256386f0288bf7 sha from not-too-outdated
> > version to repro this.
>
> Sorry. This indeed is a llvm regression. The guilty patch is
> https://reviews.llvm.org/D88855 which adds NPM (new pass manager)
> support for BPF. The patch just merged this morning, thanks for catching
> the bug so fast. Since NPM is not used by default and the code
> refactoring looks okay, so I did not run selftests. But, yah, it does
> change some semantics of the code...
>
> I just put a fix at https://reviews.llvm.org/D88942.
> Hopefully to merge soon.

No worries, thanks for a quick fix!

>
> >
> >> libbpf: prog 'socket_post_create': relo #0: failed to relocate: -22
> >> libbpf: failed to perform CO-RE relocations: -22
> >> libbpf: failed to load object 'local_storage'
> >> libbpf: failed to load BPF skeleton 'local_storage': -22
> >> test_test_local_storage:FAIL:skel_load lsm skeleton failed
> >>
> >> by changing it to use vmlinux.h with:
> >>
> >
> > [...]
> >
> >>
> >> clang --version
> >> clang version 12.0.0 (https://github.com/llvm/llvm-project.git
> >> 6c7d713cf5d9bb188f1e73452a256386f0288bf7)
> >> Target: x86_64-unknown-linux-gnu
> >> Thread model: posix
> >>
> >> pahole --version
> >> v1.18
> >>
> >> This error goes away if I comment out the lsm/socket_post_create or
> >> the lsm/socket_bind which makes me think that something in
> >> bpf_core_apply_relo does not like two programs in the same object
> >> having the same BTF type in its signature (but this just a guess, I
> >> did not investigate more).  I was wondering if anyone has any ideas
> >> what could be going on here.
> >>
> >> PS: While working on task local storage, I noted that some of the
> >> checks in this test were buggy and will send a patch to fix them as
> >> well.
> >>
> >> - KP
Alexei Starovoitov Oct. 7, 2020, 5:18 a.m. UTC | #4
On Tue, Oct 6, 2020 at 9:31 PM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 10/6/20 6:23 PM, Andrii Nakryiko wrote:
> > On Tue, Oct 6, 2020 at 5:31 PM KP Singh <kpsingh@chromium.org> wrote:
> >>
> >> I noticed that test_local_storage is broken due to a BTF error at
> >> bpf-next [67ed375530e2 ("samples: bpf: Driver interrupt statistics in
> >> xdpsock")]
> >>
> >> ./test_progs -t test_local_storage
> >> libbpf: prog 'socket_post_create': relo #0: parsing [28] struct socket + 0:0.1 2
> >
> > This line is truncated, btw, please make sure you post the entire
> > output next time.
> >
> > But, this seems like a bug in Clang, it produced invalid access index
> > string "0:0.1", there shouldn't be any other separator except ':' in
> > those strings.
> >
> > Yonghong, can you please take a look? This seems to be a very recent
> > regression, I had to update to
> > 6c7d713cf5d9bb188f1e73452a256386f0288bf7 sha from not-too-outdated
> > version to repro this.
>
> Sorry. This indeed is a llvm regression. The guilty patch is
> https://reviews.llvm.org/D88855 which adds NPM (new pass manager)
> support for BPF. The patch just merged this morning, thanks for catching
> the bug so fast. Since NPM is not used by default and the code
> refactoring looks okay, so I did not run selftests. But, yah, it does
> change some semantics of the code...

but llvm tests were run, of course.
Looks like we need to add more of them, so they can gate the landing.

> I just put a fix at https://reviews.llvm.org/D88942.
> Hopefully to merge soon.

Thanks for the quick fix!
Yonghong Song Oct. 7, 2020, 5:33 a.m. UTC | #5
On 10/6/20 10:18 PM, Alexei Starovoitov wrote:
> On Tue, Oct 6, 2020 at 9:31 PM Yonghong Song <yhs@fb.com> wrote:
>>
>>
>>
>> On 10/6/20 6:23 PM, Andrii Nakryiko wrote:
>>> On Tue, Oct 6, 2020 at 5:31 PM KP Singh <kpsingh@chromium.org> wrote:
>>>>
>>>> I noticed that test_local_storage is broken due to a BTF error at
>>>> bpf-next [67ed375530e2 ("samples: bpf: Driver interrupt statistics in
>>>> xdpsock")]
>>>>
>>>> ./test_progs -t test_local_storage
>>>> libbpf: prog 'socket_post_create': relo #0: parsing [28] struct socket + 0:0.1 2
>>>
>>> This line is truncated, btw, please make sure you post the entire
>>> output next time.
>>>
>>> But, this seems like a bug in Clang, it produced invalid access index
>>> string "0:0.1", there shouldn't be any other separator except ':' in
>>> those strings.
>>>
>>> Yonghong, can you please take a look? This seems to be a very recent
>>> regression, I had to update to
>>> 6c7d713cf5d9bb188f1e73452a256386f0288bf7 sha from not-too-outdated
>>> version to repro this.
>>
>> Sorry. This indeed is a llvm regression. The guilty patch is
>> https://reviews.llvm.org/D88855  which adds NPM (new pass manager)
>> support for BPF. The patch just merged this morning, thanks for catching
>> the bug so fast. Since NPM is not used by default and the code
>> refactoring looks okay, so I did not run selftests. But, yah, it does
>> change some semantics of the code...
> 
> but llvm tests were run, of course.
> Looks like we need to add more of them, so they can gate the landing.

Right, just added two more tests to gate this particular kind of 
failure. Also just pushed a new version which is simpler compared to
previous version.

> 
>> I just put a fix at https://reviews.llvm.org/D88942 .
>> Hopefully to merge soon.
> 
> Thanks for the quick fix!
>
KP Singh Oct. 7, 2020, 10:04 p.m. UTC | #6
On Wed, Oct 7, 2020 at 7:33 AM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 10/6/20 10:18 PM, Alexei Starovoitov wrote:
> > On Tue, Oct 6, 2020 at 9:31 PM Yonghong Song <yhs@fb.com> wrote:
> >>
> >>
> >>
> >> On 10/6/20 6:23 PM, Andrii Nakryiko wrote:
> >>> On Tue, Oct 6, 2020 at 5:31 PM KP Singh <kpsingh@chromium.org> wrote:
> >>>>
> >>>> I noticed that test_local_storage is broken due to a BTF error at
> >>>> bpf-next [67ed375530e2 ("samples: bpf: Driver interrupt statistics in
> >>>> xdpsock")]
> >>>>
> >>>> ./test_progs -t test_local_storage
> >>>> libbpf: prog 'socket_post_create': relo #0: parsing [28] struct socket + 0:0.1 2
> >>>
> >>> This line is truncated, btw, please make sure you post the entire
> >>> output next time.

I just ran this again and it does not seem like it's truncated:

./test_progs -t test_local_storage
libbpf: prog 'socket_post_create': relo #0: parsing [28] struct socket + 0:0.1 2
libbpf: prog 'socket_post_create': relo #0: failed to relocate: -22
libbpf: failed to perform CO-RE relocations: -22
libbpf: failed to load object 'local_storage'
libbpf: failed to load BPF skeleton 'local_storage': -22
test_test_local_storage:FAIL:skel_load lsm skeleton failed

Am I missing something?

> >>>
> >>> But, this seems like a bug in Clang, it produced invalid access index
> >>> string "0:0.1", there shouldn't be any other separator except ':' in
> >>> those strings.
> >>>
> >>> Yonghong, can you please take a look? This seems to be a very recent
> >>> regression, I had to update to
> >>> 6c7d713cf5d9bb188f1e73452a256386f0288bf7 sha from not-too-outdated
> >>> version to repro this.
> >>
> >> Sorry. This indeed is a llvm regression. The guilty patch is
> >> https://reviews.llvm.org/D88855  which adds NPM (new pass manager)
> >> support for BPF. The patch just merged this morning, thanks for catching
> >> the bug so fast. Since NPM is not used by default and the code
> >> refactoring looks okay, so I did not run selftests. But, yah, it does
> >> change some semantics of the code...
> >
> > but llvm tests were run, of course.
> > Looks like we need to add more of them, so they can gate the landing.
>
> Right, just added two more tests to gate this particular kind of
> failure. Also just pushed a new version which is simpler compared to
> previous version.
>
> >
> >> I just put a fix at https://reviews.llvm.org/D88942 .
> >> Hopefully to merge soon.
> >
> > Thanks for the quick fix!

Thank you so much!

> >
KP Singh Oct. 7, 2020, 10:07 p.m. UTC | #7
On Thu, Oct 8, 2020 at 12:04 AM KP Singh <kpsingh@chromium.org> wrote:
>
> On Wed, Oct 7, 2020 at 7:33 AM Yonghong Song <yhs@fb.com> wrote:
> >
> >
> >
> > On 10/6/20 10:18 PM, Alexei Starovoitov wrote:
> > > On Tue, Oct 6, 2020 at 9:31 PM Yonghong Song <yhs@fb.com> wrote:
> > >>
> > >>
> > >>
> > >> On 10/6/20 6:23 PM, Andrii Nakryiko wrote:
> > >>> On Tue, Oct 6, 2020 at 5:31 PM KP Singh <kpsingh@chromium.org> wrote:
> > >>>>
> > >>>> I noticed that test_local_storage is broken due to a BTF error at
> > >>>> bpf-next [67ed375530e2 ("samples: bpf: Driver interrupt statistics in
> > >>>> xdpsock")]
> > >>>>
> > >>>> ./test_progs -t test_local_storage
> > >>>> libbpf: prog 'socket_post_create': relo #0: parsing [28] struct socket + 0:0.1 2
> > >>>
> > >>> This line is truncated, btw, please make sure you post the entire
> > >>> output next time.
>
> I just ran this again and it does not seem like it's truncated:
>
> ./test_progs -t test_local_storage
> libbpf: prog 'socket_post_create': relo #0: parsing [28] struct socket + 0:0.1 2
> libbpf: prog 'socket_post_create': relo #0: failed to relocate: -22
> libbpf: failed to perform CO-RE relocations: -22
> libbpf: failed to load object 'local_storage'
> libbpf: failed to load BPF skeleton 'local_storage': -22
> test_test_local_storage:FAIL:skel_load lsm skeleton failed
>
> Am I missing something?

And indeed I am. It was a bug in my script.

For the record this should have been:

root@kpsingh:~# ./test_progs -t test_local_storage
libbpf: prog 'socket_post_create': relo #0: parsing [28] struct socket
+ 0:0.1 failed: -22
libbpf: prog 'socket_post_create': relo #0: failed to relocate: -22
libbpf: failed to perform CO-RE relocations: -22
libbpf: failed to load object 'local_storage'
libbpf: failed to load BPF skeleton 'local_storage': -22
test_test_local_storage:FAIL:skel_load lsm skeleton failed
#106 test_local_storage:FAIL
Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED

Thanks everyone for looking into this and fixing it!

>
> > >>>
> > >>> But, this seems like a bug in Clang, it produced invalid access index
> > >>> string "0:0.1", there shouldn't be any other separator except ':' in
> > >>> those strings.
> > >>>
> > >>> Yonghong, can you please take a look? This seems to be a very recent
> > >>> regression, I had to update to
> > >>> 6c7d713cf5d9bb188f1e73452a256386f0288bf7 sha from not-too-outdated
> > >>> version to repro this.
> > >>
> > >> Sorry. This indeed is a llvm regression. The guilty patch is
> > >> https://reviews.llvm.org/D88855  which adds NPM (new pass manager)
> > >> support for BPF. The patch just merged this morning, thanks for catching
> > >> the bug so fast. Since NPM is not used by default and the code
> > >> refactoring looks okay, so I did not run selftests. But, yah, it does
> > >> change some semantics of the code...
> > >
> > > but llvm tests were run, of course.
> > > Looks like we need to add more of them, so they can gate the landing.
> >
> > Right, just added two more tests to gate this particular kind of
> > failure. Also just pushed a new version which is simpler compared to
> > previous version.
> >
> > >
> > >> I just put a fix at https://reviews.llvm.org/D88942 .
> > >> Hopefully to merge soon.
> > >
> > > Thanks for the quick fix!
>
> Thank you so much!
>
> > >
diff mbox series

Patch

diff --git a/tools/testing/selftests/bpf/progs/local_storage.c b/tools/testing/>
index 0758ba229ae0..95fad5aca6af 100644
--- a/tools/testing/selftests/bpf/progs/local_storage.c
+++ b/tools/testing/selftests/bpf/progs/local_storage.c
@@ -4,9 +4,8 @@ 
  * Copyright 2020 Google LLC.
  */

+#include "vmlinux.h"
 #include <errno.h>
-#include <linux/bpf.h>
-#include <stdbool.h>
 #include <bpf/bpf_helpers.h>
 #include <bpf/bpf_tracing.h>

@@ -36,23 +35,6 @@  struct {
        __type(value, struct dummy_storage);
 } sk_storage_map SEC(".maps");

-/* TODO Use vmlinux.h once BTF pruning for embedded types is fixed.
- */
-struct sock {} __attribute__((preserve_access_index));
-struct sockaddr {} __attribute__((preserve_access_index));
-struct socket {
-       struct sock *sk;
-} __attribute__((preserve_access_index));
-
-struct inode {} __attribute__((preserve_access_index));
-struct dentry {
-       struct inode *d_inode;
-} __attribute__((preserve_access_index));
-struct file {
-       struct inode *f_inode;
-} __attribute__((preserve_access_index));
-
-
 SEC("lsm/inode_unlink")
 int BPF_PROG(unlink_hook, struct inode *dir, struct dentry *victim)
 {