Message ID | 20210826120953.11041-1-toke@redhat.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | BPF |
Series | [bpf-next,v2] libbpf: ignore .eh_frame sections when parsing elf files |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Clearly marked for bpf-next |
netdev/subject_prefix | success | Link |
netdev/cc_maintainers | warning | 4 maintainers not CCed: clang-built-linux@googlegroups.com netdev@vger.kernel.org nathan@kernel.org ndesaulniers@google.com |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 0 this patch: 0 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 9 lines checked |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 0 this patch: 0 |
netdev/header_inline | success | Link |
bpf/vmtest | success | Kernel LATEST + selftests |
bpf/vmtest-bpf-next | success | Kernel LATEST + selftests |
On Thu, Aug 26, 2021 at 5:10 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> When .eh_frame and .rel.eh_frame sections are present in BPF object files,
> libbpf produces errors like this when loading the file:
>
> libbpf: elf: skipping unrecognized data section(32) .eh_frame
> libbpf: elf: skipping relo section(33) .rel.eh_frame for section(32) .eh_frame
>
> It is possible to get rid of the .eh_frame section by adding
> -fno-asynchronous-unwind-tables to the compilation, but we have seen
> multiple examples of these sections appearing in BPF files in the wild,
> most recently in samples/bpf, fixed by:
> 5a0ae9872d5c ("bpf, samples: Add -fno-asynchronous-unwind-tables to BPF Clang invocation")
>
> While the errors are technically harmless, they look odd and confuse users.

These warnings point out an invalid set of compiler flags used for
compiling BPF object files, though. Which is a good thing and should
incentivize anyone getting those warnings to check and fix how they do
BPF compilation. Those .eh_frame sections shouldn't be present in BPF
object files at all, and that's what libbpf is trying to say.

I don't know exactly in which situations that .eh_frame section is
added, but looking at our selftests (and now samples/bpf as well),
where we use -target bpf, we don't need
-fno-asynchronous-unwind-tables at all.

So instead of hiding the problem, let's use this as an opportunity to
fix those users' compilation flags instead.

> So let's make libbpf filter out those sections, by adding .eh_frame to the
> filter check in is_sec_name_dwarf().
>
> v2:
> - Expand explanation in the commit message
>
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> ---
>  tools/lib/bpf/libbpf.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 88d8825fc6f6..b1dc97b95965 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -2909,7 +2909,8 @@ static Elf_Data *elf_sec_data(const struct bpf_object *obj, Elf_Scn *scn)
>  static bool is_sec_name_dwarf(const char *name)
>  {
>  	/* approximation, but the actual list is too long */
> -	return strncmp(name, ".debug_", sizeof(".debug_") - 1) == 0;
> +	return (strncmp(name, ".debug_", sizeof(".debug_") - 1) == 0 ||
> +		strncmp(name, ".eh_frame", sizeof(".eh_frame") - 1) == 0);
>  }
>
>  static bool ignore_elf_section(GElf_Shdr *hdr, const char *name)
> --
> 2.33.0
>
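For readers following along, the prefix check that the quoted patch extends can be sketched as a standalone C snippet. Only is_sec_name_dwarf() mirrors the patch. The HAS_PREFIX macro and is_ignorable_sec_name() are hypothetical names introduced here for illustration, and the ".rel" stripping is an assumption about how a relocation section could be matched against the section it relocates; the patch itself does not show that step.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if `name` begins with the string literal `prefix`. */
#define HAS_PREFIX(name, prefix) \
	(strncmp((name), (prefix), sizeof(prefix) - 1) == 0)

/* Mirrors the patched check: .debug_* plus any name starting with .eh_frame. */
static bool is_sec_name_dwarf(const char *name)
{
	assert(name != NULL);
	return HAS_PREFIX(name, ".debug_") || HAS_PREFIX(name, ".eh_frame");
}

/*
 * Hypothetical helper, not taken from the patch: strip a leading ".rel"
 * so that a relocation section is judged by the section it relocates.
 */
static bool is_ignorable_sec_name(const char *name)
{
	assert(name != NULL);
	if (HAS_PREFIX(name, ".rel"))
		name += sizeof(".rel") - 1;
	return is_sec_name_dwarf(name);
}
```

Under these assumptions, ".eh_frame" and ".rel.eh_frame" would both be treated as ignorable, while a name such as ".rodata.str1.1" would not. Because this is a prefix match, a name like ".eh_frame_hdr" would be matched as well.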
Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Thu, Aug 26, 2021 at 5:10 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> While the errors are technically harmless, they look odd and confuse users.
>
> These warnings point out an invalid set of compiler flags used for
> compiling BPF object files, though. Which is a good thing and should
> incentivize anyone getting those warnings to check and fix how they do
> BPF compilation. Those .eh_frame sections shouldn't be present in BPF
> object files at all, and that's what libbpf is trying to say.

Apart from triggering that warning, what effect does this have, though?
The programs seem to work just fine (as evidenced by the fact that
samples/bpf has been built this way for years, for instance)...

Also, how is a user supposed to go from that cryptic error message to
figuring out that it has something to do with compiler flags?

> I don't know exactly in which situations that .eh_frame section is
> added, but looking at our selftests (and now samples/bpf as well),
> where we use -target bpf, we don't need
> -fno-asynchronous-unwind-tables at all.

This seems to at least be compiler-dependent. We ran into this with
bpftool as well (for the internal BPF programs it loads whenever it
runs), which already had '-target bpf' in the Makefile. We're carrying
an internal RHEL patch adding -fno-asynchronous-unwind-tables to the
bpftool build to fix this...

> So instead of hiding the problem, let's use this as an opportunity to
> fix those users' compilation flags instead.

This really doesn't seem like something that's helping anyone, it's just
annoying and confusing users...

-Toke
On Tue, Aug 31, 2021 at 3:28 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>
> > These warnings point out an invalid set of compiler flags used for
> > compiling BPF object files, though. Which is a good thing and should
> > incentivize anyone getting those warnings to check and fix how they do
> > BPF compilation. Those .eh_frame sections shouldn't be present in BPF
> > object files at all, and that's what libbpf is trying to say.
>
> Apart from triggering that warning, what effect does this have, though?
> The programs seem to work just fine (as evidenced by the fact that
> samples/bpf has been built this way for years, for instance)...
>
> Also, how is a user supposed to go from that cryptic error message to
> figuring out that it has something to do with compiler flags?

Google and find discussions like these?.. I don't think libbpf error
messages have to include an intro to DWARF and .eh_frame. Just googling
".eh_frame" gives me [0] as a first link, which seems to describe what
it is and how to get rid of it.

[0] https://stackoverflow.com/questions/26300819/why-gcc-compiled-c-program-needs-eh-frame-section

> > I don't know exactly in which situations that .eh_frame section is
> > added, but looking at our selftests (and now samples/bpf as well),
> > where we use -target bpf, we don't need
> > -fno-asynchronous-unwind-tables at all.
>
> This seems to at least be compiler-dependent. We ran into this with
> bpftool as well (for the internal BPF programs it loads whenever it
> runs), which already had '-target bpf' in the Makefile. We're carrying
> an internal RHEL patch adding -fno-asynchronous-unwind-tables to the
> bpftool build to fix this...

So instead of figuring out why your compilers cause .eh_frame generation
(while they shouldn't), you are trying to hide the warning in libbpf?
This hasn't been the problem in production apps at Facebook, nor with
libbpf-tools or libbpf-bootstrap apps. Which just makes me want to keep
this warning more.

Once we support multiple .rodata/.data/.bss sections for libbpf, I think
I'll turn all those unrecognized sections into actual errors. I'd rather
not have unknown sections being just ignored by libbpf. Someday we might
actually use .eh_frame with BPF objects, that's when this will become
not an error or warning.

> > So instead of hiding the problem, let's use this as an opportunity to
> > fix those users' compilation flags instead.
>
> This really doesn't seem like something that's helping anyone, it's just
> annoying and confusing users...

Warnings like "libbpf: elf: skipping unrecognized data section(4)
.rodata.str1.1" annoy me as well, and that's one of the reasons I'll add
support for multiple .rodata sections. So annoying is fine, it raises
awareness and incentivizes fixing the problem.

> -Toke
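The three behaviours being argued over in this message (silently ignore, warn and skip, or fail the load) can be laid side by side in a small C sketch. This is not libbpf code: the enum, the function name, and both boolean knobs are hypothetical and exist only to make the policy difference concrete. `quiet_eh_frame` stands in for the proposed patch and `strict` for the "turn unrecognized sections into actual errors" idea.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical outcomes for a section the loader has no handler for. */
enum sec_action {
	SEC_IGNORE,	/* skip without printing anything */
	SEC_WARN,	/* skip, but print a "skipping unrecognized" warning */
	SEC_ERROR,	/* refuse to load the object */
};

static bool starts_with(const char *s, const char *prefix)
{
	return strncmp(s, prefix, strlen(prefix)) == 0;
}

/*
 * strict:         model of the stricter future behaviour (unknown => error).
 * quiet_eh_frame: model of the patch (treat .eh_frame like DWARF sections).
 */
static enum sec_action classify_unknown_section(const char *name, bool strict,
						bool quiet_eh_frame)
{
	assert(name != NULL);
	if (starts_with(name, ".debug_"))
		return SEC_IGNORE;
	if (quiet_eh_frame && starts_with(name, ".eh_frame"))
		return SEC_IGNORE;
	return strict ? SEC_ERROR : SEC_WARN;
}
```

In this model, the status quo corresponds to `strict = false, quiet_eh_frame = false`, where .eh_frame yields SEC_WARN. The patch flips only the second knob, and the stricter plan flips only the first.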
On 8/31/21 3:28 AM, Toke Høiland-Jørgensen wrote:
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>
>> I don't know exactly in which situations that .eh_frame section is
>> added, but looking at our selftests (and now samples/bpf as well),
>> where we use -target bpf, we don't need
>> -fno-asynchronous-unwind-tables at all.
>
> This seems to at least be compiler-dependent. We ran into this with
> bpftool as well (for the internal BPF programs it loads whenever it
> runs), which already had '-target bpf' in the Makefile. We're carrying
> an internal RHEL patch adding -fno-asynchronous-unwind-tables to the
> bpftool build to fix this...

I haven't seen an instance of .eh_frame with -target bpf either.
Do you have a reproducible test case? I would like to investigate
the possible cause and whether we could do something in llvm
to prevent its generation. Thanks!
Yonghong Song <yhs@fb.com> writes:

> I haven't seen an instance of .eh_frame with -target bpf either.
> Do you have a reproducible test case? I would like to investigate
> the possible cause and whether we could do something in llvm
> to prevent its generation. Thanks!

We found this in the RHEL builds of bpftool. I don't think we're doing
anything special, other than maybe building with a clang version that's
a few versions behind:

    # clang --version
    clang version 11.0.0 (Red Hat 11.0.0-1.module+el8.4.0+8598+a071fcd5)
    Target: x86_64-unknown-linux-gnu
    Thread model: posix
    InstalledDir: /usr/bin

So I suppose it may resolve itself once we upgrade LLVM?

-Toke
On Thu, Sep 2, 2021 at 10:08 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> We found this in the RHEL builds of bpftool. I don't think we're doing
> anything special, other than maybe building with a clang version that's
> a few versions behind:
>
> # clang --version
> clang version 11.0.0 (Red Hat 11.0.0-1.module+el8.4.0+8598+a071fcd5)
> Target: x86_64-unknown-linux-gnu
> Thread model: posix
> InstalledDir: /usr/bin
>
> So I suppose it may resolve itself once we upgrade LLVM?

That's odd. I don't think I've seen this issue even with clang 11
(but I built it myself).
If there is a fix indeed let's backport it to llvm 11. The user
experience matters.
It could be llvm configuration too.
I'm guessing some build flags might influence default settings
for unwind tables.

Yonghong, can we make the bpf backend ignore needsUnwindTableEntry ?
On 9/2/21 12:32 PM, Alexei Starovoitov wrote:
> That's odd. I don't think I've seen this issue even with clang 11
> (but I built it myself).

I cannot reproduce it myself with self-built llvm (11, 12, 13, 14).
But I can reproduce it with an upstream-built llvm12.

    /bin/clang \
        -I. \
        -I/home/yhs/work/bpf-next/tools/include/uapi/ \
        -I/home/yhs/work/bpf-next/tools/lib/bpf/ \
        -I/home/yhs/work/bpf-next/tools/lib \
        -g -O2 -Wall -target bpf -c skeleton/pid_iter.bpf.c -o pid_iter.bpf.o && llvm-strip -g pid_iter.bpf.o
      GEN pid_iter.skel.h
    libbpf: elf: skipping unrecognized data section(11) .eh_frame
    libbpf: elf: skipping relo section(12) .rel.eh_frame for section(11) .eh_frame

> If there is a fix indeed let's backport it to llvm 11. The user
> experience matters.
> It could be llvm configuration too.
> I'm guessing some build flags might influence default settings
> for unwind tables.
>
> Yonghong, can we make the bpf backend ignore needsUnwindTableEntry ?

Sure. I will try to get upstream build flags, reproduce and fix it
in llvm.
Yonghong Song <yhs@fb.com> writes:

> I cannot reproduce it myself with self-built llvm (11, 12, 13, 14).
> But I can reproduce it with an upstream-built llvm12.
>
> /bin/clang \
> -I. \
> -I/home/yhs/work/bpf-next/tools/include/uapi/ \
> -I/home/yhs/work/bpf-next/tools/lib/bpf/ \
> -I/home/yhs/work/bpf-next/tools/lib \
> -g -O2 -Wall -target bpf -c skeleton/pid_iter.bpf.c -o pid_iter.bpf.o && llvm-strip -g pid_iter.bpf.o
> GEN pid_iter.skel.h
> libbpf: elf: skipping unrecognized data section(11) .eh_frame
> libbpf: elf: skipping relo section(12) .rel.eh_frame for section(11) .eh_frame

Ah, that's interesting!

>> Yonghong, can we make the bpf backend ignore needsUnwindTableEntry ?
>
> Sure. I will try to get upstream build flags, reproduce and fix it
> in llvm.

Awesome, thanks for looking at this! :)

-Toke
On 9/2/21 3:08 PM, Toke Høiland-Jørgensen wrote:
> Yonghong Song <yhs@fb.com> writes:
>
>> Sure. I will try to get upstream build flags, reproduce and fix it
>> in llvm.

I did some investigation and this is due to a centos private patch:

https://git.centos.org/rpms/clang/blob/b99d8d4a38320329e10570f308c3e2d8cf295c78/f/SOURCES/0002-PATCH-clang-Make-funwind-tables-the-default-on-all-a.patch

In upstream, the original llvm-project source is patched with several
private patches before building the rpm.

https://koji.mbox.centos.org/pkgs/packages/clang/12.0.1/1.module_el8.5.0+892+54d791e1/data/logs/x86_64/build.log

The above private patch enables unwind-table (.eh_frame section) by
default for ALL architectures and bpf is a victim of this.
I filed a redhat bugzilla bug to fix their private patch.

https://bugzilla.redhat.com/show_bug.cgi?id=2002024

Hopefully future compiler builds won't have this issue.

> Awesome, thanks for looking at this! :)
>
> -Toke
Yonghong Song <yhs@fb.com> writes:

> On 9/2/21 3:08 PM, Toke Høiland-Jørgensen wrote:
>> Yonghong Song <yhs@fb.com> writes:
>>
>>> On 9/2/21 12:32 PM, Alexei Starovoitov wrote:
>>>> On Thu, Sep 2, 2021 at 10:08 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>>>>
>>>>> Yonghong Song <yhs@fb.com> writes:
>>>>>
>>>>>> On 8/31/21 3:28 AM, Toke Høiland-Jørgensen wrote:
>>>>>>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>>>>>>>
>>>>>>>> On Thu, Aug 26, 2021 at 5:10 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>>>>>>>>
>>>>>>>>> When .eh_frame and .rel.eh_frame sections are present in BPF object files,
>>>>>>>>> libbpf produces errors like this when loading the file:
>>>>>>>>>
>>>>>>>>> libbpf: elf: skipping unrecognized data section(32) .eh_frame
>>>>>>>>> libbpf: elf: skipping relo section(33) .rel.eh_frame for section(32) .eh_frame
>>>>>>>>>
>>>>>>>>> It is possible to get rid of the .eh_frame section by adding
>>>>>>>>> -fno-asynchronous-unwind-tables to the compilation, but we have seen
>>>>>>>>> multiple examples of these sections appearing in BPF files in the wild,
>>>>>>>>> most recently in samples/bpf, fixed by:
>>>>>>>>> 5a0ae9872d5c ("bpf, samples: Add -fno-asynchronous-unwind-tables to BPF Clang invocation")
>>>>>>>>>
>>>>>>>>> While the errors are technically harmless, they look odd and confuse users.
>>>>>>>>
>>>>>>>> These warnings point out invalid set of compiler flags used for
>>>>>>>> compiling BPF object files, though. Which is a good thing and should
>>>>>>>> incentivize anyone getting those warnings to check and fix how they do
>>>>>>>> BPF compilation. Those .eh_frame sections shouldn't be present in BPF
>>>>>>>> object files at all, and that's what libbpf is trying to say.
>>>>>>>
>>>>>>> Apart from triggering that warning, what effect does this have, though?
>>>>>>> The programs seem to work just fine (as evidenced by the fact that
>>>>>>> samples/bpf has been built this way for years, for instance)...
>>>>>>>
>>>>>>> Also, how is a user supposed to go from that cryptic error message to
>>>>>>> figuring out that it has something to do with compiler flags?
>>>>>>>
>>>>>>>> I don't know exactly in which situations that .eh_frame section is
>>>>>>>> added, but looking at our selftests (and now samples/bpf as well),
>>>>>>>> where we use -target bpf, we don't need
>>>>>>>> -fno-asynchronous-unwind-tables at all.
>>>>>>>
>>>>>>> This seems to at least be compiler-dependent. We ran into this with
>>>>>>> bpftool as well (for the internal BPF programs it loads whenever it
>>>>>>> runs), which already had '-target bpf' in the Makefile. We're carrying
>>>>>>> an internal RHEL patch adding -fno-asynchronous-unwind-tables to the
>>>>>>> bpftool build to fix this...
>>>>>>
>>>>>> I haven't seen an instance of .eh_frame as well with -target bpf.
>>>>>> Do you have a reproducible test case? I would like to investigate
>>>>>> what is the possible cause and whether we could do something in llvm
>>>>>> to prevent its generation. Thanks!
>>>>>
>>>>> We found this in the RHEL builds of bpftool. I don't think we're doing
>>>>> anything special, other than maybe building with a clang version that's
>>>>> a few versions behind:
>>>>>
>>>>> # clang --version
>>>>> clang version 11.0.0 (Red Hat 11.0.0-1.module+el8.4.0+8598+a071fcd5)
>>>>> Target: x86_64-unknown-linux-gnu
>>>>> Thread model: posix
>>>>> InstalledDir: /usr/bin
>>>>>
>>>>> So I suppose it may resolve itself once we upgrade LLVM?
>>>>
>>>> That's odd. I don't think I've seen this issue even with clang 11
>>>> (but I built it myself).
>>>
>>> I cannot reproduce it by self with self built llvm (11, 12, 13, 14).
>>> But I can reproduce it with an upstream built llvm12.
>>>
>>> /bin/clang \
>>> -I. \
>>> -I/home/yhs/work/bpf-next/tools/include/uapi/ \
>>> -I/home/yhs/work/bpf-next/tools/lib/bpf/ \
>>> -I/home/yhs/work/bpf-next/tools/lib \
>>> -g -O2 -Wall -target bpf -c skeleton/pid_iter.bpf.c -o
>>> pid_iter.bpf.o && llvm-strip -g pid_iter.bpf.o
>>> GEN pid_iter.skel.h
>>> libbpf: elf: skipping unrecognized data section(11) .eh_frame
>>> libbpf: elf: skipping relo section(12) .rel.eh_frame for section(11)
>>> .eh_frame
>>
>> Ah, that's interesting!
>>
>>>> If there is a fix indeed let's backport it to llvm 11. The user
>>>> experience matters.
>>>> It could be llvm configuration too.
>>>> I'm guessing some build flags might influence default settings
>>>> for unwind tables.
>>>>
>>>> Yonghong, can we make bpf backend to ignore needsUnwindTableEntry ?
>>>
>>> Sure. I will try to get upstream build flags, reproduce and fix it
>>> in llvm.
>
> I did some investigation and this is due to centos private patch:
> https://git.centos.org/rpms/clang/blob/b99d8d4a38320329e10570f308c3e2d8cf295c78/f/SOURCES/0002-PATCH-clang-Make-funwind-tables-the-default-on-all-a.patch
>
> In upstream, the original llvm-project source is patched with
> several private patches before building the rpm.
> https://koji.mbox.centos.org/pkgs/packages/clang/12.0.1/1.module_el8.5.0+892+54d791e1/data/logs/x86_64/build.log
>
> The above private patch enables unwind-table (.eh_frame section)
> by default for ALL architectures and bpf is a victim of this.

Ah, doh! I had no idea we were doing this :/

> I filed a redhat bugzilla bug to fix their private patch.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=2002024
>
> Hopefully future newer compiler build won't have this issue.

Thank you for finding the root cause of this! I'll follow up internally
and make sure we get this fixed...

-Toke
On 9/7/21 12:36 PM, Toke Høiland-Jørgensen wrote:
> Yonghong Song <yhs@fb.com> writes:
>
>> On 9/2/21 3:08 PM, Toke Høiland-Jørgensen wrote:
>>> Yonghong Song <yhs@fb.com> writes:
>>>
>>>> On 9/2/21 12:32 PM, Alexei Starovoitov wrote:
>>>>> On Thu, Sep 2, 2021 at 10:08 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>>>>>
>>>>>> Yonghong Song <yhs@fb.com> writes:
>>>>>>
>>>>>>> On 8/31/21 3:28 AM, Toke Høiland-Jørgensen wrote:
>>>>>>>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>>>>>>>>
>>>>>>>>> On Thu, Aug 26, 2021 at 5:10 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>> When .eh_frame and .rel.eh_frame sections are present in BPF object files,
>>>>>>>>>> libbpf produces errors like this when loading the file:
>>>>>>>>>>
>>>>>>>>>> libbpf: elf: skipping unrecognized data section(32) .eh_frame
>>>>>>>>>> libbpf: elf: skipping relo section(33) .rel.eh_frame for section(32) .eh_frame
>>>>>>>>>>
>>>>>>>>>> It is possible to get rid of the .eh_frame section by adding
>>>>>>>>>> -fno-asynchronous-unwind-tables to the compilation, but we have seen
>>>>>>>>>> multiple examples of these sections appearing in BPF files in the wild,
>>>>>>>>>> most recently in samples/bpf, fixed by:
>>>>>>>>>> 5a0ae9872d5c ("bpf, samples: Add -fno-asynchronous-unwind-tables to BPF Clang invocation")
>>>>>>>>>>
>>>>>>>>>> While the errors are technically harmless, they look odd and confuse users.
>>>>>>>>>
>>>>>>>>> These warnings point out invalid set of compiler flags used for
>>>>>>>>> compiling BPF object files, though. Which is a good thing and should
>>>>>>>>> incentivize anyone getting those warnings to check and fix how they do
>>>>>>>>> BPF compilation. Those .eh_frame sections shouldn't be present in BPF
>>>>>>>>> object files at all, and that's what libbpf is trying to say.
>>>>>>>>
>>>>>>>> Apart from triggering that warning, what effect does this have, though?
>>>>>>>> The programs seem to work just fine (as evidenced by the fact that
>>>>>>>> samples/bpf has been built this way for years, for instance)...
>>>>>>>>
>>>>>>>> Also, how is a user supposed to go from that cryptic error message to
>>>>>>>> figuring out that it has something to do with compiler flags?
>>>>>>>>
>>>>>>>>> I don't know exactly in which situations that .eh_frame section is
>>>>>>>>> added, but looking at our selftests (and now samples/bpf as well),
>>>>>>>>> where we use -target bpf, we don't need
>>>>>>>>> -fno-asynchronous-unwind-tables at all.
>>>>>>>>
>>>>>>>> This seems to at least be compiler-dependent. We ran into this with
>>>>>>>> bpftool as well (for the internal BPF programs it loads whenever it
>>>>>>>> runs), which already had '-target bpf' in the Makefile. We're carrying
>>>>>>>> an internal RHEL patch adding -fno-asynchronous-unwind-tables to the
>>>>>>>> bpftool build to fix this...
>>>>>>>
>>>>>>> I haven't seen an instance of .eh_frame as well with -target bpf.
>>>>>>> Do you have a reproducible test case? I would like to investigate
>>>>>>> what is the possible cause and whether we could do something in llvm
>>>>>>> to prevent its generation. Thanks!
>>>>>>
>>>>>> We found this in the RHEL builds of bpftool. I don't think we're doing
>>>>>> anything special, other than maybe building with a clang version that's
>>>>>> a few versions behind:
>>>>>>
>>>>>> # clang --version
>>>>>> clang version 11.0.0 (Red Hat 11.0.0-1.module+el8.4.0+8598+a071fcd5)
>>>>>> Target: x86_64-unknown-linux-gnu
>>>>>> Thread model: posix
>>>>>> InstalledDir: /usr/bin
>>>>>>
>>>>>> So I suppose it may resolve itself once we upgrade LLVM?
>>>>>
>>>>> That's odd. I don't think I've seen this issue even with clang 11
>>>>> (but I built it myself).
>>>>
>>>> I cannot reproduce it by self with self built llvm (11, 12, 13, 14).
>>>> But I can reproduce it with an upstream built llvm12.
>>>>
>>>> /bin/clang \
>>>> -I. \
>>>> -I/home/yhs/work/bpf-next/tools/include/uapi/ \
>>>> -I/home/yhs/work/bpf-next/tools/lib/bpf/ \
>>>> -I/home/yhs/work/bpf-next/tools/lib \
>>>> -g -O2 -Wall -target bpf -c skeleton/pid_iter.bpf.c -o
>>>> pid_iter.bpf.o && llvm-strip -g pid_iter.bpf.o
>>>> GEN pid_iter.skel.h
>>>> libbpf: elf: skipping unrecognized data section(11) .eh_frame
>>>> libbpf: elf: skipping relo section(12) .rel.eh_frame for section(11)
>>>> .eh_frame
>>>
>>> Ah, that's interesting!
>>>
>>>>> If there is a fix indeed let's backport it to llvm 11. The user
>>>>> experience matters.
>>>>> It could be llvm configuration too.
>>>>> I'm guessing some build flags might influence default settings
>>>>> for unwind tables.
>>>>>
>>>>> Yonghong, can we make bpf backend to ignore needsUnwindTableEntry ?
>>>>
>>>> Sure. I will try to get upstream build flags, reproduce and fix it
>>>> in llvm.
>>
>> I did some investigation and this is due to centos private patch:
>> https://git.centos.org/rpms/clang/blob/b99d8d4a38320329e10570f308c3e2d8cf295c78/f/SOURCES/0002-PATCH-clang-Make-funwind-tables-the-default-on-all-a.patch
>>
>> In upstream, the original llvm-project source is patched with
>> several private patches before building the rpm.
>> https://koji.mbox.centos.org/pkgs/packages/clang/12.0.1/1.module_el8.5.0+892+54d791e1/data/logs/x86_64/build.log
>>
>> The above private patch enables unwind-table (.eh_frame section)
>> by default for ALL architectures and bpf is a victim of this.
>
> Ah, doh! I had no idea we were doing this :/
>
>> I filed a redhat bugzilla bug to fix their private patch.
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=2002024
>>
>> Hopefully future newer compiler build won't have this issue.
>
> Thank you for finding the root cause of this! I'll follow up internally
> and make sure we get this fixed...

Thanks! Hopefully this can be resolved soon.

>
> -Toke
>
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 88d8825fc6f6..b1dc97b95965 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -2909,7 +2909,8 @@ static Elf_Data *elf_sec_data(const struct bpf_object *obj, Elf_Scn *scn)
 static bool is_sec_name_dwarf(const char *name)
 {
 	/* approximation, but the actual list is too long */
-	return strncmp(name, ".debug_", sizeof(".debug_") - 1) == 0;
+	return (strncmp(name, ".debug_", sizeof(".debug_") - 1) == 0 ||
+		strncmp(name, ".eh_frame", sizeof(".eh_frame") - 1) == 0);
 }
 
 static bool ignore_elf_section(GElf_Shdr *hdr, const char *name)
When .eh_frame and .rel.eh_frame sections are present in BPF object files,
libbpf produces errors like this when loading the file:

libbpf: elf: skipping unrecognized data section(32) .eh_frame
libbpf: elf: skipping relo section(33) .rel.eh_frame for section(32) .eh_frame

It is possible to get rid of the .eh_frame section by adding
-fno-asynchronous-unwind-tables to the compilation, but we have seen
multiple examples of these sections appearing in BPF files in the wild,
most recently in samples/bpf, fixed by:
5a0ae9872d5c ("bpf, samples: Add -fno-asynchronous-unwind-tables to BPF Clang invocation")

While the errors are technically harmless, they look odd and confuse users.
So let's make libbpf filter out those sections, by adding .eh_frame to the
filter check in is_sec_name_dwarf().

v2:
- Expand explanation in the commit message

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 tools/lib/bpf/libbpf.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
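
For anyone who wants to poke at the proposed name filter outside of libbpf, here is a minimal standalone sketch of the prefix check. The helper names (has_prefix, sec_name_is_dwarf_like, sec_should_be_skipped) are made up for illustration and are not libbpf functions. The ".rel" stripping in the last helper is an assumption about how a caller could cover .rel.eh_frame; the patch itself only touches is_sec_name_dwarf().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if `str` starts with `pfx`. */
static bool has_prefix(const char *str, const char *pfx)
{
	return strncmp(str, pfx, strlen(pfx)) == 0;
}

/*
 * Same shape as the patched is_sec_name_dwarf(): a plain prefix match on
 * ".debug_" or ".eh_frame". Being a prefix match, it also accepts names
 * such as ".eh_frame_hdr".
 */
static bool sec_name_is_dwarf_like(const char *name)
{
	assert(name != NULL);
	return has_prefix(name, ".debug_") || has_prefix(name, ".eh_frame");
}

/*
 * Hypothetical caller (an assumption, not taken from the patch): a
 * relocation section is skipped when the section it applies to would be
 * skipped, so ".rel.eh_frame" is checked as ".eh_frame".
 */
static bool sec_should_be_skipped(const char *name)
{
	assert(name != NULL);
	if (sec_name_is_dwarf_like(name))
		return true;
	if (has_prefix(name, ".rel"))
		return sec_name_is_dwarf_like(name + strlen(".rel"));
	return false;
}
```

With these helpers, both section names from the warnings quoted in the commit message (.eh_frame and .rel.eh_frame) would be skipped, while .text and .rel.text would not. Note that the prefix check alone does not match .rel.eh_frame; that case relies on the assumed ".rel" handling in the caller.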