Message ID | CAEf4BzZWabv_hExaANQyQ71L2JHYqXaT4hFj52w-poWoVYWKqQ@mail.gmail.com (mailing list archive) |
---|---|
State | RFC |
Delegated to: | BPF |
Series | Per-CPU variables in modules and pahole |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Not a local patch |
On Wed, Dec 09, 2020 at 12:53:44PM -0800, Andrii Nakryiko wrote:
> Hi,
>
> I'm working on supporting per-CPU symbols in BPF/libbpf, and the
> prerequisite for that is BTF data for the .data..percpu data section
> and the variables inside it.
>
> It turns out pahole doesn't currently emit any BTF information for
> such variables in kernel modules. The reason why is quite confusing
> and I can't figure it out myself, so I was hoping someone else might
> be able to help.
>
> To repro, you can take the latest bpf-next tree and add this to
> bpf_testmod/bpf_testmod.c inside selftests/bpf:
>
> $ git diff bpf_testmod/bpf_testmod.c
> diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
> index 2df19d73ca49..b2086b798019 100644
> --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
> +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
> @@ -3,6 +3,7 @@
>  #include <linux/error-injection.h>
>  #include <linux/init.h>
>  #include <linux/module.h>
> +#include <linux/percpu-defs.h>
>  #include <linux/sysfs.h>
>  #include <linux/tracepoint.h>
>  #include "bpf_testmod.h"
> @@ -10,6 +11,10 @@
>  #define CREATE_TRACE_POINTS
>  #include "bpf_testmod-events.h"
>
> +DEFINE_PER_CPU(int, bpf_testmod_ksym_dummy1) = -1;
> +DEFINE_PER_CPU(int, bpf_testmod_ksym_percpu) = 123;
> +DEFINE_PER_CPU(int, bpf_testmod_ksym_dummy2) = -1;
> +
>  noinline ssize_t
>  bpf_testmod_test_read(struct file *file, struct kobject *kobj,
>                        struct bin_attribute *bin_attr,
>
> 1. The very first issue (which I'm going to ignore for now) is that
> if I just added bpf_testmod_ksym_percpu, it would get addr == 0 and
> would be ignored by the current pahole logic. So we need to fix that
> for modules. Adding dummy1 and dummy2 works around this for now:
> bpf_testmod_ksym_percpu gets offset 4.

I removed that addr zero check in the modules changes, but only when
collecting functions; it's still there in collect_percpu_var.

>
> 2. The second issue is more interesting. Somehow, when pahole iterates
> over DWARF variables, the address of bpf_testmod_ksym_percpu is
> reported as 0x10e74, not 4. This totally confuses pahole, because
> according to the ELF symbols, the bpf_testmod_ksym_percpu symbol has
> value 4. I tracked this down to dwarf_getlocation() returning 0x10e74
> in the number field of expr.

In which place do you see that address? When I display the address
from collect_percpu_var, it shows 4.

Not sure this is related, but it looks like a similar issue I had to
solve for module functions, as described in the changelog (not merged
yet):

  btf_encoder: Detect kernel module ftrace addresses

  ...
  There's one tricky point with kernel modules wrt the Elf object,
  which we get from the dwfl_module_getelf function. This function
  performs all possible relocations, including the __mcount_loc
  section.

  So the addrs array contains relocated values, which we need to take
  into account when we compare them to function values, which are
  relative to their sections.
  ...

The 0x10e74 value could be a relocated 4, but I'm guessing, because
I'm not sure where exactly you see that address.

jirka
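A minimal model of the first issue, for illustration only: `struct percpu_sym` and `collect_percpu_syms()` are hypothetical stand-ins, not pahole's `collect_percpu_var()`, and the sketch assumes that in a module each per-CPU symbol's `st_value` is its offset inside .data..percpu. With the zero-address filter turned on, whichever variable lands at offset 0 is dropped, which is why the dummy variables were needed to give bpf_testmod_ksym_percpu a non-zero offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of an ELF symbol (pahole reads GElf_Sym). */
struct percpu_sym {
	const char *name;
	uint64_t value; /* st_value: assumed offset inside .data..percpu */
	size_t shndx;   /* index of the section the symbol is defined in */
};

/*
 * Hypothetical helper: store pointers to the symbols defined in the
 * per-CPU section into out[] (room for n entries) and return the count.
 * skip_zero_addr mimics the "ignore addr == 0" filter from the thread.
 */
static size_t collect_percpu_syms(const struct percpu_sym *syms, size_t n,
				  size_t percpu_shndx, int skip_zero_addr,
				  const struct percpu_sym **out)
{
	size_t kept = 0;

	assert(n == 0 || (syms != NULL && out != NULL));
	for (size_t i = 0; i < n; i++) {
		if (syms[i].shndx != percpu_shndx)
			continue; /* not a per-CPU variable */
		if (skip_zero_addr && syms[i].value == 0)
			continue; /* drops whatever sits first in the section */
		out[kept++] = &syms[i];
	}
	return kept;
}
```

Passing `skip_zero_addr = 0` keeps the variable at offset 0, at the cost of also admitting zero-address symbols that do not correspond to real variables, which is the trade-off Hao describes in his reply.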
On Thu, Dec 10, 2020 at 8:43 AM Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Wed, Dec 09, 2020 at 12:53:44PM -0800, Andrii Nakryiko wrote:

SNIP

> > 1. The very first issue (which I'm going to ignore for now) is that
> > if I just added bpf_testmod_ksym_percpu, it would get addr == 0 and
> > would be ignored by the current pahole logic. So we need to fix that
> > for modules. Adding dummy1 and dummy2 works around this for now:
> > bpf_testmod_ksym_percpu gets offset 4.
>
> I removed that addr zero check in the modules changes, but only when
> collecting functions; it's still there in collect_percpu_var.

Hao had some reason to skip per-CPU variables with offset 0; maybe he
can comment on that before we change it.

> > 2. The second issue is more interesting. Somehow, when pahole iterates
> > over DWARF variables, the address of bpf_testmod_ksym_percpu is
> > reported as 0x10e74, not 4. This totally confuses pahole, because
> > according to the ELF symbols, the bpf_testmod_ksym_percpu symbol has
> > value 4. I tracked this down to dwarf_getlocation() returning 0x10e74
> > in the number field of expr.
>
> In which place do you see that address? When I display the address
> from collect_percpu_var, it shows 4.

Yes, the ELF symbol's value is 4, but when iterating DWARF variables,
0x10e70 + 4 is returned. It does look like special handling of
modules; I missed that libdw does some special things specifically for
modules. Further debugging yesterday showed that 0x10e70 roughly
corresponds to the offset of .data..percpu if you count all the
allocatable data sections that come before it. So I think you are
right. We should probably centralize the logic of kernel module
detection so that we can handle these module vs. non-module
differences properly.

> Not sure this is related, but it looks like a similar issue I had to
> solve for module functions, as described in the changelog (not merged
> yet):

SNIP

> The 0x10e74 value could be a relocated 4, but I'm guessing, because
> I'm not sure where exactly you see that address.

It comes up in cu__encode_btf(): var->ip.addr is not 4, as we expect
it to be.
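The observation that 0x10e70 roughly matches the offset of .data..percpu once all preceding allocatable sections are counted appears consistent with libdwfl assigning addresses to the sections of a relocatable (ET_REL) file itself, packing SHF_ALLOC sections one after another with their alignment. The sketch below models that layout under this assumption; `struct sec` and `synthetic_section_base()` are hypothetical, and libdwfl's real algorithm may differ in details, so it approximates rather than reproduces the exact 0x10e70.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of one section header of a .ko file. */
struct sec {
	const char *name;
	int alloc;      /* non-zero if SHF_ALLOC is set */
	uint64_t size;  /* sh_size */
	uint64_t align; /* sh_addralign: 0 or a power of two */
};

static uint64_t align_up(uint64_t v, uint64_t align)
{
	if (align <= 1)
		return v;
	assert((align & (align - 1)) == 0); /* must be a power of two */
	return (v + align - 1) & ~(align - 1);
}

/*
 * Assumed layout model: place allocatable sections back to back starting
 * at 0 and return the address given to `target`, or UINT64_MAX if the
 * target is missing or not allocatable.
 */
static uint64_t synthetic_section_base(const struct sec *secs, size_t n,
				       const char *target)
{
	uint64_t addr = 0;

	for (size_t i = 0; i < n; i++) {
		if (!secs[i].alloc)
			continue; /* e.g. .comment and debug sections */
		addr = align_up(addr, secs[i].align);
		if (strcmp(secs[i].name, target) == 0)
			return addr;
		addr += secs[i].size;
	}
	return UINT64_MAX;
}
```

Under this model, the DWARF address of a per-CPU variable is the section's synthetic base plus the variable's ELF `st_value`, which is the 0x10e70 + 4 pattern seen in the thread.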
On Thu, Dec 10, 2020 at 9:02 AM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Dec 10, 2020 at 8:43 AM Jiri Olsa <jolsa@redhat.com> wrote:
> >
> > On Wed, Dec 09, 2020 at 12:53:44PM -0800, Andrii Nakryiko wrote:

SNIP

> > > 1. The very first issue (which I'm going to ignore for now) is that
> > > if I just added bpf_testmod_ksym_percpu, it would get addr == 0 and
> > > would be ignored by the current pahole logic. So we need to fix that
> > > for modules. Adding dummy1 and dummy2 works around this for now:
> > > bpf_testmod_ksym_percpu gets offset 4.
> >
> > I removed that addr zero check in the modules changes, but only when
> > collecting functions; it's still there in collect_percpu_var.
>
> Hao had some reason to skip per-CPU variables with offset 0; maybe he
> can comment on that before we change it.

When I initially wrote that check, I saw multiple symbols with the same
name associated with a single variable, but only one of them had a
non-zero address. Besides, there are symbols that aren't associated
with any variable and have a zero address, for example those defined
with __ADDRESSABLE(sym) and __UNIQUE_ID(prefix). There are quite a lot
of them, as I remember. So I filtered out zero addresses to speed up
encoding. I noticed that on x86_64 the first page of the percpu section
is reserved, so I assumed that the symbols of normal interest should
have positive addresses.
On Thu, Dec 10, 2020 at 09:02:05AM -0800, Andrii Nakryiko wrote:

SNIP

> Yes, the ELF symbol's value is 4, but when iterating DWARF variables,
> 0x10e70 + 4 is returned. It does look like special handling of
> modules; I missed that libdw does some special things specifically for
> modules. Further debugging yesterday showed that 0x10e70 roughly
> corresponds to the offset of .data..percpu if you count all the
> allocatable data sections that come before it. So I think you are
> right. We should probably centralize the logic of kernel module
> detection so that we can handle these module vs. non-module
> differences properly.

SNIP

> It comes up in cu__encode_btf(): var->ip.addr is not 4, as we expect
> it to be.

I'm taking the section's sh_addr for each function and relocating the
addr value for kernel modules; check the setup_functions function.

I don't see this being centralized somehow; it looks simple enough to
me to handle for each case.

jirka
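Jiri's approach for functions amounts to comparing values in the same address space: the __mcount_loc entries come back already relocated by dwfl_module_getelf(), while function symbol values are section-relative, so the section's sh_addr is added before the lookup. A minimal sketch under that assumption; `func_has_mcount()` is hypothetical and is not pahole's setup_functions(), and it further assumes each __mcount_loc entry equals the function's start address, which may not hold on every architecture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort/bsearch comparator for 64-bit addresses. */
static int cmp_u64(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;

	return (x > y) - (x < y);
}

/*
 * Hypothetical helper. addrs[] holds relocated __mcount_loc entries,
 * sorted ascending; sh_addr is the address libdwfl assigned to the
 * function's section; st_value is the section-relative symbol value.
 */
static int func_has_mcount(const uint64_t *addrs, size_t n,
			   uint64_t sh_addr, uint64_t st_value)
{
	uint64_t key = sh_addr + st_value; /* move into the relocated space */

	if (n == 0)
		return 0;
	assert(addrs != NULL);
	return bsearch(&key, addrs, n, sizeof(*addrs), cmp_u64) != NULL;
}
```

Sorting the relocated entries once with `qsort(addrs, n, sizeof(*addrs), cmp_u64)` keeps each per-function lookup logarithmic.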
On Thu, Dec 10, 2020 at 3:42 PM Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Thu, Dec 10, 2020 at 09:02:05AM -0800, Andrii Nakryiko wrote:

SNIP

> I'm taking the section's sh_addr for each function and relocating the
> addr value for kernel modules; check the setup_functions function.
>
> I don't see this being centralized somehow; it looks simple enough to
> me to handle for each case.

I meant centralized detection of whether we are working with a module,
vmlinux, or something else. setup_functions() currently has a very
specific heuristic for that. So I'd like to extract that, or come up
with some other way that won't be so function-specific
(__start_mcount_loc symbol vs. __mcount_loc section).
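One hypothetical shape for the centralized detection described above is a single helper that classifies the input once, instead of per-feature heuristics. The sketch is an assumption, not pahole code: it treats a relocatable (ET_REL) object with a .modinfo section as a kernel module and an ET_EXEC image as vmlinux, which matches x86_64 but may not hold on every architecture or kernel configuration. `enum kbin_kind` and `classify_kernel_binary()` are illustrative names.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical classification result. */
enum kbin_kind {
	KBIN_OTHER,
	KBIN_VMLINUX,
	KBIN_MODULE,
};

/*
 * Hypothetical classifier: e_type comes from the ELF header and
 * sec_names lists the section names found while scanning the file.
 */
static enum kbin_kind classify_kernel_binary(uint16_t e_type,
					     const char *const *sec_names,
					     size_t nsecs)
{
	int has_modinfo = 0;

	assert(nsecs == 0 || sec_names != NULL);
	for (size_t i = 0; i < nsecs; i++) {
		if (strcmp(sec_names[i], ".modinfo") == 0) {
			has_modinfo = 1;
			break;
		}
	}
	if (e_type == ET_REL && has_modinfo)
		return KBIN_MODULE;
	if (e_type == ET_EXEC && !has_modinfo)
		return KBIN_VMLINUX; /* assumption: true for x86_64 vmlinux */
	return KBIN_OTHER; /* plain .o files, ET_DYN images, ... */
}
```

Feature-specific code (functions, per-CPU variables) could then branch on the result instead of probing for __start_mcount_loc or __mcount_loc on its own.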
On Thu, Dec 10, 2020 at 10:29 AM Hao Luo <haoluo@google.com> wrote:
>
> On Thu, Dec 10, 2020 at 9:02 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:

SNIP

> > Hao had some reason to skip per-CPU variables with offset 0; maybe he
> > can comment on that before we change it.
>
> When I initially wrote that check, I saw multiple symbols with the same
> name associated with a single variable, but only one of them had a
> non-zero address. Besides, there are symbols that aren't associated
> with any variable and have a zero address, for example those defined
> with __ADDRESSABLE(sym) and __UNIQUE_ID(prefix). There are quite a lot
> of them, as I remember. So I filtered out zero addresses to speed up
> encoding. I noticed that on x86_64 the first page of the percpu section
> is reserved, so I assumed that the symbols of normal interest should
> have positive addresses.

So I just checked my local vmlinux image, and it seems like the only
one with addr == 0 is fixed_percpu_data. Everything else that's
detected as belonging to the .data..percpu section looks sane and has
a non-zero offset.

So I think this might have been the case before we switched to using
ELF symbols, and now it's not? I think I'll just drop this check and
post the patch; I would really appreciate it if you could test it in
your environment. Does that sound OK?
On Thu, Dec 10, 2020 at 3:49 PM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Dec 10, 2020 at 3:42 PM Jiri Olsa <jolsa@redhat.com> wrote:

SNIP

> > I don't see this being centralized somehow; it looks simple enough to
> > me to handle for each case.
>
> I meant centralized detection of whether we are working with a module,
> vmlinux, or something else. setup_functions() currently has a very
> specific heuristic for that. So I'd like to extract that, or come up
> with some other way that won't be so function-specific
> (__start_mcount_loc symbol vs. __mcount_loc section).

This seems to be unnecessary, actually. We already record
btfe->percpu_base_addr, which is always zero for vmlinux and non-zero
for modules. So just subtracting this base addr before looking up the
ELF symbol solves the problem for me and still works for vmlinux. So
I'm going with that for now.
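The fix described above can be sketched as: subtract the per-CPU base address (btfe->percpu_base_addr in pahole, zero for vmlinux and non-zero for modules, per the thread) from the DWARF address, then look the result up among the ELF-derived per-CPU variables sorted by offset. `struct percpu_var` and `find_percpu_var()` are hypothetical names for illustration, not pahole's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU variable record built from ELF symbols. */
struct percpu_var {
	uint64_t addr; /* offset inside .data..percpu (ELF st_value) */
	uint32_t size;
	const char *name;
};

/*
 * Hypothetical lookup. vars[] must be sorted by addr; dwarf_addr is the
 * value from the DWARF location expression; percpu_base_addr is 0 for
 * vmlinux and the section's assigned address for a module.
 * Returns NULL if no variable starts at the normalized address.
 */
static const struct percpu_var *
find_percpu_var(const struct percpu_var *vars, size_t n,
		uint64_t dwarf_addr, uint64_t percpu_base_addr)
{
	size_t lo = 0, hi = n;
	uint64_t addr;

	assert(n == 0 || vars != NULL);
	if (dwarf_addr < percpu_base_addr)
		return NULL; /* below the per-CPU section */
	addr = dwarf_addr - percpu_base_addr;

	while (lo < hi) { /* binary search on the sorted offsets */
		size_t mid = lo + (hi - lo) / 2;

		if (vars[mid].addr < addr)
			lo = mid + 1;
		else if (vars[mid].addr > addr)
			hi = mid;
		else
			return &vars[mid];
	}
	return NULL;
}
```

With the thread's numbers, a DWARF address of 0x10e74 and a base of 0x10e70 resolve to the variable at offset 4, while the same address with a base of 0 finds nothing, which is the original mismatch.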
On Thu, Dec 10, 2020 at 6:56 PM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Dec 10, 2020 at 10:29 AM Hao Luo <haoluo@google.com> wrote:

SNIP

> So I just checked my local vmlinux image, and it seems like the only
> one with addr == 0 is fixed_percpu_data. Everything else that's
> detected as belonging to the .data..percpu section looks sane and has
> a non-zero offset.
>
> So I think this might have been the case before we switched to using
> ELF symbols, and now it's not? I think I'll just drop this check and
> post the patch; I would really appreciate it if you could test it in
> your environment. Does that sound OK?

Ah, never mind. While the ELF symbols look good, the problem is on the
DWARF variables side. There are lots of DWARF variables that map to
addr 0 and are impossible to distinguish from the real
fixed_percpu_data, because we can't even rely on getting the DWARF
variable's name. I guess I'll leave it as is for now, but ideally we
should come up with some solution.
--- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
+++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
@@ -3,6 +3,7 @@
 #include <linux/error-injection.h>
 #include <linux/init.h>
 #include <linux/module.h>
+#include <linux/percpu-defs.h>
 #include <linux/sysfs.h>
 #include <linux/tracepoint.h>
 #include "bpf_testmod.h"
@@ -10,6 +11,10 @@
 #define CREATE_TRACE_POINTS
 #include "bpf_testmod-events.h"
 
+DEFINE_PER_CPU(int, bpf_testmod_ksym_dummy1) = -1;
+DEFINE_PER_CPU(int, bpf_testmod_ksym_percpu) = 123;
+DEFINE_PER_CPU(int, bpf_testmod_ksym_dummy2) = -1;
+
 noinline ssize_t
 bpf_testmod_test_read(struct file *file, struct kobject *kobj,
                       struct bin_attribute *bin_attr,