Message ID | 20210204152957.1288448-1-arnd@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Series | kallsyms: fix nonconverging kallsyms table with lld |
On Fri, Feb 5, 2021 at 12:30 AM Arnd Bergmann <arnd@kernel.org> wrote:
>
> From: Arnd Bergmann <arnd@arndb.de>
>
> ARM randconfig builds with lld sometimes show a build failure
> from kallsyms:
>
>   Inconsistent kallsyms data
>   Try make KALLSYMS_EXTRA_PASS=1 as a workaround
>
> The problem is the veneers/thunks getting added by the linker extend
> the symbol table, which in turn leads to more veneers being needed,
> so it may take a few extra iterations to converge.
>
> This bug has been fixed multiple times before, but comes back every time
> a new symbol name is used. lld uses a different set of idenitifiers from
> ld.bfd, so the additional ones need to be added as well.

Yes, this is a whack-a-mole.

I fixed the typo "idenitifiers" -> "identifiers"
and applied to linux-kbuild. Thanks.

> I looked through the sources and found that arm64 and mips define similar
> prefixes, so I'm adding those as well, aside from the ones I observed. I'm
> not sure about powerpc64, which seems to already be handled through a
> section match, but if it comes back, the "__long_branch_" and "__plt_"
> prefixes would have to get added as well.
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  scripts/kallsyms.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
> index 7ecd2ccba531..54ad86d13784 100644
> --- a/scripts/kallsyms.c
> +++ b/scripts/kallsyms.c
> @@ -112,6 +112,12 @@ static bool is_ignored_symbol(const char *name, char type)
>  		"__crc_",			/* modversions */
>  		"__efistub_",			/* arm64 EFI stub namespace */
>  		"__kvm_nvhe_",			/* arm64 non-VHE KVM namespace */
> +		"__AArch64ADRPThunk_",		/* arm64 lld */
> +		"__ARMV5PILongThunk_",		/* arm lld */
> +		"__ARMV7PILongThunk_",
> +		"__ThumbV7PILongThunk_",
> +		"__LA25Thunk_",			/* mips lld */
> +		"__microLA25Thunk_",
>  		NULL
>  	};
>
> --
> 2.29.2
>
Hi Arnd,

On Thu, Feb 04, 2021 at 04:29:47PM +0100, Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@arndb.de>
>
> ARM randconfig builds with lld sometimes show a build failure
> from kallsyms:
>
>   Inconsistent kallsyms data
>   Try make KALLSYMS_EXTRA_PASS=1 as a workaround
>
> The problem is the veneers/thunks getting added by the linker extend
> the symbol table, which in turn leads to more veneers being needed,
> so it may take a few extra iterations to converge.
>
> This bug has been fixed multiple times before, but comes back every time
> a new symbol name is used. lld uses a different set of idenitifiers from
> ld.bfd, so the additional ones need to be added as well.
>
> I looked through the sources and found that arm64 and mips define similar
> prefixes, so I'm adding those as well, aside from the ones I observed. I'm
> not sure about powerpc64, which seems to already be handled through a
> section match, but if it comes back, the "__long_branch_" and "__plt_"
> prefixes would have to get added as well.
>

This is such a whack-a-mole. The problem is hitting us yet again. I suspect
it may be due to a new version of lld using new symbols, but I didn't really
try to track it down. Is there an easy way to search for missed symbols ?

In this context .. is there a chance to apply [1] after all ? This is getting
really time consuming and annoying, and I really dislike having to fix the
same problem over and over again.

Thanks,
Guenter

---
[1] https://patchwork.kernel.org/project/linux-kbuild/patch/20200910153204.156871-1-linux@roeck-us.net/
On Wed, Jun 9, 2021 at 1:05 PM Guenter Roeck <linux@roeck-us.net> wrote:
> On Thu, Feb 04, 2021 at 04:29:47PM +0100, Arnd Bergmann wrote:
> > From: Arnd Bergmann <arnd@arndb.de>
> >
> > ARM randconfig builds with lld sometimes show a build failure
> > from kallsyms:
> >
> >   Inconsistent kallsyms data
> >   Try make KALLSYMS_EXTRA_PASS=1 as a workaround
> >
> > The problem is the veneers/thunks getting added by the linker extend
> > the symbol table, which in turn leads to more veneers being needed,
> > so it may take a few extra iterations to converge.
> >
> > This bug has been fixed multiple times before, but comes back every time
> > a new symbol name is used. lld uses a different set of idenitifiers from
> > ld.bfd, so the additional ones need to be added as well.
> >
> > I looked through the sources and found that arm64 and mips define similar
> > prefixes, so I'm adding those as well, aside from the ones I observed. I'm
> > not sure about powerpc64, which seems to already be handled through a
> > section match, but if it comes back, the "__long_branch_" and "__plt_"
> > prefixes would have to get added as well.
> >
>
> This is such a whack-a-mole. The problem is hitting us yet again. I suspect
> it may be due to a new version of lld using new symbols, but I didn't really
> try to track it down. Is there an easy way to search for missed symbols ?

The way I did it previously was to hack Kbuild to not remove the temporary
files after a failure, and then compare the "objdump --syms" output of the
last two stages.

I suppose we could improve the situation if scripts/link-vmlinux.sh was able
to do that automatically, and compare the kallsyms output .S file between
steps 1 and 2.

       Arnd
On Wed, Jun 09, 2021 at 01:24:18PM +0200, Arnd Bergmann wrote:
> On Wed, Jun 9, 2021 at 1:05 PM Guenter Roeck <linux@roeck-us.net> wrote:
> > On Thu, Feb 04, 2021 at 04:29:47PM +0100, Arnd Bergmann wrote:
> > > From: Arnd Bergmann <arnd@arndb.de>
> > >
> > > ARM randconfig builds with lld sometimes show a build failure
> > > from kallsyms:
> > >
> > >   Inconsistent kallsyms data
> > >   Try make KALLSYMS_EXTRA_PASS=1 as a workaround
> > >
> > > The problem is the veneers/thunks getting added by the linker extend
> > > the symbol table, which in turn leads to more veneers being needed,
> > > so it may take a few extra iterations to converge.
> > >
> > > This bug has been fixed multiple times before, but comes back every time
> > > a new symbol name is used. lld uses a different set of idenitifiers from
> > > ld.bfd, so the additional ones need to be added as well.
> > >
> > > I looked through the sources and found that arm64 and mips define similar
> > > prefixes, so I'm adding those as well, aside from the ones I observed. I'm
> > > not sure about powerpc64, which seems to already be handled through a
> > > section match, but if it comes back, the "__long_branch_" and "__plt_"
> > > prefixes would have to get added as well.
> > >
> >
> > This is such a whack-a-mole. The problem is hitting us yet again. I suspect
> > it may be due to a new version of lld using new symbols, but I didn't really
> > try to track it down. Is there an easy way to search for missed symbols ?
>
> The way I did it previously was to hack Kbuild to not remove the temporary
> files after a failure, and then compare the "objdump --syms" output of the
> last two stages.

Problem with that is that we have a non-deterministic problem: The build
fails for us on some build servers, but we are unable to reproduce the
problem when building the same image manually on a development server.
That is similar to what I had observed before, where powerpc builds would
pass on one server, but the same kernel with the same configuration would
fail to build on a second almost identical server. It would really be great
if we can find a better solution.

>
> I suppose we could improve the situation if scripts/link-vmlinux.sh was able
> to do that automatically, and compare the kallsyms output .S file between
> steps 1 and 2.

Comparing the .S files doesn't result in useful data; turns out there are
always irrelevant differences. We'll try to run a diff on the output of
"objdump --syms". Hopefully that will generate something useful.

Thanks,
Guenter
On Wed, Jun 9, 2021 at 5:16 PM Guenter Roeck <linux@roeck-us.net> wrote:
>
> On Wed, Jun 09, 2021 at 01:24:18PM +0200, Arnd Bergmann wrote:
> > On Wed, Jun 9, 2021 at 1:05 PM Guenter Roeck <linux@roeck-us.net> wrote:
> > > On Thu, Feb 04, 2021 at 04:29:47PM +0100, Arnd Bergmann wrote:
> > > > From: Arnd Bergmann <arnd@arndb.de>
> > > >
> > > > ARM randconfig builds with lld sometimes show a build failure
> > > > from kallsyms:
> > > >
> > > >   Inconsistent kallsyms data
> > > >   Try make KALLSYMS_EXTRA_PASS=1 as a workaround
> > > >
> > > > The problem is the veneers/thunks getting added by the linker extend
> > > > the symbol table, which in turn leads to more veneers being needed,
> > > > so it may take a few extra iterations to converge.
> > > >
> > > > This bug has been fixed multiple times before, but comes back every time
> > > > a new symbol name is used. lld uses a different set of idenitifiers from
> > > > ld.bfd, so the additional ones need to be added as well.
> > > >
> > > > I looked through the sources and found that arm64 and mips define similar
> > > > prefixes, so I'm adding those as well, aside from the ones I observed. I'm
> > > > not sure about powerpc64, which seems to already be handled through a
> > > > section match, but if it comes back, the "__long_branch_" and "__plt_"
> > > > prefixes would have to get added as well.
> > > >
> > >
> > > This is such a whack-a-mole. The problem is hitting us yet again. I suspect
> > > it may be due to a new version of lld using new symbols, but I didn't really
> > > try to track it down. Is there an easy way to search for missed symbols ?
> >
> > The way I did it previously was to hack Kbuild to not remove the temporary
> > files after a failure, and then compare the "objdump --syms" output of the
> > last two stages.
>
> Problem with that is that we have a non-deterministic problem: The build
> fails for us on some build servers, but we are unable to reproduce the
> problem when building the same image manually on a development server.
> That is similar to what I had observed before, where powerpc builds would
> pass on one server, but the same kernel with the same configuration would
> fail to build on a second almost identical server. It would really be great
> if we can find a better solution.

Right, that sucks. I suppose removing the ignore-lists from
scripts/kallsyms.c would make it more easily reproducible after a few local
randconfig builds, at least enough to add some form of scripting that is
able to print the names of the generated symbols.

       Arnd
On Wed, Jun 09, 2021 at 08:16:11AM -0700, Guenter Roeck wrote:
> On Wed, Jun 09, 2021 at 01:24:18PM +0200, Arnd Bergmann wrote:
> > On Wed, Jun 9, 2021 at 1:05 PM Guenter Roeck <linux@roeck-us.net> wrote:
> > > On Thu, Feb 04, 2021 at 04:29:47PM +0100, Arnd Bergmann wrote:
> > > > From: Arnd Bergmann <arnd@arndb.de>
> > > >
> > > > ARM randconfig builds with lld sometimes show a build failure
> > > > from kallsyms:
> > > >
> > > >   Inconsistent kallsyms data
> > > >   Try make KALLSYMS_EXTRA_PASS=1 as a workaround
> > > >
> > > > The problem is the veneers/thunks getting added by the linker extend
> > > > the symbol table, which in turn leads to more veneers being needed,
> > > > so it may take a few extra iterations to converge.
> > > >
> > > > This bug has been fixed multiple times before, but comes back every time
> > > > a new symbol name is used. lld uses a different set of idenitifiers from
> > > > ld.bfd, so the additional ones need to be added as well.
> > > >
> > > > I looked through the sources and found that arm64 and mips define similar
> > > > prefixes, so I'm adding those as well, aside from the ones I observed. I'm
> > > > not sure about powerpc64, which seems to already be handled through a
> > > > section match, but if it comes back, the "__long_branch_" and "__plt_"
> > > > prefixes would have to get added as well.
> > > >
> > >
> > > This is such a whack-a-mole. The problem is hitting us yet again. I suspect
> > > it may be due to a new version of lld using new symbols, but I didn't really
> > > try to track it down. Is there an easy way to search for missed symbols ?
> >
> > The way I did it previously was to hack Kbuild to not remove the temporary
> > files after a failure, and then compare the "objdump --syms" output of the
> > last two stages.
>
> Problem with that is that we have a non-deterministic problem: The build
> fails for us on some build servers, but we are unable to reproduce the
> problem when building the same image manually on a development server.
> That is similar to what I had observed before, where powerpc builds would
> pass on one server, but the same kernel with the same configuration would
> fail to build on a second almost identical server. It would really be great
> if we can find a better solution.
> >
> > I suppose we could improve the situation if scripts/link-vmlinux.sh was able
> > to do that automatically, and compare the kallsyms output .S file between
> > steps 1 and 2.
>
> Comparing the .S files doesn't result in useful data; turns out there are
> always irrelevant differences. We'll try to run a diff on the output of
> "objdump --syms". Hopefully that will generate something useful.
>

Turns out it wasn't that useful.

chromeos-kernel-5_10-5.10.42-r406: Symbol file differences:
chromeos-kernel-5_10-5.10.42-r406: 7c7
chromeos-kernel-5_10-5.10.42-r406: < 00000000000325c8 g .rodata 0000000000000000 kallsyms_relative_base
chromeos-kernel-5_10-5.10.42-r406: ---
chromeos-kernel-5_10-5.10.42-r406: > 00000000000325c0 g .rodata 0000000000000000 kallsyms_relative_base
chromeos-kernel-5_10-5.10.42-r406: 9,13c9,13
chromeos-kernel-5_10-5.10.42-r406: < 00000000000325d0 g .rodata 0000000000000000 kallsyms_num_syms
chromeos-kernel-5_10-5.10.42-r406: < 00000000000325d8 g .rodata 0000000000000000 kallsyms_names
chromeos-kernel-5_10-5.10.42-r406: < 00000000000cd7f0 g .rodata 0000000000000000 kallsyms_markers
chromeos-kernel-5_10-5.10.42-r406: < 00000000000cdb18 g .rodata 0000000000000000 kallsyms_token_table
chromeos-kernel-5_10-5.10.42-r406: < 00000000000cde78 g .rodata 0000000000000000 kallsyms_token_index
chromeos-kernel-5_10-5.10.42-r406: ---
chromeos-kernel-5_10-5.10.42-r406: > 00000000000325c8 g .rodata 0000000000000000 kallsyms_num_syms
chromeos-kernel-5_10-5.10.42-r406: > 00000000000325d0 g .rodata 0000000000000000 kallsyms_names
chromeos-kernel-5_10-5.10.42-r406: > 00000000000cd7d8 g .rodata 0000000000000000 kallsyms_markers
chromeos-kernel-5_10-5.10.42-r406: > 00000000000cdb00 g .rodata 0000000000000000 kallsyms_token_table
chromeos-kernel-5_10-5.10.42-r406: > 00000000000cde60 g .rodata 0000000000000000 kallsyms_token_index

I thought I'd see the added symbols, but it looks like the only difference
between the two files is the addresses.

What am I missing ?

Thanks,
Guenter
On Wed, Jun 9, 2021 at 9:15 PM Guenter Roeck <linux@roeck-us.net> wrote:
> On Wed, Jun 09, 2021 at 08:16:11AM -0700, Guenter Roeck wrote:
> > > I suppose we could improve the situation if scripts/link-vmlinux.sh was able
> > > to do that automatically, and compare the kallsyms output .S file between
> > > steps 1 and 2.
> >
> > Comparing the .S files doesn't result in useful data; turns out there are
> > always irrelevant differences. We'll try to run a diff on the output of
> > "objdump --syms". Hopefully that will generate something useful.
> >
>
> Turns out it wasn't that useful.
>
> chromeos-kernel-5_10-5.10.42-r406: Symbol file differences:
> chromeos-kernel-5_10-5.10.42-r406: 7c7
> chromeos-kernel-5_10-5.10.42-r406: < 00000000000325c8 g .rodata 0000000000000000 kallsyms_relative_base
> chromeos-kernel-5_10-5.10.42-r406: ---
> chromeos-kernel-5_10-5.10.42-r406: > 00000000000325c0 g .rodata 0000000000000000 kallsyms_relative_base
> chromeos-kernel-5_10-5.10.42-r406: 9,13c9,13
> chromeos-kernel-5_10-5.10.42-r406: < 00000000000325d0 g .rodata 0000000000000000 kallsyms_num_syms
> chromeos-kernel-5_10-5.10.42-r406: < 00000000000325d8 g .rodata 0000000000000000 kallsyms_names
> chromeos-kernel-5_10-5.10.42-r406: < 00000000000cd7f0 g .rodata 0000000000000000 kallsyms_markers
> chromeos-kernel-5_10-5.10.42-r406: < 00000000000cdb18 g .rodata 0000000000000000 kallsyms_token_table
> chromeos-kernel-5_10-5.10.42-r406: < 00000000000cde78 g .rodata 0000000000000000 kallsyms_token_index
> chromeos-kernel-5_10-5.10.42-r406: ---
> chromeos-kernel-5_10-5.10.42-r406: > 00000000000325c8 g .rodata 0000000000000000 kallsyms_num_syms
> chromeos-kernel-5_10-5.10.42-r406: > 00000000000325d0 g .rodata 0000000000000000 kallsyms_names
> chromeos-kernel-5_10-5.10.42-r406: > 00000000000cd7d8 g .rodata 0000000000000000 kallsyms_markers
> chromeos-kernel-5_10-5.10.42-r406: > 00000000000cdb00 g .rodata 0000000000000000 kallsyms_token_table
> chromeos-kernel-5_10-5.10.42-r406: > 00000000000cde60 g .rodata 0000000000000000 kallsyms_token_index
>
> I thought I'd see the added symbols, but it looks like the only difference
> between the two files is the addresses.
>
> What am I missing ?

I probably misremembered the part about 'objdump --syms' and there was
something more to it.

Maybe this was the last version before converging? It looks like the '<' version
has one extra symbol compared to the '>' version. The diff has no context, but I
assume the first symbol that has a different size is 'kallsyms_offsets', which
is generated by kallsyms.

I see that link-vmlinux.sh already compares the System.map files, using
"cmp -s System.map .tmp_System.map", which is roughly the same as the
objdump --syms diff you got, so comparing these files probably doesn't
help either. However, comparing the .tmp_System.map file with the previous
version might reveal the problem. This might need another step to filter out
the address and only compare the symbol names.

       Arnd
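
The last suggestion above (drop the addresses and compare only the symbol names) could be sketched roughly as follows. This is a minimal illustration, not something taken from link-vmlinux.sh: the function names `symbol_names` and `diff_symbol_names` are made up for this example, and it assumes both inputs use the nm/System.map layout with the symbol name as the last field of each line.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the kernel tree.

# Print the sorted, de-duplicated symbol names from an nm/System.map-style
# listing. Assumption: the name is the last whitespace-separated field.
symbol_names() {
	awk 'NF >= 2 { print $NF }' "$1" | LC_ALL=C sort -u
}

# Report names that occur in only one of the two listings.
# "- name": only in the first file, "+ name": only in the second.
# Symbols that merely moved to a different address are not reported.
diff_symbol_names() {
	old_names=$(mktemp) || return 1
	new_names=$(mktemp) || { rm -f "$old_names"; return 1; }
	symbol_names "$1" > "$old_names"
	symbol_names "$2" > "$new_names"
	LC_ALL=C comm -23 "$old_names" "$new_names" | sed 's/^/- /'
	LC_ALL=C comm -13 "$old_names" "$new_names" | sed 's/^/+ /'
	rm -f "$old_names" "$new_names"
}
```

With two symbol maps saved from consecutive link steps (file names hypothetical), `diff_symbol_names step1.map step2.map` would then print only the symbols that appeared or disappeared between the steps, which is the information the plain diff of addresses did not show.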
On Wed, Jun 09, 2021 at 10:30:23PM +0200, Arnd Bergmann wrote:
> On Wed, Jun 9, 2021 at 9:15 PM Guenter Roeck <linux@roeck-us.net> wrote:
> > On Wed, Jun 09, 2021 at 08:16:11AM -0700, Guenter Roeck wrote:
> > > > I suppose we could improve the situation if scripts/link-vmlinux.sh was able
> > > > to do that automatically, and compare the kallsyms output .S file between
> > > > steps 1 and 2.
> > >
> > > Comparing the .S files doesn't result in useful data; turns out there are
> > > always irrelevant differences. We'll try to run a diff on the output of
> > > "objdump --syms". Hopefully that will generate something useful.
> > >
> >
> > Turns out it wasn't that useful.
> >
> > chromeos-kernel-5_10-5.10.42-r406: Symbol file differences:
> > chromeos-kernel-5_10-5.10.42-r406: 7c7
> > chromeos-kernel-5_10-5.10.42-r406: < 00000000000325c8 g .rodata 0000000000000000 kallsyms_relative_base
> > chromeos-kernel-5_10-5.10.42-r406: ---
> > chromeos-kernel-5_10-5.10.42-r406: > 00000000000325c0 g .rodata 0000000000000000 kallsyms_relative_base
> > chromeos-kernel-5_10-5.10.42-r406: 9,13c9,13
> > chromeos-kernel-5_10-5.10.42-r406: < 00000000000325d0 g .rodata 0000000000000000 kallsyms_num_syms
> > chromeos-kernel-5_10-5.10.42-r406: < 00000000000325d8 g .rodata 0000000000000000 kallsyms_names
> > chromeos-kernel-5_10-5.10.42-r406: < 00000000000cd7f0 g .rodata 0000000000000000 kallsyms_markers
> > chromeos-kernel-5_10-5.10.42-r406: < 00000000000cdb18 g .rodata 0000000000000000 kallsyms_token_table
> > chromeos-kernel-5_10-5.10.42-r406: < 00000000000cde78 g .rodata 0000000000000000 kallsyms_token_index
> > chromeos-kernel-5_10-5.10.42-r406: ---
> > chromeos-kernel-5_10-5.10.42-r406: > 00000000000325c8 g .rodata 0000000000000000 kallsyms_num_syms
> > chromeos-kernel-5_10-5.10.42-r406: > 00000000000325d0 g .rodata 0000000000000000 kallsyms_names
> > chromeos-kernel-5_10-5.10.42-r406: > 00000000000cd7d8 g .rodata 0000000000000000 kallsyms_markers
> > chromeos-kernel-5_10-5.10.42-r406: > 00000000000cdb00 g .rodata 0000000000000000 kallsyms_token_table
> > chromeos-kernel-5_10-5.10.42-r406: > 00000000000cde60 g .rodata 0000000000000000 kallsyms_token_index
> >
> > I thought I'd see the added symbols, but it looks like the only difference
> > between the two files is the addresses.
> >
> > What am I missing ?
>
> I probably misremembered the part about 'objdump --syms' and there was
> something more to it.
>
> Maybe this was the last version before converging? It looks like the '<' version
> has one extra symbol compared to the '>' version. The diff has no context, but I

It is the difference between step 1 and 2. Why would diff on objdump not
show the additional symbol ? Is it possible that the symbol is not added
to the object file ?

> assume the first symbol that has a different size is 'kallsyms_offsets', which
> is generated by kallsyms.

I'll give it another try and run diff -u.

>
> I see that link-vmlinux.sh already compares the System.map files, using
> "cmp -s System.map .tmp_System.map", which is roughly the same as the
> objdump --syms diff you got, so comparing these files probably doesn't
> help either. However, comparing the .tmp_System.map file with the previous
> version might reveal the problem. This might need another step to filter out
> the address and only compare the symbol names.

I'll do that as well.

Thanks!
Guenter
On Thu, Jun 10, 2021 at 2:05 PM Guenter Roeck <linux@roeck-us.net> wrote:
> On Wed, Jun 09, 2021 at 10:30:23PM +0200, Arnd Bergmann wrote:
> > > I thought I'd see the added symbols, but it looks like the only difference
> > > between the two files is the addresses.
> > >
> > > What am I missing ?
> >
> > I probably misremembered the part about 'objdump --syms' and there was
> > something more to it.
> >
> > Maybe this was the last version before converging? It looks like the '<' version
> > has one extra symbol compared to the '>' version. The diff has no context, but I
>
> It is the difference between step 1 and 2. Why would diff on objdump not
> show the additional symbol ? Is it possible that the symbol is not added
> to the object file ?

Not sure. The symbol must be in the object file, but perhaps the
'objdump --syms' output skips a different set of symbols compared
to the list that is used as input for kallsyms, which comes from '${NM}'.

Comparing the nm output might be another thing to try.

       Arnd
On Thu, Jun 10, 2021 at 02:26:50PM +0200, Arnd Bergmann wrote:
> On Thu, Jun 10, 2021 at 2:05 PM Guenter Roeck <linux@roeck-us.net> wrote:
> > On Wed, Jun 09, 2021 at 10:30:23PM +0200, Arnd Bergmann wrote:
> > > > I thought I'd see the added symbols, but it looks like the only difference
> > > > between the two files is the addresses.
> > > >
> > > > What am I missing ?
> > >
> > > I probably misremembered the part about 'objdump --syms' and there was
> > > something more to it.
> > >
> > > Maybe this was the last version before converging? It looks like the '<' version
> > > has one extra symbol compared to the '>' version. The diff has no context, but I
> >
> > It is the difference between step 1 and 2. Why would diff on objdump not
> > show the additional symbol ? Is it possible that the symbol is not added
> > to the object file ?
>
> Not sure. The symbol must be in the object file, but perhaps the
> 'objdump --syms' output skips a different set of symbols compared
> to the list that is used as input for kallsyms, which comes from '${NM}'.
>
> Comparing the nm output might be another thing to try.
>

Just following up on this: As Murphy would have told us, the problem
disappeared while trying to track it down. We'll add some instrumentation
into the ChromeOS kernel build to get data once/if/when it shows up again.
When that happens, we'll try to come up with a patch to show the symbol
file differences in the kernel build and submit it upstream.

Thanks,
Guenter
diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
index 7ecd2ccba531..54ad86d13784 100644
--- a/scripts/kallsyms.c
+++ b/scripts/kallsyms.c
@@ -112,6 +112,12 @@ static bool is_ignored_symbol(const char *name, char type)
 		"__crc_",			/* modversions */
 		"__efistub_",			/* arm64 EFI stub namespace */
 		"__kvm_nvhe_",			/* arm64 non-VHE KVM namespace */
+		"__AArch64ADRPThunk_",		/* arm64 lld */
+		"__ARMV5PILongThunk_",		/* arm lld */
+		"__ARMV7PILongThunk_",
+		"__ThumbV7PILongThunk_",
+		"__LA25Thunk_",			/* mips lld */
+		"__microLA25Thunk_",
 		NULL
 	};
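
For checking a symbol listing by hand, the prefix test that this patch extends can be approximated in shell. The sketch below is only an approximation under stated assumptions: it covers just the six lld prefixes added here (the list in scripts/kallsyms.c is longer), the names `LLD_THUNK_PREFIXES`, `has_thunk_prefix` and `list_thunk_symbols` are invented for this example, and the input is assumed to be "address type name" lines as printed by nm for defined symbols.

```shell
#!/bin/sh
# Hypothetical helpers mirroring the prefixes added by the patch; this is
# not the kernel's implementation, which lives in scripts/kallsyms.c.
LLD_THUNK_PREFIXES='__AArch64ADRPThunk_ __ARMV5PILongThunk_ __ARMV7PILongThunk_ __ThumbV7PILongThunk_ __LA25Thunk_ __microLA25Thunk_'

# Succeed (return 0) if the symbol name in $1 starts with a listed prefix.
has_thunk_prefix() {
	for prefix in $LLD_THUNK_PREFIXES; do
		case "$1" in
		"$prefix"*) return 0 ;;
		esac
	done
	return 1
}

# Read "address type name" lines on stdin and print the names that match.
# Lines with fewer than three fields (e.g. undefined symbols) are skipped.
list_thunk_symbols() {
	while read -r _addr _type name; do
		[ -n "$name" ] || continue
		if has_thunk_prefix "$name"; then
			printf '%s\n' "$name"
		fi
	done
}
```

Something like `${NM} vmlinux | list_thunk_symbols` would then show which linker-generated thunks in an image are already covered by these prefixes; a thunk symbol that shows up in a names-only comparison but not here would be a candidate for the ignore list.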