Message ID | 20250120212839.1675696-1-arnd@kernel.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [RFC,DO,NOT,APPLY] vmlinux.lds: revert link speed regression | expand |
On Tue, Jan 21, 2025 at 6:29 AM Arnd Bergmann <arnd@kernel.org> wrote: > > From: Arnd Bergmann <arnd@arndb.de> > > I noticed a regression in the time it takes to fully link some randconfig > kernels and bisected this to commit 0043ecea2399 ("vmlinux.lds.h: Adjust > symbol ordering in text output section"), which (among other changes) moves > .text.unlikely ahead of .text. > > Partially reverting this makes the final link over six times faster again, > back to what it was in linux-6.12: > > linux-6.12 linux-6.13 > ld.lld v20 1.2s 1.2s > ld.bfd v2.36 3.2s 5.2s > ld.bfd v2.39 59s 388s > > According to the commit description, that revert is not allowed here > because with CONFIG_LD_DEAD_CODE_DATA_ELIMINATION, the .text.unlikely > section name conflicts with the function-section names. On the other > hand, the excessive link time happens both with and without that > option, so the order could be conditional. > > I did not try to bisect the linker beyond trying multiple versions > I had installed already, and it does feel like the behavior of recent > versions (tested 2.39 and 2.42 with identical results) is broken in > some form that earlier versions were not. According to 'perf', most > of the time is spent in elf_link_adjust_relocs() and ext64l_r_offset(). Is this problem specific to the BFD linker from binutils? Did you observe a link speed regression with LLVM=1 build? -- Best Regards Masahiro Yamada
On Tue, Jan 21, 2025, at 01:19, Masahiro Yamada wrote: > On Tue, Jan 21, 2025 at 6:29 AM Arnd Bergmann <arnd@kernel.org> wrote: >> >> linux-6.12 linux-6.13 >> ld.lld v20 1.2s 1.2s >> ld.bfd v2.36 3.2s 5.2s >> ld.bfd v2.39 59s 388s >> > > Is this problem specific to the BFD linker from binutils? I only tried the bfd and lld linkers, but I assume it's limited to the bfd one. > Did you observe a link speed regression with LLVM=1 build? No, the ld.lld line above is what I see with LLVM=1, it's very fast (1.2s) both before and after the change. New ld.bfd versions were much slower before the regression for this particular config and got even slower (seven minutes for each of the three vmlinux link stages). Arnd
On Mon, Jan 20, 2025 at 11:42 PM Arnd Bergmann <arnd@arndb.de> wrote: > > On Tue, Jan 21, 2025, at 01:19, Masahiro Yamada wrote: > > On Tue, Jan 21, 2025 at 6:29 AM Arnd Bergmann <arnd@kernel.org> wrote: > >> > >> linux-6.12 linux-6.13 > >> ld.lld v20 1.2s 1.2s > >> ld.bfd v2.36 3.2s 5.2s > >> ld.bfd v2.39 59s 388s > >> > > > > Is this problem specific to the BFD linker from binutils? > > I only tried the bfd and lld linkers, but I assume it's limited > to the bfd one. > > > Did you observe a link speed regression with LLVM=1 build? > > No, the ld.lld line above is what I see with LLVM=1, it's > very fast (1.2s) both before and after the change. New > ld.bfd versions were much slower before the regression > for this particular config and got even slower (seven minutes > for each of the three vmlinux link stages). > Can you send me the instructions to reproduce? I'm seeing a significant performance drop (18.5x slow down) with ld.bfd v2.39 when linking linux-6.12 (i.e without this change) I will take a look at ld.bfd. It looks like an ld.bfd issue. -Rong > Arnd
On Mon, Jan 20, 2025 at 1:29 PM Arnd Bergmann <arnd@kernel.org> wrote: > > From: Arnd Bergmann <arnd@arndb.de> > > I noticed a regression in the time it takes to fully link some randconfig > kernels and bisected this to commit 0043ecea2399 ("vmlinux.lds.h: Adjust > symbol ordering in text output section"), which (among other changes) moves > .text.unlikely ahead of .text. > > Partially reverting this makes the final link over six times faster again, > back to what it was in linux-6.12: > > linux-6.12 linux-6.13 > ld.lld v20 1.2s 1.2s > ld.bfd v2.36 3.2s 5.2s > ld.bfd v2.39 59s 388s > > According to the commit description, that revert is not allowed here > because with CONFIG_LD_DEAD_CODE_DATA_ELIMINATION, the .text.unlikely > section name conflicts with the function-section names. On the other > hand, the excessive link time happens both with and without that > option, so the order could be conditional. Yes. The order could be conditional. As a matter of fact, the first version was conditional. I changed it based on the reviewer comments to reduce conditions for more maintainable code. I would like to work from the ld.bfd side to see if we can fix the problem. -Rong > > I did not try to bisect the linker beyond trying multiple versions > I had installed already, and it does feel like the behavior of recent > versions (tested 2.39 and 2.42 with identical results) is broken in > some form that earlier versions were not. According to 'perf', most > of the time is spent in elf_link_adjust_relocs() and ext64l_r_offset(). > > I also did not try to narrow the problem down to specific kernel > configuration options, but from my first impression it does appear > to be rare, and unrelated to the Propeller options added in 6.13. > > Cc: regressions@lists.linux.dev > Cc: Han Shen <shenhan@google.com> > Cc: Masahiro Yamada <masahiroy@kernel.org> > Cc: Nathan Chancellor <nathan@kernel.org> > Cc: Kees Cook <kees@kernel.org> > Fixes: 0043ecea2399 ("vmlinux.lds.h: Adjust symbol ordering in text output section") > Link: https://pastebin.com/raw/sWpbkapL (config) > Signed-off-by: Arnd Bergmann <arnd@arndb.de> > --- > include/asm-generic/vmlinux.lds.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h > index 54504013c749..61fa047023b5 100644 > --- a/include/asm-generic/vmlinux.lds.h > +++ b/include/asm-generic/vmlinux.lds.h > @@ -588,10 +588,10 @@ defined(CONFIG_AUTOFDO_CLANG) || defined(CONFIG_PROPELLER_CLANG) > *(.text.asan.* .text.tsan.*) \ > *(.text.unknown .text.unknown.*) \ > TEXT_SPLIT \ > - TEXT_UNLIKELY \ > . = ALIGN(PAGE_SIZE); \ > TEXT_HOT \ > *(TEXT_MAIN .text.fixup) \ > + TEXT_UNLIKELY \ > NOINSTR_TEXT \ > *(.ref.text) > > -- > 2.39.5 >
On Tue, Jan 21, 2025, at 18:45, Rong Xu wrote: > On Mon, Jan 20, 2025 at 1:29 PM Arnd Bergmann <arnd@kernel.org> wrote: > > Yes. The order could be conditional. As a matter of fact, the first > version was conditional. > I changed it based on the reviewer comments to reduce conditions for > more maintainable code. > I would like to work from the ld.bfd side to see if we can fix the problem. Makes sense. At least once we understand what makes the linker so slow and fix future versions, it should also be possible to come up with a more effective workaround for the existing linkers that suffer from it. Arnd
On Tue, Jan 21, 2025 at 1:18 PM Arnd Bergmann <arnd@arndb.de> wrote: > > On Tue, Jan 21, 2025, at 18:45, Rong Xu wrote: > > On Mon, Jan 20, 2025 at 1:29 PM Arnd Bergmann <arnd@kernel.org> wrote: > > > > Yes. The order could be conditional. As a matter of fact, the first > > version was conditional. > > I changed it based on the reviewer comments to reduce conditions for > > more maintainable code. > > I would like to work from the ld.bfd side to see if we can fix the problem. > > Makes sense. At least once we understand what makes the linker so slow > and fix future versions, it should also be possible to come up with > a more effective workaround for the existing linkers that suffer from it. @Arnd: Can you send me the instructions to reproduce this regression? I tried my x86-64 machine in v6.13 kernel with the following ld.bfd: (1) GNU ld (GNU Binutils) 2.36.90.20210703 (2) GNU ld (GNU Binutils) 2.39.90.20221231 (3) GNU ld (GNU Binutils for Debian) 2.42.50.20240625 They all used about ~2s to link vmlinux.o. The config I used was "make randconfig". Thanks, -Rong > > Arnd
On Wed, Jan 22, 2025, at 19:47, Rong Xu wrote: > On Tue, Jan 21, 2025 at 1:18 PM Arnd Bergmann <arnd@arndb.de> wrote: >> >> On Tue, Jan 21, 2025, at 18:45, Rong Xu wrote: >> > On Mon, Jan 20, 2025 at 1:29 PM Arnd Bergmann <arnd@kernel.org> wrote: >> > >> > Yes. The order could be conditional. As a matter of fact, the first >> > version was conditional. >> > I changed it based on the reviewer comments to reduce conditions for >> > more maintainable code. >> > I would like to work from the ld.bfd side to see if we can fix the problem. >> >> Makes sense. At least once we understand what makes the linker so slow >> and fix future versions, it should also be possible to come up with >> a more effective workaround for the existing linkers that suffer from it. > > @Arnd: Can you send me the instructions to reproduce this regression? My report had linked to the config that I saw originally: >> > Link: https://pastebin.com/raw/sWpbkapL (config) This is for a x86_64 build, and I used 'make savedefconfig' to simplify it, so you have to copy it to arch/x86/configs/test_defconfig and run 'make test_defconfig' to get the full file back. I have also uploaded a reproducer to https://drive.google.com/file/d/14xWdD_S51XBgV6kOajLvdtOef7tQTZQq/view?usp=sharing but it's fairly large. The reproducer is from ld.lld --reproduce=, but you can simply unpack it and do x86_64-linux-gnu-ld.bfd -m elf_x86_64 -z noexecstack --emit-relocs --discard-none -z max-page-size=0x200000 --gc-sections --build-id=sha1 --orphan-handling warn --script home/arnd/arm-soc/build/x86/0xA8B23FFD_defconfig/arch/x86/kernel/vmlinux.lds --strip-debug -o .tmp_vmlinux1 --whole-archive home/arnd/arm-soc/build/x86/0xA8B23FFD_defconfig/vmlinux.a home/arnd/arm-soc/build/x86/0xA8B23FFD_defconfig/init/version-timestamp.o --no-whole-archive --start-group home/arnd/arm-soc/build/x86/0xA8B23FFD_defconfig/lib/lib.a home/arnd/arm-soc/build/x86/0xA8B23FFD_defconfig/arch/x86/lib/lib.a --end-group home/arnd/arm-soc/build/x86/0xA8B23FFD_defconfig/.tmp_vmlinux0.kallsyms.o Arnd
On Wed, Jan 22, 2025 at 12:30 PM Arnd Bergmann <arnd@arndb.de> wrote: > > On Wed, Jan 22, 2025, at 19:47, Rong Xu wrote: > > On Tue, Jan 21, 2025 at 1:18 PM Arnd Bergmann <arnd@arndb.de> wrote: > >> > >> On Tue, Jan 21, 2025, at 18:45, Rong Xu wrote: > >> > On Mon, Jan 20, 2025 at 1:29 PM Arnd Bergmann <arnd@kernel.org> wrote: > >> > > >> > Yes. The order could be conditional. As a matter of fact, the first > >> > version was conditional. > >> > I changed it based on the reviewer comments to reduce conditions for > >> > more maintainable code. > >> > I would like to work from the ld.bfd side to see if we can fix the problem. > >> > >> Makes sense. At least once we understand what makes the linker so slow > >> and fix future versions, it should also be possible to come up with > >> a more effective workaround for the existing linkers that suffer from it. > > > > @Arnd: Can you send me the instructions to reproduce this regression? > > My report had linked to the config that I saw originally: > > >> > Link: https://pastebin.com/raw/sWpbkapL (config) > > This is for a x86_64 build, and I used 'make savedefconfig' to > simplify it, so you have to copy it to arch/x86/configs/test_defconfig > and run 'make test_defconfig' to get the full file back. > > I have also uploaded a reproducer to > https://drive.google.com/file/d/14xWdD_S51XBgV6kOajLvdtOef7tQTZQq/view?usp=sharing > but it's fairly large. The reproducer is from ld.lld --reproduce=, but > you can simply unpack it and do > > x86_64-linux-gnu-ld.bfd -m elf_x86_64 -z noexecstack --emit-relocs --discard-none -z max-page-size=0x200000 --gc-sections --build-id=sha1 --orphan-handling warn --script home/arnd/arm-soc/build/x86/0xA8B23FFD_defconfig/arch/x86/kernel/vmlinux.lds --strip-debug -o .tmp_vmlinux1 --whole-archive home/arnd/arm-soc/build/x86/0xA8B23FFD_defconfig/vmlinux.a home/arnd/arm-soc/build/x86/0xA8B23FFD_defconfig/init/version-timestamp.o --no-whole-archive --start-group home/arnd/arm-soc/build/x86/0xA8B23FFD_defconfig/lib/lib.a home/arnd/arm-soc/build/x86/0xA8B23FFD_defconfig/arch/x86/lib/lib.a --end-group home/arnd/arm-soc/build/x86/0xA8B23FFD_defconfig/.tmp_vmlinux0.kallsyms.o > > Arnd Thanks Arnd. I can reproduce the issue with your files. The slowdown was caused by the following change in elflink.c: commit fba1ac87dcb76e61f270d236f1e7c8aaec80f9c4 Author: Tomoaki Kawada <kawada@kmckk.co.jp> Date: Thu Jun 16 09:54:30 2022 +0000 Fix the sorting algorithm for reloc entries The optimized insertion sort algorithm in `elf_link_adjust_relocs` incorrectly assembled "runs" from unsorted entries and inserted them to an already-sorted prefix, breaking the loop invariants of insertion sort. This commit updates the run assembly loop to break upon encountering a non-monotonic change in the sort key. I haven't looked closely into this. The fix is most likely valid. But the code might not be efficient. -Rong
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index 54504013c749..61fa047023b5 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -588,10 +588,10 @@ defined(CONFIG_AUTOFDO_CLANG) || defined(CONFIG_PROPELLER_CLANG) *(.text.asan.* .text.tsan.*) \ *(.text.unknown .text.unknown.*) \ TEXT_SPLIT \ - TEXT_UNLIKELY \ . = ALIGN(PAGE_SIZE); \ TEXT_HOT \ *(TEXT_MAIN .text.fixup) \ + TEXT_UNLIKELY \ NOINSTR_TEXT \ *(.ref.text)