diff mbox series

[RFC,DO,NOT,APPLY] vmlinux.lds: revert link speed regression

Message ID 20250120212839.1675696-1-arnd@kernel.org (mailing list archive)
State New
Headers show
Series [RFC,DO,NOT,APPLY] vmlinux.lds: revert link speed regression | expand

Commit Message

Arnd Bergmann Jan. 20, 2025, 9:21 p.m. UTC
From: Arnd Bergmann <arnd@arndb.de>

I noticed a regression in the time it takes to fully link some randconfig
kernels and bisected this to commit 0043ecea2399 ("vmlinux.lds.h: Adjust
symbol ordering in text output section"), which (among other changes) moves
.text.unlikely ahead of .text.

Partially reverting this makes the final link over six times faster again,
back to what it was in linux-6.12:

		linux-6.12	linux-6.13
ld.lld v20	1.2s		1.2s
ld.bfd v2.36	3.2s		5.2s
ld.bfd v2.39	59s		388s

According to the commit description, that revert is not allowed here
because with CONFIG_LD_DEAD_CODE_DATA_ELIMINATION, the .text.unlikely
section name conflicts with the function-section names. On the other
hand, the excessive link time happens both with and without that
option, so the order could be conditional.

I did not try to bisect the linker beyond trying multiple versions
I had installed already, and it does feel like the behavior of recent
versions (tested 2.39 and 2.42 with identical results) is broken in
some form that earlier versions were not. According to 'perf', most
of the time is spent in elf_link_adjust_relocs() and ext64l_r_offset().

I also did not try to narrow the problem down to specific kernel
configuration options, but from my first impression it does appear
to be rare, and unrelated to the Propeller options added in 6.13.

Cc: regressions@lists.linux.dev
Cc: Han Shen <shenhan@google.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kees Cook <kees@kernel.org>
Fixes: 0043ecea2399 ("vmlinux.lds.h: Adjust symbol ordering in text output section")
Link: https://pastebin.com/raw/sWpbkapL (config)
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 include/asm-generic/vmlinux.lds.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Masahiro Yamada Jan. 21, 2025, 12:19 a.m. UTC | #1
On Tue, Jan 21, 2025 at 6:29 AM Arnd Bergmann <arnd@kernel.org> wrote:
>
> From: Arnd Bergmann <arnd@arndb.de>
>
> I noticed a regression in the time it takes to fully link some randconfig
> kernels and bisected this to commit 0043ecea2399 ("vmlinux.lds.h: Adjust
> symbol ordering in text output section"), which (among other changes) moves
> .text.unlikely ahead of .text.
>
> Partially reverting this makes the final link over six times faster again,
> back to what it was in linux-6.12:
>
>                 linux-6.12      linux-6.13
> ld.lld v20      1.2s            1.2s
> ld.bfd v2.36    3.2s            5.2s
> ld.bfd v2.39    59s             388s
>
> According to the commit description, that revert is not allowed here
> because with CONFIG_LD_DEAD_CODE_DATA_ELIMINATION, the .text.unlikely
> section name conflicts with the function-section names. On the other
> hand, the excessive link time happens both with and without that
> option, so the order could be conditional.
>
> I did not try to bisect the linker beyond trying multiple versions
> I had installed already, and it does feel like the behavior of recent
> versions (tested 2.39 and 2.42 with identical results) is broken in
> some form that earlier versions were not. According to 'perf', most
> of the time is spent in elf_link_adjust_relocs() and ext64l_r_offset().

Is this problem specific to the BFD linker from binutils?

Did you observe a link speed regression with LLVM=1 build?





--
Best Regards
Masahiro Yamada
Arnd Bergmann Jan. 21, 2025, 7:41 a.m. UTC | #2
On Tue, Jan 21, 2025, at 01:19, Masahiro Yamada wrote:
> On Tue, Jan 21, 2025 at 6:29 AM Arnd Bergmann <arnd@kernel.org> wrote:
>>
>>                 linux-6.12      linux-6.13
>> ld.lld v20      1.2s            1.2s
>> ld.bfd v2.36    3.2s            5.2s
>> ld.bfd v2.39    59s             388s
>>
>
> Is this problem specific to the BFD linker from binutils?

I only tried the bfd and lld linkers, but I assume it's limited
to the bfd one.

> Did you observe a link speed regression with LLVM=1 build?

No, the ld.lld line above is what I see with LLVM=1, it's
very fast (1.2s) both before and after the change. New
ld.bfd versions were much slower before the regression
for this particular config and got even slower (seven minutes
for each of the three vmlinux link stages).

      Arnd
Rong Xu Jan. 21, 2025, 5:36 p.m. UTC | #3
On Mon, Jan 20, 2025 at 11:42 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Tue, Jan 21, 2025, at 01:19, Masahiro Yamada wrote:
> > On Tue, Jan 21, 2025 at 6:29 AM Arnd Bergmann <arnd@kernel.org> wrote:
> >>
> >>                 linux-6.12      linux-6.13
> >> ld.lld v20      1.2s            1.2s
> >> ld.bfd v2.36    3.2s            5.2s
> >> ld.bfd v2.39    59s             388s
> >>
> >
> > Is this problem specific to the BFD linker from binutils?
>
> I only tried the bfd and lld linkers, but I assume it's limited
> to the bfd one.
>
> > Did you observe a link speed regression with LLVM=1 build?
>
> No, the ld.lld line above is what I see with LLVM=1, it's
> very fast (1.2s) both before and after the change. New
> ld.bfd versions were much slower before the regression
> for this particular config and got even slower (seven minutes
> for each of the three vmlinux link stages).
>

Can you send me the instructions to reproduce?
I'm seeing a significant performance drop (18.5x slow down) with
ld.bfd v2.39 when
linking linux-6.12 (i.e without this change)

I will take a look at ld.bfd. It looks like an ld.bfd issue.

-Rong

>       Arnd
Rong Xu Jan. 21, 2025, 5:45 p.m. UTC | #4
On Mon, Jan 20, 2025 at 1:29 PM Arnd Bergmann <arnd@kernel.org> wrote:
>
> From: Arnd Bergmann <arnd@arndb.de>
>
> I noticed a regression in the time it takes to fully link some randconfig
> kernels and bisected this to commit 0043ecea2399 ("vmlinux.lds.h: Adjust
> symbol ordering in text output section"), which (among other changes) moves
> .text.unlikely ahead of .text.
>
> Partially reverting this makes the final link over six times faster again,
> back to what it was in linux-6.12:
>
>                 linux-6.12      linux-6.13
> ld.lld v20      1.2s            1.2s
> ld.bfd v2.36    3.2s            5.2s
> ld.bfd v2.39    59s             388s
>
> According to the commit description, that revert is not allowed here
> because with CONFIG_LD_DEAD_CODE_DATA_ELIMINATION, the .text.unlikely
> section name conflicts with the function-section names. On the other
> hand, the excessive link time happens both with and without that
> option, so the order could be conditional.

Yes. The order could be conditional. As a matter of fact, the first
version was conditional.
I changed it based on the reviewer comments to reduce conditions for
more maintainable code.
I would like to work from the ld.bfd side to see if we can fix the problem.

-Rong

>
> I did not try to bisect the linker beyond trying multiple versions
> I had installed already, and it does feel like the behavior of recent
> versions (tested 2.39 and 2.42 with identical results) is broken in
> some form that earlier versions were not. According to 'perf', most
> of the time is spent in elf_link_adjust_relocs() and ext64l_r_offset().
>
> I also did not try to narrow the problem down to specific kernel
> configuration options, but from my first impression it does appear
> to be rare, and unrelated to the Propeller options added in 6.13.
>
> Cc: regressions@lists.linux.dev
> Cc: Han Shen <shenhan@google.com>
> Cc: Masahiro Yamada <masahiroy@kernel.org>
> Cc: Nathan Chancellor <nathan@kernel.org>
> Cc: Kees Cook <kees@kernel.org>
> Fixes: 0043ecea2399 ("vmlinux.lds.h: Adjust symbol ordering in text output section")
> Link: https://pastebin.com/raw/sWpbkapL (config)
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  include/asm-generic/vmlinux.lds.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index 54504013c749..61fa047023b5 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -588,10 +588,10 @@ defined(CONFIG_AUTOFDO_CLANG) || defined(CONFIG_PROPELLER_CLANG)
>                 *(.text.asan.* .text.tsan.*)                            \
>                 *(.text.unknown .text.unknown.*)                        \
>                 TEXT_SPLIT                                              \
> -               TEXT_UNLIKELY                                           \
>                 . = ALIGN(PAGE_SIZE);                                   \
>                 TEXT_HOT                                                \
>                 *(TEXT_MAIN .text.fixup)                                \
> +               TEXT_UNLIKELY                                           \
>                 NOINSTR_TEXT                                            \
>                 *(.ref.text)
>
> --
> 2.39.5
>
Arnd Bergmann Jan. 21, 2025, 9:17 p.m. UTC | #5
On Tue, Jan 21, 2025, at 18:45, Rong Xu wrote:
> On Mon, Jan 20, 2025 at 1:29 PM Arnd Bergmann <arnd@kernel.org> wrote:
>
> Yes. The order could be conditional. As a matter of fact, the first
> version was conditional.
> I changed it based on the reviewer comments to reduce conditions for
> more maintainable code.
> I would like to work from the ld.bfd side to see if we can fix the problem.

Makes sense. At least once we understand what makes the linker so slow
and fix future versions, it should also be possible to come up with
a more effective workaround for the existing linkers that suffer from it.

     Arnd
Rong Xu Jan. 22, 2025, 6:47 p.m. UTC | #6
On Tue, Jan 21, 2025 at 1:18 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Tue, Jan 21, 2025, at 18:45, Rong Xu wrote:
> > On Mon, Jan 20, 2025 at 1:29 PM Arnd Bergmann <arnd@kernel.org> wrote:
> >
> > Yes. The order could be conditional. As a matter of fact, the first
> > version was conditional.
> > I changed it based on the reviewer comments to reduce conditions for
> > more maintainable code.
> > I would like to work from the ld.bfd side to see if we can fix the problem.
>
> Makes sense. At least once we understand what makes the linker so slow
> and fix future versions, it should also be possible to come up with
> a more effective workaround for the existing linkers that suffer from it.

@Arnd: Can you send me the instructions to reproduce this regression?

I tried my x86-64 machine in v6.13 kernel with the following ld.bfd:
(1) GNU ld (GNU Binutils) 2.36.90.20210703
(2) GNU ld (GNU Binutils) 2.39.90.20221231
(3) GNU ld (GNU Binutils for Debian) 2.42.50.20240625
They all used about ~2s to link vmlinux.o.

The config I used was "make randconfig".

Thanks,

-Rong

>
>      Arnd
Arnd Bergmann Jan. 22, 2025, 8:29 p.m. UTC | #7
On Wed, Jan 22, 2025, at 19:47, Rong Xu wrote:
> On Tue, Jan 21, 2025 at 1:18 PM Arnd Bergmann <arnd@arndb.de> wrote:
>>
>> On Tue, Jan 21, 2025, at 18:45, Rong Xu wrote:
>> > On Mon, Jan 20, 2025 at 1:29 PM Arnd Bergmann <arnd@kernel.org> wrote:
>> >
>> > Yes. The order could be conditional. As a matter of fact, the first
>> > version was conditional.
>> > I changed it based on the reviewer comments to reduce conditions for
>> > more maintainable code.
>> > I would like to work from the ld.bfd side to see if we can fix the problem.
>>
>> Makes sense. At least once we understand what makes the linker so slow
>> and fix future versions, it should also be possible to come up with
>> a more effective workaround for the existing linkers that suffer from it.
>
> @Arnd: Can you send me the instructions to reproduce this regression?

My report had linked to the config that I saw originally:

>> > Link: https://pastebin.com/raw/sWpbkapL (config)

This is for a x86_64 build, and I used 'make savedefconfig' to
simplify it, so you have to copy it to arch/x86/configs/test_defconfig
and run 'make test_defconfig' to get the full file back.

I have also uploaded a reproducer to
https://drive.google.com/file/d/14xWdD_S51XBgV6kOajLvdtOef7tQTZQq/view?usp=sharing
but it's fairly large. The reproducer is from ld.lld --reproduce=, but
you can simply unpack it and do

x86_64-linux-gnu-ld.bfd  -m elf_x86_64 -z noexecstack --emit-relocs --discard-none -z max-page-size=0x200000 --gc-sections --build-id=sha1 --orphan-handling warn --script home/arnd/arm-soc/build/x86/0xA8B23FFD_defconfig/arch/x86/kernel/vmlinux.lds --strip-debug -o .tmp_vmlinux1 --whole-archive home/arnd/arm-soc/build/x86/0xA8B23FFD_defconfig/vmlinux.a home/arnd/arm-soc/build/x86/0xA8B23FFD_defconfig/init/version-timestamp.o --no-whole-archive --start-group home/arnd/arm-soc/build/x86/0xA8B23FFD_defconfig/lib/lib.a home/arnd/arm-soc/build/x86/0xA8B23FFD_defconfig/arch/x86/lib/lib.a --end-group home/arnd/arm-soc/build/x86/0xA8B23FFD_defconfig/.tmp_vmlinux0.kallsyms.o

    Arnd
Rong Xu Jan. 22, 2025, 10:37 p.m. UTC | #8
On Wed, Jan 22, 2025 at 12:30 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Wed, Jan 22, 2025, at 19:47, Rong Xu wrote:
> > On Tue, Jan 21, 2025 at 1:18 PM Arnd Bergmann <arnd@arndb.de> wrote:
> >>
> >> On Tue, Jan 21, 2025, at 18:45, Rong Xu wrote:
> >> > On Mon, Jan 20, 2025 at 1:29 PM Arnd Bergmann <arnd@kernel.org> wrote:
> >> >
> >> > Yes. The order could be conditional. As a matter of fact, the first
> >> > version was conditional.
> >> > I changed it based on the reviewer comments to reduce conditions for
> >> > more maintainable code.
> >> > I would like to work from the ld.bfd side to see if we can fix the problem.
> >>
> >> Makes sense. At least once we understand what makes the linker so slow
> >> and fix future versions, it should also be possible to come up with
> >> a more effective workaround for the existing linkers that suffer from it.
> >
> > @Arnd: Can you send me the instructions to reproduce this regression?
>
> My report had linked to the config that I saw originally:
>
> >> > Link: https://pastebin.com/raw/sWpbkapL (config)
>
> This is for a x86_64 build, and I used 'make savedefconfig' to
> simplify it, so you have to copy it to arch/x86/configs/test_defconfig
> and run 'make test_defconfig' to get the full file back.
>
> I have also uploaded a reproducer to
> https://drive.google.com/file/d/14xWdD_S51XBgV6kOajLvdtOef7tQTZQq/view?usp=sharing
> but it's fairly large. The reproducer is from ld.lld --reproduce=, but
> you can simply unpack it and do
>
> x86_64-linux-gnu-ld.bfd  -m elf_x86_64 -z noexecstack --emit-relocs --discard-none -z max-page-size=0x200000 --gc-sections --build-id=sha1 --orphan-handling warn --script home/arnd/arm-soc/build/x86/0xA8B23FFD_defconfig/arch/x86/kernel/vmlinux.lds --strip-debug -o .tmp_vmlinux1 --whole-archive home/arnd/arm-soc/build/x86/0xA8B23FFD_defconfig/vmlinux.a home/arnd/arm-soc/build/x86/0xA8B23FFD_defconfig/init/version-timestamp.o --no-whole-archive --start-group home/arnd/arm-soc/build/x86/0xA8B23FFD_defconfig/lib/lib.a home/arnd/arm-soc/build/x86/0xA8B23FFD_defconfig/arch/x86/lib/lib.a --end-group home/arnd/arm-soc/build/x86/0xA8B23FFD_defconfig/.tmp_vmlinux0.kallsyms.o
>
>     Arnd

Thanks Arnd. I can reproduce the issue with your files.

The slowdown was caused by the following change in elflink.c:

commit fba1ac87dcb76e61f270d236f1e7c8aaec80f9c4
Author: Tomoaki Kawada <kawada@kmckk.co.jp>
Date:   Thu Jun 16 09:54:30 2022 +0000

    Fix the sorting algorithm for reloc entries

    The optimized insertion sort algorithm in `elf_link_adjust_relocs`
    incorrectly assembled "runs" from unsorted entries and inserted them to an
    already-sorted prefix, breaking the loop invariants of insertion sort.
    This commit updates the run assembly loop to break upon encountering a
    non-monotonic change in the sort key.

I haven't looked closely into this. The fix is most likely valid. But
the code might not be efficient.

-Rong
diff mbox series

Patch

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 54504013c749..61fa047023b5 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -588,10 +588,10 @@  defined(CONFIG_AUTOFDO_CLANG) || defined(CONFIG_PROPELLER_CLANG)
 		*(.text.asan.* .text.tsan.*)				\
 		*(.text.unknown .text.unknown.*)			\
 		TEXT_SPLIT						\
-		TEXT_UNLIKELY						\
 		. = ALIGN(PAGE_SIZE);					\
 		TEXT_HOT						\
 		*(TEXT_MAIN .text.fixup)				\
+		TEXT_UNLIKELY						\
 		NOINSTR_TEXT						\
 		*(.ref.text)