diff mbox series

BUG Report: Some issues about vmlinux with emit-relocs

Message ID 20230808085438.3445957-1-suagrfillet@gmail.com (mailing list archive)
State New, archived
Headers show
Series BUG Report: Some issues about vmlinux with emit-relocs | expand

Checks

Context Check Description
conchuod/cover_letter success Single patches do not need cover letters
conchuod/tree_selection success Guessed tree name to be for-next at HEAD 174e8ac0272d
conchuod/fixes_present success Fixes tag not required for -next series
conchuod/maintainers_pattern success MAINTAINERS pattern errors before the patch: 4 and now 4
conchuod/verify_signedoff fail author Signed-off-by missing
conchuod/kdoc success Errors and warnings before: 0 this patch: 0
conchuod/build_rv64_clang_allmodconfig success Errors and warnings before: 9 this patch: 9
conchuod/module_param success Was 0 now: 0
conchuod/build_rv64_gcc_allmodconfig success Errors and warnings before: 9 this patch: 9
conchuod/build_rv32_defconfig success Build OK
conchuod/dtb_warn_rv64 success Errors and warnings before: 12 this patch: 12
conchuod/header_inline success No static functions without inline keyword in header files
conchuod/checkpatch warning WARNING: Unknown commit id 'c12d9fa2afe', maybe rebased or not pulled? WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
conchuod/build_rv64_nommu_k210_defconfig success Build OK
conchuod/verify_fixes success No Fixes tag
conchuod/build_rv64_nommu_virt_defconfig success Build OK

Commit Message

Song Shuai Aug. 8, 2023, 8:54 a.m. UTC
Hi, everyone:

I encountered some issues when testing CONFIG_RELOCATABLE.

The story starts with the issue "the empty relocations in .rela.dyn"
mentioned in patch 1 of "Introduce 64b relocatable kernel" thread [1].
And it had been circumvented by ld `--emit-relocs` option in patch 6.

With the `emit-relocs` option enabled, the vmlinux would grow bigger,
so vmlinux.relocs was created as a backup to generate the Image file
and vmlinux objcopyed itself to check/strip all .rela* sections.

Not sure there is a solution to fix the "empty relocaions" issue
and get rid of the `emit-relocs` option.
Until that, there are some other issues with vmlinux's `emit-relocs` option:

1. result of `remove-section` varies with different version GNU-objcopy

- the sections of vmlinux with objcopy 2.31.1

riscv64-linux-gnu-readelf -SW 00_rv_test/vmlinux |grep rel
  [10] .rela.dyn         RELA            ffffffff80c26138 913138 2186a0 18   A  9   0  8
  [15] .rela__ksymtab    RELA            0000000000000000 11529760 052dd0 18   I 49  14  8
  [17] .rela__ksymtab_gpl RELA            0000000000000000 1157c530 05eba8 18   I 49  16  8
  [20] .rela__param      RELA            0000000000000000 115db0d8 005400 18   I 49  19  8
  [22] .rela__modver     RELA            0000000000000000 115e04d8 000300 18   I 49  21  8
  [24] .rela__ex_table   RELA            0000000000000000 115e07d8 00cde0 18   I 49  23  8
  [29] .rela__bug_table  RELA            0000000000000000 115ed5b8 0bd1e0 18   I 49  28  8
  [32] .data.rel         PROGBITS        ffffffff816a5940 e21940 0d0df0 00  WA  0   0 64

- the sections of vmlinux with objcopy 2.38

riscv64-linux-gnu-readelf -SW 00_rv_newtool/vmlinux | grep rel


  [25] .data.rel         PROGBITS        ffffffff816a6340 f77340 0d0cb0 00  WA  0   0 64

The difference comes from binutils's commit c12d9fa2afe ("Support
objcopy --remove-section=.relaFOO").
The option `--remove-setions='.rela__'` wasn't supported before
binutils/objcopy 2.32, so all of '.rela__' RELA sections were kept.
And '.rela.dyn' section was kept due to the mismatch between the
stripped 'dyn' section_pattern and the actual '.dynamic' section name.

Should we kill the '.rela.dyn' section from the vmlinux ?
IMO, some senses (like, kexec/kdump) will load/run vmlinux directly that
needs this allocatable section.
And from my kexec/kdump test with vmlinux, the 2nd kernel could start
with '.rela.dyn' but failed if no '.rela.dyn'.

How about keeping '.rela.dyn' section in vmlinux and
making `remove-section` consistent with different version GNU-objcopy ?


2. the stripped vmlinux has huge symtab

I found a similar issue[2] about the huge symtab of kernel modules.

The aggressive link-time relaxations of RISC-V need sufficient
relocation info and local symbols to rewrite the code at link time.
That would result in a lot of extra symbols and relocations.

Kernel modules are compiled `-mno-relax`. But the toolchain still needed
to improve to emit fewer things under `-mno-relax`. Util that, stripping
ko with `--discard-all` or `--discard-locals` would be an option to
reduce the symtab size. (the Ubuntu fixing patch [3])

While vmlinux now uses `emit-relocs` option that would aggravate the
symtab size.
(It would take a long long time to start when using the current vmlinux
as the Crash's namelist. Crash is busy in symtab_init() function.)

So how about objcopying vmlinux with `--discard-locals` option to
reduce the symtab size ?

(And how about adopting the Ubuntu patch into riscv kernel tree? )


3. suspicious relocations in vmlinux

The vmlinux has some suspicious R_RISCV_NONE/R_RISCV_64 relocations
emitted with the `emit-relocs` option, that would be detected by
`tools/reloc_check.sh` and flush the console when making vmlinux.

riscv64-linux-gnu-objdump -R ./00_rv_newtool/vmlinux | grep -E '\<R_' | awk '{print $2}' | sort | uniq -c
      6 R_RISCV_64
  20201 R_RISCV_NONE
  71307 R_RISCV_RELATIVE

Is there a way/tool to get rid of these relocations from vmlinux,
or temporarily silence the echo of those bad relocations?


[1]: https://lore.kernel.org/all/20230329045329.64565-1-alexghiti@rivosinc.com/
[2]: https://github.com/riscv-collab/riscv-gnu-toolchain/issues/1036
[3]: https://patchwork.ozlabs.org/project/ubuntu-kernel/patch/20220309161622.124754-1-dimitri.ledkov@canonical.com/#2855027

Finally, you can try this temporary git-diff for issue1 and issue2:

Comments

Palmer Dabbelt Sept. 13, 2023, 2:30 p.m. UTC | #1
On Tue, 08 Aug 2023 01:54:38 PDT (-0700), suagrfillet@gmail.com wrote:
> Hi, everyone:
>
> I encountered some issues when testing CONFIG_RELOCATABLE.
>
> The story starts with the issue "the empty relocations in .rela.dyn"
> mentioned in patch 1 of "Introduce 64b relocatable kernel" thread [1].
> And it had been circumvented by ld `--emit-relocs` option in patch 6.
>
> With the `emit-relocs` option enabled, the vmlinux would grow bigger,
> so vmlinux.relocs was created as a backup to generate the Image file
> and vmlinux objcopyed itself to check/strip all .rela* sections.
>
> Not sure there is a solution to fix the "empty relocaions" issue
> and get rid of the `emit-relocs` option.
> Until that, there are some other issues with vmlinux's `emit-relocs` option:
>
> 1. result of `remove-section` varies with different version GNU-objcopy
>
> - the sections of vmlinux with objcopy 2.31.1
>
> riscv64-linux-gnu-readelf -SW 00_rv_test/vmlinux |grep rel
>   [10] .rela.dyn         RELA            ffffffff80c26138 913138 2186a0 18   A  9   0  8
>   [15] .rela__ksymtab    RELA            0000000000000000 11529760 052dd0 18   I 49  14  8
>   [17] .rela__ksymtab_gpl RELA            0000000000000000 1157c530 05eba8 18   I 49  16  8
>   [20] .rela__param      RELA            0000000000000000 115db0d8 005400 18   I 49  19  8
>   [22] .rela__modver     RELA            0000000000000000 115e04d8 000300 18   I 49  21  8
>   [24] .rela__ex_table   RELA            0000000000000000 115e07d8 00cde0 18   I 49  23  8
>   [29] .rela__bug_table  RELA            0000000000000000 115ed5b8 0bd1e0 18   I 49  28  8
>   [32] .data.rel         PROGBITS        ffffffff816a5940 e21940 0d0df0 00  WA  0   0 64
>
> - the sections of vmlinux with objcopy 2.38
>
> riscv64-linux-gnu-readelf -SW 00_rv_newtool/vmlinux | grep rel
>
>
>   [25] .data.rel         PROGBITS        ffffffff816a6340 f77340 0d0cb0 00  WA  0   0 64
>
> The difference comes from binutils's commit c12d9fa2afe ("Support
> objcopy --remove-section=.relaFOO").
> The option `--remove-setions='.rela__'` wasn't supported before
> binutils/objcopy 2.32, so all of '.rela__' RELA sections were kept.
> And '.rela.dyn' section was kept due to the mismatch between the
> stripped 'dyn' section_pattern and the actual '.dynamic' section name.
>
> Should we kill the '.rela.dyn' section from the vmlinux ?
> IMO, some senses (like, kexec/kdump) will load/run vmlinux directly that
> needs this allocatable section.
> And from my kexec/kdump test with vmlinux, the 2nd kernel could start
> with '.rela.dyn' but failed if no '.rela.dyn'.
>
> How about keeping '.rela.dyn' section in vmlinux and
> making `remove-section` consistent with different version GNU-objcopy ?
>
>
> 2. the stripped vmlinux has huge symtab
>
> I found a similar issue[2] about the huge symtab of kernel modules.
>
> The aggressive link-time relaxations of RISC-V need sufficient
> relocation info and local symbols to rewrite the code at link time.
> That would result in a lot of extra symbols and relocations.
>
> Kernel modules are compiled `-mno-relax`. But the toolchain still needed
> to improve to emit fewer things under `-mno-relax`. Util that, stripping
> ko with `--discard-all` or `--discard-locals` would be an option to
> reduce the symtab size. (the Ubuntu fixing patch [3])
>
> While vmlinux now uses `emit-relocs` option that would aggravate the
> symtab size.
> (It would take a long long time to start when using the current vmlinux
> as the Crash's namelist. Crash is busy in symtab_init() function.)
>
> So how about objcopying vmlinux with `--discard-locals` option to
> reduce the symtab size ?
>
> (And how about adopting the Ubuntu patch into riscv kernel tree? )
>
>
> 3. suspicious relocations in vmlinux
>
> The vmlinux has some suspicious R_RISCV_NONE/R_RISCV_64 relocations
> emitted with the `emit-relocs` option, that would be detected by
> `tools/reloc_check.sh` and flush the console when making vmlinux.
>
> riscv64-linux-gnu-objdump -R ./00_rv_newtool/vmlinux | grep -E '\<R_' | awk '{print $2}' | sort | uniq -c
>       6 R_RISCV_64
>   20201 R_RISCV_NONE

Having R_RISCV_NONE in any binary is a bug, it's not a real relocation 
just an internal binutils thing.  There's another bug on sourceware 
about it, we're not quite sure what the right answer is yet:

https://sourceware.org/bugzilla/show_bug.cgi?id=30844

>   71307 R_RISCV_RELATIVE
>
> Is there a way/tool to get rid of these relocations from vmlinux,
> or temporarily silence the echo of those bad relocations?
>
>
> [1]: https://lore.kernel.org/all/20230329045329.64565-1-alexghiti@rivosinc.com/
> [2]: https://github.com/riscv-collab/riscv-gnu-toolchain/issues/1036
> [3]: https://patchwork.ozlabs.org/project/ubuntu-kernel/patch/20220309161622.124754-1-dimitri.ledkov@canonical.com/#2855027
>
> Finally, you can try this temporary git-diff for issue1 and issue2:
>
> diff --git a/arch/riscv/Makefile.postlink b/arch/riscv/Makefile.postlink
> index a46fc578b30b..3324c3ede9c6 100644
> --- a/arch/riscv/Makefile.postlink
> +++ b/arch/riscv/Makefile.postlink
> @@ -20,10 +20,10 @@ quiet_cmd_cp_vmlinux_relocs = CPREL   vmlinux.relocs
>  cmd_cp_vmlinux_relocs = cp vmlinux vmlinux.relocs
>
>  quiet_cmd_relocs_strip = STRIPREL $@
> -cmd_relocs_strip = $(OBJCOPY)   --remove-section='.rel.*'       \
> -                                --remove-section='.rel__*'      \
> -                                --remove-section='.rela.*'      \
> -                                --remove-section='.rela__*' $@
> +cmd_relocs_strip = 							\
> +	$(CONFIG_SHELL) $(srctree)/arch/riscv/tools/relocs_strip.sh	\
> +				  "$(OBJCOPY)" "$(CONFIG_LD_VERSION)" "$@"
> +
>  endif
>
>  # `@true` prevents complaint when there is nothing to be done
> diff --git a/arch/riscv/tools/relocs_strip.sh b/arch/riscv/tools/relocs_strip.sh
> new file mode 100755
> index 000000000000..20cb69cf041b
> --- /dev/null
> +++ b/arch/riscv/tools/relocs_strip.sh
> @@ -0,0 +1,34 @@
> +#!/bin/sh
> +# SPDX-License-Identifier: GPL-2.0-or-later
> +
> +# This script strips and the relocations and the
> +# compiler-generated local symbols of a vmlinux.
> +
> +if [ $# -lt 3 ]; then
> +        echo "$0 [path to objcopy] [version of GNU objcopy] [path to vmlinux]" 1>&2
> +        exit 1
> +fi
> +
> +objcopy="$1"
> +objcopy_version="$2"
> +vmlinux="$3"
> +
> +# binutils/objcopy didn't support '--remove-setions='.rela__'' option util 2.32,
> +# use `--remove-relocations` to remove those RELA sections.
> +
> +if [ 0 -lt $objcopy_version -a 23200 -gt $objcopy_version ]; then
> +
> +	$objcopy --remove-section='.rel.*'	\
> +		 --remove-section='.rela.*'	\
> +		 --remove-relocations='__*' $vmlinux
> +else
> +	$objcopy --keep-section='.rela.dyn'	\
> +		 --remove-section='.rel.*'	\
> +		 --remove-section='.rel__*'	\
> +		 --remove-section='.rela.*'	\
> +		 --remove-section='.rela__*' $vmlinux
> +fi
> +
> +# discard the compiler-generated local symbols of vmlinux
> +
> +$objcopy --discard-locals $vmlinux
Andreas Schwab Sept. 13, 2023, 2:47 p.m. UTC | #2
On Sep 13 2023, Palmer Dabbelt wrote:

> Having R_RISCV_NONE in any binary is a bug, it's not a real relocation
> just an internal binutils thing.

That's not true.  It's an official part of the ELF psABI.
Palmer Dabbelt Jan. 23, 2024, 2:01 a.m. UTC | #3
On Wed, 13 Sep 2023 07:47:28 PDT (-0700), schwab@suse.de wrote:
> On Sep 13 2023, Palmer Dabbelt wrote:
>
>> Having R_RISCV_NONE in any binary is a bug, it's not a real relocation
>> just an internal binutils thing.
>
> That's not true.  It's an official part of the ELF psABI.

Ya, sorry, I got it mixed up with R_RISCV_DELETE.
diff mbox series

Patch

diff --git a/arch/riscv/Makefile.postlink b/arch/riscv/Makefile.postlink
index a46fc578b30b..3324c3ede9c6 100644
--- a/arch/riscv/Makefile.postlink
+++ b/arch/riscv/Makefile.postlink
@@ -20,10 +20,10 @@  quiet_cmd_cp_vmlinux_relocs = CPREL   vmlinux.relocs
 cmd_cp_vmlinux_relocs = cp vmlinux vmlinux.relocs
 
 quiet_cmd_relocs_strip = STRIPREL $@
-cmd_relocs_strip = $(OBJCOPY)   --remove-section='.rel.*'       \
-                                --remove-section='.rel__*'      \
-                                --remove-section='.rela.*'      \
-                                --remove-section='.rela__*' $@
+cmd_relocs_strip = 							\
+	$(CONFIG_SHELL) $(srctree)/arch/riscv/tools/relocs_strip.sh	\
+				  "$(OBJCOPY)" "$(CONFIG_LD_VERSION)" "$@"
+
 endif
 
 # `@true` prevents complaint when there is nothing to be done
diff --git a/arch/riscv/tools/relocs_strip.sh b/arch/riscv/tools/relocs_strip.sh
new file mode 100755
index 000000000000..20cb69cf041b
--- /dev/null
+++ b/arch/riscv/tools/relocs_strip.sh
@@ -0,0 +1,34 @@ 
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0-or-later
+
+# This script strips and the relocations and the
+# compiler-generated local symbols of a vmlinux.
+
+if [ $# -lt 3 ]; then
+        echo "$0 [path to objcopy] [version of GNU objcopy] [path to vmlinux]" 1>&2
+        exit 1
+fi
+
+objcopy="$1"
+objcopy_version="$2"
+vmlinux="$3"
+
+# binutils/objcopy didn't support '--remove-setions='.rela__'' option util 2.32,
+# use `--remove-relocations` to remove those RELA sections.
+
+if [ 0 -lt $objcopy_version -a 23200 -gt $objcopy_version ]; then
+
+	$objcopy --remove-section='.rel.*'	\
+		 --remove-section='.rela.*'	\
+		 --remove-relocations='__*' $vmlinux
+else
+	$objcopy --keep-section='.rela.dyn'	\
+		 --remove-section='.rel.*'	\
+		 --remove-section='.rel__*'	\
+		 --remove-section='.rela.*'	\
+		 --remove-section='.rela__*' $vmlinux
+fi
+
+# discard the compiler-generated local symbols of vmlinux
+
+$objcopy --discard-locals $vmlinux