diff mbox series

[v4,1/1] riscv: optimize ELF relocation function in riscv

Message ID 20230913130501.287250-1-fido_max@inbox.ru (mailing list archive)
State Superseded
Headers show
Series [v4,1/1] riscv: optimize ELF relocation function in riscv | expand

Checks

Context Check Description
conchuod/cover_letter success Single patches do not need cover letters
conchuod/tree_selection success Guessed tree name to be for-next at HEAD 0bb80ecc33a8
conchuod/fixes_present success Fixes tag not required for -next series
conchuod/maintainers_pattern success MAINTAINERS pattern errors before the patch: 5 and now 5
conchuod/verify_signedoff success Signed-off-by tag matches author and committer
conchuod/kdoc success Errors and warnings before: 0 this patch: 0
conchuod/build_rv64_clang_allmodconfig success Errors and warnings before: 12 this patch: 12
conchuod/module_param success Was 0 now: 0
conchuod/build_rv64_gcc_allmodconfig success Errors and warnings before: 13 this patch: 13
conchuod/build_rv32_defconfig success Build OK
conchuod/dtb_warn_rv64 success Errors and warnings before: 25 this patch: 25
conchuod/header_inline success No static functions without inline keyword in header files
conchuod/checkpatch success total: 0 errors, 0 warnings, 0 checks, 47 lines checked
conchuod/build_rv64_nommu_k210_defconfig success Build OK
conchuod/verify_fixes success No Fixes tag
conchuod/build_rv64_nommu_virt_defconfig success Build OK
conchuod/patch-1-test-13 success .github/scripts/patches/verify_signedoff.sh
conchuod/vmtest-for-next-PR success PR summary
conchuod/patch-1-test-1 success .github/scripts/patches/build_rv32_defconfig.sh
conchuod/patch-1-test-2 success .github/scripts/patches/build_rv64_clang_allmodconfig.sh
conchuod/patch-1-test-3 success .github/scripts/patches/build_rv64_gcc_allmodconfig.sh
conchuod/patch-1-test-4 success .github/scripts/patches/build_rv64_nommu_k210_defconfig.sh
conchuod/patch-1-test-5 success .github/scripts/patches/build_rv64_nommu_virt_defconfig.sh
conchuod/patch-1-test-6 success .github/scripts/patches/checkpatch.sh
conchuod/patch-1-test-7 success .github/scripts/patches/dtb_warn_rv64.sh
conchuod/patch-1-test-8 success .github/scripts/patches/header_inline.sh
conchuod/patch-1-test-9 success .github/scripts/patches/kdoc.sh
conchuod/patch-1-test-10 success .github/scripts/patches/module_param.sh
conchuod/patch-1-test-11 success .github/scripts/patches/verify_fixes.sh
conchuod/patch-1-test-12 success .github/scripts/patches/verify_signedoff.sh

Commit Message

Maxim Kochetkov Sept. 13, 2023, 1:05 p.m. UTC
The patch can optimize the running times of insmod command by modify ELF
relocation function.
In the 5.10 and latest kernel, when install the riscv ELF drivers which
contains multiple symbol table items to be relocated, kernel takes a lot
of time to execute the relocation. For example, we install a 3+MB driver
need 180+s.
We focus on the riscv architecture handle R_RISCV_HI20 and R_RISCV_LO20
type items relocation function in the arch\riscv\kernel\module.c and
find that there are two-loops in the function. If we modify the begin
number in the second for-loops iteration, we could save significant time
for installation. We install the same 3+MB driver could just need 2s.

Signed-off-by: Amma Lee <lixiaoyun@binary-semi.com>
Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru>
---
Changes in v4:
- use 'while' loop instead of 'for' loop to avoid code duplicate
---
 arch/riscv/kernel/module.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

Comments

Charlie Jenkins Dec. 8, 2023, 1:02 a.m. UTC | #1
On Wed, Sep 13, 2023 at 04:05:00PM +0300, Maxim Kochetkov wrote:
> The patch can optimize the running times of insmod command by modify ELF
> relocation function.
> In the 5.10 and latest kernel, when install the riscv ELF drivers which
> contains multiple symbol table items to be relocated, kernel takes a lot
> of time to execute the relocation. For example, we install a 3+MB driver
> need 180+s.
> We focus on the riscv architecture handle R_RISCV_HI20 and R_RISCV_LO20
> type items relocation function in the arch\riscv\kernel\module.c and
> find that there are two-loops in the function. If we modify the begin
> number in the second for-loops iteration, we could save significant time
> for installation. We install the same 3+MB driver could just need 2s.
> 
> Signed-off-by: Amma Lee <lixiaoyun@binary-semi.com>
> Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru>
> ---
> Changes in v4:
> - use 'while' loop instead of 'for' loop to avoid code duplicate
> ---
>  arch/riscv/kernel/module.c | 20 ++++++++++++++++----
>  1 file changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/riscv/kernel/module.c b/arch/riscv/kernel/module.c
> index 7c651d55fcbd..8c9b644ebfdb 100644
> --- a/arch/riscv/kernel/module.c
> +++ b/arch/riscv/kernel/module.c
> @@ -346,6 +346,7 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
>  	Elf_Sym *sym;
>  	u32 *location;
>  	unsigned int i, type;
> +	unsigned int j_idx = 0;
>  	Elf_Addr v;
>  	int res;
>  
> @@ -384,9 +385,10 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
>  		v = sym->st_value + rel[i].r_addend;
>  
>  		if (type == R_RISCV_PCREL_LO12_I || type == R_RISCV_PCREL_LO12_S) {
> -			unsigned int j;
> +			unsigned int j = j_idx;
> +			bool found = false;
>  
> -			for (j = 0; j < sechdrs[relsec].sh_size / sizeof(*rel); j++) {
> +			do {
>  				unsigned long hi20_loc =
>  					sechdrs[sechdrs[relsec].sh_info].sh_addr
>  					+ rel[j].r_offset;
> @@ -415,16 +417,26 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
>  					hi20 = (offset + 0x800) & 0xfffff000;
>  					lo12 = offset - hi20;
>  					v = lo12;
> +					found = true;
>  
>  					break;
>  				}
> -			}
> -			if (j == sechdrs[relsec].sh_size / sizeof(*rel)) {
> +
> +				j++;
> +				if (j > sechdrs[relsec].sh_size / sizeof(*rel))
> +					j = 0;
Very interesting algorithm here. Assuming the hi relocation is after the
previous one seems to be a good heuristic. However I think we can do
better. In GNU ld, a hashmap of all of the hi relocations is stored and
a list of all of the lo relocations. After all of the other relocations
have been parsed, it iterates through all of the lo relocations and
looks up the associated hi relocation in the hashmap.

There is more memory overhead here but I suspect it will be faster. I
had started to mock up a hashmap implementation to see if it was faster
but decided I should mention it here first in case somebody had some
additional insight. 

- Charlie

> +
> +			} while (j_idx != j);
> +
> +			if (!found) {
>  				pr_err(
>  				  "%s: Can not find HI20 relocation information\n",
>  				  me->name);
>  				return -EINVAL;
>  			}
> +
> +			/* Record the previous j-loop end index */
> +			j_idx = j;
>  		}
>  
>  		res = handler(me, location, v);
> -- 
> 2.40.1
> 
> 
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
Charlie Jenkins Dec. 12, 2023, 1:51 a.m. UTC | #2
On Thu, Dec 07, 2023 at 05:02:16PM -0800, Charlie Jenkins wrote:
> On Wed, Sep 13, 2023 at 04:05:00PM +0300, Maxim Kochetkov wrote:
> > The patch can optimize the running times of insmod command by modify ELF
> > relocation function.
> > In the 5.10 and latest kernel, when install the riscv ELF drivers which
> > contains multiple symbol table items to be relocated, kernel takes a lot
> > of time to execute the relocation. For example, we install a 3+MB driver
> > need 180+s.
> > We focus on the riscv architecture handle R_RISCV_HI20 and R_RISCV_LO20
> > type items relocation function in the arch\riscv\kernel\module.c and
> > find that there are two-loops in the function. If we modify the begin
> > number in the second for-loops iteration, we could save significant time
> > for installation. We install the same 3+MB driver could just need 2s.
> > 
> > Signed-off-by: Amma Lee <lixiaoyun@binary-semi.com>
> > Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru>
> > ---
> > Changes in v4:
> > - use 'while' loop instead of 'for' loop to avoid code duplicate
> > ---
> >  arch/riscv/kernel/module.c | 20 ++++++++++++++++----
> >  1 file changed, 16 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/riscv/kernel/module.c b/arch/riscv/kernel/module.c
> > index 7c651d55fcbd..8c9b644ebfdb 100644
> > --- a/arch/riscv/kernel/module.c
> > +++ b/arch/riscv/kernel/module.c
> > @@ -346,6 +346,7 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
> >  	Elf_Sym *sym;
> >  	u32 *location;
> >  	unsigned int i, type;
> > +	unsigned int j_idx = 0;
> >  	Elf_Addr v;
> >  	int res;
> >  
> > @@ -384,9 +385,10 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
> >  		v = sym->st_value + rel[i].r_addend;
> >  
> >  		if (type == R_RISCV_PCREL_LO12_I || type == R_RISCV_PCREL_LO12_S) {
> > -			unsigned int j;
> > +			unsigned int j = j_idx;
> > +			bool found = false;
> >  
> > -			for (j = 0; j < sechdrs[relsec].sh_size / sizeof(*rel); j++) {
> > +			do {
> >  				unsigned long hi20_loc =
> >  					sechdrs[sechdrs[relsec].sh_info].sh_addr
> >  					+ rel[j].r_offset;
> > @@ -415,16 +417,26 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
> >  					hi20 = (offset + 0x800) & 0xfffff000;
> >  					lo12 = offset - hi20;
> >  					v = lo12;
> > +					found = true;
> >  
> >  					break;
> >  				}
> > -			}
> > -			if (j == sechdrs[relsec].sh_size / sizeof(*rel)) {
> > +
> > +				j++;
> > +				if (j > sechdrs[relsec].sh_size / sizeof(*rel))
> > +					j = 0;
> Very interesting algorithm here. Assuming the hi relocation is after the
> previous one seems to be a good heuristic. However I think we can do
> better. In GNU ld, a hashmap of all of the hi relocations is stored and
> a list of all of the lo relocations. After all of the other relocations
> have been parsed, it iterates through all of the lo relocations and
> looks up the associated hi relocation in the hashmap.
> 
> There is more memory overhead here but I suspect it will be faster. I
> had started to mock up a hashmap implementation to see if it was faster
> but decided I should mention it here first in case somebody had some
> additional insight. 

Turns out this is a fantastic heuristic. Using a hashmap is
significantly faster than the default implementation but this algorithm
above is significantly faster than the hashmap. Using the amdgpu driver
(which is actually a collection of drivers) and is a size of about 469M
I found that the hashmap implementation is about 30% faster than the
current implementation, but this patch is 50% faster than the current
implementation. It is probably possible to write an ELF header with the
relocations sufficiently scrambled to make the hashmap faster, but I
suspect that for all "normal" programs this algorithm is faster.

I also tried a couple other smaller modules and it was faster or around
the same as the hashmap in all of them.

A lot of code has changed in this file since this patch was submitted,
can you rebase onto 6.7-rc1? Otherwise this patch is great.

Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>

> 
> - Charlie
> 
> > +
> > +			} while (j_idx != j);
> > +
> > +			if (!found) {
> >  				pr_err(
> >  				  "%s: Can not find HI20 relocation information\n",
> >  				  me->name);
> >  				return -EINVAL;
> >  			}
> > +
> > +			/* Record the previous j-loop end index */
> > +			j_idx = j;
> >  		}
> >  
> >  		res = handler(me, location, v);
> > -- 
> > 2.40.1
> > 
> > 
> > _______________________________________________
> > linux-riscv mailing list
> > linux-riscv@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv
diff mbox series

Patch

diff --git a/arch/riscv/kernel/module.c b/arch/riscv/kernel/module.c
index 7c651d55fcbd..8c9b644ebfdb 100644
--- a/arch/riscv/kernel/module.c
+++ b/arch/riscv/kernel/module.c
@@ -346,6 +346,7 @@  int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
 	Elf_Sym *sym;
 	u32 *location;
 	unsigned int i, type;
+	unsigned int j_idx = 0;
 	Elf_Addr v;
 	int res;
 
@@ -384,9 +385,10 @@  int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
 		v = sym->st_value + rel[i].r_addend;
 
 		if (type == R_RISCV_PCREL_LO12_I || type == R_RISCV_PCREL_LO12_S) {
-			unsigned int j;
+			unsigned int j = j_idx;
+			bool found = false;
 
-			for (j = 0; j < sechdrs[relsec].sh_size / sizeof(*rel); j++) {
+			do {
 				unsigned long hi20_loc =
 					sechdrs[sechdrs[relsec].sh_info].sh_addr
 					+ rel[j].r_offset;
@@ -415,16 +417,26 @@  int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
 					hi20 = (offset + 0x800) & 0xfffff000;
 					lo12 = offset - hi20;
 					v = lo12;
+					found = true;
 
 					break;
 				}
-			}
-			if (j == sechdrs[relsec].sh_size / sizeof(*rel)) {
+
+				j++;
+				if (j > sechdrs[relsec].sh_size / sizeof(*rel))
+					j = 0;
+
+			} while (j_idx != j);
+
+			if (!found) {
 				pr_err(
 				  "%s: Can not find HI20 relocation information\n",
 				  me->name);
 				return -EINVAL;
 			}
+
+			/* Record the previous j-loop end index */
+			j_idx = j;
 		}
 
 		res = handler(me, location, v);