Message ID | 20221130225614.1594256-1-heiko@sntech.de (mailing list archive) |
---|---|
Series | Zbb string optimizations and call support in alternatives |
On 30/11/2022 22:56, Heiko Stuebner wrote:
> From: Heiko Stuebner <heiko.stuebner@vrull.eu>
>
> The Zbb extension can be used to make string functions run a lot
> faster.
>
> There are essentially two problems to solve:
> - making it possible for str* functions to replace what they do
>   in a performant way
>
>   This is done by inlining the core functions and then
>   using alternatives to call the actual variant.
>
>   This of course will need a more intelligent selection mechanism
>   down the road when more variants may exist using different
>   available extensions.
>
> - actually allowing calls in alternatives
>   Function calls use auipc + jalr to reach those 32-bit relative
>   addresses but when they're compiled the offset will be wrong
>   as alternatives live in a different section. So when the patch
>   gets applied the address will point to the wrong location.
>
>   So similar to arm64 the target addresses need to be updated.
>
>   This is probably also helpful for other things needing more
>   complex code in alternatives.
>
>
> In my half-scientific test-case of running the functions in question
> on a 95 character string in a loop of 10000 iterations, the Zbb
> variants shave off around 2/3 of the original runtime.
>
>
> For v2 I got into some sort of cleanup spree for the general instruction
> parsing that already existed. A number of places do their own
> instruction parsing and I tried consolidating some of them.
>
> Notably, the kvm parts still do, but I had to stop somewhere :-)
>
> The series is based on v6.1-rc7 right now.
>
> changes since v2:
> - add patch fixing the c.jalr funct4 value
> - reword some commit messages
> - fix position of auipc addition patch (earlier)
> - fix compile errors from patch-reordering gone wrong
>   (worked at the end of v2, but compiling individual patches
>    caused issues) - patches are now tested individually
> - limit Zbb variants for GNU as for now
>   (LLVM support for .option arch is still under review)

Still no good on that front chief:

ld.lld: error: undefined symbol: __strlen_generic
>>> referenced by ctype.c
>>>               arch/riscv/purgatory/purgatory.ro:(strlcpy)
>>> referenced by ctype.c
>>>               arch/riscv/purgatory/purgatory.ro:(strlcat)
>>> referenced by ctype.c
>>>               arch/riscv/purgatory/purgatory.ro:(strlcat)
>>> referenced 3 more times
make[5]: *** [/stuff/linux/arch/riscv/purgatory/Makefile:85: arch/riscv/purgatory/purgatory.chk] Error 1
make[5]: Target 'arch/riscv/purgatory/' not remade because of errors.
make[4]: *** [/stuff/linux/scripts/Makefile.build:500: arch/riscv/purgatory] Error 2

allmodconfig, same toolchain as before.

> - prevent str-functions from getting optimized to builtin-variants
>
> changes since v1:
> - a number of generalizations/cleanups for instruction parsing
> - use accessor function to access instructions (Emil)
> - actually patch the correct location when having more than one
>   instruction in an alternative block
> - string function cleanups (comments etc) (Conor)
> - move zbb extension above s* extensions in cpu.c lists
>
> changes since rfc:
> - make Zbb code actually work
> - drop some unneeded patches
> - a lot of cleanups
>
> Heiko Stuebner (14):
>   RISC-V: fix funct4 definition for c.jalr in parse_asm.h
>   RISC-V: add prefix to all constants/macros in parse_asm.h
>   RISC-V: detach funct-values from their offset
>   RISC-V: add ebreak instructions to definitions
>   RISC-V: add auipc elements to parse_asm header
>   RISC-V: Move riscv_insn_is_* macros into a common header
>   RISC-V: rename parse_asm.h to insn.h
>   RISC-V: kprobes: use central defined funct3 constants
>   RISC-V: add U-type imm parsing to insn.h header
>   RISC-V: add rd reg parsing to insn.h header
>   RISC-V: fix auipc-jalr addresses in patched alternatives
>   efi/riscv: libstub: mark when compiling libstub
>   RISC-V: add infrastructure to allow different str* implementations
>   RISC-V: add zbb support to string functions
>
>  arch/riscv/Kconfig                       |  24 ++
>  arch/riscv/Makefile                      |   3 +
>  arch/riscv/include/asm/alternative.h     |   3 +
>  arch/riscv/include/asm/errata_list.h     |   3 +-
>  arch/riscv/include/asm/hwcap.h           |   1 +
>  arch/riscv/include/asm/insn.h            | 292 +++++++++++++++++++++++
>  arch/riscv/include/asm/parse_asm.h       | 219 -----------------
>  arch/riscv/include/asm/string.h          |  83 +++++++
>  arch/riscv/kernel/alternative.c          |  72 ++++++
>  arch/riscv/kernel/cpu.c                  |   1 +
>  arch/riscv/kernel/cpufeature.c           |  29 ++-
>  arch/riscv/kernel/image-vars.h           |   6 +-
>  arch/riscv/kernel/kgdb.c                 |  63 ++---
>  arch/riscv/kernel/probes/simulate-insn.c |  19 +-
>  arch/riscv/kernel/probes/simulate-insn.h |  26 +-
>  arch/riscv/lib/Makefile                  |   6 +
>  arch/riscv/lib/strcmp.S                  |  38 +++
>  arch/riscv/lib/strcmp_zbb.S              |  96 ++++++++
>  arch/riscv/lib/strlen.S                  |  29 +++
>  arch/riscv/lib/strlen_zbb.S              | 115 +++++++++
>  arch/riscv/lib/strncmp.S                 |  41 ++++
>  arch/riscv/lib/strncmp_zbb.S             | 112 +++++++++
>  drivers/firmware/efi/libstub/Makefile    |   2 +-
>  23 files changed, 982 insertions(+), 301 deletions(-)
>  create mode 100644 arch/riscv/include/asm/insn.h
>  delete mode 100644 arch/riscv/include/asm/parse_asm.h
>  create mode 100644 arch/riscv/lib/strcmp.S
>  create mode 100644 arch/riscv/lib/strcmp_zbb.S
>  create mode 100644 arch/riscv/lib/strlen.S
>  create mode 100644 arch/riscv/lib/strlen_zbb.S
>  create mode 100644 arch/riscv/lib/strncmp.S
>  create mode 100644 arch/riscv/lib/strncmp_zbb.S
>
> --
> 2.35.1
>
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
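To illustrate the auipc + jalr problem described in the cover letter, here is a minimal user-space sketch. It is not the code from the series. It assumes the standard RISC-V encodings (auipc: opcode 0x17 with a 20-bit upper immediate; jalr: opcode 0x67 with a signed 12-bit immediate), and the helper names pair_get_offset, pair_set_offset and fixup_call are hypothetical.

```c
#include <stdint.h>

#define OPCODE_MASK   0x7fu
#define OPCODE_AUIPC  0x17u
#define OPCODE_JALR   0x67u

/* Decode the pc-relative offset encoded by an auipc + jalr pair. */
static int32_t pair_get_offset(uint32_t auipc, uint32_t jalr)
{
	uint32_t hi = auipc & 0xfffff000u;               /* U-type imm, bits 31:12 */
	uint32_t lo = ((jalr >> 20) ^ 0x800u) - 0x800u;  /* sign-extend 12-bit imm */

	return (int32_t)(hi + lo);
}

/* Re-encode a pc-relative offset into an existing auipc + jalr pair. */
static void pair_set_offset(uint32_t *auipc, uint32_t *jalr, int32_t offset)
{
	uint32_t off = (uint32_t)offset;
	/* Round the upper part so the rest fits in a signed 12-bit immediate. */
	uint32_t hi = (off + 0x800u) & 0xfffff000u;
	uint32_t lo = off & 0xfffu;

	*auipc = (*auipc & 0x00000fffu) | hi;
	*jalr = (*jalr & 0x000fffffu) | (lo << 20);
}

/*
 * The pair was assembled as if it ran at old_pc but is copied to new_pc.
 * The call target must stay the same, so the offset shrinks by the
 * distance the code moved. Returns 0 on success, -1 if this is not a
 * matching auipc + jalr pair.
 */
static int fixup_call(uint32_t *auipc, uint32_t *jalr,
		      uint64_t old_pc, uint64_t new_pc)
{
	uint32_t off;

	if ((*auipc & OPCODE_MASK) != OPCODE_AUIPC ||
	    (*jalr & OPCODE_MASK) != OPCODE_JALR)
		return -1;
	/* jalr must jump through the register that auipc just wrote. */
	if (((*jalr >> 15) & 0x1fu) != ((*auipc >> 7) & 0x1fu))
		return -1;

	off = (uint32_t)pair_get_offset(*auipc, *jalr);
	off -= (uint32_t)(new_pc - old_pc);
	pair_set_offset(auipc, jalr, (int32_t)off);
	return 0;
}
```

For example, a pair assembled at 0x1000 that calls 0x2000 carries the offset 0x1000. After being copied to 0x5000, it has to carry -0x3000 to reach the same target.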
On Thursday, 1 December 2022, 01:02:08 CET, Conor Dooley wrote:
> On 30/11/2022 22:56, Heiko Stuebner wrote:
> > changes since v2:
> > - add patch fixing the c.jalr funct4 value
> > - reword some commit messages
> > - fix position of auipc addition patch (earlier)
> > - fix compile errors from patch-reordering gone wrong
> >   (worked at the end of v2, but compiling individual patches
> >    caused issues) - patches are now tested individually
> > - limit Zbb variants for GNU as for now
> >   (LLVM support for .option arch is still under review)
>
> Still no good on that front chief:
> ld.lld: error: undefined symbol: __strlen_generic
> >>> referenced by ctype.c
> >>>               arch/riscv/purgatory/purgatory.ro:(strlcpy)
> >>> referenced by ctype.c
> >>>               arch/riscv/purgatory/purgatory.ro:(strlcat)
> >>> referenced by ctype.c
> >>>               arch/riscv/purgatory/purgatory.ro:(strlcat)
> >>> referenced 3 more times
> make[5]: *** [/stuff/linux/arch/riscv/purgatory/Makefile:85: arch/riscv/purgatory/purgatory.chk] Error 1
> make[5]: Target 'arch/riscv/purgatory/' not remade because of errors.
> make[4]: *** [/stuff/linux/scripts/Makefile.build:500: arch/riscv/purgatory] Error 2

Oh interesting, there is another efistub-like thingy hidden in the tree
(and CRYPTO_SHA256 needs to be built-in, not a module, to allow the
kexec-purgatory to be built).
The following should do the trick:

---------------- 8< --------------
diff --git a/arch/riscv/include/asm/string.h b/arch/riscv/include/asm/string.h
index 806c402c874e..b99698983045 100644
--- a/arch/riscv/include/asm/string.h
+++ b/arch/riscv/include/asm/string.h
@@ -27,7 +27,7 @@ extern asmlinkage int __strcmp_zbb(const char *cs, const char *ct);
 
 static inline int strcmp(const char *cs, const char *ct)
 {
-#ifdef RISCV_EFISTUB
+#if defined(RISCV_EFISTUB) || defined(RISCV_PURGATORY)
 	return __strcmp_generic(cs, ct);
 #else
 	register const char *a0 asm("a0") = cs;
@@ -55,7 +55,7 @@ extern asmlinkage int __strncmp_zbb(const char *cs,
 
 static inline int strncmp(const char *cs, const char *ct, size_t count)
 {
-#ifdef RISCV_EFISTUB
+#if defined(RISCV_EFISTUB) || defined(RISCV_PURGATORY)
 	return __strncmp_generic(cs, ct, count);
 #else
 	register const char *a0 asm("a0") = cs;
@@ -82,7 +82,7 @@ extern asmlinkage __kernel_size_t __strlen_zbb(const char *);
 
 static inline __kernel_size_t strlen(const char *s)
 {
-#ifdef RISCV_EFISTUB
+#if defined(RISCV_EFISTUB) || defined(RISCV_PURGATORY)
 	return __strlen_generic(s);
 #else
 	register const char *a0 asm("a0") = s;
diff --git a/arch/riscv/purgatory/Makefile b/arch/riscv/purgatory/Makefile
index dd58e1d99397..1d0969722875 100644
--- a/arch/riscv/purgatory/Makefile
+++ b/arch/riscv/purgatory/Makefile
@@ -2,6 +2,7 @@
 OBJECT_FILES_NON_STANDARD := y
 
 purgatory-y := purgatory.o sha256.o entry.o string.o ctype.o memcpy.o memset.o
+purgatory-y += strcmp.o strlen.o strncmp.o
 
 targets += $(purgatory-y)
 PURGATORY_OBJS = $(addprefix $(obj)/,$(purgatory-y))
@@ -18,6 +19,15 @@ $(obj)/memcpy.o: $(srctree)/arch/riscv/lib/memcpy.S FORCE
 $(obj)/memset.o: $(srctree)/arch/riscv/lib/memset.S FORCE
 	$(call if_changed_rule,as_o_S)
 
+$(obj)/strcmp.o: $(srctree)/arch/riscv/lib/strcmp.S FORCE
+	$(call if_changed_rule,as_o_S)
+
+$(obj)/strlen.o: $(srctree)/arch/riscv/lib/strlen.S FORCE
+	$(call if_changed_rule,as_o_S)
+
+$(obj)/strncmp.o: $(srctree)/arch/riscv/lib/strncmp.S FORCE
+	$(call if_changed_rule,as_o_S)
+
 $(obj)/sha256.o: $(srctree)/lib/crypto/sha256.c FORCE
 	$(call if_changed_rule,cc_o_c)
 
@@ -46,6 +56,7 @@ PURGATORY_CFLAGS_REMOVE := -mcmodel=kernel
 PURGATORY_CFLAGS := -mcmodel=medany -ffreestanding -fno-zero-initialized-in-bss
 PURGATORY_CFLAGS += $(DISABLE_STACKLEAK_PLUGIN) -DDISABLE_BRANCH_PROFILING
 PURGATORY_CFLAGS += -fno-stack-protector -g0
+PURGATORY_CFLAGS += -DRISCV_PURGATORY
 
 # Default KBUILD_CFLAGS can have -pg option set when FTRACE is enabled. That
 # in turn leaves some undefined symbols like __fentry__ in purgatory and not
@@ -77,6 +88,9 @@ CFLAGS_ctype.o += $(PURGATORY_CFLAGS)
 AFLAGS_REMOVE_entry.o += -Wa,-gdwarf-2
 AFLAGS_REMOVE_memcpy.o += -Wa,-gdwarf-2
 AFLAGS_REMOVE_memset.o += -Wa,-gdwarf-2
+AFLAGS_REMOVE_strcmp.o += -Wa,-gdwarf-2
+AFLAGS_REMOVE_strlen.o += -Wa,-gdwarf-2
+AFLAGS_REMOVE_strncmp.o += -Wa,-gdwarf-2
 
 $(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
 		$(call if_changed,ld)
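The pattern in the string.h hunks above, where a stand-alone image always takes the generic routine and everything else may use a faster variant, can be sketched in plain C as follows. This is only an illustration under assumptions: all sketch_* names are hypothetical, and the runtime flag sketch_have_fast stands in for the kernel's alternatives patching, which this example does not attempt to reproduce.

```c
#include <stdbool.h>
#include <stddef.h>

/* Plain byte-at-a-time fallback that needs no CPU extension. */
static size_t sketch_strlen_generic(const char *s)
{
	const char *p = s;

	while (*p)
		p++;
	return (size_t)(p - s);
}

/* Hypothetical stand-in for an extension-specific variant. */
static size_t sketch_strlen_fast(const char *s)
{
	return sketch_strlen_generic(s);
}

/* Hypothetical stand-in for "the CPU was found to have the extension". */
static bool sketch_have_fast;

static inline size_t sketch_strlen(const char *s)
{
#if defined(RISCV_EFISTUB) || defined(RISCV_PURGATORY)
	/* Stand-alone images are never patched, so always go generic. */
	return sketch_strlen_generic(s);
#else
	return sketch_have_fast ? sketch_strlen_fast(s)
				: sketch_strlen_generic(s);
#endif
}
```

Building the sketch with -DRISCV_PURGATORY removes every reference to the fast variant. This mirrors why the purgatory Makefile in the diff then only has to link the generic strcmp.o, strlen.o and strncmp.o objects.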
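The informal measurement mentioned in the cover letter (a 95 character string, 10000 iterations) could be approximated in user space along the following lines. This is a rough sketch, not the author's test: the function names are hypothetical, it times whatever strlen-compatible function is passed in, and it uses a volatile function pointer so the compiler cannot replace the call with a builtin or hoist it out of the loop.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

#define BENCH_STR_LEN    95
#define BENCH_ITERATIONS 10000

typedef size_t (*strlen_fn)(const char *);

/* Fill buf with len non-NUL characters plus the terminator. */
static void bench_fill(char *buf, size_t len)
{
	memset(buf, 'a', len);
	buf[len] = '\0';
}

/*
 * Call fn on s the given number of times and return the CPU time used,
 * in seconds. The summed lengths are stored in *sum so the result is
 * observable and the loop cannot be discarded.
 */
static double bench_strlen(strlen_fn fn, const char *s, int iterations,
			   size_t *sum)
{
	strlen_fn volatile call = fn; /* re-read on every call, no builtin */
	size_t total = 0;
	clock_t start, end;
	int i;

	start = clock();
	for (i = 0; i < iterations; i++)
		total += call(s);
	end = clock();

	*sum = total;
	return (double)(end - start) / CLOCKS_PER_SEC;
}
```

A caller would fill a buffer of BENCH_STR_LEN + 1 bytes with bench_fill(), then pass each candidate implementation to bench_strlen() with BENCH_ITERATIONS and compare the returned times. With only 10000 iterations, clock() may be too coarse to resolve the difference, so a higher iteration count may be needed in practice.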