Message ID | 20220716001616.4052225-1-ndesaulniers@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | arm: lib: implement aeabi_uldivmod via div64_u64_rem | expand |
On Sat, Jul 16, 2022 at 2:16 AM Nick Desaulniers <ndesaulniers@google.com> wrote: > > Compilers frequently need to defer 64b division to a libcall with this > symbol name. It essentially is div64_u64_rem, just with a different > signature. Kernel developers know to call div64_u64_rem, but compilers > don't. > > Link: https://lore.kernel.org/lkml/20220524004156.0000790e@garyguo.net/ > Suggested-by: Gary Guo <gary@garyguo.net> > Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> This has historically been strongly NAK'd, and I don't think that position has changed in the meantime. A variable-argument 64-bit division is really expensive, especially on 32-bit machines that lack a native 32-bit division instruction, and we don't want developers to accidentally insert one in their driver code. Explicitly calling one of the division helpers in linux/math64.h is the established way for driver writers to declare that a particular division cannot be turned into a cheaper operation and is never run in a performance critical code path. The compiler of course cannot know about either of those. Arnd
Hi Nick, I love your patch! Perhaps something to improve: [auto build test WARNING on soc/for-next] [also build test WARNING on arm/for-next linus/master v5.19-rc7 next-20220718] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Nick-Desaulniers/arm-lib-implement-aeabi_uldivmod-via-div64_u64_rem/20220716-231205 base: https://git.kernel.org/pub/scm/linux/kernel/git/soc/soc.git for-next config: arm-ixp4xx_defconfig (https://download.01.org/0day-ci/archive/20220719/202207190057.1gn0emAr-lkp@intel.com/config) compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project d74b88c69dc2644bd0dc5d64e2d7413a0d4040e5) reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # install arm cross compiling tool for clang build # apt-get install binutils-arm-linux-gnueabi # https://github.com/intel-lab-lkp/linux/commit/f0441d7a00899e433705accb0da58c94f4ff8808 git remote add linux-review https://github.com/intel-lab-lkp/linux git fetch --no-tags linux-review Nick-Desaulniers/arm-lib-implement-aeabi_uldivmod-via-div64_u64_rem/20220716-231205 git checkout f0441d7a00899e433705accb0da58c94f4ff8808 # save the config file mkdir build_dir && cp config build_dir/.config COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=arm SHELL=/bin/bash If you fix the issue, kindly add following tag where applicable Reported-by: kernel test robot <lkp@intel.com> All warnings (new ones prefixed by >>): >> arch/arm/lib/aeabi_uldivmod.c:17:15: warning: no previous prototype for function '__aeabi_uldivmod' [-Wmissing-prototypes] struct result __aeabi_uldivmod(u64 numerator, u64 denominator) ^ arch/arm/lib/aeabi_uldivmod.c:17:1: note: declare 'static' if the function is not intended to be used outside of this translation unit struct result __aeabi_uldivmod(u64 numerator, u64 denominator) ^ static 1 warning generated. vim +/__aeabi_uldivmod +17 arch/arm/lib/aeabi_uldivmod.c 16 > 17 struct result __aeabi_uldivmod(u64 numerator, u64 denominator)
On Sat, Jul 16, 2022 at 2:47 AM Arnd Bergmann <arnd@kernel.org> wrote: > > On Sat, Jul 16, 2022 at 2:16 AM Nick Desaulniers > <ndesaulniers@google.com> wrote: > > > > Compilers frequently need to defer 64b division to a libcall with this > > symbol name. It essentially is div64_u64_rem, just with a different > > signature. Kernel developers know to call div64_u64_rem, but compilers > > don't. > > > > Link: https://lore.kernel.org/lkml/20220524004156.0000790e@garyguo.net/ > > Suggested-by: Gary Guo <gary@garyguo.net> > > Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> So the existing division by constant issues went away, and Craig was able to improve division by double-word constants in LLVM 1. https://reviews.llvm.org/D130862 2. https://reviews.llvm.org/D135541 But we still have one instance left that's not div/rem by constant via CONFIG_FPE_NWFPE=y that's now blocking Android's compiler upgrade. https://github.com/ClangBuiltLinux/linux/issues/1666 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm/nwfpe/softfloat.c#n2312 Any creative ideas on how to avoid this? Perhaps putting the `aSig -= bSig;` in inline asm? Inserting a `barrier()` or empty asm statement into the loops also seems to work. Otherwise I'd define __aeabi_uldivmod as static in arch/arm/nwfpe/softfloat.c (with the body from this patch) only for clang. I see this function seems to be based on Berkeley Softfloat http://www.jhauser.us/arithmetic/SoftFloat.html v2. v3 looks like a total rewrite. Looking at v3e, it looks like float64_rem() is now called f64_rem() and defined in f64_rem.c. It doesn't look like there's anything from v3 that we could backport to the kernel's v2 to avoid this. Otherwise perhaps we just disable OABI_COMPAT for clang. Quite a few defconfigs explicitly enable FPE_NWFPE though. Are there really a lot of OABI binaries still in use? There's also the hidden llvm flag: `-mllvm -replexitval=never` that seems to work here, though FWICT it's disabling 3 such loop elisions (I think all three statements in that do while). That's probably the best way forward here... https://reviews.llvm.org/D9800 made the decision to do such a transformation when a loop can be fully elided ("deleted"). > > This has historically been strongly NAK'd, and I don't think that position > has changed in the meantime. A variable-argument 64-bit division is > really expensive, especially on 32-bit machines that lack a native > 32-bit division instruction, and we don't want developers to accidentally > insert one in their driver code. > > Explicitly calling one of the division helpers in linux/math64.h is the > established way for driver writers to declare that a particular division > cannot be turned into a cheaper operation and is never run in a > performance critical code path. The compiler of course cannot know > about either of those. > > Arnd
On Mon, Oct 10, 2022, at 11:23 PM, Nick Desaulniers wrote: > On Sat, Jul 16, 2022 at 2:47 AM Arnd Bergmann <arnd@kernel.org> wrote: >> On Sat, Jul 16, 2022 at 2:16 AM Nick Desaulniers <ndesaulniers@google.com> wrote: > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm/nwfpe/softfloat.c#n2312 > Any creative ideas on how to avoid this? Perhaps putting the `aSig -= > bSig;` in inline asm? Inserting a `barrier()` or empty asm statement > into the loops also seems to work. I was going to suggest adding a barrier() as well, should have read on first ;-) > I see this function seems to be based on Berkeley Softfloat > http://www.jhauser.us/arithmetic/SoftFloat.html > v2. v3 looks like a total rewrite. Looking at v3e, it looks like > float64_rem() is now called f64_rem() and defined in f64_rem.c. It > doesn't look like there's anything from v3 that we could backport to > the kernel's v2 to avoid this. > > Otherwise perhaps we just disable OABI_COMPAT for clang. Quite a few > defconfigs explicitly enable FPE_NWFPE though. Are there really a lot > of OABI binaries still in use? I'd do the minimal thing here, neither NWFPE nor OABI_COMPAT should be too relevant here. Russell still uses some machines with old OABI binaries, but I have not heard from anyone else using those in a very long time. Note that while gcc can still produce kernels that are OABI binaries, gcc-4.6 marked the arm-linux-gnu target as obsolete and gcc-4.8 removed it, so any existing binaries were built with toolchains predating that. Arnd
On Mon, Oct 10, 2022 at 3:14 PM Arnd Bergmann <arnd@kernel.org> wrote: > > On Mon, Oct 10, 2022, at 11:23 PM, Nick Desaulniers wrote: > > On Sat, Jul 16, 2022 at 2:47 AM Arnd Bergmann <arnd@kernel.org> wrote: > >> On Sat, Jul 16, 2022 at 2:16 AM Nick Desaulniers <ndesaulniers@google.com> wrote: > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm/nwfpe/softfloat.c#n2312 > > Any creative ideas on how to avoid this? Perhaps putting the `aSig -= > > bSig;` in inline asm? Inserting a `barrier()` or empty asm statement > > into the loops also seems to work. > > I was going to suggest adding a barrier() as well, should have > read on first ;-) barrier() forces reloads+spills in the loop. The output with `-mllvm -replexitval=never` is optimal (assuming the loop is faster than __aeabi_uldivmod (which I think is unprovable). https://godbolt.org/z/7dMabYYcM As much I hate relying on compiler-internal flags, I think this is optimal: ``` diff --git a/arch/arm/nwfpe/Makefile b/arch/arm/nwfpe/Makefile index 303400fa2cdf..2aec85ab1e8b 100644 --- a/arch/arm/nwfpe/Makefile +++ b/arch/arm/nwfpe/Makefile @@ -11,3 +11,9 @@ nwfpe-y += fpa11.o fpa11_cpdo.o fpa11_cpdt.o \ entry.o nwfpe-$(CONFIG_FPE_NWFPE_XP) += extended_cpdo.o + +# Try really hard to avoid generating calls to __aeabi_uldivmod() from +# float64_rem() due to loop elision. +ifdef CONFIG_CC_IS_CLANG +CFLAGS_softfloat.o += -mllvm -replexitval=never +endif ``` Part of me is tempted to move float64_rem() to its own file for that flag, but indvars+loop-utils isn't eliding other loops in that file (comparing the full disassembly before+after the above diff). Long term, it might be nice for us to have `--rtlib` recognize `--rtlib=linux-kernel@version` or something so that we could better describe the effective compiler runtime to the compiler. There are already differences in compiler-rt and libgcc where we could make better codegen decisions if we were to consider the target rtlib. These libraries also change over time though...
diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile index 6d2ba454f25b..3fa273219312 100644 --- a/arch/arm/lib/Makefile +++ b/arch/arm/lib/Makefile @@ -29,6 +29,7 @@ endif obj-$(CONFIG_UACCESS_WITH_MEMCPY) += uaccess_with_memcpy.o lib-$(CONFIG_MMU) += $(mmu-y) +lib-$(CONFIG_AEABI) += aeabi_uldivmod.o ifeq ($(CONFIG_CPU_32v3),y) lib-y += io-readsw-armv3.o io-writesw-armv3.o diff --git a/arch/arm/lib/aeabi_uldivmod.c b/arch/arm/lib/aeabi_uldivmod.c new file mode 100644 index 000000000000..310427893195 --- /dev/null +++ b/arch/arm/lib/aeabi_uldivmod.c @@ -0,0 +1,23 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Unsigned 64b division with remainder, as is typically provided by libgcc or + * compiler-rt. + * + * Copyright (C) 2023 Google, LLC. + * + * Author: Nick Desaulniers <ndesaulniers@google.com> + */ + +#include <linux/math64.h> + +struct result { + u64 quot, rem; +}; + +struct result __aeabi_uldivmod(u64 numerator, u64 denominator) +{ + struct result res; + + res.quot = div64_u64_rem(numerator, denominator, &res.rem); + return res; +}
Compilers frequently need to defer 64b division to a libcall with this symbol name. It essentially is div64_u64_rem, just with a different signature. Kernel developers know to call div64_u64_rem, but compilers don't. Link: https://lore.kernel.org/lkml/20220524004156.0000790e@garyguo.net/ Suggested-by: Gary Guo <gary@garyguo.net> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> --- arch/arm/lib/Makefile | 1 + arch/arm/lib/aeabi_uldivmod.c | 23 +++++++++++++++++++++++ 2 files changed, 24 insertions(+) create mode 100644 arch/arm/lib/aeabi_uldivmod.c