Message ID | 20130128193033.8a0b0a871150c99247f05a95@freescale.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon, Jan 28, 2013 at 07:30:33PM -0600, Kim Phillips wrote: > Enable the compiler intrinsic for byte swapping on arch ARM. This > allows the compiler to detect and be able to optimize out byte > swappings, e.g. in big endian to big endian moves. > > AFAICT, arm gcc got __builtin_bswap{32,64} support in 4.6, > and for the 16-bit version in 4.8. > > Signed-off-by: Kim Phillips <kim.phillips@freescale.com> > --- > akin to: http://comments.gmane.org/gmane.linux.kernel.cross-arch/16016 > > based on linux-next. Depends on commit "compiler-gcc{3,4}.h: Use > GCC_VERSION macro" by Daniel Santos <daniel.santos@pobox.com>, > currently in the akpm branch. > > RFC because of unfamiliarity with arch ARM, and that at91sam9rl, > at91rm9200, and lpd270 (so far, at least) builds fail with: > > include/uapi/linux/swab.h:60: undefined reference to `__bswapsi2' > > I'm using eldk-5.2.1/armv7a's arm-linux-gnueabi-gcc (GCC) 4.6.4 > 20120303 (prerelease) > > arch/arm/Kconfig | 1 + > include/linux/compiler-gcc4.h | 3 ++- > 2 files changed, 3 insertions(+), 1 deletion(-) > > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig > index eda8711..437d11a 100644 > --- a/arch/arm/Kconfig > +++ b/arch/arm/Kconfig > @@ -3,6 +3,7 @@ config ARM > default y > select ARCH_BINFMT_ELF_RANDOMIZE_PIE > select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE > + select ARCH_USE_BUILTIN_BSWAP > select ARCH_HAVE_CUSTOM_GPIO_H > select ARCH_WANT_IPC_PARSE_VERSION > select BUILDTIME_EXTABLE_SORT if MMU > diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h > index 68b162d..da5f728 100644 > --- a/include/linux/compiler-gcc4.h > +++ b/include/linux/compiler-gcc4.h > @@ -67,7 +67,8 @@ > > > #ifdef CONFIG_ARCH_USE_BUILTIN_BSWAP > -#if GCC_VERSION >= 40400 > +#if (!defined(__arm__) && GCC_VERSION >= 40400) || \ > + (defined(__arm__) && GCC_VERSION >= 40600) There should be no arch-specific stuff in a generic header. I guess you probably need to select ARCH_USE_BUILTIN_BSWAP in an arm-specific compiler.h header after checking compiler version... Thanks.
On Mon, Jan 28, 2013 at 07:30:33PM -0600, Kim Phillips wrote: > AFAICT, arm gcc got __builtin_bswap{32,64} support in 4.6, > and for the 16-bit version in 4.8. Hmm. $ /usr/local/aeabi/bin/arm-linux-gcc --version arm-linux-gcc (GCC) 4.5.4 Copyright (C) 2010 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. $ for a in armv3 armv4 armv4t armv5t armv5te armv6k armv6 armv7-a; do \ echo $a:; \ for f in 16 32 64; do \ echo 'unsigned foo(unsigned val) { return __builtin_bswap'$f'(val); }' | \ /usr/local/aeabi/bin/arm-linux-gcc -x c -S -o - - -march=$a | grep 'bl'; \ done; \ done produces: armv3: bl __builtin_bswap16 armv4: bl __builtin_bswap16 armv4t: bl __builtin_bswap16 armv5t: bl __builtin_bswap16 armv5te: bl __builtin_bswap16 armv6k: bl __builtin_bswap16 armv6: bl __builtin_bswap16 armv7-a: bl __builtin_bswap16 So, this compiler (4.5.4) has support for 32-bit and 64-bit bswaps across all our architectures, but not the 16-bit ones.
On Tue, 2013-01-29 at 14:13 +0000, Russell King - ARM Linux wrote: > > So, this compiler (4.5.4) has support for 32-bit and 64-bit bswaps > across all our architectures, but not the 16-bit ones. That observation is consistent with my dig through GCC history. I had come to the conclusion that the 32-bit and 64-bit versions were added *generically* in 4.4, and that the 16-bit version was added in 4.6 to that PowerPC back end, and made generic in 4.8. So I *had* put that arch-specific check into compiler-gcc4.h, for PowerPC. It's just outside the context of Kim's patch. If it really does end up being different for every arch, I may reconsider that. As for the __bswapsi2() calls... if it's ever emitting an out-of-line call for something like that, that seems like a really dubious decision; surely it's better being inlined? So rather than adding it to your bits-of-libgcc.a in the kernel, I'd suggest just *not* using ARCH_USE_BUILTIN_BSWAP for the offending compilers, and filing a bug to get them fixed. But really, this is why I created ARCH_USE_BUILTIN_BSWAP and left it to architecture maintainers to enable it at their leisure.... :)
On 01/28/2013 07:30 PM, Kim Phillips wrote: > Enable the compiler intrinsic for byte swapping on arch ARM. This > allows the compiler to detect and be able to optimize out byte > swappings, e.g. in big endian to big endian moves. > > AFAICT, arm gcc got __builtin_bswap{32,64} support in 4.6, > and for the 16-bit version in 4.8. I think these are the versions ARM got optimized swap support. The newer compilers are smart enough to replace a swap written in C with a rev instruction. Last time I checked, they did not do rev16, but that is probably what was added in 4.8 (and backported to Linaro gcc). The Linaro backports make it impossible to define what gcc version has a specific feature. If you specify to use the builtin's, do you still get inline rev instructions emitted? Rob > > Signed-off-by: Kim Phillips <kim.phillips@freescale.com> > --- > akin to: http://comments.gmane.org/gmane.linux.kernel.cross-arch/16016 > > based on linux-next. Depends on commit "compiler-gcc{3,4}.h: Use > GCC_VERSION macro" by Daniel Santos <daniel.santos@pobox.com>, > currently in the akpm branch. > > RFC because of unfamiliarity with arch ARM, and that at91sam9rl, > at91rm9200, and lpd270 (so far, at least) builds fail with: > > include/uapi/linux/swab.h:60: undefined reference to `__bswapsi2' > > I'm using eldk-5.2.1/armv7a's arm-linux-gnueabi-gcc (GCC) 4.6.4 > 20120303 (prerelease) > > arch/arm/Kconfig | 1 + > include/linux/compiler-gcc4.h | 3 ++- > 2 files changed, 3 insertions(+), 1 deletion(-) > > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig > index eda8711..437d11a 100644 > --- a/arch/arm/Kconfig > +++ b/arch/arm/Kconfig > @@ -3,6 +3,7 @@ config ARM > default y > select ARCH_BINFMT_ELF_RANDOMIZE_PIE > select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE > + select ARCH_USE_BUILTIN_BSWAP > select ARCH_HAVE_CUSTOM_GPIO_H > select ARCH_WANT_IPC_PARSE_VERSION > select BUILDTIME_EXTABLE_SORT if MMU > diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h > index 68b162d..da5f728 100644 > --- a/include/linux/compiler-gcc4.h > +++ b/include/linux/compiler-gcc4.h > @@ -67,7 +67,8 @@ > > > #ifdef CONFIG_ARCH_USE_BUILTIN_BSWAP > -#if GCC_VERSION >= 40400 > +#if (!defined(__arm__) && GCC_VERSION >= 40400) || \ > + (defined(__arm__) && GCC_VERSION >= 40600) > #define __HAVE_BUILTIN_BSWAP32__ > #define __HAVE_BUILTIN_BSWAP64__ > #endif >
On Tue, 2013-01-29 at 08:53 -0600, Rob Herring wrote: > If you specify to use the builtin's, do you still get inline rev > instructions emitted? You mean from the architecture's __arch_swab32() et al. macros? No. If the architecture enables ARCH_USE_BUILTIN_BSWAP and the compiler version checks indicate that the correspondingly-sized builtin is available, then the builtin will be used. Only if the arch *doesn't* set ARCH_USE_BUILTIN_BSWAP, or if the compiler is old enough not to have the corresponding intrinsic, will the inline assembler in the __arch_swabXX macros get used. Note, however, that there is no such compiler intrinsic for swahb32(), which is where rev16 gets used. That's still being left to the inline asm in all cases.
On Tue, 2013-01-29 at 09:35 +0100, Borislav Petkov wrote: > > > > > #ifdef CONFIG_ARCH_USE_BUILTIN_BSWAP > > -#if GCC_VERSION >= 40400 > > +#if (!defined(__arm__) && GCC_VERSION >= 40400) || \ > > + (defined(__arm__) && GCC_VERSION >= 40600) > > There should be no arch-specific stuff in a generic header. I guess > you probably need to select ARCH_USE_BUILTIN_BSWAP in an arm-specific > compiler.h header after checking compiler version... If we're really going to have many different architectures depending on different versions of GCC for this (if it wasn't sane to use it from 4.4/4.8 when it got introduced, and depends on some later arch-specific optimisation), then perhaps we'll have the arch provide the corresponding required GCC_VERSION for using each of 64/32/16 bit builtins, instead of just a yes/no flag? Or just define __HAVE_BUILTIN_BSWAPxx__ for itself, perhaps? In fact we could start by having just the problematic architectures set __HAVE_BUILTIN_BSWAPxx__ for themselves according to whatever criteria they want, and then if there's scope for consolidating that in the generic code then we can do so later.
On Tue, Jan 29, 2013 at 04:46:58PM +0000, Woodhouse, David wrote: > If we're really going to have many different architectures depending > on different versions of GCC for this (if it wasn't sane to use > it from 4.4/4.8 when it got introduced, and depends on some later > arch-specific optimisation), then perhaps we'll have the arch > provide the corresponding required GCC_VERSION for using each of > 64/32/16 bit builtins, instead of just a yes/no flag? Or just define > __HAVE_BUILTIN_BSWAPxx__ for itself, perhaps? Damn, there's already the __powerpc__ thing in there. Yeah, something like defininig __HAVE_BUILTIN_BSWAPxx__ makes sense and can keep the header arch-agnostic without growing all those different arch defines. But I liked your other suggestion better to get the offending compilers fixed. I dunno though, how generically is stuff like that getting implemented for every arch so probably single arches doing __HAVE* defines is probably going to be the realizable solution in the end. Hmmm.
On Tue, 2013-01-29 at 18:42 +0100, Borislav Petkov wrote: > On Tue, Jan 29, 2013 at 04:46:58PM +0000, Woodhouse, David wrote: > > If we're really going to have many different architectures depending > > on different versions of GCC for this (if it wasn't sane to use > > it from 4.4/4.8 when it got introduced, and depends on some later > > arch-specific optimisation), then perhaps we'll have the arch > > provide the corresponding required GCC_VERSION for using each of > > 64/32/16 bit builtins, instead of just a yes/no flag? Or just define > > __HAVE_BUILTIN_BSWAPxx__ for itself, perhaps? > > Damn, there's already the __powerpc__ thing in there. That's different though. That's because GCC didn't have a generic __builtin_bswap16() until 4.8, while PowerPC got it in 4.6. That's a relatively simple and manageable one-off arch-dependency. But once we get into a mass of "well, it wasn't actually *usable* for ARM until 4.9", we start wanting to push it into arch code. > But I liked your other suggestion better to get the offending > compilers fixed. That wasn't an *alternative*. It's required if the compiler is doing something suboptimal, whatever happens. And then we start to *use* the compiler from the first version that's known to be fixed.
On Tue, Jan 29, 2013 at 05:55:54PM +0000, Woodhouse, David wrote: > That's different though. That's because GCC didn't have a generic > __builtin_bswap16() until 4.8, while PowerPC got it in 4.6. > > That's a relatively simple and manageable one-off arch-dependency. But > once we get into a mass of "well, it wasn't actually *usable* for ARM > until 4.9", we start wanting to push it into arch code. Yeah, and once people see one arch mentioned, all of a sudden they think it is ok to add just one more ... > > But I liked your other suggestion better to get the offending > > compilers fixed. > > That wasn't an *alternative*. It's required if the compiler is doing > something suboptimal, whatever happens. And then we start to *use* the > compiler from the first version that's known to be fixed. So, IMHO it sounds to me like we want to explicitly state for each arch separately that it is ok to use the __builtin_bswapXX things. This also takes care of the case where the compiler is doing something suboptimal by excluding the affected versions. Thanks.
On Tue, 2013-01-29 at 19:10 +0100, Borislav Petkov wrote: > So, IMHO it sounds to me like we want to explicitly state for each arch > separately that it is ok to use the __builtin_bswapXX things. This also > takes care of the case where the compiler is doing something suboptimal > by excluding the affected versions. Well, if it really does end up being different for every architecture, then that means I probably made the wrong decision when I chose to make it "generic", and override the __arch_swabXX() macros. I could have just pushed all the architectures to use the builtins in their __arch_swabXX macros instead, as appropriate. Let's see how many special cases we actually end up with, and perhaps we'll end up switching that round. For now, let's just make ARM set __HAVE_BUILTIN_BSWAPxx__ for the appropriate sizes in <asm/swab.h>, according to whatever criteria it needs. It's not entirely clear how much of a win it is on ARM anyway; we don't have load-and-swap or store-and-swap instructions so there are only a few added opportunities for optimisation that we get by letting the compiler see what's going on.
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index eda8711..437d11a 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -3,6 +3,7 @@ config ARM default y select ARCH_BINFMT_ELF_RANDOMIZE_PIE select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE + select ARCH_USE_BUILTIN_BSWAP select ARCH_HAVE_CUSTOM_GPIO_H select ARCH_WANT_IPC_PARSE_VERSION select BUILDTIME_EXTABLE_SORT if MMU diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h index 68b162d..da5f728 100644 --- a/include/linux/compiler-gcc4.h +++ b/include/linux/compiler-gcc4.h @@ -67,7 +67,8 @@ #ifdef CONFIG_ARCH_USE_BUILTIN_BSWAP -#if GCC_VERSION >= 40400 +#if (!defined(__arm__) && GCC_VERSION >= 40400) || \ + (defined(__arm__) && GCC_VERSION >= 40600) #define __HAVE_BUILTIN_BSWAP32__ #define __HAVE_BUILTIN_BSWAP64__ #endif
Enable the compiler intrinsic for byte swapping on arch ARM. This allows the compiler to detect and be able to optimize out byte swappings, e.g. in big endian to big endian moves. AFAICT, arm gcc got __builtin_bswap{32,64} support in 4.6, and for the 16-bit version in 4.8. Signed-off-by: Kim Phillips <kim.phillips@freescale.com> --- akin to: http://comments.gmane.org/gmane.linux.kernel.cross-arch/16016 based on linux-next. Depends on commit "compiler-gcc{3,4}.h: Use GCC_VERSION macro" by Daniel Santos <daniel.santos@pobox.com>, currently in the akpm branch. RFC because of unfamiliarity with arch ARM, and that at91sam9rl, at91rm9200, and lpd270 (so far, at least) builds fail with: include/uapi/linux/swab.h:60: undefined reference to `__bswapsi2' I'm using eldk-5.2.1/armv7a's arm-linux-gnueabi-gcc (GCC) 4.6.4 20120303 (prerelease) arch/arm/Kconfig | 1 + include/linux/compiler-gcc4.h | 3 ++- 2 files changed, 3 insertions(+), 1 deletion(-)