Message ID | 20130208191208.2ef3d78bda71aa7b44d00d7b@freescale.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Fri, 8 Feb 2013, Kim Phillips wrote: > On Fri, 8 Feb 2013 17:47:33 -0500 > Nicolas Pitre <nico@fluxnic.net> wrote: > > > On Fri, 8 Feb 2013, Woodhouse, David wrote: > > > > > On Fri, 2013-02-08 at 15:04 -0500, Nicolas Pitre wrote: > > > > On Fri, 8 Feb 2013, Woodhouse, David wrote: > > > > > > > > > On Thu, 2013-02-07 at 18:13 +0000, Russell King - ARM Linux wrote: > > > > > > > > > > > > However, the biggest reason not to use libgcc is that we want to control > > > > > > what gets used in the kernel - for example, no floating point, and no > > > > > > use of 64 x 64bit division. > > > > > > > > > > Which is all very sensible. But there's no particular reason we couldn't > > > > > add a __bswap[sd]i2 to the kernel's version of libgcc if we wanted to. > > > > > > > > Absolutely. > > > > > > And then ARM can just set ARCH_USE_BUILTIN_BSWAP like other > > > architectures do, right? > > > > If that turns out to be beneficial over what we have now, then yes. > > I didn't read back the whole thread to form an opinion though. > > The diff below implements __bswap[sd]i2 in arch/arm/lib, and > results in the following savings in vmlinux size: > > column 1: name of defconfig > column 2: text+data+bss, linux-next-20130207 vanilla, gcc 4.6.3 > column 3: text+data+bss, linux-next-20130207+below diff, gcc 4.6.3 > column 4: col. 3 - col. 2 (ie., -ve numbers represent savings) > [...] > imx_v6_v7_defconfig: 7672373 7667089 -5284 > lart_defconfig: 2941150 2941054 -96 > mxs_defconfig: 11091983 11095679 3696 The savings are good, with some impressive cases. However the mxs_defconfig is completely the opposite and by far. Any idea? > gcc 4.7.3 runs haven't been as good across the board as gcc 4.6.3, > however: Not only that, but in many cases the results are wildly different given the same config: > imx_v6_v7_defconfig: 7637605 7636935 -670 > lart_defconfig: 2922550 2926600 4050 > mxs_defconfig: 11071139 11070893 -246 The mxs_defconfig became much better while lart_defconfig regressed a lot. > Haven't looked at why. Would be a good idea since this is rather weird and gcc could benefit from your findings. > In any case, some questions I have are: > > (a) are the __bswap[sd]i2 implementations acceptable written in C, > as in the diff? I don't speak ARM asm (yet at least). The > generated code looks pretty optimal in both armv5 and 6+. It looks pretty nice indeed: __bswapsi2: eor r2, r0, r0, ror #16 mov r1, r2, lsr #8 bic r3, r1, #65280 eor r0, r3, r0, ror #8 bx lr There is no way to do better than that. But that's true only if -Os is _not_ used. With -Os we get the following output: __bswapsi2: mov r3, r0, asl #24 and r2, r0, #65280 orr r3, r3, r0, lsr #24 orr r3, r3, r2, asl #8 and r0, r0, #16711680 orr r0, r3, r0, lsr #8 bx lr I really don't get why gcc thinks the above is shorter. I'm saving you from pasting the __bswapdi2 result which is also way way worse. That was with Linaro gcc v4.6.2. With Sourcery gcc v4.5.1 we get: __bswapsi2: stmfd sp!, {r3, lr} bl __bswapsi2 ldmfd sp!, {r3, pc} This is indeed shorter, but much less useful. So you better enforce -O2 for this file. And the nice thing with C code is that it is fully optimized with the rev instruction when compiling for ARMv6+ if it is ever used in that case. > (b) would adding __bswap[sd]i2 to the kernel build need to be > disabled on armv6+? AFAICT, gcc doesn't emit calls - even for the > 8-byte swap, even with -Os, on armv6+. I wouldn't bother. That would save only 6 instructions total. And who knows if some gcc flavor start calling them for some reason eventually. > (c) testing allyesconfigs is proving to be a pain - lots of > breakeage - other than defconfig testing, is there any more I can do? The defconfigs provide wildly different results and that is a good thing for further investigation. You may concentrate on a small interesting sample such as those I kept above. With allyesconfig the good configs would cancel out the bad ones making the bad ones invisible. Nicolas
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 4265a26..5e8b735 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -58,6 +58,7 @@ config ARM select CLONE_BACKWARDS select OLD_SIGSUSPEND3 select OLD_SIGACTION + select ARCH_USE_BUILTIN_BSWAP help The ARM series is a line of low-power-consumption RISC chip designs licensed by ARM Ltd and targeted at embedded applications and diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile index c9865f6..c225624 100644 --- a/arch/arm/boot/compressed/Makefile +++ b/arch/arm/boot/compressed/Makefile @@ -111,12 +111,12 @@ endif targets := vmlinux vmlinux.lds \ piggy.$(suffix_y) piggy.$(suffix_y).o \ - lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S \ + lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S bswapsdi2.o \ font.o font.c head.o misc.o $(OBJS) # Make sure files are removed during clean extra-y += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern \ - lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs) + lib1funcs.S ashldi3.S bswapsdi2.o $(libfdt) $(libfdt_hdrs) ifeq ($(CONFIG_FUNCTION_TRACER),y) ORIG_CFLAGS := $(KBUILD_CFLAGS) @@ -158,6 +158,12 @@ ashldi3 = $(obj)/ashldi3.o $(obj)/ashldi3.S: $(srctree)/arch/$(SRCARCH)/lib/ashldi3.S $(call cmd,shipped) +# For __bswapsi2, __bswapdi2 +bswapsdi2 = $(obj)/bswapsdi2.o + +$(obj)/bswapsdi2.o: $(srctree)/arch/$(SRCARCH)/lib/bswapsdi2.o + $(call cmd,shipped) + # We need to prevent any GOTOFF relocs being used with references # to symbols in the .bss section since we cannot relocate them # independently from the rest at run time. This can be achieved by @@ -179,7 +185,8 @@ if [ $(words $(ZRELADDR)) -gt 1 -a "$(CONFIG_AUTO_ZRELADDR)" = "" ]; then \ fi $(obj)/vmlinux: $(obj)/vmlinux.lds $(obj)/$(HEAD) $(obj)/piggy.$(suffix_y).o \ - $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) FORCE + $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) \ + $(bswapsdi2) FORCE @$(check_for_multiple_zreladdr) $(call if_changed,ld) @$(check_for_bad_syms) diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c index 60d3b73..ba578f7 100644 --- a/arch/arm/kernel/armksyms.c +++ b/arch/arm/kernel/armksyms.c @@ -35,6 +35,8 @@ extern void __ucmpdi2(void); extern void __udivsi3(void); extern void __umodsi3(void); extern void __do_div64(void); +extern void __bswapsi2(void); +extern void __bswapdi2(void); extern void __aeabi_idiv(void); extern void __aeabi_idivmod(void); @@ -114,6 +116,8 @@ EXPORT_SYMBOL(__ucmpdi2); EXPORT_SYMBOL(__udivsi3); EXPORT_SYMBOL(__umodsi3); EXPORT_SYMBOL(__do_div64); +EXPORT_SYMBOL(__bswapsi2); +EXPORT_SYMBOL(__bswapdi2); #ifdef CONFIG_AEABI EXPORT_SYMBOL(__aeabi_idiv); diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile index af72969..5383df7 100644 --- a/arch/arm/lib/Makefile +++ b/arch/arm/lib/Makefile @@ -13,7 +13,7 @@ lib-y := backtrace.o changebit.o csumipv6.o csumpartial.o \ ashldi3.o ashrdi3.o lshrdi3.o muldi3.o \ ucmpdi2.o lib1funcs.o div64.o \ io-readsb.o io-writesb.o io-readsl.o io-writesl.o \ - call_with_stack.o + call_with_stack.o bswapsdi2.o mmu-y := clear_user.o copy_page.o getuser.o putuser.o diff --git a/arch/arm/lib/bswapsdi2.c b/arch/arm/lib/bswapsdi2.c new file mode 100644 index 0000000..1923b2f --- /dev/null +++ b/arch/arm/lib/bswapsdi2.c @@ -0,0 +1,19 @@ +unsigned int __bswapsi2(unsigned int x) +{ + return ((x & 0x000000ffUL) << 24) | + ((x & 0x0000ff00UL) << 8) | + ((x & 0x00ff0000UL) >> 8) | + ((x & 0xff000000UL) >> 24); +} + +unsigned long long __bswapdi2(unsigned long long x) +{ + return ((x & 0x00000000000000ffULL) << 56) | + ((x & 0x000000000000ff00ULL) << 40) | + ((x & 0x0000000000ff0000ULL) << 24) | + ((x & 0x00000000ff000000ULL) << 8) | + ((x & 0x000000ff00000000ULL) >> 8) | + ((x & 0x0000ff0000000000ULL) >> 24) | + ((x & 0x00ff000000000000ULL) >> 40) | + ((x & 0xff00000000000000ULL) >> 56); +}