Message ID | 20130523114654.1f273241725205c6703b2226@freescale.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, 23 May 2013, Kim Phillips wrote: > Enable the compiler intrinsic for byte swapping on arch ARM. This > allows the compiler to detect and be able to optimize out byte > swappings, and has a very modest benefit on vmlinux size (Linaro gcc > 4.8): > > text data bss dec hex filename > 2840310 123932 61960 3026202 2e2d1a vmlinux-lart #orig > 2840152 123932 61960 3026044 2e2c7c vmlinux-lart #builtin-bswap > > 6473120 314840 5616016 12403976 bd4508 vmlinux-mxs #orig > 6472586 314848 5616016 12403450 bd42fa vmlinux-mxs #builtin-bswap > > 7419872 318372 379556 8117800 7bde28 vmlinux-imx_v6_v7 #orig > 7419170 318364 379556 8117090 7bdb62 vmlinux-imx_v6_v7 #builtin-bswap > > Signed-off-by: Kim Phillips <kim.phillips@freescale.com> > Reviewed-by: Nicolas Pitre <nico@linaro.org> > Acked-by: David Woodhouse <David.Woodhouse@intel.com> > --- > resending as v6 appears to have fallen though the cracks. Russell? Please send your patch to Russell's patch system: http://www.arm.linux.org.uk/developer/patches/ Nicolas
On Thu, May 23, 2013 at 11:46:54AM -0500, Kim Phillips wrote: > Enable the compiler intrinsic for byte swapping on arch ARM. This > allows the compiler to detect and be able to optimize out byte > swappings, and has a very modest benefit on vmlinux size (Linaro gcc > 4.8): > > text data bss dec hex filename > 2840310 123932 61960 3026202 2e2d1a vmlinux-lart #orig > 2840152 123932 61960 3026044 2e2c7c vmlinux-lart #builtin-bswap > > 6473120 314840 5616016 12403976 bd4508 vmlinux-mxs #orig > 6472586 314848 5616016 12403450 bd42fa vmlinux-mxs #builtin-bswap > > 7419872 318372 379556 8117800 7bde28 vmlinux-imx_v6_v7 #orig > 7419170 318364 379556 8117090 7bdb62 vmlinux-imx_v6_v7 #builtin-bswap > > Signed-off-by: Kim Phillips <kim.phillips@freescale.com> > Reviewed-by: Nicolas Pitre <nico@linaro.org> > Acked-by: David Woodhouse <David.Woodhouse@intel.com> > --- > resending as v6 appears to have fallen though the cracks. Russell? Please put it in the patch system (otherwise I do drop patches.)
Am 23.05.2013 18:46, schrieb Kim Phillips: > Enable the compiler intrinsic for byte swapping on arch ARM. This > allows the compiler to detect and be able to optimize out byte > swappings, and has a very modest benefit on vmlinux size (Linaro gcc > 4.8): I'm no GCC tool chain expert, so I just have an understanding question: Could anyone kindly give a brief explanation (*) of what the advantage of this is on ARM? http://comments.gmane.org/gmane.linux.kernel.cross-arch/16016 mentions "lwbrx/stwbrx on PowerPC, movbe on Atom". But for ARM? I haven't understood yet why the __arch_swabXX() in arch/arm/include/asm/swab.h [1] aren't sufficient? How can this be done better? E.g. does anybody have a disassembly without/with this change to illustrate that? Many thanks and best regards Dirk (*) or in case this already done provide a link. I couldn't find it in the discussion of this patch. [1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/arm/include/asm/swab.h
On Sun, 2013-05-26 at 07:38 +0200, Dirk Behme wrote: > Am 23.05.2013 18:46, schrieb Kim Phillips: > > Enable the compiler intrinsic for byte swapping on arch ARM. This > > allows the compiler to detect and be able to optimize out byte > > swappings, and has a very modest benefit on vmlinux size (Linaro gcc > > 4.8): > > I'm no GCC tool chain expert, so I just have an understanding > question: Could anyone kindly give a brief explanation (*) of what the > advantage of this is on ARM? > > http://comments.gmane.org/gmane.linux.kernel.cross-arch/16016 > > mentions "lwbrx/stwbrx on PowerPC, movbe on Atom". But for ARM? > > I haven't understood yet why the __arch_swabXX() in > arch/arm/include/asm/swab.h [1] aren't sufficient? How can this be > done better? E.g. does anybody have a disassembly without/with this > change to illustrate that? The point is just that the compiler gets to *see* what's happening. See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55177 for a bunch of examples of things that GCC ought to be able to optimise, even without the CPU having load-and-swap instructions. Not that it always does; hence the PR. But there are some that it does currently manage, evidently. You'll see this if you follow the link above, but as an example: imagine a code sequence that goes load, swap, mask, swap, store. With the swaps done by opaque inline asm, there's nothing the compiler can do to optimise this. But if it *knows* what's going on, it can optimise it into a single load, mask of a pre-byte-swapped constant, and store. Having said that, I can't actually answer your question — I don't know which optimisations the compiler *is* doing to provide the "modest benefit" that Kim mentions; every class of optimisation I explicitly checked for was missing. Again, hence the PR. But evidently it does manage to get *something* right.
On Fri, May 24, 2013 at 12:13:36AM +0100, Russell King - ARM Linux wrote: > On Thu, May 23, 2013 at 11:46:54AM -0500, Kim Phillips wrote: > > Enable the compiler intrinsic for byte swapping on arch ARM. This > > allows the compiler to detect and be able to optimize out byte > > swappings, and has a very modest benefit on vmlinux size (Linaro gcc > > 4.8): > > > > text data bss dec hex filename > > 2840310 123932 61960 3026202 2e2d1a vmlinux-lart #orig > > 2840152 123932 61960 3026044 2e2c7c vmlinux-lart #builtin-bswap > > > > 6473120 314840 5616016 12403976 bd4508 vmlinux-mxs #orig > > 6472586 314848 5616016 12403450 bd42fa vmlinux-mxs #builtin-bswap > > > > 7419872 318372 379556 8117800 7bde28 vmlinux-imx_v6_v7 #orig > > 7419170 318364 379556 8117090 7bdb62 vmlinux-imx_v6_v7 #builtin-bswap > > > > Signed-off-by: Kim Phillips <kim.phillips@freescale.com> > > Reviewed-by: Nicolas Pitre <nico@linaro.org> > > Acked-by: David Woodhouse <David.Woodhouse@intel.com> > > --- > > resending as v6 appears to have fallen though the cracks. Russell? > > Please put it in the patch system (otherwise I do drop patches.) (Added Arnd/SFR in case they have comments.) So, we have a problem here - the kind which appears when people stuff things into the -next tree which aren't destined for the next merge window. This is the relevant context from your patch, which is against linux-next: - lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S \ - font.o font.c head.o misc.o $(OBJS) + lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S bswapsdi2.o \ + bswapsdi2.S font.o font.c head.o misc.o $(OBJS) # Make sure files are removed during clean extra-y += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern piggy.lz4 \ ^^^^^^^^^ - lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs) + lib1funcs.S ashldi3.S bswapsdi2.S $(libfdt) $(libfdt_hdrs) the underlined bit - piggy.lz4 for those who read mail with proportional fonts. That is not in any kernel I have, and if it _is_ something that is destined for the next merge window, it should be in my tree as it's a core ARM feature, not in some random other tree. Short of hand-editing and manually applying the patch, a solution would be to rebase it on a mainline kernel version, like -rc4, and resubmit that version instead. That will ultimately then give sfr a conflict which should be trivial to resolve - and hopefully we'll find out who's carrying the LZ4 patch and putting it into linux-next.
On Thu, Jun 06, 2013 at 11:12:34PM +0100, Russell King - ARM Linux wrote: > That will ultimately then give sfr a conflict which should be trivial > to resolve - and hopefully we'll find out who's carrying the LZ4 patch > and putting it into linux-next. That should be akpm: http://ozlabs.org/~akpm/mmotm/broken-out/arm-add-support-for-lz4-compressed-kernel.patch AFAICT.
Hi Russell, On Thu, 6 Jun 2013 23:12:34 +0100 Russell King - ARM Linux <linux@arm.linux.org.uk> wrote: > > So, we have a problem here - the kind which appears when people stuff > things into the -next tree which aren't destined for the next merge > window. This is the relevant context from your patch, which is > against linux-next: > > - lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S \ > - font.o font.c head.o misc.o $(OBJS) > + lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S bswapsdi2.o \ > + bswapsdi2.S font.o font.c head.o misc.o $(OBJS) > > # Make sure files are removed during clean > extra-y += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern piggy.lz4 \ > ^^^^^^^^^ > - lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs) > + lib1funcs.S ashldi3.S bswapsdi2.S $(libfdt) $(libfdt_hdrs) > > the underlined bit - piggy.lz4 for those who read mail with proportional > fonts. > > That is not in any kernel I have, and if it _is_ something that is > destined for the next merge window, it should be in my tree as it's > a core ARM feature, not in some random other tree. That is commit d8a6bf1b25bd ("arm: add support for LZ4-compressed kernel") from next-20130606 from the akpm tree. (adding author cc) That patch was cc'd to you, and is part of a series that adds LZ4 compression to the kernel, so would not work on its own. The first patch in the series is "decompressor: add LZ4 decompressor module". > Short of hand-editing and manually applying the patch, a solution would > be to rebase it on a mainline kernel version, like -rc4, and resubmit > that version instead. That will ultimately then give sfr a conflict > which should be trivial to resolve - and hopefully we'll find out who's > carrying the LZ4 patch and putting it into linux-next. People should *never, ever* submit patches based on linux-next (unless, of course they are to me to help fix merge conflicts in linux-next, etc). Patches submitted to a particular maintainer should be based on (an ancestor of) that maintainer's current tree. Sure, test new code before and after merging linux-next, but don;t base new code on it.
On Thu, 23 May 2013, Kim Phillips wrote: > Enable the compiler intrinsic for byte swapping on arch ARM. This > allows the compiler to detect and be able to optimize out byte > swappings, and has a very modest benefit on vmlinux size (Linaro gcc > 4.8): > > text data bss dec hex filename > 2840310 123932 61960 3026202 2e2d1a vmlinux-lart #orig > 2840152 123932 61960 3026044 2e2c7c vmlinux-lart #builtin-bswap > > 6473120 314840 5616016 12403976 bd4508 vmlinux-mxs #orig > 6472586 314848 5616016 12403450 bd42fa vmlinux-mxs #builtin-bswap > > 7419872 318372 379556 8117800 7bde28 vmlinux-imx_v6_v7 #orig > 7419170 318364 379556 8117090 7bdb62 vmlinux-imx_v6_v7 #builtin-bswap > > Signed-off-by: Kim Phillips <kim.phillips@freescale.com> > Reviewed-by: Nicolas Pitre <nico@linaro.org> > Acked-by: David Woodhouse <David.Woodhouse@intel.com> Did this ever go somewhere? Russell suggested at the time to base it against a mainline kernel (since it was patching files which apparently were already patched with out-of-tree lz4 patches) and send it to his patch system. > --- > resending as v6 appears to have fallen though the cracks. Russell? > > v7: rebased onto next-20130521, re-ran above vmlinux sizes with > Linaro gcc 4.8, added Nicolas' Reviewed-by, and David's Acked-by. > v6 and prior version information: > https://lkml.org/lkml/2013/2/22/475 > > arch/arm/Kconfig | 1 + > arch/arm/boot/compressed/Makefile | 15 +++++++++++---- > arch/arm/kernel/armksyms.c | 4 ++++ > arch/arm/lib/Makefile | 2 +- > arch/arm/lib/bswapsdi2.S | 36 ++++++++++++++++++++++++++++++++++++ > 5 files changed, 53 insertions(+), 5 deletions(-) > create mode 100644 arch/arm/lib/bswapsdi2.S > > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig > index a7fc5ea..c2fe04d 100644 > --- a/arch/arm/Kconfig > +++ b/arch/arm/Kconfig > @@ -63,6 +63,7 @@ config ARM > select OLD_SIGSUSPEND3 > select OLD_SIGACTION > select HAVE_CONTEXT_TRACKING > + select ARCH_USE_BUILTIN_BSWAP > help > The ARM series is a line of low-power-consumption RISC chip designs > licensed by ARM Ltd and targeted at embedded applications and > diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile > index 198a4ad..bd8a176 100644 > --- a/arch/arm/boot/compressed/Makefile > +++ b/arch/arm/boot/compressed/Makefile > @@ -112,12 +112,12 @@ endif > > targets := vmlinux vmlinux.lds \ > piggy.$(suffix_y) piggy.$(suffix_y).o \ > - lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S \ > - font.o font.c head.o misc.o $(OBJS) > + lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S bswapsdi2.o \ > + bswapsdi2.S font.o font.c head.o misc.o $(OBJS) > > # Make sure files are removed during clean > extra-y += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern piggy.lz4 \ > - lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs) > + lib1funcs.S ashldi3.S bswapsdi2.S $(libfdt) $(libfdt_hdrs) > > ifeq ($(CONFIG_FUNCTION_TRACER),y) > ORIG_CFLAGS := $(KBUILD_CFLAGS) > @@ -159,6 +159,12 @@ ashldi3 = $(obj)/ashldi3.o > $(obj)/ashldi3.S: $(srctree)/arch/$(SRCARCH)/lib/ashldi3.S > $(call cmd,shipped) > > +# For __bswapsi2, __bswapdi2 > +bswapsdi2 = $(obj)/bswapsdi2.o > + > +$(obj)/bswapsdi2.S: $(srctree)/arch/$(SRCARCH)/lib/bswapsdi2.S > + $(call cmd,shipped) > + > # We need to prevent any GOTOFF relocs being used with references > # to symbols in the .bss section since we cannot relocate them > # independently from the rest at run time. This can be achieved by > @@ -180,7 +186,8 @@ if [ $(words $(ZRELADDR)) -gt 1 -a "$(CONFIG_AUTO_ZRELADDR)" = "" ]; then \ > fi > > $(obj)/vmlinux: $(obj)/vmlinux.lds $(obj)/$(HEAD) $(obj)/piggy.$(suffix_y).o \ > - $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) FORCE > + $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) \ > + $(bswapsdi2) FORCE > @$(check_for_multiple_zreladdr) > $(call if_changed,ld) > @$(check_for_bad_syms) > diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c > index 60d3b73..ba578f7 100644 > --- a/arch/arm/kernel/armksyms.c > +++ b/arch/arm/kernel/armksyms.c > @@ -35,6 +35,8 @@ extern void __ucmpdi2(void); > extern void __udivsi3(void); > extern void __umodsi3(void); > extern void __do_div64(void); > +extern void __bswapsi2(void); > +extern void __bswapdi2(void); > > extern void __aeabi_idiv(void); > extern void __aeabi_idivmod(void); > @@ -114,6 +116,8 @@ EXPORT_SYMBOL(__ucmpdi2); > EXPORT_SYMBOL(__udivsi3); > EXPORT_SYMBOL(__umodsi3); > EXPORT_SYMBOL(__do_div64); > +EXPORT_SYMBOL(__bswapsi2); > +EXPORT_SYMBOL(__bswapdi2); > > #ifdef CONFIG_AEABI > EXPORT_SYMBOL(__aeabi_idiv); > diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile > index af72969..5383df7 100644 > --- a/arch/arm/lib/Makefile > +++ b/arch/arm/lib/Makefile > @@ -13,7 +13,7 @@ lib-y := backtrace.o changebit.o csumipv6.o csumpartial.o \ > ashldi3.o ashrdi3.o lshrdi3.o muldi3.o \ > ucmpdi2.o lib1funcs.o div64.o \ > io-readsb.o io-writesb.o io-readsl.o io-writesl.o \ > - call_with_stack.o > + call_with_stack.o bswapsdi2.o > > mmu-y := clear_user.o copy_page.o getuser.o putuser.o > > diff --git a/arch/arm/lib/bswapsdi2.S b/arch/arm/lib/bswapsdi2.S > new file mode 100644 > index 0000000..2ba43a0 > --- /dev/null > +++ b/arch/arm/lib/bswapsdi2.S > @@ -0,0 +1,36 @@ > +#include <linux/linkage.h> > + > +#if __LINUX_ARM_ARCH__ >= 6 > +ENTRY(__bswapsi2) > + rev r0, r0 > + bx lr > +ENDPROC(__bswapsi2) > + > +ENTRY(__bswapdi2) > + rev r3, r0 > + rev r0, r1 > + mov r1, r3 > + bx lr > +ENDPROC(__bswapdi2) > +#else > +ENTRY(__bswapsi2) > + eor r3, r0, r0, ror #16 > + mov r3, r3, lsr #8 > + bic r3, r3, #0xff00 > + eor r0, r3, r0, ror #8 > + mov pc, lr > +ENDPROC(__bswapsi2) > + > +ENTRY(__bswapdi2) > + mov ip, r1 > + eor r3, ip, ip, ror #16 > + eor r1, r0, r0, ror #16 > + mov r1, r1, lsr #8 > + mov r3, r3, lsr #8 > + bic r3, r3, #0xff00 > + bic r1, r1, #0xff00 > + eor r1, r1, r0, ror #8 > + eor r0, r3, ip, ror #8 > + mov pc, lr > +ENDPROC(__bswapdi2) > +#endif > -- > 1.8.1.5 >
On Sat, 26 Oct 2013 22:41:34 -0400 Nicolas Pitre <nico@fluxnic.net> wrote: > On Thu, 23 May 2013, Kim Phillips wrote: > > > Enable the compiler intrinsic for byte swapping on arch ARM. This > > allows the compiler to detect and be able to optimize out byte > > swappings, and has a very modest benefit on vmlinux size (Linaro gcc > > 4.8): > > > > text data bss dec hex filename > > 2840310 123932 61960 3026202 2e2d1a vmlinux-lart #orig > > 2840152 123932 61960 3026044 2e2c7c vmlinux-lart #builtin-bswap > > > > 6473120 314840 5616016 12403976 bd4508 vmlinux-mxs #orig > > 6472586 314848 5616016 12403450 bd42fa vmlinux-mxs #builtin-bswap > > > > 7419872 318372 379556 8117800 7bde28 vmlinux-imx_v6_v7 #orig > > 7419170 318364 379556 8117090 7bdb62 vmlinux-imx_v6_v7 #builtin-bswap > > > > Signed-off-by: Kim Phillips <kim.phillips@freescale.com> > > Reviewed-by: Nicolas Pitre <nico@linaro.org> > > Acked-by: David Woodhouse <David.Woodhouse@intel.com> > > Did this ever go somewhere? > > Russell suggested at the time to base it against a mainline kernel > (since it was patching files which apparently were already patched with > out-of-tree lz4 patches) and send it to his patch system. I'll re-base and send it to his patch system. Thanks, Kim
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index a7fc5ea..c2fe04d 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -63,6 +63,7 @@ config ARM select OLD_SIGSUSPEND3 select OLD_SIGACTION select HAVE_CONTEXT_TRACKING + select ARCH_USE_BUILTIN_BSWAP help The ARM series is a line of low-power-consumption RISC chip designs licensed by ARM Ltd and targeted at embedded applications and diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile index 198a4ad..bd8a176 100644 --- a/arch/arm/boot/compressed/Makefile +++ b/arch/arm/boot/compressed/Makefile @@ -112,12 +112,12 @@ endif targets := vmlinux vmlinux.lds \ piggy.$(suffix_y) piggy.$(suffix_y).o \ - lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S \ - font.o font.c head.o misc.o $(OBJS) + lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S bswapsdi2.o \ + bswapsdi2.S font.o font.c head.o misc.o $(OBJS) # Make sure files are removed during clean extra-y += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern piggy.lz4 \ - lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs) + lib1funcs.S ashldi3.S bswapsdi2.S $(libfdt) $(libfdt_hdrs) ifeq ($(CONFIG_FUNCTION_TRACER),y) ORIG_CFLAGS := $(KBUILD_CFLAGS) @@ -159,6 +159,12 @@ ashldi3 = $(obj)/ashldi3.o $(obj)/ashldi3.S: $(srctree)/arch/$(SRCARCH)/lib/ashldi3.S $(call cmd,shipped) +# For __bswapsi2, __bswapdi2 +bswapsdi2 = $(obj)/bswapsdi2.o + +$(obj)/bswapsdi2.S: $(srctree)/arch/$(SRCARCH)/lib/bswapsdi2.S + $(call cmd,shipped) + # We need to prevent any GOTOFF relocs being used with references # to symbols in the .bss section since we cannot relocate them # independently from the rest at run time. This can be achieved by @@ -180,7 +186,8 @@ if [ $(words $(ZRELADDR)) -gt 1 -a "$(CONFIG_AUTO_ZRELADDR)" = "" ]; then \ fi $(obj)/vmlinux: $(obj)/vmlinux.lds $(obj)/$(HEAD) $(obj)/piggy.$(suffix_y).o \ - $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) FORCE + $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) \ + $(bswapsdi2) FORCE @$(check_for_multiple_zreladdr) $(call if_changed,ld) @$(check_for_bad_syms) diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c index 60d3b73..ba578f7 100644 --- a/arch/arm/kernel/armksyms.c +++ b/arch/arm/kernel/armksyms.c @@ -35,6 +35,8 @@ extern void __ucmpdi2(void); extern void __udivsi3(void); extern void __umodsi3(void); extern void __do_div64(void); +extern void __bswapsi2(void); +extern void __bswapdi2(void); extern void __aeabi_idiv(void); extern void __aeabi_idivmod(void); @@ -114,6 +116,8 @@ EXPORT_SYMBOL(__ucmpdi2); EXPORT_SYMBOL(__udivsi3); EXPORT_SYMBOL(__umodsi3); EXPORT_SYMBOL(__do_div64); +EXPORT_SYMBOL(__bswapsi2); +EXPORT_SYMBOL(__bswapdi2); #ifdef CONFIG_AEABI EXPORT_SYMBOL(__aeabi_idiv); diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile index af72969..5383df7 100644 --- a/arch/arm/lib/Makefile +++ b/arch/arm/lib/Makefile @@ -13,7 +13,7 @@ lib-y := backtrace.o changebit.o csumipv6.o csumpartial.o \ ashldi3.o ashrdi3.o lshrdi3.o muldi3.o \ ucmpdi2.o lib1funcs.o div64.o \ io-readsb.o io-writesb.o io-readsl.o io-writesl.o \ - call_with_stack.o + call_with_stack.o bswapsdi2.o mmu-y := clear_user.o copy_page.o getuser.o putuser.o diff --git a/arch/arm/lib/bswapsdi2.S b/arch/arm/lib/bswapsdi2.S new file mode 100644 index 0000000..2ba43a0 --- /dev/null +++ b/arch/arm/lib/bswapsdi2.S @@ -0,0 +1,36 @@ +#include <linux/linkage.h> + +#if __LINUX_ARM_ARCH__ >= 6 +ENTRY(__bswapsi2) + rev r0, r0 + bx lr +ENDPROC(__bswapsi2) + +ENTRY(__bswapdi2) + rev r3, r0 + rev r0, r1 + mov r1, r3 + bx lr +ENDPROC(__bswapdi2) +#else +ENTRY(__bswapsi2) + eor r3, r0, r0, ror #16 + mov r3, r3, lsr #8 + bic r3, r3, #0xff00 + eor r0, r3, r0, ror #8 + mov pc, lr +ENDPROC(__bswapsi2) + +ENTRY(__bswapdi2) + mov ip, r1 + eor r3, ip, ip, ror #16 + eor r1, r0, r0, ror #16 + mov r1, r1, lsr #8 + mov r3, r3, lsr #8 + bic r3, r3, #0xff00 + bic r1, r1, #0xff00 + eor r1, r1, r0, ror #8 + eor r0, r3, ip, ror #8 + mov pc, lr +ENDPROC(__bswapdi2) +#endif