Message ID | 20170920151802.7609-1-romain.izard.pro@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hi Romain, On 20 September 2017 at 08:18, Romain Izard <romain.izard.pro@gmail.com> wrote: > For the 32-bit ARM architecture, unaligned access support is variable. > On a chip without a MMU, an unaligned access returns a rotated data word > and must be avoided. > > When a MMU is available, it can be trapped. On ARMv6 or ARMv7, it can also > be handled by the hardware, depending on the type of access instruction. > Unaligned access of 32 bits or less are corrected, while larger access > provoke a trap. > > Unfortunately, the compiler is able to merge two 32-bit access that > would generate a LDR instruction, that works on unaligned access, into a > single LDM access that will not work. This is not a common situation, > but it has been observed in the LZ4 decompression code. > > To prevent this type of optimization, it is necessary to change the > explicit accessors for unaligned addresses from those defined in the > access_ok.h header, to those defined in the packed_struct.h header. > > Add an arch-specific header to ARM, to retain other optimizations that > rely on HAVE_EFFICIENT_UNALIGNED_ACCESS, while making sure that access > that explicitly rely on the unaligned accessors are correctly handled by > the compiler. > > Signed-off-by: Romain Izard <romain.izard.pro@gmail.com> > --- > If access_ok.h has been observed to produce different output from the struct versions (using any compiler), I guess we cannot simply change the asm-generic default and expect everybody to be ok with that. So I agree this is the most appropriate course of action. With the wart below removed: Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> > This is a follow-up to this discussion: > HAVE_EFFICIENT_UNALIGNED_ACCESS on ARM32 > https://lkml.org/lkml/2017/9/4/359 > > arch/arm/include/asm/Kbuild | 1 - > arch/arm/include/asm/unaligned.h | 22 ++++++++++++++++++++++ > 2 files changed, 22 insertions(+), 1 deletion(-) > create mode 100644 arch/arm/include/asm/unaligned.h > > diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild > index 721ab5ecfb9b..0f2c8a2a8131 100644 > --- a/arch/arm/include/asm/Kbuild > +++ b/arch/arm/include/asm/Kbuild > @@ -20,7 +20,6 @@ generic-y += simd.h > generic-y += sizes.h > generic-y += timex.h > generic-y += trace_clock.h > -generic-y += unaligned.h > > generated-y += mach-types.h > generated-y += unistd-nr.h > diff --git a/arch/arm/include/asm/unaligned.h b/arch/arm/include/asm/unaligned.h > new file mode 100644 > index 000000000000..394227f24b77 > --- /dev/null > +++ b/arch/arm/include/asm/unaligned.h > @@ -0,0 +1,22 @@ > +#ifndef __ASM_ARM_UNALIGNED_H > +#define __ASM_ARM_UNALIGNED_H > + > +#include <asm/byteorder.h> > + > +#if defined(__LITTLE_ENDIAN) > +#include <linux/unaligned/le_struct.h> > +#include <linux/unaligned/be_byteshift.h> > +#include <linux/unaligned/generic.h> > +#define get_unaligned __get_unaligned_le > +#define put_unaligned __put_unaligned_le > +#elif defined(__BIG_ENDIAN) > +#include <linux/unaligned/be_struct.h> > :q ^^^ > +#include <linux/unaligned/le_byteshift.h> > +#include <linux/unaligned/generic.h> > +#define get_unaligned __get_unaligned_be > +#define put_unaligned __put_unaligned_be > +#else > +#error need to define endianness > +#endif > + > +#endif /* __ASM_ARM_UNALIGNED_H */ > -- > 2.11.0
On Wed, Sep 20, 2017 at 08:26:09AM -0700, Ard Biesheuvel wrote: > Hi Romain, > > On 20 September 2017 at 08:18, Romain Izard <romain.izard.pro@gmail.com> wrote: > > For the 32-bit ARM architecture, unaligned access support is variable. > > On a chip without a MMU, an unaligned access returns a rotated data word > > and must be avoided. > > > > When a MMU is available, it can be trapped. On ARMv6 or ARMv7, it can also > > be handled by the hardware, depending on the type of access instruction. > > Unaligned access of 32 bits or less are corrected, while larger access > > provoke a trap. > > > > Unfortunately, the compiler is able to merge two 32-bit access that > > would generate a LDR instruction, that works on unaligned access, into a > > single LDM access that will not work. This is not a common situation, > > but it has been observed in the LZ4 decompression code. > > > > To prevent this type of optimization, it is necessary to change the > > explicit accessors for unaligned addresses from those defined in the > > access_ok.h header, to those defined in the packed_struct.h header. > > > > Add an arch-specific header to ARM, to retain other optimizations that > > rely on HAVE_EFFICIENT_UNALIGNED_ACCESS, while making sure that access > > that explicitly rely on the unaligned accessors are correctly handled by > > the compiler. > > > > Signed-off-by: Romain Izard <romain.izard.pro@gmail.com> > > --- > > > > If access_ok.h has been observed to produce different output from the > struct versions (using any compiler), I guess we cannot simply change > the asm-generic default and expect everybody to be ok with that. So I > agree this is the most appropriate course of action. However, what effect does this have on the code generated for the rest of the kernel?
On 20 September 2017 at 08:41, Russell King - ARM Linux <linux@armlinux.org.uk> wrote: > On Wed, Sep 20, 2017 at 08:26:09AM -0700, Ard Biesheuvel wrote: >> Hi Romain, >> >> On 20 September 2017 at 08:18, Romain Izard <romain.izard.pro@gmail.com> wrote: >> > For the 32-bit ARM architecture, unaligned access support is variable. >> > On a chip without a MMU, an unaligned access returns a rotated data word >> > and must be avoided. >> > >> > When a MMU is available, it can be trapped. On ARMv6 or ARMv7, it can also >> > be handled by the hardware, depending on the type of access instruction. >> > Unaligned access of 32 bits or less are corrected, while larger access >> > provoke a trap. >> > >> > Unfortunately, the compiler is able to merge two 32-bit access that >> > would generate a LDR instruction, that works on unaligned access, into a >> > single LDM access that will not work. This is not a common situation, >> > but it has been observed in the LZ4 decompression code. >> > >> > To prevent this type of optimization, it is necessary to change the >> > explicit accessors for unaligned addresses from those defined in the >> > access_ok.h header, to those defined in the packed_struct.h header. >> > >> > Add an arch-specific header to ARM, to retain other optimizations that >> > rely on HAVE_EFFICIENT_UNALIGNED_ACCESS, while making sure that access >> > that explicitly rely on the unaligned accessors are correctly handled by >> > the compiler. >> > >> > Signed-off-by: Romain Izard <romain.izard.pro@gmail.com> >> > --- >> > >> >> If access_ok.h has been observed to produce different output from the >> struct versions (using any compiler), I guess we cannot simply change >> the asm-generic default and expect everybody to be ok with that. So I >> agree this is the most appropriate course of action. > > However, what effect does this have on the code generated for the rest > of the kernel? > On v6/v7, we may currently rely on alignment fixups for unaligned ldm/stm instructions in the core kernel, and so avoiding those sounds like an improvement to me. The 'struct' unaligned accessor type looks like the sweet spot for ARM, since it will prevent the compiler from using ldm/stm/ldrd/etc on quantities that may be unaligned, but does not fall back to byte accesses on CPUs that can perform unaligned accesses in hardware when using ldr/str. I guess you may be concerned about performance regressions for scenarios where the alignment fixup is only rarely triggered? Do you have anything specific in mind?
Hi Romain, On 20/09/17 16:18, Romain Izard wrote: > For the 32-bit ARM architecture, unaligned access support is variable. > On a chip without a MMU, an unaligned access returns a rotated data word > and must be avoided. Nit: that sentence is not really true - there are CPUs without MMUs that still support alignment checking (e.g. ARM1156), and more importantly the rotation thing is only true for LDR/LDRH - LDM always just ignores the bottom two bits of the address, and LDRD doesn't make any guarantee at all of what you might get back (and stores are even worse). I think it suffices to say that in the absence of alignment checking, the results of unaligned accesses are inconsistent and impossible to rely upon. Robin. > When a MMU is available, it can be trapped. On ARMv6 or ARMv7, it can also > be handled by the hardware, depending on the type of access instruction. > Unaligned access of 32 bits or less are corrected, while larger access > provoke a trap. > > Unfortunately, the compiler is able to merge two 32-bit access that > would generate a LDR instruction, that works on unaligned access, into a > single LDM access that will not work. This is not a common situation, > but it has been observed in the LZ4 decompression code. > > To prevent this type of optimization, it is necessary to change the > explicit accessors for unaligned addresses from those defined in the > access_ok.h header, to those defined in the packed_struct.h header. > > Add an arch-specific header to ARM, to retain other optimizations that > rely on HAVE_EFFICIENT_UNALIGNED_ACCESS, while making sure that access > that explicitly rely on the unaligned accessors are correctly handled by > the compiler. > > Signed-off-by: Romain Izard <romain.izard.pro@gmail.com> > --- > > This is a follow-up to this discussion: > HAVE_EFFICIENT_UNALIGNED_ACCESS on ARM32 > https://lkml.org/lkml/2017/9/4/359 > > arch/arm/include/asm/Kbuild | 1 - > arch/arm/include/asm/unaligned.h | 22 ++++++++++++++++++++++ > 2 files changed, 22 insertions(+), 1 deletion(-) > create mode 100644 arch/arm/include/asm/unaligned.h > > diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild > index 721ab5ecfb9b..0f2c8a2a8131 100644 > --- a/arch/arm/include/asm/Kbuild > +++ b/arch/arm/include/asm/Kbuild > @@ -20,7 +20,6 @@ generic-y += simd.h > generic-y += sizes.h > generic-y += timex.h > generic-y += trace_clock.h > -generic-y += unaligned.h > > generated-y += mach-types.h > generated-y += unistd-nr.h > diff --git a/arch/arm/include/asm/unaligned.h b/arch/arm/include/asm/unaligned.h > new file mode 100644 > index 000000000000..394227f24b77 > --- /dev/null > +++ b/arch/arm/include/asm/unaligned.h > @@ -0,0 +1,22 @@ > +#ifndef __ASM_ARM_UNALIGNED_H > +#define __ASM_ARM_UNALIGNED_H > + > +#include <asm/byteorder.h> > + > +#if defined(__LITTLE_ENDIAN) > +#include <linux/unaligned/le_struct.h> > +#include <linux/unaligned/be_byteshift.h> > +#include <linux/unaligned/generic.h> > +#define get_unaligned __get_unaligned_le > +#define put_unaligned __put_unaligned_le > +#elif defined(__BIG_ENDIAN) > +#include <linux/unaligned/be_struct.h> > :q > +#include <linux/unaligned/le_byteshift.h> > +#include <linux/unaligned/generic.h> > +#define get_unaligned __get_unaligned_be > +#define put_unaligned __put_unaligned_be > +#else > +#error need to define endianness > +#endif > + > +#endif /* __ASM_ARM_UNALIGNED_H */ >
On Wed, Sep 20, 2017 at 5:26 PM, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > On 20 September 2017 at 08:18, Romain Izard <romain.izard.pro@gmail.com> wrote: >> Add an arch-specific header to ARM, to retain other optimizations that >> rely on HAVE_EFFICIENT_UNALIGNED_ACCESS, while making sure that access >> that explicitly rely on the unaligned accessors are correctly handled by >> the compiler. >> >> Signed-off-by: Romain Izard <romain.izard.pro@gmail.com> >> --- >> > > If access_ok.h has been observed to produce different output from the > struct versions (using any compiler), I guess we cannot simply change > the asm-generic default and expect everybody to be ok with that. So I > agree this is the most appropriate course of action. But is that actually the case? I think patching ARM like this is correct, but perhaps we can simply remove the access_ok.h version entirely and just use the struct version everywhere. Unfortunately it's been almost 10 years since the various implementations got introduced, so it's hard to find out all the concerns that went into it then. Commit 064106a91be5 ("kernel: add common infrastructure for unaligned access") lists Currently there are five implementations: 1) packed_struct.h: C-struct based, from asm-generic/unaligned.h 2) le_byteshift.h: Open coded byte-swapping, heavily based on asm-arm 3) be_byteshift.h: Open coded byte-swapping, heavily based on asm-arm 4) memmove.h: taken from multiple implementations in tree 5) access_ok.h: taken from x86 and others, unaligned access is ok. The only architectectures that use memmove.h are m32r (always has, probably not intentionally) and openrisc. That one says that GCC on OR32 "optimizes too aggressively" for struct.h, which is a bit scary but wouldn't change anything here since they don't use the asm-generic file. The architectures that do use include/asm-generic/unaligned.h and also set HAVE_EFFICIENT_UNALIGNED_ACCESS in some configurations are arm, arm64, metag, s390 and arc. This is a rather short list, and three of them (arm64, metag and arc) only support very recent compilers, so we can probably just ask the respective arch maintainers to ack the patch that changes the asm-generic file for everyone. Arnd
On 20 September 2017 at 13:35, Arnd Bergmann <arnd@arndb.de> wrote: > On Wed, Sep 20, 2017 at 5:26 PM, Ard Biesheuvel > <ard.biesheuvel@linaro.org> wrote: >> On 20 September 2017 at 08:18, Romain Izard <romain.izard.pro@gmail.com> wrote: >>> Add an arch-specific header to ARM, to retain other optimizations that >>> rely on HAVE_EFFICIENT_UNALIGNED_ACCESS, while making sure that access >>> that explicitly rely on the unaligned accessors are correctly handled by >>> the compiler. >>> >>> Signed-off-by: Romain Izard <romain.izard.pro@gmail.com> >>> --- >>> >> >> If access_ok.h has been observed to produce different output from the >> struct versions (using any compiler), I guess we cannot simply change >> the asm-generic default and expect everybody to be ok with that. So I >> agree this is the most appropriate course of action. > > But is that actually the case? I think patching ARM like this is > correct, but perhaps we can simply remove the access_ok.h > version entirely and just use the struct version everywhere. > > Unfortunately it's been almost 10 years since the various > implementations got introduced, so it's hard to find out all > the concerns that went into it then. > > Commit 064106a91be5 ("kernel: add common > infrastructure for unaligned access") lists > > Currently there are five implementations: > 1) packed_struct.h: C-struct based, from asm-generic/unaligned.h > 2) le_byteshift.h: Open coded byte-swapping, heavily based on asm-arm > 3) be_byteshift.h: Open coded byte-swapping, heavily based on asm-arm > 4) memmove.h: taken from multiple implementations in tree > 5) access_ok.h: taken from x86 and others, unaligned access is ok. > > The only architectectures that use memmove.h are m32r (always has, > probably not intentionally) and openrisc. That one says that GCC on > OR32 "optimizes too aggressively" for struct.h, which is a bit scary > but wouldn't change anything here since they don't use the asm-generic > file. > > The architectures that do use include/asm-generic/unaligned.h and > also set HAVE_EFFICIENT_UNALIGNED_ACCESS in some configurations > are arm, arm64, metag, s390 and arc. > > This is a rather short list, and three of them (arm64, metag and arc) only > support very recent compilers, so we can probably just ask the respective > arch maintainers to ack the patch that changes the asm-generic file > for everyone. > If we can limit the fallout like that, I agree that we should simply make the struct flavor the default. It elegantly informs the compiler about the size of the access and the potential misalignment, so it should allow compilers for any architecture to select the most appropriate instruction. But doesn't that mean that any code that currently relies on HAVE_EFFICIENT_UNALIGNED_ACCESS should be using get_unaligned instead? I haven't reviewed the actual use cases (other than the ones I added myself to the crypto subsystem), but it seems to me that it is generally unsafe to do any unaligned accesses directly on ARM, given that the compiler may merge adjacent LDRs into LDMs or LDRDs (and likewise for stores)
On Fri, Sep 22, 2017 at 11:36 PM, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > On 20 September 2017 at 13:35, Arnd Bergmann <arnd@arndb.de> wrote: >> The architectures that do use include/asm-generic/unaligned.h and >> also set HAVE_EFFICIENT_UNALIGNED_ACCESS in some configurations >> are arm, arm64, metag, s390 and arc. >> >> This is a rather short list, and three of them (arm64, metag and arc) only >> support very recent compilers, so we can probably just ask the respective >> arch maintainers to ack the patch that changes the asm-generic file >> for everyone. >> > > If we can limit the fallout like that, I agree that we should simply > make the struct flavor the default. It elegantly informs the compiler > about the size of the access and the potential misalignment, so it > should allow compilers for any architecture to select the most > appropriate instruction. > > But doesn't that mean that any code that currently relies on > HAVE_EFFICIENT_UNALIGNED_ACCESS should be using get_unaligned instead? > I haven't reviewed the actual use cases (other than the ones I added > myself to the crypto subsystem), but it seems to me that it is > generally unsafe to do any unaligned accesses directly on ARM, given > that the compiler may merge adjacent LDRs into LDMs or LDRDs (and > likewise for stores) It's not clear that all that code should be using get_unaligned(), but I agree that any code relying on HAVE_EFFICIENT_UNALIGNED_ACCESS is potentially inefficient on ARM when it causes a trap. I think it would be a useful goal to avoid running into the ARM alignment trap handler entirely for kernel code, but that sounds like a lot of work. If we want to do that, we'd need at least these steps: - review each reference to HAVE_EFFICIENT_UNALIGNED_ACCESS and modify it so that gcc will never use the trapping instructions - add a WARN_ON_ONCE() in the trap handler - fix any drivers we run into that should be using get_unaligned() but blindly rely on the trap handler instead. Arnd
diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild index 721ab5ecfb9b..0f2c8a2a8131 100644 --- a/arch/arm/include/asm/Kbuild +++ b/arch/arm/include/asm/Kbuild @@ -20,7 +20,6 @@ generic-y += simd.h generic-y += sizes.h generic-y += timex.h generic-y += trace_clock.h -generic-y += unaligned.h generated-y += mach-types.h generated-y += unistd-nr.h diff --git a/arch/arm/include/asm/unaligned.h b/arch/arm/include/asm/unaligned.h new file mode 100644 index 000000000000..394227f24b77 --- /dev/null +++ b/arch/arm/include/asm/unaligned.h @@ -0,0 +1,22 @@ +#ifndef __ASM_ARM_UNALIGNED_H +#define __ASM_ARM_UNALIGNED_H + +#include <asm/byteorder.h> + +#if defined(__LITTLE_ENDIAN) +#include <linux/unaligned/le_struct.h> +#include <linux/unaligned/be_byteshift.h> +#include <linux/unaligned/generic.h> +#define get_unaligned __get_unaligned_le +#define put_unaligned __put_unaligned_le +#elif defined(__BIG_ENDIAN) +#include <linux/unaligned/be_struct.h> :q +#include <linux/unaligned/le_byteshift.h> +#include <linux/unaligned/generic.h> +#define get_unaligned __get_unaligned_be +#define put_unaligned __put_unaligned_be +#else +#error need to define endianness +#endif +
For the 32-bit ARM architecture, unaligned access support is variable. On a chip without a MMU, an unaligned access returns a rotated data word and must be avoided. When a MMU is available, it can be trapped. On ARMv6 or ARMv7, it can also be handled by the hardware, depending on the type of access instruction. Unaligned access of 32 bits or less are corrected, while larger access provoke a trap. Unfortunately, the compiler is able to merge two 32-bit access that would generate a LDR instruction, that works on unaligned access, into a single LDM access that will not work. This is not a common situation, but it has been observed in the LZ4 decompression code. To prevent this type of optimization, it is necessary to change the explicit accessors for unaligned addresses from those defined in the access_ok.h header, to those defined in the packed_struct.h header. Add an arch-specific header to ARM, to retain other optimizations that rely on HAVE_EFFICIENT_UNALIGNED_ACCESS, while making sure that access that explicitly rely on the unaligned accessors are correctly handled by the compiler. Signed-off-by: Romain Izard <romain.izard.pro@gmail.com> --- This is a follow-up to this discussion: HAVE_EFFICIENT_UNALIGNED_ACCESS on ARM32 https://lkml.org/lkml/2017/9/4/359 arch/arm/include/asm/Kbuild | 1 - arch/arm/include/asm/unaligned.h | 22 ++++++++++++++++++++++ 2 files changed, 22 insertions(+), 1 deletion(-) create mode 100644 arch/arm/include/asm/unaligned.h +#endif /* __ASM_ARM_UNALIGNED_H */