ARM: unaligned.h: Use an arch-specific version

Message ID	20170920151802.7609-1-romain.izard.pro@gmail.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org> From: Romain Izard <romain.izard.pro@gmail.com> To: Russell King <linux@armlinux.org.uk>, Al Viro <viro@zeniv.linux.org.uk>, Ard Biesheuvel <ard.biesheuvel@linaro.org>, Arnd Bergmann <arnd@arndb.de> Subject: [PATCH] ARM: unaligned.h: Use an arch-specific version Date: Wed, 20 Sep 2017 17:18:02 +0200 Message-Id: <20170920151802.7609-1-romain.izard.pro@gmail.com> Precedence: list Cc: Romain Izard <romain.izard.pro@gmail.com>, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org> Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org

Romain Izard Sept. 20, 2017, 3:18 p.m. UTC

For the 32-bit ARM architecture, unaligned access support is variable.
On a chip without a MMU, an unaligned access returns a rotated data word
and must be avoided.

When a MMU is available, it can be trapped. On ARMv6 or ARMv7, it can also
be handled by the hardware, depending on the type of access instruction.
Unaligned access of 32 bits or less are corrected, while larger access
provoke a trap.

Unfortunately, the compiler is able to merge two 32-bit access that
would generate a LDR instruction, that works on unaligned access, into a
single LDM access that will not work. This is not a common situation,
but it has been observed in the LZ4 decompression code.

To prevent this type of optimization, it is necessary to change the
explicit accessors for unaligned addresses from those defined in the
access_ok.h header, to those defined in the packed_struct.h header.

Add an arch-specific header to ARM, to retain other optimizations that
rely on HAVE_EFFICIENT_UNALIGNED_ACCESS, while making sure that access
that explicitly rely on the unaligned accessors are correctly handled by
the compiler.

Signed-off-by: Romain Izard <romain.izard.pro@gmail.com>
---

This is a follow-up to this discussion:
HAVE_EFFICIENT_UNALIGNED_ACCESS on ARM32
https://lkml.org/lkml/2017/9/4/359

 arch/arm/include/asm/Kbuild      |  1 -
 arch/arm/include/asm/unaligned.h | 22 ++++++++++++++++++++++
 2 files changed, 22 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm/include/asm/unaligned.h

+#endif /* __ASM_ARM_UNALIGNED_H */

Ard Biesheuvel Sept. 20, 2017, 3:26 p.m. UTC | #1

Hi Romain,

On 20 September 2017 at 08:18, Romain Izard <romain.izard.pro@gmail.com> wrote:
> For the 32-bit ARM architecture, unaligned access support is variable.
> On a chip without a MMU, an unaligned access returns a rotated data word
> and must be avoided.
>
> When a MMU is available, it can be trapped. On ARMv6 or ARMv7, it can also
> be handled by the hardware, depending on the type of access instruction.
> Unaligned access of 32 bits or less are corrected, while larger access
> provoke a trap.
>
> Unfortunately, the compiler is able to merge two 32-bit access that
> would generate a LDR instruction, that works on unaligned access, into a
> single LDM access that will not work. This is not a common situation,
> but it has been observed in the LZ4 decompression code.
>
> To prevent this type of optimization, it is necessary to change the
> explicit accessors for unaligned addresses from those defined in the
> access_ok.h header, to those defined in the packed_struct.h header.
>
> Add an arch-specific header to ARM, to retain other optimizations that
> rely on HAVE_EFFICIENT_UNALIGNED_ACCESS, while making sure that access
> that explicitly rely on the unaligned accessors are correctly handled by
> the compiler.
>
> Signed-off-by: Romain Izard <romain.izard.pro@gmail.com>
> ---
>

If access_ok.h has been observed to produce different output from the
struct versions (using any compiler), I guess we cannot simply change
the asm-generic default and expect everybody to be ok with that. So I
agree this is the most appropriate course of action.

With the wart below removed:
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

> This is a follow-up to this discussion:
> HAVE_EFFICIENT_UNALIGNED_ACCESS on ARM32
> https://lkml.org/lkml/2017/9/4/359
>
>  arch/arm/include/asm/Kbuild      |  1 -
>  arch/arm/include/asm/unaligned.h | 22 ++++++++++++++++++++++
>  2 files changed, 22 insertions(+), 1 deletion(-)
>  create mode 100644 arch/arm/include/asm/unaligned.h
>
> diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild
> index 721ab5ecfb9b..0f2c8a2a8131 100644
> --- a/arch/arm/include/asm/Kbuild
> +++ b/arch/arm/include/asm/Kbuild
> @@ -20,7 +20,6 @@ generic-y += simd.h
>  generic-y += sizes.h
>  generic-y += timex.h
>  generic-y += trace_clock.h
> -generic-y += unaligned.h
>
>  generated-y += mach-types.h
>  generated-y += unistd-nr.h
> diff --git a/arch/arm/include/asm/unaligned.h b/arch/arm/include/asm/unaligned.h
> new file mode 100644
> index 000000000000..394227f24b77
> --- /dev/null
> +++ b/arch/arm/include/asm/unaligned.h
> @@ -0,0 +1,22 @@
> +#ifndef __ASM_ARM_UNALIGNED_H
> +#define __ASM_ARM_UNALIGNED_H
> +
> +#include <asm/byteorder.h>
> +
> +#if defined(__LITTLE_ENDIAN)
> +#include <linux/unaligned/le_struct.h>
> +#include <linux/unaligned/be_byteshift.h>
> +#include <linux/unaligned/generic.h>
> +#define get_unaligned  __get_unaligned_le
> +#define put_unaligned  __put_unaligned_le
> +#elif defined(__BIG_ENDIAN)
> +#include <linux/unaligned/be_struct.h>
> :q

^^^

> +#include <linux/unaligned/le_byteshift.h>
> +#include <linux/unaligned/generic.h>
> +#define get_unaligned  __get_unaligned_be
> +#define put_unaligned  __put_unaligned_be
> +#else
> +#error need to define endianness
> +#endif
> +
> +#endif /* __ASM_ARM_UNALIGNED_H */
> --
> 2.11.0

Russell King (Oracle) Sept. 20, 2017, 3:41 p.m. UTC | #2

On Wed, Sep 20, 2017 at 08:26:09AM -0700, Ard Biesheuvel wrote:
> Hi Romain,
> 
> On 20 September 2017 at 08:18, Romain Izard <romain.izard.pro@gmail.com> wrote:
> > For the 32-bit ARM architecture, unaligned access support is variable.
> > On a chip without a MMU, an unaligned access returns a rotated data word
> > and must be avoided.
> >
> > When a MMU is available, it can be trapped. On ARMv6 or ARMv7, it can also
> > be handled by the hardware, depending on the type of access instruction.
> > Unaligned access of 32 bits or less are corrected, while larger access
> > provoke a trap.
> >
> > Unfortunately, the compiler is able to merge two 32-bit access that
> > would generate a LDR instruction, that works on unaligned access, into a
> > single LDM access that will not work. This is not a common situation,
> > but it has been observed in the LZ4 decompression code.
> >
> > To prevent this type of optimization, it is necessary to change the
> > explicit accessors for unaligned addresses from those defined in the
> > access_ok.h header, to those defined in the packed_struct.h header.
> >
> > Add an arch-specific header to ARM, to retain other optimizations that
> > rely on HAVE_EFFICIENT_UNALIGNED_ACCESS, while making sure that access
> > that explicitly rely on the unaligned accessors are correctly handled by
> > the compiler.
> >
> > Signed-off-by: Romain Izard <romain.izard.pro@gmail.com>
> > ---
> >
> 
> If access_ok.h has been observed to produce different output from the
> struct versions (using any compiler), I guess we cannot simply change
> the asm-generic default and expect everybody to be ok with that. So I
> agree this is the most appropriate course of action.

However, what effect does this have on the code generated for the rest
of the kernel?

Ard Biesheuvel Sept. 20, 2017, 4:28 p.m. UTC | #3

On 20 September 2017 at 08:41, Russell King - ARM Linux
<linux@armlinux.org.uk> wrote:
> On Wed, Sep 20, 2017 at 08:26:09AM -0700, Ard Biesheuvel wrote:
>> Hi Romain,
>>
>> On 20 September 2017 at 08:18, Romain Izard <romain.izard.pro@gmail.com> wrote:
>> > For the 32-bit ARM architecture, unaligned access support is variable.
>> > On a chip without a MMU, an unaligned access returns a rotated data word
>> > and must be avoided.
>> >
>> > When a MMU is available, it can be trapped. On ARMv6 or ARMv7, it can also
>> > be handled by the hardware, depending on the type of access instruction.
>> > Unaligned access of 32 bits or less are corrected, while larger access
>> > provoke a trap.
>> >
>> > Unfortunately, the compiler is able to merge two 32-bit access that
>> > would generate a LDR instruction, that works on unaligned access, into a
>> > single LDM access that will not work. This is not a common situation,
>> > but it has been observed in the LZ4 decompression code.
>> >
>> > To prevent this type of optimization, it is necessary to change the
>> > explicit accessors for unaligned addresses from those defined in the
>> > access_ok.h header, to those defined in the packed_struct.h header.
>> >
>> > Add an arch-specific header to ARM, to retain other optimizations that
>> > rely on HAVE_EFFICIENT_UNALIGNED_ACCESS, while making sure that access
>> > that explicitly rely on the unaligned accessors are correctly handled by
>> > the compiler.
>> >
>> > Signed-off-by: Romain Izard <romain.izard.pro@gmail.com>
>> > ---
>> >
>>
>> If access_ok.h has been observed to produce different output from the
>> struct versions (using any compiler), I guess we cannot simply change
>> the asm-generic default and expect everybody to be ok with that. So I
>> agree this is the most appropriate course of action.
>
> However, what effect does this have on the code generated for the rest
> of the kernel?
>

On v6/v7, we may currently rely on alignment fixups for unaligned
ldm/stm instructions in the core kernel, and so avoiding those sounds
like an improvement to me. The 'struct' unaligned accessor type looks
like the sweet spot for ARM, since it will prevent the compiler from
using ldm/stm/ldrd/etc on quantities that may be unaligned, but does
not fall back to byte accesses on CPUs that can perform unaligned
accesses in hardware when using ldr/str.

I guess you may be concerned about performance regressions for
scenarios where the alignment fixup is only rarely triggered? Do you
have anything specific in mind?

Robin Murphy Sept. 20, 2017, 5:16 p.m. UTC | #4

Hi Romain,

On 20/09/17 16:18, Romain Izard wrote:
> For the 32-bit ARM architecture, unaligned access support is variable.
> On a chip without a MMU, an unaligned access returns a rotated data word
> and must be avoided.

Nit: that sentence is not really true - there are CPUs without MMUs that
still support alignment checking (e.g. ARM1156), and more importantly
the rotation thing is only true for LDR/LDRH - LDM always just ignores
the bottom two bits of the address, and LDRD doesn't make any guarantee
at all of what you might get back (and stores are even worse). I think
it suffices to say that in the absence of alignment checking, the
results of unaligned accesses are inconsistent and impossible to rely upon.

Robin.

> When a MMU is available, it can be trapped. On ARMv6 or ARMv7, it can also
> be handled by the hardware, depending on the type of access instruction.
> Unaligned access of 32 bits or less are corrected, while larger access
> provoke a trap.
> 
> Unfortunately, the compiler is able to merge two 32-bit access that
> would generate a LDR instruction, that works on unaligned access, into a
> single LDM access that will not work. This is not a common situation,
> but it has been observed in the LZ4 decompression code.
> 
> To prevent this type of optimization, it is necessary to change the
> explicit accessors for unaligned addresses from those defined in the
> access_ok.h header, to those defined in the packed_struct.h header.
> 
> Add an arch-specific header to ARM, to retain other optimizations that
> rely on HAVE_EFFICIENT_UNALIGNED_ACCESS, while making sure that access
> that explicitly rely on the unaligned accessors are correctly handled by
> the compiler.
> 
> Signed-off-by: Romain Izard <romain.izard.pro@gmail.com>
> ---
> 
> This is a follow-up to this discussion:
> HAVE_EFFICIENT_UNALIGNED_ACCESS on ARM32
> https://lkml.org/lkml/2017/9/4/359
> 
>  arch/arm/include/asm/Kbuild      |  1 -
>  arch/arm/include/asm/unaligned.h | 22 ++++++++++++++++++++++
>  2 files changed, 22 insertions(+), 1 deletion(-)
>  create mode 100644 arch/arm/include/asm/unaligned.h
> 
> diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild
> index 721ab5ecfb9b..0f2c8a2a8131 100644
> --- a/arch/arm/include/asm/Kbuild
> +++ b/arch/arm/include/asm/Kbuild
> @@ -20,7 +20,6 @@ generic-y += simd.h
>  generic-y += sizes.h
>  generic-y += timex.h
>  generic-y += trace_clock.h
> -generic-y += unaligned.h
>  
>  generated-y += mach-types.h
>  generated-y += unistd-nr.h
> diff --git a/arch/arm/include/asm/unaligned.h b/arch/arm/include/asm/unaligned.h
> new file mode 100644
> index 000000000000..394227f24b77
> --- /dev/null
> +++ b/arch/arm/include/asm/unaligned.h
> @@ -0,0 +1,22 @@
> +#ifndef __ASM_ARM_UNALIGNED_H
> +#define __ASM_ARM_UNALIGNED_H
> +
> +#include <asm/byteorder.h>
> +
> +#if defined(__LITTLE_ENDIAN)
> +#include <linux/unaligned/le_struct.h>
> +#include <linux/unaligned/be_byteshift.h>
> +#include <linux/unaligned/generic.h>
> +#define get_unaligned	__get_unaligned_le
> +#define put_unaligned	__put_unaligned_le
> +#elif defined(__BIG_ENDIAN)
> +#include <linux/unaligned/be_struct.h>
> :q
> +#include <linux/unaligned/le_byteshift.h>
> +#include <linux/unaligned/generic.h>
> +#define get_unaligned	__get_unaligned_be
> +#define put_unaligned	__put_unaligned_be
> +#else
> +#error need to define endianness
> +#endif
> +
> +#endif /* __ASM_ARM_UNALIGNED_H */
>

Arnd Bergmann Sept. 20, 2017, 8:35 p.m. UTC | #5

On Wed, Sep 20, 2017 at 5:26 PM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> On 20 September 2017 at 08:18, Romain Izard <romain.izard.pro@gmail.com> wrote:
>> Add an arch-specific header to ARM, to retain other optimizations that
>> rely on HAVE_EFFICIENT_UNALIGNED_ACCESS, while making sure that access
>> that explicitly rely on the unaligned accessors are correctly handled by
>> the compiler.
>>
>> Signed-off-by: Romain Izard <romain.izard.pro@gmail.com>
>> ---
>>
>
> If access_ok.h has been observed to produce different output from the
> struct versions (using any compiler), I guess we cannot simply change
> the asm-generic default and expect everybody to be ok with that. So I
> agree this is the most appropriate course of action.

But is that actually the case? I think patching ARM like this is
correct, but perhaps we can simply remove the access_ok.h
version entirely and just use the struct version everywhere.

Unfortunately it's been almost 10 years since the various
implementations got introduced, so it's hard to find out all
the concerns that went into it then.

Commit 064106a91be5 ("kernel: add common
infrastructure for unaligned access") lists

    Currently there are five implementations:
    1) packed_struct.h: C-struct based, from asm-generic/unaligned.h
    2) le_byteshift.h: Open coded byte-swapping, heavily based on asm-arm
    3) be_byteshift.h: Open coded byte-swapping, heavily based on asm-arm
    4) memmove.h: taken from multiple implementations in tree
    5) access_ok.h: taken from x86 and others, unaligned access is ok.

The only architectectures that use memmove.h are m32r (always has,
probably not intentionally) and openrisc. That one says that GCC on
OR32 "optimizes too aggressively" for struct.h, which is a bit scary
but wouldn't change anything here since they don't use the asm-generic
file.

The architectures that do use include/asm-generic/unaligned.h and
also set HAVE_EFFICIENT_UNALIGNED_ACCESS in some configurations
are arm, arm64, metag, s390 and arc.

This is a rather short list, and three of them (arm64, metag and arc) only
support very recent compilers, so we can probably just ask the respective
arch maintainers to ack the patch that changes the asm-generic file
for everyone.

        Arnd

Ard Biesheuvel Sept. 22, 2017, 9:36 p.m. UTC | #6

On 20 September 2017 at 13:35, Arnd Bergmann <arnd@arndb.de> wrote:
> On Wed, Sep 20, 2017 at 5:26 PM, Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
>> On 20 September 2017 at 08:18, Romain Izard <romain.izard.pro@gmail.com> wrote:
>>> Add an arch-specific header to ARM, to retain other optimizations that
>>> rely on HAVE_EFFICIENT_UNALIGNED_ACCESS, while making sure that access
>>> that explicitly rely on the unaligned accessors are correctly handled by
>>> the compiler.
>>>
>>> Signed-off-by: Romain Izard <romain.izard.pro@gmail.com>
>>> ---
>>>
>>
>> If access_ok.h has been observed to produce different output from the
>> struct versions (using any compiler), I guess we cannot simply change
>> the asm-generic default and expect everybody to be ok with that. So I
>> agree this is the most appropriate course of action.
>
> But is that actually the case? I think patching ARM like this is
> correct, but perhaps we can simply remove the access_ok.h
> version entirely and just use the struct version everywhere.
>
> Unfortunately it's been almost 10 years since the various
> implementations got introduced, so it's hard to find out all
> the concerns that went into it then.
>
> Commit 064106a91be5 ("kernel: add common
> infrastructure for unaligned access") lists
>
>     Currently there are five implementations:
>     1) packed_struct.h: C-struct based, from asm-generic/unaligned.h
>     2) le_byteshift.h: Open coded byte-swapping, heavily based on asm-arm
>     3) be_byteshift.h: Open coded byte-swapping, heavily based on asm-arm
>     4) memmove.h: taken from multiple implementations in tree
>     5) access_ok.h: taken from x86 and others, unaligned access is ok.
>
> The only architectectures that use memmove.h are m32r (always has,
> probably not intentionally) and openrisc. That one says that GCC on
> OR32 "optimizes too aggressively" for struct.h, which is a bit scary
> but wouldn't change anything here since they don't use the asm-generic
> file.
>
> The architectures that do use include/asm-generic/unaligned.h and
> also set HAVE_EFFICIENT_UNALIGNED_ACCESS in some configurations
> are arm, arm64, metag, s390 and arc.
>
> This is a rather short list, and three of them (arm64, metag and arc) only
> support very recent compilers, so we can probably just ask the respective
> arch maintainers to ack the patch that changes the asm-generic file
> for everyone.
>

If we can limit the fallout like that, I agree that we should simply
make the struct flavor the default. It elegantly informs the compiler
about the size of the access and the potential misalignment, so it
should allow compilers for any architecture to select the most
appropriate instruction.

But doesn't that mean that any code that currently relies on
HAVE_EFFICIENT_UNALIGNED_ACCESS should be using get_unaligned instead?
I haven't reviewed the actual use cases (other than the ones I added
myself to the crypto subsystem), but it seems to me that it is
generally unsafe to do any unaligned accesses directly on ARM, given
that the compiler may merge adjacent LDRs into LDMs or LDRDs (and
likewise for stores)

Arnd Bergmann Sept. 22, 2017, 9:49 p.m. UTC | #7

On Fri, Sep 22, 2017 at 11:36 PM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> On 20 September 2017 at 13:35, Arnd Bergmann <arnd@arndb.de> wrote:

>> The architectures that do use include/asm-generic/unaligned.h and
>> also set HAVE_EFFICIENT_UNALIGNED_ACCESS in some configurations
>> are arm, arm64, metag, s390 and arc.
>>
>> This is a rather short list, and three of them (arm64, metag and arc) only
>> support very recent compilers, so we can probably just ask the respective
>> arch maintainers to ack the patch that changes the asm-generic file
>> for everyone.
>>
>
> If we can limit the fallout like that, I agree that we should simply
> make the struct flavor the default. It elegantly informs the compiler
> about the size of the access and the potential misalignment, so it
> should allow compilers for any architecture to select the most
> appropriate instruction.
>
> But doesn't that mean that any code that currently relies on
> HAVE_EFFICIENT_UNALIGNED_ACCESS should be using get_unaligned instead?
> I haven't reviewed the actual use cases (other than the ones I added
> myself to the crypto subsystem), but it seems to me that it is
> generally unsafe to do any unaligned accesses directly on ARM, given
> that the compiler may merge adjacent LDRs into LDMs or LDRDs (and
> likewise for stores)

It's not clear that all that code should be using get_unaligned(),
but I agree that any code relying on
HAVE_EFFICIENT_UNALIGNED_ACCESS is potentially inefficient
on ARM when it causes a trap.

I think it would be a useful goal to avoid running into the ARM alignment
trap handler entirely for kernel code, but that sounds like a lot of work.

If we want to do that, we'd need at least these steps:

- review each reference to HAVE_EFFICIENT_UNALIGNED_ACCESS
  and modify it so that gcc will never use the trapping instructions
- add a WARN_ON_ONCE() in the trap handler
- fix any drivers we run into that should be using get_unaligned() but
  blindly rely on the trap handler instead.

        Arnd

ARM: unaligned.h: Use an arch-specific version

Commit Message

Comments

Patch