Message ID | 20171115213428.22559-17-samitolvanen@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Nov 15, 2017 at 01:34:26PM -0800, Sami Tolvanen wrote: > CONFIG_LTO_CLANG requires the use of clang's integrated assembler, which > doesn't understand the inline assembly in aes-ce-cipher.c. Disable LTO for > the file to work around the issue. Could you elaborate on what the integrated asembler doesn't like? It's not entirely clear at a glance, as the asm in that file doesn't seem to do anything that obscure. Is it a bug? Thanks, Mark. > Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> > Signed-off-by: Sami Tolvanen <samitolvanen@google.com> > --- > arch/arm64/crypto/Makefile | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile > index b5edc5918c28..af08508521a3 100644 > --- a/arch/arm64/crypto/Makefile > +++ b/arch/arm64/crypto/Makefile > @@ -24,7 +24,7 @@ obj-$(CONFIG_CRYPTO_CRC32_ARM64_CE) += crc32-ce.o > crc32-ce-y:= crc32-ce-core.o crc32-ce-glue.o > > obj-$(CONFIG_CRYPTO_AES_ARM64_CE) += aes-ce-cipher.o > -CFLAGS_aes-ce-cipher.o += -march=armv8-a+crypto > +CFLAGS_aes-ce-cipher.o += -march=armv8-a+crypto $(DISABLE_LTO_CLANG) > > obj-$(CONFIG_CRYPTO_AES_ARM64_CE_CCM) += aes-ce-ccm.o > aes-ce-ccm-y := aes-ce-ccm-glue.o aes-ce-ccm-core.o > -- > 2.15.0.448.gf294e3d99a-goog >
On 20 November 2017 at 15:20, Mark Rutland <mark.rutland@arm.com> wrote: > On Wed, Nov 15, 2017 at 01:34:26PM -0800, Sami Tolvanen wrote: >> CONFIG_LTO_CLANG requires the use of clang's integrated assembler, which >> doesn't understand the inline assembly in aes-ce-cipher.c. Disable LTO for >> the file to work around the issue. > > Could you elaborate on what the integrated asembler doesn't like? > > It's not entirely clear at a glance, as the asm in that file doesn't > seem to do anything that obscure. > Actually, it just occurred to me that this code does not adhere 100% to the kernel mode neon rules, by putting kernel_neon_begin/end and the code itself into the same source file. At the time, it seemed harmless, given that these functions are only exposed via function pointers and never called locally, and the compiler cannot really reorder the function calls with the asm block. However, under LTO this all changes, and it is no longer guaranteed that the NEON registers are only touched between the kernel mode neon begin/end calls. So the correct way to fix this would be to move the asm into its own .S file, and call it from between the kernel_mode_neon_begin/end calls. That should also fix your compat issue.
On Mon, Nov 20, 2017 at 03:20:14PM +0000, Mark Rutland wrote:
> Could you elaborate on what the integrated asembler doesn't like?
Here's the error, looks like in aes_sub:
<inline asm>:1:69: error: invalid operand for instruction
dup v1.4s, w12 ;movi v0.16b, #0 ;aese v0.16b, v1.16b ;umov w12, v0.4s[0] ;
^
LLVM ERROR: Error parsing inline asm
Sami
On Mon, Nov 20, 2017 at 03:25:31PM +0000, Ard Biesheuvel wrote: > However, under LTO this all changes, and it is no longer guaranteed > that the NEON registers are only touched between the kernel mode > neon begin/end calls. LTO operates on LLVM IR, so disabling LTO for this file should make sure there won't be any unsafe optimizations. Are there other places in the kernel that might have this issue? > So the correct way to fix this would be to move the asm into its own > .S file, and call it from between the kernel_mode_neon_begin/end > calls. That should also fix your compat issue. Sure, that would also work. Sami
On Mon, 20 Nov 2017 15:20:14 +0000 Mark Rutland <mark.rutland@arm.com> wrote: > On Wed, Nov 15, 2017 at 01:34:26PM -0800, Sami Tolvanen wrote: > > CONFIG_LTO_CLANG requires the use of clang's integrated assembler, > > which doesn't understand the inline assembly in aes-ce-cipher.c. > > Disable LTO for the file to work around the issue. > > Could you elaborate on what the integrated asembler doesn't like? > > It's not entirely clear at a glance, as the asm in that file doesn't > seem to do anything that obscure. > > Is it a bug? Turns out, integrated assembler doesn't like this instruction in aes_sub(): umov %w[out], v0.4s[0] Specifically, it barks at "v0.4s[0]" part. And the way I read the spec, it's quite correct in not accepting this argument. From UMOV description: UMOV <Wd>, <Vn>.<Ts>[<index>] ... <Ts> For the 32-bit variant: is an element size specifier, encoded in the "imm5" field. It can have the following values: B when imm5 = xxxx1 H when imm5 = xxx10 S when imm5 = xx100 With "v0.s[0]" it builds fine. Ard, since this is your code, can you comment? Feels like a typo. Regards, Alex
On 20 November 2017 at 21:29, Alex Matveev <alxmtvv@gmail.com> wrote: > On Mon, 20 Nov 2017 15:20:14 +0000 > Mark Rutland <mark.rutland@arm.com> wrote: > >> On Wed, Nov 15, 2017 at 01:34:26PM -0800, Sami Tolvanen wrote: >> > CONFIG_LTO_CLANG requires the use of clang's integrated assembler, >> > which doesn't understand the inline assembly in aes-ce-cipher.c. >> > Disable LTO for the file to work around the issue. >> >> Could you elaborate on what the integrated asembler doesn't like? >> >> It's not entirely clear at a glance, as the asm in that file doesn't >> seem to do anything that obscure. >> >> Is it a bug? > > Turns out, integrated assembler doesn't like this instruction in > aes_sub(): > umov %w[out], v0.4s[0] > > Specifically, it barks at "v0.4s[0]" part. And the way I read the spec, > it's quite correct in not accepting this argument. From UMOV > description: > > UMOV <Wd>, <Vn>.<Ts>[<index>] > > ... > > <Ts> For the 32-bit variant: is an element size specifier, encoded in > the "imm5" field. It can have the following values: > B when imm5 = xxxx1 > H when imm5 = xxx10 > S when imm5 = xx100 > > > With "v0.s[0]" it builds fine. > > Ard, since this is your code, can you comment? Feels like a typo. > Yep.
Sami, this seems to be better solution for aes-ce-cipher.c problem. Regards, Alex From 6bcdd763b56ce10a77a79373a46fc0e8d9026178 Mon Sep 17 00:00:00 2001 From: Alex Matveev <alxmtvv@gmail.com> Date: Mon, 20 Nov 2017 21:30:38 +0000 Subject: [PATCH] arm64: crypto: fix typo in aes_sub() Clang's integrated assembler can't parse "v0.4s[0]" argument of the UMOV instruction. And, as per ARM ARM, this is incorrect usage: UMOV <Wd>, <Vn>.<Ts>[<index>] ... <Ts> For the 32-bit variant: is an element size specifier, encoded in the "imm5" field. It can have the following values: B when imm5 = xxxx1 H when imm5 = xxx10 S when imm5 = xx100 Signed-off-by: Alex Matveev <alxmtvv@gmail.com> --- arch/arm64/crypto/aes-ce-cipher.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/crypto/aes-ce-cipher.c b/arch/arm64/crypto/aes-ce-cipher.c index 6a75cd75ed11..e26bedee2c45 100644 --- a/arch/arm64/crypto/aes-ce-cipher.c +++ b/arch/arm64/crypto/aes-ce-cipher.c @@ -152,7 +152,7 @@ static u32 aes_sub(u32 input) __asm__("dup v1.4s, %w[in] ;" "movi v0.16b, #0 ;" "aese v0.16b, v1.16b ;" - "umov %w[out], v0.4s[0] ;" + "umov %w[out], v0.s[0] ;" : [out] "=r"(ret) : [in] "r"(input)
On Mon, Nov 20, 2017 at 01:01:43PM -0800, Sami Tolvanen wrote: > On Mon, Nov 20, 2017 at 03:25:31PM +0000, Ard Biesheuvel wrote: > > However, under LTO this all changes, and it is no longer guaranteed > > that the NEON registers are only touched between the kernel mode > > neon begin/end calls. Just to check, I take it that the feat is that LTO can merge the begin/asm/end, reordering bits to the begin/end relative to the asm? AFAICT, assuming that LTO respects our compiler barriers: * the preempt_disable() in kernel_neon_begin() should prevent the asm block from being moved earlier, but it looks like it could be moved somewhere in the middle of local_bh_enable(). * the __this_cpu_xchg() in kernel_neon_end() *isn't* ordered w.r.t the asm, as it doesn't have a full memory clobber, and could be re-ordered before the asm block. We *could* solve this case with a barrier() at the end of kernel_neon_begin() and the start of kernel_neon_end(), but it is a whack-a-mole solution. :/ ... this also raises the question as to how the {__,}this_cpu*() ops are expected to be ordered w.r.t. other local operations, as that's not clear to me even in the absence of LTO. > LTO operates on LLVM IR, so disabling LTO for this file should make > sure there won't be any unsafe optimizations. Are there other places > in the kernel that might have this issue? I suspect that as above, there are a number of places that implicitly rely on compilation-unit boundaries enforcing (local) ordering w.r.t. asynchronous events, as the compiler won't otherwise be able to reorder code such as cpu-local flag manipulation. I think we have a much bigger problem here. Thanks, Mark.
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile index b5edc5918c28..af08508521a3 100644 --- a/arch/arm64/crypto/Makefile +++ b/arch/arm64/crypto/Makefile @@ -24,7 +24,7 @@ obj-$(CONFIG_CRYPTO_CRC32_ARM64_CE) += crc32-ce.o crc32-ce-y:= crc32-ce-core.o crc32-ce-glue.o obj-$(CONFIG_CRYPTO_AES_ARM64_CE) += aes-ce-cipher.o -CFLAGS_aes-ce-cipher.o += -march=armv8-a+crypto +CFLAGS_aes-ce-cipher.o += -march=armv8-a+crypto $(DISABLE_LTO_CLANG) obj-$(CONFIG_CRYPTO_AES_ARM64_CE_CCM) += aes-ce-ccm.o aes-ce-ccm-y := aes-ce-ccm-glue.o aes-ce-ccm-core.o