Message ID | 20201109205155.1207545-1-ndesaulniers@google.com |
---|---|
State | New, archived |
Series | ARM: decompressor: avoid ADRL pseudo-instruction |
On Mon, 9 Nov 2020 at 21:52, Nick Desaulniers <ndesaulniers@google.com> wrote:
>
> As Ard notes in
> commit 54781938ec34 ("crypto: arm/sha256-neon - avoid ADRL pseudo
> instruction")
> commit 0f5e8323777b ("crypto: arm/sha512-neon - avoid ADRL pseudo
> instruction")
> [...]

This is already fixed in Russell's for-next tree.
On Mon, Nov 9, 2020 at 12:53 PM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Mon, 9 Nov 2020 at 21:52, Nick Desaulniers <ndesaulniers@google.com> wrote:
> > [...]
>
> This is already fixed in Russell's for-next tree.

Ah right, trolling through lore, there was:
https://lore.kernel.org/linux-arm-kernel/20200914095706.3985-1-ardb@kernel.org/

I didn't see anything in linux-next today, or
https://www.armlinux.org.uk/developer/patches/ Incoming or Applied.

Did it just get merged into the for-next branch, or is for-next not
getting pulled into linux-next?
On Mon, 9 Nov 2020 at 22:09, Nick Desaulniers <ndesaulniers@google.com> wrote:
>
> On Mon, Nov 9, 2020 at 12:53 PM Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > [...]
> >
> > This is already fixed in Russell's for-next tree.
>
> Ah right, trolling through lore, there was:
> https://lore.kernel.org/linux-arm-kernel/20200914095706.3985-1-ardb@kernel.org/
>
> I didn't see anything in linux-next today, or
> https://www.armlinux.org.uk/developer/patches/ Incoming or Applied.
>
> Did it just get merged into the for-next branch, or is for-next not
> getting pulled into linux-next?

It should appear tomorrow.
On Mon, Nov 9, 2020 at 1:45 PM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Mon, 9 Nov 2020 at 22:09, Nick Desaulniers <ndesaulniers@google.com> wrote:
> >
> > I didn't see anything in linux-next today, or
> > https://www.armlinux.org.uk/developer/patches/ Incoming or Applied.
> >
> > Did it just get merged into the for-next branch, or is for-next not
> > getting pulled into linux-next?
> >
>
> It should appear tomorrow.

Yep, LGTM.
```diff
diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
index 2e04ec5b5446..b3eac6f9a709 100644
--- a/arch/arm/boot/compressed/head.S
+++ b/arch/arm/boot/compressed/head.S
@@ -1440,7 +1440,9 @@ ENTRY(efi_enter_kernel)
 		mov	r4, r0			@ preserve image base
 		mov	r8, r1			@ preserve DT pointer
 
- ARM(		adrl	r0, call_cache_fn	)
+ ARM(		sub	r0, pc, #.L__efi_enter_kernel-cache_off	)
+ ARM(		sub	r0, r0, #cache_off-call_cache_fn	)
+.L__efi_enter_kernel:
  THUMB(		adr	r0, call_cache_fn	)
 		adr	r1, 0f			@ clean the region of code we
 		bl	cache_clean_flush	@ may run with the MMU off
```
As Ard notes in
commit 54781938ec34 ("crypto: arm/sha256-neon - avoid ADRL pseudo instruction")
commit 0f5e8323777b ("crypto: arm/sha512-neon - avoid ADRL pseudo instruction")

The ADRL pseudo instruction is not an architectural construct, but a
convenience macro that was supported by the ARM proprietary assembler
and adopted by binutils GAS as well, but only when assembling in 32-bit
ARM mode. Therefore, it can only be used in assembler code that is known
to assemble in ARM mode only, but as it turns out, the Clang assembler
does not implement ADRL at all, and so it is better to get rid of it
entirely.

So replace the ADRL instruction with an ADR instruction that refers to
a nearer symbol, and apply the delta explicitly using an additional
instruction.

We can use the same technique to generate the same offset. It looks like
the ADRL pseudo instruction assembles to two SUB instructions in this
case. The immediate operand of SUB must be an ARM modified immediate (an
8-bit value rotated right by an even number of bits), so only
word-aligned offsets up to 0x400 are guaranteed to be encodable. Because
the distance between the reference and the symbol is larger than that,
we need to use an intermediary symbol (cache_off in this case) to
calculate the full range.

Suggested-by: Ard Biesheuvel <ardb@kernel.org>
Suggested-by: Jian Cai <jiancai@google.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>

---
 arch/arm/boot/compressed/head.S | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)