Message ID | 20190829154834.26547-4-will@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Series | arm64: avoid out-of-line ll/sc atomics |
On Thu, Aug 29, 2019 at 04:48:27PM +0100, Will Deacon wrote:
> From: Andrew Murray <andrew.murray@arm.com>
> 
> When building for LSE atomics (CONFIG_ARM64_LSE_ATOMICS), if the hardware
> or toolchain doesn't support it, the existing code will fall back to ll/sc
> atomics. It achieves this by branching from inline assembly to a function
> that is built with special compile flags. Further, this results in the
> clobbering of registers even when the fallback isn't used, increasing
> register pressure.
> 
> Improve this by providing inline implementations of both LSE and
> ll/sc and using a static key to select between them, which allows the
> compiler to generate better atomics code. Put the LL/SC fallback atomics
> in their own subsection to improve icache performance.
> 
> Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> Signed-off-by: Will Deacon <will@kernel.org>

For some reason, this causes a clang-built kernel to fail to boot in
QEMU. There are no logs; it just never starts. I am off for the next two
days, so I am going to try to look into this, but you might have some
immediate ideas.

https://github.com/ClangBuiltLinux/linux/issues/649

There is another weird failure that might be somewhat related, but I have
no idea.

https://github.com/ClangBuiltLinux/linux/issues/648

Cheers,
Nathan
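[Editor's note: the "own subsection" arrangement mentioned in the commit message above is implemented by a wrapper macro added in the patch at the bottom of this page (atomic_ll_sc.h). It is reproduced here with explanatory comments added; the conditional and assembler directives are taken from that diff.]

/*
 * __LL_SC_FALLBACK, as added to arch/arm64/include/asm/atomic_ll_sc.h by
 * this patch (comments added here). When LSE atomics are configured, the
 * LL/SC asm body is emitted into .subsection 1 so that the rarely-used
 * fallback sits out of line, keeping the hot path dense in the icache.
 */
#if IS_ENABLED(CONFIG_ARM64_LSE_ATOMICS) && IS_ENABLED(CONFIG_AS_LSE)
#define __LL_SC_FALLBACK(asm_ops)					\
"	b	3f\n"			/* skip the out-of-line copy */	\
"	.subsection	1\n"		/* emit the LL/SC body out of line */ \
"3:\n"									\
asm_ops "\n"								\
"	b	4f\n"			/* jump back to the main flow */ \
"	.previous\n"							\
"4:\n"
#else
#define __LL_SC_FALLBACK(asm_ops)	asm_ops	/* no LSE: emit in place */
#endif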
On Mon, Sep 02, 2019 at 11:00:11PM -0700, Nathan Chancellor wrote: > On Thu, Aug 29, 2019 at 04:48:27PM +0100, Will Deacon wrote: > > From: Andrew Murray <andrew.murray@arm.com> > > > > When building for LSE atomics (CONFIG_ARM64_LSE_ATOMICS), if the hardware > > or toolchain doesn't support it the existing code will fallback to ll/sc > > atomics. It achieves this by branching from inline assembly to a function > > that is built with special compile flags. Further this results in the > > clobbering of registers even when the fallback isn't used increasing > > register pressure. > > > > Improve this by providing inline implementations of both LSE and > > ll/sc and use a static key to select between them, which allows for the > > compiler to generate better atomics code. Put the LL/SC fallback atomics > > in their own subsection to improve icache performance. > > > > Signed-off-by: Andrew Murray <andrew.murray@arm.com> > > Signed-off-by: Will Deacon <will@kernel.org> > > For some reason, this causes a clang built kernel to fail to boot in > QEMU. There are no logs, it just never starts. I am off for the next two > days so I am going to try to look into this but you might have some > immediate ideas. Hmm, so unfortunately this series isn't bisectable, since I realised this when I was merging the patches from Andrew, hence this: https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/commit/?h=for-next/atomics&id=b32baf91f60fb9c7010bff87e68132f2ce31d9a8 so if you're seeing a failure with the whole branch, this commit is probably just a red herring. > There is another weird failure that might be somewhat related but I have > no idea. > > https://github.com/ClangBuiltLinux/linux/issues/648 Interesting. Looks like KASAN is causing a cmpxchg() call on something which isn't 1, 2, 4 or 8 bytes in size :/ Will
On Mon, Sep 02, 2019 at 11:00:11PM -0700, Nathan Chancellor wrote: > On Thu, Aug 29, 2019 at 04:48:27PM +0100, Will Deacon wrote: > > From: Andrew Murray <andrew.murray@arm.com> > > > > When building for LSE atomics (CONFIG_ARM64_LSE_ATOMICS), if the hardware > > or toolchain doesn't support it the existing code will fallback to ll/sc > > atomics. It achieves this by branching from inline assembly to a function > > that is built with special compile flags. Further this results in the > > clobbering of registers even when the fallback isn't used increasing > > register pressure. > > > > Improve this by providing inline implementations of both LSE and > > ll/sc and use a static key to select between them, which allows for the > > compiler to generate better atomics code. Put the LL/SC fallback atomics > > in their own subsection to improve icache performance. > > > > Signed-off-by: Andrew Murray <andrew.murray@arm.com> > > Signed-off-by: Will Deacon <will@kernel.org> > > For some reason, this causes a clang built kernel to fail to boot in > QEMU. There are no logs, it just never starts. I am off for the next two > days so I am going to try to look into this but you might have some > immediate ideas. > > https://github.com/ClangBuiltLinux/linux/issues/649 I've been able to reproduce this - with clang 9.0.6 and qemu (without KVM) and only when ARM64_LSE_ATOMICS is enabled. This is slightly concerning... (gdb) b __lse__cmpxchg_case_acq_32 Breakpoint 1 at 0xffff80001012b3cc: __lse__cmpxchg_case_acq_32. (19 locations) (gdb) continue Continuing. Breakpoint 1, __cmpxchg_case_acq_32 (ptr=<optimized out>, old=0, new=1) at /home/amurray/linux/./arch/arm64/include/asm/cmpxchg.h:121 121 __CMPXCHG_CASE(acq_, 32) (gdb) bt #0 __cmpxchg_case_acq_32 (ptr=<optimized out>, old=0, new=1) at /home/amurray/linux/./arch/arm64/include/asm/cmpxchg.h:121 #1 __cmpxchg_acq (ptr=<optimized out>, old=<optimized out>, new=<optimized out>, size=4) at /home/amurray/linux/./arch/arm64/include/asm/cmpxchg.h:173 #2 atomic_cmpxchg_acquire (v=<optimized out>, old=0, new=1) at /home/amurray/linux/./include/asm-generic/atomic-instrumented.h:664 #3 atomic_try_cmpxchg_acquire (v=<optimized out>, new=1, old=<optimized out>) at /home/amurray/linux/./include/linux/atomic-fallback.h:931 #4 queued_spin_lock (lock=<optimized out>) at /home/amurray/linux/./include/asm-generic/qspinlock.h:78 #5 do_raw_spin_lock (lock=<optimized out>) at /home/amurray/linux/./include/linux/spinlock.h:181 #6 __raw_spin_lock (lock=0xffff8000119b15d4 <logbuf_lock>) at /home/amurray/linux/./include/linux/spinlock_api_smp.h:143 #7 _raw_spin_lock (lock=0xffff8000119b15d4 <logbuf_lock>) at kernel/locking/spinlock.c:151 #8 0xffff800010147028 in vprintk_emit (facility=0, level=-1, dict=0x0, dictlen=0, fmt=0xffff800011103afe "\001\066Booting Linux on physical CPU 0x%010lx [0x%08x]\n", args=...) at kernel/printk/printk.c:1966 #9 0xffff800010147818 in vprintk_default (fmt=0xffff800011103afe "\001\066Booting Linux on physical CPU 0x%010lx [0x%08x]\n", args=...) at kernel/printk/printk.c:2013 #10 0xffff800010149c94 in vprintk_func (fmt=0xffff800011103afe "\001\066Booting Linux on physical CPU 0x%010lx [0x%08x]\n", args=...) at kernel/printk/printk_safe.c:386 #11 0xffff8000101461bc in printk (fmt=0xffff8000119b15d4 <logbuf_lock> "") at kernel/printk/printk.c:2046 #12 0xffff8000112d3238 in smp_setup_processor_id () at arch/arm64/kernel/setup.c:96 #13 0xffff8000112d06a4 in start_kernel () at init/main.c:581 #14 0x0000000000000000 in ?? 
() Backtrace stopped: previous frame identical to this frame (corrupt stack?) In other words system_uses_lse_atomics seems to give us the LSE variant when we don't have LSE, thus resulting in an invalid instruction (we end up in do_undefinstr). Though I don't think system_uses_lse_atomics is at fault here, the behaviour varies depending on subtle code changes to lse.h, for example: - change system_uses_lse_atomics as follows, and the kernel boots as far as "Loading compiled-in X.509 certificates" and it gets stuck. --- a/arch/arm64/include/asm/lse.h +++ b/arch/arm64/include/asm/lse.h @@ -21,8 +21,11 @@ extern struct static_key_false arm64_const_caps_ready; static inline bool system_uses_lse_atomics(void) { - return (static_branch_likely(&arm64_const_caps_ready)) && - static_branch_likely(&cpu_hwcap_keys[ARM64_HAS_LSE_ATOMICS]); + if ((static_branch_likely(&arm64_const_caps_ready)) && + static_branch_likely(&cpu_hwcap_keys[ARM64_HAS_LSE_ATOMICS])) + return true; + + return false; } - change is as follows, and we don't panic, but get stuck elsewhere in boot. diff --git a/arch/arm64/include/asm/lse.h b/arch/arm64/include/asm/lse.h index 80b388278149..7c1d51fa54b2 100644 --- a/arch/arm64/include/asm/lse.h +++ b/arch/arm64/include/asm/lse.h @@ -16,13 +16,17 @@ __asm__(".arch_extension lse"); +void panic(const char *fmt, ...) __noreturn __cold; extern struct static_key_false cpu_hwcap_keys[ARM64_NCAPS]; extern struct static_key_false arm64_const_caps_ready; static inline bool system_uses_lse_atomics(void) { - return (static_branch_likely(&arm64_const_caps_ready)) && - static_branch_likely(&cpu_hwcap_keys[ARM64_HAS_LSE_ATOMICS]); + if ((static_branch_likely(&arm64_const_caps_ready)) && + static_branch_likely(&cpu_hwcap_keys[ARM64_HAS_LSE_ATOMICS])) + panic("ATOMICS"); + + return false; } - change system_uses_lse_atomics to return false and it always boots Any ideas? Thanks, Andrew Murray > > There is another weird failure that might be somewhat related but I have > no idea. > > https://github.com/ClangBuiltLinux/linux/issues/648 > > Cheers, > Nathan
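[Editor's note: some context that becomes relevant further down the thread. With CONFIG_JUMP_LABEL, static_branch_likely() on arm64 is built on 'asm goto': each static-key test compiles down to a single nop (or branch) that the kernel patches at runtime, roughly as in the sketch below. This is a simplified illustration rather than the kernel's actual jump-label code (which also records every patch site in a __jump_table section), and the asm_volatile_goto fallback definition is only there so the snippet builds outside the kernel tree. system_uses_lse_atomics() is essentially two such tests ANDed together, i.e. the same shape as the reduced '!branch_jump() && !branch_jump()' test case that appears later in the thread.]

#include <stdbool.h>

/* Stand-in for the kernel's asm_volatile_goto wrapper when building this
 * sketch outside the kernel tree. */
#ifndef asm_volatile_goto
#define asm_volatile_goto(x...)	asm goto(x)
#endif

/*
 * Simplified model of an arm64 static-key test: a nop that can later be
 * patched into "b %l[l_yes]" once the key is enabled. The real helper
 * lives in arch/arm64/include/asm/jump_label.h.
 */
static inline bool static_key_enabled_sketch(void)
{
	asm_volatile_goto(
		"1:	nop"		/* patched to a branch to l_yes at runtime */
		: : : : l_yes);

	return false;			/* key disabled: fall through */
l_yes:
	return true;			/* key enabled: patched branch taken */
}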
On Tue, Sep 03, 2019 at 03:31:19PM +0100, Andrew Murray wrote: > On Mon, Sep 02, 2019 at 11:00:11PM -0700, Nathan Chancellor wrote: > > On Thu, Aug 29, 2019 at 04:48:27PM +0100, Will Deacon wrote: > > > From: Andrew Murray <andrew.murray@arm.com> > > > > > > When building for LSE atomics (CONFIG_ARM64_LSE_ATOMICS), if the hardware > > > or toolchain doesn't support it the existing code will fallback to ll/sc > > > atomics. It achieves this by branching from inline assembly to a function > > > that is built with special compile flags. Further this results in the > > > clobbering of registers even when the fallback isn't used increasing > > > register pressure. > > > > > > Improve this by providing inline implementations of both LSE and > > > ll/sc and use a static key to select between them, which allows for the > > > compiler to generate better atomics code. Put the LL/SC fallback atomics > > > in their own subsection to improve icache performance. > > > > > > Signed-off-by: Andrew Murray <andrew.murray@arm.com> > > > Signed-off-by: Will Deacon <will@kernel.org> > > > > For some reason, this causes a clang built kernel to fail to boot in > > QEMU. There are no logs, it just never starts. I am off for the next two > > days so I am going to try to look into this but you might have some > > immediate ideas. > > > > https://github.com/ClangBuiltLinux/linux/issues/649 > > I've been able to reproduce this - with clang 9.0.6 and qemu (without KVM) > and only when ARM64_LSE_ATOMICS is enabled. > > This is slightly concerning... > > (gdb) b __lse__cmpxchg_case_acq_32 > Breakpoint 1 at 0xffff80001012b3cc: __lse__cmpxchg_case_acq_32. (19 locations) > (gdb) continue > Continuing. [...] > Any ideas? Does it work if the only thing you change is the toolchain, and use GCC instead? Could be some teething issues in the 'asm goto' support for clang? Will
On Tue, Sep 03, 2019 at 03:45:34PM +0100, Will Deacon wrote: > On Tue, Sep 03, 2019 at 03:31:19PM +0100, Andrew Murray wrote: > > On Mon, Sep 02, 2019 at 11:00:11PM -0700, Nathan Chancellor wrote: > > > On Thu, Aug 29, 2019 at 04:48:27PM +0100, Will Deacon wrote: > > > > From: Andrew Murray <andrew.murray@arm.com> > > > > > > > > When building for LSE atomics (CONFIG_ARM64_LSE_ATOMICS), if the hardware > > > > or toolchain doesn't support it the existing code will fallback to ll/sc > > > > atomics. It achieves this by branching from inline assembly to a function > > > > that is built with special compile flags. Further this results in the > > > > clobbering of registers even when the fallback isn't used increasing > > > > register pressure. > > > > > > > > Improve this by providing inline implementations of both LSE and > > > > ll/sc and use a static key to select between them, which allows for the > > > > compiler to generate better atomics code. Put the LL/SC fallback atomics > > > > in their own subsection to improve icache performance. > > > > > > > > Signed-off-by: Andrew Murray <andrew.murray@arm.com> > > > > Signed-off-by: Will Deacon <will@kernel.org> > > > > > > For some reason, this causes a clang built kernel to fail to boot in > > > QEMU. There are no logs, it just never starts. I am off for the next two > > > days so I am going to try to look into this but you might have some > > > immediate ideas. > > > > > > https://github.com/ClangBuiltLinux/linux/issues/649 > > > > I've been able to reproduce this - with clang 9.0.6 and qemu (without KVM) > > and only when ARM64_LSE_ATOMICS is enabled. > > > > This is slightly concerning... > > > > (gdb) b __lse__cmpxchg_case_acq_32 > > Breakpoint 1 at 0xffff80001012b3cc: __lse__cmpxchg_case_acq_32. (19 locations) > > (gdb) continue > > Continuing. > > [...] > > > Any ideas? > > Does it work if the only thing you change is the toolchain, and use GCC > instead? Yup. > Could be some teething issues in the 'asm goto' support for clang? Thanks, Andrew Murray > > Will
On Tue, Sep 03, 2019 at 04:15:44PM +0100, Andrew Murray wrote: > On Tue, Sep 03, 2019 at 03:45:34PM +0100, Will Deacon wrote: > > On Tue, Sep 03, 2019 at 03:31:19PM +0100, Andrew Murray wrote: > > > On Mon, Sep 02, 2019 at 11:00:11PM -0700, Nathan Chancellor wrote: > > > > On Thu, Aug 29, 2019 at 04:48:27PM +0100, Will Deacon wrote: > > > > > From: Andrew Murray <andrew.murray@arm.com> > > > > > > > > > > When building for LSE atomics (CONFIG_ARM64_LSE_ATOMICS), if the hardware > > > > > or toolchain doesn't support it the existing code will fallback to ll/sc > > > > > atomics. It achieves this by branching from inline assembly to a function > > > > > that is built with special compile flags. Further this results in the > > > > > clobbering of registers even when the fallback isn't used increasing > > > > > register pressure. > > > > > > > > > > Improve this by providing inline implementations of both LSE and > > > > > ll/sc and use a static key to select between them, which allows for the > > > > > compiler to generate better atomics code. Put the LL/SC fallback atomics > > > > > in their own subsection to improve icache performance. > > > > > > > > > > Signed-off-by: Andrew Murray <andrew.murray@arm.com> > > > > > Signed-off-by: Will Deacon <will@kernel.org> > > > > > > > > For some reason, this causes a clang built kernel to fail to boot in > > > > QEMU. There are no logs, it just never starts. I am off for the next two > > > > days so I am going to try to look into this but you might have some > > > > immediate ideas. > > > > > > > > https://github.com/ClangBuiltLinux/linux/issues/649 > > > > > > I've been able to reproduce this - with clang 9.0.6 and qemu (without KVM) > > > and only when ARM64_LSE_ATOMICS is enabled. > > > > > > This is slightly concerning... > > > > > > (gdb) b __lse__cmpxchg_case_acq_32 > > > Breakpoint 1 at 0xffff80001012b3cc: __lse__cmpxchg_case_acq_32. (19 locations) > > > (gdb) continue > > > Continuing. > > > > [...] > > > > > Any ideas? > > > > Does it work if the only thing you change is the toolchain, and use GCC > > instead? > > Yup. Also this is Clang generation: ffff8000100f2700 <__ptrace_link>: ffff8000100f2700: f9426009 ldr x9, [x0, #1216] ffff8000100f2704: 91130008 add x8, x0, #0x4c0 ffff8000100f2708: eb09011f cmp x8, x9 ffff8000100f270c: 540002a1 b.ne ffff8000100f2760 <__ptrace_link+0x60> // b.any ffff8000100f2710: f9425829 ldr x9, [x1, #1200] ffff8000100f2714: 9112c02a add x10, x1, #0x4b0 ffff8000100f2718: f9000528 str x8, [x9, #8] ffff8000100f271c: f9026009 str x9, [x0, #1216] ffff8000100f2720: f902640a str x10, [x0, #1224] ffff8000100f2724: f9025828 str x8, [x1, #1200] ffff8000100f2728: f9024001 str x1, [x0, #1152] ffff8000100f272c: b4000162 cbz x2, ffff8000100f2758 <__ptrace_link+0x58> ffff8000100f2730: b900985f str wzr, [x2, #152] ffff8000100f2734: 14000004 b ffff8000100f2744 <__ptrace_link+0x44> ffff8000100f2738: 14000001 b ffff8000100f273c <__ptrace_link+0x3c> ffff8000100f273c: 14000006 b ffff8000100f2754 <__ptrace_link+0x54> ffff8000100f2740: 14000001 b ffff8000100f2744 <__ptrace_link+0x44> ffff8000100f2744: 52800028 mov w8, #0x1 // #1 ffff8000100f2748: b828005f stadd w8, [x2] ffff8000100f274c: f9030002 str x2, [x0, #1536] ffff8000100f2750: d65f03c0 ret ffff8000100f2754: 140007fd b ffff8000100f4748 <ptrace_check_attach+0xf8> ... This looks like the default path (before we write over it) will take you to the LSE code (e.g. ffff8000100f2734). I'm pretty sure this is wrong, or at least not what we expected to see. Also why 4 branches? 
And GCC: ffff8000100ebc98 <__ptrace_link>: ffff8000100ebc98: f9426003 ldr x3, [x0, #1216] ffff8000100ebc9c: 91130004 add x4, x0, #0x4c0 ffff8000100ebca0: eb03009f cmp x4, x3 ffff8000100ebca4: 54000261 b.ne ffff8000100ebcf0 <__ptrace_link+0x58> // b.any ffff8000100ebca8: f9425825 ldr x5, [x1, #1200] ffff8000100ebcac: 9112c026 add x6, x1, #0x4b0 ffff8000100ebcb0: f90004a4 str x4, [x5, #8] ffff8000100ebcb4: f9026005 str x5, [x0, #1216] ffff8000100ebcb8: f9026406 str x6, [x0, #1224] ffff8000100ebcbc: f9025824 str x4, [x1, #1200] ffff8000100ebcc0: f9024001 str x1, [x0, #1152] ffff8000100ebcc4: b4000122 cbz x2, ffff8000100ebce8 <__ptrace_link+0x50> ffff8000100ebcc8: b900985f str wzr, [x2, #152] ffff8000100ebccc: 14000006 b ffff8000100ebce4 <__ptrace_link+0x4c> ffff8000100ebcd0: 14000005 b ffff8000100ebce4 <__ptrace_link+0x4c> ffff8000100ebcd4: 52800021 mov w1, #0x1 // #1 ffff8000100ebcd8: b821005f stadd w1, [x2] ffff8000100ebcdc: f9030002 str x2, [x0, #1536] ffff8000100ebce0: d65f03c0 ret ffff8000100ebce4: 14000599 b ffff8000100ed348 <__arm64_compat_sys_ptrace+0x180> ... Thanks, Andrew Murray > > > > Could be some teething issues in the 'asm goto' support for clang? > > Thanks, > > Andrew Murray > > > > > Will > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
On Tue, Sep 03, 2019 at 04:31:20PM +0100, Andrew Murray wrote: > On Tue, Sep 03, 2019 at 04:15:44PM +0100, Andrew Murray wrote: > > On Tue, Sep 03, 2019 at 03:45:34PM +0100, Will Deacon wrote: > > > Does it work if the only thing you change is the toolchain, and use GCC > > > instead? > > > > Yup. > > Also this is Clang generation: > > ffff8000100f2700 <__ptrace_link>: > ffff8000100f2700: f9426009 ldr x9, [x0, #1216] > ffff8000100f2704: 91130008 add x8, x0, #0x4c0 > ffff8000100f2708: eb09011f cmp x8, x9 > ffff8000100f270c: 540002a1 b.ne ffff8000100f2760 <__ptrace_link+0x60> // b.any > ffff8000100f2710: f9425829 ldr x9, [x1, #1200] > ffff8000100f2714: 9112c02a add x10, x1, #0x4b0 > ffff8000100f2718: f9000528 str x8, [x9, #8] > ffff8000100f271c: f9026009 str x9, [x0, #1216] > ffff8000100f2720: f902640a str x10, [x0, #1224] > ffff8000100f2724: f9025828 str x8, [x1, #1200] > ffff8000100f2728: f9024001 str x1, [x0, #1152] > ffff8000100f272c: b4000162 cbz x2, ffff8000100f2758 <__ptrace_link+0x58> > ffff8000100f2730: b900985f str wzr, [x2, #152] > ffff8000100f2734: 14000004 b ffff8000100f2744 <__ptrace_link+0x44> > ffff8000100f2738: 14000001 b ffff8000100f273c <__ptrace_link+0x3c> > ffff8000100f273c: 14000006 b ffff8000100f2754 <__ptrace_link+0x54> > ffff8000100f2740: 14000001 b ffff8000100f2744 <__ptrace_link+0x44> > ffff8000100f2744: 52800028 mov w8, #0x1 // #1 > ffff8000100f2748: b828005f stadd w8, [x2] > ffff8000100f274c: f9030002 str x2, [x0, #1536] > ffff8000100f2750: d65f03c0 ret > ffff8000100f2754: 140007fd b ffff8000100f4748 <ptrace_check_attach+0xf8> > ... > > This looks like the default path (before we write over it) will take you to > the LSE code (e.g. ffff8000100f2734). I'm pretty sure this is wrong, or at > least not what we expected to see. Also why 4 branches? So I reproduced this with a silly atomic_inc wrapper: void will_atomic_inc(atomic_t *v) { atomic_inc(v); } Compiles to: 0000000000000018 <will_atomic_inc>: 18: 14000004 b 28 <will_atomic_inc+0x10> 1c: 14000001 b 20 <will_atomic_inc+0x8> 20: 14000005 b 34 <will_atomic_inc+0x1c> 24: 14000001 b 28 <will_atomic_inc+0x10> 28: 52800028 mov w8, #0x1 // #1 2c: b828001f stadd w8, [x0] 30: d65f03c0 ret 34: 14000027 b d0 <dump_kernel_offset+0x60> 38: d65f03c0 ret which is going to explode. Will
On Tue, Sep 03, 2019 at 05:37:55PM +0100, Will Deacon wrote: > On Tue, Sep 03, 2019 at 04:31:20PM +0100, Andrew Murray wrote: > > On Tue, Sep 03, 2019 at 04:15:44PM +0100, Andrew Murray wrote: > > > On Tue, Sep 03, 2019 at 03:45:34PM +0100, Will Deacon wrote: > > > > Does it work if the only thing you change is the toolchain, and use GCC > > > > instead? > > > > > > Yup. > > > > Also this is Clang generation: > > > > ffff8000100f2700 <__ptrace_link>: > > ffff8000100f2700: f9426009 ldr x9, [x0, #1216] > > ffff8000100f2704: 91130008 add x8, x0, #0x4c0 > > ffff8000100f2708: eb09011f cmp x8, x9 > > ffff8000100f270c: 540002a1 b.ne ffff8000100f2760 <__ptrace_link+0x60> // b.any > > ffff8000100f2710: f9425829 ldr x9, [x1, #1200] > > ffff8000100f2714: 9112c02a add x10, x1, #0x4b0 > > ffff8000100f2718: f9000528 str x8, [x9, #8] > > ffff8000100f271c: f9026009 str x9, [x0, #1216] > > ffff8000100f2720: f902640a str x10, [x0, #1224] > > ffff8000100f2724: f9025828 str x8, [x1, #1200] > > ffff8000100f2728: f9024001 str x1, [x0, #1152] > > ffff8000100f272c: b4000162 cbz x2, ffff8000100f2758 <__ptrace_link+0x58> > > ffff8000100f2730: b900985f str wzr, [x2, #152] > > ffff8000100f2734: 14000004 b ffff8000100f2744 <__ptrace_link+0x44> > > ffff8000100f2738: 14000001 b ffff8000100f273c <__ptrace_link+0x3c> > > ffff8000100f273c: 14000006 b ffff8000100f2754 <__ptrace_link+0x54> > > ffff8000100f2740: 14000001 b ffff8000100f2744 <__ptrace_link+0x44> > > ffff8000100f2744: 52800028 mov w8, #0x1 // #1 > > ffff8000100f2748: b828005f stadd w8, [x2] > > ffff8000100f274c: f9030002 str x2, [x0, #1536] > > ffff8000100f2750: d65f03c0 ret > > ffff8000100f2754: 140007fd b ffff8000100f4748 <ptrace_check_attach+0xf8> > > ... > > > > This looks like the default path (before we write over it) will take you to > > the LSE code (e.g. ffff8000100f2734). I'm pretty sure this is wrong, or at > > least not what we expected to see. Also why 4 branches? > > So I reproduced this with a silly atomic_inc wrapper: > > void will_atomic_inc(atomic_t *v) > { > atomic_inc(v); > } > > Compiles to: > > 0000000000000018 <will_atomic_inc>: > 18: 14000004 b 28 <will_atomic_inc+0x10> > 1c: 14000001 b 20 <will_atomic_inc+0x8> > 20: 14000005 b 34 <will_atomic_inc+0x1c> > 24: 14000001 b 28 <will_atomic_inc+0x10> > 28: 52800028 mov w8, #0x1 // #1 > 2c: b828001f stadd w8, [x0] > 30: d65f03c0 ret > 34: 14000027 b d0 <dump_kernel_offset+0x60> > 38: d65f03c0 ret > > which is going to explode. 
I've come up with a simple reproducer for this issue:

static bool branch_jump()
{
	asm_volatile_goto(
		"1: b %l[l_yes2]"
		: : : : l_yes2);

	return false;
l_yes2:
	return true;
}

static bool branch_test()
{
	return (!branch_jump() && !branch_jump());
}

void andy_test(int *v)
{
	if (branch_test())
		*v = 0xff;
}

This leads to the following (it shouldn't do anything):

0000000000000000 <andy_test>:
   0:	14000004 	b	10 <andy_test+0x10>
   4:	14000001 	b	8 <andy_test+0x8>
   8:	14000004 	b	18 <andy_test+0x18>
   c:	14000001 	b	10 <andy_test+0x10>
  10:	52801fe8 	mov	w8, #0xff	// #255
  14:	b9000008 	str	w8, [x0]
  18:	d65f03c0 	ret

The issue goes away with any of the following hunks:

@@ -55,7 +55,7 @@ static bool branch_jump()

 static bool branch_test()
 {
-	return (!branch_jump() && !branch_jump());
+	return (!branch_jump());
 }

 void andy_test(int *v)

or:

@@ -53,14 +53,10 @@ static bool branch_jump()
 	return true;
 }

-static bool branch_test()
-{
-	return (!branch_jump() && !branch_jump());
-}

 void andy_test(int *v)
 {
-	if (branch_test())
+	if (!branch_jump() && !branch_jump())
 		*v = 0xff;
 }

Thanks,

Andrew Murray

> 
> Will
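[Editor's note: for anyone who wants to try this outside the kernel tree, below is a self-contained adaptation of the reproducer above. It assumes an affected clang-9 build targeting AArch64 (e.g. clang --target=aarch64-linux-gnu -O2); asm_volatile_goto is defined locally because the kernel's compiler headers aren't available, and andy_test is marked noinline so its generated code can be inspected or exercised directly. With correct code generation the program prints 0; the broken output shown above would store 0xff instead.]

#include <stdbool.h>
#include <stdio.h>

/* Stand-in for the kernel's wrapper; not the kernel's exact definition. */
#define asm_volatile_goto(x...)	asm goto(x)

static bool branch_jump(void)
{
	/* Always branches to l_yes2, so this always returns true. */
	asm_volatile_goto(
		"1:	b	%l[l_yes2]"
		: : : : l_yes2);

	return false;
l_yes2:
	return true;
}

static bool branch_test(void)
{
	/* Both calls return true, so this is always false. */
	return (!branch_jump() && !branch_jump());
}

__attribute__((noinline)) void andy_test(int *v)
{
	if (branch_test())
		*v = 0xff;	/* must never execute */
}

int main(void)
{
	int v = 0;

	andy_test(&v);
	printf("v = %#x (expected 0)\n", v);	/* 0xff indicates the miscompile */
	return 0;
}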
On Tue, Sep 3, 2019 at 3:04 PM Andrew Murray <andrew.murray@arm.com> wrote: > > On Tue, Sep 03, 2019 at 05:37:55PM +0100, Will Deacon wrote: > > On Tue, Sep 03, 2019 at 04:31:20PM +0100, Andrew Murray wrote: > > > On Tue, Sep 03, 2019 at 04:15:44PM +0100, Andrew Murray wrote: > > > > On Tue, Sep 03, 2019 at 03:45:34PM +0100, Will Deacon wrote: > > > > > Does it work if the only thing you change is the toolchain, and use GCC > > > > > instead? > > > > > > > > Yup. > > > > > > Also this is Clang generation: > > > > > > ffff8000100f2700 <__ptrace_link>: > > > ffff8000100f2700: f9426009 ldr x9, [x0, #1216] > > > ffff8000100f2704: 91130008 add x8, x0, #0x4c0 > > > ffff8000100f2708: eb09011f cmp x8, x9 > > > ffff8000100f270c: 540002a1 b.ne ffff8000100f2760 <__ptrace_link+0x60> // b.any > > > ffff8000100f2710: f9425829 ldr x9, [x1, #1200] > > > ffff8000100f2714: 9112c02a add x10, x1, #0x4b0 > > > ffff8000100f2718: f9000528 str x8, [x9, #8] > > > ffff8000100f271c: f9026009 str x9, [x0, #1216] > > > ffff8000100f2720: f902640a str x10, [x0, #1224] > > > ffff8000100f2724: f9025828 str x8, [x1, #1200] > > > ffff8000100f2728: f9024001 str x1, [x0, #1152] > > > ffff8000100f272c: b4000162 cbz x2, ffff8000100f2758 <__ptrace_link+0x58> > > > ffff8000100f2730: b900985f str wzr, [x2, #152] > > > ffff8000100f2734: 14000004 b ffff8000100f2744 <__ptrace_link+0x44> > > > ffff8000100f2738: 14000001 b ffff8000100f273c <__ptrace_link+0x3c> > > > ffff8000100f273c: 14000006 b ffff8000100f2754 <__ptrace_link+0x54> > > > ffff8000100f2740: 14000001 b ffff8000100f2744 <__ptrace_link+0x44> > > > ffff8000100f2744: 52800028 mov w8, #0x1 // #1 > > > ffff8000100f2748: b828005f stadd w8, [x2] > > > ffff8000100f274c: f9030002 str x2, [x0, #1536] > > > ffff8000100f2750: d65f03c0 ret > > > ffff8000100f2754: 140007fd b ffff8000100f4748 <ptrace_check_attach+0xf8> > > > ... > > > > > > This looks like the default path (before we write over it) will take you to > > > the LSE code (e.g. ffff8000100f2734). I'm pretty sure this is wrong, or at > > > least not what we expected to see. Also why 4 branches? > > > > So I reproduced this with a silly atomic_inc wrapper: > > > > void will_atomic_inc(atomic_t *v) > > { > > atomic_inc(v); > > } > > > > Compiles to: > > > > 0000000000000018 <will_atomic_inc>: > > 18: 14000004 b 28 <will_atomic_inc+0x10> > > 1c: 14000001 b 20 <will_atomic_inc+0x8> > > 20: 14000005 b 34 <will_atomic_inc+0x1c> > > 24: 14000001 b 28 <will_atomic_inc+0x10> > > 28: 52800028 mov w8, #0x1 // #1 > > 2c: b828001f stadd w8, [x0] > > 30: d65f03c0 ret > > 34: 14000027 b d0 <dump_kernel_offset+0x60> > > 38: d65f03c0 ret > > > > which is going to explode. Indeed, I can reproduce the hang with `-cpu cortex-a57` and `-cpu cortex-a73` in QEMU. Looks like my qemu (3.1.0) doesn't recognizer newer cores, so I might have to build QEMU from source to test the v8.2 extension support. 
https://en.wikipedia.org/wiki/Comparison_of_ARMv8-A_cores > > I've come up with a simple reproducer for this issue: > > static bool branch_jump() > { > asm_volatile_goto( > "1: b %l[l_yes2]" > : : : : l_yes2); > > return false; > l_yes2: > return true; > } > > static bool branch_test() > { > return (!branch_jump() && !branch_jump()); > } > > void andy_test(int *v) > { > if (branch_test()) > *v = 0xff; > } > > This leads to the following (it shouldn't do anything): > > 0000000000000000 <andy_test>: > 0: 14000004 b 10 <andy_test+0x10> > 4: 14000001 b 8 <andy_test+0x8> > 8: 14000004 b 18 <andy_test+0x18> > c: 14000001 b 10 <andy_test+0x10> > 10: 52801fe8 mov w8, #0xff // #255 > 14: b9000008 str w8, [x0] > 18: d65f03c0 ret > > The issue goes away with any of the following hunks: > > > @@ -55,7 +55,7 @@ static bool branch_jump() > > static bool branch_test() > { > - return (!branch_jump() && !branch_jump()); > + return (!branch_jump()); > } > > void andy_test(int *v) > > > or: > > > @@ -53,14 +53,10 @@ static bool branch_jump() > return true; > } > > -static bool branch_test() > -{ > - return (!branch_jump() && !branch_jump()); > -} > > void andy_test(int *v) > { > - if (branch_test()) > + if (!branch_jump() && !branch_jump()) > *v = 0xff; > } Thanks for the report. We squashed many bugs related to asm goto, but it's difficult to say with 100% certainty that the current implementation is bug free. Simply throwing more exotic forms of control flow at it often shake out corner cases. Thank you very much for the reduced test case, and I'll look into getting a fix ready hopefully in time to make the clang-9 release train.
> On Wed, Sep 4, 2019 at 7:35 AM Nick Desaulniers <ndesaulniers@google.com> wrote:
>> Thanks for the report. We squashed many bugs related to asm goto, but
>> it's difficult to say with 100% certainty that the current
>> implementation is bug free. Simply throwing more exotic forms of
>> control flow at it often shake out corner cases. Thank you very much
>> for the reduced test case, and I'll look into getting a fix ready
>> hopefully in time to make the clang-9 release train.

On Tue, Sep 3, 2019 at 3:49 PM Itaru Kitayama <itaru.kitayama@gmail.com> wrote:
>
> Do you mean that you'd do a backport to Clang 9 as well as the trunk contribution?

Yes; I think the window for merging things in the 9.0 release is still
open, though they are late in the -rc cycle. If not, a 9.1 bugfix is
possible.
On Tue, Sep 03, 2019 at 03:53:34PM -0700, Nick Desaulniers wrote: > > On Wed, Sep 4, 2019 at 7:35 AM Nick Desaulniers <ndesaulniers@google.com> wrote: > >> Thanks for the report. We squashed many bugs related to asm goto, but > >> it's difficult to say with 100% certainty that the current > >> implementation is bug free. Simply throwing more exotic forms of > >> control flow at it often shake out corner cases. Thank you very much > >> for the reduced test case, and I'll look into getting a fix ready > >> hopefully in time to make the clang-9 release train. > > On Tue, Sep 3, 2019 at 3:49 PM Itaru Kitayama <itaru.kitayama@gmail.com> wrote: > > > > Do you mean that you'd do a backport to Clang 9 as well as the trunk contribution? > > Yes; I think the window for merging things in the 9.0 release is still > open, though they are late in the -rc cycle. If not 9.1 bugfix is > possible. Thanks, Nick. If you run out of time to get it fixed then it would probably be best to disable 'asm goto' support by default for arm64, since at least you'll have a functional kernel in that case. Will
On Tue, Sep 3, 2019 at 3:04 PM Andrew Murray <andrew.murray@arm.com> wrote: > > On Tue, Sep 03, 2019 at 05:37:55PM +0100, Will Deacon wrote: > > On Tue, Sep 03, 2019 at 04:31:20PM +0100, Andrew Murray wrote: > > > On Tue, Sep 03, 2019 at 04:15:44PM +0100, Andrew Murray wrote: > > > > On Tue, Sep 03, 2019 at 03:45:34PM +0100, Will Deacon wrote: > > > > > Does it work if the only thing you change is the toolchain, and use GCC > > > > > instead? > > > > > > > > Yup. > > > > > > Also this is Clang generation: > > > > > > ffff8000100f2700 <__ptrace_link>: > > > ffff8000100f2700: f9426009 ldr x9, [x0, #1216] > > > ffff8000100f2704: 91130008 add x8, x0, #0x4c0 > > > ffff8000100f2708: eb09011f cmp x8, x9 > > > ffff8000100f270c: 540002a1 b.ne ffff8000100f2760 <__ptrace_link+0x60> // b.any > > > ffff8000100f2710: f9425829 ldr x9, [x1, #1200] > > > ffff8000100f2714: 9112c02a add x10, x1, #0x4b0 > > > ffff8000100f2718: f9000528 str x8, [x9, #8] > > > ffff8000100f271c: f9026009 str x9, [x0, #1216] > > > ffff8000100f2720: f902640a str x10, [x0, #1224] > > > ffff8000100f2724: f9025828 str x8, [x1, #1200] > > > ffff8000100f2728: f9024001 str x1, [x0, #1152] > > > ffff8000100f272c: b4000162 cbz x2, ffff8000100f2758 <__ptrace_link+0x58> > > > ffff8000100f2730: b900985f str wzr, [x2, #152] > > > ffff8000100f2734: 14000004 b ffff8000100f2744 <__ptrace_link+0x44> > > > ffff8000100f2738: 14000001 b ffff8000100f273c <__ptrace_link+0x3c> > > > ffff8000100f273c: 14000006 b ffff8000100f2754 <__ptrace_link+0x54> > > > ffff8000100f2740: 14000001 b ffff8000100f2744 <__ptrace_link+0x44> > > > ffff8000100f2744: 52800028 mov w8, #0x1 // #1 > > > ffff8000100f2748: b828005f stadd w8, [x2] > > > ffff8000100f274c: f9030002 str x2, [x0, #1536] > > > ffff8000100f2750: d65f03c0 ret > > > ffff8000100f2754: 140007fd b ffff8000100f4748 <ptrace_check_attach+0xf8> > > > ... > > > > > > This looks like the default path (before we write over it) will take you to > > > the LSE code (e.g. ffff8000100f2734). I'm pretty sure this is wrong, or at > > > least not what we expected to see. Also why 4 branches? > > > > So I reproduced this with a silly atomic_inc wrapper: > > > > void will_atomic_inc(atomic_t *v) > > { > > atomic_inc(v); > > } > > > > Compiles to: > > > > 0000000000000018 <will_atomic_inc>: > > 18: 14000004 b 28 <will_atomic_inc+0x10> > > 1c: 14000001 b 20 <will_atomic_inc+0x8> > > 20: 14000005 b 34 <will_atomic_inc+0x1c> > > 24: 14000001 b 28 <will_atomic_inc+0x10> > > 28: 52800028 mov w8, #0x1 // #1 > > 2c: b828001f stadd w8, [x0] > > 30: d65f03c0 ret > > 34: 14000027 b d0 <dump_kernel_offset+0x60> > > 38: d65f03c0 ret > > > > which is going to explode. 
> > I've come up with a simple reproducer for this issue: > > static bool branch_jump() > { > asm_volatile_goto( > "1: b %l[l_yes2]" > : : : : l_yes2); > > return false; > l_yes2: > return true; > } > > static bool branch_test() > { > return (!branch_jump() && !branch_jump()); > } > > void andy_test(int *v) > { > if (branch_test()) > *v = 0xff; > } > > This leads to the following (it shouldn't do anything): > > 0000000000000000 <andy_test>: > 0: 14000004 b 10 <andy_test+0x10> > 4: 14000001 b 8 <andy_test+0x8> > 8: 14000004 b 18 <andy_test+0x18> > c: 14000001 b 10 <andy_test+0x10> > 10: 52801fe8 mov w8, #0xff // #255 > 14: b9000008 str w8, [x0] > 18: d65f03c0 ret > > The issue goes away with any of the following hunks: > > > @@ -55,7 +55,7 @@ static bool branch_jump() > > static bool branch_test() > { > - return (!branch_jump() && !branch_jump()); > + return (!branch_jump()); > } > > void andy_test(int *v) > > > or: > > > @@ -53,14 +53,10 @@ static bool branch_jump() > return true; > } > > -static bool branch_test() > -{ > - return (!branch_jump() && !branch_jump()); > -} > > void andy_test(int *v) > { > - if (branch_test()) > + if (!branch_jump() && !branch_jump()) > *v = 0xff; > } Indeed, playing with the definition of `__lse_ll_sc_body`, I can get the kernel to boot again. So I think your very helpful test cases are illustrating two different problems: https://godbolt.org/z/dMf7x- See the disassembly of `andy_test2`. Reference to the correct label is emitted in the inline asm, but there's some silly unconditional branches to the next instruction. That's issue #1 and part of the reason you see superfluous branches. With that fixed, `andy_test2` would match between GCC and Clang. I think that can be a very late peephole optimization (and further, we could probably combine labels that refer to the same location, oh and .Lfunc_endX could just use `.`, too!). LLVM devs noted that the x86 backend doesn't have this issue, but this is a curiously recurring pattern I'm noticing in LLVM where some arch agnostic optimization is only implemented for x86... I'm reading through our Branch Folding pass which I think should handle this, but I'll need to fire up a debugger. Issue #2 is the more critical issue, but may be conflated with issue #1. Issue #2 is the nonsensical control flow with one level of inlining. See how in the disassembly of `andy_test`, the first label referenced from inline assembly is *before* the mov/str when it should have been *after*. Not sure where we could be going wrong, but it's straightforward for me to observe the code change as its transformed through LLVM, and I've debugged and fixed issues related to inlining asm goto before.
On Wed, Sep 04, 2019 at 10:28:14AM -0700, Nick Desaulniers wrote: > On Tue, Sep 3, 2019 at 3:04 PM Andrew Murray <andrew.murray@arm.com> wrote: > > > > On Tue, Sep 03, 2019 at 05:37:55PM +0100, Will Deacon wrote: > > > On Tue, Sep 03, 2019 at 04:31:20PM +0100, Andrew Murray wrote: > > > > On Tue, Sep 03, 2019 at 04:15:44PM +0100, Andrew Murray wrote: > > > > > On Tue, Sep 03, 2019 at 03:45:34PM +0100, Will Deacon wrote: > > > > > > Does it work if the only thing you change is the toolchain, and use GCC > > > > > > instead? > > > > > > > > > > Yup. > > > > > > > > Also this is Clang generation: > > > > > > > > ffff8000100f2700 <__ptrace_link>: > > > > ffff8000100f2700: f9426009 ldr x9, [x0, #1216] > > > > ffff8000100f2704: 91130008 add x8, x0, #0x4c0 > > > > ffff8000100f2708: eb09011f cmp x8, x9 > > > > ffff8000100f270c: 540002a1 b.ne ffff8000100f2760 <__ptrace_link+0x60> // b.any > > > > ffff8000100f2710: f9425829 ldr x9, [x1, #1200] > > > > ffff8000100f2714: 9112c02a add x10, x1, #0x4b0 > > > > ffff8000100f2718: f9000528 str x8, [x9, #8] > > > > ffff8000100f271c: f9026009 str x9, [x0, #1216] > > > > ffff8000100f2720: f902640a str x10, [x0, #1224] > > > > ffff8000100f2724: f9025828 str x8, [x1, #1200] > > > > ffff8000100f2728: f9024001 str x1, [x0, #1152] > > > > ffff8000100f272c: b4000162 cbz x2, ffff8000100f2758 <__ptrace_link+0x58> > > > > ffff8000100f2730: b900985f str wzr, [x2, #152] > > > > ffff8000100f2734: 14000004 b ffff8000100f2744 <__ptrace_link+0x44> > > > > ffff8000100f2738: 14000001 b ffff8000100f273c <__ptrace_link+0x3c> > > > > ffff8000100f273c: 14000006 b ffff8000100f2754 <__ptrace_link+0x54> > > > > ffff8000100f2740: 14000001 b ffff8000100f2744 <__ptrace_link+0x44> > > > > ffff8000100f2744: 52800028 mov w8, #0x1 // #1 > > > > ffff8000100f2748: b828005f stadd w8, [x2] > > > > ffff8000100f274c: f9030002 str x2, [x0, #1536] > > > > ffff8000100f2750: d65f03c0 ret > > > > ffff8000100f2754: 140007fd b ffff8000100f4748 <ptrace_check_attach+0xf8> > > > > ... > > > > > > > > This looks like the default path (before we write over it) will take you to > > > > the LSE code (e.g. ffff8000100f2734). I'm pretty sure this is wrong, or at > > > > least not what we expected to see. Also why 4 branches? > > > > > > So I reproduced this with a silly atomic_inc wrapper: > > > > > > void will_atomic_inc(atomic_t *v) > > > { > > > atomic_inc(v); > > > } > > > > > > Compiles to: > > > > > > 0000000000000018 <will_atomic_inc>: > > > 18: 14000004 b 28 <will_atomic_inc+0x10> > > > 1c: 14000001 b 20 <will_atomic_inc+0x8> > > > 20: 14000005 b 34 <will_atomic_inc+0x1c> > > > 24: 14000001 b 28 <will_atomic_inc+0x10> > > > 28: 52800028 mov w8, #0x1 // #1 > > > 2c: b828001f stadd w8, [x0] > > > 30: d65f03c0 ret > > > 34: 14000027 b d0 <dump_kernel_offset+0x60> > > > 38: d65f03c0 ret > > > > > > which is going to explode. 
> > > > I've come up with a simple reproducer for this issue: > > > > static bool branch_jump() > > { > > asm_volatile_goto( > > "1: b %l[l_yes2]" > > : : : : l_yes2); > > > > return false; > > l_yes2: > > return true; > > } > > > > static bool branch_test() > > { > > return (!branch_jump() && !branch_jump()); > > } > > > > void andy_test(int *v) > > { > > if (branch_test()) > > *v = 0xff; > > } > > > > This leads to the following (it shouldn't do anything): > > > > 0000000000000000 <andy_test>: > > 0: 14000004 b 10 <andy_test+0x10> > > 4: 14000001 b 8 <andy_test+0x8> > > 8: 14000004 b 18 <andy_test+0x18> > > c: 14000001 b 10 <andy_test+0x10> > > 10: 52801fe8 mov w8, #0xff // #255 > > 14: b9000008 str w8, [x0] > > 18: d65f03c0 ret > > > > The issue goes away with any of the following hunks: > > > > > > @@ -55,7 +55,7 @@ static bool branch_jump() > > > > static bool branch_test() > > { > > - return (!branch_jump() && !branch_jump()); > > + return (!branch_jump()); > > } > > > > void andy_test(int *v) > > > > > > or: > > > > > > @@ -53,14 +53,10 @@ static bool branch_jump() > > return true; > > } > > > > -static bool branch_test() > > -{ > > - return (!branch_jump() && !branch_jump()); > > -} > > > > void andy_test(int *v) > > { > > - if (branch_test()) > > + if (!branch_jump() && !branch_jump()) > > *v = 0xff; > > } > > Indeed, playing with the definition of `__lse_ll_sc_body`, I can get > the kernel to boot again. Thanks for investigating this. Did it boot to a prompt? I played with the structure of the code and too was able to get it to boot, but I found that it hung later-on during boot. Thus I lost a bit of confidence in it. > > So I think your very helpful test cases are illustrating two different problems: > https://godbolt.org/z/dMf7x- > See the disassembly of `andy_test2`. Reference to the correct label > is emitted in the inline asm, but there's some silly unconditional > branches to the next instruction. That's issue #1 and part of the > reason you see superfluous branches. With that fixed, `andy_test2` > would match between GCC and Clang. I think that can be a very late > peephole optimization (and further, we could probably combine labels > that refer to the same location, oh and .Lfunc_endX could just use > `.`, too!). LLVM devs noted that the x86 backend doesn't have this > issue, but this is a curiously recurring pattern I'm noticing in LLVM > where some arch agnostic optimization is only implemented for x86... > I'm reading through our Branch Folding pass which I think should > handle this, but I'll need to fire up a debugger. > > Issue #2 is the more critical issue, but may be conflated with issue > #1. Issue #2 is the nonsensical control flow with one level of > inlining. See how in the disassembly of `andy_test`, the first label > referenced from inline assembly is *before* the mov/str when it should > have been *after*. Not sure where we could be going wrong, but it's > straightforward for me to observe the code change as its transformed > through LLVM, and I've debugged and fixed issues related to inlining > asm goto before. You may also be interested in this: https://godbolt.org/z/8OthP2 void andy_test3(int *v) { if (!branch_jump()) return; if (!branch_jump()) return; *v = 0xff; } (I used a similar approach with system_uses_lse_atomics to get the kernel to boot a bit more). This generated code does the right thing here (in comparison to andy_test2). I felt like this gave an insight as to what is going on, but I don't have the knowledge to know what. 
It's as if the early return prevents the compiler from getting confused when it should otherwise jump to the second goto. Thanks, Andrew Murray > -- > Thanks, > ~Nick Desaulniers
On Thu, Sep 5, 2019 at 4:25 AM Andrew Murray <andrew.murray@arm.com> wrote: > > On Wed, Sep 04, 2019 at 10:28:14AM -0700, Nick Desaulniers wrote: > > On Tue, Sep 3, 2019 at 3:04 PM Andrew Murray <andrew.murray@arm.com> wrote: > > > > > > On Tue, Sep 03, 2019 at 05:37:55PM +0100, Will Deacon wrote: > > > > On Tue, Sep 03, 2019 at 04:31:20PM +0100, Andrew Murray wrote: > > > > > On Tue, Sep 03, 2019 at 04:15:44PM +0100, Andrew Murray wrote: > > > > > > On Tue, Sep 03, 2019 at 03:45:34PM +0100, Will Deacon wrote: > > > > > > > Does it work if the only thing you change is the toolchain, and use GCC > > > > > > > instead? > > > > > > > > > > > > Yup. > > > > > > > > > > Also this is Clang generation: > > > > > > > > > > ffff8000100f2700 <__ptrace_link>: > > > > > ffff8000100f2700: f9426009 ldr x9, [x0, #1216] > > > > > ffff8000100f2704: 91130008 add x8, x0, #0x4c0 > > > > > ffff8000100f2708: eb09011f cmp x8, x9 > > > > > ffff8000100f270c: 540002a1 b.ne ffff8000100f2760 <__ptrace_link+0x60> // b.any > > > > > ffff8000100f2710: f9425829 ldr x9, [x1, #1200] > > > > > ffff8000100f2714: 9112c02a add x10, x1, #0x4b0 > > > > > ffff8000100f2718: f9000528 str x8, [x9, #8] > > > > > ffff8000100f271c: f9026009 str x9, [x0, #1216] > > > > > ffff8000100f2720: f902640a str x10, [x0, #1224] > > > > > ffff8000100f2724: f9025828 str x8, [x1, #1200] > > > > > ffff8000100f2728: f9024001 str x1, [x0, #1152] > > > > > ffff8000100f272c: b4000162 cbz x2, ffff8000100f2758 <__ptrace_link+0x58> > > > > > ffff8000100f2730: b900985f str wzr, [x2, #152] > > > > > ffff8000100f2734: 14000004 b ffff8000100f2744 <__ptrace_link+0x44> > > > > > ffff8000100f2738: 14000001 b ffff8000100f273c <__ptrace_link+0x3c> > > > > > ffff8000100f273c: 14000006 b ffff8000100f2754 <__ptrace_link+0x54> > > > > > ffff8000100f2740: 14000001 b ffff8000100f2744 <__ptrace_link+0x44> > > > > > ffff8000100f2744: 52800028 mov w8, #0x1 // #1 > > > > > ffff8000100f2748: b828005f stadd w8, [x2] > > > > > ffff8000100f274c: f9030002 str x2, [x0, #1536] > > > > > ffff8000100f2750: d65f03c0 ret > > > > > ffff8000100f2754: 140007fd b ffff8000100f4748 <ptrace_check_attach+0xf8> > > > > > ... > > > > > > > > > > This looks like the default path (before we write over it) will take you to > > > > > the LSE code (e.g. ffff8000100f2734). I'm pretty sure this is wrong, or at > > > > > least not what we expected to see. Also why 4 branches? > > > > > > > > So I reproduced this with a silly atomic_inc wrapper: > > > > > > > > void will_atomic_inc(atomic_t *v) > > > > { > > > > atomic_inc(v); > > > > } > > > > > > > > Compiles to: > > > > > > > > 0000000000000018 <will_atomic_inc>: > > > > 18: 14000004 b 28 <will_atomic_inc+0x10> > > > > 1c: 14000001 b 20 <will_atomic_inc+0x8> > > > > 20: 14000005 b 34 <will_atomic_inc+0x1c> > > > > 24: 14000001 b 28 <will_atomic_inc+0x10> > > > > 28: 52800028 mov w8, #0x1 // #1 > > > > 2c: b828001f stadd w8, [x0] > > > > 30: d65f03c0 ret > > > > 34: 14000027 b d0 <dump_kernel_offset+0x60> > > > > 38: d65f03c0 ret > > > > > > > > which is going to explode. 
> > > > > > I've come up with a simple reproducer for this issue: > > > > > > static bool branch_jump() > > > { > > > asm_volatile_goto( > > > "1: b %l[l_yes2]" > > > : : : : l_yes2); > > > > > > return false; > > > l_yes2: > > > return true; > > > } > > > > > > static bool branch_test() > > > { > > > return (!branch_jump() && !branch_jump()); > > > } > > > > > > void andy_test(int *v) > > > { > > > if (branch_test()) > > > *v = 0xff; > > > } > > > > > > This leads to the following (it shouldn't do anything): > > > > > > 0000000000000000 <andy_test>: > > > 0: 14000004 b 10 <andy_test+0x10> > > > 4: 14000001 b 8 <andy_test+0x8> > > > 8: 14000004 b 18 <andy_test+0x18> > > > c: 14000001 b 10 <andy_test+0x10> > > > 10: 52801fe8 mov w8, #0xff // #255 > > > 14: b9000008 str w8, [x0] > > > 18: d65f03c0 ret > > > > > > The issue goes away with any of the following hunks: > > > > > > > > > @@ -55,7 +55,7 @@ static bool branch_jump() > > > > > > static bool branch_test() > > > { > > > - return (!branch_jump() && !branch_jump()); > > > + return (!branch_jump()); > > > } > > > > > > void andy_test(int *v) > > > > > > > > > or: > > > > > > > > > @@ -53,14 +53,10 @@ static bool branch_jump() > > > return true; > > > } > > > > > > -static bool branch_test() > > > -{ > > > - return (!branch_jump() && !branch_jump()); > > > -} > > > > > > void andy_test(int *v) > > > { > > > - if (branch_test()) > > > + if (!branch_jump() && !branch_jump()) > > > *v = 0xff; > > > } > > > > Indeed, playing with the definition of `__lse_ll_sc_body`, I can get > > the kernel to boot again. > > Thanks for investigating this. > > Did it boot to a prompt? I played with the structure of the code and > too was able to get it to boot, but I found that it hung later-on during > boot. Thus I lost a bit of confidence in it. > > > > > So I think your very helpful test cases are illustrating two different problems: > > https://godbolt.org/z/dMf7x- > > See the disassembly of `andy_test2`. Reference to the correct label > > is emitted in the inline asm, but there's some silly unconditional > > branches to the next instruction. That's issue #1 and part of the > > reason you see superfluous branches. With that fixed, `andy_test2` > > would match between GCC and Clang. I think that can be a very late > > peephole optimization (and further, we could probably combine labels > > that refer to the same location, oh and .Lfunc_endX could just use > > `.`, too!). LLVM devs noted that the x86 backend doesn't have this > > issue, but this is a curiously recurring pattern I'm noticing in LLVM > > where some arch agnostic optimization is only implemented for x86... > > I'm reading through our Branch Folding pass which I think should > > handle this, but I'll need to fire up a debugger. > > > > Issue #2 is the more critical issue, but may be conflated with issue > > #1. Issue #2 is the nonsensical control flow with one level of > > inlining. See how in the disassembly of `andy_test`, the first label > > referenced from inline assembly is *before* the mov/str when it should > > have been *after*. Not sure where we could be going wrong, but it's > > straightforward for me to observe the code change as its transformed > > through LLVM, and I've debugged and fixed issues related to inlining > > asm goto before. 
> > You may also be interested in this: > > https://godbolt.org/z/8OthP2 > > void andy_test3(int *v) > { > if (!branch_jump()) > return; > > if (!branch_jump()) > return; > > *v = 0xff; > } > > (I used a similar approach with system_uses_lse_atomics to get the > kernel to boot a bit more). > > This generated code does the right thing here (in comparison to andy_test2). > I felt like this gave an insight as to what is going on, but I don't > have the knowledge to know what. It's as if the early return prevents the > compiler from getting confused when it should otherwise jump to the second > goto. Thanks for all of these test cases. It highlighted a bug in our implementation that I have a fix in hand for (currently awaiting code review): https://reviews.llvm.org/D67252 Further, I wrote a check for this kind of bug in our verification pass, so that this kind of bug doesn't creep back in: https://reviews.llvm.org/D67196 I cleared https://reviews.llvm.org/D67252 w/ the clang-9 release manager; assuming I land it today or early next week we'll likely be able to pick it up for the clang-9 release. I very much appreciate the help debugging and the reduced test cases. It's been a pleasure!
diff --git a/arch/arm64/include/asm/atomic.h b/arch/arm64/include/asm/atomic.h index 657b0457d83c..c70d3f389d29 100644 --- a/arch/arm64/include/asm/atomic.h +++ b/arch/arm64/include/asm/atomic.h @@ -17,16 +17,7 @@ #ifdef __KERNEL__ -#define __ARM64_IN_ATOMIC_IMPL - -#if defined(CONFIG_ARM64_LSE_ATOMICS) && defined(CONFIG_AS_LSE) -#include <asm/atomic_lse.h> -#else -#include <asm/atomic_ll_sc.h> -#endif - -#undef __ARM64_IN_ATOMIC_IMPL - +#include <asm/atomic_arch.h> #include <asm/cmpxchg.h> #define ATOMIC_INIT(i) { (i) } diff --git a/arch/arm64/include/asm/atomic_arch.h b/arch/arm64/include/asm/atomic_arch.h new file mode 100644 index 000000000000..1aac7fc65084 --- /dev/null +++ b/arch/arm64/include/asm/atomic_arch.h @@ -0,0 +1,155 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Selection between LSE and LL/SC atomics. + * + * Copyright (C) 2018 ARM Ltd. + * Author: Andrew Murray <andrew.murray@arm.com> + */ + +#ifndef __ASM_ATOMIC_ARCH_H +#define __ASM_ATOMIC_ARCH_H + + +#include <linux/jump_label.h> + +#include <asm/cpucaps.h> +#include <asm/atomic_ll_sc.h> +#include <asm/atomic_lse.h> + +extern struct static_key_false cpu_hwcap_keys[ARM64_NCAPS]; +extern struct static_key_false arm64_const_caps_ready; + +static inline bool system_uses_lse_atomics(void) +{ + return (IS_ENABLED(CONFIG_ARM64_LSE_ATOMICS) && + IS_ENABLED(CONFIG_AS_LSE) && + static_branch_likely(&arm64_const_caps_ready)) && + static_branch_likely(&cpu_hwcap_keys[ARM64_HAS_LSE_ATOMICS]); +} + +#define __lse_ll_sc_body(op, ...) \ +({ \ + system_uses_lse_atomics() ? \ + __lse_##op(__VA_ARGS__) : \ + __ll_sc_##op(__VA_ARGS__); \ +}) + +#define ATOMIC_OP(op) \ +static inline void arch_##op(int i, atomic_t *v) \ +{ \ + __lse_ll_sc_body(op, i, v); \ +} + +ATOMIC_OP(atomic_andnot) +ATOMIC_OP(atomic_or) +ATOMIC_OP(atomic_xor) +ATOMIC_OP(atomic_add) +ATOMIC_OP(atomic_and) +ATOMIC_OP(atomic_sub) + + +#define ATOMIC_FETCH_OP(name, op) \ +static inline int arch_##op##name(int i, atomic_t *v) \ +{ \ + return __lse_ll_sc_body(op##name, i, v); \ +} + +#define ATOMIC_FETCH_OPS(op) \ + ATOMIC_FETCH_OP(_relaxed, op) \ + ATOMIC_FETCH_OP(_acquire, op) \ + ATOMIC_FETCH_OP(_release, op) \ + ATOMIC_FETCH_OP( , op) + +ATOMIC_FETCH_OPS(atomic_fetch_andnot) +ATOMIC_FETCH_OPS(atomic_fetch_or) +ATOMIC_FETCH_OPS(atomic_fetch_xor) +ATOMIC_FETCH_OPS(atomic_fetch_add) +ATOMIC_FETCH_OPS(atomic_fetch_and) +ATOMIC_FETCH_OPS(atomic_fetch_sub) +ATOMIC_FETCH_OPS(atomic_add_return) +ATOMIC_FETCH_OPS(atomic_sub_return) + + +#define ATOMIC64_OP(op) \ +static inline void arch_##op(long i, atomic64_t *v) \ +{ \ + __lse_ll_sc_body(op, i, v); \ +} + +ATOMIC64_OP(atomic64_andnot) +ATOMIC64_OP(atomic64_or) +ATOMIC64_OP(atomic64_xor) +ATOMIC64_OP(atomic64_add) +ATOMIC64_OP(atomic64_and) +ATOMIC64_OP(atomic64_sub) + + +#define ATOMIC64_FETCH_OP(name, op) \ +static inline long arch_##op##name(long i, atomic64_t *v) \ +{ \ + return __lse_ll_sc_body(op##name, i, v); \ +} + +#define ATOMIC64_FETCH_OPS(op) \ + ATOMIC64_FETCH_OP(_relaxed, op) \ + ATOMIC64_FETCH_OP(_acquire, op) \ + ATOMIC64_FETCH_OP(_release, op) \ + ATOMIC64_FETCH_OP( , op) + +ATOMIC64_FETCH_OPS(atomic64_fetch_andnot) +ATOMIC64_FETCH_OPS(atomic64_fetch_or) +ATOMIC64_FETCH_OPS(atomic64_fetch_xor) +ATOMIC64_FETCH_OPS(atomic64_fetch_add) +ATOMIC64_FETCH_OPS(atomic64_fetch_and) +ATOMIC64_FETCH_OPS(atomic64_fetch_sub) +ATOMIC64_FETCH_OPS(atomic64_add_return) +ATOMIC64_FETCH_OPS(atomic64_sub_return) + + +static inline long arch_atomic64_dec_if_positive(atomic64_t *v) +{ + return 
__lse_ll_sc_body(atomic64_dec_if_positive, v);
+}
+
+#define __CMPXCHG_CASE(name, sz) \
+static inline u##sz __cmpxchg_case_##name##sz(volatile void *ptr, \
+ u##sz old, \
+ u##sz new) \
+{ \
+ return __lse_ll_sc_body(_cmpxchg_case_##name##sz, \
+ ptr, old, new); \
+}
+
+__CMPXCHG_CASE( , 8)
+__CMPXCHG_CASE( , 16)
+__CMPXCHG_CASE( , 32)
+__CMPXCHG_CASE( , 64)
+__CMPXCHG_CASE(acq_, 8)
+__CMPXCHG_CASE(acq_, 16)
+__CMPXCHG_CASE(acq_, 32)
+__CMPXCHG_CASE(acq_, 64)
+__CMPXCHG_CASE(rel_, 8)
+__CMPXCHG_CASE(rel_, 16)
+__CMPXCHG_CASE(rel_, 32)
+__CMPXCHG_CASE(rel_, 64)
+__CMPXCHG_CASE(mb_, 8)
+__CMPXCHG_CASE(mb_, 16)
+__CMPXCHG_CASE(mb_, 32)
+__CMPXCHG_CASE(mb_, 64)
+
+
+#define __CMPXCHG_DBL(name) \
+static inline long __cmpxchg_double##name(unsigned long old1, \
+ unsigned long old2, \
+ unsigned long new1, \
+ unsigned long new2, \
+ volatile void *ptr) \
+{ \
+ return __lse_ll_sc_body(_cmpxchg_double##name, \
+ old1, old2, new1, new2, ptr); \
+}
+
+__CMPXCHG_DBL( )
+__CMPXCHG_DBL(_mb)
+
+#endif /* __ASM_ATOMIC_LSE_H */
diff --git a/arch/arm64/include/asm/atomic_ll_sc.h b/arch/arm64/include/asm/atomic_ll_sc.h
index 6dd011e0b434..95091f72228b 100644
--- a/arch/arm64/include/asm/atomic_ll_sc.h
+++ b/arch/arm64/include/asm/atomic_ll_sc.h
@@ -10,83 +10,86 @@
 #ifndef __ASM_ATOMIC_LL_SC_H
 #define __ASM_ATOMIC_LL_SC_H

-#ifndef __ARM64_IN_ATOMIC_IMPL
-#error "please don't include this file directly"
+#if IS_ENABLED(CONFIG_ARM64_LSE_ATOMICS) && IS_ENABLED(CONFIG_AS_LSE)
+#define __LL_SC_FALLBACK(asm_ops) \
+" b 3f\n" \
+" .subsection 1\n" \
+"3:\n" \
+asm_ops "\n" \
+" b 4f\n" \
+" .previous\n" \
+"4:\n"
+#else
+#define __LL_SC_FALLBACK(asm_ops) asm_ops
 #endif

 /*
  * AArch64 UP and SMP safe atomic ops. We use load exclusive and
  * store exclusive to ensure that these are atomic. We may loop
  * to ensure that the update happens.
- *
- * NOTE: these functions do *not* follow the PCS and must explicitly
- * save any clobbered registers other than x0 (regardless of return
- * value). This is achieved through -fcall-saved-* compiler flags for
- * this file, which unfortunately don't work on a per-function basis
- * (the optimize attribute silently ignores these options).
 */

 #define ATOMIC_OP(op, asm_op, constraint) \
-__LL_SC_INLINE void \
-__LL_SC_PREFIX(arch_atomic_##op(int i, atomic_t *v)) \
+static inline void \
+__ll_sc_atomic_##op(int i, atomic_t *v) \
 { \
 unsigned long tmp; \
 int result; \
 \
 asm volatile("// atomic_" #op "\n" \
+ __LL_SC_FALLBACK( \
" prfm pstl1strm, %2\n" \
"1: ldxr %w0, %2\n" \
" " #asm_op " %w0, %w0, %w3\n" \
" stxr %w1, %w0, %2\n" \
-" cbnz %w1, 1b" \
+" cbnz %w1, 1b\n") \
 : "=&r" (result), "=&r" (tmp), "+Q" (v->counter) \
 : #constraint "r" (i)); \
-} \
-__LL_SC_EXPORT(arch_atomic_##op);
+}

 #define ATOMIC_OP_RETURN(name, mb, acq, rel, cl, op, asm_op, constraint)\
-__LL_SC_INLINE int \
-__LL_SC_PREFIX(arch_atomic_##op##_return##name(int i, atomic_t *v)) \
+static inline int \
+__ll_sc_atomic_##op##_return##name(int i, atomic_t *v) \
 { \
 unsigned long tmp; \
 int result; \
 \
 asm volatile("// atomic_" #op "_return" #name "\n" \
+ __LL_SC_FALLBACK( \
" prfm pstl1strm, %2\n" \
"1: ld" #acq "xr %w0, %2\n" \
" " #asm_op " %w0, %w0, %w3\n" \
" st" #rel "xr %w1, %w0, %2\n" \
" cbnz %w1, 1b\n" \
-" " #mb \
+" " #mb ) \
 : "=&r" (result), "=&r" (tmp), "+Q" (v->counter) \
 : #constraint "r" (i) \
 : cl); \
 \
 return result; \
-} \
-__LL_SC_EXPORT(arch_atomic_##op##_return##name);
+}

-#define ATOMIC_FETCH_OP(name, mb, acq, rel, cl, op, asm_op, constraint) \
-__LL_SC_INLINE int \
-__LL_SC_PREFIX(arch_atomic_fetch_##op##name(int i, atomic_t *v)) \
+#define ATOMIC_FETCH_OP(name, mb, acq, rel, cl, op, asm_op, constraint) \
+static inline int \
+__ll_sc_atomic_fetch_##op##name(int i, atomic_t *v) \
 { \
 unsigned long tmp; \
 int val, result; \
 \
 asm volatile("// atomic_fetch_" #op #name "\n" \
+ __LL_SC_FALLBACK( \
" prfm pstl1strm, %3\n" \
"1: ld" #acq "xr %w0, %3\n" \
" " #asm_op " %w1, %w0, %w4\n" \
" st" #rel "xr %w2, %w1, %3\n" \
" cbnz %w2, 1b\n" \
-" " #mb \
+" " #mb ) \
 : "=&r" (result), "=&r" (val), "=&r" (tmp), "+Q" (v->counter) \
 : #constraint "r" (i) \
 : cl); \
 \
 return result; \
-} \
-__LL_SC_EXPORT(arch_atomic_fetch_##op##name);
+}

 #define ATOMIC_OPS(...) \
 ATOMIC_OP(__VA_ARGS__) \
@@ -121,66 +124,66 @@ ATOMIC_OPS(xor, eor, )
 #undef ATOMIC_OP

 #define ATOMIC64_OP(op, asm_op, constraint) \
-__LL_SC_INLINE void \
-__LL_SC_PREFIX(arch_atomic64_##op(s64 i, atomic64_t *v)) \
+static inline void \
+__ll_sc_atomic64_##op(s64 i, atomic64_t *v) \
 { \
 s64 result; \
 unsigned long tmp; \
 \
 asm volatile("// atomic64_" #op "\n" \
+ __LL_SC_FALLBACK( \
" prfm pstl1strm, %2\n" \
"1: ldxr %0, %2\n" \
" " #asm_op " %0, %0, %3\n" \
" stxr %w1, %0, %2\n" \
-" cbnz %w1, 1b" \
+" cbnz %w1, 1b") \
 : "=&r" (result), "=&r" (tmp), "+Q" (v->counter) \
 : #constraint "r" (i)); \
-} \
-__LL_SC_EXPORT(arch_atomic64_##op);
+}

 #define ATOMIC64_OP_RETURN(name, mb, acq, rel, cl, op, asm_op, constraint)\
-__LL_SC_INLINE s64 \
-__LL_SC_PREFIX(arch_atomic64_##op##_return##name(s64 i, atomic64_t *v))\
+static inline long \
+__ll_sc_atomic64_##op##_return##name(s64 i, atomic64_t *v) \
 { \
 s64 result; \
 unsigned long tmp; \
 \
 asm volatile("// atomic64_" #op "_return" #name "\n" \
+ __LL_SC_FALLBACK( \
" prfm pstl1strm, %2\n" \
"1: ld" #acq "xr %0, %2\n" \
" " #asm_op " %0, %0, %3\n" \
" st" #rel "xr %w1, %0, %2\n" \
" cbnz %w1, 1b\n" \
-" " #mb \
+" " #mb ) \
 : "=&r" (result), "=&r" (tmp), "+Q" (v->counter) \
 : #constraint "r" (i) \
 : cl); \
 \
 return result; \
-} \
-__LL_SC_EXPORT(arch_atomic64_##op##_return##name);
+}

 #define ATOMIC64_FETCH_OP(name, mb, acq, rel, cl, op, asm_op, constraint)\
-__LL_SC_INLINE s64 \
-__LL_SC_PREFIX(arch_atomic64_fetch_##op##name(s64 i, atomic64_t *v)) \
+static inline long \
+__ll_sc_atomic64_fetch_##op##name(s64 i, atomic64_t *v) \
 { \
 s64 result, val; \
 unsigned long tmp; \
 \
 asm volatile("// atomic64_fetch_" #op #name "\n" \
+ __LL_SC_FALLBACK( \
" prfm pstl1strm, %3\n" \
"1: ld" #acq "xr %0, %3\n" \
" " #asm_op " %1, %0, %4\n" \
" st" #rel "xr %w2, %1, %3\n" \
" cbnz %w2, 1b\n" \
-" " #mb \
+" " #mb ) \
 : "=&r" (result), "=&r" (val), "=&r" (tmp), "+Q" (v->counter) \
 : #constraint "r" (i) \
 : cl); \
 \
 return result; \
-} \
-__LL_SC_EXPORT(arch_atomic64_fetch_##op##name);
+}

 #define ATOMIC64_OPS(...) \
 ATOMIC64_OP(__VA_ARGS__) \
@@ -214,13 +217,14 @@ ATOMIC64_OPS(xor, eor, L)
 #undef ATOMIC64_OP_RETURN
 #undef ATOMIC64_OP

-__LL_SC_INLINE s64
-__LL_SC_PREFIX(arch_atomic64_dec_if_positive(atomic64_t *v))
+static inline s64
+__ll_sc_atomic64_dec_if_positive(atomic64_t *v)
 {
 s64 result;
 unsigned long tmp;

 asm volatile("// atomic64_dec_if_positive\n"
+ __LL_SC_FALLBACK(
" prfm pstl1strm, %2\n"
"1: ldxr %0, %2\n"
" subs %0, %0, #1\n"
@@ -228,20 +232,19 @@ __LL_SC_PREFIX(arch_atomic64_dec_if_positive(atomic64_t *v))
" stlxr %w1, %0, %2\n"
" cbnz %w1, 1b\n"
" dmb ish\n"
-"2:"
+"2:")
 : "=&r" (result), "=&r" (tmp), "+Q" (v->counter)
 :
 : "cc", "memory");

 return result;
 }
-__LL_SC_EXPORT(arch_atomic64_dec_if_positive);

 #define __CMPXCHG_CASE(w, sfx, name, sz, mb, acq, rel, cl, constraint) \
-__LL_SC_INLINE u##sz \
-__LL_SC_PREFIX(__cmpxchg_case_##name##sz(volatile void *ptr, \
+static inline u##sz \
+__ll_sc__cmpxchg_case_##name##sz(volatile void *ptr, \
 unsigned long old, \
- u##sz new)) \
+ u##sz new) \
 { \
 unsigned long tmp; \
 u##sz oldval; \
@@ -255,6 +258,7 @@ __LL_SC_PREFIX(__cmpxchg_case_##name##sz(volatile void *ptr, \
 old = (u##sz)old; \
 \
 asm volatile( \
+ __LL_SC_FALLBACK( \
 " prfm pstl1strm, %[v]\n" \
 "1: ld" #acq "xr" #sfx "\t%" #w "[oldval], %[v]\n" \
 " eor %" #w "[tmp], %" #w "[oldval], %" #w "[old]\n" \
@@ -262,15 +266,14 @@ __LL_SC_PREFIX(__cmpxchg_case_##name##sz(volatile void *ptr, \
 " st" #rel "xr" #sfx "\t%w[tmp], %" #w "[new], %[v]\n" \
 " cbnz %w[tmp], 1b\n" \
 " " #mb "\n" \
- "2:" \
+ "2:") \
 : [tmp] "=&r" (tmp), [oldval] "=&r" (oldval), \
 [v] "+Q" (*(u##sz *)ptr) \
 : [old] #constraint "r" (old), [new] "r" (new) \
 : cl); \
 \
 return oldval; \
-} \
-__LL_SC_EXPORT(__cmpxchg_case_##name##sz);
+}

 /*
  * Earlier versions of GCC (no later than 8.1.0) appear to incorrectly
@@ -297,16 +300,17 @@ __CMPXCHG_CASE( , , mb_, 64, dmb ish, , l, "memory", L)
 #undef __CMPXCHG_CASE

 #define __CMPXCHG_DBL(name, mb, rel, cl) \
-__LL_SC_INLINE long \
-__LL_SC_PREFIX(__cmpxchg_double##name(unsigned long old1, \
+static inline long \
+__ll_sc__cmpxchg_double##name(unsigned long old1, \
 unsigned long old2, \
 unsigned long new1, \
 unsigned long new2, \
- volatile void *ptr)) \
+ volatile void *ptr) \
 { \
 unsigned long tmp, ret; \
 \
 asm volatile("// __cmpxchg_double" #name "\n" \
+ __LL_SC_FALLBACK( \
 " prfm pstl1strm, %2\n" \
 "1: ldxp %0, %1, %2\n" \
 " eor %0, %0, %3\n" \
@@ -316,14 +320,13 @@ __LL_SC_PREFIX(__cmpxchg_double##name(unsigned long old1, \
 " st" #rel "xp %w0, %5, %6, %2\n" \
 " cbnz %w0, 1b\n" \
 " " #mb "\n" \
- "2:" \
+ "2:") \
 : "=&r" (tmp), "=&r" (ret), "+Q" (*(unsigned long *)ptr) \
 : "r" (old1), "r" (old2), "r" (new1), "r" (new2) \
 : cl); \
 \
 return ret; \
-} \
-__LL_SC_EXPORT(__cmpxchg_double##name);
+}

 __CMPXCHG_DBL( , , , )
 __CMPXCHG_DBL(_mb, dmb ish, l, "memory")
diff --git a/arch/arm64/include/asm/atomic_lse.h b/arch/arm64/include/asm/atomic_lse.h
index 69acb1c19a15..7dce5e1f074e 100644
--- a/arch/arm64/include/asm/atomic_lse.h
+++ b/arch/arm64/include/asm/atomic_lse.h
@@ -10,22 +10,13 @@
 #ifndef __ASM_ATOMIC_LSE_H
 #define __ASM_ATOMIC_LSE_H

-#ifndef __ARM64_IN_ATOMIC_IMPL
-#error "please don't include this file directly"
-#endif
-
-#define __LL_SC_ATOMIC(op) __LL_SC_CALL(arch_atomic_##op)

 #define ATOMIC_OP(op, asm_op) \
-static inline void arch_atomic_##op(int i, atomic_t *v) \
+static inline void __lse_atomic_##op(int i, atomic_t *v) \
 { \
- register int w0 asm ("w0") = i; \
- register atomic_t *x1 asm ("x1") = v; \
- \
- asm volatile(ARM64_LSE_ATOMIC_INSN(__LL_SC_ATOMIC(op), \
-" " #asm_op " %w[i], %[v]\n") \
- : [i] "+r" (w0), [v] "+Q" (v->counter) \
- : "r" (x1) \
- : __LL_SC_CLOBBERS); \
+ asm volatile( \
+" " #asm_op " %w[i], %[v]\n" \
+ : [i] "+r" (i), [v] "+Q" (v->counter) \
+ : "r" (v)); \
 }

 ATOMIC_OP(andnot, stclr)
@@ -36,21 +27,15 @@ ATOMIC_OP(add, stadd)
 #undef ATOMIC_OP

 #define ATOMIC_FETCH_OP(name, mb, op, asm_op, cl...) \
-static inline int arch_atomic_fetch_##op##name(int i, atomic_t *v) \
+static inline int __lse_atomic_fetch_##op##name(int i, atomic_t *v) \
 { \
- register int w0 asm ("w0") = i; \
- register atomic_t *x1 asm ("x1") = v; \
- \
- asm volatile(ARM64_LSE_ATOMIC_INSN( \
- /* LL/SC */ \
- __LL_SC_ATOMIC(fetch_##op##name), \
- /* LSE atomics */ \
-" " #asm_op #mb " %w[i], %w[i], %[v]") \
- : [i] "+r" (w0), [v] "+Q" (v->counter) \
- : "r" (x1) \
- : __LL_SC_CLOBBERS, ##cl); \
+ asm volatile( \
+" " #asm_op #mb " %w[i], %w[i], %[v]" \
+ : [i] "+r" (i), [v] "+Q" (v->counter) \
+ : "r" (v) \
+ : cl); \
 \
- return w0; \
+ return i; \
 }

 #define ATOMIC_FETCH_OPS(op, asm_op) \
@@ -68,23 +53,16 @@ ATOMIC_FETCH_OPS(add, ldadd)
 #undef ATOMIC_FETCH_OP
 #undef ATOMIC_FETCH_OPS

 #define ATOMIC_OP_ADD_RETURN(name, mb, cl...) \
-static inline int arch_atomic_add_return##name(int i, atomic_t *v) \
+static inline int __lse_atomic_add_return##name(int i, atomic_t *v) \
 { \
- register int w0 asm ("w0") = i; \
- register atomic_t *x1 asm ("x1") = v; \
- \
- asm volatile(ARM64_LSE_ATOMIC_INSN( \
- /* LL/SC */ \
- __LL_SC_ATOMIC(add_return##name) \
- __nops(1), \
- /* LSE atomics */ \
+ asm volatile( \
 " ldadd" #mb " %w[i], w30, %[v]\n" \
- " add %w[i], %w[i], w30") \
- : [i] "+r" (w0), [v] "+Q" (v->counter) \
- : "r" (x1) \
- : __LL_SC_CLOBBERS, ##cl); \
+ " add %w[i], %w[i], w30" \
+ : [i] "+r" (i), [v] "+Q" (v->counter) \
+ : "r" (v) \
+ : "x30", ##cl); \
 \
- return w0; \
+ return i; \
 }

 ATOMIC_OP_ADD_RETURN(_relaxed, )
@@ -94,41 +72,26 @@ ATOMIC_OP_ADD_RETURN( , al, "memory")

 #undef ATOMIC_OP_ADD_RETURN

-static inline void arch_atomic_and(int i, atomic_t *v)
+static inline void __lse_atomic_and(int i, atomic_t *v)
 {
- register int w0 asm ("w0") = i;
- register atomic_t *x1 asm ("x1") = v;
-
- asm volatile(ARM64_LSE_ATOMIC_INSN(
- /* LL/SC */
- __LL_SC_ATOMIC(and)
- __nops(1),
- /* LSE atomics */
+ asm volatile(
 " mvn %w[i], %w[i]\n"
- " stclr %w[i], %[v]")
- : [i] "+&r" (w0), [v] "+Q" (v->counter)
- : "r" (x1)
- : __LL_SC_CLOBBERS);
+ " stclr %w[i], %[v]"
+ : [i] "+&r" (i), [v] "+Q" (v->counter)
+ : "r" (v));
 }

 #define ATOMIC_FETCH_OP_AND(name, mb, cl...) \
-static inline int arch_atomic_fetch_and##name(int i, atomic_t *v) \
+static inline int __lse_atomic_fetch_and##name(int i, atomic_t *v) \
 { \
- register int w0 asm ("w0") = i; \
- register atomic_t *x1 asm ("x1") = v; \
- \
- asm volatile(ARM64_LSE_ATOMIC_INSN( \
- /* LL/SC */ \
- __LL_SC_ATOMIC(fetch_and##name) \
- __nops(1), \
- /* LSE atomics */ \
+ asm volatile( \
 " mvn %w[i], %w[i]\n" \
- " ldclr" #mb " %w[i], %w[i], %[v]") \
- : [i] "+&r" (w0), [v] "+Q" (v->counter) \
- : "r" (x1) \
- : __LL_SC_CLOBBERS, ##cl); \
+ " ldclr" #mb " %w[i], %w[i], %[v]" \
+ : [i] "+&r" (i), [v] "+Q" (v->counter) \
+ : "r" (v) \
+ : cl); \
 \
- return w0; \
+ return i; \
 }

 ATOMIC_FETCH_OP_AND(_relaxed, )
@@ -138,42 +101,27 @@ ATOMIC_FETCH_OP_AND( , al, "memory")

 #undef ATOMIC_FETCH_OP_AND

-static inline void arch_atomic_sub(int i, atomic_t *v)
+static inline void __lse_atomic_sub(int i, atomic_t *v)
 {
- register int w0 asm ("w0") = i;
- register atomic_t *x1 asm ("x1") = v;
-
- asm volatile(ARM64_LSE_ATOMIC_INSN(
- /* LL/SC */
- __LL_SC_ATOMIC(sub)
- __nops(1),
- /* LSE atomics */
+ asm volatile(
 " neg %w[i], %w[i]\n"
- " stadd %w[i], %[v]")
- : [i] "+&r" (w0), [v] "+Q" (v->counter)
- : "r" (x1)
- : __LL_SC_CLOBBERS);
+ " stadd %w[i], %[v]"
+ : [i] "+&r" (i), [v] "+Q" (v->counter)
+ : "r" (v));
 }

 #define ATOMIC_OP_SUB_RETURN(name, mb, cl...) \
-static inline int arch_atomic_sub_return##name(int i, atomic_t *v) \
+static inline int __lse_atomic_sub_return##name(int i, atomic_t *v) \
 { \
- register int w0 asm ("w0") = i; \
- register atomic_t *x1 asm ("x1") = v; \
- \
- asm volatile(ARM64_LSE_ATOMIC_INSN( \
- /* LL/SC */ \
- __LL_SC_ATOMIC(sub_return##name) \
- __nops(2), \
- /* LSE atomics */ \
+ asm volatile( \
 " neg %w[i], %w[i]\n" \
 " ldadd" #mb " %w[i], w30, %[v]\n" \
- " add %w[i], %w[i], w30") \
- : [i] "+&r" (w0), [v] "+Q" (v->counter) \
- : "r" (x1) \
- : __LL_SC_CLOBBERS , ##cl); \
+ " add %w[i], %w[i], w30" \
+ : [i] "+&r" (i), [v] "+Q" (v->counter) \
+ : "r" (v) \
+ : "x30", ##cl); \
 \
- return w0; \
+ return i; \
 }

 ATOMIC_OP_SUB_RETURN(_relaxed, )
@@ -184,23 +132,16 @@ ATOMIC_OP_SUB_RETURN( , al, "memory")

 #undef ATOMIC_OP_SUB_RETURN

 #define ATOMIC_FETCH_OP_SUB(name, mb, cl...) \
-static inline int arch_atomic_fetch_sub##name(int i, atomic_t *v) \
+static inline int __lse_atomic_fetch_sub##name(int i, atomic_t *v) \
 { \
- register int w0 asm ("w0") = i; \
- register atomic_t *x1 asm ("x1") = v; \
- \
- asm volatile(ARM64_LSE_ATOMIC_INSN( \
- /* LL/SC */ \
- __LL_SC_ATOMIC(fetch_sub##name) \
- __nops(1), \
- /* LSE atomics */ \
+ asm volatile( \
 " neg %w[i], %w[i]\n" \
- " ldadd" #mb " %w[i], %w[i], %[v]") \
- : [i] "+&r" (w0), [v] "+Q" (v->counter) \
- : "r" (x1) \
- : __LL_SC_CLOBBERS, ##cl); \
+ " ldadd" #mb " %w[i], %w[i], %[v]" \
+ : [i] "+&r" (i), [v] "+Q" (v->counter) \
+ : "r" (v) \
+ : cl); \
 \
- return w0; \
+ return i; \
 }

 ATOMIC_FETCH_OP_SUB(_relaxed, )
@@ -209,20 +150,14 @@ ATOMIC_FETCH_OP_SUB(_release, l, "memory")
 ATOMIC_FETCH_OP_SUB( , al, "memory")

 #undef ATOMIC_FETCH_OP_SUB
-#undef __LL_SC_ATOMIC
-#define __LL_SC_ATOMIC64(op) __LL_SC_CALL(arch_atomic64_##op)

 #define ATOMIC64_OP(op, asm_op) \
-static inline void arch_atomic64_##op(s64 i, atomic64_t *v) \
+static inline void __lse_atomic64_##op(s64 i, atomic64_t *v) \
 { \
- register s64 x0 asm ("x0") = i; \
- register atomic64_t *x1 asm ("x1") = v; \
- \
- asm volatile(ARM64_LSE_ATOMIC_INSN(__LL_SC_ATOMIC64(op), \
-" " #asm_op " %[i], %[v]\n") \
- : [i] "+r" (x0), [v] "+Q" (v->counter) \
- : "r" (x1) \
- : __LL_SC_CLOBBERS); \
+ asm volatile( \
+" " #asm_op " %[i], %[v]\n" \
+ : [i] "+r" (i), [v] "+Q" (v->counter) \
+ : "r" (v)); \
 }

 ATOMIC64_OP(andnot, stclr)
@@ -233,21 +168,15 @@ ATOMIC64_OP(add, stadd)
 #undef ATOMIC64_OP

 #define ATOMIC64_FETCH_OP(name, mb, op, asm_op, cl...) \
-static inline s64 arch_atomic64_fetch_##op##name(s64 i, atomic64_t *v) \
+static inline long __lse_atomic64_fetch_##op##name(s64 i, atomic64_t *v)\
 { \
- register s64 x0 asm ("x0") = i; \
- register atomic64_t *x1 asm ("x1") = v; \
- \
- asm volatile(ARM64_LSE_ATOMIC_INSN( \
- /* LL/SC */ \
- __LL_SC_ATOMIC64(fetch_##op##name), \
- /* LSE atomics */ \
-" " #asm_op #mb " %[i], %[i], %[v]") \
- : [i] "+r" (x0), [v] "+Q" (v->counter) \
- : "r" (x1) \
- : __LL_SC_CLOBBERS, ##cl); \
+ asm volatile( \
+" " #asm_op #mb " %[i], %[i], %[v]" \
+ : [i] "+r" (i), [v] "+Q" (v->counter) \
+ : "r" (v) \
+ : cl); \
 \
- return x0; \
+ return i; \
 }

 #define ATOMIC64_FETCH_OPS(op, asm_op) \
@@ -265,23 +194,16 @@ ATOMIC64_FETCH_OPS(add, ldadd)
 #undef ATOMIC64_FETCH_OP
 #undef ATOMIC64_FETCH_OPS

 #define ATOMIC64_OP_ADD_RETURN(name, mb, cl...) \
-static inline s64 arch_atomic64_add_return##name(s64 i, atomic64_t *v) \
+static inline long __lse_atomic64_add_return##name(s64 i, atomic64_t *v)\
 { \
- register s64 x0 asm ("x0") = i; \
- register atomic64_t *x1 asm ("x1") = v; \
- \
- asm volatile(ARM64_LSE_ATOMIC_INSN( \
- /* LL/SC */ \
- __LL_SC_ATOMIC64(add_return##name) \
- __nops(1), \
- /* LSE atomics */ \
+ asm volatile( \
 " ldadd" #mb " %[i], x30, %[v]\n" \
- " add %[i], %[i], x30") \
- : [i] "+r" (x0), [v] "+Q" (v->counter) \
- : "r" (x1) \
- : __LL_SC_CLOBBERS, ##cl); \
+ " add %[i], %[i], x30" \
+ : [i] "+r" (i), [v] "+Q" (v->counter) \
+ : "r" (v) \
+ : "x30", ##cl); \
 \
- return x0; \
+ return i; \
 }

 ATOMIC64_OP_ADD_RETURN(_relaxed, )
@@ -291,41 +213,26 @@ ATOMIC64_OP_ADD_RETURN( , al, "memory")

 #undef ATOMIC64_OP_ADD_RETURN

-static inline void arch_atomic64_and(s64 i, atomic64_t *v)
+static inline void __lse_atomic64_and(s64 i, atomic64_t *v)
 {
- register s64 x0 asm ("x0") = i;
- register atomic64_t *x1 asm ("x1") = v;
-
- asm volatile(ARM64_LSE_ATOMIC_INSN(
- /* LL/SC */
- __LL_SC_ATOMIC64(and)
- __nops(1),
- /* LSE atomics */
+ asm volatile(
 " mvn %[i], %[i]\n"
- " stclr %[i], %[v]")
- : [i] "+&r" (x0), [v] "+Q" (v->counter)
- : "r" (x1)
- : __LL_SC_CLOBBERS);
+ " stclr %[i], %[v]"
+ : [i] "+&r" (i), [v] "+Q" (v->counter)
+ : "r" (v));
 }

 #define ATOMIC64_FETCH_OP_AND(name, mb, cl...) \
-static inline s64 arch_atomic64_fetch_and##name(s64 i, atomic64_t *v) \
+static inline long __lse_atomic64_fetch_and##name(s64 i, atomic64_t *v) \
 { \
- register s64 x0 asm ("x0") = i; \
- register atomic64_t *x1 asm ("x1") = v; \
- \
- asm volatile(ARM64_LSE_ATOMIC_INSN( \
- /* LL/SC */ \
- __LL_SC_ATOMIC64(fetch_and##name) \
- __nops(1), \
- /* LSE atomics */ \
+ asm volatile( \
 " mvn %[i], %[i]\n" \
- " ldclr" #mb " %[i], %[i], %[v]") \
- : [i] "+&r" (x0), [v] "+Q" (v->counter) \
- : "r" (x1) \
- : __LL_SC_CLOBBERS, ##cl); \
+ " ldclr" #mb " %[i], %[i], %[v]" \
+ : [i] "+&r" (i), [v] "+Q" (v->counter) \
+ : "r" (v) \
+ : cl); \
 \
- return x0; \
+ return i; \
 }

 ATOMIC64_FETCH_OP_AND(_relaxed, )
@@ -335,42 +242,27 @@ ATOMIC64_FETCH_OP_AND( , al, "memory")

 #undef ATOMIC64_FETCH_OP_AND

-static inline void arch_atomic64_sub(s64 i, atomic64_t *v)
+static inline void __lse_atomic64_sub(s64 i, atomic64_t *v)
 {
- register s64 x0 asm ("x0") = i;
- register atomic64_t *x1 asm ("x1") = v;
-
- asm volatile(ARM64_LSE_ATOMIC_INSN(
- /* LL/SC */
- __LL_SC_ATOMIC64(sub)
- __nops(1),
- /* LSE atomics */
+ asm volatile(
 " neg %[i], %[i]\n"
- " stadd %[i], %[v]")
- : [i] "+&r" (x0), [v] "+Q" (v->counter)
- : "r" (x1)
- : __LL_SC_CLOBBERS);
+ " stadd %[i], %[v]"
+ : [i] "+&r" (i), [v] "+Q" (v->counter)
+ : "r" (v));
 }

 #define ATOMIC64_OP_SUB_RETURN(name, mb, cl...) \
-static inline s64 arch_atomic64_sub_return##name(s64 i, atomic64_t *v) \
+static inline long __lse_atomic64_sub_return##name(s64 i, atomic64_t *v) \
 { \
- register s64 x0 asm ("x0") = i; \
- register atomic64_t *x1 asm ("x1") = v; \
- \
- asm volatile(ARM64_LSE_ATOMIC_INSN( \
- /* LL/SC */ \
- __LL_SC_ATOMIC64(sub_return##name) \
- __nops(2), \
- /* LSE atomics */ \
+ asm volatile( \
 " neg %[i], %[i]\n" \
 " ldadd" #mb " %[i], x30, %[v]\n" \
- " add %[i], %[i], x30") \
- : [i] "+&r" (x0), [v] "+Q" (v->counter) \
- : "r" (x1) \
- : __LL_SC_CLOBBERS, ##cl); \
+ " add %[i], %[i], x30" \
+ : [i] "+&r" (i), [v] "+Q" (v->counter) \
+ : "r" (v) \
+ : "x30", ##cl); \
 \
- return x0; \
+ return i; \
 }

 ATOMIC64_OP_SUB_RETURN(_relaxed, )
@@ -381,23 +273,16 @@ ATOMIC64_OP_SUB_RETURN( , al, "memory")

 #undef ATOMIC64_OP_SUB_RETURN

 #define ATOMIC64_FETCH_OP_SUB(name, mb, cl...) \
-static inline s64 arch_atomic64_fetch_sub##name(s64 i, atomic64_t *v) \
+static inline long __lse_atomic64_fetch_sub##name(s64 i, atomic64_t *v) \
 { \
- register s64 x0 asm ("x0") = i; \
- register atomic64_t *x1 asm ("x1") = v; \
- \
- asm volatile(ARM64_LSE_ATOMIC_INSN( \
- /* LL/SC */ \
- __LL_SC_ATOMIC64(fetch_sub##name) \
- __nops(1), \
- /* LSE atomics */ \
+ asm volatile( \
 " neg %[i], %[i]\n" \
- " ldadd" #mb " %[i], %[i], %[v]") \
- : [i] "+&r" (x0), [v] "+Q" (v->counter) \
- : "r" (x1) \
- : __LL_SC_CLOBBERS, ##cl); \
+ " ldadd" #mb " %[i], %[i], %[v]" \
+ : [i] "+&r" (i), [v] "+Q" (v->counter) \
+ : "r" (v) \
+ : cl); \
 \
- return x0; \
+ return i; \
 }

 ATOMIC64_FETCH_OP_SUB(_relaxed, )
@@ -407,15 +292,9 @@ ATOMIC64_FETCH_OP_SUB( , al, "memory")

 #undef ATOMIC64_FETCH_OP_SUB

-static inline s64 arch_atomic64_dec_if_positive(atomic64_t *v)
+static inline s64 __lse_atomic64_dec_if_positive(atomic64_t *v)
 {
- register long x0 asm ("x0") = (long)v;
-
- asm volatile(ARM64_LSE_ATOMIC_INSN(
- /* LL/SC */
- __LL_SC_ATOMIC64(dec_if_positive)
- __nops(6),
- /* LSE atomics */
+ asm volatile(
 "1: ldr x30, %[v]\n"
 " subs %[ret], x30, #1\n"
 " b.lt 2f\n"
@@ -423,20 +302,16 @@ static inline s64 arch_atomic64_dec_if_positive(atomic64_t *v)
 " sub x30, x30, #1\n"
 " sub x30, x30, %[ret]\n"
 " cbnz x30, 1b\n"
- "2:")
- : [ret] "+&r" (x0), [v] "+Q" (v->counter)
+ "2:"
+ : [ret] "+&r" (v), [v] "+Q" (v->counter)
 :
- : __LL_SC_CLOBBERS, "cc", "memory");
+ : "x30", "cc", "memory");

- return x0;
+ return (long)v;
 }

-#undef __LL_SC_ATOMIC64
-
-#define __LL_SC_CMPXCHG(op) __LL_SC_CALL(__cmpxchg_case_##op)
-
 #define __CMPXCHG_CASE(w, sfx, name, sz, mb, cl...) \
-static inline u##sz __cmpxchg_case_##name##sz(volatile void *ptr, \
+static inline u##sz __lse__cmpxchg_case_##name##sz(volatile void *ptr, \
 u##sz old, \
 u##sz new) \
 { \
@@ -444,17 +319,13 @@ static inline u##sz __cmpxchg_case_##name##sz(volatile void *ptr, \
 register u##sz x1 asm ("x1") = old; \
 register u##sz x2 asm ("x2") = new; \
 \
- asm volatile(ARM64_LSE_ATOMIC_INSN( \
- /* LL/SC */ \
- __LL_SC_CMPXCHG(name##sz) \
- __nops(2), \
- /* LSE atomics */ \
+ asm volatile( \
 " mov " #w "30, %" #w "[old]\n" \
 " cas" #mb #sfx "\t" #w "30, %" #w "[new], %[v]\n" \
- " mov %" #w "[ret], " #w "30") \
+ " mov %" #w "[ret], " #w "30" \
 : [ret] "+r" (x0), [v] "+Q" (*(unsigned long *)ptr) \
 : [old] "r" (x1), [new] "r" (x2) \
- : __LL_SC_CLOBBERS, ##cl); \
+ : "x30", ##cl); \
 \
 return x0; \
 }
@@ -476,13 +347,10 @@ __CMPXCHG_CASE(w, h, mb_, 16, al, "memory")
 __CMPXCHG_CASE(w, , mb_, 32, al, "memory")
 __CMPXCHG_CASE(x, , mb_, 64, al, "memory")

-#undef __LL_SC_CMPXCHG
 #undef __CMPXCHG_CASE

-#define __LL_SC_CMPXCHG_DBL(op) __LL_SC_CALL(__cmpxchg_double##op)
-
 #define __CMPXCHG_DBL(name, mb, cl...) \
-static inline long __cmpxchg_double##name(unsigned long old1, \
+static inline long __lse__cmpxchg_double##name(unsigned long old1, \
 unsigned long old2, \
 unsigned long new1, \
 unsigned long new2, \
@@ -496,20 +364,16 @@ static inline long __cmpxchg_double##name(unsigned long old1, \
 register unsigned long x3 asm ("x3") = new2; \
 register unsigned long x4 asm ("x4") = (unsigned long)ptr; \
 \
- asm volatile(ARM64_LSE_ATOMIC_INSN( \
- /* LL/SC */ \
- __LL_SC_CMPXCHG_DBL(name) \
- __nops(3), \
- /* LSE atomics */ \
+ asm volatile( \
 " casp" #mb "\t%[old1], %[old2], %[new1], %[new2], %[v]\n"\
 " eor %[old1], %[old1], %[oldval1]\n" \
 " eor %[old2], %[old2], %[oldval2]\n" \
- " orr %[old1], %[old1], %[old2]") \
+ " orr %[old1], %[old1], %[old2]" \
 : [old1] "+&r" (x0), [old2] "+&r" (x1), \
 [v] "+Q" (*(unsigned long *)ptr) \
 : [new1] "r" (x2), [new2] "r" (x3), [ptr] "r" (x4), \
 [oldval1] "r" (oldval1), [oldval2] "r" (oldval2) \
- : __LL_SC_CLOBBERS, ##cl); \
+ : cl); \
 \
 return x0; \
 }
@@ -517,7 +381,6 @@ static inline long __cmpxchg_double##name(unsigned long old1, \
 __CMPXCHG_DBL( , )
 __CMPXCHG_DBL(_mb, al, "memory")

-#undef __LL_SC_CMPXCHG_DBL
 #undef __CMPXCHG_DBL

 #endif /* __ASM_ATOMIC_LSE_H */
diff --git a/arch/arm64/include/asm/cmpxchg.h b/arch/arm64/include/asm/cmpxchg.h
index 7a299a20f6dc..e5fff8cd4904 100644
--- a/arch/arm64/include/asm/cmpxchg.h
+++ b/arch/arm64/include/asm/cmpxchg.h
@@ -10,7 +10,7 @@
 #include <linux/build_bug.h>
 #include <linux/compiler.h>

-#include <asm/atomic.h>
+#include <asm/atomic_arch.h>
 #include <asm/barrier.h>
 #include <asm/lse.h>
diff --git a/arch/arm64/include/asm/lse.h b/arch/arm64/include/asm/lse.h
index 8262325e2fc6..52b80846d1b7 100644
--- a/arch/arm64/include/asm/lse.h
+++ b/arch/arm64/include/asm/lse.h
@@ -22,14 +22,6 @@
 __asm__(".arch_extension lse");

-/* Move the ll/sc atomics out-of-line */
-#define __LL_SC_INLINE notrace
-#define __LL_SC_PREFIX(x) __ll_sc_##x
-#define __LL_SC_EXPORT(x) EXPORT_SYMBOL(__LL_SC_PREFIX(x))
-
-/* Macro for constructing calls to out-of-line ll/sc atomics */
-#define __LL_SC_CALL(op) "bl\t" __stringify(__LL_SC_PREFIX(op)) "\n"
-#define __LL_SC_CLOBBERS "x16", "x17", "x30"

 /* In-line patching at runtime */
 #define ARM64_LSE_ATOMIC_INSN(llsc, lse) \
@@ -46,9 +38,6 @@ __asm__(".arch_extension lse");

 #else /* __ASSEMBLER__ */

-#define __LL_SC_INLINE static inline
-#define __LL_SC_PREFIX(x) x
-#define __LL_SC_EXPORT(x)

 #define ARM64_LSE_ATOMIC_INSN(llsc, lse) llsc
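
[ Aside, not part of the patch: the wrappers above funnel every operation
  through __lse_ll_sc_body(), which selects the __lse_* or __ll_sc_*
  implementation at each call site. The snippet below is only a minimal
  sketch of that dispatch shape, with a plain boolean standing in for the
  static key and trivial functions standing in for the two atomic
  flavours, so it builds as ordinary userspace C; all *_stub names are
  illustrative and do not exist in the kernel. ]

/* Illustrative sketch only -- not taken from the patch. */
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for the static key flipped during early CPU feature detection. */
static bool system_uses_lse_atomics_stub = true;

/* Stand-ins for the two inline flavours the real macros generate. */
static inline void __lse_atomic_add_stub(int i, int *v)   { *v += i; /* would be stadd */ }
static inline void __ll_sc_atomic_add_stub(int i, int *v) { *v += i; /* would be an ldxr/stxr loop */ }

/* Same shape as __lse_ll_sc_body(): pick one implementation per call site. */
#define lse_ll_sc_body_stub(op, ...)					\
	(system_uses_lse_atomics_stub ?					\
		__lse_##op##_stub(__VA_ARGS__) :			\
		__ll_sc_##op##_stub(__VA_ARGS__))

int main(void)
{
	int counter = 0;

	lse_ll_sc_body_stub(atomic_add, 5, &counter);
	printf("%d\n", counter);	/* prints 5 */
	return 0;
}

[ Because both branches are ordinary inline functions, the compiler sees
  the register usage of whichever path is taken and no longer has to
  assume the out-of-line call clobbers, which is the register-pressure
  win described in the commit message. ]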