
ARM: v6: prevent gcc from reordering extended CP15 reads above is_smp() test

Message ID alpine.DEB.2.02.1307282012280.23852@utopia.booyaka.com (mailing list archive)
State New, archived

Commit Message

Paul Walmsley July 28, 2013, 8:16 p.m. UTC
Commit 621a0147d5c921f4cc33636ccd0602ad5d7cbfbc ("ARM: 7757/1: mm:
don't flush icache in switch_mm with hardware broadcasting") breaks
the boot on OMAP2430SDP with omap2plus_defconfig.  Tracked to an
undefined instruction abort from the CP15 read in
cache_ops_need_broadcast().  It turns out that gcc reorders the
extended CP15 read above the is_smp() test.  This breaks ARM1136 r0
cores, since they don't support several CP15 registers that later ARM
cores do.  ARM1136JF-S TRM section 3.2.1 "Register allocation" has the
details.

So, when the kernel is built for ARMv6 cores, mark the extended CP15
read as clobbering memory, which seems to prevent the compiler from
reordering it before the is_smp() test.  Russell states that the code
generated from this approach is preferable to marking the inline asm
as volatile.

This patch was developed in collaboration with Will Deacon and Russell
King.

Signed-off-by: Paul Walmsley <paul@pwsan.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
---

Thought I'd respin this to serve as a discussion strawman.  It boots cleanly
on 2430SDP.

[ Updated "ARM: v6: avoid read_cpuid_ext() on ARM1136r0 in 
cache_ops_need_broadcast()" to drop the unnecessary ARM1136 r0 test, to 
switch to a memory clobber per rmk's suggestion, and to update the commit 
message. ]

Intended for v3.11-rc.


 arch/arm/include/asm/cputype.h | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

Comments

Tony Lindgren July 29, 2013, 7:30 a.m. UTC | #1
Hi,

* Paul Walmsley <paul@pwsan.com> [130728 13:23]:
> 
> Commit 621a0147d5c921f4cc33636ccd0602ad5d7cbfbc ("ARM: 7757/1: mm:
> don't flush icache in switch_mm with hardware broadcasting") breaks
> the boot on OMAP2430SDP with omap2plus_defconfig.  Tracked to an
> undefined instruction abort from the CP15 read in
> cache_ops_need_broadcast().  It turns out that gcc reorders the
> extended CP15 read above the is_smp() test.  This breaks ARM1136 r0
> cores, since they don't support several CP15 registers that later ARM
> cores do.  ARM1136JF-S TRM section 3.2.1 "Register allocation" has the
> details.
> 
> So, when the kernel is built for ARMv6 cores, mark the extended CP15
> read as clobbering memory, which seems to prevent the compiler from
> reordering it before the is_smp() test.  Russell states that the code
> generated from this approach is preferable to marking the inline asm
> as volatile.
> 
> This patch was developed in collaboration with Will Deacon and Russell
> King.
> 
> Signed-off-by: Paul Walmsley <paul@pwsan.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Russell King <rmk+kernel@arm.linux.org.uk>

Sorry to be late to this party; I was offline last week. This
patch fixes the issue for me:

Acked-by: Tony Lindgren <tony@atomide.com>
Will Deacon July 29, 2013, 10:02 a.m. UTC | #2
Hi Paul,

On Sun, Jul 28, 2013 at 09:16:29PM +0100, Paul Walmsley wrote:
> 
> Commit 621a0147d5c921f4cc33636ccd0602ad5d7cbfbc ("ARM: 7757/1: mm:
> don't flush icache in switch_mm with hardware broadcasting") breaks
> the boot on OMAP2430SDP with omap2plus_defconfig.  Tracked to an
> undefined instruction abort from the CP15 read in
> cache_ops_need_broadcast().  It turns out that gcc reorders the
> extended CP15 read above the is_smp() test.  This breaks ARM1136 r0
> cores, since they don't support several CP15 registers that later ARM
> cores do.  ARM1136JF-S TRM section 3.2.1 "Register allocation" has the
> details.

Cheers for tracking this down. Interestingly, I can't reproduce this with
anything other than GCC 4.5.* tools -- 4.6+ do what we want. Still, it looks
like a valid (if misguided) thing for the compiler to do.

> diff --git a/arch/arm/include/asm/cputype.h b/arch/arm/include/asm/cputype.h
> index 8c25dc4..f428eb0 100644
> --- a/arch/arm/include/asm/cputype.h
> +++ b/arch/arm/include/asm/cputype.h
> @@ -89,13 +89,25 @@ extern unsigned int processor_id;
>  		__val;							\
>  	})
>  
> +
> +# if defined(CONFIG_CPU_V6)
> +/*
> + * The mrc in the read_cpuid_ext macro must not be reordered on ARMv6,
> + * else the compiler may move it before an is_smp() test, causing
> + * undefined instruction aborts on ARM1136 r0.
> + */
> +# define CPUID_EXT_REORDER	"cc", "memory"
> +# else
> +# define CPUID_EXT_REORDER	"cc"
> +# endif
> +
>  #define read_cpuid_ext(ext_reg)						\
>  	({								\
>  		unsigned int __val;					\
>  		asm("mrc	p15, 0, %0, c0, " ext_reg		\
>  		    : "=r" (__val)					\
>  		    :							\
> -		    : "cc");						\
> +		    : CPUID_EXT_REORDER);				\
>  		__val;							\
>  	})

I wouldn't worry about checking for CPU_V6. Besides, we probably need this
to be re-evaluated across barrier() when we get CPU migration on a
big.LITTLE platform anyway (we should probably also drop the
__attribute_const__ for that).

So you can just replace the "cc" clobber with "memory" (now that Nico has
kindly explained the other day why "cc" clobbers aren't needed).

An alternative is to add barrier() between is_smp() and the read_cpuid_ext()
in all callers, adding a fake read from the stack to the latter (like I did
for the per-cpu accessor). However, this relies on fixing all callers for
very little gain, so I don't think it's worth the hassle.

I can cook a patch if you're tied up with other things -- just let me know.

Cheers,

Will
Paul Walmsley July 30, 2013, 10:58 a.m. UTC | #3
Hi Will

On Mon, 29 Jul 2013, Will Deacon wrote:

> I wouldn't worry about checking for CPU_V6. Besides, we probably need this
> to be re-evaluated across barrier() when we get CPU migration on a
> big.LITTLE platform anyway (we should probably also drop the
> __attribute_const__ for that).
> 
> So you can just replace the "cc" clobber with "memory" (now that Nico has
> kindly explained the other day why "cc" clobbers aren't needed).
> 
> An alternative is to add barrier() between is_smp() and the read_cpuid_ext()
> in all callers, adding a fake read from the stack to the latter (like I did
> for the per-cpu accessor). However, this relies on fixing all callers for
> very little gain, so I don't think it's worth the hassle.
> 
> I can cook a patch if you're tied up with other things -- just let me know.

Makes sense to me.  Have respun the patch and will post it shortly.  
Thanks for the extra compiler research; it's been incorporated into the 
patch description and comments.


- Paul

Patch

diff --git a/arch/arm/include/asm/cputype.h b/arch/arm/include/asm/cputype.h
index 8c25dc4..f428eb0 100644
--- a/arch/arm/include/asm/cputype.h
+++ b/arch/arm/include/asm/cputype.h
@@ -89,13 +89,25 @@ extern unsigned int processor_id;
 		__val;							\
 	})
 
+
+# if defined(CONFIG_CPU_V6)
+/*
+ * The mrc in the read_cpuid_ext macro must not be reordered on ARMv6,
+ * else the compiler may move it before an is_smp() test, causing
+ * undefined instruction aborts on ARM1136 r0.
+ */
+# define CPUID_EXT_REORDER	"cc", "memory"
+# else
+# define CPUID_EXT_REORDER	"cc"
+# endif
+
 #define read_cpuid_ext(ext_reg)						\
 	({								\
 		unsigned int __val;					\
 		asm("mrc	p15, 0, %0, c0, " ext_reg		\
 		    : "=r" (__val)					\
 		    :							\
-		    : "cc");						\
+		    : CPUID_EXT_REORDER);				\
 		__val;							\
 	})