Message ID | alpine.DEB.2.02.1307282012280.23852@utopia.booyaka.com (mailing list archive) |
---|---|
State | New, archived |
Hi,

* Paul Walmsley <paul@pwsan.com> [130728 13:23]:
>
> Commit 621a0147d5c921f4cc33636ccd0602ad5d7cbfbc ("ARM: 7757/1: mm:
> don't flush icache in switch_mm with hardware broadcasting") breaks
> the boot on OMAP2430SDP with omap2plus_defconfig. Tracked to an
> undefined instruction abort from the CP15 read in
> cache_ops_need_broadcast(). It turns out that gcc reorders the
> extended CP15 read above the is_smp() test. This breaks ARM1136 r0
> cores, since they don't support several CP15 registers that later ARM
> cores do. ARM1136JF-S TRM section 3.2.1 "Register allocation" has the
> details.
>
> So, when the kernel is built for ARMv6 cores, mark the extended CP15
> read as clobbering memory, which seems to prevent the compiler from
> reordering it before the is_smp() test. Russell states that the code
> generated from this approach is preferable to marking the inline asm
> as volatile.
>
> This patch was developed in collaboration with Will Deacon and Russell
> King.
>
> Signed-off-by: Paul Walmsley <paul@pwsan.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Russell King <rmk+kernel@arm.linux.org.uk>

Sorry to be late to this party, I was offline last week.

This patch fixes the issue for me:

Acked-by: Tony Lindgren <tony@atomide.com>
Hi Paul,

On Sun, Jul 28, 2013 at 09:16:29PM +0100, Paul Walmsley wrote:
>
> Commit 621a0147d5c921f4cc33636ccd0602ad5d7cbfbc ("ARM: 7757/1: mm:
> don't flush icache in switch_mm with hardware broadcasting") breaks
> the boot on OMAP2430SDP with omap2plus_defconfig. Tracked to an
> undefined instruction abort from the CP15 read in
> cache_ops_need_broadcast(). It turns out that gcc reorders the
> extended CP15 read above the is_smp() test. This breaks ARM1136 r0
> cores, since they don't support several CP15 registers that later ARM
> cores do. ARM1136JF-S TRM section 3.2.1 "Register allocation" has the
> details.

Cheers for tracking this down. Interestingly, I can't reproduce this with
anything other than GCC 4.5.* tools -- 4.6+ do what we want. Still, it looks
like a valid (if not misguided) thing to do.

> diff --git a/arch/arm/include/asm/cputype.h b/arch/arm/include/asm/cputype.h
> index 8c25dc4..f428eb0 100644
> --- a/arch/arm/include/asm/cputype.h
> +++ b/arch/arm/include/asm/cputype.h
> @@ -89,13 +89,25 @@ extern unsigned int processor_id;
>                  __val;                                                  \
>          })
>
> +
> +# if defined(CONFIG_CPU_V6)
> +/*
> + * The mrc in the read_cpuid_ext macro must not be reordered on ARMv6,
> + * else the compiler may move it before an is_smp() test, causing
> + * undefined instruction aborts on ARM1136 r0.
> + */
> +# define CPUID_EXT_REORDER "cc", "memory"
> +# else
> +# define CPUID_EXT_REORDER "cc"
> +# endif
> +
>  #define read_cpuid_ext(ext_reg)                                         \
>          ({                                                              \
>                  unsigned int __val;                                     \
>                  asm("mrc p15, 0, %0, c0, " ext_reg                      \
>                      : "=r" (__val)                                      \
>                      :                                                   \
> -                    : "cc");                                            \
> +                    : CPUID_EXT_REORDER);                               \
>                  __val;                                                  \
>          })

I wouldn't worry about checking for CPU_V6. Besides, we probably need this
to be re-evaluated across barrier() when we get CPU migration on a
big-little platform anyway (we should probably also drop the
__attribute_const__ for that).

So you can just replace the "cc" (now that Nico kindly explained why those
aren't needed the other day) with "memory".

An alternative is to add barrier() between is_smp() and the read_cpuid_ext()
in all callers, adding a fake read from the stack to the latter (like I did
for the per-cpu accessor). However, this relies on fixing all callers for
very little gain, so I don't think it's worth the hassle.

I can cook a patch if you're tied up with other things -- just let me know.

Cheers,

Will
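
[ Illustrative note, not part of the original thread. The barrier() alternative Will describes might look roughly like the sketch below. It is a minimal, portable approximation under stated assumptions: barrier_sketch() mirrors the usual GCC definition of the kernel's barrier(), the empty asm template stands in for the real mrc instruction, and the names read_ext_fake_stack_read(), need_broadcast_with_barrier() and on_stack, as well as the bit-field test, are hypothetical. ]

```c
#include <assert.h>

/* Compiler-only barrier, in the style of the kernel's barrier() for GCC. */
#define barrier_sketch() __asm__ __volatile__("" : : : "memory")

/*
 * Hypothetical stand-in for read_cpuid_ext(). The empty template replaces
 * the real "mrc p15, 0, %0, c0, ..." so this builds on any GCC/Clang target.
 * The "m" input is the "fake read from the stack": it tells the compiler
 * the asm reads memory, so it should not be moved across a memory-clobbering
 * barrier.
 */
static unsigned int read_ext_fake_stack_read(void)
{
	unsigned int val = 0;		/* made-up register value */
	unsigned int on_stack = 0;	/* dummy object the asm claims to read */

	__asm__("" : "+r" (val) : "m" (on_stack));
	return val;
}

/* Every caller would need this shape: test, barrier, then the read. */
static int need_broadcast_with_barrier(int smp)
{
	if (!smp)
		return 0;	/* UP path: the extended read must not run */

	barrier_sketch();	/* intended to keep the read below the test */
	return ((read_ext_fake_stack_read() >> 12) & 0xf) < 1;
}
```

[ The cost Will points out is visible here: the ordering depends on every caller remembering the barrier, whereas a clobber inside the macro covers all callers at once. ]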
Hi Will

On Mon, 29 Jul 2013, Will Deacon wrote:

> I wouldn't worry about checking for CPU_V6. Besides, we probably need this
> to be re-evaluated across barrier() when we get CPU migration on a
> big-little platform anyway (we should probably also drop the
> __attribute_const__ for that).
>
> So you can just replace the "cc" (now that Nico kindly explained why those
> aren't needed the other day) with "memory".
>
> An alternative is to add barrier() between is_smp() and the read_cpuid_ext()
> in all callers, adding a fake read from the stack to the latter (like I did
> for the per-cpu accessor). However, this relies on fixing all callers for
> very little gain, so I don't think it's worth the hassle.
>
> I can cook a patch if you're tied up with other things -- just let me know.

Makes sense to me. Have respun the patch and will post it shortly. Thanks
for the extra compiler research; it's been incorporated into the patch
description and comments.

- Paul
diff --git a/arch/arm/include/asm/cputype.h b/arch/arm/include/asm/cputype.h
index 8c25dc4..f428eb0 100644
--- a/arch/arm/include/asm/cputype.h
+++ b/arch/arm/include/asm/cputype.h
@@ -89,13 +89,25 @@ extern unsigned int processor_id;
                 __val;                                                  \
         })
 
+
+# if defined(CONFIG_CPU_V6)
+/*
+ * The mrc in the read_cpuid_ext macro must not be reordered on ARMv6,
+ * else the compiler may move it before an is_smp() test, causing
+ * undefined instruction aborts on ARM1136 r0.
+ */
+# define CPUID_EXT_REORDER "cc", "memory"
+# else
+# define CPUID_EXT_REORDER "cc"
+# endif
+
 #define read_cpuid_ext(ext_reg)                                         \
         ({                                                              \
                 unsigned int __val;                                     \
                 asm("mrc p15, 0, %0, c0, " ext_reg                      \
                     : "=r" (__val)                                      \
                     :                                                   \
-                    : "cc");                                            \
+                    : CPUID_EXT_REORDER);                               \
                 __val;                                                  \
         })
Commit 621a0147d5c921f4cc33636ccd0602ad5d7cbfbc ("ARM: 7757/1: mm:
don't flush icache in switch_mm with hardware broadcasting") breaks
the boot on OMAP2430SDP with omap2plus_defconfig. Tracked to an
undefined instruction abort from the CP15 read in
cache_ops_need_broadcast(). It turns out that gcc reorders the
extended CP15 read above the is_smp() test. This breaks ARM1136 r0
cores, since they don't support several CP15 registers that later ARM
cores do. ARM1136JF-S TRM section 3.2.1 "Register allocation" has the
details.

So, when the kernel is built for ARMv6 cores, mark the extended CP15
read as clobbering memory, which seems to prevent the compiler from
reordering it before the is_smp() test. Russell states that the code
generated from this approach is preferable to marking the inline asm
as volatile.

This patch was developed in collaboration with Will Deacon and Russell
King.

Signed-off-by: Paul Walmsley <paul@pwsan.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
---

Thought I'd respin this to have a discussion strawman. It boots
cleanly on 2430SDP.

[ Updated "ARM: v6: avoid read_cpuid_ext() on ARM1136r0 in
  cache_ops_need_broadcast()" to drop the unnecessary ARM1136 r0 test,
  to switch to a memory clobber per rmk's suggestion, and to update
  the commit message. ]

Intended for v3.11-rc.

 arch/arm/include/asm/cputype.h | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)
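
[ Illustrative note, not part of the original patch. The pattern the commit message describes, a feature test followed by an inline-asm read that must not be hoisted above it, can be sketched in portable C as below. This is a minimal approximation under stated assumptions: the empty asm template stands in for the real "mrc p15, 0, %0, c0, ..." instruction so that it builds on any GCC or Clang target, and smp_flag_sketch, is_smp_sketch(), read_cpuid_ext_sketch(), cache_ops_need_broadcast_sketch() and the bit-field test are hypothetical names and values, not the kernel's code. ]

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's SMP detection. */
static int smp_flag_sketch;

static int is_smp_sketch(void)
{
	return smp_flag_sketch;
}

/*
 * Hypothetical stand-in for read_cpuid_ext(). "+r" passes a made-up
 * register value through the asm; the "memory" clobber is the part the
 * patch adds. It tells the compiler the asm may touch memory it cannot
 * see, which per the commit message seems to stop GCC from moving the
 * asm above the preceding test.
 */
static unsigned int read_cpuid_ext_sketch(void)
{
	unsigned int val = 0;	/* made-up value, not a real CP15 register */

	__asm__("" : "+r" (val) : : "memory");
	return val;
}

/* Test first, extended read second: the order the asm must preserve. */
static int cache_ops_need_broadcast_sketch(void)
{
	if (!is_smp_sketch())
		return 0;	/* UP path: the extended read must not run */

	return ((read_cpuid_ext_sketch() >> 12) & 0xf) < 1;
}
```

[ Unlike marking the asm volatile, a "memory" clobber on its own still lets the compiler delete the asm when its result is unused. That may be one reason the generated code was considered preferable, though the thread does not spell this out. ]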