From patchwork Mon Jun 3 19:33:24 2013
From: Nicolas Pitre
To: Will Deacon
Cc: Rob Herring, linux-arm-kernel@lists.infradead.org
Date: Mon, 03 Jun 2013 15:33:24 -0400 (EDT)
Subject: Re: [PATCH] ARM: pcpu: ensure __my_cpu_offset cannot be re-ordered across barrier()
In-reply-to: <1370278405-13261-1-git-send-email-will.deacon@arm.com>
References: <1370278405-13261-1-git-send-email-will.deacon@arm.com>

On Mon, 3 Jun 2013, Will Deacon wrote:

> __my_cpu_offset is non-volatile, since we want its value to be cached
> when we access several per-cpu variables in a row with preemption
> disabled. This means that we rely on preempt_{en,dis}able to hazard
> with the operation via the barrier() macro, so that we can't end up
> migrating CPUs without reloading the per-cpu offset.
>
> Unfortunately, GCC doesn't treat a "memory" clobber on a non-volatile
> asm block as a side-effect, and will happily re-order it before other
> memory clobbers (including those in preempt_disable()) and cache the
> value. This has been observed to break the cmpxchg logic in the slub
> allocator, leading to livelock in kmem_cache_alloc in mainline kernels.
>
> This patch adds a dummy memory output operand to __my_cpu_offset,
> forcing it to be ordered with respect to the barrier() macro.
>
> Cc: Rob Herring
> Cc: Nicolas Pitre
> Signed-off-by: Will Deacon
> ---
>  arch/arm/include/asm/percpu.h | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm/include/asm/percpu.h b/arch/arm/include/asm/percpu.h
> index 968c0a1..93970eb 100644
> --- a/arch/arm/include/asm/percpu.h
> +++ b/arch/arm/include/asm/percpu.h
> @@ -30,8 +30,12 @@ static inline void set_my_cpu_offset(unsigned long off)
>  static inline unsigned long __my_cpu_offset(void)
>  {
>  	unsigned long off;
> -	/* Read TPIDRPRW */
> -	asm("mrc p15, 0, %0, c13, c0, 4" : "=r" (off) : : "memory");
> +	/*
> +	 * Read TPIDRPRW.
> +	 * We want to allow caching the value, so avoid using volatile and
> +	 * instead use a fake memory access to hazard against barrier().
> +	 */
> +	asm("mrc p15, 0, %0, c13, c0, 4" : "=r" (off), "=Qo" (off));
>  	return off;
>  }

This doesn't work with the little test I wrote to observe the change in
compiler behavior.  Here's my test in case it is flawed:

static inline unsigned long __my_cpu_offset(void)
{
	unsigned long off;
	//asm("mrc p15, 0, %0, c13, c0, 4" : "=r" (off) : : "memory");
	asm("mrc p15, 0, %0, c13, c0, 4" : "=r" (off), "=Qo" (off));
	return off;
}

#define barrier()	asm volatile ("" : : : "memory")

int foo(int *a, int *b, char *c)
{
	int x, y, z;
	x = a[__my_cpu_offset()];
	barrier();
	y = b[__my_cpu_offset()];
	z = c[__my_cpu_offset()];
	barrier();
	return x + y + z + a[__my_cpu_offset()];
}

With the original memory clobber (commented out above) the asm output is:

foo:
	mrc	p15, 0, r3, c13, c0, 4
	ldr	ip, [r0, r3, asl #2]
	ldr	r1, [r1, r3, asl #2]
	ldrb	r2, [r2, r3]	@ zero_extendqisi2
	ldr	r3, [r0, r3, asl #2]
	add	r0, ip, r1
	add	r0, r0, r2
	add	r0, r0, r3
	bx	lr

This confirms that the memory barrier is not respected in any way: the
CPU offset is read only once and cached across both barrier() calls.

With your change we get this instead:

foo:
	sub	sp, sp, #8
	mrc	p15, 0, r3, c13, c0, 4
	str	r3, [sp, #4]
	ldr	ip, [r0, r3, asl #2]
	ldr	r1, [r1, r3, asl #2]
	str	r3, [sp, #4]
	ldrb	r2, [r2, r3]	@ zero_extendqisi2
	ldr	r3, [r0, r3, asl #2]
	add	r0, ip, r1
	add	r0, r0, r2
	add	r0, r0, r3
	add	sp, sp, #8
	bx	lr

So not only is the barrier completely ineffective at forcing a reload of
the CPU offset, but it also introduces useless stores onto the stack.
Same behavior with gcc versions 4.7.3, 4.6.2 and 4.5.1 (yes, that's
really what I have on my system -- waiting for 4.8.4 now).

So if the preemption count is really what this should be protected
against, we should simply add it as an input dependency to the inline
asm constraint, as in the patch appended below.

The problem with this approach is that current_thread_info() is not a
static location either, as it is always derived from the stack pointer,
so it is needlessly computed even when nothing else in the same
function uses it.

Nicolas

diff --git a/arch/arm/include/asm/percpu.h b/arch/arm/include/asm/percpu.h
index 968c0a14e0..52f9ca33e7 100644
--- a/arch/arm/include/asm/percpu.h
+++ b/arch/arm/include/asm/percpu.h
@@ -31,7 +31,9 @@ static inline unsigned long __my_cpu_offset(void)
 {
 	unsigned long off;
 	/* Read TPIDRPRW */
-	asm("mrc p15, 0, %0, c13, c0, 4" : "=r" (off) : : "memory");
+	asm("mrc p15, 0, %0, c13, c0, 4"
+	    : "=r" (off)
+	    : "m" (current_thread_info()->preempt_count));
 	return off;
 }
 #define __my_cpu_offset __my_cpu_offset()
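
For reference, the foo() test above can be adapted to try this
input-dependency variant outside the kernel.  The version below is only a
sketch: fake_preempt_count is a hypothetical stand-in for
current_thread_info()->preempt_count so that the file builds on its own,
e.g. with arm-linux-gnueabi-gcc -O2 -S.  If the "m" input behaves as
hoped, the offset should be re-read after each barrier() while the b[]
and c[] accesses, which have no barrier between them, may still share a
single mrc.

/* Hypothetical stand-in for current_thread_info()->preempt_count, only
 * so that this test file builds outside the kernel. */
extern int fake_preempt_count;

static inline unsigned long __my_cpu_offset(void)
{
	unsigned long off;
	/*
	 * Read TPIDRPRW.  The "m" input makes the asm depend on the (fake)
	 * preempt count, so the "memory" clobber in barrier() should force
	 * the result to be re-read rather than cached across it, while two
	 * reads with no intervening barrier can still be merged.
	 */
	asm("mrc p15, 0, %0, c13, c0, 4"
	    : "=r" (off)
	    : "m" (fake_preempt_count));
	return off;
}

#define barrier()	asm volatile ("" : : : "memory")

int foo(int *a, int *b, char *c)
{
	int x, y, z;
	x = a[__my_cpu_offset()];
	barrier();
	y = b[__my_cpu_offset()];	/* may share one mrc with the c[] access */
	z = c[__my_cpu_offset()];
	barrier();
	return x + y + z + a[__my_cpu_offset()];
}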