From patchwork Thu Apr  2 22:57:59 2015
X-Patchwork-Submitter: Russell King - ARM Linux
X-Patchwork-Id: 6151981
Date: Thu, 2 Apr 2015 23:57:59 +0100
From: Russell King - ARM Linux
To: linux-arm-kernel@lists.infradead.org, Will Deacon, Catalin Marinas
Subject: Re: [RFC] mixture of cleanups to cache-v7.S
Message-ID: <20150402225759.GY24899@n2100.arm.linux.org.uk>
In-Reply-To: <20150402224947.GX24899@n2100.arm.linux.org.uk>

On Thu, Apr 02, 2015 at 11:49:47PM +0100, Russell King - ARM Linux wrote:
> Several cleanups are in the patch below... I'll separate them out, but
> I'd like to hear comments on them. Basically:
>
> 1. cache-v7.S is built for ARMv7 CPUs, so there's no reason not to
>    use movw and movt when loading large constants, rather than using
>    "ldr rd,=constant".
>
> 2. we can do a much more efficient check for the erratum in
>    v7_flush_dcache_louis than we were doing - rather than putting the
>    workaround code in the fast path, we can re-organise this such that
>    we only try to run the workaround code if the LoU field is zero.
>
> 3. shift the bitfield we want to extract from the CLIDR to the
>    appropriate bit position prior to masking; this reduces the
>    complexity of the code, particularly with the SMP differences in
>    v7_flush_dcache_louis.
>
> 4. pre-shift the Cortex-A9 MIDR value to be checked, and shift the
>    actual MIDR to lose the bottom four revision bits.
>
> 5. as the v7_flush_dcache_louis code is now more efficient, I see no
>    reason not to enable this workaround by default - if people really
>    want it to be disabled, they can still choose that option. This is
>    in addition to Versatile Express enabling it. Given the memory
>    corruption that can result from not having this erratum worked
>    around, I think it's only sane to encourage enabling it, even
>    though it only affects r0pX CPUs.
>
> One obvious issue comes up here though - in the case that the LoU bits
> are validly zero, we merely return from v7_flush_dcache_louis with no
> DSB or ISB. However, v7_flush_dcache_all always has a DSB and ISB at
> the end, even if LoC is zero. Is this an intentional difference, or
> should v7_flush_dcache_louis always end with a DSB+ISB?

I should point out that if the DSB+ISB is needed, then the code can
instead become as below - basically, we just move the CLIDR field into
the appropriate position and call start_flush_levels, which does the
DMB, applies the mask to extract the appropriate field, and then decides
whether it has any levels to process.
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 2eb6de9465bf..c26dfef393cd 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1139,6 +1139,7 @@ config ARM_ERRATA_742231
 config ARM_ERRATA_643719
 	bool "ARM errata: LoUIS bit field in CLIDR register is incorrect"
 	depends on CPU_V7 && SMP
+	default y
 	help
 	  This option enables the workaround for the 643719 Cortex-A9 (prior to
 	  r1p0) erratum. On affected cores the LoUIS bit field of the CLIDR
diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
index b966656d2c2d..14bfdd584385 100644
--- a/arch/arm/mm/cache-v7.S
+++ b/arch/arm/mm/cache-v7.S
@@ -36,10 +36,10 @@ ENTRY(v7_invalidate_l1)
 	mcr	p15, 2, r0, c0, c0, 0
 	mrc	p15, 1, r0, c0, c0, 0
 
-	ldr	r1, =0x7fff
+	movw	r1, #0x7fff
 	and	r2, r1, r0, lsr #13
 
-	ldr	r1, =0x3ff
+	movw	r1, #0x3ff
 
 	and	r3, r1, r0, lsr #3	@ NumWays - 1
 	add	r2, r2, #1		@ NumSets
@@ -88,23 +88,20 @@ ENDPROC(v7_flush_icache_all)
  */
 ENTRY(v7_flush_dcache_louis)
-	dmb					@ ensure ordering with previous memory accesses
 	mrc	p15, 1, r0, c0, c0, 1		@ read clidr, r0 = clidr
-	ALT_SMP(ands	r3, r0, #(7 << 21))	@ extract LoUIS from clidr
-	ALT_UP(ands	r3, r0, #(7 << 27))	@ extract LoUU from clidr
+ALT_SMP(mov	r3, r0, lsr #20)	@ move LoUIS into position
+ALT_UP(	mov	r3, r0, lsr #26)	@ move LoUU into position
 #ifdef CONFIG_ARM_ERRATA_643719
-	ALT_SMP(mrceq	p15, 0, r2, c0, c0, 0)	@ read main ID register
-	ALT_UP(reteq	lr)			@ LoUU is zero, so nothing to do
-	ldreq	r1, =0x410fc090			@ ID of ARM Cortex A9 r0p?
-	biceq	r2, r2, #0x0000000f		@ clear minor revision number
-	teqeq	r2, r1				@ test for errata affected core and if so...
-	orreqs	r3, #(1 << 21)			@ fix LoUIS value (and set flags state to 'ne')
+ALT_SMP(ands	r3, r3, #7 << 1)	@ extract LoU field from clidr
+ALT_UP(	b	start_flush_levels)
+	bne	start_flush_levels	@ LoU != 0, start flushing
+	mrc	p15, 0, r2, c0, c0, 0	@ read main ID register
+	movw	r1, #:lower16:(0x410fc090 >> 4)	@ ID of ARM Cortex A9 r0p?
+	movt	r1, #:upper16:(0x410fc090 >> 4)
+	teq	r1, r2, lsr #4		@ test for errata affected core and if so...
+	moveq	r3, #1 << 1		@ fix LoUIS value (and set flags state to 'ne')
 #endif
-	ALT_SMP(mov	r3, r3, lsr #20)	@ r3 = LoUIS * 2
-	ALT_UP(mov	r3, r3, lsr #26)	@ r3 = LoUU * 2
-	reteq	lr				@ return if level == 0
-	mov	r10, #0				@ r10 (starting level) = 0
-	b	flush_levels			@ start flushing cache levels
+	b	start_flush_levels	@ start flushing cache levels
 ENDPROC(v7_flush_dcache_louis)
 
 /*
@@ -117,10 +114,11 @@ ENDPROC(v7_flush_dcache_louis)
  *	- mm	- mm_struct describing address space
  */
 ENTRY(v7_flush_dcache_all)
-	dmb					@ ensure ordering with previous memory accesses
 	mrc	p15, 1, r0, c0, c0, 1	@ read clidr
-	ands	r3, r0, #0x7000000	@ extract loc from clidr
-	mov	r3, r3, lsr #23		@ left align loc bit field
+	mov	r3, r0, lsr #23		@ align LoC
+start_flush_levels:
+	dmb				@ ensure ordering with previous memory accesses
+	ands	r3, r3, #7 << 1		@ extract loc from clidr
 	beq	finished		@ if loc is 0, then no need to clean
 	mov	r10, #0			@ start clean at cache level 0
 flush_levels:
@@ -140,10 +138,10 @@ flush_levels:
 #endif
 	and	r2, r1, #7		@ extract the length of the cache lines
 	add	r2, r2, #4		@ add 4 (line length offset)
-	ldr	r4, =0x3ff
+	movw	r4, #0x3ff
 	ands	r4, r4, r1, lsr #3	@ find maximum number on the way size
 	clz	r5, r4			@ find bit position of way size increment
-	ldr	r7, =0x7fff
+	movw	r7, #0x7fff
 	ands	r7, r7, r1, lsr #13	@ extract max number of the index size
 loop1:
 	mov	r9, r7			@ create working copy of max index