From patchwork Thu Apr 2 22:49:47 2015
X-Patchwork-Submitter: Russell King - ARM Linux
X-Patchwork-Id: 6151971
Date: Thu, 2 Apr 2015 23:49:47 +0100
From: Russell King - ARM Linux
To: linux-arm-kernel@lists.infradead.org, Will Deacon, Catalin Marinas
Subject: [RFC] mixture of cleanups to cache-v7.S
Message-ID: <20150402224947.GX24899@n2100.arm.linux.org.uk>

Several cleanups are in the patch below... I'll separate them out, but
I'd like to hear comments on them first.  Basically:

1. cache-v7.S is built for ARMv7 CPUs, so there's no reason not to use
   movw and movt when loading large constants, rather than
   "ldr rd, =constant" (see the first sketch following this list).

2. we can do a much more efficient check for the erratum in
   v7_flush_dcache_louis than we were doing: rather than putting the
   workaround code in the fast path, we can reorganise things so that
   we only try to run the workaround code when the LoU field is zero.

3. shift the bitfield we want to extract from the CLIDR into the
   appropriate bit position before masking; this reduces the complexity
   of the code, particularly the SMP/UP differences in
   v7_flush_dcache_louis (second sketch below).

4. pre-shift the Cortex-A9 MIDR value to be checked, and shift the
   actual MIDR right by four bits to drop the minor revision number
   (third sketch below).

5. as the v7_flush_dcache_louis code is now more optimal, I see no
   reason not to enable this workaround by default - if people really
   want it disabled, they can still choose that option.  This is in
   addition to Versatile Express enabling it.  Given that running
   without this workaround can corrupt memory, I think it's only sane
   to encourage enabling it, even though it only affects r0pX CPUs.
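For reference, a minimal standalone sketch (not part of the patch)
contrasting the two constant-loading forms from point 1; the register
and constant here are arbitrary examples:

	@ Old form: the assembler places 0x410fc090 in a literal pool
	@ and emits a PC-relative load - an extra data access at run time.
	ldr	r1, =0x410fc090

	@ ARMv7 form: the constant is encoded directly in two
	@ instructions, so no literal pool entry and no load.
	movw	r1, #:lower16:0x410fc090	@ r1 = 0x0000c090
	movt	r1, #:upper16:0x410fc090	@ r1 = 0x410fc090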
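To illustrate point 3, the before/after for the SMP case (CLIDR field
positions per the ARM ARM: LoUIS is bits 23:21, LoUU is bits 29:27):

	@ Before: mask the field in place, then shift it down - a
	@ different mask is needed for each of the SMP and UP variants.
	ands	r3, r0, #(7 << 21)	@ r3 = LoUIS << 21
	mov	r3, r3, lsr #20		@ r3 = LoUIS * 2

	@ After: shift first, so a single mask (7 << 1) serves both
	@ variants; only the shift amount differs (20 for LoUIS, 26
	@ for LoUU), and r3 ends up holding the field value * 2.
	mov	r3, r0, lsr #20		@ LoUIS now at bits 3:1
	ands	r3, r3, #7 << 1		@ r3 = LoUIS * 2, Z set if zero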
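And for point 4, the revision-stripping comparison (r2 holds the MIDR
already read from CP15):

	@ Before: load the full ID, clear the revision nibble, compare.
	ldr	r1, =0x410fc090		@ Cortex-A9 r0pX ID
	bic	r2, r2, #0x0000000f	@ clear minor revision number
	teq	r2, r1

	@ After: pre-shift the expected ID at assembly time and fold
	@ the run-time shift into the comparison's shifted operand,
	@ saving the bic (and leaving r2 intact).
	movw	r1, #:lower16:(0x410fc090 >> 4)
	movt	r1, #:upper16:(0x410fc090 >> 4)
	teq	r1, r2, lsr #4		@ revision bits shifted away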
One obvious issue comes up here though: in the case that the LoU bits
are validly zero, we merely return from v7_flush_dcache_louis with no
DSB or ISB.  However, v7_flush_dcache_all always ends with a DSB and
ISB, even if LoC is zero.  Is this an intentional difference, or should
v7_flush_dcache_louis also always end with a DSB+ISB?

I haven't tested this patch yet... so no sign-off on it yet.

Reviewed-by: Catalin Marinas

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 2eb6de9465bf..c26dfef393cd 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1139,6 +1139,7 @@ config ARM_ERRATA_742231
 config ARM_ERRATA_643719
 	bool "ARM errata: LoUIS bit field in CLIDR register is incorrect"
 	depends on CPU_V7 && SMP
+	default y
 	help
 	  This option enables the workaround for the 643719 Cortex-A9 (prior to
 	  r1p0) erratum. On affected cores the LoUIS bit field of the CLIDR
diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
index b966656d2c2d..1010bebe05eb 100644
--- a/arch/arm/mm/cache-v7.S
+++ b/arch/arm/mm/cache-v7.S
@@ -36,10 +36,10 @@ ENTRY(v7_invalidate_l1)
 	mcr	p15, 2, r0, c0, c0, 0
 	mrc	p15, 1, r0, c0, c0, 0
 
-	ldr	r1, =0x7fff
+	movw	r1, #0x7fff
 	and	r2, r1, r0, lsr #13
 
-	ldr	r1, =0x3ff
+	movw	r1, #0x3ff
 
 	and	r3, r1, r0, lsr #3	@ NumWays - 1
 	add	r2, r2, #1		@ NumSets
@@ -90,21 +90,20 @@ ENDPROC(v7_flush_icache_all)
 ENTRY(v7_flush_dcache_louis)
 	dmb					@ ensure ordering with previous memory accesses
 	mrc	p15, 1, r0, c0, c0, 1		@ read clidr, r0 = clidr
-	ALT_SMP(ands	r3, r0, #(7 << 21))	@ extract LoUIS from clidr
-	ALT_UP(ands	r3, r0, #(7 << 27))	@ extract LoUU from clidr
+ALT_SMP(mov	r3, r0, lsr #20)		@ move LoUIS into position
+ALT_UP(	mov	r3, r0, lsr #26)		@ move LoUU into position
+	ands	r3, r3, #7 << 1			@ extract LoU field from clidr
+	bne	start_flush_levels		@ LoU != 0, start flushing
 #ifdef CONFIG_ARM_ERRATA_643719
-	ALT_SMP(mrceq	p15, 0, r2, c0, c0, 0)	@ read main ID register
-	ALT_UP(reteq	lr)			@ LoUU is zero, so nothing to do
-	ldreq	r1, =0x410fc090			@ ID of ARM Cortex A9 r0p?
-	biceq	r2, r2, #0x0000000f		@ clear minor revision number
-	teqeq	r2, r1				@ test for errata affected core and if so...
-	orreqs	r3, #(1 << 21)			@ fix LoUIS value (and set flags state to 'ne')
+ALT_SMP(mrc	p15, 0, r2, c0, c0, 0)		@ read main ID register
+ALT_UP(	ret	lr)				@ LoUU is zero, so nothing to do
+	movw	r1, #:lower16:(0x410fc090 >> 4)	@ ID of ARM Cortex A9 r0p?
+	movt	r1, #:upper16:(0x410fc090 >> 4)
+	teq	r1, r2, lsr #4			@ test for errata affected core and if so...
+	moveq	r3, #1 << 1			@ fix LoUIS value
+	beq	start_flush_levels		@ start flushing cache levels
 #endif
-	ALT_SMP(mov	r3, r3, lsr #20)	@ r3 = LoUIS * 2
-	ALT_UP(mov	r3, r3, lsr #26)	@ r3 = LoUU * 2
-	reteq	lr				@ return if level == 0
-	mov	r10, #0				@ r10 (starting level) = 0
-	b	flush_levels			@ start flushing cache levels
+	ret	lr
 ENDPROC(v7_flush_dcache_louis)
 
 /*
@@ -119,9 +118,10 @@ ENDPROC(v7_flush_dcache_louis)
 ENTRY(v7_flush_dcache_all)
 	dmb					@ ensure ordering with previous memory accesses
 	mrc	p15, 1, r0, c0, c0, 1		@ read clidr
-	ands	r3, r0, #0x7000000		@ extract loc from clidr
-	mov	r3, r3, lsr #23			@ left align loc bit field
+	mov	r3, r0, lsr #23			@ align LoC
+	ands	r3, r3, #7 << 1			@ extract loc from clidr
 	beq	finished			@ if loc is 0, then no need to clean
+start_flush_levels:
 	mov	r10, #0				@ start clean at cache level 0
 flush_levels:
 	add	r2, r10, r10, lsr #1		@ work out 3x current cache level
@@ -140,10 +140,10 @@ flush_levels:
 #endif
 	and	r2, r1, #7			@ extract the length of the cache lines
 	add	r2, r2, #4			@ add 4 (line length offset)
-	ldr	r4, =0x3ff
+	movw	r4, #0x3ff
 	ands	r4, r4, r1, lsr #3		@ find maximum number on the way size
 	clz	r5, r4				@ find bit position of way size increment
-	ldr	r7, =0x7fff
+	movw	r7, #0x7fff
 	ands	r7, r7, r1, lsr #13		@ extract max number of the index size
loop1:
 	mov	r9, r7				@ create working copy of max index