From patchwork Tue Aug 23 11:06:02 2011 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Russell King - ARM Linux X-Patchwork-Id: 1088042 Received: from merlin.infradead.org (merlin.infradead.org [205.233.59.134]) by demeter1.kernel.org (8.14.4/8.14.4) with ESMTP id p7NB6Q23006242 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Tue, 23 Aug 2011 11:06:48 GMT Received: from canuck.infradead.org ([2001:4978:20e::1]) by merlin.infradead.org with esmtps (Exim 4.76 #1 (Red Hat Linux)) id 1QvooB-0002rP-FZ; Tue, 23 Aug 2011 11:06:15 +0000 Received: from localhost ([127.0.0.1] helo=canuck.infradead.org) by canuck.infradead.org with esmtp (Exim 4.76 #1 (Red Hat Linux)) id 1QvooB-0006dC-0d; Tue, 23 Aug 2011 11:06:15 +0000 Received: from [2002:4e20:1eda::1] (helo=caramon.arm.linux.org.uk) by canuck.infradead.org with esmtps (Exim 4.76 #1 (Red Hat Linux)) id 1Qvoo6-0006cu-18 for linux-arm-kernel@lists.infradead.org; Tue, 23 Aug 2011 11:06:12 +0000 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=arm.linux.org.uk; s=caramon; h=Sender:Content-Type:MIME-Version:Message-ID:Subject:To:From:Date; bh=F00cSJThAQMRSXukIclOMeer/0Nv2r8/14vcSxouJWM=; b=MbAw+Wm4Q44xTBbgm45QXza2Iw4M4JkBBgBxMjvJ31FQdClO4h1ZSK9uIsayxOyntU4HtKO/Dj5x0jOUgxbU5GpMQH8ubOAjFpL1aKZI93UeOhioSLkBX8qxjK1MqW59/LydGIi17zQdTsBJMlbfDVWHPnPZIV4/IM+q+ECex7U=; Received: from n2100.arm.linux.org.uk ([2002:4e20:1eda:1:214:fdff:fe10:4f86]) by caramon.arm.linux.org.uk with esmtpsa (TLSv1:AES256-SHA:256) (Exim 4.72) (envelope-from ) id 1Qvoo0-0007HU-G6 for linux-arm-kernel@lists.infradead.org; Tue, 23 Aug 2011 12:06:05 +0100 Received: from linux by n2100.arm.linux.org.uk with local (Exim 4.72) (envelope-from ) id 1Qvony-0005wn-Sx for linux-arm-kernel@lists.infradead.org; Tue, 23 Aug 2011 12:06:02 +0100 Date: Tue, 23 Aug 2011 12:06:02 +0100 From: Russell King - ARM Linux To: linux-arm-kernel@lists.infradead.org Subject: [PATCH] Optimize multi-CPU tlb flushing a little more Message-ID: <20110823110602.GG19622@n2100.arm.linux.org.uk> MIME-Version: 1.0 Content-Disposition: inline User-Agent: Mutt/1.5.19 (2009-01-05) X-CRM114-Version: 20090807-BlameThorstenAndJenny ( TRE 0.7.6 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20110823_070611_077446_1193EDC9 X-CRM114-Status: GOOD ( 14.55 ) X-Spam-Score: 1.2 (+) X-Spam-Report: SpamAssassin version 3.3.1 on canuck.infradead.org summary: Content analysis details: (1.2 points) pts rule name description ---- ---------------------- -------------------------------------------------- -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from author's domain 0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily valid -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature 1.3 RDNS_NONE Delivered to internal network by a host with no rDNS 0.0 TO_NO_BRKTS_NORDNS To: misformatted and no rDNS X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: linux-arm-kernel-bounces@lists.infradead.org Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org X-Greylist: IP, sender and recipient auto-whitelisted, not delayed by milter-greylist-4.2.6 (demeter1.kernel.org [140.211.167.41]); Tue, 23 Aug 2011 11:06:48 +0000 (UTC) The compiler does not conditionalize the assembly instructions for the tlb operations, which leads to sub-optimal code being generated when building a kernel for multiple CPUs. We can tweak things fairly simply as the code fragment below shows: 17f8: e3120001 tst r2, #1 ; 0x1 ... 1800: 0a000000 beq 1808 1804: ee061f10 mcr 15, 0, r1, cr6, cr0, {0} 1808: e3120004 tst r2, #4 ; 0x4 180c: 0a000000 beq 1814 1810: ee081f36 mcr 15, 0, r1, cr8, cr6, {1} becomes: 17f0: e3120001 tst r2, #1 ; 0x1 17f4: 1e063f10 mcrne 15, 0, r3, cr6, cr0, {0} 17f8: e3120004 tst r2, #4 ; 0x4 17fc: 1e083f36 mcrne 15, 0, r3, cr8, cr6, {1} Signed-off-by: Russell King --- arch/arm/include/asm/tlbflush.h | 128 ++++++++++++++++------------------------ 1 file changed, 52 insertions(+), 76 deletions(-) diff --git a/arch/arm/include/asm/tlbflush.h b/arch/arm/include/asm/tlbflush.h index d2005de..252874c 100644 --- a/arch/arm/include/asm/tlbflush.h +++ b/arch/arm/include/asm/tlbflush.h @@ -318,6 +318,18 @@ extern struct cpu_tlb_fns cpu_tlb; #define tlb_flag(f) ((always_tlb_flags & (f)) || (__tlb_flag & possible_tlb_flags & (f))) +#define tlb_op(f, regs, arg) \ + do { \ + if (always_tlb_flags & (f)) \ + asm("mcr p15, 0, %0, " regs \ + : : "r" (arg) : "cc"); \ + else if (possible_tlb_flags & (f)) \ + asm("tst %1, %2\n\t" \ + "mcrne p15, 0, %0, " regs \ + : : "r" (arg), "r" (__tlb_flag), "Ir" (f) \ + : "cc"); \ + } while (0) + static inline void local_flush_tlb_all(void) { const int zero = 0; @@ -326,16 +338,11 @@ static inline void local_flush_tlb_all(void) if (tlb_flag(TLB_WB)) dsb(); - if (tlb_flag(TLB_V3_FULL)) - asm("mcr p15, 0, %0, c6, c0, 0" : : "r" (zero) : "cc"); - if (tlb_flag(TLB_V4_U_FULL | TLB_V6_U_FULL)) - asm("mcr p15, 0, %0, c8, c7, 0" : : "r" (zero) : "cc"); - if (tlb_flag(TLB_V4_D_FULL | TLB_V6_D_FULL)) - asm("mcr p15, 0, %0, c8, c6, 0" : : "r" (zero) : "cc"); - if (tlb_flag(TLB_V4_I_FULL | TLB_V6_I_FULL)) - asm("mcr p15, 0, %0, c8, c5, 0" : : "r" (zero) : "cc"); - if (tlb_flag(TLB_V7_UIS_FULL)) - asm("mcr p15, 0, %0, c8, c3, 0" : : "r" (zero) : "cc"); + tlb_op(TLB_V3_FULL, "c6, c0, 0", zero); + tlb_op(TLB_V4_U_FULL | TLB_V6_U_FULL, "c8, c7, 0", zero); + tlb_op(TLB_V4_D_FULL | TLB_V6_D_FULL, "c8, c6, 0", zero); + tlb_op(TLB_V4_I_FULL | TLB_V6_I_FULL, "c8, c5, 0", zero); + tlb_op(TLB_V7_UIS_FULL, "c8, c3, 0", zero); if (tlb_flag(TLB_BARRIER)) { dsb(); @@ -352,29 +359,22 @@ static inline void local_flush_tlb_mm(struct mm_struct *mm) if (tlb_flag(TLB_WB)) dsb(); - if (cpumask_test_cpu(get_cpu(), mm_cpumask(mm))) { - if (tlb_flag(TLB_V3_FULL)) - asm("mcr p15, 0, %0, c6, c0, 0" : : "r" (zero) : "cc"); - if (tlb_flag(TLB_V4_U_FULL)) - asm("mcr p15, 0, %0, c8, c7, 0" : : "r" (zero) : "cc"); - if (tlb_flag(TLB_V4_D_FULL)) - asm("mcr p15, 0, %0, c8, c6, 0" : : "r" (zero) : "cc"); - if (tlb_flag(TLB_V4_I_FULL)) - asm("mcr p15, 0, %0, c8, c5, 0" : : "r" (zero) : "cc"); + if (possible_tlb_flags & (TLB_V3_FULL|TLB_V4_U_FULL|TLB_V4_D_FULL|TLB_V4_I_FULL) && + cpumask_test_cpu(get_cpu(), mm_cpumask(mm))) { + tlb_op(TLB_V3_FULL, "c6, c0, 0", zero); + tlb_op(TLB_V4_U_FULL, "c8, c7, 0", zero); + tlb_op(TLB_V4_D_FULL, "c8, c6, 0", zero); + tlb_op(TLB_V4_I_FULL, "c8, c5, 0", zero); } put_cpu(); - if (tlb_flag(TLB_V6_U_ASID)) - asm("mcr p15, 0, %0, c8, c7, 2" : : "r" (asid) : "cc"); - if (tlb_flag(TLB_V6_D_ASID)) - asm("mcr p15, 0, %0, c8, c6, 2" : : "r" (asid) : "cc"); - if (tlb_flag(TLB_V6_I_ASID)) - asm("mcr p15, 0, %0, c8, c5, 2" : : "r" (asid) : "cc"); - if (tlb_flag(TLB_V7_UIS_ASID)) + tlb_op(TLB_V6_U_ASID, "c8, c7, 2", asid); + tlb_op(TLB_V6_D_ASID, "c8, c6, 2", asid); + tlb_op(TLB_V6_I_ASID, "c8, c5, 2", asid); #ifdef CONFIG_ARM_ERRATA_720789 - asm("mcr p15, 0, %0, c8, c3, 0" : : "r" (zero) : "cc"); + tlb_op(TLB_V7_UIS_ASID, "c8, c3, 0", zero); #else - asm("mcr p15, 0, %0, c8, c3, 2" : : "r" (asid) : "cc"); + tlb_op(TLB_V7_UIS_ASID, "c8, c3, 2", asid); #endif if (tlb_flag(TLB_BARRIER)) @@ -392,30 +392,23 @@ local_flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr) if (tlb_flag(TLB_WB)) dsb(); - if (cpumask_test_cpu(smp_processor_id(), mm_cpumask(vma->vm_mm))) { - if (tlb_flag(TLB_V3_PAGE)) - asm("mcr p15, 0, %0, c6, c0, 0" : : "r" (uaddr) : "cc"); - if (tlb_flag(TLB_V4_U_PAGE)) - asm("mcr p15, 0, %0, c8, c7, 1" : : "r" (uaddr) : "cc"); - if (tlb_flag(TLB_V4_D_PAGE)) - asm("mcr p15, 0, %0, c8, c6, 1" : : "r" (uaddr) : "cc"); - if (tlb_flag(TLB_V4_I_PAGE)) - asm("mcr p15, 0, %0, c8, c5, 1" : : "r" (uaddr) : "cc"); + if (possible_tlb_flags & (TLB_V3_PAGE|TLB_V4_U_PAGE|TLB_V4_D_PAGE|TLB_V4_I_PAGE|TLB_V4_I_FULL) && + cpumask_test_cpu(smp_processor_id(), mm_cpumask(vma->vm_mm))) { + tlb_op(TLB_V3_PAGE, "c6, c0, 0", uaddr); + tlb_op(TLB_V4_U_PAGE, "c8, c7, 1", uaddr); + tlb_op(TLB_V4_D_PAGE, "c8, c6, 1", uaddr); + tlb_op(TLB_V4_I_PAGE, "c8, c5, 1", uaddr); if (!tlb_flag(TLB_V4_I_PAGE) && tlb_flag(TLB_V4_I_FULL)) asm("mcr p15, 0, %0, c8, c5, 0" : : "r" (zero) : "cc"); } - if (tlb_flag(TLB_V6_U_PAGE)) - asm("mcr p15, 0, %0, c8, c7, 1" : : "r" (uaddr) : "cc"); - if (tlb_flag(TLB_V6_D_PAGE)) - asm("mcr p15, 0, %0, c8, c6, 1" : : "r" (uaddr) : "cc"); - if (tlb_flag(TLB_V6_I_PAGE)) - asm("mcr p15, 0, %0, c8, c5, 1" : : "r" (uaddr) : "cc"); - if (tlb_flag(TLB_V7_UIS_PAGE)) + tlb_op(TLB_V6_U_PAGE, "c8, c7, 1", uaddr); + tlb_op(TLB_V6_D_PAGE, "c8, c6, 1", uaddr); + tlb_op(TLB_V6_I_PAGE, "c8, c5, 1", uaddr); #ifdef CONFIG_ARM_ERRATA_720789 - asm("mcr p15, 0, %0, c8, c3, 3" : : "r" (uaddr & PAGE_MASK) : "cc"); + tlb_op(TLB_V7_UIS_PAGE, "c8, c3, 3", uaddr & PAGE_MASK); #else - asm("mcr p15, 0, %0, c8, c3, 1" : : "r" (uaddr) : "cc"); + tlb_op(TLB_V7_UIS_PAGE, "c8, c3, 1", uaddr); #endif if (tlb_flag(TLB_BARRIER)) @@ -432,25 +425,17 @@ static inline void local_flush_tlb_kernel_page(unsigned long kaddr) if (tlb_flag(TLB_WB)) dsb(); - if (tlb_flag(TLB_V3_PAGE)) - asm("mcr p15, 0, %0, c6, c0, 0" : : "r" (kaddr) : "cc"); - if (tlb_flag(TLB_V4_U_PAGE)) - asm("mcr p15, 0, %0, c8, c7, 1" : : "r" (kaddr) : "cc"); - if (tlb_flag(TLB_V4_D_PAGE)) - asm("mcr p15, 0, %0, c8, c6, 1" : : "r" (kaddr) : "cc"); - if (tlb_flag(TLB_V4_I_PAGE)) - asm("mcr p15, 0, %0, c8, c5, 1" : : "r" (kaddr) : "cc"); + tlb_op(TLB_V3_PAGE, "c6, c0, 0", kaddr); + tlb_op(TLB_V4_U_PAGE, "c8, c7, 1", kaddr); + tlb_op(TLB_V4_D_PAGE, "c8, c6, 1", kaddr); + tlb_op(TLB_V4_I_PAGE, "c8, c5, 1", kaddr); if (!tlb_flag(TLB_V4_I_PAGE) && tlb_flag(TLB_V4_I_FULL)) asm("mcr p15, 0, %0, c8, c5, 0" : : "r" (zero) : "cc"); - if (tlb_flag(TLB_V6_U_PAGE)) - asm("mcr p15, 0, %0, c8, c7, 1" : : "r" (kaddr) : "cc"); - if (tlb_flag(TLB_V6_D_PAGE)) - asm("mcr p15, 0, %0, c8, c6, 1" : : "r" (kaddr) : "cc"); - if (tlb_flag(TLB_V6_I_PAGE)) - asm("mcr p15, 0, %0, c8, c5, 1" : : "r" (kaddr) : "cc"); - if (tlb_flag(TLB_V7_UIS_PAGE)) - asm("mcr p15, 0, %0, c8, c3, 1" : : "r" (kaddr) : "cc"); + tlb_op(TLB_V6_U_PAGE, "c8, c7, 1", kaddr); + tlb_op(TLB_V6_D_PAGE, "c8, c6, 1", kaddr); + tlb_op(TLB_V6_I_PAGE, "c8, c5, 1", kaddr); + tlb_op(TLB_V7_UIS_PAGE, "c8, c3, 1", kaddr); if (tlb_flag(TLB_BARRIER)) { dsb(); @@ -475,13 +460,8 @@ static inline void flush_pmd_entry(pmd_t *pmd) { const unsigned int __tlb_flag = __cpu_tlb_flags; - if (tlb_flag(TLB_DCLEAN)) - asm("mcr p15, 0, %0, c7, c10, 1 @ flush_pmd" - : : "r" (pmd) : "cc"); - - if (tlb_flag(TLB_L2CLEAN_FR)) - asm("mcr p15, 1, %0, c15, c9, 1 @ L2 flush_pmd" - : : "r" (pmd) : "cc"); + tlb_op(TLB_DCLEAN, "c7, c10, 1 @ flush_pmd", pmd); + tlb_op(TLB_L2CLEAN_FR, "c15, c9, 1 @ L2 flush_pmd", pmd); if (tlb_flag(TLB_WB)) dsb(); @@ -491,15 +471,11 @@ static inline void clean_pmd_entry(pmd_t *pmd) { const unsigned int __tlb_flag = __cpu_tlb_flags; - if (tlb_flag(TLB_DCLEAN)) - asm("mcr p15, 0, %0, c7, c10, 1 @ flush_pmd" - : : "r" (pmd) : "cc"); - - if (tlb_flag(TLB_L2CLEAN_FR)) - asm("mcr p15, 1, %0, c15, c9, 1 @ L2 flush_pmd" - : : "r" (pmd) : "cc"); + tlb_op(TLB_DCLEAN, "c7, c10, 1 @ flush_pmd", pmd); + tlb_op(TLB_L2CLEAN_FR, "c15, c9, 1 @ L2 flush_pmd", pmd); } +#undef tlb_op #undef tlb_flag #undef always_tlb_flags #undef possible_tlb_flags