From patchwork Sat Feb 24 12:09:53 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shanker Donthineni X-Patchwork-Id: 10240367 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 11738602DC for ; Sat, 24 Feb 2018 12:10:51 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E664129BAA for ; Sat, 24 Feb 2018 12:10:50 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D8F3929BAF; Sat, 24 Feb 2018 12:10:50 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-1.9 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID autolearn=ham version=3.3.1 Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 07FB029BAA for ; Sat, 24 Feb 2018 12:10:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20170209; h=Sender: Content-Transfer-Encoding:Content-Type:MIME-Version:Cc:List-Subscribe: List-Help:List-Post:List-Archive:List-Unsubscribe:List-Id:Message-Id:Date: Subject:To:From:Reply-To:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:In-Reply-To: References:List-Owner; bh=9AUh1k5HSn5u+hWahG5M3OJskRYveC6CMP5R+c/RrCU=; b=Npz dxpVf08jvBqpZP6qqMSKsjoYSENuNftScq4MaufHUZq2PYYt2WQSBrjDCwtxy+CV4skYFC4KaNBc2 jP/9p3rtaVRTDlalD8rjYkOLgTnSoQhPh4gxvy5+T04MZ1uwDhK32p/7POxRMOunYHKKNQJWbr2oS f7b330Rc0VrQrrEo9xUE9ma1JlTaGz0qjVvvTK0Q+q2r6ygasTbiJXmNtJwS3B3/6CP8LburQ/ja8 M3dAWsSRbjLjKH9SS7R5C+VGUnCKMx9m7r0MJ4Xlib3yhmouzNR6JRcdlMRuUkcUqhj9gWZKnI08/ dtPIsRL2dpDkw5DnJngHMQlUXv9ZFdw==; Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.89 #1 (Red Hat Linux)) id 1epYf1-0005FY-8l; Sat, 24 Feb 2018 12:10:39 +0000 Received: from smtp.codeaurora.org ([198.145.29.96]) by bombadil.infradead.org with esmtps (Exim 4.89 #1 (Red Hat Linux)) id 1epYev-0005EV-O4 for linux-arm-kernel@lists.infradead.org; Sat, 24 Feb 2018 12:10:35 +0000 Received: by smtp.codeaurora.org (Postfix, from userid 1000) id 1A85660272; Sat, 24 Feb 2018 12:10:20 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=codeaurora.org; s=default; t=1519474220; bh=dnOch0rE7KlZt6NQS/KjZDUqc0v8cjseLP/pgou4vWM=; h=From:To:Cc:Subject:Date:From; b=Do8TraEhPe12xdS24mRS2ADk7WR9cNmqboodtptoef0rqxcG9tbUcmnFYTSnG1u1h +xKTiI5rN9FCU/dzrrQi/lMYPjDg6vkoyOWo6YSuhGx2VYQxHnQQDndAD7830EvK6n mcnBTRdHvg50U+u6o+Ca7iXconeb8YmoUaDufxso= Received: from shankerd-ubuntu.qualcomm.com (i-global254.qualcomm.com [199.106.103.254]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-SHA256 (128/128 bits)) (No client certificate requested) (Authenticated sender: shankerd@smtp.codeaurora.org) by smtp.codeaurora.org (Postfix) with ESMTPSA id E046B60272; Sat, 24 Feb 2018 12:10:17 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=codeaurora.org; s=default; t=1519474218; bh=dnOch0rE7KlZt6NQS/KjZDUqc0v8cjseLP/pgou4vWM=; h=From:To:Cc:Subject:Date:From; b=HDMsP0SGIbTIiITCm7B1Jlue5J0ZW4iHDMcCWpdnSF/ELsxHG423zD6KqsmSIPg7J cPS6k3QwOzj/iB2vVGDH+n8E7rZiD9ltzcuFpxcVEmvUQ13bztKf0T7EaQn3JEjzPY zfPX/eb9fRf4uRl2ABoeXJBuArJyRtwYPAkx6viw= DMARC-Filter: OpenDMARC Filter v1.3.2 smtp.codeaurora.org E046B60272 Authentication-Results: pdx-caf-mail.web.codeaurora.org; dmarc=none (p=none dis=none) header.from=codeaurora.org Authentication-Results: pdx-caf-mail.web.codeaurora.org; spf=none smtp.mailfrom=shankerd@codeaurora.org From: Shanker Donthineni To: Will Deacon , Robin Murphy , Mark Rutland , linux-kernel , linux-arm-kernel , Catalin Marinas , kvmarm Subject: [PATCH v5] arm64: Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC Date: Sat, 24 Feb 2018 06:09:53 -0600 Message-Id: <1519474193-25973-1-git-send-email-shankerd@codeaurora.org> X-Mailer: git-send-email 1.9.1 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20180224_041033_845031_1C5E4D83 X-CRM114-Status: GOOD ( 17.26 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Marc Zyngier , Philip Elcan , Shanker Donthineni , Vikram Sethi MIME-Version: 1.0 Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org X-Virus-Scanned: ClamAV using ClamSMTP The DCache clean & ICache invalidation requirements for instructions to be data coherence are discoverable through new fields in CTR_EL0. The following two control bits DIC and IDC were defined for this purpose. No need to perform point of unification cache maintenance operations from software on systems where CPU caches are transparent. This patch optimize the three functions __flush_cache_user_range(), clean_dcache_area_pou() and invalidate_icache_range() if the hardware reports CTR_EL0.IDC and/or CTR_EL0.IDC. Basically it skips the two instructions 'DC CVAU' and 'IC IVAU', and the associated loop logic in order to avoid the unnecessary overhead. CTR_EL0.DIC: Instruction cache invalidation requirements for instruction to data coherence. The meaning of this bit[29]. 0: Instruction cache invalidation to the point of unification is required for instruction to data coherence. 1: Instruction cache cleaning to the point of unification is not required for instruction to data coherence. CTR_EL0.IDC: Data cache clean requirements for instruction to data coherence. The meaning of this bit[28]. 0: Data cache clean to the point of unification is required for instruction to data coherence, unless CLIDR_EL1.LoC == 0b000 or (CLIDR_EL1.LoUIS == 0b000 && CLIDR_EL1.LoUU == 0b000). 1: Data cache clean to the point of unification is not required for instruction to data coherence. Signed-off-by: Philip Elcan Signed-off-by: Shanker Donthineni --- Changes since v4: -Moved patching ARM64_HAS_CACHE_DIC inside invalidate_icache_by_line -Removed 'dsb ishst' for ARM64_HAS_CACHE_DIC as Mark suggested. Changes since v3: -Added preprocessor guard CONFIG_xxx to code snippets in cache.S -Changed barrier attributes from ISH to ISHST. Changes since v2: -Included barriers, DSB/ISB with DIC set, and DSB with IDC set. -Single Kconfig option. Changes since v1: -Reworded commit text. -Used the alternatives framework as Catalin suggested. -Rebased on top of https://patchwork.kernel.org/patch/10227927/ arch/arm64/Kconfig | 12 ++++++++++++ arch/arm64/include/asm/assembler.h | 6 ++++++ arch/arm64/include/asm/cache.h | 5 +++++ arch/arm64/include/asm/cpucaps.h | 4 +++- arch/arm64/kernel/cpufeature.c | 40 ++++++++++++++++++++++++++++++++------ arch/arm64/mm/cache.S | 13 +++++++++++++ 6 files changed, 73 insertions(+), 7 deletions(-) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index f55fe5b..82b8053 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -1095,6 +1095,18 @@ config ARM64_RAS_EXTN and access the new registers if the system supports the extension. Platform RAS features may additionally depend on firmware support. +config ARM64_SKIP_CACHE_POU + bool "Enable support to skip cache POU operations" + default y + help + Explicit point of unification cache operations can be eliminated + in software if the hardware handles transparently. The new bits in + CTR_EL0, CTR_EL0.DIC and CTR_EL0.IDC indicates the hardware + capabilities of ICache and DCache POU requirements. + + Selecting this feature will allow the kernel to optimize the POU + cache maintaince operations where it requires 'D{I}C C{I}VAU' + endmenu config ARM64_SVE diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h index 3c78835..39f2274 100644 --- a/arch/arm64/include/asm/assembler.h +++ b/arch/arm64/include/asm/assembler.h @@ -444,6 +444,11 @@ * Corrupts: tmp1, tmp2 */ .macro invalidate_icache_by_line start, end, tmp1, tmp2, label +#ifdef CONFIG_ARM64_SKIP_CACHE_POU +alternative_if ARM64_HAS_CACHE_DIC + b 9996f +alternative_else_nop_endif +#endif icache_line_size \tmp1, \tmp2 sub \tmp2, \tmp1, #1 bic \tmp2, \start, \tmp2 @@ -453,6 +458,7 @@ cmp \tmp2, \end b.lo 9997b dsb ish +9996: isb .endm diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h index ea9bb4e..e22178b 100644 --- a/arch/arm64/include/asm/cache.h +++ b/arch/arm64/include/asm/cache.h @@ -20,8 +20,13 @@ #define CTR_L1IP_SHIFT 14 #define CTR_L1IP_MASK 3 +#define CTR_DMLINE_SHIFT 16 +#define CTR_ERG_SHIFT 20 #define CTR_CWG_SHIFT 24 #define CTR_CWG_MASK 15 +#define CTR_IDC_SHIFT 28 +#define CTR_DIC_SHIFT 29 +#define CTR_B31_SHIFT 31 #define CTR_L1IP(ctr) (((ctr) >> CTR_L1IP_SHIFT) & CTR_L1IP_MASK) diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h index bb26382..8dd42ae 100644 --- a/arch/arm64/include/asm/cpucaps.h +++ b/arch/arm64/include/asm/cpucaps.h @@ -45,7 +45,9 @@ #define ARM64_HARDEN_BRANCH_PREDICTOR 24 #define ARM64_HARDEN_BP_POST_GUEST_EXIT 25 #define ARM64_HAS_RAS_EXTN 26 +#define ARM64_HAS_CACHE_IDC 27 +#define ARM64_HAS_CACHE_DIC 28 -#define ARM64_NCAPS 27 +#define ARM64_NCAPS 29 #endif /* __ASM_CPUCAPS_H */ diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index ff8a6e9..aea1484 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -199,12 +199,12 @@ static int __init register_cpu_hwcaps_dumper(void) }; static const struct arm64_ftr_bits ftr_ctr[] = { - ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_EXACT, 31, 1, 1), /* RES1 */ - ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, 29, 1, 1), /* DIC */ - ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, 28, 1, 1), /* IDC */ - ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_HIGHER_SAFE, 24, 4, 0), /* CWG */ - ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_HIGHER_SAFE, 20, 4, 0), /* ERG */ - ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, 16, 4, 1), /* DminLine */ + ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_EXACT, CTR_B31_SHIFT, 1, 1), /* RES1 */ + ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, CTR_DIC_SHIFT, 1, 1), /* DIC */ + ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, CTR_IDC_SHIFT, 1, 1), /* IDC */ + ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_HIGHER_SAFE, CTR_CWG_SHIFT, 4, 0), /* CWG */ + ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_HIGHER_SAFE, CTR_ERG_SHIFT, 4, 0), /* ERG */ + ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, CTR_DMLINE_SHIFT, 4, 1), /* DminLine */ /* * Linux can handle differing I-cache policies. Userspace JITs will * make use of *minLine. @@ -864,6 +864,20 @@ static bool has_no_fpsimd(const struct arm64_cpu_capabilities *entry, int __unus ID_AA64PFR0_FP_SHIFT) < 0; } +#ifdef CONFIG_ARM64_SKIP_CACHE_POU +static bool has_cache_idc(const struct arm64_cpu_capabilities *entry, + int __unused) +{ + return (read_sanitised_ftr_reg(SYS_CTR_EL0) & (1UL << CTR_IDC_SHIFT)); +} + +static bool has_cache_dic(const struct arm64_cpu_capabilities *entry, + int __unused) +{ + return (read_sanitised_ftr_reg(SYS_CTR_EL0) & (1UL << CTR_DIC_SHIFT)); +} +#endif + #ifdef CONFIG_UNMAP_KERNEL_AT_EL0 static int __kpti_forced; /* 0: not forced, >0: forced on, <0: forced off */ @@ -1100,6 +1114,20 @@ static int cpu_copy_el2regs(void *__unused) .enable = cpu_clear_disr, }, #endif /* CONFIG_ARM64_RAS_EXTN */ +#ifdef CONFIG_ARM64_SKIP_CACHE_POU + { + .desc = "Skip D-Cache maintenance 'DC CVAU' (CTR_EL0.IDC=1)", + .capability = ARM64_HAS_CACHE_IDC, + .def_scope = SCOPE_SYSTEM, + .matches = has_cache_idc, + }, + { + .desc = "Skip I-Cache maintenance 'IC IVAU' (CTR_EL0.DIC=1)", + .capability = ARM64_HAS_CACHE_DIC, + .def_scope = SCOPE_SYSTEM, + .matches = has_cache_dic, + }, +#endif /* CONFIG_ARM64_SKIP_CACHE_POU */ {}, }; diff --git a/arch/arm64/mm/cache.S b/arch/arm64/mm/cache.S index 758bde7..d8d7a32 100644 --- a/arch/arm64/mm/cache.S +++ b/arch/arm64/mm/cache.S @@ -50,6 +50,12 @@ ENTRY(flush_icache_range) */ ENTRY(__flush_cache_user_range) uaccess_ttbr0_enable x2, x3, x4 +#ifdef CONFIG_ARM64_SKIP_CACHE_POU +alternative_if ARM64_HAS_CACHE_IDC + dsb ishst + b 8f +alternative_else_nop_endif +#endif dcache_line_size x2, x3 sub x3, x2, #1 bic x4, x0, x3 @@ -60,6 +66,7 @@ user_alt 9f, "dc cvau, x4", "dc civac, x4", ARM64_WORKAROUND_CLEAN_CACHE b.lo 1b dsb ish +8: invalidate_icache_by_line x0, x1, x2, x3, 9f mov x0, #0 1: @@ -116,6 +123,12 @@ ENDPIPROC(__flush_dcache_area) * - size - size in question */ ENTRY(__clean_dcache_area_pou) +#ifdef CONFIG_ARM64_SKIP_CACHE_POU +alternative_if ARM64_HAS_CACHE_IDC + dsb ishst + ret +alternative_else_nop_endif +#endif dcache_by_line_op cvau, ish, x0, x1, x2, x3 ret ENDPROC(__clean_dcache_area_pou)