From patchwork Fri Jul 10 09:44:19 2020
X-Patchwork-Submitter: Zhenyu Ye
X-Patchwork-Id: 11656135
Sender: owner-linux-mm@kvack.org
From: Zhenyu Ye
Subject: [PATCH v2 1/2] arm64: tlb: Detect the ARMv8.4 TLBI RANGE feature
Date: Fri, 10 Jul 2020 17:44:19 +0800
Message-ID: <20200710094420.517-2-yezhenyu2@huawei.com>
In-Reply-To: <20200710094420.517-1-yezhenyu2@huawei.com>
References: <20200710094420.517-1-yezhenyu2@huawei.com>

ARMv8.4-TLBI provides TLBI invalidation instructions that apply to a
range of input addresses. This patch detects this feature.

Signed-off-by: Zhenyu Ye
---
 arch/arm64/include/asm/cpucaps.h |  3 ++-
 arch/arm64/include/asm/sysreg.h  |  3 +++
 arch/arm64/kernel/cpufeature.c   | 10 ++++++++++
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index d44ba903d11d..8fe4aa1d372b 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -63,7 +63,8 @@
 #define ARM64_HAS_32BIT_EL1			53
 #define ARM64_BTI				54
 #define ARM64_HAS_ARMv8_4_TTL			55
+#define ARM64_HAS_TLBI_RANGE			56
 
-#define ARM64_NCAPS				56
+#define ARM64_NCAPS				57
 
 #endif /* __ASM_CPUCAPS_H */
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 8c209aa17273..a5f24a26d86a 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -617,6 +617,9 @@
 #define ID_AA64ISAR0_SHA1_SHIFT		8
 #define ID_AA64ISAR0_AES_SHIFT		4
 
+#define ID_AA64ISAR0_TLBI_RANGE_NI	0x0
+#define ID_AA64ISAR0_TLBI_RANGE		0x2
+
 /* id_aa64isar1 */
 #define ID_AA64ISAR1_I8MM_SHIFT		52
 #define ID_AA64ISAR1_DGH_SHIFT		48
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index e877f56ff1ab..ba0f0ce06fee 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2067,6 +2067,16 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 		.sign = FTR_UNSIGNED,
 	},
 #endif
+	{
+		.desc = "TLB range maintenance instruction",
+		.capability = ARM64_HAS_TLBI_RANGE,
+		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
+		.matches = has_cpuid_feature,
+		.sys_reg = SYS_ID_AA64ISAR0_EL1,
+		.field_pos = ID_AA64ISAR0_TLB_SHIFT,
+		.sign = FTR_UNSIGNED,
+		.min_field_value = ID_AA64ISAR0_TLBI_RANGE,
+	},
 	{},
 };
 

From patchwork Fri Jul 10 09:44:20 2020
X-Patchwork-Submitter: Zhenyu Ye
X-Patchwork-Id: 11656139
Sender: owner-linux-mm@kvack.org
From: Zhenyu Ye
Subject: [PATCH v2 2/2] arm64: tlb: Use the TLBI RANGE feature in arm64
Date: Fri, 10 Jul 2020 17:44:20 +0800
Message-ID: <20200710094420.517-3-yezhenyu2@huawei.com>
In-Reply-To: <20200710094420.517-1-yezhenyu2@huawei.com>
References: <20200710094420.517-1-yezhenyu2@huawei.com>

Add the __TLBI_VADDR_RANGE macro and rewrite __flush_tlb_range().

When the CPU supports the TLBI RANGE feature, the minimum range
granularity is decided by 'scale', so in some cases we cannot flush all
pages with one instruction.

For example, when pages = 0xe81a, start 'scale' from the maximum and
find the right 'num' for each 'scale':

1. scale = 3: we can flush no pages, because the minimum range is
   2^(5*3 + 1) = 0x10000.

2. scale = 2: the minimum range is 2^(5*2 + 1) = 0x800, so we can flush
   0xe800 pages this time, with num = 0xe800/0x800 - 1 = 0x1c.
   The remaining pages are 0x1a.

3. scale = 1: the minimum range is 2^(5*1 + 1) = 0x40, so no pages can
   be flushed.

4. scale = 0: we flush the remaining 0x1a pages, with
   num = 0x1a/0x2 - 1 = 0xc.

However, in most scenarios pages = 1 when flush_tlb_range() is called.
Starting from scale = 3, or from another suitable value (such as
scale = ilog2(pages)), would incur extra overhead. So 'scale' is
increased from 0 to the maximum, and the flush order is exactly the
opposite of the example.

Signed-off-by: Zhenyu Ye
---
 arch/arm64/include/asm/tlbflush.h | 138 +++++++++++++++++++++++-------
 1 file changed, 109 insertions(+), 29 deletions(-)

diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index 39aed2efd21b..edfec8139ef8 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -60,6 +60,31 @@
 		__ta;						\
 	})
 
+/*
+ * Get translation granule of the system, which is decided by
+ * PAGE_SIZE.  Used by TTL.
+ *  - 4KB	: 1
+ *  - 16KB	: 2
+ *  - 64KB	: 3
+ */
+#define TLBI_TTL_TG_4K		1
+#define TLBI_TTL_TG_16K		2
+#define TLBI_TTL_TG_64K		3
+
+static inline unsigned long get_trans_granule(void)
+{
+	switch (PAGE_SIZE) {
+	case SZ_4K:
+		return TLBI_TTL_TG_4K;
+	case SZ_16K:
+		return TLBI_TTL_TG_16K;
+	case SZ_64K:
+		return TLBI_TTL_TG_64K;
+	default:
+		return 0;
+	}
+}
+
 /*
  * Level-based TLBI operations.
  *
@@ -73,9 +98,6 @@
  * in asm/stage2_pgtable.h.
  */
 #define TLBI_TTL_MASK		GENMASK_ULL(47, 44)
-#define TLBI_TTL_TG_4K		1
-#define TLBI_TTL_TG_16K		2
-#define TLBI_TTL_TG_64K		3
 
 #define __tlbi_level(op, addr, level) do {				\
 	u64 arg = addr;							\
@@ -83,19 +105,7 @@
 	if (cpus_have_const_cap(ARM64_HAS_ARMv8_4_TTL) &&		\
 	    level) {							\
 		u64 ttl = level & 3;					\
-									\
-		switch (PAGE_SIZE) {					\
-		case SZ_4K:						\
-			ttl |= TLBI_TTL_TG_4K << 2;			\
-			break;						\
-		case SZ_16K:						\
-			ttl |= TLBI_TTL_TG_16K << 2;			\
-			break;						\
-		case SZ_64K:						\
-			ttl |= TLBI_TTL_TG_64K << 2;			\
-			break;						\
-		}							\
-									\
+		ttl |= get_trans_granule() << 2;			\
 		arg &= ~TLBI_TTL_MASK;					\
 		arg |= FIELD_PREP(TLBI_TTL_MASK, ttl);			\
 	}								\
@@ -108,6 +118,39 @@
 	__tlbi_level(op, (arg | USER_ASID_FLAG), level);		\
 } while (0)
 
+/*
+ * This macro creates a properly formatted VA operand for the TLBI RANGE.
+ * The value bit assignments are:
+ *
+ *    +----------+------+-------+-------+-------+----------------------+
+ *    |   ASID   |  TG  | SCALE |  NUM  |  TTL  |        BADDR         |
+ *    +----------+------+-------+-------+-------+----------------------+
+ *    |63      48|47  46|45   44|43   39|38   37|36                   0|
+ *
+ * The address range is determined by below formula:
+ * [BADDR, BADDR + (NUM + 1) * 2^(5*SCALE + 1) * PAGESIZE)
+ *
+ */
+#define __TLBI_VADDR_RANGE(addr, asid, scale, num, ttl)		\
+	({							\
+		unsigned long __ta = (addr) >> PAGE_SHIFT;	\
+		__ta &= GENMASK_ULL(36, 0);			\
+		__ta |= (unsigned long)(ttl & 3) << 37;		\
+		__ta |= (unsigned long)(num & 31) << 39;	\
+		__ta |= (unsigned long)(scale & 3) << 44;	\
+		__ta |= (get_trans_granule() & 3) << 46;	\
+		__ta |= (unsigned long)(asid) << 48;		\
+		__ta;						\
+	})
+
+/* These macros are used by the TLBI RANGE feature. */
+#define __TLBI_RANGE_PAGES(num, scale)	(((num) + 1) << (5 * (scale) + 1))
+#define MAX_TLBI_RANGE_PAGES		__TLBI_RANGE_PAGES(31, 3)
+
+#define TLBI_RANGE_MASK			GENMASK_ULL(4, 0)
+#define __TLBI_RANGE_NUM(range, scale)	\
+	(((range) >> (5 * (scale) + 1)) & TLBI_RANGE_MASK)
+
 /*
  *	TLB Invalidation
  *	================
@@ -232,32 +275,69 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
 				     unsigned long stride, bool last_level,
 				     int tlb_level)
 {
+	int num = 0;
+	int scale = 0;
 	unsigned long asid = ASID(vma->vm_mm);
 	unsigned long addr;
+	unsigned long pages;
 
 	start = round_down(start, stride);
 	end = round_up(end, stride);
+	pages = (end - start) >> PAGE_SHIFT;
 
-	if ((end - start) >= (MAX_TLBI_OPS * stride)) {
+	if ((!cpus_have_const_cap(ARM64_HAS_TLBI_RANGE) &&
+	    (end - start) >= (MAX_TLBI_OPS * stride)) ||
+	    pages >= MAX_TLBI_RANGE_PAGES) {
 		flush_tlb_mm(vma->vm_mm);
 		return;
 	}
 
-	/* Convert the stride into units of 4k */
-	stride >>= 12;
+	dsb(ishst);
 
-	start = __TLBI_VADDR(start, asid);
-	end = __TLBI_VADDR(end, asid);
+	/*
+	 * When cpu does not support TLBI RANGE feature, we flush the tlb
+	 * entries one by one at the granularity of 'stride'.
+	 * When cpu supports the TLBI RANGE feature, then:
+	 * 1. If pages is odd, flush the first page through non-RANGE
+	 *    instruction;
+	 * 2. For remaining pages: The minimum range granularity is decided
+	 *    by 'scale', so we can not flush all pages by one instruction
+	 *    in some cases.
+	 *    Here, we start from scale = 0, flush corresponding pages
+	 *    (from 2^(5*scale + 1) to 2^(5*(scale + 1) + 1)), and increase
+	 *    it until no pages left.
+	 */
+	while (pages > 0) {
+		if (!cpus_have_const_cap(ARM64_HAS_TLBI_RANGE) ||
+		    pages % 2 == 1) {
+			addr = __TLBI_VADDR(start, asid);
+			if (last_level) {
+				__tlbi_level(vale1is, addr, tlb_level);
+				__tlbi_user_level(vale1is, addr, tlb_level);
+			} else {
+				__tlbi_level(vae1is, addr, tlb_level);
+				__tlbi_user_level(vae1is, addr, tlb_level);
+			}
+			start += stride;
+			pages -= stride >> PAGE_SHIFT;
+			continue;
+		}
 
-	dsb(ishst);
-	for (addr = start; addr < end; addr += stride) {
-		if (last_level) {
-			__tlbi_level(vale1is, addr, tlb_level);
-			__tlbi_user_level(vale1is, addr, tlb_level);
-		} else {
-			__tlbi_level(vae1is, addr, tlb_level);
-			__tlbi_user_level(vae1is, addr, tlb_level);
+		num = __TLBI_RANGE_NUM(pages, scale) - 1;
+		if (num >= 0) {
+			addr = __TLBI_VADDR_RANGE(start, asid, scale,
+						  num, tlb_level);
+			if (last_level) {
+				__tlbi(rvale1is, addr);
+				__tlbi_user(rvale1is, addr);
+			} else {
+				__tlbi(rvae1is, addr);
+				__tlbi_user(rvae1is, addr);
+			}
+			start += __TLBI_RANGE_PAGES(num, scale) << PAGE_SHIFT;
+			pages -= __TLBI_RANGE_PAGES(num, scale);
 		}
+		scale++;
 	}
 	dsb(ish);
 }
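
Illustration only, not part of the patch: the ascending-'scale' loop in
__flush_tlb_range() can be modelled in userspace to check how a page
count is split into (scale, num) pairs. This is a minimal sketch under
stated assumptions: the stride is taken to be one page, the caller has
already ensured pages < MAX_TLBI_RANGE_PAGES, and the names
range_pages(), range_num(), struct flush_op and plan_flush() are
hypothetical helpers that do not exist in the kernel. No TLBI
instruction is issued; the planned operations are only recorded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pages covered by one range operation: (num + 1) * 2^(5*scale + 1). */
static uint64_t range_pages(unsigned int num, unsigned int scale)
{
	assert(num <= 31 && scale <= 3);
	return (uint64_t)(num + 1) << (5 * scale + 1);
}

/*
 * The 5-bit slice of 'pages' that belongs to 'scale', minus one.
 * A result of -1 means there is nothing to flush at this scale.
 */
static int range_num(uint64_t pages, unsigned int scale)
{
	return (int)((pages >> (5 * scale + 1)) & 0x1f) - 1;
}

/* Hypothetical record of one planned flush operation. */
struct flush_op {
	int is_range;		/* 0: single-page op, 1: range op */
	unsigned int scale;
	unsigned int num;
	uint64_t pages;		/* pages covered by this operation */
};

/*
 * Hypothetical model of the loop: walk 'scale' upwards from 0 and
 * record each operation instead of issuing it. Assumes a stride of
 * one page. Returns the number of operations written to 'ops'.
 */
static size_t plan_flush(uint64_t pages, struct flush_op *ops, size_t max_ops)
{
	size_t n = 0;
	unsigned int scale = 0;

	/* The patch falls back to flush_tlb_mm() at or above this limit. */
	assert(pages < range_pages(31, 3));

	while (pages > 0 && n < max_ops) {
		if (pages % 2 == 1) {
			/* Odd count: one page goes through a non-range op. */
			ops[n++] = (struct flush_op){ .is_range = 0, .pages = 1 };
			pages -= 1;
			continue;
		}

		int num = range_num(pages, scale);
		if (num >= 0) {
			uint64_t covered = range_pages((unsigned int)num, scale);

			ops[n++] = (struct flush_op){
				.is_range = 1,
				.scale = scale,
				.num = (unsigned int)num,
				.pages = covered,
			};
			pages -= covered;
		}
		scale++;
	}
	return n;
}
```

For pages = 0xe81a this model plans two range operations, in the
opposite order to the commit-message example: first scale = 0 with
num = 0xc (0x1a pages), then scale = 2 with num = 0x1c (0xe800 pages).
Scale 1 is skipped because its 5-bit slice of the page count is zero.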