| Message ID | 1519311090-19998-1-git-send-email-shankerd@codeaurora.org (mailing list archive) |
|---|---|
| State | New, archived |
On Thu, Feb 22, 2018 at 08:51:30AM -0600, Shanker Donthineni wrote:
> +#define CTR_B31_SHIFT		31

Since this is just a RES1 bit, I think we don't need a mnemonic for it,
but I'll defer to Will and Catalin on that.

> ENTRY(invalidate_icache_range)
> +#ifdef CONFIG_ARM64_SKIP_CACHE_POU
> +alternative_if ARM64_HAS_CACHE_DIC
> +	mov	x0, xzr
> +	dsb	ishst
> +	isb
> +	ret
> +alternative_else_nop_endif
> +#endif

As commented on v3, I don't believe you need the DSB here. If prior
stores haven't been completed at this point, the existing implementation
would not work correctly here.

Otherwise, this looks ok to me.

Thanks,
Mark.
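For reference, with the DSB dropped as Mark suggests, the DIC fast path would reduce to something like the following sketch (illustrative only, not part of the posted patch):

    /* Sketch: invalidate_icache_range() DIC fast path without the DSB */
    alternative_if ARM64_HAS_CACHE_DIC
    	mov	x0, xzr			// return 0 (success)
    	isb				// resynchronize the instruction stream
    	ret
    alternative_else_nop_endif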
[Apologies to keep elbowing in, and if I'm being thick here...]

On 22/02/18 15:22, Mark Rutland wrote:
> On Thu, Feb 22, 2018 at 08:51:30AM -0600, Shanker Donthineni wrote:
>> +#define CTR_B31_SHIFT		31
>
> Since this is just a RES1 bit, I think we don't need a mnemonic for it,
> but I'll defer to Will and Catalin on that.
>
>> ENTRY(invalidate_icache_range)
>> +#ifdef CONFIG_ARM64_SKIP_CACHE_POU
>> +alternative_if ARM64_HAS_CACHE_DIC
>> +	mov	x0, xzr
>> +	dsb	ishst
>> +	isb
>> +	ret
>> +alternative_else_nop_endif
>> +#endif
>
> As commented on v3, I don't believe you need the DSB here. If prior
> stores haven't been completed at this point, the existing implementation
> would not work correctly here.

True in terms of ordering between stores prior to entry and the IC IVAU
itself, but what about the DSB ISH currently issued *after* the IC IVAU
before returning? Is it provably impossible that existing callers might be
relying on that ordering for *anything*, or would we risk losing something
subtle by effectively removing it?

Robin.
On Thu, Feb 22, 2018 at 04:28:03PM +0000, Robin Murphy wrote:
> [Apologies to keep elbowing in, and if I'm being thick here...]
>
> On 22/02/18 15:22, Mark Rutland wrote:
> > On Thu, Feb 22, 2018 at 08:51:30AM -0600, Shanker Donthineni wrote:
> > > +#define CTR_B31_SHIFT		31
> >
> > Since this is just a RES1 bit, I think we don't need a mnemonic for it,
> > but I'll defer to Will and Catalin on that.
> >
> > > ENTRY(invalidate_icache_range)
> > > +#ifdef CONFIG_ARM64_SKIP_CACHE_POU
> > > +alternative_if ARM64_HAS_CACHE_DIC
> > > +	mov	x0, xzr
> > > +	dsb	ishst
> > > +	isb
> > > +	ret
> > > +alternative_else_nop_endif
> > > +#endif
> >
> > As commented on v3, I don't believe you need the DSB here. If prior
> > stores haven't been completed at this point, the existing implementation
> > would not work correctly here.
>
> True in terms of ordering between stores prior to entry and the IC IVAU
> itself, but what about the DSB ISH currently issued *after* the IC IVAU
> before returning? Is it provably impossible that existing callers might be
> relying on that ordering for *anything*, or would we risk losing something
> subtle by effectively removing it?

AFAIK, the only caller of this is KVM, before page table updates occur
to add execute permissions to the page this is applied to.

At least in that case, I do not believe there would be breakage.

If we're worried about subtleties in callers, then we'd need to stick
with DSB ISH rather than optimising to DSB ISHST.

Thanks,
Mark.
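If callers did turn out to depend on the stronger barrier, the conservative variant Mark mentions would keep the full DSB ISH in the fast path rather than relaxing it, roughly as follows (sketch only):

    /* Sketch: conservative DIC fast path keeping the existing DSB ISH semantics */
    alternative_if ARM64_HAS_CACHE_DIC
    	mov	x0, xzr			// return 0 (success)
    	dsb	ish			// keep the full barrier the current code provides
    	isb
    	ret
    alternative_else_nop_endif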
On 22/02/18 16:33, Mark Rutland wrote:
> On Thu, Feb 22, 2018 at 04:28:03PM +0000, Robin Murphy wrote:
>> [Apologies to keep elbowing in, and if I'm being thick here...]
>>
>> On 22/02/18 15:22, Mark Rutland wrote:
>>> On Thu, Feb 22, 2018 at 08:51:30AM -0600, Shanker Donthineni wrote:
>>>> +#define CTR_B31_SHIFT		31
>>>
>>> Since this is just a RES1 bit, I think we don't need a mnemonic for it,
>>> but I'll defer to Will and Catalin on that.
>>>
>>>> ENTRY(invalidate_icache_range)
>>>> +#ifdef CONFIG_ARM64_SKIP_CACHE_POU
>>>> +alternative_if ARM64_HAS_CACHE_DIC
>>>> +	mov	x0, xzr
>>>> +	dsb	ishst
>>>> +	isb
>>>> +	ret
>>>> +alternative_else_nop_endif
>>>> +#endif
>>>
>>> As commented on v3, I don't believe you need the DSB here. If prior
>>> stores haven't been completed at this point, the existing implementation
>>> would not work correctly here.
>>
>> True in terms of ordering between stores prior to entry and the IC IVAU
>> itself, but what about the DSB ISH currently issued *after* the IC IVAU
>> before returning? Is it provably impossible that existing callers might be
>> relying on that ordering for *anything*, or would we risk losing something
>> subtle by effectively removing it?
>
> AFAIK, the only caller of this is KVM, before page table updates occur
> to add execute permissions to the page this is applied to.
>
> At least in that case, I do not believe there would be breakage.
>
> If we're worried about subtleties in callers, then we'd need to stick
> with DSB ISH rather than optimising to DSB ISHST.

Hmm, I probably am just squawking needlessly. It is indeed hard to imagine
how callers could be relying on invalidating the I-cache for ordering
unless they were doing something unreasonably stupid, and if the current
caller is clearly OK then there should be nothing to worry about.

This *has* helped me realise that I was indeed being somewhat thick before,
because the existing barrier is of course not about memory ordering per se,
but about completing the maintenance operation. Hooray for overloaded
semantics...

On a different track, I'm now wondering whether the extra complexity of
these alternatives might justify removing some obvious duplication and
letting __flush_cache_user_range() branch directly into
invalidate_icache_range(), or might that adversely affect the user fault
fixup path?

Robin.
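Robin's deduplication idea would amount to something along these lines (rough sketch only; the interaction with uaccess_ttbr0 enable/disable and the user fault fixup labels is deliberately glossed over and would need reworking):

    /* Sketch: __flush_cache_user_range() tail-branching into invalidate_icache_range() */
    ENTRY(__flush_cache_user_range)
    	/* ... D-side clean to PoU (or the IDC fast path) as before ... */
    	dsb	ish
    	/*
    	 * Reuse the I-side path, including its DIC alternative, instead of
    	 * duplicating it here. The uaccess window and the 9f fault fixup
    	 * would have to be reconciled for this to actually be correct.
    	 */
    	b	invalidate_icache_range
    ENDPROC(__flush_cache_user_range)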
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index f55fe5b..82b8053 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1095,6 +1095,18 @@ config ARM64_RAS_EXTN
 	  and access the new registers if the system supports the extension.
 	  Platform RAS features may additionally depend on firmware support.
 
+config ARM64_SKIP_CACHE_POU
+	bool "Enable support to skip cache POU operations"
+	default y
+	help
+	  Explicit point of unification cache operations can be eliminated
+	  in software if the hardware handles transparently. The new bits in
+	  CTR_EL0, CTR_EL0.DIC and CTR_EL0.IDC indicates the hardware
+	  capabilities of ICache and DCache POU requirements.
+
+	  Selecting this feature will allow the kernel to optimize the POU
+	  cache maintaince operations where it requires 'D{I}C C{I}VAU'
+
 endmenu
 
 config ARM64_SVE
diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
index ea9bb4e..e22178b 100644
--- a/arch/arm64/include/asm/cache.h
+++ b/arch/arm64/include/asm/cache.h
@@ -20,8 +20,13 @@
 
 #define CTR_L1IP_SHIFT		14
 #define CTR_L1IP_MASK		3
+#define CTR_DMLINE_SHIFT	16
+#define CTR_ERG_SHIFT		20
 #define CTR_CWG_SHIFT		24
 #define CTR_CWG_MASK		15
+#define CTR_IDC_SHIFT		28
+#define CTR_DIC_SHIFT		29
+#define CTR_B31_SHIFT		31
 
 #define CTR_L1IP(ctr)		(((ctr) >> CTR_L1IP_SHIFT) & CTR_L1IP_MASK)
 
diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index bb26382..8dd42ae 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -45,7 +45,9 @@
 #define ARM64_HARDEN_BRANCH_PREDICTOR	24
 #define ARM64_HARDEN_BP_POST_GUEST_EXIT	25
 #define ARM64_HAS_RAS_EXTN		26
+#define ARM64_HAS_CACHE_IDC		27
+#define ARM64_HAS_CACHE_DIC		28
 
-#define ARM64_NCAPS			27
+#define ARM64_NCAPS			29
 
 #endif /* __ASM_CPUCAPS_H */
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index ff8a6e9..c0b0db0 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -199,12 +199,12 @@ static int __init register_cpu_hwcaps_dumper(void)
 };
 
 static const struct arm64_ftr_bits ftr_ctr[] = {
-	ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_EXACT, 31, 1, 1),		/* RES1 */
-	ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, 29, 1, 1),	/* DIC */
-	ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, 28, 1, 1),	/* IDC */
-	ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_HIGHER_SAFE, 24, 4, 0),	/* CWG */
-	ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_HIGHER_SAFE, 20, 4, 0),	/* ERG */
-	ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, 16, 4, 1),	/* DminLine */
+	ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_EXACT, CTR_B31_SHIFT, 1, 1),	/* RES1 */
+	ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, CTR_DIC_SHIFT, 1, 1),	/* DIC */
+	ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, CTR_IDC_SHIFT, 1, 1),	/* IDC */
+	ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_HIGHER_SAFE, CTR_CWG_SHIFT, 4, 0),	/* CWG */
+	ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_HIGHER_SAFE, CTR_ERG_SHIFT, 4, 0),	/* ERG */
+	ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, CTR_DMLINE_SHIFT, 4, 1),	/* DminLine */
 	/*
 	 * Linux can handle differing I-cache policies. Userspace JITs will
 	 * make use of *minLine.
@@ -864,6 +864,20 @@ static bool has_no_fpsimd(const struct arm64_cpu_capabilities *entry, int __unus
 					ID_AA64PFR0_FP_SHIFT) < 0;
 }
 
+#ifdef CONFIG_ARM64_SKIP_CACHE_POU
+static bool has_cache_idc(const struct arm64_cpu_capabilities *entry,
+			  int __unused)
+{
+	return (read_sanitised_ftr_reg(SYS_CTR_EL0) & (1UL << CTR_IDC_SHIFT));
+}
+
+static bool has_cache_dic(const struct arm64_cpu_capabilities *entry,
+			  int __unused)
+{
+	return (read_sanitised_ftr_reg(SYS_CTR_EL0) & (1UL << CTR_DIC_SHIFT));
+}
+#endif
+
 #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
 static int __kpti_forced; /* 0: not forced, >0: forced on, <0: forced off */
 
@@ -1100,6 +1114,20 @@ static int cpu_copy_el2regs(void *__unused)
 		.enable = cpu_clear_disr,
 	},
 #endif /* CONFIG_ARM64_RAS_EXTN */
+#ifdef CONFIG_ARM64_SKIP_CACHE_POU
+	{
+		.desc = "Skip D-Cache maintenance 'CVAU' (CTR_EL0.IDC=1)",
+		.capability = ARM64_HAS_CACHE_IDC,
+		.def_scope = SCOPE_SYSTEM,
+		.matches = has_cache_idc,
+	},
+	{
+		.desc = "Skip I-Cache maintenance 'IVAU' (CTR_EL0.DIC=1)",
+		.capability = ARM64_HAS_CACHE_DIC,
+		.def_scope = SCOPE_SYSTEM,
+		.matches = has_cache_dic,
+	},
+#endif /* CONFIG_ARM64_SKIP_CACHE_POU */
 	{},
 };
 
diff --git a/arch/arm64/mm/cache.S b/arch/arm64/mm/cache.S
index 758bde7..ffba5cc 100644
--- a/arch/arm64/mm/cache.S
+++ b/arch/arm64/mm/cache.S
@@ -50,6 +50,12 @@ ENTRY(flush_icache_range)
  */
 ENTRY(__flush_cache_user_range)
 	uaccess_ttbr0_enable x2, x3, x4
+#ifdef CONFIG_ARM64_SKIP_CACHE_POU
+alternative_if ARM64_HAS_CACHE_IDC
+	dsb	ishst
+	b	7f
+alternative_else_nop_endif
+#endif
 	dcache_line_size x2, x3
 	sub	x3, x2, #1
 	bic	x4, x0, x3
@@ -60,8 +66,15 @@ user_alt 9f, "dc cvau, x4",  "dc civac, x4",  ARM64_WORKAROUND_CLEAN_CACHE
 	b.lo	1b
 	dsb	ish
 
+7:
+#ifdef CONFIG_ARM64_SKIP_CACHE_POU
+alternative_if ARM64_HAS_CACHE_DIC
+	isb
+	b	8f
+alternative_else_nop_endif
+#endif
 	invalidate_icache_by_line x0, x1, x2, x3, 9f
-	mov	x0, #0
+8:	mov	x0, #0
 1:
 	uaccess_ttbr0_disable x1, x2
 	ret
@@ -80,6 +93,14 @@ ENDPROC(__flush_cache_user_range)
  *	- end     - virtual end address of region
  */
 ENTRY(invalidate_icache_range)
+#ifdef CONFIG_ARM64_SKIP_CACHE_POU
+alternative_if ARM64_HAS_CACHE_DIC
+	mov	x0, xzr
+	dsb	ishst
+	isb
+	ret
+alternative_else_nop_endif
+#endif
 	uaccess_ttbr0_enable x2, x3, x4
 
 	invalidate_icache_by_line x0, x1, x2, x3, 2f
@@ -116,6 +137,12 @@ ENDPIPROC(__flush_dcache_area)
 *	- size    - size in question
 */
 ENTRY(__clean_dcache_area_pou)
+#ifdef CONFIG_ARM64_SKIP_CACHE_POU
+alternative_if ARM64_HAS_CACHE_IDC
+	dsb	ishst
+	ret
+alternative_else_nop_endif
+#endif
 	dcache_by_line_op cvau, ish, x0, x1, x2, x3
 	ret
 ENDPROC(__clean_dcache_area_pou)
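For completeness, the has_cache_idc()/has_cache_dic() checks above key off bits 28 and 29 of CTR_EL0 via the sanitised system-wide value; a raw per-CPU read (ignoring that machinery, and assuming the CTR_*_SHIFT definitions from the cache.h hunk are visible) would look roughly like:

    /* Sketch: raw CTR_EL0 check for the DIC bit, illustrative only */
    	mrs	x0, ctr_el0
    	tbnz	x0, #CTR_DIC_SHIFT, 1f		// DIC=1: IC IVAU to PoU not required
    	/* ...otherwise fall back to IC IVAU by line... */
    1: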