Message ID | 20190917134228.5369-2-ionela.voinescu@arm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | arm64: ARMv8.4 Activity Monitors support | expand |
Hi Ionela, On Tue, Sep 17, 2019 at 02:42:25PM +0100, Ionela Voinescu wrote: > +#ifdef CONFIG_ARM64_AMU_EXTN > + > +/* > + * This per cpu variable only signals that the CPU implementation supports the > + * AMU but does not provide information regarding all the events that it > + * supports. > + * When this amu_feat per CPU variable is true, the user of this feature can > + * only rely on the presence of the 4 fixed counters. But this does not > + * guarantee that the counters are enabled or access to these counters is > + * provided by code executed at higher exception levels. > + */ > +DEFINE_PER_CPU(bool, amu_feat) = false; > + > +static void cpu_amu_enable(struct arm64_cpu_capabilities const *cap) > +{ > + if (has_cpuid_feature(cap, SCOPE_LOCAL_CPU)) { > + pr_info("detected CPU%d: Activity Monitors extension\n", > + smp_processor_id()); > + this_cpu_write(amu_feat, true); > + } > +} Sorry if I missed anything as I just skimmed through this series. I can't see the amu_feat used anywhere in these patches, so on its own it doesn't make much sense. I also can't see the advantage of allowing mismatched CPU implementations for this feature. What's the intended use-case? Thanks.
Hi Catalin, On 10/10/2019 18:20, Catalin Marinas wrote: > Hi Ionela, > > On Tue, Sep 17, 2019 at 02:42:25PM +0100, Ionela Voinescu wrote: >> +#ifdef CONFIG_ARM64_AMU_EXTN >> + >> +/* >> + * This per cpu variable only signals that the CPU implementation supports the >> + * AMU but does not provide information regarding all the events that it >> + * supports. >> + * When this amu_feat per CPU variable is true, the user of this feature can >> + * only rely on the presence of the 4 fixed counters. But this does not >> + * guarantee that the counters are enabled or access to these counters is >> + * provided by code executed at higher exception levels. >> + */ >> +DEFINE_PER_CPU(bool, amu_feat) = false; >> + >> +static void cpu_amu_enable(struct arm64_cpu_capabilities const *cap) >> +{ >> + if (has_cpuid_feature(cap, SCOPE_LOCAL_CPU)) { >> + pr_info("detected CPU%d: Activity Monitors extension\n", >> + smp_processor_id()); >> + this_cpu_write(amu_feat, true); >> + } >> +} > > Sorry if I missed anything as I just skimmed through this series. I > can't see the amu_feat used anywhere in these patches, so on its own it > doesn't make much sense. > No worries, you are correct, at the moment the per-cpu amu_feat is not yet used anywhere. But the intention is to use it to discover the feature at CPU level as some CPUs might implement AMU and some might not. I'll soon submit some patches using the counters for the scheduler's frequency invariance - to discover the frequency the CPUs are actually running at in case there is power or thermal mitigation happening outside of the OS. > I also can't see the advantage of allowing mismatched CPU > implementations for this feature. What's the intended use-case? > Just to clarify things, the intention is to allow a mix of CPUs where some of them implement AMU (v8.4 - V1) and some don't. The current implementation does not currently support a mix of CPUs with different implementations of AMU. Although that would be easy to add when we have a new version of AMU. The reason to allow a mix of CPUs with and without support for activity monitor counters is due to the fact that these counters are intended to reflect activity on a CPU, independent of other CPUs. Some of the counters might show common information to all CPUs (for example the constant counter that ticks at the frequency of the system timer), some of information might be common to a subset of CPUs (cycle counter will tick at the same rate for CPUs in the same frequency domain - except when WFI), and some will be fully specific to the CPU (instructions retired and memory stalls). This per-cpu information is useful whether or not all CPUs can provide this information. More practically, it's possible that we'll see big.LITTLE platforms where the big CPUs only will implement activity monitors and for those CPUs it will be useful to get more accurate information on the current frequency, for example, as power and thermal mitigation is more probable to happen in the power domain of the big CPUs. For this usecase and for others, it will be good for generic support to allow detection of the feature at a per-cpu level (this is where the usefulness of the per-cpu amu_feat comes in) while further checks and aggregation will be done when the counters are used, depending on the usecase. Thanks, Ionela. > Thanks. >
On Fri, Oct 11, 2019 at 11:31:40AM +0100, Ionela Voinescu wrote: > On 10/10/2019 18:20, Catalin Marinas wrote: > > On Tue, Sep 17, 2019 at 02:42:25PM +0100, Ionela Voinescu wrote: > >> +#ifdef CONFIG_ARM64_AMU_EXTN > >> + > >> +/* > >> + * This per cpu variable only signals that the CPU implementation supports the > >> + * AMU but does not provide information regarding all the events that it > >> + * supports. > >> + * When this amu_feat per CPU variable is true, the user of this feature can > >> + * only rely on the presence of the 4 fixed counters. But this does not > >> + * guarantee that the counters are enabled or access to these counters is > >> + * provided by code executed at higher exception levels. > >> + */ > >> +DEFINE_PER_CPU(bool, amu_feat) = false; > >> + > >> +static void cpu_amu_enable(struct arm64_cpu_capabilities const *cap) > >> +{ > >> + if (has_cpuid_feature(cap, SCOPE_LOCAL_CPU)) { > >> + pr_info("detected CPU%d: Activity Monitors extension\n", > >> + smp_processor_id()); > >> + this_cpu_write(amu_feat, true); > >> + } > >> +} > > > > Sorry if I missed anything as I just skimmed through this series. I > > can't see the amu_feat used anywhere in these patches, so on its own it > > doesn't make much sense. > > No worries, you are correct, at the moment the per-cpu amu_feat is not > yet used anywhere. But the intention is to use it to discover the > feature at CPU level as some CPUs might implement AMU and some might > not. > > I'll soon submit some patches using the counters for the scheduler's > frequency invariance - to discover the frequency the CPUs are actually > running at in case there is power or thermal mitigation happening > outside of the OS. Thanks for the explanation. I guess I'll wait for the other patches to see how all fits together. In general I'm not keen on per-CPU variables exposed to the rest of the kernel. For example, is it always read in a non-preemptible context? I'd rather have an accessor function with the corresponding WARN_ON_ONCE(preemptible()). This may come with the rest of the patches. > More practically, it's possible that we'll see big.LITTLE platforms > where the big CPUs only will implement activity monitors and for those > CPUs it will be useful to get more accurate information on the current > frequency, for example, as power and thermal mitigation is more > probable to happen in the power domain of the big CPUs. As long as that's a realistic possibility (not just a theoretical one) and the in-kernel code can handle such asymmetry, it's fine by me. This could be another reason to never expose the AMU counters to user-space or guests. You can control preemption in the kernel but can't run user-space in a non-preemptible context.
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 6b6362b83004..3bf7972d43c7 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -1410,6 +1410,33 @@ config ARM64_PTR_AUTH endmenu +menu "ARMv8.4 architectural features" + +config ARM64_AMU_EXTN + bool "Enable support for the Activity Monitors Unit CPU extension" + default y + help + The activity monitors extension is an optional extension introduced + by the ARMv8.4 CPU architecture. This enables support for version 1 + of the activity monitors architecture, AMUv1. + + To enable the use of this extension on CPUs that implement it, say Y. + + Note that for architectural reasons, firmware _must_ implement AMU + support when running on CPUs that present the activity monitors + extension. The required support is present in: + * Version 1.5 and later of the ARM Trusted Firmware + + For kernels that have this configuration enabled but boot with broken + firmware, you may need to say N here until the firmware is fixed. + Otherwise you may experience firmware panics or lockups when + accessing the counter registers. Even if you are not observing these + symptoms, the values returned by the register reads might not + correctly reflect reality. Most commonly, the value read will be 0, + indicating that the counter is not enabled. + +endmenu + config ARM64_SVE bool "ARM Scalable Vector Extension support" default y diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h index f19fe4b9acc4..7f12ad4f69ad 100644 --- a/arch/arm64/include/asm/cpucaps.h +++ b/arch/arm64/include/asm/cpucaps.h @@ -52,7 +52,8 @@ #define ARM64_HAS_IRQ_PRIO_MASKING 42 #define ARM64_HAS_DCPODP 43 #define ARM64_WORKAROUND_1463225 44 +#define ARM64_HAS_AMU_EXTN 45 -#define ARM64_NCAPS 45 +#define ARM64_NCAPS 46 #endif /* __ASM_CPUCAPS_H */ diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h index 972d196c7714..f090a86a376d 100644 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -382,6 +382,42 @@ #define SYS_TPIDR_EL0 sys_reg(3, 3, 13, 0, 2) #define SYS_TPIDRRO_EL0 sys_reg(3, 3, 13, 0, 3) +/* Definitions for system register interface to AMU for ARMv8.4 onwards */ +#define SYS_AM_EL0(crm, op2) sys_reg(3, 3, 13, crm, op2) +#define SYS_AMCR_EL0 SYS_AM_EL0(2, 0) +#define SYS_AMCFGR_EL0 SYS_AM_EL0(2, 1) +#define SYS_AMCGCR_EL0 SYS_AM_EL0(2, 2) +#define SYS_AMUSERENR_EL0 SYS_AM_EL0(2, 3) +#define SYS_AMCNTENCLR0_EL0 SYS_AM_EL0(2, 4) +#define SYS_AMCNTENSET0_EL0 SYS_AM_EL0(2, 5) +#define SYS_AMCNTENCLR1_EL0 SYS_AM_EL0(3, 0) +#define SYS_AMCNTENSET1_EL0 SYS_AM_EL0(3, 1) + +/* + * Group 0 of activity monitors (architected): + * op0 CRn op1 op2 CRm + * Counter: 11 1101 011 n<2:0> 010:n<3> + * Type: 11 1101 011 n<2:0> 011:n<3> + * n: 0-3 + * + * Group 1 of activity monitors (auxiliary): + * op0 CRn op1 op2 CRm + * Counter: 11 1101 011 n<2:0> 110:n<3> + * Type: 11 1101 011 n<2:0> 111:n<3> + * n: 0-15 + */ + +#define SYS_AMEVCNTR0_EL0(n) SYS_AM_EL0(4 + ((n) >> 3), (n) & 0x7) +#define SYS_AMEVTYPE0_EL0(n) SYS_AM_EL0(6 + ((n) >> 3), (n) & 0x7) +#define SYS_AMEVCNTR1_EL0(n) SYS_AM_EL0(12 + ((n) >> 3), (n) & 0x7) +#define SYS_AMEVTYPE1_EL0(n) SYS_AM_EL0(14 + ((n) >> 3), (n) & 0x7) + +/* V1: Fixed (architecturally defined) activity monitors */ +#define SYS_AMEVCNTR0_CORE_EL0 SYS_AMEVCNTR0_EL0(0) +#define SYS_AMEVCNTR0_CONST_EL0 SYS_AMEVCNTR0_EL0(1) +#define SYS_AMEVCNTR0_INST_RET_EL0 SYS_AMEVCNTR0_EL0(2) +#define SYS_AMEVCNTR0_MEM_STALL SYS_AMEVCNTR0_EL0(3) + #define SYS_CNTFRQ_EL0 sys_reg(3, 3, 14, 0, 0) #define SYS_CNTP_TVAL_EL0 sys_reg(3, 3, 14, 2, 0) @@ -577,6 +613,7 @@ #define ID_AA64PFR0_CSV3_SHIFT 60 #define ID_AA64PFR0_CSV2_SHIFT 56 #define ID_AA64PFR0_DIT_SHIFT 48 +#define ID_AA64PFR0_AMU_SHIFT 44 #define ID_AA64PFR0_SVE_SHIFT 32 #define ID_AA64PFR0_RAS_SHIFT 28 #define ID_AA64PFR0_GIC_SHIFT 24 @@ -587,6 +624,7 @@ #define ID_AA64PFR0_EL1_SHIFT 4 #define ID_AA64PFR0_EL0_SHIFT 0 +#define ID_AA64PFR0_AMU 0x1 #define ID_AA64PFR0_SVE 0x1 #define ID_AA64PFR0_RAS_V1 0x1 #define ID_AA64PFR0_FP_NI 0xf @@ -709,6 +747,12 @@ #define ID_AA64MMFR0_TGRAN16_NI 0x0 #define ID_AA64MMFR0_TGRAN16_SUPPORTED 0x1 +#define ID_PFR0_AMU_SHIFT 20 +#define ID_PFR0_STATE3_SHIFT 12 +#define ID_PFR0_STATE2_SHIFT 8 +#define ID_PFR0_STATE1_SHIFT 4 +#define ID_PFR0_STATE0_SHIFT 0 + #if defined(CONFIG_ARM64_4K_PAGES) #define ID_AA64MMFR0_TGRAN_SHIFT ID_AA64MMFR0_TGRAN4_SHIFT #define ID_AA64MMFR0_TGRAN_SUPPORTED ID_AA64MMFR0_TGRAN4_SUPPORTED diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 9323bcc40a58..cce86c6a5d00 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -155,6 +155,7 @@ static const struct arm64_ftr_bits ftr_id_aa64pfr0[] = { ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_CSV3_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_CSV2_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_DIT_SHIFT, 4, 0), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_AMU_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SVE), FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_SVE_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_RAS_SHIFT, 4, 0), @@ -308,10 +309,11 @@ static const struct arm64_ftr_bits ftr_id_mmfr4[] = { }; static const struct arm64_ftr_bits ftr_id_pfr0[] = { - ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 12, 4, 0), /* State3 */ - ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 8, 4, 0), /* State2 */ - ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 4, 4, 0), /* State1 */ - ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 0, 4, 0), /* State0 */ + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_PFR0_AMU_SHIFT, 4, 0), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_PFR0_STATE3_SHIFT, 4, 0), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_PFR0_STATE2_SHIFT, 4, 0), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_PFR0_STATE1_SHIFT, 4, 0), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_PFR0_STATE0_SHIFT, 4, 0), ARM64_FTR_END, }; @@ -1143,6 +1145,49 @@ static bool has_hw_dbm(const struct arm64_cpu_capabilities *cap, #endif +#ifdef CONFIG_ARM64_AMU_EXTN + +/* + * This per cpu variable only signals that the CPU implementation supports the + * AMU but does not provide information regarding all the events that it + * supports. + * When this amu_feat per CPU variable is true, the user of this feature can + * only rely on the presence of the 4 fixed counters. But this does not + * guarantee that the counters are enabled or access to these counters is + * provided by code executed at higher exception levels. + */ +DEFINE_PER_CPU(bool, amu_feat) = false; + +static void cpu_amu_enable(struct arm64_cpu_capabilities const *cap) +{ + if (has_cpuid_feature(cap, SCOPE_LOCAL_CPU)) { + pr_info("detected CPU%d: Activity Monitors extension\n", + smp_processor_id()); + this_cpu_write(amu_feat, true); + } +} + +static bool has_amu(const struct arm64_cpu_capabilities *cap, + int __unused) +{ + /* + * The AMU extension is a non-conflicting feature: the kernel can + * safely run a mix of CPUs with and without support for the + * activity monitors extension. + * Therefore, unconditionally enable the capability to allow + * any late CPU to use the feature. + * + * With this feature unconditionally enabled, the cpu_enable + * function will be called for all CPUs that match the criteria, + * including secondary and hotplugged, marking this feature as + * present on that respective CPU. The enable function will also + * print a detection message. + */ + + return true; +} +#endif + #ifdef CONFIG_ARM64_VHE static bool runs_at_el2(const struct arm64_cpu_capabilities *entry, int __unused) { @@ -1412,6 +1457,24 @@ static const struct arm64_cpu_capabilities arm64_features[] = { .cpu_enable = cpu_clear_disr, }, #endif /* CONFIG_ARM64_RAS_EXTN */ +#ifdef CONFIG_ARM64_AMU_EXTN + { + /* + * The feature is enabled by default if CONFIG_ARM64_AMU_EXTN=y. + * Therefore, don't provide .desc as we don't want the detection + * message to be shown until at least one CPU is detected to + * support the feature. + */ + .capability = ARM64_HAS_AMU_EXTN, + .type = ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE, + .matches = has_amu, + .sys_reg = SYS_ID_AA64PFR0_EL1, + .sign = FTR_UNSIGNED, + .field_pos = ID_AA64PFR0_AMU_SHIFT, + .min_field_value = ID_AA64PFR0_AMU, + .cpu_enable = cpu_amu_enable, + }, +#endif /* CONFIG_ARM64_AMU_EXTN */ { .desc = "Data cache clean to the PoU not required for I/D coherence", .capability = ARM64_HAS_CACHE_IDC,
The activity monitors extension is an optional extension introduced by the ARMv8.4 CPU architecture. This implements basic support for version 1 of the activity monitors architecture, AMUv1. This support includes: - Extension detection on each CPU (boot, secondary, hotplugged) - Register interface for AMU aarch64 registers - (while here) create defines for ID_PFR0_EL1 fields when adding the AMU field information. Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> --- arch/arm64/Kconfig | 27 ++++++++++++ arch/arm64/include/asm/cpucaps.h | 3 +- arch/arm64/include/asm/sysreg.h | 44 ++++++++++++++++++++ arch/arm64/kernel/cpufeature.c | 71 ++++++++++++++++++++++++++++++-- 4 files changed, 140 insertions(+), 5 deletions(-)