Message ID | 20210613214728.1695340-1-yuzhao@google.com
---|---
State | New, archived
Series | arm64: mm: fix warning in arch_faults_on_old_pte()
On Sun, Jun 13, 2021 at 03:47:28PM -0600, Yu Zhao wrote:
> cow_user_page() doesn't disable preemption, and it triggers the
> warning in arch_faults_on_old_pte() when CONFIG_PREEMPT_COUNT=y.

I'm surprised we haven't seen this warning until now. We probably
haven't exercised this path with the right config options.

> Convert the Access flag support to a system-wide feature to avoid
> reading ID_AA64MMFR1_EL1 on local CPUs when determining the h/w cap.
>
> Note that though the Access flag support is a non-conflicting feature,
> we require all late CPUs to have it if the boot CPU does. Otherwise
> the feature won't be enabled regardless of the capabilities of late
> CPUs.
>
> If there are h/w implementations that break this rule, they will have
> to add errata, unless they can provide justifications to switch to the
> less strict ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE.
>
> Signed-off-by: Yu Zhao <yuzhao@google.com>

I'm ok in principle but one small issue below.

> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index efed2830d141..afdb6e0336ed 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -1566,6 +1566,14 @@ static bool has_hw_dbm(const struct arm64_cpu_capabilities *cap,
>  	return true;
>  }
>
> +static void cpu_enable_hw_af(struct arm64_cpu_capabilities const *cap)
> +{
> +	if (has_cpuid_feature(cap, SCOPE_LOCAL_CPU)) {
> +		u64 val = read_sysreg(tcr_el1);
> +
> +		write_sysreg(val | TCR_HA, tcr_el1);
> +	}
> +}

This needs an isb(); local_flush_tlb_all() since this bit may be cached
in the TLB. See how we did it in __cpu_enable_hw_dbm().

Alternatively you could leave the current TCR_HA bit setting as is (in
proc.S) and only add an arm64_features[] entry for AF together with the
arch_faults_on_old_pte() change but without any enable function.

I'm not sure we have mixed CPUs where only some of them support AF to
benefit from this.
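For reference, a minimal sketch of the enable hook with the suggested
isb()/local_flush_tlb_all() folded in, modelled on the existing
__cpu_enable_hw_dbm(); this illustrates the review comment and is not
code posted in the thread:

static void cpu_enable_hw_af(struct arm64_cpu_capabilities const *cap)
{
	if (has_cpuid_feature(cap, SCOPE_LOCAL_CPU)) {
		u64 val = read_sysreg(tcr_el1);

		write_sysreg(val | TCR_HA, tcr_el1);
		isb();			/* complete the TCR_EL1 write */
		local_flush_tlb_all();	/* TCR_EL1.HA may be cached in the TLB */
	}
}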
On Mon, Jun 14, 2021 at 11:07 AM Catalin Marinas
<catalin.marinas@arm.com> wrote:
>
> On Sun, Jun 13, 2021 at 03:47:28PM -0600, Yu Zhao wrote:
> > cow_user_page() doesn't disable preemption, and it triggers the
> > warning in arch_faults_on_old_pte() when CONFIG_PREEMPT_COUNT=y.
>
> I'm surprised we haven't seen this warning until now. We probably
> haven't exercised this path with the right config options.

Yeah, we only started hitting it recently -- it requires PFN-mapped
PTEs, e.g., DAX, in addition to CONFIG_PREEMPT_COUNT=y. I'll include
this information in the next version.

> > Convert the Access flag support to a system-wide feature to avoid
> > reading ID_AA64MMFR1_EL1 on local CPUs when determining the h/w cap.
> >
> > Note that though the Access flag support is a non-conflicting feature,
> > we require all late CPUs to have it if the boot CPU does. Otherwise
> > the feature won't be enabled regardless of the capabilities of late
> > CPUs.
> >
> > If there are h/w implementations that break this rule, they will have
> > to add errata, unless they can provide justifications to switch to the
> > less strict ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE.
> >
> > Signed-off-by: Yu Zhao <yuzhao@google.com>
>
> I'm ok in principle but one small issue below.
>
> > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> > index efed2830d141..afdb6e0336ed 100644
> > --- a/arch/arm64/kernel/cpufeature.c
> > +++ b/arch/arm64/kernel/cpufeature.c
> > @@ -1566,6 +1566,14 @@ static bool has_hw_dbm(const struct arm64_cpu_capabilities *cap,
> >  	return true;
> >  }
> >
> > +static void cpu_enable_hw_af(struct arm64_cpu_capabilities const *cap)
> > +{
> > +	if (has_cpuid_feature(cap, SCOPE_LOCAL_CPU)) {
> > +		u64 val = read_sysreg(tcr_el1);
> > +
> > +		write_sysreg(val | TCR_HA, tcr_el1);
> > +	}
> > +}
>
> This needs an isb(); local_flush_tlb_all() since this bit may be cached
> in the TLB. See how we did it in __cpu_enable_hw_dbm().

Thanks. I'll add it. (I omitted it since I saw it as a best effort,
not correctness related.)

> Alternatively you could leave the current TCR_HA bit setting as is (in
> proc.S) and only add an arm64_features[] entry for AF together with the
> arch_faults_on_old_pte() change but without any enable function.
>
> I'm not sure we have mixed CPUs where only some of them support AF to
> benefit from this.

Can we mix CPU versions? For example, can we have A75 (v8.2 with
AFDBM) + A53 (v8 w/o AFDBM) configuration? My understanding is that we
can't. But I have not been able to find any confirmation on arm.com.
On Mon, Jun 14, 2021 at 12:35:06PM -0600, Yu Zhao wrote:
> On Mon, Jun 14, 2021 at 11:07 AM Catalin Marinas
> <catalin.marinas@arm.com> wrote:
> > On Sun, Jun 13, 2021 at 03:47:28PM -0600, Yu Zhao wrote:
> > > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> > > index efed2830d141..afdb6e0336ed 100644
> > > --- a/arch/arm64/kernel/cpufeature.c
> > > +++ b/arch/arm64/kernel/cpufeature.c
> > > @@ -1566,6 +1566,14 @@ static bool has_hw_dbm(const struct arm64_cpu_capabilities *cap,
> > >  	return true;
> > >  }
> > >
> > > +static void cpu_enable_hw_af(struct arm64_cpu_capabilities const *cap)
> > > +{
> > > +	if (has_cpuid_feature(cap, SCOPE_LOCAL_CPU)) {
> > > +		u64 val = read_sysreg(tcr_el1);
> > > +
> > > +		write_sysreg(val | TCR_HA, tcr_el1);
> > > +	}
> > > +}
> >
> > This needs an isb(); local_flush_tlb_all() since this bit may be cached
> > in the TLB. See how we did it in __cpu_enable_hw_dbm().
>
> Thanks. I'll add it. (I omitted it since I saw it as a best effort,
> not correctness related.)

The TLBs would eventually be flushed but a local TLBI during boot
doesn't hurt performance, so I'd rather have it.

> > Alternatively you could leave the current TCR_HA bit setting as is (in
> > proc.S) and only add an arm64_features[] entry for AF together with the
> > arch_faults_on_old_pte() change but without any enable function.
> >
> > I'm not sure we have mixed CPUs where only some of them support AF to
> > benefit from this.
>
> Can we mix CPU versions? For example, can we have A75 (v8.2 with
> AFDBM) + A53 (v8 w/o AFDBM) configuration? My understanding is that we
> can't. But I have not been able to find any confirmation on arm.com.

Oh, don't underestimate the hardware vendors ;). But if they do, with
this patch they just lose the AF benefits which I don't really mind
(together with other 8.2 features which are not present in 8.0).
On 13/06/2021 22:47, Yu Zhao wrote:
> cow_user_page() doesn't disable preemption, and it triggers the
> warning in arch_faults_on_old_pte() when CONFIG_PREEMPT_COUNT=y.
>
> Convert the Access flag support to a system-wide feature to avoid
> reading ID_AA64MMFR1_EL1 on local CPUs when determining the h/w cap.
>
> Note that though the Access flag support is a non-conflicting feature,
> we require all late CPUs to have it if the boot CPU does. Otherwise
> the feature won't be enabled regardless of the capabilities of late
> CPUs.
>
> If there are h/w implementations that break this rule, they will have
> to add errata, unless they can provide justifications to switch to the
> less strict ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE.
>
> Signed-off-by: Yu Zhao <yuzhao@google.com>
> ---
>  arch/arm64/include/asm/cpufeature.h | 20 +++++++-------------
>  arch/arm64/include/asm/pgtable.h    |  4 +---
>  arch/arm64/kernel/cpufeature.c      | 19 +++++++++++++++++++
>  arch/arm64/mm/proc.S                | 12 ------------
>  arch/arm64/tools/cpucaps            |  1 +
>  5 files changed, 28 insertions(+), 28 deletions(-)
>
> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> index 338840c00e8e..c4336a374920 100644
> --- a/arch/arm64/include/asm/cpufeature.h
> +++ b/arch/arm64/include/asm/cpufeature.h
> @@ -763,6 +763,13 @@ static inline bool system_supports_tlb_range(void)
>  		cpus_have_const_cap(ARM64_HAS_TLB_RANGE);
>  }
>
> +/* Check whether hardware update of the Access flag is supported. */
> +static inline bool system_has_hw_af(void)
> +{
> +	return IS_ENABLED(CONFIG_ARM64_HW_AFDBM) &&
> +	       cpus_have_const_cap(ARM64_HW_AF);
> +}
> +
>  extern int do_emulate_mrs(struct pt_regs *regs, u32 sys_reg, u32 rt);
>
>  static inline u32 id_aa64mmfr0_parange_to_phys_shift(int parange)
> @@ -786,19 +793,6 @@ static inline u32 id_aa64mmfr0_parange_to_phys_shift(int parange)
>  	}
>  }
>
> -/* Check whether hardware update of the Access flag is supported */
> -static inline bool cpu_has_hw_af(void)
> -{
> -	u64 mmfr1;
> -
> -	if (!IS_ENABLED(CONFIG_ARM64_HW_AFDBM))
> -		return false;
> -
> -	mmfr1 = read_cpuid(ID_AA64MMFR1_EL1);
> -	return cpuid_feature_extract_unsigned_field(mmfr1,
> -						ID_AA64MMFR1_HADBS_SHIFT);
> -}
> -
>  static inline bool cpu_has_pan(void)
>  {
>  	u64 mmfr1 = read_cpuid(ID_AA64MMFR1_EL1);
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 0b10204e72fc..864a2fdeb559 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -982,9 +982,7 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
>   */
>  static inline bool arch_faults_on_old_pte(void)
>  {
> -	WARN_ON(preemptible());
> -
> -	return !cpu_has_hw_af();
> +	return !system_has_hw_af();
>  }
>  #define arch_faults_on_old_pte	arch_faults_on_old_pte
>
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index efed2830d141..afdb6e0336ed 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -1566,6 +1566,14 @@ static bool has_hw_dbm(const struct arm64_cpu_capabilities *cap,
> >  	return true;
>  }
>
> +static void cpu_enable_hw_af(struct arm64_cpu_capabilities const *cap)
> +{
> +	if (has_cpuid_feature(cap, SCOPE_LOCAL_CPU)) {

You don't need this explicit check here. Since the cap is already
ARM64_CPUCAP_SYSTEM_FEATURE, it is guaranteed that all CPUs have the
capability, otherwise this wouldn't get called at all for any CPUs.

Suzuki
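Folding in Suzuki's point, the enable hook would reduce to something
like the sketch below (illustrative only; it also keeps the
isb()/TLBI that Catalin asked for earlier):

static void cpu_enable_hw_af(struct arm64_cpu_capabilities const *cap)
{
	/*
	 * No has_cpuid_feature() check needed: for an
	 * ARM64_CPUCAP_SYSTEM_FEATURE capability this hook is only
	 * called once every CPU has been verified to have the feature.
	 */
	write_sysreg(read_sysreg(tcr_el1) | TCR_HA, tcr_el1);
	isb();
	local_flush_tlb_all();
}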
On Tue, Jun 15, 2021 at 11:35:04AM +0100, Catalin Marinas wrote:
> On Mon, Jun 14, 2021 at 12:35:06PM -0600, Yu Zhao wrote:
> > On Mon, Jun 14, 2021 at 11:07 AM Catalin Marinas
> > <catalin.marinas@arm.com> wrote:
> > > On Sun, Jun 13, 2021 at 03:47:28PM -0600, Yu Zhao wrote:
> > > > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> > > > index efed2830d141..afdb6e0336ed 100644
> > > > --- a/arch/arm64/kernel/cpufeature.c
> > > > +++ b/arch/arm64/kernel/cpufeature.c
> > > > @@ -1566,6 +1566,14 @@ static bool has_hw_dbm(const struct arm64_cpu_capabilities *cap,
> > > >  	return true;
> > > >  }
> > > >
> > > > +static void cpu_enable_hw_af(struct arm64_cpu_capabilities const *cap)
> > > > +{
> > > > +	if (has_cpuid_feature(cap, SCOPE_LOCAL_CPU)) {
> > > > +		u64 val = read_sysreg(tcr_el1);
> > > > +
> > > > +		write_sysreg(val | TCR_HA, tcr_el1);
> > > > +	}
> > > > +}
> > >
> > > This needs an isb(); local_flush_tlb_all() since this bit may be cached
> > > in the TLB. See how we did it in __cpu_enable_hw_dbm().
> >
> > Thanks. I'll add it. (I omitted it since I saw it as a best effort,
> > not correctness related.)
>
> The TLBs would eventually be flushed but a local TLBI during boot
> doesn't hurt performance, so I'd rather have it.
>
> > > Alternatively you could leave the current TCR_HA bit setting as is (in
> > > proc.S) and only add an arm64_features[] entry for AF together with the
> > > arch_faults_on_old_pte() change but without any enable function.
> > >
> > > I'm not sure we have mixed CPUs where only some of them support AF to
> > > benefit from this.
> >
> > Can we mix CPU versions? For example, can we have A75 (v8.2 with
> > AFDBM) + A53 (v8 w/o AFDBM) configuration? My understanding is that we
> > can't. But I have not been able to find any confirmation on arm.com.
>
> Oh, don't underestimate the hardware vendors ;). But if they do, with
> this patch they just lose the AF benefits which I don't really mind
> (together with other 8.2 features which are not present in 8.0).

This feels like walking on thin ice to me... it's only a matter of time
before a big/little system is produced where AF is busted on one CPU
type, and then we'll have to undo all of this as the system cap will
break late CPU onlining if you boot on a CPU with working AF.

Given that arch_faults_on_old_pte() is purely a performance hint, why
don't we just set it based on the capabilities of the boot CPU instead?

Will
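A hypothetical sketch of the direction Will is pointing at: make the
check preempt-safe by reading the kernel's sanitised copy of
ID_AA64MMFR1_EL1 (seeded from the boot CPU) instead of issuing an MRS
on whichever CPU happens to be running, with no new cpucap at all.
Note that read_sanitised_ftr_reg() strictly returns the system-wide
sanitised value rather than the boot CPU's raw one:

/* Preempt-safe: no local-CPU register read, so no WARN_ON() needed. */
static inline bool cpu_has_hw_af(void)
{
	u64 mmfr1;

	if (!IS_ENABLED(CONFIG_ARM64_HW_AFDBM))
		return false;

	mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
	return cpuid_feature_extract_unsigned_field(mmfr1,
						ID_AA64MMFR1_HADBS_SHIFT);
}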
On Sun, Jun 13, 2021 at 03:47:28PM -0600, Yu Zhao wrote:
> cow_user_page() doesn't disable preemption, and it triggers the
> warning in arch_faults_on_old_pte() when CONFIG_PREEMPT_COUNT=y.

Doesn't kmap_atomic() disable preemption? How can I reproduce the
warning?

Will
On Tue, Jun 15, 2021 at 11:52:30AM +0100, Will Deacon wrote:
> On Sun, Jun 13, 2021 at 03:47:28PM -0600, Yu Zhao wrote:
> > cow_user_page() doesn't disable preemption, and it triggers the
> > warning in arch_faults_on_old_pte() when CONFIG_PREEMPT_COUNT=y.
>
> Doesn't kmap_atomic() disable preemption?

Good point, I missed that. I wonder whether there are additional patches
on top that replace it with kmap_local_page(). I couldn't see any in
-next.
On Tue, Jun 15, 2021 at 7:47 AM Catalin Marinas
<catalin.marinas@arm.com> wrote:
>
> On Tue, Jun 15, 2021 at 11:52:30AM +0100, Will Deacon wrote:
> > On Sun, Jun 13, 2021 at 03:47:28PM -0600, Yu Zhao wrote:
> > > cow_user_page() doesn't disable preemption, and it triggers the
> > > warning in arch_faults_on_old_pte() when CONFIG_PREEMPT_COUNT=y.
> >
> > Doesn't kmap_atomic() disable preemption?
>
> Good point, I missed that. I wonder whether there are additional patches
> on top that replace it with kmap_local_page(). I couldn't see any in
> -next.

Duh, I have been using a bad backport which has arch_faults_on_old_pte()
called before kmap_atomic(). Sorry for the false alarm!
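For context, this is the mainline ordering in cow_user_page() that
makes the WARN_ON() safe: kmap_atomic() disables preemption (and page
faults) before arch_faults_on_old_pte() is reached. An abridged sketch
based on mm/memory.c of that era, not a complete function:

	/*
	 * Abridged from mainline cow_user_page(): a PFN-mapped (e.g. DAX)
	 * source page has no struct page, so copy from the user address.
	 */
	kaddr = kmap_atomic(dst);	/* preempt_disable() + pagefault_disable() */
	uaddr = (void __user *)(addr & PAGE_MASK);

	/*
	 * Preemption is already disabled here, so the
	 * WARN_ON(preemptible()) in arch_faults_on_old_pte() cannot fire.
	 */
	if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
		/* ... mark the PTE young to avoid a second fault ... */
	}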
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 338840c00e8e..c4336a374920 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -763,6 +763,13 @@ static inline bool system_supports_tlb_range(void)
 		cpus_have_const_cap(ARM64_HAS_TLB_RANGE);
 }
 
+/* Check whether hardware update of the Access flag is supported. */
+static inline bool system_has_hw_af(void)
+{
+	return IS_ENABLED(CONFIG_ARM64_HW_AFDBM) &&
+	       cpus_have_const_cap(ARM64_HW_AF);
+}
+
 extern int do_emulate_mrs(struct pt_regs *regs, u32 sys_reg, u32 rt);
 
 static inline u32 id_aa64mmfr0_parange_to_phys_shift(int parange)
@@ -786,19 +793,6 @@ static inline u32 id_aa64mmfr0_parange_to_phys_shift(int parange)
 	}
 }
 
-/* Check whether hardware update of the Access flag is supported */
-static inline bool cpu_has_hw_af(void)
-{
-	u64 mmfr1;
-
-	if (!IS_ENABLED(CONFIG_ARM64_HW_AFDBM))
-		return false;
-
-	mmfr1 = read_cpuid(ID_AA64MMFR1_EL1);
-	return cpuid_feature_extract_unsigned_field(mmfr1,
-						ID_AA64MMFR1_HADBS_SHIFT);
-}
-
 static inline bool cpu_has_pan(void)
 {
 	u64 mmfr1 = read_cpuid(ID_AA64MMFR1_EL1);
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 0b10204e72fc..864a2fdeb559 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -982,9 +982,7 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
  */
 static inline bool arch_faults_on_old_pte(void)
 {
-	WARN_ON(preemptible());
-
-	return !cpu_has_hw_af();
+	return !system_has_hw_af();
 }
 #define arch_faults_on_old_pte	arch_faults_on_old_pte
 
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index efed2830d141..afdb6e0336ed 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1566,6 +1566,14 @@ static bool has_hw_dbm(const struct arm64_cpu_capabilities *cap,
 	return true;
 }
 
+static void cpu_enable_hw_af(struct arm64_cpu_capabilities const *cap)
+{
+	if (has_cpuid_feature(cap, SCOPE_LOCAL_CPU)) {
+		u64 val = read_sysreg(tcr_el1);
+
+		write_sysreg(val | TCR_HA, tcr_el1);
+	}
+}
 #endif
 
 #ifdef CONFIG_ARM64_AMU_EXTN
@@ -2043,6 +2051,17 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 		.matches = has_hw_dbm,
 		.cpu_enable = cpu_enable_hw_dbm,
 	},
+	{
+		.desc = "Hardware update of the Access flag",
+		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
+		.capability = ARM64_HW_AF,
+		.sys_reg = SYS_ID_AA64MMFR1_EL1,
+		.sign = FTR_UNSIGNED,
+		.field_pos = ID_AA64MMFR1_HADBS_SHIFT,
+		.min_field_value = 1,
+		.matches = has_cpuid_feature,
+		.cpu_enable = cpu_enable_hw_af,
+	},
 #endif
 	{
 		.desc = "CRC32 instructions",
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 97d7bcd8d4f2..51ee6044748a 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -484,18 +484,6 @@ SYM_FUNC_START(__cpu_setup)
 	 * Set the IPS bits in TCR_EL1.
 	 */
 	tcr_compute_pa_size tcr, #TCR_IPS_SHIFT, x5, x6
-#ifdef CONFIG_ARM64_HW_AFDBM
-	/*
-	 * Enable hardware update of the Access Flags bit.
-	 * Hardware dirty bit management is enabled later,
-	 * via capabilities.
-	 */
-	mrs	x9, ID_AA64MMFR1_EL1
-	and	x9, x9, #0xf
-	cbz	x9, 1f
-	orr	tcr, tcr, #TCR_HA	// hardware Access flag update
-1:
-#endif	/* CONFIG_ARM64_HW_AFDBM */
 	msr	mair_el1, mair
 	msr	tcr_el1, tcr
 	/*
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 21fbdda7086e..435f7ae1ac81 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -34,6 +34,7 @@ HAS_STAGE2_FWB
 HAS_SYSREG_GIC_CPUIF
 HAS_TLB_RANGE
 HAS_VIRT_HOST_EXTN
+HW_AF
 HW_DBM
 KVM_PROTECTED_MODE
 MISMATCHED_CACHE_TYPE
cow_user_page() doesn't disable preemption, and it triggers the
warning in arch_faults_on_old_pte() when CONFIG_PREEMPT_COUNT=y.

Convert the Access flag support to a system-wide feature to avoid
reading ID_AA64MMFR1_EL1 on local CPUs when determining the h/w cap.

Note that though the Access flag support is a non-conflicting feature,
we require all late CPUs to have it if the boot CPU does. Otherwise
the feature won't be enabled regardless of the capabilities of late
CPUs.

If there are h/w implementations that break this rule, they will have
to add errata, unless they can provide justifications to switch to the
less strict ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE.

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 arch/arm64/include/asm/cpufeature.h | 20 +++++++-------------
 arch/arm64/include/asm/pgtable.h    |  4 +---
 arch/arm64/kernel/cpufeature.c      | 19 +++++++++++++++++++
 arch/arm64/mm/proc.S                | 12 ------------
 arch/arm64/tools/cpucaps            |  1 +
 5 files changed, 28 insertions(+), 28 deletions(-)