Message ID: 20210608221012.223696-3-Smita.KoralahalliChannabasappa@amd.com (mailing list archive)
State: New, archived
Series: x86/mce: Support extended MCA_ADDR address on SMCA systems
On Tue, Jun 08, 2021 at 05:10:12PM -0500, Smita Koralahalli wrote:
> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
> index f71435e53cdb..480a497877e2 100644
> --- a/arch/x86/kernel/cpu/mce/amd.c
> +++ b/arch/x86/kernel/cpu/mce/amd.c
> @@ -204,6 +204,12 @@ EXPORT_SYMBOL_GPL(smca_banks);
>  #define MAX_MCATYPE_NAME_LEN	30
>  static char buf_mcatype[MAX_MCATYPE_NAME_LEN];
>
> +struct smca_config {
> +	__u64 lsb_in_status	: 1,
> +	      __reserved_0	: 63;
> +};
> +static DEFINE_PER_CPU_READ_MOSTLY(struct smca_config[MAX_NR_BANKS], smca_cfg);

Per CPU and per bank, huh? For a single bit?

Even if we have

  static DEFINE_PER_CPU_READ_MOSTLY(struct mce_bank[MAX_NR_BANKS], mce_banks_array);

already?
On 6/10/21 6:55 AM, Borislav Petkov wrote:
> On Tue, Jun 08, 2021 at 05:10:12PM -0500, Smita Koralahalli wrote:
>> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
>> index f71435e53cdb..480a497877e2 100644
>> --- a/arch/x86/kernel/cpu/mce/amd.c
>> +++ b/arch/x86/kernel/cpu/mce/amd.c
>> @@ -204,6 +204,12 @@ EXPORT_SYMBOL_GPL(smca_banks);
>>  #define MAX_MCATYPE_NAME_LEN	30
>>  static char buf_mcatype[MAX_MCATYPE_NAME_LEN];
>>
>> +struct smca_config {
>> +	__u64 lsb_in_status	: 1,
>> +	      __reserved_0	: 63;
>> +};
>> +static DEFINE_PER_CPU_READ_MOSTLY(struct smca_config[MAX_NR_BANKS], smca_cfg);
>
> Per CPU and per bank, huh? For a single bit?
>
> Even if we have
>
> static DEFINE_PER_CPU_READ_MOSTLY(struct mce_bank[MAX_NR_BANKS], mce_banks_array);
>
> already?

The idea of defining a new struct was to keep SMCA-specific state separate.
I thought it would be costly to include this in the existing struct
mce_bank[], as it would then be defined for each CPU and each bank across
all vendors even when they aren't using it, which could be a problem on
resource- and space-constrained systems.

Also, in the future we could use the newly defined struct smca_config[] to
cache other MCA_CONFIG feature bits for different use cases, if they are
per bank and per CPU.

I understand it is unnecessary overhead, at least for now, to have a new
per-CPU, per-bank struct for a single bit, in which case I can refrain from
defining a new one and put the bit in the existing struct instead.

Let me know what you think.

Thanks,
Smita
On Thu, Jun 10, 2021 at 10:36:44PM -0500, Smita Koralahalli Channabasappa wrote:
> The idea of defining a new struct was to keep SMCA specific stuff separate.
> Thought, it would be costly to include in existing struct mce_bank[] as it will be
> unnecessarily defined for each cpu and each bank across all vendors even if they
> aren't using it and would be a problem if they are constraint on resource and space.

That's very considerate of you to think about the other vendors - I wish
everyone would do that...

However, our mce_banks_array is defined unconditionally on all vendors
already. So it is there even now. So I wouldn't lose a single second of
sleep about adding an u64 bitfield there.

> Also, in the future we can use this newly defined struct smca_config[] to cache
> other MCA_CONFIG feature bits for different use cases if they are per bank and per
> cpu.

You can use other bits in that bitfield. I hope 64 are enough. :)

HTH.
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 0a1c7224a582..33c5e77cf924 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -358,6 +358,7 @@ extern int mce_threshold_remove_device(unsigned int cpu);
 void mce_amd_feature_init(struct cpuinfo_x86 *c);
 int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr);
 void smca_extract_err_addr(struct mce *m);
+void smca_feature_init(void);
 
 #else
 
@@ -368,6 +369,7 @@ static inline void mce_amd_feature_init(struct cpuinfo_x86 *c) { }
 static inline int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr) { return -EINVAL; };
 static inline void smca_extract_err_addr(struct mce *m) { }
+static inline void smca_feature_init(void) { }
 
 #endif
 
 static inline void mce_hygon_feature_init(struct cpuinfo_x86 *c) { return mce_amd_feature_init(c); }
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index f71435e53cdb..480a497877e2 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -204,6 +204,12 @@ EXPORT_SYMBOL_GPL(smca_banks);
 #define MAX_MCATYPE_NAME_LEN	30
 static char buf_mcatype[MAX_MCATYPE_NAME_LEN];
 
+struct smca_config {
+	__u64 lsb_in_status	: 1,
+	      __reserved_0	: 63;
+};
+static DEFINE_PER_CPU_READ_MOSTLY(struct smca_config[MAX_NR_BANKS], smca_cfg);
+
 static DEFINE_PER_CPU(struct threshold_bank **, threshold_banks);
 
 /*
@@ -901,9 +907,26 @@ bool amd_mce_is_memory_error(struct mce *m)
 
 void smca_extract_err_addr(struct mce *m)
 {
-	u8 lsb = (m->addr >> 56) & 0x3f;
+	if (this_cpu_ptr(smca_cfg)[m->bank].lsb_in_status) {
+		u8 lsb = (m->status >> 24) & 0x3f;
+
+		m->addr &= GENMASK_ULL(56, lsb);
+	} else {
+		u8 lsb = (m->addr >> 56) & 0x3f;
+
+		m->addr &= GENMASK_ULL(55, lsb);
+	}
+}
+
+void smca_feature_init(void)
+{
+	unsigned int bank;
+	u64 mca_cfg;
 
-	m->addr &= GENMASK_ULL(55, lsb);
+	for (bank = 0; bank < this_cpu_read(mce_num_banks); ++bank) {
+		rdmsrl(MSR_AMD64_SMCA_MCx_CONFIG(bank), mca_cfg);
+		this_cpu_ptr(smca_cfg)[bank].lsb_in_status = !!(mca_cfg & BIT(8));
+	}
 }
 
 static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc)
@@ -920,10 +943,6 @@ static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc)
 	if (m.status & MCI_STATUS_ADDRV) {
 		m.addr = addr;
 
-		/*
-		 * Extract [55:<lsb>] where lsb is the least significant
-		 * *valid* bit of the address bits.
-		 */
 		if (mce_flags.smca)
 			smca_extract_err_addr(&m);
 	}
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 2c09c1eec50a..ce33006e42f8 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -699,10 +699,6 @@ static void mce_read_aux(struct mce *m, int i)
 		m->addr <<= shift;
 	}
 
-	/*
-	 * Extract [55:<lsb>] where lsb is the least significant
-	 * *valid* bit of the address bits.
-	 */
 	if (mce_flags.smca)
 		smca_extract_err_addr(m);
 }
@@ -1839,6 +1835,8 @@ static void __mcheck_cpu_init_early(struct cpuinfo_x86 *c)
 			msr_ops.status	= smca_status_reg;
 			msr_ops.addr	= smca_addr_reg;
 			msr_ops.misc	= smca_misc_reg;
+
+			smca_feature_init();
 		}
 	}
 }
Newer AMD processors such as AMD 'Milan' support more physical address
bits. That is, the MCA_ADDR registers on Scalable MCA systems contain
the ErrorAddr in bits [56:0] instead of [55:0]. Hence, the existing LSB
field from bits [61:56] in MCA_ADDR must be moved around to accommodate
the larger ErrorAddr size.

MCA_CONFIG[McaLsbInStatusSupported] indicates this change. If set, the
LSB field will be found in MCA_STATUS rather than MCA_ADDR.

Each logical CPU has its own MCA banks in hardware, not shared with
other logical CPUs. Additionally, on SMCA systems, each feature bit may
differ between banks within the same logical CPU. Hence, check
MCA_CONFIG[McaLsbInStatusSupported] for each MCA bank and for each CPU.

Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
---
 arch/x86/include/asm/mce.h     |  2 ++
 arch/x86/kernel/cpu/mce/amd.c  | 31 +++++++++++++++++++++++++------
 arch/x86/kernel/cpu/mce/core.c |  6 ++----
 3 files changed, 29 insertions(+), 10 deletions(-)