Message ID | 20240624212008.663832-5-yazen.ghannam@amd.com (mailing list archive) |
---|---|
State | New |
Series | Rework mce_setup() |
On 25.06.24 at 0:20, Yazen Ghannam wrote:
> Generally, MCA information for an error is gathered on the CPU that
> reported the error. In this case, CPU-specific information from the
> running CPU will be correct.
>
> However, this will be incorrect if the MCA information is gathered while
> running on a CPU that didn't report the error. One example is creating
> an MCA record using mce_prep_record() for errors reported from ACPI.
>
> Split mce_prep_record() so that there is a helper function to gather
> common, i.e. not CPU-specific, information and another helper for
> CPU-specific information.
>
> Leave mce_prep_record() defined as-is for the common case when running
> on the reporting CPU.
>
> Get MCG_CAP in the global helper even though the register is per-CPU.
> This value is not already cached per-CPU like other values. And it does
> not assist with any per-CPU decoding or handling.
>
> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> ---
> Link:
> https://lkml.kernel.org/r/20240521125434.1555845-3-yazen.ghannam@amd.com
>
> v1->v2:
> * No change.
>
>  arch/x86/kernel/cpu/mce/core.c     | 34 ++++++++++++++++++++----------
>  arch/x86/kernel/cpu/mce/internal.h |  2 ++
>  2 files changed, 25 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index dd5192ef52e0..0133f88dfffb 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -117,20 +117,32 @@ static struct irq_work mce_irq_work;
>    */
>   BLOCKING_NOTIFIER_HEAD(x86_mce_decoder_chain);
>
> -/* Do initial initialization of a struct mce */
> -void mce_prep_record(struct mce *m)
> +void mce_prep_record_common(struct mce *m)
>   {
>   	memset(m, 0, sizeof(struct mce));
> -	m->cpu = m->extcpu = smp_processor_id();
> +
> +	m->cpuid = cpuid_eax(1);
> +	m->cpuvendor = boot_cpu_data.x86_vendor;
> +	m->mcgcap = __rdmsr(MSR_IA32_MCG_CAP);
>   	/* need the internal __ version to avoid deadlocks */
> -	m->time = __ktime_get_real_seconds();
> -	m->cpuvendor = boot_cpu_data.x86_vendor;
> -	m->cpuid = cpuid_eax(1);
> -	m->socketid = cpu_data(m->extcpu).topo.pkg_id;
> -	m->apicid = cpu_data(m->extcpu).topo.initial_apicid;
> -	m->mcgcap = __rdmsr(MSR_IA32_MCG_CAP);
> -	m->ppin = cpu_data(m->extcpu).ppin;
> -	m->microcode = boot_cpu_data.microcode;
> +	m->time = __ktime_get_real_seconds();
> +}
> +
> +void mce_prep_record_per_cpu(unsigned int cpu, struct mce *m)
> +{
> +	m->cpu = cpu;
> +	m->extcpu = cpu;
> +	m->apicid = cpu_data(m->extcpu).topo.initial_apicid;
> +	m->microcode = cpu_data(m->extcpu).microcode;
> +	m->ppin = cpu_data(m->extcpu).ppin;

nit: Similar to tglx's feedback for patch 2 you could use topology_ppin()

> +	m->socketid = cpu_data(m->extcpu).topo.pkg_id;

nit: topology_physical_package_id()

> +}
> +

<snip>
On Tue, Jun 25, 2024 at 04:19:40PM +0300, Nikolay Borisov wrote:
[...]
> > +
> > +void mce_prep_record_per_cpu(unsigned int cpu, struct mce *m)
> > +{
> > +	m->cpu = cpu;
> > +	m->extcpu = cpu;
> > +	m->apicid = cpu_data(m->extcpu).topo.initial_apicid;
> > +	m->microcode = cpu_data(m->extcpu).microcode;
> > +	m->ppin = cpu_data(m->extcpu).ppin;
>
> nit: Similar to tglx's feedback for patch 2 you could use topology_ppin()
>
> > +	m->socketid = cpu_data(m->extcpu).topo.pkg_id;
> nit: topology_physical_package_id()
>

Yes, will update both.

Thanks,
Yazen
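For readers following the two nits above, here is a rough userspace sketch of what the per-CPU helper might look like once the accessors are used. It is not the revised patch. Every `mock_*` name, the table contents, and the macro bodies are assumptions made for illustration; in particular, the sketch assumes that topology_ppin() and topology_physical_package_id() are thin wrappers around the same cpu_data() fields the patch reads directly.

```c
/* Userspace model only; none of these definitions come from the kernel tree. */
#include <assert.h>

#define NR_MOCK_CPUS 2

/* Hypothetical stand-ins for the per-CPU info behind cpu_data(). */
struct mock_topo {
	unsigned int pkg_id;
	unsigned int initial_apicid;
};

struct mock_cpuinfo {
	struct mock_topo topo;
	unsigned long long ppin;
	unsigned int microcode;
};

static const struct mock_cpuinfo mock_cpus[NR_MOCK_CPUS] = {
	{ .topo = { .pkg_id = 0, .initial_apicid = 0 },  .ppin = 0x1111ULL, .microcode = 0x10 },
	{ .topo = { .pkg_id = 1, .initial_apicid = 32 }, .ppin = 0x2222ULL, .microcode = 0x20 },
};

/* Assumed shape of the accessors: plain wrappers around cpu_data(). */
#define cpu_data(cpu)				(mock_cpus[(cpu)])
#define topology_ppin(cpu)			(cpu_data(cpu).ppin)
#define topology_physical_package_id(cpu)	(cpu_data(cpu).topo.pkg_id)

/* Hypothetical, trimmed-down stand-in for struct mce. */
struct mock_mce {
	unsigned int cpu;
	unsigned int extcpu;
	unsigned int apicid;
	unsigned int socketid;
	unsigned int microcode;
	unsigned long long ppin;
};

/* Per-CPU helper with the two suggested accessors swapped in. */
static void mock_prep_record_per_cpu(unsigned int cpu, struct mock_mce *m)
{
	assert(cpu < NR_MOCK_CPUS);	/* guard for the mock table only */

	m->cpu = cpu;
	m->extcpu = cpu;
	m->apicid = cpu_data(cpu).topo.initial_apicid;
	m->microcode = cpu_data(cpu).microcode;
	m->ppin = topology_ppin(cpu);
	m->socketid = topology_physical_package_id(cpu);
}
```

Under that assumption the change is behavior-neutral: the accessors only hide the field layout from the caller.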
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index dd5192ef52e0..0133f88dfffb 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -117,20 +117,32 @@ static struct irq_work mce_irq_work;
  */
 BLOCKING_NOTIFIER_HEAD(x86_mce_decoder_chain);

-/* Do initial initialization of a struct mce */
-void mce_prep_record(struct mce *m)
+void mce_prep_record_common(struct mce *m)
 {
 	memset(m, 0, sizeof(struct mce));
-	m->cpu = m->extcpu = smp_processor_id();
+
+	m->cpuid = cpuid_eax(1);
+	m->cpuvendor = boot_cpu_data.x86_vendor;
+	m->mcgcap = __rdmsr(MSR_IA32_MCG_CAP);
 	/* need the internal __ version to avoid deadlocks */
-	m->time = __ktime_get_real_seconds();
-	m->cpuvendor = boot_cpu_data.x86_vendor;
-	m->cpuid = cpuid_eax(1);
-	m->socketid = cpu_data(m->extcpu).topo.pkg_id;
-	m->apicid = cpu_data(m->extcpu).topo.initial_apicid;
-	m->mcgcap = __rdmsr(MSR_IA32_MCG_CAP);
-	m->ppin = cpu_data(m->extcpu).ppin;
-	m->microcode = boot_cpu_data.microcode;
+	m->time = __ktime_get_real_seconds();
+}
+
+void mce_prep_record_per_cpu(unsigned int cpu, struct mce *m)
+{
+	m->cpu = cpu;
+	m->extcpu = cpu;
+	m->apicid = cpu_data(m->extcpu).topo.initial_apicid;
+	m->microcode = cpu_data(m->extcpu).microcode;
+	m->ppin = cpu_data(m->extcpu).ppin;
+	m->socketid = cpu_data(m->extcpu).topo.pkg_id;
+}
+
+/* Do initial initialization of a struct mce */
+void mce_prep_record(struct mce *m)
+{
+	mce_prep_record_common(m);
+	mce_prep_record_per_cpu(smp_processor_id(), m);
 }

 DEFINE_PER_CPU(struct mce, injectm);
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index 01f8f03969e6..43c7f3b71df5 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -261,6 +261,8 @@ enum mca_msr {

 /* Decide whether to add MCE record to MCE event pool or filter it out. */
 extern bool filter_mce(struct mce *m);
+void mce_prep_record_common(struct mce *m);
+void mce_prep_record_per_cpu(unsigned int cpu, struct mce *m);

 #ifdef CONFIG_X86_MCE_AMD
 extern bool amd_filter_mce(struct mce *m);
Generally, MCA information for an error is gathered on the CPU that
reported the error. In this case, CPU-specific information from the
running CPU will be correct.

However, this will be incorrect if the MCA information is gathered while
running on a CPU that didn't report the error. One example is creating
an MCA record using mce_prep_record() for errors reported from ACPI.

Split mce_prep_record() so that there is a helper function to gather
common, i.e. not CPU-specific, information and another helper for
CPU-specific information.

Leave mce_prep_record() defined as-is for the common case when running
on the reporting CPU.

Get MCG_CAP in the global helper even though the register is per-CPU.
This value is not already cached per-CPU like other values. And it does
not assist with any per-CPU decoding or handling.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
Link:
https://lkml.kernel.org/r/20240521125434.1555845-3-yazen.ghannam@amd.com

v1->v2:
* No change.

 arch/x86/kernel/cpu/mce/core.c     | 34 ++++++++++++++++++++----------
 arch/x86/kernel/cpu/mce/internal.h |  2 ++
 2 files changed, 25 insertions(+), 11 deletions(-)
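To illustrate why the split matters, the following userspace model may help. It is only a sketch: the `mock_*` names, the CPU table, and the placeholder vendor value are all hypothetical, and `mock_running_cpu` merely stands in for smp_processor_id(). It shows the wrapper filling in the running CPU's data, while a caller handling an error reported for a different CPU calls the two helpers itself and passes the reporting CPU.

```c
/* Userspace model only; none of these definitions are the kernel's. */
#include <assert.h>
#include <string.h>

#define NR_MOCK_CPUS 4

/* Hypothetical, trimmed-down stand-in for struct mce. */
struct mock_mce {
	unsigned int cpu;
	unsigned int extcpu;
	unsigned int apicid;
	unsigned int socketid;
	unsigned int cpuvendor;
};

/* Hypothetical stand-in for the per-CPU data behind cpu_data(). */
struct mock_cpuinfo {
	unsigned int initial_apicid;
	unsigned int pkg_id;
};

static const struct mock_cpuinfo mock_cpu_data[NR_MOCK_CPUS] = {
	{ .initial_apicid = 0,  .pkg_id = 0 },
	{ .initial_apicid = 1,  .pkg_id = 0 },
	{ .initial_apicid = 16, .pkg_id = 1 },
	{ .initial_apicid = 17, .pkg_id = 1 },
};

/* Stand-in for smp_processor_id(): the CPU this code is "running" on. */
static unsigned int mock_running_cpu;

/* Fields that do not depend on which CPU reported the error. */
static void mock_prep_record_common(struct mock_mce *m)
{
	memset(m, 0, sizeof(*m));
	m->cpuvendor = 2;	/* arbitrary placeholder value */
}

/* Fields that must come from the CPU that reported the error. */
static void mock_prep_record_per_cpu(unsigned int cpu, struct mock_mce *m)
{
	assert(cpu < NR_MOCK_CPUS);	/* guard for the mock table only */

	m->cpu = cpu;
	m->extcpu = cpu;
	m->apicid = mock_cpu_data[cpu].initial_apicid;
	m->socketid = mock_cpu_data[cpu].pkg_id;
}

/* Common case: the running CPU is also the reporting CPU. */
static void mock_prep_record(struct mock_mce *m)
{
	mock_prep_record_common(m);
	mock_prep_record_per_cpu(mock_running_cpu, m);
}

/* Firmware-reported case: the reporting CPU is supplied by the caller. */
static void mock_prep_record_for(unsigned int reporting_cpu, struct mock_mce *m)
{
	mock_prep_record_common(m);
	mock_prep_record_per_cpu(reporting_cpu, m);
}
```

With `mock_running_cpu` left at 0, `mock_prep_record()` yields CPU 0's APIC and socket IDs, whereas `mock_prep_record_for(2, &m)` yields CPU 2's regardless of where the code runs. The common helper must run first in both paths because it zeroes the record.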