Message ID | 20241016123036.21366-7-qiuxu.zhuo@intel.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | x86/mce: Clean up some x86/mce code | expand |
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c > index 725c1d6fb1e5..40672fe0991a 100644 > --- a/arch/x86/kernel/cpu/mce/core.c > +++ b/arch/x86/kernel/cpu/mce/core.c > @@ -1892,7 +1892,8 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c) > } > > /* This should be disabled by the BIOS, but isn't always */ This comment is specific to the AMD and placing it before the switch makes it seem generic to the entire switch statement. It should probably be moved inside the AMD case just above the disable GART TLB check. > - if (c->x86_vendor == X86_VENDOR_AMD) { > + switch (c->x86_vendor) { > + case X86_VENDOR_AMD: > if (c->x86 == 15 && this_cpu_read(mce_num_banks) > 4) { > /* > * disable GART TBL walk error reporting, which > @@ -1925,9 +1926,9 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c) > if (c->x86 >= 0x17 && c->x86 <= 0x1A) > mce_flags.zen_ifu_quirk = 1; > > - } > + break; > Also, why not include the unknown vendor check (right above) inside the switch case as well? if (c->x86_vendor == X86_VENDOR_UNKNOWN) { pr_info("unknown CPU type - not enabling MCE support\n"); return -EOPNOTSUPP; } This seems to follow the same pattern as others and can be the first case inside the switch.
On Fri, Oct 18, 2024 at 12:44:00PM -0700, Sohil Mehta wrote: > > diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c > > index 725c1d6fb1e5..40672fe0991a 100644 > > --- a/arch/x86/kernel/cpu/mce/core.c > > +++ b/arch/x86/kernel/cpu/mce/core.c > > @@ -1892,7 +1892,8 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c) > > } > > > > /* This should be disabled by the BIOS, but isn't always */ > > This comment is specific to the AMD and placing it before the switch > makes it seem generic to the entire switch statement. It should probably > be moved inside the AMD case just above the disable GART TLB check. > > > - if (c->x86_vendor == X86_VENDOR_AMD) { > > + switch (c->x86_vendor) { > > + case X86_VENDOR_AMD: > > if (c->x86 == 15 && this_cpu_read(mce_num_banks) > 4) { > > /* > > * disable GART TBL walk error reporting, which > > @@ -1925,9 +1926,9 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c) > > if (c->x86 >= 0x17 && c->x86 <= 0x1A) > > mce_flags.zen_ifu_quirk = 1; > > > > - } > > + break; > > > > > Also, why not include the unknown vendor check (right above) inside the > switch case as well? > > if (c->x86_vendor == X86_VENDOR_UNKNOWN) { > pr_info("unknown CPU type - not enabling MCE support\n"); > return -EOPNOTSUPP; > } > > This seems to follow the same pattern as others and can be the first > case inside the switch. The vendor specific bits are large enough to warrant their own static functions (as we do elsewhere in this file). How about this (only compile-tested) patch? -Tony From 967d8637ac90823f28f4612cbbac305c421b4853 Mon Sep 17 00:00:00 2001 From: Tony Luck <tony.luck@intel.com> Date: Fri, 18 Oct 2024 13:01:02 -0700 Subject: [PATCH] x86/mce: Break up __mcheck_cpu_apply_quirks() Split each vendor specific part into its own helper function. Signed-off-by: Tony Luck <tony.luck@intel.com> --- arch/x86/kernel/cpu/mce/core.c | 172 ++++++++++++++++++--------------- 1 file changed, 96 insertions(+), 76 deletions(-) diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c index 2a938f429c4d..f51fb393d369 100644 --- a/arch/x86/kernel/cpu/mce/core.c +++ b/arch/x86/kernel/cpu/mce/core.c @@ -1880,101 +1880,121 @@ static void __mcheck_cpu_check_banks(void) } } -/* Add per CPU specific workarounds here */ -static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c) +static void apply_quirks_amd(struct cpuinfo_x86 *c) { struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array); struct mca_config *cfg = &mca_cfg; - if (c->x86_vendor == X86_VENDOR_UNKNOWN) { - pr_info("unknown CPU type - not enabling MCE support\n"); - return -EOPNOTSUPP; - } - /* This should be disabled by the BIOS, but isn't always */ - if (c->x86_vendor == X86_VENDOR_AMD) { - if (c->x86 == 15 && this_cpu_read(mce_num_banks) > 4) { - /* - * disable GART TBL walk error reporting, which - * trips off incorrectly with the IOMMU & 3ware - * & Cerberus: - */ - clear_bit(10, (unsigned long *)&mce_banks[4].ctl); - } - if (c->x86 < 0x11 && cfg->bootlog < 0) { - /* - * Lots of broken BIOS around that don't clear them - * by default and leave crap in there. Don't log: - */ - cfg->bootlog = 0; - } + if (c->x86 == 15 && this_cpu_read(mce_num_banks) > 4) { /* - * Various K7s with broken bank 0 around. Always disable - * by default. + * disable GART TBL walk error reporting, which + * trips off incorrectly with the IOMMU & 3ware + * & Cerberus: */ - if (c->x86 == 6 && this_cpu_read(mce_num_banks) > 0) - mce_banks[0].ctl = 0; - + clear_bit(10, (unsigned long *)&mce_banks[4].ctl); + } + if (c->x86 < 0x11 && cfg->bootlog < 0) { /* - * overflow_recov is supported for F15h Models 00h-0fh - * even though we don't have a CPUID bit for it. + * Lots of broken BIOS around that don't clear them + * by default and leave crap in there. Don't log: */ - if (c->x86 == 0x15 && c->x86_model <= 0xf) - mce_flags.overflow_recov = 1; + cfg->bootlog = 0; + } + /* + * Various K7s with broken bank 0 around. Always disable + * by default. + */ + if (c->x86 == 6 && this_cpu_read(mce_num_banks) > 0) + mce_banks[0].ctl = 0; - if (c->x86 >= 0x17 && c->x86 <= 0x1A) - mce_flags.zen_ifu_quirk = 1; + /* + * overflow_recov is supported for F15h Models 00h-0fh + * even though we don't have a CPUID bit for it. + */ + if (c->x86 == 0x15 && c->x86_model <= 0xf) + mce_flags.overflow_recov = 1; - } + if (c->x86 >= 0x17 && c->x86 <= 0x1A) + mce_flags.zen_ifu_quirk = 1; +} - if (c->x86_vendor == X86_VENDOR_INTEL) { - /* - * SDM documents that on family 6 bank 0 should not be written - * because it aliases to another special BIOS controlled - * register. - * But it's not aliased anymore on model 0x1a+ - * Don't ignore bank 0 completely because there could be a - * valid event later, merely don't write CTL0. - */ +static void apply_quirks_intel(struct cpuinfo_x86 *c) +{ + struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array); + struct mca_config *cfg = &mca_cfg; - if (c->x86 == 6 && c->x86_model < 0x1A && this_cpu_read(mce_num_banks) > 0) - mce_banks[0].init = false; + /* + * SDM documents that on family 6 bank 0 should not be written + * because it aliases to another special BIOS controlled + * register. + * But it's not aliased anymore on model 0x1a+ + * Don't ignore bank 0 completely because there could be a + * valid event later, merely don't write CTL0. + */ - /* - * All newer Intel systems support MCE broadcasting. Enable - * synchronization with a one second timeout. - */ - if ((c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xe)) && - cfg->monarch_timeout < 0) - cfg->monarch_timeout = USEC_PER_SEC; + if (c->x86 == 6 && c->x86_model < 0x1A && this_cpu_read(mce_num_banks) > 0) + mce_banks[0].init = false; - /* - * There are also broken BIOSes on some Pentium M and - * earlier systems: - */ - if (c->x86 == 6 && c->x86_model <= 13 && cfg->bootlog < 0) - cfg->bootlog = 0; + /* + * All newer Intel systems support MCE broadcasting. Enable + * synchronization with a one second timeout. + */ + if ((c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xe)) && + cfg->monarch_timeout < 0) + cfg->monarch_timeout = USEC_PER_SEC; - if (c->x86_vfm == INTEL_SANDYBRIDGE_X) - mce_flags.snb_ifu_quirk = 1; + /* + * There are also broken BIOSes on some Pentium M and + * earlier systems: + */ + if (c->x86 == 6 && c->x86_model <= 13 && cfg->bootlog < 0) + cfg->bootlog = 0; - /* - * Skylake, Cascacde Lake and Cooper Lake require a quirk on - * rep movs. - */ - if (c->x86_vfm == INTEL_SKYLAKE_X) - mce_flags.skx_repmov_quirk = 1; + if (c->x86_vfm == INTEL_SANDYBRIDGE_X) + mce_flags.snb_ifu_quirk = 1; + + /* + * Skylake, Cascacde Lake and Cooper Lake require a quirk on + * rep movs. + */ + if (c->x86_vfm == INTEL_SKYLAKE_X) + mce_flags.skx_repmov_quirk = 1; +} + +static void apply_quirks_zhaoxin(struct cpuinfo_x86 *c) +{ + struct mca_config *cfg = &mca_cfg; + + /* + * All newer Zhaoxin CPUs support MCE broadcasting. Enable + * synchronization with a one second timeout. + */ + if (c->x86 > 6 || (c->x86_model == 0x19 || c->x86_model == 0x1f)) { + if (cfg->monarch_timeout < 0) + cfg->monarch_timeout = USEC_PER_SEC; } +} - if (c->x86_vendor == X86_VENDOR_ZHAOXIN) { - /* - * All newer Zhaoxin CPUs support MCE broadcasting. Enable - * synchronization with a one second timeout. - */ - if (c->x86 > 6 || (c->x86_model == 0x19 || c->x86_model == 0x1f)) { - if (cfg->monarch_timeout < 0) - cfg->monarch_timeout = USEC_PER_SEC; - } +/* Add per CPU specific workarounds here */ +static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c) +{ + struct mca_config *cfg = &mca_cfg; + + switch (c->x86_vendor) { + case X86_VENDOR_UNKNOWN: + pr_info("unknown CPU type - not enabling MCE support\n"); + return -EOPNOTSUPP; + + case X86_VENDOR_AMD: + apply_quirks_amd(c); + break; + case X86_VENDOR_INTEL: + apply_quirks_intel(c); + break; + case X86_VENDOR_ZHAOXIN: + apply_quirks_zhaoxin(c); + break; } if (cfg->monarch_timeout < 0)
On 10/18/2024 1:14 PM, Tony Luck wrote: > The vendor specific bits are large enough to warrant their own > static functions (as we do elsewhere in this file). > > How about this (only compile-tested) patch? > Yeah, it does make it easier to read. Can some of the hardcoded numbers be changed to vfm macros as well?
> From: Luck, Tony <tony.luck@intel.com> > [...] > On Fri, Oct 18, 2024 at 12:44:00PM -0700, Sohil Mehta wrote: > > > diff --git a/arch/x86/kernel/cpu/mce/core.c > > > b/arch/x86/kernel/cpu/mce/core.c index 725c1d6fb1e5..40672fe0991a > > > 100644 > > > --- a/arch/x86/kernel/cpu/mce/core.c > > > +++ b/arch/x86/kernel/cpu/mce/core.c > > > @@ -1892,7 +1892,8 @@ static int __mcheck_cpu_apply_quirks(struct > cpuinfo_x86 *c) > > > } > > > > > > /* This should be disabled by the BIOS, but isn't always */ > > > > This comment is specific to the AMD and placing it before the switch > > makes it seem generic to the entire switch statement. It should > > probably be moved inside the AMD case just above the disable GART TLB > check. > > > > > - if (c->x86_vendor == X86_VENDOR_AMD) { > > > + switch (c->x86_vendor) { > > > + case X86_VENDOR_AMD: > > > if (c->x86 == 15 && this_cpu_read(mce_num_banks) > 4) { > > > /* > > > * disable GART TBL walk error reporting, which @@ - > 1925,9 > > > +1926,9 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c) > > > if (c->x86 >= 0x17 && c->x86 <= 0x1A) > > > mce_flags.zen_ifu_quirk = 1; > > > > > > - } > > > + break; > > > > > > > > > Also, why not include the unknown vendor check (right above) inside > > the switch case as well? > > > > if (c->x86_vendor == X86_VENDOR_UNKNOWN) { > > pr_info("unknown CPU type - not enabling MCE support\n"); > > return -EOPNOTSUPP; > > } > > > > This seems to follow the same pattern as others and can be the first > > case inside the switch. > > The vendor specific bits are large enough to warrant their own static functions > (as we do elsewhere in this file). > > How about this (only compile-tested) patch? > LGTM. This makes the code more structured and readable. I'll replace mine with this new patch in my next version after basic testing. Thanks, Tony & Sohil. -Qiuxu
> From: Mehta, Sohil <sohil.mehta@intel.com> > [...] > Subject: Re: [PATCH v2 06/10] x86/mce: Convert multiple if () statements into > a switch() statement > > On 10/18/2024 1:14 PM, Tony Luck wrote: > > > The vendor specific bits are large enough to warrant their own static > > functions (as we do elsewhere in this file). > > > > How about this (only compile-tested) patch? > > > > Yeah, it does make it easier to read. Can some of the hardcoded numbers be > changed to vfm macros as well? Convert some hard-coded numbers to VFM macros on top of Tony's patch as shown below. Please review it for any errors or omissions. diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c index f51fb393d369..3b83efa1ca65 100644 --- a/arch/x86/kernel/cpu/mce/core.c +++ b/arch/x86/kernel/cpu/mce/core.c @@ -1924,6 +1924,10 @@ static void apply_quirks_intel(struct cpuinfo_x86 *c) struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array); struct mca_config *cfg = &mca_cfg; + /* Older CPUs don't need quirks. */ + if (c->x86 < 6) + return; + /* * SDM documents that on family 6 bank 0 should not be written * because it aliases to another special BIOS controlled @@ -1932,23 +1936,21 @@ static void apply_quirks_intel(struct cpuinfo_x86 *c) * Don't ignore bank 0 completely because there could be a * valid event later, merely don't write CTL0. */ - - if (c->x86 == 6 && c->x86_model < 0x1A && this_cpu_read(mce_num_banks) > 0) + if (c->x86_vfm < INTEL_NEHALEM_EP && this_cpu_read(mce_num_banks) > 0) mce_banks[0].init = false; /* * All newer Intel systems support MCE broadcasting. Enable * synchronization with a one second timeout. */ - if ((c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xe)) && - cfg->monarch_timeout < 0) + if (c->x86_vfm >= INTEL_CORE_YONAH && cfg->monarch_timeout < 0) cfg->monarch_timeout = USEC_PER_SEC; /* * There are also broken BIOSes on some Pentium M and * earlier systems: */ - if (c->x86 == 6 && c->x86_model <= 13 && cfg->bootlog < 0) + if (c->x86_vfm < INTEL_CORE_YONAH && cfg->bootlog < 0) cfg->bootlog = 0; if (c->x86_vfm == INTEL_SANDYBRIDGE_X)
> /* > * All newer Intel systems support MCE broadcasting. Enable > * synchronization with a one second timeout. > */ > - if ((c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xe)) && > - cfg->monarch_timeout < 0) > + if (c->x86_vfm >= INTEL_CORE_YONAH && cfg->monarch_timeout < 0) > cfg->monarch_timeout = USEC_PER_SEC; This change is correct. But the old code makes it more explicit that CPUs in families > 6 take this action. As the author of the VFM changes it's clear to me, maybe less so to others? But maybe its OK. The comment does help a lot. Anyone else have thoughts on this? -Tony
On 10/21/24 09:06, Luck, Tony wrote: >> /* >> * All newer Intel systems support MCE broadcasting. Enable >> * synchronization with a one second timeout. >> */ >> - if ((c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xe)) && >> - cfg->monarch_timeout < 0) >> + if (c->x86_vfm >= INTEL_CORE_YONAH && cfg->monarch_timeout < 0) >> cfg->monarch_timeout = USEC_PER_SEC; > This change is correct. But the old code makes it more explicit that > CPUs in families > 6 take this action. As the author of the VFM changes > it's clear to me, maybe less so to others? > > But maybe its OK. The comment does help a lot. Anyone else have thoughts on this? It certainly is a bit subtle. To me, the earlier check would be even better if it were: - if (c->x86 < 6) + if (c->x86_vfm < INTEL_PENTIUM_PRO) return; That at least makes it more clear that it's a range of models and avoids having a ->x86 check mixed with a ->x86_vfm one.
On 10/21/2024 9:06 AM, Luck, Tony wrote: >> /* >> * All newer Intel systems support MCE broadcasting. Enable >> * synchronization with a one second timeout. >> */ >> - if ((c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xe)) && >> - cfg->monarch_timeout < 0) >> + if (c->x86_vfm >= INTEL_CORE_YONAH && cfg->monarch_timeout < 0) >> cfg->monarch_timeout = USEC_PER_SEC; > > This change is correct. But the old code makes it more explicit that > CPUs in families > 6 take this action. As the author of the VFM changes > it's clear to me, maybe less so to others? > > But maybe its OK. The comment does help a lot. Anyone else have thoughts on this? > I am not very familiar with the intricacies of the VFM checks. I did take me a few minutes to figure out why the modified code is correct. But looking at the prior or the later checks, I see the '<' operator used directly on platform names. So, the new check seems inline with that i.e. in this case, any model or family after the said platform supports MCE broadcasting. > /* > * There are also broken BIOSes on some Pentium M and > * earlier systems: > */ > - if (c->x86 == 6 && c->x86_model <= 13 && cfg->bootlog < 0) > + if (c->x86_vfm < INTEL_CORE_YONAH && cfg->bootlog < 0) > cfg->bootlog = 0;
> I am not very familiar with the intricacies of the VFM checks. I did > take me a few minutes to figure out why the modified code is correct. Hence my concern :-) > But looking at the prior or the later checks, I see the '<' operator > used directly on platform names. So, the new check seems inline with > that i.e. in this case, any model or family after the said platform > supports MCE broadcasting. Intel model number allocation policies aren't necessarily sequential. So range checks need to be used with caution. They should be safe enough when done to simplify code that checks very old models. Range checks across families may be even more problematic. Again these old checks that assume all future families will not reintroduce quirks from 20-year-old CPUs should be safe (I hope!!). But, spoiler alert, Intel is planning to begin use of two families in parallel. Family 19 (first model Diamond Rapids) is already in <asm/intel-family.h>). But there are going to be some CPUs in family 18 too. I'll be surprised if there are any use cases for ranges that span between families 18 and 19. -Tony
On 10/21/2024 10:51 AM, Luck, Tony wrote: >> But looking at the prior or the later checks, I see the '<' operator >> used directly on platform names. So, the new check seems inline with >> that i.e. in this case, any model or family after the said platform >> supports MCE broadcasting. > > Intel model number allocation policies aren't necessarily sequential. Model numbers are assumed to be sequential at least within family 6. if ((c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xe)) && cfg->monarch_timeout < 0) There is another check in early_init_intel() which does this: if (c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xd)) { rdmsrl(MSR_IA32_MISC_ENABLE, misc_enable); > So range checks need to be used with caution. They should be safe > enough when done to simplify code that checks very old models. > ... > Range checks across families may be even more problematic. I agree. Maybe we should avoid range checks across families altogether forward (>) or backward (<). For example, does the following change from Qiuxu, unintentionally become applicable to Quark CPUs with family -> 5? > /* > * There are also broken BIOSes on some Pentium M and > * earlier systems: > */ > - if (c->x86 == 6 && c->x86_model <= 13 && cfg->bootlog < 0) > + if (c->x86_vfm < INTEL_CORE_YONAH && cfg->bootlog < 0) > cfg->bootlog = 0;
>> Intel model number allocation policies aren't necessarily sequential. > > Model numbers are assumed to be sequential at least within family 6. Assumption can only be applied retroactively to simpler times. Looking at the timelines and model numbers for pure-Atom, pure-Core, Hybrid, and Xeon, they are somewhat jumbled. > For example, does the following change from Qiuxu, unintentionally > become applicable to Quark CPUs with family -> 5? Qiuxu starts the function with: + /* Older CPUs don't need quirks. */ + if (c->x86 < 6) + return; So Quark leaves the function early. -Tony
On 10/21/2024 11:40 AM, Luck, Tony wrote: >>> Intel model number allocation policies aren't necessarily sequential. >> >> Model numbers are assumed to be sequential at least within family 6. > > Assumption can only be applied retroactively to simpler times. Looking > at the timelines and model numbers for pure-Atom, pure-Core, Hybrid, > and Xeon, they are somewhat jumbled. > Agreed. Using range checks within a family with extreme care and avoiding cross-family ones seems like the saner thing to do. Maybe everything in the future is enumerated and VFM checks would not be needed :) Trying to understand more, I have more questions than answers. With the introduction of Family 0x19, do we need to reevaluate some of the existing model checks? early_init_intel(): if ((c->x86 == 0xf && c->x86_model >= 0x03) || (c->x86 == 0x6 && c->x86_model >= 0x0e)) set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC); It seems "constant_tsc" wouldn't show on Diamond rapids. Do we need it to? > Qiuxu starts the function with: > > + /* Older CPUs don't need quirks. */ > + if (c->x86 < 6) > + return; > > So Quark leaves the function early. > Ah! My bad, I missed that. > -Tony > >
> Trying to understand more, I have more questions than answers. With the > introduction of Family 0x19, do we need to reevaluate some of the > existing model checks? Diamond Rapids is in Family 19, not 0x19. I was unsure in <asm/intel-family.h> to use decimal or hex for family (since only 5 & 6 are used there, and they are same in both bases). I picked decimal to avoid 0x prefixes everywhere. > early_init_intel(): > if ((c->x86 == 0xf && c->x86_model >= 0x03) || > (c->x86 == 0x6 && c->x86_model >= 0x0e)) > set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC); > > It seems "constant_tsc" wouldn't show on Diamond rapids. Do we need it to? This looks to be checking for Pentium IV Prescott or newer in family 0xf, or Yonah or newer in family 6. You are right that it won't catch the new families. But it might not matter if this later block sets the feature bit. /* * c->x86_power is 8000_0007 edx. Bit 8 is TSC runs at constant rate * with P/T states and does not stop in deep C-states. * * It is also reliable across cores and sockets. (but not across * cabinets - we turn it off in that case explicitly.) */ if (c->x86_power & (1 << 8)) { set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC); set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC); } It appears that constant TSC started out model specific, but later got a proper enumeration bit in CPUID. -Tony
On 10/21/2024 4:31 PM, Luck, Tony wrote: > Diamond Rapids is in Family 19, not 0x19. I was unsure in <asm/intel-family.h> > to use decimal or hex for family (since only 5 & 6 are used there, and they are > same in both bases). I picked decimal to avoid 0x prefixes everywhere. > Got it. In intel-family.h it probably doesn't matter. But across x86 the family checks seem to be a mix of decimal and hexadecimal with maybe more inclination towards hex. > It appears that constant TSC started out model specific, but later > got a proper enumeration bit in CPUID. > Ah! Makes sense. > -Tony > >
On 10/21/24 15:57, Sohil Mehta wrote: > On 10/21/2024 11:40 AM, Luck, Tony wrote: >>>> Intel model number allocation policies aren't necessarily sequential. >>> Model numbers are assumed to be sequential at least within family 6. >> Assumption can only be applied retroactively to simpler times. Looking >> at the timelines and model numbers for pure-Atom, pure-Core, Hybrid, >> and Xeon, they are somewhat jumbled. >> > Agreed. Using range checks within a family with extreme care and > avoiding cross-family ones seems like the saner thing to do. > > Maybe everything in the future is enumerated and VFM checks would not be > needed
On 10/21/2024 5:17 PM, Dave Hansen wrote: > We only have a handful of these and they're mostly for early family 6 > things. I bet there's less than half a dozen. > You are right. There don't seem to be many unbounded model checks for Intel family 6. I could only find 3. early_init_intel() -> constant_tsc - Tony found out that it is harmless since it got it's own enumeration later on. should_io_be_busy() and acpi_processor_power_init_bm_check() also seem to be for older platforms and probably no longer applicable. I'll reach out to the power folks to confirm. Maybe if we just add an upper bound to these checks then we don't to worry about carrying them forward with the newer family 6 models and upcoming family 19 models.
> From: Mehta, Sohil <sohil.mehta@intel.com> > [...] > On 10/21/2024 9:06 AM, Luck, Tony wrote: > [...] > > This change is correct. But the old code makes it more explicit that > > CPUs in families > 6 take this action. As the author of the VFM > > changes it's clear to me, maybe less so to others? > > > > But maybe its OK. The comment does help a lot. Anyone else have thoughts > on this? > > > > I am not very familiar with the intricacies of the VFM checks. I did take me a > few minutes to figure out why the modified code is correct. OK. So, back to your original question below, what is your answer to it now? :-) "Can some of the hardcoded numbers be changed to vfm macros as well?"
On 10/18/2024 10:46 PM, Zhuo, Qiuxu wrote: > Convert some hard-coded numbers to VFM macros on top of Tony's patch as shown below. > Please review it for any errors or omissions. > > diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c > index f51fb393d369..3b83efa1ca65 100644 > --- a/arch/x86/kernel/cpu/mce/core.c > +++ b/arch/x86/kernel/cpu/mce/core.c > @@ -1924,6 +1924,10 @@ static void apply_quirks_intel(struct cpuinfo_x86 *c) > struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array); > struct mca_config *cfg = &mca_cfg; > > + /* Older CPUs don't need quirks. */ > + if (c->x86 < 6) > + return; > + As Dave mentioned, change this to make the use of vfm consistent in the entire function and probably update the comment as well to make it explicit: /* Older CPUs (prior to family 6) don't need quirks */ if (c->x86_vfm < INTEL_PENTIUM_PRO) return; > /* > * SDM documents that on family 6 bank 0 should not be written > * because it aliases to another special BIOS controlled > @@ -1932,23 +1936,21 @@ static void apply_quirks_intel(struct cpuinfo_x86 *c) > * Don't ignore bank 0 completely because there could be a > * valid event later, merely don't write CTL0. > */ > - > - if (c->x86 == 6 && c->x86_model < 0x1A && this_cpu_read(mce_num_banks) > 0) > + if (c->x86_vfm < INTEL_NEHALEM_EP && this_cpu_read(mce_num_banks) > 0) > mce_banks[0].init = false; > > /* > * All newer Intel systems support MCE broadcasting. Enable > * synchronization with a one second timeout. > */ > - if ((c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xe)) && > - cfg->monarch_timeout < 0) > + if (c->x86_vfm >= INTEL_CORE_YONAH && cfg->monarch_timeout < 0) > cfg->monarch_timeout = USEC_PER_SEC; > Instead of keeping this open-ended we could tweak this a bit as follows: if (!(c->x86_vfm < INTEL_CORE_YONAH)) && cfg->monarch_timeout < 0) cfg->monarch_timeout = USEC_PER_SEC; Essentially the same: if (new_cpu) vs if (!old_cpu) Don't have a strong preference. Will leave it to you and Tony. > /* > * There are also broken BIOSes on some Pentium M and > * earlier systems: > */ > - if (c->x86 == 6 && c->x86_model <= 13 && cfg->bootlog < 0) > + if (c->x86_vfm < INTEL_CORE_YONAH && cfg->bootlog < 0) > cfg->bootlog = 0; > > if (c->x86_vfm == INTEL_SANDYBRIDGE_X) > All the other checks look fine to me.
On 10/22/2024 7:08 PM, Zhuo, Qiuxu wrote: > OK. So, back to your original question below, what is your answer to it now? :-) > > "Can some of the hardcoded numbers be changed to vfm macros as well?" > Even though it takes a tiny bit of reading to understand the VFM macros, the pros significantly outweigh the cons. I still feel we should go ahead and make the changes. I have responded with minor comments to your patch. Sohil
> From: Mehta, Sohil <sohil.mehta@intel.com> > [...] > As Dave mentioned, change this to make the use of vfm consistent in the > entire function and probably update the comment as well to make it explicit: > > /* Older CPUs (prior to family 6) don't need quirks */ Yes, the improved comment is better. > if (c->x86_vfm < INTEL_PENTIUM_PRO) > return; > > [...] > > - if ((c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xe)) && > > - cfg->monarch_timeout < 0) > > + if (c->x86_vfm >= INTEL_CORE_YONAH && cfg->monarch_timeout < > > + 0) > > cfg->monarch_timeout = USEC_PER_SEC; > > > > Instead of keeping this open-ended we could tweak this a bit as follows: > > if (!(c->x86_vfm < INTEL_CORE_YONAH)) && cfg->monarch_timeout < 0) > cfg->monarch_timeout = USEC_PER_SEC; > > Essentially the same: if (new_cpu) vs if (!old_cpu) Don't have a strong > preference. Will leave it to you and Tony. > I prefer the single, straightforward '>=' operation over the '<' and then '!' two operations. - Qiuxu
> From: Mehta, Sohil <sohil.mehta@intel.com> > [...] > On 10/22/2024 7:08 PM, Zhuo, Qiuxu wrote: > > OK. So, back to your original question below, what is your answer to > > it now? :-) > > > > "Can some of the hardcoded numbers be changed to vfm macros as > well?" > > > > Even though it takes a tiny bit of reading to understand the VFM macros, the > pros significantly outweigh the cons. I still feel we should go ahead and make > the changes. Thanks for letting me know your thoughts. To me, the VFM-based checks can make the code more compact. So, if nobody else objects, I'll include this VFM-based check version in the next version. - Qiuxu
> From: Hansen, Dave <dave.hansen@intel.com> > [...] > On 10/21/24 09:06, Luck, Tony wrote: > >> /* > >> * All newer Intel systems support MCE broadcasting. Enable > >> * synchronization with a one second timeout. > >> */ > >> - if ((c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xe)) && > >> - cfg->monarch_timeout < 0) > >> + if (c->x86_vfm >= INTEL_CORE_YONAH && cfg->monarch_timeout < > >> + 0) > >> cfg->monarch_timeout = USEC_PER_SEC; > > This change is correct. But the old code makes it more explicit that > > CPUs in families > 6 take this action. As the author of the VFM > > changes it's clear to me, maybe less so to others? > > > > But maybe its OK. The comment does help a lot. Anyone else have thoughts > on this? > > It certainly is a bit subtle. > > To me, the earlier check would be even better if it were: > > - if (c->x86 < 6) Thanks, Dave. OK, I'll update it in the next version. Apart from this, I'll also add a comment below, as suggested by Sohil, to make it explicit that it's for prior to family 6. /* Older CPUs (prior to family 6) don't need quirks */ > + if (c->x86_vfm < INTEL_PENTIUM_PRO) > return; > > That at least makes it more clear that it's a range of models and avoids having > a ->x86 check mixed with a ->x86_vfm one.
> From: Zhuo, Qiuxu <qiuxu.zhuo@intel.com> > [...] > Subject: RE: [PATCH v2 06/10] x86/mce: Convert multiple if () statements into > a switch() statement > > > From: Hansen, Dave <dave.hansen@intel.com> [...] On 10/21/24 09:06, > [...] > > > > It certainly is a bit subtle. > > > > To me, the earlier check would be even better if it were: > > > > - if (c->x86 < 6) > > Thanks, Dave. > OK, I'll update it in the next version. > Apart from this, I'll also add a comment below, as suggested by Sohil, to make > it explicit that it's for prior to family 6. > > /* Older CPUs (prior to family 6) don't need quirks */ > > + if (c->x86_vfm < INTEL_PENTIUM_PRO) > > return; > > > > That at least makes it more clear that it's a range of models and > > avoids having a ->x86 check mixed with a ->x86_vfm one. Hi @Luck, Tony, @Hansen, Dave, @Mehta, Sohil, Thanks for your time discussing the VFM-based checks. I made the patch on top of Tony's [1] based on what we've discussed. I'd like to add the tags from you to the following patch. Please let me know if these tags are OK with you. Thanks! [1] https://lore.kernel.org/all/ZxLBwO4HkkJG4WYn@agluck-desk3.sc.intel.com/ From 6e88743f0619a902c6e6f985b9fc93669098b4af Mon Sep 17 00:00:00 2001 From: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Date: Mon, 21 Oct 2024 10:42:23 +0800 Subject: [PATCH v3 07/10] x86/mce: Convert family/model mixed checks to VFM-based checks Convert family/model mixed checks to VFM-based checks to make the code more compact. Suggested-by: Sohil Mehta <sohil.mehta@intel.com> Suggested-by: Dave Hansen <dave.hansen@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> --- arch/x86/kernel/cpu/mce/core.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c index bb8b1000fa0a..936804a5a0b9 100644 --- a/arch/x86/kernel/cpu/mce/core.c +++ b/arch/x86/kernel/cpu/mce/core.c @@ -1924,6 +1924,10 @@ static void apply_quirks_intel(struct cpuinfo_x86 *c) struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array); struct mca_config *cfg = &mca_cfg; + /* Older CPUs (prior to family 6) don't need quirks. */ + if (c->x86_vfm < INTEL_PENTIUM_PRO) + return; + /* * SDM documents that on family 6 bank 0 should not be written * because it aliases to another special BIOS controlled @@ -1932,22 +1936,21 @@ static void apply_quirks_intel(struct cpuinfo_x86 *c) * Don't ignore bank 0 completely because there could be a * valid event later, merely don't write CTL0. */ - if (c->x86 == 6 && c->x86_model < 0x1A && this_cpu_read(mce_num_banks) > 0) + if (c->x86_vfm < INTEL_NEHALEM_EP && this_cpu_read(mce_num_banks) > 0) mce_banks[0].init = false; /* * All newer Intel systems support MCE broadcasting. Enable * synchronization with a one second timeout. */ - if ((c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xe)) && - cfg->monarch_timeout < 0) + if (c->x86_vfm >= INTEL_CORE_YONAH && cfg->monarch_timeout < 0) cfg->monarch_timeout = USEC_PER_SEC; /* * There are also broken BIOSes on some Pentium M and * earlier systems: */ - if (c->x86 == 6 && c->x86_model <= 13 && cfg->bootlog < 0) + if (c->x86_vfm < INTEL_CORE_YONAH && cfg->bootlog < 0) cfg->bootlog = 0; if (c->x86_vfm == INTEL_SANDYBRIDGE_X) -- 2.17.1
- if (c->x86 == 6 && c->x86_model < 0x1A && this_cpu_read(mce_num_banks) > 0) + if (c->x86_vfm < INTEL_NEHALEM_EP && this_cpu_read(mce_num_banks) > 0) mce_banks[0].init = false; Updated code now matches for families before 6 (486, Pentium). 486 would never get to this code. But I think from the comments about machine check bank 0 being magic that Pentium had some rudimentary support. Should this be: if (c->x86_vfm >= INTEL_PENTIUM_PRO && c->x86_vfm < INTEL_NEHALEM_EP && this_cpu_read(mce_num_banks) > 0) to avoid this semantic change? - if (c->x86 == 6 && c->x86_model <= 13 && cfg->bootlog < 0) + if (c->x86_vfm < INTEL_CORE_YONAH && cfg->bootlog < 0) cfg->bootlog = 0; Same as above. New code matches for families 4 and 5. -Tony
On 10/24/2024 9:42 AM, Luck, Tony wrote: > - if (c->x86 == 6 && c->x86_model < 0x1A && this_cpu_read(mce_num_banks) > 0) > + if (c->x86_vfm < INTEL_NEHALEM_EP && this_cpu_read(mce_num_banks) > 0) > mce_banks[0].init = false; > > Updated code now matches for families before 6 (486, Pentium). 486 would never get > to this code. But I think from the comments about machine check bank 0 being magic > that Pentium had some rudimentary support. > As you mentioned it yourself (the last time I was concerned about family 5), the following check should cover this scenario? > @@ -1924,6 +1924,10 @@ static void apply_quirks_intel(struct cpuinfo_x86 *c) > struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array); > struct mca_config *cfg = &mca_cfg; > > + /* Older CPUs (prior to family 6) don't need quirks. */ > + if (c->x86_vfm < INTEL_PENTIUM_PRO) > + return; > + Sohil
> > - if (c->x86 == 6 && c->x86_model < 0x1A && this_cpu_read(mce_num_banks) > 0) > > + if (c->x86_vfm < INTEL_NEHALEM_EP && this_cpu_read(mce_num_banks) > 0) > > mce_banks[0].init = false; > > > > Updated code now matches for families before 6 (486, Pentium). 486 would never get > > to this code. But I think from the comments about machine check bank 0 being magic > > that Pentium had some rudimentary support. > > > > As you mentioned it yourself (the last time I was concerned about family > 5), the following check should cover this scenario? > > > @@ -1924,6 +1924,10 @@ static void apply_quirks_intel(struct cpuinfo_x86 *c) > > struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array); > > struct mca_config *cfg = &mca_cfg; > > > > + /* Older CPUs (prior to family 6) don't need quirks. */ > > + if (c->x86_vfm < INTEL_PENTIUM_PRO) > > + return; So I did. I was right about it too. Sorry for the noise. This patch looks good. -Tony
On 10/24/24 14:31, Sohil Mehta wrote: > @@ -1924,6 +1924,10 @@ static void apply_quirks_intel(struct cpuinfo_x86 *c) > struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array); > struct mca_config *cfg = &mca_cfg; > > + /* Older CPUs (prior to family 6) don't need quirks. */ > + if (c->x86_vfm < INTEL_PENTIUM_PRO) > + return; In case anyone grumbles, this is one of those expressions that's not perfect, but I think it's quite good enough in practice. It wouldn't work if we ever (for instance) moved 'vendor' to be less significant bits than 'model'. Or on a CPU that claimed to be family=6 but model=0. But Intel never (as far as I know) sold a CPU like that, so it's probably only possible in a VM where these checks are rather worthless anyway. In short, I think >=INTEL_PENTIUM_PRO is a great check that can mean "family 6 or later" almost anywhere.
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c index 725c1d6fb1e5..40672fe0991a 100644 --- a/arch/x86/kernel/cpu/mce/core.c +++ b/arch/x86/kernel/cpu/mce/core.c @@ -1892,7 +1892,8 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c) } /* This should be disabled by the BIOS, but isn't always */ - if (c->x86_vendor == X86_VENDOR_AMD) { + switch (c->x86_vendor) { + case X86_VENDOR_AMD: if (c->x86 == 15 && this_cpu_read(mce_num_banks) > 4) { /* * disable GART TBL walk error reporting, which @@ -1925,9 +1926,9 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c) if (c->x86 >= 0x17 && c->x86 <= 0x1A) mce_flags.zen_ifu_quirk = 1; - } + break; - if (c->x86_vendor == X86_VENDOR_INTEL) { + case X86_VENDOR_INTEL: /* * SDM documents that on family 6 bank 0 should not be written * because it aliases to another special BIOS controlled @@ -1964,9 +1965,10 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c) */ if (c->x86_vfm == INTEL_SKYLAKE_X) mce_flags.skx_repmov_quirk = 1; - } - if (c->x86_vendor == X86_VENDOR_ZHAOXIN) { + break; + + case X86_VENDOR_ZHAOXIN: /* * All newer Zhaoxin CPUs support MCE broadcasting. Enable * synchronization with a one second timeout. @@ -1975,6 +1977,8 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c) if (cfg->monarch_timeout < 0) cfg->monarch_timeout = USEC_PER_SEC; } + + break; } if (cfg->monarch_timeout < 0)