Message ID | 20250211194407.2577252-6-sohil.mehta@intel.com (mailing list archive) |
---|---|
State | Handled Elsewhere, archived |
Headers | show |
Series | Prepare for new Intel Family numbers | expand |
On 2/11/25 11:43, Sohil Mehta wrote: > + /* > + * Modern CPUs are generally expected to have a sane fast string > + * implementation. However, the BIOS may disable it on certain CPUs > + * via the architectural FAST_STRING bit. > + */ > + if (IS_ENABLED(CONFIG_X86_64) && (c->x86 == 6 || c->x86 > 15)) > + set_cpu_cap(c, X86_FEATURE_REP_GOOD); I'm not sure the BIOS comment is helpful here. Also, at this point, let's just make the check >=6 (or the >=PPRO equivalent). It will only matter if *all* of these are true: 1. Someone has a 64-bit capable P4 that powers on 2. They're running a 64-bit mainline kernel 3. String copy is *actually* slower than the alternative 4. They are performance sensitive enough to notice We don't even know the answer to #3 for sure. Let's just say what we're doing in a comment: /* Assume that any 64-bit CPU has a good implementation */
On 11/02/2025 8:53 pm, Dave Hansen wrote: > On 2/11/25 11:43, Sohil Mehta wrote: >> + /* >> + * Modern CPUs are generally expected to have a sane fast string >> + * implementation. However, the BIOS may disable it on certain CPUs >> + * via the architectural FAST_STRING bit. >> + */ >> + if (IS_ENABLED(CONFIG_X86_64) && (c->x86 == 6 || c->x86 > 15)) >> + set_cpu_cap(c, X86_FEATURE_REP_GOOD); > I'm not sure the BIOS comment is helpful here. > > Also, at this point, let's just make the check >=6 (or the >=PPRO > equivalent). > > It will only matter if *all* of these are true: > 1. Someone has a 64-bit capable P4 that powers on > 2. They're running a 64-bit mainline kernel > 3. String copy is *actually* slower than the alternative > 4. They are performance sensitive enough to notice > > We don't even know the answer to #3 for sure. Let's just say what we're > doing in a comment: > > /* Assume that any 64-bit CPU has a good implementation */ If you're going to override the BIOS setting, then you need to explicitly set MSR_MISC_ENABLE.FAST_STRINGS. Otherwise you're claiming to Linux that REP is good even when hardware is prohibited from using optimisations. ~Andrew
On 2/11/2025 4:54 PM, Andrew Cooper wrote: > If you're going to override the BIOS setting, then you need to > explicitly set MSR_MISC_ENABLE.FAST_STRINGS. > > Otherwise you're claiming to Linux that REP is good even when hardware > is prohibited from using optimisations. > I think the current checks have unnecessary overlap which makes them confusing. We should be fine if we only rely on the architectural MSR_MISC_ENABLE.FAST_STRINGS bit and rely just on the BIOS setting. My justification is below. The simplified version of the current checks is as follows: Check 1 (Based on Family Model numbers): > /* > * Unconditionally set REP_GOOD on early Family 6 processors > */ > if (IS_ENABLED(CONFIG_X86_64) && > (c->x86_vfm >= INTEL_PENTIUM_PRO && c->x86_vfm < INTEL_PENTIUM_M_DOTHAN)) > set_cpu_cap(c, X86_FEATURE_REP_GOOD); This check is mostly redundant since it is targeted for 64 bit and very few if any of those CPUs support 64 bit processing. I suggest that we get rid of this check completely. The risk here is fairly limited as well. Check 2 (Based on MISC_ENABLE.FAST_STRING): > /* > * If fast string is not enabled in IA32_MISC_ENABLE for any reason, > * clear the fast string and enhanced fast string CPU capabilities. > */ > if (c->x86_vfm >= INTEL_PENTIUM_M_DOTHAN) { > rdmsrl(MSR_IA32_MISC_ENABLE, misc_enable); > if (misc_enable & MSR_IA32_MISC_ENABLE_FAST_STRING) { > /* X86_FEATURE_ERMS will be automatically set based on CPUID */ > set_cpu_cap(c, X86_FEATURE_REP_GOOD); > } else { > pr_info("Disabled fast string operations\n"); > setup_clear_cpu_cap(X86_FEATURE_REP_GOOD); > setup_clear_cpu_cap(X86_FEATURE_ERMS); > } > } This is the only real check that is needed and should likely suffice in all meaningful scenarios. Comments?
On 12/02/2025 9:19 pm, Sohil Mehta wrote: > Check 1 (Based on Family Model numbers): >> /* >> * Unconditionally set REP_GOOD on early Family 6 processors >> */ >> if (IS_ENABLED(CONFIG_X86_64) && >> (c->x86_vfm >= INTEL_PENTIUM_PRO && c->x86_vfm < INTEL_PENTIUM_M_DOTHAN)) >> set_cpu_cap(c, X86_FEATURE_REP_GOOD); > This check is mostly redundant since it is targeted for 64 bit and very > few if any of those CPUs support 64 bit processing. I suggest that we > get rid of this check completely. The risk here is fairly limited as well. PENTIUM_PRO is model 0x1. M_DOTHAN isn't introduced until patch 10, but is model 0xd. And model 0xf (Memron) is the first 64bit capable fam6 CPU, so this is dead code given the CONFIG_X86_64 which the compiler can't actually optimise out. > > Check 2 (Based on MISC_ENABLE.FAST_STRING): >> /* >> * If fast string is not enabled in IA32_MISC_ENABLE for any reason, >> * clear the fast string and enhanced fast string CPU capabilities. I'd suggest that a better way of phrasing this is: /* BIOSes typically have a knob for Fast Strings. Honour the user's wishes. */ >> */ >> if (c->x86_vfm >= INTEL_PENTIUM_M_DOTHAN) { >> rdmsrl(MSR_IA32_MISC_ENABLE, misc_enable); >> if (misc_enable & MSR_IA32_MISC_ENABLE_FAST_STRING) { >> /* X86_FEATURE_ERMS will be automatically set based on CPUID */ >> set_cpu_cap(c, X86_FEATURE_REP_GOOD); >> } else { >> pr_info("Disabled fast string operations\n"); >> setup_clear_cpu_cap(X86_FEATURE_REP_GOOD); >> setup_clear_cpu_cap(X86_FEATURE_ERMS); >> } >> } MSR_MISC_ENABLE exists on all 64bit CPUs, and some 32bit ones too. Therefore, this section alone seems to suffice in order to set up REP_GOOD properly. ~Andrew
On 2/13/2025 3:02 PM, Andrew Cooper wrote: > On 12/02/2025 9:19 pm, Sohil Mehta wrote: >> Check 1 (Based on Family Model numbers): >>> /* >>> * Unconditionally set REP_GOOD on early Family 6 processors >>> */ >>> if (IS_ENABLED(CONFIG_X86_64) && >>> (c->x86_vfm >= INTEL_PENTIUM_PRO && c->x86_vfm < INTEL_PENTIUM_M_DOTHAN)) >>> set_cpu_cap(c, X86_FEATURE_REP_GOOD); >> This check is mostly redundant since it is targeted for 64 bit and very >> few if any of those CPUs support 64 bit processing. I suggest that we >> get rid of this check completely. The risk here is fairly limited as well. > > PENTIUM_PRO is model 0x1. M_DOTHAN isn't introduced until patch 10, but > is model 0xd. > > And model 0xf (Memron) is the first 64bit capable fam6 CPU, so this is > dead code given the CONFIG_X86_64 which the compiler can't actually > optimise out. > Thanks for confirming. I figured this is likely dead code but wasn't completely sure.
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index e5f34a90963e..4f8b02cbe8c5 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -297,6 +297,14 @@ static void early_init_intel(struct cpuinfo_x86 *c) c->x86_vfm <= INTEL_CORE_YONAH) clear_cpu_cap(c, X86_FEATURE_PAT); + /* + * Modern CPUs are generally expected to have a sane fast string + * implementation. However, the BIOS may disable it on certain CPUs + * via the architectural FAST_STRING bit. + */ + if (IS_ENABLED(CONFIG_X86_64) && (c->x86 == 6 || c->x86 > 15)) + set_cpu_cap(c, X86_FEATURE_REP_GOOD); + /* * If fast string is not enabled in IA32_MISC_ENABLE for any reason, * clear the fast string and enhanced fast string CPU capabilities. @@ -556,8 +564,6 @@ static void init_intel(struct cpuinfo_x86 *c) #ifdef CONFIG_X86_64 if (c->x86 == 15) c->x86_cache_alignment = c->x86_clflush_size * 2; - if (c->x86 == 6) - set_cpu_cap(c, X86_FEATURE_REP_GOOD); #else /* * Names for the Pentium II/Celeron processors
X86_FEATURE_REP_GOOD is a linux defined feature flag to track whether fast string operations should be used for copy_page(). It is also used as a backup alternative for clear_page() if enhanced fast string operations (ERMS) are not available. Currently, the flag is only set for Family 6 processors. Extend the check to include upcoming processors in Family 18 and 19. It is uncertain whether X86_FEATURE_REP_GOOD should be set for Family 15 (Pentium 4) as well. Commit 185f3b9da24c ("x86: make intel.c have 64-bit support code") that originally set the flag also set the x86_cache_alignment preference for Family 15 processors in the same commit. The omission of the Family 15 may have been intentional. Also, move the check before a related check in early_init_intel() to avoid resetting the flag. Signed-off-by: Sohil Mehta <sohil.mehta@intel.com> --- v2: Separate out the REP_GOOD (copy page) specific change into a separate commit. From the archives, it wasn't exactly clear why the set_cpu_cap() and clear_cpu_cap() calls for X86_FEATURE_REP_GOOD are in distinct locations. Also, why there is a difference between 32-bit and 64-bit. Any insight there would be useful. For now, I have kept the change minimal based on my limited understanding. --- arch/x86/kernel/cpu/intel.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)