Message ID | 20200511101753.36610-1-roger.pau@citrix.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [v2] x86/idle: prevent entering C6 with in service interrupts on Intel | expand |
On 11/05/2020 11:17, Roger Pau Monne wrote: > Apply a workaround for Intel errata CLX30: "A Pending Fixed Interrupt > May Be Dispatched Before an Interrupt of The Same Priority Completes". > > It's not clear which models are affected, as the errata is listed in > the "Second Generation Intel Xeon Scalable Processors" specification > update, but the issue has been seen as far back as Nehalem processors. Really? I'm only aware of it being Haswell and later. CLX30 was just one single example I gave you. It is public in all the specification updates going backwards, and is for example SKX100, BDX99 etc. > Apply the workaround to all Intel processors, the condition can be > relaxed later. Nothing in the code checks ISR, so we're applying "no power saving" unilaterally rather than in the very rare corner case that it occurs. I'm also not aware of it affecting Atom processors. This will cripple anything running on battery power, and is therefore not an appropriate fix in this form. ~Andrew
On Mon, May 11, 2020 at 11:38:49AM +0100, Andrew Cooper wrote: > On 11/05/2020 11:17, Roger Pau Monne wrote: > > Apply a workaround for Intel errata CLX30: "A Pending Fixed Interrupt > > May Be Dispatched Before an Interrupt of The Same Priority Completes". > > > > It's not clear which models are affected, as the errata is listed in > > the "Second Generation Intel Xeon Scalable Processors" specification > > update, but the issue has been seen as far back as Nehalem processors. > > Really? I'm only aware of it being Haswell and later. > > CLX30 was just one single example I gave you. It is public in all the > specification updates going backwards, and is for example SKX100, BDX99 etc. Right, will update accordingly then. > > Apply the workaround to all Intel processors, the condition can be > > relaxed later. > > Nothing in the code checks ISR, so we're applying "no power saving" > unilaterally rather than in the very rare corner case that it occurs. We don't check ISR directly, but instead the stack of pending interrupts to EOI, which should match the vectors pending in the ISR? As vectors that can be masked are not held pending in the ISR. I can check ISR directly if that's any better, but AFAICT using cpu_has_pending_apic_eoi is equally effective and faster. > I'm also not aware of it affecting Atom processors. > > This will cripple anything running on battery power, and is therefore > not an appropriate fix in this form. TBH, I've tried it in it's current form and it doesn't trigger that often. Thanks, Roger.
On Mon, May 11, 2020 at 01:07:43PM +0200, Roger Pau Monné wrote: > On Mon, May 11, 2020 at 11:38:49AM +0100, Andrew Cooper wrote: > > On 11/05/2020 11:17, Roger Pau Monne wrote: > > > Apply a workaround for Intel errata CLX30: "A Pending Fixed Interrupt > > > May Be Dispatched Before an Interrupt of The Same Priority Completes". > > > > > > It's not clear which models are affected, as the errata is listed in > > > the "Second Generation Intel Xeon Scalable Processors" specification > > > update, but the issue has been seen as far back as Nehalem processors. > > > > Really? I'm only aware of it being Haswell and later. So I've found the following related erratas: BDX99: E7-8800 v4, E7-4800 v4 (Broadwell) CLX30: 2nd Gen Xeon Scalable (Cascade Lake) SKX100: Xeon Scalable (Skylake) CFW125: E-2100, E-2200 (Kaby Lake) BDF104: E5-2600 v4 (Broadwell) BDH85: i7 LGA2011 v3 socket (Broadwell) BDM135: 5th Gen Intel Core, Core-M, Mobile Intel Pentium/Celeron (Broadwell) KBW131: E3-1200 v6 (Kaby Lake) So my plan would be to cover all chips from Broadwell to Cascade Lake, this would be: Broadwell, Skylake, Kaby Lake, Coffee Lake, {Cannon/Whiskey/Amber} Lake and Cascade Lake. I haven't found any mention of the issue in the Haswell specification updates, and the one for Xeon E7 v3 was last updated in April 2020. I think the list of IDs to match against should be: #define INTEL_FAM6_MODEL(m) { X86_VENDOR_INTEL, 6, m, X86_FEATURE_ALWAYS } { /* Broadwell */ INTEL_FAM6_MODEL(0x47), INTEL_FAM6_MODEL(0x3d), INTEL_FAM6_MODEL(0x4f), INTEL_FAM6_MODEL(0x56), /* Skylake (client) */ INTEL_FAM6_MODEL(0x5e), INTEL_FAM6_MODEL(0x4e), /* {Sky/Cascade}lake (server) */ INTEL_FAM6_MODEL(0x55), /* {Kaby/Coffee/Whiskey/Amber} Lake */ INTEL_FAM6_MODEL(0x9e), INTEL_FAM6_MODEL(0x8e), /* Cannon Lake */ INTEL_FAM6_MODEL(0x66), {} } Let me know if that sounds sensible. Thanks, Roger.
diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc index ee12b0f53f..6e868a2185 100644 --- a/docs/misc/xen-command-line.pandoc +++ b/docs/misc/xen-command-line.pandoc @@ -652,6 +652,14 @@ Specify the size of the console debug trace buffer. By specifying `cpu:` additionally a trace buffer of the specified size is allocated per cpu. The debug trace feature is only enabled in debugging builds of Xen. +### disable-c6-isr +> `= <boolean>` + +> Default: `true for Intel CPUs` + +Workaround for Intel errata CLX30. Prevent entering C6 idle states with in +service local APIC interrupts. Enabled by default for all Intel CPUs. + ### dma_bits > `= <integer>` diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c index b83446e77d..8ef05aeba3 100644 --- a/xen/arch/x86/acpi/cpu_idle.c +++ b/xen/arch/x86/acpi/cpu_idle.c @@ -555,20 +555,21 @@ void trace_exit_reason(u32 *irq_traced) * There was an errata with some Core i7 processors that an EOI transaction * may not be sent if software enters core C6 during an interrupt service * routine. So we don't enter deep Cx state if there is an EOI pending. + * + * Errata CLX30: A Pending Fixed Interrupt May Be Dispatched Before an + * Interrupt of The Same Priority Completes. + * + * Prevent entering C6 if there are pending lapic interrupts, or else the + * processor might dispatch further pending interrupts before the first one has + * been completed. */ -static bool errata_c6_eoi_workaround(void) +bool errata_c6_eoi_workaround(void) { - static int8_t fix_needed = -1; + static int8_t __read_mostly fix_needed = -1; + boolean_param("disable-c6-isr", fix_needed); if ( unlikely(fix_needed == -1) ) - { - int model = boot_cpu_data.x86_model; - fix_needed = (cpu_has_apic && !directed_eoi_enabled && - (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) && - (boot_cpu_data.x86 == 6) && - ((model == 0x1a) || (model == 0x1e) || (model == 0x1f) || - (model == 0x25) || (model == 0x2c) || (model == 0x2f))); - } + fix_needed = boot_cpu_data.x86_vendor == X86_VENDOR_INTEL; return (fix_needed && cpu_has_pending_apic_eoi()); } @@ -676,7 +677,7 @@ static void acpi_processor_idle(void) return; } - if ( (cx->type == ACPI_STATE_C3) && errata_c6_eoi_workaround() ) + if ( (cx->type >= ACPI_STATE_C3) && errata_c6_eoi_workaround() ) cx = power->safe_state; diff --git a/xen/arch/x86/cpu/mwait-idle.c b/xen/arch/x86/cpu/mwait-idle.c index b81937966e..bb017c488f 100644 --- a/xen/arch/x86/cpu/mwait-idle.c +++ b/xen/arch/x86/cpu/mwait-idle.c @@ -770,6 +770,9 @@ static void mwait_idle(void) return; } + if (cx->type >= 3 && errata_c6_eoi_workaround()) + cx = power->safe_state; + eax = cx->address; cstate = ((eax >> MWAIT_SUBSTATE_SIZE) & MWAIT_CSTATE_MASK) + 1; diff --git a/xen/include/asm-x86/cpuidle.h b/xen/include/asm-x86/cpuidle.h index 5d7dffd228..13879f58a1 100644 --- a/xen/include/asm-x86/cpuidle.h +++ b/xen/include/asm-x86/cpuidle.h @@ -26,4 +26,5 @@ void update_idle_stats(struct acpi_processor_power *, void update_last_cx_stat(struct acpi_processor_power *, struct acpi_processor_cx *, uint64_t); +bool errata_c6_eoi_workaround(void); #endif /* __X86_ASM_CPUIDLE_H__ */
Apply a workaround for Intel errata CLX30: "A Pending Fixed Interrupt May Be Dispatched Before an Interrupt of The Same Priority Completes". It's not clear which models are affected, as the errata is listed in the "Second Generation Intel Xeon Scalable Processors" specification update, but the issue has been seen as far back as Nehalem processors. Apply the workaround to all Intel processors, the condition can be relaxed later. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> --- Changes since v1: - Unify workaround with errata_c6_eoi_workaround. - Properly check state in both acpi and mwait drivers. --- docs/misc/xen-command-line.pandoc | 8 ++++++++ xen/arch/x86/acpi/cpu_idle.c | 23 ++++++++++++----------- xen/arch/x86/cpu/mwait-idle.c | 3 +++ xen/include/asm-x86/cpuidle.h | 1 + 4 files changed, 24 insertions(+), 11 deletions(-)