Message ID | 20200507132236.26010-1-roger.pau@citrix.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | x86/idle: prevent entering C6 with in service interrupts on Intel | expand |
On 07.05.2020 15:22, Roger Pau Monne wrote: > Apply a workaround for Intel errata CLX30: "A Pending Fixed Interrupt > May Be Dispatched Before an Interrupt of The Same Priority Completes". > > It's not clear which models are affected, as the errata is listed in > the "Second Generation Intel Xeon Scalable Processors" specification > update, but the issue has been seen as far back as Nehalem processors. > Apply the workaround to all Intel processors, the condition can be > relaxed later. > > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> > --- > docs/misc/xen-command-line.pandoc | 8 ++++++++ > xen/arch/x86/acpi/cpu_idle.c | 22 +++++++++++++++++++++- > xen/arch/x86/cpu/mwait-idle.c | 3 +++ > xen/include/asm-x86/cpuidle.h | 1 + > 4 files changed, 33 insertions(+), 1 deletion(-) > > diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc > index ee12b0f53f..6e868a2185 100644 > --- a/docs/misc/xen-command-line.pandoc > +++ b/docs/misc/xen-command-line.pandoc > @@ -652,6 +652,14 @@ Specify the size of the console debug trace buffer. By specifying `cpu:` > additionally a trace buffer of the specified size is allocated per cpu. > The debug trace feature is only enabled in debugging builds of Xen. > > +### disable-c6-isr > +> `= <boolean>` > + > +> Default: `true for Intel CPUs` > + > +Workaround for Intel errata CLX30. Prevent entering C6 idle states with in > +service local APIC interrupts. Enabled by default for all Intel CPUs. > + > ### dma_bits > > `= <integer>` > > diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c > index b83446e77d..5023fea148 100644 > --- a/xen/arch/x86/acpi/cpu_idle.c > +++ b/xen/arch/x86/acpi/cpu_idle.c > @@ -573,6 +573,25 @@ static bool errata_c6_eoi_workaround(void) > return (fix_needed && cpu_has_pending_apic_eoi()); > } > > +static int8_t __read_mostly disable_c6_isr = -1; > +boolean_param("disable-c6-isr", disable_c6_isr); > + > +/* > + * Errata CLX30: A Pending Fixed Interrupt May Be Dispatched Before an > + * Interrupt of The Same Priority Completes. > + * > + * Prevent entering C6 if there are pending lapic interrupts, or else the > + * processor might dispatch further pending interrupts before the first one has > + * been completed. > + */ > +bool errata_c6_isr_workaround(void) > +{ > + if ( unlikely(disable_c6_isr == -1) ) > + disable_c6_isr = boot_cpu_data.x86_vendor == X86_VENDOR_INTEL; > + > + return disable_c6_isr && cpu_has_pending_apic_eoi(); This check being the same as in errata_c6_eoi_workaround(), why don't you simply extend that function? (See also below.) Also both variable and command line option descriptor could go inside the function, to limit their scopes. > @@ -676,7 +695,8 @@ static void acpi_processor_idle(void) > return; > } > > - if ( (cx->type == ACPI_STATE_C3) && errata_c6_eoi_workaround() ) > + if ( (cx->type == ACPI_STATE_C3) && > + (errata_c6_eoi_workaround() || errata_c6_isr_workaround()) ) > cx = power->safe_state; I realize you only add to existing code, but I'm afraid this existing code cannot be safely added to. Already prior to your change there was a curious mismatch of C3 and c6 on this line, and I don't see how ACPI_STATE_C3 correlates with "core C6" state. Now this may have been the convention for Nehalem/Westmere systems, but already the mwait-idle entries for these CPU models have 4 entries (albeit such that they match this scheme). As a result I think this at the very least needs to be >=, not ==, even more so now that you want to cover all Intel CPUs. Obviously (I think) it is a mistake that mwait-idle doesn't also activate this workaround, which imo is another reason to stick to just errata_c6_eoi_workaround(). > --- a/xen/arch/x86/cpu/mwait-idle.c > +++ b/xen/arch/x86/cpu/mwait-idle.c > @@ -770,6 +770,9 @@ static void mwait_idle(void) > return; > } > > + if (cx->type == ACPI_STATE_C3 && errata_c6_isr_workaround()) > + cx = power->safe_state; Here it becomes even more relevant I think that == not be used, as the static tables list deeper C-states; it's just that the SKX table, which also gets used for CLX afaict, has no entries beyond C6-SKX. Others with deeper ones presumably have the deeper C-states similarly affected (or not) by this erratum. For reference, mwait_idle_cpu_init() has hint = flg2MWAIT(cpuidle_state_table[cstate].flags); state = MWAIT_HINT2CSTATE(hint) + 1; ... cx->type = state; i.e. the value you compare against is derived from the static table entries. For Nehalem/Westmere this means that what goes under ACPI_STATE_C3 is indeed C6-NHM, and this looks to match for all the non-Atoms, but for none of the Atoms. Now Atoms could easily be unaffected, but (just to take an example) if C6-SNB was affected, surely C7-SNB would be, too. Jan
On Fri, May 08, 2020 at 03:46:08PM +0200, Jan Beulich wrote: > On 07.05.2020 15:22, Roger Pau Monne wrote: > > diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c > > index b83446e77d..5023fea148 100644 > > --- a/xen/arch/x86/acpi/cpu_idle.c > > +++ b/xen/arch/x86/acpi/cpu_idle.c > > @@ -573,6 +573,25 @@ static bool errata_c6_eoi_workaround(void) > > return (fix_needed && cpu_has_pending_apic_eoi()); > > } > > > > +static int8_t __read_mostly disable_c6_isr = -1; > > +boolean_param("disable-c6-isr", disable_c6_isr); > > + > > +/* > > + * Errata CLX30: A Pending Fixed Interrupt May Be Dispatched Before an > > + * Interrupt of The Same Priority Completes. > > + * > > + * Prevent entering C6 if there are pending lapic interrupts, or else the > > + * processor might dispatch further pending interrupts before the first one has > > + * been completed. > > + */ > > +bool errata_c6_isr_workaround(void) > > +{ > > + if ( unlikely(disable_c6_isr == -1) ) > > + disable_c6_isr = boot_cpu_data.x86_vendor == X86_VENDOR_INTEL; > > + > > + return disable_c6_isr && cpu_has_pending_apic_eoi(); > > This check being the same as in errata_c6_eoi_workaround(), > why don't you simply extend that function? (See also below.) > Also both variable and command line option descriptor could > go inside the function, to limit their scopes. Since this is actually a superset (as it covers all Intel CPUs), I should delete the previous (more restricted) workaround matching logic. > > @@ -676,7 +695,8 @@ static void acpi_processor_idle(void) > > return; > > } > > > > - if ( (cx->type == ACPI_STATE_C3) && errata_c6_eoi_workaround() ) > > + if ( (cx->type == ACPI_STATE_C3) && > > + (errata_c6_eoi_workaround() || errata_c6_isr_workaround()) ) > > cx = power->safe_state; > > I realize you only add to existing code, but I'm afraid this > existing code cannot be safely added to. Already prior to > your change there was a curious mismatch of C3 and c6 on this > line, and I don't see how ACPI_STATE_C3 correlates with > "core C6" state. Now this may have been the convention for > Nehalem/Westmere systems, but already the mwait-idle entries > for these CPU models have 4 entries (albeit such that they > match this scheme). As a result I think this at the very > least needs to be >=, not ==, even more so now that you want > to cover all Intel CPUs. Hm, I think this is because AFAICT acpi_processor_idle only understands up to ACPI_STATE_C3, passing a type > ACPI_STATE_C3 will just cause it to fallback to C0. I've adjusted the comparison to use >= instead, as a safety imporvement in case we add support for more states to acpi_processor_idle. > Obviously (I think) it is a mistake that mwait-idle doesn't > also activate this workaround, which imo is another reason to > stick to just errata_c6_eoi_workaround(). I assumed the previous workaround didn't apply to any of the CPUs supported by the mwait-idle driver, since it seems to only support recent-ish models. > > --- a/xen/arch/x86/cpu/mwait-idle.c > > +++ b/xen/arch/x86/cpu/mwait-idle.c > > @@ -770,6 +770,9 @@ static void mwait_idle(void) > > return; > > } > > > > + if (cx->type == ACPI_STATE_C3 && errata_c6_isr_workaround()) > > + cx = power->safe_state; > > Here it becomes even more relevant I think that == not be > used, as the static tables list deeper C-states; it's just > that the SKX table, which also gets used for CLX afaict, has > no entries beyond C6-SKX. Others with deeper ones presumably > have the deeper C-states similarly affected (or not) by this > erratum. > > For reference, mwait_idle_cpu_init() has > > hint = flg2MWAIT(cpuidle_state_table[cstate].flags); > state = MWAIT_HINT2CSTATE(hint) + 1; > ... > cx->type = state; > > i.e. the value you compare against is derived from the static > table entries. For Nehalem/Westmere this means that what goes > under ACPI_STATE_C3 is indeed C6-NHM, and this looks to match > for all the non-Atoms, but for none of the Atoms. Now Atoms > could easily be unaffected, but (just to take an example) if > C6-SNB was affected, surely C7-SNB would be, too. Yes, I've adjusted this to use cx->type >= 3 instead. I have to admit I'm confused by the name of the states being C6-* while the cstate value reported by Xen will be 3 (I would instead expect the type to be 6 in order to match the name). Thanks, Roger.
On 08.05.2020 19:24, Roger Pau Monné wrote: > On Fri, May 08, 2020 at 03:46:08PM +0200, Jan Beulich wrote: >> On 07.05.2020 15:22, Roger Pau Monne wrote: >>> --- a/xen/arch/x86/cpu/mwait-idle.c >>> +++ b/xen/arch/x86/cpu/mwait-idle.c >>> @@ -770,6 +770,9 @@ static void mwait_idle(void) >>> return; >>> } >>> >>> + if (cx->type == ACPI_STATE_C3 && errata_c6_isr_workaround()) >>> + cx = power->safe_state; >> >> Here it becomes even more relevant I think that == not be >> used, as the static tables list deeper C-states; it's just >> that the SKX table, which also gets used for CLX afaict, has >> no entries beyond C6-SKX. Others with deeper ones presumably >> have the deeper C-states similarly affected (or not) by this >> erratum. >> >> For reference, mwait_idle_cpu_init() has >> >> hint = flg2MWAIT(cpuidle_state_table[cstate].flags); >> state = MWAIT_HINT2CSTATE(hint) + 1; >> ... >> cx->type = state; >> >> i.e. the value you compare against is derived from the static >> table entries. For Nehalem/Westmere this means that what goes >> under ACPI_STATE_C3 is indeed C6-NHM, and this looks to match >> for all the non-Atoms, but for none of the Atoms. Now Atoms >> could easily be unaffected, but (just to take an example) if >> C6-SNB was affected, surely C7-SNB would be, too. > > Yes, I've adjusted this to use cx->type >= 3 instead. I have to admit > I'm confused by the name of the states being C6-* while the cstate > value reported by Xen will be 3 (I would instead expect the type to be > 6 in order to match the name). Well, the problem is the disagreement in numbering between ACPI and what the various CPU specs actually use. Jan
diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc index ee12b0f53f..6e868a2185 100644 --- a/docs/misc/xen-command-line.pandoc +++ b/docs/misc/xen-command-line.pandoc @@ -652,6 +652,14 @@ Specify the size of the console debug trace buffer. By specifying `cpu:` additionally a trace buffer of the specified size is allocated per cpu. The debug trace feature is only enabled in debugging builds of Xen. +### disable-c6-isr +> `= <boolean>` + +> Default: `true for Intel CPUs` + +Workaround for Intel errata CLX30. Prevent entering C6 idle states with in +service local APIC interrupts. Enabled by default for all Intel CPUs. + ### dma_bits > `= <integer>` diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c index b83446e77d..5023fea148 100644 --- a/xen/arch/x86/acpi/cpu_idle.c +++ b/xen/arch/x86/acpi/cpu_idle.c @@ -573,6 +573,25 @@ static bool errata_c6_eoi_workaround(void) return (fix_needed && cpu_has_pending_apic_eoi()); } +static int8_t __read_mostly disable_c6_isr = -1; +boolean_param("disable-c6-isr", disable_c6_isr); + +/* + * Errata CLX30: A Pending Fixed Interrupt May Be Dispatched Before an + * Interrupt of The Same Priority Completes. + * + * Prevent entering C6 if there are pending lapic interrupts, or else the + * processor might dispatch further pending interrupts before the first one has + * been completed. + */ +bool errata_c6_isr_workaround(void) +{ + if ( unlikely(disable_c6_isr == -1) ) + disable_c6_isr = boot_cpu_data.x86_vendor == X86_VENDOR_INTEL; + + return disable_c6_isr && cpu_has_pending_apic_eoi(); +} + void update_last_cx_stat(struct acpi_processor_power *power, struct acpi_processor_cx *cx, uint64_t ticks) { @@ -676,7 +695,8 @@ static void acpi_processor_idle(void) return; } - if ( (cx->type == ACPI_STATE_C3) && errata_c6_eoi_workaround() ) + if ( (cx->type == ACPI_STATE_C3) && + (errata_c6_eoi_workaround() || errata_c6_isr_workaround()) ) cx = power->safe_state; diff --git a/xen/arch/x86/cpu/mwait-idle.c b/xen/arch/x86/cpu/mwait-idle.c index b81937966e..e14cdaeed7 100644 --- a/xen/arch/x86/cpu/mwait-idle.c +++ b/xen/arch/x86/cpu/mwait-idle.c @@ -770,6 +770,9 @@ static void mwait_idle(void) return; } + if (cx->type == ACPI_STATE_C3 && errata_c6_isr_workaround()) + cx = power->safe_state; + eax = cx->address; cstate = ((eax >> MWAIT_SUBSTATE_SIZE) & MWAIT_CSTATE_MASK) + 1; diff --git a/xen/include/asm-x86/cpuidle.h b/xen/include/asm-x86/cpuidle.h index 5d7dffd228..8b9d6fdb15 100644 --- a/xen/include/asm-x86/cpuidle.h +++ b/xen/include/asm-x86/cpuidle.h @@ -26,4 +26,5 @@ void update_idle_stats(struct acpi_processor_power *, void update_last_cx_stat(struct acpi_processor_power *, struct acpi_processor_cx *, uint64_t); +bool errata_c6_isr_workaround(void); #endif /* __X86_ASM_CPUIDLE_H__ */
Apply a workaround for Intel errata CLX30: "A Pending Fixed Interrupt May Be Dispatched Before an Interrupt of The Same Priority Completes". It's not clear which models are affected, as the errata is listed in the "Second Generation Intel Xeon Scalable Processors" specification update, but the issue has been seen as far back as Nehalem processors. Apply the workaround to all Intel processors, the condition can be relaxed later. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> --- docs/misc/xen-command-line.pandoc | 8 ++++++++ xen/arch/x86/acpi/cpu_idle.c | 22 +++++++++++++++++++++- xen/arch/x86/cpu/mwait-idle.c | 3 +++ xen/include/asm-x86/cpuidle.h | 1 + 4 files changed, 33 insertions(+), 1 deletion(-)