
[1/2] x86/cpu/intel: Clear cache self-snoop capability in CPUs with known errata

Message ID 05257008-13e3-0d49-cd1d-6a8c9eee2ce5@suse.com (mailing list archive)
State Superseded
Series x86: port self-snoop related patches from Linux

Commit Message

Jan Beulich July 18, 2019, 12:09 p.m. UTC
From: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>

Processors which have self-snooping capability can handle conflicting
memory types across CPUs by snooping their own caches. However, there
exist CPU models in which conflicting memory types still lead to
unpredictable behavior, machine check errors, or hangs.

Clear this feature on affected CPUs to prevent its use.

Suggested-by: Alan Cox <alan.cox@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
[Linux commit 1e03bff3600101bd9158d005e4313132e55bdec8]

Strip Yonah - as per ark.intel.com it doesn't look to be 64-bit capable.
Call check_memory_type_self_snoop_errata() only on the boot CPU.

Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
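
For reference, the capability being cleared here is advertised in CPUID leaf 1, EDX bit 27 (SS). A minimal user-space probe of that bit, using GCC/Clang's <cpuid.h> on x86 (illustrative only, not Xen code), looks like:

```c
#include <cpuid.h>
#include <stdbool.h>

/*
 * Return whether the CPU advertises Self Snoop (SS):
 * CPUID leaf 1, EDX bit 27. Illustrative user-space probe;
 * Xen reads the same bit into its cpu_data feature words.
 */
static bool cpu_has_self_snoop(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;  /* leaf 1 not supported */

    return edx & (1u << 27);
}
```

On affected models the patch makes Xen behave as if this bit were clear, regardless of what the hardware reports.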

Comments

Andrew Cooper July 18, 2019, 12:23 p.m. UTC | #1
On 18/07/2019 13:09, Jan Beulich wrote:
> --- a/xen/arch/x86/cpu/intel.c
> +++ b/xen/arch/x86/cpu/intel.c
> @@ -15,6 +15,32 @@
>   #include "cpu.h"
>   
>   /*
> + * Processors which have self-snooping capability can handle conflicting
> + * memory types across CPUs by snooping their own caches. However, there
> + * exist CPU models in which conflicting memory types still lead to
> + * unpredictable behavior, machine check errors, or hangs. Clear this
> + * feature to prevent its use on machines with known errata.
> + */
> +static void __init check_memory_type_self_snoop_errata(void)
> +{
> +	switch (boot_cpu_data.x86_model) {
> +	case 0x0f: /* Merom */
> +	case 0x16: /* Merom L */
> +	case 0x17: /* Penryn */
> +	case 0x1d: /* Dunnington */
> +	case 0x1e: /* Nehalem */
> +	case 0x1f: /* Auburndale / Havendale */
> +	case 0x1a: /* Nehalem EP */
> +	case 0x2e: /* Nehalem EX */
> +	case 0x25: /* Westmere */
> +	case 0x2c: /* Westmere EP */
> +	case 0x2a: /* SandyBridge */

It would have been nice if the errata had actually been identified...

> +		setup_clear_cpu_cap(X86_FEATURE_SS);

I'm regretting exposing SS to guests at this point.

As this stands, it will result in a migration compatibility issue,
because updating Xen will cause a feature to disappear.  If we had a
default vs full policy split, this would be easy enough to work around
in a compatible way.  I wonder if there is anything clever we can do in
the meantime as a stopgap workaround.

~Andrew
Jan Beulich July 18, 2019, 1:07 p.m. UTC | #2
On 18.07.2019 14:23, Andrew Cooper wrote:
> On 18/07/2019 13:09, Jan Beulich wrote:
>> --- a/xen/arch/x86/cpu/intel.c
>> +++ b/xen/arch/x86/cpu/intel.c
>> @@ -15,6 +15,32 @@
>>    #include "cpu.h"
>>    
>>    /*
>> + * Processors which have self-snooping capability can handle conflicting
>> + * memory types across CPUs by snooping their own caches. However, there
>> + * exist CPU models in which conflicting memory types still lead to
>> + * unpredictable behavior, machine check errors, or hangs. Clear this
>> + * feature to prevent its use on machines with known errata.
>> + */
>> +static void __init check_memory_type_self_snoop_errata(void)
>> +{
>> +	switch (boot_cpu_data.x86_model) {
>> +	case 0x0f: /* Merom */
>> +	case 0x16: /* Merom L */
>> +	case 0x17: /* Penryn */
>> +	case 0x1d: /* Dunnington */
>> +	case 0x1e: /* Nehalem */
>> +	case 0x1f: /* Auburndale / Havendale */
>> +	case 0x1a: /* Nehalem EP */
>> +	case 0x2e: /* Nehalem EX */
>> +	case 0x25: /* Westmere */
>> +	case 0x2c: /* Westmere EP */
>> +	case 0x2a: /* SandyBridge */
> 
> It would have been nice if the errata had actually been identified...

Indeed; I hope you don't expect me to go hunt them down. I'm
cloning a Linux commit here only, after all.

>> +		setup_clear_cpu_cap(X86_FEATURE_SS);
> 
> I'm regretting exposing SS to guests at this point.
> 
> As this stands, it will result in a migration compatibility issue,
> because updating Xen will cause a feature to disappear.  If we had a
> default vs full policy split, this would be easy enough to work around
> in a compatible way.  I wonder if there is anything clever we can do in
> the meantime as a stopgap workaround.

Should we perhaps introduce X86_FEATURE_XEN_SELFSNOOP, just like
we do for SMEP and SMAP, such that we can leave the real one alone?

Jan
Andrew Cooper July 18, 2019, 1:18 p.m. UTC | #3
On 18/07/2019 14:07, Jan Beulich wrote:
> On 18.07.2019 14:23, Andrew Cooper wrote:
>> On 18/07/2019 13:09, Jan Beulich wrote:
>>> --- a/xen/arch/x86/cpu/intel.c
>>> +++ b/xen/arch/x86/cpu/intel.c
>>> @@ -15,6 +15,32 @@
>>>    #include "cpu.h"
>>>    
>>>    /*
>>> + * Processors which have self-snooping capability can handle conflicting
>>> + * memory types across CPUs by snooping their own caches. However, there
>>> + * exist CPU models in which conflicting memory types still lead to
>>> + * unpredictable behavior, machine check errors, or hangs. Clear this
>>> + * feature to prevent its use on machines with known errata.
>>> + */
>>> +static void __init check_memory_type_self_snoop_errata(void)
>>> +{
>>> +	switch (boot_cpu_data.x86_model) {
>>> +	case 0x0f: /* Merom */
>>> +	case 0x16: /* Merom L */
>>> +	case 0x17: /* Penryn */
>>> +	case 0x1d: /* Dunnington */
>>> +	case 0x1e: /* Nehalem */
>>> +	case 0x1f: /* Auburndale / Havendale */
>>> +	case 0x1a: /* Nehalem EP */
>>> +	case 0x2e: /* Nehalem EX */
>>> +	case 0x25: /* Westmere */
>>> +	case 0x2c: /* Westmere EP */
>>> +	case 0x2a: /* SandyBridge */
>> It would have been nice if the errata had actually been identified...
> Indeed; I hope you don't expect me to go hunt them down. I'm
> cloning a Linux commit here only, after all.

No - I don't expect you to hunt them down.

I'll try to keep an eye out next time I'm in the relevant documents.

>
>>> +		setup_clear_cpu_cap(X86_FEATURE_SS);
>> I'm regretting exposing SS to guests at this point.
>>
>> As this stands, it will result in a migration compatibility issue,
>> because updating Xen will cause a feature to disappear.  If we had a
>> default vs full policy split, this would be easy enough to work around
>> in a compatible way.  I wonder if there is anything clever we can do in
>> the meantime as a stopgap workaround.
> Should we perhaps introduce X86_FEATURE_XEN_SELFSNOOP, just like
> we do for SMEP and SMAP, such that we can leave the real one alone?

Ah yes - that will do just fine.

When we get a default/max split, I'll try to remember to take it back out.

~Andrew
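
Jan's stopgap can be sketched outside of Xen: the hypervisor's own reliance on self-snoop is keyed off a synthetic feature bit, which can be cleared on errata models while the real, guest-visible SS bit stays put. All names below (the FEATURE_* bits, the helpers) are illustrative stand-ins, not Xen's actual definitions:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for Xen's feature bookkeeping, not real Xen code. */
#define FEATURE_SS            0  /* hardware self-snoop, guest-visible      */
#define FEATURE_XEN_SELFSNOOP 1  /* synthetic: Xen itself may rely on SS    */

static uint32_t cpu_caps;

static void set_cap(unsigned int bit)   { cpu_caps |=  (1u << bit); }
static void clear_cap(unsigned int bit) { cpu_caps &= ~(1u << bit); }
static bool has_cap(unsigned int bit)   { return cpu_caps & (1u << bit); }

/*
 * Mirror of the errata check under this scheme: only the synthetic bit
 * is cleared, so Xen stops relying on self-snoop internally while the
 * feature guests already saw (FEATURE_SS) is left untouched, avoiding
 * the migration-compatibility problem.
 */
static void check_self_snoop_errata(unsigned int model, bool hw_has_ss)
{
    if (hw_has_ss) {
        set_cap(FEATURE_SS);
        set_cap(FEATURE_XEN_SELFSNOOP);
    }

    switch (model) {
    case 0x1e: /* Nehalem */
    case 0x2a: /* SandyBridge */
        /* ... remaining affected models from the patch elided ... */
        clear_cap(FEATURE_XEN_SELFSNOOP);
        break;
    }
}
```

This mirrors the existing X86_FEATURE_XEN_SMEP / X86_FEATURE_XEN_SMAP pattern the thread refers to: Xen's behaviour and the guest-visible policy are decoupled by a second, Xen-internal bit.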

Patch

--- a/xen/arch/x86/cpu/intel.c
+++ b/xen/arch/x86/cpu/intel.c
@@ -15,6 +15,32 @@ 
  #include "cpu.h"
  
  /*
+ * Processors which have self-snooping capability can handle conflicting
+ * memory types across CPUs by snooping their own caches. However, there
+ * exist CPU models in which conflicting memory types still lead to
+ * unpredictable behavior, machine check errors, or hangs. Clear this
+ * feature to prevent its use on machines with known errata.
+ */
+static void __init check_memory_type_self_snoop_errata(void)
+{
+	switch (boot_cpu_data.x86_model) {
+	case 0x0f: /* Merom */
+	case 0x16: /* Merom L */
+	case 0x17: /* Penryn */
+	case 0x1d: /* Dunnington */
+	case 0x1e: /* Nehalem */
+	case 0x1f: /* Auburndale / Havendale */
+	case 0x1a: /* Nehalem EP */
+	case 0x2e: /* Nehalem EX */
+	case 0x25: /* Westmere */
+	case 0x2c: /* Westmere EP */
+	case 0x2a: /* SandyBridge */
+		setup_clear_cpu_cap(X86_FEATURE_SS);
+		break;
+	}
+}
+
+/*
   * Set caps in expected_levelling_cap, probe a specific masking MSR, and set
   * caps in levelling_caps if it is found, or clobber the MSR index if missing.
   * If preset, reads the default value into msr_val.
@@ -256,8 +282,11 @@  static void early_init_intel(struct cpui
  	    (boot_cpu_data.x86_mask == 3 || boot_cpu_data.x86_mask == 4))
  		paddr_bits = 36;
  
-	if (c == &boot_cpu_data)
+	if (c == &boot_cpu_data) {
+		check_memory_type_self_snoop_errata();
+
  		intel_init_levelling();
+	}
  
  	ctxt_switch_levelling(NULL);
  }
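
For completeness, the model check above can be lifted into a standalone predicate, which makes the errata list easy to unit-test in isolation; the function name is invented for illustration, and the list is copied verbatim from the patch:

```c
#include <stdbool.h>

/*
 * True if the given Intel family-6 model is on the known self-snoop
 * errata list from the patch, i.e. X86_FEATURE_SS should be cleared.
 */
static bool ss_errata_affected(unsigned int model)
{
    switch (model) {
    case 0x0f: /* Merom */
    case 0x16: /* Merom L */
    case 0x17: /* Penryn */
    case 0x1d: /* Dunnington */
    case 0x1e: /* Nehalem */
    case 0x1f: /* Auburndale / Havendale */
    case 0x1a: /* Nehalem EP */
    case 0x2e: /* Nehalem EX */
    case 0x25: /* Westmere */
    case 0x2c: /* Westmere EP */
    case 0x2a: /* SandyBridge */
        return true;
    }
    return false;
}
```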