diff mbox series

VT-x: extend LBR Broadwell errata coverage

Message ID df6e8dad-b4c0-0821-46eb-e4aa86f8ccfa@suse.com (mailing list archive)
State Superseded
Headers show
Series VT-x: extend LBR Broadwell errata coverage | expand

Commit Message

Jan Beulich May 20, 2020, 12:52 p.m. UTC
For lbr_tsx_fixup_check() simply name a few more specific errata numbers.

For bdf93_fixup_check(), however, more models are affected. Oddly enough
despite being the same model and stepping, the erratum is listed for Xeon
E3 but not its Core counterpart. With this it's of course also uncertain
whether the absence of the erratum for Xeon D is actually meaningful.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

Comments

Andrew Cooper May 20, 2020, 2:07 p.m. UTC | #1
On 20/05/2020 13:52, Jan Beulich wrote:
> For lbr_tsx_fixup_check() simply name a few more specific errata numbers.
>
> For bdf93_fixup_check(), however, more models are affected. Oddly enough
> despite being the same model and stepping, the erratum is listed for Xeon
> E3 but not its Core counterpart.

That is probably a documentation error.  These processors are made from
the same die, and are not going to deviate in this regard.

> With this it's of course also uncertain
> whether the absence of the erratum for Xeon D is actually meaningful.

Given BDE105, it is exceedingly unlikely that this erratum alone was
fixed, leaving the other related ones present.

The complicating factor is that the TSX errata were addressed in some
later Broadwell parts.  Both these errata groups are to do with a
mismatch of TSX metadata in LBR/LER records.

The former group affects Haswell and Broadwell, but only when microcode
has disabled TSX, and manifests in the processor rejecting
architecturally-correct last-branch-to records.  Any mode in the list
which still has TSX enabled in up-to-date microcode doesn't get the
workaround.

The latter group affects Broadwell only, and manifests as an
architecturally incorrect ler-from record, which shouldn't have any TSX
metadata to begin with.

>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -2870,8 +2870,10 @@ static void __init lbr_tsx_fixup_check(v

There is a comment out of context here which is now stale.

If simply adding adding to the list is something you'd prefer to avoid,
what about /* Haswell/Broadwell LBR TSX metadata errata */ or similar?

>      case 0x45: /* HSM182 - 4th gen Core */
>      case 0x46: /* HSM182, HSD172 - 4th gen Core (GT3) */
>      case 0x3d: /* BDM127 - 5th gen Core */
> -    case 0x47: /* BDD117 - 5th gen Core (GT3) */
> -    case 0x4f: /* BDF85  - Xeon E5-2600 v4 */
> +    case 0x47: /* BDD117 - 5th gen Core (GT3)
> +                  BDW117 - Xeon E3-1200 v4 */
> +    case 0x4f: /* BDF85  - Xeon E5-2600 v4
> +                  BDX88  - Xeon E7-x800 v4 */
>      case 0x56: /* BDE105 - Xeon D-1500 */
>          break;
>      default:
> @@ -2895,15 +2897,26 @@ static void __init lbr_tsx_fixup_check(v
>  static void __init bdf93_fixup_check(void)

Seeing as this is no longer just BDF93, how about ler_tsx_fixup_check() ?

~Andrew

>  {
>      /*
> -     * Broadwell erratum BDF93:
> +     * Broadwell erratum BDF93 et al:
>       *
>       * Reads from MSR_LER_TO_LIP (MSR 1DEH) may return values for bits[63:61]
>       * that are not equal to bit[47].  Attempting to context switch this value
>       * may cause a #GP.  Software should sign extend the MSR.
>       */
> -    if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
> -         boot_cpu_data.x86 == 6 && boot_cpu_data.x86_model == 0x4f )
> +    if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
> +         boot_cpu_data.x86 != 6 )
> +        return;
> +
> +    switch ( boot_cpu_data.x86_model )
> +    {
> +    case 0x3d: /* BDM131 - 5th gen Core */
> +    case 0x47: /* BDD??? - 5th gen Core (H-Processor line)
> +                  BDW120 - Xeon E3-1200 v4 */
> +    case 0x4f: /* BDF93  - Xeon E5-2600 v4
> +                  BDX93  - Xeon E7-x800 v4 */
>          bdf93_fixup_needed = true;
> +        break;
> +    }
>  }
>  
>  static int is_last_branch_msr(u32 ecx)
Jan Beulich May 20, 2020, 3:56 p.m. UTC | #2
On 20.05.2020 16:07, Andrew Cooper wrote:
> On 20/05/2020 13:52, Jan Beulich wrote:
>> @@ -2895,15 +2897,26 @@ static void __init lbr_tsx_fixup_check(v
>>  static void __init bdf93_fixup_check(void)
> 
> Seeing as this is no longer just BDF93, how about ler_tsx_fixup_check() ?

I did consider renaming, and didn't do so just because this would
grow the patch size quite a bit. I'm fine doing so, but with the
name you suggest, is this one really as directly TSX related as the
other one? I had thought of something like lbr_bdw_fixup_check().

Jan
Andrew Cooper May 20, 2020, 4:51 p.m. UTC | #3
On 20/05/2020 16:56, Jan Beulich wrote:
> On 20.05.2020 16:07, Andrew Cooper wrote:
>> On 20/05/2020 13:52, Jan Beulich wrote:
>>> @@ -2895,15 +2897,26 @@ static void __init lbr_tsx_fixup_check(v
>>>  static void __init bdf93_fixup_check(void)
>> Seeing as this is no longer just BDF93, how about ler_tsx_fixup_check() ?
> I did consider renaming, and didn't do so just because this would
> grow the patch size quite a bit.

I don't see that as a problem.

> I'm fine doing so, but with the
> name you suggest, is this one really as directly TSX related as the
> other one? I had thought of something like lbr_bdw_fixup_check().

The errata wording doesn't mention TSX, but the breakage manifests in
the same way, with bits 61 and 62 clear but hardware expecting to see a
canonicalised value on restore.

Also, it is very specifically LER-from which gets clobbered, rather than
any of the regular LBR registers.

I'm not overly fussed what the naming is, but it oughtn't to include
bdf93 any more, now the scope of the workaround has been extended.

~Andrew
Andrew Cooper May 20, 2020, 9:27 p.m. UTC | #4
On 20/05/2020 13:52, Jan Beulich wrote:
> For lbr_tsx_fixup_check() simply name a few more specific errata numbers.
>
> For bdf93_fixup_check(), however, more models are affected. Oddly enough
> despite being the same model and stepping, the erratum is listed for Xeon
> E3 but not its Core counterpart. With this it's of course also uncertain
> whether the absence of the erratum for Xeon D is actually meaningful.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -2870,8 +2870,10 @@ static void __init lbr_tsx_fixup_check(v
>      case 0x45: /* HSM182 - 4th gen Core */
>      case 0x46: /* HSM182, HSD172 - 4th gen Core (GT3) */
>      case 0x3d: /* BDM127 - 5th gen Core */
> -    case 0x47: /* BDD117 - 5th gen Core (GT3) */
> -    case 0x4f: /* BDF85  - Xeon E5-2600 v4 */
> +    case 0x47: /* BDD117 - 5th gen Core (GT3)
> +                  BDW117 - Xeon E3-1200 v4 */
> +    case 0x4f: /* BDF85  - Xeon E5-2600 v4
> +                  BDX88  - Xeon E7-x800 v4 */

After cross referencing with Roger's errata patch, BDH75, and ...

>      case 0x56: /* BDE105 - Xeon D-1500 */
>          break;
>      default:
> @@ -2895,15 +2897,26 @@ static void __init lbr_tsx_fixup_check(v
>  static void __init bdf93_fixup_check(void)
>  {
>      /*
> -     * Broadwell erratum BDF93:
> +     * Broadwell erratum BDF93 et al:
>       *
>       * Reads from MSR_LER_TO_LIP (MSR 1DEH) may return values for bits[63:61]
>       * that are not equal to bit[47].  Attempting to context switch this value
>       * may cause a #GP.  Software should sign extend the MSR.
>       */
> -    if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
> -         boot_cpu_data.x86 == 6 && boot_cpu_data.x86_model == 0x4f )
> +    if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
> +         boot_cpu_data.x86 != 6 )
> +        return;
> +
> +    switch ( boot_cpu_data.x86_model )
> +    {
> +    case 0x3d: /* BDM131 - 5th gen Core */
> +    case 0x47: /* BDD??? - 5th gen Core (H-Processor line)
> +                  BDW120 - Xeon E3-1200 v4 */
> +    case 0x4f: /* BDF93  - Xeon E5-2600 v4
> +                  BDX93  - Xeon E7-x800 v4 */

BDH80, which is "Intel® Core™ i7 Processor Family for LGA2011-v3
Socket", so high end desktop, despite being electrically compatible with
the E5 servers.

~Andrew

>          bdf93_fixup_needed = true;
> +        break;
> +    }
>  }
>  
>  static int is_last_branch_msr(u32 ecx)
diff mbox series

Patch

--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2870,8 +2870,10 @@  static void __init lbr_tsx_fixup_check(v
     case 0x45: /* HSM182 - 4th gen Core */
     case 0x46: /* HSM182, HSD172 - 4th gen Core (GT3) */
     case 0x3d: /* BDM127 - 5th gen Core */
-    case 0x47: /* BDD117 - 5th gen Core (GT3) */
-    case 0x4f: /* BDF85  - Xeon E5-2600 v4 */
+    case 0x47: /* BDD117 - 5th gen Core (GT3)
+                  BDW117 - Xeon E3-1200 v4 */
+    case 0x4f: /* BDF85  - Xeon E5-2600 v4
+                  BDX88  - Xeon E7-x800 v4 */
     case 0x56: /* BDE105 - Xeon D-1500 */
         break;
     default:
@@ -2895,15 +2897,26 @@  static void __init lbr_tsx_fixup_check(v
 static void __init bdf93_fixup_check(void)
 {
     /*
-     * Broadwell erratum BDF93:
+     * Broadwell erratum BDF93 et al:
      *
      * Reads from MSR_LER_TO_LIP (MSR 1DEH) may return values for bits[63:61]
      * that are not equal to bit[47].  Attempting to context switch this value
      * may cause a #GP.  Software should sign extend the MSR.
      */
-    if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
-         boot_cpu_data.x86 == 6 && boot_cpu_data.x86_model == 0x4f )
+    if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
+         boot_cpu_data.x86 != 6 )
+        return;
+
+    switch ( boot_cpu_data.x86_model )
+    {
+    case 0x3d: /* BDM131 - 5th gen Core */
+    case 0x47: /* BDD??? - 5th gen Core (H-Processor line)
+                  BDW120 - Xeon E3-1200 v4 */
+    case 0x4f: /* BDF93  - Xeon E5-2600 v4
+                  BDX93  - Xeon E7-x800 v4 */
         bdf93_fixup_needed = true;
+        break;
+    }
 }
 
 static int is_last_branch_msr(u32 ecx)