diff mbox series

[4/4] x86/cpu-policy: Derive {,R}RSBA for guest policies

Message ID 20230526110656.4018711-6-andrew.cooper3@citrix.com (mailing list archive)
State Superseded
Headers show
Series x86: RSBA and RRSBA handling | expand

Commit Message

Andrew Cooper May 26, 2023, 11:06 a.m. UTC
The RSBA bit, "RSB Alternative", means that the RSB may use alternative
predictors when empty.  From a practical point of view, this mean "Retpoline
not safe".

Enhanced IBRS (officially IBRS_ALL in Intel's docs, previously IBRS_ATT) is a
statement that IBRS is implemented in hardware (as opposed to the form
retrofitted to existing CPUs in microcode).

The RRSBA bit, "Restricted-RSBA", is a combination of RSBA, and the eIBRS
property that predictions are tagged with the mode in which they were learnt.
Therefore, it means "when eIBRS is active, the RSB may fall back to
alternative predictors but restricted to the current prediction mode".  As
such, it's stronger statement than RSBA, but still means "Retpoline not safe".

Add feature dependencies for EIBRS and RRSBA.  While technically they're not
linked, absolutely nothing good can of letting the guest see RRSBA without
EIBRS.  Furthermore, we use this dependency to simplify the max/default
derivation logic.

The max policies gets RSBA and RRSBA unconditionally set (with the EIBRS
dependency maybe hiding RRSBA).  We can run any VM, even if it has been told
"somewhere else, Retpoline isn't safe".

The default policies inherit the host settings, because the guest wants to run
with as few (anti)features as it can safely get away with.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
---
 xen/arch/x86/cpu-policy.c | 25 +++++++++++++++++++++++++
 xen/tools/gen-cpuid.py    |  5 ++++-
 2 files changed, 29 insertions(+), 1 deletion(-)

Comments

Jan Beulich May 30, 2023, 9:40 a.m. UTC | #1
On 26.05.2023 13:06, Andrew Cooper wrote:
> The RSBA bit, "RSB Alternative", means that the RSB may use alternative
> predictors when empty.  From a practical point of view, this mean "Retpoline
> not safe".
> 
> Enhanced IBRS (officially IBRS_ALL in Intel's docs, previously IBRS_ATT) is a
> statement that IBRS is implemented in hardware (as opposed to the form
> retrofitted to existing CPUs in microcode).
> 
> The RRSBA bit, "Restricted-RSBA", is a combination of RSBA, and the eIBRS
> property that predictions are tagged with the mode in which they were learnt.
> Therefore, it means "when eIBRS is active, the RSB may fall back to
> alternative predictors but restricted to the current prediction mode".  As
> such, it's stronger statement than RSBA, but still means "Retpoline not safe".

Just for my own understanding: Whether retpoline is safe with RRSBA does
depend on the level of control a less privileged entity has over a more
privileged entity's alternative predictor state? If so, maybe add
"probably" to the quoted statement?

> --- a/xen/arch/x86/cpu-policy.c
> +++ b/xen/arch/x86/cpu-policy.c
> @@ -423,8 +423,14 @@ static void __init guest_common_max_feature_adjustments(uint32_t *fs)
>           * Retpoline not safe)", so these need to be visible to a guest in all
>           * cases, even when it's only some other server in the pool which
>           * suffers the identified behaviour.
> +         *
> +         * We can always run any VM which has previously (or will
> +         * subsequently) run on hardware where Retpoline is not safe.  Note:
> +         * The dependency logic may hide RRSBA for other reasons.
>           */
>          __set_bit(X86_FEATURE_ARCH_CAPS, fs);
> +        __set_bit(X86_FEATURE_RSBA, fs);
> +        __set_bit(X86_FEATURE_RRSBA, fs);
>      }
>  }

Similar question to what I raised before: Can't this lead to both bits being
set, which according to descriptions earlier in the series and elsewhere
ought to not be possible?

> --- a/xen/tools/gen-cpuid.py
> +++ b/xen/tools/gen-cpuid.py
> @@ -318,7 +318,7 @@ def crunch_numbers(state):
>          # IBRSB/IBRS, and we pass this MSR directly to guests.  Treating them
>          # as dependent features simplifies Xen's logic, and prevents the guest
>          # from seeing implausible configurations.
> -        IBRSB: [STIBP, SSBD, INTEL_PSFD],
> +        IBRSB: [STIBP, SSBD, INTEL_PSFD, EIBRS],

Is this really an architecturally established dependency? From an abstract
pov having just eIBRS ought to be enough of an indicator? And hence it would
be wrong to hide it just because IBRSB isn't also set. Plus aiui ...

> @@ -328,6 +328,9 @@ def crunch_numbers(state):
>  
>          # The ARCH_CAPS CPUID bit enumerates the availability of the whole register.
>          ARCH_CAPS: list(range(RDCL_NO, RDCL_NO + 64)),
> +
> +        # The behaviour described by RRSBA depend on eIBRS being active.
> +        EIBRS: [RRSBA],

... for the purpose of the change here this dependency is all you need.

Jan
Andrew Cooper May 30, 2023, 1:25 p.m. UTC | #2
On 30/05/2023 10:40 am, Jan Beulich wrote:
> On 26.05.2023 13:06, Andrew Cooper wrote:
>> The RSBA bit, "RSB Alternative", means that the RSB may use alternative
>> predictors when empty.  From a practical point of view, this mean "Retpoline
>> not safe".
>>
>> Enhanced IBRS (officially IBRS_ALL in Intel's docs, previously IBRS_ATT) is a
>> statement that IBRS is implemented in hardware (as opposed to the form
>> retrofitted to existing CPUs in microcode).
>>
>> The RRSBA bit, "Restricted-RSBA", is a combination of RSBA, and the eIBRS
>> property that predictions are tagged with the mode in which they were learnt.
>> Therefore, it means "when eIBRS is active, the RSB may fall back to
>> alternative predictors but restricted to the current prediction mode".  As
>> such, it's stronger statement than RSBA, but still means "Retpoline not safe".
> Just for my own understanding: Whether retpoline is safe with RRSBA does
> depend on the level of control a less privileged entity has over a more
> privileged entity's alternative predictor state?

Correct, but...

> If so, maybe add "probably" to the quoted statement?

... Spectre-BHI proved it was exploitable and could leak data.

"Don't do JIT in the kernel" was a very unsatisfactory resolution, and
in particular I think there is room to replicate the exploit with array
style sys/hypercalls.

As far as I'm concerned it's a matter of when, not if, a researcher
breaks this boundary again.

Concern in this area is why Intel added
MSR_SPEC_CTRL.{RRSBA,BHI}_DIS_{U,S} controls in ADL/SPR, which hobbles
this behaviour.  (And yes, we need to support these in guests too, but
that involves rearranging Xen's MSR_SPEC_CTRL handling which is why I
haven't gotten around to it yet.)

>> --- a/xen/arch/x86/cpu-policy.c
>> +++ b/xen/arch/x86/cpu-policy.c
>> @@ -423,8 +423,14 @@ static void __init guest_common_max_feature_adjustments(uint32_t *fs)
>>           * Retpoline not safe)", so these need to be visible to a guest in all
>>           * cases, even when it's only some other server in the pool which
>>           * suffers the identified behaviour.
>> +         *
>> +         * We can always run any VM which has previously (or will
>> +         * subsequently) run on hardware where Retpoline is not safe.  Note:
>> +         * The dependency logic may hide RRSBA for other reasons.
>>           */
>>          __set_bit(X86_FEATURE_ARCH_CAPS, fs);
>> +        __set_bit(X86_FEATURE_RSBA, fs);
>> +        __set_bit(X86_FEATURE_RRSBA, fs);
>>      }
>>  }
> Similar question to what I raised before: Can't this lead to both bits being
> set, which according to descriptions earlier in the series and elsewhere
> ought to not be possible?

In this case, no, because the max values are fully discarded when
establishing the default policy.

Remember this value is used to decide whether an incoming VM can be
accepted.  It doesn't reflect a sensible configuration to run.

Whether or not both values ought to be visible is the subject of the
outstanding question.

>
>> --- a/xen/tools/gen-cpuid.py
>> +++ b/xen/tools/gen-cpuid.py
>> @@ -318,7 +318,7 @@ def crunch_numbers(state):
>>          # IBRSB/IBRS, and we pass this MSR directly to guests.  Treating them
>>          # as dependent features simplifies Xen's logic, and prevents the guest
>>          # from seeing implausible configurations.
>> -        IBRSB: [STIBP, SSBD, INTEL_PSFD],
>> +        IBRSB: [STIBP, SSBD, INTEL_PSFD, EIBRS],
> Is this really an architecturally established dependency? From an abstract
> pov having just eIBRS ought to be enough of an indicator?

This is the same as asking "can we hide AVX512F but expose AVX512_*"...

> And hence it would
> be wrong to hide it just because IBRSB isn't also set.

EIBRS means "you should set MSR_SPEC_CTRL.IBRS once at the start of day
and leave it set", which to me firmly states a dependency.


>  Plus aiui ...
>
>> @@ -328,6 +328,9 @@ def crunch_numbers(state):
>>  
>>          # The ARCH_CAPS CPUID bit enumerates the availability of the whole register.
>>          ARCH_CAPS: list(range(RDCL_NO, RDCL_NO + 64)),
>> +
>> +        # The behaviour described by RRSBA depend on eIBRS being active.
>> +        EIBRS: [RRSBA],
> ... for the purpose of the change here this dependency is all you need.

This change is to make the values sane for a guest, which includes "you
don't get RRSBA, or EIBRS if you have to level IBRS out".

~Andrew
Jan Beulich May 30, 2023, 2:34 p.m. UTC | #3
On 30.05.2023 15:25, Andrew Cooper wrote:
> On 30/05/2023 10:40 am, Jan Beulich wrote:
>> On 26.05.2023 13:06, Andrew Cooper wrote:
>>> --- a/xen/arch/x86/cpu-policy.c
>>> +++ b/xen/arch/x86/cpu-policy.c
>>> @@ -423,8 +423,14 @@ static void __init guest_common_max_feature_adjustments(uint32_t *fs)
>>>           * Retpoline not safe)", so these need to be visible to a guest in all
>>>           * cases, even when it's only some other server in the pool which
>>>           * suffers the identified behaviour.
>>> +         *
>>> +         * We can always run any VM which has previously (or will
>>> +         * subsequently) run on hardware where Retpoline is not safe.  Note:
>>> +         * The dependency logic may hide RRSBA for other reasons.
>>>           */
>>>          __set_bit(X86_FEATURE_ARCH_CAPS, fs);
>>> +        __set_bit(X86_FEATURE_RSBA, fs);
>>> +        __set_bit(X86_FEATURE_RRSBA, fs);
>>>      }
>>>  }
>> Similar question to what I raised before: Can't this lead to both bits being
>> set, which according to descriptions earlier in the series and elsewhere
>> ought to not be possible?
> 
> In this case, no, because the max values are fully discarded when
> establishing the default policy.
> 
> Remember this value is used to decide whether an incoming VM can be
> accepted.  It doesn't reflect a sensible configuration to run.

Right. I should have dropped the question when seeing the "default"
counterpart's behavior.

> Whether or not both values ought to be visible is the subject of the
> outstanding question.

Pending the answer there (and whichever easy adjustment)
Reviewed-by: Jan Beulich <jbeulich@suse.com>

>>> --- a/xen/tools/gen-cpuid.py
>>> +++ b/xen/tools/gen-cpuid.py
>>> @@ -318,7 +318,7 @@ def crunch_numbers(state):
>>>          # IBRSB/IBRS, and we pass this MSR directly to guests.  Treating them
>>>          # as dependent features simplifies Xen's logic, and prevents the guest
>>>          # from seeing implausible configurations.
>>> -        IBRSB: [STIBP, SSBD, INTEL_PSFD],
>>> +        IBRSB: [STIBP, SSBD, INTEL_PSFD, EIBRS],
>> Is this really an architecturally established dependency? From an abstract
>> pov having just eIBRS ought to be enough of an indicator?
> 
> This is the same as asking "can we hide AVX512F but expose AVX512_*"...
> 
>> And hence it would
>> be wrong to hide it just because IBRSB isn't also set.
> 
> EIBRS means "you should set MSR_SPEC_CTRL.IBRS once at the start of day
> and leave it set", which to me firmly states a dependency.

Hmm, yes, now that you put it that way, I agree.

Jan
diff mbox series

Patch

diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c
index bdbc5660acd4..eb1ecb75f593 100644
--- a/xen/arch/x86/cpu-policy.c
+++ b/xen/arch/x86/cpu-policy.c
@@ -423,8 +423,14 @@  static void __init guest_common_max_feature_adjustments(uint32_t *fs)
          * Retpoline not safe)", so these need to be visible to a guest in all
          * cases, even when it's only some other server in the pool which
          * suffers the identified behaviour.
+         *
+         * We can always run any VM which has previously (or will
+         * subsequently) run on hardware where Retpoline is not safe.  Note:
+         * The dependency logic may hide RRSBA for other reasons.
          */
         __set_bit(X86_FEATURE_ARCH_CAPS, fs);
+        __set_bit(X86_FEATURE_RSBA, fs);
+        __set_bit(X86_FEATURE_RRSBA, fs);
     }
 }
 
@@ -432,6 +438,25 @@  static void __init guest_common_default_feature_adjustments(uint32_t *fs)
 {
     if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
     {
+        /*
+         * The {,R}RSBA bits under virt mean "you might migrate somewhere
+         * where retpoline is not safe".  In particular, a VM's settings may
+         * not be applicable to the current host.
+         *
+         * Discard the settings inherited from the max policy, and and feed in
+         * the host values.  The ideal case for a VM is for neither of these
+         * bits to be set, but toolstacks must accumuate them across anywhere
+         * the VM might migrate to, in case any possible destination happens
+         * to be unsafe.
+         *
+         * Note: The dependency logic might hide RRSBA for other reasons.
+         */
+        fs[FEATURESET_m10Al] &= ~(cpufeat_mask(X86_FEATURE_RSBA) |
+                                  cpufeat_mask(X86_FEATURE_RRSBA));
+        fs[FEATURESET_m10Al] |=
+            host_cpu_policy.arch_caps.lo & (cpufeat_mask(X86_FEATURE_RSBA) |
+                                            cpufeat_mask(X86_FEATURE_RRSBA));
+
         /*
          * IvyBridge client parts suffer from leakage of RDRAND data due to SRBDS
          * (XSA-320 / CVE-2020-0543), and won't be receiving microcode to
diff --git a/xen/tools/gen-cpuid.py b/xen/tools/gen-cpuid.py
index f28ff708a2fc..22294a26adc0 100755
--- a/xen/tools/gen-cpuid.py
+++ b/xen/tools/gen-cpuid.py
@@ -318,7 +318,7 @@  def crunch_numbers(state):
         # IBRSB/IBRS, and we pass this MSR directly to guests.  Treating them
         # as dependent features simplifies Xen's logic, and prevents the guest
         # from seeing implausible configurations.
-        IBRSB: [STIBP, SSBD, INTEL_PSFD],
+        IBRSB: [STIBP, SSBD, INTEL_PSFD, EIBRS],
         IBRS: [AMD_STIBP, AMD_SSBD, PSFD,
                IBRS_ALWAYS, IBRS_FAST, IBRS_SAME_MODE],
         AMD_STIBP: [STIBP_ALWAYS],
@@ -328,6 +328,9 @@  def crunch_numbers(state):
 
         # The ARCH_CAPS CPUID bit enumerates the availability of the whole register.
         ARCH_CAPS: list(range(RDCL_NO, RDCL_NO + 64)),
+
+        # The behaviour described by RRSBA depend on eIBRS being active.
+        EIBRS: [RRSBA],
     }
 
     deep_features = tuple(sorted(deps.keys()))