Message ID | 20230601144845.1554589-4-andrew.cooper3@citrix.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | x86: RSBA and RRSBA handling | expand |
On 01.06.2023 16:48, Andrew Cooper wrote: > The RSBA bit, "RSB Alternative", means that the RSB may use alternative > predictors when empty. From a practical point of view, this mean "Retpoline > not safe". > > Enhanced IBRS (officially IBRS_ALL in Intel's docs, previously IBRS_ATT) is a > statement that IBRS is implemented in hardware (as opposed to the form > retrofitted to existing CPUs in microcode). > > The RRSBA bit, "Restricted-RSBA", is a combination of RSBA, and the eIBRS > property that predictions are tagged with the mode in which they were learnt. > Therefore, it means "when eIBRS is active, the RSB may fall back to > alternative predictors but restricted to the current prediction mode". As > such, it's stronger statement than RSBA, but still means "Retpoline not safe". > > CPUs are not expected to enumerate both RSBA and RRSBA. > > Add feature dependencies for EIBRS and RRSBA. While technically they're not > linked, absolutely nothing good can of letting the guest see RRSBA without Nit: missing "come"? > EIBRS. Nor can anything good come of a guest seeing EIBRS without IBRSB. > Furthermore, we use this dependency to simplify the max derivation logic. >[...] > @@ -532,6 +541,21 @@ static void __init calculate_pv_def_policy(void) > guest_common_default_feature_adjustments(fs); > > sanitise_featureset(fs); > + > + /* > + * If the host suffers from RSBA of any form, and the guest can see > + * MSR_ARCH_CAPS, reflect the appropriate RSBA/RRSBA property to the guest > + * depending on the visibility of eIBRS. > + */ > + if ( test_bit(X86_FEATURE_ARCH_CAPS, fs) && > + (cpu_has_rsba || cpu_has_rrsba) ) > + { > + bool eibrs = test_bit(X86_FEATURE_EIBRS, fs); > + > + __set_bit(eibrs ? X86_FEATURE_RRSBA > + : X86_FEATURE_RSBA, fs); > + } > + > x86_cpu_featureset_to_policy(fs, p); > recalculate_xstate(p); > } > @@ -664,6 +688,21 @@ static void __init calculate_hvm_def_policy(void) > __set_bit(X86_FEATURE_VIRT_SSBD, fs); > > sanitise_featureset(fs); > + > + /* > + * If the host suffers from RSBA of any form, and the guest can see > + * MSR_ARCH_CAPS, reflect the appropriate RSBA/RRSBA property to the guest > + * depending on the visibility of eIBRS. > + */ > + if ( test_bit(X86_FEATURE_ARCH_CAPS, fs) && > + (cpu_has_rsba || cpu_has_rrsba) ) > + { > + bool eibrs = test_bit(X86_FEATURE_EIBRS, fs); > + > + __set_bit(eibrs ? X86_FEATURE_RRSBA > + : X86_FEATURE_RSBA, fs); > + } > + > x86_cpu_featureset_to_policy(fs, p); > recalculate_xstate(p); > } > @@ -786,6 +825,20 @@ void recalculate_cpuid_policy(struct domain *d) > > sanitise_featureset(fs); > > + /* > + * If the host suffers from RSBA of any form, and the guest can see > + * MSR_ARCH_CAPS, reflect the appropriate RSBA/RRSBA property to the guest > + * depending on the visibility of eIBRS. > + */ > + if ( test_bit(X86_FEATURE_ARCH_CAPS, fs) && > + (cpu_has_rsba || cpu_has_rrsba) ) > + { > + bool eibrs = test_bit(X86_FEATURE_EIBRS, fs); > + > + __set_bit(eibrs ? X86_FEATURE_RRSBA > + : X86_FEATURE_RSBA, fs); > + } Now that we have the same code and comment even a 3rd time, surely this wants to be put in a helper? What about a tool stack request leading to us setting the 2nd of the two bits here, while the other was already set? IOW wouldn't we better clear the other bit explicitly? (Due to the EIBRS dependency or RRSBA I think this can really only happen when the tool stack requests RSBA+EIBRS, as the deep deps clearing doesn't know the concept of "negative" dependencies.) Similarly what about a tool stack (and ultimately admin) request setting both RSBA and RRSBA? Wouldn't we better clear the respectively wrong bit then? People may do this in their guest configs "just to be on the safe side" (once expressing this in guest configs is possible, of course), and due to the max policies having both bits set this also won't occur "automatically". Jan
On 02/06/2023 11:20 am, Jan Beulich wrote: > On 01.06.2023 16:48, Andrew Cooper wrote: >> The RSBA bit, "RSB Alternative", means that the RSB may use alternative >> predictors when empty. From a practical point of view, this mean "Retpoline >> not safe". >> >> Enhanced IBRS (officially IBRS_ALL in Intel's docs, previously IBRS_ATT) is a >> statement that IBRS is implemented in hardware (as opposed to the form >> retrofitted to existing CPUs in microcode). >> >> The RRSBA bit, "Restricted-RSBA", is a combination of RSBA, and the eIBRS >> property that predictions are tagged with the mode in which they were learnt. >> Therefore, it means "when eIBRS is active, the RSB may fall back to >> alternative predictors but restricted to the current prediction mode". As >> such, it's stronger statement than RSBA, but still means "Retpoline not safe". >> >> CPUs are not expected to enumerate both RSBA and RRSBA. >> >> Add feature dependencies for EIBRS and RRSBA. While technically they're not >> linked, absolutely nothing good can of letting the guest see RRSBA without > Nit: missing "come"? Yes. Will fix. >> @@ -786,6 +825,20 @@ void recalculate_cpuid_policy(struct domain *d) >> >> sanitise_featureset(fs); >> >> + /* >> + * If the host suffers from RSBA of any form, and the guest can see >> + * MSR_ARCH_CAPS, reflect the appropriate RSBA/RRSBA property to the guest >> + * depending on the visibility of eIBRS. >> + */ >> + if ( test_bit(X86_FEATURE_ARCH_CAPS, fs) && >> + (cpu_has_rsba || cpu_has_rrsba) ) >> + { >> + bool eibrs = test_bit(X86_FEATURE_EIBRS, fs); >> + >> + __set_bit(eibrs ? X86_FEATURE_RRSBA >> + : X86_FEATURE_RSBA, fs); >> + } > Now that we have the same code and comment even a 3rd time, surely this > wants to be put in a helper? I did consider that, and chose not to in this case. One of these is going to disappear again in due course, when we start handing errors back to the toolstack instead of fixing up behind it. The requirement to be after sanitise_featureset() is critically important here for safety, and out-of-lining makes that connection less obvious. I considered having guest_common_default_late_feature_adjustments(), but that name is getting silly and it's already somewhat hard to navigate. There's quite a bit of other cleanup which ought to be done, like uniformly adding new bits first, then taking bits away (I suffered two bugs in init_dom0_cpuid_policy() getting this wrong during development), so I was planning to leave any decisions until then. > What about a tool stack request leading to us setting the 2nd of the two > bits here, while the other was already set? IOW wouldn't we better clear > the other bit explicitly? (Due to the EIBRS dependency or RRSBA I think > this can really only happen when the tool stack requests RSBA+EIBRS, as > the deep deps clearing doesn't know the concept of "negative" > dependencies.) Hmm - I think there is a bug here, but it's not this simple. I think the only reasonable thing we can do is start rejecting bad input because I don't think Xen can fix up safely. Xen must not ever clear RSBA, or we've potentially made the VM unsafe behind the toolstack's back. If EIBRS != RRSBA, the toolstack has made a mistake. Equally too for RSBA && EIBRS. I think this is going to take more coffee to solve... > Similarly what about a tool stack (and ultimately admin) request setting > both RSBA and RRSBA? Wouldn't we better clear the respectively wrong bit > then? People may do this in their guest configs "just to be on the safe > side" (once expressing this in guest configs is possible, of course), > and due to the max policies having both bits set this also won't occur > "automatically". The only reason this series doesn't have a final patch turning ARCH_CAPS's "a" into "A" is because libxl can't currently operate these bits at all, let alone safely. Roger is kindly looking into that side of things. It is an error to be modifying bits behind the toolstack's back to start with. We get away with it previously because hiding bits that the toolstack thinks the guest saw is goes in the safe direction WRT migrate. But no more with the semantics of RSBA/RRSBA. I explicitly don't care about people wanting to set RSBA && RRSBA "just in case" - this is too complicated already. The only non-default thing an admin needs to be able to express is +rsba,-eibrs,-rrsba to mean "please be compatible with pre-EIBRS hardware". (In reality, there will also need to be some FOO_NO bits taken out too, depending on the CPUs in question.) ~Andrew
On 02/06/2023 4:29 pm, Andrew Cooper wrote: > On 02/06/2023 11:20 am, Jan Beulich wrote: >> On 01.06.2023 16:48, Andrew Cooper wrote: >> What about a tool stack request leading to us setting the 2nd of the two >> bits here, while the other was already set? IOW wouldn't we better clear >> the other bit explicitly? (Due to the EIBRS dependency or RRSBA I think >> this can really only happen when the tool stack requests RSBA+EIBRS, as >> the deep deps clearing doesn't know the concept of "negative" >> dependencies.) > Hmm - I think there is a bug here, but it's not this simple. I think > the only reasonable thing we can do is start rejecting bad input because > I don't think Xen can fix up safely. > > Xen must not ever clear RSBA, or we've potentially made the VM unsafe > behind the toolstack's back. > > If EIBRS != RRSBA, the toolstack has made a mistake. Equally too for > RSBA && EIBRS. > > I think this is going to take more coffee to solve... Actually, no. I'm going to delete the hunk modifying recalculate_cpuid(), and move this patch back to the meaning it had in v1 which is just "get the policies looking correct". It's still not supported for the toolstack to request ARCH_CAPS (the "a" marking), and the safely logic for that can come in a subsequent series along with the unit(ish) testing I was already planning to do. ~Andrew
On 02.06.2023 17:38, Andrew Cooper wrote: > On 02/06/2023 4:29 pm, Andrew Cooper wrote: >> On 02/06/2023 11:20 am, Jan Beulich wrote: >>> On 01.06.2023 16:48, Andrew Cooper wrote: >>> What about a tool stack request leading to us setting the 2nd of the two >>> bits here, while the other was already set? IOW wouldn't we better clear >>> the other bit explicitly? (Due to the EIBRS dependency or RRSBA I think >>> this can really only happen when the tool stack requests RSBA+EIBRS, as >>> the deep deps clearing doesn't know the concept of "negative" >>> dependencies.) >> Hmm - I think there is a bug here, but it's not this simple. I think >> the only reasonable thing we can do is start rejecting bad input because >> I don't think Xen can fix up safely. >> >> Xen must not ever clear RSBA, or we've potentially made the VM unsafe >> behind the toolstack's back. >> >> If EIBRS != RRSBA, the toolstack has made a mistake. Equally too for >> RSBA && EIBRS. >> >> I think this is going to take more coffee to solve... > > Actually, no. > > I'm going to delete the hunk modifying recalculate_cpuid(), and move > this patch back to the meaning it had in v1 which is just "get the > policies looking correct". Hmm, and how are you intending to achieve that goal without adjusting behind the tool stack's back, and without it being an option (yet) to reject the input? Iirc you agreed with my v1 remark that some fixing that some fixing up is going to be necessary. Since as you say (and as I should have realized when writing the earlier reply) we may never clear RSBA, wouldn't it be an option to clear EIBRS instead? > It's still not supported for the toolstack to request ARCH_CAPS (the "a" > marking), and the safely logic for that can come in a subsequent series > along with the unit(ish) testing I was already planning to do. Yet it not being supported doesn't mean we should leave a broken result when we can fix things up (for the time being, i.e. until we can report back that we don't like this feature combination). Jan
On 02.06.2023 17:29, Andrew Cooper wrote: > On 02/06/2023 11:20 am, Jan Beulich wrote: >> On 01.06.2023 16:48, Andrew Cooper wrote: >>> @@ -786,6 +825,20 @@ void recalculate_cpuid_policy(struct domain *d) >>> >>> sanitise_featureset(fs); >>> >>> + /* >>> + * If the host suffers from RSBA of any form, and the guest can see >>> + * MSR_ARCH_CAPS, reflect the appropriate RSBA/RRSBA property to the guest >>> + * depending on the visibility of eIBRS. >>> + */ >>> + if ( test_bit(X86_FEATURE_ARCH_CAPS, fs) && >>> + (cpu_has_rsba || cpu_has_rrsba) ) >>> + { >>> + bool eibrs = test_bit(X86_FEATURE_EIBRS, fs); >>> + >>> + __set_bit(eibrs ? X86_FEATURE_RRSBA >>> + : X86_FEATURE_RSBA, fs); >>> + } >> Now that we have the same code and comment even a 3rd time, surely this >> wants to be put in a helper? > > I did consider that, and chose not to in this case. > > One of these is going to disappear again in due course, when we start > handing errors back to the toolstack instead of fixing up behind it. > > The requirement to be after sanitise_featureset() is critically > important here for safety, and out-of-lining makes that connection less > obvious. > > I considered having guest_common_default_late_feature_adjustments(), but > that name is getting silly and it's already somewhat hard to navigate. Well, okay then. But may I then please ask that the three instances each cross-reference one another, so touching one will stand a chance of the others also being touched? >> Similarly what about a tool stack (and ultimately admin) request setting >> both RSBA and RRSBA? Wouldn't we better clear the respectively wrong bit >> then? People may do this in their guest configs "just to be on the safe >> side" (once expressing this in guest configs is possible, of course), >> and due to the max policies having both bits set this also won't occur >> "automatically". > > The only reason this series doesn't have a final patch turning > ARCH_CAPS's "a" into "A" is because libxl can't currently operate these > bits at all, let alone safely. Roger is kindly looking into that side > of things. > > It is an error to be modifying bits behind the toolstack's back to start > with. We get away with it previously because hiding bits that the > toolstack thinks the guest saw is goes in the safe direction WRT > migrate. But no more with the semantics of RSBA/RRSBA. > > I explicitly don't care about people wanting to set RSBA && RRSBA "just > in case" - this is too complicated already. The only non-default thing > an admin needs to be able to express is +rsba,-eibrs,-rrsba to mean > "please be compatible with pre-EIBRS hardware". (In reality, there will > also need to be some FOO_NO bits taken out too, depending on the CPUs in > question.) Not caring about such people would mean documenting very clearly that the two bits used together is not supported (and then also their respective invalid combinations with EIBRS). Otherwise I'm afraid "such people" would likely include one of our bigger customers. Jan
diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c index ee256ff5a137..f3bcb1ea4101 100644 --- a/xen/arch/x86/cpu-policy.c +++ b/xen/arch/x86/cpu-policy.c @@ -423,8 +423,17 @@ static void __init guest_common_max_feature_adjustments(uint32_t *fs) * Retpoline not safe)", so these need to be visible to a guest in all * cases, even when it's only some other server in the pool which * suffers the identified behaviour. + * + * We can always run any VM which has previously (or will + * subsequently) run on hardware where Retpoline is not safe. + * Note: + * - The dependency logic may hide RRSBA for other reasons. + * - The max policy does not contitute a sensible configuration to + * run a guest in. */ __set_bit(X86_FEATURE_ARCH_CAPS, fs); + __set_bit(X86_FEATURE_RSBA, fs); + __set_bit(X86_FEATURE_RRSBA, fs); } } @@ -532,6 +541,21 @@ static void __init calculate_pv_def_policy(void) guest_common_default_feature_adjustments(fs); sanitise_featureset(fs); + + /* + * If the host suffers from RSBA of any form, and the guest can see + * MSR_ARCH_CAPS, reflect the appropriate RSBA/RRSBA property to the guest + * depending on the visibility of eIBRS. + */ + if ( test_bit(X86_FEATURE_ARCH_CAPS, fs) && + (cpu_has_rsba || cpu_has_rrsba) ) + { + bool eibrs = test_bit(X86_FEATURE_EIBRS, fs); + + __set_bit(eibrs ? X86_FEATURE_RRSBA + : X86_FEATURE_RSBA, fs); + } + x86_cpu_featureset_to_policy(fs, p); recalculate_xstate(p); } @@ -664,6 +688,21 @@ static void __init calculate_hvm_def_policy(void) __set_bit(X86_FEATURE_VIRT_SSBD, fs); sanitise_featureset(fs); + + /* + * If the host suffers from RSBA of any form, and the guest can see + * MSR_ARCH_CAPS, reflect the appropriate RSBA/RRSBA property to the guest + * depending on the visibility of eIBRS. + */ + if ( test_bit(X86_FEATURE_ARCH_CAPS, fs) && + (cpu_has_rsba || cpu_has_rrsba) ) + { + bool eibrs = test_bit(X86_FEATURE_EIBRS, fs); + + __set_bit(eibrs ? X86_FEATURE_RRSBA + : X86_FEATURE_RSBA, fs); + } + x86_cpu_featureset_to_policy(fs, p); recalculate_xstate(p); } @@ -786,6 +825,20 @@ void recalculate_cpuid_policy(struct domain *d) sanitise_featureset(fs); + /* + * If the host suffers from RSBA of any form, and the guest can see + * MSR_ARCH_CAPS, reflect the appropriate RSBA/RRSBA property to the guest + * depending on the visibility of eIBRS. + */ + if ( test_bit(X86_FEATURE_ARCH_CAPS, fs) && + (cpu_has_rsba || cpu_has_rrsba) ) + { + bool eibrs = test_bit(X86_FEATURE_EIBRS, fs); + + __set_bit(eibrs ? X86_FEATURE_RRSBA + : X86_FEATURE_RSBA, fs); + } + /* Fold host's FDP_EXCP_ONLY and NO_FPU_SEL into guest's view. */ fs[FEATURESET_7b0] &= ~(cpufeat_mask(X86_FEATURE_FDP_EXCP_ONLY) | cpufeat_mask(X86_FEATURE_NO_FPU_SEL)); diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h index 4edf9aba7ff6..a0e46138d763 100644 --- a/xen/include/public/arch-x86/cpufeatureset.h +++ b/xen/include/public/arch-x86/cpufeatureset.h @@ -311,7 +311,7 @@ XEN_CPUFEATURE(CET_SSS, 15*32+18) /* CET Supervisor Shadow Stacks s /* Intel-defined CPU features, MSR_ARCH_CAPS 0x10a.eax, word 16 */ XEN_CPUFEATURE(RDCL_NO, 16*32+ 0) /*A No Rogue Data Cache Load (Meltdown) */ XEN_CPUFEATURE(EIBRS, 16*32+ 1) /*A Enhanced IBRS */ -XEN_CPUFEATURE(RSBA, 16*32+ 2) /*!A RSB Alternative (Retpoline not safe) */ +XEN_CPUFEATURE(RSBA, 16*32+ 2) /*! RSB Alternative (Retpoline not safe) */ XEN_CPUFEATURE(SKIP_L1DFL, 16*32+ 3) /* Don't need to flush L1D on VMEntry */ XEN_CPUFEATURE(INTEL_SSB_NO, 16*32+ 4) /*A No Speculative Store Bypass */ XEN_CPUFEATURE(MDS_NO, 16*32+ 5) /*A No Microarchitectural Data Sampling */ @@ -327,7 +327,7 @@ XEN_CPUFEATURE(FBSDP_NO, 16*32+14) /*A No Fill Buffer Stale Data Prop XEN_CPUFEATURE(PSDP_NO, 16*32+15) /*A No Primary Stale Data Propagation */ XEN_CPUFEATURE(FB_CLEAR, 16*32+17) /*A Fill Buffers cleared by VERW */ XEN_CPUFEATURE(FB_CLEAR_CTRL, 16*32+18) /* MSR_OPT_CPU_CTRL.FB_CLEAR_DIS */ -XEN_CPUFEATURE(RRSBA, 16*32+19) /*!A Restricted RSB Alternative */ +XEN_CPUFEATURE(RRSBA, 16*32+19) /*! Restricted RSB Alternative */ XEN_CPUFEATURE(BHI_NO, 16*32+20) /*A No Branch History Injection */ XEN_CPUFEATURE(XAPIC_STATUS, 16*32+21) /* MSR_XAPIC_DISABLE_STATUS */ XEN_CPUFEATURE(OVRCLK_STATUS, 16*32+23) /* MSR_OVERCLOCKING_STATUS */ diff --git a/xen/tools/gen-cpuid.py b/xen/tools/gen-cpuid.py index 973fcc1c64e8..72cf11654ba9 100755 --- a/xen/tools/gen-cpuid.py +++ b/xen/tools/gen-cpuid.py @@ -318,7 +318,7 @@ def crunch_numbers(state): # IBRSB/IBRS, and we pass this MSR directly to guests. Treating them # as dependent features simplifies Xen's logic, and prevents the guest # from seeing implausible configurations. - IBRSB: [STIBP, SSBD, INTEL_PSFD], + IBRSB: [STIBP, SSBD, INTEL_PSFD, EIBRS], IBRS: [AMD_STIBP, AMD_SSBD, PSFD, AUTO_IBRS, IBRS_ALWAYS, IBRS_FAST, IBRS_SAME_MODE], AMD_STIBP: [STIBP_ALWAYS], @@ -328,6 +328,9 @@ def crunch_numbers(state): # The ARCH_CAPS CPUID bit enumerates the availability of the whole register. ARCH_CAPS: list(range(RDCL_NO, RDCL_NO + 64)), + + # The behaviour described by RRSBA depend on eIBRS being active. + EIBRS: [RRSBA], } deep_features = tuple(sorted(deps.keys()))
The RSBA bit, "RSB Alternative", means that the RSB may use alternative predictors when empty. From a practical point of view, this mean "Retpoline not safe". Enhanced IBRS (officially IBRS_ALL in Intel's docs, previously IBRS_ATT) is a statement that IBRS is implemented in hardware (as opposed to the form retrofitted to existing CPUs in microcode). The RRSBA bit, "Restricted-RSBA", is a combination of RSBA, and the eIBRS property that predictions are tagged with the mode in which they were learnt. Therefore, it means "when eIBRS is active, the RSB may fall back to alternative predictors but restricted to the current prediction mode". As such, it's stronger statement than RSBA, but still means "Retpoline not safe". CPUs are not expected to enumerate both RSBA and RRSBA. Add feature dependencies for EIBRS and RRSBA. While technically they're not linked, absolutely nothing good can of letting the guest see RRSBA without EIBRS. Nor can anything good come of a guest seeing EIBRS without IBRSB. Furthermore, we use this dependency to simplify the max derivation logic. The max policies gets RSBA and RRSBA unconditionally set (with the EIBRS dependency maybe hiding RRSBA). We can run any VM, even if it has been told "somewhere you might run, Retpoline isn't safe". The default policies are more complicated. A guest shouldn't see both bits, but it needs to see one if the current host suffers from any form of RSBA, and which bit it needs to see depends on whether eIBRS is visible or not. Therefore, the calculation must be performed after sanitise_featureset(). Finally, apply the same logic in recalculate_cpuid_policy(), as we do for other safety settings while we're still overhauling the toolstack logic in this area. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> --- CC: Jan Beulich <JBeulich@suse.com> CC: Roger Pau Monné <roger.pau@citrix.com> CC: Wei Liu <wl@xen.org> v2: * Expand/adjust the comment for the max features. * Rewrite the default feature derivation in light of new information. * Fix up in recalculate_cpuid_policy() too. --- xen/arch/x86/cpu-policy.c | 53 +++++++++++++++++++++ xen/include/public/arch-x86/cpufeatureset.h | 4 +- xen/tools/gen-cpuid.py | 5 +- 3 files changed, 59 insertions(+), 3 deletions(-)