Message ID | 20240209114045.97005-1-roger.pau@citrix.com
---|---
State | Superseded
Series | [v2] x86/vmx: add support for virtualize SPEC_CTRL
On 09.02.2024 12:40, Roger Pau Monne wrote:
> @@ -1378,6 +1379,10 @@ static int construct_vmcs(struct vcpu *v)
>          rc = vmx_add_msr(v, MSR_PRED_CMD, PRED_CMD_IBPB,
>                           VMX_MSR_HOST);
>
> +    /* Set any bits we don't allow toggling in the mask field. */
> +    if ( cpu_has_vmx_virt_spec_ctrl && v->arch.msrs->spec_ctrl.raw )
> +        __vmwrite(SPEC_CTRL_MASK, v->arch.msrs->spec_ctrl.raw);

The right side of the conditional isn't strictly necessary here, is it?
Might it be better to omit it, for clarity?

> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -823,18 +823,29 @@ static void cf_check vmx_cpuid_policy_changed(struct vcpu *v)
>      {
>          vmx_clear_msr_intercept(v, MSR_SPEC_CTRL, VMX_MSR_RW);
>
> -        rc = vmx_add_guest_msr(v, MSR_SPEC_CTRL, 0);
> -        if ( rc )
> -            goto out;
> +        if ( !cpu_has_vmx_virt_spec_ctrl )
> +        {
> +            rc = vmx_add_guest_msr(v, MSR_SPEC_CTRL, 0);
> +            if ( rc )
> +                goto out;
> +        }
>      }
>      else
>      {
>          vmx_set_msr_intercept(v, MSR_SPEC_CTRL, VMX_MSR_RW);
>
> -        rc = vmx_del_msr(v, MSR_SPEC_CTRL, VMX_MSR_GUEST);
> -        if ( rc && rc != -ESRCH )
> -            goto out;
> -        rc = 0; /* Tolerate -ESRCH */
> +        /*
> +         * NB: there's no need to clear the virtualize SPEC_CTRL control, as
> +         * the MSR intercept takes precedence. The SPEC_CTRL shadow VMCS field
> +         * is also not loaded on guest entry/exit if the intercept is set.
> +         */

It wasn't so much the shadow field as the mask one that I was concerned
might be used in some way. The shadow one clearly is used only during
guest RDMSR/WRMSR processing. To not focus on "shadow", maybe simply say
"The SPEC_CTRL shadow VMCS fields are also not ..."?

> +        if ( !cpu_has_vmx_virt_spec_ctrl )
> +        {
> +            rc = vmx_del_msr(v, MSR_SPEC_CTRL, VMX_MSR_GUEST);
> +            if ( rc && rc != -ESRCH )
> +                goto out;
> +            rc = 0; /* Tolerate -ESRCH */
> +        }
>      }
>
>      /* MSR_PRED_CMD is safe to pass through if the guest knows about it. */
> @@ -2629,6 +2640,9 @@ static uint64_t cf_check vmx_get_reg(struct vcpu *v, unsigned int reg)
>      switch ( reg )
>      {
>      case MSR_SPEC_CTRL:
> +        if ( cpu_has_vmx_virt_spec_ctrl )
> +            /* Requires remote VMCS loaded - fetched below. */

I could see what "fetch" refers to here, but ...

> +            break;
>          rc = vmx_read_guest_msr(v, reg, &val);
>          if ( rc )
>          {
> @@ -2652,6 +2666,11 @@ static uint64_t cf_check vmx_get_reg(struct vcpu *v, unsigned int reg)
>      vmx_vmcs_enter(v);
>      switch ( reg )
>      {
> +    case MSR_SPEC_CTRL:
> +        ASSERT(cpu_has_vmx_virt_spec_ctrl);
> +        __vmread(SPEC_CTRL_SHADOW, &val);
> +        break;
> +
>      case MSR_IA32_BNDCFGS:
>          __vmread(GUEST_BNDCFGS, &val);
>          break;
> @@ -2678,6 +2697,9 @@ static void cf_check vmx_set_reg(struct vcpu *v, unsigned int reg, uint64_t val)
>      switch ( reg )
>      {
>      case MSR_SPEC_CTRL:
> +        if ( cpu_has_vmx_virt_spec_ctrl )
> +            /* Requires remote VMCS loaded - fetched below. */

... since you also use the word here, I'm not sure it's really
the VMREAD up there.

Jan
On Mon, Feb 12, 2024 at 03:09:01PM +0100, Jan Beulich wrote:
> On 09.02.2024 12:40, Roger Pau Monne wrote:
> > @@ -1378,6 +1379,10 @@ static int construct_vmcs(struct vcpu *v)
> >          rc = vmx_add_msr(v, MSR_PRED_CMD, PRED_CMD_IBPB,
> >                           VMX_MSR_HOST);
> >
> > +    /* Set any bits we don't allow toggling in the mask field. */
> > +    if ( cpu_has_vmx_virt_spec_ctrl && v->arch.msrs->spec_ctrl.raw )
> > +        __vmwrite(SPEC_CTRL_MASK, v->arch.msrs->spec_ctrl.raw);
>
> The right side of the conditional isn't strictly necessary here, is it?
> Might it be better to omit it, for clarity?

No strong opinion; my thinking was that skipping the vmwrite would be
better performance-wise, but we don't care about performance here anyway.

> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -823,18 +823,29 @@ static void cf_check vmx_cpuid_policy_changed(struct vcpu *v)
> >      {
> >          vmx_clear_msr_intercept(v, MSR_SPEC_CTRL, VMX_MSR_RW);
> >
> > -        rc = vmx_add_guest_msr(v, MSR_SPEC_CTRL, 0);
> > -        if ( rc )
> > -            goto out;
> > +        if ( !cpu_has_vmx_virt_spec_ctrl )
> > +        {
> > +            rc = vmx_add_guest_msr(v, MSR_SPEC_CTRL, 0);
> > +            if ( rc )
> > +                goto out;
> > +        }
> >      }
> >      else
> >      {
> >          vmx_set_msr_intercept(v, MSR_SPEC_CTRL, VMX_MSR_RW);
> >
> > -        rc = vmx_del_msr(v, MSR_SPEC_CTRL, VMX_MSR_GUEST);
> > -        if ( rc && rc != -ESRCH )
> > -            goto out;
> > -        rc = 0; /* Tolerate -ESRCH */
> > +        /*
> > +         * NB: there's no need to clear the virtualize SPEC_CTRL control, as
> > +         * the MSR intercept takes precedence. The SPEC_CTRL shadow VMCS field
> > +         * is also not loaded on guest entry/exit if the intercept is set.
> > +         */
>
> It wasn't so much the shadow field as the mask one that I was concerned
> might be used in some way. The shadow one clearly is used only during
> guest RDMSR/WRMSR processing. To not focus on "shadow", maybe simply say
> "The SPEC_CTRL shadow VMCS fields are also not ..."?

What about:

"The SPEC_CTRL shadow and mask VMCS fields don't take effect if the
intercept is set."

> > +        if ( !cpu_has_vmx_virt_spec_ctrl )
> > +        {
> > +            rc = vmx_del_msr(v, MSR_SPEC_CTRL, VMX_MSR_GUEST);
> > +            if ( rc && rc != -ESRCH )
> > +                goto out;
> > +            rc = 0; /* Tolerate -ESRCH */
> > +        }
> >      }
> >
> >      /* MSR_PRED_CMD is safe to pass through if the guest knows about it. */
> > @@ -2629,6 +2640,9 @@ static uint64_t cf_check vmx_get_reg(struct vcpu *v, unsigned int reg)
> >      switch ( reg )
> >      {
> >      case MSR_SPEC_CTRL:
> > +        if ( cpu_has_vmx_virt_spec_ctrl )
> > +            /* Requires remote VMCS loaded - fetched below. */
>
> I could see what "fetch" refers to here, but ...
>
> > +            break;
> >          rc = vmx_read_guest_msr(v, reg, &val);
> >          if ( rc )
> >          {
> > @@ -2652,6 +2666,11 @@ static uint64_t cf_check vmx_get_reg(struct vcpu *v, unsigned int reg)
> >      vmx_vmcs_enter(v);
> >      switch ( reg )
> >      {
> > +    case MSR_SPEC_CTRL:
> > +        ASSERT(cpu_has_vmx_virt_spec_ctrl);
> > +        __vmread(SPEC_CTRL_SHADOW, &val);
> > +        break;
> > +
> >      case MSR_IA32_BNDCFGS:
> >          __vmread(GUEST_BNDCFGS, &val);
> >          break;
> > @@ -2678,6 +2697,9 @@ static void cf_check vmx_set_reg(struct vcpu *v, unsigned int reg, uint64_t val)
> >      switch ( reg )
> >      {
> >      case MSR_SPEC_CTRL:
> > +        if ( cpu_has_vmx_virt_spec_ctrl )
> > +            /* Requires remote VMCS loaded - fetched below. */
>
> ... since you also use the word here, I'm not sure it's really
> the VMREAD up there.

That one should be 'set below'.

Thanks, Roger.
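For reference, if a v3 follows this suggestion, the construct_vmcs() hunk
would presumably shrink to something like the following sketch (inferred from
the discussion above, not posted code):

    /* Set any bits we don't allow toggling in the mask field. */
    if ( cpu_has_vmx_virt_spec_ctrl )
        __vmwrite(SPEC_CTRL_MASK, v->arch.msrs->spec_ctrl.raw);

i.e. one extra VMWRITE of a possibly-zero mask at domain-creation time in
exchange for a simpler condition, matching the reasoning that performance
doesn't matter here.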
On 15.02.2024 16:54, Roger Pau Monné wrote:
> On Mon, Feb 12, 2024 at 03:09:01PM +0100, Jan Beulich wrote:
>> On 09.02.2024 12:40, Roger Pau Monne wrote:
>>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>>> @@ -823,18 +823,29 @@ static void cf_check vmx_cpuid_policy_changed(struct vcpu *v)
>>>      {
>>>          vmx_clear_msr_intercept(v, MSR_SPEC_CTRL, VMX_MSR_RW);
>>>
>>> -        rc = vmx_add_guest_msr(v, MSR_SPEC_CTRL, 0);
>>> -        if ( rc )
>>> -            goto out;
>>> +        if ( !cpu_has_vmx_virt_spec_ctrl )
>>> +        {
>>> +            rc = vmx_add_guest_msr(v, MSR_SPEC_CTRL, 0);
>>> +            if ( rc )
>>> +                goto out;
>>> +        }
>>>      }
>>>      else
>>>      {
>>>          vmx_set_msr_intercept(v, MSR_SPEC_CTRL, VMX_MSR_RW);
>>>
>>> -        rc = vmx_del_msr(v, MSR_SPEC_CTRL, VMX_MSR_GUEST);
>>> -        if ( rc && rc != -ESRCH )
>>> -            goto out;
>>> -        rc = 0; /* Tolerate -ESRCH */
>>> +        /*
>>> +         * NB: there's no need to clear the virtualize SPEC_CTRL control, as
>>> +         * the MSR intercept takes precedence. The SPEC_CTRL shadow VMCS field
>>> +         * is also not loaded on guest entry/exit if the intercept is set.
>>> +         */
>>
>> It wasn't so much the shadow field as the mask one that I was concerned
>> might be used in some way. The shadow one clearly is used only during
>> guest RDMSR/WRMSR processing. To not focus on "shadow", maybe simply say
>> "The SPEC_CTRL shadow VMCS fields are also not ..."?
>
> What about:
>
> "The SPEC_CTRL shadow and mask VMCS fields don't take effect if the
> intercept is set."

SGTM.

Jan
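Folding the agreed wording back in, the comment in vmx_cpuid_policy_changed()
would presumably read as follows in a v3 (inferred from the thread, not
posted code):

    /*
     * NB: there's no need to clear the virtualize SPEC_CTRL control, as
     * the MSR intercept takes precedence.  The SPEC_CTRL shadow and mask
     * VMCS fields don't take effect if the intercept is set.
     */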
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 9e016634ab5c..dc46adb02595 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -202,6 +202,7 @@ static void __init vmx_display_features(void)
     P(cpu_has_vmx_tsc_scaling, "TSC Scaling");
     P(cpu_has_vmx_bus_lock_detection, "Bus Lock Detection");
     P(cpu_has_vmx_notify_vm_exiting, "Notify VM Exit");
+    P(cpu_has_vmx_virt_spec_ctrl, "Virtualize SPEC_CTRL");
 #undef P
 
     if ( !printed )
@@ -365,7 +366,7 @@ static int vmx_init_vmcs_config(bool bsp)
 
     if ( _vmx_cpu_based_exec_control & CPU_BASED_ACTIVATE_TERTIARY_CONTROLS )
     {
-        uint64_t opt = 0;
+        uint64_t opt = TERTIARY_EXEC_VIRT_SPEC_CTRL;
 
         _vmx_tertiary_exec_control = adjust_vmx_controls2(
             "Tertiary Exec Control", 0, opt,
@@ -1378,6 +1379,10 @@ static int construct_vmcs(struct vcpu *v)
         rc = vmx_add_msr(v, MSR_PRED_CMD, PRED_CMD_IBPB,
                          VMX_MSR_HOST);
 
+    /* Set any bits we don't allow toggling in the mask field. */
+    if ( cpu_has_vmx_virt_spec_ctrl && v->arch.msrs->spec_ctrl.raw )
+        __vmwrite(SPEC_CTRL_MASK, v->arch.msrs->spec_ctrl.raw);
+
  out:
     vmx_vmcs_exit(v);
 
@@ -2086,6 +2091,9 @@ void vmcs_dump_vcpu(struct vcpu *v)
     if ( v->arch.hvm.vmx.secondary_exec_control &
          SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY )
         printk("InterruptStatus = %04x\n", vmr16(GUEST_INTR_STATUS));
+    if ( cpu_has_vmx_virt_spec_ctrl )
+        printk("SPEC_CTRL mask = 0x%016lx  shadow = 0x%016lx\n",
+               vmr(SPEC_CTRL_MASK), vmr(SPEC_CTRL_SHADOW));
 
     printk("*** Host State ***\n");
     printk("RIP = 0x%016lx (%ps)  RSP = 0x%016lx\n",
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 48376cc32751..33cffb4f8747 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -823,18 +823,29 @@ static void cf_check vmx_cpuid_policy_changed(struct vcpu *v)
     {
         vmx_clear_msr_intercept(v, MSR_SPEC_CTRL, VMX_MSR_RW);
 
-        rc = vmx_add_guest_msr(v, MSR_SPEC_CTRL, 0);
-        if ( rc )
-            goto out;
+        if ( !cpu_has_vmx_virt_spec_ctrl )
+        {
+            rc = vmx_add_guest_msr(v, MSR_SPEC_CTRL, 0);
+            if ( rc )
+                goto out;
+        }
     }
     else
     {
         vmx_set_msr_intercept(v, MSR_SPEC_CTRL, VMX_MSR_RW);
 
-        rc = vmx_del_msr(v, MSR_SPEC_CTRL, VMX_MSR_GUEST);
-        if ( rc && rc != -ESRCH )
-            goto out;
-        rc = 0; /* Tolerate -ESRCH */
+        /*
+         * NB: there's no need to clear the virtualize SPEC_CTRL control, as
+         * the MSR intercept takes precedence. The SPEC_CTRL shadow VMCS field
+         * is also not loaded on guest entry/exit if the intercept is set.
+         */
+        if ( !cpu_has_vmx_virt_spec_ctrl )
+        {
+            rc = vmx_del_msr(v, MSR_SPEC_CTRL, VMX_MSR_GUEST);
+            if ( rc && rc != -ESRCH )
+                goto out;
+            rc = 0; /* Tolerate -ESRCH */
+        }
     }
 
     /* MSR_PRED_CMD is safe to pass through if the guest knows about it. */
@@ -2629,6 +2640,9 @@ static uint64_t cf_check vmx_get_reg(struct vcpu *v, unsigned int reg)
     switch ( reg )
     {
     case MSR_SPEC_CTRL:
+        if ( cpu_has_vmx_virt_spec_ctrl )
+            /* Requires remote VMCS loaded - fetched below. */
+            break;
         rc = vmx_read_guest_msr(v, reg, &val);
         if ( rc )
         {
@@ -2652,6 +2666,11 @@ static uint64_t cf_check vmx_get_reg(struct vcpu *v, unsigned int reg)
     vmx_vmcs_enter(v);
     switch ( reg )
     {
+    case MSR_SPEC_CTRL:
+        ASSERT(cpu_has_vmx_virt_spec_ctrl);
+        __vmread(SPEC_CTRL_SHADOW, &val);
+        break;
+
     case MSR_IA32_BNDCFGS:
         __vmread(GUEST_BNDCFGS, &val);
         break;
@@ -2678,6 +2697,9 @@ static void cf_check vmx_set_reg(struct vcpu *v, unsigned int reg, uint64_t val)
     switch ( reg )
     {
     case MSR_SPEC_CTRL:
+        if ( cpu_has_vmx_virt_spec_ctrl )
+            /* Requires remote VMCS loaded - fetched below. */
+            break;
         rc = vmx_write_guest_msr(v, reg, val);
         if ( rc )
         {
@@ -2698,6 +2720,11 @@ static void cf_check vmx_set_reg(struct vcpu *v, unsigned int reg, uint64_t val)
     vmx_vmcs_enter(v);
     switch ( reg )
     {
+    case MSR_SPEC_CTRL:
+        ASSERT(cpu_has_vmx_virt_spec_ctrl);
+        __vmwrite(SPEC_CTRL_SHADOW, val);
+        break;
+
     case MSR_IA32_BNDCFGS:
         __vmwrite(GUEST_BNDCFGS, val);
         break;
diff --git a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
index a7dd2eeffcad..58140af69153 100644
--- a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
+++ b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
@@ -270,6 +270,9 @@ extern u32 vmx_secondary_exec_control;
 #define TERTIARY_EXEC_VIRT_SPEC_CTRL            BIT(7, UL)
 extern uint64_t vmx_tertiary_exec_control;
 
+#define cpu_has_vmx_virt_spec_ctrl \
+    (vmx_tertiary_exec_control & TERTIARY_EXEC_VIRT_SPEC_CTRL)
+
 #define VMX_EPT_EXEC_ONLY_SUPPORTED             0x00000001
 #define VMX_EPT_WALK_LENGTH_4_SUPPORTED         0x00000040
 #define VMX_EPT_MEMORY_TYPE_UC                  0x00000100
@@ -436,6 +439,8 @@ enum vmcs_field {
     XSS_EXIT_BITMAP                 = 0x0000202c,
     TSC_MULTIPLIER                  = 0x00002032,
     TERTIARY_VM_EXEC_CONTROL        = 0x00002034,
+    SPEC_CTRL_MASK                  = 0x0000204a,
+    SPEC_CTRL_SHADOW                = 0x0000204c,
     GUEST_PHYSICAL_ADDRESS          = 0x00002400,
     VMCS_LINK_POINTER               = 0x00002800,
     GUEST_IA32_DEBUGCTL             = 0x00002802,
diff --git a/xen/arch/x86/include/asm/msr.h b/xen/arch/x86/include/asm/msr.h
index 1d8ea9f26faa..eed7b36cd992 100644
--- a/xen/arch/x86/include/asm/msr.h
+++ b/xen/arch/x86/include/asm/msr.h
@@ -302,8 +302,13 @@ struct vcpu_msrs
      * For PV guests, this holds the guest kernel value.  It is accessed on
      * every entry/exit path.
      *
-     * For VT-x guests, the guest value is held in the MSR guest load/save
-     * list.
+     * For VT-x guests, the guest value is held in the MSR guest load/save list
+     * if there's no support for virtualized SPEC_CTRL.  If virtualized
+     * SPEC_CTRL is enabled the value here signals which bits in SPEC_CTRL the
+     * guest is not able to modify.  Note that the value for those bits used in
+     * Xen context is also used in the guest context.  Setting a bit here
+     * doesn't force such bit to set in the guest context unless also set in
+     * Xen selection of SPEC_CTRL.
      *
      * For SVM, the guest value lives in the VMCB, and hardware saves/restores
      * the host value automatically.  However, guests run with the OR of the
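As an aside on the vmx_get_reg()/vmx_set_reg() hunks: both functions are
split into a first switch that runs without the VMCS loaded, and a second
switch bracketed by vmx_vmcs_enter()/vmx_vmcs_exit() for state that may live
in a remote vCPU's VMCS; the "fetched below" comments point at that second
switch. A condensed sketch of the resulting get path (control flow only; the
error printk, other registers and the gs_shadow handling are omitted):

    static uint64_t cf_check vmx_get_reg(struct vcpu *v, unsigned int reg)
    {
        uint64_t val = 0;
        int rc;

        /* Logic which doesn't require the (possibly remote) VMCS loaded. */
        switch ( reg )
        {
        case MSR_SPEC_CTRL:
            if ( cpu_has_vmx_virt_spec_ctrl )
                /* Value lives in the VMCS - read it below instead. */
                break;
            rc = vmx_read_guest_msr(v, reg, &val);
            if ( rc )
                domain_crash(v->domain);
            return val;
        }

        /* Logic which may require loading a remote vCPU's VMCS. */
        vmx_vmcs_enter(v);
        switch ( reg )
        {
        case MSR_SPEC_CTRL:
            ASSERT(cpu_has_vmx_virt_spec_ctrl);
            __vmread(SPEC_CTRL_SHADOW, &val);
            break;
        }
        vmx_vmcs_exit(v);

        return val;
    }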
The feature is defined in the tertiary exec control, and is available
starting from Sapphire Rapids and Alder Lake CPUs.

When enabled, two extra VMCS fields are used: SPEC_CTRL mask and shadow.
Bits set in the mask are not allowed to be toggled by the guest (either set
or cleared), and the value in the shadow field is the value the guest
expects to be in the SPEC_CTRL register.

Using this, the hypervisor can force the value of SPEC_CTRL bits behind the
guest's back without having to trap all accesses to SPEC_CTRL; note that no
bits are forced into the guest as part of this patch.  It also allows
getting rid of SPEC_CTRL in the guest MSR load list, since the value in the
shadow field will be loaded by the hardware on vmentry.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v1:
 - Expand commit message and code comments.
 - Prefix the output of the VMCS dump with '0x'.
---
 xen/arch/x86/hvm/vmx/vmcs.c             | 10 +++++-
 xen/arch/x86/hvm/vmx/vmx.c              | 41 ++++++++++++++++++++-----
 xen/arch/x86/include/asm/hvm/vmx/vmcs.h |  5 +++
 xen/arch/x86/include/asm/msr.h          |  9 ++++--
 4 files changed, 55 insertions(+), 10 deletions(-)
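For readers unfamiliar with the feature, a small illustrative model of the
mask/shadow behaviour described in the commit message may help. This is an
assumption-laden sketch derived from the description above only (the struct
and helper names are made up; the Intel SDM is the authoritative reference
for the exact hardware semantics):

    #include <stdint.h>

    /* Hypothetical model of the two new VMCS fields plus the real MSR. */
    struct virt_spec_ctrl {
        uint64_t mask;    /* SPEC_CTRL_MASK: bits the guest may not toggle */
        uint64_t shadow;  /* SPEC_CTRL_SHADOW: value the guest observes */
        uint64_t real;    /* effective MSR_SPEC_CTRL while the guest runs */
    };

    /* Non-intercepted guest WRMSR: masked bits keep their current value. */
    static void guest_wrmsr(struct virt_spec_ctrl *v, uint64_t val)
    {
        v->real = (val & ~v->mask) | (v->real & v->mask);
        v->shadow = val;    /* the guest later reads back what it wrote */
    }

    /* Non-intercepted guest RDMSR: the guest sees the shadow value, not
     * the possibly hypervisor-adjusted real one. */
    static uint64_t guest_rdmsr(const struct virt_spec_ctrl *v)
    {
        return v->shadow;
    }

As the commit message notes, the shadow value is also what the guest's
SPEC_CTRL is loaded from on vmentry, which is what makes the MSR load/save
list entry unnecessary.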