Message ID | 20220113163833.3831-2-andrew.cooper3@citrix.com
---|---
State | Superseded
Series | x86/spec-ctrl: Fix NMI race condition
On Thu, Jan 13, 2022 at 04:38:31PM +0000, Andrew Cooper wrote:
> In order to fix a VT-x bug, and support MSR_SPEC_CTRL on AMD, there will need
> to be three different access methods for where the guest's value lives.
> However, it would be better not to duplicate the #GP checking logic.
>
> guest_{rd,wr}msr() are always called first in the PV and HVM MSR paths, so we
> can repurpose X86EMUL_UNHANDLEABLE slightly for this. This is going to be a
> common pattern for other MSRs too in the future.
>
> Duplicate the msrs->spec_ctrl.raw accesses in the PV and VT-x paths for now.
> The SVM path is currently unreachable because of the CPUID policy.
>
> No functional change.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

It feels a bit weird to have the checks separated from the actual access to
the MSR data, but that's likely better than to replicate the checks in both
the PV and HVM vendor paths.

Thanks, Roger.
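The split Roger is commenting on can be summarised with a minimal, self-contained C sketch. This is illustrative only: the flat variables and helper names below (cp_feat_ibrsb, msrs_spec_ctrl_raw, vendor_rdmsr(), msr_read()) are simplified stand-ins, not the real Xen code. guest_rdmsr() performs the common #GP checks and returns X86EMUL_UNHANDLEABLE to hand the actual value access to the per-VM-type handler; in the patch below, vmx_msr_read_intercept() and the PV read_msr() hook play the vendor_rdmsr() role.

#include <inttypes.h>
#include <stdio.h>

enum { X86EMUL_OKAY, X86EMUL_EXCEPTION, X86EMUL_UNHANDLEABLE };

#define MSR_SPEC_CTRL 0x00000048

/* Simplified stand-ins for the CPUID policy and per-vCPU MSR state. */
static int cp_feat_ibrsb = 1;
static uint64_t msrs_spec_ctrl_raw;

/* Common path: #GP checks only; the value access is delegated. */
static int guest_rdmsr(unsigned int msr, uint64_t *val)
{
    (void)val;

    switch ( msr )
    {
    case MSR_SPEC_CTRL:
        if ( !cp_feat_ibrsb )
            return X86EMUL_EXCEPTION;    /* caller injects #GP[0] */
        return X86EMUL_UNHANDLEABLE;     /* delegate value to per-VM-type logic */
    }

    return X86EMUL_UNHANDLEABLE;         /* MSR not known to common logic */
}

/* Per-VM-type path (PV / VT-x / SVM): the checks have already been done. */
static int vendor_rdmsr(unsigned int msr, uint64_t *val)
{
    switch ( msr )
    {
    case MSR_SPEC_CTRL:
        *val = msrs_spec_ctrl_raw;
        return X86EMUL_OKAY;
    }

    return X86EMUL_UNHANDLEABLE;
}

/* The entry point tries the common handler first, then the vendor handler. */
static int msr_read(unsigned int msr, uint64_t *val)
{
    int rc = guest_rdmsr(msr, val);

    if ( rc == X86EMUL_UNHANDLEABLE )
        rc = vendor_rdmsr(msr, val);

    return rc;
}

int main(void)
{
    uint64_t val;
    int rc = msr_read(MSR_SPEC_CTRL, &val);

    printf("rc=%d, MSR_SPEC_CTRL=%#" PRIx64 "\n", rc, val);
    return 0;
}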
On 13.01.2022 17:38, Andrew Cooper wrote:
> In order to fix a VT-x bug, and support MSR_SPEC_CTRL on AMD, there will need
> to be three different access methods for where the guest's value lives.
> However, it would be better not to duplicate the #GP checking logic.
>
> guest_{rd,wr}msr() are always called first in the PV and HVM MSR paths, so we
> can repurpose X86EMUL_UNHANDLEABLE slightly for this. This is going to be a
> common pattern for other MSRs too in the future.

I consider this repurposing risky. Did you consider using e.g.
X86EMUL_DONE or X86EMUL_RETRY instead? Neither of the two is
presently used by the MSR machinery afaics.

What's worse, aren't you making it possible to hit the
ASSERT_UNREACHABLE() in hvm_save_cpu_msrs() as well as in
XEN_DOMCTL_get_vcpu_msrs handling? And have hvm_load_cpu_msrs()
produce -ENXIO and XEN_DOMCTL_set_vcpu_msrs return -EINVAL?

Jan
On 17/01/2022 11:07, Jan Beulich wrote:
> On 13.01.2022 17:38, Andrew Cooper wrote:
>> In order to fix a VT-x bug, and support MSR_SPEC_CTRL on AMD, there will need
>> to be three different access methods for where the guest's value lives.
>> However, it would be better not to duplicate the #GP checking logic.
>>
>> guest_{rd,wr}msr() are always called first in the PV and HVM MSR paths, so we
>> can repurpose X86EMUL_UNHANDLEABLE slightly for this. This is going to be a
>> common pattern for other MSRs too in the future.
> I consider this repurposing risky. Did you consider using e.g.
> X86EMUL_DONE or X86EMUL_RETRY instead? Neither of the two is
> presently used by the MSR machinery afaics.

RETRY is used for the MSRs which can cause a p2m allocation and hit the
paging path. DONE is not remotely appropriate for this purpose.

I also don't want to introduce a new PARTIAL because that is a recipe for
backport bugs.

> What's worse, aren't you making it possible to hit the
> ASSERT_UNREACHABLE() in hvm_save_cpu_msrs() as well as in
> XEN_DOMCTL_get_vcpu_msrs handling? And have hvm_load_cpu_msrs()
> produce -ENXIO and XEN_DOMCTL_set_vcpu_msrs return -EINVAL?

I found this bug after backporting the patches and getting them some wider
testing.

I'm doing a pre-req patch to rework the migration machinery to call into
the pv/hvm paths rather than into guest_{rd,wr}msr() directly.

~Andrew
On 17.01.2022 12:24, Andrew Cooper wrote:
> On 17/01/2022 11:07, Jan Beulich wrote:
>> On 13.01.2022 17:38, Andrew Cooper wrote:
>>> In order to fix a VT-x bug, and support MSR_SPEC_CTRL on AMD, there will need
>>> to be three different access methods for where the guest's value lives.
>>> However, it would be better not to duplicate the #GP checking logic.
>>>
>>> guest_{rd,wr}msr() are always called first in the PV and HVM MSR paths, so we
>>> can repurpose X86EMUL_UNHANDLEABLE slightly for this. This is going to be a
>>> common pattern for other MSRs too in the future.
>> I consider this repurposing risky. Did you consider using e.g.
>> X86EMUL_DONE or X86EMUL_RETRY instead? Neither of the two is
>> presently used by the MSR machinery afaics.
>
> RETRY is used for the MSRs which can cause a p2m allocation and hit the
> paging path. DONE is not remotely appropriate for this purpose.

Well, okay then. I would have said DONE is as (in)appropriate as
UNHANDLEABLE here.

Jan
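The hazard Jan points at can be made concrete with a similarly hedged sketch. This is not the actual hvm_save_cpu_msrs() code; it only models a caller that invokes guest_rdmsr() directly (as the migration and XEN_DOMCTL_get_vcpu_msrs paths do) and assumes only X86EMUL_OKAY or X86EMUL_EXCEPTION can come back. With MSR_SPEC_CTRL now returning X86EMUL_UNHANDLEABLE from the common code, such a caller falls into its "cannot happen" branch, which is what Andrew's planned pre-req rework avoids by sending those callers through the full PV/HVM paths instead.

#include <assert.h>
#include <stdint.h>

enum { X86EMUL_OKAY, X86EMUL_EXCEPTION, X86EMUL_UNHANDLEABLE };

/* Stand-in for guest_rdmsr() after the patch: MSR_SPEC_CTRL's value
 * access is delegated, so a bare call now returns UNHANDLEABLE. */
static int guest_rdmsr(unsigned int msr, uint64_t *val)
{
    (void)msr;
    (void)val;
    return X86EMUL_UNHANDLEABLE;
}

/* Simplified model of a direct caller (e.g. a migration save path)
 * that assumes only OKAY/EXCEPTION can come back from guest_rdmsr(). */
static int save_one_msr(unsigned int msr, uint64_t *out)
{
    switch ( guest_rdmsr(msr, out) )
    {
    case X86EMUL_OKAY:
        return 0;

    case X86EMUL_EXCEPTION:
        return 0;               /* guest doesn't have this MSR; skip it */

    default:
        assert(0);              /* stands in for the ASSERT_UNREACHABLE() Jan refers to */
        return -1;
    }
}

int main(void)
{
    uint64_t val;
    return save_one_msr(0x48 /* MSR_SPEC_CTRL */, &val);
}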
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index a7a0d662342a..28ee6393f11e 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -3128,12 +3128,17 @@ static int is_last_branch_msr(u32 ecx)
 static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
 {
     struct vcpu *curr = current;
+    const struct vcpu_msrs *msrs = curr->arch.msrs;
     uint64_t tmp;
 
     HVM_DBG_LOG(DBG_LEVEL_MSR, "ecx=%#x", msr);
 
     switch ( msr )
     {
+    case MSR_SPEC_CTRL: /* guest_rdmsr() has already performed #GP checks. */
+        *msr_content = msrs->spec_ctrl.raw;
+        break;
+
     case MSR_IA32_SYSENTER_CS:
         __vmread(GUEST_SYSENTER_CS, msr_content);
         break;
@@ -3331,6 +3336,7 @@ void vmx_vlapic_msr_changed(struct vcpu *v)
 static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
 {
     struct vcpu *v = current;
+    struct vcpu_msrs *msrs = v->arch.msrs;
     const struct cpuid_policy *cp = v->domain->arch.cpuid;
 
     HVM_DBG_LOG(DBG_LEVEL_MSR, "ecx=%#x, msr_value=%#"PRIx64, msr, msr_content);
@@ -3339,6 +3345,10 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
     {
         uint64_t rsvd, tmp;
 
+    case MSR_SPEC_CTRL: /* guest_wrmsr() has already performed #GP checks. */
+        msrs->spec_ctrl.raw = msr_content;
+        return X86EMUL_OKAY;
+
     case MSR_IA32_SYSENTER_CS:
         __vmwrite(GUEST_SYSENTER_CS, msr_content);
         break;
diff --git a/xen/arch/x86/include/asm/msr.h b/xen/arch/x86/include/asm/msr.h
index 1d3eca9063a2..0b2176a9bc53 100644
--- a/xen/arch/x86/include/asm/msr.h
+++ b/xen/arch/x86/include/asm/msr.h
@@ -367,10 +367,13 @@ int init_domain_msr_policy(struct domain *d);
 int init_vcpu_msr_policy(struct vcpu *v);
 
 /*
- * Below functions can return X86EMUL_UNHANDLEABLE which means that MSR is
- * not (yet) handled by it and must be processed by legacy handlers. Such
- * behaviour is needed for transition period until all rd/wrmsr are handled
- * by the new MSR infrastructure.
+ * The below functions return X86EMUL_*. Callers are responsible for
+ * converting X86EMUL_EXCEPTION into #GP[0].
+ *
+ * X86EMUL_UNHANDLEABLE means "not everything complete". It could be that:
+ * 1) Common #GP checks have been done, but val access needs delegating to the
+ *    per-VM-type handlers.
+ * 2) The MSR is not handled at all by common logic.
  *
  * These functions are also used by the migration logic, so need to cope with
  * being used outside of v's context.
diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index b834456c7b02..3549630d6699 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -265,8 +265,7 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t *val)
     case MSR_SPEC_CTRL:
         if ( !cp->feat.ibrsb )
             goto gp_fault;
-        *val = msrs->spec_ctrl.raw;
-        break;
+        return X86EMUL_UNHANDLEABLE; /* Delegate value to per-VM-type logic. */
 
     case MSR_INTEL_PLATFORM_INFO:
         *val = mp->platform_info.raw;
@@ -514,8 +513,7 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
         if ( val & rsvd )
             goto gp_fault; /* Rsvd bit set? */
 
-        msrs->spec_ctrl.raw = val;
-        break;
+        return X86EMUL_UNHANDLEABLE; /* Delegate value to per-VM-type logic. */
 
     case MSR_PRED_CMD:
         if ( !cp->feat.ibrsb && !cp->extd.ibpb )
diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
index c78be6d92b21..6644e739209c 100644
--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -875,6 +875,7 @@ static int read_msr(unsigned int reg, uint64_t *val,
                     struct x86_emulate_ctxt *ctxt)
 {
     struct vcpu *curr = current;
+    const struct vcpu_msrs *msrs = curr->arch.msrs;
     const struct domain *currd = curr->domain;
     const struct cpuid_policy *cp = currd->arch.cpuid;
     bool vpmu_msr = false, warn = false;
@@ -898,6 +899,10 @@ static int read_msr(unsigned int reg, uint64_t *val,
             *val |= APIC_BASE_BSP;
         return X86EMUL_OKAY;
 
+    case MSR_SPEC_CTRL: /* guest_rdmsr() has already performed #GP checks. */
+        *val = msrs->spec_ctrl.raw;
+        return X86EMUL_OKAY;
+
     case MSR_FS_BASE:
         if ( !cp->extd.lm )
             break;
@@ -1024,6 +1029,7 @@ static int write_msr(unsigned int reg, uint64_t val,
                      struct x86_emulate_ctxt *ctxt)
 {
     struct vcpu *curr = current;
+    struct vcpu_msrs *msrs = curr->arch.msrs;
     const struct domain *currd = curr->domain;
     const struct cpuid_policy *cp = currd->arch.cpuid;
     bool vpmu_msr = false;
@@ -1041,6 +1047,10 @@ static int write_msr(unsigned int reg, uint64_t val,
     {
         uint64_t temp;
 
+    case MSR_SPEC_CTRL: /* guest_wrmsr() has already performed #GP checks. */
+        msrs->spec_ctrl.raw = val;
+        return X86EMUL_OKAY;
+
     case MSR_FS_BASE:
     case MSR_GS_BASE:
     case MSR_SHADOW_GS_BASE:
In order to fix a VT-x bug, and support MSR_SPEC_CTRL on AMD, there will need
to be three different access methods for where the guest's value lives.
However, it would be better not to duplicate the #GP checking logic.

guest_{rd,wr}msr() are always called first in the PV and HVM MSR paths, so we
can repurpose X86EMUL_UNHANDLEABLE slightly for this. This is going to be a
common pattern for other MSRs too in the future.

Duplicate the msrs->spec_ctrl.raw accesses in the PV and VT-x paths for now.
The SVM path is currently unreachable because of the CPUID policy.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
---
 xen/arch/x86/hvm/vmx/vmx.c     | 10 ++++++++++
 xen/arch/x86/include/asm/msr.h | 11 +++++++----
 xen/arch/x86/msr.c             |  6 ++----
 xen/arch/x86/pv/emul-priv-op.c | 10 ++++++++++
 4 files changed, 29 insertions(+), 8 deletions(-)
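The write half of the split follows the same shape as the read half. Below is a hedged, self-contained sketch, not the verbatim Xen logic: the reserved-bit computation is an illustrative approximation of the policy-dependent check, and the flat variables stand in for the CPUID policy and per-vCPU MSR state. The point is that guest_wrmsr() performs both #GP checks (availability and reserved bits) and then delegates, so the per-VM-type handler only ever stores an already-validated value.

#include <inttypes.h>
#include <stdio.h>

enum { X86EMUL_OKAY, X86EMUL_EXCEPTION, X86EMUL_UNHANDLEABLE };

#define SPEC_CTRL_IBRS  (1ULL << 0)
#define SPEC_CTRL_STIBP (1ULL << 1)
#define SPEC_CTRL_SSBD  (1ULL << 2)

/* Simplified stand-ins for the CPUID policy and per-vCPU MSR state. */
static int cp_feat_ibrsb = 1, cp_feat_stibp = 1, cp_feat_ssbd = 0;
static uint64_t msrs_spec_ctrl_raw;

/* Common path: availability and reserved-bit #GP checks, then delegate. */
static int guest_wrmsr_spec_ctrl(uint64_t val)
{
    uint64_t rsvd;

    if ( !cp_feat_ibrsb )
        return X86EMUL_EXCEPTION;        /* MSR not available: #GP[0] */

    rsvd = ~(SPEC_CTRL_IBRS |
             (cp_feat_stibp ? SPEC_CTRL_STIBP : 0) |
             (cp_feat_ssbd  ? SPEC_CTRL_SSBD  : 0));

    if ( val & rsvd )
        return X86EMUL_EXCEPTION;        /* reserved bit set: #GP[0] */

    return X86EMUL_UNHANDLEABLE;         /* delegate value to per-VM-type logic */
}

/* Per-VM-type path: simply stores the already-validated value. */
static int vendor_wrmsr_spec_ctrl(uint64_t val)
{
    msrs_spec_ctrl_raw = val;
    return X86EMUL_OKAY;
}

int main(void)
{
    uint64_t val = SPEC_CTRL_IBRS | SPEC_CTRL_STIBP;
    int rc = guest_wrmsr_spec_ctrl(val);

    if ( rc == X86EMUL_UNHANDLEABLE )
        rc = vendor_wrmsr_spec_ctrl(val);

    printf("rc=%d, stored=%#" PRIx64 "\n", rc, msrs_spec_ctrl_raw);
    return 0;
}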