Message ID | 20220105123532.12586-13-yang.zhong@intel.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | AMX Support in KVM | expand |
On Wed, Jan 05, 2022, Yang Zhong wrote: > @@ -6399,6 +6424,26 @@ static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu, > kvm_after_interrupt(vcpu); > } > > +static void handle_nm_fault_irqoff(struct kvm_vcpu *vcpu) > +{ > + /* > + * Save xfd_err to guest_fpu before interrupt is enabled, so the > + * MSR value is not clobbered by the host activity before the guest > + * has chance to consume it. > + * > + * We should not blindly read xfd_err here, since this exception Nit, avoid "we", and explain what KVM does (or doesn't) do, not what KVM "should" do, e.g. just say * Do not blindly read ... > + * might be caused by L1 interception on a platform which doesn't > + * support xfd at all. > + * > + * Do it conditionally upon guest_fpu::xfd. xfd_err matters > + * only when xfd contains a non-zero value. > + * > + * Queuing exception is done in vmx_handle_exit. See comment there. Another nit, it's worth explaining why XFD_ERR needs to be read here regardless of is_guest_mode(). E.g. * Injecting the #NM back into the guest is handled in the standard path * as an #NM in L2 may be reflected into L1 as a VM-Exit. Read XFD_ERR * even if the #NM is from L2, as L1 may have exposed XFD to L2. Side topic, in a follow up series/patch, it's probably worth adding support in nested_vmx_prepare_msr_bitmap() to allow passthrough of the MSRs to L2. > + */ > + if (vcpu->arch.guest_fpu.fpstate->xfd) > + rdmsrl(MSR_IA32_XFD_ERR, vcpu->arch.guest_fpu.xfd_err); > +} > + > static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx) > { > const unsigned long nmi_entry = (unsigned long)asm_exc_nmi_noist; > @@ -6407,6 +6452,9 @@ static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx) > /* if exit due to PF check for async PF */ > if (is_page_fault(intr_info)) > vmx->vcpu.arch.apf.host_apf_flags = kvm_read_and_reset_apf_flags(); > + /* if exit due to NM, handle before interrupts are enabled */ Nit, drop this comment, it's slightly misleading since the #NM isn't fully handled here. The comment in handle_nm_fault_irqoff() is more than sufficient. > + else if (is_nm_fault(intr_info)) > + handle_nm_fault_irqoff(&vmx->vcpu); > /* Handle machine checks before interrupts are enabled */ > else if (is_machine_check(intr_info)) > kvm_machine_check(); > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 21ce65220e38..2c988f8ca616 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -9953,6 +9953,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) > if (test_thread_flag(TIF_NEED_FPU_LOAD)) > switch_fpu_return(); > > + if (vcpu->arch.guest_fpu.xfd_err) > + wrmsrl(MSR_IA32_XFD_ERR, vcpu->arch.guest_fpu.xfd_err); > + > if (unlikely(vcpu->arch.switch_db_regs)) { > set_debugreg(0, 7); > set_debugreg(vcpu->arch.eff_db[0], 0); > @@ -10016,6 +10019,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) > > static_call(kvm_x86_handle_exit_irqoff)(vcpu); > > + if (vcpu->arch.guest_fpu.xfd_err) > + wrmsrl(MSR_IA32_XFD_ERR, 0); > + > /* > * Consume any pending interrupts, including the possible source of > * VM-Exit on SVM and any ticks that occur between VM-Exit and now.
On Wed, Jan 5, 2022 at 2:22 PM Sean Christopherson <seanjc@google.com> wrote: > > On Wed, Jan 05, 2022, Yang Zhong wrote: > > @@ -6399,6 +6424,26 @@ static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu, > > kvm_after_interrupt(vcpu); > > } > > > > +static void handle_nm_fault_irqoff(struct kvm_vcpu *vcpu) > > +{ > > + /* > > + * Save xfd_err to guest_fpu before interrupt is enabled, so the > > + * MSR value is not clobbered by the host activity before the guest > > + * has chance to consume it. > > + * > > + * We should not blindly read xfd_err here, since this exception > > Nit, avoid "we", and explain what KVM does (or doesn't) do, not what KVM "should" > do, e.g. just say > > * Do not blindly read ... > > > + * might be caused by L1 interception on a platform which doesn't > > + * support xfd at all. > > + * > > + * Do it conditionally upon guest_fpu::xfd. xfd_err matters > > + * only when xfd contains a non-zero value. > > + * > > + * Queuing exception is done in vmx_handle_exit. See comment there. > > Another nit, it's worth explaining why XFD_ERR needs to be read here regardless > of is_guest_mode(). E.g. > > * Injecting the #NM back into the guest is handled in the standard path > * as an #NM in L2 may be reflected into L1 as a VM-Exit. Read XFD_ERR > * even if the #NM is from L2, as L1 may have exposed XFD to L2. Do we have tests of L1 passing through XFD to L2?
> From: Sean Christopherson <seanjc@google.com> > Sent: Thursday, January 6, 2022 6:22 AM > > On Wed, Jan 05, 2022, Yang Zhong wrote: > > @@ -6399,6 +6424,26 @@ static void handle_interrupt_nmi_irqoff(struct > kvm_vcpu *vcpu, > > kvm_after_interrupt(vcpu); > > } > > > > +static void handle_nm_fault_irqoff(struct kvm_vcpu *vcpu) > > +{ > > + /* > > + * Save xfd_err to guest_fpu before interrupt is enabled, so the > > + * MSR value is not clobbered by the host activity before the guest > > + * has chance to consume it. > > + * > > + * We should not blindly read xfd_err here, since this exception > > Nit, avoid "we", and explain what KVM does (or doesn't) do, not what KVM > "should" > do, e.g. just say > > * Do not blindly read ... > > > + * might be caused by L1 interception on a platform which doesn't > > + * support xfd at all. > > + * > > + * Do it conditionally upon guest_fpu::xfd. xfd_err matters > > + * only when xfd contains a non-zero value. > > + * > > + * Queuing exception is done in vmx_handle_exit. See comment > there. > > Another nit, it's worth explaining why XFD_ERR needs to be read here > regardless > of is_guest_mode(). E.g. > > * Injecting the #NM back into the guest is handled in the standard > path > * as an #NM in L2 may be reflected into L1 as a VM-Exit. Read > XFD_ERR > * even if the #NM is from L2, as L1 may have exposed XFD to L2. sounds good > > Side topic, in a follow up series/patch, it's probably worth adding support in > nested_vmx_prepare_msr_bitmap() to allow passthrough of the MSRs to L2. will do. > > > + */ > > + if (vcpu->arch.guest_fpu.fpstate->xfd) > > + rdmsrl(MSR_IA32_XFD_ERR, vcpu->arch.guest_fpu.xfd_err); > > +} > > + > > static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx) > > { > > const unsigned long nmi_entry = (unsigned long)asm_exc_nmi_noist; > > @@ -6407,6 +6452,9 @@ static void handle_exception_nmi_irqoff(struct > vcpu_vmx *vmx) > > /* if exit due to PF check for async PF */ > > if (is_page_fault(intr_info)) > > vmx->vcpu.arch.apf.host_apf_flags = > kvm_read_and_reset_apf_flags(); > > + /* if exit due to NM, handle before interrupts are enabled */ > > Nit, drop this comment, it's slightly misleading since the #NM isn't fully > handled > here. The comment in handle_nm_fault_irqoff() is more than sufficient. > > > + else if (is_nm_fault(intr_info)) > > + handle_nm_fault_irqoff(&vmx->vcpu); > > /* Handle machine checks before interrupts are enabled */ > > else if (is_machine_check(intr_info)) > > kvm_machine_check(); > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > > index 21ce65220e38..2c988f8ca616 100644 > > --- a/arch/x86/kvm/x86.c > > +++ b/arch/x86/kvm/x86.c > > @@ -9953,6 +9953,9 @@ static int vcpu_enter_guest(struct kvm_vcpu > *vcpu) > > if (test_thread_flag(TIF_NEED_FPU_LOAD)) > > switch_fpu_return(); > > > > + if (vcpu->arch.guest_fpu.xfd_err) > > + wrmsrl(MSR_IA32_XFD_ERR, vcpu->arch.guest_fpu.xfd_err); > > + > > if (unlikely(vcpu->arch.switch_db_regs)) { > > set_debugreg(0, 7); > > set_debugreg(vcpu->arch.eff_db[0], 0); > > @@ -10016,6 +10019,9 @@ static int vcpu_enter_guest(struct kvm_vcpu > *vcpu) > > > > static_call(kvm_x86_handle_exit_irqoff)(vcpu); > > > > + if (vcpu->arch.guest_fpu.xfd_err) > > + wrmsrl(MSR_IA32_XFD_ERR, 0); > > + > > /* > > * Consume any pending interrupts, including the possible source of > > * VM-Exit on SVM and any ticks that occur between VM-Exit and > now.
> From: Jim Mattson <jmattson@google.com> > Sent: Thursday, January 6, 2022 6:31 AM > > On Wed, Jan 5, 2022 at 2:22 PM Sean Christopherson <seanjc@google.com> > wrote: > > > > On Wed, Jan 05, 2022, Yang Zhong wrote: > > > @@ -6399,6 +6424,26 @@ static void handle_interrupt_nmi_irqoff(struct > kvm_vcpu *vcpu, > > > kvm_after_interrupt(vcpu); > > > } > > > > > > +static void handle_nm_fault_irqoff(struct kvm_vcpu *vcpu) > > > +{ > > > + /* > > > + * Save xfd_err to guest_fpu before interrupt is enabled, so the > > > + * MSR value is not clobbered by the host activity before the guest > > > + * has chance to consume it. > > > + * > > > + * We should not blindly read xfd_err here, since this exception > > > > Nit, avoid "we", and explain what KVM does (or doesn't) do, not what KVM > "should" > > do, e.g. just say > > > > * Do not blindly read ... > > > > > + * might be caused by L1 interception on a platform which doesn't > > > + * support xfd at all. > > > + * > > > + * Do it conditionally upon guest_fpu::xfd. xfd_err matters > > > + * only when xfd contains a non-zero value. > > > + * > > > + * Queuing exception is done in vmx_handle_exit. See comment there. > > > > Another nit, it's worth explaining why XFD_ERR needs to be read here > regardless > > of is_guest_mode(). E.g. > > > > * Injecting the #NM back into the guest is handled in the standard path > > * as an #NM in L2 may be reflected into L1 as a VM-Exit. Read > XFD_ERR > > * even if the #NM is from L2, as L1 may have exposed XFD to L2. > > Do we have tests of L1 passing through XFD to L2? As Sean mentioned passing through XFD to L2 still needs one more change in nested_vmx_prepare_msr_bitmap(). This will be done in a follow-up patch. btw we did verify having L1 emulate XFD for L2, by running the amx selftest in L1. But overall nested configuration will need more test and polish after this series. Thanks Kevin
diff --git a/arch/x86/kvm/vmx/vmcs.h b/arch/x86/kvm/vmx/vmcs.h index 6e5de2e2b0da..e325c290a816 100644 --- a/arch/x86/kvm/vmx/vmcs.h +++ b/arch/x86/kvm/vmx/vmcs.h @@ -129,6 +129,11 @@ static inline bool is_machine_check(u32 intr_info) return is_exception_n(intr_info, MC_VECTOR); } +static inline bool is_nm_fault(u32 intr_info) +{ + return is_exception_n(intr_info, NM_VECTOR); +} + /* Undocumented: icebp/int1 */ static inline bool is_icebp(u32 intr_info) { diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index fe06b02994e6..939e6aad128c 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -36,6 +36,7 @@ #include <asm/debugreg.h> #include <asm/desc.h> #include <asm/fpu/api.h> +#include <asm/fpu/xstate.h> #include <asm/idtentry.h> #include <asm/io.h> #include <asm/irq_remapping.h> @@ -761,6 +762,13 @@ void vmx_update_exception_bitmap(struct kvm_vcpu *vcpu) vmcs_write32(PAGE_FAULT_ERROR_CODE_MATCH, match); } + /* + * Trap #NM if guest xfd contains a non-zero value so guest XFD_ERR + * can be saved timely. + */ + if (vcpu->arch.guest_fpu.fpstate->xfd) + eb |= (1u << NM_VECTOR); + vmcs_write32(EXCEPTION_BITMAP, eb); } @@ -1967,6 +1975,12 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_KERNEL_GS_BASE: vmx_write_guest_kernel_gs_base(vmx, data); break; + case MSR_IA32_XFD: + ret = kvm_set_msr_common(vcpu, msr_info); + /* Update #NM interception according to guest xfd */ + if (!ret) + vmx_update_exception_bitmap(vcpu); + break; #endif case MSR_IA32_SYSENTER_CS: if (is_guest_mode(vcpu)) @@ -4798,6 +4812,17 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu) if (is_machine_check(intr_info) || is_nmi(intr_info)) return 1; /* handled by handle_exception_nmi_irqoff() */ + /* + * Queue the exception here instead of in handle_nm_fault_irqoff(). + * This ensures the nested_vmx check is not skipped so vmexit can + * be reflected to L1 (when it intercepts #NM) before reaching this + * point. + */ + if (is_nm_fault(intr_info)) { + kvm_queue_exception(vcpu, NM_VECTOR); + return 1; + } + if (is_invalid_opcode(intr_info)) return handle_ud(vcpu); @@ -6399,6 +6424,26 @@ static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu, kvm_after_interrupt(vcpu); } +static void handle_nm_fault_irqoff(struct kvm_vcpu *vcpu) +{ + /* + * Save xfd_err to guest_fpu before interrupt is enabled, so the + * MSR value is not clobbered by the host activity before the guest + * has chance to consume it. + * + * We should not blindly read xfd_err here, since this exception + * might be caused by L1 interception on a platform which doesn't + * support xfd at all. + * + * Do it conditionally upon guest_fpu::xfd. xfd_err matters + * only when xfd contains a non-zero value. + * + * Queuing exception is done in vmx_handle_exit. See comment there. + */ + if (vcpu->arch.guest_fpu.fpstate->xfd) + rdmsrl(MSR_IA32_XFD_ERR, vcpu->arch.guest_fpu.xfd_err); +} + static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx) { const unsigned long nmi_entry = (unsigned long)asm_exc_nmi_noist; @@ -6407,6 +6452,9 @@ static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx) /* if exit due to PF check for async PF */ if (is_page_fault(intr_info)) vmx->vcpu.arch.apf.host_apf_flags = kvm_read_and_reset_apf_flags(); + /* if exit due to NM, handle before interrupts are enabled */ + else if (is_nm_fault(intr_info)) + handle_nm_fault_irqoff(&vmx->vcpu); /* Handle machine checks before interrupts are enabled */ else if (is_machine_check(intr_info)) kvm_machine_check(); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 21ce65220e38..2c988f8ca616 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9953,6 +9953,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) if (test_thread_flag(TIF_NEED_FPU_LOAD)) switch_fpu_return(); + if (vcpu->arch.guest_fpu.xfd_err) + wrmsrl(MSR_IA32_XFD_ERR, vcpu->arch.guest_fpu.xfd_err); + if (unlikely(vcpu->arch.switch_db_regs)) { set_debugreg(0, 7); set_debugreg(vcpu->arch.eff_db[0], 0); @@ -10016,6 +10019,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) static_call(kvm_x86_handle_exit_irqoff)(vcpu); + if (vcpu->arch.guest_fpu.xfd_err) + wrmsrl(MSR_IA32_XFD_ERR, 0); + /* * Consume any pending interrupts, including the possible source of * VM-Exit on SVM and any ticks that occur between VM-Exit and now.