Message ID | 20221213163104.19066-4-roger.pau@citrix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | x86/vmx: implement Bus Lock and VM Notify | expand |
On 13/12/2022 4:31 pm, Roger Pau Monne wrote: > diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c > index a0d5e8d6ab..3d7c471a3f 100644 > --- a/xen/arch/x86/hvm/vmx/vmcs.c > +++ b/xen/arch/x86/hvm/vmx/vmcs.c > @@ -1290,6 +1296,17 @@ static int construct_vmcs(struct vcpu *v) > v->arch.hvm.vmx.exception_bitmap = HVM_TRAP_MASK > | (paging_mode_hap(d) ? 0 : (1U << TRAP_page_fault)) > | (v->arch.fully_eager_fpu ? 0 : (1U << TRAP_no_device)); > + if ( cpu_has_vmx_notify_vm_exiting ) > + { > + __vmwrite(NOTIFY_WINDOW, vm_notify_window); > + /* > + * Disable #AC and #DB interception: by using VM Notify Xen is > + * guaranteed to get a VM exit even if the guest manages to lock the > + * CPU. > + */ > + v->arch.hvm.vmx.exception_bitmap &= ~((1U << TRAP_debug) | > + (1U << TRAP_alignment_check)); > + } > vmx_update_exception_bitmap(v); > > v->arch.hvm.guest_cr[0] = X86_CR0_PE | X86_CR0_ET; > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c > index dabf4a3552..b11578777a 100644 > --- a/xen/arch/x86/hvm/vmx/vmx.c > +++ b/xen/arch/x86/hvm/vmx/vmx.c > @@ -1428,10 +1428,19 @@ static void cf_check vmx_update_host_cr3(struct vcpu *v) > > void vmx_update_debug_state(struct vcpu *v) > { > + unsigned int mask = 1u << TRAP_int3; > + > + if ( !cpu_has_monitor_trap_flag && cpu_has_vmx_notify_vm_exiting ) > + /* > + * Only allow toggling TRAP_debug if notify VM exit is enabled, as > + * unconditionally setting TRAP_debug is part of the XSA-156 fix. > + */ > + mask |= 1u << TRAP_debug; > + > if ( v->arch.hvm.debug_state_latch ) > - v->arch.hvm.vmx.exception_bitmap |= 1U << TRAP_int3; > + v->arch.hvm.vmx.exception_bitmap |= mask; > else > - v->arch.hvm.vmx.exception_bitmap &= ~(1U << TRAP_int3); > + v->arch.hvm.vmx.exception_bitmap &= ~mask; > > vmx_vmcs_enter(v); > vmx_update_exception_bitmap(v); > @@ -4180,6 +4189,9 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs) > switch ( vector ) > { > case TRAP_debug: > + if ( cpu_has_monitor_trap_flag && cpu_has_vmx_notify_vm_exiting ) > + goto exit_and_crash; This breaks GDBSX and introspection. For XSA-156, we were forced to intercept #DB unilaterally for safety, but both GDBSX and Introspection can optionally intercepting #DB for logical reasons too. i.e. we can legitimately end up here even on an system with VM Notify. What I can't figure out is why made any reference to MTF. MTF has absolutely nothing to do with TRAP_debug. Furthermore, there's no CPU in practice that has VM Notify but lacks MTF, so the head of vmx_update_debug_state() looks like dead code... ~Andrew
On 11.01.2023 22:05, Andrew Cooper wrote: > On 13/12/2022 4:31 pm, Roger Pau Monne wrote: >> --- a/xen/arch/x86/hvm/vmx/vmx.c >> +++ b/xen/arch/x86/hvm/vmx/vmx.c >> @@ -1428,10 +1428,19 @@ static void cf_check vmx_update_host_cr3(struct vcpu *v) >> >> void vmx_update_debug_state(struct vcpu *v) >> { >> + unsigned int mask = 1u << TRAP_int3; >> + >> + if ( !cpu_has_monitor_trap_flag && cpu_has_vmx_notify_vm_exiting ) >> + /* >> + * Only allow toggling TRAP_debug if notify VM exit is enabled, as >> + * unconditionally setting TRAP_debug is part of the XSA-156 fix. >> + */ >> + mask |= 1u << TRAP_debug; >> + >> if ( v->arch.hvm.debug_state_latch ) >> - v->arch.hvm.vmx.exception_bitmap |= 1U << TRAP_int3; >> + v->arch.hvm.vmx.exception_bitmap |= mask; >> else >> - v->arch.hvm.vmx.exception_bitmap &= ~(1U << TRAP_int3); >> + v->arch.hvm.vmx.exception_bitmap &= ~mask; >> >> vmx_vmcs_enter(v); >> vmx_update_exception_bitmap(v); >> @@ -4180,6 +4189,9 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs) >> switch ( vector ) >> { >> case TRAP_debug: >> + if ( cpu_has_monitor_trap_flag && cpu_has_vmx_notify_vm_exiting ) >> + goto exit_and_crash; > > This breaks GDBSX and introspection. > > For XSA-156, we were forced to intercept #DB unilaterally for safety, > but both GDBSX and Introspection can optionally intercepting #DB for > logical reasons too. > > i.e. we can legitimately end up here even on an system with VM Notify. > > > What I can't figure out is why made any reference to MTF. MTF has > absolutely nothing to do with TRAP_debug. Looking back I see that the two seemingly asymmetric conditions puzzled me during review, but for some reason I didn't question the MTF part as a whole; I think I simply wasn't sure and hence left it to the VMX maintainers. I think you're right and that part of the condition wants deleting from vmx_update_debug_state() (on top of deleting the entire if() above). > Furthermore, there's no CPU in practice that has VM Notify but lacks > MTF, so the head of vmx_update_debug_state() looks like dead code... "No CPU in practice" is not an applicable argument as long as the spec doesn't spell out a connection. When running virtualized ourselves, any valid feature combination may be found (seeing that we similarly may ourselves surface feature combinations to guests which no real hardware equivalent exists for). Jan
diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc index b7ee97be76..923910f553 100644 --- a/docs/misc/xen-command-line.pandoc +++ b/docs/misc/xen-command-line.pandoc @@ -2636,6 +2636,17 @@ guest will notify Xen that it has failed to acquire a spinlock. <major>, <minor> and <build> must be integers. The values will be encoded in guest CPUID 0x40000002 if viridian enlightenments are enabled. +### vm-notify-window (Intel) +> `= <integer>` + +> Default: `0` + +Specify the value of the VM Notify window used to detect locked VMs. Set to -1 +to disable the feature. Value is in units of crystal clock cycles. + +Note the hardware might add a threshold to the provided value in order to make +it safe, and hence using 0 is fine. + ### vpid (Intel) > `= <boolean>` diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c index a0d5e8d6ab..3d7c471a3f 100644 --- a/xen/arch/x86/hvm/vmx/vmcs.c +++ b/xen/arch/x86/hvm/vmx/vmcs.c @@ -67,6 +67,9 @@ integer_param("ple_gap", ple_gap); static unsigned int __read_mostly ple_window = 4096; integer_param("ple_window", ple_window); +static unsigned int __ro_after_init vm_notify_window; +integer_param("vm-notify-window", vm_notify_window); + static bool __read_mostly opt_ept_pml = true; static s8 __read_mostly opt_ept_ad = -1; int8_t __read_mostly opt_ept_exec_sp = -1; @@ -210,6 +213,7 @@ static void __init vmx_display_features(void) P(cpu_has_vmx_pml, "Page Modification Logging"); P(cpu_has_vmx_tsc_scaling, "TSC Scaling"); P(cpu_has_vmx_bus_lock_detection, "Bus Lock Detection"); + P(cpu_has_vmx_notify_vm_exiting, "Notify VM Exit"); #undef P if ( !printed ) @@ -329,6 +333,8 @@ static int vmx_init_vmcs_config(bool bsp) opt |= SECONDARY_EXEC_UNRESTRICTED_GUEST; if ( opt_ept_pml ) opt |= SECONDARY_EXEC_ENABLE_PML; + if ( vm_notify_window != ~0u ) + opt |= SECONDARY_EXEC_NOTIFY_VM_EXITING; /* * "APIC Register Virtualization" and "Virtual Interrupt Delivery" @@ -1290,6 +1296,17 @@ static int construct_vmcs(struct vcpu *v) v->arch.hvm.vmx.exception_bitmap = HVM_TRAP_MASK | (paging_mode_hap(d) ? 0 : (1U << TRAP_page_fault)) | (v->arch.fully_eager_fpu ? 0 : (1U << TRAP_no_device)); + if ( cpu_has_vmx_notify_vm_exiting ) + { + __vmwrite(NOTIFY_WINDOW, vm_notify_window); + /* + * Disable #AC and #DB interception: by using VM Notify Xen is + * guaranteed to get a VM exit even if the guest manages to lock the + * CPU. + */ + v->arch.hvm.vmx.exception_bitmap &= ~((1U << TRAP_debug) | + (1U << TRAP_alignment_check)); + } vmx_update_exception_bitmap(v); v->arch.hvm.guest_cr[0] = X86_CR0_PE | X86_CR0_ET; diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index dabf4a3552..b11578777a 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -1428,10 +1428,19 @@ static void cf_check vmx_update_host_cr3(struct vcpu *v) void vmx_update_debug_state(struct vcpu *v) { + unsigned int mask = 1u << TRAP_int3; + + if ( !cpu_has_monitor_trap_flag && cpu_has_vmx_notify_vm_exiting ) + /* + * Only allow toggling TRAP_debug if notify VM exit is enabled, as + * unconditionally setting TRAP_debug is part of the XSA-156 fix. + */ + mask |= 1u << TRAP_debug; + if ( v->arch.hvm.debug_state_latch ) - v->arch.hvm.vmx.exception_bitmap |= 1U << TRAP_int3; + v->arch.hvm.vmx.exception_bitmap |= mask; else - v->arch.hvm.vmx.exception_bitmap &= ~(1U << TRAP_int3); + v->arch.hvm.vmx.exception_bitmap &= ~mask; vmx_vmcs_enter(v); vmx_update_exception_bitmap(v); @@ -4180,6 +4189,9 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs) switch ( vector ) { case TRAP_debug: + if ( cpu_has_monitor_trap_flag && cpu_has_vmx_notify_vm_exiting ) + goto exit_and_crash; + /* * Updates DR6 where debugger can peek (See 3B 23.2.1, * Table 23-1, "Exit Qualification for Debug Exceptions"). @@ -4619,6 +4631,22 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs) */ break; + case EXIT_REASON_NOTIFY: + __vmread(EXIT_QUALIFICATION, &exit_qualification); + + if ( unlikely(exit_qualification & NOTIFY_VM_CONTEXT_INVALID) ) + { + perfc_incr(vmnotify_crash); + gprintk(XENLOG_ERR, "invalid VM context after notify vmexit\n"); + domain_crash(v->domain); + break; + } + + if ( unlikely(exit_qualification & INTR_INFO_NMI_UNBLOCKED_BY_IRET) ) + undo_nmis_unblocked_by_iret(); + + break; + case EXIT_REASON_VMX_PREEMPTION_TIMER_EXPIRED: case EXIT_REASON_INVPCID: /* fall through */ diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c index 2095c1e612..f8fe8d0c14 100644 --- a/xen/arch/x86/hvm/vmx/vvmx.c +++ b/xen/arch/x86/hvm/vmx/vvmx.c @@ -2487,6 +2487,7 @@ int nvmx_n2_vmexit_handler(struct cpu_user_regs *regs, case EXIT_REASON_EPT_MISCONFIG: case EXIT_REASON_EXTERNAL_INTERRUPT: case EXIT_REASON_BUS_LOCK: + case EXIT_REASON_NOTIFY: /* pass to L0 handler */ break; case VMX_EXIT_REASONS_FAILED_VMENTRY: diff --git a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h index f3df5113d4..78404e42b3 100644 --- a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h +++ b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h @@ -268,6 +268,7 @@ extern u32 vmx_vmentry_control; #define SECONDARY_EXEC_XSAVES 0x00100000 #define SECONDARY_EXEC_TSC_SCALING 0x02000000 #define SECONDARY_EXEC_BUS_LOCK_DETECTION 0x40000000 +#define SECONDARY_EXEC_NOTIFY_VM_EXITING 0x80000000 extern u32 vmx_secondary_exec_control; #define VMX_EPT_EXEC_ONLY_SUPPORTED 0x00000001 @@ -349,6 +350,8 @@ extern u64 vmx_ept_vpid_cap; (vmx_secondary_exec_control & SECONDARY_EXEC_TSC_SCALING) #define cpu_has_vmx_bus_lock_detection \ (vmx_secondary_exec_control & SECONDARY_EXEC_BUS_LOCK_DETECTION) +#define cpu_has_vmx_notify_vm_exiting \ + (vmx_secondary_exec_control & SECONDARY_EXEC_NOTIFY_VM_EXITING) #define VMCS_RID_TYPE_MASK 0x80000000 @@ -456,6 +459,7 @@ enum vmcs_field { SECONDARY_VM_EXEC_CONTROL = 0x0000401e, PLE_GAP = 0x00004020, PLE_WINDOW = 0x00004022, + NOTIFY_WINDOW = 0x00004024, VM_INSTRUCTION_ERROR = 0x00004400, VM_EXIT_REASON = 0x00004402, VM_EXIT_INTR_INFO = 0x00004404, diff --git a/xen/arch/x86/include/asm/hvm/vmx/vmx.h b/xen/arch/x86/include/asm/hvm/vmx/vmx.h index eae39365aa..8e1e42ac47 100644 --- a/xen/arch/x86/include/asm/hvm/vmx/vmx.h +++ b/xen/arch/x86/include/asm/hvm/vmx/vmx.h @@ -221,6 +221,7 @@ static inline void pi_clear_sn(struct pi_desc *pi_desc) #define EXIT_REASON_XSAVES 63 #define EXIT_REASON_XRSTORS 64 #define EXIT_REASON_BUS_LOCK 74 +#define EXIT_REASON_NOTIFY 75 /* Remember to also update VMX_PERF_EXIT_REASON_SIZE! */ /* @@ -236,6 +237,11 @@ static inline void pi_clear_sn(struct pi_desc *pi_desc) #define INTR_INFO_VALID_MASK 0x80000000 /* 31 */ #define INTR_INFO_RESVD_BITS_MASK 0x7ffff000 +/* + * Exit Qualifications for NOTIFY VM EXIT + */ +#define NOTIFY_VM_CONTEXT_INVALID 1u + /* * Exit Qualifications for MOV for Control Register Access */ diff --git a/xen/arch/x86/include/asm/perfc_defn.h b/xen/arch/x86/include/asm/perfc_defn.h index 6fce21e85a..487e20dc97 100644 --- a/xen/arch/x86/include/asm/perfc_defn.h +++ b/xen/arch/x86/include/asm/perfc_defn.h @@ -6,7 +6,7 @@ PERFCOUNTER_ARRAY(exceptions, "exceptions", 32) #ifdef CONFIG_HVM -#define VMX_PERF_EXIT_REASON_SIZE 75 +#define VMX_PERF_EXIT_REASON_SIZE 76 #define VMEXIT_NPF_PERFC 143 #define SVM_PERF_EXIT_REASON_SIZE (VMEXIT_NPF_PERFC + 1) PERFCOUNTER_ARRAY(vmexits, "vmexits", @@ -129,5 +129,6 @@ PERFCOUNTER(iommu_pt_shatters, "IOMMU page table shatters") PERFCOUNTER(iommu_pt_coalesces, "IOMMU page table coalesces") PERFCOUNTER(buslock, "Bus Locks Detected") +PERFCOUNTER(vmnotify_crash, "domain crashes by Notify VM Exit") /*#endif*/ /* __XEN_PERFC_DEFN_H__ */