Message ID | 20240518000430.1118488-10-seanjc@google.com (mailing list archive) |
---|---|
State | New, archived |
Series | KVM: x86: Fixes for KVM_INTEL_PROVE_VE |
On Sat, May 18, 2024 at 2:04 AM Sean Christopherson <seanjc@google.com> wrote:
> Disable KVM's "prove #VE" support by default, as it provides no functional
> value, and even its sanity checking benefits are relatively limited. I.e.
> it should be fully opt-in even on debug kernels, especially since EPT
> Violation #VE suppression appears to be buggy on some CPUs.

More #VE trapping than #VE suppression.

I wouldn't go so far as making it *depend* on DEBUG_KERNEL. EXPERT
plus the scary help message is good enough.

What about this:

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index b6831e17ec31..2864608c7016 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -97,14 +97,15 @@ config KVM_INTEL

 config KVM_INTEL_PROVE_VE
 	bool "Check that guests do not receive #VE exceptions"
-	depends on KVM_INTEL && DEBUG_KERNEL && EXPERT
+	depends on KVM_INTEL && EXPERT
 	help
 	  Checks that KVM's page table management code will not incorrectly
 	  let guests receive a virtualization exception. Virtualization
 	  exceptions will be trapped by the hypervisor rather than injected
 	  in the guest.

-	  This should never be enabled in a production environment.
+	  Note that #VE trapping appears to be buggy on some CPUs.
+	  This should never be enabled in a production environment!

 	  If unsure, say N.

Paolo

> Opportunistically add a line in the help text to make it abundantly clear
> that KVM_INTEL_PROVE_VE should never be enabled in a production
> environment.
>
> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/Kconfig | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> index 2a7f69abcac3..3468efc4be55 100644
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -97,15 +97,15 @@ config KVM_INTEL
>
>  config KVM_INTEL_PROVE_VE
>  	bool "Check that guests do not receive #VE exceptions"
> -	default KVM_PROVE_MMU || DEBUG_KERNEL
> -	depends on KVM_INTEL
> +	depends on KVM_INTEL && DEBUG_KERNEL && EXPERT
>  	help
> -
>  	  Checks that KVM's page table management code will not incorrectly
>  	  let guests receive a virtualization exception. Virtualization
>  	  exceptions will be trapped by the hypervisor rather than injected
>  	  in the guest.
>
> +	  This should never be enabled in a production environment.
> +
>  	  If unsure, say N.
>
>  config X86_SGX_KVM
> --
> 2.45.0.215.g3402c0e53f-goog

On Tue, May 21, 2024, Paolo Bonzini wrote:
> On Sat, May 18, 2024 at 2:04 AM Sean Christopherson <seanjc@google.com> wrote:
> > Disable KVM's "prove #VE" support by default, as it provides no functional
> > value, and even its sanity checking benefits are relatively limited. I.e.
> > it should be fully opt-in even on debug kernels, especially since EPT
> > Violation #VE suppression appears to be buggy on some CPUs.
>
> More #VE trapping than #VE suppression.
>
> I wouldn't go so far as making it *depend* on DEBUG_KERNEL. EXPERT
> plus the scary help message is good enough.

Works for me.

> What about this:
>
> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> index b6831e17ec31..2864608c7016 100644
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -97,14 +97,15 @@ config KVM_INTEL
>
>  config KVM_INTEL_PROVE_VE
>  	bool "Check that guests do not receive #VE exceptions"
> -	depends on KVM_INTEL && DEBUG_KERNEL && EXPERT
> +	depends on KVM_INTEL && EXPERT
>  	help
>  	  Checks that KVM's page table management code will not incorrectly
>  	  let guests receive a virtualization exception. Virtualization
>  	  exceptions will be trapped by the hypervisor rather than injected
>  	  in the guest.
>
> -	  This should never be enabled in a production environment.
> +	  Note that #VE trapping appears to be buggy on some CPUs.

I see where you're coming from, but I don't think "trapping" is much better,
e.g. it suggests there's something broken with the interception of #VEs. Ah,
the entire help text is weird.

This?

config KVM_INTEL_PROVE_VE
	bool "Verify guests do not receive unexpected EPT Violation #VEs"
	depends on KVM_INTEL && EXPERT
	help
	  Enable EPT Violation #VEs (when supported) for all VMs, to verify
	  that KVM's EPT management code will not incorrectly result in a #VE
	  (KVM is supposed to supress #VEs by default). Unexpected #VEs will
	  be intercepted by KVM and will trigger a WARN, but are otherwise
	  transparent to the guest.

	  Note, EPT Violation #VE support appears to be buggy on some CPUs.

	  This should never be enabled in a production environment!

	  If unsure, say N.

On Tue, May 21, 2024 at 8:18 PM Sean Christopherson <seanjc@google.com> wrote:
> > -	  This should never be enabled in a production environment.
> > +	  Note that #VE trapping appears to be buggy on some CPUs.
>
> I see where you're coming from, but I don't think "trapping" is much better,
> e.g. it suggests there's something broken with the interception of #VEs. Ah,
> the entire help text is weird.

Yeah, I didn't want to say #VE is broken altogether -
interception is where we saw issues, and #VE is used in production as
far as I know (not just by TDX; at least Xen and maybe Hyper-V use it
for anti-malware purposes?).

Maybe "Note: there appear to be bugs in some CPUs that will trigger
the WARN, in particular with eptad=0 and/or nested virtualization"
covers all bases.

Paolo

> This?
>
> config KVM_INTEL_PROVE_VE
> 	bool "Verify guests do not receive unexpected EPT Violation #VEs"
> 	depends on KVM_INTEL && EXPERT
> 	help
> 	  Enable EPT Violation #VEs (when supported) for all VMs, to verify
> 	  that KVM's EPT management code will not incorrectly result in a #VE
> 	  (KVM is supposed to supress #VEs by default). Unexpected #VEs will
> 	  be intercepted by KVM and will trigger a WARN, but are otherwise
> 	  transparent to the guest.
>
> 	  Note, EPT Violation #VE support appears to be buggy on some CPUs.
>
> 	  This should never be enabled in a production environment!
>
> 	  If unsure, say N.

On Tue, May 21, 2024, Paolo Bonzini wrote:
> On Tue, May 21, 2024 at 8:18 PM Sean Christopherson <seanjc@google.com> wrote:
> > > -	  This should never be enabled in a production environment.
> > > +	  Note that #VE trapping appears to be buggy on some CPUs.
> >
> > I see where you're coming from, but I don't think "trapping" is much better,
> > e.g. it suggests there's something broken with the interception of #VEs. Ah,
> > the entire help text is weird.
>
> Yeah, I didn't want to say #VE is broken altogether -

Ah, yeah, good call. The #VE isn't broken per se, just spurious/unexpected.

> interception is where we saw issues,

It's not an issue with interception, disabling #VE intercepts results in the #VE
being delivered to the guest.

  Test suite: ept_access_test_not_present
  PTE[4] @ 109fff8 = 9fed0007
  PTE[3] @ 9fed0ff0 = 9fed1007
  PTE[2] @ 9fed1000 = 9fed2007
  VA PTE @ 9fed2000 = 8000000007
  Created EPT @ 9feca008 = 11d2007
  Created EPT @ 11d2000 = 11d3007
  Created EPT @ 11d3000 = 11d4007
  L1 hva = 40000000, hpa = 40000000, L2 gva = ffffffff80000000, gpa = 8000000000
  Unhandled exception 8 #DF at ip 0000000000410d39
  error_code=0000 rflags=00010097 cs=00000008
  rax=ffffffff80000000 rcx=0000000000000000
  rdx=0000000000000000 rbx=0000000000000000
  rbp=000000009fec6fe0 rsi=0000000000000000
  rdi=0000000000000000
   r8=0000000000000000  r9=0000000000000000
  r10=0000000000000000 r11=0000000000000000
  r12=ffffffff80000008 r13=0000000000000000
  r14=0000000000000000 r15=0000000000000000
  cr0=0000000080010031 cr2=0000000000000000
  cr3=000000000109f000 cr4=0000000000002020
  cr8=0000000000000000
  STACK: @410d39 40144a 4002dd

> and #VE is used in production as far as I know (not just by TDX; at least Xen
> and maybe Hyper-V use it for anti-malware purposes?).

Hmm, maybe a spurious #VE is benign? Or it really is limited to A/D bits being
disabled? Not that us speculating is going to change anything :-)

> Maybe "Note: there appear to be bugs in some CPUs that will trigger
> the WARN, in particular with eptad=0 and/or nested virtualization"
> covers all bases.

Works for me. Maybe tweak it slightly to explain why the WARN is triggered?

  Note, some CPUs appear to generate spurious EPT Violations #VEs that
  trigger KVM's WARN, in particular with eptad=0 and/or nested virtualization.

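For readers skimming the thread, the suggestions above can be folded into a single entry. This is only a sketch assembled from the text proposed in the replies, not the patch as applied: whether the "production environment" line survives alongside the reworded note is an assumption, and the wrapping of the help text is illustrative.

```kconfig
# Sketch only: combines the entry proposed in the reply with the reworded
# note about spurious #VEs. Not taken from an applied commit.
config KVM_INTEL_PROVE_VE
	bool "Verify guests do not receive unexpected EPT Violation #VEs"
	depends on KVM_INTEL && EXPERT
	help
	  Enable EPT Violation #VEs (when supported) for all VMs, to verify
	  that KVM's EPT management code will not incorrectly result in a #VE
	  (KVM is supposed to suppress #VEs by default). Unexpected #VEs will
	  be intercepted by KVM and will trigger a WARN, but are otherwise
	  transparent to the guest.

	  Note, some CPUs appear to generate spurious EPT Violation #VEs that
	  trigger KVM's WARN, in particular with eptad=0 and/or nested
	  virtualization.

	  # Assumption: the thread does not say whether this line is kept.
	  This should never be enabled in a production environment!

	  If unsure, say N.
```

With no "default" line the option stays off unless someone selects it explicitly, and "depends on ... EXPERT" hides it from anyone who has not opted in to expert options, which matches the "fully opt-in" goal stated in the changelog.
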
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 2a7f69abcac3..3468efc4be55 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -97,15 +97,15 @@ config KVM_INTEL

 config KVM_INTEL_PROVE_VE
 	bool "Check that guests do not receive #VE exceptions"
-	default KVM_PROVE_MMU || DEBUG_KERNEL
-	depends on KVM_INTEL
+	depends on KVM_INTEL && DEBUG_KERNEL && EXPERT
 	help
-
 	  Checks that KVM's page table management code will not incorrectly
 	  let guests receive a virtualization exception. Virtualization
 	  exceptions will be trapped by the hypervisor rather than injected
 	  in the guest.

+	  This should never be enabled in a production environment.
+
 	  If unsure, say N.

 config X86_SGX_KVM

Disable KVM's "prove #VE" support by default, as it provides no functional
value, and even its sanity checking benefits are relatively limited. I.e.
it should be fully opt-in even on debug kernels, especially since EPT
Violation #VE suppression appears to be buggy on some CPUs.

Opportunistically add a line in the help text to make it abundantly clear
that KVM_INTEL_PROVE_VE should never be enabled in a production
environment.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/Kconfig | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)