Message ID: 20240722235922.3351122-1-seanjc@google.com (mailing list archive)
State: New, archived
Series: KVM: nVMX: Honor userspace MSR filter lists for nested VM-Enter/VM-Exit
On Tue, Jul 23, 2024 at 07:59:22AM +0800, Sean Christopherson wrote:
> Synthesize a consistency check VM-Exit (VM-Enter) or VM-Abort (VM-Exit) if
> L1 attempts to load/store an MSR via the VMCS MSR lists that userspace has
> disallowed access to via an MSR filter. Intel already disallows including
> a handful of "special" MSRs in the VMCS lists, so denying access isn't
> completely without precedent.
>
> [...]
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 8e5dad80b337..e6b1e42186f3 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -4226,9 +4226,22 @@ filtering. In that mode, ``KVM_MSR_FILTER_DEFAULT_DENY`` is invalid and causes
>  an error.
>
>  .. warning::
> -   MSR accesses as part of nested VM-Enter/VM-Exit are not filtered.
> -   This includes both writes to individual VMCS fields and reads/writes
> -   through the MSR lists pointed to by the VMCS.
> +   MSR accesses that are side effects of instruction execution (emulated or
> +   native) are not filtered as hardware does not honor MSR bitmaps outside of
> +   RDMSR and WRMSR, and KVM mimics that behavior when emulating instructions
> +   to avoid pointless divergence from hardware. E.g. RDPID reads MSR_TSC_AUX,
> +   SYSENTER reads the SYSENTER MSRs, etc.
> +
> +   MSRs that are loaded/stored via dedicated VMCS fields are not filtered as
> +   part of VM-Enter/VM-Exit emulation.
> +
> +   MSRs that are loaded/store via VMX's load/store lists _are_ filtered as part
> +   of VM-Enter/VM-Exit emulation. If an MSR access is denied on VM-Enter, KVM
> +   synthesizes a consistency check VM-Exit(EXIT_REASON_MSR_LOAD_FAIL). If an
> +   MSR access is denied on VM-Exit, KVM synthesizes a VM-Abort. In short, KVM
> +   extends Intel's architectural list of MSRs that cannot be loaded/saved via
> +   the VM-Enter/VM-Exit MSR list. It is platform owner's responsibility to
> +   to communicate any such restrictions to their end users.
>
> [...]

Do we also need to modify the statement before this warning? Since the
behaviour is different from the RDMSR/WRMSR emulation case.

```
if an MSR access is denied by userspace the resulting KVM behavior depends on
whether or not KVM_CAP_X86_USER_SPACE_MSR's KVM_MSR_EXIT_REASON_FILTER is
enabled. If KVM_MSR_EXIT_REASON_FILTER is enabled, KVM will exit to userspace
on denied accesses, i.e. userspace effectively intercepts the MSR access.
```
On Tue, Jul 23, 2024, Hou Wenlong wrote:
> On Tue, Jul 23, 2024 at 07:59:22AM +0800, Sean Christopherson wrote:
> > [...]
>
> Do we also need to modify the statement before this warning?

Yeah, that's a good idea.

While you're here, did you have a use case that is/was affected by the current
VM-Enter/VM-Exit vs. MSR filtering behavior?

> Since the behaviour is different from RDMSR/WRMSR emulation case.
>
> ```
> if an MSR access is denied by userspace the resulting KVM behavior depends on
> whether or not KVM_CAP_X86_USER_SPACE_MSR's KVM_MSR_EXIT_REASON_FILTER is
> enabled. If KVM_MSR_EXIT_REASON_FILTER is enabled, KVM will exit to userspace
> on denied accesses, i.e. userspace effectively intercepts the MSR access.
> ```
On Tue, Jul 23, 2024 at 10:23:37PM +0800, Sean Christopherson wrote:
> On Tue, Jul 23, 2024, Hou Wenlong wrote:
> > [...]
> >
> > Do we also need to modify the statement before this warning?
>
> Yeah, that's a good idea.
>
> While you're here, did you have a use case that is/was affected by the current
> VM-Enter/VM-Exit vs. MSR filtering behavior?
>
Uh, nested virtualization is not usually used in our environment and I didn't
test it with MSR filtering before. I found a conflict between MSR filtering
and RDPID instruction emulation when testing the x86 emulator for PVM, so I
sent this patch. At that time, I was thinking that the state transitions
(including VM-Enter/VM-Exit) would also be affected by MSR filtering, so I
mentioned it in the commit message.
On Mon, 22 Jul 2024 16:59:22 -0700, Sean Christopherson wrote:
> Synthesize a consistency check VM-Exit (VM-Enter) or VM-Abort (VM-Exit) if
> L1 attempts to load/store an MSR via the VMCS MSR lists that userspace has
> disallowed access to via an MSR filter. Intel already disallows including
> a handful of "special" MSRs in the VMCS lists, so denying access isn't
> completely without precedent.
>
> [...]

Applied to kvm-x86 vmx, with a slight massaging of the docs as suggested by Hou.

[1/1] KVM: nVMX: Honor userspace MSR filter lists for nested VM-Enter/VM-Exit
      https://github.com/kvm-x86/linux/commit/653ea4489e69

--
https://github.com/kvm-x86/linux/tree/next
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 8e5dad80b337..e6b1e42186f3 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -4226,9 +4226,22 @@ filtering. In that mode, ``KVM_MSR_FILTER_DEFAULT_DENY`` is invalid and causes
 an error.
 
 .. warning::
-   MSR accesses as part of nested VM-Enter/VM-Exit are not filtered.
-   This includes both writes to individual VMCS fields and reads/writes
-   through the MSR lists pointed to by the VMCS.
+   MSR accesses that are side effects of instruction execution (emulated or
+   native) are not filtered as hardware does not honor MSR bitmaps outside of
+   RDMSR and WRMSR, and KVM mimics that behavior when emulating instructions
+   to avoid pointless divergence from hardware. E.g. RDPID reads MSR_TSC_AUX,
+   SYSENTER reads the SYSENTER MSRs, etc.
+
+   MSRs that are loaded/stored via dedicated VMCS fields are not filtered as
+   part of VM-Enter/VM-Exit emulation.
+
+   MSRs that are loaded/stored via VMX's load/store lists _are_ filtered as part
+   of VM-Enter/VM-Exit emulation. If an MSR access is denied on VM-Enter, KVM
+   synthesizes a consistency check VM-Exit (EXIT_REASON_MSR_LOAD_FAIL). If an
+   MSR access is denied on VM-Exit, KVM synthesizes a VM-Abort. In short, KVM
+   extends Intel's architectural list of MSRs that cannot be loaded/saved via
+   the VM-Enter/VM-Exit MSR list. It is the platform owner's responsibility
+   to communicate any such restrictions to their end users.
 
 x2APIC MSR accesses cannot be filtered (KVM silently ignores filters that
 cover any x2APIC MSRs).

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 950a03e0181e..94d0bedc42ee 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2059,6 +2059,8 @@ void kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu);
 
 void kvm_enable_efer_bits(u64);
 bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer);
+int kvm_get_msr_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 *data);
+int kvm_set_msr_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 data);
 int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data, bool host_initiated);
 int kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data);
 int kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data);

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 2392a7ef254d..674f7089cc44 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -981,7 +981,7 @@ static u32 nested_vmx_load_msr(struct kvm_vcpu *vcpu, u64 gpa, u32 count)
 				__func__, i, e.index, e.reserved);
 			goto fail;
 		}
-		if (kvm_set_msr(vcpu, e.index, e.value)) {
+		if (kvm_set_msr_with_filter(vcpu, e.index, e.value)) {
 			pr_debug_ratelimited(
 				"%s cannot write MSR (%u, 0x%x, 0x%llx)\n",
 				__func__, i, e.index, e.value);
@@ -1017,7 +1017,7 @@ static bool nested_vmx_get_vmexit_msr_value(struct kvm_vcpu *vcpu,
 		}
 	}
 
-	if (kvm_get_msr(vcpu, msr_index, data)) {
+	if (kvm_get_msr_with_filter(vcpu, msr_index, data)) {
 		pr_debug_ratelimited("%s cannot read MSR (0x%x)\n", __func__,
 				     msr_index);
 		return false;
@@ -1112,9 +1112,9 @@ static void prepare_vmx_msr_autostore_list(struct kvm_vcpu *vcpu,
 			/*
 			 * Emulated VMEntry does not fail here. Instead a less
 			 * accurate value will be returned by
-			 * nested_vmx_get_vmexit_msr_value() using kvm_get_msr()
-			 * instead of reading the value from the vmcs02 VMExit
-			 * MSR-store area.
+			 * nested_vmx_get_vmexit_msr_value() by reading KVM's
+			 * internal MSR state instead of reading the value from
+			 * the vmcs02 VMExit MSR-store area.
 			 */
 			pr_warn_ratelimited(
 				"Not enough msr entries in msr_autostore. Can't add msr %x\n",
@@ -4806,7 +4806,7 @@ static void nested_vmx_restore_host_state(struct kvm_vcpu *vcpu)
 			goto vmabort;
 		}
 
-		if (kvm_set_msr(vcpu, h.index, h.value)) {
+		if (kvm_set_msr_with_filter(vcpu, h.index, h.value)) {
 			pr_debug_ratelimited(
 				"%s WRMSR failed (%u, 0x%x, 0x%llx)\n",
 				__func__, j, h.index, h.value);

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index af6c8cf6a37a..7b3659a05c27 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1942,19 +1942,21 @@ static int kvm_get_msr_ignored_check(struct kvm_vcpu *vcpu,
 	return ret;
 }
 
-static int kvm_get_msr_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 *data)
+int kvm_get_msr_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 *data)
 {
 	if (!kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_READ))
 		return KVM_MSR_RET_FILTERED;
 	return kvm_get_msr_ignored_check(vcpu, index, data, false);
 }
+EXPORT_SYMBOL_GPL(kvm_get_msr_with_filter);
 
-static int kvm_set_msr_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 data)
+int kvm_set_msr_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 data)
 {
 	if (!kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_WRITE))
 		return KVM_MSR_RET_FILTERED;
 	return kvm_set_msr_ignored_check(vcpu, index, data, false);
 }
+EXPORT_SYMBOL_GPL(kvm_set_msr_with_filter);
 
 int kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data)
 {
Synthesize a consistency check VM-Exit (VM-Enter) or VM-Abort (VM-Exit) if
L1 attempts to load/store an MSR via the VMCS MSR lists that userspace has
disallowed access to via an MSR filter. Intel already disallows including
a handful of "special" MSRs in the VMCS lists, so denying access isn't
completely without precedent.

More importantly, the behavior is well-defined _and_ can be communicated to
the end user, e.g. to the customer that owns a VM running as L1 on top of
KVM. On the other hand, ignoring userspace MSR filters is all but
guaranteed to result in unexpected behavior as the access will hit KVM's
internal state, which is likely not up-to-date.

Unlike KVM-internal accesses, instruction emulation, and dedicated VMCS
fields, the MSRs in the VMCS load/store lists are 100% guest controlled,
thus making it all but impossible to reason about the correctness of
ignoring the MSR filter. And if userspace *really* wants to deny access
to MSRs via the aforementioned scenarios, userspace can hide the
associated feature from the guest, e.g. by disabling the PMU to prevent
accessing PERF_GLOBAL_CTRL via its VMCS field. But for the MSR lists, KVM
is blindly processing MSRs; the MSR filters are the _only_ way for
userspace to deny access.

This partially reverts commit ac8d6cad3c7b ("KVM: x86: Only do MSR
filtering when access MSR by rdmsr/wrmsr").

Cc: Hou Wenlong <houwenlong.hwl@antgroup.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---

I found this by inspection when backporting Hou's change to an internal kernel.
I don't love piggybacking Intel's "you can't touch these special MSRs" behavior,
but ignoring the userspace MSR filters is far worse IMO. E.g. if userspace is
denying access to an MSR in order to reduce KVM's attack surface, letting L1
sneak in reads/writes through VM-Enter/VM-Exit completely circumvents the
filters.
 Documentation/virt/kvm/api.rst  | 19 ++++++++++++++++---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/vmx/nested.c       | 12 ++++++------
 arch/x86/kvm/x86.c              |  6 ++++--
 4 files changed, 28 insertions(+), 11 deletions(-)

base-commit: 332d2c1d713e232e163386c35a3ba0c1b90df83f