| Message ID | 20230623123522.4185651-2-aaronlewis@google.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | KVM: x86/pmu: SRCU protect the PMU event filter in the fast path |
On Fri, Jun 23, 2023, Aaron Lewis wrote:
> When running KVM's fast path it is possible to get into a situation
> where the PMU event filter is dereferenced without grabbing KVM's SRCU
> read lock.
>
> The following callstack demonstrates how that is possible.
>
> Call Trace:
>  dump_stack+0x85/0xdf
>  lockdep_rcu_suspicious+0x109/0x120
>  pmc_event_is_allowed+0x165/0x170
>  kvm_pmu_trigger_event+0xa5/0x190
>  handle_fastpath_set_msr_irqoff+0xca/0x1e0
>  svm_vcpu_run+0x5c3/0x7b0 [kvm_amd]
>  vcpu_enter_guest+0x2108/0x2580
>
> Fix that by explicitly grabbing the read lock before dereferencing the
> PMU event filter.

Actually, on second thought, I think it would be better to acquire kvm->srcu in
handle_fastpath_set_msr_irqoff().  This is the second time that invoking
kvm_skip_emulated_instruction() resulted in an SRCU violation, and it probably
won't be the last since one of the benefits of using SRCU instead of per-asset
locks to protect things like memslots and filters is that low(ish) level helpers
don't need to worry about acquiring locks.

The 2x LOCK ADD from smp_mb() is unfortunate, but IMO it's worth eating that
cost to avoid having to play whack-a-mole in the future.  And as a (very small)
bonus, commit 5c30e8101e8d can be reverted.

--
From: Sean Christopherson <seanjc@google.com>
Date: Fri, 23 Jun 2023 08:19:51 -0700
Subject: [PATCH] KVM: x86: Acquire SRCU read lock when handling fastpath MSR
 writes

Temporarily acquire kvm->srcu for read when potentially emulating WRMSR in
the VM-Exit fastpath handler, as several of the common helpers used during
emulation expect the caller to provide SRCU protection.  E.g. if the guest
is counting instructions retired, KVM will query the PMU event filter when
stepping over the WRMSR.

  dump_stack+0x85/0xdf
  lockdep_rcu_suspicious+0x109/0x120
  pmc_event_is_allowed+0x165/0x170
  kvm_pmu_trigger_event+0xa5/0x190
  handle_fastpath_set_msr_irqoff+0xca/0x1e0
  svm_vcpu_run+0x5c3/0x7b0 [kvm_amd]
  vcpu_enter_guest+0x2108/0x2580

Alternatively, check_pmu_event_filter() could acquire kvm->srcu, but this
isn't the first bug of this nature, e.g. see commit 5c30e8101e8d ("KVM:
SVM: Skip WRMSR fastpath on VM-Exit if next RIP isn't valid").  Providing
protection for the entirety of WRMSR emulation will allow reverting the
aforementioned commit, and will avoid having to play whack-a-mole when new
uses of SRCU-protected structures are inevitably added in common emulation
helpers.

Fixes: dfdeda67ea2d ("KVM: x86/pmu: Prevent the PMU from counting disallowed events")
Reported-by: Aaron Lewis <aaronlewis@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 439312e04384..5f220c04624e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2172,6 +2172,8 @@ fastpath_t handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu)
 	u64 data;
 	fastpath_t ret = EXIT_FASTPATH_NONE;
 
+	kvm_vcpu_srcu_read_lock(vcpu);
+
 	switch (msr) {
 	case APIC_BASE_MSR + (APIC_ICR >> 4):
 		data = kvm_read_edx_eax(vcpu);
@@ -2194,6 +2196,8 @@ fastpath_t handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu)
 	if (ret != EXIT_FASTPATH_NONE)
 		trace_kvm_msr_write(msr, data);
 
+	kvm_vcpu_srcu_read_unlock(vcpu);
+
 	return ret;
 }
 EXPORT_SYMBOL_GPL(handle_fastpath_set_msr_irqoff);

base-commit: 88bb466c9dec4f70d682cf38c685324e7b1b3d60
--
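As context for the fix above: kvm_vcpu_srcu_read_lock() and
kvm_vcpu_srcu_read_unlock() are thin wrappers around srcu_read_lock() and
srcu_read_unlock() that stash the read-side index in the vCPU, roughly as
sketched below.  This is a simplified sketch of the helpers in
include/linux/kvm_host.h; the real helpers also carry nesting-depth debug
checks under CONFIG_PROVE_RCU.

static inline void kvm_vcpu_srcu_read_lock(struct kvm_vcpu *vcpu)
{
	/* Enter an SRCU read-side critical section and remember the index. */
	vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
}

static inline void kvm_vcpu_srcu_read_unlock(struct kvm_vcpu *vcpu)
{
	/* Exit the critical section using the index saved at lock time. */
	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
}

Wrapping the whole fastpath handler in this pair is what lets every common
helper reached from handle_fastpath_set_msr_irqoff() safely dereference
SRCU-protected structures such as the PMU event filter.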
> 
> Actually, on second thought, I think it would be better to acquire kvm->srcu in
> handle_fastpath_set_msr_irqoff().  This is the second time that invoking
> kvm_skip_emulated_instruction() resulted in an SRCU violation, and it probably
> won't be the last since one of the benefits of using SRCU instead of per-asset
> locks to protect things like memslots and filters is that low(ish) level helpers
> don't need to worry about acquiring locks.

Yeah, I like this approach better.

> 
> Alternatively, check_pmu_event_filter() could acquire kvm->srcu, but this
> isn't the first bug of this nature, e.g. see commit 5c30e8101e8d ("KVM:
> SVM: Skip WRMSR fastpath on VM-Exit if next RIP isn't valid").  Providing
> protection for the entirety of WRMSR emulation will allow reverting the
> aforementioned commit, and will avoid having to play whack-a-mole when new
> uses of SRCU-protected structures are inevitably added in common emulation
> helpers.
> 
> Fixes: dfdeda67ea2d ("KVM: x86/pmu: Prevent the PMU from counting disallowed events")
> Reported-by: Aaron Lewis <aaronlewis@google.com>

Could we also add "Reported-by: gthelen@google.com"?

> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/x86.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 439312e04384..5f220c04624e 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2172,6 +2172,8 @@ fastpath_t handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu)
>  	u64 data;
>  	fastpath_t ret = EXIT_FASTPATH_NONE;
>  
> +	kvm_vcpu_srcu_read_lock(vcpu);
> +
>  	switch (msr) {
>  	case APIC_BASE_MSR + (APIC_ICR >> 4):
>  		data = kvm_read_edx_eax(vcpu);
> @@ -2194,6 +2196,8 @@ fastpath_t handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu)
>  	if (ret != EXIT_FASTPATH_NONE)
>  		trace_kvm_msr_write(msr, data);
>  
> +	kvm_vcpu_srcu_read_unlock(vcpu);
> +
>  	return ret;
>  }
>  EXPORT_SYMBOL_GPL(handle_fastpath_set_msr_irqoff);
> 
> base-commit: 88bb466c9dec4f70d682cf38c685324e7b1b3d60
> --
> 

As a separate issue, shouldn't we restrict the MSR filter from being able to
intercept MSRs handled by the fast path?  I see that we do that for the APIC
MSRs, but if MSR_IA32_TSC_DEADLINE is handled by the fast path, I don't see a
way for userspace to override that behavior.  So maybe it shouldn't?  E.g.

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 439312e04384..dd0a314da0a3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1787,7 +1787,7 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type)
 	u32 i;
 
 	/* x2APIC MSRs do not support filtering. */
-	if (index >= 0x800 && index <= 0x8ff)
+	if (index >= 0x800 && index <= 0x8ff || index == MSR_IA32_TSC_DEADLINE)
 		return true;
 
 	idx = srcu_read_lock(&kvm->srcu);
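For readers unfamiliar with the MSR filter uAPI being discussed, the snippet
below is a rough, untested userspace sketch of the kind of filter Aaron is
describing: one that asks KVM to intercept guest writes to
MSR_IA32_TSC_DEADLINE via KVM_X86_SET_MSR_FILTER.  The helper name and the
vm_fd parameter are invented for illustration; with KVM_CAP_X86_USER_SPACE_MSR
enabled, a denied write would normally exit to userspace with
KVM_EXIT_X86_WRMSR.

#include <linux/kvm.h>
#include <sys/ioctl.h>

/*
 * Sketch: install a filter whose bitmap covers a single MSR starting at
 * 0x6e0 (MSR_IA32_TSC_DEADLINE); a clear bit in the bitmap denies the
 * write, so the guest's WRMSR should be intercepted rather than handled
 * by KVM.  Everything else stays allowed via KVM_MSR_FILTER_DEFAULT_ALLOW.
 */
static int deny_tsc_deadline_writes(int vm_fd)
{
	__u8 allow_bitmap = 0;	/* bit 0 clear => write to base + 0 is denied */
	struct kvm_msr_filter filter = {
		.flags = KVM_MSR_FILTER_DEFAULT_ALLOW,
		.ranges[0] = {
			.flags = KVM_MSR_FILTER_WRITE,
			.nmsrs = 1,
			.base = 0x6e0,	/* MSR_IA32_TSC_DEADLINE */
			.bitmap = &allow_bitmap,
		},
	};

	return ioctl(vm_fd, KVM_X86_SET_MSR_FILTER, &filter);
}

The concern raised above is that a filter like this is silently bypassed when
the WRMSR is handled in handle_fastpath_set_msr_irqoff().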
On Mon, Jun 26, 2023, Aaron Lewis wrote:
> As a separate issue, shouldn't we restrict the MSR filter from being
> able to intercept MSRs handled by the fast path?  I see that we do
> that for the APIC MSRs, but if MSR_IA32_TSC_DEADLINE is handled by the
> fast path, I don't see a way for userspace to override that behavior.
> So maybe it shouldn't?  E.g.
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 439312e04384..dd0a314da0a3 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1787,7 +1787,7 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type)
>  	u32 i;
> 
>  	/* x2APIC MSRs do not support filtering. */
> -	if (index >= 0x800 && index <= 0x8ff)
> +	if (index >= 0x800 && index <= 0x8ff || index == MSR_IA32_TSC_DEADLINE)
>  		return true;
> 
>  	idx = srcu_read_lock(&kvm->srcu);

Yeah, I saw that flaw too :-/

I'm not entirely sure what to do about MSRs that can be handled in the fastpath.
On one hand, intercepting those MSRs probably doesn't make much sense.  On the
other hand, the MSR filter needs to be uABI, i.e. we can't make the statement
"MSRs handled in KVM's fastpath can't be filtered", because either every new
fastpath MSR will potentially break userspace, or KVM will be severely limited
with respect to what can be handled in the fastpath.

From an ABI perspective, the easiest thing is to fix the bug and enforce any
filter that affects MSR_IA32_TSC_DEADLINE.  If we ignore performance, the fix
is trivial.  E.g.

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5f220c04624e..3ef903bb78ce 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2174,6 +2174,9 @@ fastpath_t handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu)
 
 	kvm_vcpu_srcu_read_lock(vcpu);
 
+	if (!kvm_msr_allowed(vcpu, msr, KVM_MSR_FILTER_WRITE))
+		goto out;
+
 	switch (msr) {
 	case APIC_BASE_MSR + (APIC_ICR >> 4):
 		data = kvm_read_edx_eax(vcpu);
@@ -2196,6 +2199,7 @@ fastpath_t handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu)
 	if (ret != EXIT_FASTPATH_NONE)
 		trace_kvm_msr_write(msr, data);
 
+out:
 	kvm_vcpu_srcu_read_unlock(vcpu);
 
 	return ret;

But I don't love the idea of searching through the filters for an MSR that is
pretty much guaranteed to be allowed.

Since x2APIC MSRs can't be filtered, we could add a per-vCPU flag to track if
writes to TSC_DEADLINE are allowed, i.e. if TSC_DEADLINE can be handled in the
fastpath.  However, at some point Intel and/or AMD will (hopefully) add support
for full virtualization of TSC_DEADLINE, and then TSC_DEADLINE will be in the
same boat as the x2APIC MSRs, i.e. allowing userspace to filter TSC_DEADLINE
when it's fully virtualized would be nonsensical.  And depending on how hardware
behaves, i.e. how a virtual TSC_DEADLINE interacts with the MSR bitmaps,
*enforcing* userspace's filtering might require a small amount of additional
complexity.

And any MSR that is performance sensitive enough to be handled in the fastpath
is probably worth virtualizing in hardware, i.e. we'll end up revisiting this
topic every time we add an MSR to the fastpath :-(

I'm struggling to come up with an idea that won't create an ABI nightmare,
won't be subject to the whims of AMD and Intel, and won't saddle KVM with
complexity to support behavior that in all likelihood no one wants.  I'm
leaning toward enforcing the filter for TSC_DEADLINE, and crossing my fingers
that neither AMD nor Intel implements TSC_DEADLINE virtualization in such a way
that it changes the behavior of WRMSR interception.
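As a rough illustration of the "per-vCPU flag" alternative floated above, the
sketch below caches the filter decision for TSC_DEADLINE whenever a new MSR
filter is installed, so the fastpath only tests a bool instead of walking the
filter ranges on every exit.  This is purely hypothetical: the field and helper
names are invented and do not exist in KVM; only kvm_msr_allowed(),
MSR_IA32_TSC_DEADLINE, and KVM_MSR_FILTER_WRITE are real.

/* Recompute the cached decision when KVM_X86_SET_MSR_FILTER installs a filter. */
static void kvm_vcpu_recalc_fastpath_msrs(struct kvm_vcpu *vcpu)
{
	vcpu->arch.wrmsr_tsc_deadline_allowed =
		kvm_msr_allowed(vcpu, MSR_IA32_TSC_DEADLINE, KVM_MSR_FILTER_WRITE);
}

static bool kvm_can_fastpath_tsc_deadline(struct kvm_vcpu *vcpu)
{
	/* If writes are filtered, fall back to the full WRMSR exit path. */
	return vcpu->arch.wrmsr_tsc_deadline_allowed;
}

As the discussion above notes, the downside is that this flag becomes dead
weight (or outright wrong) the moment hardware virtualizes TSC_DEADLINE.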
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index bf653df86112..2b2247f74ab7 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -381,18 +381,29 @@ static bool check_pmu_event_filter(struct kvm_pmc *pmc)
 {
 	struct kvm_x86_pmu_event_filter *filter;
 	struct kvm *kvm = pmc->vcpu->kvm;
+	bool allowed;
+	int idx;
 
 	if (!static_call(kvm_x86_pmu_hw_event_available)(pmc))
 		return false;
 
+	idx = srcu_read_lock(&kvm->srcu);
+
 	filter = srcu_dereference(kvm->arch.pmu_event_filter, &kvm->srcu);
-	if (!filter)
-		return true;
+	if (!filter) {
+		allowed = true;
+		goto out;
+	}
 
 	if (pmc_is_gp(pmc))
-		return is_gp_event_allowed(filter, pmc->eventsel);
+		allowed = is_gp_event_allowed(filter, pmc->eventsel);
+	else
+		allowed = is_fixed_event_allowed(filter, pmc->idx);
+
+out:
+	srcu_read_unlock(&kvm->srcu, idx);
 
-	return is_fixed_event_allowed(filter, pmc->idx);
+	return allowed;
 }
 
 static bool pmc_event_is_allowed(struct kvm_pmc *pmc)
When running KVM's fast path it is possible to get into a situation where the
PMU event filter is dereferenced without grabbing KVM's SRCU read lock.

The following callstack demonstrates how that is possible.

Call Trace:
 dump_stack+0x85/0xdf
 lockdep_rcu_suspicious+0x109/0x120
 pmc_event_is_allowed+0x165/0x170
 kvm_pmu_trigger_event+0xa5/0x190
 handle_fastpath_set_msr_irqoff+0xca/0x1e0
 svm_vcpu_run+0x5c3/0x7b0 [kvm_amd]
 vcpu_enter_guest+0x2108/0x2580

Fix that by explicitly grabbing the read lock before dereferencing the PMU
event filter.

Fixes: dfdeda67ea2d ("KVM: x86/pmu: Prevent the PMU from counting disallowed events")
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
---
 arch/x86/kvm/pmu.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)
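For context on where the lockdep_rcu_suspicious splat in the call trace comes
from: with CONFIG_PROVE_RCU enabled, srcu_dereference() verifies that the
associated srcu_struct is read-held and complains through lockdep when it is
not.  The following is a simplified, conceptual sketch, not the exact kernel
macro (the real check lives in the rcu_dereference_check() machinery in
include/linux/rcupdate.h):

/* Conceptual sketch of the CONFIG_PROVE_RCU check behind the splat above. */
#define srcu_dereference_sketch(p, ssp)					\
({									\
	if (!srcu_read_lock_held(ssp))					\
		lockdep_rcu_suspicious(__FILE__, __LINE__,		\
				       "suspicious srcu_dereference() usage"); \
	rcu_dereference_raw(p);						\
})

That is why the bug only shows up as a warning on lockdep-enabled kernels, and
why both proposed fixes revolve around making sure an SRCU read-side critical
section is held before the filter pointer is dereferenced.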