[v4,11/12] KVM: x86/mmu: split a single gfn zap range when guest MTRRs are honored

Message ID 20230714065602.20805-1-yan.y.zhao@intel.com (mailing list archive)
State New, archived
Series KVM: x86/mmu: refine memtype related mmu zap

Commit Message

Yan Zhao July 14, 2023, 6:56 a.m. UTC
Split a single gfn zap range (specifically range [0, ~0UL)) into smaller
ranges according to the current memslot layout when guest MTRRs are honored.

Though vCPUs have been serialized to perform kvm_zap_gfn_range() for MTRR
updates and CR0.CD toggles, the rescheduling cost caused by contention is
still huge when there are concurrent page faults holding mmu_lock for read.

Splitting a single huge zap range according to the actual memslot layout can
reduce unnecessary traversal and yielding cost in the TDP MMU.
It can also increase the chances that larger ranges find existing ranges
to zap in the zap list.

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 arch/x86/kvm/mtrr.c | 39 +++++++++++++++++++++++++++++++--------
 1 file changed, 31 insertions(+), 8 deletions(-)

Comments

Sean Christopherson Aug. 25, 2023, 11:15 p.m. UTC | #1
On Fri, Jul 14, 2023, Yan Zhao wrote:
> Split a single gfn zap range (specifically range [0, ~0UL)) into smaller
> ranges according to the current memslot layout when guest MTRRs are honored.
> 
> Though vCPUs have been serialized to perform kvm_zap_gfn_range() for MTRR
> updates and CR0.CD toggles, the rescheduling cost caused by contention is
> still huge when there are concurrent page faults holding mmu_lock for read.

Unless the pre-check doesn't work for some reason, I definitely want to avoid
this patch.  This is a lot of complexity that, IIUC, is just working around a
problem elsewhere in KVM.

> Splitting a single huge zap range according to the actual memslot layout can
> reduce unnecessary traversal and yielding cost in the TDP MMU.
> It can also increase the chances that larger ranges find existing ranges
> to zap in the zap list.
> 
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> ---
Yan Zhao Sept. 4, 2023, 8:39 a.m. UTC | #2
On Fri, Aug 25, 2023 at 04:15:41PM -0700, Sean Christopherson wrote:
> On Fri, Jul 14, 2023, Yan Zhao wrote:
> > Split a single gfn zap range (specifically range [0, ~0UL)) into smaller
> > ranges according to the current memslot layout when guest MTRRs are honored.
> > 
> > Though vCPUs have been serialized to perform kvm_zap_gfn_range() for MTRR
> > updates and CR0.CD toggles, the rescheduling cost caused by contention is
> > still huge when there are concurrent page faults holding mmu_lock for read.
> 
> Unless the pre-check doesn't work for some reason, I definitely want to avoid
> this patch.  This is a lot of complexity that, IIUC, is just working around a
> problem elsewhere in KVM.
>
I think so too.
Patch

diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
index 9fdbdbf874a8..00e98dfc4b0d 100644
--- a/arch/x86/kvm/mtrr.c
+++ b/arch/x86/kvm/mtrr.c
@@ -909,21 +909,44 @@  static void kvm_zap_or_wait_mtrr_zap_list(struct kvm *kvm)
 static void kvm_mtrr_zap_gfn_range(struct kvm_vcpu *vcpu,
 				   gfn_t gfn_start, gfn_t gfn_end)
 {
+	int idx = srcu_read_lock(&vcpu->kvm->srcu);
+	const struct kvm_memory_slot *memslot;
 	struct mtrr_zap_range *range;
+	struct kvm_memslot_iter iter;
+	struct kvm_memslots *slots;
+	gfn_t start, end;
+	int i;
 
-	range = kmalloc(sizeof(*range), GFP_KERNEL_ACCOUNT);
-	if (!range)
-		goto fail;
-
-	range->start = gfn_start;
-	range->end = gfn_end;
-
-	kvm_add_mtrr_zap_list(vcpu->kvm, range);
+	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
+		slots = __kvm_memslots(vcpu->kvm, i);
+		kvm_for_each_memslot_in_gfn_range(&iter, slots, gfn_start, gfn_end) {
+			memslot = iter.slot;
+			start = max(gfn_start, memslot->base_gfn);
+			end = min(gfn_end, memslot->base_gfn + memslot->npages);
+			if (WARN_ON_ONCE(start >= end))
+				continue;
+
+			range = kmalloc(sizeof(*range), GFP_KERNEL_ACCOUNT);
+			if (!range)
+				goto fail;
+
+			range->start = start;
+			range->end = end;
+
+			/*
+			 * Redundant ranges in different address spaces will be
+			 * removed in kvm_add_mtrr_zap_list().
+			 */
+			kvm_add_mtrr_zap_list(vcpu->kvm, range);
+		}
+	}
+	srcu_read_unlock(&vcpu->kvm->srcu, idx);
 
 	kvm_zap_or_wait_mtrr_zap_list(vcpu->kvm);
 	return;
 
 fail:
+	srcu_read_unlock(&vcpu->kvm->srcu, idx);
 	kvm_zap_gfn_range(vcpu->kvm, gfn_start, gfn_end);
 }
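
For readers following the hunk above, here is a minimal user-space sketch of
the per-memslot clamping it performs. This is not the kernel code:
`struct slot`, `struct gfn_range`, and `split_range_by_slots()` are
hypothetical stand-ins introduced only for illustration. The sketch assumes
half-open ranges [start, end), as in the commit message, and omits the SRCU
locking, the per-address-space loop, and the zap list handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t gfn_t;

/* Hypothetical stand-in for the two memslot fields the patch reads. */
struct slot {
	gfn_t base_gfn;
	uint64_t npages;
};

/* Hypothetical half-open range [start, end). */
struct gfn_range {
	gfn_t start;
	gfn_t end;
};

static gfn_t gfn_max(gfn_t a, gfn_t b) { return a > b ? a : b; }
static gfn_t gfn_min(gfn_t a, gfn_t b) { return a < b ? a : b; }

/*
 * Clamp [gfn_start, gfn_end) to every slot and store each non-empty
 * intersection in out[]. Returns the number of ranges written, which
 * never exceeds out_cap.
 */
static size_t split_range_by_slots(const struct slot *slots, size_t nr_slots,
				   gfn_t gfn_start, gfn_t gfn_end,
				   struct gfn_range *out, size_t out_cap)
{
	size_t n = 0;

	assert(out != NULL || out_cap == 0);

	for (size_t i = 0; i < nr_slots && n < out_cap; i++) {
		gfn_t start = gfn_max(gfn_start, slots[i].base_gfn);
		gfn_t end = gfn_min(gfn_end, slots[i].base_gfn + slots[i].npages);

		/* The slot does not overlap the requested range. */
		if (start >= end)
			continue;

		out[n].start = start;
		out[n].end = end;
		n++;
	}
	return n;
}
```

With slots covering [0x0, 0xa0) and [0x100, 0x80000), a request for
[0, ~0ULL) yields exactly those two ranges. The hole between them is never
queued, which is the traversal saving the commit message describes.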