
[v2,1/9] KVM: x86/PMU: Delay vLBR release to the vcpu next sched-in

Message ID 20230921082957.44628-2-xiong.y.zhang@intel.com (mailing list archive)
State New, archived
Series Upgrade intel vPMU version to 5

Commit Message

Zhang, Xiong Y Sept. 21, 2023, 8:29 a.m. UTC
If guest LBR is disabled at vCPU sched-in time, the vLBR event is
released. Subsequent guest accesses to LBR MSRs are then intercepted,
causing KVM to create a new vLBR event. If this new vLBR event is the
only user of the host LBR facility, the host LBR driver resets the LBR
facility at vLBR creation, so guest LBR contents may change across vCPU
sched-out and sched-in.

Consider this sequence:
1. Guest disables LBR.
2. Guest starts reading LBR MSRs, but does not finish.
3. vCPU is sched-out and later sched-in; the vLBR event is released.
4. Guest continues reading LBR MSRs, so KVM creates a vLBR event again.
If this vLBR event is now the only LBR user on the host, the host LBR
driver resets the HW LBR facility at vLBR creation.
5. Guest reads the remaining LBR MSRs in their reset state.
So guest LBR MSR reads before vCPU sched-out return correct values,
while reads after sched-in return reset state. Similarly, guest LBR MSR
writes before vCPU sched-out are lost to the reset, while writes after
sched-in take effect correctly.

This is a bug: guest LBR contents change with vCPU scheduling. It can
happen whenever guest LBR MSR accesses span a vCPU scheduling boundary;
guests typically access LBR MSRs at task switch and in the PMI handler.

Two options could fix this bug:
a. Save a guest LBR snapshot at vLBR release in step 3, then restore
it after vLBR creation in step 4. But there are nearly 100 LBR MSRs,
so each vLBR release would need ~100 MSR reads and ~100 MSR writes;
the overhead is too heavy.
b. Defer the vLBR release in step 3.

This commit chooses option b. Guest LBR MSR accesses are passed
through, so the interceptable guest DEBUGCTLMSR_LBR bit is used to
predict guest LBR usage. If guest LBR stays disabled for a whole vCPU
sched time slice, KVM predicts that guest LBR won't be used soon, and
the vLBR event is released at the next vCPU sched-in. Guest LBR MSR
accesses should finish within two vCPU sched time slices; otherwise it
is likely a guest LBR driver bug, which this commit does not handle.

Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
---
 arch/x86/kvm/vmx/pmu_intel.c | 10 ++++++++--
 arch/x86/kvm/vmx/vmx.c       | 12 +++++++++---
 arch/x86/kvm/vmx/vmx.h       |  2 ++
 3 files changed, 19 insertions(+), 5 deletions(-)

Patch

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index f2efa0bf7ae8..76d7bd8e4fc6 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -628,6 +628,7 @@  static void intel_pmu_init(struct kvm_vcpu *vcpu)
 	lbr_desc->records.nr = 0;
 	lbr_desc->event = NULL;
 	lbr_desc->msr_passthrough = false;
+	lbr_desc->in_use = false;
 }
 
 static void intel_pmu_reset(struct kvm_vcpu *vcpu)
@@ -761,8 +762,13 @@  void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu)
 
 static void intel_pmu_cleanup(struct kvm_vcpu *vcpu)
 {
-	if (!(vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR))
-		intel_pmu_release_guest_lbr_event(vcpu);
+	struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
+
+	if (!(vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR)) {
+		if (!lbr_desc->in_use)
+			intel_pmu_release_guest_lbr_event(vcpu);
+		lbr_desc->in_use = false;
+	}
 }
 
 void intel_pmu_cross_mapped_check(struct kvm_pmu *pmu)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 72e3943f3693..4056e19266b5 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2238,9 +2238,15 @@  static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			get_vmcs12(vcpu)->guest_ia32_debugctl = data;
 
 		vmcs_write64(GUEST_IA32_DEBUGCTL, data);
-		if (intel_pmu_lbr_is_enabled(vcpu) && !to_vmx(vcpu)->lbr_desc.event &&
-		    (data & DEBUGCTLMSR_LBR))
-			intel_pmu_create_guest_lbr_event(vcpu);
+
+		if (intel_pmu_lbr_is_enabled(vcpu) && (data & DEBUGCTLMSR_LBR)) {
+			struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
+
+			lbr_desc->in_use = true;
+			if (!lbr_desc->event)
+				intel_pmu_create_guest_lbr_event(vcpu);
+		}
+
 		return 0;
 	}
 	case MSR_IA32_BNDCFGS:
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index c2130d2c8e24..547edeb52d09 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -107,6 +107,8 @@  struct lbr_desc {
 
 	/* True if LBRs are marked as not intercepted in the MSR bitmap */
 	bool msr_passthrough;
+
+	bool in_use;
 };
 
 /*