[v7,12/12] KVM/VMX/vPMU: support to report GLOBAL_STATUS_LBRS_FROZEN

Message ID 1562548999-37095-13-git-send-email-wei.w.wang@intel.com (mailing list archive)
State New, archived
Series Guest LBR Enabling

Commit Message

Wang, Wei W July 8, 2019, 1:23 a.m. UTC
This patch enables the LBR related features in Arch v4 in advance,
though the current vPMU only has v2 support. Other arch v4 related
support will be enabled later in another series.

Arch v4 supports streamlined Freeze_LBR_on_PMI. According to the SDM,
the LBR_FRZ bit is set to global status when debugctl.freeze_lbr_on_pmi
has been set and a PMI is generated. The CTR_FRZ bit is set when
debugctl.freeze_perfmon_on_pmi is set and a PMI is generated.
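
For reference, the hardware behavior described above can be summarized in a
minimal sketch (not part of this patch; the bit positions follow the SDM, and
the constant and function names here are only illustrative):

#include <stdint.h>

#define DEBUGCTL_FREEZE_LBRS_ON_PMI     (1ULL << 11)  /* IA32_DEBUGCTL[11] */
#define DEBUGCTL_FREEZE_PERFMON_ON_PMI  (1ULL << 12)  /* IA32_DEBUGCTL[12] */

#define GLOBAL_STATUS_LBRS_FROZEN       (1ULL << 58)  /* IA32_PERF_GLOBAL_STATUS[58] (LBR_FRZ) */
#define GLOBAL_STATUS_COUNTERS_FROZEN   (1ULL << 59)  /* IA32_PERF_GLOBAL_STATUS[59] (CTR_FRZ) */

/* Bits the hardware ORs into IA32_PERF_GLOBAL_STATUS when a PMI is raised. */
static uint64_t global_status_bits_on_pmi(uint64_t debugctl)
{
	uint64_t bits = 0;

	if (debugctl & DEBUGCTL_FREEZE_LBRS_ON_PMI)
		bits |= GLOBAL_STATUS_LBRS_FROZEN;
	if (debugctl & DEBUGCTL_FREEZE_PERFMON_ON_PMI)
		bits |= GLOBAL_STATUS_COUNTERS_FROZEN;

	return bits;
}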

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
---
 arch/x86/kvm/pmu.c           | 11 +++++++++--
 arch/x86/kvm/pmu.h           |  1 +
 arch/x86/kvm/vmx/pmu_intel.c | 20 ++++++++++++++++++++
 3 files changed, 30 insertions(+), 2 deletions(-)

Comments

Peter Zijlstra July 8, 2019, 3:09 p.m. UTC | #1
On Mon, Jul 08, 2019 at 09:23:19AM +0800, Wei Wang wrote:
> This patch enables the LBR related features in Arch v4 in advance,
> though the current vPMU only has v2 support. Other arch v4 related
> support will be enabled later in another series.
> 
> Arch v4 supports streamlined Freeze_LBR_on_PMI. According to the SDM,
> the LBR_FRZ bit is set to global status when debugctl.freeze_lbr_on_pmi
> has been set and a PMI is generated. The CTR_FRZ bit is set when
> debugctl.freeze_perfmon_on_pmi is set and a PMI is generated.

(that's still a misnomer; it is: freeze_perfmon_on_overflow)

Why?

Who uses that v4 crud? It's broken. It loses events between overflow
and PMI.
Wang, Wei W July 9, 2019, 3:24 a.m. UTC | #2
On 07/08/2019 11:09 PM, Peter Zijlstra wrote:
> On Mon, Jul 08, 2019 at 09:23:19AM +0800, Wei Wang wrote:
>> This patch enables the LBR related features in Arch v4 in advance,
>> though the current vPMU only has v2 support. Other arch v4 related
>> support will be enabled later in another series.
>>
>> Arch v4 supports streamlined Freeze_LBR_on_PMI. According to the SDM,
>> the LBR_FRZ bit is set to global status when debugctl.freeze_lbr_on_pmi
>> has been set and a PMI is generated. The CTR_FRZ bit is set when
>> debugctl.freeze_perfmon_on_pmi is set and a PMI is generated.
> (that's still a misnomer; it is: freeze_perfmon_on_overflow)

OK. (but that was directly copied from the SDM, section 18.2.4.1)


> Why?
>
> Who uses that v4 crud?

I saw that the native perf driver has been updated to v4.
After the vPMU gets updated to v4, the guest perf would use that.

If you prefer to hold off on this patch until vPMU v4 support,
we could do that as well.


> It's broken. It loses events between overflow
> and PMI.

Do you mean it's a v4 hardware issue?

Best,
Wei
Peter Zijlstra July 9, 2019, 11:35 a.m. UTC | #3
On Tue, Jul 09, 2019 at 11:24:07AM +0800, Wei Wang wrote:
> On 07/08/2019 11:09 PM, Peter Zijlstra wrote:
> > On Mon, Jul 08, 2019 at 09:23:19AM +0800, Wei Wang wrote:
> > > This patch enables the LBR related features in Arch v4 in advance,
> > > though the current vPMU only has v2 support. Other arch v4 related
> > > support will be enabled later in another series.
> > > 
> > > Arch v4 supports streamlined Freeze_LBR_on_PMI. According to the SDM,
> > > the LBR_FRZ bit is set to global status when debugctl.freeze_lbr_on_pmi
> > > has been set and a PMI is generated. The CTR_FRZ bit is set when
> > > debugctl.freeze_perfmon_on_pmi is set and a PMI is generated.
> > (that's still a misnomer; it is: freeze_perfmon_on_overflow)
> 
> OK. (but that was directly copied from the SDM, section 18.2.4.1)

Yeah, I know. But that name doesn't correctly describe what it actually
does. If it worked as named it would in fact be OK.

> > Why?
> > 
> > Who uses that v4 crud?
> 
> I saw that the native perf driver has been updated to v4.

It's disabled by default and I'm tempted to simply remove it. See below.

> After the vPMU gets updated to v4, the guest perf would use that.
> 
> If you prefer to hold off on this patch until vPMU v4 support,
> we could do that as well.
> 
> 
> > It's broken. It loses events between overflow
> > and PMI.
> 
> Do you mean it's a v4 hardware issue?

Yeah; although I'm not sure if it's an implementation or specification
problem. But as it exists it is of very limited use.

Fundamentally our events (with the exception of event groups) are
independent. Events should always count, except when the PMI is running
-- so as to not include the measurement overhead in the measurement
itself. But this (mis)feature stops the entire PMU as soon as a single
counter overflows, inhibiting all other counters from running (as they
should) until the PMI has happened and reset the state.

(Note that, strictly speaking, we even expect the overflowing counter to
continue counting until the PMI happens. Having an overflow should not
mean we lose events. A sampling and !sampling event should produce the
same event count.)

So even when there's only a single event (group) scheduled, it isn't
strictly right. And when there's multiple events scheduled it is
definitely wrong.

And while I understand the purpose of the current semantics (it makes a
single event group's sample count more coherent), the fact that it loses
events just bugs me something fierce -- and as shown, it breaks tools.
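
To make the lost-event window concrete, here is a minimal sketch of a
v4-style PMI handler (this is not the actual perf code; the MSR accessors are
hypothetical stand-ins for rdmsr/wrmsr, and the MSR numbers are taken from
the SDM):

#include <stdint.h>

#define MSR_PERF_GLOBAL_STATUS         0x38e         /* IA32_PERF_GLOBAL_STATUS */
#define MSR_PERF_GLOBAL_STATUS_RESET   0x390         /* IA32_PERF_GLOBAL_STATUS_RESET (v4 name) */
#define GLOBAL_STATUS_COUNTERS_FROZEN  (1ULL << 59)  /* CTR_FRZ */

/* Hypothetical MSR accessors, standing in for rdmsr/wrmsr. */
extern uint64_t read_msr(uint32_t msr);
extern void write_msr(uint32_t msr, uint64_t value);

static void v4_pmi_handler_sketch(void)
{
	uint64_t status = read_msr(MSR_PERF_GLOBAL_STATUS);

	/*
	 * The counters were frozen when the overflow happened, not when
	 * this handler started running; nothing that occurred in between
	 * was counted, on any counter.
	 */

	/* ... process the overflowed counters indicated by status ... */

	/* Clear the overflow bits and CTR_FRZ so counting resumes. */
	write_msr(MSR_PERF_GLOBAL_STATUS_RESET,
		  status | GLOBAL_STATUS_COUNTERS_FROZEN);
}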
Wang, Wei W July 10, 2019, 9:23 a.m. UTC | #4
On 07/09/2019 07:35 PM, Peter Zijlstra wrote:
>
> Yeah; although I'm not sure if it's an implementation or specification
> problem. But as it exists it is of very limited use.
>
> Fundamentally our events (with the exception of event groups) are
> independent. Events should always count, except when the PMI is running
> -- so as to not include the measurement overhead in the measurement
> itself. But this (mis)feature stops the entire PMU as soon as a single
> counter overflows, inhibiting all other counters from running (as they
> should) until the PMI has happened and reset the state.
>
> (Note that, strictly speaking, we even expect the overflowing counter to
> continue counting until the PMI happens. Having an overflow should not
> mean we lose events. A sampling and !sampling event should produce the
> same event count.)
>
> So even when there's only a single event (group) scheduled, it isn't
> strictly right. And when there's multiple events scheduled it is
> definitely wrong.
>
> And while I understand the purpose of the current semantics (it makes a
> single event group's sample count more coherent), the fact that it loses
> events just bugs me something fierce -- and as shown, it breaks tools.

Thanks for sharing the finding.
If I understand this correctly, you observed the counters getting frozen
earlier than expected (they are expected to freeze at the time the PMI is generated).

Have you talked to anyone about a possible freeze adjustment in the hardware?

Best,
Wei

Patch

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 323bb45..89bff8f 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -52,6 +52,13 @@  static void kvm_pmi_trigger_fn(struct irq_work *irq_work)
 	kvm_pmu_deliver_pmi(vcpu);
 }
 
+static void kvm_perf_set_global_status(struct kvm_pmu *pmu, u8 idx)
+{
+	__set_bit(idx, (unsigned long *)&pmu->global_status);
+	if (kvm_x86_ops->pmu_ops->set_global_status)
+		kvm_x86_ops->pmu_ops->set_global_status(pmu, idx);
+}
+
 static void kvm_perf_overflow(struct perf_event *perf_event,
 			      struct perf_sample_data *data,
 			      struct pt_regs *regs)
@@ -61,7 +68,7 @@  static void kvm_perf_overflow(struct perf_event *perf_event,
 
 	if (!test_and_set_bit(pmc->idx,
 			      (unsigned long *)&pmu->reprogram_pmi)) {
-		__set_bit(pmc->idx, (unsigned long *)&pmu->global_status);
+		kvm_perf_set_global_status(pmu, pmc->idx);
 		kvm_make_request(KVM_REQ_PMU, pmc->vcpu);
 	}
 }
@@ -75,7 +82,7 @@  static void kvm_perf_overflow_intr(struct perf_event *perf_event,
 
 	if (!test_and_set_bit(pmc->idx,
 			      (unsigned long *)&pmu->reprogram_pmi)) {
-		__set_bit(pmc->idx, (unsigned long *)&pmu->global_status);
+		kvm_perf_set_global_status(pmu, pmc->idx);
 		kvm_make_request(KVM_REQ_PMU, pmc->vcpu);
 
 		/*
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index cadf91a..408ddc2 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -24,6 +24,7 @@  struct kvm_pmu_ops {
 				    u8 unit_mask);
 	unsigned (*find_fixed_event)(int idx);
 	bool (*pmc_is_enabled)(struct kvm_pmc *pmc);
+	void (*set_global_status)(struct kvm_pmu *pmu, u8 idx);
 	struct kvm_pmc *(*pmc_idx_to_pmc)(struct kvm_pmu *pmu, int pmc_idx);
 	struct kvm_pmc *(*msr_idx_to_pmc)(struct kvm_vcpu *vcpu, unsigned idx,
 					  u64 *mask);
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index fd09777..6f74b69 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -413,6 +413,22 @@  static bool intel_pmu_access_lbr_msr(struct kvm_vcpu *vcpu,
 	return ret;
 }
 
+static void intel_pmu_set_global_status(struct kvm_pmu *pmu, u8 idx)
+{
+	u64 guest_debugctl;
+
+	if (pmu->version >= 4) {
+		guest_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL);
+
+		if (guest_debugctl & DEBUGCTLMSR_FREEZE_LBRS_ON_PMI)
+			__set_bit(GLOBAL_STATUS_LBRS_FROZEN,
+				  (unsigned long *)&pmu->global_status);
+		if (guest_debugctl & DEBUGCTLMSR_FREEZE_PERFMON_ON_PMI)
+			__set_bit(GLOBAL_STATUS_COUNTERS_FROZEN,
+				  (unsigned long *)&pmu->global_status);
+	}
+}
+
 static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -597,6 +613,9 @@  static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 	pmu->global_ovf_ctrl_mask = pmu->global_ctrl_mask
 			& ~(MSR_CORE_PERF_GLOBAL_OVF_CTRL_OVF_BUF |
 			    MSR_CORE_PERF_GLOBAL_OVF_CTRL_COND_CHGD);
+	if (pmu->version >= 4)
+		pmu->global_ovf_ctrl_mask &= ~(GLOBAL_STATUS_LBRS_FROZEN |
+					       GLOBAL_STATUS_COUNTERS_FROZEN);
 	if (kvm_x86_ops->pt_supported())
 		pmu->global_ovf_ctrl_mask &=
 				~MSR_CORE_PERF_GLOBAL_OVF_CTRL_TRACE_TOPA_PMI;
@@ -711,6 +730,7 @@  void intel_pmu_disable_save_guest_lbr(struct kvm_vcpu *vcpu)
 struct kvm_pmu_ops intel_pmu_ops = {
 	.find_arch_event = intel_find_arch_event,
 	.find_fixed_event = intel_find_fixed_event,
+	.set_global_status = intel_pmu_set_global_status,
 	.pmc_is_enabled = intel_pmc_is_enabled,
 	.pmc_idx_to_pmc = intel_pmc_idx_to_pmc,
 	.msr_idx_to_pmc = intel_msr_idx_to_pmc,