diff mbox series

[RFC,01/41] perf: x86/intel: Support PERF_PMU_CAP_VPMU_PASSTHROUGH

Message ID 20240126085444.324918-2-xiong.y.zhang@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series KVM: x86/pmu: Introduce passthrough vPM | expand

Commit Message

Xiong Zhang Jan. 26, 2024, 8:54 a.m. UTC
From: Kan Liang <kan.liang@linux.intel.com>

Define and apply the PERF_PMU_CAP_VPMU_PASSTHROUGH flag for the version 4
and later PMUs, which includes the improvements for virtualization.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
---
 arch/x86/events/intel/core.c | 6 ++++++
 include/linux/perf_event.h   | 1 +
 2 files changed, 7 insertions(+)

Comments

Sean Christopherson April 11, 2024, 5:04 p.m. UTC | #1
On Fri, Jan 26, 2024, Xiong Zhang wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
> 
> Define and apply the PERF_PMU_CAP_VPMU_PASSTHROUGH flag for the version 4
> and later PMUs

Why?  I get that is an RFC, but it's not at all obvious to me why this needs to
take a dependency on v4+.
Liang, Kan April 11, 2024, 5:21 p.m. UTC | #2
On 2024-04-11 1:04 p.m., Sean Christopherson wrote:
> On Fri, Jan 26, 2024, Xiong Zhang wrote:
>> From: Kan Liang <kan.liang@linux.intel.com>
>>
>> Define and apply the PERF_PMU_CAP_VPMU_PASSTHROUGH flag for the version 4
>> and later PMUs
> 
> Why?  I get that is an RFC, but it's not at all obvious to me why this needs to
> take a dependency on v4+.

The IA32_PERF_GLOBAL_STATUS_RESET/SET MSRs are introduced in v4. They
are used in the save/restore of PMU state. Please see PATCH 23/41.
So it's limited to v4+ for now.

Thanks,
Kan
Jim Mattson April 11, 2024, 5:24 p.m. UTC | #3
On Thu, Apr 11, 2024 at 10:21 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
>
>
> On 2024-04-11 1:04 p.m., Sean Christopherson wrote:
> > On Fri, Jan 26, 2024, Xiong Zhang wrote:
> >> From: Kan Liang <kan.liang@linux.intel.com>
> >>
> >> Define and apply the PERF_PMU_CAP_VPMU_PASSTHROUGH flag for the version 4
> >> and later PMUs
> >
> > Why?  I get that is an RFC, but it's not at all obvious to me why this needs to
> > take a dependency on v4+.
>
> The IA32_PERF_GLOBAL_STATUS_RESET/SET MSRs are introduced in v4. They
> are used in the save/restore of PMU state. Please see PATCH 23/41.
> So it's limited to v4+ for now.

Prior to version 4, semi-passthrough is possible, but
IA32_PERF_GLOBAL_STATUS has to be intercepted and emulated, since it
is non-trivial to set bits in this MSR.
Sean Christopherson April 11, 2024, 5:46 p.m. UTC | #4
On Thu, Apr 11, 2024, Jim Mattson wrote:
> On Thu, Apr 11, 2024 at 10:21 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
> > On 2024-04-11 1:04 p.m., Sean Christopherson wrote:
> > > On Fri, Jan 26, 2024, Xiong Zhang wrote:
> > >> From: Kan Liang <kan.liang@linux.intel.com>
> > >>
> > >> Define and apply the PERF_PMU_CAP_VPMU_PASSTHROUGH flag for the version 4
> > >> and later PMUs
> > >
> > > Why?  I get that is an RFC, but it's not at all obvious to me why this needs to
> > > take a dependency on v4+.
> >
> > The IA32_PERF_GLOBAL_STATUS_RESET/SET MSRs are introduced in v4. They
> > are used in the save/restore of PMU state. Please see PATCH 23/41.
> > So it's limited to v4+ for now.
> 
> Prior to version 4, semi-passthrough is possible, but IA32_PERF_GLOBAL_STATUS
> has to be intercepted and emulated, since it is non-trivial to set bits in
> this MSR.

Ah, then this _perf_ capability should be PERF_PMU_CAP_WRITABLE_GLOBAL_STATUS or
so, especially since it's introduced in advance of the KVM side of things.  Then
whether or not to support a mediated PMU becomes a KVM decision, e.g. intercepting
accesses to IA32_PERF_GLOBAL_STATUS doesn't seem like a complete deal breaker
(or maybe it is, I now see the comment about it being used to do the context switch).

And peeking ahead, IIUC perf effectively _forces_ a passthrough model when
has_vpmu_passthrough_cap() is true, which is wrong.  There needs to be a user/admin
opt-in (or opt-out) to that behavior, at a kernel/perf level, not just at a KVM
level.  Hmm, or is perf relying on KVM to do that right thing?  I.e. relying on
KVM to do perf_guest_{enter,exit}() if and only if the PMU can support the
passthrough model.

If that's the case, most of the has_vpmu_passthrough_cap() checks are gratiutous
and confusing, e.g. just WARN if KVM (or some other module) tries to trigger a
PMU context switch when it's not supported by perf.
Liang, Kan April 11, 2024, 7:13 p.m. UTC | #5
On 2024-04-11 1:46 p.m., Sean Christopherson wrote:
> On Thu, Apr 11, 2024, Jim Mattson wrote:
>> On Thu, Apr 11, 2024 at 10:21 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>>> On 2024-04-11 1:04 p.m., Sean Christopherson wrote:
>>>> On Fri, Jan 26, 2024, Xiong Zhang wrote:
>>>>> From: Kan Liang <kan.liang@linux.intel.com>
>>>>>
>>>>> Define and apply the PERF_PMU_CAP_VPMU_PASSTHROUGH flag for the version 4
>>>>> and later PMUs
>>>>
>>>> Why?  I get that is an RFC, but it's not at all obvious to me why this needs to
>>>> take a dependency on v4+.
>>>
>>> The IA32_PERF_GLOBAL_STATUS_RESET/SET MSRs are introduced in v4. They
>>> are used in the save/restore of PMU state. Please see PATCH 23/41.
>>> So it's limited to v4+ for now.
>>
>> Prior to version 4, semi-passthrough is possible, but IA32_PERF_GLOBAL_STATUS
>> has to be intercepted and emulated, since it is non-trivial to set bits in
>> this MSR.
> 
> Ah, then this _perf_ capability should be PERF_PMU_CAP_WRITABLE_GLOBAL_STATUS or
> so, especially since it's introduced in advance of the KVM side of things.  Then
> whether or not to support a mediated PMU becomes a KVM decision, e.g. intercepting
> accesses to IA32_PERF_GLOBAL_STATUS doesn't seem like a complete deal breaker
> (or maybe it is, I now see the comment about it being used to do the context switch).

The PERF_PMU_CAP_VPMU_PASSTHROUGH is to indicate whether the PMU has the
capability to support passthrough mode. It's used to distinguish the
other PMUs, e.g., uncore PMU. It's just because the current RFC utilizes
the IA32_PERF_GLOBAL_STATUS_RESET/SET MSRs, I have to limit it to V4+.

I agree that it should be a KVM decision, not perf. The v4 check should
be removed.

Regarding the PERF_PMU_CAP_WRITABLE_GLOBAL_STATUS, I think perf already
passes the x86_pmu.version to KVM. Maybe KVM can add an internal flag to
track it, so a PERF_PMU_CAP_ bit can be saved?

> 
> And peeking ahead, IIUC perf effectively _forces_ a passthrough model when
> has_vpmu_passthrough_cap() is true, which is wrong.  There needs to be a user/admin
> opt-in (or opt-out) to that behavior, at a kernel/perf level, not just at a KVM
> level.  Hmm, or is perf relying on KVM to do that right thing?  I.e. relying on
> KVM to do perf_guest_{enter,exit}() if and only if the PMU can support the
> passthrough model.
>

Yes, perf relies on KVM to tell if a guest is entering the passthrough mode.

> If that's the case, most of the has_vpmu_passthrough_cap() checks are gratiutous
> and confusing, e.g. just WARN if KVM (or some other module) tries to trigger a
> PMU context switch when it's not supported by perf.

If there is only non supported PMUs running in the host, perf wouldn't
do any context switch. The guest can feel free to use the core PMU. We
should not WARN for this case.

Thanks,
Kan
Sean Christopherson April 11, 2024, 7:32 p.m. UTC | #6
On Thu, Apr 11, 2024, Sean Christopherson wrote:
> On Thu, Apr 11, 2024, Jim Mattson wrote:
> > On Thu, Apr 11, 2024 at 10:21 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
> > > On 2024-04-11 1:04 p.m., Sean Christopherson wrote:
> > > > On Fri, Jan 26, 2024, Xiong Zhang wrote:
> > > >> From: Kan Liang <kan.liang@linux.intel.com>
> > > >>
> > > >> Define and apply the PERF_PMU_CAP_VPMU_PASSTHROUGH flag for the version 4
> > > >> and later PMUs
> > > >
> > > > Why?  I get that is an RFC, but it's not at all obvious to me why this needs to
> > > > take a dependency on v4+.
> > >
> > > The IA32_PERF_GLOBAL_STATUS_RESET/SET MSRs are introduced in v4. They
> > > are used in the save/restore of PMU state. Please see PATCH 23/41.
> > > So it's limited to v4+ for now.
> > 
> > Prior to version 4, semi-passthrough is possible, but IA32_PERF_GLOBAL_STATUS
> > has to be intercepted and emulated, since it is non-trivial to set bits in
> > this MSR.
> 
> Ah, then this _perf_ capability should be PERF_PMU_CAP_WRITABLE_GLOBAL_STATUS or

And now I see that the capabilities are arch agnostic, whereas GLOBAL_STATUS
obviously is not.  Unless a writable GLOBAL_STATUS is a hard requirement for perf
to be able to support a mediated PMU, this capability probably doesn't need to
exist, e.g. KVM can check for a writable GLOBAL_STATUS just as easily as perf
(or perf can stuff x86_pmu_capability.writable_global_status directly).
Sean Christopherson April 11, 2024, 8:43 p.m. UTC | #7
On Thu, Apr 11, 2024, Kan Liang wrote:
> On 2024-04-11 1:46 p.m., Sean Christopherson wrote:
> > On Thu, Apr 11, 2024, Jim Mattson wrote:
> >> On Thu, Apr 11, 2024 at 10:21 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
> >>> On 2024-04-11 1:04 p.m., Sean Christopherson wrote:
> >>>> On Fri, Jan 26, 2024, Xiong Zhang wrote:
> >>>>> From: Kan Liang <kan.liang@linux.intel.com>
> >>>>>
> >>>>> Define and apply the PERF_PMU_CAP_VPMU_PASSTHROUGH flag for the version 4
> >>>>> and later PMUs
> >>>>
> >>>> Why?  I get that is an RFC, but it's not at all obvious to me why this needs to
> >>>> take a dependency on v4+.
> >>>
> >>> The IA32_PERF_GLOBAL_STATUS_RESET/SET MSRs are introduced in v4. They
> >>> are used in the save/restore of PMU state. Please see PATCH 23/41.
> >>> So it's limited to v4+ for now.
> >>
> >> Prior to version 4, semi-passthrough is possible, but IA32_PERF_GLOBAL_STATUS
> >> has to be intercepted and emulated, since it is non-trivial to set bits in
> >> this MSR.
> > 
> > Ah, then this _perf_ capability should be PERF_PMU_CAP_WRITABLE_GLOBAL_STATUS or
> > so, especially since it's introduced in advance of the KVM side of things.  Then
> > whether or not to support a mediated PMU becomes a KVM decision, e.g. intercepting
> > accesses to IA32_PERF_GLOBAL_STATUS doesn't seem like a complete deal breaker
> > (or maybe it is, I now see the comment about it being used to do the context switch).
> 
> The PERF_PMU_CAP_VPMU_PASSTHROUGH is to indicate whether the PMU has the
> capability to support passthrough mode. It's used to distinguish the
> other PMUs, e.g., uncore PMU.

Ah, the changelog blurb about SW/uncore PMUs finally clicked.

> Regarding the PERF_PMU_CAP_WRITABLE_GLOBAL_STATUS, I think perf already
> passes the x86_pmu.version to KVM. Maybe KVM can add an internal flag to
> track it, so a PERF_PMU_CAP_ bit can be saved?

Yeah, I think that's totally fine.  At some point, KVM is going to need to know
that GLOBAL_STATUS is writable if PMU.version >= 4, e.g. to correctly emulate
guest accesses, so I don't see any reason to bury that logic in perf.

> > And peeking ahead, IIUC perf effectively _forces_ a passthrough model when
> > has_vpmu_passthrough_cap() is true, which is wrong.  There needs to be a user/admin
> > opt-in (or opt-out) to that behavior, at a kernel/perf level, not just at a KVM
> > level.  Hmm, or is perf relying on KVM to do that right thing?  I.e. relying on
> > KVM to do perf_guest_{enter,exit}() if and only if the PMU can support the
> > passthrough model.
> >
> 
> Yes, perf relies on KVM to tell if a guest is entering the passthrough mode.
> 
> > If that's the case, most of the has_vpmu_passthrough_cap() checks are gratiutous
> > and confusing, e.g. just WARN if KVM (or some other module) tries to trigger a
> > PMU context switch when it's not supported by perf.
> 
> If there is only non supported PMUs running in the host, perf wouldn't
> do any context switch. The guest can feel free to use the core PMU. We
> should not WARN for this case.

I'm struggling to wrap my head around this.  If there is no supported PMU in the
host, how can there be a core PMU for the guest to use?  KVM virtualizes a PMU
if and only if kvm_init_pmu_capability() reports a compatible PMU, and IIUC that
reporting is done based on the core PMU.

Specifically, I want to ensure we don't screw is passing through PMU MSR access,
e.g. because KVM thinks perf will context switch those MSRs, but perf doesn't
because perf doesn't think the relevant PMU supports a mediate/passthrough mode.
Liang, Kan April 11, 2024, 9:04 p.m. UTC | #8
On 2024-04-11 4:43 p.m., Sean Christopherson wrote:
>>> And peeking ahead, IIUC perf effectively _forces_ a passthrough model when
>>> has_vpmu_passthrough_cap() is true, which is wrong.  There needs to be a user/admin
>>> opt-in (or opt-out) to that behavior, at a kernel/perf level, not just at a KVM
>>> level.  Hmm, or is perf relying on KVM to do that right thing?  I.e. relying on
>>> KVM to do perf_guest_{enter,exit}() if and only if the PMU can support the
>>> passthrough model.
>>>
>> Yes, perf relies on KVM to tell if a guest is entering the passthrough mode.
>>
>>> If that's the case, most of the has_vpmu_passthrough_cap() checks are gratiutous
>>> and confusing, e.g. just WARN if KVM (or some other module) tries to trigger a
>>> PMU context switch when it's not supported by perf.
>> If there is only non supported PMUs running in the host, perf wouldn't
>> do any context switch. The guest can feel free to use the core PMU. We
>> should not WARN for this case.
> I'm struggling to wrap my head around this.  If there is no supported PMU in the
> host, how can there be a core PMU for the guest to use?  KVM virtualizes a PMU
> if and only if kvm_init_pmu_capability() reports a compatible PMU, and IIUC that
> reporting is done based on the core PMU.
> 
> Specifically, I want to ensure we don't screw is passing through PMU MSR access,
> e.g. because KVM thinks perf will context switch those MSRs, but perf doesn't

Perf only context switches the MSRs of the PMU with the
PERF_PMU_CAP_VPMU_PASSTHROUGH flag. (Only the core PMU for this RFC).

For other PMUs without the PERF_PMU_CAP_VPMU_PASSTHROUGH, perf does
nothing in perf_guest_enter/exit().

KVM can rely on the flag to decide whether to enable the passthrough
mode for the PMU.

Thanks,
Kan
diff mbox series

Patch

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index a08f794a0e79..cf790c37757a 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4662,6 +4662,9 @@  static void intel_pmu_check_hybrid_pmus(struct x86_hybrid_pmu *pmu)
 	else
 		pmu->pmu.capabilities |= ~PERF_PMU_CAP_AUX_OUTPUT;
 
+	if (x86_pmu.version >= 4)
+		pmu->pmu.capabilities |= PERF_PMU_CAP_VPMU_PASSTHROUGH;
+
 	intel_pmu_check_event_constraints(pmu->event_constraints,
 					  pmu->num_counters,
 					  pmu->num_counters_fixed,
@@ -6137,6 +6140,9 @@  __init int intel_pmu_init(void)
 			pr_cont(" AnyThread deprecated, ");
 	}
 
+	if (version >= 4)
+		x86_get_pmu(smp_processor_id())->capabilities |= PERF_PMU_CAP_VPMU_PASSTHROUGH;
+
 	/*
 	 * Install the hw-cache-events table:
 	 */
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index afb028c54f33..60eff413dbba 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -291,6 +291,7 @@  struct perf_event_pmu_context;
 #define PERF_PMU_CAP_NO_EXCLUDE			0x0040
 #define PERF_PMU_CAP_AUX_OUTPUT			0x0080
 #define PERF_PMU_CAP_EXTENDED_HW_TYPE		0x0100
+#define PERF_PMU_CAP_VPMU_PASSTHROUGH		0x0200
 
 struct perf_output_handle;