Message ID | 20240126085444.324918-2-xiong.y.zhang@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | KVM: x86/pmu: Introduce passthrough vPM | expand |
On Fri, Jan 26, 2024, Xiong Zhang wrote: > From: Kan Liang <kan.liang@linux.intel.com> > > Define and apply the PERF_PMU_CAP_VPMU_PASSTHROUGH flag for the version 4 > and later PMUs Why? I get that is an RFC, but it's not at all obvious to me why this needs to take a dependency on v4+.
On 2024-04-11 1:04 p.m., Sean Christopherson wrote: > On Fri, Jan 26, 2024, Xiong Zhang wrote: >> From: Kan Liang <kan.liang@linux.intel.com> >> >> Define and apply the PERF_PMU_CAP_VPMU_PASSTHROUGH flag for the version 4 >> and later PMUs > > Why? I get that is an RFC, but it's not at all obvious to me why this needs to > take a dependency on v4+. The IA32_PERF_GLOBAL_STATUS_RESET/SET MSRs are introduced in v4. They are used in the save/restore of PMU state. Please see PATCH 23/41. So it's limited to v4+ for now. Thanks, Kan
On Thu, Apr 11, 2024 at 10:21 AM Liang, Kan <kan.liang@linux.intel.com> wrote: > > > > On 2024-04-11 1:04 p.m., Sean Christopherson wrote: > > On Fri, Jan 26, 2024, Xiong Zhang wrote: > >> From: Kan Liang <kan.liang@linux.intel.com> > >> > >> Define and apply the PERF_PMU_CAP_VPMU_PASSTHROUGH flag for the version 4 > >> and later PMUs > > > > Why? I get that is an RFC, but it's not at all obvious to me why this needs to > > take a dependency on v4+. > > The IA32_PERF_GLOBAL_STATUS_RESET/SET MSRs are introduced in v4. They > are used in the save/restore of PMU state. Please see PATCH 23/41. > So it's limited to v4+ for now. Prior to version 4, semi-passthrough is possible, but IA32_PERF_GLOBAL_STATUS has to be intercepted and emulated, since it is non-trivial to set bits in this MSR.
On Thu, Apr 11, 2024, Jim Mattson wrote: > On Thu, Apr 11, 2024 at 10:21 AM Liang, Kan <kan.liang@linux.intel.com> wrote: > > On 2024-04-11 1:04 p.m., Sean Christopherson wrote: > > > On Fri, Jan 26, 2024, Xiong Zhang wrote: > > >> From: Kan Liang <kan.liang@linux.intel.com> > > >> > > >> Define and apply the PERF_PMU_CAP_VPMU_PASSTHROUGH flag for the version 4 > > >> and later PMUs > > > > > > Why? I get that is an RFC, but it's not at all obvious to me why this needs to > > > take a dependency on v4+. > > > > The IA32_PERF_GLOBAL_STATUS_RESET/SET MSRs are introduced in v4. They > > are used in the save/restore of PMU state. Please see PATCH 23/41. > > So it's limited to v4+ for now. > > Prior to version 4, semi-passthrough is possible, but IA32_PERF_GLOBAL_STATUS > has to be intercepted and emulated, since it is non-trivial to set bits in > this MSR. Ah, then this _perf_ capability should be PERF_PMU_CAP_WRITABLE_GLOBAL_STATUS or so, especially since it's introduced in advance of the KVM side of things. Then whether or not to support a mediated PMU becomes a KVM decision, e.g. intercepting accesses to IA32_PERF_GLOBAL_STATUS doesn't seem like a complete deal breaker (or maybe it is, I now see the comment about it being used to do the context switch). And peeking ahead, IIUC perf effectively _forces_ a passthrough model when has_vpmu_passthrough_cap() is true, which is wrong. There needs to be a user/admin opt-in (or opt-out) to that behavior, at a kernel/perf level, not just at a KVM level. Hmm, or is perf relying on KVM to do that right thing? I.e. relying on KVM to do perf_guest_{enter,exit}() if and only if the PMU can support the passthrough model. If that's the case, most of the has_vpmu_passthrough_cap() checks are gratiutous and confusing, e.g. just WARN if KVM (or some other module) tries to trigger a PMU context switch when it's not supported by perf.
On 2024-04-11 1:46 p.m., Sean Christopherson wrote: > On Thu, Apr 11, 2024, Jim Mattson wrote: >> On Thu, Apr 11, 2024 at 10:21 AM Liang, Kan <kan.liang@linux.intel.com> wrote: >>> On 2024-04-11 1:04 p.m., Sean Christopherson wrote: >>>> On Fri, Jan 26, 2024, Xiong Zhang wrote: >>>>> From: Kan Liang <kan.liang@linux.intel.com> >>>>> >>>>> Define and apply the PERF_PMU_CAP_VPMU_PASSTHROUGH flag for the version 4 >>>>> and later PMUs >>>> >>>> Why? I get that is an RFC, but it's not at all obvious to me why this needs to >>>> take a dependency on v4+. >>> >>> The IA32_PERF_GLOBAL_STATUS_RESET/SET MSRs are introduced in v4. They >>> are used in the save/restore of PMU state. Please see PATCH 23/41. >>> So it's limited to v4+ for now. >> >> Prior to version 4, semi-passthrough is possible, but IA32_PERF_GLOBAL_STATUS >> has to be intercepted and emulated, since it is non-trivial to set bits in >> this MSR. > > Ah, then this _perf_ capability should be PERF_PMU_CAP_WRITABLE_GLOBAL_STATUS or > so, especially since it's introduced in advance of the KVM side of things. Then > whether or not to support a mediated PMU becomes a KVM decision, e.g. intercepting > accesses to IA32_PERF_GLOBAL_STATUS doesn't seem like a complete deal breaker > (or maybe it is, I now see the comment about it being used to do the context switch). The PERF_PMU_CAP_VPMU_PASSTHROUGH is to indicate whether the PMU has the capability to support passthrough mode. It's used to distinguish the other PMUs, e.g., uncore PMU. It's just because the current RFC utilizes the IA32_PERF_GLOBAL_STATUS_RESET/SET MSRs, I have to limit it to V4+. I agree that it should be a KVM decision, not perf. The v4 check should be removed. Regarding the PERF_PMU_CAP_WRITABLE_GLOBAL_STATUS, I think perf already passes the x86_pmu.version to KVM. Maybe KVM can add an internal flag to track it, so a PERF_PMU_CAP_ bit can be saved? > > And peeking ahead, IIUC perf effectively _forces_ a passthrough model when > has_vpmu_passthrough_cap() is true, which is wrong. There needs to be a user/admin > opt-in (or opt-out) to that behavior, at a kernel/perf level, not just at a KVM > level. Hmm, or is perf relying on KVM to do that right thing? I.e. relying on > KVM to do perf_guest_{enter,exit}() if and only if the PMU can support the > passthrough model. > Yes, perf relies on KVM to tell if a guest is entering the passthrough mode. > If that's the case, most of the has_vpmu_passthrough_cap() checks are gratiutous > and confusing, e.g. just WARN if KVM (or some other module) tries to trigger a > PMU context switch when it's not supported by perf. If there is only non supported PMUs running in the host, perf wouldn't do any context switch. The guest can feel free to use the core PMU. We should not WARN for this case. Thanks, Kan
On Thu, Apr 11, 2024, Sean Christopherson wrote: > On Thu, Apr 11, 2024, Jim Mattson wrote: > > On Thu, Apr 11, 2024 at 10:21 AM Liang, Kan <kan.liang@linux.intel.com> wrote: > > > On 2024-04-11 1:04 p.m., Sean Christopherson wrote: > > > > On Fri, Jan 26, 2024, Xiong Zhang wrote: > > > >> From: Kan Liang <kan.liang@linux.intel.com> > > > >> > > > >> Define and apply the PERF_PMU_CAP_VPMU_PASSTHROUGH flag for the version 4 > > > >> and later PMUs > > > > > > > > Why? I get that is an RFC, but it's not at all obvious to me why this needs to > > > > take a dependency on v4+. > > > > > > The IA32_PERF_GLOBAL_STATUS_RESET/SET MSRs are introduced in v4. They > > > are used in the save/restore of PMU state. Please see PATCH 23/41. > > > So it's limited to v4+ for now. > > > > Prior to version 4, semi-passthrough is possible, but IA32_PERF_GLOBAL_STATUS > > has to be intercepted and emulated, since it is non-trivial to set bits in > > this MSR. > > Ah, then this _perf_ capability should be PERF_PMU_CAP_WRITABLE_GLOBAL_STATUS or And now I see that the capabilities are arch agnostic, whereas GLOBAL_STATUS obviously is not. Unless a writable GLOBAL_STATUS is a hard requirement for perf to be able to support a mediated PMU, this capability probably doesn't need to exist, e.g. KVM can check for a writable GLOBAL_STATUS just as easily as perf (or perf can stuff x86_pmu_capability.writable_global_status directly).
On Thu, Apr 11, 2024, Kan Liang wrote: > On 2024-04-11 1:46 p.m., Sean Christopherson wrote: > > On Thu, Apr 11, 2024, Jim Mattson wrote: > >> On Thu, Apr 11, 2024 at 10:21 AM Liang, Kan <kan.liang@linux.intel.com> wrote: > >>> On 2024-04-11 1:04 p.m., Sean Christopherson wrote: > >>>> On Fri, Jan 26, 2024, Xiong Zhang wrote: > >>>>> From: Kan Liang <kan.liang@linux.intel.com> > >>>>> > >>>>> Define and apply the PERF_PMU_CAP_VPMU_PASSTHROUGH flag for the version 4 > >>>>> and later PMUs > >>>> > >>>> Why? I get that is an RFC, but it's not at all obvious to me why this needs to > >>>> take a dependency on v4+. > >>> > >>> The IA32_PERF_GLOBAL_STATUS_RESET/SET MSRs are introduced in v4. They > >>> are used in the save/restore of PMU state. Please see PATCH 23/41. > >>> So it's limited to v4+ for now. > >> > >> Prior to version 4, semi-passthrough is possible, but IA32_PERF_GLOBAL_STATUS > >> has to be intercepted and emulated, since it is non-trivial to set bits in > >> this MSR. > > > > Ah, then this _perf_ capability should be PERF_PMU_CAP_WRITABLE_GLOBAL_STATUS or > > so, especially since it's introduced in advance of the KVM side of things. Then > > whether or not to support a mediated PMU becomes a KVM decision, e.g. intercepting > > accesses to IA32_PERF_GLOBAL_STATUS doesn't seem like a complete deal breaker > > (or maybe it is, I now see the comment about it being used to do the context switch). > > The PERF_PMU_CAP_VPMU_PASSTHROUGH is to indicate whether the PMU has the > capability to support passthrough mode. It's used to distinguish the > other PMUs, e.g., uncore PMU. Ah, the changelog blurb about SW/uncore PMUs finally clicked. > Regarding the PERF_PMU_CAP_WRITABLE_GLOBAL_STATUS, I think perf already > passes the x86_pmu.version to KVM. Maybe KVM can add an internal flag to > track it, so a PERF_PMU_CAP_ bit can be saved? Yeah, I think that's totally fine. At some point, KVM is going to need to know that GLOBAL_STATUS is writable if PMU.version >= 4, e.g. to correctly emulate guest accesses, so I don't see any reason to bury that logic in perf. > > And peeking ahead, IIUC perf effectively _forces_ a passthrough model when > > has_vpmu_passthrough_cap() is true, which is wrong. There needs to be a user/admin > > opt-in (or opt-out) to that behavior, at a kernel/perf level, not just at a KVM > > level. Hmm, or is perf relying on KVM to do that right thing? I.e. relying on > > KVM to do perf_guest_{enter,exit}() if and only if the PMU can support the > > passthrough model. > > > > Yes, perf relies on KVM to tell if a guest is entering the passthrough mode. > > > If that's the case, most of the has_vpmu_passthrough_cap() checks are gratiutous > > and confusing, e.g. just WARN if KVM (or some other module) tries to trigger a > > PMU context switch when it's not supported by perf. > > If there is only non supported PMUs running in the host, perf wouldn't > do any context switch. The guest can feel free to use the core PMU. We > should not WARN for this case. I'm struggling to wrap my head around this. If there is no supported PMU in the host, how can there be a core PMU for the guest to use? KVM virtualizes a PMU if and only if kvm_init_pmu_capability() reports a compatible PMU, and IIUC that reporting is done based on the core PMU. Specifically, I want to ensure we don't screw is passing through PMU MSR access, e.g. because KVM thinks perf will context switch those MSRs, but perf doesn't because perf doesn't think the relevant PMU supports a mediate/passthrough mode.
On 2024-04-11 4:43 p.m., Sean Christopherson wrote: >>> And peeking ahead, IIUC perf effectively _forces_ a passthrough model when >>> has_vpmu_passthrough_cap() is true, which is wrong. There needs to be a user/admin >>> opt-in (or opt-out) to that behavior, at a kernel/perf level, not just at a KVM >>> level. Hmm, or is perf relying on KVM to do that right thing? I.e. relying on >>> KVM to do perf_guest_{enter,exit}() if and only if the PMU can support the >>> passthrough model. >>> >> Yes, perf relies on KVM to tell if a guest is entering the passthrough mode. >> >>> If that's the case, most of the has_vpmu_passthrough_cap() checks are gratiutous >>> and confusing, e.g. just WARN if KVM (or some other module) tries to trigger a >>> PMU context switch when it's not supported by perf. >> If there is only non supported PMUs running in the host, perf wouldn't >> do any context switch. The guest can feel free to use the core PMU. We >> should not WARN for this case. > I'm struggling to wrap my head around this. If there is no supported PMU in the > host, how can there be a core PMU for the guest to use? KVM virtualizes a PMU > if and only if kvm_init_pmu_capability() reports a compatible PMU, and IIUC that > reporting is done based on the core PMU. > > Specifically, I want to ensure we don't screw is passing through PMU MSR access, > e.g. because KVM thinks perf will context switch those MSRs, but perf doesn't Perf only context switches the MSRs of the PMU with the PERF_PMU_CAP_VPMU_PASSTHROUGH flag. (Only the core PMU for this RFC). For other PMUs without the PERF_PMU_CAP_VPMU_PASSTHROUGH, perf does nothing in perf_guest_enter/exit(). KVM can rely on the flag to decide whether to enable the passthrough mode for the PMU. Thanks, Kan
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c index a08f794a0e79..cf790c37757a 100644 --- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -4662,6 +4662,9 @@ static void intel_pmu_check_hybrid_pmus(struct x86_hybrid_pmu *pmu) else pmu->pmu.capabilities |= ~PERF_PMU_CAP_AUX_OUTPUT; + if (x86_pmu.version >= 4) + pmu->pmu.capabilities |= PERF_PMU_CAP_VPMU_PASSTHROUGH; + intel_pmu_check_event_constraints(pmu->event_constraints, pmu->num_counters, pmu->num_counters_fixed, @@ -6137,6 +6140,9 @@ __init int intel_pmu_init(void) pr_cont(" AnyThread deprecated, "); } + if (version >= 4) + x86_get_pmu(smp_processor_id())->capabilities |= PERF_PMU_CAP_VPMU_PASSTHROUGH; + /* * Install the hw-cache-events table: */ diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index afb028c54f33..60eff413dbba 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -291,6 +291,7 @@ struct perf_event_pmu_context; #define PERF_PMU_CAP_NO_EXCLUDE 0x0040 #define PERF_PMU_CAP_AUX_OUTPUT 0x0080 #define PERF_PMU_CAP_EXTENDED_HW_TYPE 0x0100 +#define PERF_PMU_CAP_VPMU_PASSTHROUGH 0x0200 struct perf_output_handle;