Message ID | 20230117013542.371944-4-reijiw@google.com (mailing list archive)
---|---
State | New, archived
Series | KVM: arm64: PMU: Allow userspace to limit the number of PMCs on vCPU
On Mon, Jan 16, 2023 at 05:35:37PM -0800, Reiji Watanabe wrote:
> The number of PMU event counters is indicated in PMCR_EL0.N.
> For a vCPU with PMUv3 configured, its value will be the same as
> the host value by default. Userspace can set PMCR_EL0.N for the
> vCPU to a lower value than the host value using KVM_SET_ONE_REG.
> However, it is practically unsupported, as reset_pmcr() resets
> PMCR_EL0.N to the host value on vCPU reset.
>
> Change reset_pmcr() to preserve the vCPU's PMCR_EL0.N value on
> vCPU reset so that userspace can limit the number of the PMU
> event counter on the vCPU.
>
> Signed-off-by: Reiji Watanabe <reijiw@google.com>
> ---
>  arch/arm64/kvm/pmu-emul.c | 6 ++++++
>  arch/arm64/kvm/sys_regs.c | 4 +++-
>  2 files changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
> index 24908400e190..937a272b00a5 100644
> --- a/arch/arm64/kvm/pmu-emul.c
> +++ b/arch/arm64/kvm/pmu-emul.c
> @@ -213,6 +213,12 @@ void kvm_pmu_vcpu_init(struct kvm_vcpu *vcpu)
>
>  	for (i = 0; i < ARMV8_PMU_MAX_COUNTERS; i++)
>  		pmu->pmc[i].idx = i;
> +
> +	/*
> +	 * Initialize PMCR_EL0 for the vCPU with the host value so that
> +	 * the value is available at the very first vCPU reset.
> +	 */
> +	__vcpu_sys_reg(vcpu, PMCR_EL0) = read_sysreg(pmcr_el0);

I think we need to derive a sanitised value for PMCR_EL0.N, as I believe
nothing in the architecture prevents implementers from gluing together
cores with varying numbers of PMCs. We probably haven't noticed it yet
since it would appear all Arm designs have had 6 PMCs.

> }
>
> /**
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 4959658b502c..67c1bd39b478 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -637,8 +637,10 @@ static void reset_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
>  	if (!kvm_arm_support_pmu_v3())
>  		return;
>
> +	/* PMCR_EL0 for the vCPU is set to the host value at vCPU creation. */
> +

nit: I think we can do without the floating comment here.

--
Thanks,
Oliver
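For illustration, a minimal sketch of the kind of sanitisation Oliver is suggesting — tracking the smallest PMCR_EL0.N seen across CPUs rather than trusting whichever CPU the vCPU is created on. This is hypothetical code, not part of the series; the `kvm_host_pmcr_n` variable and the hotplug hook that would call the helper are assumptions.

```c
/*
 * Hypothetical sketch (not in this series): keep the smallest
 * PMCR_EL0.N value observed on any CPU so a sanitised value can be
 * handed to vCPUs. Assumes some CPU-onlining hook calls this on
 * every CPU.
 */
static u8 kvm_host_pmcr_n = ARMV8_PMU_MAX_COUNTERS;

static void kvm_update_host_pmcr_n(void)
{
	u8 n = (read_sysreg(pmcr_el0) >> ARMV8_PMU_PMCR_N_SHIFT) &
	       ARMV8_PMU_PMCR_N_MASK;

	/* Only ever shrink the advertised number of counters */
	kvm_host_pmcr_n = min(kvm_host_pmcr_n, n);
}
```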
On Fri, 20 Jan 2023 00:30:33 +0000,
Oliver Upton <oliver.upton@linux.dev> wrote:
>
> On Mon, Jan 16, 2023 at 05:35:37PM -0800, Reiji Watanabe wrote:
> > The number of PMU event counters is indicated in PMCR_EL0.N.
> > For a vCPU with PMUv3 configured, its value will be the same as
> > the host value by default. Userspace can set PMCR_EL0.N for the
> > vCPU to a lower value than the host value using KVM_SET_ONE_REG.
> > However, it is practically unsupported, as reset_pmcr() resets
> > PMCR_EL0.N to the host value on vCPU reset.
> >
> > Change reset_pmcr() to preserve the vCPU's PMCR_EL0.N value on
> > vCPU reset so that userspace can limit the number of the PMU
> > event counter on the vCPU.
> >
> > Signed-off-by: Reiji Watanabe <reijiw@google.com>
> > ---
> >  arch/arm64/kvm/pmu-emul.c | 6 ++++++
> >  arch/arm64/kvm/sys_regs.c | 4 +++-
> >  2 files changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
> > index 24908400e190..937a272b00a5 100644
> > --- a/arch/arm64/kvm/pmu-emul.c
> > +++ b/arch/arm64/kvm/pmu-emul.c
> > @@ -213,6 +213,12 @@ void kvm_pmu_vcpu_init(struct kvm_vcpu *vcpu)
> >
> >  	for (i = 0; i < ARMV8_PMU_MAX_COUNTERS; i++)
> >  		pmu->pmc[i].idx = i;
> > +
> > +	/*
> > +	 * Initialize PMCR_EL0 for the vCPU with the host value so that
> > +	 * the value is available at the very first vCPU reset.
> > +	 */
> > +	__vcpu_sys_reg(vcpu, PMCR_EL0) = read_sysreg(pmcr_el0);
>
> I think we need to derive a sanitised value for PMCR_EL0.N, as I believe
> nothing in the architecture prevents implementers from gluing together
> cores with varying numbers of PMCs. We probably haven't noticed it yet
> since it would appear all Arm designs have had 6 PMCs.

This brings back the question of late onlining. How do you cope with
the onlining of such a CPU that has a smaller set of counters than its
online counterparts? This is at odds with the way the PMU code works.

If you have a different set of counters, you are likely to have a
different PMU altogether:

[    1.192606] hw perfevents: enabled with armv8_cortex_a57 PMU driver, 7 counters available
[    1.201254] hw perfevents: enabled with armv8_cortex_a53 PMU driver, 7 counters available

This isn't a broken system, but it has two sets of cores which are
massively different, and two PMUs.

This really should tie back to the PMU type we're counting on, and to
the set of CPUs that implements it. We already have some infrastructure
to check for the affinity of the PMU vs the CPU we're running on, and
this is already visible to userspace.

Can't we just leave this responsibility to userspace?

Thanks,

	M.
Hey Marc,

On Fri, Jan 20, 2023 at 12:12:32PM +0000, Marc Zyngier wrote:
> On Fri, 20 Jan 2023 00:30:33 +0000, Oliver Upton <oliver.upton@linux.dev> wrote:
> > I think we need to derive a sanitised value for PMCR_EL0.N, as I believe
> > nothing in the architecture prevents implementers from gluing together
> > cores with varying numbers of PMCs. We probably haven't noticed it yet
> > since it would appear all Arm designs have had 6 PMCs.
>
> This brings back the question of late onlining. How do you cope with
> the onlining of such a CPU that has a smaller set of counters than its
> online counterparts? This is at odds with the way the PMU code works.

You're absolutely right, any illusion we derived from the online set of
CPUs could fall apart with a late onlining of a different core.

> If you have a different set of counters, you are likely to have a
> different PMU altogether:
>
> [    1.192606] hw perfevents: enabled with armv8_cortex_a57 PMU driver, 7 counters available
> [    1.201254] hw perfevents: enabled with armv8_cortex_a53 PMU driver, 7 counters available
>
> This isn't a broken system, but it has two sets of cores which are
> massively different, and two PMUs.
>
> This really should tie back to the PMU type we're counting on, and to
> the set of CPUs that implements it. We already have some infrastructure
> to check for the affinity of the PMU vs the CPU we're running on, and
> this is already visible to userspace.
>
> Can't we just leave this responsibility to userspace?

Believe me, I'm always a fan of offloading things to userspace :)

If the VMM is privy to the details of the system it is on, then the
differing PMUs can be passed through to the guest w/ pinned vCPU
threads. I just worry about the case of a naive VMM that assumes a
homogeneous system. I don't think I could entirely blame the VMM in
this case either, as we've gone to lengths to sanitise the feature set
exposed to userspace.

What happens when a vCPU gets scheduled on a core where the vPMU
doesn't match? Ignoring other incongruences, it is not possible to
virtualize more counters than are supported by the vPMU of the core.

Stopping short of any major hacks in the kernel to fudge around the
problem, I believe we may need to provide better documentation of how
heterogeneous CPUs are handled in KVM and what userspace can do about
it.

--
Thanks,
Oliver
Hi Oliver, Marc,

Thank you for the review!

On Fri, Jan 20, 2023 at 10:05 AM Oliver Upton <oliver.upton@linux.dev> wrote:
>
> Hey Marc,
>
> On Fri, Jan 20, 2023 at 12:12:32PM +0000, Marc Zyngier wrote:
> > On Fri, 20 Jan 2023 00:30:33 +0000, Oliver Upton <oliver.upton@linux.dev> wrote:
> > > I think we need to derive a sanitised value for PMCR_EL0.N, as I believe
> > > nothing in the architecture prevents implementers from gluing together
> > > cores with varying numbers of PMCs. We probably haven't noticed it yet
> > > since it would appear all Arm designs have had 6 PMCs.
> >
> > This brings back the question of late onlining. How do you cope with
> > the onlining of such a CPU that has a smaller set of counters than its
> > online counterparts? This is at odds with the way the PMU code works.
>
> You're absolutely right, any illusion we derived from the online set of
> CPUs could fall apart with a late onlining of a different core.
>
> > If you have a different set of counters, you are likely to have a
> > different PMU altogether:
> >
> > [    1.192606] hw perfevents: enabled with armv8_cortex_a57 PMU driver, 7 counters available
> > [    1.201254] hw perfevents: enabled with armv8_cortex_a53 PMU driver, 7 counters available
> >
> > This isn't a broken system, but it has two sets of cores which are
> > massively different, and two PMUs.
> >
> > This really should tie back to the PMU type we're counting on, and to
> > the set of CPUs that implements it. We already have some infrastructure
> > to check for the affinity of the PMU vs the CPU we're running on, and
> > this is already visible to userspace.
> >
> > Can't we just leave this responsibility to userspace?
>
> Believe me, I'm always a fan of offloading things to userspace :)
>
> If the VMM is privy to the details of the system it is on, then the
> differing PMUs can be passed through to the guest w/ pinned vCPU
> threads. I just worry about the case of a naive VMM that assumes a
> homogeneous system. I don't think I could entirely blame the VMM in
> this case either, as we've gone to lengths to sanitise the feature set
> exposed to userspace.
>
> What happens when a vCPU gets scheduled on a core where the vPMU
> doesn't match? Ignoring other incongruences, it is not possible to
> virtualize more counters than are supported by the vPMU of the core.

I believe KVM_RUN will fail with KVM_EXIT_FAIL_ENTRY (please see the
code that handles ON_UNSUPPORTED_CPU).

> Stopping short of any major hacks in the kernel to fudge around the
> problem, I believe we may need to provide better documentation of how
> heterogeneous CPUs are handled in KVM and what userspace can do about
> it.

Documentation/virt/kvm/devices/vcpu.rst for KVM_ARM_VCPU_PMU_V3_SET_PMU
has some description of the current behavior at least (perhaps we may
need to update the documentation for this though).

Now I'm a bit worried about the validation code for PMCR_EL0.N as well,
as setting (restoring) PMCR_EL0 could be done on any pCPU (even before
using KVM_ARM_VCPU_PMU_V3_SET_PMU).

What I am currently looking at is something like this:

- Set the sanitised (min) value of PMCR_EL0.N among all PMUs for vCPUs
  by default.

- Validate the PMCR_EL0.N value that userspace tries to set against the
  max value on the system (this is to ensure that restoring PMCR_EL0
  for a vCPU works on any pCPU).

- Make KVM_RUN fail when PMCR_EL0.N for the vCPU indicates more
  counters than the PMU that is set for the vCPU.

What do you think?

Thank you,
Reiji
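A rough sketch of the second bullet in Reiji's proposal — validating a userspace-provided PMCR_EL0.N against a system-wide maximum so the register can be restored on any pCPU. This is hypothetical code, not from the series; the handler signature and the `kvm_arm_pmu_max_counters()` helper are assumptions.

```c
/*
 * Hypothetical sketch of the proposed validation: reject a PMCR_EL0.N
 * larger than any PMU on the system can provide.
 * kvm_arm_pmu_max_counters() is an assumed helper returning the
 * system-wide maximum number of event counters.
 */
static int set_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r,
		    u64 val)
{
	u8 n = (val >> ARMV8_PMU_PMCR_N_SHIFT) & ARMV8_PMU_PMCR_N_MASK;

	/* Refuse more counters than any PMU on the system implements */
	if (n > kvm_arm_pmu_max_counters())
		return -EINVAL;

	__vcpu_sys_reg(vcpu, PMCR_EL0) = val;
	return 0;
}
```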
diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index 24908400e190..937a272b00a5 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -213,6 +213,12 @@ void kvm_pmu_vcpu_init(struct kvm_vcpu *vcpu)
 
 	for (i = 0; i < ARMV8_PMU_MAX_COUNTERS; i++)
 		pmu->pmc[i].idx = i;
+
+	/*
+	 * Initialize PMCR_EL0 for the vCPU with the host value so that
+	 * the value is available at the very first vCPU reset.
+	 */
+	__vcpu_sys_reg(vcpu, PMCR_EL0) = read_sysreg(pmcr_el0);
 }
 
 /**
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 4959658b502c..67c1bd39b478 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -637,8 +637,10 @@ static void reset_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
 	if (!kvm_arm_support_pmu_v3())
 		return;
 
+	/* PMCR_EL0 for the vCPU is set to the host value at vCPU creation. */
+
 	/* Only preserve PMCR_EL0.N, and reset the rest to 0 */
-	pmcr = read_sysreg(pmcr_el0) & (ARMV8_PMU_PMCR_N_MASK << ARMV8_PMU_PMCR_N_SHIFT);
+	pmcr = __vcpu_sys_reg(vcpu, r->reg) & (ARMV8_PMU_PMCR_N_MASK << ARMV8_PMU_PMCR_N_SHIFT);
 	if (!kvm_supports_32bit_el0())
 		pmcr |= ARMV8_PMU_PMCR_LC;
The number of PMU event counters is indicated in PMCR_EL0.N. For a vCPU
with PMUv3 configured, its value will be the same as the host value by
default. Userspace can set PMCR_EL0.N for the vCPU to a lower value
than the host value using KVM_SET_ONE_REG. However, it is practically
unsupported, as reset_pmcr() resets PMCR_EL0.N to the host value on
vCPU reset.

Change reset_pmcr() to preserve the vCPU's PMCR_EL0.N value on vCPU
reset so that userspace can limit the number of the PMU event counter
on the vCPU.

Signed-off-by: Reiji Watanabe <reijiw@google.com>
---
 arch/arm64/kvm/pmu-emul.c | 6 ++++++
 arch/arm64/kvm/sys_regs.c | 4 +++-
 2 files changed, 9 insertions(+), 1 deletion(-)
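For context, a hedged sketch of how userspace could lower PMCR_EL0.N via KVM_SET_ONE_REG as described above. The register encoding (op0=3, op1=3, CRn=9, CRm=12, op2=0) and the ioctls are standard KVM uapi; the helper name is made up and error handling is minimal.

```c
/*
 * Userspace sketch: limit the number of PMU event counters exposed to a
 * vCPU by rewriting PMCR_EL0.N (bits [15:11]) before the first KVM_RUN.
 */
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

#define PMCR_EL0_REG_ID	ARM64_SYS_REG(3, 3, 9, 12, 0)	/* op0, op1, CRn, CRm, op2 */

static int limit_vcpu_pmcs(int vcpu_fd, uint64_t nr_counters)
{
	uint64_t pmcr;
	struct kvm_one_reg reg = {
		.id   = PMCR_EL0_REG_ID,
		.addr = (uint64_t)&pmcr,
	};

	if (ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg))	/* read the current value */
		return -1;

	pmcr &= ~(0x1fULL << 11);			/* clear PMCR_EL0.N */
	pmcr |= (nr_counters & 0x1f) << 11;		/* set the new, lower limit */

	return ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);	/* write it back */
}
```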