Message ID | 20250217112412.3963324-2-maz@kernel.org (mailing list archive) |
---|---|
State | New |
Series | KVM: arm64: EL2 PMU reset handling fixes |
Hey,

On Mon, Feb 17, 2025 at 11:24:11AM +0000, Marc Zyngier wrote:
> The MDCR_EL2 documentation indicates that the HPMN field has
> the following behaviour:
>
> "On a Warm reset, this field resets to the expression NUM_PMU_COUNTERS."
>
> However, it appears we reset it to zero, which is not very useful.
>
> Add a reset helper for MDCR_EL2, and handle the case where userspace
> changes the target PMU, which may force us to change HPMN again.
>
> Reported-by: Joey Gouly <joey.gouly@arm.com>
> Signed-off-by: Marc Zyngier <maz@kernel.org>

The existing ABI expectations are that writes to PMCR_EL0.N constrain
the number of counters, so that should have a similar effect on
MDCR_EL2.HPMN.

At the same time, I get the feeling that we should throw out this whole
behavior of writing N to change the shape of the PMU, because it
completely breaks down for NV. PMCR_EL0.N is another one of those fields
that change behavior based on EL and isn't a global source of truth on
the shape of the PMU.

What do you think about adding a new vCPU attribute for selecting the
number of counters for a VM? We can allow non-nested VMs to use the
'old' method of writing PMCR_EL0.N and force nested VMs to use the
attribute.

We can then enforce ordering on the attribute and prevent it from being
used after vCPU reset.

Thanks,
Oliver
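For readers following the ABI point above, here is a toy userspace model of "writes to PMCR_EL0.N constrain the number of counters". It is only a sketch: `struct toy_vm_pmu`, `toy_set_pmcr()` and `toy_hpmn_reset_value()` are hypothetical names, and the choice to silently ignore an oversized N is an assumption of the model rather than a statement about KVM's code. The one architectural fact relied on is that PMCR_EL0.N sits in bits [15:11].

```c
#include <assert.h>
#include <stdint.h>

/* PMCR_EL0.N occupies bits [15:11] of the register. */
#define PMCR_N_SHIFT	11
#define PMCR_N_MASK	(UINT64_C(0x1f) << PMCR_N_SHIFT)

/* Hypothetical VM-wide PMU state; not KVM's real structure. */
struct toy_vm_pmu {
	uint8_t max_counters;	/* what the backing PMU implements */
	uint8_t nr_counters;	/* what the guest is allowed to see */
};

/*
 * Model a userspace write to PMCR_EL0: an N no larger than the hardware
 * limit narrows the PMU, anything larger is ignored (model assumption).
 */
static void toy_set_pmcr(struct toy_vm_pmu *pmu, uint64_t val)
{
	uint8_t new_n = (uint8_t)((val & PMCR_N_MASK) >> PMCR_N_SHIFT);

	if (new_n <= pmu->max_counters)
		pmu->nr_counters = new_n;

	assert(pmu->nr_counters <= pmu->max_counters);
}

/* Under that ABI, the HPMN reset value would follow the constrained count. */
static uint8_t toy_hpmn_reset_value(const struct toy_vm_pmu *pmu)
{
	return pmu->nr_counters;
}
```

In this model, a VM backed by a 6-counter PMU that has N written as 4 would reset HPMN to 4, which is the "similar effect" described above.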
On Mon, 17 Feb 2025 18:53:50 +0000, Oliver Upton <oliver.upton@linux.dev> wrote:
>
> Hey,
>
> On Mon, Feb 17, 2025 at 11:24:11AM +0000, Marc Zyngier wrote:
> > The MDCR_EL2 documentation indicates that the HPMN field has
> > the following behaviour:
> >
> > "On a Warm reset, this field resets to the expression NUM_PMU_COUNTERS."
> >
> > However, it appears we reset it to zero, which is not very useful.
> >
> > Add a reset helper for MDCR_EL2, and handle the case where userspace
> > changes the target PMU, which may force us to change HPMN again.
> >
> > Reported-by: Joey Gouly <joey.gouly@arm.com>
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
>
> The existing ABI expectations are that writes to PMCR_EL0.N constrain
> the number of counters, so that should have a similar effect on
> MDCR_EL2.HPMN.
>
> At the same time, I get the feeling that we should throw out this whole
> behavior of writing N to change the shape of the PMU, because it
> completely breaks down for NV. PMCR_EL0.N is another one of those fields
> that change behavior based on EL and isn't a global source of truth on
> the shape of the PMU.
>
> What do you think about adding a new vCPU attribute for selecting the
> number of counters for a VM? We can allow non-nested VMs to use the
> 'old' method of writing PMCR_EL0.N and force nested VMs to use the
> attribute.

VCPU attribute? or PMU attribute? I'm really not keen on the former,
but the latter is probably workable, as it is VM-wide, similar to the
way we keep track of pmcr_n.

> We can then enforce ordering on the attribute and prevent it from being
> used after vCPU reset.

How would that work? Do you really want to mandate the PMU selection
(with its counter capping) to strictly occur between vcpu creation and
init?

This would, for example, break kvmtool which has these two operations
back-to-back, and sneaking new device-specific actions in the middle
is a bit unpalatable (there is a split between VM-wide and per-vcpu
actions).

Any idea?

M.
On Wed, Feb 19, 2025 at 02:03:49PM +0000, Marc Zyngier wrote:
> On Mon, 17 Feb 2025 18:53:50 +0000, Oliver Upton <oliver.upton@linux.dev> wrote:
> > What do you think about adding a new vCPU attribute for selecting the
> > number of counters for a VM? We can allow non-nested VMs to use the
> > 'old' method of writing PMCR_EL0.N and force nested VMs to use the
> > attribute.
>
> VCPU attribute? or PMU attribute? I'm really not keen on the former,
> but the latter is probably workable, as it is VM-wide, similar to the
> way we keep track of pmcr_n.

Well the _existing_ PMU attributes are actually vCPU attributes. I do
agree that accessing them as a VM attribute makes more sense, but that's
the UAPI we already have...

> > We can then enforce ordering on the attribute and prevent it from being
> > used after vCPU reset.
>
> How would that work? Do you really want to mandate the PMU selection
> (with its counter capping) to strictly occur between vcpu creation and
> init?
>
> This would, for example, break kvmtool which has these two operations
> back-to-back, and sneaking new device-specific actions in the middle
> is a bit unpalatable (there is a split between VM-wide and per-vcpu
> actions).
>
> Any idea?

If we want to do this the 'right' way, we should provide VM attributes
for selecting the PMU implementation / configuring the event filter
to complement an attribute for setting the number of event counters.

I don't want to have a mix-and-match approach where vPMU attributes are
scattered between the vCPU and the VM since it requires a similar amount
of gymnastics in userspace to set crap up.

Thanks,
Oliver
On Wed, 19 Feb 2025 19:04:12 +0000, Oliver Upton <oliver.upton@linux.dev> wrote:
>
> On Wed, Feb 19, 2025 at 02:03:49PM +0000, Marc Zyngier wrote:
> > On Mon, 17 Feb 2025 18:53:50 +0000, Oliver Upton <oliver.upton@linux.dev> wrote:
> > > What do you think about adding a new vCPU attribute for selecting the
> > > number of counters for a VM? We can allow non-nested VMs to use the
> > > 'old' method of writing PMCR_EL0.N and force nested VMs to use the
> > > attribute.
> >
> > VCPU attribute? or PMU attribute? I'm really not keen on the former,
> > but the latter is probably workable, as it is VM-wide, similar to the
> > way we keep track of pmcr_n.
>
> Well the _existing_ PMU attributes are actually vCPU attributes. I do
> agree that accessing them as a VM attribute makes more sense, but that's
> the UAPI we already have...

Gah, I remember now. Someone please take the API to the backyard...

>
> > > We can then enforce ordering on the attribute and prevent it from being
> > > used after vCPU reset.
> >
> > How would that work? Do you really want to mandate the PMU selection
> > (with its counter capping) to strictly occur between vcpu creation and
> > init?
> >
> > This would, for example, break kvmtool which has these two operations
> > back-to-back, and sneaking new device-specific actions in the middle
> > is a bit unpalatable (there is a split between VM-wide and per-vcpu
> > actions).
> >
> > Any idea?
>
> If we want to do this the 'right' way, we should provide VM attributes
> for selecting the PMU implementation / configuring the event filter
> to complement an attribute for setting the number of event counters.
>
> I don't want to have a mix-and-match approach where vPMU attributes are
> scattered between the vCPU and the VM since it requires a similar amount
> of gymnastics in userspace to set crap up.

I agree on not mixing vCPU and VM scoped attributes, even if that
amounts to the same thing in the back-end.

But freezing the number of counters on vcpu reset is something that
should be considered very carefully, and I fear it breaks existing
models -- especially given that this is yet another one-off.

I wonder if we should instead make use of the KVM_ARM_VCPU_FINALIZE
ioctl just like we do for SVE, passing KVM_ARM_VCPU_PMU_V3 instead.
This would make use of an existing mechanism and lock the PMU for good
(implementation, IRQ, number of counters...).

M.
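The "finalize to lock the PMU" idea above can be pictured with a small userspace model. This is a sketch under assumptions, not a proposed kernel implementation: `struct toy_pmu_cfg`, `toy_pmu_set_nr_counters()` and `toy_pmu_finalize()` are hypothetical names, and the error codes are chosen for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical PMU configuration; names are illustrative only. */
struct toy_pmu_cfg {
	uint8_t max_counters;	/* limit imposed by the selected PMU */
	uint8_t nr_counters;	/* count exposed to the guest */
	bool finalized;		/* set once the configuration is locked */
};

/* Change the counter count; allowed only before finalization. */
static int toy_pmu_set_nr_counters(struct toy_pmu_cfg *cfg, uint8_t n)
{
	if (cfg->finalized)
		return -EPERM;
	if (n > cfg->max_counters)
		return -EINVAL;

	cfg->nr_counters = n;
	return 0;
}

/* Lock the configuration for good; a second finalize is rejected. */
static int toy_pmu_finalize(struct toy_pmu_cfg *cfg)
{
	if (cfg->finalized)
		return -EPERM;

	assert(cfg->nr_counters <= cfg->max_counters);
	cfg->finalized = true;
	return 0;
}
```

The point of the model is that the lock is an explicit step taken by userspace, so configuration is not tied to a fixed window between vcpu creation and init.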
diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index 6c5950b9ceac8..5a71c3744c4d7 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -1007,6 +1007,19 @@ static void kvm_arm_set_pmu(struct kvm *kvm, struct arm_pmu *arm_pmu)
 
 	kvm->arch.arm_pmu = arm_pmu;
 	kvm->arch.pmcr_n = kvm_arm_pmu_get_max_counters(kvm);
+
+	/* Reset MDCR_EL2.HPMN behind the vcpus' back... */
+	if (test_bit(KVM_ARM_VCPU_HAS_EL2, kvm->arch.vcpu_features)) {
+		struct kvm_vcpu *vcpu;
+		unsigned long i;
+
+		kvm_for_each_vcpu(i, vcpu, kvm) {
+			u64 val = __vcpu_sys_reg(vcpu, MDCR_EL2);
+			val &= ~MDCR_EL2_HPMN;
+			val |= FIELD_PREP(MDCR_EL2_HPMN, kvm->arch.pmcr_n);
+			__vcpu_sys_reg(vcpu, MDCR_EL2) = val;
+		}
+	}
 }
 
 /**
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 82430c1e1dd02..380f22f19cb42 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2493,6 +2493,11 @@ static bool access_mdcr(struct kvm_vcpu *vcpu,
 
 	return true;
 }
+static u64 reset_mdcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
+{
+	__vcpu_sys_reg(vcpu, r->reg) = vcpu->kvm->arch.pmcr_n;
+	return vcpu->kvm->arch.pmcr_n;
+}
 
 /*
  * Architected system registers.
@@ -3034,7 +3039,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	EL2_REG(SCTLR_EL2, access_rw, reset_val, SCTLR_EL2_RES1),
 	EL2_REG(ACTLR_EL2, access_rw, reset_val, 0),
 	EL2_REG_VNCR(HCR_EL2, reset_hcr, 0),
-	EL2_REG(MDCR_EL2, access_mdcr, reset_val, 0),
+	EL2_REG(MDCR_EL2, access_mdcr, reset_mdcr, 0),
 	EL2_REG(CPTR_EL2, access_rw, reset_val, CPTR_NVHE_EL2_RES1),
 	EL2_REG_VNCR(HSTR_EL2, reset_val, 0),
 	EL2_REG_VNCR(HFGRTR_EL2, reset_val, 0),
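As an aid to reading the first hunk, the clear-then-set update of HPMN can be reproduced in plain userspace C. This is a sketch only: `TOY_MDCR_EL2_HPMN_MASK` and `toy_mdcr_set_hpmn()` are hypothetical stand-ins for the kernel's `MDCR_EL2_HPMN` and `FIELD_PREP()`, and the single fact assumed from the architecture is that HPMN occupies bits [4:0] of MDCR_EL2.

```c
#include <assert.h>
#include <stdint.h>

/* MDCR_EL2.HPMN occupies bits [4:0]. */
#define TOY_MDCR_EL2_HPMN_MASK	UINT64_C(0x1f)

/*
 * Userspace stand-in for the clear-then-FIELD_PREP() sequence:
 * replace HPMN with the new counter count, leave all other bits alone.
 */
static uint64_t toy_mdcr_set_hpmn(uint64_t mdcr, uint8_t nr_counters)
{
	/* The count must fit in the 5-bit field. */
	assert(nr_counters <= TOY_MDCR_EL2_HPMN_MASK);

	mdcr &= ~TOY_MDCR_EL2_HPMN_MASK;
	mdcr |= (uint64_t)nr_counters & TOY_MDCR_EL2_HPMN_MASK;
	return mdcr;
}
```

Because the field starts at bit 0, storing the bare counter count into an otherwise zero register, as `reset_mdcr()` does, gives the same value as this helper applied to 0.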
The MDCR_EL2 documentation indicates that the HPMN field has
the following behaviour:

"On a Warm reset, this field resets to the expression NUM_PMU_COUNTERS."

However, it appears we reset it to zero, which is not very useful.

Add a reset helper for MDCR_EL2, and handle the case where userspace
changes the target PMU, which may force us to change HPMN again.

Reported-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/pmu-emul.c | 13 +++++++++++++
 arch/arm64/kvm/sys_regs.c |  7 ++++++-
 2 files changed, 19 insertions(+), 1 deletion(-)