diff mbox series

[1/2] KVM: arm64: Fix MDCR_EL2.HPMN reset value

Message ID 20250217112412.3963324-2-maz@kernel.org (mailing list archive)
State New
Headers show
Series KVM: arm64: EL2 PMU reset handling fixes | expand

Commit Message

Marc Zyngier Feb. 17, 2025, 11:24 a.m. UTC
The MDCR_EL2 documentation indicates that the HPMN field has
the following behaviour:

"On a Warm reset, this field resets to the expression NUM_PMU_COUNTERS."

However, it appears we reset it to zero, which is not very useful.

Add a reset helper for MDCR_EL2, and handle the case where userspace
changes the target PMU, which may force us to change HPMN again.

Reported-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/pmu-emul.c | 13 +++++++++++++
 arch/arm64/kvm/sys_regs.c |  7 ++++++-
 2 files changed, 19 insertions(+), 1 deletion(-)

Comments

Oliver Upton Feb. 17, 2025, 6:53 p.m. UTC | #1
Hey,

On Mon, Feb 17, 2025 at 11:24:11AM +0000, Marc Zyngier wrote:
> The MDCR_EL2 documentation indicates that the HPMN field has
> the following behaviour:
> 
> "On a Warm reset, this field resets to the expression NUM_PMU_COUNTERS."
> 
> However, it appears we reset it to zero, which is not very useful.
> 
> Add a reset helper for MDCR_EL2, and handle the case where userspace
> changes the target PMU, which may force us to change HPMN again.
> 
> Reported-by: Joey Gouly <joey.gouly@arm.com>
> Signed-off-by: Marc Zyngier <maz@kernel.org>

The existing ABI expectations are that writes to PMCR_EL0.N constrain
the number of counters, so that should have a similar effect on
MDCR_EL2.HPMN.

At the same time, I get the feeling that we should throw out this whole
behavior of writing N to change the shape of the PMU, because it
complete breaks down for NV. PMCR_EL0.N is another one of those fields
that change behavior based on EL and isn't a global source of truth on
the shape of the PMU.

What do you think about adding a new vCPU attribute for selecting the
number of counters for a VM? We can allow non-nested VMs to use the
'old' method of writing PMCR_EL0.N and force nested VMs to use the
attribute.

We can then enforce ordering on the attribute and prevent it from being
used after vCPU reset.

Thanks,
Oliver
Marc Zyngier Feb. 19, 2025, 2:03 p.m. UTC | #2
On Mon, 17 Feb 2025 18:53:50 +0000,
Oliver Upton <oliver.upton@linux.dev> wrote:
> 
> Hey,
> 
> On Mon, Feb 17, 2025 at 11:24:11AM +0000, Marc Zyngier wrote:
> > The MDCR_EL2 documentation indicates that the HPMN field has
> > the following behaviour:
> > 
> > "On a Warm reset, this field resets to the expression NUM_PMU_COUNTERS."
> > 
> > However, it appears we reset it to zero, which is not very useful.
> > 
> > Add a reset helper for MDCR_EL2, and handle the case where userspace
> > changes the target PMU, which may force us to change HPMN again.
> > 
> > Reported-by: Joey Gouly <joey.gouly@arm.com>
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> 
> The existing ABI expectations are that writes to PMCR_EL0.N constrain
> the number of counters, so that should have a similar effect on
> MDCR_EL2.HPMN.
> 
> At the same time, I get the feeling that we should throw out this whole
> behavior of writing N to change the shape of the PMU, because it
> complete breaks down for NV. PMCR_EL0.N is another one of those fields
> that change behavior based on EL and isn't a global source of truth on
> the shape of the PMU.
> 
> What do you think about adding a new vCPU attribute for selecting the
> number of counters for a VM? We can allow non-nested VMs to use the
> 'old' method of writing PMCR_EL0.N and force nested VMs to use the
> attribute.

VCPU attribute? or PMU attribute? I'm really not keen on the former,
but the latter is probably workable, as it is VM-wide, similar to the
way we keep track of pmcr_n.

> We can then enforce ordering on the attribute and prevent it from being
> used after vCPU reset.

How would that work? Do you really want to mandate the PMU selection
(with its counter capping) to strictly occur between vcpu creation and
init?

This would, for example, break kvmtool which has these two operations
back-to-back, and sneaking new device-specific actions in the middle
is a bit unpalatable (there is a split between VM-wide and per-vcpu
actions).

Any idea?

	M.
Oliver Upton Feb. 19, 2025, 7:04 p.m. UTC | #3
On Wed, Feb 19, 2025 at 02:03:49PM +0000, Marc Zyngier wrote:
> On Mon, 17 Feb 2025 18:53:50 +0000, Oliver Upton <oliver.upton@linux.dev> wrote:
> > What do you think about adding a new vCPU attribute for selecting the
> > number of counters for a VM? We can allow non-nested VMs to use the
> > 'old' method of writing PMCR_EL0.N and force nested VMs to use the
> > attribute.
> 
> VCPU attribute? or PMU attribute? I'm really not keen on the former,
> but the latter is probably workable, as it is VM-wide, similar to the
> way we keep track of pmcr_n.

Well the _existing_ PMU attributes are actually vCPU attributes. I do
agree that accessing them as a VM attribute makes more sense, but that's
the UAPI we already have...

> > We can then enforce ordering on the attribute and prevent it from being
> > used after vCPU reset.
> 
> How would that work? Do you really want to mandate the PMU selection
> (with its counter capping) to strictly occur between vcpu creation and
> init?
> 
> This would, for example, break kvmtool which has these two operations
> back-to-back, and sneaking new device-specific actions in the middle
> is a bit unpalatable (there is a split between VM-wide and per-vcpu
> actions).
> 
> Any idea?

If we want to do this the 'right' way, we should provide VM attributes
for selecting the PMU implementation / configuring the event filter
to complement an attribute for setting the number of event counters.

I don't want to have a mix-and-match approach where vPMU attributes are
scattered between the vCPU and the VM since it requires a similar amount
of gymnastics in userspace to set crap up.

Thanks,
Oliver
Marc Zyngier Feb. 19, 2025, 9:10 p.m. UTC | #4
On Wed, 19 Feb 2025 19:04:12 +0000,
Oliver Upton <oliver.upton@linux.dev> wrote:
> 
> On Wed, Feb 19, 2025 at 02:03:49PM +0000, Marc Zyngier wrote:
> > On Mon, 17 Feb 2025 18:53:50 +0000, Oliver Upton <oliver.upton@linux.dev> wrote:
> > > What do you think about adding a new vCPU attribute for selecting the
> > > number of counters for a VM? We can allow non-nested VMs to use the
> > > 'old' method of writing PMCR_EL0.N and force nested VMs to use the
> > > attribute.
> > 
> > VCPU attribute? or PMU attribute? I'm really not keen on the former,
> > but the latter is probably workable, as it is VM-wide, similar to the
> > way we keep track of pmcr_n.
> 
> Well the _existing_ PMU attributes are actually vCPU attributes. I do
> agree that accessing them as a VM attribute makes more sense, but that's
> the UAPI we already have...

Gah, I remember now. Someone please take the API to the backyard...

> 
> > > We can then enforce ordering on the attribute and prevent it from being
> > > used after vCPU reset.
> > 
> > How would that work? Do you really want to mandate the PMU selection
> > (with its counter capping) to strictly occur between vcpu creation and
> > init?
> > 
> > This would, for example, break kvmtool which has these two operations
> > back-to-back, and sneaking new device-specific actions in the middle
> > is a bit unpalatable (there is a split between VM-wide and per-vcpu
> > actions).
> > 
> > Any idea?
> 
> If we want to do this the 'right' way, we should provide VM attributes
> for selecting the PMU implementation / configuring the event filter
> to complement an attribute for setting the number of event counters.
> 
> I don't want to have a mix-and-match approach where vPMU attributes are
> scattered between the vCPU and the VM since it requires a similar amount
> of gymnastics in userspace to set crap up.

I agree on not mixing vCPU and VM scoped attributes, even if that
amounts to the same thing in the back-end. But freezing the number of
counters on vcpu reset is something that should be considered very
carefully, and I fear it breaks existing models -- specially given
that this is yet another one-off.

I wonder if we should instead make use of the KVM_ARM_VCPU_FINALIZE
ioctl just like we do for SVE, passing KVM_ARM_VCPU_PMU_V3 instead.
This would make use of an existing mechanism and lock the PMU for good
(implementation, IRQ, number of counters...).

	M.
diff mbox series

Patch

diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index 6c5950b9ceac8..5a71c3744c4d7 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -1007,6 +1007,19 @@  static void kvm_arm_set_pmu(struct kvm *kvm, struct arm_pmu *arm_pmu)
 
 	kvm->arch.arm_pmu = arm_pmu;
 	kvm->arch.pmcr_n = kvm_arm_pmu_get_max_counters(kvm);
+
+	/* Reset MDCR_EL2.HPMN behind the vcpus' back... */
+	if (test_bit(KVM_ARM_VCPU_HAS_EL2, kvm->arch.vcpu_features)) {
+		struct kvm_vcpu *vcpu;
+		unsigned long i;
+
+		kvm_for_each_vcpu(i, vcpu, kvm) {
+			u64 val = __vcpu_sys_reg(vcpu, MDCR_EL2);
+			val &= ~MDCR_EL2_HPMN;
+			val |= FIELD_PREP(MDCR_EL2_HPMN, kvm->arch.pmcr_n);
+			__vcpu_sys_reg(vcpu, MDCR_EL2) = val;
+		}
+	}
 }
 
 /**
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 82430c1e1dd02..380f22f19cb42 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2493,6 +2493,11 @@  static bool access_mdcr(struct kvm_vcpu *vcpu,
 	return true;
 }
 
+static u64 reset_mdcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
+{
+	__vcpu_sys_reg(vcpu, r->reg) = vcpu->kvm->arch.pmcr_n;
+	return vcpu->kvm->arch.pmcr_n;
+}
 
 /*
  * Architected system registers.
@@ -3034,7 +3039,7 @@  static const struct sys_reg_desc sys_reg_descs[] = {
 	EL2_REG(SCTLR_EL2, access_rw, reset_val, SCTLR_EL2_RES1),
 	EL2_REG(ACTLR_EL2, access_rw, reset_val, 0),
 	EL2_REG_VNCR(HCR_EL2, reset_hcr, 0),
-	EL2_REG(MDCR_EL2, access_mdcr, reset_val, 0),
+	EL2_REG(MDCR_EL2, access_mdcr, reset_mdcr, 0),
 	EL2_REG(CPTR_EL2, access_rw, reset_val, CPTR_NVHE_EL2_RES1),
 	EL2_REG_VNCR(HSTR_EL2, reset_val, 0),
 	EL2_REG_VNCR(HFGRTR_EL2, reset_val, 0),