[v2,2/2] KVM: arm64: PMU: Don't overwrite PMUSERENR with vcpu loaded

Message ID 20230408034759.2369068-3-reijiw@google.com (mailing list archive)
State New, archived
Series KVM: arm64: PMU: Correct the handling of PMUSERENR_EL0

Commit Message

Reiji Watanabe April 8, 2023, 3:47 a.m. UTC
Currently, with VHE, KVM sets ER, CR, SW and EN bits of
PMUSERENR_EL0 to 1 on vcpu_load(), and saves and restores
the register value for the host on vcpu_load() and vcpu_put().
If the values of those bits are cleared on a pCPU with a vCPU
loaded (armv8pmu_start() would do that when PMU counters are
programmed for the guest), PMU access from the guest EL0 might
be trapped to the guest EL1 directly regardless of the current
PMUSERENR_EL0 value of the vCPU.

Fix this by not letting armv8pmu_start() overwrite PMUSERENR on
the pCPU on which a vCPU is loaded, and instead updating the
saved shadow register value for the host, so that the value can
be restored on vcpu_put() later.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Suggested-by: Marc Zyngier <maz@kernel.org>
Fixes: 83a7a4d643d3 ("arm64: perf: Enable PMU counter userspace access for perf event")
Signed-off-by: Reiji Watanabe <reijiw@google.com>
---
 arch/arm64/include/asm/kvm_host.h |  5 +++++
 arch/arm64/kernel/perf_event.c    | 21 ++++++++++++++++++---
 arch/arm64/kvm/pmu.c              | 20 ++++++++++++++++++++
 3 files changed, 43 insertions(+), 3 deletions(-)

Comments

Mark Rutland April 11, 2023, 9:33 a.m. UTC | #1
On Fri, Apr 07, 2023 at 08:47:59PM -0700, Reiji Watanabe wrote:
> Currently, with VHE, KVM sets ER, CR, SW and EN bits of
> PMUSERENR_EL0 to 1 on vcpu_load(), and saves and restores
> the register value for the host on vcpu_load() and vcpu_put().
> If the values of those bits are cleared on a pCPU with a vCPU
> loaded (armv8pmu_start() would do that when PMU counters are
> programmed for the guest), PMU access from the guest EL0 might
> be trapped to the guest EL1 directly regardless of the current
> PMUSERENR_EL0 value of the vCPU.
> 
> Fix this by not letting armv8pmu_start() overwrite PMUSERENR on
> the pCPU on which a vCPU is loaded, and instead updating the
> saved shadow register value for the host, so that the value can
> be restored on vcpu_put() later.

I'm happy with the hook in the PMU code, but I think there's still a race
between an IPI and vcpu_{load,put}() where we can lose an update to
PMUSERENR_EL0. I tried to point that out in my final question in:

  https://lore.kernel.org/all/ZCwzV7ACl21VbLru@FVFF77S0Q05N.cambridge.arm.com/

... but it looks like that wasn't all that clear.

Consider vcpu_load():

void vcpu_load(struct kvm_vcpu *vcpu)
{
	int cpu = get_cpu();

	__this_cpu_write(kvm_running_vcpu, vcpu);
	preempt_notifier_register(&vcpu->preempt_notifier);
	kvm_arch_vcpu_load(vcpu, cpu);
	put_cpu();
}

AFAICT that's called with IRQs enabled, and the {get,put}_cpu() calls will only
disable migration/preemption. After the write to kvm_running_vcpu, the code in
kvm_set_pmuserenr() will see that there is a running vcpu, and write to the
host context without updating the real PMUSERENR_EL0 register.

If we take an IPI and call kvm_set_pmuserenr() after the write to
kvm_running_vcpu but before kvm_arch_vcpu_load() completes, the call to
kvm_set_pmuserenr() could update the host context (without updating the real
PMUSERENR_EL0 value) before __activate_traps_common() saves the host value
with:

	 ctxt_sys_reg(hctxt, PMUSERENR_EL0) = read_sysreg(pmuserenr_el0);

... which would discard the write made by kvm_set_pmuserenr().

Something similar can happen in vcpu_put(), where an IPI after __deactivate_traps_common()
but before kvm_running_vcpu is cleared would result in kvm_set_pmuserenr()
writing to the host context, but this value would never be written into HW.
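
As a rough sketch of the vcpu_load() interleaving on one pCPU (the
vcpu_put() case is the mirror image; the kvm_arch_vcpu_load() chain is
abbreviated):

	vcpu_load()
	  __this_cpu_write(kvm_running_vcpu, vcpu);
				<-- IPI: armv8pmu_start()
				       update_pmuserenr(val)
				         kvm_set_pmuserenr(val)
				           // sees a running vcpu, so only
				           // updates the host context
	  kvm_arch_vcpu_load(vcpu, cpu)
	    ...
	    __activate_traps_common(vcpu)
	      ctxt_sys_reg(hctxt, PMUSERENR_EL0) = read_sysreg(pmuserenr_el0);
	      // ... which discards the value kvm_set_pmuserenr() stored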

Unless I'm missing something (e.g. if interrupts are actually masked during
those windows), I don't think this is a complete fix as-is.

I'm not sure if there is a smart fix for that.

Thanks,
Mark.

> Suggested-by: Mark Rutland <mark.rutland@arm.com>
> Suggested-by: Marc Zyngier <maz@kernel.org>
> Fixes: 83a7a4d643d3 ("arm64: perf: Enable PMU counter userspace access for perf event")
> Signed-off-by: Reiji Watanabe <reijiw@google.com>
> ---
>  arch/arm64/include/asm/kvm_host.h |  5 +++++
>  arch/arm64/kernel/perf_event.c    | 21 ++++++++++++++++++---
>  arch/arm64/kvm/pmu.c              | 20 ++++++++++++++++++++
>  3 files changed, 43 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index bcd774d74f34..22db2f885c17 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -1028,9 +1028,14 @@ void kvm_arch_vcpu_put_debug_state_flags(struct kvm_vcpu *vcpu);
>  #ifdef CONFIG_KVM
>  void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr);
>  void kvm_clr_pmu_events(u32 clr);
> +bool kvm_set_pmuserenr(u64 val);
>  #else
>  static inline void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr) {}
>  static inline void kvm_clr_pmu_events(u32 clr) {}
> +static inline bool kvm_set_pmuserenr(u64 val)
> +{
> +	return false;
> +}
>  #endif
>  
>  void kvm_vcpu_load_sysregs_vhe(struct kvm_vcpu *vcpu);
> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> index dde06c0f97f3..0fffe4c56c28 100644
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ -741,9 +741,25 @@ static inline u32 armv8pmu_getreset_flags(void)
>  	return value;
>  }
>  
> +static void update_pmuserenr(u64 val)
> +{
> +	lockdep_assert_irqs_disabled();
> +
> +	/*
> +	 * The current pmuserenr value might be the value for the guest.
> +	 * If that's the case, have KVM keep track of the register value
> +	 * for the host EL0 so that KVM can restore it before returning to
> +	 * the host EL0. Otherwise, update the register now.
> +	 */
> +	if (kvm_set_pmuserenr(val))
> +		return;
> +
> +	write_sysreg(val, pmuserenr_el0);
> +}
> +
>  static void armv8pmu_disable_user_access(void)
>  {
> -	write_sysreg(0, pmuserenr_el0);
> +	update_pmuserenr(0);
>  }
>  
>  static void armv8pmu_enable_user_access(struct arm_pmu *cpu_pmu)
> @@ -759,8 +775,7 @@ static void armv8pmu_enable_user_access(struct arm_pmu *cpu_pmu)
>  			armv8pmu_write_evcntr(i, 0);
>  	}
>  
> -	write_sysreg(0, pmuserenr_el0);
> -	write_sysreg(ARMV8_PMU_USERENR_ER | ARMV8_PMU_USERENR_CR, pmuserenr_el0);
> +	update_pmuserenr(ARMV8_PMU_USERENR_ER | ARMV8_PMU_USERENR_CR);
>  }
>  
>  static void armv8pmu_enable_event(struct perf_event *event)
> diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c
> index 7887133d15f0..40bb2cb13317 100644
> --- a/arch/arm64/kvm/pmu.c
> +++ b/arch/arm64/kvm/pmu.c
> @@ -209,3 +209,23 @@ void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu)
>  	kvm_vcpu_pmu_enable_el0(events_host);
>  	kvm_vcpu_pmu_disable_el0(events_guest);
>  }
> +
> +/*
> + * With VHE, keep track of the PMUSERENR_EL0 value for the host EL0 on
> + * the pCPU where the vCPU is loaded, since PMUSERENR_EL0 is switched to
> + * the value for the guest on vcpu_load().  The value for the host EL0
> + * will be restored on vcpu_put(), before returning to EL0.
> + *
> + * Return true if KVM takes care of the register. Otherwise return false.
> + */
> +bool kvm_set_pmuserenr(u64 val)
> +{
> +	struct kvm_cpu_context *hctxt;
> +
> +	if (!kvm_arm_support_pmu_v3() || !has_vhe() || !kvm_get_running_vcpu())
> +		return false;
> +
> +	hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
> +	ctxt_sys_reg(hctxt, PMUSERENR_EL0) = val;
> +	return true;
> +}
> -- 
> 2.40.0.577.gac1e443424-goog
> 
>
Reiji Watanabe April 12, 2023, 5:14 a.m. UTC | #2
On Tue, Apr 11, 2023 at 10:33:50AM +0100, Mark Rutland wrote:
> On Fri, Apr 07, 2023 at 08:47:59PM -0700, Reiji Watanabe wrote:
> > Currently, with VHE, KVM sets ER, CR, SW and EN bits of
> > PMUSERENR_EL0 to 1 on vcpu_load(), and saves and restores
> > the register value for the host on vcpu_load() and vcpu_put().
> > If the values of those bits are cleared on a pCPU with a vCPU
> > loaded (armv8pmu_start() would do that when PMU counters are
> > programmed for the guest), PMU access from the guest EL0 might
> > be trapped to the guest EL1 directly regardless of the current
> > PMUSERENR_EL0 value of the vCPU.
> > 
> > Fix this by not letting armv8pmu_start() overwrite PMUSERENR on
> > the pCPU on which a vCPU is loaded, and instead updating the
> > saved shadow register value for the host, so that the value can
> > be restored on vcpu_put() later.
> 
> I'm happy with the hook in the PMU code, but I think there's still a race
> between an IPI and vcpu_{load,put}() where we can lose an update to
> PMUSERENR_EL0. I tried to point that out in my final question in:
> 
>   https://lore.kernel.org/all/ZCwzV7ACl21VbLru@FVFF77S0Q05N.cambridge.arm.com/
> 
> ... but it looks like that wasn't all that clear.
> 
> Consider vcpu_load():
> 
> void vcpu_load(struct kvm_vcpu *vcpu)
> {
> 	int cpu = get_cpu();
> 
> 	__this_cpu_write(kvm_running_vcpu, vcpu);
> 	preempt_notifier_register(&vcpu->preempt_notifier);
> 	kvm_arch_vcpu_load(vcpu, cpu);
> 	put_cpu();
> }
> 
> AFAICT that's called with IRQs enabled, and the {get,put}_cpu() calls will only
> disable migration/preemption. After the write to kvm_running_vcpu, the code in
> kvm_set_pmuserenr() will see that there is a running vcpu, and write to the
> host context without updating the real PMUSERENR_EL0 register.
> 
> If we take an IPI and call kvm_set_pmuserenr() after the write to
> kvm_running_vcpu but before kvm_arch_vcpu_load() completes, the call to
> kvm_set_pmuserenr() could update the host context (without updating the real
> PMUSERENR_EL0 value) before __activate_traps_common() saves the host value
> with:
> 
> 	 ctxt_sys_reg(hctxt, PMUSERENR_EL0) = read_sysreg(pmuserenr_el0);
> 
> ... which would discard the write made by kvm_set_pmuserenr().
> 
> Something similar can happen in vcpu_put(), where an IPI after __deactivate_traps_common()
> but before kvm_running_vcpu is cleared would result in kvm_set_pmuserenr()
> writing to the host context, but this value would never be written into HW.
> 
> Unless I'm missing something (e.g. if interrupts are actually masked during
> those windows), I don't think this is a complete fix as-is.
> 
> I'm not sure if there is a smart fix for that.

Thank you for the comment.

Uh, right, interrupts are not masked during those windows...

What I am currently considering is disabling IRQs while
manipulating the register, introducing a new flag to indicate
whether the PMUSERENR value for the guest EL0 is loaded, and
having kvm_set_pmuserenr() check the new flag.

The code would be something like below (local_irq_save/local_irq_restore
need to be excluded for NVHE, though).

What do you think?

--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -668,6 +668,8 @@ struct kvm_vcpu_arch {
 /* Software step state is Active-pending */
 #define DBG_SS_ACTIVE_PENDING	__vcpu_single_flag(sflags, BIT(5))
 
+/* PMUSERENR for the guest EL0 is on physical CPU */
+#define PMUSERENR_ON_CPU	__vcpu_single_flag(sflags, BIT(6))
 
 /* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
 #define vcpu_sve_pffr(vcpu) (kern_hyp_va((vcpu)->arch.sve_state) +	\
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
index 6718731729fd..57e4f480874a 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -82,12 +82,19 @@ static inline void __activate_traps_common(struct kvm_vcpu *vcpu)
 	 */
 	if (kvm_arm_support_pmu_v3()) {
 		struct kvm_cpu_context *hctxt;
+		unsigned long flags;
 
 		write_sysreg(0, pmselr_el0);
 
 		hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
+
+		local_irq_save(flags);
+
 		ctxt_sys_reg(hctxt, PMUSERENR_EL0) = read_sysreg(pmuserenr_el0);
 		write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0);
+		vcpu_set_flag(vcpu, PMUSERENR_ON_CPU);
+
+		local_irq_restore(flags);
 	}
 
 	vcpu->arch.mdcr_el2_host = read_sysreg(mdcr_el2);
@@ -112,9 +119,16 @@ static inline void __deactivate_traps_common(struct kvm_vcpu *vcpu)
 	write_sysreg(0, hstr_el2);
 	if (kvm_arm_support_pmu_v3()) {
 		struct kvm_cpu_context *hctxt;
+		unsigned long flags;
 
 		hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
+
+		local_irq_save(flags);
+
 		write_sysreg(ctxt_sys_reg(hctxt, PMUSERENR_EL0), pmuserenr_el0);
+		vcpu_clear_flag(vcpu, PMUSERENR_ON_CPU);
+
+		local_irq_restore(flags);
 	}
 
 	if (cpus_have_final_cap(ARM64_SME)) {
diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c
index 40bb2cb13317..33cd8e1ecbd6 100644
--- a/arch/arm64/kvm/pmu.c
+++ b/arch/arm64/kvm/pmu.c
@@ -221,8 +221,13 @@ void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu)
 bool kvm_set_pmuserenr(u64 val)
 {
 	struct kvm_cpu_context *hctxt;
+	struct kvm_vcpu *vcpu;
 
-	if (!kvm_arm_support_pmu_v3() || !has_vhe() || !kvm_get_running_vcpu())
+	if (!kvm_arm_support_pmu_v3() || !has_vhe())
+		return false;
+
+	vcpu = kvm_get_running_vcpu();
+	if (!vcpu || !vcpu_get_flag(vcpu, PMUSERENR_ON_CPU))
 		return false;
 
 	hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
Mark Rutland April 12, 2023, 9:20 a.m. UTC | #3
On Tue, Apr 11, 2023 at 10:14:10PM -0700, Reiji Watanabe wrote:
> Uh, right, interrupts are not masked during those windows...
> 
> What I am currently considering is disabling IRQs while
> manipulating the register, introducing a new flag to indicate
> whether the PMUSERENR value for the guest EL0 is loaded, and
> having kvm_set_pmuserenr() check the new flag.
> 
> The code would be something like below (local_irq_save/local_irq_restore
> need to be excluded for NVHE, though).
> 
> What do you think?

I'm happy with that; it doesn't change the arm_pmu side of the interface and it
looks good from a functional perspective.

I'll have to leave it to Marc and Oliver to say whether they're happy with the
KVM side.

Thanks,
Mark.

> 
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -668,6 +668,8 @@ struct kvm_vcpu_arch {
>  /* Software step state is Active-pending */
>  #define DBG_SS_ACTIVE_PENDING	__vcpu_single_flag(sflags, BIT(5))
>  
> +/* PMUSERENR for the guest EL0 is on physical CPU */
> +#define PMUSERENR_ON_CPU	__vcpu_single_flag(sflags, BIT(6))
>  
>  /* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
>  #define vcpu_sve_pffr(vcpu) (kern_hyp_va((vcpu)->arch.sve_state) +	\
> diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
> index 6718731729fd..57e4f480874a 100644
> --- a/arch/arm64/kvm/hyp/include/hyp/switch.h
> +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
> @@ -82,12 +82,19 @@ static inline void __activate_traps_common(struct kvm_vcpu *vcpu)
>  	 */
>  	if (kvm_arm_support_pmu_v3()) {
>  		struct kvm_cpu_context *hctxt;
> +		unsigned long flags;
>  
>  		write_sysreg(0, pmselr_el0);
>  
>  		hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
> +
> +		local_irq_save(flags);
> +
>  		ctxt_sys_reg(hctxt, PMUSERENR_EL0) = read_sysreg(pmuserenr_el0);
>  		write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0);
> +		vcpu_set_flag(vcpu, PMUSERENR_ON_CPU);
> +
> +		local_irq_restore(flags);
>  	}
>  
>  	vcpu->arch.mdcr_el2_host = read_sysreg(mdcr_el2);
> @@ -112,9 +119,16 @@ static inline void __deactivate_traps_common(struct kvm_vcpu *vcpu)
>  	write_sysreg(0, hstr_el2);
>  	if (kvm_arm_support_pmu_v3()) {
>  		struct kvm_cpu_context *hctxt;
> +		unsigned long flags;
>  
>  		hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
> +
> +		local_irq_save(flags);
> +
>  		write_sysreg(ctxt_sys_reg(hctxt, PMUSERENR_EL0), pmuserenr_el0);
> +		vcpu_clear_flag(vcpu, PMUSERENR_ON_CPU);
> +
> +		local_irq_restore(flags);
>  	}
>  
>  	if (cpus_have_final_cap(ARM64_SME)) {
> diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c
> index 40bb2cb13317..33cd8e1ecbd6 100644
> --- a/arch/arm64/kvm/pmu.c
> +++ b/arch/arm64/kvm/pmu.c
> @@ -221,8 +221,13 @@ void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu)
>  bool kvm_set_pmuserenr(u64 val)
>  {
>  	struct kvm_cpu_context *hctxt;
> +	struct kvm_vcpu *vcpu;
>  
> -	if (!kvm_arm_support_pmu_v3() || !has_vhe() || !kvm_get_running_vcpu())
> +	if (!kvm_arm_support_pmu_v3() || !has_vhe())
> +		return false;
> +
> +	vcpu = kvm_get_running_vcpu();
> +	if (!vcpu || !vcpu_get_flag(vcpu, PMUSERENR_ON_CPU))
>  		return false;
>  
>  	hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
> -- 
> 
> Thank you,
> Reiji
> 
> 
> > 
> > Thanks,
> > Mark.
> > 
> > > Suggested-by: Mark Rutland <mark.rutland@arm.com>
> > > Suggested-by: Marc Zyngier <maz@kernel.org>
> > > Fixes: 83a7a4d643d3 ("arm64: perf: Enable PMU counter userspace access for perf event")
> > > Signed-off-by: Reiji Watanabe <reijiw@google.com>
> > > ---
> > >  arch/arm64/include/asm/kvm_host.h |  5 +++++
> > >  arch/arm64/kernel/perf_event.c    | 21 ++++++++++++++++++---
> > >  arch/arm64/kvm/pmu.c              | 20 ++++++++++++++++++++
> > >  3 files changed, 43 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > > index bcd774d74f34..22db2f885c17 100644
> > > --- a/arch/arm64/include/asm/kvm_host.h
> > > +++ b/arch/arm64/include/asm/kvm_host.h
> > > @@ -1028,9 +1028,14 @@ void kvm_arch_vcpu_put_debug_state_flags(struct kvm_vcpu *vcpu);
> > >  #ifdef CONFIG_KVM
> > >  void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr);
> > >  void kvm_clr_pmu_events(u32 clr);
> > > +bool kvm_set_pmuserenr(u64 val);
> > >  #else
> > >  static inline void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr) {}
> > >  static inline void kvm_clr_pmu_events(u32 clr) {}
> > > +static inline bool kvm_set_pmuserenr(u64 val)
> > > +{
> > > +	return false;
> > > +}
> > >  #endif
> > >  
> > >  void kvm_vcpu_load_sysregs_vhe(struct kvm_vcpu *vcpu);
> > > diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> > > index dde06c0f97f3..0fffe4c56c28 100644
> > > --- a/arch/arm64/kernel/perf_event.c
> > > +++ b/arch/arm64/kernel/perf_event.c
> > > @@ -741,9 +741,25 @@ static inline u32 armv8pmu_getreset_flags(void)
> > >  	return value;
> > >  }
> > >  
> > > +static void update_pmuserenr(u64 val)
> > > +{
> > > +	lockdep_assert_irqs_disabled();
> > > +
> > > +	/*
> > > +	 * The current pmuserenr value might be the value for the guest.
> > > +	 * If that's the case, have KVM keep track of the register value
> > > +	 * for the host EL0 so that KVM can restore it before returning to
> > > +	 * the host EL0. Otherwise, update the register now.
> > > +	 */
> > > +	if (kvm_set_pmuserenr(val))
> > > +		return;
> > > +
> > > +	write_sysreg(val, pmuserenr_el0);
> > > +}
> > > +
> > >  static void armv8pmu_disable_user_access(void)
> > >  {
> > > -	write_sysreg(0, pmuserenr_el0);
> > > +	update_pmuserenr(0);
> > >  }
> > >  
> > >  static void armv8pmu_enable_user_access(struct arm_pmu *cpu_pmu)
> > > @@ -759,8 +775,7 @@ static void armv8pmu_enable_user_access(struct arm_pmu *cpu_pmu)
> > >  			armv8pmu_write_evcntr(i, 0);
> > >  	}
> > >  
> > > -	write_sysreg(0, pmuserenr_el0);
> > > -	write_sysreg(ARMV8_PMU_USERENR_ER | ARMV8_PMU_USERENR_CR, pmuserenr_el0);
> > > +	update_pmuserenr(ARMV8_PMU_USERENR_ER | ARMV8_PMU_USERENR_CR);
> > >  }
> > >  
> > >  static void armv8pmu_enable_event(struct perf_event *event)
> > > diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c
> > > index 7887133d15f0..40bb2cb13317 100644
> > > --- a/arch/arm64/kvm/pmu.c
> > > +++ b/arch/arm64/kvm/pmu.c
> > > @@ -209,3 +209,23 @@ void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu)
> > >  	kvm_vcpu_pmu_enable_el0(events_host);
> > >  	kvm_vcpu_pmu_disable_el0(events_guest);
> > >  }
> > > +
> > > +/*
> > > + * With VHE, keep track of the PMUSERENR_EL0 value for the host EL0 on
> > > + * the pCPU where the vCPU is loaded, since PMUSERENR_EL0 is switched to
> > > + * the value for the guest on vcpu_load().  The value for the host EL0
> > > + * will be restored on vcpu_put(), before returning to EL0.
> > > + *
> > > + * Return true if KVM takes care of the register. Otherwise return false.
> > > + */
> > > +bool kvm_set_pmuserenr(u64 val)
> > > +{
> > > +	struct kvm_cpu_context *hctxt;
> > > +
> > > +	if (!kvm_arm_support_pmu_v3() || !has_vhe() || !kvm_get_running_vcpu())
> > > +		return false;
> > > +
> > > +	hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
> > > +	ctxt_sys_reg(hctxt, PMUSERENR_EL0) = val;
> > > +	return true;
> > > +}
> > > -- 
> > > 2.40.0.577.gac1e443424-goog
> > > 
> > > 
>
Marc Zyngier April 12, 2023, 10:22 a.m. UTC | #4
On Wed, 12 Apr 2023 10:20:05 +0100,
Mark Rutland <mark.rutland@arm.com> wrote:
> 
> On Tue, Apr 11, 2023 at 10:14:10PM -0700, Reiji Watanabe wrote:
> > Uh, right, interrupts are not masked during those windows...
> > 
> > What I am currently considering is disabling IRQs while
> > manipulating the register, introducing a new flag to indicate
> > whether the PMUSERENR value for the guest EL0 is loaded, and
> > having kvm_set_pmuserenr() check the new flag.
> > 
> > The code would be something like below (local_irq_save/local_irq_restore
> > need to be excluded for NVHE, though).

It shouldn't need to be excluded. It should be fairly harmless, unless
I'm missing something really obvious?

> > 
> > What do you think?
> 
> I'm happy with that; it doesn't change the arm_pmu side of the interface and it
> looks good from a functional perspective.
> 
> I'll have to leave it to Marc and Oliver to say whether they're happy with the
> KVM side.

This looks OK to me. My only ask would be to have a small comment in
the __{de}activate_traps_common() functions to say what this protects
against, because I'll page it out within minutes.
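
For instance, something as simple as (the wording is only a sketch):

	/*
	 * Disable IRQs so that a racing kvm_set_pmuserenr() cannot see a
	 * half-switched state: PMUSERENR_ON_CPU must be set if and only
	 * if the guest value of PMUSERENR_EL0 is live in HW.
	 */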

Thanks,

	M.
Reiji Watanabe April 13, 2023, 12:07 a.m. UTC | #5
On Wed, Apr 12, 2023 at 11:22:29AM +0100, Marc Zyngier wrote:
> On Wed, 12 Apr 2023 10:20:05 +0100,
> Mark Rutland <mark.rutland@arm.com> wrote:
> >
> > On Tue, Apr 11, 2023 at 10:14:10PM -0700, Reiji Watanabe wrote:
> > > Uh, right, interrupts are not masked during those windows...
> > >
> > > What I am currently considering is disabling IRQs while
> > > manipulating the register, introducing a new flag to indicate
> > > whether the PMUSERENR value for the guest EL0 is loaded, and
> > > having kvm_set_pmuserenr() check the new flag.
> > >
> > > The code would be something like below (local_irq_save/local_irq_restore
> > > need to be excluded for NVHE, though).
>
> It shouldn't need to be excluded. It should be fairly harmless, unless
> I'm missing something really obvious?

The reason I think local_irq_{save,restore} should be excluded
is that they use trace_hardirqs_{on,off} (since IRQs are already
masked here for NVHE, in practice they shouldn't be called with
the current KVM implementation, though).
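
For reference, with CONFIG_TRACE_IRQFLAGS those wrappers expand to
roughly the following (paraphrased from include/linux/irqflags.h):

	#define local_irq_save(flags)				\
		do {						\
			raw_local_irq_save(flags);		\
			if (!raw_irqs_disabled_flags(flags))	\
				trace_hardirqs_off();		\
		} while (0)

trace_hardirqs_off() is not available in the NVHE hyp object, so the
raw_local_irq_* variants would have to be used there instead.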

I'm looking at using "ifndef __KVM_NVHE_HYPERVISOR__" or other
ways to organize the code for this.
Since {__activate,__deactivate}_traps_common() are pretty lightweight
functions, I'm also considering disabling IRQs at their call sites
(i.e. activate_traps_vhe_load/deactivate_traps_vhe_put), instead of in
__{de}activate_traps_common() (Thanks for this suggestion, Oliver).
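
As a sketch only (this assumes the VHE wrappers in
arch/arm64/kvm/hyp/vhe/switch.c keep their current shape, and is not
tested code), that alternative would look something like:

	void activate_traps_vhe_load(struct kvm_vcpu *vcpu)
	{
		unsigned long flags;

		local_irq_save(flags);
		__activate_traps_common(vcpu);
		local_irq_restore(flags);
	}

	void deactivate_traps_vhe_put(struct kvm_vcpu *vcpu)
	{
		unsigned long flags;

		local_irq_save(flags);
		__deactivate_traps_common(vcpu);
		local_irq_restore(flags);
	}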


> > I'm happy with that; it doesn't change the arm_pmu side of the interface and it
> > looks good from a functional perspective.
> >
> > I'll have to leave it to Marc and Oliver to say whether they're happy with the
> > KVM side.
>
> This looks OK to me. My only ask would be to have a small comment in
> the __{de}activate_traps_common() functions to say what this protects
> against, because I'll page it out within minutes.

Thank you for the feedback!
Yes, I will add a comment for those.

Thank you,
Reiji
Marc Zyngier April 13, 2023, 8:56 a.m. UTC | #6
On Thu, 13 Apr 2023 01:07:38 +0100,
Reiji Watanabe <reijiw@google.com> wrote:
> 
> On Wed, Apr 12, 2023 at 11:22:29AM +0100, Marc Zyngier wrote:
> > On Wed, 12 Apr 2023 10:20:05 +0100,
> > Mark Rutland <mark.rutland@arm.com> wrote:
> > >
> > > On Tue, Apr 11, 2023 at 10:14:10PM -0700, Reiji Watanabe wrote:
> > > > Uh, right, interrupts are not masked during those windows...
> > > >
> > > > What I am currently considering is disabling IRQs while
> > > > manipulating the register, introducing a new flag to indicate
> > > > whether the PMUSERENR value for the guest EL0 is loaded, and
> > > > having kvm_set_pmuserenr() check the new flag.
> > > >
> > > > The code would be something like below (local_irq_save/local_irq_restore
> > > > need to be excluded for NVHE, though).
> >
> > It shouldn't need to be excluded. It should be fairly harmless, unless
> > I'm missing something really obvious?
> 
> The reason I think local_irq_{save,restore} should be excluded
> is that they use trace_hardirqs_{on,off} (since IRQs are already
> masked here for NVHE, in practice they shouldn't be called with
> the current KVM implementation, though).

Gah. Indeed, we end up with a lot of unwanted crap, and absolutely no
way to locally override it.

> I'm looking at using "ifndef __KVM_NVHE_HYPERVISOR__" or other
> ways to organize the code for this.

I'd vote for something like the code below:

diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index 530347cdebe3..1796fadb26cc 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -10,7 +10,7 @@ asflags-y := -D__KVM_NVHE_HYPERVISOR__ -D__DISABLE_EXPORTS
 # will explode instantly (Words of Marc Zyngier). So introduce a generic flag
 # __DISABLE_TRACE_MMIO__ to disable MMIO tracing for nVHE KVM.
 ccflags-y := -D__KVM_NVHE_HYPERVISOR__ -D__DISABLE_EXPORTS -D__DISABLE_TRACE_MMIO__
-ccflags-y += -fno-stack-protector	\
+ccflags-y += -fno-stack-protector	-DNO_TRACE_IRQFLAGS \
 	     -DDISABLE_BRANCH_PROFILING	\
 	     $(DISABLE_STACKLEAK_PLUGIN)
 
diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index 5ec0fa71399e..ab0ae58dd797 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -198,9 +198,10 @@ extern void warn_bogus_irq_restore(void);
 
 /*
  * The local_irq_*() APIs are equal to the raw_local_irq*()
- * if !TRACE_IRQFLAGS.
+ * if !TRACE_IRQFLAGS or if NO_TRACE_IRQFLAGS is locally
+ * set.
  */
-#ifdef CONFIG_TRACE_IRQFLAGS
+#if defined(CONFIG_TRACE_IRQFLAGS) && !defined(NO_TRACE_IRQFLAGS)
 
 #define local_irq_enable()				\
 	do {						\


> Since {__activate,__deactivate}_traps_common() are pretty lightweight
> > functions, I'm also considering disabling IRQs at their call sites
> (i.e. activate_traps_vhe_load/deactivate_traps_vhe_put), instead of in
> __{de}activate_traps_common() (Thanks for this suggestion, Oliver).

That would work too.

Thanks,

	M.
Reiji Watanabe April 15, 2023, 3:11 a.m. UTC | #7
> > > > > Uh, right, interrupts are not masked during those windows...
> > > > >
> > > > > What I am currently considering is disabling IRQs while
> > > > > manipulating the register, introducing a new flag to indicate
> > > > > whether the PMUSERENR value for the guest EL0 is loaded, and
> > > > > having kvm_set_pmuserenr() check the new flag.
> > > > >
> > > > > The code would be something like below (local_irq_save/local_irq_restore
> > > > > need to be excluded for NVHE, though).
> > >
> > > It shouldn't need to be excluded. It should be fairly harmless, unless
> > > I'm missing something really obvious?
> > 
> > The reason I think local_irq_{save,restore} should be excluded
> > is that they use trace_hardirqs_{on,off} (since IRQs are already
> > masked here for NVHE, in practice they shouldn't be called with
> > the current KVM implementation, though).
> 
> Gah. Indeed, we end up with a lot of unwanted crap, and absolutely no
> way to locally override it.
> 
> > I'm looking at using "ifndef __KVM_NVHE_HYPERVISOR__" or other
> > ways to organize the code for this.
> 
> I'd vote for something like the code below:

Thank you for the suggestion.
Considering that we may have similar cases in the future,
I will implement it as you suggested in v3.

Thank you,
Reiji

> 
> diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
> index 530347cdebe3..1796fadb26cc 100644
> --- a/arch/arm64/kvm/hyp/nvhe/Makefile
> +++ b/arch/arm64/kvm/hyp/nvhe/Makefile
> @@ -10,7 +10,7 @@ asflags-y := -D__KVM_NVHE_HYPERVISOR__ -D__DISABLE_EXPORTS
>  # will explode instantly (Words of Marc Zyngier). So introduce a generic flag
>  # __DISABLE_TRACE_MMIO__ to disable MMIO tracing for nVHE KVM.
>  ccflags-y := -D__KVM_NVHE_HYPERVISOR__ -D__DISABLE_EXPORTS -D__DISABLE_TRACE_MMIO__
> -ccflags-y += -fno-stack-protector	\
> +ccflags-y += -fno-stack-protector	-DNO_TRACE_IRQFLAGS \
>  	     -DDISABLE_BRANCH_PROFILING	\
>  	     $(DISABLE_STACKLEAK_PLUGIN)
>  
> diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
> index 5ec0fa71399e..ab0ae58dd797 100644
> --- a/include/linux/irqflags.h
> +++ b/include/linux/irqflags.h
> @@ -198,9 +198,10 @@ extern void warn_bogus_irq_restore(void);
>  
>  /*
>   * The local_irq_*() APIs are equal to the raw_local_irq*()
> - * if !TRACE_IRQFLAGS.
> + * if !TRACE_IRQFLAGS or if NO_TRACE_IRQFLAGS is locally
> + * set.
>   */
> -#ifdef CONFIG_TRACE_IRQFLAGS
> +#if defined(CONFIG_TRACE_IRQFLAGS) && !defined(NO_TRACE_IRQFLAGS)
>  
>  #define local_irq_enable()				\
>  	do {						\
> 
> 
> > Since {__activate,__deactivate}_traps_common() are pretty lightweight
> > functions, I'm also considering disabling IRQs at their call sites
> > (i.e. activate_traps_vhe_load/deactivate_traps_vhe_put), instead of in
> > __{de}activate_traps_common() (Thanks for this suggestion, Oliver).
> 
> That would work too.
> 
> Thanks,
> 
> 	M.
> 
> -- 
> Without deviation from the norm, progress is not possible.

Patch

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index bcd774d74f34..22db2f885c17 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1028,9 +1028,14 @@  void kvm_arch_vcpu_put_debug_state_flags(struct kvm_vcpu *vcpu);
 #ifdef CONFIG_KVM
 void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr);
 void kvm_clr_pmu_events(u32 clr);
+bool kvm_set_pmuserenr(u64 val);
 #else
 static inline void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr) {}
 static inline void kvm_clr_pmu_events(u32 clr) {}
+static inline bool kvm_set_pmuserenr(u64 val)
+{
+	return false;
+}
 #endif
 
 void kvm_vcpu_load_sysregs_vhe(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index dde06c0f97f3..0fffe4c56c28 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -741,9 +741,25 @@  static inline u32 armv8pmu_getreset_flags(void)
 	return value;
 }
 
+static void update_pmuserenr(u64 val)
+{
+	lockdep_assert_irqs_disabled();
+
+	/*
+	 * The current pmuserenr value might be the value for the guest.
+	 * If that's the case, have KVM keep track of the register value
+	 * for the host EL0 so that KVM can restore it before returning to
+	 * the host EL0. Otherwise, update the register now.
+	 */
+	if (kvm_set_pmuserenr(val))
+		return;
+
+	write_sysreg(val, pmuserenr_el0);
+}
+
 static void armv8pmu_disable_user_access(void)
 {
-	write_sysreg(0, pmuserenr_el0);
+	update_pmuserenr(0);
 }
 
 static void armv8pmu_enable_user_access(struct arm_pmu *cpu_pmu)
@@ -759,8 +775,7 @@  static void armv8pmu_enable_user_access(struct arm_pmu *cpu_pmu)
 			armv8pmu_write_evcntr(i, 0);
 	}
 
-	write_sysreg(0, pmuserenr_el0);
-	write_sysreg(ARMV8_PMU_USERENR_ER | ARMV8_PMU_USERENR_CR, pmuserenr_el0);
+	update_pmuserenr(ARMV8_PMU_USERENR_ER | ARMV8_PMU_USERENR_CR);
 }
 
 static void armv8pmu_enable_event(struct perf_event *event)
diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c
index 7887133d15f0..40bb2cb13317 100644
--- a/arch/arm64/kvm/pmu.c
+++ b/arch/arm64/kvm/pmu.c
@@ -209,3 +209,23 @@  void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu)
 	kvm_vcpu_pmu_enable_el0(events_host);
 	kvm_vcpu_pmu_disable_el0(events_guest);
 }
+
+/*
+ * With VHE, keep track of the PMUSERENR_EL0 value for the host EL0 on
+ * the pCPU where the vCPU is loaded, since PMUSERENR_EL0 is switched to
+ * the value for the guest on vcpu_load().  The value for the host EL0
+ * will be restored on vcpu_put(), before returning to EL0.
+ *
+ * Return true if KVM takes care of the register. Otherwise return false.
+ */
+bool kvm_set_pmuserenr(u64 val)
+{
+	struct kvm_cpu_context *hctxt;
+
+	if (!kvm_arm_support_pmu_v3() || !has_vhe() || !kvm_get_running_vcpu())
+		return false;
+
+	hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
+	ctxt_sys_reg(hctxt, PMUSERENR_EL0) = val;
+	return true;
+}