
KVM: arm64: Disabling disabled PMU counters wastes a lot of time

Message ID 20210628161925.401343-1-alexandre.chartre@oracle.com (mailing list archive)
State New, archived
Series KVM: arm64: Disabling disabled PMU counters wastes a lot of time

Commit Message

Alexandre Chartre June 28, 2021, 4:19 p.m. UTC
In a KVM guest on ARM, performance counter interrupts have an
unnecessary overhead which slows down execution when using the "perf
record" command and limits the "perf record" sampling period.

The problem is that when a guest VM disables counters by clearing the
PMCR_EL0.E bit (bit 0), KVM will disable all counters defined in
PMCR_EL0 even if they are not enabled in PMCNTENSET_EL0.

KVM disables a counter by calling into the perf framework, in particular
by calling perf_event_create_kernel_counter() which is a time consuming
operation. So, for example, with a Neoverse N1 CPU core which has 6 event
counters and one cycle counter, KVM will always disable all 7 counters
even if only one is enabled.

This typically happens when using the "perf record" command in a guest
VM: perf will disable all event counters with PMCNTENSET_EL0 and only
use the cycle counter. And when using the "perf record" -F option with
a high profiling frequency, the overhead of KVM disabling all counters
instead of one on every counter interrupt becomes very noticeable.

The problem is fixed by having KVM disable only counters which are
enabled in PMCNTENSET_EL0. If a counter is not enabled in PMCNTENSET_EL0
then KVM will not enable it when setting PMCR_EL0.E and it will remain
disable as long as it is not enabled in PMCNTENSET_EL0. So there is
effectively no need to disable a counter when clearing PMCR_EL0.E if it
is not enabled in PMCNTENSET_EL0.

Fixes: 76993739cd6f ("arm64: KVM: Add helper to handle PMCR register bits")
Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
---
 arch/arm64/kvm/pmu-emul.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
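
For reference, a plain-C sketch of kvm_pmu_handle_pmcr() with this patch
applied (illustrative only; the diff at the bottom of the page is
authoritative, and the PMCR_EL0.C/P handling is elided):

void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val)
{
	unsigned long mask = kvm_pmu_valid_counter_mask(vcpu);

	if (val & ARMV8_PMU_PMCR_E) {
		/* Only enable counters the guest enabled in PMCNTENSET_EL0 */
		kvm_pmu_enable_counter_mask(vcpu,
		       __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask);
	} else {
		/*
		 * Likewise, only disable counters enabled in PMCNTENSET_EL0:
		 * anything else was never handed to the perf framework, so
		 * there is nothing expensive to tear down for it.
		 */
		kvm_pmu_disable_counter_mask(vcpu,
		       __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask);
	}

	/* ... PMCR_EL0.C and PMCR_EL0.P handling unchanged ... */
}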

Comments

Marc Zyngier June 29, 2021, 9:06 a.m. UTC | #1
Hi Alexandre,

Thanks for looking into this.

On Mon, 28 Jun 2021 17:19:25 +0100,
Alexandre Chartre <alexandre.chartre@oracle.com> wrote:
> 
> In a KVM guest on ARM, performance counters interrupts have an

nit: arm64. 32bit ARM never had any working KVM PMU emulation.

> unnecessary overhead which slows down execution when using the "perf
> record" command and limits the "perf record" sampling period.
> 
> The problem is that when a guest VM disables counters by clearing the
> PMCR_EL0.E bit (bit 0), KVM will disable all counters defined in
> PMCR_EL0 even if they are not enabled in PMCNTENSET_EL0.
> 
> KVM disables a counter by calling into the perf framework, in particular
> by calling perf_event_create_kernel_counter() which is a time consuming
> operation. So, for example, with a Neoverse N1 CPU core which has 6 event
> counters and one cycle counter, KVM will always disable all 7 counters
> even if only one is enabled.
> 
> This typically happens when using the "perf record" command in a guest
> VM: perf will disable all event counters with PMCNTENTSET_EL0 and only
> uses the cycle counter. And when using the "perf record" -F option with
> a high profiling frequency, the overhead of KVM disabling all counters
> instead of one on every counter interrupt becomes very noticeable.
> 
> The problem is fixed by having KVM disable only counters which are
> enabled in PMCNTENSET_EL0. If a counter is not enabled in PMCNTENSET_EL0
> then KVM will not enable it when setting PMCR_EL0.E and it will remain
> disable as long as it is not enabled in PMCNTENSET_EL0. So there is

nit: disabled

> effectively no need to disable a counter when clearing PMCR_EL0.E if it
> is not enabled PMCNTENSET_EL0.
> 
> Fixes: 76993739cd6f ("arm64: KVM: Add helper to handle PMCR register bits")

This isn't a fix (the current behaviour is correct per the
architecture), "only" a performance improvement. We reserve "Fixes:"
for things that are actually broken.

> Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
> ---
>  arch/arm64/kvm/pmu-emul.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
> index fd167d4f4215..bab4b735a0cf 100644
> --- a/arch/arm64/kvm/pmu-emul.c
> +++ b/arch/arm64/kvm/pmu-emul.c
> @@ -571,7 +571,8 @@ void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val)
>  		kvm_pmu_enable_counter_mask(vcpu,
>  		       __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask);
>  	} else {
> -		kvm_pmu_disable_counter_mask(vcpu, mask);
> +		kvm_pmu_disable_counter_mask(vcpu,
> +		       __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask);

This seems to perpetuate a flawed pattern. Why do we need to work out
the *valid* PMCTENSET_EL0 bits? They should be correct by construction,
and the way the shadow sysreg gets populated already enforces this:

<quote>
static bool access_pmcnten(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
			   const struct sys_reg_desc *r)
{
[...]
	mask = kvm_pmu_valid_counter_mask(vcpu);
	if (p->is_write) {
		val = p->regval & mask;
		if (r->Op2 & 0x1) {
			/* accessing PMCNTENSET_EL0 */
			__vcpu_sys_reg(vcpu, PMCNTENSET_EL0) |= val;
			kvm_pmu_enable_counter_mask(vcpu, val);
			kvm_vcpu_pmu_restore_guest(vcpu);
</quote>

So the sysreg is the only thing we should consider, and I think we
should drop the useless masking. There is at least another instance of
this in the PMU code (kvm_pmu_overflow_status()), and apart from
kvm_pmu_vcpu_reset(), only the sysreg accessors should care about the
masking to sanitise accesses.
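
For reference, the other instance mentioned, kvm_pmu_overflow_status(),
currently reads roughly as follows (reconstructed from the diff quoted
later in the thread; the surrounding context may differ slightly), with
the redundant masking on the last line of the block:

static u64 kvm_pmu_overflow_status(struct kvm_vcpu *vcpu)
{
	u64 reg = 0;

	if ((__vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMU_PMCR_E)) {
		reg = __vcpu_sys_reg(vcpu, PMOVSSET_EL0);
		reg &= __vcpu_sys_reg(vcpu, PMCNTENSET_EL0);
		reg &= __vcpu_sys_reg(vcpu, PMINTENSET_EL1);
		/* redundant if the shadow sysreg only ever holds valid bits */
		reg &= kvm_pmu_valid_counter_mask(vcpu);
	}

	return reg;
}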

What do you think?

	M.
Alexandre Chartre June 29, 2021, 1:16 p.m. UTC | #2
Hi Marc,

On 6/29/21 11:06 AM, Marc Zyngier wrote:
> Hi Alexandre,
> 
> Thanks for looking into this.
> 
> On Mon, 28 Jun 2021 17:19:25 +0100,
> Alexandre Chartre <alexandre.chartre@oracle.com> wrote:
>>
>> In a KVM guest on ARM, performance counters interrupts have an
> 
> nit: arm64. 32bit ARM never had any working KVM PMU emulation.
>
>> unnecessary overhead which slows down execution when using the "perf
>> record" command and limits the "perf record" sampling period.
>>
>> The problem is that when a guest VM disables counters by clearing the
>> PMCR_EL0.E bit (bit 0), KVM will disable all counters defined in
>> PMCR_EL0 even if they are not enabled in PMCNTENSET_EL0.
>>
>> KVM disables a counter by calling into the perf framework, in particular
>> by calling perf_event_create_kernel_counter() which is a time consuming
>> operation. So, for example, with a Neoverse N1 CPU core which has 6 event
>> counters and one cycle counter, KVM will always disable all 7 counters
>> even if only one is enabled.
>>
>> This typically happens when using the "perf record" command in a guest
>> VM: perf will disable all event counters with PMCNTENTSET_EL0 and only
>> uses the cycle counter. And when using the "perf record" -F option with
>> a high profiling frequency, the overhead of KVM disabling all counters
>> instead of one on every counter interrupt becomes very noticeable.
>>
>> The problem is fixed by having KVM disable only counters which are
>> enabled in PMCNTENSET_EL0. If a counter is not enabled in PMCNTENSET_EL0
>> then KVM will not enable it when setting PMCR_EL0.E and it will remain
>> disable as long as it is not enabled in PMCNTENSET_EL0. So there is
> 
> nit: disabled
> 
>> effectively no need to disable a counter when clearing PMCR_EL0.E if it
>> is not enabled PMCNTENSET_EL0.
>>
>> Fixes: 76993739cd6f ("arm64: KVM: Add helper to handle PMCR register bits")
> 
> This isn't a fix (the current behaviour is correct per the
> architecture), "only" a performance improvement. We reserve "Fixes:"
> for things that are actually broken.
> 

Ok, I will change everything you mentioned about the commit message.


>> Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
>> ---
>>   arch/arm64/kvm/pmu-emul.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
>> index fd167d4f4215..bab4b735a0cf 100644
>> --- a/arch/arm64/kvm/pmu-emul.c
>> +++ b/arch/arm64/kvm/pmu-emul.c
>> @@ -571,7 +571,8 @@ void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val)
>>   		kvm_pmu_enable_counter_mask(vcpu,
>>   		       __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask);
>>   	} else {
>> -		kvm_pmu_disable_counter_mask(vcpu, mask);
>> +		kvm_pmu_disable_counter_mask(vcpu,
>> +		       __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask);
> 
> This seems to perpetuate a flawed pattern. Why do we need to work out
> the *valid* PMCTENSET_EL0 bits? They should be correct by construction,
> and the way the shadow sysreg gets populated already enforces this:
> 
> <quote>
> static bool access_pmcnten(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> 			   const struct sys_reg_desc *r)
> {
> [...]
> 	mask = kvm_pmu_valid_counter_mask(vcpu);
> 	if (p->is_write) {
> 		val = p->regval & mask;
> 		if (r->Op2 & 0x1) {
> 			/* accessing PMCNTENSET_EL0 */
> 			__vcpu_sys_reg(vcpu, PMCNTENSET_EL0) |= val;
> 			kvm_pmu_enable_counter_mask(vcpu, val);
> 			kvm_vcpu_pmu_restore_guest(vcpu);
> </quote>
> 
> So the sysreg is the only thing we should consider, and I think we
> should drop the useless masking. There is at least another instance of
> this in the PMU code (kvm_pmu_overflow_status()), and apart from
> kvm_pmu_vcpu_reset(), only the sysreg accessors should care about the
> masking to sanitise accesses.
> 
> What do you think?
> 

I think you are right. PMCNTENSET_EL0 is already masked with kvm_pmu_valid_counter_mask()
so there's effectively no need to mask it again when we use it. I will send an additional
patch (on top of this one) to remove useless masking. Basically, changes would be:

diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index bab4b735a0cf..e0dfd7ce4ba0 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -373,7 +373,6 @@ static u64 kvm_pmu_overflow_status(struct kvm_vcpu *vcpu)
                 reg = __vcpu_sys_reg(vcpu, PMOVSSET_EL0);
                 reg &= __vcpu_sys_reg(vcpu, PMCNTENSET_EL0);
                 reg &= __vcpu_sys_reg(vcpu, PMINTENSET_EL1);
-               reg &= kvm_pmu_valid_counter_mask(vcpu);
         }
  
         return reg;
@@ -564,21 +563,22 @@ void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u64 val)
   */
  void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val)
  {
-       unsigned long mask = kvm_pmu_valid_counter_mask(vcpu);
+       unsigned long mask;
         int i;
  
         if (val & ARMV8_PMU_PMCR_E) {
                 kvm_pmu_enable_counter_mask(vcpu,
-                      __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask);
+                      __vcpu_sys_reg(vcpu, PMCNTENSET_EL0));
         } else {
                 kvm_pmu_disable_counter_mask(vcpu,
-                      __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask);
+                      __vcpu_sys_reg(vcpu, PMCNTENSET_EL0));
         }
  
         if (val & ARMV8_PMU_PMCR_C)
                 kvm_pmu_set_counter_value(vcpu, ARMV8_PMU_CYCLE_IDX, 0);
  
         if (val & ARMV8_PMU_PMCR_P) {
+               mask = kvm_pmu_valid_counter_mask(vcpu);
                 for_each_set_bit(i, &mask, 32)
                         kvm_pmu_set_counter_value(vcpu, i, 0);
         }
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 1a7968ad078c..2e406905760e 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -845,7 +845,7 @@ static bool access_pmcnten(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
                         kvm_pmu_disable_counter_mask(vcpu, val);
                 }
         } else {
-               p->regval = __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask;
+               p->regval = __vcpu_sys_reg(vcpu, PMCNTENSET_EL0);
         }
  
         return true;


Thanks,

alex.
Marc Zyngier June 29, 2021, 1:47 p.m. UTC | #3
On Tue, 29 Jun 2021 14:16:55 +0100,
Alexandre Chartre <alexandre.chartre@oracle.com> wrote:
> 
> 
> Hi Marc,
> 
> On 6/29/21 11:06 AM, Marc Zyngier wrote:
> > Hi Alexandre,

[...]

> > So the sysreg is the only thing we should consider, and I think we
> > should drop the useless masking. There is at least another instance of
> > this in the PMU code (kvm_pmu_overflow_status()), and apart from
> > kvm_pmu_vcpu_reset(), only the sysreg accessors should care about the
> > masking to sanitise accesses.
> > 
> > What do you think?
> > 
> 
> I think you are right. PMCNTENSET_EL0 is already masked with
> kvm_pmu_valid_counter_mask() so there's effectively no need to mask
> it again when we use it. I will send an additional patch (on top of
> this one) to remove useless masking. Basically, changes would be:
> 
> diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
> index bab4b735a0cf..e0dfd7ce4ba0 100644
> --- a/arch/arm64/kvm/pmu-emul.c
> +++ b/arch/arm64/kvm/pmu-emul.c
> @@ -373,7 +373,6 @@ static u64 kvm_pmu_overflow_status(struct kvm_vcpu *vcpu)
>                 reg = __vcpu_sys_reg(vcpu, PMOVSSET_EL0);
>                 reg &= __vcpu_sys_reg(vcpu, PMCNTENSET_EL0);
>                 reg &= __vcpu_sys_reg(vcpu, PMINTENSET_EL1);
> -               reg &= kvm_pmu_valid_counter_mask(vcpu);
>         }
>          return reg;
> @@ -564,21 +563,22 @@ void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u64 val)
>   */
>  void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val)
>  {
> -       unsigned long mask = kvm_pmu_valid_counter_mask(vcpu);
> +       unsigned long mask;
>         int i;
>          if (val & ARMV8_PMU_PMCR_E) {
>                 kvm_pmu_enable_counter_mask(vcpu,
> -                      __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask);
> +                      __vcpu_sys_reg(vcpu, PMCNTENSET_EL0));
>         } else {
>                 kvm_pmu_disable_counter_mask(vcpu,
> -                      __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask);
> +                      __vcpu_sys_reg(vcpu, PMCNTENSET_EL0));
>         }
>          if (val & ARMV8_PMU_PMCR_C)
>                 kvm_pmu_set_counter_value(vcpu, ARMV8_PMU_CYCLE_IDX, 0);
>          if (val & ARMV8_PMU_PMCR_P) {
> +               mask = kvm_pmu_valid_counter_mask(vcpu);

Careful here, this clashes with a fix from Alexandru that is currently
in -next (PMCR_EL0.P shouldn't reset the cycle counter) and aimed at
5.14. And whilst you're at it, consider moving the 'mask' declaration
here too.
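
(Purely illustrative, and assuming the -next change simply makes
PMCR_EL0.P skip the cycle counter: with the declaration moved into the
block, the hunk would end up along these lines.)

	if (val & ARMV8_PMU_PMCR_P) {
		unsigned long mask = kvm_pmu_valid_counter_mask(vcpu);

		/* assumed shape of the -next fix: don't reset the cycle counter */
		mask &= ~BIT(ARMV8_PMU_CYCLE_IDX);
		for_each_set_bit(i, &mask, 32)
			kvm_pmu_set_counter_value(vcpu, i, 0);
	}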

>                 for_each_set_bit(i, &mask, 32)
>                         kvm_pmu_set_counter_value(vcpu, i, 0);
>         }
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 1a7968ad078c..2e406905760e 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -845,7 +845,7 @@ static bool access_pmcnten(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
>                         kvm_pmu_disable_counter_mask(vcpu, val);
>                 }
>         } else {
> -               p->regval = __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask;
> +               p->regval = __vcpu_sys_reg(vcpu, PMCNTENSET_EL0);
>         }
>          return true;

If you are cleaning up the read-side of sysregs, access_pminten() and
access_pmovs() could have some of your attention too.

Thanks,

	M.
Alexandre Chartre June 29, 2021, 2:17 p.m. UTC | #4
On 6/29/21 3:47 PM, Marc Zyngier wrote:
> On Tue, 29 Jun 2021 14:16:55 +0100,
> Alexandre Chartre <alexandre.chartre@oracle.com> wrote:
>>
>>
>> Hi Marc,
>>
>> On 6/29/21 11:06 AM, Marc Zyngier wrote:
>>> Hi Alexandre,
> 
> [...]
> 
>>> So the sysreg is the only thing we should consider, and I think we
>>> should drop the useless masking. There is at least another instance of
>>> this in the PMU code (kvm_pmu_overflow_status()), and apart from
>>> kvm_pmu_vcpu_reset(), only the sysreg accessors should care about the
>>> masking to sanitise accesses.
>>>
>>> What do you think?
>>>
>>
>> I think you are right. PMCNTENSET_EL0 is already masked with
>> kvm_pmu_valid_counter_mask() so there's effectively no need to mask
>> it again when we use it. I will send an additional patch (on top of
>> this one) to remove useless masking. Basically, changes would be:
>>
>> diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
>> index bab4b735a0cf..e0dfd7ce4ba0 100644
>> --- a/arch/arm64/kvm/pmu-emul.c
>> +++ b/arch/arm64/kvm/pmu-emul.c
>> @@ -373,7 +373,6 @@ static u64 kvm_pmu_overflow_status(struct kvm_vcpu *vcpu)
>>                  reg = __vcpu_sys_reg(vcpu, PMOVSSET_EL0);
>>                  reg &= __vcpu_sys_reg(vcpu, PMCNTENSET_EL0);
>>                  reg &= __vcpu_sys_reg(vcpu, PMINTENSET_EL1);
>> -               reg &= kvm_pmu_valid_counter_mask(vcpu);
>>          }
>>           return reg;
>> @@ -564,21 +563,22 @@ void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u64 val)
>>    */
>>   void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val)
>>   {
>> -       unsigned long mask = kvm_pmu_valid_counter_mask(vcpu);
>> +       unsigned long mask;
>>          int i;
>>           if (val & ARMV8_PMU_PMCR_E) {
>>                  kvm_pmu_enable_counter_mask(vcpu,
>> -                      __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask);
>> +                      __vcpu_sys_reg(vcpu, PMCNTENSET_EL0));
>>          } else {
>>                  kvm_pmu_disable_counter_mask(vcpu,
>> -                      __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask);
>> +                      __vcpu_sys_reg(vcpu, PMCNTENSET_EL0));
>>          }
>>           if (val & ARMV8_PMU_PMCR_C)
>>                  kvm_pmu_set_counter_value(vcpu, ARMV8_PMU_CYCLE_IDX, 0);
>>           if (val & ARMV8_PMU_PMCR_P) {
>> +               mask = kvm_pmu_valid_counter_mask(vcpu);
> 
> Careful here, this clashes with a fix from Alexandru that is currently
> in -next (PMCR_EL0.P shouldn't reset the cycle counter) and aimed at
> 5.14. And whilst you're at it, consider moving the 'mask' declaration
> here too.
> 
>>                  for_each_set_bit(i, &mask, 32)
>>                          kvm_pmu_set_counter_value(vcpu, i, 0);
>>          }
>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>> index 1a7968ad078c..2e406905760e 100644
>> --- a/arch/arm64/kvm/sys_regs.c
>> +++ b/arch/arm64/kvm/sys_regs.c
>> @@ -845,7 +845,7 @@ static bool access_pmcnten(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
>>                          kvm_pmu_disable_counter_mask(vcpu, val);
>>                  }
>>          } else {
>> -               p->regval = __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask;
>> +               p->regval = __vcpu_sys_reg(vcpu, PMCNTENSET_EL0);
>>          }
>>           return true;
> 
> If you are cleaning up the read-side of sysregs, access_pminten() and
> access_pmovs() could have some of your attention too.
> 

Ok, so for now, I will just resubmit the initial patch with the commit
comment fixes. Then, look at all the mask cleanup on top of Alexandru
changes and prepare another patch.

alex.
Marc Zyngier June 29, 2021, 2:25 p.m. UTC | #5
On 2021-06-29 15:17, Alexandre Chartre wrote:
> On 6/29/21 3:47 PM, Marc Zyngier wrote:
>> On Tue, 29 Jun 2021 14:16:55 +0100,
>> Alexandre Chartre <alexandre.chartre@oracle.com> wrote:
>>> 
>>> 
>>> Hi Marc,
>>> 
>>> On 6/29/21 11:06 AM, Marc Zyngier wrote:
>>>> Hi Alexandre,
>> 
>> [...]
>> 
>>>> So the sysreg is the only thing we should consider, and I think we
>>>> should drop the useless masking. There is at least another instance 
>>>> of
>>>> this in the PMU code (kvm_pmu_overflow_status()), and apart from
>>>> kvm_pmu_vcpu_reset(), only the sysreg accessors should care about 
>>>> the
>>>> masking to sanitise accesses.
>>>> 
>>>> What do you think?
>>>> 
>>> 
>>> I think you are right. PMCNTENSET_EL0 is already masked with
>>> kvm_pmu_valid_counter_mask() so there's effectively no need to mask
>>> it again when we use it. I will send an additional patch (on top of
>>> this one) to remove useless masking. Basically, changes would be:
>>> 
>>> diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
>>> index bab4b735a0cf..e0dfd7ce4ba0 100644
>>> --- a/arch/arm64/kvm/pmu-emul.c
>>> +++ b/arch/arm64/kvm/pmu-emul.c
>>> @@ -373,7 +373,6 @@ static u64 kvm_pmu_overflow_status(struct 
>>> kvm_vcpu *vcpu)
>>>                  reg = __vcpu_sys_reg(vcpu, PMOVSSET_EL0);
>>>                  reg &= __vcpu_sys_reg(vcpu, PMCNTENSET_EL0);
>>>                  reg &= __vcpu_sys_reg(vcpu, PMINTENSET_EL1);
>>> -               reg &= kvm_pmu_valid_counter_mask(vcpu);
>>>          }
>>>           return reg;
>>> @@ -564,21 +563,22 @@ void kvm_pmu_software_increment(struct kvm_vcpu 
>>> *vcpu, u64 val)
>>>    */
>>>   void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val)
>>>   {
>>> -       unsigned long mask = kvm_pmu_valid_counter_mask(vcpu);
>>> +       unsigned long mask;
>>>          int i;
>>>           if (val & ARMV8_PMU_PMCR_E) {
>>>                  kvm_pmu_enable_counter_mask(vcpu,
>>> -                      __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask);
>>> +                      __vcpu_sys_reg(vcpu, PMCNTENSET_EL0));
>>>          } else {
>>>                  kvm_pmu_disable_counter_mask(vcpu,
>>> -                      __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask);
>>> +                      __vcpu_sys_reg(vcpu, PMCNTENSET_EL0));
>>>          }
>>>           if (val & ARMV8_PMU_PMCR_C)
>>>                  kvm_pmu_set_counter_value(vcpu, ARMV8_PMU_CYCLE_IDX, 
>>> 0);
>>>           if (val & ARMV8_PMU_PMCR_P) {
>>> +               mask = kvm_pmu_valid_counter_mask(vcpu);
>> 
>> Careful here, this clashes with a fix from Alexandru that is currently
>> in -next (PMCR_EL0.P shouldn't reset the cycle counter) and aimed at
>> 5.14. And whilst you're at it, consider moving the 'mask' declaration
>> here too.
>> 
>>>                  for_each_set_bit(i, &mask, 32)
>>>                          kvm_pmu_set_counter_value(vcpu, i, 0);
>>>          }
>>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>>> index 1a7968ad078c..2e406905760e 100644
>>> --- a/arch/arm64/kvm/sys_regs.c
>>> +++ b/arch/arm64/kvm/sys_regs.c
>>> @@ -845,7 +845,7 @@ static bool access_pmcnten(struct kvm_vcpu *vcpu, 
>>> struct sys_reg_params *p,
>>>                          kvm_pmu_disable_counter_mask(vcpu, val);
>>>                  }
>>>          } else {
>>> -               p->regval = __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & 
>>> mask;
>>> +               p->regval = __vcpu_sys_reg(vcpu, PMCNTENSET_EL0);
>>>          }
>>>           return true;
>> 
>> If you are cleaning up the read-side of sysregs, access_pminten() and
>> access_pmovs() could have some of your attention too.
>> 
> 
> Ok, so for now, I will just resubmit the initial patch with the commit
> comment fixes. Then, look at all the mask cleanup on top of Alexandru
> changes and prepare another patch.

Please send this as a series rather than individual patches. I'm only
queuing critical fixes at the moment (this is the merge window).
If you post the series after -rc1, I'll queue it and let it simmer
in -next.

Thanks,

         M.
Alexandre Chartre June 29, 2021, 2:40 p.m. UTC | #6
On 6/29/21 4:25 PM, Marc Zyngier wrote:
> On 2021-06-29 15:17, Alexandre Chartre wrote:
>> On 6/29/21 3:47 PM, Marc Zyngier wrote:
>>> On Tue, 29 Jun 2021 14:16:55 +0100,
>>> Alexandre Chartre <alexandre.chartre@oracle.com> wrote:
>>>>
>>>>
>>>> Hi Marc,
>>>>
>>>> On 6/29/21 11:06 AM, Marc Zyngier wrote:
>>>>> Hi Alexandre,
>>>
>>> [...]
>>>
>>> If you are cleaning up the read-side of sysregs, access_pminten() and
>>> access_pmovs() could have some of your attention too.
>>>
>>
>> Ok, so for now, I will just resubmit the initial patch with the commit
>> comment fixes. Then, look at all the mask cleanup on top of Alexandru
>> changes and prepare another patch.
> 
> Please send this as a series rather than individual patches. I'm only
> queuing critical fixes at the moment (this is the merge window).
> If you post the series after -rc1, I'll queue it and let it simmer
> in -next.
> 

Ok, I will prepare a series and send it after rc1.

alex.
Alexandre Chartre July 6, 2021, 1:50 p.m. UTC | #7
Hi Marc,

On 6/29/21 3:16 PM, Alexandre Chartre wrote:
> On 6/29/21 11:06 AM, Marc Zyngier wrote
> [...]
>> So the sysreg is the only thing we should consider, and I think we
>> should drop the useless masking. There is at least another instance of
>> this in the PMU code (kvm_pmu_overflow_status()), and apart from
>> kvm_pmu_vcpu_reset(), only the sysreg accessors should care about the
>> masking to sanitise accesses.
>>
>> What do you think?
>>
> 
> I think you are right. PMCNTENSET_EL0 is already masked with kvm_pmu_valid_counter_mask()
> so there's effectively no need to mask it again when we use it. I will send an additional
> patch (on top of this one) to remove useless masking. Basically, changes would be:

I had a closer look and we can't remove the mask. The access functions (for pmcnten, pminten,
pmovs), clear or set only the specified valid counter bits. This means that bits other than
the valid counter bits never change in __vcpu_sys_reg(), and those bits are not necessarily
zero because the initial value is 0x1de7ec7edbadc0deULL (set by reset_unknown()).
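
For reference, a sketch of the generic reset helper being referred to
(sys_regs.c; the exact body may differ slightly):

static void reset_unknown(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
{
	BUG_ON(!r->reg);
	BUG_ON(r->reg >= NR_SYS_REGS);

	/* UNKNOWN reset value: bits outside the valid counter mask start out set */
	__vcpu_sys_reg(vcpu, r->reg) = 0x1de7ec7edbadc0deULL;
}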

So I will resubmit initial patch, with just the commit message changes.

Thanks,

alex.
Marc Zyngier July 6, 2021, 2:52 p.m. UTC | #8
On Tue, 06 Jul 2021 14:50:35 +0100,
Alexandre Chartre <alexandre.chartre@oracle.com> wrote:
> 
> 
> Hi Marc,
> 
> On 6/29/21 3:16 PM, Alexandre Chartre wrote:
> > On 6/29/21 11:06 AM, Marc Zyngier wrote
> > [...]
> >> So the sysreg is the only thing we should consider, and I think we
> >> should drop the useless masking. There is at least another instance of
> >> this in the PMU code (kvm_pmu_overflow_status()), and apart from
> >> kvm_pmu_vcpu_reset(), only the sysreg accessors should care about the
> >> masking to sanitise accesses.
> >> 
> >> What do you think?
> >> 
> > 
> > I think you are right. PMCNTENSET_EL0 is already masked with kvm_pmu_valid_counter_mask()
> > so there's effectively no need to mask it again when we use it. I will send an additional
> > patch (on top of this one) to remove useless masking. Basically, changes would be:
> 
> I had a closer look and we can't remove the mask. The access
> functions (for pmcnten, pminten, pmovs), clear or set only the
> specified valid counter bits. This means that bits other than the
> valid counter bits never change in __vcpu_sys_reg(), and those bits
> are not necessarily zero because the initial value is
> 0x1de7ec7edbadc0deULL (set by reset_unknown()).

That's a bug that should be fixed on its own. Bits that are RAZ/WI in
the architecture shouldn't be kept in the shadow registers the first
place. I'll have a look.
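
(Illustrative only: one possible shape of such a fix would be a
PMU-specific reset hook that narrows the UNKNOWN value to the
implemented counters; the actual fix may well look different.)

static void reset_pmu_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
{
	/* hypothetical helper: start from UNKNOWN, then drop the RAZ/WI bits */
	reset_unknown(vcpu, r);
	__vcpu_sys_reg(vcpu, r->reg) &= kvm_pmu_valid_counter_mask(vcpu);
}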

> So I will resubmit initial patch, with just the commit message
> changes.

Please don't. I'm not papering over this kind of bug.

Thanks,

	M.
Alexandre Chartre July 6, 2021, 3:35 p.m. UTC | #9
On 7/6/21 4:52 PM, Marc Zyngier wrote:
> On Tue, 06 Jul 2021 14:50:35 +0100,
> Alexandre Chartre <alexandre.chartre@oracle.com> wrote:
>>
>>
>> Hi Marc,
>>
>> On 6/29/21 3:16 PM, Alexandre Chartre wrote:
>>> On 6/29/21 11:06 AM, Marc Zyngier wrote
>>> [...]
>>>> So the sysreg is the only thing we should consider, and I think we
>>>> should drop the useless masking. There is at least another instance of
>>>> this in the PMU code (kvm_pmu_overflow_status()), and apart from
>>>> kvm_pmu_vcpu_reset(), only the sysreg accessors should care about the
>>>> masking to sanitise accesses.
>>>>
>>>> What do you think?
>>>>
>>>
>>> I think you are right. PMCNTENSET_EL0 is already masked with kvm_pmu_valid_counter_mask()
>>> so there's effectively no need to mask it again when we use it. I will send an additional
>>> patch (on top of this one) to remove useless masking. Basically, changes would be:
>>
>> I had a closer look and we can't remove the mask. The access
>> functions (for pmcnten, pminten, pmovs), clear or set only the
>> specified valid counter bits. This means that bits other than the
>> valid counter bits never change in __vcpu_sys_reg(), and those bits
>> are not necessarily zero because the initial value is
>> 0x1de7ec7edbadc0deULL (set by reset_unknown()).
> 
> That's a bug that should be fixed on its own. Bits that are RAZ/WI in
> the architecture shouldn't be kept in the shadow registers the first
> place. I'll have a look.
> 
>> So I will resubmit initial patch, with just the commit message
>> changes.
> 
> Please don't. I'm not papering over this kind of bug.
> 

Right, I meant I will resubmit after -rc1 as you requested.

Thanks,

alex.
Marc Zyngier July 6, 2021, 5:36 p.m. UTC | #10
On Tue, 06 Jul 2021 15:52:46 +0100,
Marc Zyngier <maz@kernel.org> wrote:
> 
> On Tue, 06 Jul 2021 14:50:35 +0100,
> Alexandre Chartre <alexandre.chartre@oracle.com> wrote:
> > 
> > 
> > Hi Marc,
> > 
> > On 6/29/21 3:16 PM, Alexandre Chartre wrote:
> > > On 6/29/21 11:06 AM, Marc Zyngier wrote
> > > [...]
> > >> So the sysreg is the only thing we should consider, and I think we
> > >> should drop the useless masking. There is at least another instance of
> > >> this in the PMU code (kvm_pmu_overflow_status()), and apart from
> > >> kvm_pmu_vcpu_reset(), only the sysreg accessors should care about the
> > >> masking to sanitise accesses.
> > >> 
> > >> What do you think?
> > >> 
> > > 
> > > I think you are right. PMCNTENSET_EL0 is already masked with kvm_pmu_valid_counter_mask()
> > > so there's effectively no need to mask it again when we use it. I will send an additional
> > > patch (on top of this one) to remove useless masking. Basically, changes would be:
> > 
> > I had a closer look and we can't remove the mask. The access
> > functions (for pmcnten, pminten, pmovs), clear or set only the
> > specified valid counter bits. This means that bits other than the
> > valid counter bits never change in __vcpu_sys_reg(), and those bits
> > are not necessarily zero because the initial value is
> > 0x1de7ec7edbadc0deULL (set by reset_unknown()).
> 
> That's a bug that should be fixed on its own. Bits that are RAZ/WI in
> the architecture shouldn't be kept in the shadow registers the first
> place. I'll have a look.

Please try this[1] for size, which is on top of Linus' tree as of this
morning.

	M.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/pmu/reset-values
Alexandre Chartre July 7, 2021, 12:48 p.m. UTC | #11
On 7/6/21 7:36 PM, Marc Zyngier wrote:
> On Tue, 06 Jul 2021 15:52:46 +0100,
> Marc Zyngier <maz@kernel.org> wrote:
>>
>> On Tue, 06 Jul 2021 14:50:35 +0100,
>> Alexandre Chartre <alexandre.chartre@oracle.com> wrote:
>>>
>>>
>>> Hi Marc,
>>>
>>> On 6/29/21 3:16 PM, Alexandre Chartre wrote:
>>>> On 6/29/21 11:06 AM, Marc Zyngier wrote
>>>> [...]
>>>>> So the sysreg is the only thing we should consider, and I think we
>>>>> should drop the useless masking. There is at least another instance of
>>>>> this in the PMU code (kvm_pmu_overflow_status()), and apart from
>>>>> kvm_pmu_vcpu_reset(), only the sysreg accessors should care about the
>>>>> masking to sanitise accesses.
>>>>>
>>>>> What do you think?
>>>>>
>>>>
>>>> I think you are right. PMCNTENSET_EL0 is already masked with kvm_pmu_valid_counter_mask()
>>>> so there's effectively no need to mask it again when we use it. I will send an additional
>>>> patch (on top of this one) to remove useless masking. Basically, changes would be:
>>>
>>> I had a closer look and we can't remove the mask. The access
>>> functions (for pmcnten, pminten, pmovs), clear or set only the
>>> specified valid counter bits. This means that bits other than the
>>> valid counter bits never change in __vcpu_sys_reg(), and those bits
>>> are not necessarily zero because the initial value is
>>> 0x1de7ec7edbadc0deULL (set by reset_unknown()).
>>
>> That's a bug that should be fixed on its own. Bits that are RAZ/WI in
>> the architecture shouldn't be kept in the shadow registers the first
>> place. I'll have a look.
> 
> Please try this[1] for size, which is on top of Linus' tree as of this
> morning.
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/pmu/reset-values
> 

I have tried [1] and it works fine. Also tested with my patch on top (without
masking) and it also works fine.

Thanks,

alex.

Patch

diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index fd167d4f4215..bab4b735a0cf 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -571,7 +571,8 @@  void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val)
 		kvm_pmu_enable_counter_mask(vcpu,
 		       __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask);
 	} else {
-		kvm_pmu_disable_counter_mask(vcpu, mask);
+		kvm_pmu_disable_counter_mask(vcpu,
+		       __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask);
 	}
 
 	if (val & ARMV8_PMU_PMCR_C)