Message ID | 20210628161925.401343-1-alexandre.chartre@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Series | KVM: arm64: Disabling disabled PMU counters wastes a lot of time |
Hi Alexandre,

Thanks for looking into this.

On Mon, 28 Jun 2021 17:19:25 +0100,
Alexandre Chartre <alexandre.chartre@oracle.com> wrote:
>
> In a KVM guest on ARM, performance counters interrupts have an

nit: arm64. 32bit ARM never had any working KVM PMU emulation.

> unnecessary overhead which slows down execution when using the "perf
> record" command and limits the "perf record" sampling period.
>
> The problem is that when a guest VM disables counters by clearing the
> PMCR_EL0.E bit (bit 0), KVM will disable all counters defined in
> PMCR_EL0 even if they are not enabled in PMCNTENSET_EL0.
>
> KVM disables a counter by calling into the perf framework, in particular
> by calling perf_event_create_kernel_counter() which is a time consuming
> operation. So, for example, with a Neoverse N1 CPU core which has 6 event
> counters and one cycle counter, KVM will always disable all 7 counters
> even if only one is enabled.
>
> This typically happens when using the "perf record" command in a guest
> VM: perf will disable all event counters with PMCNTENTSET_EL0 and only
> uses the cycle counter. And when using the "perf record" -F option with
> a high profiling frequency, the overhead of KVM disabling all counters
> instead of one on every counter interrupt becomes very noticeable.
>
> The problem is fixed by having KVM disable only counters which are
> enabled in PMCNTENSET_EL0. If a counter is not enabled in PMCNTENSET_EL0
> then KVM will not enable it when setting PMCR_EL0.E and it will remain
> disable as long as it is not enabled in PMCNTENSET_EL0. So there is

nit: disabled

> effectively no need to disable a counter when clearing PMCR_EL0.E if it
> is not enabled PMCNTENSET_EL0.
>
> Fixes: 76993739cd6f ("arm64: KVM: Add helper to handle PMCR register bits")

This isn't a fix (the current behaviour is correct per the
architecture), "only" a performance improvement. We reserve "Fixes:"
for things that are actually broken.

> Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
> ---
>  arch/arm64/kvm/pmu-emul.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
> index fd167d4f4215..bab4b735a0cf 100644
> --- a/arch/arm64/kvm/pmu-emul.c
> +++ b/arch/arm64/kvm/pmu-emul.c
> @@ -571,7 +571,8 @@ void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val)
>  		kvm_pmu_enable_counter_mask(vcpu,
>  		       __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask);
>  	} else {
> -		kvm_pmu_disable_counter_mask(vcpu, mask);
> +		kvm_pmu_disable_counter_mask(vcpu,
> +		       __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask);

This seems to perpetuate a flawed pattern. Why do we need to work out
the *valid* PMCNTENSET_EL0 bits? They should be correct by construction,
and the way the shadow sysreg gets populated already enforces this:

<quote>
static bool access_pmcnten(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
			   const struct sys_reg_desc *r)
{
	[...]
	mask = kvm_pmu_valid_counter_mask(vcpu);
	if (p->is_write) {
		val = p->regval & mask;
		if (r->Op2 & 0x1) {
			/* accessing PMCNTENSET_EL0 */
			__vcpu_sys_reg(vcpu, PMCNTENSET_EL0) |= val;
			kvm_pmu_enable_counter_mask(vcpu, val);
			kvm_vcpu_pmu_restore_guest(vcpu);
</quote>

So the sysreg is the only thing we should consider, and I think we
should drop the useless masking. There is at least another instance of
this in the PMU code (kvm_pmu_overflow_status()), and apart from
kvm_pmu_vcpu_reset(), only the sysreg accessors should care about the
masking to sanitise accesses.

What do you think?

	M.
Hi Marc,

On 6/29/21 11:06 AM, Marc Zyngier wrote:
> Hi Alexandre,
>
> Thanks for looking into this.
>
> On Mon, 28 Jun 2021 17:19:25 +0100,
> Alexandre Chartre <alexandre.chartre@oracle.com> wrote:
>>
>> In a KVM guest on ARM, performance counters interrupts have an
>
> nit: arm64. 32bit ARM never had any working KVM PMU emulation.
>
>> unnecessary overhead which slows down execution when using the "perf
>> record" command and limits the "perf record" sampling period.
>>
>> The problem is that when a guest VM disables counters by clearing the
>> PMCR_EL0.E bit (bit 0), KVM will disable all counters defined in
>> PMCR_EL0 even if they are not enabled in PMCNTENSET_EL0.
>>
>> KVM disables a counter by calling into the perf framework, in particular
>> by calling perf_event_create_kernel_counter() which is a time consuming
>> operation. So, for example, with a Neoverse N1 CPU core which has 6 event
>> counters and one cycle counter, KVM will always disable all 7 counters
>> even if only one is enabled.
>>
>> This typically happens when using the "perf record" command in a guest
>> VM: perf will disable all event counters with PMCNTENTSET_EL0 and only
>> uses the cycle counter. And when using the "perf record" -F option with
>> a high profiling frequency, the overhead of KVM disabling all counters
>> instead of one on every counter interrupt becomes very noticeable.
>>
>> The problem is fixed by having KVM disable only counters which are
>> enabled in PMCNTENSET_EL0. If a counter is not enabled in PMCNTENSET_EL0
>> then KVM will not enable it when setting PMCR_EL0.E and it will remain
>> disable as long as it is not enabled in PMCNTENSET_EL0. So there is
>
> nit: disabled
>
>> effectively no need to disable a counter when clearing PMCR_EL0.E if it
>> is not enabled PMCNTENSET_EL0.
>>
>> Fixes: 76993739cd6f ("arm64: KVM: Add helper to handle PMCR register bits")
>
> This isn't a fix (the current behaviour is correct per the
> architecture), "only" a performance improvement. We reserve "Fixes:"
> for things that are actually broken.
>

Ok, I will change everything you mentioned about the commit message.

>> Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
>> ---
>>  arch/arm64/kvm/pmu-emul.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
>> index fd167d4f4215..bab4b735a0cf 100644
>> --- a/arch/arm64/kvm/pmu-emul.c
>> +++ b/arch/arm64/kvm/pmu-emul.c
>> @@ -571,7 +571,8 @@ void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val)
>>  		kvm_pmu_enable_counter_mask(vcpu,
>>  		       __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask);
>>  	} else {
>> -		kvm_pmu_disable_counter_mask(vcpu, mask);
>> +		kvm_pmu_disable_counter_mask(vcpu,
>> +		       __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask);
>
> This seems to perpetuate a flawed pattern. Why do we need to work out
> the *valid* PMCNTENSET_EL0 bits? They should be correct by construction,
> and the way the shadow sysreg gets populated already enforces this:
>
> <quote>
> static bool access_pmcnten(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> 			   const struct sys_reg_desc *r)
> {
> 	[...]
> 	mask = kvm_pmu_valid_counter_mask(vcpu);
> 	if (p->is_write) {
> 		val = p->regval & mask;
> 		if (r->Op2 & 0x1) {
> 			/* accessing PMCNTENSET_EL0 */
> 			__vcpu_sys_reg(vcpu, PMCNTENSET_EL0) |= val;
> 			kvm_pmu_enable_counter_mask(vcpu, val);
> 			kvm_vcpu_pmu_restore_guest(vcpu);
> </quote>
>
> So the sysreg is the only thing we should consider, and I think we
> should drop the useless masking. There is at least another instance of
> this in the PMU code (kvm_pmu_overflow_status()), and apart from
> kvm_pmu_vcpu_reset(), only the sysreg accessors should care about the
> masking to sanitise accesses.
>
> What do you think?
>

I think you are right. PMCNTENSET_EL0 is already masked with
kvm_pmu_valid_counter_mask() so there's effectively no need to mask it
again when we use it. I will send an additional patch (on top of this
one) to remove useless masking. Basically, changes would be:

diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index bab4b735a0cf..e0dfd7ce4ba0 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -373,7 +373,6 @@ static u64 kvm_pmu_overflow_status(struct kvm_vcpu *vcpu)
 		reg = __vcpu_sys_reg(vcpu, PMOVSSET_EL0);
 		reg &= __vcpu_sys_reg(vcpu, PMCNTENSET_EL0);
 		reg &= __vcpu_sys_reg(vcpu, PMINTENSET_EL1);
-		reg &= kvm_pmu_valid_counter_mask(vcpu);
 	}
 
 	return reg;
@@ -564,21 +563,22 @@ void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u64 val)
  */
 void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val)
 {
-	unsigned long mask = kvm_pmu_valid_counter_mask(vcpu);
+	unsigned long mask;
 	int i;
 
 	if (val & ARMV8_PMU_PMCR_E) {
 		kvm_pmu_enable_counter_mask(vcpu,
-		       __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask);
+		       __vcpu_sys_reg(vcpu, PMCNTENSET_EL0));
 	} else {
 		kvm_pmu_disable_counter_mask(vcpu,
-		       __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask);
+		       __vcpu_sys_reg(vcpu, PMCNTENSET_EL0));
 	}
 
 	if (val & ARMV8_PMU_PMCR_C)
 		kvm_pmu_set_counter_value(vcpu, ARMV8_PMU_CYCLE_IDX, 0);
 
 	if (val & ARMV8_PMU_PMCR_P) {
+		mask = kvm_pmu_valid_counter_mask(vcpu);
 		for_each_set_bit(i, &mask, 32)
 			kvm_pmu_set_counter_value(vcpu, i, 0);
 	}
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 1a7968ad078c..2e406905760e 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -845,7 +845,7 @@ static bool access_pmcnten(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 			kvm_pmu_disable_counter_mask(vcpu, val);
 		}
 	} else {
-		p->regval = __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask;
+		p->regval = __vcpu_sys_reg(vcpu, PMCNTENSET_EL0);
 	}
 
 	return true;

Thanks,

alex.
On Tue, 29 Jun 2021 14:16:55 +0100,
Alexandre Chartre <alexandre.chartre@oracle.com> wrote:
> 
> 
> Hi Marc,
> 
> On 6/29/21 11:06 AM, Marc Zyngier wrote:
> > Hi Alexandre,

[...]

> > So the sysreg is the only thing we should consider, and I think we
> > should drop the useless masking. There is at least another instance of
> > this in the PMU code (kvm_pmu_overflow_status()), and apart from
> > kvm_pmu_vcpu_reset(), only the sysreg accessors should care about the
> > masking to sanitise accesses.
> >
> > What do you think?
> >
> 
> I think you are right. PMCNTENSET_EL0 is already masked with
> kvm_pmu_valid_counter_mask() so there's effectively no need to mask
> it again when we use it. I will send an additional patch (on top of
> this one) to remove useless masking. Basically, changes would be:
> 
> diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
> index bab4b735a0cf..e0dfd7ce4ba0 100644
> --- a/arch/arm64/kvm/pmu-emul.c
> +++ b/arch/arm64/kvm/pmu-emul.c
> @@ -373,7 +373,6 @@ static u64 kvm_pmu_overflow_status(struct kvm_vcpu *vcpu)
>  		reg = __vcpu_sys_reg(vcpu, PMOVSSET_EL0);
>  		reg &= __vcpu_sys_reg(vcpu, PMCNTENSET_EL0);
>  		reg &= __vcpu_sys_reg(vcpu, PMINTENSET_EL1);
> -		reg &= kvm_pmu_valid_counter_mask(vcpu);
>  	}
>  
>  	return reg;
> @@ -564,21 +563,22 @@ void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u64 val)
>   */
>  void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val)
>  {
> -	unsigned long mask = kvm_pmu_valid_counter_mask(vcpu);
> +	unsigned long mask;
>  	int i;
>  
>  	if (val & ARMV8_PMU_PMCR_E) {
>  		kvm_pmu_enable_counter_mask(vcpu,
> -		       __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask);
> +		       __vcpu_sys_reg(vcpu, PMCNTENSET_EL0));
>  	} else {
>  		kvm_pmu_disable_counter_mask(vcpu,
> -		       __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask);
> +		       __vcpu_sys_reg(vcpu, PMCNTENSET_EL0));
>  	}
>  
>  	if (val & ARMV8_PMU_PMCR_C)
>  		kvm_pmu_set_counter_value(vcpu, ARMV8_PMU_CYCLE_IDX, 0);
>  
>  	if (val & ARMV8_PMU_PMCR_P) {
> +		mask = kvm_pmu_valid_counter_mask(vcpu);

Careful here, this clashes with a fix from Alexandru that is currently
in -next (PMCR_EL0.P shouldn't reset the cycle counter) and aimed at
5.14. And whilst you're at it, consider moving the 'mask' declaration
here too.

>  		for_each_set_bit(i, &mask, 32)
>  			kvm_pmu_set_counter_value(vcpu, i, 0);
>  	}
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 1a7968ad078c..2e406905760e 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -845,7 +845,7 @@ static bool access_pmcnten(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
>  			kvm_pmu_disable_counter_mask(vcpu, val);
>  		}
>  	} else {
> -		p->regval = __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask;
> +		p->regval = __vcpu_sys_reg(vcpu, PMCNTENSET_EL0);
>  	}
>  
>  	return true;

If you are cleaning up the read-side of sysregs, access_pminten() and
access_pmovs() could have some of your attention too.

Thanks,

	M.
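Putting Marc's two remarks together (only the PMCR_EL0.P loop still needs the valid counter mask, and the 'mask' declaration can move with it), the cleanup under discussion would end up looking roughly as follows. This is only a sketch written against the pre-5.14 code shown above; it deliberately ignores the pending PMCR_EL0.P fix from Alexandru that Marc mentions, and it is not a patch posted in this thread.

void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val)
{
	int i;

	if (val & ARMV8_PMU_PMCR_E) {
		/* The shadow PMCNTENSET_EL0 is trusted to hold only valid counter bits. */
		kvm_pmu_enable_counter_mask(vcpu,
		       __vcpu_sys_reg(vcpu, PMCNTENSET_EL0));
	} else {
		kvm_pmu_disable_counter_mask(vcpu,
		       __vcpu_sys_reg(vcpu, PMCNTENSET_EL0));
	}

	if (val & ARMV8_PMU_PMCR_C)
		kvm_pmu_set_counter_value(vcpu, ARMV8_PMU_CYCLE_IDX, 0);

	if (val & ARMV8_PMU_PMCR_P) {
		/* The valid counter mask is now only needed to bound the reset loop. */
		unsigned long mask = kvm_pmu_valid_counter_mask(vcpu);

		for_each_set_bit(i, &mask, 32)
			kvm_pmu_set_counter_value(vcpu, i, 0);
	}
}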
On 6/29/21 3:47 PM, Marc Zyngier wrote:
> On Tue, 29 Jun 2021 14:16:55 +0100,
> Alexandre Chartre <alexandre.chartre@oracle.com> wrote:
>>
>> Hi Marc,
>>
>> On 6/29/21 11:06 AM, Marc Zyngier wrote:
>>> Hi Alexandre,
>
> [...]
>
>>  	if (val & ARMV8_PMU_PMCR_P) {
>> +		mask = kvm_pmu_valid_counter_mask(vcpu);
>
> Careful here, this clashes with a fix from Alexandru that is currently
> in -next (PMCR_EL0.P shouldn't reset the cycle counter) and aimed at
> 5.14. And whilst you're at it, consider moving the 'mask' declaration
> here too.
>
> [...]
>
>> -		p->regval = __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask;
>> +		p->regval = __vcpu_sys_reg(vcpu, PMCNTENSET_EL0);
>
> If you are cleaning up the read-side of sysregs, access_pminten() and
> access_pmovs() could have some of your attention too.
>

Ok, so for now, I will just resubmit the initial patch with the commit
comment fixes. Then, look at all the mask cleanup on top of Alexandru
changes and prepare another patch.

alex.
On 2021-06-29 15:17, Alexandre Chartre wrote:
> On 6/29/21 3:47 PM, Marc Zyngier wrote:
>> On Tue, 29 Jun 2021 14:16:55 +0100,
>> Alexandre Chartre <alexandre.chartre@oracle.com> wrote:
>>>
>>> Hi Marc,
>>>
>>> On 6/29/21 11:06 AM, Marc Zyngier wrote:
>>>> Hi Alexandre,
>>
>> [...]
>>
>> If you are cleaning up the read-side of sysregs, access_pminten() and
>> access_pmovs() could have some of your attention too.
>>
>
> Ok, so for now, I will just resubmit the initial patch with the commit
> comment fixes. Then, look at all the mask cleanup on top of Alexandru
> changes and prepare another patch.

Please send this as a series rather than individual patches. I'm only
queuing critical fixes at the moment (this is the merge window).
If you post the series after -rc1, I'll queue it and let it simmer
in -next.

Thanks,

	M.
On 6/29/21 4:25 PM, Marc Zyngier wrote:
> On 2021-06-29 15:17, Alexandre Chartre wrote:
>> On 6/29/21 3:47 PM, Marc Zyngier wrote:
>>> On Tue, 29 Jun 2021 14:16:55 +0100,
>>> Alexandre Chartre <alexandre.chartre@oracle.com> wrote:
>>>>
>>>>
>>>> Hi Marc,
>>>>
>>>> On 6/29/21 11:06 AM, Marc Zyngier wrote:
>>>>> Hi Alexandre,
>>>
>>> [...]
>>>
>>> If you are cleaning up the read-side of sysregs, access_pminten() and
>>> access_pmovs() could have some of your attention too.
>>>
>>
>> Ok, so for now, I will just resubmit the initial patch with the commit
>> comment fixes. Then, look at all the mask cleanup on top of Alexandru
>> changes and prepare another patch.
>
> Please send this as a series rather than individual patches. I'm only
> queuing critical fixes at the moment (this is the merge window).
> If you post the series after -rc1, I'll queue it and let it simmer
> in -next.
>

Ok, I will prepare a series and send it after rc1.

alex.
Hi Marc,

On 6/29/21 3:16 PM, Alexandre Chartre wrote:
> On 6/29/21 11:06 AM, Marc Zyngier wrote
> [...]
>> So the sysreg is the only thing we should consider, and I think we
>> should drop the useless masking. There is at least another instance of
>> this in the PMU code (kvm_pmu_overflow_status()), and apart from
>> kvm_pmu_vcpu_reset(), only the sysreg accessors should care about the
>> masking to sanitise accesses.
>>
>> What do you think?
>>
>
> I think you are right. PMCNTENSET_EL0 is already masked with kvm_pmu_valid_counter_mask()
> so there's effectively no need to mask it again when we use it. I will send an additional
> patch (on top of this one) to remove useless masking. Basically, changes would be:

I had a closer look and we can't remove the mask. The access
functions (for pmcnten, pminten, pmovs), clear or set only the
specified valid counter bits. This means that bits other than the
valid counter bits never change in __vcpu_sys_reg(), and those bits
are not necessarily zero because the initial value is
0x1de7ec7edbadc0deULL (set by reset_unknown()).

So I will resubmit initial patch, with just the commit message changes.

Thanks,

alex.
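To make that failure mode concrete, here is a small standalone illustration (plain C, not kernel code) of why the masking cannot simply be dropped while the shadow register starts from an UNKNOWN value. The 0x8000003f mask (six event counters plus the cycle counter in bit 31) is an example value chosen for the demo, not something quoted from the thread.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* UNKNOWN reset value, as written by reset_unknown(). */
	uint64_t shadow = 0x1de7ec7edbadc0deULL;
	/* Example valid mask: 6 event counters + the cycle counter (bit 31). */
	uint64_t valid = 0x8000003fULL;

	/* The accessors only ever touch bits inside the valid mask... */
	shadow |= (UINT64_C(1) << 0) & valid;    /* guest sets counter 0 via PMCNTENSET_EL0 */
	shadow &= ~((UINT64_C(1) << 0) & valid); /* guest clears it via PMCNTENCLR_EL0 */

	/* ...so the junk outside the mask never goes away and must be masked on use. */
	printf("shadow: %#llx\n", (unsigned long long)shadow);
	printf("masked: %#llx\n", (unsigned long long)(shadow & valid));
	return 0;
}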
On Tue, 06 Jul 2021 14:50:35 +0100,
Alexandre Chartre <alexandre.chartre@oracle.com> wrote:
> 
> 
> Hi Marc,
> 
> On 6/29/21 3:16 PM, Alexandre Chartre wrote:
> > On 6/29/21 11:06 AM, Marc Zyngier wrote
> > [...]
> >> So the sysreg is the only thing we should consider, and I think we
> >> should drop the useless masking. There is at least another instance of
> >> this in the PMU code (kvm_pmu_overflow_status()), and apart from
> >> kvm_pmu_vcpu_reset(), only the sysreg accessors should care about the
> >> masking to sanitise accesses.
> >>
> >> What do you think?
> >>
> >
> > I think you are right. PMCNTENSET_EL0 is already masked with kvm_pmu_valid_counter_mask()
> > so there's effectively no need to mask it again when we use it. I will send an additional
> > patch (on top of this one) to remove useless masking. Basically, changes would be:
> 
> I had a closer look and we can't remove the mask. The access
> functions (for pmcnten, pminten, pmovs), clear or set only the
> specified valid counter bits. This means that bits other than the
> valid counter bits never change in __vcpu_sys_reg(), and those bits
> are not necessarily zero because the initial value is
> 0x1de7ec7edbadc0deULL (set by reset_unknown()).

That's a bug that should be fixed on its own. Bits that are RAZ/WI in
the architecture shouldn't be kept in the shadow registers in the first
place. I'll have a look.

> So I will resubmit initial patch, with just the commit message
> changes.

Please don't. I'm not papering over this kind of bug.

Thanks,

	M.
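One possible shape for such a fix is to narrow the UNKNOWN value at reset time, so that RAZ/WI bits never reach the shadow register at all. The sketch below is a hedged illustration only, not the patch Marc ended up writing; the helper name and its wiring into the sys_reg_desc tables are assumptions.

static void reset_pmu_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
{
	u64 mask = kvm_pmu_valid_counter_mask(vcpu);

	/* Start from an UNKNOWN value, as before... */
	reset_unknown(vcpu, r);

	/* ...then keep only the implemented counter bits in the shadow register. */
	__vcpu_sys_reg(vcpu, r->reg) &= mask;
}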
On 7/6/21 4:52 PM, Marc Zyngier wrote:
> On Tue, 06 Jul 2021 14:50:35 +0100,
> Alexandre Chartre <alexandre.chartre@oracle.com> wrote:
>>
>> Hi Marc,
>>
>> On 6/29/21 3:16 PM, Alexandre Chartre wrote:
>>> On 6/29/21 11:06 AM, Marc Zyngier wrote
>>> [...]
>>
>> [...]
>>
>> I had a closer look and we can't remove the mask. The access
>> functions (for pmcnten, pminten, pmovs), clear or set only the
>> specified valid counter bits. This means that bits other than the
>> valid counter bits never change in __vcpu_sys_reg(), and those bits
>> are not necessarily zero because the initial value is
>> 0x1de7ec7edbadc0deULL (set by reset_unknown()).
>
> That's a bug that should be fixed on its own. Bits that are RAZ/WI in
> the architecture shouldn't be kept in the shadow registers in the first
> place. I'll have a look.
>
>> So I will resubmit initial patch, with just the commit message
>> changes.
>
> Please don't. I'm not papering over this kind of bug.
>

Right, I meant I will resubmit after -rc1 as you requested.

Thanks,

alex.
On Tue, 06 Jul 2021 15:52:46 +0100,
Marc Zyngier <maz@kernel.org> wrote:
> 
> On Tue, 06 Jul 2021 14:50:35 +0100,
> Alexandre Chartre <alexandre.chartre@oracle.com> wrote:
> > 
> > Hi Marc,
> > 
> > On 6/29/21 3:16 PM, Alexandre Chartre wrote:
> > > On 6/29/21 11:06 AM, Marc Zyngier wrote
> > > [...]
> > 
> > [...]
> > 
> > I had a closer look and we can't remove the mask. The access
> > functions (for pmcnten, pminten, pmovs), clear or set only the
> > specified valid counter bits. This means that bits other than the
> > valid counter bits never change in __vcpu_sys_reg(), and those bits
> > are not necessarily zero because the initial value is
> > 0x1de7ec7edbadc0deULL (set by reset_unknown()).
> 
> That's a bug that should be fixed on its own. Bits that are RAZ/WI in
> the architecture shouldn't be kept in the shadow registers in the first
> place. I'll have a look.

Please try this[1] for size, which is on top of Linus' tree as of this
morning.

	M.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/pmu/reset-values
On 7/6/21 7:36 PM, Marc Zyngier wrote:
> On Tue, 06 Jul 2021 15:52:46 +0100,
> Marc Zyngier <maz@kernel.org> wrote:
>>
>> On Tue, 06 Jul 2021 14:50:35 +0100,
>> Alexandre Chartre <alexandre.chartre@oracle.com> wrote:
>>>
>>> Hi Marc,
>>>
>>> On 6/29/21 3:16 PM, Alexandre Chartre wrote:
>>>> On 6/29/21 11:06 AM, Marc Zyngier wrote
>>>> [...]
>>>
>>> [...]
>>>
>>> I had a closer look and we can't remove the mask. The access
>>> functions (for pmcnten, pminten, pmovs), clear or set only the
>>> specified valid counter bits. This means that bits other than the
>>> valid counter bits never change in __vcpu_sys_reg(), and those bits
>>> are not necessarily zero because the initial value is
>>> 0x1de7ec7edbadc0deULL (set by reset_unknown()).
>>
>> That's a bug that should be fixed on its own. Bits that are RAZ/WI in
>> the architecture shouldn't be kept in the shadow registers in the first
>> place. I'll have a look.
>
> Please try this[1] for size, which is on top of Linus' tree as of this
> morning.
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/pmu/reset-values
>

I have tried [1] and it works fine. Also tested with my patch on top
(without masking) and it also works fine.

Thanks,

alex.
diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index fd167d4f4215..bab4b735a0cf 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -571,7 +571,8 @@ void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val)
 		kvm_pmu_enable_counter_mask(vcpu,
 		       __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask);
 	} else {
-		kvm_pmu_disable_counter_mask(vcpu, mask);
+		kvm_pmu_disable_counter_mask(vcpu,
+		       __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask);
 	}
 
 	if (val & ARMV8_PMU_PMCR_C)
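For readability, this is roughly how the function reads once the hunk above is applied. The context lines outside the hunk are reconstructed from the surrounding discussion, so treat this as a sketch rather than a verbatim copy of the tree at that point.

void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val)
{
	unsigned long mask = kvm_pmu_valid_counter_mask(vcpu);
	int i;

	if (val & ARMV8_PMU_PMCR_E) {
		kvm_pmu_enable_counter_mask(vcpu,
		       __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask);
	} else {
		/*
		 * With the patch, only the counters the guest actually enabled
		 * in PMCNTENSET_EL0 are torn down, instead of every counter
		 * implemented by the PMU.
		 */
		kvm_pmu_disable_counter_mask(vcpu,
		       __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask);
	}

	if (val & ARMV8_PMU_PMCR_C)
		kvm_pmu_set_counter_value(vcpu, ARMV8_PMU_CYCLE_IDX, 0);

	if (val & ARMV8_PMU_PMCR_P) {
		for_each_set_bit(i, &mask, 32)
			kvm_pmu_set_counter_value(vcpu, i, 0);
	}
}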
In a KVM guest on ARM, performance counters interrupts have an
unnecessary overhead which slows down execution when using the "perf
record" command and limits the "perf record" sampling period.

The problem is that when a guest VM disables counters by clearing the
PMCR_EL0.E bit (bit 0), KVM will disable all counters defined in
PMCR_EL0 even if they are not enabled in PMCNTENSET_EL0.

KVM disables a counter by calling into the perf framework, in particular
by calling perf_event_create_kernel_counter() which is a time consuming
operation. So, for example, with a Neoverse N1 CPU core which has 6 event
counters and one cycle counter, KVM will always disable all 7 counters
even if only one is enabled.

This typically happens when using the "perf record" command in a guest
VM: perf will disable all event counters with PMCNTENTSET_EL0 and only
uses the cycle counter. And when using the "perf record" -F option with
a high profiling frequency, the overhead of KVM disabling all counters
instead of one on every counter interrupt becomes very noticeable.

The problem is fixed by having KVM disable only counters which are
enabled in PMCNTENSET_EL0. If a counter is not enabled in PMCNTENSET_EL0
then KVM will not enable it when setting PMCR_EL0.E and it will remain
disable as long as it is not enabled in PMCNTENSET_EL0. So there is
effectively no need to disable a counter when clearing PMCR_EL0.E if it
is not enabled PMCNTENSET_EL0.

Fixes: 76993739cd6f ("arm64: KVM: Add helper to handle PMCR register bits")
Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
---
 arch/arm64/kvm/pmu-emul.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)