Message ID | 1548154197-5470-3-git-send-email-andrew.murray@arm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | KVM: arm/arm64: add support for chained counters | expand |
Hi Andrew, On 22/01/2019 10:49, Andrew Murray wrote: > The perf event sample_period is currently set based upon the current > counter value, when PMXEVTYPER is written to and the perf event is created. > However the user may choose to write the type before the counter value in > which case sample_period will be set incorrectly. Let's instead decouple > event creation from PMXEVTYPER and (re)create the event in either > suitation. > > Signed-off-by: Andrew Murray <andrew.murray@arm.com> > --- > virt/kvm/arm/pmu.c | 39 ++++++++++++++++++++++++++++++--------- > 1 file changed, 30 insertions(+), 9 deletions(-) > > diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c > index 531d27f..4464899 100644 > --- a/virt/kvm/arm/pmu.c > +++ b/virt/kvm/arm/pmu.c > @@ -24,6 +24,8 @@ > #include <kvm/arm_pmu.h> > #include <kvm/arm_vgic.h> > > +static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 data, > + u64 select_idx); > /** > * kvm_pmu_get_counter_value - get PMU counter value > * @vcpu: The vcpu pointer > @@ -57,11 +59,18 @@ u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 select_idx) > */ > void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu, u64 select_idx, u64 val) > { > - u64 reg; > + u64 reg, data; > > reg = (select_idx == ARMV8_PMU_CYCLE_IDX) > ? PMCCNTR_EL0 : PMEVCNTR0_EL0 + select_idx; > __vcpu_sys_reg(vcpu, reg) += (s64)val - kvm_pmu_get_counter_value(vcpu, select_idx); > + > + reg = (select_idx == ARMV8_PMU_CYCLE_IDX) > + ? PMCCFILTR_EL0 : PMEVTYPER0_EL0 + select_idx; > + data = __vcpu_sys_reg(vcpu, reg + select_idx); I think this should be just "reg" instead of "reg + select_idx". Cheers,
On Tue, Jan 22, 2019 at 12:12:51PM +0000, Julien Thierry wrote: > Hi Andrew, > > On 22/01/2019 10:49, Andrew Murray wrote: > > The perf event sample_period is currently set based upon the current > > counter value, when PMXEVTYPER is written to and the perf event is created. > > However the user may choose to write the type before the counter value in > > which case sample_period will be set incorrectly. Let's instead decouple > > event creation from PMXEVTYPER and (re)create the event in either > > suitation. > > > > Signed-off-by: Andrew Murray <andrew.murray@arm.com> > > --- > > virt/kvm/arm/pmu.c | 39 ++++++++++++++++++++++++++++++--------- > > 1 file changed, 30 insertions(+), 9 deletions(-) > > > > diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c > > index 531d27f..4464899 100644 > > --- a/virt/kvm/arm/pmu.c > > +++ b/virt/kvm/arm/pmu.c > > @@ -24,6 +24,8 @@ > > #include <kvm/arm_pmu.h> > > #include <kvm/arm_vgic.h> > > > > +static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 data, > > + u64 select_idx); > > /** > > * kvm_pmu_get_counter_value - get PMU counter value > > * @vcpu: The vcpu pointer > > @@ -57,11 +59,18 @@ u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 select_idx) > > */ > > void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu, u64 select_idx, u64 val) > > { > > - u64 reg; > > + u64 reg, data; > > > > reg = (select_idx == ARMV8_PMU_CYCLE_IDX) > > ? PMCCNTR_EL0 : PMEVCNTR0_EL0 + select_idx; > > __vcpu_sys_reg(vcpu, reg) += (s64)val - kvm_pmu_get_counter_value(vcpu, select_idx); > > + > > + reg = (select_idx == ARMV8_PMU_CYCLE_IDX) > > + ? PMCCFILTR_EL0 : PMEVTYPER0_EL0 + select_idx; > > + data = __vcpu_sys_reg(vcpu, reg + select_idx); > > I think this should be just "reg" instead of "reg + select_idx". Yes, good catch. Thanks, Andrew Murray > > Cheers, > > -- > Julien Thierry
Hi Andrew On 01/22/2019 10:49 AM, Andrew Murray wrote: > The perf event sample_period is currently set based upon the current > counter value, when PMXEVTYPER is written to and the perf event is created. > However the user may choose to write the type before the counter value in > which case sample_period will be set incorrectly. Let's instead decouple > event creation from PMXEVTYPER and (re)create the event in either > suitation. > > Signed-off-by: Andrew Murray <andrew.murray@arm.com> The approach looks fine to me. However this patch seems to introduce a memory leak, see below, which you may be addressing in a later patch in the series. But this will affect bisecting issues. > --- > virt/kvm/arm/pmu.c | 39 ++++++++++++++++++++++++++++++--------- > 1 file changed, 30 insertions(+), 9 deletions(-) > > diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c > index 531d27f..4464899 100644 > --- a/virt/kvm/arm/pmu.c > +++ b/virt/kvm/arm/pmu.c > @@ -24,6 +24,8 @@ > #include <kvm/arm_pmu.h> > #include <kvm/arm_vgic.h> > > +static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 data, > + u64 select_idx); Could we just pass the counter index (i.e, select_idx) after updating the event_type/counter value in the respective functions. nit: If we decide not to do that, please rename "data" to something more obvious, event_type. > /** > * kvm_pmu_get_counter_value - get PMU counter value > * @vcpu: The vcpu pointer > @@ -57,11 +59,18 @@ u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 select_idx) > */ > void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu, u64 select_idx, u64 val) > { > - u64 reg; > + u64 reg, data; nit: Same here, data is too generic. > > reg = (select_idx == ARMV8_PMU_CYCLE_IDX) > ? PMCCNTR_EL0 : PMEVCNTR0_EL0 + select_idx; > __vcpu_sys_reg(vcpu, reg) += (s64)val - kvm_pmu_get_counter_value(vcpu, select_idx); > + > + reg = (select_idx == ARMV8_PMU_CYCLE_IDX) > + ? PMCCFILTR_EL0 : PMEVTYPER0_EL0 + select_idx; > + data = __vcpu_sys_reg(vcpu, reg + select_idx); > + > + /* Recreate the perf event to reflect the updated sample_period */ > + kvm_pmu_create_perf_event(vcpu, data, select_idx); > } > > /** > @@ -380,17 +389,13 @@ static bool kvm_pmu_counter_is_enabled(struct kvm_vcpu *vcpu, u64 select_idx) > } > > /** > - * kvm_pmu_set_counter_event_type - set selected counter to monitor some event > + * kvm_pmu_create_perf_event - create a perf event for a counter > * @vcpu: The vcpu pointer > - * @data: The data guest writes to PMXEVTYPER_EL0 > + * @data: Type of event as per PMXEVTYPER_EL0 format > * @select_idx: The number of selected counter > - * > - * When OS accesses PMXEVTYPER_EL0, that means it wants to set a PMC to count an > - * event with given hardware event number. Here we call perf_event API to > - * emulate this action and create a kernel perf event for it. > */ > -void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data, > - u64 select_idx) > +static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 data, > + u64 select_idx) > { > struct kvm_pmu *pmu = &vcpu->arch.pmu; > struct kvm_pmc *pmc = &pmu->pmc[select_idx]; > @@ -433,6 +438,22 @@ void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data, > pmc->perf_event = event; We should release the existing perf_event to prevent a memory leak and also a corruption in the data via the overflow handler for the existing event. Am I missing something here ? > } > > +/** > + * kvm_pmu_set_counter_event_type - set selected counter to monitor some event > + * @vcpu: The vcpu pointer > + * @data: The data guest writes to PMXEVTYPER_EL0 > + * @select_idx: The number of selected counter > + * > + * When OS accesses PMXEVTYPER_EL0, that means it wants to set a PMC to count an > + * event with given hardware event number. Here we call perf_event API to > + * emulate this action and create a kernel perf event for it. > + */ > +void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data, > + u64 select_idx) > +{ > + kvm_pmu_create_perf_event(vcpu, data, select_idx); > +} > + > bool kvm_arm_support_pmu_v3(void) > { > /* > Cheers Suzuki
On Tue, Jan 22, 2019 at 02:18:17PM +0000, Suzuki K Poulose wrote: > Hi Andrew > > On 01/22/2019 10:49 AM, Andrew Murray wrote: > > The perf event sample_period is currently set based upon the current > > counter value, when PMXEVTYPER is written to and the perf event is created. > > However the user may choose to write the type before the counter value in > > which case sample_period will be set incorrectly. Let's instead decouple > > event creation from PMXEVTYPER and (re)create the event in either > > suitation. > > > > Signed-off-by: Andrew Murray <andrew.murray@arm.com> > > The approach looks fine to me. However this patch seems to introduce a > memory leak, see below, which you may be addressing in a later patch in the > series. But this will affect bisecting issues. See below, I don't think this is true. > > > --- > > virt/kvm/arm/pmu.c | 39 ++++++++++++++++++++++++++++++--------- > > 1 file changed, 30 insertions(+), 9 deletions(-) > > > > diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c > > index 531d27f..4464899 100644 > > --- a/virt/kvm/arm/pmu.c > > +++ b/virt/kvm/arm/pmu.c > > @@ -24,6 +24,8 @@ > > #include <kvm/arm_pmu.h> > > #include <kvm/arm_vgic.h> > > +static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 data, > > + u64 select_idx); > > Could we just pass the counter index (i.e, select_idx) after updating > the event_type/counter value in the respective functions. Unless I misunderstand we need the value of 'data' as it is used to populate the function local perf_event_attr structure. However it is possible to instead read 'data' from __vcpu_sys_reg in kvm_pmu_create_perf_event instead of the call site. However kvm_pmu_set_counter_event_type would have to set the value of __vcpu_sys_reg from its data argument (as __vcpu_sys_reg normally gets set after kvm_pmu_set_counter_event_type returns). This is OK as we do this in the next patch in this series anyway - so perhaps I can bring that forward to this patch? > > nit: If we decide not to do that, please rename "data" to something more > obvious, event_type. > > > /** > > * kvm_pmu_get_counter_value - get PMU counter value > > * @vcpu: The vcpu pointer > > @@ -57,11 +59,18 @@ u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 select_idx) > > */ > > void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu, u64 select_idx, u64 val) > > { > > - u64 reg; > > + u64 reg, data; > > nit: Same here, data is too generic. > > > reg = (select_idx == ARMV8_PMU_CYCLE_IDX) > > ? PMCCNTR_EL0 : PMEVCNTR0_EL0 + select_idx; > > __vcpu_sys_reg(vcpu, reg) += (s64)val - kvm_pmu_get_counter_value(vcpu, select_idx); > > + > > + reg = (select_idx == ARMV8_PMU_CYCLE_IDX) > > + ? PMCCFILTR_EL0 : PMEVTYPER0_EL0 + select_idx; > > + data = __vcpu_sys_reg(vcpu, reg + select_idx); > > + > > + /* Recreate the perf event to reflect the updated sample_period */ > > + kvm_pmu_create_perf_event(vcpu, data, select_idx); > > } > > /** > > @@ -380,17 +389,13 @@ static bool kvm_pmu_counter_is_enabled(struct kvm_vcpu *vcpu, u64 select_idx) > > } > > /** > > - * kvm_pmu_set_counter_event_type - set selected counter to monitor some event > > + * kvm_pmu_create_perf_event - create a perf event for a counter > > * @vcpu: The vcpu pointer > > - * @data: The data guest writes to PMXEVTYPER_EL0 > > + * @data: Type of event as per PMXEVTYPER_EL0 format > > * @select_idx: The number of selected counter > > - * > > - * When OS accesses PMXEVTYPER_EL0, that means it wants to set a PMC to count an > > - * event with given hardware event number. Here we call perf_event API to > > - * emulate this action and create a kernel perf event for it. > > */ > > -void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data, > > - u64 select_idx) > > +static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 data, > > + u64 select_idx) > > { > > struct kvm_pmu *pmu = &vcpu->arch.pmu; > > struct kvm_pmc *pmc = &pmu->pmc[select_idx]; > > @@ -433,6 +438,22 @@ void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data, > > pmc->perf_event = event; > > We should release the existing perf_event to prevent a memory leak and > also a corruption in the data via the overflow handler for the existing > event. Am I missing something here ? In kvm_pmu_create_perf_event (formally kvm_pmu_set_counter_event_type) we call kvm_pmu_stop_counter - this releases the event. So there is no memory leak here. Thanks, Andrew Murray > > > } > > +/** > > + * kvm_pmu_set_counter_event_type - set selected counter to monitor some event > > + * @vcpu: The vcpu pointer > > + * @data: The data guest writes to PMXEVTYPER_EL0 > > + * @select_idx: The number of selected counter > > + * > > + * When OS accesses PMXEVTYPER_EL0, that means it wants to set a PMC to count an > > + * event with given hardware event number. Here we call perf_event API to > > + * emulate this action and create a kernel perf event for it. > > + */ > > +void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data, > > + u64 select_idx) > > +{ > > + kvm_pmu_create_perf_event(vcpu, data, select_idx); > > +} > > + > > bool kvm_arm_support_pmu_v3(void) > > { > > /* > > > > > Cheers > Suzuki
Hi Andrew, On 28/01/2019 11:47, Andrew Murray wrote: > On Tue, Jan 22, 2019 at 02:18:17PM +0000, Suzuki K Poulose wrote: >> Hi Andrew >> >> On 01/22/2019 10:49 AM, Andrew Murray wrote: >>> The perf event sample_period is currently set based upon the current >>> counter value, when PMXEVTYPER is written to and the perf event is created. >>> However the user may choose to write the type before the counter value in >>> which case sample_period will be set incorrectly. Let's instead decouple >>> event creation from PMXEVTYPER and (re)create the event in either >>> suitation. >>> >>> Signed-off-by: Andrew Murray <andrew.murray@arm.com> >> >> The approach looks fine to me. However this patch seems to introduce a >> memory leak, see below, which you may be addressing in a later patch in the >> series. But this will affect bisecting issues. > > See below, I don't think this is true. You're right. Sorry for the noise. > >> >>> --- >>> virt/kvm/arm/pmu.c | 39 ++++++++++++++++++++++++++++++--------- >>> 1 file changed, 30 insertions(+), 9 deletions(-) >>> >>> diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c >>> index 531d27f..4464899 100644 >>> --- a/virt/kvm/arm/pmu.c >>> +++ b/virt/kvm/arm/pmu.c >>> @@ -24,6 +24,8 @@ >>> #include <kvm/arm_pmu.h> >>> #include <kvm/arm_vgic.h> >>> +static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 data, >>> + u64 select_idx); >> >> Could we just pass the counter index (i.e, select_idx) after updating >> the event_type/counter value in the respective functions. > > Unless I misunderstand we need the value of 'data' as it is used to > populate the function local perf_event_attr structure. Yes, we do program the "data" (which is the event code) in attr.config. So, since this is now a helper routine, so the name "data" doesn't make any sense. > > However it is possible to instead read 'data' from __vcpu_sys_reg in > kvm_pmu_create_perf_event instead of the call site. However > kvm_pmu_set_counter_event_type would have to set the value of > __vcpu_sys_reg from its data argument (as __vcpu_sys_reg normally gets > set after kvm_pmu_set_counter_event_type returns). This is OK as we > do this in the next patch in this series anyway - so perhaps I can > bring that forward to this patch? Yes, please. Cheers Suzuki
diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c index 531d27f..4464899 100644 --- a/virt/kvm/arm/pmu.c +++ b/virt/kvm/arm/pmu.c @@ -24,6 +24,8 @@ #include <kvm/arm_pmu.h> #include <kvm/arm_vgic.h> +static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 data, + u64 select_idx); /** * kvm_pmu_get_counter_value - get PMU counter value * @vcpu: The vcpu pointer @@ -57,11 +59,18 @@ u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 select_idx) */ void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu, u64 select_idx, u64 val) { - u64 reg; + u64 reg, data; reg = (select_idx == ARMV8_PMU_CYCLE_IDX) ? PMCCNTR_EL0 : PMEVCNTR0_EL0 + select_idx; __vcpu_sys_reg(vcpu, reg) += (s64)val - kvm_pmu_get_counter_value(vcpu, select_idx); + + reg = (select_idx == ARMV8_PMU_CYCLE_IDX) + ? PMCCFILTR_EL0 : PMEVTYPER0_EL0 + select_idx; + data = __vcpu_sys_reg(vcpu, reg + select_idx); + + /* Recreate the perf event to reflect the updated sample_period */ + kvm_pmu_create_perf_event(vcpu, data, select_idx); } /** @@ -380,17 +389,13 @@ static bool kvm_pmu_counter_is_enabled(struct kvm_vcpu *vcpu, u64 select_idx) } /** - * kvm_pmu_set_counter_event_type - set selected counter to monitor some event + * kvm_pmu_create_perf_event - create a perf event for a counter * @vcpu: The vcpu pointer - * @data: The data guest writes to PMXEVTYPER_EL0 + * @data: Type of event as per PMXEVTYPER_EL0 format * @select_idx: The number of selected counter - * - * When OS accesses PMXEVTYPER_EL0, that means it wants to set a PMC to count an - * event with given hardware event number. Here we call perf_event API to - * emulate this action and create a kernel perf event for it. */ -void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data, - u64 select_idx) +static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 data, + u64 select_idx) { struct kvm_pmu *pmu = &vcpu->arch.pmu; struct kvm_pmc *pmc = &pmu->pmc[select_idx]; @@ -433,6 +438,22 @@ void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data, pmc->perf_event = event; } +/** + * kvm_pmu_set_counter_event_type - set selected counter to monitor some event + * @vcpu: The vcpu pointer + * @data: The data guest writes to PMXEVTYPER_EL0 + * @select_idx: The number of selected counter + * + * When OS accesses PMXEVTYPER_EL0, that means it wants to set a PMC to count an + * event with given hardware event number. Here we call perf_event API to + * emulate this action and create a kernel perf event for it. + */ +void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data, + u64 select_idx) +{ + kvm_pmu_create_perf_event(vcpu, data, select_idx); +} + bool kvm_arm_support_pmu_v3(void) { /*
The perf event sample_period is currently set based upon the current counter value, when PMXEVTYPER is written to and the perf event is created. However the user may choose to write the type before the counter value in which case sample_period will be set incorrectly. Let's instead decouple event creation from PMXEVTYPER and (re)create the event in either suitation. Signed-off-by: Andrew Murray <andrew.murray@arm.com> --- virt/kvm/arm/pmu.c | 39 ++++++++++++++++++++++++++++++--------- 1 file changed, 30 insertions(+), 9 deletions(-)