Message ID | 1542885950-23816-4-git-send-email-andrew.murray@arm.com (mailing list archive) |
---|---|
State | New, archived |
Series | arm64: Support perf event modifiers :G and :H |
Hi Andrew,

On Thu, Nov 22, 2018 at 11:25:49AM +0000, Andrew Murray wrote:
> Add support for the :G and :H attributes in perf by handling the
> exclude_host/exclude_guest event attributes.
>
> We notify KVM of counters that we wish to be enabled or disabled on
> guest entry/exit and thus defer from starting or stopping :G events
> as per the events exclude_host attribute.
>
> With both VHE and non-VHE we switch the counters between host/guest
> at EL2. We are able to eliminate counters counting host events on
> the boundaries of guest entry/exit when using :G by filtering out
> EL2 for exclude_host. However when using :H unless exclude_hv is set
> on non-VHE then there is a small blackout window at the guest
> entry/exit where host events are not captured.
>
> Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> ---
>  arch/arm64/kernel/perf_event.c | 52 ++++++++++++++++++++++++++++++++++++------
>  1 file changed, 45 insertions(+), 7 deletions(-)

I've got a cold at the moment, so it could just be that, but I struggled to
follow the logic in this patch. Comments below.

> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> index de564ae..e0619ba 100644
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ -26,6 +26,7 @@
>
>  #include <linux/acpi.h>
>  #include <linux/clocksource.h>
> +#include <linux/kvm_host.h>
>  #include <linux/of.h>
>  #include <linux/perf/arm_pmu.h>
>  #include <linux/platform_device.h>
> @@ -647,11 +648,23 @@ static inline int armv8pmu_enable_counter(int idx)
>
>  static inline void armv8pmu_enable_event_counter(struct perf_event *event)
>  {
> +        struct perf_event_attr *attr = &event->attr;
>          int idx = event->hw.idx;
> +        u32 counter_bits = BIT(ARMV8_IDX_TO_COUNTER(idx));
>
> -        armv8pmu_enable_counter(idx);
>          if (armv8pmu_event_is_chained(event))
> -                armv8pmu_enable_counter(idx - 1);
> +                counter_bits |= BIT(ARMV8_IDX_TO_COUNTER(idx - 1));
> +
> +        if (attr->exclude_host)
> +                kvm_clr_set_guest_pmu_events(0, counter_bits);
> +        if (attr->exclude_guest)
> +                kvm_clr_set_host_pmu_events(0, counter_bits);

Hmm, so if I have both exclude_host and exclude_guest set then we'll set
the counter in both events_host_only and events_guest_only, which is weird
and probably not what you want.

I think I'd find it easier to understand if you dropped the "only" suffix on
the kvm masks, and you actually made the logic positive so that an event
that counts in both host and guest has its bit set in both of the masks.
> +
> +        if (!attr->exclude_host) {
> +                armv8pmu_enable_counter(idx);
> +                if (armv8pmu_event_is_chained(event))
> +                        armv8pmu_enable_counter(idx - 1);
> +        }
>  }
>
>  static inline int armv8pmu_disable_counter(int idx)
> @@ -664,11 +677,23 @@ static inline int armv8pmu_disable_counter(int idx)
>  static inline void armv8pmu_disable_event_counter(struct perf_event *event)
>  {
>          struct hw_perf_event *hwc = &event->hw;
> +        struct perf_event_attr *attr = &event->attr;
>          int idx = hwc->idx;
> +        u32 counter_bits = BIT(ARMV8_IDX_TO_COUNTER(idx));
>
>          if (armv8pmu_event_is_chained(event))
> -                armv8pmu_disable_counter(idx - 1);
> -        armv8pmu_disable_counter(idx);
> +                counter_bits |= BIT(ARMV8_IDX_TO_COUNTER(idx - 1));
> +
> +        if (attr->exclude_host)
> +                kvm_clr_set_guest_pmu_events(counter_bits, 0);
> +        if (attr->exclude_guest)
> +                kvm_clr_set_host_pmu_events(counter_bits, 0);
> +
> +        if (!attr->exclude_host) {
> +                if (armv8pmu_event_is_chained(event))
> +                        armv8pmu_disable_counter(idx - 1);
> +                armv8pmu_disable_counter(idx);
> +        }

It looks like you're relying on the perf_event_attr remaining unchanged
between the enable/disable calls for a given event. However, I'm not
convinced that's the case -- have you checked that e.g.
PERF_EVENT_IOC_MODIFY_ATTRIBUTES can't rewrite the thing behind your
back?

>  }
>
>  static inline int armv8pmu_enable_intens(int idx)
> @@ -943,16 +968,25 @@ static int armv8pmu_set_event_filter(struct hw_perf_event *event,
>           * Therefore we ignore exclude_hv in this configuration, since
>           * there's no hypervisor to sample anyway. This is consistent
>           * with other architectures (x86 and Power).
> +         *
> +         * To eliminate counting host events on the boundaries of
> +         * guest entry/exit we ensure EL2 is not included in hyp mode
> +         * with !exclude_host.
>           */
>          if (is_kernel_in_hyp_mode()) {
> -                if (!attr->exclude_kernel)
> +                if (!attr->exclude_kernel && !attr->exclude_host)
>                          config_base |= ARMV8_PMU_INCLUDE_EL2;

Do you know what perf passes by default here? I'm worried that we're
introducing a user-visible change to existing programs by changing this
check.

>          } else {
> -                if (attr->exclude_kernel)
> -                        config_base |= ARMV8_PMU_EXCLUDE_EL1;
>                  if (!attr->exclude_hv)
>                          config_base |= ARMV8_PMU_INCLUDE_EL2;
>          }
> +
> +        /*
> +         * Filter out !VHE kernels and guest kernels
> +         */
> +        if (attr->exclude_kernel)
> +                config_base |= ARMV8_PMU_EXCLUDE_EL1;

Why have you moved this hunk?

Will
On Mon, Dec 03, 2018 at 03:04:14PM +0000, Will Deacon wrote:
> Hi Andrew,
>
> On Thu, Nov 22, 2018 at 11:25:49AM +0000, Andrew Murray wrote:
> > Add support for the :G and :H attributes in perf by handling the
> > exclude_host/exclude_guest event attributes.
> >
> > We notify KVM of counters that we wish to be enabled or disabled on
> > guest entry/exit and thus defer from starting or stopping :G events
> > as per the events exclude_host attribute.
> >
> > With both VHE and non-VHE we switch the counters between host/guest
> > at EL2. We are able to eliminate counters counting host events on
> > the boundaries of guest entry/exit when using :G by filtering out
> > EL2 for exclude_host. However when using :H unless exclude_hv is set
> > on non-VHE then there is a small blackout window at the guest
> > entry/exit where host events are not captured.
> >
> > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> > ---
> >  arch/arm64/kernel/perf_event.c | 52 ++++++++++++++++++++++++++++++++++++------
> >  1 file changed, 45 insertions(+), 7 deletions(-)
>
> I've got a cold at the moment, so it could just be that, but I struggled to
> follow the logic in this patch. Comments below.
>
> > diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> > index de564ae..e0619ba 100644
> > --- a/arch/arm64/kernel/perf_event.c
> > +++ b/arch/arm64/kernel/perf_event.c
> > @@ -26,6 +26,7 @@
> >
> >  #include <linux/acpi.h>
> >  #include <linux/clocksource.h>
> > +#include <linux/kvm_host.h>
> >  #include <linux/of.h>
> >  #include <linux/perf/arm_pmu.h>
> >  #include <linux/platform_device.h>
> > @@ -647,11 +648,23 @@ static inline int armv8pmu_enable_counter(int idx)
> >
> >  static inline void armv8pmu_enable_event_counter(struct perf_event *event)
> >  {
> > +        struct perf_event_attr *attr = &event->attr;
> >          int idx = event->hw.idx;
> > +        u32 counter_bits = BIT(ARMV8_IDX_TO_COUNTER(idx));
> >
> > -        armv8pmu_enable_counter(idx);
> >          if (armv8pmu_event_is_chained(event))
> > -                armv8pmu_enable_counter(idx - 1);
> > +                counter_bits |= BIT(ARMV8_IDX_TO_COUNTER(idx - 1));
> > +
> > +        if (attr->exclude_host)
> > +                kvm_clr_set_guest_pmu_events(0, counter_bits);
> > +        if (attr->exclude_guest)
> > +                kvm_clr_set_host_pmu_events(0, counter_bits);
>
> Hmm, so if I have both exclude_host and exclude_guest set then we'll set
> the counter in both events_host_only and events_guest_only, which is weird
> and probably not what you want.

perf_event_attr is passed from userspace; whilst the perf tool doesn't allow
you to set exclude_host and exclude_guest at the same time, it's feasible for
another user to do so. So indeed the current behaviour wouldn't be expected.

>
> I think I'd find it easier to understand if you dropped the "only" suffix on
> the kvm masks, and you actually made the logic positive so that an event
> that counts in both host and guest has its bit set in both of the masks.

No problem. This actually fixes a bug when multiple events are enabled.
>
> > +
> > +        if (!attr->exclude_host) {
> > +                armv8pmu_enable_counter(idx);
> > +                if (armv8pmu_event_is_chained(event))
> > +                        armv8pmu_enable_counter(idx - 1);
> > +        }
> >  }
> >
> >  static inline int armv8pmu_disable_counter(int idx)
> > @@ -664,11 +677,23 @@ static inline int armv8pmu_disable_counter(int idx)
> >  static inline void armv8pmu_disable_event_counter(struct perf_event *event)
> >  {
> >          struct hw_perf_event *hwc = &event->hw;
> > +        struct perf_event_attr *attr = &event->attr;
> >          int idx = hwc->idx;
> > +        u32 counter_bits = BIT(ARMV8_IDX_TO_COUNTER(idx));
> >
> >          if (armv8pmu_event_is_chained(event))
> > -                armv8pmu_disable_counter(idx - 1);
> > -        armv8pmu_disable_counter(idx);
> > +                counter_bits |= BIT(ARMV8_IDX_TO_COUNTER(idx - 1));
> > +
> > +        if (attr->exclude_host)
> > +                kvm_clr_set_guest_pmu_events(counter_bits, 0);
> > +        if (attr->exclude_guest)
> > +                kvm_clr_set_host_pmu_events(counter_bits, 0);
> > +
> > +        if (!attr->exclude_host) {
> > +                if (armv8pmu_event_is_chained(event))
> > +                        armv8pmu_disable_counter(idx - 1);
> > +                armv8pmu_disable_counter(idx);
> > +        }
>
> It looks like you're relying on the perf_event_attr remaining unchanged
> between the enable/disable calls for a given event. However, I'm not
> convinced that's the case -- have you checked that e.g.
> PERF_EVENT_IOC_MODIFY_ATTRIBUTES can't rewrite the thing behind your
> back?

Even better - I can unconditionally clear the counter_bits here seeing as the
event is being disabled.

>
> >  }
> >
> >  static inline int armv8pmu_enable_intens(int idx)
> > @@ -943,16 +968,25 @@ static int armv8pmu_set_event_filter(struct hw_perf_event *event,
> >           * Therefore we ignore exclude_hv in this configuration, since
> >           * there's no hypervisor to sample anyway. This is consistent
> >           * with other architectures (x86 and Power).
> > +         *
> > +         * To eliminate counting host events on the boundaries of
> > +         * guest entry/exit we ensure EL2 is not included in hyp mode
> > +         * with !exclude_host.
> >           */
> >          if (is_kernel_in_hyp_mode()) {
> > -                if (!attr->exclude_kernel)
> > +                if (!attr->exclude_kernel && !attr->exclude_host)
> >                          config_base |= ARMV8_PMU_INCLUDE_EL2;
>
> Do you know what perf passes by default here? I'm worried that we're
> introducing a user-visible change to existing programs by changing this
> check.

When running 'perf xxx' exclude_guest is set and exclude_host is unset, and
the opposite when running 'perf kvm xxx'. So for existing users that do not
use virtualisation there is no change.

>
> >          } else {
> > -                if (attr->exclude_kernel)
> > -                        config_base |= ARMV8_PMU_EXCLUDE_EL1;
> >                  if (!attr->exclude_hv)
> >                          config_base |= ARMV8_PMU_INCLUDE_EL2;
> >          }
> > +
> > +        /*
> > +         * Filter out !VHE kernels and guest kernels
> > +         */
> > +        if (attr->exclude_kernel)
> > +                config_base |= ARMV8_PMU_EXCLUDE_EL1;
>
> Why have you moved this hunk?

When guests use perf the guest PMU is emulated by KVM (virt/kvm/arm/pmu) and
is backed by host perf events; these events always have the exclude_host flag
set. Therefore, to filter these kernel events we need to filter EL1. By moving
this hunk we ensure guest kernel filtering works even when the host is VHE.
(The side effect is that we unnecessarily filter EL1 on VHE host kernel
events, but this has no effect as no-one is using this except any guest.)

Thanks,

Andrew Murray

>
> Will
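A rough sketch of the positive-logic bookkeeping discussed above may make the agreed change easier to see; it is not taken from any posted version of the series. It keeps the kvm_clr_set_{host,guest}_pmu_events(clr, set) helpers from the patch but assumes the underlying masks become plain events_host/events_guest, so an event with neither exclude flag set ends up with its counter bit in both masks. The wrapper name armv8pmu_update_kvm_masks() is invented for illustration.

```c
/*
 * Sketch only: enable-path bookkeeping with positive-logic masks, where
 * events_host/events_guest (assumed names) hold the counters that should
 * count while in the host and in the guest respectively.
 */
static void armv8pmu_update_kvm_masks(struct perf_event *event, u32 counter_bits)
{
	struct perf_event_attr *attr = &event->attr;

	if (attr->exclude_host)
		kvm_clr_set_host_pmu_events(counter_bits, 0);	/* never count in host */
	else
		kvm_clr_set_host_pmu_events(0, counter_bits);	/* count in host */

	if (attr->exclude_guest)
		kvm_clr_set_guest_pmu_events(counter_bits, 0);	/* never count in guest */
	else
		kvm_clr_set_guest_pmu_events(0, counter_bits);	/* count in guest */

	/*
	 * The disable path, as Andrew notes above, can simply clear
	 * counter_bits from both masks unconditionally.
	 */
}
```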
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index de564ae..e0619ba 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -26,6 +26,7 @@

 #include <linux/acpi.h>
 #include <linux/clocksource.h>
+#include <linux/kvm_host.h>
 #include <linux/of.h>
 #include <linux/perf/arm_pmu.h>
 #include <linux/platform_device.h>
@@ -647,11 +648,23 @@ static inline int armv8pmu_enable_counter(int idx)

 static inline void armv8pmu_enable_event_counter(struct perf_event *event)
 {
+        struct perf_event_attr *attr = &event->attr;
         int idx = event->hw.idx;
+        u32 counter_bits = BIT(ARMV8_IDX_TO_COUNTER(idx));

-        armv8pmu_enable_counter(idx);
         if (armv8pmu_event_is_chained(event))
-                armv8pmu_enable_counter(idx - 1);
+                counter_bits |= BIT(ARMV8_IDX_TO_COUNTER(idx - 1));
+
+        if (attr->exclude_host)
+                kvm_clr_set_guest_pmu_events(0, counter_bits);
+        if (attr->exclude_guest)
+                kvm_clr_set_host_pmu_events(0, counter_bits);
+
+        if (!attr->exclude_host) {
+                armv8pmu_enable_counter(idx);
+                if (armv8pmu_event_is_chained(event))
+                        armv8pmu_enable_counter(idx - 1);
+        }
 }

 static inline int armv8pmu_disable_counter(int idx)
@@ -664,11 +677,23 @@ static inline int armv8pmu_disable_counter(int idx)
 static inline void armv8pmu_disable_event_counter(struct perf_event *event)
 {
         struct hw_perf_event *hwc = &event->hw;
+        struct perf_event_attr *attr = &event->attr;
         int idx = hwc->idx;
+        u32 counter_bits = BIT(ARMV8_IDX_TO_COUNTER(idx));

         if (armv8pmu_event_is_chained(event))
-                armv8pmu_disable_counter(idx - 1);
-        armv8pmu_disable_counter(idx);
+                counter_bits |= BIT(ARMV8_IDX_TO_COUNTER(idx - 1));
+
+        if (attr->exclude_host)
+                kvm_clr_set_guest_pmu_events(counter_bits, 0);
+        if (attr->exclude_guest)
+                kvm_clr_set_host_pmu_events(counter_bits, 0);
+
+        if (!attr->exclude_host) {
+                if (armv8pmu_event_is_chained(event))
+                        armv8pmu_disable_counter(idx - 1);
+                armv8pmu_disable_counter(idx);
+        }
 }

 static inline int armv8pmu_enable_intens(int idx)
@@ -943,16 +968,25 @@ static int armv8pmu_set_event_filter(struct hw_perf_event *event,
          * Therefore we ignore exclude_hv in this configuration, since
          * there's no hypervisor to sample anyway. This is consistent
          * with other architectures (x86 and Power).
+         *
+         * To eliminate counting host events on the boundaries of
+         * guest entry/exit we ensure EL2 is not included in hyp mode
+         * with !exclude_host.
          */
         if (is_kernel_in_hyp_mode()) {
-                if (!attr->exclude_kernel)
+                if (!attr->exclude_kernel && !attr->exclude_host)
                         config_base |= ARMV8_PMU_INCLUDE_EL2;
         } else {
-                if (attr->exclude_kernel)
-                        config_base |= ARMV8_PMU_EXCLUDE_EL1;
                 if (!attr->exclude_hv)
                         config_base |= ARMV8_PMU_INCLUDE_EL2;
         }
+
+        /*
+         * Filter out !VHE kernels and guest kernels
+         */
+        if (attr->exclude_kernel)
+                config_base |= ARMV8_PMU_EXCLUDE_EL1;
+
         if (attr->exclude_user)
                 config_base |= ARMV8_PMU_EXCLUDE_EL0;

@@ -976,6 +1010,10 @@ static void armv8pmu_reset(void *info)
                 armv8pmu_disable_intens(idx);
         }

+        /* Clear the counters we flip at guest entry/exit */
+        kvm_clr_set_host_pmu_events(U32_MAX, 0);
+        kvm_clr_set_guest_pmu_events(U32_MAX, 0);
+
         /*
          * Initialize & Reset PMNC. Request overflow interrupt for
          * 64 bit cycle counter but cheat in armv8pmu_write_counter().
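The KVM side that consumes these masks at guest entry/exit is only described in this thread, not shown. Purely as a sketch, assuming positive-logic masks with the hypothetical names events_host/events_guest holding PMCNTEN bits, switching the counters on guest entry might look roughly like the following; the function name is invented, and guest exit would apply the complementary clear/set pair.

```c
/*
 * Sketch only (the real KVM world-switch code is not part of this patch):
 * stop host-only counters and start guest-only counters on guest entry.
 */
static void pmu_switch_to_guest_sketch(u32 events_host, u32 events_guest)
{
	u32 clr = events_host & ~events_guest;	/* counters that must stop */
	u32 set = events_guest & ~events_host;	/* counters that must start */

	if (clr)
		write_sysreg(clr, pmcntenclr_el0);
	if (set)
		write_sysreg(set, pmcntenset_el0);
}
```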
Add support for the :G and :H attributes in perf by handling the
exclude_host/exclude_guest event attributes.

We notify KVM of counters that we wish to be enabled or disabled on
guest entry/exit and thus defer from starting or stopping :G events
as per the events exclude_host attribute.

With both VHE and non-VHE we switch the counters between host/guest
at EL2. We are able to eliminate counters counting host events on
the boundaries of guest entry/exit when using :G by filtering out
EL2 for exclude_host. However when using :H unless exclude_hv is set
on non-VHE then there is a small blackout window at the guest
entry/exit where host events are not captured.

Signed-off-by: Andrew Murray <andrew.murray@arm.com>
---
 arch/arm64/kernel/perf_event.c | 52 ++++++++++++++++++++++++++++++++++++------
 1 file changed, 45 insertions(+), 7 deletions(-)
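For reference on what the :G and :H modifiers amount to at the syscall level, below is a small, self-contained userspace example that is not part of the patch: it opens a CPU-cycles counter with exclude_guest set, which is what :H requests, while :G would set exclude_host instead.

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
	struct perf_event_attr attr;
	uint64_t count;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.disabled = 1;
	attr.exclude_guest = 1;	/* host-only counting, i.e. cycles:H */
	/* attr.exclude_host = 1; would request guest-only counting (:G) */

	/* Measure the calling process on any CPU. */
	fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
	if (fd < 0) {
		perror("perf_event_open");
		return 1;
	}

	ioctl(fd, PERF_EVENT_IOC_RESET, 0);
	ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
	for (volatile int i = 0; i < 1000000; i++)
		;	/* something to count */
	ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

	if (read(fd, &count, sizeof(count)) == sizeof(count))
		printf("cycles (host only): %llu\n", (unsigned long long)count);

	close(fd);
	return 0;
}
```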