[RFC,V2,1/3] perf, tools: Support wildcards on pmu name in dynamic pmu events

Message ID 1520034092-35275-2-git-send-email-agustinv@codeaurora.org (mailing list archive)
State New, archived

Commit Message

Agustin Vega-Frias March 2, 2018, 11:41 p.m. UTC
Starting with v4.12, the event parsing code for dynamic pmu events
supports prefix-based matching of multiple pmus when creating dynamic
events. E.g., in a system with the following dynamic pmus:

    mypmu_0
    mypmu_1
    mypmu_2
    mypmu_4

passing mypmu/<config>/ as an event spec will result in the creation
of the event in all of the pmus. This change expands this matching
through the use of fnmatch so glob-like expressions can be used to
create events in multiple pmus. E.g., in the system described above
if a user only wants to create the event in mypmu_0 and mypmu_1,
mypmu_[01]/<config>/ can be passed.

Signed-off-by: Agustin Vega-Frias <agustinv@codeaurora.org>
---
 tools/perf/Documentation/perf-list.txt |  8 +++++++-
 tools/perf/Documentation/perf-stat.txt | 12 ++++++++++++
 tools/perf/util/parse-events.l         |  2 +-
 tools/perf/util/parse-events.y         |  3 ++-
 4 files changed, 22 insertions(+), 3 deletions(-)

--
Qualcomm Datacenter Technologies, Inc. on behalf of the Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

Comments

Jiri Olsa March 3, 2018, 2:34 p.m. UTC | #1
On Fri, Mar 02, 2018 at 06:41:30PM -0500, Agustin Vega-Frias wrote:

SNIP

> 
> diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
> index 655ecff..a1a01b1 100644
> --- a/tools/perf/util/parse-events.l
> +++ b/tools/perf/util/parse-events.l
> @@ -175,7 +175,7 @@ bpf_source	[^,{}]+\.c[a-zA-Z0-9._]*
>  num_dec		[0-9]+
>  num_hex		0x[a-fA-F0-9]+
>  num_raw_hex	[a-fA-F0-9]+
> -name		[a-zA-Z_*?][a-zA-Z0-9_*?.]*
> +name		[a-zA-Z_*?\[\]][a-zA-Z0-9_*?.\[\]]*
>  name_minus	[a-zA-Z_*?][a-zA-Z0-9\-_*?.:]*
>  drv_cfg_term	[a-zA-Z0-9_\.]+(=[a-zA-Z0-9_*?\.:]+)?
>  /* If you add a modifier you need to update check_modifier() */
> diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
> index e81a20e..c528469 100644
> --- a/tools/perf/util/parse-events.y
> +++ b/tools/perf/util/parse-events.y
> @@ -8,6 +8,7 @@
> 
>  #define YYDEBUG 1
> 
> +#include <fnmatch.h>
>  #include <linux/compiler.h>
>  #include <linux/list.h>
>  #include <linux/types.h>
> @@ -241,7 +242,7 @@ PE_NAME opt_event_config
>  			if (!strncmp(name, "uncore_", 7) &&
>  			    strncmp($1, "uncore_", 7))
>  				name += 7;
> -			if (!strncmp($1, name, strlen($1))) {
> +			if (!strncmp($1, name, strlen($1)) || !fnmatch($1, name, 0)) {

could we now get rid of the strncmp in here and keep the
glob matching only? I find it confusing now that following
commands give me same results:

	- [root@krava perf]# ./perf stat -e 'cbox/clockticks/' --no-merge -a sleep 1

	 Performance counter stats for 'system wide':

	   <not supported>      uncore_cbox_1/clockticks/                                   
	281,474,957,674,239      uncore_cbox_0/clockticks/                                   

	       1.000958335 seconds time elapsed

	- [root@krava perf]# ./perf stat -e '*cbox*/clockticks/' --no-merge -a sleep 1

	 Performance counter stats for 'system wide':

	   <not supported>      uncore_cbox_1/clockticks/                                   
		 5,427,337      uncore_cbox_0/clockticks/                                   

	       1.000962724 seconds time elapsed

	- [root@krava perf]# ./perf stat -e 'cbox*/clockticks/' --no-merge -a sleep 1

	 Performance counter stats for 'system wide':

	   <not supported>      uncore_cbox_1/clockticks/                                   
	281,474,969,621,374      uncore_cbox_0/clockticks/                                   

	       1.001026179 seconds time elapsed

and this one fails:

	- [root@krava perf]# ./perf stat -e '*cbox/clockticks/' --no-merge -a sleep 1
	event syntax error: '*cbox/clockticks/'
			     \___ Cannot find PMU `*cbox'. Missing kernel support?
	Run 'perf list' for a list of valid events

	 Usage: perf stat [<options>] [<command>]

	    -e, --event <event>   event selector. use 'perf list' to list available events


despite the fact that it makes as much sense as the previous one: perf stat -e 'cbox*/clockticks/'


I'd think let's keep just the glob matching, so it's clear
what you use wildcards for.. thoughts?

thanks,
jirka

Andi Kleen March 4, 2018, 5:12 p.m. UTC | #2
> > +#include <fnmatch.h>
> >  #include <linux/compiler.h>
> >  #include <linux/list.h>
> >  #include <linux/types.h>
> > @@ -241,7 +242,7 @@ PE_NAME opt_event_config
> >  			if (!strncmp(name, "uncore_", 7) &&
> >  			    strncmp($1, "uncore_", 7))
> >  				name += 7;
> > -			if (!strncmp($1, name, strlen($1))) {
> > +			if (!strncmp($1, name, strlen($1)) || !fnmatch($1, name, 0)) {
> 
> could we now get rid of the strncmp in here and keep the
> glob matching only? 

That would break existing command lines. Not a good idea.

-Andi

Jiri Olsa March 4, 2018, 6:10 p.m. UTC | #3
On Sun, Mar 04, 2018 at 09:12:45AM -0800, Andi Kleen wrote:
> > > +#include <fnmatch.h>
> > >  #include <linux/compiler.h>
> > >  #include <linux/list.h>
> > >  #include <linux/types.h>
> > > @@ -241,7 +242,7 @@ PE_NAME opt_event_config
> > >  			if (!strncmp(name, "uncore_", 7) &&
> > >  			    strncmp($1, "uncore_", 7))
> > >  				name += 7;
> > > -			if (!strncmp($1, name, strlen($1))) {
> > > +			if (!strncmp($1, name, strlen($1)) || !fnmatch($1, name, 0)) {
> > 
> > could we now get rid of the strncmp in here and keep the
> > glob matching only? 
> 
> That would break existing command lines. Not a good idea.

I hoped that only you guys are using this and would rewrite your scripts ;-)

I had no idea there's fnmatch func before.. too bad, ok

jirka

Agustin Vega-Frias March 5, 2018, 3:08 p.m. UTC | #4
On 2018-03-04 13:10, Jiri Olsa wrote:
> On Sun, Mar 04, 2018 at 09:12:45AM -0800, Andi Kleen wrote:
>> > > +#include <fnmatch.h>
>> > >  #include <linux/compiler.h>
>> > >  #include <linux/list.h>
>> > >  #include <linux/types.h>
>> > > @@ -241,7 +242,7 @@ PE_NAME opt_event_config
>> > >  			if (!strncmp(name, "uncore_", 7) &&
>> > >  			    strncmp($1, "uncore_", 7))
>> > >  				name += 7;
>> > > -			if (!strncmp($1, name, strlen($1))) {
>> > > +			if (!strncmp($1, name, strlen($1)) || !fnmatch($1, name, 0)) {
>> >
>> > could we now get rid of the strncmp in here and keep the
>> > glob matching only?
>> 
>> That would break existing command lines. Not a good idea.
> 
> I hoped that only you guys are using this and would rewrite your 
> scripts ;-)
> 
> I had no idea there's fnmatch func before.. too bad, ok
> 
> jirka

An option to keep backward compatibility and consistency would be
to wrap the pattern/string passed in *'s, that way we can just use
fnmatch and have all the examples Jiri brought up work the same.
With that in place we can actually also drop the explicit ignoring
of the uncore_ prefix since the globbing would take care of that.

Thoughts?

Agustín

Andi Kleen March 5, 2018, 5:55 p.m. UTC | #5
Agustin Vega-Frias <agustinv@codeaurora.org> writes:
>
> An option to keep backward compatibility and consistency would be
> to wrap the pattern/string passed in *'s, that way we can just use
> fnmatch and have all the examples Jiri brought up work the same.
> With that in place we can actually also drop the explicit ignoring
> of the uncore_ prefix since the globbing would take care of that.

Prepending with * would seem dangerous, could result in false
matches. But adding it at the end should be ok.

-Andi

Jiri Olsa March 5, 2018, 7:09 p.m. UTC | #6
On Mon, Mar 05, 2018 at 10:08:18AM -0500, Agustin Vega-Frias wrote:
> On 2018-03-04 13:10, Jiri Olsa wrote:
> > On Sun, Mar 04, 2018 at 09:12:45AM -0800, Andi Kleen wrote:
> > > > > +#include <fnmatch.h>
> > > > >  #include <linux/compiler.h>
> > > > >  #include <linux/list.h>
> > > > >  #include <linux/types.h>
> > > > > @@ -241,7 +242,7 @@ PE_NAME opt_event_config
> > > > >  			if (!strncmp(name, "uncore_", 7) &&
> > > > >  			    strncmp($1, "uncore_", 7))
> > > > >  				name += 7;
> > > > > -			if (!strncmp($1, name, strlen($1))) {
> > > > > +			if (!strncmp($1, name, strlen($1)) || !fnmatch($1, name, 0)) {
> > > >
> > > > could we now get rid of the strncmp in here and keep the
> > > > glob matching only?
> > > 
> > > That would break existing command lines. Not a good idea.
> > 
> > I hoped that only you guys are using this and would rewrite your scripts
> > ;-)
> > 
> > I had no idea there's fnmatch func before.. too bad, ok
> > 
> > jirka
> 
> An option to keep backward compatibility and consistency would be
> to wrap the pattern/string passed in *'s, that way we can just use
> fnmatch and have all the examples Jiri brought up work the same.
> With that in place we can actually also drop the explicit ignoring
> of the uncore_ prefix since the globbing would take care of that.

I don't mind the strcmp as such, I wanted to get rid of the wildcard
matching without using '*' ... but as Andi said it's been out
there and it's been a while, so let's keep it

but if there's a way to make it simpler, let's go for it

thanks,
jirka

Agustin Vega-Frias March 5, 2018, 8:10 p.m. UTC | #7
On 2018-03-05 14:09, Jiri Olsa wrote:
> On Mon, Mar 05, 2018 at 10:08:18AM -0500, Agustin Vega-Frias wrote:
>> On 2018-03-04 13:10, Jiri Olsa wrote:
>> > On Sun, Mar 04, 2018 at 09:12:45AM -0800, Andi Kleen wrote:
>> > > > > +#include <fnmatch.h>
>> > > > >  #include <linux/compiler.h>
>> > > > >  #include <linux/list.h>
>> > > > >  #include <linux/types.h>
>> > > > > @@ -241,7 +242,7 @@ PE_NAME opt_event_config
>> > > > >  			if (!strncmp(name, "uncore_", 7) &&
>> > > > >  			    strncmp($1, "uncore_", 7))
>> > > > >  				name += 7;
>> > > > > -			if (!strncmp($1, name, strlen($1))) {
>> > > > > +			if (!strncmp($1, name, strlen($1)) || !fnmatch($1, name, 0)) {
>> > > >
>> > > > could we now get rid of the strncmp in here and keep the
>> > > > glob matching only?
>> > >
>> > > That would break existing command lines. Not a good idea.
>> >
>> > I hoped that only you guys are using this and would rewrite your scripts
>> > ;-)
>> >
>> > I had no idea there's fnmatch func before.. too bad, ok
>> >
>> > jirka
>> 
>> An option to keep backward compatibility and consistency would be
>> to wrap the pattern/string passed in *'s, that way we can just use
>> fnmatch and have all the examples Jiri brought up work the same.
>> With that in place we can actually also drop the explicit ignoring
>> of the uncore_ prefix since the globbing would take care of that.
> 
> I don't mind the strcmp as such, I wanted to get rid of the wildcard
> matching without using '*' ... but as Andi said it's been out
> there and it's been a while, so let's keep it
> 
> but if there's a way to make it simpler, let's go for it
> 
> thanks,
> jirka

Sounds good. I have a new version ready (see sample output below).
But I wanted to ping about the other two patches before submitting.
Any feedback on those?

Thanks,
Agustín

PS:
Sample output:

$ ./perf stat -a -e imc/umask=0x3,event=0x4/ --no-merge ls -l > /dev/null

  Performance counter stats for 'system wide':

              2,613      uncore_imc_0/umask=0x3,event=0x4/
              2,736      uncore_imc_1/umask=0x3,event=0x4/
              2,671      uncore_imc_2/umask=0x3,event=0x4/
              2,508      uncore_imc_3/umask=0x3,event=0x4/
              2,439      uncore_imc_4/umask=0x3,event=0x4/
              2,465      uncore_imc_5/umask=0x3,event=0x4/

        0.004159243 seconds time elapsed

$ ./perf stat -a -e *imc/umask=0x3,event=0x4/ --no-merge ls -l > /dev/null

  Performance counter stats for 'system wide':

              2,704      uncore_imc_0/umask=0x3,event=0x4/
              2,601      uncore_imc_1/umask=0x3,event=0x4/
              2,625      uncore_imc_2/umask=0x3,event=0x4/
              2,370      uncore_imc_3/umask=0x3,event=0x4/
              2,485      uncore_imc_4/umask=0x3,event=0x4/
              2,431      uncore_imc_5/umask=0x3,event=0x4/

        0.002716763 seconds time elapsed

$ ./perf stat -a -e imc*/umask=0x3,event=0x4/ --no-merge ls -l > /dev/null

  Performance counter stats for 'system wide':

              1,294      uncore_imc_0/umask=0x3,event=0x4/
              1,303      uncore_imc_1/umask=0x3,event=0x4/
              1,242      uncore_imc_2/umask=0x3,event=0x4/
              1,125      uncore_imc_3/umask=0x3,event=0x4/
              1,137      uncore_imc_4/umask=0x3,event=0x4/
              1,159      uncore_imc_5/umask=0x3,event=0x4/

        0.002790441 seconds time elapsed

$ ./perf stat -a -e *imc*/umask=0x3,event=0x4/ --no-merge ls -l > /dev/null

  Performance counter stats for 'system wide':

              1,524      uncore_imc_0/umask=0x3,event=0x4/
              1,508      uncore_imc_1/umask=0x3,event=0x4/
              1,501      uncore_imc_2/umask=0x3,event=0x4/
              1,405      uncore_imc_3/umask=0x3,event=0x4/
              1,427      uncore_imc_4/umask=0x3,event=0x4/
              1,450      uncore_imc_5/umask=0x3,event=0x4/

        0.002720907 seconds time elapsed

Jiri Olsa March 5, 2018, 9:51 p.m. UTC | #8
On Mon, Mar 05, 2018 at 03:10:43PM -0500, Agustin Vega-Frias wrote:
> On 2018-03-05 14:09, Jiri Olsa wrote:
> > On Mon, Mar 05, 2018 at 10:08:18AM -0500, Agustin Vega-Frias wrote:
> > > On 2018-03-04 13:10, Jiri Olsa wrote:
> > > > On Sun, Mar 04, 2018 at 09:12:45AM -0800, Andi Kleen wrote:
> > > > > > > +#include <fnmatch.h>
> > > > > > >  #include <linux/compiler.h>
> > > > > > >  #include <linux/list.h>
> > > > > > >  #include <linux/types.h>
> > > > > > > @@ -241,7 +242,7 @@ PE_NAME opt_event_config
> > > > > > >  			if (!strncmp(name, "uncore_", 7) &&
> > > > > > >  			    strncmp($1, "uncore_", 7))
> > > > > > >  				name += 7;
> > > > > > > -			if (!strncmp($1, name, strlen($1))) {
> > > > > > > +			if (!strncmp($1, name, strlen($1)) || !fnmatch($1, name, 0)) {
> > > > > >
> > > > > > could we now get rid of the strncmp in here and keep the
> > > > > > glob matching only?
> > > > >
> > > > > That would break existing command lines. Not a good idea.
> > > >
> > > > I hoped that only you guys are using this and would rewrite your scripts
> > > > ;-)
> > > >
> > > > I had no idea there's fnmatch func before.. too bad, ok
> > > >
> > > > jirka
> > > 
> > > An option to keep backward compatibility and consistency would be
> > > to wrap the pattern/string passed in *'s, that way we can just use
> > > fnmatch and have all the examples Jiri brought up work the same.
> > > With that in place we can actually also drop the explicit ignoring
> > > of the uncore_ prefix since the globbing would take care of that.
> > 
> > I don't mind the strcmp as such, I wanted to get rid of the wildcard
> > matching without using '*' ... but as Andi said it's been out
> > there and it's been a while, so let's keep it
> > 
> > but if there's a way to make it simpler, let's go for it
> > 
> > thanks,
> > jirka
> 
> Sounds good. I have a new version ready (see sample output below).
> But I wanted to ping about the other two patches before submitting.
> Any feedback on those?

the rest looks ok to me, so does the output below

thanks,
jirka

> 
> Thanks,
> Agustín
> 
> PS:
> Sample output:
> 
> $ ./perf stat -a -e imc/umask=0x3,event=0x4/ --no-merge ls -l > /dev/null
> 
>  Performance counter stats for 'system wide':
> 
>              2,613      uncore_imc_0/umask=0x3,event=0x4/
>              2,736      uncore_imc_1/umask=0x3,event=0x4/
>              2,671      uncore_imc_2/umask=0x3,event=0x4/
>              2,508      uncore_imc_3/umask=0x3,event=0x4/
>              2,439      uncore_imc_4/umask=0x3,event=0x4/
>              2,465      uncore_imc_5/umask=0x3,event=0x4/
> 
>        0.004159243 seconds time elapsed
> 
> $ ./perf stat -a -e *imc/umask=0x3,event=0x4/ --no-merge ls -l > /dev/null
> 
>  Performance counter stats for 'system wide':
> 
>              2,704      uncore_imc_0/umask=0x3,event=0x4/
>              2,601      uncore_imc_1/umask=0x3,event=0x4/
>              2,625      uncore_imc_2/umask=0x3,event=0x4/
>              2,370      uncore_imc_3/umask=0x3,event=0x4/
>              2,485      uncore_imc_4/umask=0x3,event=0x4/
>              2,431      uncore_imc_5/umask=0x3,event=0x4/
> 
>        0.002716763 seconds time elapsed
> 
> $ ./perf stat -a -e imc*/umask=0x3,event=0x4/ --no-merge ls -l > /dev/null
> 
>  Performance counter stats for 'system wide':
> 
>              1,294      uncore_imc_0/umask=0x3,event=0x4/
>              1,303      uncore_imc_1/umask=0x3,event=0x4/
>              1,242      uncore_imc_2/umask=0x3,event=0x4/
>              1,125      uncore_imc_3/umask=0x3,event=0x4/
>              1,137      uncore_imc_4/umask=0x3,event=0x4/
>              1,159      uncore_imc_5/umask=0x3,event=0x4/
> 
>        0.002790441 seconds time elapsed
> 
> $ ./perf stat -a -e *imc*/umask=0x3,event=0x4/ --no-merge ls -l > /dev/null
> 
>  Performance counter stats for 'system wide':
> 
>              1,524      uncore_imc_0/umask=0x3,event=0x4/
>              1,508      uncore_imc_1/umask=0x3,event=0x4/
>              1,501      uncore_imc_2/umask=0x3,event=0x4/
>              1,405      uncore_imc_3/umask=0x3,event=0x4/
>              1,427      uncore_imc_4/umask=0x3,event=0x4/
>              1,450      uncore_imc_5/umask=0x3,event=0x4/
> 
>        0.002720907 seconds time elapsed
> 
> -- 
> Qualcomm Datacenter Technologies, Inc. on behalf of the Qualcomm
> Technologies, Inc.
> Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux
> Foundation Collaborative Project.

Patch

diff --git a/tools/perf/Documentation/perf-list.txt b/tools/perf/Documentation/perf-list.txt
index e2a897a..2549c34 100644
--- a/tools/perf/Documentation/perf-list.txt
+++ b/tools/perf/Documentation/perf-list.txt
@@ -141,7 +141,13 @@  on the first memory controller on socket 0 of a Intel Xeon system

 Each memory controller has its own PMU.  Measuring the complete system
 bandwidth would require specifying all imc PMUs (see perf list output),
-and adding the values together.
+and adding the values together. To simplify creation of multiple events,
+prefix and glob matching is supported in the PMU name, and the prefix
+'uncore_' is also ignored when performing the match. So the command above
+can be expanded to all memory controllers by using the syntaxes:
+
+  perf stat -C 0 -a imc/cas_count_read/,imc/cas_count_write/ -I 1000 ...
+  perf stat -C 0 -a *imc*/cas_count_read/,*imc*/cas_count_write/ -I 1000 ...

 This example measures the combined core power every second

diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 823fce7..49983a7 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -49,6 +49,12 @@  report::
 	  parameters are defined by corresponding entries in
 	  /sys/bus/event_source/devices/<pmu>/format/*

+	Note that the last two syntaxes support prefix and glob matching in
+	the PMU name to simplify creation of events across multiple instances
+	of the same type of PMU (e.g. memory controller PMU) in large systems.
+	Multiple PMU instances are typical for uncore PMUs, so the prefix
+	'uncore_' is also ignored when performing this match.
+
 -i::
 --no-inherit::
         child tasks do not inherit counters
@@ -246,6 +252,12 @@  taskset.
 --no-merge::
 Do not merge results from same PMUs.

+When multiple events are created from a single event alias, stat will,
+by default, aggregate the event counts and show the result in a single
+row. This option disables that behavior and shows the individual events
+and counts. Aliases are listed immediately after the Kernel PMU events
+by perf list.
+
 --smi-cost::
 Measure SMI cost if msr/aperf/ and msr/smi/ events are supported.

diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 655ecff..a1a01b1 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -175,7 +175,7 @@  bpf_source	[^,{}]+\.c[a-zA-Z0-9._]*
 num_dec		[0-9]+
 num_hex		0x[a-fA-F0-9]+
 num_raw_hex	[a-fA-F0-9]+
-name		[a-zA-Z_*?][a-zA-Z0-9_*?.]*
+name		[a-zA-Z_*?\[\]][a-zA-Z0-9_*?.\[\]]*
 name_minus	[a-zA-Z_*?][a-zA-Z0-9\-_*?.:]*
 drv_cfg_term	[a-zA-Z0-9_\.]+(=[a-zA-Z0-9_*?\.:]+)?
 /* If you add a modifier you need to update check_modifier() */
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index e81a20e..c528469 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -8,6 +8,7 @@ 

 #define YYDEBUG 1

+#include <fnmatch.h>
 #include <linux/compiler.h>
 #include <linux/list.h>
 #include <linux/types.h>
@@ -241,7 +242,7 @@  PE_NAME opt_event_config
 			if (!strncmp(name, "uncore_", 7) &&
 			    strncmp($1, "uncore_", 7))
 				name += 7;
-			if (!strncmp($1, name, strlen($1))) {
+			if (!strncmp($1, name, strlen($1)) || !fnmatch($1, name, 0)) {
 				if (parse_events_copy_term_list(orig_terms, &terms))
 					YYABORT;
 				if (!parse_events_add_pmu(_parse_state, list, pmu->name, terms))