Message ID | 1592384514-119954-3-git-send-email-john.garry@huawei.com (mailing list archive) |
---|---|
State | Mainlined |
Commit | ce0dc7d22271fa7eac3875fc9b57772742b8245e |
Series | perf: Improve list for arm64 |
Hello,

On Wed, Jun 17, 2020 at 6:06 PM John Garry <john.garry@huawei.com> wrote:
>
> For perf list, the CPU core PMU HW event ordering is such that not all
> events may be listed adjacent - consider this example:
>
> $ tools/perf/perf list
>
> List of pre-defined events (to be used in -e):
>
>   duration_time                                      [Tool event]
>
>   branch-instructions OR cpu/branch-instructions/    [Kernel PMU event]
>   branch-misses OR cpu/branch-misses/                [Kernel PMU event]
>   bus-cycles OR cpu/bus-cycles/                      [Kernel PMU event]
>   cache-misses OR cpu/cache-misses/                  [Kernel PMU event]
>   cache-references OR cpu/cache-references/          [Kernel PMU event]
>   cpu-cycles OR cpu/cpu-cycles/                      [Kernel PMU event]
>   cstate_core/c3-residency/                          [Kernel PMU event]
>   cstate_core/c6-residency/                          [Kernel PMU event]
>   cstate_core/c7-residency/                          [Kernel PMU event]
>   cstate_pkg/c2-residency/                           [Kernel PMU event]
>   cstate_pkg/c3-residency/                           [Kernel PMU event]
>   cstate_pkg/c6-residency/                           [Kernel PMU event]
>   cstate_pkg/c7-residency/                           [Kernel PMU event]
>   cycles-ct OR cpu/cycles-ct/                        [Kernel PMU event]
>   cycles-t OR cpu/cycles-t/                          [Kernel PMU event]
>   el-abort OR cpu/el-abort/                          [Kernel PMU event]
>   el-capacity OR cpu/el-capacity/                    [Kernel PMU event]
>
> Notice in the above example how the cstate_core PMU events are mixed in
> the middle of the CPU core events.
>
> For my arm64 platform, all the uncore events get mixed in, making the list
> very disorganised:
>   page-faults OR faults                              [Software event]
>   task-clock                                         [Software event]
>   duration_time                                      [Tool event]
>   L1-dcache-load-misses                              [Hardware cache event]
>   L1-dcache-loads                                    [Hardware cache event]
>   L1-icache-load-misses                              [Hardware cache event]
>   L1-icache-loads                                    [Hardware cache event]
>   branch-load-misses                                 [Hardware cache event]
>   branch-loads                                       [Hardware cache event]
>   dTLB-load-misses                                   [Hardware cache event]
>   dTLB-loads                                         [Hardware cache event]
>   iTLB-load-misses                                   [Hardware cache event]
>   iTLB-loads                                         [Hardware cache event]
>   br_mis_pred OR armv8_pmuv3_0/br_mis_pred/          [Kernel PMU event]
>   br_mis_pred_retired OR armv8_pmuv3_0/br_mis_pred_retired/ [Kernel PMU event]
>   br_pred OR armv8_pmuv3_0/br_pred/                  [Kernel PMU event]
>   br_retired OR armv8_pmuv3_0/br_retired/            [Kernel PMU event]
>   br_return_retired OR armv8_pmuv3_0/br_return_retired/ [Kernel PMU event]
>   bus_access OR armv8_pmuv3_0/bus_access/            [Kernel PMU event]
>   bus_cycles OR armv8_pmuv3_0/bus_cycles/            [Kernel PMU event]
>   cid_write_retired OR armv8_pmuv3_0/cid_write_retired/ [Kernel PMU event]
>   cpu_cycles OR armv8_pmuv3_0/cpu_cycles/            [Kernel PMU event]
>   dtlb_walk OR armv8_pmuv3_0/dtlb_walk/              [Kernel PMU event]
>   exc_return OR armv8_pmuv3_0/exc_return/            [Kernel PMU event]
>   exc_taken OR armv8_pmuv3_0/exc_taken/              [Kernel PMU event]
>   hisi_sccl1_ddrc0/act_cmd/                          [Kernel PMU event]
>   hisi_sccl1_ddrc0/flux_rcmd/                        [Kernel PMU event]
>   hisi_sccl1_ddrc0/flux_rd/                          [Kernel PMU event]
>   hisi_sccl1_ddrc0/flux_wcmd/                        [Kernel PMU event]
>   hisi_sccl1_ddrc0/flux_wr/                          [Kernel PMU event]
>   hisi_sccl1_ddrc0/pre_cmd/                          [Kernel PMU event]
>   hisi_sccl1_ddrc0/rnk_chg/                          [Kernel PMU event]
>
> ...
>
>   hisi_sccl7_l3c21/wr_hit_cpipe/                     [Kernel PMU event]
>   hisi_sccl7_l3c21/wr_hit_spipe/                     [Kernel PMU event]
>   hisi_sccl7_l3c21/wr_spipe/                         [Kernel PMU event]
>   inst_retired OR armv8_pmuv3_0/inst_retired/        [Kernel PMU event]
>   inst_spec OR armv8_pmuv3_0/inst_spec/              [Kernel PMU event]
>   itlb_walk OR armv8_pmuv3_0/itlb_walk/              [Kernel PMU event]
>   l1d_cache OR armv8_pmuv3_0/l1d_cache/              [Kernel PMU event]
>   l1d_cache_refill OR armv8_pmuv3_0/l1d_cache_refill/ [Kernel PMU event]
>   l1d_cache_wb OR armv8_pmuv3_0/l1d_cache_wb/        [Kernel PMU event]
>   l1d_tlb OR armv8_pmuv3_0/l1d_tlb/                  [Kernel PMU event]
>   l1d_tlb_refill OR armv8_pmuv3_0/l1d_tlb_refill/    [Kernel PMU event]
>
> So the events are listed alphabetically. However, CPU core event listing is
> special from commit dc098b35b56f ("perf list: List kernel supplied event
> aliases"), in that the alias and full event is shown (in that order).
> As such, the core events may become sparse.
>
> Improve this by grouping the CPU core events and ensure that they are
> listed first for kernel PMU events. For the first example, above, this
> now looks like:
>
>   duration_time                                      [Tool event]
>   branch-instructions OR cpu/branch-instructions/    [Kernel PMU event]
>   branch-misses OR cpu/branch-misses/                [Kernel PMU event]
>   bus-cycles OR cpu/bus-cycles/                      [Kernel PMU event]
>   cache-misses OR cpu/cache-misses/                  [Kernel PMU event]
>   cache-references OR cpu/cache-references/          [Kernel PMU event]
>   cpu-cycles OR cpu/cpu-cycles/                      [Kernel PMU event]
>   cycles-ct OR cpu/cycles-ct/                        [Kernel PMU event]
>   cycles-t OR cpu/cycles-t/                          [Kernel PMU event]
>   el-abort OR cpu/el-abort/                          [Kernel PMU event]
>   el-capacity OR cpu/el-capacity/                    [Kernel PMU event]
>   el-commit OR cpu/el-commit/                        [Kernel PMU event]
>   el-conflict OR cpu/el-conflict/                    [Kernel PMU event]
>   el-start OR cpu/el-start/                          [Kernel PMU event]
>   instructions OR cpu/instructions/                  [Kernel PMU event]
>   mem-loads OR cpu/mem-loads/                        [Kernel PMU event]
>   mem-stores OR cpu/mem-stores/                      [Kernel PMU event]
>   ref-cycles OR cpu/ref-cycles/                      [Kernel PMU event]
>   topdown-fetch-bubbles OR cpu/topdown-fetch-bubbles/ [Kernel PMU event]
>   topdown-recovery-bubbles OR cpu/topdown-recovery-bubbles/ [Kernel PMU event]
>   topdown-slots-issued OR cpu/topdown-slots-issued/  [Kernel PMU event]
>   topdown-slots-retired OR cpu/topdown-slots-retired/ [Kernel PMU event]
>   topdown-total-slots OR cpu/topdown-total-slots/    [Kernel PMU event]
>   tx-abort OR cpu/tx-abort/                          [Kernel PMU event]
>   tx-capacity OR cpu/tx-capacity/                    [Kernel PMU event]
>   tx-commit OR cpu/tx-commit/                        [Kernel PMU event]
>   tx-conflict OR cpu/tx-conflict/                    [Kernel PMU event]
>   tx-start OR cpu/tx-start/                          [Kernel PMU event]
>   cstate_core/c3-residency/                          [Kernel PMU event]
>   cstate_core/c6-residency/                          [Kernel PMU event]
>   cstate_core/c7-residency/                          [Kernel PMU event]
>   cstate_pkg/c2-residency/                           [Kernel PMU event]
>   cstate_pkg/c3-residency/                           [Kernel PMU event]
>   cstate_pkg/c6-residency/                           [Kernel PMU event]
>   cstate_pkg/c7-residency/                           [Kernel PMU event]
>
> Signed-off-by: John Garry <john.garry@huawei.com>

Acked-by: Namhyung Kim <namhyung@kernel.org>

Thanks
Namhyung

> ---
>  tools/perf/util/pmu.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
> index a375364537cd..faa3e0619740 100644
> --- a/tools/perf/util/pmu.c
> +++ b/tools/perf/util/pmu.c
> @@ -1400,6 +1400,7 @@ struct sevent {
>  	char *pmu;
>  	char *metric_expr;
>  	char *metric_name;
> +	int is_cpu;
>  };
>
>  static int cmp_sevent(const void *a, const void *b)
> @@ -1416,6 +1417,11 @@ static int cmp_sevent(const void *a, const void *b)
>  		if (n)
>  			return n;
>  	}
> +
> +	/* Order CPU core events to be first */
> +	if (as->is_cpu != bs->is_cpu)
> +		return bs->is_cpu - as->is_cpu;
> +
>  	return strcmp(as->name, bs->name);
>  }
>
> @@ -1507,6 +1513,7 @@ void print_pmu_events(const char *event_glob, bool name_only, bool quiet_flag,
>  			aliases[j].pmu = pmu->name;
>  			aliases[j].metric_expr = alias->metric_expr;
>  			aliases[j].metric_name = alias->metric_name;
> +			aliases[j].is_cpu = is_cpu;
>  			j++;
>  		}
>  		if (pmu->selectable &&
> --
> 2.26.2
>
On Wed, Jun 17, 2020 at 08:31:02PM +0900, Namhyung Kim wrote:
> On Wed, Jun 17, 2020 at 6:06 PM John Garry <john.garry@huawei.com> wrote:
> >
> > For perf list, the CPU core PMU HW event ordering is such that not all
> > events may be listed adjacent - consider this example:
> >   cstate_pkg/c6-residency/                           [Kernel PMU event]
> >   cstate_pkg/c7-residency/                           [Kernel PMU event]
> >
> > Signed-off-by: John Garry <john.garry@huawei.com>
>
> Acked-by: Namhyung Kim <namhyung@kernel.org>

Thanks a lot, applied.

- Arnaldo
On 17/06/2020 13:15, Arnaldo Carvalho de Melo wrote:
> On Wed, Jun 17, 2020 at 08:31:02PM +0900, Namhyung Kim wrote:
>> On Wed, Jun 17, 2020 at 6:06 PM John Garry <john.garry@huawei.com> wrote:
>>>
>>> For perf list, the CPU core PMU HW event ordering is such that not all
>>> events may be listed adjacent - consider this example:
>>>   cstate_pkg/c6-residency/                           [Kernel PMU event]
>>>   cstate_pkg/c7-residency/                           [Kernel PMU event]
>>>
>>> Signed-off-by: John Garry <john.garry@huawei.com>
>>
>> Acked-by: Namhyung Kim <namhyung@kernel.org>
>
> Thanks a lot, applied.

Hi Arnaldo,

I'm struggling to understand which branch we should base our development on.
I don't see these patches in perf/core or linux-next. I saw someone
mentioned tmp.perf/core as a baseline, but I can't see that branch in
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git

Please let me know - it would be useful for any dev during the merge window.

Thanks,
John
On Mon, Aug 03, 2020 at 09:00:06AM +0100, John Garry wrote:
> On 17/06/2020 13:15, Arnaldo Carvalho de Melo wrote:
> > On Wed, Jun 17, 2020 at 08:31:02PM +0900, Namhyung Kim wrote:
> > > On Wed, Jun 17, 2020 at 6:06 PM John Garry <john.garry@huawei.com> wrote:
> > > > For perf list, the CPU core PMU HW event ordering is such that not all
> > > > events may be listed adjacent - consider this example:
> > > >   cstate_pkg/c6-residency/                           [Kernel PMU event]
> > > >   cstate_pkg/c7-residency/                           [Kernel PMU event]

> > > > Signed-off-by: John Garry <john.garry@huawei.com>

> > > Acked-by: Namhyung Kim <namhyung@kernel.org>

> > Thanks a lot, applied.

> I'm struggling to understand which branch we should base our development on.
> I don't see these patches in perf/core or linux-next. I saw someone
> mentioned tmp.perf/core as a baseline, but I can't see that branch in
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
>
> Please let me know - it would be useful for any dev during the merge window.

So, I'm now pushing things directly to Linus, but just the tooling part,
the branch to do development on is:

git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git perf/core

At some point I think we'll have a git/perf-tools/perf-tools.git, just
like tip, but for now, please use the one above.

My perf/core in the past was rebaseable, I did changes in it after
publishing, trying to have solid bisectability, since I process patch by
patch doing tests on it when we noticed problems, prior to pushing to
Ingo for tip.

Now I am making perf/core non-rebaseable, I push things there
periodically, tagging what is there with the test results, see:

https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tag/?h=perf-tools-tests-2020-07-17
https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tag/?h=perf-tools-tests-2020-07-02

I'll try and tag today's state of that tree, which I did tests already
but since v5.8 was released, I merged it there and will retest and tag
the test results.

The tmp.perf/core one is an experiment in making what I have in my local
tree available for more bleeding edge things that are being done, say in
that metrics effort, etc, but I think I'll stop that, since, as your
message shows, it is causing confusion.

I did this because these tests take quite some time and sometimes I have
to fix things and restart it, rinse, repeat.

So please use:

git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git perf/core

I'll further automate all this so that we have a more regular cadence of
updates to perf/core, say every two days or so.

If you have changes that touch both the kernel and userspace, the kernel
bits need to go via tip, the tooling via the perf tree, that for now
(well, it has been like that for quite a long time) is my tree.

Arch specific kernel bits have been going via the arch trees for quite a
while, I think.

- Arnaldo
On 03/08/2020 13:54, Arnaldo Carvalho de Melo wrote:
>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
>>
>> Please let me know - it would be useful for any dev during the merge window.
> So, I'm now pushing things directly to Linus, but just the tooling part,
> the branch to do development on is:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git perf/core
>
> At some point I think we'll have a git/perf-tools/perf-tools.git, just
> like tip, but for now, please use the one above.
>
> My perf/core in the past was rebaseable, I did changes in it after
> publishing, trying to have solid bisectability, since I process patch by
> patch doing tests on it when we noticed problems, prior to pushing to
> Ingo for tip.
>
> Now I am making perf/core non-rebaseable, I push things there
> periodically, tagging what is there with the test results, see:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tag/?h=perf-tools-tests-2020-07-17
> https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tag/?h=perf-tools-tests-2020-07-02
>
> I'll try and tag today's state of that tree, which I did tests already
> but since v5.8 was released, I merged it there and will retest and tag
> the test results.
>
> The tmp.perf/core one is an experiment in making what I have in my local
> tree available for more bleeding edge things that are being done, say in
> that metrics effort, etc, but I think I'll stop that, since, as your
> message shows, it is causing confusion.
>
> I did this because these tests take quite some time and sometimes I have
> to fix things and restart it, rinse, repeat.
>
> So please use:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git perf/core
>

If it makes sense, it could be useful to have this included in the
MAINTAINERS file. That's for forgetful people like me :)

> I'll further automate all this so that we have a more regular cadence of
> updates to perf/core, say every two days or so.
>
> If you have changes that touch both the kernel and userspace, the kernel
> bits need to go via tip, the tooling via the perf tree, that for now
> (well, it has been like that for quite a long time) is my tree.

ok, thanks for the detailed response.

>
> Arch specific kernel bits have been going via the arch trees for quite a
> while, I think.
>
> - Arnaldo
> .

Cheers,
john
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index a375364537cd..faa3e0619740 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -1400,6 +1400,7 @@ struct sevent {
 	char *pmu;
 	char *metric_expr;
 	char *metric_name;
+	int is_cpu;
 };

 static int cmp_sevent(const void *a, const void *b)
@@ -1416,6 +1417,11 @@ static int cmp_sevent(const void *a, const void *b)
 		if (n)
 			return n;
 	}
+
+	/* Order CPU core events to be first */
+	if (as->is_cpu != bs->is_cpu)
+		return bs->is_cpu - as->is_cpu;
+
 	return strcmp(as->name, bs->name);
 }

@@ -1507,6 +1513,7 @@ void print_pmu_events(const char *event_glob, bool name_only, bool quiet_flag,
 			aliases[j].pmu = pmu->name;
 			aliases[j].metric_expr = alias->metric_expr;
 			aliases[j].metric_name = alias->metric_name;
+			aliases[j].is_cpu = is_cpu;
 			j++;
 		}
 		if (pmu->selectable &&
For perf list, the CPU core PMU HW event ordering is such that not all
events may be listed adjacent - consider this example:

$ tools/perf/perf list

List of pre-defined events (to be used in -e):

  duration_time                                      [Tool event]

  branch-instructions OR cpu/branch-instructions/    [Kernel PMU event]
  branch-misses OR cpu/branch-misses/                [Kernel PMU event]
  bus-cycles OR cpu/bus-cycles/                      [Kernel PMU event]
  cache-misses OR cpu/cache-misses/                  [Kernel PMU event]
  cache-references OR cpu/cache-references/          [Kernel PMU event]
  cpu-cycles OR cpu/cpu-cycles/                      [Kernel PMU event]
  cstate_core/c3-residency/                          [Kernel PMU event]
  cstate_core/c6-residency/                          [Kernel PMU event]
  cstate_core/c7-residency/                          [Kernel PMU event]
  cstate_pkg/c2-residency/                           [Kernel PMU event]
  cstate_pkg/c3-residency/                           [Kernel PMU event]
  cstate_pkg/c6-residency/                           [Kernel PMU event]
  cstate_pkg/c7-residency/                           [Kernel PMU event]
  cycles-ct OR cpu/cycles-ct/                        [Kernel PMU event]
  cycles-t OR cpu/cycles-t/                          [Kernel PMU event]
  el-abort OR cpu/el-abort/                          [Kernel PMU event]
  el-capacity OR cpu/el-capacity/                    [Kernel PMU event]

Notice in the above example how the cstate_core PMU events are mixed in
the middle of the CPU core events.

For my arm64 platform, all the uncore events get mixed in, making the list
very disorganised:
  page-faults OR faults                              [Software event]
  task-clock                                         [Software event]
  duration_time                                      [Tool event]
  L1-dcache-load-misses                              [Hardware cache event]
  L1-dcache-loads                                    [Hardware cache event]
  L1-icache-load-misses                              [Hardware cache event]
  L1-icache-loads                                    [Hardware cache event]
  branch-load-misses                                 [Hardware cache event]
  branch-loads                                       [Hardware cache event]
  dTLB-load-misses                                   [Hardware cache event]
  dTLB-loads                                         [Hardware cache event]
  iTLB-load-misses                                   [Hardware cache event]
  iTLB-loads                                         [Hardware cache event]
  br_mis_pred OR armv8_pmuv3_0/br_mis_pred/          [Kernel PMU event]
  br_mis_pred_retired OR armv8_pmuv3_0/br_mis_pred_retired/ [Kernel PMU event]
  br_pred OR armv8_pmuv3_0/br_pred/                  [Kernel PMU event]
  br_retired OR armv8_pmuv3_0/br_retired/            [Kernel PMU event]
  br_return_retired OR armv8_pmuv3_0/br_return_retired/ [Kernel PMU event]
  bus_access OR armv8_pmuv3_0/bus_access/            [Kernel PMU event]
  bus_cycles OR armv8_pmuv3_0/bus_cycles/            [Kernel PMU event]
  cid_write_retired OR armv8_pmuv3_0/cid_write_retired/ [Kernel PMU event]
  cpu_cycles OR armv8_pmuv3_0/cpu_cycles/            [Kernel PMU event]
  dtlb_walk OR armv8_pmuv3_0/dtlb_walk/              [Kernel PMU event]
  exc_return OR armv8_pmuv3_0/exc_return/            [Kernel PMU event]
  exc_taken OR armv8_pmuv3_0/exc_taken/              [Kernel PMU event]
  hisi_sccl1_ddrc0/act_cmd/                          [Kernel PMU event]
  hisi_sccl1_ddrc0/flux_rcmd/                        [Kernel PMU event]
  hisi_sccl1_ddrc0/flux_rd/                          [Kernel PMU event]
  hisi_sccl1_ddrc0/flux_wcmd/                        [Kernel PMU event]
  hisi_sccl1_ddrc0/flux_wr/                          [Kernel PMU event]
  hisi_sccl1_ddrc0/pre_cmd/                          [Kernel PMU event]
  hisi_sccl1_ddrc0/rnk_chg/                          [Kernel PMU event]

...

  hisi_sccl7_l3c21/wr_hit_cpipe/                     [Kernel PMU event]
  hisi_sccl7_l3c21/wr_hit_spipe/                     [Kernel PMU event]
  hisi_sccl7_l3c21/wr_spipe/                         [Kernel PMU event]
  inst_retired OR armv8_pmuv3_0/inst_retired/        [Kernel PMU event]
  inst_spec OR armv8_pmuv3_0/inst_spec/              [Kernel PMU event]
  itlb_walk OR armv8_pmuv3_0/itlb_walk/              [Kernel PMU event]
  l1d_cache OR armv8_pmuv3_0/l1d_cache/              [Kernel PMU event]
  l1d_cache_refill OR armv8_pmuv3_0/l1d_cache_refill/ [Kernel PMU event]
  l1d_cache_wb OR armv8_pmuv3_0/l1d_cache_wb/        [Kernel PMU event]
  l1d_tlb OR armv8_pmuv3_0/l1d_tlb/                  [Kernel PMU event]
  l1d_tlb_refill OR armv8_pmuv3_0/l1d_tlb_refill/    [Kernel PMU event]

So the events are listed alphabetically. However, CPU core event listing is
special from commit dc098b35b56f ("perf list: List kernel supplied event
aliases"), in that the alias and full event is shown (in that order).
As such, the core events may become sparse.

Improve this by grouping the CPU core events and ensure that they are
listed first for kernel PMU events. For the first example, above, this
now looks like:

  duration_time                                      [Tool event]
  branch-instructions OR cpu/branch-instructions/    [Kernel PMU event]
  branch-misses OR cpu/branch-misses/                [Kernel PMU event]
  bus-cycles OR cpu/bus-cycles/                      [Kernel PMU event]
  cache-misses OR cpu/cache-misses/                  [Kernel PMU event]
  cache-references OR cpu/cache-references/          [Kernel PMU event]
  cpu-cycles OR cpu/cpu-cycles/                      [Kernel PMU event]
  cycles-ct OR cpu/cycles-ct/                        [Kernel PMU event]
  cycles-t OR cpu/cycles-t/                          [Kernel PMU event]
  el-abort OR cpu/el-abort/                          [Kernel PMU event]
  el-capacity OR cpu/el-capacity/                    [Kernel PMU event]
  el-commit OR cpu/el-commit/                        [Kernel PMU event]
  el-conflict OR cpu/el-conflict/                    [Kernel PMU event]
  el-start OR cpu/el-start/                          [Kernel PMU event]
  instructions OR cpu/instructions/                  [Kernel PMU event]
  mem-loads OR cpu/mem-loads/                        [Kernel PMU event]
  mem-stores OR cpu/mem-stores/                      [Kernel PMU event]
  ref-cycles OR cpu/ref-cycles/                      [Kernel PMU event]
  topdown-fetch-bubbles OR cpu/topdown-fetch-bubbles/ [Kernel PMU event]
  topdown-recovery-bubbles OR cpu/topdown-recovery-bubbles/ [Kernel PMU event]
  topdown-slots-issued OR cpu/topdown-slots-issued/  [Kernel PMU event]
  topdown-slots-retired OR cpu/topdown-slots-retired/ [Kernel PMU event]
  topdown-total-slots OR cpu/topdown-total-slots/    [Kernel PMU event]
  tx-abort OR cpu/tx-abort/                          [Kernel PMU event]
  tx-capacity OR cpu/tx-capacity/                    [Kernel PMU event]
  tx-commit OR cpu/tx-commit/                        [Kernel PMU event]
  tx-conflict OR cpu/tx-conflict/                    [Kernel PMU event]
  tx-start OR cpu/tx-start/                          [Kernel PMU event]
  cstate_core/c3-residency/                          [Kernel PMU event]
  cstate_core/c6-residency/                          [Kernel PMU event]
  cstate_core/c7-residency/                          [Kernel PMU event]
  cstate_pkg/c2-residency/                           [Kernel PMU event]
  cstate_pkg/c3-residency/                           [Kernel PMU event]
  cstate_pkg/c6-residency/                           [Kernel PMU event]
  cstate_pkg/c7-residency/                           [Kernel PMU event]

Signed-off-by: John Garry <john.garry@huawei.com>
---
 tools/perf/util/pmu.c | 7 +++++++
 1 file changed, 7 insertions(+)
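
For readers who want to see the effect of the cmp_sevent() change in
isolation, here is a minimal sketch in C. It is not the perf source:
struct demo_event, cmp_by_name(), cmp_cpu_first() and sort_events() are
hypothetical names made up for this illustration, and the comparisons that
the real comparator performs before reaching this point (the "if (n)
return n;" context in the diff) are left out. Only the is_cpu test and the
final strcmp() mirror the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for perf's struct sevent. */
struct demo_event {
	const char *name;	/* event name as it would be printed */
	int is_cpu;		/* 1 for CPU core PMU events, 0 otherwise */
};

/* Ordering without the patch (for these two fields): purely alphabetical. */
static int cmp_by_name(const void *a, const void *b)
{
	const struct demo_event *as = a;
	const struct demo_event *bs = b;

	return strcmp(as->name, bs->name);
}

/* Ordering with the patch: CPU core events first, then alphabetical. */
static int cmp_cpu_first(const void *a, const void *b)
{
	const struct demo_event *as = a;
	const struct demo_event *bs = b;

	/* is_cpu is 0 or 1, so the difference is -1 or 1 here */
	if (as->is_cpu != bs->is_cpu)
		return bs->is_cpu - as->is_cpu;

	return strcmp(as->name, bs->name);
}

/* Sort an array of events in place with one of the two comparators. */
static void sort_events(struct demo_event *events, size_t n, int cpu_first)
{
	assert(events != NULL || n == 0);
	if (n == 0)
		return;

	qsort(events, n, sizeof(*events),
	      cpu_first ? cmp_cpu_first : cmp_by_name);
}
```

Sorting cpu-cycles and cycles-ct (is_cpu = 1) together with
cstate_core/c3-residency/ and cstate_pkg/c2-residency/ (is_cpu = 0) with
cpu_first = 0 should interleave them as in the first listing (cpu-cycles,
cstate_core/..., cstate_pkg/..., cycles-ct), while cpu_first = 1 should
put the two core events ahead of the cstate ones.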