Message ID | 1668411720-3581-2-git-send-email-renyu.zj@linux.alibaba.com (mailing list archive) |
---|---|
State | New, archived |
Series | Add metrics for neoverse-n2 |
On 14/11/2022 07:41, Jing Zhang wrote:
> The calculation formula of topdown L1 is from the document:
> https://urldefense.com/v3/__https://documentation-service.arm.com/static/60250c7395978b529036da86?token=__;!!ACWV5N9M2RV99hQ!Ll-Jgvfs0LitTCU-hC6i6BKBVJfhke-pbQq2VoO-gmuSAcglQ3ZqMVMd2r0An_5a3ZDPYmn8zXuCrpUbehwnLHplVQ$

So since this is from a "standard" document, did you consider putting these as an arch std event? I think arch std events would work for metrics, like they do for regular events.

> However, due to the wrong count of stall_slot and stall_slot_frontend
> in neoverse-n2, the real stall_slot and real stall_slot_frontend need
> to subtract cpu_cycles, so when calculating the topdownL1 metrics,
> stall_slot and stall_slot_frontend are corrected.

Is there a reference to this? It would indeed be useful to pass a link to the n2 doc, as these metrics are not part of the arm64 arm. At least I assume that they are not there.

> Since neoverse-n2 does not yet support topdown L2, metricgroups such
> as Cache, TLB, Branch, InstructionsMix, and PEutilization will be
> added to further analysis of performance bottlenecks in the following
> patches.

Thanks,
John
On 2022/11/14 8:59 PM, John Garry wrote:
> On 14/11/2022 07:41, Jing Zhang wrote:
>> The calculation formula of topdown L1 is from the document:
>> https://urldefense.com/v3/__https://documentation-service.arm.com/static/60250c7395978b529036da86?token=__;!!ACWV5N9M2RV99hQ!Ll-Jgvfs0LitTCU-hC6i6BKBVJfhke-pbQq2VoO-gmuSAcglQ3ZqMVMd2r0An_5a3ZDPYmn8zXuCrpUbehwnLHplVQ$
>
> So since this is from a "standard" document, did you consider putting these as an arch std event? I think arch std events would work for metrics, like they do for regular events.

I could not find out how to put a metric in as an arch std event. It would be best if you could point me to an example in the upstream code. Thank you very much.

>> However, due to the wrong count of stall_slot and stall_slot_frontend
>> in neoverse-n2, the real stall_slot and real stall_slot_frontend need
>> to subtract cpu_cycles, so when calculating the topdownL1 metrics,
>> stall_slot and stall_slot_frontend are corrected.
>
> Is there a reference to this? It would indeed be useful to pass a link to the n2 doc, as these metrics are not part of the arm64 arm. At least I assume that they are not there.

You are right, I need to add a doc link. Arm has released the n2 errata document describing the incorrect count of stall_slot and stall_slot_frontend, and it provides a workaround to get the correct value.
Link: https://developer.arm.com/documentation/SDEN1982442/1200/?lang=en

>> Since neoverse-n2 does not yet support topdown L2, metricgroups such
>> as Cache, TLB, Branch, InstructionsMix, and PEutilization will be
>> added to further analysis of performance bottlenecks in the following
>> patches.
>
> Thanks,
> John

Best Regards,
Jing
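As a rough illustration of the workaround described above, the sketch below subtracts cpu_cycles from the raw stall_slot_frontend count before computing the frontend-bound fraction. The function name and the counter values are hypothetical; the slot count of 5 and the shape of the formula are taken from the neoverse-n2 expression quoted elsewhere in this thread, not from the errata document itself.

```python
def frontend_bound(stall_slot_frontend: int, cpu_cycles: int, slots: int = 5) -> float:
    """Topdown L1 frontend-bound fraction with the N2 correction applied.

    Hypothetical helper: the raw stall_slot_frontend count is reduced by
    cpu_cycles, as the thread describes, and then divided by the total
    number of slots (slots per cycle * cycles).
    """
    if cpu_cycles <= 0:
        raise ValueError("cpu_cycles must be positive")
    corrected = stall_slot_frontend - cpu_cycles
    return corrected / (slots * cpu_cycles)


if __name__ == "__main__":
    # Made-up counter values, purely for illustration.
    print(frontend_bound(stall_slot_frontend=3_000_000, cpu_cycles=1_000_000))
```

The slot count is a parameter here only to make the assumption visible; the thread goes on to discuss whether it should be hard coded per CPU.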
On 15/11/2022 08:43, Jing Zhang wrote:
> I didn't find out how to put the metric as an arch std event, it would be best if you could provide me with an example in the upstream code,
> thank you very much.

As things stand, I don't think it's supported. We only support regular events for std arch events (and not metrics).

However we could expand support for metrics.

For the example of hip08 and FRONTEND_BOUND, we would have:

--->8---

diff --git a/tools/perf/pmu-events/arch/arm64/hisilicon/hip08/metrics.json b/tools/perf/pmu-events/arch/arm64/hisilicon/hip08/metrics.json
index 6443a061e22a..5b1ca45224de 100644
--- a/tools/perf/pmu-events/arch/arm64/hisilicon/hip08/metrics.json
+++ b/tools/perf/pmu-events/arch/arm64/hisilicon/hip08/metrics.json
@@ -1,10 +1,6 @@
 [
     {
-        "MetricExpr": "FETCH_BUBBLE / (4 * CPU_CYCLES)",
-        "PublicDescription": "Frontend bound L1 topdown metric",
-        "BriefDescription": "Frontend bound L1 topdown metric",
-        "MetricGroup": "TopDownL1",
-        "MetricName": "frontend_bound"
+        "ArchStdEvent": "FRONTEND_BOUND"
     },
     {
         "MetricExpr": "(INST_SPEC - INST_RETIRED) / (4 * CPU_CYCLES)",
diff --git a/tools/perf/pmu-events/arch/arm64/sbsa.json b/tools/perf/pmu-events/arch/arm64/sbsa.json
new file mode 100644
index 000000000000..10b9c0cccc40
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/sbsa.json
@@ -0,0 +1,9 @@
+[
+    {
+        "MetricExpr": "FETCH_BUBBLE / (4 * CPU_CYCLES)",
+        "PublicDescription": "Frontend bound L1 topdown metric",
+        "BriefDescription": "Frontend bound L1 topdown metric",
+        "MetricGroup": "TopDownL1",
+        "MetricName": "FRONTEND_BOUND"
+    }
+]
diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
index 0daa3e007528..77049853c0bf 100755
--- a/tools/perf/pmu-events/jevents.py
+++ b/tools/perf/pmu-events/jevents.py
@@ -352,6 +352,8 @@ def preprocess_arch_std_files(archpath: str) -> None:
       for event in read_json_events(item.path, topic=''):
         if event.name:
           _arch_std_events[event.name.lower()] = event
+        if event.metric_name:
+          _arch_std_events[event.metric_name.lower()] = event


 def print_events_table_prefix(tblname: str) -> None:
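The jevents.py hunk above can be pictured with a small stand-alone model: entries are indexed by their lower-cased event name and, with the proposed two extra lines, by their lower-cased metric name, so that an "ArchStdEvent" reference can find either kind. The `Entry` class and the `build_registry` and `resolve` helpers are hypothetical simplifications for illustration, not the real jevents.py types.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Entry:
    """Simplified stand-in for one parsed JSON entry (not the real jevents class)."""
    name: Optional[str] = None         # "EventName" of a regular event
    metric_name: Optional[str] = None  # "MetricName" of a metric
    metric_expr: Optional[str] = None  # "MetricExpr" of a metric


def build_registry(entries: List[Entry]) -> Dict[str, Entry]:
    """Index entries by lower-cased event name and, as proposed, metric name."""
    registry: Dict[str, Entry] = {}
    for entry in entries:
        if entry.name:
            registry[entry.name.lower()] = entry
        if entry.metric_name:  # the addition suggested in the diff
            registry[entry.metric_name.lower()] = entry
    return registry


def resolve(arch_std_event: str, registry: Dict[str, Entry]) -> Entry:
    """Look up an "ArchStdEvent" reference case-insensitively."""
    return registry[arch_std_event.lower()]


if __name__ == "__main__":
    std = build_registry([
        Entry(metric_name="FRONTEND_BOUND",
              metric_expr="FETCH_BUBBLE / (4 * CPU_CYCLES)"),
    ])
    print(resolve("FRONTEND_BOUND", std).metric_expr)
```

Without the metric-name branch, the lookup for FRONTEND_BOUND in this model raises KeyError, which mirrors why metrics could not be referenced as arch std events before the change.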
On 2022/11/15 7:19 PM, John Garry wrote:
> On 15/11/2022 08:43, Jing Zhang wrote:
>> I didn't find out how to put the metric as an arch std event, it would be best if you could provide me with an example in the upstream code,
>> thank you very much.
>
> As things stand, I don't think it's supported. We only support regular events for std arch events (and not metrics).
>
> However we could expand support for metrics.
>
> For the example of hip08 and FRONTEND_BOUND, we would have:
>
> --->8---
>
> [...]

Sorry for the slow response. I tried the method you provided, but it didn't work. Are there any other steps I am missing, or is this method not currently supported?

diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics
index 8ff1dfe..2ad30ec 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
@@ -1,10 +1,6 @@
 [
     {
-        "MetricExpr": "(stall_slot_frontend - cpu_cycles) / (5 * cpu_cycles)",
-        "PublicDescription": "Frontend bound L1 topdown metric",
-        "BriefDescription": "Frontend bound L1 topdown metric",
-        "MetricGroup": "TopDownL1",
-        "MetricName": "frontend_bound"
+        "ArchStdEvent": "FRONTEND_BOUND"
     },
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeline.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeli
index f9fae15..e8536e2 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeline.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeline.json
@@ -6,9 +6,6 @@
     {
         "ArchStdEvent": "STALL_BACKEND_MEM"
-    }
+    },
+    {
+        "MetricExpr": "(stall_slot_frontend - cpu_cycles) / (5 * cpu_cycles)",
+        "PublicDescription": "Frontend bound L1 topdown metric",
+        "BriefDescription": "Frontend bound L1 topdown metric",
+        "MetricGroup": "TopDownL1",
+        "MetricName": "FRONTEND_BOUND"
+    }
 ]
diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
index 0daa3e0..7704985 100755
--- a/tools/perf/pmu-events/jevents.py
+++ b/tools/perf/pmu-events/jevents.py
@@ -352,6 +352,8 @@ def preprocess_arch_std_files(archpath: str) -> None:
       for event in read_json_events(item.path, topic=''):
         if event.name:
           _arch_std_events[event.name.lower()] = event
+        if event.metric_name:
+          _arch_std_events[event.metric_name.lower()] = event

#./perf stat -e FRONTEND_BOUND sleep 1
event syntax error: 'FRONTEND_BOUND'
                     \___ parser error
Run 'perf list' for a list of valid events

 Usage: perf stat [<options>] [<command>]

    -e, --event <event>   event selector. use 'perf list' to list available events

I also tried attaching a MetricExpr directly to the event:

diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeline.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeli
index f9fae15..1089ca0 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeline.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeline.json
@@ -6,18 +6,24 @@
         "ArchStdEvent": "STALL_BACKEND"
     },
     {
-        "ArchStdEvent": "STALL_SLOT_FRONTEND"
+        "ArchStdEvent": "STALL_SLOT_FRONTEND",
+        "MetricExpr": "STALL_SLOT_FRONTEND - CPU_CYCLES"
     },
     {

#./perf stat -e stall_slot_frontend sleep 1
Add CPU_CYCLES event to groups to get metric expression for stall_slot_frontend

 Performance counter stats for 'sleep 1':

         5,125,457      stall_slot_frontend      // it's still the original value

       1.001017680 seconds time elapsed

       0.001162000 seconds user
       0.000000000 seconds sys

Thanks,
Jing
> #./perf stat -e FRONTEND_BOUND sleep 1
> event syntax error: 'FRONTEND_BOUND'

For metrics, use -M, not -e.

If this doesn't help, verify that the generated pmu-events/pmu-events.c is the same after you make the change to use std arch events for metrics. Note that I never tested running my change.

Thanks,
John

>                      \___ parser error
> Run 'perf list' for a list of valid events
>
>  Usage: perf stat [<options>] [<command>]
>
>     -e, --event <event>   event selector. use 'perf list' to list available events
>
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeline.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeli
> index f9fae15..1089ca0 100644
> --- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeline.json
> +++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/pipeline.json
> @@ -6,18 +6,24 @@
>          "ArchStdEvent": "STALL_BACKEND"
>      },
>      {
> -        "ArchStdEvent": "STALL_SLOT_FRONTEND"
> +        "ArchStdEvent": "STALL_SLOT_FRONTEND",
> +        "MetricExpr": "STALL_SLOT_FRONTEND - CPU_CYCLES"
>      },
>      {
On 2022/11/21 6:22 PM, John Garry wrote:
>> #./perf stat -e FRONTEND_BOUND sleep 1
>> event syntax error: 'FRONTEND_BOUND'
>
> For metrics, use -M, not -e
>
> If this doesn't help, verify generated pmu-events/pmu-events.c is same after you make the change to try to use std arch events for metrics. Note that I never tested running my change.
>
> Thanks,
> John
>
>> [...]

I'm sorry, I misunderstood the purpose of putting a metric in as an arch_std_event at first. It works now after following your suggestion.

But I still have a few questions:

1. The number of slots in topdownL1 varies between architectures; for example, it is 5 on neoverse-n2. If I put the topdownL1 metrics in as arch_std_events, then I need to set the slot count to 5 for n2. I can specify the slot value as a metric like below, but is there a more concise way to do this?

diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
index 8ff1dfe..b473baf 100644
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
@@ -1,4 +1,23 @@
 [
+    {
+        "MetricExpr": "5",
+        "PublicDescription": "A pipeline slot represents the hardware resources needed to process one uOp",
+        "BriefDescription": "A pipeline slot represents the hardware resources needed to process one uOp",
+        "MetricName": "slot"
+    },
+    {
+        "ArchStdEvent": "FRONTEND_BOUND"
+    },
+    {
+        "ArchStdEvent": "BACKEND_BOUND"
+    },
+    {
+        "ArchStdEvent": "WASTED"
+    },
+    {
+        "ArchStdEvent": "RETIRING"
+    },

2. Should I add the topdownL1 metrics to tools/perf/pmu-event/recommended.json, or create a new json file to hold the general metrics?

Looking forward to your reply.

Thanks,
Jing
On 21/11/2022 15:17, Jing Zhang wrote:
> I'm sorry that I misunderstood the purpose of putting metric as arch_std_event at first,
> and now it works after the modification over your suggestion.
>
> But there are also a few questions:
>
> 1. The value of the slot in the topdownL1 is various in different architectures, for example,
> the slot is 5 on neoverse-n2. If I put topdownL1 metric as arch_std_event, then I need to
> specify the slot to 5 in n2. I can specify slot values in metric like below, but is there any
> other concise way to do this?
>
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
> index 8ff1dfe..b473baf 100644
> --- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
> +++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
> @@ -1,4 +1,23 @@
>  [
> +    {
> +        "MetricExpr": "5",
> +        "PublicDescription": "A pipeline slot represents the hardware resources needed to process one uOp",
> +        "BriefDescription": "A pipeline slot represents the hardware resources needed to process one uOp",
> +        "MetricName": "slot"

Ehhh... I'm not sure if that is a good idea. Ian or anyone else have an opinion on this? It is possible to reuse metrics, so it should work, but...

One problem is that "slot" would show up as a metric, which you would not want.

Alternatively I was going to suggest that you can overwrite specific std arch event attributes. So for the example of frontend_bound, you could have:

+ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
@@ -0,0 +1,30 @@
[
    {
        "ArchStdEvent": "FRONTEND_BOUND",
        "MetricExpr": "(stall_slot_frontend - cpu_cycles) / (5 * cpu_cycles)"
    },

> +    }
> +    {
> +        "ArchStdEvent": "FRONTEND_BOUND"
> +    },
> +    {
> +        "ArchStdEvent": "BACKEND_BOUND"
> +    },
> +    {
> +        "ArchStdEvent": "WASTED"
> +    },
> +    {
> +        "ArchStdEvent": "RETIRING"
> +    },
>
> 2. Should I add the topdownL1 metric to tools/perf/pmu-event/recommended.json,
> or create a new json file to place the general metric?

It would not belong in recommended.json, as that is specifically for arch-recommended events. It would really just depend on where the value comes from, i.e. the Arm ARM or the SBSA.

> Looking forward to your reply.
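To make the "overwrite specific std arch event attributes" suggestion concrete, here is a minimal sketch of the merge it implies: the per-CPU entry names an arch std entry and supplies only the fields it wants to change, while every other field is inherited. The `apply_arch_std` helper and the dictionary layout are hypothetical; whether jevents.py merges metric fields exactly this way is an assumption, and the standard expression used here is simply the hip08-style one from the earlier diff.

```python
from typing import Dict


def apply_arch_std(entry: Dict[str, str],
                   std_entries: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Return the referenced std entry with locally set fields taking precedence.

    Hypothetical model of the override behaviour discussed in the thread.
    """
    key = entry["ArchStdEvent"].lower()
    merged = dict(std_entries[key])      # start from the shared definition
    for field, value in entry.items():
        if field != "ArchStdEvent":      # the reference itself is not a field to copy
            merged[field] = value        # a local value overrides the standard one
    return merged


if __name__ == "__main__":
    std = {
        "frontend_bound": {
            "MetricExpr": "FETCH_BUBBLE / (4 * CPU_CYCLES)",
            "BriefDescription": "Frontend bound L1 topdown metric",
            "MetricGroup": "TopDownL1",
            "MetricName": "FRONTEND_BOUND",
        }
    }
    n2_entry = {
        "ArchStdEvent": "FRONTEND_BOUND",
        "MetricExpr": "(stall_slot_frontend - cpu_cycles) / (5 * cpu_cycles)",
    }
    for field, value in apply_arch_std(n2_entry, std).items():
        print(f"{field}: {value}")
```

In this model only the expression changes for N2; the description, group and name come from the shared entry, which is the attraction of the override approach over a separate "slot" metric.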
On 2022/11/22 1:55 AM, John Garry wrote:
> On 21/11/2022 15:17, Jing Zhang wrote:
>> [...]
>
> Alternatively I was going to suggest that you can overwrite specific std arch event attributes. So for example of frontend_bound, you could have:
>
> + b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
> @@ -0,0 +1,30 @@
> [
>     {
>         "ArchStdEvent": "FRONTEND_BOUND",
>         "MetricExpr": "(stall_slot_frontend - cpu_cycles) / (5 * cpu_cycles)"
>     },
>
>> 2. Should I add the topdownL1 metric to tools/perf/pmu-event/recommended.json,
>> or create a new json file to place the general metric?
>
> It would not belong in recommended.json as that is specifically for arch-recommended events. It would really just depend on where the value comes from, i.e. arm arm or sbsa.

Thanks for your suggestion. I will send the next patchset as you suggested.
On 21/11/2022 17:55, John Garry wrote:
> On 21/11/2022 15:17, Jing Zhang wrote:
>> [...]
>> 1. The value of the slot in the topdownL1 is various in different
>> architectures, for example,
>> the slot is 5 on neoverse-n2. If I put topdownL1 metric as
>> arch_std_event, then I need to
>> specify the slot to 5 in n2. I can specify slot values in metric like
>> below, but is there any
>> other concise way to do this?
>>
>> diff --git
>> a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>> b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>> index 8ff1dfe..b473baf 100644
>> --- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>> +++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>> @@ -1,4 +1,23 @@
>>  [
>> +    {
>> +        "MetricExpr": "5",
>> +        "PublicDescription": "A pipeline slot represents the
>> hardware resources needed to process one uOp",
>> +        "BriefDescription": "A pipeline slot represents the
>> hardware resources needed to process one uOp",
>> +        "MetricName": "slot"
>
> Ehhh....I'm not sure if that is a good idea. Ian or anyone else have an
> opinion on this? It is possible to reuse metrics, so it should work, but...
>
> One problem is that "slot" would show up as a metric, which you would
> not want.
>
> Alternatively I was going to suggest that you can overwrite specific std
> arch event attributes. So for example of frontend_bound, you could have:

I would agree with not having this and just hard coding the 5 wherever it's needed. Once we have a few different sets of metrics in place maybe we can start to look at deduplication, but for now I don't see the value.

> + b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
> @@ -0,0 +1,30 @@
> [
>     {
>         "ArchStdEvent": "FRONTEND_BOUND",
>         "MetricExpr": "(stall_slot_frontend - cpu_cycles) / (5 *
> cpu_cycles)"
>     },
>
>> 2. Should I add the topdownL1 metric to
>> tools/perf/pmu-event/recommended.json,
>> or create a new json file to place the general metric?
>
> It would not belong in recommended.json as that is specifically for
> arch-recommended events. It would really just depend on where the value
> comes from, i.e. arm arm or sbsa.

For what we're going to publish shortly we'll be generating a metrics.json file for each CPU. It will be autogenerated, so I don't think duplication will be an issue, and I'm expecting that there will be differences in the topdown metrics between CPUs anyway. So I would also vote to not put it in recommended.json.

>> Looking forward to your reply.
On 2022/11/22 10:00 PM, James Clark wrote:
> On 21/11/2022 17:55, John Garry wrote:
>> [...]
>> Alternatively I was going to suggest that you can overwrite specific std
>> arch event attributes. So for example of frontend_bound, you could have:
>
> I would agree with not having this and just hard coding the 5 wherever
> it's needed. Once we have a few different sets of metrics in place maybe
> we can start to look at deduplication, but for now I don't see the value.
>
>> [...]
>> It would not belong in recommended.json as that is specifically for
>> arch-recommended events. It would really just depend on where the value
>> comes from, i.e. arm arm or sbsa.
>
> For what we're going to publish shortly we'll be generating a
> metrics.json file for each CPU. It will be autogenerated so I don't
> think duplication will be an issue and I'm expecting that there will be
> differences in the topdown metrics between CPUs anyway. So I would also
> vote to not put it in recommended.json

I will create a new sbsa.json file in tools/perf/pmu-events/arch/arm64/ to hold metrics that may be common to several CPUs, just like arch_std_event. If the topdown metrics differ on other CPUs, we can overwrite the metric expression.

For example:

+++ b/tools/perf/pmu-events/arch/arm64/sbsa.json
@@ -0,0 +1,9 @@
+[
+    {
+        "MetricExpr": "stall_slot_frontend / (slot * cpu_cycles)",
+        "PublicDescription": "Frontend bound L1 topdown metric",
+        "BriefDescription": "Frontend bound L1 topdown metric",
+        "MetricGroup": "TopDownL1",
+        "MetricName": "FRONTEND_BOUND"
+    }
+]

+ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
@@ -0,0 +1,30 @@
+[
+    {
+        "ArchStdEvent": "FRONTEND_BOUND",
+        "MetricExpr": "(stall_slot_frontend - cpu_cycles) / (5 * cpu_cycles)"
+    }
+]

In addition, I can also add the TLB, Cache, Branch, InstructionMix, PEutilization and other metric groups to sbsa.json, because they are also applicable to neoverse-n1. These metrics are described in the neoverse-n1 documentation:
https://developer.arm.com/documentation/PJDOC-466751330-547673/r4p1/

Thanks,
Jing

>>> Looking forward to your reply.
On 22/11/2022 15:41, Jing Zhang wrote:
> On 2022/11/22 10:00 PM, James Clark wrote:
>> [...]
>> For what we're going to publish shortly we'll be generating a
>> metrics.json file for each CPU. It will be autogenerated so I don't
>> think duplication will be an issue and I'm expecting that there will be
>> differences in the topdown metrics between CPUs anyway. So I would also
>> vote to not put it in recommended.json
>
> I will create a new sbsa.json file in tools/perf/pmu-events/arch/arm64/
> to place metrics that may be common between some CPUs, just like arch_std_event.

Because this would apply to all CPUs rather than just N2, I still think it's best to wait for our metrics repo to be published. Otherwise Arm will start publishing metrics with names and group names for all future CPUs that have different names to the common ones added as part of this change.

It's something that we've been working on for quite a while, and we've taken care to make sure that it applies to future products and is scalable.

It would be easier to add these right now only for N2, and then afterwards we can start to look at what is common and could be factored out into the top level folder.

> If the topdown metrics are different in other CPUs, we can overwrite the
> metric expression.

True, but with different group names, metric names and units it could get slightly complicated.

> For example:
>
> +++ b/tools/perf/pmu-events/arch/arm64/sbsa.json
> @@ -0,0 +1,9 @@
> +[
> +    {
> +        "MetricExpr": "stall_slot_frontend / (slot * cpu_cycles)",
> +        "PublicDescription": "Frontend bound L1 topdown metric",
> +        "BriefDescription": "Frontend bound L1 topdown metric",
> +        "MetricGroup": "TopDownL1",
> +        "MetricName": "FRONTEND_BOUND"
> +    }
> +]
>
> + b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
> @@ -0,0 +1,30 @@
> +[
> +    {
> +        "ArchStdEvent": "FRONTEND_BOUND",
> +        "MetricExpr": "(stall_slot_frontend - cpu_cycles) / (5 * cpu_cycles)"
> +    }
> +]

With the auto generation of the metrics files I don't really see too much benefit in doing it this way.

You also run into the issue where, if a platform happens to define all of the events required by a metric, will that metric appear automatically, even if it's not valid?

> In addition, I can also add TLB, Cache, Branch, InstructionMix, PEutilization
> and other metric groups into sbsa.json, because they are also applicable to
> neoverse-n1. Above metrics are described in the documentation of neoverse-n1:
> https://developer.arm.com/documentation/PJDOC-466751330-547673/r4p1/
>
> Thanks,
> Jing
On 2022/11/23 at 10:26 PM, James Clark wrote:
>
>
> On 22/11/2022 15:41, Jing Zhang wrote:
>>
>>
>> On 2022/11/22 at 10:00 PM, James Clark wrote:
>>>
>>>
>>> On 21/11/2022 17:55, John Garry wrote:
>>>> On 21/11/2022 15:17, Jing Zhang wrote:
>>>>> I'm sorry that I misunderstood the purpose of putting metric as
>>>>> arch_std_event at first,
>>>>> and now it works after the modification over your suggestion.
>>>>>
>>>>> But there are also a few questions:
>>>>>
>>>>> 1. The value of the slot in the topdownL1 is various in different
>>>>> architectures, for example,
>>>>> the slot is 5 on neoverse-n2. If I put topdownL1 metric as
>>>>> arch_std_event, then I need to
>>>>> specify the slot to 5 in n2. I can specify slot values in metric like
>>>>> below, but is there any
>>>>> other concise way to do this?
>>>>>
>>>>> diff --git
>>>>> a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>>>> b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>>>> index 8ff1dfe..b473baf 100644
>>>>> --- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>>>> +++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>>>> @@ -1,4 +1,23 @@
>>>>>  [
>>>>> +    {
>>>>> +        "MetricExpr": "5",
>>>>> +        "PublicDescription": "A pipeline slot represents the
>>>>> hardware resources needed to process one uOp",
>>>>> +        "BriefDescription": "A pipeline slot represents the
>>>>> hardware resources needed to process one uOp",
>>>>> +        "MetricName": "slot"
>>>>
>>>> Ehhh....I'm not sure if that is a good idea. Ian or anyone else have an
>>>> opinion on this? It is possible to reuse metrics, so it should work, but...
>>>>
>>>> One problem is that "slot" would show up as a metric, which you would
>>>> not want.
>>>>
>>>> Alternatively I was going to suggest that you can overwrite specific std
>>>> arch event attributes. So for example of frontend_bound, you could have:
>>>
>>> I would agree with not having this and just hard coding the 5 wherever
>>> it's needed. Once we have a few different sets of metrics in place maybe
>>> we can start to look at deduplication, but for now I don't see the value.
>>>
>>>>
>>>> + b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>>>> @@ -0,0 +1,30 @@
>>>> [
>>>>     {
>>>>         "ArchStdEvent": "FRONTEND_BOUND",
>>>>         "MetricExpr": "(stall_slot_frontend - cpu_cycles) / (5 *
>>>> cpu_cycles)",
>>>>     },
>>>>
>>>>> +    }
>>>>> +    {
>>>>> +        "ArchStdEvent": "FRONTEND_BOUND"
>>>>> +    },
>>>>> +    {
>>>>> +        "ArchStdEvent": "BACKEND_BOUND"
>>>>> +    },
>>>>> +    {
>>>>> +        "ArchStdEvent": "WASTED"
>>>>> +    },
>>>>> +    {
>>>>> +        "ArchStdEvent": "RETIRING"
>>>>> +    },
>>>>>
>>>>>
>>>>> 2. Should I add the topdownL1 metric to
>>>>> tools/perf/pmu-event/recommended.json,
>>>>> or create a new json file to place the general metric?
>>>>
>>>> It would not belong in recommended.json as that is specifically for
>>>> arch-recommended events. It would really just depend on where the value
>>>> comes from, i.e. arm arm or sbsa.
>>>>
>>>
>>> For what we're going to publish shortly we'll be generating a
>>> metrics.json file for each CPU. It will be autogenerated so I don't
>>> think duplication will be an issue and I'm expecting that there will be
>>> differences in the topdown metrics between CPUs anyway. So I would also
>>> vote to not put it in recommended.json
>>>
>>
>> I will create a new sbsa.json file in tools/perf/pmu-events/arch/arm64/
>> to place metrics that may be common between some CPUs, just like arch_std_event.
>
> Because this would apply to all CPUs rather than just N2, I still think
> it's best to wait for our metrics repo to be published. Otherwise Arm
> will start publishing metrics with names and group names for all future
> CPUs that have different names to the common ones added as part of this
> change.
>
> It's something that we've been working on for quite a while and we've
> taken care to make sure that it applies to future products and is scalable.
>
> It would be easier to add these right now only for N2, and then
> afterwards we can start to look at what is common and could be factored
> out into the top level folder.
>
>> If the topdown metrics are different in other CPUs, we can overwrite the
>> metric expression.
>
> True, but with different group names and metric names and units it could
> get slightly complicated.
>
>>
>> For example:
>>
>> +++ b/tools/perf/pmu-events/arch/arm64/sbsa.json
>> @@ -0,0 +1,9 @@
>> +[
>> +    {
>> +        "MetricExpr": "stall_slot_frontend / (slot * cpu_cycles)",
>> +        "PublicDescription": "Frontend bound L1 topdown metric",
>> +        "BriefDescription": "Frontend bound L1 topdown metric",
>> +        "MetricGroup": "TopDownL1",
>> +        "MetricName": "FRONTEND_BOUND"
>> +    }
>> +]
>>
>> + b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
>> @@ -0,0 +1,30 @@
>> +[
>> +    {
>> +        "ArchStdEvent": "FRONTEND_BOUND",
>> +        "MetricExpr": "(stall_slot_frontend - cpu_cycles) / (5 * cpu_cycles)",
>> +    }
>> +]
>>
>
> With the auto generation of metrics file I don't really see too much
> benefit of doing it this way.
>
> You also run into the issue where if a platform happens to define all of
> the events required by a metric, will that metric appear automatically,
> even if it's not valid?
>

OK, I agree to put the topdown metrics in the n2 metrics file instead of
using arch_std_event. There is currently no unified formula for the topdown
metrics, and the number of slots may differ between CPUs.

Once the standard is published, please consider what John said and use the
general metrics as arch_std_event.

On 24/11/2022 16:32, Jing Zhang wrote:
> OK, I agree to put the topdown metrics in the n2 metrics file instead of
> using arch_std_event. There is currently no unified formula for the topdown
> metrics, and the number of slots may differ between CPUs.
>
> Once the standard is published, please consider what John said and use the
> general metrics as arch_std_event.

Yep that sounds good, will do!
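
For readers following the thread, the override approach that was discussed (and not adopted for this patch) can be modelled roughly as below. This is only a sketch of the idea, not perf's actual jevents code: the `resolve` helper and the two in-memory tables are hypothetical, and the trailing comma from the quoted N2 example is dropped so that the JSON parses.

```python
import json

# Hypothetical common table, modelled on the sbsa.json example from the thread.
ARCH_STD_METRICS = json.loads("""
[
    {
        "MetricExpr": "stall_slot_frontend / (slot * cpu_cycles)",
        "PublicDescription": "Frontend bound L1 topdown metric",
        "BriefDescription": "Frontend bound L1 topdown metric",
        "MetricGroup": "TopDownL1",
        "MetricName": "FRONTEND_BOUND"
    }
]
""")

# Hypothetical per-CPU file: reference the common metric, override one attribute.
N2_METRICS = json.loads("""
[
    {
        "ArchStdEvent": "FRONTEND_BOUND",
        "MetricExpr": "(stall_slot_frontend - cpu_cycles) / (5 * cpu_cycles)"
    }
]
""")


def resolve(entries, arch_std):
    """Fill each ArchStdEvent reference from the common table; local keys win."""
    by_name = {metric["MetricName"]: metric for metric in arch_std}
    resolved = []
    for entry in entries:
        ref = entry.get("ArchStdEvent")
        if ref is None:
            # Plain entry with no reference: keep it unchanged.
            resolved.append(dict(entry))
            continue
        merged = dict(by_name[ref])  # start from the common definition
        merged.update({k: v for k, v in entry.items() if k != "ArchStdEvent"})
        resolved.append(merged)
    return resolved


n2_resolved = resolve(N2_METRICS, ARCH_STD_METRICS)
print(json.dumps(n2_resolved, indent=4))
```

In this model the N2 entry inherits the descriptions, group and name from the common table and replaces only the expression. James's concern about differing group names, metric names and units corresponds to needing more and more keys in the per-CPU entry.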
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
new file mode 100644
index 0000000..0048dfe
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
@@ -0,0 +1,30 @@
+[
+    {
+        "MetricExpr": "(stall_slot_frontend - cpu_cycles) / (5 * cpu_cycles)",
+        "PublicDescription": "Frontend bound L1 topdown metric",
+        "BriefDescription": "Frontend bound L1 topdown metric",
+        "MetricGroup": "TopDownL1",
+        "MetricName": "frontend_bound"
+    },
+    {
+        "MetricExpr": "(1 - op_retired / op_spec) * (1 - (stall_slot - cpu_cycles) / (5 * cpu_cycles))",
+        "PublicDescription": "Wasted L1 topdown metric",
+        "BriefDescription": "Wasted L1 topdown metric",
+        "MetricGroup": "TopDownL1",
+        "MetricName": "wasted"
+    },
+    {
+        "MetricExpr": "(op_retired / op_spec) * (1 - (stall_slot - cpu_cycles) / (5 * cpu_cycles))",
+        "PublicDescription": "Retiring L1 topdown metric",
+        "BriefDescription": "Retiring L1 topdown metric",
+        "MetricGroup": "TopDownL1",
+        "MetricName": "retiring"
+    },
+    {
+        "MetricExpr": "stall_slot_backend / (5 * cpu_cycles)",
+        "PublicDescription": "Backend Bound L1 topdown metric",
+        "BriefDescription": "Backend Bound L1 topdown metric",
+        "MetricGroup": "TopDownL1",
+        "MetricName": "backend_bound"
+    }
+]
\ No newline at end of file
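
As a quick sanity check of the four expressions in the patch, the sketch below evaluates them against made-up counter values. perf has its own expression parser; this only works here because these particular expressions are plain arithmetic that Python can also evaluate. The counter values are hypothetical and chosen so the arithmetic is easy to follow.

```python
# MetricExpr strings copied from the patch.
METRIC_EXPRS = {
    "frontend_bound": "(stall_slot_frontend - cpu_cycles) / (5 * cpu_cycles)",
    "wasted": "(1 - op_retired / op_spec) * (1 - (stall_slot - cpu_cycles) / (5 * cpu_cycles))",
    "retiring": "(op_retired / op_spec) * (1 - (stall_slot - cpu_cycles) / (5 * cpu_cycles))",
    "backend_bound": "stall_slot_backend / (5 * cpu_cycles)",
}

# Made-up raw counter readings (assumption, not measured data).
COUNTERS = {
    "cpu_cycles": 1000,
    "stall_slot": 3500,
    "stall_slot_frontend": 2000,
    "stall_slot_backend": 1500,
    "op_retired": 900,
    "op_spec": 1000,
}


def evaluate(expr, counters):
    """Evaluate an arithmetic expression with counter names bound to values."""
    # Builtins are disabled so only the counter names are visible.
    return eval(expr, {"__builtins__": {}}, dict(counters))


results = {name: evaluate(expr, COUNTERS) for name, expr in METRIC_EXPRS.items()}
for name, value in results.items():
    print(f"{name:15s} {value:.3f}")
print(f"{'total':15s} {sum(results.values()):.3f}")
```

With these values the four fractions add up to 1, but only because the made-up readings satisfy stall_slot = stall_slot_frontend + stall_slot_backend; that is a property of the example, not a claim about the hardware counters.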
The topdown L1 formulas are taken from the document:
https://documentation-service.arm.com/static/60250c7395978b529036da86?token=

However, stall_slot and stall_slot_frontend are miscounted on
neoverse-n2: the real values are obtained by subtracting cpu_cycles
from each. The topdownL1 metrics therefore use the corrected
stall_slot and stall_slot_frontend values.

Since neoverse-n2 does not yet support topdown L2, metric groups such
as Cache, TLB, Branch, InstructionsMix, and PEutilization will be
added in the following patches for further analysis of performance
bottlenecks.

Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
---
 .../arch/arm64/arm/neoverse-n2/metrics.json        | 30 ++++++++++++++++++++++
 1 file changed, 30 insertions(+)
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
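
To illustrate what the cpu_cycles subtraction does to a metric, here is a minimal sketch comparing the frontend-bound fraction with and without the correction. The `frontend_bound` helper and the counter values are hypothetical; the slot count of 5 and the subtraction come from the expressions in this patch.

```python
SLOTS_PER_CYCLE = 5  # slot count used in the N2 expressions of this patch


def frontend_bound(stall_slot_frontend, cpu_cycles, corrected=True):
    """Fraction of slots stalled in the frontend, optionally applying the
    cpu_cycles subtraction described in the commit message."""
    stalls = stall_slot_frontend - cpu_cycles if corrected else stall_slot_frontend
    return stalls / (SLOTS_PER_CYCLE * cpu_cycles)


# Made-up readings: 2000 raw frontend stall slots over 1000 cycles.
raw = frontend_bound(2000, 1000, corrected=False)
fixed = frontend_bound(2000, 1000)
print(f"uncorrected: {raw:.2f}, corrected: {fixed:.2f}, difference: {raw - fixed:.2f}")
```

Under this formula the uncorrected value is always higher by cpu_cycles / (5 * cpu_cycles) = 0.2, whatever the counter readings are.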