Message ID | 1672745976-2800146-2-git-send-email-renyu.zj@linux.alibaba.com (mailing list archive) |
---|---|
State | New, archived |
Series | Add metrics for neoverse-n2 |
On 03/01/2023 11:39, Jing Zhang wrote:
> The formula of topdown L1 on neoverse-n2 is from ARM sbsa7.0 platform
> design document [0], D37-38.

I think that I mentioned this before - if these metrics are coming from an sbsa doc, then they are standard. As such, we can make them "arch std events" and put them in a common json such as sbsa.json, so that other cores may reuse.

You don't strictly have to do this now, but it would be better.

Thanks,
John

>
> However, due to the wrong count of stall_slot and stall_slot_frontend on
> neoverse-n2, the real stall_slot and real stall_slot_frontend need to
> subtract cpu_cycles, so correct the expression of topdown metrics.
> Reference from ARM neoverse-n2 errata notice [1], D117.
>
> Since neoverse-n2 does not yet support topdown L2, metricgroups such as
> Cache, TLB, Branch, InstructionsMix, and PEutilization will be added to
> further analysis of performance bottlenecks in the following patches.
> Reference from ARM PMU guide [2][3].
On 2023/1/3 7:52 PM, John Garry wrote:
> On 03/01/2023 11:39, Jing Zhang wrote:
>> The formula of topdown L1 on neoverse-n2 is from ARM sbsa7.0 platform
>> design document [0], D37-38.
>
> I think that I mentioned this before - if these metrics are coming from an sbsa doc, then they are standard. As such, we can make them "arch std events" and put them in a common json such as sbsa.json, so that other cores may reuse.
>
> You don't strictly have to do this now, but it would be better.
>

Hi John,

I would really like to do this, but as discussed earlier, slot is different on each architecture.
If I do not specify the value of the slot in sbsa.json, then in the json file of n2/v1, I need to
overwrite each topdown "MetricExpr". In other words, the metrics placed in the sbsa.json file only
reuse "BriefDescription", "MetricGroup" and "ScaleUnit". So I'm not sure if it's acceptable?

In addition, James mentioned that if the units and names and group names of different architectures
are not unified, it will become complicated. Perhaps we could do it later.

Thanks,
Jing
On 04/01/2023 05:05, Jing Zhang wrote:
>
>
> On 2023/1/3 7:52 PM, John Garry wrote:
>> On 03/01/2023 11:39, Jing Zhang wrote:
>>> The formula of topdown L1 on neoverse-n2 is from ARM sbsa7.0 platform
>>> design document [0], D37-38.
>>
>> I think that I mentioned this before - if these metrics are coming from an sbsa doc, then they are standard. As such, we can make them "arch std events" and put them in a common json such as sbsa.json, so that other cores may reuse.
>>
>> You don't strictly have to do this now, but it would be better.
>>
>
> Hi John,

Hi Jing,

>
> I would really like to do this, but as discussed earlier, slot is different on each architecture.
> If I do not specify the value of the slot in sbsa.json, then in the json file of n2/v1, I need to
> overwrite each topdown "MetricExpr". In other words, the metrics placed in the sbsa.json file only
> reuse "BriefDescription", "MetricGroup" and "ScaleUnit". So I'm not sure if it's acceptable?

I don't see a lot of value in that really.

However, for this value of slot, isn't this discoverable from a system register per core? Quoting the sbsa:
"The IMPLEMENTATION DEFINED constant SLOTS is discoverable from the system register PMMIR_EL1.SLOTS."

Did you consider how this could be used?

>
> In addition, James mentioned that if the units and names and group names of different architectures
> are not unified, it will become complicated.
>

Thanks,
John
On 2023/1/5 1:26 AM, John Garry wrote:
> On 04/01/2023 05:05, Jing Zhang wrote:
>>
>>
>> On 2023/1/3 7:52 PM, John Garry wrote:
>>> On 03/01/2023 11:39, Jing Zhang wrote:
>>>> The formula of topdown L1 on neoverse-n2 is from ARM sbsa7.0 platform
>>>> design document [0], D37-38.
>>>
>>> I think that I mentioned this before - if these metrics are coming from an sbsa doc, then they are standard. As such, we can make them "arch std events" and put them in a common json such as sbsa.json, so that other cores may reuse.
>>>
>>> You don't strictly have to do this now, but it would be better.
>>>
>>
>> Hi John,
>
> Hi Jing,
>
>>
>> I would really like to do this, but as discussed earlier, slot is different on each architecture.
>> If I do not specify the value of the slot in sbsa.json, then in the json file of n2/v1, I need to
>> overwrite each topdown "MetricExpr". In other words, the metrics placed in the sbsa.json file only
>> reuse "BriefDescription", "MetricGroup" and "ScaleUnit". So I'm not sure if it's acceptable?
>
> I don't see a lot of value in that really.
>
> However, for this value of slot, isn't this discoverable from a system register per core? Quoting the sbsa:
> "The IMPLEMENTATION DEFINED constant SLOTS is discoverable from the system register PMMIR_EL1.SLOTS."
>
> Did you consider how this could be used?
>

This may be a feasible idea. The value of slots comes from the register PMMIR_EL1, which I can read in
/sys/bus/event_source/device/armv8_pmuv3_*/caps/slots. But how do I replace the slots in MetricExpr with the
read slots values? Currently I understand that parameters in MetricExpr only support events and constants.
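
The sysfs lookup mentioned in this message could be sketched roughly as follows. This is only an illustration, not code from the patch: the function name `read_slots` is hypothetical, the base directory is assumed to be spelled `/sys/bus/event_source/devices` on a running system, and the file content is assumed to be an integer in decimal or `0x`-prefixed hexadecimal text. A reported value of zero is treated as "no usable slot count".

```python
import glob
import os


def read_slots(base="/sys/bus/event_source/devices"):
    """Return {pmu_name: slots} for each armv8_pmuv3_* PMU under `base`.

    `read_slots` is a hypothetical helper for illustration only.
    A value of None means the file reported zero (no usable slot count).
    """
    found = {}
    pattern = os.path.join(base, "armv8_pmuv3_*", "caps", "slots")
    for path in sorted(glob.glob(pattern)):
        # .../<pmu>/caps/slots -> <pmu>
        pmu = os.path.basename(os.path.dirname(os.path.dirname(path)))
        with open(path) as f:
            text = f.read().strip()
        # Assumption: the file holds decimal or 0x-prefixed hex text.
        value = int(text, 0)
        found[pmu] = value if value > 0 else None
    return found


if __name__ == "__main__":
    # Prints nothing on machines without an armv8_pmuv3_* PMU.
    for pmu, slots in read_slots().items():
        print(pmu, slots)
```

Passing the base directory as a parameter keeps the helper testable on machines that have no Arm PMU exposed in sysfs.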
On 05/01/2023 10:05, Jing Zhang wrote:
>> However, for this value of slot, isn't this discoverable from a system register per core? Quoting the sbsa:
>> "The IMPLEMENTATION DEFINED constant SLOTS is discoverable from the system register PMMIR_EL1.SLOTS."
>>
>> Did you consider how this could be used?
>>
>
> This may be a feasible idea. The value of slots comes from the register PMMIR_EL1, which I can read in
> /sys/bus/event_source/device/armv8_pmuv3_*/caps/slots. But how do I replace the slots in MetricExpr with the
> read slots values? Currently I understand that parameters in MetricExpr only support events and constants.
>

Maybe during runtime we could create a pseudo metric/event for SLOT. This metric would be created during init, and it always just returns the value which was read from PMMIR_EL1.

I'm not sure how well that would play with trying to resolve metrics when building generated pmu-events.c, but I don't think it's all too difficult to achieve.

Have you actually read this value for the n2 core? Does it look correct?

Thanks,
John
On 2023/1/5 6:13 PM, John Garry wrote:
> Maybe during runtime we could create a pseudo metric/event for SLOT. This metric would be created during init, and it always just returns the value which was read from PMMIR_EL1.
>
> I'm not sure how well that would play with trying to resolve metrics when building generated pmu-events.c, but I don't think it's all too difficult to achieve.
>

I'll try it in the v7 patch. I want to release the v6 patch first, to correct a mistake I made. :)

> Have you actually read this value for the n2 core? Does it look correct?

Yes, I read it on n2 and it has a value of 5, which is correct. If the STALL_SLOT event is not implemented, PMMIR_EL1.SLOTS might read as zero.
On Thu, Jan 5, 2023 at 2:13 AM John Garry <john.g.garry@oracle.com> wrote:
>
> On 05/01/2023 10:05, Jing Zhang wrote:
> >> However, for this value of slot, isn't this discoverable from a system register per core? Quoting the sbsa:
> >> "The IMPLEMENTATION DEFINED constant SLOTS is discoverable from the system register PMMIR_EL1.SLOTS."
> >>
> >> Did you consider how this could be used?
> >>
> >
> > This may be a feasible idea. The value of slots comes from the register PMMIR_EL1, which I can read in
> > /sys/bus/event_source/device/armv8_pmuv3_*/caps/slots. But how do I replace the slots in MetricExpr with the
> > read slots values? Currently I understand that parameters in MetricExpr only support events and constants.
> >
>
> Maybe during runtime we could create a pseudo metric/event for SLOT.

For Intel we do this by just having a different constant for each architecture. It is fairly easy to add a new "literal", so you could add a #slots in expr__get_literal:
https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/perf/util/expr.c?h=perf/core#n407

Populating it would be the challenge :-)

Thanks,
Ian

> This metric would be created during init, and it always just returns the
> value which was read from PMMIR_EL1.
>
> I'm not sure how well that would play with trying to resolve metrics
> when building generated pmu-events.c, but I don't think it's all too
> difficult to achieve.
>
> Have you actually read this value for the n2 core? Does it look correct?
>
> Thanks,
> John
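
The idea of a `#slots` literal resolved at runtime can be illustrated with a small stand-alone evaluator. This is a sketch in Python under stated assumptions and is not perf's expression parser: the function `evaluate`, the `literals` dictionary, and the counter values are all hypothetical, and only `+`, `-`, `*`, `/`, and parentheses are handled.

```python
import ast
import operator
import re

# Arithmetic operators accepted by this illustrative evaluator.
_BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expr, counters, literals):
    """Evaluate a metric expression (hypothetical helper).

    `counters` maps event names to counts; `literals` maps names used as
    "#name" in the expression to values discovered at runtime.
    """
    # Replace each "#name" literal with its numeric value before parsing.
    resolved = re.sub(r"#(\w+)", lambda m: repr(literals[m.group(1)]), expr)

    def walk(node):
        if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
            return _BINOPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            return counters[node.id]  # event names resolve to counter values
        raise ValueError("unsupported expression element")

    return walk(ast.parse(resolved, mode="eval").body)


if __name__ == "__main__":
    # Hypothetical counts; slots=5 is the value reported for n2 in this thread.
    expr = "(stall_slot_frontend - cpu_cycles) / (#slots * cpu_cycles)"
    counts = {"stall_slot_frontend": 2000, "cpu_cycles": 1000}
    print(evaluate(expr, counts, {"slots": 5}))
```

With a literal like this, one shared expression could serve several cores, and only the value bound to `slots` would differ per core.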
On 05/01/2023 21:13, Ian Rogers wrote:
>>> This may be a feasible idea. The value of slots comes from the register PMMIR_EL1, which I can read in
>>> /sys/bus/event_source/device/armv8_pmuv3_*/caps/slots. But how do I replace the slots in MetricExpr with the
>>> read slots values? Currently I understand that parameters in metricExpr only support events and constants.
>>>
>> Maybe during runtime we could create a pseudo metric/event for SLOT.
> For Intel we do this by just having a different constant for each
> architecture. It is fairly easy to add a new "literal", so you could
> add a #slots in expr__get_literal:
> https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/perf/util/expr.c?h=perf/core#n407
> Populating it would be the challenge
On 2023/1/6 6:14 PM, John Garry wrote:
> On 05/01/2023 21:13, Ian Rogers wrote:
>>>> This may be a feasible idea. The value of slots comes from the register PMMIR_EL1, which I can read in
>>>> /sys/bus/event_source/device/armv8_pmuv3_*/caps/slots. But how do I replace the slots in MetricExpr with the
>>>> read slots values? Currently I understand that parameters in metricExpr only support events and constants.
>>>>
>>> Maybe during runtime we could create a pseudo metric/event for SLOT.
>> For Intel we do this by just having a different constant for each
>> architecture. It is fairly easy to add a new "literal", so you could
>> add a #slots in expr__get_literal:
>> https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/perf/util/expr.c?h=perf/core#n407
>> Populating it would be the challenge
On 06/01/2023 10:14, John Garry wrote:
> On 05/01/2023 21:13, Ian Rogers wrote:
>>>> This may be a feasible idea. The value of slots comes from the
>>>> register PMMIR_EL1, which I can read in
>>>> /sys/bus/event_source/device/armv8_pmuv3_*/caps/slots. But how do I
>>>> replace the slots in MetricExpr with the
>>>> read slots values? Currently I understand that parameters in
>>>> metricExpr only support events and constants.
>>>>
>>> Maybe during runtime we could create a pseudo metric/event for SLOT.
>> For Intel we do this by just having a different constant for each
>> architecture. It is fairly easy to add a new "literal", so you could
>> add a #slots in expr__get_literal:
>> https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/perf/util/expr.c?h=perf/core#n407
>> Populating it would be the challenge
On Mon, Jan 9, 2023 at 7:35 AM James Clark <james.clark@arm.com> wrote:
>
>
>
> On 06/01/2023 10:14, John Garry wrote:
> > On 05/01/2023 21:13, Ian Rogers wrote:
> >>>> This may be a feasible idea. The value of slots comes from the
> >>>> register PMMIR_EL1, which I can read in
> >>>> /sys/bus/event_source/device/armv8_pmuv3_*/caps/slots. But how do I
> >>>> replace the slots in MetricExpr with the
> >>>> read slots values? Currently I understand that parameters in
> >>>> metricExpr only support events and constants.
> >>>>
> >>> Maybe during runtime we could create a pseudo metric/event for SLOT.
> >> For Intel we do this by just having a different constant for each
> >> architecture. It is fairly easy to add a new "literal", so you could
> >> add a #slots in expr__get_literal:
> >> https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/perf/util/expr.c?h=perf/core#n407
> >> Populating it would be the challenge
diff --git a/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
new file mode 100644
index 0000000..c126f1bc
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-n2/metrics.json
@@ -0,0 +1,30 @@
+[
+    {
+        "MetricExpr": "(stall_slot_frontend - cpu_cycles) / (5 * cpu_cycles)",
+        "BriefDescription": "Frontend bound L1 topdown metric",
+        "MetricGroup": "TopdownL1",
+        "MetricName": "frontend_bound",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "(1 - op_retired / op_spec) * (1 - (stall_slot - cpu_cycles) / (5 * cpu_cycles))",
+        "BriefDescription": "Bad speculation L1 topdown metric",
+        "MetricGroup": "TopdownL1",
+        "MetricName": "bad_speculation",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "(op_retired / op_spec) * (1 - (stall_slot - cpu_cycles) / (5 * cpu_cycles))",
+        "BriefDescription": "Retiring L1 topdown metric",
+        "MetricGroup": "TopdownL1",
+        "MetricName": "retiring",
+        "ScaleUnit": "100%"
+    },
+    {
+        "MetricExpr": "stall_slot_backend / (5 * cpu_cycles)",
+        "BriefDescription": "Backend Bound L1 topdown metric",
+        "MetricGroup": "TopdownL1",
+        "MetricName": "backend_bound",
+        "ScaleUnit": "100%"
+    }
+]
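
To see how the four `MetricExpr` formulas in this patch relate to each other, they can be evaluated by hand on made-up counter values. The sketch below transcribes the expressions into Python; the function name `topdown_l1` and all counts are hypothetical, and the default of 5 slots is the constant hard-coded in the patch for neoverse-n2.

```python
SLOTS = 5  # constant used in the patch's MetricExpr strings for neoverse-n2


def topdown_l1(cpu_cycles, stall_slot, stall_slot_frontend,
               stall_slot_backend, op_retired, op_spec, slots=SLOTS):
    """Compute the TopdownL1 metrics from the patch (hypothetical helper)."""
    total_slots = slots * cpu_cycles
    # Per the commit message, cpu_cycles is subtracted from stall_slot and
    # stall_slot_frontend to correct the miscount on neoverse-n2.
    not_stalled = 1 - (stall_slot - cpu_cycles) / total_slots
    retire_ratio = op_retired / op_spec
    return {
        "frontend_bound": (stall_slot_frontend - cpu_cycles) / total_slots,
        "bad_speculation": (1 - retire_ratio) * not_stalled,
        "retiring": retire_ratio * not_stalled,
        "backend_bound": stall_slot_backend / total_slots,
    }


if __name__ == "__main__":
    # Made-up counts chosen so that the corrected stall_slot equals the
    # corrected frontend stalls plus the backend stalls.
    metrics = topdown_l1(cpu_cycles=1000, stall_slot=3000,
                         stall_slot_frontend=2000, stall_slot_backend=1000,
                         op_retired=800, op_spec=1000)
    for name, value in metrics.items():
        print(f"{name}: {value:.1%}")
```

For these particular made-up counts the four fractions add up to approximately 1, which is what one would expect when the corrected `stall_slot` equals the corrected frontend stalls plus the backend stalls.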