Message ID | 20220125192016.20538-2-alisaidi@amazon.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Allow perf scripts to process SPE raw data | expand |
Hi Ali, On 25/01/2022 19:20, Ali Saidi wrote: > Providing the arm_spe_record as raw data to the synthesized SPE samples > allows perf scripts to read and separately process the data in ways > existing perf tools don't support and mirrors functionality available > for PEBS. > Signed-off-by: Ali Saidi <alisaidi@amazon.com> > --- > tools/perf/util/arm-spe.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c > index d2b64e3f588b..a7499cde6fc0 100644 > --- a/tools/perf/util/arm-spe.c > +++ b/tools/perf/util/arm-spe.c > @@ -336,6 +336,8 @@ static int arm_spe__synth_mem_sample(struct arm_spe_queue *speq, > sample.phys_addr = record->phys_addr; > sample.data_src = data_src; > sample.weight = record->latency; > + sample.raw_size = sizeof(*record); > + sample.raw_data = record; Have you tried this with perf-inject? I think it would need the PERF_SAMPLE_RAW bit in the sample_type, Although I quickly looked over the perf inject code and it looks like it's expecting some type of padding: // synthetic-events.c if (type & PERF_SAMPLE_RAW) { result += sizeof(u32); result += sample->raw_size; } I'm seeing some comments in utils/event.h related to this on the intel events. > > return arm_spe_deliver_synth_event(spe, speq, event, &sample); > } > @@ -354,6 +356,8 @@ static int arm_spe__synth_branch_sample(struct arm_spe_queue *speq, > sample.stream_id = spe_events_id; > sample.addr = record->to_ip; > sample.weight = record->latency; > + sample.raw_size = sizeof(*record); > + sample.raw_data = record; > > return arm_spe_deliver_synth_event(spe, speq, event, &sample); > } > @@ -383,6 +387,8 @@ static int arm_spe__synth_instruction_sample(struct arm_spe_queue *speq, > sample.data_src = data_src; > sample.period = spe->instructions_sample_period; > sample.weight = record->latency; > + sample.raw_size = sizeof(*record); > + sample.raw_data = record; > > return arm_spe_deliver_synth_event(spe, speq, event, &sample); > }
Hi German, >Hi Ali, > >On 25/01/2022 19:20, Ali Saidi wrote: >> Providing the arm_spe_record as raw data to the synthesized SPE samples >> allows perf scripts to read and separately process the data in ways >> existing perf tools don't support and mirrors functionality available >> for PEBS. >> Signed-off-by: Ali Saidi <alisaidi@amazon.com> >> --- >> tools/perf/util/arm-spe.c | 6 ++++++ >> 1 file changed, 6 insertions(+) >> >> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c >> index d2b64e3f588b..a7499cde6fc0 100644 >> --- a/tools/perf/util/arm-spe.c >> +++ b/tools/perf/util/arm-spe.c >> @@ -336,6 +336,8 @@ static int arm_spe__synth_mem_sample(struct arm_spe_queue *speq, >> sample.phys_addr = record->phys_addr; >> sample.data_src = data_src; >> sample.weight = record->latency; >> + sample.raw_size = sizeof(*record); >> + sample.raw_data = record; > >Have you tried this with perf-inject? I think it would need the PERF_SAMPLE_RAW bit in the sample_type, Yes I've tried the following and it worked as expected with the original perf.data or the perf.data.jitted after perf-inject. perf record -e arm_spe_0/jitter=1/ -k 1 java ... perf inject -f --jit -i perf.data -o perf.data.jitted perf script -i perf.data -s t1.py --itrace=i1i > >Although I quickly looked over the perf inject code and it looks like it's expecting some type of padding: > > // synthetic-events.c > if (type & PERF_SAMPLE_RAW) { > result += sizeof(u32); > result += sample->raw_size; > } > >I'm seeing some comments in utils/event.h related to this on the intel events. Yes i noticed this too,but looking at how the raw data is added to the same other places like intel-pt.c:1703 the perf_synth__raw*() functions are used to strip away the 4 bytes bytes before the data is added to the sample. The other places i can find the padding used is in builtin-script.c but given we have the --dump-raw-trace option it's not clear to me that it's needed to wrap the arm_spe_event in another struct with padding like perf_synth_intel_ptwrite? Thanks, Ali
On 26/01/2022 15:58, Ali Saidi wrote: > Hi German, >> Hi Ali, >> >> On 25/01/2022 19:20, Ali Saidi wrote: >>> [...] >>> >>> + sample.raw_size = sizeof(*record); >>> + sample.raw_data = record; >> Have you tried this with perf-inject? I think it would need the PERF_SAMPLE_RAW bit in the sample_type, > Yes I've tried the following and it worked as expected with the original > perf.data or the perf.data.jitted after perf-inject. > > perf record -e arm_spe_0/jitter=1/ -k 1 java ... > perf inject -f --jit -i perf.data -o perf.data.jitted This is not injecting the synthesized samples. I think it is still processing from the aux trace. Try adding "--itrace=i1i --strip" to the inject command to remove the AUXTRACE events. Judging by the raw samples, the data is missing: 11 20005239445831 0x3510 [0x94]: PERF_RECORD_SAMPLE(IP, 0x1): 47670/47670: 0xffffab6e02d1b320 period: 1 addr: 0 . ... raw event: size 64 bytes . 0000: 09 00 00 00 01 00 40 00 f3 10 9b 3b 00 00 00 00 ......@....;.... . 0010: 98 c5 e3 02 6e ab ff ff 36 ba 00 00 36 ba 00 00 ....n...6...6... . 0020: 35 da 30 d5 31 12 00 00 0b 00 00 00 00 00 00 00 5.0.1........... . 0030: 01 00 00 00 00 00 00 00 82 01 00 88 00 00 00 00 ................ vs when adding PERF_SAMPLE_RAW to attr.sample_type: . ... raw event: size 148 bytes . 0000: 09 00 00 00 01 00 94 00 f3 10 9b 3b 00 00 00 00 ...........;.... . 0010: 98 c5 e3 02 6e ab ff ff 36 ba 00 00 36 ba 00 00 ....n...6...6... [...] . 0080: 00 00 00 00 2a 00 00 00 00 00 00 00 82 01 00 88 ....*........... . 0090: 00 00 00 00 > perf script -i perf.data -s t1.py --itrace=i1i > >> Although I quickly looked over the perf inject code and it looks like it's expecting some type of padding: >> >> [...] >> >> I'm seeing some comments in utils/event.h related to this on the intel events. > Yes i noticed this too,but looking at how the raw data is added to the same > other places like intel-pt.c:1703 the perf_synth__raw*() functions are used to > strip away the 4 bytes bytes before the data is added to the sample. The other > places i can find the padding used is in builtin-script.c but given we have the > --dump-raw-trace option it's not clear to me that it's needed to wrap the > arm_spe_event in another struct with padding like perf_synth_intel_ptwrite? I think the intel use case makes sense because the layout of the data is fixed and documented. If we modify the struct arm_spe_record later it may not be obvious how to match it to the raw data of an older perf.data file. And we're generating bigger files with redundant information. Thanks, German > > Thanks, > Ali >
On 26/01/2022 19:07, German Gomez wrote: [...] >>> Have you tried this with perf-inject? I think it would need the PERF_SAMPLE_RAW bit in the sample_type, >> Yes I've tried the following and it worked as expected with the original >> perf.data or the perf.data.jitted after perf-inject. >> >> perf record -e arm_spe_0/jitter=1/ -k 1 java ... >> perf inject -f --jit -i perf.data -o perf.data.jitted > >This is not injecting the synthesized samples. I think it is still >processing from the aux trace. Try adding "--itrace=i1i --strip" to the >inject command to remove the AUXTRACE events. Judging by the raw >samples, the data is missing: > > [...] Yep, you're correct here. If I use the command above the raw samples are lost. >>> Although I quickly looked over the perf inject code and it looks like it's expecting some type of padding: >>> >>> [...] >>> >>> I'm seeing some comments in utils/event.h related to this on the intel events. >> Yes i noticed this too,but looking at how the raw data is added to the same >> other places like intel-pt.c:1703 the perf_synth__raw*() functions are used to >> strip away the 4 bytes bytes before the data is added to the sample. The other >> places i can find the padding used is in builtin-script.c but given we have the >> --dump-raw-trace option it's not clear to me that it's needed to wrap the >> arm_spe_event in another struct with padding like perf_synth_intel_ptwrite? > >I think the intel use case makes sense because the layout of the data >is fixed and documented. If we modify the struct arm_spe_record later it >may not be obvious how to match it to the raw data of an older perf.data >file. And we're generating bigger files with redundant information. Not injecting the samples into the perf trace, but having a way to support custom scripts parsing the data would be really useful and much faster than trying to parse back the --dump-raw-trace output into something useful. The other way to go would be to put a header that describes the version of the spe struct at the head of it to address any future changes, but I'm not familiar with a workflow that would benefit from the added complexity. Thanks, Ali
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c index d2b64e3f588b..a7499cde6fc0 100644 --- a/tools/perf/util/arm-spe.c +++ b/tools/perf/util/arm-spe.c @@ -336,6 +336,8 @@ static int arm_spe__synth_mem_sample(struct arm_spe_queue *speq, sample.phys_addr = record->phys_addr; sample.data_src = data_src; sample.weight = record->latency; + sample.raw_size = sizeof(*record); + sample.raw_data = record; return arm_spe_deliver_synth_event(spe, speq, event, &sample); } @@ -354,6 +356,8 @@ static int arm_spe__synth_branch_sample(struct arm_spe_queue *speq, sample.stream_id = spe_events_id; sample.addr = record->to_ip; sample.weight = record->latency; + sample.raw_size = sizeof(*record); + sample.raw_data = record; return arm_spe_deliver_synth_event(spe, speq, event, &sample); } @@ -383,6 +387,8 @@ static int arm_spe__synth_instruction_sample(struct arm_spe_queue *speq, sample.data_src = data_src; sample.period = spe->instructions_sample_period; sample.weight = record->latency; + sample.raw_size = sizeof(*record); + sample.raw_data = record; return arm_spe_deliver_synth_event(spe, speq, event, &sample); }
Providing the arm_spe_record as raw data to the synthesized SPE samples allows perf scripts to read and separately process the data in ways existing perf tools don't support and mirrors functionality available for PEBS. Signed-off-by: Ali Saidi <alisaidi@amazon.com> --- tools/perf/util/arm-spe.c | 6 ++++++ 1 file changed, 6 insertions(+)