diff mbox series

[1/2] perf arm-spe: Add arm_spe_record to synthesized sample

Message ID 20220125192016.20538-2-alisaidi@amazon.com (mailing list archive)
State New, archived
Headers show
Series Allow perf scripts to process SPE raw data | expand

Commit Message

Ali Saidi Jan. 25, 2022, 7:20 p.m. UTC
Providing the arm_spe_record as raw data to the synthesized SPE samples
allows perf scripts to read and separately process the data in ways
existing perf tools don't support and mirrors functionality available
for PEBS.

Signed-off-by: Ali Saidi <alisaidi@amazon.com>
---
 tools/perf/util/arm-spe.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

German Gomez Jan. 25, 2022, 8:47 p.m. UTC | #1
Hi Ali,

On 25/01/2022 19:20, Ali Saidi wrote:
> Providing the arm_spe_record as raw data to the synthesized SPE samples
> allows perf scripts to read and separately process the data in ways
> existing perf tools don't support and mirrors functionality available
> for PEBS.
> Signed-off-by: Ali Saidi <alisaidi@amazon.com>
> ---
>  tools/perf/util/arm-spe.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
> index d2b64e3f588b..a7499cde6fc0 100644
> --- a/tools/perf/util/arm-spe.c
> +++ b/tools/perf/util/arm-spe.c
> @@ -336,6 +336,8 @@ static int arm_spe__synth_mem_sample(struct arm_spe_queue *speq,
>  	sample.phys_addr = record->phys_addr;
>  	sample.data_src = data_src;
>  	sample.weight = record->latency;
> +	sample.raw_size = sizeof(*record);
> +	sample.raw_data = record;

Have you tried this with perf-inject? I think it would need the PERF_SAMPLE_RAW bit in the sample_type,

Although I quickly looked over the perf inject code and it looks like it's expecting some type of padding:

  // synthetic-events.c
  if (type & PERF_SAMPLE_RAW) {
    result += sizeof(u32);
    result += sample->raw_size;
  }

I'm seeing some comments in utils/event.h related to this on the intel events.

>  
>  	return arm_spe_deliver_synth_event(spe, speq, event, &sample);
>  }
> @@ -354,6 +356,8 @@ static int arm_spe__synth_branch_sample(struct arm_spe_queue *speq,
>  	sample.stream_id = spe_events_id;
>  	sample.addr = record->to_ip;
>  	sample.weight = record->latency;
> +	sample.raw_size = sizeof(*record);
> +	sample.raw_data = record;
>  
>  	return arm_spe_deliver_synth_event(spe, speq, event, &sample);
>  }
> @@ -383,6 +387,8 @@ static int arm_spe__synth_instruction_sample(struct arm_spe_queue *speq,
>  	sample.data_src = data_src;
>  	sample.period = spe->instructions_sample_period;
>  	sample.weight = record->latency;
> +	sample.raw_size = sizeof(*record);
> +	sample.raw_data = record;
>  
>  	return arm_spe_deliver_synth_event(spe, speq, event, &sample);
>  }
Ali Saidi Jan. 26, 2022, 3:58 p.m. UTC | #2
Hi German,
>Hi Ali,
>
>On 25/01/2022 19:20, Ali Saidi wrote:
>> Providing the arm_spe_record as raw data to the synthesized SPE samples
>> allows perf scripts to read and separately process the data in ways
>> existing perf tools don't support and mirrors functionality available
>> for PEBS.
>> Signed-off-by: Ali Saidi <alisaidi@amazon.com>
>> ---
>>  tools/perf/util/arm-spe.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
>> index d2b64e3f588b..a7499cde6fc0 100644
>> --- a/tools/perf/util/arm-spe.c
>> +++ b/tools/perf/util/arm-spe.c
>> @@ -336,6 +336,8 @@ static int arm_spe__synth_mem_sample(struct arm_spe_queue *speq,
>>  	sample.phys_addr = record->phys_addr;
>>  	sample.data_src = data_src;
>>  	sample.weight = record->latency;
>> +	sample.raw_size = sizeof(*record);
>> +	sample.raw_data = record;
>
>Have you tried this with perf-inject? I think it would need the PERF_SAMPLE_RAW bit in the sample_type,

Yes I've tried the following and it worked as expected with the original
perf.data or the perf.data.jitted after perf-inject. 

perf record -e arm_spe_0/jitter=1/ -k 1 java ...
perf  inject -f --jit -i perf.data -o perf.data.jitted
perf script -i perf.data -s t1.py --itrace=i1i

>
>Although I quickly looked over the perf inject code and it looks like it's expecting some type of padding:
>
>  // synthetic-events.c
>  if (type & PERF_SAMPLE_RAW) {
>    result += sizeof(u32);
>    result += sample->raw_size;
>  }
>
>I'm seeing some comments in utils/event.h related to this on the intel events.

Yes i noticed this too,but looking at how the raw data is added to the same
other places like intel-pt.c:1703 the perf_synth__raw*() functions are used to
strip away the 4 bytes bytes before the data is added to the sample. The other
places i can find the padding used is in builtin-script.c but given we have the
--dump-raw-trace option it's not clear to me that it's needed to wrap the
arm_spe_event in another struct with padding like perf_synth_intel_ptwrite?

Thanks,
Ali
German Gomez Jan. 26, 2022, 7:07 p.m. UTC | #3
On 26/01/2022 15:58, Ali Saidi wrote:
> Hi German,
>> Hi Ali,
>>
>> On 25/01/2022 19:20, Ali Saidi wrote:
>>> [...]
>>>
>>> +	sample.raw_size = sizeof(*record);
>>> +	sample.raw_data = record;
>> Have you tried this with perf-inject? I think it would need the PERF_SAMPLE_RAW bit in the sample_type,
> Yes I've tried the following and it worked as expected with the original
> perf.data or the perf.data.jitted after perf-inject. 
>
> perf record -e arm_spe_0/jitter=1/ -k 1 java ...
> perf  inject -f --jit -i perf.data -o perf.data.jitted

This is not injecting the synthesized samples. I think it is still    
processing from the aux trace. Try adding "--itrace=i1i --strip" to the
inject command to remove the AUXTRACE events. Judging by the raw
samples, the data is missing:

11 20005239445831 0x3510 [0x94]: PERF_RECORD_SAMPLE(IP, 0x1): 47670/47670: 0xffffab6e02d1b320 period: 1 addr: 0

. ... raw event: size 64 bytes
.  0000:  09 00 00 00 01 00 40 00 f3 10 9b 3b 00 00 00 00  ......@....;....
.  0010:  98 c5 e3 02 6e ab ff ff 36 ba 00 00 36 ba 00 00  ....n...6...6...
.  0020:  35 da 30 d5 31 12 00 00 0b 00 00 00 00 00 00 00  5.0.1...........
.  0030:  01 00 00 00 00 00 00 00 82 01 00 88 00 00 00 00  ................

vs when adding PERF_SAMPLE_RAW to attr.sample_type:      

. ... raw event: size 148 bytes
.  0000:  09 00 00 00 01 00 94 00 f3 10 9b 3b 00 00 00 00  ...........;....
.  0010:  98 c5 e3 02 6e ab ff ff 36 ba 00 00 36 ba 00 00  ....n...6...6...
[...]
.  0080:  00 00 00 00 2a 00 00 00 00 00 00 00 82 01 00 88  ....*...........
.  0090:  00 00 00 00

> perf script -i perf.data -s t1.py --itrace=i1i
>
>> Although I quickly looked over the perf inject code and it looks like it's expecting some type of padding:
>>
>> [...]
>>
>> I'm seeing some comments in utils/event.h related to this on the intel events.
> Yes i noticed this too,but looking at how the raw data is added to the same
> other places like intel-pt.c:1703 the perf_synth__raw*() functions are used to
> strip away the 4 bytes bytes before the data is added to the sample. The other
> places i can find the padding used is in builtin-script.c but given we have the
> --dump-raw-trace option it's not clear to me that it's needed to wrap the
> arm_spe_event in another struct with padding like perf_synth_intel_ptwrite?

I think the intel use case makes sense because the layout of the data
is fixed and documented. If we modify the struct arm_spe_record later it
may not be obvious how to match it to the raw data of an older perf.data
file. And we're generating bigger files with redundant information.

Thanks,
German

>
> Thanks,
> Ali
>
Ali Saidi Jan. 27, 2022, 7:13 p.m. UTC | #4
On 26/01/2022 19:07, German Gomez wrote:
[...]
>>> Have you tried this with perf-inject? I think it would need the PERF_SAMPLE_RAW bit in the sample_type,
>> Yes I've tried the following and it worked as expected with the original
>> perf.data or the perf.data.jitted after perf-inject. 
>>
>> perf record -e arm_spe_0/jitter=1/ -k 1 java ...
>> perf  inject -f --jit -i perf.data -o perf.data.jitted
>
>This is not injecting the synthesized samples. I think it is still    
>processing from the aux trace. Try adding "--itrace=i1i --strip" to the
>inject command to remove the AUXTRACE events. Judging by the raw
>samples, the data is missing:
>
> [...]

Yep, you're correct here. If I use the command above the raw samples are lost.

>>> Although I quickly looked over the perf inject code and it looks like it's expecting some type of padding:
>>>
>>> [...]
>>>
>>> I'm seeing some comments in utils/event.h related to this on the intel events.
>> Yes i noticed this too,but looking at how the raw data is added to the same
>> other places like intel-pt.c:1703 the perf_synth__raw*() functions are used to
>> strip away the 4 bytes bytes before the data is added to the sample. The other
>> places i can find the padding used is in builtin-script.c but given we have the
>> --dump-raw-trace option it's not clear to me that it's needed to wrap the
>> arm_spe_event in another struct with padding like perf_synth_intel_ptwrite?
>
>I think the intel use case makes sense because the layout of the data
>is fixed and documented. If we modify the struct arm_spe_record later it
>may not be obvious how to match it to the raw data of an older perf.data
>file. And we're generating bigger files with redundant information.

Not injecting the samples into the perf trace, but having a way to support
custom scripts parsing the data would be really useful and much faster than
trying to parse back the --dump-raw-trace output into something useful. The
other way to go would be to put a header that describes the version of the spe
struct at the head of it to address any future changes, but I'm not familiar
with a workflow that would benefit from the added complexity. 

Thanks,
Ali
diff mbox series

Patch

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index d2b64e3f588b..a7499cde6fc0 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -336,6 +336,8 @@  static int arm_spe__synth_mem_sample(struct arm_spe_queue *speq,
 	sample.phys_addr = record->phys_addr;
 	sample.data_src = data_src;
 	sample.weight = record->latency;
+	sample.raw_size = sizeof(*record);
+	sample.raw_data = record;
 
 	return arm_spe_deliver_synth_event(spe, speq, event, &sample);
 }
@@ -354,6 +356,8 @@  static int arm_spe__synth_branch_sample(struct arm_spe_queue *speq,
 	sample.stream_id = spe_events_id;
 	sample.addr = record->to_ip;
 	sample.weight = record->latency;
+	sample.raw_size = sizeof(*record);
+	sample.raw_data = record;
 
 	return arm_spe_deliver_synth_event(spe, speq, event, &sample);
 }
@@ -383,6 +387,8 @@  static int arm_spe__synth_instruction_sample(struct arm_spe_queue *speq,
 	sample.data_src = data_src;
 	sample.period = spe->instructions_sample_period;
 	sample.weight = record->latency;
+	sample.raw_size = sizeof(*record);
+	sample.raw_data = record;
 
 	return arm_spe_deliver_synth_event(spe, speq, event, &sample);
 }