Message ID | 20210624164303.28632-2-james.clark@arm.com (mailing list archive) |
---|---|
State | New, archived |
Series | perf cs-etm: Split Coresight decode by aux records |
On Thu, Jun 24, 2021 at 05:43:02PM +0100, James Clark wrote:
> Populate the auxtrace queues using AUX records rather than whole
> auxtrace buffers so that the decoder is reset between each aux record.

[...]

> Signed-off-by: James Clark <james.clark@arm.com>

I tested this patch with kprobe/uprobe events; and confirmed this patch
works as expected. So:

Tested-by: Leo Yan <leo.yan@linaro.org>

Below shares my testing steps.

- The rationale for the testing is to use "perf probe" to add trace
  points for PERF_RECORD_AUX/PERF_RECORD_AUXTRACE, and add another
  uprobe event for parsing AUX fragment. So we can compare the recorded
  PERF_RECORD_AUX events, and check if AUX fragment can reorganize the
  perf AUX trace data correctly or not.

- The testing script test_cs_etm_snapshot.sh:
  http://paste.debian.net/1202563/

- Add kprobe event for perf_event_aux_event() so that it can trace the
  perf event PERF_RECORD_AUX:

    # perf probe --add "perf_event_aux_event head=head:x64 size=size:x64"

- Add uprobe event for __auxtrace_mmap__read() so that it can trace the
  perf event PERF_RECORD_AUXTRACE:

    # perf probe -x /mnt/linux-kernel/linux-cs-dev/tools/perf/perf \
        --add "__auxtrace_mmap__read:58 idx=mm->idx head=head:x64 offset=offset:x64 size=size:x64"

- Add uprobe event for cs_etm__queue_aux_fragment():

    # perf probe -x /mnt/linux-kernel/linux-cs-dev/tools/perf/perf \
        --add "cs_etm__queue_aux_fragment:63 aux_offset=aux_offset:x64 aux_size=aux_event->aux_size:x64"

- Enable trace events:

    # cd /sys/kernel/debug/tracing
    # echo 1 > events/probe/enable
    # echo 1 > events/probe_perf/enable

- Executed the testing script and perf report command:

    # sh test_cs_etm_snapshot.sh
    [ perf record: Woken up 7 times to write data ]
    [ perf record: Captured and wrote 19.534 MB /tmp/__perf_test.perf.data.miRDs ]

    # perf report -i /tmp/__perf_test.perf.data.miRDs

- At the end, I can get the tracing result:

    # cd /sys/kernel/debug/tracing
    # cat trace

    # entries-in-buffer/entries-written: 21/21   #P:6
    #
    #                              _-----=> irqs-off
    #                             / _----=> need-resched
    #                            | / _---=> hardirq/softirq
    #                            || / _--=> preempt-depth
    #                            ||| /     delay
    #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
    #              | |       |   ||||       |         |
                  dd-2206  [001] d.h3  1395.922396: perf_event_aux_event: (perf_event_aux_event+0x0/0x108) head=0x400000 size=0x400000
                perf-2204  [002] ....  1395.922565: __auxtrace_mmap__read_L58: (0xaaaaab154394) idx=1 head=0x400000 offset=0x0 size=0x400000
                  dd-2206  [001] d.h3  1395.942658: perf_event_aux_event: (perf_event_aux_event+0x0/0x108) head=0x50b3f0 size=0x10b3f0
                perf-2204  [002] ....  1395.942739: __auxtrace_mmap__read_L58: (0xaaaaab154394) idx=1 head=0x50b3f0 offset=0x400000 size=0x10b3f0
                  dd-2206  [001] d.h3  1396.000040: perf_event_aux_event: (perf_event_aux_event+0x0/0x108) head=0x90b3f0 size=0x400000
                perf-2204  [002] ....  1396.000150: __auxtrace_mmap__read_L58: (0xaaaaab154394) idx=1 head=0x90b3f0 offset=0x50b3f0 size=0x400000
                  dd-2206  [001] d.h3  1396.023098: perf_event_aux_event: (perf_event_aux_event+0x0/0x108) head=0xb87ae0 size=0x27c6f0
                perf-2204  [002] ....  1396.023197: __auxtrace_mmap__read_L58: (0xaaaaab154394) idx=1 head=0xb87ae0 offset=0x90b3f0 size=0x27c6f0
                  dd-2206  [001] d.h3  1396.087722: perf_event_aux_event: (perf_event_aux_event+0x0/0x108) head=0xf87ae0 size=0x400000
                perf-2204  [002] ....  1396.087838: __auxtrace_mmap__read_L58: (0xaaaaab154394) idx=1 head=0xf87ae0 offset=0xb87ae0 size=0x400000
                  dd-2206  [001] d.h3  1396.131696: perf_event_aux_event: (perf_event_aux_event+0x0/0x108) head=0x1387ae0 size=0x400000
                perf-2204  [002] ....  1396.131808: __auxtrace_mmap__read_L58: (0xaaaaab154394) idx=1 head=0x1387ae0 offset=0xf87ae0 size=0x400000
                perf-2217  [003] ....  1439.581363: cs_etm__queue_aux_fragment_L63: (0xaaaac9582ba8) aux_offset=0x0 aux_size=0x400000
                perf-2217  [003] ....  1439.581467: cs_etm__queue_aux_fragment_L63: (0xaaaac9582ba8) aux_offset=0x400000 aux_size=0x10b3f0
                perf-2217  [003] ....  1439.581510: cs_etm__queue_aux_fragment_L63: (0xaaaac9582ba8) aux_offset=0x50b3f0 aux_size=0x400000
                perf-2217  [003] ....  1439.581553: cs_etm__queue_aux_fragment_L63: (0xaaaac9582ba8) aux_offset=0x90b3f0 aux_size=0x27c6f0
                perf-2217  [003] ....  1439.581595: cs_etm__queue_aux_fragment_L63: (0xaaaac9582ba8) aux_offset=0xb87ae0 aux_size=0x400000
                perf-2217  [003] ....  1439.581638: cs_etm__queue_aux_fragment_L63: (0xaaaac9582ba8) aux_offset=0xf87ae0 aux_size=0x400000

We can see the recording AUX trace data in perf_event_aux_event() can be
decoded properly in events cs_etm__queue_aux_fragment_L63().

Thanks,
Leo
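
The comparison described above can also be done mechanically. The following is a minimal sketch, not part of the patch or of the test script: it hard-codes the six head/size pairs printed by the perf_event_aux_event kprobe and the six aux_offset/aux_size pairs printed by the cs_etm__queue_aux_fragment uprobe, and assumes (based on the snapshot-mode branch of the patch, where the size is subtracted from the reported offset) that each fragment should start at head - size. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One (head, size) pair as printed by the perf_event_aux_event kprobe. */
struct aux_record {
	uint64_t head;
	uint64_t size;
};

/* One (aux_offset, aux_size) pair as printed by the fragment uprobe. */
struct aux_fragment {
	uint64_t offset;
	uint64_t size;
};

/* Values copied from the trace output above, in order of appearance. */
static const struct aux_record trace_records[] = {
	{ 0x400000,  0x400000 },
	{ 0x50b3f0,  0x10b3f0 },
	{ 0x90b3f0,  0x400000 },
	{ 0xb87ae0,  0x27c6f0 },
	{ 0xf87ae0,  0x400000 },
	{ 0x1387ae0, 0x400000 },
};

static const struct aux_fragment trace_fragments[] = {
	{ 0x0,      0x400000 },
	{ 0x400000, 0x10b3f0 },
	{ 0x50b3f0, 0x400000 },
	{ 0x90b3f0, 0x27c6f0 },
	{ 0xb87ae0, 0x400000 },
	{ 0xf87ae0, 0x400000 },
};

#define NR_RECORDS   (sizeof(trace_records) / sizeof(trace_records[0]))
#define NR_FRAGMENTS (sizeof(trace_fragments) / sizeof(trace_fragments[0]))

/*
 * Count the records whose queued fragment does not cover exactly
 * [head - size, head). Assumption: in snapshot mode the reported
 * head is the end of the recorded data.
 */
static size_t count_mismatches(void)
{
	size_t i, bad = 0;

	assert(NR_RECORDS == NR_FRAGMENTS);

	for (i = 0; i < NR_RECORDS; i++) {
		uint64_t start = trace_records[i].head - trace_records[i].size;

		if (trace_fragments[i].offset != start ||
		    trace_fragments[i].size != trace_records[i].size)
			bad++;
	}
	return bad;
}
```

Calling count_mismatches() from a small main() and checking for a zero result is enough to confirm that every fragment in this trace lines up with its AUX record.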
On Thu, Jun 24, 2021 at 05:43:02PM +0100, James Clark wrote:
> Populate the auxtrace queues using AUX records rather than whole
> auxtrace buffers so that the decoder is reset between each aux record.

[...]

> Signed-off-by: James Clark <james.clark@arm.com>
> ---
>  tools/perf/util/cs-etm.c | 168 ++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 167 insertions(+), 1 deletion(-)

Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 64536a6ed10a..88e8122f73c9 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -2679,6 +2679,172 @@ static u64 *cs_etm__create_meta_blk(u64 *buff_in, int *buff_in_offset,
 	return metadata;
 }
 
+/**
+ * Puts a fragment of an auxtrace buffer into the auxtrace queues based
+ * on the bounds of aux_event, if it matches with the buffer that's at
+ * file_offset.
+ *
+ * Normally, whole auxtrace buffers would be added to the queue. But we
+ * want to reset the decoder for every PERF_RECORD_AUX event, and the decoder
+ * is reset across each buffer, so splitting the buffers up in advance has
+ * the same effect.
+ */
+static int cs_etm__queue_aux_fragment(struct perf_session *session, off_t file_offset, size_t sz,
+				      struct perf_record_aux *aux_event, struct perf_sample *sample)
+{
+	int err;
+	char buf[PERF_SAMPLE_MAX_SIZE];
+	union perf_event *auxtrace_event_union;
+	struct perf_record_auxtrace *auxtrace_event;
+	union perf_event auxtrace_fragment;
+	__u64 aux_offset, aux_size;
+
+	struct cs_etm_auxtrace *etm = container_of(session->auxtrace,
+						   struct cs_etm_auxtrace,
+						   auxtrace);
+
+	/*
+	 * There should be a PERF_RECORD_AUXTRACE event at the file_offset that we got
+	 * from looping through the auxtrace index.
+	 */
+	err = perf_session__peek_event(session, file_offset, buf,
+				       PERF_SAMPLE_MAX_SIZE, &auxtrace_event_union, NULL);
+	if (err)
+		return err;
+	auxtrace_event = &auxtrace_event_union->auxtrace;
+	if (auxtrace_event->header.type != PERF_RECORD_AUXTRACE)
+		return -EINVAL;
+
+	if (auxtrace_event->header.size < sizeof(struct perf_record_auxtrace) ||
+		auxtrace_event->header.size != sz) {
+		return -EINVAL;
+	}
+
+	/*
+	 * In per-thread mode, CPU is set to -1, but TID will be set instead. See
+	 * auxtrace_mmap_params__set_idx(). Return 'not found' if neither CPU nor TID match.
+	 */
+	if ((auxtrace_event->cpu == (__u32) -1 && auxtrace_event->tid != sample->tid) ||
+			auxtrace_event->cpu != sample->cpu)
+		return 1;
+
+	if (aux_event->flags & PERF_AUX_FLAG_OVERWRITE) {
+		/*
+		 * Clamp size in snapshot mode. The buffer size is clamped in
+		 * __auxtrace_mmap__read() for snapshots, so the aux record size doesn't reflect
+		 * the buffer size.
+		 */
+		aux_size = min(aux_event->aux_size, auxtrace_event->size);
+
+		/*
+		 * In this mode, the head also points to the end of the buffer so aux_offset
+		 * needs to have the size subtracted so it points to the beginning as in normal mode
+		 */
+		aux_offset = aux_event->aux_offset - aux_size;
+	} else {
+		aux_size = aux_event->aux_size;
+		aux_offset = aux_event->aux_offset;
+	}
+
+	if (aux_offset >= auxtrace_event->offset &&
+	    aux_offset + aux_size <= auxtrace_event->offset + auxtrace_event->size) {
+		/*
+		 * If this AUX event was inside this buffer somewhere, create a new auxtrace event
+		 * based on the sizes of the aux event, and queue that fragment.
+		 */
+		auxtrace_fragment.auxtrace = *auxtrace_event;
+		auxtrace_fragment.auxtrace.size = aux_size;
+		auxtrace_fragment.auxtrace.offset = aux_offset;
+		file_offset += aux_offset - auxtrace_event->offset + auxtrace_event->header.size;
+
+		pr_debug3("CS ETM: Queue buffer size: %#"PRI_lx64" offset: %#"PRI_lx64
+			  " tid: %d cpu: %d\n", aux_size, aux_offset, sample->tid, sample->cpu);
+		return auxtrace_queues__add_event(&etm->queues, session, &auxtrace_fragment,
+						  file_offset, NULL);
+	}
+
+	/* Wasn't inside this buffer, but there were no parse errors. 1 == 'not found' */
+	return 1;
+}
+
+static int cs_etm__queue_aux_records_cb(struct perf_session *session, union perf_event *event,
+					u64 offset __maybe_unused, void *data __maybe_unused)
+{
+	struct perf_sample sample;
+	int ret;
+	struct auxtrace_index_entry *ent;
+	struct auxtrace_index *auxtrace_index;
+	struct evsel *evsel;
+	size_t i;
+
+	/* Don't care about any other events, we're only queuing buffers for AUX events */
+	if (event->header.type != PERF_RECORD_AUX)
+		return 0;
+
+	if (event->header.size < sizeof(struct perf_record_aux))
+		return -EINVAL;
+
+	/* Truncated Aux records can have 0 size and shouldn't result in anything being queued. */
+	if (!event->aux.aux_size)
+		return 0;
+
+	/*
+	 * Parse the sample, we need the sample_id_all data that comes after the event so that the
+	 * CPU or PID can be matched to an AUXTRACE buffer's CPU or PID.
+	 */
+	evsel = evlist__event2evsel(session->evlist, event);
+	if (!evsel)
+		return -EINVAL;
+	ret = evsel__parse_sample(evsel, event, &sample);
+	if (ret)
+		return ret;
+
+	/*
+	 * Loop through the auxtrace index to find the buffer that matches up with this aux event.
+	 */
+	list_for_each_entry(auxtrace_index, &session->auxtrace_index, list) {
+		for (i = 0; i < auxtrace_index->nr; i++) {
+			ent = &auxtrace_index->entries[i];
+			ret = cs_etm__queue_aux_fragment(session, ent->file_offset,
+							 ent->sz, &event->aux, &sample);
+			/*
+			 * Stop search on error or successful values. Continue search on
+			 * 1 ('not found')
+			 */
+			if (ret != 1)
+				return ret;
+		}
+	}
+
+	/*
+	 * Couldn't find the buffer corresponding to this aux record, something went wrong. Warn but
+	 * don't exit with an error because it will still be possible to decode other aux records.
+	 */
+	pr_err("CS ETM: Couldn't find auxtrace buffer for aux_offset: %#"PRI_lx64
+	       " tid: %d cpu: %d\n", event->aux.aux_offset, sample.tid, sample.cpu);
+	return 0;
+}
+
+static int cs_etm__queue_aux_records(struct perf_session *session)
+{
+	struct auxtrace_index *index = list_first_entry_or_null(&session->auxtrace_index,
+								struct auxtrace_index, list);
+	if (index && index->nr > 0)
+		return perf_session__peek_events(session, session->header.data_offset,
+						 session->header.data_size,
+						 cs_etm__queue_aux_records_cb, NULL);
+
+	/*
+	 * We would get here if there are no entries in the index (either no auxtrace
+	 * buffers or no index at all). Fail silently as there is the possibility of
+	 * queueing them in cs_etm__process_auxtrace_event() if etm->data_queued is still
+	 * false.
+	 *
+	 * In that scenario, buffers will not be split by AUX records.
+	 */
+	return 0;
+}
+
 int cs_etm__process_auxtrace_info(union perf_event *event,
 				  struct perf_session *session)
 {
@@ -2879,7 +3045,7 @@ int cs_etm__process_auxtrace_info(union perf_event *event,
 	if (err)
 		goto err_delete_thread;
 
-	err = auxtrace_queues__process_index(&etm->queues, session);
+	err = cs_etm__queue_aux_records(session);
 	if (err)
 		goto err_delete_thread;
 
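
The search in cs_etm__queue_aux_records_cb() relies on a three-way return convention: a negative errno aborts, 0 means the fragment was queued, and 1 means "not found, try the next index entry". The standalone sketch below illustrates only that convention, outside of perf. FRAG_NOT_FOUND, search_entries(), struct probe and the two callbacks are hypothetical names introduced for illustration; the patch itself uses the literal 1.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical name for the literal 1 ('not found') used in the patch. */
#define FRAG_NOT_FOUND 1

/* Per-entry callback: 0 = queued, FRAG_NOT_FOUND = keep looking, <0 = error. */
typedef int (*try_entry_fn)(size_t idx, void *ctx);

/*
 * Walk nr entries and stop at the first result that is not 'not found'.
 * If nothing matches, return 0 rather than an error, mirroring the patch's
 * choice to warn and carry on so that other AUX records can still be decoded.
 */
static int search_entries(size_t nr, try_entry_fn try_entry, void *ctx)
{
	size_t i;

	assert(try_entry != NULL);

	for (i = 0; i < nr; i++) {
		int ret = try_entry(i, ctx);

		if (ret != FRAG_NOT_FOUND)
			return ret; /* success (0) or error (<0) ends the search */
	}
	return 0;
}

/* Illustration-only context: which index should react, and a call counter. */
struct probe {
	size_t target;
	size_t calls;
};

/* Reports success at the target index, 'not found' everywhere else. */
static int match_at(size_t idx, void *ctx)
{
	struct probe *p = ctx;

	p->calls++;
	return idx == p->target ? 0 : FRAG_NOT_FOUND;
}

/* Reports a parse error at the target index, 'not found' everywhere else. */
static int fail_at(size_t idx, void *ctx)
{
	struct probe *p = ctx;

	p->calls++;
	return idx == p->target ? -EINVAL : FRAG_NOT_FOUND;
}
```

One consequence of this design is that a caller cannot tell "queued" apart from "no matching buffer" by the return value alone; in the patch the only trace of the latter is the pr_err() message.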
Populate the auxtrace queues using AUX records rather than whole
auxtrace buffers so that the decoder is reset between each aux record.

This is similar to the auxtrace_queues__process_index() ->
auxtrace_queues__add_indexed_event() flow where
perf_session__peek_event() is used to read AUXTRACE events out of
random positions in the file based on the auxtrace index. But now we
loop over all PERF_RECORD_AUX events instead of AUXTRACE buffers. For
each PERF_RECORD_AUX event, we find the corresponding AUXTRACE buffer
using the index, and add a fragment of that buffer to the auxtrace
queues.

No other changes to decoding were made, apart from populating the
auxtrace queues. The result of decoding is identical to before, except
in cases where decoding failed completely, due to not resetting the
decoder.

The reason for this change is because AUX records are emitted any time
tracing is disabled, for example when the process is scheduled out.
Because ETM was disabled and enabled again, the decoder also needs to
be reset to force the search for a sync packet. Otherwise there would
be fatal decoding errors.
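
To make the "fragment of that buffer" step concrete, here is a minimal standalone sketch of the offset arithmetic, written outside of perf. It follows the bounds check and file-offset calculation in cs_etm__queue_aux_fragment(), but struct buf, struct rec, struct frag and fragment_for_record() are hypothetical names, and the CPU/TID matching and the actual queueing are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for an AUXTRACE buffer found via the auxtrace index. */
struct buf {
	uint64_t offset;      /* start of the buffer in AUX space */
	uint64_t size;        /* bytes of trace data in the buffer */
	uint64_t header_size; /* size of the event header preceding the data */
	uint64_t file_offset; /* position of the event in the perf.data file */
};

/* Simplified stand-in for a PERF_RECORD_AUX event. */
struct rec {
	uint64_t aux_offset;
	uint64_t aux_size;
	bool overwrite;       /* snapshot mode (PERF_AUX_FLAG_OVERWRITE set) */
};

/* The fragment that would be queued for decoding. */
struct frag {
	uint64_t offset;
	uint64_t size;
	uint64_t file_offset; /* where the fragment's data starts in the file */
};

/* Returns true and fills *out if the record lies entirely inside the buffer. */
static bool fragment_for_record(const struct buf *b, const struct rec *r,
				struct frag *out)
{
	uint64_t size, offset;

	assert(b && r && out);

	if (r->overwrite) {
		/* Snapshot mode: clamp to the buffer, offset points at the end. */
		size = r->aux_size < b->size ? r->aux_size : b->size;
		offset = r->aux_offset - size; /* unsigned, as in the patch */
	} else {
		size = r->aux_size;
		offset = r->aux_offset;
	}

	if (offset < b->offset || offset + size > b->offset + b->size)
		return false; /* 'not found': the caller tries the next buffer */

	out->offset = offset;
	out->size = size;
	/* Skip the event header, then the bytes that precede the fragment. */
	out->file_offset = b->file_offset + b->header_size + (offset - b->offset);
	return true;
}
```

For example, with a buffer covering AUX offsets 0x0 to 0x1000 and a non-snapshot record at aux_offset 0x400 with aux_size 0x200, the fragment starts 0x400 bytes into the buffer's data; a record that runs past 0x1000 is rejected and the search moves on to the next index entry.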

Testing
=======

Testing was done with the following script, to diff the decoding results
between the patched and un-patched versions of perf:

	#!/bin/bash
	set -ex

	$1 script -i $3 $4 > split.script
	$2 script -i $3 $4 > default.script

	diff split.script default.script | head -n 20

And it was run like this, with various itrace options depending on the
quantity of synthesised events:

	compare.sh ./perf-patched ./perf-default perf-per-cpu-2-threads.data --itrace=i100000ns

No changes in output were observed in the following scenarios:

* Simple per-cpu
	perf record -e cs_etm/@tmc_etr0/u top

* Per-thread, single thread
	perf record -e cs_etm/@tmc_etr0/u --per-thread ./threads_C

* Per-thread multiple threads (but only one thread collected data):
	perf record -e cs_etm/@tmc_etr0/u --per-thread --pid 4596,4597

* Per-thread multiple threads (both threads collected data):
	perf record -e cs_etm/@tmc_etr0/u --per-thread --pid 4596,4597

* Per-cpu explicit threads:
	perf record -e cs_etm/@tmc_etr0/u --pid 853,854

* System-wide (per-cpu):
	perf record -e cs_etm/@tmc_etr0/u -a

* No data collected (no aux buffers)
	Can happen with any command when run for a short period

* Containing truncated records
	Can happen with any command

* Containing aux records with 0 size
	Can happen with any command

* Snapshot mode (various files with and without buffer wrap)
	perf record -e cs_etm/@tmc_etr0/u -a --snapshot

Some differences were observed in the following scenario:

* Snapshot mode (with duplicate buffers)
	perf record -e cs_etm/@tmc_etr0/u -a --snapshot

Fewer samples are generated in snapshot mode if duplicate buffers
were gathered because buffers with the same offset are now only added
once. This gives different, but more correct results and no duplicate
data is decoded any more.

Signed-off-by: James Clark <james.clark@arm.com>
---
 tools/perf/util/cs-etm.c | 168 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 167 insertions(+), 1 deletion(-)