Message ID | 20210519071939.1598923-6-leo.yan@linaro.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | perf arm-spe: Enable timestamp | expand |
On 19/05/2021 08:19, Leo Yan wrote: > When decode Arm SPE trace, it waits for PERF_RECORD_EXIT event (the last > perf event) for processing trace data, which is needless and even might > cause logic error, e.g. it might fail to correlate perf events with Arm > SPE events correctly. > > So this patch removes the condition checking for PERF_RECORD_EXIT event. > > Signed-off-by: Leo Yan <leo.yan@linaro.org> > --- > tools/perf/util/arm-spe.c | 6 +----- > 1 file changed, 1 insertion(+), 5 deletions(-) > > diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c > index 5c5b438584c4..58b7069c5a5f 100644 > --- a/tools/perf/util/arm-spe.c > +++ b/tools/perf/util/arm-spe.c > @@ -717,11 +717,7 @@ static int arm_spe_process_event(struct perf_session *session, > sample->time); > } > } else if (timestamp) { > - if (event->header.type == PERF_RECORD_EXIT) { > - err = arm_spe_process_queues(spe, timestamp); > - if (err) > - return err; > - } > + err = arm_spe_process_queues(spe, timestamp); > } > > return err; > For the whole set: Reviewed-by: James Clark <james.clark@arm.com> Tested-by: James Clark <james.clark@arm.com> I see a big improvement in decoding involving multiple processes because the timestamps are now correlated with the comm and mmap events. For example perf-exec samples are visible right before the exec is done, and on an application that forks, samples are visible from all processes. For example: perf record -e arm_spe// -- bash -c "stress -c 1" perf script perf-exec 4502 [003] 259755.050409: 1 l1d-access: ffff80001014b840 sched_clock+0x40 ([kernel.kallsyms]) perf-exec 4502 [003] 259755.050409: 1 tlb-access: ffff80001014b840 sched_clock+0x40 ([kernel.kallsyms]) perf-exec 4502 [003] 259755.050409: 1 memory: ffff80001014b840 sched_clock+0x40 ([kernel.kallsyms]) perf-exec 4502 [003] 259755.050411: 1 tlb-access: ffff800010120fb8 __rcu_read_lock+0x0 ([kernel.kallsyms]) bash 4502 [003] 259755.050411: 1 branch-miss: ffff8000105b2a40 memcpy+0x80 ([kernel.kallsyms]) bash 4502 [003] 259755.050411: 1 tlb-access: 0 [unknown] ([unknown]) ... stress 4502 [003] 259755.051468: 1 l1d-access: ffff800010259a24 __vma_adjust+0x1f4 ([kernel.kallsyms]) stress 4502 [003] 259755.051468: 1 tlb-access: ffff800010259a24 __vma_adjust+0x1f4 ([kernel.kallsyms]) stress 4502 [003] 259755.051468: 1 memory: ffff800010259a24 __vma_adjust+0x1f4 ([kernel.kallsyms]) Previously samples were only attributed to 'stress', which was obviously wrong. James
On Fri, Jun 25, 2021 at 02:25:15PM +0100, James Clark wrote: > > > On 19/05/2021 08:19, Leo Yan wrote: > > When decode Arm SPE trace, it waits for PERF_RECORD_EXIT event (the last > > perf event) for processing trace data, which is needless and even might > > cause logic error, e.g. it might fail to correlate perf events with Arm > > SPE events correctly. > > > > So this patch removes the condition checking for PERF_RECORD_EXIT event. > > > > Signed-off-by: Leo Yan <leo.yan@linaro.org> > > --- > > tools/perf/util/arm-spe.c | 6 +----- > > 1 file changed, 1 insertion(+), 5 deletions(-) > > > > diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c > > index 5c5b438584c4..58b7069c5a5f 100644 > > --- a/tools/perf/util/arm-spe.c > > +++ b/tools/perf/util/arm-spe.c > > @@ -717,11 +717,7 @@ static int arm_spe_process_event(struct perf_session *session, > > sample->time); > > } > > } else if (timestamp) { > > - if (event->header.type == PERF_RECORD_EXIT) { > > - err = arm_spe_process_queues(spe, timestamp); > > - if (err) > > - return err; > > - } > > + err = arm_spe_process_queues(spe, timestamp); > > } > > > > return err; > > > > For the whole set: > Reviewed-by: James Clark <james.clark@arm.com> > Tested-by: James Clark <james.clark@arm.com> > I see a big improvement in decoding involving multiple processes because the timestamps are now > correlated with the comm and mmap events. > > For example perf-exec samples are visible right before the exec is done, and on an > application that forks, samples are visible from all processes. For example: > > perf record -e arm_spe// -- bash -c "stress -c 1" > perf script > > perf-exec 4502 [003] 259755.050409: 1 l1d-access: ffff80001014b840 sched_clock+0x40 ([kernel.kallsyms]) > perf-exec 4502 [003] 259755.050409: 1 tlb-access: ffff80001014b840 sched_clock+0x40 ([kernel.kallsyms]) > perf-exec 4502 [003] 259755.050409: 1 memory: ffff80001014b840 sched_clock+0x40 ([kernel.kallsyms]) > perf-exec 4502 [003] 259755.050411: 1 tlb-access: ffff800010120fb8 __rcu_read_lock+0x0 ([kernel.kallsyms]) > bash 4502 [003] 259755.050411: 1 branch-miss: ffff8000105b2a40 memcpy+0x80 ([kernel.kallsyms]) > bash 4502 [003] 259755.050411: 1 tlb-access: 0 [unknown] ([unknown]) > ... > stress 4502 [003] 259755.051468: 1 l1d-access: ffff800010259a24 __vma_adjust+0x1f4 ([kernel.kallsyms]) > stress 4502 [003] 259755.051468: 1 tlb-access: ffff800010259a24 __vma_adjust+0x1f4 ([kernel.kallsyms]) > stress 4502 [003] 259755.051468: 1 memory: ffff800010259a24 __vma_adjust+0x1f4 ([kernel.kallsyms]) > > Previously samples were only attributed to 'stress', which was obviously wrong. Thanks a lot for the review and testing, James! Hi Arnaldo, I confirmed this patch set can be cleanly applied on the latest acme/perf/core branch, so could you pick up this patch set? Thanks, Leo
Em Mon, Jun 28, 2021 at 08:12:17PM +0800, Leo Yan escreveu: > On Fri, Jun 25, 2021 at 02:25:15PM +0100, James Clark wrote: > > For the whole set: > > Reviewed-by: James Clark <james.clark@arm.com> > > Tested-by: James Clark <james.clark@arm.com> > > I see a big improvement in decoding involving multiple processes because the timestamps are now > > > > For example perf-exec samples are visible right before the exec is done, and on an > > application that forks, samples are visible from all processes. For example: > > perf record -e arm_spe// -- bash -c "stress -c 1" > > perf script > > perf-exec 4502 [003] 259755.050409: 1 l1d-access: ffff80001014b840 sched_clock+0x40 ([kernel.kallsyms]) > > perf-exec 4502 [003] 259755.050409: 1 tlb-access: ffff80001014b840 sched_clock+0x40 ([kernel.kallsyms]) > > perf-exec 4502 [003] 259755.050409: 1 memory: ffff80001014b840 sched_clock+0x40 ([kernel.kallsyms]) > > perf-exec 4502 [003] 259755.050411: 1 tlb-access: ffff800010120fb8 __rcu_read_lock+0x0 ([kernel.kallsyms]) > > bash 4502 [003] 259755.050411: 1 branch-miss: ffff8000105b2a40 memcpy+0x80 ([kernel.kallsyms]) > > bash 4502 [003] 259755.050411: 1 tlb-access: 0 [unknown] ([unknown]) > > ... > > stress 4502 [003] 259755.051468: 1 l1d-access: ffff800010259a24 __vma_adjust+0x1f4 ([kernel.kallsyms]) > > stress 4502 [003] 259755.051468: 1 tlb-access: ffff800010259a24 __vma_adjust+0x1f4 ([kernel.kallsyms]) > > stress 4502 [003] 259755.051468: 1 memory: ffff800010259a24 __vma_adjust+0x1f4 ([kernel.kallsyms]) > > Previously samples were only attributed to 'stress', which was obviously wrong. > > Thanks a lot for the review and testing, James! > > Hi Arnaldo, I confirmed this patch set can be cleanly applied on > the latest acme/perf/core branch, so could you pick up this patch > set? Applied, thanks, please let me know if there is still something outstanding, - Arnaldo
On Thu, Jul 01, 2021 at 02:03:16PM -0300, Arnaldo Carvalho de Melo wrote: [...] > > Hi Arnaldo, I confirmed this patch set can be cleanly applied on > > the latest acme/perf/core branch, so could you pick up this patch > > set? > > Applied, thanks, please let me know if there is still something > outstanding, Thanks, Arnaldo! I confirmed you don't miss anything.
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c index 5c5b438584c4..58b7069c5a5f 100644 --- a/tools/perf/util/arm-spe.c +++ b/tools/perf/util/arm-spe.c @@ -717,11 +717,7 @@ static int arm_spe_process_event(struct perf_session *session, sample->time); } } else if (timestamp) { - if (event->header.type == PERF_RECORD_EXIT) { - err = arm_spe_process_queues(spe, timestamp); - if (err) - return err; - } + err = arm_spe_process_queues(spe, timestamp); } return err;
When decode Arm SPE trace, it waits for PERF_RECORD_EXIT event (the last perf event) for processing trace data, which is needless and even might cause logic error, e.g. it might fail to correlate perf events with Arm SPE events correctly. So this patch removes the condition checking for PERF_RECORD_EXIT event. Signed-off-by: Leo Yan <leo.yan@linaro.org> --- tools/perf/util/arm-spe.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-)