diff mbox series

[v5,5/5] perf arm-spe: Don't wait for PERF_RECORD_EXIT event

Message ID 20210519071939.1598923-6-leo.yan@linaro.org (mailing list archive)
State New, archived
Headers show
Series perf arm-spe: Enable timestamp | expand

Commit Message

Leo Yan May 19, 2021, 7:19 a.m. UTC
When decode Arm SPE trace, it waits for PERF_RECORD_EXIT event (the last
perf event) for processing trace data, which is needless and even might
cause logic error, e.g. it might fail to correlate perf events with Arm
SPE events correctly.

So this patch removes the condition checking for PERF_RECORD_EXIT event.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/arm-spe.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

Comments

James Clark June 25, 2021, 1:25 p.m. UTC | #1
On 19/05/2021 08:19, Leo Yan wrote:
> When decode Arm SPE trace, it waits for PERF_RECORD_EXIT event (the last
> perf event) for processing trace data, which is needless and even might
> cause logic error, e.g. it might fail to correlate perf events with Arm
> SPE events correctly.
> 
> So this patch removes the condition checking for PERF_RECORD_EXIT event.
> 
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  tools/perf/util/arm-spe.c | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
> index 5c5b438584c4..58b7069c5a5f 100644
> --- a/tools/perf/util/arm-spe.c
> +++ b/tools/perf/util/arm-spe.c
> @@ -717,11 +717,7 @@ static int arm_spe_process_event(struct perf_session *session,
>  					sample->time);
>  		}
>  	} else if (timestamp) {
> -		if (event->header.type == PERF_RECORD_EXIT) {
> -			err = arm_spe_process_queues(spe, timestamp);
> -			if (err)
> -				return err;
> -		}
> +		err = arm_spe_process_queues(spe, timestamp);
>  	}
>  
>  	return err;
> 

For the whole set:
Reviewed-by: James Clark <james.clark@arm.com>
Tested-by: James Clark <james.clark@arm.com>

I see a big improvement in decoding involving multiple processes because the timestamps are now
correlated with the comm and mmap events.

For example perf-exec samples are visible right before the exec is done, and on an
application that forks, samples are visible from all processes. For example:

   perf record -e arm_spe// -- bash -c "stress -c 1"
   perf script

   perf-exec  4502 [003] 259755.050409:          1    l1d-access:  ffff80001014b840 sched_clock+0x40 ([kernel.kallsyms])
   perf-exec  4502 [003] 259755.050409:          1    tlb-access:  ffff80001014b840 sched_clock+0x40 ([kernel.kallsyms])
   perf-exec  4502 [003] 259755.050409:          1        memory:  ffff80001014b840 sched_clock+0x40 ([kernel.kallsyms])
   perf-exec  4502 [003] 259755.050411:          1    tlb-access:  ffff800010120fb8 __rcu_read_lock+0x0 ([kernel.kallsyms])
   bash  4502 [003] 259755.050411:          1   branch-miss:  ffff8000105b2a40 memcpy+0x80 ([kernel.kallsyms])
   bash  4502 [003] 259755.050411:          1    tlb-access:                 0 [unknown] ([unknown])
   ...
   stress  4502 [003] 259755.051468:          1    l1d-access:  ffff800010259a24 __vma_adjust+0x1f4 ([kernel.kallsyms])
   stress  4502 [003] 259755.051468:          1    tlb-access:  ffff800010259a24 __vma_adjust+0x1f4 ([kernel.kallsyms])
   stress  4502 [003] 259755.051468:          1        memory:  ffff800010259a24 __vma_adjust+0x1f4 ([kernel.kallsyms])

Previously samples were only attributed to 'stress', which was obviously wrong.

James
Leo Yan June 28, 2021, 12:12 p.m. UTC | #2
On Fri, Jun 25, 2021 at 02:25:15PM +0100, James Clark wrote:
> 
> 
> On 19/05/2021 08:19, Leo Yan wrote:
> > When decode Arm SPE trace, it waits for PERF_RECORD_EXIT event (the last
> > perf event) for processing trace data, which is needless and even might
> > cause logic error, e.g. it might fail to correlate perf events with Arm
> > SPE events correctly.
> > 
> > So this patch removes the condition checking for PERF_RECORD_EXIT event.
> > 
> > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > ---
> >  tools/perf/util/arm-spe.c | 6 +-----
> >  1 file changed, 1 insertion(+), 5 deletions(-)
> > 
> > diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
> > index 5c5b438584c4..58b7069c5a5f 100644
> > --- a/tools/perf/util/arm-spe.c
> > +++ b/tools/perf/util/arm-spe.c
> > @@ -717,11 +717,7 @@ static int arm_spe_process_event(struct perf_session *session,
> >  					sample->time);
> >  		}
> >  	} else if (timestamp) {
> > -		if (event->header.type == PERF_RECORD_EXIT) {
> > -			err = arm_spe_process_queues(spe, timestamp);
> > -			if (err)
> > -				return err;
> > -		}
> > +		err = arm_spe_process_queues(spe, timestamp);
> >  	}
> >  
> >  	return err;
> > 
> 
> For the whole set:
> Reviewed-by: James Clark <james.clark@arm.com>
> Tested-by: James Clark <james.clark@arm.com>

> I see a big improvement in decoding involving multiple processes because the timestamps are now
> correlated with the comm and mmap events.
> 
> For example perf-exec samples are visible right before the exec is done, and on an
> application that forks, samples are visible from all processes. For example:
> 
>    perf record -e arm_spe// -- bash -c "stress -c 1"
>    perf script
> 
>    perf-exec  4502 [003] 259755.050409:          1    l1d-access:  ffff80001014b840 sched_clock+0x40 ([kernel.kallsyms])
>    perf-exec  4502 [003] 259755.050409:          1    tlb-access:  ffff80001014b840 sched_clock+0x40 ([kernel.kallsyms])
>    perf-exec  4502 [003] 259755.050409:          1        memory:  ffff80001014b840 sched_clock+0x40 ([kernel.kallsyms])
>    perf-exec  4502 [003] 259755.050411:          1    tlb-access:  ffff800010120fb8 __rcu_read_lock+0x0 ([kernel.kallsyms])
>    bash  4502 [003] 259755.050411:          1   branch-miss:  ffff8000105b2a40 memcpy+0x80 ([kernel.kallsyms])
>    bash  4502 [003] 259755.050411:          1    tlb-access:                 0 [unknown] ([unknown])
>    ...
>    stress  4502 [003] 259755.051468:          1    l1d-access:  ffff800010259a24 __vma_adjust+0x1f4 ([kernel.kallsyms])
>    stress  4502 [003] 259755.051468:          1    tlb-access:  ffff800010259a24 __vma_adjust+0x1f4 ([kernel.kallsyms])
>    stress  4502 [003] 259755.051468:          1        memory:  ffff800010259a24 __vma_adjust+0x1f4 ([kernel.kallsyms])
> 
> Previously samples were only attributed to 'stress', which was obviously wrong.

Thanks a lot for the review and testing, James!

Hi Arnaldo, I confirmed this patch set can be cleanly applied on
the latest acme/perf/core branch, so could you pick up this patch
set?

Thanks,
Leo
Arnaldo Carvalho de Melo July 1, 2021, 5:03 p.m. UTC | #3
Em Mon, Jun 28, 2021 at 08:12:17PM +0800, Leo Yan escreveu:
> On Fri, Jun 25, 2021 at 02:25:15PM +0100, James Clark wrote:
> > For the whole set:
> > Reviewed-by: James Clark <james.clark@arm.com>
> > Tested-by: James Clark <james.clark@arm.com>
 
> > I see a big improvement in decoding involving multiple processes because the timestamps are now

> > 
> > For example perf-exec samples are visible right before the exec is done, and on an
> > application that forks, samples are visible from all processes. For example:

> >    perf record -e arm_spe// -- bash -c "stress -c 1"
> >    perf script

> >    perf-exec  4502 [003] 259755.050409:          1    l1d-access:  ffff80001014b840 sched_clock+0x40 ([kernel.kallsyms])
> >    perf-exec  4502 [003] 259755.050409:          1    tlb-access:  ffff80001014b840 sched_clock+0x40 ([kernel.kallsyms])
> >    perf-exec  4502 [003] 259755.050409:          1        memory:  ffff80001014b840 sched_clock+0x40 ([kernel.kallsyms])
> >    perf-exec  4502 [003] 259755.050411:          1    tlb-access:  ffff800010120fb8 __rcu_read_lock+0x0 ([kernel.kallsyms])
> >    bash  4502 [003] 259755.050411:          1   branch-miss:  ffff8000105b2a40 memcpy+0x80 ([kernel.kallsyms])
> >    bash  4502 [003] 259755.050411:          1    tlb-access:                 0 [unknown] ([unknown])
> >    ...
> >    stress  4502 [003] 259755.051468:          1    l1d-access:  ffff800010259a24 __vma_adjust+0x1f4 ([kernel.kallsyms])
> >    stress  4502 [003] 259755.051468:          1    tlb-access:  ffff800010259a24 __vma_adjust+0x1f4 ([kernel.kallsyms])
> >    stress  4502 [003] 259755.051468:          1        memory:  ffff800010259a24 __vma_adjust+0x1f4 ([kernel.kallsyms])

> > Previously samples were only attributed to 'stress', which was obviously wrong.
> 
> Thanks a lot for the review and testing, James!
> 
> Hi Arnaldo, I confirmed this patch set can be cleanly applied on
> the latest acme/perf/core branch, so could you pick up this patch
> set?

Applied, thanks, please let me know if there is still something
outstanding,

- Arnaldo
Leo Yan July 2, 2021, 1:31 a.m. UTC | #4
On Thu, Jul 01, 2021 at 02:03:16PM -0300, Arnaldo Carvalho de Melo wrote:

[...]

> > Hi Arnaldo, I confirmed this patch set can be cleanly applied on
> > the latest acme/perf/core branch, so could you pick up this patch
> > set?
> 
> Applied, thanks, please let me know if there is still something
> outstanding,

Thanks, Arnaldo!  I confirmed you don't miss anything.
diff mbox series

Patch

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 5c5b438584c4..58b7069c5a5f 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -717,11 +717,7 @@  static int arm_spe_process_event(struct perf_session *session,
 					sample->time);
 		}
 	} else if (timestamp) {
-		if (event->header.type == PERF_RECORD_EXIT) {
-			err = arm_spe_process_queues(spe, timestamp);
-			if (err)
-				return err;
-		}
+		err = arm_spe_process_queues(spe, timestamp);
 	}
 
 	return err;