
[v3,0/2] perf: arm-spe: Decode SPE source and use for perf c2c

Message ID 20220318195913.17459-1-alisaidi@amazon.com (mailing list archive)

Message

Ali Saidi March 18, 2022, 7:59 p.m. UTC
When synthesizing data from SPE, augment the type with source information
for Arm Neoverse cores so we can detect situations like cache line contention
and transfers on Arm platforms. 

This change enables the expected behavior of perf c2c on a system with SPE, where
lines that are shared among multiple cores show up in perf c2c output.

These changes switch to using mem_lvl_num to encode the level information instead
of mem_lvl, which is being deprecated, but I haven't found other users of
mem_lvl_num.
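
As a rough illustration of the idea (not the code from this series), the
sketch below maps a decoded source value to a numeric cache level plus a
snoop flag. All type names, enumerators and values are hypothetical
stand-ins rather than the definitions in the perf headers, and treating a
peer-core transfer as "any cache" with HITM is an assumption.

```c
#include <assert.h>

/*
 * Illustrative stand-ins only. These names and values are hypothetical and
 * are not the definitions from the perf headers or from this series.
 */
enum sketch_source {
	SKETCH_SRC_L1D,
	SKETCH_SRC_L2,
	SKETCH_SRC_PEER_CORE,	/* line supplied by another core's cache */
	SKETCH_SRC_SYS_CACHE,	/* treated as L3: three cache levels assumed */
	SKETCH_SRC_DRAM,
	SKETCH_SRC_UNKNOWN,
};

enum sketch_lvl_num {
	SKETCH_LVL_NA,
	SKETCH_LVL_L1,
	SKETCH_LVL_L2,
	SKETCH_LVL_L3,
	SKETCH_LVL_ANY_CACHE,
	SKETCH_LVL_RAM,
};

enum sketch_snoop {
	SKETCH_SNOOP_NA,
	SKETCH_SNOOP_NONE,
	SKETCH_SNOOP_HITM,
};

struct sketch_data_src {
	enum sketch_lvl_num lvl_num;
	enum sketch_snoop snoop;
};

/* Translate a decoded source into a numeric level and a snoop state. */
static struct sketch_data_src sketch_decode(enum sketch_source src)
{
	/* Unknown sources keep the "not available" defaults. */
	struct sketch_data_src d = { SKETCH_LVL_NA, SKETCH_SNOOP_NA };

	assert(src <= SKETCH_SRC_UNKNOWN);

	switch (src) {
	case SKETCH_SRC_L1D:
		d.lvl_num = SKETCH_LVL_L1;
		d.snoop = SKETCH_SNOOP_NONE;
		break;
	case SKETCH_SRC_L2:
		d.lvl_num = SKETCH_LVL_L2;
		d.snoop = SKETCH_SNOOP_NONE;
		break;
	case SKETCH_SRC_PEER_CORE:
		/* Assumption: the exact level is unknown, so report "any cache". */
		d.lvl_num = SKETCH_LVL_ANY_CACHE;
		d.snoop = SKETCH_SNOOP_HITM;
		break;
	case SKETCH_SRC_SYS_CACHE:
		d.lvl_num = SKETCH_LVL_L3;
		d.snoop = SKETCH_SNOOP_NONE;
		break;
	case SKETCH_SRC_DRAM:
		d.lvl_num = SKETCH_LVL_RAM;
		d.snoop = SKETCH_SNOOP_NONE;
		break;
	case SKETCH_SRC_UNKNOWN:
		break;
	}
	return d;
}
```

The level is kept separate from the snoop state so that a consumer such as
perf c2c can look at the snoop field even when the exact level is unknown.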

Changes in v3:
  * Assume there are only three levels of cache hierarchy
  * Split the mem_lvl_num and HITM changes in c2c into two separate patches

Ali Saidi (3):
  perf arm-spe: Use SPE data source for neoverse cores
  perf mem: Support mem_lvl_num in c2c command
  perf mem: Support HITM for when mem_lvl_num is any

 .../util/arm-spe-decoder/arm-spe-decoder.c    |   1 +
 .../util/arm-spe-decoder/arm-spe-decoder.h    |  12 ++
 tools/perf/util/arm-spe.c                     | 109 +++++++++++++++---
 tools/perf/util/mem-events.c                  |  20 +++-
 4 files changed, 124 insertions(+), 18 deletions(-)

Comments

German Gomez March 22, 2022, 12:05 p.m. UTC | #1
Hi Ali, thank you for your patches

On 18/03/2022 19:59, Ali Saidi wrote:
> When synthesizing data from SPE, augment the type with source information
> for Arm Neoverse cores so we can detect situations like cache line contention
> and transfers on Arm platforms. 
>
> This change enables the expected behavior of perf c2c on a system with SPE where
> lines that are shared among multiple cores show up in perf c2c output. 
>
> These changes switch to use mem_lvl_num to encode the level information instead
> of mem_lvl which is being deprecated, but I haven't found other users of
> mem_lvl_num. 
>
> Changes in v3:
>   * Assume there are only three levels of cache hierarchy
>   * Split the mem_lvl_num and HITM changes in c2c into two separate patches
>
> Ali Saidi (3):
>   perf arm-spe: Use SPE data source for neoverse cores
>   perf mem: Support mem_lvl_num in c2c command
>   perf mem: Support HITM for when mem_lvl_num is any
>
>  .../util/arm-spe-decoder/arm-spe-decoder.c    |   1 +
>  .../util/arm-spe-decoder/arm-spe-decoder.h    |  12 ++
>  tools/perf/util/arm-spe.c                     | 109 +++++++++++++++---
>  tools/perf/util/mem-events.c                  |  20 +++-
>  4 files changed, 124 insertions(+), 18 deletions(-)
>

I tested on a Neoverse N1 system using the commands below, and the output
looks either unchanged or improved compared to before. For example:

| $ perf mem record -e spe-ldst -a -- sleep 4
| $ perf mem report
|
| 1.39%             1  1263          L3 miss                   [k] 0xffffb9a34bda2088
| 0.58%             1  529           L1 miss                   [k] 0xffffb9a34bd3be7c
| 0.34%             1  310           N/A                       [k] 0xffffb9a34baf4d28
| 0.34%             1  309           N/A                       [k] 0xffffb9a34bb82844

... became:

| 1.39%             1  1263          RAM hit                   [k] 0xffffb9a34bda2088
| 0.58%             1  529           L2 hit                    [k] 0xffffb9a34bd3be7c
| 0.34%             1  310           L1 hit                    [k] 0xffffb9a34baf4d28
| 0.34%             1  309           L1 hit                    [k] 0xffffb9a34bb82844

Also some L3 misses are now labeled as "Any cache hit" with the Snoop
bit set. For example:

| 0.37%             1  332           L3 miss                   [.] 0x0000aaaadf70a700    N/A

... became:                                                           

| 0.37%             1  332           Any cache hit             [.] 0x0000aaaadf70a700    HitM
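
For illustration only, here is a minimal sketch of how a consumer could
count such samples, assuming a sample is treated as HITM on its snoop
state alone so that an "any cache" level still qualifies. The types and
helper names are hypothetical and are not taken from the perf sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the perf definitions. */
enum sketch_lvl_num {
	SKETCH_LVL_NA,
	SKETCH_LVL_L1,
	SKETCH_LVL_L2,
	SKETCH_LVL_L3,
	SKETCH_LVL_ANY_CACHE,
	SKETCH_LVL_RAM,
};

enum sketch_snoop {
	SKETCH_SNOOP_NA,
	SKETCH_SNOOP_NONE,
	SKETCH_SNOOP_HITM,
};

struct sketch_sample {
	enum sketch_lvl_num lvl_num;
	enum sketch_snoop snoop;
};

/*
 * Assumption: HITM is decided by the snoop state only, so a sample whose
 * level is SKETCH_LVL_ANY_CACHE is counted like one with a known level.
 */
static int sketch_is_hitm(const struct sketch_sample *s)
{
	return s->snoop == SKETCH_SNOOP_HITM;
}

/* Count the HITM samples in an array of n samples. */
static size_t sketch_count_hitm(const struct sketch_sample *samples, size_t n)
{
	size_t i, count = 0;

	assert(samples != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (sketch_is_hitm(&samples[i]))
			count++;
	}
	return count;
}
```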

Tested-by: German Gomez <german.gomez@arm.com>
Reviewed-by: German Gomez <german.gomez@arm.com>

Thanks,
German

(I didn't run on a non-Neoverse system but it doesn't look like any   
behaviour is changed for those)