mbox series

[v1,00/11] perf script: Refactor branch flags for Arm SPE

Message ID 20250205121555.180606-1-leo.yan@arm.com (mailing list archive)
Headers show
Series perf script: Refactor branch flags for Arm SPE | expand

Message

Leo Yan Feb. 5, 2025, 12:15 p.m. UTC
This patch series refactors branch flags for support Arm SPE.  The patch
set is divided into two parts, the first part is for refactoring common
code and the second part is for enabling Arm SPE.

For refactoring branch flags, the sample flaghs are classified as branch
types and events.  A program branch type can be conditional branch,
function call, return or expection taken.  A branch event happens when
taking a branch.  This series combines branch types and the associated
events to present a sample flag.

The second part is to enable Arm SPE's sample flags for expressing
branch types and events, and support branch stack.

Patches 01 - 03 are to refactor branch types and branch events.
Patches 04, 05 extend to support not-taken event.

Patches 06 - 09 enables branch flags in Arm SPE.  This allows to print
out sample flags for samples.

Patch 10 supports branch stack for Arm SPE.  Patch 11 is an enhancement
for PBT feature.

Before:
  perf record -e arm_spe_0/load_filter=1,store_filter=1,branch_filter=1/ \
    -- ~/perf-c2c-usage-files/false_sharing.exe 1
  perf script --itrace=i1ibl -F,+flags,+addr,+brstack
   false_sharing.e  414489 [005] 775348.899294:          1                                                  branch:   jmp                   ffffc0fad9ef3d68 ffffc0fad98b2c68 search_cmp_ftr_reg+0x8 ([kernel.kallsyms])
   false_sharing.e  414489 [005] 775348.899294:          1                                            instructions:   jmp                   ffffc0fad9ef3d68 ffffc0fad98b2c68 search_cmp_ftr_reg+0x8 ([kernel.kallsyms])
   false_sharing.e  414489 [005] 775348.899294:          1                                                  branch:   jmp                   ffffc0fad98b3708 ffffc0fad98b3704 get_arm64_ftr_reg+0x30 ([kernel.kallsyms])
   false_sharing.e  414489 [005] 775348.899294:          1                                            instructions:   jmp                   ffffc0fad98b3708 ffffc0fad98b3704 get_arm64_ftr_reg+0x30 ([kernel.kallsyms])
   false_sharing.e  414489 [005] 775348.899297:          1                                                  branch:   br miss                   ffff8266da60     ffff8266dafc __sprintf_chk@plt+0xc (/usr/lib/aarch64-linux-gnu/libnuma.so.1.0.0)
   false_sharing.e  414489 [005] 775348.899297:          1                                            instructions:   br miss                   ffff8266da60     ffff8266dafc __sprintf_chk@plt+0xc (/usr/lib/aarch64-linux-gnu/libnuma.so.1.0.0)
   false_sharing.e  414489 [005] 775348.899297:          1                                                  branch:   br miss                   ffff826a44ec     ffff826a44e8 strcmp+0xa8 (/usr/lib/aarch64-linux-gnu/ld-2.31.so)
   false_sharing.e  414489 [005] 775348.899297:          1                                            instructions:   br miss                   ffff826a44ec     ffff826a44e8 strcmp+0xa8 (/usr/lib/aarch64-linux-gnu/ld-2.31.so)
   false_sharing.e  414489 [005] 775348.899298:          1                                            instructions:                                        0 ffffc0fadaad6124 mas_walk+0x274 ([kernel.kallsyms])
   false_sharing.e  414489 [005] 775348.899300:          1                                            instructions:                                        0 ffffc0fad9b3d98c next_uptodate_folio+0x2a4 ([kernel.kallsyms])
   false_sharing.e  414489 [005] 775348.899301:          1                                            instructions:                                        0 ffffc0fad98c3dcc __sync_icache_dcache+0x5c ([kernel.kallsyms])
   false_sharing.e  414489 [005] 775348.899301:          1                                                  branch:   jmp                   ffffc0fad9ba7f24 ffffc0fad9ba99c0 folio_add_file_rmap_ptes+0x48 ([kernel.kallsyms])
   false_sharing.e  414489 [005] 775348.899301:          1                                            instructions:   jmp                   ffffc0fad9ba7f24 ffffc0fad9ba99c0 folio_add_file_rmap_ptes+0x48 ([kernel.kallsyms])
   false_sharing.e  414489 [005] 775348.899306:          1                                            instructions:                                        0 ffffc0fad9b3f184 filemap_map_pages+0x178 ([kernel.kallsyms])
   false_sharing.e  414489 [005] 775348.899307:          1                                                  branch:   jmp                   ffffc0fad9b3d7b0 ffffc0fad9b3d7ac next_uptodate_folio+0xc4 ([kernel.kallsyms])
   false_sharing.e  414489 [005] 775348.899307:          1                                            instructions:   jmp                   ffffc0fad9b3d7b0 ffffc0fad9b3d7ac next_uptodate_folio+0xc4 ([kernel.kallsyms])
   false_sharing.e  414489 [005] 775348.899307:          1                                            instructions:                                        0 ffffc0fad9b3d98c next_uptodate_folio+0x2a4 ([kernel.kallsyms])
   false_sharing.e  414489 [005] 775348.899308:          1                                                  branch:   jmp                   ffffc0fad9ef3da4 ffffc0fad9ef3d70 bsearch+0x58 ([kernel.kallsyms])
   false_sharing.e  414489 [005] 775348.899308:          1                                            instructions:   jmp                   ffffc0fad9ef3da4 ffffc0fad9ef3d70 bsearch+0x58 ([kernel.kallsyms])
   false_sharing.e  414489 [005] 775348.899310:          1                                                  branch:   jmp                   ffffc0fad98a2158 ffffc0fad98a159c el0t_64_sync+0x198 ([kernel.kallsyms])
   false_sharing.e  414489 [005] 775348.899310:          1                                            instructions:   jmp                   ffffc0fad98a2158 ffffc0fad98a159c el0t_64_sync+0x198 ([kernel.kallsyms])
   ...

After:
  perf script --itrace=i1ibl -F,+flags,+addr,+brstack
   false_sharing.e  414489 [005] 775348.899294:          1                                                  branch:   return                ffffc0fad9ef3d68 ffffc0fad98b2c68 search_cmp_ftr_reg+0x8 ([kernel.kallsyms]) 0xffffc0fad98b2c68 ([kernel.kallsyms])/0xffffc0fad9ef3d68 ([kernel.kallsyms])/P/-/-/5/RET/- 
   false_sharing.e  414489 [005] 775348.899294:          1                                            instructions:   return                ffffc0fad9ef3d68 ffffc0fad98b2c68 search_cmp_ftr_reg+0x8 ([kernel.kallsyms]) 0xffffc0fad98b2c68 ([kernel.kallsyms])/0xffffc0fad9ef3d68 ([kernel.kallsyms])/P/-/-/5/RET/- 
   false_sharing.e  414489 [005] 775348.899294:          1                                                  branch:   jcc/not_taken/        ffffc0fad98b3708 ffffc0fad98b3704 get_arm64_ftr_reg+0x30 ([kernel.kallsyms]) 0xffffc0fad98b3704 ([kernel.kallsyms])/0xffffc0fad98b3708 ([kernel.kallsyms])/PN/-/-/6/COND/- 
   false_sharing.e  414489 [005] 775348.899294:          1                                            instructions:   jcc/not_taken/        ffffc0fad98b3708 ffffc0fad98b3704 get_arm64_ftr_reg+0x30 ([kernel.kallsyms]) 0xffffc0fad98b3704 ([kernel.kallsyms])/0xffffc0fad98b3708 ([kernel.kallsyms])/PN/-/-/6/COND/- 
   false_sharing.e  414489 [005] 775348.899297:          1                                                  branch:   return/miss/              ffff8266da60     ffff8266dafc __sprintf_chk@plt+0xc (/usr/lib/aarch64-linux-gnu/libnuma.so.1.0.0) 0xffff8266dafc (/usr/lib/aarch64-linux-gnu/libnuma.so.1.0.0)/0xffff8266da60 (/usr/lib/aarch64-linux-gnu/libnuma.so.1.0.0)/M/-/-/12/RET/- 
   false_sharing.e  414489 [005] 775348.899297:          1                                            instructions:   return/miss/              ffff8266da60     ffff8266dafc __sprintf_chk@plt+0xc (/usr/lib/aarch64-linux-gnu/libnuma.so.1.0.0) 0xffff8266dafc (/usr/lib/aarch64-linux-gnu/libnuma.so.1.0.0)/0xffff8266da60 (/usr/lib/aarch64-linux-gnu/libnuma.so.1.0.0)/M/-/-/12/RET/- 
   false_sharing.e  414489 [005] 775348.899297:          1                                                  branch:   jcc/miss,not_taken/       ffff826a44ec     ffff826a44e8 strcmp+0xa8 (/usr/lib/aarch64-linux-gnu/ld-2.31.so) 0xffff826a44e8 (/usr/lib/aarch64-linux-gnu/ld-2.31.so)/0xffff826a44ec (/usr/lib/aarch64-linux-gnu/ld-2.31.so)/MN/-/-/23/COND/- 
   false_sharing.e  414489 [005] 775348.899297:          1                                            instructions:   jcc/miss,not_taken/       ffff826a44ec     ffff826a44e8 strcmp+0xa8 (/usr/lib/aarch64-linux-gnu/ld-2.31.so) 0xffff826a44e8 (/usr/lib/aarch64-linux-gnu/ld-2.31.so)/0xffff826a44ec (/usr/lib/aarch64-linux-gnu/ld-2.31.so)/MN/-/-/23/COND/- 
   false_sharing.e  414489 [005] 775348.899298:          1                                            instructions:                                        0 ffffc0fadaad6124 mas_walk+0x274 ([kernel.kallsyms])
   false_sharing.e  414489 [005] 775348.899300:          1                                            instructions:                                        0 ffffc0fad9b3d98c next_uptodate_folio+0x2a4 ([kernel.kallsyms])
   false_sharing.e  414489 [005] 775348.899301:          1                                            instructions:                                        0 ffffc0fad98c3dcc __sync_icache_dcache+0x5c ([kernel.kallsyms])
   false_sharing.e  414489 [005] 775348.899301:          1                                                  branch:   jmp                   ffffc0fad9ba7f24 ffffc0fad9ba99c0 folio_add_file_rmap_ptes+0x48 ([kernel.kallsyms]) 0xffffc0fad9ba99c0 ([kernel.kallsyms])/0xffffc0fad9ba7f24 ([kernel.kallsyms])/P/-/-/8//- 
   false_sharing.e  414489 [005] 775348.899301:          1                                            instructions:   jmp                   ffffc0fad9ba7f24 ffffc0fad9ba99c0 folio_add_file_rmap_ptes+0x48 ([kernel.kallsyms]) 0xffffc0fad9ba99c0 ([kernel.kallsyms])/0xffffc0fad9ba7f24 ([kernel.kallsyms])/P/-/-/8//- 
   false_sharing.e  414489 [005] 775348.899306:          1                                            instructions:                                        0 ffffc0fad9b3f184 filemap_map_pages+0x178 ([kernel.kallsyms])
   false_sharing.e  414489 [005] 775348.899307:          1                                                  branch:   jcc/not_taken/        ffffc0fad9b3d7b0 ffffc0fad9b3d7ac next_uptodate_folio+0xc4 ([kernel.kallsyms]) 0xffffc0fad9b3d7ac ([kernel.kallsyms])/0xffffc0fad9b3d7b0 ([kernel.kallsyms])/PN/-/-/15/COND/- 
   false_sharing.e  414489 [005] 775348.899307:          1                                            instructions:   jcc/not_taken/        ffffc0fad9b3d7b0 ffffc0fad9b3d7ac next_uptodate_folio+0xc4 ([kernel.kallsyms]) 0xffffc0fad9b3d7ac ([kernel.kallsyms])/0xffffc0fad9b3d7b0 ([kernel.kallsyms])/PN/-/-/15/COND/- 
   false_sharing.e  414489 [005] 775348.899307:          1                                            instructions:                                        0 ffffc0fad9b3d98c next_uptodate_folio+0x2a4 ([kernel.kallsyms])
   false_sharing.e  414489 [005] 775348.899308:          1                                                  branch:   jcc                   ffffc0fad9ef3da4 ffffc0fad9ef3d70 bsearch+0x58 ([kernel.kallsyms]) 0xffffc0fad9ef3d70 ([kernel.kallsyms])/0xffffc0fad9ef3da4 ([kernel.kallsyms])/P/-/-/20/COND/- 
   false_sharing.e  414489 [005] 775348.899308:          1                                            instructions:   jcc                   ffffc0fad9ef3da4 ffffc0fad9ef3d70 bsearch+0x58 ([kernel.kallsyms]) 0xffffc0fad9ef3d70 ([kernel.kallsyms])/0xffffc0fad9ef3da4 ([kernel.kallsyms])/P/-/-/20/COND/- 
   false_sharing.e  414489 [005] 775348.899310:          1                                                  branch:   jmp                   ffffc0fad98a2158 ffffc0fad98a159c el0t_64_sync+0x198 ([kernel.kallsyms]) 0xffffc0fad98a159c ([kernel.kallsyms])/0xffffc0fad98a2158 ([kernel.kallsyms])/P/-/-/5//- 
   false_sharing.e  414489 [005] 775348.899310:          1                                            instructions:   jmp                   ffffc0fad98a2158 ffffc0fad98a159c el0t_64_sync+0x198 ([kernel.kallsyms]) 0xffffc0fad98a159c ([kernel.kallsyms])/0xffffc0fad98a2158 ([kernel.kallsyms])/P/-/-/5//- 
   ...


Leo Yan (11):
  perf script: Make printing flags reliable
  perf script: Refactor sample_flags_to_name() function
  perf script: Separate events from branch types
  perf script: Add not taken event for branches
  perf script: Add not taken event for branch stack
  perf arm-spe: Extend branch operations
  perf arm-spe: Decode transactional event
  perf arm-spe: Fill branch operations and events to record
  perf arm-spe: Set sample flags with supplement info
  perf arm-spe: Add branch stack
  perf arm-spe: Support previous branch target (PBT) address

 tools/perf/builtin-script.c                   |  30 ++--
 .../util/arm-spe-decoder/arm-spe-decoder.c    |  23 ++-
 .../util/arm-spe-decoder/arm-spe-decoder.h    |  11 +-
 .../arm-spe-decoder/arm-spe-pkt-decoder.c     |  14 +-
 .../arm-spe-decoder/arm-spe-pkt-decoder.h     |  12 +-
 tools/perf/util/arm-spe.c                     | 135 ++++++++++++++++++
 tools/perf/util/branch.h                      |   3 +-
 tools/perf/util/event.h                       |  12 +-
 tools/perf/util/trace-event-scripting.c       | 116 +++++++++++----
 tools/perf/util/trace-event.h                 |   2 +
 10 files changed, 307 insertions(+), 51 deletions(-)

Comments

Ian Rogers Feb. 11, 2025, 10:34 p.m. UTC | #1
On Wed, Feb 5, 2025 at 4:16 AM Leo Yan <leo.yan@arm.com> wrote:
>
> This patch series refactors branch flags for support Arm SPE.  The patch
> set is divided into two parts, the first part is for refactoring common
> code and the second part is for enabling Arm SPE.
>
> For refactoring branch flags, the sample flaghs are classified as branch
> types and events.  A program branch type can be conditional branch,
> function call, return or expection taken.  A branch event happens when
> taking a branch.  This series combines branch types and the associated
> events to present a sample flag.
>
> The second part is to enable Arm SPE's sample flags for expressing
> branch types and events, and support branch stack.
>
> Patches 01 - 03 are to refactor branch types and branch events.
> Patches 04, 05 extend to support not-taken event.
>
> Patches 06 - 09 enables branch flags in Arm SPE.  This allows to print
> out sample flags for samples.
>
> Patch 10 supports branch stack for Arm SPE.  Patch 11 is an enhancement
> for PBT feature.
>
> Before:
>   perf record -e arm_spe_0/load_filter=1,store_filter=1,branch_filter=1/ \
>     -- ~/perf-c2c-usage-files/false_sharing.exe 1
>   perf script --itrace=i1ibl -F,+flags,+addr,+brstack
>    false_sharing.e  414489 [005] 775348.899294:          1                                                  branch:   jmp                   ffffc0fad9ef3d68 ffffc0fad98b2c68 search_cmp_ftr_reg+0x8 ([kernel.kallsyms])
>    false_sharing.e  414489 [005] 775348.899294:          1                                            instructions:   jmp                   ffffc0fad9ef3d68 ffffc0fad98b2c68 search_cmp_ftr_reg+0x8 ([kernel.kallsyms])
>    false_sharing.e  414489 [005] 775348.899294:          1                                                  branch:   jmp                   ffffc0fad98b3708 ffffc0fad98b3704 get_arm64_ftr_reg+0x30 ([kernel.kallsyms])
>    false_sharing.e  414489 [005] 775348.899294:          1                                            instructions:   jmp                   ffffc0fad98b3708 ffffc0fad98b3704 get_arm64_ftr_reg+0x30 ([kernel.kallsyms])
>    false_sharing.e  414489 [005] 775348.899297:          1                                                  branch:   br miss                   ffff8266da60     ffff8266dafc __sprintf_chk@plt+0xc (/usr/lib/aarch64-linux-gnu/libnuma.so.1.0.0)
>    false_sharing.e  414489 [005] 775348.899297:          1                                            instructions:   br miss                   ffff8266da60     ffff8266dafc __sprintf_chk@plt+0xc (/usr/lib/aarch64-linux-gnu/libnuma.so.1.0.0)
>    false_sharing.e  414489 [005] 775348.899297:          1                                                  branch:   br miss                   ffff826a44ec     ffff826a44e8 strcmp+0xa8 (/usr/lib/aarch64-linux-gnu/ld-2.31.so)
>    false_sharing.e  414489 [005] 775348.899297:          1                                            instructions:   br miss                   ffff826a44ec     ffff826a44e8 strcmp+0xa8 (/usr/lib/aarch64-linux-gnu/ld-2.31.so)
>    false_sharing.e  414489 [005] 775348.899298:          1                                            instructions:                                        0 ffffc0fadaad6124 mas_walk+0x274 ([kernel.kallsyms])
>    false_sharing.e  414489 [005] 775348.899300:          1                                            instructions:                                        0 ffffc0fad9b3d98c next_uptodate_folio+0x2a4 ([kernel.kallsyms])
>    false_sharing.e  414489 [005] 775348.899301:          1                                            instructions:                                        0 ffffc0fad98c3dcc __sync_icache_dcache+0x5c ([kernel.kallsyms])
>    false_sharing.e  414489 [005] 775348.899301:          1                                                  branch:   jmp                   ffffc0fad9ba7f24 ffffc0fad9ba99c0 folio_add_file_rmap_ptes+0x48 ([kernel.kallsyms])
>    false_sharing.e  414489 [005] 775348.899301:          1                                            instructions:   jmp                   ffffc0fad9ba7f24 ffffc0fad9ba99c0 folio_add_file_rmap_ptes+0x48 ([kernel.kallsyms])
>    false_sharing.e  414489 [005] 775348.899306:          1                                            instructions:                                        0 ffffc0fad9b3f184 filemap_map_pages+0x178 ([kernel.kallsyms])
>    false_sharing.e  414489 [005] 775348.899307:          1                                                  branch:   jmp                   ffffc0fad9b3d7b0 ffffc0fad9b3d7ac next_uptodate_folio+0xc4 ([kernel.kallsyms])
>    false_sharing.e  414489 [005] 775348.899307:          1                                            instructions:   jmp                   ffffc0fad9b3d7b0 ffffc0fad9b3d7ac next_uptodate_folio+0xc4 ([kernel.kallsyms])
>    false_sharing.e  414489 [005] 775348.899307:          1                                            instructions:                                        0 ffffc0fad9b3d98c next_uptodate_folio+0x2a4 ([kernel.kallsyms])
>    false_sharing.e  414489 [005] 775348.899308:          1                                                  branch:   jmp                   ffffc0fad9ef3da4 ffffc0fad9ef3d70 bsearch+0x58 ([kernel.kallsyms])
>    false_sharing.e  414489 [005] 775348.899308:          1                                            instructions:   jmp                   ffffc0fad9ef3da4 ffffc0fad9ef3d70 bsearch+0x58 ([kernel.kallsyms])
>    false_sharing.e  414489 [005] 775348.899310:          1                                                  branch:   jmp                   ffffc0fad98a2158 ffffc0fad98a159c el0t_64_sync+0x198 ([kernel.kallsyms])
>    false_sharing.e  414489 [005] 775348.899310:          1                                            instructions:   jmp                   ffffc0fad98a2158 ffffc0fad98a159c el0t_64_sync+0x198 ([kernel.kallsyms])
>    ...
>
> After:
>   perf script --itrace=i1ibl -F,+flags,+addr,+brstack
>    false_sharing.e  414489 [005] 775348.899294:          1                                                  branch:   return                ffffc0fad9ef3d68 ffffc0fad98b2c68 search_cmp_ftr_reg+0x8 ([kernel.kallsyms]) 0xffffc0fad98b2c68 ([kernel.kallsyms])/0xffffc0fad9ef3d68 ([kernel.kallsyms])/P/-/-/5/RET/-
>    false_sharing.e  414489 [005] 775348.899294:          1                                            instructions:   return                ffffc0fad9ef3d68 ffffc0fad98b2c68 search_cmp_ftr_reg+0x8 ([kernel.kallsyms]) 0xffffc0fad98b2c68 ([kernel.kallsyms])/0xffffc0fad9ef3d68 ([kernel.kallsyms])/P/-/-/5/RET/-
>    false_sharing.e  414489 [005] 775348.899294:          1                                                  branch:   jcc/not_taken/        ffffc0fad98b3708 ffffc0fad98b3704 get_arm64_ftr_reg+0x30 ([kernel.kallsyms]) 0xffffc0fad98b3704 ([kernel.kallsyms])/0xffffc0fad98b3708 ([kernel.kallsyms])/PN/-/-/6/COND/-
>    false_sharing.e  414489 [005] 775348.899294:          1                                            instructions:   jcc/not_taken/        ffffc0fad98b3708 ffffc0fad98b3704 get_arm64_ftr_reg+0x30 ([kernel.kallsyms]) 0xffffc0fad98b3704 ([kernel.kallsyms])/0xffffc0fad98b3708 ([kernel.kallsyms])/PN/-/-/6/COND/-
>    false_sharing.e  414489 [005] 775348.899297:          1                                                  branch:   return/miss/              ffff8266da60     ffff8266dafc __sprintf_chk@plt+0xc (/usr/lib/aarch64-linux-gnu/libnuma.so.1.0.0) 0xffff8266dafc (/usr/lib/aarch64-linux-gnu/libnuma.so.1.0.0)/0xffff8266da60 (/usr/lib/aarch64-linux-gnu/libnuma.so.1.0.0)/M/-/-/12/RET/-
>    false_sharing.e  414489 [005] 775348.899297:          1                                            instructions:   return/miss/              ffff8266da60     ffff8266dafc __sprintf_chk@plt+0xc (/usr/lib/aarch64-linux-gnu/libnuma.so.1.0.0) 0xffff8266dafc (/usr/lib/aarch64-linux-gnu/libnuma.so.1.0.0)/0xffff8266da60 (/usr/lib/aarch64-linux-gnu/libnuma.so.1.0.0)/M/-/-/12/RET/-
>    false_sharing.e  414489 [005] 775348.899297:          1                                                  branch:   jcc/miss,not_taken/       ffff826a44ec     ffff826a44e8 strcmp+0xa8 (/usr/lib/aarch64-linux-gnu/ld-2.31.so) 0xffff826a44e8 (/usr/lib/aarch64-linux-gnu/ld-2.31.so)/0xffff826a44ec (/usr/lib/aarch64-linux-gnu/ld-2.31.so)/MN/-/-/23/COND/-
>    false_sharing.e  414489 [005] 775348.899297:          1                                            instructions:   jcc/miss,not_taken/       ffff826a44ec     ffff826a44e8 strcmp+0xa8 (/usr/lib/aarch64-linux-gnu/ld-2.31.so) 0xffff826a44e8 (/usr/lib/aarch64-linux-gnu/ld-2.31.so)/0xffff826a44ec (/usr/lib/aarch64-linux-gnu/ld-2.31.so)/MN/-/-/23/COND/-
>    false_sharing.e  414489 [005] 775348.899298:          1                                            instructions:                                        0 ffffc0fadaad6124 mas_walk+0x274 ([kernel.kallsyms])
>    false_sharing.e  414489 [005] 775348.899300:          1                                            instructions:                                        0 ffffc0fad9b3d98c next_uptodate_folio+0x2a4 ([kernel.kallsyms])
>    false_sharing.e  414489 [005] 775348.899301:          1                                            instructions:                                        0 ffffc0fad98c3dcc __sync_icache_dcache+0x5c ([kernel.kallsyms])
>    false_sharing.e  414489 [005] 775348.899301:          1                                                  branch:   jmp                   ffffc0fad9ba7f24 ffffc0fad9ba99c0 folio_add_file_rmap_ptes+0x48 ([kernel.kallsyms]) 0xffffc0fad9ba99c0 ([kernel.kallsyms])/0xffffc0fad9ba7f24 ([kernel.kallsyms])/P/-/-/8//-
>    false_sharing.e  414489 [005] 775348.899301:          1                                            instructions:   jmp                   ffffc0fad9ba7f24 ffffc0fad9ba99c0 folio_add_file_rmap_ptes+0x48 ([kernel.kallsyms]) 0xffffc0fad9ba99c0 ([kernel.kallsyms])/0xffffc0fad9ba7f24 ([kernel.kallsyms])/P/-/-/8//-
>    false_sharing.e  414489 [005] 775348.899306:          1                                            instructions:                                        0 ffffc0fad9b3f184 filemap_map_pages+0x178 ([kernel.kallsyms])
>    false_sharing.e  414489 [005] 775348.899307:          1                                                  branch:   jcc/not_taken/        ffffc0fad9b3d7b0 ffffc0fad9b3d7ac next_uptodate_folio+0xc4 ([kernel.kallsyms]) 0xffffc0fad9b3d7ac ([kernel.kallsyms])/0xffffc0fad9b3d7b0 ([kernel.kallsyms])/PN/-/-/15/COND/-
>    false_sharing.e  414489 [005] 775348.899307:          1                                            instructions:   jcc/not_taken/        ffffc0fad9b3d7b0 ffffc0fad9b3d7ac next_uptodate_folio+0xc4 ([kernel.kallsyms]) 0xffffc0fad9b3d7ac ([kernel.kallsyms])/0xffffc0fad9b3d7b0 ([kernel.kallsyms])/PN/-/-/15/COND/-
>    false_sharing.e  414489 [005] 775348.899307:          1                                            instructions:                                        0 ffffc0fad9b3d98c next_uptodate_folio+0x2a4 ([kernel.kallsyms])
>    false_sharing.e  414489 [005] 775348.899308:          1                                                  branch:   jcc                   ffffc0fad9ef3da4 ffffc0fad9ef3d70 bsearch+0x58 ([kernel.kallsyms]) 0xffffc0fad9ef3d70 ([kernel.kallsyms])/0xffffc0fad9ef3da4 ([kernel.kallsyms])/P/-/-/20/COND/-
>    false_sharing.e  414489 [005] 775348.899308:          1                                            instructions:   jcc                   ffffc0fad9ef3da4 ffffc0fad9ef3d70 bsearch+0x58 ([kernel.kallsyms]) 0xffffc0fad9ef3d70 ([kernel.kallsyms])/0xffffc0fad9ef3da4 ([kernel.kallsyms])/P/-/-/20/COND/-
>    false_sharing.e  414489 [005] 775348.899310:          1                                                  branch:   jmp                   ffffc0fad98a2158 ffffc0fad98a159c el0t_64_sync+0x198 ([kernel.kallsyms]) 0xffffc0fad98a159c ([kernel.kallsyms])/0xffffc0fad98a2158 ([kernel.kallsyms])/P/-/-/5//-
>    false_sharing.e  414489 [005] 775348.899310:          1                                            instructions:   jmp                   ffffc0fad98a2158 ffffc0fad98a159c el0t_64_sync+0x198 ([kernel.kallsyms]) 0xffffc0fad98a159c ([kernel.kallsyms])/0xffffc0fad98a2158 ([kernel.kallsyms])/P/-/-/5//-
>    ...

Reviewed-by: Ian Rogers <irogers@google.com>

Built and tested (on x86). A little strange patch 5 adds a new bit not
at the end, but "Sample parsing" test wasn't broken so looks like it
is good. I was surprised the use of value in the union:
```
struct branch_flags {
union {
u64 value;
struct {
u64 mispred:1;
u64 predicted:1;
...
```
didn't get broken. Perhaps there's an opportunity for additional tests.

Thanks,
Ian


> Leo Yan (11):
>   perf script: Make printing flags reliable
>   perf script: Refactor sample_flags_to_name() function
>   perf script: Separate events from branch types
>   perf script: Add not taken event for branches
>   perf script: Add not taken event for branch stack
>   perf arm-spe: Extend branch operations
>   perf arm-spe: Decode transactional event
>   perf arm-spe: Fill branch operations and events to record
>   perf arm-spe: Set sample flags with supplement info
>   perf arm-spe: Add branch stack
>   perf arm-spe: Support previous branch target (PBT) address
>
>  tools/perf/builtin-script.c                   |  30 ++--
>  .../util/arm-spe-decoder/arm-spe-decoder.c    |  23 ++-
>  .../util/arm-spe-decoder/arm-spe-decoder.h    |  11 +-
>  .../arm-spe-decoder/arm-spe-pkt-decoder.c     |  14 +-
>  .../arm-spe-decoder/arm-spe-pkt-decoder.h     |  12 +-
>  tools/perf/util/arm-spe.c                     | 135 ++++++++++++++++++
>  tools/perf/util/branch.h                      |   3 +-
>  tools/perf/util/event.h                       |  12 +-
>  tools/perf/util/trace-event-scripting.c       | 116 +++++++++++----
>  tools/perf/util/trace-event.h                 |   2 +
>  10 files changed, 307 insertions(+), 51 deletions(-)
>
> --
> 2.34.1
>
Leo Yan Feb. 12, 2025, 8:54 a.m. UTC | #2
Hi Ian,

On Tue, Feb 11, 2025 at 02:34:46PM -0800, Ian Rogers wrote:
> On Wed, Feb 5, 2025 at 4:16 AM Leo Yan <leo.yan@arm.com> wrote:
> >
> > This patch series refactors branch flags for support Arm SPE.  The patch
> > set is divided into two parts, the first part is for refactoring common
> > code and the second part is for enabling Arm SPE.

[...]
> 
> Reviewed-by: Ian Rogers <irogers@google.com>
> 
> Built and tested (on x86). A little strange patch 5 adds a new bit not
> at the end, but "Sample parsing" test wasn't broken so looks like it
> is good. I was surprised the use of value in the union:
> ```
> struct branch_flags {
> union {
> u64 value;
> struct {
> u64 mispred:1;
> u64 predicted:1;
> ...
> ```
> didn't get broken. Perhaps there's an opportunity for additional tests.

If the branch stack's flag sticks to a hardware format, then the patch 5
is concerned.  My understanding is the branch flag is a synthesized
value (see intel_pt_lbr_flags() for x86).  So it is fine for rearrange
the bit layout.

The "Sample parsing" test is for big/little endian test, it does not
test for specific bit ordering, this is why the test passes.

If you think it is safer to move the new added bit at the tail of the
bit definitions (just before the 'reserved' field), I can send a new
version for this.  Please let me know your preference.

Thanks for review and test!

Leo
Ian Rogers Feb. 12, 2025, 4:14 p.m. UTC | #3
On Wed, Feb 12, 2025 at 12:54 AM Leo Yan <leo.yan@arm.com> wrote:
>
> Hi Ian,
>
> On Tue, Feb 11, 2025 at 02:34:46PM -0800, Ian Rogers wrote:
> > On Wed, Feb 5, 2025 at 4:16 AM Leo Yan <leo.yan@arm.com> wrote:
> > >
> > > This patch series refactors branch flags for support Arm SPE.  The patch
> > > set is divided into two parts, the first part is for refactoring common
> > > code and the second part is for enabling Arm SPE.
>
> [...]
> >
> > Reviewed-by: Ian Rogers <irogers@google.com>
> >
> > Built and tested (on x86). A little strange patch 5 adds a new bit not
> > at the end, but "Sample parsing" test wasn't broken so looks like it
> > is good. I was surprised the use of value in the union:
> > ```
> > struct branch_flags {
> > union {
> > u64 value;
> > struct {
> > u64 mispred:1;
> > u64 predicted:1;
> > ...
> > ```
> > didn't get broken. Perhaps there's an opportunity for additional tests.
>
> If the branch stack's flag sticks to a hardware format, then the patch 5
> is concerned.  My understanding is the branch flag is a synthesized
> value (see intel_pt_lbr_flags() for x86).  So it is fine for rearrange
> the bit layout.
>
> The "Sample parsing" test is for big/little endian test, it does not
> test for specific bit ordering, this is why the test passes.
>
> If you think it is safer to move the new added bit at the tail of the
> bit definitions (just before the 'reserved' field), I can send a new
> version for this.  Please let me know your preference.

I think it is fine as is. I was worried that because the bit fields
are checked here:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/tests/sample-parsing.c?h=perf-tools-next#n35
```
/*
 * Hardcode the expected values for branch_entry flags.
 * These are based on the input value (213) specified
 * in branch_stack variable.
 */
#define BS_EXPECTED_BE 0xa000d00000000000
#define BS_EXPECTED_LE 0x1aa00000000
```
that the adjustment would break it. But I ran the test and it passed :-)

Thanks,
Ian

> Thanks for review and test!
>
> Leo
Namhyung Kim Feb. 13, 2025, 5:29 a.m. UTC | #4
On Wed, Feb 12, 2025 at 08:14:48AM -0800, Ian Rogers wrote:
> On Wed, Feb 12, 2025 at 12:54 AM Leo Yan <leo.yan@arm.com> wrote:
> >
> > Hi Ian,
> >
> > On Tue, Feb 11, 2025 at 02:34:46PM -0800, Ian Rogers wrote:
> > > On Wed, Feb 5, 2025 at 4:16 AM Leo Yan <leo.yan@arm.com> wrote:
> > > >
> > > > This patch series refactors branch flags for support Arm SPE.  The patch
> > > > set is divided into two parts, the first part is for refactoring common
> > > > code and the second part is for enabling Arm SPE.
> >
> > [...]
> > >
> > > Reviewed-by: Ian Rogers <irogers@google.com>
> > >
> > > Built and tested (on x86). A little strange patch 5 adds a new bit not
> > > at the end, but "Sample parsing" test wasn't broken so looks like it
> > > is good. I was surprised the use of value in the union:
> > > ```
> > > struct branch_flags {
> > > union {
> > > u64 value;
> > > struct {
> > > u64 mispred:1;
> > > u64 predicted:1;
> > > ...
> > > ```
> > > didn't get broken. Perhaps there's an opportunity for additional tests.

Probably because it just checks the value as a whole u64, not each
bitfield.  But it seems to test if the value of the input sample data
and synthesized-and-parsed output sample data is same.  So it may not be
important what value it has.

Anyway it'd be nice if any ARM folks can review this series.

Thanks,
Namhyung
Leo Yan Feb. 14, 2025, 11:32 a.m. UTC | #5
Hi Ian, Namhyung,

On Wed, Feb 12, 2025 at 09:29:28PM -0800, Namhyung Kim wrote:

[...]

> > > > Built and tested (on x86). A little strange patch 5 adds a new bit not
> > > > at the end, but "Sample parsing" test wasn't broken so looks like it
> > > > is good. I was surprised the use of value in the union:
> > > > ```
> > > > struct branch_flags {
> > > > union {
> > > > u64 value;
> > > > struct {
> > > > u64 mispred:1;
> > > > u64 predicted:1;
> > > > ...
> > > > ```
> > > > didn't get broken. Perhaps there's an opportunity for additional tests.
> 
> Probably because it just checks the value as a whole u64, not each
> bitfield.  But it seems to test if the value of the input sample data
> and synthesized-and-parsed output sample data is same.  So it may not be
> important what value it has.
> 
> Anyway it'd be nice if any ARM folks can review this series.

After discussed with James, I concluded that it has risk to break
other arches (e.g., x86 LBR).  So I have sent out patch set v2 [1]
to keep the existed bitfield layout in patch 05, and added Ian's
review tags.

I expect James will give a review the new series.

Thanks,
Leo

[1] https://lore.kernel.org/linux-perf-users/20250214111936.15168-1-leo.yan@arm.com/T/#t