
[v5,0/9] perf cs-etm: Support thread stack and callchain

Message ID 20200220052701.7754-1-leo.yan@linaro.org

Message

Leo Yan Feb. 20, 2020, 5:26 a.m. UTC
This patch series adds thread stack and callchain support to perf
cs-etm; it depends on the instruction sample fix patch set [1].

This patch set has grown more complex, so before dividing it into smaller
groups, I'd like to use this version to include all relevant patches; I
hope this gives the whole context for the related code changes.

Briefly, this patch set can be divided into three parts, each of which
can be reviewed separately:

Patches 01 and 02 fix the samples for a corner case in which accessing
the branch's target address triggers an exception.  Essentially, an extra
branch sample is added to reflect the intermediate branch between the
previous branch and the exception entry.
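The corner case handled by patches 01 and 02 can be modelled roughly as follows. A trace range ends with a taken branch, and the next trace element is an exception whose preferred return address equals the branch target. In other words, the exception was raised while accessing the target, before any instruction there completed. In that case two samples are synthesized: branch to target, then target to exception vector. Without this, a single sample would jump straight from the branch into the vector. This is only a sketch. struct branch_sample and synth_branch_then_exception() are hypothetical names, and the return-address comparison is an assumed detection rule, not necessarily what cs-etm.c does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical branch sample: where control came from and went to. */
struct branch_sample {
	uint64_t from;
	uint64_t to;
};

/*
 * Hypothetical model: a range ended with a taken branch at br_addr to
 * br_target, and the next trace element is an exception with preferred
 * return address exc_ret and vector address vector.
 * Returns the number of samples written to out[] (0 or 2).
 */
static size_t synth_branch_then_exception(uint64_t br_addr, uint64_t br_target,
					  uint64_t exc_ret, uint64_t vector,
					  struct branch_sample out[2])
{
	assert(out != NULL);

	if (exc_ret != br_target)
		return 0;	/* not this corner case; normal path, not modelled */

	/* The branch did reach its target (the extra, intermediate sample)... */
	out[0] = (struct branch_sample){ .from = br_addr, .to = br_target };
	/* ...and the exception was then taken from the target address. */
	out[1] = (struct branch_sample){ .from = br_target, .to = vector };
	return 2;
}
```

Emitting the intermediate sample keeps the branch-to-target edge visible, so later thread stack handling sees the same control flow the CPU actually followed.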

Patches 03, 04, 05 and 06 are carried over from v4 and add support for
the thread stack and callchain.
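Conceptually, the thread stack support in patches 03 to 06 keeps a per-thread shadow call stack:

* a call pushes its return address;
* a return pops back to the matching entry;
* an instruction sample's callchain is the sampled IP followed by the saved return addresses, innermost first.

The sketch below is a minimal model under those assumptions. struct shadow_stack and the ss_* helpers are hypothetical, not the thread-stack.c API. The caller passes in the instruction size (4 bytes for A64).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SHADOW_STACK_MAX 64

/* Hypothetical per-thread shadow call stack (not perf's struct thread_stack). */
struct shadow_stack {
	uint64_t ret_addr[SHADOW_STACK_MAX];
	size_t cnt;
};

/* On a call, remember where execution will resume after the callee returns. */
static void ss_call(struct shadow_stack *ss, uint64_t call_addr, uint64_t insn_size)
{
	if (ss->cnt < SHADOW_STACK_MAX)
		ss->ret_addr[ss->cnt++] = call_addr + insn_size;
}

/* On a return, pop back to the entry matching the return target, if any. */
static void ss_return(struct shadow_stack *ss, uint64_t ret_target)
{
	size_t i = ss->cnt;

	while (i > 0) {
		if (ss->ret_addr[--i] == ret_target) {
			ss->cnt = i;	/* drop the matched entry and everything above it */
			return;
		}
	}
	/* No match: leave the stack unchanged (a simplification). */
}

/* Callchain: sampled IP first, then return addresses from innermost outwards. */
static size_t ss_callchain(const struct shadow_stack *ss, uint64_t ip,
			   uint64_t *chain, size_t max)
{
	size_t n = 0;

	assert(chain != NULL || max == 0);
	if (max == 0)
		return 0;
	chain[n++] = ip;
	for (size_t i = ss->cnt; i > 0 && n < max; i--)
		chain[n++] = ss->ret_addr[i - 1];
	return n;
}
```

The max parameter plays the same role as the callchain size limit given to --itrace=g.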

Patches 07, 08 and 09 fix up exception entry and exit.  They mainly
address two cases: the first fixes up the thread stack and callchain
when accessing the branch target address triggers an exception; the
second fixes up the thread stack for instruction emulation (and other
single-step cases).
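For the instruction emulation case, one way to picture the exit fixup is a stack of exception return addresses. An exception return is matched either against the recorded address or against the instruction right after it. This covers arm64 emulation of a trapping instruction, where the kernel resumes at the next instruction. The single-step cases are assumed to look similar. This is a hedged sketch of the idea only:

* struct exc_stack and its helpers are hypothetical;
* exc_stack_top() merely stands in for the kind of "top return address" helper that patch 08 adds;
* a fixed 4-byte (A64) instruction size is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXC_STACK_MAX 32

/* Hypothetical stack of exception preferred return addresses. */
struct exc_stack {
	uint64_t ret_addr[EXC_STACK_MAX];
	size_t cnt;
};

/* Hypothetical "top return address" helper: 0 when the stack is empty. */
static uint64_t exc_stack_top(const struct exc_stack *s)
{
	return s->cnt ? s->ret_addr[s->cnt - 1] : 0;
}

/* Exception entry: record where execution is expected to resume. */
static void exc_entry(struct exc_stack *s, uint64_t preferred_ret)
{
	if (s->cnt < EXC_STACK_MAX)
		s->ret_addr[s->cnt++] = preferred_ret;
}

/*
 * Exception exit: accept the recorded address, or the next instruction
 * (assumed 4 bytes) when the trapping instruction was emulated and skipped.
 * Returns true if an entry was popped.
 */
static bool exc_exit(struct exc_stack *s, uint64_t eret_target)
{
	uint64_t top = exc_stack_top(s);

	if (s->cnt == 0)
		return false;
	if (eret_target == top || eret_target == top + 4) {
		s->cnt--;
		return true;
	}
	return false;
}
```

Without the "+ 4" case, an emulated instruction would leave a stale entry on the stack and skew every later callchain for that thread.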

This patch set has been tested on Juno-r2, applied to the perf/core
branch at the latest commit 85fc95d75970 ("perf maps: Add missing unlock
to maps__insert() error case") and on top of the instruction sample fix
patch set [1].


Test for option '-F,+callindent':

  # perf script -F,+callindent
            main  3258          1          branches:         main                                                         ffffad684d20 __libc_start_main+0xe0 (/usr/lib/aarch64-linux-gnu/libc-2.28.so)
            main  3258          1          branches:             lib_loop_test@plt                                        aaaae2c4d78c main+0x18 (/root/coresight_test/main)
            main  3258          1          branches:                 _dl_fixup                                            ffffad811b4c _dl_runtime_resolve+0x40 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
            main  3258          1          branches:                     _dl_lookup_symbol_x                              ffffad80c078 _dl_fixup+0xb8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
            main  3258          1          branches:                         do_lookup_x                                  ffffad80849c _dl_lookup_symbol_x+0x104 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
            main  3258          1          branches:                             check_match                              ffffad807bf0 do_lookup_x+0x238 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
            main  3258          1          branches:                                 strcmp                               ffffad807888 check_match+0x70 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
            main  3258          1          branches:             lib_loop_test@plt                                        aaaae2c4d78c main+0x18 (/root/coresight_test/main)
            main  3258          1          branches:             lib_loop_test@plt                                        aaaae2c4d78c main+0x18 (/root/coresight_test/main)
            main  3258          1          branches:             lib_loop_test@plt                                        aaaae2c4d78c main+0x18 (/root/coresight_test/main)
            main  3258          1          branches:             lib_loop_test@plt                                        aaaae2c4d78c main+0x18 (/root/coresight_test/main)

  [...]
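In the '-F,+callindent' output above, each extra level of call depth indents the symbol by four more columns: main, then lib_loop_test@plt, then _dl_fixup, and so on. A minimal sketch of that formatting follows, assuming the depth comes from the thread stack. format_callindent() is a hypothetical helper, not perf's own code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: indent the symbol name by 4 columns per call depth.
 * Returns the length of the formatted string (snprintf semantics).
 */
static int format_callindent(char *buf, size_t size, int depth, const char *name)
{
	assert(depth >= 0);
	/* "%*s" with an empty string emits exactly depth * 4 spaces. */
	return snprintf(buf, size, "%*s%s", depth * 4, "", name);
}
```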


Test for option '--itrace=g':

  # perf script --itrace=g16l64i100

main  3258        100      instructions: 
	    ffffad816a80 memcpy+0x70 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
	    ffffad809468 _dl_new_object+0xa8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
	    ffffad801840 dl_main+0x778 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
	    ffffad81384c _dl_sysdep_start+0x36c (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
	    ffffad800884 _dl_start_final+0xac (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
	    ffffad800b00 _dl_start+0x200 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
	    ffffad800048 _start+0x8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)

main  3258        100      instructions: 
	    ffffad80952c _dl_new_object+0x16c (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
	    ffffad801840 dl_main+0x778 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
	    ffffad81384c _dl_sysdep_start+0x36c (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
	    ffffad800884 _dl_start_final+0xac (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
	    ffffad800b00 _dl_start+0x200 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
	    ffffad800048 _start+0x8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)

main  3258        100      instructions: 
	    ffffad8018dc dl_main+0x814 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
	    ffffad81384c _dl_sysdep_start+0x36c (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
	    ffffad800884 _dl_start_final+0xac (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
	    ffffad800b00 _dl_start+0x200 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
	    ffffad800048 _start+0x8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)

main  3258        100      instructions: 
	ffff8000100878d0 el0_sync_handler+0x168 ([kernel.kallsyms])
	ffff800010082d00 el0_sync+0x140 ([kernel.kallsyms])
	    ffffad801910 dl_main+0x848 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
	    ffffad81384c _dl_sysdep_start+0x36c (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
	    ffffad800884 _dl_start_final+0xac (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
	    ffffad800b00 _dl_start+0x200 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
	    ffffad800048 _start+0x8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)

  [...]
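The option string 'g16l64i100' packs three itrace options together. As documented for perf's --itrace:

* 'g' synthesizes callchains, here capped at 16 entries;
* 'l' synthesizes last-branch entries, here 64;
* 'i' synthesizes instruction samples with a period of 100, which matches the "100 instructions" shown in the output.

The toy splitter below only illustrates how the string decomposes. split_itrace() is hypothetical, not perf's parser, and it deliberately ignores the unit suffixes that real itrace periods accept.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* One option letter and its optional number (-1 when absent). */
struct itrace_opt {
	char letter;
	long value;
};

/* Hypothetical: split "g16l64i100" into (g,16), (l,64), (i,100). No unit handling. */
static size_t split_itrace(const char *s, struct itrace_opt *out, size_t max)
{
	size_t n = 0;

	while (*s && n < max) {
		if (!isalpha((unsigned char)*s)) {
			s++;	/* skip separators such as ',' */
			continue;
		}
		out[n].letter = *s++;
		out[n].value = -1;
		if (isdigit((unsigned char)*s)) {
			char *end;

			out[n].value = strtol(s, &end, 10);
			s = end;
		}
		n++;
	}
	return n;
}
```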

Changes from v4:
* Addressed Mike's suggestion to improve the performance of
  cs_etm__instr_addr() with a quick calculation for non-T32 code;
* Removed the patch 'perf cs-etm: Synchronize instruction sample with
  the thread stack' (Mike);
* Fixed the branch sample and thread stack handling when an exception is
  taken while accessing the branch target address (patches 01, 02, 07);
* Fixed the thread stack handling for instruction emulation and single
  step (patches 08, 09).
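The quick calculation for non-T32 code in the first item rests on instruction size. A32 and A64 instructions are always 4 bytes, so the address of the Nth instruction in a range is start + N * 4. T32 mixes 16- and 32-bit encodings, so it has to be walked one instruction at a time. The sketch below illustrates this. enum isa, instr_addr() and the in-memory code buffer standing in for traced memory are hypothetical. Only the T32 length rule comes from the Arm architecture: a first halfword whose top five bits are 0b11101, 0b11110 or 0b11111 starts a 32-bit instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum isa { ISA_A64, ISA_A32, ISA_T32 };

/* T32 length from the first halfword: top five bits >= 0b11101 means 32-bit. */
static unsigned t32_instr_size(uint16_t first_halfword)
{
	return ((first_halfword >> 11) >= 0x1d) ? 4 : 2;
}

/*
 * Address of the offset-th instruction of a range starting at start.
 * For T32, code[] (hypothetical) holds the halfwords mapped at code_base.
 */
static uint64_t instr_addr(enum isa isa, uint64_t start, uint64_t offset,
			   const uint16_t *code, uint64_t code_base)
{
	if (isa == ISA_T32) {
		uint64_t addr = start;

		assert(code != NULL);
		/* Variable-length encoding: decode each instruction's size. */
		for (; offset > 0; offset--)
			addr += t32_instr_size(code[(addr - code_base) / 2]);
		return addr;
	}

	/* A32/A64: fixed 4-byte instructions, no decoding needed. */
	return start + offset * 4;
}
```

The fast path avoids reading target memory at all for A32/A64, which is where the performance win comes from.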

Changes from v3:
* Split out a separate patch set for the instruction sample fixes.
* Rebased on latest perf/core branch.

Changes from v2:
* Added patch 01 to fix the unsigned variable comparison to zero
  (Suzuki).
* Refined commit logs.

Changes from v1:
* Added comments for task thread handling (Mathieu).
* Split patch 02 into two patches, one to support the thread stack and
  another to support callchains (Mathieu).
* Added a new patch to support branch filter.
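A branch filter decides which synthesized branch samples are kept. The minimal sketch below rests on two assumptions. First, the filter is a bitmask of sample flags built from perf's itrace 'c' (calls only) and 'r' (returns only) options. Second, a sample is kept when it shares at least one flag with the mask, and an empty mask keeps everything. The flag values and function names are hypothetical, not perf's PERF_IP_FLAG_* definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sample flag bits. */
enum {
	IP_FLAG_BRANCH = 1u << 0,
	IP_FLAG_CALL   = 1u << 1,
	IP_FLAG_RETURN = 1u << 2,
};

/* Build the mask from "calls only" / "returns only" style options. */
static uint32_t make_filter(bool calls_only, bool returns_only)
{
	uint32_t filter = 0;

	if (calls_only)
		filter |= IP_FLAG_CALL;
	if (returns_only)
		filter |= IP_FLAG_RETURN;
	return filter;
}

/* A filter of 0 keeps every branch; otherwise require a shared flag. */
static bool keep_branch_sample(uint32_t filter, uint32_t sample_flags)
{
	return filter == 0 || (filter & sample_flags) != 0;
}
```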

[1] https://lkml.org/lkml/2020/2/18/1406


Leo Yan (9):
  perf cs-etm: Defer to assign exception sample flag
  perf cs-etm: Reflect branch prior to exception
  perf cs-etm: Refactor instruction size handling
  perf cs-etm: Support thread stack
  perf cs-etm: Support branch filter
  perf cs-etm: Support callchain for instruction sample
  perf cs-etm: Fixup exception entry for thread stack
  perf thread: Add helper to get top return address
  perf cs-etm: Fixup exception exit for thread stack

 .../perf/util/cs-etm-decoder/cs-etm-decoder.c |   1 +
 tools/perf/util/cs-etm.c                      | 290 ++++++++++++++++--
 tools/perf/util/thread-stack.c                |  10 +
 tools/perf/util/thread-stack.h                |   1 +
 4 files changed, 268 insertions(+), 34 deletions(-)

Comments

Stephen Boyd Nov. 5, 2020, 10:50 p.m. UTC
Quoting Leo Yan (2020-02-19 21:26:52)
> This patch series adds support for thread stack and callchain; this patch
> set depends on the instruction sample fix patch set [1].
> 
> This patch set get more complex, so before divide into small groups, I'd
> like to use this patch set version to include all relevant patches, hope
> this can give whole context for related code change.

Was this split up into small groups and sent again? I didn't see
anything when searching lkml.

Leo Yan Nov. 6, 2020, 2:09 a.m. UTC
Hi Stephen,

On Thu, Nov 05, 2020 at 02:50:56PM -0800, Stephen Boyd wrote:
> Quoting Leo Yan (2020-02-19 21:26:52)
> > This patch series adds support for thread stack and callchain; this patch
> > set depends on the instruction sample fix patch set [1].
> > 
> > This patch set get more complex, so before divide into small groups, I'd
> > like to use this patch set version to include all relevant patches, hope
> > this can give whole context for related code change.
> 
> Was this split up into small groups and sent again? I didn't see
> anything when searching lkml.

No, this patch series was the last version sent for upstreaming; since I
moved on to other work, I didn't continue upstreaming it to the mainline
kernel.

IIRC, there was a concern about potential breakage of perf cs-etm
testing, so it fell into the backlog.  Let me check with Mathieu/Mike
offline on how to proceed with this patch set.  Thanks for bringing it up.

Leo