mbox series

[00/10] arm64: stacktrace: improve unwind reporting

Message ID 20241010101510.1487477-1-mark.rutland@arm.com (mailing list archive)
Headers show
Series arm64: stacktrace: improve unwind reporting | expand

Message

Mark Rutland Oct. 10, 2024, 10:15 a.m. UTC
This series improves arm64's unwinder to explicitly identify exception
boundaries, reporting both pt_regs::pc and pt_regs::lr and explicitly
identifying the source of elements in the stacktrace. This is useful to
humans when reviewing a stacktrace, and serves as infrastructure that
can be used for RELIABLE_STACKTRACE in future.

The first 6 patches are preparatory work that are not intended to have
any functional impact, with patches 7 to 10 making the key changes.
Largely this involves teaching the unwinder to track metadata for each
unwind step, and modifying the way we manage pt_regs::stackframe so that
exception boundaries can be identifier explcitily.

With this series applied, the unwinder will report when unwind elements are not
simply the result of a frame pointer based unwind, e.g.

| Call trace:
|  show_stack+0x20/0x38 (CF)
|  dump_stack_lvl+0x60/0x80 (F)
|  dump_stack+0x18/0x28
|  nmi_cpu_backtrace+0xfc/0x140
|  nmi_trigger_cpumask_backtrace+0x1c8/0x200
|  arch_trigger_cpumask_backtrace+0x20/0x40
|  sysrq_handle_showallcpus+0x24/0x38 (F)
|  __handle_sysrq+0xa8/0x1b0 (F)
|  handle_sysrq+0x38/0x50 (F)
|  pl011_int+0x420/0x570 (F)
|  __handle_irq_event_percpu+0x60/0x220 (F)
|  handle_irq_event+0x54/0xc0 (F)
|  handle_fasteoi_irq+0xa8/0x1d0 (F)
|  generic_handle_domain_irq+0x34/0x58 (F)
|  gic_handle_irq+0x54/0x140 (FK)
|  call_on_irq_stack+0x24/0x58 (F)
|  do_interrupt_handler+0x88/0xa0
|  el1_interrupt+0x34/0x68 (F)
|  el1h_64_irq_handler+0x18/0x28
|  el1h_64_irq+0x6c/0x70
|  default_idle_call+0x34/0x180 (P)
|  default_idle_call+0x28/0x180 (L)
|  do_idle+0x204/0x268
|  cpu_startup_entry+0x3c/0x50 (F)
|  rest_init+0xe4/0xf0
|  start_kernel+0x738/0x740
|  __primary_switched+0x88/0x98

... where:

* "C" indicates that the first element of the trace was the caller of an unwind
  function (vs "T" for a blocked task's stave PC, or "P" for a pt_regs::pc).

* "F" indicates that the element was recovered from fgraph (and
  would otherwise have been reported as return_to_handler).

* "K" indicates that the element was recovered from kretprobes (and
  would otherwise have been reported as __kretprobe_trampoline).

* "P" indicates that the element was recovered from pt_regs::pc, and therefore
  this is the first element after an exception boundary.

* "L" indidates that the element was recovered from pt_regs::lr, and therefore
  this may or may not be reliable depending on whether the LR was live at the
  moment the exception was taken.

Mark.

Mark Rutland (10):
  arm64: pt_regs: assert pt_regs is a multiple of 16 bytes
  arm64: pt_regs: remove stale big-endian layout
  arm64: pt_regs: rename "pmr_save" -> "pmr"
  arm64: pt_regs: swap 'unused' and 'pmr' fields
  arm64: use a common struct frame_record
  arm64: stacktrace: move dump_backtrace() to kunwind_stack_walk()
  arm64: stacktrace: report source of unwind data
  arm64: stacktrace: report recovered PCs
  arm64: stacktrace: split unwind_consume_stack()
  arm64: stacktrace: unwind exception boundaries

 arch/arm64/include/asm/daifflags.h         |   2 +-
 arch/arm64/include/asm/processor.h         |   2 +-
 arch/arm64/include/asm/ptrace.h            |  22 ++-
 arch/arm64/include/asm/stacktrace/common.h |  74 +++++----
 arch/arm64/include/asm/stacktrace/frame.h  |  48 ++++++
 arch/arm64/kernel/asm-offsets.c            |   3 +-
 arch/arm64/kernel/entry.S                  |  16 +-
 arch/arm64/kernel/head.S                   |   3 +
 arch/arm64/kernel/probes/stBV5U5j          |   0
 arch/arm64/kernel/process.c                |   5 +-
 arch/arm64/kernel/signal.c                 |   5 -
 arch/arm64/kernel/stacktrace.c             | 176 +++++++++++++++++++--
 12 files changed, 287 insertions(+), 69 deletions(-)
 create mode 100644 arch/arm64/include/asm/stacktrace/frame.h
 create mode 100644 arch/arm64/kernel/probes/stBV5U5j

Comments

Miroslav Benes Oct. 11, 2024, 3:19 p.m. UTC | #1
Hi,

On Thu, 10 Oct 2024, Mark Rutland wrote:

> This series improves arm64's unwinder to explicitly identify exception
> boundaries, reporting both pt_regs::pc and pt_regs::lr and explicitly
> identifying the source of elements in the stacktrace. This is useful to
> humans when reviewing a stacktrace, and serves as infrastructure that
> can be used for RELIABLE_STACKTRACE in future.
> 
> The first 6 patches are preparatory work that are not intended to have
> any functional impact, with patches 7 to 10 making the key changes.
> Largely this involves teaching the unwinder to track metadata for each
> unwind step, and modifying the way we manage pt_regs::stackframe so that
> exception boundaries can be identifier explcitily.
> 
> With this series applied, the unwinder will report when unwind elements are not
> simply the result of a frame pointer based unwind, e.g.
> 
> | Call trace:
> |  show_stack+0x20/0x38 (CF)
> |  dump_stack_lvl+0x60/0x80 (F)
> |  dump_stack+0x18/0x28
> |  nmi_cpu_backtrace+0xfc/0x140
> |  nmi_trigger_cpumask_backtrace+0x1c8/0x200
> |  arch_trigger_cpumask_backtrace+0x20/0x40
> |  sysrq_handle_showallcpus+0x24/0x38 (F)
> |  __handle_sysrq+0xa8/0x1b0 (F)
> |  handle_sysrq+0x38/0x50 (F)
> |  pl011_int+0x420/0x570 (F)
> |  __handle_irq_event_percpu+0x60/0x220 (F)
> |  handle_irq_event+0x54/0xc0 (F)
> |  handle_fasteoi_irq+0xa8/0x1d0 (F)
> |  generic_handle_domain_irq+0x34/0x58 (F)
> |  gic_handle_irq+0x54/0x140 (FK)
> |  call_on_irq_stack+0x24/0x58 (F)
> |  do_interrupt_handler+0x88/0xa0
> |  el1_interrupt+0x34/0x68 (F)
> |  el1h_64_irq_handler+0x18/0x28
> |  el1h_64_irq+0x6c/0x70
> |  default_idle_call+0x34/0x180 (P)
> |  default_idle_call+0x28/0x180 (L)
> |  do_idle+0x204/0x268
> |  cpu_startup_entry+0x3c/0x50 (F)
> |  rest_init+0xe4/0xf0
> |  start_kernel+0x738/0x740
> |  __primary_switched+0x88/0x98
> 
> ... where:
> 
> * "C" indicates that the first element of the trace was the caller of an unwind
>   function (vs "T" for a blocked task's stave PC, or "P" for a pt_regs::pc).
> 
> * "F" indicates that the element was recovered from fgraph (and
>   would otherwise have been reported as return_to_handler).
> 
> * "K" indicates that the element was recovered from kretprobes (and
>   would otherwise have been reported as __kretprobe_trampoline).
> 
> * "P" indicates that the element was recovered from pt_regs::pc, and therefore
>   this is the first element after an exception boundary.
> 
> * "L" indidates that the element was recovered from pt_regs::lr, and therefore
>   this may or may not be reliable depending on whether the LR was live at the
>   moment the exception was taken.
> 
> Mark.
> 
> Mark Rutland (10):
>   arm64: pt_regs: assert pt_regs is a multiple of 16 bytes
>   arm64: pt_regs: remove stale big-endian layout
>   arm64: pt_regs: rename "pmr_save" -> "pmr"
>   arm64: pt_regs: swap 'unused' and 'pmr' fields
>   arm64: use a common struct frame_record
>   arm64: stacktrace: move dump_backtrace() to kunwind_stack_walk()
>   arm64: stacktrace: report source of unwind data
>   arm64: stacktrace: report recovered PCs
>   arm64: stacktrace: split unwind_consume_stack()
>   arm64: stacktrace: unwind exception boundaries
> 
>  arch/arm64/include/asm/daifflags.h         |   2 +-
>  arch/arm64/include/asm/processor.h         |   2 +-
>  arch/arm64/include/asm/ptrace.h            |  22 ++-
>  arch/arm64/include/asm/stacktrace/common.h |  74 +++++----
>  arch/arm64/include/asm/stacktrace/frame.h  |  48 ++++++
>  arch/arm64/kernel/asm-offsets.c            |   3 +-
>  arch/arm64/kernel/entry.S                  |  16 +-
>  arch/arm64/kernel/head.S                   |   3 +
>  arch/arm64/kernel/probes/stBV5U5j          |   0
>  arch/arm64/kernel/process.c                |   5 +-
>  arch/arm64/kernel/signal.c                 |   5 -
>  arch/arm64/kernel/stacktrace.c             | 176 +++++++++++++++++++--
>  12 files changed, 287 insertions(+), 69 deletions(-)
>  create mode 100644 arch/arm64/include/asm/stacktrace/frame.h

Very nice!

Reviewed-by: Miroslav Benes <mbenes@suse.cz>

>  create mode 100644 arch/arm64/kernel/probes/stBV5U5j

but this should probably disappear :)

M
Mark Rutland Oct. 11, 2024, 4:32 p.m. UTC | #2
On Fri, Oct 11, 2024 at 05:19:15PM +0200, Miroslav Benes wrote:
> On Thu, 10 Oct 2024, Mark Rutland wrote:
> 
> > This series improves arm64's unwinder to explicitly identify exception
> > boundaries, reporting both pt_regs::pc and pt_regs::lr and explicitly
> > identifying the source of elements in the stacktrace. This is useful to
> > humans when reviewing a stacktrace, and serves as infrastructure that
> > can be used for RELIABLE_STACKTRACE in future.

[...]

> > ... where:
> > 
> > * "C" indicates that the first element of the trace was the caller of an unwind
> >   function (vs "T" for a blocked task's stave PC, or "P" for a pt_regs::pc).

Ugh, s/stave/saved/, at least that was correct in the actual commit
message that introduced it.

[...]

> >   arm64: pt_regs: assert pt_regs is a multiple of 16 bytes
> >   arm64: pt_regs: remove stale big-endian layout
> >   arm64: pt_regs: rename "pmr_save" -> "pmr"
> >   arm64: pt_regs: swap 'unused' and 'pmr' fields
> >   arm64: use a common struct frame_record
> >   arm64: stacktrace: move dump_backtrace() to kunwind_stack_walk()
> >   arm64: stacktrace: report source of unwind data
> >   arm64: stacktrace: report recovered PCs
> >   arm64: stacktrace: split unwind_consume_stack()
> >   arm64: stacktrace: unwind exception boundaries
> > 
> >  arch/arm64/include/asm/daifflags.h         |   2 +-
> >  arch/arm64/include/asm/processor.h         |   2 +-
> >  arch/arm64/include/asm/ptrace.h            |  22 ++-
> >  arch/arm64/include/asm/stacktrace/common.h |  74 +++++----
> >  arch/arm64/include/asm/stacktrace/frame.h  |  48 ++++++
> >  arch/arm64/kernel/asm-offsets.c            |   3 +-
> >  arch/arm64/kernel/entry.S                  |  16 +-
> >  arch/arm64/kernel/head.S                   |   3 +
> >  arch/arm64/kernel/probes/stBV5U5j          |   0
> >  arch/arm64/kernel/process.c                |   5 +-
> >  arch/arm64/kernel/signal.c                 |   5 -
> >  arch/arm64/kernel/stacktrace.c             | 176 +++++++++++++++++++--
> >  12 files changed, 287 insertions(+), 69 deletions(-)
> >  create mode 100644 arch/arm64/include/asm/stacktrace/frame.h
> 
> Very nice!
> 
> Reviewed-by: Miroslav Benes <mbenes@suse.cz>

Thanks!

> >  create mode 100644 arch/arm64/kernel/probes/stBV5U5j
> 
> but this should probably disappear :)

Ack; gone.

I'll hold off sending a v2 with the fixups until middle/late next week
so that people get a chance to respond with anything more significant
and so that I can look over this with fresh eyes and clear out any more
typos...

In the mean time I've pushed out the fixed-up version to:

  https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/stacktrace/metadata

Mark.
Puranjay Mohan Oct. 15, 2024, 11:54 a.m. UTC | #3
Mark Rutland <mark.rutland@arm.com> writes:

> This series improves arm64's unwinder to explicitly identify exception
> boundaries, reporting both pt_regs::pc and pt_regs::lr and explicitly
> identifying the source of elements in the stacktrace. This is useful to
> humans when reviewing a stacktrace, and serves as infrastructure that
> can be used for RELIABLE_STACKTRACE in future.
>
> The first 6 patches are preparatory work that are not intended to have
> any functional impact, with patches 7 to 10 making the key changes.
> Largely this involves teaching the unwinder to track metadata for each
> unwind step, and modifying the way we manage pt_regs::stackframe so that
> exception boundaries can be identifier explcitily.
>
> With this series applied, the unwinder will report when unwind elements are not
> simply the result of a frame pointer based unwind, e.g.
>
> | Call trace:
> |  show_stack+0x20/0x38 (CF)
> |  dump_stack_lvl+0x60/0x80 (F)
> |  dump_stack+0x18/0x28
> |  nmi_cpu_backtrace+0xfc/0x140
> |  nmi_trigger_cpumask_backtrace+0x1c8/0x200
> |  arch_trigger_cpumask_backtrace+0x20/0x40
> |  sysrq_handle_showallcpus+0x24/0x38 (F)
> |  __handle_sysrq+0xa8/0x1b0 (F)
> |  handle_sysrq+0x38/0x50 (F)
> |  pl011_int+0x420/0x570 (F)
> |  __handle_irq_event_percpu+0x60/0x220 (F)
> |  handle_irq_event+0x54/0xc0 (F)
> |  handle_fasteoi_irq+0xa8/0x1d0 (F)
> |  generic_handle_domain_irq+0x34/0x58 (F)
> |  gic_handle_irq+0x54/0x140 (FK)
> |  call_on_irq_stack+0x24/0x58 (F)
> |  do_interrupt_handler+0x88/0xa0
> |  el1_interrupt+0x34/0x68 (F)
> |  el1h_64_irq_handler+0x18/0x28
> |  el1h_64_irq+0x6c/0x70
> |  default_idle_call+0x34/0x180 (P)
> |  default_idle_call+0x28/0x180 (L)
> |  do_idle+0x204/0x268
> |  cpu_startup_entry+0x3c/0x50 (F)
> |  rest_init+0xe4/0xf0
> |  start_kernel+0x738/0x740
> |  __primary_switched+0x88/0x98
>
> ... where:
>
> * "C" indicates that the first element of the trace was the caller of an unwind
>   function (vs "T" for a blocked task's stave PC, or "P" for a pt_regs::pc).
>
> * "F" indicates that the element was recovered from fgraph (and
>   would otherwise have been reported as return_to_handler).
>
> * "K" indicates that the element was recovered from kretprobes (and
>   would otherwise have been reported as __kretprobe_trampoline).
>
> * "P" indicates that the element was recovered from pt_regs::pc, and therefore
>   this is the first element after an exception boundary.
>
> * "L" indidates that the element was recovered from pt_regs::lr, and therefore
>   this may or may not be reliable depending on whether the LR was live at the
>   moment the exception was taken.
>
> Mark.

with all the typos reported by others fixed.

Reviewed-by: Puranjay Mohan <puranjay12@gmail.com>

Thanks,
Puranjay Mohan