Message ID | 20241010101510.1487477-1-mark.rutland@arm.com (mailing list archive) |
---|---|
Headers | show |
Series | arm64: stacktrace: improve unwind reporting | expand |
Hi, On Thu, 10 Oct 2024, Mark Rutland wrote: > This series improves arm64's unwinder to explicitly identify exception > boundaries, reporting both pt_regs::pc and pt_regs::lr and explicitly > identifying the source of elements in the stacktrace. This is useful to > humans when reviewing a stacktrace, and serves as infrastructure that > can be used for RELIABLE_STACKTRACE in future. > > The first 6 patches are preparatory work that are not intended to have > any functional impact, with patches 7 to 10 making the key changes. > Largely this involves teaching the unwinder to track metadata for each > unwind step, and modifying the way we manage pt_regs::stackframe so that > exception boundaries can be identifier explcitily. > > With this series applied, the unwinder will report when unwind elements are not > simply the result of a frame pointer based unwind, e.g. > > | Call trace: > | show_stack+0x20/0x38 (CF) > | dump_stack_lvl+0x60/0x80 (F) > | dump_stack+0x18/0x28 > | nmi_cpu_backtrace+0xfc/0x140 > | nmi_trigger_cpumask_backtrace+0x1c8/0x200 > | arch_trigger_cpumask_backtrace+0x20/0x40 > | sysrq_handle_showallcpus+0x24/0x38 (F) > | __handle_sysrq+0xa8/0x1b0 (F) > | handle_sysrq+0x38/0x50 (F) > | pl011_int+0x420/0x570 (F) > | __handle_irq_event_percpu+0x60/0x220 (F) > | handle_irq_event+0x54/0xc0 (F) > | handle_fasteoi_irq+0xa8/0x1d0 (F) > | generic_handle_domain_irq+0x34/0x58 (F) > | gic_handle_irq+0x54/0x140 (FK) > | call_on_irq_stack+0x24/0x58 (F) > | do_interrupt_handler+0x88/0xa0 > | el1_interrupt+0x34/0x68 (F) > | el1h_64_irq_handler+0x18/0x28 > | el1h_64_irq+0x6c/0x70 > | default_idle_call+0x34/0x180 (P) > | default_idle_call+0x28/0x180 (L) > | do_idle+0x204/0x268 > | cpu_startup_entry+0x3c/0x50 (F) > | rest_init+0xe4/0xf0 > | start_kernel+0x738/0x740 > | __primary_switched+0x88/0x98 > > ... where: > > * "C" indicates that the first element of the trace was the caller of an unwind > function (vs "T" for a blocked task's stave PC, or "P" for a pt_regs::pc). > > * "F" indicates that the element was recovered from fgraph (and > would otherwise have been reported as return_to_handler). > > * "K" indicates that the element was recovered from kretprobes (and > would otherwise have been reported as __kretprobe_trampoline). > > * "P" indicates that the element was recovered from pt_regs::pc, and therefore > this is the first element after an exception boundary. > > * "L" indidates that the element was recovered from pt_regs::lr, and therefore > this may or may not be reliable depending on whether the LR was live at the > moment the exception was taken. > > Mark. > > Mark Rutland (10): > arm64: pt_regs: assert pt_regs is a multiple of 16 bytes > arm64: pt_regs: remove stale big-endian layout > arm64: pt_regs: rename "pmr_save" -> "pmr" > arm64: pt_regs: swap 'unused' and 'pmr' fields > arm64: use a common struct frame_record > arm64: stacktrace: move dump_backtrace() to kunwind_stack_walk() > arm64: stacktrace: report source of unwind data > arm64: stacktrace: report recovered PCs > arm64: stacktrace: split unwind_consume_stack() > arm64: stacktrace: unwind exception boundaries > > arch/arm64/include/asm/daifflags.h | 2 +- > arch/arm64/include/asm/processor.h | 2 +- > arch/arm64/include/asm/ptrace.h | 22 ++- > arch/arm64/include/asm/stacktrace/common.h | 74 +++++---- > arch/arm64/include/asm/stacktrace/frame.h | 48 ++++++ > arch/arm64/kernel/asm-offsets.c | 3 +- > arch/arm64/kernel/entry.S | 16 +- > arch/arm64/kernel/head.S | 3 + > arch/arm64/kernel/probes/stBV5U5j | 0 > arch/arm64/kernel/process.c | 5 +- > arch/arm64/kernel/signal.c | 5 - > arch/arm64/kernel/stacktrace.c | 176 +++++++++++++++++++-- > 12 files changed, 287 insertions(+), 69 deletions(-) > create mode 100644 arch/arm64/include/asm/stacktrace/frame.h Very nice! Reviewed-by: Miroslav Benes <mbenes@suse.cz> > create mode 100644 arch/arm64/kernel/probes/stBV5U5j but this should probably disappear :) M
On Fri, Oct 11, 2024 at 05:19:15PM +0200, Miroslav Benes wrote: > On Thu, 10 Oct 2024, Mark Rutland wrote: > > > This series improves arm64's unwinder to explicitly identify exception > > boundaries, reporting both pt_regs::pc and pt_regs::lr and explicitly > > identifying the source of elements in the stacktrace. This is useful to > > humans when reviewing a stacktrace, and serves as infrastructure that > > can be used for RELIABLE_STACKTRACE in future. [...] > > ... where: > > > > * "C" indicates that the first element of the trace was the caller of an unwind > > function (vs "T" for a blocked task's stave PC, or "P" for a pt_regs::pc). Ugh, s/stave/saved/, at least that was correct in the actual commit message that introduced it. [...] > > arm64: pt_regs: assert pt_regs is a multiple of 16 bytes > > arm64: pt_regs: remove stale big-endian layout > > arm64: pt_regs: rename "pmr_save" -> "pmr" > > arm64: pt_regs: swap 'unused' and 'pmr' fields > > arm64: use a common struct frame_record > > arm64: stacktrace: move dump_backtrace() to kunwind_stack_walk() > > arm64: stacktrace: report source of unwind data > > arm64: stacktrace: report recovered PCs > > arm64: stacktrace: split unwind_consume_stack() > > arm64: stacktrace: unwind exception boundaries > > > > arch/arm64/include/asm/daifflags.h | 2 +- > > arch/arm64/include/asm/processor.h | 2 +- > > arch/arm64/include/asm/ptrace.h | 22 ++- > > arch/arm64/include/asm/stacktrace/common.h | 74 +++++---- > > arch/arm64/include/asm/stacktrace/frame.h | 48 ++++++ > > arch/arm64/kernel/asm-offsets.c | 3 +- > > arch/arm64/kernel/entry.S | 16 +- > > arch/arm64/kernel/head.S | 3 + > > arch/arm64/kernel/probes/stBV5U5j | 0 > > arch/arm64/kernel/process.c | 5 +- > > arch/arm64/kernel/signal.c | 5 - > > arch/arm64/kernel/stacktrace.c | 176 +++++++++++++++++++-- > > 12 files changed, 287 insertions(+), 69 deletions(-) > > create mode 100644 arch/arm64/include/asm/stacktrace/frame.h > > Very nice! > > Reviewed-by: Miroslav Benes <mbenes@suse.cz> Thanks! > > create mode 100644 arch/arm64/kernel/probes/stBV5U5j > > but this should probably disappear :) Ack; gone. I'll hold off sending a v2 with the fixups until middle/late next week so that people get a chance to respond with anything more significant and so that I can look over this with fresh eyes and clear out any more typos... In the mean time I've pushed out the fixed-up version to: https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/stacktrace/metadata Mark.
Mark Rutland <mark.rutland@arm.com> writes: > This series improves arm64's unwinder to explicitly identify exception > boundaries, reporting both pt_regs::pc and pt_regs::lr and explicitly > identifying the source of elements in the stacktrace. This is useful to > humans when reviewing a stacktrace, and serves as infrastructure that > can be used for RELIABLE_STACKTRACE in future. > > The first 6 patches are preparatory work that are not intended to have > any functional impact, with patches 7 to 10 making the key changes. > Largely this involves teaching the unwinder to track metadata for each > unwind step, and modifying the way we manage pt_regs::stackframe so that > exception boundaries can be identifier explcitily. > > With this series applied, the unwinder will report when unwind elements are not > simply the result of a frame pointer based unwind, e.g. > > | Call trace: > | show_stack+0x20/0x38 (CF) > | dump_stack_lvl+0x60/0x80 (F) > | dump_stack+0x18/0x28 > | nmi_cpu_backtrace+0xfc/0x140 > | nmi_trigger_cpumask_backtrace+0x1c8/0x200 > | arch_trigger_cpumask_backtrace+0x20/0x40 > | sysrq_handle_showallcpus+0x24/0x38 (F) > | __handle_sysrq+0xa8/0x1b0 (F) > | handle_sysrq+0x38/0x50 (F) > | pl011_int+0x420/0x570 (F) > | __handle_irq_event_percpu+0x60/0x220 (F) > | handle_irq_event+0x54/0xc0 (F) > | handle_fasteoi_irq+0xa8/0x1d0 (F) > | generic_handle_domain_irq+0x34/0x58 (F) > | gic_handle_irq+0x54/0x140 (FK) > | call_on_irq_stack+0x24/0x58 (F) > | do_interrupt_handler+0x88/0xa0 > | el1_interrupt+0x34/0x68 (F) > | el1h_64_irq_handler+0x18/0x28 > | el1h_64_irq+0x6c/0x70 > | default_idle_call+0x34/0x180 (P) > | default_idle_call+0x28/0x180 (L) > | do_idle+0x204/0x268 > | cpu_startup_entry+0x3c/0x50 (F) > | rest_init+0xe4/0xf0 > | start_kernel+0x738/0x740 > | __primary_switched+0x88/0x98 > > ... where: > > * "C" indicates that the first element of the trace was the caller of an unwind > function (vs "T" for a blocked task's stave PC, or "P" for a pt_regs::pc). > > * "F" indicates that the element was recovered from fgraph (and > would otherwise have been reported as return_to_handler). > > * "K" indicates that the element was recovered from kretprobes (and > would otherwise have been reported as __kretprobe_trampoline). > > * "P" indicates that the element was recovered from pt_regs::pc, and therefore > this is the first element after an exception boundary. > > * "L" indidates that the element was recovered from pt_regs::lr, and therefore > this may or may not be reliable depending on whether the LR was live at the > moment the exception was taken. > > Mark. with all the typos reported by others fixed. Reviewed-by: Puranjay Mohan <puranjay12@gmail.com> Thanks, Puranjay Mohan