mbox series

[v2,00/12] Add build ID to stacktraces

Message ID 20210324020443.1815557-1-swboyd@chromium.org (mailing list archive)
Headers show
Series Add build ID to stacktraces | expand

Message

Stephen Boyd March 24, 2021, 2:04 a.m. UTC
This series adds the kernel's build ID[1] to the stacktrace header printed
in oops messages, warnings, etc. and the build ID for any module that
appears in the stacktrace after the module name. The goal is to make the
stacktrace more self-contained and descriptive by including the relevant
build IDs in the kernel logs when something goes wrong. This can be used
by post processing tools like script/decode_stacktrace.sh and kernel
developers to easily locate the debug info associated with a kernel
crash and line up what line and file things started falling apart at.

To show how this can be used I've included a patch to
decode_stacktrace.sh that downloads the debuginfo from a debuginfod
server.

This also includes some patches to make the buildid.c file use more
const arguments and consolidate logic into buildid.c from kdump. These
are left to the end as they were mostly cleanup patches. I don't know
who exactly maintains this so I guess Andrew is the best option to merge
all this code.

Here's an example lkdtm stacktrace on arm64.

 WARNING: CPU: 4 PID: 3255 at drivers/misc/lkdtm/bugs.c:83 lkdtm_WARNING+0x28/0x30 [lkdtm]
 Modules linked in: lkdtm rfcomm algif_hash algif_skcipher af_alg xt_cgroup uinput xt_MASQUERADE
 CPU: 4 PID: 3255 Comm: bash Not tainted 5.11 #3 aa23f7a1231c229de205662d5a9e0d4c580f19a1
 Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
 pstate: 00400009 (nzcv daif +PAN -UAO -TCO BTYPE=--)
 pc : lkdtm_WARNING+0x28/0x30 [lkdtm]
 lr : lkdtm_do_action+0x24/0x40 [lkdtm]
 sp : ffffffc0134fbca0
 x29: ffffffc0134fbca0 x28: ffffff92d53ba240
 x27: 0000000000000000 x26: 0000000000000000
 x25: 0000000000000000 x24: ffffffe3622352c0
 x23: 0000000000000020 x22: ffffffe362233366
 x21: ffffffe3622352e0 x20: ffffffc0134fbde0
 x19: 0000000000000008 x18: 0000000000000000
 x17: ffffff929b6536fc x16: 0000000000000000
 x15: 0000000000000000 x14: 0000000000000012
 x13: ffffffe380ed892c x12: ffffffe381d05068
 x11: 0000000000000000 x10: 0000000000000000
 x9 : 0000000000000001 x8 : ffffffe362237000
 x7 : aaaaaaaaaaaaaaaa x6 : 0000000000000000
 x5 : 0000000000000000 x4 : 0000000000000001
 x3 : 0000000000000008 x2 : ffffff93fef25a70
 x1 : ffffff93fef15788 x0 : ffffffe3622352e0
 Call trace:
  lkdtm_WARNING+0x28/0x30 [lkdtm ed5019fdf5e53be37cb1ba7899292d7e143b259e]
  direct_entry+0x16c/0x1b4 [lkdtm ed5019fdf5e53be37cb1ba7899292d7e143b259e]
  full_proxy_write+0x74/0xa4
  vfs_write+0xec/0x2e8
  ksys_write+0x84/0xf0
  __arm64_sys_write+0x24/0x30
  el0_svc_common+0xf4/0x1c0
  do_el0_svc_compat+0x28/0x3c
  el0_svc_compat+0x10/0x1c
  el0_sync_compat_handler+0xa8/0xcc
  el0_sync_compat+0x178/0x180
 ---[ end trace 3d95032303e59e68 ]---

Changes from v1 (https://lore.kernel.org/r/20210301174749.1269154-1-swboyd@chromium.org):
 * New printk format %pSb and %pSr
 * Return binary format instead of hex format string from build ID APIs
 * Some new patches to cleanup buildid/decode_stacktrace.sh
 * A new patch to decode_stacktrace.sh to parse output

[1] https://fedoraproject.org/wiki/Releases/FeatureBuildId

Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Evan Green <evgreen@chromium.org>
Cc: Hsin-Yi Wang <hsinyi@chromium.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: <kexec@lists.infradead.org>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: <linux-arm-kernel@lists.infradead.org>
Cc: <linux-doc@vger.kernel.org>
Cc: <linux-kernel@vger.kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: <x86@kernel.org>

Stephen Boyd (12):
  buildid: Add API to parse build ID out of buffer
  buildid: Add method to get running kernel's build ID
  dump_stack: Add vmlinux build ID to stack traces
  module: Add printk format to add module build ID to stacktraces
  arm64: stacktrace: Use %pSb for backtrace printing
  x86/dumpstack: Use %pSb for backtrace printing
  scripts/decode_stacktrace.sh: Support debuginfod
  scripts/decode_stacktrace.sh: Silence stderr messages from
    addr2line/nm
  scripts/decode_stacktrace.sh: Indicate 'auto' can be used for base
    path
  buildid: Mark some arguments const
  buildid: Fix kernel-doc notation
  kdump: Use vmlinux_build_id() to simplify

 Documentation/core-api/printk-formats.rst |  9 +++
 arch/arm64/kernel/stacktrace.c            |  2 +-
 arch/x86/kernel/dumpstack.c               |  4 +-
 include/linux/buildid.h                   |  3 +
 include/linux/crash_core.h                |  6 +-
 include/linux/kallsyms.h                  | 13 +++-
 include/linux/module.h                    |  6 +-
 kernel/crash_core.c                       | 41 +----------
 kernel/kallsyms.c                         | 73 ++++++++++++++-----
 kernel/module.c                           | 24 +++++-
 lib/buildid.c                             | 75 +++++++++++++++----
 lib/dump_stack.c                          |  5 +-
 lib/vsprintf.c                            |  6 +-
 scripts/decode_stacktrace.sh              | 89 +++++++++++++++++++----
 14 files changed, 251 insertions(+), 105 deletions(-)


base-commit: a38fd8748464831584a19438cbb3082b5a2dab15

Comments

Christoph Hellwig March 24, 2021, 8:55 a.m. UTC | #1
On Tue, Mar 23, 2021 at 07:04:31PM -0700, Stephen Boyd wrote:
>  x5 : 0000000000000000 x4 : 0000000000000001
>  x3 : 0000000000000008 x2 : ffffff93fef25a70
>  x1 : ffffff93fef15788 x0 : ffffffe3622352e0
>  Call trace:
>   lkdtm_WARNING+0x28/0x30 [lkdtm ed5019fdf5e53be37cb1ba7899292d7e143b259e]
>   direct_entry+0x16c/0x1b4 [lkdtm ed5019fdf5e53be37cb1ba7899292d7e143b259e]

Yikes.  No, please do not make the backtraces a complete mess for
something that serves absolutely no need.
Stephen Boyd March 24, 2021, 7:04 p.m. UTC | #2
HTML mail?

Quoting Konstantin Khlebnikov (2021-03-24 01:23:55)
> 24.03.2021, 05:04, "Stephen Boyd" <swboyd@chromium.org>:
>  
> Looks too noisy for me. Maybe print id in the line "Modules linked in:"?
> I suppose only out-of-tree modules need this?
>  

Please see this note in patch 4:

Originally, I put this on the %pS format, but that was quickly rejected
given that %pS is used in other places such as ftrace where build IDs
aren't meaningful. There was some discussions on the list to put every
module build ID into the "Modules linked in:" section of the stacktrace
message but that quickly becomes very hard to read once you have more
than three or four modules linked in. It also provides too much
information when we don't expect each module to be traversed in a
stacktrace. Having the build ID for modules that aren't important just
makes things messy. Splitting it to multiple lines for each module
quickly explodes the number of lines printed in an oops too, possibly
wrapping the warning off the console. And finally, trying to stash away
each module used in a callstack to provide the ID of each symbol printed
is cumbersome and would require changes to each architecture to stash
away modules and return their build IDs once unwinding has completed.
Peter Enderborg March 25, 2021, 11:06 a.m. UTC | #3
On 3/24/21 9:55 AM, Christoph Hellwig wrote:
> On Tue, Mar 23, 2021 at 07:04:31PM -0700, Stephen Boyd wrote:
>>  x5 : 0000000000000000 x4 : 0000000000000001
>>  x3 : 0000000000000008 x2 : ffffff93fef25a70
>>  x1 : ffffff93fef15788 x0 : ffffffe3622352e0
>>  Call trace:
>>   lkdtm_WARNING+0x28/0x30 [lkdtm ed5019fdf5e53be37cb1ba7899292d7e143b259e]
>>   direct_entry+0x16c/0x1b4 [lkdtm ed5019fdf5e53be37cb1ba7899292d7e143b259e]
> Yikes.  No, please do not make the backtraces a complete mess for
> something that serves absolutely no need.

Would a "verbose" flag be acceptable solution?    Something like write 1 to /sys/kernel/debug/verbose_stack to get the extra info.

I think I see a need for it.
Peter Enderborg March 25, 2021, 11:14 a.m. UTC | #4
On 3/24/21 3:04 AM, Stephen Boyd wrote:
> This series adds the kernel's build ID[1] to the stacktrace header printed
> in oops messages, warnings, etc. and the build ID for any module that
> appears in the stacktrace after the module name. The goal is to make the
> stacktrace more self-contained and descriptive by including the relevant
> build IDs in the kernel logs when something goes wrong. This can be used
> by post processing tools like script/decode_stacktrace.sh and kernel
> developers to easily locate the debug info associated with a kernel
> crash and line up what line and file things started falling apart at.
>
> To show how this can be used I've included a patch to
> decode_stacktrace.sh that downloads the debuginfo from a debuginfod
> server.
>
> This also includes some patches to make the buildid.c file use more
> const arguments and consolidate logic into buildid.c from kdump. These
> are left to the end as they were mostly cleanup patches. I don't know
> who exactly maintains this so I guess Andrew is the best option to merge
> all this code.
>
> Here's an example lkdtm stacktrace on arm64.
>
>  WARNING: CPU: 4 PID: 3255 at drivers/misc/lkdtm/bugs.c:83 lkdtm_WARNING+0x28/0x30 [lkdtm]
>  Modules linked in: lkdtm rfcomm algif_hash algif_skcipher af_alg xt_cgroup uinput xt_MASQUERADE
>  CPU: 4 PID: 3255 Comm: bash Not tainted 5.11 #3 aa23f7a1231c229de205662d5a9e0d4c580f19a1
>  Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
>  pstate: 00400009 (nzcv daif +PAN -UAO -TCO BTYPE=--)
>  pc : lkdtm_WARNING+0x28/0x30 [lkdtm]
>  lr : lkdtm_do_action+0x24/0x40 [lkdtm]
>  sp : ffffffc0134fbca0
>  x29: ffffffc0134fbca0 x28: ffffff92d53ba240
>  x27: 0000000000000000 x26: 0000000000000000
>  x25: 0000000000000000 x24: ffffffe3622352c0
>  x23: 0000000000000020 x22: ffffffe362233366
>  x21: ffffffe3622352e0 x20: ffffffc0134fbde0
>  x19: 0000000000000008 x18: 0000000000000000
>  x17: ffffff929b6536fc x16: 0000000000000000
>  x15: 0000000000000000 x14: 0000000000000012
>  x13: ffffffe380ed892c x12: ffffffe381d05068
>  x11: 0000000000000000 x10: 0000000000000000
>  x9 : 0000000000000001 x8 : ffffffe362237000
>  x7 : aaaaaaaaaaaaaaaa x6 : 0000000000000000
>  x5 : 0000000000000000 x4 : 0000000000000001
>  x3 : 0000000000000008 x2 : ffffff93fef25a70
>  x1 : ffffff93fef15788 x0 : ffffffe3622352e0
>  Call trace:
>   lkdtm_WARNING+0x28/0x30 [lkdtm ed5019fdf5e53be37cb1ba7899292d7e143b259e]
>   direct_entry+0x16c/0x1b4 [lkdtm ed5019fdf5e53be37cb1ba7899292d7e143b259e]
>   full_proxy_write+0x74/0xa4
>   vfs_write+0xec/0x2e8
>   ksys_write+0x84/0xf0
>   __arm64_sys_write+0x24/0x30
>   el0_svc_common+0xf4/0x1c0
>   do_el0_svc_compat+0x28/0x3c
>   el0_svc_compat+0x10/0x1c
>   el0_sync_compat_handler+0xa8/0xcc
>   el0_sync_compat+0x178/0x180
>  ---[ end trace 3d95032303e59e68 ]---

How will this work with the ftrace?
Stephen Boyd March 25, 2021, 11:18 p.m. UTC | #5
Quoting peter enderborg (2021-03-25 04:14:31)
> >   el0_sync_compat_handler+0xa8/0xcc
> >   el0_sync_compat+0x178/0x180
> >  ---[ end trace 3d95032303e59e68 ]---
> 
> How will this work with the ftrace?
> 

It won't affect ftrace, if that's the question you're asking.
Stephen Boyd March 25, 2021, 11:21 p.m. UTC | #6
Quoting peter enderborg (2021-03-25 04:06:17)
> On 3/24/21 9:55 AM, Christoph Hellwig wrote:
> > On Tue, Mar 23, 2021 at 07:04:31PM -0700, Stephen Boyd wrote:
> >>  x5 : 0000000000000000 x4 : 0000000000000001
> >>  x3 : 0000000000000008 x2 : ffffff93fef25a70
> >>  x1 : ffffff93fef15788 x0 : ffffffe3622352e0
> >>  Call trace:
> >>   lkdtm_WARNING+0x28/0x30 [lkdtm ed5019fdf5e53be37cb1ba7899292d7e143b259e]
> >>   direct_entry+0x16c/0x1b4 [lkdtm ed5019fdf5e53be37cb1ba7899292d7e143b259e]
> > Yikes.  No, please do not make the backtraces a complete mess for
> > something that serves absolutely no need.

It serves a need. Please look at the patches to understand that I'm
adding the buildid to automatically find the associated debug
information on distros.

> 
> Would a "verbose" flag be acceptable solution?    Something like write 1 to /sys/kernel/debug/verbose_stack to get the extra info.
> 
> I think I see a need for it.
> 

Or a kernel config option and a commandline parameter? That would be OK
for me as I said on v1 of this series. I'll add that in for the next
patch series given all the distaste for some more hex characters next to
the module name.
Petr Mladek March 30, 2021, 10:59 a.m. UTC | #7
On Thu 2021-03-25 16:21:46, Stephen Boyd wrote:
> Quoting peter enderborg (2021-03-25 04:06:17)
> > On 3/24/21 9:55 AM, Christoph Hellwig wrote:
> > > On Tue, Mar 23, 2021 at 07:04:31PM -0700, Stephen Boyd wrote:
> > >>  x5 : 0000000000000000 x4 : 0000000000000001
> > >>  x3 : 0000000000000008 x2 : ffffff93fef25a70
> > >>  x1 : ffffff93fef15788 x0 : ffffffe3622352e0
> > >>  Call trace:
> > >>   lkdtm_WARNING+0x28/0x30 [lkdtm ed5019fdf5e53be37cb1ba7899292d7e143b259e]
> > >>   direct_entry+0x16c/0x1b4 [lkdtm ed5019fdf5e53be37cb1ba7899292d7e143b259e]
> > > Yikes.  No, please do not make the backtraces a complete mess for
> > > something that serves absolutely no need.
> 
> It serves a need. Please look at the patches to understand that I'm
> adding the buildid to automatically find the associated debug
> information on distros.
> 
> > 
> > Would a "verbose" flag be acceptable solution?    Something like write 1 to /sys/kernel/debug/verbose_stack to get the extra info.
> > 
> > I think I see a need for it.
> > 
> 
> Or a kernel config option and a commandline parameter? That would be OK
> for me as I said on v1 of this series. I'll add that in for the next
> patch series given all the distaste for some more hex characters next to
> the module name.

IMHO, a build configure option would fit the best here.

It does not make sense to show the ID when the kernel vendor does
not have a service to download the related binaries. But it makes
sense to show the buildid by default when the provider/distro has
the service and want to use the ID when handling bug reports.

We could always add boot/run time options when people really need it.

Best Regards,
Petr