[v6,00/13] Add build ID to stacktraces

Message ID 20210511003845.2429846-1-swboyd@chromium.org (mailing list archive)

Stephen Boyd May 11, 2021, 12:38 a.m. UTC
This series adds the kernel's build ID[1] to the stacktrace header
printed in oops messages, warnings, etc. and the build ID for any module
that appears in the stacktrace after the module name. The goal is to
make the stacktrace more self-contained and descriptive by including the
relevant build IDs in the kernel logs when something goes wrong. This
can be used by post-processing tools like scripts/decode_stacktrace.sh
and by kernel developers to easily locate the debug info associated with
a kernel crash and pinpoint the file and line where things started
falling apart.

To show how this can be used I've included a patch to
decode_stacktrace.sh that downloads the debuginfo from a debuginfod
server. The series also includes some patches to make buildid.c take
more const arguments and to consolidate build ID logic from kdump into
buildid.c. These are left to the end as they are mostly cleanup patches.
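
As a rough illustration of the debuginfod lookup mentioned above, a build ID taken from the log can be turned into a download URL. This is a minimal sketch, not the code from the decode_stacktrace.sh patch: the function name and the server address are hypothetical, and only the `/buildid/<id>/debuginfo` path follows the debuginfod HTTP interface.

```python
def debuginfo_url(server: str, build_id: str) -> str:
    """Build the debuginfod URL that serves debuginfo for a build ID.

    `server` is a hypothetical debuginfod base URL supplied by the caller.
    """
    build_id = build_id.strip().lower()
    # The kernel prints the build ID as a hex string; reject anything else
    # so that garbage from a mangled log line never reaches the server.
    if not build_id or any(c not in "0123456789abcdef" for c in build_id):
        raise ValueError(f"not a hex build ID: {build_id!r}")
    return f"{server.rstrip('/')}/buildid/{build_id}/debuginfo"


if __name__ == "__main__":
    # Placeholder server; the build ID is the lkdtm one from the example trace.
    print(debuginfo_url("https://debuginfod.example.org",
                        "ed5019fdf5e53be37cb1ba7899292d7e143b259e"))
```

The same build ID can be requested with `/executable` instead of `/debuginfo` when the unstripped binary itself is wanted.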

I don't know who exactly maintains this so I guess Andrew is the best
option to merge all this code. Otherwise, Petr mentioned it could
possibly go through the printk tree given that it touches mostly printk
things.

Here's an example lkdtm stacktrace on arm64.

 WARNING: CPU: 4 PID: 3255 at drivers/misc/lkdtm/bugs.c:83 lkdtm_WARNING+0x28/0x30 [lkdtm]
 Modules linked in: lkdtm rfcomm algif_hash algif_skcipher af_alg xt_cgroup uinput xt_MASQUERADE
 CPU: 4 PID: 3255 Comm: bash Not tainted 5.11 #3 aa23f7a1231c229de205662d5a9e0d4c580f19a1
 Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
 pstate: 00400009 (nzcv daif +PAN -UAO -TCO BTYPE=--)
 pc : lkdtm_WARNING+0x28/0x30 [lkdtm]
 lr : lkdtm_do_action+0x24/0x40 [lkdtm]
 sp : ffffffc0134fbca0
 x29: ffffffc0134fbca0 x28: ffffff92d53ba240
 x27: 0000000000000000 x26: 0000000000000000
 x25: 0000000000000000 x24: ffffffe3622352c0
 x23: 0000000000000020 x22: ffffffe362233366
 x21: ffffffe3622352e0 x20: ffffffc0134fbde0
 x19: 0000000000000008 x18: 0000000000000000
 x17: ffffff929b6536fc x16: 0000000000000000
 x15: 0000000000000000 x14: 0000000000000012
 x13: ffffffe380ed892c x12: ffffffe381d05068
 x11: 0000000000000000 x10: 0000000000000000
 x9 : 0000000000000001 x8 : ffffffe362237000
 x7 : aaaaaaaaaaaaaaaa x6 : 0000000000000000
 x5 : 0000000000000000 x4 : 0000000000000001
 x3 : 0000000000000008 x2 : ffffff93fef25a70
 x1 : ffffff93fef15788 x0 : ffffffe3622352e0
 Call trace:
  lkdtm_WARNING+0x28/0x30 [lkdtm ed5019fdf5e53be37cb1ba7899292d7e143b259e]
  direct_entry+0x16c/0x1b4 [lkdtm ed5019fdf5e53be37cb1ba7899292d7e143b259e]
  full_proxy_write+0x74/0xa4
  vfs_write+0xec/0x2e8
  ksys_write+0x84/0xf0
  __arm64_sys_write+0x24/0x30
  el0_svc_common+0xf4/0x1c0
  do_el0_svc_compat+0x28/0x3c
  el0_svc_compat+0x10/0x1c
  el0_sync_compat_handler+0xa8/0xcc
  el0_sync_compat+0x178/0x180
 ---[ end trace 3d95032303e59e68 ]---
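
A post-processing tool could pull the build IDs out of a trace like the one above roughly as follows. This is a sketch under assumptions: it expects 40 hex digits (a 20-byte ID), the "CPU: ... Comm: ..." header layout and the "[module id]" suffix exactly as printed in this example, and all names in it are made up for illustration. It also records each module's ID only once, however many frames mention it.

```python
import re
from typing import Dict

# "[module <40 hex digits>]" at the end of a call-trace frame.
MODULE_ID_RE = re.compile(r"\[(?P<module>\S+) (?P<build_id>[0-9a-f]{40})\]\s*$")
# Trailing build ID on the "CPU: ... PID: ... Comm: ..." header line.
VMLINUX_ID_RE = re.compile(
    r"\bCPU: \d+ PID: \d+ Comm: .* (?P<build_id>[0-9a-f]{40})\s*$")

SAMPLE = """\
 WARNING: CPU: 4 PID: 3255 at drivers/misc/lkdtm/bugs.c:83 lkdtm_WARNING+0x28/0x30 [lkdtm]
 CPU: 4 PID: 3255 Comm: bash Not tainted 5.11 #3 aa23f7a1231c229de205662d5a9e0d4c580f19a1
 pc : lkdtm_WARNING+0x28/0x30 [lkdtm]
 Call trace:
  lkdtm_WARNING+0x28/0x30 [lkdtm ed5019fdf5e53be37cb1ba7899292d7e143b259e]
  direct_entry+0x16c/0x1b4 [lkdtm ed5019fdf5e53be37cb1ba7899292d7e143b259e]
  full_proxy_write+0x74/0xa4
"""


def collect_build_ids(log: str) -> Dict[str, str]:
    """Map "vmlinux" and each module name to the first build ID seen."""
    ids: Dict[str, str] = {}
    for line in log.splitlines():
        header = VMLINUX_ID_RE.search(line)
        if header:
            ids.setdefault("vmlinux", header.group("build_id"))
            continue
        frame = MODULE_ID_RE.search(line)
        if frame:
            # setdefault keeps the first ID, so repeated frames add nothing.
            ids.setdefault(frame.group("module"), frame.group("build_id"))
    return ids


if __name__ == "__main__":
    for name, build_id in collect_build_ids(SAMPLE).items():
        print(name, build_id)
```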

Changes from v5 (https://lore.kernel.org/r/20210420215003.3510247-1-swboyd@chromium.org):
 * Rebased onto v5.12
 * Moved logic for when to include build ID code to kdump patch
 * Simplified commit text to show before/after details

Changes from v4 (https://lore.kernel.org/r/20210410015300.3764485-1-swboyd@chromium.org):
 * Stubbed out more code when CONFIG_STACKTRACE_BUILD_ID=n
 * Use static_assert instead of BUILD_BUG_ON()
 * Dropped bad printk change to IP on x86

Changes from v3 (https://lore.kernel.org/r/20210331030520.3816265-1-swboyd@chromium.org):
 * Fixed compilation warnings due to config changes
 * Fixed kernel-doc on init_vmlinux_build_id()
 * Totally removed add_build_id_vmcoreinfo()
 * Added another printk format %pBb to help x86 print backtraces
 * Added BUILD_BUG_ON() checks to make sure the build ID size doesn't change

Changes from v2 (https://lore.kernel.org/r/20210324020443.1815557-1-swboyd@chromium.org):
 * Renamed symbol printing function to indicate build IDness
 * Put build ID information behind Kconfig knob
 * Build ID for vmlinux is calculated in early init instead of on demand
 * printk format is %pS[R]b

Changes from v1 (https://lore.kernel.org/r/20210301174749.1269154-1-swboyd@chromium.org):
 * New printk format %pSb and %pSr
 * Return binary format instead of hex format string from build ID APIs
 * Some new patches to clean up buildid/decode_stacktrace.sh
 * A new patch to decode_stacktrace.sh to parse output

[1] https://fedoraproject.org/wiki/Releases/FeatureBuildId

Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Evan Green <evgreen@chromium.org>
Cc: Hsin-Yi Wang <hsinyi@chromium.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: <kexec@lists.infradead.org>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: <linux-arm-kernel@lists.infradead.org>
Cc: <linux-doc@vger.kernel.org>
Cc: <linux-kernel@vger.kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: <x86@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: peter enderborg <peter.enderborg@sony.com>

Stephen Boyd (13):
  buildid: Only consider GNU notes for build ID parsing
  buildid: Add API to parse build ID out of buffer
  buildid: Stash away kernels build ID on init
  dump_stack: Add vmlinux build ID to stack traces
  module: Add printk formats to add module build ID to stacktraces
  arm64: stacktrace: Use %pSb for backtrace printing
  x86/dumpstack: Use %pSb/%pBb for backtrace printing
  scripts/decode_stacktrace.sh: Support debuginfod
  scripts/decode_stacktrace.sh: Silence stderr messages from
    addr2line/nm
  scripts/decode_stacktrace.sh: Indicate 'auto' can be used for base
    path
  buildid: Mark some arguments const
  buildid: Fix kernel-doc notation
  kdump: Use vmlinux_build_id to simplify

 Documentation/core-api/printk-formats.rst |  11 +++
 arch/arm64/kernel/stacktrace.c            |   2 +-
 arch/x86/kernel/dumpstack.c               |   2 +-
 include/linux/buildid.h                   |   8 ++
 include/linux/crash_core.h                |  12 +--
 include/linux/kallsyms.h                  |  20 ++++-
 include/linux/module.h                    |   8 +-
 init/main.c                               |   1 +
 kernel/crash_core.c                       |  50 +----------
 kernel/kallsyms.c                         | 101 +++++++++++++++++-----
 kernel/module.c                           |  31 ++++++-
 lib/Kconfig.debug                         |  11 +++
 lib/buildid.c                             |  74 ++++++++++++----
 lib/dump_stack.c                          |  13 ++-
 lib/vsprintf.c                            |   8 +-
 scripts/decode_stacktrace.sh              |  89 +++++++++++++++----
 16 files changed, 327 insertions(+), 114 deletions(-)


base-commit: 9f4ad9e425a1d3b6a34617b8ea226d56a119a717

Comments

Petr Mladek May 11, 2021, 11:48 a.m. UTC | #1
On Mon 2021-05-10 17:38:32, Stephen Boyd wrote:
> This series adds the kernel's build ID[1] to the stacktrace header
> printed in oops messages, warnings, etc. and the build ID for any module
> that appears in the stacktrace after the module name. The goal is to
> make the stacktrace more self-contained and descriptive by including the
> relevant build IDs in the kernel logs when something goes wrong. This
> can be used by post-processing tools like scripts/decode_stacktrace.sh
> and by kernel developers to easily locate the debug info associated with
> a kernel crash and pinpoint the file and line where things started
> falling apart.

The entire series looks good to me.

I carefully reviewed only the 5th patch, which touches the
printk/kallsyms/module code. I just skimmed the other patches touching
kernel code and did not notice any obvious problems. I did not check the
changes in decode_stacktrace.sh at all.

I tried to get stacktraces on x86_64 and it worked as expected.

Best Regards,
Petr
David Laight May 11, 2021, 12:36 p.m. UTC | #2
From: Stephen Boyd
> Sent: 11 May 2021 01:39
> 
> This series adds the kernel's build ID[1] to the stacktrace header
> printed in oops messages, warnings, etc. and the build ID for any module
> that appears in the stacktrace after the module name. The goal is to
> make the stacktrace more self-contained and descriptive by including the
> relevant build IDs in the kernel logs when something goes wrong. This
> can be used by post-processing tools like scripts/decode_stacktrace.sh
> and by kernel developers to easily locate the debug info associated with
> a kernel crash and pinpoint the file and line where things started
> falling apart.
> 
> To show how this can be used I've included a patch to
> decode_stacktrace.sh that downloads the debuginfo from a debuginfod
> server. 
...
> Here's an example lkdtm stacktrace on arm64.
> 
>  WARNING: CPU: 4 PID: 3255 at drivers/misc/lkdtm/bugs.c:83 lkdtm_WARNING+0x28/0x30 [lkdtm]
>  Modules linked in: lkdtm rfcomm algif_hash algif_skcipher af_alg xt_cgroup uinput xt_MASQUERADE
>  CPU: 4 PID: 3255 Comm: bash Not tainted 5.11 #3 aa23f7a1231c229de205662d5a9e0d4c580f19a1
>  Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
...
>  x1 : ffffff93fef15788 x0 : ffffffe3622352e0
>  Call trace:
>   lkdtm_WARNING+0x28/0x30 [lkdtm ed5019fdf5e53be37cb1ba7899292d7e143b259e]
>   direct_entry+0x16c/0x1b4 [lkdtm ed5019fdf5e53be37cb1ba7899292d7e143b259e]
>   full_proxy_write+0x74/0xa4

Is there any way to get it to print each module ID only once?

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Steven Rostedt May 11, 2021, 12:52 p.m. UTC | #3
On Tue, 11 May 2021 12:36:06 +0000
David Laight <David.Laight@ACULAB.COM> wrote:

> >  x1 : ffffff93fef15788 x0 : ffffffe3622352e0
> >  Call trace:
> >   lkdtm_WARNING+0x28/0x30 [lkdtm ed5019fdf5e53be37cb1ba7899292d7e143b259e]
> >   direct_entry+0x16c/0x1b4 [lkdtm ed5019fdf5e53be37cb1ba7899292d7e143b259e]
> >   full_proxy_write+0x74/0xa4  
> 
> Is there any way to get it to print each module ID only once?

If there's a trivial way to do that, then perhaps it should be done, but for
now, this patch series isn't as obnoxious as the previous versions. It only
affects stack traces, and I'm fine with that.

-- Steve
David Laight May 11, 2021, 12:58 p.m. UTC | #4
From: Steven Rostedt
> Sent: 11 May 2021 13:53
> 
> On Tue, 11 May 2021 12:36:06 +0000
> David Laight <David.Laight@ACULAB.COM> wrote:
> 
> > >  x1 : ffffff93fef15788 x0 : ffffffe3622352e0
> > >  Call trace:
> > >   lkdtm_WARNING+0x28/0x30 [lkdtm ed5019fdf5e53be37cb1ba7899292d7e143b259e]
> > >   direct_entry+0x16c/0x1b4 [lkdtm ed5019fdf5e53be37cb1ba7899292d7e143b259e]
> > >   full_proxy_write+0x74/0xa4
> >
> > Is there any way to get it to print each module ID only once?
> 
> If there's a trivial way to do that, then perhaps it should be done, but for
> now, this patch series isn't as obnoxious as the previous versions. It only
> affects stack traces, and I'm fine with that.

True. Printing the id in the module list was horrid.

The real downside is all the extra text that will overflow the
in-kernel buffer.
At least it shouldn't be extra lines causing screen wrap.
Unless the variable names are long - hi rust :-)

	David

Petr Mladek May 11, 2021, 2:21 p.m. UTC | #5
On Tue 2021-05-11 12:58:47, David Laight wrote:
> From: Steven Rostedt
> > Sent: 11 May 2021 13:53
> > 
> > On Tue, 11 May 2021 12:36:06 +0000
> > David Laight <David.Laight@ACULAB.COM> wrote:
> > 
> > > >  x1 : ffffff93fef15788 x0 : ffffffe3622352e0
> > > >  Call trace:
> > > >   lkdtm_WARNING+0x28/0x30 [lkdtm ed5019fdf5e53be37cb1ba7899292d7e143b259e]
> > > >   direct_entry+0x16c/0x1b4 [lkdtm ed5019fdf5e53be37cb1ba7899292d7e143b259e]
> > > >   full_proxy_write+0x74/0xa4
> > >
> > > Is there any way to get it to print each module ID only once?
> > 
> > If there's a trivial way to do that, then perhaps it should be done, but for
> > now, this patch series isn't as obnoxious as the previous versions. It only
> > affects stack traces, and I'm fine with that.
> 
> True. Printing the id in the module list was horrid.
> 
> The real downside is all the extra text that will overflow the
> in-kernel buffer.
> At least it shouldn't be extra lines causing screen wrap.
> Unless the variable names are long - hi rust :-)

Note that the ID is printed only when CONFIG_STACKTRACE_BUILD_ID
is enabled. It will be used only by some distros/vendors that
use it to download the debuginfo packages.

Best Regards,
Petr
David Laight May 11, 2021, 2:31 p.m. UTC | #6
From: Petr Mladek
> Sent: 11 May 2021 15:22
> 
> On Tue 2021-05-11 12:58:47, David Laight wrote:
> > From: Steven Rostedt
> > > Sent: 11 May 2021 13:53
> > >
> > > On Tue, 11 May 2021 12:36:06 +0000
> > > David Laight <David.Laight@ACULAB.COM> wrote:
> > >
> > > > >  x1 : ffffff93fef15788 x0 : ffffffe3622352e0
> > > > >  Call trace:
> > > > >   lkdtm_WARNING+0x28/0x30 [lkdtm ed5019fdf5e53be37cb1ba7899292d7e143b259e]
> > > > >   direct_entry+0x16c/0x1b4 [lkdtm ed5019fdf5e53be37cb1ba7899292d7e143b259e]
> > > > >   full_proxy_write+0x74/0xa4
> > > >
> > > > Is there any way to get it to print each module ID only once?
> > >
> > > If there's a trivial way to do that, then perhaps it should be done, but for
> > > now, this patch series isn't as obnoxious as the previous versions. It only
> > > affects stack traces, and I'm fine with that.
> >
> > True. Printing the id in the module list was horrid.
> >
> > The real downside is all the extra text that will overflow the
> > in-kernel buffer.
> > At least it shouldn't be extra lines causing screen wrap.
> > Unless the variable names are long - hi rust :-)
> 
> Note that the ID is printed only when CONFIG_STACKTRACE_BUILD_ID
> is enabled. It will be used only by some distros/vendors that
> use it to download the debuginfo packages.

Until Ubuntu decide to turn it on :-)

Actually, for the use case, the id could be trimmed significantly.
It is only trying to differentiate between builds of a specific module.
So even 8 digits would be plenty.

	David
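
One rough way to put numbers on the "8 digits" suggestion is the birthday bound. The sketch below assumes build IDs behave like uniformly random values and that only builds of the same module have to be told apart. Both are simplifications, and the function name is made up for illustration.

```python
import math


def collision_probability(num_builds: int, hex_digits: int) -> float:
    """Approximate chance that any two of num_builds IDs share the same
    hex_digits-long prefix (birthday approximation)."""
    space = 16 ** hex_digits  # number of distinct truncated IDs
    pairs_over_space = num_builds * (num_builds - 1) / (2 * space)
    # 1 - exp(-x), computed with expm1 to stay accurate for tiny x.
    return -math.expm1(-pairs_over_space)


if __name__ == "__main__":
    for digits in (8, 40):
        p = collision_probability(1000, digits)
        print(f"{digits} hex digits, 1000 builds: {p:.3e}")
```

This only estimates accidental collisions between truncated IDs. It says nothing about whether a debuginfo server can look up a shortened ID.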

Matthew Wilcox (Oracle) May 11, 2021, 2:35 p.m. UTC | #7
On Tue, May 11, 2021 at 02:31:38PM +0000, David Laight wrote:
> Actually, for the use case, the id could be trimmed significantly.
> It is only trying to differentiate between builds of a specific module.
> So even 8 digits would be plenty.

asked and answered.  please review the bidding.