Message ID | 20211109202848.610874-3-gpiccoli@igalia.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Some improvements on panic_print | expand |
On Tue, Nov 09, 2021 at 05:28:47PM -0300, Guilherme G. Piccoli wrote: > Currently the "panic_print" parameter/sysctl allows some interesting debug > information to be printed during a panic event. This is useful for example > in cases the user cannot kdump due to resource limits, or if the user > collects panic logs in a serial output (or pstore) and prefers a fast > reboot instead of a kdump. > > Happens that currently there's no way to see all CPUs backtraces in > a panic using "panic_print" on architectures that support that. We do > have "oops_all_cpu_backtrace" sysctl, but although partially overlapping > in the functionality, they are orthogonal in nature: "panic_print" is > a panic tuning (and we have panics without oopses, like direct calls to > panic() or maybe other paths that don't go through oops_enter() > function), and the original purpose of "oops_all_cpu_backtrace" is to > provide more information on oopses for cases in which the users desire > to continue running the kernel even after an oops, i.e., used in > non-panic scenarios. > > So, we hereby introduce an additional bit for "panic_print" to allow > dumping the CPUs backtraces during a panic event. This looks to be helpful for debugging panic. Reviewed-by: Feng Tang <feng.tang@intel.com> Thanks, Feng > Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> > --- > Documentation/admin-guide/kernel-parameters.txt | 1 + > Documentation/admin-guide/sysctl/kernel.rst | 1 + > kernel/panic.c | 4 ++++ > 3 files changed, 6 insertions(+) > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt > index 0905d2cdb2d5..569d035c4332 100644 > --- a/Documentation/admin-guide/kernel-parameters.txt > +++ b/Documentation/admin-guide/kernel-parameters.txt > @@ -3690,6 +3690,7 @@ > bit 3: print locks info if CONFIG_LOCKDEP is on > bit 4: print ftrace buffer > bit 5: print all printk messages in buffer > + bit 6: print all CPUs backtrace (if available in the arch) > > panic_on_taint= Bitmask for conditionally calling panic() in add_taint() > Format: <hex>[,nousertaint] > diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst > index 70b7df9b081a..1666c1a9dbba 100644 > --- a/Documentation/admin-guide/sysctl/kernel.rst > +++ b/Documentation/admin-guide/sysctl/kernel.rst > @@ -796,6 +796,7 @@ bit 2 print timer info > bit 3 print locks info if ``CONFIG_LOCKDEP`` is on > bit 4 print ftrace buffer > bit 5: print all printk messages in buffer > +bit 6: print all CPUs backtrace (if available in the arch) > ===== ============================================ > > So for example to print tasks and memory info on panic, user can:: > diff --git a/kernel/panic.c b/kernel/panic.c > index cefd7d82366f..5da71fa4e5f1 100644 > --- a/kernel/panic.c > +++ b/kernel/panic.c > @@ -65,6 +65,7 @@ EXPORT_SYMBOL_GPL(panic_timeout); > #define PANIC_PRINT_LOCK_INFO 0x00000008 > #define PANIC_PRINT_FTRACE_INFO 0x00000010 > #define PANIC_PRINT_ALL_PRINTK_MSG 0x00000020 > +#define PANIC_PRINT_ALL_CPU_BT 0x00000040 > unsigned long panic_print; > > ATOMIC_NOTIFIER_HEAD(panic_notifier_list); > @@ -151,6 +152,9 @@ static void panic_print_sys_info(void) > if (panic_print & PANIC_PRINT_ALL_PRINTK_MSG) > console_flush_on_panic(CONSOLE_REPLAY_ALL); > > + if (panic_print & PANIC_PRINT_ALL_CPU_BT) > + trigger_all_cpu_backtrace(); > + > if (panic_print & PANIC_PRINT_TASK_INFO) > show_state(); > > -- > 2.33.1
On 30/11/2021 02:12, Feng Tang wrote: > On Tue, Nov 09, 2021 at 05:28:47PM -0300, Guilherme G. Piccoli wrote: >> [...] > This looks to be helpful for debugging panic. > > Reviewed-by: Feng Tang <feng.tang@intel.com> > > Thanks, > Feng Thanks a lot Feng, for both your reviews! Do you have any opinions about patch 3? Also, as a generic question to all CCed, what is the way forward with this thread? Cheers, Guilherme
On Fri, Dec 03, 2021 at 12:09:06PM -0300, Guilherme G. Piccoli wrote: > On 30/11/2021 02:12, Feng Tang wrote: > > On Tue, Nov 09, 2021 at 05:28:47PM -0300, Guilherme G. Piccoli wrote: > >> [...] > > This looks to be helpful for debugging panic. > > > > Reviewed-by: Feng Tang <feng.tang@intel.com> > > > > Thanks, > > Feng > > Thanks a lot Feng, for both your reviews! Do you have any opinions about > patch 3? > > Also, as a generic question to all CCed, what is the way forward with > this thread? > Cheers, mcgrof@sumo ~/linux-next (git::master)$ ./scripts/get_maintainer.pl kernel/printk/ Petr Mladek <pmladek@suse.com> (maintainer:PRINTK) Sergey Senozhatsky <senozhatsky@chromium.org> (maintainer:PRINTK) Steven Rostedt <rostedt@goodmis.org> (reviewer:PRINTK) John Ogness <john.ogness@linutronix.de> (reviewer:PRINTK) linux-kernel@vger.kernel.org (open list) So I suggest you email the patches to those. Luis
On 19/12/2021 17:11, Luis Chamberlain wrote: > mcgrof@sumo ~/linux-next (git::master)$ ./scripts/get_maintainer.pl > kernel/printk/ > Petr Mladek <pmladek@suse.com> (maintainer:PRINTK) > Sergey Senozhatsky <senozhatsky@chromium.org> (maintainer:PRINTK) > Steven Rostedt <rostedt@goodmis.org> (reviewer:PRINTK) > John Ogness <john.ogness@linutronix.de> (reviewer:PRINTK) > linux-kernel@vger.kernel.org (open list) > > So I suggest you email the patches to those. > > Luis > Hi Luis, thank you! But I confess I'm very confused with this series. I saw emails from Andrew that the patches had been accepted and were available in -mm tree ([0] for example) but I'm not seeing them in linux-next nor mmotm/mmots (although I saw them in mmotm directory for a while before). Andrew, could you clarify the state of them? Appreciate that! Cheers, Guilherme [0] https://lore.kernel.org/mm-commits/20211214182909._sQRtXv89%25akpm@linux-foundation.org/
On Mon, 20 Dec 2021 09:38:23 -0300 "Guilherme G. Piccoli" <gpiccoli@igalia.com> wrote: > On 19/12/2021 17:11, Luis Chamberlain wrote: > > mcgrof@sumo ~/linux-next (git::master)$ ./scripts/get_maintainer.pl > > kernel/printk/ > > Petr Mladek <pmladek@suse.com> (maintainer:PRINTK) > > Sergey Senozhatsky <senozhatsky@chromium.org> (maintainer:PRINTK) > > Steven Rostedt <rostedt@goodmis.org> (reviewer:PRINTK) > > John Ogness <john.ogness@linutronix.de> (reviewer:PRINTK) > > linux-kernel@vger.kernel.org (open list) > > > > So I suggest you email the patches to those. > > > > Luis > > > > Hi Luis, thank you! But I confess I'm very confused with this series. I > saw emails from Andrew that the patches had been accepted and were > available in -mm tree ([0] for example) but I'm not seeing them in > linux-next nor mmotm/mmots (although I saw them in mmotm directory for a > while before). > > Andrew, could you clarify the state of them? They'll turn up on ozlabs after I've tested it all then uploaded. I do this a couple of times a week, approx.
On 21/12/2021 20:48, Andrew Morton wrote: >[...] > > They'll turn up on ozlabs after I've tested it all then uploaded. I do > this a couple of times a week, approx. > OK, thank you Andrew. Will I get some ping when that happens, through some bot or anything? I'm waiting them to show up in linux-next tree so to start a backport to an in-house kernel and starting using them. Cheers, Guilherme
On Tue 2021-11-09 17:28:47, Guilherme G. Piccoli wrote: > Currently the "panic_print" parameter/sysctl allows some interesting debug > information to be printed during a panic event. This is useful for example > in cases the user cannot kdump due to resource limits, or if the user > collects panic logs in a serial output (or pstore) and prefers a fast > reboot instead of a kdump. Yes, I have missed this possibility many times. > Happens that currently there's no way to see all CPUs backtraces in > a panic using "panic_print" on architectures that support that. We do > have "oops_all_cpu_backtrace" sysctl, but although partially overlapping > in the functionality, they are orthogonal in nature: "panic_print" is > a panic tuning (and we have panics without oopses, like direct calls to > panic() or maybe other paths that don't go through oops_enter() > function), and the original purpose of "oops_all_cpu_backtrace" is to > provide more information on oopses for cases in which the users desire > to continue running the kernel even after an oops, i.e., used in > non-panic scenarios. panic() already prevents double backtrace of the CPU that Oopsed, see: #ifdef CONFIG_DEBUG_BUGVERBOSE /* * Avoid nested stack-dumping if a panic occurs during oops processing */ if (!test_taint(TAINT_DIE) && oops_in_progress <= 1) dump_stack(); #endif It should be possible to do something similar also for backtraces on all CPUs. There are more situation when the backtraces are printed and panic() is called, for example: softlockup_panic and softlockup_all_cpu_backtrace. Well, it is just nice to have. People probably will not use these options together. And it is better to have the backtraces twice than do not have them at all. > So, we hereby introduce an additional bit for "panic_print" to allow > dumping the CPUs backtraces during a panic event. > > Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Feel free to use: Reviewed-by: Petr Mladek <pmladek@suse.com> Best Regards, Petr
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 0905d2cdb2d5..569d035c4332 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -3690,6 +3690,7 @@ bit 3: print locks info if CONFIG_LOCKDEP is on bit 4: print ftrace buffer bit 5: print all printk messages in buffer + bit 6: print all CPUs backtrace (if available in the arch) panic_on_taint= Bitmask for conditionally calling panic() in add_taint() Format: <hex>[,nousertaint] diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst index 70b7df9b081a..1666c1a9dbba 100644 --- a/Documentation/admin-guide/sysctl/kernel.rst +++ b/Documentation/admin-guide/sysctl/kernel.rst @@ -796,6 +796,7 @@ bit 2 print timer info bit 3 print locks info if ``CONFIG_LOCKDEP`` is on bit 4 print ftrace buffer bit 5: print all printk messages in buffer +bit 6: print all CPUs backtrace (if available in the arch) ===== ============================================ So for example to print tasks and memory info on panic, user can:: diff --git a/kernel/panic.c b/kernel/panic.c index cefd7d82366f..5da71fa4e5f1 100644 --- a/kernel/panic.c +++ b/kernel/panic.c @@ -65,6 +65,7 @@ EXPORT_SYMBOL_GPL(panic_timeout); #define PANIC_PRINT_LOCK_INFO 0x00000008 #define PANIC_PRINT_FTRACE_INFO 0x00000010 #define PANIC_PRINT_ALL_PRINTK_MSG 0x00000020 +#define PANIC_PRINT_ALL_CPU_BT 0x00000040 unsigned long panic_print; ATOMIC_NOTIFIER_HEAD(panic_notifier_list); @@ -151,6 +152,9 @@ static void panic_print_sys_info(void) if (panic_print & PANIC_PRINT_ALL_PRINTK_MSG) console_flush_on_panic(CONSOLE_REPLAY_ALL); + if (panic_print & PANIC_PRINT_ALL_CPU_BT) + trigger_all_cpu_backtrace(); + if (panic_print & PANIC_PRINT_TASK_INFO) show_state();
Currently the "panic_print" parameter/sysctl allows some interesting debug information to be printed during a panic event. This is useful for example in cases the user cannot kdump due to resource limits, or if the user collects panic logs in a serial output (or pstore) and prefers a fast reboot instead of a kdump. Happens that currently there's no way to see all CPUs backtraces in a panic using "panic_print" on architectures that support that. We do have "oops_all_cpu_backtrace" sysctl, but although partially overlapping in the functionality, they are orthogonal in nature: "panic_print" is a panic tuning (and we have panics without oopses, like direct calls to panic() or maybe other paths that don't go through oops_enter() function), and the original purpose of "oops_all_cpu_backtrace" is to provide more information on oopses for cases in which the users desire to continue running the kernel even after an oops, i.e., used in non-panic scenarios. So, we hereby introduce an additional bit for "panic_print" to allow dumping the CPUs backtraces during a panic event. Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> --- Documentation/admin-guide/kernel-parameters.txt | 1 + Documentation/admin-guide/sysctl/kernel.rst | 1 + kernel/panic.c | 4 ++++ 3 files changed, 6 insertions(+)