Message ID | 20220922195136.54575-3-tony.luck@intel.com (mailing list archive)
State | New, archived
Series | Dump stack after certain machine checks
On Thu, Sep 22, 2022 at 12:51:36PM -0700, Tony Luck wrote:
> @@ -254,6 +255,9 @@ static noinstr void mce_panic(const char *msg, struct mce *final, char *exp)
>  			wait_for_panic();
>  		barrier();
>
> +		if (final->severity == MCE_PANIC_STACKDUMP_SEVERITY)
> +			show_stack(NULL, NULL, KERN_DEFAULT);

So this is kinda weird, IMO:

1. If the error has raised a MCE, then we will dump stack anyway.

2. If the error is the result of consuming poison or some other deferred
   type which doesn't raise an exception immediately, then we have missed
   it because we don't have the stack at the time the error got detected
   by the hardware.

3. If all you wanna do is avoid useless stack traces, you can simply
   ignore them. :)

IOW, it will dump stack in the cases we're interested in and it will dump
stack in a couple of other PANIC cases. So? We simply ignore the latter.

But I don't see the point of adding code just so that we can suppress the
uninteresting ones...
> 1. If the error has raised a MCE, then we will dump stack anyway.
I don't see stack dumps for machine check panics. I don't have any non-standard
settings (I think). Nor do I see them in the panic messages that other folks send
to me.
Are you setting some CONFIG or command line option to get a stack dump?
-Tony
On Mon, Oct 31, 2022 at 05:13:10PM +0000, Luck, Tony wrote:
> > 1. If the error has raised a MCE, then we will dump stack anyway.
>
> I don't see stack dumps for machine check panics. I don't have any non-standard
> settings (I think). Nor do I see them in the panic messages that other folks send
> to me.
>
> Are you setting some CONFIG or command line option to get a stack dump?

Well, if one were sane, one would assume that one would expect to see a
stack dump when the machine panics, right? I mean, it is only fair...

And there's an attempt:

#ifdef CONFIG_DEBUG_BUGVERBOSE
	/*
	 * Avoid nested stack-dumping if a panic occurs during oops processing
	 */
	if (!test_taint(TAINT_DIE) && oops_in_progress <= 1)
		dump_stack();
#endif

but that oops_in_progress thing is stopping us:

[   13.706764] mce: [Hardware Error]: CPU 2: Machine Check Exception: 6 Bank 4: fe000010000b0c0f
[   13.706781] mce: [Hardware Error]: RIP 10:<ffffffff8103bbcb> {trigger_mce+0xb/0x10}
[   13.706791] mce: [Hardware Error]: TSC c83826d14 ADDR e1101add1e550012 MISC cafebeef
[   13.706795] mce: [Hardware Error]: PROCESSOR 2:a00f11 TIME 1667244167 SOCKET 0 APIC 2 microcode 1000065
[   13.706809] mce: [Hardware Error]: Machine check: Processor Context Corrupt
[   13.706810] panic: on entry: oops_in_progress: 1
[   13.706812] panic: before bust_spinlocks oops_in_progress: 1
[   13.706813] Kernel panic - not syncing: Fatal local machine check
[   13.706814] panic: taint: 0, oops_in_progress: 2
[   13.707133] Kernel Offset: disabled

as panic() is being entered with oops_in_progress already set to 1.

That oops_in_progress thing looks like it is being used for console
unblanking.

Looking at

  026ee1f66aaa ("panic: fix stack dump print on direct call to panic()")

it hints that panic() might've been called twice for oops_in_progress to
be already 1 on entry.

I guess we need to figure out why that is...
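One plausible explanation for the counter values seen in that log: mce_panic()
itself calls bust_spinlocks(1), which increments oops_in_progress, before it
eventually calls panic(); panic()'s own bust_spinlocks(1) then pushes the
counter to 2, so the oops_in_progress <= 1 check above fails and dump_stack()
is skipped. The following small userspace model (not kernel code; the
*_model() names and the printf output are invented for illustration) shows
just that counter arithmetic:

/*
 * Standalone model of the interaction described above -- NOT kernel code.
 * Both mce_panic() and panic() call bust_spinlocks(1), and each call
 * increments oops_in_progress, so panic() sees the counter at 2 and
 * skips dump_stack().
 */
#include <stdio.h>

static int oops_in_progress;

static void bust_spinlocks(int yes)
{
	if (yes)
		++oops_in_progress;	/* mirrors what the kernel does on "yes" */
}

static void panic_model(const char *msg)	/* stand-in for panic() */
{
	bust_spinlocks(1);			/* panic() busts locks itself */

	/* the CONFIG_DEBUG_BUGVERBOSE check quoted earlier in the thread */
	if (oops_in_progress <= 1)
		printf("dump_stack() for: %s\n", msg);
	else
		printf("dump_stack() skipped, oops_in_progress=%d\n",
		       oops_in_progress);
}

static void mce_panic_model(const char *msg)	/* stand-in for mce_panic() */
{
	bust_spinlocks(1);	/* done before printing the MCE records */
	panic_model(msg);	/* panic() entered with oops_in_progress == 1 */
}

int main(void)
{
	mce_panic_model("Fatal local machine check");
	return 0;
}

Compiled and run, the model prints "dump_stack() skipped, oops_in_progress=2",
which matches the "taint: 0, oops_in_progress: 2" line in the log above.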
> Well, if one were sane, one would assume that one would expect to see a
> stack dump when the machine panics, right? I mean, it is only fair...

Stack dump from a machine check wasn't at all useful until h/w and Linux
started supporting recoverable machine checks. The stack dump is there to
help diagnose and fix s/w problems. But a machine check isn't a software
problem. So I was pretty happy with the status quo of not getting a stack
dump from a machine check panic.

With recoverable machine checks there are some cases where there might be
an opportunity to change the kernel to avoid a crash. See my patches that
akpm just took into the "mm" tree to recover when the kernel hits poison
during a copy-on-write:

  https://lore.kernel.org/all/20221021200120.175753-1-tony.luck@intel.com/

or the patches from Google to recover when khugepaged hits poison:

  https://lore.kernel.org/linux-mm/20221010160142.1087120-1-jiaqiyan@google.com/

To identify additional opportunities to make the kernel more resilient, it
would be useful to get a kernel stack trace in the specific case of a
recoverable data consumption machine check while executing in the kernel.

> And there's an attempt:
>
> #ifdef CONFIG_DEBUG_BUGVERBOSE
> 	/*
> 	 * Avoid nested stack-dumping if a panic occurs during oops processing
> 	 */
> 	if (!test_taint(TAINT_DIE) && oops_in_progress <= 1)
> 		dump_stack();
> #endif
>
> but that oops_in_progress thing is stopping us:

...

> it hints that panic() might've been called twice for oops_in_progress to
> be already 1 on entry.
>
> I guess we need to figure out why that is...

It might be interesting, but a distraction from the goal of my patch to
only dump the stack for recoverable machine checks in kernel code.

-Tony
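To make the "modify the code sequence to handle machine checks as errors"
idea concrete, here is a rough sketch of the pattern such recovery changes
tend to follow, loosely modeled on the copy_mc_to_kernel() helper. The
copy_page_checked() wrapper and the choice of -EHWPOISON are illustrative
assumptions, not code taken from the linked series:

/*
 * Sketch only -- illustrates the general "turn a poison hit into an error
 * return" pattern, not the actual code from the referenced patches.
 */
#include <linux/uaccess.h>	/* copy_mc_to_kernel() */
#include <linux/errno.h>

static int copy_page_checked(void *dst, const void *src, size_t len)
{
	/*
	 * copy_mc_to_kernel() returns the number of bytes NOT copied.
	 * On CPUs that support recoverable machine checks it stops at the
	 * poisoned data instead of taking the machine down, so the caller
	 * can back out of the operation and report an error.
	 */
	if (copy_mc_to_kernel(dst, src, len))
		return -EHWPOISON;	/* caller decides how to recover */

	return 0;
}

A stack trace at the point of the machine check is what shows which
memcpy()-style call sites are worth converting to this kind of pattern.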
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index 7e03f5b7f6bd..f03aaff79e39 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -18,6 +18,7 @@ enum severity_level {
 	MCE_UC_SEVERITY,
 	MCE_AR_SEVERITY,
 	MCE_PANIC_SEVERITY,
+	MCE_PANIC_STACKDUMP_SEVERITY,
 };
 
 extern struct blocking_notifier_head x86_mce_decoder_chain;
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 2c8ec5c71712..69ec63eaa625 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -44,6 +44,7 @@
 #include <linux/sync_core.h>
 #include <linux/task_work.h>
 #include <linux/hardirq.h>
+#include <linux/sched/debug.h>
 
 #include <asm/intel-family.h>
 #include <asm/processor.h>
@@ -254,6 +255,9 @@ static noinstr void mce_panic(const char *msg, struct mce *final, char *exp)
 			wait_for_panic();
 		barrier();
 
+		if (final->severity == MCE_PANIC_STACKDUMP_SEVERITY)
+			show_stack(NULL, NULL, KERN_DEFAULT);
+
 		bust_spinlocks(1);
 		console_verbose();
 	} else {
@@ -864,6 +868,7 @@ static __always_inline int mce_no_way_out(struct mce *m, char **msg, unsigned lo
 					  struct pt_regs *regs)
 {
 	char *tmp = *msg;
+	int severity;
 	int i;
 
 	for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
@@ -876,9 +881,11 @@ static __always_inline int mce_no_way_out(struct mce *m, char **msg, unsigned lo
 			quirk_sandybridge_ifu(i, m, regs);
 
 		m->bank = i;
-		if (mce_severity(m, regs, &tmp, true) >= MCE_PANIC_SEVERITY) {
+		severity = mce_severity(m, regs, &tmp, true);
+		if (severity >= MCE_PANIC_SEVERITY) {
 			mce_read_aux(m, i);
 			*msg = tmp;
+			m->severity = severity;
 			return 1;
 		}
 	}
@@ -994,7 +1001,7 @@ static void mce_reign(void)
 	 */
 	if (m && global_worst >= MCE_PANIC_SEVERITY) {
 		/* call mce_severity() to get "msg" for panic */
-		mce_severity(m, NULL, &msg, true);
+		m->severity = mce_severity(m, NULL, &msg, true);
 
 		mce_panic("Fatal machine check", m, msg);
 	}
diff --git a/arch/x86/kernel/cpu/mce/severity.c b/arch/x86/kernel/cpu/mce/severity.c
index c4477162c07d..89d083c5bd06 100644
--- a/arch/x86/kernel/cpu/mce/severity.c
+++ b/arch/x86/kernel/cpu/mce/severity.c
@@ -174,7 +174,7 @@ static struct severity {
 		USER
 		),
 	MCESEV(
-		PANIC, "Data load in unrecoverable area of kernel",
+		PANIC_STACKDUMP, "Data load in unrecoverable area of kernel",
 		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD, MCI_UC_SAR|MCI_ADDR|MCACOD_DATA),
 		KERNEL
 		),
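One property the diff relies on, implied rather than stated: the new
MCE_PANIC_STACKDUMP_SEVERITY is appended after MCE_PANIC_SEVERITY, so every
existing ">= MCE_PANIC_SEVERITY" comparison still classifies it as fatal,
while mce_panic() can single it out to decide whether to dump the stack.
A trivial standalone check of that ordering (enum members not shown in the
hunk are elided; this is not kernel code):

/* Standalone ordering check -- not kernel code. */
#include <assert.h>
#include <stdio.h>

enum severity_level {
	/*
	 * Lower severity values from internal.h elided; only the tail of
	 * the enum matters for the ordering property checked below.
	 */
	MCE_UC_SEVERITY,
	MCE_AR_SEVERITY,
	MCE_PANIC_SEVERITY,
	MCE_PANIC_STACKDUMP_SEVERITY,	/* added by the patch */
};

int main(void)
{
	/* Existing panic checks still fire for the new value... */
	assert(MCE_PANIC_STACKDUMP_SEVERITY >= MCE_PANIC_SEVERITY);
	/* ...while mce_panic() can still tell the two apart. */
	assert(MCE_PANIC_STACKDUMP_SEVERITY != MCE_PANIC_SEVERITY);
	printf("severity ordering holds\n");
	return 0;
}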
It isn't generally useful to dump the stack for a fatal machine check. The
error was detected by hardware when some parity or ECC check failed,
software isn't the problem.

But the kernel now has a few places where it can recover from a machine
check by treating it as an error. E.g. when copying parameters for system
calls from an application.

In order to ease the hunt for additional code flows where machine check
errors can be recovered it is useful to know, for example, why the kernel
was copying a page. Perhaps that code sequence can be modified to handle
machine checks as errors.

Add a new machine check severity value to indicate when a stack dump may
be useful.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/mce/internal.h |  1 +
 arch/x86/kernel/cpu/mce/core.c     | 11 +++++++++--
 arch/x86/kernel/cpu/mce/severity.c |  2 +-
 3 files changed, 11 insertions(+), 3 deletions(-)