[2/2] x86/mce: Dump the stack for recoverable machine checks in kernel context

Message ID 20220922195136.54575-3-tony.luck@intel.com (mailing list archive)
State New, archived
Series Dump stack after certain machine checks

Commit Message

Tony Luck Sept. 22, 2022, 7:51 p.m. UTC
It isn't generally useful to dump the stack for a fatal machine check.
The error was detected by hardware when some parity or ECC check failed;
software isn't the problem.

But the kernel now has a few places where it can recover from a machine
check by treating it as an error, e.g. when copying parameters for system
calls from an application.
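
(Illustrative sketch only, not part of this patch; the syscall and struct
names below are made up. When the source page is poisoned, the machine check
handler recovers through the exception-table fixup, copy_from_user() reports
the uncopied bytes, and the syscall fails with -EFAULT instead of the machine
panicking.)

#include <linux/syscalls.h>
#include <linux/uaccess.h>

struct example_params {
	u64 flags;
	u64 addr;
};

SYSCALL_DEFINE1(example_op, struct example_params __user *, uptr)
{
	struct example_params p;

	/*
	 * A recovered machine check on the user page shows up here as a
	 * plain copy failure, so it is handled like any other -EFAULT.
	 */
	if (copy_from_user(&p, uptr, sizeof(p)))
		return -EFAULT;

	/* ... act on p ... */
	return 0;
}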

In order to ease the hunt for additional code flows where machine check
errors can be recovered, it is useful to know, for example, why the
kernel was copying a page. Perhaps that code sequence can be modified to
handle machine checks as errors.

Add a new machine check severity value to indicate when a stack dump
may be useful.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/mce/internal.h |  1 +
 arch/x86/kernel/cpu/mce/core.c     | 11 +++++++++--
 arch/x86/kernel/cpu/mce/severity.c |  2 +-
 3 files changed, 11 insertions(+), 3 deletions(-)

Comments

Borislav Petkov Oct. 31, 2022, 4:44 p.m. UTC | #1
On Thu, Sep 22, 2022 at 12:51:36PM -0700, Tony Luck wrote:
> @@ -254,6 +255,9 @@ static noinstr void mce_panic(const char *msg, struct mce *final, char *exp)
>  			wait_for_panic();
>  		barrier();
>  
> +		if (final->severity == MCE_PANIC_STACKDUMP_SEVERITY)
> +			show_stack(NULL, NULL, KERN_DEFAULT);

So this is kinda weird, IMO:

1. If the error has raised a MCE, then we will dump stack anyway.

2. If the error is the result of consuming poison or some other deferred
type which doesn't raise an exception immediately, then we have missed
it because we don't have the stack at the time the error got detected by
the hardware.

3. If all you wanna do is avoid useless stack traces, you can simply
ignore them. :)

IOW, it will dump stack in the cases we're interested in and it will
dump stack in a couple of other PANIC cases. So? We simply ignore the
latter.

But I don't see the point of adding code just so that we can suppress
the uninteresting ones...
Tony Luck Oct. 31, 2022, 5:13 p.m. UTC | #2
> 1. If the error has raised a MCE, then we will dump stack anyway.

I don't see stack dumps for machine check panics. I don't have any non-standard
settings (I think). Nor do I see them in the panic messages that other folks send
to me.

Are you setting some CONFIG or command line option to get a stack dump?

-Tony
Borislav Petkov Oct. 31, 2022, 6:36 p.m. UTC | #3
On Mon, Oct 31, 2022 at 05:13:10PM +0000, Luck, Tony wrote:
> > 1. If the error has raised a MCE, then we will dump stack anyway.
> 
> I don't see stack dumps for machine check panics. I don't have any non-standard
> settings (I think). Nor do I see them in the panic messages that other folks send
> to me.
> 
> Are you setting some CONFIG or command line option to get a stack dump?

Well, if one were sane, one would assume that one would expect to see a
stack dump when the machine panics, right? I mean, it is only fair...

And there's an attempt:

#ifdef CONFIG_DEBUG_BUGVERBOSE 
        /*
         * Avoid nested stack-dumping if a panic occurs during oops processing
         */
        if (!test_taint(TAINT_DIE) && oops_in_progress <= 1)
                dump_stack();
#endif

but that oops_in_progress thing is stopping us:

[   13.706764] mce: [Hardware Error]: CPU 2: Machine Check Exception: 6 Bank 4: fe000010000b0c0f
[   13.706781] mce: [Hardware Error]: RIP 10:<ffffffff8103bbcb> {trigger_mce+0xb/0x10}
[   13.706791] mce: [Hardware Error]: TSC c83826d14 ADDR e1101add1e550012 MISC cafebeef 
[   13.706795] mce: [Hardware Error]: PROCESSOR 2:a00f11 TIME 1667244167 SOCKET 0 APIC 2 microcode 1000065
[   13.706809] mce: [Hardware Error]: Machine check: Processor Context Corrupt
[   13.706810] panic: on entry: oops_in_progress: 1
[   13.706812] panic: before bust_spinlocks oops_in_progress: 1
[   13.706813] Kernel panic - not syncing: Fatal local machine check
[   13.706814] panic: taint: 0, oops_in_progress: 2
[   13.707133] Kernel Offset: disabled

as panic() is being entered with oops_in_progress already set to 1. That
oops_in_progress thing looks like it is being used for console unblanking.
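
For reference, bust_spinlocks() (paraphrased from lib/bust_spinlocks.c, so
details may differ between kernel versions) bumps oops_in_progress on the way
in and only does the console unblanking on the way out:

void bust_spinlocks(int yes)
{
	if (yes) {
		++oops_in_progress;
	} else {
#ifdef CONFIG_VT
		unblank_screen();
#endif
		console_unblank();
		if (--oops_in_progress == 0)
			wake_up_klogd();
	}
}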

Looking at

  026ee1f66aaa ("panic: fix stack dump print on direct call to panic()")

it hints that panic() might've been called twice for oops_in_progress to
be already 1 on entry.

I guess we need to figure out why that is...
Tony Luck Oct. 31, 2022, 7:20 p.m. UTC | #4
> Well, if one were sane, one would assume that one would expect to see a
> stack dump when the machine panics, right? I mean, it is only fair...

A stack dump from a machine check wasn't at all useful until h/w and Linux started
supporting recoverable machine checks. The stack dump is there to help diagnose
and fix s/w problems. But a machine check isn't a software problem.

So I was pretty happy with the status quo of not getting a stack dump from
a machine check panic.

With recoverable machine checks there are some cases where there might
be an opportunity to change the kernel to avoid a crash. See my patches that
akpm just took into the "mm" tree to recover when the kernel hits poison during
a copy-on-write:

https://lore.kernel.org/all/20221021200120.175753-1-tony.luck@intel.com/

or the patches from Google to recover when khugepaged hits poison:

https://lore.kernel.org/linux-mm/20221010160142.1087120-1-jiaqiyan@google.com/


To identify additional opportunities to make the kernel more resilient, it would be useful
to get a kernel stack trace in the specific case of a recoverable data consumption
machine check while executing in the kernel.
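
(For illustration only; this is not code from either series, and the helper
name is made up. The shape of such a fix is typically a kernel copy path
switched to copy_mc_to_kernel(), which returns the number of bytes it could
not copy when it trips over poison instead of taking a fatal machine check.
On architectures without machine check recovery it falls back to a plain
copy.)

#include <linux/uaccess.h>	/* copy_mc_to_kernel() */
#include <linux/errno.h>

static int copy_buf_mc_safe(void *dst, const void *src, size_t len)
{
	/* A non-zero return means some bytes could not be copied (poison). */
	if (copy_mc_to_kernel(dst, src, len))
		return -EHWPOISON;	/* caller can fail gracefully or retry */

	return 0;
}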

> And there's an attempt:
>
> #ifdef CONFIG_DEBUG_BUGVERBOSE
>         /*
>          * Avoid nested stack-dumping if a panic occurs during oops processing
>          */
>         if (!test_taint(TAINT_DIE) && oops_in_progress <= 1)
>                 dump_stack();
> #endif
>
> but that oops_in_progress thing is stopping us: 

...

> it hints that panic() might've been called twice for oops_in_progress to
> be already 1 on entry.
>
> I guess we need to figure out why that is...

It might be interesting, but it's a distraction from the goal of my patch, which is
to dump the stack only for recoverable machine checks in kernel code.

-Tony

Patch

diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index 7e03f5b7f6bd..f03aaff79e39 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -18,6 +18,7 @@  enum severity_level {
 	MCE_UC_SEVERITY,
 	MCE_AR_SEVERITY,
 	MCE_PANIC_SEVERITY,
+	MCE_PANIC_STACKDUMP_SEVERITY,
 };
 
 extern struct blocking_notifier_head x86_mce_decoder_chain;
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 2c8ec5c71712..69ec63eaa625 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -44,6 +44,7 @@ 
 #include <linux/sync_core.h>
 #include <linux/task_work.h>
 #include <linux/hardirq.h>
+#include <linux/sched/debug.h>
 
 #include <asm/intel-family.h>
 #include <asm/processor.h>
@@ -254,6 +255,9 @@  static noinstr void mce_panic(const char *msg, struct mce *final, char *exp)
 			wait_for_panic();
 		barrier();
 
+		if (final->severity == MCE_PANIC_STACKDUMP_SEVERITY)
+			show_stack(NULL, NULL, KERN_DEFAULT);
+
 		bust_spinlocks(1);
 		console_verbose();
 	} else {
@@ -864,6 +868,7 @@  static __always_inline int mce_no_way_out(struct mce *m, char **msg, unsigned lo
 					  struct pt_regs *regs)
 {
 	char *tmp = *msg;
+	int severity;
 	int i;
 
 	for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
@@ -876,9 +881,11 @@  static __always_inline int mce_no_way_out(struct mce *m, char **msg, unsigned lo
 			quirk_sandybridge_ifu(i, m, regs);
 
 		m->bank = i;
-		if (mce_severity(m, regs, &tmp, true) >= MCE_PANIC_SEVERITY) {
+		severity = mce_severity(m, regs, &tmp, true);
+		if (severity >= MCE_PANIC_SEVERITY) {
 			mce_read_aux(m, i);
 			*msg = tmp;
+			m->severity = severity;
 			return 1;
 		}
 	}
@@ -994,7 +1001,7 @@  static void mce_reign(void)
 	 */
 	if (m && global_worst >= MCE_PANIC_SEVERITY) {
 		/* call mce_severity() to get "msg" for panic */
-		mce_severity(m, NULL, &msg, true);
+		m->severity = mce_severity(m, NULL, &msg, true);
 		mce_panic("Fatal machine check", m, msg);
 	}
 
diff --git a/arch/x86/kernel/cpu/mce/severity.c b/arch/x86/kernel/cpu/mce/severity.c
index c4477162c07d..89d083c5bd06 100644
--- a/arch/x86/kernel/cpu/mce/severity.c
+++ b/arch/x86/kernel/cpu/mce/severity.c
@@ -174,7 +174,7 @@  static struct severity {
 		USER
 		),
 	MCESEV(
-		PANIC, "Data load in unrecoverable area of kernel",
+		PANIC_STACKDUMP, "Data load in unrecoverable area of kernel",
 		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD, MCI_UC_SAR|MCI_ADDR|MCACOD_DATA),
 		KERNEL
 		),
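
For readers new to the severity table: MCESEV() roughly expands into a
struct severity initializer (paraphrased from severity.c, so the exact field
names may differ slightly between kernel versions), which is why the one-word
change above is all that is needed to select the new severity value:

#define MCESEV(s, m, c...) { .sev = MCE_ ## s ## _SEVERITY, .msg = m, c }
#define MASK(x, y)         .mask = x, .result = y

/*
 * So the changed entry now initializes .sev to
 * MCE_PANIC_STACKDUMP_SEVERITY (instead of MCE_PANIC_SEVERITY), with
 * .mask/.result coming from MASK() and the SER/KERNEL shorthands
 * filling in the remaining context fields.
 */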