diff mbox series

[v2] x86/mce: Defer processing early errors until mcheck_late_init()

Message ID 20210823204122.GA1640015@agluck-desk2.amr.corp.intel.com (mailing list archive)
State New, archived
Headers show
Series [v2] x86/mce: Defer processing early errors until mcheck_late_init() | expand

Commit Message

Tony Luck Aug. 23, 2021, 8:41 p.m. UTC
When a fatal machine check results in a system reset, Linux does
not clear the error(s) from machine check bank(s).

Hardware preserves the machine check banks across a warm reset.

During initialization of the kernel after the reboot, Linux reads,
logs, and clears all machine check banks.

But there is a problem. In:
commit 5de97c9f6d85 ("x86/mce: Factor out and deprecate the /dev/mcelog driver")
the call to mce_register_decode_chain() moved later in the boot sequence.
This means that /dev/mcelog doesn't see those early error logs.

This was partially fixed by:
commit cd9c57cad3fe ("x86/MCE: Dump MCE to dmesg if no consumers")

which made sure that the logs were not lost completely by printing
to the console. But parsing console logs is error prone. Users
of /dev/mcelog should expect to find any early errors logged to
standard places.

Delay processing logs until after all built-in code has had a chance
to register on the mce notifier chain (modules are still out of luck,
there's not way to know how long to wait for those to load).

Fixes: 5de97c9f6d85 ("x86/mce: Factor out and deprecate the /dev/mcelog driver")
Reported-by: Sumanth Kamatala <skamatala@juniper.net>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/mce/core.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comments

Borislav Petkov Aug. 23, 2021, 8:51 p.m. UTC | #1
On Mon, Aug 23, 2021 at 01:41:22PM -0700, Luck, Tony wrote:
>  arch/x86/kernel/cpu/mce/core.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)

I actually had a different idea in mind, considering the fact that some
machinery to only log the early MCEs is already there. And this fits
more naturally in the flow and doesn't need a bool switch.

Hmmm?

---
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 0607ec4f5091..9b13cca74f65 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -265,6 +265,7 @@ enum mcp_flags {
 	MCP_TIMESTAMP	= BIT(0),	/* log time stamp */
 	MCP_UC		= BIT(1),	/* log uncorrected errors */
 	MCP_DONTLOG	= BIT(2),	/* only clear, don't log */
+	MCP_LOG_ONLY	= BIT(3),	/* log only */
 };
 bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b);
 
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 22791aadc085..bb691503c2e4 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -817,7 +817,10 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 		if (mca_cfg.dont_log_ce && !mce_usable_address(&m))
 			goto clear_it;
 
-		mce_log(&m);
+		if (flags & MCP_LOG_ONLY)
+			mce_gen_pool_add(&m);
+		else
+			mce_log(&m);
 
 clear_it:
 		/*
@@ -1639,10 +1642,12 @@ static void __mcheck_cpu_init_generic(void)
 		m_fl = MCP_DONTLOG;
 
 	/*
-	 * Log the machine checks left over from the previous reset.
+	 * Log the machine checks left over from the previous reset. Log them
+	 * only, do not start processing them. That will happen in mcheck_late_init()
+	 * when all consumers have been registered on the notifier chain.
 	 */
 	bitmap_fill(all_banks, MAX_NR_BANKS);
-	machine_check_poll(MCP_UC | m_fl, &all_banks);
+	machine_check_poll(MCP_UC | MCP_LOG_ONLY | m_fl, &all_banks);
 
 	cr4_set_bits(X86_CR4_MCE);
Tony Luck Aug. 23, 2021, 9:41 p.m. UTC | #2
+	MCP_LOG_ONLY	= BIT(3),	/* log only */

That looks nice ... except for the name & comment here. They don't
really capture what this flag bit does.

Maybe

	MCP_QUEUE_LOG	= BIT(3),	/* Just queue to genpool */

-Tony
diff mbox series

Patch

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 22791aadc085..593af202f586 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -129,6 +129,8 @@  static void (*quirk_no_way_out)(int bank, struct mce *m, struct pt_regs *regs);
  */
 BLOCKING_NOTIFIER_HEAD(x86_mce_decoder_chain);
 
+static bool mce_init_complete;
+
 /* Do initial initialization of a struct mce */
 noinstr void mce_setup(struct mce *m)
 {
@@ -155,7 +157,7 @@  EXPORT_PER_CPU_SYMBOL_GPL(injectm);
 
 void mce_log(struct mce *m)
 {
-	if (!mce_gen_pool_add(m))
+	if (!mce_gen_pool_add(m) && mce_init_complete)
 		irq_work_queue(&mce_irq_work);
 }
 EXPORT_SYMBOL_GPL(mce_log);
@@ -2771,6 +2773,8 @@  static int __init mcheck_late_init(void)
 
 	mcheck_debugfs_init();
 
+	mce_init_complete = true;
+
 	/*
 	 * Flush out everything that has been logged during early boot, now that
 	 * everything has been initialized (workqueues, decoders, ...).