[v1,1/4] x86/mce: Collect error message for severities below MCE_PANIC_SEVERITY

Message ID 20250211060200.33845-2-xueshuai@linux.alibaba.com (mailing list archive)
State New
Series fmm/hwpoison: Fix regressions in memory failure handling

Commit Message

Shuai Xue Feb. 11, 2025, 6:01 a.m. UTC
Currently, mce_no_way_out() only collects the error message when the
error severity reaches `MCE_PANIC_SEVERITY`. To improve diagnostics,
also collect the message of the most severe error seen when every bank
stays below `MCE_PANIC_SEVERITY`.
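
As an illustration of the behavior described above, here is a minimal user-space sketch of the "remember the worst event so far" loop. It is not the kernel code: the `toy_` names, the four-level severity scale, and the `struct toy_bank` layout are assumptions made up for this example, and the per-bank work done by the real function (reading MSRs, quirks, mce_read_aux()) is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical severity scale; the kernel's own enum has more levels. */
enum toy_severity { TOY_NO_SEVERITY = 0, TOY_KEEP, TOY_AR, TOY_PANIC };

/* Hypothetical stand-in for one bank's classification result. */
struct toy_bank {
	enum toy_severity sev;	/* what a severity grader would return */
	const char *msg;	/* the message that goes with it */
};

/*
 * Walk all banks, keep the message of the most severe event seen so far,
 * and stop early once a panic-level event shows up.
 * Returns true only when a panic-level event was found.
 */
static bool toy_no_way_out(const struct toy_bank *banks, size_t n,
			   const char **msg)
{
	int cur_sev = TOY_NO_SEVERITY;
	size_t i;

	for (i = 0; i < n; i++) {
		int sev = banks[i].sev;

		/* Same ">=" as the patch: on a tie the later bank wins. */
		if (sev >= cur_sev) {
			*msg = banks[i].msg;
			cur_sev = sev;
		}

		if (cur_sev == TOY_PANIC)
			return true;
	}
	return false;
}
```

With banks graded KEEP, AR, KEEP the sketch returns false and leaves the AR message in `*msg`, whereas a loop that only records panic-level events would leave `*msg` untouched.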

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
 arch/x86/kernel/cpu/mce/core.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

Comments

Luck, Tony Feb. 11, 2025, 4:51 p.m. UTC | #1
> +	char *tmp = *msg, cur_sev = MCE_NO_SEVERITY, sev;

Should cur_sev and sev be of type "int" (since that's the type returned by mce_severity())?

It doesn't matter today since the list of return values does fit into "char", but it is setting up
to fail if that should ever change.

-Tony
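
The concern about the narrower type can be illustrated with a small stand-alone sketch. Everything here is hypothetical: the value 300 is not a real severity, and `unsigned char` is used instead of plain `char` only so that the wraparound is well defined in standard C (the signedness of plain `char` depends on the compiler and its flags).

```c
#include <stdbool.h>

/* Hypothetical: a severity value that no longer fits in 8 bits. */
#define TOY_BIG_SEVERITY 300

/* Compare after storing both values in 8-bit variables. */
static bool toy_ge_narrow(int sev, int cur)
{
	unsigned char s = (unsigned char)sev;	/* 300 wraps to 300 % 256 == 44 */
	unsigned char c = (unsigned char)cur;

	return s >= c;
}

/* Compare using the type the grading function returns. */
static bool toy_ge_int(int sev, int cur)
{
	return sev >= cur;
}
```

Comparing 300 against 100 gives true with `int` but false after narrowing, so a "more severe" event would be treated as less severe. With today's small set of severity values both versions agree.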
Shuai Xue Feb. 12, 2025, 1:51 a.m. UTC | #2
On 2025/2/12 00:51, Luck, Tony wrote:
>> +	char *tmp = *msg, cur_sev = MCE_NO_SEVERITY, sev;
> 
> Should cur_sev and sev be of type "int" (since that's the type returned by mce_severity())?
> 
> It doesn't matter today since the list of return values does fit into "char", but it is setting up
> to fail if that should ever change.
> 
> -Tony

You are right, I previously only focused on the fact that the field 'sev' of
struct severity is an unsigned char.

Will fix it in the next version.

Thanks.
Shuai
Patch

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 0dc00c9894c7..2919a077cd66 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -925,11 +925,12 @@  static __always_inline void quirk_zen_ifu(int bank, struct mce *m, struct pt_reg
  * Do a quick check if any of the events requires a panic.
  * This decides if we keep the events around or clear them.
  */
-static __always_inline int mce_no_way_out(struct mce_hw_err *err, char **msg, unsigned long *validp,
-					  struct pt_regs *regs)
+static __always_inline bool mce_no_way_out(struct mce_hw_err *err, char **msg,
+					   unsigned long *validp,
+					   struct pt_regs *regs)
 {
 	struct mce *m = &err->m;
-	char *tmp = *msg;
+	char *tmp = *msg, cur_sev = MCE_NO_SEVERITY, sev;
 	int i;
 
 	for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
@@ -945,13 +946,17 @@  static __always_inline int mce_no_way_out(struct mce_hw_err *err, char **msg, un
 			quirk_zen_ifu(i, m, regs);
 
 		m->bank = i;
-		if (mce_severity(m, regs, &tmp, true) >= MCE_PANIC_SEVERITY) {
+		sev = mce_severity(m, regs, &tmp, true);
+		if (sev >= cur_sev) {
 			mce_read_aux(err, i);
 			*msg = tmp;
-			return 1;
+			cur_sev = sev;
 		}
+
+		if (cur_sev == MCE_PANIC_SEVERITY)
+			return true;
 	}
-	return 0;
+	return false;
 }
 
 /*