
[v2,09/16] x86/mce: Unify AMD THR handler with MCA Polling

Message ID 20240404151359.47970-10-yazen.ghannam@amd.com (mailing list archive)
State New
Series MCA Updates

Commit Message

Yazen Ghannam April 4, 2024, 3:13 p.m. UTC
AMD systems optionally support an MCA Thresholding interrupt. The
interrupt should be used as another signal to trigger MCA polling. This
is similar to how the Intel Corrected Machine Check interrupt (CMCI) is
handled.

AMD MCA Thresholding is managed using the MCA_MISC registers within an
MCA bank. The OS will need to modify the hardware error count field in
order to reset the threshold limit and rearm the interrupt. Management
of the MCA_MISC register should be done as a follow-up to the basic MCA
polling flow. It should not be the main focus of the interrupt handler.

Furthermore, future systems will have the ability to send an MCA
Thresholding interrupt to the OS even when the OS does not manage the
feature, i.e. MCA_MISC registers are Read-as-Zero/Locked.

Call the common MCA polling function when handling the MCA Thresholding
interrupt. This will allow the OS to find any valid errors whether or
not the MCA Thresholding feature is OS-managed. Also, this allows the
common MCA polling options and kernel parameters to apply to AMD
systems.

Add a callback to the MCA polling function to handle vendor-specific
operations. Start by handling the AMD MCA Thresholding "block reset"
flow.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---

Notes:
    Link:
    https://lkml.kernel.org/r/20231118193248.1296798-14-yazen.ghannam@amd.com
    
    v1->v2:
    * No change.

 arch/x86/kernel/cpu/mce/amd.c      | 57 ++++++++++++++----------------
 arch/x86/kernel/cpu/mce/core.c     |  8 +++++
 arch/x86/kernel/cpu/mce/internal.h |  2 ++
 3 files changed, 37 insertions(+), 30 deletions(-)

Comments

Borislav Petkov April 29, 2024, 1:40 p.m. UTC | #1
On Thu, Apr 04, 2024 at 10:13:52AM -0500, Yazen Ghannam wrote:
> @@ -787,6 +793,8 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
>  			mce_log(&m);
>  
>  clear_it:
> +		vendor_handle_error(&m);

Wait, whaaat?

The normal polling happens periodically (each 5 mins) and you want to
reset the thresholding blocks each 5 mins?

And the code has there now:

static void reset_block(struct threshold_block *block)
{

...

        /* Reset threshold block after logging error. */
        memset(&tr, 0, sizeof(tr));
        tr.b = block;
        threshold_restart_bank(&tr);
}

but no error has been logged.

Frankly, I don't see the point for this part: polling all banks on
a thresholding interrupt makes sense. But this resetting from within the
polling doesn't make any sense.

Especially if that polling interval is user-controllable.

Thx.
Yazen Ghannam April 29, 2024, 2:36 p.m. UTC | #2
On 4/29/2024 9:40 AM, Borislav Petkov wrote:
> On Thu, Apr 04, 2024 at 10:13:52AM -0500, Yazen Ghannam wrote:
>> @@ -787,6 +793,8 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
>>  			mce_log(&m);
>>  
>>  clear_it:
>> +		vendor_handle_error(&m);
> 
> Wait, whaaat?
> 
> The normal polling happens periodically (each 5 mins) and you want to
> reset the thresholding blocks each 5 mins?
> 
> And the code has there now:
> 
> static void reset_block(struct threshold_block *block)
> {
> 
> ...
> 
>         /* Reset threshold block after logging error. */
>         memset(&tr, 0, sizeof(tr));
>         tr.b = block;
>         threshold_restart_bank(&tr);
> }
> 
> but no error has been logged.
> 
> Frankly, I don't see the point for this part: polling all banks on
> a thresholding interrupt makes sense. But this resetting from within the
> polling doesn't make any sense.
> 
> Especially if that polling interval is user-controllable.
> 
> Thx.
> 

The reset only happens on a threshold overflow event. There's a check above.

	if (!(high & MASK_OVERFLOW_HI))
		return;

Basically, all the cases in vendor_handle_error() would be conditional.

Related to this, I've been thinking that banks with thresholding enabled
should be removed from the list of polling banks. This is done on Intel but
not on AMD.

I wanted to give it more thought, because I think folks have come to expect
polling and thresholding to be independent on AMD.

Thanks,
Yazen
Borislav Petkov May 4, 2024, 2:52 p.m. UTC | #3
On Mon, Apr 29, 2024 at 10:36:57AM -0400, Yazen Ghannam wrote:
> Related to this, I've been thinking that banks with thresholding enabled
> should be removed from the list of polling banks. This is done on Intel but
> not on AMD.
> 
> I wanted to give it more thought, because I think folks have come to expect
> polling and thresholding to be independent on AMD.

Yes, this whole thing sounds weird.

On the one hand, you have a special interrupt for errors which have
reached a threshold *just* *so* you don't have to poll. Because polling
is ok but getting a special interrupt is better and such notification
systems always want to have a special interrupt and not have to poll.

On the other hand, you're marrying the two which sounds weird. Why?

What is wrong with getting thresholding interrupts?

Why can't we simply stop the polling and do THR only if available? That
would save a lot of energy.

So why can't we program the THR to raise an interrupt on a single error
and disable polling completely?

Because that would be a lot better as the hardware would be doing the
work for us.

In any case, I'm missing the strategy here so no cleanups without
a clear goal first please.

Thx.
Yazen Ghannam May 7, 2024, 4:25 p.m. UTC | #4
On 5/4/24 10:52 AM, Borislav Petkov wrote:
> On Mon, Apr 29, 2024 at 10:36:57AM -0400, Yazen Ghannam wrote:
>> Related to this, I've been thinking that banks with thresholding enabled
>> should be removed from the list of polling banks. This is done on Intel but
>> not on AMD.
>>
>> I wanted to give it more thought, because I think folks have come to expect
>> polling and thresholding to be independent on AMD.
> 
> Yes, this whole thing sounds weird.
> 
> On the one hand, you have a special interrupt for errors which have
> reached a threshold *just* *so* you don't have to poll. Because polling
> is ok but getting a special interrupt is better and such notification
> systems always want to have a special interrupt and not have to poll.
> 
> On the other hand, you're marrying the two which sounds weird. Why?
> 
> What is wrong with getting thresholding interrupts?
> 

Nothing. This patch is not disabling the interrupt. The goal is to get
rid of duplicate code and have a single function that checks the MCA
banks.

This would be similar to intel_threshold_interrupt().

> Why can't we simply stop the polling and do THR only if available? That
> would save a lot of energy.
> 
> So why can't we program the THR to raise an interrupt on a single error
> and disable polling completely?
> 
> Because that would be a lot better as the hardware would be doing the
> work for us.
> 
> In any case, I'm missing the strategy here so no cleanups without
> a clear goal first please.
>

We could do that. In fact, there's a request to use the threshold that
is pre-programmed in the hardware. And we could use some of the current
kernel parameters for overrides, if needed.

Thanks,
Yazen

Patch

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index f59f4a1c9b21..75195d6fe971 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -979,12 +979,7 @@  static void amd_deferred_error_interrupt(void)
 		log_error_deferred(bank);
 }
 
-static void log_error_thresholding(unsigned int bank, u64 misc)
-{
-	_log_error_deferred(bank, misc);
-}
-
-static void log_and_reset_block(struct threshold_block *block)
+static void reset_block(struct threshold_block *block)
 {
 	struct thresh_restart tr;
 	u32 low = 0, high = 0;
@@ -998,49 +993,51 @@  static void log_and_reset_block(struct threshold_block *block)
 	if (!(high & MASK_OVERFLOW_HI))
 		return;
 
-	/* Log the MCE which caused the threshold event. */
-	log_error_thresholding(block->bank, ((u64)high << 32) | low);
-
 	/* Reset threshold block after logging error. */
 	memset(&tr, 0, sizeof(tr));
 	tr.b = block;
 	threshold_restart_bank(&tr);
 }
 
-/*
- * Threshold interrupt handler will service THRESHOLD_APIC_VECTOR. The interrupt
- * goes off when error_count reaches threshold_limit.
- */
-static void amd_threshold_interrupt(void)
+static void reset_thr_blocks(unsigned int bank)
 {
 	struct threshold_block *first_block = NULL, *block = NULL, *tmp = NULL;
 	struct threshold_bank **bp = this_cpu_read(threshold_banks);
-	unsigned int bank, cpu = smp_processor_id();
 
 	/*
 	 * Validate that the threshold bank has been initialized already. The
 	 * handler is installed at boot time, but on a hotplug event the
 	 * interrupt might fire before the data has been initialized.
 	 */
-	if (!bp)
+	if (!bp || !bp[bank])
 		return;
 
-	for (bank = 0; bank < this_cpu_read(mce_num_banks); ++bank) {
-		if (!(per_cpu(bank_map, cpu) & BIT_ULL(bank)))
-			continue;
+	first_block = bp[bank]->blocks;
+	if (!first_block)
+		return;
 
-		first_block = bp[bank]->blocks;
-		if (!first_block)
-			continue;
+	/*
+	 * The first block is also the head of the list. Check it first
+	 * before iterating over the rest.
+	 */
+	reset_block(first_block);
+	list_for_each_entry_safe(block, tmp, &first_block->miscj, miscj)
+		reset_block(block);
+}
 
-		/*
-		 * The first block is also the head of the list. Check it first
-		 * before iterating over the rest.
-		 */
-		log_and_reset_block(first_block);
-		list_for_each_entry_safe(block, tmp, &first_block->miscj, miscj)
-			log_and_reset_block(block);
-	}
+/*
+ * Threshold interrupt handler will service THRESHOLD_APIC_VECTOR. The interrupt
+ * goes off when error_count reaches threshold_limit.
+ */
+static void amd_threshold_interrupt(void)
+{
+	/* Check all banks for now. This could be optimized in the future. */
+	machine_check_poll(MCP_TIMESTAMP, this_cpu_ptr(&mce_poll_banks));
+}
+
+void amd_handle_error(struct mce *m)
+{
+	reset_thr_blocks(m->bank);
 }
 
 /*
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 7a857b33f515..75297e7eb980 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -672,6 +672,12 @@  static noinstr void mce_read_aux(struct mce *m, int i)
 	}
 }
 
+static void vendor_handle_error(struct mce *m)
+{
+	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
+		return amd_handle_error(m);
+}
+
 DEFINE_PER_CPU(unsigned, mce_poll_count);
 
 /*
@@ -787,6 +793,8 @@  bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 			mce_log(&m);
 
 clear_it:
+		vendor_handle_error(&m);
+
 		/*
 		 * Clear state for this bank.
 		 */
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index e86e53695828..96b108175ca2 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -267,6 +267,7 @@  void mce_setup_for_cpu(unsigned int cpu, struct mce *m);
 #ifdef CONFIG_X86_MCE_AMD
 extern bool amd_filter_mce(struct mce *m);
 bool amd_mce_usable_address(struct mce *m);
+void amd_handle_error(struct mce *m);
 
 /*
  * If MCA_CONFIG[McaLsbInStatusSupported] is set, extract ErrAddr in bits
@@ -295,6 +296,7 @@  static __always_inline void smca_extract_err_addr(struct mce *m)
 #else
 static inline bool amd_filter_mce(struct mce *m) { return false; }
 static inline bool amd_mce_usable_address(struct mce *m) { return false; }
+static inline void amd_handle_error(struct mce *m) { }
 static inline void smca_extract_err_addr(struct mce *m) { }
 #endif