[3/3] EDAC/igen6: Allow setting edac_op_state

Message ID 20241106114024.941659-4-orange@aiven.io (mailing list archive)
State New
Series EDAC/igen6: Add polling support and allow setting edac_op_state

Commit Message

Orange Kao Nov. 6, 2024, 11:35 a.m. UTC
The current implementation does not allow users to set edac_op_state. As
a result, a user who wants to test a different edac_op_state has to
recompile the kernel.

This commit adds a module parameter, edac_op_state, which makes it easier
for users to test IBECC on their hardware.

Signed-off-by: Orange Kao <orange@aiven.io>
---
 drivers/edac/igen6_edac.c | 34 ++++++++++++++++++++++++++--------
 1 file changed, 26 insertions(+), 8 deletions(-)
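
The selection rule this patch introduces can be modeled in user space. The
sketch below is only an illustration, not the driver's code: struct
soc_config, its broken_irq flag, and resolve_op_state() are hypothetical
names, and the auto-detect fallback is an assumption based on the existing
res_cfg->machine_check check and the polling quirk mentioned in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Values mirror the 0/1/2 encoding in the parameter description;
 * -1 stands for "not set, auto detect". */
enum op_state {
	OPSTATE_INVAL = -1,
	OPSTATE_POLL  = 0,
	OPSTATE_NMI   = 1,
	OPSTATE_INT   = 2,
};

/* Hypothetical stand-in for the driver's per-SoC configuration. */
struct soc_config {
	bool machine_check; /* SoC reports errors via machine check */
	bool broken_irq;    /* assumed quirk flag: interrupts don't work */
};

/*
 * Return the op state to use: a valid user request wins, anything
 * else (unset or out of range) falls back to auto detection.
 */
static int resolve_op_state(int requested, const struct soc_config *cfg)
{
	assert(cfg != NULL);

	switch (requested) {
	case OPSTATE_POLL:
	case OPSTATE_NMI:
	case OPSTATE_INT:
		return requested;
	default:
		break;
	}

	/* Assumed auto detection: quirk first, then the SoC's native type. */
	if (cfg->broken_irq)
		return OPSTATE_POLL;
	return cfg->machine_check ? OPSTATE_INT : OPSTATE_NMI;
}
```

In this model an out-of-range request such as 7 is treated the same as an
unset parameter, which matches the default branch of the switch added to
opstate_set().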

Comments

Zhuo, Qiuxu Nov. 6, 2024, 1:04 p.m. UTC | #1
> From: Orange Kao <orange@aiven.io>
> [...]
> Subject: [PATCH 3/3] EDAC/igen6: Allow setting edac_op_state
> 
> Current implementation does not allow users to set edac_op_state. As a
> result, if a user needs to test different edac_op_state, they need to compile
> the kernel.
> 
> This commit accepts module parameter edac_op_state which makes it easier
> for users to test IBECC on their hardware.

The memory error reporting type of an SoC with the IBECC feature is
predetermined. Switching from NMI to Machine Check, or vice versa, for a
given SoC is pointless in the real world.

Additionally, interrupt mode is preferred over polling mode unless
interrupts do not work, as in the case you reported.

[ Sometimes, no choice is the best choice :-). ]

-Qiuxu

Orange Kao Nov. 6, 2024, 9:23 p.m. UTC | #2
On 6/11/24 13:04, Zhuo, Qiuxu wrote:
>> From: Orange Kao <orange@aiven.io>
>> [...]
>> Subject: [PATCH 3/3] EDAC/igen6: Allow setting edac_op_state
>>
>> Current implementation does not allow users to set edac_op_state. As a
>> result, if a user needs to test different edac_op_state, they need to compile
>> the kernel.
>>
>> This commit accepts module parameter edac_op_state which makes it easier
>> for users to test IBECC on their hardware.
> 
> An SoC's (with the IBECC feature) memory error reporting type is determined.
> Switching from NMI to Machine Check or vice versa for a given SoC is pointless
> in the real world.
> 
> Additionally, the interrupt mode is preferred over the polling mode unless
> the interrupt cannot work, as in the case you reported.
> 
> [ Sometimes, no choice is the best choice :-). ]
> 
> -Qiuxu

Thank you, Qiuxu and Boris. Good to know. I don't have any "actual" use
case, so please drop patch 3.

Thanks

Patch

diff --git a/drivers/edac/igen6_edac.c b/drivers/edac/igen6_edac.c
index dd62aa1ea9c3..025f994f7bf0 100644
--- a/drivers/edac/igen6_edac.c
+++ b/drivers/edac/igen6_edac.c
@@ -1341,16 +1341,18 @@  static int register_err_handler(void)
 {
 	int rc;
 
-	if (res_cfg->machine_check) {
+	if (edac_op_state == EDAC_OPSTATE_INT) {
 		mce_register_decode_chain(&ecclog_mce_dec);
 		return 0;
 	}
 
-	rc = register_nmi_handler(NMI_SERR, ecclog_nmi_handler,
-				  0, IGEN6_NMI_NAME);
-	if (rc) {
-		igen6_printk(KERN_ERR, "Failed to register NMI handler\n");
-		return rc;
+	if (edac_op_state == EDAC_OPSTATE_NMI) {
+		rc = register_nmi_handler(NMI_SERR, ecclog_nmi_handler,
+					  0, IGEN6_NMI_NAME);
+		if (rc) {
+			igen6_printk(KERN_ERR, "Failed to register NMI handler\n");
+			return rc;
+		}
 	}
 
 	return 0;
@@ -1358,16 +1360,29 @@  static int register_err_handler(void)
 
 static void unregister_err_handler(void)
 {
-	if (res_cfg->machine_check) {
+	if (edac_op_state == EDAC_OPSTATE_INT) {
 		mce_unregister_decode_chain(&ecclog_mce_dec);
 		return;
 	}
 
-	unregister_nmi_handler(NMI_SERR, IGEN6_NMI_NAME);
+	if (edac_op_state == EDAC_OPSTATE_NMI)
+		unregister_nmi_handler(NMI_SERR, IGEN6_NMI_NAME);
 }
 
 static void opstate_set(struct res_config *cfg, const struct pci_device_id *ent)
 {
+	switch (edac_op_state) {
+	case EDAC_OPSTATE_POLL:
+	case EDAC_OPSTATE_NMI:
+	case EDAC_OPSTATE_INT:
+		return;
+	case EDAC_OPSTATE_INVAL:
+		break;
+	default:
+		edac_op_state = EDAC_OPSTATE_INVAL;
+		break;
+	}
+
 	/*
 	 * Quirk: Certain SoCs' error reporting interrupts don't work.
 	 *        Force polling mode for them to ensure that memory error
@@ -1509,3 +1524,6 @@  module_exit(igen6_exit);
 MODULE_LICENSE("GPL v2");
 MODULE_AUTHOR("Qiuxu Zhuo");
 MODULE_DESCRIPTION("MC Driver for Intel client SoC using In-Band ECC");
+
+module_param(edac_op_state, int, 0444);
+MODULE_PARM_DESC(edac_op_state, "EDAC Error Reporting state: 0=Poll, 1=NMI, 2=Machine Check, Default=Auto detect");
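
For readers following the register/unregister hunks, the sketch below models
the dispatch in user space: machine check mode uses the MCE decode chain, NMI
mode uses the NMI handler, and polling mode registers neither. The struct and
function names are hypothetical, and the booleans merely stand in for the
real kernel registration calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum op_state {
	OPSTATE_INVAL = -1,
	OPSTATE_POLL  = 0,
	OPSTATE_NMI   = 1,
	OPSTATE_INT   = 2,
};

/* Hypothetical record of which error handlers are currently registered. */
struct handlers {
	bool mce; /* stands in for the MCE decode chain entry */
	bool nmi; /* stands in for the NMI handler */
};

/* Register exactly the handler that matches the op state, if any. */
static void register_handlers(int state, struct handlers *h)
{
	assert(h != NULL);
	h->mce = (state == OPSTATE_INT);
	h->nmi = (state == OPSTATE_NMI);
}

/* Undo register_handlers() for the same op state. */
static void unregister_handlers(int state, struct handlers *h)
{
	assert(h != NULL);
	if (state == OPSTATE_INT)
		h->mce = false;
	else if (state == OPSTATE_NMI)
		h->nmi = false;
}
```

The point of the model is the symmetry: whatever register_handlers() sets up
for a given state, unregister_handlers() tears down for that same state, and
polling mode leaves both flags untouched.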