diff mbox

[v4,3/5] ghes_edac: add platform check to enable ghes_edac

Message ID 20170823225447.15608-4-toshi.kani@hpe.com (mailing list archive)
State Not Applicable, archived
Headers show

Commit Message

Kani, Toshi Aug. 23, 2017, 10:54 p.m. UTC
The ghes_edac driver was introduced in 2013 [1], but it has not
been enabled by any distro yet.  This driver obtains error info
from firmware interfaces, which are not properly implemented on
many platforms, as the driver always emits the messages below:

 This EDAC driver relies on BIOS to enumerate memory and get error reports.
 Unfortunately, not all BIOSes reflect the memory layout correctly
 So, the end result of using this driver varies from vendor to vendor
 If you find incorrect reports, please contact your hardware vendor
 to correct its BIOS.

To get out from this situation, add a platform check to selectively
enable the driver on the platforms that are known to have proper
firmware implementation.  Platform vendors can add their platforms
to the list when they support ghes_edac.

"ghes_edac.force_load=1" skips this platform check.

[1]: https://lwn.net/Articles/538438/
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
---
 drivers/edac/ghes_edac.c |   29 ++++++++++++++++++++++++-----
 1 file changed, 24 insertions(+), 5 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Borislav Petkov Aug. 31, 2017, 10:56 a.m. UTC | #1
On Wed, Aug 23, 2017 at 04:54:45PM -0600, Toshi Kani wrote:
> The ghes_edac driver was introduced in 2013 [1], but it has not
> been enabled by any distro yet.  This driver obtains error info
> from firmware interfaces, which are not properly implemented on
> many platforms, as the driver always emits the messages below:
> 
>  This EDAC driver relies on BIOS to enumerate memory and get error reports.
>  Unfortunately, not all BIOSes reflect the memory layout correctly
>  So, the end result of using this driver varies from vendor to vendor
>  If you find incorrect reports, please contact your hardware vendor
>  to correct its BIOS.
> 
> To get out from this situation, add a platform check to selectively
> enable the driver on the platforms that are known to have proper
> firmware implementation.  Platform vendors can add their platforms
> to the list when they support ghes_edac.
> 
> "ghes_edac.force_load=1" skips this platform check.
> 
> [1]: https://lwn.net/Articles/538438/
> Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> Cc: Tony Luck <tony.luck@intel.com>
> ---
>  drivers/edac/ghes_edac.c |   29 ++++++++++++++++++++++++-----
>  1 file changed, 24 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
> index 8d904df..0030a09 100644
> --- a/drivers/edac/ghes_edac.c
> +++ b/drivers/edac/ghes_edac.c
> @@ -38,6 +38,10 @@ static struct ghes_edac_pvt *ghes_pvt;
>   */
>  static DEFINE_SPINLOCK(ghes_lock);
>  
> +/* Set 1 to skip the platform check */
> +static bool __read_mostly ghes_edac_force_load;

It is static - "force_load" as a bool name is enough.

> +module_param_named(force_load, ghes_edac_force_load, bool, 0);

ERROR: Use 4 digit octal (0777) not decimal permissions
#53: FILE: drivers/edac/ghes_edac.c:43:
+module_param_named(force_load, ghes_edac_force_load, bool, 0);

This last param is @perm: visibility in sysfs. Why not visible in sysfs?

> +
>  /* Memory Device - Type 17 of SMBIOS spec */
>  struct memdev_dmi_entry {
>  	u8 type;
> @@ -415,6 +419,15 @@ void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
>  	spin_unlock_irqrestore(&ghes_lock, flags);
>  }
>  
> +/*
> + * Known systems that are safe to enable this module.
> + * "ghes_edac.force_load=1" skips this check if necessary.

Put this second sentence over the parameter definition.
Kani, Toshi Aug. 31, 2017, 4:17 p.m. UTC | #2
T24gVGh1LCAyMDE3LTA4LTMxIGF0IDEyOjU2ICswMjAwLCBCb3Jpc2xhdiBQZXRrb3Ygd3JvdGU6
DQo+IE9uIFdlZCwgQXVnIDIzLCAyMDE3IGF0IDA0OjU0OjQ1UE0gLTA2MDAsIFRvc2hpIEthbmkg
d3JvdGU6DQogOg0KPiA+IC0tLQ0KPiA+IMKgZHJpdmVycy9lZGFjL2doZXNfZWRhYy5jIHzCoMKg
wqAyOSArKysrKysrKysrKysrKysrKysrKysrKystLS0tLQ0KPiA+IMKgMSBmaWxlIGNoYW5nZWQs
IDI0IGluc2VydGlvbnMoKyksIDUgZGVsZXRpb25zKC0pDQo+ID4gDQo+ID4gZGlmZiAtLWdpdCBh
L2RyaXZlcnMvZWRhYy9naGVzX2VkYWMuYyBiL2RyaXZlcnMvZWRhYy9naGVzX2VkYWMuYw0KPiA+
IGluZGV4IDhkOTA0ZGYuLjAwMzBhMDkgMTAwNjQ0DQo+ID4gLS0tIGEvZHJpdmVycy9lZGFjL2do
ZXNfZWRhYy5jDQo+ID4gKysrIGIvZHJpdmVycy9lZGFjL2doZXNfZWRhYy5jDQo+ID4gQEAgLTM4
LDYgKzM4LDEwIEBAIHN0YXRpYyBzdHJ1Y3QgZ2hlc19lZGFjX3B2dCAqZ2hlc19wdnQ7DQo+ID4g
wqAgKi8NCj4gPiDCoHN0YXRpYyBERUZJTkVfU1BJTkxPQ0soZ2hlc19sb2NrKTsNCj4gPiDCoA0K
PiA+ICsvKiBTZXQgMSB0byBza2lwIHRoZSBwbGF0Zm9ybSBjaGVjayAqLw0KPiA+ICtzdGF0aWMg
Ym9vbCBfX3JlYWRfbW9zdGx5IGdoZXNfZWRhY19mb3JjZV9sb2FkOw0KPiANCj4gSXQgaXMgc3Rh
dGljIC0gImZvcmNlX2xvYWQiIGFzIGEgYm9vbCBuYW1lIGlzIGVub3VnaC4NCg0KV2lsbCBkby4N
Cg0KPiA+ICttb2R1bGVfcGFyYW1fbmFtZWQoZm9yY2VfbG9hZCwgZ2hlc19lZGFjX2ZvcmNlX2xv
YWQsIGJvb2wsIDApOw0KPiANCj4gRVJST1I6IFVzZSA0IGRpZ2l0IG9jdGFsICgwNzc3KSBub3Qg
ZGVjaW1hbCBwZXJtaXNzaW9ucw0KPiAjNTM6IEZJTEU6IGRyaXZlcnMvZWRhYy9naGVzX2VkYWMu
Yzo0MzoNCj4gK21vZHVsZV9wYXJhbV9uYW1lZChmb3JjZV9sb2FkLCBnaGVzX2VkYWNfZm9yY2Vf
bG9hZCwgYm9vbCwgMCk7DQo+IA0KPiBUaGlzIGxhc3QgcGFyYW0gaXMgQHBlcm06IHZpc2liaWxp
dHkgaW4gc3lzZnMuIFdoeSBub3QgdmlzaWJsZSBpbg0KPiBzeXNmcz8NCg0KSSBmb2xsb3dlZCBp
biB0aGUgZm9vdHN0ZXBzIG9mICdnaGVzX2Rpc2FibGUnLCB3aGljaCBpcyBhbHNvIGEga2VybmVs
DQpib290IG9wdGlvbiBhbmQgdXNlcyAwLg0KDQo+ID4gKw0KPiA+IMKgLyogTWVtb3J5IERldmlj
ZSAtIFR5cGUgMTcgb2YgU01CSU9TIHNwZWMgKi8NCj4gPiDCoHN0cnVjdCBtZW1kZXZfZG1pX2Vu
dHJ5IHsNCj4gPiDCoAl1OCB0eXBlOw0KPiA+IEBAIC00MTUsNiArNDE5LDE1IEBAIHZvaWQgZ2hl
c19lZGFjX3JlcG9ydF9tZW1fZXJyb3Ioc3RydWN0IGdoZXMNCj4gPiAqZ2hlcywgaW50IHNldiwN
Cj4gPiDCoAlzcGluX3VubG9ja19pcnFyZXN0b3JlKCZnaGVzX2xvY2ssIGZsYWdzKTsNCj4gPiDC
oH0NCj4gPiDCoA0KPiA+ICsvKg0KPiA+ICsgKiBLbm93biBzeXN0ZW1zIHRoYXQgYXJlIHNhZmUg
dG8gZW5hYmxlIHRoaXMgbW9kdWxlLg0KPiA+ICsgKiAiZ2hlc19lZGFjLmZvcmNlX2xvYWQ9MSIg
c2tpcHMgdGhpcyBjaGVjayBpZiBuZWNlc3NhcnkuDQo+IA0KPiBQdXQgdGhpcyBzZWNvbmQgc2Vu
dGVuY2Ugb3ZlciB0aGUgcGFyYW1ldGVyIGRlZmluaXRpb24uDQoNCldpbGwgZG8uDQoNClRoYW5r
cywNCi1Ub3NoaQ0K
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Borislav Petkov Aug. 31, 2017, 4:52 p.m. UTC | #3
On Thu, Aug 31, 2017 at 04:17:07PM +0000, Kani, Toshimitsu wrote:
> I followed in the footsteps of 'ghes_disable', which is also a kernel
> boot option and uses 0.

Ok, ghes_disable comment says that using module_param() is easier.
ghes_edac is not a module but then __setup() is for "really core code".

Documentation/admin-guide/kernel-parameters.rst also talks about
core_param() but that's #ifndef MODULE. So module_param() it is.
diff mbox

Patch

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 8d904df..0030a09 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -38,6 +38,10 @@  static struct ghes_edac_pvt *ghes_pvt;
  */
 static DEFINE_SPINLOCK(ghes_lock);
 
+/* Set 1 to skip the platform check */
+static bool __read_mostly ghes_edac_force_load;
+module_param_named(force_load, ghes_edac_force_load, bool, 0);
+
 /* Memory Device - Type 17 of SMBIOS spec */
 struct memdev_dmi_entry {
 	u8 type;
@@ -415,6 +419,15 @@  void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
 	spin_unlock_irqrestore(&ghes_lock, flags);
 }
 
+/*
+ * Known systems that are safe to enable this module.
+ * "ghes_edac.force_load=1" skips this check if necessary.
+ */
+static struct acpi_platform_list plat_list[] = {
+	{"HPE   ", "Server  ", 0, ACPI_SIG_FADT, all_versions},
+	{ } /* End */
+};
+
 int ghes_edac_register(struct ghes *ghes, struct device *dev)
 {
 	bool fake = false;
@@ -422,6 +435,12 @@  int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	struct mem_ctl_info *mci;
 	struct edac_mc_layer layers[1];
 	struct ghes_edac_dimm_fill dimm_fill;
+	int idx;
+
+	/* Check if safe to enable on this system */
+	idx = acpi_match_platform_list(plat_list);
+	if (!ghes_edac_force_load && idx < 0)
+		return 0;
 
 	/*
 	 * We have only one logical memory controller to which all DIMMs belong.
@@ -460,17 +479,17 @@  int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	mci->ctl_name = "ghes_edac";
 	mci->dev_name = "ghes";
 
-	if (!fake) {
+	if (fake) {
+		pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
+		pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
+		pr_info("work on such system. Use this driver with caution\n");
+	} else if (idx < 0) {
 		pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
 		pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
 		pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
 		pr_info("If you find incorrect reports, please contact your hardware vendor\n");
 		pr_info("to correct its BIOS.\n");
 		pr_info("This system has %d DIMM sockets.\n", num_dimm);
-	} else {
-		pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
-		pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
-		pr_info("work on such system. Use this driver with caution\n");
 	}
 
 	if (!fake) {