diff mbox

[3/3] ghes_edac: add platform check to enable ghes_edac

Message ID 20170717215912.26070-4-toshi.kani@hpe.com (mailing list archive)
State Not Applicable, archived
Headers show

Commit Message

Kani, Toshi July 17, 2017, 9:59 p.m. UTC
The ghes_edac driver was introduced in 2013 [1], but it has not
been enabled by any distro yet.  This driver obtains error info
from firmware interfaces, which are not properly implemented on
many platforms, as the driver always emits the messages below:

 This EDAC driver relies on BIOS to enumerate memory and get error reports.
 Unfortunately, not all BIOSes reflect the memory layout correctly
 So, the end result of using this driver varies from vendor to vendor
 If you find incorrect reports, please contact your hardware vendor
 to correct its BIOS.

To get out from this situation, add a platform type check to
selectively enable the driver on the platforms that are known to
have proper firmware implementation.  Platform vendors can add
their platforms to the list when they support ghes_edac.

"ghes_edac.any_oem=1" skips the platform type check.

[1]: https://lwn.net/Articles/538438/
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 drivers/edac/ghes_edac.c |   28 +++++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 5 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Jeffrey Hugo July 18, 2017, 2:39 p.m. UTC | #1
On 7/17/2017 3:59 PM, Toshi Kani wrote:
> The ghes_edac driver was introduced in 2013 [1], but it has not
> been enabled by any distro yet.   

Ubuntu is expected to enable this soon.
Kani, Toshi July 18, 2017, 3:36 p.m. UTC | #2
On Tue, 2017-07-18 at 08:39 -0600, Jeffrey Hugo wrote:
> On 7/17/2017 3:59 PM, Toshi Kani wrote:

> > The ghes_edac driver was introduced in 2013 [1], but it has not

> > been enabled by any distro yet.   

> 

> Ubuntu is expected to enable this soon.


Interesting.  I was told from other distro that there were many buggy
firmwares out there that prevented to enable ghes_edac.  Do you know if
Ubuntu has any plan to address such issue?  Or do they not see such
issue?  I do not test with other vendors' platforms, so I cannot tell
exactly what those bugs are...

Thanks, 
-Toshi
Jeffrey Hugo July 18, 2017, 4:24 p.m. UTC | #3
On 7/18/2017 9:36 AM, Kani, Toshimitsu wrote:
> On Tue, 2017-07-18 at 08:39 -0600, Jeffrey Hugo wrote:
>> On 7/17/2017 3:59 PM, Toshi Kani wrote:
>>> The ghes_edac driver was introduced in 2013 [1], but it has not
>>> been enabled by any distro yet.
>>
>> Ubuntu is expected to enable this soon.
> 
> Interesting.  I was told from other distro that there were many buggy
> firmwares out there that prevented to enable ghes_edac.  Do you know if
> Ubuntu has any plan to address such issue?  Or do they not see such
> issue?  I do not test with other vendors' platforms, so I cannot tell
> exactly what those bugs are...
> 

I do not know if Ubuntu intends to address any "known issues".  I know a 
request was made to Canonical to enable the option, and it appears the 
request is being considered, although the option may be limited to 
ARM64, depending on how Canonical's evaluation goes.  I am not aware of 
any particular issues, so I cannot say what the side effects are, or 
what platforms are considered to exhibit such issues.
Kani, Toshi July 18, 2017, 4:42 p.m. UTC | #4
T24gVHVlLCAyMDE3LTA3LTE4IGF0IDEwOjI0IC0wNjAwLCBKZWZmcmV5IEh1Z28gd3JvdGU6DQo+
IE9uIDcvMTgvMjAxNyA5OjM2IEFNLCBLYW5pLCBUb3NoaW1pdHN1IHdyb3RlOg0KPiA+IE9uIFR1
ZSwgMjAxNy0wNy0xOCBhdCAwODozOSAtMDYwMCwgSmVmZnJleSBIdWdvIHdyb3RlOg0KPiA+ID4g
T24gNy8xNy8yMDE3IDM6NTkgUE0sIFRvc2hpIEthbmkgd3JvdGU6DQo+ID4gPiA+IFRoZSBnaGVz
X2VkYWMgZHJpdmVyIHdhcyBpbnRyb2R1Y2VkIGluIDIwMTMgWzFdLCBidXQgaXQgaGFzIG5vdA0K
PiA+ID4gPiBiZWVuIGVuYWJsZWQgYnkgYW55IGRpc3RybyB5ZXQuDQo+ID4gPiANCj4gPiA+IFVi
dW50dSBpcyBleHBlY3RlZCB0byBlbmFibGUgdGhpcyBzb29uLg0KPiA+IA0KPiA+IEludGVyZXN0
aW5nLsKgwqBJIHdhcyB0b2xkIGZyb20gb3RoZXIgZGlzdHJvIHRoYXQgdGhlcmUgd2VyZSBtYW55
DQo+ID4gYnVnZ3kgZmlybXdhcmVzIG91dCB0aGVyZSB0aGF0IHByZXZlbnRlZCB0byBlbmFibGUg
Z2hlc19lZGFjLsKgwqBEbw0KPiA+IHlvdSBrbm93IGlmIFVidW50dSBoYXMgYW55IHBsYW4gdG8g
YWRkcmVzcyBzdWNoIGlzc3VlP8KgwqBPciBkbyB0aGV5DQo+ID4gbm90IHNlZSBzdWNoIGlzc3Vl
P8KgwqBJIGRvIG5vdCB0ZXN0IHdpdGggb3RoZXIgdmVuZG9ycycgcGxhdGZvcm1zLA0KPiA+IHNv
IEkgY2Fubm90IHRlbGwgZXhhY3RseSB3aGF0IHRob3NlIGJ1Z3MgYXJlLi4uDQo+ID4gDQo+IA0K
PiBJIGRvIG5vdCBrbm93IGlmIFVidW50dSBpbnRlbmRzIHRvIGFkZHJlc3MgYW55ICJrbm93biBp
c3N1ZXMiLsKgwqBJDQo+IGtub3cgYcKgcmVxdWVzdCB3YXMgbWFkZSB0byBDYW5vbmljYWwgdG8g
ZW5hYmxlIHRoZSBvcHRpb24sIGFuZCBpdA0KPiBhcHBlYXJzIHRoZSByZXF1ZXN0IGlzIGJlaW5n
IGNvbnNpZGVyZWQsIGFsdGhvdWdoIHRoZSBvcHRpb24gbWF5IGJlDQo+IGxpbWl0ZWQgdG/CoEFS
TTY0LCBkZXBlbmRpbmcgb24gaG93IENhbm9uaWNhbCdzIGV2YWx1YXRpb24gZ29lcy7CoMKgSSBh
bQ0KPiBub3QgYXdhcmUgb2YgYW55IHBhcnRpY3VsYXIgaXNzdWVzLCBzbyBJIGNhbm5vdCBzYXkg
d2hhdCB0aGUgc2lkZQ0KPiBlZmZlY3RzIGFyZSwgb3Igd2hhdCBwbGF0Zm9ybXMgYXJlIGNvbnNp
ZGVyZWQgdG8gZXhoaWJpdCBzdWNoIGlzc3Vlcy4NCg0KSSBzZWUuIFRoYW5rcyBmb3IgdGhlIGlu
Zm8hIEkgaG9wZSBzb21lb25lIGZyb20gQ2Fub25pY2FsIGlzIG9uIHRoZQ0KbGlzdC4NCi1Ub3No
aQ0K
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 4e61a62..00588a3 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -34,6 +34,9 @@  static LIST_HEAD(ghes_reglist);
 static DEFINE_MUTEX(ghes_edac_lock);
 static int ghes_edac_mc_num;
 
+/* Set 1 to skip the platform check */
+static bool __read_mostly ghes_edac_any_oem;
+module_param_named(any_oem, ghes_edac_any_oem, bool, 0);
 
 /* Memory Device - Type 17 of SMBIOS spec */
 struct memdev_dmi_entry {
@@ -405,6 +408,15 @@  void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
 }
 EXPORT_SYMBOL_GPL(ghes_edac_report_mem_error);
 
+/*
+ * Known systems that are safe to enable this module.
+ * "ghes_edac.any_oem=1" skips this check if necessary.
+ */
+static struct acpi_oemlist oemlist[] = {
+	{"HPE   ", "Server  ", 0, ACPI_SIG_FADT, all_versions},
+	{ } /* End */
+};
+
 int ghes_edac_register(struct ghes *ghes, struct device *dev)
 {
 	bool fake = false;
@@ -413,6 +425,12 @@  int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	struct edac_mc_layer layers[1];
 	struct ghes_edac_pvt *pvt;
 	struct ghes_edac_dimm_fill dimm_fill;
+	int idx;
+
+	/* Check if safe to enable on this system */
+	idx = acpi_match_oemlist(oemlist);
+	if (!ghes_edac_any_oem && idx < 0)
+		return 0;
 
 	/* Get the number of DIMMs */
 	dmi_walk(ghes_edac_count_dimms, &num_dimm);
@@ -456,7 +474,11 @@  int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	mci->dev_name = "ghes";
 
 	if (!ghes_edac_mc_num) {
-		if (!fake) {
+		if (fake) {
+			pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
+			pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
+			pr_info("work on such system. Use this driver with caution\n");
+		} else if (idx < 0) {
 			pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
 			pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
 			pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
@@ -464,10 +486,6 @@  int ghes_edac_register(struct ghes *ghes, struct device *dev)
 			pr_info("to correct its BIOS.\n");
 			pr_info("This system has %d DIMM sockets.\n",
 				num_dimm);
-		} else {
-			pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
-			pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
-			pr_info("work on such system. Use this driver with caution\n");
 		}
 	}