Message ID | 20220811091713.10427-3-justin.he@arm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Modularize ghes_edac driver | expand |
On Thursday, August 11, 2022 3:17 AM, Jia He wrote: > Commit dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES in > apci_init()") introduced a bug that ghes_edac_register() would be invoked > before edac_init(). Because at that time, the bus "edac" hasn't been even > registered, this created sysfs /devices/mc0 instead of > /sys/devices/system/edac/mc/mc0 on an Ampere eMag server. > > The solution is modularizing the ghes_edac driver. > Use a list to save the probing devices in ghes_probe(), and defer the > ghes_edac_register() to the new ghes_edac module_init() by iterating over > the devices list. Since this approach defers the ghes_edac registration, the series needs to include a way to prevent chipset-specific edac drivers from registering before ghes_edac as discussed in the original email thread. Toshi
Hi Jia,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on rafael-pm/linux-next]
[also build test WARNING on ras/edac-for-next efi/next linus/master v5.19 next-20220812]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Jia-He/Modularize-ghes_edac-driver/20220811-171953
base: https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
config: x86_64-randconfig-s022 (https://download.01.org/0day-ci/archive/20220812/202208121802.AQBiO8LK-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-3) 11.3.0
reproduce:
# apt-get install sparse
# sparse version: v0.6.4-39-gce1a6720-dirty
# https://github.com/intel-lab-lkp/linux/commit/9cf68330d4fa626e09c8cbc3be9910751e94508c
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Jia-He/Modularize-ghes_edac-driver/20220811-171953
git checkout 9cf68330d4fa626e09c8cbc3be9910751e94508c
# save the config file
mkdir build_dir && cp config build_dir/.config
make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=x86_64 SHELL=/bin/bash drivers/acpi/apei/
If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>
sparse warnings: (new ones prefixed by >>)
>> drivers/acpi/apei/ghes.c:97:1: sparse: sparse: symbol 'ghes_report_chain' was not declared. Should it be static?
drivers/acpi/apei/ghes.c:733:25: sparse: sparse: incompatible types in comparison expression (different address spaces):
drivers/acpi/apei/ghes.c:733:25: sparse: struct ghes_estatus_cache [noderef] __rcu *
drivers/acpi/apei/ghes.c:733:25: sparse: struct ghes_estatus_cache *
drivers/acpi/apei/ghes.c:813:25: sparse: sparse: incompatible types in comparison expression (different address spaces):
drivers/acpi/apei/ghes.c:813:25: sparse: struct ghes_estatus_cache [noderef] __rcu *
drivers/acpi/apei/ghes.c:813:25: sparse: struct ghes_estatus_cache *
On Thu, Aug 11, 2022 at 09:17:13AM +0000, Jia He wrote: > diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c > index d91ad378c00d..ed6519f3d45b 100644 > --- a/drivers/acpi/apei/ghes.c > +++ b/drivers/acpi/apei/ghes.c > @@ -94,6 +94,8 @@ > #define FIX_APEI_GHES_SDEI_CRITICAL __end_of_fixed_addresses > #endif > > +ATOMIC_NOTIFIER_HEAD(ghes_report_chain); static. You need function wrappers which call the notifier from the module. Also, why atomic? x86_mce_decoder_chain is a blocking one. Also, the whole notifier adding thing needs to be a separate patch. > @@ -1497,3 +1504,37 @@ void __init acpi_ghes_init(void) > else > pr_info(GHES_PFX "Failed to enable APEI firmware first mode.\n"); > } > + > +/* > + * Known x86 systems that prefer GHES error reporting: > + */ > +static struct acpi_platform_list plat_list[] = { > + {"HPE ", "Server ", 0, ACPI_SIG_FADT, all_versions}, > + { } /* End */ > +}; > + > +struct list_head *ghes_get_devices(bool force) > +{ > + int idx = -1; > + > + if (IS_ENABLED(CONFIG_X86)) { > + idx = acpi_match_platform_list(plat_list); > + if (idx < 0 && !force) > + return NULL; > + } > + > + return &ghes_devs; > +} > +EXPORT_SYMBOL_GPL(ghes_get_devices); And yes, as Toshi points out, the other EDAC drivers - sb_edac, skx_edac and amd64_edac - should call this function in their init functions so that it can get selected which driver to load on HPE server platforms. Also in a separate patch pls. Thx.
Hi Boris and Toshi > -----Original Message----- > From: Borislav Petkov <bp@alien8.de> > Sent: Friday, August 12, 2022 10:46 PM > To: Justin He <Justin.He@arm.com> > Cc: Ard Biesheuvel <ardb@kernel.org>; Len Brown <lenb@kernel.org>; James > Morse <James.Morse@arm.com>; Tony Luck <tony.luck@intel.com>; Mauro > Carvalho Chehab <mchehab@kernel.org>; Robert Richter <rric@kernel.org>; > Robert Moore <robert.moore@intel.com>; linux-acpi@vger.kernel.org; > linux-kernel@vger.kernel.org; linux-edac@vger.kernel.org; devel@acpica.org; > Rafael J . Wysocki <rafael@kernel.org>; Shuai Xue > <xueshuai@linux.alibaba.com>; Jarkko Sakkinen <jarkko@kernel.org>; > linux-efi@vger.kernel.org; nd <nd@arm.com>; toshi.kani@hpe.com; > stable@kernel.org > Subject: Re: [PATCH 2/2] EDAC/ghes: Modularize ghes_edac driver to remove > the dependency on ghes > > On Thu, Aug 11, 2022 at 09:17:13AM +0000, Jia He wrote: > > diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index > > d91ad378c00d..ed6519f3d45b 100644 > > --- a/drivers/acpi/apei/ghes.c > > +++ b/drivers/acpi/apei/ghes.c > > @@ -94,6 +94,8 @@ > > #define FIX_APEI_GHES_SDEI_CRITICAL __end_of_fixed_addresses > > #endif > > > > +ATOMIC_NOTIFIER_HEAD(ghes_report_chain); > > static. You need function wrappers which call the notifier from the module. > Ok > Also, why atomic? x86_mce_decoder_chain is a blocking one. > But is ghes_proc_in_irq() in the irq context which can invoke the notifier? I described it in the commit log: >To resolve the build dependency of ghes_edac_report_mem_error() for ghes, >introduce a notifier which is registered by ghes_edac module. atomic >notifier is used because ghes_proc_in_irq() is in the irq context. > Also, the whole notifier adding thing needs to be a separate patch. Ok, I will try to cleanly split it > > > @@ -1497,3 +1504,37 @@ void __init acpi_ghes_init(void) > > else > > pr_info(GHES_PFX "Failed to enable APEI firmware first mode.\n"); > > } > > + > > +/* > > + * Known x86 systems that prefer GHES error reporting: > > + */ > > +static struct acpi_platform_list plat_list[] = { > > + {"HPE ", "Server ", 0, ACPI_SIG_FADT, all_versions}, > > + { } /* End */ > > +}; > > + > > +struct list_head *ghes_get_devices(bool force) { > > + int idx = -1; > > + > > + if (IS_ENABLED(CONFIG_X86)) { > > + idx = acpi_match_platform_list(plat_list); > > + if (idx < 0 && !force) > > + return NULL; > > + } > > + > > + return &ghes_devs; > > +} > > +EXPORT_SYMBOL_GPL(ghes_get_devices); > > And yes, as Toshi points out, the other EDAC drivers - sb_edac, skx_edac and > amd64_edac - should call this function in their init functions so that it can get > selected which driver to load on HPE server platforms. > > Also in a separate patch pls. > Ok, thanks -- Cheers, Justin (Jia He)
Hi Borislav & Toshi Please see my questions below: > -----Original Message----- > From: Borislav Petkov <bp@alien8.de> > Sent: Friday, August 12, 2022 10:46 PM > To: Justin He <Justin.He@arm.com> > Cc: Ard Biesheuvel <ardb@kernel.org>; Len Brown <lenb@kernel.org>; James > Morse <James.Morse@arm.com>; Tony Luck <tony.luck@intel.com>; Mauro > Carvalho Chehab <mchehab@kernel.org>; Robert Richter <rric@kernel.org>; > Robert Moore <robert.moore@intel.com>; linux-acpi@vger.kernel.org; > linux-kernel@vger.kernel.org; linux-edac@vger.kernel.org; devel@acpica.org; > Rafael J . Wysocki <rafael@kernel.org>; Shuai Xue > <xueshuai@linux.alibaba.com>; Jarkko Sakkinen <jarkko@kernel.org>; > linux-efi@vger.kernel.org; nd <nd@arm.com>; toshi.kani@hpe.com; > stable@kernel.org > Subject: Re: [PATCH 2/2] EDAC/ghes: Modularize ghes_edac driver to remove > the dependency on ghes > > On Thu, Aug 11, 2022 at 09:17:13AM +0000, Jia He wrote: > > diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index > > d91ad378c00d..ed6519f3d45b 100644 > > --- a/drivers/acpi/apei/ghes.c > > +++ b/drivers/acpi/apei/ghes.c > > @@ -94,6 +94,8 @@ > > #define FIX_APEI_GHES_SDEI_CRITICAL __end_of_fixed_addresses > > #endif > > > > +ATOMIC_NOTIFIER_HEAD(ghes_report_chain); > > static. You need function wrappers which call the notifier from the module. > > Also, why atomic? x86_mce_decoder_chain is a blocking one. > > Also, the whole notifier adding thing needs to be a separate patch. > > > @@ -1497,3 +1504,37 @@ void __init acpi_ghes_init(void) > > else > > pr_info(GHES_PFX "Failed to enable APEI firmware first mode.\n"); > > } > > + > > +/* > > + * Known x86 systems that prefer GHES error reporting: > > + */ > > +static struct acpi_platform_list plat_list[] = { > > + {"HPE ", "Server ", 0, ACPI_SIG_FADT, all_versions}, > > + { } /* End */ > > +}; > > + > > +struct list_head *ghes_get_devices(bool force) { > > + int idx = -1; > > + > > + if (IS_ENABLED(CONFIG_X86)) { > > + idx = acpi_match_platform_list(plat_list); > > + if (idx < 0 && !force) > > + return NULL; > > + } > > + > > + return &ghes_devs; > > +} > > +EXPORT_SYMBOL_GPL(ghes_get_devices); > > And yes, as Toshi points out, the other EDAC drivers - sb_edac, skx_edac and > amd64_edac - should call this function in their init functions so that it can get > selected which driver to load on HPE server platforms. > I assume that all those edac drivers which used owner checking are impacted, right? So the impacted list should be: drivers/edac/pnd2_edac.c:1532: if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR))) drivers/edac/sb_edac.c:3513: if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR))) drivers/edac/amd64_edac.c:4333: if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR))) drivers/edac/i10nm_base.c:552: if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR))) drivers/edac/skx_base.c:657: if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR))) drivers/edac/igen6_edac.c:1275: if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR))) Furthermore, should I totally remove the owner check in above driver XX_init()? Because after the new helper ghes_get_device() checking, those drivers can't be initialized after ghes_edac is loaded. If ghes_edac is NOT loaded, the edac_mc_owner check in edac_mc_add_mc_with_groups() can guarantee that there is only one regular edac driver. What do you think of it? -- Cheers, Justin (Jia He)
On Monday, August 15, 2022 8:20 PM, Justin He wrote: > I assume that all those edac drivers which used owner checking are > impacted, right? So the impacted list should be: > drivers/edac/pnd2_edac.c:1532: if (owner && strncmp(owner, > EDAC_MOD_STR, sizeof(EDAC_MOD_STR))) > drivers/edac/sb_edac.c:3513: if (owner && strncmp(owner, > EDAC_MOD_STR, sizeof(EDAC_MOD_STR))) > drivers/edac/amd64_edac.c:4333: if (owner && strncmp(owner, > EDAC_MOD_STR, sizeof(EDAC_MOD_STR))) > drivers/edac/i10nm_base.c:552: if (owner && strncmp(owner, > EDAC_MOD_STR, sizeof(EDAC_MOD_STR))) > drivers/edac/skx_base.c:657: if (owner && strncmp(owner, > EDAC_MOD_STR, sizeof(EDAC_MOD_STR))) > drivers/edac/igen6_edac.c:1275: if (owner && strncmp(owner, > EDAC_MOD_STR, sizeof(EDAC_MOD_STR))) Yes, but the impact is not necessarily limited to these modules. The current model works as follows: 1. ghes_probe() calling ghes_edac_register() sets the owner to ghes_edac. 2. chipset-specific edac drivers checks the owner. 3. edac_mc_add_mc_with_groups() also checks the owner. Hence, check #3 enforces #1 (ghes_edac) even if there is an edac driver w/o check #2. I do not know if such case exists though. > Furthermore, should I totally remove the owner check in above driver > XX_init()? Because after the new helper ghes_get_device() checking, those > drivers can't be initialized after ghes_edac is loaded. If ghes_edac is NOT > loaded, the edac_mc_owner check in edac_mc_add_mc_with_groups() can > guarantee that there is only one regular edac driver. > > What do you think of it? I think a new check with ghes_get_device() can replace the owner check in xx_init(). Thanks, Toshi
On Mon, Aug 15, 2022 at 01:32:40AM +0000, Justin He wrote: > But is ghes_proc_in_irq() in the irq context which can invoke the notifier? > I described it in the commit log: > >To resolve the build dependency of ghes_edac_report_mem_error() for ghes, > >introduce a notifier which is registered by ghes_edac module. atomic > >notifier is used because ghes_proc_in_irq() is in the irq context. Ah, yes, there's that aspect. atomic it is. Thx.
On Wed, Aug 17, 2022 at 01:39:32AM +0000, Kani, Toshi wrote: > Yes, but the impact is not necessarily limited to these modules. The impact is limited only to the modules which would otherwise potentially load on a HP system which advertizes as "HPE ", "Server ". You're the only user of ghes_edac. So please state which drivers you want blocked from loading on those platforms and where ghes_edac should load instead. From looking at 301375e76432 ("EDAC: Add owner check to the x86 platform drivers") it is 4 drivers. > I think a new check with ghes_get_device() can replace the owner check > in xx_init(). The owner thing should not be touched now - it can be a cleanup ontop if you - Justin - feel like doing it afterwards. Thx.
On Wednesday, August 17, 2022 2:50 AM, Borislav Petkov wrote: > On Wed, Aug 17, 2022 at 01:39:32AM +0000, Kani, Toshi wrote: > > Yes, but the impact is not necessarily limited to these modules. > > The impact is limited only to the modules which would otherwise > potentially load on a HP system which advertizes as "HPE ", "Server ". > > You're the only user of ghes_edac. So please state which drivers you > want blocked from loading on those platforms and where ghes_edac should > load instead. ghes_edac is used on Arm. This original issue happened on a non-HPE platform. I am not sure if this chipset-specific edac vs ghes_edac ordering issue is x86-specific. It may be safer to add the ghes check to edac_mc_add_mc_with_groups() to keep it protected from any edac driver. Toshi
On Wed, Aug 17, 2022 at 08:22:58PM +0000, Kani, Toshi wrote: > ghes_edac is used on Arm. This original issue happened on a non-HPE > platform. Remember: ghes_edac loads only on known-good platforms - not the other way around. If ARM folks wanna use it there, then I'd expect explicit enablement to do so. > to keep it protected from any edac driver. You got this all backwards. If anything, the kernel should be protected from ghes.
On Wednesday, August 17, 2022 2:56 PM, Borislav Petkov wrote: > On Wed, Aug 17, 2022 at 08:22:58PM +0000, Kani, Toshi wrote: > > ghes_edac is used on Arm. This original issue happened on a non-HPE > > platform. > > Remember: ghes_edac loads only on known-good platforms - not the other > way around. If ARM folks wanna use it there, then I'd expect explicit > enablement to do so. Well, that was the case originally, which is why 301375e76432 was made for x86 chipset-specific edac drivers only at that time. Since then, the change below enabled ghes_edac on Arm without this known-good platforms check. commit eaa3a1d46 ("EDAC, ghes: Make platform-based whitelisting x86-only") Toshi
On Wed, Aug 17, 2022 at 09:09:41PM +0000, Kani, Toshi wrote: > Since then, the change below enabled ghes_edac on Arm without this > known-good platforms check. > > commit eaa3a1d46 ("EDAC, ghes: Make platform-based whitelisting x86-only") Bah, I had forgotten about that one... In any case, edac_mc_add_mc* is too late in the init path - that check should happen as the very first thing in the driver init function. And looking at the ARM64 EDAC drivers, they're only a couple: thunderx, xgene, bluefield, dmc-520... And I'd still prefer if their maintainers explicitly ACK such a change to call ghes_get_devices() (for a lack of a better idea) and not enable ghes_edac on them blindly. Thx.
> -----Original Message----- > From: Borislav Petkov <bp@alien8.de> > Sent: Thursday, August 18, 2022 5:35 AM > To: Kani, Toshi <toshi.kani@hpe.com> > Cc: Justin He <Justin.He@arm.com>; Ard Biesheuvel <ardb@kernel.org>; Len > Brown <lenb@kernel.org>; James Morse <James.Morse@arm.com>; Tony > Luck <tony.luck@intel.com>; Mauro Carvalho Chehab > <mchehab@kernel.org>; Robert Richter <rric@kernel.org>; Robert Moore > <robert.moore@intel.com>; linux-acpi@vger.kernel.org; > linux-kernel@vger.kernel.org; linux-edac@vger.kernel.org; devel@acpica.org; > Rafael J . Wysocki <rafael@kernel.org>; Shuai Xue > <xueshuai@linux.alibaba.com>; Jarkko Sakkinen <jarkko@kernel.org>; > linux-efi@vger.kernel.org; nd <nd@arm.com>; stable@kernel.org > Subject: Re: [PATCH 2/2] EDAC/ghes: Modularize ghes_edac driver to remove > the dependency on ghes > > On Wed, Aug 17, 2022 at 09:09:41PM +0000, Kani, Toshi wrote: > > Since then, the change below enabled ghes_edac on Arm without this > > known-good platforms check. > > > > commit eaa3a1d46 ("EDAC, ghes: Make platform-based whitelisting > > x86-only") > > Bah, I had forgotten about that one... > > In any case, edac_mc_add_mc* is too late in the init path - that check should > happen as the very first thing in the driver init function. > > And looking at the ARM64 EDAC drivers, they're only a couple: thunderx, > xgene, bluefield, dmc-520... And I'd still prefer if their maintainers explicitly > ACK such a change to call ghes_get_devices() (for a lack of a better idea) and > not enable ghes_edac on them blindly. Okay, will include above changes in next version, thanks for the reminder. -- Cheers, Justin (Jia He)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index d91ad378c00d..ed6519f3d45b 100644 --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei/ghes.c @@ -94,6 +94,8 @@ #define FIX_APEI_GHES_SDEI_CRITICAL __end_of_fixed_addresses #endif +ATOMIC_NOTIFIER_HEAD(ghes_report_chain); + static inline bool is_hest_type_generic_v2(struct ghes *ghes) { return ghes->generic->header.type == ACPI_HEST_TYPE_GENERIC_ERROR_V2; @@ -118,6 +120,9 @@ module_param_named(disable, ghes_disable, bool, 0); static LIST_HEAD(ghes_hed); static DEFINE_MUTEX(ghes_list_mutex); +static LIST_HEAD(ghes_devs); +static DEFINE_MUTEX(ghes_devs_mutex); + /* * Because the memory area used to transfer hardware error information * from BIOS to Linux can be determined only in NMI, IRQ or timer @@ -645,7 +650,7 @@ static bool ghes_do_proc(struct ghes *ghes, if (guid_equal(sec_type, &CPER_SEC_PLATFORM_MEM)) { struct cper_sec_mem_err *mem_err = acpi_hest_get_payload(gdata); - ghes_edac_report_mem_error(sev, mem_err); + atomic_notifier_call_chain(&ghes_report_chain, sev, mem_err); arch_apei_report_mem_error(sev, mem_err); queued = ghes_handle_memory_failure(gdata, sev); @@ -1376,7 +1381,11 @@ static int ghes_probe(struct platform_device *ghes_dev) platform_set_drvdata(ghes_dev, ghes); - ghes_edac_register(ghes, &ghes_dev->dev); + ghes->dev = &ghes_dev->dev; + + mutex_lock(&ghes_devs_mutex); + list_add_tail(&ghes->elist, &ghes_devs); + mutex_unlock(&ghes_devs_mutex); /* Handle any pending errors right away */ spin_lock_irqsave(&ghes_notify_lock_irq, flags); @@ -1440,8 +1449,6 @@ static int ghes_remove(struct platform_device *ghes_dev) ghes_fini(ghes); - ghes_edac_unregister(ghes); - kfree(ghes); platform_set_drvdata(ghes_dev, NULL); @@ -1497,3 +1504,37 @@ void __init acpi_ghes_init(void) else pr_info(GHES_PFX "Failed to enable APEI firmware first mode.\n"); } + +/* + * Known x86 systems that prefer GHES error reporting: + */ +static struct acpi_platform_list plat_list[] = { + {"HPE ", "Server ", 0, ACPI_SIG_FADT, all_versions}, + { } /* End */ +}; + +struct list_head *ghes_get_devices(bool force) +{ + int idx = -1; + + if (IS_ENABLED(CONFIG_X86)) { + idx = acpi_match_platform_list(plat_list); + if (idx < 0 && !force) + return NULL; + } + + return &ghes_devs; +} +EXPORT_SYMBOL_GPL(ghes_get_devices); + +void ghes_register_report_chain(struct notifier_block *nb) +{ + atomic_notifier_chain_register(&ghes_report_chain, nb); +} +EXPORT_SYMBOL_GPL(ghes_register_report_chain); + +void ghes_unregister_report_chain(struct notifier_block *nb) +{ + atomic_notifier_chain_unregister(&ghes_report_chain, nb); +} +EXPORT_SYMBOL_GPL(ghes_unregister_report_chain); diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig index 17562cf1fe97..df45db81858b 100644 --- a/drivers/edac/Kconfig +++ b/drivers/edac/Kconfig @@ -53,8 +53,8 @@ config EDAC_DECODE_MCE has been initialized. config EDAC_GHES - bool "Output ACPI APEI/GHES BIOS detected errors via EDAC" - depends on ACPI_APEI_GHES && (EDAC=y) + tristate "Output ACPI APEI/GHES BIOS detected errors via EDAC" + depends on ACPI_APEI_GHES select UEFI_CPER help Not all machines support hardware-driven error report. Some of those diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c index c8fa7dcfdbd0..3bdf8469882d 100644 --- a/drivers/edac/ghes_edac.c +++ b/drivers/edac/ghes_edac.c @@ -14,6 +14,7 @@ #include <linux/dmi.h> #include "edac_module.h" #include <ras/ras_event.h> +#include <linux/notifier.h> #define OTHER_DETAIL_LEN 400 @@ -59,6 +60,8 @@ module_param(force_load, bool, 0); static bool system_scanned; +static struct list_head *ghes_devs; + /* Memory Device - Type 17 of SMBIOS spec */ struct memdev_dmi_entry { u8 type; @@ -267,11 +270,14 @@ static int print_mem_error_other_detail(const struct cper_sec_mem_err *mem, char return n; } -void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err) +static int ghes_edac_report_mem_error(struct notifier_block *nb, + unsigned long val, void *data) { + struct cper_sec_mem_err *mem_err = (struct cper_sec_mem_err *)data; struct cper_mem_err_compact cmem; struct edac_raw_error_desc *e; struct mem_ctl_info *mci; + unsigned long sev = val; struct ghes_pvt *pvt; unsigned long flags; char *p; @@ -282,7 +288,7 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err) * know. */ if (WARN_ON_ONCE(in_nmi())) - return; + return NOTIFY_OK; spin_lock_irqsave(&ghes_lock, flags); @@ -374,36 +380,24 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err) unlock: spin_unlock_irqrestore(&ghes_lock, flags); + + return NOTIFY_OK; } -/* - * Known systems that are safe to enable this module. - */ -static struct acpi_platform_list plat_list[] = { - {"HPE ", "Server ", 0, ACPI_SIG_FADT, all_versions}, - { } /* End */ +static struct notifier_block ghes_edac_mem_err_nb = { + .notifier_call = ghes_edac_report_mem_error, + .priority = 0, }; -int ghes_edac_register(struct ghes *ghes, struct device *dev) +static int ghes_edac_register(struct device *dev) { bool fake = false; struct mem_ctl_info *mci; struct ghes_pvt *pvt; struct edac_mc_layer layers[1]; unsigned long flags; - int idx = -1; int rc = 0; - if (IS_ENABLED(CONFIG_X86)) { - /* Check if safe to enable on this system */ - idx = acpi_match_platform_list(plat_list); - if (!force_load && idx < 0) - return -ENODEV; - } else { - force_load = true; - idx = 0; - } - /* finish another registration/unregistration instance first */ mutex_lock(&ghes_reg_mutex); @@ -447,7 +441,7 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev) pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n"); pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n"); pr_info("work on such system. Use this driver with caution\n"); - } else if (idx < 0) { + } else if (force_load) { pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n"); pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n"); pr_info("So, the end result of using this driver varies from vendor to vendor.\n"); @@ -503,6 +497,8 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev) ghes_pvt = pvt; spin_unlock_irqrestore(&ghes_lock, flags); + ghes_register_report_chain(&ghes_edac_mem_err_nb); + /* only set on success */ refcount_set(&ghes_refcount, 1); @@ -517,7 +513,7 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev) return rc; } -void ghes_edac_unregister(struct ghes *ghes) +static void ghes_edac_unregister(struct ghes *ghes) { struct mem_ctl_info *mci; unsigned long flags; @@ -548,6 +544,40 @@ void ghes_edac_unregister(struct ghes *ghes) if (mci) edac_mc_free(mci); + ghes_unregister_report_chain(&ghes_edac_mem_err_nb); + unlock: mutex_unlock(&ghes_reg_mutex); } + +static int __init ghes_edac_init(void) +{ + struct ghes *g, *g_tmp; + + ghes_devs = ghes_get_devices(force_load); + if (!ghes_devs) + return -ENODEV; + + if (!IS_ENABLED(CONFIG_X86)) + force_load = true; + + list_for_each_entry_safe(g, g_tmp, ghes_devs, elist) { + ghes_edac_register(g->dev); + } + + return 0; +} +module_init(ghes_edac_init); + +static void __exit ghes_edac_exit(void) +{ + struct ghes *g, *g_tmp; + + list_for_each_entry_safe(g, g_tmp, ghes_devs, elist) { + ghes_edac_unregister(g); + } +} +module_exit(ghes_edac_exit); + +MODULE_LICENSE("GPL"); +MODULE_DESCRIPTION("Output ACPI APEI/GHES BIOS detected errors module via EDAC"); diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h index 34fb3431a8f3..150c0b9500d6 100644 --- a/include/acpi/ghes.h +++ b/include/acpi/ghes.h @@ -27,6 +27,8 @@ struct ghes { struct timer_list timer; unsigned int irq; }; + struct device *dev; + struct list_head elist; }; struct ghes_estatus_node { @@ -69,35 +71,13 @@ int ghes_register_vendor_record_notifier(struct notifier_block *nb); * @nb: pointer to the notifier_block structure of the vendor record handler. */ void ghes_unregister_vendor_record_notifier(struct notifier_block *nb); +struct list_head *ghes_get_devices(bool force); +#else +static inline struct list_head *ghes_get_devices(bool force) { return NULL; } #endif int ghes_estatus_pool_init(int num_ghes); -/* From drivers/edac/ghes_edac.c */ - -#ifdef CONFIG_EDAC_GHES -void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err); - -int ghes_edac_register(struct ghes *ghes, struct device *dev); - -void ghes_edac_unregister(struct ghes *ghes); - -#else -static inline void ghes_edac_report_mem_error(int sev, - struct cper_sec_mem_err *mem_err) -{ -} - -static inline int ghes_edac_register(struct ghes *ghes, struct device *dev) -{ - return -ENODEV; -} - -static inline void ghes_edac_unregister(struct ghes *ghes) -{ -} -#endif - static inline int acpi_hest_get_version(struct acpi_hest_generic_data *gdata) { return gdata->revision >> 8; @@ -145,4 +125,7 @@ int ghes_notify_sea(void); static inline int ghes_notify_sea(void) { return -ENOENT; } #endif +struct notifier_block; +extern void ghes_register_report_chain(struct notifier_block *nb); +extern void ghes_unregister_report_chain(struct notifier_block *nb); #endif /* GHES_H */