Message ID | 20180516182958.GB17092@pd.tnic (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hi guys, Tyler, Zhengqiang, I assume all your shipped platforms with HEST->GHES entries also have DMI tables. On 16/05/18 19:29, Borislav Petkov wrote: > On Wed, May 16, 2018 at 02:38:38PM +0100, James Morse wrote: >> XGene has its own edac driver, but it doesn't probe when booted via ACPI so >> won't conflict with ghes_edac. > > Actually it will. EDAC core can have only one EDAC driver loaded. Don't > ask me why - it has been that way since forever. By won't probe I mean it only works on DT systems: | static const struct of_device_id xgene_edac_of_match[] = { | { .compatible = "apm,xgene-edac" }, | {}, | }; | .driver = { | .name = "xgene-edac", | .of_match_table = xgene_edac_of_match, | }, To work on a system with GHES it would need an 'struct acpi_device_id' to describe the HID (?) and populate driver's acpi_match_table. > We can change it some > day but frankly, I don't see reasoning for it. One driver can easily > manage *all* error sources on a system, I'd say. I agree, there is no reason to support two at the same time, if this happens then there is probably something wrong with the platform (e.g. races with firmware reading the same hardware registers), so we should make some noise. Xgene's edac driver would be a good example of this, it looks like it reads data from some mmio region, if something else is doing the same we're going to make a mess. >> So I think we're good to make the whitelist x86 only. >> Your diff-hunk makes 'idx=-1', so we always get the 'Unfortunately' warning. I'd >> like to suppress this unless force_load has been used. > > Yeah, we should handle that differently for ARM. Toshi added the idx > thing in > > 5deed6b6a479 ("EDAC, ghes: Add platform check") > > to dump this when the platform is not whitelisted. So let's do that: > > --- > diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c > index 863fbf3db29f..473aeec4b1da 100644 > --- a/drivers/edac/ghes_edac.c > +++ b/drivers/edac/ghes_edac.c > @@ -440,12 +440,16 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev) > struct mem_ctl_info *mci; > struct edac_mc_layer layers[1]; > struct ghes_edac_dimm_fill dimm_fill; > - int idx; > + int idx = -1; > > - /* Check if safe to enable on this system */ > - idx = acpi_match_platform_list(plat_list); > - if (!force_load && idx < 0) > - return -ENODEV; v4.17-rc5 has 'return 0' here. Wouldn't this change means no ghes can be registered unless ghes_edac is also supported by the platform? Shouldn't this be '0' for a silent failure? > + if (IS_ENABLED(CONFIG_X86)) { > + /* Check if safe to enable on this system */ > + idx = acpi_match_platform_list(plat_list); > + if (!force_load && idx < 0) > + return -ENODEV; > + } else { > + idx = 0; > + } > > /* > * We have only one logical memory controller to which all DIMMs belong. Tested on Seattle and some cranky homebrew-no-DMI firmware: Tested-by: James Morse <james.morse@arm.com> With the ENODEV/0 thing above: Reviewed-by: James Morse <james.morse@arm.com> >> It looks like even the oldest Arm64 ACPI systems have dmi tables, so we can >> probably require DMI or the 'force' flag. > > Well, with the hunk above it would still do ghes_edac_count_dimms() on > ARM and if it fails to find something, it will set fake, which is a good > sanity-check as it screams loudly. :) Thanks, James
On 2018/5/18 2:02, James Morse wrote: > Hi guys, > > Tyler, Zhengqiang, I assume all your shipped platforms with HEST->GHES entries > also have DMI tables. > Sure, Our ARM64 platform have DMI tables. thanks. > > On 16/05/18 19:29, Borislav Petkov wrote: >> On Wed, May 16, 2018 at 02:38:38PM +0100, James Morse wrote: >>> XGene has its own edac driver, but it doesn't probe when booted via ACPI so >>> won't conflict with ghes_edac. >> >> Actually it will. EDAC core can have only one EDAC driver loaded. Don't >> ask me why - it has been that way since forever. > > By won't probe I mean it only works on DT systems: > > | static const struct of_device_id xgene_edac_of_match[] = { > | { .compatible = "apm,xgene-edac" }, > | {}, > | }; > > | .driver = { > | .name = "xgene-edac", > | .of_match_table = xgene_edac_of_match, > | }, > > To work on a system with GHES it would need an 'struct acpi_device_id' to > describe the HID (?) and populate driver's acpi_match_table. > > >> We can change it some >> day but frankly, I don't see reasoning for it. One driver can easily >> manage *all* error sources on a system, I'd say. > > I agree, there is no reason to support two at the same time, if this happens > then there is probably something wrong with the platform (e.g. races with > firmware reading the same hardware registers), so we should make some noise. > > Xgene's edac driver would be a good example of this, it looks like it reads data > from some mmio region, if something else is doing the same we're going to make a > mess. > > >>> So I think we're good to make the whitelist x86 only. >>> Your diff-hunk makes 'idx=-1', so we always get the 'Unfortunately' warning. I'd >>> like to suppress this unless force_load has been used. >> >> Yeah, we should handle that differently for ARM. Toshi added the idx >> thing in >> >> 5deed6b6a479 ("EDAC, ghes: Add platform check") >> >> to dump this when the platform is not whitelisted. So let's do that: >> >> --- >> diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c >> index 863fbf3db29f..473aeec4b1da 100644 >> --- a/drivers/edac/ghes_edac.c >> +++ b/drivers/edac/ghes_edac.c >> @@ -440,12 +440,16 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev) >> struct mem_ctl_info *mci; >> struct edac_mc_layer layers[1]; >> struct ghes_edac_dimm_fill dimm_fill; >> - int idx; >> + int idx = -1; >> >> - /* Check if safe to enable on this system */ >> - idx = acpi_match_platform_list(plat_list); >> - if (!force_load && idx < 0) >> - return -ENODEV; > > v4.17-rc5 has 'return 0' here. Wouldn't this change means no ghes can be > registered unless ghes_edac is also supported by the platform? > Shouldn't this be '0' for a silent failure? > > >> + if (IS_ENABLED(CONFIG_X86)) { >> + /* Check if safe to enable on this system */ >> + idx = acpi_match_platform_list(plat_list); >> + if (!force_load && idx < 0) >> + return -ENODEV; >> + } else { >> + idx = 0; >> + } >> >> /* >> * We have only one logical memory controller to which all DIMMs belong. > > Tested on Seattle and some cranky homebrew-no-DMI firmware: > Tested-by: James Morse <james.morse@arm.com> > > With the ENODEV/0 thing above: > Reviewed-by: James Morse <james.morse@arm.com> > > >>> It looks like even the oldest Arm64 ACPI systems have dmi tables, so we can >>> probably require DMI or the 'force' flag. >> >> Well, with the hunk above it would still do ghes_edac_count_dimms() on >> ARM and if it fails to find something, it will set fake, which is a good >> sanity-check as it screams loudly. :) > > > Thanks, > > James > > . >
On Thu, May 17, 2018 at 07:02:18PM +0100, James Morse wrote: > v4.17-rc5 has 'return 0' here. Wouldn't this change means no ghes can be > registered unless ghes_edac is also supported by the platform? > Shouldn't this be '0' for a silent failure? https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/commit/?h=for-next&id=cc7f3f132658289b6661ab8294ab08a9d32ea026 > Tested on Seattle and some cranky homebrew-no-DMI firmware: > Tested-by: James Morse <james.morse@arm.com> > > With the ENODEV/0 thing above: > Reviewed-by: James Morse <james.morse@arm.com> Thanks, adding.
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c index 863fbf3db29f..473aeec4b1da 100644 --- a/drivers/edac/ghes_edac.c +++ b/drivers/edac/ghes_edac.c @@ -440,12 +440,16 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev) struct mem_ctl_info *mci; struct edac_mc_layer layers[1]; struct ghes_edac_dimm_fill dimm_fill; - int idx; + int idx = -1; - /* Check if safe to enable on this system */ - idx = acpi_match_platform_list(plat_list); - if (!force_load && idx < 0) - return -ENODEV; + if (IS_ENABLED(CONFIG_X86)) { + /* Check if safe to enable on this system */ + idx = acpi_match_platform_list(plat_list); + if (!force_load && idx < 0) + return -ENODEV; + } else { + idx = 0; + } /* * We have only one logical memory controller to which all DIMMs belong.