Message ID | 20220802000614.3769714-1-jane.chu@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v6] x86/mce: retrieve poison range from hardware | expand |
* Jane Chu <jane.chu@oracle.com> wrote: > With Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine > poison granularity") that changed nfit_handle_mce() callback to report > badrange according to 1ULL << MCI_MISC_ADDR_LSB(mce->misc), it's been > discovered that the mce->misc LSB field is 0x1000 bytes, hence injecting > 2 back-to-back poisons and the driver ends up logging 8 badblocks, > because 0x1000 bytes is 8 512-byte. > > Dan Williams noticed that apei_mce_report_mem_error() hardcode > the LSB field to PAGE_SHIFT instead of consulting the input > struct cper_sec_mem_err record. So change to rely on hardware whenever > support is available. > > Link: https://lore.kernel.org/r/7ed50fd8-521e-cade-77b1-738b8bfb8502@oracle.com > > Reviewed-by: Dan Williams <dan.j.williams@intel.com> > Signed-off-by: Jane Chu <jane.chu@oracle.com> > --- > arch/x86/kernel/cpu/mce/apei.c | 14 +++++++++++++- > 1 file changed, 13 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c > index 717192915f28..e2439c7872ba 100644 > --- a/arch/x86/kernel/cpu/mce/apei.c > +++ b/arch/x86/kernel/cpu/mce/apei.c > @@ -29,15 +29,27 @@ > void apei_mce_report_mem_error(int severity, struct cper_sec_mem_err *mem_err) > { > struct mce m; > + int lsb; > > if (!(mem_err->validation_bits & CPER_MEM_VALID_PA)) > return; > > + /* > + * Even if the ->validation_bits are set for address mask, > + * to be extra safe, check and reject an error radius '0', > + * and fallback to the default page size. > + */ speling nit: s/fallback/fall back > + if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK) > + lsb = find_first_bit((const unsigned long *) > + &mem_err->physical_addr_mask, PAGE_SHIFT); I think we can write this in a shorter form and in a single line: lsb = find_first_bit((void *)&mem_err->physical_addr_mask, PAGE_SHIFT); (Ignore checkpatch if it wants to break the line.) Untested. Assuming my suggestion is correct and with those addressed: Reviewed-by: Ingo Molnar <mingo@kernel.org> Thanks, Ingo
On 8/2/2022 3:59 AM, Ingo Molnar wrote: > > * Jane Chu <jane.chu@oracle.com> wrote: > >> With Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine >> poison granularity") that changed nfit_handle_mce() callback to report >> badrange according to 1ULL << MCI_MISC_ADDR_LSB(mce->misc), it's been >> discovered that the mce->misc LSB field is 0x1000 bytes, hence injecting >> 2 back-to-back poisons and the driver ends up logging 8 badblocks, >> because 0x1000 bytes is 8 512-byte. >> >> Dan Williams noticed that apei_mce_report_mem_error() hardcode >> the LSB field to PAGE_SHIFT instead of consulting the input >> struct cper_sec_mem_err record. So change to rely on hardware whenever >> support is available. >> >> Link: https://lore.kernel.org/r/7ed50fd8-521e-cade-77b1-738b8bfb8502@oracle.com >> >> Reviewed-by: Dan Williams <dan.j.williams@intel.com> >> Signed-off-by: Jane Chu <jane.chu@oracle.com> >> --- >> arch/x86/kernel/cpu/mce/apei.c | 14 +++++++++++++- >> 1 file changed, 13 insertions(+), 1 deletion(-) >> >> diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c >> index 717192915f28..e2439c7872ba 100644 >> --- a/arch/x86/kernel/cpu/mce/apei.c >> +++ b/arch/x86/kernel/cpu/mce/apei.c >> @@ -29,15 +29,27 @@ >> void apei_mce_report_mem_error(int severity, struct cper_sec_mem_err *mem_err) >> { >> struct mce m; >> + int lsb; >> >> if (!(mem_err->validation_bits & CPER_MEM_VALID_PA)) >> return; >> >> + /* >> + * Even if the ->validation_bits are set for address mask, >> + * to be extra safe, check and reject an error radius '0', >> + * and fallback to the default page size. >> + */ > > speling nit: > > s/fallback/fall back > >> + if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK) >> + lsb = find_first_bit((const unsigned long *) >> + &mem_err->physical_addr_mask, PAGE_SHIFT); > > I think we can write this in a shorter form and in a single line: > > lsb = find_first_bit((void *)&mem_err->physical_addr_mask, PAGE_SHIFT); > > (Ignore checkpatch if it wants to break the line.) > > Untested. > > Assuming my suggestion is correct and with those addressed: > > Reviewed-by: Ingo Molnar <mingo@kernel.org> Thanks! Just tested the change, v7 coming next. -jane > > Thanks, > > Ingo
diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c index 717192915f28..e2439c7872ba 100644 --- a/arch/x86/kernel/cpu/mce/apei.c +++ b/arch/x86/kernel/cpu/mce/apei.c @@ -29,15 +29,27 @@ void apei_mce_report_mem_error(int severity, struct cper_sec_mem_err *mem_err) { struct mce m; + int lsb; if (!(mem_err->validation_bits & CPER_MEM_VALID_PA)) return; + /* + * Even if the ->validation_bits are set for address mask, + * to be extra safe, check and reject an error radius '0', + * and fallback to the default page size. + */ + if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK) + lsb = find_first_bit((const unsigned long *) + &mem_err->physical_addr_mask, PAGE_SHIFT); + else + lsb = PAGE_SHIFT; + mce_setup(&m); m.bank = -1; /* Fake a memory read error with unknown channel */ m.status = MCI_STATUS_VAL | MCI_STATUS_EN | MCI_STATUS_ADDRV | MCI_STATUS_MISCV | 0x9f; - m.misc = (MCI_MISC_ADDR_PHYS << 6) | PAGE_SHIFT; + m.misc = (MCI_MISC_ADDR_PHYS << 6) | lsb; if (severity >= GHES_SEV_RECOVERABLE) m.status |= MCI_STATUS_UC;