Message ID | 20220802195053.3882368-1-jane.chu@oracle.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [v7] x86/mce: retrieve poison range from hardware | expand |
* Jane Chu <jane.chu@oracle.com> wrote: > With Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine s/Commit/commit > poison granularity") that changed nfit_handle_mce() callback to report > badrange according to 1ULL << MCI_MISC_ADDR_LSB(mce->misc), it's been > discovered that the mce->misc LSB field is 0x1000 bytes, hence injecting > 2 back-to-back poisons and the driver ends up logging 8 badblocks, > because 0x1000 bytes is 8 512-byte. > > Dan Williams noticed that apei_mce_report_mem_error() hardcode > the LSB field to PAGE_SHIFT instead of consulting the input > struct cper_sec_mem_err record. So change to rely on hardware whenever > support is available. > > Link: https://lore.kernel.org/r/7ed50fd8-521e-cade-77b1-738b8bfb8502@oracle.com > > Reviewed-by: Dan Williams <dan.j.williams@intel.com> > Reviewed-by: Ingo Molnar <mingo@kernel.org> > Signed-off-by: Jane Chu <jane.chu@oracle.com> > --- > arch/x86/kernel/cpu/mce/apei.c | 13 ++++++++++++- > 1 file changed, 12 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c > index 717192915f28..8ed341714686 100644 > --- a/arch/x86/kernel/cpu/mce/apei.c > +++ b/arch/x86/kernel/cpu/mce/apei.c > @@ -29,15 +29,26 @@ > void apei_mce_report_mem_error(int severity, struct cper_sec_mem_err *mem_err) > { > struct mce m; > + int lsb; > > if (!(mem_err->validation_bits & CPER_MEM_VALID_PA)) > return; > > + /* > + * Even if the ->validation_bits are set for address mask, > + * to be extra safe, check and reject an error radius '0', > + * and fall back to the default page size. > + */ > + if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK) > + lsb = find_first_bit((void *)&mem_err->physical_addr_mask, PAGE_SHIFT); > + else > + lsb = PAGE_SHIFT; > + > mce_setup(&m); > m.bank = -1; > /* Fake a memory read error with unknown channel */ > m.status = MCI_STATUS_VAL | MCI_STATUS_EN | MCI_STATUS_ADDRV | MCI_STATUS_MISCV | 0x9f; > - m.misc = (MCI_MISC_ADDR_PHYS << 6) | PAGE_SHIFT; > + m.misc = (MCI_MISC_ADDR_PHYS << 6) | lsb; LGTM. I suppose this wants to go upstream via the tree the bug came from (NVDIMM tree? ACPI tree?), or should we pick it up into the x86 tree? Thanks, Ingo
On 8/3/2022 1:53 AM, Ingo Molnar wrote: > > * Jane Chu <jane.chu@oracle.com> wrote: > >> With Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine > > s/Commit/commit Maintainers, Would you prefer a v8, or take care the comment upon accepting the patch? > >> poison granularity") that changed nfit_handle_mce() callback to report >> badrange according to 1ULL << MCI_MISC_ADDR_LSB(mce->misc), it's been >> discovered that the mce->misc LSB field is 0x1000 bytes, hence injecting >> 2 back-to-back poisons and the driver ends up logging 8 badblocks, >> because 0x1000 bytes is 8 512-byte. >> >> Dan Williams noticed that apei_mce_report_mem_error() hardcode >> the LSB field to PAGE_SHIFT instead of consulting the input >> struct cper_sec_mem_err record. So change to rely on hardware whenever >> support is available. >> >> Link: https://lore.kernel.org/r/7ed50fd8-521e-cade-77b1-738b8bfb8502@oracle.com >> >> Reviewed-by: Dan Williams <dan.j.williams@intel.com> >> Reviewed-by: Ingo Molnar <mingo@kernel.org> >> Signed-off-by: Jane Chu <jane.chu@oracle.com> >> --- >> arch/x86/kernel/cpu/mce/apei.c | 13 ++++++++++++- >> 1 file changed, 12 insertions(+), 1 deletion(-) >> >> diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c >> index 717192915f28..8ed341714686 100644 >> --- a/arch/x86/kernel/cpu/mce/apei.c >> +++ b/arch/x86/kernel/cpu/mce/apei.c >> @@ -29,15 +29,26 @@ >> void apei_mce_report_mem_error(int severity, struct cper_sec_mem_err *mem_err) >> { >> struct mce m; >> + int lsb; >> >> if (!(mem_err->validation_bits & CPER_MEM_VALID_PA)) >> return; >> >> + /* >> + * Even if the ->validation_bits are set for address mask, >> + * to be extra safe, check and reject an error radius '0', >> + * and fall back to the default page size. >> + */ >> + if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK) >> + lsb = find_first_bit((void *)&mem_err->physical_addr_mask, PAGE_SHIFT); >> + else >> + lsb = PAGE_SHIFT; >> + >> mce_setup(&m); >> m.bank = -1; >> /* Fake a memory read error with unknown channel */ >> m.status = MCI_STATUS_VAL | MCI_STATUS_EN | MCI_STATUS_ADDRV | MCI_STATUS_MISCV | 0x9f; >> - m.misc = (MCI_MISC_ADDR_PHYS << 6) | PAGE_SHIFT; >> + m.misc = (MCI_MISC_ADDR_PHYS << 6) | lsb; > > LGTM. > > I suppose this wants to go upstream via the tree the bug came from (NVDIMM > tree? ACPI tree?), or should we pick it up into the x86 tree? No idea. Maintainers? thanks! -jane > > Thanks, > > Ingo
Jane Chu wrote: > On 8/3/2022 1:53 AM, Ingo Molnar wrote: > > > > * Jane Chu <jane.chu@oracle.com> wrote: > > > >> With Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine > > > > s/Commit/commit > > Maintainers, > Would you prefer a v8, or take care the comment upon accepting the patch? > > > > >> poison granularity") that changed nfit_handle_mce() callback to report > >> badrange according to 1ULL << MCI_MISC_ADDR_LSB(mce->misc), it's been > >> discovered that the mce->misc LSB field is 0x1000 bytes, hence injecting > >> 2 back-to-back poisons and the driver ends up logging 8 badblocks, > >> because 0x1000 bytes is 8 512-byte. > >> > >> Dan Williams noticed that apei_mce_report_mem_error() hardcode > >> the LSB field to PAGE_SHIFT instead of consulting the input > >> struct cper_sec_mem_err record. So change to rely on hardware whenever > >> support is available. > >> > >> Link: https://lore.kernel.org/r/7ed50fd8-521e-cade-77b1-738b8bfb8502@oracle.com > >> > >> Reviewed-by: Dan Williams <dan.j.williams@intel.com> > >> Reviewed-by: Ingo Molnar <mingo@kernel.org> > >> Signed-off-by: Jane Chu <jane.chu@oracle.com> > >> --- > >> arch/x86/kernel/cpu/mce/apei.c | 13 ++++++++++++- > >> 1 file changed, 12 insertions(+), 1 deletion(-) > >> > >> diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c > >> index 717192915f28..8ed341714686 100644 > >> --- a/arch/x86/kernel/cpu/mce/apei.c > >> +++ b/arch/x86/kernel/cpu/mce/apei.c > >> @@ -29,15 +29,26 @@ > >> void apei_mce_report_mem_error(int severity, struct cper_sec_mem_err *mem_err) > >> { > >> struct mce m; > >> + int lsb; > >> > >> if (!(mem_err->validation_bits & CPER_MEM_VALID_PA)) > >> return; > >> > >> + /* > >> + * Even if the ->validation_bits are set for address mask, > >> + * to be extra safe, check and reject an error radius '0', > >> + * and fall back to the default page size. > >> + */ > >> + if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK) > >> + lsb = find_first_bit((void *)&mem_err->physical_addr_mask, PAGE_SHIFT); > >> + else > >> + lsb = PAGE_SHIFT; > >> + > >> mce_setup(&m); > >> m.bank = -1; > >> /* Fake a memory read error with unknown channel */ > >> m.status = MCI_STATUS_VAL | MCI_STATUS_EN | MCI_STATUS_ADDRV | MCI_STATUS_MISCV | 0x9f; > >> - m.misc = (MCI_MISC_ADDR_PHYS << 6) | PAGE_SHIFT; > >> + m.misc = (MCI_MISC_ADDR_PHYS << 6) | lsb; > > > > LGTM. > > > > I suppose this wants to go upstream via the tree the bug came from (NVDIMM > > tree? ACPI tree?), or should we pick it up into the x86 tree? > > No idea. Maintainers? There's no real NVDIMM dependency here, just a general cleanup of how APEI error granularities are managed. So I think it is appropriate for this to go through the x86 tree via the typical path for mce related topics.
>>> I suppose this wants to go upstream via the tree the bug came from (NVDIMM >>> tree? ACPI tree?), or should we pick it up into the x86 tree? >> >> No idea. Maintainers? > > There's no real NVDIMM dependency here, just a general cleanup of how > APEI error granularities are managed. So I think it is appropriate for > this to go through the x86 tree via the typical path for mce related > topics. + Huang, Ying. x86 maintainers, Please let me know if you need another revision. thanks, -jane On 8/8/2022 4:30 PM, Dan Williams wrote: > Jane Chu wrote: >> On 8/3/2022 1:53 AM, Ingo Molnar wrote: >>> >>> * Jane Chu <jane.chu@oracle.com> wrote: >>> >>>> With Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine >>> >>> s/Commit/commit >> >> Maintainers, >> Would you prefer a v8, or take care the comment upon accepting the patch? >> >>> >>>> poison granularity") that changed nfit_handle_mce() callback to report >>>> badrange according to 1ULL << MCI_MISC_ADDR_LSB(mce->misc), it's been >>>> discovered that the mce->misc LSB field is 0x1000 bytes, hence injecting >>>> 2 back-to-back poisons and the driver ends up logging 8 badblocks, >>>> because 0x1000 bytes is 8 512-byte. >>>> >>>> Dan Williams noticed that apei_mce_report_mem_error() hardcode >>>> the LSB field to PAGE_SHIFT instead of consulting the input >>>> struct cper_sec_mem_err record. So change to rely on hardware whenever >>>> support is available. >>>> >>>> Link: https://lore.kernel.org/r/7ed50fd8-521e-cade-77b1-738b8bfb8502@oracle.com >>>> >>>> Reviewed-by: Dan Williams <dan.j.williams@intel.com> >>>> Reviewed-by: Ingo Molnar <mingo@kernel.org> >>>> Signed-off-by: Jane Chu <jane.chu@oracle.com> >>>> --- >>>> arch/x86/kernel/cpu/mce/apei.c | 13 ++++++++++++- >>>> 1 file changed, 12 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c >>>> index 717192915f28..8ed341714686 100644 >>>> --- a/arch/x86/kernel/cpu/mce/apei.c >>>> +++ b/arch/x86/kernel/cpu/mce/apei.c >>>> @@ -29,15 +29,26 @@ >>>> void apei_mce_report_mem_error(int severity, struct cper_sec_mem_err *mem_err) >>>> { >>>> struct mce m; >>>> + int lsb; >>>> >>>> if (!(mem_err->validation_bits & CPER_MEM_VALID_PA)) >>>> return; >>>> >>>> + /* >>>> + * Even if the ->validation_bits are set for address mask, >>>> + * to be extra safe, check and reject an error radius '0', >>>> + * and fall back to the default page size. >>>> + */ >>>> + if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK) >>>> + lsb = find_first_bit((void *)&mem_err->physical_addr_mask, PAGE_SHIFT); >>>> + else >>>> + lsb = PAGE_SHIFT; >>>> + >>>> mce_setup(&m); >>>> m.bank = -1; >>>> /* Fake a memory read error with unknown channel */ >>>> m.status = MCI_STATUS_VAL | MCI_STATUS_EN | MCI_STATUS_ADDRV | MCI_STATUS_MISCV | 0x9f; >>>> - m.misc = (MCI_MISC_ADDR_PHYS << 6) | PAGE_SHIFT; >>>> + m.misc = (MCI_MISC_ADDR_PHYS << 6) | lsb; >>> >>> LGTM. >>> >>> I suppose this wants to go upstream via the tree the bug came from (NVDIMM >>> tree? ACPI tree?), or should we pick it up into the x86 tree? >> >> No idea. Maintainers? > > There's no real NVDIMM dependency here, just a general cleanup of how > APEI error granularities are managed. So I think it is appropriate for > this to go through the x86 tree via the typical path for mce related > topics.
On Tue, Aug 02, 2022 at 01:50:53PM -0600, Jane Chu wrote: > With Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine > poison granularity") that changed nfit_handle_mce() callback to report > badrange according to 1ULL << MCI_MISC_ADDR_LSB(mce->misc), it's been > discovered that the mce->misc LSB field is 0x1000 bytes, hence injecting > 2 back-to-back poisons and the driver ends up logging 8 badblocks, > because 0x1000 bytes is 8 512-byte. What I'm missing from this text here is, what *is* the mce->misc LSB field in human speak? What does that field denote? What effect does that field have on error injection? And so on. > Dan Williams noticed that apei_mce_report_mem_error() hardcode > the LSB field to PAGE_SHIFT instead of consulting the input > struct cper_sec_mem_err record. So change to rely on hardware whenever > support is available. Rely on hardware? You're changing this to rely on what the firmware reports. That mem_err thing comes from a BIOS table AFAICT. ... Thx.
> What I'm missing from this text here is, what *is* the mce->misc LSB > field in human speak? What does that field denote? The SDM says: Recoverable Address LSB (bits 5:0): The lowest valid recoverable address bit. Indicates the position of the least significant bit (LSB) of the recoverable error address. For example, if the processor logs bits [43:9] of the address, the LSB sub-field in IA32_MCi_MISC is 01001b (9 decimal). For this example, bits [8:0] of the recoverable error address in IA32_MCi_ADDR should be ignored. So in human speak "how much data did you lose". "6" is a common value saying a cache line (2<<6 == 64) was lost. Sometimes you see "12' (2<<12 == 4096) for a whole page lost. -Tony
On 8/23/2022 9:51 AM, Borislav Petkov wrote: > On Tue, Aug 02, 2022 at 01:50:53PM -0600, Jane Chu wrote: >> With Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine >> poison granularity") that changed nfit_handle_mce() callback to report >> badrange according to 1ULL << MCI_MISC_ADDR_LSB(mce->misc), it's been >> discovered that the mce->misc LSB field is 0x1000 bytes, hence injecting >> 2 back-to-back poisons and the driver ends up logging 8 badblocks, >> because 0x1000 bytes is 8 512-byte. > > What I'm missing from this text here is, what *is* the mce->misc LSB > field in human speak? What does that field denote? > > What effect does that field have on error injection Tony has replied. > > And so on. > >> Dan Williams noticed that apei_mce_report_mem_error() hardcode >> the LSB field to PAGE_SHIFT instead of consulting the input >> struct cper_sec_mem_err record. So change to rely on hardware whenever >> support is available. > > Rely on hardware? You're changing this to rely on what the firmware > reports. > > That mem_err thing comes from a BIOS table AFAICT. > Would fix the comment to indicate "relying on firmware" help? Is there other concern? thanks! -jane > ... > > Thx. >
On Thu, Aug 25, 2022 at 04:29:47PM +0000, Jane Chu wrote:
> Tony has replied.
Do you really think that I can't look up what field means?
What I said was
"What I'm missing from this text here is... "
IOW, what I'm trying to say is, you should formulate your commit message
better, more human-friendly. Right now it reads like for insiders only.
But that's not its purpose.
Do you catch my drift?
Borislav Petkov wrote: > On Thu, Aug 25, 2022 at 04:29:47PM +0000, Jane Chu wrote: > > Tony has replied. > > Do you really think that I can't look up what field means? > > What I said was > > "What I'm missing from this text here is... " > > IOW, what I'm trying to say is, you should formulate your commit message > better, more human-friendly. Right now it reads like for insiders only. > But that's not its purpose. > > Do you catch my drift? How about: --- When memory poison consumption machine checks fire, mce-notifier-handlers like nfit_handle_mce() record the impacted physical address range. The error information includes data about blast radius, i.e. how many cachelines did the hardware determine are impacted. A recent change, commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine poison granularity"), updated nfit_handle_mce() to stop hard coding the blast radius value of 1 cacheline, and instead rely on the blast radius reported in 'struct mce' which can be up to 4K (64 cachelines). It turns out that apei_mce_report_mem_error() had a similar problem in that it hard coded a blast radius of 4K rather than checking the blast radius in the error information. Fix apei_mce_report_mem_error() to convey the proper poison granularity. ---
On Fri, Aug 26, 2022 at 10:54:31AM -0700, Dan Williams wrote: > How about: > > --- > > When memory poison consumption machine checks fire, > mce-notifier-handlers like nfit_handle_mce() record the impacted > physical address range. ... which is reported by the hardware in the MCi_MISC MSR. > The error information includes data about blast > radius, i.e. how many cachelines did the hardware determine are > impacted. Yap, nice. > A recent change, commit 7917f9cdb503 ("acpi/nfit: rely on > mce->misc to determine poison granularity"), updated nfit_handle_mce() > to stop hard coding the blast radius value of 1 cacheline, and instead > rely on the blast radius reported in 'struct mce' which can be up to 4K > (64 cachelines). > > It turns out that apei_mce_report_mem_error() had a similar problem in > that it hard coded a blast radius of 4K rather than checking the blast s/checking/reading/ > radius in the error information. Fix apei_mce_report_mem_error() to s/in/from/ > convey the proper poison granularity. > > --- Yap, that's a lot better. Thanks!
On 8/26/2022 11:09 AM, Borislav Petkov wrote: > On Fri, Aug 26, 2022 at 10:54:31AM -0700, Dan Williams wrote: >> How about: >> >> --- >> >> When memory poison consumption machine checks fire, >> mce-notifier-handlers like nfit_handle_mce() record the impacted >> physical address range. > > ... which is reported by the hardware in the MCi_MISC MSR. > >> The error information includes data about blast >> radius, i.e. how many cachelines did the hardware determine are >> impacted. > > Yap, nice. > >> A recent change, commit 7917f9cdb503 ("acpi/nfit: rely on >> mce->misc to determine poison granularity"), updated nfit_handle_mce() >> to stop hard coding the blast radius value of 1 cacheline, and instead >> rely on the blast radius reported in 'struct mce' which can be up to 4K >> (64 cachelines). >> >> It turns out that apei_mce_report_mem_error() had a similar problem in >> that it hard coded a blast radius of 4K rather than checking the blast > > s/checking/reading/ > >> radius in the error information. Fix apei_mce_report_mem_error() to > > s/in/from/ > >> convey the proper poison granularity. >> >> --- > > Yap, that's a lot better. > > Thanks! Got it and points taken. Thank you both, Boris and Dan. v8 coming up. thanks, -jane
diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c index 717192915f28..8ed341714686 100644 --- a/arch/x86/kernel/cpu/mce/apei.c +++ b/arch/x86/kernel/cpu/mce/apei.c @@ -29,15 +29,26 @@ void apei_mce_report_mem_error(int severity, struct cper_sec_mem_err *mem_err) { struct mce m; + int lsb; if (!(mem_err->validation_bits & CPER_MEM_VALID_PA)) return; + /* + * Even if the ->validation_bits are set for address mask, + * to be extra safe, check and reject an error radius '0', + * and fall back to the default page size. + */ + if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK) + lsb = find_first_bit((void *)&mem_err->physical_addr_mask, PAGE_SHIFT); + else + lsb = PAGE_SHIFT; + mce_setup(&m); m.bank = -1; /* Fake a memory read error with unknown channel */ m.status = MCI_STATUS_VAL | MCI_STATUS_EN | MCI_STATUS_ADDRV | MCI_STATUS_MISCV | 0x9f; - m.misc = (MCI_MISC_ADDR_PHYS << 6) | PAGE_SHIFT; + m.misc = (MCI_MISC_ADDR_PHYS << 6) | lsb; if (severity >= GHES_SEV_RECOVERABLE) m.status |= MCI_STATUS_UC;