Message ID | 20220727184644.2685405-1-jane.chu@oracle.com (mailing list archive)
---|---
State | New, archived
Series | [v4] x86/mce: retrieve poison range from hardware
Jane Chu wrote:
> With Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine
> poison granularity") that changed the nfit_handle_mce() callback to report
> badrange according to 1ULL << MCI_MISC_ADDR_LSB(mce->misc), it's been
> discovered that the mce->misc LSB field is 0x1000 bytes, hence injecting
> 2 back-to-back poisons ends up with the driver logging 8 badblocks,
> because 0x1000 bytes is 8 512-byte sectors.
>
> Dan Williams noticed that apei_mce_report_mem_error() hardcodes
> the LSB field to PAGE_SHIFT instead of consulting the input
> struct cper_sec_mem_err record. So change to rely on hardware whenever
> support is available.
>
> Link: https://lore.kernel.org/r/7ed50fd8-521e-cade-77b1-738b8bfb8502@oracle.com
>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Jane Chu <jane.chu@oracle.com>
> ---
>  arch/x86/kernel/cpu/mce/apei.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c
> index 717192915f28..26d63818b2de 100644
> --- a/arch/x86/kernel/cpu/mce/apei.c
> +++ b/arch/x86/kernel/cpu/mce/apei.c
> @@ -29,15 +29,27 @@
>  void apei_mce_report_mem_error(int severity, struct cper_sec_mem_err *mem_err)
>  {
>  	struct mce m;
> +	int grain = PAGE_SHIFT;
>
>  	if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))
>  		return;
>
> +	/*
> +	 * Even if the ->validation_bits are set for address mask,
> +	 * to be extra safe, check and reject an error radius '0',
> +	 * and fallback to the default page size.
> +	 */
> +	if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK) {
> +		grain = ~mem_err->physical_addr_mask + 1;
> +		if (grain == 1)
> +			grain = PAGE_SHIFT;

Wait, if @grain is the number of bits to mask off the address, shouldn't
this be something like:

	grain = min_not_zero(PAGE_SHIFT, hweight64(~mem_err->physical_addr_mask));

...?
On 7/27/2022 11:56 AM, Dan Williams wrote:
> Jane Chu wrote:
>> [... v4 patch quoted ...]
>
> Wait, if @grain is the number of bits to mask off the address, shouldn't
> this be something like:
>
> 	grain = min_not_zero(PAGE_SHIFT, hweight64(~mem_err->physical_addr_mask));
>
> ...?

I see. I guess what you meant is

	grain = min(PAGE_SHIFT, (1 + hweight64(~mem_err->physical_addr_mask)));

so that in the pmem poison case, 'grain' would be 8, not 7.

thanks,
-jane
On 7/27/2022 12:24 PM, Jane Chu wrote:
> On 7/27/2022 11:56 AM, Dan Williams wrote:
>> [... v4 patch and min_not_zero() suggestion quoted ...]
>
> I see. I guess what you meant is
>
> 	grain = min(PAGE_SHIFT, (1 + hweight64(~mem_err->physical_addr_mask)));
>
> so that in the pmem poison case, 'grain' would be 8, not 7.

Sorry, take that back, it won't work either.

-jane
On 7/27/2022 12:30 PM, Jane Chu wrote:
> On 7/27/2022 12:24 PM, Jane Chu wrote:
>> [... v4 patch and the min()/hweight64() attempts quoted ...]
>
> Sorry, take that back, it won't work either.

This will work,

	grain = min_not_zero(PAGE_SHIFT - 1, hweight64(~mem_err->physical_addr_mask));
	grain++;

but too sophisticated? I guess I prefer the simple "if" expression.

thanks,
-jane
Jane Chu wrote:
> On 7/27/2022 12:30 PM, Jane Chu wrote:
>> [... v4 patch and earlier attempts quoted ...]
>
> This will work,
>
> 	grain = min_not_zero(PAGE_SHIFT - 1, hweight64(~mem_err->physical_addr_mask));
> 	grain++;
>
> but too sophisticated? I guess I prefer the simple "if" expression.

An "if" is fine, I was more pointing out that:

	hweight64(~mem_err->physical_addr_mask) + 1

...and:

	~mem_err->physical_addr_mask + 1;

...give different results.
On 7/27/2022 1:01 PM, Dan Williams wrote:
> Jane Chu wrote:
>> [... v4 patch and earlier attempts quoted ...]
>
> An "if" is fine, I was more pointing out that:
>
> 	hweight64(~mem_err->physical_addr_mask) + 1
>
> ...and:
>
> 	~mem_err->physical_addr_mask + 1;
>
> ...give different results.

They are different indeed. hweight64() returns the count of set bits, while
~mem_err->physical_addr_mask returns a negated value.

According to the definition of "Physical Address Mask" in
https://uefi.org/sites/default/files/resources/UEFI_Spec_2_9_2021_03_18.pdf
Table N-31, Memory Error Record:

	Physical Address Mask (offset 24, length 8): Defines the valid address
	bits in the Physical Address field. The mask specifies the granularity
	of the physical address, which is dependent on hw/implementation
	factors such as interleaving.

It appears that "Physical Address Mask" is defined more like PAGE_MASK,
rather than in the bitops sense where hweight64() is often used to count
set bits as an indication of (e.g.) how many registers are in use.

And similar to PAGE_MASK, a valid "Physical Address Mask" should consist
of contiguous low 0 bits, not 1's and 0's mixed up.

So far, as far as I can see, the v4 patch still looks correct to me.
Please let me know if I'm missing anything.

thanks!
-jane
Jane Chu wrote:
> On 7/27/2022 1:01 PM, Dan Williams wrote:
>> [... v4 patch and UEFI spec discussion quoted ...]
>
> So far, as far as I can see, the v4 patch still looks correct to me.
> Please let me know if I'm missing anything.

The v4 patch looks broken to me. If the address mask is
0xffffffffffffffc0 to indicate a cacheline error then:

	~mem_err->physical_addr_mask + 1;

...results in a grain of 64 when it should be 6.
On 7/28/2022 11:46 AM, Dan Williams wrote:
> Jane Chu wrote:
>> [... UEFI spec discussion quoted ...]
>
> The v4 patch looks broken to me. If the address mask is
> 0xffffffffffffffc0 to indicate a cacheline error then:
>
> 	~mem_err->physical_addr_mask + 1;
>
> ...results in a grain of 64 when it should be 6.

Right, it's the exponent that's needed, so back to __ffs64(). Sorry for
the detour. v5 is coming next.

thanks!
-jane
diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c
index 717192915f28..26d63818b2de 100644
--- a/arch/x86/kernel/cpu/mce/apei.c
+++ b/arch/x86/kernel/cpu/mce/apei.c
@@ -29,15 +29,27 @@ void apei_mce_report_mem_error(int severity, struct cper_sec_mem_err *mem_err)
 {
 	struct mce m;
+	int grain = PAGE_SHIFT;
 
 	if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))
 		return;
 
+	/*
+	 * Even if the ->validation_bits are set for address mask,
+	 * to be extra safe, check and reject an error radius '0',
+	 * and fallback to the default page size.
+	 */
+	if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK) {
+		grain = ~mem_err->physical_addr_mask + 1;
+		if (grain == 1)
+			grain = PAGE_SHIFT;
+	}
+
 	mce_setup(&m);
 	m.bank = -1;
 	/* Fake a memory read error with unknown channel */
 	m.status = MCI_STATUS_VAL | MCI_STATUS_EN | MCI_STATUS_ADDRV | MCI_STATUS_MISCV | 0x9f;
-	m.misc = (MCI_MISC_ADDR_PHYS << 6) | PAGE_SHIFT;
+	m.misc = (MCI_MISC_ADDR_PHYS << 6) | grain;
 
 	if (severity >= GHES_SEV_RECOVERABLE)
 		m.status |= MCI_STATUS_UC;