Message ID | CAHq9+ShGiB_H6-E=L398zYR=ja16r2OuvJZfU4KLof=segyJbw@mail.gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | ie31200_edac missing PCI ID for i3-4370 |
On 1/31/21 7:07 PM, Paul Marks wrote:
> I have an ASRock C226M WS with an i3-4370 CPU.
>
> # lspci -vnn
> 00:00.0 Host bridge [0600]: Intel Corporation 4th Gen Core Processor DRAM Controller [8086:0c00] (rev 06)
> Subsystem: ASRock Incorporation 4th Gen Core Processor DRAM Controller [1849:0c00]
> Flags: bus master, fast devsel, latency 0
> Capabilities: [e0] Vendor Specific Information: Len=0c <?>
> Kernel driver in use: hsw_uncore
>
> But edac-util doesn't work:
>
> # edac-util -v
> edac-util: Fatal: Unable to get EDAC data: Unable to find EDAC data in sysfs
>
> I tried this ham-fisted patch:
>
> # diff -u ./drivers/edac/ie31200_edac.c{.old,}
> --- ./drivers/edac/ie31200_edac.c.old
> +++ ./drivers/edac/ie31200_edac.c
> @@ -58,7 +58,7 @@
>  #define PCI_DEVICE_ID_INTEL_IE31200_HB_3 0x0150
>  #define PCI_DEVICE_ID_INTEL_IE31200_HB_4 0x0158
>  #define PCI_DEVICE_ID_INTEL_IE31200_HB_5 0x015c
> -#define PCI_DEVICE_ID_INTEL_IE31200_HB_6 0x0c04
> +#define PCI_DEVICE_ID_INTEL_IE31200_HB_6 0x0c00
>  #define PCI_DEVICE_ID_INTEL_IE31200_HB_7 0x0c08
>  #define PCI_DEVICE_ID_INTEL_IE31200_HB_8 0x1918
>  #define PCI_DEVICE_ID_INTEL_IE31200_HB_9 0x5918

just curious why you removed here and didn't just add?

> And it seems happy now:
>
> # lspci -vnn
> 00:00.0 Host bridge [0600]: Intel Corporation 4th Gen Core Processor DRAM Controller [8086:0c00] (rev 06)
> Subsystem: ASRock Incorporation 4th Gen Core Processor DRAM Controller [1849:0c00]
> Flags: bus master, fast devsel, latency 0
> Capabilities: [e0] Vendor Specific Information: Len=0c <?>
> Kernel driver in use: hsw_uncore
> Kernel modules: ie31200_edac
>
> # edac-util -v
> mc0: 0 Uncorrected Errors with no DIMM info
> mc0: 0 Corrected Errors with no DIMM info
> mc0: csrow0: 0 Uncorrected Errors
> mc0: csrow0: mc#0csrow#0channel#0: 0 Corrected Errors
> mc0: csrow1: 0 Uncorrected Errors
> mc0: csrow1: mc#0csrow#1channel#0: 0 Corrected Errors
> edac-util: No errors to report.
>
> I don't know if it's truly working because I can't overclock the RAM
> to induce ECC errors, but still I think adding 8086:0c00 to this
> driver could be useful.

Cool yeah - I think it makes sense to add if can confirm
that the Intel datasheet says that this cpu uses the same
registers to read errors from as the others. I can certainly
confirm that the other pci ids do increment ce counts...

Thanks,

-Jason

On Thu, Feb 4, 2021 at 2:59 PM Jason Baron <jbaron@akamai.com> wrote:
> On 1/31/21 7:07 PM, Paul Marks wrote:
> > I tried this ham-fisted patch:
> >
> > -#define PCI_DEVICE_ID_INTEL_IE31200_HB_6 0x0c04
> > +#define PCI_DEVICE_ID_INTEL_IE31200_HB_6 0x0c00
>
> just curious why you removed here and didn't just add?

This is not a serious patch, just a one-liner to demonstrate the problem.

On 2/4/21 6:22 PM, Paul Marks wrote:
> On Thu, Feb 4, 2021 at 2:59 PM Jason Baron <jbaron@akamai.com> wrote:
>> just curious why you removed here and didn't just add?
>
> This is not a serious patch, just a one-liner to demonstrate the problem.

Ok. Any chance you can find the datasheet that shows that this
driver is using the appropriate registers for this hw? I didn't
find it quickly looking...

Thanks,

-Jason

On Tue, Feb 9, 2021 at 2:25 PM Jason Baron <jbaron@akamai.com> wrote:
> Ok. Any chance you can find the datasheet that shows that this
> driver is using the appropriate registers for this hw? I didn't
> find it quickly looking...

I wouldn't know where to begin. Do you have an example of a similar
datasheet from one of the known-good devices?

I left "memtester" running on this machine, because it might increase
the odds of generating an ECC error someday.

On 2/9/21 6:58 PM, Paul Marks wrote:
> I wouldn't know where to begin. Do you have an example of a similar
> datasheet from one of the known-good devices?
>
> I left "memtester" running on this machine, because it might increase
> the odds of generating an ECC error someday.

Hi Paul,

I have a list of them at the top of:
drivers/edac/ie31200_edac.c

According to the following intel link it looks
like '0xc[0-f]' is valid (page 52):

https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e3-1200v3-vol-2-datasheet.pdf

So I'm fine with this patch (assuming it just
becomes an addition).

Thanks,

-Jason

On 2/9/21 10:27 PM, Jason Baron wrote:
> According to the following intel link it looks
> like '0xc[0-f]' is valid (page 52):

Sorry meant to write that as: '0x0c0[0-f]'.

> https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e3-1200v3-vol-2-datasheet.pdf
>
> So I'm fine with this patch (assuming it just
> becomes an addition).
>
> Thanks,
>
> -Jason

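As a rough illustration of the corrected range '0x0c0[0-f]', the check below tests whether a device ID differs from 0x0c00 only in its low nibble. This is a userspace sketch, not driver code; the macro and function names (`HSW_HB_ID_BASE`, `HSW_HB_ID_MASK`, `in_datasheet_range`) are hypothetical and do not come from the thread or from the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Range quoted in the thread: device IDs 0x0c00 through 0x0c0f.
 * Both macro names are hypothetical, chosen for this sketch only. */
#define HSW_HB_ID_BASE 0x0c00u
#define HSW_HB_ID_MASK 0xfff0u

/* True when device_id matches 0x0c00 in every bit except the low nibble. */
static bool in_datasheet_range(uint16_t device_id)
{
    return (device_id & HSW_HB_ID_MASK) == HSW_HB_ID_BASE;
}
```

Under this check, the i3-4370's 0x0c00 falls in the same range as the 0x0c04 and 0x0c08 IDs already in the diff context, while 0x0150 and 0x1918 do not. Whether every ID in that range uses the same error registers is a question for the datasheet, not something this sketch establishes.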
    --- ./drivers/edac/ie31200_edac.c.old
    +++ ./drivers/edac/ie31200_edac.c
    @@ -58,7 +58,7 @@
     #define PCI_DEVICE_ID_INTEL_IE31200_HB_3 0x0150
     #define PCI_DEVICE_ID_INTEL_IE31200_HB_4 0x0158
     #define PCI_DEVICE_ID_INTEL_IE31200_HB_5 0x015c
    -#define PCI_DEVICE_ID_INTEL_IE31200_HB_6 0x0c04
    +#define PCI_DEVICE_ID_INTEL_IE31200_HB_6 0x0c00
     #define PCI_DEVICE_ID_INTEL_IE31200_HB_7 0x0c08
     #define PCI_DEVICE_ID_INTEL_IE31200_HB_8 0x1918
     #define PCI_DEVICE_ID_INTEL_IE31200_HB_9 0x5918
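
The diff above replaces the 0x0c04 entry, whereas the review asked for 0x0c00 to be an addition. A minimal userspace sketch of what "add rather than replace" might look like follows. It is not the driver's actual table: `struct hb_id`, `hb_ids`, `hb_id_supported`, and the macro name `PCI_DEVICE_ID_INTEL_IE31200_HB_NEW` are hypothetical stand-ins, and only two of the existing IDs are listed for brevity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_VENDOR_ID_INTEL 0x8086

/* Values copied from the diff context above. */
#define PCI_DEVICE_ID_INTEL_IE31200_HB_6 0x0c04
#define PCI_DEVICE_ID_INTEL_IE31200_HB_7 0x0c08
/* Hypothetical macro name for the new ID; the thread does not pick one. */
#define PCI_DEVICE_ID_INTEL_IE31200_HB_NEW 0x0c00

/* Simplified stand-in for a driver ID table entry (not the kernel's struct). */
struct hb_id {
    uint16_t vendor;
    uint16_t device;
};

/* The existing entries stay; the new ID is appended instead of overwriting HB_6. */
static const struct hb_id hb_ids[] = {
    { PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_IE31200_HB_6 },
    { PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_IE31200_HB_7 },
    { PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_IE31200_HB_NEW },
};

/* Linear search of the table for an exact vendor/device match. */
static bool hb_id_supported(uint16_t vendor, uint16_t device)
{
    for (size_t i = 0; i < sizeof(hb_ids) / sizeof(hb_ids[0]); i++) {
        if (hb_ids[i].vendor == vendor && hb_ids[i].device == device)
            return true;
    }
    return false;
}
```

With the entry appended, both 8086:0c04 and 8086:0c00 match, so hardware that already worked keeps working. In the real driver the new define would presumably also need a corresponding entry in the driver's PCI ID table; that part is an assumption here and is not shown in the thread.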