Message ID | 20230511174506.875153-2-hristo@venev.name |
---|---|
State | New, archived |
Series | [v2] EDAC/amd64: Add support for ECC on family 19h model 60h-7Fh |
```
> Ryzen 9 7950X uses model 61h. Treat it as Epyc 9004, but with 2 channels
> instead of 12.
>
> [...]
>
> According to Mario Limonciello, the same code should also work for
> models 70h-7Fh [1].
>
> Link: https://lore.kernel.org/linux-edac/d619252e-35c7-814b-acdb-74714619d62a@amd.com/T/#m9fc20d5dc36074048ec5f1c0a5b01b7f972a1cc7 [1]
> Signed-off-by: Hristo Venev <hristo@venev.name>

Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
```
```
On Thu, May 11, 2023 at 08:45:07PM +0300, Hristo Venev wrote:
> Ryzen 9 7950X uses model 61h. Treat it as Epyc 9004, but with 2 channels
> instead of 12.
>
> [...]
>
> ---
>  drivers/edac/amd64_edac.c | 8 ++++++++
>  1 file changed, 8 insertions(+)

Applied, thanks.
```
```diff
diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index b55129425c81..c00f7e4ef366 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -3816,6 +3816,14 @@ static int per_family_init(struct amd64_pvt *pvt)
 		case 0x50 ... 0x5f:
 			pvt->ctl_name = "F19h_M50h";
 			break;
+		case 0x60 ... 0x6f:
+			pvt->ctl_name = "F19h_M60h";
+			pvt->flags.zn_regs_v2 = 1;
+			break;
+		case 0x70 ... 0x7f:
+			pvt->ctl_name = "F19h_M70h";
+			pvt->flags.zn_regs_v2 = 1;
+			break;
 		case 0xa0 ... 0xaf:
 			pvt->ctl_name = "F19h_MA0h";
 			pvt->max_mcs = 12;
```
```
Ryzen 9 7950X uses model 61h. Treat it as Epyc 9004, but with 2 channels
instead of 12.

I tested this with two 32GB dual-rank DIMMs. The sizes appear to be
reported correctly:

[ 2.122750] EDAC MC0: Giving out device to module amd64_edac controller F19h_M60h: DEV 0000:00:18.3 (INTERRUPT)
[ 2.122751] EDAC amd64: F19h_M60h detected (node 0).
[ 2.122754] EDAC MC: UMC0 chip selects:
[ 2.122754] EDAC amd64: MC: 0: 0MB 1: 0MB
[ 2.122755] EDAC amd64: MC: 2: 16384MB 3: 16384MB
[ 2.122757] EDAC MC: UMC1 chip selects:
[ 2.122757] EDAC amd64: MC: 0: 0MB 1: 0MB
[ 2.122758] EDAC amd64: MC: 2: 16384MB 3: 16384MB
[ 2.122759] AMD64 EDAC driver v3.5.0

ECC errors can also be detected:

[ 313.747594] mce: [Hardware Error]: Machine check events logged
[ 313.747597] [Hardware Error]: Corrected error, no action required.
[ 313.747613] [Hardware Error]: CPU:0 (19:61:2) MC21_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc2040000400011b
[ 313.747632] [Hardware Error]: Error Addr: 0x00000007ff7e93c0
[ 313.747639] [Hardware Error]: IPID: 0x0000009600050f00, Syndrome: 0x000100010a801203
[ 313.747652] [Hardware Error]: Unified Memory Controller Ext. Error Code: 0, DRAM ECC error.
[ 313.747669] EDAC MC0: 1 CE Cannot decode normalized address on mc#0csrow#3channel#0 (csrow:3 channel:0 page:0x0 offset:0x0 grain:64 syndrome:0x1)
[ 313.747672] [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD

According to Mario Limonciello, the same code should also work for
models 70h-7Fh [1].

Link: https://lore.kernel.org/linux-edac/d619252e-35c7-814b-acdb-74714619d62a@amd.com/T/#m9fc20d5dc36074048ec5f1c0a5b01b7f972a1cc7 [1]
Signed-off-by: Hristo Venev <hristo@venev.name>
---
 drivers/edac/amd64_edac.c | 8 ++++++++
 1 file changed, 8 insertions(+)
```