Message ID | 20220203174942.31630-4-nchatrad@amd.com (mailing list archive) |
---|---|
State | New, archived |
Series | x86/edac/amd64: Add support for GPU nodes |
On Thu, Feb 03, 2022 at 11:49:33AM -0600, Naveen Krishna Chatradhi wrote:
> On SMCA banks of the GPU nodes, the node id information is
> available in register MCA_IPID[47:44](InstanceIdHi).
>
> Convert the hardware node ID to a value used by Linux
> where GPU nodes are sequentially after the CPU nodes.
>

Terminology should be consistent. I see "node id" and "node ID" here.

...

> +		} else if (bank_type == SMCA_UMC_V2) {
> +			/*
> +			 * SMCA_UMC_V2 exists on GPU nodes, extract the node id
> +			 * from register MCA_IPID[47:44](InstanceIdHi).
> +			 * The InstanceIdHi field represents the instance ID of the GPU.
> +			 * Which needs to be mapped to a value used by Linux,
> +			 * where GPU nodes are simply numerically after the CPU nodes.
> +			 */
> +			node_id = amd_get_gpu_node_system_id(m->ipid);

As mentioned for the previous patch, why not define this function in EDAC?

Thanks,
Yazen

Hi Yazen

On 2/10/2022 5:01 AM, Yazen Ghannam wrote:
> On Thu, Feb 03, 2022 at 11:49:33AM -0600, Naveen Krishna Chatradhi wrote:
>> On SMCA banks of the GPU nodes, the node id information is
>> available in register MCA_IPID[47:44](InstanceIdHi).
>>
>> Convert the hardware node ID to a value used by Linux
>> where GPU nodes are sequentially after the CPU nodes.
>>
> Terminology should be consistent. I see "node id" and "node ID" here.

Will keep it consistent.

>
> ...
>
>> +		} else if (bank_type == SMCA_UMC_V2) {
>> +			/*
>> +			 * SMCA_UMC_V2 exists on GPU nodes, extract the node id
>> +			 * from register MCA_IPID[47:44](InstanceIdHi).
>> +			 * The InstanceIdHi field represents the instance ID of the GPU.
>> +			 * Which needs to be mapped to a value used by Linux,
>> +			 * where GPU nodes are simply numerically after the CPU nodes.
>> +			 */
>> +			node_id = amd_get_gpu_node_system_id(m->ipid);
> As mentioned for the previous patch, why not define this function in EDAC?

Sure, with recent changes we can move this function to edac. Will wait for
comments on other patches in the series and submit next version with
feedback addressed.

Regards,
Naveenk

>
> Thanks,
> Yazen

diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index cc5c63feb26a..865a925ccef0 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -2,6 +2,7 @@
 #include <linux/module.h>
 #include <linux/slab.h>
 
+#include <asm/amd_nb.h>
 #include <asm/cpu.h>
 
 #include "mce_amd.h"
@@ -1186,8 +1187,26 @@ static void decode_smca_error(struct mce *m)
 	if (xec < smca_mce_descs[bank_type].num_descs)
 		pr_cont(", %s.\n", smca_mce_descs[bank_type].descs[xec]);
 
-	if (bank_type == SMCA_UMC && xec == 0 && decode_dram_ecc)
-		decode_dram_ecc(topology_die_id(m->extcpu), m);
+	if (xec == 0 && decode_dram_ecc) {
+		int node_id = 0;
+
+		if (bank_type == SMCA_UMC) {
+			node_id = topology_die_id(m->extcpu);
+		} else if (bank_type == SMCA_UMC_V2) {
+			/*
+			 * SMCA_UMC_V2 exists on GPU nodes, extract the node id
+			 * from register MCA_IPID[47:44](InstanceIdHi).
+			 * The InstanceIdHi field represents the instance ID of the GPU.
+			 * Which needs to be mapped to a value used by Linux,
+			 * where GPU nodes are simply numerically after the CPU nodes.
+			 */
+			node_id = amd_get_gpu_node_system_id(m->ipid);
+		} else {
+			return;
+		}
+
+		decode_dram_ecc(node_id, m);
+	}
 }
 
 static inline void amd_decode_err_code(u16 ec)
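
For readers following the thread, here is a rough, standalone illustration of the mapping the comment in the hunk describes. It is not the body of amd_get_gpu_node_system_id(), which is defined in an earlier patch of the series and is not shown here. The helper names and the first_gpu_hw_id and cpu_node_count parameters are hypothetical, and the offset-based mapping is only an assumption drawn from the statement that GPU nodes are numbered after the CPU nodes.

```c
#include <stdint.h>

/* MCA_IPID[47:44] holds InstanceIdHi, per the patch description. */
#define IPID_INSTANCE_ID_HI_SHIFT	44
#define IPID_INSTANCE_ID_HI_MASK	0xFULL

/* Extract the 4-bit InstanceIdHi field from a raw MCA_IPID value. */
static inline unsigned int ipid_instance_id_hi(uint64_t ipid)
{
	return (unsigned int)((ipid >> IPID_INSTANCE_ID_HI_SHIFT) &
			      IPID_INSTANCE_ID_HI_MASK);
}

/*
 * Hypothetical mapping (an assumption, not the series' actual code):
 * GPU nodes follow the CPU nodes, so the Linux node ID is the CPU node
 * count plus the GPU's offset from the first GPU hardware ID.
 * Returns -1 if the hardware ID lies below first_gpu_hw_id.
 */
static inline int gpu_hw_id_to_linux_node(uint64_t ipid,
					  unsigned int first_gpu_hw_id,
					  unsigned int cpu_node_count)
{
	unsigned int hw_id = ipid_instance_id_hi(ipid);

	if (hw_id < first_gpu_hw_id)
		return -1;

	return (int)(cpu_node_count + (hw_id - first_gpu_hw_id));
}
```

With these assumed parameters, a system with two CPU nodes whose first GPU reports hardware ID 8 would map an InstanceIdHi of 9 to Linux node 3.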