Message ID | 20210823185437.94417-1-nchatrad@amd.com (mailing list archive) |
---|---|
Series | x86/edac/amd64: Add support for noncpu nodes |
On newer heterogeneous systems with AMD CPUs the data fabrics of GPUs
can be connected directly via custom links.

This patchset does the following

1. amd_nb.c:
   a. Add support for northbridges on Aldebaran GPU nodes
   b. export AMD node map details to be used by edac and mce modules

2. mce_amd module:
   a. Identify the node ID where the error occurred and map the node id
      to linux enumerated node id.

3. Modifies the amd64_edac module
   a. Add new family op routines
   b. Enumerate UMCs and HBMs on the GPU nodes

This patchset is rebased on top of
"
commit 07416cadfdfa38283b840e700427ae3782c76f6b
Author: Yazen Ghannam <yazen.ghannam@amd.com>
Date: Tue Oct 5 15:44:19 2021 +0000

EDAC/amd64: Handle three rank interleaving mode
"

Muralidhara M K (2):
  x86/amd_nb: Add support for northbridges on Aldebaran
  EDAC/amd64: Extend family ops functions

Naveen Krishna Chatradhi (2):
  EDAC/mce_amd: Extract node id from MCA_IPID
  EDAC/amd64: Enumerate memory on Aldebaran GPU nodes

 arch/x86/include/asm/amd_nb.h |   9 +
 arch/x86/kernel/amd_nb.c      | 131 +++++++--
 drivers/edac/amd64_edac.c     | 517 +++++++++++++++++++++++++---------
 drivers/edac/amd64_edac.h     |  33 +++
 drivers/edac/mce_amd.c        |  24 +-
 include/linux/pci_ids.h       |   1 +
 6 files changed, 564 insertions(+), 151 deletions(-)
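The mce_amd step in the cover letter (identify the hardware node ID, then map it to the Linux-enumerated node ID) can be sketched roughly as follows. This is an illustrative sketch only, not the code from the patches: the cover letter does not give the register layout, so the bit position of the node ID inside MCA_IPID, the threshold that separates CPU from GPU node IDs, and all names below (`NODE_ID_SHIFT`, `NONCPU_NODE_INDEX`, `hw_node_id`, `linux_node_id`) are assumptions made for the example.

```python
# Hypothetical sketch of "extract node ID from MCA_IPID and map it to a
# Linux-enumerated node ID". All constants are assumptions, not taken
# from the cover letter.

NODE_ID_SHIFT = 44       # assumed: node ID lives in MCA_IPID bits [47:44]
NODE_ID_MASK = 0xF       # assumed: 4-bit field
NONCPU_NODE_INDEX = 8    # assumed: hardware IDs >= 8 denote GPU (noncpu) nodes


def hw_node_id(ipid: int) -> int:
    """Return the hardware node ID encoded in an MCA_IPID value."""
    return (ipid >> NODE_ID_SHIFT) & NODE_ID_MASK


def linux_node_id(ipid: int, cpu_node_count: int) -> int:
    """Map the hardware node ID to a Linux-enumerated node ID.

    CPU nodes keep their hardware ID. GPU nodes are assumed to be
    numbered sequentially after the last CPU node.
    """
    node = hw_node_id(ipid)
    if node < NONCPU_NODE_INDEX:
        return node
    return node - NONCPU_NODE_INDEX + cpu_node_count
```

Under these assumptions, on a system with two CPU nodes an error reported with hardware node ID 9 would be attributed to Linux node 3, while an error on hardware node 1 stays on node 1.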
On Fri, Oct 15, 2021 at 12:23:56AM +0530, Naveen Krishna Chatradhi wrote:
> On newer heterogeneous systems with AMD CPUs the data fabrics of GPUs
> can be connected directly via custom links.
>
> This patchset does the following
> 1. amd_nb.c:
> a. Add support for northbridges on Aldebaran GPU nodes
> b. export AMD node map details to be used by edac and mce modules
>
> 2. mce_amd module:
> a. Identify the node ID where the error occurred and map the node id
> to linux enumerated node id.
>
> 2. Modifies the amd64_edac module
> a. Add new family op routines
> b. Enumerate UMCs and HBMs on the GPU nodes
>
> This patchset is rebased on top of
> "
> commit 07416cadfdfa38283b840e700427ae3782c76f6b
> Author: Yazen Ghannam <yazen.ghannam@amd.com>
> Date: Tue Oct 5 15:44:19 2021 +0000
>
> EDAC/amd64: Handle three rank interleaving mode
> "
>
> Muralidhara M K (2):
> x86/amd_nb: Add support for northbridges on Aldebaran
> EDAC/amd64: Extend family ops functions
>
> Naveen Krishna Chatradhi (2):
> EDAC/mce_amd: Extract node id from MCA_IPID
> EDAC/amd64: Enumerate memory on Aldebaran GPU nodes
>
> arch/x86/include/asm/amd_nb.h | 9 +
> arch/x86/kernel/amd_nb.c | 131 +++++++--
> drivers/edac/amd64_edac.c | 517 +++++++++++++++++++++++++---------
> drivers/edac/amd64_edac.h | 33 +++
> drivers/edac/mce_amd.c | 24 +-
> include/linux/pci_ids.h | 1 +
> 6 files changed, 564 insertions(+), 151 deletions(-)

So which v4 should I be looking at - this one or

https://lore.kernel.org/r/20211014185058.9587-1-nchatrad@amd.com

?

Btw, you don't have to do --in-reply-to and keep all patchsets in a
single thread - just send the new revision as a separate thread.

Thx.
Hi Boris,

On 10/15/2021 1:23 AM, Borislav Petkov wrote:
> On Fri, Oct 15, 2021 at 12:23:56AM +0530, Naveen Krishna Chatradhi wrote:
>> On newer heterogeneous systems with AMD CPUs the data fabrics of GPUs
>> can be connected directly via custom links.
>>
>> This patchset does the following
>> 1. amd_nb.c:
>> a. Add support for northbridges on Aldebaran GPU nodes
>> b. export AMD node map details to be used by edac and mce modules
>>
>> 2. mce_amd module:
>> a. Identify the node ID where the error occurred and map the node id
>> to linux enumerated node id.
>>
>> 2. Modifies the amd64_edac module
>> a. Add new family op routines
>> b. Enumerate UMCs and HBMs on the GPU nodes
>>
>> This patchset is rebased on top of
>> "
>> commit 07416cadfdfa38283b840e700427ae3782c76f6b
>> Author: Yazen Ghannam <yazen.ghannam@amd.com>
>> Date: Tue Oct 5 15:44:19 2021 +0000
>>
>> EDAC/amd64: Handle three rank interleaving mode
>> "
>>
>> Muralidhara M K (2):
>> x86/amd_nb: Add support for northbridges on Aldebaran
>> EDAC/amd64: Extend family ops functions
>>
>> Naveen Krishna Chatradhi (2):
>> EDAC/mce_amd: Extract node id from MCA_IPID
>> EDAC/amd64: Enumerate memory on Aldebaran GPU nodes
>>
>> arch/x86/include/asm/amd_nb.h | 9 +
>> arch/x86/kernel/amd_nb.c | 131 +++++++--
>> drivers/edac/amd64_edac.c | 517 +++++++++++++++++++++++++---------
>> drivers/edac/amd64_edac.h | 33 +++
>> drivers/edac/mce_amd.c | 24 +-
>> include/linux/pci_ids.h | 1 +
>> 6 files changed, 564 insertions(+), 151 deletions(-)
> So which v4 should I be looking at - this one or

I noticed that the v4 tag is missing on the 3rd and 4th patches in the series. I tried to abort, but the git send-email went through.

> https://lore.kernel.org/r/20211014185058.9587-1-nchatrad@amd.com

Could you please review the latest one (link above), or should I push them as v5 to avoid the confusion?

> ?
>
> Btw, you don't have to do --in-reply-to and keep all patchsets in a
> single thread - just send the new revision as a separate thread.

Sure, will do that. Thank you.

> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
On Fri, Oct 15, 2021 at 05:48:32PM +0530, Chatradhi, Naveen Krishna wrote:
> Could you please review the latest one (above link)

Ok.

> or should i push them as v5, to avoid the confusion.

Nah, not necessary. The goal is to always avoid spamming maintainers
with patchsets if not absolutely necessary. :-)

Thx.
From: Muralidhara M K <muralimk@amd.com>

On newer heterogeneous systems the data fabrics of the CPUs and GPUs
are connected directly via custom links.

This patchset does not have any dependency on the series by Yazen Ghannam,
AMD MCA Address Translation Updates
[https://patchwork.kernel.org/project/linux-edac/list/?series=505989]

This patchset does the following

1. Add support for northbridges on Aldebaran
   * x86/amd_nb: Add support for northbridges on Aldebaran

2. Modifies the amd64_edac module to
   a. Handle the UMCs on the noncpu nodes,
      * EDAC/mce_amd: Extract node id from MCA_IPID
   b. Enumerate PCI IDs and HBM memory
      * EDAC/amd64: Enumerate memory on noncpu nodes

Muralidhara M K (1):
  x86/amd_nb: Add support for northbridges on Aldebaran

Naveen Krishna Chatradhi (2):
  EDAC/mce_amd: Extract node id from MCA_IPID
  EDAC/amd64: Enumerate memory on noncpu nodes

 arch/x86/include/asm/amd_nb.h |  10 ++
 arch/x86/kernel/amd_nb.c      |  63 +++++++++-
 drivers/edac/amd64_edac.c     | 219 ++++++++++++++++++++++++++++++----
 drivers/edac/amd64_edac.h     |  28 +++++
 drivers/edac/mce_amd.c        |  19 ++-
 include/linux/pci_ids.h       |   1 +
 6 files changed, 308 insertions(+), 32 deletions(-)