Message ID: 20210823185437.94417-2-nchatrad@amd.com (mailing list archive)
State: New, archived
Series: x86/edac/amd64: Add support for noncpu nodes
On Tue, Aug 24, 2021 at 12:24:35AM +0530, Naveen Krishna Chatradhi wrote: > From: Muralidhara M K <muralimk@amd.com> > > On newer systems the CPUs manage MCA errors reported from the GPUs. > Enumerate the GPU nodes with the AMD NB framework to support EDAC. > > This patch adds necessary code to manage the Aldebaran nodes along with Avoid having "This patch" or "This commit" in the commit message. It is tautologically useless. Also, do $ git grep 'This patch' Documentation/process for more details. Also, what are the "Aldebaran nodes"? Something on a star which is 65 light years away? > the CPU nodes. > > The GPU nodes are enumerated in sequential order based on the > PCI hierarchy, and the first GPU node is assumed to have an "AMD Node > ID" value of 8 (the second GPU node has 9, etc.). What does that mean? The GPU nodes are simply numerically after the CPU nodes or how am I to understand this nomenclature? > Each Aldebaran GPU > package has 2 Data Fabrics, which are enumerated as 2 nodes. > With this implementation detail, the Data Fabric on the GPU nodes can be > accessed the same way as the Data Fabric on CPU nodes. > > Signed-off-by: Muralidhara M K <muralimk@amd.com> > Signed-off-by: Naveen Krishna Chatradhi <nchatrad@amd.com> > Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> > --- > Changes since v2: Added Reviewed-by Yazen Ghannam > > arch/x86/include/asm/amd_nb.h | 10 ++++++ > arch/x86/kernel/amd_nb.c | 63 ++++++++++++++++++++++++++++++++--- > include/linux/pci_ids.h | 1 + > 3 files changed, 69 insertions(+), 5 deletions(-) > > diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h > index 455066a06f60..09905f6c7218 100644 > --- a/arch/x86/include/asm/amd_nb.h > +++ b/arch/x86/include/asm/amd_nb.h > @@ -80,6 +80,16 @@ struct amd_northbridge_info { > > #ifdef CONFIG_AMD_NB > > +/* > + * On newer heterogeneous systems the data fabrics of the CPUs and GPUs > + * are connected directly via a custom links, like is done with s/ a // > + * 2 socket CPU systems and also within a socket for Multi-chip Module > + * (MCM) CPUs like Naples. > + * The first GPU node(non cpu) is assumed to have an "AMD Node ID" value In all your text: s/cpu/CPU/g > + * of 8 (the second GPU node has 9, etc.). > + */ > +#define NONCPU_NODE_INDEX 8 Why is this assumed? Can it instead be read from the hardware somewhere? Or there simply won't be more than 8 CPU nodes anyway? Not at least in the near future? I'd prefer stuff to be read out directly from the hardware so that when the hardware changes, the code just works instead of doing assumptions which get invalidated later. > + > u16 amd_nb_num(void); > bool amd_nb_has_feature(unsigned int feature); > struct amd_northbridge *node_to_amd_nb(int node); > diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c > index 23dda362dc0f..6ad5664a18aa 100644 > --- a/arch/x86/kernel/amd_nb.c > +++ b/arch/x86/kernel/amd_nb.c > @@ -26,6 +26,8 @@ > #define PCI_DEVICE_ID_AMD_17H_M70H_DF_F4 0x1444 > #define PCI_DEVICE_ID_AMD_19H_DF_F4 0x1654 > #define PCI_DEVICE_ID_AMD_19H_M50H_DF_F4 0x166e > +#define PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT 0x14bb > +#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4 0x14d4 You see how those defines are aligned vertically, right? If your new defines don't fit, you realign them all vertically too - you don't just slap them there. And if it wasn't clear above, that Aldebaran GPU chip name means something only to AMD folks. 
If this is correct https://en.wikipedia.org/wiki/Video_Core_Next then that Aldebaran thing is VCN 2.6 although this is only the video encoding stuff and GPU I guess is more than that. IOW, what I'm trying to say is, just like we name the CPUs using their families, you should name the GPUs nomenclature with GPU families (I guess there's stuff like that) and not use the marketing crap. If you need an example, here's how we did it for the Intel marketing pile of bullsh*t: arch/x86/include/asm/intel-family.h Please provide a similar way of referring to the GPU chips. > /* Protect the PCI config register pairs used for SMN and DF indirect access. */ > static DEFINE_MUTEX(smn_mutex); > @@ -94,6 +96,21 @@ static const struct pci_device_id hygon_nb_link_ids[] = { > {} > }; > > +static const struct pci_device_id amd_noncpu_root_ids[] = { Why is that "noncpu" thing everywhere? Is this thing going to be anything else besides a GPU? If not, you can simply call it amd_gpu_root_ids to mean *exactly* what they are. PCI IDs on the GPU. > + { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT) }, > + {} > +}; > + > +static const struct pci_device_id amd_noncpu_nb_misc_ids[] = { > + { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F3) }, > + {} > +}; > + > +static const struct pci_device_id amd_noncpu_nb_link_ids[] = { > + { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4) }, > + {} > +}; > + > const struct amd_nb_bus_dev_range amd_nb_bus_dev_ranges[] __initconst = { > { 0x00, 0x18, 0x20 }, > { 0xff, 0x00, 0x20 }, > @@ -230,11 +247,16 @@ int amd_cache_northbridges(void) > const struct pci_device_id *misc_ids = amd_nb_misc_ids; > const struct pci_device_id *link_ids = amd_nb_link_ids; > const struct pci_device_id *root_ids = amd_root_ids; > + > + const struct pci_device_id *noncpu_misc_ids = amd_noncpu_nb_misc_ids; > + const struct pci_device_id *noncpu_link_ids = amd_noncpu_nb_link_ids; > + const struct pci_device_id *noncpu_root_ids = amd_noncpu_root_ids; > + > struct pci_dev *root, *misc, *link; > struct amd_northbridge *nb; > u16 roots_per_misc = 0; > - u16 misc_count = 0; > - u16 root_count = 0; > + u16 misc_count = 0, misc_count_noncpu = 0; > + u16 root_count = 0, root_count_noncpu = 0; > u16 i, j; > > if (amd_northbridges.num) > @@ -253,10 +275,16 @@ int amd_cache_northbridges(void) > if (!misc_count) > return -ENODEV; > > + while ((misc = next_northbridge(misc, noncpu_misc_ids)) != NULL) > + misc_count_noncpu++; > + > root = NULL; > while ((root = next_northbridge(root, root_ids)) != NULL) > root_count++; > > + while ((root = next_northbridge(root, noncpu_root_ids)) != NULL) > + root_count_noncpu++; > + > if (root_count) { > roots_per_misc = root_count / misc_count; > > @@ -270,15 +298,28 @@ int amd_cache_northbridges(void) > } > } > > - nb = kcalloc(misc_count, sizeof(struct amd_northbridge), GFP_KERNEL); > + if (misc_count_noncpu) { > + /* > + * The first non-CPU Node ID starts at 8 even if there are fewer > + * than 8 CPU nodes. To maintain the AMD Node ID to Linux amd_nb > + * indexing scheme, allocate the number of GPU nodes plus 8. > + * Some allocated amd_northbridge structures will go unused when > + * the number of CPU nodes is less than 8, but this tradeoff is to > + * keep things relatively simple. Why simple? What's wrong with having [node IDs][GPU node IDs] i.e., the usual nodes come first and the GPU ones after it. You enumerate everything properly here so you can control what goes where. 
Which means, you don't need this NONCPU_NODE_INDEX non-sense at all. Hmmm? > + */ > + amd_northbridges.num = NONCPU_NODE_INDEX + misc_count_noncpu; > + } else { > + amd_northbridges.num = misc_count; > + } > + > + nb = kcalloc(amd_northbridges.num, sizeof(struct amd_northbridge), GFP_KERNEL); > if (!nb) > return -ENOMEM; > > amd_northbridges.nb = nb; > - amd_northbridges.num = misc_count; > > link = misc = root = NULL; > - for (i = 0; i < amd_northbridges.num; i++) { > + for (i = 0; i < misc_count; i++) { > node_to_amd_nb(i)->root = root = > next_northbridge(root, root_ids); > node_to_amd_nb(i)->misc = misc = > @@ -299,6 +340,18 @@ int amd_cache_northbridges(void) > root = next_northbridge(root, root_ids); > } > > + if (misc_count_noncpu) { > + link = misc = root = NULL; > + for (i = NONCPU_NODE_INDEX; i < NONCPU_NODE_INDEX + misc_count_noncpu; i++) { So this is not keeping things relatively simple - this is making you jump to the GPU nodes to prepare them too which is making them special. > + node_to_amd_nb(i)->root = root = > + next_northbridge(root, noncpu_root_ids); > + node_to_amd_nb(i)->misc = misc = > + next_northbridge(misc, noncpu_misc_ids); > + node_to_amd_nb(i)->link = link = > + next_northbridge(link, noncpu_link_ids); And seeing how you put those pointers in ->root, ->misc and ->link, you can just as well drop those noncpu_*_ids and put the aldebaran PCI IDs simply in amd_root_ids, amd_nb_misc_ids and amd_nb_link_ids respectively. Because to this code, the RAS functionality is no different than any other CPU because, well, the interface is those PCI devices. So the thing doesn't care if it is GPU or not. So you don't need any of that separation between GPU and CPU nodes when it comes to the RAS code. Makes sense?
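For reference, a minimal sketch of the alternative suggested above: drop the separate noncpu_* tables and fold the Aldebaran data fabric PCI IDs into the existing CPU ID tables, so amd_cache_northbridges() keeps a single enumeration pass. This is only an illustration of the suggestion, not code from the series, and it assumes the GPU root/misc/link functions can be walked by the same next_northbridge() calls as the CPU ones.

	static const struct pci_device_id amd_nb_misc_ids[] = {
		/* ... existing CPU DF function 3 entries ... */
		{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M50H_DF_F3) },
		{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F3) },
		{}
	};

	static const struct pci_device_id amd_nb_link_ids[] = {
		/* ... existing CPU DF function 4 entries ... */
		{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M50H_DF_F4) },
		{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4) },
		{}
	};

	/* amd_root_ids would grow a PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT entry the same way. */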
On Wed, Aug 25, 2021 at 12:42:43PM +0200, Borislav Petkov wrote: > On Tue, Aug 24, 2021 at 12:24:35AM +0530, Naveen Krishna Chatradhi wrote: ... > > > > The GPU nodes are enumerated in sequential order based on the > > PCI hierarchy, and the first GPU node is assumed to have an "AMD Node > > ID" value of 8 (the second GPU node has 9, etc.). > > What does that mean? The GPU nodes are simply numerically after the CPU > nodes or how am I to understand this nomenclature? > Yes, the GPU nodes will be numerically after the CPU nodes. However, there will be a gap in the "Node ID" values. For example, if there is one CPU node and two GPU nodes, then the "Node ID" values will look like this: CPU Node0 -> System Node ID 0 GPU Node0 -> System Node ID 8 GPU Node1 -> System Node ID 9 ... > > + * of 8 (the second GPU node has 9, etc.). > > + */ > > +#define NONCPU_NODE_INDEX 8 > > Why is this assumed? Can it instead be read from the hardware somewhere? > Or there simply won't be more than 8 CPU nodes anyway? Not at least in > the near future? > Yes, the intention is to leave a big enough gap for at least the forseeable future. > I'd prefer stuff to be read out directly from the hardware so that when > the hardware changes, the code just works instead of doing assumptions > which get invalidated later. > So after going through the latest documentation and asking the one of our hardware folks, it looks like we have an option to read this value from one of the Data Fabric registers. Hopefully, whatever solution we settle on will stick for a while. The Data Fabric registers are not architectural, and registers and fields have changed between model groups. ... > > +static const struct pci_device_id amd_noncpu_root_ids[] = { > > Why is that "noncpu" thing everywhere? Is this thing going to be > anything else besides a GPU? > > If not, you can simply call it > > amd_gpu_root_ids > > to mean *exactly* what they are. PCI IDs on the GPU. > These devices aren't officially GPUs, since they don't have graphics/video capabilities. Can we come up with a new term for this class of devices? Maybe accelerators or something? In any case, GPU is still used throughout documentation and code, so it's fair to just stick with "gpu". ... > > > > - nb = kcalloc(misc_count, sizeof(struct amd_northbridge), GFP_KERNEL); > > + if (misc_count_noncpu) { > > + /* > > + * The first non-CPU Node ID starts at 8 even if there are fewer > > + * than 8 CPU nodes. To maintain the AMD Node ID to Linux amd_nb > > + * indexing scheme, allocate the number of GPU nodes plus 8. > > + * Some allocated amd_northbridge structures will go unused when > > + * the number of CPU nodes is less than 8, but this tradeoff is to > > + * keep things relatively simple. > > Why simple? > > What's wrong with having > > [node IDs][GPU node IDs] > > i.e., the usual nodes come first and the GPU ones after it. > > You enumerate everything properly here so you can control what goes > where. Which means, you don't need this NONCPU_NODE_INDEX non-sense at > all. > > Hmmm? > We use the Node ID to index into the amd_northbridge.nb array, e.g. in node_to_amd_nb(). We can get the Node ID of a GPU node when processing an MCA error as in Patch 2 of this set. The hardware is going to give us a value of 8 or more. So, for example, if we set up the "nb" array like this for 1 CPU and 2 GPUs: [ID:Type] : [0: CPU], [8: GPU], [9: GPU] Then I think we'll need some more processing at runtime to map, for example, an error from GPU Node 9 to NB array Index 2, etc. 
Or we can manage this at init time like this:
[0: CPU], [1: NULL], [2: NULL], [3: NULL], [4: NULL], [5: NULL], [6: NULL], [7: NULL], [8: GPU], [9: GPU]

And at runtime, the code which does Node ID to NB entry just works. This applies to node_to_amd_nb(), places where we loop over amd_nb_num(), etc.

What do you think?

Thanks,
Yazen
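To make the tradeoff above concrete, here is a minimal sketch of the lookup path under each layout. The IPID decode is the one from patch 2 of this series; cpu_node_count is an illustrative name for the number of CPU nodes and is not from the posted patches.

	/* Hardware reports the AMD Node ID directly (patch 2 of this set). */
	node_id = (m->ipid >> 44) & 0xF;	/* e.g. 8 or 9 for the two GPU DFs */

	/*
	 * Sparse layout (1 CPU + 2 GPUs -> 10 entries, indices 1..7 unused):
	 * the hardware Node ID is the array index, no translation needed.
	 */
	nb = node_to_amd_nb(node_id);

	/* Packed layout (3 entries) needs a remap at every consumer, e.g.: */
	if (node_id >= NONCPU_NODE_INDEX)
		node_id = cpu_node_count + (node_id - NONCPU_NODE_INDEX);
	nb = node_to_amd_nb(node_id);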
On Wed, Sep 01, 2021 at 06:17:21PM +0000, Yazen Ghannam wrote: > These devices aren't officially GPUs, since they don't have graphics/video > capabilities. Can we come up with a new term for this class of devices? Maybe > accelerators or something? > > In any case, GPU is still used throughout documentation and code, so it's fair > to just stick with "gpu". Hmm, yeah, everybody is talking about special-purpose processing units now, i.e., accelerators or whatever they call them. I guess this is the new fancy thing since sliced bread. Well, what are those PCI IDs going to represent? Devices which have RAS capabilities on them? We have this nomenclature called "uncore" in the perf subsystem for counters which are not part of the CPU core or whatever. But there we use that term on AMD already so that might cause confusion. But I guess the type of those devices doesn't matter for amd_nb.c, right? All that thing cares for is having an array of northbridges, each with the respective PCI devices and that's it. So for amd_nb.c I think that differentiation doesn't matter... but keep reading... > We use the Node ID to index into the amd_northbridge.nb array, e.g. in > node_to_amd_nb(). > > We can get the Node ID of a GPU node when processing an MCA error as in Patch > 2 of this set. The hardware is going to give us a value of 8 or more. > > So, for example, if we set up the "nb" array like this for 1 CPU and 2 GPUs: > [ID:Type] : [0: CPU], [8: GPU], [9: GPU] > > Then I think we'll need some more processing at runtime to map, for example, > an error from GPU Node 9 to NB array Index 2, etc. > > Or we can manage this at init time like this: > [0: CPU], [1: NULL], [2: NULL], [3: NULL], [4: NULL], [5: NULL], [6: NULL], > [7, NULL], [8: GPU], [9: GPU] > > And at runtime, the code which does Node ID to NB entry just works. This > applies to node_to_amd_nb(), places where we loop over amd_nb_num(), etc. > > What do you think? Ok, looking at patch 2, it does: node_id = ((m->ipid >> 44) & 0xF); So how ugly would it become if you do here: node_id = ((m->ipid >> 44) & 0xF); node_id -= accel_id_offset; where that accel_id_offset is the thing you've read out from one of the Data Fabric registers before? This way, the gap between CPU IDs and accel IDs is gone and in the software view, there is none. Or are we reading other hardware registers which are aware of that gap and we would have to remove it again to get the proper index? And if so, and if it becomes real ugly, maybe we will have to bite the bullet and do the gap in the array but that would be yucky... Hmmm.
On Thu, Sep 02, 2021 at 07:30:24PM +0200, Borislav Petkov wrote: > On Wed, Sep 01, 2021 at 06:17:21PM +0000, Yazen Ghannam wrote: > > These devices aren't officially GPUs, since they don't have graphics/video > > capabilities. Can we come up with a new term for this class of devices? Maybe > > accelerators or something? > > > > In any case, GPU is still used throughout documentation and code, so it's fair > > to just stick with "gpu". > > Hmm, yeah, everybody is talking about special-purpose processing units > now, i.e., accelerators or whatever they call them. I guess this is the > new fancy thing since sliced bread. > > Well, what are those PCI IDs going to represent? Devices which have RAS > capabilities on them? > > We have this nomenclature called "uncore" in the perf subsystem for > counters which are not part of the CPU core or whatever. But there we > use that term on AMD already so that might cause confusion. > > But I guess the type of those devices doesn't matter for amd_nb.c, > right? > > All that thing cares for is having an array of northbridges, each with > the respective PCI devices and that's it. So for amd_nb.c I think that > differentiation doesn't matter... but keep reading... > > > We use the Node ID to index into the amd_northbridge.nb array, e.g. in > > node_to_amd_nb(). > > > > We can get the Node ID of a GPU node when processing an MCA error as in Patch > > 2 of this set. The hardware is going to give us a value of 8 or more. > > > > So, for example, if we set up the "nb" array like this for 1 CPU and 2 GPUs: > > [ID:Type] : [0: CPU], [8: GPU], [9: GPU] > > > > Then I think we'll need some more processing at runtime to map, for example, > > an error from GPU Node 9 to NB array Index 2, etc. > > > > Or we can manage this at init time like this: > > [0: CPU], [1: NULL], [2: NULL], [3: NULL], [4: NULL], [5: NULL], [6: NULL], > > [7, NULL], [8: GPU], [9: GPU] > > > > And at runtime, the code which does Node ID to NB entry just works. This > > applies to node_to_amd_nb(), places where we loop over amd_nb_num(), etc. > > > > What do you think? > > Ok, looking at patch 2, it does: > > node_id = ((m->ipid >> 44) & 0xF); > > So how ugly would it become if you do here: > > node_id = ((m->ipid >> 44) & 0xF); > node_id -= accel_id_offset; > > where that accel_id_offset is the thing you've read out from one of the > Data Fabric registers before? > > This way, the gap between CPU IDs and accel IDs is gone and in the > software view, there is none. > > Or are we reading other hardware registers which are aware of that gap > and we would have to remove it again to get the proper index? And if so, > and if it becomes real ugly, maybe we will have to bite the bullet and > do the gap in the array but that would be yucky... > > Hmmm. > I really like this idea. I've gone over the current and future code a few times to make sure things are okay. As far as I can tell, this idea should work most of the time, since the "node_id" value is mostly used to look up the right devices in the nb array. But there is one case so far where the "real" hardware node_id is needed during address translation. This case is in the new code in review for Data Fabric v3.5, and it only applies to the GPU devices. What do you think about having a couple of helper functions to go between the hardware and Linux index IDs? Most cases will use "hardware -> Linux index", and when needed there can be a "Linux index -> hardware". 
I think we still need some piece of info to indicate a device is a GPU based on its node_id. The AMD NB code doesn't need to know, but the address translation code does. The AMD NB enumeration can be mostly generic. I think it may be enough to save an "id offset" value and also a "first special index" value. Then we can go back and forth between the appropriate values without having to allocate a bunch of unused memory or hardcoding certain values.

Thanks for the idea!

-Yazen
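A rough sketch of what those two helpers could look like, under the assumptions discussed in this thread: the first GPU hardware Node ID and its Linux index are discovered once at init (e.g. from a Data Fabric register), and the names below are illustrative rather than taken from the posted patches.

	/* Discovered once at init, e.g. from a Data Fabric register read. */
	static u16 gpu_node_id_base;	/* first hardware Node ID used by a GPU, e.g. 8 */
	static u16 gpu_index_base;	/* Linux nb[] index of the first GPU entry */

	/* Hardware "AMD Node ID" -> index into amd_northbridges.nb[]. */
	static u16 amd_node_id_to_index(u16 node_id)
	{
		if (node_id < gpu_node_id_base)
			return node_id;

		return gpu_index_base + (node_id - gpu_node_id_base);
	}

	/* Linux nb[] index -> hardware "AMD Node ID" (needed by address translation). */
	static u16 amd_index_to_node_id(u16 index)
	{
		if (index < gpu_index_base)
			return index;

		return gpu_node_id_base + (index - gpu_index_base);
	}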
On Mon, Sep 13, 2021 at 06:07:30PM +0000, Yazen Ghannam wrote: > I really like this idea. I've gone over the current and future code a few > times to make sure things are okay. As far as I can tell, this idea should > work most of the time, since the "node_id" value is mostly used to look up the > right devices in the nb array. But there is one case so far where the "real" > hardware node_id is needed during address translation. Yap, I figured as much as this is kinda like the only place where you'd care about the actual node id. > This case is in the new code in review for Data Fabric v3.5, and it > only applies to the GPU devices. > > What do you think about having a couple of helper functions to go between the > hardware and Linux index IDs? Most cases will use "hardware -> Linux index", > and when needed there can be a "Linux index -> hardware". That's fine as long as it is properly documented what it does. > I think we still need some piece of info to indicate a device is a GPU based > on its node_id. The AMD NB code doesn't need to know, but the address > translation code does. The AMD NB enumeration can be mostly generic. I think > it may be enough to save an "id offset" value and also a "first special index" > value. Then we can go back and forth between the appropriate values without > having to allocate a bunch of unused memory or hardcoding certain values. Well, since we're going to need this in the translation logic and that is part of amd64_edac and there we said we'll move the family type up into amd64_pvt so that you can have a family descriptor per node, then I guess you're all set. :-) > Thanks for the idea! Sure, np. Thx.
Hi Boris, Apologies for the late reply. We were modifying the code based on the conversation between yourself and Yazen. I would like to answer some of your questions before submitting the next version of the patch-set. On 8/25/2021 4:12 PM, Borislav Petkov wrote: > [CAUTION: External Email] > > On Tue, Aug 24, 2021 at 12:24:35AM +0530, Naveen Krishna Chatradhi wrote: >> From: Muralidhara M K <muralimk@amd.com> >> >> On newer systems the CPUs manage MCA errors reported from the GPUs. >> Enumerate the GPU nodes with the AMD NB framework to support EDAC. >> >> This patch adds necessary code to manage the Aldebaran nodes along with > Avoid having "This patch" or "This commit" in the commit message. It is > tautologically useless. > > Also, do > > $ git grep 'This patch' Documentation/process > > for more details. Sure thanks > > Also, what are the "Aldebaran nodes"? > > Something on a star which is 65 light years away? Aldebaran is an AMD GPU name, code submitted [PATCH 000/159] Aldebaran support (lists.freedesktop.org) <https://lists.freedesktop.org/archives/amd-gfx/2021-February/059694.html> is a part of the DRM framework > >> the CPU nodes. >> >> The GPU nodes are enumerated in sequential order based on the >> PCI hierarchy, and the first GPU node is assumed to have an "AMD Node >> ID" value of 8 (the second GPU node has 9, etc.). > What does that mean? The GPU nodes are simply numerically after the CPU > nodes or how am I to understand this nomenclature? > >> Each Aldebaran GPU >> package has 2 Data Fabrics, which are enumerated as 2 nodes. >> With this implementation detail, the Data Fabric on the GPU nodes can be >> accessed the same way as the Data Fabric on CPU nodes. >> >> Signed-off-by: Muralidhara M K <muralimk@amd.com> >> Signed-off-by: Naveen Krishna Chatradhi <nchatrad@amd.com> >> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> >> --- >> Changes since v2: Added Reviewed-by Yazen Ghannam >> >> arch/x86/include/asm/amd_nb.h | 10 ++++++ >> arch/x86/kernel/amd_nb.c | 63 ++++++++++++++++++++++++++++++++--- >> include/linux/pci_ids.h | 1 + >> 3 files changed, 69 insertions(+), 5 deletions(-) >> >> diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h >> index 455066a06f60..09905f6c7218 100644 >> --- a/arch/x86/include/asm/amd_nb.h >> +++ b/arch/x86/include/asm/amd_nb.h >> @@ -80,6 +80,16 @@ struct amd_northbridge_info { >> >> #ifdef CONFIG_AMD_NB >> >> +/* >> + * On newer heterogeneous systems the data fabrics of the CPUs and GPUs >> + * are connected directly via a custom links, like is done with > s/ a // > >> + * 2 socket CPU systems and also within a socket for Multi-chip Module >> + * (MCM) CPUs like Naples. >> + * The first GPU node(non cpu) is assumed to have an "AMD Node ID" value > In all your text: > > s/cpu/CPU/g > >> + * of 8 (the second GPU node has 9, etc.). >> + */ >> +#define NONCPU_NODE_INDEX 8 > Why is this assumed? Can it instead be read from the hardware somewhere? > Or there simply won't be more than 8 CPU nodes anyway? Not at least in > the near future? > > I'd prefer stuff to be read out directly from the hardware so that when > the hardware changes, the code just works instead of doing assumptions > which get invalidated later. 
> >> + >> u16 amd_nb_num(void); >> bool amd_nb_has_feature(unsigned int feature); >> struct amd_northbridge *node_to_amd_nb(int node); >> diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c >> index 23dda362dc0f..6ad5664a18aa 100644 >> --- a/arch/x86/kernel/amd_nb.c >> +++ b/arch/x86/kernel/amd_nb.c >> @@ -26,6 +26,8 @@ >> #define PCI_DEVICE_ID_AMD_17H_M70H_DF_F4 0x1444 >> #define PCI_DEVICE_ID_AMD_19H_DF_F4 0x1654 >> #define PCI_DEVICE_ID_AMD_19H_M50H_DF_F4 0x166e >> +#define PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT 0x14bb >> +#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4 0x14d4 > You see how those defines are aligned vertically, right? > > If your new defines don't fit, you realign them all vertically too - you > don't just slap them there. > > And if it wasn't clear above, that Aldebaran GPU chip name means > something only to AMD folks. If this is correct > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FVideo_Core_Next&data=04%7C01%7CNaveenKrishna.Chatradhi%40amd.com%7Ce18fc8abf0da4da621b908d967b4ff31%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637654849356966973%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=Ge%2Fh%2FA2YmlKA7IF8XjuwM%2F4eygYYfybMOJs4jLX3g3I%3D&reserved=0 > > then that Aldebaran thing is VCN 2.6 although this is only the video > encoding stuff and GPU I guess is more than that. > > IOW, what I'm trying to say is, just like we name the CPUs using their > families, you should name the GPUs nomenclature with GPU families (I > guess there's stuff like that) and not use the marketing crap. Aldebaran GPU might be a later variant of gfx9 and are connected to the CPU sockets via custom xGMI links. I could not find any family number associated with the GPUs. The DRM driver code uses it as follows and does not expose the value to other frameworks in Linux. +#define CHIP_ALDEBARAN 25 in https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm > If you need an example, here's how we did it for the Intel marketing > pile of bullsh*t: > > arch/x86/include/asm/intel-family.h > > Please provide a similar way of referring to the GPU chips. > >> /* Protect the PCI config register pairs used for SMN and DF indirect access. */ >> static DEFINE_MUTEX(smn_mutex); >> @@ -94,6 +96,21 @@ static const struct pci_device_id hygon_nb_link_ids[] = { >> {} >> }; >> >> +static const struct pci_device_id amd_noncpu_root_ids[] = { > Why is that "noncpu" thing everywhere? Is this thing going to be > anything else besides a GPU? > > If not, you can simply call it > > amd_gpu_root_ids > > to mean *exactly* what they are. PCI IDs on the GPU. 
> >> + { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT) }, >> + {} >> +}; >> + >> +static const struct pci_device_id amd_noncpu_nb_misc_ids[] = { >> + { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F3) }, >> + {} >> +}; >> + >> +static const struct pci_device_id amd_noncpu_nb_link_ids[] = { >> + { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4) }, >> + {} >> +}; >> + >> const struct amd_nb_bus_dev_range amd_nb_bus_dev_ranges[] __initconst = { >> { 0x00, 0x18, 0x20 }, >> { 0xff, 0x00, 0x20 }, >> @@ -230,11 +247,16 @@ int amd_cache_northbridges(void) >> const struct pci_device_id *misc_ids = amd_nb_misc_ids; >> const struct pci_device_id *link_ids = amd_nb_link_ids; >> const struct pci_device_id *root_ids = amd_root_ids; >> + >> + const struct pci_device_id *noncpu_misc_ids = amd_noncpu_nb_misc_ids; >> + const struct pci_device_id *noncpu_link_ids = amd_noncpu_nb_link_ids; >> + const struct pci_device_id *noncpu_root_ids = amd_noncpu_root_ids; >> + >> struct pci_dev *root, *misc, *link; >> struct amd_northbridge *nb; >> u16 roots_per_misc = 0; >> - u16 misc_count = 0; >> - u16 root_count = 0; >> + u16 misc_count = 0, misc_count_noncpu = 0; >> + u16 root_count = 0, root_count_noncpu = 0; >> u16 i, j; >> >> if (amd_northbridges.num) >> @@ -253,10 +275,16 @@ int amd_cache_northbridges(void) >> if (!misc_count) >> return -ENODEV; >> >> + while ((misc = next_northbridge(misc, noncpu_misc_ids)) != NULL) >> + misc_count_noncpu++; >> + >> root = NULL; >> while ((root = next_northbridge(root, root_ids)) != NULL) >> root_count++; >> >> + while ((root = next_northbridge(root, noncpu_root_ids)) != NULL) >> + root_count_noncpu++; >> + >> if (root_count) { >> roots_per_misc = root_count / misc_count; >> >> @@ -270,15 +298,28 @@ int amd_cache_northbridges(void) >> } >> } >> >> - nb = kcalloc(misc_count, sizeof(struct amd_northbridge), GFP_KERNEL); >> + if (misc_count_noncpu) { >> + /* >> + * The first non-CPU Node ID starts at 8 even if there are fewer >> + * than 8 CPU nodes. To maintain the AMD Node ID to Linux amd_nb >> + * indexing scheme, allocate the number of GPU nodes plus 8. >> + * Some allocated amd_northbridge structures will go unused when >> + * the number of CPU nodes is less than 8, but this tradeoff is to >> + * keep things relatively simple. > Why simple? > > What's wrong with having > > [node IDs][GPU node IDs] > > i.e., the usual nodes come first and the GPU ones after it. > > You enumerate everything properly here so you can control what goes > where. Which means, you don't need this NONCPU_NODE_INDEX non-sense at > all. > > Hmmm? 
> >> + */ >> + amd_northbridges.num = NONCPU_NODE_INDEX + misc_count_noncpu; >> + } else { >> + amd_northbridges.num = misc_count; >> + } >> + >> + nb = kcalloc(amd_northbridges.num, sizeof(struct amd_northbridge), GFP_KERNEL); >> if (!nb) >> return -ENOMEM; >> >> amd_northbridges.nb = nb; >> - amd_northbridges.num = misc_count; >> >> link = misc = root = NULL; >> - for (i = 0; i < amd_northbridges.num; i++) { >> + for (i = 0; i < misc_count; i++) { >> node_to_amd_nb(i)->root = root = >> next_northbridge(root, root_ids); >> node_to_amd_nb(i)->misc = misc = >> @@ -299,6 +340,18 @@ int amd_cache_northbridges(void) >> root = next_northbridge(root, root_ids); >> } >> >> + if (misc_count_noncpu) { >> + link = misc = root = NULL; >> + for (i = NONCPU_NODE_INDEX; i < NONCPU_NODE_INDEX + misc_count_noncpu; i++) { > So this is not keeping things relatively simple - this is making you > jump to the GPU nodes to prepare them too which is making them special. > >> + node_to_amd_nb(i)->root = root = >> + next_northbridge(root, noncpu_root_ids); >> + node_to_amd_nb(i)->misc = misc = >> + next_northbridge(misc, noncpu_misc_ids); >> + node_to_amd_nb(i)->link = link = >> + next_northbridge(link, noncpu_link_ids); > And seeing how you put those pointers in ->root, ->misc and ->link, > you can just as well drop those noncpu_*_ids and put the aldebaran > PCI IDs simply in amd_root_ids, amd_nb_misc_ids and amd_nb_link_ids > respectively. > > Because to this code, the RAS functionality is no different than any > other CPU because, well, the interface is those PCI devices. So the > thing doesn't care if it is GPU or not. > > So you don't need any of that separation between GPU and CPU nodes when > it comes to the RAS code. The roots_per_misc count is different for the CPU nodes and GPU nodes. We tried to address your comment without introducing pci_dev_id arrays for GPU roots, misc and links. But, introducing GPU ID arrays looks cleaner, let me submit the revised code and we can revisit this point. > > Makes sense? > > -- > Regards/Gruss, > Boris. Regards, Naveen > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpeople.kernel.org%2Ftglx%2Fnotes-about-netiquette&data=04%7C01%7CNaveenKrishna.Chatradhi%40amd.com%7Ce18fc8abf0da4da621b908d967b4ff31%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637654849356966973%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=Y5j%2BNctWbWjlpOkQ5DYvDtroWv8MplSBTPgopewm38E%3D&reserved=0
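On the roots_per_misc point above: the ratio of root devices to DF misc devices is computed per device class, which is why a single merged PCI ID table is awkward. A hedged sketch of the per-class computation; misc_count and root_count follow the patch, while the *_gpu names are illustrative only.

	if (misc_count)
		roots_per_misc = root_count / misc_count;

	/* GPU dies expose a different number of roots per DF misc device. */
	if (misc_count_gpu)
		roots_per_misc_gpu = root_count_gpu / misc_count_gpu;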
On Mon, Oct 11, 2021 at 07:56:34PM +0530, Chatradhi, Naveen Krishna wrote: > Aldebaran is an AMD GPU name, code submitted [PATCH 000/159] Aldebaran > support (lists.freedesktop.org) > <https://lists.freedesktop.org/archives/amd-gfx/2021-February/059694.html> > is a part of the DRM framework A short explanation in your patchset would be very helpful so that a reader can know what it is and search the net further, if more info is needed. > Aldebaran GPU might be a later variant of gfx9 and are connected to the CPU > sockets via custom xGMI links. > > I could not find any family number associated with the GPUs. The DRM driver > code uses it as follows and > > does not expose the value to other frameworks in Linux. > > +#define CHIP_ALDEBARAN 25 > > in https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm Aha, so Aldebaran is the chip name. And how are those PCI IDs named in the documentation? Aldebaran data fabric PCI functions or so? > The roots_per_misc count is different for the CPU nodes and GPU nodes. We > tried to address > > your comment without introducing pci_dev_id arrays for GPU roots, misc and > links. But, introducing > > GPU ID arrays looks cleaner, let me submit the revised code and we can > revisit this point. Ok, but as I said above, what those devices are, means nothing to the amd_nb code because that simply enumerates PCI IDs when those things were simply northbridges. If the GPU PCI IDs do not fit easily into the scheme then maybe the scheme has become inadeqate... we'll see...
diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h index 455066a06f60..09905f6c7218 100644 --- a/arch/x86/include/asm/amd_nb.h +++ b/arch/x86/include/asm/amd_nb.h @@ -80,6 +80,16 @@ struct amd_northbridge_info { #ifdef CONFIG_AMD_NB +/* + * On newer heterogeneous systems the data fabrics of the CPUs and GPUs + * are connected directly via a custom links, like is done with + * 2 socket CPU systems and also within a socket for Multi-chip Module + * (MCM) CPUs like Naples. + * The first GPU node(non cpu) is assumed to have an "AMD Node ID" value + * of 8 (the second GPU node has 9, etc.). + */ +#define NONCPU_NODE_INDEX 8 + u16 amd_nb_num(void); bool amd_nb_has_feature(unsigned int feature); struct amd_northbridge *node_to_amd_nb(int node); diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c index 23dda362dc0f..6ad5664a18aa 100644 --- a/arch/x86/kernel/amd_nb.c +++ b/arch/x86/kernel/amd_nb.c @@ -26,6 +26,8 @@ #define PCI_DEVICE_ID_AMD_17H_M70H_DF_F4 0x1444 #define PCI_DEVICE_ID_AMD_19H_DF_F4 0x1654 #define PCI_DEVICE_ID_AMD_19H_M50H_DF_F4 0x166e +#define PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT 0x14bb +#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4 0x14d4 /* Protect the PCI config register pairs used for SMN and DF indirect access. */ static DEFINE_MUTEX(smn_mutex); @@ -94,6 +96,21 @@ static const struct pci_device_id hygon_nb_link_ids[] = { {} }; +static const struct pci_device_id amd_noncpu_root_ids[] = { + { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT) }, + {} +}; + +static const struct pci_device_id amd_noncpu_nb_misc_ids[] = { + { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F3) }, + {} +}; + +static const struct pci_device_id amd_noncpu_nb_link_ids[] = { + { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4) }, + {} +}; + const struct amd_nb_bus_dev_range amd_nb_bus_dev_ranges[] __initconst = { { 0x00, 0x18, 0x20 }, { 0xff, 0x00, 0x20 }, @@ -230,11 +247,16 @@ int amd_cache_northbridges(void) const struct pci_device_id *misc_ids = amd_nb_misc_ids; const struct pci_device_id *link_ids = amd_nb_link_ids; const struct pci_device_id *root_ids = amd_root_ids; + + const struct pci_device_id *noncpu_misc_ids = amd_noncpu_nb_misc_ids; + const struct pci_device_id *noncpu_link_ids = amd_noncpu_nb_link_ids; + const struct pci_device_id *noncpu_root_ids = amd_noncpu_root_ids; + struct pci_dev *root, *misc, *link; struct amd_northbridge *nb; u16 roots_per_misc = 0; - u16 misc_count = 0; - u16 root_count = 0; + u16 misc_count = 0, misc_count_noncpu = 0; + u16 root_count = 0, root_count_noncpu = 0; u16 i, j; if (amd_northbridges.num) @@ -253,10 +275,16 @@ int amd_cache_northbridges(void) if (!misc_count) return -ENODEV; + while ((misc = next_northbridge(misc, noncpu_misc_ids)) != NULL) + misc_count_noncpu++; + root = NULL; while ((root = next_northbridge(root, root_ids)) != NULL) root_count++; + while ((root = next_northbridge(root, noncpu_root_ids)) != NULL) + root_count_noncpu++; + if (root_count) { roots_per_misc = root_count / misc_count; @@ -270,15 +298,28 @@ int amd_cache_northbridges(void) } } - nb = kcalloc(misc_count, sizeof(struct amd_northbridge), GFP_KERNEL); + if (misc_count_noncpu) { + /* + * The first non-CPU Node ID starts at 8 even if there are fewer + * than 8 CPU nodes. To maintain the AMD Node ID to Linux amd_nb + * indexing scheme, allocate the number of GPU nodes plus 8. 
+ * Some allocated amd_northbridge structures will go unused when + * the number of CPU nodes is less than 8, but this tradeoff is to + * keep things relatively simple. + */ + amd_northbridges.num = NONCPU_NODE_INDEX + misc_count_noncpu; + } else { + amd_northbridges.num = misc_count; + } + + nb = kcalloc(amd_northbridges.num, sizeof(struct amd_northbridge), GFP_KERNEL); if (!nb) return -ENOMEM; amd_northbridges.nb = nb; - amd_northbridges.num = misc_count; link = misc = root = NULL; - for (i = 0; i < amd_northbridges.num; i++) { + for (i = 0; i < misc_count; i++) { node_to_amd_nb(i)->root = root = next_northbridge(root, root_ids); node_to_amd_nb(i)->misc = misc = @@ -299,6 +340,18 @@ int amd_cache_northbridges(void) root = next_northbridge(root, root_ids); } + if (misc_count_noncpu) { + link = misc = root = NULL; + for (i = NONCPU_NODE_INDEX; i < NONCPU_NODE_INDEX + misc_count_noncpu; i++) { + node_to_amd_nb(i)->root = root = + next_northbridge(root, noncpu_root_ids); + node_to_amd_nb(i)->misc = misc = + next_northbridge(misc, noncpu_misc_ids); + node_to_amd_nb(i)->link = link = + next_northbridge(link, noncpu_link_ids); + } + } + if (amd_gart_present()) amd_northbridges.flags |= AMD_NB_GART; diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h index 4bac1831de80..d9aae90dfce9 100644 --- a/include/linux/pci_ids.h +++ b/include/linux/pci_ids.h @@ -554,6 +554,7 @@ #define PCI_DEVICE_ID_AMD_17H_M30H_DF_F3 0x1493 #define PCI_DEVICE_ID_AMD_17H_M60H_DF_F3 0x144b #define PCI_DEVICE_ID_AMD_17H_M70H_DF_F3 0x1443 +#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F3 0x14d3 #define PCI_DEVICE_ID_AMD_19H_DF_F3 0x1653 #define PCI_DEVICE_ID_AMD_19H_M50H_DF_F3 0x166d #define PCI_DEVICE_ID_AMD_CNB17H_F3 0x1703