
[v3,1/3] x86/amd_nb: Add support for northbridges on Aldebaran

Message ID 20210823185437.94417-2-nchatrad@amd.com (mailing list archive)
State New, archived
Series x86/edac/amd64: Add support for noncpu nodes

Commit Message

Naveen Krishna Chatradhi Aug. 23, 2021, 6:54 p.m. UTC
From: Muralidhara M K <muralimk@amd.com>

On newer systems the CPUs manage MCA errors reported from the GPUs.
Enumerate the GPU nodes with the AMD NB framework to support EDAC.

This patch adds necessary code to manage the Aldebaran nodes along with
the CPU nodes.

The GPU nodes are enumerated in sequential order based on the
PCI hierarchy, and the first GPU node is assumed to have an "AMD Node
ID" value of 8 (the second GPU node has 9, etc.). Each Aldebaran GPU
package has 2 Data Fabrics, which are enumerated as 2 nodes.
With this implementation detail, the Data Fabric on the GPU nodes can be
accessed the same way as the Data Fabric on CPU nodes.

Signed-off-by: Muralidhara M K <muralimk@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <nchatrad@amd.com>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
Changes since v2: Added Reviewed-by Yazen Ghannam

 arch/x86/include/asm/amd_nb.h | 10 ++++++
 arch/x86/kernel/amd_nb.c      | 63 ++++++++++++++++++++++++++++++++---
 include/linux/pci_ids.h       |  1 +
 3 files changed, 69 insertions(+), 5 deletions(-)

Comments

Borislav Petkov Aug. 25, 2021, 10:42 a.m. UTC | #1
On Tue, Aug 24, 2021 at 12:24:35AM +0530, Naveen Krishna Chatradhi wrote:
> From: Muralidhara M K <muralimk@amd.com>
> 
> On newer systems the CPUs manage MCA errors reported from the GPUs.
> Enumerate the GPU nodes with the AMD NB framework to support EDAC.
> 
> This patch adds necessary code to manage the Aldebaran nodes along with

Avoid having "This patch" or "This commit" in the commit message. It is
tautologically useless.

Also, do

$ git grep 'This patch' Documentation/process

for more details.

Also, what are the "Aldebaran nodes"?

Something on a star which is 65 light years away?

> the CPU nodes.
> 
> The GPU nodes are enumerated in sequential order based on the
> PCI hierarchy, and the first GPU node is assumed to have an "AMD Node
> ID" value of 8 (the second GPU node has 9, etc.).

What does that mean? The GPU nodes are simply numerically after the CPU
nodes or how am I to understand this nomenclature?

> Each Aldebaran GPU
> package has 2 Data Fabrics, which are enumerated as 2 nodes.
> With this implementation detail, the Data Fabric on the GPU nodes can be
> accessed the same way as the Data Fabric on CPU nodes.
> 
> Signed-off-by: Muralidhara M K <muralimk@amd.com>
> Signed-off-by: Naveen Krishna Chatradhi <nchatrad@amd.com>
> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
> ---
> Changes since v2: Added Reviewed-by Yazen Ghannam
> 
>  arch/x86/include/asm/amd_nb.h | 10 ++++++
>  arch/x86/kernel/amd_nb.c      | 63 ++++++++++++++++++++++++++++++++---
>  include/linux/pci_ids.h       |  1 +
>  3 files changed, 69 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h
> index 455066a06f60..09905f6c7218 100644
> --- a/arch/x86/include/asm/amd_nb.h
> +++ b/arch/x86/include/asm/amd_nb.h
> @@ -80,6 +80,16 @@ struct amd_northbridge_info {
>  
>  #ifdef CONFIG_AMD_NB
>  
> +/*
> + * On newer heterogeneous systems the data fabrics of the CPUs and GPUs
> + * are connected directly via a custom links, like is done with

s/ a //

> + * 2 socket CPU systems and also within a socket for Multi-chip Module
> + * (MCM) CPUs like Naples.
> + * The first GPU node(non cpu) is assumed to have an "AMD Node ID" value

In all your text:

s/cpu/CPU/g

> + * of 8 (the second GPU node has 9, etc.).
> + */
> +#define NONCPU_NODE_INDEX	8

Why is this assumed? Can it instead be read from the hardware somewhere?
Or there simply won't be more than 8 CPU nodes anyway? Not at least in
the near future?

I'd prefer stuff to be read out directly from the hardware so that when
the hardware changes, the code just works instead of doing assumptions
which get invalidated later.

> +
>  u16 amd_nb_num(void);
>  bool amd_nb_has_feature(unsigned int feature);
>  struct amd_northbridge *node_to_amd_nb(int node);
> diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
> index 23dda362dc0f..6ad5664a18aa 100644
> --- a/arch/x86/kernel/amd_nb.c
> +++ b/arch/x86/kernel/amd_nb.c
> @@ -26,6 +26,8 @@
>  #define PCI_DEVICE_ID_AMD_17H_M70H_DF_F4 0x1444
>  #define PCI_DEVICE_ID_AMD_19H_DF_F4	0x1654
>  #define PCI_DEVICE_ID_AMD_19H_M50H_DF_F4 0x166e
> +#define PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT	0x14bb
> +#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4	0x14d4

You see how those defines are aligned vertically, right?

If your new defines don't fit, you realign them all vertically too - you
don't just slap them there.

And if it wasn't clear above, that Aldebaran GPU chip name means
something only to AMD folks. If this is correct

https://en.wikipedia.org/wiki/Video_Core_Next

then that Aldebaran thing is VCN 2.6 although this is only the video
encoding stuff and GPU I guess is more than that.

IOW, what I'm trying to say is, just like we name the CPUs using their
families, you should name the GPUs nomenclature with GPU families (I
guess there's stuff like that) and not use the marketing crap.

If you need an example, here's how we did it for the Intel marketing
pile of bullsh*t:

arch/x86/include/asm/intel-family.h

Please provide a similar way of referring to the GPU chips.
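
For illustration, a minimal sketch of what such a header could look like --
the file name, macro names and values below are placeholders, not taken from
any AMD documentation:

	/* arch/x86/include/asm/amd-gpu-family.h (hypothetical) */
	#ifndef _ASM_X86_AMD_GPU_FAMILY_H
	#define _ASM_X86_AMD_GPU_FAMILY_H

	/*
	 * Refer to the GPU data fabric devices by family/model, in the
	 * spirit of arch/x86/include/asm/intel-family.h, instead of by
	 * marketing name.
	 */
	#define AMD_GPU_FAM_GFX9		0x90	/* placeholder value */
	#define AMD_GPU_FAM_GFX9_ALDEBARAN	0x91	/* placeholder value */

	#endif /* _ASM_X86_AMD_GPU_FAMILY_H */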

>  /* Protect the PCI config register pairs used for SMN and DF indirect access. */
>  static DEFINE_MUTEX(smn_mutex);
> @@ -94,6 +96,21 @@ static const struct pci_device_id hygon_nb_link_ids[] = {
>  	{}
>  };
>  
> +static const struct pci_device_id amd_noncpu_root_ids[] = {

Why is that "noncpu" thing everywhere? Is this thing going to be
anything else besides a GPU?

If not, you can simply call it

	amd_gpu_root_ids

to mean *exactly* what they are. PCI IDs on the GPU.

> +	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT) },
> +	{}
> +};
> +
> +static const struct pci_device_id amd_noncpu_nb_misc_ids[] = {
> +	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F3) },
> +	{}
> +};
> +
> +static const struct pci_device_id amd_noncpu_nb_link_ids[] = {
> +	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4) },
> +	{}
> +};
> +
>  const struct amd_nb_bus_dev_range amd_nb_bus_dev_ranges[] __initconst = {
>  	{ 0x00, 0x18, 0x20 },
>  	{ 0xff, 0x00, 0x20 },
> @@ -230,11 +247,16 @@ int amd_cache_northbridges(void)
>  	const struct pci_device_id *misc_ids = amd_nb_misc_ids;
>  	const struct pci_device_id *link_ids = amd_nb_link_ids;
>  	const struct pci_device_id *root_ids = amd_root_ids;
> +
> +	const struct pci_device_id *noncpu_misc_ids = amd_noncpu_nb_misc_ids;
> +	const struct pci_device_id *noncpu_link_ids = amd_noncpu_nb_link_ids;
> +	const struct pci_device_id *noncpu_root_ids = amd_noncpu_root_ids;
> +
>  	struct pci_dev *root, *misc, *link;
>  	struct amd_northbridge *nb;
>  	u16 roots_per_misc = 0;
> -	u16 misc_count = 0;
> -	u16 root_count = 0;
> +	u16 misc_count = 0, misc_count_noncpu = 0;
> +	u16 root_count = 0, root_count_noncpu = 0;
>  	u16 i, j;
>  
>  	if (amd_northbridges.num)
> @@ -253,10 +275,16 @@ int amd_cache_northbridges(void)
>  	if (!misc_count)
>  		return -ENODEV;
>  
> +	while ((misc = next_northbridge(misc, noncpu_misc_ids)) != NULL)
> +		misc_count_noncpu++;
> +
>  	root = NULL;
>  	while ((root = next_northbridge(root, root_ids)) != NULL)
>  		root_count++;
>  
> +	while ((root = next_northbridge(root, noncpu_root_ids)) != NULL)
> +		root_count_noncpu++;
> +
>  	if (root_count) {
>  		roots_per_misc = root_count / misc_count;
>  
> @@ -270,15 +298,28 @@ int amd_cache_northbridges(void)
>  		}
>  	}
>  
> -	nb = kcalloc(misc_count, sizeof(struct amd_northbridge), GFP_KERNEL);
> +	if (misc_count_noncpu) {
> +		/*
> +		 * The first non-CPU Node ID starts at 8 even if there are fewer
> +		 * than 8 CPU nodes. To maintain the AMD Node ID to Linux amd_nb
> +		 * indexing scheme, allocate the number of GPU nodes plus 8.
> +		 * Some allocated amd_northbridge structures will go unused when
> +		 * the number of CPU nodes is less than 8, but this tradeoff is to
> +		 * keep things relatively simple.

Why simple?

What's wrong with having

[node IDs][GPU node IDs]

i.e., the usual nodes come first and the GPU ones after it.

You enumerate everything properly here so you can control what goes
where. Which means, you don't need this NONCPU_NODE_INDEX non-sense at
all.

Hmmm?

> +		 */
> +		amd_northbridges.num = NONCPU_NODE_INDEX + misc_count_noncpu;
> +	} else {
> +		amd_northbridges.num = misc_count;
> +	}
> +
> +	nb = kcalloc(amd_northbridges.num, sizeof(struct amd_northbridge), GFP_KERNEL);
>  	if (!nb)
>  		return -ENOMEM;
>  
>  	amd_northbridges.nb = nb;
> -	amd_northbridges.num = misc_count;
>  
>  	link = misc = root = NULL;
> -	for (i = 0; i < amd_northbridges.num; i++) {
> +	for (i = 0; i < misc_count; i++) {
>  		node_to_amd_nb(i)->root = root =
>  			next_northbridge(root, root_ids);
>  		node_to_amd_nb(i)->misc = misc =
> @@ -299,6 +340,18 @@ int amd_cache_northbridges(void)
>  			root = next_northbridge(root, root_ids);
>  	}
>  
> +	if (misc_count_noncpu) {
> +		link = misc = root = NULL;
> +		for (i = NONCPU_NODE_INDEX; i < NONCPU_NODE_INDEX + misc_count_noncpu; i++) {

So this is not keeping things relatively simple - this is making you
jump to the GPU nodes to prepare them too which is making them special.

> +			node_to_amd_nb(i)->root = root =
> +				next_northbridge(root, noncpu_root_ids);
> +			node_to_amd_nb(i)->misc = misc =
> +				next_northbridge(misc, noncpu_misc_ids);
> +			node_to_amd_nb(i)->link = link =
> +				next_northbridge(link, noncpu_link_ids);

And seeing how you put those pointers in ->root, ->misc and ->link,
you can just as well drop those noncpu_*_ids and put the aldebaran
PCI IDs simply in amd_root_ids, amd_nb_misc_ids and amd_nb_link_ids
respectively.

Because to this code, the RAS functionality is no different than any
other CPU because, well, the interface is those PCI devices. So the
thing doesn't care if it is GPU or not.

So you don't need any of that separation between GPU and CPU nodes when
it comes to the RAS code.

Makes sense?
Yazen Ghannam Sept. 1, 2021, 6:17 p.m. UTC | #2
On Wed, Aug 25, 2021 at 12:42:43PM +0200, Borislav Petkov wrote:
> On Tue, Aug 24, 2021 at 12:24:35AM +0530, Naveen Krishna Chatradhi wrote:
...
> > 
> > The GPU nodes are enumerated in sequential order based on the
> > PCI hierarchy, and the first GPU node is assumed to have an "AMD Node
> > ID" value of 8 (the second GPU node has 9, etc.).
> 
> What does that mean? The GPU nodes are simply numerically after the CPU
> nodes or how am I to understand this nomenclature?
>

Yes, the GPU nodes will be numerically after the CPU nodes. However, there
will be a gap in the "Node ID" values. For example, if there is one CPU node
and two GPU nodes, then the "Node ID" values will look like this:

CPU Node0 -> System Node ID 0
GPU Node0 -> System Node ID 8
GPU Node1 -> System Node ID 9
 
...
> > + * of 8 (the second GPU node has 9, etc.).
> > + */
> > +#define NONCPU_NODE_INDEX	8
> 
> Why is this assumed? Can it instead be read from the hardware somewhere?
> Or there simply won't be more than 8 CPU nodes anyway? Not at least in
> the near future?
>

Yes, the intention is to leave a big enough gap for at least the foreseeable
future.
 
> I'd prefer stuff to be read out directly from the hardware so that when
> the hardware changes, the code just works instead of doing assumptions
> which get invalidated later.
>

So after going through the latest documentation and asking one of our
hardware folks, it looks like we have an option to read this value from one of
the Data Fabric registers. Hopefully, whatever solution we settle on will
stick for a while. The Data Fabric registers are not architectural, and
registers and fields have changed between model groups.
 
...
> > +static const struct pci_device_id amd_noncpu_root_ids[] = {
> 
> Why is that "noncpu" thing everywhere? Is this thing going to be
> anything else besides a GPU?
> 
> If not, you can simply call it
> 
> 	amd_gpu_root_ids
> 
> to mean *exactly* what they are. PCI IDs on the GPU.
>

These devices aren't officially GPUs, since they don't have graphics/video
capabilities. Can we come up with a new term for this class of devices? Maybe
accelerators or something?

In any case, GPU is still used throughout documentation and code, so it's fair
to just stick with "gpu".
 
...
> >  
> > -	nb = kcalloc(misc_count, sizeof(struct amd_northbridge), GFP_KERNEL);
> > +	if (misc_count_noncpu) {
> > +		/*
> > +		 * The first non-CPU Node ID starts at 8 even if there are fewer
> > +		 * than 8 CPU nodes. To maintain the AMD Node ID to Linux amd_nb
> > +		 * indexing scheme, allocate the number of GPU nodes plus 8.
> > +		 * Some allocated amd_northbridge structures will go unused when
> > +		 * the number of CPU nodes is less than 8, but this tradeoff is to
> > +		 * keep things relatively simple.
> 
> Why simple?
> 
> What's wrong with having
> 
> [node IDs][GPU node IDs]
> 
> i.e., the usual nodes come first and the GPU ones after it.
> 
> You enumerate everything properly here so you can control what goes
> where. Which means, you don't need this NONCPU_NODE_INDEX non-sense at
> all.
> 
> Hmmm?
>

We use the Node ID to index into the amd_northbridge.nb array, e.g. in
node_to_amd_nb().

We can get the Node ID of a GPU node when processing an MCA error as in Patch
2 of this set. The hardware is going to give us a value of 8 or more.

So, for example, if we set up the "nb" array like this for 1 CPU and 2 GPUs:
[ID:Type] : [0: CPU], [8: GPU], [9: GPU]
 
Then I think we'll need some more processing at runtime to map, for example,
an error from GPU Node 9 to NB array Index 2, etc.

Or we can manage this at init time like this:
[0: CPU], [1: NULL], [2: NULL], [3: NULL], [4: NULL], [5: NULL], [6: NULL],
[7, NULL], [8: GPU], [9: GPU]

And at runtime, the code which does Node ID to NB entry just works. This
applies to node_to_amd_nb(), places where we loop over amd_nb_num(), etc.
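
To make the second option concrete, a rough sketch of the gap-based layout
(the counts are illustrative only, not taken from the patch):

	/*
	 * 1 CPU node + 2 GPU nodes, NONCPU_NODE_INDEX == 8:
	 *
	 *   amd_northbridges.num = NONCPU_NODE_INDEX + 2 = 10
	 *   nb[0]          -> CPU Node 0
	 *   nb[1] .. nb[7] -> allocated but unused
	 *   nb[8], nb[9]   -> GPU Node 8, GPU Node 9
	 *
	 * The hardware "AMD Node ID" then indexes the array directly:
	 */
	struct amd_northbridge *nb = node_to_amd_nb(node_id);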

What do you think?

Thanks,
Yazen
Borislav Petkov Sept. 2, 2021, 5:30 p.m. UTC | #3
On Wed, Sep 01, 2021 at 06:17:21PM +0000, Yazen Ghannam wrote:
> These devices aren't officially GPUs, since they don't have graphics/video
> capabilities. Can we come up with a new term for this class of devices? Maybe
> accelerators or something?
> 
> In any case, GPU is still used throughout documentation and code, so it's fair
> to just stick with "gpu".

Hmm, yeah, everybody is talking about special-purpose processing units
now, i.e., accelerators or whatever they call them. I guess this is the
new fancy thing since sliced bread.

Well, what are those PCI IDs going to represent? Devices which have RAS
capabilities on them?

We have this nomenclature called "uncore" in the perf subsystem for
counters which are not part of the CPU core or whatever. But there we
use that term on AMD already so that might cause confusion.

But I guess the type of those devices doesn't matter for amd_nb.c,
right?

All that thing cares for is having an array of northbridges, each with
the respective PCI devices and that's it. So for amd_nb.c I think that
differentiation doesn't matter... but keep reading...

> We use the Node ID to index into the amd_northbridge.nb array, e.g. in
> node_to_amd_nb().
> 
> We can get the Node ID of a GPU node when processing an MCA error as in Patch
> 2 of this set. The hardware is going to give us a value of 8 or more.
> 
> So, for example, if we set up the "nb" array like this for 1 CPU and 2 GPUs:
> [ID:Type] : [0: CPU], [8: GPU], [9: GPU]
>  
> Then I think we'll need some more processing at runtime to map, for example,
> an error from GPU Node 9 to NB array Index 2, etc.
> 
> Or we can manage this at init time like this:
> [0: CPU], [1: NULL], [2: NULL], [3: NULL], [4: NULL], [5: NULL], [6: NULL],
> [7, NULL], [8: GPU], [9: GPU]
> 
> And at runtime, the code which does Node ID to NB entry just works. This
> applies to node_to_amd_nb(), places where we loop over amd_nb_num(), etc.
> 
> What do you think?

Ok, looking at patch 2, it does:

	node_id = ((m->ipid >> 44) & 0xF);

So how ugly would it become if you do here:

	node_id = ((m->ipid >> 44) & 0xF);
	node_id -= accel_id_offset;

where that accel_id_offset is the thing you've read out from one of the
Data Fabric registers before?

This way, the gap between CPU IDs and accel IDs is gone and in the
software view, there is none.

Or are we reading other hardware registers which are aware of that gap
and we would have to remove it again to get the proper index? And if so,
and if it becomes real ugly, maybe we will have to bite the bullet and
do the gap in the array but that would be yucky...

Hmmm.
Yazen Ghannam Sept. 13, 2021, 6:07 p.m. UTC | #4
On Thu, Sep 02, 2021 at 07:30:24PM +0200, Borislav Petkov wrote:
> On Wed, Sep 01, 2021 at 06:17:21PM +0000, Yazen Ghannam wrote:
> > These devices aren't officially GPUs, since they don't have graphics/video
> > capabilities. Can we come up with a new term for this class of devices? Maybe
> > accelerators or something?
> > 
> > In any case, GPU is still used throughout documentation and code, so it's fair
> > to just stick with "gpu".
> 
> Hmm, yeah, everybody is talking about special-purpose processing units
> now, i.e., accelerators or whatever they call them. I guess this is the
> new fancy thing since sliced bread.
> 
> Well, what are those PCI IDs going to represent? Devices which have RAS
> capabilities on them?
> 
> We have this nomenclature called "uncore" in the perf subsystem for
> counters which are not part of the CPU core or whatever. But there we
> use that term on AMD already so that might cause confusion.
> 
> But I guess the type of those devices doesn't matter for amd_nb.c,
> right?
> 
> All that thing cares for is having an array of northbridges, each with
> the respective PCI devices and that's it. So for amd_nb.c I think that
> differentiation doesn't matter... but keep reading...
> 
> > We use the Node ID to index into the amd_northbridge.nb array, e.g. in
> > node_to_amd_nb().
> > 
> > We can get the Node ID of a GPU node when processing an MCA error as in Patch
> > 2 of this set. The hardware is going to give us a value of 8 or more.
> > 
> > So, for example, if we set up the "nb" array like this for 1 CPU and 2 GPUs:
> > [ID:Type] : [0: CPU], [8: GPU], [9: GPU]
> >  
> > Then I think we'll need some more processing at runtime to map, for example,
> > an error from GPU Node 9 to NB array Index 2, etc.
> > 
> > Or we can manage this at init time like this:
> > [0: CPU], [1: NULL], [2: NULL], [3: NULL], [4: NULL], [5: NULL], [6: NULL],
> > [7, NULL], [8: GPU], [9: GPU]
> > 
> > And at runtime, the code which does Node ID to NB entry just works. This
> > applies to node_to_amd_nb(), places where we loop over amd_nb_num(), etc.
> > 
> > What do you think?
> 
> Ok, looking at patch 2, it does:
> 
> 	node_id = ((m->ipid >> 44) & 0xF);
> 
> So how ugly would it become if you do here:
> 
> 	node_id = ((m->ipid >> 44) & 0xF);
> 	node_id -= accel_id_offset;
> 
> where that accel_id_offset is the thing you've read out from one of the
> Data Fabric registers before?
> 
> This way, the gap between CPU IDs and accel IDs is gone and in the
> software view, there is none.
> 
> Or are we reading other hardware registers which are aware of that gap
> and we would have to remove it again to get the proper index? And if so,
> and if it becomes real ugly, maybe we will have to bite the bullet and
> do the gap in the array but that would be yucky...
> 
> Hmmm.
>

I really like this idea. I've gone over the current and future code a few
times to make sure things are okay. As far as I can tell, this idea should
work most of the time, since the "node_id" value is mostly used to look up the
right devices in the nb array. But there is one case so far where the "real"
hardware node_id is needed during address translation. This case is in the new
code in review for Data Fabric v3.5, and it only applies to the GPU devices.

What do you think about having a couple of helper functions to go between the
hardware and Linux index IDs? Most cases will use "hardware -> Linux index",
and when needed there can be a "Linux index -> hardware".

I think we still need some piece of info to indicate a device is a GPU based
on its node_id. The AMD NB code doesn't need to know, but the address
translation code does. The AMD NB enumeration can be mostly generic. I think
it may be enough to save an "id offset" value and also a "first special index"
value. Then we can go back and forth between the appropriate values without
having to allocate a bunch of unused memory or hardcoding certain values.
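
A minimal sketch of such helpers, assuming the "id offset" and "first special
index" values have already been read out at init time (all names below are
placeholders, not from the patch):

	static u16 gpu_node_id_offset;	/* first hardware "AMD Node ID" used by a GPU */
	static u16 gpu_first_nb_index;	/* index of the first GPU entry in the nb array */

	/* hardware "AMD Node ID" -> Linux index into amd_northbridges.nb */
	static u16 amd_node_id_to_index(u16 node_id)
	{
		/* CPU nodes map 1:1; GPU nodes are packed right after them. */
		if (node_id < gpu_node_id_offset)
			return node_id;

		return gpu_first_nb_index + node_id - gpu_node_id_offset;
	}

	/* Linux index into amd_northbridges.nb -> hardware "AMD Node ID" */
	static u16 amd_index_to_node_id(u16 index)
	{
		if (index < gpu_first_nb_index)
			return index;

		return gpu_node_id_offset + index - gpu_first_nb_index;
	}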

Thanks for the idea!

-Yazen
Borislav Petkov Sept. 16, 2021, 4:36 p.m. UTC | #5
On Mon, Sep 13, 2021 at 06:07:30PM +0000, Yazen Ghannam wrote:
> I really like this idea. I've gone over the current and future code a few
> times to make sure things are okay. As far as I can tell, this idea should
> work most of the time, since the "node_id" value is mostly used to look up the
> right devices in the nb array. But there is one case so far where the "real"
> hardware node_id is needed during address translation.

Yap, I figured as much as this is kinda like the only place where you'd
care about the actual node id.

> This case is in the new code in review for Data Fabric v3.5, and it
> only applies to the GPU devices.
>
> What do you think about having a couple of helper functions to go between the
> hardware and Linux index IDs? Most cases will use "hardware -> Linux index",
> and when needed there can be a "Linux index -> hardware".

That's fine as long as it is properly documented what it does.

> I think we still need some piece of info to indicate a device is a GPU based
> on its node_id. The AMD NB code doesn't need to know, but the address
> translation code does. The AMD NB enumeration can be mostly generic. I think
> it may be enough to save an "id offset" value and also a "first special index"
> value. Then we can go back and forth between the appropriate values without
> having to allocate a bunch of unused memory or hardcoding certain values.

Well, since we're going to need this in the translation logic and that
is part of amd64_edac and there we said we'll move the family type up
into amd64_pvt so that you can have a family descriptor per node, then I
guess you're all set. :-)

> Thanks for the idea!

Sure, np.

Thx.
Naveen Krishna Chatradhi Oct. 11, 2021, 2:26 p.m. UTC | #6
Hi Boris,

Apologies for the late reply. We were modifying the code based on the 
conversation between you and Yazen.

I would like to answer some of your questions before submitting the next 
version of the patch-set.

On 8/25/2021 4:12 PM, Borislav Petkov wrote:
> On Tue, Aug 24, 2021 at 12:24:35AM +0530, Naveen Krishna Chatradhi wrote:
>> From: Muralidhara M K <muralimk@amd.com>
>>
>> On newer systems the CPUs manage MCA errors reported from the GPUs.
>> Enumerate the GPU nodes with the AMD NB framework to support EDAC.
>>
>> This patch adds necessary code to manage the Aldebaran nodes along with
> Avoid having "This patch" or "This commit" in the commit message. It is
> tautologically useless.
>
> Also, do
>
> $ git grep 'This patch' Documentation/process
>
> for more details.
Sure, thanks.
>
> Also, what are the "Aldebaran nodes"?
>
> Something on a star which is 65 light years away?

Aldebaran is an AMD GPU name. The code submitted in "[PATCH 000/159]
Aldebaran support"
<https://lists.freedesktop.org/archives/amd-gfx/2021-February/059694.html>
is part of the DRM framework.

>
>> the CPU nodes.
>>
>> The GPU nodes are enumerated in sequential order based on the
>> PCI hierarchy, and the first GPU node is assumed to have an "AMD Node
>> ID" value of 8 (the second GPU node has 9, etc.).
> What does that mean? The GPU nodes are simply numerically after the CPU
> nodes or how am I to understand this nomenclature?
>
>> Each Aldebaran GPU
>> package has 2 Data Fabrics, which are enumerated as 2 nodes.
>> With this implementation detail, the Data Fabric on the GPU nodes can be
>> accessed the same way as the Data Fabric on CPU nodes.
>>
>> Signed-off-by: Muralidhara M K <muralimk@amd.com>
>> Signed-off-by: Naveen Krishna Chatradhi <nchatrad@amd.com>
>> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
>> ---
>> Changes since v2: Added Reviewed-by Yazen Ghannam
>>
>>   arch/x86/include/asm/amd_nb.h | 10 ++++++
>>   arch/x86/kernel/amd_nb.c      | 63 ++++++++++++++++++++++++++++++++---
>>   include/linux/pci_ids.h       |  1 +
>>   3 files changed, 69 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h
>> index 455066a06f60..09905f6c7218 100644
>> --- a/arch/x86/include/asm/amd_nb.h
>> +++ b/arch/x86/include/asm/amd_nb.h
>> @@ -80,6 +80,16 @@ struct amd_northbridge_info {
>>
>>   #ifdef CONFIG_AMD_NB
>>
>> +/*
>> + * On newer heterogeneous systems the data fabrics of the CPUs and GPUs
>> + * are connected directly via a custom links, like is done with
> s/ a //
>
>> + * 2 socket CPU systems and also within a socket for Multi-chip Module
>> + * (MCM) CPUs like Naples.
>> + * The first GPU node(non cpu) is assumed to have an "AMD Node ID" value
> In all your text:
>
> s/cpu/CPU/g
>
>> + * of 8 (the second GPU node has 9, etc.).
>> + */
>> +#define NONCPU_NODE_INDEX    8
> Why is this assumed? Can it instead be read from the hardware somewhere?
> Or there simply won't be more than 8 CPU nodes anyway? Not at least in
> the near future?
>
> I'd prefer stuff to be read out directly from the hardware so that when
> the hardware changes, the code just works instead of doing assumptions
> which get invalidated later.
>
>> +
>>   u16 amd_nb_num(void);
>>   bool amd_nb_has_feature(unsigned int feature);
>>   struct amd_northbridge *node_to_amd_nb(int node);
>> diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
>> index 23dda362dc0f..6ad5664a18aa 100644
>> --- a/arch/x86/kernel/amd_nb.c
>> +++ b/arch/x86/kernel/amd_nb.c
>> @@ -26,6 +26,8 @@
>>   #define PCI_DEVICE_ID_AMD_17H_M70H_DF_F4 0x1444
>>   #define PCI_DEVICE_ID_AMD_19H_DF_F4  0x1654
>>   #define PCI_DEVICE_ID_AMD_19H_M50H_DF_F4 0x166e
>> +#define PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT     0x14bb
>> +#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4    0x14d4
> You see how those defines are aligned vertically, right?
>
> If your new defines don't fit, you realign them all vertically too - you
> don't just slap them there.
>
> And if it wasn't clear above, that Aldebaran GPU chip name means
> something only to AMD folks. If this is correct
>
> https://en.wikipedia.org/wiki/Video_Core_Next
>
> then that Aldebaran thing is VCN 2.6 although this is only the video
> encoding stuff and GPU I guess is more than that.
>
> IOW, what I'm trying to say is, just like we name the CPUs using their
> families, you should name the GPUs nomenclature with GPU families (I
> guess there's stuff like that) and not use the marketing crap.

The Aldebaran GPU appears to be a later variant of gfx9 and is connected to
the CPU sockets via custom xGMI links.

I could not find any family number associated with the GPUs. The DRM driver
code uses the following define and does not expose the value to other
frameworks in Linux:

+#define CHIP_ALDEBARAN 25

in 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm

> If you need an example, here's how we did it for the Intel marketing
> pile of bullsh*t:
>
> arch/x86/include/asm/intel-family.h
>
> Please provide a similar way of referring to the GPU chips.
>
>>   /* Protect the PCI config register pairs used for SMN and DF indirect access. */
>>   static DEFINE_MUTEX(smn_mutex);
>> @@ -94,6 +96,21 @@ static const struct pci_device_id hygon_nb_link_ids[] = {
>>        {}
>>   };
>>
>> +static const struct pci_device_id amd_noncpu_root_ids[] = {
> Why is that "noncpu" thing everywhere? Is this thing going to be
> anything else besides a GPU?
>
> If not, you can simply call it
>
>          amd_gpu_root_ids
>
> to mean *exactly* what they are. PCI IDs on the GPU.
>
>> +     { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT) },
>> +     {}
>> +};
>> +
>> +static const struct pci_device_id amd_noncpu_nb_misc_ids[] = {
>> +     { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F3) },
>> +     {}
>> +};
>> +
>> +static const struct pci_device_id amd_noncpu_nb_link_ids[] = {
>> +     { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4) },
>> +     {}
>> +};
>> +
>>   const struct amd_nb_bus_dev_range amd_nb_bus_dev_ranges[] __initconst = {
>>        { 0x00, 0x18, 0x20 },
>>        { 0xff, 0x00, 0x20 },
>> @@ -230,11 +247,16 @@ int amd_cache_northbridges(void)
>>        const struct pci_device_id *misc_ids = amd_nb_misc_ids;
>>        const struct pci_device_id *link_ids = amd_nb_link_ids;
>>        const struct pci_device_id *root_ids = amd_root_ids;
>> +
>> +     const struct pci_device_id *noncpu_misc_ids = amd_noncpu_nb_misc_ids;
>> +     const struct pci_device_id *noncpu_link_ids = amd_noncpu_nb_link_ids;
>> +     const struct pci_device_id *noncpu_root_ids = amd_noncpu_root_ids;
>> +
>>        struct pci_dev *root, *misc, *link;
>>        struct amd_northbridge *nb;
>>        u16 roots_per_misc = 0;
>> -     u16 misc_count = 0;
>> -     u16 root_count = 0;
>> +     u16 misc_count = 0, misc_count_noncpu = 0;
>> +     u16 root_count = 0, root_count_noncpu = 0;
>>        u16 i, j;
>>
>>        if (amd_northbridges.num)
>> @@ -253,10 +275,16 @@ int amd_cache_northbridges(void)
>>        if (!misc_count)
>>                return -ENODEV;
>>
>> +     while ((misc = next_northbridge(misc, noncpu_misc_ids)) != NULL)
>> +             misc_count_noncpu++;
>> +
>>        root = NULL;
>>        while ((root = next_northbridge(root, root_ids)) != NULL)
>>                root_count++;
>>
>> +     while ((root = next_northbridge(root, noncpu_root_ids)) != NULL)
>> +             root_count_noncpu++;
>> +
>>        if (root_count) {
>>                roots_per_misc = root_count / misc_count;
>>
>> @@ -270,15 +298,28 @@ int amd_cache_northbridges(void)
>>                }
>>        }
>>
>> -     nb = kcalloc(misc_count, sizeof(struct amd_northbridge), GFP_KERNEL);
>> +     if (misc_count_noncpu) {
>> +             /*
>> +              * The first non-CPU Node ID starts at 8 even if there are fewer
>> +              * than 8 CPU nodes. To maintain the AMD Node ID to Linux amd_nb
>> +              * indexing scheme, allocate the number of GPU nodes plus 8.
>> +              * Some allocated amd_northbridge structures will go unused when
>> +              * the number of CPU nodes is less than 8, but this tradeoff is to
>> +              * keep things relatively simple.
> Why simple?
>
> What's wrong with having
>
> [node IDs][GPU node IDs]
>
> i.e., the usual nodes come first and the GPU ones after it.
>
> You enumerate everything properly here so you can control what goes
> where. Which means, you don't need this NONCPU_NODE_INDEX non-sense at
> all.
>
> Hmmm?
>
>> +              */
>> +             amd_northbridges.num = NONCPU_NODE_INDEX + misc_count_noncpu;
>> +     } else {
>> +             amd_northbridges.num = misc_count;
>> +     }
>> +
>> +     nb = kcalloc(amd_northbridges.num, sizeof(struct amd_northbridge), GFP_KERNEL);
>>        if (!nb)
>>                return -ENOMEM;
>>
>>        amd_northbridges.nb = nb;
>> -     amd_northbridges.num = misc_count;
>>
>>        link = misc = root = NULL;
>> -     for (i = 0; i < amd_northbridges.num; i++) {
>> +     for (i = 0; i < misc_count; i++) {
>>                node_to_amd_nb(i)->root = root =
>>                        next_northbridge(root, root_ids);
>>                node_to_amd_nb(i)->misc = misc =
>> @@ -299,6 +340,18 @@ int amd_cache_northbridges(void)
>>                        root = next_northbridge(root, root_ids);
>>        }
>>
>> +     if (misc_count_noncpu) {
>> +             link = misc = root = NULL;
>> +             for (i = NONCPU_NODE_INDEX; i < NONCPU_NODE_INDEX + misc_count_noncpu; i++) {
> So this is not keeping things relatively simple - this is making you
> jump to the GPU nodes to prepare them too which is making them special.
>
>> +                     node_to_amd_nb(i)->root = root =
>> +                             next_northbridge(root, noncpu_root_ids);
>> +                     node_to_amd_nb(i)->misc = misc =
>> +                             next_northbridge(misc, noncpu_misc_ids);
>> +                     node_to_amd_nb(i)->link = link =
>> +                             next_northbridge(link, noncpu_link_ids);
> And seeing how you put those pointers in ->root, ->misc and ->link,
> you can just as well drop those noncpu_*_ids and put the aldebaran
> PCI IDs simply in amd_root_ids, amd_nb_misc_ids and amd_nb_link_ids
> respectively.
>
> Because to this code, the RAS functionality is no different than any
> other CPU because, well, the interface is those PCI devices. So the
> thing doesn't care if it is GPU or not.
>
> So you don't need any of that separation between GPU and CPU nodes when
> it comes to the RAS code.

The roots_per_misc count is different for the CPU nodes and GPU nodes. We
tried to address your comment without introducing pci_dev_id arrays for the
GPU roots, misc and link devices, but introducing separate GPU ID arrays
looks cleaner. Let me submit the revised code and we can revisit this point.

>
> Makes sense?
>
> --
> Regards/Gruss,
>      Boris.

Regards,

Naveen

>
> https://people.kernel.org/tglx/notes-about-netiquette
Borislav Petkov Oct. 11, 2021, 6:08 p.m. UTC | #7
On Mon, Oct 11, 2021 at 07:56:34PM +0530, Chatradhi, Naveen Krishna wrote:
> Aldebaran is an AMD GPU name. The code submitted in "[PATCH 000/159]
> Aldebaran support"
> <https://lists.freedesktop.org/archives/amd-gfx/2021-February/059694.html>
> is part of the DRM framework.

A short explanation in your patchset would be very helpful so that a
reader can know what it is and search the net further, if more info is
needed.

> The Aldebaran GPU appears to be a later variant of gfx9 and is connected to
> the CPU sockets via custom xGMI links.
> 
> I could not find any family number associated with the GPUs. The DRM driver
> code uses the following define and does not expose the value to other
> frameworks in Linux:
> 
> +#define CHIP_ALDEBARAN 25
> 
> in https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm

Aha, so Aldebaran is the chip name. And how are those PCI IDs named in
the documentation? Aldebaran data fabric PCI functions or so?

> The roots_per_misc count is different for the CPU nodes and GPU nodes. We
> tried to address your comment without introducing pci_dev_id arrays for the
> GPU roots, misc and link devices, but introducing separate GPU ID arrays
> looks cleaner. Let me submit the revised code and we can revisit this point.

Ok, but as I said above, what those devices are, means nothing to the
amd_nb code because that simply enumerates PCI IDs when those things
were simply northbridges.

If the GPU PCI IDs do not fit easily into the scheme then maybe the
scheme has become inadequate... we'll see...

Patch

diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h
index 455066a06f60..09905f6c7218 100644
--- a/arch/x86/include/asm/amd_nb.h
+++ b/arch/x86/include/asm/amd_nb.h
@@ -80,6 +80,16 @@  struct amd_northbridge_info {
 
 #ifdef CONFIG_AMD_NB
 
+/*
+ * On newer heterogeneous systems the data fabrics of the CPUs and GPUs
+ * are connected directly via a custom links, like is done with
+ * 2 socket CPU systems and also within a socket for Multi-chip Module
+ * (MCM) CPUs like Naples.
+ * The first GPU node(non cpu) is assumed to have an "AMD Node ID" value
+ * of 8 (the second GPU node has 9, etc.).
+ */
+#define NONCPU_NODE_INDEX	8
+
 u16 amd_nb_num(void);
 bool amd_nb_has_feature(unsigned int feature);
 struct amd_northbridge *node_to_amd_nb(int node);
diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
index 23dda362dc0f..6ad5664a18aa 100644
--- a/arch/x86/kernel/amd_nb.c
+++ b/arch/x86/kernel/amd_nb.c
@@ -26,6 +26,8 @@ 
 #define PCI_DEVICE_ID_AMD_17H_M70H_DF_F4 0x1444
 #define PCI_DEVICE_ID_AMD_19H_DF_F4	0x1654
 #define PCI_DEVICE_ID_AMD_19H_M50H_DF_F4 0x166e
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT	0x14bb
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4	0x14d4
 
 /* Protect the PCI config register pairs used for SMN and DF indirect access. */
 static DEFINE_MUTEX(smn_mutex);
@@ -94,6 +96,21 @@  static const struct pci_device_id hygon_nb_link_ids[] = {
 	{}
 };
 
+static const struct pci_device_id amd_noncpu_root_ids[] = {
+	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT) },
+	{}
+};
+
+static const struct pci_device_id amd_noncpu_nb_misc_ids[] = {
+	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F3) },
+	{}
+};
+
+static const struct pci_device_id amd_noncpu_nb_link_ids[] = {
+	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4) },
+	{}
+};
+
 const struct amd_nb_bus_dev_range amd_nb_bus_dev_ranges[] __initconst = {
 	{ 0x00, 0x18, 0x20 },
 	{ 0xff, 0x00, 0x20 },
@@ -230,11 +247,16 @@  int amd_cache_northbridges(void)
 	const struct pci_device_id *misc_ids = amd_nb_misc_ids;
 	const struct pci_device_id *link_ids = amd_nb_link_ids;
 	const struct pci_device_id *root_ids = amd_root_ids;
+
+	const struct pci_device_id *noncpu_misc_ids = amd_noncpu_nb_misc_ids;
+	const struct pci_device_id *noncpu_link_ids = amd_noncpu_nb_link_ids;
+	const struct pci_device_id *noncpu_root_ids = amd_noncpu_root_ids;
+
 	struct pci_dev *root, *misc, *link;
 	struct amd_northbridge *nb;
 	u16 roots_per_misc = 0;
-	u16 misc_count = 0;
-	u16 root_count = 0;
+	u16 misc_count = 0, misc_count_noncpu = 0;
+	u16 root_count = 0, root_count_noncpu = 0;
 	u16 i, j;
 
 	if (amd_northbridges.num)
@@ -253,10 +275,16 @@  int amd_cache_northbridges(void)
 	if (!misc_count)
 		return -ENODEV;
 
+	while ((misc = next_northbridge(misc, noncpu_misc_ids)) != NULL)
+		misc_count_noncpu++;
+
 	root = NULL;
 	while ((root = next_northbridge(root, root_ids)) != NULL)
 		root_count++;
 
+	while ((root = next_northbridge(root, noncpu_root_ids)) != NULL)
+		root_count_noncpu++;
+
 	if (root_count) {
 		roots_per_misc = root_count / misc_count;
 
@@ -270,15 +298,28 @@  int amd_cache_northbridges(void)
 		}
 	}
 
-	nb = kcalloc(misc_count, sizeof(struct amd_northbridge), GFP_KERNEL);
+	if (misc_count_noncpu) {
+		/*
+		 * The first non-CPU Node ID starts at 8 even if there are fewer
+		 * than 8 CPU nodes. To maintain the AMD Node ID to Linux amd_nb
+		 * indexing scheme, allocate the number of GPU nodes plus 8.
+		 * Some allocated amd_northbridge structures will go unused when
+		 * the number of CPU nodes is less than 8, but this tradeoff is to
+		 * keep things relatively simple.
+		 */
+		amd_northbridges.num = NONCPU_NODE_INDEX + misc_count_noncpu;
+	} else {
+		amd_northbridges.num = misc_count;
+	}
+
+	nb = kcalloc(amd_northbridges.num, sizeof(struct amd_northbridge), GFP_KERNEL);
 	if (!nb)
 		return -ENOMEM;
 
 	amd_northbridges.nb = nb;
-	amd_northbridges.num = misc_count;
 
 	link = misc = root = NULL;
-	for (i = 0; i < amd_northbridges.num; i++) {
+	for (i = 0; i < misc_count; i++) {
 		node_to_amd_nb(i)->root = root =
 			next_northbridge(root, root_ids);
 		node_to_amd_nb(i)->misc = misc =
@@ -299,6 +340,18 @@  int amd_cache_northbridges(void)
 			root = next_northbridge(root, root_ids);
 	}
 
+	if (misc_count_noncpu) {
+		link = misc = root = NULL;
+		for (i = NONCPU_NODE_INDEX; i < NONCPU_NODE_INDEX + misc_count_noncpu; i++) {
+			node_to_amd_nb(i)->root = root =
+				next_northbridge(root, noncpu_root_ids);
+			node_to_amd_nb(i)->misc = misc =
+				next_northbridge(misc, noncpu_misc_ids);
+			node_to_amd_nb(i)->link = link =
+				next_northbridge(link, noncpu_link_ids);
+		}
+	}
+
 	if (amd_gart_present())
 		amd_northbridges.flags |= AMD_NB_GART;
 
diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index 4bac1831de80..d9aae90dfce9 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -554,6 +554,7 @@ 
 #define PCI_DEVICE_ID_AMD_17H_M30H_DF_F3 0x1493
 #define PCI_DEVICE_ID_AMD_17H_M60H_DF_F3 0x144b
 #define PCI_DEVICE_ID_AMD_17H_M70H_DF_F3 0x1443
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F3	0x14d3
 #define PCI_DEVICE_ID_AMD_19H_DF_F3	0x1653
 #define PCI_DEVICE_ID_AMD_19H_M50H_DF_F3 0x166d
 #define PCI_DEVICE_ID_AMD_CNB17H_F3	0x1703