Message ID | 20181114224921.12123-2-keith.busch@intel.com (mailing list archive) |
---|---|
State | Changes Requested, archived |
Series | ACPI HMAT memory sysfs representation |
On Wed, Nov 14, 2018 at 03:49:14PM -0700, Keith Busch wrote:
> Memory-only nodes will often have affinity to a compute node, and platforms have ways to express that locality relationship.
>
> A node containing CPUs or other DMA devices that can initiate memory access is referred to as a "memory initiator". A "memory target" is a node that provides at least one physical address range accessible to a memory initiator.

I think I may be confused here. If there is _no_ link from node X to node Y, does that mean that node X's CPUs cannot access the memory on node Y? In my mind, all nodes can access all memory in the system, just not with uniform bandwidth/latency.
On Thu, Nov 15, 2018 at 05:57:10AM -0800, Matthew Wilcox wrote:
> On Wed, Nov 14, 2018 at 03:49:14PM -0700, Keith Busch wrote:
> > Memory-only nodes will often have affinity to a compute node, and platforms have ways to express that locality relationship.
> >
> > A node containing CPUs or other DMA devices that can initiate memory access is referred to as a "memory initiator". A "memory target" is a node that provides at least one physical address range accessible to a memory initiator.
>
> I think I may be confused here. If there is _no_ link from node X to node Y, does that mean that node X's CPUs cannot access the memory on node Y? In my mind, all nodes can access all memory in the system, just not with uniform bandwidth/latency.

The link is just about which nodes are "local". It's like how nodes have a cpulist. Other CPUs not in the node's list can access that node's memory, but the ones in the mask are local, and provide useful optimization hints.

Would a node mask be preferred to symlinks?
On Thu, Nov 15, 2018 at 7:02 AM Keith Busch <keith.busch@intel.com> wrote:
>
> On Thu, Nov 15, 2018 at 05:57:10AM -0800, Matthew Wilcox wrote:
> > On Wed, Nov 14, 2018 at 03:49:14PM -0700, Keith Busch wrote:
> > > Memory-only nodes will often have affinity to a compute node, and platforms have ways to express that locality relationship.
> > >
> > > A node containing CPUs or other DMA devices that can initiate memory access is referred to as a "memory initiator". A "memory target" is a node that provides at least one physical address range accessible to a memory initiator.
> >
> > I think I may be confused here. If there is _no_ link from node X to node Y, does that mean that node X's CPUs cannot access the memory on node Y? In my mind, all nodes can access all memory in the system, just not with uniform bandwidth/latency.
>
> The link is just about which nodes are "local". It's like how nodes have a cpulist. Other CPUs not in the node's list can access that node's memory, but the ones in the mask are local, and provide useful optimization hints.
>
> Would a node mask be preferred to symlinks?

I think that would be more flexible, because the set of initiators that may have "best" or "local" access to a target may be more than one.
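To make the nodemask alternative concrete, here is a minimal sketch of what a per-target mask could look like; the `local_initiators` array, attribute name, and show routine are hypothetical and not part of the posted series.

    /*
     * Hypothetical sketch only: expose a "local_initiators" nodemask per
     * memory target instead of (or in addition to) per-node symlinks.
     * The array, attribute name, and helper are illustrative.
     */
    #include <linux/device.h>
    #include <linux/kernel.h>
    #include <linux/nodemask.h>

    static nodemask_t local_initiators[MAX_NUMNODES];

    /* Record that initiator node 'p' is local to memory target node 'm'. */
    static void note_local_initiator(unsigned int m, unsigned int p)
    {
            node_set(p, local_initiators[m]);
    }

    /* Would print e.g. "0,2" for /sys/devices/system/node/nodeY/local_initiators */
    static ssize_t local_initiators_show(struct device *dev,
                                         struct device_attribute *attr, char *buf)
    {
            return scnprintf(buf, PAGE_SIZE, "%*pbl\n",
                             nodemask_pr_args(&local_initiators[dev->id]));
    }
    static DEVICE_ATTR_RO(local_initiators);

The trade-off being discussed is exactly this: a mask naturally holds more than one "best" initiator, while symlinks give a browsable one-link-per-pairing layout.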
On Thu, Nov 15, 2018 at 07:59:20AM -0700, Keith Busch wrote:
> On Thu, Nov 15, 2018 at 05:57:10AM -0800, Matthew Wilcox wrote:
> > On Wed, Nov 14, 2018 at 03:49:14PM -0700, Keith Busch wrote:
> > > Memory-only nodes will often have affinity to a compute node, and platforms have ways to express that locality relationship.
> > >
> > > A node containing CPUs or other DMA devices that can initiate memory access is referred to as a "memory initiator". A "memory target" is a node that provides at least one physical address range accessible to a memory initiator.
> >
> > I think I may be confused here. If there is _no_ link from node X to node Y, does that mean that node X's CPUs cannot access the memory on node Y? In my mind, all nodes can access all memory in the system, just not with uniform bandwidth/latency.
>
> The link is just about which nodes are "local". It's like how nodes have a cpulist. Other CPUs not in the node's list can access that node's memory, but the ones in the mask are local, and provide useful optimization hints.

So ... let's imagine a hypothetical system (I've never seen one built like this, but it doesn't seem too implausible). Connect four CPU sockets in a square, each of which has some regular DIMMs attached to it. CPU A is 0 hops to Memory A, one hop to Memory B and Memory C, and two hops from Memory D (each CPU only has two "QPI" links). Then maybe there's some special memory extender device attached on the PCIe bus. Now there's Memory B1 and B2 that's attached to CPU B and it's local to CPU B, but not as local as Memory B is ... and we'd probably _prefer_ to allocate memory for CPU A from Memory B1 than from Memory D. But ... *mumble*, this seems hard.

I understand you're trying to reflect what the HMAT table is telling you, I'm just really fuzzy on who's ultimately consuming this information and what decisions they're trying to drive from it.

> Would a node mask be preferred to symlinks?

I don't have a strong opinion here, but what Dan says makes sense.
On Thu, Nov 15, 2018 at 12:36:54PM -0800, Matthew Wilcox wrote:
> On Thu, Nov 15, 2018 at 07:59:20AM -0700, Keith Busch wrote:
> > On Thu, Nov 15, 2018 at 05:57:10AM -0800, Matthew Wilcox wrote:
> > > On Wed, Nov 14, 2018 at 03:49:14PM -0700, Keith Busch wrote:
> > > > Memory-only nodes will often have affinity to a compute node, and platforms have ways to express that locality relationship.
> > > >
> > > > A node containing CPUs or other DMA devices that can initiate memory access is referred to as a "memory initiator". A "memory target" is a node that provides at least one physical address range accessible to a memory initiator.
> > >
> > > I think I may be confused here. If there is _no_ link from node X to node Y, does that mean that node X's CPUs cannot access the memory on node Y? In my mind, all nodes can access all memory in the system, just not with uniform bandwidth/latency.
> >
> > The link is just about which nodes are "local". It's like how nodes have a cpulist. Other CPUs not in the node's list can access that node's memory, but the ones in the mask are local, and provide useful optimization hints.
>
> So ... let's imagine a hypothetical system (I've never seen one built like this, but it doesn't seem too implausible). Connect four CPU sockets in a square, each of which has some regular DIMMs attached to it. CPU A is 0 hops to Memory A, one hop to Memory B and Memory C, and two hops from Memory D (each CPU only has two "QPI" links). Then maybe there's some special memory extender device attached on the PCIe bus. Now there's Memory B1 and B2 that's attached to CPU B and it's local to CPU B, but not as local as Memory B is ... and we'd probably _prefer_ to allocate memory for CPU A from Memory B1 than from Memory D. But ... *mumble*, this seems hard.

Indeed, that particular example is out of scope for this series. The first objective is to aid a process running on node B's CPUs to allocate memory in B1. Anything that crosses QPI is on its own.

> I understand you're trying to reflect what the HMAT table is telling you, I'm just really fuzzy on who's ultimately consuming this information and what decisions they're trying to drive from it.

Intended consumers include processes using numa_alloc_onnode() and mbind(). Consider a system with faster DRAM and slower persistent memory. Such a system may group the DRAM in a different proximity domain than the persistent memory, and both are local to yet another proximity domain that contains the CPUs. HMAT provides a way to express that relationship, and this patch provides a user-facing abstraction for that information.
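A minimal userspace sketch of such a consumer, assuming the sysfs layout shown in the patch's example; the node number, the 1 MiB size, and the scan logic are purely illustrative.

    /*
     * Illustrative consumer (not part of the series): find a memory target
     * local to compute node 0 via the proposed target* links, then allocate
     * from it with libnuma. Compile with -lnuma. Error handling is minimal.
     */
    #include <dirent.h>
    #include <stdio.h>
    #include <numa.h>

    int main(void)
    {
            DIR *d;
            struct dirent *ent;
            int target = -1;

            if (numa_available() < 0)
                    return 1;

            d = opendir("/sys/devices/system/node/node0");
            if (!d)
                    return 1;
            while ((ent = readdir(d))) {
                    if (sscanf(ent->d_name, "target%d", &target) == 1)
                            break;          /* first local memory target */
            }
            closedir(d);

            if (target < 0) {
                    fprintf(stderr, "node0 has no local memory target\n");
                    return 1;
            }

            void *buf = numa_alloc_onnode(1 << 20, target); /* 1 MiB on that node */
            if (!buf)
                    return 1;
            /* ... use buf ... */
            numa_free(buf, 1 << 20);
            return 0;
    }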
On Thu, Nov 15, 2018 at 12:37 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Thu, Nov 15, 2018 at 07:59:20AM -0700, Keith Busch wrote:
> > On Thu, Nov 15, 2018 at 05:57:10AM -0800, Matthew Wilcox wrote:
> > > On Wed, Nov 14, 2018 at 03:49:14PM -0700, Keith Busch wrote:
> > > > Memory-only nodes will often have affinity to a compute node, and platforms have ways to express that locality relationship.
> > > >
> > > > A node containing CPUs or other DMA devices that can initiate memory access is referred to as a "memory initiator". A "memory target" is a node that provides at least one physical address range accessible to a memory initiator.
> > >
> > > I think I may be confused here. If there is _no_ link from node X to node Y, does that mean that node X's CPUs cannot access the memory on node Y? In my mind, all nodes can access all memory in the system, just not with uniform bandwidth/latency.
> >
> > The link is just about which nodes are "local". It's like how nodes have a cpulist. Other CPUs not in the node's list can access that node's memory, but the ones in the mask are local, and provide useful optimization hints.
>
> So ... let's imagine a hypothetical system (I've never seen one built like this, but it doesn't seem too implausible). Connect four CPU sockets in a square, each of which has some regular DIMMs attached to it. CPU A is 0 hops to Memory A, one hop to Memory B and Memory C, and two hops from Memory D (each CPU only has two "QPI" links). Then maybe there's some special memory extender device attached on the PCIe bus. Now there's Memory B1 and B2 that's attached to CPU B and it's local to CPU B, but not as local as Memory B is ... and we'd probably _prefer_ to allocate memory for CPU A from Memory B1 than from Memory D. But ... *mumble*, this seems hard.
>
> I understand you're trying to reflect what the HMAT table is telling you, I'm just really fuzzy on who's ultimately consuming this information and what decisions they're trying to drive from it.

The singular "local" is a limitation of the HMAT, but I would expect the Linux translation of "local" would allow for multiple initiators that can achieve some semblance of the "best" performance. Anything less than best is going to have a wide range of variance and will likely devolve to looking at the platform firmware data table directly.

The expected 80% case is software wants to be able to ask "which CPUs should I run on to get the best access to this memory?"
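A sketch of that 80% question against the proposed layout, again assuming the patch's sysfs hierarchy and a made-up memory node number: walk a memory node's initiator* links and report each initiator's cpulist.

    /*
     * Illustrative only: answer "which CPUs give the best access to memory
     * node 2?" by scanning the proposed initiator* links and reading each
     * initiator node's cpulist. Node 2 is just an example.
     */
    #include <dirent.h>
    #include <stdio.h>

    int main(void)
    {
            DIR *d = opendir("/sys/devices/system/node/node2");
            struct dirent *ent;
            int init_nid;
            char path[64], cpulist[256];

            if (!d)
                    return 1;
            while ((ent = readdir(d))) {
                    if (sscanf(ent->d_name, "initiator%d", &init_nid) != 1)
                            continue;
                    snprintf(path, sizeof(path),
                             "/sys/devices/system/node/node%d/cpulist", init_nid);
                    FILE *f = fopen(path, "r");
                    if (f && fgets(cpulist, sizeof(cpulist), f))
                            printf("best CPUs for node2 memory: %s", cpulist);
                    if (f)
                            fclose(f);
            }
            closedir(d);
            return 0;
    }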
On 11/15/2018 04:19 AM, Keith Busch wrote:
> Memory-only nodes will often have affinity to a compute node, and platforms have ways to express that locality relationship.

It may not have a local affinity to any compute node, but it might have a valid NUMA distance from all available compute nodes. This is particularly true for coherent device memory, which is accessible from all available compute nodes without having local affinity to any of them, other than the device's own compute elements, which may or may not be represented as a NUMA node themselves. But even for normal system memory, a memory-only node might be far from the CPU nodes and have no CPUs of its own; in that case there is no local affinity anyway.

>
> A node containing CPUs or other DMA devices that can initiate memory access is referred to as a "memory initiator". A "memory target" is a

Memory initiators should also include heterogeneous compute elements like GPU cores and FPGA elements, apart from CPUs and DMA engines.

> node that provides at least one physical address range accessible to a memory initiator.

This definition of "memory target" makes sense: coherent accesses within a PA range from all possible "memory initiators", which should also include heterogeneous compute elements as mentioned before.

>
> In preparation for these systems, provide a new kernel API to link the target memory node to its initiator compute node with symlinks to each other.

Makes sense, but how would we really define NUMA placement for the various heterogeneous compute elements that are connected to the system bus differently than CPUs and DMA engines?

>
> The following example shows the new sysfs hierarchy setup for memory node 'Y' local to compute node 'X':
>
> # ls -l /sys/devices/system/node/nodeX/target*
> /sys/devices/system/node/nodeX/targetY -> ../nodeY
>
> # ls -l /sys/devices/system/node/nodeY/initiator*
> /sys/devices/system/node/nodeY/initiatorX -> ../nodeX

This interlinking makes sense once we are able to define all possible memory initiators and memory targets as NUMA nodes (which might not be trivial) in a heterogeneous compute environment. But this linking at least establishes the coherency relationship between memory initiators and memory targets.

>
> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
>  drivers/base/node.c  | 32 ++++++++++++++++++++++++++++++++
>  include/linux/node.h |  2 ++
>  2 files changed, 34 insertions(+)
>
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index 86d6cd92ce3d..a9b7512a9502 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -372,6 +372,38 @@ int register_cpu_under_node(unsigned int cpu, unsigned int nid)
>  			kobject_name(&node_devices[nid]->dev.kobj));
>  }
>
> +int register_memory_node_under_compute_node(unsigned int m, unsigned int p)
> +{
> +	int ret;
> +	char initiator[20], target[17];

20, 17 seem arbitrary here.

> +
> +	if (!node_online(p) || !node_online(m))
> +		return -ENODEV;

Just wondering what a NUMA node for a group of GPU compute elements will look like, when those elements are not managed by the kernel but are still memory initiators with access to a number of memory targets.

> +	if (m == p)
> +		return 0;

Why skip? Should we not link a memory target to its own node, which can be its memory initiator as well? The caller of this linking function might decide whether the memory target is accessible from the same NUMA node as a memory initiator or not.
> +
> +	snprintf(initiator, sizeof(initiator), "initiator%d", p);
> +	snprintf(target, sizeof(target), "target%d", m);
> +
> +	ret = sysfs_create_link(&node_devices[p]->dev.kobj,
> +				&node_devices[m]->dev.kobj,
> +				target);
> +	if (ret)
> +		return ret;
> +
> +	ret = sysfs_create_link(&node_devices[m]->dev.kobj,
> +				&node_devices[p]->dev.kobj,
> +				initiator);
> +	if (ret)
> +		goto err;
> +
> +	return 0;
> + err:
> +	sysfs_remove_link(&node_devices[p]->dev.kobj,
> +			  kobject_name(&node_devices[m]->dev.kobj));
> +	return ret;
> +}
> +
>  int unregister_cpu_under_node(unsigned int cpu, unsigned int nid)
>  {
>  	struct device *obj;
> diff --git a/include/linux/node.h b/include/linux/node.h
> index 257bb3d6d014..1fd734a3fb3f 100644
> --- a/include/linux/node.h
> +++ b/include/linux/node.h
> @@ -75,6 +75,8 @@ extern int register_mem_sect_under_node(struct memory_block *mem_blk,
>  extern int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
>  					   unsigned long phys_index);
>
> +extern int register_memory_node_under_compute_node(unsigned int m, unsigned int p);
> +
>  #ifdef CONFIG_HUGETLBFS
>  extern void register_hugetlbfs_with_node(node_registration_func_t doregister,
>  					 node_registration_func_t unregister);
>

The code is all good, but as mentioned before the primary concern is whether these semantics will be able to correctly represent all possible present and future heterogeneous compute environments with multi-attribute memory. This is going to be a kernel API, so apart from covering the various NUMA representations, the interface has to be abstract, with generic elements and room for future extension.
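On the "20, 17 seems arbitrary" remark above: the sizes do fit the worst case ("initiator" is 9 characters plus up to 10 digits of an unsigned int plus a NUL gives 20, and "target" gives 6 + 10 + 1 = 17), but one way to avoid the magic numbers altogether would be to let the kernel size the strings, e.g. with kasprintf(). The following is only a sketch of that idea, not something proposed in the series; the helper name is invented.

    /*
     * Sketch only: build the link names without fixed-size buffers.
     * The caller would pass these to sysfs_create_link() and kfree()
     * them afterwards.
     */
    #include <linux/kernel.h>
    #include <linux/slab.h>

    static int create_node_link_names(unsigned int m, unsigned int p,
                                      char **initiator, char **target)
    {
            *initiator = kasprintf(GFP_KERNEL, "initiator%u", p);
            *target = kasprintf(GFP_KERNEL, "target%u", m);
            if (!*initiator || !*target) {
                    kfree(*initiator);      /* kfree(NULL) is safe */
                    kfree(*target);
                    return -ENOMEM;
            }
            return 0;
    }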
On 11/15/2018 08:29 PM, Keith Busch wrote:
> On Thu, Nov 15, 2018 at 05:57:10AM -0800, Matthew Wilcox wrote:
>> On Wed, Nov 14, 2018 at 03:49:14PM -0700, Keith Busch wrote:
>>> Memory-only nodes will often have affinity to a compute node, and platforms have ways to express that locality relationship.
>>>
>>> A node containing CPUs or other DMA devices that can initiate memory access is referred to as a "memory initiator". A "memory target" is a node that provides at least one physical address range accessible to a memory initiator.
>>
>> I think I may be confused here. If there is _no_ link from node X to node Y, does that mean that node X's CPUs cannot access the memory on node Y? In my mind, all nodes can access all memory in the system, just not with uniform bandwidth/latency.
>
> The link is just about which nodes are "local". It's like how nodes have a cpulist. Other CPUs not in the node's list can access that node's memory, but the ones in the mask are local, and provide useful optimization hints.
>
> Would a node mask be preferred to symlinks?

Having a hint for local affinity is definitely a plus, but this must also provide the coherency matrix to the user, preferably in the form of a nodemask for each memory target.
On 11/15/2018 11:20 PM, Dan Williams wrote:
> On Thu, Nov 15, 2018 at 7:02 AM Keith Busch <keith.busch@intel.com> wrote:
>>
>> On Thu, Nov 15, 2018 at 05:57:10AM -0800, Matthew Wilcox wrote:
>>> On Wed, Nov 14, 2018 at 03:49:14PM -0700, Keith Busch wrote:
>>>> Memory-only nodes will often have affinity to a compute node, and platforms have ways to express that locality relationship.
>>>>
>>>> A node containing CPUs or other DMA devices that can initiate memory access is referred to as a "memory initiator". A "memory target" is a node that provides at least one physical address range accessible to a memory initiator.
>>>
>>> I think I may be confused here. If there is _no_ link from node X to node Y, does that mean that node X's CPUs cannot access the memory on node Y? In my mind, all nodes can access all memory in the system, just not with uniform bandwidth/latency.
>>
>> The link is just about which nodes are "local". It's like how nodes have a cpulist. Other CPUs not in the node's list can access that node's memory, but the ones in the mask are local, and provide useful optimization hints.
>>
>> Would a node mask be preferred to symlinks?
>
> I think that would be more flexible, because the set of initiators that may have "best" or "local" access to a target may be more than one.

Right. The memory target should have two nodemasks (for now at least): one enumerating which initiator nodes can access the memory coherently, and another for those which are nearer and can benefit from local allocation.
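A minimal sketch of that two-mask idea; the struct and field names are hypothetical and not part of the patch.

    /*
     * Hypothetical sketch of the two-nodemask suggestion; names are
     * illustrative only.
     */
    #include <linux/nodemask.h>
    #include <linux/types.h>

    struct memory_target_access {
            nodemask_t accessible;  /* initiators that can access this target coherently */
            nodemask_t local;       /* subset near enough to benefit from local allocation */
    };

    static void memory_target_add_initiator(struct memory_target_access *t,
                                            unsigned int nid, bool is_local)
    {
            node_set(nid, t->accessible);
            if (is_local)
                    node_set(nid, t->local);
    }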
On 11/17/2018 12:02 AM, Keith Busch wrote:
> On Thu, Nov 15, 2018 at 12:36:54PM -0800, Matthew Wilcox wrote:
>> On Thu, Nov 15, 2018 at 07:59:20AM -0700, Keith Busch wrote:
>>> On Thu, Nov 15, 2018 at 05:57:10AM -0800, Matthew Wilcox wrote:
>>>> On Wed, Nov 14, 2018 at 03:49:14PM -0700, Keith Busch wrote:
>>>>> Memory-only nodes will often have affinity to a compute node, and platforms have ways to express that locality relationship.
>>>>>
>>>>> A node containing CPUs or other DMA devices that can initiate memory access is referred to as a "memory initiator". A "memory target" is a node that provides at least one physical address range accessible to a memory initiator.
>>>>
>>>> I think I may be confused here. If there is _no_ link from node X to node Y, does that mean that node X's CPUs cannot access the memory on node Y? In my mind, all nodes can access all memory in the system, just not with uniform bandwidth/latency.
>>>
>>> The link is just about which nodes are "local". It's like how nodes have a cpulist. Other CPUs not in the node's list can access that node's memory, but the ones in the mask are local, and provide useful optimization hints.
>>
>> So ... let's imagine a hypothetical system (I've never seen one built like this, but it doesn't seem too implausible). Connect four CPU sockets in a square, each of which has some regular DIMMs attached to it. CPU A is 0 hops to Memory A, one hop to Memory B and Memory C, and two hops from Memory D (each CPU only has two "QPI" links). Then maybe there's some special memory extender device attached on the PCIe bus. Now there's Memory B1 and B2 that's attached to CPU B and it's local to CPU B, but not as local as Memory B is ... and we'd probably _prefer_ to allocate memory for CPU A from Memory B1 than from Memory D. But ... *mumble*, this seems hard.
>
> Indeed, that particular example is out of scope for this series. The first objective is to aid a process running on node B's CPUs to allocate memory in B1. Anything that crosses QPI is on its own.

This is problematic. Any new kernel API should accommodate B2-type memory from the above example as well, which is on a PCIe bus, because eventually it would be represented as some sort of NUMA node and applications will then have to depend on this sysfs interface for their desired memory placement requirements. Unless this interface is thought through for B2-type memory, it might not be extensible in the future.
On Mon, Nov 19, 2018 at 08:45:25AM +0530, Anshuman Khandual wrote:
> On 11/17/2018 12:02 AM, Keith Busch wrote:
> > On Thu, Nov 15, 2018 at 12:36:54PM -0800, Matthew Wilcox wrote:
> >> So ... let's imagine a hypothetical system (I've never seen one built like this, but it doesn't seem too implausible). Connect four CPU sockets in a square, each of which has some regular DIMMs attached to it. CPU A is 0 hops to Memory A, one hop to Memory B and Memory C, and two hops from Memory D (each CPU only has two "QPI" links). Then maybe there's some special memory extender device attached on the PCIe bus. Now there's Memory B1 and B2 that's attached to CPU B and it's local to CPU B, but not as local as Memory B is ... and we'd probably _prefer_ to allocate memory for CPU A from Memory B1 than from Memory D. But ... *mumble*, this seems hard.
> >
> > Indeed, that particular example is out of scope for this series. The first objective is to aid a process running on node B's CPUs to allocate memory in B1. Anything that crosses QPI is on its own.
>
> This is problematic. Any new kernel API should accommodate B2-type memory from the above example as well, which is on a PCIe bus, because eventually it would be represented as some sort of NUMA node and applications will then have to depend on this sysfs interface for their desired memory placement requirements. Unless this interface is thought through for B2-type memory, it might not be extensible in the future.

I'm not sure I understand the concern. The proposal allows linking B to B2 memory.
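For concreteness, with made-up node ids (compute node B = 1, its DIMM-backed node B1 = 2, the PCIe-attached node B2 = 3), linking both memory nodes under the same compute node with the posted API would look roughly like this:

    /*
     * Illustrative only: node ids are invented. Both memory nodes are
     * registered as targets of the same compute node using the function
     * added by this patch.
     */
    #include <linux/node.h>

    static int link_node_b_targets(void)
    {
            int ret;

            ret = register_memory_node_under_compute_node(2, 1);    /* B1 local to B */
            if (ret)
                    return ret;
            return register_memory_node_under_compute_node(3, 1);   /* B2 local to B too */
    }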
Keith Busch <keith.busch@intel.com> writes:

> On Thu, Nov 15, 2018 at 12:36:54PM -0800, Matthew Wilcox wrote:
>> On Thu, Nov 15, 2018 at 07:59:20AM -0700, Keith Busch wrote:
>>> On Thu, Nov 15, 2018 at 05:57:10AM -0800, Matthew Wilcox wrote:
>>>> On Wed, Nov 14, 2018 at 03:49:14PM -0700, Keith Busch wrote:
>>>>> Memory-only nodes will often have affinity to a compute node, and platforms have ways to express that locality relationship.
>>>>>
>>>>> A node containing CPUs or other DMA devices that can initiate memory access is referred to as a "memory initiator". A "memory target" is a node that provides at least one physical address range accessible to a memory initiator.
>>>>
>>>> I think I may be confused here. If there is _no_ link from node X to node Y, does that mean that node X's CPUs cannot access the memory on node Y? In my mind, all nodes can access all memory in the system, just not with uniform bandwidth/latency.
>>>
>>> The link is just about which nodes are "local". It's like how nodes have a cpulist. Other CPUs not in the node's list can access that node's memory, but the ones in the mask are local, and provide useful optimization hints.
>>
>> So ... let's imagine a hypothetical system (I've never seen one built like this, but it doesn't seem too implausible). Connect four CPU sockets in a square, each of which has some regular DIMMs attached to it. CPU A is 0 hops to Memory A, one hop to Memory B and Memory C, and two hops from Memory D (each CPU only has two "QPI" links). Then maybe there's some special memory extender device attached on the PCIe bus. Now there's Memory B1 and B2 that's attached to CPU B and it's local to CPU B, but not as local as Memory B is ... and we'd probably _prefer_ to allocate memory for CPU A from Memory B1 than from Memory D. But ... *mumble*, this seems hard.
>
> Indeed, that particular example is out of scope for this series. The first objective is to aid a process running on node B's CPUs to allocate memory in B1. Anything that crosses QPI is on its own.

But if you can extrapolate how such a system could possibly be expressed using what is proposed here, it would help in reviewing this. Also, how do we intend to express the locality of memory w.r.t. other computing units like GPUs and FPGAs?

I understand that this is looked at as the ACPI HMAT in sysfs format. But as mentioned by others in this thread, if we don't do this in a platform- and device-independent way, we may have application portability issues going forward.

-aneesh
On Tue, Dec 04, 2018 at 09:13:33PM +0530, Aneesh Kumar K.V wrote:
> Keith Busch <keith.busch@intel.com> writes:
> >
> > Indeed, that particular example is out of scope for this series. The first objective is to aid a process running on node B's CPUs to allocate memory in B1. Anything that crosses QPI is on its own.
>
> But if you can extrapolate how such a system could possibly be expressed using what is proposed here, it would help in reviewing this.

Expressed to what end? This proposal is not trying to express anything other than the best possible pairings, because that is the most common information applications will want to know.

> Also, how do we intend to express the locality of memory w.r.t. other computing units like GPUs and FPGAs?

The HMAT parsing at the end of the series provides an example for how others may use the proposed interfaces.

> I understand that this is looked at as the ACPI HMAT in sysfs format. But as mentioned by others in this thread, if we don't do this in a platform- and device-independent way, we may have application portability issues going forward.

Only the last patch is specific to HMAT. If there are other ways to get the same attributes, then those drivers or subsystems may also register them with these new kernel interfaces.
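A sketch of what such a non-HMAT registration might look like; the driver structure, field names, and where the node ids come from are invented for illustration, and only register_memory_node_under_compute_node() is from this series.

    /*
     * Hypothetical non-HMAT user of the new interface: a driver that knows,
     * from its own bus- or firmware-specific source, which compute node is
     * local to the memory node it exposes.
     */
    #include <linux/device.h>
    #include <linux/node.h>

    struct my_mem_dev {
            struct device *dev;
            unsigned int mem_nid;   /* NUMA node backing this device's memory */
            unsigned int cpu_nid;   /* nearest compute node, per device-specific info */
    };

    static int my_mem_dev_register_locality(struct my_mem_dev *md)
    {
            int ret;

            ret = register_memory_node_under_compute_node(md->mem_nid, md->cpu_nid);
            if (ret)
                    dev_warn(md->dev, "failed to register node locality: %d\n", ret);
            return ret;
    }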
diff --git a/drivers/base/node.c b/drivers/base/node.c
index 86d6cd92ce3d..a9b7512a9502 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -372,6 +372,38 @@ int register_cpu_under_node(unsigned int cpu, unsigned int nid)
 			kobject_name(&node_devices[nid]->dev.kobj));
 }
 
+int register_memory_node_under_compute_node(unsigned int m, unsigned int p)
+{
+	int ret;
+	char initiator[20], target[17];
+
+	if (!node_online(p) || !node_online(m))
+		return -ENODEV;
+	if (m == p)
+		return 0;
+
+	snprintf(initiator, sizeof(initiator), "initiator%d", p);
+	snprintf(target, sizeof(target), "target%d", m);
+
+	ret = sysfs_create_link(&node_devices[p]->dev.kobj,
+				&node_devices[m]->dev.kobj,
+				target);
+	if (ret)
+		return ret;
+
+	ret = sysfs_create_link(&node_devices[m]->dev.kobj,
+				&node_devices[p]->dev.kobj,
+				initiator);
+	if (ret)
+		goto err;
+
+	return 0;
+ err:
+	sysfs_remove_link(&node_devices[p]->dev.kobj,
+			  kobject_name(&node_devices[m]->dev.kobj));
+	return ret;
+}
+
 int unregister_cpu_under_node(unsigned int cpu, unsigned int nid)
 {
 	struct device *obj;
diff --git a/include/linux/node.h b/include/linux/node.h
index 257bb3d6d014..1fd734a3fb3f 100644
--- a/include/linux/node.h
+++ b/include/linux/node.h
@@ -75,6 +75,8 @@ extern int register_mem_sect_under_node(struct memory_block *mem_blk,
 extern int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
 					   unsigned long phys_index);
 
+extern int register_memory_node_under_compute_node(unsigned int m, unsigned int p);
+
 #ifdef CONFIG_HUGETLBFS
 extern void register_hugetlbfs_with_node(node_registration_func_t doregister,
 					 node_registration_func_t unregister);
Memory-only nodes will often have affinity to a compute node, and platforms have ways to express that locality relationship.

A node containing CPUs or other DMA devices that can initiate memory access is referred to as a "memory initiator". A "memory target" is a node that provides at least one physical address range accessible to a memory initiator.

In preparation for these systems, provide a new kernel API to link the target memory node to its initiator compute node with symlinks to each other.

The following example shows the new sysfs hierarchy setup for memory node 'Y' local to compute node 'X':

  # ls -l /sys/devices/system/node/nodeX/target*
  /sys/devices/system/node/nodeX/targetY -> ../nodeY

  # ls -l /sys/devices/system/node/nodeY/initiator*
  /sys/devices/system/node/nodeY/initiatorX -> ../nodeX

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/base/node.c  | 32 ++++++++++++++++++++++++++++++++
 include/linux/node.h |  2 ++
 2 files changed, 34 insertions(+)