Message ID: 20191216153809.105463-1-Jonathan.Cameron@huawei.com (mailing list archive)
Series: ACPI: Support Generic Initiator proximity domains
Le 16/12/2019 à 16:38, Jonathan Cameron a écrit :
> Introduces a new type of NUMA node for cases where we want to represent
> the access characteristics of a non-CPU initiator of memory requests,
> as these differ from all those for existing nodes containing CPUs and/or
> memory.
>
> These Generic Initiators are presented by the node access0 class in
> sysfs in the same way as a CPU. It seems likely that there will be
> use cases in which the best 'CPU' is desired and Generic Initiators
> should be ignored. The final few patches in this series introduce
> access1, a new performance class in the sysfs node description
> which presents only CPU-to-memory relationships. Test cases for this
> are described below.

Hello Jonathan

If I want to test this with a fake GI, what is the minimal set of
changes I should put in my ACPI tables? Can I just specify a dummy GI in
SRAT? What handle should I use there?

Thanks
Brice
On Wed, 18 Dec 2019 12:32:06 +0100 Brice Goglin <brice.goglin@gmail.com> wrote:
> Hello Jonathan
>
> If I want to test this with a fake GI, what is the minimal set of
> changes I should put in my ACPI tables? Can I just specify a dummy GI in
> SRAT? What handle should I use there?

Exactly that for a dummy GI. Also extend HMAT and SLIT for the extra
proximity domain / initiator.

For the handle, anything is fine; this patch set doesn't currently use it.
That handle was a bit controversial when this spec feature was being
discussed, because it can 'disagree' with information from _PXM. The ACPI
spec ended up effectively relying on them agreeing. So any handle must
identify a device that either doesn't have a _PXM entry or has one that
refers to the same proximity domain.

Also note there is a fiddly corner case which is covered by an _OSC. If
you have a device that you want to use _PXM to put in a GI-only domain,
older kernels will not know about the GI domain. Hence ACPI goes through
a dance to ensure that a kernel that hasn't announced it is GI aware
doesn't get told anything is in a GI-only domain. For testing this series
though you can just ignore that.

The logic to actually pass that handle-based specification through to the
devices is complex, so this set relies on _PXM in DSDT to actually
associate any device with the Generic Initiator domain. If doing this for
a PCI device, note that you need the fix mentioned in the cover letter to
actually have _PXM apply to PCI EPs.

Note that the _PXM case needs to work anyway, as you might have a GI node
with multiple GIs and there is no obligation for them all to be specified
in SRAT.

Once this initial set is in place we can work out how to use the SRAT
handle to associate it with a device. To be honest, I haven't really
thought about how we'd do that yet.

Thanks,
Jonathan
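For anyone wanting to hand-craft the table change Jonathan describes, the
sketch below shows roughly what the new SRAT subtable looks like. It follows
the Generic Initiator Affinity Structure layout from ACPI 6.3 (subtable
type 5, 32 bytes); the struct and helper names are mine, not ACPICA's or the
kernel's, and in practice you would regenerate the table with iasl rather
than patch bytes in C.

#include <stdint.h>
#include <string.h>

/*
 * Sketch of a dummy SRAT Generic Initiator Affinity entry following the
 * ACPI 6.3 layout (subtable type 5, 32 bytes). Names are illustrative
 * only; they are not the ACPICA/kernel definitions.
 */
struct srat_generic_initiator {
	uint8_t  type;               /* 5 = Generic Initiator Affinity */
	uint8_t  length;             /* 32 */
	uint8_t  reserved;
	uint8_t  device_handle_type; /* 0 = ACPI (HID/UID), 1 = PCI (seg/bus/dev/fn) */
	uint32_t proximity_domain;   /* the new GI-only PXM */
	uint8_t  device_handle[16];  /* unused by this series, any value works */
	uint32_t flags;              /* bit 0 = enabled */
	uint32_t reserved1;
} __attribute__((packed));

static void fill_dummy_gi(struct srat_generic_initiator *gi, uint32_t gi_pxm)
{
	memset(gi, 0, sizeof(*gi));
	gi->type = 5;
	gi->length = sizeof(*gi);        /* 32 bytes */
	gi->device_handle_type = 1;      /* pretend a PCI device is the initiator */
	gi->proximity_domain = gi_pxm;
	gi->flags = 0x1;                 /* enabled */
	/*
	 * device_handle stays zeroed: this series does not consume it, and
	 * any device it named would have to agree with that device's _PXM.
	 */
}

Remember to bump the SRAT header length and fix the checksum, and to add the
same proximity domain to SLIT and to the HMAT initiator lists, as described
above.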
Le 18/12/2019 à 15:50, Jonathan Cameron a écrit :
> Exactly that for a dummy GI. Also extend HMAT and SLIT for the extra
> proximity domain / initiator.

I couldn't get this to work (your patches on top of 5.5-rc2). I added
the GI in SRAT, and extended HMAT and SLIT accordingly.

I don't know if that's expected, but I get an additional node in sysfs,
with 0kB memory.

However, the HMAT table gets ignored because find_mem_target() fails in
hmat_parse_proximity_domain(). The target should have been allocated in
alloc_memory_target(), which is called in srat_parse_mem_affinity(), but
it seems to me that this function isn't called for GI nodes. Or should
SRAT also contain a normal Memory node with the same PXM as the GI?

Brice
On Fri, 20 Dec 2019 22:40:18 +0100 Brice Goglin <brice.goglin@gmail.com> wrote:
> I don't know if that's expected, but I get an additional node in sysfs,
> with 0kB memory.
>
> However, the HMAT table gets ignored because find_mem_target() fails in
> hmat_parse_proximity_domain(). The target should have been allocated in
> alloc_memory_target(), which is called in srat_parse_mem_affinity(), but
> it seems to me that this function isn't called for GI nodes. Or should
> SRAT also contain a normal Memory node with the same PXM as the GI?

Hi Brice,

Yes, you should see a node with 0kB memory. Same as you get for a
processor-only node, I believe.

srat_parse_mem_affinity() shouldn't call alloc_memory_target() for the GI
nodes as they don't have any memory. The HMAT table should only refer to
GI domains as initiators. Just to check, do you have them listed as a
target node? Or perhaps in some HMAT proximity entry as memory_PD?

To answer your question, SRAT should not contain a normal memory node
with the same PXM, as that would defeat the whole purpose: we would have
been able to have such a domain without Generic Initiators.

Also, just to check, x86 or arm64?

Thanks for testing this.

Jonathan
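To make the "initiators only" point concrete, here is a rough sketch of the
fixed part of an HMAT System Locality Latency and Bandwidth Information
structure as laid out in ACPI 6.3. The names are mine rather than ACPICA's;
the point is that the GI proximity domain should appear only in the initiator
list following this header, never in the target list or as the memory_PD of a
type 0 (Memory Proximity Domain Attributes) entry.

#include <stdint.h>

/*
 * Sketch of the fixed header of an HMAT System Locality Latency and
 * Bandwidth Information structure (type 1), per the ACPI 6.3 layout.
 * Field names are illustrative, not the ACPICA definitions.
 *
 * The header is followed by three arrays:
 *   uint32_t initiator_pd_list[num_initiator_pds];  // GI PXM goes here
 *   uint32_t target_pd_list[num_target_pds];        // memory PXMs only
 *   uint16_t entries[num_initiator_pds * num_target_pds];
 */
struct hmat_locality_header {
	uint16_t type;               /* 1 = latency/bandwidth information */
	uint16_t reserved;
	uint32_t length;             /* header plus the three arrays above */
	uint8_t  flags;              /* memory hierarchy level */
	uint8_t  data_type;          /* e.g. 0 = access latency, 3 = access bandwidth */
	uint16_t reserved1;
	uint32_t num_initiator_pds;
	uint32_t num_target_pds;
	uint32_t reserved2;
	uint64_t entry_base_unit;
} __attribute__((packed));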
Le 02/01/2020 à 16:27, Jonathan Cameron a écrit :
> srat_parse_mem_affinity() shouldn't call alloc_memory_target() for the GI
> nodes as they don't have any memory. The HMAT table should only refer to
> GI domains as initiators. Just to check, do you have them listed as a
> target node? Or perhaps in some HMAT proximity entry as memory_PD?

Thanks, I finally got things to work. I am on x86. It's a dual-socket
machine with SubNUMA clusters (2 nodes per socket) and NVDIMMs (one
dax-kmem node per socket). Before adding a GI, initiators look like this:

node0 -> node0 and node4
node1 -> node1 and node5
node2 -> node2 and node4
node3 -> node3 and node5

I added a GI with faster access to node0, node2, node4 (first socket).

The GI node becomes an access0 initiator for node4, and node0 and node2
remain access1 initiators.

The GI node doesn't become an access0 initiator for node0 and node2,
likely because of this test:

	/*
	 * If the Address Range Structure provides a local processor pxm, link
	 * only that one. Otherwise, find the best performance attributes and
	 * register all initiators that match.
	 */
	if (target->processor_pxm != PXM_INVAL) {

I guess I should split node0-3 into separate CPU nodes and memory nodes
in SRAT?

Brice
On Thu, 2 Jan 2020 22:37:04 +0100 Brice Goglin <brice.goglin@gmail.com> wrote:
> I added a GI with faster access to node0, node2, node4 (first socket).
>
> The GI node becomes an access0 initiator for node4, and node0 and node2
> remain access1 initiators.
>
> The GI node doesn't become an access0 initiator for node0 and node2,
> likely because of this test:
>
> 	if (target->processor_pxm != PXM_INVAL) {
>
> I guess I should split node0-3 into separate CPU nodes and memory nodes
> in SRAT?

It sounds like it's working as expected. There are a few assumptions made
about 'sensible' HMAT configurations:

1) If the memory and processor are in the same domain, that should mean
the access characteristics within that domain are the best in the system.
It is possible to have a setup with very low latency access from a
particular processor but also low bandwidth, while another domain has
high bandwidth but long latency. Such systems may occur, but they are
probably not going to be for 'normal memory the OS can just use'.

2) If we have a relevant "Memory Proximity Domain Attributes Structure"
(renamed in ACPI 6.3 from "Address Range Structure" as it no longer has
any address ranges; these structures are entirely optional) that
indicates the memory controller for a given range of memory lies in the
proximity domain of the specified initiator, then we ignore cases where
HMAT says somewhere else is nearer in terms of bandwidth and latency.

For case 1) I'm not sure we actually enforce it. I think you've hit
case 2).

Removing the address range structures should work, or, as you say, you
can move that memory into separate memory nodes. It will be a bit of a
strange setup though. Assuming node4 is an NVDIMM, that would be closer
to a potential real system.
With a suitable coherent bus (CCIX is the most familiar to me and can do
this) you might have:

 _______        ________        _______
|       |      |        |      |       |
| Node0 |      | Node4  |------| Node6 |
| CPU   |------| Mem +  |------| GI    |
| Mem   |      | CCHome |------|       |
|_______|      |________|      |_______|
    |                              |
    |______________________________|

CCHome: cache coherency directory location, to avoid the need for more
esoteric cache coherency shortcut methods etc.

The idea being that the GI node is some big fat DB accelerator or similar
doing offloaded queries, with a fat pipe to the NVDIMMs. Let's ignore the
fact that, to actually justify a GI-only node, you need some more
elements, as this situation could also be represented by fusing Node4 and
Node6 and having asymmetric HMAT between Node0 and the fused Node4.

So in conclusion, with your setup, only the NVDIMM nodes look like the
sort of memory that might be in a node nearer to a GI than the host.

Thanks again for looking at this!

Jonathan
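The behaviour Jonathan describes for case 2) boils down to something like the
sketch below. This is only a simplified illustration of the decision the HMAT
code makes when registering initiators for a target node, not the actual
drivers/acpi code; PXM_INVAL and all helper names here are invented for the
example.

#include <stdio.h>

#define PXM_INVAL	(-1)

struct memory_target {
	int processor_pxm;	/* set from a type 0 entry whose "proximity
				 * domain valid" flag is set, else PXM_INVAL */
};

/* Stubs standing in for the real sysfs-linking helpers. */
static void link_access_node(struct memory_target *t, int pxm)
{
	(void)t;
	printf("link initiator pxm %d\n", pxm);
}

static void link_best_performing_initiators(struct memory_target *t)
{
	(void)t;
	printf("link best-performing initiators found in HMAT\n");
}

/* Illustrative only: not the real hmat_register_target_initiators(). */
static void register_initiators(struct memory_target *target)
{
	if (target->processor_pxm != PXM_INVAL) {
		/*
		 * Case 2): firmware explicitly attached this memory to one
		 * initiator domain, so only that domain gets linked and a
		 * GI that HMAT says is faster is ignored.
		 */
		link_access_node(target, target->processor_pxm);
		return;
	}

	/*
	 * Otherwise: pick whichever initiators have the best latency and
	 * bandwidth figures in HMAT; this is where a GI domain can become
	 * the access0 initiator.
	 */
	link_best_performing_initiators(target);
}

int main(void)
{
	struct memory_target dram   = { .processor_pxm = 0 };
	struct memory_target nvdimm = { .processor_pxm = PXM_INVAL };

	register_initiators(&dram);    /* case 2): pinned to PXM 0 */
	register_initiators(&nvdimm);  /* no binding: HMAT performance decides */
	return 0;
}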
Le 03/01/2020 à 11:09, Jonathan Cameron a écrit :
> For case 1) I'm not sure we actually enforce it. I think you've hit
> case 2).
>
> Removing the address range structures should work, or, as you say, you
> can move that memory into separate memory nodes.

I removed the "processor proximity domain valid" flag from the address
range structure of node2, and the GI is now its access0 initiator
instead of node2 itself. Looks like it confirms I was in case 2).

Thanks
Brice
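For reference, the flag Brice cleared corresponds to bit 0 of the flags word
in the HMAT Memory Proximity Domain Attributes structure (type 0). Below is a
rough sketch of that layout as I read ACPI 6.3, with invented field names,
and of the one-bit change; as before, a real table would be rebuilt with iasl
and its checksum fixed up rather than edited like this.

#include <stdint.h>

/*
 * Sketch of the HMAT "Memory Proximity Domain Attributes" structure
 * (type 0) per the ACPI 6.3 layout; in ACPI 6.2 this was the "Memory
 * Subsystem Address Range" structure. Field names are illustrative.
 */
struct hmat_proximity_domain_attrs {
	uint16_t type;          /* 0 */
	uint16_t reserved;
	uint32_t length;        /* 40 */
	uint16_t flags;         /* bit 0: attached initiator PD is valid */
	uint16_t reserved1;
	uint32_t initiator_pd;  /* "processor"/initiator proximity domain */
	uint32_t memory_pd;     /* memory proximity domain */
	uint32_t reserved2;
	uint64_t reserved3;     /* was the range base address in ACPI 6.2 */
	uint64_t reserved4;     /* was the range length in ACPI 6.2 */
} __attribute__((packed));

/*
 * Clearing bit 0 is what makes the kernel fall back to choosing the
 * best-performing initiator (possibly the GI) from the latency and
 * bandwidth entries, instead of pinning the memory to initiator_pd.
 */
static void drop_initiator_binding(struct hmat_proximity_domain_attrs *pd)
{
	pd->flags &= ~(uint16_t)0x1;
}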
On Fri, 3 Jan 2020 13:18:59 +0100 Brice Goglin <brice.goglin@gmail.com> wrote:
> I removed the "processor proximity domain valid" flag from the address
> range structure of node2, and the GI is now its access0 initiator
> instead of node2 itself. Looks like it confirms I was in case 2).

Cool. I was wondering if that change would work fine. It is a somewhat
crazy setup, so I didn't have an equivalent in my test set. Sounds like
all is working as expected.

Thanks,
Jonathan