Message ID | 1457049339-23351-3-git-send-email-ddaney.cavm@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, Mar 3, 2016 at 5:55 PM, David Daney <ddaney.cavm@gmail.com> wrote: > From: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com> > > Add DT bindings for numa mapping of memory, CPUs and IOs. > > Reviewed-by: Robert Richter <rrichter@cavium.com> > Signed-off-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com> > Signed-off-by: David Daney <david.daney@cavium.com> Acked-by: Rob Herring <robh@kernel.org>
On 03.03.16 15:55:35, David Daney wrote: > From: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com> > > Add DT bindings for numa mapping of memory, CPUs and IOs. > > Reviewed-by: Robert Richter <rrichter@cavium.com> > Signed-off-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com> > Signed-off-by: David Daney <david.daney@cavium.com> > --- > Documentation/devicetree/bindings/numa.txt | 272 +++++++++++++++++++++++++++++ > 1 file changed, 272 insertions(+) > create mode 100644 Documentation/devicetree/bindings/numa.txt > > diff --git a/Documentation/devicetree/bindings/numa.txt b/Documentation/devicetree/bindings/numa.txt > new file mode 100644 > index 0000000..ec5ed7c > --- /dev/null > +++ b/Documentation/devicetree/bindings/numa.txt > +============================================================================== > +3 - distance-map > +============================================================================== > + > +The device tree node distance-map describes the relative > +distance (memory latency) between all numa nodes. > + > +- compatible : Should at least contain "numa-distance-map-v1". > + > +- distance-matrix > + This property defines a matrix to describe the relative distances > + between all numa nodes. > + It is represented as a list of node pairs and their relative distance. > + > + Note: > + 1. Each entry represents distance from first node to second node. > + The distances are equal in either direction. > + 2. The distance from a node to self (local distance) is represented > + with value 10 and all internode distance should be represented with > + a value greater than 10. > + 3. distance-matrix should have entries in lexicographical ascending > + order of nodes. > + 4. There must be only one device node distance-map which must reside in the root node. There is no note that this one is optional, but is it right? The default is 10 for local and 20 for remote connections. If so, then ... static int __init of_numa_parse_distance_map(void) { int ret = -EINVAL; struct device_node *np = of_find_node_by_path("/distance-map"); if (!np) return ret; must return 0 instead of -EINVAL here. -Robert > + > +Example: > + 4 nodes connected in mesh/ring topology as below, > + > + 0_______20______1 > + | | > + | | > + 20 20 > + | | > + | | > + |_______________| > + 3 20 2 > + > + if relative distance for each hop is 20, > + then internode distance would be, > + 0 -> 1 = 20 > + 1 -> 2 = 20 > + 2 -> 3 = 20 > + 3 -> 0 = 20 > + 0 -> 2 = 40 > + 1 -> 3 = 40 > + > + and dt presentation for this distance matrix is, > + > + distance-map { > + compatible = "numa-distance-map-v1"; > + distance-matrix = <0 0 10>, > + <0 1 20>, > + <0 2 40>, > + <0 3 20>, > + <1 0 20>, > + <1 1 10>, > + <1 2 20>, > + <1 3 40>, > + <2 0 40>, > + <2 1 20>, > + <2 2 10>, > + <2 3 20>, > + <3 0 20>, > + <3 1 40>, > + <3 2 20>, > + <3 3 10>; > + };
On 03/07/2016 11:22 AM, Robert Richter wrote: > On 03.03.16 15:55:35, David Daney wrote: >> From: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com> >> >> Add DT bindings for numa mapping of memory, CPUs and IOs. >> >> Reviewed-by: Robert Richter <rrichter@cavium.com> >> Signed-off-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com> >> Signed-off-by: David Daney <david.daney@cavium.com> >> --- >> Documentation/devicetree/bindings/numa.txt | 272 +++++++++++++++++++++++++++++ >> 1 file changed, 272 insertions(+) >> create mode 100644 Documentation/devicetree/bindings/numa.txt >> >> diff --git a/Documentation/devicetree/bindings/numa.txt b/Documentation/devicetree/bindings/numa.txt >> new file mode 100644 >> index 0000000..ec5ed7c >> --- /dev/null >> +++ b/Documentation/devicetree/bindings/numa.txt > >> +============================================================================== >> +3 - distance-map >> +============================================================================== >> + >> +The device tree node distance-map describes the relative >> +distance (memory latency) between all numa nodes. >> + >> +- compatible : Should at least contain "numa-distance-map-v1". >> + >> +- distance-matrix >> + This property defines a matrix to describe the relative distances >> + between all numa nodes. >> + It is represented as a list of node pairs and their relative distance. >> + >> + Note: >> + 1. Each entry represents distance from first node to second node. >> + The distances are equal in either direction. >> + 2. The distance from a node to self (local distance) is represented >> + with value 10 and all internode distance should be represented with >> + a value greater than 10. >> + 3. distance-matrix should have entries in lexicographical ascending >> + order of nodes. >> + 4. There must be only one device node distance-map which must reside in the root node. > > There is no note that this one is optional, but is it right? The > default is 10 for local and 20 for remote connections. > Do we need to explicitly state that it is optional? Many node types are optional, and their binding specifications don't really talk about their being optional. If the node is present then it has the meaning specified. If the node is *not* present, then the special meaning described in the bindings document does not apply. In the case of NUMA, this means that all memory is equally distant (i.e. it is *Uniform*), and we are not talking about a *Non* *Uniform* Memory Architecture (NUMA) system. > If so, then ... > > static int __init of_numa_parse_distance_map(void) > { > int ret = -EINVAL; > struct device_node *np = of_find_node_by_path("/distance-map"); > > if (!np) > return ret; > > must return 0 instead of -EINVAL here. No, I don't think doing that would be correct. If there is no "distance-map", then of_numa_init() returns the error code. This causes the code in arch/arm64/kernel/numa.c to fall back to the non-NUMA "dummy_numa" case. By adding your Reviewed-by: Robert Richter <rrichter@cavium.com> tag to patch 5/6, where we select between "real" and "dummy_numa", I had assumed that you agreed with this approach. David Daney
On Tue, Mar 8, 2016 at 1:17 AM, David Daney <ddaney@caviumnetworks.com> wrote: > On 03/07/2016 11:22 AM, Robert Richter wrote: >> >> On 03.03.16 15:55:35, David Daney wrote: >>> >>> From: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com> >>> >>> Add DT bindings for numa mapping of memory, CPUs and IOs. >>> >>> Reviewed-by: Robert Richter <rrichter@cavium.com> >>> Signed-off-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com> >>> Signed-off-by: David Daney <david.daney@cavium.com> >>> --- >>> Documentation/devicetree/bindings/numa.txt | 272 >>> +++++++++++++++++++++++++++++ >>> 1 file changed, 272 insertions(+) >>> create mode 100644 Documentation/devicetree/bindings/numa.txt >>> >>> diff --git a/Documentation/devicetree/bindings/numa.txt >>> b/Documentation/devicetree/bindings/numa.txt >>> new file mode 100644 >>> index 0000000..ec5ed7c >>> --- /dev/null >>> +++ b/Documentation/devicetree/bindings/numa.txt >> >> >>> >>> +============================================================================== >>> +3 - distance-map >>> >>> +============================================================================== >>> + >>> +The device tree node distance-map describes the relative >>> +distance (memory latency) between all numa nodes. >>> + >>> +- compatible : Should at least contain "numa-distance-map-v1". >>> + >>> +- distance-matrix >>> + This property defines a matrix to describe the relative distances >>> + between all numa nodes. >>> + It is represented as a list of node pairs and their relative distance. >>> + >>> + Note: >>> + 1. Each entry represents distance from first node to second node. >>> + The distances are equal in either direction. >>> + 2. The distance from a node to self (local distance) is >>> represented >>> + with value 10 and all internode distance should be represented >>> with >>> + a value greater than 10. >>> + 3. distance-matrix should have entries in lexicographical >>> ascending >>> + order of nodes. >>> + 4. There must be only one device node distance-map which must >>> reside in the root node. >> >> >> There is no note that this one is optional, but is it right? The >> default is 10 for local and 20 for remote connections. >> > > Do we need to explicitly state that it is optional? Many node types are > optional, and their binding specifications don't really talk about their > being optional. > > If the node is present then it has the meaning specified. > > If the node is *not* present, then the special meaning described in the > bindings document does not apply. > > In the case of NUMA, this means that all memory is equally distant (i.e. it > is *Uniform*), and we are not talking about a *Non* *Uniform* Memory > Architecture (NUMA) system. > > >> If so, then ... >> >> static int __init of_numa_parse_distance_map(void) >> { >> int ret = -EINVAL; >> struct device_node *np = of_find_node_by_path("/distance-map"); >> >> if (!np) >> return ret; >> >> must return 0 instead of -EINVAL here. > > > No, I don't think doing that would be correct. > > If there is no "distance-map", then of_numa_init() returns the error code. > This causes the code in arch/arm64/kernel/numa.c to fall back to the > non-NUMA "dummy_numa" case. IMO, return 0 will allow 2 node system to have distance-map optional. by default node distance is set to 10 for local node and for remote node is 20 and this will suffice the need of 2 node system. by returning EINVAL, we are forcing 2 node system (and even for systems which has equal remote distances) to define distance-map. > > By adding your Reviewed-by: Robert Richter <rrichter@cavium.com> tag to > patch 5/6, where we select between "real" and "dummy_numa", I had assumed > that you agreed with this approach. > > David Daney Ganapat > > > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
On 08.03.16 10:31:33, Ganapatrao Kulkarni wrote: > On Tue, Mar 8, 2016 at 1:17 AM, David Daney <ddaney@caviumnetworks.com> wrote: > > On 03/07/2016 11:22 AM, Robert Richter wrote: > >> > >> On 03.03.16 15:55:35, David Daney wrote: > >>> > >>> From: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com> > >>> > >>> Add DT bindings for numa mapping of memory, CPUs and IOs. > >>> > >>> Reviewed-by: Robert Richter <rrichter@cavium.com> > >>> Signed-off-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com> > >>> Signed-off-by: David Daney <david.daney@cavium.com> > >>> --- > >>> Documentation/devicetree/bindings/numa.txt | 272 > >>> +++++++++++++++++++++++++++++ > >>> 1 file changed, 272 insertions(+) > >>> create mode 100644 Documentation/devicetree/bindings/numa.txt > >>> > >>> diff --git a/Documentation/devicetree/bindings/numa.txt > >>> b/Documentation/devicetree/bindings/numa.txt > >>> new file mode 100644 > >>> index 0000000..ec5ed7c > >>> --- /dev/null > >>> +++ b/Documentation/devicetree/bindings/numa.txt > >> > >> > >>> > >>> +============================================================================== > >>> +3 - distance-map > >>> > >>> +============================================================================== > >>> + > >>> +The device tree node distance-map describes the relative > >>> +distance (memory latency) between all numa nodes. > >>> + > >>> +- compatible : Should at least contain "numa-distance-map-v1". > >>> + > >>> +- distance-matrix > >>> + This property defines a matrix to describe the relative distances > >>> + between all numa nodes. > >>> + It is represented as a list of node pairs and their relative distance. > >>> + > >>> + Note: > >>> + 1. Each entry represents distance from first node to second node. > >>> + The distances are equal in either direction. > >>> + 2. The distance from a node to self (local distance) is > >>> represented > >>> + with value 10 and all internode distance should be represented > >>> with > >>> + a value greater than 10. > >>> + 3. distance-matrix should have entries in lexicographical > >>> ascending > >>> + order of nodes. > >>> + 4. There must be only one device node distance-map which must > >>> reside in the root node. > >> > >> > >> There is no note that this one is optional, but is it right? The > >> default is 10 for local and 20 for remote connections. > >> > > > > Do we need to explicitly state that it is optional? Many node types are > > optional, and their binding specifications don't really talk about their > > being optional. > > > > If the node is present then it has the meaning specified. > > > > If the node is *not* present, then the special meaning described in the > > bindings document does not apply. > > > > In the case of NUMA, this means that all memory is equally distant (i.e. it > > is *Uniform*), and we are not talking about a *Non* *Uniform* Memory > > Architecture (NUMA) system. > > > > > >> If so, then ... > >> > >> static int __init of_numa_parse_distance_map(void) > >> { > >> int ret = -EINVAL; > >> struct device_node *np = of_find_node_by_path("/distance-map"); > >> > >> if (!np) > >> return ret; > >> > >> must return 0 instead of -EINVAL here. > > > > > > No, I don't think doing that would be correct. > > > > If there is no "distance-map", then of_numa_init() returns the error code. > > This causes the code in arch/arm64/kernel/numa.c to fall back to the > > non-NUMA "dummy_numa" case. Which means distance-map is not optional for numa. > IMO, return 0 will allow 2 node system to have distance-map optional. > by default node distance is set to 10 for local node and for remote node is 20 > and this will suffice the need of 2 node system. > by returning EINVAL, we are forcing 2 node system (and even for > systems which has equal remote distances) > to define distance-map. This is exaclty the problem. numa_init() fails if there is no distance-map node. > > By adding your Reviewed-by: Robert Richter <rrichter@cavium.com> tag to > > patch 5/6, where we select between "real" and "dummy_numa", I had assumed > > that you agreed with this approach. This is from v6: if (strcmp(uname, "distance-map") != 0) return 0; -Robert
diff --git a/Documentation/devicetree/bindings/numa.txt b/Documentation/devicetree/bindings/numa.txt new file mode 100644 index 0000000..ec5ed7c --- /dev/null +++ b/Documentation/devicetree/bindings/numa.txt @@ -0,0 +1,272 @@ +============================================================================== +NUMA binding description. +============================================================================== + +============================================================================== +1 - Introduction +============================================================================== + +Systems employing a Non Uniform Memory Access (NUMA) architecture contain +collections of hardware resources including processors, memory, and I/O buses, +that comprise what is commonly known as a NUMA node. +Processor accesses to memory within the local NUMA node is generally faster +than processor accesses to memory outside of the local NUMA node. +DT defines interfaces that allow the platform to convey NUMA node +topology information to OS. + +============================================================================== +2 - numa-node-id +============================================================================== + +For the purpose of identification, each NUMA node is associated with a unique +token known as a node id. For the purpose of this binding +a node id is a 32-bit integer. + +A device node is associated with a NUMA node by the presence of a +numa-node-id property which contains the node id of the device. + +Example: + /* numa node 0 */ + numa-node-id = <0>; + + /* numa node 1 */ + numa-node-id = <1>; + +============================================================================== +3 - distance-map +============================================================================== + +The device tree node distance-map describes the relative +distance (memory latency) between all numa nodes. + +- compatible : Should at least contain "numa-distance-map-v1". + +- distance-matrix + This property defines a matrix to describe the relative distances + between all numa nodes. + It is represented as a list of node pairs and their relative distance. + + Note: + 1. Each entry represents distance from first node to second node. + The distances are equal in either direction. + 2. The distance from a node to self (local distance) is represented + with value 10 and all internode distance should be represented with + a value greater than 10. + 3. distance-matrix should have entries in lexicographical ascending + order of nodes. + 4. There must be only one device node distance-map which must reside in the root node. + +Example: + 4 nodes connected in mesh/ring topology as below, + + 0_______20______1 + | | + | | + 20 20 + | | + | | + |_______________| + 3 20 2 + + if relative distance for each hop is 20, + then internode distance would be, + 0 -> 1 = 20 + 1 -> 2 = 20 + 2 -> 3 = 20 + 3 -> 0 = 20 + 0 -> 2 = 40 + 1 -> 3 = 40 + + and dt presentation for this distance matrix is, + + distance-map { + compatible = "numa-distance-map-v1"; + distance-matrix = <0 0 10>, + <0 1 20>, + <0 2 40>, + <0 3 20>, + <1 0 20>, + <1 1 10>, + <1 2 20>, + <1 3 40>, + <2 0 40>, + <2 1 20>, + <2 2 10>, + <2 3 20>, + <3 0 20>, + <3 1 40>, + <3 2 20>, + <3 3 10>; + }; + +============================================================================== +4 - Example dts +============================================================================== + +Dual socket system consists of 2 boards connected through ccn bus and +each board having one socket/soc of 8 cpus, memory and pci bus. + + memory@c00000 { + device_type = "memory"; + reg = <0x0 0xc00000 0x0 0x80000000>; + /* node 0 */ + numa-node-id = <0>; + }; + + memory@10000000000 { + device_type = "memory"; + reg = <0x100 0x0 0x0 0x80000000>; + /* node 1 */ + numa-node-id = <1>; + }; + + cpus { + #address-cells = <2>; + #size-cells = <0>; + + cpu@0 { + device_type = "cpu"; + compatible = "arm,armv8"; + reg = <0x0 0x0>; + enable-method = "psci"; + /* node 0 */ + numa-node-id = <0>; + }; + cpu@1 { + device_type = "cpu"; + compatible = "arm,armv8"; + reg = <0x0 0x1>; + enable-method = "psci"; + numa-node-id = <0>; + }; + cpu@2 { + device_type = "cpu"; + compatible = "arm,armv8"; + reg = <0x0 0x2>; + enable-method = "psci"; + numa-node-id = <0>; + }; + cpu@3 { + device_type = "cpu"; + compatible = "arm,armv8"; + reg = <0x0 0x3>; + enable-method = "psci"; + numa-node-id = <0>; + }; + cpu@4 { + device_type = "cpu"; + compatible = "arm,armv8"; + reg = <0x0 0x4>; + enable-method = "psci"; + numa-node-id = <0>; + }; + cpu@5 { + device_type = "cpu"; + compatible = "arm,armv8"; + reg = <0x0 0x5>; + enable-method = "psci"; + numa-node-id = <0>; + }; + cpu@6 { + device_type = "cpu"; + compatible = "arm,armv8"; + reg = <0x0 0x6>; + enable-method = "psci"; + numa-node-id = <0>; + }; + cpu@7 { + device_type = "cpu"; + compatible = "arm,armv8"; + reg = <0x0 0x7>; + enable-method = "psci"; + numa-node-id = <0>; + }; + cpu@8 { + device_type = "cpu"; + compatible = "arm,armv8"; + reg = <0x0 0x8>; + enable-method = "psci"; + /* node 1 */ + numa-node-id = <1>; + }; + cpu@9 { + device_type = "cpu"; + compatible = "arm,armv8"; + reg = <0x0 0x9>; + enable-method = "psci"; + numa-node-id = <1>; + }; + cpu@a { + device_type = "cpu"; + compatible = "arm,armv8"; + reg = <0x0 0xa>; + enable-method = "psci"; + numa-node-id = <1>; + }; + cpu@b { + device_type = "cpu"; + compatible = "arm,armv8"; + reg = <0x0 0xb>; + enable-method = "psci"; + numa-node-id = <1>; + }; + cpu@c { + device_type = "cpu"; + compatible = "arm,armv8"; + reg = <0x0 0xc>; + enable-method = "psci"; + numa-node-id = <1>; + }; + cpu@d { + device_type = "cpu"; + compatible = "arm,armv8"; + reg = <0x0 0xd>; + enable-method = "psci"; + numa-node-id = <1>; + }; + cpu@e { + device_type = "cpu"; + compatible = "arm,armv8"; + reg = <0x0 0xe>; + enable-method = "psci"; + numa-node-id = <1>; + }; + cpu@f { + device_type = "cpu"; + compatible = "arm,armv8"; + reg = <0x0 0xf>; + enable-method = "psci"; + numa-node-id = <1>; + }; + }; + + pcie0: pcie0@848000000000 { + compatible = "arm,armv8"; + device_type = "pci"; + bus-range = <0 255>; + #size-cells = <2>; + #address-cells = <3>; + reg = <0x8480 0x00000000 0 0x10000000>; /* Configuration space */ + ranges = <0x03000000 0x8010 0x00000000 0x8010 0x00000000 0x70 0x00000000>; + /* node 0 */ + numa-node-id = <0>; + }; + + pcie1: pcie1@948000000000 { + compatible = "arm,armv8"; + device_type = "pci"; + bus-range = <0 255>; + #size-cells = <2>; + #address-cells = <3>; + reg = <0x9480 0x00000000 0 0x10000000>; /* Configuration space */ + ranges = <0x03000000 0x9010 0x00000000 0x9010 0x00000000 0x70 0x00000000>; + /* node 1 */ + numa-node-id = <1>; + }; + + distance-map { + compatible = "numa-distance-map-v1"; + distance-matrix = <0 0 10>, + <0 1 20>, + <1 1 10>; + };