Message ID | 20220622082513.467538-12-aneesh.kumar@linux.ibm.com (mailing list archive) |
---|---|
State | New |
Series | mm/demotion: Memory tiers and demotion |
Hi Aneesh,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on akpm-mm/mm-everything]

url:    https://github.com/intel-lab-lkp/linux/commits/Aneesh-Kumar-K-V/mm-demotion-Memory-tiers-and-demotion/20220622-163031
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
reproduce: make htmldocs

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> Documentation/admin-guide/mm/memory-tiering.rst:5: (SEVERE/4) Title overline & underline mismatch.

vim +5 Documentation/admin-guide/mm/memory-tiering.rst

     4
   > 5	===========
     6	Memory tiers
     7	============
     8
On Thu, Jun 23, 2022 at 05:21:17AM +0800, kernel test robot wrote:
> If you fix the issue, kindly add following tag where applicable
> Reported-by: kernel test robot <lkp@intel.com>
>
> All errors (new ones prefixed by >>):
>
> >> Documentation/admin-guide/mm/memory-tiering.rst:5: (SEVERE/4) Title overline & underline mismatch.
>
> vim +5 Documentation/admin-guide/mm/memory-tiering.rst
>
>      4
>    > 5	===========
>      6	Memory tiers
>      7	============
>      8
>

Here is the fixup. Thanks.

---- >8 ----
From ee8b97451b6ad1869f4d426e2d3825ac20a6e15d Mon Sep 17 00:00:00 2001
From: Bagas Sanjaya <bagasdotme@gmail.com>
Date: Sat, 25 Jun 2022 09:48:28 +0700
Subject: [PATCH] fixup for "mm/demotion: Add documentation for memory tiering"

Extend the title heading overline by one (=) to match the underline.

Fixes: 64fc925cf27dac ("mm/demotion: Add documentation for memory tiering")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
---
 Documentation/admin-guide/mm/memory-tiering.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/mm/memory-tiering.rst b/Documentation/admin-guide/mm/memory-tiering.rst
index 142c36651f5dd2..0a75e0dab1fd8e 100644
--- a/Documentation/admin-guide/mm/memory-tiering.rst
+++ b/Documentation/admin-guide/mm/memory-tiering.rst
@@ -2,7 +2,7 @@
 
 .. _admin_guide_memory_tiering:
 
-===========
+============
 Memory tiers
 ============
 
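The mismatch reported by the robot can also be caught before running a full `make htmldocs`. The following is only a minimal sketch, not part of the kernel tree or of docutils: `find_mismatched_titles` is a hypothetical helper, and it assumes a simplified rule (an overlined title is three consecutive lines, and the overline and underline must use the same character at the same length).

```python
import re

# A line consisting of one punctuation character repeated, e.g. "=====".
# Only a few common adornment characters are recognised in this sketch.
ADORNMENT = re.compile(r"^([=\-~^#*+])\1+$")


def find_mismatched_titles(text):
    """Return (line_no, overline, underline) for overlined titles whose
    overline and underline use the same character but differ in length.
    line_no is 1-based and points at the overline."""
    lines = text.splitlines()
    problems = []
    for i in range(len(lines) - 2):
        over, title, under = lines[i], lines[i + 1], lines[i + 2]
        if not (ADORNMENT.match(over) and ADORNMENT.match(under)):
            continue
        if not title.strip() or ADORNMENT.match(title):
            continue
        if over[0] == under[0] and len(over) != len(under):
            problems.append((i + 1, over, under))
    return problems


# First lines of the file as submitted (11-character overline).
broken = "\n".join([
    ".. SPDX-License-Identifier: GPL-2.0",
    "",
    ".. _admin_guide_memory_tiering:",
    "",
    "===========",
    "Memory tiers",
    "============",
])
# Same text with the overline extended by one "=", as in the fixup.
fixed = broken.replace("===========\nMemory", "============\nMemory")

print(find_mismatched_titles(broken))
print(find_mismatched_titles(fixed))
```

On the submitted text the helper points at line 5, the same location as the htmldocs error; on the fixed text it returns an empty list.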
On Wed, Jun 22, 2022 at 01:55:12PM +0530, Aneesh Kumar K.V wrote:
> From: Jagdish Gediya <jvgediya@linux.ibm.com>
>

Hi Aneesh and Jagdish,

The documentation can be improved, see below.

> All N_MEMORY nodes are divided into 3 memoty tiers with tier ID value
> MEMORY_TIER_HBM_GPU, MEMORY_TIER_DRAM and MEMORY_TIER_PMEM. By default,
> all nodes are assigned to default memory tier.
>
> Demotion path for all N_MEMORY nodes is prepared based on the tier ID value
> of memory tiers.
>
> This patch adds documention for memory tiering introduction, its sysfs
> interfaces and how demotion is performed based on memory tiers.
>

I think the patch message should just be:
"Add documentation for memory tiering. It also covers its sysfs
interfaces and how demotion is performed based on memory tiers."

> +===========
> +Memory tiers
> +============
> +
> +This document describes explicit memory tiering support along with
> +demotion based on memory tiers.
> +

This causes htmldocs error, for which I have applied the fixup at [1].

> +Memory nodes are divided into 3 types of memory tiers with tier ID
> +value as shown based on their hardware characteristics.
> +
> +
> +MEMORY_TIER_HBM_GPU
> +MEMORY_TIER_DRAM
> +MEMORY_TIER_PMEM
> +

Use bullet list.

> +Sysfs interfaces
> +================
> +
> +Nodes belonging to specific tier can be read from,
> +/sys/devices/system/memtier/memtierN/nodelist (Read-Only)
> +
> +Where N is 0 - 2.

The "where" sentence can be compounded into the previous sentence above.

> +
> +Example 1:
> +For a system where Node 0 is CPU + DRAM nodes, Node 1 is HBM node,
> +node 2 is a PMEM node an ideal tier layout will be
> +
> +$ cat /sys/devices/system/memtier/memtier0/nodelist
> +1
> +$ cat /sys/devices/system/memtier/memtier1/nodelist
> +0
> +$ cat /sys/devices/system/memtier/memtier2/nodelist
> +2
> +

The code snippets should have been inside literal code blocks.

> +Example 2:
> +For a system where Node 0 & 1 are CPU + DRAM nodes, node 2 & 3 are PMEM
> +nodes.
> +
> +$ cat /sys/devices/system/memtier/memtier0/nodelist
> +cat: /sys/devices/system/memtier/memtier0/nodelist: No such file or
> +directory
> +$ cat /sys/devices/system/memtier/memtier1/nodelist
> +0-1
> +$ cat /sys/devices/system/memtier/memtier2/nodelist
> +2-3
> +

Use literal code block.

> +Default memory tier can be read from,
> +/sys/devices/system/memtier/default_tier (Read-Only)
> +
> +e.g.
> +$ cat /sys/devices/system/memtier/default_tier
> +memtier200
> +
> +Max memory tier ID supported can be read from,
> +/sys/devices/system/memtier/max_tier (Read-Only)
> +
> +e.g.
> +$ cat /sys/devices/system/memtier/max_tier
> +400
> +
> +Individual node's memory tier can be read of set using,
> +/sys/devices/system/node/nodeN/memtier (Read-Write)
> +
> +where N = node id
> +
> +When this interface is written, Node is moved from the old memory tier
> +to new memory tier and demotion targets for all N_MEMORY nodes are
> +built again.
> +
> +For example 1 mentioned above,
> +$ cat /sys/devices/system/node/node0/memtier
> +1
> +$ cat /sys/devices/system/node/node1/memtier
> +0
> +$ cat /sys/devices/system/node/node2/memtier
> +2
> +

The same suggestions above apply here, too.

> +Enable/Disable demotion
> +-----------------------
> +
> +By default demotion is disabled, it can be enabled/disabled using
> +below sysfs interface,
> +
> +$ echo 0/1 or false/true > /sys/kernel/mm/numa/demotion_enabled
> +

Use literal code block.

> +preferred and allowed demotion nodes
> +------------------------------------
> +
> +Preferred nodes for a specific N_MEMORY node are the best nodes
> +from the next possible lower memory tier. Allowed nodes for any
> +node are all the nodes available in all possible lower memory
> +tiers.
> +
> +Example:
> +
> +For a system where Node 0 & 1 are CPU + DRAM nodes, node 2 & 3 are PMEM
> +nodes,
> +
> +node distances:
> +node   0    1    2    3
> +   0  10   20   30   40
> +   1  20   10   40   30
> +   2  30   40   10   40
> +   3  40   30   40   10
> +

Use reST table.

> +memory_tiers[0] = <empty>
> +memory_tiers[1] = 0-1
> +memory_tiers[2] = 2-3
> +
> +node_demotion[0].preferred = 2
> +node_demotion[0].allowed   = 2, 3
> +node_demotion[1].preferred = 3
> +node_demotion[1].allowed   = 3, 2
> +node_demotion[2].preferred = <empty>
> +node_demotion[2].allowed   = <empty>
> +node_demotion[3].preferred = <empty>
> +node_demotion[3].allowed   = <empty>
> +

What are these above? Node properties? BTW, use literal code block.

If you don't understand these suggestions above, here is the diff:

---- >8 ----

diff --git a/Documentation/admin-guide/mm/memory-tiering.rst b/Documentation/admin-guide/mm/memory-tiering.rst
index 0a75e0dab1fd8e..10ec5aab6ddd53 100644
--- a/Documentation/admin-guide/mm/memory-tiering.rst
+++ b/Documentation/admin-guide/mm/memory-tiering.rst
@@ -14,13 +14,13 @@ Introduction
 
 Many systems have multiple types of memory devices e.g. GPU, DRAM and
 PMEM. The memory subsystem of these systems can be called a memory
-tiering system because the performance of the different types of
+tiering system because the performance of each type of
 memory is different. Memory tiers are defined based on the hardware
 capabilities of memory nodes. Each memory tier is assigned a tier ID
 value that determines the memory tier position in demotion order.
 
 The memory tier assignment of each node is independent of each
-other. Moving a node from one tier to another tier doesn't affect
+other. Moving a node from one tier to another doesn't affect
 the tier assignment of any other node.
 
 Memory tiers are used to build the demotion targets for nodes. A node
@@ -32,10 +32,9 @@ Memory tier rank
 Memory nodes are divided into 3 types of memory tiers with tier ID
 value as shown based on their hardware characteristics.
 
-
-MEMORY_TIER_HBM_GPU
-MEMORY_TIER_DRAM
-MEMORY_TIER_PMEM
+ * MEMORY_TIER_HBM_GPU
+ * MEMORY_TIER_DRAM
+ * MEMORY_TIER_PMEM
 
 Memory tiers initialization and (re)assignments
 ===============================================
@@ -49,68 +48,73 @@ hotplug, the memory tier with default tier ID is assigned to the memory node.
 Sysfs interfaces
 ================
 
-Nodes belonging to specific tier can be read from,
-/sys/devices/system/memtier/memtierN/nodelist (Read-Only)
+Nodes belonging to specific tier can be read from
+/sys/devices/system/memtier/memtierN/nodelist, where N is 0 - 2 (read-only)
 
-Where N is 0 - 2.
+Examples:
 
-Example 1:
-For a system where Node 0 is CPU + DRAM nodes, Node 1 is HBM node,
-node 2 is a PMEM node an ideal tier layout will be
+1. On a system where Node 0 is CPU + DRAM nodes, Node 1 is HBM node,
+   node 2 is a PMEM node an ideal tier layout will be:
 
-$ cat /sys/devices/system/memtier/memtier0/nodelist
-1
-$ cat /sys/devices/system/memtier/memtier1/nodelist
-0
-$ cat /sys/devices/system/memtier/memtier2/nodelist
-2
+   .. code-block::
 
-Example 2:
-For a system where Node 0 & 1 are CPU + DRAM nodes, node 2 & 3 are PMEM
-nodes.
+      $ cat /sys/devices/system/memtier/memtier0/nodelist
+      1
+      $ cat /sys/devices/system/memtier/memtier1/nodelist
+      0
+      $ cat /sys/devices/system/memtier/memtier2/nodelist
+      2
 
-$ cat /sys/devices/system/memtier/memtier0/nodelist
-cat: /sys/devices/system/memtier/memtier0/nodelist: No such file or
-directory
-$ cat /sys/devices/system/memtier/memtier1/nodelist
-0-1
-$ cat /sys/devices/system/memtier/memtier2/nodelist
-2-3
+2. On a system where Node 0 & 1 are CPU + DRAM nodes, node 2 & 3 are PMEM
+   nodes:
 
-Default memory tier can be read from,
-/sys/devices/system/memtier/default_tier (Read-Only)
+   .. code-block::
 
-e.g.
-$ cat /sys/devices/system/memtier/default_tier
-memtier200
+      $ cat /sys/devices/system/memtier/memtier0/nodelist
+      cat: /sys/devices/system/memtier/memtier0/nodelist: No such file or
+      directory
+      $ cat /sys/devices/system/memtier/memtier1/nodelist
+      0-1
+      $ cat /sys/devices/system/memtier/memtier2/nodelist
+      2-3
 
-Max memory tier ID supported can be read from,
-/sys/devices/system/memtier/max_tier (Read-Only)
+Default memory tier can be read from
+/sys/devices/system/memtier/default_tier (read-only), e.g.:
 
-e.g.
-$ cat /sys/devices/system/memtier/max_tier
-400
+.. code-block::
 
-Individual node's memory tier can be read of set using,
-/sys/devices/system/node/nodeN/memtier (Read-Write)
+   $ cat /sys/devices/system/memtier/default_tier
+   memtier200
 
-where N = node id
+Max memory tier ID supported can be read from
+/sys/devices/system/memtier/max_tier (read-only), e.g.:
 
-When this interface is written, Node is moved from the old memory tier
+.. code-block::
+
+   $ cat /sys/devices/system/memtier/max_tier
+   400
+
+Individual node's memory tier can be read or set using
+/sys/devices/system/node/nodeN/memtier (read-write), where N = node id.
+
+When this interface is written, node is moved from the old memory tier
 to new memory tier and demotion targets for all N_MEMORY nodes are
 built again.
 
-For example 1 mentioned above,
-$ cat /sys/devices/system/node/node0/memtier
-1
-$ cat /sys/devices/system/node/node1/memtier
-0
-$ cat /sys/devices/system/node/node2/memtier
-2
+For example 1 mentioned above:
+
+.. code-block::
+
+   $ cat /sys/devices/system/node/node0/memtier
+   1
+   $ cat /sys/devices/system/node/node1/memtier
+   0
+   $ cat /sys/devices/system/node/node2/memtier
+   2
 
 Additional memory tiers can be created by writing a tier ID value to this file.
-This results in a new memory tier creation and moving the specific NUMA node to
-that memory tier.
+This results into creating a new tier and moving the specific NUMA node to
+that tier.
 
 Demotion
 ========
@@ -128,19 +132,20 @@ be used.
 
 Instead of a page being discarded during reclaim, it can be moved to
 persistent memory. Allowing page migration during reclaim enables
-these systems to migrate pages from fast(higher) tiers to slow(lower)
-tiers when the fast(higher) tier is under pressure.
+these systems to migrate pages from fast (higher) tiers to slow (lower)
+tiers when the fast (higher) tier is under pressure.
 
 
 Enable/Disable demotion
 -----------------------
 
-By default demotion is disabled, it can be enabled/disabled using
-below sysfs interface,
+By default demotion is disabled. It can be toggled by:
 
-$ echo 0/1 or false/true > /sys/kernel/mm/numa/demotion_enabled
+.. code-block::
 
-preferred and allowed demotion nodes
+   $ echo 0/1 or false/true > /sys/kernel/mm/numa/demotion_enabled
+
+Preferred and allowed demotion nodes
 ------------------------------------
 
 Preferred nodes for a specific N_MEMORY node are the best nodes
@@ -148,35 +153,40 @@ from the next possible lower memory tier. Allowed nodes for any
 node are all the nodes available in all possible lower memory
 tiers.
 
-Example:
+For example, on a system where Node 0 & 1 are CPU + DRAM nodes,
+node 2 & 3 are PMEM nodes:
 
-For a system where Node 0 & 1 are CPU + DRAM nodes, node 2 & 3 are PMEM
-nodes,
+  * node distances
 
-node distances:
-node   0    1    2    3
-   0  10   20   30   40
-   1  20   10   40   30
-   2  30   40   10   40
-   3  40   30   40   10
+    ==== == == == ==
+    node 0  1  2  3
+    ==== == == == ==
+       0 10 20 30 40
+       1 20 10 40 30
+       2 30 40 10 40
+       3 40 30 40 10
+    ==== == == == ==
 
-memory_tiers[0] = <empty>
-memory_tiers[1] = 0-1
-memory_tiers[2] = 2-3
+  * node properties
 
-node_demotion[0].preferred = 2
-node_demotion[0].allowed   = 2, 3
-node_demotion[1].preferred = 3
-node_demotion[1].allowed   = 3, 2
-node_demotion[2].preferred = <empty>
-node_demotion[2].allowed   = <empty>
-node_demotion[3].preferred = <empty>
-node_demotion[3].allowed   = <empty>
+    .. code-block::
+
+       memory_tiers[0] = <empty>
+       memory_tiers[1] = 0-1
+       memory_tiers[2] = 2-3
+
+       node_demotion[0].preferred = 2
+       node_demotion[0].allowed   = 2, 3
+       node_demotion[1].preferred = 3
+       node_demotion[1].allowed   = 3, 2
+       node_demotion[2].preferred = <empty>
+       node_demotion[2].allowed   = <empty>
+       node_demotion[3].preferred = <empty>
+       node_demotion[3].allowed   = <empty>
 
 Memory allocation for demotion
 ------------------------------
 
-If a page needs to be demoted from any node, the kernel 1st tries
-to allocate a new page from the node's preferred node and fallbacks to
-node's allowed targets in allocation fallback order.
-
+If a page needs to be demoted from any node, the kernel first tries
+to allocate a new page from the node's preferred target node and fallbacks
+to node's allowed targets in allocation fallback order.


Thanks.

[1]: https://lore.kernel.org/linux-doc/YrZ5cTFOSuWxlF2t@debian.me/
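The `nodelist` files discussed in this review hold values such as `1`, `0-1` or `2-3`. The following is a hedged sketch of how a script might read them, not code from the series: `parse_nodelist` and `read_tier_nodes` are hypothetical helper names, the sysfs path is the one proposed in this patch and may not exist on a given kernel, and the comma-separated form (`0,2-3`) is an assumption borrowed from the usual kernel list format, since the thread only shows single values and simple ranges.

```python
from pathlib import Path


def parse_nodelist(text):
    """Expand a list such as "0-1" or "0,2-3" into [0, 1] or [0, 2, 3]."""
    nodes = []
    for part in text.strip().split(","):
        if not part:
            continue  # empty file or stray comma
        first, _, last = part.partition("-")
        nodes.extend(range(int(first), int(last or first) + 1))
    return nodes


def read_tier_nodes(tier, root="/sys/devices/system/memtier"):
    """Return the nodes in memtier<tier>, or None if that tier is absent.

    The default root is the path proposed in this series (an assumption
    about the running kernel, not a guaranteed interface).
    """
    path = Path(root) / f"memtier{tier}" / "nodelist"
    try:
        return parse_nodelist(path.read_text())
    except FileNotFoundError:
        return None


print(parse_nodelist("0-1\n"))
print(parse_nodelist("2"))
```

Returning `None` for a missing tier mirrors Example 2 in the quoted text, where reading `memtier0/nodelist` fails with "No such file or directory" because no node is assigned to that tier.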
Bagas Sanjaya <bagasdotme@gmail.com> writes:

> On Wed, Jun 22, 2022 at 01:55:12PM +0530, Aneesh Kumar K.V wrote:
>> From: Jagdish Gediya <jvgediya@linux.ibm.com>
>>
>
> Hi Aneesh and Jagdish,
>
> The documentation can be improved, see below.
>
>> All N_MEMORY nodes are divided into 3 memoty tiers with tier ID value
>> MEMORY_TIER_HBM_GPU, MEMORY_TIER_DRAM and MEMORY_TIER_PMEM. By default,
>> all nodes are assigned to default memory tier.
>>
>> Demotion path for all N_MEMORY nodes is prepared based on the tier ID value
>> of memory tiers.
>>
>> This patch adds documention for memory tiering introduction, its sysfs
>> interfaces and how demotion is performed based on memory tiers.
>>
>
> I think the patch message should just be:
> "Add documentation for memory tiering. It also covers its sysfs
> interfaces and how demotion is performed based on memory tiers."
>
>> +===========
>> +Memory tiers
>> +============
>> +
>> +This document describes explicit memory tiering support along with
>> +demotion based on memory tiers.
>> +
>
> This causes htmldocs error, for which I have applied the fixup at [1].
>
>> +Memory nodes are divided into 3 types of memory tiers with tier ID
>> +value as shown based on their hardware characteristics.
>> +
>> +
>> +MEMORY_TIER_HBM_GPU
>> +MEMORY_TIER_DRAM
>> +MEMORY_TIER_PMEM
>> +
>
> Use bullet list.
>
>> +Sysfs interfaces
>> +================
>> +
>> +Nodes belonging to specific tier can be read from,
>> +/sys/devices/system/memtier/memtierN/nodelist (Read-Only)
>> +
>> +Where N is 0 - 2.
>
> The "where" sentence can be compounded into the previous sentence above.
>
>> +
>> +Example 1:
>> +For a system where Node 0 is CPU + DRAM nodes, Node 1 is HBM node,
>> +node 2 is a PMEM node an ideal tier layout will be
>> +
>> +$ cat /sys/devices/system/memtier/memtier0/nodelist
>> +1
>> +$ cat /sys/devices/system/memtier/memtier1/nodelist
>> +0
>> +$ cat /sys/devices/system/memtier/memtier2/nodelist
>> +2
>> +
>
> The code snippets should have been inside literal code blocks.
>
>> +Example 2:
>> +For a system where Node 0 & 1 are CPU + DRAM nodes, node 2 & 3 are PMEM
>> +nodes.
>> +
>> +$ cat /sys/devices/system/memtier/memtier0/nodelist
>> +cat: /sys/devices/system/memtier/memtier0/nodelist: No such file or
>> +directory
>> +$ cat /sys/devices/system/memtier/memtier1/nodelist
>> +0-1
>> +$ cat /sys/devices/system/memtier/memtier2/nodelist
>> +2-3
>> +
>
> Use literal code block.
>
>> +Default memory tier can be read from,
>> +/sys/devices/system/memtier/default_tier (Read-Only)
>> +
>> +e.g.
>> +$ cat /sys/devices/system/memtier/default_tier
>> +memtier200
>> +
>> +Max memory tier ID supported can be read from,
>> +/sys/devices/system/memtier/max_tier (Read-Only)
>> +
>> +e.g.
>> +$ cat /sys/devices/system/memtier/max_tier
>> +400
>> +
>> +Individual node's memory tier can be read of set using,
>> +/sys/devices/system/node/nodeN/memtier (Read-Write)
>> +
>> +where N = node id
>> +
>> +When this interface is written, Node is moved from the old memory tier
>> +to new memory tier and demotion targets for all N_MEMORY nodes are
>> +built again.
>> +
>> +For example 1 mentioned above,
>> +$ cat /sys/devices/system/node/node0/memtier
>> +1
>> +$ cat /sys/devices/system/node/node1/memtier
>> +0
>> +$ cat /sys/devices/system/node/node2/memtier
>> +2
>> +
>
> The same suggestions above apply here, too.
>
>> +Enable/Disable demotion
>> +-----------------------
>> +
>> +By default demotion is disabled, it can be enabled/disabled using
>> +below sysfs interface,
>> +
>> +$ echo 0/1 or false/true > /sys/kernel/mm/numa/demotion_enabled
>> +
>
> Use literal code block.
>
>> +preferred and allowed demotion nodes
>> +------------------------------------
>> +
>> +Preferred nodes for a specific N_MEMORY node are the best nodes
>> +from the next possible lower memory tier. Allowed nodes for any
>> +node are all the nodes available in all possible lower memory
>> +tiers.
>> +
>> +Example:
>> +
>> +For a system where Node 0 & 1 are CPU + DRAM nodes, node 2 & 3 are PMEM
>> +nodes,
>> +
>> +node distances:
>> +node   0    1    2    3
>> +   0  10   20   30   40
>> +   1  20   10   40   30
>> +   2  30   40   10   40
>> +   3  40   30   40   10
>> +
>
> Use reST table.
>
>> +memory_tiers[0] = <empty>
>> +memory_tiers[1] = 0-1
>> +memory_tiers[2] = 2-3
>> +
>> +node_demotion[0].preferred = 2
>> +node_demotion[0].allowed   = 2, 3
>> +node_demotion[1].preferred = 3
>> +node_demotion[1].allowed   = 3, 2
>> +node_demotion[2].preferred = <empty>
>> +node_demotion[2].allowed   = <empty>
>> +node_demotion[3].preferred = <empty>
>> +node_demotion[3].allowed   = <empty>
>> +
>
> What are these above? Node properties? BTW, use literal code block.
>
> If you don't understand these suggestions above, here is the diff:

I got with the below diff.

patch: **** malformed patch at line 180: @@ -148,35 +153,40 @@ from the next possible lower memory tier. Allowed nodes for any

But I did modify the documentation based on your feedback and it is much
better than what I had. Thanks for the review. I will send v8 with the
changes folded.

I did add the below to commit message. Hope that is ok.

[update doc format by Bagas Sanjaya <bagasdotme@gmail.com>]

>
> ---- >8 ----
>
> diff --git a/Documentation/admin-guide/mm/memory-tiering.rst b/Documentation/admin-guide/mm/memory-tiering.rst
> index 0a75e0dab1fd8e..10ec5aab6ddd53 100644
> --- a/Documentation/admin-guide/mm/memory-tiering.rst
> +++ b/Documentation/admin-guide/mm/memory-tiering.rst
> @@ -14,13 +14,13 @@ Introduction
>
>  Many systems have multiple types of memory devices e.g. GPU, DRAM and
>  PMEM. The memory subsystem of these systems can be called a memory
> -tiering system because the performance of the different types of
> +tiering system because the performance of each type of
>  memory is different. Memory tiers are defined based on the hardware
>  capabilities of memory nodes. Each memory tier is assigned a tier ID
>  value that determines the memory tier position in demotion order.
>
>  The memory tier assignment of each node is independent of each
> -other. Moving a node from one tier to another tier doesn't affect
> +other. Moving a node from one tier to another doesn't affect
>  the tier assignment of any other node.
>
>  Memory tiers are used to build the demotion targets for nodes. A node
> @@ -32,10 +32,9 @@ Memory tier rank
>  Memory nodes are divided into 3 types of memory tiers with tier ID
>  value as shown based on their hardware characteristics.
>
> -
> -MEMORY_TIER_HBM_GPU
> -MEMORY_TIER_DRAM
> -MEMORY_TIER_PMEM
> + * MEMORY_TIER_HBM_GPU
> + * MEMORY_TIER_DRAM
> + * MEMORY_TIER_PMEM
>
>  Memory tiers initialization and (re)assignments
>  ===============================================
> @@ -49,68 +48,73 @@ hotplug, the memory tier with default tier ID is assigned to the memory node.
>  Sysfs interfaces
>  ================
>
> -Nodes belonging to specific tier can be read from,
> -/sys/devices/system/memtier/memtierN/nodelist (Read-Only)
> +Nodes belonging to specific tier can be read from
> +/sys/devices/system/memtier/memtierN/nodelist, where N is 0 - 2 (read-only)
>
> -Where N is 0 - 2.
> +Examples:
>
> -Example 1:
> -For a system where Node 0 is CPU + DRAM nodes, Node 1 is HBM node,
> -node 2 is a PMEM node an ideal tier layout will be
> +1. On a system where Node 0 is CPU + DRAM nodes, Node 1 is HBM node,
> +   node 2 is a PMEM node an ideal tier layout will be:
>
> -$ cat /sys/devices/system/memtier/memtier0/nodelist
> -1
> -$ cat /sys/devices/system/memtier/memtier1/nodelist
> -0
> -$ cat /sys/devices/system/memtier/memtier2/nodelist
> -2
> +   .. code-block::
>
> -Example 2:
> -For a system where Node 0 & 1 are CPU + DRAM nodes, node 2 & 3 are PMEM
> -nodes.
> +      $ cat /sys/devices/system/memtier/memtier0/nodelist
> +      1
> +      $ cat /sys/devices/system/memtier/memtier1/nodelist
> +      0
> +      $ cat /sys/devices/system/memtier/memtier2/nodelist
> +      2
>
> -$ cat /sys/devices/system/memtier/memtier0/nodelist
> -cat: /sys/devices/system/memtier/memtier0/nodelist: No such file or
> -directory
> -$ cat /sys/devices/system/memtier/memtier1/nodelist
> -0-1
> -$ cat /sys/devices/system/memtier/memtier2/nodelist
> -2-3
> +2. On a system where Node 0 & 1 are CPU + DRAM nodes, node 2 & 3 are PMEM
> +   nodes:
>
> -Default memory tier can be read from,
> -/sys/devices/system/memtier/default_tier (Read-Only)
> +   .. code-block::
>
> -e.g.
> -$ cat /sys/devices/system/memtier/default_tier
> -memtier200
> +      $ cat /sys/devices/system/memtier/memtier0/nodelist
> +      cat: /sys/devices/system/memtier/memtier0/nodelist: No such file or
> +      directory
> +      $ cat /sys/devices/system/memtier/memtier1/nodelist
> +      0-1
> +      $ cat /sys/devices/system/memtier/memtier2/nodelist
> +      2-3
>
> -Max memory tier ID supported can be read from,
> -/sys/devices/system/memtier/max_tier (Read-Only)
> +Default memory tier can be read from
> +/sys/devices/system/memtier/default_tier (read-only), e.g.:
>
> -e.g.
> -$ cat /sys/devices/system/memtier/max_tier
> -400
> +.. code-block::
>
> -Individual node's memory tier can be read of set using,
> -/sys/devices/system/node/nodeN/memtier (Read-Write)
> +   $ cat /sys/devices/system/memtier/default_tier
> +   memtier200
>
> -where N = node id
> +Max memory tier ID supported can be read from
> +/sys/devices/system/memtier/max_tier (read-only), e.g.:
>
> -When this interface is written, Node is moved from the old memory tier
> +.. code-block::
> +
> +   $ cat /sys/devices/system/memtier/max_tier
> +   400
> +
> +Individual node's memory tier can be read or set using
> +/sys/devices/system/node/nodeN/memtier (read-write), where N = node id.
> +
> +When this interface is written, node is moved from the old memory tier
>  to new memory tier and demotion targets for all N_MEMORY nodes are
>  built again.
>
> -For example 1 mentioned above,
> -$ cat /sys/devices/system/node/node0/memtier
> -1
> -$ cat /sys/devices/system/node/node1/memtier
> -0
> -$ cat /sys/devices/system/node/node2/memtier
> -2
> +For example 1 mentioned above:
> +
> +.. code-block::
> +
> +   $ cat /sys/devices/system/node/node0/memtier
> +   1
> +   $ cat /sys/devices/system/node/node1/memtier
> +   0
> +   $ cat /sys/devices/system/node/node2/memtier
> +   2
>
>  Additional memory tiers can be created by writing a tier ID value to this file.
> -This results in a new memory tier creation and moving the specific NUMA node to
> -that memory tier.
> +This results into creating a new tier and moving the specific NUMA node to
> +that tier.
>
>  Demotion
>  ========
> @@ -128,19 +132,20 @@ be used.
>
>  Instead of a page being discarded during reclaim, it can be moved to
>  persistent memory. Allowing page migration during reclaim enables
> -these systems to migrate pages from fast(higher) tiers to slow(lower)
> -tiers when the fast(higher) tier is under pressure.
> +these systems to migrate pages from fast (higher) tiers to slow (lower)
> +tiers when the fast (higher) tier is under pressure.
>
>
>  Enable/Disable demotion
>  -----------------------
>
> -By default demotion is disabled, it can be enabled/disabled using
> -below sysfs interface,
> +By default demotion is disabled. It can be toggled by:
>
> -$ echo 0/1 or false/true > /sys/kernel/mm/numa/demotion_enabled
> +.. code-block::
>
> -preferred and allowed demotion nodes
> +   $ echo 0/1 or false/true > /sys/kernel/mm/numa/demotion_enabled
> +
> +Preferred and allowed demotion nodes
>  ------------------------------------
>
>  Preferred nodes for a specific N_MEMORY node are the best nodes
> @@ -148,35 +153,40 @@ from the next possible lower memory tier. Allowed nodes for any
>  node are all the nodes available in all possible lower memory
>  tiers.
>
> -Example:
> +For example, on a system where Node 0 & 1 are CPU + DRAM nodes,
> +node 2 & 3 are PMEM nodes:
>
> -For a system where Node 0 & 1 are CPU + DRAM nodes, node 2 & 3 are PMEM
> -nodes,
> +  * node distances
>
> -node distances:
> -node   0    1    2    3
> -   0  10   20   30   40
> -   1  20   10   40   30
> -   2  30   40   10   40
> -   3  40   30   40   10
> +    ==== == == == ==
> +    node 0  1  2  3
> +    ==== == == == ==
> +       0 10 20 30 40
> +       1 20 10 40 30
> +       2 30 40 10 40
> +       3 40 30 40 10
> +    ==== == == == ==
>
> -memory_tiers[0] = <empty>
> -memory_tiers[1] = 0-1
> -memory_tiers[2] = 2-3
> +  * node properties
>
> -node_demotion[0].preferred = 2
> -node_demotion[0].allowed   = 2, 3
> -node_demotion[1].preferred = 3
> -node_demotion[1].allowed   = 3, 2
> -node_demotion[2].preferred = <empty>
> -node_demotion[2].allowed   = <empty>
> -node_demotion[3].preferred = <empty>
> -node_demotion[3].allowed   = <empty>
> +    .. code-block::
> +
> +       memory_tiers[0] = <empty>
> +       memory_tiers[1] = 0-1
> +       memory_tiers[2] = 2-3
> +
> +       node_demotion[0].preferred = 2
> +       node_demotion[0].allowed   = 2, 3
> +       node_demotion[1].preferred = 3
> +       node_demotion[1].allowed   = 3, 2
> +       node_demotion[2].preferred = <empty>
> +       node_demotion[2].allowed   = <empty>
> +       node_demotion[3].preferred = <empty>
> +       node_demotion[3].allowed   = <empty>
>
>  Memory allocation for demotion
>  ------------------------------
>
> -If a page needs to be demoted from any node, the kernel 1st tries
> -to allocate a new page from the node's preferred node and fallbacks to
> -node's allowed targets in allocation fallback order.
> -
> +If a page needs to be demoted from any node, the kernel first tries
> +to allocate a new page from the node's preferred target node and fallbacks
> +to node's allowed targets in allocation fallback order.
>
>
> Thanks.
>
> [1]: https://lore.kernel.org/linux-doc/YrZ5cTFOSuWxlF2t@debian.me/
>
> --
> An old man doll... just what I always wanted! - Clara
On Wed, Jun 22, 2022 at 2:04 PM Aneesh Kumar K.V
<aneesh.kumar@linux.ibm.com> wrote:
>
> From: Jagdish Gediya <jvgediya@linux.ibm.com>
>
> All N_MEMORY nodes are divided into 3 memoty tiers with tier ID value

s /memoty/ memory

> MEMORY_TIER_HBM_GPU, MEMORY_TIER_DRAM and MEMORY_TIER_PMEM. By default,
> all nodes are assigned to default memory tier.

I think adding the default memory tier name will be helpful.

>
> Demotion path for all N_MEMORY nodes is prepared based on the tier ID value
> of memory tiers.
>
> This patch adds documention for memory tiering introduction, its sysfs
> interfaces and how demotion is performed based on memory tiers.
>
> Suggested-by: Wei Xu <weixugc@google.com>
> Signed-off-by: Jagdish Gediya <jvgediya@linux.ibm.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  Documentation/admin-guide/mm/index.rst        |   1 +
>  .../admin-guide/mm/memory-tiering.rst         | 182 ++++++++++++++++++
>  2 files changed, 183 insertions(+)
>  create mode 100644 Documentation/admin-guide/mm/memory-tiering.rst
>
> diff --git a/Documentation/admin-guide/mm/index.rst b/Documentation/admin-guide/mm/index.rst
> index c21b5823f126..3f211cbca8c3 100644
> --- a/Documentation/admin-guide/mm/index.rst
> +++ b/Documentation/admin-guide/mm/index.rst
> @@ -32,6 +32,7 @@ the Linux memory management.
>     idle_page_tracking
>     ksm
>     memory-hotplug
> +   memory-tiering
>     nommu-mmap
>     numa_memory_policy
>     numaperf
> diff --git a/Documentation/admin-guide/mm/memory-tiering.rst b/Documentation/admin-guide/mm/memory-tiering.rst
> new file mode 100644
> index 000000000000..142c36651f5d
> --- /dev/null
> +++ b/Documentation/admin-guide/mm/memory-tiering.rst
> @@ -0,0 +1,182 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +.. _admin_guide_memory_tiering:
> +
> +===========
> +Memory tiers
> +============
> +
> +This document describes explicit memory tiering support along with
> +demotion based on memory tiers.
> +
> +Introduction
> +============
> +
> +Many systems have multiple types of memory devices e.g. GPU, DRAM and
> +PMEM. The memory subsystem of these systems can be called a memory
> +tiering system because the performance of the different types of
> +memory is different. Memory tiers are defined based on the hardware
> +capabilities of memory nodes. Each memory tier is assigned a tier ID
> +value that determines the memory tier position in demotion order.
> +
> +The memory tier assignment of each node is independent of each
> +other. Moving a node from one tier to another tier doesn't affect
> +the tier assignment of any other node.
> +
> +Memory tiers are used to build the demotion targets for nodes. A node
> +can demote its pages to any node of any lower tiers.
> +
> +Memory tier rank
> +=================
> +
> +Memory nodes are divided into 3 types of memory tiers with tier ID
> +value as shown based on their hardware characteristics.
> +
> +
> +MEMORY_TIER_HBM_GPU
> +MEMORY_TIER_DRAM
> +MEMORY_TIER_PMEM
> +
> +Memory tiers initialization and (re)assignments
> +===============================================
> +
> +By default, all nodes are assigned to the memory tier with the default tier ID
> +DEFAULT_MEMORY_TIER which is 200 (MEMORY_TIER_DRAM). The memory tier of
> +the memory node can be either modified through sysfs or from the driver. On
> +hotplug, the memory tier with default tier ID is assigned to the memory node.
> +
> +
> +Sysfs interfaces
> +================
> +
> +Nodes belonging to specific tier can be read from,
> +/sys/devices/system/memtier/memtierN/nodelist (Read-Only)
> +
> +Where N is 0 - 2.
> +
> +Example 1:
> +For a system where Node 0 is CPU + DRAM nodes, Node 1 is HBM node,
> +node 2 is a PMEM node an ideal tier layout will be
> +
> +$ cat /sys/devices/system/memtier/memtier0/nodelist
> +1
> +$ cat /sys/devices/system/memtier/memtier1/nodelist
> +0
> +$ cat /sys/devices/system/memtier/memtier2/nodelist
> +2
> +
> +Example 2:
> +For a system where Node 0 & 1 are CPU + DRAM nodes, node 2 & 3 are PMEM
> +nodes.
> +
> +$ cat /sys/devices/system/memtier/memtier0/nodelist
> +cat: /sys/devices/system/memtier/memtier0/nodelist: No such file or
> +directory
> +$ cat /sys/devices/system/memtier/memtier1/nodelist
> +0-1
> +$ cat /sys/devices/system/memtier/memtier2/nodelist
> +2-3
> +
> +Default memory tier can be read from,
> +/sys/devices/system/memtier/default_tier (Read-Only)
> +
> +e.g.
> +$ cat /sys/devices/system/memtier/default_tier
> +memtier200
> +
> +Max memory tier ID supported can be read from,
> +/sys/devices/system/memtier/max_tier (Read-Only)
> +
> +e.g.
> +$ cat /sys/devices/system/memtier/max_tier
> +400
> +
> +Individual node's memory tier can be read of set using,
> +/sys/devices/system/node/nodeN/memtier (Read-Write)
> +
> +where N = node id
> +
> +When this interface is written, Node is moved from the old memory tier
> +to new memory tier and demotion targets for all N_MEMORY nodes are
> +built again.
> +
> +For example 1 mentioned above,
> +$ cat /sys/devices/system/node/node0/memtier
> +1
> +$ cat /sys/devices/system/node/node1/memtier
> +0
> +$ cat /sys/devices/system/node/node2/memtier
> +2
> +
> +Additional memory tiers can be created by writing a tier ID value to this file.
> +This results in a new memory tier creation and moving the specific NUMA node to
> +that memory tier.
> +
> +Demotion
> +========
> +
> +In a system with DRAM and persistent memory, once DRAM
> +fills up, reclaim will start and some of the DRAM contents will be
> +thrown out even if there is a space in persistent memory.
> +Consequently, allocations will, at some point, start falling over to the slower
> +persistent memory.
> +
> +That has two nasty properties. First, the newer allocations can end up in
> +the slower persistent memory. Second, reclaimed data in DRAM are just
> +discarded even if there are gobs of space in persistent memory that could
> +be used.
> +
> +Instead of a page being discarded during reclaim, it can be moved to
> +persistent memory. Allowing page migration during reclaim enables
> +these systems to migrate pages from fast(higher) tiers to slow(lower)
> +tiers when the fast(higher) tier is under pressure.
> +
> +
> +Enable/Disable demotion
> +-----------------------
> +
> +By default demotion is disabled, it can be enabled/disabled using
> +below sysfs interface,
> +
> +$ echo 0/1 or false/true > /sys/kernel/mm/numa/demotion_enabled
> +
> +preferred and allowed demotion nodes
> +------------------------------------
> +
> +Preferred nodes for a specific N_MEMORY node are the best nodes
> +from the next possible lower memory tier. Allowed nodes for any
> +node are all the nodes available in all possible lower memory
> +tiers.
> +
> +Example:
> +
> +For a system where Node 0 & 1 are CPU + DRAM nodes, node 2 & 3 are PMEM
> +nodes,
> +
> +node distances:
> +node   0    1    2    3
> +   0  10   20   30   40
> +   1  20   10   40   30
> +   2  30   40   10   40
> +   3  40   30   40   10
> +
> +memory_tiers[0] = <empty>
> +memory_tiers[1] = 0-1
> +memory_tiers[2] = 2-3
> +
> +node_demotion[0].preferred = 2
> +node_demotion[0].allowed   = 2, 3
> +node_demotion[1].preferred = 3
> +node_demotion[1].allowed   = 3, 2
> +node_demotion[2].preferred = <empty>
> +node_demotion[2].allowed   = <empty>
> +node_demotion[3].preferred = <empty>
> +node_demotion[3].allowed   = <empty>
> +
> +Memory allocation for demotion
> +------------------------------
> +
> +If a page needs to be demoted from any node, the kernel 1st tries
> +to allocate a new page from the node's preferred node and fallbacks to
> +node's allowed targets in allocation fallback order.
> +
> --
> 2.36.1
>
>
diff --git a/Documentation/admin-guide/mm/index.rst b/Documentation/admin-guide/mm/index.rst
index c21b5823f126..3f211cbca8c3 100644
--- a/Documentation/admin-guide/mm/index.rst
+++ b/Documentation/admin-guide/mm/index.rst
@@ -32,6 +32,7 @@ the Linux memory management.
    idle_page_tracking
    ksm
    memory-hotplug
+   memory-tiering
    nommu-mmap
    numa_memory_policy
    numaperf
diff --git a/Documentation/admin-guide/mm/memory-tiering.rst b/Documentation/admin-guide/mm/memory-tiering.rst
new file mode 100644
index 000000000000..142c36651f5d
--- /dev/null
+++ b/Documentation/admin-guide/mm/memory-tiering.rst
@@ -0,0 +1,182 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _admin_guide_memory_tiering:
+
+===========
+Memory tiers
+============
+
+This document describes explicit memory tiering support along with
+demotion based on memory tiers.
+
+Introduction
+============
+
+Many systems have multiple types of memory devices e.g. GPU, DRAM and
+PMEM. The memory subsystem of these systems can be called a memory
+tiering system because the performance of the different types of
+memory is different. Memory tiers are defined based on the hardware
+capabilities of memory nodes. Each memory tier is assigned a tier ID
+value that determines the memory tier position in demotion order.
+
+The memory tier assignment of each node is independent of each
+other. Moving a node from one tier to another tier doesn't affect
+the tier assignment of any other node.
+
+Memory tiers are used to build the demotion targets for nodes. A node
+can demote its pages to any node of any lower tiers.
+
+Memory tier rank
+=================
+
+Memory nodes are divided into 3 types of memory tiers with tier ID
+value as shown based on their hardware characteristics.
+
+
+MEMORY_TIER_HBM_GPU
+MEMORY_TIER_DRAM
+MEMORY_TIER_PMEM
+
+Memory tiers initialization and (re)assignments
+===============================================
+
+By default, all nodes are assigned to the memory tier with the default tier ID
+DEFAULT_MEMORY_TIER which is 200 (MEMORY_TIER_DRAM). The memory tier of
+the memory node can be either modified through sysfs or from the driver. On
+hotplug, the memory tier with default tier ID is assigned to the memory node.
+
+
+Sysfs interfaces
+================
+
+Nodes belonging to specific tier can be read from,
+/sys/devices/system/memtier/memtierN/nodelist (Read-Only)
+
+Where N is 0 - 2.
+
+Example 1:
+For a system where Node 0 is CPU + DRAM nodes, Node 1 is HBM node,
+node 2 is a PMEM node an ideal tier layout will be
+
+$ cat /sys/devices/system/memtier/memtier0/nodelist
+1
+$ cat /sys/devices/system/memtier/memtier1/nodelist
+0
+$ cat /sys/devices/system/memtier/memtier2/nodelist
+2
+
+Example 2:
+For a system where Node 0 & 1 are CPU + DRAM nodes, node 2 & 3 are PMEM
+nodes.
+
+$ cat /sys/devices/system/memtier/memtier0/nodelist
+cat: /sys/devices/system/memtier/memtier0/nodelist: No such file or
+directory
+$ cat /sys/devices/system/memtier/memtier1/nodelist
+0-1
+$ cat /sys/devices/system/memtier/memtier2/nodelist
+2-3
+
+Default memory tier can be read from,
+/sys/devices/system/memtier/default_tier (Read-Only)
+
+e.g.
+$ cat /sys/devices/system/memtier/default_tier
+memtier200
+
+Max memory tier ID supported can be read from,
+/sys/devices/system/memtier/max_tier (Read-Only)
+
+e.g.
+$ cat /sys/devices/system/memtier/max_tier
+400
+
+Individual node's memory tier can be read or set using,
+/sys/devices/system/node/nodeN/memtier (Read-Write)
+
+where N = node id
+
+When this interface is written, Node is moved from the old memory tier
+to new memory tier and demotion targets for all N_MEMORY nodes are
+built again.
+
+For example 1 mentioned above,
+$ cat /sys/devices/system/node/node0/memtier
+1
+$ cat /sys/devices/system/node/node1/memtier
+0
+$ cat /sys/devices/system/node/node2/memtier
+2
+
+Additional memory tiers can be created by writing a tier ID value to this file.
+This results in a new memory tier creation and moving the specific NUMA node to
+that memory tier.
+
+Demotion
+========
+
+In a system with DRAM and persistent memory, once DRAM
+fills up, reclaim will start and some of the DRAM contents will be
+thrown out even if there is a space in persistent memory.
+Consequently, allocations will, at some point, start falling over to the slower
+persistent memory.
+
+That has two nasty properties. First, the newer allocations can end up in
+the slower persistent memory. Second, reclaimed data in DRAM are just
+discarded even if there are gobs of space in persistent memory that could
+be used.
+
+Instead of a page being discarded during reclaim, it can be moved to
+persistent memory. Allowing page migration during reclaim enables
+these systems to migrate pages from fast(higher) tiers to slow(lower)
+tiers when the fast(higher) tier is under pressure.
+
+
+Enable/Disable demotion
+-----------------------
+
+By default demotion is disabled, it can be enabled/disabled using
+below sysfs interface,
+
+$ echo 0/1 or false/true > /sys/kernel/mm/numa/demotion_enabled
+
+preferred and allowed demotion nodes
+------------------------------------
+
+Preferred nodes for a specific N_MEMORY node are the best nodes
+from the next possible lower memory tier. Allowed nodes for any
+node are all the nodes available in all possible lower memory
+tiers.
+
+Example:
+
+For a system where Node 0 & 1 are CPU + DRAM nodes, node 2 & 3 are PMEM
+nodes,
+
+node distances:
+node   0    1    2    3
+   0  10   20   30   40
+   1  20   10   40   30
+   2  30   40   10   40
+   3  40   30   40   10
+
+memory_tiers[0] = <empty>
+memory_tiers[1] = 0-1
+memory_tiers[2] = 2-3
+
+node_demotion[0].preferred = 2
+node_demotion[0].allowed = 2, 3
+node_demotion[1].preferred = 3
+node_demotion[1].allowed = 3, 2
+node_demotion[2].preferred = <empty>
+node_demotion[2].allowed = <empty>
+node_demotion[3].preferred = <empty>
+node_demotion[3].allowed = <empty>
+
+Memory allocation for demotion
+------------------------------
+
+If a page needs to be demoted from any node, the kernel first tries
+to allocate a new page from the node's preferred node and falls back to
+the node's allowed targets in allocation fallback order.
+
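For readers following the "preferred and allowed demotion nodes" example in the patch, the rule can be modelled in a few lines of Python. This is an illustrative sketch only, not the kernel implementation. It assumes that "best" means smallest NUMA distance, that the "next possible lower memory tier" is the nearest non-empty tier with a larger index in memory_tiers[], and that allowed targets are ordered by distance. The names build_demotion_targets, TIERS and DISTANCE are hypothetical and do not appear in the patch.

```python
# Illustrative model only; not the kernel's code.
# Assumptions: "best" = smallest NUMA distance; a larger tier index is a
# lower (slower) tier; allowed targets are ordered by distance, then node id.

# Node distance matrix from the example (DISTANCE[src][dst]).
DISTANCE = [
    [10, 20, 30, 40],
    [20, 10, 40, 30],
    [30, 40, 10, 40],
    [40, 30, 40, 10],
]

# memory_tiers[] from the example: tier 0 is empty, tier 1 = DRAM, tier 2 = PMEM.
TIERS = [[], [0, 1], [2, 3]]


def build_demotion_targets(tiers, distance):
    """Return {node: {"preferred": [...], "allowed": [...]}} for every node."""
    targets = {}
    for index, nodes in enumerate(tiers):
        # Non-empty tiers below the current one, nearest tier first.
        lower_tiers = [tier for tier in tiers[index + 1:] if tier]
        for node in nodes:
            if not lower_tiers:
                # Nodes in the lowest populated tier have nowhere to demote to.
                targets[node] = {"preferred": [], "allowed": []}
                continue
            next_tier = lower_tiers[0]
            best = min(distance[node][n] for n in next_tier)
            # Preferred: closest node(s) in the next lower tier.
            preferred = [n for n in next_tier if distance[node][n] == best]
            # Allowed: every node in every lower tier, closest first.
            allowed = sorted(
                (n for tier in lower_tiers for n in tier),
                key=lambda n: (distance[node][n], n),
            )
            targets[node] = {"preferred": preferred, "allowed": allowed}
    return targets


if __name__ == "__main__":
    for node, entry in sorted(build_demotion_targets(TIERS, DISTANCE).items()):
        print(node, entry["preferred"], entry["allowed"])
```

With the example data, this model gives preferred [2] and allowed [2, 3] for node 0, preferred [3] and allowed [3, 2] for node 1, and empty lists for nodes 2 and 3, which matches the node_demotion[] values listed in the patch.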