
mm/mempolicy: Fix decision-making issues for memory migration during NUMA balancing

Message ID tencent_57D6CF437AF88E48DD5C5BD872753C43280A@qq.com
State New
Series mm/mempolicy: Fix decision-making issues for memory migration during NUMA balancing

Commit Message

Junjie Fu Nov. 23, 2024, 7:09 p.m. UTC
When handling a page fault caused by NUMA balancing (do_numa_page()), the
kernel must decide whether to migrate the page to another node or keep it
on its current node. For pages with the MPOL_PREFERRED memory policy, it is
sufficient to check whether the first node set in the nodemask is the node
where the page currently resides. If it is, the page should stay where it
is; otherwise, migration to that first node should be attempted.

MPOL_PREFERRED is documented as follows: "This mode sets the
preferred node for allocation. The kernel will try to allocate pages from
this node first and fall back to nearby nodes if the preferred node is low
on free memory. If the nodemask specifies more than one node ID, the first
node in the mask will be selected as the preferred node."

Thus, if the page does not reside on the first node in the nodemask, it is
not on the preferred node, and migration can be attempted.

However, the original code only checks whether the current node is set in
the nodemask, without checking whether it is the first node in the mask.
As a result, when the page resides on a node that is in the nodemask but is
not the first one, the code incorrectly decides not to attempt migration to
the preferred node.

This behavior is clearly incorrect. If the page's current node and the
candidate target node are both in the nodemask but neither is the first
node, they should be treated with the same priority, and migration should
still be attempted.

Signed-off-by: Junjie Fu <fujunjie1@qq.com>
---
 mm/mempolicy.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Matthew Wilcox Nov. 23, 2024, 10:15 p.m. UTC | #1
On Sun, Nov 24, 2024 at 03:09:35AM +0800, Junjie Fu wrote:
> Because the definition of MPOL_PREFERRED is as follows: "This mode sets the
> preferred node for allocation. The kernel will try to allocate pages from
> this node first and fall back to nearby nodes if the preferred node is low
> on free memory. If the nodemask specifies more than one node ID, the first
> node in the mask will be selected as the preferred node."
> 
> Thus, if the node where the current page resides is not the first node in
> the nodemask, it is not the PREFERRED node, and memory migration can be
> attempted.

I think you've found poor documentation, not a kernel bug.  If multiple
nodes are set in PREFERRED, then _new_ allocations should come from the
first node, but _existing_ allocations do not need to be moved to the
new node.  At least IMO that was the original intent of allowing
multiple nodes to be set.  Otherwise, what is the point?

Patch

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index bb37cd1a51d8..3454dfc7da8d 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2769,7 +2769,7 @@  int mpol_misplaced(struct folio *folio, struct vm_fault *vmf,
 		break;
 
 	case MPOL_PREFERRED:
-		if (node_isset(curnid, pol->nodes))
+		if (curnid == first_node(pol->nodes))
 			goto out;
 		polnid = first_node(pol->nodes);
 		break;