Message ID | tencent_57D6CF437AF88E48DD5C5BD872753C43280A@qq.com (mailing list archive) |
---|---|
State | New |
Series | mm/mempolicy: Fix decision-making issues for memory migration during NUMA balancing |
On Sun, Nov 24, 2024 at 03:09:35AM +0800, Junjie Fu wrote:
> Because the definition of MPOL_PREFERRED is as follows: "This mode sets the
> preferred node for allocation. The kernel will try to allocate pages from
> this node first and fall back to nearby nodes if the preferred node is low
> on free memory. If the nodemask specifies more than one node ID, the first
> node in the mask will be selected as the preferred node."
>
> Thus, if the node where the current page resides is not the first node in
> the nodemask, it is not the PREFERRED node, and memory migration can be
> attempted.

I think you've found poor documentation, not a kernel bug.

If multiple nodes are set in PREFERRED, then _new_ allocations should come
from the first node, but _existing_ allocations do not need to be moved to
the new node.

At least IMO that was the original intent of allowing multiple nodes to be
set. Otherwise, what is the point?
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index bb37cd1a51d8..3454dfc7da8d 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2769,7 +2769,7 @@ int mpol_misplaced(struct folio *folio, struct vm_fault *vmf,
 		break;
 
 	case MPOL_PREFERRED:
-		if (node_isset(curnid, pol->nodes))
+		if (curnid == first_node(pol->nodes))
 			goto out;
 		polnid = first_node(pol->nodes);
 		break;
When handling a page fault caused by NUMA balancing (do_numa_page), it is
necessary to decide whether to migrate the current page to another node or
keep it on its current node. For pages with the MPOL_PREFERRED memory
policy, it is sufficient to check whether the first node set in the
nodemask is the same as the node where the page is currently located. If
this is the case, the page should remain in its current state. Otherwise,
migration to another node should be attempted.

Because the definition of MPOL_PREFERRED is as follows: "This mode sets the
preferred node for allocation. The kernel will try to allocate pages from
this node first and fall back to nearby nodes if the preferred node is low
on free memory. If the nodemask specifies more than one node ID, the first
node in the mask will be selected as the preferred node."

Thus, if the node where the current page resides is not the first node in
the nodemask, it is not the PREFERRED node, and memory migration can be
attempted.

However, in the original code, the check only verifies whether the current
node exists in the nodemask (which may or may not be the first node in the
mask). This could lead to a scenario where, if the current node is not the
first node in the nodemask, the code incorrectly decides not to attempt
migration to other nodes.

This behavior is clearly incorrect. If the target node for migration and
the page's current NUMA node are both within the nodemask but neither is
the first node, they should be treated with the same priority, and
migration attempts should proceed.

Signed-off-by: Junjie Fu <fujunjie1@qq.com>
---
 mm/mempolicy.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
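
To make the disagreement concrete, the two checks can be modelled in userspace. This is only a sketch, not kernel code: it assumes a nodemask can be represented as a 64-bit bitmask, that node IDs are in the range 0..63, and it covers only the MPOL_PREFERRED case of the switch, ignoring everything else mpol_misplaced() does. The names nodemask, NO_NODE, mask_has_node, mask_first_node, preferred_target_existing and preferred_target_proposed are hypothetical stand-ins chosen for illustration.

```c
#include <stdint.h>

/* Hypothetical sentinel meaning "leave the page where it is". */
#define NO_NODE (-1)

/* Hypothetical stand-in for nodemask_t: bit n set means node n is in the mask. */
typedef uint64_t nodemask;

/* Models node_isset(): nonzero if nid (assumed 0..63) is in the mask. */
static inline int mask_has_node(nodemask mask, int nid)
{
	return (int)((mask >> nid) & 1u);
}

/* Models first_node(): lowest node ID in the mask, or NO_NODE if the mask is empty. */
static inline int mask_first_node(nodemask mask)
{
	for (int nid = 0; nid < 64; nid++)
		if (mask_has_node(mask, nid))
			return nid;
	return NO_NODE;
}

/*
 * Existing check: a page on any node in the mask is treated as correctly
 * placed. Returns the migration target, or NO_NODE to keep the page.
 */
static inline int preferred_target_existing(int curnid, nodemask mask)
{
	if (mask_has_node(mask, curnid))
		return NO_NODE;
	return mask_first_node(mask);
}

/*
 * Proposed check: only a page on the first node in the mask is treated as
 * correctly placed. Returns the migration target, or NO_NODE to keep the page.
 */
static inline int preferred_target_proposed(int curnid, nodemask mask)
{
	int first = mask_first_node(mask);

	if (curnid == first)
		return NO_NODE;
	return first;
}
```

With a mask of nodes {1, 2}, both versions agree for a page on node 0 (migrate to node 1) and for a page on node 1 (keep). They differ only for a page on node 2: the existing check keeps it in place, while the proposed check selects node 1 as the migration target. That single case is what the reply above argues should not trigger migration.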