| Message ID | 1622005302-23027-2-git-send-email-feng.tang@intel.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | mm/mempolicy: some fix and semantics cleanup |
On Wed 26-05-21 13:01:39, Feng Tang wrote:
> mempolicy_nodemask_intersects() is used in the oom case to check if a
> task may have memory allocated on some memory nodes.
>
> Currently, nodes_intersects() is run for both 'bind' and 'interleave'
> policies. But they are different regarding memory allocation: the
> nodemask is a hard requirement for 'bind', while just a hint for
> 'interleave'. Like in alloc_pages_vma():
>
> 	nmask = policy_nodemask(gfp, pol);
> 	preferred_nid = policy_node(gfp, pol, node);
> 	page = __alloc_pages(gfp, order, preferred_nid, nmask);
>
> In policy_nodemask(), only the 'bind' policy may return its desired
> nodemask, while the others return NULL. This 'NULL' allows the
> 'interleave' policy to get memory from nodes other than its nodemask.
>
> So skip the nodemask intersect check for the 'interleave' policy.

The changelog is not really clear on the actual effect of the patch, and
the above reference to alloc_pages_vma() looks misleading to me because
that path is never called for the interleaved policy. This is very
likely my fault because I was rather vague.

The existing code in its current form is confusing but it _works_
properly. The problem is that it sounds like a general helper, and in
that regard the function is correct for the interleaved policy and your
proposed preferred-many. But its only existing caller wants a different
semantic.

Until now this was not a real problem even for the OOM context, because
alloc_page_interleave() is always used for the interleaving policy and
that one doesn't use any nodemask, so the code is not really exercised.
With your MPOL_PREFERRED this would no longer be the case.

Your patch makes the code more robust for the oom context, but it can
confuse other users who really want intersect logic. So I think it would
really be best to rename the function and make it oom specific, e.g.
mempolicy_in_oom_domain(tsk, mask); this would make it clear that this
is not a general purpose function.

The changelog should be clear that this is just a code cleanup rather
than a fix.
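For reference, the policy_nodemask() behavior discussed above boils down
to roughly the following. This is a simplified sketch (the real kernel
function also applies zone and cpuset validity checks), and the
`_sketch` suffix marks it as illustrative rather than the actual code:

	#include <linux/mempolicy.h>

	/*
	 * Simplified sketch of policy_nodemask(): only MPOL_BIND hands its
	 * nodemask to the page allocator as a hard filter; every other
	 * policy returns NULL, which leaves the allocator free to fall back
	 * to any node. (The real function also has zone and cpuset checks.)
	 */
	static nodemask_t *policy_nodemask_sketch(gfp_t gfp, struct mempolicy *policy)
	{
		if (policy->mode == MPOL_BIND)
			return &policy->v.nodes;	/* hard requirement */

		return NULL;	/* hint only: no allocator-side node filter */
	}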
On Thu, May 27, 2021 at 09:30:00AM +0200, Michal Hocko wrote:
> On Wed 26-05-21 13:01:39, Feng Tang wrote:
> > mempolicy_nodemask_intersects() is used in the oom case to check if a
> > task may have memory allocated on some memory nodes.
> >
> > Currently, nodes_intersects() is run for both 'bind' and 'interleave'
> > policies. But they are different regarding memory allocation: the
> > nodemask is a hard requirement for 'bind', while just a hint for
> > 'interleave'. Like in alloc_pages_vma():
> >
> > 	nmask = policy_nodemask(gfp, pol);
> > 	preferred_nid = policy_node(gfp, pol, node);
> > 	page = __alloc_pages(gfp, order, preferred_nid, nmask);
> >
> > In policy_nodemask(), only the 'bind' policy may return its desired
> > nodemask, while the others return NULL. This 'NULL' allows the
> > 'interleave' policy to get memory from nodes other than its nodemask.
> >
> > So skip the nodemask intersect check for the 'interleave' policy.
>
> The changelog is not really clear on the actual effect of the patch, and
> the above reference to alloc_pages_vma() looks misleading to me because
> that path is never called for the interleaved policy.

You are right, thanks for pointing it out. Only the 'bind' policy calls
policy_nodemask() and gets its preset nodemask, while for 'interleave',
alloc_page_interleave() calls __alloc_pages() with a NULL nodemask. So
the conclusion is the same: the 'bind' policy can only get memory from
its preset nodemask, while 'interleave' can get memory from all nodes.

> This is very likely my fault because I was rather vague. The existing
> code in its current form is confusing but it _works_ properly. The
> problem is that it sounds like a general helper, and in that regard
> the function is correct for the interleaved policy and your proposed
> preferred-many. But its only existing caller wants a different
> semantic.
>
> Until now this was not a real problem even for the OOM context, because
> alloc_page_interleave() is always used for the interleaving policy and
> that one doesn't use any nodemask, so the code is not really exercised.
> With your MPOL_PREFERRED this would no longer be the case.

Given the 'interleave' task may have memory allocated from all nodes,
shouldn't mempolicy_nodemask_intersects() return true for 'interleave'?
Or am I still missing something?

> Your patch makes the code more robust for the oom context, but it can
> confuse other users who really want intersect logic. So I think it would
> really be best to rename the function and make it oom specific, e.g.
> mempolicy_in_oom_domain(tsk, mask); this would make it clear that this
> is not a general purpose function.

Ok, will rename like this.

Thanks,
Feng

> The changelog should be clear that this is just a code cleanup rather
> than a fix.
>
> --
> Michal Hocko
> SUSE Labs
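To make the point about the interleave path concrete, it is roughly the
following. This is a minimal sketch using the __alloc_pages() signature
quoted earlier in the thread, with the real alloc_page_interleave()'s
NUMA statistics bookkeeping omitted; `alloc_page_interleave_sketch` is
an illustrative name, not the kernel symbol:

	#include <linux/gfp.h>

	/*
	 * Sketch of the interleave allocation path: the interleave policy is
	 * consulted only to pick the target node id (nid); the nodemask
	 * argument to __alloc_pages() stays NULL, so no node is excluded and
	 * the allocation may fall back to any node under memory pressure.
	 */
	static struct page *alloc_page_interleave_sketch(gfp_t gfp,
							 unsigned int order,
							 unsigned int nid)
	{
		return __alloc_pages(gfp, order, nid, NULL);
	}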
On Thu 27-05-21 21:05:01, Feng Tang wrote:
> On Thu, May 27, 2021 at 09:30:00AM +0200, Michal Hocko wrote:
[...]
> > Until now this was not a real problem even for the OOM context, because
> > alloc_page_interleave() is always used for the interleaving policy and
> > that one doesn't use any nodemask, so the code is not really exercised.
> > With your MPOL_PREFERRED this would no longer be the case.
>
> Given the 'interleave' task may have memory allocated from all nodes,
> shouldn't mempolicy_nodemask_intersects() return true for 'interleave'?
> Or am I still missing something?

Well, if you go with the renaming, then it should be quite obvious that
any policy which is not a hard binding should return true.
On Thu, May 27, 2021 at 03:15:04PM +0200, Michal Hocko wrote:
> On Thu 27-05-21 21:05:01, Feng Tang wrote:
> > On Thu, May 27, 2021 at 09:30:00AM +0200, Michal Hocko wrote:
> [...]
> > > Until now this was not a real problem even for the OOM context, because
> > > alloc_page_interleave() is always used for the interleaving policy and
> > > that one doesn't use any nodemask, so the code is not really exercised.
> > > With your MPOL_PREFERRED this would no longer be the case.
> >
> > Given the 'interleave' task may have memory allocated from all nodes,
> > shouldn't mempolicy_nodemask_intersects() return true for 'interleave'?
> > Or am I still missing something?
>
> Well, if you go with the renaming, then it should be quite obvious that
> any policy which is not a hard binding should return true.

Ok, will do the rename. Thanks for clarifying!

- Feng

> --
> Michal Hocko
> SUSE Labs
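Putting the agreed rename together with the simplification in the diff
below, the oom-specific helper might read as follows. This is a sketch
assuming the name from Michal's suggestion and the body from the patch,
not necessarily the exact form that finally lands:

	#include <linux/mempolicy.h>
	#include <linux/sched/task.h>

	/*
	 * Sketch of the proposed oom-specific helper: only a hard 'bind'
	 * policy restricts which nodes the task's memory may live on; the
	 * default, 'interleave', 'preferred' and 'local' policies can all
	 * fall back to other nodes, so they conservatively return true.
	 */
	bool mempolicy_in_oom_domain(struct task_struct *tsk,
				     const nodemask_t *mask)
	{
		struct mempolicy *mempolicy;
		bool ret = true;

		if (!mask)
			return ret;

		task_lock(tsk);
		mempolicy = tsk->mempolicy;
		if (mempolicy && mempolicy->mode == MPOL_BIND)
			ret = nodes_intersects(mempolicy->v.nodes, *mask);
		task_unlock(tsk);

		return ret;
	}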
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index d79fa29..1964cca 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2098,7 +2098,7 @@ bool init_nodemask_of_mempolicy(nodemask_t *mask)
  *
  * If tsk's mempolicy is "default" [NULL], return 'true' to indicate default
  * policy. Otherwise, check for intersection between mask and the policy
- * nodemask for 'bind' or 'interleave' policy. For 'preferred' or 'local'
+ * nodemask for 'bind' policy. For 'interleave', 'preferred' or 'local'
  * policy, always return true since it may allocate elsewhere on fallback.
  *
  * Takes task_lock(tsk) to prevent freeing of its mempolicy.
@@ -2111,29 +2111,13 @@ bool mempolicy_nodemask_intersects(struct task_struct *tsk,
 
 	if (!mask)
 		return ret;
+
 	task_lock(tsk);
 	mempolicy = tsk->mempolicy;
-	if (!mempolicy)
-		goto out;
-
-	switch (mempolicy->mode) {
-	case MPOL_PREFERRED:
-		/*
-		 * MPOL_PREFERRED and MPOL_F_LOCAL are only preferred nodes to
-		 * allocate from, they may fallback to other nodes when oom.
-		 * Thus, it's possible for tsk to have allocated memory from
-		 * nodes in mask.
-		 */
-		break;
-	case MPOL_BIND:
-	case MPOL_INTERLEAVE:
+	if (mempolicy && mempolicy->mode == MPOL_BIND)
 		ret = nodes_intersects(mempolicy->v.nodes, *mask);
-		break;
-	default:
-		BUG();
-	}
-out:
 	task_unlock(tsk);
+
 	return ret;
 }
 
mempolicy_nodemask_intersects() is used in the oom case to check if a
task may have memory allocated on some memory nodes.

Currently, nodes_intersects() is run for both 'bind' and 'interleave'
policies. But they are different regarding memory allocation: the
nodemask is a hard requirement for 'bind', while just a hint for
'interleave'. Like in alloc_pages_vma():

	nmask = policy_nodemask(gfp, pol);
	preferred_nid = policy_node(gfp, pol, node);
	page = __alloc_pages(gfp, order, preferred_nid, nmask);

In policy_nodemask(), only the 'bind' policy may return its desired
nodemask, while the others return NULL. This 'NULL' allows the
'interleave' policy to get memory from nodes other than its nodemask.

So skip the nodemask intersect check for the 'interleave' policy.

Suggested-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Feng Tang <feng.tang@intel.com>
---
 mm/mempolicy.c | 24 ++++--------------------
 1 file changed, 4 insertions(+), 20 deletions(-)
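For context on how the helper under discussion is consumed, the sketch
below mirrors the shape of its only caller on the oom side,
oom_cpuset_eligible() in mm/oom_kill.c, which walks a candidate victim's
threads during a mempolicy- or cpuset-constrained OOM. It is a
simplified illustration (the memcg short-circuit is omitted and the
function name carries a `_sketch` suffix), not the verbatim kernel code:

	#include <linux/cpuset.h>
	#include <linux/mempolicy.h>
	#include <linux/sched/signal.h>

	/*
	 * Simplified sketch of the oom-side caller: for a mempolicy
	 * constrained oom (oom_nodemask != NULL), a task is eligible only if
	 * its mempolicy says it may have memory on the oom nodes; otherwise
	 * the cpuset mems of the task are compared with the current task's.
	 */
	static bool oom_task_eligible_sketch(struct task_struct *start,
					     const nodemask_t *oom_nodemask)
	{
		struct task_struct *tsk;
		bool ret = false;

		rcu_read_lock();
		for_each_thread(start, tsk) {
			if (oom_nodemask)
				ret = mempolicy_nodemask_intersects(tsk, oom_nodemask);
			else
				ret = cpuset_mems_allowed_intersects(current, tsk);
			if (ret)
				break;
		}
		rcu_read_unlock();

		return ret;
	}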