Message ID | 1594728512-18969-1-git-send-email-laoar.shao@gmail.com
---|---
State | New, archived
Series | mm, oom: check memcg margin for parallel oom
On Tue 14-07-20 08:08:32, Yafang Shao wrote:
> Commit 7775face2079 ("memcg: killed threads should not invoke memcg OOM
> killer") resolves the problem of different threads in a multi-threaded
> task doing parallel memcg oom, but it doesn't solve the problem of
> different tasks doing parallel memcg oom.
>
> It may happen that many different tasks in the same memcg are waiting on
> oom_lock at the same time. If one of them has already made progress and
> freed enough available memory, the others don't need to trigger the oom
> killer again. Checking the memcg margin after taking oom_lock helps
> achieve this.

While the changelog makes sense, I believe it can be improved. I would
use something like the following. Feel free to use its parts.

"
Memcg oom killer invocation is synchronized by the global oom_lock and
tasks are sleeping on the lock while somebody is selecting the victim or
potentially racing with the oom_reaper releasing the victim's memory.
This can result in a pointless oom killer invocation because a waiter
might be racing with the oom_reaper:

    P1                      oom_reaper              P2
                            oom_reap_task           mutex_lock(oom_lock)
                                                    out_of_memory
                                                    # no victim because we have one already
                            __oom_reap_task_mm      mutex_unlock(oom_lock)
    mutex_lock(oom_lock)
                            set MMF_OOM_SKIP
    select_bad_process
    # finds a new victim

The page allocator prevents this race by trying to allocate after the
lock can be acquired (in __alloc_pages_may_oom), which acts as a last
minute check. Moreover, the page allocator simply doesn't block on the
oom_lock and retries the whole reclaim process instead.

The memcg oom killer should do the last minute check as well. Call
mem_cgroup_margin to do that. A trylock on the oom_lock could be done as
well but this doesn't seem to be necessary at this stage.
"

> Suggested-by: Michal Hocko <mhocko@kernel.org>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
> Cc: David Rientjes <rientjes@google.com>
> ---
>  mm/memcontrol.c | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 1962232..df141e1 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1560,16 +1560,31 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
>  		.gfp_mask = gfp_mask,
>  		.order = order,
>  	};
> -	bool ret;
> +	bool ret = true;
>
>  	if (mutex_lock_killable(&oom_lock))
>  		return true;
> +
>  	/*
>  	 * A few threads which were not waiting at mutex_lock_killable() can
>  	 * fail to bail out. Therefore, check again after holding oom_lock.
>  	 */
> -	ret = should_force_charge() || out_of_memory(&oc);
> +	if (should_force_charge())
> +		goto out;
> +
> +	/*
> +	 * Different tasks may be doing parallel oom, so after holding the
> +	 * oom_lock the task should check the memcg margin again to see
> +	 * whether another task has already made progress.
> +	 */
> +	if (mem_cgroup_margin(memcg) >= (1 << order))
> +		goto out;

Is there any reason why you simply haven't done this? (+ your comment,
which is helpful)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 248e6cad0095..2c176825efe3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1561,15 +1561,21 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
 		.order = order,
 		.chosen_points = LONG_MIN,
 	};
-	bool ret;
+	bool ret = true;
 
-	if (mutex_lock_killable(&oom_lock))
+	if (!mutex_trylock(&oom_lock))
 		return true;
+
+	if (mem_cgroup_margin(memcg) >= (1 << order))
+		goto unlock;
+
 	/*
 	 * A few threads which were not waiting at mutex_lock_killable() can
 	 * fail to bail out. Therefore, check again after holding oom_lock.
 	 */
 	ret = should_force_charge() || out_of_memory(&oc);
+
+unlock:
 	mutex_unlock(&oom_lock);
 	return ret;
 }

> +
> +	ret = out_of_memory(&oc);
> +
> +out:
>  	mutex_unlock(&oom_lock);
> +
>  	return ret;
>  }
>
> --
> 1.8.3.1
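For context, the page allocator's "last minute check" that the suggested
changelog refers to lives in __alloc_pages_may_oom(). A simplified sketch
of its shape (abridged from the v5.8-era mm/page_alloc.c; exact flags and
details vary by kernel version):

static inline struct page *
__alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
		      const struct alloc_context *ac,
		      unsigned long *did_some_progress)
{
	struct page *page;

	*did_some_progress = 0;

	/*
	 * The page allocator never sleeps on oom_lock: if somebody else
	 * holds it, assume that holder is making progress and let the
	 * caller retry the whole reclaim/allocation cycle instead.
	 */
	if (!mutex_trylock(&oom_lock)) {
		*did_some_progress = 1;
		schedule_timeout_uninterruptible(1);
		return NULL;
	}

	/*
	 * The last minute check: go through the zonelist once more under
	 * the lock, with a very high watermark, to catch a parallel oom
	 * kill that already freed memory while we were taking the lock.
	 */
	page = get_page_from_freelist((gfp_mask | __GFP_HARDWALL) &
				      ~__GFP_DIRECT_RECLAIM, order,
				      ALLOC_WMARK_HIGH | ALLOC_CPUSET, ac);
	if (page)
		goto out;

	/* ... otherwise build the oom_control and call out_of_memory() ... */

out:
	mutex_unlock(&oom_lock);
	return page;
}

In the memcg charge path there is no freelist to probe, so
mem_cgroup_margin() plays the role of get_page_from_freelist() here: a
cheap recheck of available room before committing to a kill.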
And btw. "memcg, oom: check memcg margin for parallel oom" would be a better fit because this is not really related to the global oom.
On 2020/07/14 21:37, Michal Hocko wrote:
> -	if (mutex_lock_killable(&oom_lock))
> +	if (!mutex_trylock(&oom_lock))
>  		return true;

I don't like this change. The trylock needlessly wastes CPU time which
could have been utilized by the OOM killer/reaper for reclaiming memory.
Rather, I want to change

-	if (!mutex_trylock(&oom_lock)) {
+	if (mutex_lock_killable(&oom_lock)) {

on the __alloc_pages_may_oom() side.
On Tue 14-07-20 21:47:06, Tetsuo Handa wrote:
> On 2020/07/14 21:37, Michal Hocko wrote:
> > -	if (mutex_lock_killable(&oom_lock))
> > +	if (!mutex_trylock(&oom_lock))
> >  		return true;
>
> I don't like this change. The trylock needlessly wastes CPU time which
> could have been utilized by the OOM killer/reaper for reclaiming memory.

Ohh, that was an unintended hunk which stayed there from the previous
discussion. With the margin check the lock itself can stay.

> Rather, I want to change
>
> -	if (!mutex_trylock(&oom_lock)) {
> +	if (mutex_lock_killable(&oom_lock)) {
>
> on the __alloc_pages_may_oom() side.

Please do not start a tangent discussion again. If you want to make this
change then post a patch with a justification.
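For readers following the locking back-and-forth, the semantic difference
between the two primitives is roughly the following (a hedged sketch of
standard kernel mutex semantics; demo_lock is a hypothetical stand-in for
the global oom_lock, this is not code from either patch):

#include <linux/mutex.h>

static DEFINE_MUTEX(demo_lock);

/*
 * Variant 1: mutex_trylock() never sleeps.  It returns 1 if the lock
 * was acquired and 0 if somebody else holds it, so the caller can bail
 * out immediately and assume the holder is making progress.
 */
static bool trylock_variant(void)
{
	if (!mutex_trylock(&demo_lock))
		return true;	/* somebody else is handling the oom */
	/* ... do the oom work ... */
	mutex_unlock(&demo_lock);
	return false;
}

/*
 * Variant 2: mutex_lock_killable() sleeps (consuming no CPU, which is
 * Tetsuo's point above) until the lock is acquired, returning 0, or
 * until the task receives a fatal signal, returning nonzero.
 */
static bool killable_lock_variant(void)
{
	if (mutex_lock_killable(&demo_lock))
		return true;	/* killed while waiting for the lock */
	/* ... do the oom work ... */
	mutex_unlock(&demo_lock);
	return false;
}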
On Tue, Jul 14, 2020 at 8:37 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Tue 14-07-20 08:08:32, Yafang Shao wrote:
[...]
> > +	/*
> > +	 * Different tasks may be doing parallel oom, so after holding the
> > +	 * oom_lock the task should check the memcg margin again to see
> > +	 * whether another task has already made progress.
> > +	 */
> > +	if (mem_cgroup_margin(memcg) >= (1 << order))
> > +		goto out;
>
> Is there any reason why you simply haven't done this? (+ your comment,
> which is helpful)
>

No strong reason. I just think that threads of a multi-threaded task are
more likely to oom in parallel, so I checked it first. I can change it
as you suggested below, as it is simpler.

> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 248e6cad0095..2c176825efe3 100644
[...]
> -	if (mutex_lock_killable(&oom_lock))
> +	if (!mutex_trylock(&oom_lock))
>  		return true;
> +
> +	if (mem_cgroup_margin(memcg) >= (1 << order))
> +		goto unlock;
[...]
>
> --
> Michal Hocko
> SUSE Labs
On Tue, Jul 14, 2020 at 8:38 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> And btw. "memcg, oom: check memcg margin for parallel oom" would be a
> better fit because this is not really related to the global oom.

Sure, I will do it.
On Tue 14-07-20 21:25:04, Yafang Shao wrote:
> On Tue, Jul 14, 2020 at 8:37 PM Michal Hocko <mhocko@kernel.org> wrote:
[...]
> > > @@ -1560,16 +1560,31 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
> > >  		.gfp_mask = gfp_mask,
> > >  		.order = order,
> > >  	};
> > > -	bool ret;
> > > +	bool ret = true;
> > >
> > >  	if (mutex_lock_killable(&oom_lock))
> > >  		return true;
> > > +
> > >  	/*
> > >  	 * A few threads which were not waiting at mutex_lock_killable() can
> > >  	 * fail to bail out. Therefore, check again after holding oom_lock.
> > >  	 */
> > > -	ret = should_force_charge() || out_of_memory(&oc);
> > > +	if (should_force_charge())
> > > +		goto out;
> > > +
> > > +	/*
> > > +	 * Different tasks may be doing parallel oom, so after holding the
> > > +	 * oom_lock the task should check the memcg margin again to see
> > > +	 * whether another task has already made progress.
> > > +	 */
> > > +	if (mem_cgroup_margin(memcg) >= (1 << order))
> > > +		goto out;
> >
> > Is there any reason why you simply haven't done this? (+ your comment,
> > which is helpful)
> >
>
> No strong reason. I just think that threads of a multi-threaded task are
> more likely to oom in parallel, so I checked it first. I can change it
> as you suggested below, as it is simpler.

I would rather go with simplicity. This is a super slow path so the
ordering of checks shouldn't matter much (if at all).
On Tue, Jul 14, 2020 at 9:34 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Tue 14-07-20 21:25:04, Yafang Shao wrote:
> > On Tue, Jul 14, 2020 at 8:37 PM Michal Hocko <mhocko@kernel.org> wrote:
> [...]
> > No strong reason. I just think that threads of a multi-threaded task are
> > more likely to oom in parallel, so I checked it first. I can change it
> > as you suggested below, as it is simpler.
>
> I would rather go with simplicity. This is a super slow path so the
> ordering of checks shouldn't matter much (if at all).
>

Sure.
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 1962232..df141e1 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1560,16 +1560,31 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
 		.gfp_mask = gfp_mask,
 		.order = order,
 	};
-	bool ret;
+	bool ret = true;
 
 	if (mutex_lock_killable(&oom_lock))
 		return true;
+
 	/*
 	 * A few threads which were not waiting at mutex_lock_killable() can
 	 * fail to bail out. Therefore, check again after holding oom_lock.
 	 */
-	ret = should_force_charge() || out_of_memory(&oc);
+	if (should_force_charge())
+		goto out;
+
+	/*
+	 * Different tasks may be doing parallel oom, so after holding the
+	 * oom_lock the task should check the memcg margin again to see
+	 * whether another task has already made progress.
+	 */
+	if (mem_cgroup_margin(memcg) >= (1 << order))
+		goto out;
+
+	ret = out_of_memory(&oc);
+
+out:
 	mutex_unlock(&oom_lock);
+
 	return ret;
 }
Commit 7775face2079 ("memcg: killed threads should not invoke memcg OOM
killer") resolves the problem of different threads in a multi-threaded
task doing parallel memcg oom, but it doesn't solve the problem of
different tasks doing parallel memcg oom.

It may happen that many different tasks in the same memcg are waiting on
oom_lock at the same time. If one of them has already made progress and
freed enough available memory, the others don't need to trigger the oom
killer again. Checking the memcg margin after taking oom_lock helps
achieve this.

Suggested-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: David Rientjes <rientjes@google.com>
---
 mm/memcontrol.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)
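For reference, the margin check added above compares the number of pages
that can still be charged below the memcg limit against the size of the
current charge request (1 << order pages). mem_cgroup_margin() looks
roughly like this (a sketch based on the v5.8-era mm/memcontrol.c;
details may differ across kernel versions):

/*
 * Return the number of pages that can still be charged to @memcg
 * before it hits its limit.  The oom path treats a margin of at least
 * 1 << order pages as "somebody already made progress, no kill needed".
 */
static unsigned long mem_cgroup_margin(struct mem_cgroup *memcg)
{
	unsigned long margin = 0;
	unsigned long count;
	unsigned long limit;

	count = page_counter_read(&memcg->memory);
	limit = READ_ONCE(memcg->memory.max);
	if (count < limit)
		margin = limit - count;

	/* with swap accounting, the memsw counter may be the tighter one */
	if (do_memsw_account()) {
		count = page_counter_read(&memcg->memsw);
		limit = READ_ONCE(memcg->memsw.max);
		if (count < limit)
			margin = min(margin, limit - count);
		else
			margin = 0;
	}

	return margin;
}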