Message ID | 01370f70-e1f6-ebe4-b95e-0df21a0bc15e@i-love.sakura.ne.jp (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] memcg: killed threads should not invoke memcg OOM killer | expand |
On Tue 15-01-19 19:17:27, Tetsuo Handa wrote: > From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > > If $N > $M, a single process with $N threads in a memcg group can easily > kill all $M processes in that memcg group, for mem_cgroup_out_of_memory() > does not check if current thread needs to invoke the memcg OOM killer. > > T1@P1 |T2...$N@P1|P2...$M |OOM reaper > ----------+----------+----------+---------- > # all sleeping > try_charge() > mem_cgroup_out_of_memory() > mutex_lock(oom_lock) > try_charge() > mem_cgroup_out_of_memory() > mutex_lock(oom_lock) > out_of_memory() > select_bad_process() > oom_kill_process(P1) > wake_oom_reaper() > oom_reap_task() # ignores P1 > mutex_unlock(oom_lock) > out_of_memory() > select_bad_process(P2...$M) > # all killed by T2...$N@P1 > wake_oom_reaper() > oom_reap_task() # ignores P2...$M > mutex_unlock(oom_lock) > > We don't need to invoke the memcg OOM killer if current thread was killed > when waiting for oom_lock, for mem_cgroup_oom_synchronize(true) can count > on try_charge() when mem_cgroup_oom_synchronize(true) can not make forward > progress because try_charge() allows already killed/exiting threads to > make forward progress, and memory_max_write() can bail out upon signals. > > At first Michal thought that fatal signal check is racy compared to > tsk_is_oom_victim() check. But an experiment showed that trying to call > mark_oom_victim() on all killed thread groups is more racy than fatal > signal check due to task_will_free_mem(current) path in out_of_memory(). > > Therefore, this patch changes mem_cgroup_out_of_memory() to bail out upon > should_force_charge() == T rather than upon fatal_signal_pending() == T, > for should_force_charge() == T && signal_pending(current) == F at > memory_max_write() can't happen because current thread won't call > memory_max_write() after getting PF_EXITING. The changelog is too cryptic IMHO. Can we make it something as simple as " If a memory cgroup contains a single process with many threads (including different process group sharing the mm) then it is possible to trigger a race when the oom killer complains that there are no oom elible tasks and complain into the log which is both annoying and confusing because there is no actual problem. The race looks as follows: P1 oom_reaper P2 try_charge try_charge mem_cgroup_out_of_memory mutex_lock(oom_lock) out_of_memory oom_kill_process(P1,P2) wake_oom_reaper mutex_unlock(oom_lock) oom_reap_task mutex_lock(oom_lock) select_bad_process # no victim The problem is more visible with many threads. Fix this by checking for fatal_signal_pending from mem_cgroup_out_of_memory when the oom_lock is already held. The oom bypass is safe because we do the same early in the try_charge path already. The situation migh have changed in the mean time. It should be safe to check for fatal_signal_pending and tsk_is_oom_victim but for a better code readability abstract the current charge bypass condition into should_force_charge and reuse it from that path. " > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Anyway, to the change Acked-by: Michal Hocko <mhocko@suse.com> > --- > mm/memcontrol.c | 19 ++++++++++++++----- > 1 file changed, 14 insertions(+), 5 deletions(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index af7f18b..79a7d2a 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -248,6 +248,12 @@ enum res_type { > iter != NULL; \ > iter = mem_cgroup_iter(NULL, iter, NULL)) > > +static inline bool should_force_charge(void) > +{ > + return tsk_is_oom_victim(current) || fatal_signal_pending(current) || > + (current->flags & PF_EXITING); > +} > + > /* Some nice accessors for the vmpressure. */ > struct vmpressure *memcg_to_vmpressure(struct mem_cgroup *memcg) > { > @@ -1389,8 +1395,13 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask, > }; > bool ret; > > - mutex_lock(&oom_lock); > - ret = out_of_memory(&oc); > + if (mutex_lock_killable(&oom_lock)) > + return true; > + /* > + * A few threads which were not waiting at mutex_lock_killable() can > + * fail to bail out. Therefore, check again after holding oom_lock. > + */ > + ret = should_force_charge() || out_of_memory(&oc); > mutex_unlock(&oom_lock); > return ret; > } > @@ -2209,9 +2220,7 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, > * bypass the last charges so that they can exit quickly and > * free their memory. > */ > - if (unlikely(tsk_is_oom_victim(current) || > - fatal_signal_pending(current) || > - current->flags & PF_EXITING)) > + if (unlikely(should_force_charge())) > goto force; > > /* > -- > 1.8.3.1
diff --git a/mm/memcontrol.c b/mm/memcontrol.c index af7f18b..79a7d2a 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -248,6 +248,12 @@ enum res_type { iter != NULL; \ iter = mem_cgroup_iter(NULL, iter, NULL)) +static inline bool should_force_charge(void) +{ + return tsk_is_oom_victim(current) || fatal_signal_pending(current) || + (current->flags & PF_EXITING); +} + /* Some nice accessors for the vmpressure. */ struct vmpressure *memcg_to_vmpressure(struct mem_cgroup *memcg) { @@ -1389,8 +1395,13 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask, }; bool ret; - mutex_lock(&oom_lock); - ret = out_of_memory(&oc); + if (mutex_lock_killable(&oom_lock)) + return true; + /* + * A few threads which were not waiting at mutex_lock_killable() can + * fail to bail out. Therefore, check again after holding oom_lock. + */ + ret = should_force_charge() || out_of_memory(&oc); mutex_unlock(&oom_lock); return ret; } @@ -2209,9 +2220,7 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, * bypass the last charges so that they can exit quickly and * free their memory. */ - if (unlikely(tsk_is_oom_victim(current) || - fatal_signal_pending(current) || - current->flags & PF_EXITING)) + if (unlikely(should_force_charge())) goto force; /*