| Message ID | 20190107143802.16847-3-mhocko@kernel.org (mailing list archive) |
|---|---|
| State | New, archived |
| Series | oom, memcg: do not report racy no-eligible OOM |
On 2019/01/07 23:38, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
>
> Tetsuo has reported [1] that a single process group memcg might easily
> swamp the log with no-eligible oom victim reports due to race between
> the memcg charge and oom_reaper

This explanation is outdated. I reported that one memcg OOM killer can
kill all processes in that memcg. I expect the changelog to be updated.

>
> Thread 1                Thread2         oom_reaper
> try_charge              try_charge
>                           mem_cgroup_out_of_memory
>                             mutex_lock(oom_lock)
>   mem_cgroup_out_of_memory
>     mutex_lock(oom_lock)
>                             out_of_memory
>                               select_bad_process
>                               oom_kill_process(current)
>                               wake_oom_reaper
>                                                 oom_reap_task
>                                                 MMF_OOM_SKIP->victim
>                             mutex_unlock(oom_lock)
>     out_of_memory
>       select_bad_process # no task
>
> If Thread1 didn't race it would bail out from try_charge and force the
> charge. We can achieve the same by checking tsk_is_oom_victim inside
> the oom_lock and therefore close the race.
>
> [1] http://lkml.kernel.org/r/bb2074c0-34fe-8c2c-1c7d-db71338f1e7f@i-love.sakura.ne.jp
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/memcontrol.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index af7f18b32389..90eb2e2093e7 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1387,10 +1387,22 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
>  		.gfp_mask = gfp_mask,
>  		.order = order,
>  	};
> -	bool ret;
> +	bool ret = true;
>
>  	mutex_lock(&oom_lock);

And because of "[PATCH 1/2] mm, oom: marks all killed tasks as oom
victims", mark_oom_victim() will be called on the current thread even if
we used mutex_lock_killable(&oom_lock) here, like you said

  mutex_lock_killable would take care of the exiting task already. I would
  then still prefer to check for mark_oom_victim because that is not racy
  with the exit path clearing signals. I can update my patch to use the
  _killable lock variant if we are really going with the memcg specific
  fix.

. If the current thread is not yet killed by the OOM killer but can
terminate without invoking the OOM killer, using
mutex_lock_killable(&oom_lock) here saves some processes. What is the
race you are referring to by "racy with the exit path clearing signals"?

> +
> +	/*
> +	 * multi-threaded tasks might race with oom_reaper and gain
> +	 * MMF_OOM_SKIP before reaching out_of_memory which can lead
> +	 * to out_of_memory failure if the task is the last one in
> +	 * memcg which would be a false positive failure reported
> +	 */

Not only out_of_memory() failure. The current thread needlessly tries to
select the next OOM victim. An out_of_memory() failure is nothing but a
result of the no-eligible-candidate case.

> +	if (tsk_is_oom_victim(current))
> +		goto unlock;
> +
> 	ret = out_of_memory(&oc);
> +
> +unlock:
> 	mutex_unlock(&oom_lock);
> 	return ret;
> }
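[Editorial note: for context, the helper at the center of this exchange is
tiny. A simplified sketch of it, recalled from include/linux/oom.h of that
era; illustrative rather than a verbatim copy:]

	/*
	 * mark_oom_victim() records the victim's mm in signal->oom_mm,
	 * so afterwards any thread of the killed process tests true.
	 */
	static inline bool tsk_is_oom_victim(struct task_struct *tsk)
	{
		return tsk->signal->oom_mm;
	}

Since mark_oom_victim() runs from the kill path while oom_lock is held, a
thread that performs this test after acquiring oom_lock cannot miss a kill
issued before the lock was released; that ordering is what lets the check
close the race in the diagram above.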
On Tue 08-01-19 05:59:49, Tetsuo Handa wrote:
> On 2019/01/07 23:38, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@suse.com>
> >
> > Tetsuo has reported [1] that a single process group memcg might easily
> > swamp the log with no-eligible oom victim reports due to race between
> > the memcg charge and oom_reaper
>
> This explanation is outdated. I reported that one memcg OOM killer can
> kill all processes in that memcg. I expect the changelog to be updated.

I am open to refinements. Any specific wording you have in mind?

> >
> > Thread 1                Thread2         oom_reaper
> > try_charge              try_charge
> >                           mem_cgroup_out_of_memory
> >                             mutex_lock(oom_lock)
> >   mem_cgroup_out_of_memory
> >     mutex_lock(oom_lock)
> >                             out_of_memory
> >                               select_bad_process
> >                               oom_kill_process(current)
> >                               wake_oom_reaper
> >                                                 oom_reap_task
> >                                                 MMF_OOM_SKIP->victim
> >                             mutex_unlock(oom_lock)
> >     out_of_memory
> >       select_bad_process # no task
> >
> > If Thread1 didn't race it would bail out from try_charge and force the
> > charge. We can achieve the same by checking tsk_is_oom_victim inside
> > the oom_lock and therefore close the race.
> >
> > [1] http://lkml.kernel.org/r/bb2074c0-34fe-8c2c-1c7d-db71338f1e7f@i-love.sakura.ne.jp
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > ---
> >  mm/memcontrol.c | 14 +++++++++++++-
> >  1 file changed, 13 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index af7f18b32389..90eb2e2093e7 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -1387,10 +1387,22 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
> >  		.gfp_mask = gfp_mask,
> >  		.order = order,
> >  	};
> > -	bool ret;
> > +	bool ret = true;
> >
> >  	mutex_lock(&oom_lock);
>
> And because of "[PATCH 1/2] mm, oom: marks all killed tasks as oom
> victims", mark_oom_victim() will be called on the current thread even if
> we used mutex_lock_killable(&oom_lock) here, like you said
>
>   mutex_lock_killable would take care of the exiting task already. I would
>   then still prefer to check for mark_oom_victim because that is not racy
>   with the exit path clearing signals. I can update my patch to use the
>   _killable lock variant if we are really going with the memcg specific
>   fix.
>
> . If the current thread is not yet killed by the OOM killer but can
> terminate without invoking the OOM killer, using
> mutex_lock_killable(&oom_lock) here saves some processes. What is the
> race you are referring to by "racy with the exit path clearing signals"?

This is unrelated to the patch.

> > +
> > +	/*
> > +	 * multi-threaded tasks might race with oom_reaper and gain
> > +	 * MMF_OOM_SKIP before reaching out_of_memory which can lead
> > +	 * to out_of_memory failure if the task is the last one in
> > +	 * memcg which would be a false positive failure reported
> > +	 */
>
> Not only out_of_memory() failure. The current thread needlessly tries to
> select the next OOM victim. An out_of_memory() failure is nothing but a
> result of the no-eligible-candidate case.

So? Let me ask again: does this solve the issue you are seeing? I really
do not want to end up in an endless nitpicking thread again and would
like to move on.

> > +	if (tsk_is_oom_victim(current))
> > +		goto unlock;
> > +
> >  	ret = out_of_memory(&oc);
> > +
> > +unlock:
> >  	mutex_unlock(&oom_lock);
> >  	return ret;
> >  }
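[Editorial note: the "bail out from try_charge and force the charge" in the
changelog refers to the victim bypass early in try_charge(). Roughly, from
mm/memcontrol.c of the same era; heavily trimmed, so treat it as a sketch
rather than the exact source:]

	/*
	 * Unlike in global OOM situations, memcg is not in physical
	 * memory shortage: let dying and OOM-killed tasks bypass the
	 * limit so they can exit quickly and free their memory.
	 */
	if (unlikely(tsk_is_oom_victim(current) ||
		     fatal_signal_pending(current) ||
		     current->flags & PF_EXITING))
		goto force;
	...
force:
	/* Charge beyond the limit; the excess goes away as the task exits. */
	page_counter_charge(&memcg->memory, nr_pages);
	if (do_memsw_account())
		page_counter_charge(&memcg->memsw, nr_pages);
	return 0;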
Hi Michal,
I love your patch! Perhaps something to improve:
[auto build test WARNING on linus/master]
[also build test WARNING on v5.0-rc1 next-20190108]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Michal-Hocko/oom-memcg-do-not-report-racy-no-eligible-OOM/20190108-092805
config: x86_64-allmodconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64
All warnings (new ones prefixed by >>):
include/linux/rcupdate.h:659:9: warning: context imbalance in 'find_lock_task_mm' - wrong count at exit
include/linux/sched/mm.h:141:37: warning: dereference of noderef expression
mm/oom_kill.c:225:28: warning: context imbalance in 'oom_badness' - unexpected unlock
mm/oom_kill.c:406:9: warning: context imbalance in 'dump_tasks' - different lock contexts for basic block
>> mm/oom_kill.c:918:17: warning: context imbalance in '__oom_kill_process' - unexpected unlock
vim +/__oom_kill_process +918 mm/oom_kill.c
1af8bb43 Michal Hocko 2016-07-28 845
5989ad7b Roman Gushchin 2018-08-21 846 static void __oom_kill_process(struct task_struct *victim)
^1da177e Linus Torvalds 2005-04-16 847 {
5989ad7b Roman Gushchin 2018-08-21 848 struct task_struct *p;
647f2bdf David Rientjes 2012-03-21 849 struct mm_struct *mm;
bb29902a Tetsuo Handa 2016-03-25 850 bool can_oom_reap = true;
^1da177e Linus Torvalds 2005-04-16 851
6b0c81b3 David Rientjes 2012-07-31 852 p = find_lock_task_mm(victim);
6b0c81b3 David Rientjes 2012-07-31 853 if (!p) {
6b0c81b3 David Rientjes 2012-07-31 854 put_task_struct(victim);
647f2bdf David Rientjes 2012-03-21 855 return;
6b0c81b3 David Rientjes 2012-07-31 856 } else if (victim != p) {
6b0c81b3 David Rientjes 2012-07-31 857 get_task_struct(p);
6b0c81b3 David Rientjes 2012-07-31 858 put_task_struct(victim);
6b0c81b3 David Rientjes 2012-07-31 859 victim = p;
6b0c81b3 David Rientjes 2012-07-31 860 }
647f2bdf David Rientjes 2012-03-21 861
880b7689 Tetsuo Handa 2015-11-05 862 /* Get a reference to safely compare mm after task_unlock(victim) */
647f2bdf David Rientjes 2012-03-21 863 mm = victim->mm;
f1f10076 Vegard Nossum 2017-02-27 864 mmgrab(mm);
8e675f7a Konstantin Khlebnikov 2017-07-06 865
8e675f7a Konstantin Khlebnikov 2017-07-06 866 /* Raise event before sending signal: task reaper must see this */
8e675f7a Konstantin Khlebnikov 2017-07-06 867 count_vm_event(OOM_KILL);
fe6bdfc8 Roman Gushchin 2018-06-14 868 memcg_memory_event_mm(mm, MEMCG_OOM_KILL);
8e675f7a Konstantin Khlebnikov 2017-07-06 869
426fb5e7 Tetsuo Handa 2015-11-05 870 /*
cd04ae1e Michal Hocko 2017-09-06 871 * We should send SIGKILL before granting access to memory reserves
cd04ae1e Michal Hocko 2017-09-06 872 * in order to prevent the OOM victim from depleting the memory
cd04ae1e Michal Hocko 2017-09-06 873 * reserves from the user space under its control.
426fb5e7 Tetsuo Handa 2015-11-05 874 */
079b22dc Eric W. Biederman 2018-09-03 875 do_send_sig_info(SIGKILL, SEND_SIG_PRIV, victim, PIDTYPE_TGID);
16e95196 Johannes Weiner 2015-06-24 876 mark_oom_victim(victim);
eca56ff9 Jerome Marchand 2016-01-14 877 pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n",
647f2bdf David Rientjes 2012-03-21 878 task_pid_nr(victim), victim->comm, K(victim->mm->total_vm),
647f2bdf David Rientjes 2012-03-21 879 K(get_mm_counter(victim->mm, MM_ANONPAGES)),
eca56ff9 Jerome Marchand 2016-01-14 880 K(get_mm_counter(victim->mm, MM_FILEPAGES)),
eca56ff9 Jerome Marchand 2016-01-14 881 K(get_mm_counter(victim->mm, MM_SHMEMPAGES)));
647f2bdf David Rientjes 2012-03-21 882 task_unlock(victim);
647f2bdf David Rientjes 2012-03-21 883
647f2bdf David Rientjes 2012-03-21 884 /*
647f2bdf David Rientjes 2012-03-21 885 * Kill all user processes sharing victim->mm in other thread groups, if
647f2bdf David Rientjes 2012-03-21 886 * any. They don't get access to memory reserves, though, to avoid
647f2bdf David Rientjes 2012-03-21 887 * depletion of all memory. This prevents mm->mmap_sem livelock when an
647f2bdf David Rientjes 2012-03-21 888 * oom killed thread cannot exit because it requires the semaphore and
647f2bdf David Rientjes 2012-03-21 889 * its contended by another thread trying to allocate memory itself.
647f2bdf David Rientjes 2012-03-21 890 * That thread will now get access to memory reserves since it has a
647f2bdf David Rientjes 2012-03-21 891 * pending fatal signal.
647f2bdf David Rientjes 2012-03-21 892 */
4d4048be Oleg Nesterov 2014-01-21 893 rcu_read_lock();
c319025a Oleg Nesterov 2015-11-05 894 for_each_process(p) {
00508538 Michal Hocko 2019-01-07 895 struct task_struct *t;
4d7b3394 Oleg Nesterov 2015-11-05 896 if (!process_shares_mm(p, mm))
c319025a Oleg Nesterov 2015-11-05 897 continue;
c319025a Oleg Nesterov 2015-11-05 898 if (same_thread_group(p, victim))
c319025a Oleg Nesterov 2015-11-05 899 continue;
1b51e65e Michal Hocko 2016-10-07 900 if (is_global_init(p)) {
aac45363 Michal Hocko 2016-03-25 901 can_oom_reap = false;
862e3073 Michal Hocko 2016-10-07 902 set_bit(MMF_OOM_SKIP, &mm->flags);
a373966d Michal Hocko 2016-07-28 903 pr_info("oom killer %d (%s) has mm pinned by %d (%s)\n",
a373966d Michal Hocko 2016-07-28 904 task_pid_nr(victim), victim->comm,
a373966d Michal Hocko 2016-07-28 905 task_pid_nr(p), p->comm);
647f2bdf David Rientjes 2012-03-21 906 continue;
aac45363 Michal Hocko 2016-03-25 907 }
1b51e65e Michal Hocko 2016-10-07 908 /*
1b51e65e Michal Hocko 2016-10-07 909 * No use_mm() user needs to read from the userspace so we are
1b51e65e Michal Hocko 2016-10-07 910 * ok to reap it.
1b51e65e Michal Hocko 2016-10-07 911 */
1b51e65e Michal Hocko 2016-10-07 912 if (unlikely(p->flags & PF_KTHREAD))
1b51e65e Michal Hocko 2016-10-07 913 continue;
079b22dc Eric W. Biederman 2018-09-03 914 do_send_sig_info(SIGKILL, SEND_SIG_PRIV, p, PIDTYPE_TGID);
00508538 Michal Hocko 2019-01-07 915 t = find_lock_task_mm(p);
00508538 Michal Hocko 2019-01-07 916 if (!t)
00508538 Michal Hocko 2019-01-07 917 continue;
00508538 Michal Hocko 2019-01-07 @918 mark_oom_victim(t);
00508538 Michal Hocko 2019-01-07 919 task_unlock(t);
647f2bdf David Rientjes 2012-03-21 920 }
6b0c81b3 David Rientjes 2012-07-31 921 rcu_read_unlock();
647f2bdf David Rientjes 2012-03-21 922
aac45363 Michal Hocko 2016-03-25 923 if (can_oom_reap)
36324a99 Michal Hocko 2016-03-25 924 wake_oom_reaper(victim);
aac45363 Michal Hocko 2016-03-25 925
880b7689 Tetsuo Handa 2015-11-05 926 mmdrop(mm);
6b0c81b3 David Rientjes 2012-07-31 927 put_task_struct(victim);
^1da177e Linus Torvalds 2005-04-16 928 }
647f2bdf David Rientjes 2012-03-21 929 #undef K
^1da177e Linus Torvalds 2005-04-16 930
:::::: The code at line 918 was first introduced by commit
:::::: 00508538cb045f28a2f60e5d2dff98b77b0da725 mm, oom: marks all killed tasks as oom victims
:::::: TO: Michal Hocko <mhocko@suse.com>
:::::: CC: 0day robot <lkp@intel.com>
On Tue 08-01-19 16:35:42, kbuild test robot wrote:
[...]
> All warnings (new ones prefixed by >>):
>
>    include/linux/rcupdate.h:659:9: warning: context imbalance in 'find_lock_task_mm' - wrong count at exit
>    include/linux/sched/mm.h:141:37: warning: dereference of noderef expression
>    mm/oom_kill.c:225:28: warning: context imbalance in 'oom_badness' - unexpected unlock
>    mm/oom_kill.c:406:9: warning: context imbalance in 'dump_tasks' - different lock contexts for basic block
> >> mm/oom_kill.c:918:17: warning: context imbalance in '__oom_kill_process' - unexpected unlock

What exactly does this warning say? I do not see anything wrong about
the code. find_lock_task_mm returns a locked task when t != NULL and
mark_oom_victim doesn't do anything about the locking. Am I missing
something, or is the warning just confused?

[...]
> 00508538 Michal Hocko    2019-01-07  915  		t = find_lock_task_mm(p);
> 00508538 Michal Hocko    2019-01-07  916  		if (!t)
> 00508538 Michal Hocko    2019-01-07  917  			continue;
> 00508538 Michal Hocko    2019-01-07 @918  		mark_oom_victim(t);
> 00508538 Michal Hocko    2019-01-07  919  		task_unlock(t);
> 647f2bdf David Rientjes  2012-03-21  920  	}
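[Editorial note: sparse's context tracking generally cannot follow locks
whose held/not-held state depends on a function's return value, which is
exactly the find_lock_task_mm() pattern. A hypothetical minimal example of
the same warning shape; this is invented illustration, not kernel code:]

	struct foo {
		spinlock_t lock;
		int valid;
	};

	/* Returns with f->lock held on success, unlocked on failure. */
	static struct foo *foo_find_locked(struct foo *f)
	{
		spin_lock(&f->lock);
		if (!f->valid) {
			spin_unlock(&f->lock);
			return NULL;
		}
		return f;
	}

	static void foo_use(struct foo *f)
	{
		struct foo *p = foo_find_locked(f);

		if (!p)
			return;
		/* ... work on p under the lock ... */
		spin_unlock(&p->lock);	/* sparse: "unexpected unlock" */
	}

Because the acquire and the release happen in different functions on the
success path, sparse sees the caller-side unlock as unbalanced, which is
consistent with the "false positive" conclusion below.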
On 2019/01/08 17:14, Michal Hocko wrote:
>>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>>> index af7f18b32389..90eb2e2093e7 100644
>>> --- a/mm/memcontrol.c
>>> +++ b/mm/memcontrol.c
>>> @@ -1387,10 +1387,22 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
>>>  		.gfp_mask = gfp_mask,
>>>  		.order = order,
>>>  	};
>>> -	bool ret;
>>> +	bool ret = true;
>>>
>>>  	mutex_lock(&oom_lock);
>>
>> And because of "[PATCH 1/2] mm, oom: marks all killed tasks as oom
>> victims", mark_oom_victim() will be called on the current thread even if
>> we used mutex_lock_killable(&oom_lock) here, like you said
>>
>>   mutex_lock_killable would take care of the exiting task already. I would
>>   then still prefer to check for mark_oom_victim because that is not racy
>>   with the exit path clearing signals. I can update my patch to use the
>>   _killable lock variant if we are really going with the memcg specific
>>   fix.
>>
>> . If the current thread is not yet killed by the OOM killer but can
>> terminate without invoking the OOM killer, using
>> mutex_lock_killable(&oom_lock) here saves some processes. What is the
>> race you are referring to by "racy with the exit path clearing signals"?
>
> This is unrelated to the patch.

Ultimately related! This is the reason why your patch should be preferred
over my patch. For example, if memcg OOM events in different domains are
pending, already OOM-killed threads needlessly wait for those pending
events. An out_of_memory() call is slow because it involves printk();
with slow serial consoles, one call might take more than a second. I
consider that allowing killed processes to reach mmput() from exit_mm()
from do_exit() quickly (instead of waiting at mem_cgroup_out_of_memory()
for pending memcg OOM events in other domains) helps __mmput() run
sooner, and __mmput() can reclaim more memory than the OOM reaper can.
Unless what you call "racy" is problematic, I don't see a reason not to
apply my patch. So, please answer what you are referring to with "racy".
On Tue 08-01-19 19:39:58, Tetsuo Handa wrote:
> On 2019/01/08 17:14, Michal Hocko wrote:
> >>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> >>> index af7f18b32389..90eb2e2093e7 100644
> >>> --- a/mm/memcontrol.c
> >>> +++ b/mm/memcontrol.c
> >>> @@ -1387,10 +1387,22 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
> >>>  		.gfp_mask = gfp_mask,
> >>>  		.order = order,
> >>>  	};
> >>> -	bool ret;
> >>> +	bool ret = true;
> >>>
> >>>  	mutex_lock(&oom_lock);
> >>
> >> And because of "[PATCH 1/2] mm, oom: marks all killed tasks as oom
> >> victims", mark_oom_victim() will be called on the current thread even if
> >> we used mutex_lock_killable(&oom_lock) here, like you said
> >>
> >>   mutex_lock_killable would take care of the exiting task already. I would
> >>   then still prefer to check for mark_oom_victim because that is not racy
> >>   with the exit path clearing signals. I can update my patch to use the
> >>   _killable lock variant if we are really going with the memcg specific
> >>   fix.
> >>
> >> . If the current thread is not yet killed by the OOM killer but can
> >> terminate without invoking the OOM killer, using
> >> mutex_lock_killable(&oom_lock) here saves some processes. What is the
> >> race you are referring to by "racy with the exit path clearing signals"?
> >
> > This is unrelated to the patch.
>
> Ultimately related! This is the reason why your patch should be preferred
> over my patch.

No! I've said I do not mind using mutex_lock_killable on top of this
patch. I just want to keep this fix minimal.
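[Editorial note: a rough sketch of what the _killable variant layered on
top of this minimal fix might look like; a hypothetical combination of the
two proposals, not a posted patch, and the error-handling choice on lock
failure is an assumption:]

	static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg,
					     gfp_t gfp_mask, int order)
	{
		struct oom_control oc = {
			.memcg = memcg,
			.gfp_mask = gfp_mask,
			.order = order,
		};
		bool ret = true;

		/*
		 * A fatal signal while waiting for oom_lock means we were
		 * killed (or are exiting) anyway; claim progress so that
		 * try_charge() retries and hits its victim bypass.
		 */
		if (mutex_lock_killable(&oom_lock))
			return true;

		/* The check added by this patch, unchanged. */
		if (tsk_is_oom_victim(current))
			goto unlock;

		ret = out_of_memory(&oc);
	unlock:
		mutex_unlock(&oom_lock);
		return ret;
	}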
On 01/08/2019 05:39 PM, Michal Hocko wrote:
> On Tue 08-01-19 16:35:42, kbuild test robot wrote:
> [...]
>> All warnings (new ones prefixed by >>):
>>
>>    include/linux/rcupdate.h:659:9: warning: context imbalance in 'find_lock_task_mm' - wrong count at exit
>>    include/linux/sched/mm.h:141:37: warning: dereference of noderef expression
>>    mm/oom_kill.c:225:28: warning: context imbalance in 'oom_badness' - unexpected unlock
>>    mm/oom_kill.c:406:9: warning: context imbalance in 'dump_tasks' - different lock contexts for basic block
>>>> mm/oom_kill.c:918:17: warning: context imbalance in '__oom_kill_process' - unexpected unlock
>
> What exactly does this warning say? I do not see anything wrong about
> the code. find_lock_task_mm returns a locked task when t != NULL and
> mark_oom_victim doesn't do anything about the locking. Am I missing
> something, or is the warning just confused?

Thanks for your reply. It looks like a false positive. We'll look into it.

Best Regards,
Rong Chen

> [...]
>> 00508538 Michal Hocko    2019-01-07  915  		t = find_lock_task_mm(p);
>> 00508538 Michal Hocko    2019-01-07  916  		if (!t)
>> 00508538 Michal Hocko    2019-01-07  917  			continue;
>> 00508538 Michal Hocko    2019-01-07 @918  		mark_oom_victim(t);
>> 00508538 Michal Hocko    2019-01-07  919  		task_unlock(t);
>> 647f2bdf David Rientjes  2012-03-21  920  	}
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index af7f18b32389..90eb2e2093e7 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1387,10 +1387,22 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
 		.gfp_mask = gfp_mask,
 		.order = order,
 	};
-	bool ret;
+	bool ret = true;
 
 	mutex_lock(&oom_lock);
+
+	/*
+	 * multi-threaded tasks might race with oom_reaper and gain
+	 * MMF_OOM_SKIP before reaching out_of_memory which can lead
+	 * to out_of_memory failure if the task is the last one in
+	 * memcg which would be a false positive failure reported
+	 */
+	if (tsk_is_oom_victim(current))
+		goto unlock;
+
 	ret = out_of_memory(&oc);
+
+unlock:
 	mutex_unlock(&oom_lock);
 	return ret;
 }