Message ID | 1595166795-27587-1-git-send-email-laoar.shao@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mm, oom: show process exiting information in __oom_kill_process() | expand |
On 2020/07/19 22:53, Yafang Shao wrote: > p = find_lock_task_mm(victim); > if (!p) { > + pr_info("Process %d (%s) is already exiting\n", > + task_pid_nr(victim), victim->comm); Maybe + pr_info("%s: Process %d (%s) is already exiting\n", + message, task_pid_nr(victim), victim->comm); line is easier to compare with pr_err("%s: Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB, UID:%u pgtables:%lukB oom_score_adj:%hd\n", message, task_pid_nr(victim), victim->comm, K(mm->total_vm), line? > put_task_struct(victim); > return;
On Mon, Jul 20, 2020 at 7:20 AM Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote: > > On 2020/07/19 22:53, Yafang Shao wrote: > > p = find_lock_task_mm(victim); > > if (!p) { > > + pr_info("Process %d (%s) is already exiting\n", > > + task_pid_nr(victim), victim->comm); > > Maybe > > + pr_info("%s: Process %d (%s) is already exiting\n", > + message, task_pid_nr(victim), victim->comm); > > line is easier to compare with > > pr_err("%s: Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB, UID:%u pgtables:%lukB oom_score_adj:%hd\n", > message, task_pid_nr(victim), victim->comm, K(mm->total_vm), > > line? > Seems so. Thanks for the suggestion. > > put_task_struct(victim); > > return;
On Sun 19-07-20 09:53:15, Yafang Shao wrote: > When the OOM killer finding a victim and trying to kill it, if the victim > is already exiting, the task mm will be NULL and no process will be killed. > But the dump_header() has been already executed, so it will be strange to > dump so many information without killing a process. We'd better show some > helpful information to indicate why this happens. > > Suggested-by: David Rientjes <rientjes@google.com> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com> > Cc: Michal Hocko <mhocko@kernel.org> > Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> > --- > mm/oom_kill.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > index 6e94962..0480dde 100644 > --- a/mm/oom_kill.c > +++ b/mm/oom_kill.c > @@ -863,9 +863,13 @@ static void __oom_kill_process(struct task_struct *victim, const char *message) > > p = find_lock_task_mm(victim); > if (!p) { > + pr_info("Process %d (%s) is already exiting\n", > + task_pid_nr(victim), victim->comm); > put_task_struct(victim); I do agree that a silent bail out is not the best thing to do. The above message would be more useful if it also explained what the oom killer does (or does not): "OOM victim %d (%s) is already exiting. Skip killing the task\n" > return; > - } else if (victim != p) { > + } > + > + if (victim != p) { Why do we need this? > get_task_struct(p); > put_task_struct(victim); > victim = p; > -- > 1.8.3.1
On Mon, Jul 20, 2020 at 3:16 PM Michal Hocko <mhocko@kernel.org> wrote: > > On Sun 19-07-20 09:53:15, Yafang Shao wrote: > > When the OOM killer finding a victim and trying to kill it, if the victim > > is already exiting, the task mm will be NULL and no process will be killed. > > But the dump_header() has been already executed, so it will be strange to > > dump so many information without killing a process. We'd better show some > > helpful information to indicate why this happens. > > > > Suggested-by: David Rientjes <rientjes@google.com> > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com> > > Cc: Michal Hocko <mhocko@kernel.org> > > Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> > > --- > > mm/oom_kill.c | 6 +++++- > > 1 file changed, 5 insertions(+), 1 deletion(-) > > > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > > index 6e94962..0480dde 100644 > > --- a/mm/oom_kill.c > > +++ b/mm/oom_kill.c > > @@ -863,9 +863,13 @@ static void __oom_kill_process(struct task_struct *victim, const char *message) > > > > p = find_lock_task_mm(victim); > > if (!p) { > > + pr_info("Process %d (%s) is already exiting\n", > > + task_pid_nr(victim), victim->comm); > > put_task_struct(victim); > > I do agree that a silent bail out is not the best thing to do. The above > message would be more useful if it also explained what the oom killer > does (or does not): > > "OOM victim %d (%s) is already exiting. Skip killing the task\n" > Sure. > > return; > > - } else if (victim != p) { > > + } > > + > > + if (victim != p) { > > Why do we need this? > Because I don't like that code style. But it is not a big problem, I will not change it in the next version. > > get_task_struct(p); > > put_task_struct(victim); > > victim = p; > > -- > > 1.8.3.1 > > -- > Michal Hocko > SUSE Labs
On 2020/07/20 19:36, Yafang Shao wrote: > On Mon, Jul 20, 2020 at 3:16 PM Michal Hocko <mhocko@kernel.org> wrote: >> I do agree that a silent bail out is not the best thing to do. The above >> message would be more useful if it also explained what the oom killer >> does (or does not): >> >> "OOM victim %d (%s) is already exiting. Skip killing the task\n" >> > > Sure. This path is rarely hit because find_lock_task_mm() in oom_badness() from select_bad_process() in the next round of OOM killer will skip this task. Since we don't wake up the OOM reaper when hitting this path, unless __mmput() for this task itself immediately reclaims memory and updates the statistics counter, we just get two chunks of dump_header() messages and one OOM victim. Current synchronous printk() gives __mmput() some time for reclaiming memory and updating the statistics counter. But when printk() becomes asynchronous, there might be quite small time. People might wonder "why killed message follows immediately after skipped killing message"... Wouldn't the skip message confuse people?
On Mon, Jul 20, 2020 at 7:06 PM Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote: > > On 2020/07/20 19:36, Yafang Shao wrote: > > On Mon, Jul 20, 2020 at 3:16 PM Michal Hocko <mhocko@kernel.org> wrote: > >> I do agree that a silent bail out is not the best thing to do. The above > >> message would be more useful if it also explained what the oom killer > >> does (or does not): > >> > >> "OOM victim %d (%s) is already exiting. Skip killing the task\n" > >> > > > > Sure. > > This path is rarely hit because find_lock_task_mm() in oom_badness() from > select_bad_process() in the next round of OOM killer will skip this task. > > Since we don't wake up the OOM reaper when hitting this path, unless __mmput() > for this task itself immediately reclaims memory and updates the statistics > counter, we just get two chunks of dump_header() messages and one OOM victim. > Could you pls. explain more specifically why we will get two chunks of dump_header()? My understanding is the free_mm() happens between select_bad_process() and __oom_kill_process() as bellow, P1 Victim select_bad_process() oom_badness() p = find_lock_task_mm() # p isn't NULL __mmput() free_mm() dump_header() # dump once __oom_kill_process() p = find_lock_task_mm(victim); # p is NULL now So where is another dump_header() ? > Current synchronous printk() gives __mmput() some time for reclaiming memory > and updating the statistics counter. But when printk() becomes asynchronous, > there might be quite small time. People might wonder "why killed message > follows immediately after skipped killing message"... Wouldn't the skip > message confuse people?
On 2020/07/20 21:19, Yafang Shao wrote: > On Mon, Jul 20, 2020 at 7:06 PM Tetsuo Handa > <penguin-kernel@i-love.sakura.ne.jp> wrote: >> >> On 2020/07/20 19:36, Yafang Shao wrote: >>> On Mon, Jul 20, 2020 at 3:16 PM Michal Hocko <mhocko@kernel.org> wrote: >>>> I do agree that a silent bail out is not the best thing to do. The above >>>> message would be more useful if it also explained what the oom killer >>>> does (or does not): >>>> >>>> "OOM victim %d (%s) is already exiting. Skip killing the task\n" >>>> >>> >>> Sure. >> >> This path is rarely hit because find_lock_task_mm() in oom_badness() from >> select_bad_process() in the next round of OOM killer will skip this task. >> >> Since we don't wake up the OOM reaper when hitting this path, unless __mmput() >> for this task itself immediately reclaims memory and updates the statistics >> counter, we just get two chunks of dump_header() messages and one OOM victim. >> > > Could you pls. explain more specifically why we will get two chunks of > dump_header()? > My understanding is the free_mm() happens between select_bad_process() > and __oom_kill_process() as bellow, > > P1 > Victim > select_bad_process() > oom_badness() > p = find_lock_task_mm() # p isn't NULL > > __mmput() > > free_mm() > dump_header() # dump once > __oom_kill_process() > p = find_lock_task_mm(victim); # p is NULL now > > So where is another dump_header() ? > Start of __mmput() does not guarantee that memory is reclaimed immediately. Moreover, even __mmput() might not have started by the moment second chunk of dump_header() happens. The "OOM victim %d (%s) is already exiting." case only indicates that victim's mm became NULL; there is no guarantee that memory is reclaimed (in order to avoid OOM kill) by the moment next round hits. P1 Victim1 Victim2 out_of_memory() { select_bad_process() { oom_badness() { p = find_lock_task_mm() { task_lock(victim); // finds Victim1 because Victim1->mm != NULL. } get_task_struct(p); task_unlock(p); } } oom_kill_process() { task_lock(victim); task_unlock(victim); do_exit() { dump_header(oc, victim); // first dump_header() with Victim1 and Victim2 __oom_kill_process(victim, message) { exit_mm() { task_lock(current); current->mm = NULL; task_unlock(current); p = find_lock_task_mm(victim); put_task_struct(victim); // without killing Victim1 because p == NULL. } } } } out_of_memory() { select_bad_process() { oom_badness() { p = find_lock_task_mm() { task_lock(victim); // finds Victim2 because Victim2->mm != NULL. } get_task_struct(p); task_unlock(p); } } mmput() { __mmput() { uprobe_clear_state() { // Might wait for delayed_uprobe_lock. } oom_kill_process() { task_lock(victim); task_unlock(victim); dump_header(oc, victim); // second dump_header() with Victim2 __oom_kill_process(victim, message) { p = find_lock_task_mm(victim); pr_err("%s: Killed process %d (%s) "...); // first kill message. put_task_struct(p); } } } exit_mmap(); // Which frees memory. } } } } Maybe the better behavior is to restart out_of_memory() without dump_header() (we can remember whether we already called dump_header() into "struct oom_control"), with last second watermark check before select_bad_process() and after dump_header().
On Mon 20-07-20 18:36:53, Yafang Shao wrote: > On Mon, Jul 20, 2020 at 3:16 PM Michal Hocko <mhocko@kernel.org> wrote: > > > > On Sun 19-07-20 09:53:15, Yafang Shao wrote: > > > When the OOM killer finding a victim and trying to kill it, if the victim > > > is already exiting, the task mm will be NULL and no process will be killed. > > > But the dump_header() has been already executed, so it will be strange to > > > dump so many information without killing a process. We'd better show some > > > helpful information to indicate why this happens. > > > > > > Suggested-by: David Rientjes <rientjes@google.com> > > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com> > > > Cc: Michal Hocko <mhocko@kernel.org> > > > Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> > > > --- > > > mm/oom_kill.c | 6 +++++- > > > 1 file changed, 5 insertions(+), 1 deletion(-) > > > > > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > > > index 6e94962..0480dde 100644 > > > --- a/mm/oom_kill.c > > > +++ b/mm/oom_kill.c > > > @@ -863,9 +863,13 @@ static void __oom_kill_process(struct task_struct *victim, const char *message) > > > > > > p = find_lock_task_mm(victim); > > > if (!p) { > > > + pr_info("Process %d (%s) is already exiting\n", > > > + task_pid_nr(victim), victim->comm); > > > put_task_struct(victim); > > > > I do agree that a silent bail out is not the best thing to do. The above > > message would be more useful if it also explained what the oom killer > > does (or does not): > > > > "OOM victim %d (%s) is already exiting. Skip killing the task\n" > > > > Sure. > > > > return; > > > - } else if (victim != p) { > > > + } > > > + > > > + if (victim != p) { > > > > Why do we need this? > > > > Because I don't like that code style. We usually prefer to keep unrelated changes separate. Minor coding style changes are usually questionable because many people have different sense for the code. > But it is not a big problem, I will not change it in the next version. Thanks
On Mon 20-07-20 20:06:17, Tetsuo Handa wrote: > On 2020/07/20 19:36, Yafang Shao wrote: > > On Mon, Jul 20, 2020 at 3:16 PM Michal Hocko <mhocko@kernel.org> wrote: > >> I do agree that a silent bail out is not the best thing to do. The above > >> message would be more useful if it also explained what the oom killer > >> does (or does not): > >> > >> "OOM victim %d (%s) is already exiting. Skip killing the task\n" > >> > > > > Sure. > > This path is rarely hit because find_lock_task_mm() in oom_badness() from > select_bad_process() in the next round of OOM killer will skip this task. Agreed! > Since we don't wake up the OOM reaper when hitting this path, unless __mmput() > for this task itself immediately reclaims memory and updates the statistics > counter, we just get two chunks of dump_header() messages and one OOM victim. > > Current synchronous printk() gives __mmput() some time for reclaiming memory > and updating the statistics counter. But when printk() becomes asynchronous, > there might be quite small time. People might wonder "why killed message > follows immediately after skipped killing message"... Wouldn't the skip > message confuse people? I would ask other way around. Wouldn't that give us a better clue that the first oom invocation and the back off was a suboptimal decision? If we learn about more of those, maybe we want to reconsider this heuristic and rather retry the victim selection instead. I do not really see how this message would be harmful TBH.
On Mon, Jul 20, 2020 at 9:11 PM Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote: > > On 2020/07/20 21:19, Yafang Shao wrote: > > On Mon, Jul 20, 2020 at 7:06 PM Tetsuo Handa > > <penguin-kernel@i-love.sakura.ne.jp> wrote: > >> > >> On 2020/07/20 19:36, Yafang Shao wrote: > >>> On Mon, Jul 20, 2020 at 3:16 PM Michal Hocko <mhocko@kernel.org> wrote: > >>>> I do agree that a silent bail out is not the best thing to do. The above > >>>> message would be more useful if it also explained what the oom killer > >>>> does (or does not): > >>>> > >>>> "OOM victim %d (%s) is already exiting. Skip killing the task\n" > >>>> > >>> > >>> Sure. > >> > >> This path is rarely hit because find_lock_task_mm() in oom_badness() from > >> select_bad_process() in the next round of OOM killer will skip this task. > >> > >> Since we don't wake up the OOM reaper when hitting this path, unless __mmput() > >> for this task itself immediately reclaims memory and updates the statistics > >> counter, we just get two chunks of dump_header() messages and one OOM victim. > >> > > > > Could you pls. explain more specifically why we will get two chunks of > > dump_header()? > > My understanding is the free_mm() happens between select_bad_process() > > and __oom_kill_process() as bellow, > > > > P1 > > Victim > > select_bad_process() > > oom_badness() > > p = find_lock_task_mm() # p isn't NULL > > > > __mmput() > > > > free_mm() > > dump_header() # dump once > > __oom_kill_process() > > p = find_lock_task_mm(victim); # p is NULL now > > > > So where is another dump_header() ? > > > > Start of __mmput() does not guarantee that memory is reclaimed immediately. > Moreover, even __mmput() might not have started by the moment second chunk of > dump_header() happens. The "OOM victim %d (%s) is already exiting." case only > indicates that victim's mm became NULL; there is no guarantee that memory is > reclaimed (in order to avoid OOM kill) by the moment next round hits. > > P1 Victim1 Victim2 > > out_of_memory() { > select_bad_process() { > oom_badness() { > p = find_lock_task_mm() { > task_lock(victim); // finds Victim1 because Victim1->mm != NULL. > } > get_task_struct(p); > task_unlock(p); > } > } > oom_kill_process() { > task_lock(victim); > task_unlock(victim); > do_exit() { > dump_header(oc, victim); // first dump_header() with Victim1 and Victim2 > __oom_kill_process(victim, message) { > exit_mm() { > task_lock(current); > current->mm = NULL; > task_unlock(current); > p = find_lock_task_mm(victim); > put_task_struct(victim); // without killing Victim1 because p == NULL. > } > } > } > } > out_of_memory() { > select_bad_process() { > oom_badness() { > p = find_lock_task_mm() { > task_lock(victim); // finds Victim2 because Victim2->mm != NULL. > } > get_task_struct(p); > task_unlock(p); > } > } > mmput() { > __mmput() { > uprobe_clear_state() { > // Might wait for delayed_uprobe_lock. > } > oom_kill_process() { > task_lock(victim); > task_unlock(victim); > dump_header(oc, victim); // second dump_header() with Victim2 > __oom_kill_process(victim, message) { > p = find_lock_task_mm(victim); > pr_err("%s: Killed process %d (%s) "...); // first kill message. > put_task_struct(p); > } > } > } > exit_mmap(); // Which frees memory. > } > } > } > } > > Maybe the better behavior is to restart out_of_memory() without dump_header() > (we can remember whether we already called dump_header() into "struct oom_control"), > with last second watermark check before select_bad_process() and after dump_header(). I understand what you mean now. But I agree with Michal that this output won't be harmful in your case. And for your case, I think Michal's suggestion that retry the victim selection would be better.
On 2020/07/20 22:41, Michal Hocko wrote: >> Since we don't wake up the OOM reaper when hitting this path, unless __mmput() >> for this task itself immediately reclaims memory and updates the statistics >> counter, we just get two chunks of dump_header() messages and one OOM victim. >> >> Current synchronous printk() gives __mmput() some time for reclaiming memory >> and updating the statistics counter. But when printk() becomes asynchronous, >> there might be quite small time. People might wonder "why killed message >> follows immediately after skipped killing message"... Wouldn't the skip >> message confuse people? > > I would ask other way around. Wouldn't that give us a better clue that > the first oom invocation and the back off was a suboptimal decision? If > we learn about more of those, maybe we want to reconsider this heuristic > and rather retry the victim selection instead. I've just suggested Maybe the better behavior is to restart out_of_memory() without dump_header() (we can remember whether we already called dump_header() into "struct oom_control"), with last second watermark check before select_bad_process() and after dump_header(). at https://lkml.kernel.org/r/7f58363a-db1a-5502-e2b4-ee4b9fa31824@i-love.sakura.ne.jp .
On Mon 20-07-20 23:03:46, Tetsuo Handa wrote: > On 2020/07/20 22:41, Michal Hocko wrote: > >> Since we don't wake up the OOM reaper when hitting this path, unless __mmput() > >> for this task itself immediately reclaims memory and updates the statistics > >> counter, we just get two chunks of dump_header() messages and one OOM victim. > >> > >> Current synchronous printk() gives __mmput() some time for reclaiming memory > >> and updating the statistics counter. But when printk() becomes asynchronous, > >> there might be quite small time. People might wonder "why killed message > >> follows immediately after skipped killing message"... Wouldn't the skip > >> message confuse people? > > > > I would ask other way around. Wouldn't that give us a better clue that > > the first oom invocation and the back off was a suboptimal decision? If > > we learn about more of those, maybe we want to reconsider this heuristic > > and rather retry the victim selection instead. > > I've just suggested > > Maybe the better behavior is to restart out_of_memory() without dump_header() > (we can remember whether we already called dump_header() into "struct oom_control"), > with last second watermark check before select_bad_process() and after dump_header(). > > at https://lkml.kernel.org/r/7f58363a-db1a-5502-e2b4-ee4b9fa31824@i-love.sakura.ne.jp . As soon as we learn about more of those as mentioned above.
diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 6e94962..0480dde 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -863,9 +863,13 @@ static void __oom_kill_process(struct task_struct *victim, const char *message) p = find_lock_task_mm(victim); if (!p) { + pr_info("Process %d (%s) is already exiting\n", + task_pid_nr(victim), victim->comm); put_task_struct(victim); return; - } else if (victim != p) { + } + + if (victim != p) { get_task_struct(p); put_task_struct(victim); victim = p;
When the OOM killer finding a victim and trying to kill it, if the victim is already exiting, the task mm will be NULL and no process will be killed. But the dump_header() has been already executed, so it will be strange to dump so many information without killing a process. We'd better show some helpful information to indicate why this happens. Suggested-by: David Rientjes <rientjes@google.com> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> --- mm/oom_kill.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)