diff mbox series

mm, oom: show process exiting information in __oom_kill_process()

Message ID 1595166795-27587-1-git-send-email-laoar.shao@gmail.com (mailing list archive)
State New, archived
Headers show
Series mm, oom: show process exiting information in __oom_kill_process() | expand

Commit Message

Yafang Shao July 19, 2020, 1:53 p.m. UTC
When the OOM killer finding a victim and trying to kill it, if the victim
is already exiting, the task mm will be NULL and no process will be killed.
But the dump_header() has been already executed, so it will be strange to
dump so many information without killing a process. We'd better show some
helpful information to indicate why this happens.

Suggested-by: David Rientjes <rientjes@google.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
---
 mm/oom_kill.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comments

Tetsuo Handa July 19, 2020, 11:20 p.m. UTC | #1
On 2020/07/19 22:53, Yafang Shao wrote:
>  	p = find_lock_task_mm(victim);
>  	if (!p) {
> +		pr_info("Process %d (%s) is already exiting\n",
> +			task_pid_nr(victim), victim->comm);

Maybe

+               pr_info("%s: Process %d (%s) is already exiting\n",
+                       message, task_pid_nr(victim), victim->comm);

line is easier to compare with

        pr_err("%s: Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB, UID:%u pgtables:%lukB oom_score_adj:%hd\n",
                message, task_pid_nr(victim), victim->comm, K(mm->total_vm),

line?

>  		put_task_struct(victim);
>  		return;
Yafang Shao July 20, 2020, 1:43 a.m. UTC | #2
On Mon, Jul 20, 2020 at 7:20 AM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2020/07/19 22:53, Yafang Shao wrote:
> >       p = find_lock_task_mm(victim);
> >       if (!p) {
> > +             pr_info("Process %d (%s) is already exiting\n",
> > +                     task_pid_nr(victim), victim->comm);
>
> Maybe
>
> +               pr_info("%s: Process %d (%s) is already exiting\n",
> +                       message, task_pid_nr(victim), victim->comm);
>
> line is easier to compare with
>
>         pr_err("%s: Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB, UID:%u pgtables:%lukB oom_score_adj:%hd\n",
>                 message, task_pid_nr(victim), victim->comm, K(mm->total_vm),
>
> line?
>

Seems so.
Thanks for the suggestion.

> >               put_task_struct(victim);
> >               return;
Michal Hocko July 20, 2020, 7:16 a.m. UTC | #3
On Sun 19-07-20 09:53:15, Yafang Shao wrote:
> When the OOM killer finding a victim and trying to kill it, if the victim
> is already exiting, the task mm will be NULL and no process will be killed.
> But the dump_header() has been already executed, so it will be strange to
> dump so many information without killing a process. We'd better show some
> helpful information to indicate why this happens.
> 
> Suggested-by: David Rientjes <rientjes@google.com>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
> ---
>  mm/oom_kill.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 6e94962..0480dde 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -863,9 +863,13 @@ static void __oom_kill_process(struct task_struct *victim, const char *message)
>  
>  	p = find_lock_task_mm(victim);
>  	if (!p) {
> +		pr_info("Process %d (%s) is already exiting\n",
> +			task_pid_nr(victim), victim->comm);
>  		put_task_struct(victim);

I do agree that a silent bail out is not the best thing to do. The above
message would be more useful if it also explained what the oom killer
does (or does not):

	"OOM victim %d (%s) is already exiting. Skip killing the task\n"

>  		return;
> -	} else if (victim != p) {
> +	}
> +
> +	if (victim != p) {

Why do we need this?

>  		get_task_struct(p);
>  		put_task_struct(victim);
>  		victim = p;
> -- 
> 1.8.3.1
Yafang Shao July 20, 2020, 10:36 a.m. UTC | #4
On Mon, Jul 20, 2020 at 3:16 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Sun 19-07-20 09:53:15, Yafang Shao wrote:
> > When the OOM killer finding a victim and trying to kill it, if the victim
> > is already exiting, the task mm will be NULL and no process will be killed.
> > But the dump_header() has been already executed, so it will be strange to
> > dump so many information without killing a process. We'd better show some
> > helpful information to indicate why this happens.
> >
> > Suggested-by: David Rientjes <rientjes@google.com>
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > Cc: Michal Hocko <mhocko@kernel.org>
> > Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
> > ---
> >  mm/oom_kill.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > index 6e94962..0480dde 100644
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -863,9 +863,13 @@ static void __oom_kill_process(struct task_struct *victim, const char *message)
> >
> >       p = find_lock_task_mm(victim);
> >       if (!p) {
> > +             pr_info("Process %d (%s) is already exiting\n",
> > +                     task_pid_nr(victim), victim->comm);
> >               put_task_struct(victim);
>
> I do agree that a silent bail out is not the best thing to do. The above
> message would be more useful if it also explained what the oom killer
> does (or does not):
>
>         "OOM victim %d (%s) is already exiting. Skip killing the task\n"
>

Sure.

> >               return;
> > -     } else if (victim != p) {
> > +     }
> > +
> > +     if (victim != p) {
>
> Why do we need this?
>

Because I don't  like that code style.
But it is not a big problem, I will not change it in the next version.

> >               get_task_struct(p);
> >               put_task_struct(victim);
> >               victim = p;
> > --
> > 1.8.3.1
>
> --
> Michal Hocko
> SUSE Labs
Tetsuo Handa July 20, 2020, 11:06 a.m. UTC | #5
On 2020/07/20 19:36, Yafang Shao wrote:
> On Mon, Jul 20, 2020 at 3:16 PM Michal Hocko <mhocko@kernel.org> wrote:
>> I do agree that a silent bail out is not the best thing to do. The above
>> message would be more useful if it also explained what the oom killer
>> does (or does not):
>>
>>         "OOM victim %d (%s) is already exiting. Skip killing the task\n"
>>
> 
> Sure.

This path is rarely hit because find_lock_task_mm() in oom_badness() from
select_bad_process() in the next round of OOM killer will skip this task.

Since we don't wake up the OOM reaper when hitting this path, unless __mmput()
for this task itself immediately reclaims memory and updates the statistics
counter, we just get two chunks of dump_header() messages and one OOM victim.

Current synchronous printk() gives __mmput() some time for reclaiming memory
and updating the statistics counter. But when printk() becomes asynchronous,
there might be quite small time. People might wonder "why killed message
follows immediately after skipped killing message"... Wouldn't the skip
message confuse people?
Yafang Shao July 20, 2020, 12:19 p.m. UTC | #6
On Mon, Jul 20, 2020 at 7:06 PM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2020/07/20 19:36, Yafang Shao wrote:
> > On Mon, Jul 20, 2020 at 3:16 PM Michal Hocko <mhocko@kernel.org> wrote:
> >> I do agree that a silent bail out is not the best thing to do. The above
> >> message would be more useful if it also explained what the oom killer
> >> does (or does not):
> >>
> >>         "OOM victim %d (%s) is already exiting. Skip killing the task\n"
> >>
> >
> > Sure.
>
> This path is rarely hit because find_lock_task_mm() in oom_badness() from
> select_bad_process() in the next round of OOM killer will skip this task.
>
> Since we don't wake up the OOM reaper when hitting this path, unless __mmput()
> for this task itself immediately reclaims memory and updates the statistics
> counter, we just get two chunks of dump_header() messages and one OOM victim.
>

Could you pls. explain more specifically why we will get two chunks of
dump_header()?
My understanding is the free_mm() happens between select_bad_process()
and __oom_kill_process() as bellow,

P1
             Victim
select_bad_process()
    oom_badness()
        p = find_lock_task_mm()  # p isn't NULL

                __mmput()

                    free_mm()
dump_header()  # dump once
__oom_kill_process()
    p = find_lock_task_mm(victim); # p is NULL now

So where is another dump_header() ?


> Current synchronous printk() gives __mmput() some time for reclaiming memory
> and updating the statistics counter. But when printk() becomes asynchronous,
> there might be quite small time. People might wonder "why killed message
> follows immediately after skipped killing message"... Wouldn't the skip
> message confuse people?
Tetsuo Handa July 20, 2020, 1:11 p.m. UTC | #7
On 2020/07/20 21:19, Yafang Shao wrote:
> On Mon, Jul 20, 2020 at 7:06 PM Tetsuo Handa
> <penguin-kernel@i-love.sakura.ne.jp> wrote:
>>
>> On 2020/07/20 19:36, Yafang Shao wrote:
>>> On Mon, Jul 20, 2020 at 3:16 PM Michal Hocko <mhocko@kernel.org> wrote:
>>>> I do agree that a silent bail out is not the best thing to do. The above
>>>> message would be more useful if it also explained what the oom killer
>>>> does (or does not):
>>>>
>>>>         "OOM victim %d (%s) is already exiting. Skip killing the task\n"
>>>>
>>>
>>> Sure.
>>
>> This path is rarely hit because find_lock_task_mm() in oom_badness() from
>> select_bad_process() in the next round of OOM killer will skip this task.
>>
>> Since we don't wake up the OOM reaper when hitting this path, unless __mmput()
>> for this task itself immediately reclaims memory and updates the statistics
>> counter, we just get two chunks of dump_header() messages and one OOM victim.
>>
> 
> Could you pls. explain more specifically why we will get two chunks of
> dump_header()?
> My understanding is the free_mm() happens between select_bad_process()
> and __oom_kill_process() as bellow,
> 
> P1
>              Victim
> select_bad_process()
>     oom_badness()
>         p = find_lock_task_mm()  # p isn't NULL
> 
>                 __mmput()
> 
>                     free_mm()
> dump_header()  # dump once
> __oom_kill_process()
>     p = find_lock_task_mm(victim); # p is NULL now
> 
> So where is another dump_header() ?
> 

Start of __mmput() does not guarantee that memory is reclaimed immediately.
Moreover, even __mmput() might not have started by the moment second chunk of
dump_header() happens. The "OOM victim %d (%s) is already exiting." case only
indicates that victim's mm became NULL; there is no guarantee that memory is
reclaimed (in order to avoid OOM kill) by the moment next round hits.

P1                                Victim1                              Victim2

out_of_memory() {
  select_bad_process() {
    oom_badness() {
      p = find_lock_task_mm() {
        task_lock(victim);       // finds Victim1 because Victim1->mm != NULL.
      }
      get_task_struct(p);
      task_unlock(p);
    }
  }
  oom_kill_process() {
    task_lock(victim);
    task_unlock(victim);
                                  do_exit() {
    dump_header(oc, victim); // first dump_header() with Victim1 and Victim2
    __oom_kill_process(victim, message) {
                                    exit_mm() {
                                      task_lock(current);
                                      current->mm = NULL;
                                      task_unlock(current);
        p = find_lock_task_mm(victim);
        put_task_struct(victim); // without killing Victim1 because p == NULL.
      }
    }
  }
}
out_of_memory() {
  select_bad_process() {
    oom_badness() {
      p = find_lock_task_mm() {
        task_lock(victim);       // finds Victim2 because Victim2->mm != NULL.
      }
      get_task_struct(p);
      task_unlock(p);
    }
  }
                                      mmput() {
                                        __mmput() {
                                          uprobe_clear_state() {
                                            // Might wait for delayed_uprobe_lock.
                                          }
  oom_kill_process() {
    task_lock(victim);
    task_unlock(victim);
    dump_header(oc, victim); // second dump_header() with Victim2
    __oom_kill_process(victim, message) {
      p = find_lock_task_mm(victim);
      pr_err("%s: Killed process %d (%s) "...); // first kill message.
      put_task_struct(p);
    }
  }
}
                                          exit_mmap(); // Which frees memory.
                                        }
                                      }
                                    }
                                  }

Maybe the better behavior is to restart out_of_memory() without dump_header()
(we can remember whether we already called dump_header() into "struct oom_control"),
with last second watermark check before select_bad_process() and after dump_header().
Michal Hocko July 20, 2020, 1:35 p.m. UTC | #8
On Mon 20-07-20 18:36:53, Yafang Shao wrote:
> On Mon, Jul 20, 2020 at 3:16 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Sun 19-07-20 09:53:15, Yafang Shao wrote:
> > > When the OOM killer finding a victim and trying to kill it, if the victim
> > > is already exiting, the task mm will be NULL and no process will be killed.
> > > But the dump_header() has been already executed, so it will be strange to
> > > dump so many information without killing a process. We'd better show some
> > > helpful information to indicate why this happens.
> > >
> > > Suggested-by: David Rientjes <rientjes@google.com>
> > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > > Cc: Michal Hocko <mhocko@kernel.org>
> > > Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
> > > ---
> > >  mm/oom_kill.c | 6 +++++-
> > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > > index 6e94962..0480dde 100644
> > > --- a/mm/oom_kill.c
> > > +++ b/mm/oom_kill.c
> > > @@ -863,9 +863,13 @@ static void __oom_kill_process(struct task_struct *victim, const char *message)
> > >
> > >       p = find_lock_task_mm(victim);
> > >       if (!p) {
> > > +             pr_info("Process %d (%s) is already exiting\n",
> > > +                     task_pid_nr(victim), victim->comm);
> > >               put_task_struct(victim);
> >
> > I do agree that a silent bail out is not the best thing to do. The above
> > message would be more useful if it also explained what the oom killer
> > does (or does not):
> >
> >         "OOM victim %d (%s) is already exiting. Skip killing the task\n"
> >
> 
> Sure.
> 
> > >               return;
> > > -     } else if (victim != p) {
> > > +     }
> > > +
> > > +     if (victim != p) {
> >
> > Why do we need this?
> >
> 
> Because I don't  like that code style.

We usually prefer to keep unrelated changes separate. Minor coding
style changes are usually questionable because many people have
different sense for the code.

> But it is not a big problem, I will not change it in the next version.

Thanks
Michal Hocko July 20, 2020, 1:41 p.m. UTC | #9
On Mon 20-07-20 20:06:17, Tetsuo Handa wrote:
> On 2020/07/20 19:36, Yafang Shao wrote:
> > On Mon, Jul 20, 2020 at 3:16 PM Michal Hocko <mhocko@kernel.org> wrote:
> >> I do agree that a silent bail out is not the best thing to do. The above
> >> message would be more useful if it also explained what the oom killer
> >> does (or does not):
> >>
> >>         "OOM victim %d (%s) is already exiting. Skip killing the task\n"
> >>
> > 
> > Sure.
> 
> This path is rarely hit because find_lock_task_mm() in oom_badness() from
> select_bad_process() in the next round of OOM killer will skip this task.

Agreed!

> Since we don't wake up the OOM reaper when hitting this path, unless __mmput()
> for this task itself immediately reclaims memory and updates the statistics
> counter, we just get two chunks of dump_header() messages and one OOM victim.
> 
> Current synchronous printk() gives __mmput() some time for reclaiming memory
> and updating the statistics counter. But when printk() becomes asynchronous,
> there might be quite small time. People might wonder "why killed message
> follows immediately after skipped killing message"... Wouldn't the skip
> message confuse people?

I would ask other way around. Wouldn't that give us a better clue that
the first oom invocation and the back off was a suboptimal decision? If
we learn about more of those, maybe we want to reconsider this heuristic
and rather retry the victim selection instead.

I do not really see how this message would be harmful TBH.
Yafang Shao July 20, 2020, 1:59 p.m. UTC | #10
On Mon, Jul 20, 2020 at 9:11 PM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2020/07/20 21:19, Yafang Shao wrote:
> > On Mon, Jul 20, 2020 at 7:06 PM Tetsuo Handa
> > <penguin-kernel@i-love.sakura.ne.jp> wrote:
> >>
> >> On 2020/07/20 19:36, Yafang Shao wrote:
> >>> On Mon, Jul 20, 2020 at 3:16 PM Michal Hocko <mhocko@kernel.org> wrote:
> >>>> I do agree that a silent bail out is not the best thing to do. The above
> >>>> message would be more useful if it also explained what the oom killer
> >>>> does (or does not):
> >>>>
> >>>>         "OOM victim %d (%s) is already exiting. Skip killing the task\n"
> >>>>
> >>>
> >>> Sure.
> >>
> >> This path is rarely hit because find_lock_task_mm() in oom_badness() from
> >> select_bad_process() in the next round of OOM killer will skip this task.
> >>
> >> Since we don't wake up the OOM reaper when hitting this path, unless __mmput()
> >> for this task itself immediately reclaims memory and updates the statistics
> >> counter, we just get two chunks of dump_header() messages and one OOM victim.
> >>
> >
> > Could you pls. explain more specifically why we will get two chunks of
> > dump_header()?
> > My understanding is the free_mm() happens between select_bad_process()
> > and __oom_kill_process() as bellow,
> >
> > P1
> >              Victim
> > select_bad_process()
> >     oom_badness()
> >         p = find_lock_task_mm()  # p isn't NULL
> >
> >                 __mmput()
> >
> >                     free_mm()
> > dump_header()  # dump once
> > __oom_kill_process()
> >     p = find_lock_task_mm(victim); # p is NULL now
> >
> > So where is another dump_header() ?
> >
>
> Start of __mmput() does not guarantee that memory is reclaimed immediately.
> Moreover, even __mmput() might not have started by the moment second chunk of
> dump_header() happens. The "OOM victim %d (%s) is already exiting." case only
> indicates that victim's mm became NULL; there is no guarantee that memory is
> reclaimed (in order to avoid OOM kill) by the moment next round hits.
>
> P1                                Victim1                              Victim2
>
> out_of_memory() {
>   select_bad_process() {
>     oom_badness() {
>       p = find_lock_task_mm() {
>         task_lock(victim);       // finds Victim1 because Victim1->mm != NULL.
>       }
>       get_task_struct(p);
>       task_unlock(p);
>     }
>   }
>   oom_kill_process() {
>     task_lock(victim);
>     task_unlock(victim);
>                                   do_exit() {
>     dump_header(oc, victim); // first dump_header() with Victim1 and Victim2
>     __oom_kill_process(victim, message) {
>                                     exit_mm() {
>                                       task_lock(current);
>                                       current->mm = NULL;
>                                       task_unlock(current);
>         p = find_lock_task_mm(victim);
>         put_task_struct(victim); // without killing Victim1 because p == NULL.
>       }
>     }
>   }
> }
> out_of_memory() {
>   select_bad_process() {
>     oom_badness() {
>       p = find_lock_task_mm() {
>         task_lock(victim);       // finds Victim2 because Victim2->mm != NULL.
>       }
>       get_task_struct(p);
>       task_unlock(p);
>     }
>   }
>                                       mmput() {
>                                         __mmput() {
>                                           uprobe_clear_state() {
>                                             // Might wait for delayed_uprobe_lock.
>                                           }
>   oom_kill_process() {
>     task_lock(victim);
>     task_unlock(victim);
>     dump_header(oc, victim); // second dump_header() with Victim2
>     __oom_kill_process(victim, message) {
>       p = find_lock_task_mm(victim);
>       pr_err("%s: Killed process %d (%s) "...); // first kill message.
>       put_task_struct(p);
>     }
>   }
> }
>                                           exit_mmap(); // Which frees memory.
>                                         }
>                                       }
>                                     }
>                                   }
>
> Maybe the better behavior is to restart out_of_memory() without dump_header()
> (we can remember whether we already called dump_header() into "struct oom_control"),
> with last second watermark check before select_bad_process() and after dump_header().

I understand what you mean now.
But I agree with Michal that this output won't be harmful in your case.
And for your case, I think Michal's suggestion that retry the victim
selection would be better.
Tetsuo Handa July 20, 2020, 2:03 p.m. UTC | #11
On 2020/07/20 22:41, Michal Hocko wrote:
>> Since we don't wake up the OOM reaper when hitting this path, unless __mmput()
>> for this task itself immediately reclaims memory and updates the statistics
>> counter, we just get two chunks of dump_header() messages and one OOM victim.
>>
>> Current synchronous printk() gives __mmput() some time for reclaiming memory
>> and updating the statistics counter. But when printk() becomes asynchronous,
>> there might be quite small time. People might wonder "why killed message
>> follows immediately after skipped killing message"... Wouldn't the skip
>> message confuse people?
> 
> I would ask other way around. Wouldn't that give us a better clue that
> the first oom invocation and the back off was a suboptimal decision? If
> we learn about more of those, maybe we want to reconsider this heuristic
> and rather retry the victim selection instead.

I've just suggested

  Maybe the better behavior is to restart out_of_memory() without dump_header()
  (we can remember whether we already called dump_header() into "struct oom_control"),
  with last second watermark check before select_bad_process() and after dump_header().

at https://lkml.kernel.org/r/7f58363a-db1a-5502-e2b4-ee4b9fa31824@i-love.sakura.ne.jp .
Michal Hocko July 20, 2020, 2:23 p.m. UTC | #12
On Mon 20-07-20 23:03:46, Tetsuo Handa wrote:
> On 2020/07/20 22:41, Michal Hocko wrote:
> >> Since we don't wake up the OOM reaper when hitting this path, unless __mmput()
> >> for this task itself immediately reclaims memory and updates the statistics
> >> counter, we just get two chunks of dump_header() messages and one OOM victim.
> >>
> >> Current synchronous printk() gives __mmput() some time for reclaiming memory
> >> and updating the statistics counter. But when printk() becomes asynchronous,
> >> there might be quite small time. People might wonder "why killed message
> >> follows immediately after skipped killing message"... Wouldn't the skip
> >> message confuse people?
> > 
> > I would ask other way around. Wouldn't that give us a better clue that
> > the first oom invocation and the back off was a suboptimal decision? If
> > we learn about more of those, maybe we want to reconsider this heuristic
> > and rather retry the victim selection instead.
> 
> I've just suggested
> 
>   Maybe the better behavior is to restart out_of_memory() without dump_header()
>   (we can remember whether we already called dump_header() into "struct oom_control"),
>   with last second watermark check before select_bad_process() and after dump_header().
> 
> at https://lkml.kernel.org/r/7f58363a-db1a-5502-e2b4-ee4b9fa31824@i-love.sakura.ne.jp .

As soon as we learn about more of those as mentioned above.
diff mbox series

Patch

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 6e94962..0480dde 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -863,9 +863,13 @@  static void __oom_kill_process(struct task_struct *victim, const char *message)
 
 	p = find_lock_task_mm(victim);
 	if (!p) {
+		pr_info("Process %d (%s) is already exiting\n",
+			task_pid_nr(victim), victim->comm);
 		put_task_struct(victim);
 		return;
-	} else if (victim != p) {
+	}
+
+	if (victim != p) {
 		get_task_struct(p);
 		put_task_struct(victim);
 		victim = p;