diff mbox series

[RFC,1/2] mm, oom: marks all killed tasks as oom victims

Message ID 20181022071323.9550-2-mhocko@kernel.org (mailing list archive)
State New, archived
Headers show
Series oom, memcg: do not report racy no-eligible OOM | expand

Commit Message

Michal Hocko Oct. 22, 2018, 7:13 a.m. UTC
From: Michal Hocko <mhocko@suse.com>

Historically we have called mark_oom_victim only to the main task
selected as the oom victim because oom victims have access to memory
reserves and granting the access to all killed tasks could deplete
memory reserves very quickly and cause even larger problems.

Since only a partial access to memory reserves is allowed there is no
longer this risk and so all tasks killed along with the oom victim
can be considered as well.

The primary motivation for that is that process groups which do not
shared signals would behave more like standard thread groups wrt oom
handling (aka tsk_is_oom_victim will work the same way for them).

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/oom_kill.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Tetsuo Handa Oct. 22, 2018, 7:58 a.m. UTC | #1
Michal Hocko wrote:
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -898,6 +898,7 @@ static void __oom_kill_process(struct task_struct *victim)
>  		if (unlikely(p->flags & PF_KTHREAD))
>  			continue;
>  		do_send_sig_info(SIGKILL, SEND_SIG_FORCED, p, PIDTYPE_TGID);
> +		mark_oom_victim(p);
>  	}
>  	rcu_read_unlock();
>  
> -- 

Wrong. Either

---
 mm/oom_kill.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index f10aa53..99b36ff 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -879,6 +879,8 @@ static void __oom_kill_process(struct task_struct *victim)
 	 */
 	rcu_read_lock();
 	for_each_process(p) {
+		struct task_struct *t;
+
 		if (!process_shares_mm(p, mm))
 			continue;
 		if (same_thread_group(p, victim))
@@ -898,6 +900,11 @@ static void __oom_kill_process(struct task_struct *victim)
 		if (unlikely(p->flags & PF_KTHREAD))
 			continue;
 		do_send_sig_info(SIGKILL, SEND_SIG_FORCED, p, PIDTYPE_TGID);
+		t = find_lock_task_mm(p);
+		if (!t)
+			continue;
+		mark_oom_victim(t);
+		task_unlock(t);
 	}
 	rcu_read_unlock();
Michal Hocko Oct. 22, 2018, 8:48 a.m. UTC | #2
On Mon 22-10-18 16:58:50, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -898,6 +898,7 @@ static void __oom_kill_process(struct task_struct *victim)
> >  		if (unlikely(p->flags & PF_KTHREAD))
> >  			continue;
> >  		do_send_sig_info(SIGKILL, SEND_SIG_FORCED, p, PIDTYPE_TGID);
> > +		mark_oom_victim(p);
> >  	}
> >  	rcu_read_unlock();
> >  
> > -- 
> 
> Wrong. Either

You are right. The mm might go away between process_shares_mm and here.
While your find_lock_task_mm would be correct I believe we can do better
by using the existing mm that we already have. I will make it a separate
patch to clarity.

Thanks for pointing this out.

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 188ae490cf3e..4c205061ed67 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -663,6 +663,7 @@ static inline void wake_oom_reaper(struct task_struct *tsk)
 /**
  * mark_oom_victim - mark the given task as OOM victim
  * @tsk: task to mark
+ * @mm: mm associated with the task
  *
  * Has to be called with oom_lock held and never after
  * oom has been disabled already.
@@ -670,10 +671,8 @@ static inline void wake_oom_reaper(struct task_struct *tsk)
  * tsk->mm has to be non NULL and caller has to guarantee it is stable (either
  * under task_lock or operate on the current).
  */
-static void mark_oom_victim(struct task_struct *tsk)
+static void mark_oom_victim(struct task_struct *tsk, struct mm_struct *mm)
 {
-	struct mm_struct *mm = tsk->mm;
-
 	WARN_ON(oom_killer_disabled);
 	/* OOM killer might race with memcg OOM */
 	if (test_and_set_tsk_thread_flag(tsk, TIF_MEMDIE))
@@ -860,7 +859,7 @@ static void __oom_kill_process(struct task_struct *victim)
 	 * reserves from the user space under its control.
 	 */
 	do_send_sig_info(SIGKILL, SEND_SIG_FORCED, victim, PIDTYPE_TGID);
-	mark_oom_victim(victim);
+	mark_oom_victim(victim, mm);
 	pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n",
 		task_pid_nr(victim), victim->comm, K(victim->mm->total_vm),
 		K(get_mm_counter(victim->mm, MM_ANONPAGES)),
@@ -898,7 +897,7 @@ static void __oom_kill_process(struct task_struct *victim)
 		if (unlikely(p->flags & PF_KTHREAD))
 			continue;
 		do_send_sig_info(SIGKILL, SEND_SIG_FORCED, p, PIDTYPE_TGID);
-		mark_oom_victim(p);
+		mark_oom_victim(p, mm);
 	}
 	rcu_read_unlock();
 
@@ -942,7 +941,7 @@ static void oom_kill_process(struct oom_control *oc, const char *message)
 	 */
 	task_lock(p);
 	if (task_will_free_mem(p)) {
-		mark_oom_victim(p);
+		mark_oom_victim(p, p->mm);
 		wake_oom_reaper(p);
 		task_unlock(p);
 		put_task_struct(p);
@@ -1072,7 +1071,7 @@ bool out_of_memory(struct oom_control *oc)
 	 * quickly exit and free its memory.
 	 */
 	if (task_will_free_mem(current)) {
-		mark_oom_victim(current);
+		mark_oom_victim(current, current->mm);
 		wake_oom_reaper(current);
 		return true;
 	}
Tetsuo Handa Oct. 22, 2018, 9:42 a.m. UTC | #3
On 2018/10/22 17:48, Michal Hocko wrote:
> On Mon 22-10-18 16:58:50, Tetsuo Handa wrote:
>> Michal Hocko wrote:
>>> --- a/mm/oom_kill.c
>>> +++ b/mm/oom_kill.c
>>> @@ -898,6 +898,7 @@ static void __oom_kill_process(struct task_struct *victim)
>>>  		if (unlikely(p->flags & PF_KTHREAD))
>>>  			continue;
>>>  		do_send_sig_info(SIGKILL, SEND_SIG_FORCED, p, PIDTYPE_TGID);
>>> +		mark_oom_victim(p);
>>>  	}
>>>  	rcu_read_unlock();
>>>  
>>> -- 
>>
>> Wrong. Either
> 
> You are right. The mm might go away between process_shares_mm and here.
> While your find_lock_task_mm would be correct I believe we can do better
> by using the existing mm that we already have. I will make it a separate
> patch to clarity.

Still wrong. p->mm == NULL means that we are too late to set TIF_MEMDIE
on that thread. Passing non-NULL mm to mark_oom_victim() won't help.

> @@ -898,7 +897,7 @@ static void __oom_kill_process(struct task_struct *victim)
>  		if (unlikely(p->flags & PF_KTHREAD))
>  			continue;
>  		do_send_sig_info(SIGKILL, SEND_SIG_FORCED, p, PIDTYPE_TGID);
> -		mark_oom_victim(p);
> +		mark_oom_victim(p, mm);
>  	}
>  	rcu_read_unlock();
>
Michal Hocko Oct. 22, 2018, 10:43 a.m. UTC | #4
On Mon 22-10-18 18:42:30, Tetsuo Handa wrote:
> On 2018/10/22 17:48, Michal Hocko wrote:
> > On Mon 22-10-18 16:58:50, Tetsuo Handa wrote:
> >> Michal Hocko wrote:
> >>> --- a/mm/oom_kill.c
> >>> +++ b/mm/oom_kill.c
> >>> @@ -898,6 +898,7 @@ static void __oom_kill_process(struct task_struct *victim)
> >>>  		if (unlikely(p->flags & PF_KTHREAD))
> >>>  			continue;
> >>>  		do_send_sig_info(SIGKILL, SEND_SIG_FORCED, p, PIDTYPE_TGID);
> >>> +		mark_oom_victim(p);
> >>>  	}
> >>>  	rcu_read_unlock();
> >>>  
> >>> -- 
> >>
> >> Wrong. Either
> > 
> > You are right. The mm might go away between process_shares_mm and here.
> > While your find_lock_task_mm would be correct I believe we can do better
> > by using the existing mm that we already have. I will make it a separate
> > patch to clarity.
> 
> Still wrong. p->mm == NULL means that we are too late to set TIF_MEMDIE
> on that thread. Passing non-NULL mm to mark_oom_victim() won't help.

Why would it be too late? Or in other words why would this be harmful?
Tetsuo Handa Oct. 22, 2018, 10:56 a.m. UTC | #5
On 2018/10/22 19:43, Michal Hocko wrote:
> On Mon 22-10-18 18:42:30, Tetsuo Handa wrote:
>> On 2018/10/22 17:48, Michal Hocko wrote:
>>> On Mon 22-10-18 16:58:50, Tetsuo Handa wrote:
>>>> Michal Hocko wrote:
>>>>> --- a/mm/oom_kill.c
>>>>> +++ b/mm/oom_kill.c
>>>>> @@ -898,6 +898,7 @@ static void __oom_kill_process(struct task_struct *victim)
>>>>>  		if (unlikely(p->flags & PF_KTHREAD))
>>>>>  			continue;
>>>>>  		do_send_sig_info(SIGKILL, SEND_SIG_FORCED, p, PIDTYPE_TGID);
>>>>> +		mark_oom_victim(p);
>>>>>  	}
>>>>>  	rcu_read_unlock();
>>>>>  
>>>>> -- 
>>>>
>>>> Wrong. Either
>>>
>>> You are right. The mm might go away between process_shares_mm and here.
>>> While your find_lock_task_mm would be correct I believe we can do better
>>> by using the existing mm that we already have. I will make it a separate
>>> patch to clarity.
>>
>> Still wrong. p->mm == NULL means that we are too late to set TIF_MEMDIE
>> on that thread. Passing non-NULL mm to mark_oom_victim() won't help.
> 
> Why would it be too late? Or in other words why would this be harmful?
> 

Setting TIF_MEMDIE after exit_mm() completed is too late.

static void exit_mm(void)
{
(...snipped...)
	task_lock(current);
	current->mm = NULL;
	up_read(&mm->mmap_sem);
	enter_lazy_tlb(mm, current);
	task_unlock(current);
	mm_update_next_owner(mm);
	mmput(mm);
	if (test_thread_flag(TIF_MEMDIE))
		exit_oom_victim();
}
Michal Hocko Oct. 22, 2018, 11:12 a.m. UTC | #6
On Mon 22-10-18 19:56:49, Tetsuo Handa wrote:
> On 2018/10/22 19:43, Michal Hocko wrote:
> > On Mon 22-10-18 18:42:30, Tetsuo Handa wrote:
> >> On 2018/10/22 17:48, Michal Hocko wrote:
> >>> On Mon 22-10-18 16:58:50, Tetsuo Handa wrote:
> >>>> Michal Hocko wrote:
> >>>>> --- a/mm/oom_kill.c
> >>>>> +++ b/mm/oom_kill.c
> >>>>> @@ -898,6 +898,7 @@ static void __oom_kill_process(struct task_struct *victim)
> >>>>>  		if (unlikely(p->flags & PF_KTHREAD))
> >>>>>  			continue;
> >>>>>  		do_send_sig_info(SIGKILL, SEND_SIG_FORCED, p, PIDTYPE_TGID);
> >>>>> +		mark_oom_victim(p);
> >>>>>  	}
> >>>>>  	rcu_read_unlock();
> >>>>>  
> >>>>> -- 
> >>>>
> >>>> Wrong. Either
> >>>
> >>> You are right. The mm might go away between process_shares_mm and here.
> >>> While your find_lock_task_mm would be correct I believe we can do better
> >>> by using the existing mm that we already have. I will make it a separate
> >>> patch to clarity.
> >>
> >> Still wrong. p->mm == NULL means that we are too late to set TIF_MEMDIE
> >> on that thread. Passing non-NULL mm to mark_oom_victim() won't help.
> > 
> > Why would it be too late? Or in other words why would this be harmful?
> > 
> 
> Setting TIF_MEMDIE after exit_mm() completed is too late.

You are right and I am obviously dense today. I will go with
find_lock_task_mm for now and push the "get rid of TIF_MEMDIE" up in the
todo list. I hope I will get to it some day.
diff mbox series

Patch

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index f10aa5360616..188ae490cf3e 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -898,6 +898,7 @@  static void __oom_kill_process(struct task_struct *victim)
 		if (unlikely(p->flags & PF_KTHREAD))
 			continue;
 		do_send_sig_info(SIGKILL, SEND_SIG_FORCED, p, PIDTYPE_TGID);
+		mark_oom_victim(p);
 	}
 	rcu_read_unlock();