From patchwork Tue Jul 3 14:25:03 2018
X-Patchwork-Submitter: Tetsuo Handa
X-Patchwork-Id: 10504171
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: linux-mm@kvack.org, akpm@linux-foundation.org
Cc: torvalds@linux-foundation.org, Tetsuo Handa, David Rientjes,
    Johannes Weiner, Michal Hocko, Roman Gushchin, Tejun Heo,
    Vladimir Davydov
Subject: [PATCH 2/8] mm, oom: Check pending victims earlier in out_of_memory().
Date: Tue, 3 Jul 2018 23:25:03 +0900
Message-Id: <1530627910-3415-3-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp>
In-Reply-To: <1530627910-3415-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp>
References: <1530627910-3415-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp>

The "mm, oom: cgroup-aware OOM killer" patchset is trying to introduce
INFLIGHT_VICTIM in order to replace the open-coded ((void *)-1UL). But (in
the CONFIG_MMU=y case) we already have a list of inflight OOM victim
threads linked on oom_reaper_list, which means we can check whether there
are inflight OOM victims before starting the process/memcg list traversal.
Since it is likely that only a few threads are linked to oom_reaper_list,
checking the OOM domain of every victim on that list will not matter.
Thus, check whether there are inflight OOM victims before starting the
process/memcg list traversal, and eliminate the "abort" path.

Note that this patch could temporarily regress CONFIG_MMU=n kernels,
because with CONFIG_MMU=n it selects the same victims again rather than
waiting for existing victims to terminate. This will be fixed by the next
patch in this series.
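[Aside, not part of the patch: the data-structure change below replaces a
single-pointer chain (struct task_struct *oom_reaper_list) with a
doubly-linked intrusive list (struct list_head oom_victim_list), so an
entry can be unlinked from the middle in O(1). A minimal userspace sketch
of that pattern, using simplified stand-ins for the <linux/list.h> helpers
(these re-implementations are illustrative, not the kernel's code):]

```c
#include <assert.h>
#include <stddef.h>

/* A node embedded inside the object it links ("intrusive" list). */
struct list_head { struct list_head *next, *prev; };

#define LIST_HEAD_INIT(name) { &(name), &(name) }

/* Recover the containing object from a pointer to its embedded node. */
#define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))
#define list_first_entry(head, type, member) \
        container_of((head)->next, type, member)

static void list_add_tail(struct list_head *new, struct list_head *head)
{
        new->prev = head->prev;
        new->next = head;
        head->prev->next = new;
        head->prev = new;
}

static void list_del(struct list_head *entry)
{
        entry->prev->next = entry->next;
        entry->next->prev = entry->prev;
        /* NULL next/prev so a "queued?" check like tsk->...next works. */
        entry->next = entry->prev = NULL;
}

static int list_empty(const struct list_head *head)
{
        return head->next == head;
}

/* Hypothetical stand-in for task_struct with an embedded list node. */
struct task {
        int pid;
        struct list_head oom_victim_list;
};
```

With this shape, oom_reap_task() can delete its victim with a single
list_del() under the lock, instead of walking a pointer chain.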
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Roman Gushchin
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Vladimir Davydov
Cc: David Rientjes
Cc: Tejun Heo
---
 include/linux/memcontrol.h |   9 ++--
 include/linux/sched.h      |   2 +-
 mm/memcontrol.c            |  18 +++-----
 mm/oom_kill.c              | 103 +++++++++++++++++++++++++--------------------
 4 files changed, 67 insertions(+), 65 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 6c6fb11..a82360a 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -382,8 +382,8 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *,
                                   struct mem_cgroup *,
                                   struct mem_cgroup_reclaim_cookie *);
 void mem_cgroup_iter_break(struct mem_cgroup *, struct mem_cgroup *);
-int mem_cgroup_scan_tasks(struct mem_cgroup *,
-                          int (*)(struct task_struct *, void *), void *);
+void mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
+                           void (*fn)(struct task_struct *, void *), void *arg);
 
 static inline unsigned short mem_cgroup_id(struct mem_cgroup *memcg)
 {
@@ -850,10 +850,9 @@ static inline void mem_cgroup_iter_break(struct mem_cgroup *root,
 {
 }
 
-static inline int mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
-                int (*fn)(struct task_struct *, void *), void *arg)
+static inline void mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
+                void (*fn)(struct task_struct *, void *), void *arg)
 {
-        return 0;
 }
 
 static inline unsigned short mem_cgroup_id(struct mem_cgroup *memcg)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 9256118..d56ae68 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1163,7 +1163,7 @@ struct task_struct {
 #endif
         int                             pagefault_disabled;
 #ifdef CONFIG_MMU
-        struct task_struct              *oom_reaper_list;
+        struct list_head                oom_victim_list;
 #endif
 #ifdef CONFIG_VMAP_STACK
         struct vm_struct                *stack_vm_area;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e6f0d5e..c8a75c8 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -884,17 +884,14 @@ static void invalidate_reclaim_iterators(struct mem_cgroup *dead_memcg)
  * @arg: argument passed to @fn
  *
  * This function iterates over tasks attached to @memcg or to any of its
- * descendants and calls @fn for each task. If @fn returns a non-zero
- * value, the function breaks the iteration loop and returns the value.
- * Otherwise, it will iterate over all tasks and return 0.
+ * descendants and calls @fn for each task.
  *
  * This function must not be called for the root memory cgroup.
  */
-int mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
-                          int (*fn)(struct task_struct *, void *), void *arg)
+void mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
+                           void (*fn)(struct task_struct *, void *), void *arg)
 {
         struct mem_cgroup *iter;
-        int ret = 0;
 
         BUG_ON(memcg == root_mem_cgroup);
 
@@ -903,15 +900,10 @@ int mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
                 struct task_struct *task;
 
                 css_task_iter_start(&iter->css, 0, &it);
-                while (!ret && (task = css_task_iter_next(&it)))
-                        ret = fn(task, arg);
+                while ((task = css_task_iter_next(&it)))
+                        fn(task, arg);
                 css_task_iter_end(&it);
-                if (ret) {
-                        mem_cgroup_iter_break(memcg, iter);
-                        break;
-                }
         }
-        return ret;
 }
 
 /**
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index d3fb4e4..f58281e 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -304,25 +304,13 @@ static enum oom_constraint constrained_alloc(struct oom_control *oc)
         return CONSTRAINT_NONE;
 }
 
-static int oom_evaluate_task(struct task_struct *task, void *arg)
+static void oom_evaluate_task(struct task_struct *task, void *arg)
 {
         struct oom_control *oc = arg;
         unsigned long points;
 
         if (oom_unkillable_task(task, NULL, oc->nodemask))
-                goto next;
-
-        /*
-         * This task already has access to memory reserves and is being killed.
-         * Don't allow any other task to have access to the reserves unless
-         * the task has MMF_OOM_SKIP because chances that it would release
-         * any memory is quite low.
-         */
-        if (!is_sysrq_oom(oc) && tsk_is_oom_victim(task)) {
-                if (test_bit(MMF_OOM_SKIP, &task->signal->oom_mm->flags))
-                        goto next;
-                goto abort;
-        }
+                return;
 
         /*
          * If task is allocating a lot of memory and has been marked to be
@@ -335,29 +323,22 @@ static int oom_evaluate_task(struct task_struct *task, void *arg)
 
         points = oom_badness(task, NULL, oc->nodemask, oc->totalpages);
         if (!points || points < oc->chosen_points)
-                goto next;
+                return;
 
         /* Prefer thread group leaders for display purposes */
         if (points == oc->chosen_points && thread_group_leader(oc->chosen))
-                goto next;
+                return;
 select:
         if (oc->chosen)
                 put_task_struct(oc->chosen);
         get_task_struct(task);
         oc->chosen = task;
         oc->chosen_points = points;
-next:
-        return 0;
-abort:
-        if (oc->chosen)
-                put_task_struct(oc->chosen);
-        oc->chosen = (void *)-1UL;
-        return 1;
 }
 
 /*
  * Simple selection loop. We choose the process with the highest number of
- * 'points'. In case scan was aborted, oc->chosen is set to -1.
+ * 'points'.
  */
 static void select_bad_process(struct oom_control *oc)
 {
@@ -368,8 +349,7 @@ static void select_bad_process(struct oom_control *oc)
 
                 rcu_read_lock();
                 for_each_process(p)
-                        if (oom_evaluate_task(p, oc))
-                                break;
+                        oom_evaluate_task(p, oc);
                 rcu_read_unlock();
         }
 }
@@ -476,7 +456,7 @@ bool process_shares_mm(struct task_struct *p, struct mm_struct *mm)
  */
 static struct task_struct *oom_reaper_th;
 static DECLARE_WAIT_QUEUE_HEAD(oom_reaper_wait);
-static struct task_struct *oom_reaper_list;
+static LIST_HEAD(oom_victim_list);
 static DEFINE_SPINLOCK(oom_reaper_lock);
 
 /*
@@ -488,7 +468,7 @@ bool process_shares_mm(struct task_struct *p, struct mm_struct *mm)
  *                                      unmap_page_range() # frees some memory
  *                                      set_bit(MMF_OOM_SKIP)
  *      out_of_memory()
- *        select_bad_process()
+ *        oom_has_pending_victims()
  *          test_bit(MMF_OOM_SKIP) # selects new oom victim
  *      mutex_unlock(&oom_lock)
 *
@@ -606,14 +586,16 @@ static void oom_reap_task(struct task_struct *tsk)
                 debug_show_all_locks();
 
 done:
-        tsk->oom_reaper_list = NULL;
-
         /*
          * Hide this mm from OOM killer because it has been either reaped or
          * somebody can't call up_write(mmap_sem).
          */
         set_bit(MMF_OOM_SKIP, &mm->flags);
 
+        spin_lock(&oom_reaper_lock);
+        list_del(&tsk->oom_victim_list);
+        spin_unlock(&oom_reaper_lock);
+
         /* Drop a reference taken by wake_oom_reaper */
         put_task_struct(tsk);
 }
@@ -623,12 +605,13 @@ static int oom_reaper(void *unused)
         while (true) {
                 struct task_struct *tsk = NULL;
 
-                wait_event_freezable(oom_reaper_wait, oom_reaper_list != NULL);
+                wait_event_freezable(oom_reaper_wait,
+                                     !list_empty(&oom_victim_list));
                 spin_lock(&oom_reaper_lock);
-                if (oom_reaper_list != NULL) {
-                        tsk = oom_reaper_list;
-                        oom_reaper_list = tsk->oom_reaper_list;
-                }
+                if (!list_empty(&oom_victim_list))
+                        tsk = list_first_entry(&oom_victim_list,
+                                               struct task_struct,
+                                               oom_victim_list);
                 spin_unlock(&oom_reaper_lock);
 
                 if (tsk)
@@ -640,15 +623,11 @@ static int oom_reaper(void *unused)
 
 static void wake_oom_reaper(struct task_struct *tsk)
 {
-        /* tsk is already queued? */
-        if (tsk == oom_reaper_list || tsk->oom_reaper_list)
+        if (tsk->oom_victim_list.next)
                 return;
-
         get_task_struct(tsk);
-
         spin_lock(&oom_reaper_lock);
-        tsk->oom_reaper_list = oom_reaper_list;
-        oom_reaper_list = tsk;
+        list_add_tail(&tsk->oom_victim_list, &oom_victim_list);
         spin_unlock(&oom_reaper_lock);
         trace_wake_reaper(tsk->pid);
         wake_up(&oom_reaper_wait);
@@ -1010,6 +989,34 @@ int unregister_oom_notifier(struct notifier_block *nb)
 }
 EXPORT_SYMBOL_GPL(unregister_oom_notifier);
 
+static bool oom_has_pending_victims(struct oom_control *oc)
+{
+#ifdef CONFIG_MMU
+        struct task_struct *p;
+
+        if (is_sysrq_oom(oc))
+                return false;
+        /*
+         * Since oom_reap_task_mm()/exit_mmap() will set MMF_OOM_SKIP, let's
+         * wait for pending victims until MMF_OOM_SKIP is set.
+         */
+        spin_lock(&oom_reaper_lock);
+        list_for_each_entry(p, &oom_victim_list, oom_victim_list)
+                if (!oom_unkillable_task(p, oc->memcg, oc->nodemask) &&
+                    !test_bit(MMF_OOM_SKIP, &p->signal->oom_mm->flags))
+                        break;
+        spin_unlock(&oom_reaper_lock);
+        return p != NULL;
+#else
+        /*
+         * Since nobody except oom_kill_process() sets MMF_OOM_SKIP, waiting
+         * for pending victims until MMF_OOM_SKIP is set is useless. Therefore,
+         * simply let the OOM killer select pending victims again.
+         */
+        return false;
+#endif
+}
+
 /**
  * out_of_memory - kill the "best" process when we run out of memory
  * @oc: pointer to struct oom_control
@@ -1063,6 +1070,9 @@ bool out_of_memory(struct oom_control *oc)
                 oc->nodemask = NULL;
         check_panic_on_oom(oc, constraint);
 
+        if (oom_has_pending_victims(oc))
+                return true;
+
         if (!is_memcg_oom(oc) && sysctl_oom_kill_allocating_task &&
             current->mm && !oom_unkillable_task(current, NULL, oc->nodemask) &&
             current->signal->oom_score_adj != OOM_SCORE_ADJ_MIN) {
@@ -1074,14 +1084,15 @@ bool out_of_memory(struct oom_control *oc)
         select_bad_process(oc);
 
         /* Found nothing?!?! Either we hang forever, or we panic. */
-        if (!oc->chosen && !is_sysrq_oom(oc) && !is_memcg_oom(oc)) {
+        if (!oc->chosen) {
+                if (is_sysrq_oom(oc) || is_memcg_oom(oc))
+                        return false;
                 dump_header(oc, NULL);
                 panic("Out of memory and no killable processes...\n");
         }
-        if (oc->chosen && oc->chosen != (void *)-1UL)
-                oom_kill_process(oc, !is_memcg_oom(oc) ? "Out of memory" :
-                                 "Memory cgroup out of memory");
-        return !!oc->chosen;
+        oom_kill_process(oc, !is_memcg_oom(oc) ? "Out of memory" :
+                         "Memory cgroup out of memory");
+        return true;
 }
 
 /*
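[Aside, not part of the patch: the control-flow change in out_of_memory()
above — first ask whether an already-selected victim is still pending, and
only fall through to the expensive all-process scan when none is — can be
sketched in userspace as follows. All names here are hypothetical
stand-ins for the kernel structures, not the patch's actual code:]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a queued OOM victim. */
struct victim {
        bool skip;      /* stands in for MMF_OOM_SKIP being set */
};

/* Mirrors oom_has_pending_victims(): any queued victim not yet skipped? */
static bool has_pending_victims(const struct victim *queue, size_t n)
{
        for (size_t i = 0; i < n; i++)
                if (!queue[i].skip)
                        return true;
        return false;
}

/*
 * Mirrors the reworked out_of_memory() decision: wait for an inflight
 * victim to free memory rather than selecting another process.
 */
static const char *oom_decision(const struct victim *queue, size_t n)
{
        if (has_pending_victims(queue, n))
                return "wait";              /* a victim should exit soon */
        return "select new victim";         /* fall back to the full scan */
}
```

The point of the reordering is that the small pending-victim list is checked
before any process/memcg traversal starts, which removes the need for the
in-scan "abort" path and its ((void *)-1UL) sentinel.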