From patchwork Tue Jul 3 14:25:06 2018
X-Patchwork-Submitter: Tetsuo Handa
X-Patchwork-Id: 10504183
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: linux-mm@kvack.org, akpm@linux-foundation.org
Cc: torvalds@linux-foundation.org, Tetsuo Handa, David Rientjes,
    Johannes Weiner, Michal Hocko, Roman Gushchin, Tejun Heo,
    Vladimir Davydov
Subject: [PATCH 5/8] mm,oom: Bring OOM notifier to outside of oom_lock.
Date: Tue, 3 Jul 2018 23:25:06 +0900
Message-Id: <1530627910-3415-6-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1530627910-3415-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp>
References: <1530627910-3415-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp>

Since blocking_notifier_call_chain() in out_of_memory() might sleep,
sleeping with oom_lock held is currently an unavoidable problem. As a
preparation for no longer sleeping with oom_lock held, this patch moves
the OOM notifier callbacks outside of oom_lock.
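Concretely, the nesting we want to get rid of looks like this (a
simplified sketch of the current call chain, not verbatim code):

	__alloc_pages_may_oom()
	  mutex_trylock(&oom_lock)              /* oom_lock acquired */
	  out_of_memory()
	    blocking_notifier_call_chain(&oom_notify_list, 0, &freed)
	                                        /* may sleep, oom_lock held */
	  mutex_unlock(&oom_lock)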
We are planning to eventually replace OOM notifier callbacks with
different mechanisms (e.g. the shrinker API), but such changes are out
of scope for this series.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Roman Gushchin
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Vladimir Davydov
Cc: David Rientjes
Cc: Tejun Heo
---
 include/linux/oom.h |  1 +
 mm/oom_kill.c       | 38 +++++++++++++++++++++------
 mm/page_alloc.c     | 76 +++++++++++++++++++++++++++++++----------------------
 3 files changed, 76 insertions(+), 39 deletions(-)
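For reference, an OOM notifier is an ordinary notifier_block hung off
oom_notify_list via register_oom_notifier(), reporting the number of
pages it released through the chain's void * argument. A minimal sketch
of such a callback (example_oom_notify() and example_shrink_cache() are
made-up names, for illustration only):

	static int example_oom_notify(struct notifier_block *nb,
				      unsigned long unused, void *parm)
	{
		unsigned long *freed = parm;

		/* Report how many pages we gave back to the allocator. */
		*freed += example_shrink_cache();
		return NOTIFY_OK;
	}

	static struct notifier_block example_oom_nb = {
		.notifier_call = example_oom_notify,
	};

	/* at module init: register_oom_notifier(&example_oom_nb); */

Such callbacks are what try_oom_notifier() below invokes before
oom_lock is taken.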
diff --git a/include/linux/oom.h b/include/linux/oom.h
index eab409f..d8da2cb 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -101,6 +101,7 @@ extern unsigned long oom_badness(struct task_struct *p,
 		struct mem_cgroup *memcg, const nodemask_t *nodemask,
 		unsigned long totalpages);
 
+extern unsigned long try_oom_notifier(void);
 extern bool out_of_memory(struct oom_control *oc);
 
 extern void exit_oom_victim(void);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 1a9fae4..d18fe1e 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -871,6 +871,36 @@ int unregister_oom_notifier(struct notifier_block *nb)
 }
 EXPORT_SYMBOL_GPL(unregister_oom_notifier);
 
+/**
+ * try_oom_notifier - Try to reclaim memory from OOM notifier list.
+ *
+ * Returns non-zero if notifier callbacks released something, zero otherwise.
+ */
+unsigned long try_oom_notifier(void)
+{
+	static DEFINE_MUTEX(oom_notifier_lock);
+	unsigned long freed = 0;
+
+	/*
+	 * In order to protect OOM notifiers which are not thread safe and to
+	 * avoid excessively releasing memory from OOM notifiers which release
+	 * memory every time, this lock serializes/excludes concurrent calls to
+	 * OOM notifiers.
+	 */
+	if (!mutex_trylock(&oom_notifier_lock))
+		return 1;
+	/*
+	 * But teach the lockdep that mutex_trylock() above acts like
+	 * mutex_lock(), for we are not allowed to depend on
+	 * __GFP_DIRECT_RECLAIM && !__GFP_NORETRY allocation here.
+	 */
+	mutex_release(&oom_notifier_lock.dep_map, 1, _THIS_IP_);
+	mutex_acquire(&oom_notifier_lock.dep_map, 0, 0, _THIS_IP_);
+	blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
+	mutex_unlock(&oom_notifier_lock);
+	return freed;
+}
+
 void exit_oom_mm(struct mm_struct *mm)
 {
 	struct task_struct *p, *tmp;
@@ -937,19 +967,11 @@ static bool oom_has_pending_victims(struct oom_control *oc)
  */
 bool out_of_memory(struct oom_control *oc)
 {
-	unsigned long freed = 0;
 	enum oom_constraint constraint = CONSTRAINT_NONE;
 
 	if (oom_killer_disabled)
 		return false;
 
-	if (!is_memcg_oom(oc)) {
-		blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
-		if (freed > 0)
-			/* Got some memory back in the last second. */
-			return true;
-	}
-
 	/*
 	 * If current has a pending SIGKILL or is exiting, then automatically
 	 * select it. The goal is to allow it to allocate so that it may
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b915533..4cb3602 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3447,10 +3447,50 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...)
 	return page;
 }
 
+static inline bool can_oomkill(gfp_t gfp_mask, unsigned int order,
+			       const struct alloc_context *ac)
+{
+	/* Coredumps can quickly deplete all memory reserves */
+	if (current->flags & PF_DUMPCORE)
+		return false;
+	/* The OOM killer will not help higher order allocs */
+	if (order > PAGE_ALLOC_COSTLY_ORDER)
+		return false;
+	/*
+	 * We have already exhausted all our reclaim opportunities without any
+	 * success so it is time to admit defeat. We will skip the OOM killer
+	 * because it is very likely that the caller has a more reasonable
+	 * fallback than shooting a random task.
+	 */
+	if (gfp_mask & __GFP_RETRY_MAYFAIL)
+		return false;
+	/* The OOM killer does not needlessly kill tasks for lowmem */
+	if (ac->high_zoneidx < ZONE_NORMAL)
+		return false;
+	if (pm_suspended_storage())
+		return false;
+	/*
+	 * XXX: GFP_NOFS allocations should rather fail than rely on
+	 * other request to make a forward progress.
+	 * We are in an unfortunate situation where out_of_memory cannot
+	 * do much for this context but let's try it to at least get
+	 * access to memory reserved if the current task is killed (see
+	 * out_of_memory). Once filesystems are ready to handle allocation
+	 * failures more gracefully we should just bail out here.
+	 */
+
+	/* The OOM killer may not free memory on a specific node */
+	if (gfp_mask & __GFP_THISNODE)
+		return false;
+
+	return true;
+}
+
 static inline struct page *
 __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
 	const struct alloc_context *ac, unsigned long *did_some_progress)
 {
+	const bool oomkill = can_oomkill(gfp_mask, order, ac);
 	struct oom_control oc = {
 		.zonelist = ac->zonelist,
 		.nodemask = ac->nodemask,
@@ -3462,6 +3502,10 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...)
 
 	*did_some_progress = 0;
 
+	/* Try to reclaim via OOM notifier callback. */
+	if (oomkill)
+		*did_some_progress = try_oom_notifier();
+
 	/*
 	 * Acquire the oom lock. If that fails, somebody else is
 	 * making progress for us.
@@ -3484,37 +3528,7 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...)
 	if (page)
 		goto out;
 
-	/* Coredumps can quickly deplete all memory reserves */
-	if (current->flags & PF_DUMPCORE)
-		goto out;
-	/* The OOM killer will not help higher order allocs */
-	if (order > PAGE_ALLOC_COSTLY_ORDER)
-		goto out;
-	/*
-	 * We have already exhausted all our reclaim opportunities without any
-	 * success so it is time to admit defeat. We will skip the OOM killer
-	 * because it is very likely that the caller has a more reasonable
-	 * fallback than shooting a random task.
-	 */
-	if (gfp_mask & __GFP_RETRY_MAYFAIL)
-		goto out;
-	/* The OOM killer does not needlessly kill tasks for lowmem */
-	if (ac->high_zoneidx < ZONE_NORMAL)
-		goto out;
-	if (pm_suspended_storage())
-		goto out;
-	/*
-	 * XXX: GFP_NOFS allocations should rather fail than rely on
-	 * other request to make a forward progress.
-	 * We are in an unfortunate situation where out_of_memory cannot
-	 * do much for this context but let's try it to at least get
-	 * access to memory reserved if the current task is killed (see
-	 * out_of_memory). Once filesystems are ready to handle allocation
-	 * failures more gracefully we should just bail out here.
-	 */
-
-	/* The OOM killer may not free memory on a specific node */
-	if (gfp_mask & __GFP_THISNODE)
+	if (!oomkill)
 		goto out;
 
 	/* Exhausted what can be done so it's blame time */
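A note on the lockdep annotations in try_oom_notifier(): a trylock
cannot block, so lockdep does not record the usual "acquired while
holding" dependency for it, and taking the mutex that way alone would
hide inversions against reclaim contexts. The mutex_release()/
mutex_acquire() pair re-annotates the already-held mutex as if it had
been acquired with mutex_lock(). In generic form (a sketch using the
same annotation calls as the patch; do_blockable_work() is
hypothetical):

	if (!mutex_trylock(&lock))
		return;
	/* Drop the trylock annotation... */
	mutex_release(&lock.dep_map, 1, _THIS_IP_);
	/* ...and re-record the lock as if taken by mutex_lock(). */
	mutex_acquire(&lock.dep_map, 0, 0, _THIS_IP_);
	do_blockable_work();
	mutex_unlock(&lock);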