From patchwork Thu May 31 10:10:48 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tetsuo Handa
X-Patchwork-Id: 10440617
Subject: Re: [PATCH] mm,oom: Don't call schedule_timeout_killable() with
 oom_lock held.
To: Andrew Morton, Michal Hocko, torvalds@linux-foundation.org
Cc: guro@fb.com, rientjes@google.com, hannes@cmpxchg.org,
 vdavydov.dev@gmail.com, tj@kernel.org, linux-mm@kvack.org
References: <20180525083118.GI11881@dhcp22.suse.cz>
 <201805251957.EJJ09809.LFJHFFVOOSQOtM@I-love.SAKURA.ne.jp>
 <20180525114213.GJ11881@dhcp22.suse.cz>
 <201805252046.JFF30222.JHSFOFQFMtVOLO@I-love.SAKURA.ne.jp>
 <20180528124313.GC27180@dhcp22.suse.cz>
 <201805290557.BAJ39558.MFLtOJVFOHFOSQ@I-love.SAKURA.ne.jp>
 <20180529060755.GH27180@dhcp22.suse.cz>
 <20180529160700.dbc430ebbfac301335ac8cf4@linux-foundation.org>
From: Tetsuo Handa
Message-ID: <16eca862-5fa6-2333-8a81-94a2c2692758@i-love.sakura.ne.jp>
Date: Thu, 31 May 2018 19:10:48 +0900
In-Reply-To: <20180529160700.dbc430ebbfac301335ac8cf4@linux-foundation.org>

On 2018/05/30 8:07, Andrew Morton wrote:
> On Tue, 29 May 2018 09:17:41 +0200 Michal Hocko wrote:
>
>>> I suggest applying
>>> this patch first, and then fix "mm, oom: cgroup-aware OOM killer" patch.
>>
>> Well, I hope the whole pile gets merged in the upcoming merge window
>> rather than stall even more.
>
> I'm more inclined to drop it all. David has identified significant
> shortcomings and I'm not seeing a way of addressing those shortcomings
> in a backward-compatible fashion. Therefore there is no way forward
> at present.
>

Can we apply my patch as-is first?
My patch mitigates a lockup regression and should be easy to backport to
stable kernels. We can later evaluate whether moving the short sleep to
should_reclaim_retry() has a negative impact.

Also, we can eliminate the short sleep in Roman's patch before deciding
whether we can merge Roman's patchset in the upcoming merge window.

>From 4b356c742a3f1b720d5b709792fe68b25d800902 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa
Date: Sat, 12 May 2018 12:27:52 +0900
Subject: [PATCH] mm,oom: Don't call schedule_timeout_killable() with oom_lock held.

When I was examining a bug which occurs under CPU + memory pressure, I
observed that a thread which called out_of_memory() can sleep for minutes
at schedule_timeout_killable(1) with oom_lock held when many threads are
doing direct reclaim.

The whole point of the sleep is to give the OOM victim some time to exit.
But since commit 27ae357fa82be5ab ("mm, oom: fix concurrent munlock and
oom reaper unmap, v3") changed the OOM victim to wait for oom_lock in order
to close the race window at exit_mmap(), the whole point of this sleep is
lost now. We need to make sure that the thread which called out_of_memory()
will release oom_lock shortly. Therefore, this patch moves the sleep
outside of the OOM path.

Signed-off-by: Tetsuo Handa
Cc: Roman Gushchin
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Vladimir Davydov
Cc: David Rientjes
Cc: Tejun Heo
---
 mm/oom_kill.c   | 38 +++++++++++++++++---------------------
 mm/page_alloc.c |  7 ++++++-
 2 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 8ba6cb8..23ce67f 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -479,6 +479,21 @@ bool process_shares_mm(struct task_struct *p, struct mm_struct *mm)
 static struct task_struct *oom_reaper_list;
 static DEFINE_SPINLOCK(oom_reaper_lock);
 
+/*
+ * We have to make sure not to cause premature new oom victim selection.
+ *
+ * __alloc_pages_may_oom()     oom_reap_task_mm()/exit_mmap()
+ *   mutex_trylock(&oom_lock)
+ *   get_page_from_freelist(ALLOC_WMARK_HIGH) # fails
+ *                               unmap_page_range() # frees some memory
+ *                               set_bit(MMF_OOM_SKIP)
+ *   out_of_memory()
+ *     select_bad_process()
+ *       test_bit(MMF_OOM_SKIP) # selects new oom victim
+ *   mutex_unlock(&oom_lock)
+ *
+ * Therefore, the callers hold oom_lock when calling this function.
+ */
 void __oom_reap_task_mm(struct mm_struct *mm)
 {
 	struct vm_area_struct *vma;
@@ -523,20 +538,6 @@ static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
 {
 	bool ret = true;
 
-	/*
-	 * We have to make sure to not race with the victim exit path
-	 * and cause premature new oom victim selection:
-	 * oom_reap_task_mm            exit_mm
-	 *   mmget_not_zero
-	 *                               mmput
-	 *                                 atomic_dec_and_test
-	 *                               exit_oom_victim
-	 *                             [...]
-	 *                             out_of_memory
-	 *                               select_bad_process
-	 *                                 # no TIF_MEMDIE task selects new victim
-	 *  unmap_page_range # frees some memory
-	 */
 	mutex_lock(&oom_lock);
 
 	if (!down_read_trylock(&mm->mmap_sem)) {
@@ -1077,15 +1078,9 @@ bool out_of_memory(struct oom_control *oc)
 		dump_header(oc, NULL);
 		panic("Out of memory and no killable processes...\n");
 	}
-	if (oc->chosen && oc->chosen != (void *)-1UL) {
+	if (oc->chosen && oc->chosen != (void *)-1UL)
 		oom_kill_process(oc, !is_memcg_oom(oc) ? "Out of memory" :
 				 "Memory cgroup out of memory");
-		/*
-		 * Give the killed process a good chance to exit before trying
-		 * to allocate memory again.
-		 */
-		schedule_timeout_killable(1);
-	}
 	return !!oc->chosen;
 }
 
@@ -1111,4 +1106,5 @@ void pagefault_out_of_memory(void)
 		return;
 	out_of_memory(&oc);
 	mutex_unlock(&oom_lock);
+	schedule_timeout_killable(1);
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 905db9d..458ed32 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3478,7 +3478,6 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...)
 	 */
 	if (!mutex_trylock(&oom_lock)) {
 		*did_some_progress = 1;
-		schedule_timeout_uninterruptible(1);
 		return NULL;
 	}
 
@@ -4241,6 +4240,12 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
 	/* Retry as long as the OOM killer is making progress */
 	if (did_some_progress) {
 		no_progress_loops = 0;
+		/*
+		 * This schedule_timeout_*() serves as a guaranteed sleep for
+		 * PF_WQ_WORKER threads when __zone_watermark_ok() == false.
+		 */
+		if (!tsk_is_oom_victim(current))
+			schedule_timeout_uninterruptible(1);
 		goto retry;
 	}
 
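
Editorial illustration (not part of the patch): the locking pattern the
changelog describes can be sketched in userspace C with a pthread mutex.
This is only a rough analogy under stated assumptions. demo_lock,
demo_kills, demo_short_sleep() and demo_out_of_memory() are hypothetical
names invented for this sketch; they stand in for oom_lock, victim
selection, schedule_timeout_killable(1) and the OOM path respectively, and
none of them is kernel code.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for oom_lock; this is a pthread mutex, not the kernel one. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical counter standing in for "an OOM victim was selected". */
static int demo_kills;

/* Short sleep, loosely analogous to schedule_timeout_killable(1). */
static void demo_short_sleep(void)
{
	struct timespec ts = { .tv_sec = 0, .tv_nsec = 1000000 }; /* 1 ms */

	nanosleep(&ts, NULL);
}

/*
 * Returns true if this caller took the lock and did the work, false if
 * the lock was contended. Either way, the sleep happens only after the
 * lock has been dropped, so a sleeping caller never blocks other threads
 * that are waiting for demo_lock.
 */
static bool demo_out_of_memory(void)
{
	bool did_work = false;

	if (pthread_mutex_trylock(&demo_lock) == 0) {
		demo_kills++;		/* work done under the lock */
		did_work = true;
		pthread_mutex_unlock(&demo_lock);
	}

	demo_short_sleep();		/* demo_lock is not held here */
	return did_work;
}
```

The design point is the ordering: unlock first, then sleep. If the sleep
sat inside the locked region, a caller that is slow to be rescheduled would
keep every other waiter off the lock for as long as it stayed asleep.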