From patchwork Thu Jun  7 11:00:20 2018
X-Patchwork-Submitter: Tetsuo Handa
X-Patchwork-Id: 10451647
From: Tetsuo Handa
To: akpm@linux-foundation.org
Cc: linux-mm@kvack.org, Tetsuo Handa, David Rientjes, Johannes Weiner,
    Michal Hocko, Roman Gushchin, Tejun Heo, Vladimir Davydov
Subject: [PATCH 1/4] mm, oom: Don't call schedule_timeout_killable() with oom_lock held.
Date: Thu,  7 Jun 2018 20:00:20 +0900
Message-Id: <1528369223-7571-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp>

While examining a bug which occurs under combined CPU and memory pressure, I
observed that a thread which called out_of_memory() can sleep for minutes at
schedule_timeout_killable(1) with oom_lock held when many threads are doing
direct reclaim.

The whole point of that sleep is to give the OOM victim some time to exit.
But since commit 27ae357fa82be5ab ("mm, oom: fix concurrent munlock and oom
reaper unmap, v3") changed the OOM victim to wait for oom_lock in order to
close the race window at exit_mmap(), the point of the sleep is now lost:
what matters instead is that the thread which called out_of_memory() releases
oom_lock promptly. Therefore, this patch moves the sleep out of the OOM path.
Whether it is safe to remove the sleep entirely will be tested by a future
patch.
Signed-off-by: Tetsuo Handa
Cc: Roman Gushchin
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Vladimir Davydov
Cc: David Rientjes
Cc: Tejun Heo
Nacked-by: Michal Hocko
---
 mm/oom_kill.c   | 38 +++++++++++++++++---------------------
 mm/page_alloc.c |  7 ++++++-
 2 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 8ba6cb8..23ce67f 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -479,6 +479,21 @@ bool process_shares_mm(struct task_struct *p, struct mm_struct *mm)
 static struct task_struct *oom_reaper_list;
 static DEFINE_SPINLOCK(oom_reaper_lock);
 
+/*
+ * We have to make sure not to cause premature new oom victim selection.
+ *
+ * __alloc_pages_may_oom()                 oom_reap_task_mm()/exit_mmap()
+ *   mutex_trylock(&oom_lock)
+ *   get_page_from_freelist(ALLOC_WMARK_HIGH) # fails
+ *                                           unmap_page_range() # frees some memory
+ *                                           set_bit(MMF_OOM_SKIP)
+ *   out_of_memory()
+ *     select_bad_process()
+ *       test_bit(MMF_OOM_SKIP) # selects new oom victim
+ *   mutex_unlock(&oom_lock)
+ *
+ * Therefore, the callers hold oom_lock when calling this function.
+ */
 void __oom_reap_task_mm(struct mm_struct *mm)
 {
 	struct vm_area_struct *vma;
@@ -523,20 +538,6 @@ static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
 {
 	bool ret = true;
 
-	/*
-	 * We have to make sure to not race with the victim exit path
-	 * and cause premature new oom victim selection:
-	 * oom_reap_task_mm		exit_mm
-	 *   mmget_not_zero
-	 *				  mmput
-	 *				    atomic_dec_and_test
-	 *				      exit_oom_victim
-	 *				  [...]
-	 *				  out_of_memory
-	 *				    select_bad_process
-	 *				      # no TIF_MEMDIE task selects new victim
-	 *  unmap_page_range # frees some memory
-	 */
 	mutex_lock(&oom_lock);
 
 	if (!down_read_trylock(&mm->mmap_sem)) {
@@ -1077,15 +1078,9 @@ bool out_of_memory(struct oom_control *oc)
 		dump_header(oc, NULL);
 		panic("Out of memory and no killable processes...\n");
 	}
-	if (oc->chosen && oc->chosen != (void *)-1UL) {
+	if (oc->chosen && oc->chosen != (void *)-1UL)
 		oom_kill_process(oc, !is_memcg_oom(oc) ? "Out of memory" :
 				 "Memory cgroup out of memory");
-		/*
-		 * Give the killed process a good chance to exit before trying
-		 * to allocate memory again.
-		 */
-		schedule_timeout_killable(1);
-	}
 	return !!oc->chosen;
 }
 
@@ -1111,4 +1106,5 @@ void pagefault_out_of_memory(void)
 		return;
 	out_of_memory(&oc);
 	mutex_unlock(&oom_lock);
+	schedule_timeout_killable(1);
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 22320ea27..e90f152 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3471,7 +3471,6 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...)
 	 */
 	if (!mutex_trylock(&oom_lock)) {
 		*did_some_progress = 1;
-		schedule_timeout_uninterruptible(1);
 		return NULL;
 	}
@@ -4238,6 +4237,12 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
 	/* Retry as long as the OOM killer is making progress */
 	if (did_some_progress) {
 		no_progress_loops = 0;
+		/*
+		 * This schedule_timeout_*() serves as a guaranteed sleep for
+		 * PF_WQ_WORKER threads when __zone_watermark_ok() == false.
+		 */
+		if (!tsk_is_oom_victim(current))
+			schedule_timeout_uninterruptible(1);
 		goto retry;
 	}
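[Note for reviewers: the lock convoy described in the changelog — every waiter
on oom_lock being stalled while the holder sleeps — can be reproduced in plain
userspace. The sketch below is purely illustrative and not part of the patch;
a pthread mutex stands in for oom_lock, msleep() stands in for
schedule_timeout_killable(1), and every name in it is made up for the demo.]

#include <pthread.h>
#include <stdio.h>
#include <time.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER; /* stand-in for oom_lock */
static int sleep_inside;       /* 1 = old code (sleep with lock held), 0 = patched */
static long waiter_delay_ms;   /* how long the contending thread was blocked */

static void msleep(int ms)     /* stand-in for schedule_timeout_killable(1) */
{
	struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
	nanosleep(&ts, NULL);
}

static long now_ms(void)
{
	struct timespec ts;
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

static void *oom_path(void *arg)
{
	pthread_mutex_lock(&lock);
	if (sleep_inside)
		msleep(300);   /* old: sleep under oom_lock, stalling all waiters */
	pthread_mutex_unlock(&lock);
	if (!sleep_inside)
		msleep(300);   /* new: release oom_lock first, then sleep */
	return NULL;
}

static void *waiter(void *arg)   /* models another allocator contending on oom_lock */
{
	long t0 = now_ms();
	pthread_mutex_lock(&lock);
	waiter_delay_ms = now_ms() - t0;
	pthread_mutex_unlock(&lock);
	return NULL;
}

static long run(int inside)
{
	pthread_t a, b;
	sleep_inside = inside;
	pthread_create(&a, NULL, oom_path, NULL);
	msleep(50);              /* let oom_path take the lock first */
	pthread_create(&b, NULL, waiter, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return waiter_delay_ms;
}

int main(void)
{
	printf("sleep under lock:  waiter blocked ~%ld ms\n", run(1));
	printf("sleep after unlock: waiter blocked ~%ld ms\n", run(0));
	return 0;
}

With the sleep held under the lock, the waiter is blocked for roughly the full
sleep duration; with the sleep moved after the unlock, it proceeds almost
immediately — the same effect the patch has on threads spinning on oom_lock.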