From patchwork Thu Oct 25 08:24:01 2018
X-Patchwork-Submitter: Michal Hocko
X-Patchwork-Id: 10655591
From: Michal Hocko
To:
Cc: Tetsuo Handa, Roman Gushchin, David Rientjes, Andrew Morton, LKML, Michal Hocko
Subject: [RFC PATCH v2 1/3] mm, oom: rework mmap_exit vs. oom_reaper synchronization
Date: Thu, 25 Oct 2018 10:24:01 +0200
Message-Id: <20181025082403.3806-2-mhocko@kernel.org>
X-Mailer: git-send-email 2.19.1
In-Reply-To: <20181025082403.3806-1-mhocko@kernel.org>
References: <20181025082403.3806-1-mhocko@kernel.org>

From: Michal Hocko

The oom_reaper cannot handle mlocked vmas right now, so exit_mmap reaps the memory before it clears the mlock flags on mappings. This works, but we would like a better hand-over protocol between the oom_reaper and exit_mmap paths. Therefore take the exclusive mmap_sem in exit_mmap wherever exit_mmap has to synchronize with the oom_reaper. There are two notable places: mlocked vmas (munlock_vma_pages_all) and the page table teardown path. All others should be safe to race with oom_reap_task_mm.

This is mostly a preparatory patch which shouldn't introduce any functional changes.

Changes since RFC
- move MMF_OOM_SKIP in exit_mmap to before we are going to free page tables.
Signed-off-by: Michal Hocko
---
 include/linux/oom.h |  2 --
 mm/mmap.c           | 50 ++++++++++++++++++++++-----------------------
 mm/oom_kill.c       |  4 ++--
 3 files changed, 27 insertions(+), 29 deletions(-)

diff --git a/include/linux/oom.h b/include/linux/oom.h
index 69864a547663..11e26ca565a7 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -95,8 +95,6 @@ static inline vm_fault_t check_stable_address_space(struct mm_struct *mm)
 	return 0;
 }
 
-bool __oom_reap_task_mm(struct mm_struct *mm);
-
 extern unsigned long oom_badness(struct task_struct *p,
 		struct mem_cgroup *memcg, const nodemask_t *nodemask,
 		unsigned long totalpages);
diff --git a/mm/mmap.c b/mm/mmap.c
index 5f2b2b184c60..a02b314c0546 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3042,39 +3042,29 @@ void exit_mmap(struct mm_struct *mm)
 	struct mmu_gather tlb;
 	struct vm_area_struct *vma;
 	unsigned long nr_accounted = 0;
+	bool oom = mm_is_oom_victim(mm);
 
 	/* mm's last user has gone, and its about to be pulled down */
 	mmu_notifier_release(mm);
 
-	if (unlikely(mm_is_oom_victim(mm))) {
-		/*
-		 * Manually reap the mm to free as much memory as possible.
-		 * Then, as the oom reaper does, set MMF_OOM_SKIP to disregard
-		 * this mm from further consideration. Taking mm->mmap_sem for
-		 * write after setting MMF_OOM_SKIP will guarantee that the oom
-		 * reaper will not run on this mm again after mmap_sem is
-		 * dropped.
-		 *
-		 * Nothing can be holding mm->mmap_sem here and the above call
-		 * to mmu_notifier_release(mm) ensures mmu notifier callbacks in
-		 * __oom_reap_task_mm() will not block.
-		 *
-		 * This needs to be done before calling munlock_vma_pages_all(),
-		 * which clears VM_LOCKED, otherwise the oom reaper cannot
-		 * reliably test it.
-		 */
-		(void)__oom_reap_task_mm(mm);
-
-		set_bit(MMF_OOM_SKIP, &mm->flags);
-		down_write(&mm->mmap_sem);
-		up_write(&mm->mmap_sem);
-	}
-
 	if (mm->locked_vm) {
 		vma = mm->mmap;
 		while (vma) {
-			if (vma->vm_flags & VM_LOCKED)
+			if (vma->vm_flags & VM_LOCKED) {
+				/*
+				 * oom_reaper cannot handle mlocked vmas but we
+				 * need to serialize it with munlock_vma_pages_all
+				 * which clears VM_LOCKED, otherwise the oom reaper
+				 * cannot reliably test it.
+				 */
+				if (oom)
+					down_write(&mm->mmap_sem);
 				munlock_vma_pages_all(vma);
+
+				if (oom)
+					up_write(&mm->mmap_sem);
+			}
 			vma = vma->vm_next;
 		}
 	}
@@ -3091,6 +3081,13 @@ void exit_mmap(struct mm_struct *mm)
 	/* update_hiwater_rss(mm) here? but nobody should be looking */
 	/* Use -1 here to ensure all VMAs in the mm are unmapped */
 	unmap_vmas(&tlb, vma, 0, -1);
+
+	/* oom_reaper cannot race with the page tables teardown */
+	if (oom) {
+		down_write(&mm->mmap_sem);
+		set_bit(MMF_OOM_SKIP, &mm->flags);
+	}
+
 	free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
 	tlb_finish_mmu(&tlb, 0, -1);
@@ -3104,6 +3101,9 @@ void exit_mmap(struct mm_struct *mm)
 		vma = remove_vma(vma);
 	}
 	vm_unacct_memory(nr_accounted);
+
+	if (oom)
+		up_write(&mm->mmap_sem);
 }
 
 /* Insert vm structure into process list sorted by address
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index f10aa5360616..b3b2c2bbd8ab 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -488,7 +488,7 @@ static DECLARE_WAIT_QUEUE_HEAD(oom_reaper_wait);
 static struct task_struct *oom_reaper_list;
 static DEFINE_SPINLOCK(oom_reaper_lock);
 
-bool __oom_reap_task_mm(struct mm_struct *mm)
+static bool __oom_reap_task_mm(struct mm_struct *mm)
 {
 	struct vm_area_struct *vma;
 	bool ret = true;
@@ -554,7 +554,7 @@ static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
 	 * MMF_OOM_SKIP is set by exit_mmap when the OOM reaper can't
 	 * work on the mm anymore. The check for MMF_OOM_SKIP must run
 	 * under mmap_sem for reading because it serializes against the
-	 * down_write();up_write() cycle in exit_mmap().
+	 * down_write() in exit_mmap().
 	 */
 	if (test_bit(MMF_OOM_SKIP, &mm->flags)) {
 		trace_skip_task_reaping(tsk->pid);