From patchwork Sun Jul 14 23:33:56 2019
X-Patchwork-Submitter: Minchan Kim
X-Patchwork-Id: 11043301
From: Minchan Kim
To: Andrew Morton
Cc: linux-mm, LKML, linux-api@vger.kernel.org, Michal Hocko, Johannes Weiner, Tim Murray, Joel Fernandes, Suren Baghdasaryan, Daniel Colascione, Shakeel Butt, Sonny Rao, oleksandr@redhat.com, hdanton@sina.com, lizeb@google.com, Dave Hansen, "Kirill A. Shutemov", Minchan Kim
Subject: [PATCH v5 1/5] mm: introduce MADV_COLD
Date: Mon, 15 Jul 2019 08:33:56 +0900
Message-Id: <20190714233401.36909-2-minchan@kernel.org>
In-Reply-To: <20190714233401.36909-1-minchan@kernel.org>
References: <20190714233401.36909-1-minchan@kernel.org>
Shutemov" , Minchan Kim Subject: [PATCH v5 1/5] mm: introduce MADV_COLD Date: Mon, 15 Jul 2019 08:33:56 +0900 Message-Id: <20190714233401.36909-2-minchan@kernel.org> X-Mailer: git-send-email 2.22.0.510.g264f2c817a-goog In-Reply-To: <20190714233401.36909-1-minchan@kernel.org> References: <20190714233401.36909-1-minchan@kernel.org> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP When a process expects no accesses to a certain memory range, it could give a hint to kernel that the pages can be reclaimed when memory pressure happens but data should be preserved for future use. This could reduce workingset eviction so it ends up increasing performance. This patch introduces the new MADV_COLD hint to madvise(2) syscall. MADV_COLD can be used by a process to mark a memory range as not expected to be used in the near future. The hint can help kernel in deciding which pages to evict early during memory pressure. It works for every LRU pages like MADV_[DONTNEED|FREE]. IOW, It moves active file page -> inactive file LRU active anon page -> inacdtive anon LRU Unlike MADV_FREE, it doesn't move active anonymous pages to inactive file LRU's head because MADV_COLD is a little bit different symantic. MADV_FREE means it's okay to discard when the memory pressure because the content of the page is *garbage* so freeing such pages is almost zero overhead since we don't need to swap out and access afterward causes just minor fault. Thus, it would make sense to put those freeable pages in inactive file LRU to compete other used-once pages. It makes sense for implmentaion point of view, too because it's not swapbacked memory any longer until it would be re-dirtied. Even, it could give a bonus to make them be reclaimed on swapless system. However, MADV_COLD doesn't mean garbage so reclaiming them requires swap-out/in in the end so it's bigger cost. Since we have designed VM LRU aging based on cost-model, anonymous cold pages would be better to position inactive anon's LRU list, not file LRU. Furthermore, it would help to avoid unnecessary scanning if system doesn't have a swap device. Let's start simpler way without adding complexity at this moment. However, keep in mind, too that it's a caveat that workloads with a lot of pages cache are likely to ignore MADV_COLD on anonymous memory because we rarely age anonymous LRU lists. * man-page material MADV_COLD (since Linux x.x) Pages in the specified regions will be treated as less-recently-accessed compared to pages in the system with similar access frequencies. In contrast to MADV_FREE, the contents of the region are preserved regardless of subsequent writes to pages. MADV_COLD cannot be applied to locked pages, Huge TLB pages, or VM_PFNMAP pages. 
* v2
 * add the caveat about workloads with lots of page cache - mhocko
 * add man page stuff - dave

* v1
 * remove page_mapcount filter - hannes, mhocko
 * remove idle page handling - joelaf

* RFCv2
 * add more description - mhocko

* RFCv1
 * renaming from MADV_COOL to MADV_COLD - hannes

* internal review
 * use clear_page_young in deactivate_page - joelaf
 * Revise the description - surenb
 * Renaming from MADV_WARM to MADV_COOL - surenb

Acked-by: Michal Hocko
Acked-by: Johannes Weiner
Signed-off-by: Minchan Kim
---
 include/linux/swap.h                   |   1 +
 include/uapi/asm-generic/mman-common.h |   1 +
 mm/internal.h                          |   2 +-
 mm/madvise.c                           | 180 ++++++++++++++++++++++++-
 mm/oom_kill.c                          |   2 +-
 mm/swap.c                              |  42 ++++++
 6 files changed, 224 insertions(+), 4 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index de2c67a33b7e..0ce997edb8bb 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -340,6 +340,7 @@ extern void lru_add_drain_cpu(int cpu);
 extern void lru_add_drain_all(void);
 extern void rotate_reclaimable_page(struct page *page);
 extern void deactivate_file_page(struct page *page);
+extern void deactivate_page(struct page *page);
 extern void mark_page_lazyfree(struct page *page);
 extern void swap_setup(void);

diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index 63b1f506ea67..ef8a56927b12 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -45,6 +45,7 @@
 #define MADV_SEQUENTIAL	2		/* expect sequential page references */
 #define MADV_WILLNEED	3		/* will need these pages */
 #define MADV_DONTNEED	4		/* don't need these pages */
+#define MADV_COLD	5		/* deactivate these pages */

 /* common parameters: try to keep these consistent across architectures */
 #define MADV_FREE	8		/* free pages only if memory pressure */

diff --git a/mm/internal.h b/mm/internal.h
index f53a14d67538..c61b215ff265 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -39,7 +39,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf);
 void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
 		unsigned long floor, unsigned long ceiling);

-static inline bool can_madv_dontneed_vma(struct vm_area_struct *vma)
+static inline bool can_madv_lru_vma(struct vm_area_struct *vma)
 {
 	return !(vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP));
 }

diff --git a/mm/madvise.c b/mm/madvise.c
index 968df3aa069f..bae0055f9724 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -40,6 +40,7 @@ static int madvise_need_mmap_write(int behavior)
 	case MADV_REMOVE:
 	case MADV_WILLNEED:
 	case MADV_DONTNEED:
+	case MADV_COLD:
 	case MADV_FREE:
 		return 0;
 	default:
@@ -307,6 +308,178 @@ static long madvise_willneed(struct vm_area_struct *vma,
 	return 0;
 }

+static int madvise_cold_pte_range(pmd_t *pmd, unsigned long addr,
+				unsigned long end, struct mm_walk *walk)
+{
+	struct mmu_gather *tlb = walk->private;
+	struct mm_struct *mm = tlb->mm;
+	struct vm_area_struct *vma = walk->vma;
+	pte_t *orig_pte, *pte, ptent;
+	spinlock_t *ptl;
+	struct page *page;
+	unsigned long next;
+
+	next = pmd_addr_end(addr, end);
+	if (pmd_trans_huge(*pmd)) {
+		pmd_t orig_pmd;
+
+		tlb_change_page_size(tlb, HPAGE_PMD_SIZE);
+		ptl = pmd_trans_huge_lock(pmd, vma);
+		if (!ptl)
+			return 0;
+
+		orig_pmd = *pmd;
+		if (is_huge_zero_pmd(orig_pmd))
+			goto huge_unlock;
+
+		if (unlikely(!pmd_present(orig_pmd))) {
+			VM_BUG_ON(thp_migration_supported() &&
+					!is_pmd_migration_entry(orig_pmd));
+			goto huge_unlock;
+		}
+
+		page = pmd_page(orig_pmd);
+		if (next - addr != HPAGE_PMD_SIZE) {
+			int err;
+
+			if (page_mapcount(page) != 1)
+				goto huge_unlock;
+
+			get_page(page);
+			spin_unlock(ptl);
+			lock_page(page);
+			err = split_huge_page(page);
+			unlock_page(page);
+			put_page(page);
+			if (!err)
+				goto regular_page;
+			return 0;
+		}
+
+		if (pmd_young(orig_pmd)) {
+			pmdp_invalidate(vma, addr, pmd);
+			orig_pmd = pmd_mkold(orig_pmd);
+
+			set_pmd_at(mm, addr, pmd, orig_pmd);
+			tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
+		}
+
+		test_and_clear_page_young(page);
+		deactivate_page(page);
+huge_unlock:
+		spin_unlock(ptl);
+		return 0;
+	}
+
+	if (pmd_trans_unstable(pmd))
+		return 0;
+
+regular_page:
+	tlb_change_page_size(tlb, PAGE_SIZE);
+	orig_pte = pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
+	flush_tlb_batched_pending(mm);
+	arch_enter_lazy_mmu_mode();
+	for (; addr < end; pte++, addr += PAGE_SIZE) {
+		ptent = *pte;
+
+		if (pte_none(ptent))
+			continue;
+
+		if (!pte_present(ptent))
+			continue;
+
+		page = vm_normal_page(vma, addr, ptent);
+		if (!page)
+			continue;
+
+		/*
+		 * Creating a THP page is expensive so split it only if we
+		 * are sure it's worth it. Split it if we are the only owner.
+		 */
+		if (PageTransCompound(page)) {
+			if (page_mapcount(page) != 1)
+				break;
+			get_page(page);
+			if (!trylock_page(page)) {
+				put_page(page);
+				break;
+			}
+			pte_unmap_unlock(orig_pte, ptl);
+			if (split_huge_page(page)) {
+				unlock_page(page);
+				put_page(page);
+				pte_offset_map_lock(mm, pmd, addr, &ptl);
+				break;
+			}
+			unlock_page(page);
+			put_page(page);
+			pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
+			pte--;
+			addr -= PAGE_SIZE;
+			continue;
+		}
+
+		VM_BUG_ON_PAGE(PageTransCompound(page), page);
+
+		if (pte_young(ptent)) {
+			ptent = ptep_get_and_clear_full(mm, addr, pte,
+							tlb->fullmm);
+			ptent = pte_mkold(ptent);
+			set_pte_at(mm, addr, pte, ptent);
+			tlb_remove_tlb_entry(tlb, pte, addr);
+		}
+
+		/*
+		 * We are deactivating the page to accelerate its reclaim.
+		 * The VM cannot reclaim the page unless we clear PG_young.
+		 * As a side effect, this confuses idle-page tracking,
+		 * which will miss recent reference history.
+		 */
+		test_and_clear_page_young(page);
+		deactivate_page(page);
+	}
+
+	arch_leave_lazy_mmu_mode();
+	pte_unmap_unlock(orig_pte, ptl);
+	cond_resched();
+
+	return 0;
+}
+
+static void madvise_cold_page_range(struct mmu_gather *tlb,
+			     struct vm_area_struct *vma,
+			     unsigned long addr, unsigned long end)
+{
+	struct mm_walk cold_walk = {
+		.pmd_entry = madvise_cold_pte_range,
+		.mm = vma->vm_mm,
+		.private = tlb,
+	};
+
+	tlb_start_vma(tlb, vma);
+	walk_page_range(addr, end, &cold_walk);
+	tlb_end_vma(tlb, vma);
+}
+
+static long madvise_cold(struct vm_area_struct *vma,
+			struct vm_area_struct **prev,
+			unsigned long start_addr, unsigned long end_addr)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	struct mmu_gather tlb;
+
+	*prev = vma;
+	if (!can_madv_lru_vma(vma))
+		return -EINVAL;
+
+	lru_add_drain();
+	tlb_gather_mmu(&tlb, mm, start_addr, end_addr);
+	madvise_cold_page_range(&tlb, vma, start_addr, end_addr);
+	tlb_finish_mmu(&tlb, start_addr, end_addr);
+
+	return 0;
+}
+
 static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 				unsigned long end, struct mm_walk *walk)
@@ -519,7 +692,7 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
 				  int behavior)
 {
 	*prev = vma;
-	if (!can_madv_dontneed_vma(vma))
+	if (!can_madv_lru_vma(vma))
 		return -EINVAL;

 	if (!userfaultfd_remove(vma, start, end)) {
@@ -541,7 +714,7 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
 		 */
 		return -ENOMEM;
 	}
-	if (!can_madv_dontneed_vma(vma))
+	if (!can_madv_lru_vma(vma))
 		return -EINVAL;
 	if (end > vma->vm_end) {
 		/*
@@ -695,6 +868,8 @@ madvise_vma(struct vm_area_struct *vma, struct vm_area_struct **prev,
 		return madvise_remove(vma, prev, start, end);
 	case MADV_WILLNEED:
 		return madvise_willneed(vma, prev, start, end);
+	case MADV_COLD:
+		return madvise_cold(vma, prev, start, end);
 	case MADV_FREE:
 	case MADV_DONTNEED:
 		return madvise_dontneed_free(vma, prev, start, end, behavior);
@@ -716,6 +891,7 @@ madvise_behavior_valid(int behavior)
 	case MADV_WILLNEED:
 	case MADV_DONTNEED:
 	case MADV_FREE:
+	case MADV_COLD:
 #ifdef CONFIG_KSM
 	case MADV_MERGEABLE:
 	case MADV_UNMERGEABLE:

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 95872bdfec4e..c8f0ec6b0e80 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -523,7 +523,7 @@ bool __oom_reap_task_mm(struct mm_struct *mm)
 	set_bit(MMF_UNSTABLE, &mm->flags);

 	for (vma = mm->mmap ; vma; vma = vma->vm_next) {
-		if (!can_madv_dontneed_vma(vma))
+		if (!can_madv_lru_vma(vma))
 			continue;

 		/*

diff --git a/mm/swap.c b/mm/swap.c
index 55899c1f54af..b501ad6eb091 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -47,6 +47,7 @@ int page_cluster;
 static DEFINE_PER_CPU(struct pagevec, lru_add_pvec);
 static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs);
 static DEFINE_PER_CPU(struct pagevec, lru_deactivate_file_pvecs);
+static DEFINE_PER_CPU(struct pagevec, lru_deactivate_pvecs);
 static DEFINE_PER_CPU(struct pagevec, lru_lazyfree_pvecs);
 #ifdef CONFIG_SMP
 static DEFINE_PER_CPU(struct pagevec, activate_page_pvecs);
@@ -538,6 +539,22 @@ static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec,
 	update_page_reclaim_stat(lruvec, file, 0);
 }

+static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec,
+			    void *arg)
+{
+	if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) {
+		int file = page_is_file_cache(page);
+		int lru = page_lru_base_type(page);
+
+		del_page_from_lru_list(page, lruvec, lru + LRU_ACTIVE);
+		ClearPageActive(page);
+		ClearPageReferenced(page);
+		add_page_to_lru_list(page, lruvec, lru);
+
+		__count_vm_events(PGDEACTIVATE, hpage_nr_pages(page));
+		update_page_reclaim_stat(lruvec, file, 0);
+	}
+}

 static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec,
 			    void *arg)
@@ -590,6 +607,10 @@ void lru_add_drain_cpu(int cpu)
 	if (pagevec_count(pvec))
 		pagevec_lru_move_fn(pvec, lru_deactivate_file_fn, NULL);

+	pvec = &per_cpu(lru_deactivate_pvecs, cpu);
+	if (pagevec_count(pvec))
+		pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL);
+
 	pvec = &per_cpu(lru_lazyfree_pvecs, cpu);
 	if (pagevec_count(pvec))
 		pagevec_lru_move_fn(pvec, lru_lazyfree_fn, NULL);
@@ -623,6 +644,26 @@ void deactivate_file_page(struct page *page)
 	}
 }

+/**
+ * deactivate_page - deactivate a page
+ * @page: page to deactivate
+ *
+ * deactivate_page() moves @page to the inactive list if @page was on the
+ * active list and was not an unevictable page. This is done to accelerate
+ * the reclaim of @page.
+ */
+void deactivate_page(struct page *page)
+{
+	if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) {
+		struct pagevec *pvec = &get_cpu_var(lru_deactivate_pvecs);
+
+		get_page(page);
+		if (!pagevec_add(pvec, page) || PageCompound(page))
+			pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL);
+		put_cpu_var(lru_deactivate_pvecs);
+	}
+}
+
 /**
  * mark_page_lazyfree - make an anon page lazyfree
  * @page: page to deactivate
@@ -687,6 +728,7 @@ void lru_add_drain_all(void)
 		if (pagevec_count(&per_cpu(lru_add_pvec, cpu)) ||
 		    pagevec_count(&per_cpu(lru_rotate_pvecs, cpu)) ||
 		    pagevec_count(&per_cpu(lru_deactivate_file_pvecs, cpu)) ||
+		    pagevec_count(&per_cpu(lru_deactivate_pvecs, cpu)) ||
 		    pagevec_count(&per_cpu(lru_lazyfree_pvecs, cpu)) ||
 		    need_activate_page_drain(cpu)) {
 			INIT_WORK(work, lru_add_drain_per_cpu);
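An editorial design note on deactivate_page() above (a condensed sketch
of the code in this patch, not a new API): the per-CPU pagevec batches
LRU moves so that the node's lru_lock is taken once per PAGEVEC_SIZE
pages rather than once per page:

	/* condensed from deactivate_page()/lru_deactivate_fn() above */
	struct pagevec *pvec = &get_cpu_var(lru_deactivate_pvecs);

	get_page(page);				/* pin while parked in the pagevec */
	if (!pagevec_add(pvec, page) ||		/* batch full? */
	    PageCompound(page))			/* compound pages drain immediately */
		pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL);	/* one lru_lock per batch */
	put_cpu_var(lru_deactivate_pvecs);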
From patchwork Sun Jul 14 23:33:57 2019
X-Patchwork-Submitter: Minchan Kim
X-Patchwork-Id: 11043303
From: Minchan Kim
To: Andrew Morton
Cc: linux-mm, LKML, linux-api@vger.kernel.org, Michal Hocko, Johannes Weiner, Tim Murray, Joel Fernandes, Suren Baghdasaryan, Daniel Colascione, Shakeel Butt, Sonny Rao, oleksandr@redhat.com, hdanton@sina.com, lizeb@google.com, Dave Hansen, "Kirill A. Shutemov", Minchan Kim
Subject: [PATCH v5 2/5] mm: change PAGEREF_RECLAIM_CLEAN with PAGE_REFRECLAIM
Date: Mon, 15 Jul 2019 08:33:57 +0900
Message-Id: <20190714233401.36909-3-minchan@kernel.org>
In-Reply-To: <20190714233401.36909-1-minchan@kernel.org>
References: <20190714233401.36909-1-minchan@kernel.org>

The local variable "references" in shrink_page_list defaults to
PAGEREF_RECLAIM_CLEAN. Its purpose was to prevent reclaiming dirty
pages when CMA tries to migrate pages. Strictly speaking, we don't
need it, because CMA already forbids writeback via .may_writepage = 0
in reclaim_clean_pages_from_list. Moreover, with the upcoming patch it
would wrongly prevent swapping out anonymous pages even when
force_reclaim = true in shrink_page_list. So this patch changes the
default value of "references" to PAGEREF_RECLAIM and renames
force_reclaim to ignore_references to make the meaning clearer.

This is preparatory work for the next patch.
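Why the old default blocked forced reclaim of anonymous memory: with
force_reclaim, page_check_references() was skipped, so "references"
stayed PAGEREF_RECLAIM_CLEAN, and the dirty-page check later in
shrink_page_list then refused to write the page out. A simplified
editorial sketch of the relevant logic, condensed from the function:

	if (!ignore_references)		/* renamed from force_reclaim */
		references = page_check_references(page, sc);

	/* ... much later, in the same loop ... */

	if (PageDirty(page)) {
		/*
		 * Anonymous pages headed to swap are dirty by definition,
		 * so PAGEREF_RECLAIM_CLEAN here meant "never swap out"
		 * whenever the reference check was skipped.
		 */
		if (references == PAGEREF_RECLAIM_CLEAN)
			goto keep_locked;
		/* otherwise fall through to pageout() */
	}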
* RFCv1
 * use ignore_references as parameter name - hannes

Acked-by: Michal Hocko
Acked-by: Johannes Weiner
Signed-off-by: Minchan Kim
---
 mm/vmscan.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index a0301edd8d03..b4fa04d10ba6 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1119,7 +1119,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 				      struct scan_control *sc,
 				      enum ttu_flags ttu_flags,
 				      struct reclaim_stat *stat,
-				      bool force_reclaim)
+				      bool ignore_references)
 {
 	LIST_HEAD(ret_pages);
 	LIST_HEAD(free_pages);
@@ -1133,7 +1133,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		struct address_space *mapping;
 		struct page *page;
 		int may_enter_fs;
-		enum page_references references = PAGEREF_RECLAIM_CLEAN;
+		enum page_references references = PAGEREF_RECLAIM;
 		bool dirty, writeback;
 		unsigned int nr_pages;
@@ -1264,7 +1264,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			}
 		}

-		if (!force_reclaim)
+		if (!ignore_references)
 			references = page_check_references(page, sc);

 		switch (references) {

From patchwork Sun Jul 14 23:33:58 2019
X-Patchwork-Submitter: Minchan Kim
X-Patchwork-Id: 11043305
From: Minchan Kim
To: Andrew Morton
Cc: linux-mm, LKML, linux-api@vger.kernel.org, Michal Hocko, Johannes Weiner, Tim Murray, Joel Fernandes, Suren Baghdasaryan, Daniel Colascione, Shakeel Butt, Sonny Rao, oleksandr@redhat.com, hdanton@sina.com, lizeb@google.com, Dave Hansen, "Kirill A. Shutemov", Minchan Kim
Subject: [PATCH v5 3/5] mm: account nr_isolated_xxx in [isolate|putback]_lru_page
Date: Mon, 15 Jul 2019 08:33:58 +0900
Message-Id: <20190714233401.36909-4-minchan@kernel.org>
In-Reply-To: <20190714233401.36909-1-minchan@kernel.org>
References: <20190714233401.36909-1-minchan@kernel.org>

The NR_ISOLATED_* counters are per-cpu counters, so batching their
updates would not be a huge gain. Rather than complicating callers to
do the batching, let's make it more straightforward by moving the
accounting into the [isolate|putback]_lru_page API.
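To illustrate the resulting calling convention (an editorial sketch
drawn from the mempolicy.c hunk below, not new code): callers no longer
pair isolation with manual NR_ISOLATED_* accounting.

	/* before this patch: every caller accounted by hand */
	if (!isolate_lru_page(head)) {
		list_add_tail(&head->lru, pagelist);
		mod_node_page_state(page_pgdat(head),
				    NR_ISOLATED_ANON + page_is_file_cache(head),
				    hpage_nr_pages(head));
	}

	/* after this patch: the API accounts internally */
	if (!isolate_lru_page(head))
		list_add_tail(&head->lru, pagelist);
	/*
	 * ...and putback_lru_page()/shrink_page_list() decrement the
	 * NR_ISOLATED_* counters on the way back out.
	 */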
* v1 * fix accounting bug - Hillf Link: http://lkml.kernel.org/r/20190531165927.GA20067@cmpxchg.org Suggested-by: Johannes Weiner Acked-by: Johannes Weiner Acked-by: Michal Hocko Signed-off-by: Minchan Kim --- mm/compaction.c | 2 -- mm/gup.c | 7 +------ mm/khugepaged.c | 3 --- mm/memory-failure.c | 3 --- mm/memory_hotplug.c | 4 ---- mm/mempolicy.c | 6 +----- mm/migrate.c | 37 ++++++++----------------------------- mm/vmscan.c | 22 ++++++++++++++++------ 8 files changed, 26 insertions(+), 58 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 9e1b9acb116b..c6591682deda 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -982,8 +982,6 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, /* Successfully isolated */ del_page_from_lru_list(page, lruvec, page_lru(page)); - inc_node_page_state(page, - NR_ISOLATED_ANON + page_is_file_cache(page)); isolate_success: list_add(&page->lru, &cc->migratepages); diff --git a/mm/gup.c b/mm/gup.c index 98f13ab37bac..11d0634ce613 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1475,13 +1475,8 @@ static long check_and_migrate_cma_pages(struct task_struct *tsk, drain_allow = false; } - if (!isolate_lru_page(head)) { + if (!isolate_lru_page(head)) list_add_tail(&head->lru, &cma_page_list); - mod_node_page_state(page_pgdat(head), - NR_ISOLATED_ANON + - page_is_file_cache(head), - hpage_nr_pages(head)); - } } } diff --git a/mm/khugepaged.c b/mm/khugepaged.c index eaaa21b23215..a8b517d6df4a 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -503,7 +503,6 @@ void __khugepaged_exit(struct mm_struct *mm) static void release_pte_page(struct page *page) { - dec_node_page_state(page, NR_ISOLATED_ANON + page_is_file_cache(page)); unlock_page(page); putback_lru_page(page); } @@ -602,8 +601,6 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, result = SCAN_DEL_PAGE_LRU; goto out; } - inc_node_page_state(page, - NR_ISOLATED_ANON + page_is_file_cache(page)); VM_BUG_ON_PAGE(!PageLocked(page), page); VM_BUG_ON_PAGE(PageLRU(page), page); diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 7ef849da8278..9900bb95d774 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1791,9 +1791,6 @@ static int __soft_offline_page(struct page *page, int flags) * so use !__PageMovable instead for LRU page's mapping * cannot have PAGE_MAPPING_MOVABLE. */ - if (!__PageMovable(page)) - inc_node_page_state(page, NR_ISOLATED_ANON + - page_is_file_cache(page)); list_add(&page->lru, &pagelist); ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL, MIGRATE_SYNC, MR_MEMORY_FAILURE); diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index b9ba5b85f9f7..15bad2043b41 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1383,10 +1383,6 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) ret = isolate_movable_page(page, ISOLATE_UNEVICTABLE); if (!ret) { /* Success */ list_add_tail(&page->lru, &source); - if (!__PageMovable(page)) - inc_node_page_state(page, NR_ISOLATED_ANON + - page_is_file_cache(page)); - } else { pr_warn("failed to isolate pfn %lx\n", pfn); dump_page(page, "isolation failed"); diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 4acc2d14bc77..a5685eee6d1d 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -994,12 +994,8 @@ static int migrate_page_add(struct page *page, struct list_head *pagelist, * Avoid migrating a page that is shared with others. 
*/ if ((flags & MPOL_MF_MOVE_ALL) || page_mapcount(head) == 1) { - if (!isolate_lru_page(head)) { + if (!isolate_lru_page(head)) list_add_tail(&head->lru, pagelist); - mod_node_page_state(page_pgdat(head), - NR_ISOLATED_ANON + page_is_file_cache(head), - hpage_nr_pages(head)); - } } return 0; diff --git a/mm/migrate.c b/mm/migrate.c index 8992741f10aa..f54f449a99f5 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -190,8 +190,6 @@ void putback_movable_pages(struct list_head *l) unlock_page(page); put_page(page); } else { - mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON + - page_is_file_cache(page), -hpage_nr_pages(page)); putback_lru_page(page); } } @@ -1175,10 +1173,17 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page, return -ENOMEM; if (page_count(page) == 1) { + bool is_lru = !__PageMovable(page); + /* page was freed from under us. So we are done. */ ClearPageActive(page); ClearPageUnevictable(page); - if (unlikely(__PageMovable(page))) { + if (likely(is_lru)) + mod_node_page_state(page_pgdat(page), + NR_ISOLATED_ANON + + page_is_file_cache(page), + -hpage_nr_pages(page)); + else { lock_page(page); if (!PageMovable(page)) __ClearPageIsolated(page); @@ -1204,15 +1209,6 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page, * restored. */ list_del(&page->lru); - - /* - * Compaction can migrate also non-LRU pages which are - * not accounted to NR_ISOLATED_*. They can be recognized - * as __PageMovable - */ - if (likely(!__PageMovable(page))) - mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON + - page_is_file_cache(page), -hpage_nr_pages(page)); } /* @@ -1566,9 +1562,6 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr, err = 0; list_add_tail(&head->lru, pagelist); - mod_node_page_state(page_pgdat(head), - NR_ISOLATED_ANON + page_is_file_cache(head), - hpage_nr_pages(head)); } out_putpage: /* @@ -1884,8 +1877,6 @@ static struct page *alloc_misplaced_dst_page(struct page *page, static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page) { - int page_lru; - VM_BUG_ON_PAGE(compound_order(page) && !PageTransHuge(page), page); /* Avoid migrating to a node that is nearly full */ @@ -1907,10 +1898,6 @@ static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page) return 0; } - page_lru = page_is_file_cache(page); - mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON + page_lru, - hpage_nr_pages(page)); - /* * Isolating the page has taken another reference, so the * caller's reference can be safely dropped without the page @@ -1965,8 +1952,6 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma, if (nr_remaining) { if (!list_empty(&migratepages)) { list_del(&page->lru); - dec_node_page_state(page, NR_ISOLATED_ANON + - page_is_file_cache(page)); putback_lru_page(page); } isolated = 0; @@ -1996,7 +1981,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm, pg_data_t *pgdat = NODE_DATA(node); int isolated = 0; struct page *new_page = NULL; - int page_lru = page_is_file_cache(page); unsigned long start = address & HPAGE_PMD_MASK; new_page = alloc_pages_node(node, @@ -2042,8 +2026,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm, /* Retake the callers reference and putback on LRU */ get_page(page); putback_lru_page(page); - mod_node_page_state(page_pgdat(page), - NR_ISOLATED_ANON + page_lru, -HPAGE_PMD_NR); goto out_unlock; } @@ -2093,9 +2075,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm, count_vm_events(PGMIGRATE_SUCCESS, HPAGE_PMD_NR); 
count_vm_numa_events(NUMA_PAGE_MIGRATE, HPAGE_PMD_NR); - mod_node_page_state(page_pgdat(page), - NR_ISOLATED_ANON + page_lru, - -HPAGE_PMD_NR); return isolated; out_fail: diff --git a/mm/vmscan.c b/mm/vmscan.c index b4fa04d10ba6..ca192b792d4f 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1016,6 +1016,9 @@ int remove_mapping(struct address_space *mapping, struct page *page) void putback_lru_page(struct page *page) { lru_cache_add(page); + mod_node_page_state(page_pgdat(page), + NR_ISOLATED_ANON + page_is_file_cache(page), + -hpage_nr_pages(page)); put_page(page); /* drop ref from isolate */ } @@ -1481,6 +1484,9 @@ static unsigned long shrink_page_list(struct list_head *page_list, */ nr_reclaimed += nr_pages; + mod_node_page_state(pgdat, NR_ISOLATED_ANON + + page_is_file_cache(page), + -nr_pages); /* * Is there need to periodically free_page_list? It would * appear not as the counts should be low @@ -1556,7 +1562,6 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone, ret = shrink_page_list(&clean_pages, zone->zone_pgdat, &sc, TTU_IGNORE_ACCESS, &dummy_stat, true); list_splice(&clean_pages, page_list); - mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -ret); return ret; } @@ -1632,6 +1637,9 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode) */ ClearPageLRU(page); ret = 0; + __mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON + + page_is_file_cache(page), + hpage_nr_pages(page)); } return ret; @@ -1763,6 +1771,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan, trace_mm_vmscan_lru_isolate(sc->reclaim_idx, sc->order, nr_to_scan, total_scan, skipped, nr_taken, mode, lru); update_lru_sizes(lruvec, lru, nr_zone_taken); + return nr_taken; } @@ -1811,6 +1820,9 @@ int isolate_lru_page(struct page *page) ClearPageLRU(page); del_page_from_lru_list(page, lruvec, lru); ret = 0; + mod_node_page_state(pgdat, NR_ISOLATED_ANON + + page_is_file_cache(page), + hpage_nr_pages(page)); } spin_unlock_irq(&pgdat->lru_lock); } @@ -1902,6 +1914,9 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec, update_lru_size(lruvec, lru, page_zonenum(page), nr_pages); list_move(&page->lru, &lruvec->lists[lru]); + __mod_node_page_state(pgdat, NR_ISOLATED_ANON + + page_is_file_cache(page), + -hpage_nr_pages(page)); if (put_page_testzero(page)) { __ClearPageLRU(page); __ClearPageActive(page); @@ -1979,7 +1994,6 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec, nr_taken = isolate_lru_pages(nr_to_scan, lruvec, &page_list, &nr_scanned, sc, lru); - __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, nr_taken); reclaim_stat->recent_scanned[file] += nr_taken; item = current_is_kswapd() ? 
PGSCAN_KSWAPD : PGSCAN_DIRECT; @@ -2005,8 +2019,6 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec, move_pages_to_lru(lruvec, &page_list); - __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken); - spin_unlock_irq(&pgdat->lru_lock); mem_cgroup_uncharge_list(&page_list); @@ -2065,7 +2077,6 @@ static void shrink_active_list(unsigned long nr_to_scan, nr_taken = isolate_lru_pages(nr_to_scan, lruvec, &l_hold, &nr_scanned, sc, lru); - __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, nr_taken); reclaim_stat->recent_scanned[file] += nr_taken; __count_vm_events(PGREFILL, nr_scanned); @@ -2134,7 +2145,6 @@ static void shrink_active_list(unsigned long nr_to_scan, __count_vm_events(PGDEACTIVATE, nr_deactivate); __count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE, nr_deactivate); - __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken); spin_unlock_irq(&pgdat->lru_lock); mem_cgroup_uncharge_list(&l_active);

From patchwork Sun Jul 14 23:33:59 2019
X-Patchwork-Submitter: Minchan Kim
X-Patchwork-Id: 11043307
From: Minchan Kim
To: Andrew Morton
Cc: linux-mm, LKML, linux-api@vger.kernel.org, Michal Hocko, Johannes Weiner, Tim Murray, Joel Fernandes, Suren Baghdasaryan, Daniel Colascione, Shakeel Butt, Sonny Rao, oleksandr@redhat.com, hdanton@sina.com, lizeb@google.com, Dave Hansen, "Kirill A. Shutemov", Minchan Kim
Subject: [PATCH v5 4/5] mm: introduce MADV_PAGEOUT
Date: Mon, 15 Jul 2019 08:33:59 +0900
Message-Id: <20190714233401.36909-5-minchan@kernel.org>
In-Reply-To: <20190714233401.36909-1-minchan@kernel.org>
References: <20190714233401.36909-1-minchan@kernel.org>

When a process expects no accesses to a certain memory range for a
long time, it can give the kernel a hint that those pages can be
reclaimed immediately, but that their data should be preserved for
future use. This can reduce workingset eviction and so improve
performance.

This patch introduces the new MADV_PAGEOUT hint to the madvise(2)
syscall. A process can use MADV_PAGEOUT to mark a memory range as not
expected to be used for a long time, so that the kernel reclaims *any
LRU* pages in the range immediately. The hint helps the kernel decide
which pages to evict proactively.

A note: it intentionally does not apply the SWAP_CLUSTER_MAX limit to
LRU page isolation, because the batch is already bounded by the PMD
size. If the PMD size (e.g., 256 pages) causes trouble, we can fix it
later by limiting the batch to SWAP_CLUSTER_MAX[1].

- man-page material

MADV_PAGEOUT (since Linux x.x)

	Do not expect access in the near future, so pages in the specified
	regions can be reclaimed immediately, regardless of memory pressure.
	Thus, an access in the range after a successful operation may cause a
	major page fault, but, unlike MADV_DONTNEED, the up-to-date contents
	are never lost. Pages belonging to a shared mapping are only
	processed if write access is allowed for the calling process.

	MADV_PAGEOUT cannot be applied to locked pages, Huge TLB pages, or
	VM_PFNMAP pages.

* v4
 * clear young bit regardless of success of page isolation - hannes

* v3
 * man page material modification - mhocko
 * remove using SWAP_CLUSTER_MAX - mhocko

* v2
 * add comment about SWAP_CLUSTER_MAX - mhocko
 * add permission check to prevent sidechannel attack - mhocko
 * add man page stuff - dave

* v1
 * change pte to old and rely on the other's reference - hannes
 * remove page_mapcount to check shared page - mhocko

* RFC v2
 * make reclaim_pages simple via factoring out isolate logic - hannes

* RFCv1
 * rename from MADV_COLD to MADV_PAGEOUT - hannes
 * bail out if process is being killed - Hillf
 * fix reclaim_pages bugs - Hillf

[1] https://lore.kernel.org/lkml/20190710194719.GS29695@dhcp22.suse.cz/

Acked-by: Michal Hocko
Signed-off-by: Minchan Kim
---
 include/linux/swap.h                   |   1 +
 include/uapi/asm-generic/mman-common.h |   1 +
 mm/madvise.c                           | 195 +++++++++++++++++++++++++
 mm/vmscan.c                            |  55 +++++++
 4 files changed, 252 insertions(+)
!is_pmd_migration_entry(orig_pmd)); + goto huge_unlock; + } + + page = pmd_page(orig_pmd); + if (next - addr != HPAGE_PMD_SIZE) { + int err; + + if (page_mapcount(page) != 1) + goto huge_unlock; + get_page(page); + spin_unlock(ptl); + lock_page(page); + err = split_huge_page(page); + unlock_page(page); + put_page(page); + if (!err) + goto regular_page; + return 0; + } + + if (pmd_young(orig_pmd)) { + pmdp_invalidate(vma, addr, pmd); + orig_pmd = pmd_mkold(orig_pmd); + + set_pmd_at(mm, addr, pmd, orig_pmd); + tlb_remove_tlb_entry(tlb, pmd, addr); + } + + ClearPageReferenced(page); + test_and_clear_page_young(page); + + if (!isolate_lru_page(page)) + list_add(&page->lru, &page_list); +huge_unlock: + spin_unlock(ptl); + reclaim_pages(&page_list); + return 0; + } + + if (pmd_trans_unstable(pmd)) + return 0; +regular_page: + tlb_change_page_size(tlb, PAGE_SIZE); + orig_pte = pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); + flush_tlb_batched_pending(mm); + arch_enter_lazy_mmu_mode(); + for (; addr < end; pte++, addr += PAGE_SIZE) { + ptent = *pte; + if (!pte_present(ptent)) + continue; + + page = vm_normal_page(vma, addr, ptent); + if (!page) + continue; + + /* + * creating a THP page is expensive so split it only if we + * are sure it's worth. Split it if we are only owner. + */ + if (PageTransCompound(page)) { + if (page_mapcount(page) != 1) + break; + get_page(page); + if (!trylock_page(page)) { + put_page(page); + break; + } + pte_unmap_unlock(orig_pte, ptl); + if (split_huge_page(page)) { + unlock_page(page); + put_page(page); + pte_offset_map_lock(mm, pmd, addr, &ptl); + break; + } + unlock_page(page); + put_page(page); + pte = pte_offset_map_lock(mm, pmd, addr, &ptl); + pte--; + addr -= PAGE_SIZE; + continue; + } + + VM_BUG_ON_PAGE(PageTransCompound(page), page); + + if (pte_young(ptent)) { + ptent = ptep_get_and_clear_full(mm, addr, pte, + tlb->fullmm); + ptent = pte_mkold(ptent); + set_pte_at(mm, addr, pte, ptent); + tlb_remove_tlb_entry(tlb, pte, addr); + } + ClearPageReferenced(page); + test_and_clear_page_young(page); + + if (!isolate_lru_page(page)) + list_add(&page->lru, &page_list); + } + + arch_leave_lazy_mmu_mode(); + pte_unmap_unlock(orig_pte, ptl); + reclaim_pages(&page_list); + cond_resched(); + + return 0; +} + +static void madvise_pageout_page_range(struct mmu_gather *tlb, + struct vm_area_struct *vma, + unsigned long addr, unsigned long end) +{ + struct mm_walk pageout_walk = { + .pmd_entry = madvise_pageout_pte_range, + .mm = vma->vm_mm, + .private = tlb, + }; + + tlb_start_vma(tlb, vma); + walk_page_range(addr, end, &pageout_walk); + tlb_end_vma(tlb, vma); +} + +static inline bool can_do_pageout(struct vm_area_struct *vma) +{ + if (vma_is_anonymous(vma)) + return true; + if (!vma->vm_file) + return false; + /* + * paging out pagecache only for non-anonymous mappings that correspond + * to the files the calling process could (if tried) open for writing; + * otherwise we'd be including shared non-exclusive mappings, which + * opens a side channel. 
+ */ + return inode_owner_or_capable(file_inode(vma->vm_file)) || + inode_permission(file_inode(vma->vm_file), MAY_WRITE) == 0; +} + +static long madvise_pageout(struct vm_area_struct *vma, + struct vm_area_struct **prev, + unsigned long start_addr, unsigned long end_addr) +{ + struct mm_struct *mm = vma->vm_mm; + struct mmu_gather tlb; + + *prev = vma; + if (!can_madv_lru_vma(vma)) + return -EINVAL; + + if (!can_do_pageout(vma)) + return 0; + + lru_add_drain(); + tlb_gather_mmu(&tlb, mm, start_addr, end_addr); + madvise_pageout_page_range(&tlb, vma, start_addr, end_addr); + tlb_finish_mmu(&tlb, start_addr, end_addr); + + return 0; +} + static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, struct mm_walk *walk) @@ -870,6 +1062,8 @@ madvise_vma(struct vm_area_struct *vma, struct vm_area_struct **prev, return madvise_willneed(vma, prev, start, end); case MADV_COLD: return madvise_cold(vma, prev, start, end); + case MADV_PAGEOUT: + return madvise_pageout(vma, prev, start, end); case MADV_FREE: case MADV_DONTNEED: return madvise_dontneed_free(vma, prev, start, end, behavior); @@ -892,6 +1086,7 @@ madvise_behavior_valid(int behavior) case MADV_DONTNEED: case MADV_FREE: case MADV_COLD: + case MADV_PAGEOUT: #ifdef CONFIG_KSM case MADV_MERGEABLE: case MADV_UNMERGEABLE: diff --git a/mm/vmscan.c b/mm/vmscan.c index ca192b792d4f..bda3c41de767 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2153,6 +2153,61 @@ static void shrink_active_list(unsigned long nr_to_scan, nr_deactivate, nr_rotated, sc->priority, file); } +unsigned long reclaim_pages(struct list_head *page_list) +{ + int nid = -1; + unsigned long nr_reclaimed = 0; + LIST_HEAD(node_page_list); + struct reclaim_stat dummy_stat; + struct page *page; + struct scan_control sc = { + .gfp_mask = GFP_KERNEL, + .priority = DEF_PRIORITY, + .may_writepage = 1, + .may_unmap = 1, + .may_swap = 1, + }; + + while (!list_empty(page_list)) { + page = lru_to_page(page_list); + if (nid == -1) { + nid = page_to_nid(page); + INIT_LIST_HEAD(&node_page_list); + } + + if (nid == page_to_nid(page)) { + list_move(&page->lru, &node_page_list); + continue; + } + + nr_reclaimed += shrink_page_list(&node_page_list, + NODE_DATA(nid), + &sc, 0, + &dummy_stat, false); + while (!list_empty(&node_page_list)) { + page = lru_to_page(&node_page_list); + list_del(&page->lru); + putback_lru_page(page); + } + + nid = -1; + } + + if (!list_empty(&node_page_list)) { + nr_reclaimed += shrink_page_list(&node_page_list, + NODE_DATA(nid), + &sc, 0, + &dummy_stat, false); + while (!list_empty(&node_page_list)) { + page = lru_to_page(&node_page_list); + list_del(&page->lru); + putback_lru_page(page); + } + } + + return nr_reclaimed; +} + /* * The inactive anon list should be small enough that the VM never has * to do too much work. 
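A rough way to observe the hint in action, assuming a kernel with this
series applied and swap enabled, is to compare mincore(2) residency
before and after the call. The sketch below is illustrative only and
not part of the series; on an unpatched kernel the madvise() call
simply fails with EINVAL:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 6	/* value used by this patch series */
#endif

/* Count how many pages of [addr, addr+len) are resident in memory. */
static size_t resident_pages(void *addr, size_t len)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t npages = len / page, i, resident = 0;
	unsigned char *vec = malloc(npages);

	if (!vec || mincore(addr, len, vec))
		exit(1);
	for (i = 0; i < npages; i++)
		resident += vec[i] & 1;
	free(vec);
	return resident;
}

int main(void)
{
	size_t len = 32 << 20;	/* arbitrary test size */
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED)
		return 1;
	memset(buf, 1, len);	/* fault the pages in */

	printf("resident before: %zu\n", resident_pages(buf, len));
	if (madvise(buf, len, MADV_PAGEOUT))
		perror("madvise");	/* EINVAL without the patch */
	printf("resident after:  %zu\n", resident_pages(buf, len));
	return 0;
}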
From patchwork Sun Jul 14 23:34:00 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Minchan Kim
X-Patchwork-Id: 11043309
From: Minchan Kim
To: Andrew Morton
Cc: linux-mm, LKML, linux-api@vger.kernel.org, Michal Hocko,
    Johannes Weiner, Tim Murray, Joel Fernandes, Suren Baghdasaryan,
    Daniel Colascione, Shakeel Butt, Sonny Rao, oleksandr@redhat.com,
    hdanton@sina.com, lizeb@google.com, Dave Hansen,
    "Kirill A. Shutemov", Minchan Kim
Subject: [PATCH v5 5/5] mm: factor out common parts between MADV_COLD and MADV_PAGEOUT
Date: Mon, 15 Jul 2019 08:34:00 +0900
Message-Id: <20190714233401.36909-6-minchan@kernel.org>
In-Reply-To: <20190714233401.36909-1-minchan@kernel.org>
References: <20190714233401.36909-1-minchan@kernel.org>

There are many common parts between MADV_COLD and MADV_PAGEOUT.
This patch factors them out to reduce code duplication.
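The shape of the refactoring is conventional: thread a small private
struct through the page walker and branch on a flag only where the two
hints diverge. The standalone sketch below mirrors that pattern in
miniature; it is illustrative userspace code, not the kernel
implementation, and every name in it is made up for the example:

#include <stdbool.h>
#include <stdio.h>

/* Mirrors the patch's madvise_walk_private: one callback, one flag. */
struct walk_private {
	bool pageout;
};

static void visit_page(int page, void *private)
{
	struct walk_private *w = private;

	/* Common part: age the page (clear young/referenced state). */
	printf("page %d: mark old\n", page);

	/* Divergent part: reclaim now vs. merely deactivate. */
	if (w->pageout)
		printf("page %d: isolate and reclaim\n", page);
	else
		printf("page %d: deactivate\n", page);
}

static void walk_range(int start, int end,
		       void (*cb)(int, void *), void *priv)
{
	for (int p = start; p < end; p++)
		cb(p, priv);
}

int main(void)
{
	struct walk_private cold = { .pageout = false };
	struct walk_private pageout = { .pageout = true };

	walk_range(0, 3, visit_page, &cold);	/* MADV_COLD path */
	walk_range(0, 3, visit_page, &pageout);	/* MADV_PAGEOUT path */
	return 0;
}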
Acked-by: Michal Hocko
Signed-off-by: Minchan Kim
---
 mm/madvise.c | 193 ++++++++++++---------------------------------------
 1 file changed, 46 insertions(+), 147 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 8db102f170ef..693239a8f67b 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -30,6 +30,11 @@
 
 #include "internal.h"
 
+struct madvise_walk_private {
+	struct mmu_gather *tlb;
+	bool pageout;
+};
+
 /*
  * Any behaviour which results in changes to the vma->vm_flags needs to
  * take mmap_sem for writing. Others, which simply traverse vmas, need
@@ -310,16 +315,23 @@ static long madvise_willneed(struct vm_area_struct *vma,
 	return 0;
 }
 
-static int madvise_cold_pte_range(pmd_t *pmd, unsigned long addr,
-				unsigned long end, struct mm_walk *walk)
+static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
+				unsigned long addr, unsigned long end,
+				struct mm_walk *walk)
 {
-	struct mmu_gather *tlb = walk->private;
+	struct madvise_walk_private *private = walk->private;
+	struct mmu_gather *tlb = private->tlb;
+	bool pageout = private->pageout;
 	struct mm_struct *mm = tlb->mm;
 	struct vm_area_struct *vma = walk->vma;
 	pte_t *orig_pte, *pte, ptent;
 	spinlock_t *ptl;
-	struct page *page;
 	unsigned long next;
+	struct page *page = NULL;
+	LIST_HEAD(page_list);
+
+	if (fatal_signal_pending(current))
+		return -EINTR;
 
 	next = pmd_addr_end(addr, end);
 	if (pmd_trans_huge(*pmd)) {
@@ -366,10 +378,17 @@ static int madvise_cold_pte_range(pmd_t *pmd, unsigned long addr,
 			tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
 		}
 
+		ClearPageReferenced(page);
 		test_and_clear_page_young(page);
-		deactivate_page(page);
+		if (pageout) {
+			if (!isolate_lru_page(page))
+				list_add(&page->lru, &page_list);
+		} else
+			deactivate_page(page);
 huge_unlock:
 		spin_unlock(ptl);
+		if (pageout)
+			reclaim_pages(&page_list);
 		return 0;
 	}
 
@@ -437,12 +456,19 @@ static int madvise_cold_pte_range(pmd_t *pmd, unsigned long addr,
 		 * As a side effect, it confuses idle-page tracking
 		 * because it will miss recent referenced history.
 		 */
+		ClearPageReferenced(page);
 		test_and_clear_page_young(page);
-		deactivate_page(page);
+		if (pageout) {
+			if (!isolate_lru_page(page))
+				list_add(&page->lru, &page_list);
+		} else
+			deactivate_page(page);
 	}
 
 	arch_leave_lazy_mmu_mode();
 	pte_unmap_unlock(orig_pte, ptl);
+	if (pageout)
+		reclaim_pages(&page_list);
 	cond_resched();
 
 	return 0;
@@ -452,10 +478,15 @@ static void madvise_cold_page_range(struct mmu_gather *tlb,
 			struct vm_area_struct *vma,
 			unsigned long addr, unsigned long end)
 {
+	struct madvise_walk_private walk_private = {
+		.tlb = tlb,
+		.pageout = false,
+	};
+
 	struct mm_walk cold_walk = {
-		.pmd_entry = madvise_cold_pte_range,
+		.pmd_entry = madvise_cold_or_pageout_pte_range,
 		.mm = vma->vm_mm,
-		.private = tlb,
+		.private = &walk_private,
 	};
 
 	tlb_start_vma(tlb, vma);
@@ -482,151 +513,19 @@ static long madvise_cold(struct vm_area_struct *vma,
 	return 0;
 }
 
-static int madvise_pageout_pte_range(pmd_t *pmd, unsigned long addr,
-				unsigned long end, struct mm_walk *walk)
-{
-	struct mmu_gather *tlb = walk->private;
-	struct mm_struct *mm = tlb->mm;
-	struct vm_area_struct *vma = walk->vma;
-	pte_t *orig_pte, *pte, ptent;
-	spinlock_t *ptl;
-	LIST_HEAD(page_list);
-	struct page *page;
-	unsigned long next;
-
-	if (fatal_signal_pending(current))
-		return -EINTR;
-
-	next = pmd_addr_end(addr, end);
-	if (pmd_trans_huge(*pmd)) {
-		pmd_t orig_pmd;
-
-		tlb_change_page_size(tlb, HPAGE_PMD_SIZE);
-		ptl = pmd_trans_huge_lock(pmd, vma);
-		if (!ptl)
-			return 0;
-
-		orig_pmd = *pmd;
-		if (is_huge_zero_pmd(orig_pmd))
-			goto huge_unlock;
-
-		if (unlikely(!pmd_present(orig_pmd))) {
-			VM_BUG_ON(thp_migration_supported() &&
-					!is_pmd_migration_entry(orig_pmd));
-			goto huge_unlock;
-		}
-
-		page = pmd_page(orig_pmd);
-		if (next - addr != HPAGE_PMD_SIZE) {
-			int err;
-
-			if (page_mapcount(page) != 1)
-				goto huge_unlock;
-			get_page(page);
-			spin_unlock(ptl);
-			lock_page(page);
-			err = split_huge_page(page);
-			unlock_page(page);
-			put_page(page);
-			if (!err)
-				goto regular_page;
-			return 0;
-		}
-
-		if (pmd_young(orig_pmd)) {
-			pmdp_invalidate(vma, addr, pmd);
-			orig_pmd = pmd_mkold(orig_pmd);
-
-			set_pmd_at(mm, addr, pmd, orig_pmd);
-			tlb_remove_tlb_entry(tlb, pmd, addr);
-		}
-
-		ClearPageReferenced(page);
-		test_and_clear_page_young(page);
-
-		if (!isolate_lru_page(page))
-			list_add(&page->lru, &page_list);
-huge_unlock:
-		spin_unlock(ptl);
-		reclaim_pages(&page_list);
-		return 0;
-	}
-
-	if (pmd_trans_unstable(pmd))
-		return 0;
-regular_page:
-	tlb_change_page_size(tlb, PAGE_SIZE);
-	orig_pte = pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
-	flush_tlb_batched_pending(mm);
-	arch_enter_lazy_mmu_mode();
-	for (; addr < end; pte++, addr += PAGE_SIZE) {
-		ptent = *pte;
-		if (!pte_present(ptent))
-			continue;
-
-		page = vm_normal_page(vma, addr, ptent);
-		if (!page)
-			continue;
-
-		/*
-		 * Creating a THP page is expensive, so split it only if we
-		 * are sure it's worth it. Split it if we are the only owner.
-		 */
-		if (PageTransCompound(page)) {
-			if (page_mapcount(page) != 1)
-				break;
-			get_page(page);
-			if (!trylock_page(page)) {
-				put_page(page);
-				break;
-			}
-			pte_unmap_unlock(orig_pte, ptl);
-			if (split_huge_page(page)) {
-				unlock_page(page);
-				put_page(page);
-				pte_offset_map_lock(mm, pmd, addr, &ptl);
-				break;
-			}
-			unlock_page(page);
-			put_page(page);
-			pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
-			pte--;
-			addr -= PAGE_SIZE;
-			continue;
-		}
-
-		VM_BUG_ON_PAGE(PageTransCompound(page), page);
-
-		if (pte_young(ptent)) {
-			ptent = ptep_get_and_clear_full(mm, addr, pte,
-							tlb->fullmm);
-			ptent = pte_mkold(ptent);
-			set_pte_at(mm, addr, pte, ptent);
-			tlb_remove_tlb_entry(tlb, pte, addr);
-		}
-		ClearPageReferenced(page);
-		test_and_clear_page_young(page);
-
-		if (!isolate_lru_page(page))
-			list_add(&page->lru, &page_list);
-	}
-
-	arch_leave_lazy_mmu_mode();
-	pte_unmap_unlock(orig_pte, ptl);
-	reclaim_pages(&page_list);
-	cond_resched();
-
-	return 0;
-}
-
 static void madvise_pageout_page_range(struct mmu_gather *tlb,
 			struct vm_area_struct *vma,
 			unsigned long addr, unsigned long end)
 {
+	struct madvise_walk_private walk_private = {
+		.pageout = true,
+		.tlb = tlb,
+	};
+
 	struct mm_walk pageout_walk = {
-		.pmd_entry = madvise_pageout_pte_range,
+		.pmd_entry = madvise_cold_or_pageout_pte_range,
 		.mm = vma->vm_mm,
-		.private = tlb,
+		.private = &walk_private,
 	};
 
 	tlb_start_vma(tlb, vma);