From patchwork Fri Jul 26 02:34:31 2019
X-Patchwork-Submitter: Minchan Kim
X-Patchwork-Id: 11060173
From: Minchan Kim
To: Andrew Morton
Cc: linux-mm, LKML, linux-api@vger.kernel.org, Michal Hocko, Johannes Weiner,
 Tim Murray, Joel Fernandes, Suren Baghdasaryan, Daniel Colascione,
 Shakeel Butt, Sonny Rao, oleksandr@redhat.com, hdanton@sina.com,
 lizeb@google.com, Dave Hansen, "Kirill A. Shutemov", Minchan Kim,
 linux-arch@vger.kernel.org, "James E.J. Bottomley", Richard Henderson,
 Ralf Baechle, Chris Zankel, kbuild test robot
Subject: [PATCH v7 1/5] mm: introduce MADV_COLD
Date: Fri, 26 Jul 2019 11:34:31 +0900
Message-Id: <20190726023435.214162-2-minchan@kernel.org>
In-Reply-To: <20190726023435.214162-1-minchan@kernel.org>
References: <20190726023435.214162-1-minchan@kernel.org>

When a process expects no accesses to a certain memory range, it can give
the kernel a hint that the pages can be reclaimed when memory pressure
happens, but that the data should be preserved for future use. This can
reduce working-set eviction and so ends up increasing performance.

This patch introduces the new MADV_COLD hint to the madvise(2) syscall.
MADV_COLD can be used by a process to mark a memory range as not expected
to be used in the near future. The hint helps the kernel decide which
pages to evict early under memory pressure.

It works for every LRU page, like MADV_[DONTNEED|FREE]. IOW, it moves

	active file page -> inactive file LRU
	active anon page -> inactive anon LRU

Unlike MADV_FREE, it doesn't move active anonymous pages to the head of
the inactive file LRU, because MADV_COLD has slightly different semantics.
MADV_FREE means it's okay to discard the pages under memory pressure
because their content is *garbage*, so freeing such pages has almost zero
overhead: no swap-out is needed, and a later access causes only a minor
fault. Thus it makes sense to put those freeable pages on the inactive
file LRU to compete with other used-once pages. It also makes sense from
an implementation point of view, because such memory is no longer
swap-backed until it is re-dirtied, and it even gives the bonus that the
pages can be reclaimed on a swapless system.

However, MADV_COLD doesn't mean the pages are garbage, so reclaiming them
ultimately requires swap-out and swap-in, which is a bigger cost. Since VM
LRU aging is designed around a cost model, anonymous cold pages are better
placed on the inactive anon LRU list, not the file LRU. Furthermore, this
helps avoid unnecessary scanning on systems without a swap device. Let's
start with the simpler approach without adding complexity at this moment.
Keep in mind the caveat, though, that workloads with a lot of page cache
are likely to effectively ignore MADV_COLD on anonymous memory, because we
rarely age the anonymous LRU lists.

* man-page material

MADV_COLD (since Linux x.x)

Pages in the specified regions will be treated as less recently accessed
compared to pages in the system with similar access frequencies. In
contrast to MADV_FREE, the contents of the region are preserved regardless
of subsequent writes to pages.

MADV_COLD cannot be applied to locked pages, Huge TLB pages, or VM_PFNMAP
pages.
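For illustration only (not part of this patch): a minimal userspace sketch
of the intended usage. It assumes the MADV_COLD value proposed by this
series (20) is visible, either via updated uapi headers or a local
fallback define.

	#include <stdio.h>
	#include <string.h>
	#include <sys/mman.h>

	#ifndef MADV_COLD
	#define MADV_COLD 20	/* value proposed by this series */
	#endif

	int main(void)
	{
		size_t len = 64UL << 20;	/* 64MB anonymous scratch buffer */
		char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
				 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (buf == MAP_FAILED)
			return 1;

		memset(buf, 1, len);	/* fault the pages in */

		/*
		 * Tell the kernel this range is cold: under memory pressure
		 * it may be deactivated/reclaimed first, but the contents
		 * are preserved (unlike MADV_DONTNEED or MADV_FREE).
		 */
		if (madvise(buf, len, MADV_COLD))
			perror("madvise(MADV_COLD)");

		return 0;
	}

The hint is advisory: nothing is reclaimed immediately, the pages are only
made better reclaim candidates.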
* v6 * Fix build error kbuildbot reported * https://lore.kernel.org/linux-mm/201907251647.fhJ6XzdA%25lkp@intel.com/ * https://lore.kernel.org/linux-mm/201907251529.kTj2FpcL%25lkp@intel.com/ * v5 * Fix typo and correct wrong lazy_mmu_mode pair use - surenb * v2 * add up the warn with lots of page cache workload - mhocko * add man page stuff - dave * v1 * remove page_mapcount filter - hannes, mhocko * remove idle page handling - joelaf * RFCv2 * add more description - mhocko * RFCv1 * renaming from MADV_COOL to MADV_COLD - hannes * internal review * use clear_page_youn in deactivate_page - joelaf * Revise the description - surenb * Renaming from MADV_WARM to MADV_COOL - surenb Cc: linux-arch@vger.kernel.org Cc: James E.J. Bottomley Cc: Richard Henderson Cc: Ralf Baechle Cc: Chris Zankel Reported-by: kbuild test robot Acked-by: Michal Hocko Acked-by: Johannes Weiner Signed-off-by: Minchan Kim --- arch/alpha/include/uapi/asm/mman.h | 2 + arch/mips/include/uapi/asm/mman.h | 2 + arch/parisc/include/uapi/asm/mman.h | 2 + arch/xtensa/include/uapi/asm/mman.h | 2 + include/linux/swap.h | 1 + include/uapi/asm-generic/mman-common.h | 2 + mm/internal.h | 2 +- mm/madvise.c | 181 ++++++++++++++++++++++++- mm/oom_kill.c | 2 +- mm/swap.c | 42 ++++++ 10 files changed, 234 insertions(+), 4 deletions(-) diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h index ac23379b7a876..f3258fbf03d03 100644 --- a/arch/alpha/include/uapi/asm/mman.h +++ b/arch/alpha/include/uapi/asm/mman.h @@ -68,6 +68,8 @@ #define MADV_WIPEONFORK 18 /* Zero memory on fork, child only */ #define MADV_KEEPONFORK 19 /* Undo MADV_WIPEONFORK */ +#define MADV_COLD 20 /* deactivate these pages */ + /* compatibility flags */ #define MAP_FILE 0 diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h index c2b40969eb1fa..00ad09fc5eb16 100644 --- a/arch/mips/include/uapi/asm/mman.h +++ b/arch/mips/include/uapi/asm/mman.h @@ -95,6 +95,8 @@ #define MADV_WIPEONFORK 18 /* Zero memory on fork, child only */ #define MADV_KEEPONFORK 19 /* Undo MADV_WIPEONFORK */ +#define MADV_COLD 20 /* deactivate these pages */ + /* compatibility flags */ #define MAP_FILE 0 diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h index c98162f494dbb..eb14e3a7b8f37 100644 --- a/arch/parisc/include/uapi/asm/mman.h +++ b/arch/parisc/include/uapi/asm/mman.h @@ -48,6 +48,8 @@ #define MADV_DONTFORK 10 /* don't inherit across fork */ #define MADV_DOFORK 11 /* do inherit across fork */ +#define MADV_COLD 20 /* deactivate these pages */ + #define MADV_MERGEABLE 65 /* KSM may merge identical pages */ #define MADV_UNMERGEABLE 66 /* KSM may not merge identical pages */ diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h index ebbb48842190d..f926b00ff11f9 100644 --- a/arch/xtensa/include/uapi/asm/mman.h +++ b/arch/xtensa/include/uapi/asm/mman.h @@ -103,6 +103,8 @@ #define MADV_WIPEONFORK 18 /* Zero memory on fork, child only */ #define MADV_KEEPONFORK 19 /* Undo MADV_WIPEONFORK */ +#define MADV_COLD 20 /* deactivate these pages */ + /* compatibility flags */ #define MAP_FILE 0 diff --git a/include/linux/swap.h b/include/linux/swap.h index de2c67a33b7e7..0ce997edb8bbc 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -340,6 +340,7 @@ extern void lru_add_drain_cpu(int cpu); extern void lru_add_drain_all(void); extern void rotate_reclaimable_page(struct page *page); extern void deactivate_file_page(struct page *page); +extern void 
deactivate_page(struct page *page); extern void mark_page_lazyfree(struct page *page); extern void swap_setup(void); diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h index 63b1f506ea678..23431faf0eb6e 100644 --- a/include/uapi/asm-generic/mman-common.h +++ b/include/uapi/asm-generic/mman-common.h @@ -67,6 +67,8 @@ #define MADV_WIPEONFORK 18 /* Zero memory on fork, child only */ #define MADV_KEEPONFORK 19 /* Undo MADV_WIPEONFORK */ +#define MADV_COLD 20 /* deactivate these pages */ + /* compatibility flags */ #define MAP_FILE 0 diff --git a/mm/internal.h b/mm/internal.h index e32390802fd3f..0d5f720c75abf 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -39,7 +39,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf); void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *start_vma, unsigned long floor, unsigned long ceiling); -static inline bool can_madv_dontneed_vma(struct vm_area_struct *vma) +static inline bool can_madv_lru_vma(struct vm_area_struct *vma) { return !(vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP)); } diff --git a/mm/madvise.c b/mm/madvise.c index 968df3aa069fd..e724bce09d7ca 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -11,6 +11,7 @@ #include #include #include +#include #include #include #include @@ -40,6 +41,7 @@ static int madvise_need_mmap_write(int behavior) case MADV_REMOVE: case MADV_WILLNEED: case MADV_DONTNEED: + case MADV_COLD: case MADV_FREE: return 0; default: @@ -307,6 +309,178 @@ static long madvise_willneed(struct vm_area_struct *vma, return 0; } +static int madvise_cold_pte_range(pmd_t *pmd, unsigned long addr, + unsigned long end, struct mm_walk *walk) +{ + struct mmu_gather *tlb = walk->private; + struct mm_struct *mm = tlb->mm; + struct vm_area_struct *vma = walk->vma; + pte_t *orig_pte, *pte, ptent; + spinlock_t *ptl; + struct page *page; + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + if (pmd_trans_huge(*pmd)) { + pmd_t orig_pmd; + unsigned long next = pmd_addr_end(addr, end); + + tlb_change_page_size(tlb, HPAGE_PMD_SIZE); + ptl = pmd_trans_huge_lock(pmd, vma); + if (!ptl) + return 0; + + orig_pmd = *pmd; + if (is_huge_zero_pmd(orig_pmd)) + goto huge_unlock; + + if (unlikely(!pmd_present(orig_pmd))) { + VM_BUG_ON(thp_migration_supported() && + !is_pmd_migration_entry(orig_pmd)); + goto huge_unlock; + } + + page = pmd_page(orig_pmd); + if (next - addr != HPAGE_PMD_SIZE) { + int err; + + if (page_mapcount(page) != 1) + goto huge_unlock; + + get_page(page); + spin_unlock(ptl); + lock_page(page); + err = split_huge_page(page); + unlock_page(page); + put_page(page); + if (!err) + goto regular_page; + return 0; + } + + if (pmd_young(orig_pmd)) { + pmdp_invalidate(vma, addr, pmd); + orig_pmd = pmd_mkold(orig_pmd); + + set_pmd_at(mm, addr, pmd, orig_pmd); + tlb_remove_pmd_tlb_entry(tlb, pmd, addr); + } + + test_and_clear_page_young(page); + deactivate_page(page); +huge_unlock: + spin_unlock(ptl); + return 0; + } + + if (pmd_trans_unstable(pmd)) + return 0; +regular_page: +#endif + tlb_change_page_size(tlb, PAGE_SIZE); + orig_pte = pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); + flush_tlb_batched_pending(mm); + arch_enter_lazy_mmu_mode(); + for (; addr < end; pte++, addr += PAGE_SIZE) { + ptent = *pte; + + if (pte_none(ptent)) + continue; + + if (!pte_present(ptent)) + continue; + + page = vm_normal_page(vma, addr, ptent); + if (!page) + continue; + + /* + * Creating a THP page is expensive so split it only if we + * are sure it's worth. Split it if we are only owner. 
+ */ + if (PageTransCompound(page)) { + if (page_mapcount(page) != 1) + break; + get_page(page); + if (!trylock_page(page)) { + put_page(page); + break; + } + pte_unmap_unlock(orig_pte, ptl); + if (split_huge_page(page)) { + unlock_page(page); + put_page(page); + pte_offset_map_lock(mm, pmd, addr, &ptl); + break; + } + unlock_page(page); + put_page(page); + pte = pte_offset_map_lock(mm, pmd, addr, &ptl); + pte--; + addr -= PAGE_SIZE; + continue; + } + + VM_BUG_ON_PAGE(PageTransCompound(page), page); + + if (pte_young(ptent)) { + ptent = ptep_get_and_clear_full(mm, addr, pte, + tlb->fullmm); + ptent = pte_mkold(ptent); + set_pte_at(mm, addr, pte, ptent); + tlb_remove_tlb_entry(tlb, pte, addr); + } + + /* + * We are deactivating a page for accelerating reclaiming. + * VM couldn't reclaim the page unless we clear PG_young. + * As a side effect, it makes confuse idle-page tracking + * because they will miss recent referenced history. + */ + test_and_clear_page_young(page); + deactivate_page(page); + } + + arch_leave_lazy_mmu_mode(); + pte_unmap_unlock(orig_pte, ptl); + cond_resched(); + + return 0; +} + +static void madvise_cold_page_range(struct mmu_gather *tlb, + struct vm_area_struct *vma, + unsigned long addr, unsigned long end) +{ + struct mm_walk cold_walk = { + .pmd_entry = madvise_cold_pte_range, + .mm = vma->vm_mm, + .private = tlb, + }; + + tlb_start_vma(tlb, vma); + walk_page_range(addr, end, &cold_walk); + tlb_end_vma(tlb, vma); +} + +static long madvise_cold(struct vm_area_struct *vma, + struct vm_area_struct **prev, + unsigned long start_addr, unsigned long end_addr) +{ + struct mm_struct *mm = vma->vm_mm; + struct mmu_gather tlb; + + *prev = vma; + if (!can_madv_lru_vma(vma)) + return -EINVAL; + + lru_add_drain(); + tlb_gather_mmu(&tlb, mm, start_addr, end_addr); + madvise_cold_page_range(&tlb, vma, start_addr, end_addr); + tlb_finish_mmu(&tlb, start_addr, end_addr); + + return 0; +} + static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, struct mm_walk *walk) @@ -519,7 +693,7 @@ static long madvise_dontneed_free(struct vm_area_struct *vma, int behavior) { *prev = vma; - if (!can_madv_dontneed_vma(vma)) + if (!can_madv_lru_vma(vma)) return -EINVAL; if (!userfaultfd_remove(vma, start, end)) { @@ -541,7 +715,7 @@ static long madvise_dontneed_free(struct vm_area_struct *vma, */ return -ENOMEM; } - if (!can_madv_dontneed_vma(vma)) + if (!can_madv_lru_vma(vma)) return -EINVAL; if (end > vma->vm_end) { /* @@ -695,6 +869,8 @@ madvise_vma(struct vm_area_struct *vma, struct vm_area_struct **prev, return madvise_remove(vma, prev, start, end); case MADV_WILLNEED: return madvise_willneed(vma, prev, start, end); + case MADV_COLD: + return madvise_cold(vma, prev, start, end); case MADV_FREE: case MADV_DONTNEED: return madvise_dontneed_free(vma, prev, start, end, behavior); @@ -716,6 +892,7 @@ madvise_behavior_valid(int behavior) case MADV_WILLNEED: case MADV_DONTNEED: case MADV_FREE: + case MADV_COLD: #ifdef CONFIG_KSM case MADV_MERGEABLE: case MADV_UNMERGEABLE: diff --git a/mm/oom_kill.c b/mm/oom_kill.c index a2a5edbf61789..493028ad865f1 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -522,7 +522,7 @@ bool __oom_reap_task_mm(struct mm_struct *mm) set_bit(MMF_UNSTABLE, &mm->flags); for (vma = mm->mmap ; vma; vma = vma->vm_next) { - if (!can_madv_dontneed_vma(vma)) + if (!can_madv_lru_vma(vma)) continue; /* diff --git a/mm/swap.c b/mm/swap.c index 0226c53465604..9c0c5d6286faa 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -47,6 +47,7 @@ int page_cluster; static 
DEFINE_PER_CPU(struct pagevec, lru_add_pvec); static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs); static DEFINE_PER_CPU(struct pagevec, lru_deactivate_file_pvecs); +static DEFINE_PER_CPU(struct pagevec, lru_deactivate_pvecs); static DEFINE_PER_CPU(struct pagevec, lru_lazyfree_pvecs); #ifdef CONFIG_SMP static DEFINE_PER_CPU(struct pagevec, activate_page_pvecs); @@ -538,6 +539,22 @@ static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec, update_page_reclaim_stat(lruvec, file, 0); } +static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec, + void *arg) +{ + if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) { + int file = page_is_file_cache(page); + int lru = page_lru_base_type(page); + + del_page_from_lru_list(page, lruvec, lru + LRU_ACTIVE); + ClearPageActive(page); + ClearPageReferenced(page); + add_page_to_lru_list(page, lruvec, lru); + + __count_vm_events(PGDEACTIVATE, hpage_nr_pages(page)); + update_page_reclaim_stat(lruvec, file, 0); + } +} static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec, void *arg) @@ -590,6 +607,10 @@ void lru_add_drain_cpu(int cpu) if (pagevec_count(pvec)) pagevec_lru_move_fn(pvec, lru_deactivate_file_fn, NULL); + pvec = &per_cpu(lru_deactivate_pvecs, cpu); + if (pagevec_count(pvec)) + pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL); + pvec = &per_cpu(lru_lazyfree_pvecs, cpu); if (pagevec_count(pvec)) pagevec_lru_move_fn(pvec, lru_lazyfree_fn, NULL); @@ -623,6 +644,26 @@ void deactivate_file_page(struct page *page) } } +/* + * deactivate_page - deactivate a page + * @page: page to deactivate + * + * deactivate_page() moves @page to the inactive list if @page was on the active + * list and was not an unevictable page. This is done to accelerate the reclaim + * of @page. 
+ */ +void deactivate_page(struct page *page) +{ + if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) { + struct pagevec *pvec = &get_cpu_var(lru_deactivate_pvecs); + + get_page(page); + if (!pagevec_add(pvec, page) || PageCompound(page)) + pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL); + put_cpu_var(lru_deactivate_pvecs); + } +} + /** * mark_page_lazyfree - make an anon page lazyfree * @page: page to deactivate @@ -687,6 +728,7 @@ void lru_add_drain_all(void) if (pagevec_count(&per_cpu(lru_add_pvec, cpu)) || pagevec_count(&per_cpu(lru_rotate_pvecs, cpu)) || pagevec_count(&per_cpu(lru_deactivate_file_pvecs, cpu)) || + pagevec_count(&per_cpu(lru_deactivate_pvecs, cpu)) || pagevec_count(&per_cpu(lru_lazyfree_pvecs, cpu)) || need_activate_page_drain(cpu)) { INIT_WORK(work, lru_add_drain_per_cpu);

From patchwork Fri Jul 26 02:34:32 2019
X-Patchwork-Submitter: Minchan Kim
X-Patchwork-Id: 11060175
From: Minchan Kim
To: Andrew Morton
Cc: linux-mm, LKML, linux-api@vger.kernel.org, Michal Hocko, Johannes Weiner,
 Tim Murray, Joel Fernandes, Suren Baghdasaryan, Daniel Colascione,
 Shakeel Butt, Sonny Rao, oleksandr@redhat.com, hdanton@sina.com,
 lizeb@google.com, Dave Hansen, "Kirill A. Shutemov", Minchan Kim
Subject: [PATCH v7 2/5] mm: change PAGEREF_RECLAIM_CLEAN with PAGE_REFRECLAIM
Date: Fri, 26 Jul 2019 11:34:32 +0900
Message-Id: <20190726023435.214162-3-minchan@kernel.org>
In-Reply-To: <20190726023435.214162-1-minchan@kernel.org>
References: <20190726023435.214162-1-minchan@kernel.org>

The local variable "references" in shrink_page_list() defaults to
PAGEREF_RECLAIM_CLEAN. That default exists to avoid reclaiming dirty pages
when CMA tries to migrate pages. Strictly speaking, we don't need it,
because CMA already forbids writeback via .may_writepage = 0 in
reclaim_clean_pages_from_list(). Moreover, it has the problem of
preventing swap-out of anonymous pages even when force_reclaim = true in
shrink_page_list(), which matters for the upcoming patch.

So this patch changes the default value of "references" to PAGEREF_RECLAIM
and renames force_reclaim to ignore_references to make the intent clearer.

This is preparatory work for the next patch.
* RFCv1 * use ignore_references as parameter name - hannes Acked-by: Michal Hocko Acked-by: Johannes Weiner Signed-off-by: Minchan Kim --- mm/vmscan.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index 82e1e229eef21..436577236dd3e 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1124,7 +1124,7 @@ static unsigned long shrink_page_list(struct list_head *page_list, struct scan_control *sc, enum ttu_flags ttu_flags, struct reclaim_stat *stat, - bool force_reclaim) + bool ignore_references) { LIST_HEAD(ret_pages); LIST_HEAD(free_pages); @@ -1138,7 +1138,7 @@ static unsigned long shrink_page_list(struct list_head *page_list, struct address_space *mapping; struct page *page; int may_enter_fs; - enum page_references references = PAGEREF_RECLAIM_CLEAN; + enum page_references references = PAGEREF_RECLAIM; bool dirty, writeback; unsigned int nr_pages; @@ -1269,7 +1269,7 @@ static unsigned long shrink_page_list(struct list_head *page_list, } } - if (!force_reclaim) + if (!ignore_references) references = page_check_references(page, sc); switch (references) {

From patchwork Fri Jul 26 02:34:33 2019
X-Patchwork-Submitter: Minchan Kim
X-Patchwork-Id: 11060177
From: Minchan Kim
To: Andrew Morton
Cc: linux-mm, LKML, linux-api@vger.kernel.org, Michal Hocko, Johannes Weiner,
 Tim Murray, Joel Fernandes, Suren Baghdasaryan, Daniel Colascione,
 Shakeel Butt, Sonny Rao, oleksandr@redhat.com, hdanton@sina.com,
 lizeb@google.com, Dave Hansen, "Kirill A. Shutemov", Minchan Kim
Subject: [PATCH v7 3/5] mm: account nr_isolated_xxx in [isolate|putback]_lru_page
Date: Fri, 26 Jul 2019 11:34:33 +0900
Message-Id: <20190726023435.214162-4-minchan@kernel.org>
In-Reply-To: <20190726023435.214162-1-minchan@kernel.org>
References: <20190726023435.214162-1-minchan@kernel.org>

The isolated-page counting uses per-CPU counters, so there would not be a
huge gain from batching the updates. Rather than complicating the code to
batch them, let's make it more straightforward by adding the counting
logic into the [isolate|putback]_lru_page API.
* v1 * fix accounting bug - Hillf Link: http://lkml.kernel.org/r/20190531165927.GA20067@cmpxchg.org Suggested-by: Johannes Weiner Acked-by: Johannes Weiner Acked-by: Michal Hocko Signed-off-by: Minchan Kim --- mm/compaction.c | 2 -- mm/gup.c | 7 +------ mm/khugepaged.c | 3 --- mm/memory-failure.c | 3 --- mm/memory_hotplug.c | 4 ---- mm/mempolicy.c | 3 --- mm/migrate.c | 37 ++++++++----------------------------- mm/vmscan.c | 22 ++++++++++++++++------ 8 files changed, 25 insertions(+), 56 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index d99d59412c755..ac4ead029b4a1 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -984,8 +984,6 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, /* Successfully isolated */ del_page_from_lru_list(page, lruvec, page_lru(page)); - inc_node_page_state(page, - NR_ISOLATED_ANON + page_is_file_cache(page)); isolate_success: list_add(&page->lru, &cc->migratepages); diff --git a/mm/gup.c b/mm/gup.c index 012060efddf18..357cfc1ca37d1 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1460,13 +1460,8 @@ static long check_and_migrate_cma_pages(struct task_struct *tsk, drain_allow = false; } - if (!isolate_lru_page(head)) { + if (!isolate_lru_page(head)) list_add_tail(&head->lru, &cma_page_list); - mod_node_page_state(page_pgdat(head), - NR_ISOLATED_ANON + - page_is_file_cache(head), - hpage_nr_pages(head)); - } } } diff --git a/mm/khugepaged.c b/mm/khugepaged.c index eaaa21b232156..a8b517d6df4ab 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -503,7 +503,6 @@ void __khugepaged_exit(struct mm_struct *mm) static void release_pte_page(struct page *page) { - dec_node_page_state(page, NR_ISOLATED_ANON + page_is_file_cache(page)); unlock_page(page); putback_lru_page(page); } @@ -602,8 +601,6 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, result = SCAN_DEL_PAGE_LRU; goto out; } - inc_node_page_state(page, - NR_ISOLATED_ANON + page_is_file_cache(page)); VM_BUG_ON_PAGE(!PageLocked(page), page); VM_BUG_ON_PAGE(PageLRU(page), page); diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 7ef849da8278c..9900bb95d7740 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1791,9 +1791,6 @@ static int __soft_offline_page(struct page *page, int flags) * so use !__PageMovable instead for LRU page's mapping * cannot have PAGE_MAPPING_MOVABLE. 
*/ - if (!__PageMovable(page)) - inc_node_page_state(page, NR_ISOLATED_ANON + - page_is_file_cache(page)); list_add(&page->lru, &pagelist); ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL, MIGRATE_SYNC, MR_MEMORY_FAILURE); diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 5b8811945bbba..9a82e12bd0e73 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1373,10 +1373,6 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) ret = isolate_movable_page(page, ISOLATE_UNEVICTABLE); if (!ret) { /* Success */ list_add_tail(&page->lru, &source); - if (!__PageMovable(page)) - inc_node_page_state(page, NR_ISOLATED_ANON + - page_is_file_cache(page)); - } else { pr_warn("failed to isolate pfn %lx\n", pfn); dump_page(page, "isolation failed"); diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 547cd403ed020..e8bbec6148dfe 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -977,9 +977,6 @@ static int migrate_page_add(struct page *page, struct list_head *pagelist, if ((flags & MPOL_MF_MOVE_ALL) || page_mapcount(head) == 1) { if (!isolate_lru_page(head)) { list_add_tail(&head->lru, pagelist); - mod_node_page_state(page_pgdat(head), - NR_ISOLATED_ANON + page_is_file_cache(head), - hpage_nr_pages(head)); } else if (flags & MPOL_MF_STRICT) { /* * Non-movable page may reach here. And, there may be diff --git a/mm/migrate.c b/mm/migrate.c index 92d346646ed55..84b89d2d69065 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -190,8 +190,6 @@ void putback_movable_pages(struct list_head *l) unlock_page(page); put_page(page); } else { - mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON + - page_is_file_cache(page), -hpage_nr_pages(page)); putback_lru_page(page); } } @@ -1177,10 +1175,17 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page, return -ENOMEM; if (page_count(page) == 1) { + bool is_lru = !__PageMovable(page); + /* page was freed from under us. So we are done. */ ClearPageActive(page); ClearPageUnevictable(page); - if (unlikely(__PageMovable(page))) { + if (likely(is_lru)) + mod_node_page_state(page_pgdat(page), + NR_ISOLATED_ANON + + page_is_file_cache(page), + -hpage_nr_pages(page)); + else { lock_page(page); if (!PageMovable(page)) __ClearPageIsolated(page); @@ -1206,15 +1211,6 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page, * restored. */ list_del(&page->lru); - - /* - * Compaction can migrate also non-LRU pages which are - * not accounted to NR_ISOLATED_*. 
They can be recognized - * as __PageMovable - */ - if (likely(!__PageMovable(page))) - mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON + - page_is_file_cache(page), -hpage_nr_pages(page)); } /* @@ -1568,9 +1564,6 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr, err = 0; list_add_tail(&head->lru, pagelist); - mod_node_page_state(page_pgdat(head), - NR_ISOLATED_ANON + page_is_file_cache(head), - hpage_nr_pages(head)); } out_putpage: /* @@ -1886,8 +1879,6 @@ static struct page *alloc_misplaced_dst_page(struct page *page, static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page) { - int page_lru; - VM_BUG_ON_PAGE(compound_order(page) && !PageTransHuge(page), page); /* Avoid migrating to a node that is nearly full */ @@ -1909,10 +1900,6 @@ static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page) return 0; } - page_lru = page_is_file_cache(page); - mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON + page_lru, - hpage_nr_pages(page)); - /* * Isolating the page has taken another reference, so the * caller's reference can be safely dropped without the page @@ -1967,8 +1954,6 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma, if (nr_remaining) { if (!list_empty(&migratepages)) { list_del(&page->lru); - dec_node_page_state(page, NR_ISOLATED_ANON + - page_is_file_cache(page)); putback_lru_page(page); } isolated = 0; @@ -1998,7 +1983,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm, pg_data_t *pgdat = NODE_DATA(node); int isolated = 0; struct page *new_page = NULL; - int page_lru = page_is_file_cache(page); unsigned long start = address & HPAGE_PMD_MASK; new_page = alloc_pages_node(node, @@ -2044,8 +2028,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm, /* Retake the callers reference and putback on LRU */ get_page(page); putback_lru_page(page); - mod_node_page_state(page_pgdat(page), - NR_ISOLATED_ANON + page_lru, -HPAGE_PMD_NR); goto out_unlock; } @@ -2095,9 +2077,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm, count_vm_events(PGMIGRATE_SUCCESS, HPAGE_PMD_NR); count_vm_numa_events(NUMA_PAGE_MIGRATE, HPAGE_PMD_NR); - mod_node_page_state(page_pgdat(page), - NR_ISOLATED_ANON + page_lru, - -HPAGE_PMD_NR); return isolated; out_fail: diff --git a/mm/vmscan.c b/mm/vmscan.c index 436577236dd3e..d1d7163c281de 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1021,6 +1021,9 @@ int remove_mapping(struct address_space *mapping, struct page *page) void putback_lru_page(struct page *page) { lru_cache_add(page); + mod_node_page_state(page_pgdat(page), + NR_ISOLATED_ANON + page_is_file_cache(page), + -hpage_nr_pages(page)); put_page(page); /* drop ref from isolate */ } @@ -1486,6 +1489,9 @@ static unsigned long shrink_page_list(struct list_head *page_list, */ nr_reclaimed += nr_pages; + mod_node_page_state(pgdat, NR_ISOLATED_ANON + + page_is_file_cache(page), + -nr_pages); /* * Is there need to periodically free_page_list? 
It would * appear not as the counts should be low @@ -1561,7 +1567,6 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone, ret = shrink_page_list(&clean_pages, zone->zone_pgdat, &sc, TTU_IGNORE_ACCESS, &dummy_stat, true); list_splice(&clean_pages, page_list); - mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -ret); return ret; } @@ -1637,6 +1642,9 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode) */ ClearPageLRU(page); ret = 0; + __mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON + + page_is_file_cache(page), + hpage_nr_pages(page)); } return ret; @@ -1768,6 +1776,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan, trace_mm_vmscan_lru_isolate(sc->reclaim_idx, sc->order, nr_to_scan, total_scan, skipped, nr_taken, mode, lru); update_lru_sizes(lruvec, lru, nr_zone_taken); + return nr_taken; } @@ -1816,6 +1825,9 @@ int isolate_lru_page(struct page *page) ClearPageLRU(page); del_page_from_lru_list(page, lruvec, lru); ret = 0; + mod_node_page_state(pgdat, NR_ISOLATED_ANON + + page_is_file_cache(page), + hpage_nr_pages(page)); } spin_unlock_irq(&pgdat->lru_lock); } @@ -1907,6 +1919,9 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec, update_lru_size(lruvec, lru, page_zonenum(page), nr_pages); list_move(&page->lru, &lruvec->lists[lru]); + __mod_node_page_state(pgdat, NR_ISOLATED_ANON + + page_is_file_cache(page), + -hpage_nr_pages(page)); if (put_page_testzero(page)) { __ClearPageLRU(page); __ClearPageActive(page); @@ -1984,7 +1999,6 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec, nr_taken = isolate_lru_pages(nr_to_scan, lruvec, &page_list, &nr_scanned, sc, lru); - __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, nr_taken); reclaim_stat->recent_scanned[file] += nr_taken; item = current_is_kswapd() ? 
PGSCAN_KSWAPD : PGSCAN_DIRECT; @@ -2010,8 +2024,6 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec, move_pages_to_lru(lruvec, &page_list); - __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken); - spin_unlock_irq(&pgdat->lru_lock); mem_cgroup_uncharge_list(&page_list); @@ -2070,7 +2082,6 @@ static void shrink_active_list(unsigned long nr_to_scan, nr_taken = isolate_lru_pages(nr_to_scan, lruvec, &l_hold, &nr_scanned, sc, lru); - __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, nr_taken); reclaim_stat->recent_scanned[file] += nr_taken; __count_vm_events(PGREFILL, nr_scanned); @@ -2139,7 +2150,6 @@ static void shrink_active_list(unsigned long nr_to_scan, __count_vm_events(PGDEACTIVATE, nr_deactivate); __count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE, nr_deactivate); - __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken); spin_unlock_irq(&pgdat->lru_lock); mem_cgroup_uncharge_list(&l_active);

From patchwork Fri Jul 26 02:34:34 2019
X-Patchwork-Submitter: Minchan Kim
X-Patchwork-Id: 11060179
From: Minchan Kim
To: Andrew Morton
Cc: linux-mm, LKML, linux-api@vger.kernel.org, Michal Hocko, Johannes Weiner,
 Tim Murray, Joel Fernandes, Suren Baghdasaryan, Daniel Colascione,
 Shakeel Butt, Sonny Rao, oleksandr@redhat.com, hdanton@sina.com,
 lizeb@google.com, Dave Hansen, "Kirill A. Shutemov", Minchan Kim,
 linux-arch@vger.kernel.org, "James E.J. Bottomley", Richard Henderson,
 Ralf Baechle, Chris Zankel, kbuild test robot
Subject: [PATCH v7 4/5] mm: introduce MADV_PAGEOUT
Date: Fri, 26 Jul 2019 11:34:34 +0900
Message-Id: <20190726023435.214162-5-minchan@kernel.org>
In-Reply-To: <20190726023435.214162-1-minchan@kernel.org>
References: <20190726023435.214162-1-minchan@kernel.org>

When a process expects no accesses to a certain memory range for a long
time, it can hint the kernel that the pages can be reclaimed instantly,
but that the data should be preserved for future use. This can reduce
working-set eviction and so ends up increasing performance.

This patch introduces the new MADV_PAGEOUT hint to the madvise(2) syscall.
MADV_PAGEOUT can be used by a process to mark a memory range as not
expected to be used for a long time, so that the kernel reclaims *any LRU*
pages in the range instantly. The hint helps the kernel decide which pages
to evict proactively.

A note: it intentionally does not apply the SWAP_CLUSTER_MAX LRU page
isolation limit, because the work is automatically bounded by the PMD
size. If the PMD size (e.g., 256 pages) causes trouble, we can fix it
later by limiting it to SWAP_CLUSTER_MAX[1].
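For illustration only (not part of this patch, and separate from the
man-page material below): a minimal userspace sketch of the intended
usage, assuming the MADV_PAGEOUT value proposed by this series (21) is
available from installed headers or defined locally.

	#include <stdio.h>
	#include <string.h>
	#include <sys/mman.h>

	#ifndef MADV_PAGEOUT
	#define MADV_PAGEOUT 21	/* value proposed by this series */
	#endif

	/*
	 * Ask the kernel to reclaim a range right away; a later access
	 * faults the data back in (a major fault), but nothing is lost.
	 */
	static int pageout_range(void *addr, size_t len)
	{
		if (madvise(addr, len, MADV_PAGEOUT)) {
			perror("madvise(MADV_PAGEOUT)");
			return -1;
		}
		return 0;
	}

	int main(void)
	{
		size_t len = 16UL << 20;	/* 16MB anonymous buffer */
		char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
				 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (buf == MAP_FAILED)
			return 1;

		memset(buf, 1, len);	/* fault the pages in */
		return pageout_range(buf, len) ? 1 : 0;
	}

Unlike MADV_COLD, this triggers reclaim immediately; note that anonymous
pages can generally only be paged out this way if swap is available.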
- man-page material

MADV_PAGEOUT (since Linux x.x)
	Do not expect access in the near future, so the pages in the
	specified ranges can be reclaimed immediately regardless of memory
	pressure.  Accesses to the range after a successful operation may
	therefore incur major page faults, but the up-to-date contents are
	never lost, unlike with MADV_DONTNEED.  Pages belonging to a shared
	mapping are only processed if a write access is allowed for the
	calling process.  MADV_PAGEOUT cannot be applied to locked pages,
	Huge TLB pages, or VM_PFNMAP pages.

* v6
 * Fix build error reported by the kbuild bot
 * https://lore.kernel.org/linux-mm/201907251759.zSy10dLW%25lkp@intel.com/
* v4
 * clear young bit regardless of success of page isolation - hannes
* v3
 * man page material modification - mhocko
 * remove using SWAP_CLUSTER_MAX - mhocko
* v2
 * add comment about SWAP_CLUSTER_MAX - mhocko
 * add permission check to prevent sidechannel attack - mhocko
 * add man page stuff - dave
* v1
 * change pte to old and rely on the other's reference - hannes
 * remove page_mapcount to check shared page - mhocko
* RFC v2
 * make reclaim_pages simple via factoring out isolate logic - hannes
* RFC v1
 * rename from MADV_COLD to MADV_PAGEOUT - hannes
 * bail out if process is being killed - Hillf
 * fix reclaim_pages bugs - Hillf

[1] https://lore.kernel.org/lkml/20190710194719.GS29695@dhcp22.suse.cz/

Cc: linux-arch@vger.kernel.org
Cc: James E.J. Bottomley
Cc: Richard Henderson
Cc: Ralf Baechle
Cc: Chris Zankel
Reported-by: kbuild test robot
Acked-by: Michal Hocko
Signed-off-by: Minchan Kim
---
 arch/alpha/include/uapi/asm/mman.h     |   1 +
 arch/mips/include/uapi/asm/mman.h      |   1 +
 arch/parisc/include/uapi/asm/mman.h    |   1 +
 arch/xtensa/include/uapi/asm/mman.h    |   1 +
 include/linux/swap.h                   |   1 +
 include/uapi/asm-generic/mman-common.h |   1 +
 mm/madvise.c                           | 195 +++++++++++++++++++++++++
 mm/vmscan.c                            |  55 +++++++
 8 files changed, 256 insertions(+)

diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
index f3258fbf03d03..a18ec7f638880 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -69,6 +69,7 @@
 #define MADV_KEEPONFORK 19 /* Undo MADV_WIPEONFORK */
 
 #define MADV_COLD 20 /* deactivate these pages */
+#define MADV_PAGEOUT 21 /* reclaim these pages */
 
 /* compatibility flags */
 #define MAP_FILE 0
diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h
index 00ad09fc5eb16..57dc2ac4f8bda 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -96,6 +96,7 @@
 #define MADV_KEEPONFORK 19 /* Undo MADV_WIPEONFORK */
 
 #define MADV_COLD 20 /* deactivate these pages */
+#define MADV_PAGEOUT 21 /* reclaim these pages */
 
 /* compatibility flags */
 #define MAP_FILE 0
diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
index eb14e3a7b8f37..6fd8871e4081e 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -49,6 +49,7 @@
 #define MADV_DOFORK 11 /* do inherit across fork */
 
 #define MADV_COLD 20 /* deactivate these pages */
+#define MADV_PAGEOUT 21 /* reclaim these pages */
 
 #define MADV_MERGEABLE 65 /* KSM may merge identical pages */
 #define MADV_UNMERGEABLE 66 /* KSM may not merge identical pages */
diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h
index f926b00ff11f9..e5e6437529475 100644
--- a/arch/xtensa/include/uapi/asm/mman.h
+++ b/arch/xtensa/include/uapi/asm/mman.h
@@ -104,6 +104,7 @@
 #define MADV_KEEPONFORK 19 /* Undo MADV_WIPEONFORK */
 
 #define MADV_COLD
20 /* deactivate these pages */ +#define MADV_PAGEOUT 21 /* reclaim these pages */ /* compatibility flags */ #define MAP_FILE 0 diff --git a/include/linux/swap.h b/include/linux/swap.h index 0ce997edb8bbc..063c0c1e112bd 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -365,6 +365,7 @@ extern int vm_swappiness; extern int remove_mapping(struct address_space *mapping, struct page *page); extern unsigned long vm_total_pages; +extern unsigned long reclaim_pages(struct list_head *page_list); #ifdef CONFIG_NUMA extern int node_reclaim_mode; extern int sysctl_min_unmapped_ratio; diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h index 23431faf0eb6e..c160a5354eb62 100644 --- a/include/uapi/asm-generic/mman-common.h +++ b/include/uapi/asm-generic/mman-common.h @@ -68,6 +68,7 @@ #define MADV_KEEPONFORK 19 /* Undo MADV_WIPEONFORK */ #define MADV_COLD 20 /* deactivate these pages */ +#define MADV_PAGEOUT 21 /* reclaim these pages */ /* compatibility flags */ #define MAP_FILE 0 diff --git a/mm/madvise.c b/mm/madvise.c index e724bce09d7ca..78aa6802b95ad 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -42,6 +42,7 @@ static int madvise_need_mmap_write(int behavior) case MADV_WILLNEED: case MADV_DONTNEED: case MADV_COLD: + case MADV_PAGEOUT: case MADV_FREE: return 0; default: @@ -481,6 +482,197 @@ static long madvise_cold(struct vm_area_struct *vma, return 0; } +static int madvise_pageout_pte_range(pmd_t *pmd, unsigned long addr, + unsigned long end, struct mm_walk *walk) +{ + struct mmu_gather *tlb = walk->private; + struct mm_struct *mm = tlb->mm; + struct vm_area_struct *vma = walk->vma; + pte_t *orig_pte, *pte, ptent; + spinlock_t *ptl; + LIST_HEAD(page_list); + struct page *page; + + if (fatal_signal_pending(current)) + return -EINTR; + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + if (pmd_trans_huge(*pmd)) { + pmd_t orig_pmd; + unsigned long next = pmd_addr_end(addr, end); + + tlb_change_page_size(tlb, HPAGE_PMD_SIZE); + ptl = pmd_trans_huge_lock(pmd, vma); + if (!ptl) + return 0; + + orig_pmd = *pmd; + if (is_huge_zero_pmd(orig_pmd)) + goto huge_unlock; + + if (unlikely(!pmd_present(orig_pmd))) { + VM_BUG_ON(thp_migration_supported() && + !is_pmd_migration_entry(orig_pmd)); + goto huge_unlock; + } + + page = pmd_page(orig_pmd); + if (next - addr != HPAGE_PMD_SIZE) { + int err; + + if (page_mapcount(page) != 1) + goto huge_unlock; + get_page(page); + spin_unlock(ptl); + lock_page(page); + err = split_huge_page(page); + unlock_page(page); + put_page(page); + if (!err) + goto regular_page; + return 0; + } + + if (pmd_young(orig_pmd)) { + pmdp_invalidate(vma, addr, pmd); + orig_pmd = pmd_mkold(orig_pmd); + + set_pmd_at(mm, addr, pmd, orig_pmd); + tlb_remove_tlb_entry(tlb, pmd, addr); + } + + ClearPageReferenced(page); + test_and_clear_page_young(page); + + if (!isolate_lru_page(page)) + list_add(&page->lru, &page_list); +huge_unlock: + spin_unlock(ptl); + reclaim_pages(&page_list); + return 0; + } + + if (pmd_trans_unstable(pmd)) + return 0; +regular_page: +#endif + tlb_change_page_size(tlb, PAGE_SIZE); + orig_pte = pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); + flush_tlb_batched_pending(mm); + arch_enter_lazy_mmu_mode(); + for (; addr < end; pte++, addr += PAGE_SIZE) { + ptent = *pte; + if (!pte_present(ptent)) + continue; + + page = vm_normal_page(vma, addr, ptent); + if (!page) + continue; + + /* + * creating a THP page is expensive so split it only if we + * are sure it's worth. Split it if we are only owner. 
+ */ + if (PageTransCompound(page)) { + if (page_mapcount(page) != 1) + break; + get_page(page); + if (!trylock_page(page)) { + put_page(page); + break; + } + pte_unmap_unlock(orig_pte, ptl); + if (split_huge_page(page)) { + unlock_page(page); + put_page(page); + pte_offset_map_lock(mm, pmd, addr, &ptl); + break; + } + unlock_page(page); + put_page(page); + pte = pte_offset_map_lock(mm, pmd, addr, &ptl); + pte--; + addr -= PAGE_SIZE; + continue; + } + + VM_BUG_ON_PAGE(PageTransCompound(page), page); + + if (pte_young(ptent)) { + ptent = ptep_get_and_clear_full(mm, addr, pte, + tlb->fullmm); + ptent = pte_mkold(ptent); + set_pte_at(mm, addr, pte, ptent); + tlb_remove_tlb_entry(tlb, pte, addr); + } + ClearPageReferenced(page); + test_and_clear_page_young(page); + + if (!isolate_lru_page(page)) + list_add(&page->lru, &page_list); + } + + arch_leave_lazy_mmu_mode(); + pte_unmap_unlock(orig_pte, ptl); + reclaim_pages(&page_list); + cond_resched(); + + return 0; +} + +static void madvise_pageout_page_range(struct mmu_gather *tlb, + struct vm_area_struct *vma, + unsigned long addr, unsigned long end) +{ + struct mm_walk pageout_walk = { + .pmd_entry = madvise_pageout_pte_range, + .mm = vma->vm_mm, + .private = tlb, + }; + + tlb_start_vma(tlb, vma); + walk_page_range(addr, end, &pageout_walk); + tlb_end_vma(tlb, vma); +} + +static inline bool can_do_pageout(struct vm_area_struct *vma) +{ + if (vma_is_anonymous(vma)) + return true; + if (!vma->vm_file) + return false; + /* + * paging out pagecache only for non-anonymous mappings that correspond + * to the files the calling process could (if tried) open for writing; + * otherwise we'd be including shared non-exclusive mappings, which + * opens a side channel. + */ + return inode_owner_or_capable(file_inode(vma->vm_file)) || + inode_permission(file_inode(vma->vm_file), MAY_WRITE) == 0; +} + +static long madvise_pageout(struct vm_area_struct *vma, + struct vm_area_struct **prev, + unsigned long start_addr, unsigned long end_addr) +{ + struct mm_struct *mm = vma->vm_mm; + struct mmu_gather tlb; + + *prev = vma; + if (!can_madv_lru_vma(vma)) + return -EINVAL; + + if (!can_do_pageout(vma)) + return 0; + + lru_add_drain(); + tlb_gather_mmu(&tlb, mm, start_addr, end_addr); + madvise_pageout_page_range(&tlb, vma, start_addr, end_addr); + tlb_finish_mmu(&tlb, start_addr, end_addr); + + return 0; +} + static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, struct mm_walk *walk) @@ -871,6 +1063,8 @@ madvise_vma(struct vm_area_struct *vma, struct vm_area_struct **prev, return madvise_willneed(vma, prev, start, end); case MADV_COLD: return madvise_cold(vma, prev, start, end); + case MADV_PAGEOUT: + return madvise_pageout(vma, prev, start, end); case MADV_FREE: case MADV_DONTNEED: return madvise_dontneed_free(vma, prev, start, end, behavior); @@ -893,6 +1087,7 @@ madvise_behavior_valid(int behavior) case MADV_DONTNEED: case MADV_FREE: case MADV_COLD: + case MADV_PAGEOUT: #ifdef CONFIG_KSM case MADV_MERGEABLE: case MADV_UNMERGEABLE: diff --git a/mm/vmscan.c b/mm/vmscan.c index d1d7163c281de..47aa2158cfac2 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2158,6 +2158,61 @@ static void shrink_active_list(unsigned long nr_to_scan, nr_deactivate, nr_rotated, sc->priority, file); } +unsigned long reclaim_pages(struct list_head *page_list) +{ + int nid = -1; + unsigned long nr_reclaimed = 0; + LIST_HEAD(node_page_list); + struct reclaim_stat dummy_stat; + struct page *page; + struct scan_control sc = { + .gfp_mask = GFP_KERNEL, + .priority = 
DEF_PRIORITY,
+		.may_writepage = 1,
+		.may_unmap = 1,
+		.may_swap = 1,
+	};
+
+	while (!list_empty(page_list)) {
+		page = lru_to_page(page_list);
+		if (nid == -1) {
+			nid = page_to_nid(page);
+			INIT_LIST_HEAD(&node_page_list);
+		}
+
+		if (nid == page_to_nid(page)) {
+			list_move(&page->lru, &node_page_list);
+			continue;
+		}
+
+		nr_reclaimed += shrink_page_list(&node_page_list,
+						NODE_DATA(nid),
+						&sc, 0,
+						&dummy_stat, false);
+		while (!list_empty(&node_page_list)) {
+			page = lru_to_page(&node_page_list);
+			list_del(&page->lru);
+			putback_lru_page(page);
+		}
+
+		nid = -1;
+	}
+
+	if (!list_empty(&node_page_list)) {
+		nr_reclaimed += shrink_page_list(&node_page_list,
+						NODE_DATA(nid),
+						&sc, 0,
+						&dummy_stat, false);
+		while (!list_empty(&node_page_list)) {
+			page = lru_to_page(&node_page_list);
+			list_del(&page->lru);
+			putback_lru_page(page);
+		}
+	}
+
+	return nr_reclaimed;
+}
+
 /*
  * The inactive anon list should be small enough that the VM never has
  * to do too much work.

From patchwork Fri Jul 26 02:34:35 2019
X-Patchwork-Submitter: Minchan Kim
X-Patchwork-Id: 11060181
From: Minchan Kim
To: Andrew Morton
Cc: linux-mm, LKML, linux-api@vger.kernel.org, Michal Hocko, Johannes Weiner,
 Tim Murray, Joel Fernandes, Suren Baghdasaryan, Daniel Colascione,
 Shakeel Butt, Sonny Rao, oleksandr@redhat.com, hdanton@sina.com,
 lizeb@google.com, Dave Hansen, "Kirill A. Shutemov", Minchan Kim
Subject: [PATCH v7 5/5] mm: factor out common parts between MADV_COLD and MADV_PAGEOUT
Date: Fri, 26 Jul 2019 11:34:35 +0900
Message-Id: <20190726023435.214162-6-minchan@kernel.org>
In-Reply-To: <20190726023435.214162-1-minchan@kernel.org>
References: <20190726023435.214162-1-minchan@kernel.org>

MADV_COLD and MADV_PAGEOUT share a large amount of code. This patch factors
the common parts out into a single page-table walker to avoid the code
duplication: the walk's private data gains a pageout flag that selects
between deactivating the pages and reclaiming them.

Suggested-by: Johannes Weiner
Acked-by: Michal Hocko
Signed-off-by: Minchan Kim
---
 mm/madvise.c | 194 ++++++++++++---------------------------------------
 1 file changed, 46 insertions(+), 148 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 78aa6802b95ad..52f9bddbab19c 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -30,6 +30,11 @@
 
 #include "internal.h"
 
+struct madvise_walk_private {
+	struct mmu_gather *tlb;
+	bool pageout;
+};
+
 /*
  * Any behaviour which results in changes to the vma->vm_flags needs to
  * take mmap_sem for writing.
Others, which simply traverse vmas, need @@ -310,15 +315,22 @@ static long madvise_willneed(struct vm_area_struct *vma, return 0; } -static int madvise_cold_pte_range(pmd_t *pmd, unsigned long addr, - unsigned long end, struct mm_walk *walk) +static int madvise_cold_or_pageout_pte_range(pmd_t *pmd, + unsigned long addr, unsigned long end, + struct mm_walk *walk) { - struct mmu_gather *tlb = walk->private; + struct madvise_walk_private *private = walk->private; + struct mmu_gather *tlb = private->tlb; + bool pageout = private->pageout; struct mm_struct *mm = tlb->mm; struct vm_area_struct *vma = walk->vma; pte_t *orig_pte, *pte, ptent; spinlock_t *ptl; - struct page *page; + struct page *page = NULL; + LIST_HEAD(page_list); + + if (fatal_signal_pending(current)) + return -EINTR; #ifdef CONFIG_TRANSPARENT_HUGEPAGE if (pmd_trans_huge(*pmd)) { @@ -366,10 +378,17 @@ static int madvise_cold_pte_range(pmd_t *pmd, unsigned long addr, tlb_remove_pmd_tlb_entry(tlb, pmd, addr); } + ClearPageReferenced(page); test_and_clear_page_young(page); - deactivate_page(page); + if (pageout) { + if (!isolate_lru_page(page)) + list_add(&page->lru, &page_list); + } else + deactivate_page(page); huge_unlock: spin_unlock(ptl); + if (pageout) + reclaim_pages(&page_list); return 0; } @@ -437,12 +456,19 @@ static int madvise_cold_pte_range(pmd_t *pmd, unsigned long addr, * As a side effect, it makes confuse idle-page tracking * because they will miss recent referenced history. */ + ClearPageReferenced(page); test_and_clear_page_young(page); - deactivate_page(page); + if (pageout) { + if (!isolate_lru_page(page)) + list_add(&page->lru, &page_list); + } else + deactivate_page(page); } arch_leave_lazy_mmu_mode(); pte_unmap_unlock(orig_pte, ptl); + if (pageout) + reclaim_pages(&page_list); cond_resched(); return 0; @@ -452,10 +478,15 @@ static void madvise_cold_page_range(struct mmu_gather *tlb, struct vm_area_struct *vma, unsigned long addr, unsigned long end) { + struct madvise_walk_private walk_private = { + .tlb = tlb, + .pageout = false, + }; + struct mm_walk cold_walk = { - .pmd_entry = madvise_cold_pte_range, + .pmd_entry = madvise_cold_or_pageout_pte_range, .mm = vma->vm_mm, - .private = tlb, + .private = &walk_private, }; tlb_start_vma(tlb, vma); @@ -482,152 +513,19 @@ static long madvise_cold(struct vm_area_struct *vma, return 0; } -static int madvise_pageout_pte_range(pmd_t *pmd, unsigned long addr, - unsigned long end, struct mm_walk *walk) -{ - struct mmu_gather *tlb = walk->private; - struct mm_struct *mm = tlb->mm; - struct vm_area_struct *vma = walk->vma; - pte_t *orig_pte, *pte, ptent; - spinlock_t *ptl; - LIST_HEAD(page_list); - struct page *page; - - if (fatal_signal_pending(current)) - return -EINTR; - -#ifdef CONFIG_TRANSPARENT_HUGEPAGE - if (pmd_trans_huge(*pmd)) { - pmd_t orig_pmd; - unsigned long next = pmd_addr_end(addr, end); - - tlb_change_page_size(tlb, HPAGE_PMD_SIZE); - ptl = pmd_trans_huge_lock(pmd, vma); - if (!ptl) - return 0; - - orig_pmd = *pmd; - if (is_huge_zero_pmd(orig_pmd)) - goto huge_unlock; - - if (unlikely(!pmd_present(orig_pmd))) { - VM_BUG_ON(thp_migration_supported() && - !is_pmd_migration_entry(orig_pmd)); - goto huge_unlock; - } - - page = pmd_page(orig_pmd); - if (next - addr != HPAGE_PMD_SIZE) { - int err; - - if (page_mapcount(page) != 1) - goto huge_unlock; - get_page(page); - spin_unlock(ptl); - lock_page(page); - err = split_huge_page(page); - unlock_page(page); - put_page(page); - if (!err) - goto regular_page; - return 0; - } - - if (pmd_young(orig_pmd)) { - 
pmdp_invalidate(vma, addr, pmd); - orig_pmd = pmd_mkold(orig_pmd); - - set_pmd_at(mm, addr, pmd, orig_pmd); - tlb_remove_tlb_entry(tlb, pmd, addr); - } - - ClearPageReferenced(page); - test_and_clear_page_young(page); - - if (!isolate_lru_page(page)) - list_add(&page->lru, &page_list); -huge_unlock: - spin_unlock(ptl); - reclaim_pages(&page_list); - return 0; - } - - if (pmd_trans_unstable(pmd)) - return 0; -regular_page: -#endif - tlb_change_page_size(tlb, PAGE_SIZE); - orig_pte = pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); - flush_tlb_batched_pending(mm); - arch_enter_lazy_mmu_mode(); - for (; addr < end; pte++, addr += PAGE_SIZE) { - ptent = *pte; - if (!pte_present(ptent)) - continue; - - page = vm_normal_page(vma, addr, ptent); - if (!page) - continue; - - /* - * creating a THP page is expensive so split it only if we - * are sure it's worth. Split it if we are only owner. - */ - if (PageTransCompound(page)) { - if (page_mapcount(page) != 1) - break; - get_page(page); - if (!trylock_page(page)) { - put_page(page); - break; - } - pte_unmap_unlock(orig_pte, ptl); - if (split_huge_page(page)) { - unlock_page(page); - put_page(page); - pte_offset_map_lock(mm, pmd, addr, &ptl); - break; - } - unlock_page(page); - put_page(page); - pte = pte_offset_map_lock(mm, pmd, addr, &ptl); - pte--; - addr -= PAGE_SIZE; - continue; - } - - VM_BUG_ON_PAGE(PageTransCompound(page), page); - - if (pte_young(ptent)) { - ptent = ptep_get_and_clear_full(mm, addr, pte, - tlb->fullmm); - ptent = pte_mkold(ptent); - set_pte_at(mm, addr, pte, ptent); - tlb_remove_tlb_entry(tlb, pte, addr); - } - ClearPageReferenced(page); - test_and_clear_page_young(page); - - if (!isolate_lru_page(page)) - list_add(&page->lru, &page_list); - } - - arch_leave_lazy_mmu_mode(); - pte_unmap_unlock(orig_pte, ptl); - reclaim_pages(&page_list); - cond_resched(); - - return 0; -} - static void madvise_pageout_page_range(struct mmu_gather *tlb, struct vm_area_struct *vma, unsigned long addr, unsigned long end) { + struct madvise_walk_private walk_private = { + .pageout = true, + .tlb = tlb, + }; + struct mm_walk pageout_walk = { - .pmd_entry = madvise_pageout_pte_range, + .pmd_entry = madvise_cold_or_pageout_pte_range, .mm = vma->vm_mm, - .private = tlb, + .private = &walk_private, }; tlb_start_vma(tlb, vma);