From patchwork Tue Dec 21 09:18:03 2021
X-Patchwork-Submitter: Baolin Wang
X-Patchwork-Id: 12689319
From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: sj@kernel.org, akpm@linux-foundation.org
Cc: ying.huang@intel.com, dave.hansen@linux.intel.com, ziy@nvidia.com,
    shy828301@gmail.com, zhongjiang-ali@linux.alibaba.com,
    xlpang@linux.alibaba.com, baolin.wang@linux.alibaba.com,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH 1/2] mm: Export the alloc_demote_page() function
Date: Tue, 21 Dec 2021 17:18:03 +0800
Message-Id: <611250978aa68c1fab6112a795e9c0e5b817d9ee.1640077468.git.baolin.wang@linux.alibaba.com>

Export the alloc_demote_page() function in a header file, as preparation
for supporting page demotion from the DAMON monitor.
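For context, a minimal sketch of how a caller outside mm/vmscan.c could use the
now-exported helper as the allocation callback of migrate_pages(), mirroring
the way the next patch in this series uses it. The demote_isolated_pages()
wrapper below is hypothetical, the error handling is trimmed, and the pages on
the list are assumed to be already isolated from the LRU:

	#include <linux/migrate.h>
	#include <linux/swap.h>
	#include "../internal.h"	/* declares alloc_demote_page(), as in mm/damon/vaddr.c */

	/* Demote already-isolated pages on @demote_pages to @target_nid. */
	static unsigned int demote_isolated_pages(struct list_head *demote_pages,
						  int target_nid)
	{
		unsigned int nr_succeeded = 0;

		if (list_empty(demote_pages) || target_nid == NUMA_NO_NODE)
			return 0;

		/* alloc_demote_page() allocates the destination page on @target_nid. */
		migrate_pages(demote_pages, alloc_demote_page, NULL, target_nid,
			      MIGRATE_SYNC, MR_DEMOTION, &nr_succeeded);

		/* Give any pages that could not be migrated back to the LRU. */
		if (!list_empty(demote_pages))
			putback_movable_pages(demote_pages);

		return nr_succeeded;
	}

Keeping the allocation policy (node-bound, no-reclaim GFP flags) inside
vmscan.c and exporting only the declaration lets other mm/ code reuse the same
demotion allocation behaviour as the reclaim path.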
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
---
 mm/internal.h | 1 +
 mm/vmscan.c   | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/internal.h b/mm/internal.h
index deb9bda..99ea5fb 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -181,6 +181,7 @@ static inline void set_page_refcounted(struct page *page)
 extern int isolate_lru_page(struct page *page);
 extern void putback_lru_page(struct page *page);
 extern void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason);
+extern struct page *alloc_demote_page(struct page *page, unsigned long node);

 /*
  * in mm/rmap.c:
diff --git a/mm/vmscan.c b/mm/vmscan.c
index f3162a5..bf38327 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1458,7 +1458,7 @@ static void page_check_dirty_writeback(struct page *page,
 	mapping->a_ops->is_dirty_writeback(page, dirty, writeback);
 }

-static struct page *alloc_demote_page(struct page *page, unsigned long node)
+struct page *alloc_demote_page(struct page *page, unsigned long node)
 {
 	struct migration_target_control mtc = {
 		/*

From patchwork Tue Dec 21 09:18:04 2021
X-Patchwork-Submitter: Baolin Wang
X-Patchwork-Id: 12689321
From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: sj@kernel.org, akpm@linux-foundation.org
Cc: ying.huang@intel.com, dave.hansen@linux.intel.com, ziy@nvidia.com,
    shy828301@gmail.com, zhongjiang-ali@linux.alibaba.com,
    xlpang@linux.alibaba.com, baolin.wang@linux.alibaba.com,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH 2/2] mm/damon: Add a new scheme to support demotion on tiered memory system
Date: Tue, 21 Dec 2021 17:18:04 +0800
Message-Id: <1c014ce5c6f6c62a30d07096c5e28aa1310c1bbd.1640077468.git.baolin.wang@linux.alibaba.com>
On a tiered memory system, the reclaim path in shrink_page_list() already
supports demoting pages to a slow memory node instead of discarding them.
However, by the time reclaim runs, the fast memory node's watermarks are
already under pressure, which increases memory allocation latency during
page demotion. We can instead rely on DAMON in user space to monitor the
cold memory on the fast memory node, and demote the cold pages to the slow
memory node proactively to keep the fast memory node in a healthy state.
Thus this patch introduces a new scheme named DAMOS_DEMOTE to support this
feature. A sketch of how such a scheme could be installed from kernel code
follows after the diff below.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reported-by: kernel test robot
---
 include/linux/damon.h |   3 +
 mm/damon/dbgfs.c      |   1 +
 mm/damon/vaddr.c      | 156 ++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 160 insertions(+)

diff --git a/include/linux/damon.h b/include/linux/damon.h
index af64838..da9957c 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -87,6 +87,8 @@ struct damon_target {
  * @DAMOS_PAGEOUT:	Call ``madvise()`` for the region with MADV_PAGEOUT.
  * @DAMOS_HUGEPAGE:	Call ``madvise()`` for the region with MADV_HUGEPAGE.
  * @DAMOS_NOHUGEPAGE:	Call ``madvise()`` for the region with MADV_NOHUGEPAGE.
+ * @DAMOS_DEMOTE:	Migrate cold pages from a fast memory type (DRAM) to a
+ *			slow memory type (persistent memory).
  * @DAMOS_STAT:		Do nothing but count the stat.
  */
 enum damos_action {
@@ -95,6 +97,7 @@ enum damos_action {
 	DAMOS_PAGEOUT,
 	DAMOS_HUGEPAGE,
 	DAMOS_NOHUGEPAGE,
+	DAMOS_DEMOTE,
 	DAMOS_STAT,		/* Do nothing but only record the stat */
 };

diff --git a/mm/damon/dbgfs.c b/mm/damon/dbgfs.c
index 58dbb96..43355ab 100644
--- a/mm/damon/dbgfs.c
+++ b/mm/damon/dbgfs.c
@@ -168,6 +168,7 @@ static bool damos_action_valid(int action)
 	case DAMOS_PAGEOUT:
 	case DAMOS_HUGEPAGE:
 	case DAMOS_NOHUGEPAGE:
+	case DAMOS_DEMOTE:
 	case DAMOS_STAT:
 		return true;
 	default:
diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c
index 9e213a1..b354d3e 100644
--- a/mm/damon/vaddr.c
+++ b/mm/damon/vaddr.c
@@ -14,6 +14,10 @@
 #include <...>
 #include <...>
 #include <...>
+#include <...>
+#include <...>
+#include <...>
+#include "../internal.h"

 #include "prmtv-common.h"

@@ -693,6 +697,156 @@ static unsigned long damos_madvise(struct damon_target *target,
 }
 #endif	/* CONFIG_ADVISE_SYSCALLS */

+static bool damos_isolate_page(struct page *page, struct list_head *demote_list)
+{
+	struct page *head = compound_head(page);
+
+	/* Do not interfere with other mappings of this page */
+	if (page_mapcount(head) != 1)
+		return false;
+
+	/* No need to migrate if there is no target demotion node.
+	 */
+	if (next_demotion_node(page_to_nid(head)) == NUMA_NO_NODE)
+		return false;
+
+	if (isolate_lru_page(head))
+		return false;
+
+	list_add_tail(&head->lru, demote_list);
+	mod_node_page_state(page_pgdat(head),
+			    NR_ISOLATED_ANON + page_is_file_lru(head),
+			    thp_nr_pages(head));
+	return true;
+}
+
+static int damos_migrate_pmd_entry(pmd_t *pmd, unsigned long addr,
+				   unsigned long end, struct mm_walk *walk)
+{
+	struct vm_area_struct *vma = walk->vma;
+	struct list_head *demote_list = walk->private;
+	spinlock_t *ptl;
+	struct page *page;
+	pte_t *pte, *mapped_pte;
+
+	if (!vma_migratable(vma))
+		return -EFAULT;
+
+	ptl = pmd_trans_huge_lock(pmd, vma);
+	if (ptl) {
+		/* Bail out if THP migration is not supported. */
+		if (!thp_migration_supported())
+			goto thp_out;
+
+		/* If the THP pte is under migration, do not bother it. */
+		if (unlikely(is_pmd_migration_entry(*pmd)))
+			goto thp_out;
+
+		page = damon_get_page(pmd_pfn(*pmd));
+		if (!page)
+			goto thp_out;
+
+		damos_isolate_page(page, demote_list);
+
+		put_page(page);
+thp_out:
+		spin_unlock(ptl);
+		return 0;
+	}
+
+	/* Regular page handling */
+	if (pmd_none(*pmd) || unlikely(pmd_bad(*pmd)))
+		return -EINVAL;
+
+	mapped_pte = pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
+	for (; addr != end; pte++, addr += PAGE_SIZE) {
+		if (pte_none(*pte) || !pte_present(*pte))
+			continue;
+
+		page = damon_get_page(pte_pfn(*pte));
+		if (!page)
+			continue;
+
+		damos_isolate_page(page, demote_list);
+		put_page(page);
+	}
+	pte_unmap_unlock(mapped_pte, ptl);
+	cond_resched();
+
+	return 0;
+}
+
+static const struct mm_walk_ops damos_migrate_pages_walk_ops = {
+	.pmd_entry = damos_migrate_pmd_entry,
+};
+
+/*
+ * damos_demote() - demote cold pages from fast memory to slow memory
+ * @target:	the given target
+ * @r:		region of the target
+ *
+ * On a tiered memory system, if DAMON monitored cold pages on the fast memory
+ * node (DRAM), we can demote them to the slow memory node proactively, instead
+ * of letting cold memory accumulate on the fast memory node until reclaim has
+ * to deal with it.
+ *
+ * Return:
+ * = 0 means no pages were demoted.
+ * > 0 means how many cold pages were demoted successfully.
+ * < 0 means errors happened.
+ */
+static int damos_demote(struct damon_target *target, struct damon_region *r)
+{
+	struct mm_struct *mm;
+	LIST_HEAD(demote_pages);
+	LIST_HEAD(pagelist);
+	int target_nid, nr_pages, ret = 0;
+	unsigned int nr_succeeded, demoted_pages = 0;
+	struct page *page, *page2;
+
+	/* Only proceed if page demotion is enabled. */
+	if (!numa_demotion_enabled)
+		return -EINVAL;
+
+	mm = damon_get_mm(target);
+	if (!mm)
+		return -ENOMEM;
+
+	mmap_read_lock(mm);
+	walk_page_range(mm, PAGE_ALIGN(r->ar.start), PAGE_ALIGN(r->ar.end),
+			&damos_migrate_pages_walk_ops, &demote_pages);
+	mmap_read_unlock(mm);
+
+	mmput(mm);
+	if (list_empty(&demote_pages))
+		return 0;
+
+	list_for_each_entry_safe(page, page2, &demote_pages, lru) {
+		list_add(&page->lru, &pagelist);
+		target_nid = next_demotion_node(page_to_nid(page));
+		nr_pages = thp_nr_pages(page);
+
+		ret = migrate_pages(&pagelist, alloc_demote_page, NULL,
+				    target_nid, MIGRATE_SYNC, MR_DEMOTION,
+				    &nr_succeeded);
+		if (ret) {
+			if (!list_empty(&pagelist)) {
+				list_del(&page->lru);
+				mod_node_page_state(page_pgdat(page),
+						    NR_ISOLATED_ANON + page_is_file_lru(page),
+						    -nr_pages);
+				putback_lru_page(page);
+			}
+		} else {
+			__count_vm_events(PGDEMOTE_DIRECT, nr_succeeded);
+			demoted_pages += nr_succeeded;
+		}
+
+		INIT_LIST_HEAD(&pagelist);
+		cond_resched();
+	}
+
+	return demoted_pages;
+}
+
 static unsigned long damon_va_apply_scheme(struct damon_ctx *ctx,
 		struct damon_target *t, struct damon_region *r,
 		struct damos *scheme)
@@ -715,6 +869,8 @@ static unsigned long damon_va_apply_scheme(struct damon_ctx *ctx,
 	case DAMOS_NOHUGEPAGE:
 		madv_action = MADV_NOHUGEPAGE;
 		break;
+	case DAMOS_DEMOTE:
+		return damos_demote(t, r);
 	case DAMOS_STAT:
 		return 0;
 	default:
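As referenced in the commit message above, here is a rough sketch of how a
kernel-side user (in the style of mm/damon/reclaim.c) could install a
DAMOS_DEMOTE scheme that targets regions seeing no access for a while. It
assumes the 5.16-era in-kernel DAMON API (damon_new_scheme() taking the access
pattern bounds, the action, a quota and watermarks, plus damon_set_schemes());
the install_demote_scheme() helper and the thresholds are illustrative only,
so the exact signatures should be checked against the tree this applies to:

	#include <linux/damon.h>
	#include <linux/mm.h>

	/*
	 * Hypothetical helper: build and install one DAMOS_DEMOTE scheme on @ctx.
	 * Regions of any size that saw no access for at least @min_age_aggr
	 * aggregation intervals get demoted to the slower memory tier.
	 */
	static int install_demote_scheme(struct damon_ctx *ctx,
					 unsigned int min_age_aggr)
	{
		struct damos_watermarks wmarks = {
			.metric = DAMOS_WMARK_NONE,	/* keep the scheme always active */
		};
		struct damos_quota quota = {};		/* no time/size quota limits */
		struct damos *scheme;

		scheme = damon_new_scheme(
				/* regions of any size, */
				PAGE_SIZE, ULONG_MAX,
				/* that were not accessed at all, */
				0, 0,
				/* for at least @min_age_aggr aggregation intervals, */
				min_age_aggr, UINT_MAX,
				/* get demoted via the action added by this patch */
				DAMOS_DEMOTE,
				&quota, &wmarks);
		if (!scheme)
			return -ENOMEM;

		/* The scheme is owned (and later freed) by the context. */
		return damon_set_schemes(ctx, &scheme, 1);
	}

The same access pattern and action (DAMOS_DEMOTE is value 5 in the enum added
by this patch) can equally be fed to the existing DAMON debugfs interface from
user space.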