From patchwork Thu Mar 21 20:01:53 2019
X-Patchwork-Id: 10864261
From: Keith Busch
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-nvdimm@lists.01.org
Cc: Dave Hansen
Subject: [PATCH 1/5] node: Define and export memory migration path
Date: Thu, 21 Mar 2019 14:01:53 -0600
Message-Id: <20190321200157.29678-2-keith.busch@intel.com>
In-Reply-To: <20190321200157.29678-1-keith.busch@intel.com>
References: <20190321200157.29678-1-keith.busch@intel.com>

Prepare for the kernel to auto-migrate pages to other memory nodes with a user-defined node migration table. A user may create a single target for each NUMA node to enable the kernel to do NUMA page migrations instead of simply reclaiming colder pages. A node with no target is a "terminal node", so reclaim acts normally there.
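For illustration, on a hypothetical two-node machine where node 0 is DRAM and node 1 is a slower memory-only node, the path could be configured and torn down from userspace like this (the node numbers are an assumption for the example, not part of the patch):

    # demote node 0's cold pages to node 1
    echo 1 > /sys/devices/system/node/node0/migration_path
    cat /sys/devices/system/node/node0/migration_path
    1
    # writing a negative value restores terminal-node behavior
    echo -1 > /sys/devices/system/node/node0/migration_path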
The migration target does not fundamentally _need_ to be a single node, but this implementation starts there to limit complexity. If you consider the migration path as a graph, cycles (loops) in the graph are disallowed. This avoids wasting resources by constantly migrating (A->B, B->A, A->B, ...); cycles are never expected to be a useful configuration.

Signed-off-by: Keith Busch
---
 Documentation/ABI/stable/sysfs-devices-node | 11 ++++-
 drivers/base/node.c                         | 73 +++++++++++++++++++++++++++++
 include/linux/node.h                        |  6 +++
 3 files changed, 89 insertions(+), 1 deletion(-)

diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node
index 3e90e1f3bf0a..7439e1845e5d 100644
--- a/Documentation/ABI/stable/sysfs-devices-node
+++ b/Documentation/ABI/stable/sysfs-devices-node
@@ -90,4 +90,13 @@ Date:		December 2009
 Contact:	Lee Schermerhorn
 Description:
 		The node's huge page size control/query attributes.
-		See Documentation/admin-guide/mm/hugetlbpage.rst
\ No newline at end of file
+		See Documentation/admin-guide/mm/hugetlbpage.rst
+
+What:		/sys/devices/system/node/nodeX/migration_path
+Date:		March 2019
+Contact:	Linux Memory Management list
+Description:
+		Defines which node the kernel should attempt to migrate this
+		node's pages to when this node requires memory reclaim. A
+		negative value means this is a terminal node and memory cannot
+		be reclaimed through kernel-managed migration.
diff --git a/drivers/base/node.c b/drivers/base/node.c
index 86d6cd92ce3d..20a90905555f 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -59,6 +59,10 @@ static inline ssize_t node_read_cpulist(struct device *dev,
 static DEVICE_ATTR(cpumap,  S_IRUGO, node_read_cpumask, NULL);
 static DEVICE_ATTR(cpulist, S_IRUGO, node_read_cpulist, NULL);
 
+#define TERMINAL_NODE -1
+static int node_migration[MAX_NUMNODES] = {[0 ... MAX_NUMNODES - 1] = TERMINAL_NODE};
+static DEFINE_SPINLOCK(node_migration_lock);
+
 #define K(x) ((x) << (PAGE_SHIFT - 10))
 static ssize_t node_read_meminfo(struct device *dev,
 			struct device_attribute *attr, char *buf)
@@ -233,6 +237,74 @@ static ssize_t node_read_distance(struct device *dev,
 }
 static DEVICE_ATTR(distance, S_IRUGO, node_read_distance, NULL);
 
+static ssize_t migration_path_show(struct device *dev,
+				   struct device_attribute *attr,
+				   char *buf)
+{
+	return sprintf(buf, "%d\n", node_migration[dev->id]);
+}
+
+static ssize_t migration_path_store(struct device *dev,
+				    struct device_attribute *attr,
+				    const char *buf, size_t count)
+{
+	int i, err, nid = dev->id;
+	nodemask_t visited = NODE_MASK_NONE;
+	long next;
+
+	err = kstrtol(buf, 0, &next);
+	if (err)
+		return -EINVAL;
+
+	if (next < 0) {
+		spin_lock(&node_migration_lock);
+		WRITE_ONCE(node_migration[nid], TERMINAL_NODE);
+		spin_unlock(&node_migration_lock);
+		return count;
+	}
+	if (next >= MAX_NUMNODES || !node_online(next))
+		return -EINVAL;
+
+	/*
+	 * Follow the entire migration path from 'nid' through the point where
+	 * we hit a TERMINAL_NODE.
+	 *
+	 * Don't allow looped migration cycles in the path.
+	 */
+	node_set(nid, visited);
+	spin_lock(&node_migration_lock);
+	for (i = next; node_migration[i] != TERMINAL_NODE;
+	     i = node_migration[i]) {
+		/* Fail if we have visited this node already */
+		if (node_test_and_set(i, visited)) {
+			spin_unlock(&node_migration_lock);
+			return -EINVAL;
+		}
+	}
+	WRITE_ONCE(node_migration[nid], next);
+	spin_unlock(&node_migration_lock);
+
+	return count;
+}
+static DEVICE_ATTR_RW(migration_path);
+
+/**
+ * next_migration_node() - Get the next node in the migration path
+ * @current_node: The starting node to lookup the next node
+ *
+ * @returns: node id for next memory node in the migration path hierarchy
+ *	     from @current_node; -1 if @current_node is terminal or its
+ *	     migration node is not online.
+ */
+int next_migration_node(int current_node)
+{
+	int nid = READ_ONCE(node_migration[current_node]);
+
+	if (nid >= 0 && node_online(nid))
+		return nid;
+	return TERMINAL_NODE;
+}
+
 static struct attribute *node_dev_attrs[] = {
 	&dev_attr_cpumap.attr,
 	&dev_attr_cpulist.attr,
@@ -240,6 +312,7 @@ static struct attribute *node_dev_attrs[] = {
 	&dev_attr_numastat.attr,
 	&dev_attr_distance.attr,
 	&dev_attr_vmstat.attr,
+	&dev_attr_migration_path.attr,
 	NULL
 };
 ATTRIBUTE_GROUPS(node_dev);
diff --git a/include/linux/node.h b/include/linux/node.h
index 257bb3d6d014..af46c7a8b94f 100644
--- a/include/linux/node.h
+++ b/include/linux/node.h
@@ -67,6 +67,7 @@ static inline int register_one_node(int nid)
 	return error;
 }
 
+extern int next_migration_node(int current_node);
 extern void unregister_one_node(int nid);
 extern int register_cpu_under_node(unsigned int cpu, unsigned int nid);
 extern int unregister_cpu_under_node(unsigned int cpu, unsigned int nid);
@@ -115,6 +116,11 @@ static inline void register_hugetlbfs_with_node(node_registration_func_t reg,
 					 node_registration_func_t unreg)
 {
 }
+
+static inline int next_migration_node(int current_node)
+{
+	return -1;
+}
 #endif

 #define to_node(device) container_of(device, struct node, dev)
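To make the intended use concrete, a consumer only ever asks for the next hop and follows the chain until it reaches a terminal node. A minimal sketch of walking a configured path (walk_migration_path is a hypothetical helper; only next_migration_node() comes from this patch):

    /* Print each hop of the demotion chain starting at 'nid'. */
    static void walk_migration_path(int nid)
    {
    	int next;

    	while ((next = next_migration_node(nid)) >= 0) {
    		pr_info("node %d demotes to node %d\n", nid, next);
    		nid = next;
    	}
    }

The loop is guaranteed to terminate because migration_path_store() refuses any assignment that would create a cycle.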
From patchwork Thu Mar 21 20:01:54 2019
X-Patchwork-Id: 10864263
From: Keith Busch
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-nvdimm@lists.01.org
Cc: Dave Hansen
Subject: [PATCH 2/5] mm: Split handling old page for migration
Date: Thu, 21 Mar 2019 14:01:54 -0600
Message-Id: <20190321200157.29678-3-keith.busch@intel.com>
In-Reply-To: <20190321200157.29678-1-keith.busch@intel.com>
References: <20190321200157.29678-1-keith.busch@intel.com>

Refactor unmap_and_move() handling for the new page into a separate function from locking and preparing the old page. There is no functional change here; this just makes it easier to reuse this part of page migration from contexts that have already locked the old page.

Signed-off-by: Keith Busch
---
 mm/migrate.c | 115 +++++++++++++++++++++++++++++++----------------------------
 1 file changed, 61 insertions(+), 54 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index ac6f4939bb59..705b320d4b35 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1000,57 +1000,14 @@ static int move_to_new_page(struct page *newpage, struct page *page,
 	return rc;
 }
 
-static int __unmap_and_move(struct page *page, struct page *newpage,
-				int force, enum migrate_mode mode)
+static int __unmap_and_move_locked(struct page *page, struct page *newpage,
+				enum migrate_mode mode)
 {
 	int rc = -EAGAIN;
 	int page_was_mapped = 0;
 	struct anon_vma *anon_vma = NULL;
 	bool is_lru = !__PageMovable(page);
 
-	if (!trylock_page(page)) {
-		if (!force || mode == MIGRATE_ASYNC)
-			goto out;
-
-		/*
-		 * It's not safe for direct compaction to call lock_page.
-		 * For example, during page readahead pages are added locked
-		 * to the LRU. Later, when the IO completes the pages are
-		 * marked uptodate and unlocked. However, the queueing
-		 * could be merging multiple pages for one bio (e.g.
-		 * mpage_readpages). If an allocation happens for the
-		 * second or third page, the process can end up locking
-		 * the same page twice and deadlocking. Rather than
-		 * trying to be clever about what pages can be locked,
-		 * avoid the use of lock_page for direct compaction
-		 * altogether.
-		 */
-		if (current->flags & PF_MEMALLOC)
-			goto out;
-
-		lock_page(page);
-	}
-
-	if (PageWriteback(page)) {
-		/*
-		 * Only in the case of a full synchronous migration is it
-		 * necessary to wait for PageWriteback. In the async case,
-		 * the retry loop is too short and in the sync-light case,
-		 * the overhead of stalling is too much
-		 */
-		switch (mode) {
-		case MIGRATE_SYNC:
-		case MIGRATE_SYNC_NO_COPY:
-			break;
-		default:
-			rc = -EBUSY;
-			goto out_unlock;
-		}
-		if (!force)
-			goto out_unlock;
-		wait_on_page_writeback(page);
-	}
-
 	/*
 	 * By try_to_unmap(), page->mapcount goes down to 0 here. In this case,
 	 * we cannot notice that anon_vma is freed while we migrates a page.
@@ -1077,11 +1034,11 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 	 * This is much like races on refcount of oldpage: just don't BUG().
 	 */
 	if (unlikely(!trylock_page(newpage)))
-		goto out_unlock;
+		goto out;
 
 	if (unlikely(!is_lru)) {
 		rc = move_to_new_page(newpage, page, mode);
-		goto out_unlock_both;
+		goto out_unlock;
 	}
 
 	/*
@@ -1100,7 +1057,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 		VM_BUG_ON_PAGE(PageAnon(page), page);
 		if (page_has_private(page)) {
 			try_to_free_buffers(page);
-			goto out_unlock_both;
+			goto out_unlock;
 		}
 	} else if (page_mapped(page)) {
 		/* Establish migration ptes */
@@ -1110,22 +1067,19 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 				TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
 		page_was_mapped = 1;
 	}
-
 	if (!page_mapped(page))
 		rc = move_to_new_page(newpage, page, mode);
 
 	if (page_was_mapped)
 		remove_migration_ptes(page,
 			rc == MIGRATEPAGE_SUCCESS ? newpage : page, false);
-
-out_unlock_both:
-	unlock_page(newpage);
 out_unlock:
+	unlock_page(newpage);
 	/* Drop an anon_vma reference if we took one */
+out:
 	if (anon_vma)
 		put_anon_vma(anon_vma);
-	unlock_page(page);
-out:
+
 	/*
 	 * If migration is successful, decrease refcount of the newpage
 	 * which will not free the page because new page owner increased
@@ -1141,7 +1095,60 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 		else
 			putback_lru_page(newpage);
 	}
+	return rc;
+}
+
+static int __unmap_and_move(struct page *page, struct page *newpage,
+				int force, enum migrate_mode mode)
+{
+	int rc = -EAGAIN;
+
+	if (!trylock_page(page)) {
+		if (!force || mode == MIGRATE_ASYNC)
+			goto out;
+
+		/*
+		 * It's not safe for direct compaction to call lock_page.
+		 * For example, during page readahead pages are added locked
+		 * to the LRU. Later, when the IO completes the pages are
+		 * marked uptodate and unlocked. However, the queueing
+		 * could be merging multiple pages for one bio (e.g.
+		 * mpage_readpages). If an allocation happens for the
+		 * second or third page, the process can end up locking
+		 * the same page twice and deadlocking. Rather than
+		 * trying to be clever about what pages can be locked,
+		 * avoid the use of lock_page for direct compaction
+		 * altogether.
+		 */
+		if (current->flags & PF_MEMALLOC)
+			goto out;
+
+		lock_page(page);
+	}
+	if (PageWriteback(page)) {
+		/*
+		 * Only in the case of a full synchronous migration is it
+		 * necessary to wait for PageWriteback. In the async case,
+		 * the retry loop is too short and in the sync-light case,
+		 * the overhead of stalling is too much
+		 */
+		switch (mode) {
+		case MIGRATE_SYNC:
+		case MIGRATE_SYNC_NO_COPY:
+			break;
+		default:
+			rc = -EBUSY;
+			goto out_unlock;
+		}
+		if (!force)
+			goto out_unlock;
+		wait_on_page_writeback(page);
+	}
+	rc = __unmap_and_move_locked(page, newpage, mode);
+out_unlock:
+	unlock_page(page);
+out:
 	return rc;
 }
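The value of the split shows up in callers that already hold the page lock: they can skip the trylock/writeback preamble entirely and reuse only the core move. A sketch under that assumption (demote_page_locked is a hypothetical name; patch 3 in this series does the real equivalent from shrink_page_list(), where the page is already locked):

    /* Caller guarantees 'page' is locked; reuse only the core move. */
    static int demote_page_locked(struct page *page, struct page *newpage)
    {
    	return __unmap_and_move_locked(page, newpage, MIGRATE_ASYNC);
    }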
From patchwork Thu Mar 21 20:01:55 2019
X-Patchwork-Id: 10864265
From: Keith Busch
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-nvdimm@lists.01.org
Cc: Dave Hansen
Subject: [PATCH 3/5] mm: Attempt to migrate page in lieu of discard
Date: Thu, 21 Mar 2019 14:01:55 -0600
Message-Id: <20190321200157.29678-4-keith.busch@intel.com>
In-Reply-To: <20190321200157.29678-1-keith.busch@intel.com>
References: <20190321200157.29678-1-keith.busch@intel.com>

If a memory node has a preferred migration path to demote cold pages, attempt to move those inactive pages to that migration node before reclaiming. This will better utilize available memory, provide a faster tier than swapping or discarding, and allow such pages to be reused immediately without IO to retrieve the data.

Some places we would like to see this used:

 1. Persistent memory being used as a slower, cheaper DRAM replacement
 2. Remote memory-only "expansion" NUMA nodes
 3. Resolving memory imbalances where one NUMA node is seeing more
    allocation activity than another. This helps keep more recent
    allocations closer to the CPUs on the node doing the allocating.

Signed-off-by: Keith Busch
---
 include/linux/migrate.h        |  6 ++++++
 include/trace/events/migrate.h |  3 ++-
 mm/debug.c                     |  1 +
 mm/migrate.c                   | 45 ++++++++++++++++++++++++++++++++++++++++++
 mm/vmscan.c                    | 15 ++++++++++++++
 5 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index e13d9bf2f9a5..a004cb1b2dbb 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -25,6 +25,7 @@ enum migrate_reason {
 	MR_MEMPOLICY_MBIND,
 	MR_NUMA_MISPLACED,
 	MR_CONTIG_RANGE,
+	MR_DEMOTION,
 	MR_TYPES
 };
 
@@ -79,6 +80,7 @@ extern int migrate_huge_page_move_mapping(struct address_space *mapping,
 extern int migrate_page_move_mapping(struct address_space *mapping,
 		struct page *newpage, struct page *page, enum migrate_mode mode,
 		int extra_count);
+extern bool migrate_demote_mapping(struct page *page);
 #else
 
 static inline void putback_movable_pages(struct list_head *l) {}
@@ -105,6 +107,10 @@ static inline int migrate_huge_page_move_mapping(struct address_space *mapping,
 	return -ENOSYS;
 }
 
+static inline bool migrate_demote_mapping(struct page *page)
+{
+	return false;
+}
 #endif /* CONFIG_MIGRATION */
 
 #ifdef CONFIG_COMPACTION
diff --git a/include/trace/events/migrate.h b/include/trace/events/migrate.h
index 705b33d1e395..d25de0cc8714 100644
--- a/include/trace/events/migrate.h
+++ b/include/trace/events/migrate.h
@@ -20,7 +20,8 @@
 	EM( MR_SYSCALL,		"syscall_or_cpuset")		\
 	EM( MR_MEMPOLICY_MBIND,	"mempolicy_mbind")		\
 	EM( MR_NUMA_MISPLACED,	"numa_misplaced")		\
-	EMe(MR_CONTIG_RANGE,	"contig_range")
+	EM( MR_CONTIG_RANGE,	"contig_range")			\
+	EMe(MR_DEMOTION,	"demotion")
 
 /*
  * First define the enums in the above macros to be exported to userspace
diff --git a/mm/debug.c b/mm/debug.c
index c0b31b6c3877..53d499f65199 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -25,6 +25,7 @@ const char *migrate_reason_names[MR_TYPES] = {
 	"mempolicy_mbind",
 	"numa_misplaced",
 	"cma",
+	"demotion",
 };
 
 const struct trace_print_flags pageflag_names[] = {
diff --git a/mm/migrate.c b/mm/migrate.c
index 705b320d4b35..83fad87361bf 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1152,6 +1152,51 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 	return rc;
 }
 
+/**
+ * migrate_demote_mapping() - Migrate this page and its mappings to its
+ *			      demotion node.
+ * @page: An isolated, non-compound page that should move to
+ *	  its current node's migration path.
+ *
+ * @returns: True if migrate demotion was successful, false otherwise
+ */
+bool migrate_demote_mapping(struct page *page)
+{
+	int rc, next_nid = next_migration_node(page_to_nid(page));
+	struct page *newpage;
+
+	/*
+	 * The flags are set to allocate only on the desired node in the
+	 * migration path, and to fail fast if not immediately available. We
+	 * are already in the memory reclaim path, we don't want heroic
+	 * efforts to get a page.
+	 */
+	gfp_t mask = GFP_NOWAIT | __GFP_NOWARN | __GFP_NORETRY |
+		     __GFP_NOMEMALLOC | __GFP_THISNODE;
+
+	VM_BUG_ON_PAGE(PageCompound(page), page);
+	VM_BUG_ON_PAGE(PageLRU(page), page);
+
+	if (next_nid < 0)
+		return false;
+
+	newpage = alloc_pages_node(next_nid, mask, 0);
+	if (!newpage)
+		return false;
+
+	/*
+	 * MIGRATE_ASYNC is the most lightweight and never blocks.
+	 */
+	rc = __unmap_and_move_locked(page, newpage, MIGRATE_ASYNC);
+	if (rc != MIGRATEPAGE_SUCCESS) {
+		__free_pages(newpage, 0);
+		return false;
+	}
+
+	set_page_owner_migrate_reason(newpage, MR_DEMOTION);
+	return true;
+}
+
 /*
  * gcc 4.7 and 4.8 on arm get an ICEs when inlining unmap_and_move(). Work
  * around it.
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a5ad0b35ab8e..0a95804e946a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1261,6 +1261,21 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			; /* try to reclaim the page below */
 		}
 
+		if (!PageCompound(page)) {
+			if (migrate_demote_mapping(page)) {
+				unlock_page(page);
+				if (likely(put_page_testzero(page)))
+					goto free_it;
+
+				/*
+				 * Speculative reference will free this page,
+				 * so leave it off the LRU.
+				 */
+				nr_reclaimed++;
+				continue;
+			}
+		}
+
 		/*
 		 * Anonymous process memory has backing store?
 		 * Try to allocate it some swap space here.
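The allocation mask is the load-bearing detail of the demotion step: __GFP_THISNODE keeps the allocation from falling back to some other node (which would silently bypass the configured path), and GFP_NOWAIT with __GFP_NORETRY/__GFP_NOMEMALLOC keeps reclaim from recursing into itself. Restated as a standalone helper for clarity (alloc_demote_target is a hypothetical name; the mask is exactly the one the patch uses):

    static struct page *alloc_demote_target(int nid)
    {
    	/* Only the chosen node, fail fast, never recurse into reclaim. */
    	gfp_t mask = GFP_NOWAIT | __GFP_NOWARN | __GFP_NORETRY |
    		     __GFP_NOMEMALLOC | __GFP_THISNODE;

    	return alloc_pages_node(nid, mask, 0);
    }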
From patchwork Thu Mar 21 20:01:56 2019
X-Patchwork-Id: 10864267
From: Keith Busch
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-nvdimm@lists.01.org
Cc: Dave Hansen
Subject: [PATCH 4/5] mm: Consider anonymous pages without swap
Date: Thu, 21 Mar 2019 14:01:56 -0600
Message-Id: <20190321200157.29678-5-keith.busch@intel.com>
In-Reply-To: <20190321200157.29678-1-keith.busch@intel.com>
References: <20190321200157.29678-1-keith.busch@intel.com>

Age and reclaim anonymous pages from nodes that have an online migration node even if swap is not enabled.

Signed-off-by: Keith Busch
---
 include/linux/swap.h | 20 ++++++++++++++++++++
 mm/vmscan.c          | 10 +++++-----
 2 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 4bfb5c4ac108..91b405a3b44f 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -680,5 +680,25 @@ static inline bool mem_cgroup_swap_full(struct page *page)
 }
 #endif
 
+static inline bool reclaim_anon_pages(struct mem_cgroup *memcg,
+				      int node_id)
+{
+	/* Always age anon pages when we have swap */
+	if (memcg == NULL) {
+		if (get_nr_swap_pages() > 0)
+			return true;
+	} else {
+		if (mem_cgroup_get_nr_swap_pages(memcg) > 0)
+			return true;
+	}
+
+	/* Also age anon pages if we can auto-migrate them */
+	if (next_migration_node(node_id) >= 0)
+		return true;
+
+	/* No way to reclaim anon pages */
+	return false;
+}
+
 #endif /* __KERNEL__*/
 #endif /* _LINUX_SWAP_H */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 0a95804e946a..226c4c838947 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -327,7 +327,7 @@ unsigned long zone_reclaimable_pages(struct zone *zone)
 	nr = zone_page_state_snapshot(zone, NR_ZONE_INACTIVE_FILE) +
 		zone_page_state_snapshot(zone, NR_ZONE_ACTIVE_FILE);
-	if (get_nr_swap_pages() > 0)
+	if (reclaim_anon_pages(NULL, zone_to_nid(zone)))
 		nr += zone_page_state_snapshot(zone, NR_ZONE_INACTIVE_ANON) +
 			zone_page_state_snapshot(zone, NR_ZONE_ACTIVE_ANON);
 
@@ -2206,7 +2206,7 @@ static bool inactive_list_is_low(struct lruvec *lruvec, bool file,
 	 * If we don't have swap space, anonymous page deactivation
 	 * is pointless.
 	 */
-	if (!file && !total_swap_pages)
+	if (!file && !reclaim_anon_pages(NULL, pgdat->node_id))
 		return false;
 
 	inactive = lruvec_lru_size(lruvec, inactive_lru, sc->reclaim_idx);
@@ -2287,7 +2287,7 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
 	enum lru_list lru;
 
 	/* If we have no swap space, do not bother scanning anon pages. */
-	if (!sc->may_swap || mem_cgroup_get_nr_swap_pages(memcg) <= 0) {
+	if (!sc->may_swap || !reclaim_anon_pages(memcg, pgdat->node_id)) {
 		scan_balance = SCAN_FILE;
 		goto out;
 	}
@@ -2650,7 +2650,7 @@ static inline bool should_continue_reclaim(struct pglist_data *pgdat,
 	 */
 	pages_for_compaction = compact_gap(sc->order);
 	inactive_lru_pages = node_page_state(pgdat, NR_INACTIVE_FILE);
-	if (get_nr_swap_pages() > 0)
+	if (reclaim_anon_pages(NULL, pgdat->node_id))
 		inactive_lru_pages += node_page_state(pgdat, NR_INACTIVE_ANON);
 	if (sc->nr_reclaimed < pages_for_compaction &&
 			inactive_lru_pages > pages_for_compaction)
@@ -3347,7 +3347,7 @@ static void age_active_anon(struct pglist_data *pgdat,
 {
 	struct mem_cgroup *memcg;
 
-	if (!total_swap_pages)
+	if (!reclaim_anon_pages(NULL, pgdat->node_id))
 		return;
 
 	memcg = mem_cgroup_iter(NULL, NULL, NULL);
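One rough way to sanity-check the new behavior on a swapless system (a sketch only; the node numbers and the demotion setup are assumed from patch 1, and the counter names are the usual per-node vmstat fields):

    swapoff -a
    echo 1 > /sys/devices/system/node/node0/migration_path
    # under memory pressure, node 0's anon LRU counters should now move
    grep -E 'inactive_anon|active_anon' /sys/devices/system/node/node0/vmstat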
From patchwork Thu Mar 21 20:01:57 2019
X-Patchwork-Id: 10864269
From: Keith Busch
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-nvdimm@lists.01.org
Cc: Dave Hansen
Subject: [PATCH 5/5] mm/migrate: Add page movement trace event
Date: Thu, 21 Mar 2019 14:01:57 -0600
Message-Id: <20190321200157.29678-6-keith.busch@intel.com>
In-Reply-To: <20190321200157.29678-1-keith.busch@intel.com>
References: <20190321200157.29678-1-keith.busch@intel.com>

Trace the source and destination node of a page migration to help debug memory usage.

Signed-off-by: Keith Busch
---
 include/trace/events/migrate.h | 26 ++++++++++++++++++++++++++
 mm/migrate.c                   |  1 +
 2 files changed, 27 insertions(+)

diff --git a/include/trace/events/migrate.h b/include/trace/events/migrate.h
index d25de0cc8714..3d4b7131e547 100644
--- a/include/trace/events/migrate.h
+++ b/include/trace/events/migrate.h
@@ -6,6 +6,7 @@
 #define _TRACE_MIGRATE_H
 
 #include <linux/tracepoint.h>
+#include <trace/events/mmflags.h>
 
 #define MIGRATE_MODE						\
 	EM( MIGRATE_ASYNC,	"MIGRATE_ASYNC")		\
@@ -71,6 +72,31 @@ TRACE_EVENT(mm_migrate_pages,
 		__print_symbolic(__entry->mode, MIGRATE_MODE),
 		__print_symbolic(__entry->reason, MIGRATE_REASON))
 );
+
+TRACE_EVENT(mm_migrate_move_page,
+
+	TP_PROTO(struct page *from, struct page *to, int status),
+
+	TP_ARGS(from, to, status),
+
+	TP_STRUCT__entry(
+		__field(struct page *, from)
+		__field(struct page *, to)
+		__field(int, status)
+	),
+
+	TP_fast_assign(
+		__entry->from = from;
+		__entry->to = to;
+		__entry->status = status;
+	),
+
+	TP_printk("node from=%d to=%d status=%d flags=%s refs=%d",
+		page_to_nid(__entry->from), page_to_nid(__entry->to),
+		__entry->status,
+		show_page_flags(__entry->from->flags & ((1UL << NR_PAGEFLAGS) - 1)),
+		page_ref_count(__entry->from))
+);
 #endif /* _TRACE_MIGRATE_H */
 
 /* This part must be outside protection */
diff --git a/mm/migrate.c b/mm/migrate.c
index 83fad87361bf..d97433da12c0 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -997,6 +997,7 @@ static int move_to_new_page(struct page *newpage, struct page *page,
 		page->mapping = NULL;
 	}
 out:
+	trace_mm_migrate_move_page(page, newpage, rc);
 	return rc;
 }
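Assuming tracefs is mounted in the usual place, the new event can be observed like any other event in the migrate group (a usage sketch, not part of the patch):

    cd /sys/kernel/debug/tracing
    echo 1 > events/migrate/mm_migrate_move_page/enable
    cat trace_pipe
    # each migration prints from/to node ids, status, page flags, refcount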