From patchwork Fri Jun 19 16:24:08 2020
X-Patchwork-Submitter: Ben Widawsky
X-Patchwork-Id: 11614593
From: Ben Widawsky
To: linux-mm
Cc: Ben Widawsky, Christoph Lameter, Andrew Morton, David Rientjes
Subject: [PATCH 01/18] mm/mempolicy: Add comment for missing LOCAL
Date: Fri, 19 Jun 2020 09:24:08 -0700
Message-Id: <20200619162425.1052382-2-ben.widawsky@intel.com>
In-Reply-To: <20200619162425.1052382-1-ben.widawsky@intel.com>
References: <20200619162425.1052382-1-ben.widawsky@intel.com>

MPOL_LOCAL is a bit weird because it is simply a different name for an
existing behavior (preferred policy with no node mask). It has been
this way since it was added here:

commit 479e2802d09f ("mm: mempolicy: Make MPOL_LOCAL a real policy")

In fact, it is so similar to MPOL_PREFERRED that when the policy is
created in mpol_new(), the mode is set to PREFERRED, and no internal
state representing LOCAL exists.

To prevent future explorers from scratching their heads as to why
MPOL_LOCAL isn't defined in the mpol_ops table, add a small comment
explaining the situation.

Cc: Christoph Lameter
Cc: Andrew Morton
Cc: David Rientjes
Signed-off-by: Ben Widawsky
Acked-by: Michal Hocko
---
 mm/mempolicy.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 381320671677..36ee3267c25f 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -427,6 +427,7 @@ static const struct mempolicy_operations mpol_ops[MPOL_MAX] = {
 		.create = mpol_new_bind,
 		.rebind = mpol_rebind_nodemask,
 	},
+	/* MPOL_LOCAL is converted to MPOL_PREFERRED on policy creation */
 };
 
 static int migrate_page_add(struct page *page, struct list_head *pagelist,

From patchwork Fri Jun 19 16:24:09 2020
X-Patchwork-Submitter: Ben Widawsky
X-Patchwork-Id: 11614643
From: Ben Widawsky
To: linux-mm
Cc: Ben Widawsky, Andrew Morton, Lee Schermerhorn
Subject: [PATCH 02/18] mm/mempolicy: Use node_mem_id() instead of node_id()
Date: Fri, 19 Jun 2020 09:24:09 -0700
Message-Id: <20200619162425.1052382-3-ben.widawsky@intel.com>
In-Reply-To: <20200619162425.1052382-1-ben.widawsky@intel.com>
References: <20200619162425.1052382-1-ben.widawsky@intel.com>

Calling out some distinctions first, as I understand them, along with
the reasoning for the patch:

numa_node_id() - the node id for the currently running CPU.
numa_mem_id() - the node id for the closest memory node.

The case where they are not the same is CONFIG_HAVE_MEMORYLESS_NODES.
Only ia64 and powerpc support this option, so it is perhaps not a very
interesting situation to most.

The question is, when do you want which? numa_node_id() is definitely
what's desired if MPOL_PREFERRED or MPOL_LOCAL is used, since the ABI
states "This mode specifies "local allocation"; the memory is allocated
on the node of the CPU that triggered the allocation (the "local
node")." It would be weird, though not impossible, to set this policy
on a CPU that has memoryless nodes. A more likely way to hit this is
with interleaving. The current interfaces will return some equally
weird thing, but at least it's symmetric.

Therefore, in cases where the node is being queried for the currently
running process, it probably makes sense to use numa_node_id(). For
other cases, however, when the CPU is trying to obtain "local" memory,
numa_mem_id() already captures this and should be used instead.

This really should only affect configurations where
CONFIG_HAVE_MEMORYLESS_NODES=y, and even on those machines it's quite
possible the ultimate behavior would be identical.
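As a rough illustration of the difference (a sketch only, not part of
this patch; the wrapper function and the node numbers are made up):

/*
 * Sketch: on a CONFIG_HAVE_MEMORYLESS_NODES=y machine, assume the
 * running CPU sits on node 2, node 2 has no memory, and its nearest
 * node with memory is node 0.
 */
static int local_node_for(bool want_cpu_node)
{
	int cpu_node = numa_node_id();	/* node of the running CPU -> 2  */
	int mem_node = numa_mem_id();	/* closest node with memory -> 0 */

	/*
	 * MPOL_PREFERRED/MPOL_LOCAL are specified in terms of "the node
	 * of the CPU that triggered the allocation", so they keep using
	 * cpu_node; callers that just need a node they can actually
	 * allocate from (the interleave fallback, huge_node()) are
	 * better served by mem_node.
	 */
	return want_cpu_node ? cpu_node : mem_node;
}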
Cc: Andrew Morton Cc: Lee Schermerhorn Signed-off-by: Ben Widawsky --- mm/mempolicy.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 36ee3267c25f..99e0f3f9c4a6 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -1991,7 +1991,7 @@ static unsigned offset_il_node(struct mempolicy *pol, unsigned long n) int nid; if (!nnodes) - return numa_node_id(); + return numa_mem_id(); target = (unsigned int)n % nnodes; nid = first_node(pol->v.nodes); for (i = 0; i < target; i++) @@ -2049,7 +2049,7 @@ int huge_node(struct vm_area_struct *vma, unsigned long addr, gfp_t gfp_flags, nid = interleave_nid(*mpol, vma, addr, huge_page_shift(hstate_vma(vma))); } else { - nid = policy_node(gfp_flags, *mpol, numa_node_id()); + nid = policy_node(gfp_flags, *mpol, numa_mem_id()); if ((*mpol)->mode == MPOL_BIND) *nodemask = &(*mpol)->v.nodes; } From patchwork Fri Jun 19 16:24:10 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Widawsky X-Patchwork-Id: 11614597 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3789C14B7 for ; Fri, 19 Jun 2020 16:24:42 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 018822168B for ; Fri, 19 Jun 2020 16:24:42 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 018822168B Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id BB0EC8D00D4; Fri, 19 Jun 2020 12:24:33 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id B3D2D8D00D3; Fri, 19 Jun 2020 12:24:33 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8F1238D00D5; Fri, 19 Jun 2020 12:24:33 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0148.hostedemail.com [216.40.44.148]) by kanga.kvack.org (Postfix) with ESMTP id 681C78D00D4 for ; Fri, 19 Jun 2020 12:24:33 -0400 (EDT) Received: from smtpin14.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 25E0C5F372 for ; Fri, 19 Jun 2020 16:24:33 +0000 (UTC) X-FDA: 76946484426.14.meat35_470efc426e1a Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin14.hostedemail.com (Postfix) with ESMTP id 0650A1800E61E for ; Fri, 19 Jun 2020 16:24:32 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,ben.widawsky@intel.com,,RULES_HIT:30012:30054:30064:30090,0,RBL:192.55.52.93:@intel.com:.lbl8.mailshell.net-62.50.0.100 64.95.201.95,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:21,LUA_SUMMARY:none X-HE-Tag: meat35_470efc426e1a X-Filterd-Recvd-Size: 9697 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by imf44.hostedemail.com (Postfix) with ESMTP for ; Fri, 19 Jun 2020 16:24:31 +0000 (UTC) IronPort-SDR: lQlGuHAFmika25QOHUFD+hUgzGUz4vx5J2z+d5LR3Qby4Zqe5xq4AGT0Fs2rp99SyM3cUMuKQ1 dBDFbnlrKZAw== X-IronPort-AV: E=McAfee;i="6000,8403,9657"; a="141280140" X-IronPort-AV: 
From: Ben Widawsky
To: linux-mm
Cc: Ben Widawsky, Andi Kleen, Andrew Morton, Dave Hansen, Kuppuswamy Sathyanarayanan, Mel Gorman, Michal Hocko
Subject: [PATCH 03/18] mm/page_alloc: start plumbing multi preferred node
Date: Fri, 19 Jun 2020 09:24:10 -0700
Message-Id: <20200619162425.1052382-4-ben.widawsky@intel.com>
In-Reply-To: <20200619162425.1052382-1-ben.widawsky@intel.com>
References: <20200619162425.1052382-1-ben.widawsky@intel.com>

In preparation for supporting multiple preferred nodes, we need the
internals to switch from taking a nid to a nodemask.

As mentioned in a code comment, __alloc_pages_nodemask() is the heart
of the page allocator. It takes a single node as a preferred node from
which to try to obtain a zonelist first. This patch leaves that
internal interface in place, but changes the guts of the function to
consider a list of preferred nodes. The local node is always most
preferred. If the local node is restricted by either the preference or
the binding mask, then the closest node that meets both the binding and
preference criteria is used. If the intersection of binding and
preference is the empty set, then fall back to the first node that
meets the binding criteria.

As of this patch, multiple preferred nodes aren't actually supported
yet, even though it might initially seem that way. As an example,
suppose your preferred nodes are 0 and 1. Node 0's fallback zone list
may have zones from nodes ordered 0->2->1. If this code were to pick
0's zonelist, and all zones from node 0 were full, you'd get a zone
from node 2 instead of 1. As multiple nodes aren't yet supported
anyway, this is okay just as a preparatory patch.
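The selection rule described above can be sketched roughly like this
(illustrative only: the helper name is made up, and the real
preferred_zonelist() added by the patch returns a zonelist and also
filters out nodes without memory):

/* Sketch of the node-selection rule; not the actual patch code. */
static int pick_first_try_node(const nodemask_t *prefmask,
			       const nodemask_t *bindmask)
{
	int local = numa_mem_id();
	nodemask_t out;
	bool has_pref = prefmask && !nodes_empty(*prefmask);
	bool has_bind = bindmask && !nodes_empty(*bindmask);

	if (!has_pref && !has_bind)
		return local;		/* no constraints: local node      */

	if (has_pref && has_bind)
		nodes_and(out, *prefmask, *bindmask);
	else
		out = has_pref ? *prefmask : *bindmask;

	if (node_isset(local, out))
		return local;		/* local node satisfies both masks */

	if (nodes_empty(out))
		return local;		/* empty intersection: fall back   */

	/* The real code searches for the closest node, not just the first. */
	return first_node(out);
}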
v2: Fixed memory hotplug handling (Ben) Cc: Andi Kleen Cc: Andrew Morton Cc: Dave Hansen Cc: Kuppuswamy Sathyanarayanan Cc: Mel Gorman Cc: Michal Hocko Signed-off-by: Ben Widawsky --- mm/page_alloc.c | 125 +++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 119 insertions(+), 6 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 48eb0f1410d4..280ca85dc4d8 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -129,6 +129,10 @@ nodemask_t node_states[NR_NODE_STATES] __read_mostly = { }; EXPORT_SYMBOL(node_states); +#ifdef CONFIG_NUMA +static int find_next_best_node(int node, nodemask_t *used_node_mask); +#endif + atomic_long_t _totalram_pages __read_mostly; EXPORT_SYMBOL(_totalram_pages); unsigned long totalreserve_pages __read_mostly; @@ -4759,13 +4763,118 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, return page; } -static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order, - int preferred_nid, nodemask_t *nodemask, - struct alloc_context *ac, gfp_t *alloc_mask, - unsigned int *alloc_flags) +#ifndef CONFIG_NUMA +#define set_pref_bind_mask(out, pref, bind) \ + { \ + (out)->bits[0] = 1UL \ + } +#else +static void set_pref_bind_mask(nodemask_t *out, const nodemask_t *prefmask, + const nodemask_t *bindmask) +{ + bool has_pref, has_bind; + + has_pref = prefmask && !nodes_empty(*prefmask); + has_bind = bindmask && !nodes_empty(*bindmask); + + if (has_pref && has_bind) + nodes_and(*out, *prefmask, *bindmask); + else if (has_pref && !has_bind) + *out = *prefmask; + else if (!has_pref && has_bind) + *out = *bindmask; + else if (!has_pref && !has_bind) + unreachable(); /* Handled above */ + else + unreachable(); +} +#endif + +/* + * Find a zonelist from a preferred node. Here is a truth table example using 2 + * different masks. The choices are, NULL mask, empty mask, two masks with an + * intersection, and two masks with no intersection. If the local node is in the + * intersection, it is used, otherwise the first set node is used. + * + * +----------+----------+------------+ + * | bindmask | prefmask | zonelist | + * +----------+----------+------------+ + * | NULL/0 | NULL/0 | local node | + * | NULL/0 | 0x2 | 0x2 | + * | NULL/0 | 0x4 | 0x4 | + * | 0x2 | NULL/0 | 0x2 | + * | 0x2 | 0x2 | 0x2 | + * | 0x2 | 0x4 | local* | + * | 0x4 | NULL/0 | 0x4 | + * | 0x4 | 0x2 | local* | + * | 0x4 | 0x4 | 0x4 | + * +----------+----------+------------+ + * + * NB: That zonelist will have *all* zones in the fallback case, and not all of + * those zones will belong to preferred nodes. + */ +static struct zonelist *preferred_zonelist(gfp_t gfp_mask, + const nodemask_t *prefmask, + const nodemask_t *bindmask) +{ + nodemask_t pref; + int nid, local_node = numa_mem_id(); + + /* Multi nodes not supported yet */ + VM_BUG_ON(prefmask && nodes_weight(*prefmask) != 1); + +#define _isset(mask, node) \ + (!(mask) || nodes_empty(*(mask)) ? 1 : node_isset(node, *(mask))) + /* + * This will handle NULL masks, empty masks, and when the local node + * match all constraints. It does most of the magic here. + */ + if (_isset(prefmask, local_node) && _isset(bindmask, local_node)) + return node_zonelist(local_node, gfp_mask); +#undef _isset + + VM_BUG_ON(!prefmask && !bindmask); + + set_pref_bind_mask(&pref, prefmask, bindmask); + + /* + * It is possible that the caller may ask for a preferred set that isn't + * available. One such case is memory hotplug. 
Memory hotplug code tries + * to do some allocations from the target node (what will be local to + * the new node) before the pages are onlined (N_MEMORY). + */ + for_each_node_mask(nid, pref) { + if (!node_state(nid, N_MEMORY)) + node_clear(nid, pref); + } + + /* + * If we couldn't manage to get anything reasonable, let later code + * clean up our mess. Local node will be the best approximation for what + * is desired, just use it. + */ + if (unlikely(nodes_empty(pref))) + return node_zonelist(local_node, gfp_mask); + + /* Try to find the "closest" node in the list. */ + nodes_complement(pref, pref); + while ((nid = find_next_best_node(local_node, &pref)) != NUMA_NO_NODE) + return node_zonelist(nid, gfp_mask); + + /* + * find_next_best_node() will have to have found something if the + * node list isn't empty and so it isn't possible to get here unless + * find_next_best_node() is modified and this function isn't updated. + */ + BUG(); +} + +static inline bool +prepare_alloc_pages(gfp_t gfp_mask, unsigned int order, nodemask_t *prefmask, + nodemask_t *nodemask, struct alloc_context *ac, + gfp_t *alloc_mask, unsigned int *alloc_flags) { ac->highest_zoneidx = gfp_zone(gfp_mask); - ac->zonelist = node_zonelist(preferred_nid, gfp_mask); ac->nodemask = nodemask; ac->migratetype = gfp_migratetype(gfp_mask); @@ -4777,6 +4886,8 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order, *alloc_flags |= ALLOC_CPUSET; } + ac->zonelist = preferred_zonelist(gfp_mask, prefmask, ac->nodemask); + fs_reclaim_acquire(gfp_mask); fs_reclaim_release(gfp_mask); @@ -4817,6 +4928,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid, unsigned int alloc_flags = ALLOC_WMARK_LOW; gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */ struct alloc_context ac = { }; + nodemask_t prefmask = nodemask_of_node(preferred_nid); /* * There are several places where we assume that the order value is sane @@ -4829,7 +4941,8 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid, gfp_mask &= gfp_allowed_mask; alloc_mask = gfp_mask; - if (!prepare_alloc_pages(gfp_mask, order, preferred_nid, nodemask, &ac, &alloc_mask, &alloc_flags)) + if (!prepare_alloc_pages(gfp_mask, order, &prefmask, nodemask, &ac, + &alloc_mask, &alloc_flags)) return NULL; finalise_ac(gfp_mask, &ac); From patchwork Fri Jun 19 16:24:11 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Widawsky X-Patchwork-Id: 11614595 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E9CD814B7 for ; Fri, 19 Jun 2020 16:24:38 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id B7FCB21707 for ; Fri, 19 Jun 2020 16:24:38 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B7FCB21707 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 6F5628D00D0; Fri, 19 Jun 2020 12:24:33 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 6808C8D00D3; Fri, 19 Jun 2020 12:24:33 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, 
From: Ben Widawsky
To: linux-mm
Cc: Ben Widawsky, Andi Kleen, Andrew Morton, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka
Subject: [PATCH 04/18] mm/page_alloc: add preferred pass to page allocation
Date: Fri, 19 Jun 2020 09:24:11 -0700
Message-Id: <20200619162425.1052382-5-ben.widawsky@intel.com>
In-Reply-To: <20200619162425.1052382-1-ben.widawsky@intel.com>
References: <20200619162425.1052382-1-ben.widawsky@intel.com>

This patch updates the core part of page allocation (pulling from the
free list) to take preferred nodes into account first. If an allocation
from a preferred node cannot be found, the remaining nodes in the
nodemask are checked.

Intentionally not handled in this patch are OOM node scanning and
reclaim scanning. I am very open to comments as to whether it is worth
handling those cases with a preferred node ordering.

In this patch, the code first scans the preferred nodes to make the
allocation, and then scans the subset of bound nodes that remain (often
the bound mask is NULL, aka all nodes) - potentially two passes.
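Sketched in pseudo-C, the scan order just described looks roughly like
this (scan_freelists() is a made-up stand-in for the zone loop in
get_page_from_freelist(), not a real function):

/* Illustration only: preferred nodes first, then remaining bound nodes. */
static struct page *two_pass_alloc(unsigned int order,
				   const nodemask_t *prefmask,
				   const nodemask_t *bindmask)
{
	nodemask_t scan;
	struct page *page;

	/* Pass 1: nodes that are both preferred and allowed. */
	nodes_and(scan, *prefmask, *bindmask);
	page = scan_freelists(order, &scan);	/* hypothetical helper */
	if (page)
		return page;

	/* Pass 2: allowed nodes that were not tried in pass 1. */
	nodes_andnot(scan, *bindmask, *prefmask);
	return scan_freelists(order, &scan);
}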
Actually, the code was already two passes as it tries not to fragment on the first pass, so now it's up to 4 passes. Consider a 3 node system (0-2) passed the following masks: Preferred: 100 Bound: 110 pass 1: node 2 no fragmentation pass 2: node 1 no fragmentation pass 3: node 2 w/fragmentation pass 4: node 1 w/fragmentation Cc: Andi Kleen Cc: Andrew Morton Cc: Dave Hansen Cc: Johannes Weiner Cc: Mel Gorman Cc: Michal Hocko Cc: Vlastimil Babka Signed-off-by: Ben Widawsky --- mm/internal.h | 1 + mm/page_alloc.c | 108 +++++++++++++++++++++++++++++++++++------------- 2 files changed, 80 insertions(+), 29 deletions(-) diff --git a/mm/internal.h b/mm/internal.h index 9886db20d94f..8d16229c6cbb 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -138,6 +138,7 @@ extern pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address); struct alloc_context { struct zonelist *zonelist; nodemask_t *nodemask; + nodemask_t *prefmask; struct zoneref *preferred_zoneref; int migratetype; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 280ca85dc4d8..3cf44b6c31ae 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3675,6 +3675,69 @@ alloc_flags_nofragment(struct zone *zone, gfp_t gfp_mask) return alloc_flags; } +#ifdef CONFIG_NUMA +static void set_pref_bind_mask(nodemask_t *out, const nodemask_t *prefmask, + const nodemask_t *bindmask) +{ + bool has_pref, has_bind; + + has_pref = prefmask && !nodes_empty(*prefmask); + has_bind = bindmask && !nodes_empty(*bindmask); + + if (has_pref && has_bind) + nodes_and(*out, *prefmask, *bindmask); + else if (has_pref && !has_bind) + *out = *prefmask; + else if (!has_pref && has_bind) + *out = *bindmask; + else if (!has_pref && !has_bind) + *out = NODE_MASK_ALL; + else + unreachable(); +} +#else +#define set_pref_bind_mask(out, pref, bind) \ + { \ + (out)->bits[0] = 1UL \ + } +#endif + +/* Helper to generate the preferred and fallback nodelists */ +static void __nodemask_for_freelist_scan(const struct alloc_context *ac, + bool preferred, nodemask_t *outnodes) +{ + bool has_pref; + bool has_bind; + + if (preferred) { + set_pref_bind_mask(outnodes, ac->prefmask, ac->nodemask); + return; + } + + has_pref = ac->prefmask && !nodes_empty(*ac->prefmask); + has_bind = ac->nodemask && !nodes_empty(*ac->nodemask); + + if (!has_bind && !has_pref) { + /* + * If no preference, we already tried the full nodemask, + * so we have to bail. + */ + nodes_clear(*outnodes); + } else if (!has_bind && has_pref) { + /* We tried preferred nodes only before. Invert that. */ + nodes_complement(*outnodes, *ac->prefmask); + } else if (has_bind && !has_pref) { + /* + * If preferred was empty, we've tried all bound nodes, + * and there nothing further we can do. + */ + nodes_clear(*outnodes); + } else if (has_bind && has_pref) { + /* Try the bound nodes that weren't tried before. */ + nodes_andnot(*outnodes, *ac->nodemask, *ac->prefmask); + } +} + /* * get_page_from_freelist goes through the zonelist trying to allocate * a page. 
@@ -3686,7 +3749,10 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags, struct zoneref *z; struct zone *zone; struct pglist_data *last_pgdat_dirty_limit = NULL; - bool no_fallback; + nodemask_t nodes; + bool no_fallback, preferred_nodes_exhausted = false; + + __nodemask_for_freelist_scan(ac, true, &nodes); retry: /* @@ -3696,7 +3762,8 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags, no_fallback = alloc_flags & ALLOC_NOFRAGMENT; z = ac->preferred_zoneref; for_next_zone_zonelist_nodemask(zone, z, ac->zonelist, - ac->highest_zoneidx, ac->nodemask) { + ac->highest_zoneidx, &nodes) + { struct page *page; unsigned long mark; @@ -3816,12 +3883,20 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags, } } + if (!preferred_nodes_exhausted) { + __nodemask_for_freelist_scan(ac, false, &nodes); + preferred_nodes_exhausted = true; + goto retry; + } + /* * It's possible on a UMA machine to get through all zones that are * fragmented. If avoiding fragmentation, reset and try again. */ if (no_fallback) { alloc_flags &= ~ALLOC_NOFRAGMENT; + __nodemask_for_freelist_scan(ac, true, &nodes); + preferred_nodes_exhausted = false; goto retry; } @@ -4763,33 +4838,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, return page; } -#ifndef CONFIG_NUMA -#define set_pref_bind_mask(out, pref, bind) \ - { \ - (out)->bits[0] = 1UL \ - } -#else -static void set_pref_bind_mask(nodemask_t *out, const nodemask_t *prefmask, - const nodemask_t *bindmask) -{ - bool has_pref, has_bind; - - has_pref = prefmask && !nodes_empty(*prefmask); - has_bind = bindmask && !nodes_empty(*bindmask); - - if (has_pref && has_bind) - nodes_and(*out, *prefmask, *bindmask); - else if (has_pref && !has_bind) - *out = *prefmask; - else if (!has_pref && has_bind) - *out = *bindmask; - else if (!has_pref && !has_bind) - unreachable(); /* Handled above */ - else - unreachable(); -} -#endif - /* * Find a zonelist from a preferred node. Here is a truth table example using 2 * different masks. 
The choices are, NULL mask, empty mask, two masks with an @@ -4945,6 +4993,8 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid, &alloc_mask, &alloc_flags)) return NULL; + ac.prefmask = &prefmask; + finalise_ac(gfp_mask, &ac); /* From patchwork Fri Jun 19 16:24:12 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Widawsky X-Patchwork-Id: 11614599 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2A79890 for ; Fri, 19 Jun 2020 16:24:45 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id E8DA021707 for ; Fri, 19 Jun 2020 16:24:44 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E8DA021707 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 5987A8D00D5; Fri, 19 Jun 2020 12:24:34 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 54A1D8D00D7; Fri, 19 Jun 2020 12:24:34 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 376FF8D00D5; Fri, 19 Jun 2020 12:24:34 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0060.hostedemail.com [216.40.44.60]) by kanga.kvack.org (Postfix) with ESMTP id 11A718D00D3 for ; Fri, 19 Jun 2020 12:24:34 -0400 (EDT) Received: from smtpin07.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id BDF4A180AD806 for ; Fri, 19 Jun 2020 16:24:33 +0000 (UTC) X-FDA: 76946484426.07.book40_2b049f826e1a Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin07.hostedemail.com (Postfix) with ESMTP id C588F1803F9A8 for ; Fri, 19 Jun 2020 16:24:32 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,ben.widawsky@intel.com,,RULES_HIT:30025:30054:30055:30064:30070:30075,0,RBL:192.55.52.93:@intel.com:.lbl8.mailshell.net-64.95.201.95 62.50.0.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:2,LUA_SUMMARY:none X-HE-Tag: book40_2b049f826e1a X-Filterd-Recvd-Size: 11642 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by imf10.hostedemail.com (Postfix) with ESMTP for ; Fri, 19 Jun 2020 16:24:31 +0000 (UTC) IronPort-SDR: caO5O64lxeiVGD2MhNkZSFuape1ljIvQk701itzNJCYAl1Q39EioBRwOm3BInwGIRX0bn210K9 wFse8eZwDGeA== X-IronPort-AV: E=McAfee;i="6000,8403,9657"; a="141280148" X-IronPort-AV: E=Sophos;i="5.75,255,1589266800"; d="scan'208";a="141280148" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Jun 2020 09:24:29 -0700 IronPort-SDR: NtADxzpG3a0vTTHt6xZm6yYrU0LXwfw3/WCavruWlhjQGB4iiuks/JPbA3rP0nFMLT8oWybsUc TwNeyhrOyp5g== X-IronPort-AV: E=Sophos;i="5.75,255,1589266800"; d="scan'208";a="264368109" Received: from sjiang-mobl2.ccr.corp.intel.com (HELO bwidawsk-mobl5.local) ([10.252.131.131]) by orsmga007-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 
From: Ben Widawsky
To: linux-mm
Cc: Dave Hansen, Andrew Morton, Ben Widawsky
Subject: [PATCH 05/18] mm/mempolicy: convert single preferred_node to full nodemask
Date: Fri, 19 Jun 2020 09:24:12 -0700
Message-Id: <20200619162425.1052382-6-ben.widawsky@intel.com>
In-Reply-To: <20200619162425.1052382-1-ben.widawsky@intel.com>
References: <20200619162425.1052382-1-ben.widawsky@intel.com>

From: Dave Hansen

The NUMA APIs currently allow passing in a "preferred node" as a single
bit set in a nodemask. If more than one bit is set, bits after the
first are ignored. Internally, this is implemented as a single integer:
mempolicy->preferred_node.

This single node is generally OK for location-based NUMA where memory
being allocated will eventually be operated on by a single CPU.
However, in systems with multiple memory types, folks want to target a
*type* of memory instead of a location. For instance, someone might
want some high-bandwidth memory but not care about the CPU next to
which it is allocated. Or, they want a cheap, high-capacity allocation
and want to target all NUMA nodes which have persistent memory in
volatile mode. In both of these cases, the application wants to target
a *set* of nodes, but does not want strict MPOL_BIND behavior.

To get that behavior, an MPOL_PREFERRED mode is desirable, but one that
honors multiple nodes set in the nodemask. The first step in that
direction is to be able to internally store multiple preferred nodes,
which is implemented in this patch. This should not introduce any
functional changes; it just switches the internal representation of
mempolicy->preferred_node from an integer to a nodemask called
'mempolicy->preferred_nodes'.

This is not a pie-in-the-sky dream for an API. This was a response to a
specific ask from more than one group at Intel. Specifically:

1. There are existing libraries that target memory types such as
   https://github.com/memkind/memkind. These are known to suffer from
   SIGSEGVs when memory is low on targeted memory "kinds" that span
   more than one node. The MCDRAM on a Xeon Phi in "Cluster on Die"
   mode is an example of this.

2. Volatile-use persistent memory users want to have a memory policy
   which is targeted at either "cheap and slow" (PMEM) or "expensive
   and fast" (DRAM). However, they do not want to experience allocation
   failures when the targeted type is unavailable.

3. Allocate-then-run. Generally, we let the process scheduler decide on
   which physical CPU to run a task. That location provides a default
   allocation policy, and memory availability is not generally
   considered when placing tasks. For situations where memory is
   valuable and constrained, some users want to allocate memory first,
   *then* allocate close compute resources to the allocation. This is
   the reverse of the normal (CPU) model. Accelerators such as GPUs
   that operate on core-mm-managed memory are interested in this model.

v2: Fix spelling errors in commit message. (Ben) clang-format. (Ben) Integrated bit from another patch.
(Ben) Update the docs to reflect the internal data structure change (Ben) Don't advertise MPOL_PREFERRED_MANY in UAPI until we can handle it (Ben) Added more to the commit message (Dave) Cc: Andrew Morton Signed-off-by: Dave Hansen (v2) Co-developed-by: Ben Widawsky Signed-off-by: Ben Widawsky --- .../admin-guide/mm/numa_memory_policy.rst | 6 +-- include/linux/mempolicy.h | 4 +- mm/mempolicy.c | 40 ++++++++++--------- 3 files changed, 27 insertions(+), 23 deletions(-) diff --git a/Documentation/admin-guide/mm/numa_memory_policy.rst b/Documentation/admin-guide/mm/numa_memory_policy.rst index 067a90a1499c..1ad020c459b8 100644 --- a/Documentation/admin-guide/mm/numa_memory_policy.rst +++ b/Documentation/admin-guide/mm/numa_memory_policy.rst @@ -205,9 +205,9 @@ MPOL_PREFERRED of increasing distance from the preferred node based on information provided by the platform firmware. - Internally, the Preferred policy uses a single node--the - preferred_node member of struct mempolicy. When the internal - mode flag MPOL_F_LOCAL is set, the preferred_node is ignored + Internally, the Preferred policy uses a nodemask--the + preferred_nodes member of struct mempolicy. When the internal + mode flag MPOL_F_LOCAL is set, the preferred_nodes are ignored and the policy is interpreted as local allocation. "Local" allocation policy can be viewed as a Preferred policy that starts at the node containing the cpu where the allocation diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h index ea9c15b60a96..c66ea9f4c61e 100644 --- a/include/linux/mempolicy.h +++ b/include/linux/mempolicy.h @@ -47,8 +47,8 @@ struct mempolicy { unsigned short mode; /* See MPOL_* above */ unsigned short flags; /* See set_mempolicy() MPOL_F_* above */ union { - short preferred_node; /* preferred */ - nodemask_t nodes; /* interleave/bind */ + nodemask_t preferred_nodes; /* preferred */ + nodemask_t nodes; /* interleave/bind */ /* undefined for default */ } v; union { diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 99e0f3f9c4a6..e0b576838e57 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -205,7 +205,7 @@ static int mpol_new_preferred(struct mempolicy *pol, const nodemask_t *nodes) else if (nodes_empty(*nodes)) return -EINVAL; /* no allowed nodes */ else - pol->v.preferred_node = first_node(*nodes); + pol->v.preferred_nodes = nodemask_of_node(first_node(*nodes)); return 0; } @@ -345,22 +345,26 @@ static void mpol_rebind_preferred(struct mempolicy *pol, const nodemask_t *nodes) { nodemask_t tmp; + nodemask_t preferred_node; + + /* MPOL_PREFERRED uses only the first node in the mask */ + preferred_node = nodemask_of_node(first_node(*nodes)); if (pol->flags & MPOL_F_STATIC_NODES) { int node = first_node(pol->w.user_nodemask); if (node_isset(node, *nodes)) { - pol->v.preferred_node = node; + pol->v.preferred_nodes = nodemask_of_node(node); pol->flags &= ~MPOL_F_LOCAL; } else pol->flags |= MPOL_F_LOCAL; } else if (pol->flags & MPOL_F_RELATIVE_NODES) { mpol_relative_nodemask(&tmp, &pol->w.user_nodemask, nodes); - pol->v.preferred_node = first_node(tmp); + pol->v.preferred_nodes = tmp; } else if (!(pol->flags & MPOL_F_LOCAL)) { - pol->v.preferred_node = node_remap(pol->v.preferred_node, - pol->w.cpuset_mems_allowed, - *nodes); + nodes_remap(tmp, pol->v.preferred_nodes, + pol->w.cpuset_mems_allowed, preferred_node); + pol->v.preferred_nodes = tmp; pol->w.cpuset_mems_allowed = *nodes; } } @@ -913,7 +917,7 @@ static void get_policy_nodemask(struct mempolicy *p, nodemask_t *nodes) break; case MPOL_PREFERRED: if (!(p->flags & 
MPOL_F_LOCAL)) - node_set(p->v.preferred_node, *nodes); + *nodes = p->v.preferred_nodes; /* else return empty node mask for local allocation */ break; default: @@ -1906,9 +1910,9 @@ static nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy) static int policy_node(gfp_t gfp, struct mempolicy *policy, int nd) { - if (policy->mode == MPOL_PREFERRED && !(policy->flags & MPOL_F_LOCAL)) - nd = policy->v.preferred_node; - else { + if (policy->mode == MPOL_PREFERRED && !(policy->flags & MPOL_F_LOCAL)) { + nd = first_node(policy->v.preferred_nodes); + } else { /* * __GFP_THISNODE shouldn't even be used with the bind policy * because we might easily break the expectation to stay on the @@ -1953,7 +1957,7 @@ unsigned int mempolicy_slab_node(void) /* * handled MPOL_F_LOCAL above */ - return policy->v.preferred_node; + return first_node(policy->v.preferred_nodes); case MPOL_INTERLEAVE: return interleave_nodes(policy); @@ -2087,7 +2091,7 @@ bool init_nodemask_of_mempolicy(nodemask_t *mask) if (mempolicy->flags & MPOL_F_LOCAL) nid = numa_node_id(); else - nid = mempolicy->v.preferred_node; + nid = first_node(mempolicy->v.preferred_nodes); init_nodemask_of_node(mask, nid); break; @@ -2225,7 +2229,7 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma, * node in its nodemask, we allocate the standard way. */ if (pol->mode == MPOL_PREFERRED && !(pol->flags & MPOL_F_LOCAL)) - hpage_node = pol->v.preferred_node; + hpage_node = first_node(pol->v.preferred_nodes); nmask = policy_nodemask(gfp, pol); if (!nmask || node_isset(hpage_node, *nmask)) { @@ -2364,7 +2368,7 @@ bool __mpol_equal(struct mempolicy *a, struct mempolicy *b) /* a's ->flags is the same as b's */ if (a->flags & MPOL_F_LOCAL) return true; - return a->v.preferred_node == b->v.preferred_node; + return nodes_equal(a->v.preferred_nodes, b->v.preferred_nodes); default: BUG(); return false; @@ -2508,7 +2512,7 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long if (pol->flags & MPOL_F_LOCAL) polnid = numa_node_id(); else - polnid = pol->v.preferred_node; + polnid = first_node(pol->v.preferred_nodes); break; case MPOL_BIND: @@ -2825,7 +2829,7 @@ void __init numa_policy_init(void) .refcnt = ATOMIC_INIT(1), .mode = MPOL_PREFERRED, .flags = MPOL_F_MOF | MPOL_F_MORON, - .v = { .preferred_node = nid, }, + .v = { .preferred_nodes = nodemask_of_node(nid), }, }; } @@ -2991,7 +2995,7 @@ int mpol_parse_str(char *str, struct mempolicy **mpol) if (mode != MPOL_PREFERRED) new->v.nodes = nodes; else if (nodelist) - new->v.preferred_node = first_node(nodes); + new->v.preferred_nodes = nodemask_of_node(first_node(nodes)); else new->flags |= MPOL_F_LOCAL; @@ -3044,7 +3048,7 @@ void mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol) if (flags & MPOL_F_LOCAL) mode = MPOL_LOCAL; else - node_set(pol->v.preferred_node, nodes); + nodes_or(nodes, nodes, pol->v.preferred_nodes); break; case MPOL_BIND: case MPOL_INTERLEAVE: From patchwork Fri Jun 19 16:24:13 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Widawsky X-Patchwork-Id: 11614601 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 31DB214B7 for ; Fri, 19 Jun 2020 16:24:48 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id EF8E2217D9 for ; Fri, 19 Jun 2020 16:24:47 +0000 (UTC) DMARC-Filter: OpenDMARC 
Filter v1.3.2 mail.kernel.org EF8E2217D9 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 93BE28D00D7; Fri, 19 Jun 2020 12:24:34 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 881308D00D6; Fri, 19 Jun 2020 12:24:34 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 636C78D00D8; Fri, 19 Jun 2020 12:24:34 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0212.hostedemail.com [216.40.44.212]) by kanga.kvack.org (Postfix) with ESMTP id 3F6AB8D00D3 for ; Fri, 19 Jun 2020 12:24:34 -0400 (EDT) Received: from smtpin23.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 04B768248068 for ; Fri, 19 Jun 2020 16:24:34 +0000 (UTC) X-FDA: 76946484468.23.dime60_131320126e1a Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin23.hostedemail.com (Postfix) with ESMTP id B66B337608 for ; Fri, 19 Jun 2020 16:24:33 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,ben.widawsky@intel.com,,RULES_HIT:30054:30064:30070,0,RBL:192.55.52.93:@intel.com:.lbl8.mailshell.net-62.50.0.100 64.95.201.95,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: dime60_131320126e1a X-Filterd-Recvd-Size: 7967 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by imf50.hostedemail.com (Postfix) with ESMTP for ; Fri, 19 Jun 2020 16:24:32 +0000 (UTC) IronPort-SDR: TObJU5ksCR1bVsrgJtW/p5jjHB2evym31upowL2AWG6NQTlFz0tfWfA/CQGSj7z2L6SlpC9alh WMJFdQlDORqQ== X-IronPort-AV: E=McAfee;i="6000,8403,9657"; a="141280151" X-IronPort-AV: E=Sophos;i="5.75,255,1589266800"; d="scan'208";a="141280151" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Jun 2020 09:24:29 -0700 IronPort-SDR: 9ejPzY/ZCUW0x8l6yzL5ESEWuxZvH5KYmrSoMO5CtC+3Mv4dQXrnv5XKygYSi3nBnR/JihUrQM wbsnThmTlAyQ== X-IronPort-AV: E=Sophos;i="5.75,255,1589266800"; d="scan'208";a="264368134" Received: from sjiang-mobl2.ccr.corp.intel.com (HELO bwidawsk-mobl5.local) ([10.252.131.131]) by orsmga007-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Jun 2020 09:24:29 -0700 From: Ben Widawsky To: linux-mm Cc: Dave Hansen , Andrew Morton , Ben Widawsky Subject: [PATCH 06/18] mm/mempolicy: Add MPOL_PREFERRED_MANY for multiple preferred nodes Date: Fri, 19 Jun 2020 09:24:13 -0700 Message-Id: <20200619162425.1052382-7-ben.widawsky@intel.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20200619162425.1052382-1-ben.widawsky@intel.com> References: <20200619162425.1052382-1-ben.widawsky@intel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: B66B337608 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam02 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Dave Hansen MPOL_PREFERRED honors only a single node set in the nodemask. 
Add the bare define for a new mode which will allow more than one. The patch does all the plumbing without actually adding the new policy type. v2: Plumb most MPOL_PREFERRED_MANY without exposing UAPI (Ben) Fixes for checkpatch (Ben) Cc: Andrew Morton Signed-off-by: Dave Hansen Co-developed-by: Ben Widawsky Signed-off-by: Ben Widawsky --- mm/mempolicy.c | 42 +++++++++++++++++++++++++++++++++++++----- 1 file changed, 37 insertions(+), 5 deletions(-) diff --git a/mm/mempolicy.c b/mm/mempolicy.c index e0b576838e57..6c7301cefeb6 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -31,6 +31,9 @@ * but useful to set in a VMA when you have a non default * process policy. * + * preferred many Try a set of nodes first before normal fallback. This is + * similar to preferred without the special case. + * * default Allocate on the local node first, or when on a VMA * use the process policy. This is what Linux always did * in a NUMA aware kernel and still does by, ahem, default. @@ -105,6 +108,8 @@ #include "internal.h" +#define MPOL_PREFERRED_MANY MPOL_MAX + /* Internal flags */ #define MPOL_MF_DISCONTIG_OK (MPOL_MF_INTERNAL << 0) /* Skip checks for continuous vmas */ #define MPOL_MF_INVERT (MPOL_MF_INTERNAL << 1) /* Invert check for nodemask */ @@ -175,7 +180,7 @@ struct mempolicy *get_task_policy(struct task_struct *p) static const struct mempolicy_operations { int (*create)(struct mempolicy *pol, const nodemask_t *nodes); void (*rebind)(struct mempolicy *pol, const nodemask_t *nodes); -} mpol_ops[MPOL_MAX]; +} mpol_ops[MPOL_MAX + 1]; static inline int mpol_store_user_nodemask(const struct mempolicy *pol) { @@ -415,7 +420,7 @@ void mpol_rebind_mm(struct mm_struct *mm, nodemask_t *new) mmap_write_unlock(mm); } -static const struct mempolicy_operations mpol_ops[MPOL_MAX] = { +static const struct mempolicy_operations mpol_ops[MPOL_MAX + 1] = { [MPOL_DEFAULT] = { .rebind = mpol_rebind_default, }, @@ -432,6 +437,10 @@ static const struct mempolicy_operations mpol_ops[MPOL_MAX] = { .rebind = mpol_rebind_nodemask, }, /* MPOL_LOCAL is converted to MPOL_PREFERRED on policy creation */ + [MPOL_PREFERRED_MANY] = { + .create = NULL, + .rebind = NULL, + }, }; static int migrate_page_add(struct page *page, struct list_head *pagelist, @@ -915,6 +924,9 @@ static void get_policy_nodemask(struct mempolicy *p, nodemask_t *nodes) case MPOL_INTERLEAVE: *nodes = p->v.nodes; break; + case MPOL_PREFERRED_MANY: + *nodes = p->v.preferred_nodes; + break; case MPOL_PREFERRED: if (!(p->flags & MPOL_F_LOCAL)) *nodes = p->v.preferred_nodes; @@ -1910,7 +1922,9 @@ static nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy) static int policy_node(gfp_t gfp, struct mempolicy *policy, int nd) { - if (policy->mode == MPOL_PREFERRED && !(policy->flags & MPOL_F_LOCAL)) { + if ((policy->mode == MPOL_PREFERRED || + policy->mode == MPOL_PREFERRED_MANY) && + !(policy->flags & MPOL_F_LOCAL)) { nd = first_node(policy->v.preferred_nodes); } else { /* @@ -1953,6 +1967,7 @@ unsigned int mempolicy_slab_node(void) return node; switch (policy->mode) { + case MPOL_PREFERRED_MANY: case MPOL_PREFERRED: /* * handled MPOL_F_LOCAL above @@ -2087,6 +2102,9 @@ bool init_nodemask_of_mempolicy(nodemask_t *mask) task_lock(current); mempolicy = current->mempolicy; switch (mempolicy->mode) { + case MPOL_PREFERRED_MANY: + *mask = mempolicy->v.preferred_nodes; + break; case MPOL_PREFERRED: if (mempolicy->flags & MPOL_F_LOCAL) nid = numa_node_id(); @@ -2141,6 +2159,9 @@ bool mempolicy_nodemask_intersects(struct task_struct *tsk, * nodes in mask. 
*/ break; + case MPOL_PREFERRED_MANY: + ret = nodes_intersects(mempolicy->v.preferred_nodes, *mask); + break; case MPOL_BIND: case MPOL_INTERLEAVE: ret = nodes_intersects(mempolicy->v.nodes, *mask); @@ -2225,8 +2246,9 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma, * node and don't fall back to other nodes, as the cost of * remote accesses would likely offset THP benefits. * - * If the policy is interleave, or does not allow the current - * node in its nodemask, we allocate the standard way. + * If the policy is interleave or multiple preferred nodes, or + * does not allow the current node in its nodemask, we allocate + * the standard way. */ if (pol->mode == MPOL_PREFERRED && !(pol->flags & MPOL_F_LOCAL)) hpage_node = first_node(pol->v.preferred_nodes); @@ -2364,6 +2386,9 @@ bool __mpol_equal(struct mempolicy *a, struct mempolicy *b) case MPOL_BIND: case MPOL_INTERLEAVE: return !!nodes_equal(a->v.nodes, b->v.nodes); + case MPOL_PREFERRED_MANY: + return !!nodes_equal(a->v.preferred_nodes, + b->v.preferred_nodes); case MPOL_PREFERRED: /* a's ->flags is the same as b's */ if (a->flags & MPOL_F_LOCAL) @@ -2532,6 +2557,8 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long polnid = zone_to_nid(z->zone); break; + /* case MPOL_PREFERRED_MANY: */ + default: BUG(); } @@ -2883,6 +2910,7 @@ static const char * const policy_modes[] = [MPOL_BIND] = "bind", [MPOL_INTERLEAVE] = "interleave", [MPOL_LOCAL] = "local", + [MPOL_PREFERRED_MANY] = "prefer (many)", }; @@ -2962,6 +2990,7 @@ int mpol_parse_str(char *str, struct mempolicy **mpol) if (!nodelist) err = 0; goto out; + case MPOL_PREFERRED_MANY: case MPOL_BIND: /* * Insist on a nodelist @@ -3044,6 +3073,9 @@ void mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol) switch (mode) { case MPOL_DEFAULT: break; + case MPOL_PREFERRED_MANY: + WARN_ON(flags & MPOL_F_LOCAL); + fallthrough; case MPOL_PREFERRED: if (flags & MPOL_F_LOCAL) mode = MPOL_LOCAL; From patchwork Fri Jun 19 16:24:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Widawsky X-Patchwork-Id: 11614603 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5095B14B7 for ; Fri, 19 Jun 2020 16:24:51 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 2709B217BA for ; Fri, 19 Jun 2020 16:24:51 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2709B217BA Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id C1E3E8D00D3; Fri, 19 Jun 2020 12:24:34 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id B94188D00D8; Fri, 19 Jun 2020 12:24:34 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 785D18D00D3; Fri, 19 Jun 2020 12:24:34 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0145.hostedemail.com [216.40.44.145]) by kanga.kvack.org (Postfix) with ESMTP id 4DD608D00D6 for ; Fri, 19 Jun 2020 12:24:34 -0400 (EDT) Received: from smtpin12.hostedemail.com 
(10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 11FD0181AC9C6 for ; Fri, 19 Jun 2020 16:24:34 +0000 (UTC) X-FDA: 76946484468.12.roof71_310405126e1a Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin12.hostedemail.com (Postfix) with ESMTP id D01AF18092FAF for ; Fri, 19 Jun 2020 16:24:33 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,ben.widawsky@intel.com,,RULES_HIT:30012:30054:30064,0,RBL:192.55.52.93:@intel.com:.lbl8.mailshell.net-64.95.201.95 62.50.0.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: roof71_310405126e1a X-Filterd-Recvd-Size: 3412 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by imf10.hostedemail.com (Postfix) with ESMTP for ; Fri, 19 Jun 2020 16:24:33 +0000 (UTC) IronPort-SDR: QhkbneDFVlAP3gXyxObCkI+l6593dL+8f3PLftKaN9SbJbD7blrmm1qLWd850an9AdaDqvUS26 h0+TjNQMuCpg== X-IronPort-AV: E=McAfee;i="6000,8403,9657"; a="141280154" X-IronPort-AV: E=Sophos;i="5.75,255,1589266800"; d="scan'208";a="141280154" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Jun 2020 09:24:30 -0700 IronPort-SDR: HNqybXHqPDhfnU8V+NhMRaKUcvCWUhnJEM84rGBBqf4fgyLtnukAmdMPlmtQ56qtycE4siSMJl ib/cVt4C2Irg== X-IronPort-AV: E=Sophos;i="5.75,255,1589266800"; d="scan'208";a="264368170" Received: from sjiang-mobl2.ccr.corp.intel.com (HELO bwidawsk-mobl5.local) ([10.252.131.131]) by orsmga007-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Jun 2020 09:24:29 -0700 From: Ben Widawsky To: linux-mm Cc: Dave Hansen , Andrew Morton , Ben Widawsky Subject: [PATCH 07/18] mm/mempolicy: allow preferred code to take a nodemask Date: Fri, 19 Jun 2020 09:24:14 -0700 Message-Id: <20200619162425.1052382-8-ben.widawsky@intel.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20200619162425.1052382-1-ben.widawsky@intel.com> References: <20200619162425.1052382-1-ben.widawsky@intel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: D01AF18092FAF X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam01 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Dave Hansen Create a helper function (mpol_new_preferred_many()) which is usable both by the old, single-node MPOL_PREFERRED and the new MPOL_PREFERRED_MANY. Enforce the old single-node MPOL_PREFERRED behavior in the "new" version of mpol_new_preferred() which calls mpol_new_preferred_many(). 
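As a usage sketch (hypothetical values; "pol" setup is elided, and both
functions are static to mm/mempolicy.c, so this is illustrative rather
than callable as-is):

/* Illustration: how the two creation paths treat a two-node request. */
static void example_preferred_creation(struct mempolicy *pol)
{
	nodemask_t req = nodemask_of_node(1);

	node_set(2, req);			/* request nodes {1,2}            */

	mpol_new_preferred(pol, &req);		/* legacy path: keeps only node 1 */
	mpol_new_preferred_many(pol, &req);	/* new helper: keeps all of {1,2} */
}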
Cc: Andrew Morton Signed-off-by: Dave Hansen Signed-off-by: Ben Widawsky --- mm/mempolicy.c | 17 +++++++++++++++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 6c7301cefeb6..541675a5b947 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -203,17 +203,30 @@ static int mpol_new_interleave(struct mempolicy *pol, const nodemask_t *nodes) return 0; } -static int mpol_new_preferred(struct mempolicy *pol, const nodemask_t *nodes) +static int mpol_new_preferred_many(struct mempolicy *pol, + const nodemask_t *nodes) { if (!nodes) pol->flags |= MPOL_F_LOCAL; /* local allocation */ else if (nodes_empty(*nodes)) return -EINVAL; /* no allowed nodes */ else - pol->v.preferred_nodes = nodemask_of_node(first_node(*nodes)); + pol->v.preferred_nodes = *nodes; return 0; } +static int mpol_new_preferred(struct mempolicy *pol, const nodemask_t *nodes) +{ + if (nodes) { + /* MPOL_PREFERRED can only take a single node: */ + nodemask_t tmp = nodemask_of_node(first_node(*nodes)); + + return mpol_new_preferred_many(pol, &tmp); + } + + return mpol_new_preferred_many(pol, NULL); +} + static int mpol_new_bind(struct mempolicy *pol, const nodemask_t *nodes) { if (nodes_empty(*nodes)) From patchwork Fri Jun 19 16:24:15 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Widawsky X-Patchwork-Id: 11614615 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5614514E3 for ; Fri, 19 Jun 2020 16:25:03 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 2D9CC2168B for ; Fri, 19 Jun 2020 16:25:03 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2D9CC2168B Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 46D2F8D00DA; Fri, 19 Jun 2020 12:24:36 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 1C92B8D00DB; Fri, 19 Jun 2020 12:24:36 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BFF7D8D00D9; Fri, 19 Jun 2020 12:24:35 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0149.hostedemail.com [216.40.44.149]) by kanga.kvack.org (Postfix) with ESMTP id 5B2B58D00DA for ; Fri, 19 Jun 2020 12:24:35 -0400 (EDT) Received: from smtpin08.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 1DC90181AC9C6 for ; Fri, 19 Jun 2020 16:24:35 +0000 (UTC) X-FDA: 76946484510.08.laugh95_500be4726e1a Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin08.hostedemail.com (Postfix) with ESMTP id E4996180A25DE for ; Fri, 19 Jun 2020 16:24:34 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,ben.widawsky@intel.com,,RULES_HIT:30012:30054:30064,0,RBL:192.55.52.93:@intel.com:.lbl8.mailshell.net-62.50.0.100 64.95.201.95,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:0,LUA_SUMMARY:none X-HE-Tag: laugh95_500be4726e1a X-Filterd-Recvd-Size: 4041 Received: 
from mga11.intel.com (mga11.intel.com [192.55.52.93]) by imf44.hostedemail.com (Postfix) with ESMTP for ; Fri, 19 Jun 2020 16:24:33 +0000 (UTC) IronPort-SDR: pnL3AY5zXnzq6Ik56zMGcsVcnW72NVbHQFFixS2hshYUrjsEM/WvZQmMtpxfZA+23053ebDaOD jmVRy0latYnQ== X-IronPort-AV: E=McAfee;i="6000,8403,9657"; a="141280156" X-IronPort-AV: E=Sophos;i="5.75,255,1589266800"; d="scan'208";a="141280156" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Jun 2020 09:24:30 -0700 IronPort-SDR: 8Zm6foZqP8deM8cOj3bMVRgU7GIsgeMdPjQYFWzU1fk9ZclUUDHK9Jb+iz+BSFTwNhc5/alu8N KWKBHyZHyPTg== X-IronPort-AV: E=Sophos;i="5.75,255,1589266800"; d="scan'208";a="264368196" Received: from sjiang-mobl2.ccr.corp.intel.com (HELO bwidawsk-mobl5.local) ([10.252.131.131]) by orsmga007-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Jun 2020 09:24:30 -0700 From: Ben Widawsky To: linux-mm Cc: Dave Hansen , Ben Widawsky Subject: [PATCH 08/18] mm/mempolicy: refactor rebind code for PREFERRED_MANY Date: Fri, 19 Jun 2020 09:24:15 -0700 Message-Id: <20200619162425.1052382-9-ben.widawsky@intel.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20200619162425.1052382-1-ben.widawsky@intel.com> References: <20200619162425.1052382-1-ben.widawsky@intel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: E4996180A25DE X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam02 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Dave Hansen Again, this extracts the "only one node must be set" behavior of MPOL_PREFERRED. It retains virtually all of the existing code so it can be used by MPOL_PREFERRED_MANY as well. v2: Fixed typos in commit message. (Ben) Merged bits from other patches. 
(Ben) annotate mpol_rebind_preferred_many as unused (Ben) Signed-off-by: Dave Hansen Signed-off-by: Ben Widawsky --- mm/mempolicy.c | 29 ++++++++++++++++++++++------- 1 file changed, 22 insertions(+), 7 deletions(-) diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 541675a5b947..bfc4ef2af90d 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -359,14 +359,11 @@ static void mpol_rebind_nodemask(struct mempolicy *pol, const nodemask_t *nodes) pol->v.nodes = tmp; } -static void mpol_rebind_preferred(struct mempolicy *pol, - const nodemask_t *nodes) +static void mpol_rebind_preferred_common(struct mempolicy *pol, + const nodemask_t *preferred_nodes, + const nodemask_t *nodes) { nodemask_t tmp; - nodemask_t preferred_node; - - /* MPOL_PREFERRED uses only the first node in the mask */ - preferred_node = nodemask_of_node(first_node(*nodes)); if (pol->flags & MPOL_F_STATIC_NODES) { int node = first_node(pol->w.user_nodemask); @@ -381,12 +378,30 @@ static void mpol_rebind_preferred(struct mempolicy *pol, pol->v.preferred_nodes = tmp; } else if (!(pol->flags & MPOL_F_LOCAL)) { nodes_remap(tmp, pol->v.preferred_nodes, - pol->w.cpuset_mems_allowed, preferred_node); + pol->w.cpuset_mems_allowed, *preferred_nodes); pol->v.preferred_nodes = tmp; pol->w.cpuset_mems_allowed = *nodes; } } +/* MPOL_PREFERRED_MANY allows multiple nodes to be set in 'nodes' */ +static void __maybe_unused mpol_rebind_preferred_many(struct mempolicy *pol, + const nodemask_t *nodes) +{ + mpol_rebind_preferred_common(pol, nodes, nodes); +} + +static void mpol_rebind_preferred(struct mempolicy *pol, + const nodemask_t *nodes) +{ + nodemask_t preferred_node; + + /* MPOL_PREFERRED uses only the first node in 'nodes' */ + preferred_node = nodemask_of_node(first_node(*nodes)); + + mpol_rebind_preferred_common(pol, &preferred_node, nodes); +} + /* * mpol_rebind_policy - Migrate a policy to a different set of nodes * From patchwork Fri Jun 19 16:24:16 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Widawsky X-Patchwork-Id: 11614605 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 432AD90 for ; Fri, 19 Jun 2020 16:24:54 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 1124C2168B for ; Fri, 19 Jun 2020 16:24:54 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1124C2168B Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id A31588D00DC; Fri, 19 Jun 2020 12:24:35 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 98D238D00D8; Fri, 19 Jun 2020 12:24:35 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 744DD8D00DB; Fri, 19 Jun 2020 12:24:35 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0208.hostedemail.com [216.40.44.208]) by kanga.kvack.org (Postfix) with ESMTP id 417248D00D6 for ; Fri, 19 Jun 2020 12:24:35 -0400 (EDT) Received: from smtpin30.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 
045F5869A3 for ; Fri, 19 Jun 2020 16:24:35 +0000 (UTC) X-FDA: 76946484510.30.corn88_16160eb26e1a Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin30.hostedemail.com (Postfix) with ESMTP id CD8F6180A25D7 for ; Fri, 19 Jun 2020 16:24:34 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,ben.widawsky@intel.com,,RULES_HIT:30054:30064,0,RBL:192.55.52.93:@intel.com:.lbl8.mailshell.net-62.50.0.100 64.95.201.95,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:25,LUA_SUMMARY:none X-HE-Tag: corn88_16160eb26e1a X-Filterd-Recvd-Size: 5221 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by imf50.hostedemail.com (Postfix) with ESMTP for ; Fri, 19 Jun 2020 16:24:33 +0000 (UTC) IronPort-SDR: seroeqbL4igBN6fGUgMG1clFe6vr4JamHU+q+CIIt5IvQcg9n+ieo1hBwV9jAjM7RZXDIUNY8K 0jENCcCzC/ZQ== X-IronPort-AV: E=McAfee;i="6000,8403,9657"; a="141280159" X-IronPort-AV: E=Sophos;i="5.75,255,1589266800"; d="scan'208";a="141280159" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Jun 2020 09:24:30 -0700 IronPort-SDR: FO8s2akDwN7AEB9ZPgLdIPArtd7lqjXX+oAef5sfRlD4YGZDMEKn9oaB6KaXq4tIduI4UUlTCw DuOsVJmx7SgA== X-IronPort-AV: E=Sophos;i="5.75,255,1589266800"; d="scan'208";a="264368219" Received: from sjiang-mobl2.ccr.corp.intel.com (HELO bwidawsk-mobl5.local) ([10.252.131.131]) by orsmga007-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Jun 2020 09:24:30 -0700 From: Ben Widawsky To: linux-mm Cc: Ben Widawsky , Andrew Morton , Dan Williams , Dave Hansen , Mel Gorman Subject: [PATCH 09/18] mm: Finish handling MPOL_PREFERRED_MANY Date: Fri, 19 Jun 2020 09:24:16 -0700 Message-Id: <20200619162425.1052382-10-ben.widawsky@intel.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20200619162425.1052382-1-ben.widawsky@intel.com> References: <20200619162425.1052382-1-ben.widawsky@intel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: CD8F6180A25D7 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam01 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Now that there is a function to generate the preferred zonelist given a preferred mask, bindmask, and flags it is possible to support MPOL_PREFERRED_MANY policy easily in more places. This patch was developed on top of Dave's original work. When Dave wrote his patches there was no clean way to implement MPOL_PREFERRED_MANY. Now that the other bits are in place, this is easy to drop on top. 
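As a rough usage sketch (the preferred_zonelist() and first_zones_zonelist() calls follow the hunks below; the wrapper function itself is hypothetical), resolving an MPOL_PREFERRED_MANY policy down to a single best node looks like this:

/*
 * Sketch: pick the "best" node for a MPOL_PREFERRED_MANY policy by
 * building a zonelist from the preferred mask and taking the first
 * eligible zone, falling back to the local node if nothing matches.
 */
static int example_best_node(struct mempolicy *pol, gfp_t gfp)
{
	struct zonelist *zonelist;
	struct zoneref *z;

	zonelist = preferred_zonelist(gfp, &pol->v.preferred_nodes, NULL);
	z = first_zones_zonelist(zonelist, gfp_zone(gfp),
				 &pol->v.preferred_nodes);

	return z->zone ? zone_to_nid(z->zone) : numa_mem_id();
}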
Cc: Andrew Morton Cc: Dan Williams Cc: Dave Hansen Cc: Mel Gorman Signed-off-by: Ben Widawsky --- include/linux/mmzone.h | 3 +++ mm/mempolicy.c | 20 ++++++++++++++++++-- mm/page_alloc.c | 5 ++--- 3 files changed, 23 insertions(+), 5 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index c4c37fd12104..6b62ee98bb96 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1001,6 +1001,9 @@ struct zoneref *__next_zones_zonelist(struct zoneref *z, enum zone_type highest_zoneidx, nodemask_t *nodes); +struct zonelist *preferred_zonelist(gfp_t gfp_mask, const nodemask_t *prefmask, + const nodemask_t *bindmask); + /** * next_zones_zonelist - Returns the next zone at or below highest_zoneidx within the allowed nodemask using a cursor within a zonelist as a starting point * @z - The cursor used as a starting point for the search diff --git a/mm/mempolicy.c b/mm/mempolicy.c index bfc4ef2af90d..90bc9c93b1b9 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -1995,7 +1995,6 @@ unsigned int mempolicy_slab_node(void) return node; switch (policy->mode) { - case MPOL_PREFERRED_MANY: case MPOL_PREFERRED: /* * handled MPOL_F_LOCAL above @@ -2020,6 +2019,18 @@ unsigned int mempolicy_slab_node(void) return z->zone ? zone_to_nid(z->zone) : node; } + case MPOL_PREFERRED_MANY: { + struct zoneref *z; + struct zonelist *zonelist; + enum zone_type highest_zoneidx = gfp_zone(GFP_KERNEL); + + zonelist = preferred_zonelist(GFP_KERNEL, + &policy->v.preferred_nodes, NULL); + z = first_zones_zonelist(zonelist, highest_zoneidx, + &policy->v.nodes); + return z->zone ? zone_to_nid(z->zone) : node; + } + default: BUG(); } @@ -2585,7 +2596,12 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long polnid = zone_to_nid(z->zone); break; - /* case MPOL_PREFERRED_MANY: */ + case MPOL_PREFERRED_MANY: + z = first_zones_zonelist(preferred_zonelist(GFP_HIGHUSER, + &pol->v.preferred_nodes, NULL), + gfp_zone(GFP_HIGHUSER), &pol->v.preferred_nodes); + polnid = zone_to_nid(z->zone); + break; default: BUG(); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 3cf44b6c31ae..c6f8f112a5d4 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -4861,9 +4861,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, * NB: That zonelist will have *all* zones in the fallback case, and not all of * those zones will belong to preferred nodes. 
*/ -static struct zonelist *preferred_zonelist(gfp_t gfp_mask, - const nodemask_t *prefmask, - const nodemask_t *bindmask) +struct zonelist *preferred_zonelist(gfp_t gfp_mask, const nodemask_t *prefmask, + const nodemask_t *bindmask) { nodemask_t pref; int nid, local_node = numa_mem_id(); From patchwork Fri Jun 19 16:24:17 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Widawsky X-Patchwork-Id: 11614609 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3017C90 for ; Fri, 19 Jun 2020 16:25:00 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 072662168B for ; Fri, 19 Jun 2020 16:25:00 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 072662168B Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 1CD6C8D00DD; Fri, 19 Jun 2020 12:24:36 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id F12C18D00D8; Fri, 19 Jun 2020 12:24:35 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A2F158D00DB; Fri, 19 Jun 2020 12:24:35 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0198.hostedemail.com [216.40.44.198]) by kanga.kvack.org (Postfix) with ESMTP id 59B0B8D00D9 for ; Fri, 19 Jun 2020 12:24:35 -0400 (EDT) Received: from smtpin11.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 17D838248068 for ; Fri, 19 Jun 2020 16:24:35 +0000 (UTC) X-FDA: 76946484510.11.bait94_3412dbc26e1a Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin11.hostedemail.com (Postfix) with ESMTP id D0A24180F8B81 for ; Fri, 19 Jun 2020 16:24:34 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,ben.widawsky@intel.com,,RULES_HIT:30054:30064,0,RBL:192.55.52.93:@intel.com:.lbl8.mailshell.net-64.95.201.95 62.50.0.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:2,LUA_SUMMARY:none X-HE-Tag: bait94_3412dbc26e1a X-Filterd-Recvd-Size: 4035 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by imf10.hostedemail.com (Postfix) with ESMTP for ; Fri, 19 Jun 2020 16:24:34 +0000 (UTC) IronPort-SDR: /rfga6gXLxwyKcDwp2xWvWQHOmEUNFkX4KFo08/fd2pV1qz8adFmWVm3ey6t1wvnI2NefHkEes qqRF5XTXp6ow== X-IronPort-AV: E=McAfee;i="6000,8403,9657"; a="141280161" X-IronPort-AV: E=Sophos;i="5.75,255,1589266800"; d="scan'208";a="141280161" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Jun 2020 09:24:30 -0700 IronPort-SDR: CIS2KgrKNmpUS6tATDa22F3Iyecz0H33xs2Os1+F42E7/VBfWybMoMK9CqxNo6WeZE6KHYpury AMfRDuReq+OA== X-IronPort-AV: E=Sophos;i="5.75,255,1589266800"; d="scan'208";a="264368267" Received: from sjiang-mobl2.ccr.corp.intel.com (HELO bwidawsk-mobl5.local) ([10.252.131.131]) by orsmga007-auth.jf.intel.com with 
ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Jun 2020 09:24:30 -0700 From: Ben Widawsky To: linux-mm Cc: Ben Widawsky , Andrew Morton , David Rientjes , Dave Hansen Subject: [PATCH 10/18] mm: clean up alloc_pages_vma (thp) Date: Fri, 19 Jun 2020 09:24:17 -0700 Message-Id: <20200619162425.1052382-11-ben.widawsky@intel.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20200619162425.1052382-1-ben.widawsky@intel.com> References: <20200619162425.1052382-1-ben.widawsky@intel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: D0A24180F8B81 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam03 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: __alloc_pages_nodemask() already does the right thing for a preferred node and bind nodemask. Calling it directly allows us to simplify much of this. The handling occurs in prepare_alloc_pages() A VM assertion is added to prove correctness. Cc: Andrew Morton Cc: David Rientjes Cc: Dave Hansen Signed-off-by: Ben Widawsky --- mm/mempolicy.c | 40 +++++++++++++++++++++------------------- 1 file changed, 21 insertions(+), 19 deletions(-) diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 90bc9c93b1b9..408ba78c8424 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -2293,27 +2293,29 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma, hpage_node = first_node(pol->v.preferred_nodes); nmask = policy_nodemask(gfp, pol); - if (!nmask || node_isset(hpage_node, *nmask)) { - mpol_cond_put(pol); - /* - * First, try to allocate THP only on local node, but - * don't reclaim unnecessarily, just compact. - */ - page = __alloc_pages_node(hpage_node, - gfp | __GFP_THISNODE | __GFP_NORETRY, order); + mpol_cond_put(pol); - /* - * If hugepage allocations are configured to always - * synchronous compact or the vma has been madvised - * to prefer hugepage backing, retry allowing remote - * memory with both reclaim and compact as well. - */ - if (!page && (gfp & __GFP_DIRECT_RECLAIM)) - page = __alloc_pages_node(hpage_node, - gfp, order); + /* + * First, try to allocate THP only on local node, but + * don't reclaim unnecessarily, just compact. + */ + page = __alloc_pages_nodemask(gfp | __GFP_THISNODE | + __GFP_NORETRY, + order, hpage_node, nmask); - goto out; - } + /* + * If hugepage allocations are configured to always synchronous + * compact or the vma has been madvised to prefer hugepage + * backing, retry allowing remote memory with both reclaim and + * compact as well. 
+ */ + if (!page && (gfp & __GFP_DIRECT_RECLAIM)) + page = __alloc_pages_nodemask(gfp, order, hpage_node, + nmask); + + VM_BUG_ON(page && nmask && + !node_isset(page_to_nid(page), *nmask)); + goto out; } nmask = policy_nodemask(gfp, pol); From patchwork Fri Jun 19 16:24:18 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Widawsky X-Patchwork-Id: 11614653 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3B53490 for ; Fri, 19 Jun 2020 16:25:46 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 12A8C217A0 for ; Fri, 19 Jun 2020 16:25:46 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 12A8C217A0 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 261D98D00EC; Fri, 19 Jun 2020 12:24:53 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 1EC318D00E9; Fri, 19 Jun 2020 12:24:53 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 068218D00EC; Fri, 19 Jun 2020 12:24:52 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0197.hostedemail.com [216.40.44.197]) by kanga.kvack.org (Postfix) with ESMTP id DEE338D00E9 for ; Fri, 19 Jun 2020 12:24:52 -0400 (EDT) Received: from smtpin13.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 9FBEF15BEC2 for ; Fri, 19 Jun 2020 16:24:52 +0000 (UTC) X-FDA: 76946485224.13.steam84_1b046b326e1a Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin13.hostedemail.com (Postfix) with ESMTP id A245B181411C5 for ; Fri, 19 Jun 2020 16:24:36 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,ben.widawsky@intel.com,,RULES_HIT:30012:30054:30064:30070,0,RBL:192.55.52.93:@intel.com:.lbl8.mailshell.net-62.50.0.100 64.95.201.95,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:1,LUA_SUMMARY:none X-HE-Tag: steam84_1b046b326e1a X-Filterd-Recvd-Size: 6540 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by imf50.hostedemail.com (Postfix) with ESMTP for ; Fri, 19 Jun 2020 16:24:35 +0000 (UTC) IronPort-SDR: 8vMb2O253zWB3TDqVyfe/+SGIn62EZZi7EmVvURHzpHTo9M6V60S13Ki4ysUAQMnrmV1kBqmrD +EP6zvYa2SEQ== X-IronPort-AV: E=McAfee;i="6000,8403,9657"; a="141280165" X-IronPort-AV: E=Sophos;i="5.75,255,1589266800"; d="scan'208";a="141280165" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Jun 2020 09:24:31 -0700 IronPort-SDR: nx13Rsqul0tYp3FcKnRS1dWP4vZKJT58vvpM/bohsOEdFL+zQS2ZIkYmwpbgXMW/7b1OeF2JQx utbzQgMz4NPQ== X-IronPort-AV: E=Sophos;i="5.75,255,1589266800"; d="scan'208";a="264368298" Received: from sjiang-mobl2.ccr.corp.intel.com (HELO bwidawsk-mobl5.local) ([10.252.131.131]) by orsmga007-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Jun 2020 09:24:30 -0700 From: 
Ben Widawsky To: linux-mm Cc: Ben Widawsky , Andrew Morton , Dave Hansen , Michal Hocko Subject: [PATCH 11/18] mm: Extract THP hugepage allocation Date: Fri, 19 Jun 2020 09:24:18 -0700 Message-Id: <20200619162425.1052382-12-ben.widawsky@intel.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20200619162425.1052382-1-ben.widawsky@intel.com> References: <20200619162425.1052382-1-ben.widawsky@intel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: A245B181411C5 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam02 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The next patch is going to rework this code to support MPOL_PREFERRED_MANY. This refactor makes the that change much more readable. After the extraction, the resulting code makes it apparent that this can be converted to a simple if ladder and thus allows removing the goto. There is not meant to be any functional or behavioral changes. Note that still at this point MPOL_PREFERRED_MANY isn't specially handled for huge pages. Cc: Andrew Morton Cc: Dave Hansen Cc: Michal Hocko Signed-off-by: Ben Widawsky --- mm/mempolicy.c | 96 ++++++++++++++++++++++++++------------------------ 1 file changed, 49 insertions(+), 47 deletions(-) diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 408ba78c8424..3ce2354fed44 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -2232,6 +2232,48 @@ static struct page *alloc_page_interleave(gfp_t gfp, unsigned order, return page; } +static struct page *alloc_pages_vma_thp(gfp_t gfp, struct mempolicy *pol, + int order, int node) +{ + nodemask_t *nmask; + struct page *page; + int hpage_node = node; + + /* + * For hugepage allocation and non-interleave policy which allows the + * current node (or other explicitly preferred node) we only try to + * allocate from the current/preferred node and don't fall back to other + * nodes, as the cost of remote accesses would likely offset THP + * benefits. + * + * If the policy is interleave or multiple preferred nodes, or does not + * allow the current node in its nodemask, we allocate the standard way. + */ + if (pol->mode == MPOL_PREFERRED && !(pol->flags & MPOL_F_LOCAL)) + hpage_node = first_node(pol->v.preferred_nodes); + + nmask = policy_nodemask(gfp, pol); + + /* + * First, try to allocate THP only on local node, but don't reclaim + * unnecessarily, just compact. + */ + page = __alloc_pages_nodemask(gfp | __GFP_THISNODE | __GFP_NORETRY, + order, hpage_node, nmask); + + /* + * If hugepage allocations are configured to always synchronous compact + * or the vma has been madvised to prefer hugepage backing, retry + * allowing remote memory with both reclaim and compact as well. + */ + if (!page && (gfp & __GFP_DIRECT_RECLAIM)) + page = __alloc_pages_nodemask(gfp, order, hpage_node, nmask); + + VM_BUG_ON(page && nmask && !node_isset(page_to_nid(page), *nmask)); + + return page; +} + /** * alloc_pages_vma - Allocate a page for a VMA. 
* @@ -2272,57 +2314,17 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma, nid = interleave_nid(pol, vma, addr, PAGE_SHIFT + order); mpol_cond_put(pol); page = alloc_page_interleave(gfp, order, nid); - goto out; - } - - if (unlikely(IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && hugepage)) { - int hpage_node = node; - - /* - * For hugepage allocation and non-interleave policy which - * allows the current node (or other explicitly preferred - * node) we only try to allocate from the current/preferred - * node and don't fall back to other nodes, as the cost of - * remote accesses would likely offset THP benefits. - * - * If the policy is interleave or multiple preferred nodes, or - * does not allow the current node in its nodemask, we allocate - * the standard way. - */ - if (pol->mode == MPOL_PREFERRED && !(pol->flags & MPOL_F_LOCAL)) - hpage_node = first_node(pol->v.preferred_nodes); - + } else if (unlikely(IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && + hugepage)) { + page = alloc_pages_vma_thp(gfp, pol, order, node); + mpol_cond_put(pol); + } else { nmask = policy_nodemask(gfp, pol); + preferred_nid = policy_node(gfp, pol, node); + page = __alloc_pages_nodemask(gfp, order, preferred_nid, nmask); mpol_cond_put(pol); - - /* - * First, try to allocate THP only on local node, but - * don't reclaim unnecessarily, just compact. - */ - page = __alloc_pages_nodemask(gfp | __GFP_THISNODE | - __GFP_NORETRY, - order, hpage_node, nmask); - - /* - * If hugepage allocations are configured to always synchronous - * compact or the vma has been madvised to prefer hugepage - * backing, retry allowing remote memory with both reclaim and - * compact as well. - */ - if (!page && (gfp & __GFP_DIRECT_RECLAIM)) - page = __alloc_pages_nodemask(gfp, order, hpage_node, - nmask); - - VM_BUG_ON(page && nmask && - !node_isset(page_to_nid(page), *nmask)); - goto out; } - nmask = policy_nodemask(gfp, pol); - preferred_nid = policy_node(gfp, pol, node); - page = __alloc_pages_nodemask(gfp, order, preferred_nid, nmask); - mpol_cond_put(pol); -out: return page; } EXPORT_SYMBOL(alloc_pages_vma); From patchwork Fri Jun 19 16:24:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Widawsky X-Patchwork-Id: 11614619 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9549390 for ; Fri, 19 Jun 2020 16:25:08 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 6C9F52168B for ; Fri, 19 Jun 2020 16:25:08 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6C9F52168B Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id CC3168D00DB; Fri, 19 Jun 2020 12:24:36 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id C26208D00DF; Fri, 19 Jun 2020 12:24:36 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 969D28D00DB; Fri, 19 Jun 2020 12:24:36 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0106.hostedemail.com [216.40.44.106]) by kanga.kvack.org 
(Postfix) with ESMTP id 66EC58D00D8 for ; Fri, 19 Jun 2020 12:24:36 -0400 (EDT) Received: from smtpin16.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 2688154C88 for ; Fri, 19 Jun 2020 16:24:36 +0000 (UTC) X-FDA: 76946484552.16.kiss71_380725e26e1a Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin16.hostedemail.com (Postfix) with ESMTP id E877C1017D44D for ; Fri, 19 Jun 2020 16:24:35 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,ben.widawsky@intel.com,,RULES_HIT:30051:30054:30064:30070,0,RBL:192.55.52.93:@intel.com:.lbl8.mailshell.net-62.50.0.100 64.95.201.95,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: kiss71_380725e26e1a X-Filterd-Recvd-Size: 3019 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by imf10.hostedemail.com (Postfix) with ESMTP for ; Fri, 19 Jun 2020 16:24:35 +0000 (UTC) IronPort-SDR: S0Ta899fTfFqnpPii72biU4siTfPFv6y/4kFegRxdcNNxqjxLkOJ75LuO8piqEmQs4+F3Zgfv3 RsFLjacFTccA== X-IronPort-AV: E=McAfee;i="6000,8403,9657"; a="141280167" X-IronPort-AV: E=Sophos;i="5.75,255,1589266800"; d="scan'208";a="141280167" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Jun 2020 09:24:31 -0700 IronPort-SDR: XJJALdrAq7Nipe1KewxFUg58dqI6sy2ibf/soOtR7y4AcACSnarWM1Utq0z2rzESMKha4iAHVL GnF/BBAC+v7A== X-IronPort-AV: E=Sophos;i="5.75,255,1589266800"; d="scan'208";a="264368338" Received: from sjiang-mobl2.ccr.corp.intel.com (HELO bwidawsk-mobl5.local) ([10.252.131.131]) by orsmga007-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Jun 2020 09:24:31 -0700 From: Ben Widawsky To: linux-mm Cc: Ben Widawsky , Andrew Morton , Vlastimil Babka Subject: [PATCH 12/18] mm/mempolicy: Use __alloc_page_node for interleaved Date: Fri, 19 Jun 2020 09:24:19 -0700 Message-Id: <20200619162425.1052382-13-ben.widawsky@intel.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20200619162425.1052382-1-ben.widawsky@intel.com> References: <20200619162425.1052382-1-ben.widawsky@intel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: E877C1017D44D X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam04 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: This helps reduce the consumers of the interface and get us in better shape to clean up some of the low level page allocation routines. The goal in doing that is to eventually limit the places we'll need to declare nodemask_t variables on the stack (more on that later). Currently the only distinction between __alloc_pages_node and __alloc_pages is that the former does sanity checks on the gfp flags and the nid. In the case of interleave nodes, this isn't necessary because the caller has already figured out the right nid and flags with interleave_nodes(), This kills the only real user of __alloc_pages, which can then be removed later. 
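For reference, the only difference being relied on here is the pair of sanity checks in __alloc_pages_node(); at this point in the series the helper is roughly the following (per the gfp.h hunk visible in a later patch):

/*
 * Current helper, shown for context: identical to __alloc_pages() apart
 * from the nid/gfp sanity checks, which interleave_nodes() already
 * guarantees will pass.
 */
static inline struct page *
__alloc_pages_node(int nid, gfp_t gfp_mask, unsigned int order)
{
	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
	VM_WARN_ON((gfp_mask & __GFP_THISNODE) && !node_online(nid));

	return __alloc_pages(gfp_mask, order, nid);
}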
Cc: Andrew Morton Cc: Vlastimil Babka Signed-off-by: Ben Widawsky --- mm/mempolicy.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 3ce2354fed44..eb2520d68a04 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -2220,7 +2220,7 @@ static struct page *alloc_page_interleave(gfp_t gfp, unsigned order, { struct page *page; - page = __alloc_pages(gfp, order, nid); + page = __alloc_pages_node(nid, gfp, order); /* skip NUMA_INTERLEAVE_HIT counter update if numa stats is disabled */ if (!static_branch_likely(&vm_numa_stat_key)) return page; From patchwork Fri Jun 19 16:24:20 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Widawsky X-Patchwork-Id: 11614621 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D3F3B90 for ; Fri, 19 Jun 2020 16:25:10 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id AAEEA218AC for ; Fri, 19 Jun 2020 16:25:10 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org AAEEA218AC Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 1A3B98D00DF; Fri, 19 Jun 2020 12:24:37 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 111758D00DE; Fri, 19 Jun 2020 12:24:37 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D5C188D00E0; Fri, 19 Jun 2020 12:24:36 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0101.hostedemail.com [216.40.44.101]) by kanga.kvack.org (Postfix) with ESMTP id A74B08D00D8 for ; Fri, 19 Jun 2020 12:24:36 -0400 (EDT) Received: from smtpin21.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 6564D180AD81A for ; Fri, 19 Jun 2020 16:24:36 +0000 (UTC) X-FDA: 76946484552.21.boats94_561165026e1a Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin21.hostedemail.com (Postfix) with ESMTP id 403EC180442CB for ; Fri, 19 Jun 2020 16:24:36 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,ben.widawsky@intel.com,,RULES_HIT:30012:30054:30064:30091,0,RBL:192.55.52.93:@intel.com:.lbl8.mailshell.net-64.95.201.95 62.50.0.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:1,LUA_SUMMARY:none X-HE-Tag: boats94_561165026e1a X-Filterd-Recvd-Size: 4308 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by imf44.hostedemail.com (Postfix) with ESMTP for ; Fri, 19 Jun 2020 16:24:35 +0000 (UTC) IronPort-SDR: Ex5ldL4pD5mxg1OfaTbcnjLeQ+Uk5H5jrMcB1U1gbnWUmMUPOORRxHZGoJa/T1oE7qFkDlKNYf svLLioX43ciQ== X-IronPort-AV: E=McAfee;i="6000,8403,9657"; a="141280170" X-IronPort-AV: E=Sophos;i="5.75,255,1589266800"; d="scan'208";a="141280170" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Jun 2020 09:24:31 -0700 IronPort-SDR: 
GGwQLHAF1ILWCYkvG/muppFE/J8O/fExJvl2irhTRwMgAczaKATR80h+Wj9jzRUl4U2+0usKix GUF8z7udl6HQ== X-IronPort-AV: E=Sophos;i="5.75,255,1589266800"; d="scan'208";a="264368380" Received: from sjiang-mobl2.ccr.corp.intel.com (HELO bwidawsk-mobl5.local) ([10.252.131.131]) by orsmga007-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Jun 2020 09:24:31 -0700 From: Ben Widawsky To: linux-mm Cc: Ben Widawsky , Andrew Morton , Michal Hocko Subject: [PATCH 13/18] mm: kill __alloc_pages Date: Fri, 19 Jun 2020 09:24:20 -0700 Message-Id: <20200619162425.1052382-14-ben.widawsky@intel.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20200619162425.1052382-1-ben.widawsky@intel.com> References: <20200619162425.1052382-1-ben.widawsky@intel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: 403EC180442CB X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam01 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: IMPORTANT NOTE: It's unclear how safe it is to declare nodemask_t on the stack, when nodemask_t can be relatively large in huge NUMA systems. Upcoming patches will try to limit this. The primary purpose of this patch is to clear up which interfaces should be used for page allocation. There are several attributes in page allocation after the obvious gfp and order: 1. node mask: set of nodes to try to allocate from, fail if unavailable 2. preferred nid: a preferred node to try to allocate from, falling back to node mask if unavailable 3. (soon) preferred mask: like preferred nid, but multiple nodes. Here's a summary of the existing interfaces, and which they cover *alloc_pages: () *alloc_pages_node: (2) __alloc_pages_nodemask: (1,2,3) I am instead proposing instead the following interfaces as a reasonable set. Generally node binding isn't used by kernel code, it's only used for mempolicy. On the other hand, the kernel does have preferred nodes (today it's only one), and that is why those interfaces exist while an interface to specify binding does not. alloc_pages: () I don't care, give me pages. 
alloc_pages_node: (2) I want pages from this particular node first alloc_pages_nodes: (3) I want pages from *these* nodes first __alloc_pages_nodemask: (1,2,3) I'm picky about my pages Cc: Andrew Morton Cc: Michal Hocko Signed-off-by: Ben Widawsky --- include/linux/gfp.h | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 67a0774e080b..9ab5c07579bd 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -504,9 +504,10 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid, nodemask_t *nodemask); static inline struct page * -__alloc_pages(gfp_t gfp_mask, unsigned int order, int preferred_nid) +__alloc_pages_nodes(nodemask_t *nodes, gfp_t gfp_mask, unsigned int order) { - return __alloc_pages_nodemask(gfp_mask, order, preferred_nid, NULL); + return __alloc_pages_nodemask(gfp_mask, order, first_node(*nodes), + NULL); } /* @@ -516,10 +517,12 @@ __alloc_pages(gfp_t gfp_mask, unsigned int order, int preferred_nid) static inline struct page * __alloc_pages_node(int nid, gfp_t gfp_mask, unsigned int order) { + nodemask_t tmp; VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES); VM_WARN_ON((gfp_mask & __GFP_THISNODE) && !node_online(nid)); - return __alloc_pages(gfp_mask, order, nid); + tmp = nodemask_of_node(nid); + return __alloc_pages_nodes(&tmp, gfp_mask, order); } /* From patchwork Fri Jun 19 16:24:21 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Widawsky X-Patchwork-Id: 11614627 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B26BF90 for ; Fri, 19 Jun 2020 16:25:15 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 7F8142067D for ; Fri, 19 Jun 2020 16:25:15 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7F8142067D Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id B8D3A8D00E0; Fri, 19 Jun 2020 12:24:37 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id A7A038D00DE; Fri, 19 Jun 2020 12:24:37 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8CA248D00E0; Fri, 19 Jun 2020 12:24:37 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 660BB8D00DE for ; Fri, 19 Jun 2020 12:24:37 -0400 (EDT) Received: from smtpin16.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 20AB615B0C2 for ; Fri, 19 Jun 2020 16:24:37 +0000 (UTC) X-FDA: 76946484594.16.land58_3b086e926e1a Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin16.hostedemail.com (Postfix) with ESMTP id DD32F10059970 for ; Fri, 19 Jun 2020 16:24:36 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,ben.widawsky@intel.com,,RULES_HIT:30012:30054:30064:30080,0,RBL:192.55.52.93:@intel.com:.lbl8.mailshell.net-64.95.201.95 62.50.0.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not 
bulk,SPF:ft,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:1,LUA_SUMMARY:none X-HE-Tag: land58_3b086e926e1a X-Filterd-Recvd-Size: 5365 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by imf10.hostedemail.com (Postfix) with ESMTP for ; Fri, 19 Jun 2020 16:24:36 +0000 (UTC) IronPort-SDR: O7N+5YJiXSLgcuWJobuXPcMrUSJ2JhVU6Fyw1H9lKmbJi5TSym9EnjFeCzYJ/ztxfbLHE5E3PH b9MmgwpHF7CA== X-IronPort-AV: E=McAfee;i="6000,8403,9657"; a="141280176" X-IronPort-AV: E=Sophos;i="5.75,255,1589266800"; d="scan'208";a="141280176" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Jun 2020 09:24:32 -0700 IronPort-SDR: ejqW/jBzApwb9uy3Z45iajkUTb3mOntknlN+idLtjDeLrqHbc0Of2uNQcckcxZFV2xwjErZare 3DHJC/kXPruw== X-IronPort-AV: E=Sophos;i="5.75,255,1589266800"; d="scan'208";a="264368420" Received: from sjiang-mobl2.ccr.corp.intel.com (HELO bwidawsk-mobl5.local) ([10.252.131.131]) by orsmga007-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Jun 2020 09:24:31 -0700 From: Ben Widawsky To: linux-mm Cc: Ben Widawsky , Andrew Morton , Dave Hansen , Li Xinhai , Michal Hocko , Vlastimil Babka Subject: [PATCH 14/18] mm/mempolicy: Introduce policy_preferred_nodes() Date: Fri, 19 Jun 2020 09:24:21 -0700 Message-Id: <20200619162425.1052382-15-ben.widawsky@intel.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20200619162425.1052382-1-ben.widawsky@intel.com> References: <20200619162425.1052382-1-ben.widawsky@intel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: DD32F10059970 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam01 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Current code provides a policy_node() helper which given a preferred node, flags, and policy will help determine the preferred node. Going forward it is desirable to have this same functionality given a set of nodes, rather than a single node. policy_node is then implemented in terms of the now more generic policy_preferred_nodes. I went back and forth as to whether this function should take in a set of preferred nodes and modify that. Something like: policy_preferred_nodes(gfp, *policy, *mask); That idea was nice as it allowed the policy function to create the mask to be used. Ultimately, it turns out callers don't need such fanciness, and those callers would use this mask directly in page allocation functions that can accept NULL for a preference mask. So having this function return NULL when there is no ideal mask turns out to be beneficial. Cc: Andrew Morton Cc: Dave Hansen Cc: Li Xinhai Cc: Michal Hocko Cc: Vlastimil Babka Signed-off-by: Ben Widawsky --- mm/mempolicy.c | 57 +++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 47 insertions(+), 10 deletions(-) diff --git a/mm/mempolicy.c b/mm/mempolicy.c index eb2520d68a04..3c48f299d344 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -1946,24 +1946,61 @@ static nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy) return NULL; } -/* Return the node id preferred by the given mempolicy, or the given id */ -static int policy_node(gfp_t gfp, struct mempolicy *policy, - int nd) +/* + * Returns a nodemask to be used for preference if the given policy dictates. 
+ * Otherwise, returns NULL and the caller should likely use + * nodemask_of_node(numa_mem_id()); + */ +static nodemask_t *policy_preferred_nodes(gfp_t gfp, struct mempolicy *policy) { - if ((policy->mode == MPOL_PREFERRED || - policy->mode == MPOL_PREFERRED_MANY) && - !(policy->flags & MPOL_F_LOCAL)) { - nd = first_node(policy->v.preferred_nodes); - } else { + nodemask_t *pol_pref = &policy->v.preferred_nodes; + + /* + * There are 2 "levels" of policy. What the callers asked for + * (prefmask), and what the memory policy should be for the given gfp. + * The memory policy takes preference in the case that prefmask isn't a + * subset of the mem policy. + */ + switch (policy->mode) { + case MPOL_PREFERRED: + /* local, or buggy policy */ + if (policy->flags & MPOL_F_LOCAL || + WARN_ON(nodes_weight(*pol_pref) != 1)) + return NULL; + else + return pol_pref; + break; + case MPOL_PREFERRED_MANY: + if (WARN_ON(nodes_weight(*pol_pref) == 0)) + return NULL; + else + return pol_pref; + break; + default: + case MPOL_INTERLEAVE: + case MPOL_BIND: /* * __GFP_THISNODE shouldn't even be used with the bind policy * because we might easily break the expectation to stay on the * requested node and not break the policy. */ - WARN_ON_ONCE(policy->mode == MPOL_BIND && (gfp & __GFP_THISNODE)); + WARN_ON_ONCE(gfp & __GFP_THISNODE); + break; } - return nd; + return NULL; +} + +/* Return the node id preferred by the given mempolicy, or the given id */ +static int policy_node(gfp_t gfp, struct mempolicy *policy, int nd) +{ + nodemask_t *tmp; + + tmp = policy_preferred_nodes(gfp, policy); + if (tmp) + return first_node(*tmp); + else + return nd; } /* Do dynamic interleaving for a process */ From patchwork Fri Jun 19 16:24:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Widawsky X-Patchwork-Id: 11614629 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D1BCB90 for ; Fri, 19 Jun 2020 16:25:17 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id A8F002067D for ; Fri, 19 Jun 2020 16:25:17 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A8F002067D Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 1517C8D00DE; Fri, 19 Jun 2020 12:24:38 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 0B6818D00E1; Fri, 19 Jun 2020 12:24:38 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E704A8D00DE; Fri, 19 Jun 2020 12:24:37 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0222.hostedemail.com [216.40.44.222]) by kanga.kvack.org (Postfix) with ESMTP id BDE128D00E1 for ; Fri, 19 Jun 2020 12:24:37 -0400 (EDT) Received: from smtpin22.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 80165180AD806 for ; Fri, 19 Jun 2020 16:24:37 +0000 (UTC) X-FDA: 76946484594.22.uncle58_5b059af26e1a Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin22.hostedemail.com 
(Postfix) with ESMTP id 501931809F4FA for ; Fri, 19 Jun 2020 16:24:37 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,ben.widawsky@intel.com,,RULES_HIT:30054:30064,0,RBL:192.55.52.93:@intel.com:.lbl8.mailshell.net-64.95.201.95 62.50.0.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:1,LUA_SUMMARY:none X-HE-Tag: uncle58_5b059af26e1a X-Filterd-Recvd-Size: 6621 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by imf44.hostedemail.com (Postfix) with ESMTP for ; Fri, 19 Jun 2020 16:24:36 +0000 (UTC) IronPort-SDR: ML57aXtcMravLn2r7KWGd7Gl7vWkb1NUOvC4emlFJWBDI3exKdDsyTesG4p7a6gLppIX51vSoT T61zgHkPk5pw== X-IronPort-AV: E=McAfee;i="6000,8403,9657"; a="141280181" X-IronPort-AV: E=Sophos;i="5.75,255,1589266800"; d="scan'208";a="141280181" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Jun 2020 09:24:32 -0700 IronPort-SDR: v6DnmVNH9PS/6lwBBLaZPF/6YMXDxcsISHZJr805IJNbB4ngx/v73B3fX8nhZqiTHdozQwebEw w19zfihkAZNg== X-IronPort-AV: E=Sophos;i="5.75,255,1589266800"; d="scan'208";a="264368456" Received: from sjiang-mobl2.ccr.corp.intel.com (HELO bwidawsk-mobl5.local) ([10.252.131.131]) by orsmga007-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Jun 2020 09:24:32 -0700 From: Ben Widawsky To: linux-mm Cc: Ben Widawsky , Andrew Morton , Dave Hansen , Mike Kravetz , Mina Almasry , Vlastimil Babka Subject: [PATCH 15/18] mm: convert callers of __alloc_pages_nodemask to pmask Date: Fri, 19 Jun 2020 09:24:22 -0700 Message-Id: <20200619162425.1052382-16-ben.widawsky@intel.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20200619162425.1052382-1-ben.widawsky@intel.com> References: <20200619162425.1052382-1-ben.widawsky@intel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: 501931809F4FA X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam04 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Now that the infrastructure is in place to both select, and allocate a set of preferred nodes as specified by policy (or perhaps in the future, the calling function), start transitioning over functions that can benefit from this. This patch looks stupid. It seems to artificially insert a nodemask on the stack, then just use the first node from that mask - in other words, a nop just adding overhead. It does. The reason for this is it's a preparatory patch for when we switch over to __alloc_pages_nodemask() to using a mask for preferences. This helps with readability and bisectability. 
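The transitional pattern, as a minimal sketch (hypothetical caller; __alloc_pages_nodemask() still has its pre-series signature taking a preferred nid):

/*
 * Sketch of the preparatory shape: a single-node preference mask is
 * built on the stack, but only its first node is handed down until a
 * later patch teaches __alloc_pages_nodemask() to take the whole mask.
 */
static struct page *example_alloc(gfp_t gfp, unsigned int order, int nid,
				  nodemask_t *bindmask)
{
	nodemask_t pmask = nodemask_of_node(nid);

	return __alloc_pages_nodemask(gfp, order, first_node(pmask), bindmask);
}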
Cc: Andrew Morton Cc: Dave Hansen Cc: Mike Kravetz Cc: Mina Almasry Cc: Vlastimil Babka Signed-off-by: Ben Widawsky --- mm/hugetlb.c | 11 ++++++++--- mm/mempolicy.c | 38 +++++++++++++++++++++++--------------- 2 files changed, 31 insertions(+), 18 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 57ece74e3aae..71b6750661df 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1687,6 +1687,12 @@ static struct page *alloc_buddy_huge_page(struct hstate *h, int order = huge_page_order(h); struct page *page; bool alloc_try_hard = true; + nodemask_t pmask; + + if (nid == NUMA_NO_NODE) + nid = numa_mem_id(); + + pmask = nodemask_of_node(nid); /* * By default we always try hard to allocate the page with @@ -1700,9 +1706,8 @@ static struct page *alloc_buddy_huge_page(struct hstate *h, gfp_mask |= __GFP_COMP|__GFP_NOWARN; if (alloc_try_hard) gfp_mask |= __GFP_RETRY_MAYFAIL; - if (nid == NUMA_NO_NODE) - nid = numa_mem_id(); - page = __alloc_pages_nodemask(gfp_mask, order, nid, nmask); + page = __alloc_pages_nodemask(gfp_mask, order, first_node(pmask), + nmask); if (page) __count_vm_event(HTLB_BUDDY_PGALLOC); else diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 3c48f299d344..9521bb46aa00 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -2270,11 +2270,11 @@ static struct page *alloc_page_interleave(gfp_t gfp, unsigned order, } static struct page *alloc_pages_vma_thp(gfp_t gfp, struct mempolicy *pol, - int order, int node) + int order, nodemask_t *prefmask) { nodemask_t *nmask; struct page *page; - int hpage_node = node; + int hpage_node = first_node(*prefmask); /* * For hugepage allocation and non-interleave policy which allows the @@ -2286,9 +2286,6 @@ static struct page *alloc_pages_vma_thp(gfp_t gfp, struct mempolicy *pol, * If the policy is interleave or multiple preferred nodes, or does not * allow the current node in its nodemask, we allocate the standard way. 
*/ - if (pol->mode == MPOL_PREFERRED && !(pol->flags & MPOL_F_LOCAL)) - hpage_node = first_node(pol->v.preferred_nodes); - nmask = policy_nodemask(gfp, pol); /* @@ -2340,10 +2337,14 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma, { struct mempolicy *pol; struct page *page; - int preferred_nid; - nodemask_t *nmask; + nodemask_t *nmask, *pmask, tmp; pol = get_vma_policy(vma, addr); + pmask = policy_preferred_nodes(gfp, pol); + if (!pmask) { + tmp = nodemask_of_node(node); + pmask = &tmp; + } if (pol->mode == MPOL_INTERLEAVE) { unsigned nid; @@ -2353,12 +2354,12 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma, page = alloc_page_interleave(gfp, order, nid); } else if (unlikely(IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && hugepage)) { - page = alloc_pages_vma_thp(gfp, pol, order, node); + page = alloc_pages_vma_thp(gfp, pol, order, pmask); mpol_cond_put(pol); } else { nmask = policy_nodemask(gfp, pol); - preferred_nid = policy_node(gfp, pol, node); - page = __alloc_pages_nodemask(gfp, order, preferred_nid, nmask); + page = __alloc_pages_nodemask(gfp, order, first_node(*pmask), + nmask); mpol_cond_put(pol); } @@ -2393,12 +2394,19 @@ struct page *alloc_pages_current(gfp_t gfp, unsigned order) * No reference counting needed for current->mempolicy * nor system default_policy */ - if (pol->mode == MPOL_INTERLEAVE) + if (pol->mode == MPOL_INTERLEAVE) { page = alloc_page_interleave(gfp, order, interleave_nodes(pol)); - else - page = __alloc_pages_nodemask(gfp, order, - policy_node(gfp, pol, numa_node_id()), - policy_nodemask(gfp, pol)); + } else { + nodemask_t tmp, *pmask; + + pmask = policy_preferred_nodes(gfp, pol); + if (!pmask) { + tmp = nodemask_of_node(numa_node_id()); + pmask = &tmp; + } + page = __alloc_pages_nodemask(gfp, order, first_node(*pmask), + policy_nodemask(gfp, pol)); + } return page; } From patchwork Fri Jun 19 16:24:23 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Widawsky X-Patchwork-Id: 11614607 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 34DD390 for ; Fri, 19 Jun 2020 16:24:57 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id F374A217D8 for ; Fri, 19 Jun 2020 16:24:56 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org F374A217D8 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id D0CB68D00D6; Fri, 19 Jun 2020 12:24:35 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id C73F78D00DD; Fri, 19 Jun 2020 12:24:35 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8CD728D00D6; Fri, 19 Jun 2020 12:24:35 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0150.hostedemail.com [216.40.44.150]) by kanga.kvack.org (Postfix) with ESMTP id 51A7F8D00D8 for ; Fri, 19 Jun 2020 12:24:35 -0400 (EDT) Received: from smtpin27.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 1AF168698E for ; Fri, 19 Jun 2020 
16:24:35 +0000 (UTC) X-FDA: 76946484510.27.drain74_511049d26e1a Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin27.hostedemail.com (Postfix) with ESMTP id 8F5803D66B for ; Fri, 19 Jun 2020 16:24:34 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,ben.widawsky@intel.com,,RULES_HIT:30005:30054:30064:30091,0,RBL:134.134.136.126:@intel.com:.lbl8.mailshell.net-62.18.0.100 64.95.201.95,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:none,Custom_rules:0:1:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: drain74_511049d26e1a X-Filterd-Recvd-Size: 10539 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by imf34.hostedemail.com (Postfix) with ESMTP for ; Fri, 19 Jun 2020 16:24:33 +0000 (UTC) IronPort-SDR: ZcukJWQ8rgasjr8Cevgp4pla9T0uzrcvAr4hZY6EkfyJKFsxPOWiTZYsvgzXZbcETuZqZmz/Qq WuSop2yAk9Hg== X-IronPort-AV: E=McAfee;i="6000,8403,9657"; a="130375196" X-IronPort-AV: E=Sophos;i="5.75,256,1589266800"; d="scan'208";a="130375196" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Jun 2020 09:24:32 -0700 IronPort-SDR: QOS+I4PPX2AaJcI7jMrIE0uugPR+OmjPTed45wszA9ZhRfUkcepiEtQg/qf7o+57N+N6PI4u35 4MU0YfJKC0AQ== X-IronPort-AV: E=Sophos;i="5.75,255,1589266800"; d="scan'208";a="264368500" Received: from sjiang-mobl2.ccr.corp.intel.com (HELO bwidawsk-mobl5.local) ([10.252.131.131]) by orsmga007-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Jun 2020 09:24:32 -0700 From: Ben Widawsky To: linux-mm Cc: Ben Widawsky , Andrew Morton , Dave Hansen , Jason Gunthorpe , Michal Hocko , Mike Kravetz Subject: [PATCH 16/18] alloc_pages_nodemask: turn preferred nid into a nodemask Date: Fri, 19 Jun 2020 09:24:23 -0700 Message-Id: <20200619162425.1052382-17-ben.widawsky@intel.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20200619162425.1052382-1-ben.widawsky@intel.com> References: <20200619162425.1052382-1-ben.widawsky@intel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: 8F5803D66B X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam05 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The guts of the page allocator already understand that the memory policy might provide multiple preferred nodes. Ideally, the alloc function itself wouldn't take multiple nodes until one of the callers decided it would be useful. Unfortunately, as the call stack stands today, the caller of __alloc_pages_nodemask is responsible for figuring out the preferred nodes (almost always, with no policy in place, this is numa_node_id()). The purpose of this patch is to allow multiple preferred nodes while keeping the existing logical preference assignments in place. In other words, everything at and below __alloc_pages_nodemask() has no concept of policy, and this patch maintains that division. As with the bind mask, a NULL or empty preference set is allowed. A note on allocation: one of the obvious fallouts from this is that some callers are now going to allocate nodemasks on their stack. When no policy is in place, these nodemasks are simply nodemask_of_node(numa_node_id()). Some amount of this is addressed in the next patch.
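To make the caller-side pattern above concrete, here is a minimal sketch that assumes only the new two-nodemask signature of __alloc_pages_nodemask() and nodemask_of_node(); the wrapper function is hypothetical and not part of this series, it just illustrates the no-policy case:

/*
 * Illustrative sketch, not code from the patch: with no policy in
 * place, the preferred set collapses to the local node and is built
 * on the caller's stack.
 */
static struct page *sketch_alloc_local(gfp_t gfp_mask, unsigned int order)
{
	nodemask_t pmask = nodemask_of_node(numa_node_id());

	/* preferred mask on the stack; no bind restriction, so pass NULL */
	return __alloc_pages_nodemask(gfp_mask, order, &pmask, NULL);
}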
The alternatives to the stack nodemask are: kmalloc, which is unsafe in these paths; a percpu variable, which can't work because a nodemask today can be 128B (at the maximum NODE_SHIFT of 10 on x86 and ia64), too large for a percpu variable; or a lookup table. There's no reason a lookup table can't work, but it seems like a premature optimization. If you were to make a lookup table for the more extreme cases of large systems, each nodemask would be 128B and you would have 1024 nodes, so the size of just that table is 128K. I'm very open to better solutions. Cc: Andrew Morton Cc: Dave Hansen Cc: Jason Gunthorpe Cc: Michal Hocko Cc: Mike Kravetz Signed-off-by: Ben Widawsky --- include/linux/gfp.h | 8 +++----- include/linux/migrate.h | 4 ++-- mm/hugetlb.c | 3 +-- mm/mempolicy.c | 27 ++++++--------------------- mm/page_alloc.c | 10 ++++------ 5 files changed, 16 insertions(+), 36 deletions(-) diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 9ab5c07579bd..47e9c02c17ae 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -499,15 +499,13 @@ static inline int arch_make_page_accessible(struct page *page) } #endif -struct page * -__alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid, - nodemask_t *nodemask); +struct page *__alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, + nodemask_t *prefmask, nodemask_t *nodemask); static inline struct page * __alloc_pages_nodes(nodemask_t *nodes, gfp_t gfp_mask, unsigned int order) { - return __alloc_pages_nodemask(gfp_mask, order, first_node(*nodes), - NULL); + return __alloc_pages_nodemask(gfp_mask, order, nodes, NULL); } /* diff --git a/include/linux/migrate.h b/include/linux/migrate.h index 3e546cbf03dd..91b399ec9249 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -37,6 +37,7 @@ static inline struct page *new_page_nodemask(struct page *page, gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL; unsigned int order = 0; struct page *new_page = NULL; + nodemask_t pmask = nodemask_of_node(preferred_nid); if (PageHuge(page)) return alloc_huge_page_nodemask(page_hstate(compound_head(page)), @@ -50,8 +51,7 @@ static inline struct page *new_page_nodemask(struct page *page, if (PageHighMem(page) || (zone_idx(page_zone(page)) == ZONE_MOVABLE)) gfp_mask |= __GFP_HIGHMEM; - new_page = __alloc_pages_nodemask(gfp_mask, order, - preferred_nid, nodemask); + new_page = __alloc_pages_nodemask(gfp_mask, order, &pmask, nodemask); if (new_page && PageTransHuge(new_page)) prep_transhuge_page(new_page); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 71b6750661df..52e097aed7ed 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1706,8 +1706,7 @@ static struct page *alloc_buddy_huge_page(struct hstate *h, gfp_mask |= __GFP_COMP|__GFP_NOWARN; if (alloc_try_hard) gfp_mask |= __GFP_RETRY_MAYFAIL; - page = __alloc_pages_nodemask(gfp_mask, order, first_node(pmask), - nmask); + page = __alloc_pages_nodemask(gfp_mask, order, &pmask, nmask); if (page) __count_vm_event(HTLB_BUDDY_PGALLOC); else diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 9521bb46aa00..fb49bea41ab8 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -2274,7 +2274,6 @@ static struct page *alloc_pages_vma_thp(gfp_t gfp, struct mempolicy *pol, { nodemask_t *nmask; struct page *page; - int hpage_node = first_node(*prefmask); /* * For hugepage allocation and non-interleave policy which allows the @@ -2282,9 +2281,6 @@ static struct page *alloc_pages_vma_thp(gfp_t gfp, struct mempolicy *pol, * allocate from the current/preferred node and don't fall back to other * nodes, as the cost of
remote accesses would likely offset THP * benefits. - * - * If the policy is interleave or multiple preferred nodes, or does not - * allow the current node in its nodemask, we allocate the standard way. */ nmask = policy_nodemask(gfp, pol); @@ -2293,7 +2289,7 @@ static struct page *alloc_pages_vma_thp(gfp_t gfp, struct mempolicy *pol, * unnecessarily, just compact. */ page = __alloc_pages_nodemask(gfp | __GFP_THISNODE | __GFP_NORETRY, - order, hpage_node, nmask); + order, prefmask, nmask); /* * If hugepage allocations are configured to always synchronous compact @@ -2301,7 +2297,7 @@ static struct page *alloc_pages_vma_thp(gfp_t gfp, struct mempolicy *pol, * allowing remote memory with both reclaim and compact as well. */ if (!page && (gfp & __GFP_DIRECT_RECLAIM)) - page = __alloc_pages_nodemask(gfp, order, hpage_node, nmask); + page = __alloc_pages_nodemask(gfp, order, prefmask, nmask); VM_BUG_ON(page && nmask && !node_isset(page_to_nid(page), *nmask)); @@ -2337,14 +2333,10 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma, { struct mempolicy *pol; struct page *page; - nodemask_t *nmask, *pmask, tmp; + nodemask_t *nmask, *pmask; pol = get_vma_policy(vma, addr); pmask = policy_preferred_nodes(gfp, pol); - if (!pmask) { - tmp = nodemask_of_node(node); - pmask = &tmp; - } if (pol->mode == MPOL_INTERLEAVE) { unsigned nid; @@ -2358,9 +2350,8 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma, mpol_cond_put(pol); } else { nmask = policy_nodemask(gfp, pol); - page = __alloc_pages_nodemask(gfp, order, first_node(*pmask), - nmask); mpol_cond_put(pol); + page = __alloc_pages_nodemask(gfp, order, pmask, nmask); } return page; @@ -2397,14 +2388,8 @@ struct page *alloc_pages_current(gfp_t gfp, unsigned order) if (pol->mode == MPOL_INTERLEAVE) { page = alloc_page_interleave(gfp, order, interleave_nodes(pol)); } else { - nodemask_t tmp, *pmask; - - pmask = policy_preferred_nodes(gfp, pol); - if (!pmask) { - tmp = nodemask_of_node(numa_node_id()); - pmask = &tmp; - } - page = __alloc_pages_nodemask(gfp, order, first_node(*pmask), + page = __alloc_pages_nodemask(gfp, order, + policy_preferred_nodes(gfp, pol), policy_nodemask(gfp, pol)); } diff --git a/mm/page_alloc.c b/mm/page_alloc.c index c6f8f112a5d4..0f90419fe0d8 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -4967,15 +4967,13 @@ static inline void finalise_ac(gfp_t gfp_mask, struct alloc_context *ac) /* * This is the 'heart' of the zoned buddy allocator. 
*/ -struct page * -__alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid, - nodemask_t *nodemask) +struct page *__alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, + nodemask_t *prefmask, nodemask_t *nodemask) { struct page *page; unsigned int alloc_flags = ALLOC_WMARK_LOW; gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */ struct alloc_context ac = { }; - nodemask_t prefmask = nodemask_of_node(preferred_nid); /* * There are several places where we assume that the order value is sane @@ -4988,11 +4986,11 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid, gfp_mask &= gfp_allowed_mask; alloc_mask = gfp_mask; - if (!prepare_alloc_pages(gfp_mask, order, &prefmask, nodemask, &ac, + if (!prepare_alloc_pages(gfp_mask, order, prefmask, nodemask, &ac, &alloc_mask, &alloc_flags)) return NULL; - ac.prefmask = &prefmask; + ac.prefmask = prefmask; finalise_ac(gfp_mask, &ac); From patchwork Fri Jun 19 16:24:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Widawsky X-Patchwork-Id: 11614617 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0C69390 for ; Fri, 19 Jun 2020 16:25:06 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id CE1FC2168B for ; Fri, 19 Jun 2020 16:25:05 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CE1FC2168B Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 7E9088D00D9; Fri, 19 Jun 2020 12:24:36 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 6AF138D00DE; Fri, 19 Jun 2020 12:24:36 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2D6678D00D9; Fri, 19 Jun 2020 12:24:36 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0226.hostedemail.com [216.40.44.226]) by kanga.kvack.org (Postfix) with ESMTP id 015158D00DA for ; Fri, 19 Jun 2020 12:24:35 -0400 (EDT) Received: from smtpin29.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id B6F9215B0B6 for ; Fri, 19 Jun 2020 16:24:35 +0000 (UTC) X-FDA: 76946484510.29.owner91_541300c26e1a Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin29.hostedemail.com (Postfix) with ESMTP id 875F118086CAB for ; Fri, 19 Jun 2020 16:24:35 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,ben.widawsky@intel.com,,RULES_HIT:30054:30064,0,RBL:134.134.136.126:@intel.com:.lbl8.mailshell.net-62.18.0.100 64.95.201.95,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:24,LUA_SUMMARY:none X-HE-Tag: owner91_541300c26e1a X-Filterd-Recvd-Size: 4007 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by imf34.hostedemail.com (Postfix) with ESMTP for ; Fri, 19 Jun 2020 16:24:34 +0000 (UTC) IronPort-SDR: BFZrcCmlXJlv5P67ShBGZKk+JHjv0qOfbqsDNXuiE49pZ10YHtrbQc29ehq7i4bMAY1XjbgG9g lwqgTD3uhKWQ== X-IronPort-AV: 
E=McAfee;i="6000,8403,9657"; a="130375198" X-IronPort-AV: E=Sophos;i="5.75,256,1589266800"; d="scan'208";a="130375198" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Jun 2020 09:24:33 -0700 IronPort-SDR: RQQn/+0QFSpZPpp31xLKN9CYC9HpEInE/QJci4bR7T8sBihOUwBMi3syIWShzSGXC+GfljCd1r cdV3YCkX266g== X-IronPort-AV: E=Sophos;i="5.75,255,1589266800"; d="scan'208";a="264368539" Received: from sjiang-mobl2.ccr.corp.intel.com (HELO bwidawsk-mobl5.local) ([10.252.131.131]) by orsmga007-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Jun 2020 09:24:32 -0700 From: Ben Widawsky To: linux-mm Cc: Ben Widawsky , Andrew Morton , Michal Hocko , Tejun Heo Subject: [PATCH 17/18] mm: Use less stack for page allocations Date: Fri, 19 Jun 2020 09:24:24 -0700 Message-Id: <20200619162425.1052382-18-ben.widawsky@intel.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20200619162425.1052382-1-ben.widawsky@intel.com> References: <20200619162425.1052382-1-ben.widawsky@intel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: 875F118086CAB X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam04 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: After converting __alloc_pages_nodemask to take a preferred nodemask, __alloc_pages_node is left holding the bag as the one still requiring stack space, since it needs to generate a nodemask for the specific node. This patch attempts to remove all callers of it except where absolutely necessary, to avoid using stack space, which is theoretically significant in huge NUMA systems. It turns out there aren't too many opportunities to do this, as all callers know exactly what they want. The difference between __alloc_pages_node and alloc_pages_node is that the former is meant for explicit node allocation while the latter supports providing no preference (by specifying NUMA_NO_NODE as nid). It now becomes clear that NUMA_NO_NODE can be implemented without using stack space via some of the newer functions that have been added, in particular __alloc_pages_nodes and __alloc_pages_nodemask. In the non-NUMA case, alloc_pages used numa_node_id(), which is 0. Switching to NUMA_NO_NODE allows us to avoid using the stack.
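To make the stack saving concrete, here is a commented restatement of the alloc_pages_node() change from the hunk below; the comments are editorial, only the logic is the patch's:

/*
 * NUMA_NO_NODE now routes through __alloc_pages_nodes(NULL, ...), i.e. a
 * NULL preferred mask, so no nodemask has to be built on the caller's
 * stack. An explicit nid still takes __alloc_pages_node(), which does
 * need a nodemask generated for that node further down.
 */
static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
					    unsigned int order)
{
	if (nid == NUMA_NO_NODE)
		return __alloc_pages_nodes(NULL, gfp_mask, order);

	return __alloc_pages_node(nid, gfp_mask, order);
}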
Cc: Andrew Morton Cc: Michal Hocko Cc: Tejun Heo Signed-off-by: Ben Widawsky --- include/linux/gfp.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 47e9c02c17ae..e78982ef9349 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -532,7 +532,7 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask, unsigned int order) { if (nid == NUMA_NO_NODE) - nid = numa_mem_id(); + return __alloc_pages_nodes(NULL, gfp_mask, order); return __alloc_pages_node(nid, gfp_mask, order); } @@ -551,8 +551,8 @@ extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order, #define alloc_hugepage_vma(gfp_mask, vma, addr, order) \ alloc_pages_vma(gfp_mask, order, vma, addr, numa_node_id(), true) #else -#define alloc_pages(gfp_mask, order) \ - alloc_pages_node(numa_node_id(), gfp_mask, order) +#define alloc_pages(gfp_mask, order) \ + alloc_pages_node(NUMA_NO_NODE, gfp_mask, order) #define alloc_pages_vma(gfp_mask, order, vma, addr, node, false)\ alloc_pages(gfp_mask, order) #define alloc_hugepage_vma(gfp_mask, vma, addr, order) \ From patchwork Fri Jun 19 16:24:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Widawsky X-Patchwork-Id: 11614625 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 41CCB14E3 for ; Fri, 19 Jun 2020 16:25:13 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 15A9621852 for ; Fri, 19 Jun 2020 16:25:13 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 15A9621852 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 4D6A48D00D8; Fri, 19 Jun 2020 12:24:37 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 3A03F8D00E0; Fri, 19 Jun 2020 12:24:37 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E723A8D00D8; Fri, 19 Jun 2020 12:24:36 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0163.hostedemail.com [216.40.44.163]) by kanga.kvack.org (Postfix) with ESMTP id B98598D00DE for ; Fri, 19 Jun 2020 12:24:36 -0400 (EDT) Received: from smtpin28.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 6BF5B180AD81D for ; Fri, 19 Jun 2020 16:24:36 +0000 (UTC) X-FDA: 76946484552.28.glue19_3601ac826e1a Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin28.hostedemail.com (Postfix) with ESMTP id 493E115B0C0 for ; Fri, 19 Jun 2020 16:24:36 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,ben.widawsky@intel.com,,RULES_HIT:30003:30012:30054:30064:30070:30074:30075:30090,0,RBL:134.134.136.126:@intel.com:.lbl8.mailshell.net-64.95.201.95 62.18.0.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:2,LUA_SUMMARY:none X-HE-Tag: glue19_3601ac826e1a X-Filterd-Recvd-Size: 7359 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by imf14.hostedemail.com 
(Postfix) with ESMTP for ; Fri, 19 Jun 2020 16:24:35 +0000 (UTC) IronPort-SDR: 3zdJ184yCuDNmkgYHnRnPD3wuyphEFzSgo+WtNfIq8se7OpVdNOOqI9yByE2chWtsdN7DFpdCE Pdy45copx58A== X-IronPort-AV: E=McAfee;i="6000,8403,9657"; a="130375201" X-IronPort-AV: E=Sophos;i="5.75,256,1589266800"; d="scan'208";a="130375201" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Jun 2020 09:24:33 -0700 IronPort-SDR: dVbg4TDZtx5ZKdztee7fN0NUM7eJavq1g9hepFwo5Ys59JRMtQ6xsL5j7Tz7kjKImgU3NeeJVq FXhxNIHuwO/A== X-IronPort-AV: E=Sophos;i="5.75,255,1589266800"; d="scan'208";a="264368591" Received: from sjiang-mobl2.ccr.corp.intel.com (HELO bwidawsk-mobl5.local) ([10.252.131.131]) by orsmga007-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Jun 2020 09:24:32 -0700 From: Ben Widawsky To: linux-mm Cc: Ben Widawsky , Andrew Morton , Dave Hansen , David Hildenbrand , Jonathan Corbet , Michal Hocko , Vlastimil Babka Subject: [PATCH 18/18] mm/mempolicy: Advertise new MPOL_PREFERRED_MANY Date: Fri, 19 Jun 2020 09:24:25 -0700 Message-Id: <20200619162425.1052382-19-ben.widawsky@intel.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20200619162425.1052382-1-ben.widawsky@intel.com> References: <20200619162425.1052382-1-ben.widawsky@intel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: 493E115B0C0 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam01 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: See comments in code, and previous commit messages for details of implementation and usage. Fix whitespace while here. Cc: Andrew Morton Cc: Dave Hansen Cc: David Hildenbrand Cc: Jonathan Corbet Cc: Michal Hocko Cc: Vlastimil Babka Signed-off-by: Ben Widawsky --- .../admin-guide/mm/numa_memory_policy.rst | 16 ++++++++++++---- include/uapi/linux/mempolicy.h | 6 +++--- mm/mempolicy.c | 14 ++++++-------- mm/page_alloc.c | 3 --- 4 files changed, 21 insertions(+), 18 deletions(-) diff --git a/Documentation/admin-guide/mm/numa_memory_policy.rst b/Documentation/admin-guide/mm/numa_memory_policy.rst index 1ad020c459b8..b69963a37fc8 100644 --- a/Documentation/admin-guide/mm/numa_memory_policy.rst +++ b/Documentation/admin-guide/mm/numa_memory_policy.rst @@ -245,6 +245,14 @@ MPOL_INTERLEAVED address range or file. During system boot up, the temporary interleaved system default policy works in this mode. +MPOL_PREFERRED_MANY + This mode specifies that the allocation should be attempted from the + nodemask specified in the policy. If that allocation fails, the kernel + will search other nodes, in order of increasing distance from the first + set bit in the nodemask based on information provided by the platform + firmware. It is similar to MPOL_PREFERRED with the main exception that + is is an error to have an empty nodemask. + NUMA memory policy supports the following optional mode flags: MPOL_F_STATIC_NODES @@ -253,10 +261,10 @@ MPOL_F_STATIC_NODES nodes changes after the memory policy has been defined. Without this flag, any time a mempolicy is rebound because of a - change in the set of allowed nodes, the node (Preferred) or - nodemask (Bind, Interleave) is remapped to the new set of - allowed nodes. This may result in nodes being used that were - previously undesired. 
+ change in the set of allowed nodes, the preferred nodemask (Preferred + Many), preferred node (Preferred) or nodemask (Bind, Interleave) is + remapped to the new set of allowed nodes. This may result in nodes + being used that were previously undesired. With this flag, if the user-specified nodes overlap with the nodes allowed by the task's cpuset, then the memory policy is diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h index 3354774af61e..ad3eee651d4e 100644 --- a/include/uapi/linux/mempolicy.h +++ b/include/uapi/linux/mempolicy.h @@ -16,13 +16,13 @@ */ /* Policies */ -enum { - MPOL_DEFAULT, +enum { MPOL_DEFAULT, MPOL_PREFERRED, MPOL_BIND, MPOL_INTERLEAVE, MPOL_LOCAL, - MPOL_MAX, /* always last member of enum */ + MPOL_PREFERRED_MANY, + MPOL_MAX, /* always last member of enum */ }; /* Flags for set_mempolicy */ diff --git a/mm/mempolicy.c b/mm/mempolicy.c index fb49bea41ab8..07e916f8f6b7 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -108,8 +108,6 @@ #include "internal.h" -#define MPOL_PREFERRED_MANY MPOL_MAX - /* Internal flags */ #define MPOL_MF_DISCONTIG_OK (MPOL_MF_INTERNAL << 0) /* Skip checks for continuous vmas */ #define MPOL_MF_INVERT (MPOL_MF_INTERNAL << 1) /* Invert check for nodemask */ @@ -180,7 +178,7 @@ struct mempolicy *get_task_policy(struct task_struct *p) static const struct mempolicy_operations { int (*create)(struct mempolicy *pol, const nodemask_t *nodes); void (*rebind)(struct mempolicy *pol, const nodemask_t *nodes); -} mpol_ops[MPOL_MAX + 1]; +} mpol_ops[MPOL_MAX]; static inline int mpol_store_user_nodemask(const struct mempolicy *pol) { @@ -385,8 +383,8 @@ static void mpol_rebind_preferred_common(struct mempolicy *pol, } /* MPOL_PREFERRED_MANY allows multiple nodes to be set in 'nodes' */ -static void __maybe_unused mpol_rebind_preferred_many(struct mempolicy *pol, - const nodemask_t *nodes) +static void mpol_rebind_preferred_many(struct mempolicy *pol, + const nodemask_t *nodes) { mpol_rebind_preferred_common(pol, nodes, nodes); } @@ -448,7 +446,7 @@ void mpol_rebind_mm(struct mm_struct *mm, nodemask_t *new) mmap_write_unlock(mm); } -static const struct mempolicy_operations mpol_ops[MPOL_MAX + 1] = { +static const struct mempolicy_operations mpol_ops[MPOL_MAX] = { [MPOL_DEFAULT] = { .rebind = mpol_rebind_default, }, @@ -466,8 +464,8 @@ static const struct mempolicy_operations mpol_ops[MPOL_MAX + 1] = { }, /* MPOL_LOCAL is converted to MPOL_PREFERRED on policy creation */ [MPOL_PREFERRED_MANY] = { - .create = NULL, - .rebind = NULL, + .create = mpol_new_preferred_many, + .rebind = mpol_rebind_preferred_many, }, }; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 0f90419fe0d8..b89c9c2637bf 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -4867,9 +4867,6 @@ struct zonelist *preferred_zonelist(gfp_t gfp_mask, const nodemask_t *prefmask, nodemask_t pref; int nid, local_node = numa_mem_id(); - /* Multi nodes not supported yet */ - VM_BUG_ON(prefmask && nodes_weight(*prefmask) != 1); - #define _isset(mask, node) \ (!(mask) || nodes_empty(*(mask)) ? 1 : node_isset(node, *(mask))) /*