From patchwork Fri Jun 19 16:24:00 2020
X-Patchwork-Submitter: Ben Widawsky
X-Patchwork-Id: 11614637
From: Ben Widawsky
To: linux-mm
Subject: [PATCH 04/18] mm/page_alloc: add preferred pass to page allocation
Date: Fri, 19 Jun 2020 09:24:00 -0700
Message-Id: <20200619162414.1052234-5-ben.widawsky@intel.com>
X-Mailer: git-send-email 2.27.0
In-Reply-To: <20200619162414.1052234-1-ben.widawsky@intel.com>
References: <20200619162414.1052234-1-ben.widawsky@intel.com>

This patch updates the core of page allocation (pulling from the free list)
to take preferred nodes into account first. If the allocation cannot be
satisfied from the preferred nodes, the remaining nodes in the bound nodemask
are checked.

Intentionally not handled in this patch are OOM node scanning and reclaim
scanning. I am very open to comments on whether it is worth handling those
cases with a preferred node ordering as well.

With this patch the code first scans the preferred nodes for the allocation,
and then scans the subset of the bound nodes that was not already tried (the
bound mask is often NULL, i.e. all nodes) - potentially two passes. The code
was actually already two passes, since it tries not to fragment on the first
pass, so it is now up to four passes.

Consider a 3 node system (nodes 0-2) passed the following masks:

    Preferred: 100
    Bound:     110

    pass 1: node 2, no fragmentation
    pass 2: node 1, no fragmentation
    pass 3: node 2, with fragmentation
    pass 4: node 1, with fragmentation

Cc: Andi Kleen
Cc: Andrew Morton
Cc: Dave Hansen
Cc: Johannes Weiner
Cc: Mel Gorman
Cc: Michal Hocko
Cc: Vlastimil Babka
Signed-off-by: Ben Widawsky
---
 mm/internal.h   |   1 +
 mm/page_alloc.c | 108 +++++++++++++++++++++++++++++++++++-------------
 2 files changed, 80 insertions(+), 29 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 9886db20d94f..8d16229c6cbb 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -138,6 +138,7 @@ extern pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address);
 struct alloc_context {
         struct zonelist *zonelist;
         nodemask_t *nodemask;
+        nodemask_t *prefmask;
         struct zoneref *preferred_zoneref;
         int migratetype;

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 280ca85dc4d8..3cf44b6c31ae 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3675,6 +3675,69 @@ alloc_flags_nofragment(struct zone *zone, gfp_t gfp_mask)
         return alloc_flags;
 }
 
+#ifdef CONFIG_NUMA
+static void set_pref_bind_mask(nodemask_t *out, const nodemask_t *prefmask,
+                               const nodemask_t *bindmask)
+{
+        bool has_pref, has_bind;
+
+        has_pref = prefmask && !nodes_empty(*prefmask);
+        has_bind = bindmask && !nodes_empty(*bindmask);
+
+        if (has_pref && has_bind)
+                nodes_and(*out, *prefmask, *bindmask);
+        else if (has_pref && !has_bind)
+                *out = *prefmask;
+        else if (!has_pref && has_bind)
+                *out = *bindmask;
+        else if (!has_pref && !has_bind)
+                *out = NODE_MASK_ALL;
+        else
+                unreachable();
+}
+#else
+#define set_pref_bind_mask(out, pref, bind)        \
+        {                                          \
+                (out)->bits[0] = 1UL;              \
+        }
+#endif
+
+/* Helper to generate the preferred and fallback nodelists */
+static void __nodemask_for_freelist_scan(const struct alloc_context *ac,
+                                         bool preferred, nodemask_t *outnodes)
+{
+        bool has_pref;
+        bool has_bind;
+
+        if (preferred) {
+                set_pref_bind_mask(outnodes, ac->prefmask, ac->nodemask);
+                return;
+        }
+
+        has_pref = ac->prefmask && !nodes_empty(*ac->prefmask);
+        has_bind = ac->nodemask && !nodes_empty(*ac->nodemask);
+
+        if (!has_bind && !has_pref) {
+                /*
+                 * If no preference, we already tried the full nodemask,
+                 * so we have to bail.
+                 */
+                nodes_clear(*outnodes);
+        } else if (!has_bind && has_pref) {
+                /* We tried preferred nodes only before. Invert that. */
+                nodes_complement(*outnodes, *ac->prefmask);
+        } else if (has_bind && !has_pref) {
+                /*
+                 * If preferred was empty, we've tried all bound nodes,
+                 * and there is nothing further we can do.
+                 */
+                nodes_clear(*outnodes);
+        } else if (has_bind && has_pref) {
+                /* Try the bound nodes that weren't tried before. */
+                nodes_andnot(*outnodes, *ac->nodemask, *ac->prefmask);
+        }
+}
+
 /*
  * get_page_from_freelist goes through the zonelist trying to allocate
  * a page.
@@ -3686,7 +3749,10 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
         struct zoneref *z;
         struct zone *zone;
         struct pglist_data *last_pgdat_dirty_limit = NULL;
-        bool no_fallback;
+        nodemask_t nodes;
+        bool no_fallback, preferred_nodes_exhausted = false;
+
+        __nodemask_for_freelist_scan(ac, true, &nodes);
 
 retry:
         /*
@@ -3696,7 +3762,8 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
         no_fallback = alloc_flags & ALLOC_NOFRAGMENT;
         z = ac->preferred_zoneref;
         for_next_zone_zonelist_nodemask(zone, z, ac->zonelist,
-                                        ac->highest_zoneidx, ac->nodemask) {
+                                        ac->highest_zoneidx, &nodes)
+        {
                 struct page *page;
                 unsigned long mark;
 
@@ -3816,12 +3883,20 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
                 }
         }
 
+        if (!preferred_nodes_exhausted) {
+                __nodemask_for_freelist_scan(ac, false, &nodes);
+                preferred_nodes_exhausted = true;
+                goto retry;
+        }
+
         /*
          * It's possible on a UMA machine to get through all zones that are
          * fragmented. If avoiding fragmentation, reset and try again.
          */
         if (no_fallback) {
                 alloc_flags &= ~ALLOC_NOFRAGMENT;
+                __nodemask_for_freelist_scan(ac, true, &nodes);
+                preferred_nodes_exhausted = false;
                 goto retry;
         }
 
@@ -4763,33 +4838,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
         return page;
 }
 
-#ifndef CONFIG_NUMA
-#define set_pref_bind_mask(out, pref, bind)        \
-        {                                          \
-                (out)->bits[0] = 1UL               \
-        }
-#else
-static void set_pref_bind_mask(nodemask_t *out, const nodemask_t *prefmask,
-                               const nodemask_t *bindmask)
-{
-        bool has_pref, has_bind;
-
-        has_pref = prefmask && !nodes_empty(*prefmask);
-        has_bind = bindmask && !nodes_empty(*bindmask);
-
-        if (has_pref && has_bind)
-                nodes_and(*out, *prefmask, *bindmask);
-        else if (has_pref && !has_bind)
-                *out = *prefmask;
-        else if (!has_pref && has_bind)
-                *out = *bindmask;
-        else if (!has_pref && !has_bind)
-                unreachable(); /* Handled above */
-        else
-                unreachable();
-}
-#endif
-
 /*
  * Find a zonelist from a preferred node. Here is a truth table example using 2
  * different masks. The choices are, NULL mask, empty mask, two masks with an
@@ -4945,6 +4993,8 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
                         &alloc_mask, &alloc_flags))
                 return NULL;
 
+        ac.prefmask = &prefmask;
+
         finalise_ac(gfp_mask, &ac);
 
         /*
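
For reviewers who want to sanity-check the mask math and pass ordering from the
commit message outside the kernel, here is a minimal standalone userspace sketch
(not part of the patch). The helpers pref_bind_mask(), fallback_mask() and
scan() are invented for the illustration, and plain unsigned long bitmasks stand
in for nodemask_t; the ALLOC_NOFRAGMENT retry is only noted in a comment.

/* pass_order_demo.c - standalone illustration only, not part of the patch. */
#include <stdio.h>

#define NR_DEMO_NODES 3			/* 3-node example system: nodes 0-2 */
#define ALL_NODES ((1UL << NR_DEMO_NODES) - 1)

/*
 * Rough stand-in for set_pref_bind_mask(): intersect when both masks are
 * non-empty, otherwise use whichever is set, else all nodes.
 */
static unsigned long pref_bind_mask(unsigned long pref, unsigned long bind)
{
	if (pref && bind)
		return pref & bind;
	if (pref)
		return pref;
	if (bind)
		return bind;
	return ALL_NODES;
}

/*
 * Stand-in for the non-preferred case of __nodemask_for_freelist_scan():
 * the bound (or remaining) nodes the preferred pass did not cover.
 */
static unsigned long fallback_mask(unsigned long pref, unsigned long bind)
{
	if (!pref)
		return 0;	/* preferred pass already covered everything */
	if (!bind)
		return ALL_NODES & ~pref;
	return bind & ~pref;
}

static void scan(const char *pass, unsigned long mask)
{
	for (int node = NR_DEMO_NODES - 1; node >= 0; node--)
		if (mask & (1UL << node))
			printf("%s: trying node %d\n", pass, node);
}

int main(void)
{
	unsigned long pref = 1UL << 2;			/* "Preferred: 100" */
	unsigned long bind = (1UL << 2) | (1UL << 1);	/* "Bound:     110" */

	/*
	 * Passes 1 and 2 of the example; the ALLOC_NOFRAGMENT retry in
	 * get_page_from_freelist() repeats the same two masks as passes
	 * 3 and 4 with fragmentation allowed.
	 */
	scan("preferred pass", pref_bind_mask(pref, bind));	/* node 2 */
	scan("fallback pass", fallback_mask(pref, bind));	/* node 1 */
	return 0;
}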