From patchwork Thu Sep 2 22:00:16 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12473255 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B461AC433EF for ; Thu, 2 Sep 2021 22:01:36 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 6BF5760724 for ; Thu, 2 Sep 2021 22:01:36 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 6BF5760724 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 6C7C7940023; Thu, 2 Sep 2021 18:01:19 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 60117940021; Thu, 2 Sep 2021 18:01:19 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 32466940024; Thu, 2 Sep 2021 18:01:19 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0022.hostedemail.com [216.40.44.22]) by kanga.kvack.org (Postfix) with ESMTP id DE437940023 for ; Thu, 2 Sep 2021 18:01:18 -0400 (EDT) Received: from smtpin15.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id A64452C58B for ; Thu, 2 Sep 2021 22:01:18 +0000 (UTC) X-FDA: 78544005036.15.13148BC Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf22.hostedemail.com (Postfix) with ESMTP id 335F71903 for ; Thu, 2 Sep 2021 22:00:18 +0000 (UTC) Received: by mail.kernel.org (Postfix) with ESMTPSA id D7D45603E9; Thu, 2 Sep 2021 22:00:16 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1630620017; bh=JuMbieh0fzh5LqXR3vCiEom3vxwS4qD+f0n7GBzctGE=; h=Date:From:To:Subject:In-Reply-To:From; b=i57F5pF5fIf9HT7UaCDY/N7tSxf1z5gApR0JBTbQ3mvv9l+AXkDzDjTKYCa856H5z NkyeIbCjBCTZjSxUqPI/6TBk5kLT6pl9PZlskBXwL1SBH9FvrRNp/he59UiK7sTC6a uFgoeGw5tO1IRkaqdjXAjH9GBO4Ssmks5jw4AHe4= Date: Thu, 02 Sep 2021 15:00:16 -0700 From: Andrew Morton To: aarcange@redhat.com, ak@linux.intel.com, akpm@linux-foundation.org, ben.widawsky@intel.com, dan.j.williams@intel.com, dave.hansen@linux.intel.com, feng.tang@intel.com, linux-mm@kvack.org, mgorman@techsingularity.net, mhocko@kernel.org, mhocko@suse.com, mike.kravetz@oracle.com, mm-commits@vger.kernel.org, rdunlap@infradead.org, rientjes@google.com, torvalds@linux-foundation.org, vbabka@suse.cz, ying.huang@intel.com Subject: [patch 194/212] mm/mempolicy: advertise new MPOL_PREFERRED_MANY Message-ID: <20210902220016.B1gnJYb0H%akpm@linux-foundation.org> In-Reply-To: <20210902144820.78957dff93d7bea620d55a89@linux-foundation.org> User-Agent: s-nail v14.8.16 Authentication-Results: imf22.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=i57F5pF5; spf=pass (imf22.hostedemail.com: domain of akpm@linux-foundation.org designates 198.145.29.99 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 335F71903 X-Stat-Signature: d1asrzy8m8fgupd57ryuouefbj4kg7zb X-HE-Tag: 1630620018-329963 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Ben Widawsky Subject: mm/mempolicy: advertise new MPOL_PREFERRED_MANY Adds a new mode to the existing mempolicy modes, MPOL_PREFERRED_MANY. MPOL_PREFERRED_MANY will be adequately documented in the internal admin-guide with this patch. Eventually, the man pages for mbind(2), get_mempolicy(2), set_mempolicy(2) and numactl(8) will also have text about this mode. Those shall contain the canonical reference. NUMA systems continue to become more prevalent. New technologies like PMEM make finer grain control over memory access patterns increasingly desirable. MPOL_PREFERRED_MANY allows userspace to specify a set of nodes that will be tried first when performing allocations. If those allocations fail, all remaining nodes will be tried. It's a straight forward API which solves many of the presumptive needs of system administrators wanting to optimize workloads on such machines. The mode will work either per VMA, or per thread. [Michal Hocko: refine kernel doc for MPOL_PREFERRED_MANY] Link: https://lore.kernel.org/r/20200630212517.308045-13-ben.widawsky@intel.com Link: https://lkml.kernel.org/r/1627970362-61305-5-git-send-email-feng.tang@intel.com Signed-off-by: Ben Widawsky Signed-off-by: Feng Tang Acked-by: Michal Hocko Cc: Andi Kleen Cc: Andrea Arcangeli Cc: Dan Williams Cc: Dave Hansen Cc: David Rientjes Cc: Huang Ying Cc: Mel Gorman Cc: Michal Hocko Cc: Mike Kravetz Cc: Randy Dunlap Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- Documentation/admin-guide/mm/numa_memory_policy.rst | 15 +++++++--- mm/mempolicy.c | 7 ---- 2 files changed, 12 insertions(+), 10 deletions(-) --- a/Documentation/admin-guide/mm/numa_memory_policy.rst~mm-mempolicy-advertise-new-mpol_preferred_many +++ a/Documentation/admin-guide/mm/numa_memory_policy.rst @@ -245,6 +245,13 @@ MPOL_INTERLEAVED address range or file. During system boot up, the temporary interleaved system default policy works in this mode. +MPOL_PREFERRED_MANY + This mode specifices that the allocation should be preferrably + satisfied from the nodemask specified in the policy. If there is + a memory pressure on all nodes in the nodemask, the allocation + can fall back to all existing numa nodes. This is effectively + MPOL_PREFERRED allowed for a mask rather than a single node. + NUMA memory policy supports the following optional mode flags: MPOL_F_STATIC_NODES @@ -253,10 +260,10 @@ MPOL_F_STATIC_NODES nodes changes after the memory policy has been defined. Without this flag, any time a mempolicy is rebound because of a - change in the set of allowed nodes, the node (Preferred) or - nodemask (Bind, Interleave) is remapped to the new set of - allowed nodes. This may result in nodes being used that were - previously undesired. + change in the set of allowed nodes, the preferred nodemask (Preferred + Many), preferred node (Preferred) or nodemask (Bind, Interleave) is + remapped to the new set of allowed nodes. This may result in nodes + being used that were previously undesired. With this flag, if the user-specified nodes overlap with the nodes allowed by the task's cpuset, then the memory policy is --- a/mm/mempolicy.c~mm-mempolicy-advertise-new-mpol_preferred_many +++ a/mm/mempolicy.c @@ -1463,12 +1463,7 @@ static inline int sanitize_mpol_flags(in *flags = *mode & MPOL_MODE_FLAGS; *mode &= ~MPOL_MODE_FLAGS; - /* - * The check should be 'mode >= MPOL_MAX', but as 'prefer_many' - * is not fully implemented, don't permit it to be used for now, - * and the logic will be restored in following patch - */ - if ((unsigned int)(*mode) >= MPOL_PREFERRED_MANY) + if ((unsigned int)(*mode) >= MPOL_MAX) return -EINVAL; if ((*flags & MPOL_F_STATIC_NODES) && (*flags & MPOL_F_RELATIVE_NODES)) return -EINVAL;