From patchwork Fri Aug 5 00:59:03 2022
X-Patchwork-Submitter: Feng Tang
X-Patchwork-Id: 12936820
From: Feng Tang
To: Andrew Morton, Mike Kravetz, Muchun Song, Michal Hocko
Cc: Dave Hansen, Ben Widawsky, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Feng Tang
Subject: [PATCH v2] mm/hugetlb: add dedicated func to get 'allowed' nodemask for current process
Date: Fri, 5 Aug 2022 08:59:03 +0800
Message-Id: <20220805005903.95563-1-feng.tang@intel.com>
X-Mailer: git-send-email 2.27.0

Muchun Song found that after the MPOL_PREFERRED_MANY policy was introduced
in commit b27abaccf8e8 ("mm/mempolicy: add MPOL_PREFERRED_MANY for multiple
preferred nodes"), the semantics of policy_nodemask_current() for this new
policy changed: it returns the 'preferred' nodes instead of the 'allowed'
nodes.

With the changed semantics of policy_nodemask_current(), a task with the
MPOL_PREFERRED_MANY policy could fail to get its reservation even though
it can fall back to other nodes (either defined by cpusets or all online
nodes) for that reservation, causing mmap calls to fail unnecessarily
early.

The fix is to not consider MPOL_PREFERRED_MANY for reservations at all,
because unlike MPOL_BIND it does not pose any actual hard constraint.

Michal suggested that, since policy_nodemask_current() is only used by
hugetlb, it could be moved into hugetlb code under a more explicit name to
enforce the 'allowed' semantics, for which only the MPOL_BIND policy
matters. apply_policy_zone() is made extern so it can be called from
hugetlb code, and its return type is changed to bool.

[1].
https://lore.kernel.org/lkml/20220801084207.39086-1-songmuchun@bytedance.com/t/

Fixes: b27abaccf8e8 ("mm/mempolicy: add MPOL_PREFERRED_MANY for multiple preferred nodes")
Reported-by: Muchun Song
Suggested-by: Michal Hocko
Signed-off-by: Feng Tang
Acked-by: Michal Hocko
Reviewed-by: Muchun Song
---
Changelog:

  since v1:
  * add the missing user visible effect of the problem (Michal Hocko)
  * add Fix tag (Andrew/Michal)
  * add Acked-by tag

  since RFC:
  * make apply_policy_zone() extern instead of putting it to
    mempolicy.h (Michal Hocko)

 include/linux/mempolicy.h | 13 +------------
 mm/hugetlb.c              | 24 ++++++++++++++++++++----
 mm/mempolicy.c            |  2 +-
 3 files changed, 22 insertions(+), 17 deletions(-)

diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
index 668389b4b53d..d232de7cdc56 100644
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -151,13 +151,6 @@ extern bool mempolicy_in_oom_domain(struct task_struct *tsk,
 			const nodemask_t *mask);
 extern nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy);
 
-static inline nodemask_t *policy_nodemask_current(gfp_t gfp)
-{
-	struct mempolicy *mpol = get_task_policy(current);
-
-	return policy_nodemask(gfp, mpol);
-}
-
 extern unsigned int mempolicy_slab_node(void);
 
 extern enum zone_type policy_zone;
@@ -189,6 +182,7 @@ static inline bool mpol_is_preferred_many(struct mempolicy *pol)
 	return (pol->mode == MPOL_PREFERRED_MANY);
 }
 
+extern bool apply_policy_zone(struct mempolicy *policy, enum zone_type zone);
 
 #else
 
@@ -294,11 +288,6 @@ static inline void mpol_put_task_policy(struct task_struct *task)
 {
 }
 
-static inline nodemask_t *policy_nodemask_current(gfp_t gfp)
-{
-	return NULL;
-}
-
 static inline bool mpol_is_preferred_many(struct mempolicy *pol)
 {
 	return false;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a18c071c294e..ad84bb85b6de 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4330,18 +4330,34 @@ static int __init default_hugepagesz_setup(char *s)
 }
 __setup("default_hugepagesz=", default_hugepagesz_setup);
 
+static nodemask_t *policy_mbind_nodemask(gfp_t gfp)
+{
+#ifdef CONFIG_NUMA
+	struct mempolicy *mpol = get_task_policy(current);
+
+	/*
+	 * Only enforce MPOL_BIND policy which overlaps with cpuset policy
+	 * (from policy_nodemask) specifically for hugetlb case
+	 */
+	if (mpol->mode == MPOL_BIND &&
+		(apply_policy_zone(mpol, gfp_zone(gfp)) &&
+		 cpuset_nodemask_valid_mems_allowed(&mpol->nodes)))
+		return &mpol->nodes;
+#endif
+	return NULL;
+}
+
 static unsigned int allowed_mems_nr(struct hstate *h)
 {
 	int node;
 	unsigned int nr = 0;
-	nodemask_t *mpol_allowed;
+	nodemask_t *mbind_nodemask;
 	unsigned int *array = h->free_huge_pages_node;
 	gfp_t gfp_mask = htlb_alloc_mask(h);
 
-	mpol_allowed = policy_nodemask_current(gfp_mask);
-
+	mbind_nodemask = policy_mbind_nodemask(gfp_mask);
 	for_each_node_mask(node, cpuset_current_mems_allowed) {
-		if (!mpol_allowed || node_isset(node, *mpol_allowed))
+		if (!mbind_nodemask || node_isset(node, *mbind_nodemask))
 			nr += array[node];
 	}
 
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index d39b01fd52fe..9f15bc533601 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1805,7 +1805,7 @@ bool vma_policy_mof(struct vm_area_struct *vma)
 	return pol->flags & MPOL_F_MOF;
 }
 
-static int apply_policy_zone(struct mempolicy *policy, enum zone_type zone)
+bool apply_policy_zone(struct mempolicy *policy, enum zone_type zone)
 {
 	enum zone_type dynamic_policy_zone = policy_zone;