From patchwork Tue Jul 28 03:49:38 2020
X-Patchwork-Submitter: Muchun Song
X-Patchwork-Id: 11688073
From: Muchun Song
To: mike.kravetz@oracle.com, akpm@linux-foundation.org, mhocko@kernel.org
Cc: rientjes@google.com, mgorman@suse.de, walken@google.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Muchun Song, Jianchao Guo
Subject: [PATCH v4] mm/hugetlb: add mempolicy check in the reservation routine
Date: Tue, 28 Jul 2020 11:49:38 +0800
Message-Id: <20200728034938.14993-1-songmuchun@bytedance.com>

In the reservation routine, we only check whether the cpuset meets
the memory allocation requirements, but we ignore the mempolicy in the
MPOL_BIND case. As a result, an mmap() of a hugetlb region can succeed
while a subsequent huge page allocation fails because of the mempolicy
restriction, and the task receives SIGBUS.

This can be reproduced with the following steps.

 1) Compile the test case.

    cd tools/testing/selftests/vm/
    gcc map_hugetlb.c -o map_hugetlb

 2) Pre-allocate huge pages. Suppose there are 2 NUMA nodes in the
    system. Each node will pre-allocate one huge page.

    echo 2 > /proc/sys/vm/nr_hugepages

 3) Run the test case (mmap 4MB). We receive the SIGBUS signal.

    numactl --membind=0 ./map_hugetlb 4

With this patch applied, the mmap() fails in step 3 with
"mmap: Cannot allocate memory".

Signed-off-by: Muchun Song
Reported-by: Jianchao Guo
Suggested-by: Michal Hocko
Reviewed-by: Mike Kravetz
---
changelog in v4:
 1) Fix compilation errors with !CONFIG_NUMA.

changelog in v3:
 1) Do not allocate nodemask on the stack.
 2) Update comment.

changelog in v2:
 1) Reuse policy_nodemask().

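For reviewers, the accounting change can be modelled outside the kernel. The sketch below is a simplified userspace model, not kernel code: model_nodemask_t, model_allowed_mems_nr() and model_reservation_ok() are hypothetical names, a 64-bit mask stands in for nodemask_t, and a NULL policy mask stands for the case where policy_nodemask_current() returns NULL (as the !CONFIG_NUMA stub in this patch does). It ignores surplus pages and existing reservations, which hugetlb_acct_memory() also handles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for nodemask_t: bit n set means node n is allowed. */
typedef uint64_t model_nodemask_t;

/*
 * Models the loop in allowed_mems_nr(): sum free huge pages on nodes
 * allowed by the cpuset, skipping nodes outside the mempolicy mask.
 * mpol_allowed == NULL means "no mempolicy restriction".
 */
static unsigned int model_allowed_mems_nr(const unsigned int *free_per_node,
                                          int nr_nodes,
                                          model_nodemask_t cpuset_allowed,
                                          const model_nodemask_t *mpol_allowed)
{
    unsigned int nr = 0;

    /* The 64-bit mask can only describe nodes 0..63. */
    assert(nr_nodes >= 0 && nr_nodes <= 64);

    for (int node = 0; node < nr_nodes; node++) {
        model_nodemask_t bit = (model_nodemask_t)1 << node;

        if (!(cpuset_allowed & bit))
            continue; /* node not in the cpuset: never counted */
        if (mpol_allowed == NULL || (*mpol_allowed & bit))
            nr += free_per_node[node];
    }
    return nr;
}

/*
 * Models the "delta > allowed_mems_nr(h)" check in hugetlb_acct_memory():
 * a reservation of delta pages is accepted only if enough free huge
 * pages exist on nodes the task may actually allocate from.
 */
static bool model_reservation_ok(long delta, const unsigned int *free_per_node,
                                 int nr_nodes, model_nodemask_t cpuset_allowed,
                                 const model_nodemask_t *mpol_allowed)
{
    unsigned int allowed = model_allowed_mems_nr(free_per_node, nr_nodes,
                                                 cpuset_allowed, mpol_allowed);

    return delta <= (long)allowed;
}
```

Assuming 2MB huge pages, the reproduction above reserves 2 pages on a system with one free huge page on each of two nodes. With only the cpuset mask (both nodes), the model counts 2 free pages and accepts the reservation; with a node-0 MPOL_BIND mask it counts 1 and refuses it, which corresponds to mmap() failing up front instead of the task later receiving SIGBUS.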
 include/linux/mempolicy.h | 14 ++++++++++++++
 mm/hugetlb.c              | 22 ++++++++++++++++++----
 mm/mempolicy.c            |  2 +-
 3 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
index ea9c15b60a96..0656ece1ccf1 100644
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -152,6 +152,15 @@ extern int huge_node(struct vm_area_struct *vma,
 extern bool init_nodemask_of_mempolicy(nodemask_t *mask);
 extern bool mempolicy_nodemask_intersects(struct task_struct *tsk,
 				const nodemask_t *mask);
+extern nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy);
+
+static inline nodemask_t *policy_nodemask_current(gfp_t gfp)
+{
+	struct mempolicy *mpol = get_task_policy(current);
+
+	return policy_nodemask(gfp, mpol);
+}
+
 extern unsigned int mempolicy_slab_node(void);
 
 extern enum zone_type policy_zone;
@@ -281,5 +290,10 @@ static inline int mpol_misplaced(struct page *page, struct vm_area_struct *vma,
 static inline void mpol_put_task_policy(struct task_struct *task)
 {
 }
+
+static inline nodemask_t *policy_nodemask_current(gfp_t gfp)
+{
+	return NULL;
+}
 #endif /* CONFIG_NUMA */
 #endif
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 589c330df4db..a34458f6a475 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3463,13 +3463,21 @@ static int __init default_hugepagesz_setup(char *s)
 }
 __setup("default_hugepagesz=", default_hugepagesz_setup);
 
-static unsigned int cpuset_mems_nr(unsigned int *array)
+static unsigned int allowed_mems_nr(struct hstate *h)
 {
 	int node;
 	unsigned int nr = 0;
+	nodemask_t *mpol_allowed;
+	unsigned int *array = h->free_huge_pages_node;
+	gfp_t gfp_mask = htlb_alloc_mask(h);
+
+	mpol_allowed = policy_nodemask_current(gfp_mask);
 
-	for_each_node_mask(node, cpuset_current_mems_allowed)
-		nr += array[node];
+	for_each_node_mask(node, cpuset_current_mems_allowed) {
+		if (!mpol_allowed ||
+		    (mpol_allowed && node_isset(node, *mpol_allowed)))
+			nr += array[node];
+	}
 
 	return nr;
 }
@@ -3648,12 +3656,18 @@ static int hugetlb_acct_memory(struct hstate *h, long delta)
 	 * we fall back to check against current free page availability as
 	 * a best attempt and hopefully to minimize the impact of changing
 	 * semantics that cpuset has.
+	 *
+	 * Apart from cpuset, we also have memory policy mechanism that
+	 * also determines from which node the kernel will allocate memory
+	 * in a NUMA system. So similar to cpuset, we also should consider
+	 * the memory policy of the current task. Similar to the description
+	 * above.
 	 */
 	if (delta > 0) {
 		if (gather_surplus_pages(h, delta) < 0)
 			goto out;
 
-		if (delta > cpuset_mems_nr(h->free_huge_pages_node)) {
+		if (delta > allowed_mems_nr(h)) {
 			return_unused_surplus_pages(h, delta);
 			goto out;
 		}
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 93fcfc1f2fa2..fce14c3f4f38 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1873,7 +1873,7 @@ static int apply_policy_zone(struct mempolicy *policy, enum zone_type zone)
  * Return a nodemask representing a mempolicy for filtering nodes for
  * page allocation
  */
-static nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy)
+nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy)
 {
 	/* Lower zones don't get a nodemask applied for MPOL_BIND */
 	if (unlikely(policy->mode == MPOL_BIND) &&