From patchwork Fri Jun 21 19:00:50 2024
X-Patchwork-Submitter: Aristeu Rozanski <aris@redhat.com>
X-Patchwork-Id: 13708017
Date: Fri, 21 Jun 2024 15:00:50 -0400
From: Aristeu Rozanski <aris@redhat.com>
To: linux-mm@kvack.org
Cc: Muchun Song, Andrew Morton
Subject: [PATCH] hugetlb: force allocating surplus hugepages on mempolicy allowed nodes
Message-ID: <20240621190050.mhxwb65zn37doegp@redhat.com>
allowed_mems_nr() only accounts for the number of free hugepages in the
nodes the current process belongs to. In case one or more of the requested
surplus hugepages are allocated on a different node, the whole allocation
will fail due to allowed_mems_nr() returning a lower value. So allocate
surplus hugepages on one of the nodes the current process belongs to.
An easy way to reproduce this issue is on a system with 2+ NUMA nodes:

	# echo 0 >/proc/sys/vm/nr_hugepages
	# echo 1 >/proc/sys/vm/nr_overcommit_hugepages
	# numactl -m0 ./tools/testing/selftests/mm/map_hugetlb 2

which will eventually fail when the hugepage ends up allocated on a
different node.

Cc: Muchun Song
Cc: Andrew Morton
Signed-off-by: Aristeu Rozanski
---
 mm/hugetlb.c | 46 +++++++++++++++++++++++++++-------------------
 1 file changed, 27 insertions(+), 19 deletions(-)

--- upstream.orig/mm/hugetlb.c	2024-06-20 13:42:25.699568114 -0400
+++ upstream/mm/hugetlb.c	2024-06-21 14:52:49.970798299 -0400
@@ -2618,6 +2618,23 @@ struct folio *alloc_hugetlb_folio_nodema
 	return alloc_migrate_hugetlb_folio(h, gfp_mask, preferred_nid, nmask);
 }
 
+static nodemask_t *policy_mbind_nodemask(gfp_t gfp)
+{
+#ifdef CONFIG_NUMA
+	struct mempolicy *mpol = get_task_policy(current);
+
+	/*
+	 * Only enforce MPOL_BIND policy which overlaps with cpuset policy
+	 * (from policy_nodemask) specifically for hugetlb case
+	 */
+	if (mpol->mode == MPOL_BIND &&
+	    (apply_policy_zone(mpol, gfp_zone(gfp)) &&
+	     cpuset_nodemask_valid_mems_allowed(&mpol->nodes)))
+		return &mpol->nodes;
+#endif
+	return NULL;
+}
+
 /*
  * Increase the hugetlb pool such that it can accommodate a reservation
  * of size 'delta'.
@@ -2631,6 +2648,8 @@ static int gather_surplus_pages(struct h
 	long i;
 	long needed, allocated;
 	bool alloc_ok = true;
+	int node;
+	nodemask_t *mbind_nodemask = policy_mbind_nodemask(htlb_alloc_mask(h));
 
 	lockdep_assert_held(&hugetlb_lock);
 	needed = (h->resv_huge_pages + delta) - h->free_huge_pages;
@@ -2645,8 +2664,14 @@
 	allocated = 0;
 
 retry:
 	spin_unlock_irq(&hugetlb_lock);
 	for (i = 0; i < needed; i++) {
-		folio = alloc_surplus_hugetlb_folio(h, htlb_alloc_mask(h),
-				NUMA_NO_NODE, NULL);
+		for_each_node_mask(node, cpuset_current_mems_allowed) {
+			if (!mbind_nodemask || node_isset(node, *mbind_nodemask)) {
+				folio = alloc_surplus_hugetlb_folio(h, htlb_alloc_mask(h),
+						node, NULL);
+				if (folio)
+					break;
+			}
+		}
 		if (!folio) {
 			alloc_ok = false;
 			break;
@@ -4876,23 +4901,6 @@
 	default_hstate_max_huge_pages = 0;
 }
 __setup("default_hugepagesz=", default_hugepagesz_setup);
-
-static nodemask_t *policy_mbind_nodemask(gfp_t gfp)
-{
-#ifdef CONFIG_NUMA
-	struct mempolicy *mpol = get_task_policy(current);
-
-	/*
-	 * Only enforce MPOL_BIND policy which overlaps with cpuset policy
-	 * (from policy_nodemask) specifically for hugetlb case
-	 */
-	if (mpol->mode == MPOL_BIND &&
-	    (apply_policy_zone(mpol, gfp_zone(gfp)) &&
-	     cpuset_nodemask_valid_mems_allowed(&mpol->nodes)))
-		return &mpol->nodes;
-#endif
-	return NULL;
-}
-
 static unsigned int allowed_mems_nr(struct hstate *h)
 {
 	int node;