From patchwork Fri Aug 20 03:05:36 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: yaozhenguo
X-Patchwork-Id: 12448297
From: yaozhenguo
To: mike.kravetz@oracle.com, corbet@lwn.net, akpm@linux-foundation.org
Cc: yaozhenguo@jd.com, linux-kernel@vger.kernel.org,
 linux-doc@vger.kernel.org, linux-mm@kvack.org, yaozhenguo
Subject: [PATCH] hugetlbfs: add hugepages_node kernel parameter
Date: Fri, 20 Aug 2021 11:05:36 +0800
Message-Id: <20210820030536.25737-1-yaozhenguo1@gmail.com>
X-Mailer: git-send-email 2.32.0

We can specify the number of hugepages to allocate at boot, but at
present the hugepages are balanced across all nodes. In some scenarios
we only need hugepages on one node. For example, DPDK needs hugepages
on the same node as the NIC. If DPDK needs four 1G hugepages on node 1
and the system has 16 NUMA nodes, we must reserve 64 hugepages on the
kernel cmdline, but only four of them are used; the others have to be
freed after boot. If system memory is low (for example, 64G), this is
impossible.

So add a hugepages_node kernel parameter to specify the node on which
hugepages are allocated at boot. For example, add the following
parameters:

hugepagesz=1G hugepages_node=1 hugepages=4

This allocates 4 hugepages on node 1 at boot.

Signed-off-by: yaozhenguo
---
 .../admin-guide/kernel-parameters.txt         |   6 +
 include/linux/hugetlb.h                       |   1 +
 mm/hugetlb.c                                  | 109 +++++++++++++++++-
 3 files changed, 110 insertions(+), 6 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index bdb22006f..1f85f2b3d 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1583,6 +1583,12 @@
 			hugepages using the CMA allocator. If enabled, the
 			boot-time allocation of gigantic hugepages is skipped.
 
+	hugepages_node=	[HW] Node number of hugepages to allocate at boot.
+			This is used in conjunction with hugepages (below).
+			The pair hugepages_node=X hugepages=Y can be specified
+			for number of hugepages in numa node X.
+			Format:
+
 	hugepages=	[HW] Number of HugeTLB pages to allocate at boot.
 			If this follows hugepagesz (below), it specifies
 			the number of pages of hugepagesz to be allocated.
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index f7ca1a387..5939ecd4f 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -605,6 +605,7 @@ struct hstate {
 	unsigned long nr_overcommit_huge_pages;
 	struct list_head hugepage_activelist;
 	struct list_head hugepage_freelists[MAX_NUMNODES];
+	unsigned int max_huge_pages_node[MAX_NUMNODES];
 	unsigned int nr_huge_pages_node[MAX_NUMNODES];
 	unsigned int free_huge_pages_node[MAX_NUMNODES];
 	unsigned int surplus_huge_pages_node[MAX_NUMNODES];
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index dfc940d52..1f50f866c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -66,6 +66,8 @@ static struct hstate * __initdata parsed_hstate;
 static unsigned long __initdata default_hstate_max_huge_pages;
 static bool __initdata parsed_valid_hugepagesz = true;
 static bool __initdata parsed_default_hugepagesz;
+static unsigned int default_hugepages_in_node[MAX_NUMNODES] __initdata;
+static int parsed_huge_pages_node __initdata = NUMA_NO_NODE;
 
 /*
  * Protects updates to hugepage_freelists, hugepage_activelist, nr_huge_pages,
@@ -2842,10 +2844,68 @@ static void __init gather_bootmem_prealloc(void)
 	}
 }
 
-static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
+static void __init hugetlb_hstate_alloc_pages_onenode(struct hstate *h, int nid, bool *gns)
+{
+	unsigned long i;
+
+	for (i = 0; i < h->max_huge_pages_node[nid]; i++) {
+		if (hstate_is_gigantic(h)) {
+			struct huge_bootmem_page *m;
+			void *addr;
+
+			addr = memblock_alloc_try_nid_raw(
+					huge_page_size(h), huge_page_size(h),
+					0, MEMBLOCK_ALLOC_ACCESSIBLE, nid);
+			if (!addr)
+				break;
+			m = addr;
+			BUG_ON(!IS_ALIGNED(virt_to_phys(m), huge_page_size(h)));
+			/* Put them into a private list first because mem_map is not up yet */
+			INIT_LIST_HEAD(&m->list);
+			list_add(&m->list, &huge_boot_pages);
+			m->hstate = h;
+			*gns = true;
+		} else {
+			struct page *page;
+
+			gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;
+
+			page = alloc_fresh_huge_page(h, gfp_mask, nid,
+					&node_states[N_MEMORY], NULL);
+			if (page)
+				put_page(page); /* free it into the hugepage allocator */
+
+		}
+	}
+	if (hstate_is_gigantic(h)) {
+		h->max_huge_pages_node[nid] = 0;
+	}
+}
+
+static void __init hugetlb_hstate_alloc_pages(struct hstate *h, int nid)
 {
 	unsigned long i;
 	nodemask_t *node_alloc_noretry;
+	bool hugetlb_node_set = false;
+	bool gigantic_node_set = false;
+
+	/* do node alloc */
+	for (i = 0; i < nodes_weight(node_states[N_MEMORY]); i++) {
+		if (h->max_huge_pages_node[i] > 0)
+			hugetlb_hstate_alloc_pages_onenode(h, i, &gigantic_node_set);
+		/* use gigantic_node_set to make a distinction
+		 * between node set and whole set in gigantic hstate
+		 */
+		if (gigantic_node_set || h->nr_huge_pages_node[i] > 0)
+			hugetlb_node_set = true;
+	}
+
+	/* nid != NUMA_NO_NODE prevents more pages from being allocated in a gigantic hstate
+	 * for example:
+	 * hugepagesz=1G hugepages_node=0 hugepages=4 hugepages_node=1 hugepages=0
+	 */
+	if (hugetlb_node_set || nid != NUMA_NO_NODE)
+		return;
 
 	if (!hstate_is_gigantic(h)) {
 		/*
@@ -2901,7 +2961,7 @@ static void __init hugetlb_init_hstates(void)
 
 		/* oversize hugepages were init'ed in early boot */
 		if (!hstate_is_gigantic(h))
-			hugetlb_hstate_alloc_pages(h);
+			hugetlb_hstate_alloc_pages(h, NUMA_NO_NODE);
 	}
 	VM_BUG_ON(minimum_order == UINT_MAX);
 }
@@ -3580,6 +3640,9 @@ static int __init hugetlb_init(void)
 				default_hstate_max_huge_pages;
 		}
 	}
+	for (i = 0; i < nodes_weight(node_states[N_MEMORY]); i++)
+		if (default_hugepages_in_node[i] > 0)
+			default_hstate.max_huge_pages_node[i] = default_hugepages_in_node[i];
 
 	hugetlb_cma_check();
 	hugetlb_init_hstates();
@@ -3663,9 +3726,16 @@ static int __init hugepages_setup(char *s)
 	 * default_hugepagesz.
 	 */
 	else if (!hugetlb_max_hstate)
-		mhp = &default_hstate_max_huge_pages;
+		if (parsed_huge_pages_node == NUMA_NO_NODE)
+			mhp = &default_hstate_max_huge_pages;
+		else
+			mhp = (unsigned long *)&(default_hugepages_in_node[parsed_huge_pages_node]);
 	else
-		mhp = &parsed_hstate->max_huge_pages;
+		if (parsed_huge_pages_node == NUMA_NO_NODE)
+			mhp = &parsed_hstate->max_huge_pages;
+		else
+			mhp = (unsigned long *)
+				&(parsed_hstate->max_huge_pages_node[parsed_huge_pages_node]);
 
 	if (mhp == last_mhp) {
 		pr_warn("HugeTLB: hugepages= specified twice without interleaving hugepagesz=, ignoring hugepages=%s\n", s);
@@ -3675,20 +3745,47 @@ static int __init hugepages_setup(char *s)
 	if (sscanf(s, "%lu", mhp) <= 0)
 		*mhp = 0;
+	if (parsed_huge_pages_node != NUMA_NO_NODE) {
+		if (!hugetlb_max_hstate)
+			default_hstate_max_huge_pages += *mhp;
+		else
+			parsed_hstate->max_huge_pages += *mhp;
+	}
 
 	/*
 	 * Global state is always initialized later in hugetlb_init.
 	 * But we need to allocate gigantic hstates here early to still
 	 * use the bootmem allocator.
 	 */
 	if (hugetlb_max_hstate && hstate_is_gigantic(parsed_hstate))
-		hugetlb_hstate_alloc_pages(parsed_hstate);
+		hugetlb_hstate_alloc_pages(parsed_hstate, parsed_huge_pages_node);
+	parsed_huge_pages_node = NUMA_NO_NODE;
 
 	last_mhp = mhp;
 
 	return 1;
 }
 __setup("hugepages=", hugepages_setup);
 
+static int __init hugetlb_node_setup(char *s)
+{
+	int ret;
+
+	if (!parsed_valid_hugepagesz) {
+		pr_warn("hugepages_node=%s preceded by an unsupported hugepagesz, ignoring\n", s);
+		parsed_valid_hugepagesz = true;
+		return 1;
+	}
+
+	ret = kstrtoint(s, 0, &parsed_huge_pages_node);
+	if (ret < 0 || parsed_huge_pages_node < 0) {
+		pr_warn("hugepages_node = %d is invalid\n", parsed_huge_pages_node);
+		parsed_huge_pages_node = NUMA_NO_NODE;
+	}
+
+	return 1;
+}
+__setup("hugepages_node=", hugetlb_node_setup);
+
 /*
  * hugepagesz command line processing
  * A specific huge page size can only be specified once with hugepagesz.
@@ -3776,7 +3873,7 @@ static int __init default_hugepagesz_setup(char *s)
 	if (default_hstate_max_huge_pages) {
 		default_hstate.max_huge_pages = default_hstate_max_huge_pages;
 		if (hstate_is_gigantic(&default_hstate))
-			hugetlb_hstate_alloc_pages(&default_hstate);
+			hugetlb_hstate_alloc_pages(&default_hstate, NUMA_NO_NODE);
 		default_hstate_max_huge_pages = 0;
 	}
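
The pairing of hugepages_node=X with the following hugepages=Y can be
modelled in userspace as below. This is only a sketch under stated
assumptions, not the kernel implementation: struct model_hstate,
MODEL_MAX_NODES, MODEL_NO_NODE, parse_ulong() and model_apply_token()
are hypothetical names that do not exist in the kernel. Unlike the
patch, the sketch parses the count into an unsigned long temporary
before storing it in the unsigned int per-node slot (the patch writes
through a cast unsigned long pointer with "%lu"), and it rejects node
numbers at or above the array size (the patch only checks for negative
values). Whether the kernel code should do the same is left to review.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_MAX_NODES 16	/* hypothetical stand-in for MAX_NUMNODES */
#define MODEL_NO_NODE (-1)	/* hypothetical stand-in for NUMA_NO_NODE */

#define NODE_KEY "hugepages_node="
#define PAGES_KEY "hugepages="

/* Hypothetical, reduced model of the two hstate fields the patch touches. */
struct model_hstate {
	unsigned long max_huge_pages;
	unsigned int max_huge_pages_node[MODEL_MAX_NODES];
};

/* Parse a plain non-negative decimal number; returns 0 on success. */
static int parse_ulong(const char *s, unsigned long *out)
{
	char *end;

	if (*s < '0' || *s > '9')
		return -1;
	errno = 0;
	*out = strtoul(s, &end, 10);
	if (errno || *end != '\0')
		return -1;
	return 0;
}

/*
 * Apply one "key=value" token. *node carries a pending hugepages_node=
 * value to the next hugepages= token and is reset once that token has
 * been seen. Returns 0 if the token was accepted, -1 if it was rejected.
 */
static int model_apply_token(struct model_hstate *h, int *node, const char *tok)
{
	unsigned long val;
	int nid;

	assert(h && node && tok);

	if (!strncmp(tok, NODE_KEY, sizeof(NODE_KEY) - 1)) {
		if (parse_ulong(tok + sizeof(NODE_KEY) - 1, &val) ||
		    val >= MODEL_MAX_NODES) {
			*node = MODEL_NO_NODE;
			return -1;
		}
		*node = (int)val;
		return 0;
	}

	if (!strncmp(tok, PAGES_KEY, sizeof(PAGES_KEY) - 1)) {
		nid = *node;
		*node = MODEL_NO_NODE;	/* the node applies to one hugepages= only */
		if (parse_ulong(tok + sizeof(PAGES_KEY) - 1, &val))
			return -1;
		if (nid == MODEL_NO_NODE) {
			h->max_huge_pages = val;
			return 0;
		}
		if (val > UINT_MAX)
			return -1;
		h->max_huge_pages_node[nid] = (unsigned int)val;
		h->max_huge_pages += val;	/* per-node requests add to the total */
		return 0;
	}

	return -1;	/* not a token this model understands */
}
```

Feeding the tokens "hugepages_node=1" and "hugepages=4" in that order
to model_apply_token(), starting from a zeroed struct model_hstate and
a node of MODEL_NO_NODE, leaves 4 in max_huge_pages_node[1] and in
max_huge_pages, and resets the pending node.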