From patchwork Mon Mar 7 13:07:05 2022
X-Patchwork-Submitter: Muchun Song <songmuchun@bytedance.com>
X-Patchwork-Id: 12771826
From: Muchun Song <songmuchun@bytedance.com>
To: corbet@lwn.net, mike.kravetz@oracle.com, akpm@linux-foundation.org,
    mcgrof@kernel.org, keescook@chromium.org, yzaikin@google.com,
    osalvador@suse.de, david@redhat.com
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, duanxiongchun@bytedance.com, smuchun@gmail.com,
    Muchun Song <songmuchun@bytedance.com>
Subject: [PATCH v3 1/4] mm: hugetlb: disable freeing vmemmap pages when
 struct page crosses page boundaries
Date: Mon, 7 Mar 2022 21:07:05 +0800
Message-Id: <20220307130708.58771-2-songmuchun@bytedance.com>
In-Reply-To: <20220307130708.58771-1-songmuchun@bytedance.com>
References: <20220307130708.58771-1-songmuchun@bytedance.com>

If the size of "struct page" is not a power of two and this feature is
enabled, the vmemmap pages of HugeTLB will be corrupted after remapping
(in theory, a panic is imminent). This situation only exists with
!CONFIG_MEMCG && !CONFIG_SLUB on x86_64, which is not a conventional
configuration nowadays, so it is not a real-world issue, just the result
of a code review. But since we cannot prevent anyone from building that
combined configuration, the feature should be disabled in that case to
fix the issue.
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 mm/hugetlb_vmemmap.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index b3118dba0518..49bc7f845438 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -121,6 +121,18 @@ void __init hugetlb_vmemmap_init(struct hstate *h)
 	if (!hugetlb_free_vmemmap_enabled())
 		return;
 
+	if (IS_ENABLED(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON) &&
+	    !is_power_of_2(sizeof(struct page))) {
+		/*
+		 * The hugetlb_free_vmemmap_enabled_key can be enabled when
+		 * CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON. It should
+		 * be disabled if "struct page" crosses page boundaries.
+		 */
+		pr_warn_once("cannot free vmemmap pages because \"struct page\" crosses page boundaries\n");
+		static_branch_disable(&hugetlb_free_vmemmap_enabled_key);
+		return;
+	}
+
 	vmemmap_pages = (nr_pages * sizeof(struct page)) >> PAGE_SHIFT;
 	/*
 	 * The head page is not to be freed to buddy allocator, the other tail

From patchwork Mon Mar 7 13:07:06 2022
X-Patchwork-Submitter: Muchun Song <songmuchun@bytedance.com>
X-Patchwork-Id: 12771827
From: Muchun Song <songmuchun@bytedance.com>
To: corbet@lwn.net, mike.kravetz@oracle.com, akpm@linux-foundation.org,
    mcgrof@kernel.org, keescook@chromium.org, yzaikin@google.com,
    osalvador@suse.de, david@redhat.com
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, duanxiongchun@bytedance.com, smuchun@gmail.com,
    Muchun Song <songmuchun@bytedance.com>
Subject: [PATCH v3 2/4] mm: memory_hotplug: override memmap_on_memory when
 hugetlb_free_vmemmap=on
Date: Mon, 7 Mar 2022 21:07:06 +0800
Message-Id: <20220307130708.58771-3-songmuchun@bytedance.com>
In-Reply-To: <20220307130708.58771-1-songmuchun@bytedance.com>
References: <20220307130708.58771-1-songmuchun@bytedance.com>

When "hugetlb_free_vmemmap=on" and "memory_hotplug.memmap_on_memory"
are both passed on the kernel command line, the variable
"memmap_on_memory" is set to 1 even though the vmemmap pages will not
be allocated
from the hotadded memory, since the former takes precedence over the
latter.

In the next patch we want to enable or disable the freeing of HugeTLB
vmemmap pages via sysctl. We need a way to know whether
memory_hotplug.memmap_on_memory is enabled when enabling the freeing of
vmemmap pages, since the two features are incompatible; however, the
variable "memmap_on_memory" cannot indicate this today. Do not set
"memmap_on_memory" to 1 when both parameters are passed on the command
line; this way, "memmap_on_memory" indicates whether the feature was
actually enabled by the user.

Also introduce the mhp_memmap_on_memory() helper and move the
definition of "memmap_on_memory" into the scope of
CONFIG_MHP_MEMMAP_ON_MEMORY, which saves sizeof(bool) of memory when
!CONFIG_MHP_MEMMAP_ON_MEMORY. In the next patch, mhp_memmap_on_memory()
will also be exported for use in hugetlb_vmemmap.c.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 mm/memory_hotplug.c | 32 ++++++++++++++++++++++++++------
 1 file changed, 26 insertions(+), 6 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index c226a337c1ef..d92edf102cfe 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -42,14 +42,36 @@
 #include "internal.h"
 #include "shuffle.h"
 
+#ifdef CONFIG_MHP_MEMMAP_ON_MEMORY
+static int memmap_on_memory_set(const char *val, const struct kernel_param *kp)
+{
+	if (hugetlb_free_vmemmap_enabled())
+		return 0;
+	return param_set_bool(val, kp);
+}
+
+static const struct kernel_param_ops memmap_on_memory_ops = {
+	.flags	= KERNEL_PARAM_OPS_FL_NOARG,
+	.set	= memmap_on_memory_set,
+	.get	= param_get_bool,
+};
 /*
  * memory_hotplug.memmap_on_memory parameter
  */
 static bool memmap_on_memory __ro_after_init;
-#ifdef CONFIG_MHP_MEMMAP_ON_MEMORY
-module_param(memmap_on_memory, bool, 0444);
+module_param_cb(memmap_on_memory, &memmap_on_memory_ops, &memmap_on_memory, 0444);
 MODULE_PARM_DESC(memmap_on_memory, "Enable memmap on memory for memory hotplug");
+
+static inline bool mhp_memmap_on_memory(void)
+{
+	return memmap_on_memory;
+}
+#else
+static inline bool mhp_memmap_on_memory(void)
+{
+	return false;
+}
 #endif
 
 enum {
@@ -1289,9 +1311,7 @@ bool mhp_supports_memmap_on_memory(unsigned long size)
 	 * altmap as an alternative source of memory, and we do not exactly
	 * populate a single PMD.
	 */
-	return memmap_on_memory &&
-	       !hugetlb_free_vmemmap_enabled() &&
-	       IS_ENABLED(CONFIG_MHP_MEMMAP_ON_MEMORY) &&
+	return mhp_memmap_on_memory() &&
	       size == memory_block_size_bytes() &&
	       IS_ALIGNED(vmemmap_size, PMD_SIZE) &&
	       IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT));
@@ -2075,7 +2095,7 @@ static int __ref try_remove_memory(u64 start, u64 size)
	 * We only support removing memory added with MHP_MEMMAP_ON_MEMORY in
	 * the same granularity it was added - a single memory block.
	 */
-	if (memmap_on_memory) {
+	if (mhp_memmap_on_memory()) {
		nr_vmemmap_pages = walk_memory_blocks(start, size, NULL,
						      get_nr_vmemmap_pages_cb);
		if (nr_vmemmap_pages) {

From patchwork Mon Mar 7 13:07:07 2022
X-Patchwork-Submitter: Muchun Song <songmuchun@bytedance.com>
X-Patchwork-Id: 12771828
From: Muchun Song <songmuchun@bytedance.com>
To: corbet@lwn.net, mike.kravetz@oracle.com, akpm@linux-foundation.org,
    mcgrof@kernel.org, keescook@chromium.org, yzaikin@google.com,
    osalvador@suse.de, david@redhat.com
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, duanxiongchun@bytedance.com, smuchun@gmail.com,
    Muchun Song <songmuchun@bytedance.com>
Subject: [PATCH v3 3/4] sysctl: allow to set extra1 to SYSCTL_ONE
Date: Mon, 7 Mar 2022 21:07:07 +0800
Message-Id: <20220307130708.58771-4-songmuchun@bytedance.com>
In-Reply-To: <20220307130708.58771-1-songmuchun@bytedance.com>
References: <20220307130708.58771-1-songmuchun@bytedance.com>

proc_do_static_key() does not consider the situation where a sysctl is
only allowed to be enabled and cannot be disabled under certain
circumstances, since it sets "->extra1" to SYSCTL_ZERO unconditionally.
This patch adds the ability to set "->extra1" accordingly.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 kernel/sysctl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 64065abf361e..ab3e9c937268 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1631,7 +1631,7 @@ int proc_do_static_key(struct ctl_table *table, int write,
		.data   = &val,
		.maxlen = sizeof(val),
		.mode   = table->mode,
-		.extra1 = SYSCTL_ZERO,
+		.extra1 = table->extra1 == SYSCTL_ONE ? SYSCTL_ONE : SYSCTL_ZERO,
		.extra2 = SYSCTL_ONE,
	};

From patchwork Mon Mar 7 13:07:08 2022
X-Patchwork-Submitter: Muchun Song <songmuchun@bytedance.com>
X-Patchwork-Id: 12771829
From: Muchun Song <songmuchun@bytedance.com>
To: corbet@lwn.net, mike.kravetz@oracle.com, akpm@linux-foundation.org,
    mcgrof@kernel.org, keescook@chromium.org, yzaikin@google.com,
    osalvador@suse.de, david@redhat.com
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, duanxiongchun@bytedance.com, smuchun@gmail.com,
    Muchun Song <songmuchun@bytedance.com>
Subject: [PATCH v3 4/4] mm: hugetlb: add hugetlb_free_vmemmap sysctl
Date: Mon, 7 Mar 2022 21:07:08 +0800
Message-Id: <20220307130708.58771-5-songmuchun@bytedance.com>
In-Reply-To: <20220307130708.58771-1-songmuchun@bytedance.com>
References: <20220307130708.58771-1-songmuchun@bytedance.com>

Currently we must add "hugetlb_free_vmemmap=on" to the kernel command
line and reboot the server to enable the freeing of HugeTLB vmemmap
pages, and rebooting usually takes a long time. Add a sysctl to enable
or disable the feature at runtime without rebooting. Disabling requires
that there are no optimized HugeTLB pages in the system; if disabling
fails, set "nr_hugepages" to 0 and then retry.
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 Documentation/admin-guide/sysctl/vm.rst |  14 ++++
 include/linux/memory_hotplug.h          |   9 +++
 mm/hugetlb_vmemmap.c                    | 113 +++++++++++++++++++++++++-------
 mm/hugetlb_vmemmap.h                    |   4 +-
 mm/memory_hotplug.c                     |   7 +-
 5 files changed, 116 insertions(+), 31 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index f4804ce37c58..9e0e153ed935 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -561,6 +561,20 @@ Change the minimum size of the hugepage pool.
 See Documentation/admin-guide/mm/hugetlbpage.rst
 
+hugetlb_free_vmemmap
+====================
+
+Enable (set to 1) or disable (set to 0) the optimization of vmemmap
+pages associated with each HugeTLB page. Once enabled, the vmemmap
+pages of HugeTLB pages subsequently allocated from the buddy system
+will be optimized, whereas already allocated HugeTLB pages will not
+be. Disabling is only allowed once there are no optimized HugeTLB
+pages left in the system; if disabling fails, set "nr_hugepages" to 0
+and then retry.
+
+See Documentation/admin-guide/mm/hugetlbpage.rst
+
+
 nr_hugepages_mempolicy
 ======================
 
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index e0b2209ab71c..20d7edf62a6a 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -351,4 +351,13 @@ void arch_remove_linear_mapping(u64 start, u64 size);
 extern bool mhp_supports_memmap_on_memory(unsigned long size);
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
+#ifdef CONFIG_MHP_MEMMAP_ON_MEMORY
+bool mhp_memmap_on_memory(void);
+#else
+static inline bool mhp_memmap_on_memory(void)
+{
+	return false;
+}
+#endif
+
 #endif /* __LINUX_MEMORY_HOTPLUG_H */
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 49bc7f845438..0f7fe49220cf 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -10,6 +10,7 @@
 #define pr_fmt(fmt)	"HugeTLB: " fmt
 
+#include 
 #include "hugetlb_vmemmap.h"
 
 /*
@@ -26,6 +27,10 @@ DEFINE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON,
			hugetlb_free_vmemmap_enabled_key);
 EXPORT_SYMBOL(hugetlb_free_vmemmap_enabled_key);
 
+/* Number of HugeTLB pages whose vmemmap pages are optimized. */
+static atomic_long_t optimized_pages = ATOMIC_LONG_INIT(0);
+static DECLARE_RWSEM(sysctl_rwsem);
+
 static int __init early_hugetlb_free_vmemmap_param(char *buf)
 {
	/* We cannot optimize if a "struct page" crosses page boundaries. */
@@ -48,11 +53,6 @@ static int __init early_hugetlb_free_vmemmap_param(char *buf)
 }
 early_param("hugetlb_free_vmemmap", early_hugetlb_free_vmemmap_param);
 
-static inline unsigned long free_vmemmap_pages_size_per_hpage(struct hstate *h)
-{
-	return (unsigned long)free_vmemmap_pages_per_hpage(h) << PAGE_SHIFT;
-}
-
 /*
  * Previously discarded vmemmap pages will be allocated and remapping
  * after this function returns zero.
@@ -61,14 +61,16 @@ int alloc_huge_page_vmemmap(struct hstate *h, struct page *head)
 {
	int ret;
	unsigned long vmemmap_addr = (unsigned long)head;
-	unsigned long vmemmap_end, vmemmap_reuse;
+	unsigned long vmemmap_end, vmemmap_reuse, vmemmap_pages;
 
	if (!HPageVmemmapOptimized(head))
		return 0;
 
-	vmemmap_addr += RESERVE_VMEMMAP_SIZE;
-	vmemmap_end = vmemmap_addr + free_vmemmap_pages_size_per_hpage(h);
-	vmemmap_reuse = vmemmap_addr - PAGE_SIZE;
+	vmemmap_addr	+= RESERVE_VMEMMAP_SIZE;
+	vmemmap_pages	= free_vmemmap_pages_per_hpage(h);
+	vmemmap_end	= vmemmap_addr + (vmemmap_pages << PAGE_SHIFT);
+	vmemmap_reuse	= vmemmap_addr - PAGE_SIZE;
+
	/*
	 * The pages which the vmemmap virtual address range [@vmemmap_addr,
	 * @vmemmap_end) are mapped to are freed to the buddy allocator, and
@@ -78,8 +80,14 @@ int alloc_huge_page_vmemmap(struct hstate *h, struct page *head)
	 */
	ret = vmemmap_remap_alloc(vmemmap_addr, vmemmap_end, vmemmap_reuse,
				  GFP_KERNEL | __GFP_NORETRY | __GFP_THISNODE);
-	if (!ret)
+	if (!ret) {
		ClearHPageVmemmapOptimized(head);
+		/*
+		 * Paired with acquire semantic in
+		 * hugetlb_free_vmemmap_handler().
+		 */
+		atomic_long_dec_return_release(&optimized_pages);
+	}
 
	return ret;
 }
@@ -87,22 +95,28 @@ int alloc_huge_page_vmemmap(struct hstate *h, struct page *head)
 void free_huge_page_vmemmap(struct hstate *h, struct page *head)
 {
	unsigned long vmemmap_addr = (unsigned long)head;
-	unsigned long vmemmap_end, vmemmap_reuse;
+	unsigned long vmemmap_end, vmemmap_reuse, vmemmap_pages;
 
-	if (!free_vmemmap_pages_per_hpage(h))
-		return;
+	down_read(&sysctl_rwsem);
+	vmemmap_pages = free_vmemmap_pages_per_hpage(h);
+	if (!vmemmap_pages)
+		goto out;
 
-	vmemmap_addr += RESERVE_VMEMMAP_SIZE;
-	vmemmap_end = vmemmap_addr + free_vmemmap_pages_size_per_hpage(h);
-	vmemmap_reuse = vmemmap_addr - PAGE_SIZE;
+	vmemmap_addr	+= RESERVE_VMEMMAP_SIZE;
+	vmemmap_end	= vmemmap_addr + (vmemmap_pages << PAGE_SHIFT);
+	vmemmap_reuse	= vmemmap_addr - PAGE_SIZE;
 
	/*
	 * Remap the vmemmap virtual address range [@vmemmap_addr, @vmemmap_end)
	 * to the page which @vmemmap_reuse is mapped to, then free the pages
	 * which the range [@vmemmap_addr, @vmemmap_end] is mapped to.
	 */
-	if (!vmemmap_remap_free(vmemmap_addr, vmemmap_end, vmemmap_reuse))
+	if (!vmemmap_remap_free(vmemmap_addr, vmemmap_end, vmemmap_reuse)) {
		SetHPageVmemmapOptimized(head);
+		atomic_long_inc(&optimized_pages);
+	}
+out:
+	up_read(&sysctl_rwsem);
 }
 
 void __init hugetlb_vmemmap_init(struct hstate *h)
@@ -118,18 +132,16 @@ void __init hugetlb_vmemmap_init(struct hstate *h)
	BUILD_BUG_ON(__NR_USED_SUBPAGE >=
		     RESERVE_VMEMMAP_SIZE / sizeof(struct page));
 
-	if (!hugetlb_free_vmemmap_enabled())
-		return;
-
-	if (IS_ENABLED(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON) &&
-	    !is_power_of_2(sizeof(struct page))) {
+	if (!is_power_of_2(sizeof(struct page))) {
		/*
		 * The hugetlb_free_vmemmap_enabled_key can be enabled when
		 * CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON. It should
		 * be disabled if "struct page" crosses page boundaries.
		 */
-		pr_warn_once("cannot free vmemmap pages because \"struct page\" crosses page boundaries\n");
-		static_branch_disable(&hugetlb_free_vmemmap_enabled_key);
+		if (IS_ENABLED(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON)) {
+			pr_warn_once("cannot free vmemmap pages because \"struct page\" crosses page boundaries\n");
+			static_branch_disable(&hugetlb_free_vmemmap_enabled_key);
+		}
		return;
	}
 
@@ -148,3 +160,56 @@ void __init hugetlb_vmemmap_init(struct hstate *h)
	pr_info("can free %d vmemmap pages for %s\n", h->nr_free_vmemmap_pages,
		h->name);
 }
+
+static int hugetlb_free_vmemmap_handler(struct ctl_table *table, int write,
+					void *buffer, size_t *length,
+					loff_t *ppos)
+{
+	int ret;
+
+	down_write(&sysctl_rwsem);
+	/*
+	 * Cannot be disabled when there is at least one optimized
+	 * HugeTLB page in the system.
+	 *
+	 * The acquire semantic is paired with release semantic in
+	 * alloc_huge_page_vmemmap(). If we saw the @optimized_pages
+	 * with 0, all the operations of vmemmap pages remapping from
+	 * alloc_huge_page_vmemmap() are visible too so that we can
+	 * safely disable static key.
+	 */
+	table->extra1 = atomic_long_read_acquire(&optimized_pages) ?
+			SYSCTL_ONE : SYSCTL_ZERO;
+	ret = proc_do_static_key(table, write, buffer, length, ppos);
+	up_write(&sysctl_rwsem);
+
+	return ret;
+}
+
+static struct ctl_table hugetlb_vmemmap_sysctls[] = {
+	{
+		.procname	= "hugetlb_free_vmemmap",
+		.data		= &hugetlb_free_vmemmap_enabled_key.key,
+		.mode		= 0644,
+		.proc_handler	= hugetlb_free_vmemmap_handler,
+	},
+	{ }
+};
+
+static __init int hugetlb_vmemmap_sysctls_init(void)
+{
+	if (!is_power_of_2(sizeof(struct page)))
+		return 0;
+
+	/*
+	 * The vmemmap pages cannot be optimized if
+	 * "memory_hotplug.memmap_on_memory" is enabled.
+	 */
+	if (mhp_memmap_on_memory())
+		return 0;
+
+	register_sysctl_init("vm", hugetlb_vmemmap_sysctls);
+
+	return 0;
+}
+late_initcall(hugetlb_vmemmap_sysctls_init);
diff --git a/mm/hugetlb_vmemmap.h b/mm/hugetlb_vmemmap.h
index cb2bef8f9e73..b67a159027f4 100644
--- a/mm/hugetlb_vmemmap.h
+++ b/mm/hugetlb_vmemmap.h
@@ -21,7 +21,9 @@ void hugetlb_vmemmap_init(struct hstate *h);
  */
 static inline unsigned int free_vmemmap_pages_per_hpage(struct hstate *h)
 {
-	return h->nr_free_vmemmap_pages;
+	if (hugetlb_free_vmemmap_enabled())
+		return h->nr_free_vmemmap_pages;
+	return 0;
 }
 #else
 static inline int alloc_huge_page_vmemmap(struct hstate *h, struct page *head)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index d92edf102cfe..e69c31cea917 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -63,15 +63,10 @@ static bool memmap_on_memory __ro_after_init;
 module_param_cb(memmap_on_memory, &memmap_on_memory_ops, &memmap_on_memory, 0444);
 MODULE_PARM_DESC(memmap_on_memory, "Enable memmap on memory for memory hotplug");
 
-static inline bool mhp_memmap_on_memory(void)
+bool mhp_memmap_on_memory(void)
 {
	return memmap_on_memory;
 }
-#else
-static inline bool mhp_memmap_on_memory(void)
-{
-	return false;
-}
 #endif
 
 enum {