From patchwork Mon Jun 13 06:35:07 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Muchun Song X-Patchwork-Id: 12879051 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id E6C1BCCA47E for ; Mon, 13 Jun 2022 06:35:33 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 5CA798D0158; Mon, 13 Jun 2022 02:35:33 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 579E78D0142; Mon, 13 Jun 2022 02:35:33 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4059D8D0158; Mon, 13 Jun 2022 02:35:33 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 308A08D0142 for ; Mon, 13 Jun 2022 02:35:33 -0400 (EDT) Received: from smtpin11.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay12.hostedemail.com (Postfix) with ESMTP id 0E389120676 for ; Mon, 13 Jun 2022 06:35:33 +0000 (UTC) X-FDA: 79572251346.11.3105D7E Received: from mail-pj1-f48.google.com (mail-pj1-f48.google.com [209.85.216.48]) by imf31.hostedemail.com (Postfix) with ESMTP id AED402009E for ; Mon, 13 Jun 2022 06:35:32 +0000 (UTC) Received: by mail-pj1-f48.google.com with SMTP id o6-20020a17090a0a0600b001e2c6566046so7954623pjo.0 for ; Sun, 12 Jun 2022 23:35:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=B72Ua44K+e6NGA36blstxa09i97avcKmfrwwXvB2UR0=; b=qwkYRnZuS/W9slMWkIAwYldruSCyscLX8/wyF7kLD8G4uAeT2sqkPMDJ9FLteGBkba jPtSNz9K5I6n2/vLHYGNDpiJsEr6w0AdooLipoVAUAZ8Y4EBOxLw9+mQhljpgwFEQzqf yztz6xnzmXcan/IcMsTOWEk5Pil7u0lNfnQePAdqkP+93elHBvBPF44pEJT1wFoWpLse 3kVLIYgT3gyvTFurcJnYDGckq9NnRYCNM90frf7MuhOYmtrIXMfiKNc7i4JsrOf8flV7 yS3Q7pWIirK0dCxNgn28AalTjpfE0oz8lfmYe0RuwZiDGEikCM0cN6ykGA39pbDhQ8x8 Fgew== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=B72Ua44K+e6NGA36blstxa09i97avcKmfrwwXvB2UR0=; b=vNJmPLD4PzISAjEFsbVdifFR5goaZo2nejL59tYQCSYHF0QlBd4qI4z7StzHP9FUb6 xtgrCXo9UgFSmpyf6BVV/+4MHRdGud+laJ3wnt30NRiOvVGKxp56nU0WnG2HxCMcDS3G wAWoFo4s+Sm9G6tBxAf8KarNoCh5YaIufE8dACgvRPc9B1nKPyypizFYpEwY/N9+mmNk YT7p0j8/zpPUvLgonjQXKzgsOgG/eMWdgLhyWj84ZeUHZfmbZGqo/GdFhATkkdeKtf5+ lu3iAnoQ71yO6MDuQYVrMA4FISCQPtnvH7zazSB9NWjwtsuy8gwykW9tOFvKUkeFi/5K bdLw== X-Gm-Message-State: AOAM5300fbWp687HYRI81w9ONo2mPRAptSvMDD9OZ7001Eha2g+c2h8G pOYxZo7HbCmhQnb8SVcMTpAtXA== X-Google-Smtp-Source: ABdhPJy8ePrJh8eNpFPc7r+hoH67PAyb+LUzYIz//Skci1r3NK3WQ1UlwXpoXXJMTLqE6uQnUgUeXg== X-Received: by 2002:a17:902:d2c9:b0:167:1195:3a41 with SMTP id n9-20020a170902d2c900b0016711953a41mr51441419plc.126.1655102131696; Sun, 12 Jun 2022 23:35:31 -0700 (PDT) Received: from FVFYT0MHHV2J.bytedance.net ([139.177.225.255]) by smtp.gmail.com with ESMTPSA id v3-20020aa799c3000000b0051bc538baadsm4366554pfi.184.2022.06.12.23.35.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 12 Jun 2022 23:35:31 -0700 (PDT) From: Muchun Song To: mike.kravetz@oracle.com, david@redhat.com, akpm@linux-foundation.org, corbet@lwn.net Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, Muchun Song , Catalin Marinas , Will Deacon , Anshuman Khandual Subject: [PATCH 1/6] mm: hugetlb_vmemmap: delete hugetlb_optimize_vmemmap_enabled() Date: Mon, 13 Jun 2022 14:35:07 +0800 Message-Id: <20220613063512.17540-2-songmuchun@bytedance.com> X-Mailer: git-send-email 2.32.1 (Apple Git-133) In-Reply-To: <20220613063512.17540-1-songmuchun@bytedance.com> References: <20220613063512.17540-1-songmuchun@bytedance.com> MIME-Version: 1.0 ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1655102132; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=B72Ua44K+e6NGA36blstxa09i97avcKmfrwwXvB2UR0=; b=8Mwzv04JnD6h7jqlemaorMmQatgI9MCTz4OF9ujHKDCEZ8KQZtcBVSR0a79AT8KE1Z7CK+ Evx0g5sV3lY4OYTs6EL5BUtL0szKgd/2D3jlqqE9gmiOFlY/O+vFohaexrCUPkq8RWZiO8 VZKQ2Gw6K43UEnRIqgYfq4NPLjdMMDY= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1655102132; a=rsa-sha256; cv=none; b=H59rR4++fEoBfxMhC4bdDpvvfrIWOqbSt+FG6cM/NgEsWKY55oDbTekuXZ6J7+/Xyiy8A/ LJuwMjQq8xXyxxLN6mv75d4BkVQEw1syFrQZAqLH9NPPvpcjPM7wRTK9Xga1YXMRx8S+bb tV/mzNk9bADQYbGCcXVr+lntQhlqnQc= ARC-Authentication-Results: i=1; imf31.hostedemail.com; dkim=pass header.d=bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=qwkYRnZu; dmarc=pass (policy=none) header.from=bytedance.com; spf=pass (imf31.hostedemail.com: domain of songmuchun@bytedance.com designates 209.85.216.48 as permitted sender) smtp.mailfrom=songmuchun@bytedance.com Authentication-Results: imf31.hostedemail.com; dkim=pass header.d=bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=qwkYRnZu; dmarc=pass (policy=none) header.from=bytedance.com; spf=pass (imf31.hostedemail.com: domain of songmuchun@bytedance.com designates 209.85.216.48 as permitted sender) smtp.mailfrom=songmuchun@bytedance.com X-Rspamd-Server: rspam08 X-Rspam-User: X-Stat-Signature: ifoajeixkkgq7gc5dowc76yu3gafiodg X-Rspamd-Queue-Id: AED402009E X-HE-Tag: 1655102132-259202 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The name hugetlb_optimize_vmemmap_enabled() a bit confusing as it tests two conditions (enabled and pages in use). Instead of coming up to an appropriate name, we could just delete it. There is already a discussion about deleting it in thread [1]. There is only one user of hugetlb_optimize_vmemmap_enabled() outside of hugetlb_vmemmap, that is flush_dcache_page() in arch/arm64/mm/flush.c. However, it does not need to call hugetlb_optimize_vmemmap_enabled() in flush_dcache_page() since HugeTLB pages are always fully mapped and only head page will be set PG_dcache_clean meaning only head page's flag may need to be cleared (see commit cf5a501d985b). So it is easy to remove hugetlb_optimize_vmemmap_enabled(). Link: https://lore.kernel.org/all/c77c61c8-8a5a-87e8-db89-d04d8aaab4cc@oracle.com/ [1] Signed-off-by: Muchun Song Cc: Catalin Marinas Cc: Will Deacon Cc: Anshuman Khandual Reviewed-by: Oscar Salvador Reviewed-by: Mike Kravetz Reviewed-by: Catalin Marinas --- arch/arm64/mm/flush.c | 13 +++---------- include/linux/page-flags.h | 14 ++------------ 2 files changed, 5 insertions(+), 22 deletions(-) diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c index fc4f710e9820..5f9379b3c8c8 100644 --- a/arch/arm64/mm/flush.c +++ b/arch/arm64/mm/flush.c @@ -76,17 +76,10 @@ EXPORT_SYMBOL_GPL(__sync_icache_dcache); void flush_dcache_page(struct page *page) { /* - * Only the head page's flags of HugeTLB can be cleared since the tail - * vmemmap pages associated with each HugeTLB page are mapped with - * read-only when CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP is enabled (more - * details can refer to vmemmap_remap_pte()). Although - * __sync_icache_dcache() only set PG_dcache_clean flag on the head - * page struct, there is more than one page struct with PG_dcache_clean - * associated with the HugeTLB page since the head vmemmap page frame - * is reused (more details can refer to the comments above - * page_fixed_fake_head()). + * HugeTLB pages are always fully mapped and only head page will be + * set PG_dcache_clean (see comments in __sync_icache_dcache()). */ - if (hugetlb_optimize_vmemmap_enabled() && PageHuge(page)) + if (PageHuge(page)) page = compound_head(page); if (test_bit(PG_dcache_clean, &page->flags)) diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index de80f0c26b2f..b8b992cb201c 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -203,12 +203,6 @@ enum pageflags { DECLARE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON, hugetlb_optimize_vmemmap_key); -static __always_inline bool hugetlb_optimize_vmemmap_enabled(void) -{ - return static_branch_maybe(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON, - &hugetlb_optimize_vmemmap_key); -} - /* * If the feature of optimizing vmemmap pages associated with each HugeTLB * page is enabled, the head vmemmap page frame is reused and all of the tail @@ -227,7 +221,8 @@ static __always_inline bool hugetlb_optimize_vmemmap_enabled(void) */ static __always_inline const struct page *page_fixed_fake_head(const struct page *page) { - if (!hugetlb_optimize_vmemmap_enabled()) + if (!static_branch_maybe(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON, + &hugetlb_optimize_vmemmap_key)) return page; /* @@ -255,11 +250,6 @@ static inline const struct page *page_fixed_fake_head(const struct page *page) { return page; } - -static inline bool hugetlb_optimize_vmemmap_enabled(void) -{ - return false; -} #endif static __always_inline int page_is_fake_head(struct page *page) From patchwork Mon Jun 13 06:35:08 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Muchun Song X-Patchwork-Id: 12879052 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1CEAACCA47B for ; Mon, 13 Jun 2022 06:35:37 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id AD36F8D0159; Mon, 13 Jun 2022 02:35:36 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id A83238D0142; Mon, 13 Jun 2022 02:35:36 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 924378D0159; Mon, 13 Jun 2022 02:35:36 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 810A98D0142 for ; Mon, 13 Jun 2022 02:35:36 -0400 (EDT) Received: from smtpin17.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay12.hostedemail.com (Postfix) with ESMTP id 550C0120676 for ; Mon, 13 Jun 2022 06:35:36 +0000 (UTC) X-FDA: 79572251472.17.2B4CC38 Received: from mail-pj1-f47.google.com (mail-pj1-f47.google.com [209.85.216.47]) by imf28.hostedemail.com (Postfix) with ESMTP id F0B30C009E for ; Mon, 13 Jun 2022 06:35:35 +0000 (UTC) Received: by mail-pj1-f47.google.com with SMTP id y13-20020a17090a154d00b001eaaa3b9b8dso2168078pja.2 for ; Sun, 12 Jun 2022 23:35:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=XLBHCSl3x40n/JirLCrBjaqZu+zJJs2ufPMKi7d2Sww=; b=oyts69b4avBW6IfgVF+4QpgyMTlLyDvTK8H8cgqFmeMxrl7bKHSPH0QcLfPSUXkpFx nvC3JmFuxzPBTZ08riXpF3Hnn0WwSf0Xz5YoNlEhowi0FwIsSdFz3JNWmEg73Mu8zFZL yTHQ74HZI/9sqE+udr02AltX1Wk3vQDAL+J9nK9PLL2UcKGGnsTrwzNPqPTZC0uucGgB SIfPr0Zg0Vv+ShY5B1ecsFgDlSe6swxvd7QQA3/qQk0QLSHfPE2ri/ZSGcRQNsI0mcpp c9v+ELelY0gKRbnsL7BJ+AN8XbTC/qE5w/gmOik/HPgrmqeYOvTpOOBkPbC19WeXC67S 4ytg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=XLBHCSl3x40n/JirLCrBjaqZu+zJJs2ufPMKi7d2Sww=; b=UNPIeOEG+dDEoCO0/tjXhZWJ1dp7F0u2VuPYOYJp1MvqrSsTifjcJ3Ojdvj9ev+gu6 77PqkD1RwswOhZM7d4pkHDldN5FGdPnQt9484wcDr8rS+OjYjGdtQ1DaGVi+xDfTlcVZ kbc+Xvtj8SJBlXBArSzfD9zBIESQW9chUFJprfq2STtZGOEyM1q83SUEY6zSXyPKHlON qc8PFuv5koakQhtjjPeo5aBHJZj4yozXzI8/PcIVOPPJ6qYbXPtZwFjfpxVzz/nPHHtn R1KonmbfKTVbZytekCpn+1j4QNLafN3pdctkTBf47O9uAO+UwWSQxYX3T6x1YaGLg6nh DTRw== X-Gm-Message-State: AOAM533T2Nr6ZRhuGgvZE0yraXWULPrZN8CggDGsO3DevOvGM4kGm18M IgYZxdWiXCPJU76zXgpWQOh7Gw== X-Google-Smtp-Source: ABdhPJxCusXzGwHuisl+klFQ+x+A+Evfz+qDo9pEo1G3bTqjCJLS5ohmbbAPHwTAHuGVHxld3Y5Wog== X-Received: by 2002:a17:902:e742:b0:166:4d34:3be8 with SMTP id p2-20020a170902e74200b001664d343be8mr53370044plf.140.1655102135058; Sun, 12 Jun 2022 23:35:35 -0700 (PDT) Received: from FVFYT0MHHV2J.bytedance.net ([139.177.225.255]) by smtp.gmail.com with ESMTPSA id v3-20020aa799c3000000b0051bc538baadsm4366554pfi.184.2022.06.12.23.35.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 12 Jun 2022 23:35:34 -0700 (PDT) From: Muchun Song To: mike.kravetz@oracle.com, david@redhat.com, akpm@linux-foundation.org, corbet@lwn.net Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, Muchun Song Subject: [PATCH 2/6] mm: hugetlb_vmemmap: optimize vmemmap_optimize_mode handling Date: Mon, 13 Jun 2022 14:35:08 +0800 Message-Id: <20220613063512.17540-3-songmuchun@bytedance.com> X-Mailer: git-send-email 2.32.1 (Apple Git-133) In-Reply-To: <20220613063512.17540-1-songmuchun@bytedance.com> References: <20220613063512.17540-1-songmuchun@bytedance.com> MIME-Version: 1.0 ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1655102136; a=rsa-sha256; cv=none; b=YRr2TMnaYqgk8Ccjdq7wWGXKDgQDWIeQ0fYGWvbq5B2fvNMNy3Rsf+3KDL+97qKU/mvezt ngGBbJq8QzDmjhbyrGkXRglaghtIVTakYdj/A/ZoFy5daxRSEbpb+IllbwV4xlRTUQsjJP ufJ24OKVpE4n6g8huGGGrpQreCqiVag= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1655102136; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=XLBHCSl3x40n/JirLCrBjaqZu+zJJs2ufPMKi7d2Sww=; b=JSr2mkPyj0LwVvCPj3mQ3HXG6z1w5RvD+1MlDpk3wFlOPq1ERwolqBVw7y7Yli3J73X/p/ B4zYKSFjmkSxnCV1S69W4Gxh/e0TNOXxd8/hvNy1qZvW8HJxISGMWnh/ItyntpEdeww8UJ Wrjh0w+kzTVvG2QTuIkq+fkAuD0Vupg= ARC-Authentication-Results: i=1; imf28.hostedemail.com; dkim=pass header.d=bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=oyts69b4; dmarc=pass (policy=none) header.from=bytedance.com; spf=pass (imf28.hostedemail.com: domain of songmuchun@bytedance.com designates 209.85.216.47 as permitted sender) smtp.mailfrom=songmuchun@bytedance.com Authentication-Results: imf28.hostedemail.com; dkim=pass header.d=bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=oyts69b4; dmarc=pass (policy=none) header.from=bytedance.com; spf=pass (imf28.hostedemail.com: domain of songmuchun@bytedance.com designates 209.85.216.47 as permitted sender) smtp.mailfrom=songmuchun@bytedance.com X-Stat-Signature: c3cek5fgpbcpa3xokbbbsh6ghufco7jk X-Rspamd-Queue-Id: F0B30C009E X-Rspamd-Server: rspam12 X-Rspam-User: X-HE-Tag: 1655102135-60986 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: We hold an another reference to hugetlb_optimize_vmemmap_key when making vmemmap_optimize_mode on, because we use static_key to tell memory_hotplug that memory_hotplug.memmap_on_memory should be overridden. However, this rule has gone when we have introduced SECTION_CANNOT_OPTIMIZE_VMEMMAP. Therefore, we could simplify vmemmap_optimize_mode handling by not holding an another reference to hugetlb_optimize_vmemmap_key. Signed-off-by: Muchun Song Reviewed-by: Oscar Salvador Reviewed-by: Mike Kravetz --- include/linux/page-flags.h | 6 ++--- mm/hugetlb_vmemmap.c | 65 +++++----------------------------------------- 2 files changed, 9 insertions(+), 62 deletions(-) diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index b8b992cb201c..da7ccc3b16ad 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -200,8 +200,7 @@ enum pageflags { #ifndef __GENERATING_BOUNDS_H #ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP -DECLARE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON, - hugetlb_optimize_vmemmap_key); +DECLARE_STATIC_KEY_FALSE(hugetlb_optimize_vmemmap_key); /* * If the feature of optimizing vmemmap pages associated with each HugeTLB @@ -221,8 +220,7 @@ DECLARE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON, */ static __always_inline const struct page *page_fixed_fake_head(const struct page *page) { - if (!static_branch_maybe(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON, - &hugetlb_optimize_vmemmap_key)) + if (!static_branch_unlikely(&hugetlb_optimize_vmemmap_key)) return page; /* diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c index e20a7082f2f8..132dc83f0130 100644 --- a/mm/hugetlb_vmemmap.c +++ b/mm/hugetlb_vmemmap.c @@ -23,42 +23,15 @@ #define RESERVE_VMEMMAP_NR 1U #define RESERVE_VMEMMAP_SIZE (RESERVE_VMEMMAP_NR << PAGE_SHIFT) -enum vmemmap_optimize_mode { - VMEMMAP_OPTIMIZE_OFF, - VMEMMAP_OPTIMIZE_ON, -}; - -DEFINE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON, - hugetlb_optimize_vmemmap_key); +DEFINE_STATIC_KEY_FALSE(hugetlb_optimize_vmemmap_key); EXPORT_SYMBOL(hugetlb_optimize_vmemmap_key); -static enum vmemmap_optimize_mode vmemmap_optimize_mode = +static bool vmemmap_optimize_enabled = IS_ENABLED(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON); -static void vmemmap_optimize_mode_switch(enum vmemmap_optimize_mode to) -{ - if (vmemmap_optimize_mode == to) - return; - - if (to == VMEMMAP_OPTIMIZE_OFF) - static_branch_dec(&hugetlb_optimize_vmemmap_key); - else - static_branch_inc(&hugetlb_optimize_vmemmap_key); - WRITE_ONCE(vmemmap_optimize_mode, to); -} - static int __init hugetlb_vmemmap_early_param(char *buf) { - bool enable; - enum vmemmap_optimize_mode mode; - - if (kstrtobool(buf, &enable)) - return -EINVAL; - - mode = enable ? VMEMMAP_OPTIMIZE_ON : VMEMMAP_OPTIMIZE_OFF; - vmemmap_optimize_mode_switch(mode); - - return 0; + return kstrtobool(buf, &vmemmap_optimize_enabled); } early_param("hugetlb_free_vmemmap", hugetlb_vmemmap_early_param); @@ -103,7 +76,7 @@ static unsigned int optimizable_vmemmap_pages(struct hstate *h, unsigned long pfn = page_to_pfn(head); unsigned long end = pfn + pages_per_huge_page(h); - if (READ_ONCE(vmemmap_optimize_mode) == VMEMMAP_OPTIMIZE_OFF) + if (!READ_ONCE(vmemmap_optimize_enabled)) return 0; for (; pfn < end; pfn += PAGES_PER_SECTION) { @@ -155,7 +128,6 @@ void __init hugetlb_vmemmap_init(struct hstate *h) if (!is_power_of_2(sizeof(struct page))) { pr_warn_once("cannot optimize vmemmap pages because \"struct page\" crosses page boundaries\n"); - static_branch_disable(&hugetlb_optimize_vmemmap_key); return; } @@ -176,36 +148,13 @@ void __init hugetlb_vmemmap_init(struct hstate *h) } #ifdef CONFIG_PROC_SYSCTL -static int hugetlb_optimize_vmemmap_handler(struct ctl_table *table, int write, - void *buffer, size_t *length, - loff_t *ppos) -{ - int ret; - enum vmemmap_optimize_mode mode; - static DEFINE_MUTEX(sysctl_mutex); - - if (write && !capable(CAP_SYS_ADMIN)) - return -EPERM; - - mutex_lock(&sysctl_mutex); - mode = vmemmap_optimize_mode; - table->data = &mode; - ret = proc_dointvec_minmax(table, write, buffer, length, ppos); - if (write && !ret) - vmemmap_optimize_mode_switch(mode); - mutex_unlock(&sysctl_mutex); - - return ret; -} - static struct ctl_table hugetlb_vmemmap_sysctls[] = { { .procname = "hugetlb_optimize_vmemmap", - .maxlen = sizeof(enum vmemmap_optimize_mode), + .data = &vmemmap_optimize_enabled, + .maxlen = sizeof(int), .mode = 0644, - .proc_handler = hugetlb_optimize_vmemmap_handler, - .extra1 = SYSCTL_ZERO, - .extra2 = SYSCTL_ONE, + .proc_handler = proc_dobool, }, { } }; From patchwork Mon Jun 13 06:35:09 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Muchun Song X-Patchwork-Id: 12879053 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6BFA1C43334 for ; Mon, 13 Jun 2022 06:35:41 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id E077E8D015A; Mon, 13 Jun 2022 02:35:40 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id DB7498D0142; Mon, 13 Jun 2022 02:35:40 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C58738D015A; Mon, 13 Jun 2022 02:35:40 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id B36C98D0142 for ; Mon, 13 Jun 2022 02:35:40 -0400 (EDT) Received: from smtpin01.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay11.hostedemail.com (Postfix) with ESMTP id 8E28980BD8 for ; Mon, 13 Jun 2022 06:35:40 +0000 (UTC) X-FDA: 79572251640.01.91D9C99 Received: from mail-pg1-f170.google.com (mail-pg1-f170.google.com [209.85.215.170]) by imf03.hostedemail.com (Postfix) with ESMTP id 0592E20042 for ; Mon, 13 Jun 2022 06:35:39 +0000 (UTC) Received: by mail-pg1-f170.google.com with SMTP id q140so4727489pgq.6 for ; Sun, 12 Jun 2022 23:35:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=2+OKPYQ5REAoye/wSsuy111hflzgcPbPLQAUS9w4vzY=; b=UYN43vXERJXZ9csVHQZAu3w+v7BI/PQ0oeR4+TZu85yj6WdN/18M7tPeNA86JQv0e9 SgArwTPIRb2QOuLEAdNysGtz7p4ElisPmsr3carhhenLFpoFRcylbY/nbwXXKm2HtLRF Wol3JvHDVcHMeKhemIgfBhErkg6/Jo7+PB+1RDjJb7PRfjKx2EldoY8ueWHgFrIyaC5f 4irSNHL3qy7UKj3th4VmZDa8vJ4HYoc01II29r3nFIT7jgbCEGQXpmHxkYZbk+1TRwJR cY4zcduFoLB93qL6h4m9LTBxkNG2//NyIft3/u5IYhYM0QwOCiIswLcEdxAe2XGt3SIx f1PQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=2+OKPYQ5REAoye/wSsuy111hflzgcPbPLQAUS9w4vzY=; b=g7KT9fhQuMy1ztUqcB4F0BVLc4qo1sksNrJMD8kmAmPUnH1YkMKWnWgx+xteOGw6BR 19EZ4GPB9nW6/dkQwPcboo9ICb/qGCH3lAyqhBH7c3w3/I+vaCiUeP4UWfZB3VrhIsXd OlU/FEiMVE+SKjTMSsac0Rmwl8SjLvq+on+3NA519nvXEXim745p4BaYnl95MPGyWJVH w3r/ft9bz0Zn3TBoNo+aJXABlspaBY2dwDlgjhRKwsVHlnoZipE2WEcMOKokTz8PZXOv R01+ZqZA7wGbxAMScUIwr9W/giJ1EaGTXBCJ43Oe80R54Ba8nJmyg4wCvL6j9TSMnjwe VRWw== X-Gm-Message-State: AOAM530JttdN+gPDkmGwRJUNWUAPztHXs7AdeB0fewPzjHNaZwXwD3/R HJAA3RNPhOmO9u4hd5n3/Fi23w== X-Google-Smtp-Source: ABdhPJxz7UwWsexBbVriPZbR0Xz+JC+GNqKyElGl7birTSFGxX0z0EQ/fFZlYKwK8LYDlaLorQU6Ug== X-Received: by 2002:a63:cc09:0:b0:3fb:aae7:4964 with SMTP id x9-20020a63cc09000000b003fbaae74964mr49566503pgf.118.1655102139008; Sun, 12 Jun 2022 23:35:39 -0700 (PDT) Received: from FVFYT0MHHV2J.bytedance.net ([139.177.225.255]) by smtp.gmail.com with ESMTPSA id v3-20020aa799c3000000b0051bc538baadsm4366554pfi.184.2022.06.12.23.35.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 12 Jun 2022 23:35:38 -0700 (PDT) From: Muchun Song To: mike.kravetz@oracle.com, david@redhat.com, akpm@linux-foundation.org, corbet@lwn.net Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, Muchun Song Subject: [PATCH 3/6] mm: hugetlb_vmemmap: introduce the name HVO Date: Mon, 13 Jun 2022 14:35:09 +0800 Message-Id: <20220613063512.17540-4-songmuchun@bytedance.com> X-Mailer: git-send-email 2.32.1 (Apple Git-133) In-Reply-To: <20220613063512.17540-1-songmuchun@bytedance.com> References: <20220613063512.17540-1-songmuchun@bytedance.com> MIME-Version: 1.0 ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1655102140; a=rsa-sha256; cv=none; b=QiXSH7k6p6wYPCZOgXXlWo/XYe6OPNjStcw101GpcIOfnKC+3q+Tw2R+X+P3FcsP0+R8SP ucWjT6+ukhCRR7o3WzfDgDjXJFVEblmU4iCVlLOQ1imDUuhMkWmYdh3475+kseDWTnMyM7 7agBJ25tUSWx9y77bFB5P4q0sQLy5Uo= ARC-Authentication-Results: i=1; imf03.hostedemail.com; dkim=pass header.d=bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=UYN43vXE; spf=pass (imf03.hostedemail.com: domain of songmuchun@bytedance.com designates 209.85.215.170 as permitted sender) smtp.mailfrom=songmuchun@bytedance.com; dmarc=pass (policy=none) header.from=bytedance.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1655102140; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=2+OKPYQ5REAoye/wSsuy111hflzgcPbPLQAUS9w4vzY=; b=2ga+GU+YJKhsrqMbniUoO5zzGcPjAgCjd6Zal7ZsyXh6BlBuS2p7NOTPUOpyO83lB0RNrS DG9yGyrkI1KzYVI16A6o/PAt6Zw3omp33Vhc2vcMbbUKuYLg5XLEAdvvCyjQehx9U361l5 eEr/653t7uFqlebR9n3ToQVlGpZxam8= X-Rspamd-Queue-Id: 0592E20042 Authentication-Results: imf03.hostedemail.com; dkim=pass header.d=bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=UYN43vXE; spf=pass (imf03.hostedemail.com: domain of songmuchun@bytedance.com designates 209.85.215.170 as permitted sender) smtp.mailfrom=songmuchun@bytedance.com; dmarc=pass (policy=none) header.from=bytedance.com X-Rspam-User: X-Rspamd-Server: rspam06 X-Stat-Signature: t4bumt361xot3n63rd16w36ij1twa9an X-HE-Tag: 1655102139-73857 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: It it inconvenient to mention the feature of optimizing vmemmap pages associated with HugeTLB pages when communicating with others since there is no specific or abbreviated name for it when it is first introduced. Let us give it a name HVO (HugeTLB Vmemmap Optimization) from now. This commit also updates the document about "hugetlb_free_vmemmap" by the way discussed in thread [1]. Link: https://lore.kernel.org/all/21aae898-d54d-cc4b-a11f-1bb7fddcfffa@redhat.com/ [1] Signed-off-by: Muchun Song Reviewed-by: Oscar Salvador Reviewed-by: Mike Kravetz --- Documentation/admin-guide/kernel-parameters.txt | 7 ++++--- Documentation/admin-guide/mm/hugetlbpage.rst | 3 +-- Documentation/admin-guide/sysctl/vm.rst | 3 +-- fs/Kconfig | 13 ++++++------- mm/hugetlb_vmemmap.c | 8 ++++---- mm/hugetlb_vmemmap.h | 4 ++-- 6 files changed, 18 insertions(+), 20 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 391b43fee93e..7539553b3fb0 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -1725,12 +1725,13 @@ hugetlb_free_vmemmap= [KNL] Reguires CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP enabled. + Control if HugeTLB Vmemmap Optimization (HVO) is enabled. Allows heavy hugetlb users to free up some more memory (7 * PAGE_SIZE for each 2MB hugetlb page). - Format: { [oO][Nn]/Y/y/1 | [oO][Ff]/N/n/0 (default) } + Format: { on | off (default) } - [oO][Nn]/Y/y/1: enable the feature - [oO][Ff]/N/n/0: disable the feature + on: enable HVO + off: disable HVO Built with CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON=y, the default is on. diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst b/Documentation/admin-guide/mm/hugetlbpage.rst index a90330d0a837..64e0d5c512e7 100644 --- a/Documentation/admin-guide/mm/hugetlbpage.rst +++ b/Documentation/admin-guide/mm/hugetlbpage.rst @@ -164,8 +164,7 @@ default_hugepagesz will all result in 256 2M huge pages being allocated. Valid default huge page size is architecture dependent. hugetlb_free_vmemmap - When CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP is set, this enables optimizing - unused vmemmap pages associated with each HugeTLB page. + When CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP is set, this enables HVO. When multiple huge page sizes are supported, ``/proc/sys/vm/nr_hugepages`` indicates the current number of pre-allocated huge pages of the default size. diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst index d7374a1e8ac9..c9f35db973f0 100644 --- a/Documentation/admin-guide/sysctl/vm.rst +++ b/Documentation/admin-guide/sysctl/vm.rst @@ -569,8 +569,7 @@ This knob is not available when the size of 'struct page' (a structure defined in include/linux/mm_types.h) is not power of two (an unusual system config could result in this). -Enable (set to 1) or disable (set to 0) the feature of optimizing vmemmap pages -associated with each HugeTLB page. +Enable (set to 1) or disable (set to 0) HugeTLB Vmemmap Optimization (HVO). Once enabled, the vmemmap pages of subsequent allocation of HugeTLB pages from buddy allocator will be optimized (7 pages per 2MB HugeTLB page and 4095 pages diff --git a/fs/Kconfig b/fs/Kconfig index 5976eb33535f..2f9fd840cb66 100644 --- a/fs/Kconfig +++ b/fs/Kconfig @@ -247,8 +247,7 @@ config HUGETLB_PAGE # # Select this config option from the architecture Kconfig, if it is preferred -# to enable the feature of minimizing overhead of struct page associated with -# each HugeTLB page. +# to enable the feature of HugeTLB Vmemmap Optimization (HVO). # config ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP bool @@ -259,14 +258,14 @@ config HUGETLB_PAGE_OPTIMIZE_VMEMMAP depends on SPARSEMEM_VMEMMAP config HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON - bool "Default optimizing vmemmap pages of HugeTLB to on" + bool "Default HugeTLB Vmemmap Optimization (HVO) to on" default n depends on HUGETLB_PAGE_OPTIMIZE_VMEMMAP help - When using HUGETLB_PAGE_OPTIMIZE_VMEMMAP, the optimizing unused vmemmap - pages associated with each HugeTLB page is default off. Say Y here - to enable optimizing vmemmap pages of HugeTLB by default. It can then - be disabled on the command line via hugetlb_free_vmemmap=off. + When using HUGETLB_PAGE_OPTIMIZE_VMEMMAP, the HugeTLB Vmemmap + Optimization (HVO) is off by default. Say Y here to enable HVO + by default. It can then be disabled on the command line via + hugetlb_free_vmemmap=off or sysctl. config MEMFD_CREATE def_bool TMPFS || HUGETLBFS diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c index 132dc83f0130..c10540993577 100644 --- a/mm/hugetlb_vmemmap.c +++ b/mm/hugetlb_vmemmap.c @@ -1,8 +1,8 @@ // SPDX-License-Identifier: GPL-2.0 /* - * Optimize vmemmap pages associated with HugeTLB + * HugeTLB Vmemmap Optimization (HVO) * - * Copyright (c) 2020, Bytedance. All rights reserved. + * Copyright (c) 2020, ByteDance. All rights reserved. * * Author: Muchun Song * @@ -120,8 +120,8 @@ void __init hugetlb_vmemmap_init(struct hstate *h) /* * There are only (RESERVE_VMEMMAP_SIZE / sizeof(struct page)) struct - * page structs that can be used when CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP, - * so add a BUILD_BUG_ON to catch invalid usage of the tail struct page. + * page structs that can be used when HVO is enabled, add a BUILD_BUG_ON + * to catch invalid usage of the tail page structs. */ BUILD_BUG_ON(__NR_USED_SUBPAGE >= RESERVE_VMEMMAP_SIZE / sizeof(struct page)); diff --git a/mm/hugetlb_vmemmap.h b/mm/hugetlb_vmemmap.h index 109b0a53b6fe..ba66fadad9fc 100644 --- a/mm/hugetlb_vmemmap.h +++ b/mm/hugetlb_vmemmap.h @@ -1,8 +1,8 @@ // SPDX-License-Identifier: GPL-2.0 /* - * Optimize vmemmap pages associated with HugeTLB + * HugeTLB Vmemmap Optimization (HVO) * - * Copyright (c) 2020, Bytedance. All rights reserved. + * Copyright (c) 2020, ByteDance. All rights reserved. * * Author: Muchun Song */ From patchwork Mon Jun 13 06:35:10 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Muchun Song X-Patchwork-Id: 12879054 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id A607CC43334 for ; Mon, 13 Jun 2022 06:35:45 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 41D338D015B; Mon, 13 Jun 2022 02:35:45 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 3CC278D0142; Mon, 13 Jun 2022 02:35:45 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 21F388D015B; Mon, 13 Jun 2022 02:35:45 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 0E9CD8D0142 for ; Mon, 13 Jun 2022 02:35:45 -0400 (EDT) Received: from smtpin02.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id DD26320E3D for ; Mon, 13 Jun 2022 06:35:44 +0000 (UTC) X-FDA: 79572251808.02.3EBD448 Received: from mail-pl1-f174.google.com (mail-pl1-f174.google.com [209.85.214.174]) by imf18.hostedemail.com (Postfix) with ESMTP id 8DFCF1C007C for ; Mon, 13 Jun 2022 06:35:44 +0000 (UTC) Received: by mail-pl1-f174.google.com with SMTP id d5so1693663plo.12 for ; Sun, 12 Jun 2022 23:35:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=Cd5TADEYIw6EN5sUEtH2CBYk3q1W3D399BGmAPG/bAw=; b=vEdfjXcvbTsdSeTcxUk5kx5POEj7WY0iyWJKmSGt22Lx3wglGegKrdLGvzJweT1AfH cH6HNumJqtGob9JvAQ65Hj0ngIHjw1ODg/2td5bWnuQDN0fcsyjTco7ht5RDsSqp1NI5 9mw4tmKtNMqMmoHhhdek7rkOUVRQ93m0cV0p+aPlK5ahPwIYQgiJmj1YsN/KSHiUsLat XYboVO2+y7ALLL6J6Hvett0jO+TGXUJ+kE9i0lmnx69FivI2/5Ws8UtzUawNq/q6X6yl J9pkD174uMVGuXy0l5ILCjcs1esTWjr80/6LNxgDcC/wLNwJ/mvIF4+ePaF1Zc8kuBcQ jpbw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Cd5TADEYIw6EN5sUEtH2CBYk3q1W3D399BGmAPG/bAw=; b=sZzKhRLCgPYGYMM2fne5KJwmsLgMB9cOz8CBVdgobxL6YzL4hE4ldqEh5WxsvAiPuv lPg0ElNmtbuMqi18s5gM2pZxfABtmVo7tmcOtyp7H2Junhip7cL/tVCxLS0EPFBdggT5 HXSsIjJemlrJ1gG/q5XZ8+P6HmIrz99orNyUHiAXmsUvzhR6VGFgLY57GfQi6h2dZhcu nLQs+PYNpo3pWd/xskk1hD8Xay0xZaQhB+sw2zhiB5EFWsQmZRDhlvHOWOPRNkjMM8ov ZT2H9jwz2JUqdbM3uHJQLTQyPn9Hh77MPfN1yq7UWWKgYS8DwoPk0p+Zqp9Th2qxWuOj DmZA== X-Gm-Message-State: AOAM533Cap/UPrQGxlAqvG91nrDLcpoIxhgtuR8taxSsOBuMd6VWuIBn UxDjAJCmnf9oQarIQlYWVTHYsw== X-Google-Smtp-Source: ABdhPJyUPsj3ZCa5CUrkPMU5XgNXd44Hm79QahH4VTCrNacbiTJjMF1MkuSpn9bKxk79RDbBHcwhKA== X-Received: by 2002:a17:90a:c70a:b0:1e2:eb3e:239f with SMTP id o10-20020a17090ac70a00b001e2eb3e239fmr14202730pjt.94.1655102143315; Sun, 12 Jun 2022 23:35:43 -0700 (PDT) Received: from FVFYT0MHHV2J.bytedance.net ([139.177.225.255]) by smtp.gmail.com with ESMTPSA id v3-20020aa799c3000000b0051bc538baadsm4366554pfi.184.2022.06.12.23.35.39 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 12 Jun 2022 23:35:42 -0700 (PDT) From: Muchun Song To: mike.kravetz@oracle.com, david@redhat.com, akpm@linux-foundation.org, corbet@lwn.net Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, Muchun Song Subject: [PATCH 4/6] mm: hugetlb_vmemmap: move vmemmap code related to HugeTLB to hugetlb_vmemmap.c Date: Mon, 13 Jun 2022 14:35:10 +0800 Message-Id: <20220613063512.17540-5-songmuchun@bytedance.com> X-Mailer: git-send-email 2.32.1 (Apple Git-133) In-Reply-To: <20220613063512.17540-1-songmuchun@bytedance.com> References: <20220613063512.17540-1-songmuchun@bytedance.com> MIME-Version: 1.0 ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1655102144; a=rsa-sha256; cv=none; b=7l+fZQaDAY6wQO662Z4ccqucwr8TZgO50PNhBeyJyaEhu7PK1cfvLU6+kB2t/u/2v6GSgo q7s5renIE68EkOyyyK9WuzGmtuwKSMauSsglvnFJAlzylcxMVsCfUIlAuY4PsOlIaek3Ri 3PPGFcnCtzFxxR1U+6WlrMn676SAY6w= ARC-Authentication-Results: i=1; imf18.hostedemail.com; dkim=pass header.d=bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=vEdfjXcv; dmarc=pass (policy=none) header.from=bytedance.com; spf=pass (imf18.hostedemail.com: domain of songmuchun@bytedance.com designates 209.85.214.174 as permitted sender) smtp.mailfrom=songmuchun@bytedance.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1655102144; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=Cd5TADEYIw6EN5sUEtH2CBYk3q1W3D399BGmAPG/bAw=; b=Ven2Fw8T3p6ksXFEcA5YZBeB33NJmjbe9yQc8QyQfKMHMAiD2MdV4so9sn+ft0zF+FKmxo Y85kliQ9tjofdhv5MyvfadCZITwOPq1nJCElQHiNgsT2oegRAf1RC8SmJer18Hme/aYmjk RtLMJ5sgU7+X5iNDQOa+zKeRuYlyj40= X-Stat-Signature: igi17zig8pfr4fz4s4tet481uqn99bgw X-Rspam-User: X-Rspamd-Queue-Id: 8DFCF1C007C X-Rspamd-Server: rspam07 Authentication-Results: imf18.hostedemail.com; dkim=pass header.d=bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=vEdfjXcv; dmarc=pass (policy=none) header.from=bytedance.com; spf=pass (imf18.hostedemail.com: domain of songmuchun@bytedance.com designates 209.85.214.174 as permitted sender) smtp.mailfrom=songmuchun@bytedance.com X-HE-Tag: 1655102144-25197 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: When I first introduced vmemmap manipulation functions related to HugeTLB, I thought those functions may be reused by other modules (e.g. using similar approach to optimize vmemmap pages, unfortunately, the DAX used the same approach but does not use those functions). After two years, we didn't see any other users. So move those functions to hugetlb_vmemmap.c. Signed-off-by: Muchun Song Reviewed-by: Oscar Salvador Reviewed-by: Mike Kravetz --- include/linux/mm.h | 7 - mm/hugetlb_vmemmap.c | 391 ++++++++++++++++++++++++++++++++++++++++++++++++++- mm/sparse-vmemmap.c | 391 --------------------------------------------------- 3 files changed, 390 insertions(+), 399 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 623c2ee8330a..152d0eefe5aa 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3208,13 +3208,6 @@ static inline void print_vma_addr(char *prefix, unsigned long rip) } #endif -#ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP -int vmemmap_remap_free(unsigned long start, unsigned long end, - unsigned long reuse); -int vmemmap_remap_alloc(unsigned long start, unsigned long end, - unsigned long reuse, gfp_t gfp_mask); -#endif - void *sparse_buffer_alloc(unsigned long size); struct page * __populate_section_memmap(unsigned long pfn, unsigned long nr_pages, int nid, struct vmem_altmap *altmap, diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c index c10540993577..abdf441215bb 100644 --- a/mm/hugetlb_vmemmap.c +++ b/mm/hugetlb_vmemmap.c @@ -10,9 +10,31 @@ */ #define pr_fmt(fmt) "HugeTLB: " fmt -#include +#include +#include +#include +#include #include "hugetlb_vmemmap.h" +/** + * struct vmemmap_remap_walk - walk vmemmap page table + * + * @remap_pte: called for each lowest-level entry (PTE). + * @nr_walked: the number of walked pte. + * @reuse_page: the page which is reused for the tail vmemmap pages. + * @reuse_addr: the virtual address of the @reuse_page page. + * @vmemmap_pages: the list head of the vmemmap pages that can be freed + * or is mapped from. + */ +struct vmemmap_remap_walk { + void (*remap_pte)(pte_t *pte, unsigned long addr, + struct vmemmap_remap_walk *walk); + unsigned long nr_walked; + struct page *reuse_page; + unsigned long reuse_addr; + struct list_head *vmemmap_pages; +}; + /* * There are a lot of struct page structures associated with each HugeTLB page. * For tail pages, the value of compound_head is the same. So we can reuse first @@ -23,6 +45,373 @@ #define RESERVE_VMEMMAP_NR 1U #define RESERVE_VMEMMAP_SIZE (RESERVE_VMEMMAP_NR << PAGE_SHIFT) +static int __split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start) +{ + pmd_t __pmd; + int i; + unsigned long addr = start; + struct page *page = pmd_page(*pmd); + pte_t *pgtable = pte_alloc_one_kernel(&init_mm); + + if (!pgtable) + return -ENOMEM; + + pmd_populate_kernel(&init_mm, &__pmd, pgtable); + + for (i = 0; i < PMD_SIZE / PAGE_SIZE; i++, addr += PAGE_SIZE) { + pte_t entry, *pte; + pgprot_t pgprot = PAGE_KERNEL; + + entry = mk_pte(page + i, pgprot); + pte = pte_offset_kernel(&__pmd, addr); + set_pte_at(&init_mm, addr, pte, entry); + } + + spin_lock(&init_mm.page_table_lock); + if (likely(pmd_leaf(*pmd))) { + /* Make pte visible before pmd. See comment in pmd_install(). */ + smp_wmb(); + pmd_populate_kernel(&init_mm, pmd, pgtable); + flush_tlb_kernel_range(start, start + PMD_SIZE); + } else { + pte_free_kernel(&init_mm, pgtable); + } + spin_unlock(&init_mm.page_table_lock); + + return 0; +} + +static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start) +{ + int leaf; + + spin_lock(&init_mm.page_table_lock); + leaf = pmd_leaf(*pmd); + spin_unlock(&init_mm.page_table_lock); + + if (!leaf) + return 0; + + return __split_vmemmap_huge_pmd(pmd, start); +} + +static void vmemmap_pte_range(pmd_t *pmd, unsigned long addr, + unsigned long end, + struct vmemmap_remap_walk *walk) +{ + pte_t *pte = pte_offset_kernel(pmd, addr); + + /* + * The reuse_page is found 'first' in table walk before we start + * remapping (which is calling @walk->remap_pte). + */ + if (!walk->reuse_page) { + walk->reuse_page = pte_page(*pte); + /* + * Because the reuse address is part of the range that we are + * walking, skip the reuse address range. + */ + addr += PAGE_SIZE; + pte++; + walk->nr_walked++; + } + + for (; addr != end; addr += PAGE_SIZE, pte++) { + walk->remap_pte(pte, addr, walk); + walk->nr_walked++; + } +} + +static int vmemmap_pmd_range(pud_t *pud, unsigned long addr, + unsigned long end, + struct vmemmap_remap_walk *walk) +{ + pmd_t *pmd; + unsigned long next; + + pmd = pmd_offset(pud, addr); + do { + int ret; + + ret = split_vmemmap_huge_pmd(pmd, addr & PMD_MASK); + if (ret) + return ret; + + next = pmd_addr_end(addr, end); + vmemmap_pte_range(pmd, addr, next, walk); + } while (pmd++, addr = next, addr != end); + + return 0; +} + +static int vmemmap_pud_range(p4d_t *p4d, unsigned long addr, + unsigned long end, + struct vmemmap_remap_walk *walk) +{ + pud_t *pud; + unsigned long next; + + pud = pud_offset(p4d, addr); + do { + int ret; + + next = pud_addr_end(addr, end); + ret = vmemmap_pmd_range(pud, addr, next, walk); + if (ret) + return ret; + } while (pud++, addr = next, addr != end); + + return 0; +} + +static int vmemmap_p4d_range(pgd_t *pgd, unsigned long addr, + unsigned long end, + struct vmemmap_remap_walk *walk) +{ + p4d_t *p4d; + unsigned long next; + + p4d = p4d_offset(pgd, addr); + do { + int ret; + + next = p4d_addr_end(addr, end); + ret = vmemmap_pud_range(p4d, addr, next, walk); + if (ret) + return ret; + } while (p4d++, addr = next, addr != end); + + return 0; +} + +static int vmemmap_remap_range(unsigned long start, unsigned long end, + struct vmemmap_remap_walk *walk) +{ + unsigned long addr = start; + unsigned long next; + pgd_t *pgd; + + VM_BUG_ON(!PAGE_ALIGNED(start)); + VM_BUG_ON(!PAGE_ALIGNED(end)); + + pgd = pgd_offset_k(addr); + do { + int ret; + + next = pgd_addr_end(addr, end); + ret = vmemmap_p4d_range(pgd, addr, next, walk); + if (ret) + return ret; + } while (pgd++, addr = next, addr != end); + + /* + * We only change the mapping of the vmemmap virtual address range + * [@start + PAGE_SIZE, end), so we only need to flush the TLB which + * belongs to the range. + */ + flush_tlb_kernel_range(start + PAGE_SIZE, end); + + return 0; +} + +/* + * Free a vmemmap page. A vmemmap page can be allocated from the memblock + * allocator or buddy allocator. If the PG_reserved flag is set, it means + * that it allocated from the memblock allocator, just free it via the + * free_bootmem_page(). Otherwise, use __free_page(). + */ +static inline void free_vmemmap_page(struct page *page) +{ + if (PageReserved(page)) + free_bootmem_page(page); + else + __free_page(page); +} + +/* Free a list of the vmemmap pages */ +static void free_vmemmap_page_list(struct list_head *list) +{ + struct page *page, *next; + + list_for_each_entry_safe(page, next, list, lru) { + list_del(&page->lru); + free_vmemmap_page(page); + } +} + +static void vmemmap_remap_pte(pte_t *pte, unsigned long addr, + struct vmemmap_remap_walk *walk) +{ + /* + * Remap the tail pages as read-only to catch illegal write operation + * to the tail pages. + */ + pgprot_t pgprot = PAGE_KERNEL_RO; + pte_t entry = mk_pte(walk->reuse_page, pgprot); + struct page *page = pte_page(*pte); + + list_add_tail(&page->lru, walk->vmemmap_pages); + set_pte_at(&init_mm, addr, pte, entry); +} + +/* + * How many struct page structs need to be reset. When we reuse the head + * struct page, the special metadata (e.g. page->flags or page->mapping) + * cannot copy to the tail struct page structs. The invalid value will be + * checked in the free_tail_pages_check(). In order to avoid the message + * of "corrupted mapping in tail page". We need to reset at least 3 (one + * head struct page struct and two tail struct page structs) struct page + * structs. + */ +#define NR_RESET_STRUCT_PAGE 3 + +static inline void reset_struct_pages(struct page *start) +{ + int i; + struct page *from = start + NR_RESET_STRUCT_PAGE; + + for (i = 0; i < NR_RESET_STRUCT_PAGE; i++) + memcpy(start + i, from, sizeof(*from)); +} + +static void vmemmap_restore_pte(pte_t *pte, unsigned long addr, + struct vmemmap_remap_walk *walk) +{ + pgprot_t pgprot = PAGE_KERNEL; + struct page *page; + void *to; + + BUG_ON(pte_page(*pte) != walk->reuse_page); + + page = list_first_entry(walk->vmemmap_pages, struct page, lru); + list_del(&page->lru); + to = page_to_virt(page); + copy_page(to, (void *)walk->reuse_addr); + reset_struct_pages(to); + + set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot)); +} + +/** + * vmemmap_remap_free - remap the vmemmap virtual address range [@start, @end) + * to the page which @reuse is mapped to, then free vmemmap + * which the range are mapped to. + * @start: start address of the vmemmap virtual address range that we want + * to remap. + * @end: end address of the vmemmap virtual address range that we want to + * remap. + * @reuse: reuse address. + * + * Return: %0 on success, negative error code otherwise. + */ +static int vmemmap_remap_free(unsigned long start, unsigned long end, + unsigned long reuse) +{ + int ret; + LIST_HEAD(vmemmap_pages); + struct vmemmap_remap_walk walk = { + .remap_pte = vmemmap_remap_pte, + .reuse_addr = reuse, + .vmemmap_pages = &vmemmap_pages, + }; + + /* + * In order to make remapping routine most efficient for the huge pages, + * the routine of vmemmap page table walking has the following rules + * (see more details from the vmemmap_pte_range()): + * + * - The range [@start, @end) and the range [@reuse, @reuse + PAGE_SIZE) + * should be continuous. + * - The @reuse address is part of the range [@reuse, @end) that we are + * walking which is passed to vmemmap_remap_range(). + * - The @reuse address is the first in the complete range. + * + * So we need to make sure that @start and @reuse meet the above rules. + */ + BUG_ON(start - reuse != PAGE_SIZE); + + mmap_read_lock(&init_mm); + ret = vmemmap_remap_range(reuse, end, &walk); + if (ret && walk.nr_walked) { + end = reuse + walk.nr_walked * PAGE_SIZE; + /* + * vmemmap_pages contains pages from the previous + * vmemmap_remap_range call which failed. These + * are pages which were removed from the vmemmap. + * They will be restored in the following call. + */ + walk = (struct vmemmap_remap_walk) { + .remap_pte = vmemmap_restore_pte, + .reuse_addr = reuse, + .vmemmap_pages = &vmemmap_pages, + }; + + vmemmap_remap_range(reuse, end, &walk); + } + mmap_read_unlock(&init_mm); + + free_vmemmap_page_list(&vmemmap_pages); + + return ret; +} + +static int alloc_vmemmap_page_list(unsigned long start, unsigned long end, + gfp_t gfp_mask, struct list_head *list) +{ + unsigned long nr_pages = (end - start) >> PAGE_SHIFT; + int nid = page_to_nid((struct page *)start); + struct page *page, *next; + + while (nr_pages--) { + page = alloc_pages_node(nid, gfp_mask, 0); + if (!page) + goto out; + list_add_tail(&page->lru, list); + } + + return 0; +out: + list_for_each_entry_safe(page, next, list, lru) + __free_pages(page, 0); + return -ENOMEM; +} + +/** + * vmemmap_remap_alloc - remap the vmemmap virtual address range [@start, end) + * to the page which is from the @vmemmap_pages + * respectively. + * @start: start address of the vmemmap virtual address range that we want + * to remap. + * @end: end address of the vmemmap virtual address range that we want to + * remap. + * @reuse: reuse address. + * @gfp_mask: GFP flag for allocating vmemmap pages. + * + * Return: %0 on success, negative error code otherwise. + */ +static int vmemmap_remap_alloc(unsigned long start, unsigned long end, + unsigned long reuse, gfp_t gfp_mask) +{ + LIST_HEAD(vmemmap_pages); + struct vmemmap_remap_walk walk = { + .remap_pte = vmemmap_restore_pte, + .reuse_addr = reuse, + .vmemmap_pages = &vmemmap_pages, + }; + + /* See the comment in the vmemmap_remap_free(). */ + BUG_ON(start - reuse != PAGE_SIZE); + + if (alloc_vmemmap_page_list(start, end, gfp_mask, &vmemmap_pages)) + return -ENOMEM; + + mmap_read_lock(&init_mm); + vmemmap_remap_range(reuse, end, &walk); + mmap_read_unlock(&init_mm); + + return 0; +} + DEFINE_STATIC_KEY_FALSE(hugetlb_optimize_vmemmap_key); EXPORT_SYMBOL(hugetlb_optimize_vmemmap_key); diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c index 49cb15cbe590..473effcb2285 100644 --- a/mm/sparse-vmemmap.c +++ b/mm/sparse-vmemmap.c @@ -27,400 +27,9 @@ #include #include #include -#include -#include #include #include -#include - -#ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP -/** - * struct vmemmap_remap_walk - walk vmemmap page table - * - * @remap_pte: called for each lowest-level entry (PTE). - * @nr_walked: the number of walked pte. - * @reuse_page: the page which is reused for the tail vmemmap pages. - * @reuse_addr: the virtual address of the @reuse_page page. - * @vmemmap_pages: the list head of the vmemmap pages that can be freed - * or is mapped from. - */ -struct vmemmap_remap_walk { - void (*remap_pte)(pte_t *pte, unsigned long addr, - struct vmemmap_remap_walk *walk); - unsigned long nr_walked; - struct page *reuse_page; - unsigned long reuse_addr; - struct list_head *vmemmap_pages; -}; - -static int __split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start) -{ - pmd_t __pmd; - int i; - unsigned long addr = start; - struct page *page = pmd_page(*pmd); - pte_t *pgtable = pte_alloc_one_kernel(&init_mm); - - if (!pgtable) - return -ENOMEM; - - pmd_populate_kernel(&init_mm, &__pmd, pgtable); - - for (i = 0; i < PMD_SIZE / PAGE_SIZE; i++, addr += PAGE_SIZE) { - pte_t entry, *pte; - pgprot_t pgprot = PAGE_KERNEL; - - entry = mk_pte(page + i, pgprot); - pte = pte_offset_kernel(&__pmd, addr); - set_pte_at(&init_mm, addr, pte, entry); - } - - spin_lock(&init_mm.page_table_lock); - if (likely(pmd_leaf(*pmd))) { - /* Make pte visible before pmd. See comment in pmd_install(). */ - smp_wmb(); - pmd_populate_kernel(&init_mm, pmd, pgtable); - flush_tlb_kernel_range(start, start + PMD_SIZE); - } else { - pte_free_kernel(&init_mm, pgtable); - } - spin_unlock(&init_mm.page_table_lock); - - return 0; -} - -static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start) -{ - int leaf; - - spin_lock(&init_mm.page_table_lock); - leaf = pmd_leaf(*pmd); - spin_unlock(&init_mm.page_table_lock); - - if (!leaf) - return 0; - - return __split_vmemmap_huge_pmd(pmd, start); -} - -static void vmemmap_pte_range(pmd_t *pmd, unsigned long addr, - unsigned long end, - struct vmemmap_remap_walk *walk) -{ - pte_t *pte = pte_offset_kernel(pmd, addr); - - /* - * The reuse_page is found 'first' in table walk before we start - * remapping (which is calling @walk->remap_pte). - */ - if (!walk->reuse_page) { - walk->reuse_page = pte_page(*pte); - /* - * Because the reuse address is part of the range that we are - * walking, skip the reuse address range. - */ - addr += PAGE_SIZE; - pte++; - walk->nr_walked++; - } - - for (; addr != end; addr += PAGE_SIZE, pte++) { - walk->remap_pte(pte, addr, walk); - walk->nr_walked++; - } -} - -static int vmemmap_pmd_range(pud_t *pud, unsigned long addr, - unsigned long end, - struct vmemmap_remap_walk *walk) -{ - pmd_t *pmd; - unsigned long next; - - pmd = pmd_offset(pud, addr); - do { - int ret; - - ret = split_vmemmap_huge_pmd(pmd, addr & PMD_MASK); - if (ret) - return ret; - - next = pmd_addr_end(addr, end); - vmemmap_pte_range(pmd, addr, next, walk); - } while (pmd++, addr = next, addr != end); - - return 0; -} - -static int vmemmap_pud_range(p4d_t *p4d, unsigned long addr, - unsigned long end, - struct vmemmap_remap_walk *walk) -{ - pud_t *pud; - unsigned long next; - - pud = pud_offset(p4d, addr); - do { - int ret; - - next = pud_addr_end(addr, end); - ret = vmemmap_pmd_range(pud, addr, next, walk); - if (ret) - return ret; - } while (pud++, addr = next, addr != end); - - return 0; -} - -static int vmemmap_p4d_range(pgd_t *pgd, unsigned long addr, - unsigned long end, - struct vmemmap_remap_walk *walk) -{ - p4d_t *p4d; - unsigned long next; - - p4d = p4d_offset(pgd, addr); - do { - int ret; - - next = p4d_addr_end(addr, end); - ret = vmemmap_pud_range(p4d, addr, next, walk); - if (ret) - return ret; - } while (p4d++, addr = next, addr != end); - - return 0; -} - -static int vmemmap_remap_range(unsigned long start, unsigned long end, - struct vmemmap_remap_walk *walk) -{ - unsigned long addr = start; - unsigned long next; - pgd_t *pgd; - - VM_BUG_ON(!PAGE_ALIGNED(start)); - VM_BUG_ON(!PAGE_ALIGNED(end)); - - pgd = pgd_offset_k(addr); - do { - int ret; - - next = pgd_addr_end(addr, end); - ret = vmemmap_p4d_range(pgd, addr, next, walk); - if (ret) - return ret; - } while (pgd++, addr = next, addr != end); - - /* - * We only change the mapping of the vmemmap virtual address range - * [@start + PAGE_SIZE, end), so we only need to flush the TLB which - * belongs to the range. - */ - flush_tlb_kernel_range(start + PAGE_SIZE, end); - - return 0; -} - -/* - * Free a vmemmap page. A vmemmap page can be allocated from the memblock - * allocator or buddy allocator. If the PG_reserved flag is set, it means - * that it allocated from the memblock allocator, just free it via the - * free_bootmem_page(). Otherwise, use __free_page(). - */ -static inline void free_vmemmap_page(struct page *page) -{ - if (PageReserved(page)) - free_bootmem_page(page); - else - __free_page(page); -} - -/* Free a list of the vmemmap pages */ -static void free_vmemmap_page_list(struct list_head *list) -{ - struct page *page, *next; - - list_for_each_entry_safe(page, next, list, lru) { - list_del(&page->lru); - free_vmemmap_page(page); - } -} - -static void vmemmap_remap_pte(pte_t *pte, unsigned long addr, - struct vmemmap_remap_walk *walk) -{ - /* - * Remap the tail pages as read-only to catch illegal write operation - * to the tail pages. - */ - pgprot_t pgprot = PAGE_KERNEL_RO; - pte_t entry = mk_pte(walk->reuse_page, pgprot); - struct page *page = pte_page(*pte); - - list_add_tail(&page->lru, walk->vmemmap_pages); - set_pte_at(&init_mm, addr, pte, entry); -} - -/* - * How many struct page structs need to be reset. When we reuse the head - * struct page, the special metadata (e.g. page->flags or page->mapping) - * cannot copy to the tail struct page structs. The invalid value will be - * checked in the free_tail_pages_check(). In order to avoid the message - * of "corrupted mapping in tail page". We need to reset at least 3 (one - * head struct page struct and two tail struct page structs) struct page - * structs. - */ -#define NR_RESET_STRUCT_PAGE 3 - -static inline void reset_struct_pages(struct page *start) -{ - int i; - struct page *from = start + NR_RESET_STRUCT_PAGE; - - for (i = 0; i < NR_RESET_STRUCT_PAGE; i++) - memcpy(start + i, from, sizeof(*from)); -} - -static void vmemmap_restore_pte(pte_t *pte, unsigned long addr, - struct vmemmap_remap_walk *walk) -{ - pgprot_t pgprot = PAGE_KERNEL; - struct page *page; - void *to; - - BUG_ON(pte_page(*pte) != walk->reuse_page); - - page = list_first_entry(walk->vmemmap_pages, struct page, lru); - list_del(&page->lru); - to = page_to_virt(page); - copy_page(to, (void *)walk->reuse_addr); - reset_struct_pages(to); - - set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot)); -} - -/** - * vmemmap_remap_free - remap the vmemmap virtual address range [@start, @end) - * to the page which @reuse is mapped to, then free vmemmap - * which the range are mapped to. - * @start: start address of the vmemmap virtual address range that we want - * to remap. - * @end: end address of the vmemmap virtual address range that we want to - * remap. - * @reuse: reuse address. - * - * Return: %0 on success, negative error code otherwise. - */ -int vmemmap_remap_free(unsigned long start, unsigned long end, - unsigned long reuse) -{ - int ret; - LIST_HEAD(vmemmap_pages); - struct vmemmap_remap_walk walk = { - .remap_pte = vmemmap_remap_pte, - .reuse_addr = reuse, - .vmemmap_pages = &vmemmap_pages, - }; - - /* - * In order to make remapping routine most efficient for the huge pages, - * the routine of vmemmap page table walking has the following rules - * (see more details from the vmemmap_pte_range()): - * - * - The range [@start, @end) and the range [@reuse, @reuse + PAGE_SIZE) - * should be continuous. - * - The @reuse address is part of the range [@reuse, @end) that we are - * walking which is passed to vmemmap_remap_range(). - * - The @reuse address is the first in the complete range. - * - * So we need to make sure that @start and @reuse meet the above rules. - */ - BUG_ON(start - reuse != PAGE_SIZE); - - mmap_read_lock(&init_mm); - ret = vmemmap_remap_range(reuse, end, &walk); - if (ret && walk.nr_walked) { - end = reuse + walk.nr_walked * PAGE_SIZE; - /* - * vmemmap_pages contains pages from the previous - * vmemmap_remap_range call which failed. These - * are pages which were removed from the vmemmap. - * They will be restored in the following call. - */ - walk = (struct vmemmap_remap_walk) { - .remap_pte = vmemmap_restore_pte, - .reuse_addr = reuse, - .vmemmap_pages = &vmemmap_pages, - }; - - vmemmap_remap_range(reuse, end, &walk); - } - mmap_read_unlock(&init_mm); - - free_vmemmap_page_list(&vmemmap_pages); - - return ret; -} - -static int alloc_vmemmap_page_list(unsigned long start, unsigned long end, - gfp_t gfp_mask, struct list_head *list) -{ - unsigned long nr_pages = (end - start) >> PAGE_SHIFT; - int nid = page_to_nid((struct page *)start); - struct page *page, *next; - - while (nr_pages--) { - page = alloc_pages_node(nid, gfp_mask, 0); - if (!page) - goto out; - list_add_tail(&page->lru, list); - } - - return 0; -out: - list_for_each_entry_safe(page, next, list, lru) - __free_pages(page, 0); - return -ENOMEM; -} - -/** - * vmemmap_remap_alloc - remap the vmemmap virtual address range [@start, end) - * to the page which is from the @vmemmap_pages - * respectively. - * @start: start address of the vmemmap virtual address range that we want - * to remap. - * @end: end address of the vmemmap virtual address range that we want to - * remap. - * @reuse: reuse address. - * @gfp_mask: GFP flag for allocating vmemmap pages. - * - * Return: %0 on success, negative error code otherwise. - */ -int vmemmap_remap_alloc(unsigned long start, unsigned long end, - unsigned long reuse, gfp_t gfp_mask) -{ - LIST_HEAD(vmemmap_pages); - struct vmemmap_remap_walk walk = { - .remap_pte = vmemmap_restore_pte, - .reuse_addr = reuse, - .vmemmap_pages = &vmemmap_pages, - }; - - /* See the comment in the vmemmap_remap_free(). */ - BUG_ON(start - reuse != PAGE_SIZE); - - if (alloc_vmemmap_page_list(start, end, gfp_mask, &vmemmap_pages)) - return -ENOMEM; - - mmap_read_lock(&init_mm); - vmemmap_remap_range(reuse, end, &walk); - mmap_read_unlock(&init_mm); - - return 0; -} -#endif /* CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP */ /* * Allocate a block of memory to be used to back the virtual memory map From patchwork Mon Jun 13 06:35:11 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Muchun Song X-Patchwork-Id: 12879055 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 49919C43334 for ; Mon, 13 Jun 2022 06:35:49 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C5B1E8D015C; Mon, 13 Jun 2022 02:35:48 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id C09D48D0142; Mon, 13 Jun 2022 02:35:48 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id AAB868D015C; Mon, 13 Jun 2022 02:35:48 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 9B3EE8D0142 for ; Mon, 13 Jun 2022 02:35:48 -0400 (EDT) Received: from smtpin20.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id 76CD720B79 for ; Mon, 13 Jun 2022 06:35:48 +0000 (UTC) X-FDA: 79572251976.20.6CC2483 Received: from mail-pf1-f173.google.com (mail-pf1-f173.google.com [209.85.210.173]) by imf27.hostedemail.com (Postfix) with ESMTP id 1C01C4008B for ; Mon, 13 Jun 2022 06:35:47 +0000 (UTC) Received: by mail-pf1-f173.google.com with SMTP id c196so4938063pfb.1 for ; Sun, 12 Jun 2022 23:35:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=J3yPX4pLteTBSSt3jyoxQC4mEw/61R+QtZm4AH/eCAw=; b=tFpKgo0j+PNLUDk0x391HcyJFSlH7ARGiFiArHFBqwwQvFstQSoTZFlD63itVTgYpc kUUVltfZZJyaBSPftr8vB5U60aL3+Bhhc3i9SxGcNDi/qPj+qPLIIzGOrxA0UqjRKOYe SJe68LscSGyRfpTbJWWulEAdjHzitTlvSq0Kgup9y7NhhFvjWFNRz49wAFlozK/qxWqV siEjOibdp+vbrc5CDd14o6Oxe7QSb5wHX6a2IUr2AzgcEZIY0IFOOHhtonhSWbQeg8bK JGZPFSeVlFm7Z22v9a5EJKJ3OIkN90hGOjnWYRJkdY5mztcGZsPWI8fuObj9As7/kq5u tDag== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=J3yPX4pLteTBSSt3jyoxQC4mEw/61R+QtZm4AH/eCAw=; b=NWB+ZlV403VMcIysOF/mzX5H1+JNM5tMZcipG6j7JcxQhYBfKRBkEL5N067nHAErsM kmpHVVC2r+ugns3bljsUG6+pKdH6JGPFsbqfg37tSUrFfPH8w7Fj/YioIvsLWuEuZZOc //2DPacKTXElAeGVorgBWU0HV869IfkzZKm+u+Dezrzq1gtUhY4LledLVTaOf6YVdc1y mu95NipXyvpmGKsVAFEr1n/3MzL/CKavdNRYLNiBxzP04Ib/N2+MoMn4IzaWnz8leg99 LkkCH7CUJZ6bTnOJxvOtCq+LNeMibmLb8w1rIpo3DqHe/21ndK4YyGdbo8AEn2YOOI52 k6Zg== X-Gm-Message-State: AOAM531s9yjwZM4iPDdumVrGB1oNWJmeQNY8tCZlgZFM1DJWuyQvJiLk /+SdMSKKEznqU0kx8aTlCkgc7g== X-Google-Smtp-Source: ABdhPJy28P5B0qqGa3Ke6+SfMWEOc2cytKxdWdyOLSShHZNaM7W3nV99RcCZipyIALw6c+KciZ9G3w== X-Received: by 2002:a63:cf51:0:b0:408:85f4:fb33 with SMTP id b17-20020a63cf51000000b0040885f4fb33mr1973347pgj.589.1655102147198; Sun, 12 Jun 2022 23:35:47 -0700 (PDT) Received: from FVFYT0MHHV2J.bytedance.net ([139.177.225.255]) by smtp.gmail.com with ESMTPSA id v3-20020aa799c3000000b0051bc538baadsm4366554pfi.184.2022.06.12.23.35.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 12 Jun 2022 23:35:46 -0700 (PDT) From: Muchun Song To: mike.kravetz@oracle.com, david@redhat.com, akpm@linux-foundation.org, corbet@lwn.net Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, Muchun Song Subject: [PATCH 5/6] mm: hugetlb_vmemmap: replace early_param() with core_param() Date: Mon, 13 Jun 2022 14:35:11 +0800 Message-Id: <20220613063512.17540-6-songmuchun@bytedance.com> X-Mailer: git-send-email 2.32.1 (Apple Git-133) In-Reply-To: <20220613063512.17540-1-songmuchun@bytedance.com> References: <20220613063512.17540-1-songmuchun@bytedance.com> MIME-Version: 1.0 ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1655102148; a=rsa-sha256; cv=none; b=TNo4GtqVRraY2ty/z5HABrv20t/UhZ0/hAYbD+hkx9cIXW4weaXyXbS5364faHk0WGNTFc 4qTQ60RuiTxSJt8GC9D6LphOy5oMo8dA1L5OxCj3AVVrgOoSIYMS8kUebC4e2wxvwhKk8G 0NUfHqBGRxCEyzVUj7hhH99lJYssM1o= ARC-Authentication-Results: i=1; imf27.hostedemail.com; dkim=pass header.d=bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=tFpKgo0j; dmarc=pass (policy=none) header.from=bytedance.com; spf=pass (imf27.hostedemail.com: domain of songmuchun@bytedance.com designates 209.85.210.173 as permitted sender) smtp.mailfrom=songmuchun@bytedance.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1655102148; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=J3yPX4pLteTBSSt3jyoxQC4mEw/61R+QtZm4AH/eCAw=; b=OTt3XGY0unNi2M89oZYxP+qWQJ0sxJos54yQBStKhzo/KfoP1ElbDZqNetI0ti3ZFZjkDv AGNEZRd2pCT6XZ+pDSLgEQEoFl7b2szzkpZJT8ArE9kNrQDXmkMar+2IEe1GFj4XXRlfzJ teebz2iF2SRgeJndvCe7QDTmNAZHbLQ= X-Rspam-User: X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 1C01C4008B Authentication-Results: imf27.hostedemail.com; dkim=pass header.d=bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=tFpKgo0j; dmarc=pass (policy=none) header.from=bytedance.com; spf=pass (imf27.hostedemail.com: domain of songmuchun@bytedance.com designates 209.85.210.173 as permitted sender) smtp.mailfrom=songmuchun@bytedance.com X-Stat-Signature: e8qza5774uzzim9eqqhjqnifq5iy9hso X-HE-Tag: 1655102147-380013 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: After the following commit and previous commit applied. 78f39084b41d ("mm: hugetlb_vmemmap: add hugetlb_optimize_vmemmap sysctl") There is no order requirement between the parameter of "hugetlb_free_vmemmap" and "hugepages" since we have removed the check of whether HVO is enabled from hugetlb_vmemmap_init(). Therefore we can safely replace early_param() with core_param() to simplify the code. Signed-off-by: Muchun Song Reviewed-by: Mike Kravetz --- mm/hugetlb_vmemmap.c | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c index abdf441215bb..9808d32cdb9e 100644 --- a/mm/hugetlb_vmemmap.c +++ b/mm/hugetlb_vmemmap.c @@ -415,14 +415,8 @@ static int vmemmap_remap_alloc(unsigned long start, unsigned long end, DEFINE_STATIC_KEY_FALSE(hugetlb_optimize_vmemmap_key); EXPORT_SYMBOL(hugetlb_optimize_vmemmap_key); -static bool vmemmap_optimize_enabled = - IS_ENABLED(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON); - -static int __init hugetlb_vmemmap_early_param(char *buf) -{ - return kstrtobool(buf, &vmemmap_optimize_enabled); -} -early_param("hugetlb_free_vmemmap", hugetlb_vmemmap_early_param); +static bool vmemmap_optimize_enabled = IS_ENABLED(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON); +core_param(hugetlb_free_vmemmap, vmemmap_optimize_enabled, bool, 0); /* * Previously discarded vmemmap pages will be allocated and remapping From patchwork Mon Jun 13 06:35:12 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Muchun Song X-Patchwork-Id: 12879056 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 98EF0C43334 for ; Mon, 13 Jun 2022 06:35:52 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 359BA8D015D; Mon, 13 Jun 2022 02:35:52 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 3093D8D0142; Mon, 13 Jun 2022 02:35:52 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 183198D015D; Mon, 13 Jun 2022 02:35:52 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 03C828D0142 for ; Mon, 13 Jun 2022 02:35:52 -0400 (EDT) Received: from smtpin01.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id CAC1E20E3D for ; Mon, 13 Jun 2022 06:35:51 +0000 (UTC) X-FDA: 79572252102.01.422CFFA Received: from mail-pf1-f173.google.com (mail-pf1-f173.google.com [209.85.210.173]) by imf27.hostedemail.com (Postfix) with ESMTP id 6DBD04008B for ; Mon, 13 Jun 2022 06:35:51 +0000 (UTC) Received: by mail-pf1-f173.google.com with SMTP id c196so4938063pfb.1 for ; Sun, 12 Jun 2022 23:35:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=gwkuZE56aSA9/tatUtzCc+m4R0MzHGUai/Pf7RD5URU=; b=Kjz8rGkl7FUYvNUnypP5Qyyj+Q6Yamk+s0aKQd9kRPShoI7ZOlXd8aKUfnLlq9VRDB 2IfUB/K+zfaqtPom55HNE6nY/05iePr6inRaRAesiMWdSIoAh1ye3EsBFXLtNMLr2tAR Fb/Hsv2nG/b2HGDs6VBfx0c9n0h/U8MnTniUCYUNU8LUi4I+p+AMkK5YytsFI58VYv5t 1Jl91fMb+LgXqyUBvkiNO2PGH6dWAAXvpAtDzGS8jywgFBpsKNXleUSj+JD4WX5bfVPZ rZaLzWXTJfU0O6StEleg8g9iCZOzQsFeWDeSdE23GLC3Ca0j2sx41hCPzX3sfrqha1hw zjSw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=gwkuZE56aSA9/tatUtzCc+m4R0MzHGUai/Pf7RD5URU=; b=1J/VZa7UNVOGjiLx6IXwchg5ZDi0t4g8H2h0M8f4zKmjJnQSG8hNus9RdxO6ryRscm nYflJ1Oucta60MYQKuDeWeu4oxJs9/GbTfdOPE8P5vFLfwyhpapnN/er1NEq4wMLU8bW H2KBPspqDxH3MPHtM1Qjdm5B3dFqBQrTFBhiVTuiKK3Rh85pJpckE49CnalN9fR6V0rD DZad/xa6W5Jr8mxzaSStKug6rGSsWEGJ9UI81d0jtGzR/ItJHWMaMUb35Le4hTp6IAUC xxP7VuT1Dv+OoyFERorWsDmPPiHMunlgFJ/gR/9Turf4VqSozvJpmTb2q8T3PxsFT/jo sHCA== X-Gm-Message-State: AOAM530GrBTTHEvI3Pg6oede5liPxb7Q4EFNaYQA6Wm447qvRQ1DzqEX lR+C514raLkTEJRtQmjHIba5Zw== X-Google-Smtp-Source: ABdhPJx2XekdnvADwYsXewHpUnTmAPdpTfW58vn7LUP2xBm6iJ9za0hOyv1YbpUo8oZuj7TQ2LmxOQ== X-Received: by 2002:a62:6410:0:b0:4f3:9654:266d with SMTP id y16-20020a626410000000b004f39654266dmr58255506pfb.59.1655102150896; Sun, 12 Jun 2022 23:35:50 -0700 (PDT) Received: from FVFYT0MHHV2J.bytedance.net ([139.177.225.255]) by smtp.gmail.com with ESMTPSA id v3-20020aa799c3000000b0051bc538baadsm4366554pfi.184.2022.06.12.23.35.47 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 12 Jun 2022 23:35:50 -0700 (PDT) From: Muchun Song To: mike.kravetz@oracle.com, david@redhat.com, akpm@linux-foundation.org, corbet@lwn.net Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, Muchun Song Subject: [PATCH 6/6] mm: hugetlb_vmemmap: improve hugetlb_vmemmap code readability Date: Mon, 13 Jun 2022 14:35:12 +0800 Message-Id: <20220613063512.17540-7-songmuchun@bytedance.com> X-Mailer: git-send-email 2.32.1 (Apple Git-133) In-Reply-To: <20220613063512.17540-1-songmuchun@bytedance.com> References: <20220613063512.17540-1-songmuchun@bytedance.com> MIME-Version: 1.0 ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1655102151; a=rsa-sha256; cv=none; b=r8F/V8PqG6LyWUVXfoOgN37WgfJKDkCIaQMwUEPQSt9Nkm7Aa+Ijj/lfz26esLSfhR4MQh 7lIj4fL0bAKQ7ogP9S/zVLRMwyQGZbtnF5pgHLmX4Ty+y5hm2+0LSz8sl4xWbtGG42PEAX 1hTLipDzQ2hd3uoKTtYPbUA5UIb8kSc= ARC-Authentication-Results: i=1; imf27.hostedemail.com; dkim=pass header.d=bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=Kjz8rGkl; dmarc=pass (policy=none) header.from=bytedance.com; spf=pass (imf27.hostedemail.com: domain of songmuchun@bytedance.com designates 209.85.210.173 as permitted sender) smtp.mailfrom=songmuchun@bytedance.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1655102151; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=gwkuZE56aSA9/tatUtzCc+m4R0MzHGUai/Pf7RD5URU=; b=yTL8sVgOinSriQrO5BD1H+wX5sH6jErZV2HbCq/ydSQmlmJ3UO0BsRy7SAz+2TxoU00u8p 4ZPQ+kp34wNAwcqBGswmFFPNKvbBsP9ug9E36bzPcL1iKC4VNVz0Ywrb8c+sGwZTcgAua6 +T87dhIKVJFJooGsOSJlXE6exPXYyX8= X-Rspam-User: X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 6DBD04008B Authentication-Results: imf27.hostedemail.com; dkim=pass header.d=bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=Kjz8rGkl; dmarc=pass (policy=none) header.from=bytedance.com; spf=pass (imf27.hostedemail.com: domain of songmuchun@bytedance.com designates 209.85.210.173 as permitted sender) smtp.mailfrom=songmuchun@bytedance.com X-Stat-Signature: pombiza67enp4ait1bt6n9nh3xmranni X-HE-Tag: 1655102151-701033 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: There is a discussion about the name of hugetlb_vmemmap_alloc/free in thread [1]. The suggestion suggested by David is rename "alloc/free" to "optimize/restore" to make functionalities clearer to users, "optimize" means the function will optimize vmemmap pages, while "restore" means restoring its vmemmap pages discared before. This commit does this. Another discussion is the confusion RESERVE_VMEMMAP_NR isn't used explicitly for vmemmap_addr but implicitly for vmemmap_end in hugetlb_vmemmap_alloc/free. David suggested we can compute what hugetlb_vmemmap_init() does now at runtime. We do not need to worry for the overhead of computing at runtime since the calculation is simple enough and those functions are not in a hot path. This commit has the following improvements: 1) The function suffixed name ("optimize/restore") is more expressive. 2) The logic becomes less weird in hugetlb_vmemmap_optmize/restore(). 3) The hugetlb_vmemmap_init() does not need to be exported anymore. 4) A ->optimize_vmemmap_pages field in struct hstate is killed. 5) There is only one place where checks is_power_of_2(sizeof(struct page)) instead of two places. 6) Add more comments for hugetlb_vmemmap_optmize/restore(). 7) For external users, hugetlb_optimize_vmemmap_pages() is used for detecting if the HugeTLB's vmemmap pages is optimizable originally. In this commit, it is killed and we introduce a new helper hugetlb_vmemmap_optimizable() to replace it. The name is more expressive. Link: https://lore.kernel.org/all/20220404074652.68024-2-songmuchun@bytedance.com/ [1] Signed-off-by: Muchun Song --- include/linux/hugetlb.h | 7 +-- mm/hugetlb.c | 11 ++-- mm/hugetlb_vmemmap.c | 154 +++++++++++++++++++++++------------------------- mm/hugetlb_vmemmap.h | 39 +++++++----- 4 files changed, 105 insertions(+), 106 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 642a39016f9a..0b475faf9bf4 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -640,9 +640,6 @@ struct hstate { unsigned int nr_huge_pages_node[MAX_NUMNODES]; unsigned int free_huge_pages_node[MAX_NUMNODES]; unsigned int surplus_huge_pages_node[MAX_NUMNODES]; -#ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP - unsigned int optimize_vmemmap_pages; -#endif #ifdef CONFIG_CGROUP_HUGETLB /* cgroup control files */ struct cftype cgroup_files_dfl[8]; @@ -718,7 +715,7 @@ static inline struct hstate *hstate_vma(struct vm_area_struct *vma) return hstate_file(vma->vm_file); } -static inline unsigned long huge_page_size(struct hstate *h) +static inline unsigned long huge_page_size(const struct hstate *h) { return (unsigned long)PAGE_SIZE << h->order; } @@ -747,7 +744,7 @@ static inline bool hstate_is_gigantic(struct hstate *h) return huge_page_order(h) >= MAX_ORDER; } -static inline unsigned int pages_per_huge_page(struct hstate *h) +static inline unsigned int pages_per_huge_page(const struct hstate *h) { return 1 << h->order; } diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 259b9c41892f..26a5af7f0065 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1541,7 +1541,7 @@ static void __update_and_free_page(struct hstate *h, struct page *page) if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported()) return; - if (hugetlb_vmemmap_alloc(h, page)) { + if (hugetlb_vmemmap_restore(h, page)) { spin_lock_irq(&hugetlb_lock); /* * If we cannot allocate vmemmap pages, just refuse to free the @@ -1627,7 +1627,7 @@ static DECLARE_WORK(free_hpage_work, free_hpage_workfn); static inline void flush_free_hpage_work(struct hstate *h) { - if (hugetlb_optimize_vmemmap_pages(h)) + if (hugetlb_vmemmap_optimizable(h)) flush_work(&free_hpage_work); } @@ -1749,7 +1749,7 @@ static void __prep_account_new_huge_page(struct hstate *h, int nid) static void __prep_new_huge_page(struct hstate *h, struct page *page) { - hugetlb_vmemmap_free(h, page); + hugetlb_vmemmap_optimize(h, page); INIT_LIST_HEAD(&page->lru); set_compound_page_dtor(page, HUGETLB_PAGE_DTOR); hugetlb_set_page_subpool(page, NULL); @@ -2122,7 +2122,7 @@ int dissolve_free_huge_page(struct page *page) * Attempt to allocate vmemmmap here so that we can take * appropriate action on failure. */ - rc = hugetlb_vmemmap_alloc(h, head); + rc = hugetlb_vmemmap_restore(h, head); if (!rc) { /* * Move PageHWPoison flag from head page to the raw @@ -3434,7 +3434,7 @@ static int demote_free_huge_page(struct hstate *h, struct page *page) remove_hugetlb_page_for_demote(h, page, false); spin_unlock_irq(&hugetlb_lock); - rc = hugetlb_vmemmap_alloc(h, page); + rc = hugetlb_vmemmap_restore(h, page); if (rc) { /* Allocation of vmemmmap failed, we can not demote page */ spin_lock_irq(&hugetlb_lock); @@ -4124,7 +4124,6 @@ void __init hugetlb_add_hstate(unsigned int order) h->next_nid_to_free = first_memory_node; snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB", huge_page_size(h)/1024); - hugetlb_vmemmap_init(h); parsed_hstate = h; } diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c index 9808d32cdb9e..595b0cee3109 100644 --- a/mm/hugetlb_vmemmap.c +++ b/mm/hugetlb_vmemmap.c @@ -35,16 +35,6 @@ struct vmemmap_remap_walk { struct list_head *vmemmap_pages; }; -/* - * There are a lot of struct page structures associated with each HugeTLB page. - * For tail pages, the value of compound_head is the same. So we can reuse first - * page of head page structures. We map the virtual addresses of all the pages - * of tail page structures to the head page struct, and then free these page - * frames. Therefore, we need to reserve one pages as vmemmap areas. - */ -#define RESERVE_VMEMMAP_NR 1U -#define RESERVE_VMEMMAP_SIZE (RESERVE_VMEMMAP_NR << PAGE_SHIFT) - static int __split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start) { pmd_t __pmd; @@ -418,32 +408,38 @@ EXPORT_SYMBOL(hugetlb_optimize_vmemmap_key); static bool vmemmap_optimize_enabled = IS_ENABLED(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON); core_param(hugetlb_free_vmemmap, vmemmap_optimize_enabled, bool, 0); -/* - * Previously discarded vmemmap pages will be allocated and remapping - * after this function returns zero. +/** + * hugetlb_vmemmap_restore - restore previously optimized (by + * hugetlb_vmemmap_optimize()) vmemmap pages which + * will be reallocated and remapped. + * @h: struct hstate. + * @head: the head page whose vmemmap pages will be restored. + * + * Return: %0 if @head's vmemmap pages have been reallocated and remapped, + * negative error code otherwise. */ -int hugetlb_vmemmap_alloc(struct hstate *h, struct page *head) +int hugetlb_vmemmap_restore(const struct hstate *h, struct page *head) { int ret; - unsigned long vmemmap_addr = (unsigned long)head; - unsigned long vmemmap_end, vmemmap_reuse, vmemmap_pages; + unsigned long vmemmap_start = (unsigned long)head; + unsigned long vmemmap_end, vmemmap_reuse, vmemmap_size; if (!HPageVmemmapOptimized(head)) return 0; - vmemmap_addr += RESERVE_VMEMMAP_SIZE; - vmemmap_pages = hugetlb_optimize_vmemmap_pages(h); - vmemmap_end = vmemmap_addr + (vmemmap_pages << PAGE_SHIFT); - vmemmap_reuse = vmemmap_addr - PAGE_SIZE; + vmemmap_size = hugetlb_vmemmap_size(h); + vmemmap_end = vmemmap_start + vmemmap_size; + vmemmap_reuse = vmemmap_start; + vmemmap_start += RESERVE_VMEMMAP_SIZE; /* - * The pages which the vmemmap virtual address range [@vmemmap_addr, + * The pages which the vmemmap virtual address range [@vmemmap_start, * @vmemmap_end) are mapped to are freed to the buddy allocator, and * the range is mapped to the page which @vmemmap_reuse is mapped to. * When a HugeTLB page is freed to the buddy allocator, previously * discarded vmemmap pages must be allocated and remapping. */ - ret = vmemmap_remap_alloc(vmemmap_addr, vmemmap_end, vmemmap_reuse, + ret = vmemmap_remap_alloc(vmemmap_start, vmemmap_end, vmemmap_reuse, GFP_KERNEL | __GFP_NORETRY | __GFP_THISNODE); if (!ret) { ClearHPageVmemmapOptimized(head); @@ -453,84 +449,62 @@ int hugetlb_vmemmap_alloc(struct hstate *h, struct page *head) return ret; } -static unsigned int optimizable_vmemmap_pages(struct hstate *h, - struct page *head) +/* Return true iff a HugeTLB whose vmemmap should and can be optimized. */ +static bool vmemmap_should_optimize(const struct hstate *h, const struct page *head) { unsigned long pfn = page_to_pfn(head); unsigned long end = pfn + pages_per_huge_page(h); if (!READ_ONCE(vmemmap_optimize_enabled)) - return 0; + return false; + + if (!hugetlb_vmemmap_optimizable(h)) + return false; for (; pfn < end; pfn += PAGES_PER_SECTION) { if (section_cannot_optimize_vmemmap(__pfn_to_section(pfn))) - return 0; + return false; } - return hugetlb_optimize_vmemmap_pages(h); + return true; } -void hugetlb_vmemmap_free(struct hstate *h, struct page *head) +/** + * hugetlb_vmemmap_optimize - optimize @head page's vmemmap pages. + * @h: struct hstate. + * @head: the head page whose vmemmap pages will be optimized. + * + * This function only tries to optimize @head's vmemmap pages and does not + * guarantee that the optimization will succeed after it returns. The caller + * can use HPageVmemmapOptimized(@head) to detect if @head's vmemmap pages + * have been optimized. + */ +void hugetlb_vmemmap_optimize(const struct hstate *h, struct page *head) { - unsigned long vmemmap_addr = (unsigned long)head; - unsigned long vmemmap_end, vmemmap_reuse, vmemmap_pages; + unsigned long vmemmap_start = (unsigned long)head; + unsigned long vmemmap_end, vmemmap_reuse, vmemmap_size; - vmemmap_pages = optimizable_vmemmap_pages(h, head); - if (!vmemmap_pages) + if (!vmemmap_should_optimize(h, head)) return; static_branch_inc(&hugetlb_optimize_vmemmap_key); - vmemmap_addr += RESERVE_VMEMMAP_SIZE; - vmemmap_end = vmemmap_addr + (vmemmap_pages << PAGE_SHIFT); - vmemmap_reuse = vmemmap_addr - PAGE_SIZE; + vmemmap_size = hugetlb_vmemmap_size(h); + vmemmap_end = vmemmap_start + vmemmap_size; + vmemmap_reuse = vmemmap_start; + vmemmap_start += RESERVE_VMEMMAP_SIZE; /* - * Remap the vmemmap virtual address range [@vmemmap_addr, @vmemmap_end) + * Remap the vmemmap virtual address range [@vmemmap_start, @vmemmap_end) * to the page which @vmemmap_reuse is mapped to, then free the pages - * which the range [@vmemmap_addr, @vmemmap_end] is mapped to. + * which the range [@vmemmap_start, @vmemmap_end] is mapped to. */ - if (vmemmap_remap_free(vmemmap_addr, vmemmap_end, vmemmap_reuse)) + if (vmemmap_remap_free(vmemmap_start, vmemmap_end, vmemmap_reuse)) static_branch_dec(&hugetlb_optimize_vmemmap_key); else SetHPageVmemmapOptimized(head); } -void __init hugetlb_vmemmap_init(struct hstate *h) -{ - unsigned int nr_pages = pages_per_huge_page(h); - unsigned int vmemmap_pages; - - /* - * There are only (RESERVE_VMEMMAP_SIZE / sizeof(struct page)) struct - * page structs that can be used when HVO is enabled, add a BUILD_BUG_ON - * to catch invalid usage of the tail page structs. - */ - BUILD_BUG_ON(__NR_USED_SUBPAGE >= - RESERVE_VMEMMAP_SIZE / sizeof(struct page)); - - if (!is_power_of_2(sizeof(struct page))) { - pr_warn_once("cannot optimize vmemmap pages because \"struct page\" crosses page boundaries\n"); - return; - } - - vmemmap_pages = (nr_pages * sizeof(struct page)) >> PAGE_SHIFT; - /* - * The head page is not to be freed to buddy allocator, the other tail - * pages will map to the head page, so they can be freed. - * - * Could RESERVE_VMEMMAP_NR be greater than @vmemmap_pages? It is true - * on some architectures (e.g. aarch64). See Documentation/arm64/ - * hugetlbpage.rst for more details. - */ - if (likely(vmemmap_pages > RESERVE_VMEMMAP_NR)) - h->optimize_vmemmap_pages = vmemmap_pages - RESERVE_VMEMMAP_NR; - - pr_info("can optimize %d vmemmap pages for %s\n", - h->optimize_vmemmap_pages, h->name); -} - -#ifdef CONFIG_PROC_SYSCTL static struct ctl_table hugetlb_vmemmap_sysctls[] = { { .procname = "hugetlb_optimize_vmemmap", @@ -542,16 +516,36 @@ static struct ctl_table hugetlb_vmemmap_sysctls[] = { { } }; -static __init int hugetlb_vmemmap_sysctls_init(void) +static int __init hugetlb_vmemmap_init(void) { + const struct hstate *h; + bool optimizable = false; + /* - * If "struct page" crosses page boundaries, the vmemmap pages cannot - * be optimized. + * There are only (RESERVE_VMEMMAP_SIZE / sizeof(struct page)) struct + * page structs that can be used when HVO is enabled. */ - if (is_power_of_2(sizeof(struct page))) - register_sysctl_init("vm", hugetlb_vmemmap_sysctls); + BUILD_BUG_ON(__NR_USED_SUBPAGE >= RESERVE_VMEMMAP_SIZE / sizeof(struct page)); + + for_each_hstate(h) { + char buf[16]; + unsigned int size = 0; + + if (hugetlb_vmemmap_optimizable(h)) + size = hugetlb_vmemmap_size(h) - RESERVE_VMEMMAP_SIZE; + optimizable = size ? true : optimizable; + string_get_size(huge_page_size(h), 1, STRING_UNITS_2, buf, + sizeof(buf)); + pr_info("%d KiB vmemmap can be optimized for a %s page\n", + size / SZ_1K, buf); + } + if (optimizable) { + if (IS_ENABLED(CONFIG_PROC_SYSCTL)) + register_sysctl_init("vm", hugetlb_vmemmap_sysctls); + pr_info("%d huge pages whose vmemmap are optimized at boot\n", + static_key_count(&hugetlb_optimize_vmemmap_key.key)); + } return 0; } -late_initcall(hugetlb_vmemmap_sysctls_init); -#endif /* CONFIG_PROC_SYSCTL */ +late_initcall(hugetlb_vmemmap_init); diff --git a/mm/hugetlb_vmemmap.h b/mm/hugetlb_vmemmap.h index ba66fadad9fc..0af3f08cf63c 100644 --- a/mm/hugetlb_vmemmap.h +++ b/mm/hugetlb_vmemmap.h @@ -11,35 +11,44 @@ #include #ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP -int hugetlb_vmemmap_alloc(struct hstate *h, struct page *head); -void hugetlb_vmemmap_free(struct hstate *h, struct page *head); -void hugetlb_vmemmap_init(struct hstate *h); +int hugetlb_vmemmap_restore(const struct hstate *h, struct page *head); +void hugetlb_vmemmap_optimize(const struct hstate *h, struct page *head); /* - * How many vmemmap pages associated with a HugeTLB page that can be - * optimized and freed to the buddy allocator. + * There are a lot of struct page structures associated with each HugeTLB page. + * The only 'useful' information in the tail page structs is the compound_head + * field which is the same for all tail page structs. So we can reuse the first + * page frame of page structs. The virtual addresses of all the remaining pages + * of tail page structs will be mapped to the head page frame, and then these + * tail page frames are freed. Therefore, we need to reserve one page as + * vmemmap. See Documentation/vm/vmemmap_dedup.rst. */ -static inline unsigned int hugetlb_optimize_vmemmap_pages(struct hstate *h) +#define RESERVE_VMEMMAP_SIZE PAGE_SIZE + +static inline unsigned int hugetlb_vmemmap_size(const struct hstate *h) { - return h->optimize_vmemmap_pages; + return pages_per_huge_page(h) * sizeof(struct page); } -#else -static inline int hugetlb_vmemmap_alloc(struct hstate *h, struct page *head) + +static inline bool hugetlb_vmemmap_optimizable(const struct hstate *h) { - return 0; + if (!is_power_of_2(sizeof(struct page))) + return false; + return hugetlb_vmemmap_size(h) > RESERVE_VMEMMAP_SIZE; } - -static inline void hugetlb_vmemmap_free(struct hstate *h, struct page *head) +#else +static inline int hugetlb_vmemmap_restore(const struct hstate *h, struct page *head) { + return 0; } -static inline void hugetlb_vmemmap_init(struct hstate *h) +static inline void hugetlb_vmemmap_optimize(const struct hstate *h, struct page *head) { } -static inline unsigned int hugetlb_optimize_vmemmap_pages(struct hstate *h) +static inline bool hugetlb_vmemmap_optimizable(const struct hstate *h) { - return 0; + return false; } #endif /* CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP */ #endif /* _LINUX_HUGETLB_VMEMMAP_H */