From patchwork Mon Oct 18 10:20:39 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Muchun Song X-Patchwork-Id: 12565961 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E5ABCC433EF for ; Mon, 18 Oct 2021 10:25:31 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 6D1E760FED for ; Mon, 18 Oct 2021 10:25:31 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 6D1E760FED Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=bytedance.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 134116B006C; Mon, 18 Oct 2021 06:25:31 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 0E3A8900003; Mon, 18 Oct 2021 06:25:31 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id EC64C6B0072; Mon, 18 Oct 2021 06:25:30 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0192.hostedemail.com [216.40.44.192]) by kanga.kvack.org (Postfix) with ESMTP id DB66F6B006C for ; Mon, 18 Oct 2021 06:25:30 -0400 (EDT) Received: from smtpin32.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 9D9F03260A for ; Mon, 18 Oct 2021 10:25:30 +0000 (UTC) X-FDA: 78709176420.32.C8CBE56 Received: from mail-pg1-f171.google.com (mail-pg1-f171.google.com [209.85.215.171]) by imf23.hostedemail.com (Postfix) with ESMTP id B8428900009B for ; Mon, 18 Oct 2021 10:25:26 +0000 (UTC) Received: by mail-pg1-f171.google.com with SMTP id f5so15640175pgc.12 for ; Mon, 18 Oct 2021 03:25:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=jfjzDank9Gs725Ve2wZUdWVM59N09QveuHRdI5l/EEo=; b=Jo4KlRhA3do4ZDQoWRh+hEm3G6BdHIv+UQWlcLgaGMEcTTkP8pre8HuKdcs07YqVAM Hj5hk6MwJaW+ZaAM6gS7c4UHMINKqwGYwWlN+DRSIEbmJ8BzAQtulfzUea2KVqacuUP/ U4+xKWnMs8sj9wqghfxdj74UkeDkfLMzapmvJIGrz5L7Bzah1+Ne2FXbA8ZhBZUJpqKf wLEgYi2IFVicPzu5Ewu3whhiyNGTNJbTx8bzRFF/iU2ephhRHgLyb0+XOwUvMKlXT7oP TLWzK0HCPWgimXJIOw3szerUC++40Td+1DizM1rgdPPKL34CAcA5uWBE0rCnktcCQMDF hSfw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=jfjzDank9Gs725Ve2wZUdWVM59N09QveuHRdI5l/EEo=; b=U4ftV3qzkzIEn7SZvWcM0QKp6YFrhkPkCtIEqIQUEgkDjw/iJCWjuoC8/lwlKDJd2b IXLKjm0T59McfK7az1oRhdeeaqTSmTdotD0LZw6XsUQmMKSwbIsBxn0hG61JhRnea2xa xHwShN6CzTqrGubCTA9eCno/4YsdLyuKuujFC7oshV79ftioHSc47uFgCPD8bRw179c/ fSClUVIea5Py+qqnPgo8qFM5KDJ+fVgo16jRLY+i1empIYlDaLmpkZZv1mqrpUPDG9mi jKZ8OkhWSEPQ7NDoFCVjaKMX8r+41FjMTQvZhSI4jESLmCT069wWSnimggjnhwXHKiT5 J4uA== X-Gm-Message-State: AOAM533juBmYD3w4pmzCHtP1bQn2JywjcsMl2qMLYnyf3K8I/8JwJVlf LwjS/QibZBvBJbWTYHUE2RDJQw== X-Google-Smtp-Source: ABdhPJwp7nnl1P8hzk3eXoUsHY6aCDXXtRmUtiw3ghWKvmcfbGR0BsMJ2pd68ZdoE6r7b/cegpt8sg== X-Received: by 2002:a62:2944:0:b0:44d:b731:4228 with SMTP id p65-20020a622944000000b0044db7314228mr11243871pfp.9.1634552729357; Mon, 18 Oct 2021 03:25:29 -0700 (PDT) Received: from localhost.localdomain ([139.177.225.237]) by smtp.gmail.com with ESMTPSA id nn14sm12762232pjb.27.2021.10.18.03.25.23 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 18 Oct 2021 03:25:29 -0700 (PDT) From: Muchun Song To: mike.kravetz@oracle.com, akpm@linux-foundation.org, osalvador@suse.de, mhocko@suse.com, song.bao.hua@hisilicon.com, david@redhat.com, chenhuang5@huawei.com, bodeddub@amazon.com, corbet@lwn.net, willy@infradead.org, 21cnbao@gmail.com Cc: duanxiongchun@bytedance.com, fam.zheng@bytedance.com, smuchun@gmail.com, zhengqi.arch@bytedance.com, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Muchun Song Subject: [PATCH v6 1/5] mm: hugetlb: free the 2nd vmemmap page associated with each HugeTLB page Date: Mon, 18 Oct 2021 18:20:39 +0800 Message-Id: <20211018102043.78685-2-songmuchun@bytedance.com> X-Mailer: git-send-email 2.21.0 (Apple Git-122) In-Reply-To: <20211018102043.78685-1-songmuchun@bytedance.com> References: <20211018102043.78685-1-songmuchun@bytedance.com> MIME-Version: 1.0 X-Stat-Signature: 6ihsizp1njcwhqcpufbq8tjg7u99tjss Authentication-Results: imf23.hostedemail.com; dkim=pass header.d=bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=Jo4KlRhA; spf=pass (imf23.hostedemail.com: domain of songmuchun@bytedance.com designates 209.85.215.171 as permitted sender) smtp.mailfrom=songmuchun@bytedance.com; dmarc=pass (policy=none) header.from=bytedance.com X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: B8428900009B X-HE-Tag: 1634552726-230029 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Currently, we only free 6 vmemmap pages associated with a 2MB HugeTLB page. However, we can remap all tail vmemmap pages to the page frame mapped to with the head vmemmap page. Finally, we can free 7 vmemmap pages for a 2MB HugeTLB page. It is a fine gain (e.g. we can save extra 2GB memory when there is 1TB HugeTLB pages in the system compared with the current implementation). But the head vmemmap page is not freed to the buddy allocator and all tail vmemmap pages are mapped to the head vmemmap page frame. So we can see more than one struct page struct with PG_head (e.g. 8 per 2 MB HugeTLB page) associated with each HugeTLB page. We should adjust compound_head() to make it returns the real head struct page when the parameter is the tail struct page but with PG_head flag. Signed-off-by: Muchun Song Reviewed-by: Barry Song --- Documentation/admin-guide/kernel-parameters.txt | 2 +- include/linux/page-flags.h | 78 +++++++++++++++++++++++-- mm/hugetlb_vmemmap.c | 60 ++++++++++--------- mm/sparse-vmemmap.c | 21 +++++++ 4 files changed, 129 insertions(+), 32 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index ad94a2aa9819..8ac050b9b3da 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -1619,7 +1619,7 @@ [KNL] Reguires CONFIG_HUGETLB_PAGE_FREE_VMEMMAP enabled. Allows heavy hugetlb users to free up some more - memory (6 * PAGE_SIZE for each 2MB hugetlb page). + memory (7 * PAGE_SIZE for each 2MB hugetlb page). Format: { on | off (default) } on: enable the feature diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 70bf0ec29ee3..7cd386538d0c 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -184,13 +184,69 @@ enum pageflags { #ifndef __GENERATING_BOUNDS_H +#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP +extern bool hugetlb_free_vmemmap_enabled; + +/* + * If the feature of freeing some vmemmap pages associated with each HugeTLB + * page is enabled, the head vmemmap page frame is reused and all of the tail + * vmemmap addresses map to the head vmemmap page frame (furture details can + * refer to the figure at the head of the mm/hugetlb_vmemmap.c). In other + * words, there are more than one page struct with PG_head associated with each + * HugeTLB page. We __know__ that there is only one head page struct, the tail + * page structs with PG_head are fake head page structs. We need an approach + * to distinguish between those two different types of page structs so that + * compound_head() can return the real head page struct when the parameter is + * the tail page struct but with PG_head. + * + * The page_fixed_fake_head() returns the real head page struct if the @page is + * fake page head, otherwise, returns @page which can either be a true page + * head or tail. + */ +static __always_inline const struct page *page_fixed_fake_head(const struct page *page) +{ + if (!hugetlb_free_vmemmap_enabled) + return page; + + /* + * Only addresses aligned with PAGE_SIZE of struct page may be fake head + * struct page. The alignment check aims to avoid access the fields ( + * e.g. compound_head) of the @page[1]. It can avoid touch a (possibly) + * cold cacheline in some cases. + */ + if (IS_ALIGNED((unsigned long)page, PAGE_SIZE) && + test_bit(PG_head, &page->flags)) { + /* + * We can safely access the field of the @page[1] with PG_head + * because the @page is a compound page composed with at least + * two contiguous pages. + */ + unsigned long head = READ_ONCE(page[1].compound_head); + + if (likely(head & 1)) + return (const struct page *)(head - 1); + } + return page; +} +#else +static inline const struct page *page_fixed_fake_head(const struct page *page) +{ + return page; +} +#endif + +static __always_inline int page_is_fake_head(struct page *page) +{ + return page_fixed_fake_head(page) != page; +} + static inline unsigned long _compound_head(const struct page *page) { unsigned long head = READ_ONCE(page->compound_head); if (unlikely(head & 1)) return head - 1; - return (unsigned long)page; + return (unsigned long)page_fixed_fake_head(page); } #define compound_head(page) ((typeof(page))_compound_head(page)) @@ -225,12 +281,13 @@ static inline unsigned long _compound_head(const struct page *page) static __always_inline int PageTail(struct page *page) { - return READ_ONCE(page->compound_head) & 1; + return READ_ONCE(page->compound_head) & 1 || page_is_fake_head(page); } static __always_inline int PageCompound(struct page *page) { - return test_bit(PG_head, &page->flags) || PageTail(page); + return test_bit(PG_head, &page->flags) || + READ_ONCE(page->compound_head) & 1; } #define PAGE_POISON_PATTERN -1l @@ -675,7 +732,20 @@ static inline bool test_set_page_writeback(struct page *page) return set_page_writeback(page); } -__PAGEFLAG(Head, head, PF_ANY) CLEARPAGEFLAG(Head, head, PF_ANY) +static __always_inline bool folio_test_head(struct folio *folio) +{ + return test_bit(PG_head, folio_flags(folio, FOLIO_PF_ANY)); +} + +static __always_inline int PageHead(struct page *page) +{ + PF_POISONED_CHECK(page); + return test_bit(PG_head, &page->flags) && !page_is_fake_head(page); +} + +__SETPAGEFLAG(Head, head, PF_ANY) +__CLEARPAGEFLAG(Head, head, PF_ANY) +CLEARPAGEFLAG(Head, head, PF_ANY) /* Whether there are one or multiple pages in a folio */ static inline bool folio_test_single(struct folio *folio) diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c index c540c21e26f5..f4a8fca691ee 100644 --- a/mm/hugetlb_vmemmap.c +++ b/mm/hugetlb_vmemmap.c @@ -124,9 +124,9 @@ * page of page structs (page 0) associated with the HugeTLB page contains the 4 * page structs necessary to describe the HugeTLB. The only use of the remaining * pages of page structs (page 1 to page 7) is to point to page->compound_head. - * Therefore, we can remap pages 2 to 7 to page 1. Only 2 pages of page structs + * Therefore, we can remap pages 1 to 7 to page 0. Only 1 page of page structs * will be used for each HugeTLB page. This will allow us to free the remaining - * 6 pages to the buddy allocator. + * 7 pages to the buddy allocator. * * Here is how things look after remapping. * @@ -134,30 +134,30 @@ * +-----------+ ---virt_to_page---> +-----------+ mapping to +-----------+ * | | | 0 | -------------> | 0 | * | | +-----------+ +-----------+ - * | | | 1 | -------------> | 1 | - * | | +-----------+ +-----------+ - * | | | 2 | ----------------^ ^ ^ ^ ^ ^ - * | | +-----------+ | | | | | - * | | | 3 | ------------------+ | | | | - * | | +-----------+ | | | | - * | | | 4 | --------------------+ | | | - * | PMD | +-----------+ | | | - * | level | | 5 | ----------------------+ | | - * | mapping | +-----------+ | | - * | | | 6 | ------------------------+ | - * | | +-----------+ | - * | | | 7 | --------------------------+ + * | | | 1 | ---------------^ ^ ^ ^ ^ ^ ^ + * | | +-----------+ | | | | | | + * | | | 2 | -----------------+ | | | | | + * | | +-----------+ | | | | | + * | | | 3 | -------------------+ | | | | + * | | +-----------+ | | | | + * | | | 4 | ---------------------+ | | | + * | PMD | +-----------+ | | | + * | level | | 5 | -----------------------+ | | + * | mapping | +-----------+ | | + * | | | 6 | -------------------------+ | + * | | +-----------+ | + * | | | 7 | ---------------------------+ * | | +-----------+ * | | * | | * | | * +-----------+ * - * When a HugeTLB is freed to the buddy system, we should allocate 6 pages for + * When a HugeTLB is freed to the buddy system, we should allocate 7 pages for * vmemmap pages and restore the previous mapping relationship. * * For the HugeTLB page of the pud level mapping. It is similar to the former. - * We also can use this approach to free (PAGE_SIZE - 2) vmemmap pages. + * We also can use this approach to free (PAGE_SIZE - 1) vmemmap pages. * * Apart from the HugeTLB page of the pmd/pud level mapping, some architectures * (e.g. aarch64) provides a contiguous bit in the translation table entries @@ -166,7 +166,13 @@ * * The contiguous bit is used to increase the mapping size at the pmd and pte * (last) level. So this type of HugeTLB page can be optimized only when its - * size of the struct page structs is greater than 2 pages. + * size of the struct page structs is greater than 1 page. + * + * Notice: The head vmemmap page is not freed to the buddy allocator and all + * tail vmemmap pages are mapped to the head vmemmap page frame. So we can see + * more than one struct page struct with PG_head (e.g. 8 per 2 MB HugeTLB page) + * associated with each HugeTLB page. The compound_head() can handle this + * correctly (more details refer to the comment above compound_head()). */ #define pr_fmt(fmt) "HugeTLB: " fmt @@ -175,14 +181,16 @@ /* * There are a lot of struct page structures associated with each HugeTLB page. * For tail pages, the value of compound_head is the same. So we can reuse first - * page of tail page structures. We map the virtual addresses of the remaining - * pages of tail page structures to the first tail page struct, and then free - * these page frames. Therefore, we need to reserve two pages as vmemmap areas. + * page of head page structures. We map the virtual addresses of all the pages + * of tail page structures to the head page struct, and then free these page + * frames. Therefore, we need to reserve one pages as vmemmap areas. */ -#define RESERVE_VMEMMAP_NR 2U +#define RESERVE_VMEMMAP_NR 1U #define RESERVE_VMEMMAP_SIZE (RESERVE_VMEMMAP_NR << PAGE_SHIFT) -bool hugetlb_free_vmemmap_enabled = IS_ENABLED(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON); +bool hugetlb_free_vmemmap_enabled __read_mostly = + IS_ENABLED(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON); +EXPORT_SYMBOL(hugetlb_free_vmemmap_enabled); static int __init early_hugetlb_free_vmemmap_param(char *buf) { @@ -236,7 +244,6 @@ int alloc_huge_page_vmemmap(struct hstate *h, struct page *head) */ ret = vmemmap_remap_alloc(vmemmap_addr, vmemmap_end, vmemmap_reuse, GFP_KERNEL | __GFP_NORETRY | __GFP_THISNODE); - if (!ret) ClearHPageVmemmapOptimized(head); @@ -282,9 +289,8 @@ void __init hugetlb_vmemmap_init(struct hstate *h) vmemmap_pages = (nr_pages * sizeof(struct page)) >> PAGE_SHIFT; /* - * The head page and the first tail page are not to be freed to buddy - * allocator, the other pages will map to the first tail page, so they - * can be freed. + * The head page is not to be freed to buddy allocator, the other tail + * pages will map to the head page, so they can be freed. * * Could RESERVE_VMEMMAP_NR be greater than @vmemmap_pages? It is true * on some architectures (e.g. aarch64). See Documentation/arm64/ diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c index db6df27c852a..e881f5db7091 100644 --- a/mm/sparse-vmemmap.c +++ b/mm/sparse-vmemmap.c @@ -245,6 +245,26 @@ static void vmemmap_remap_pte(pte_t *pte, unsigned long addr, set_pte_at(&init_mm, addr, pte, entry); } +/* + * How many struct page structs need to be reset. When we reuse the head + * struct page, the special metadata (e.g. page->flags or page->mapping) + * cannot copy to the tail struct page structs. The invalid value will be + * checked in the free_tail_pages_check(). In order to avoid the message + * of "corrupted mapping in tail page". We need to reset at least 3 (one + * head struct page struct and two tail struct page structs) struct page + * structs. + */ +#define NR_RESET_STRUCT_PAGE 3 + +static inline void reset_struct_pages(struct page *start) +{ + int i; + struct page *from = start + NR_RESET_STRUCT_PAGE; + + for (i = 0; i < NR_RESET_STRUCT_PAGE; i++) + memcpy(start + i, from, sizeof(*from)); +} + static void vmemmap_restore_pte(pte_t *pte, unsigned long addr, struct vmemmap_remap_walk *walk) { @@ -258,6 +278,7 @@ static void vmemmap_restore_pte(pte_t *pte, unsigned long addr, list_del(&page->lru); to = page_to_virt(page); copy_page(to, (void *)walk->reuse_addr); + reset_struct_pages(to); set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot)); } From patchwork Mon Oct 18 10:20:40 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Muchun Song X-Patchwork-Id: 12565963 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C69F9C433EF for ; Mon, 18 Oct 2021 10:25:38 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 5055B60FED for ; Mon, 18 Oct 2021 10:25:38 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 5055B60FED Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=bytedance.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id E52B0900004; Mon, 18 Oct 2021 06:25:37 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id E0245900003; Mon, 18 Oct 2021 06:25:37 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id CCAF7900004; Mon, 18 Oct 2021 06:25:37 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0187.hostedemail.com [216.40.44.187]) by kanga.kvack.org (Postfix) with ESMTP id BF36E900003 for ; Mon, 18 Oct 2021 06:25:37 -0400 (EDT) Received: from smtpin07.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 790AC1838DC12 for ; Mon, 18 Oct 2021 10:25:37 +0000 (UTC) X-FDA: 78709176714.07.9AAB08A Received: from mail-pj1-f41.google.com (mail-pj1-f41.google.com [209.85.216.41]) by imf29.hostedemail.com (Postfix) with ESMTP id 20205900024E for ; Mon, 18 Oct 2021 10:25:34 +0000 (UTC) Received: by mail-pj1-f41.google.com with SMTP id lk8-20020a17090b33c800b001a0a284fcc2so14151213pjb.2 for ; Mon, 18 Oct 2021 03:25:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=yhMXohaipBke6xOtWf2whLCJBoi6CVAwEzN4wXOFZSY=; b=hnUO9pplQhl5S3GsNvI6yx4RjlSwCCnaMcQvqhLf2TBKOUUDkg47/v6/xUW+vvgv0x 84rkV2kA7bARAel5+WQbpyBZDisbABA88id207Wq7htpPwWyQO1/d2oGWdKr5yn2KyAa Aiu1uQ1A18cw+wOtyl8SPJw+pYQpTEy60dbkLS3FvzF5KPLKKBe31CNb+8r7lQ1BE5hN 4+16f9sfEcscRWhfNbyg/bFRiIo2jgB2DtTGl5IN1unyxS+LCB8t/6h4vOPAb/Jso+Oj Xml9ecpione7lx2zqqQ7l8Hiq1awF8tVJqD1oGFqcOH1Kb+sRwG8yf0XWgLPsXZqIdya JMGA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=yhMXohaipBke6xOtWf2whLCJBoi6CVAwEzN4wXOFZSY=; b=kLxollpVRlV+yqdVXmY7kvTQW5ovnD72k5W16BxM8d0Ey9P/3AuvDjOAXR+ZzsWWea gwjNTXfrtPy9E2F3dGka8vEsWknGO2nhdqASiwheksDWFnzlK9ag6iktoQL1BAMllx/z J5CfjuueXJ2U3ZtZLB60Ra1dKvfppU/RgVJUgxXO1nRbXCRM3woVAJJes7AZa3OT2J6C a11vvbE7qI1ldkCIMGahWdU2wjw2xfjRzUFI9LI9ZjmlQkkqd2nMOOXLNrmZ0KTNLpSc WTqFfyt4T4zh7qSiexVIlCdua7QDKV7qw51nbwi8x22GTAoa8HaBXvgpy7Irl6FJJIIA LXLA== X-Gm-Message-State: AOAM533OHfogeFOQNHgLQyidX2X3Yhi+HEqKKR1AsT0qawz80WPT+2tF 3PNy/YFqBdtLzGpkp+GvKMEvYQ== X-Google-Smtp-Source: ABdhPJzmhXBehF57sDToISHgBpJeXGOTenrQtWrZvDbe/BybmHbhc2G3sAd4iD96pWqgNZheVhDwMQ== X-Received: by 2002:a17:903:234a:b0:13e:f01a:24f with SMTP id c10-20020a170903234a00b0013ef01a024fmr26554320plh.43.1634552736229; Mon, 18 Oct 2021 03:25:36 -0700 (PDT) Received: from localhost.localdomain ([139.177.225.237]) by smtp.gmail.com with ESMTPSA id nn14sm12762232pjb.27.2021.10.18.03.25.29 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 18 Oct 2021 03:25:36 -0700 (PDT) From: Muchun Song To: mike.kravetz@oracle.com, akpm@linux-foundation.org, osalvador@suse.de, mhocko@suse.com, song.bao.hua@hisilicon.com, david@redhat.com, chenhuang5@huawei.com, bodeddub@amazon.com, corbet@lwn.net, willy@infradead.org, 21cnbao@gmail.com Cc: duanxiongchun@bytedance.com, fam.zheng@bytedance.com, smuchun@gmail.com, zhengqi.arch@bytedance.com, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Muchun Song Subject: [PATCH v6 2/5] mm: hugetlb: replace hugetlb_free_vmemmap_enabled with a static_key Date: Mon, 18 Oct 2021 18:20:40 +0800 Message-Id: <20211018102043.78685-3-songmuchun@bytedance.com> X-Mailer: git-send-email 2.21.0 (Apple Git-122) In-Reply-To: <20211018102043.78685-1-songmuchun@bytedance.com> References: <20211018102043.78685-1-songmuchun@bytedance.com> MIME-Version: 1.0 Authentication-Results: imf29.hostedemail.com; dkim=pass header.d=bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=hnUO9ppl; dmarc=pass (policy=none) header.from=bytedance.com; spf=pass (imf29.hostedemail.com: domain of songmuchun@bytedance.com designates 209.85.216.41 as permitted sender) smtp.mailfrom=songmuchun@bytedance.com X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 20205900024E X-Stat-Signature: c37ueta4utsnoohyrtxeigaxsips76h9 X-HE-Tag: 1634552734-453653 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The page_head_if_fake() is used throughout memory management and the conditional check requires checking a global variable, although the overhead of this check may be small, it increases when the memory cache comes under pressure. Also, the global variable will not be modified after system boot, so it is very appropriate to use static key machanism. Signed-off-by: Muchun Song Reviewed-by: Barry Song --- include/linux/hugetlb.h | 6 ------ include/linux/page-flags.h | 16 ++++++++++++++-- mm/hugetlb_vmemmap.c | 12 ++++++------ mm/memory_hotplug.c | 2 +- 4 files changed, 21 insertions(+), 15 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 44c2ab0dfa59..27a2adff0db7 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -1077,12 +1077,6 @@ static inline void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr } #endif /* CONFIG_HUGETLB_PAGE */ -#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP -extern bool hugetlb_free_vmemmap_enabled; -#else -#define hugetlb_free_vmemmap_enabled false -#endif - static inline spinlock_t *huge_pte_lock(struct hstate *h, struct mm_struct *mm, pte_t *pte) { diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 7cd386538d0c..26e540fd3393 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -185,7 +185,14 @@ enum pageflags { #ifndef __GENERATING_BOUNDS_H #ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP -extern bool hugetlb_free_vmemmap_enabled; +DECLARE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON, + hugetlb_free_vmemmap_enabled_key); + +static __always_inline bool hugetlb_free_vmemmap_enabled(void) +{ + return static_branch_maybe(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON, + &hugetlb_free_vmemmap_enabled_key); +} /* * If the feature of freeing some vmemmap pages associated with each HugeTLB @@ -205,7 +212,7 @@ extern bool hugetlb_free_vmemmap_enabled; */ static __always_inline const struct page *page_fixed_fake_head(const struct page *page) { - if (!hugetlb_free_vmemmap_enabled) + if (!hugetlb_free_vmemmap_enabled()) return page; /* @@ -233,6 +240,11 @@ static inline const struct page *page_fixed_fake_head(const struct page *page) { return page; } + +static inline bool hugetlb_free_vmemmap_enabled(void) +{ + return false; +} #endif static __always_inline int page_is_fake_head(struct page *page) diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c index f4a8fca691ee..22ecb5e21686 100644 --- a/mm/hugetlb_vmemmap.c +++ b/mm/hugetlb_vmemmap.c @@ -188,9 +188,9 @@ #define RESERVE_VMEMMAP_NR 1U #define RESERVE_VMEMMAP_SIZE (RESERVE_VMEMMAP_NR << PAGE_SHIFT) -bool hugetlb_free_vmemmap_enabled __read_mostly = - IS_ENABLED(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON); -EXPORT_SYMBOL(hugetlb_free_vmemmap_enabled); +DEFINE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON, + hugetlb_free_vmemmap_enabled_key); +EXPORT_SYMBOL(hugetlb_free_vmemmap_enabled_key); static int __init early_hugetlb_free_vmemmap_param(char *buf) { @@ -204,9 +204,9 @@ static int __init early_hugetlb_free_vmemmap_param(char *buf) return -EINVAL; if (!strcmp(buf, "on")) - hugetlb_free_vmemmap_enabled = true; + static_branch_enable(&hugetlb_free_vmemmap_enabled_key); else if (!strcmp(buf, "off")) - hugetlb_free_vmemmap_enabled = false; + static_branch_disable(&hugetlb_free_vmemmap_enabled_key); else return -EINVAL; @@ -284,7 +284,7 @@ void __init hugetlb_vmemmap_init(struct hstate *h) BUILD_BUG_ON(__NR_USED_SUBPAGE >= RESERVE_VMEMMAP_SIZE / sizeof(struct page)); - if (!hugetlb_free_vmemmap_enabled) + if (!hugetlb_free_vmemmap_enabled()) return; vmemmap_pages = (nr_pages * sizeof(struct page)) >> PAGE_SHIFT; diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 3de7933e5302..587a8fc61fc8 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1326,7 +1326,7 @@ bool mhp_supports_memmap_on_memory(unsigned long size) * populate a single PMD. */ return memmap_on_memory && - !hugetlb_free_vmemmap_enabled && + !hugetlb_free_vmemmap_enabled() && IS_ENABLED(CONFIG_MHP_MEMMAP_ON_MEMORY) && size == memory_block_size_bytes() && IS_ALIGNED(vmemmap_size, PMD_SIZE) && From patchwork Mon Oct 18 10:20:41 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Muchun Song X-Patchwork-Id: 12565965 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 205FBC433FE for ; Mon, 18 Oct 2021 10:25:45 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 93F7B61212 for ; Mon, 18 Oct 2021 10:25:44 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 93F7B61212 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=bytedance.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 3CB8A940007; Mon, 18 Oct 2021 06:25:44 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 37B04900003; Mon, 18 Oct 2021 06:25:44 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2438D940007; Mon, 18 Oct 2021 06:25:44 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0146.hostedemail.com [216.40.44.146]) by kanga.kvack.org (Postfix) with ESMTP id 1761A900003 for ; Mon, 18 Oct 2021 06:25:44 -0400 (EDT) Received: from smtpin22.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id C9DE78249980 for ; Mon, 18 Oct 2021 10:25:43 +0000 (UTC) X-FDA: 78709176966.22.CB1CFE4 Received: from mail-pl1-f176.google.com (mail-pl1-f176.google.com [209.85.214.176]) by imf05.hostedemail.com (Postfix) with ESMTP id 9311A5081761 for ; Mon, 18 Oct 2021 10:25:39 +0000 (UTC) Received: by mail-pl1-f176.google.com with SMTP id i5so4659409pla.5 for ; Mon, 18 Oct 2021 03:25:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=Ut/rsMwOCHZFvr+mAC3bHz7Yha0mg558IoAi3jATnso=; b=DaXqaOy98jLAUucS1hvA+9Ma+UWXvtUxae9gnfTQoGnbKCqzFI9zx4JKshkwwvkiSI 8l7N8/UrjU6jlp6r99cXEPszL1xBa+nGqfAnDTuD3o5HmbpsUJ1W+OVPBHLVzNRTPPug iqbGy/cM08NkFsAd05RVTbdl3ufCDlX4SB8xu1pw3S0OgCIJ97TtaOIl5tuZmAu60yBn Q6DmHSLRaWdMM5Kjrt9GlZ9TdVHPoeIrXDf4LznVoNe6NyAx2Xf/smGIK1EY71/zh1Df 0LolhMWEDHZQjVJ8fZNr0OiC2F1Pectdj0ht5PntEt1LePfunGY29BYaY7lOcXZ6ZaH0 R0bw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Ut/rsMwOCHZFvr+mAC3bHz7Yha0mg558IoAi3jATnso=; b=EqukSdNvxqq3d/E1jTzZ9yJAiSI+2hrLSS2yaXtXGWAWybcX8p+ajHQd5jy9z/7Bhz +8RuL6Ivu9uleAh/VLrnAov8/pJV3EL5kKX0v/mTKayxfTqgI5Gcfi2ko/I7W4F0wwUX fuD+9BoHRLQx0ieFzmEwlzJ0+G5jhaZSGElw24Aql43kkPhm5HfYKo2f0Q1jC8ZdUgzs zY7uZIdEYGIZItk4oJUGrOFKARmhK79chVvDBAb3kOHbuqnsT8BwRCLJFDChRF6C7d1/ 8zXzzS8jqtGlJLRIdzlFKmZymIXBRFA3UFt1dnkejzRD+Vjq5mByr0fMu7IPQ8b3kak3 7gPQ== X-Gm-Message-State: AOAM531bi/qkpCRIqoPeQx2ez6mf1ADgHa0tV6mSWMfQbOnZUwuS1seS QN//8sjaFwKnnGsaZhKcjQJ4tg== X-Google-Smtp-Source: ABdhPJyOvT95zF+08s6KWVreUMW6YBMCSlQPjJgW7DS4nnr5UpDmdcq5mhd6IdTY72HiGi45Mql9KQ== X-Received: by 2002:a17:902:e984:b0:13f:17c2:8f0a with SMTP id f4-20020a170902e98400b0013f17c28f0amr26512874plb.74.1634552742608; Mon, 18 Oct 2021 03:25:42 -0700 (PDT) Received: from localhost.localdomain ([139.177.225.237]) by smtp.gmail.com with ESMTPSA id nn14sm12762232pjb.27.2021.10.18.03.25.36 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 18 Oct 2021 03:25:42 -0700 (PDT) From: Muchun Song To: mike.kravetz@oracle.com, akpm@linux-foundation.org, osalvador@suse.de, mhocko@suse.com, song.bao.hua@hisilicon.com, david@redhat.com, chenhuang5@huawei.com, bodeddub@amazon.com, corbet@lwn.net, willy@infradead.org, 21cnbao@gmail.com Cc: duanxiongchun@bytedance.com, fam.zheng@bytedance.com, smuchun@gmail.com, zhengqi.arch@bytedance.com, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Muchun Song Subject: [PATCH v6 3/5] mm: sparsemem: use page table lock to protect kernel pmd operations Date: Mon, 18 Oct 2021 18:20:41 +0800 Message-Id: <20211018102043.78685-4-songmuchun@bytedance.com> X-Mailer: git-send-email 2.21.0 (Apple Git-122) In-Reply-To: <20211018102043.78685-1-songmuchun@bytedance.com> References: <20211018102043.78685-1-songmuchun@bytedance.com> MIME-Version: 1.0 X-Stat-Signature: 9pqa3gogd1tor469myzif4mdauoq5c1u Authentication-Results: imf05.hostedemail.com; dkim=pass header.d=bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=DaXqaOy9; spf=pass (imf05.hostedemail.com: domain of songmuchun@bytedance.com designates 209.85.214.176 as permitted sender) smtp.mailfrom=songmuchun@bytedance.com; dmarc=pass (policy=none) header.from=bytedance.com X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 9311A5081761 X-HE-Tag: 1634552739-82325 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The init_mm.page_table_lock is used to protect kernel page tables, we can use it to serialize splitting vmemmap PMD mappings instead of mmap write lock, which can increase the concurrency of vmemmap_remap_free(). Actually, It increase the concurrency between allocations of HugeTLB pages. But it is not the only benefit. There are a lot of users of mmap read lock of init_mm. The mmap write lock is holding through vmemmap_remap_free(), removing mmap write lock usage to make it does not affect other users of mmap read lock. It is not making anything worse and always a win to move. Now the kernel page table walker does not hold the page_table_lock when walking pmd entries. There may be consistency issue of a pmd entry, because pmd entry might change from a huge pmd entry to a PTE page table. There is only one user of kernel page table walker, namely ptdump. The ptdump already considers the consistency, which use a local variable to cache the value of pmd entry. But we also need to update ->action to ACTION_CONTINUE to make sure the walker does not walk every pte entry again when concurrent thread has split the huge pmd. Signed-off-by: Muchun Song --- mm/ptdump.c | 16 ++++++++++++---- mm/sparse-vmemmap.c | 47 +++++++++++++++++++++++++++++++---------------- 2 files changed, 43 insertions(+), 20 deletions(-) diff --git a/mm/ptdump.c b/mm/ptdump.c index da751448d0e4..eea3d28d173c 100644 --- a/mm/ptdump.c +++ b/mm/ptdump.c @@ -40,8 +40,10 @@ static int ptdump_pgd_entry(pgd_t *pgd, unsigned long addr, if (st->effective_prot) st->effective_prot(st, 0, pgd_val(val)); - if (pgd_leaf(val)) + if (pgd_leaf(val)) { st->note_page(st, addr, 0, pgd_val(val)); + walk->action = ACTION_CONTINUE; + } return 0; } @@ -61,8 +63,10 @@ static int ptdump_p4d_entry(p4d_t *p4d, unsigned long addr, if (st->effective_prot) st->effective_prot(st, 1, p4d_val(val)); - if (p4d_leaf(val)) + if (p4d_leaf(val)) { st->note_page(st, addr, 1, p4d_val(val)); + walk->action = ACTION_CONTINUE; + } return 0; } @@ -82,8 +86,10 @@ static int ptdump_pud_entry(pud_t *pud, unsigned long addr, if (st->effective_prot) st->effective_prot(st, 2, pud_val(val)); - if (pud_leaf(val)) + if (pud_leaf(val)) { st->note_page(st, addr, 2, pud_val(val)); + walk->action = ACTION_CONTINUE; + } return 0; } @@ -101,8 +107,10 @@ static int ptdump_pmd_entry(pmd_t *pmd, unsigned long addr, if (st->effective_prot) st->effective_prot(st, 3, pmd_val(val)); - if (pmd_leaf(val)) + if (pmd_leaf(val)) { st->note_page(st, addr, 3, pmd_val(val)); + walk->action = ACTION_CONTINUE; + } return 0; } diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c index e881f5db7091..c64d1aa3c4b5 100644 --- a/mm/sparse-vmemmap.c +++ b/mm/sparse-vmemmap.c @@ -53,8 +53,7 @@ struct vmemmap_remap_walk { struct list_head *vmemmap_pages; }; -static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start, - struct vmemmap_remap_walk *walk) +static int __split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start) { pmd_t __pmd; int i; @@ -76,15 +75,34 @@ static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start, set_pte_at(&init_mm, addr, pte, entry); } - /* Make pte visible before pmd. See comment in pmd_install(). */ - smp_wmb(); - pmd_populate_kernel(&init_mm, pmd, pgtable); - - flush_tlb_kernel_range(start, start + PMD_SIZE); + spin_lock(&init_mm.page_table_lock); + if (likely(pmd_leaf(*pmd))) { + /* Make pte visible before pmd. See comment in pmd_install(). */ + smp_wmb(); + pmd_populate_kernel(&init_mm, pmd, pgtable); + flush_tlb_kernel_range(start, start + PMD_SIZE); + } else { + pte_free_kernel(&init_mm, pgtable); + } + spin_unlock(&init_mm.page_table_lock); return 0; } +static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start) +{ + int leaf; + + spin_lock(&init_mm.page_table_lock); + leaf = pmd_leaf(*pmd); + spin_unlock(&init_mm.page_table_lock); + + if (!leaf) + return 0; + + return __split_vmemmap_huge_pmd(pmd, start); +} + static void vmemmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, struct vmemmap_remap_walk *walk) @@ -121,13 +139,12 @@ static int vmemmap_pmd_range(pud_t *pud, unsigned long addr, pmd = pmd_offset(pud, addr); do { - if (pmd_leaf(*pmd)) { - int ret; + int ret; + + ret = split_vmemmap_huge_pmd(pmd, addr & PMD_MASK); + if (ret) + return ret; - ret = split_vmemmap_huge_pmd(pmd, addr & PMD_MASK, walk); - if (ret) - return ret; - } next = pmd_addr_end(addr, end); vmemmap_pte_range(pmd, addr, next, walk); } while (pmd++, addr = next, addr != end); @@ -321,10 +338,8 @@ int vmemmap_remap_free(unsigned long start, unsigned long end, */ BUG_ON(start - reuse != PAGE_SIZE); - mmap_write_lock(&init_mm); + mmap_read_lock(&init_mm); ret = vmemmap_remap_range(reuse, end, &walk); - mmap_write_downgrade(&init_mm); - if (ret && walk.nr_walked) { end = reuse + walk.nr_walked * PAGE_SIZE; /* From patchwork Mon Oct 18 10:20:42 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Muchun Song X-Patchwork-Id: 12565967 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 38952C433EF for ; Mon, 18 Oct 2021 10:25:51 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id A9D3E6103D for ; Mon, 18 Oct 2021 10:25:50 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org A9D3E6103D Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=bytedance.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 50362940008; Mon, 18 Oct 2021 06:25:50 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 4B2D1900003; Mon, 18 Oct 2021 06:25:50 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 37C4B940008; Mon, 18 Oct 2021 06:25:50 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0117.hostedemail.com [216.40.44.117]) by kanga.kvack.org (Postfix) with ESMTP id 2B600900003 for ; Mon, 18 Oct 2021 06:25:50 -0400 (EDT) Received: from smtpin33.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id DEA3C1838DC0A for ; Mon, 18 Oct 2021 10:25:49 +0000 (UTC) X-FDA: 78709177218.33.E221769 Received: from mail-pj1-f43.google.com (mail-pj1-f43.google.com [209.85.216.43]) by imf03.hostedemail.com (Postfix) with ESMTP id D5DCA30000A4 for ; Mon, 18 Oct 2021 10:25:47 +0000 (UTC) Received: by mail-pj1-f43.google.com with SMTP id ls18so11879438pjb.3 for ; Mon, 18 Oct 2021 03:25:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=4TEz7LJ2e38nXsVuu+zdOOkjk9dhX/V4fwPw2l4XlU8=; b=ub/RlTdG7co0/yi8PJViCio5eWgEkacfTivaJLn549E9iS0zviJGG12Dr92T5aXkX9 FpkFTZaMBaVuia1mx0M5Yi8MHLWcYuhyuiTgGR0/8cCxTZzE750BFwjQOInpT6bL0FzV u4WxRe6JP56/oZvvyQU0eEmCbEUPf7MgpvC21cFIgkxkZgOAjxCwIclhhYADSvC3ocpY M1YcBGg212mUSEKw5OHFaRumoPa08MN4O2xyQia5Umdu5q8vefIJdHMvMLT40oMihUWL NfwYKViGWehBL5FjuPHPgzohD5kpenSIMwjOBG/wDuxc2zEDFyypKwAxOkeFM0lqWzC1 Iytw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=4TEz7LJ2e38nXsVuu+zdOOkjk9dhX/V4fwPw2l4XlU8=; b=OXnB2MsA4Q94W4V0iNu8WNZGXguWBOAbWoxpFw/K3JYpq+wn/0TMYWiPMsgO8QALjx Op0JgsY8SbosW9CUqmESw1FBOYHo+Za0Tglmq7Bv0XL4Lju2DSSscl0kqoXLdOsRPbh3 /JxEg/QbvqVovgpVrZLMXVCVOFBcXOsCsy+bgdtoTRjhUD8DSVS7m8feKq/LBUeI1N4P 8P8f5dodCyq269dlqFGMQ7NM9W2HJY+xSEXFdBrYOJL5mpPo0bTffjtnyIPGVfe5V8HF DG4sZ3kMC7Wuf0oJ5443IObdMKI8+VM+7BbsCPuR2+QRQ5k8rbOBaCS00VtC0g3zC5mg tEmg== X-Gm-Message-State: AOAM532XKRhX14ceNPRjbqmsdHNHNehT6U1Ah+dzekMwxRGuAb+WuMAf yiBsKgTlUaKsl2dTccPdjmJFMA== X-Google-Smtp-Source: ABdhPJzwnPmdmCAEs1NVmhdD8qEYCyVLFFUGm6I894CH7brAEpIkoeA+Wg8DD4fMyzrgHEpFVGBF2A== X-Received: by 2002:a17:90b:4d84:: with SMTP id oj4mr32326261pjb.58.1634552748657; Mon, 18 Oct 2021 03:25:48 -0700 (PDT) Received: from localhost.localdomain ([139.177.225.237]) by smtp.gmail.com with ESMTPSA id nn14sm12762232pjb.27.2021.10.18.03.25.42 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 18 Oct 2021 03:25:48 -0700 (PDT) From: Muchun Song To: mike.kravetz@oracle.com, akpm@linux-foundation.org, osalvador@suse.de, mhocko@suse.com, song.bao.hua@hisilicon.com, david@redhat.com, chenhuang5@huawei.com, bodeddub@amazon.com, corbet@lwn.net, willy@infradead.org, 21cnbao@gmail.com Cc: duanxiongchun@bytedance.com, fam.zheng@bytedance.com, smuchun@gmail.com, zhengqi.arch@bytedance.com, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Muchun Song Subject: [PATCH v6 4/5] selftests: vm: add a hugetlb test case Date: Mon, 18 Oct 2021 18:20:42 +0800 Message-Id: <20211018102043.78685-5-songmuchun@bytedance.com> X-Mailer: git-send-email 2.21.0 (Apple Git-122) In-Reply-To: <20211018102043.78685-1-songmuchun@bytedance.com> References: <20211018102043.78685-1-songmuchun@bytedance.com> MIME-Version: 1.0 X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: D5DCA30000A4 X-Stat-Signature: 3ccijkt3x8nnf1b8empcsh73crk9oy5a Authentication-Results: imf03.hostedemail.com; dkim=pass header.d=bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b="ub/RlTdG"; dmarc=pass (policy=none) header.from=bytedance.com; spf=pass (imf03.hostedemail.com: domain of songmuchun@bytedance.com designates 209.85.216.43 as permitted sender) smtp.mailfrom=songmuchun@bytedance.com X-HE-Tag: 1634552747-649113 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Since the head vmemmap page frame associated with each HugeTLB page is reused, we should hide the PG_head flag of tail struct page from the user. Add a tese case to check whether it is work properly. The test steps are as follows. 1) alloc 2MB hugeTLB 2) get each page frame 3) apply those APIs in each page frame 4) Those APIs work completely the same as before. Reading the flags of a page by /proc/kpageflags is done in stable_page_flags(), which has invoked PageHead(), PageTail(), PageCompound() and compound_head(). If those APIs work properly, the head page must have 15 and 17 bits set. And tail pages must have 16 and 17 bits set but 15 bit unset. Those flags are checked in check_page_flags(). Signed-off-by: Muchun Song Reviewed-by: Barry Song --- tools/testing/selftests/vm/.gitignore | 1 + tools/testing/selftests/vm/Makefile | 1 + tools/testing/selftests/vm/hugepage-vmemmap.c | 144 ++++++++++++++++++++++++++ tools/testing/selftests/vm/run_vmtests.sh | 11 ++ 4 files changed, 157 insertions(+) create mode 100644 tools/testing/selftests/vm/hugepage-vmemmap.c diff --git a/tools/testing/selftests/vm/.gitignore b/tools/testing/selftests/vm/.gitignore index 2e7e86e85282..3b5faec3c04f 100644 --- a/tools/testing/selftests/vm/.gitignore +++ b/tools/testing/selftests/vm/.gitignore @@ -2,6 +2,7 @@ hugepage-mmap hugepage-mremap hugepage-shm +hugepage-vmemmap khugepaged map_hugetlb map_populate diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile index 1607322a112c..7d100a7dc462 100644 --- a/tools/testing/selftests/vm/Makefile +++ b/tools/testing/selftests/vm/Makefile @@ -31,6 +31,7 @@ TEST_GEN_FILES += hmm-tests TEST_GEN_FILES += hugepage-mmap TEST_GEN_FILES += hugepage-mremap TEST_GEN_FILES += hugepage-shm +TEST_GEN_FILES += hugepage-vmemmap TEST_GEN_FILES += khugepaged TEST_GEN_FILES += madv_populate TEST_GEN_FILES += map_fixed_noreplace diff --git a/tools/testing/selftests/vm/hugepage-vmemmap.c b/tools/testing/selftests/vm/hugepage-vmemmap.c new file mode 100644 index 000000000000..557bdbd4f87e --- /dev/null +++ b/tools/testing/selftests/vm/hugepage-vmemmap.c @@ -0,0 +1,144 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * A test case of using hugepage memory in a user application using the + * mmap system call with MAP_HUGETLB flag. Before running this program + * make sure the administrator has allocated enough default sized huge + * pages to cover the 2 MB allocation. + */ +#include +#include +#include +#include +#include + +#define MAP_LENGTH (2UL * 1024 * 1024) + +#ifndef MAP_HUGETLB +#define MAP_HUGETLB 0x40000 /* arch specific */ +#endif + +#define PAGE_SIZE 4096 + +#define PAGE_COMPOUND_HEAD (1UL << 15) +#define PAGE_COMPOUND_TAIL (1UL << 16) +#define PAGE_HUGE (1UL << 17) + +#define HEAD_PAGE_FLAGS (PAGE_COMPOUND_HEAD | PAGE_HUGE) +#define TAIL_PAGE_FLAGS (PAGE_COMPOUND_TAIL | PAGE_HUGE) + +#define PM_PFRAME_BITS 55 +#define PM_PFRAME_MASK ~((1UL << PM_PFRAME_BITS) - 1) + +/* + * For ia64 architecture, Linux kernel reserves Region number 4 for hugepages. + * That means the addresses starting with 0x800000... will need to be + * specified. Specifying a fixed address is not required on ppc64, i386 + * or x86_64. + */ +#ifdef __ia64__ +#define MAP_ADDR (void *)(0x8000000000000000UL) +#define MAP_FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_FIXED) +#else +#define MAP_ADDR NULL +#define MAP_FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB) +#endif + +static void write_bytes(char *addr, size_t length) +{ + unsigned long i; + + for (i = 0; i < length; i++) + *(addr + i) = (char)i; +} + +static unsigned long virt_to_pfn(void *addr) +{ + int fd; + unsigned long pagemap; + + fd = open("/proc/self/pagemap", O_RDONLY); + if (fd < 0) + return -1UL; + + lseek(fd, (unsigned long)addr / PAGE_SIZE * sizeof(pagemap), SEEK_SET); + read(fd, &pagemap, sizeof(pagemap)); + close(fd); + + return pagemap & ~PM_PFRAME_MASK; +} + +static int check_page_flags(unsigned long pfn) +{ + int fd, i; + unsigned long pageflags; + + fd = open("/proc/kpageflags", O_RDONLY); + if (fd < 0) + return -1; + + lseek(fd, pfn * sizeof(pageflags), SEEK_SET); + + read(fd, &pageflags, sizeof(pageflags)); + if ((pageflags & HEAD_PAGE_FLAGS) != HEAD_PAGE_FLAGS) { + close(fd); + printf("Head page flags (%lx) is invalid\n", pageflags); + return -1; + } + + /* + * pages other than the first page must be tail and shouldn't be head; + * this also verifies kernel has correctly set the fake page_head to tail + * while hugetlb_free_vmemmap is enabled. + */ + for (i = 1; i < MAP_LENGTH / PAGE_SIZE; i++) { + read(fd, &pageflags, sizeof(pageflags)); + if ((pageflags & TAIL_PAGE_FLAGS) != TAIL_PAGE_FLAGS || + (pageflags & HEAD_PAGE_FLAGS) == HEAD_PAGE_FLAGS) { + close(fd); + printf("Tail page flags (%lx) is invalid\n", pageflags); + return -1; + } + } + + close(fd); + + return 0; +} + +int main(int argc, char **argv) +{ + void *addr; + unsigned long pfn; + + addr = mmap(MAP_ADDR, MAP_LENGTH, PROT_READ | PROT_WRITE, MAP_FLAGS, -1, 0); + if (addr == MAP_FAILED) { + perror("mmap"); + exit(1); + } + + /* Trigger allocation of HugeTLB page. */ + write_bytes(addr, MAP_LENGTH); + + pfn = virt_to_pfn(addr); + if (pfn == -1UL) { + munmap(addr, MAP_LENGTH); + perror("virt_to_pfn"); + exit(1); + } + + printf("Returned address is %p whose pfn is %lx\n", addr, pfn); + + if (check_page_flags(pfn) < 0) { + munmap(addr, MAP_LENGTH); + perror("check_page_flags"); + exit(1); + } + + /* munmap() length of MAP_HUGETLB memory must be hugepage aligned */ + if (munmap(addr, MAP_LENGTH)) { + perror("munmap"); + exit(1); + } + + return 0; +} diff --git a/tools/testing/selftests/vm/run_vmtests.sh b/tools/testing/selftests/vm/run_vmtests.sh index 45e803af7c77..745f86e7a086 100755 --- a/tools/testing/selftests/vm/run_vmtests.sh +++ b/tools/testing/selftests/vm/run_vmtests.sh @@ -108,6 +108,17 @@ else echo "[PASS]" fi +echo "------------------------" +echo "running hugepage-vmemmap" +echo "------------------------" +./hugepage-vmemmap +if [ $? -ne 0 ]; then + echo "[FAIL]" + exitcode=1 +else + echo "[PASS]" +fi + echo "NOTE: The above hugetlb tests provide minimal coverage. Use" echo " https://github.com/libhugetlbfs/libhugetlbfs.git for" echo " hugetlb regression testing." From patchwork Mon Oct 18 10:20:43 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Muchun Song X-Patchwork-Id: 12565969 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5C529C433FE for ; Mon, 18 Oct 2021 10:25:57 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 121AF60FF2 for ; Mon, 18 Oct 2021 10:25:57 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 121AF60FF2 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=bytedance.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id AA92E900005; Mon, 18 Oct 2021 06:25:56 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id A58A5900003; Mon, 18 Oct 2021 06:25:56 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 92106900005; Mon, 18 Oct 2021 06:25:56 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0241.hostedemail.com [216.40.44.241]) by kanga.kvack.org (Postfix) with ESMTP id 84A2C900003 for ; Mon, 18 Oct 2021 06:25:56 -0400 (EDT) Received: from smtpin01.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 3F6458249980 for ; Mon, 18 Oct 2021 10:25:56 +0000 (UTC) X-FDA: 78709177512.01.8A906A2 Received: from mail-pj1-f51.google.com (mail-pj1-f51.google.com [209.85.216.51]) by imf18.hostedemail.com (Postfix) with ESMTP id 6C59F400208E for ; Mon, 18 Oct 2021 10:25:53 +0000 (UTC) Received: by mail-pj1-f51.google.com with SMTP id qe4-20020a17090b4f8400b0019f663cfcd1so14146330pjb.1 for ; Mon, 18 Oct 2021 03:25:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=2UoWjZzOHViEmzMC0Qv5p6Aol3nKS7az97fC7P92qe4=; b=Gd5h2gBCnRqEtZChFB1olUuihrmox4cKyIUyPvGqjLMLY6jUZ3uNVPFTHJpBzZo6qf 5Hl+3LYSiOcMkFuJpfr5+FRq/6ej3achjCnGkj1gjgWjsrPpaglpOzUccBvGHbXhyaRJ U3bqhb81C6A//0N06JgcY1sw7snoGzjn3zeZaH/myJFRSRFGM91aXvZqjOVwPG3AKLCg l92aTjdVAWMtEzATNw6Dq1y+py7fkGmsKDAM3Z1yAreL6h1LPiSSfEJ8xS/viJRZD3xZ hZM5KDPXAdAWFQK0ron8FDsAZCo02VaPhkeRYGlsOXRWMa+c05HGLFF0lVGshnjQkaOo ZLqg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=2UoWjZzOHViEmzMC0Qv5p6Aol3nKS7az97fC7P92qe4=; b=Cb8Pv3J0ubiHVz3pn01QrfP0hdzlIyvr4Pc5YfwTNGZyuF2TnILbR8zo9VldlnUuIK BFRAh7aZUTh2HZgH53ly2rSIKX6asvJVbynoXhOQbc1f1pNvUKOQiq6w0HByPzHlb+Om SajS6V1uRqqCTZq0pwsLosIl9IFnGqpv7dI5wZIPUhzqrcSp8IA+O0MNe+Dl55Ihe7gD HrA14Fj/vgDQDaFEM7lNw7kiViIMJbc2tibJ6S1ojS2XEv/9rMxUzzYEsIOXZmLDJwpS 96bKYMp00DZZNMZxR+jZmGsXgjDKTXiijKnJyFp0c2uQabmj/iGg11oFwjAv+BXOSJad fvwQ== X-Gm-Message-State: AOAM530Cn/NF9V/8ig/4QVAsjPRXsCLgOWuK9U4AOAyxIVikwuNym7Sd 58tQ7pOieY+vsNlUQpbS9Ms+Gw== X-Google-Smtp-Source: ABdhPJwYZ3gvRiwyFh8hGZtiS4aEnAV/VG+UgzKIi92f3MeJyhYUk6ojpZOUzHRjbfYDUeAIK1iKTw== X-Received: by 2002:a17:90a:1a4c:: with SMTP id 12mr47381855pjl.89.1634552754998; Mon, 18 Oct 2021 03:25:54 -0700 (PDT) Received: from localhost.localdomain ([139.177.225.237]) by smtp.gmail.com with ESMTPSA id nn14sm12762232pjb.27.2021.10.18.03.25.49 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 18 Oct 2021 03:25:54 -0700 (PDT) From: Muchun Song To: mike.kravetz@oracle.com, akpm@linux-foundation.org, osalvador@suse.de, mhocko@suse.com, song.bao.hua@hisilicon.com, david@redhat.com, chenhuang5@huawei.com, bodeddub@amazon.com, corbet@lwn.net, willy@infradead.org, 21cnbao@gmail.com Cc: duanxiongchun@bytedance.com, fam.zheng@bytedance.com, smuchun@gmail.com, zhengqi.arch@bytedance.com, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Muchun Song Subject: [PATCH v6 5/5] mm: sparsemem: move vmemmap related to HugeTLB to CONFIG_HUGETLB_PAGE_FREE_VMEMMAP Date: Mon, 18 Oct 2021 18:20:43 +0800 Message-Id: <20211018102043.78685-6-songmuchun@bytedance.com> X-Mailer: git-send-email 2.21.0 (Apple Git-122) In-Reply-To: <20211018102043.78685-1-songmuchun@bytedance.com> References: <20211018102043.78685-1-songmuchun@bytedance.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: 6C59F400208E X-Stat-Signature: 9cuinznroutricyq4h19b47z8hf99qym Authentication-Results: imf18.hostedemail.com; dkim=pass header.d=bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=Gd5h2gBC; dmarc=pass (policy=none) header.from=bytedance.com; spf=pass (imf18.hostedemail.com: domain of songmuchun@bytedance.com designates 209.85.216.51 as permitted sender) smtp.mailfrom=songmuchun@bytedance.com X-Rspamd-Server: rspam02 X-HE-Tag: 1634552753-520510 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The vmemmap_remap_free/alloc are relevant to HugeTLB, so move those functiongs to the scope of CONFIG_HUGETLB_PAGE_FREE_VMEMMAP. Signed-off-by: Muchun Song Reviewed-by: Barry Song --- include/linux/mm.h | 2 ++ mm/sparse-vmemmap.c | 2 ++ 2 files changed, 4 insertions(+) diff --git a/include/linux/mm.h b/include/linux/mm.h index a7e4a9e7d807..8c85863a067c 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3184,10 +3184,12 @@ static inline void print_vma_addr(char *prefix, unsigned long rip) } #endif +#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP int vmemmap_remap_free(unsigned long start, unsigned long end, unsigned long reuse); int vmemmap_remap_alloc(unsigned long start, unsigned long end, unsigned long reuse, gfp_t gfp_mask); +#endif void *sparse_buffer_alloc(unsigned long size); struct page * __populate_section_memmap(unsigned long pfn, diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c index c64d1aa3c4b5..8aecd6b3896c 100644 --- a/mm/sparse-vmemmap.c +++ b/mm/sparse-vmemmap.c @@ -34,6 +34,7 @@ #include #include +#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP /** * struct vmemmap_remap_walk - walk vmemmap page table * @@ -419,6 +420,7 @@ int vmemmap_remap_alloc(unsigned long start, unsigned long end, return 0; } +#endif /* CONFIG_HUGETLB_PAGE_FREE_VMEMMAP */ /* * Allocate a block of memory to be used to back the virtual memory map