From patchwork Thu Aug 19 06:58:28 2021
X-Patchwork-Submitter: Muchun Song <songmuchun@bytedance.com>
X-Patchwork-Id: 12446449
From: Muchun Song <songmuchun@bytedance.com>
To: mike.kravetz@oracle.com, akpm@linux-foundation.org, osalvador@suse.de,
    mhocko@suse.com, song.bao.hua@hisilicon.com, david@redhat.com,
    chenhuang5@huawei.com, bodeddub@amazon.com, corbet@lwn.net,
    willy@infradead.org
Cc: duanxiongchun@bytedance.com, fam.zheng@bytedance.com, smuchun@gmail.com,
    zhengqi.arch@bytedance.com, linux-doc@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    Muchun Song <songmuchun@bytedance.com>
Subject: [PATCH v2 1/4] mm: hugetlb: free the 2nd vmemmap page associated with each HugeTLB page
Date: Thu, 19 Aug 2021 14:58:28 +0800
Message-Id: <20210819065831.43186-2-songmuchun@bytedance.com>
In-Reply-To: <20210819065831.43186-1-songmuchun@bytedance.com>
References: <20210819065831.43186-1-songmuchun@bytedance.com>

Currently, we only free 6 vmemmap pages associated with a 2MB HugeTLB
page. However, we can remap all tail vmemmap pages to the page frame
mapped by the head vmemmap page. Finally, we can free 7 vmemmap pages
for a 2MB HugeTLB page. This is a nice gain: compared with the current
implementation, it saves an extra 2GB of memory when there is 1TB of
HugeTLB pages in the system.

But the head vmemmap page is not freed to the buddy allocator, and all
tail vmemmap pages are mapped to the head vmemmap page frame. So we can
see more than one struct page with PG_head (e.g. 8 per 2 MB HugeTLB
page) associated with each HugeTLB page. We should adjust
compound_head() so that it returns the real head struct page when the
parameter is a tail struct page that has the PG_head flag set. (A toy
userspace model of this lookup is included after the patch below.)

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 Documentation/admin-guide/kernel-parameters.txt |  2 +-
 include/linux/page-flags.h                      | 75 +++++++++++++++++++++++--
 mm/hugetlb_vmemmap.c                            | 60 +++++++++++---------
 mm/sparse-vmemmap.c                             | 21 +++++++
 4 files changed, 126 insertions(+), 32 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index bdb22006f713..a154a7b3b9a5 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1606,7 +1606,7 @@
 			[KNL] Requires CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
 			enabled.
 			Allows heavy hugetlb users to free up some more
-			memory (6 * PAGE_SIZE for each 2MB hugetlb page).
+			memory (7 * PAGE_SIZE for each 2MB hugetlb page).
 			Format: { on | off (default) }

 			on: enable the feature

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 8e1d97d8f3bd..7b1a918ebd43 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -184,13 +184,64 @@ enum pageflags {

 #ifndef __GENERATING_BOUNDS_H

+#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
+extern bool hugetlb_free_vmemmap_enabled;
+
+/*
+ * If the feature of freeing some vmemmap pages associated with each HugeTLB
+ * page is enabled, the head vmemmap page frame is reused and all of the tail
+ * vmemmap addresses map to the head vmemmap page frame (further details can
+ * be found in the figure at the head of mm/hugetlb_vmemmap.c). In other
+ * words, there is more than one page struct with PG_head associated with
+ * each HugeTLB page. We __know__ that there is only one head page struct;
+ * the tail page structs with PG_head are fake head page structs. We need an
+ * approach to distinguish between those two different types of page structs
+ * so that compound_head() can return the real head page struct when the
+ * parameter is a tail page struct with PG_head.
+ *
+ * page_head_if_fake() returns the real head page struct if @page may be
+ * fake; otherwise, it returns @page, which cannot be a fake page struct.
+ */
+static __always_inline const struct page *page_head_if_fake(const struct page *page)
+{
+	if (!hugetlb_free_vmemmap_enabled)
+		return page;
+
+	/*
+	 * Only addresses aligned with PAGE_SIZE of struct page may be fake
+	 * head struct page. The alignment check aims to avoid accessing the
+	 * fields (e.g. compound_head) of @page[1], which can avoid touching
+	 * a (possibly) cold cacheline in some cases.
+	 */
+	if (IS_ALIGNED((unsigned long)page, PAGE_SIZE) &&
+	    test_bit(PG_head, &page->flags)) {
+		/*
+		 * We can safely access the field of @page[1] with PG_head
+		 * because @page is a compound page composed of at least
+		 * two contiguous pages.
+		 */
+		unsigned long head = READ_ONCE(page[1].compound_head);
+
+		if (likely(head & 1))
+			return (const struct page *)(head - 1);
+	}
+
+	return page;
+}
+#else
+static __always_inline const struct page *page_head_if_fake(const struct page *page)
+{
+	return page;
+}
+#endif
+
 static inline unsigned long _compound_head(const struct page *page)
 {
 	unsigned long head = READ_ONCE(page->compound_head);

 	if (unlikely(head & 1))
 		return head - 1;
-	return (unsigned long)page;
+	return (unsigned long)page_head_if_fake(page);
 }

 #define compound_head(page)	((typeof(page))_compound_head(page))
@@ -225,12 +276,14 @@ static inline unsigned long _compound_head(const struct page *page)

 static __always_inline int PageTail(struct page *page)
 {
-	return READ_ONCE(page->compound_head) & 1;
+	return READ_ONCE(page->compound_head) & 1 ||
+	       page_head_if_fake(page) != page;
 }

 static __always_inline int PageCompound(struct page *page)
 {
-	return test_bit(PG_head, &page->flags) || PageTail(page);
+	return test_bit(PG_head, &page->flags) ||
+	       READ_ONCE(page->compound_head) & 1;
 }

 #define PAGE_POISON_PATTERN	-1l
@@ -675,7 +728,21 @@ static inline bool test_set_page_writeback(struct page *page)
 	return set_page_writeback(page);
 }

-__PAGEFLAG(Head, head, PF_ANY) CLEARPAGEFLAG(Head, head, PF_ANY)
+static __always_inline bool folio_test_head(struct folio *folio)
+{
+	return test_bit(PG_head, folio_flags(folio, FOLIO_PF_ANY));
+}
+
+static __always_inline int PageHead(struct page *page)
+{
+	PF_POISONED_CHECK(page);
+	return test_bit(PG_head, &page->flags) &&
+	       page_head_if_fake(page) == page;
+}
+
+__SETPAGEFLAG(Head, head, PF_ANY)
+__CLEARPAGEFLAG(Head, head, PF_ANY)
+CLEARPAGEFLAG(Head, head, PF_ANY)

 /* Whether there are one or multiple pages in a folio */
 static inline bool folio_single(struct folio *folio)

diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index c540c21e26f5..527bcaa44a48 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -124,9 +124,9 @@
  * page of page structs (page 0) associated with the HugeTLB page contains the 4
  * page structs necessary to describe the HugeTLB. The only use of the remaining
  * pages of page structs (page 1 to page 7) is to point to page->compound_head.
- * Therefore, we can remap pages 2 to 7 to page 1. Only 2 pages of page structs
+ * Therefore, we can remap pages 1 to 7 to page 0. Only 1 page of page structs
  * will be used for each HugeTLB page. This will allow us to free the remaining
- * 6 pages to the buddy allocator.
+ * 7 pages to the buddy allocator.
  *
  * Here is how things look after remapping.
 *
@@ -134,30 +134,30 @@
  * +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+
  * |           |                     |     0     | -------------> |     0     |
  * |           |                     +-----------+                +-----------+
- * |           |                     |     1     | -------------> |     1     |
- * |           |                     +-----------+                +-----------+
- * |           |                     |     2     | ----------------^ ^ ^ ^ ^ ^
- * |           |                     +-----------+                   | | | | |
- * |           |                     |     3     | ------------------+ | | | |
- * |           |                     +-----------+                     | | | |
- * |           |                     |     4     | --------------------+ | | |
- * |    PMD    |                     +-----------+                       | | |
- * |   level   |                     |     5     | ----------------------+ | |
- * |  mapping  |                     +-----------+                         | |
- * |           |                     |     6     | ------------------------+ |
- * |           |                     +-----------+                           |
- * |           |                     |     7     | --------------------------+
+ * |           |                     |     1     | ---------------^ ^ ^ ^ ^ ^ ^
+ * |           |                     +-----------+                  | | | | | |
+ * |           |                     |     2     | -----------------+ | | | | |
+ * |           |                     +-----------+                    | | | | |
+ * |           |                     |     3     | -------------------+ | | | |
+ * |           |                     +-----------+                      | | | |
+ * |           |                     |     4     | ---------------------+ | | |
+ * |    PMD    |                     +-----------+                        | | |
+ * |   level   |                     |     5     | -----------------------+ | |
+ * |  mapping  |                     +-----------+                          | |
+ * |           |                     |     6     | -------------------------+ |
+ * |           |                     +-----------+                            |
+ * |           |                     |     7     | ---------------------------+
  * |           |                     +-----------+
  * |           |
  * |           |
  * |           |
  * +-----------+
  *
- * When a HugeTLB is freed to the buddy system, we should allocate 6 pages for
+ * When a HugeTLB is freed to the buddy system, we should allocate 7 pages for
  * vmemmap pages and restore the previous mapping relationship.
  *
  * For the HugeTLB page of the pud level mapping. It is similar to the former.
- * We also can use this approach to free (PAGE_SIZE - 2) vmemmap pages.
+ * We also can use this approach to free (PAGE_SIZE - 1) vmemmap pages.
  *
  * Apart from the HugeTLB page of the pmd/pud level mapping, some architectures
  * (e.g. aarch64) provide a contiguous bit in the translation table entries
@@ -166,7 +166,13 @@
  *
  * The contiguous bit is used to increase the mapping size at the pmd and pte
  * (last) level. So this type of HugeTLB page can be optimized only when its
- * size of the struct page structs is greater than 2 pages.
+ * size of the struct page structs is greater than 1 page.
+ *
+ * Notice: The head vmemmap page is not freed to the buddy allocator and all
+ * tail vmemmap pages are mapped to the head vmemmap page frame. So we can see
+ * more than one struct page with PG_head (e.g. 8 per 2 MB HugeTLB page)
+ * associated with each HugeTLB page. The compound_head() can handle this
+ * correctly (more details refer to the comment above compound_head()).
  */

#define pr_fmt(fmt)	"HugeTLB: " fmt

@@ -175,14 +181,16 @@
 /*
  * There are a lot of struct page structures associated with each HugeTLB page.
  * For tail pages, the value of compound_head is the same. So we can reuse first
- * page of tail page structures. We map the virtual addresses of the remaining
- * pages of tail page structures to the first tail page struct, and then free
- * these page frames. Therefore, we need to reserve two pages as vmemmap areas.
+ * page of head page structures. We map the virtual addresses of all the pages
+ * of tail page structures to the head page struct, and then free these page
+ * frames. Therefore, we need to reserve one page as vmemmap areas.
  */
-#define RESERVE_VMEMMAP_NR		2U
+#define RESERVE_VMEMMAP_NR		1U
 #define RESERVE_VMEMMAP_SIZE		(RESERVE_VMEMMAP_NR << PAGE_SHIFT)

-bool hugetlb_free_vmemmap_enabled = IS_ENABLED(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON);
+bool hugetlb_free_vmemmap_enabled __read_mostly =
+	IS_ENABLED(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON);
+EXPORT_SYMBOL(hugetlb_free_vmemmap_enabled);

 static int __init early_hugetlb_free_vmemmap_param(char *buf)
 {
@@ -236,7 +244,6 @@ int alloc_huge_page_vmemmap(struct hstate *h, struct page *head)
 	 */
 	ret = vmemmap_remap_alloc(vmemmap_addr, vmemmap_end, vmemmap_reuse,
 				  GFP_KERNEL | __GFP_NORETRY | __GFP_THISNODE);
-
 	if (!ret)
 		ClearHPageVmemmapOptimized(head);

@@ -282,9 +289,8 @@ void __init hugetlb_vmemmap_init(struct hstate *h)
 	vmemmap_pages = (nr_pages * sizeof(struct page)) >> PAGE_SHIFT;

 	/*
-	 * The head page and the first tail page are not to be freed to buddy
-	 * allocator, the other pages will map to the first tail page, so they
-	 * can be freed.
+	 * The head page is not to be freed to the buddy allocator; the other
+	 * tail pages will map to the head page, so they can be freed.
 	 *
 	 * Could RESERVE_VMEMMAP_NR be greater than @vmemmap_pages? It is true
 	 * on some architectures (e.g. aarch64). See Documentation/arm64/

diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index bdce883f9286..62e3d20648ce 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -53,6 +53,17 @@ struct vmemmap_remap_walk {
 	struct list_head *vmemmap_pages;
 };

+/*
+ * How many struct page structs need to be reset. When we reuse the head
+ * struct page, the special metadata (e.g. page->flags or page->mapping)
+ * cannot be copied to the tail struct page structs. The invalid value will
+ * be caught by free_tail_pages_check(). In order to avoid the message of
+ * "corrupted mapping in tail page", we need to reset at least 3 struct
+ * page structs (one head struct page and two tail struct pages).
+ */
+#define NR_RESET_STRUCT_PAGE		3
+
 static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start,
 				  struct vmemmap_remap_walk *walk)
 {
@@ -245,6 +256,15 @@ static void vmemmap_remap_pte(pte_t *pte, unsigned long addr,
 	set_pte_at(&init_mm, addr, pte, entry);
 }

+static inline void reset_struct_pages(struct page *start)
+{
+	int i;
+	struct page *from = start + NR_RESET_STRUCT_PAGE;
+
+	for (i = 0; i < NR_RESET_STRUCT_PAGE; i++)
+		memcpy(start + i, from, sizeof(*from));
+}
+
 static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
 				struct vmemmap_remap_walk *walk)
 {
@@ -258,6 +278,7 @@ static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
 	list_del(&page->lru);
 	to = page_to_virt(page);
 	copy_page(to, (void *)walk->reuse_addr);
+	reset_struct_pages(to);

 	set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot));
 }
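To make the trick above concrete: with 64-byte struct pages, a 2MB
HugeTLB page owns 512 struct pages, i.e. 8 vmemmap pages; this patch
frees 7 of them, one more than before, which is where the extra 2GB per
1TB of HugeTLB pages comes from (512Ki huge pages x one additional 4KB
page each). The toy userspace program below models the fake-head
resolution that compound_head() performs after the remap. It is only a
sketch: the flag encoding, the index-based "pointers", and the alias
function standing in for the shared page frame are all inventions of
this example, not kernel code.

#include <stdio.h>

/*
 * Toy model: 64 struct pages fit in one 4KB vmemmap page. After the
 * remap, vmemmap pages 1..7 alias vmemmap page 0, so a read of struct
 * page i really returns entry i % 64.
 */
struct toy_page {
	unsigned long flags;		/* bit 0 stands in for PG_head */
	unsigned long compound_head;	/* tails: (head index << 1) | 1 */
};

#define PAGES_PER_VMEMMAP_PAGE	64

static struct toy_page frame[PAGES_PER_VMEMMAP_PAGE];

/* Every access goes through the alias, as the shared page frame would. */
static struct toy_page *vmemmap_read(unsigned long i)
{
	return &frame[i % PAGES_PER_VMEMMAP_PAGE];
}

/*
 * Models page_head_if_fake(): only a PAGE_SIZE-aligned struct page
 * (here: index divisible by 64) can be a fake head. If it carries
 * PG_head, its neighbour is a tail whose compound_head names the real
 * head, for real and fake heads alike.
 */
static unsigned long head_if_fake(unsigned long i)
{
	if (!(i % PAGES_PER_VMEMMAP_PAGE) && (vmemmap_read(i)->flags & 1)) {
		unsigned long head = vmemmap_read(i + 1)->compound_head;

		if (head & 1)
			return head >> 1;
	}
	return i;
}

int main(void)
{
	unsigned long i;

	frame[0].flags = 1;				/* the real head */
	for (i = 1; i < PAGES_PER_VMEMMAP_PAGE; i++)
		frame[i].compound_head = (0 << 1) | 1;	/* tail -> head 0 */

	/* Struct page 64 reads back page 0's contents: a fake head. */
	printf("page 64 -> head %lu\n", head_if_fake(64));	/* prints 0 */
	printf("page 0  -> head %lu\n", head_if_fake(0));	/* prints 0 */
	return 0;
}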
From patchwork Thu Aug 19 06:58:29 2021
X-Patchwork-Submitter: Muchun Song <songmuchun@bytedance.com>
X-Patchwork-Id: 12446451

From: Muchun Song <songmuchun@bytedance.com>
To: mike.kravetz@oracle.com, akpm@linux-foundation.org, osalvador@suse.de,
    mhocko@suse.com, song.bao.hua@hisilicon.com, david@redhat.com,
    chenhuang5@huawei.com, bodeddub@amazon.com, corbet@lwn.net,
    willy@infradead.org
Cc: duanxiongchun@bytedance.com, fam.zheng@bytedance.com, smuchun@gmail.com,
    zhengqi.arch@bytedance.com, linux-doc@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    Muchun Song <songmuchun@bytedance.com>
Subject: [PATCH v2 2/4] mm: hugetlb: replace hugetlb_free_vmemmap_enabled with a static_key
Date: Thu, 19 Aug 2021 14:58:29 +0800
Message-Id: <20210819065831.43186-3-songmuchun@bytedance.com>
In-Reply-To: <20210819065831.43186-1-songmuchun@bytedance.com>
References: <20210819065831.43186-1-songmuchun@bytedance.com>

page_head_if_fake() is used throughout memory management, and its
conditional check requires reading a global variable. Although the
overhead of this check is small, it grows when the memory cache comes
under pressure. Since the global variable is never modified after
system boot, it is a very good fit for the static key mechanism.
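For context, a static key turns the check into a branch patched
directly in the instruction stream: the disabled path costs a
straight-line no-op instead of a load, a test, and a conditional jump.
A minimal sketch of the pattern follows; it is illustrative only
(my_feature_key and the parameter name are made up for this example),
and the patch itself uses the DECLARE/DEFINE_STATIC_KEY_MAYBE and
static_branch_maybe() variants so that the compile-time default can
follow CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON.

#include <linux/init.h>
#include <linux/jump_label.h>
#include <linux/string.h>

/* Default-off key; flipping it rewrites the branch sites at runtime. */
static DEFINE_STATIC_KEY_FALSE(my_feature_key);

static int __init my_feature_param(char *buf)
{
	if (buf && !strcmp(buf, "on"))
		static_branch_enable(&my_feature_key);
	return 0;
}
early_param("my_feature", my_feature_param);

void my_hot_path(void)
{
	/* Compiles to a no-op until the key is enabled. */
	if (static_branch_unlikely(&my_feature_key)) {
		/* feature-specific work */
	}
}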
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 include/linux/hugetlb.h    |  6 +++++-
 include/linux/page-flags.h |  6 ++++--
 mm/hugetlb_vmemmap.c       | 10 +++++-----
 3 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index f7ca1a3870ea..ee3ddf3d12cf 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -1057,7 +1057,11 @@ static inline void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr
 #endif	/* CONFIG_HUGETLB_PAGE */

 #ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
-extern bool hugetlb_free_vmemmap_enabled;
+DECLARE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON,
+			 hugetlb_free_vmemmap_enabled_key);
+#define hugetlb_free_vmemmap_enabled \
+	static_key_enabled(&hugetlb_free_vmemmap_enabled_key)
+
 #else
 #define hugetlb_free_vmemmap_enabled	false
 #endif

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 7b1a918ebd43..d68d2cf30d76 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -185,7 +185,8 @@ enum pageflags {
 #ifndef __GENERATING_BOUNDS_H

 #ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
-extern bool hugetlb_free_vmemmap_enabled;
+DECLARE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON,
+			 hugetlb_free_vmemmap_enabled_key);

 /*
  * If the feature of freeing some vmemmap pages associated with each HugeTLB
@@ -204,7 +205,8 @@ extern bool hugetlb_free_vmemmap_enabled;
  */
 static __always_inline const struct page *page_head_if_fake(const struct page *page)
 {
-	if (!hugetlb_free_vmemmap_enabled)
+	if (!static_branch_maybe(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON,
+				 &hugetlb_free_vmemmap_enabled_key))
 		return page;

 	/*

diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 527bcaa44a48..5b80129c684c 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -188,9 +188,9 @@
 #define RESERVE_VMEMMAP_NR		1U
 #define RESERVE_VMEMMAP_SIZE		(RESERVE_VMEMMAP_NR << PAGE_SHIFT)

-bool hugetlb_free_vmemmap_enabled __read_mostly =
-	IS_ENABLED(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON);
-EXPORT_SYMBOL(hugetlb_free_vmemmap_enabled);
+DEFINE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON,
+			hugetlb_free_vmemmap_enabled_key);
+EXPORT_SYMBOL(hugetlb_free_vmemmap_enabled_key);

 static int __init early_hugetlb_free_vmemmap_param(char *buf)
 {
@@ -204,9 +204,9 @@ static int __init early_hugetlb_free_vmemmap_param(char *buf)
 		return -EINVAL;

 	if (!strcmp(buf, "on"))
-		hugetlb_free_vmemmap_enabled = true;
+		static_branch_enable(&hugetlb_free_vmemmap_enabled_key);
 	else if (!strcmp(buf, "off"))
-		hugetlb_free_vmemmap_enabled = false;
+		static_branch_disable(&hugetlb_free_vmemmap_enabled_key);
 	else
 		return -EINVAL;

From patchwork Thu Aug 19 06:58:30 2021
X-Patchwork-Submitter: Muchun Song <songmuchun@bytedance.com>
X-Patchwork-Id: 12446453

From: Muchun Song <songmuchun@bytedance.com>
To: mike.kravetz@oracle.com, akpm@linux-foundation.org, osalvador@suse.de,
    mhocko@suse.com, song.bao.hua@hisilicon.com, david@redhat.com,
    chenhuang5@huawei.com, bodeddub@amazon.com, corbet@lwn.net,
    willy@infradead.org
Cc: duanxiongchun@bytedance.com, fam.zheng@bytedance.com, smuchun@gmail.com,
    zhengqi.arch@bytedance.com,
    linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, Muchun Song <songmuchun@bytedance.com>
Subject: [PATCH v2 3/4] mm: sparsemem: use page table lock to protect kernel pmd operations
Date: Thu, 19 Aug 2021 14:58:30 +0800
Message-Id: <20210819065831.43186-4-songmuchun@bytedance.com>
In-Reply-To: <20210819065831.43186-1-songmuchun@bytedance.com>
References: <20210819065831.43186-1-songmuchun@bytedance.com>

The init_mm.page_table_lock is used to protect kernel page tables, so
we can use it to serialize splitting vmemmap PMD mappings instead of
the mmap write lock. This increases the concurrency of
vmemmap_remap_free().
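The core of the change below is a cheap racy check followed by a
recheck under init_mm.page_table_lock, so that when several threads
race to split the same vmemmap PMD, exactly one installs its page table
and the losers free theirs. Here is a runnable userspace model of that
shape; a pthread mutex plays the page table lock, and all names are
invented for illustration (build with gcc ... -lpthread).

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int is_leaf = 1;		/* plays pmd_leaf(*pmd) */
static int split_count;		/* must end up exactly 1 */

static void *walker(void *arg)
{
	int leaf;

	/* Cheap check first, as split_vmemmap_huge_pmd() does. */
	pthread_mutex_lock(&lock);
	leaf = is_leaf;
	pthread_mutex_unlock(&lock);
	if (!leaf)
		return NULL;

	/* Expensive preparation (pte allocation) happens unlocked... */

	/* ...and only the recheck under the lock may commit the split. */
	pthread_mutex_lock(&lock);
	if (is_leaf) {
		is_leaf = 0;
		split_count++;
	}
	pthread_mutex_unlock(&lock);
	/* A loser would free its pre-built page table here. */
	return NULL;
}

int main(void)
{
	pthread_t t[8];
	int i;

	for (i = 0; i < 8; i++)
		pthread_create(&t[i], NULL, walker, NULL);
	for (i = 0; i < 8; i++)
		pthread_join(t[i], NULL);

	printf("splits committed: %d\n", split_count);	/* prints 1 */
	return 0;
}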
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 mm/ptdump.c         | 16 ++++++++++++----
 mm/sparse-vmemmap.c | 49 ++++++++++++++++++++++++++++++++++---------------
 2 files changed, 46 insertions(+), 19 deletions(-)

diff --git a/mm/ptdump.c b/mm/ptdump.c
index da751448d0e4..eea3d28d173c 100644
--- a/mm/ptdump.c
+++ b/mm/ptdump.c
@@ -40,8 +40,10 @@ static int ptdump_pgd_entry(pgd_t *pgd, unsigned long addr,
 	if (st->effective_prot)
 		st->effective_prot(st, 0, pgd_val(val));

-	if (pgd_leaf(val))
+	if (pgd_leaf(val)) {
 		st->note_page(st, addr, 0, pgd_val(val));
+		walk->action = ACTION_CONTINUE;
+	}

 	return 0;
 }
@@ -61,8 +63,10 @@ static int ptdump_p4d_entry(p4d_t *p4d, unsigned long addr,
 	if (st->effective_prot)
 		st->effective_prot(st, 1, p4d_val(val));

-	if (p4d_leaf(val))
+	if (p4d_leaf(val)) {
 		st->note_page(st, addr, 1, p4d_val(val));
+		walk->action = ACTION_CONTINUE;
+	}

 	return 0;
 }
@@ -82,8 +86,10 @@ static int ptdump_pud_entry(pud_t *pud, unsigned long addr,
 	if (st->effective_prot)
 		st->effective_prot(st, 2, pud_val(val));

-	if (pud_leaf(val))
+	if (pud_leaf(val)) {
 		st->note_page(st, addr, 2, pud_val(val));
+		walk->action = ACTION_CONTINUE;
+	}

 	return 0;
 }
@@ -101,8 +107,10 @@ static int ptdump_pmd_entry(pmd_t *pmd, unsigned long addr,
 	if (st->effective_prot)
 		st->effective_prot(st, 3, pmd_val(val));

-	if (pmd_leaf(val))
+	if (pmd_leaf(val)) {
 		st->note_page(st, addr, 3, pmd_val(val));
+		walk->action = ACTION_CONTINUE;
+	}

 	return 0;
 }

diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 62e3d20648ce..e636943ccfc4 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -64,8 +64,8 @@ struct vmemmap_remap_walk {
  */
 #define NR_RESET_STRUCT_PAGE		3

-static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start,
-				  struct vmemmap_remap_walk *walk)
+static int __split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start,
+				    struct vmemmap_remap_walk *walk)
 {
 	pmd_t __pmd;
 	int i;
@@ -87,15 +87,37 @@ static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start,
 		set_pte_at(&init_mm, addr, pte, entry);
 	}

-	/* Make pte visible before pmd. See comment in __pte_alloc(). */
-	smp_wmb();
-	pmd_populate_kernel(&init_mm, pmd, pgtable);
+	spin_lock(&init_mm.page_table_lock);
+	if (likely(pmd_leaf(*pmd))) {
+		/* Make pte visible before pmd. See comment in __pte_alloc(). */
+		smp_wmb();
+		pmd_populate_kernel(&init_mm, pmd, pgtable);
+		flush_tlb_kernel_range(start, start + PMD_SIZE);
+		spin_unlock(&init_mm.page_table_lock);

-	flush_tlb_kernel_range(start, start + PMD_SIZE);
+		return 0;
+	}
+	spin_unlock(&init_mm.page_table_lock);

+	pte_free_kernel(&init_mm, pgtable);
 	return 0;
 }

+static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start,
+				  struct vmemmap_remap_walk *walk)
+{
+	int ret;
+
+	spin_lock(&init_mm.page_table_lock);
+	ret = pmd_leaf(*pmd);
+	spin_unlock(&init_mm.page_table_lock);
+
+	if (ret)
+		ret = __split_vmemmap_huge_pmd(pmd, start, walk);
+
+	return ret;
+}
+
 static void vmemmap_pte_range(pmd_t *pmd, unsigned long addr,
 			      unsigned long end,
 			      struct vmemmap_remap_walk *walk)
@@ -132,13 +154,12 @@ static int vmemmap_pmd_range(pud_t *pud, unsigned long addr,

 	pmd = pmd_offset(pud, addr);
 	do {
-		if (pmd_leaf(*pmd)) {
-			int ret;
+		int ret;
+
+		ret = split_vmemmap_huge_pmd(pmd, addr & PMD_MASK, walk);
+		if (ret)
+			return ret;

-			ret = split_vmemmap_huge_pmd(pmd, addr & PMD_MASK, walk);
-			if (ret)
-				return ret;
-		}
 		next = pmd_addr_end(addr, end);
 		vmemmap_pte_range(pmd, addr, next, walk);
 	} while (pmd++, addr = next, addr != end);
@@ -321,10 +342,8 @@ int vmemmap_remap_free(unsigned long start, unsigned long end,
 	 */
 	BUG_ON(start - reuse != PAGE_SIZE);

-	mmap_write_lock(&init_mm);
+	mmap_read_lock(&init_mm);
 	ret = vmemmap_remap_range(reuse, end, &walk);
-	mmap_write_downgrade(&init_mm);
-
 	if (ret && walk.nr_walked) {
 		end = reuse + walk.nr_walked * PAGE_SIZE;
 		/*

From patchwork Thu Aug 19 06:58:31 2021
X-Patchwork-Submitter: Muchun Song <songmuchun@bytedance.com>
X-Patchwork-Id: 12446455

From: Muchun Song <songmuchun@bytedance.com>
To: mike.kravetz@oracle.com, akpm@linux-foundation.org, osalvador@suse.de,
    mhocko@suse.com, song.bao.hua@hisilicon.com, david@redhat.com,
    chenhuang5@huawei.com, bodeddub@amazon.com, corbet@lwn.net,
    willy@infradead.org
Cc: duanxiongchun@bytedance.com, fam.zheng@bytedance.com, smuchun@gmail.com,
    zhengqi.arch@bytedance.com, linux-doc@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    Muchun Song <songmuchun@bytedance.com>
Subject: [PATCH v2 4/4] selftests: vm: add a hugetlb test case
Date: Thu, 19 Aug 2021 14:58:31 +0800
Message-Id: <20210819065831.43186-5-songmuchun@bytedance.com>
In-Reply-To: <20210819065831.43186-1-songmuchun@bytedance.com>
References: <20210819065831.43186-1-songmuchun@bytedance.com>

Since the head vmemmap page frame associated with each HugeTLB page is
reused, we should hide the PG_head flag of the tail struct pages from
the user. Add a test case to check whether it works properly.
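The test below leans on two procfs interfaces described in
Documentation/admin-guide/mm/pagemap.rst: /proc/self/pagemap, whose
64-bit entries carry the PFN in bits 0-54 (hence the test's
PM_PFRAME_BITS of 55), and /proc/kpageflags, where bits 15, 16 and 17
are KPF_COMPOUND_HEAD, KPF_COMPOUND_TAIL and KPF_HUGE, exactly the
three flags the test checks. A tiny standalone sketch of the pagemap
decoding, with a fabricated sample entry:

#include <stdint.h>
#include <stdio.h>

#define PM_PFN_BITS	55
#define PM_PFN_MASK	((1ULL << PM_PFN_BITS) - 1)
#define PM_PRESENT	(1ULL << 63)	/* page present */

/* Slice the PFN out of one 64-bit /proc/self/pagemap entry. */
static uint64_t entry_to_pfn(uint64_t entry)
{
	return (entry & PM_PRESENT) ? (entry & PM_PFN_MASK) : 0;
}

int main(void)
{
	uint64_t entry = PM_PRESENT | 0x1234;	/* made-up sample value */

	printf("pfn = 0x%llx\n", (unsigned long long)entry_to_pfn(entry));
	return 0;
}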
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 tools/testing/selftests/vm/vmemmap_hugetlb.c | 139 +++++++++++++++++++++++++++
 1 file changed, 139 insertions(+)
 create mode 100644 tools/testing/selftests/vm/vmemmap_hugetlb.c

diff --git a/tools/testing/selftests/vm/vmemmap_hugetlb.c b/tools/testing/selftests/vm/vmemmap_hugetlb.c
new file mode 100644
index 000000000000..b6e945bf4053
--- /dev/null
+++ b/tools/testing/selftests/vm/vmemmap_hugetlb.c
@@ -0,0 +1,139 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * A test case of using hugepage memory in a user application using the
+ * mmap system call with MAP_HUGETLB flag. Before running this program
+ * make sure the administrator has allocated enough default sized huge
+ * pages to cover the 2 MB allocation.
+ *
+ * For ia64 architecture, Linux kernel reserves Region number 4 for hugepages.
+ * That means the addresses starting with 0x800000... will need to be
+ * specified. Specifying a fixed address is not required on ppc64, i386
+ * or x86_64.
+ */
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <sys/mman.h>
+
+#define MAP_LENGTH		(2UL * 1024 * 1024)
+
+#ifndef MAP_HUGETLB
+#define MAP_HUGETLB		0x40000	/* arch specific */
+#endif
+
+#define PAGE_SIZE		4096
+
+#define PAGE_COMPOUND_HEAD	(1UL << 15)
+#define PAGE_COMPOUND_TAIL	(1UL << 16)
+#define PAGE_HUGE		(1UL << 17)
+
+#define HEAD_PAGE_FLAGS		(PAGE_COMPOUND_HEAD | PAGE_HUGE)
+#define TAIL_PAGE_FLAGS		(PAGE_COMPOUND_TAIL | PAGE_HUGE)
+
+#define PM_PFRAME_BITS		55
+#define PM_PFRAME_MASK		~((1UL << PM_PFRAME_BITS) - 1)
+
+/* Only ia64 requires this */
+#ifdef __ia64__
+#define MAP_ADDR		(void *)(0x8000000000000000UL)
+#define MAP_FLAGS		(MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_FIXED)
+#else
+#define MAP_ADDR		NULL
+#define MAP_FLAGS		(MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB)
+#endif
+
+static void write_bytes(char *addr, size_t length)
+{
+	unsigned long i;
+
+	for (i = 0; i < length; i++)
+		*(addr + i) = (char)i;
+}
+
+static unsigned long virt_to_pfn(void *addr)
+{
+	int fd;
+	unsigned long pagemap;
+
+	fd = open("/proc/self/pagemap", O_RDONLY);
+	if (fd < 0)
+		return -1UL;
+
+	lseek(fd, (unsigned long)addr / PAGE_SIZE * sizeof(pagemap), SEEK_SET);
+	read(fd, &pagemap, sizeof(pagemap));
+	close(fd);
+
+	return pagemap & ~PM_PFRAME_MASK;
+}
+
+static int check_page_flags(unsigned long pfn)
+{
+	int fd, i;
+	unsigned long pageflags;
+
+	fd = open("/proc/kpageflags", O_RDONLY);
+	if (fd < 0)
+		return -1;
+
+	lseek(fd, pfn * sizeof(pageflags), SEEK_SET);
+
+	read(fd, &pageflags, sizeof(pageflags));
+	if ((pageflags & HEAD_PAGE_FLAGS) != HEAD_PAGE_FLAGS) {
+		close(fd);
+		printf("Head page flags (%lx) is invalid\n", pageflags);
+		return -1;
+	}
+
+	for (i = 1; i < MAP_LENGTH / PAGE_SIZE; i++) {
+		read(fd, &pageflags, sizeof(pageflags));
+		if ((pageflags & TAIL_PAGE_FLAGS) != TAIL_PAGE_FLAGS ||
+		    (pageflags & HEAD_PAGE_FLAGS) == HEAD_PAGE_FLAGS) {
+			close(fd);
+			printf("Tail page flags (%lx) is invalid\n", pageflags);
+			return -1;
+		}
+	}
+
+	close(fd);
+
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	void *addr;
+	unsigned long pfn;
+
+	addr = mmap(MAP_ADDR, MAP_LENGTH, PROT_READ | PROT_WRITE, MAP_FLAGS, -1, 0);
+	if (addr == MAP_FAILED) {
+		perror("mmap");
+		exit(1);
+	}
+
+	/* Trigger allocation of HugeTLB page. */
+	write_bytes(addr, MAP_LENGTH);
+
+	pfn = virt_to_pfn(addr);
+	if (pfn == -1UL) {
+		munmap(addr, MAP_LENGTH);
+		perror("virt_to_pfn");
+		exit(1);
+	}
+
+	printf("Returned address is %p whose pfn is %lx\n", addr, pfn);
+
+	if (check_page_flags(pfn) < 0) {
+		munmap(addr, MAP_LENGTH);
+		perror("check_page_flags");
+		exit(1);
+	}
+
+	/* munmap() length of MAP_HUGETLB memory must be hugepage aligned */
+	if (munmap(addr, MAP_LENGTH)) {
+		perror("munmap");
+		exit(1);
+	}
+
+	return 0;
+}
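A usage note for trying the test, under the usual vm-selftests
assumptions: reserve default-sized hugepages first (for example,
echo 2 > /proc/sys/vm/nr_hugepages), and boot with
hugetlb_free_vmemmap=on for the interesting case, since with the
feature off there are no fake heads to hide. Building it alongside the
other vm selftests (make -C tools/testing/selftests/vm) should work
once the file is listed in that directory's Makefile, which this patch
by itself does not do.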