From patchwork Sat Jun 12 09:45:53 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Muchun Song
X-Patchwork-Id: 12316945
From: Muchun Song
To: mike.kravetz@oracle.com, akpm@linux-foundation.org, osalvador@suse.de,
 mhocko@suse.com, song.bao.hua@hisilicon.com, david@redhat.com,
 chenhuang5@huawei.com, bodeddub@amazon.com, corbet@lwn.net
Cc: duanxiongchun@bytedance.com, fam.zheng@bytedance.com,
 linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 Muchun Song
Subject: [PATCH v2 1/3] mm: sparsemem: split the huge PMD mapping of vmemmap pages
Date: Sat, 12 Jun 2021 17:45:53 +0800
Message-Id: <20210612094555.71344-2-songmuchun@bytedance.com>
X-Mailer: git-send-email 2.21.0 (Apple Git-122)
In-Reply-To: <20210612094555.71344-1-songmuchun@bytedance.com>
References: <20210612094555.71344-1-songmuchun@bytedance.com>

Currently, we disable huge PMD mapping of vmemmap pages when the
"Free some vmemmap pages of HugeTLB pages" feature is enabled. If the
vmemmap is huge PMD mapped when we walk the vmemmap page tables, we
first split the huge PMD and then move on to PTE mappings. When HugeTLB
pages are freed from the pool, we do not attempt to coalesce and move
back to a PMD mapping because that is much more complex.
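
For readers who want to see the split step in isolation, here is a
minimal userspace sketch of the idea. It is not the kernel code: the
model_* names, the struct layout and the malloc()-backed table are all
hypothetical, and the sizes assume x86_64 (4 KiB base pages, 2 MiB PMD,
so PMD_SIZE / PAGE_SIZE is 512). It only shows that one huge leaf is
replaced by a fully populated table of base-page entries that map the
same frames, and that the table is filled in before it is published.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed x86_64 geometry; these values are not stated in the patch. */
#define MODEL_PAGE_SIZE		4096UL
#define MODEL_PMD_SIZE		(2UL * 1024 * 1024)
#define MODEL_PTRS_PER_PMD	(MODEL_PMD_SIZE / MODEL_PAGE_SIZE)

/* Hypothetical stand-in for a PMD entry: a huge leaf or a PTE table. */
struct model_pmd {
	int is_leaf;		/* 1: maps one contiguous 2 MiB block */
	uint64_t base_pfn;	/* first page frame number of the block */
	uint64_t *pte_table;	/* per-page frame numbers once split */
};

/*
 * Split a huge leaf into base-page entries that map the same frames.
 * Returns 0 on success and -1 if the table cannot be allocated (the
 * kernel patch returns -ENOMEM in the corresponding case).
 */
static int model_split_huge_pmd(struct model_pmd *pmd)
{
	uint64_t *table;
	size_t i;

	if (!pmd->is_leaf)
		return 0;	/* already PTE mapped, nothing to do */

	table = malloc(MODEL_PTRS_PER_PMD * sizeof(*table));
	if (!table)
		return -1;

	/* Fill the new table completely before publishing it. */
	for (i = 0; i < MODEL_PTRS_PER_PMD; i++)
		table[i] = pmd->base_pfn + i;

	pmd->pte_table = table;
	pmd->is_leaf = 0;
	return 0;
}

/* Translate a byte offset inside the 2 MiB range to a frame number. */
static uint64_t model_lookup_pfn(const struct model_pmd *pmd,
				 unsigned long offset)
{
	size_t idx = offset / MODEL_PAGE_SIZE;

	assert(offset < MODEL_PMD_SIZE);
	return pmd->is_leaf ? pmd->base_pfn + idx : pmd->pte_table[idx];
}
```

A lookup at any offset should give the same frame number before and
after model_split_huge_pmd(); that invariant is what lets the walker
split lazily and then remap individual PTEs.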

Signed-off-by: Muchun Song
Reviewed-by: Mike Kravetz
---
 include/linux/mm.h   |   4 +-
 mm/hugetlb_vmemmap.c |   5 +-
 mm/sparse-vmemmap.c  | 157 ++++++++++++++++++++++++++++++++++++++-------------
 3 files changed, 123 insertions(+), 43 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index cadc8cc2c715..8284e8ed30c9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3055,8 +3055,8 @@ static inline void print_vma_addr(char *prefix, unsigned long rip)
 }
 #endif

-void vmemmap_remap_free(unsigned long start, unsigned long end,
-			unsigned long reuse);
+int vmemmap_remap_free(unsigned long start, unsigned long end,
+		       unsigned long reuse);
 int vmemmap_remap_alloc(unsigned long start, unsigned long end,
 			unsigned long reuse, gfp_t gfp_mask);

diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index f9f9bb212319..06802056f296 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -258,9 +258,8 @@ void free_huge_page_vmemmap(struct hstate *h, struct page *head)
 	 * to the page which @vmemmap_reuse is mapped to, then free the pages
 	 * which the range [@vmemmap_addr, @vmemmap_end] is mapped to.
 	 */
-	vmemmap_remap_free(vmemmap_addr, vmemmap_end, vmemmap_reuse);
-
-	SetHPageVmemmapOptimized(head);
+	if (!vmemmap_remap_free(vmemmap_addr, vmemmap_end, vmemmap_reuse))
+		SetHPageVmemmapOptimized(head);
 }

 void __init hugetlb_vmemmap_init(struct hstate *h)
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 693de0aec7a8..7f73c37f742d 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -38,6 +38,7 @@
  * vmemmap_remap_walk - walk vmemmap page table
  *
  * @remap_pte:		called for each lowest-level entry (PTE).
+ * @walked_pte:		the number of walked pte.
  * @reuse_page:		the page which is reused for the tail vmemmap pages.
  * @reuse_addr:		the virtual address of the @reuse_page page.
  * @vmemmap_pages:	the list head of the vmemmap pages that can be freed
@@ -46,11 +47,44 @@
 struct vmemmap_remap_walk {
 	void (*remap_pte)(pte_t *pte, unsigned long addr,
 			  struct vmemmap_remap_walk *walk);
+	unsigned long walked_pte;
 	struct page *reuse_page;
 	unsigned long reuse_addr;
 	struct list_head *vmemmap_pages;
 };

+static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start,
+				  struct vmemmap_remap_walk *walk)
+{
+	pmd_t __pmd;
+	int i;
+	unsigned long addr = start;
+	struct page *page = pmd_page(*pmd);
+	pte_t *pgtable = pte_alloc_one_kernel(&init_mm);
+
+	if (!pgtable)
+		return -ENOMEM;
+
+	pmd_populate_kernel(&init_mm, &__pmd, pgtable);
+
+	for (i = 0; i < PMD_SIZE / PAGE_SIZE; i++, addr += PAGE_SIZE) {
+		pte_t entry, *pte;
+		pgprot_t pgprot = PAGE_KERNEL;
+
+		entry = mk_pte(page + i, pgprot);
+		pte = pte_offset_kernel(&__pmd, addr);
+		set_pte_at(&init_mm, addr, pte, entry);
+	}
+
+	/* Make pte visible before pmd. See comment in __pte_alloc(). */
+	smp_wmb();
+	pmd_populate_kernel(&init_mm, pmd, pgtable);
+
+	flush_tlb_kernel_range(start, start + PMD_SIZE);
+
+	return 0;
+}
+
 static void vmemmap_pte_range(pmd_t *pmd, unsigned long addr,
 			      unsigned long end,
 			      struct vmemmap_remap_walk *walk)
@@ -68,59 +102,81 @@ static void vmemmap_pte_range(pmd_t *pmd, unsigned long addr,
 		 * walking, skip the reuse address range.
 		 */
 		addr += PAGE_SIZE;
+		walk->walked_pte++;
 		pte++;
 	}

-	for (; addr != end; addr += PAGE_SIZE, pte++)
+	for (; addr != end; addr += PAGE_SIZE, pte++) {
 		walk->remap_pte(pte, addr, walk);
+		walk->walked_pte++;
+	}
 }

-static void vmemmap_pmd_range(pud_t *pud, unsigned long addr,
-			      unsigned long end,
-			      struct vmemmap_remap_walk *walk)
+static int vmemmap_pmd_range(pud_t *pud, unsigned long addr,
+			     unsigned long end,
+			     struct vmemmap_remap_walk *walk)
 {
 	pmd_t *pmd;
 	unsigned long next;

 	pmd = pmd_offset(pud, addr);
 	do {
-		BUG_ON(pmd_leaf(*pmd));
+		if (pmd_leaf(*pmd)) {
+			int ret;

+			ret = split_vmemmap_huge_pmd(pmd, addr & PMD_MASK, walk);
+			if (ret)
+				return ret;
+		}
 		next = pmd_addr_end(addr, end);
 		vmemmap_pte_range(pmd, addr, next, walk);
 	} while (pmd++, addr = next, addr != end);
+
+	return 0;
 }

-static void vmemmap_pud_range(p4d_t *p4d, unsigned long addr,
-			      unsigned long end,
-			      struct vmemmap_remap_walk *walk)
+static int vmemmap_pud_range(p4d_t *p4d, unsigned long addr,
+			     unsigned long end,
+			     struct vmemmap_remap_walk *walk)
 {
 	pud_t *pud;
 	unsigned long next;

 	pud = pud_offset(p4d, addr);
 	do {
+		int ret;
+
 		next = pud_addr_end(addr, end);
-		vmemmap_pmd_range(pud, addr, next, walk);
+		ret = vmemmap_pmd_range(pud, addr, next, walk);
+		if (ret)
+			return ret;
 	} while (pud++, addr = next, addr != end);
+
+	return 0;
 }

-static void vmemmap_p4d_range(pgd_t *pgd, unsigned long addr,
-			      unsigned long end,
-			      struct vmemmap_remap_walk *walk)
+static int vmemmap_p4d_range(pgd_t *pgd, unsigned long addr,
+			     unsigned long end,
+			     struct vmemmap_remap_walk *walk)
 {
 	p4d_t *p4d;
 	unsigned long next;

 	p4d = p4d_offset(pgd, addr);
 	do {
+		int ret;
+
 		next = p4d_addr_end(addr, end);
-		vmemmap_pud_range(p4d, addr, next, walk);
+		ret = vmemmap_pud_range(p4d, addr, next, walk);
+		if (ret)
+			return ret;
 	} while (p4d++, addr = next, addr != end);
+
+	return 0;
 }

-static void vmemmap_remap_range(unsigned long start, unsigned long end,
-				struct vmemmap_remap_walk *walk)
+static int vmemmap_remap_range(unsigned long start, unsigned long end,
+			       struct vmemmap_remap_walk *walk)
 {
 	unsigned long addr = start;
 	unsigned long next;
@@ -131,8 +187,12 @@ static void vmemmap_remap_range(unsigned long start, unsigned long end,

 	pgd = pgd_offset_k(addr);
 	do {
+		int ret;
+
 		next = pgd_addr_end(addr, end);
-		vmemmap_p4d_range(pgd, addr, next, walk);
+		ret = vmemmap_p4d_range(pgd, addr, next, walk);
+		if (ret)
+			return ret;
 	} while (pgd++, addr = next, addr != end);

 	/*
@@ -141,6 +201,8 @@ static void vmemmap_remap_range(unsigned long start, unsigned long end,
 	 * belongs to the range.
 	 */
 	flush_tlb_kernel_range(start + PAGE_SIZE, end);
+
+	return 0;
 }

 /*
@@ -179,10 +241,27 @@ static void vmemmap_remap_pte(pte_t *pte, unsigned long addr,
 	pte_t entry = mk_pte(walk->reuse_page, pgprot);
 	struct page *page = pte_page(*pte);

-	list_add(&page->lru, walk->vmemmap_pages);
+	list_add_tail(&page->lru, walk->vmemmap_pages);
 	set_pte_at(&init_mm, addr, pte, entry);
 }

+static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
+				struct vmemmap_remap_walk *walk)
+{
+	pgprot_t pgprot = PAGE_KERNEL;
+	struct page *page;
+	void *to;
+
+	BUG_ON(pte_page(*pte) != walk->reuse_page);
+
+	page = list_first_entry(walk->vmemmap_pages, struct page, lru);
+	list_del(&page->lru);
+	to = page_to_virt(page);
+	copy_page(to, (void *)walk->reuse_addr);
+
+	set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot));
+}
+
 /**
  * vmemmap_remap_free - remap the vmemmap virtual address range [@start, @end)
  *			to the page which @reuse is mapped to, then free vmemmap
@@ -193,12 +272,12 @@ static void vmemmap_remap_pte(pte_t *pte, unsigned long addr,
  *		remap.
  * @reuse:	reuse address.
  *
- * Note: This function depends on vmemmap being base page mapped. Please make
- * sure that we disable PMD mapping of vmemmap pages when calling this function.
+ * Return: %0 on success, negative error code otherwise.
  */
-void vmemmap_remap_free(unsigned long start, unsigned long end,
-			unsigned long reuse)
+int vmemmap_remap_free(unsigned long start, unsigned long end,
+		       unsigned long reuse)
 {
+	int ret;
 	LIST_HEAD(vmemmap_pages);
 	struct vmemmap_remap_walk walk = {
 		.remap_pte	= vmemmap_remap_pte,
@@ -221,25 +300,25 @@ void vmemmap_remap_free(unsigned long start, unsigned long end,
 	 */
 	BUG_ON(start - reuse != PAGE_SIZE);

-	vmemmap_remap_range(reuse, end, &walk);
-	free_vmemmap_page_list(&vmemmap_pages);
-}
+	mmap_write_lock(&init_mm);
+	ret = vmemmap_remap_range(reuse, end, &walk);
+	mmap_write_downgrade(&init_mm);

-static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
-				struct vmemmap_remap_walk *walk)
-{
-	pgprot_t pgprot = PAGE_KERNEL;
-	struct page *page;
-	void *to;
+	if (ret && walk.walked_pte) {
+		end = reuse + walk.walked_pte * PAGE_SIZE;
+		walk = (struct vmemmap_remap_walk) {
+			.remap_pte	= vmemmap_restore_pte,
+			.reuse_addr	= reuse,
+			.vmemmap_pages	= &vmemmap_pages,
+		};

-	BUG_ON(pte_page(*pte) != walk->reuse_page);
+		vmemmap_remap_range(reuse, end, &walk);
+	}
+	mmap_read_unlock(&init_mm);

-	page = list_first_entry(walk->vmemmap_pages, struct page, lru);
-	list_del(&page->lru);
-	to = page_to_virt(page);
-	copy_page(to, (void *)walk->reuse_addr);
+	free_vmemmap_page_list(&vmemmap_pages);

-	set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot));
+	return ret;
 }

 static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
@@ -273,6 +352,8 @@ static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
  *		remap.
  * @reuse:	reuse address.
  * @gpf_mask:	GFP flag for allocating vmemmap pages.
+ *
+ * Return: %0 on success, negative error code otherwise.
  */
 int vmemmap_remap_alloc(unsigned long start, unsigned long end,
 			unsigned long reuse, gfp_t gfp_mask)
@@ -287,12 +368,12 @@ int vmemmap_remap_alloc(unsigned long start, unsigned long end,
 	/* See the comment in the vmemmap_remap_free(). */
 	BUG_ON(start - reuse != PAGE_SIZE);

-	might_sleep_if(gfpflags_allow_blocking(gfp_mask));
-
 	if (alloc_vmemmap_page_list(start, end, gfp_mask, &vmemmap_pages))
 		return -ENOMEM;

+	mmap_read_lock(&init_mm);
 	vmemmap_remap_range(reuse, end, &walk);
+	mmap_read_unlock(&init_mm);

 	return 0;
 }

From patchwork Sat Jun 12 09:45:54 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Muchun Song
X-Patchwork-Id: 12316947
From: Muchun Song
To: mike.kravetz@oracle.com, akpm@linux-foundation.org, osalvador@suse.de,
 mhocko@suse.com, song.bao.hua@hisilicon.com, david@redhat.com,
 chenhuang5@huawei.com, bodeddub@amazon.com, corbet@lwn.net
Cc: duanxiongchun@bytedance.com, fam.zheng@bytedance.com,
 linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 Muchun Song
Subject: [PATCH v2 2/3] mm: sparsemem: use huge PMD mapping for vmemmap pages
Date: Sat, 12 Jun 2021 17:45:54 +0800
Message-Id: <20210612094555.71344-3-songmuchun@bytedance.com>
X-Mailer: git-send-email 2.21.0 (Apple Git-122)
In-Reply-To: <20210612094555.71344-1-songmuchun@bytedance.com>
References: <20210612094555.71344-1-songmuchun@bytedance.com>

The preparation for splitting the huge PMD mapping of vmemmap pages is
in place, so switch the mapping from PTE to PMD.
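
As a rough illustration of what the switch means on x86_64, the sketch
below models the post-patch condition in vmemmap_populate() (see the
arch/x86/mm/init_64.c hunk in this patch). It is a userspace model, not
kernel code: the model_* names are hypothetical, the altmap handling is
left out, and the two constants are assumptions (128 MiB sections with
4 KiB pages, and a 64-byte struct page), under which one section's
memmap is exactly 2 MiB, i.e. one PMD.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed x86_64 values; they are not stated in the patch itself. */
#define MODEL_PAGES_PER_SECTION	32768UL	/* 128 MiB section / 4 KiB pages */
#define MODEL_STRUCT_PAGE_SIZE	64UL	/* typical sizeof(struct page) */

enum model_mapping { MODEL_BASE_PAGES, MODEL_HUGE_PAGES };

/*
 * Decide how a vmemmap range [start, end) would be populated after this
 * patch: ranges smaller than one section's memmap use base pages, and
 * everything else uses huge pages when the CPU supports them. Whether
 * hugetlb_free_vmemmap is enabled no longer enters the decision.
 */
static enum model_mapping model_vmemmap_mapping(unsigned long start,
						unsigned long end,
						bool cpu_has_pse)
{
	unsigned long section_memmap =
		MODEL_PAGES_PER_SECTION * MODEL_STRUCT_PAGE_SIZE;

	if (end - start < section_memmap)
		return MODEL_BASE_PAGES;
	if (cpu_has_pse)
		return MODEL_HUGE_PAGES;
	return MODEL_BASE_PAGES;
}
```

With these assumed constants a full section's memmap (2 MiB) is PMD
mapped on a PSE-capable CPU, and it is only split later, on demand, by
the page table walker introduced in patch 1.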

Signed-off-by: Muchun Song
Reviewed-by: Mike Kravetz
---
 Documentation/admin-guide/kernel-parameters.txt |  7 -------
 arch/x86/mm/init_64.c                           |  8 ++------
 include/linux/hugetlb.h                         | 25 ++++++-------------------
 mm/memory_hotplug.c                             |  2 +-
 4 files changed, 9 insertions(+), 33 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index db1ef6739613..a01aadafee38 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1599,13 +1599,6 @@
 			enabled.
 			Allows heavy hugetlb users to free up some more
 			memory (6 * PAGE_SIZE for each 2MB hugetlb page).
-			This feauture is not free though. Large page
-			tables are not used to back vmemmap pages which
-			can lead to a performance degradation for some
-			workloads. Also there will be memory allocation
-			required when hugetlb pages are freed from the
-			pool which can lead to corner cases under heavy
-			memory pressure.
 			Format: { on | off (default) }

 			on:  enable the feature
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 9d9d18d0c2a1..65ea58527176 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -34,7 +34,6 @@
 #include
 #include
 #include
-#include

 #include
 #include
@@ -1610,8 +1609,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 	VM_BUG_ON(!IS_ALIGNED(start, PAGE_SIZE));
 	VM_BUG_ON(!IS_ALIGNED(end, PAGE_SIZE));

-	if ((is_hugetlb_free_vmemmap_enabled() && !altmap) ||
-	    end - start < PAGES_PER_SECTION * sizeof(struct page))
+	if (end - start < PAGES_PER_SECTION * sizeof(struct page))
 		err = vmemmap_populate_basepages(start, end, node, NULL);
 	else if (boot_cpu_has(X86_FEATURE_PSE))
 		err = vmemmap_populate_hugepages(start, end, node, altmap);
@@ -1639,8 +1637,6 @@ void register_page_bootmem_memmap(unsigned long section_nr,
 	pmd_t *pmd;
 	unsigned int nr_pmd_pages;
 	struct page *page;
-	bool base_mapping = !boot_cpu_has(X86_FEATURE_PSE) ||
-			    is_hugetlb_free_vmemmap_enabled();

 	for (; addr < end; addr = next) {
 		pte_t *pte = NULL;
@@ -1666,7 +1662,7 @@ void register_page_bootmem_memmap(unsigned long section_nr,
 		}
 		get_page_bootmem(section_nr, pud_page(*pud), MIX_SECTION_INFO);

-		if (base_mapping) {
+		if (!boot_cpu_has(X86_FEATURE_PSE)) {
 			next = (addr + PAGE_SIZE) & PAGE_MASK;
 			pmd = pmd_offset(pud, addr);
 			if (pmd_none(*pmd))
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 03ca83db0a3e..d43565dd5fb9 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -904,20 +904,6 @@ static inline void huge_ptep_modify_prot_commit(struct vm_area_struct *vma,
 }
 #endif

-#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
-extern bool hugetlb_free_vmemmap_enabled;
-
-static inline bool is_hugetlb_free_vmemmap_enabled(void)
-{
-	return hugetlb_free_vmemmap_enabled;
-}
-#else
-static inline bool is_hugetlb_free_vmemmap_enabled(void)
-{
-	return false;
-}
-#endif
-
 #else	/* CONFIG_HUGETLB_PAGE */
 struct hstate {};

@@ -1077,13 +1063,14 @@ static inline void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr
 					pte_t *ptep, pte_t pte, unsigned long sz)
 {
 }
-
-static inline bool is_hugetlb_free_vmemmap_enabled(void)
-{
-	return false;
-}
 #endif	/* CONFIG_HUGETLB_PAGE */

+#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
+extern bool hugetlb_free_vmemmap_enabled;
+#else
+#define hugetlb_free_vmemmap_enabled	false
+#endif
+
 static inline spinlock_t *huge_pte_lock(struct hstate *h,
 					struct mm_struct *mm, pte_t *pte)
 {
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index d96a3c7551c8..9d8a551c08d5 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1056,7 +1056,7 @@ bool mhp_supports_memmap_on_memory(unsigned long size)
 	 * populate a single PMD.
 	 */
 	return memmap_on_memory &&
-	       !is_hugetlb_free_vmemmap_enabled() &&
+	       !hugetlb_free_vmemmap_enabled &&
 	       IS_ENABLED(CONFIG_MHP_MEMMAP_ON_MEMORY) &&
 	       size == memory_block_size_bytes() &&
 	       IS_ALIGNED(vmemmap_size, PMD_SIZE) &&

From patchwork Sat Jun 12 09:45:55 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Muchun Song
X-Patchwork-Id: 12316949
From: Muchun Song
To: mike.kravetz@oracle.com, akpm@linux-foundation.org, osalvador@suse.de,
 mhocko@suse.com, song.bao.hua@hisilicon.com, david@redhat.com,
 chenhuang5@huawei.com, bodeddub@amazon.com, corbet@lwn.net
Cc: duanxiongchun@bytedance.com, fam.zheng@bytedance.com,
 linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 Muchun Song
Subject: [PATCH v2 3/3] mm: hugetlb: introduce CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON
Date: Sat, 12 Jun 2021 17:45:55 +0800
Message-Id: <20210612094555.71344-4-songmuchun@bytedance.com>
X-Mailer: git-send-email 2.21.0 (Apple Git-122)
In-Reply-To: <20210612094555.71344-1-songmuchun@bytedance.com>
References: <20210612094555.71344-1-songmuchun@bytedance.com>

When using HUGETLB_PAGE_FREE_VMEMMAP, freeing the unused vmemmap pages
associated with each HugeTLB page is off by default. Now that the
vmemmap is PMD mapped, there is no side effect when this feature is
enabled with no HugeTLB pages in the system. Some users may want to
enable this feature at compile time instead of on the boot command
line, so add a config option that turns it on by default for those who
do not want to enable it via the command line.

Signed-off-by: Muchun Song
---
 Documentation/admin-guide/kernel-parameters.txt |  3 +++
 fs/Kconfig                                      | 10 ++++++++++
 mm/hugetlb_vmemmap.c                            |  6 ++++--
 3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a01aadafee38..8eee439d943c 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1604,6 +1604,9 @@
 			on:  enable the feature
 			off: disable the feature

+			Built with CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON=y,
+			the default is on.
+
 			This is not compatible with memory_hotplug.memmap_on_memory.
 			If both parameters are enabled, hugetlb_free_vmemmap takes
 			precedence over memory_hotplug.memmap_on_memory.
diff --git a/fs/Kconfig b/fs/Kconfig
index f40b5b98f7ba..e78bc5daf7b0 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -245,6 +245,16 @@ config HUGETLB_PAGE_FREE_VMEMMAP
 	depends on X86_64
 	depends on SPARSEMEM_VMEMMAP

+config HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON
+	bool "Default freeing vmemmap pages of HugeTLB to on"
+	default n
+	depends on HUGETLB_PAGE_FREE_VMEMMAP
+	help
+	  When using HUGETLB_PAGE_FREE_VMEMMAP, the freeing unused vmemmap
+	  pages associated with each HugeTLB page is default off. Say Y here
+	  to enable freeing vmemmap pages of HugeTLB by default. It can then
+	  be disabled on the command line via hugetlb_free_vmemmap=off.
+
 config MEMFD_CREATE
 	def_bool TMPFS || HUGETLBFS

diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 06802056f296..c540c21e26f5 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -182,7 +182,7 @@
 #define RESERVE_VMEMMAP_NR		2U
 #define RESERVE_VMEMMAP_SIZE		(RESERVE_VMEMMAP_NR << PAGE_SHIFT)

-bool hugetlb_free_vmemmap_enabled;
+bool hugetlb_free_vmemmap_enabled = IS_ENABLED(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON);

 static int __init early_hugetlb_free_vmemmap_param(char *buf)
 {
@@ -197,7 +197,9 @@ static int __init early_hugetlb_free_vmemmap_param(char *buf)

 	if (!strcmp(buf, "on"))
 		hugetlb_free_vmemmap_enabled = true;
-	else if (strcmp(buf, "off"))
+	else if (!strcmp(buf, "off"))
+		hugetlb_free_vmemmap_enabled = false;
+	else
 		return -EINVAL;

 	return 0;
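
The parsing change in early_hugetlb_free_vmemmap_param() above can be
modeled in userspace as follows. This is only a sketch: the
model_parse_free_vmemmap() name and the caller-supplied flag are
hypothetical, and the NULL check is an assumption about code that is
not visible in the hunk. The point it illustrates is that once the
default can be true (CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON=y),
"off" has to clear the flag explicitly instead of merely being
accepted as valid input.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace model of the patched parameter parser.
 * *enabled carries the build-time default on entry and the final
 * setting on return. Returns 0 on success, -EINVAL on bad input.
 */
static int model_parse_free_vmemmap(const char *buf, bool *enabled)
{
	/* Assumption: a missing value is rejected before the comparisons. */
	if (!buf)
		return -EINVAL;

	if (!strcmp(buf, "on"))
		*enabled = true;
	else if (!strcmp(buf, "off"))
		*enabled = false;	/* needed once the default may be true */
	else
		return -EINVAL;

	return 0;
}
```

Starting with the flag set to true, which stands in for a kernel built
with DEFAULT_ON=y, parsing "off" should clear the flag, and an
unrecognized string should return -EINVAL while leaving the flag
unchanged.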