From patchwork Wed Jul 14 09:17:55 2021
X-Patchwork-Submitter: Muchun Song
X-Patchwork-Id: 12376351
From: Muchun Song
To: mike.kravetz@oracle.com, akpm@linux-foundation.org, osalvador@suse.de,
    mhocko@suse.com, song.bao.hua@hisilicon.com, david@redhat.com,
    chenhuang5@huawei.com, bodeddub@amazon.com, corbet@lwn.net
Cc: duanxiongchun@bytedance.com, fam.zheng@bytedance.com,
    linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, zhengqi.arch@bytedance.com, Muchun Song
Subject: [PATCH 0/5] Free the 2nd vmemmap page associated with each HugeTLB page
Date: Wed, 14 Jul 2021 17:17:55 +0800
Message-Id: <20210714091800.42645-1-songmuchun@bytedance.com>

After the "Free some vmemmap pages of HugeTLB page" feature is enabled,
the vmemmap addresses associated with a 2MB HugeTLB page are mapped as
shown in the figure below.

   HugeTLB                    struct pages(8 pages)         page frame(8 pages)
+-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+---> PG_head
|           |                     |     0     | -------------> |     0     |
|           |                     +-----------+                +-----------+
|           |                     |     1     | -------------> |     1     |
|           |                     +-----------+                +-----------+
|           |                     |     2     | ----------------^ ^ ^ ^ ^ ^
|           |                     +-----------+                   | | | | |
|           |                     |     3     | ------------------+ | | | |
|           |                     +-----------+                     | | | |
|           |                     |     4     | --------------------+ | | |
|    2MB    |                     +-----------+                       | | |
|           |                     |     5     | ----------------------+ | |
|           |                     +-----------+                         | |
|           |                     |     6     | ------------------------+ |
|           |                     +-----------+                           |
|           |                     |     7     | --------------------------+
|           |                     +-----------+
|           |
|           |
|           |
+-----------+

As we can see, the 2nd vmemmap page frame (indexed by 1) is reused and
remapped. However, the 2nd vmemmap page frame can also be freed to the
buddy allocator, which changes the mapping from the figure above to the
figure below.

   HugeTLB                    struct pages(8 pages)         page frame(8 pages)
+-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+---> PG_head
|           |                     |     0     | -------------> |     0     |
|           |                     +-----------+                +-----------+
|           |                     |     1     | ---------------^ ^ ^ ^ ^ ^ ^
|           |                     +-----------+                  | | | | | |
|           |                     |     2     | -----------------+ | | | | |
|           |                     +-----------+                    | | | | |
|           |                     |     3     | -------------------+ | | | |
|           |                     +-----------+                      | | | |
|           |                     |     4     | ---------------------+ | | |
|    2MB    |                     +-----------+                        | | |
|           |                     |     5     | -----------------------+ | |
|           |                     +-----------+                          | |
|           |                     |     6     | -------------------------+ |
|           |                     +-----------+                            |
|           |                     |     7     | ---------------------------+
|           |                     +-----------+
|           |
|           |
|           |
+-----------+

After we do this, all tail vmemmap pages (1-7) are mapped to the head
vmemmap page frame (0). In other words, more than one page struct with
PG_head is associated with each HugeTLB page. We __know__ that there is
only one real head page struct; the tail page structs with PG_head are
fake head page structs.

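The aliasing shown in the second figure can be modelled in userspace. The
sketch below is a toy model, not kernel code. It assumes 4 KB base pages
and a 64-byte struct page (so 64 page structs per vmemmap page), and every
name in it (toy_page, toy_vmemmap, toy_remap_tails, toy_count_pg_head) is
hypothetical. It only illustrates why remapping vmemmap pages 1-7 onto
frame 0 makes several page structs appear to carry PG_head.

```c
#include <string.h>

/* Toy model only. Assumed sizes: 4 KB base page, 64-byte struct page. */
#define STRUCTS_PER_VMEMMAP_PAGE 64
#define VMEMMAP_PAGES            8   /* per 2MB HugeTLB page, as in the figures */
#define NR_STRUCT_PAGES          (STRUCTS_PER_VMEMMAP_PAGE * VMEMMAP_PAGES)
#define TOY_PG_HEAD              1UL

struct toy_page {
	unsigned long flags;
};

struct toy_vmemmap {
	/* Physical page frames that can back the vmemmap. */
	struct toy_page frames[VMEMMAP_PAGES][STRUCTS_PER_VMEMMAP_PAGE];
	/* frame_of[v]: which frame virtual vmemmap page v is mapped to. */
	int frame_of[VMEMMAP_PAGES];
};

/* Identity mapping; only the first page struct in frame 0 is a head. */
void toy_init(struct toy_vmemmap *v)
{
	memset(v, 0, sizeof(*v));
	for (int i = 0; i < VMEMMAP_PAGES; i++)
		v->frame_of[i] = i;
	v->frames[0][0].flags = TOY_PG_HEAD;
}

/* Remap virtual vmemmap pages [first, 7] onto a single frame. */
void toy_remap_tails(struct toy_vmemmap *v, int first, int frame)
{
	for (int i = first; i < VMEMMAP_PAGES; i++)
		v->frame_of[i] = frame;
}

/* Look a page struct up through the current mapping. */
const struct toy_page *toy_page_at(const struct toy_vmemmap *v, int idx)
{
	int vpage = idx / STRUCTS_PER_VMEMMAP_PAGE;
	int slot = idx % STRUCTS_PER_VMEMMAP_PAGE;

	return &v->frames[v->frame_of[vpage]][slot];
}

/* Count how many of the 512 page structs read back with PG_head set. */
int toy_count_pg_head(const struct toy_vmemmap *v)
{
	int n = 0;

	for (int i = 0; i < NR_STRUCT_PAGES; i++)
		if (toy_page_at(v, i)->flags & TOY_PG_HEAD)
			n++;
	return n;
}
```

Calling toy_remap_tails(&v, 2, 1) models the first figure and leaves one
page struct with PG_head; toy_remap_tails(&v, 1, 0) models the second
figure, where the first page struct of every vmemmap page reads the same
memory as the real head.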
We need a way to distinguish between these two types of page structs so
that compound_head(), PageHead() and PageTail() work properly when the
parameter is a tail page struct that has PG_head set.

The following code snippet describes how to distinguish between a real
and a fake head page struct.

    if (test_bit(PG_head, &page->flags)) {
        unsigned long head = READ_ONCE(page[1].compound_head);

        if (head & 1) {
            if (head == (unsigned long)page + 1)
                ==> head page struct
            else
                ==> tail page struct
        } else
            ==> head page struct
    }

We can safely access the field of @page[1] when @page has PG_head set
because @page is a compound page composed of at least two contiguous
pages.

The main implementation is in patch 3.

On our server, this patchset saves an extra 2GB of memory when there is
1 TB of HugeTLB (2 MB) pages: 1 TB / 2 MB = 524288 HugeTLB pages, each
freeing one more vmemmap page (4 KB, assuming that base page size). If
the HugeTLB page size is 1 GB, it saves only 4MB. For 2 MB HugeTLB
pages, it is a nice gain.

Muchun Song (5):
  mm: introduce PAGEFLAGS_MASK to replace ((1UL << NR_PAGEFLAGS) - 1)
  mm: introduce save_page_flags to cooperate with show_page_flags
  mm: hugetlb: free the 2nd vmemmap page associated with each HugeTLB
    page
  mm: hugetlb: replace hugetlb_free_vmemmap_enabled with a static_key
  mm: sparsemem: use page table lock to protect kernel pmd operations

 Documentation/admin-guide/kernel-parameters.txt |   2 +-
 include/linux/hugetlb.h                         |   6 +-
 include/linux/page-flags.h                      | 103 ++++++++++++++++++++++--
 include/trace/events/mmflags.h                  |   4 +
 include/trace/events/page_ref.h                 |   8 +-
 lib/test_printf.c                               |   2 +-
 lib/vsprintf.c                                  |   2 +-
 mm/hugetlb_vmemmap.c                            |  67 ++++++++-------
 mm/ptdump.c                                     |  16 +++-
 mm/sparse-vmemmap.c                             |  70 ++++++++++++----
 10 files changed, 218 insertions(+), 62 deletions(-)
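The real-versus-fake head check from the snippet in this cover letter can
be exercised in userspace. The following is a minimal sketch under stated
assumptions, not the code from patch 3: the two-field toy_page layout, the
toy_* names and the enum are all hypothetical, the alias is simulated by
copying the first two page structs to a different address, and a plain
load stands in for READ_ONCE().

```c
#include <stdint.h>
#include <string.h>

#define TOY_PG_HEAD 1UL

/* Hypothetical stand-in for struct page; only the two fields used here. */
struct toy_page {
	unsigned long flags;
	uintptr_t compound_head;   /* in a tail: head address with bit 0 set */
};

enum toy_kind { TOY_NO_PG_HEAD, TOY_REAL_HEAD, TOY_FAKE_HEAD };

/* Set up pages[0] as the head and pages[1] as the first tail. */
void toy_make_compound(struct toy_page *pages)
{
	memset(pages, 0, 2 * sizeof(*pages));
	pages[0].flags = TOY_PG_HEAD;
	pages[1].compound_head = (uintptr_t)&pages[0] | 1;
}

/*
 * Simulate a remapped tail vmemmap page: same contents as the head
 * frame, but visible at a different address.
 */
void toy_alias(struct toy_page *dst, const struct toy_page *src)
{
	memcpy(dst, src, 2 * sizeof(*dst));
}

/* Mirrors the decision tree of the snippet in the cover letter. */
enum toy_kind toy_classify(const struct toy_page *page)
{
	uintptr_t head;

	if (!(page->flags & TOY_PG_HEAD))
		return TOY_NO_PG_HEAD;

	/* The kernel snippet reads this with READ_ONCE(). */
	head = page[1].compound_head;
	if (head & 1) {
		/* page[1] is a tail: does it point back at this address? */
		if (head == (uintptr_t)page + 1)
			return TOY_REAL_HEAD;
		return TOY_FAKE_HEAD;   /* a tail seen through the alias */
	}
	return TOY_REAL_HEAD;
}
```

For example, after toy_make_compound(real) and toy_alias(alias, real),
toy_classify(&real[0]) reports a real head while toy_classify(&alias[0])
reports a fake one, because the back-pointer stored in page[1] only
matches the address of the real head.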