From patchwork Thu Aug 19 06:58:27 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Muchun Song
X-Patchwork-Id: 12446447
From: Muchun Song
To:
    mike.kravetz@oracle.com, akpm@linux-foundation.org, osalvador@suse.de,
    mhocko@suse.com, song.bao.hua@hisilicon.com, david@redhat.com,
    chenhuang5@huawei.com, bodeddub@amazon.com, corbet@lwn.net,
    willy@infradead.org
Cc: duanxiongchun@bytedance.com, fam.zheng@bytedance.com, smuchun@gmail.com,
    zhengqi.arch@bytedance.com, linux-doc@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org, Muchun Song
Subject: [PATCH v2 0/4] Free the 2nd vmemmap page associated with each HugeTLB page
Date: Thu, 19 Aug 2021 14:58:27 +0800
Message-Id: <20210819065831.43186-1-songmuchun@bytedance.com>
X-Mailer: git-send-email 2.21.0 (Apple Git-122)

After the "Free some vmemmap pages of HugeTLB page" feature is enabled, the
vmemmap addresses associated with a 2MB HugeTLB page are mapped as shown in
the figure below.

    HugeTLB                    struct pages(8 pages)         page frame(8 pages)
 +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+---> PG_head
 |           |                     |     0     | -------------> |     0     |
 |           |                     +-----------+                +-----------+
 |           |                     |     1     | -------------> |     1     |
 |           |                     +-----------+                +-----------+
 |           |                     |     2     | ----------------^ ^ ^ ^ ^ ^
 |           |                     +-----------+                   | | | | |
 |           |                     |     3     | ------------------+ | | | |
 |           |                     +-----------+                     | | | |
 |           |                     |     4     | --------------------+ | | |
 |    2MB    |                     +-----------+                       | | |
 |           |                     |     5     | ----------------------+ | |
 |           |                     +-----------+                         | |
 |           |                     |     6     | ------------------------+ |
 |           |                     +-----------+                           |
 |           |                     |     7     | --------------------------+
 |           |                     +-----------+
 |           |
 |           |
 |           |
 +-----------+

As we can see, the 2nd vmemmap page frame (indexed by 1) is reused and
remapped. However, the 2nd vmemmap page frame can also be freed to the buddy
allocator, which changes the mapping from the figure above to the figure
below.

    HugeTLB                    struct pages(8 pages)         page frame(8 pages)
 +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+---> PG_head
 |           |                     |     0     | -------------> |     0     |
 |           |                     +-----------+                +-----------+
 |           |                     |     1     | ---------------^ ^ ^ ^ ^ ^ ^
 |           |                     +-----------+                  | | | | | |
 |           |                     |     2     | -----------------+ | | | | |
 |           |                     +-----------+                    | | | | |
 |           |                     |     3     | -------------------+ | | | |
 |           |                     +-----------+                      | | | |
 |           |                     |     4     | ---------------------+ | | |
 |    2MB    |                     +-----------+                        | | |
 |           |                     |     5     | -----------------------+ | |
 |           |                     +-----------+                          | |
 |           |                     |     6     | -------------------------+ |
 |           |                     +-----------+                            |
 |           |                     |     7     | ---------------------------+
 |           |                     +-----------+
 |           |
 |           |
 |           |
 +-----------+

After we do this, all tail vmemmap pages (1-7) are mapped to the head vmemmap
page frame (0). In other words, there is more than one page struct with
PG_head associated with each HugeTLB page. We __know__ that there is only one
real head page struct; the tail page structs with PG_head are fake head page
structs.

We need an approach to distinguish between those two types of page structs so
that compound_head(), PageHead() and PageTail() work properly when the
parameter is a tail page struct that has PG_head set. The following code
snippet describes how to distinguish between a real and a fake head page
struct.

	if (test_bit(PG_head, &page->flags)) {
		unsigned long head = READ_ONCE(page[1].compound_head);

		if (head & 1) {
			if (head == (unsigned long)page + 1)
				==> head page struct
			else
				==> tail page struct
		} else
			==> head page struct
	}

We can safely access the fields of @page[1] when @page has PG_head set,
because @page is then a compound page composed of at least two contiguous
pages.

The main implementation is in patch 1.

On our server, this patchset saves an extra 2 GB of memory when 1 TB of
memory is allocated as 2 MB HugeTLB pages (one more 4 KB vmemmap page freed
for each of the 524288 HugeTLB pages). With 1 GB HugeTLB pages it saves only
4 MB. For 2 MB HugeTLB pages, it is a nice gain.

Changelog in v2:
  1. Drop two patches of introducing PAGEFLAGS_MASK from this series.
  2. Let page_head_if_fake() return page instead of NULL.
  3. Add a selftest to check if PageHead or PageTail work well.

Muchun Song (4):
  mm: hugetlb: free the 2nd vmemmap page associated with each HugeTLB
    page
  mm: hugetlb: replace hugetlb_free_vmemmap_enabled with a static_key
  mm: sparsemem: use page table lock to protect kernel pmd operations
  selftests: vm: add a hugetlb test case

 Documentation/admin-guide/kernel-parameters.txt |   2 +-
 include/linux/hugetlb.h                         |   6 +-
 include/linux/page-flags.h                      |  77 ++++++++++++-
 mm/hugetlb_vmemmap.c                            |  64 ++++++-----
 mm/ptdump.c                                     |  16 ++-
 mm/sparse-vmemmap.c                             |  70 +++++++++---
 tools/testing/selftests/vm/vmemmap_hugetlb.c    | 139 ++++++++++++++++++++++++
 7 files changed, 320 insertions(+), 54 deletions(-)
 create mode 100644 tools/testing/selftests/vm/vmemmap_hugetlb.c