From patchwork Wed Mar 2 08:46:23 2022
From: Muchun Song <songmuchun@bytedance.com>
To: will@kernel.org, akpm@linux-foundation.org, david@redhat.com,
    bodeddub@amazon.com, osalvador@suse.de, mike.kravetz@oracle.com,
    rientjes@google.com, mark.rutland@arm.com, catalin.marinas@arm.com,
    james.morse@arm.com, song.bao.hua@hisilicon.com
Cc: linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, duanxiongchun@bytedance.com, fam.zheng@bytedance.com,
    smuchun@gmail.com, Muchun Song <songmuchun@bytedance.com>
Subject: [PATCH v2 RESEND 1/2] arm64: avoid flushing icache multiple times on contiguous HugeTLB
Date: Wed, 2 Mar 2022 16:46:23 +0800
Message-Id: <20220302084624.33340-1-songmuchun@bytedance.com>

When a contiguous HugeTLB page is mapped, set_pte_at() is called
CONT_PTES/CONT_PMDS times, so __sync_icache_dcache() flushes the cache
that many times if the page is executable (to ensure I-D cache
coherency). However, the first flush already covers the whole page, and
the subsequent flushes are redundant. Therefore, flush the cache only
for the head page when the page is a HugeTLB page.

The next patch also depends on this change: once the tail vmemmap pages
of a HugeTLB page are mapped read-only, only the head page struct can be
modified.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm64/mm/flush.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
index 2aaf950b906c..a06c6ac770d4 100644
--- a/arch/arm64/mm/flush.c
+++ b/arch/arm64/mm/flush.c
@@ -52,6 +52,13 @@ void __sync_icache_dcache(pte_t pte)
 {
 	struct page *page = pte_page(pte);
 
+	/*
+	 * HugeTLB pages are always fully mapped, so only setting head page's
+	 * PG_dcache_clean flag is enough.
+	 */
+	if (PageHuge(page))
+		page = compound_head(page);
+
 	if (!test_bit(PG_dcache_clean, &page->flags)) {
 		sync_icache_aliases((unsigned long)page_address(page),
 				    (unsigned long)page_address(page) +
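For context on why the cache is flushed repeatedly in the first place: on
arm64 a contiguous HugeTLB page is installed one contiguous entry at a
time, and every entry goes through set_pte_at(). The sketch below is a
simplified paraphrase of that path (modelled on set_huge_pte_at() in
arch/arm64/mm/hugetlbpage.c; the non-contiguous fast path and the
clear-and-flush step are omitted), not the verbatim kernel code:

/*
 * Simplified sketch: mapping one contiguous HugeTLB page writes ncontig
 * (CONT_PTES or CONT_PMDS) entries, and each set_pte_at() call can reach
 * __sync_icache_dcache() for an executable user mapping -- hence the
 * repeated flushes this patch avoids.
 */
void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
		     pte_t *ptep, pte_t pte)
{
	size_t pgsize;
	int i, ncontig = find_num_contig(mm, addr, ptep, &pgsize);
	unsigned long pfn = pte_pfn(pte);
	pgprot_t hugeprot = pte_pgprot(pte);

	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize,
	     pfn += pgsize >> PAGE_SHIFT)
		set_pte_at(mm, addr, ptep, pfn_pte(pfn, hugeprot));
}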
From patchwork Wed Mar 2 08:46:24 2022

From: Muchun Song <songmuchun@bytedance.com>
To: will@kernel.org, akpm@linux-foundation.org, david@redhat.com,
    bodeddub@amazon.com, osalvador@suse.de, mike.kravetz@oracle.com,
    rientjes@google.com, mark.rutland@arm.com, catalin.marinas@arm.com,
    james.morse@arm.com, song.bao.hua@hisilicon.com
Cc: linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, duanxiongchun@bytedance.com, fam.zheng@bytedance.com,
    smuchun@gmail.com, Muchun Song <songmuchun@bytedance.com>
Subject: [PATCH v2 RESEND 2/2] arm64: mm: hugetlb: add support for free vmemmap pages of HugeTLB
Date: Wed, 2 Mar 2022 16:46:24 +0800
Message-Id: <20220302084624.33340-2-songmuchun@bytedance.com>
In-Reply-To: <20220302084624.33340-1-songmuchun@bytedance.com>
References: <20220302084624.33340-1-songmuchun@bytedance.com>

The feature that minimizes the overhead of struct page for HugeTLB pages
frees the vmemmap pages (the pages backing the struct page array) of each
HugeTLB page to save memory: roughly 14GB or 16GB per 1TB of HugeTLB
pages, for the 2MB and 1GB page sizes respectively. In short, when a
HugeTLB page is allocated or freed, the vmemmap range representing that
page has to be remapped. When a page is allocated, the now-redundant
vmemmap pages are freed after remapping; when a page is freed, the
previously discarded vmemmap pages must be reallocated before remapping.
More implementation details can be found in [1].

The preparation for freeing the vmemmap pages associated with each
HugeTLB page is in place, so we can now support this feature on arm64.
flush_dcache_page() needs to be adapted to operate on the head page's
flags, because the tail vmemmap pages are mapped read-only once the
feature is enabled (a clear operation on them is not permitted).
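To make the read-only constraint concrete: when the feature is enabled,
the PTEs covering a HugeTLB page's tail struct pages are rewritten to
alias the head vmemmap frame. The following is a schematic sketch of that
remapping; the helper name and parameters are paraphrased for
illustration and are not the actual mm/sparse-vmemmap.c code:

/*
 * Schematic sketch of the vmemmap remapping: every PTE that covers the
 * tail struct pages of a HugeTLB page is rewritten to point at the head
 * vmemmap frame with read-only permissions, and the old tail frames are
 * collected so they can be freed. A tail "struct page" pointer therefore
 * aliases the head frame and must never be written directly.
 */
static void remap_tail_vmemmap_pte(pte_t *ptep, unsigned long addr,
				   struct page *reuse_page,
				   struct list_head *freed_pages)
{
	pte_t entry = mk_pte(reuse_page, PAGE_KERNEL_RO);	/* read-only alias */
	struct page *tail = pte_page(*ptep);

	list_add_tail(&tail->lru, freed_pages);		/* old frame is freed later */
	set_pte_at(&init_mm, addr, ptep, entry);	/* now points at the head frame */
}

This is why flush_dcache_page() in the diff below redirects to
compound_head() before touching page->flags.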
There was some discussion about this in the thread [2], but no conclusion
was reached in the end. I have copied the concerns raised by Anshuman
here, together with my answers.

1st concern:
'''
But what happens when a hot remove section's vmemmap area (which is being
teared down) is nearby another vmemmap area which is either created or
being destroyed for HugeTLB alloc/free purpose. As you mentioned HugeTLB
pages inside the hot remove section might be safe. But what about other
HugeTLB areas whose vmemmap area shares page table entries with vmemmap
entries for a section being hot removed ? Massive HugeTLB alloc/use/free
test cycle using memory just adjacent to a memory hotplug area, which is
always added and removed periodically, should be able to expose this
problem.
'''

Answer: At the time memory is removed, all HugeTLB pages have either been
migrated away or dissolved, so there is no race between memory hot-remove
and free_huge_page_vmemmap(). HugeTLB pages inside the hot-removed
section are therefore safe.

As for the question "what about other HugeTLB areas whose vmemmap area
shares page table entries with vmemmap entries for a section being hot
removed?": that situation cannot arise. The minimum hot-plug granularity
is 128MB (on arm64 with 4K base pages), so any HugeTLB page smaller than
128MB lies entirely within one section and cannot cross two sections; its
PTE page tables are not shared with HugeTLB pages in other sections, and
while such a page exists the section cannot be freed. Any HugeTLB page
bigger than 128MB (the section size) has a vmemmap whose size is an
integer multiple of 2MB (PMD-mapped). As long as:

1) HugeTLB pages are naturally aligned, power-of-two sizes,
2) the HugeTLB size >= the section size, and
3) the HugeTLB size >= the vmemmap leaf mapping size,

then a HugeTLB page will not share any leaf page table entries with
*anything else*, although it will share intermediate entries. In this
case too, all HugeTLB pages have either been migrated away or dissolved
by the time memory is removed, so again there is no race between memory
hot-remove and free_huge_page_vmemmap().

2nd concern:
'''
differently, not sure if ptdump would require any synchronization.
Dumping an wrong value is probably okay but crashing because a page table
entry is being freed after ptdump acquired the pointer is bad. On arm64,
ptdump() is protected against hotremove via [get|put]_online_mems().
'''

Answer: ptdump should be fine, because vmemmap_remap_free() only
exchanges PTEs or splits a PMD entry (which means allocating a new PTE
page table). Neither operation frees any page table (PTE) pages, so
ptdump cannot run into a use-after-free on a page table. The worst case
is just dumping a wrong value.

[1] https://lore.kernel.org/all/20210510030027.56044-1-songmuchun@bytedance.com/
[2] https://lore.kernel.org/all/20210518091826.36937-1-songmuchun@bytedance.com/

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
Changes in v2:
- Update commit message (Mark Rutland).
- Fix flush_dcache_page().
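Before the diff, a quick numeric check of the alignment argument in the
answer to the first concern above, taking a 1GB HugeTLB page as the
example. It assumes 4KB base pages, a 64-byte struct page, 128MB sections
and 2MB PMD-mapped vmemmap, so treat the numbers as illustrative rather
than authoritative:

#include <assert.h>

/* Worked check of conditions 1)-3) for a 1GB HugeTLB page on arm64/4K. */
int main(void)
{
	const unsigned long base_page   = 4096;        /* 4KB base page               */
	const unsigned long struct_page = 64;          /* assumed sizeof(struct page) */
	const unsigned long pmd_leaf    = 2UL << 20;   /* 2MB PMD-mapped vmemmap      */
	const unsigned long hugepage    = 1UL << 30;   /* 1GB HugeTLB page            */

	/* 262144 struct pages -> 16MB of vmemmap per 1GB HugeTLB page. */
	unsigned long vmemmap_size = hugepage / base_page * struct_page;

	/*
	 * Because the 1GB page is naturally aligned, its vmemmap is 16MB
	 * aligned as well: an aligned, exact multiple of the 2MB leaf size.
	 * The page therefore owns its PMD leaf entries outright and shares
	 * only intermediate (PUD/PGD) entries with neighbouring sections.
	 */
	assert(vmemmap_size == 16UL << 20);
	assert(vmemmap_size % pmd_leaf == 0);
	return 0;
}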
 arch/arm64/mm/flush.c | 14 ++++++++++++++
 fs/Kconfig            |  2 +-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
index a06c6ac770d4..705484a9b9df 100644
--- a/arch/arm64/mm/flush.c
+++ b/arch/arm64/mm/flush.c
@@ -75,6 +75,20 @@ EXPORT_SYMBOL_GPL(__sync_icache_dcache);
  */
 void flush_dcache_page(struct page *page)
 {
+#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
+	/*
+	 * Only the head page's flags of HugeTLB can be cleared since the tail
+	 * vmemmap pages associated with each HugeTLB page are mapped with
+	 * read-only when CONFIG_HUGETLB_PAGE_FREE_VMEMMAP is enabled (more
+	 * details can refer to vmemmap_remap_pte()). Although
+	 * __sync_icache_dcache() only set PG_dcache_clean flag on the head
+	 * page struct, some tail page structs still can see the flag since
+	 * the head vmemmap page frame is reused (more details can refer to
+	 * the comments above page_fixed_fake_head()).
+	 */
+	if (PageHuge(page))
+		page = compound_head(page);
+#endif
 	if (test_bit(PG_dcache_clean, &page->flags))
 		clear_bit(PG_dcache_clean, &page->flags);
 }
diff --git a/fs/Kconfig b/fs/Kconfig
index 7a2b11c0b803..04cfd5bf5ec9 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -247,7 +247,7 @@ config HUGETLB_PAGE
 
 config HUGETLB_PAGE_FREE_VMEMMAP
 	def_bool HUGETLB_PAGE
-	depends on X86_64
+	depends on X86_64 || ARM64
 	depends on SPARSEMEM_VMEMMAP
 
 config HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON
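Finally, a back-of-the-envelope check of the "~14GB/16GB per 1TB" savings
figure quoted in the commit message. It assumes 4KB base pages, a 64-byte
struct page, and that all but one vmemmap page per HugeTLB page is freed
(consistent with the figures above); the program only illustrates the
arithmetic and is not derived from the kernel sources:

#include <stdio.h>

/* Rough check of the vmemmap savings quoted in the commit message. */
int main(void)
{
	const unsigned long long base  = 4096;          /* 4KB base page            */
	const unsigned long long spage = 64;            /* assumed struct page size */
	const unsigned long long tera  = 1ULL << 40;    /* 1TB of HugeTLB pages     */
	const unsigned long long sizes[] = { 2ULL << 20, 1ULL << 30 };  /* 2MB, 1GB */

	for (int i = 0; i < 2; i++) {
		unsigned long long huge    = sizes[i];
		unsigned long long vmemmap = huge / base * spage;  /* per HugeTLB page  */
		unsigned long long freed   = vmemmap - base;       /* keep one 4KB page */
		double per_tb_gb = (double)freed * (tera / huge) / (1ULL << 30);

		printf("%lluMB HugeTLB pages: ~%.1fGB of vmemmap freed per 1TB\n",
		       huge >> 20, per_tb_gb);
	}
	return 0;
}

Running it prints roughly 14GB for 2MB pages and 16GB for 1GB pages,
matching the commit message.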