From patchwork Tue Mar 26 10:14:45 2024
X-Patchwork-Submitter: Ryan Roberts
X-Patchwork-Id: 13603838
From: Ryan Roberts
To: Catalin Marinas, Will Deacon, Mark Rutland, Ard Biesheuvel,
 David Hildenbrand, Donald Dutile, Eric Chanudet
Cc: Ryan Roberts, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org
Subject: [PATCH v1 0/3] Speed up boot with faster linear map creation
Date: Tue, 26 Mar 2024 10:14:45 +0000
Message-Id: <20240326101448.3453626-1-ryan.roberts@arm.com>

Hi All,

It turns out that creating the linear map can take a significant
proportion of the total boot time, especially when rodata=full. A large
portion of the time it takes to create the linear map is spent issuing
TLBIs. This series reworks the kernel pgtable generation code to
significantly reduce the number of TLBIs. See each patch for details.

The table below shows the execution time of map_mem() across a couple of
different systems with different RAM configurations.
We measure after applying each patch and show the improvement relative
to base (v6.9-rc1):

               | Apple M2 VM | Ampere Altra| Ampere Altra| Ampere Altra
               | VM, 16G     | VM, 64G     | VM, 256G    | Metal, 512G
---------------|-------------|-------------|-------------|-------------
               | ms      (%) | ms      (%) | ms      (%) | ms      (%)
---------------|-------------|-------------|-------------|-------------
base           |  151   (0%) | 2191   (0%) | 8990   (0%) | 17443  (0%)
no-cont-remap  |   77 (-49%) |  429 (-80%) | 1753 (-80%) |  3796 (-78%)
no-alloc-remap |   77 (-49%) |  375 (-83%) | 1532 (-83%) |  3366 (-81%)
lazy-unmap     |   63 (-58%) |  330 (-85%) | 1312 (-85%) |  2929 (-83%)

This series applies on top of v6.9-rc1. All mm selftests pass. I haven't
yet tested all VA size configs (although I don't anticipate any issues);
I'll do this as part of followup.

Thanks,
Ryan

Ryan Roberts (3):
  arm64: mm: Don't remap pgtables per-cont(pte|pmd) block
  arm64: mm: Don't remap pgtables for allocate vs populate
  arm64: mm: Lazily clear pte table mappings from fixmap

 arch/arm64/include/asm/fixmap.h  |   5 +-
 arch/arm64/include/asm/mmu.h     |   8 +
 arch/arm64/include/asm/pgtable.h |   4 -
 arch/arm64/kernel/cpufeature.c   |  10 +-
 arch/arm64/mm/fixmap.c           |  11 +
 arch/arm64/mm/mmu.c              | 364 +++++++++++++++++++++++--------
 include/linux/pgtable.h          |   8 +
 7 files changed, 307 insertions(+), 103 deletions(-)

Tested-by: Itaru Kitayama
Tested-by: Eric Chanudet

---
2.25.1