From patchwork Tue Apr  4 13:54:37 2023
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 13200194
From: Ard Biesheuvel <ardb@kernel.org>
To: linux-arm-kernel@lists.infradead.org
Cc: will@kernel.org, catalin.marinas@arm.com, broonie@kernel.org,
    mark.rutland@arm.com, Ard Biesheuvel <ardb@kernel.org>,
    Shanker Donthineni
Subject: [PATCH] arm64: module: Widen module region to 2 GiB
Date: Tue,  4 Apr 2023 15:54:37 +0200
Message-Id: <20230404135437.2744866-1-ardb@kernel.org>
Shanker reports that, after loading a 110+ MiB kernel module, no other
modules can be loaded, in spite of the fact that module PLTs have been
enabled in the build.

This is due to the fact that, even with module PLTs enabled, the module
region is dimensioned as a fixed 128 MiB region, as we simply never
anticipated the need to support modules that huge.

So let's increase the size of the statically allocated module region to
2 GiB, and update the module loading logic so that we prefer loading
modules in the vicinity of the kernel text, where relative branches can
be resolved without the need for PLTs. Only when we run out of space
here (or when CONFIG_RANDOMIZE_MODULE_REGION_FULL is enabled) do we
fall back to the larger window and allocate from there.

While at it, let's try to simplify the logic a bit, to make it easier
to follow:
- remove the special cases for KASAN, which are no longer needed now
  that KASAN_VMALLOC is always enabled when KASAN is configured;
- instead of defining a global module base address, define a global
  module limit, and work our way down from it.

Cc: Shanker Donthineni
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 Documentation/arm64/memory.rst  |  8 ++---
 arch/arm64/include/asm/memory.h |  2 +-
 arch/arm64/include/asm/module.h | 10 ++++--
 arch/arm64/kernel/kaslr.c       | 38 +++++++++++------------
 arch/arm64/kernel/module.c      | 54 ++++++++++++++++-----------------
 5 files changed, 59 insertions(+), 53 deletions(-)

diff --git a/Documentation/arm64/memory.rst b/Documentation/arm64/memory.rst
index 2a641ba7be3b717a..55a55f30eed8a6ce 100644
--- a/Documentation/arm64/memory.rst
+++ b/Documentation/arm64/memory.rst
@@ -33,8 +33,8 @@ AArch64 Linux memory layout with 4KB pages + 4 levels (48-bit)::
   0000000000000000      0000ffffffffffff         256TB          user
   ffff000000000000      ffff7fffffffffff         128TB          kernel logical memory map
  [ffff600000000000      ffff7fffffffffff]         32TB          [kasan shadow region]
-  ffff800000000000      ffff800007ffffff         128MB          modules
-  ffff800008000000      fffffbffefffffff         124TB          vmalloc
+  ffff800000000000      ffff80007fffffff           2GB          modules
+  ffff800080000000      fffffbffefffffff         124TB          vmalloc
   fffffbfff0000000      fffffbfffdffffff         224MB          fixed mappings (top down)
   fffffbfffe000000      fffffbfffe7fffff           8MB          [guard region]
   fffffbfffe800000      fffffbffff7fffff          16MB          PCI I/O space
@@ -50,8 +50,8 @@ AArch64 Linux memory layout with 64KB pages + 3 levels (52-bit with HW support)::
   0000000000000000      000fffffffffffff           4PB          user
   fff0000000000000      ffff7fffffffffff          ~4PB          kernel logical memory map
  [fffd800000000000      ffff7fffffffffff]        512TB          [kasan shadow region]
-  ffff800000000000      ffff800007ffffff         128MB          modules
-  ffff800008000000      fffffbffefffffff         124TB          vmalloc
+  ffff800000000000      ffff80007fffffff           2GB          modules
+  ffff800080000000      fffffbffefffffff         124TB          vmalloc
   fffffbfff0000000      fffffbfffdffffff         224MB          fixed mappings (top down)
   fffffbfffe000000      fffffbfffe7fffff           8MB          [guard region]
   fffffbfffe800000      fffffbffff7fffff          16MB          PCI I/O space
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 78e5163836a0ab95..b58c3127323e16c8 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -46,7 +46,7 @@
 #define KIMAGE_VADDR            (MODULES_END)
 #define MODULES_END             (MODULES_VADDR + MODULES_VSIZE)
 #define MODULES_VADDR           (_PAGE_END(VA_BITS_MIN))
-#define MODULES_VSIZE           (SZ_128M)
+#define MODULES_VSIZE           (SZ_2G)
 #define VMEMMAP_START           (-(UL(1) << (VA_BITS - VMEMMAP_SHIFT)))
 #define VMEMMAP_END             (VMEMMAP_START + VMEMMAP_SIZE)
 #define PCI_IO_END              (VMEMMAP_START - SZ_8M)
diff --git a/arch/arm64/include/asm/module.h b/arch/arm64/include/asm/module.h
index 18734fed3bdd7609..98dae9f87b521f07 100644
--- a/arch/arm64/include/asm/module.h
+++ b/arch/arm64/include/asm/module.h
@@ -31,9 +31,15 @@ u64 module_emit_veneer_for_adrp(struct module *mod, Elf64_Shdr *sechdrs,
                                 void *loc, u64 val);
 
 #ifdef CONFIG_RANDOMIZE_BASE
-extern u64 module_alloc_base;
+extern u64 module_alloc_limit;
 #else
-#define module_alloc_base       ((u64)_etext - MODULES_VSIZE)
+#define module_alloc_limit      MODULE_REF_END
+#endif
+
+#ifdef CONFIG_ARM64_MODULE_PLTS
+#define MODULE_REF_END          ((u64)_end)
+#else
+#define MODULE_REF_END          ((u64)_etext)
 #endif
 
 struct plt_entry {
diff --git a/arch/arm64/kernel/kaslr.c b/arch/arm64/kernel/kaslr.c
index e7477f21a4c9d062..14e96c3f707a74a3 100644
--- a/arch/arm64/kernel/kaslr.c
+++ b/arch/arm64/kernel/kaslr.c
@@ -8,6 +8,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -17,10 +18,11 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
-u64 __ro_after_init module_alloc_base;
+u64 __ro_after_init module_alloc_limit = MODULE_REF_END;
 u16 __initdata memstart_offset_seed;
 
 struct arm64_ftr_override kaslr_feature_override __initdata;
 
@@ -30,12 +32,6 @@ static int __init kaslr_init(void)
        u64 module_range;
        u32 seed;
 
-       /*
-        * Set a reasonable default for module_alloc_base in case
-        * we end up running with module randomization disabled.
-        */
-       module_alloc_base = (u64)_etext - MODULES_VSIZE;
-
        if (kaslr_feature_override.val & kaslr_feature_override.mask & 0xf) {
                pr_info("KASLR disabled on command line\n");
                return 0;
@@ -69,24 +65,28 @@ static int __init kaslr_init(void)
                 * resolved via PLTs. (Branches between modules will be
                 * resolved normally.)
                 */
-               module_range = SZ_2G - (u64)(_end - _stext);
-               module_alloc_base = max((u64)_end - SZ_2G, (u64)MODULES_VADDR);
+               module_range = SZ_2G;
        } else {
                /*
-                * Randomize the module region by setting module_alloc_base to
-                * a PAGE_SIZE multiple in the range [_etext - MODULES_VSIZE,
-                * _stext) . This guarantees that the resulting region still
-                * covers [_stext, _etext], and that all relative branches can
-                * be resolved without veneers unless this region is exhausted
-                * and we fall back to a larger 2GB window in module_alloc()
-                * when ARM64_MODULE_PLTS is enabled.
+                * Randomize the module region over a 128 MB window covering
+                * the kernel text. This guarantees that the resulting region
+                * still covers [_stext, _etext], and that all relative
+                * branches can be resolved without veneers unless this region
+                * is exhausted and we fall back to a larger 2GB window in
+                * module_alloc() when ARM64_MODULE_PLTS is enabled.
                 */
-               module_range = MODULES_VSIZE - (u64)(_etext - _stext);
+               module_range = SZ_128M;
        }
 
+       /*
+        * Subtract the size of the core kernel region that must be in range
+        * for all loaded modules.
+        */
+       module_range -= MODULE_REF_END - (u64)_stext;
+
        /* use the lower 21 bits to randomize the base of the module region */
-       module_alloc_base += (module_range * (seed & ((1 << 21) - 1))) >> 21;
-       module_alloc_base &= PAGE_MASK;
+       module_alloc_limit += (module_range * (seed & ((1 << 21) - 1))) >> 21;
+       module_alloc_limit &= PAGE_MASK;
 
        return 0;
 }
diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index 5af4975caeb58ff7..aa61493957c010b2 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -24,7 +24,6 @@
 
 void *module_alloc(unsigned long size)
 {
-       u64 module_alloc_end = module_alloc_base + MODULES_VSIZE;
        gfp_t gfp_mask = GFP_KERNEL;
        void *p;
 
@@ -32,33 +31,34 @@ void *module_alloc(unsigned long size)
        if (IS_ENABLED(CONFIG_ARM64_MODULE_PLTS))
                gfp_mask |= __GFP_NOWARN;
 
-       if (IS_ENABLED(CONFIG_KASAN_GENERIC) ||
-           IS_ENABLED(CONFIG_KASAN_SW_TAGS))
-               /* don't exceed the static module region - see below */
-               module_alloc_end = MODULES_END;
-
-       p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
-                       module_alloc_end, gfp_mask, PAGE_KERNEL, VM_DEFER_KMEMLEAK,
-                       NUMA_NO_NODE, __builtin_return_address(0));
+       /*
+        * First, try to allocate from the 128 MB region just below the limit.
+        * If KASLR is disabled, or CONFIG_RANDOMIZE_MODULE_REGION_FULL is not
+        * set, this will produce an allocation that allows all relative
+        * branches into the kernel text to be resolved without the need for
+        * veneers (PLTs). If CONFIG_RANDOMIZE_MODULE_REGION_FULL is set, this
+        * 128 MB window might not cover the kernel text, but branches between
+        * modules will still be in relative branching range.
+        */
+       p = __vmalloc_node_range(size, MODULE_ALIGN,
+                                module_alloc_limit - SZ_128M,
+                                module_alloc_limit, gfp_mask, PAGE_KERNEL,
+                                VM_DEFER_KMEMLEAK, NUMA_NO_NODE,
+                                __builtin_return_address(0));
 
-       if (!p && IS_ENABLED(CONFIG_ARM64_MODULE_PLTS) &&
-           (IS_ENABLED(CONFIG_KASAN_VMALLOC) ||
-            (!IS_ENABLED(CONFIG_KASAN_GENERIC) &&
-             !IS_ENABLED(CONFIG_KASAN_SW_TAGS))))
-               /*
-                * KASAN without KASAN_VMALLOC can only deal with module
-                * allocations being served from the reserved module region,
-                * since the remainder of the vmalloc region is already
-                * backed by zero shadow pages, and punching holes into it
-                * is non-trivial. Since the module region is not randomized
-                * when KASAN is enabled without KASAN_VMALLOC, it is even
-                * less likely that the module region gets exhausted, so we
-                * can simply omit this fallback in that case.
-                */
-               p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
-                               module_alloc_base + SZ_2G, GFP_KERNEL,
-                               PAGE_KERNEL, 0, NUMA_NO_NODE,
-                               __builtin_return_address(0));
+       /*
+        * If the prior allocation failed, and we have configured support for
+        * fixing up out-of-range relative branches through the use of PLTs,
+        * fall back to a 2 GB window for module allocations. This is the
+        * maximum we can support, due to the use of 32-bit place relative
+        * symbol references, which cannot be fixed up using PLTs.
+        */
+       if (!p && IS_ENABLED(CONFIG_ARM64_MODULE_PLTS))
+               p = __vmalloc_node_range(size, MODULE_ALIGN,
+                                        module_alloc_limit - SZ_2G,
+                                        module_alloc_limit, GFP_KERNEL,
+                                        PAGE_KERNEL, 0, NUMA_NO_NODE,
+                                        __builtin_return_address(0));
 
        if (p && (kasan_alloc_module_shadow(p, size, gfp_mask) < 0)) {
                vfree(p);
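
Not part of the patch, but for readers who want to see the window arithmetic
in isolation: below is a minimal user-space sketch of how the randomized
module_alloc_limit and the two allocation windows tried by module_alloc()
relate, for the CONFIG_RANDOMIZE_MODULE_REGION_FULL case. The base address,
image size and seed are made-up placeholders, not values taken from a real
system or from the patch itself.

/* sketch.c -- illustration only, compile with: cc -o sketch sketch.c */
#include <stdint.h>
#include <stdio.h>

#define SZ_128M    (128ULL << 20)
#define SZ_2G      (2ULL << 30)
#define PAGE_MASK  (~0xfffULL)          /* assumes 4 KiB pages */

int main(void)
{
        /* illustrative placeholders for _stext and MODULE_REF_END (_end) */
        uint64_t stext = 0xffff800008000000ULL;
        uint64_t module_ref_end = stext + (30ULL << 20);  /* 30 MiB image */
        uint64_t seed = 0x1234567;                        /* mock KASLR seed */

        /* kaslr_init(): randomize the limit over the remaining 2 GiB window */
        uint64_t module_range = SZ_2G - (module_ref_end - stext);
        uint64_t limit = module_ref_end;

        limit += (module_range * (seed & ((1ULL << 21) - 1))) >> 21;
        limit &= PAGE_MASK;

        /* module_alloc(): preferred 128 MB window, then the PLT fallback */
        printf("preferred window: [%#llx, %#llx)\n",
               (unsigned long long)(limit - SZ_128M),
               (unsigned long long)limit);
        printf("fallback window : [%#llx, %#llx)\n",
               (unsigned long long)(limit - SZ_2G),
               (unsigned long long)limit);
        return 0;
}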