From patchwork Fri Jan 10 18:40:42 2025
X-Patchwork-Submitter: Brendan Jackman
X-Patchwork-Id: 13935245
Date: Fri, 10 Jan 2025 18:40:42 +0000
In-Reply-To: <20250110-asi-rfc-v2-v2-0-8419288bc805@google.com>
References: <20250110-asi-rfc-v2-v2-0-8419288bc805@google.com>
Message-ID: <20250110-asi-rfc-v2-v2-16-8419288bc805@google.com>
X-Mailer: b4 0.15-dev
Subject: [PATCH RFC v2 16/29] mm: asi: Map kernel text and static data as nonsensitive
From: Brendan Jackman
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin",
 Andy Lutomirski, Peter Zijlstra, Richard Henderson, Matt Turner, Vineet Gupta,
 Russell King, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain, Huacai Chen,
 WANG Xuerui, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer,
 Dinh Nguyen, Jonas Bonn, Stefan Kristiansson, Stafford Horne,
 "James E.J. Bottomley", Helge Deller, Michael Ellerman, Nicholas Piggin,
 Christophe Leroy, Naveen N Rao, Madhavan Srinivasan, Paul Walmsley,
 Palmer Dabbelt, Albert Ou, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
 Christian Borntraeger, Sven Schnelle, Yoshinori Sato, Rich Felker,
 John Paul Adrian Glaubitz, "David S. Miller", Andreas Larsson,
 Richard Weinberger, Anton Ivanov, Johannes Berg, Chris Zankel, Max Filippov,
 Arnd Bergmann, Andrew Morton, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
 Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, Uladzislau Rezki,
 Christoph Hellwig, Masami Hiramatsu, Mathieu Desnoyers, Mike Rapoport,
 Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Alexander Shishkin,
 Jiri Olsa, Ian Rogers, Adrian Hunter, Dennis Zhou, Tejun Heo,
 Christoph Lameter, Sean Christopherson, Paolo Bonzini, Ard Biesheuvel,
 Josh Poimboeuf, Pawan Gupta
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-alpha@vger.kernel.org,
 linux-snps-arc@lists.infradead.org, linux-arm-kernel@lists.infradead.org,
 linux-csky@vger.kernel.org, linux-hexagon@vger.kernel.org,
 loongarch@lists.linux.dev, linux-m68k@lists.linux-m68k.org,
 linux-mips@vger.kernel.org, linux-openrisc@vger.kernel.org,
 linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
 linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org,
 linux-sh@vger.kernel.org, sparclinux@vger.kernel.org,
 linux-um@lists.infradead.org, linux-arch@vger.kernel.org, linux-mm@kvack.org,
 linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
 kvm@vger.kernel.org, linux-efi@vger.kernel.org, Brendan Jackman

Basically we need to map the kernel code and all its static variables.
Per-CPU variables need to be treated specially as described in the
comments. The cpu_entry_area is similar - this needs to be nonsensitive
so that the CPU can access the GDT etc when handling a page fault.

Under 5-level paging, most of the kernel memory comes under a single PGD
entry (see Documentation/x86/x86_64/mm.rst. Basically, the mapping for
this big region is the same as under 4-level, just wrapped in an outer
PGD entry). For that region, the "clone" logic is moved down one step of
the paging hierarchy.

Note that the p4d_alloc in asi_clone_p4d won't actually be used in
practice; the relevant PGD entry will always have been populated by
prior asi_map calls so this code would "work" if we just wrote
p4d_offset (but asi_clone_p4d would be broken if viewed in isolation).

The vmemmap area is not under this single PGD, it has its own 2-PGD
area, so we still use asi_clone_pgd for that one.

Signed-off-by: Brendan Jackman
---
 arch/x86/mm/asi.c                 | 105 +++++++++++++++++++++++++++++++++++++-
 include/asm-generic/vmlinux.lds.h |  11 ++++
 2 files changed, 115 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c
index b951f2100b8bdea5738ded16166255deb29faf57..bc2cf0475a0e7344a66d81453f55034b2fc77eef 100644
--- a/arch/x86/mm/asi.c
+++ b/arch/x86/mm/asi.c
@@ -7,7 +7,6 @@
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -186,8 +185,68 @@ void __init asi_check_boottime_disable(void)
 	pr_info("ASI enablement ignored due to incomplete implementation.\n");
 }
 
+/*
+ * Map data by sharing sub-PGD pagetables with the unrestricted mapping. This is
+ * more efficient than asi_map, but only works when you know the whole top-level
+ * page needs to be mapped in the restricted tables. Note that the size of the
+ * mappings this creates differs between 4 and 5-level paging.
+ */
+static void asi_clone_pgd(pgd_t *dst_table, pgd_t *src_table, size_t addr)
+{
+	pgd_t *src = pgd_offset_pgd(src_table, addr);
+	pgd_t *dst = pgd_offset_pgd(dst_table, addr);
+
+	if (!pgd_val(*dst))
+		set_pgd(dst, *src);
+	else
+		WARN_ON_ONCE(pgd_val(*dst) != pgd_val(*src));
+}
+
+/*
+ * For 4-level paging this is exactly the same as asi_clone_pgd. For 5-level
+ * paging it clones one level lower. So this always creates a mapping of the
+ * same size.
+ */
+static void asi_clone_p4d(pgd_t *dst_table, pgd_t *src_table, size_t addr)
+{
+	pgd_t *src_pgd = pgd_offset_pgd(src_table, addr);
+	pgd_t *dst_pgd = pgd_offset_pgd(dst_table, addr);
+	p4d_t *src_p4d = p4d_alloc(&init_mm, src_pgd, addr);
+	p4d_t *dst_p4d = p4d_alloc(&init_mm, dst_pgd, addr);
+
+	if (!p4d_val(*dst_p4d))
+		set_p4d(dst_p4d, *src_p4d);
+	else
+		WARN_ON_ONCE(p4d_val(*dst_p4d) != p4d_val(*src_p4d));
+}
+
+/*
+ * percpu_addr is where the linker put the percpu variable. asi_map_percpu finds
+ * the place where the percpu allocator copied the data during boot.
+ *
+ * This is necessary even when the page allocator defaults to
+ * global-nonsensitive, because the percpu allocator uses the memblock allocator
+ * for early allocations.
+ */
+static int asi_map_percpu(struct asi *asi, void *percpu_addr, size_t len)
+{
+	int cpu, err;
+	void *ptr;
+
+	for_each_possible_cpu(cpu) {
+		ptr = per_cpu_ptr(percpu_addr, cpu);
+		err = asi_map(asi, ptr, len);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
 static int __init asi_global_init(void)
 {
+	int err;
+
 	if (!boot_cpu_has(X86_FEATURE_ASI))
 		return 0;
 
@@ -207,6 +266,46 @@ static int __init asi_global_init(void)
 			    VMALLOC_START, VMALLOC_END,
 			    "ASI Global Non-sensitive vmalloc");
 
+	/* Map all kernel text and static data */
+	err = asi_map(ASI_GLOBAL_NONSENSITIVE, (void *)__START_KERNEL,
+		      (size_t)_end - __START_KERNEL);
+	if (WARN_ON(err))
+		return err;
+	err = asi_map(ASI_GLOBAL_NONSENSITIVE, (void *)FIXADDR_START,
+		      FIXADDR_SIZE);
+	if (WARN_ON(err))
+		return err;
+	/* Map all static percpu data */
+	err = asi_map_percpu(ASI_GLOBAL_NONSENSITIVE,
+			     __per_cpu_start,
+			     __per_cpu_end - __per_cpu_start);
+	if (WARN_ON(err))
+		return err;
+
+	/*
+	 * The next areas are mapped using shared sub-P4D paging structures
+	 * (asi_clone_p4d instead of asi_map), since we know the whole P4D will
+	 * be mapped.
+	 */
+	asi_clone_p4d(asi_global_nonsensitive_pgd, init_mm.pgd,
+		      CPU_ENTRY_AREA_BASE);
+#ifdef CONFIG_X86_ESPFIX64
+	asi_clone_p4d(asi_global_nonsensitive_pgd, init_mm.pgd,
+		      ESPFIX_BASE_ADDR);
+#endif
+	/*
+	 * The vmemmap area actually _must_ be cloned via shared paging
+	 * structures, since mappings can potentially change dynamically when
+	 * hugetlbfs pages are created or broken down.
+	 *
+	 * We always clone 2 PGDs, this is a corollary of the sizes of struct
+	 * page, a page, and the physical address space.
+	 */
+	WARN_ON(sizeof(struct page) * MAXMEM / PAGE_SIZE !=
+		2 * (1UL << PGDIR_SHIFT));
+	asi_clone_pgd(asi_global_nonsensitive_pgd, init_mm.pgd, VMEMMAP_START);
+	asi_clone_pgd(asi_global_nonsensitive_pgd, init_mm.pgd,
+		      VMEMMAP_START + (1UL << PGDIR_SHIFT));
+
 	return 0;
 }
 subsys_initcall(asi_global_init)
@@ -599,6 +698,10 @@ static bool follow_physaddr(
  * Map the given range into the ASI page tables. The source of the mapping is
  * the regular unrestricted page tables. Can be used to map any kernel memory.
  *
+ * In contrast to some internal ASI logic (asi_clone_pgd and asi_clone_p4d) this
+ * never shares pagetables between restricted and unrestricted address spaces,
+ * instead it creates wholly new equivalent mappings.
+ *
  * The caller MUST ensure that the source mapping will not change during this
  * function. For dynamic kernel memory, this is generally ensured by mapping the
  * memory within the allocator.

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index eeadbaeccf88b73af40efe5221760a7cb37058d2..18f6c0448baf5dfbd0721ba9a6d89000fa86f061 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -1022,6 +1022,16 @@
 		COMMON_DISCARDS						\
 	}
 
+/*
+ * ASI maps certain sections with certain sensitivity levels, so they need to
+ * have a page-aligned size.
+ */
+#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
+#define ASI_ALIGN()	ALIGN(PAGE_SIZE)
+#else
+#define ASI_ALIGN()	.
+#endif
+
 /**
  * PERCPU_INPUT - the percpu input sections
  * @cacheline: cacheline size
@@ -1043,6 +1053,7 @@
 	*(.data..percpu)						\
 	*(.data..percpu..shared_aligned)				\
 	PERCPU_DECRYPTED_SECTION					\
+	. = ASI_ALIGN();						\
 	__per_cpu_end = .;
 
 /**