From patchwork Fri Jan 10 18:40:50 2025
X-Patchwork-Submitter: Brendan Jackman
X-Patchwork-Id: 13935584
Date: Fri, 10 Jan 2025 18:40:50 +0000
In-Reply-To: <20250110-asi-rfc-v2-v2-0-8419288bc805@google.com>
References: <20250110-asi-rfc-v2-v2-0-8419288bc805@google.com>
X-Mailer: b4 0.15-dev
Message-ID: <20250110-asi-rfc-v2-v2-24-8419288bc805@google.com>
Subject: [PATCH RFC v2 24/29] mm: asi: Add infrastructure for mapping userspace addresses
From: Brendan Jackman
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
 "H. Peter Anvin", Andy Lutomirski, Peter Zijlstra, Richard Henderson,
 Matt Turner, Vineet Gupta, Russell King, Catalin Marinas, Will Deacon,
 Guo Ren, Brian Cain, Huacai Chen, WANG Xuerui, Geert Uytterhoeven,
 Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Jonas Bonn,
 Stefan Kristiansson, Stafford Horne, "James E.J. Bottomley", Helge Deller,
 Michael Ellerman, Nicholas Piggin, Christophe Leroy, Naveen N Rao,
 Madhavan Srinivasan, Paul Walmsley, Palmer Dabbelt, Albert Ou,
 Heiko Carstens, Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
 Sven Schnelle, Yoshinori Sato, Rich Felker, John Paul Adrian Glaubitz,
 "David S. Miller", Andreas Larsson, Richard Weinberger, Anton Ivanov,
 Johannes Berg, Chris Zankel, Max Filippov, Arnd Bergmann, Andrew Morton,
 Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
 Mel Gorman, Valentin Schneider, Uladzislau Rezki, Christoph Hellwig,
 Masami Hiramatsu, Mathieu Desnoyers, Mike Rapoport,
 Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Alexander Shishkin,
 Jiri Olsa, Ian Rogers, Adrian Hunter, Dennis Zhou, Tejun Heo,
 Christoph Lameter, Sean Christopherson, Paolo Bonzini, Ard Biesheuvel,
 Josh Poimboeuf, Pawan Gupta
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-alpha@vger.kernel.org,
 linux-snps-arc@lists.infradead.org, linux-arm-kernel@lists.infradead.org,
 linux-csky@vger.kernel.org, linux-hexagon@vger.kernel.org,
 loongarch@lists.linux.dev, linux-m68k@lists.linux-m68k.org,
 linux-mips@vger.kernel.org, linux-openrisc@vger.kernel.org,
 linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
 linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org,
 linux-sh@vger.kernel.org, sparclinux@vger.kernel.org,
 linux-um@lists.infradead.org, linux-arch@vger.kernel.org, linux-mm@kvack.org,
 linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
 kvm@vger.kernel.org, linux-efi@vger.kernel.org, Brendan Jackman,
 Junaid Shahid, Reiji Watanabe

In preparation for sandboxing bare-metal processes, teach ASI to map
userspace addresses into the restricted address space.

Add a new policy helper to determine, based on the class, whether to do
this. If the helper returns true, mirror userspace mappings into the ASI
pagetables.

Later, users who do not have a significant security boundary between KVM
guests and their VMM process will be able to take advantage of this to
reduce mitigation costs when switching between those two domains. To
illustrate the idea, this is now reflected in the KVM taint policy,
although the KVM class is still hard-coded not to map userspace
addresses.

Co-developed-by: Junaid Shahid
Signed-off-by: Junaid Shahid
Co-developed-by: Reiji Watanabe
Signed-off-by: Reiji Watanabe
Signed-off-by: Brendan Jackman
---
 arch/x86/include/asm/asi.h        | 11 +++++
 arch/x86/include/asm/pgalloc.h    |  6 +++
 arch/x86/include/asm/pgtable_64.h |  4 ++
 arch/x86/kvm/x86.c                | 12 +++--
 arch/x86/mm/asi.c                 | 92 +++++++++++++++++++++++++++++++++++++++
 include/asm-generic/asi.h         |  4 ++
 6 files changed, 125 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h
index 555edb5f292e4d6baba782f51d014aa48dc850b6..e925d7d2cfc85bca8480c837548654e7a5a7009e 100644
--- a/arch/x86/include/asm/asi.h
+++ b/arch/x86/include/asm/asi.h
@@ -133,6 +133,7 @@ struct asi {
 	struct mm_struct *mm;
 	int64_t ref_count;
 	enum asi_class_id class_id;
+	spinlock_t pgd_lock;
 };
 
 DECLARE_PER_CPU_ALIGNED(struct asi *, curr_asi);
@@ -147,6 +148,7 @@ const char *asi_class_name(enum asi_class_id class_id);
 
 int asi_init(struct mm_struct *mm, enum asi_class_id class_id, struct asi **out_asi);
 void asi_destroy(struct asi *asi);
+void asi_clone_user_pgtbl(struct mm_struct *mm, pgd_t *pgdp);
 
 /* Enter an ASI domain (restricted address space) and begin the critical section. */
 void asi_enter(struct asi *asi);
@@ -286,6 +288,15 @@ static __always_inline bool asi_in_critical_section(void)
 
 void asi_handle_switch_mm(void);
 
+/*
+ * This function returns true when we would like to map userspace addresses
+ * in the restricted address space.
+ */
+static inline bool asi_maps_user_addr(enum asi_class_id class_id)
+{
+	return false;
+}
+
 #endif /* CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION */
 
 #endif
diff --git a/arch/x86/include/asm/pgalloc.h b/arch/x86/include/asm/pgalloc.h
index dcd836b59bebd329c3d265b98e48ef6eb4c9e6fc..edf9fe76c53369eefcd5bf14a09cbf802cf1ea21 100644
--- a/arch/x86/include/asm/pgalloc.h
+++ b/arch/x86/include/asm/pgalloc.h
@@ -114,12 +114,16 @@ static inline void p4d_populate(struct mm_struct *mm, p4d_t *p4d, pud_t *pud)
 {
 	paravirt_alloc_pud(mm, __pa(pud) >> PAGE_SHIFT);
 	set_p4d(p4d, __p4d(_PAGE_TABLE | __pa(pud)));
+	if (!pgtable_l5_enabled())
+		asi_clone_user_pgtbl(mm, (pgd_t *)p4d);
 }
 
 static inline void p4d_populate_safe(struct mm_struct *mm, p4d_t *p4d, pud_t *pud)
 {
 	paravirt_alloc_pud(mm, __pa(pud) >> PAGE_SHIFT);
 	set_p4d_safe(p4d, __p4d(_PAGE_TABLE | __pa(pud)));
+	if (!pgtable_l5_enabled())
+		asi_clone_user_pgtbl(mm, (pgd_t *)p4d);
 }
 
 extern void ___pud_free_tlb(struct mmu_gather *tlb, pud_t *pud);
@@ -137,6 +141,7 @@ static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgd, p4d_t *p4d)
 		return;
 	paravirt_alloc_p4d(mm, __pa(p4d) >> PAGE_SHIFT);
 	set_pgd(pgd, __pgd(_PAGE_TABLE | __pa(p4d)));
+	asi_clone_user_pgtbl(mm, pgd);
 }
 
 static inline void pgd_populate_safe(struct mm_struct *mm, pgd_t *pgd, p4d_t *p4d)
@@ -145,6 +150,7 @@ static inline void pgd_populate_safe(struct mm_struct *mm, pgd_t *pgd, p4d_t *p4
 		return;
 	paravirt_alloc_p4d(mm, __pa(p4d) >> PAGE_SHIFT);
 	set_pgd_safe(pgd, __pgd(_PAGE_TABLE | __pa(p4d)));
+	asi_clone_user_pgtbl(mm, pgd);
 }
 
 static inline p4d_t *p4d_alloc_one(struct mm_struct *mm, unsigned long addr)
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index d1426b64c1b9715cd9e4d1d7451ae4feadd8b2f5..fe6d83ec632a6894527784f2ebdbd013161c6f09 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -157,6 +157,8 @@ static inline void native_set_p4d(p4d_t *p4dp, p4d_t p4d)
 static inline void native_p4d_clear(p4d_t *p4d)
 {
 	native_set_p4d(p4d, native_make_p4d(0));
+	if (!pgtable_l5_enabled())
+		asi_clone_user_pgtbl(NULL, (pgd_t *)p4d);
 }
 
 static inline void native_set_pgd(pgd_t *pgdp, pgd_t pgd)
@@ -167,6 +169,8 @@ static inline void native_set_pgd(pgd_t *pgdp, pgd_t pgd)
 static inline void native_pgd_clear(pgd_t *pgd)
 {
 	native_set_pgd(pgd, native_make_pgd(0));
+	if (pgtable_l5_enabled())
+		asi_clone_user_pgtbl(NULL, pgd);
 }
 
 /*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3e0811eb510650abc601e4adce1ce4189835a730..920475fe014f6503dd88c7bbdb6b2707c084a689 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9712,11 +9712,15 @@ static inline int kvm_x86_init_asi_class(void)
 	/*
 	 * And the same for data left behind by code in the userspace domain
 	 * (i.e. the VMM itself, plus kernel code serving its syscalls etc).
-	 * This should eventually be configurable: users whose VMMs contain
-	 * no secrets can disable it to avoid paying a mitigation cost on
-	 * transition between their guest and userspace.
+	 *
+	 *
+	 * If we decide to map userspace into the guest's restricted address
+	 * space then we don't bother with this, since we assume either no bugs
+	 * allow the guest to leak that data, or the user doesn't care about
+	 * that security boundary.
 	 */
-	policy.protect_data |= ASI_TAINT_USER_DATA;
+	if (!asi_maps_user_addr(ASI_CLASS_KVM))
+		policy.protect_data |= ASI_TAINT_USER_DATA;
 
 	return asi_init_class(ASI_CLASS_KVM, &policy);
 }
diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c
index c5073af1a82ded1c6fc467cd7a5d29a39d676bb4..093103c1bc2677c81d68008aca064fab53b73a62 100644
--- a/arch/x86/mm/asi.c
+++ b/arch/x86/mm/asi.c
@@ -14,6 +14,7 @@
 #include
 #include
 #include
+#include
 
 #include "mm_internal.h"
 #include "../../../mm/internal.h"
@@ -351,6 +352,33 @@ static void __asi_destroy(struct asi *asi)
 	memset(asi, 0, sizeof(struct asi));
 }
 
+static void __asi_init_user_pgds(struct mm_struct *mm, struct asi *asi)
+{
+	int i;
+
+	if (!asi_maps_user_addr(asi->class_id))
+		return;
+
+	/*
+	 * The code below must be executed only after the given asi is
+	 * available in mm->asi[index], to ensure that at least one of this
+	 * function and asi_clone_user_pgtbl() copies entries from the
+	 * unrestricted pgd to the restricted pgd.
+	 */
+	if (WARN_ON_ONCE(&mm->asi[asi->class_id] != asi))
+		return;
+
+	/*
+	 * See the comment in asi_clone_user_pgtbl() for why we hold the lock here.
+	 */
+	spin_lock(&asi->pgd_lock);
+
+	for (i = 0; i < KERNEL_PGD_BOUNDARY; i++)
+		set_pgd(asi->pgd + i, READ_ONCE(*(mm->pgd + i)));
+
+	spin_unlock(&asi->pgd_lock);
+}
+
 int asi_init(struct mm_struct *mm, enum asi_class_id class_id, struct asi **out_asi)
 {
 	struct asi *asi;
@@ -388,6 +416,7 @@ int asi_init(struct mm_struct *mm, enum asi_class_id class_id, struct asi **out_
 	asi->mm = mm;
 	asi->class_id = class_id;
+	spin_lock_init(&asi->pgd_lock);
 
 	for (i = KERNEL_PGD_BOUNDARY; i < PTRS_PER_PGD; i++)
 		set_pgd(asi->pgd + i, asi_global_nonsensitive_pgd[i]);
 
@@ -398,6 +427,7 @@ int asi_init(struct mm_struct *mm, enum asi_class_id class_id, struct asi **out_
 	else
 		*out_asi = asi;
 
+	__asi_init_user_pgds(mm, asi);
 	mutex_unlock(&mm->asi_init_lock);
 
 	return err;
@@ -891,3 +921,65 @@ void asi_unmap(struct asi *asi, void *addr, size_t len)
 
 	asi_flush_tlb_range(asi, addr, len);
 }
+
+/*
+ * Copy the given unrestricted pgd entry for userspace addresses to the
+ * corresponding restricted pgd entries. This means the unrestricted pgd
+ * entry must be updated before this function is called.
+ *
+ * Userspace addresses are mapped into the restricted address spaces by
+ * copying whole unrestricted pgd entries to the restricted page tables,
+ * so we don't need to maintain consistency of lower-level PTEs between
+ * the unrestricted page table and the restricted page tables.
+ */
+void asi_clone_user_pgtbl(struct mm_struct *mm, pgd_t *pgdp)
+{
+	unsigned long pgd_idx;
+	struct asi *asi;
+	int i;
+
+	if (!static_asi_enabled())
+		return;
+
+	/* We don't need to take care of non-userspace mappings. */
+	if (!pgdp_maps_userspace(pgdp))
+		return;
+
+	/*
+	 * The mm will be NULL for p{4,g}d_clear(). We need to get
+	 * the owner mm for this pgd in this case. The pgd page has
+	 * a valid pt_mm only when SHARED_KERNEL_PMD == 0.
+	 */
+	BUILD_BUG_ON(SHARED_KERNEL_PMD);
+	if (!mm) {
+		mm = pgd_page_get_mm(virt_to_page(pgdp));
+		if (WARN_ON_ONCE(!mm))
+			return;
+	}
+
+	/*
+	 * Compute the PGD index of the given pgd entry. This will be the
+	 * index of the ASI PGD entry to be updated.
+	 */
+	pgd_idx = pgdp - PTR_ALIGN_DOWN(pgdp, PAGE_SIZE);
+
+	for (i = 0; i < ARRAY_SIZE(mm->asi); i++) {
+		asi = mm->asi + i;
+
+		if (!asi_pgd(asi) || !asi_maps_user_addr(asi->class_id))
+			continue;
+
+		/*
+		 * We need to synchronize concurrent callers of
+		 * asi_clone_user_pgtbl() among themselves, as well as with
+		 * __asi_init_user_pgds(). The lock makes sure that reading
+		 * the unrestricted pgd and updating the corresponding
+		 * ASI pgd are not interleaved by concurrent calls.
+		 * We cannot rely on mm->page_table_lock here because it
+		 * is not always held when pgd/p4d_clear_bad() is called.
+		 */
+		spin_lock(&asi->pgd_lock);
+		set_pgd(asi_pgd(asi) + pgd_idx, READ_ONCE(*pgdp));
+		spin_unlock(&asi->pgd_lock);
+	}
+}
diff --git a/include/asm-generic/asi.h b/include/asm-generic/asi.h
index 4f033d3ef5929707fd280f74fc800193e45143c1..d103343292fad567dcd73e45e986fb3974e59898 100644
--- a/include/asm-generic/asi.h
+++ b/include/asm-generic/asi.h
@@ -95,6 +95,10 @@ void asi_flush_tlb_range(struct asi *asi, void *addr, size_t len) { }
 
 static inline void asi_check_boottime_disable(void) { }
 
+static inline void asi_clone_user_pgtbl(struct mm_struct *mm, pgd_t *pgdp) { }
+
+static inline bool asi_maps_user_addr(enum asi_class_id class_id) { return false; }
+
 #endif /* !CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION */
 
 #endif /* !_ASSEMBLY_ */
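
For illustration only (a sketch, not part of the diff above): the policy
helper added to asi.h is the single switch for this infrastructure. A
class that wanted its userspace mirrored into the restricted address
space would just return true for its class ID. Assuming the
ASI_CLASS_KVM enumerator from earlier in the series, the opt-in would
look roughly like:

	static inline bool asi_maps_user_addr(enum asi_class_id class_id)
	{
		/*
		 * Sketch only: return true for any class whose userspace
		 * mappings should be mirrored into the restricted PGD. This
		 * patch hard-codes the helper to false, so no class opts in.
		 */
		return class_id == ASI_CLASS_KVM;
	}

With that in place, asi_init() would copy the userspace PGD entries at
setup via __asi_init_user_pgds(), later updates to the unrestricted PGD
would be propagated by the asi_clone_user_pgtbl() hooks in the
p{4,g}d_populate*() and native_p{4,g}d_clear() paths, and
kvm_x86_init_asi_class() would drop ASI_TAINT_USER_DATA from the taint
policy.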