From patchwork Fri Jan 10 18:40:51 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Brendan Jackman
X-Patchwork-Id: 13935254
Date: Fri, 10 Jan 2025 18:40:51 +0000
In-Reply-To: <20250110-asi-rfc-v2-v2-0-8419288bc805@google.com>
Mime-Version: 1.0
References: <20250110-asi-rfc-v2-v2-0-8419288bc805@google.com>
X-Mailer: b4 0.15-dev
Message-ID: <20250110-asi-rfc-v2-v2-25-8419288bc805@google.com>
Subject: [PATCH RFC v2 25/29] mm: asi: Restricted execution for bare-metal
 processes
From: Brendan Jackman
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
 "H. Peter Anvin", Andy Lutomirski, Peter Zijlstra, Richard Henderson,
 Matt Turner, Vineet Gupta, Russell King, Catalin Marinas, Will Deacon,
 Guo Ren, Brian Cain, Huacai Chen, WANG Xuerui, Geert Uytterhoeven,
 Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Jonas Bonn,
 Stefan Kristiansson, Stafford Horne, "James E.J. Bottomley",
 Helge Deller, Michael Ellerman, Nicholas Piggin, Christophe Leroy,
 Naveen N Rao, Madhavan Srinivasan, Paul Walmsley, Palmer Dabbelt,
 Albert Ou, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
 Christian Borntraeger, Sven Schnelle, Yoshinori Sato, Rich Felker,
 John Paul Adrian Glaubitz,
Miller" , Andreas Larsson , Richard Weinberger , Anton Ivanov , Johannes Berg , Chris Zankel , Max Filippov , Arnd Bergmann , Andrew Morton , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Uladzislau Rezki , Christoph Hellwig , Masami Hiramatsu , Mathieu Desnoyers , Mike Rapoport , Arnaldo Carvalho de Melo , Namhyung Kim , Mark Rutland , Alexander Shishkin , Jiri Olsa , Ian Rogers , Adrian Hunter , Dennis Zhou , Tejun Heo , Christoph Lameter , Sean Christopherson , Paolo Bonzini , Ard Biesheuvel , Josh Poimboeuf , Pawan Gupta Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-alpha@vger.kernel.org, linux-snps-arc@lists.infradead.org, linux-arm-kernel@lists.infradead.org, linux-csky@vger.kernel.org, linux-hexagon@vger.kernel.org, loongarch@lists.linux.dev, linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org, linux-openrisc@vger.kernel.org, linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-um@lists.infradead.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, kvm@vger.kernel.org, linux-efi@vger.kernel.org, Brendan Jackman X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: DB71E40003 X-Stat-Signature: h8r7ezpnpg7rka4o4iwi9tkoksafxqza X-Rspam-User: X-HE-Tag: 1736534503-569994 X-HE-Meta: U2FsdGVkX1/eIcO13nSSi7ZAiH5Hh5BeidpXm5mzr6gB5g3k53DqvCai5G85Z7YtlukFodx8E/EBOy+rj1hot5rUEWtP+LyBEkWh/Jqf/QiuGU9arxZ/4vdPW6D3+TVeoO4YvouAKpbNVlQfhZfFvxJQo8nVDpLpdE49/e676f9wPOI5zt4FzD7u9pO3Wncpnzb3SbpQG6tC1s16npALir8s4ccW0VmVnWRie/OUe75KfVDGEreqJGcvbFfi1dJdBlCskwXWSUq+YVqwCZFC+NKzuCV3LkqGtzuLvZt+Hk/k3w20hObOvBQKxqRGttaP641K+PSu3/kse49lVklCI4o8IiZczA4pyroNxbN6ZlSrxZeSi+4oQAta5qpTBXa13+gWoclVpa3qioeyWxRlaUcHJ+3OL8v3V7Gwq63cFUv2FTJhrS0xERMv6Y4bwdFE2pZrp9r9l6tg8R/kS+VBr2EQKE13XpGG7ZiV1XThCXKvyHd7RMf9KlAznii/+kBOFdPmuj+cSXCrt0hXuP33EE4PjFUH6Q6HWM++bLQRIp72Mhz4uOTxmBcQCRimMeVToq5164mXkDSnjcsOJtL47r02mu6oWBBq/bAWcjOxIVLihZlmjC1CWML2EgE9x07Sw79PXvFHiMqlAkjsWoyFs49ScmbgWYH182n71s1O0GcRHHFvJUX7hdZWavzhE9y7fW6ltJbc8O2JSrjiPssrCH6iix/sM2LB0QRTkRiLeQ/lhNSg9TmAsjVgieF41qreCxmgC9FRC/gjn8xS15zLUkhZg0Xbso7Wk/Vane8wcxY1Jo+X9TwKKDHs2uPeiNjsz29ZvE43NkRPxLg6MGlwqy1daBIOm98zacVFPfVx88yeJwsurWFp1W+pVeJU4kuS/Rj3bLh1ZmHo9rFae6wiMcak42AA5AHB22Nu6fL75cbFV+CdZppe9uBeQv9zP8OnI24M9EKEcOa78Zf23Dl Jhg00Aac oV/T6wZlxkPq7BW7UsruDEFnADZsZ3nJMNJEckTBqbq2u3lSbCDEkCq3LDUBlPJpfUu53OF0X56PX9UOxGYk0E5143Mz3lN4Yd3lj2bhVybIphsdCw5ayNYl/i3PeuDXpKQ+t/995RflqGsTnfJR5nFsbO+dU4q6vxawKqqxxDCXR+7IDUb7qjVxuBmLbfNXR+m1xHSDL15t7auIorvqbCSL0b6zl1Nbpd+3vbV5445URz+XLoeGDIkZ9WQlry1XnFsNh+pIAZWzlok/gnJxTrKbj6VDFtxTIQx9V7o3ru1wzHfQzvdfOQ9Bj/icrHgZfhN7t2ILU2JwbH078m76nusS1tF2Qum/lvK+NAY3d+TYiyRHL0FJBgiIwo8E3nKgxRdjgFTMcJyZgup+TqsrcY/uY1pqaOZGyOdCa9JtsRMDmITo/wkxebqIZLkVmPuXRSXgwe1wMwOTYX3/Vqe6IsHJ8xqWeUiS2Dlg/R9QqBEWBPrOzQMDS7heS4SdBC0edln4YF+M+3qotchSN+NLY9e2IL6DV6AFOkSU9Mbupqk4rjR/SKxn5C397yCsr8aQtg4IC X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Now userspace gets a restricted address space too. The critical section begins on exit to userspace and ends when it makes a system call. Other entries from userspace just interrupt the critical section via asi_intr_enter(). The reason why system calls have to actually asi_relax() (i.e. 
Signed-off-by: Brendan Jackman
---
 arch/x86/include/asm/asi.h   |  8 ++++++--
 arch/x86/mm/asi.c            | 49 ++++++++++++++++++++++++++++++++++++++++----
 include/asm-generic/asi.h    |  9 +++++++-
 include/linux/entry-common.h | 11 ++++++++++
 init/main.c                  |  2 ++
 kernel/entry/common.c        |  1 +
 kernel/fork.c                |  4 +++-
 7 files changed, 76 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h
index e925d7d2cfc85bca8480c837548654e7a5a7009e..c3c1a57f0147ae9bd11d89c8bf7c8a4477728f51 100644
--- a/arch/x86/include/asm/asi.h
+++ b/arch/x86/include/asm/asi.h
@@ -140,19 +140,23 @@ DECLARE_PER_CPU_ALIGNED(struct asi *, curr_asi);
 
 void asi_check_boottime_disable(void);
 
-void asi_init_mm_state(struct mm_struct *mm);
+int asi_init_mm_state(struct mm_struct *mm);
 
 int asi_init_class(enum asi_class_id class_id, struct asi_taint_policy *taint_policy);
+void asi_init_userspace_class(void);
 void asi_uninit_class(enum asi_class_id class_id);
 const char *asi_class_name(enum asi_class_id class_id);
 
 int asi_init(struct mm_struct *mm, enum asi_class_id class_id, struct asi **out_asi);
 void asi_destroy(struct asi *asi);
+void asi_destroy_userspace(struct mm_struct *mm);
 void asi_clone_user_pgtbl(struct mm_struct *mm, pgd_t *pgdp);
 
 /* Enter an ASI domain (restricted address space) and begin the critical section. */
 void asi_enter(struct asi *asi);
+void asi_enter_userspace(void);
+
 /*
  * Leave the "tense" state if we are in it, i.e. end the critical section. We
  * will stay relaxed until the next asi_enter.
@@ -294,7 +298,7 @@ void asi_handle_switch_mm(void);
  */
 static inline bool asi_maps_user_addr(enum asi_class_id class_id)
 {
-	return false;
+	return class_id == ASI_CLASS_USERSPACE;
 }
 
 #endif /* CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION */
diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c
index 093103c1bc2677c81d68008aca064fab53b73a62..1e9dc568e79e8686a4dbf47f765f2c2535d025ec 100644
--- a/arch/x86/mm/asi.c
+++ b/arch/x86/mm/asi.c
@@ -25,6 +25,7 @@ const char *asi_class_names[] = {
 #if IS_ENABLED(CONFIG_KVM)
 	[ASI_CLASS_KVM] = "KVM",
 #endif
+	[ASI_CLASS_USERSPACE] = "userspace",
 };
 
 DEFINE_PER_CPU_ALIGNED(struct asi *, curr_asi);
@@ -67,6 +68,32 @@ int asi_init_class(enum asi_class_id class_id, struct asi_taint_policy *taint_po
 }
 EXPORT_SYMBOL_GPL(asi_init_class);
 
+void __init asi_init_userspace_class(void)
+{
+	static struct asi_taint_policy policy = {
+		/*
+		 * Prevent going to userspace with sensitive data potentially
+		 * left in sidechannels by code running in the unrestricted
+		 * address space, or another MM. Note we don't check for guest
+		 * data here. This reflects the assumption that the guest trusts
+		 * its VMM (absent fancy HW features, which are orthogonal).
+		 */
+		.protect_data = ASI_TAINT_KERNEL_DATA | ASI_TAINT_OTHER_MM_DATA,
+		/*
+		 * Don't go into userspace with control flow state controlled by
+		 * other processes, or any KVM guest the process is running.
+		 * Note this bit is about protecting userspace from other parts
+		 * of the system, while data_taints is about protecting other
+		 * parts of the system from the guest.
+		 */
+		.prevent_control = ASI_TAINT_GUEST_CONTROL | ASI_TAINT_OTHER_MM_CONTROL,
+		.set = ASI_TAINT_USER_CONTROL | ASI_TAINT_USER_DATA,
+	};
+	int err = asi_init_class(ASI_CLASS_USERSPACE, &policy);
+
+	WARN_ON(err);
+}
+
 void asi_uninit_class(enum asi_class_id class_id)
 {
 	if (!boot_cpu_has(X86_FEATURE_ASI))
@@ -385,7 +412,8 @@ int asi_init(struct mm_struct *mm, enum asi_class_id class_id, struct asi **out_
 	int err = 0;
 	uint i;
 
-	*out_asi = NULL;
+	if (out_asi)
+		*out_asi = NULL;
 
 	if (!boot_cpu_has(X86_FEATURE_ASI))
 		return 0;
@@ -424,7 +452,7 @@ int asi_init(struct mm_struct *mm, enum asi_class_id class_id, struct asi **out_
 exit_unlock:
 	if (err)
 		__asi_destroy(asi);
-	else
+	else if (out_asi)
 		*out_asi = asi;
 
 	__asi_init_user_pgds(mm, asi);
@@ -515,6 +543,12 @@ static __always_inline void maybe_flush_data(struct asi *next_asi)
 	this_cpu_and(asi_taints, ~ASI_TAINTS_DATA_MASK);
 }
 
+void asi_destroy_userspace(struct mm_struct *mm)
+{
+	VM_BUG_ON(!asi_class_initialized(ASI_CLASS_USERSPACE));
+	asi_destroy(&mm->asi[ASI_CLASS_USERSPACE]);
+}
+
 noinstr void __asi_enter(void)
 {
 	u64 asi_cr3;
@@ -584,6 +618,11 @@ noinstr void asi_enter(struct asi *asi)
 }
 EXPORT_SYMBOL_GPL(asi_enter);
 
+noinstr void asi_enter_userspace(void)
+{
+	asi_enter(&current->mm->asi[ASI_CLASS_USERSPACE]);
+}
+
 noinstr void asi_relax(void)
 {
 	if (static_asi_enabled()) {
@@ -633,13 +672,15 @@ noinstr void asi_exit(void)
 }
 EXPORT_SYMBOL_GPL(asi_exit);
 
-void asi_init_mm_state(struct mm_struct *mm)
+int asi_init_mm_state(struct mm_struct *mm)
 {
 	if (!boot_cpu_has(X86_FEATURE_ASI))
-		return;
+		return 0;
 
 	memset(mm->asi, 0, sizeof(mm->asi));
 	mutex_init(&mm->asi_init_lock);
+
+	return asi_init(mm, ASI_CLASS_USERSPACE, NULL);
 }
 
 void asi_handle_switch_mm(void)
diff --git a/include/asm-generic/asi.h b/include/asm-generic/asi.h
index d103343292fad567dcd73e45e986fb3974e59898..c93f9e779ce1fa61e3df7835f5ab744cce7d667b 100644
--- a/include/asm-generic/asi.h
+++ b/include/asm-generic/asi.h
@@ -15,6 +15,7 @@ enum asi_class_id {
 #if IS_ENABLED(CONFIG_KVM)
 	ASI_CLASS_KVM,
 #endif
+	ASI_CLASS_USERSPACE,
 	ASI_MAX_NUM_CLASSES,
 };
 static_assert(order_base_2(X86_CR3_ASI_PCID_BITS) <= ASI_MAX_NUM_CLASSES);
@@ -37,8 +38,10 @@ int asi_init_class(enum asi_class_id class_id,
 
 static inline void asi_uninit_class(enum asi_class_id class_id) { }
 
+static inline void asi_init_userspace_class(void) { }
+
 struct mm_struct;
-static inline void asi_init_mm_state(struct mm_struct *mm) { }
+static inline int asi_init_mm_state(struct mm_struct *mm) { return 0; }
 
 static inline int asi_init(struct mm_struct *mm, enum asi_class_id class_id,
 			   struct asi **out_asi)
@@ -48,8 +51,12 @@ static inline int asi_init(struct mm_struct *mm, enum asi_class_id class_id,
 
 static inline void asi_destroy(struct asi *asi) { }
 
+static inline void asi_destroy_userspace(struct mm_struct *mm) { }
+
 static inline void asi_enter(struct asi *asi) { }
 
+static inline void asi_enter_userspace(void) { }
+
 static inline void asi_relax(void) { }
 
 static inline bool asi_is_relaxed(void) { return true; }
diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index 1e50cdb83ae501467ecc30ee52f1379d409f962e..f04c4c038556f84ddf3bc09b6c1dd22a9dbd2f6b 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -191,6 +191,16 @@ static __always_inline long syscall_enter_from_user_mode(struct pt_regs *regs, l
 {
 	long ret;
 
+	/*
+	 * End the ASI critical section for userspace. Syscalls are the
+	 * only place this happens - all other entry from userspace is
+	 * handled via ASI's interrupt-tracking. The reason syscalls are
+	 * special is that this is where it's possible to switch to another
+	 * ASI domain within the same task (i.e. KVM_RUN), so an asi_relax()
+	 * is required here in case of an upcoming asi_enter().
+	 */
+	asi_relax();
+
 	enter_from_user_mode(regs);
 
 	instrumentation_begin();
@@ -355,6 +365,7 @@ static __always_inline void exit_to_user_mode_prepare(struct pt_regs *regs)
  */
 static __always_inline void exit_to_user_mode(void)
 {
+	instrumentation_begin();
 	trace_hardirqs_on_prepare();
 	lockdep_hardirqs_on_prepare();
diff --git a/init/main.c b/init/main.c
index c4778edae7972f512d5eefe8400075ac35a70d1c..d19e149d385e8321d2f3e7c28aa75802af62d09c 100644
--- a/init/main.c
+++ b/init/main.c
@@ -953,6 +953,8 @@ void start_kernel(void)
 	/* Architectural and non-timekeeping rng init, before allocator init */
 	random_init_early(command_line);
 
+	asi_init_userspace_class();
+
 	/*
 	 * These use large bootmem allocations and must precede
 	 * initalization of page allocator
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index 5b6934e23c21d36a3238dc03e391eb9e3beb4cfb..874254ed5958d62eaeaef4fe3e8c02e56deaf5ed 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -218,6 +218,7 @@ __visible noinstr void syscall_exit_to_user_mode(struct pt_regs *regs)
 	__syscall_exit_to_user_mode_work(regs);
 	instrumentation_end();
 	exit_to_user_mode();
+	asi_enter_userspace();
 }
 
 noinstr void irqentry_enter_from_user_mode(struct pt_regs *regs)
diff --git a/kernel/fork.c b/kernel/fork.c
index bb73758790d08112265d398b16902ff9a4c2b8fe..54068d2415939b92409ca8a45111176783c6acbd 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -917,6 +917,7 @@ void __mmdrop(struct mm_struct *mm)
 	/* Ensure no CPUs are using this as their lazy tlb mm */
 	cleanup_lazy_tlbs(mm);
 
+	asi_destroy_userspace(mm);
 	WARN_ON_ONCE(mm == current->active_mm);
 	mm_free_pgd(mm);
 	destroy_context(mm);
@@ -1297,7 +1298,8 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 	if (mm_alloc_pgd(mm))
 		goto fail_nopgd;
 
-	asi_init_mm_state(mm);
+	if (asi_init_mm_state(mm))
+		goto fail_nocontext;
 
 	if (init_new_context(p, mm))
 		goto fail_nocontext;