From patchwork Fri Jan 10 18:40:31 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Brendan Jackman
X-Patchwork-Id: 13935234
Date: Fri, 10 Jan 2025 18:40:31 +0000
In-Reply-To: <20250110-asi-rfc-v2-v2-0-8419288bc805@google.com>
References: <20250110-asi-rfc-v2-v2-0-8419288bc805@google.com>
X-Mailer: b4 0.15-dev
Message-ID: <20250110-asi-rfc-v2-v2-5-8419288bc805@google.com>
Subject: [PATCH RFC v2 05/29] mm: asi: ASI support in interrupts/exceptions
From: Brendan Jackman
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
 "H. Peter Anvin", Andy Lutomirski, Peter Zijlstra, Richard Henderson,
 Matt Turner, Vineet Gupta, Russell King, Catalin Marinas, Will Deacon,
 Guo Ren, Brian Cain, Huacai Chen, WANG Xuerui, Geert Uytterhoeven,
 Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Jonas Bonn,
 Stefan Kristiansson, Stafford Horne, "James E.J. Bottomley",
 Helge Deller, Michael Ellerman, Nicholas Piggin, Christophe Leroy,
 Naveen N Rao, Madhavan Srinivasan, Paul Walmsley, Palmer Dabbelt,
 Albert Ou, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
 Christian Borntraeger, Sven Schnelle, Yoshinori Sato, Rich Felker,
 John Paul Adrian Glaubitz, "David S. Miller", Andreas Larsson,
 Richard Weinberger, Anton Ivanov, Johannes Berg, Chris Zankel,
 Max Filippov, Arnd Bergmann, Andrew Morton, Juri Lelli,
 Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
 Mel Gorman, Valentin Schneider, Uladzislau Rezki, Christoph Hellwig,
 Masami Hiramatsu, Mathieu Desnoyers, Mike Rapoport,
 Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
 Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter, Dennis Zhou,
 Tejun Heo, Christoph Lameter, Sean Christopherson, Paolo Bonzini,
 Ard Biesheuvel, Josh Poimboeuf, Pawan Gupta
Cc: x86@kernel.org, linux-kernel@vger.kernel.org,
 linux-alpha@vger.kernel.org, linux-snps-arc@lists.infradead.org,
 linux-arm-kernel@lists.infradead.org, linux-csky@vger.kernel.org,
 linux-hexagon@vger.kernel.org, loongarch@lists.linux.dev,
 linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org,
 linux-openrisc@vger.kernel.org, linux-parisc@vger.kernel.org,
 linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org,
 linux-s390@vger.kernel.org, linux-sh@vger.kernel.org,
 sparclinux@vger.kernel.org, linux-um@lists.infradead.org,
 linux-arch@vger.kernel.org, linux-mm@kvack.org,
 linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
 kvm@vger.kernel.org, linux-efi@vger.kernel.org, Brendan Jackman,
 Junaid Shahid

Add support for potentially switching address spaces from within
interrupts/exceptions/NMIs etc. An interrupt does not automatically
switch to the unrestricted address space. If it needs to access memory
that is not mapped in the restricted address space, it can switch using
the normal asi_exit() call.

On return from the outermost interrupt, if the target address space was
the restricted address space (e.g. we were in the critical code path
between ASI Enter and VM Enter), the restricted address space is
automatically restored. Otherwise, execution continues in the
unrestricted address space until the next explicit ASI Enter.

To keep track of when to restore the restricted address space, an
interrupt/exception nesting depth counter is maintained per-task. An
implementation without this counter is also possible, but the counter
has the additional benefit of letting us detect whether we are currently
executing inside an exception context, which would be useful in a later
patch.

Note that for KVM on SVM, this is not actually necessary, as NMIs are in
fact maskable via CLGI. It's not clear to me whether VMX has something
equivalent, but we will need this infrastructure in place for userspace
support anyway.

RFC: Once userspace ASI is implemented, this idtentry integration looks
a bit heavy-handed. For example, we don't need this logic for INT 80
emulation, so having it in DEFINE_IDTENTRY_RAW is confusing. It could
lead to a bug if the order of interrupt counter modifications and ASI
transition logic gets flipped around somehow.

checkpatch.pl SPACING is a false positive. AVOID_BUG ignored for RFC.

Checkpatch-args: --ignore=SPACING,AVOID_BUG
Signed-off-by: Junaid Shahid
Signed-off-by: Brendan Jackman
---
 arch/x86/include/asm/asi.h       | 68 ++++++++++++++++++++++++++++++++++++++--
 arch/x86/include/asm/idtentry.h  | 50 ++++++++++++++++++++++++-----
 arch/x86/include/asm/processor.h |  5 +++
 arch/x86/kernel/process.c        |  2 ++
 arch/x86/kernel/traps.c          | 22 +++++++++++++
 arch/x86/mm/asi.c                |  7 ++++-
 include/asm-generic/asi.h        | 10 ++++++
 7 files changed, 153 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h
index b9671ef2dd3278adceed18507fd260e21954d574..9a9a139518289fc65f26a4d1cd311aa52cc5357f 100644
--- a/arch/x86/include/asm/asi.h
+++ b/arch/x86/include/asm/asi.h
@@ -157,6 +157,11 @@ void asi_relax(void);
 /* Immediately exit the restricted address space if in it */
 void asi_exit(void);

+static inline void asi_init_thread_state(struct thread_struct *thread)
+{
+	thread->asi_state.intr_nest_depth = 0;
+}
+
 /* The target is the domain we'll enter when returning to process context. */
 static __always_inline struct asi *asi_get_target(struct task_struct *p)
 {
@@ -197,9 +202,10 @@ static __always_inline bool asi_is_relaxed(void)
 /*
  * Is the current task in the critical section?
  *
- * This is just the inverse of !asi_is_relaxed(). We have both functions in order to
- * help write intuitive client code. In particular, asi_is_tense returns false
- * when ASI is disabled, which is judged to make user code more obvious.
+ * This is just the inverse of asi_is_relaxed(). We have both functions in
+ * order to help write intuitive client code. In particular, asi_is_tense
+ * returns false when ASI is disabled, which is judged to make user code more
+ * obvious.
  */
 static __always_inline bool asi_is_tense(void)
 {
@@ -211,6 +217,62 @@ static __always_inline pgd_t *asi_pgd(struct asi *asi)
 	return asi ? asi->pgd : NULL;
 }

+static __always_inline void asi_intr_enter(void)
+{
+	if (static_asi_enabled() && asi_is_tense()) {
+		current->thread.asi_state.intr_nest_depth++;
+		barrier();
+	}
+}
+
+void __asi_enter(void);
+
+static __always_inline void asi_intr_exit(void)
+{
+	if (static_asi_enabled() && asi_is_tense()) {
+		/*
+		 * If an access to sensitive memory got reordered after the
+		 * decrement, the #PF handler for that access would see a value
+		 * of 0 for the counter and re-__asi_enter before returning to
+		 * the faulting access, triggering an infinite PF loop.
+		 */
+		barrier();
+
+		if (--current->thread.asi_state.intr_nest_depth == 0) {
+			/*
+			 * If the decrement got reordered after __asi_enter, an
+			 * interrupt that came between __asi_enter and the
+			 * decrement would always see a nonzero value for the
+			 * counter so it wouldn't call __asi_enter again and we
+			 * would return to process context in the wrong address
+			 * space.
+			 */
+			barrier();
+			__asi_enter();
+		}
+	}
+}
+
+/*
+ * Returns the nesting depth of interrupts/exceptions that have interrupted the
+ * ongoing critical section. If the current task is not in a critical section
+ * this is 0.
+ */
+static __always_inline int asi_intr_nest_depth(void)
+{
+	return current->thread.asi_state.intr_nest_depth;
+}
+
+/*
+ * Remember that interrupts/exceptions don't count as the critical section. If
+ * you want to know if the current task is in the critical section use
+ * asi_is_tense().
+ */
+static __always_inline bool asi_in_critical_section(void)
+{
+	return asi_is_tense() && !asi_intr_nest_depth();
+}
+
 #define INIT_MM_ASI(init_mm)						\
 	.asi_init_lock = __MUTEX_INITIALIZER(init_mm.asi_init_lock),

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index ad5c68f0509d4dfd0834303c0f9dabc93ef73aa4..9e00da0a3b08f83ca5e603dc2abbfd5fa3059811 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -12,6 +12,7 @@
 #include
 #include
+#include

 typedef void (*idtentry_t)(struct pt_regs *regs);

@@ -55,12 +56,15 @@ static __always_inline void __##func(struct pt_regs *regs);		\
 									\
 __visible noinstr void func(struct pt_regs *regs)			\
 {									\
-	irqentry_state_t state = irqentry_enter(regs);			\
+	irqentry_state_t state;						\
 									\
+	asi_intr_enter();						\
+	state = irqentry_enter(regs);					\
 	instrumentation_begin();					\
 	__##func (regs);						\
 	instrumentation_end();						\
 	irqentry_exit(regs, state);					\
+	asi_intr_exit();						\
 }									\
 									\
 static __always_inline void __##func(struct pt_regs *regs)
@@ -102,12 +106,15 @@ static __always_inline void __##func(struct pt_regs *regs,		\
 __visible noinstr void func(struct pt_regs *regs,			\
 			    unsigned long error_code)			\
 {									\
-	irqentry_state_t state = irqentry_enter(regs);			\
+	irqentry_state_t state;						\
 									\
+	asi_intr_enter();						\
+	state = irqentry_enter(regs);					\
 	instrumentation_begin();					\
 	__##func (regs, error_code);					\
 	instrumentation_end();						\
 	irqentry_exit(regs, state);					\
+	asi_intr_exit();						\
 }									\
 									\
 static __always_inline void __##func(struct pt_regs *regs,		\
@@ -139,7 +146,16 @@ static __always_inline void __##func(struct pt_regs *regs,		\
  * is required before the enter/exit() helpers are invoked.
  */
 #define DEFINE_IDTENTRY_RAW(func)					\
-__visible noinstr void func(struct pt_regs *regs)
+static __always_inline void __##func(struct pt_regs *regs);		\
+									\
+__visible noinstr void func(struct pt_regs *regs)			\
+{									\
+	asi_intr_enter();						\
+	__##func (regs);						\
+	asi_intr_exit();						\
+}									\
+									\
+static __always_inline void __##func(struct pt_regs *regs)

 /**
  * DEFINE_FREDENTRY_RAW - Emit code for raw FRED entry points
@@ -178,7 +194,18 @@ noinstr void fred_##func(struct pt_regs *regs)
  * is required before the enter/exit() helpers are invoked.
  */
 #define DEFINE_IDTENTRY_RAW_ERRORCODE(func)				\
-__visible noinstr void func(struct pt_regs *regs, unsigned long error_code)
+static __always_inline void __##func(struct pt_regs *regs,		\
+				     unsigned long error_code);		\
+									\
+__visible noinstr void func(struct pt_regs *regs, unsigned long error_code)\
+{									\
+	asi_intr_enter();						\
+	__##func (regs, error_code);					\
+	asi_intr_exit();						\
+}									\
+									\
+static __always_inline void __##func(struct pt_regs *regs,		\
+				     unsigned long error_code)

 /**
  * DECLARE_IDTENTRY_IRQ - Declare functions for device interrupt IDT entry
@@ -209,14 +236,17 @@ static void __##func(struct pt_regs *regs, u32 vector);		\
 __visible noinstr void func(struct pt_regs *regs,			\
 			    unsigned long error_code)			\
 {									\
-	irqentry_state_t state = irqentry_enter(regs);			\
+	irqentry_state_t state;						\
 	u32 vector = (u32)(u8)error_code;				\
 									\
+	asi_intr_enter();						\
+	state = irqentry_enter(regs);					\
 	kvm_set_cpu_l1tf_flush_l1d();					\
 	instrumentation_begin();					\
 	run_irq_on_irqstack_cond(__##func, regs, vector);		\
 	instrumentation_end();						\
 	irqentry_exit(regs, state);					\
+	asi_intr_exit();						\
 }									\
 									\
 static noinline void __##func(struct pt_regs *regs, u32 vector)
@@ -255,13 +285,16 @@ static __always_inline void instr_##func(struct pt_regs *regs)	\
 									\
 __visible noinstr void func(struct pt_regs *regs)			\
 {									\
-	irqentry_state_t state = irqentry_enter(regs);			\
+	irqentry_state_t state;						\
 									\
+	asi_intr_enter();						\
+	state = irqentry_enter(regs);					\
 	kvm_set_cpu_l1tf_flush_l1d();					\
 	instrumentation_begin();					\
 	instr_##func (regs);						\
 	instrumentation_end();						\
 	irqentry_exit(regs, state);					\
+	asi_intr_exit();						\
 }									\
 									\
 void fred_##func(struct pt_regs *regs)					\
@@ -294,13 +327,16 @@ static __always_inline void instr_##func(struct pt_regs *regs)	\
 									\
 __visible noinstr void func(struct pt_regs *regs)			\
 {									\
-	irqentry_state_t state = irqentry_enter(regs);			\
+	irqentry_state_t state;						\
 									\
+	asi_intr_enter();						\
+	state = irqentry_enter(regs);					\
 	kvm_set_cpu_l1tf_flush_l1d();					\
 	instrumentation_begin();					\
 	instr_##func (regs);						\
 	instrumentation_end();						\
 	irqentry_exit(regs, state);					\
+	asi_intr_exit();						\
 }									\
 									\
 void fred_##func(struct pt_regs *regs)					\
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index f02220e6b4df911d87e2fee4b497eade61a27161..a32a53405f45e4c0473fe081e216029cf5bd0cdd 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -508,6 +508,11 @@ struct thread_struct {
 	struct {
 		/* Domain to enter when returning to process context. */
 		struct asi *target;
+		/*
+		 * The depth of interrupts/exceptions interrupting an ASI
+		 * critical section
+		 */
+		int intr_nest_depth;
 	} asi_state;
 #endif

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index f63f8fd00a91f3d1171f307b92179556ba2d716d..44abc161820153b7f68664b97267658b8e011101 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -96,6 +96,8 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 #ifdef CONFIG_VM86
 	dst->thread.vm86 = NULL;
 #endif
+	asi_init_thread_state(&dst->thread);
+
 	/* Drop the copied pointer to current's fpstate */
 	dst->thread.fpu.fpstate = NULL;

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 2dbadf347b5f4f66625c4f49b76c41b412270d57..beea861da8d3e9a4e2afb3a92ed5f66f11d67bd6 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -65,6 +65,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -463,6 +464,27 @@ DEFINE_IDTENTRY_DF(exc_double_fault)
 	}
 #endif

+	/*
+	 * Do an asi_exit() only here because a #DF usually indicates
+	 * the system is in a really bad state, and we don't want to
+	 * cause any additional issue that would prevent us from
+	 * printing a correct stack trace.
+	 *
+	 * The additional issues are not related to a possible triple
+	 * fault, which can only occur if a fault is encountered while
+	 * invoking this handler, but here we are already executing it.
+	 * Instead, an ASI-induced #PF here could potentially end up
+	 * getting another #DF. For example, if there was some issue in
+	 * invoking the #PF handler. The handler for the second #DF
+	 * could then again cause an ASI-induced #PF leading back to the
+	 * same recursion.
+	 *
+	 * This is not needed in the espfix64 case above, since that
+	 * code is about turning a #DF into a #GP which is okay to
+	 * handle in the restricted domain. That's also why we don't
+	 * asi_exit() in the #GP handler.
+	 */
+	asi_exit();
 	irqentry_nmi_enter(regs);
 	instrumentation_begin();
 	notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_DF, SIGSEGV);
diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c
index 5baf563a078f5b3a6cd4b9f5e92baaf81b0774c4..054315d566c082c0925a00ce3a0877624c8b9957 100644
--- a/arch/x86/mm/asi.c
+++ b/arch/x86/mm/asi.c
@@ -235,7 +235,7 @@ static __always_inline void maybe_flush_data(struct asi *next_asi)
 	this_cpu_and(asi_taints, ~ASI_TAINTS_DATA_MASK);
 }

-static noinstr void __asi_enter(void)
+noinstr void __asi_enter(void)
 {
 	u64 asi_cr3;
 	struct asi *target = asi_get_target(current);
@@ -250,6 +250,7 @@ static noinstr void __asi_enter(void)
 	 * disabling preemption should be fine.
 	 */
 	VM_BUG_ON(preemptible());
+	VM_BUG_ON(current->thread.asi_state.intr_nest_depth != 0);

 	if (!target || target == this_cpu_read(curr_asi))
 		return;
@@ -290,6 +291,7 @@ noinstr void asi_enter(struct asi *asi)
 	if (!static_asi_enabled())
 		return;

+	VM_WARN_ON_ONCE(asi_intr_nest_depth());
 	VM_WARN_ON_ONCE(!asi);

 	/* Should not have an asi_enter() without a prior asi_relax(). */
@@ -305,6 +307,7 @@ EXPORT_SYMBOL_GPL(asi_enter);
 noinstr void asi_relax(void)
 {
 	if (static_asi_enabled()) {
+		VM_WARN_ON_ONCE(asi_intr_nest_depth());
 		barrier();
 		asi_set_target(current, NULL);
 	}
@@ -326,6 +329,8 @@ noinstr void asi_exit(void)
 	asi = this_cpu_read(curr_asi);

 	if (asi) {
+		WARN_ON_ONCE(asi_in_critical_section());
+
 		maybe_flush_control(NULL);

 		unrestricted_cr3 =
diff --git a/include/asm-generic/asi.h b/include/asm-generic/asi.h
index eedc961ee916a9e1da631ca489ea4a7bc9e6089f..7f542c59c2b8a2b74432e4edb7199f9171db8a84 100644
--- a/include/asm-generic/asi.h
+++ b/include/asm-generic/asi.h
@@ -52,6 +52,8 @@ static inline bool asi_is_relaxed(void) { return true; }

 static inline bool asi_is_tense(void) { return false; }

+static inline bool asi_in_critical_section(void) { return false; }
+
 static inline void asi_exit(void) { }

 static inline bool asi_is_restricted(void) { return false; }
@@ -65,6 +67,14 @@ static inline pgd_t *asi_pgd(struct asi *asi) { return NULL; }

 static inline void asi_handle_switch_mm(void) { }

+static inline void asi_init_thread_state(struct thread_struct *thread) { }
+
+static inline void asi_intr_enter(void) { }
+
+static inline int asi_intr_nest_depth(void) { return 0; }
+
+static inline void asi_intr_exit(void) { }
+
 #define static_asi_enabled() false

 static inline void asi_check_boottime_disable(void) { }
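
Illustration only, not part of the patch: the nesting-depth bookkeeping
described in the commit message can be modelled in plain userspace C to
see when the restricted address space gets restored. Everything named
model_* below is hypothetical and exists only for this sketch. The
"restricted" flag stands in for the real CR3 switch, setting it stands
in for __asi_enter(), and the compiler barriers and the
static_asi_enabled() check from the real helpers are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-task and per-CPU ASI state. */
struct model {
	bool tense;		/* task has a target domain set */
	bool restricted;	/* CPU is on the restricted page tables */
	int intr_nest_depth;	/* mirrors asi_state.intr_nest_depth */
};

/* Mirrors asi_intr_enter(): only count entries that interrupt a tense task. */
static void model_intr_enter(struct model *m)
{
	if (m->tense)
		m->intr_nest_depth++;
}

/* Mirrors asi_exit(): a handler that needs unmapped memory leaves the domain. */
static void model_asi_exit(struct model *m)
{
	m->restricted = false;
}

/* Mirrors asi_intr_exit(): only the outermost exit restores the domain. */
static void model_intr_exit(struct model *m)
{
	if (!m->tense)
		return;

	assert(m->intr_nest_depth > 0);
	if (--m->intr_nest_depth == 0)
		m->restricted = true;	/* stands in for __asi_enter() */
}

/* Mirrors asi_in_critical_section(): tense and not inside any handler. */
static bool model_in_critical_section(const struct model *m)
{
	return m->tense && m->intr_nest_depth == 0;
}
```

Under these assumptions, a tense task that takes an interrupt and then a
nested NMI whose handler calls model_asi_exit() stays unrestricted when
the NMI returns (depth 1), and only goes back to the restricted address
space when the outer interrupt returns (depth 0). A relaxed task never
touches the counter at all.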