From patchwork Thu Sep 29 22:29:34 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 12994691 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id EBAC2C433FE for ; Thu, 29 Sep 2022 22:31:08 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 8698A8D000D; Thu, 29 Sep 2022 18:31:08 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 7F4138D0007; Thu, 29 Sep 2022 18:31:08 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 61D108D000D; Thu, 29 Sep 2022 18:31:08 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 544468D0007 for ; Thu, 29 Sep 2022 18:31:08 -0400 (EDT) Received: from smtpin01.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 2310741360 for ; Thu, 29 Sep 2022 22:31:08 +0000 (UTC) X-FDA: 79966569816.01.BE78F67 Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by imf08.hostedemail.com (Postfix) with ESMTP id 5424416001D for ; Thu, 29 Sep 2022 22:31:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664490666; x=1696026666; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=xqFSCPOHjTfopGQJjqWcfN6D1bVBxbnxIn0y2R87szg=; b=nJYGRQjPuB/tEyrrLJfFqpk47k93vVW3MYIfBCvbs5qe1U4GP2OTIARu WKuepsPER94981fBfd9AA9ETUU+uJB471iepQiQijTDareF++EjFocJRt h6kQnFkS3VOd0mT6AKNehUFPayGHdcjErRE2QONuDrrHycWmdpjlxS7IF lgj0+GV3W0zVXSmeHFAzlagZJMc7zGFKRnkUa1JvKgeeI3hY2dBm0UhHe m9ZCuHrWnEWRk7FCmlde5DYeY9P5vcEJGnGT2LVAiOpOt4LVTUtNHbfy3 YNfC6dV1kQo/LFYISZYLOOYhUdM3bwqRZV5uOKQ+DkWdzquzVB3XOLibM A==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="302957204" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="302957204" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:31:04 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="691016403" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="691016403" Received: from sergungo-mobl.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.251.25.88]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:31:02 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , joao.moreira@intel.com, John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [OPTIONAL/RFC v2 37/39] x86/cet: Add PTRACE interface for CET Date: Thu, 29 Sep 2022 15:29:34 -0700 Message-Id: <20220929222936.14584-38-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220929222936.14584-1-rick.p.edgecombe@intel.com> References: <20220929222936.14584-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1664490666; a=rsa-sha256; cv=none; b=jN4jYw70Vk5BZ7SHmadV/gMN149mGyU20chBo76TbY596W16t2IBfrN4b6dv6shil9Mn6M IO8txfZ7BX3jfHuNjw1azrZrKQKLUjBsGNOeFo5wtqZAGLVWQwSDxhfWhlBE2rGWOjMvL0 r8DIEeMlJbbG6tt/7gqZv3xNJUjma/8= ARC-Authentication-Results: i=1; imf08.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=nJYGRQjP; spf=pass (imf08.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.24 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1664490666; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=P/eZH5mFLRTwYwnE3K/NvGBQticJnxOtSjB8bSaE8v8=; b=kgGkAC6blmbkrgKgMz121it5XAhw30SW2EFFHHsgHJ/BfJD+YshtjyFCdfLufdkph/+TYQ U5tM4hg729cBgFxLDhuof8CghN4SBxJSR91g6xI00wvwEsUuwf7YdYmZ4PeR5hPMv2gsMv vB04DqViuA4BAYB4HFVPp0eSO2opROk= X-Stat-Signature: nmcutqkwxzqimf8t4rgxer35hb6az38h X-Rspamd-Queue-Id: 5424416001D X-Rspam-User: Authentication-Results: imf08.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=nJYGRQjP; spf=pass (imf08.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.24 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com X-Rspamd-Server: rspam01 X-HE-Tag: 1664490665-708068 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yu-cheng Yu Some applications (like GDB and CRIU) would like to tweak CET state via ptrace. This allows for existing functionality to continue to work for seized CET applications. Provide an interface based on the xsave buffer format of CET, but filter unneeded states to make the kernel’s job easier. There is already ptrace functionality for accessing xstate, but this does not include supervisor xfeatures. So there is not a completely clear place for where to put the CET state. Adding it to the user xfeatures regset would complicate that code, as it currently shares logic with signals which should not have supervisor features. Don’t add a general supervisor xfeature regset like the user one, because it is better to maintain flexibility for other supervisor xfeatures to define their own interface. For example, an xfeature may decide not to expose all of it’s state to userspace. A lot of enum values remain to be used, so just put it in dedicated CET regset. The only downside to not having a generic supervisor xfeature regset, is that apps need to be enlightened of any new supervisor xfeature exposed this way (i.e. they can’t try to have generic save/restore logic). But maybe that is a good thing, because they have to think through each new xfeature instead of encountering issues when new a new supervisor xfeature was added. By adding a CET regset, it also has the effect of including the CET state in a core dump, which could be useful for debugging. Inside the setter CET regset, filter out invalid state. Today this includes states disallowed by the HW and states involving Indirect Branch Tracking which the kernel does not currently support for usersapce. So this leaves three pieces of data that can be set, shadow stack enablement, WRSS enablement and the shadow stack pointer. It is worth noting that this is separate than enabling shadow stack via the arch_prctl()s. Enabling shadow stack involves more than just flipping the bit. The kernel is made aware that it has to do extra things when cloning or handling signals. That logic is triggered off of separate feature enablement state kept in the task struct. So the flipping on HW shadow stack enforcement without notifying the kernel to change its behavior would severely limit what an application could do without crashing. Since there is likely no use for this, only allow the CET registers to be set if shadow stack is already enabled via the arch_prctl()s. This will let apps like GDB toggle shadow stack enforcement for apps that already have shadow stack enabled, and minimize scenarios the kernel has to worry about. Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Signed-off-by: Yu-cheng Yu --- v2: - Check alignment on ssp. - Block IBT bits. - Handle init states instead of returning error. - Add verbose commit log justifying the design. Yu-Cheng v12: - Return -ENODEV when CET registers are in INIT state. - Check reserved/non-support bits from user input. arch/x86/include/asm/fpu/regset.h | 7 ++- arch/x86/include/asm/msr-index.h | 5 ++ arch/x86/kernel/fpu/regset.c | 95 +++++++++++++++++++++++++++++++ arch/x86/kernel/ptrace.c | 20 +++++++ include/uapi/linux/elf.h | 1 + 5 files changed, 125 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/fpu/regset.h b/arch/x86/include/asm/fpu/regset.h index 4f928d6a367b..8622184d87f5 100644 --- a/arch/x86/include/asm/fpu/regset.h +++ b/arch/x86/include/asm/fpu/regset.h @@ -7,11 +7,12 @@ #include -extern user_regset_active_fn regset_fpregs_active, regset_xregset_fpregs_active; +extern user_regset_active_fn regset_fpregs_active, regset_xregset_fpregs_active, + cetregs_active; extern user_regset_get2_fn fpregs_get, xfpregs_get, fpregs_soft_get, - xstateregs_get; + xstateregs_get, cetregs_get; extern user_regset_set_fn fpregs_set, xfpregs_set, fpregs_soft_set, - xstateregs_set; + xstateregs_set, cetregs_set; /* * xstateregs_active == regset_fpregs_active. Please refer to the comment diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h index 6674bdb096f3..fbc319682664 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -432,6 +432,11 @@ #define CET_RESERVED (BIT_ULL(6) | BIT_ULL(7) | BIT_ULL(8) | BIT_ULL(9)) #define CET_SUPPRESS BIT_ULL(10) #define CET_WAIT_ENDBR BIT_ULL(11) +#define CET_EG_LEG_BITMAP_BASE_MASK GENMASK_ULL(63, 13) + +#define CET_U_IBT_MASK (CET_ENDBR_EN | CET_LEG_IW_EN | CET_NO_TRACK_EN | \ + CET_NO_TRACK_EN | CET_SUPPRESS_DISABLE | CET_SUPPRESS | \ + CET_WAIT_ENDBR | CET_EG_LEG_BITMAP_BASE_MASK) #define MSR_IA32_PL0_SSP 0x000006a4 /* ring-0 shadow stack pointer */ #define MSR_IA32_PL1_SSP 0x000006a5 /* ring-1 shadow stack pointer */ diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c index 75ffaef8c299..440dc1921ee4 100644 --- a/arch/x86/kernel/fpu/regset.c +++ b/arch/x86/kernel/fpu/regset.c @@ -174,6 +174,101 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset, return ret; } +int cetregs_active(struct task_struct *target, const struct user_regset *regset) +{ +#ifdef CONFIG_X86_SHADOW_STACK + if (target->thread.shstk.size) + return regset->n; +#endif + return 0; +} + +int cetregs_get(struct task_struct *target, const struct user_regset *regset, + struct membuf to) +{ + struct fpu *fpu = &target->thread.fpu; + struct cet_user_state *cetregs; + + if (!boot_cpu_has(X86_FEATURE_SHSTK)) + return -ENODEV; + + sync_fpstate(fpu); + cetregs = get_xsave_addr(&fpu->fpstate->regs.xsave, XFEATURE_CET_USER); + if (!cetregs) { + /* + * The registers are the in the init state. The init values for + * these regs are zero, so just zero the output buffer. + */ + membuf_zero(&to, sizeof(struct cet_user_state)); + return 0; + } + + return membuf_write(&to, cetregs, sizeof(struct cet_user_state)); +} + +int cetregs_set(struct task_struct *target, const struct user_regset *regset, + unsigned int pos, unsigned int count, + const void *kbuf, const void __user *ubuf) +{ + struct fpu *fpu = &target->thread.fpu; + struct xregs_state *xsave = &fpu->fpstate->regs.xsave; + struct cet_user_state *cetregs, tmp; + bool ia32; + int r; + + if (!boot_cpu_has(X86_FEATURE_SHSTK) || + !cetregs_active(target, regset)) + return -ENODEV; + + ia32 = IS_ENABLED(CONFIG_IA32_EMULATION) && + target->thread_info.status & TS_COMPAT; + + r = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &tmp, 0, -1); + if (r) + return r; + + /* + * Some kernel instructions (IRET, etc) can cause exceptions in the case + * of disallowed CET register values. Just prevent invalid values. + */ + if ((tmp.user_ssp >= TASK_SIZE_MAX) || + (ia32 && !IS_ALIGNED(tmp.user_ssp, 4)) || + (!ia32 && !IS_ALIGNED(tmp.user_ssp, 8))) + return -EINVAL; + + /* + * Don't allow any IBT bits to be set because it is not supported by + * the kernel yet. Also don't allow reserved bits. + */ + if ((tmp.user_cet & CET_RESERVED) || (tmp.user_cet & CET_U_IBT_MASK)) + return -EINVAL; + + fpu_force_restore(fpu); + + /* + * Don't want to init the xfeature until the kernel will definetely + * overwrite it, otherwise if it inits and then fails out, it would + * end up initing it to random data. + */ + if (!xfeature_saved(xsave, XFEATURE_CET_USER) && + WARN_ON(init_xfeature(xsave, XFEATURE_CET_USER))) + return -ENODEV; + + cetregs = get_xsave_addr(xsave, XFEATURE_CET_USER); + if (WARN_ON(!cetregs)) { + /* + * This shouldn't ever be NULL because it was successfully + * inited above if needed. The only scenario would be if an + * xfeature was somehow saved in a buffer, but not enabled in + * xsave. + */ + return -ENODEV; + } + + memmove(cetregs, &tmp, sizeof(tmp)); + return 0; +} + #if defined CONFIG_X86_32 || defined CONFIG_IA32_EMULATION /* diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c index eed8a65d335d..f9e6635b69ce 100644 --- a/arch/x86/kernel/ptrace.c +++ b/arch/x86/kernel/ptrace.c @@ -51,6 +51,7 @@ enum x86_regset_32 { REGSET_XSTATE32, REGSET_TLS32, REGSET_IOPERM32, + REGSET_CET32, }; enum x86_regset_64 { @@ -58,6 +59,7 @@ enum x86_regset_64 { REGSET_FP64, REGSET_IOPERM64, REGSET_XSTATE64, + REGSET_CET64, }; #define REGSET_GENERAL \ @@ -1267,6 +1269,15 @@ static struct user_regset x86_64_regsets[] __ro_after_init = { .active = ioperm_active, .regset_get = ioperm_get }, + [REGSET_CET64] = { + .core_note_type = NT_X86_CET, + .n = sizeof(struct cet_user_state) / sizeof(u64), + .size = sizeof(u64), + .align = sizeof(u64), + .active = cetregs_active, + .regset_get = cetregs_get, + .set = cetregs_set + }, }; static const struct user_regset_view user_x86_64_view = { @@ -1336,6 +1347,15 @@ static struct user_regset x86_32_regsets[] __ro_after_init = { .active = ioperm_active, .regset_get = ioperm_get }, + [REGSET_CET32] = { + .core_note_type = NT_X86_CET, + .n = sizeof(struct cet_user_state) / sizeof(u64), + .size = sizeof(u64), + .align = sizeof(u64), + .active = cetregs_active, + .regset_get = cetregs_get, + .set = cetregs_set + }, }; static const struct user_regset_view user_x86_32_view = { diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h index c7b056af9ef0..11089731e2e9 100644 --- a/include/uapi/linux/elf.h +++ b/include/uapi/linux/elf.h @@ -406,6 +406,7 @@ typedef struct elf64_shdr { #define NT_386_TLS 0x200 /* i386 TLS slots (struct user_desc) */ #define NT_386_IOPERM 0x201 /* x86 io permission bitmap (1=deny) */ #define NT_X86_XSTATE 0x202 /* x86 extended state using xsave */ +#define NT_X86_CET 0x203 /* x86 CET state */ #define NT_S390_HIGH_GPRS 0x300 /* s390 upper register halves */ #define NT_S390_TIMER 0x301 /* s390 timer register */ #define NT_S390_TODCMP 0x302 /* s390 TOD clock comparator register */