From patchwork Thu Sep 29 22:28:58 2022
X-Patchwork-Submitter: Rick Edgecombe
X-Patchwork-Id: 12994654
From: Rick Edgecombe
To: x86@kernel.org, "H . Peter Anvin", Thomas Gleixner, Ingo Molnar,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org,
 Arnd Bergmann, Andy Lutomirski, Balbir Singh, Borislav Petkov,
 Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov, Florian Weimer,
 "H . J . Lu", Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz,
 Nadav Amit, Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
 "Ravi V . Shankar", Weijiang Yang, "Kirill A . Shutemov",
 joao.moreira@intel.com, John Allen, kcc@google.com, eranian@google.com,
 rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com
Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu
Subject: [PATCH v2 01/39] Documentation/x86: Add CET description
Date: Thu, 29 Sep 2022 15:28:58 -0700
Message-Id: <20220929222936.14584-2-rick.p.edgecombe@intel.com>
In-Reply-To: <20220929222936.14584-1-rick.p.edgecombe@intel.com>
References: <20220929222936.14584-1-rick.p.edgecombe@intel.com>

From: Yu-cheng Yu

Introduce a new document on Control-flow Enforcement Technology (CET).

Signed-off-by: Yu-cheng Yu
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
Cc: Kees Cook
Signed-off-by: Bagas Sanjaya
---
v2:
 - Updated to new arch_prctl() API
 - Add bit about new proc status

v1:
 - Update and clarify the docs.
 - Moved kernel parameters documentation to other patch.

 Documentation/x86/cet.rst   | 140 ++++++++++++++++++++++++++++++++++++
 Documentation/x86/index.rst |   1 +
 2 files changed, 141 insertions(+)
 create mode 100644 Documentation/x86/cet.rst

diff --git a/Documentation/x86/cet.rst b/Documentation/x86/cet.rst
new file mode 100644
index 000000000000..4a0dfb6830f9
--- /dev/null
+++ b/Documentation/x86/cet.rst
@@ -0,0 +1,140 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=========================================
+Control-flow Enforcement Technology (CET)
+=========================================
+
+Overview
+========
+
+Control-flow Enforcement Technology (CET) is a term referring to several
+related x86 processor features that provide protection against control
+flow hijacking attacks. The HW feature itself can be set up to protect
+both applications and the kernel. Only user-mode protection is implemented
+in the 64-bit kernel.
+
+CET introduces Shadow Stack and Indirect Branch Tracking. Shadow stack is
+a secondary stack allocated from memory and cannot be directly modified by
+applications. When executing a CALL instruction, the processor pushes the
+return address to both the normal stack and the shadow stack. Upon
+function return, the processor pops the shadow stack copy and compares it
+to the normal stack copy.
If the two differ, the processor raises a
+control-protection fault. Indirect branch tracking verifies that indirect
+CALL/JMP targets are intended, as marked by the compiler with 'ENDBR'
+opcodes. Not all CPUs have both Shadow Stack and Indirect Branch Tracking,
+and only Shadow Stack is currently supported in the kernel.
+
+The Kconfig option is X86_SHADOW_STACK, and it can be disabled with
+the kernel parameter clearcpuid, like this: "clearcpuid=shstk".
+
+To build a CET-enabled kernel, Binutils v2.31 and GCC v8.1 or LLVM v10.0.1
+or later are required. To build a CET-enabled application, GLIBC v2.28 or
+later is also required.
+
+At run time, /proc/cpuinfo shows CET features if the processor supports
+CET.
+
+Application Enabling
+====================
+
+An application's CET capability is marked in its ELF header and can be
+verified from readelf/llvm-readelf output:
+
+    readelf -n | grep -a SHSTK
+        properties: x86 feature: SHSTK
+
+The kernel does not process these applications directly. Applications must
+enable them using the interface described in section 4. Typically this
+would be done in the dynamic loader or static runtime objects, as is the
+case in glibc.
+
+Backward Compatibility
+======================
+
+GLIBC provides a few CET tunables via the GLIBC_TUNABLES environment
+variable:
+
+GLIBC_TUNABLES=glibc.tune.hwcaps=-SHSTK,-WRSS
+    Turn off SHSTK/WRSS.
+
+GLIBC_TUNABLES=glibc.tune.x86_shstk=
+    This controls how dlopen() handles SHSTK legacy libraries::
+
+        on         - continue with SHSTK enabled;
+        permissive - continue with SHSTK off.
+
+Details can be found in the GLIBC manual pages.
+
+CET arch_prctl()'s
+==================
+
+ELF features should be enabled by the loader using the arch_prctl()s below.
+
+arch_prctl(ARCH_CET_ENABLE, unsigned int feature)
+    Enable a single feature specified in 'feature'. Can only operate on
+    one feature at a time.
+
+arch_prctl(ARCH_CET_DISABLE, unsigned int feature)
+    Disable a single feature specified in 'feature'.
Can only operate on
+    one feature at a time.
+
+arch_prctl(ARCH_CET_LOCK, unsigned int features)
+    Lock in features at their current enabled or disabled status.
+
+The return values are as follows:
+    On success, return 0. On error, errno can be::
+
+        -EPERM if any of the passed features are locked.
+        -EOPNOTSUPP if the feature is not supported by the hardware or
+         disabled by kernel parameter.
+        -EINVAL if the arguments are invalid (non-existent feature, etc.)
+
+Currently shadow stack and WRSS are supported via this interface. WRSS
+can only be enabled with shadow stack, and is automatically disabled
+if shadow stack is disabled.
+
+Proc status
+===========
+To check if an application is actually running with shadow stack, the
+user can read /proc/$PID/arch_status. It will report "wrss" or
+"shstk" depending on what is enabled.
+
+The implementation of the Shadow Stack
+======================================
+
+Shadow Stack size
+-----------------
+
+A task's shadow stack is allocated from memory to a fixed size of
+MIN(RLIMIT_STACK, 4 GB). In other words, the shadow stack is allocated to
+the maximum size of the normal stack, but capped to 4 GB. However,
+because a compat-mode application's address space is smaller, each of its
+threads' shadow stacks is sized MIN(1/4 RLIMIT_STACK, 4 GB).
+
+Signal
+------
+
+By default, the main program and its signal handlers use the same shadow
+stack. Because the shadow stack stores only return addresses, a large
+shadow stack covers the condition that both the program stack and the
+signal alternate stack run out.
+
+The kernel creates a restore token for the shadow stack and pushes the
+restorer address to the shadow stack. It then verifies that token when
+restoring from the signal handler.
+
+Fork
+----
+
+The shadow stack's vma has VM_SHADOW_STACK flag set; its PTEs are required
+to be read-only and dirty.
When a shadow stack PTE is not RO and dirty, a
+shadow stack access triggers a page fault with the shadow stack access bit set
+in the page fault error code.
+
+When a task forks a child, its shadow stack PTEs are copied and both the
+parent's and the child's shadow stack PTEs are cleared of the dirty bit.
+Upon the next shadow stack access, the resulting shadow stack page fault
+is handled by page copy/re-use.
+
+When a pthread child is created, the kernel allocates a new shadow stack
+for the new thread.
diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index c73d133fd37c..9ac03055c4b5 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -22,6 +22,7 @@ x86-specific Documentation
    mtrr
    pat
    intel-hfi
+   cet
    iommu
    intel_txt
    amd-memory-encryption

From patchwork Thu Sep 29 22:28:59 2022
X-Patchwork-Submitter: Rick Edgecombe
X-Patchwork-Id: 12994656
From: Rick Edgecombe
To: x86@kernel.org, "H . Peter Anvin", Thomas Gleixner, Ingo Molnar,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org,
 Arnd Bergmann, Andy Lutomirski, Balbir Singh, Borislav Petkov,
 Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov, Florian Weimer,
 "H . J . Lu", Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz,
 Nadav Amit, Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
 "Ravi V . Shankar", Weijiang Yang, "Kirill A . Shutemov",
 joao.moreira@intel.com, John Allen, kcc@google.com, eranian@google.com,
 rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com
Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu
Subject: [PATCH v2 02/39] x86/cet/shstk: Add Kconfig option for Shadow Stack
Date: Thu, 29 Sep 2022 15:28:59 -0700
Message-Id: <20220929222936.14584-3-rick.p.edgecombe@intel.com>
In-Reply-To: <20220929222936.14584-1-rick.p.edgecombe@intel.com>
References: <20220929222936.14584-1-rick.p.edgecombe@intel.com>

From: Yu-cheng Yu

Shadow Stack provides protection against function return address
corruption. It is active when the processor supports it, the kernel has
CONFIG_X86_SHADOW_STACK enabled, and the application is built for the
feature. This is only implemented for the 64-bit kernel. When it is
enabled, legacy non-Shadow Stack applications continue to work, but without
protection.

Signed-off-by: Yu-cheng Yu
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
Cc: Kees Cook
---
v2:
 - Remove already wrong kernel size increase info (tglx)
 - Change prompt to remove "Intel" (tglx)
 - Update line about what CPUs are supported (Dave)

Yu-cheng v25:
 - Remove X86_CET and use X86_SHADOW_STACK directly.

Yu-cheng v24:
 - Update for the splitting X86_CET to X86_SHADOW_STACK and X86_IBT.

 arch/x86/Kconfig           | 18 ++++++++++++++++++
 arch/x86/Kconfig.assembler |  5 +++++
 2 files changed, 23 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f9920f1341c8..b68eb75887b8 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -26,6 +26,7 @@ config X86_64
 	depends on 64BIT
 	# Options that are inherently 64-bit kernel only:
 	select ARCH_HAS_GIGANTIC_PAGE
+	select ARCH_HAS_SHADOW_STACK
 	select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
 	select ARCH_USE_CMPXCHG_LOCKREF
 	select HAVE_ARCH_SOFT_DIRTY
@@ -1936,6 +1937,23 @@ config X86_SGX

 	  If unsure, say N.
+config ARCH_HAS_SHADOW_STACK
+	def_bool n
+
+config X86_SHADOW_STACK
+	prompt "X86 Shadow Stack"
+	def_bool n
+	depends on ARCH_HAS_SHADOW_STACK
+	select ARCH_USES_HIGH_VMA_FLAGS
+	help
+	  Shadow Stack protection is a hardware feature that detects function
+	  return address corruption. Today the kernel's support is limited to
+	  virtualizing it in KVM guests.
+
+	  CPUs supporting shadow stacks were first released in 2020.
+
+	  If unsure, say N.
+
 config EFI
 	bool "EFI runtime service support"
 	depends on ACPI
diff --git a/arch/x86/Kconfig.assembler b/arch/x86/Kconfig.assembler
index 26b8c08e2fc4..00c79dd93651 100644
--- a/arch/x86/Kconfig.assembler
+++ b/arch/x86/Kconfig.assembler
@@ -19,3 +19,8 @@ config AS_TPAUSE
 	def_bool $(as-instr,tpause %ecx)
 	help
 	  Supported by binutils >= 2.31.1 and LLVM integrated assembler >= V7
+
+config AS_WRUSS
+	def_bool $(as-instr,wrussq %rax$(comma)(%rbx))
+	help
+	  Supported by binutils >= 2.31 and LLVM integrated assembler

From patchwork Thu Sep 29 22:29:00 2022
X-Patchwork-Submitter: Rick Edgecombe
X-Patchwork-Id: 12994657
From: Rick Edgecombe
To: x86@kernel.org, "H . Peter Anvin", Thomas Gleixner, Ingo Molnar,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org,
 Arnd Bergmann, Andy Lutomirski, Balbir Singh, Borislav Petkov,
 Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov, Florian Weimer,
 "H . J . Lu", Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz,
 Nadav Amit, Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
 "Ravi V . Shankar", Weijiang Yang, "Kirill A . Shutemov",
 joao.moreira@intel.com, John Allen, kcc@google.com, eranian@google.com,
 rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com
Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu
Subject: [PATCH v2 03/39] x86/cpufeatures: Add CPU feature flags for shadow stacks
Date: Thu, 29 Sep 2022 15:29:00 -0700
Message-Id: <20220929222936.14584-4-rick.p.edgecombe@intel.com>
In-Reply-To: <20220929222936.14584-1-rick.p.edgecombe@intel.com>
References: <20220929222936.14584-1-rick.p.edgecombe@intel.com>

From: Yu-cheng Yu

The Control-Flow Enforcement Technology contains two related features,
one of which is Shadow Stacks. Future patches will utilize this feature
for shadow stack support in KVM, so add a CPU feature flag for Shadow
Stacks (CPUID.(EAX=7,ECX=0):ECX[bit 7]).

To protect shadow stack state from malicious modification, the registers
are only accessible in supervisor mode. This implementation
context-switches the registers with XSAVES. Make X86_FEATURE_SHSTK depend
on XSAVES.

Signed-off-by: Yu-cheng Yu
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
Cc: Kees Cook
Reviewed-by: Kees Cook
Reviewed-by: Borislav Petkov
---
v2:
 - Remove IBT reference in commit log (Kees)
 - Describe xsaves dependency using text from (Dave)

v1:
 - Remove IBT, can be added in a follow on IBT series.

Yu-cheng v25:
 - Make X86_FEATURE_IBT depend on X86_FEATURE_SHSTK.

Yu-cheng v24:
 - Update for splitting CONFIG_X86_CET to CONFIG_X86_SHADOW_STACK and
   CONFIG_X86_IBT.
 - Move DISABLE_IBT definition to the IBT series.
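
Illustration (not part of the patch): the CPUID bit named in the commit
message can also be inspected from userspace. The following is a minimal
sketch under stated assumptions: it expects a GCC- or Clang-compatible
compiler that provides <cpuid.h> with __get_cpuid_count(), and the helper
names shstk_bit_set() and cpu_reports_shstk() are hypothetical. It reports
only what the CPU enumerates, not whether the kernel has enabled shadow
stacks.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Bit position from the commit message: CPUID.(EAX=7,ECX=0):ECX[bit 7]. */
#define SHSTK_ECX_BIT 7

static_assert(SHSTK_ECX_BIT < 32, "ECX is a 32-bit register");

/* Hypothetical helper: 1 if the given ECX value has the SHSTK bit set. */
static int shstk_bit_set(uint32_t ecx)
{
	return (ecx >> SHSTK_ECX_BIT) & 1u;
}

/*
 * Hypothetical helper: 1 if the CPU enumerates shadow stack support,
 * 0 if it does not, -1 if leaf 7 is unavailable or this is not x86.
 */
static int cpu_reports_shstk(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	/* __get_cpuid_count() returns 0 when the leaf is not supported. */
	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return -1;
	return shstk_bit_set(ecx);
#else
	return -1;
#endif
}
```

A caller would treat a return value of 1 from cpu_reports_shstk() as
"hardware enumerates the feature"; the kernel-side flag added by this patch
additionally depends on XSAVES.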
 arch/x86/include/asm/cpufeatures.h       | 1 +
 arch/x86/include/asm/disabled-features.h | 8 +++++++-
 arch/x86/kernel/cpu/cpuid-deps.c         | 1 +
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index ef4775c6db01..d0b49da95c70 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -365,6 +365,7 @@
 #define X86_FEATURE_OSPKE (16*32+ 4) /* OS Protection Keys Enable */
 #define X86_FEATURE_WAITPKG (16*32+ 5) /* UMONITOR/UMWAIT/TPAUSE Instructions */
 #define X86_FEATURE_AVX512_VBMI2 (16*32+ 6) /* Additional AVX512 Vector Bit Manipulation Instructions */
+#define X86_FEATURE_SHSTK (16*32+ 7) /* Shadow Stack */
 #define X86_FEATURE_GFNI (16*32+ 8) /* Galois Field New Instructions */
 #define X86_FEATURE_VAES (16*32+ 9) /* Vector AES */
 #define X86_FEATURE_VPCLMULQDQ (16*32+10) /* Carry-Less Multiplication Double Quadword */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index 33d2cd04d254..00fe41eee92d 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -87,6 +87,12 @@
 # define DISABLE_TDX_GUEST (1 << (X86_FEATURE_TDX_GUEST & 31))
 #endif

+#ifdef CONFIG_X86_SHADOW_STACK
+#define DISABLE_SHSTK 0
+#else
+#define DISABLE_SHSTK (1 << (X86_FEATURE_SHSTK & 31))
+#endif
+
 /*
  * Make sure to add features to the correct mask
  */
@@ -107,7 +113,7 @@
 #define DISABLED_MASK14 0
 #define DISABLED_MASK15 0
 #define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP| \
-			 DISABLE_ENQCMD)
+			 DISABLE_ENQCMD|DISABLE_SHSTK)
 #define DISABLED_MASK17 0
 #define DISABLED_MASK18 0
 #define DISABLED_MASK19 0
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index c881bcafba7d..bf1b55a1ba21 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -78,6 +78,7 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{
X86_FEATURE_XFD, X86_FEATURE_XSAVES }, { X86_FEATURE_XFD, X86_FEATURE_XGETBV1 }, { X86_FEATURE_AMX_TILE, X86_FEATURE_XFD }, + { X86_FEATURE_SHSTK, X86_FEATURE_XSAVES }, {} }; From patchwork Thu Sep 29 22:29:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 12994658 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id BDBCBC4321E for ; Thu, 29 Sep 2022 22:29:56 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 583666B0078; Thu, 29 Sep 2022 18:29:56 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 496C16B007B; Thu, 29 Sep 2022 18:29:56 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 24C8A6B007D; Thu, 29 Sep 2022 18:29:56 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 15FF96B0078 for ; Thu, 29 Sep 2022 18:29:56 -0400 (EDT) Received: from smtpin16.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id DEF3DABA03 for ; Thu, 29 Sep 2022 22:29:55 +0000 (UTC) X-FDA: 79966566750.16.06A637E Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by imf12.hostedemail.com (Postfix) with ESMTP id 5A39E4000E for ; Thu, 29 Sep 2022 22:29:55 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664490595; x=1696026595; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=aQ5z6eOsVvFJ2Mg3fViTs1TwTvR0tJtyuBHvl8Ak6H4=; b=i+GLSjqVxqwZzDEZvf39+OCk93/XQdKdlfMZsDR+CAFmq/QA9joaAUDt IEAuq5DHPPejxi7NDTjVQL3r1IShzgGoRW3+wjRl5/VOqylBPEcDIuYZ1 
T4pzZe2nZinRxsaWh9eJTDBMy8vPTpRm5bL7MicSIvFthLxqV4jJJkjRp Wjo0mBfDezdjBlRFzfxeKDibkWZ3YzBKx3OvonUo2wlVM8rnmRxLQdudQ oEj01yti2SbLLTEyGgtN5d+HwFgp4HNs9P695dzSax9CvPq2tYEKaGBWJ gRRdUM/nu4/9aFJj7veYY2DJwcPZP2762dezPmkRZJUHn89KzgOOAadJ2 g==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="328420326" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="328420326" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:29:54 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="691016087" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="691016087" Received: from sergungo-mobl.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.251.25.88]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:29:52 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . 
Shutemov" , joao.moreira@intel.com, John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v2 04/39] x86/cpufeatures: Enable CET CR4 bit for shadow stack Date: Thu, 29 Sep 2022 15:29:01 -0700 Message-Id: <20220929222936.14584-5-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220929222936.14584-1-rick.p.edgecombe@intel.com> References: <20220929222936.14584-1-rick.p.edgecombe@intel.com> ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1664490595; a=rsa-sha256; cv=none; b=3EM9/0UDYrwA0WKzVwHkxVB2KF+Gk2/WTyxYuje8nc6N1YEqeV0br6/otZfdSjW+VXv+KG tXy6kbBzkeLT4MRuTr+Z2G5JJ6YO9PtcF40RzSJCKAA5C+GTaWz+HAowgcYnioMe4987gp NJ3lTFzjf6yqV1wJ8mMi3wzi8upGJLw= ARC-Authentication-Results: i=1; imf12.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=i+GLSjqV; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf12.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.88 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1664490595; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references:dkim-signature; bh=OQwgkH5qeQG8mhrHgr+T4c1CVgHWe4YBvkrgm0uxPJ0=; b=Ce5NN/S00kkcIlHfya8JGwCXHJUit2POLObKxIiD1STkxj/8j8E9LRFjBk1LOZNVLb8DrX ubVsi+5S+xzLZzzV98ybWJv6fbTTdVDaomkrmYCXM1BfsQW2weoQbesHkv1xdCQwuGBvn9 SDgm8Eqd9cAQyZOWcoOazCJhaqlQZXI= X-Rspamd-Queue-Id: 5A39E4000E X-Rspam-User: Authentication-Results: imf12.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=i+GLSjqV; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf12.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 
192.55.52.88 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com X-Rspamd-Server: rspam03 X-Stat-Signature: gawu67uek4xhhotc18tn4muqyu6duabu X-HE-Tag: 1664490595-524214 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yu-cheng Yu Utilizing CET features requires a CR4 bit to be enabled as well as bits to be set in CET MSRs. Setting the CR4 bit does three things: 1. Enables use of the WRUSS instruction, which the kernel can use to write to userspace shadow stacks. 2. Allows the individual aspects of CET to be enabled later via the MSR. 3. Allows CET to be enabled in guests. While future patches will allow the MSR values to be saved and restored per task, the CR4 bit will allow for WRUSS to be used regardless of whether a task's CET MSRs have been restored. Kernel IBT already enables the CET CR4 bit when it detects IBT HW support and is configured with kernel IBT. However, future patches that enable userspace shadow stack support will need the bit set as well. So change the logic to enable it in either case. Clear MSR_IA32_U_CET in cet_disable() so that it can't live to see userspace in a new kexec-ed kernel that has CR4.CET set from kernel IBT. Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Cc: Kees Cook Reviewed-by: Kees Cook --- v2: - In the shadow stack case, go back to only setting CR4.CET if the kernel is compiled with user shadow stack support. - Clear MSR_IA32_U_CET as well. (PeterZ) KVM refresh: - Set CR4.CET if SHSTK or IBT are supported by HW, so that KVM can support CET even if IBT is disabled. - Drop no_user_shstk (Dave Hansen) - Elaborate on what the CR4 bit does in the commit log - Integrate with Kernel IBT logic v1: - Moved kernel-parameters.txt changes here from patch 1. Yu-cheng v25: - Remove software-defined X86_FEATURE_CET.
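Reader's note (not part of the patch): the enable logic described in the commit log can be sketched as a small userspace model. This is only an illustration under stated assumptions: model_setup_cet() and struct cet_setup_result are hypothetical names, the two booleans stand in for the kernel_ibt and user_shstk tests in setup_cet(), and CET_ENDBR_EN is assumed to be bit 2 of the supervisor CET MSR. The IBT selftest and the real MSR and CR4 writes are deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: ENDBR tracking enable is bit 2 of the supervisor CET MSR. */
#define CET_ENDBR_EN (1ULL << 2)

/* Hypothetical result type: what the setup path would program. */
struct cet_setup_result {
	bool set_cr4_cet;    /* true if CR4.CET would be set */
	uint64_t s_cet_msr;  /* value that would go to MSR_IA32_S_CET */
};

/*
 * Model of the decision only: CR4.CET is wanted when either kernel IBT
 * or user shadow stack is usable, but the supervisor MSR enables ENDBR
 * tracking only for kernel IBT.
 */
struct cet_setup_result model_setup_cet(bool kernel_ibt, bool user_shstk)
{
	struct cet_setup_result r = { false, 0 };

	if (!kernel_ibt && !user_shstk)
		return r;		/* nothing to enable */

	if (kernel_ibt)
		r.s_cet_msr = CET_ENDBR_EN;

	r.set_cr4_cet = true;
	return r;
}
```

With only user shadow stack available, the model sets CR4.CET but leaves the supervisor MSR at 0, which appears to be the case this patch adds on top of the existing kernel IBT path.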
arch/x86/kernel/cpu/common.c | 22 ++++++++++++++++------ 1 file changed, 16 insertions(+), 6 deletions(-) diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 3e508f239098..d7415bb556b2 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -598,16 +598,21 @@ __noendbr void ibt_restore(u64 save) static __always_inline void setup_cet(struct cpuinfo_x86 *c) { - u64 msr = CET_ENDBR_EN; + bool kernel_ibt = HAS_KERNEL_IBT && cpu_feature_enabled(X86_FEATURE_IBT); + bool user_shstk = IS_ENABLED(CONFIG_X86_SHADOW_STACK) && + cpu_feature_enabled(X86_FEATURE_SHSTK); + u64 msr = 0; - if (!HAS_KERNEL_IBT || - !cpu_feature_enabled(X86_FEATURE_IBT)) + if (!kernel_ibt && !user_shstk) return; + if (kernel_ibt) + msr = CET_ENDBR_EN; + wrmsrl(MSR_IA32_S_CET, msr); cr4_set_bits(X86_CR4_CET); - if (!ibt_selftest()) { + if (kernel_ibt && !ibt_selftest()) { pr_err("IBT selftest: Failed!\n"); setup_clear_cpu_cap(X86_FEATURE_IBT); return; @@ -616,10 +621,15 @@ static __always_inline void setup_cet(struct cpuinfo_x86 *c) __noendbr void cet_disable(void) { - if (cpu_feature_enabled(X86_FEATURE_IBT)) - wrmsrl(MSR_IA32_S_CET, 0); + if (!(cpu_feature_enabled(X86_FEATURE_IBT) || + cpu_feature_enabled(X86_FEATURE_SHSTK))) + return; + + wrmsrl(MSR_IA32_S_CET, 0); + wrmsrl(MSR_IA32_U_CET, 0); } + /* * Some CPU features depend on higher CPUID levels, which may not always * be available due to CPUID level capping or broken virtualization From patchwork Thu Sep 29 22:29:02 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 12994659 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id E33C6C433FE for ; Thu, 29 Sep 2022 22:29:58 +0000 (UTC) Received: by kanga.kvack.org (Postfix) 
id 756616B007B; Thu, 29 Sep 2022 18:29:58 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 706266B007D; Thu, 29 Sep 2022 18:29:58 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 50C978D0001; Thu, 29 Sep 2022 18:29:58 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 3EAF76B007B for ; Thu, 29 Sep 2022 18:29:58 -0400 (EDT) Received: from smtpin22.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 14998813CC for ; Thu, 29 Sep 2022 22:29:58 +0000 (UTC) X-FDA: 79966566876.22.4E28CC9 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by imf12.hostedemail.com (Postfix) with ESMTP id 1EB6E40004 for ; Thu, 29 Sep 2022 22:29:56 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664490597; x=1696026597; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=LoFGHvMxo/XLUyc6wYvZ2GlUzqPZTSOyLeaXWuDfgbI=; b=DlKBLzafF85oAC4C6HWNmCa1luFdYuIOY4tc2pEytlKGy0bJiYkHoTV4 Aej38HUlm6NrkA73NVWLd5f4/cFKfheOrHm5Jsuqzhi/FdYVNCXjhyQlk hHphZd4wD9GYLtcJElwRWPkrmWJVIieqWRkZ2B82ID8PTsT04O6qZ0a1Q 15DTGBuh+EG8k0e/q95g3P9bynhCIHqFJJYynVb37po/TEIT9elt4dvwJ 0tazoCZs8eW+Mp9SOPN2Cckq0Do5XJ/cajENLvsIPxYk4wRMWeq6gMX3m ppCkfNG60400r2nRskjPQzuIooh1LzQDiwrYDQ/V14jH3cW3x9ldbsCdc g==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="328420332" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="328420332" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:29:56 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="691016091" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="691016091" Received: from sergungo-mobl.amr.corp.intel.com (HELO 
rpedgeco-desk.amr.corp.intel.com) ([10.251.25.88]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:29:54 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , joao.moreira@intel.com, John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v2 05/39] x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states Date: Thu, 29 Sep 2022 15:29:02 -0700 Message-Id: <20220929222936.14584-6-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220929222936.14584-1-rick.p.edgecombe@intel.com> References: <20220929222936.14584-1-rick.p.edgecombe@intel.com> ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1664490597; a=rsa-sha256; cv=none; b=Ia/pqy+Fuc4sOHRo9x2qT/Ty0itaDzB87A1RXdPCARddAf9Bkjdil+cL7fOxDaZOEWk7Pt gejs3RmfZ7fMyKfWi7/h5osk9bE+eG1c2ZzKlS1SAp5bXZlFiAEj4QxOC4+67gLIL3jvbk hSPh8tfBcL5ptoVtzB8KKI1uMCCypW0= ARC-Authentication-Results: i=1; imf12.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=DlKBLzaf; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf12.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.88 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1664490597; 
h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references:dkim-signature; bh=f69RDIxh6uqPybRXYYDf6w3JZ/fdsozoKk2P+kDXMyc=; b=r68yMJDFTS0rZxeqH7br/7QC02UnKv9ZAOr9NkJYohvYijK1yV74MsFJV0BgYtrBZimT6r MN/B3dF2KZ+q/k5bupsqnHHU6Lan1fgWaEW5AMp4w1LYaoDd7DjhhJS5ZIBP4QB7Axwrbp AnHgscb7pApzHTxtksiiffeISIqrUR8= X-Rspamd-Queue-Id: 1EB6E40004 X-Rspam-User: Authentication-Results: imf12.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=DlKBLzaf; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf12.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.88 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com X-Rspamd-Server: rspam03 X-Stat-Signature: o5e7gbtioinejfpfxj3r7j68nd8qko3q X-HE-Tag: 1664490596-940825 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yu-cheng Yu Shadow stack register state can be managed with XSAVE. The registers can logically be separated into two groups: * Registers controlling user-mode operation * Registers controlling kernel-mode operation The architecture has two new XSAVE state components: one for each of those groups of registers. This lets an OS manage them separately if it chooses. Future patches for host userspace and KVM guests will only utilize the user-mode registers, so only configure XSAVE to save user-mode registers. This state will add 16 bytes to the xsave buffer size. Future patches will use the user-mode XSAVE area to save guest user-mode CET state. However, VMCS includes new fields for guest CET supervisor states. KVM can use these to save and restore guest supervisor state, so host supervisor XSAVE support is not required.
Adding this exacerbates the already unwieldy if statement in check_xstate_against_struct() that handles warning about unimplemented xfeatures. So refactor these checks by having XCHECK_SZ() set a bool when it actually checks the xfeature. This ends up exceeding 80 chars, but was better on balance than other options explored. Pass the bool as a pointer to make it clear that XCHECK_SZ() can change the variable. While configuring user-mode XSAVE, clarify that kernel-mode registers are not managed by XSAVE by defining the xfeature in XFEATURE_MASK_SUPERVISOR_UNSUPPORTED, as is done for XFEATURE_MASK_PT. This serves mostly as documentation-as-code; functionally, it only enables a few safety checks. Both XSAVE state components are supervisor states, even the state controlling user-mode operation. This is a departure from earlier features like protection keys where the PKRU state is a normal user (non-supervisor) state. Having the user state be supervisor-managed ensures there is no direct, unprivileged access to it, making it harder for an attacker to subvert CET. To facilitate this privileged access, define the two user-mode CET MSRs and the bits in those MSRs that are relevant to future shadow stack enablement patches. Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Cc: Kees Cook Reviewed-by: Kees Cook --- v2: - Change name to XFEATURE_CET_KERNEL_UNUSED (peterz) KVM refresh: - Reword commit log using some verbiage posted by Dave Hansen - Remove unlikely to be used supervisor cet xsave struct - Clarify that supervisor cet state is not saved by xsave - Remove unused supervisor MSRs v1: - Remove outdated reference to sigreturn checks on msr's. Yu-cheng v29: - Move CET MSR definition up in msr-index.h.
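Reader's note (not part of the patch): the XCHECK_SZ() refactor described in the commit log can be sketched in userspace as follows. This is a minimal model, not the kernel code: every MODEL_ or model_ name is hypothetical, the two structs are stand-ins whose layouts mirror pkru_state and the new cet_user_state, and the component numbers are assumed from the order of enum xfeature. Unlike the kernel, which only warns on a size mismatch, the model returns a distinct result so the behaviour can be checked.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel structures. */
struct pkru_state_model { uint32_t pkru; uint32_t pad; };
struct cet_user_state_model { uint64_t user_cet; uint64_t user_ssp; };

/* Assumed component numbers, following the position in enum xfeature. */
enum { MODEL_XFEATURE_PKRU = 9, MODEL_XFEATURE_CET_USER = 11 };

enum model_check_result { MODEL_NO_STRUCT, MODEL_SIZE_MISMATCH, MODEL_OK };

/*
 * Record through a pointer that a feature number was matched, so the
 * caller no longer needs a hand-maintained list of "holes".
 */
#define MODEL_XCHECK_SZ(checked, mismatch, sz, nr, nr_macro, type) do {	\
	if ((nr) == (nr_macro)) {					\
		*(checked) = true;					\
		if ((sz) != sizeof(type))				\
			*(mismatch) = true;				\
	}								\
} while (0)

enum model_check_result model_check_xstate(int nr, size_t sz)
{
	bool chked = false;
	bool mismatch = false;

	MODEL_XCHECK_SZ(&chked, &mismatch, sz, nr, MODEL_XFEATURE_PKRU,
			struct pkru_state_model);
	MODEL_XCHECK_SZ(&chked, &mismatch, sz, nr, MODEL_XFEATURE_CET_USER,
			struct cet_user_state_model);

	if (!chked)
		return MODEL_NO_STRUCT;	/* the "no structure for xstate" case */
	return mismatch ? MODEL_SIZE_MISMATCH : MODEL_OK;
}
```

The stand-in for cet_user_state is two 64-bit fields, which is where the 16 bytes mentioned in the commit log come from.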
arch/x86/include/asm/fpu/types.h | 14 ++++- arch/x86/include/asm/fpu/xstate.h | 6 +- arch/x86/kernel/fpu/xstate.c | 93 ++++++++++++++++--------------- 3 files changed, 63 insertions(+), 50 deletions(-) diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h index eb7cd1139d97..344baad02b97 100644 --- a/arch/x86/include/asm/fpu/types.h +++ b/arch/x86/include/asm/fpu/types.h @@ -115,8 +115,8 @@ enum xfeature { XFEATURE_PT_UNIMPLEMENTED_SO_FAR, XFEATURE_PKRU, XFEATURE_PASID, - XFEATURE_RSRVD_COMP_11, - XFEATURE_RSRVD_COMP_12, + XFEATURE_CET_USER, + XFEATURE_CET_KERNEL_UNUSED, XFEATURE_RSRVD_COMP_13, XFEATURE_RSRVD_COMP_14, XFEATURE_LBR, @@ -138,6 +138,8 @@ enum xfeature { #define XFEATURE_MASK_PT (1 << XFEATURE_PT_UNIMPLEMENTED_SO_FAR) #define XFEATURE_MASK_PKRU (1 << XFEATURE_PKRU) #define XFEATURE_MASK_PASID (1 << XFEATURE_PASID) +#define XFEATURE_MASK_CET_USER (1 << XFEATURE_CET_USER) +#define XFEATURE_MASK_CET_KERNEL (1 << XFEATURE_CET_KERNEL_UNUSED) #define XFEATURE_MASK_LBR (1 << XFEATURE_LBR) #define XFEATURE_MASK_XTILE_CFG (1 << XFEATURE_XTILE_CFG) #define XFEATURE_MASK_XTILE_DATA (1 << XFEATURE_XTILE_DATA) @@ -252,6 +254,14 @@ struct pkru_state { u32 pad; } __packed; +/* + * State component 11 is Control-flow Enforcement user states + */ +struct cet_user_state { + u64 user_cet; /* user control-flow settings */ + u64 user_ssp; /* user shadow stack pointer */ +}; + /* * State component 15: Architectural LBR configuration state. * The size of Arch LBR state depends on the number of LBRs (lbr_depth). 
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h index cd3dd170e23a..d4427b88ee12 100644 --- a/arch/x86/include/asm/fpu/xstate.h +++ b/arch/x86/include/asm/fpu/xstate.h @@ -50,7 +50,8 @@ #define XFEATURE_MASK_USER_DYNAMIC XFEATURE_MASK_XTILE_DATA /* All currently supported supervisor features */ -#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID) +#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID | \ + XFEATURE_MASK_CET_USER) /* * A supervisor state component may not always contain valuable information, @@ -77,7 +78,8 @@ * Unsupported supervisor features. When a supervisor feature in this mask is * supported in the future, move it to the supported supervisor feature mask. */ -#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT) +#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT | \ + XFEATURE_MASK_CET_KERNEL) /* All supervisor states including supported and unsupported states. */ #define XFEATURE_MASK_SUPERVISOR_ALL (XFEATURE_MASK_SUPERVISOR_SUPPORTED | \ diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c index c8340156bfd2..5e6a4867fd05 100644 --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -39,26 +39,26 @@ */ static const char *xfeature_names[] = { - "x87 floating point registers" , - "SSE registers" , - "AVX registers" , - "MPX bounds registers" , - "MPX CSR" , - "AVX-512 opmask" , - "AVX-512 Hi256" , - "AVX-512 ZMM_Hi256" , - "Processor Trace (unused)" , - "Protection Keys User registers", - "PASID state", - "unknown xstate feature" , - "unknown xstate feature" , - "unknown xstate feature" , - "unknown xstate feature" , - "unknown xstate feature" , - "unknown xstate feature" , - "AMX Tile config" , - "AMX Tile data" , - "unknown xstate feature" , + "x87 floating point registers" , + "SSE registers" , + "AVX registers" , + "MPX bounds registers" , + "MPX CSR" , + "AVX-512 opmask" , + "AVX-512 Hi256" , + "AVX-512 ZMM_Hi256" , + 
"Processor Trace (unused)" , + "Protection Keys User registers" , + "PASID state" , + "Control-flow User registers" , + "Control-flow Kernel registers (unused)" , + "unknown xstate feature" , + "unknown xstate feature" , + "unknown xstate feature" , + "unknown xstate feature" , + "AMX Tile config" , + "AMX Tile data" , + "unknown xstate feature" , }; static unsigned short xsave_cpuid_features[] __initdata = { @@ -73,6 +73,7 @@ static unsigned short xsave_cpuid_features[] __initdata = { [XFEATURE_PT_UNIMPLEMENTED_SO_FAR] = X86_FEATURE_INTEL_PT, [XFEATURE_PKRU] = X86_FEATURE_PKU, [XFEATURE_PASID] = X86_FEATURE_ENQCMD, + [XFEATURE_CET_USER] = X86_FEATURE_SHSTK, [XFEATURE_XTILE_CFG] = X86_FEATURE_AMX_TILE, [XFEATURE_XTILE_DATA] = X86_FEATURE_AMX_TILE, }; @@ -276,6 +277,7 @@ static void __init print_xstate_features(void) print_xstate_feature(XFEATURE_MASK_Hi16_ZMM); print_xstate_feature(XFEATURE_MASK_PKRU); print_xstate_feature(XFEATURE_MASK_PASID); + print_xstate_feature(XFEATURE_MASK_CET_USER); print_xstate_feature(XFEATURE_MASK_XTILE_CFG); print_xstate_feature(XFEATURE_MASK_XTILE_DATA); } @@ -344,6 +346,7 @@ static __init void os_xrstor_booting(struct xregs_state *xstate) XFEATURE_MASK_BNDREGS | \ XFEATURE_MASK_BNDCSR | \ XFEATURE_MASK_PASID | \ + XFEATURE_MASK_CET_USER | \ XFEATURE_MASK_XTILE) /* @@ -446,13 +449,14 @@ static void __init __xstate_dump_leaves(void) } \ } while (0) -#define XCHECK_SZ(sz, nr, nr_macro, __struct) do { \ - if ((nr == nr_macro) && \ - WARN_ONCE(sz != sizeof(__struct), \ - "%s: struct is %zu bytes, cpu state %d bytes\n", \ - __stringify(nr_macro), sizeof(__struct), sz)) { \ - __xstate_dump_leaves(); \ - } \ +#define XCHECK_SZ(checked, sz, nr, nr_macro, __struct) do { \ + if (nr == nr_macro) { \ + *checked = true; \ + if (WARN_ONCE(sz != sizeof(__struct), \ + "%s: struct is %zu bytes, cpu state %d bytes\n", \ + __stringify(nr_macro), sizeof(__struct), sz)) \ + __xstate_dump_leaves(); \ + } \ } while (0) /** @@ -527,33 +531,30 @@ static bool 
__init check_xstate_against_struct(int nr) * Ask the CPU for the size of the state. */ int sz = xfeature_size(nr); + bool chked = false; + /* * Match each CPU state with the corresponding software * structure. */ - XCHECK_SZ(sz, nr, XFEATURE_YMM, struct ymmh_struct); - XCHECK_SZ(sz, nr, XFEATURE_BNDREGS, struct mpx_bndreg_state); - XCHECK_SZ(sz, nr, XFEATURE_BNDCSR, struct mpx_bndcsr_state); - XCHECK_SZ(sz, nr, XFEATURE_OPMASK, struct avx_512_opmask_state); - XCHECK_SZ(sz, nr, XFEATURE_ZMM_Hi256, struct avx_512_zmm_uppers_state); - XCHECK_SZ(sz, nr, XFEATURE_Hi16_ZMM, struct avx_512_hi16_state); - XCHECK_SZ(sz, nr, XFEATURE_PKRU, struct pkru_state); - XCHECK_SZ(sz, nr, XFEATURE_PASID, struct ia32_pasid_state); - XCHECK_SZ(sz, nr, XFEATURE_XTILE_CFG, struct xtile_cfg); + XCHECK_SZ(&chked, sz, nr, XFEATURE_YMM, struct ymmh_struct); + XCHECK_SZ(&chked, sz, nr, XFEATURE_BNDREGS, struct mpx_bndreg_state); + XCHECK_SZ(&chked, sz, nr, XFEATURE_BNDCSR, struct mpx_bndcsr_state); + XCHECK_SZ(&chked, sz, nr, XFEATURE_OPMASK, struct avx_512_opmask_state); + XCHECK_SZ(&chked, sz, nr, XFEATURE_ZMM_Hi256, struct avx_512_zmm_uppers_state); + XCHECK_SZ(&chked, sz, nr, XFEATURE_Hi16_ZMM, struct avx_512_hi16_state); + XCHECK_SZ(&chked, sz, nr, XFEATURE_PKRU, struct pkru_state); + XCHECK_SZ(&chked, sz, nr, XFEATURE_PASID, struct ia32_pasid_state); + XCHECK_SZ(&chked, sz, nr, XFEATURE_XTILE_CFG, struct xtile_cfg); + XCHECK_SZ(&chked, sz, nr, XFEATURE_CET_USER, struct cet_user_state); /* The tile data size varies between implementations. */ - if (nr == XFEATURE_XTILE_DATA) + if (nr == XFEATURE_XTILE_DATA) { check_xtile_data_against_struct(sz); + chked = true; + } - /* - * Make *SURE* to add any feature numbers in below if - * there are "holes" in the xsave state component - * numbers. 
- */ - if ((nr < XFEATURE_YMM) || - (nr >= XFEATURE_MAX) || - (nr == XFEATURE_PT_UNIMPLEMENTED_SO_FAR) || - ((nr >= XFEATURE_RSRVD_COMP_11) && (nr <= XFEATURE_RSRVD_COMP_16))) { + if (!chked) { WARN_ONCE(1, "no structure for xstate: %d\n", nr); XSTATE_WARN_ON(1); return false; From patchwork Thu Sep 29 22:29:03 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 12994660 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 49A8EC4332F for ; Thu, 29 Sep 2022 22:30:01 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D9A0C6B007D; Thu, 29 Sep 2022 18:30:00 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id D21256B007E; Thu, 29 Sep 2022 18:30:00 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B73378D0001; Thu, 29 Sep 2022 18:30:00 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id A9DC86B007D for ; Thu, 29 Sep 2022 18:30:00 -0400 (EDT) Received: from smtpin28.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 830A3A13EE for ; Thu, 29 Sep 2022 22:30:00 +0000 (UTC) X-FDA: 79966566960.28.157DE96 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by imf12.hostedemail.com (Postfix) with ESMTP id DB2184000E for ; Thu, 29 Sep 2022 22:29:58 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664490598; x=1696026598; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=FoZEd4RgFgbf1YU6s4vPiaf1vvw8efRraUc3vBPTh/I=; 
b=jyYKOHQ6M2YmdLStejXRN3i+r9V89KH7FHpp7GlQrO/sKVFXFwj3NShW A+gKE/pDBnuO/DDaL+e6S43ZCBj15BfBmvdtm6OPOu58etJxZERxnuz7m CcDfthSMig+dRtlyMU2uAwtU1EiYTYta0tkNqq+7jgL3gwuRIoQgCj/Bz 7O3WngBVSoItPzwBSiIncHLpcHgZuHIMBPdhckutHu2raQN2V1nj8m92t Sizr0EFLgwy9E3L+XiAv656pFg26c4O0y6y3+8ktu0qOEVeBeiThPUCdj bxVRlNV5Ggz0Rb/zPBjKCoTFzCq3kkzcyIt2ZNot3X2YjRwUo/RqYHquf w==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="328420337" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="328420337" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:29:58 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="691016103" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="691016103" Received: from sergungo-mobl.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.251.25.88]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:29:56 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . 
Shutemov" , joao.moreira@intel.com, John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com Cc: rick.p.edgecombe@intel.com Subject: [PATCH v2 06/39] x86/fpu: Add helper for modifying xstate Date: Thu, 29 Sep 2022 15:29:03 -0700 Message-Id: <20220929222936.14584-7-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220929222936.14584-1-rick.p.edgecombe@intel.com> References: <20220929222936.14584-1-rick.p.edgecombe@intel.com> ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1664490599; a=rsa-sha256; cv=none; b=UYYgxo7d5kMHaFUwWh1rkZROBFyHOqhXyE6VZQ0DOzsLJd+4HBpy2YLKTuaaD1MBLN8+yk FDEWqy5bvkmq89KsHM2IsD09rQEu2DzcA0OCn4L+EuM9tALGxLM6NHWL6wr2NU4AoKBNb/ eGX8S62nh/FVUZrU9HVKqdZynsMV0jk= ARC-Authentication-Results: i=1; imf12.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=jyYKOHQ6; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf12.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.88 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1664490599; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references:dkim-signature; bh=f3+bB0MvrLszBcKEP57kKKEiykVoieJOxmHGWYpEeP4=; b=aZG5KiLKJ9lgqha3h1VUCKx7emZxIIB9I3X6QTlhExH80PDfzmOEYuYjHOxTkhCFuQxPWf LqzI34/O8Ki2JN0K/CdOr7765dHK+ceN/ZjT8CdsKO061hqa83YhT+aKafU1Z8Y2HUz29E bc6NGxvjxEyOBLGguDPFma+pCoIfdRA= X-Rspamd-Queue-Id: DB2184000E X-Rspam-User: Authentication-Results: imf12.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=jyYKOHQ6; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf12.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.88 as permitted 
sender) smtp.mailfrom=rick.p.edgecombe@intel.com X-Rspamd-Server: rspam03 X-Stat-Signature: ifjskw3ssdk1pwoacajxdsrwnb658wdh X-HE-Tag: 1664490598-438966 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Just like user xfeatures, supervisor xfeatures can be active in the registers or present in the task FPU buffer. If the registers are active, the registers can be modified directly. If the registers are not active, the modification must be performed on the task FPU buffer. When the state is not active, the kernel could perform modifications directly to the buffer. But in order for it to do that, it needs to know where in the buffer the specific state it wants to modify is located. Doing this is not robust against optimizations that compact the FPU buffer, as each access would require computing where in the buffer it is. The easiest way to modify supervisor xfeature data is to force restore the registers and write directly to the MSRs. Often this is just fine anyway, as the registers need to be restored before returning to userspace. Do this for now, leaving buffer writing optimizations for the future. Add a new function fpu_lock_and_load() that takes fpregs_lock() and performs this restore in one call. Also perform some extra sanity checks in this function since it will be used in non-FPU focused code. Suggested-by: Thomas Gleixner Signed-off-by: Rick Edgecombe Reviewed-by: Kees Cook --- v2: - Drop optimization of writing directly to the buffer, and change API accordingly. - fpregs_lock_and_load() suggested by tglx - Some commit log verbiage from dhansen v1: - New patch.
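Reader's note (not part of the patch): a minimal userspace sketch of the "force restore, then write the register" approach described in the commit log. Everything here is a model under assumptions: struct model_fpu and the model_ functions are hypothetical, need_fpu_load stands in for TIF_NEED_FPU_LOAD, lock_depth stands in for fpregs_lock() and fpregs_unlock(), and the read-modify-write of a user CET value is just an example of a caller, not something taken from this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one task's supervisor xstate. */
struct model_fpu {
	bool need_fpu_load;     /* stand-in for TIF_NEED_FPU_LOAD */
	int lock_depth;         /* stand-in for the fpregs lock */
	uint64_t buf_user_cet;  /* copy saved in the task FPU buffer */
	uint64_t reg_user_cet;  /* live register (MSR) value */
};

/* Take the lock and make sure the registers hold the task's state. */
void model_lock_and_load(struct model_fpu *f)
{
	f->lock_depth++;
	if (f->need_fpu_load) {
		f->reg_user_cet = f->buf_user_cet;	/* restore from buffer */
		f->need_fpu_load = false;
	}
}

void model_unlock(struct model_fpu *f)
{
	f->lock_depth--;
}

/*
 * Example caller: set bits in the live value. Without the load step a
 * stale register would be modified and the buffer copy would be lost.
 */
uint64_t model_set_user_cet_bits(struct model_fpu *f, uint64_t bits)
{
	model_lock_and_load(f);
	f->reg_user_cet |= bits;
	model_unlock(f);
	return f->reg_user_cet;
}
```

The point of the design is that the caller never needs to know where the state sits inside the (possibly compacted) buffer; it only ever touches the live register after the helper has loaded it.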
arch/x86/include/asm/fpu/api.h | 6 ++++++ arch/x86/kernel/fpu/core.c | 19 +++++++++++++++++++ 2 files changed, 25 insertions(+) diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h index 503a577814b2..3a86ee18ae99 100644 --- a/arch/x86/include/asm/fpu/api.h +++ b/arch/x86/include/asm/fpu/api.h @@ -82,6 +82,12 @@ static inline void fpregs_unlock(void) preempt_enable(); } +/* + * Lock and load the fpu state into the registers, if they are not already + * loaded. + */ +void fpu_lock_and_load(void); + #ifdef CONFIG_X86_DEBUG_FPU extern void fpregs_assert_state_consistent(void); #else diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c index 3b28c5b25e12..778d3054ccc7 100644 --- a/arch/x86/kernel/fpu/core.c +++ b/arch/x86/kernel/fpu/core.c @@ -756,6 +756,25 @@ void switch_fpu_return(void) } EXPORT_SYMBOL_GPL(switch_fpu_return); +void fpu_lock_and_load(void) +{ + /* + * fpregs_lock() only disables preemption (mostly). So modifying state + * in an interrupt could screw up some in progress fpregs operation, + * but appear to work. Warn about it.
+ */ + WARN_ON_ONCE(!irq_fpu_usable()); + WARN_ON_ONCE(current->flags & PF_KTHREAD); + + fpregs_lock(); + + fpregs_assert_state_consistent(); + + if (test_thread_flag(TIF_NEED_FPU_LOAD)) + fpregs_restore_userregs(); +} +EXPORT_SYMBOL_GPL(fpu_lock_and_load); + #ifdef CONFIG_X86_DEBUG_FPU /* * If current FPU state according to its tracking (loaded FPU context on this From patchwork Thu Sep 29 22:29:04 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 12994661 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 96101C433FE for ; Thu, 29 Sep 2022 22:30:02 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 2C41C6B007E; Thu, 29 Sep 2022 18:30:02 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 29AF26B0080; Thu, 29 Sep 2022 18:30:02 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0EBF08D0001; Thu, 29 Sep 2022 18:30:02 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id F0C526B007E for ; Thu, 29 Sep 2022 18:30:01 -0400 (EDT) Received: from smtpin16.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id CDED31213F1 for ; Thu, 29 Sep 2022 22:30:01 +0000 (UTC) X-FDA: 79966567002.16.32D508D Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by imf12.hostedemail.com (Postfix) with ESMTP id 0F5DD4000E for ; Thu, 29 Sep 2022 22:30:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664490601; x=1696026601; h=from:to:cc:subject:date:message-id:in-reply-to: references; 
bh=MwsMCEjvOOofiHSsEHK6TEhAP3HReNlZTKgCWCup6IY=; b=nuH6xA9Z6OOBzRm3boJQ5pL12eqhbwwTH4zxa31nWXalyY9NvfhLDIZV dOGB5iXEU2P6FX/TPuS4LEPY93ERclKArw9eA6wbvc1ARQMJr3HA86tQy 1Jwasyr5R2KTGYedMxDtfr8MKUVNkEfwn2GSt5xM7opnT4BdDdK5c5CaP GeSRFeZEWF41d7akNEEcjDs/x5+MdfCKHUrcVbU7b/fsaZHoZ8G+vLWph xkjJ7TQDmgvlfw0wotIwjeHArX6qdlBtPxGokMpeBrsOxti2je9eTTc6J evpTG7HhFOWRAmMupbJb2aoJ29X6NPiIp3FfhCtcVMEbE6p9Z2xpU8qV4 A==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="328420344" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="328420344" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:30:00 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="691016114" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="691016114" Received: from sergungo-mobl.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.251.25.88]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:29:58 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . 
Shutemov" , joao.moreira@intel.com, John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Michael Kerrisk Subject: [PATCH v2 07/39] x86/cet: Add user control-protection fault handler Date: Thu, 29 Sep 2022 15:29:04 -0700 Message-Id: <20220929222936.14584-8-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220929222936.14584-1-rick.p.edgecombe@intel.com> References: <20220929222936.14584-1-rick.p.edgecombe@intel.com> ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1664490601; a=rsa-sha256; cv=none; b=z7LWwsNcJEwTYPvyHZpYJ2IDLaKT/z/PMdfZ9QThogfH8IYj0qMYMIdKC9UNeoFggIA6Vk sViF5zb63n2Jb/h1/RPemmKeW/q66pMn4TupsKWyO38XPQDhSQZYyW8FiBssMY8VoJu1R6 y+SuZgmfznYuoAKYSgnMj11vLbxMlSQ= ARC-Authentication-Results: i=1; imf12.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=nuH6xA9Z; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf12.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.88 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1664490601; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references:dkim-signature; bh=p07br9KQ0mXiKl3gBP/Q9u5oYPFg55v0GjOQLgtOFbk=; b=tHxhdZkbQg5AcLwG2OLLPYVVcpeqRo0hfEUtBB39goQj4Gv9C5RklYO3RukEEyxuAxMiUB wvGYkjGjAKdgAUg6RDsN0dNs4+TOG12h/hS9k5PEWThYsL0n3YZ9UxQxCri8AwckB4nhe0 by8sCZyQDvwvfvvcNMLDPx9TTnPAgNs= X-Rspamd-Queue-Id: 0F5DD4000E X-Rspam-User: Authentication-Results: imf12.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=nuH6xA9Z; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf12.hostedemail.com: domain of 
rick.p.edgecombe@intel.com designates 192.55.52.88 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com X-Rspamd-Server: rspam03 X-Stat-Signature: 8zh8kqfqnkkpnw45e986ws8co6judwja X-HE-Tag: 1664490600-965203 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yu-cheng Yu A control-protection fault is triggered when a control-flow transfer attempt violates Shadow Stack or Indirect Branch Tracking constraints. For example, the return address for a RET instruction differs from the copy on the shadow stack. There already exists a control-protection fault handler for handling kernel IBT. Refactor this fault handler into separate user and kernel handlers, like the page fault handler. Add a control-protection handler for usermode. The control-protection fault handler works in a similar way to the general protection fault handler. It provides the si_code SEGV_CPERR to the signal handler. Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Cc: Kees Cook Cc: Michael Kerrisk Reviewed-by: Kees Cook --- v2: - Integrate with kernel IBT fault handler - Update printed messages. (Dave) - Remove array_index_nospec() usage. (Dave) - Remove IBT messages. (Dave) - Add enclave error code bit processing in case it can get triggered somehow. - Add extra "unknown" in control_protection_err. v1: - Update static asserts for NSIGSEGV Yu-cheng v29: - Remove pr_emerg() since it is followed by die(). - Change boot_cpu_has() to cpu_feature_enabled(). Yu-cheng v25: - Change CONFIG_X86_CET to CONFIG_X86_SHADOW_STACK. - Change X86_FEATURE_CET to X86_FEATURE_SHSTK.
arch/arm/kernel/signal.c | 2 +- arch/arm64/kernel/signal.c | 2 +- arch/arm64/kernel/signal32.c | 2 +- arch/sparc/kernel/signal32.c | 2 +- arch/sparc/kernel/signal_64.c | 2 +- arch/x86/include/asm/idtentry.h | 2 +- arch/x86/kernel/idt.c | 2 +- arch/x86/kernel/signal_compat.c | 2 +- arch/x86/kernel/traps.c | 98 ++++++++++++++++++++++++++---- arch/x86/xen/enlighten_pv.c | 2 +- arch/x86/xen/xen-asm.S | 2 +- include/uapi/asm-generic/siginfo.h | 3 +- 12 files changed, 97 insertions(+), 24 deletions(-) diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c index ea128e32e8ca..fa47b8754624 100644 --- a/arch/arm/kernel/signal.c +++ b/arch/arm/kernel/signal.c @@ -681,7 +681,7 @@ asmlinkage void do_rseq_syscall(struct pt_regs *regs) */ static_assert(NSIGILL == 11); static_assert(NSIGFPE == 15); -static_assert(NSIGSEGV == 9); +static_assert(NSIGSEGV == 10); static_assert(NSIGBUS == 5); static_assert(NSIGTRAP == 6); static_assert(NSIGCHLD == 6); diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c index 9ad911f1647c..81b13a21046e 100644 --- a/arch/arm64/kernel/signal.c +++ b/arch/arm64/kernel/signal.c @@ -1166,7 +1166,7 @@ void __init minsigstksz_setup(void) */ static_assert(NSIGILL == 11); static_assert(NSIGFPE == 15); -static_assert(NSIGSEGV == 9); +static_assert(NSIGSEGV == 10); static_assert(NSIGBUS == 5); static_assert(NSIGTRAP == 6); static_assert(NSIGCHLD == 6); diff --git a/arch/arm64/kernel/signal32.c b/arch/arm64/kernel/signal32.c index 4700f8522d27..bbd542704730 100644 --- a/arch/arm64/kernel/signal32.c +++ b/arch/arm64/kernel/signal32.c @@ -460,7 +460,7 @@ void compat_setup_restart_syscall(struct pt_regs *regs) */ static_assert(NSIGILL == 11); static_assert(NSIGFPE == 15); -static_assert(NSIGSEGV == 9); +static_assert(NSIGSEGV == 10); static_assert(NSIGBUS == 5); static_assert(NSIGTRAP == 6); static_assert(NSIGCHLD == 6); diff --git a/arch/sparc/kernel/signal32.c b/arch/sparc/kernel/signal32.c index dad38960d1a8..82da8a2d769d 100644 --- 
a/arch/sparc/kernel/signal32.c +++ b/arch/sparc/kernel/signal32.c @@ -751,7 +751,7 @@ asmlinkage int do_sys32_sigstack(u32 u_ssptr, u32 u_ossptr, unsigned long sp) */ static_assert(NSIGILL == 11); static_assert(NSIGFPE == 15); -static_assert(NSIGSEGV == 9); +static_assert(NSIGSEGV == 10); static_assert(NSIGBUS == 5); static_assert(NSIGTRAP == 6); static_assert(NSIGCHLD == 6); diff --git a/arch/sparc/kernel/signal_64.c b/arch/sparc/kernel/signal_64.c index 570e43e6fda5..b4e410976e0d 100644 --- a/arch/sparc/kernel/signal_64.c +++ b/arch/sparc/kernel/signal_64.c @@ -562,7 +562,7 @@ void do_notify_resume(struct pt_regs *regs, unsigned long orig_i0, unsigned long */ static_assert(NSIGILL == 11); static_assert(NSIGFPE == 15); -static_assert(NSIGSEGV == 9); +static_assert(NSIGSEGV == 10); static_assert(NSIGBUS == 5); static_assert(NSIGTRAP == 6); static_assert(NSIGCHLD == 6); diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h index 72184b0b2219..6768c9d4468c 100644 --- a/arch/x86/include/asm/idtentry.h +++ b/arch/x86/include/asm/idtentry.h @@ -618,7 +618,7 @@ DECLARE_IDTENTRY_RAW_ERRORCODE(X86_TRAP_DF, xenpv_exc_double_fault); #endif /* #CP */ -#ifdef CONFIG_X86_KERNEL_IBT +#if defined(CONFIG_X86_KERNEL_IBT) || defined(CONFIG_X86_SHADOW_STACK) DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_CP, exc_control_protection); #endif diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c index a58c6bc1cd68..90cce3614ead 100644 --- a/arch/x86/kernel/idt.c +++ b/arch/x86/kernel/idt.c @@ -107,7 +107,7 @@ static const __initconst struct idt_data def_idts[] = { ISTG(X86_TRAP_MC, asm_exc_machine_check, IST_INDEX_MCE), #endif -#ifdef CONFIG_X86_KERNEL_IBT +#if defined(CONFIG_X86_KERNEL_IBT) || defined(CONFIG_X86_SHADOW_STACK) INTG(X86_TRAP_CP, asm_exc_control_protection), #endif diff --git a/arch/x86/kernel/signal_compat.c b/arch/x86/kernel/signal_compat.c index 879ef8c72f5c..d441804443d5 100644 --- a/arch/x86/kernel/signal_compat.c +++ 
b/arch/x86/kernel/signal_compat.c @@ -27,7 +27,7 @@ static inline void signal_compat_build_tests(void) */ BUILD_BUG_ON(NSIGILL != 11); BUILD_BUG_ON(NSIGFPE != 15); - BUILD_BUG_ON(NSIGSEGV != 9); + BUILD_BUG_ON(NSIGSEGV != 10); BUILD_BUG_ON(NSIGBUS != 5); BUILD_BUG_ON(NSIGTRAP != 6); BUILD_BUG_ON(NSIGCHLD != 6); diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index d62b2cb85cea..b7dde8730236 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -211,12 +211,6 @@ DEFINE_IDTENTRY(exc_overflow) do_error_trap(regs, 0, "overflow", X86_TRAP_OF, SIGSEGV, 0, NULL); } -#ifdef CONFIG_X86_KERNEL_IBT - -static __ro_after_init bool ibt_fatal = true; - -extern void ibt_selftest_ip(void); /* code label defined in asm below */ - enum cp_error_code { CP_EC = (1 << 15) - 1, @@ -229,16 +223,74 @@ enum cp_error_code { CP_ENCL = 1 << 15, }; -DEFINE_IDTENTRY_ERRORCODE(exc_control_protection) +#ifdef CONFIG_X86_SHADOW_STACK +static const char * const control_protection_err[] = { + "unknown", + "near-ret", + "far-ret/iret", + "endbranch", + "rstorssp", + "setssbsy", +}; + +static DEFINE_RATELIMIT_STATE(cpf_rate, DEFAULT_RATELIMIT_INTERVAL, + DEFAULT_RATELIMIT_BURST); + +static void do_user_control_protection_fault(struct pt_regs *regs, + unsigned long error_code) { - if (!cpu_feature_enabled(X86_FEATURE_IBT)) { - pr_err("Unexpected #CP\n"); - BUG(); + struct task_struct *tsk; + unsigned long ssp; + + /* Read SSP before enabling interrupts. */ + rdmsrl(MSR_IA32_PL3_SSP, ssp); + + cond_local_irq_enable(regs); + + if (!cpu_feature_enabled(X86_FEATURE_SHSTK)) + WARN_ONCE(1, "User-mode control protection fault with shadow support disabled\n"); + + tsk = current; + tsk->thread.error_code = error_code; + tsk->thread.trap_nr = X86_TRAP_CP; + + /* Ratelimit to prevent log spamming. 
*/ + if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV) && + __ratelimit(&cpf_rate)) { + unsigned int cpec; + + cpec = error_code & CP_EC; + if (cpec >= ARRAY_SIZE(control_protection_err)) + cpec = 0; + + pr_emerg("%s[%d] control protection ip:%lx sp:%lx ssp:%lx error:%lx(%s)%s", + tsk->comm, task_pid_nr(tsk), + regs->ip, regs->sp, ssp, error_code, + control_protection_err[cpec], + error_code & CP_ENCL ? " in enclave" : ""); + print_vma_addr(KERN_CONT " in ", regs->ip); + pr_cont("\n"); } - if (WARN_ON_ONCE(user_mode(regs) || (error_code & CP_EC) != CP_ENDBR)) - return; + force_sig_fault(SIGSEGV, SEGV_CPERR, (void __user *)0); + cond_local_irq_disable(regs); +} +#else +static void do_user_control_protection_fault(struct pt_regs *regs, + unsigned long error_code) +{ + WARN_ONCE(1, "User-mode control protection fault with shadow support disabled\n"); +} +#endif + +#ifdef CONFIG_X86_KERNEL_IBT + +static __ro_after_init bool ibt_fatal = true; + +extern void ibt_selftest_ip(void); /* code label defined in asm below */ +static void do_kernel_control_protection_fault(struct pt_regs *regs) +{ if (unlikely(regs->ip == (unsigned long)&ibt_selftest_ip)) { regs->ax = 0; return; @@ -283,9 +335,29 @@ static int __init ibt_setup(char *str) } __setup("ibt=", ibt_setup); - +#else +static void do_kernel_control_protection_fault(struct pt_regs *regs) +{ + WARN_ONCE(1, "Kernel-mode control protection fault with IBT disabled\n"); +} #endif /* CONFIG_X86_KERNEL_IBT */ +#if defined(CONFIG_X86_KERNEL_IBT) || defined(CONFIG_X86_SHADOW_STACK) +DEFINE_IDTENTRY_ERRORCODE(exc_control_protection) +{ + if (!cpu_feature_enabled(X86_FEATURE_IBT) && + !cpu_feature_enabled(X86_FEATURE_SHSTK)) { + pr_err("Unexpected #CP\n"); + BUG(); + } + + if (user_mode(regs)) + do_user_control_protection_fault(regs, error_code); + else + do_kernel_control_protection_fault(regs); +} +#endif /* defined(CONFIG_X86_KERNEL_IBT) || defined(CONFIG_X86_SHADOW_STACK) */ + #ifdef CONFIG_X86_F00F_BUG void 
handle_invalid_op(struct pt_regs *regs) #else diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c index 0ed2e487a693..57faa287163f 100644 --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -628,7 +628,7 @@ static struct trap_array_entry trap_array[] = { TRAP_ENTRY(exc_coprocessor_error, false ), TRAP_ENTRY(exc_alignment_check, false ), TRAP_ENTRY(exc_simd_coprocessor_error, false ), -#ifdef CONFIG_X86_KERNEL_IBT +#if defined(CONFIG_X86_KERNEL_IBT) || defined(CONFIG_X86_SHADOW_STACK) TRAP_ENTRY(exc_control_protection, false ), #endif }; diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S index 6b4fdf6b9542..e45ff6300c7d 100644 --- a/arch/x86/xen/xen-asm.S +++ b/arch/x86/xen/xen-asm.S @@ -148,7 +148,7 @@ xen_pv_trap asm_exc_page_fault xen_pv_trap asm_exc_spurious_interrupt_bug xen_pv_trap asm_exc_coprocessor_error xen_pv_trap asm_exc_alignment_check -#ifdef CONFIG_X86_KERNEL_IBT +#if defined(CONFIG_X86_KERNEL_IBT) || defined(CONFIG_X86_SHADOW_STACK) xen_pv_trap asm_exc_control_protection #endif #ifdef CONFIG_X86_MCE diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h index ffbe4cec9f32..0f52d0ac47c5 100644 --- a/include/uapi/asm-generic/siginfo.h +++ b/include/uapi/asm-generic/siginfo.h @@ -242,7 +242,8 @@ typedef struct siginfo { #define SEGV_ADIPERR 7 /* Precise MCD exception */ #define SEGV_MTEAERR 8 /* Asynchronous ARM MTE error */ #define SEGV_MTESERR 9 /* Synchronous ARM MTE exception */ -#define NSIGSEGV 9 +#define SEGV_CPERR 10 /* Control protection fault */ +#define NSIGSEGV 10 /* * SIGBUS si_codes From patchwork Thu Sep 29 22:29:05 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 12994662 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by 
smtp.lore.kernel.org (Postfix) with ESMTP id 69A81C4332F for ; Thu, 29 Sep 2022 22:30:04 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 09F446B0080; Thu, 29 Sep 2022 18:30:04 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 04FF48D0001; Thu, 29 Sep 2022 18:30:04 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DE3C26B0082; Thu, 29 Sep 2022 18:30:03 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id D05DB6B0080 for ; Thu, 29 Sep 2022 18:30:03 -0400 (EDT) Received: from smtpin27.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id B1A6B813CB for ; Thu, 29 Sep 2022 22:30:03 +0000 (UTC) X-FDA: 79966567086.27.4488325 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by imf12.hostedemail.com (Postfix) with ESMTP id 1F7CE40009 for ; Thu, 29 Sep 2022 22:30:02 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664490603; x=1696026603; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=SxXCrRN2y+PQFpGewVcJHXNAHqkkdqYVarKu7dXm5kc=; b=jI+ZTxGUpuVcbznEetAoSbJD6GKKimte+u/IT5rPtt/g7Pcnsz6XBY+f Pv/lgQqWf+vywihLRaJLqXle8bc4UPDN/FysN/tBaIrMiytzizldD0QAy tHz6KFciHy72WVnPhP0pX6zLBkDPehJgbZdEi3q3VfOs/vmNBAULom4d+ H9QW01cNCTcnrGfKYYkmChXywOcSXtgM4FCpmlaMRgNskeT92h0OiurzL 6KvBtySzlznfAbiYYcXN3Dakk34eAtSJswC2+m4/fPm1/8h3VivptJ+XZ RMtUUvGyusuRSCTDpXJnMeQ3DKJ1IRKgypn5p76lFI6wNgB+8+D3V7q1h A==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="328420352" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="328420352" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:30:02 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="691016133" 
X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="691016133" Received: from sergungo-mobl.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.251.25.88]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:30:00 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , joao.moreira@intel.com, John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Christoph Hellwig Subject: [PATCH v2 08/39] x86/mm: Remove _PAGE_DIRTY from kernel RO pages Date: Thu, 29 Sep 2022 15:29:05 -0700 Message-Id: <20220929222936.14584-9-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220929222936.14584-1-rick.p.edgecombe@intel.com> References: <20220929222936.14584-1-rick.p.edgecombe@intel.com> ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1664490603; a=rsa-sha256; cv=none; b=DqR8gpm9De/MrfiwL4CD8mAkctTqJ9Tbs5siVA7L3JKjJc4d/QRRCak56vuHoyBoWoOLez zXKFDRGq9k5Rc+ibeyjgi8LIF8mtxVsiRWQqhPXs69c5bXoXMxZA2DSCKcH6U0nysEcCm5 T7QAJiuY7/GBoqQ9IEu2BXiWBHOE+O0= ARC-Authentication-Results: i=1; imf12.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=jI+ZTxGU; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf12.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.88 as permitted sender) 
smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1664490603; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references:dkim-signature; bh=D79JqGchwmU9nIUWIHs/9FxJjSmWLWFdR7CBX3VhdCM=; b=ZT1TaHBJvIfVValYjBVzTAc84fIr1XGAtpl78Ow9HBJJ+Yuvbj4O4kDRSP65B8s/SLObyF /Y6WGOhd3yJvSlhOuvEftYPyX6oqTNO8SFJXI6tmSn9HqzNCQc3d9mWUus/VAG/gxZQTLn cZa1o3JmjmvxJas/bKU2K7PHZFR1PE8= X-Rspamd-Queue-Id: 1F7CE40009 X-Rspam-User: Authentication-Results: imf12.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=jI+ZTxGU; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf12.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.88 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com X-Rspamd-Server: rspam03 X-Stat-Signature: smq7qqrxshspb4jkqctjd6b4f4wo9xtp X-HE-Tag: 1664490602-284817 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yu-cheng Yu Processors sometimes directly create Write=0,Dirty=1 PTEs. Such PTEs are also created by software. One such case is that kernel read-only pages are historically set up as Dirty. New processors that support Shadow Stack regard Write=0,Dirty=1 PTEs as shadow stack pages. When CR4.CET=1 and IA32_S_CET.SH_STK_EN=1, some instructions can write to such supervisor memory. The kernel does not set IA32_S_CET.SH_STK_EN, but to reduce ambiguity between shadow stack and regular Write=0 pages, remove Dirty=1 from any kernel Write=0 PTEs. Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Cc: "H. 
Peter Anvin" Cc: Kees Cook Cc: Thomas Gleixner Cc: Dave Hansen Cc: Christoph Hellwig Cc: Andy Lutomirski Cc: Ingo Molnar Cc: Borislav Petkov Cc: Peter Zijlstra --- v2: - Normalize PTE bit descriptions between patches arch/x86/include/asm/pgtable_types.h | 6 +++--- arch/x86/mm/pat/set_memory.c | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h index aa174fed3a71..ff82237e7b6b 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -192,10 +192,10 @@ enum page_cache_mode { #define _KERNPG_TABLE (__PP|__RW| 0|___A| 0|___D| 0| 0| _ENC) #define _PAGE_TABLE_NOENC (__PP|__RW|_USR|___A| 0|___D| 0| 0) #define _PAGE_TABLE (__PP|__RW|_USR|___A| 0|___D| 0| 0| _ENC) -#define __PAGE_KERNEL_RO (__PP| 0| 0|___A|__NX|___D| 0|___G) -#define __PAGE_KERNEL_ROX (__PP| 0| 0|___A| 0|___D| 0|___G) +#define __PAGE_KERNEL_RO (__PP| 0| 0|___A|__NX| 0| 0|___G) +#define __PAGE_KERNEL_ROX (__PP| 0| 0|___A| 0| 0| 0|___G) #define __PAGE_KERNEL_NOCACHE (__PP|__RW| 0|___A|__NX|___D| 0|___G| __NC) -#define __PAGE_KERNEL_VVAR (__PP| 0|_USR|___A|__NX|___D| 0|___G) +#define __PAGE_KERNEL_VVAR (__PP| 0|_USR|___A|__NX| 0| 0|___G) #define __PAGE_KERNEL_LARGE (__PP|__RW| 0|___A|__NX|___D|_PSE|___G) #define __PAGE_KERNEL_LARGE_EXEC (__PP|__RW| 0|___A| 0|___D|_PSE|___G) #define __PAGE_KERNEL_WP (__PP|__RW| 0|___A|__NX|___D| 0|___G| __WP) diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c index 1abd5438f126..ed9193b469ba 100644 --- a/arch/x86/mm/pat/set_memory.c +++ b/arch/x86/mm/pat/set_memory.c @@ -1977,7 +1977,7 @@ int set_memory_nx(unsigned long addr, int numpages) int set_memory_ro(unsigned long addr, int numpages) { - return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_RW), 0); + return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_RW | _PAGE_DIRTY), 0); } int set_memory_rw(unsigned long addr, int numpages) From patchwork Thu Sep 
29 22:29:06 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 12994663 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6B3DFC433F5 for ; Thu, 29 Sep 2022 22:30:06 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0C7138D0002; Thu, 29 Sep 2022 18:30:06 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 077E58D0001; Thu, 29 Sep 2022 18:30:05 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DBD578D0002; Thu, 29 Sep 2022 18:30:05 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id C96F18D0001 for ; Thu, 29 Sep 2022 18:30:05 -0400 (EDT) Received: from smtpin08.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id 9DD11140203 for ; Thu, 29 Sep 2022 22:30:05 +0000 (UTC) X-FDA: 79966567170.08.2EE37FF Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by imf12.hostedemail.com (Postfix) with ESMTP id CF7CA40010 for ; Thu, 29 Sep 2022 22:30:04 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664490604; x=1696026604; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=EMJDMBIIqSBkyZw8axj1oJhYe4QJUZ5xpfQW/D7WNF8=; b=XgMcTRNxrfnXq0TQMrmkGCxbF8OzdR26ii4iFT8ZjrTQ69SsHWGSa13k TuKamjNduRa1QBwrvQ6lKd2FPXP8HGGePkkOhGLhsLYmTojgMqYfIa8Bl YxU9jcrEBpSjsucBtPnDpnxsTYYODEZ4ms/eFpk4eN94t9lSjV8f9fr8v /rnzbcmnZt9i1Bxwu6Jg/UtOXDH6gi9UCj0rpS5LHv0aGYLiCEe2bQhxU ME/yYY8URTPNz5MBuhT+yK4uwaVYo7w7MpbDh+CLMqmlxu+mqpfYwfVeC 
esLmQT/GpIKelQG8zRZuxEWt6PFvrzxxU9DfstaPmCjVd9CaPCHY2FThz g==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="328420363" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="328420363" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:30:04 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="691016153" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="691016153" Received: from sergungo-mobl.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.251.25.88]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:30:02 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . 
Shutemov" , joao.moreira@intel.com, John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v2 09/39] x86/mm: Move pmd_write(), pud_write() up in the file Date: Thu, 29 Sep 2022 15:29:06 -0700 Message-Id: <20220929222936.14584-10-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220929222936.14584-1-rick.p.edgecombe@intel.com> References: <20220929222936.14584-1-rick.p.edgecombe@intel.com> ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1664490605; a=rsa-sha256; cv=none; b=X+yYAcBWDRku041YGJnpmC8OBkvWYXHh3BbmUKv3PzE7jIAzUVcCYQE4KPePhgmDCZqu3X Xs9ZfDCa/vTnr8hsrHHdB7XcjTdZCpQ9+dVYIN+1Qhc/fXrgKam1QrY1iMas5nxARDdTyc oKk6tPtbBPyCjKhO2qaZkTPkDdirdeM= ARC-Authentication-Results: i=1; imf12.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=XgMcTRNx; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf12.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.88 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1664490605; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references:dkim-signature; bh=sQaItF/InnS1iMhJqwSDM3wa0XHsPz05Q0poFreRGNk=; b=WnvgdvW1jGdm2C2agz3ogJHq2rfhY2WTKDtchi2hJoJOyUQQFr2S4C0aqeqifJyi1dUCV9 0evYAa85XXXf/bOd/PIfjGIDjV4yVBtmZ/VCgM1CRPA3BXWYB9SHb4C1/srNkbRHOFBiw1 4xk4rHlCewZPwZVY2EUdeyPZj8KRPKU= X-Rspamd-Queue-Id: CF7CA40010 X-Rspam-User: Authentication-Results: imf12.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=XgMcTRNx; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf12.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 
192.55.52.88 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com X-Rspamd-Server: rspam03 X-Stat-Signature: ah36iuu47ptzrftyo9subidwe1zpptgs X-HE-Tag: 1664490604-76936 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yu-cheng Yu To prepare for the introduction of _PAGE_COW, move pmd_write() and pud_write() up in the file, so that they can be used by other helpers below. No functional changes. Signed-off-by: Yu-cheng Yu Reviewed-by: Kirill A. Shutemov Signed-off-by: Rick Edgecombe Reviewed-by: Kees Cook --- arch/x86/include/asm/pgtable.h | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 44e2d6f1dbaa..6496ec84b953 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -159,6 +159,18 @@ static inline int pte_write(pte_t pte) return pte_flags(pte) & _PAGE_RW; } +#define pmd_write pmd_write +static inline int pmd_write(pmd_t pmd) +{ + return pmd_flags(pmd) & _PAGE_RW; +} + +#define pud_write pud_write +static inline int pud_write(pud_t pud) +{ + return pud_flags(pud) & _PAGE_RW; +} + static inline int pte_huge(pte_t pte) { return pte_flags(pte) & _PAGE_PSE; @@ -1102,12 +1114,6 @@ extern int pmdp_clear_flush_young(struct vm_area_struct *vma, unsigned long address, pmd_t *pmdp); -#define pmd_write pmd_write -static inline int pmd_write(pmd_t pmd) -{ - return pmd_flags(pmd) & _PAGE_RW; -} - #define __HAVE_ARCH_PMDP_HUGE_GET_AND_CLEAR static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp) @@ -1137,12 +1143,6 @@ static inline void pmdp_set_wrprotect(struct mm_struct *mm, clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp); } -#define pud_write pud_write -static inline int pud_write(pud_t pud) -{ - return pud_flags(pud) & _PAGE_RW; -} - #ifndef pmdp_establish #define 
pmdp_establish pmdp_establish static inline pmd_t pmdp_establish(struct vm_area_struct *vma, From patchwork Thu Sep 29 22:29:07 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 12994665 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id E915AC433F5 for ; Thu, 29 Sep 2022 22:30:08 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 841378D0003; Thu, 29 Sep 2022 18:30:08 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 819378D0001; Thu, 29 Sep 2022 18:30:08 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 691878D0003; Thu, 29 Sep 2022 18:30:08 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 54FAA8D0001 for ; Thu, 29 Sep 2022 18:30:08 -0400 (EDT) Received: from smtpin19.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id A3F1816052D for ; Thu, 29 Sep 2022 22:30:07 +0000 (UTC) X-FDA: 79966567254.19.CBB780F Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by imf12.hostedemail.com (Postfix) with ESMTP id E536040004 for ; Thu, 29 Sep 2022 22:30:06 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664490606; x=1696026606; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=g1pYpuPS14apQh0BEoJpa6H8/WMotUpKE4DNuLPNKDA=; b=b7fg3dfzDSbwb2fn2ZwlyUnm/iLa1y4ovMELk+fJfNiZVr5sUKo1RRKT 6d8IYVIdX7cMeASEdfuLIwqhdHwgwHvsHI0Onfw4T5JKxxfe3duS0A8hC fZbkCWimJ1Z0g31jN6qVr+wb/SJV5uEU8+KGLEpqXunOiXa7SzNbO+jKH 
rdIDwrZc38trA2crNxzo6xklPWpdNCh8GudTyxRWbzsfFFTvb0CHga30P H2/iFyd5ky84ioJl7rgEmeRrsrriPuKoMtcOz5Pjh3d4Bg6ky7fPFihLC p7fuA/1K+R8n1oz2gINiCrYDHKWqNKoX75JRfwiZH4+7W6UFEHd4aY/nA A==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="328420385" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="328420385" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:30:06 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="691016164" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="691016164" Received: from sergungo-mobl.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.251.25.88]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:30:04 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . 
Shutemov" , joao.moreira@intel.com, John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v2 10/39] x86/mm: Introduce _PAGE_COW Date: Thu, 29 Sep 2022 15:29:07 -0700 Message-Id: <20220929222936.14584-11-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220929222936.14584-1-rick.p.edgecombe@intel.com> References: <20220929222936.14584-1-rick.p.edgecombe@intel.com> ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1664490607; a=rsa-sha256; cv=none; b=IpJPyssSjb115HlpRQiKti2jWf49Tc810AxpPjox6u80TpNVgD9wSRuGWPxRqyRumxiM6I ETvT3jbRGt6eP8fRrs2akDUJW3O9D3M5gTsKMGwekqjCd/WbSiHiL8GlNxL8ZB99IFiASX EebpoDEu0AEbrqyNge9zn9sziGYo72o= ARC-Authentication-Results: i=1; imf12.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=b7fg3dfz; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf12.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.88 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1664490607; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references:dkim-signature; bh=KDEAJ5sJe0xhlBuZdGRKap9V0KZAW8vCJItUBln9Q4Q=; b=lcC3469hiEP3jDC18TVme8IPBnw39c9q5r/YWeBY99w2gWINezdguUiMuShCRnFDKM1VE+ BxepGOQ0hyGoLpYPyfZ6Ts9WpFTXANTXxpDpIRunMewrBFF4wR4agi4NkQk09DSz0Tg/qA ck9WRgFlYzih2+2RZ0OdOvHzzmy4JAg= X-Rspamd-Queue-Id: E536040004 X-Rspam-User: Authentication-Results: imf12.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=b7fg3dfz; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf12.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.88 as 
permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com X-Rspamd-Server: rspam03 X-Stat-Signature: qx1ptfnie3y16ioss96af6cfzhu3oa1f X-HE-Tag: 1664490606-68009 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yu-cheng Yu There is essentially no room left in the x86 hardware PTEs on some OSes (not Linux). That left the hardware architects looking for a way to represent a new memory type (shadow stack) within the existing bits. They chose to repurpose a lightly-used state: Write=0,Dirty=1. The reason it's lightly used is that Dirty=1 is normally set _before_ a write. A write with a Write=0 PTE would typically only generate a fault, not set Dirty=1. Hardware can (rarely) both set Dirty=1 *and* generate the fault, resulting in a Write=0,Dirty=1 PTE. Hardware which supports shadow stacks will no longer exhibit this oddity. The kernel should avoid inadvertently creating shadow stack memory because it is security sensitive. So given the above, all it needs to do is avoid manually creating Write=0,Dirty=1 PTEs in software. In places where Linux normally creates Write=0,Dirty=1, it can use the software-defined _PAGE_COW in place of the hardware _PAGE_DIRTY. In other words, whenever Linux needs to create Write=0,Dirty=1, it instead creates Write=0,Cow=1 except for shadow stack, which is Write=0,Dirty=1. This clearly separates shadow stack from other data, and results in the following: (a) (Write=0,Cow=1,Dirty=0) A modified, copy-on-write (COW) page. Previously when a typical anonymous writable mapping was made COW via fork(), the kernel would mark it Write=0,Dirty=1. Now it will instead use the Cow bit. (b) (Write=0,Cow=1,Dirty=0) A R/O page that has been COW'ed. The user page is in a R/O VMA, and get_user_pages() needs a writable copy. The page fault handler creates a copy of the page and sets the new copy's PTE as Write=0 and Cow=1. 
(c) (Write=0,Cow=0,Dirty=1) A shadow stack PTE.
(d) (Write=0,Cow=1,Dirty=0) A shared shadow stack PTE. When a shadow stack
    page is being shared among processes (this happens at fork()), its PTE
    is made Dirty=0, so the next shadow stack access causes a fault, and
    the page is duplicated and Dirty=1 is set again. This is the COW
    equivalent for shadow stack pages, even though it's copy-on-access
    rather than copy-on-write.
(e) (Write=0,Cow=0,Dirty=1) A Cow PTE created when a processor without
    shadow stack support set Dirty=1.

Define _PAGE_COW and update pte_*() helpers and apply the same changes to
pmd and pud.

There are six bits left available to software in the 64-bit PTE after
consuming a bit for _PAGE_COW. No space is consumed in 32-bit kernels
because shadow stacks are not enabled there.

Signed-off-by: Yu-cheng Yu
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
---
v2:
 - Update commit log with comments (Dave Hansen)
 - Add comments in code to explain pte modification code better (Dave)
 - Clarify info on the meaning of various Write,Cow,Dirty combinations

 arch/x86/include/asm/pgtable.h       | 210 ++++++++++++++++++++++++---
 arch/x86/include/asm/pgtable_types.h |  42 +++++-
 2 files changed, 231 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 6496ec84b953..ad201dae7316 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -124,9 +124,17 @@ extern pmdval_t early_pmd_flags;
  * The following only work if pte_present() is true.
  * Undefined behaviour if not..
  */
-static inline int pte_dirty(pte_t pte)
+static inline bool pte_dirty(pte_t pte)
 {
-	return pte_flags(pte) & _PAGE_DIRTY;
+	return pte_flags(pte) & _PAGE_DIRTY_BITS;
+}
+
+static inline bool pte_shstk(pte_t pte)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+		return false;
+
+	return (pte_flags(pte) & (_PAGE_RW | _PAGE_DIRTY)) == _PAGE_DIRTY;
 }

 static inline int pte_young(pte_t pte)
@@ -134,9 +142,17 @@ static inline int pte_young(pte_t pte)
 	return pte_flags(pte) & _PAGE_ACCESSED;
 }

-static inline int pmd_dirty(pmd_t pmd)
+static inline bool pmd_dirty(pmd_t pmd)
 {
-	return pmd_flags(pmd) & _PAGE_DIRTY;
+	return pmd_flags(pmd) & _PAGE_DIRTY_BITS;
+}
+
+static inline bool pmd_shstk(pmd_t pmd)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+		return false;
+
+	return (pmd_flags(pmd) & (_PAGE_RW | _PAGE_DIRTY)) == _PAGE_DIRTY;
 }

 static inline int pmd_young(pmd_t pmd)
@@ -144,9 +160,9 @@ static inline int pmd_young(pmd_t pmd)
 	return pmd_flags(pmd) & _PAGE_ACCESSED;
 }

-static inline int pud_dirty(pud_t pud)
+static inline bool pud_dirty(pud_t pud)
 {
-	return pud_flags(pud) & _PAGE_DIRTY;
+	return pud_flags(pud) & _PAGE_DIRTY_BITS;
 }

 static inline int pud_young(pud_t pud)
@@ -156,13 +172,21 @@ static inline int pud_young(pud_t pud)

 static inline int pte_write(pte_t pte)
 {
-	return pte_flags(pte) & _PAGE_RW;
+	/*
+	 * Shadow stack pages are logically writable, but do not have
+	 * _PAGE_RW. Check for them separately from _PAGE_RW itself.
+	 */
+	return (pte_flags(pte) & _PAGE_RW) || pte_shstk(pte);
 }

 #define pmd_write pmd_write
 static inline int pmd_write(pmd_t pmd)
 {
-	return pmd_flags(pmd) & _PAGE_RW;
+	/*
+	 * Shadow stack pages are logically writable, but do not have
+	 * _PAGE_RW. Check for them separately from _PAGE_RW itself.
+	 */
+	return (pmd_flags(pmd) & _PAGE_RW) || pmd_shstk(pmd);
 }

 #define pud_write pud_write
@@ -300,6 +324,44 @@ static inline pte_t pte_clear_flags(pte_t pte, pteval_t clear)
 	return native_make_pte(v & ~clear);
 }

+/*
+ * Normally the Dirty bit is used to denote COW memory on x86. But
+ * in the case of X86_FEATURE_SHSTK, the software COW bit is used,
+ * since the Dirty=1,Write=0 will result in the memory being treated
+ * as shadow stack by the HW. So when creating COW memory, a software
+ * bit, _PAGE_BIT_COW, is used. The following functions pte_mkcow() and
+ * pte_clear_cow() take a PTE marked conventionally COW (Dirty=1) and
+ * transition it to the shadow stack compatible version of COW (Cow=1).
+ */
+
+static inline pte_t pte_mkcow(pte_t pte)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+		return pte;
+
+	pte = pte_clear_flags(pte, _PAGE_DIRTY);
+	return pte_set_flags(pte, _PAGE_COW);
+}
+
+static inline pte_t pte_clear_cow(pte_t pte)
+{
+	/*
+	 * _PAGE_COW is unnecessary on !X86_FEATURE_SHSTK kernels.
+	 * See the _PAGE_COW definition for more details.
+	 */
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+		return pte;
+
+	/*
+	 * PTE is getting copied-on-write, so it will be dirtied
+	 * if writable, or made shadow stack if shadow stack and
+	 * being copied on access. Set the dirty bit for both
+	 * cases.
+	 */
+	pte = pte_set_flags(pte, _PAGE_DIRTY);
+	return pte_clear_flags(pte, _PAGE_COW);
+}
+
 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
 static inline int pte_uffd_wp(pte_t pte)
 {
@@ -319,7 +381,7 @@ static inline pte_t pte_clear_uffd_wp(pte_t pte)

 static inline pte_t pte_mkclean(pte_t pte)
 {
-	return pte_clear_flags(pte, _PAGE_DIRTY);
+	return pte_clear_flags(pte, _PAGE_DIRTY_BITS);
 }

 static inline pte_t pte_mkold(pte_t pte)
@@ -329,7 +391,16 @@ static inline pte_t pte_mkold(pte_t pte)

 static inline pte_t pte_wrprotect(pte_t pte)
 {
-	return pte_clear_flags(pte, _PAGE_RW);
+	pte = pte_clear_flags(pte, _PAGE_RW);
+
+	/*
+	 * Blindly clearing _PAGE_RW might accidentally create
+	 * a shadow stack PTE (Write=0,Dirty=1). Move the hardware
+	 * dirty value to the software bit.
+	 */
+	if (pte_dirty(pte))
+		pte = pte_mkcow(pte);
+	return pte;
 }

 static inline pte_t pte_mkexec(pte_t pte)
@@ -339,7 +410,19 @@ static inline pte_t pte_mkexec(pte_t pte)

 static inline pte_t pte_mkdirty(pte_t pte)
 {
-	return pte_set_flags(pte, _PAGE_DIRTY | _PAGE_SOFT_DIRTY);
+	pteval_t dirty = _PAGE_DIRTY;
+
+	/* Avoid creating Dirty=1,Write=0 PTEs */
+	if (cpu_feature_enabled(X86_FEATURE_SHSTK) && !pte_write(pte))
+		dirty = _PAGE_COW;
+
+	return pte_set_flags(pte, dirty | _PAGE_SOFT_DIRTY);
+}
+
+static inline pte_t pte_mkwrite_shstk(pte_t pte)
+{
+	/* pte_clear_cow() also sets Dirty=1 */
+	return pte_clear_cow(pte);
 }

 static inline pte_t pte_mkyoung(pte_t pte)
@@ -349,7 +432,12 @@ static inline pte_t pte_mkyoung(pte_t pte)

 static inline pte_t pte_mkwrite(pte_t pte)
 {
-	return pte_set_flags(pte, _PAGE_RW);
+	pte = pte_set_flags(pte, _PAGE_RW);
+
+	if (pte_dirty(pte))
+		pte = pte_clear_cow(pte);
+
+	return pte;
 }

 static inline pte_t pte_mkhuge(pte_t pte)
@@ -396,6 +484,26 @@ static inline pmd_t pmd_clear_flags(pmd_t pmd, pmdval_t clear)
 	return native_make_pmd(v & ~clear);
 }

+/* See comments above pte_mkcow() */
+static inline pmd_t pmd_mkcow(pmd_t pmd)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+		return pmd;
+
+	pmd = pmd_clear_flags(pmd, _PAGE_DIRTY);
+	return pmd_set_flags(pmd, _PAGE_COW);
+}
+
+/* See comments above pte_mkcow() */
+static inline pmd_t pmd_clear_cow(pmd_t pmd)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+		return pmd;
+
+	pmd = pmd_set_flags(pmd, _PAGE_DIRTY);
+	return pmd_clear_flags(pmd, _PAGE_COW);
+}
+
 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
 static inline int pmd_uffd_wp(pmd_t pmd)
 {
@@ -420,17 +528,36 @@ static inline pmd_t pmd_mkold(pmd_t pmd)

 static inline pmd_t pmd_mkclean(pmd_t pmd)
 {
-	return pmd_clear_flags(pmd, _PAGE_DIRTY);
+	return pmd_clear_flags(pmd, _PAGE_DIRTY_BITS);
 }

 static inline pmd_t pmd_wrprotect(pmd_t pmd)
 {
-	return pmd_clear_flags(pmd, _PAGE_RW);
+	pmd = pmd_clear_flags(pmd, _PAGE_RW);
+	/*
+	 * Blindly clearing _PAGE_RW might accidentally create
+	 * a shadow stack PMD (RW=0, Dirty=1). Move the hardware
+	 * dirty value to the software bit.
+	 */
+	if (pmd_dirty(pmd))
+		pmd = pmd_mkcow(pmd);
+	return pmd;
 }

 static inline pmd_t pmd_mkdirty(pmd_t pmd)
 {
-	return pmd_set_flags(pmd, _PAGE_DIRTY | _PAGE_SOFT_DIRTY);
+	pmdval_t dirty = _PAGE_DIRTY;
+
+	/* Avoid creating (HW)Dirty=1, Write=0 PMDs */
+	if (cpu_feature_enabled(X86_FEATURE_SHSTK) && !pmd_write(pmd))
+		dirty = _PAGE_COW;
+
+	return pmd_set_flags(pmd, dirty | _PAGE_SOFT_DIRTY);
+}
+
+static inline pmd_t pmd_mkwrite_shstk(pmd_t pmd)
+{
+	return pmd_clear_cow(pmd);
 }

 static inline pmd_t pmd_mkdevmap(pmd_t pmd)
@@ -450,7 +577,11 @@ static inline pmd_t pmd_mkyoung(pmd_t pmd)

 static inline pmd_t pmd_mkwrite(pmd_t pmd)
 {
-	return pmd_set_flags(pmd, _PAGE_RW);
+	pmd = pmd_set_flags(pmd, _PAGE_RW);
+
+	if (pmd_dirty(pmd))
+		pmd = pmd_clear_cow(pmd);
+	return pmd;
 }

 static inline pud_t pud_set_flags(pud_t pud, pudval_t set)
@@ -467,6 +598,26 @@ static inline pud_t pud_clear_flags(pud_t pud, pudval_t clear)
 	return native_make_pud(v & ~clear);
 }

+/* See comments above pte_mkcow() */
+static inline pud_t pud_mkcow(pud_t pud)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+		return pud;
+
+	pud = pud_clear_flags(pud, _PAGE_DIRTY);
+	return pud_set_flags(pud, _PAGE_COW);
+}
+
+/* See comments above pte_mkcow() */
+static inline pud_t pud_clear_cow(pud_t pud)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+		return pud;
+
+	pud = pud_set_flags(pud, _PAGE_DIRTY);
+	return pud_clear_flags(pud, _PAGE_COW);
+}
+
 static inline pud_t pud_mkold(pud_t pud)
 {
 	return pud_clear_flags(pud, _PAGE_ACCESSED);
@@ -474,17 +625,32 @@ static inline pud_t pud_mkold(pud_t pud)

 static inline pud_t pud_mkclean(pud_t pud)
 {
-	return pud_clear_flags(pud, _PAGE_DIRTY);
+	return pud_clear_flags(pud, _PAGE_DIRTY_BITS);
 }

 static inline pud_t pud_wrprotect(pud_t pud)
 {
-	return pud_clear_flags(pud, _PAGE_RW);
+	pud = pud_clear_flags(pud, _PAGE_RW);
+
+	/*
+	 * Blindly clearing _PAGE_RW might accidentally create
+	 * a shadow stack PUD (RW=0, Dirty=1). Move the hardware
+	 * dirty value to the software bit.
+	 */
+	if (pud_dirty(pud))
+		pud = pud_mkcow(pud);
+	return pud;
 }

 static inline pud_t pud_mkdirty(pud_t pud)
 {
-	return pud_set_flags(pud, _PAGE_DIRTY | _PAGE_SOFT_DIRTY);
+	pudval_t dirty = _PAGE_DIRTY;
+
+	/* Avoid creating (HW)Dirty=1, Write=0 PUDs */
+	if (cpu_feature_enabled(X86_FEATURE_SHSTK) && !pud_write(pud))
+		dirty = _PAGE_COW;
+
+	return pud_set_flags(pud, dirty | _PAGE_SOFT_DIRTY);
 }

 static inline pud_t pud_mkdevmap(pud_t pud)
@@ -504,7 +670,11 @@ static inline pud_t pud_mkyoung(pud_t pud)

 static inline pud_t pud_mkwrite(pud_t pud)
 {
-	return pud_set_flags(pud, _PAGE_RW);
+	pud = pud_set_flags(pud, _PAGE_RW);
+
+	if (pud_dirty(pud))
+		pud = pud_clear_cow(pud);
+	return pud;
 }

 #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index ff82237e7b6b..85d88c0f9618 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -21,7 +21,8 @@
 #define _PAGE_BIT_SOFTW2	10	/* " */
 #define _PAGE_BIT_SOFTW3	11	/* " */
 #define _PAGE_BIT_PAT_LARGE	12	/* On 2MB or 1GB pages */
-#define _PAGE_BIT_SOFTW4	58	/* available for programmer */
+#define _PAGE_BIT_SOFTW4	57	/* available for programmer */
+#define _PAGE_BIT_SOFTW5	58	/* available for programmer */
 #define _PAGE_BIT_PKEY_BIT0	59	/* Protection Keys, bit 1/4 */
 #define _PAGE_BIT_PKEY_BIT1	60	/* Protection Keys, bit 2/4 */
 #define _PAGE_BIT_PKEY_BIT2	61	/* Protection Keys, bit 3/4 */
@@ -34,6 +35,15 @@
 #define _PAGE_BIT_SOFT_DIRTY	_PAGE_BIT_SOFTW3 /* software dirty tracking */
 #define _PAGE_BIT_DEVMAP	_PAGE_BIT_SOFTW4

+/*
+ * Indicates a copy-on-write page.
+ */
+#ifdef CONFIG_X86_SHADOW_STACK
+#define _PAGE_BIT_COW		_PAGE_BIT_SOFTW5 /* copy-on-write */
+#else
+#define _PAGE_BIT_COW		0
+#endif
+
 /* If _PAGE_BIT_PRESENT is clear, we use these: */
 /* - if the user mapped it with PROT_NONE; pte_present gives true */
 #define _PAGE_BIT_PROTNONE	_PAGE_BIT_GLOBAL
@@ -117,6 +127,36 @@
 #define _PAGE_SOFTW4	(_AT(pteval_t, 0))
 #endif

+/*
+ * The hardware requires shadow stack to be read-only and Dirty.
+ * _PAGE_COW is a software-only bit used to separate copy-on-write PTEs
+ * from shadow stack PTEs:
+ * (a) (Write=0,Cow=1,Dirty=0) A modified, copy-on-write (COW) page.
+ *     Previously when a typical anonymous writable mapping was made COW via
+ *     fork(), the kernel would mark it Write=0,Dirty=1. Now it will instead
+ *     use the Cow bit.
+ * (b) (Write=0,Cow=1,Dirty=0) A R/O page that has been COW'ed. The user page
+ *     is in a R/O VMA, and get_user_pages() needs a writable copy. The page
+ *     fault handler creates a copy of the page and sets the new copy's PTE
+ *     as Write=0 and Cow=1.
+ * (c) (Write=0,Cow=0,Dirty=1) A shadow stack PTE.
+ * (d) (Write=0,Cow=1,Dirty=0) A shared shadow stack PTE. When a shadow stack
+ *     page is being shared among processes (this happens at fork()), its PTE
+ *     is made Dirty=0, so the next shadow stack access causes a fault, and
+ *     the page is duplicated and Dirty=1 is set again. This is the COW
+ *     equivalent for shadow stack pages, even though it's copy-on-access
+ *     rather than copy-on-write.
+ * (e) (Write=0,Cow=0,Dirty=1) A Cow PTE created when a processor without
+ *     shadow stack support set Dirty=1.
+ */
+#ifdef CONFIG_X86_SHADOW_STACK
+#define _PAGE_COW	(_AT(pteval_t, 1) << _PAGE_BIT_COW)
+#else
+#define _PAGE_COW	(_AT(pteval_t, 0))
+#endif
+
+#define _PAGE_DIRTY_BITS (_PAGE_DIRTY | _PAGE_COW)
+
 #define _PAGE_PROTNONE	(_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)

 /*

From patchwork Thu Sep 29 22:29:08 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 12994664 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 84919C4321E for ; Thu, 29 Sep 2022 22:30:10 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id DC2D68D0005; Thu, 29 Sep 2022 18:30:09 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id D96AC8D0001; Thu, 29 Sep 2022 18:30:09 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BC1BC8D0005; Thu, 29 Sep 2022 18:30:09 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id AEDA98D0001 for ; Thu, 29 Sep 2022 18:30:09 -0400 (EDT) Received: from smtpin25.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 7D50E813C6 for ; Thu, 29 Sep 2022 22:30:09 +0000 (UTC) X-FDA: 79966567338.25.D5D66DC Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by imf12.hostedemail.com (Postfix) with ESMTP id CB8E340009 for ; Thu, 29 Sep 2022 22:30:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664490608; x=1696026608; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=9UWqqA5zjHhGRdDkC9YrucJaLkwFDjky53Grbijfra4=; b=RNSM3EXryNNmY2lyrJvR4GwKaQVyFqqY5GshA9f4eGJAAoDmJZXoGg3H KlsXCY1BqNDhsH/NaS3so8zRIKsQuFQIGxVoKDtdlJ79dcOAd77Kc8Y7p Obt4bZmDAJEdZT5CRPp3HuycWJq3Hc9vo1F9d3kjA8KsJV3OavfilgfA1 u1Z0OjWxRdIa6BsqyekwPf0cKYG78PKDT8NcMJGkE88Z3qrifZ/NsNagS dJYd1wY9gtelKEIqczmSAOy6BVXUVUzMsnDzO9fi7qTgqEIYE+OL7sa2A Vyqr2di99EZWU1/QfXdP9HhqAxNMqEi/h7S9KIggPEadnFcMViJd/o1X3 w==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="328420394" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="328420394" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:30:08 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="691016179" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="691016179" Received: from sergungo-mobl.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.251.25.88]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:30:06 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . 
Shutemov" , joao.moreira@intel.com, John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v2 11/39] x86/mm: Update pte_modify for _PAGE_COW Date: Thu, 29 Sep 2022 15:29:08 -0700 Message-Id: <20220929222936.14584-12-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220929222936.14584-1-rick.p.edgecombe@intel.com> References: <20220929222936.14584-1-rick.p.edgecombe@intel.com> ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1664490609; a=rsa-sha256; cv=none; b=uQLrd5BGo6PTFTQJE68o9s3zTVN6Jf00AzSXIB/90MOprPnx4VV/YEt3l+dqW3LLiHgRmj ItnBtIkOEoV/lOGMZffj65ieBv9fOYIZWYBreOj3bMHDYmqS9U5XBqQbzWONxK4pLtFtJ3 Pi7Teal9Yon4s8ME55v8EKQfKknzddY= ARC-Authentication-Results: i=1; imf12.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=RNSM3EXr; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf12.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.88 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1664490609; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references:dkim-signature; bh=oXlmDvtKZSc+YU7n1KH/2dUKxM/yZCAfhNBElAj75kE=; b=s09xe77ePovEA2IOqA0EcUPN8PQiHOjRqr6CVlr+XRPok/04NHhb5oOYu11B0sZWp1X5FM biqhm0jC6djyKBbPbsYmrt8pFiyHCoUdPyMORuQEKTRnDL2V6UQqzkyq7tzBD+zqQwhhpJ i2+Vn7VQnStpE2u9gM/sOyt1H6Gl3Ww= X-Rspamd-Queue-Id: CB8E340009 X-Rspam-User: Authentication-Results: imf12.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=RNSM3EXr; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf12.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.88 
as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com X-Rspamd-Server: rspam03 X-Stat-Signature: dki6kb1dtou1hhzyrqs9c94rf8jyurfm X-HE-Tag: 1664490608-939466 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID:

From: Yu-cheng Yu

The Write=0,Dirty=1 PTE has been used to indicate copy-on-write pages.
However, newer x86 processors also regard a Write=0,Dirty=1 PTE as a
shadow stack page. In order to separate the two, the hardware _PAGE_DIRTY
is changed to the software-defined _PAGE_COW for the copy-on-write case,
and pte_*() are updated to do this.

pte_modify() takes a "raw" pgprot_t which was not necessarily created with
any of the existing PTE bit helpers. That means that it can return a pte_t
with Write=0,Dirty=1, a shadow stack PTE, when it did not intend to create
one.

pte_modify() changes a PTE to 'newprot' without going through the pte_*()
helpers. Modify it to also move _PAGE_DIRTY to _PAGE_COW. Apply the same
changes to pmd_modify().
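
As an illustration only (not part of the patch), the dirty-bit handling
described above can be modeled in user space. This is a minimal sketch under
stated assumptions: the MODEL_* bit positions and masks and the model_*()
helpers are hypothetical stand-ins, shadow stack support is assumed to be
always enabled, and the NX/PROT_NONE handling of the real pte_modify() is
left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout for this model; not taken from kernel headers. */
#define MODEL_RW	(UINT64_C(1) << 1)
#define MODEL_DIRTY	(UINT64_C(1) << 6)
#define MODEL_COW	(UINT64_C(1) << 58)
#define MODEL_PFN_MASK	UINT64_C(0x000ffffffffff000)
/* Simplified stand-in for _PAGE_CHG_MASK: bits kept from the old entry. */
#define MODEL_CHG_MASK	(MODEL_PFN_MASK | MODEL_DIRTY)

/* Dirty in either the hardware Dirty bit or the software Cow bit. */
static bool model_dirty(uint64_t pte)
{
	return (pte & (MODEL_DIRTY | MODEL_COW)) != 0;
}

/* Writable: Write=1, or a shadow stack entry (Write=0,Dirty=1). */
static bool model_write(uint64_t pte)
{
	return (pte & MODEL_RW) ||
	       (pte & (MODEL_RW | MODEL_DIRTY)) == MODEL_DIRTY;
}

/* Record dirtiness; a non-writable entry gets Cow instead of Dirty. */
static uint64_t model_mkdirty(uint64_t pte)
{
	return pte | (model_write(pte) ? MODEL_DIRTY : MODEL_COW);
}

/* Apply newprot, dropping Dirty first and re-deriving it afterwards. */
static uint64_t model_modify(uint64_t pte, uint64_t newprot)
{
	uint64_t keep = MODEL_CHG_MASK & ~MODEL_DIRTY;
	uint64_t val = (pte & keep) | (newprot & ~keep);

	if (model_dirty(pte))
		val = model_mkdirty(val);
	return val;
}
```

In this model, changing a Write=1,Dirty=1 entry to a read-only protection
yields Write=0,Cow=1 rather than Write=0,Dirty=1, which is the outcome the
paragraph above asks of pte_modify().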

Signed-off-by: Yu-cheng Yu
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
---
v2:
 - Update commit log with text and suggestions from (Dave Hansen)
 - Drop fixup_dirty_pte() in favor of clearing the HW dirty bit along with
   the _PAGE_CHG_MASK masking, then calling pte_mkdirty() (Dave Hansen)

 arch/x86/include/asm/pgtable.h | 41 +++++++++++++++++++++++++++++-----
 1 file changed, 35 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index ad201dae7316..2f2963429f48 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -790,26 +790,55 @@ static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask);

 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 {
+	pteval_t _page_chg_mask_no_dirty = _PAGE_CHG_MASK & ~_PAGE_DIRTY;
 	pteval_t val = pte_val(pte), oldval = val;
+	pte_t pte_result;

 	/*
 	 * Chop off the NX bit (if present), and add the NX portion of
 	 * the newprot (if present):
 	 */
-	val &= _PAGE_CHG_MASK;
-	val |= check_pgprot(newprot) & ~_PAGE_CHG_MASK;
+	val &= _page_chg_mask_no_dirty;
+	val |= check_pgprot(newprot) & ~_page_chg_mask_no_dirty;
 	val = flip_protnone_guard(oldval, val, PTE_PFN_MASK);
-	return __pte(val);
+
+	pte_result = __pte(val);
+
+	/*
+	 * Dirty bit is not preserved above so it can be done
+	 * in a special way for the shadow stack case, where it
+	 * needs to set _PAGE_COW. pte_mkdirty() will do this in
+	 * the case of shadow stack.
+	 */
+	if (pte_dirty(pte))
+		pte_result = pte_mkdirty(pte_result);
+
+	return pte_result;
 }

 static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
 {
+	pteval_t _hpage_chg_mask_no_dirty = _HPAGE_CHG_MASK & ~_PAGE_DIRTY;
 	pmdval_t val = pmd_val(pmd), oldval = val;
+	pmd_t pmd_result;

-	val &= _HPAGE_CHG_MASK;
-	val |= check_pgprot(newprot) & ~_HPAGE_CHG_MASK;
+	val &= _hpage_chg_mask_no_dirty;
+	val |= check_pgprot(newprot) & ~_hpage_chg_mask_no_dirty;
 	val = flip_protnone_guard(oldval, val, PHYSICAL_PMD_PAGE_MASK);
-	return __pmd(val);
+
+
+	pmd_result = __pmd(val);
+
+	/*
+	 * Dirty bit is not preserved above so it can be done
+	 * specially for the shadow stack case. It needs to move
+	 * the HW dirty bit to the software COW bit. Set in the
+	 * result if it was set in the original value.
+	 */
+	if (pmd_dirty(pmd))
+		pmd_result = pmd_mkdirty(pmd_result);
+
+	return pmd_result;
 }

 /*

From patchwork Thu Sep 29 22:29:09 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 12994666 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 76801C433FE for ; Thu, 29 Sep 2022 22:30:12 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id EF9208D0006; Thu, 29 Sep 2022 18:30:11 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id EA8668D0001; Thu, 29 Sep 2022 18:30:11 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id CFA188D0006; Thu, 29 Sep 2022 18:30:11 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id C14808D0001 for ; Thu, 29 Sep 2022 18:30:11 -0400 (EDT) Received: from smtpin01.hostedemail.com
(a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 908FD1C6E3A for ; Thu, 29 Sep 2022 22:30:11 +0000 (UTC) X-FDA: 79966567422.01.8744867 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by imf12.hostedemail.com (Postfix) with ESMTP id CE71440009 for ; Thu, 29 Sep 2022 22:30:10 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664490610; x=1696026610; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=eV9GA8OUMVYJDpffX6i9+4mzKnfsWlOOmYPUr5qRV9M=; b=PdG27S/4UWgyWjMOAV08UDmdHbVkcUt1D7FDX4uaykb7jiiyXF3v2aZf bZDgr1ajKWroAhxzR1FM8Qn2ruj0WVT/6OzSk4h8joYJKMmEST/Sxxh37 9usMkjENiNXiKyUB9wlGj6akmzuqKfPKmKbEszOrypK8/JPYG76hg39mR KLPKWGqs8T9mYgibXwhYPOXPtMGW1CrhYuSUjKHl+WM+bCkxlJspCkDg0 7W6ugvMuby8GT/jYql+YdxZmIll/FME8hMIQllSfrp3uZ5Q/6LdfOCxjO urvs3nfnAhYnUMafE0zNdPvkZvFnHmUOCNtOq9w/GNsT3aatFYiT090G4 w==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="328420417" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="328420417" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:30:10 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="691016186" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="691016186" Received: from sergungo-mobl.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.251.25.88]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:30:08 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . 
Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , joao.moreira@intel.com, John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v2 12/39] x86/mm: Update ptep_set_wrprotect() and pmdp_set_wrprotect() for transition from _PAGE_DIRTY to _PAGE_COW Date: Thu, 29 Sep 2022 15:29:09 -0700 Message-Id: <20220929222936.14584-13-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220929222936.14584-1-rick.p.edgecombe@intel.com> References: <20220929222936.14584-1-rick.p.edgecombe@intel.com> ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1664490611; a=rsa-sha256; cv=none; b=WEgTKDkQlnQpZe2jV+7lLOj/c1Z1PJJnOtYk3viVwg9fcK7sg7HQowcrphgO/g6OelrcTH bAvdbSgINiUOM9UQcLOzCk0Vul7rsTdFSLwQlDcYjGk/Y1tawqRmIpTE1FSIdZxbhiQFlR sH7FNtcB8EUbPhFbWy5iHSBHxnAVDwE= ARC-Authentication-Results: i=1; imf12.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b="PdG27S/4"; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf12.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.88 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1664490611; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references:dkim-signature; bh=0lqQoOoJwESjO+CTCm+DZMhNYniwX6TJlB5GrhfWP90=; b=hWbkpVKho8cHeJn/WfXblS84V3nMWw+oI25Rs3YYFojac0XVxJsiUGpWDqcQW14bqPttAb v5lr9jmo0h+cSmKq9sE1WoN6bqE4MX6ekwAcvp8+uMkiZx0pBdGz4PYLD34NkAPIIQFxzE mKyfiFP+LsEyY+/ttA2LwZDLcZ4rZ1I= X-Rspamd-Queue-Id: CE71440009 X-Rspam-User: 
Authentication-Results: imf12.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b="PdG27S/4"; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf12.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.88 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com X-Rspamd-Server: rspam03 X-Stat-Signature: mheiez4nr83c4y1ngrgwtjcy7t3y1i6o X-HE-Tag: 1664490610-835127 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID:

From: Yu-cheng Yu

When Shadow Stack is in use, Write=0,Dirty=1 PTEs are reserved for shadow
stack. Copy-on-write PTEs then have Write=0,Cow=1.

When a PTE goes from Write=1,Dirty=1 to Write=0,Cow=1, it could become a
transient shadow stack PTE in two cases:

The first case is that some processors can start a write but end up seeing
a Write=0 PTE by the time they get to the Dirty bit, creating a transient
shadow stack PTE. However, this will not occur on processors supporting
Shadow Stack, and a TLB flush is not necessary.

The second case is that when _PAGE_DIRTY is replaced with _PAGE_COW
non-atomically, a transient shadow stack PTE can be created as a result.
Thus, prevent that with cmpxchg.

Dave Hansen, Jann Horn, Andy Lutomirski, and Peter Zijlstra provided many
insights into the issue. Jann Horn provided the cmpxchg solution.

Signed-off-by: Yu-cheng Yu
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
---
v2:
 - Compile out some code due to clang build error
 - Clarify commit log (dhansen)
 - Normalize PTE bit descriptions between patches (dhansen)
 - Update comment with text from (dhansen)

Yu-cheng v30:
 - Replace (pmdval_t) cast with CONFIG_PGTABLE_LEVELS > 2 (Borislav Petkov).
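
For readers unfamiliar with the pattern, the compare-and-exchange retry loop
described in the commit log can be sketched in user-space C11. This is an
illustration under stated assumptions, not the kernel code: the MODEL_* bit
positions and the model_*() helpers are hypothetical, and C11
atomic_compare_exchange_weak() stands in for the kernel's try_cmpxchg().
Like try_cmpxchg(), it refreshes the expected value on failure, so the new
entry is always recomputed from what is actually in memory.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical bit layout for this model; not taken from kernel headers. */
#define MODEL_RW	(UINT64_C(1) << 1)
#define MODEL_DIRTY	(UINT64_C(1) << 6)
#define MODEL_COW	(UINT64_C(1) << 58)

/* Clear Write and move any dirty state into the software Cow bit. */
static uint64_t model_wrprotect(uint64_t pte)
{
	pte &= ~MODEL_RW;
	if (pte & (MODEL_DIRTY | MODEL_COW)) {
		pte &= ~MODEL_DIRTY;
		pte |= MODEL_COW;
	}
	return pte;
}

/*
 * Write-protect the entry in one atomic step. If another agent changes
 * the entry between the load and the exchange (for example by setting
 * Dirty=1), the exchange fails, old_pte is updated to the current value,
 * and the loop retries. No intermediate Write=0,Dirty=1 value is stored.
 */
static void model_set_wrprotect(_Atomic uint64_t *ptep)
{
	uint64_t old_pte = atomic_load(ptep);
	uint64_t new_pte;

	do {
		new_pte = model_wrprotect(old_pte);
	} while (!atomic_compare_exchange_weak(ptep, &old_pte, new_pte));
}
```

A plain "clear Write, then move Dirty to Cow" sequence would store
Write=0,Dirty=1 between its two steps; computing the final value first and
publishing it with a single exchange avoids that window.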

 arch/x86/include/asm/pgtable.h | 36 ++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 2f2963429f48..58c7bf9d7392 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1287,6 +1287,23 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
 static inline void ptep_set_wrprotect(struct mm_struct *mm,
 				      unsigned long addr, pte_t *ptep)
 {
+#ifdef CONFIG_X86_SHADOW_STACK
+	/*
+	 * Avoid accidentally creating shadow stack PTEs
+	 * (Write=0,Dirty=1). Use cmpxchg() to prevent races with
+	 * the hardware setting Dirty=1.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_SHSTK)) {
+		pte_t old_pte, new_pte;
+
+		old_pte = READ_ONCE(*ptep);
+		do {
+			new_pte = pte_wrprotect(old_pte);
+		} while (!try_cmpxchg(&ptep->pte, &old_pte.pte, new_pte.pte));
+
+		return;
+	}
+#endif
 	clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
 }

@@ -1339,6 +1356,25 @@ static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm,
 static inline void pmdp_set_wrprotect(struct mm_struct *mm,
 				      unsigned long addr, pmd_t *pmdp)
 {
+#ifdef CONFIG_X86_SHADOW_STACK
+	/*
+	 * If Shadow Stack is enabled, pmd_wrprotect() moves _PAGE_DIRTY
+	 * to _PAGE_COW (see comments at pmd_wrprotect()).
+	 * When a thread reads a RW=1, Dirty=0 PMD and before changing it
+	 * to RW=0, Dirty=0, another thread could have written to the page
+	 * and the PMD is RW=1, Dirty=1 now.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_SHSTK)) {
+		pmd_t old_pmd, new_pmd;
+
+		old_pmd = READ_ONCE(*pmdp);
+		do {
+			new_pmd = pmd_wrprotect(old_pmd);
+		} while (!try_cmpxchg(&pmdp->pmd, &old_pmd.pmd, new_pmd.pmd));
+
+		return;
+	}
+#endif
 	clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp);
 }

From patchwork Thu Sep 29 22:29:10 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 12994667 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3AB7EC43219 for ; Thu, 29 Sep 2022 22:30:14 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id CD89B8D0007; Thu, 29 Sep 2022 18:30:13 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id BE8B68D0001; Thu, 29 Sep 2022 18:30:13 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A61C48D0007; Thu, 29 Sep 2022 18:30:13 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 998F78D0001 for ; Thu, 29 Sep 2022 18:30:13 -0400 (EDT) Received: from smtpin13.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id 4FD28ABA9A for ; Thu, 29 Sep 2022 22:30:13 +0000 (UTC) X-FDA: 79966567506.13.78F85B4 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by imf12.hostedemail.com (Postfix) with ESMTP id B482E40012 for ; Thu, 29 Sep 2022 22:30:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664490612; x=1696026612; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=vKTf+AsczWlZQi59ytE2TWI1uBC+j04xvFTJcZyikik=;
b=W02eGnAgGZOC4WWfVOPB4xnKjVdz/aU30rQm2aP2U5nfgDa5+g6GX7Nc KGP8r5wv5XaAKdaHEefO6HLKJ0PF/99U3oq38XAhomDJMfsr+jtSO4/cL pBpDX3zvb1Iwj9KUbFGleZi/58KNHtbQTGb0G66jnqZprvd0N9w5tHalK rtjkz6vRnF2E+usM/OrJBHNtFhF67YuBAT9lwGOukd3wfZQjHucYIyX2A 15CEf56Dy+hbGIPAOKEmBTm5eC1Rww7Luija1R+2YUHM97QyR9Rj5f1Fp jO+pLmy7FpsPcVyOtsp7ru41fldX6Nhk/o2YQmWL2PCUem2A3mHdxWDSJ g==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="328420434" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="328420434" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:30:12 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="691016195" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="691016195" Received: from sergungo-mobl.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.251.25.88]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:30:10 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . 
Shutemov" , joao.moreira@intel.com, John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Peter Xu Subject: [PATCH v2 13/39] mm: Move VM_UFFD_MINOR_BIT from 37 to 38 Date: Thu, 29 Sep 2022 15:29:10 -0700 Message-Id: <20220929222936.14584-14-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220929222936.14584-1-rick.p.edgecombe@intel.com> References: <20220929222936.14584-1-rick.p.edgecombe@intel.com> ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1664490613; a=rsa-sha256; cv=none; b=m6KTc8rTJDG0C8t0fig8Pnw6wQ8SwZq3rB+QORM6wKk/SBfjppOamL7ZmXRx62zAhVriAN 7c0+H/XYx3DaLM2jr0ps9z6O+sPVVb50XzSZIW+Y7GBKyd9WjqmDxpHjVJ1eYmm4TkdDRM QW6dWo5OqjCK6Xzxtl0nlQedvIIkSfA= ARC-Authentication-Results: i=1; imf12.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=W02eGnAg; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf12.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.88 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1664490613; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references:dkim-signature; bh=jypyb0PivqULfUeLI/7ij5TbVGZAEVUq6T/gxHCECB8=; b=hOsuWX6n1x55ZxM0TGb0z08FGyPaPfotHPVWqUxUQHfYYyc26mKEe3l1jq2riwWu6eZstL L35ufrYdaH2511Sp8UlZI/BkbVoCW1xM5mWEIHtA0E75hLNht0k2Y8oQa5INWByjAaWBx5 VwSbwk99CUCeqk+LkiF8ItxAwuhxRnM= X-Rspamd-Queue-Id: B482E40012 X-Rspam-User: Authentication-Results: imf12.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=W02eGnAg; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf12.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 
192.55.52.88 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com X-Rspamd-Server: rspam03 X-Stat-Signature: jjpigqwqep71kw1j4e36zcn6b3stoujj X-HE-Tag: 1664490612-379413 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yu-cheng Yu To introduce VM_SHADOW_STACK as VM_HIGH_ARCH_BIT (37), and make all VM_HIGH_ARCH_BITs stay together, move VM_UFFD_MINOR_BIT from 37 to 38. Signed-off-by: Yu-cheng Yu Reviewed-by: Axel Rasmussen Signed-off-by: Rick Edgecombe Cc: Peter Xu Cc: Mike Kravetz Reviewed-by: Kees Cook Acked-by: Peter Xu --- include/linux/mm.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 21f8b27bd9fd..be80fc827212 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -365,7 +365,7 @@ extern unsigned int kobjsize(const void *objp); #endif #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR -# define VM_UFFD_MINOR_BIT 37 +# define VM_UFFD_MINOR_BIT 38 # define VM_UFFD_MINOR BIT(VM_UFFD_MINOR_BIT) /* UFFD minor faults */ #else /* !CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */ # define VM_UFFD_MINOR VM_NONE From patchwork Thu Sep 29 22:29:11 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 12994668 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0486BC433F5 for ; Thu, 29 Sep 2022 22:30:16 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 9A29A8D0008; Thu, 29 Sep 2022 18:30:15 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 950EB8D0001; Thu, 29 Sep 2022 18:30:15 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 
7C9ED8D0008; Thu, 29 Sep 2022 18:30:15 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 6DB8E8D0001 for ; Thu, 29 Sep 2022 18:30:15 -0400 (EDT) Received: from smtpin20.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 3D7C516137D for ; Thu, 29 Sep 2022 22:30:15 +0000 (UTC) X-FDA: 79966567590.20.05619DD Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by imf12.hostedemail.com (Postfix) with ESMTP id A321E40009 for ; Thu, 29 Sep 2022 22:30:14 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664490614; x=1696026614; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=eA3IOTEvlbESNVd0B9n9A/9HGJD20AxSfeCoBdQ1OOM=; b=F3GZi5hoZENWNOWzI5J07zM35v0x5F1mKTFgWTL3tqTbkDKQBF/HG0mU V5wQ0P89Zn6VC6t6OvWa0jGaKJvVIDlILdvBzbQBnvM34WLWDffNDZAwP DdQbWqz/6iSXmeRwmzgsulPD4qRtF28XVLF+8UQyE2t7vCMbJ0zBmBeHs YRaSWKMDLWYP5R2hjlJoiDOvTgJ7Ol58Vzxrlw/exEH1yZZ+cPinf5FdT 9PHCpAIIVJeHRhdAnMTbYY3ORIGG9ysqdZiqjSbT3FIn7F3l72AqIGlEU X5DcLk+JQ9FFMa72zeXd3RKSIOZFLqSJzTgVruoRWO04e+DCpJeVqJ8lP Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="328420445" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="328420445" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:30:14 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="691016205" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="691016205" Received: from sergungo-mobl.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.251.25.88]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:30:12 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . 
Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , joao.moreira@intel.com, John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v2 14/39] mm: Introduce VM_SHADOW_STACK for shadow stack memory Date: Thu, 29 Sep 2022 15:29:11 -0700 Message-Id: <20220929222936.14584-15-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220929222936.14584-1-rick.p.edgecombe@intel.com> References: <20220929222936.14584-1-rick.p.edgecombe@intel.com> ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1664490614; a=rsa-sha256; cv=none; b=Ggv5QZKzE9zMcE0Il9rJJYQ9rEx11sR6zsbKqTwPrYagq0GQoXfQUDP08KRN7/ktfdfkTE Z49JiWMepbzcLUvvragsIGcniiaV/k6N19+uRI4zMbHxPMWbPTO8tvJ1QTI7sLxveBi6Cc q/ydT3IfKaIF5zWDzbHskU5n9HJmnoo= ARC-Authentication-Results: i=1; imf12.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=F3GZi5ho; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf12.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.88 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1664490614; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references:dkim-signature; 
bh=d5kHpuIJ8ZUHtrfp7IeYufKnRiW/rKo24qE8EgfXHsU=; b=MApOz400vboZ31JgXCFDN5zC1x/1O+X1kkZL8EM+ZXTqLG8KgyE0GaROGAB+Q5rFWGh1GB w4PusA8Q/s/cYRh1dfFRkdatxPxFTTMn1hy3xhHKadMddhpZ6fQvtd5iIVZiveyBxmFS9E Z3O8OePFNVVW0et7IlL+s4L7BbFH2n0= X-Rspamd-Queue-Id: A321E40009 X-Rspam-User: Authentication-Results: imf12.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=F3GZi5ho; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf12.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.88 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com X-Rspamd-Server: rspam03 X-Stat-Signature: 8o4zepopg68aqakr9fcxu5t57bn3ekyd X-HE-Tag: 1664490614-955163 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yu-cheng Yu A shadow stack PTE must be read-only and have _PAGE_DIRTY set. However, read-only and Dirty PTEs also exist for copy-on-write (COW) pages. These two cases are handled differently for page faults. Introduce VM_SHADOW_STACK to track shadow stack VMAs. Signed-off-by: Yu-cheng Yu Reviewed-by: Kirill A. Shutemov Signed-off-by: Rick Edgecombe Cc: Kees Cook --- Documentation/filesystems/proc.rst | 1 + arch/x86/mm/mmap.c | 2 ++ fs/proc/task_mmu.c | 3 +++ include/linux/mm.h | 8 ++++++++ 4 files changed, 14 insertions(+) diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst index e7aafc82be99..d54ff397947a 100644 --- a/Documentation/filesystems/proc.rst +++ b/Documentation/filesystems/proc.rst @@ -560,6 +560,7 @@ encoded manner. 
The codes are the following: mt arm64 MTE allocation tags are enabled um userfaultfd missing tracking uw userfaultfd wr-protect tracking + ss shadow stack page == ======================================= Note that there is no guarantee that every flag and associated mnemonic will diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c index c90c20904a60..f3f52c5e2fd6 100644 --- a/arch/x86/mm/mmap.c +++ b/arch/x86/mm/mmap.c @@ -165,6 +165,8 @@ unsigned long get_mmap_base(int is_legacy) const char *arch_vma_name(struct vm_area_struct *vma) { + if (vma->vm_flags & VM_SHADOW_STACK) + return "[shadow stack]"; return NULL; } diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 4e0023643f8b..a20899392c8d 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -700,6 +700,9 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma) #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR [ilog2(VM_UFFD_MINOR)] = "ui", #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */ +#ifdef CONFIG_ARCH_HAS_SHADOW_STACK + [ilog2(VM_SHADOW_STACK)] = "ss", +#endif }; size_t i; diff --git a/include/linux/mm.h b/include/linux/mm.h index be80fc827212..8cd413c5a329 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -314,11 +314,13 @@ extern unsigned int kobjsize(const void *objp); #define VM_HIGH_ARCH_BIT_2 34 /* bit only usable on 64-bit architectures */ #define VM_HIGH_ARCH_BIT_3 35 /* bit only usable on 64-bit architectures */ #define VM_HIGH_ARCH_BIT_4 36 /* bit only usable on 64-bit architectures */ +#define VM_HIGH_ARCH_BIT_5 37 /* bit only usable on 64-bit architectures */ #define VM_HIGH_ARCH_0 BIT(VM_HIGH_ARCH_BIT_0) #define VM_HIGH_ARCH_1 BIT(VM_HIGH_ARCH_BIT_1) #define VM_HIGH_ARCH_2 BIT(VM_HIGH_ARCH_BIT_2) #define VM_HIGH_ARCH_3 BIT(VM_HIGH_ARCH_BIT_3) #define VM_HIGH_ARCH_4 BIT(VM_HIGH_ARCH_BIT_4) +#define VM_HIGH_ARCH_5 BIT(VM_HIGH_ARCH_BIT_5) #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */ #ifdef CONFIG_ARCH_HAS_PKEYS @@ -334,6 +336,12 @@ extern unsigned int 
kobjsize(const void *objp); #endif #endif /* CONFIG_ARCH_HAS_PKEYS */ +#ifdef CONFIG_X86_SHADOW_STACK +# define VM_SHADOW_STACK VM_HIGH_ARCH_5 +#else +# define VM_SHADOW_STACK VM_NONE +#endif + #if defined(CONFIG_X86) # define VM_PAT VM_ARCH_1 /* PAT reserves whole VMA at once (x86) */ #elif defined(CONFIG_PPC) From patchwork Thu Sep 29 22:29:12 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 12994669 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7A435C4321E for ; Thu, 29 Sep 2022 22:30:26 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 1B0ED8D0009; Thu, 29 Sep 2022 18:30:26 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 160FE8D0001; Thu, 29 Sep 2022 18:30:26 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id F1C998D0009; Thu, 29 Sep 2022 18:30:25 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id E35718D0001 for ; Thu, 29 Sep 2022 18:30:25 -0400 (EDT) Received: from smtpin24.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 93276121018 for ; Thu, 29 Sep 2022 22:30:25 +0000 (UTC) X-FDA: 79966568010.24.1210996 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by imf26.hostedemail.com (Postfix) with ESMTP id DFBBC14000B for ; Thu, 29 Sep 2022 22:30:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664490624; x=1696026624; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=sJ1yALx83MDkolAtFtbXJ6b6IY3XSWP28Qwg8djJ2Rc=; 
b=a22k0ZSGcXAc68of3mtJQETXAzAWhcHYSc/dDbBqk9RZLwnGKHOgk5lR /TXIPtJfd9QjuaYm6Oj6r5qmOTQ4qHLnYO7hNNCMt+6hSifO5MphGUa9u 75Zl5tQGo2/VpCzkFVWJu+Mhbl/wLLuSNdOhkqAFKiquxKep2hq2GFdHG nKMMaDrnQ5hAFxN2C1oeex1j5QHlpgDNO7S1yvH8QKLS6mG5Y4COcmAbB klm4lE6659uREOnxah9iVZdHSwQX7rEWrOojtNuhr1Nvyd8DX3FXJGhTB x0/K2D7uYeQci+hwctsQO/d5QmkzT5h/5sZJ537XkNDjLm0avlaTFUnuH g==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="328420479" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="328420479" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:30:23 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="691016214" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="691016214" Received: from sergungo-mobl.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.251.25.88]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:30:14 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . 
Shutemov" , joao.moreira@intel.com, John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v2 15/39] x86/mm: Check Shadow Stack page fault errors Date: Thu, 29 Sep 2022 15:29:12 -0700 Message-Id: <20220929222936.14584-16-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220929222936.14584-1-rick.p.edgecombe@intel.com> References: <20220929222936.14584-1-rick.p.edgecombe@intel.com> ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1664490625; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references:dkim-signature; bh=tcI2PG8aLvQHuJ+BgPRkzRQFjd58dlK8IW7n3kMXIm0=; b=qzLM/4e3liQXcSkNzoAw0TMnoMl8UBIKIoYb4XpG48svomWtIbjsMSXAyOLq3XWMglOjTD kEwOiwch+jvurT3uEja3Ve1KqR5MXgg47SS7cTzXq5PRNE4wpWGeIlDHT2dms5B1jVbSEh lCGfSoJ/WN3TFp2k2ygqKxc9a1f7D2Y= ARC-Authentication-Results: i=1; imf26.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=a22k0ZSG; spf=pass (imf26.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.88 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1664490625; a=rsa-sha256; cv=none; b=jrG4YKORZ3bRZ/s2wORarwhV6AsvF0Qnp8YTw3R8bFUigd/dl0C7+z+Q5qrIeCmS2mwzTu vcaD3wxnEUpYVpY2PY+wKSCKD2kR3u7QFJQV199M33QexDuq4MAf+xXEOB2ChTZ99PoplC B6F57nHFVnXvm4ggP4u8MW8S9frP8rg= Authentication-Results: imf26.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=a22k0ZSG; spf=pass (imf26.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.88 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass 
(policy=none) header.from=intel.com X-Rspam-User: X-Stat-Signature: wzbhy93uze9z3tdszsysjp7mfddzkeut X-Rspamd-Queue-Id: DFBBC14000B X-Rspamd-Server: rspam08 X-HE-Tag: 1664490624-924281 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yu-cheng Yu

The CPU performs "shadow stack accesses" when it expects to encounter
shadow stack mappings. These accesses can be implicit (via CALL/RET
instructions) or explicit (instructions like WRSS).

Shadow stack accesses to shadow stack mappings can fault during normal,
valid operation, just like regular accesses to regular mappings. Shadow
stacks need some of the same features as regular memory, such as delayed
allocation, swap and copy-on-write. The kernel needs to use faults to
implement those features.

The architecture has concepts of both shadow stack reads and shadow stack
writes. Any shadow stack access to non-shadow stack memory will generate a
fault with the shadow stack error code bit set.

This means that, unlike normal write protection, the fault handler needs
to create a type of memory that can be written to (with instructions that
generate shadow stack writes), even to fulfill a read access. So in the
case of COW memory, the COW needs to take place even with a shadow stack
read. Otherwise the page will be left (shadow stack) writable in
userspace. So to trigger the appropriate behavior, set FAULT_FLAG_WRITE
for shadow stack accesses, even if the access was a shadow stack read.

Shadow stack accesses can also result in errors, such as when a shadow
stack overflows, or if a shadow stack access occurs to a non-shadow-stack
mapping. Also, generate the errors for invalid shadow stack accesses.
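The rules described above can be sketched as a small userspace model. This is only an illustration of the decision logic, not the kernel code: the VMA_* flag values and the function names are hypothetical stand-ins, and only the PF_WRITE and PF_SHSTK bit positions mirror the x86 page fault error code.

```c
#include <assert.h>
#include <stdbool.h>

/* Error code bits; positions mirror X86_PF_WRITE and X86_PF_SHSTK. */
#define PF_WRITE	(1u << 1)
#define PF_SHSTK	(1u << 6)

/* Hypothetical stand-ins for VM_WRITE and VM_SHADOW_STACK (values made up). */
#define VMA_WRITE		(1u << 0)
#define VMA_SHADOW_STACK	(1u << 1)

/* Return true when the access must be rejected as a bad access. */
bool shstk_access_error(unsigned int error_code, unsigned int vma_flags)
{
	if (error_code & PF_SHSTK) {
		/* Shadow stack accesses are only valid on shadow stack VMAs. */
		if (!(vma_flags & VMA_SHADOW_STACK))
			return true;
		if (!(vma_flags & VMA_WRITE))
			return true;
		return false;
	}

	if (error_code & PF_WRITE) {
		/* Normal writes may never target a shadow stack VMA. */
		if (vma_flags & VMA_SHADOW_STACK)
			return true;
		if (!(vma_flags & VMA_WRITE))
			return true;
		return false;
	}

	/* Plain reads and other cases are not modeled in this sketch. */
	return false;
}

/* Return true when the fault should be handled as a write fault. */
bool fault_is_write(unsigned int error_code)
{
	/* Every shadow stack access, read or write, is treated as a write. */
	return (error_code & (PF_SHSTK | PF_WRITE)) != 0;
}

/* Usage example: a few representative combinations. */
void shstk_fault_examples(void)
{
	/* Shadow stack access to a shadow stack VMA: allowed, handled as write. */
	assert(!shstk_access_error(PF_SHSTK, VMA_WRITE | VMA_SHADOW_STACK));
	assert(fault_is_write(PF_SHSTK));

	/* Shadow stack access to an ordinary writable VMA: error. */
	assert(shstk_access_error(PF_SHSTK, VMA_WRITE));

	/* Ordinary write to a shadow stack VMA: error. */
	assert(shstk_access_error(PF_WRITE, VMA_WRITE | VMA_SHADOW_STACK));
}
```

Keeping the permission check separate from the "treat as write" decision follows the split the patch makes between access_error() and do_user_addr_fault().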
Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Reviewed-by: Kees Cook --- v2: - Update commit log with verbiage/feedback from Dave Hansen - Clarify reasoning for FAULT_FLAG_WRITE for all shadow stack accesses - Update comments with some verbiage from Dave Hansen Yu-cheng v30: - Update Subject line and add a verb arch/x86/include/asm/trap_pf.h | 2 ++ arch/x86/mm/fault.c | 21 +++++++++++++++++++++ 2 files changed, 23 insertions(+) diff --git a/arch/x86/include/asm/trap_pf.h b/arch/x86/include/asm/trap_pf.h index 10b1de500ab1..afa524325e55 100644 --- a/arch/x86/include/asm/trap_pf.h +++ b/arch/x86/include/asm/trap_pf.h @@ -11,6 +11,7 @@ * bit 3 == 1: use of reserved bit detected * bit 4 == 1: fault was an instruction fetch * bit 5 == 1: protection keys block access + * bit 6 == 1: shadow stack access fault * bit 15 == 1: SGX MMU page-fault */ enum x86_pf_error_code { @@ -20,6 +21,7 @@ enum x86_pf_error_code { X86_PF_RSVD = 1 << 3, X86_PF_INSTR = 1 << 4, X86_PF_PK = 1 << 5, + X86_PF_SHSTK = 1 << 6, X86_PF_SGX = 1 << 15, }; diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index fa71a5d12e87..e5697b393069 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -1107,8 +1107,22 @@ access_error(unsigned long error_code, struct vm_area_struct *vma) (error_code & X86_PF_INSTR), foreign)) return 1; + /* + * Shadow stack accesses (PF_SHSTK=1) are only permitted to + * shadow stack VMAs. All other accesses result in an error. 
+ */ + if (error_code & X86_PF_SHSTK) { + if (unlikely(!(vma->vm_flags & VM_SHADOW_STACK))) + return 1; + if (unlikely(!(vma->vm_flags & VM_WRITE))) + return 1; + return 0; + } + if (error_code & X86_PF_WRITE) { /* write, present and write, not present: */ + if (unlikely(vma->vm_flags & VM_SHADOW_STACK)) + return 1; if (unlikely(!(vma->vm_flags & VM_WRITE))) return 1; return 0; @@ -1300,6 +1314,13 @@ void do_user_addr_fault(struct pt_regs *regs, perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address); + /* + * In order to fulfill a shadow stack access, the page needs + * to be made (shadow stack) writable. So treat all shadow stack + * accesses as writes. + */ + if (error_code & X86_PF_SHSTK) + flags |= FAULT_FLAG_WRITE; if (error_code & X86_PF_WRITE) flags |= FAULT_FLAG_WRITE; if (error_code & X86_PF_INSTR) From patchwork Thu Sep 29 22:29:13 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 12994671 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7FB4FC4332F for ; Thu, 29 Sep 2022 22:30:28 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D215A8D000B; Thu, 29 Sep 2022 18:30:26 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id CFA848D0001; Thu, 29 Sep 2022 18:30:26 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id AAD518D000C; Thu, 29 Sep 2022 18:30:26 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 745438D0001 for ; Thu, 29 Sep 2022 18:30:26 -0400 (EDT) Received: from smtpin14.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix)
with ESMTP id 50499161357 for ; Thu, 29 Sep 2022 22:30:26 +0000 (UTC) X-FDA: 79966568052.14.3C7CE5C Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by imf10.hostedemail.com (Postfix) with ESMTP id CA6F5C0009 for ; Thu, 29 Sep 2022 22:30:25 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664490625; x=1696026625; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=VsTBZoElthVu/yGuB+Di9RxtRD1iZA84O3vOvi0lRyE=; b=YO06CTpLGHgo6nRBFBcl14HnVGhcXRPnh6DSmcpCYqS0odBZPppP8+vd ofJwads1H5LEj+KWUo6tiC0KVFtcGZaN1AX8h4Qfu/9lrCMmiLcgFsS5T VTDitqokg6lTviSOZWozG3Jpv23rJkFTKXNX34Wh8NtXD+IiT8Na0Qegz 9msNCM+2/L5+OjYmAyDA6U5rJeUw1xq4+BXPEC7gZmbd9MN66uNqP3GrI 5qXZZRYE2TX9PJIkXBMrVKgq7I7hApdM3EpGxD8G8aXpvtVlIzbZbhXNg pFb1OEmeXupd4CxXqmPupjR2uMTuh1mqNh729r00yNkuAzGPegD9rEQjv w==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="328420481" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="328420481" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:30:24 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="691016218" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="691016218" Received: from sergungo-mobl.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.251.25.88]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:30:16 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . 
Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , joao.moreira@intel.com, John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v2 16/39] x86/mm: Update maybe_mkwrite() for shadow stack Date: Thu, 29 Sep 2022 15:29:13 -0700 Message-Id: <20220929222936.14584-17-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220929222936.14584-1-rick.p.edgecombe@intel.com> References: <20220929222936.14584-1-rick.p.edgecombe@intel.com> ARC-Authentication-Results: i=1; imf10.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=YO06CTpL; spf=pass (imf10.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.88 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1664490626; a=rsa-sha256; cv=none; b=ZefvNKHdBveSxwW93jxHHVSrmU28QDH2T96oBXqNmRTgVRTx3cncuyWBKQmRf8SGdPAcMP /jA2MDB/ZT1YIaWIbf/Kg3mZz8Km2CuG78NUa5iJgYdlRBauBls5i3Qvipp8knLKjCXpQv dka1FKRQFcvwWyhE4h0TgrBEj+IKQp0= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1664490626; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references:dkim-signature; bh=hSokEo+okB4VVxEGWNT3kVWJxhhV773MtNt90JDX7wU=; b=bK/finlaHiuHNpmf1nATA1oO1OvzSkSmSrCQOFE/gqs89C1+mrC8S0iiXF1j485nw17ZLg HUDs6qXxv/IqWAs4aX2Mkn8pieEyXHdFyH9ekGUJCMxuZ7Xn08OS9TZTVx/ewy5I4nP93s js3gdZlDSTsejJYC8jKY3A9N3qNDZFk= Authentication-Results: imf10.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel 
header.b=YO06CTpL; spf=pass (imf10.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.88 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com X-Rspam-User: X-Stat-Signature: u4pc45pqpof3n1zkowashnqmgwpk3cgc X-Rspamd-Queue-Id: CA6F5C0009 X-Rspamd-Server: rspam05 X-HE-Tag: 1664490625-859563 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yu-cheng Yu

When servicing a page fault, maybe_mkwrite() makes a PTE writable if there
is a write access to it and its VMA has VM_WRITE. Shadow stack accesses to
shadow stack VMAs are also treated as write accesses by the fault handler.
This is because mapping memory as shadow stack makes it writable via some
instructions, so COW has to happen even for shadow stack reads.

So maybe_mkwrite() should continue to make PTEs in VM_WRITE VMAs normally
writable, but should make PTEs in VM_WRITE|VM_SHADOW_STACK VMAs shadow
stack instead.

Do this by adding a pte_mkwrite_shstk() and a cross-arch stub. Check for
VM_SHADOW_STACK in maybe_mkwrite() and call pte_mkwrite_shstk()
accordingly.

Apply the same changes to maybe_pmd_mkwrite().

Signed-off-by: Yu-cheng Yu
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
Cc: Kees Cook
Reviewed-by: Kees Cook
---
v2:
 - Change to handle shadow stacks that are VM_WRITE|VM_SHADOW_STACK
 - Ditch arch specific maybe_mkwrite(), and make the code generic

Yu-cheng v29:
 - Remove likely()'s.
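The dispatch that maybe_mkwrite() performs can be sketched as a toy userspace model. Everything here is an assumption made for illustration: the PTE is a plain integer, the bit values are made up, and model_pte_mkwrite_shstk() simply produces the Write=0, Dirty=1 encoding that the series uses for shadow stack PTEs rather than reproducing the real x86 helper.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for VM_WRITE and VM_SHADOW_STACK (values made up). */
#define VMA_WRITE		(1u << 0)
#define VMA_SHADOW_STACK	(1u << 1)

/* Hypothetical PTE bits for the model; this is not the x86 layout. */
#define MODEL_PTE_RW		(1u << 0)
#define MODEL_PTE_DIRTY		(1u << 1)

typedef uint32_t model_pte_t;

/* Normal writable mapping: set the Write bit. */
model_pte_t model_pte_mkwrite(model_pte_t pte)
{
	return pte | MODEL_PTE_RW;
}

/* Shadow stack mapping in this model: Write=0, Dirty=1. */
model_pte_t model_pte_mkwrite_shstk(model_pte_t pte)
{
	pte &= ~MODEL_PTE_RW;
	return pte | MODEL_PTE_DIRTY;
}

/* Mirrors the control flow of the generic maybe_mkwrite() in this patch. */
model_pte_t model_maybe_mkwrite(model_pte_t pte, unsigned int vma_flags)
{
	/* Without VM_WRITE the PTE is returned unchanged. */
	if (!(vma_flags & VMA_WRITE))
		return pte;

	if (vma_flags & VMA_SHADOW_STACK)
		return model_pte_mkwrite_shstk(pte);

	return model_pte_mkwrite(pte);
}

/* Usage example: the three kinds of VMA the patch distinguishes. */
void maybe_mkwrite_examples(void)
{
	/* Read-only VMA: unchanged. */
	assert(model_maybe_mkwrite(0, 0) == 0);

	/* Ordinary writable VMA: Write bit set. */
	assert(model_maybe_mkwrite(0, VMA_WRITE) == MODEL_PTE_RW);

	/* Shadow stack VMA: Dirty set, Write left clear. */
	assert(model_maybe_mkwrite(0, VMA_WRITE | VMA_SHADOW_STACK) ==
	       MODEL_PTE_DIRTY);
}
```

On architectures without shadow stack support, the stub added to include/linux/pgtable.h returns the PTE unchanged, so the VM_SHADOW_STACK branch has no effect there.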
arch/x86/include/asm/pgtable.h | 2 ++ include/linux/mm.h | 14 +++++++++++++- include/linux/pgtable.h | 14 ++++++++++++++ mm/huge_memory.c | 9 ++++++++- mm/memory.c | 3 +-- 5 files changed, 38 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 58c7bf9d7392..7a769c4dbc1c 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -419,6 +419,7 @@ static inline pte_t pte_mkdirty(pte_t pte) return pte_set_flags(pte, dirty | _PAGE_SOFT_DIRTY); } +#define pte_mkwrite_shstk pte_mkwrite_shstk static inline pte_t pte_mkwrite_shstk(pte_t pte) { /* pte_clear_cow() also sets Dirty=1 */ @@ -555,6 +556,7 @@ static inline pmd_t pmd_mkdirty(pmd_t pmd) return pmd_set_flags(pmd, dirty | _PAGE_SOFT_DIRTY); } +#define pmd_mkwrite_shstk pmd_mkwrite_shstk static inline pmd_t pmd_mkwrite_shstk(pmd_t pmd) { return pmd_clear_cow(pmd); diff --git a/include/linux/mm.h b/include/linux/mm.h index 8cd413c5a329..fef14ab3abcb 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -981,13 +981,25 @@ void free_compound_page(struct page *page); * servicing faults for write access. In the normal case, do always want * pte_mkwrite. But get_user_pages can cause write faults for mappings * that do not have writing enabled, when used by access_process_vm. + * + * If a vma is shadow stack (a type of writable memory), mark the pte shadow + * stack. 
  */
+#ifndef maybe_mkwrite
 static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
 {
-	if (likely(vma->vm_flags & VM_WRITE))
+	if (!(vma->vm_flags & VM_WRITE))
+		goto out;
+
+	if (vma->vm_flags & VM_SHADOW_STACK)
+		pte = pte_mkwrite_shstk(pte);
+	else
 		pte = pte_mkwrite(pte);
+
+out:
 	return pte;
 }
+#endif
 
 vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page);
 void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr);
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 014ee8f0fbaa..21115b4895ca 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -480,6 +480,13 @@ static inline pte_t pte_sw_mkyoung(pte_t pte)
 #define pte_mk_savedwrite pte_mkwrite
 #endif
 
+#ifndef pte_mkwrite_shstk
+static inline pte_t pte_mkwrite_shstk(pte_t pte)
+{
+	return pte;
+}
+#endif
+
 #ifndef pte_clear_savedwrite
 #define pte_clear_savedwrite pte_wrprotect
 #endif
@@ -488,6 +495,13 @@ static inline pte_t pte_sw_mkyoung(pte_t pte)
 #define pmd_savedwrite pmd_write
 #endif
 
+#ifndef pmd_mkwrite_shstk
+static inline pmd_t pmd_mkwrite_shstk(pmd_t pmd)
+{
+	return pmd;
+}
+#endif
+
 #ifndef pmd_mk_savedwrite
 #define pmd_mk_savedwrite pmd_mkwrite
 #endif
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e9414ee57c5b..11fc69eb4717 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -554,8 +554,15 @@ __setup("transparent_hugepage=", setup_transparent_hugepage);
 
 pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
 {
-	if (likely(vma->vm_flags & VM_WRITE))
+	if (!(vma->vm_flags & VM_WRITE))
+		goto out;
+
+	if (vma->vm_flags & VM_SHADOW_STACK)
+		pmd = pmd_mkwrite_shstk(pmd);
+	else
 		pmd = pmd_mkwrite(pmd);
+
+out:
 	return pmd;
 }
 
diff --git a/mm/memory.c b/mm/memory.c
index 4ba73f5aa8bb..6e8379f6793c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4098,8 +4098,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 
 	entry = mk_pte(page, vma->vm_page_prot);
 	entry = pte_sw_mkyoung(entry);
-	if (vma->vm_flags & VM_WRITE)
-		entry = pte_mkwrite(pte_mkdirty(entry));
+	entry = maybe_mkwrite(pte_mkdirty(entry), vma);
 
 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
 			&vmf->ptl);

From patchwork Thu Sep 29 22:29:14 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Rick Edgecombe
X-Patchwork-Id: 12994670
From: Rick Edgecombe
To: x86@kernel.org, "H . Peter Anvin", Thomas Gleixner, Ingo Molnar,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, linux-arch@vger.kernel.org,
 linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski,
 Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
 Eugene Syromiatnikov, Florian Weimer, "H . J . Lu", Jann Horn,
 Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov,
 Pavel Machek, Peter Zijlstra, Randy Dunlap, "Ravi V . Shankar",
 Weijiang Yang, "Kirill A . Shutemov", joao.moreira@intel.com,
 John Allen, kcc@google.com, eranian@google.com, rppt@kernel.org,
 jamorris@linux.microsoft.com, dethoma@microsoft.com
Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu
Subject: [PATCH v2 17/39] mm: Fixup places that call pte_mkwrite() directly
Date: Thu, 29 Sep 2022 15:29:14 -0700
Message-Id: <20220929222936.14584-18-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20220929222936.14584-1-rick.p.edgecombe@intel.com>
References: <20220929222936.14584-1-rick.p.edgecombe@intel.com>
Sender: owner-linux-mm@kvack.org

From: Yu-cheng Yu

With the introduction of shadow stack memory there are two ways a pte can
be writable: regular writable memory and shadow stack memory.

In past patches, maybe_mkwrite() has been updated to apply pte_mkwrite()
or pte_mkwrite_shstk() depending on the VMA flag. This covers most cases
where a PTE is made writable. However, there are places where
pte_mkwrite() is called directly and the logic should now also create a
shadow stack PTE in the case of a shadow stack VMA.

 - do_anonymous_page() and migrate_vma_insert_page() check VM_WRITE
   directly and call pte_mkwrite(), which is the same as maybe_mkwrite()
   in logic and intention. Just change them to maybe_mkwrite().

 - When userfaultfd is creating a PTE after userspace handles the fault
   it calls pte_mkwrite() directly. Teach it about pte_mkwrite_shstk().

In other cases where pte_mkwrite() is called directly, the VMA will not
be VM_SHADOW_STACK, and so shadow stack memory should not be created.

 - In the case of pte_savedwrite(), shadow stack VMAs are excluded.

 - In the case of the "dirty_accountable" optimization in mprotect(),
   shadow stack VMAs won't be VM_SHARED, so it is not necessary.

Signed-off-by: Yu-cheng Yu
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
Cc: Kees Cook
Reviewed-by: Kees Cook
---

v2:
 - Updated commit log with comments from Dave Hansen
 - Dave also suggested (I understood) to maybe tweak vm_get_page_prot()
   to avoid having to call maybe_mkwrite(). After playing around with
   this I opted to *not* do this.
   Shadow stack memory is effectively writable, so having the default
   permissions be writable ended up mapping the zero page as writable
   and other surprises. So creating shadow stack memory needs to be done
   with manual logic like pte_mkwrite().
 - Drop change in change_pte_range() because it couldn't actually
   trigger for shadow stack VMAs.
 - Clarify reasoning for skipped cases of pte_mkwrite().

Yu-cheng v25:
 - Apply same changes to do_huge_pmd_numa_page() as to do_numa_page().

 mm/migrate_device.c |  3 +--
 mm/userfaultfd.c    | 10 +++++++---
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 27fb37d65476..eba3164736b3 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -606,8 +606,7 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
 			goto abort;
 		}
 		entry = mk_pte(page, vma->vm_page_prot);
-		if (vma->vm_flags & VM_WRITE)
-			entry = pte_mkwrite(pte_mkdirty(entry));
+		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
 	}
 
 	ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 7327b2573f7c..b49372c7de41 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -63,6 +63,7 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 	int ret;
 	pte_t _dst_pte, *dst_pte;
 	bool writable = dst_vma->vm_flags & VM_WRITE;
+	bool shstk = dst_vma->vm_flags & VM_SHADOW_STACK;
 	bool vm_shared = dst_vma->vm_flags & VM_SHARED;
 	bool page_in_cache = page->mapping;
 	spinlock_t *ptl;
@@ -83,9 +84,12 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 		writable = false;
 	}
 
-	if (writable)
-		_dst_pte = pte_mkwrite(_dst_pte);
-	else
+	if (writable) {
+		if (shstk)
+			_dst_pte = pte_mkwrite_shstk(_dst_pte);
+		else
+			_dst_pte = pte_mkwrite(_dst_pte);
+	} else
 		/*
 		 * We need this to make sure write bit removed; as mk_pte()
 		 * could return a pte with write bit set.
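
Editor's sketch (not part of the patch): the decision that maybe_mkwrite()
makes, and that the call sites above now share, can be modelled in plain
userspace C. Everything prefixed toy_ or TOY_ is hypothetical; the bit
values are made up, and the shadow stack helper only mimics the
"Write=0, Dirty=1" encoding described elsewhere in this series rather
than the real pte_clear_cow() logic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical VMA flag values, chosen only for this sketch. */
#define TOY_VM_WRITE        0x1u
#define TOY_VM_SHADOW_STACK 0x2u

/* Hypothetical PTE bits for the toy model. */
#define TOY_PTE_RW    0x1u
#define TOY_PTE_DIRTY 0x2u

typedef uint32_t toy_pte_t;

/* Regular writable memory: set the RW bit. */
toy_pte_t toy_mkwrite(toy_pte_t pte)
{
	return pte | TOY_PTE_RW;
}

/* Shadow stack memory in the toy model: RW stays clear, Dirty is set. */
toy_pte_t toy_mkwrite_shstk(toy_pte_t pte)
{
	return pte | TOY_PTE_DIRTY;
}

/* Same branch structure as the patched maybe_mkwrite(). */
toy_pte_t toy_maybe_mkwrite(toy_pte_t pte, uint32_t vm_flags)
{
	if (!(vm_flags & TOY_VM_WRITE))
		return pte;                     /* not writable: leave untouched */

	if (vm_flags & TOY_VM_SHADOW_STACK)
		return toy_mkwrite_shstk(pte);  /* shadow stack flavour */

	return toy_mkwrite(pte);                /* ordinary writable flavour */
}
```

Note that the VM_WRITE test comes first, so in this model a VMA flagged
only as shadow stack, without the write flag, gets neither treatment.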

From patchwork Thu Sep 29 22:29:15 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Rick Edgecombe
X-Patchwork-Id: 12994672
From: Rick Edgecombe
To: x86@kernel.org, "H . Peter Anvin", Thomas Gleixner, Ingo Molnar,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, linux-arch@vger.kernel.org,
 linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski,
 Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
 Eugene Syromiatnikov, Florian Weimer, "H . J . Lu", Jann Horn,
 Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov,
 Pavel Machek, Peter Zijlstra, Randy Dunlap, "Ravi V . Shankar",
 Weijiang Yang, "Kirill A . Shutemov", joao.moreira@intel.com,
 John Allen, kcc@google.com, eranian@google.com, rppt@kernel.org,
 jamorris@linux.microsoft.com, dethoma@microsoft.com
Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu
Subject: [PATCH v2 18/39] mm: Add guard pages around a shadow stack.
Date: Thu, 29 Sep 2022 15:29:15 -0700
Message-Id: <20220929222936.14584-19-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20220929222936.14584-1-rick.p.edgecombe@intel.com>
References: <20220929222936.14584-1-rick.p.edgecombe@intel.com>
Sender: owner-linux-mm@kvack.org

From: Yu-cheng Yu

The architecture of shadow stack constrains the ability of userspace to
move the shadow stack pointer (SSP) in order to prevent corrupting or
switching to other shadow stacks. The RSTORSSP instruction can move the
SSP to different shadow stacks, but it requires a specially placed token
in order to do this. However, the architecture does not prevent
incrementing the stack pointer to wander onto an adjacent shadow stack.
To prevent this in software, enforce guard pages at the beginning of
shadow stack vmas, such that there will always be a gap between adjacent
shadow stacks.

Make the gap big enough so that no userspace SSP changing operations
(besides RSTORSSP) can move the SSP from one stack to the next. The SSP
can increment or decrement by CALL, RET and INCSSP. CALL and RET can move
the SSP by a maximum of 8 bytes, at which point the shadow stack would be
accessed.

The INCSSP instruction can also increment the shadow stack pointer. It
is the shadow stack analog of an instruction like:

	addq $0x80, %rsp

However, there is one important difference between an ADD on %rsp and
INCSSP. In addition to modifying SSP, INCSSP also reads from the memory
of the first and last elements that were "popped". It can be thought of
as acting like this:

	READ_ONCE(ssp);       // read+discard top element on stack
	ssp += nr_to_pop * 8; // move the shadow stack
	READ_ONCE(ssp-8);     // read+discard last popped stack element

The maximum distance INCSSP can move the SSP is 2040 bytes, before it
would read the memory. Therefore a single page gap will be enough to
prevent any operation from shifting the SSP to an adjacent stack, since
it would have to land in the gap at least once, causing a fault.

This could be accomplished by using VM_GROWSDOWN, but this has a
downside. The behavior would allow shadow stacks to grow, which is
unneeded and adds a strange difference to how most regular stacks work.

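
Editor's sketch (not part of the patch): the single-page argument above
can be checked with a small userspace model. It assumes a 4096-byte page,
an 8-byte-aligned SSP, and INCSSPQ's limit of 255 popped entries
(255 * 8 = 2040 bytes). The toy_ names and the layout (a lower stack
followed directly by a one-page gap) are hypothetical and exist only for
this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SIZE   4096u /* assumed page size */
#define INCSSPQ_MAX_POP 255u  /* 255 * 8 = 2040 bytes per instruction */

/* True if addr lies in [start, end). */
static bool toy_in_range(uint64_t addr, uint64_t start, uint64_t end)
{
	return addr >= start && addr < end;
}

/*
 * Model one INCSSPQ: it reads the current top of stack, moves the SSP by
 * nr_to_pop * 8, then reads the last popped element at new_ssp - 8.
 * Returns true if either read lands in the guard gap, i.e. would fault.
 */
bool toy_incssp_faults(uint64_t ssp, unsigned int nr_to_pop, uint64_t gap_start)
{
	uint64_t gap_end = gap_start + TOY_PAGE_SIZE;
	uint64_t new_ssp = ssp + (uint64_t)nr_to_pop * 8;

	return toy_in_range(ssp, gap_start, gap_end) ||
	       toy_in_range(new_ssp - 8, gap_start, gap_end);
}

/*
 * Try every aligned SSP on a stack occupying [stack_start, stack_end),
 * with the gap starting at stack_end, and every legal pop count. Count
 * the cases where the SSP ends up past the end of the stack without
 * either read touching the gap.
 */
unsigned int toy_count_escapes(uint64_t stack_start, uint64_t stack_end)
{
	unsigned int escapes = 0;

	for (uint64_t ssp = stack_start; ssp < stack_end; ssp += 8) {
		for (unsigned int n = 1; n <= INCSSPQ_MAX_POP; n++) {
			uint64_t new_ssp = ssp + (uint64_t)n * 8;

			if (new_ssp > stack_end &&
			    !toy_incssp_faults(ssp, n, stack_end))
				escapes++;
		}
	}
	return escapes;
}
```

For a one-page stack at 0x10000 the escape count is zero: any move that
carries the SSP past the end of the stack reads at most 2032 bytes into
the gap, which is still inside the 4096-byte guard page.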
Signed-off-by: Yu-cheng Yu
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
Cc: Kees Cook
Reviewed-by: Kees Cook
---

v2:
 - Use __weak instead of #ifdef (Dave Hansen)
 - Only have start gap on shadow stack (Andy Luto)
 - Create stack_guard_start_gap() to not duplicate code
   in an arch version of vm_start_gap() (Dave Hansen)
 - Improve commit log partly with verbiage from (Dave Hansen)

Yu-cheng v25:
 - Move SHADOW_STACK_GUARD_GAP to arch/x86/mm/mmap.c.

Yu-cheng v24:
 - Instead of changing vm_*_gap(), create x86-specific versions.

 arch/x86/mm/mmap.c | 23 +++++++++++++++++++++++
 include/linux/mm.h | 11 ++++++-----
 mm/mmap.c          |  7 +++++++
 3 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index f3f52c5e2fd6..b0427bd2da30 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -250,3 +250,26 @@ bool pfn_modify_allowed(unsigned long pfn, pgprot_t prot)
 		return false;
 	return true;
 }
+
+unsigned long stack_guard_start_gap(struct vm_area_struct *vma)
+{
+	if (vma->vm_flags & VM_GROWSDOWN)
+		return stack_guard_gap;
+
+	/*
+	 * Shadow stack pointer is moved by CALL, RET, and INCSSP(Q/D).
+	 * INCSSPQ moves shadow stack pointer up to 255 * 8 = ~2 KB
+	 * (~1KB for INCSSPD) and touches the first and the last element
+	 * in the range, which triggers a page fault if the range is not
+	 * in a shadow stack. Because of this, creating 4-KB guard pages
+	 * around a shadow stack prevents these instructions from going
+	 * beyond.
+	 *
+	 * Creation of VM_SHADOW_STACK is tightly controlled, so a vma
+	 * can't be both VM_GROWSDOWN and VM_SHADOW_STACK
+	 */
+	if (vma->vm_flags & VM_SHADOW_STACK)
+		return PAGE_SIZE;
+
+	return 0;
+}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index fef14ab3abcb..09458e77bf52 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2775,15 +2775,16 @@ struct vm_area_struct *vma_lookup(struct mm_struct *mm, unsigned long addr)
 	return vma;
 }
 
+unsigned long stack_guard_start_gap(struct vm_area_struct *vma);
+
 static inline unsigned long vm_start_gap(struct vm_area_struct *vma)
 {
+	unsigned long gap = stack_guard_start_gap(vma);
 	unsigned long vm_start = vma->vm_start;
 
-	if (vma->vm_flags & VM_GROWSDOWN) {
-		vm_start -= stack_guard_gap;
-		if (vm_start > vma->vm_start)
-			vm_start = 0;
-	}
+	vm_start -= gap;
+	if (vm_start > vma->vm_start)
+		vm_start = 0;
 	return vm_start;
 }
 
diff --git a/mm/mmap.c b/mm/mmap.c
index 9d780f415be3..f0d2e9143bd0 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -247,6 +247,13 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
 	return origbrk;
 }
 
+unsigned long __weak stack_guard_start_gap(struct vm_area_struct *vma)
+{
+	if (vma->vm_flags & VM_GROWSDOWN)
+		return stack_guard_gap;
+	return 0;
+}
+
 static inline unsigned long vma_compute_gap(struct vm_area_struct *vma)
 {
 	unsigned long gap, prev_end;

From patchwork Thu Sep 29 22:29:16 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Rick Edgecombe
X-Patchwork-Id: 12994673
From: Rick Edgecombe
To: x86@kernel.org, "H . Peter Anvin", Thomas Gleixner, Ingo Molnar,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, linux-arch@vger.kernel.org,
 linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski,
 Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
 Eugene Syromiatnikov, Florian Weimer, "H . J . Lu", Jann Horn,
 Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov,
 Pavel Machek, Peter Zijlstra, Randy Dunlap, "Ravi V . Shankar",
 Weijiang Yang, "Kirill A . Shutemov", joao.moreira@intel.com,
 John Allen, kcc@google.com, eranian@google.com, rppt@kernel.org,
 jamorris@linux.microsoft.com, dethoma@microsoft.com
Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu
Subject: [PATCH v2 19/39] mm/mmap: Add shadow stack pages to memory accounting
Date: Thu, 29 Sep 2022 15:29:16 -0700
Message-Id: <20220929222936.14584-20-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20220929222936.14584-1-rick.p.edgecombe@intel.com>
References: <20220929222936.14584-1-rick.p.edgecombe@intel.com>
Sender: owner-linux-mm@kvack.org

From: Yu-cheng Yu

Account shadow stack pages to stack memory.

Signed-off-by: Yu-cheng Yu
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
Cc: Kees Cook
Reviewed-by: Kees Cook
---

v2:
 - Remove is_shadow_stack_mapping() and just change it to directly
   bitwise and VM_SHADOW_STACK.

Yu-cheng v26:
 - Remove redundant #ifdef CONFIG_MMU.

Yu-cheng v25:
 - Remove #ifdef CONFIG_ARCH_HAS_SHADOW_STACK for
   is_shadow_stack_mapping().

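
Editor's sketch (not part of the patch): a userspace model of the two
accounting decisions this patch touches. The toy_ names and flag values
are hypothetical, the hugepage-file and exec checks of the real functions
are left out, and the stack and data predicates are simplified stand-ins
for is_stack_mapping() and is_data_mapping(). It also assumes, based on
maybe_mkwrite() testing VM_WRITE before VM_SHADOW_STACK in this series,
that a shadow stack VMA carries the write flag as well.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values, chosen only for this sketch. */
#define TOY_VM_WRITE        0x01u
#define TOY_VM_SHARED       0x02u
#define TOY_VM_NORESERVE    0x04u
#define TOY_VM_GROWSDOWN    0x08u
#define TOY_VM_SHADOW_STACK 0x10u

/* Only the two counters relevant to the sketch. */
struct toy_mm {
	long stack_vm;
	long data_vm;
};

/* Shadow stack mappings are always accountable; others follow the old rule. */
int toy_accountable_mapping(uint32_t vm_flags)
{
	if (vm_flags & TOY_VM_SHADOW_STACK)
		return 1;

	return (vm_flags & (TOY_VM_NORESERVE | TOY_VM_SHARED | TOY_VM_WRITE)) ==
	       TOY_VM_WRITE;
}

/* Shadow stack pages are counted as stack, ahead of the data test. */
void toy_vm_stat_account(struct toy_mm *mm, uint32_t flags, long npages)
{
	if (flags & TOY_VM_GROWSDOWN)
		mm->stack_vm += npages;         /* regular stack */
	else if (flags & TOY_VM_SHADOW_STACK)
		mm->stack_vm += npages;         /* shadow stack */
	else if ((flags & (TOY_VM_WRITE | TOY_VM_SHARED)) == TOY_VM_WRITE)
		mm->data_vm += npages;          /* private writable data */
}
```

The ordering matters in this model: if the shadow stack test came after
the data test, a shadow stack VMA that also has the write flag would be
counted as data rather than stack.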
 mm/mmap.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/mm/mmap.c b/mm/mmap.c
index f0d2e9143bd0..8569ef09614c 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1682,6 +1682,9 @@ static inline int accountable_mapping(struct file *file, vm_flags_t vm_flags)
 	if (file && is_file_hugepages(file))
 		return 0;
 
+	if (vm_flags & VM_SHADOW_STACK)
+		return 1;
+
 	return (vm_flags & (VM_NORESERVE | VM_SHARED | VM_WRITE)) == VM_WRITE;
 }
 
@@ -3289,6 +3292,8 @@ void vm_stat_account(struct mm_struct *mm, vm_flags_t flags, long npages)
 		mm->exec_vm += npages;
 	else if (is_stack_mapping(flags))
 		mm->stack_vm += npages;
+	else if (flags & VM_SHADOW_STACK)
+		mm->stack_vm += npages;
 	else if (is_data_mapping(flags))
 		mm->data_vm += npages;
 }

From patchwork Thu Sep 29 22:29:17 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Rick Edgecombe
X-Patchwork-Id: 12994674
From: Rick Edgecombe
To: x86@kernel.org, "H . Peter Anvin", Thomas Gleixner, Ingo Molnar,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, linux-arch@vger.kernel.org,
 linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski,
 Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
 Eugene Syromiatnikov, Florian Weimer, "H . J . Lu", Jann Horn,
 Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov,
 Pavel Machek, Peter Zijlstra, Randy Dunlap, "Ravi V . Shankar",
 Weijiang Yang, "Kirill A . Shutemov", joao.moreira@intel.com,
 John Allen, kcc@google.com, eranian@google.com, rppt@kernel.org,
 jamorris@linux.microsoft.com, dethoma@microsoft.com
Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu
Subject: [PATCH v2 20/39] mm/mprotect: Exclude shadow stack from preserve_write
Date: Thu, 29 Sep 2022 15:29:17 -0700
Message-Id: <20220929222936.14584-21-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20220929222936.14584-1-rick.p.edgecombe@intel.com>
References: <20220929222936.14584-1-rick.p.edgecombe@intel.com>
Sender: owner-linux-mm@kvack.org

From: Yu-cheng Yu

In change_pte_range(), when a PTE is changed for prot_numa, _PAGE_RW is
preserved to avoid the additional write fault after the NUMA hinting
fault. However, pte_write() now includes both normal writable and shadow
stack (Write=0, Dirty=1) PTEs, but the latter does not have _PAGE_RW and
has no need to preserve it.

Exclude shadow stack from the preserve_write test, and apply the same
change to change_huge_pmd().

Signed-off-by: Yu-cheng Yu
Reviewed-by: Kirill A. Shutemov
Signed-off-by: Rick Edgecombe
---

Yu-cheng v25:
 - Move is_shadow_stack_mapping() to a separate line.

Yu-cheng v24:
 - Change arch_shadow_stack_mapping() to is_shadow_stack_mapping().

 mm/huge_memory.c | 7 +++++++
 mm/mprotect.c    | 7 +++++++
 2 files changed, 14 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 11fc69eb4717..492c4f190f55 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1800,6 +1800,13 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		return 0;
 
 	preserve_write = prot_numa && pmd_write(*pmd);
+
+	/*
+	 * Preserve only normal writable huge PMD, but not shadow
+	 * stack (RW=0, Dirty=1).
+	 */
+	if (vma->vm_flags & VM_SHADOW_STACK)
+		preserve_write = false;
 	ret = 1;
 
 #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
diff --git a/mm/mprotect.c b/mm/mprotect.c
index bc6bddd156ca..983206529dce 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -114,6 +114,13 @@ static unsigned long change_pte_range(struct mmu_gather *tlb,
 			pte_t ptent;
 			bool preserve_write = prot_numa && pte_write(oldpte);
 
+			/*
+			 * Preserve only normal writable PTE, but not shadow
+			 * stack (RW=0, Dirty=1).
+ */ + if (vma->vm_flags & VM_SHADOW_STACK) + preserve_write = false; + /* * Avoid trapping faults against the zero or KSM * pages. See similar comment in change_huge_pmd. From patchwork Thu Sep 29 22:29:18 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 12994675 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id A8B12C433F5 for ; Thu, 29 Sep 2022 22:30:33 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 622698D000F; Thu, 29 Sep 2022 18:30:30 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 5D0D18D000C; Thu, 29 Sep 2022 18:30:30 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 46EC08D000F; Thu, 29 Sep 2022 18:30:30 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 319288D000C for ; Thu, 29 Sep 2022 18:30:30 -0400 (EDT) Received: from smtpin03.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 14AF5161379 for ; Thu, 29 Sep 2022 22:30:30 +0000 (UTC) X-FDA: 79966568220.03.FE187D1 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by imf03.hostedemail.com (Postfix) with ESMTP id 9DFE020009 for ; Thu, 29 Sep 2022 22:30:29 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664490629; x=1696026629; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=4GU+7bD3PIJhe/EOuq1oD0vsUP4qa2yf2XDdcT/yQi0=; b=Wq+gFIgJ1b6pSGRL/4iDhb4jPFOtqjtWimOIna/4WpExGxoHv7TGGG71 wq+Isq9kZIVrlnXkkH8TEl43x5FPlJ7CmgB8bcB1IpL4Q9GX6E4MnJ/H7 
WXzjgHn2MF9Z36brH7w6TAyAfC1fKYpzZEfx+yAzebdgMPzsT83uBv3BS XhhibZ21OFZ5v4KO2T9zAItQN/THzo0lOEG9Or0VV4BofbXeMOse/w+0h UvhA1uUQLkaS1W5wRspvYDQSkFeLVgzZ2ltqgevhCMoY0yAHdr14GM9ve Q8n4CIUyYLHZ3cORzw5GeP3aUrnJSonuQ7kcFW14JjKgcvFjHUG/qzwi6 Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="328420501" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="328420501" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:30:28 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="691016278" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="691016278" Received: from sergungo-mobl.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.251.25.88]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:30:26 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . 
Shutemov" , joao.moreira@intel.com, John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Andrew Morton Subject: [PATCH v2 21/39] mm: Re-introduce vm_flags to do_mmap() Date: Thu, 29 Sep 2022 15:29:18 -0700 Message-Id: <20220929222936.14584-22-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220929222936.14584-1-rick.p.edgecombe@intel.com> References: <20220929222936.14584-1-rick.p.edgecombe@intel.com> ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1664490629; a=rsa-sha256; cv=none; b=k8l3XsWlwUAfeieFDxfOT/cmMw5mWDrdI2gs8xCDRwn7TlE6jxeRBY5Ke10G3gCE+n08lU OfjOofflATWcQ6w68pxvZNrKw81pRCdRjodLsdDzruT5Hh9bPThyY/TUZND7LbSWBz+I8j f6vVXFNrkH98zEbbHeEq+tsIhCVShRo= ARC-Authentication-Results: i=1; imf03.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=Wq+gFIgJ; spf=pass (imf03.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.88 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1664490629; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references:dkim-signature; bh=M4MQEJ7o6/aBNWbOiWfcANyScompl1B2ZxfmL+wazd8=; b=IsQiRxs82Y4Gn4zhY4IsHX4sXkFSE1EjHMmBjjywnqw52/9S8Sv5rjpSn19Dyq8VcdHB/y C+7CWcTyEId9ROKDmwhwos7evdx9/m/JkppjIubFe4eHmZachoQraGsMtiswuQbtYcku8R LmbxFKFucQ7DoZ+5tonPE4p5TUxXnTg= X-Stat-Signature: whycjog79fd7ci9iiwstxugwxqado74j X-Rspamd-Queue-Id: 9DFE020009 Authentication-Results: imf03.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=Wq+gFIgJ; spf=pass (imf03.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 
192.55.52.88 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com X-Rspamd-Server: rspam09 X-Rspam-User: X-HE-Tag: 1664490629-840518 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yu-cheng Yu There was no more caller passing vm_flags to do_mmap(), and vm_flags was removed from the function's input by: commit 45e55300f114 ("mm: remove unnecessary wrapper function do_mmap_pgoff()"). There is a new user now. Shadow stack allocation passes VM_SHADOW_STACK to do_mmap(). Thus, re-introduce vm_flags to do_mmap(). Signed-off-by: Yu-cheng Yu Reviewed-by: Peter Collingbourne Reviewed-by: Kees Cook Reviewed-by: Kirill A. Shutemov Signed-off-by: Rick Edgecombe Cc: Andrew Morton Cc: Oleg Nesterov Cc: linux-mm@kvack.org --- fs/aio.c | 2 +- include/linux/mm.h | 3 ++- ipc/shm.c | 2 +- mm/mmap.c | 10 +++++----- mm/nommu.c | 4 ++-- mm/util.c | 2 +- 6 files changed, 12 insertions(+), 11 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index 606613e9d1f4..a54b5ee72f1c 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -554,7 +554,7 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events) ctx->mmap_base = do_mmap(ctx->aio_ring_file, 0, ctx->mmap_size, PROT_READ | PROT_WRITE, - MAP_SHARED, 0, &unused, NULL); + MAP_SHARED, 0, 0, &unused, NULL); mmap_write_unlock(mm); if (IS_ERR((void *)ctx->mmap_base)) { ctx->mmap_size = 0; diff --git a/include/linux/mm.h b/include/linux/mm.h index 09458e77bf52..6aa0ffe3666c 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2667,7 +2667,8 @@ extern unsigned long mmap_region(struct file *file, unsigned long addr, struct list_head *uf); extern unsigned long do_mmap(struct file *file, unsigned long addr, unsigned long len, unsigned long prot, unsigned long flags, - unsigned long pgoff, unsigned long *populate, struct list_head *uf); + vm_flags_t vm_flags, unsigned long 
pgoff, unsigned long *populate, + struct list_head *uf); extern int __do_munmap(struct mm_struct *, unsigned long, size_t, struct list_head *uf, bool downgrade); extern int do_munmap(struct mm_struct *, unsigned long, size_t, diff --git a/ipc/shm.c b/ipc/shm.c index b3048ebd5c31..f236b3e14ec4 100644 --- a/ipc/shm.c +++ b/ipc/shm.c @@ -1646,7 +1646,7 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg, goto invalid; } - addr = do_mmap(file, addr, size, prot, flags, 0, &populate, NULL); + addr = do_mmap(file, addr, size, prot, flags, 0, 0, &populate, NULL); *raddr = addr; err = 0; if (IS_ERR_VALUE(addr)) diff --git a/mm/mmap.c b/mm/mmap.c index 8569ef09614c..e1006c41b1cc 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1375,11 +1375,11 @@ static inline bool file_mmap_ok(struct file *file, struct inode *inode, */ unsigned long do_mmap(struct file *file, unsigned long addr, unsigned long len, unsigned long prot, - unsigned long flags, unsigned long pgoff, - unsigned long *populate, struct list_head *uf) + unsigned long flags, vm_flags_t vm_flags, + unsigned long pgoff, unsigned long *populate, + struct list_head *uf) { struct mm_struct *mm = current->mm; - vm_flags_t vm_flags; int pkey = 0; *populate = 0; @@ -1439,7 +1439,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr, * to. we assume access permissions have been handled by the open * of the memory object, so we don't do any here. 
*/ - vm_flags = calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(flags) | + vm_flags |= calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(flags) | mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC; if (flags & MAP_LOCKED) @@ -2964,7 +2964,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, file = get_file(vma->vm_file); ret = do_mmap(vma->vm_file, start, size, - prot, flags, pgoff, &populate, NULL); + prot, flags, 0, pgoff, &populate, NULL); fput(file); out: mmap_write_unlock(mm); diff --git a/mm/nommu.c b/mm/nommu.c index e819cbc21b39..85b41107a192 100644 --- a/mm/nommu.c +++ b/mm/nommu.c @@ -1059,6 +1059,7 @@ unsigned long do_mmap(struct file *file, unsigned long len, unsigned long prot, unsigned long flags, + vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate, struct list_head *uf) @@ -1066,7 +1067,6 @@ unsigned long do_mmap(struct file *file, struct vm_area_struct *vma; struct vm_region *region; struct rb_node *rb; - vm_flags_t vm_flags; unsigned long capabilities, result; int ret; @@ -1085,7 +1085,7 @@ unsigned long do_mmap(struct file *file, /* we've determined that we can make the mapping, now translate what we * now know into VMA flags */ - vm_flags = determine_vm_flags(file, prot, flags, capabilities); + vm_flags |= determine_vm_flags(file, prot, flags, capabilities); /* we're going to need to record the mapping */ region = kmem_cache_zalloc(vm_region_jar, GFP_KERNEL); diff --git a/mm/util.c b/mm/util.c index c9439c66d8cf..f15929f2c5bd 100644 --- a/mm/util.c +++ b/mm/util.c @@ -549,7 +549,7 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr, if (!ret) { if (mmap_write_lock_killable(mm)) return -EINTR; - ret = do_mmap(file, addr, len, prot, flag, pgoff, &populate, + ret = do_mmap(file, addr, len, prot, flag, 0, pgoff, &populate, &uf); mmap_write_unlock(mm); userfaultfd_unmap_complete(mm, &uf); From patchwork Thu Sep 29 22:29:19 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 
1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 12994676 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id B07A8C43217 for ; Thu, 29 Sep 2022 22:30:39 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 3AC338D0010; Thu, 29 Sep 2022 18:30:39 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 383D78D000C; Thu, 29 Sep 2022 18:30:39 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 24C0A8D0010; Thu, 29 Sep 2022 18:30:39 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 1693D8D000C for ; Thu, 29 Sep 2022 18:30:39 -0400 (EDT) Received: from smtpin08.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id F0C00C138A for ; Thu, 29 Sep 2022 22:30:38 +0000 (UTC) X-FDA: 79966568556.08.3586AD3 Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by imf11.hostedemail.com (Postfix) with ESMTP id 6526540002 for ; Thu, 29 Sep 2022 22:30:37 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664490637; x=1696026637; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=epNKIV9ZZ/lINSB0y0Sxnd07+7mgCJcP2nqXsQVHPjA=; b=m56ZyZZWIzCqsVD82xD0VnwJEknaeT4U8nzhTlCewblK2bcrNG16QfDg ifDcNXc9km9T1ocbbgO6WV1gGSWS1Ez/A1vRY4VMSpCNSqsffEcN4qsWl WdU0xeAckTCJJZwXwEbHLTOsJYy7hU55/xcqx1OUnpcfWtrS8Yi0qVfoa i+zgXCsJ/kK9k3SvxXy2BMxdhtUJFaJ/6QK1heIaT4YQ6rjDRLHMKrazC 5z60KgyO8L5Pt3at7RD5hs62RJ1/d8EPqEGIEvEI4LAi3SmJNVlKzn/S+ Wse38ZflAmvq4MGh7v7v4JF31imeRsD16vtzgICcLIOzzrGIIWHhYWDG6 w==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="289207498" 
X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="289207498" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:30:36 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="691016293" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="691016293" Received: from sergungo-mobl.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.251.25.88]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:30:28 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . 
Shutemov" , joao.moreira@intel.com, John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com Cc: rick.p.edgecombe@intel.com Subject: [PATCH v2 22/39] mm: Don't allow write GUPs to shadow stack memory Date: Thu, 29 Sep 2022 15:29:19 -0700 Message-Id: <20220929222936.14584-23-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220929222936.14584-1-rick.p.edgecombe@intel.com> References: <20220929222936.14584-1-rick.p.edgecombe@intel.com> ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1664490637; a=rsa-sha256; cv=none; b=vZNrHVMbhDsf/3NSj3xYI0QASYOkfYp6DDuVl+ux+XmH/CYEk3k36xh1kjgZPGEq6DOAsF 5mq6KMbSSREHPdId23vMbbDuhr1L+clTi9KUERowzPe5dOcmgCT0O2rAdpjJlZyUQ8pMqj usmUOUwdaQUlIRYydMtJIks/Mhe/DCc= ARC-Authentication-Results: i=1; imf11.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=m56ZyZZW; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf11.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.20 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1664490637; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references:dkim-signature; bh=5v/NkEKqPgfcFAdBfY0++rQv21lgwOcZnH+kYMbRqsk=; b=Zy4AUVuJ1F4JhPt4AdZFozEwKCDtip1lPHQ4hm7zgP359WRlZuJi6BLuQQefPkPl+xOSxr X+SpELfSR/mroak9Lsd99czLQoibvukN2A954ofJoIvIXQ9QeYlKgX0B4Z8z2FU/GFGKHD PsjeyZ7AZr7/QoMd/5BovsIxAKaPVEk= X-Stat-Signature: q15y1dpr1ba9af9mzcj5kof9s9dow6fk X-Rspamd-Queue-Id: 6526540002 X-Rspamd-Server: rspam04 Authentication-Results: imf11.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=m56ZyZZW; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf11.hostedemail.com: 
domain of rick.p.edgecombe@intel.com designates 134.134.136.20 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com
X-Rspam-User:
X-HE-Tag: 1664490637-811562
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

Shadow stack memory is writable only in very specific, controlled ways.
However, since it is writable, the kernel treats it as such. As a result
there remain many ways for userspace to trigger the kernel to write to
shadow stacks via get_user_pages(, FOLL_WRITE) operations. To make this a
little less exposed, block writable GUPs for shadow stack VMAs.

Still allow FOLL_FORCE to write through shadow stack protections, as it
does for read-only protections.

Signed-off-by: Rick Edgecombe
---
v2:
 - New patch

 arch/x86/include/asm/pgtable.h | 3 +++
 mm/gup.c                       | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 7a769c4dbc1c..2e6a5ee70034 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1633,6 +1633,9 @@ static inline bool __pte_access_permitted(unsigned long pteval, bool write)
 {
 	unsigned long need_pte_bits = _PAGE_PRESENT|_PAGE_USER;
 
+	if (write && (pteval & (_PAGE_RW | _PAGE_DIRTY)) == _PAGE_DIRTY)
+		return 0;
+
 	if (write)
 		need_pte_bits |= _PAGE_RW;
 
diff --git a/mm/gup.c b/mm/gup.c
index 5abdaf487460..56da98f3335c 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1043,7 +1043,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
 		return -EFAULT;
 
 	if (write) {
-		if (!(vm_flags & VM_WRITE)) {
+		if (!(vm_flags & VM_WRITE) || (vm_flags & VM_SHADOW_STACK)) {
 			if (!(gup_flags & FOLL_FORCE))
 				return -EFAULT;
 			/*

From patchwork Thu Sep 29 22:29:20 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Rick Edgecombe
X-Patchwork-Id: 12994677
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8C3CEC4321E for ; Thu, 29 Sep 2022 22:30:40 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 1DA9D8D0011; Thu, 29 Sep 2022 18:30:40 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 189318D000C; Thu, 29 Sep 2022 18:30:40 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 002B18D0011; Thu, 29 Sep 2022 18:30:39 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id D37918D000C for ; Thu, 29 Sep 2022 18:30:39 -0400 (EDT) Received: from smtpin26.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id B50521403E3 for ; Thu, 29 Sep 2022 22:30:39 +0000 (UTC) X-FDA: 79966568598.26.8E429A5 Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by imf11.hostedemail.com (Postfix) with ESMTP id 17B0A40002 for ; Thu, 29 Sep 2022 22:30:38 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664490639; x=1696026639; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=FnYcr5J2TgwiovrmqIsxREyO7fe7Xn/9OctH1SrnQU0=; b=Q+vHgPikJnu4eQc8nwz5w+jmEQHo2sxFegNAjfLSKj8gnRAflgBPsGsJ QSGP9CoFwQdQ+YqnKVNWzfbSK5rGLiR9NcF7bbqJIZpGU/9nEyg3wgOuG eshYh/lGsEqNFDDWwNV8lrMpJkjbtaG8aqVlMtm8O3N8uzUvtV2cKVlOl l5gdCuRK+krjsN0CWemMnJOwNHfsYlZ+fFe0NsYP/jMOgWlxBNrlkbZGz o4gbrwlAViJxKXjT1a6v3y7ffiIwFcR00bZbrKtR3wc+ZS97UetHVK9eY 3QgiJ6E3HAjcI8x20ocicbunCVYdxBPqFfgwQ7PJUgZhAA4qwNvOLP4wj g==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="289207500" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="289207500" Received: from 
fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:30:36 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="691016303" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="691016303" Received: from sergungo-mobl.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.251.25.88]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:30:34 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . 
Shutemov" , joao.moreira@intel.com, John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com Cc: rick.p.edgecombe@intel.com Subject: [PATCH v2 23/39] x86: Introduce userspace API for CET enabling Date: Thu, 29 Sep 2022 15:29:20 -0700 Message-Id: <20220929222936.14584-24-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220929222936.14584-1-rick.p.edgecombe@intel.com> References: <20220929222936.14584-1-rick.p.edgecombe@intel.com> ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1664490639; a=rsa-sha256; cv=none; b=2kr9ptW32GdNzQkdjbRv8E+sWeN/Tk5BSfbhZFTUBjeleY2w3REvAyDFtvqfuLN1aRNCqh 2axDc087Srd0y70kl2onzcpjFKRb0D0HjhSceSKjq1PK+NlAz2D0V6i6FqDB7PjaePmm8b TqQ0aMNyDayGchZEhHMD7JlrGSdMtMc= ARC-Authentication-Results: i=1; imf11.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=Q+vHgPik; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf11.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.20 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1664490639; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references:dkim-signature; bh=uNzRSy0qKkUuuoYPSEiQwT/pu08X/bqUb//gxyNZVM0=; b=zk9l7uZFoMXWcaSAarJoYseh289D+UpK51lMLSMNnpX+pDtuxRJgMlcmiu/J3ospqUtyE9 +/srnF4rl6UeCL8jzDeP1EHSolDE+ec+RXSGebrzQJXOktoKLsXm9Qcxp7J7mNJYS0cqug lwx+8H+Xt0vgfONPXvalGe3yZuW2c7I= X-Stat-Signature: mhffadirg8aj9b3c1b1qfxbuty13ik5u X-Rspamd-Queue-Id: 17B0A40002 X-Rspamd-Server: rspam04 Authentication-Results: imf11.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=Q+vHgPik; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf11.hostedemail.com: 
domain of rick.p.edgecombe@intel.com designates 134.134.136.20 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com
X-Rspam-User:
X-HE-Tag: 1664490638-177436
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

From: "Kirill A. Shutemov"

Add three new arch_prctl() handles:

 - ARCH_CET_ENABLE/DISABLE enables or disables the specified
   feature. Returns 0 on success or an error.

 - ARCH_CET_LOCK prevents future disabling or enabling of the
   specified feature. Returns 0 on success or an error.

The features are handled per-thread and inherited over fork(2)/clone(2),
but reset on exec().

This is a preparation patch. It does not implement any features.

Signed-off-by: Kirill A. Shutemov
[tweaked with feedback from tglx]
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
---
v2:
 - Only allow one enable/disable per call (tglx)
 - Return error code like a normal arch_prctl() (Alexander Potapenko)
 - Make CET only (tglx)

 arch/x86/include/asm/cet.h        | 20 ++++++++++++++++
 arch/x86/include/asm/processor.h  |  3 +++
 arch/x86/include/uapi/asm/prctl.h |  6 +++++
 arch/x86/kernel/process.c         |  4 ++++
 arch/x86/kernel/process_64.c      |  5 +++-
 arch/x86/kernel/shstk.c           | 38 +++++++++++++++++++++++++++++++
 6 files changed, 75 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/include/asm/cet.h
 create mode 100644 arch/x86/kernel/shstk.c

diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h
new file mode 100644
index 000000000000..0fa4dbc98c49
--- /dev/null
+++ b/arch/x86/include/asm/cet.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_CET_H
+#define _ASM_X86_CET_H
+
+#ifndef __ASSEMBLY__
+#include
+
+struct task_struct;
+
+#ifdef CONFIG_X86_SHADOW_STACK
+long cet_prctl(struct task_struct *task, int option,
+		      unsigned long features);
+#else
+static inline long cet_prctl(struct task_struct *task, int option,
+		      unsigned long features) {
return -EINVAL; } +#endif /* CONFIG_X86_SHADOW_STACK */ + +#endif /* __ASSEMBLY__ */ + +#endif /* _ASM_X86_CET_H */ diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 356308c73951..a92bf76edafe 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -530,6 +530,9 @@ struct thread_struct { */ u32 pkru; + unsigned long features; + unsigned long features_locked; + /* Floating point and extended processor state */ struct fpu fpu; /* diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h index 500b96e71f18..028158e35269 100644 --- a/arch/x86/include/uapi/asm/prctl.h +++ b/arch/x86/include/uapi/asm/prctl.h @@ -20,4 +20,10 @@ #define ARCH_MAP_VDSO_32 0x2002 #define ARCH_MAP_VDSO_64 0x2003 +/* Don't use 0x3001-0x3004 because of old glibcs */ + +#define ARCH_CET_ENABLE 0x4001 +#define ARCH_CET_DISABLE 0x4002 +#define ARCH_CET_LOCK 0x4003 + #endif /* _ASM_X86_PRCTL_H */ diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 58a6ea472db9..034880311e6b 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -367,6 +367,10 @@ void arch_setup_new_exec(void) task_clear_spec_ssb_noexec(current); speculation_ctrl_update(read_thread_flags()); } + + /* Reset thread features on exec */ + current->thread.features = 0; + current->thread.features_locked = 0; } #ifdef CONFIG_X86_IOPL_IOPERM diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index 1962008fe743..8fa2c2b7de65 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -829,7 +829,10 @@ long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2) case ARCH_MAP_VDSO_64: return prctl_map_vdso(&vdso_image_64, arg2); #endif - + case ARCH_CET_ENABLE: + case ARCH_CET_DISABLE: + case ARCH_CET_LOCK: + return cet_prctl(task, option, arg2); default: ret = -EINVAL; break; diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c new file mode 
100644 index 000000000000..e3276ac9e9b9 --- /dev/null +++ b/arch/x86/kernel/shstk.c @@ -0,0 +1,38 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * shstk.c - Intel shadow stack support + * + * Copyright (c) 2021, Intel Corporation. + * Yu-cheng Yu + */ + +#include +#include +#include + +long cet_prctl(struct task_struct *task, int option, unsigned long features) +{ + if (option == ARCH_CET_LOCK) { + task->thread.features_locked |= features; + return 0; + } + + /* Don't allow via ptrace */ + if (task != current) + return -EINVAL; + + /* Do not allow to change locked features */ + if (features & task->thread.features_locked) + return -EPERM; + + /* Only support enabling/disabling one feature at a time. */ + if (hweight_long(features) > 1) + return -EINVAL; + + if (option == ARCH_CET_DISABLE) { + return -EINVAL; + } + + /* Handle ARCH_CET_ENABLE */ + return -EINVAL; +} From patchwork Thu Sep 29 22:29:21 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 12994678 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 31164C4332F for ; Thu, 29 Sep 2022 22:30:42 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id AB1DC8D0012; Thu, 29 Sep 2022 18:30:40 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id A3C378D000C; Thu, 29 Sep 2022 18:30:40 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 83EEA8D0012; Thu, 29 Sep 2022 18:30:40 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 74C538D000C for ; Thu, 29 Sep 2022 18:30:40 -0400 (EDT) Received: from smtpin22.hostedemail.com (a10.router.float.18 
[10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 486CC4139F for ; Thu, 29 Sep 2022 22:30:40 +0000 (UTC) X-FDA: 79966568640.22.84A98AF Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by imf25.hostedemail.com (Postfix) with ESMTP id CBFE2A0018 for ; Thu, 29 Sep 2022 22:30:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664490639; x=1696026639; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=giGkTaTZmpDnHAjW9RaJP2B5vpHIjFWnvaB2xH31zds=; b=MjLzhAcTua0xkuSbBxFKYZKojXhsS+OG9C7kqY/Y8HPCmi/OEhCMXSSh 0ZUidKkWXPkqAwJdT9t5XsoHVVQyrq0Nueu4apCShHvjcToJmuPp3YC3X keU3WM+tSu04Oc+QZZQx/zbdGLcgXLw8JxqrWWMVBUjF4CJYbS7bC7vPW QHX/Tv8HRakJUoiDbEpg5JiW2sHRtkfsN/CbYppdH3OQ3bFvF6WLfAeo0 XldAqn0VUnm/rsnAZk4N8selbfaZ9Utj1ePMucXVXsw/xSXxhpBteJqZp aLR5jZwOrfvEBmwMeHV60v5bjBPqVMwxaCxK4FhrSr5P7Mu/oRXDNUEX7 g==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="289207510" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="289207510" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:30:38 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="691016310" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="691016310" Received: from sergungo-mobl.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.251.25.88]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:30:36 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . 
Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , joao.moreira@intel.com, John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v2 24/39] x86/cet/shstk: Add user-mode shadow stack support Date: Thu, 29 Sep 2022 15:29:21 -0700 Message-Id: <20220929222936.14584-25-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220929222936.14584-1-rick.p.edgecombe@intel.com> References: <20220929222936.14584-1-rick.p.edgecombe@intel.com> ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1664490640; a=rsa-sha256; cv=none; b=d66fIji3eHe5F7Ujf6GoX0bsIswBWUtXckQlUq67jO2hmUjUmRdidjOaF12m5VnHpK7cL1 NYd9OluQkfX6FHvX/K9f9YvO8bB3mKk4lBnfaDCewsPiDxAbY+uD0f901U3x6xheALZAdW BUfb9gqm5tGyzeOixOlLruV9Z8t+mSk= ARC-Authentication-Results: i=1; imf25.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=MjLzhAcT; spf=pass (imf25.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.20 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1664490640; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references:dkim-signature; bh=pTkJNVHQdSed2C1i82MTfcGb0OdmpQrihNN9DlDOXus=; b=eppqmxI90e0CdUjnh8t2aBDm72DUNDoGvwzdUTyj0At8ju2dH/vpTMEZRw90TTuMF+eXfC ZM1q+x1rj1W+9HYZYYuN6+t9fXaw07xP15Bzw4DrynKcFZ0u8GhawsVrjFB/YkWZ99QJLF bBjZZ3uQ/8rcqYJJLW38fYEd/LPdRm4= X-Stat-Signature: 3eoe7uejuynhmyo4bs4wnik7kfkswwih X-Rspamd-Queue-Id: CBFE2A0018 Authentication-Results: 
imf25.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=MjLzhAcT; spf=pass (imf25.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.20 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com X-Rspamd-Server: rspam09 X-Rspam-User: X-HE-Tag: 1664490639-751357 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yu-cheng Yu Introduce basic shadow stack enabling/disabling/allocation routines. A task's shadow stack is allocated from memory with the VM_SHADOW_STACK flag and has a fixed size of min(RLIMIT_STACK, 4GB). Keep the task's shadow stack address and size in thread_struct. This will be copied when cloning new threads, but needs to be cleared during exec, so add a function to do this. Do not support IA32 emulation. Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Cc: Kees Cook --- v2: - Get rid of unnecessary shstk->base checks - Don't support IA32 emulation v1: - Switch to xsave helpers. - Expand commit log. Yu-cheng v30: - Remove superfluous comments for struct thread_shstk. - Replace 'populate' with 'unused'. Yu-cheng v28: - Update shstk_setup() with wrmsrl_safe(), returns success when shadow stack feature is not present (since this is a setup function).
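Reviewer note: the sizing rule from the commit log, min(RLIMIT_STACK, 4GB) rounded up to a page, can be checked in isolation with a small userspace sketch. This is only an illustration, not code from the patch. The helper names, the SKETCH_ prefix and the 4 KiB page size are assumptions made for the example. The second helper mirrors how shstk_setup() programs the initial shadow stack pointer at the top of the mapping (addr + size).

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL        /* assumption: 4 KiB pages */
#define SKETCH_SZ_4G     (1ULL << 32)   /* the 4 GB cap from the commit log */

/*
 * Hypothetical helper, not part of the patch: mirrors
 * PAGE_ALIGN(min(RLIMIT_STACK, 4GB)) as computed in shstk_setup().
 */
static uint64_t sketch_shstk_size(uint64_t rlimit_stack)
{
	uint64_t size = rlimit_stack < SKETCH_SZ_4G ? rlimit_stack : SKETCH_SZ_4G;

	/* Round up to the next page boundary. */
	return (size + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/*
 * Hypothetical helper, not part of the patch: the initial shadow stack
 * pointer is the top of the mapping, because the stack grows down.
 */
static uint64_t sketch_initial_ssp(uint64_t base, uint64_t size)
{
	return base + size;
}
```

With an unlimited RLIMIT_STACK (all bits set) the cap applies and the result is exactly 4 GB. A typical 8 MB limit is already page aligned and passes through unchanged.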
arch/x86/include/asm/cet.h | 13 +++ arch/x86/include/asm/msr.h | 11 +++ arch/x86/include/asm/processor.h | 5 ++ arch/x86/include/uapi/asm/prctl.h | 2 + arch/x86/kernel/Makefile | 2 + arch/x86/kernel/process_64.c | 2 + arch/x86/kernel/shstk.c | 143 ++++++++++++++++++++++++++++++ 7 files changed, 178 insertions(+) diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h index 0fa4dbc98c49..a4a1f4c0089b 100644 --- a/arch/x86/include/asm/cet.h +++ b/arch/x86/include/asm/cet.h @@ -7,12 +7,25 @@ struct task_struct; +struct thread_shstk { + u64 base; + u64 size; +}; + #ifdef CONFIG_X86_SHADOW_STACK long cet_prctl(struct task_struct *task, int option, unsigned long features); +int shstk_setup(void); +void shstk_free(struct task_struct *p); +int shstk_disable(void); +void reset_thread_shstk(void); #else static inline long cet_prctl(struct task_struct *task, int option, unsigned long features) { return -EINVAL; } +static inline int shstk_setup(void) { return -EOPNOTSUPP; } +static inline void shstk_free(struct task_struct *p) {} +static inline int shstk_disable(void) { return -EOPNOTSUPP; } +static inline void reset_thread_shstk(void) {} #endif /* CONFIG_X86_SHADOW_STACK */ #endif /* __ASSEMBLY__ */ diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h index 65ec1965cd28..a9cb4c434e60 100644 --- a/arch/x86/include/asm/msr.h +++ b/arch/x86/include/asm/msr.h @@ -310,6 +310,17 @@ void msrs_free(struct msr *msrs); int msr_set_bit(u32 msr, u8 bit); int msr_clear_bit(u32 msr, u8 bit); +static inline void set_clr_bits_msrl(u32 msr, u64 set, u64 clear) +{ + u64 val, new_val; + + rdmsrl(msr, val); + new_val = (val & ~clear) | set; + + if (new_val != val) + wrmsrl(msr, new_val); +} + #ifdef CONFIG_SMP int rdmsr_on_cpu(unsigned int cpu, u32 msr_no, u32 *l, u32 *h); int wrmsr_on_cpu(unsigned int cpu, u32 msr_no, u32 l, u32 h); diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index a92bf76edafe..3a0c9d9d4d1d 100644 --- 
a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -27,6 +27,7 @@ struct vm86; #include #include #include +#include #include #include @@ -533,6 +534,10 @@ struct thread_struct { unsigned long features; unsigned long features_locked; +#ifdef CONFIG_X86_SHADOW_STACK + struct thread_shstk shstk; +#endif + /* Floating point and extended processor state */ struct fpu fpu; /* diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h index 028158e35269..41af3a8c4fa4 100644 --- a/arch/x86/include/uapi/asm/prctl.h +++ b/arch/x86/include/uapi/asm/prctl.h @@ -26,4 +26,6 @@ #define ARCH_CET_DISABLE 0x4002 #define ARCH_CET_LOCK 0x4003 +#define CET_SHSTK 0x1 + #endif /* _ASM_X86_PRCTL_H */ diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index a20a5ebfacd7..8950d1f71226 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -139,6 +139,8 @@ obj-$(CONFIG_UNWINDER_GUESS) += unwind_guess.o obj-$(CONFIG_AMD_MEM_ENCRYPT) += sev.o +obj-$(CONFIG_X86_SHADOW_STACK) += shstk.o + ### # 64 bit specific files ifeq ($(CONFIG_X86_64),y) diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index 8fa2c2b7de65..be544b4b4c8b 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -514,6 +514,8 @@ start_thread_common(struct pt_regs *regs, unsigned long new_ip, load_gs_index(__USER_DS); } + reset_thread_shstk(); + loadsegment(fs, 0); loadsegment(es, _ds); loadsegment(ds, _ds); diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index e3276ac9e9b9..a0b8d4adb2bf 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -8,8 +8,151 @@ #include #include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include #include +static bool feature_enabled(unsigned long features) +{ + return current->thread.features & features; +} + +static void feature_set(unsigned long 
features) +{ + current->thread.features |= features; +} + +static void feature_clr(unsigned long features) +{ + current->thread.features &= ~features; +} + +static unsigned long alloc_shstk(unsigned long size) +{ + int flags = MAP_ANONYMOUS | MAP_PRIVATE; + struct mm_struct *mm = current->mm; + unsigned long addr, unused; + + mmap_write_lock(mm); + addr = do_mmap(NULL, 0, size, PROT_READ, flags, + VM_SHADOW_STACK | VM_WRITE, 0, &unused, NULL); + + mmap_write_unlock(mm); + + return addr; +} + +static void unmap_shadow_stack(u64 base, u64 size) +{ + while (1) { + int r; + + r = vm_munmap(base, size); + + /* + * vm_munmap() returns -EINTR when mmap_lock is held by + * something else, and that lock should not be held for a + * long time. Retry in that case. + */ + if (r == -EINTR) { + cond_resched(); + continue; + } + + /* + * For all other types of vm_munmap() failure, either the + * system is out of memory or there is a bug. + */ + WARN_ON_ONCE(r); + break; + } +} + +int shstk_setup(void) +{ + struct thread_shstk *shstk = &current->thread.shstk; + unsigned long addr, size; + + /* Already enabled */ + if (feature_enabled(CET_SHSTK)) + return 0; + + /* Also not supported for 32 bit */ + if (!cpu_feature_enabled(X86_FEATURE_SHSTK) || in_ia32_syscall()) + return -EOPNOTSUPP; + + size = PAGE_ALIGN(min_t(unsigned long long, rlimit(RLIMIT_STACK), SZ_4G)); + addr = alloc_shstk(size); + if (IS_ERR_VALUE(addr)) + return PTR_ERR((void *)addr); + + fpu_lock_and_load(); + wrmsrl(MSR_IA32_PL3_SSP, addr + size); + wrmsrl(MSR_IA32_U_CET, CET_SHSTK_EN); + fpregs_unlock(); + + shstk->base = addr; + shstk->size = size; + feature_set(CET_SHSTK); + + return 0; +} + +void reset_thread_shstk(void) +{ + memset(&current->thread.shstk, 0, sizeof(struct thread_shstk)); + current->thread.features = 0; + current->thread.features_locked = 0; +} + +void shstk_free(struct task_struct *tsk) +{ + struct thread_shstk *shstk = &tsk->thread.shstk; + + if (!cpu_feature_enabled(X86_FEATURE_SHSTK) || + 
!feature_enabled(CET_SHSTK)) + return; + + if (!tsk->mm) + return; + + unmap_shadow_stack(shstk->base, shstk->size); +} + +int shstk_disable(void) +{ + if (!cpu_feature_enabled(X86_FEATURE_SHSTK)) + return -EOPNOTSUPP; + + /* Already disabled? */ + if (!feature_enabled(CET_SHSTK)) + return 0; + + fpu_lock_and_load(); + /* Disable WRSS too when disabling shadow stack */ + set_clr_bits_msrl(MSR_IA32_U_CET, 0, CET_SHSTK_EN); + wrmsrl(MSR_IA32_PL3_SSP, 0); + fpregs_unlock(); + + shstk_free(current); + feature_clr(CET_SHSTK); + + return 0; +} + long cet_prctl(struct task_struct *task, int option, unsigned long features) { if (option == ARCH_CET_LOCK) { From patchwork Thu Sep 29 22:29:22 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 12994679 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id B5BA2C43219 for ; Thu, 29 Sep 2022 22:30:43 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B75D18D0013; Thu, 29 Sep 2022 18:30:42 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id B274B8D000C; Thu, 29 Sep 2022 18:30:42 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9A1188D0013; Thu, 29 Sep 2022 18:30:42 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 856C18D000C for ; Thu, 29 Sep 2022 18:30:42 -0400 (EDT) Received: from smtpin21.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 60BE51213F0 for ; Thu, 29 Sep 2022 22:30:42 +0000 (UTC) X-FDA: 79966568724.21.033F3D1 Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by 
imf02.hostedemail.com (Postfix) with ESMTP id C304280014 for ; Thu, 29 Sep 2022 22:30:41 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664490641; x=1696026641; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=sfY0qfAAMaBKBaoxmspQvWEjmGcNq7mh/op82wqEfzo=; b=MNfFuXwJJZBLf0pX1oF1YTAq60D4LAKeO+KZzASVT2TXcFW1epgG1o+l w/k3IvlTWMIV5wKjcf8dAsOIKkqFACJQFTKlJ7ZjiyccT0d2iBfHjwGeA 4BejpMO1gF+EbQaJZBOds59U+J8IcubfIm7PE03k0EG3ReyCUsC1NgI/n i2cZWIKUr77RWeazIvQIHiWA4IIBrVMfxGoGE7tyPa4a09/lkFsifzTLg DvyIFMcBEVFo1RvrAx/iOdTi2Tca/X6zMtqoFYDGGBCfuxOVKoYsFgVfV q57dfXZ6xjQcMpBh9KSvfl3aUxeaHEOony1rTwlD4FiyEx6V4833Z8UZS w==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="289207517" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="289207517" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:30:40 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="691016319" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="691016319" Received: from sergungo-mobl.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.251.25.88]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:30:38 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . 
Shutemov" , joao.moreira@intel.com, John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v2 25/39] x86/cet/shstk: Handle thread shadow stack Date: Thu, 29 Sep 2022 15:29:22 -0700 Message-Id: <20220929222936.14584-26-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220929222936.14584-1-rick.p.edgecombe@intel.com> References: <20220929222936.14584-1-rick.p.edgecombe@intel.com> ARC-Authentication-Results: i=1; imf02.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=MNfFuXwJ; spf=pass (imf02.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.20 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1664490642; a=rsa-sha256; cv=none; b=n4171uN6Mg54F0kE5ZDD5YNMTvi2swaNaS4iT3GEwEm5IBP54dLo8K9U0JWIRXKO+CyPZ3 WlAEb6EAdQhrCxSay8Ffji/pdWBw0yNx+VJg6OFByxieWTnAJtI1XKLiC5zN3C5z6awx1q f7vj0zS7PxVhYIRi9u2KDqP9TZOM8sE= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1664490642; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references:dkim-signature; bh=enSeqn6/zMbUDQ8oNusi22DzvIgvNm7w+oXXQu9Ss8w=; b=TiqRmuI32VGHqQFa9WTr5ReG0Mykzkw70oztHFpX1n1ykpoCuUxmpdmawJsL8v5JTKOpjq jouY041UCML5ohuHe63+N1k5cqAGra5lxw30P6pGACHmarEfiIlVur7DFR8jkXeZpoaJMl CqCMBxRKYj9tq2ypAZK/na+F2RUUBp8= Authentication-Results: imf02.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=MNfFuXwJ; spf=pass (imf02.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.20 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass 
(policy=none) header.from=intel.com X-Rspam-User: X-Stat-Signature: zbkcfbpycawj634mcgmxbpt79eisrbyc X-Rspamd-Queue-Id: C304280014 X-Rspamd-Server: rspam05 X-HE-Tag: 1664490641-429646 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yu-cheng Yu When a process is duplicated, but the child shares the address space with the parent, there is potential for the threads sharing a single stack to cause conflicts for each other. In the normal non-cet case this is handled in two ways. With regular CLONE_VM a new stack is provided by userspace such that the parent and child have different stacks. For vfork, the parent is suspended until the child exits. So as long as the child doesn't return from the vfork()/CLONE_VFORK calling function and sticks to a limited set of operations, the parent and child can share the same stack. For shadow stack, these scenarios present similar sharing problems. For the CLONE_VM case, the child and the parent must have separate shadow stacks. Instead of changing clone to take a shadow stack, have the kernel just allocate one and switch to it. Use stack_size passed from clone3() syscall for thread shadow stack size. A compat-mode thread shadow stack size is further reduced to 1/4. This allows more threads to run in a 32-bit address space. The clone() does not pass stack_size, which was added to clone3(). In that case, use RLIMIT_STACK size and cap to 4 GB. For shadow stack enabled vfork(), the parent and child can share the same shadow stack, like they can share a normal stack. Since the parent is suspended until the child terminates, the child will not interfere with the parent while executing as long as it doesn't return from the vfork() and overwrite up the shadow stack. The child can safely overwrite down the shadow stack, as the parent can just overwrite this later. So CET does not add any additional limitations for vfork(). 
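Reviewer note: the clone cases described above reduce to one flag test plus a size fallback. The following is a minimal userspace sketch of that decision, not the kernel code itself. The SKETCH_ names and both helpers are hypothetical. The two flag values are copied from the Linux UAPI header linux/sched.h so that the example stays self-contained.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in the Linux UAPI <linux/sched.h>. */
#define SKETCH_CLONE_VM    0x00000100UL
#define SKETCH_CLONE_VFORK 0x00004000UL
#define SKETCH_SZ_4G       (1ULL << 32)

/*
 * Hypothetical helper: a new shadow stack is needed only when the child
 * shares the address space (CLONE_VM) and the parent is not suspended
 * (no CLONE_VFORK). fork() and vfork() keep using the existing one.
 */
static bool sketch_needs_new_shstk(unsigned long clone_flags)
{
	return (clone_flags & (SKETCH_CLONE_VFORK | SKETCH_CLONE_VM)) ==
	       SKETCH_CLONE_VM;
}

/*
 * Hypothetical helper: clone3() may pass a stack_size. clone() passes
 * none (0), in which case RLIMIT_STACK capped to 4 GB is used instead.
 */
static uint64_t sketch_thread_shstk_size(uint64_t stack_size,
					 uint64_t rlimit_stack)
{
	if (!stack_size)
		stack_size = rlimit_stack < SKETCH_SZ_4G ? rlimit_stack
							  : SKETCH_SZ_4G;
	return stack_size;
}
```

A pthread-style clone (CLONE_VM without CLONE_VFORK) is the only combination that yields true. The sketch leaves out page alignment and the compat-mode reduction mentioned above.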
Userspace implementing posix vfork() can actually prevent the child from returning from the vfork() calling function, using CET. Glibc does this by adjusting the shadow stack pointer in the child, so that the child receives a #CP if it tries to return from vfork() calling function. Free the shadow stack on thread exit by doing it in mm_release(). Skip this when exiting a vfork() child since the stack is shared in the parent. During this operation, the shadow stack pointer of the new thread needs to be updated to point to the newly allocated shadow stack. Since the ability to do this is confined to the FPU subsystem, change fpu_clone() to take the new shadow stack pointer, and update it internally inside the FPU subsystem. This part was suggested by Thomas Gleixner. Suggested-by: Thomas Gleixner Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe --- v2: - Have fpu_clone() take new shadow stack pointer and update SSP in xsave buffer for new task. (tglx) v1: - Expand commit log. - Add more comments. - Switch to xsave helpers. Yu-cheng v30: - Update comments about clone()/clone3(). (Borislav Petkov) Yu-cheng v29: - WARN_ON_ONCE() when get_xsave_addr() returns NULL, and update comments. 
(Dave Hansen) arch/x86/include/asm/cet.h | 7 +++++ arch/x86/include/asm/fpu/sched.h | 3 +- arch/x86/include/asm/mmu_context.h | 2 ++ arch/x86/kernel/fpu/core.c | 40 ++++++++++++++++++++++++- arch/x86/kernel/process.c | 17 ++++++++++- arch/x86/kernel/shstk.c | 48 +++++++++++++++++++++++++++++- 6 files changed, 113 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h index a4a1f4c0089b..924de99e0c61 100644 --- a/arch/x86/include/asm/cet.h +++ b/arch/x86/include/asm/cet.h @@ -16,6 +16,9 @@ struct thread_shstk { long cet_prctl(struct task_struct *task, int option, unsigned long features); int shstk_setup(void); +int shstk_alloc_thread_stack(struct task_struct *p, unsigned long clone_flags, + unsigned long stack_size, + unsigned long *shstk_addr); void shstk_free(struct task_struct *p); int shstk_disable(void); void reset_thread_shstk(void); @@ -23,6 +26,10 @@ void reset_thread_shstk(void); static inline long cet_prctl(struct task_struct *task, int option, unsigned long features) { return -EINVAL; } static inline int shstk_setup(void) { return -EOPNOTSUPP; } +static inline int shstk_alloc_thread_stack(struct task_struct *p, + unsigned long clone_flags, + unsigned long stack_size, + unsigned long *shstk_addr) { return 0; } static inline void shstk_free(struct task_struct *p) {} static inline int shstk_disable(void) { return -EOPNOTSUPP; } static inline void reset_thread_shstk(void) {} diff --git a/arch/x86/include/asm/fpu/sched.h b/arch/x86/include/asm/fpu/sched.h index b2486b2cbc6e..54c9c2fd1907 100644 --- a/arch/x86/include/asm/fpu/sched.h +++ b/arch/x86/include/asm/fpu/sched.h @@ -11,7 +11,8 @@ extern void save_fpregs_to_fpstate(struct fpu *fpu); extern void fpu__drop(struct fpu *fpu); -extern int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal); +extern int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal, + unsigned long shstk_addr); extern void fpu_flush_thread(void); 
/* diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h index b8d40ddeab00..d29988cbdf20 100644 --- a/arch/x86/include/asm/mmu_context.h +++ b/arch/x86/include/asm/mmu_context.h @@ -146,6 +146,8 @@ do { \ #else #define deactivate_mm(tsk, mm) \ do { \ + if (!tsk->vfork_done) \ + shstk_free(tsk); \ load_gs_index(0); \ loadsegment(fs, 0); \ } while (0) diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c index 778d3054ccc7..f332e9b42b6d 100644 --- a/arch/x86/kernel/fpu/core.c +++ b/arch/x86/kernel/fpu/core.c @@ -555,8 +555,40 @@ static inline void fpu_inherit_perms(struct fpu *dst_fpu) } } +#ifdef CONFIG_X86_SHADOW_STACK +static int update_fpu_shstk(struct task_struct *dst, unsigned long ssp) +{ + struct cet_user_state *xstate; + + /* If ssp update is not needed. */ + if (!ssp) + return 0; + + xstate = get_xsave_addr(&dst->thread.fpu.fpstate->regs.xsave, + XFEATURE_CET_USER); + + /* + * If there is a non-zero ssp, then 'dst' must be configured with a shadow + * stack and the fpu state should be up to date since it was just copied + * from the parent in fpu_clone(). So there must be a valid non-init CET + * state location in the buffer. + */ + if (WARN_ON_ONCE(!xstate)) + return 1; + + xstate->user_ssp = (u64)ssp; + + return 0; +} +#else +static int update_fpu_shstk(struct task_struct *dst, unsigned long shstk_addr) +{ + return 0; +} +#endif + /* Clone current's FPU state on fork */ -int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal) +int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal, + unsigned long ssp) { struct fpu *src_fpu = &current->thread.fpu; struct fpu *dst_fpu = &dst->thread.fpu; @@ -616,6 +648,12 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal) if (use_xsave()) dst_fpu->fpstate->regs.xsave.header.xfeatures &= ~XFEATURE_MASK_PASID; + /* + * Update shadow stack pointer, in case it changed during clone. 
+ */ + if (update_fpu_shstk(dst, ssp)) + return 1; + trace_x86_fpu_copy_src(src_fpu); trace_x86_fpu_copy_dst(dst_fpu); diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 034880311e6b..5e63d190becd 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -47,6 +47,7 @@ #include #include #include +#include #include "process.h" @@ -118,6 +119,7 @@ void exit_thread(struct task_struct *tsk) free_vm86(t); + shstk_free(tsk); fpu__drop(fpu); } @@ -139,6 +141,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args) struct inactive_task_frame *frame; struct fork_frame *fork_frame; struct pt_regs *childregs; + unsigned long shstk_addr = 0; int ret = 0; childregs = task_pt_regs(p); @@ -173,7 +176,12 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args) frame->flags = X86_EFLAGS_FIXED; #endif - fpu_clone(p, clone_flags, args->fn); + /* Allocate a new shadow stack for pthread if needed */ + ret = shstk_alloc_thread_stack(p, clone_flags, args->stack_size, &shstk_addr); + if (ret) + return ret; + + fpu_clone(p, clone_flags, args->fn, shstk_addr); /* Kernel thread ? */ if (unlikely(p->flags & PF_KTHREAD)) { @@ -219,6 +227,13 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args) if (!ret && unlikely(test_tsk_thread_flag(current, TIF_IO_BITMAP))) io_bitmap_share(p); + /* + * If copy_thread() is failing, don't leak the shadow stack possibly + * allocated in shstk_alloc_thread_stack() above. 
+ */ + if (ret) + shstk_free(p); + return ret; } diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index a0b8d4adb2bf..db4e53f9fdaf 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -118,6 +118,46 @@ void reset_thread_shstk(void) current->thread.features_locked = 0; } +int shstk_alloc_thread_stack(struct task_struct *tsk, unsigned long clone_flags, + unsigned long stack_size, unsigned long *shstk_addr) +{ + struct thread_shstk *shstk = &tsk->thread.shstk; + unsigned long addr; + + /* + * If shadow stack is not enabled on the new thread, skip any + * switch to a new shadow stack. + */ + if (!feature_enabled(CET_SHSTK)) + return 0; + + /* + * clone() does not pass stack_size, which was added to clone3(). + * Use RLIMIT_STACK and cap to 4 GB. + */ + if (!stack_size) + stack_size = min_t(unsigned long long, rlimit(RLIMIT_STACK), SZ_4G); + + /* + * For CLONE_VM, except vfork, the child needs a separate shadow + * stack. + */ + if ((clone_flags & (CLONE_VFORK | CLONE_VM)) != CLONE_VM) + return 0; + + + stack_size = PAGE_ALIGN(stack_size); + addr = alloc_shstk(stack_size); + if (IS_ERR_VALUE(addr)) + return PTR_ERR((void *)addr); + + shstk->base = addr; + shstk->size = stack_size; + + *shstk_addr = addr + stack_size; + + return 0; +} + void shstk_free(struct task_struct *tsk) { struct thread_shstk *shstk = &tsk->thread.shstk; @@ -126,7 +166,13 @@ void shstk_free(struct task_struct *tsk) !feature_enabled(CET_SHSTK)) return; - if (!tsk->mm) + /* + * When fork() with CLONE_VM fails, the child (tsk) already has a + * shadow stack allocated, and exit_thread() calls this function to + * free it. In this case the parent (current) and the child share + * the same mm struct. 
+ */ + if (!tsk->mm || tsk->mm != current->mm) return; unmap_shadow_stack(shstk->base, shstk->size); From patchwork Thu Sep 29 22:29:23 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 12994680 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3CA74C433FE for ; Thu, 29 Sep 2022 22:30:45 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B753A8D0014; Thu, 29 Sep 2022 18:30:43 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id B255C8D000C; Thu, 29 Sep 2022 18:30:43 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 903648D0014; Thu, 29 Sep 2022 18:30:43 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 74A0A8D000C for ; Thu, 29 Sep 2022 18:30:43 -0400 (EDT) Received: from smtpin13.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id 54EBF14131C for ; Thu, 29 Sep 2022 22:30:43 +0000 (UTC) X-FDA: 79966568766.13.D803C26 Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by imf02.hostedemail.com (Postfix) with ESMTP id BC39C80012 for ; Thu, 29 Sep 2022 22:30:42 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664490642; x=1696026642; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=iq8dhkXWm5cP6VIZVGhcIx1cQKd1zCLxnJKDHuiQvpo=; b=fybC2i5tUm7GNMa+CeQoFnPQeUf66KGC6wiewWrXucjS8nycHiOGfcGW 59At2739lzcntxGUe679xXjscLBOd2/Msr1Ya9xv3p5NznJlcXHj2Eoym eI5WVEPmWXfFg80iHO8Jxu+GVFb8qsnH6SE40l3YFie0P32IMoMY75YZP 
L9hO9pRZzKmJ7KuH3PLz55PDjnfPHxadTleIUDja+OZJ69asO1iq16Ulq G2dq4J3yWoRgWvpI5XEndykdu6y10pXR6FJqIpkkJCZ+je9eRSLWNGQYm 5Vof9WIwsK1rtMdSbXufIMyV8KJxmgq3h0BAJj5GhhXaayJLZ0Gq7KlZd Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="289207524" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="289207524" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:30:42 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="691016324" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="691016324" Received: from sergungo-mobl.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.251.25.88]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:30:40 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . 
Shutemov" , joao.moreira@intel.com, John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v2 26/39] x86/cet/shstk: Introduce routines modifying shstk Date: Thu, 29 Sep 2022 15:29:23 -0700 Message-Id: <20220929222936.14584-27-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220929222936.14584-1-rick.p.edgecombe@intel.com> References: <20220929222936.14584-1-rick.p.edgecombe@intel.com> ARC-Authentication-Results: i=1; imf02.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=fybC2i5t; spf=pass (imf02.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.20 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1664490643; a=rsa-sha256; cv=none; b=7u7S64GdyDl9B1vyrVrv+ldv5+PuodJqrRwN50yZFCHK6sgmRh2jR8P54dDrCgjumRnYdd lnADsf+rRgvW9W9f89C5Wp5j7GEzIy/y5npTd9A00Ry6ZMeEGB0v76bImVPO1fx6Y/KxJa 5ielAEuLy0jMRLrUUT4YckSaeOfb15Q= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1664490643; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references:dkim-signature; bh=xiW2n+uI17DNG/FgK5vLAz1r1+sUnXkOEinVds2IduY=; b=Z1/nmx8K24DIbW6OL5uNOheB9CVIouzkGnZJu7lgiSdAjmj1cP1b0a2MEKmf5LrSENw2U+ ziJFOqWo/+FymU00H47nLhyxqx1NZRlclDoMFWX4sAdpH5tp9xDb5QSSRk0ZZju1yu4J1d yXdHsOukK0GfXZG6LkoeyVZPzUW/Fh8= Authentication-Results: imf02.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=fybC2i5t; spf=pass (imf02.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.20 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass 
(policy=none) header.from=intel.com X-Rspam-User: X-Stat-Signature: 3qs1ufsiq8pjtak5wfmzbzpip85quas9 X-Rspamd-Queue-Id: BC39C80012 X-Rspamd-Server: rspam05 X-HE-Tag: 1664490642-784141 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yu-cheng Yu Shadow stacks are normally written to via CALL/RET or specific CET instructions like RSTORSSP/SAVEPREVSSP. However, during some Linux operations the kernel will need to write to them directly using the ring-0 only WRUSS instruction. A shadow stack restore token marks a restore point of the shadow stack, and the address in a token must point directly above the token, which is within the same shadow stack. This is distinctly different from other pointers on the shadow stack, since those pointers point to executable code areas. Introduce token setup and verify routines. Also introduce WRUSS, which is a kernel-mode instruction but writes directly to user shadow stack. In future patches that enable shadow stack to work with signals, the kernel will need something to denote the point in the stack where sigreturn may be called. This will prevent attackers calling sigreturn at arbitrary places in the stack, in order to help prevent SROP attacks. To do this, something that can only be written by the kernel needs to be placed on the shadow stack. This can be accomplished by setting bit 63 in the frame written to the shadow stack. Userspace return addresses can't have this bit set as it is in the kernel range. It also can't be a valid restore token. Signed-off-by: Yu-cheng Yu Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Cc: Kees Cook --- v2: - Add data helpers for writing to shadow stack. v1: - Use xsave helpers. Yu-cheng v30: - Update commit log, remove description about signals. - Update various comments. - Remove variable 'ssp' init and adjust return value accordingly. 
- Check get_user_shstk_addr() return value. - Replace 'ia32' with 'proc32'. Yu-cheng v29: - Update comments for the use of get_xsave_addr(). arch/x86/include/asm/special_insns.h | 13 ++++ arch/x86/kernel/shstk.c | 108 +++++++++++++++++++++++++++ 2 files changed, 121 insertions(+) diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h index 35f709f619fb..f096f52bd059 100644 --- a/arch/x86/include/asm/special_insns.h +++ b/arch/x86/include/asm/special_insns.h @@ -223,6 +223,19 @@ static inline void clwb(volatile void *__p) : [pax] "a" (p)); } +#ifdef CONFIG_X86_SHADOW_STACK +static inline int write_user_shstk_64(u64 __user *addr, u64 val) +{ + asm_volatile_goto("1: wrussq %[val], (%[addr])\n" + _ASM_EXTABLE(1b, %l[fail]) + :: [addr] "r" (addr), [val] "r" (val) + :: fail); + return 0; +fail: + return -EFAULT; +} +#endif /* CONFIG_X86_SHADOW_STACK */ + #define nop() asm volatile ("nop") static inline void serialize(void) diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index db4e53f9fdaf..8904aef487bf 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -25,6 +25,8 @@ #include #include +#define SS_FRAME_SIZE 8 + static bool feature_enabled(unsigned long features) { return current->thread.features & features; @@ -40,6 +42,31 @@ static void feature_clr(unsigned long features) current->thread.features &= ~features; } +/* + * Create a restore token on the shadow stack. A token is always 8-byte + * and aligned to 8. 
+ */ +static int create_rstor_token(unsigned long ssp, unsigned long *token_addr) +{ + unsigned long addr; + + /* Token must be aligned */ + if (!IS_ALIGNED(ssp, 8)) + return -EINVAL; + + addr = ssp - SS_FRAME_SIZE; + + /* Mark the token 64-bit */ + ssp |= BIT(0); + + if (write_user_shstk_64((u64 __user *)addr, (u64)ssp)) + return -EFAULT; + + *token_addr = addr; + + return 0; +} + static unsigned long alloc_shstk(unsigned long size) { int flags = MAP_ANONYMOUS | MAP_PRIVATE; @@ -158,6 +185,87 @@ int shstk_alloc_thread_stack(struct task_struct *tsk, unsigned long clone_flags, return 0; } +static unsigned long get_user_shstk_addr(void) +{ + unsigned long long ssp; + + fpu_lock_and_load(); + + rdmsrl(MSR_IA32_PL3_SSP, ssp); + + fpregs_unlock(); + + return ssp; +} + +static int put_shstk_data(u64 __user *addr, u64 data) +{ + WARN_ON(data & BIT(63)); + + /* + * Mark the high bit so that the sigframe can't be processed as a + * return address. + */ + if (write_user_shstk_64(addr, data | BIT(63))) + return -EFAULT; + return 0; +} + +static int get_shstk_data(unsigned long *data, unsigned long __user *addr) +{ + unsigned long ldata; + + if (unlikely(get_user(ldata, addr))) + return -EFAULT; + + if (!(ldata & BIT(63))) + return -EINVAL; + + *data = ldata & ~BIT(63); + + return 0; +} + +/* + * Verify the user shadow stack has a valid token on it, and then set + * *new_ssp according to the token. + */ +static int shstk_check_rstor_token(unsigned long *new_ssp) +{ + unsigned long token_addr; + unsigned long token; + + token_addr = get_user_shstk_addr(); + if (!token_addr) + return -EINVAL; + + if (get_user(token, (unsigned long __user *)token_addr)) + return -EFAULT; + + /* Is mode flag correct? */ + if (!(token & BIT(0))) + return -EINVAL; + + /* Is busy flag set? */ + if (token & BIT(1)) + return -EINVAL; + + /* Mask out flags */ + token &= ~3UL; + + /* Restore address aligned? */ + if (!IS_ALIGNED(token, 8)) + return -EINVAL; + + /* Token placed properly? 
*/ + if (((ALIGN_DOWN(token, 8) - 8) != token_addr) || token >= TASK_SIZE_MAX) + return -EINVAL; + + *new_ssp = token; + + return 0; +} + void shstk_free(struct task_struct *tsk) { struct thread_shstk *shstk = &tsk->thread.shstk;

From patchwork Thu Sep 29 22:29:24 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Rick Edgecombe
X-Patchwork-Id: 12994681
From: Rick Edgecombe
To: x86@kernel.org, "H . Peter Anvin", Thomas Gleixner, Ingo Molnar, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov, Florian Weimer, "H . J . Lu", Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap, "Ravi V . Shankar", Weijiang Yang, "Kirill A . Shutemov", joao.moreira@intel.com, John Allen, kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com
Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu
Subject: [PATCH v2 27/39] x86/cet/shstk: Handle signals for shadow stack
Date: Thu, 29 Sep 2022 15:29:24 -0700
Message-Id: <20220929222936.14584-28-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20220929222936.14584-1-rick.p.edgecombe@intel.com>
References: <20220929222936.14584-1-rick.p.edgecombe@intel.com>

From: Yu-cheng Yu

When a signal is handled, the context is normally pushed to the stack before the handler runs. For shadow stacks, since the shadow stack only tracks return addresses, there isn't any state that needs to be pushed. However, there are still a few things that need to be done. These things are userspace visible and will be kernel ABI for shadow stacks.

One is to make sure the restorer address is written to the shadow stack, since the signal handler (if not changing ucontext) returns to the restorer, and the restorer calls sigreturn. So add the restorer on the shadow stack before handling the signal, so there is not a conflict when the signal handler returns to the restorer.

The other thing to do is to place some type of checkable token on the thread's shadow stack before handling the signal and check it during sigreturn. This is an extra layer of protection to hamper attackers calling sigreturn manually as in SROP-like attacks.

For this token we can use the shadow stack data format defined earlier. Have the data pushed be the previous SSP. In the future the sigreturn might want to return back to a different stack. Storing the SSP (instead of a restore offset or something) allows for future functionality that may want to restore to a different stack.

So, when handling a signal, push
 - the previous SSP, encoded in the shadow stack data format
 - the restorer address, directly below that data frame.

In sigreturn, verify that the SSP is stored in the data format and pop the shadow stack.

Signed-off-by: Yu-cheng Yu
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
Cc: Andy Lutomirski
Cc: Cyrill Gorcunov
Cc: Florian Weimer
Cc: H.
Peter Anvin Cc: Kees Cook --- v2: - Switch to new shstk signal format v1: - Use xsave helpers. - Expand commit log. Yu-cheng v27: - Eliminate saving shadow stack pointer to signal context. Yu-cheng v25: - Update commit log/comments for the sc_ext struct. - Use restorer address already calculated. - Change CONFIG_X86_CET to CONFIG_X86_SHADOW_STACK. - Change X86_FEATURE_CET to X86_FEATURE_SHSTK. - Eliminate writing to MSR_IA32_U_CET for shadow stack. - Change wrmsrl() to wrmsrl_safe() and handle error. arch/x86/ia32/ia32_signal.c | 1 + arch/x86/include/asm/cet.h | 5 ++ arch/x86/kernel/shstk.c | 126 ++++++++++++++++++++++++++++++------ arch/x86/kernel/signal.c | 10 +++ 4 files changed, 123 insertions(+), 19 deletions(-) diff --git a/arch/x86/ia32/ia32_signal.c b/arch/x86/ia32/ia32_signal.c index c9c3859322fa..88d71b9de616 100644 --- a/arch/x86/ia32/ia32_signal.c +++ b/arch/x86/ia32/ia32_signal.c @@ -34,6 +34,7 @@ #include #include #include +#include static inline void reload_segments(struct sigcontext_32 *sc) { diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h index 924de99e0c61..8c6fab9f402a 100644 --- a/arch/x86/include/asm/cet.h +++ b/arch/x86/include/asm/cet.h @@ -6,6 +6,7 @@ #include struct task_struct; +struct ksignal; struct thread_shstk { u64 base; @@ -22,6 +23,8 @@ int shstk_alloc_thread_stack(struct task_struct *p, unsigned long clone_flags, void shstk_free(struct task_struct *p); int shstk_disable(void); void reset_thread_shstk(void); +int setup_signal_shadow_stack(struct ksignal *ksig); +int restore_signal_shadow_stack(void); #else static inline long cet_prctl(struct task_struct *task, int option, unsigned long features) { return -EINVAL; } @@ -33,6 +36,8 @@ static inline int shstk_alloc_thread_stack(struct task_struct *p, static inline void shstk_free(struct task_struct *p) {} static inline int shstk_disable(void) { return -EOPNOTSUPP; } static inline void reset_thread_shstk(void) {} +static inline int setup_signal_shadow_stack(struct 
ksignal *ksig) { return 0; } +static inline int restore_signal_shadow_stack(void) { return 0; } #endif /* CONFIG_X86_SHADOW_STACK */ #endif /* __ASSEMBLY__ */ diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index 8904aef487bf..04442134aadd 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -227,41 +227,129 @@ static int get_shstk_data(unsigned long *data, unsigned long __user *addr) } /* - * Verify the user shadow stack has a valid token on it, and then set - * *new_ssp according to the token. + * Create a restore token on shadow stack, and then push the user-mode + * function return address. */ -static int shstk_check_rstor_token(unsigned long *new_ssp) +static int shstk_setup_rstor_token(unsigned long ret_addr, unsigned long *new_ssp) { - unsigned long token_addr; - unsigned long token; + unsigned long ssp, token_addr; + int err; + + if (!ret_addr) + return -EINVAL; + + ssp = get_user_shstk_addr(); + if (!ssp) + return -EINVAL; + + err = create_rstor_token(ssp, &token_addr); + if (err) + return err; + + ssp = token_addr - sizeof(u64); + err = write_user_shstk_64((u64 __user *)ssp, (u64)ret_addr); + + if (!err) + *new_ssp = ssp; + + return err; +} + +static int shstk_push_sigframe(unsigned long *ssp) +{ + unsigned long target_ssp = *ssp; + + /* Token must be aligned */ + if (!IS_ALIGNED(*ssp, 8)) + return -EINVAL; - token_addr = get_user_shstk_addr(); - if (!token_addr) + if (!IS_ALIGNED(target_ssp, 8)) return -EINVAL; - if (get_user(token, (unsigned long __user *)token_addr)) + *ssp -= SS_FRAME_SIZE; + if (put_shstk_data((void *__user)*ssp, target_ssp)) return -EFAULT; - /* Is mode flag correct? */ - if (!(token & BIT(0))) + return 0; +} + + +static int shstk_pop_sigframe(unsigned long *ssp) +{ + unsigned long token_addr; + int err; + + err = get_shstk_data(&token_addr, (unsigned long __user *)*ssp); + if (unlikely(err)) + return err; + + /* Restore SSP aligned? 
*/ + if (unlikely(!IS_ALIGNED(token_addr, 8))) return -EINVAL; - /* Is busy flag set? */ - if (token & BIT(1)) + /* SSP in userspace? */ + if (unlikely(token_addr >= TASK_SIZE_MAX)) return -EINVAL; - /* Mask out flags */ - token &= ~3UL; + *ssp = token_addr; + + return 0; +} + +int setup_signal_shadow_stack(struct ksignal *ksig) +{ + void __user *restorer = ksig->ka.sa.sa_restorer; + unsigned long ssp; + int err; - /* Restore address aligned? */ - if (!IS_ALIGNED(token, 8)) + if (!cpu_feature_enabled(X86_FEATURE_SHSTK) || + !feature_enabled(CET_SHSTK)) + return 0; + + if (!restorer) return -EINVAL; - /* Token placed properly? */ - if (((ALIGN_DOWN(token, 8) - 8) != token_addr) || token >= TASK_SIZE_MAX) + ssp = get_user_shstk_addr(); + if (unlikely(!ssp)) + return -EINVAL; + + err = shstk_push_sigframe(&ssp); + if (unlikely(err)) + return err; + + /* Push restorer address */ + ssp -= SS_FRAME_SIZE; + err = write_user_shstk_64((u64 __user *)ssp, (u64)restorer); + if (unlikely(err)) + return -EFAULT; + + fpu_lock_and_load(); + wrmsrl(MSR_IA32_PL3_SSP, ssp); + fpregs_unlock(); + + return 0; +} + +int restore_signal_shadow_stack(void) +{ + unsigned long ssp; + int err; + + if (!cpu_feature_enabled(X86_FEATURE_SHSTK) || + !feature_enabled(CET_SHSTK)) + return 0; + + ssp = get_user_shstk_addr(); + if (unlikely(!ssp)) return -EINVAL; - *new_ssp = token; + err = shstk_pop_sigframe(&ssp); + if (unlikely(err)) + return err; + + fpu_lock_and_load(); + wrmsrl(MSR_IA32_PL3_SSP, ssp); + fpregs_unlock(); return 0; } diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c index 9c7265b524c7..d2081305f698 100644 --- a/arch/x86/kernel/signal.c +++ b/arch/x86/kernel/signal.c @@ -47,6 +47,7 @@ #include #include #include +#include #ifdef CONFIG_X86_64 /* @@ -472,6 +473,9 @@ static int __setup_rt_frame(int sig, struct ksignal *ksig, frame = get_sigframe(&ksig->ka, regs, sizeof(struct rt_sigframe), &fp); uc_flags = frame_uc_flags(regs); + if (setup_signal_shadow_stack(ksig)) + 
return -EFAULT; + if (!user_access_begin(frame, sizeof(*frame))) return -EFAULT; @@ -675,6 +679,9 @@ SYSCALL_DEFINE0(rt_sigreturn) if (!restore_sigcontext(regs, &frame->uc.uc_mcontext, uc_flags)) goto badframe; + if (restore_signal_shadow_stack()) + goto badframe; + if (restore_altstack(&frame->uc.uc_stack)) goto badframe; @@ -992,6 +999,9 @@ COMPAT_SYSCALL_DEFINE0(x32_rt_sigreturn) if (!restore_sigcontext(regs, &frame->uc.uc_mcontext, uc_flags)) goto badframe; + if (restore_signal_shadow_stack()) + goto badframe; + if (compat_restore_altstack(&frame->uc.uc_stack)) goto badframe;

From patchwork Thu Sep 29 22:29:25 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Rick Edgecombe
X-Patchwork-Id: 12994682
From: Rick Edgecombe
To: x86@kernel.org, "H . Peter Anvin", Thomas Gleixner, Ingo Molnar, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski, Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov, Florian Weimer, "H . J . Lu", Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap, "Ravi V . Shankar", Weijiang Yang, "Kirill A . Shutemov", joao.moreira@intel.com, John Allen, kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com
Cc: rick.p.edgecombe@intel.com
Subject: [PATCH v2 28/39] x86/cet/shstk: Introduce map_shadow_stack syscall
Date: Thu, 29 Sep 2022 15:29:25 -0700
Message-Id: <20220929222936.14584-29-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20220929222936.14584-1-rick.p.edgecombe@intel.com>
References: <20220929222936.14584-1-rick.p.edgecombe@intel.com>

When operating with shadow stacks enabled, the kernel will automatically allocate shadow stacks for new threads. However, in some cases userspace will need additional shadow stacks. The main example of this is the ucontext family of functions, which require userspace to allocate and pivot to userspace-managed stacks.

Unlike most other user memory permissions, shadow stacks need to be provisioned with special data in order to be useful. They need to be set up with a restore token so that userspace can pivot to them via the RSTORSSP instruction. But the security design of shadow stacks is that they should not be written to except in limited circumstances. This presents a problem for userspace: how can it provision this special data without allowing the shadow stack to be generally writable?

Previously, a new PROT_SHADOW_STACK was attempted, which could be mprotect()ed from RW permissions after the data was provisioned. This was found to not be secure enough, as other threads could write to the shadow stack during the writable window.

The kernel can use a special instruction, WRUSS, to write directly to userspace shadow stacks. So the solution can be that memory is mapped with shadow stack permissions from the beginning (never generally writable in userspace), and the kernel itself writes the restore token.

First, a new madvise() flag was explored, which could operate on the PROT_SHADOW_STACK memory. This had a few downsides:
1. Extra checks were needed in mprotect() to prevent writable memory from ever becoming PROT_SHADOW_STACK.
2. Extra checks/vma state were needed in the new madvise() to prevent restore tokens being written into the middle of pre-used shadow stacks. It is ideal to prevent restore tokens being added at arbitrary locations, so the check was to make sure the shadow stack had never been written to.
3. It stood out from the rest of the madvise flags, as more of a direct action than a hint at future desired behavior.

So rather than repurpose two existing syscalls (mmap, madvise) that don't quite fit, just implement a new map_shadow_stack syscall to allow userspace to map and set up new shadow stacks in one step. While ucontext is the primary motivator, userspace may have other unforeseen reasons to set up its own shadow stacks using the WRSS instruction. Towards this, provide a flag so that stacks can optionally be set up securely for the common case of ucontext without enabling WRSS. Or potentially have the kernel set up the shadow stack in some new way.

The following example demonstrates how to create a new shadow stack with map_shadow_stack:

	void *shstk = map_shadow_stack(addr, stack_size, SHADOW_STACK_SET_TOKEN);

Signed-off-by: Rick Edgecombe
---
v2:
- Change syscall to take address like mmap() for CRIU's usage

v1:
- New patch (replaces PROT_SHADOW_STACK).
arch/x86/entry/syscalls/syscall_64.tbl | 1 + arch/x86/include/uapi/asm/mman.h | 2 ++ arch/x86/kernel/shstk.c | 48 +++++++++++++++++++++----- include/linux/syscalls.h | 1 + include/uapi/asm-generic/unistd.h | 2 +- kernel/sys_ni.c | 1 + 6 files changed, 46 insertions(+), 9 deletions(-) diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index c84d12608cd2..d9639e3e0a33 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -372,6 +372,7 @@ 448 common process_mrelease sys_process_mrelease 449 common futex_waitv sys_futex_waitv 450 common set_mempolicy_home_node sys_set_mempolicy_home_node +451 common map_shadow_stack sys_map_shadow_stack # # Due to a historical design error, certain syscalls are numbered differently diff --git a/arch/x86/include/uapi/asm/mman.h b/arch/x86/include/uapi/asm/mman.h index 775dbd3aff73..c9fc57c88fcc 100644 --- a/arch/x86/include/uapi/asm/mman.h +++ b/arch/x86/include/uapi/asm/mman.h @@ -12,6 +12,8 @@ ((key) & 0x8 ? 
VM_PKEY_BIT3 : 0)) #endif +#define SHADOW_STACK_SET_TOKEN 0x1 /* Set up a restore token in the shadow stack */ + #include #endif /* _ASM_X86_MMAN_H */ diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index 04442134aadd..873830d63adc 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -17,6 +17,7 @@ #include #include #include +#include #include #include #include @@ -62,24 +63,34 @@ static int create_rstor_token(unsigned long ssp, unsigned long *token_addr) if (write_user_shstk_64((u64 __user *)addr, (u64)ssp)) return -EFAULT; - *token_addr = addr; + if (token_addr) + *token_addr = addr; return 0; } -static unsigned long alloc_shstk(unsigned long size) +static unsigned long alloc_shstk(unsigned long addr, unsigned long size, + unsigned long token_offset, bool set_res_tok) { int flags = MAP_ANONYMOUS | MAP_PRIVATE; struct mm_struct *mm = current->mm; - unsigned long addr, unused; + unsigned long mapped_addr, unused; mmap_write_lock(mm); - addr = do_mmap(NULL, addr, size, PROT_READ, flags, - VM_SHADOW_STACK | VM_WRITE, 0, &unused, NULL); - + mapped_addr = do_mmap(NULL, addr, size, PROT_READ, flags, + VM_SHADOW_STACK | VM_WRITE, 0, &unused, NULL); mmap_write_unlock(mm); - return addr; + if (!set_res_tok || IS_ERR_VALUE(mapped_addr)) + goto out; + + if (create_rstor_token(mapped_addr + token_offset, NULL)) { + vm_munmap(mapped_addr, size); + return -EINVAL; + } + +out: + return mapped_addr; } static void unmap_shadow_stack(u64 base, u64 size) @@ -122,7 +133,7 @@ int shstk_setup(void) return -EOPNOTSUPP; size = PAGE_ALIGN(min_t(unsigned long long, rlimit(RLIMIT_STACK), SZ_4G)); - addr = alloc_shstk(size); + addr = alloc_shstk(0, size, size, false); if (IS_ERR_VALUE(addr)) return PTR_ERR((void *)addr); @@ -174,6 +185,7 @@ int shstk_alloc_thread_stack(struct task_struct *tsk, unsigned long clone_flags, stack_size = PAGE_ALIGN(stack_size); + addr = alloc_shstk(0, stack_size, 0, false); if (IS_ERR_VALUE(addr)) return PTR_ERR((void *)addr); @@ -395,6
+407,26 @@ int shstk_disable(void) return 0; } + +SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, unsigned int, flags) +{ + unsigned long aligned_size; + + if (!cpu_feature_enabled(X86_FEATURE_SHSTK)) + return -ENOSYS; + + /* + * An overflow would result in attempting to write the restore token + * to the wrong location. Not catastrophic, but just return the right + * error code and block it. + */ + aligned_size = PAGE_ALIGN(size); + if (aligned_size < size) + return -EOVERFLOW; + + return alloc_shstk(addr, aligned_size, size, flags & SHADOW_STACK_SET_TOKEN); +} + long cet_prctl(struct task_struct *task, int option, unsigned long features) { if (option == ARCH_CET_LOCK) { diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index a34b0f9a9972..3ae05cbdea5b 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -1056,6 +1056,7 @@ asmlinkage long sys_memfd_secret(unsigned int flags); asmlinkage long sys_set_mempolicy_home_node(unsigned long start, unsigned long len, unsigned long home_node, unsigned long flags); +asmlinkage long sys_map_shadow_stack(unsigned long addr, unsigned long size, unsigned int flags); /* * Architecture-specific system calls diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index 45fa180cc56a..b12940ec5926 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -887,7 +887,7 @@ __SYSCALL(__NR_futex_waitv, sys_futex_waitv) __SYSCALL(__NR_set_mempolicy_home_node, sys_set_mempolicy_home_node) #undef __NR_syscalls -#define __NR_syscalls 451 +#define __NR_syscalls 452 /* * 32 bit systems traditionally used different diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index 860b2dcf3ac4..cb9aebd34646 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -381,6 +381,7 @@ COND_SYSCALL(vm86old); COND_SYSCALL(modify_ldt); COND_SYSCALL(vm86); COND_SYSCALL(kexec_file_load); +COND_SYSCALL(map_shadow_stack); /* s390 */ 
COND_SYSCALL(s390_pci_mmio_read);

From patchwork Thu Sep 29 22:29:26 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Rick Edgecombe
X-Patchwork-Id: 12994683
From: Rick Edgecombe
To: x86@kernel.org, "H . Peter Anvin", Thomas Gleixner, Ingo Molnar,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, linux-arch@vger.kernel.org,
 linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski,
 Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
 Eugene Syromiatnikov, Florian Weimer, "H . J . Lu", Jann Horn,
 Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov,
 Pavel Machek, Peter Zijlstra, Randy Dunlap, "Ravi V . Shankar",
 Weijiang Yang, "Kirill A . Shutemov", joao.moreira@intel.com,
 John Allen, kcc@google.com, eranian@google.com, rppt@kernel.org,
 jamorris@linux.microsoft.com, dethoma@microsoft.com
Cc: rick.p.edgecombe@intel.com
Subject: [PATCH v2 29/39] x86/cet/shstk: Support wrss for userspace
Date: Thu, 29 Sep 2022 15:29:26 -0700
Message-Id: <20220929222936.14584-30-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20220929222936.14584-1-rick.p.edgecombe@intel.com>
References: <20220929222936.14584-1-rick.p.edgecombe@intel.com>

For the current shadow stack implementation, shadow stack contents cannot
easily be provisioned with arbitrary data. This property helps apps
protect themselves better, but also restricts any potential apps that may
want to do exotic things at the expense of a little security.

The x86 shadow stack feature introduces a new instruction, wrss, which
can be enabled to write directly to shadow stack permissioned memory from
userspace. Allow it to get enabled via the prctl interface.

Only enable the userspace wrss instruction, which allows writes to
userspace shadow stacks from userspace. Do not allow it to be enabled
independently of shadow stack, as HW does not support using WRSS when
shadow stack is disabled.

From a fault handler perspective, WRSS will behave very similarly to
WRUSS, which is treated like a user access from a #PF err code
perspective.

Signed-off-by: Rick Edgecombe
---

v2:
 - Add some commit log verbiage from (Dave Hansen)

v1:
 - New patch.
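For illustration, the enable/disable rules described above can be modeled in
plain userspace C. This is only a sketch under stated assumptions: struct
thread_model and the model_* functions are hypothetical stand-ins for the
per-thread feature bits and the kernel functions in this patch, and the CPU
feature check, FPU locking and MSR writes are deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define CET_SHSTK 0x1
#define CET_WRSS  0x2

/* Hypothetical stand-in for the per-thread feature bits. */
struct thread_model {
	unsigned long features;
};

static bool model_feature_enabled(const struct thread_model *t, unsigned long f)
{
	return (t->features & f) != 0;
}

/* Mirrors the ordering of checks in wrss_control(), minus hardware access. */
int model_wrss_control(struct thread_model *t, bool enable)
{
	assert(t != NULL);

	/* WRSS cannot be toggled unless shadow stack is enabled. */
	if (!model_feature_enabled(t, CET_SHSTK))
		return -EPERM;

	/* Already in the requested state. */
	if (model_feature_enabled(t, CET_WRSS) == enable)
		return 0;

	if (enable)
		t->features |= CET_WRSS;
	else
		t->features &= ~(unsigned long)CET_WRSS;

	return 0;
}

/* Disabling shadow stack also clears WRSS, as in the shstk_disable() hunk. */
int model_shstk_disable(struct thread_model *t)
{
	assert(t != NULL);
	t->features &= ~(unsigned long)(CET_SHSTK | CET_WRSS);
	return 0;
}
```

In this model, a thread with no features set gets -EPERM when asking for
WRSS, and a thread with both bits set ends up with neither after
model_shstk_disable().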
 arch/x86/include/asm/cet.h        |  2 ++
 arch/x86/include/uapi/asm/prctl.h |  1 +
 arch/x86/kernel/shstk.c           | 34 +++++++++++++++++++++++++++++--
 3 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h
index 8c6fab9f402a..edf681d4843a 100644
--- a/arch/x86/include/asm/cet.h
+++ b/arch/x86/include/asm/cet.h
@@ -25,6 +25,7 @@ int shstk_disable(void);
 void reset_thread_shstk(void);
 int setup_signal_shadow_stack(struct ksignal *ksig);
 int restore_signal_shadow_stack(void);
+int wrss_control(bool enable);
 #else
 static inline long cet_prctl(struct task_struct *task, int option,
 		      unsigned long features) { return -EINVAL; }
@@ -38,6 +39,7 @@ static inline int shstk_disable(void) { return -EOPNOTSUPP; }
 static inline void reset_thread_shstk(void) {}
 static inline int setup_signal_shadow_stack(struct ksignal *ksig) { return 0; }
 static inline int restore_signal_shadow_stack(void) { return 0; }
+static inline int wrss_control(bool enable) { return -EOPNOTSUPP; }
 #endif /* CONFIG_X86_SHADOW_STACK */
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h
index 41af3a8c4fa4..d811f0c5fc4f 100644
--- a/arch/x86/include/uapi/asm/prctl.h
+++ b/arch/x86/include/uapi/asm/prctl.h
@@ -27,5 +27,6 @@
 #define ARCH_CET_LOCK	0x4003
 
 #define CET_SHSTK	0x1
+#define CET_WRSS	0x2
 
 #endif /* _ASM_X86_PRCTL_H */
diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
index 873830d63adc..fc64a04366aa 100644
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c
@@ -386,6 +386,36 @@ void shstk_free(struct task_struct *tsk)
 	unmap_shadow_stack(shstk->base, shstk->size);
 }
 
+int wrss_control(bool enable)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+		return -EOPNOTSUPP;
+
+	/*
+	 * Only enable wrss if shadow stack is enabled. If shadow stack is not
+	 * enabled, wrss will already be disabled, so don't bother clearing it
+	 * when disabling.
+	 */
+	if (!feature_enabled(CET_SHSTK))
+		return -EPERM;
+
+	/* Already enabled/disabled? */
+	if (feature_enabled(CET_WRSS) == enable)
+		return 0;
+
+	fpu_lock_and_load();
+	if (enable) {
+		set_clr_bits_msrl(MSR_IA32_U_CET, CET_WRSS_EN, 0);
+		feature_set(CET_WRSS);
+	} else {
+		set_clr_bits_msrl(MSR_IA32_U_CET, 0, CET_WRSS_EN);
+		feature_clr(CET_WRSS);
+	}
+	fpregs_unlock();
+
+	return 0;
+}
+
 int shstk_disable(void)
 {
 	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
@@ -397,12 +427,12 @@ int shstk_disable(void)
 
 	fpu_lock_and_load();
 	/* Disable WRSS too when disabling shadow stack */
-	set_clr_bits_msrl(MSR_IA32_U_CET, 0, CET_SHSTK_EN);
+	set_clr_bits_msrl(MSR_IA32_U_CET, 0, CET_SHSTK_EN | CET_WRSS_EN);
 	wrmsrl(MSR_IA32_PL3_SSP, 0);
 	fpregs_unlock();
 
 	shstk_free(current);
-	feature_clr(CET_SHSTK);
+	feature_clr(CET_SHSTK | CET_WRSS);
 
 	return 0;
 }

From patchwork Thu Sep 29 22:29:27 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Rick Edgecombe
X-Patchwork-Id: 12994684
From: Rick Edgecombe
To: x86@kernel.org, "H . Peter Anvin", Thomas Gleixner, Ingo Molnar,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, linux-arch@vger.kernel.org,
 linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski,
 Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
 Eugene Syromiatnikov, Florian Weimer, "H . J . Lu", Jann Horn,
 Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov,
 Pavel Machek, Peter Zijlstra, Randy Dunlap, "Ravi V . Shankar",
 Weijiang Yang, "Kirill A . Shutemov", joao.moreira@intel.com,
 John Allen, kcc@google.com, eranian@google.com, rppt@kernel.org,
 jamorris@linux.microsoft.com, dethoma@microsoft.com
Cc: rick.p.edgecombe@intel.com
Subject: [PATCH v2 30/39] x86: Expose thread features status in /proc/$PID/arch_status
Date: Thu, 29 Sep 2022 15:29:27 -0700
Message-Id: <20220929222936.14584-31-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20220929222936.14584-1-rick.p.edgecombe@intel.com>
References: <20220929222936.14584-1-rick.p.edgecombe@intel.com>

From: "Kirill A. Shutemov"

Applications and loaders can have logic to decide whether to enable CET.
They usually don't report whether CET has been enabled or not, so there
is no way to verify whether an application actually is protected by CET
features.

Add two lines in /proc/$PID/arch_status to report enabled and locked
features.

Signed-off-by: Kirill A. Shutemov
[Switched to CET, added to commit log]
Signed-off-by: Rick Edgecombe
---

v2:
 - New patch

 arch/x86/kernel/Makefile     |  2 ++
 arch/x86/kernel/fpu/xstate.c | 47 ---------------------------
 arch/x86/kernel/proc.c       | 63 ++++++++++++++++++++++++++++++++++++
 3 files changed, 65 insertions(+), 47 deletions(-)
 create mode 100644 arch/x86/kernel/proc.c

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 8950d1f71226..b87b8a0a3749 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -141,6 +141,8 @@ obj-$(CONFIG_AMD_MEM_ENCRYPT)		+= sev.o
 
 obj-$(CONFIG_X86_SHADOW_STACK)		+= shstk.o
 
+obj-$(CONFIG_PROC_FS)			+= proc.o
+
 ###
 # 64 bit specific files
 ifeq ($(CONFIG_X86_64),y)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 5e6a4867fd05..9258fc1169cc 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -10,8 +10,6 @@
 #include
 #include
 #include
-#include
-#include
 #include
 
 #include
@@ -1746,48 +1744,3 @@ long fpu_xstate_prctl(int option, unsigned long arg2)
 		return -EINVAL;
 	}
 }
-
-#ifdef CONFIG_PROC_PID_ARCH_STATUS
-/*
- * Report the amount of time elapsed in millisecond since last AVX512
- * use in the task.
- */
-static void avx512_status(struct seq_file *m, struct task_struct *task)
-{
-	unsigned long timestamp = READ_ONCE(task->thread.fpu.avx512_timestamp);
-	long delta;
-
-	if (!timestamp) {
-		/*
-		 * Report -1 if no AVX512 usage
-		 */
-		delta = -1;
-	} else {
-		delta = (long)(jiffies - timestamp);
-		/*
-		 * Cap to LONG_MAX if time difference > LONG_MAX
-		 */
-		if (delta < 0)
-			delta = LONG_MAX;
-		delta = jiffies_to_msecs(delta);
-	}
-
-	seq_put_decimal_ll(m, "AVX512_elapsed_ms:\t", delta);
-	seq_putc(m, '\n');
-}
-
-/*
- * Report architecture specific information
- */
-int proc_pid_arch_status(struct seq_file *m, struct pid_namespace *ns,
-			struct pid *pid, struct task_struct *task)
-{
-	/*
-	 * Report AVX512 state if the processor and build option supported.
-	 */
-	if (cpu_feature_enabled(X86_FEATURE_AVX512F))
-		avx512_status(m, task);
-
-	return 0;
-}
-#endif /* CONFIG_PROC_PID_ARCH_STATUS */
diff --git a/arch/x86/kernel/proc.c b/arch/x86/kernel/proc.c
new file mode 100644
index 000000000000..fa9cbe13c298
--- /dev/null
+++ b/arch/x86/kernel/proc.c
@@ -0,0 +1,63 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include
+#include
+#include
+
+/*
+ * Report the amount of time elapsed in millisecond since last AVX512
+ * use in the task.
+ */
+static void avx512_status(struct seq_file *m, struct task_struct *task)
+{
+	unsigned long timestamp = READ_ONCE(task->thread.fpu.avx512_timestamp);
+	long delta;
+
+	if (!timestamp) {
+		/*
+		 * Report -1 if no AVX512 usage
+		 */
+		delta = -1;
+	} else {
+		delta = (long)(jiffies - timestamp);
+		/*
+		 * Cap to LONG_MAX if time difference > LONG_MAX
+		 */
+		if (delta < 0)
+			delta = LONG_MAX;
+		delta = jiffies_to_msecs(delta);
+	}
+
+	seq_put_decimal_ll(m, "AVX512_elapsed_ms:\t", delta);
+	seq_putc(m, '\n');
+}
+
+static void dump_features(struct seq_file *m, unsigned long features)
+{
+	if (features & CET_SHSTK)
+		seq_puts(m, "shstk ");
+	if (features & CET_WRSS)
+		seq_puts(m, "wrss ");
+}
+
+/*
+ * Report architecture specific information
+ */
+int proc_pid_arch_status(struct seq_file *m, struct pid_namespace *ns,
+			struct pid *pid, struct task_struct *task)
+{
+	/*
+	 * Report AVX512 state if the processor and build option supported.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_AVX512F))
+		avx512_status(m, task);
+
+	seq_puts(m, "Thread_features:\t");
+	dump_features(m, task->thread.features);
+	seq_putc(m, '\n');
+
+	seq_puts(m, "Thread_features_locked:\t");
+	dump_features(m, task->thread.features_locked);
+	seq_putc(m, '\n');
+
+	return 0;
+}

From patchwork Thu Sep 29 22:29:28 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Rick Edgecombe
X-Patchwork-Id: 12994685
From: Rick Edgecombe
To: x86@kernel.org, "H . Peter Anvin", Thomas Gleixner, Ingo Molnar,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, linux-arch@vger.kernel.org,
 linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski,
 Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
 Eugene Syromiatnikov, Florian Weimer, "H . J . Lu", Jann Horn,
 Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov,
 Pavel Machek, Peter Zijlstra, Randy Dunlap, "Ravi V . Shankar",
 Weijiang Yang, "Kirill A . Shutemov", joao.moreira@intel.com,
 John Allen, kcc@google.com, eranian@google.com, rppt@kernel.org,
 jamorris@linux.microsoft.com, dethoma@microsoft.com
Cc: rick.p.edgecombe@intel.com
Subject: [PATCH v2 31/39] x86/cet/shstk: Wire in CET interface
Date: Thu, 29 Sep 2022 15:29:28 -0700
Message-Id: <20220929222936.14584-32-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20220929222936.14584-1-rick.p.edgecombe@intel.com>
References: <20220929222936.14584-1-rick.p.edgecombe@intel.com>

The kernel now has the main CET functionality to support applications.
Wire the WRSS and shadow stack enable/disable functions into the existing
CET API skeleton.

Signed-off-by: Rick Edgecombe
Reviewed-by: Kees Cook
---

v2:
 - Split from other patches

 arch/x86/kernel/shstk.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
index fc64a04366aa..0efec02dbe6b 100644
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c
@@ -477,9 +477,17 @@ long cet_prctl(struct task_struct *task, int option, unsigned long features)
 		return -EINVAL;
 
 	if (option == ARCH_CET_DISABLE) {
+		if (features & CET_WRSS)
+			return wrss_control(false);
+		if (features & CET_SHSTK)
+			return shstk_disable();
 		return -EINVAL;
 	}
 
 	/* Handle ARCH_CET_ENABLE */
+	if (features & CET_SHSTK)
+		return shstk_setup();
+	if (features & CET_WRSS)
+		return wrss_control(true);
 	return -EINVAL;
 }

From patchwork Thu Sep 29 22:29:29 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Rick Edgecombe
X-Patchwork-Id: 12994686
From: Rick Edgecombe
To: x86@kernel.org, "H . Peter Anvin", Thomas Gleixner, Ingo Molnar,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, linux-arch@vger.kernel.org,
 linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski,
 Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
 Eugene Syromiatnikov, Florian Weimer, "H . J . Lu", Jann Horn,
 Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov,
 Pavel Machek, Peter Zijlstra, Randy Dunlap, "Ravi V . Shankar",
 Weijiang Yang, "Kirill A . Shutemov", joao.moreira@intel.com,
 John Allen, kcc@google.com, eranian@google.com, rppt@kernel.org,
 jamorris@linux.microsoft.com, dethoma@microsoft.com
Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu
Subject: [PATCH v2 32/39] selftests/x86: Add shadow stack test
Date: Thu, 29 Sep 2022 15:29:29 -0700
Message-Id: <20220929222936.14584-33-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20220929222936.14584-1-rick.p.edgecombe@intel.com>
References: <20220929222936.14584-1-rick.p.edgecombe@intel.com>

Add a simple selftest for exercising some shadow stack behavior:
 - map_shadow_stack syscall and pivot
 - Faulting in shadow stack memory
 - Handling shadow stack violations
 - GUP of shadow stack memory
 - mprotect() of shadow stack memory
 - Userfaultfd on shadow stack memory

Since this test exercises a recently added syscall manually, it needs to
find the automatically created __NR_foo defines. Per the selftest
documentation, KHDR_INCLUDES can be used to help the selftest Makefiles
find the headers from the kernel source. This way the new selftest can be
built inside the kernel source tree without installing the headers to the
system. So also add KHDR_INCLUDES as described in the selftest docs, to
facilitate this.

Co-developed-by: Yu-cheng Yu
Signed-off-by: Yu-cheng Yu
Signed-off-by: Rick Edgecombe
---

v2:
 - Change print statements to align more closely with other selftests
 - Add more tests
 - Add KHDR_INCLUDES to Makefile

v1:
 - New patch.
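As a rough illustration of the header dependency described above, a test
could check at build time whether the headers on its include path provide
the generated syscall number, and report a skip otherwise. This is a sketch
only: have_map_shadow_stack_nr() and format_status() are hypothetical
helpers that are not part of this selftest, and the "[TAG]\t" prefix simply
follows the print statements the test uses.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/syscall.h>

/*
 * Returns 1 when the headers found at build time define the syscall number
 * (for example via $(KHDR_INCLUDES)), and 0 when they do not.
 */
int have_map_shadow_stack_nr(void)
{
#ifdef __NR_map_shadow_stack
	return 1;
#else
	return 0;
#endif
}

/*
 * Formats a selftest-style status line such as "[SKIP]\t...\n" into buf.
 * Returns the value of snprintf(), i.e. the length of the full line.
 */
int format_status(char *buf, size_t len, const char *tag, const char *msg)
{
	assert(buf != NULL && tag != NULL && msg != NULL);
	return snprintf(buf, len, "[%s]\t%s\n", tag, msg);
}
```

A caller could print a SKIP line built with format_status() and return early
when have_map_shadow_stack_nr() reports 0, instead of failing to compile
against older installed headers.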
 tools/testing/selftests/x86/Makefile          |   4 +-
 .../testing/selftests/x86/test_shadow_stack.c | 571 ++++++++++++++++++
 2 files changed, 573 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/x86/test_shadow_stack.c

diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index 0388c4d60af0..cfc8a26ad151 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -18,7 +18,7 @@ TARGETS_C_32BIT_ONLY := entry_from_vm86 test_syscall_vdso unwind_vdso \
 			test_FCMOV test_FCOMI test_FISTTP \
 			vdso_restorer
 TARGETS_C_64BIT_ONLY := fsgsbase sysret_rip syscall_numbering \
-			corrupt_xstate_header amx
+			corrupt_xstate_header amx test_shadow_stack
 # Some selftests require 32bit support enabled also on 64bit systems
 TARGETS_C_32BIT_NEEDED := ldt_gdt ptrace_syscall
 
@@ -34,7 +34,7 @@ BINARIES_64 := $(TARGETS_C_64BIT_ALL:%=%_64)
 BINARIES_32 := $(patsubst %,$(OUTPUT)/%,$(BINARIES_32))
 BINARIES_64 := $(patsubst %,$(OUTPUT)/%,$(BINARIES_64))
 
-CFLAGS := -O2 -g -std=gnu99 -pthread -Wall
+CFLAGS := -O2 -g -std=gnu99 -pthread -Wall $(KHDR_INCLUDES)
 
 # call32_from_64 in thunks.S uses absolute addresses.
 ifeq ($(CAN_BUILD_WITH_NOPIE),1)
diff --git a/tools/testing/selftests/x86/test_shadow_stack.c b/tools/testing/selftests/x86/test_shadow_stack.c
new file mode 100644
index 000000000000..249397736d0d
--- /dev/null
+++ b/tools/testing/selftests/x86/test_shadow_stack.c
@@ -0,0 +1,571 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * This program tests basic kernel shadow stack support. It enables shadow
+ * stack manually via arch_prctl(), instead of relying on glibc. Its
+ * Makefile doesn't compile with shadow stack support, so it doesn't rely on
+ * any particular glibc. As a result it can't do any operations that require
+ * special glibc shadow stack support (longjmp(), swapcontext(), etc). Just
+ * stick to the basics and hope the compiler doesn't do anything strange.
+ */
+
+#define _GNU_SOURCE
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#define SS_SIZE 0x200000
+
+#if (__GNUC__ < 8) || (__GNUC__ == 8 && __GNUC_MINOR__ < 5)
+int main(int argc, char *argv[])
+{
+	printf("[SKIP]\tCompiler does not support CET.\n");
+	return 0;
+}
+#else
+void write_shstk(unsigned long *addr, unsigned long val)
+{
+	asm volatile("wrssq %[val], (%[addr])\n"
+		     : "+m" (addr)
+		     : [addr] "r" (addr), [val] "r" (val));
+}
+
+static inline unsigned long __attribute__((always_inline)) get_ssp(void)
+{
+	unsigned long ret = 0;
+
+	asm volatile("xor %0, %0; rdsspq %0" : "=r" (ret));
+	return ret;
+}
+
+/*
+ * For use in inline enablement of shadow stack.
+ *
+ * The program can't return from the point where shadow stack gets enabled
+ * because there will be no address on the shadow stack. So it can't use
+ * syscall() for enablement, since it is a function.
+ *
+ * Based on code from nolibc.h. Keep a copy here because this can't pull in all
+ * of nolibc.h.
+ */
+#define ARCH_PRCTL(arg1, arg2)					\
+({								\
+	long _ret;						\
+	register long _num asm("eax") = __NR_arch_prctl;	\
+	register long _arg1 asm("rdi") = (long)(arg1);		\
+	register long _arg2 asm("rsi") = (long)(arg2);		\
+								\
+	asm volatile (						\
+		"syscall\n"					\
+		: "=a"(_ret)					\
+		: "r"(_arg1), "r"(_arg2),			\
+		  "0"(_num)					\
+		: "rcx", "r11", "memory", "cc"			\
+	);							\
+	_ret;							\
+})
+
+void *create_shstk(void *addr)
+{
+	return (void *)syscall(__NR_map_shadow_stack, addr, SS_SIZE, SHADOW_STACK_SET_TOKEN);
+}
+
+void *create_normal_mem(void *addr)
+{
+	return mmap(addr, SS_SIZE, PROT_READ | PROT_WRITE,
+		    MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
+}
+
+void free_shstk(void *shstk)
+{
+	munmap(shstk, SS_SIZE);
+}
+
+int reset_shstk(void *shstk)
+{
+	return madvise(shstk, SS_SIZE, MADV_DONTNEED);
+}
+
+void try_shstk(unsigned long new_ssp)
+{
+	unsigned long ssp;
+
+	printf("[INFO]\tnew_ssp = %lx, *new_ssp = %lx\n",
+	       new_ssp, *((unsigned long *)new_ssp));
+
+	ssp = get_ssp();
+	printf("[INFO]\tchanging ssp from %lx to %lx\n", ssp, new_ssp);
+
+	asm volatile("rstorssp (%0)\n":: "r" (new_ssp));
+	asm volatile("saveprevssp");
+	printf("[INFO]\tssp is now %lx\n", get_ssp());
+
+	/* Switch back to original shadow stack */
+	ssp -= 8;
+	asm volatile("rstorssp (%0)\n":: "r" (ssp));
+	asm volatile("saveprevssp");
+}
+
+int test_shstk_pivot(void)
+{
+	void *shstk = create_shstk(0);
+
+	if (shstk == MAP_FAILED) {
+		printf("[FAIL]\tError creating shadow stack: %d\n", errno);
+		return 1;
+	}
+	try_shstk((unsigned long)shstk + SS_SIZE - 8);
+	free_shstk(shstk);
+
+	printf("[OK]\tShadow stack pivot\n");
+	return 0;
+}
+
+int test_shstk_faults(void)
+{
+	unsigned long *shstk = create_shstk(0);
+
+	/* Read shadow stack, test if it's zero to not get read optimized out */
+	if (*shstk != 0)
+		goto err;
+
+	/* Wrss memory that was already read. */
+	write_shstk(shstk, 1);
+	if (*shstk != 1)
+		goto err;
+
+	/* Page out memory, so we can wrss it again. */
+	if (reset_shstk((void *)shstk))
+		goto err;
+
+	write_shstk(shstk, 1);
+	if (*shstk != 1)
+		goto err;
+
+	printf("[OK]\tShadow stack faults\n");
+	return 0;
+
+err:
+	return 1;
+}
+
+unsigned long saved_ssp;
+unsigned long saved_ssp_val;
+volatile bool segv_triggered;
+
+void __attribute__((noinline)) violate_ss(void)
+{
+	saved_ssp = get_ssp();
+	saved_ssp_val = *(unsigned long *)saved_ssp;
+
+	/* Corrupt shadow stack */
+	printf("[INFO]\tCorrupting shadow stack\n");
+	write_shstk((void *)saved_ssp, 0);
+}
+
+void segv_handler(int signum, siginfo_t *si, void *uc)
+{
+	printf("[INFO]\tGenerated shadow stack violation successfully\n");
+
+	segv_triggered = true;
+
+	/* Fix shadow stack */
+	write_shstk((void *)saved_ssp, saved_ssp_val);
+}
+
+int test_shstk_violation(void)
+{
+	struct sigaction sa;
+
+	sa.sa_sigaction = segv_handler;
+	if (sigaction(SIGSEGV, &sa, NULL))
+		return 1;
+	sa.sa_flags = SA_SIGINFO;
+
+	segv_triggered = false;
+
+	/* Make sure segv_triggered is set before violate_ss() */
+	asm volatile("" : : : "memory");
+
+	violate_ss();
+
+	signal(SIGSEGV, SIG_DFL);
+
+	printf("[OK]\tShadow stack violation test\n");
+
+	return !segv_triggered;
+}
+
+/* Gup test state */
+#define MAGIC_VAL 0x12345678
+bool is_shstk_access;
+void *shstk_ptr;
+int fd;
+
+void reset_test_shstk(void *addr)
+{
+	if (shstk_ptr != NULL)
+		free_shstk(shstk_ptr);
+	shstk_ptr = create_shstk(addr);
+}
+
+void test_access_fix_handler(int signum, siginfo_t *si, void *uc)
+{
+	printf("[INFO]\tViolation from %s\n", is_shstk_access ? "shstk access" : "normal write");
+
+	segv_triggered = true;
+
+	/* Fix shadow stack */
+	if (is_shstk_access) {
+		reset_test_shstk(shstk_ptr);
+		return;
+	}
+
+	free_shstk(shstk_ptr);
+	create_normal_mem(shstk_ptr);
+}
+
+bool test_shstk_access(void *ptr)
+{
+	is_shstk_access = true;
+	segv_triggered = false;
+	write_shstk(ptr, MAGIC_VAL);
+
+	asm volatile("" : : : "memory");
+
+	return segv_triggered;
+}
+
+bool test_write_access(void *ptr)
+{
+	is_shstk_access = false;
+	segv_triggered = false;
+	*(unsigned long *)ptr = MAGIC_VAL;
+
+	asm volatile("" : : : "memory");
+
+	return segv_triggered;
+}
+
+bool gup_write(void *ptr)
+{
+	unsigned long val;
+
+	lseek(fd, (unsigned long)ptr, SEEK_SET);
+	if (write(fd, &val, sizeof(val)) < 0)
+		return 1;
+
+	return 0;
+}
+
+bool gup_read(void *ptr)
+{
+	unsigned long val;
+
+	lseek(fd, (unsigned long)ptr, SEEK_SET);
+	if (read(fd, &val, sizeof(val)) < 0)
+		return 1;
+
+	return 0;
+}
+
+int test_gup(void)
+{
+	struct sigaction sa;
+	int status;
+	pid_t pid;
+
+	sa.sa_sigaction = test_access_fix_handler;
+	if (sigaction(SIGSEGV, &sa, NULL))
+		return 1;
+	sa.sa_flags = SA_SIGINFO;
+
+	segv_triggered = false;
+
+	fd = open("/proc/self/mem", O_RDWR);
+	if (fd == -1)
+		return 1;
+
+	reset_test_shstk(0);
+	if (gup_read(shstk_ptr))
+		return 1;
+	if (test_shstk_access(shstk_ptr))
+		return 1;
+	printf("[INFO]\tGup read -> shstk access success\n");
+
+	reset_test_shstk(0);
+	if (gup_write(shstk_ptr))
+		return 1;
+	if (test_shstk_access(shstk_ptr))
+		return 1;
+	printf("[INFO]\tGup write -> shstk access success\n");
+
+	reset_test_shstk(0);
+	if (gup_read(shstk_ptr))
+		return 1;
+	if (!test_write_access(shstk_ptr))
+		return 1;
+	printf("[INFO]\tGup read -> write access success\n");
+
+	reset_test_shstk(0);
+	if (gup_write(shstk_ptr))
+		return 1;
+	if (!test_write_access(shstk_ptr))
+		return 1;
+	printf("[INFO]\tGup write -> write access success\n");
+
+	close(fd);
+
+	/* COW/gup test */
+	reset_test_shstk(0);
+	pid = fork();
+	if (!pid) {
+		fd = open("/proc/self/mem", O_RDWR);
+		if (fd == -1)
+			exit(1);
+
+		if (gup_write(shstk_ptr)) {
+			close(fd);
+			exit(1);
+		}
+		close(fd);
+		exit(0);
+	}
+	waitpid(pid, &status, 0);
+	if (WEXITSTATUS(status)) {
+		printf("[FAIL]\tWrite in child failed\n");
+		return 1;
+	}
+	if (*(unsigned long *)shstk_ptr == MAGIC_VAL) {
+		printf("[FAIL]\tWrite in child wrote through to shared memory\n");
+		return 1;
+	}
+
+	printf("[INFO]\tCow gup write -> write access success\n");
+
+	free_shstk(shstk_ptr);
+
+	signal(SIGSEGV, SIG_DFL);
+
+	printf("[OK]\tShadow gup test\n");
+
+	return 0;
+}
+
+int test_mprotect(void)
+{
+	struct sigaction sa;
+
+	sa.sa_sigaction = test_access_fix_handler;
+	if (sigaction(SIGSEGV, &sa, NULL))
+		return 1;
+	sa.sa_flags = SA_SIGINFO;
+
+	segv_triggered = false;
+
+	/* mprotect a shadow stack as read only */
+	reset_test_shstk(0);
+	if (mprotect(shstk_ptr, SS_SIZE, PROT_READ) < 0) {
+		printf("[FAIL]\tmprotect(PROT_READ) failed\n");
+		return 1;
+	}
+
+	/* try to wrss it and fail */
+	if (!test_shstk_access(shstk_ptr)) {
+		printf("[FAIL]\tShadow stack access to read-only memory succeeded\n");
+		return 1;
+	}
+
+	/* then back to writable */
+	if (mprotect(shstk_ptr, SS_SIZE, PROT_WRITE | PROT_READ) < 0) {
+		printf("[FAIL]\tmprotect(PROT_WRITE) failed\n");
+		return 1;
+	}
+
+	/* then wrss to it and succeed */
+	if (test_shstk_access(shstk_ptr)) {
+		printf("[FAIL]\tShadow stack access to mprotect() writable memory failed\n");
+		return 1;
+	}
+
+	free_shstk(shstk_ptr);
+
+	signal(SIGSEGV, SIG_DFL);
+
+	printf("[OK]\tmprotect() test\n");
+
+	return 0;
+}
+
+char zero[4096];
+
+static void *uffd_thread(void *arg)
+{
+	struct uffdio_copy req;
+	int uffd = *(int *)arg;
+	struct uffd_msg msg;
+
+	if (read(uffd, &msg, sizeof(msg)) <= 0)
+		return (void *)1;
+
+	req.dst = msg.arg.pagefault.address;
+	req.src = (__u64)zero;
+	req.len = 4096;
+	req.mode = 0;
+
+	if (ioctl(uffd, UFFDIO_COPY, &req))
+		return (void *)1;
+
+	return (void *)0;
+}
+
+int test_userfaultfd(void)
+{
+	struct uffdio_register uffdio_register;
+	struct uffdio_api uffdio_api;
+	struct sigaction sa;
+	pthread_t thread;
+	void *res;
+	int uffd;
+
+	sa.sa_sigaction = test_access_fix_handler;
+	if (sigaction(SIGSEGV, &sa, NULL))
+		return 1;
+	sa.sa_flags = SA_SIGINFO;
+
+	uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
+	if (uffd < 0) {
+		printf("[SKIP]\tUserfaultfd unavailable.\n");
+		return 0;
+	}
+
+	reset_test_shstk(0);
+
+	uffdio_api.api = UFFD_API;
+	uffdio_api.features = 0;
+	if (ioctl(uffd, UFFDIO_API, &uffdio_api))
+		goto err;
+
+	uffdio_register.range.start = (__u64)shstk_ptr;
+	uffdio_register.range.len = 4096;
+	uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING;
+	if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register))
+		goto err;
+
+	if (pthread_create(&thread, NULL, &uffd_thread, &uffd))
+		goto err;
+
+	test_shstk_access(shstk_ptr);
+
+	if (pthread_join(thread, &res))
+		goto err;
+
+	if (test_shstk_access(shstk_ptr))
+		goto err;
+
+	free_shstk(shstk_ptr);
+
+	signal(SIGSEGV, SIG_DFL);
+
+	printf("[OK]\tUserfaultfd test\n");
+	return !!res;
+err:
+	free_shstk(shstk_ptr);
+	close(uffd);
+	signal(SIGSEGV, SIG_DFL);
+	return 1;
+}
+
+int main(int argc, char *argv[])
+{
+	int ret = 0;
+
+	if (ARCH_PRCTL(ARCH_CET_ENABLE, CET_SHSTK)) {
+		printf("[SKIP]\tCould not enable Shadow stack\n");
+		return 1;
+	}
+
+	if (ARCH_PRCTL(ARCH_CET_DISABLE, CET_SHSTK)) {
+		ret = 1;
+		printf("[FAIL]\tDisabling shadow stack failed\n");
+	}
+
+	if (ARCH_PRCTL(ARCH_CET_ENABLE, CET_SHSTK)) {
+		printf("[SKIP]\tCould not re-enable Shadow stack\n");
+		return 1;
+	}
+
+	if (ARCH_PRCTL(ARCH_CET_ENABLE, CET_WRSS)) {
+		printf("[SKIP]\tCould not enable WRSS\n");
+		ret = 1;
+		goto out;
+	}
+
+	/* Should have succeeded if here, but this is a test, so double check. */
+	if (!get_ssp()) {
+		printf("[FAIL]\tShadow stack disabled\n");
+		return 1;
+	}
+
+	if (test_shstk_pivot()) {
+		ret = 1;
+		printf("[FAIL]\tShadow stack pivot\n");
+		goto out;
+	}
+
+	if (test_shstk_faults()) {
+		ret = 1;
+		printf("[FAIL]\tShadow stack fault test\n");
+		goto out;
+	}
+
+	if (test_shstk_violation()) {
+		ret = 1;
+		printf("[FAIL]\tShadow stack violation test\n");
+		goto out;
+	}
+
+	if (test_gup()) {
+		ret = 1;
+		printf("[FAIL]\tShadow shadow stack gup\n");
+	}
+
+	if (test_mprotect()) {
+		ret = 1;
+		printf("[FAIL]\tShadow shadow mprotect test\n");
+	}
+
+	if (test_userfaultfd()) {
+		ret = 1;
+		printf("[FAIL]\tUserfaultfd test\n");
+	}
+
+out:
+	/*
+	 * Disable shadow stack before the function returns, or there will be a
+	 * shadow stack violation.
+	 */
+	if (ARCH_PRCTL(ARCH_CET_DISABLE, CET_SHSTK)) {
+		ret = 1;
+		printf("[FAIL]\tDisabling shadow stack failed\n");
+	}
+
+	return ret;
+}
+#endif

From patchwork Thu Sep 29 22:29:30 2022
X-Patchwork-Submitter: Rick Edgecombe
X-Patchwork-Id: 12994687
From: Rick Edgecombe
Subject: [PATCH v2 33/39] x86/cpufeatures: Limit shadow stack to Intel CPUs
Date: Thu, 29 Sep 2022 15:29:30 -0700
Message-Id: <20220929222936.14584-34-rick.p.edgecombe@intel.com>
In-Reply-To: <20220929222936.14584-1-rick.p.edgecombe@intel.com>
References: <20220929222936.14584-1-rick.p.edgecombe@intel.com>

Shadow stack is supported on newer AMD processors, but the kernel
implementation has not been tested on them. Prevent basic issues from
showing up for normal users by disabling shadow stack on all CPUs except
Intel until it has been tested, at which point the limitation should be
removed.

Signed-off-by: Rick Edgecombe
---

v1:
 - New patch.

 arch/x86/kernel/cpu/common.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index d7415bb556b2..f7cacc5698d5 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -606,6 +606,14 @@ static __always_inline void setup_cet(struct cpuinfo_x86 *c)
 	if (!kernel_ibt && !user_shstk)
 		return;
 
+	/*
+	 * Shadow stack is supported on AMD processors, but has not been
+	 * tested. Only support it on Intel processors until this is done.
+	 * At which point, this vendor check should be removed.
+	 */
+	if (c->x86_vendor != X86_VENDOR_INTEL)
+		setup_clear_cpu_cap(X86_FEATURE_SHSTK);
+
 	if (kernel_ibt)
 		msr = CET_ENDBR_EN;

From patchwork Thu Sep 29 22:29:31 2022
X-Patchwork-Submitter: Rick Edgecombe
X-Patchwork-Id: 12994688
From: Rick Edgecombe
Subject: [OPTIONAL/CLEANUP v2 34/39] x86: Separate out x86_regset for 32 and 64 bit
Date: Thu, 29 Sep 2022 15:29:31 -0700
Message-Id: <20220929222936.14584-35-rick.p.edgecombe@intel.com>
In-Reply-To: <20220929222936.14584-1-rick.p.edgecombe@intel.com>
References: <20220929222936.14584-1-rick.p.edgecombe@intel.com>

In fill_thread_core_info() the ptrace accessible registers are collected
for a core file to be written out as notes. The note array is allocated
from a size calculated by iterating the user regset view, and counting the
regsets that have a non-zero core_note_type. However, this only works if
any regsets with a zero core_note_type are at the end of the regset view.
If there are any in the middle, fill_thread_core_info() will overflow the
note allocation, as it iterates over the size of the view and the
allocation would be smaller than that.

Apparently to avoid this problem, x86_32_regsets and x86_64_regsets need
to be constructed in a special way. They both draw their indices from a
shared enum x86_regset, but 32 bit and 64 bit don't all support the same
regsets and can be compiled in at the same time in the case of
IA32_EMULATION. So this enum has to be laid out in a special way such that
there are no gaps for both x86_32_regsets and x86_64_regsets. This
involves ordering them just right by creating aliases for enums that are
only in one view or the other, or creating multiple versions like
REGSET_IOPERM32/REGSET_IOPERM64.

So the collection of the registers tries to minimize the size of the
allocation, but it doesn’t quite work. Then the x86 ptrace side works
around it by constructing the enum just right to avoid a problem. In the
end there is no functional problem, but it is somewhat strange and
fragile.

It could also be improved like this [1], by better utilizing the smaller
array, but this still wastes space in the regset arrays if they are not
carefully crafted to avoid gaps. Instead, just fully separate out the
enums and give them separate 32 and 64 enum names. Add some bitsize-free
defines for REGSET_GENERAL and REGSET_FP since they are the only two
referred to in bitsize generic code.

This should have no functional change and is only changing how constants
are generated and referred to.

[1] https://lore.kernel.org/lkml/20180717162502.32274-1-yu-cheng.yu@intel.com/

Signed-off-by: Rick Edgecombe
---

v2:
 - New patch

 arch/x86/kernel/ptrace.c | 61 ++++++++++++++++++++++++++--------------
 1 file changed, 40 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 37c12fb92906..1a4df5fbc5e9 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -44,16 +44,35 @@
 
 #include "tls.h"
 
-enum x86_regset {
-	REGSET_GENERAL,
-	REGSET_FP,
-	REGSET_XFP,
-	REGSET_IOPERM64 = REGSET_XFP,
-	REGSET_XSTATE,
-	REGSET_TLS,
+enum x86_regset_32 {
+	REGSET_GENERAL32,
+	REGSET_FP32,
+	REGSET_XFP32,
+	REGSET_XSTATE32,
+	REGSET_TLS32,
 	REGSET_IOPERM32,
 };
 
+enum x86_regset_64 {
+	REGSET_GENERAL64,
+	REGSET_FP64,
+	REGSET_IOPERM64,
+	REGSET_XSTATE64,
+};
+
+#define REGSET_GENERAL \
+({ \
+	BUILD_BUG_ON((int)REGSET_GENERAL32 != (int)REGSET_GENERAL64); \
+	REGSET_GENERAL32; \
+})
+
+#define REGSET_FP \
+({ \
+	BUILD_BUG_ON((int)REGSET_FP32 != (int)REGSET_FP64); \
+	REGSET_FP32; \
+})
+
+
 struct pt_regs_offset {
 	const char *name;
 	int offset;
@@ -788,13 +807,13 @@ long arch_ptrace(struct task_struct *child, long request,
 #ifdef CONFIG_X86_32
 	case PTRACE_GETFPXREGS:	/* Get the child extended FPU state. */
 		return copy_regset_to_user(child, &user_x86_32_view,
-					   REGSET_XFP,
+					   REGSET_XFP32,
 					   0, sizeof(struct user_fxsr_struct),
 					   datap) ? -EIO : 0;
 
 	case PTRACE_SETFPXREGS:	/* Set the child extended FPU state. */
 		return copy_regset_from_user(child, &user_x86_32_view,
-					     REGSET_XFP,
+					     REGSET_XFP32,
 					     0, sizeof(struct user_fxsr_struct),
 					     datap) ? -EIO : 0;
 #endif
@@ -1086,13 +1105,13 @@ static long ia32_arch_ptrace(struct task_struct *child, compat_long_t request,
 
 	case PTRACE_GETFPXREGS:	/* Get the child extended FPU state. */
 		return copy_regset_to_user(child, &user_x86_32_view,
-					   REGSET_XFP, 0,
+					   REGSET_XFP32, 0,
 					   sizeof(struct user32_fxsr_struct),
 					   datap);
 
 	case PTRACE_SETFPXREGS:	/* Set the child extended FPU state. */
 		return copy_regset_from_user(child, &user_x86_32_view,
-					     REGSET_XFP, 0,
+					     REGSET_XFP32, 0,
 					     sizeof(struct user32_fxsr_struct),
 					     datap);
 
@@ -1215,19 +1234,19 @@ long compat_arch_ptrace(struct task_struct *child, compat_long_t request,
 #ifdef CONFIG_X86_64
 
 static struct user_regset x86_64_regsets[] __ro_after_init = {
-	[REGSET_GENERAL] = {
+	[REGSET_GENERAL64] = {
 		.core_note_type = NT_PRSTATUS,
 		.n = sizeof(struct user_regs_struct) / sizeof(long),
 		.size = sizeof(long), .align = sizeof(long),
 		.regset_get = genregs_get, .set = genregs_set
 	},
-	[REGSET_FP] = {
+	[REGSET_FP64] = {
 		.core_note_type = NT_PRFPREG,
 		.n = sizeof(struct fxregs_state) / sizeof(long),
 		.size = sizeof(long), .align = sizeof(long),
 		.active = regset_xregset_fpregs_active, .regset_get = xfpregs_get, .set = xfpregs_set
 	},
-	[REGSET_XSTATE] = {
+	[REGSET_XSTATE64] = {
 		.core_note_type = NT_X86_XSTATE,
 		.size = sizeof(u64), .align = sizeof(u64),
 		.active = xstateregs_active, .regset_get = xstateregs_get,
@@ -1256,31 +1275,31 @@ static const struct user_regset_view user_x86_64_view = {
 
 #if defined CONFIG_X86_32 || defined CONFIG_IA32_EMULATION
 static struct user_regset x86_32_regsets[] __ro_after_init = {
-	[REGSET_GENERAL] = {
+	[REGSET_GENERAL32] = {
 		.core_note_type = NT_PRSTATUS,
 		.n = sizeof(struct user_regs_struct32) / sizeof(u32),
 		.size = sizeof(u32), .align = sizeof(u32),
 		.regset_get = genregs32_get, .set = genregs32_set
 	},
-	[REGSET_FP] = {
+	[REGSET_FP32] = {
 		.core_note_type = NT_PRFPREG,
 		.n = sizeof(struct user_i387_ia32_struct) / sizeof(u32),
 		.size = sizeof(u32), .align = sizeof(u32),
 		.active = regset_fpregs_active, .regset_get = fpregs_get, .set = fpregs_set
 	},
-	[REGSET_XFP] = {
+	[REGSET_XFP32] = {
 		.core_note_type = NT_PRXFPREG,
 		.n = sizeof(struct fxregs_state) / sizeof(u32),
 		.size = sizeof(u32), .align = sizeof(u32),
 		.active = regset_xregset_fpregs_active, .regset_get = xfpregs_get, .set = xfpregs_set
 	},
-	[REGSET_XSTATE] = {
+	[REGSET_XSTATE32] = {
 		.core_note_type = NT_X86_XSTATE,
 		.size = sizeof(u64), .align = sizeof(u64),
 		.active = xstateregs_active, .regset_get = xstateregs_get,
 		.set = xstateregs_set
 	},
-	[REGSET_TLS] = {
+	[REGSET_TLS32] = {
 		.core_note_type = NT_386_TLS,
 		.n = GDT_ENTRY_TLS_ENTRIES, .bias = GDT_ENTRY_TLS_MIN,
 		.size = sizeof(struct user_desc),
@@ -1311,10 +1330,10 @@ u64 xstate_fx_sw_bytes[USER_XSTATE_FX_SW_WORDS];
 void __init update_regset_xstate_info(unsigned int size, u64 xstate_mask)
 {
 #ifdef CONFIG_X86_64
-	x86_64_regsets[REGSET_XSTATE].n = size / sizeof(u64);
+	x86_64_regsets[REGSET_XSTATE64].n = size / sizeof(u64);
 #endif
 #if defined CONFIG_X86_32 || defined CONFIG_IA32_EMULATION
-	x86_32_regsets[REGSET_XSTATE].n = size / sizeof(u64);
+	x86_32_regsets[REGSET_XSTATE32].n = size / sizeof(u64);
 #endif
 	xstate_fx_sw_bytes[USER_XSTATE_XCR0_WORD] = xstate_mask;
 }

From patchwork Thu Sep 29 22:29:32 2022
X-Patchwork-Submitter: Rick Edgecombe
X-Patchwork-Id: 12994689
From: Rick Edgecombe
Subject: [OPTIONAL/CLEANUP v2 35/39] x86: Improve formatting of user_regset arrays
Date: Thu, 29 Sep 2022 15:29:32 -0700
Message-Id: <20220929222936.14584-36-rick.p.edgecombe@intel.com>
In-Reply-To: <20220929222936.14584-1-rick.p.edgecombe@intel.com>
References: <20220929222936.14584-1-rick.p.edgecombe@intel.com>

Back in 2018, Ingo Molnar suggested[0] improving the formatting of the
struct user_regset arrays. They have multiple member initializations per
line and some lines exceed 100 chars. Reformat them as he suggested.

[0] https://lore.kernel.org/lkml/20180711102035.GB8574@gmail.com/ Signed-off-by: Rick Edgecombe --- v2: - New patch arch/x86/kernel/ptrace.c | 107 ++++++++++++++++++++++++--------------- 1 file changed, 65 insertions(+), 42 deletions(-) diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c index 1a4df5fbc5e9..eed8a65d335d 100644 --- a/arch/x86/kernel/ptrace.c +++ b/arch/x86/kernel/ptrace.c @@ -1235,28 +1235,37 @@ long compat_arch_ptrace(struct task_struct *child, compat_long_t request, static struct user_regset x86_64_regsets[] __ro_after_init = { [REGSET_GENERAL64] = { - .core_note_type = NT_PRSTATUS, - .n = sizeof(struct user_regs_struct) / sizeof(long), - .size = sizeof(long), .align = sizeof(long), - .regset_get = genregs_get, .set = genregs_set + .core_note_type = NT_PRSTATUS, + .n = sizeof(struct user_regs_struct) / sizeof(long), + .size = sizeof(long), + .align = sizeof(long), + .regset_get = genregs_get, + .set = genregs_set }, [REGSET_FP64] = { - .core_note_type = NT_PRFPREG, - .n = sizeof(struct fxregs_state) / sizeof(long), - .size = sizeof(long), .align = sizeof(long), - .active = regset_xregset_fpregs_active, .regset_get = xfpregs_get, .set = xfpregs_set + .core_note_type = NT_PRFPREG, + .n = sizeof(struct fxregs_state) / sizeof(long), + .size = sizeof(long), + .align = sizeof(long), + .active = regset_xregset_fpregs_active, + .regset_get = xfpregs_get, + .set = xfpregs_set }, [REGSET_XSTATE64] = { - .core_note_type = NT_X86_XSTATE, - .size = sizeof(u64), .align = sizeof(u64), - .active = xstateregs_active, .regset_get = xstateregs_get, - .set = xstateregs_set + .core_note_type = NT_X86_XSTATE, + .size = sizeof(u64), + .align = sizeof(u64), + .active = xstateregs_active, + .regset_get = xstateregs_get, + .set = xstateregs_set }, [REGSET_IOPERM64] = { - .core_note_type = NT_386_IOPERM, - .n = IO_BITMAP_LONGS, - .size = sizeof(long), .align = sizeof(long), - .active = ioperm_active, .regset_get = ioperm_get + .core_note_type = NT_386_IOPERM, + 
.n = IO_BITMAP_LONGS, + .size = sizeof(long), + .align = sizeof(long), + .active = ioperm_active, + .regset_get = ioperm_get }, }; @@ -1276,42 +1285,56 @@ static const struct user_regset_view user_x86_64_view = { #if defined CONFIG_X86_32 || defined CONFIG_IA32_EMULATION static struct user_regset x86_32_regsets[] __ro_after_init = { [REGSET_GENERAL32] = { - .core_note_type = NT_PRSTATUS, - .n = sizeof(struct user_regs_struct32) / sizeof(u32), - .size = sizeof(u32), .align = sizeof(u32), - .regset_get = genregs32_get, .set = genregs32_set + .core_note_type = NT_PRSTATUS, + .n = sizeof(struct user_regs_struct32) / sizeof(u32), + .size = sizeof(u32), + .align = sizeof(u32), + .regset_get = genregs32_get, + .set = genregs32_set }, [REGSET_FP32] = { - .core_note_type = NT_PRFPREG, - .n = sizeof(struct user_i387_ia32_struct) / sizeof(u32), - .size = sizeof(u32), .align = sizeof(u32), - .active = regset_fpregs_active, .regset_get = fpregs_get, .set = fpregs_set + .core_note_type = NT_PRFPREG, + .n = sizeof(struct user_i387_ia32_struct) / sizeof(u32), + .size = sizeof(u32), + .align = sizeof(u32), + .active = regset_fpregs_active, + .regset_get = fpregs_get, + .set = fpregs_set }, [REGSET_XFP32] = { - .core_note_type = NT_PRXFPREG, - .n = sizeof(struct fxregs_state) / sizeof(u32), - .size = sizeof(u32), .align = sizeof(u32), - .active = regset_xregset_fpregs_active, .regset_get = xfpregs_get, .set = xfpregs_set + .core_note_type = NT_PRXFPREG, + .n = sizeof(struct fxregs_state) / sizeof(u32), + .size = sizeof(u32), + .align = sizeof(u32), + .active = regset_xregset_fpregs_active, + .regset_get = xfpregs_get, + .set = xfpregs_set }, [REGSET_XSTATE32] = { - .core_note_type = NT_X86_XSTATE, - .size = sizeof(u64), .align = sizeof(u64), - .active = xstateregs_active, .regset_get = xstateregs_get, - .set = xstateregs_set + .core_note_type = NT_X86_XSTATE, + .size = sizeof(u64), + .align = sizeof(u64), + .active = xstateregs_active, + .regset_get = xstateregs_get, + .set = 
xstateregs_set }, [REGSET_TLS32] = { - .core_note_type = NT_386_TLS, - .n = GDT_ENTRY_TLS_ENTRIES, .bias = GDT_ENTRY_TLS_MIN, - .size = sizeof(struct user_desc), - .align = sizeof(struct user_desc), - .active = regset_tls_active, - .regset_get = regset_tls_get, .set = regset_tls_set + .core_note_type = NT_386_TLS, + .n = GDT_ENTRY_TLS_ENTRIES, + .bias = GDT_ENTRY_TLS_MIN, + .size = sizeof(struct user_desc), + .align = sizeof(struct user_desc), + .active = regset_tls_active, + .regset_get = regset_tls_get, + .set = regset_tls_set }, [REGSET_IOPERM32] = { - .core_note_type = NT_386_IOPERM, - .n = IO_BITMAP_BYTES / sizeof(u32), - .size = sizeof(u32), .align = sizeof(u32), - .active = ioperm_active, .regset_get = ioperm_get + .core_note_type = NT_386_IOPERM, + .n = IO_BITMAP_BYTES / sizeof(u32), + .size = sizeof(u32), + .align = sizeof(u32), + .active = ioperm_active, + .regset_get = ioperm_get }, }; From patchwork Thu Sep 29 22:29:33 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 12994690 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 12481C433F5 for ; Thu, 29 Sep 2022 22:31:07 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id A63CC8D0009; Thu, 29 Sep 2022 18:31:06 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id A13758D0007; Thu, 29 Sep 2022 18:31:06 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8B4AB8D0009; Thu, 29 Sep 2022 18:31:06 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 7E2858D0007 for ; Thu, 29 Sep 2022 18:31:06 -0400 (EDT) Received: from 
smtpin02.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id 5623E1A12D8 for ; Thu, 29 Sep 2022 22:31:06 +0000 (UTC) X-FDA: 79966569732.02.11FABEA Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by imf02.hostedemail.com (Postfix) with ESMTP id 62E548001B for ; Thu, 29 Sep 2022 22:31:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664490665; x=1696026665; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=z6bKqSpBSm7G4FnjcLcDWt5cOl/k3rfBMUw4ayPxySE=; b=lljCjsCOXMYfxk+x0RqDLKgZPShK9ifA8Wf+Ob3U630b2lK6ZdTIAMtV QT+74mF3sOEl2N08vJBQQijHqU/xddF0XLZamF+I4q/LJAsewVh/t77I4 050/NtBSht+4hmg8900n/r0T0ulLd2Mc4aS4NFD7njJFHy8Kn5aMFDIDj rARBmFV3MuWcwS1mtpJLphUAAmGG8aVrJq9pwHCSl2itRB6jRyURfL6hO J8dt70I40Xtm5P02iZGbW4ZLNOzV/wrHfW5bgxF1xt7e8rsnVGZiQc7bX /oxPAyGAVYEGpqQjQyaHvddXgBHPM7k7OIJ4T66xCrrIviof3M8Tt3vL5 Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="289207600" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="289207600" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:31:04 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="691016392" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="691016392" Received: from sergungo-mobl.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.251.25.88]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:30:59 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . 
Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , joao.moreira@intel.com, John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com Cc: rick.p.edgecombe@intel.com Subject: [OPTIONAL/RFC v2 36/39] x86/fpu: Add helper for initing features Date: Thu, 29 Sep 2022 15:29:33 -0700 Message-Id: <20220929222936.14584-37-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220929222936.14584-1-rick.p.edgecombe@intel.com> References: <20220929222936.14584-1-rick.p.edgecombe@intel.com> ARC-Authentication-Results: i=1; imf02.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=lljCjsCO; spf=pass (imf02.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.20 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1664490665; a=rsa-sha256; cv=none; b=XCL/vNH4iKZTFenxsOmhwnP8M1aH92BnKQgCvsuhJFBEdJbya/it9vKP08vhu/rSfdFMiq os63OUqosApErgA5MdKQh+pPoLmxMVc3TipQpNQX8P5/4zz+EsWWpDOsOQ2Um+ZkzcoCkQ wpdGuJdd6I/YOmJu1drw932ZWO9sg4U= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1664490665; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references:dkim-signature; bh=WuA1rvkCcA4cxE44fou8kqxKmVuvEs8XFPp2D+CKCZo=; b=I5KaNI8BUX/ZRgatWeiYe21+kLKvtwnepWhvdhxHzVTKuJtsxANEHCO87X8zzlAGKhcwPS zn5oP2F+HsHxNAAZiK0cXjyd9oFausVIyxda/Nen9ImeG4/R1N6E1Ke+UK5NoN37DasNLK pNKU5caT4ci/ww5in3LJF+J5ALJCUYY= Authentication-Results: imf02.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=lljCjsCO; 
spf=pass (imf02.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.20 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com
X-Rspam-User:
X-Stat-Signature: h3jfxpufrie73r7a34oqn989nxhecxex
X-Rspamd-Queue-Id: 62E548001B
X-Rspamd-Server: rspam05
X-HE-Tag: 1664490665-604524
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

If an xfeature is saved in a buffer, the xfeature's bit will be set in
xsave->header.xfeatures. The CPU may opt to not save the xfeature if it
is in its init state. In this case the xfeature buffer address cannot
be retrieved with get_xsave_addr().

Future patches will need to handle the case of writing to an xfeature
that may not be saved. So provide helpers to init an xfeature in an
xsave buffer.

This could of course be done directly by reaching into the xsave buffer,
however this would not be robust against future changes to optimize the
xsave buffer by compacting it. In that case the xsave buffer would need
to be re-arranged as well. So the logic properly belongs encapsulated in
a helper where the logic can be unified.

Signed-off-by: Rick Edgecombe
---

v2:
 - New patch

 arch/x86/kernel/fpu/xstate.c | 58 +++++++++++++++++++++++++++++-------
 arch/x86/kernel/fpu/xstate.h |  6 ++++
 2 files changed, 53 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 9258fc1169cc..82cee1f2f0c8 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -942,6 +942,24 @@ static void *__raw_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
 	return (void *)xsave + xfeature_get_offset(xcomp_bv, xfeature_nr);
 }

+static int xsave_buffer_access_checks(int xfeature_nr)
+{
+	/*
+	 * Do we even *have* xsave state?
+	 */
+	if (!boot_cpu_has(X86_FEATURE_XSAVE))
+		return 1;
+
+	/*
+	 * We should not ever be requesting features that we
+	 * have not enabled.
+	 */
+	if (WARN_ON_ONCE(!xfeature_enabled(xfeature_nr)))
+		return 1;
+
+	return 0;
+}
+
 /*
  * Given the xsave area and a state inside, this function returns the
  * address of the state.
@@ -962,17 +980,7 @@ static void *__raw_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
  */
 void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
 {
-	/*
-	 * Do we even *have* xsave state?
-	 */
-	if (!boot_cpu_has(X86_FEATURE_XSAVE))
-		return NULL;
-
-	/*
-	 * We should not ever be requesting features that we
-	 * have not enabled.
-	 */
-	if (WARN_ON_ONCE(!xfeature_enabled(xfeature_nr)))
+	if (xsave_buffer_access_checks(xfeature_nr))
 		return NULL;

 	/*
@@ -992,6 +1000,34 @@ void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
 	return __raw_xsave_addr(xsave, xfeature_nr);
 }

+/*
+ * Given the xsave area and a state inside, this function
+ * initializes an xfeature in the buffer.
+ *
+ * get_xsave_addr() will return NULL if the feature bit is
+ * not present in the header. This function will make it so
+ * the xfeature buffer address is ready to be retrieved by
+ * get_xsave_addr().
+ *
+ * Inputs:
+ *	xstate: the thread's storage area for all FPU data
+ *	xfeature_nr: state which is defined in xsave.h (e.g. XFEATURE_FP,
+ *	XFEATURE_SSE, etc...)
+ * Output:
+ *	1 if the feature cannot be inited, 0 on success
+ */
+int init_xfeature(struct xregs_state *xsave, int xfeature_nr)
+{
+	if (xsave_buffer_access_checks(xfeature_nr))
+		return 1;
+
+	/*
+	 * Mark the feature inited.
+	 */
+	xsave->header.xfeatures |= BIT_ULL(xfeature_nr);
+	return 0;
+}
+
 #ifdef CONFIG_ARCH_HAS_PKEYS

 /*
diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h
index 5ad47031383b..fb8aae678e9f 100644
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -54,6 +54,12 @@ extern void fpu__init_cpu_xstate(void);
 extern void fpu__init_system_xstate(unsigned int legacy_size);

 extern void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr);
+extern int init_xfeature(struct xregs_state *xsave, int xfeature_nr);
+
+static inline int xfeature_saved(struct xregs_state *xsave, int xfeature_nr)
+{
+	return xsave->header.xfeatures & BIT_ULL(xfeature_nr);
+}

 static inline u64 xfeatures_mask_supervisor(void)
 {

From patchwork Thu Sep 29 22:29:34 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Rick Edgecombe
X-Patchwork-Id: 12994691
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id EBAC2C433FE for ; Thu, 29 Sep 2022 22:31:08 +0000 (UTC)
Received: by kanga.kvack.org (Postfix) id 8698A8D000D; Thu, 29 Sep 2022 18:31:08 -0400 (EDT)
Received: by kanga.kvack.org (Postfix, from userid 40) id 7F4138D0007; Thu, 29 Sep 2022 18:31:08 -0400 (EDT)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042) id 61D108D000D; Thu, 29 Sep 2022 18:31:08 -0400 (EDT)
X-Delivered-To: linux-mm@kvack.org
Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 544468D0007 for ; Thu, 29 Sep 2022 18:31:08 -0400 (EDT)
Received: from smtpin01.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 2310741360 for ; Thu, 29 Sep 2022 22:31:08 +0000 (UTC)
X-FDA:
79966569816.01.BE78F67 Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by imf08.hostedemail.com (Postfix) with ESMTP id 5424416001D for ; Thu, 29 Sep 2022 22:31:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664490666; x=1696026666; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=xqFSCPOHjTfopGQJjqWcfN6D1bVBxbnxIn0y2R87szg=; b=nJYGRQjPuB/tEyrrLJfFqpk47k93vVW3MYIfBCvbs5qe1U4GP2OTIARu WKuepsPER94981fBfd9AA9ETUU+uJB471iepQiQijTDareF++EjFocJRt h6kQnFkS3VOd0mT6AKNehUFPayGHdcjErRE2QONuDrrHycWmdpjlxS7IF lgj0+GV3W0zVXSmeHFAzlagZJMc7zGFKRnkUa1JvKgeeI3hY2dBm0UhHe m9ZCuHrWnEWRk7FCmlde5DYeY9P5vcEJGnGT2LVAiOpOt4LVTUtNHbfy3 YNfC6dV1kQo/LFYISZYLOOYhUdM3bwqRZV5uOKQ+DkWdzquzVB3XOLibM A==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="302957204" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="302957204" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:31:04 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="691016403" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="691016403" Received: from sergungo-mobl.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.251.25.88]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:31:02 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . 
Shankar" , Weijiang Yang , "Kirill A . Shutemov" , joao.moreira@intel.com, John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [OPTIONAL/RFC v2 37/39] x86/cet: Add PTRACE interface for CET Date: Thu, 29 Sep 2022 15:29:34 -0700 Message-Id: <20220929222936.14584-38-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220929222936.14584-1-rick.p.edgecombe@intel.com> References: <20220929222936.14584-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1664490666; a=rsa-sha256; cv=none; b=jN4jYw70Vk5BZ7SHmadV/gMN149mGyU20chBo76TbY596W16t2IBfrN4b6dv6shil9Mn6M IO8txfZ7BX3jfHuNjw1azrZrKQKLUjBsGNOeFo5wtqZAGLVWQwSDxhfWhlBE2rGWOjMvL0 r8DIEeMlJbbG6tt/7gqZv3xNJUjma/8= ARC-Authentication-Results: i=1; imf08.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=nJYGRQjP; spf=pass (imf08.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.24 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1664490666; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=P/eZH5mFLRTwYwnE3K/NvGBQticJnxOtSjB8bSaE8v8=; b=kgGkAC6blmbkrgKgMz121it5XAhw30SW2EFFHHsgHJ/BfJD+YshtjyFCdfLufdkph/+TYQ U5tM4hg729cBgFxLDhuof8CghN4SBxJSR91g6xI00wvwEsUuwf7YdYmZ4PeR5hPMv2gsMv vB04DqViuA4BAYB4HFVPp0eSO2opROk= X-Stat-Signature: nmcutqkwxzqimf8t4rgxer35hb6az38h X-Rspamd-Queue-Id: 5424416001D X-Rspam-User: Authentication-Results: imf08.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com 
header.s=Intel header.b=nJYGRQjP; spf=pass (imf08.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 134.134.136.24 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com
X-Rspamd-Server: rspam01
X-HE-Tag: 1664490665-708068
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

From: Yu-cheng Yu

Some applications (like GDB and CRIU) would like to tweak CET state via
ptrace. This allows for existing functionality to continue to work for
seized CET applications. Provide an interface based on the xsave buffer
format of CET, but filter unneeded states to make the kernel’s job
easier.

There is already ptrace functionality for accessing xstate, but this
does not include supervisor xfeatures. So there is no completely clear
place to put the CET state. Adding it to the user xfeatures regset would
complicate that code, as it currently shares logic with signals which
should not have supervisor features.

Don’t add a general supervisor xfeature regset like the user one,
because it is better to maintain flexibility for other supervisor
xfeatures to define their own interface. For example, an xfeature may
decide not to expose all of its state to userspace. A lot of enum values
remain to be used, so just put it in a dedicated CET regset.

The only downside to not having a generic supervisor xfeature regset is
that apps need to be enlightened of any new supervisor xfeature exposed
this way (i.e. they can’t try to have generic save/restore logic). But
maybe that is a good thing, because they have to think through each new
xfeature instead of encountering issues when a new supervisor xfeature
is added.

Adding a CET regset also has the effect of including the CET state in a
core dump, which could be useful for debugging.

Inside the CET regset setter, filter out invalid state.
Today this includes states disallowed by the HW and states involving
Indirect Branch Tracking, which the kernel does not currently support
for userspace.

So this leaves three pieces of data that can be set: shadow stack
enablement, WRSS enablement and the shadow stack pointer. It is worth
noting that this is separate from enabling shadow stack via the
arch_prctl()s.

Enabling shadow stack involves more than just flipping the bit. The
kernel is made aware that it has to do extra things when cloning or
handling signals. That logic is triggered off of separate feature
enablement state kept in the task struct. So flipping on HW shadow stack
enforcement without notifying the kernel to change its behavior would
severely limit what an application could do without crashing. Since
there is likely no use for this, only allow the CET registers to be set
if shadow stack is already enabled via the arch_prctl()s. This will let
apps like GDB toggle shadow stack enforcement for apps that already have
shadow stack enabled, and minimize scenarios the kernel has to worry
about.

Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
Signed-off-by: Yu-cheng Yu
---

v2:
 - Check alignment on ssp.
 - Block IBT bits.
 - Handle init states instead of returning error.
 - Add verbose commit log justifying the design.

Yu-Cheng v12:
 - Return -ENODEV when CET registers are in INIT state.
 - Check reserved/unsupported bits from user input.
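
For readers following the setter checks described in the commit log,
here is a minimal userspace sketch of the same validation. It is not the
kernel code. All sketch_*/SKETCH_* names are hypothetical,
SKETCH_TASK_SIZE_MAX is only an assumed stand-in for TASK_SIZE_MAX, and
the reserved/IBT mask test is simplified to "reject every bit other than
the shadow stack and WRSS enable bits".

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits 0 and 1 of the user CET register: shadow stack and WRSS enable. */
#define SKETCH_CET_SHSTK_EN	(1ULL << 0)
#define SKETCH_CET_WRSS_EN	(1ULL << 1)

/* Assumption: stand-in for TASK_SIZE_MAX, not taken from the patch. */
#define SKETCH_TASK_SIZE_MAX	((1ULL << 47) - 4096)

/* Hypothetical mirror of the two fields the regset exposes. */
struct sketch_cet_user_state {
	uint64_t user_cet;	/* enable bits */
	uint64_t user_ssp;	/* shadow stack pointer */
};

/* Returns 0 if the state would be accepted, -EINVAL otherwise. */
static int sketch_validate_cet(const struct sketch_cet_user_state *s, bool ia32)
{
	uint64_t align = ia32 ? 4 : 8;

	/* The shadow stack pointer must be a userspace address. */
	if (s->user_ssp >= SKETCH_TASK_SIZE_MAX)
		return -EINVAL;

	/* 4-byte alignment for 32-bit tasks, 8-byte otherwise. */
	if (s->user_ssp & (align - 1))
		return -EINVAL;

	/* Simplification: anything beyond SHSTK_EN/WRSS_EN is rejected. */
	if (s->user_cet & ~(SKETCH_CET_SHSTK_EN | SKETCH_CET_WRSS_EN))
		return -EINVAL;

	return 0;
}
```

A debugger-style caller would run a check like this before handing the
buffer to the regset setter, so a rejected write can be reported to the
user instead of surfacing only as -EINVAL from ptrace.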
 arch/x86/include/asm/fpu/regset.h |  7 ++-
 arch/x86/include/asm/msr-index.h  |  5 ++
 arch/x86/kernel/fpu/regset.c      | 95 +++++++++++++++++++++++++++++++
 arch/x86/kernel/ptrace.c          | 20 +++++++
 include/uapi/linux/elf.h          |  1 +
 5 files changed, 125 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/fpu/regset.h b/arch/x86/include/asm/fpu/regset.h
index 4f928d6a367b..8622184d87f5 100644
--- a/arch/x86/include/asm/fpu/regset.h
+++ b/arch/x86/include/asm/fpu/regset.h
@@ -7,11 +7,12 @@

 #include

-extern user_regset_active_fn regset_fpregs_active, regset_xregset_fpregs_active;
+extern user_regset_active_fn regset_fpregs_active, regset_xregset_fpregs_active,
+				cetregs_active;
 extern user_regset_get2_fn fpregs_get, xfpregs_get, fpregs_soft_get,
-				 xstateregs_get;
+				 xstateregs_get, cetregs_get;
 extern user_regset_set_fn fpregs_set, xfpregs_set, fpregs_soft_set,
-				 xstateregs_set;
+				 xstateregs_set, cetregs_set;

 /*
  * xstateregs_active == regset_fpregs_active. Please refer to the comment
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 6674bdb096f3..fbc319682664 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -432,6 +432,11 @@
 #define CET_RESERVED			(BIT_ULL(6) | BIT_ULL(7) | BIT_ULL(8) | BIT_ULL(9))
 #define CET_SUPPRESS			BIT_ULL(10)
 #define CET_WAIT_ENDBR			BIT_ULL(11)
+#define CET_EG_LEG_BITMAP_BASE_MASK	GENMASK_ULL(63, 13)
+
+#define CET_U_IBT_MASK		(CET_ENDBR_EN | CET_LEG_IW_EN | CET_NO_TRACK_EN | \
+				 CET_SUPPRESS_DISABLE | CET_SUPPRESS | \
+				 CET_WAIT_ENDBR | CET_EG_LEG_BITMAP_BASE_MASK)

 #define MSR_IA32_PL0_SSP		0x000006a4 /* ring-0 shadow stack pointer */
 #define MSR_IA32_PL1_SSP		0x000006a5 /* ring-1 shadow stack pointer */
diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index 75ffaef8c299..440dc1921ee4 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -174,6 +174,101 @@ int xstateregs_set(struct task_struct *target, const struct
user_regset *regset,
 	return ret;
 }

+int cetregs_active(struct task_struct *target, const struct user_regset *regset)
+{
+#ifdef CONFIG_X86_SHADOW_STACK
+	if (target->thread.shstk.size)
+		return regset->n;
+#endif
+	return 0;
+}
+
+int cetregs_get(struct task_struct *target, const struct user_regset *regset,
+		struct membuf to)
+{
+	struct fpu *fpu = &target->thread.fpu;
+	struct cet_user_state *cetregs;
+
+	if (!boot_cpu_has(X86_FEATURE_SHSTK))
+		return -ENODEV;
+
+	sync_fpstate(fpu);
+	cetregs = get_xsave_addr(&fpu->fpstate->regs.xsave, XFEATURE_CET_USER);
+	if (!cetregs) {
+		/*
+		 * The registers are in the init state. The init values for
+		 * these regs are zero, so just zero the output buffer.
+		 */
+		membuf_zero(&to, sizeof(struct cet_user_state));
+		return 0;
+	}
+
+	return membuf_write(&to, cetregs, sizeof(struct cet_user_state));
+}
+
+int cetregs_set(struct task_struct *target, const struct user_regset *regset,
+		  unsigned int pos, unsigned int count,
+		  const void *kbuf, const void __user *ubuf)
+{
+	struct fpu *fpu = &target->thread.fpu;
+	struct xregs_state *xsave = &fpu->fpstate->regs.xsave;
+	struct cet_user_state *cetregs, tmp;
+	bool ia32;
+	int r;
+
+	if (!boot_cpu_has(X86_FEATURE_SHSTK) ||
+	    !cetregs_active(target, regset))
+		return -ENODEV;
+
+	ia32 = IS_ENABLED(CONFIG_IA32_EMULATION) &&
+	       target->thread_info.status & TS_COMPAT;
+
+	r = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &tmp, 0, -1);
+	if (r)
+		return r;
+
+	/*
+	 * Some kernel instructions (IRET, etc) can cause exceptions in the case
+	 * of disallowed CET register values. Just prevent invalid values.
+	 */
+	if ((tmp.user_ssp >= TASK_SIZE_MAX) ||
+	    (ia32 && !IS_ALIGNED(tmp.user_ssp, 4)) ||
+	    (!ia32 && !IS_ALIGNED(tmp.user_ssp, 8)))
+		return -EINVAL;
+
+	/*
+	 * Don't allow any IBT bits to be set because it is not supported by
+	 * the kernel yet. Also don't allow reserved bits.
+	 */
+	if ((tmp.user_cet & CET_RESERVED) || (tmp.user_cet & CET_U_IBT_MASK))
+		return -EINVAL;
+
+	fpu_force_restore(fpu);
+
+	/*
+	 * Don't want to init the xfeature until the kernel will definitely
+	 * overwrite it, otherwise if it inits and then fails out, it would
+	 * end up initing it to random data.
+	 */
+	if (!xfeature_saved(xsave, XFEATURE_CET_USER) &&
+	    WARN_ON(init_xfeature(xsave, XFEATURE_CET_USER)))
+		return -ENODEV;
+
+	cetregs = get_xsave_addr(xsave, XFEATURE_CET_USER);
+	if (WARN_ON(!cetregs)) {
+		/*
+		 * This shouldn't ever be NULL because it was successfully
+		 * inited above if needed. The only scenario would be if an
+		 * xfeature was somehow saved in a buffer, but not enabled in
+		 * xsave.
+		 */
+		return -ENODEV;
+	}
+
+	memmove(cetregs, &tmp, sizeof(tmp));
+	return 0;
+}
+
 #if defined CONFIG_X86_32 || defined CONFIG_IA32_EMULATION

 /*
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index eed8a65d335d..f9e6635b69ce 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -51,6 +51,7 @@ enum x86_regset_32 {
 	REGSET_XSTATE32,
 	REGSET_TLS32,
 	REGSET_IOPERM32,
+	REGSET_CET32,
 };

 enum x86_regset_64 {
@@ -58,6 +59,7 @@ enum x86_regset_64 {
 	REGSET_FP64,
 	REGSET_IOPERM64,
 	REGSET_XSTATE64,
+	REGSET_CET64,
 };

 #define REGSET_GENERAL \
@@ -1267,6 +1269,15 @@ static struct user_regset x86_64_regsets[] __ro_after_init = {
 		.active		= ioperm_active,
 		.regset_get	= ioperm_get
 	},
+	[REGSET_CET64] = {
+		.core_note_type	= NT_X86_CET,
+		.n		= sizeof(struct cet_user_state) / sizeof(u64),
+		.size		= sizeof(u64),
+		.align		= sizeof(u64),
+		.active		= cetregs_active,
+		.regset_get	= cetregs_get,
+		.set		= cetregs_set
+	},
 };

 static const struct user_regset_view user_x86_64_view = {
@@ -1336,6 +1347,15 @@ static struct user_regset x86_32_regsets[] __ro_after_init = {
 		.active		= ioperm_active,
 		.regset_get	= ioperm_get
 	},
+	[REGSET_CET32] = {
+		.core_note_type	= NT_X86_CET,
+		.n		= sizeof(struct cet_user_state) / sizeof(u64),
+		.size		= sizeof(u64),
+		.align		= sizeof(u64),
+		.active		= cetregs_active,
+		.regset_get	= cetregs_get,
+		.set		= cetregs_set
+	},
 };

 static const struct user_regset_view user_x86_32_view = {
diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index c7b056af9ef0..11089731e2e9 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -406,6 +406,7 @@ typedef struct elf64_shdr {
 #define NT_386_TLS	0x200		/* i386 TLS slots (struct user_desc) */
 #define NT_386_IOPERM	0x201		/* x86 io permission bitmap (1=deny) */
 #define NT_X86_XSTATE	0x202		/* x86 extended state using xsave */
+#define NT_X86_CET	0x203		/* x86 CET state */
 #define NT_S390_HIGH_GPRS	0x300	/* s390 upper register halves */
 #define NT_S390_TIMER	0x301		/* s390 timer register */
 #define NT_S390_TODCMP	0x302		/* s390 TOD clock comparator register */

From patchwork Thu Sep 29 22:29:35 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Rick Edgecombe
X-Patchwork-Id: 12994692
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id DC61BC433F5 for ; Thu, 29 Sep 2022 22:31:28 +0000 (UTC)
Received: by kanga.kvack.org (Postfix) id 72ED38D000B; Thu, 29 Sep 2022 18:31:28 -0400 (EDT)
Received: by kanga.kvack.org (Postfix, from userid 40) id 6DEEF8D0007; Thu, 29 Sep 2022 18:31:28 -0400 (EDT)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042) id 57FE48D000B; Thu, 29 Sep 2022 18:31:28 -0400 (EDT)
X-Delivered-To: linux-mm@kvack.org
Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 49EA68D0007 for ; Thu, 29 Sep 2022 18:31:28 -0400 (EDT)
Received: from smtpin10.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id
AC5AD1413B2 for ; Thu, 29 Sep 2022 22:31:27 +0000 (UTC) X-FDA: 79966570614.10.C8F3848 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by imf24.hostedemail.com (Postfix) with ESMTP id 0CA97180013 for ; Thu, 29 Sep 2022 22:31:26 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664490687; x=1696026687; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=u9CznCATrsGnwa8pqTQLU3Taz6CK/euF7IU8I3CM4v4=; b=SPWjuh5+iFVJ+XTdS20s4WLziKgJcth43ph6Y7JAluihDsMReYRZ9JVC NnDtcfZqr0CEw5zkQ7hxEM+16sUNjUfT2AIma+/lqY7c3lTUI/jcBlNZX fAWY5Uej+PKFIIQs47F9Dy6T7kj9KxSQAYAEod9L9N7cH6m19PjIrBNHW 7kBUupuW0ZAtvvDocShW23v1pt/Ghpb1x1lU6GWwbs66vqDX/rn2jz3kD Y1y7rNBpJFSSg82joad3/Ri4sA3ZF41vTkUsFfUadcraNQs4xJ78z9lpB +vHQnWr4/MQJudsvsQOzz/4Dbev9SiRznC4nNNmk2fE6ok8oS2qfMAp3Z g==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="302015655" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="302015655" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:31:06 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="691016413" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="691016413" Received: from sergungo-mobl.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.251.25.88]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:31:04 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . 
Shankar" , Weijiang Yang , "Kirill A . Shutemov" , joao.moreira@intel.com, John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com Cc: rick.p.edgecombe@intel.com, Mike Rapoport Subject: [OPTIONAL/RFC v2 38/39] x86/cet/shstk: Add ARCH_CET_UNLOCK Date: Thu, 29 Sep 2022 15:29:35 -0700 Message-Id: <20220929222936.14584-39-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220929222936.14584-1-rick.p.edgecombe@intel.com> References: <20220929222936.14584-1-rick.p.edgecombe@intel.com> ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1664490687; a=rsa-sha256; cv=none; b=RHlMptEL6SWGbcz/08G0OQUt74pDg94+zbJ+Ww9+PH3p8sIto4OR5QhQv+ZFI3kuHq9qOP EWQCuTrBGGRqXC51CfSNgMz3yUghvzs3E+t4ipjy/lqKo44qC5b/eMTitbGBHdU2uW22Xq Lomoyual4ZXb845YIIFJY7R8xZnb9Hw= ARC-Authentication-Results: i=1; imf24.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=SPWjuh5+; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf24.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1664490687; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references:dkim-signature; bh=efT3MnxieyXninwXrxTo6VFUs2I4x2OETTRzAB+4ThM=; b=gEXpowICawXjWvwJrnRUKmNRRurtQ2c8/26UtoSluEkGEOqpkVekDMIUlyClyeImAAxMH7 NbaMsdhkjU1gp1LQ/qFe6Gjj+/ia6QezmXnt6hR65wRl3H5VO0Boj6RFux+hcV/CHf/PMa KOcwRKjyjW/J1Q8LDA2gy6pL0LS5ego= X-Rspamd-Queue-Id: 0CA97180013 X-Rspam-User: Authentication-Results: imf24.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=SPWjuh5+; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf24.hostedemail.com: domain of 
rick.p.edgecombe@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com X-Rspamd-Server: rspam03 X-Stat-Signature: qaqgt9j6ah1d8t38menpcgwrhrxd3ex4 X-HE-Tag: 1664490686-961850 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Mike Rapoport Userspace loaders may lock features before a CRIU restore operation has the chance to set them to whatever state is required by the process being restored. Provide a way for CRIU to unlock features. Add it as an arch_prctl() like the other CET operations, but restrict it to being called via the ptrace arch_prctl() interface. Signed-off-by: Mike Rapoport [Merged into recent API changes, added commit log and docs] Signed-off-by: Rick Edgecombe --- v2: - New patch Documentation/x86/cet.rst | 3 +++ arch/x86/include/uapi/asm/prctl.h | 1 + arch/x86/kernel/process_64.c | 1 + arch/x86/kernel/shstk.c | 9 +++++++-- 4 files changed, 12 insertions(+), 2 deletions(-) diff --git a/Documentation/x86/cet.rst b/Documentation/x86/cet.rst index 4a0dfb6830f9..6b270a24ebc3 100644 --- a/Documentation/x86/cet.rst +++ b/Documentation/x86/cet.rst @@ -81,6 +81,9 @@ arch_prctl(ARCH_CET_DISABLE, unsigned int feature) arch_prctl(ARCH_CET_LOCK, unsigned int features) Lock in features at their current enabled or disabled status. +arch_prctl(ARCH_CET_UNLOCK, unsigned int features) + Unlock features. + The return values are as following: On success, return 0. 
On error, errno can be:: diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h index d811f0c5fc4f..2f4d81ab4849 100644 --- a/arch/x86/include/uapi/asm/prctl.h +++ b/arch/x86/include/uapi/asm/prctl.h @@ -25,6 +25,7 @@ #define ARCH_CET_ENABLE 0x4001 #define ARCH_CET_DISABLE 0x4002 #define ARCH_CET_LOCK 0x4003 +#define ARCH_CET_UNLOCK 0x4004 #define CET_SHSTK 0x1 #define CET_WRSS 0x2 diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index be544b4b4c8b..fbb2062dd0d2 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -834,6 +834,7 @@ long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2) case ARCH_CET_ENABLE: case ARCH_CET_DISABLE: case ARCH_CET_LOCK: + case ARCH_CET_UNLOCK: return cet_prctl(task, option, arg2); default: ret = -EINVAL; diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index 0efec02dbe6b..af1255164f0c 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -464,9 +464,14 @@ long cet_prctl(struct task_struct *task, int option, unsigned long features) return 0; } - /* Don't allow via ptrace */ - if (task != current) + /* Only allow via ptrace */ + if (task != current) { + if (option == ARCH_CET_UNLOCK) { + task->thread.features_locked &= ~features; + return 0; + } return -EINVAL; + } /* Do not allow to change locked features */ if (features & task->thread.features_locked) From patchwork Thu Sep 29 22:29:36 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 12994693 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 88F17C433FE for ; Thu, 29 Sep 2022 22:31:29 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C889E8D0007; Thu, 29 Sep 2022 18:31:28 
-0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id C366A8D000F; Thu, 29 Sep 2022 18:31:28 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A61918D0007; Thu, 29 Sep 2022 18:31:28 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 83CAB8D000F for ; Thu, 29 Sep 2022 18:31:28 -0400 (EDT) Received: from smtpin03.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 5AA6E41360 for ; Thu, 29 Sep 2022 22:31:28 +0000 (UTC) X-FDA: 79966570656.03.3D65F73 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by imf24.hostedemail.com (Postfix) with ESMTP id C1F4618000B for ; Thu, 29 Sep 2022 22:31:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664490687; x=1696026687; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=qC94pWx3ETPKg6Jl0TVzncTVpDWyMDjZ00/mG9Fm96U=; b=Sr88jguHpaK0CRMJBjGC3ESwZSQq2rx6xEV1YLX8a0ALh6rz55s3As6Z V7yVCBJsr50H1VmAvuLW1s91nA0O8gekiX6VadJ5HgRGUSJIoPeoKPjaW 9JWLhBDTFn7tCsY0Ishsm+jkewBRMYkMyZo/e7HaRLwsLJQ9mMQgznSHd 1Xx6jqO/qFKPfHmXAJ/QXn2yckdPTcW5ICRGqVbDyWMt20c6zYqMRnsQr BxQj4k0AUcPZN6VX6kJ0oAc6g3mzr/+JtfRoc0U/PuWNrIn8DeQPH6Gg0 adrY7Aobivj6N53zkDWAK+iA64bsKFQmOHkSPJE8uUliia/Vp7tGWDxXf Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="302015663" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="302015663" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:31:07 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="691016422" X-IronPort-AV: E=Sophos;i="5.93,356,1654585200"; d="scan'208";a="691016422" Received: from sergungo-mobl.amr.corp.intel.com (HELO 
rpedgeco-desk.amr.corp.intel.com) ([10.251.25.88]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2022 15:31:06 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Weijiang Yang , "Kirill A . Shutemov" , joao.moreira@intel.com, John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com Cc: rick.p.edgecombe@intel.com Subject: [OPTIONAL/RFC v2 39/39] x86: Add alt shadow stack support Date: Thu, 29 Sep 2022 15:29:36 -0700 Message-Id: <20220929222936.14584-40-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220929222936.14584-1-rick.p.edgecombe@intel.com> References: <20220929222936.14584-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1664490688; a=rsa-sha256; cv=none; b=Zq3V0QEwf9ZO0XNrX4k19MolBBsB7yxtGweo9h5crbIUKYMAA764mL1BNNfnCdpwP+RSPS 60J+oD25z0VnhV8wGwxmvizTPsOatdrqQxHi/6oLu/IUUctAqrmiWJsFfilTRRLqda8h+V qoDMQDPuInZwIwfUYqx2hwho65Z2nB4= ARC-Authentication-Results: i=1; imf24.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=Sr88jguH; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf24.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1664490688; 
h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=YK0wMfp8L1+q10cbAjxZWdEWh8f6c8JaMVpZaoKdA/Q=; b=H5DaTz4pLrlw/3S6odr2XAkBow+Gc94hjNyD6tldGeCIBpDDW033hSj/ng+mwKVGYupzsz ZZk3gLZ/SdS8R5h9ixug4EPE//LGasKNINh8TIKkyPM1DMfJA6eaqkZmtlbW4pP8nrkbkz /sqSMZKs8PNntlMLBdsRK/C6YYhv0N0= X-Rspamd-Queue-Id: C1F4618000B X-Rspam-User: Authentication-Results: imf24.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=Sr88jguH; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf24.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com X-Rspamd-Server: rspam03 X-Stat-Signature: k58jznczf7sf79pj7c3zsujo9ogrbfbq X-HE-Tag: 1664490687-377993 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: To handle stack overflows, applications can register a separate alternate signal stack on which to handle signals. To handle shadow stack overflows, the kernel can similarly provide the ability to have an alt shadow stack. Signals push information about the execution context to the stack that will handle the signal. The data pushed is used to restore registers and other state after the signal. In the case of handling the signal on a normal stack, the stack just needs to be unwound over the stack frame, but in the case of alt stacks, the saved stack pointer is important for the sigreturn to find its way back to the thread. With shadow stack there is a new type of stack pointer, the shadow stack pointer (SSP), that needs to be restored. Just like the regular stack pointer, it needs to be saved somewhere in order to implement shadow alt stacks. 
This is already done as part of the token placed to prevent SROP attacks, so on sigreturn from an alt shadow stack, the kernel can easily know which SSP to restore. But to enable SS_AUTODISARM-like functionality, the kernel also needs to push the alt shadow stack base and size somewhere, as happens with regular alt stacks. So push this data using the same format. In the end the shadow stack sigframe looks like this:

|1...old SSP|1...alt stack size|1...alt stack base| 0|

In the future, any other data could come between the alt stack base and the guard zero. The guard zero is to prevent tricking the kernel into processing half of one frame and half of the adjacent frame.

In past designs for userspace shadow stacks, shadow alt stacks were not supported. Since there was only one shadow stack, longjmp() could jump out of a signal by using incssp to unwind the SSP to the place where the setjmp() was called. Since alt shadow stacks are a new thing, simply don't support longjmp()ing from an alt shadow stack.

Introduce a new syscall "sigaltshstk" that behaves similarly to sigaltstack. Have it take new and old stack_t's to specify the base and length of the alt shadow stack. Don't have it adopt the same flag semantics though, because not all alt stack flags will necessarily apply to alt shadow stacks. As long as the syscall is getting new flag meanings, make SS_AUTODISARM the default behavior for sigaltshstk(), and do not require a flag. Today the only flag supported is SS_DISABLE, and a !SS_AUTODISARM mode is not supported. So when a signal hits it will jump to the location specified in sigaltshstk().

Currently (without WRSS), userspace doesn’t have the ability to arbitrarily set the SSP. But telling the kernel to set the SSP to an arbitrary point on signal is kind of like that. So there would be a weakening of the shadow stack protections unless additional checks are made. 
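The sigframe layout described above can be modelled in plain userspace C. This is only an illustrative sketch, not the kernel code: `struct shstk_model`, `push_sigframe()` and `pop_sigframe()` are hypothetical names, and the use of bit 63 as the data marker (the leading "1" in the diagram) is an assumption about how the data words are tagged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: data words carry bit 63 so they cannot pass for a return address. */
#define SHSTK_DATA_BIT	(1ULL << 63)
#define MODEL_WORDS	16

/* Hypothetical model of a shadow stack; it grows down from words[MODEL_WORDS]. */
struct shstk_model {
	uint64_t words[MODEL_WORDS];
	size_t top;		/* index of the lowest used word */
};

static int push_word(struct shstk_model *s, uint64_t v)
{
	if (s->top == 0)
		return -1;
	s->words[--s->top] = v;
	return 0;
}

/* Read a tagged data word; fails on anything without the marker bit. */
static int get_data(uint64_t word, uint64_t *out)
{
	if (!(word & SHSTK_DATA_BIT))
		return -1;
	*out = word & ~SHSTK_DATA_BIT;
	return 0;
}

/* Push guard zero, alt base, alt size, old SSP; old SSP ends up lowest. */
static int push_sigframe(struct shstk_model *s, uint64_t old_ssp,
			 uint64_t alt_size, uint64_t alt_base)
{
	if (s->top < 4)
		return -1;
	push_word(s, 0);
	push_word(s, alt_base | SHSTK_DATA_BIT);
	push_word(s, alt_size | SHSTK_DATA_BIT);
	push_word(s, old_ssp | SHSTK_DATA_BIT);
	return 0;
}

/* Pop one frame; a frame read from the wrong offset hits the untagged guard. */
static int pop_sigframe(struct shstk_model *s, uint64_t *old_ssp,
			uint64_t *alt_size, uint64_t *alt_base)
{
	if (s->top + 4 > MODEL_WORDS)
		return -1;
	if (get_data(s->words[s->top], old_ssp) ||
	    get_data(s->words[s->top + 1], alt_size) ||
	    get_data(s->words[s->top + 2], alt_base))
		return -1;
	if (s->words[s->top + 3] != 0)
		return -1;
	s->top += 4;
	return 0;
}
```

In this model, starting a pop one word into a frame makes the third read land on the guard zero, which lacks the marker bit, so the pop is rejected instead of mixing two adjacent frames.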
With the SS_AUTODISARM-style behavior, the SSP will only jump to the alt shadow stack if the SSP is not already on the alt shadow stack; otherwise the kernel will just push the SSP. So have the kernel check for a token whenever transitioning to the alt shadow stack from a place other than the alt shadow stack. This token can be written by the kernel during shadow stack allocation, using the map_shadow_stack syscall. Signed-off-by: Rick Edgecombe --- v2: - New patch arch/x86/entry/syscalls/syscall_64.tbl | 1 + arch/x86/include/asm/cet.h | 2 + arch/x86/include/asm/processor.h | 3 + arch/x86/kernel/process.c | 3 + arch/x86/kernel/shstk.c | 178 +++++++++++++++--- include/linux/syscalls.h | 1 + kernel/sys_ni.c | 1 + .../testing/selftests/x86/test_shadow_stack.c | 75 ++++++++ 8 files changed, 240 insertions(+), 24 deletions(-) diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index d9639e3e0a33..a2dd5d56caa4 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -373,6 +373,7 @@ 449 common futex_waitv sys_futex_waitv 450 common set_mempolicy_home_node sys_set_mempolicy_home_node 451 common map_shadow_stack sys_map_shadow_stack +452 common sigaltshstk sys_sigaltshstk # # Due to a historical design error, certain syscalls are numbered differently diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h index edf681d4843a..52119b913ed6 100644 --- a/arch/x86/include/asm/cet.h +++ b/arch/x86/include/asm/cet.h @@ -26,6 +26,7 @@ void reset_thread_shstk(void); int setup_signal_shadow_stack(struct ksignal *ksig); int restore_signal_shadow_stack(void); int wrss_control(bool enable); +void reset_alt_shstk(void); #else static inline long cet_prctl(struct task_struct *task, int option, unsigned long features) { return -EINVAL; } @@ -40,6 +41,7 @@ static inline void reset_thread_shstk(void) {} static inline int setup_signal_shadow_stack(struct ksignal *ksig) { return 0; } static inline int 
restore_signal_shadow_stack(void) { return 0; } static inline int wrss_control(bool enable) { return -EOPNOTSUPP; } +static inline void reset_alt_shstk(void) {} #endif /* CONFIG_X86_SHADOW_STACK */ #endif /* __ASSEMBLY__ */ diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 3a0c9d9d4d1d..b9fb966edec7 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -536,6 +536,9 @@ struct thread_struct { #ifdef CONFIG_X86_SHADOW_STACK struct thread_shstk shstk; + unsigned long sas_shstk_sp; + size_t sas_shstk_size; + unsigned int sas_shstk_flags; #endif /* Floating point and extended processor state */ diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 5e63d190becd..b71eb2d6a20f 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -176,6 +176,9 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args) frame->flags = X86_EFLAGS_FIXED; #endif + if ((clone_flags & (CLONE_VM|CLONE_VFORK)) == CLONE_VM) + reset_alt_shstk(); + /* Allocate a new shadow stack for pthread if needed */ ret = shstk_alloc_thread_stack(p, clone_flags, args->flags, &shstk_addr); if (ret) diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index af1255164f0c..05ee3793b60f 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -25,6 +25,7 @@ #include #include #include +#include #define SS_FRAME_SIZE 8 @@ -149,11 +150,18 @@ int shstk_setup(void) return 0; } +void reset_alt_shstk(void) +{ + current->thread.sas_shstk_sp = 0; + current->thread.sas_shstk_size = 0; +} + void reset_thread_shstk(void) { memset(&current->thread.shstk, 0, sizeof(struct thread_shstk)); current->thread.features = 0; current->thread.features_locked = 0; + reset_alt_shstk(); } int shstk_alloc_thread_stack(struct task_struct *tsk, unsigned long clone_flags, @@ -238,39 +246,67 @@ static int get_shstk_data(unsigned long *data, unsigned long __user *addr) return 0; } +static bool 
on_alt_shstk(unsigned long ssp) +{ + unsigned long alt_ss_start = current->thread.sas_shstk_sp; + unsigned long alt_ss_end = alt_ss_start + current->thread.sas_shstk_size; + + return ssp >= alt_ss_start && ssp < alt_ss_end; +} + +static bool alt_shstk_active(void) +{ + return current->thread.sas_shstk_sp; +} + +static bool alt_shstk_valid(unsigned long ssp, size_t size) +{ + if (ssp && (size < PAGE_SIZE || size >= TASK_SIZE_MAX)) + return -EINVAL; + + if (ssp >= TASK_SIZE_MAX) + return -EINVAL; + + return 0; +} + /* - * Create a restore token on shadow stack, and then push the user-mode - * function return address. + * Verify the user shadow stack has a valid token on it, and then set + * *new_ssp according to the token. */ -static int shstk_setup_rstor_token(unsigned long ret_addr, unsigned long *new_ssp) +static int shstk_check_rstor_token(unsigned long token_addr, unsigned long *new_ssp) { - unsigned long ssp, token_addr; - int err; + unsigned long token; - if (!ret_addr) + if (get_user(token, (unsigned long __user *)token_addr)) + return -EFAULT; + + /* Is mode flag correct? */ + if (!(token & BIT(0))) return -EINVAL; - ssp = get_user_shstk_addr(); - if (!ssp) + /* Is busy flag set? */ + if (token & BIT(1)) return -EINVAL; - err = create_rstor_token(ssp, &token_addr); - if (err) - return err; + /* Mask out flags */ + token &= ~3UL; + + /* Restore address aligned? */ + if (!IS_ALIGNED(token, 8)) + return -EINVAL; - ssp = token_addr - sizeof(u64); - err = write_user_shstk_64((u64 __user *)ssp, (u64)ret_addr); + /* Token placed properly? 
*/ + if (((ALIGN_DOWN(token, 8) - 8) != token_addr) || token >= TASK_SIZE_MAX) + return -EINVAL; - if (!err) - *new_ssp = ssp; + *new_ssp = token; - return err; + return 0; } -static int shstk_push_sigframe(unsigned long *ssp) +static int shstk_push_sigframe(unsigned long *ssp, unsigned long target_ssp) { - unsigned long target_ssp = *ssp; - /* Token must be aligned */ if (!IS_ALIGNED(*ssp, 8)) return -EINVAL; @@ -278,17 +314,32 @@ static int shstk_push_sigframe(unsigned long *ssp) if (!IS_ALIGNED(target_ssp, 8)) return -EINVAL; + *ssp -= SS_FRAME_SIZE; + if (write_user_shstk_64((u64 __user *)*ssp, 0)) + return -EFAULT; + + *ssp -= SS_FRAME_SIZE; + if (put_shstk_data((u64 __user *)*ssp, current->thread.sas_shstk_sp)) + return -EFAULT; + + *ssp -= SS_FRAME_SIZE; + if (put_shstk_data((u64 __user *)*ssp, current->thread.sas_shstk_size)) + return -EFAULT; + *ssp -= SS_FRAME_SIZE; if (put_shstk_data((void *__user)*ssp, target_ssp)) return -EFAULT; + current->thread.sas_shstk_sp = 0; + current->thread.sas_shstk_size = 0; + return 0; } static int shstk_pop_sigframe(unsigned long *ssp) { - unsigned long token_addr; + unsigned long token_addr, shstk_sp, shstk_size; int err; err = get_shstk_data(&token_addr, (unsigned long __user *)*ssp); @@ -303,7 +354,38 @@ static int shstk_pop_sigframe(unsigned long *ssp) if (unlikely(token_addr >= TASK_SIZE_MAX)) return -EINVAL; + *ssp += SS_FRAME_SIZE; + err = get_shstk_data(&shstk_size, (void __user *)*ssp); + if (unlikely(err)) + return err; + + *ssp += SS_FRAME_SIZE; + err = get_shstk_data(&shstk_sp, (void __user *)*ssp); + if (unlikely(err)) + return err; + + if (unlikely(alt_shstk_valid((unsigned long)shstk_sp, shstk_size))) + return -EINVAL; + *ssp = token_addr; + current->thread.sas_shstk_sp = shstk_sp; + current->thread.sas_shstk_size = shstk_size; + + return 0; +} + +static unsigned long get_sig_start_ssp(unsigned long orig_ssp, unsigned long *ssp) +{ + unsigned long sp_end = (current->thread.sas_shstk_sp + + 
current->thread.sas_shstk_size) - SS_FRAME_SIZE; + + if (!alt_shstk_active() || on_alt_shstk(orig_ssp)) { + *ssp = orig_ssp; + return 0; + } + + if (shstk_check_rstor_token(sp_end, ssp)) + return -EINVAL; return 0; } @@ -311,7 +393,7 @@ static int shstk_pop_sigframe(unsigned long *ssp) int setup_signal_shadow_stack(struct ksignal *ksig) { void __user *restorer = ksig->ka.sa.sa_restorer; - unsigned long ssp; + unsigned long ssp, orig_ssp; int err; if (!cpu_feature_enabled(X86_FEATURE_SHSTK) || @@ -321,11 +403,15 @@ int setup_signal_shadow_stack(struct ksignal *ksig) if (!restorer) return -EINVAL; - ssp = get_user_shstk_addr(); - if (unlikely(!ssp)) + orig_ssp = get_user_shstk_addr(); + if (unlikely(!orig_ssp)) return -EINVAL; - err = shstk_push_sigframe(&ssp); + err = get_sig_start_ssp(orig_ssp, &ssp); + if (unlikely(err)) + return err; + + err = shstk_push_sigframe(&ssp, orig_ssp); if (unlikely(err)) return err; @@ -496,3 +582,47 @@ long cet_prctl(struct task_struct *task, int option, unsigned long features) return wrss_control(true); return -EINVAL; } + +SYSCALL_DEFINE2(sigaltshstk, const stack_t __user *, uss, stack_t __user *, uoss) +{ + unsigned long ssp; + stack_t new, old; + + if (!cpu_feature_enabled(X86_FEATURE_SHSTK)) + return -ENOSYS; + + ssp = get_user_shstk_addr(); + + if (unlikely(!ssp || on_alt_shstk(ssp))) + return -EPERM; + + if (uss) { + if (unlikely(copy_from_user(&new, uss, sizeof(stack_t)))) + return -EFAULT; + + if (unlikely(alt_shstk_valid((unsigned long)new.ss_sp, + new.ss_size))) + return -EINVAL; + + if (new.ss_flags & SS_DISABLE) { + current->thread.sas_shstk_sp = 0; + current->thread.sas_shstk_size = 0; + return 0; + } + + current->thread.sas_shstk_sp = (unsigned long) new.ss_sp; + current->thread.sas_shstk_size = new.ss_size; + /* No saved flags for now */ + } + + if (!uoss) + return 0; + + memset(&old, 0, sizeof(stack_t)); + old.ss_sp = (void __user *)current->thread.sas_shstk_sp; + old.ss_size = current->thread.sas_shstk_size; + if 
(copy_to_user(uoss, &old, sizeof(stack_t))) + return -EFAULT; + + return 0; +} diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 3ae05cbdea5b..7b7e7bb992c2 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -1057,6 +1057,7 @@ asmlinkage long sys_set_mempolicy_home_node(unsigned long start, unsigned long l unsigned long home_node, unsigned long flags); asmlinkage long sys_map_shadow_stack(unsigned long addr, unsigned long size, unsigned int flags); +asmlinkage long sys_sigaltshstk(const struct sigaltstack *uss, struct sigaltstack *uoss); /* * Architecture-specific system calls diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index cb9aebd34646..3a5f8b76e7a4 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -382,6 +382,7 @@ COND_SYSCALL(modify_ldt); COND_SYSCALL(vm86); COND_SYSCALL(kexec_file_load); COND_SYSCALL(map_shadow_stack); +COND_SYSCALL(sigaltshstk); /* s390 */ COND_SYSCALL(s390_pci_mmio_read); diff --git a/tools/testing/selftests/x86/test_shadow_stack.c b/tools/testing/selftests/x86/test_shadow_stack.c index 249397736d0d..22b856de5cdd 100644 --- a/tools/testing/selftests/x86/test_shadow_stack.c +++ b/tools/testing/selftests/x86/test_shadow_stack.c @@ -492,6 +492,76 @@ int test_userfaultfd(void) return 1; } +volatile bool segv_pass; + +long sigaltshstk(stack_t *uss, stack_t *ouss) +{ + return syscall(__NR_sigaltshstk, uss, ouss); +} + +void segv_alt_handler(int signum, siginfo_t *si, void *uc) +{ + unsigned long min = (unsigned long)shstk_ptr; + unsigned long max = (unsigned long)shstk_ptr + SS_SIZE; + unsigned long ssp = get_ssp(); + stack_t alt_shstk_stackt; + + if (sigaltshstk(NULL, &alt_shstk_stackt)) + goto fail; + + if (alt_shstk_stackt.ss_sp || alt_shstk_stackt.ss_size) + goto fail; + + if (ssp < min || ssp > max - 8) + goto fail; + + segv_pass = true; + return; +fail: + segv_pass = false; +} + +int test_shstk_alt_stack(void) +{ + stack_t alt_shstk_stackt; + struct sigaction sa; + int ret = 1; + + 
sa.sa_sigaction = segv_alt_handler; + sa.sa_flags = SA_SIGINFO; + if (sigaction(SIGUSR1, &sa, NULL)) + return 1; + + shstk_ptr = create_shstk(0); + if (shstk_ptr == MAP_FAILED) + goto err_sig; + + alt_shstk_stackt.ss_sp = shstk_ptr; + alt_shstk_stackt.ss_size = SS_SIZE; + if (sigaltshstk(&alt_shstk_stackt, NULL) == -1) + goto err_shstk; + + segv_pass = false; + + /* Make sure segv_pass is cleared before the signal is raised */ + asm volatile("" : : : "memory"); + + raise(SIGUSR1); + + if (segv_pass) { + printf("[OK]\tAlt shadow stack test.\n"); + ret = 0; + } + +err_shstk: + alt_shstk_stackt.ss_flags = SS_DISABLE; + sigaltshstk(&alt_shstk_stackt, NULL); + free_shstk(shstk_ptr); +err_sig: + signal(SIGUSR1, SIG_DFL); + return ret; +} + int main(int argc, char *argv[]) { int ret = 0; @@ -556,6 +626,11 @@ int main(int argc, char *argv[]) printf("[FAIL]\tUserfaultfd test\n"); } + if (test_shstk_alt_stack()) { + ret = 1; + printf("[FAIL]\tAlt shadow stack test\n"); + } + out: /* * Disable shadow stack before the function returns, or there will be a